By Sara Colantonio, Researcher at ISTI-CNR
AI trustworthiness in prostate cancer: where we stand
Artificial intelligence (AI) is set to drive groundbreaking medical solutions and services that could transform diagnosis and treatment in clinical practice, with image-based diagnostics at the forefront of these advancements. ProCAncer-I is a prime example of how AI can enhance the entire imaging-based diagnostic and therapeutic continuum in oncology. The project is developing AI models designed to support essential clinical tasks related to prostate cancer. By utilising the extensive imaging and clinical data available, the AI models will enable radiologists and oncologists to detect cancer and distinguish indolent from aggressive cases, predict recurrence early, and detect metastases, paving the way towards more precise and personalised patient care.
As M.D. Villanova, one of the project radiologists, put it: “The possibility of having intelligent prostate software means a great saving in analysis time, but, most importantly, it improves diagnostic accuracy to better detect cancer […]. These technological tools that the ProCAncer-I project is developing give us, healthcare personnel, the opportunity to free up part of the working day dedicated to healthcare visits and to use it to continue with research to improve the health of our patients.”
In this direction, the largest collection of prostate cancer magnetic resonance images of its kind has been assembled to facilitate the development of highly effective and efficient AI models. At the same time, the partners involved fully recognise the importance of releasing AI models that ensure reliability and clinical value, gain stakeholders’ trust and acceptance, and guarantee the complete safety of patients. This is the only way to obtain regulatory clearance and acceptance within the clinical community. The project is therefore focused on creating a methodological framework and tools that ensure the trustworthiness and robustness of the AI models under development and testing.
Yet, despite the undeniable potential of AI, real-world adoption and deployment of AI-powered applications in clinical practice remains limited. Adoption barriers include perceived challenges to human autonomy, accountability and liability issues, potential biases and risks, as well as excessive demands in terms of effort and cognitive load and dissatisfaction with user interfaces. Overall, a general lack of trust is reported, which also seems to be linked to a lack of knowledge about the assumptions, limitations and capabilities of AI-based tools. From the citizens’ perspective, a recent survey of more than 900 respondents in the United States showed that most had a positive view of AI’s potential, expecting it to make healthcare much better (10.9%) or somewhat better (44.5%). However, the survey also revealed predictable concerns about potential misdiagnoses, privacy breaches, reduced time with clinicians and increased costs, with racial and ethnic minority groups expressing greater concern. It also found that most respondents would be very uncomfortable (31%) or somewhat uncomfortable (40.5%) receiving a diagnosis from an AI algorithm that was 90% accurate but unable to explain its rationale.

It is worth noting that trust is a complex, multidimensional construct that touches on technological as well as psycho-sociological, philosophical and ethical issues. A large body of literature has been devoted to defining and modelling trust in human interactions, and many of the attempts to ensure trust in AI have focused on the characteristics AI applications should have in order to be considered trustworthy. ProCAncer-I moves in this direction.
In collaboration with the AI4HI (Artificial Intelligence for Health Imaging) cluster, ProCAncer-I has contributed to defining the “FUTURE-AI Guidelines: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Medical Imaging”, which provide guidance and concrete recommendations for developing trustworthy AI solutions in medical imaging. The guidelines consist of six core principles:
(i) Fairness, which refers to creating equitable systems that do not promote discrimination and that perform well even on underrepresented sub-groups of the population, or that at least make stakeholders aware of any limitations in this area;
(ii) Universality, which mandates the use of standard and verifiable approaches that can be easily deployed in settings with limited resources;
(iii) Traceability, which demands transparent and traceable systems, with comprehensive tracking of details during both the development and usage phases;
(iv) Usability, which requires that AI systems integrate seamlessly and effectively into clinical processes;
(v) Robustness, which requires systems capable of generalising and managing adverse situations; and
(vi) Explainability, which requires that AI systems provide end users with all the necessary elements to make safe and appropriate use of the system’s outputs and predictions.
ProCAncer-I’s AI approach aligns with the six FUTURE-AI principles, providing tailored solutions across multiple work packages.

In terms of fairness, the GDPR-compliant data infrastructure securely stores fully anonymised data from various clinical centres, which means that the available demographic data is restricted to patients’ age. Consequently, the fairness analyses are being conducted as sub-cohort analyses, primarily considering the tumour severity classes and the data acquisition vendors and protocols, in order to determine whether performance varies across these sub-groups; a minimal sketch of such a comparison is given further below.

As far as universality is concerned, the multi-tier approach, based on the delivery of Master, Vendor-specific and Vendor-neutral models, aims to guarantee the AI models’ reliability across a range of diagnostic resources. This approach is also relevant to ensuring the robustness of the models.

Traceability is ensured through two approaches, both sketched below. Firstly, each AI model is equipped with a designated Model Passport, housed within a dedicated model registry. The Passport presents all the necessary details in a uniform format, using a metadata structure: the scope of the model, the actors and developers involved, the development tools, the technical choices made, versions, performance metrics and deployment information, as well as details on the data employed in the model’s training, including data provenance and localisation. Secondly, methods are being developed to track the performance of AI models over time after deployment, in order to detect any data and concept drift; the Passport also records pertinent information on this matter.
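As a concrete illustration of the sub-cohort fairness analyses mentioned above, the sketch below compares a model’s AUC across acquisition-vendor subgroups. The synthetic data, the vendor labels and the choice of AUC are assumptions made purely for illustration; the project’s actual analysis pipeline is not reproduced here.

```python
# Hypothetical sub-cohort fairness check: compare a model's AUC across
# acquisition-vendor subgroups (all data and labels are synthetic).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-ins for a held-out test set: true labels, model scores and the
# scanner vendor recorded for each case.
n = 600
y_true = rng.integers(0, 2, size=n)
y_score = np.clip(0.3 * y_true + rng.normal(0.5, 0.25, size=n), 0.0, 1.0)
vendor = rng.choice(["vendor_A", "vendor_B", "vendor_C"], size=n)

print(f"overall AUC: {roc_auc_score(y_true, y_score):.3f}")

# Per-subgroup AUC; a large gap versus the overall value would prompt
# further investigation (e.g. protocol differences or small sample size).
for v in np.unique(vendor):
    mask = vendor == v
    print(f"{v}: AUC = {roc_auc_score(y_true[mask], y_score[mask]):.3f} "
          f"(n = {mask.sum()})")
```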
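The Model Passport itself can be pictured as a structured, machine-readable record. The sketch below shows one hypothetical way of expressing such a passport in code; every field name and value is an illustrative placeholder, not the actual ProCAncer-I registry schema.

```python
# Hypothetical rendering of a Model Passport as a structured record.
# Field names and values are placeholders for illustration only.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelPassport:
    model_id: str
    scope: str                    # the clinical task the model addresses
    version: str
    developers: list[str]         # actors and teams involved
    development_tools: list[str]  # frameworks and libraries used
    training_data: dict           # provenance and localisation of the data
    performance: dict             # metrics on internal/external test sets
    deployment: dict              # where and how the model is served
    monitoring: dict = field(default_factory=dict)  # post-deployment drift info

passport = ModelPassport(
    model_id="pca-lesion-detection",          # placeholder identifier
    scope="Prostate lesion detection on MRI",
    version="1.2.0",
    developers=["Clinical centre X", "AI team Y"],
    development_tools=["PyTorch 2.1", "MONAI 1.3"],
    training_data={"provenance": "multi-centre cohort", "localisation": "EU"},
    performance={"AUC": 0.89, "sensitivity": 0.85, "specificity": 0.78},
    deployment={"format": "ONNX", "served_by": "hospital inference service"},
)
print(json.dumps(asdict(passport), indent=2))
```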
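As for post-deployment tracking, one common, generic way to flag data drift is to compare the distribution of a monitored quantity, for instance the model’s output scores, in a recent window against a reference window, e.g. using the Population Stability Index (PSI). The sketch below illustrates the general idea only and is not the project’s actual monitoring method.

```python
# Generic data-drift check with the Population Stability Index (PSI).
# Rules of thumb: PSI > 0.1 suggests moderate shift, > 0.25 a major one.
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid division by zero and log(0) for empty bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(1)
reference_scores = rng.beta(2, 5, size=2000)  # scores around deployment time
current_scores = rng.beta(3, 4, size=500)     # scores in a later window

value = psi(reference_scores, current_scores)
print(f"PSI = {value:.3f} -> {'drift suspected' if value > 0.25 else 'stable'}")
```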
The foundation of usability was laid by defining nine use cases that meet the requirements of the clinical partners. A further elicitation exercise was then carried out to ascertain clinicians’ preferences regarding interaction and integration modalities: various questions were posed to gain insight into the key features that a high-quality AI system should possess to effectively persuade and gain the trust of clinical stakeholders. The questions covered a wide range of topics, from preferred performance metrics and the balance between sensitivity and specificity, to reading modalities and the types of explanations offered by the AI system.

This last point also relates to the principle of explainability. Indeed, one of the still open challenges in explaining the results of an AI system is how to assess the quality of an explanation for different user groups, as there is still no commonly agreed understanding of what ‘explainability’ means from the end-user’s point of view. In this respect, a field study has been organised to gather end-users’ requirements and feedback, and to test their understanding of, and satisfaction with, different types of explanations.

Finally, in terms of robustness, we are also working to accompany the outputs of the AI models with a confidence or certainty value that clinical end users can rely on to make safe and confident use of them; a generic sketch of one way to compute such a score follows below. The importance of such a certainty/uncertainty score emerged as one of the key requirements of the elicitation exercise.
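One widely used, generic way to produce such a score is to quantify the predictive uncertainty of an ensemble of models (or of repeated Monte Carlo dropout passes) and flag low-confidence cases for closer review. The sketch below assumes a binary classification setting with hypothetical ensemble outputs; it illustrates the general technique, not the method adopted in the project.

```python
# Generic confidence score from ensemble predictions: average the members'
# probabilities and convert the predictive entropy of the mean into a
# confidence value in [0, 1]. All numbers are synthetic.
import numpy as np

def confidence_from_ensemble(probs: np.ndarray):
    """probs: (n_members, n_cases) positive-class probabilities."""
    p = np.clip(probs.mean(axis=0), 1e-6, 1 - 1e-6)          # ensemble mean
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # in bits, max 1
    return p, 1.0 - entropy                                  # (prediction, confidence)

rng = np.random.default_rng(2)
probs = rng.uniform(0.0, 1.0, size=(5, 4))  # five members scoring four cases

p_mean, conf = confidence_from_ensemble(probs)
for i, (p, c) in enumerate(zip(p_mean, conf)):
    flag = "flag for review" if c < 0.5 else "ok"
    print(f"case {i}: p(malignant) = {p:.2f}, confidence = {c:.2f} [{flag}]")
```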
Building on feedback from the clinical partners, an improved version of this questionnaire is now being circulated among the members of the European Society of Oncology Imaging (ESOI) to gather a more comprehensive view of the characteristics an AI system should exhibit to be trusted by clinical end-users. In this way, the ProCAncer-I consortium aims to contribute to finding the most appropriate means of ensuring that radiologists can benefit from the best that AI has to offer.