Prostate cancer can vary greatly in how aggressive it is, and knowing this in advance helps doctors decide the best course of treatment. Magnetic resonance imaging (MRI) is often used to detect prostate cancer, but figuring out just how aggressive the cancer is remains tricky.
A new study from researchers at the Champalimaud Foundation (CF) investigated whether artificial intelligence (AI) could analyse MRI scans to reliably tell which cancers are less aggressive and which pose a greater risk. While AI showed promise, its accuracy varied significantly depending on the MRI scanner’s brand.
A Large, Multi-Centre Effort
The researchers gathered over 5,000 MRI scans from multiple hospitals. The scans were acquired on different brands of MRI machines (Siemens, Philips, and GE) and, in some cases, with a special device called an endorectal coil (ERC) to get clearer images. Each MRI scan was paired with confirmed lab results indicating how aggressive the cancer really was, based on what’s called the ISUP grade. This provided the researchers with a reliable benchmark (essentially the “correct answers”) to evaluate the accuracy of their AI models.
CF contributed patient data from the Champalimaud Clinical Centre, while also playing a key role in developing the AI tools used in the study. As senior author Nickolas Papanikolaou, head of CF’s Computational Clinical Imaging Lab, explains: “As a founding member of the ProCAncer-I Consortium, we had privileged access to a lot of data that made this study possible. CF is not only a data provider for the Consortium but also an active AI developer, advancing research on AI-driven diagnostics for prostate cancer.”
Training AI to Classify Prostate Cancer
The researchers’ main goal was to see if AI models could, on their own, distinguish low-grade from higher-grade tumours by “looking” at the entire prostate in each MRI—without radiologists first having to outline the tumour. They tested different “deep learning” models, from older, simpler ones (like VGG) to newer, more complex “transformers”. They also checked whether adding patient details—such as age and blood test results—would boost the models’ accuracy.
“We then compared how well the models worked when trained on MRIs from one brand of scanner versus data from all brands mixed together”, says Papanikolaou. “We also checked how using or not using the endorectal coil affected the models’ performance”.
Why Scanner Differences Matter
Overall, the AI models could tell low-grade from high-grade prostate cancers fairly well. The best model achieved 73% accuracy when tested on new data that came from the same scanners it had seen during training. However, the brand and setup of the MRI scanner had a clear impact on how well the models performed.
José Almeida, a postdoc at CF and first author of the study, notes: “The main finding of our study is that models did their best when tested on data that came from the same brand/scanner type used for training, but performance often dropped when the model was tested on a different brand or on scans taken with an endorectal coil if it hadn’t seen similar examples during training”.
Almeida continues: “Normally, if your model is performing well, as you increase the data you train it on, its performance improves. Indeed, we found this to be the case when training and testing models on the same brand. But we failed to see this happening when we trained the model on one brand, and tested on another”.
Surprisingly, adding patient details did not consistently improve the model’s ability to spot aggressive cancers. When the model was trained on data from all the different MRI machine brands, the AI performed more reliably across different brands, though endorectal coil scans remained a challenge. While increasing the amount of training data improved performance overall, it still did not completely erase the drop in accuracy when switching between scanner types.
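For readers who want to see what a “train on one brand, test on another” comparison looks like in practice, here is a minimal sketch in Python. It is not the study’s code. The “model” is a deliberately simple stand-in (a single cut-off on one made-up number per scan), the vendor names `vendor_a` and `vendor_b` are hypothetical, and the toy values are invented purely to show how a shift between scanners can hurt a model that has only seen one of them.

```python
from statistics import mean

# Toy data, invented for illustration: (feature value, label) pairs,
# where label 0 = low-grade and 1 = higher-grade.
# vendor_b values are shifted upwards to mimic a scanner-dependent offset.
TOY_TRAIN = {
    "vendor_a": [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (7, 1)],
    "vendor_b": [(4, 0), (5, 0), (6, 0), (8, 1), (9, 1), (10, 1)],
}
TOY_TEST = {
    "vendor_a": [(1.5, 0), (2.5, 0), (5.5, 1), (6.5, 1)],
    "vendor_b": [(4.5, 0), (5.5, 0), (8.5, 1), (9.5, 1)],
}


def fit_threshold(samples):
    """Stand-in 'model': the midpoint between the two class means."""
    low = [x for x, label in samples if label == 0]
    high = [x for x, label in samples if label == 1]
    return (mean(low) + mean(high)) / 2


def accuracy(threshold, samples):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum((x > threshold) == bool(label) for x, label in samples)
    return correct / len(samples)


def cross_vendor_matrix(train_by_vendor, test_by_vendor):
    """Train on each vendor, then test on held-out data from every vendor."""
    results = {}
    for train_vendor, train_samples in train_by_vendor.items():
        threshold = fit_threshold(train_samples)
        for test_vendor, test_samples in test_by_vendor.items():
            results[(train_vendor, test_vendor)] = accuracy(threshold, test_samples)
    return results


if __name__ == "__main__":
    for (trained_on, tested_on), acc in cross_vendor_matrix(TOY_TRAIN, TOY_TEST).items():
        print(f"trained on {trained_on}, tested on {tested_on}: {acc:.2f}")
```

In this toy setting, each cut-off is perfect on held-out scans from its own vendor and no better than a coin flip on the other, because the offset between scanners pushes every scan to the same side of the cut-off. Real deep learning models and real MRI data are far more complex, but the evaluation grid has the same shape.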
Bigger Isn’t Always Better
Using AI to identify high-risk tumours could help some men with low-risk cancers avoid invasive procedures to determine cancer aggressiveness. Nevertheless, as this study illustrates, MRI scanner brand and the use of special coils can strongly affect performance. A large, multi-centre dataset—like the one used here—is crucial for capturing the realities of clinical practice and ensuring that AI tools deliver reliable results everywhere.
“The models aren’t perfect and could miss some aggressive cancers or flag some non-aggressive ones”, points out Papanikolaou. “Including more detailed or advanced information—like exact tumour locations or additional patient data—might further increase accuracy”. He also notes that because the data mostly came from European centres, it is unclear how well the models would perform in more diverse populations.
Almeida underscores the need for broader collaboration: “A large-scale study across the globe where doctors use these AI tools in real time will be essential to see how well they work in practice. We need to increase the diversity of test data, and motivate multi-centric studies. Big data does not solve the issues of low-diversity data!”