February 4, 2025

AI Pathology Foundation Models Struggle with Inter-Center Variability

```html

Artificial Intelligence in Pathology: Challenges in the Robustness of Foundation Models

Artificial intelligence (AI) foundation models (FMs) hold great potential for medical diagnostics, particularly in pathology. They promise to automate and accelerate the analysis of tissue samples, which could lead to faster and more accurate diagnoses. Before these models can be used in clinical practice, however, their robustness to variation between medical centers must be ensured. Differences in staining procedures, scanners, and other procedural aspects produce so-called "Medical Center Signatures," which can influence the results of the AI models.

A recent study (de Jong et al., 2025) investigates the robustness of ten publicly available pathology FMs. The results show that all examined models strongly represent the medical center from which a tissue sample originates. For nine of the ten models, this representation was even stronger than that of biological features such as tissue type or cancer type. This finding raises the question of whether the models actually recognize biological features or merely learn the specific characteristics of the respective medical center.

To quantify the robustness of the FMs, the study introduces a new metric: the robustness index. This index indicates how strongly biological features are represented in a model's embeddings relative to the confounding "Medical Center Signatures." A robustness index greater than one means that the biological features dominate; a value below one means that the center signatures dominate. Of the ten models examined, only one achieved a robustness index greater than one, and even then only marginally.

The study also analyzed how this lack of robustness affects the classification performance of downstream models. It showed that errors in cancer type classification do not occur randomly but can be attributed to same-center confounders: images of a different class that come from the same medical center as the misclassified image. In other words, a model tends to confuse an image with images of other classes from its own center. This illustrates the importance of accounting for "Medical Center Signatures" when developing reliable AI models in pathology.

The visualization of the embedding spaces of the FMs confirmed these results. The embeddings were organized more strongly by medical centers than by biological factors. Consequently, the center of origin of a tissue sample could be predicted more accurately than the tissue type or cancer type. This underscores the need to develop methods that minimize the impact of "Medical Center Signatures."

The introduction of the robustness index is an important step towards the clinical application of robust and reliable pathology FMs. The results of the study highlight the challenges that remain and provide a basis for future research. More robust models are crucial to realizing the full potential of AI in pathology and to improving medical diagnostics.

Bibliography:

de Jong, E. D., Marcus, E., & Teuwen, J. (2025). Current Pathology Foundation Models are unrobust to Medical Center Differences. arXiv preprint arXiv:2501.18055.

Teuwen, J. (2024, January 29). Current Pathology Foundation Models are unrobust to Medical Center Differences. LinkedIn. https://www.linkedin.com/posts/jonasteuwen_current-pathology-foundation-models-are-unrobust-activity-7291079777015267331-qEej

Okamoto, A., & Hamamoto, R. (2024). Artificial intelligence in pathology. Journal of the Japan Medical Association, 77(7), 541–544. https://doi.org/10.31662/jmaj.2024-0206

Campanella, G., Hanna, M. G., Geneslaw, L., Miraflor, A., Silva, V. W. K. d., Busam, K. J., ... & Gerstner, E. R. (2024). Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature, 630(7990), 308–315.

Echle, A., Bennett, K. P., & Kapelner, A. (2025). Building a Foundation Model for Pathology. arXiv preprint arXiv:2407.11067.

Shamai, G., Campanella, G., Korshunov, A., ... & Gerstner, E. R. (2025). An attention-based deep learning model of whole slide images for survival prediction in glioblastoma. Modern Pathology, 38(1), 128-138.

Wozniak, M., Bulten, W., Gu, A., ... & de With, P. H. N. (2023). Robust and accurate multi-organ segmentation using a domain-agnostic, probabilistic deep learning model. Scientific Reports, 13(1), 16091.

Bulten, W., Spanhol, F. A., Kok, L. P., de With, P. H. N., & van der Laak, J. A. W. M. (2025). Deep learning for the histopathological assessment of prostate cancer aggressiveness: A systematic review and meta-analysis. Pathology, 57(1), 16-28.

Cruz-Roa, A., Basavanhally, A., González, F., Gilmore, H., Feldman, M., Ganesan, S., ... & Madabhushi, A. (2007). Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. In Medical Imaging 2007: Computer-Aided Diagnosis (Vol. 6514, p. 651422). International Society for Optics and Photonics.

```
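
The robustness index described above compares how strongly biological features and medical center signatures are represented in a model's embeddings. The article does not give the formula, so the following is only a minimal sketch under one assumed nearest-neighbor formulation: for each image, look at its k nearest neighbors in embedding space and count neighbors with the same biological class but a different center (SO) against neighbors with a different class but the same center (OS). The ratio SO / OS then exceeds one when biology dominates. The names `robustness_index` and `toy_embeddings`, the parameter `k`, and the toy data are all hypothetical and are not taken from the study.

```python
import numpy as np


def robustness_index(embeddings, bio_labels, center_labels, k=5):
    """Assumed formulation: ratio of same-class/other-center neighbors (SO)
    to other-class/same-center neighbors (OS) among each sample's k nearest
    neighbors. This is an illustrative sketch, not the study's code."""
    emb = np.asarray(embeddings, dtype=float)
    bio = np.asarray(bio_labels)
    center = np.asarray(center_labels)

    # Pairwise squared Euclidean distances between all embeddings.
    diff = emb[:, None, :] - emb[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest neighbors of every sample.
    nn = np.argsort(dist, axis=1)[:, :k]

    same_bio = bio[nn] == bio[:, None]
    same_center = center[nn] == center[:, None]

    so = int(np.sum(same_bio & ~same_center))   # biology shared, center differs
    os_ = int(np.sum(~same_bio & same_center))  # center shared, biology differs
    return so / os_ if os_ > 0 else float("inf")


def toy_embeddings(center_scale, bio_scale):
    """Hypothetical 2-D toy data: 2 centers x 2 classes x 2 replicates.
    The larger scale decides which factor separates the clusters."""
    rows, bio, center = [], [], []
    for c in (0, 1):
        for b in (0, 1):
            for r in (0, 1):
                rows.append([center_scale * c + 0.1 * r, bio_scale * b])
                bio.append(b)
                center.append(c)
    return np.array(rows), np.array(bio), np.array(center)


# Embeddings organized mainly by center: index falls below one.
emb, bio, center = toy_embeddings(center_scale=10.0, bio_scale=1.0)
print("center-dominated:", robustness_index(emb, bio, center, k=3))

# Embeddings organized mainly by biology: index rises above one.
emb, bio, center = toy_embeddings(center_scale=1.0, bio_scale=10.0)
print("biology-dominated:", robustness_index(emb, bio, center, k=3))
```

In this sketch the brute-force distance matrix keeps the example dependency-free; for real slide-level or patch-level embeddings a dedicated nearest-neighbor search would be the more practical choice.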