Mechanistic Interpretability Uncovers Biological Hypotheses in a Deep Learning Model for Breast Cancer Histology
Abstract
Predicting spatial gene expression from tissue images offers a scalable route to molecular profiling, but the deep learning models used are often opaque. This lack of internal transparency hampers clinical trust and obscures biological insight. We apply mechanistic interpretability to iStar, a state-of-the-art model for this task. We extracted learned deep features from an intermediate layer and correlated them with traditional morphological traits, spatially resolved gene expression, and expert cell-type annotations in breast cancer (n=167,780 cells). This analysis identified a deep feature (cls45) that robustly represents proliferative invasive tumor cells (p<1e-100), characterized by high expression of metabolic (SCD, r=0.60; FASN, r=0.59) and luminal (FOXA1, r=0.58) genes and by a morphology of high nucleus-to-cytoplasm ratio (r=0.39) and small cell area (r=−0.24). A second feature (cls53) represents the immune-stromal microenvironment, corresponding to large, elongated cells (r=0.41) and high expression of stromal (CCDC80, r=0.56) and chemokine (CXCL12, r=0.53) markers. Spatially, these features form coherent, mutually exclusive domains (Moran's I > 0.98) and are strongly anti-correlated (r=−0.50), demonstrating that the model learned the fundamental tumor-stroma architecture without explicit labels. By reverse-engineering deep features into biological concepts, we validate the model's reasoning and provide a workflow for generating testable hypotheses (e.g., a functional link between cls45's distinctive visual phenotype and metabolic gene upregulation). This converts the predictive model from a black box into an interpretable tool for biological discovery, enabling the scalable identification of novel biomarkers.
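The two core analyses summarized above, per-gene Pearson correlation against a deep-feature activation vector and Moran's I for spatial coherence, are conceptually simple. The sketch below illustrates them on simulated data; it is not the authors' pipeline, and the array names, the k-nearest-neighbour spatial weights, and the simulated feature/expression values are assumptions introduced for illustration.

# Minimal sketch (not the authors' code) of feature-gene correlation and
# Moran's I. Array names, kNN weights, and simulated data are assumptions.
import numpy as np
from scipy.stats import pearsonr

def feature_gene_correlations(feature, expression, gene_names):
    """Pearson r between one deep feature (n_spots,) and each gene column
    of `expression` (n_spots, n_genes)."""
    return {g: pearsonr(feature, expression[:, j])[0]
            for j, g in enumerate(gene_names)}

def morans_i(values, coords, k=6):
    """Moran's I with a binary k-nearest-neighbour weights matrix."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], np.argsort(d, axis=1)[:, :k]] = 1.0
    z = values - values.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical usage on simulated spots standing in for iStar features,
# spot coordinates, and spot-level gene expression.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(500, 2))
cls45 = np.sin(6 * coords[:, 0]) + 0.1 * rng.normal(size=500)   # spatially smooth feature
expr = np.column_stack([0.6 * cls45 + 0.8 * rng.normal(size=500),  # correlated gene
                        rng.normal(size=500)])                     # unrelated gene
print(feature_gene_correlations(cls45, expr, ["geneA", "geneB"]))
print("Moran's I:", morans_i(cls45, coords))

In practice, a spatial-statistics library could replace the hand-rolled Moran's I, but writing it out makes the weighting and normalization explicit.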
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.