Tissue “fingerprinting” enables a deep-learning algorithm to predict biomarkers, prognosis, and theragnosis, an advance that could improve understanding of cancer biology and accelerate clinical workflows, new research finds.


The Lawrence J. Ellison Institute for Transformative Medicine of USC (“Ellison Institute”) and Oracle reveal a promising two-step technique to train a high-confidence predictive algorithm for enhanced cancer diagnostics. The study uses novel tissue “fingerprints”—discriminating microscopic hematoxylin and eosin (H&E) histologic features—of tumors paired with correct diagnoses to facilitate deep learning in the classification of breast cancer ER/PR/HER2 status.

The approach achieved unprecedented diagnostic accuracy for an algorithm of its type and purpose while using fewer than a thousand annotated breast cancer pathology slides. The findings suggest that the algorithm’s ability to correlate a tumor’s architectural pattern with a correct diagnosis can ultimately help clinicians determine how a tumor will respond to a given treatment.

The study was facilitated by Oracle for Research, Oracle’s global program that provides selected researchers with access to Oracle Cloud technology, consulting and support, and participation in the Oracle research user community.

The research appears in Scientific Reports.

Challenges of medical machine learning

The challenge of developing artificial intelligence (AI) tools to diagnose cancer is that machine learning algorithms require clinically annotated data from tens of thousands of patients before they can recognize meaningful relationships in the data with consistency and high confidence. A dataset of that size is nearly impossible to gather in cancer pathology. Researchers training computers to diagnose cancer typically have access to only hundreds or low thousands of pathology slides annotated with correct diagnoses.

To overcome this limitation, the Ellison Institute scientists introduced a two-step process of priming the algorithm to identify unique patterns in cancerous tissue before teaching it the correct diagnoses.

“If you train a computer to reproduce what a person knows how to do, it’s never going to get far beyond human performance,” said lead author Rishi Rawat, Ph.D. “But if you train it on a task 10 times harder than anything a person could do, you give it a chance to go beyond human capability. With tissue fingerprinting, we can train a computer to look through thousands of tumor images and recognize the visual features to identify an individual tumor. Through training, we have essentially evolved a computer eye that’s optimized to look at cancer patterns.”

The first step in the process introduces the concept of tissue “fingerprints,” or distinguishing architectural patterns in a tumor’s tissue, that an algorithm can use to discriminate between samples because no two patients’ tumors are identical.  These fingerprints are the result of biological variations such as the presence of signaling molecules and receptors that influence the 3D organization of a tumor. The study shows that AI spotted these fine, structural differentiations on pathology slides with greater accuracy and reliability than the human eye, and was able to recognize these variations without human guidance.

In this study, the research team took digital pathology images, split them in half, and prompted a machine-learning algorithm to pair them back together based on their molecular fingerprints.  This practice showcased the algorithm’s ability to group “same” and “different” pathology slides without paired diagnoses, which allowed the team to train the algorithm on large, unannotated datasets (a technique known as self-supervised learning).
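The pairing task described above can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the study's method: the study trained a deep network on H&E images, whereas here a simple intensity histogram stands in for the learned fingerprint, and the "images" are synthetic grids of numbers. The function names (`split_in_half`, `embed`, `match_halves`, `make_synthetic_image`) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_synthetic_image(mean, size=16, rng=None):
    """Build a toy 'slide': a size x size grid of intensities near `mean`."""
    rng = rng or random.Random()
    return [[mean + rng.uniform(-0.05, 0.05) for _ in range(size)]
            for _ in range(size)]


def split_in_half(image):
    """Split a 2D image (list of rows) into left and right halves."""
    mid = len(image[0]) // 2
    left = [row[:mid] for row in image]
    right = [row[mid:] for row in image]
    return left, right


def embed(half, bins=8):
    """Stand-in for a learned fingerprint (assumption): a unit-length
    intensity histogram. Pixel values are expected to lie in [0, 1)."""
    counts = [0] * bins
    for row in half:
        for px in row:
            counts[min(int(px * bins), bins - 1)] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_halves(images):
    """For each left half, return the index of the most similar right half.
    No diagnostic labels are used; the only supervision is which halves
    came from the same image."""
    halves = [split_in_half(img) for img in images]
    left_vecs = [embed(left) for left, _ in halves]
    right_vecs = [embed(right) for _, right in halves]
    return [
        max(range(len(right_vecs)), key=lambda j: cosine(lv, right_vecs[j]))
        for lv in left_vecs
    ]


# Four synthetic "patients" with clearly different intensity profiles.
rng = random.Random(0)
images = [make_synthetic_image(m, rng=rng) for m in (0.10, 0.35, 0.60, 0.85)]
matches = match_halves(images)
print(matches)
```

Because the four synthetic intensity ranges do not overlap, each left half is matched to the right half of its own image. Real tumor images are far less separable, which is why a trained network, rather than a fixed histogram, is needed to learn fingerprints that survive this test.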

“With clinically annotated pathology data in short supply, we must use it wisely when building classifiers,” said corresponding author Dan Ruderman, Ph.D., director of analytics and machine learning at the Ellison Institute. “Our work leveraged abundant unannotated data to find a reduced set of tumor features that can represent unique biology. Building classifiers upon the biology that these features represent enables us to efficiently focus the precious annotated data on clinical aspects.”

Once the model was trained to identify the breast cancer tissue structures that distinguish patients, the second step called upon its established grouping ability to learn which of those known patterns correlated with a particular diagnosis. Trained on a discovery set of 939 cases obtained from The Cancer Genome Atlas, the algorithm assigned ER, PR, and HER2 status to whole-slide H&E images with an area under the ROC curve (AUC) of 0.89 (ER), 0.81 (PR), and 0.79 (HER2) on a large independent test set of 2531 breast cancer cases from the Australian Breast Cancer Tissue Bank.
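For readers unfamiliar with the AUC metric reported above, the following Python sketch shows one standard way to compute it: as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. The labels and scores are made-up toy values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for marker-positive cases, 0 for marker-negative cases.
    scores: the classifier's predicted score for each case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only): two positive and two negative cases.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of 4 positive/negative pairs are ordered correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value such as 0.89 means a randomly chosen positive case outscores a randomly chosen negative case about 89 percent of the time.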

Developed using Oracle Cloud technology, the study’s groundbreaking technique creates a new paradigm in medical machine learning, one that may allow future machine learning systems to process unannotated or unlabeled tissue specimens, as well as variably processed tissue samples, to assist pathologists in cancer diagnostics.

“Oracle for Research is thrilled to support and accelerate the Ellison Institute’s trailblazing discoveries through advanced cloud technology,” said Mamei Sun, Vice President, Oracle. “The Ellison Institute’s innovative use of machine learning and AI can revolutionize cancer research, treatment, and patient care – and ultimately improve the lives of many.”

Technique democratizes cancer diagnosis

In breast cancer, tumors that express a molecule called estrogen receptor look unique at the cellular level and fall into their own diagnostic category because they typically respond to anti-estrogen therapies.  Currently, pathologists must use chemical stains to probe biopsy samples for the presence of the estrogen receptor to make that diagnosis, and the process is time-consuming, expensive, and variable.

The established algorithm aims to improve pathologists’ accuracy and efficiency in a digital pathology workflow by directly analyzing tumor images to diagnose them as “estrogen receptor-positive” without staining specifically for estrogen receptor. The study’s results support the notion that the use of tissue “fingerprints” may allow for a direct treatment response prediction, potentially obviating the need for molecular staining approaches currently utilized in cancer theragnosis.

An exciting application of this technology lies in the possibility of deploying computer-assisted diagnostics in medically underserved regions and developing nations that lack expert pathologists, specialists, and the laboratory infrastructure to stain for molecular markers.

While the study suggests that additional investigation is warranted to gain a deeper understanding of AI’s ability to determine molecular status based on tissue architecture, it sets the stage for future applications in which the technique could help resolve challenging tumor classification issues, enhance human pathologists’ abilities to arrive at correct diagnoses, and better inform treatment decisions.

About this study

In addition to Rawat and Ruderman, other study authors include Itzel Ortega, Preeyam Roy and David Agus of the Ellison Institute; along with Ellison Institute affiliate Fei Sha of USC Michelson Center for Convergent Bioscience; and USC collaborator Darryl Shibata of the Norris Comprehensive Cancer Center at Keck School of Medicine.

The study’s computing resources were provided through Oracle Cloud Infrastructure via Oracle for Research, Oracle’s global program providing free cloud credits and technical support to researchers. The study was supported in part by the Breast Cancer Research Foundation grant BCRF-18-002.

In addition to his appointment at the Ellison Institute, Ruderman is an assistant professor of research medicine at USC’s Keck School of Medicine.