Personalized disease signatures through information-theoretic compaction of big cancer data

Swetha Vasudevan, Efrat Flashner-Abramson, F. Remacle, R. D. Levine*, Nataly Kravchenko-Balasha

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Scopus citations


Every individual cancer develops and grows in its own specific way, which has created a recognized need for personalized cancer diagnostics. This has suggested that identifying patient-specific oncogene markers would be an effective diagnostic approach. However, tumors that are classified as similar according to the expression levels of certain oncogenes can eventually show divergent responses to treatment. This implies that the information gained from identifying tumor-specific biomarkers is still not sufficient. We present a method to quantitatively transform heterogeneous big cancer data into patient-specific transcription networks. These networks characterize the unbalanced molecular processes that drive the tissue away from the normal state. We study a number of datasets spanning five different cancer types, aiming to capture the extensive interpatient heterogeneity that exists both within a specific cancer type and between cancers of different origins. We show that a relatively small number of altered molecular processes suffices to accurately characterize over 500 tumors, an extreme compaction of the data. Every patient is characterized by a small, specific subset of unbalanced processes. We validate the result by verifying that the identified processes characterize other cancer patients as well. We show that different patients may display similar oncogene expression levels while carrying biologically distinct tumors that harbor different sets of unbalanced molecular processes. Thus, such tumors may be inaccurately classified and treated as similar. These findings highlight the need to expand the notion of tumor-specific oncogenic biomarkers to patient-specific, comprehensive transcriptional networks for improved patient-tailored diagnostics.
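To make the idea of "unbalanced processes" and "compaction" more concrete, the following is a minimal sketch of an SVD-based decomposition in the spirit of surprisal analysis (one of the listed keywords), in which the logarithm of each gene's expression level in each patient is written as a sum of terms, ln X_i(k) ≈ Σ_α G_iα λ_α(k), with α = 0 playing the role of the baseline (balanced) state. This is not the authors' code. The function names, the toy cohort generator `make_synthetic_cohort`, and the cutoff `threshold` used to call a process "unbalanced" in a patient are all hypothetical illustrations; the paper's own significance criteria are not reproduced here.

```python
import numpy as np


def surprisal_analysis(expression, n_terms):
    """SVD-based decomposition in the spirit of surprisal analysis.

    expression: strictly positive array of shape (n_genes, n_patients).
    Returns (G, lam) with G of shape (n_genes, n_terms) and lam of shape
    (n_terms, n_patients) such that ln X is approximated by G @ lam.
    Term 0 acts as the baseline; terms >= 1 are candidate unbalanced
    processes. SVD signs are arbitrary, so only |lam| is interpreted below.
    """
    log_x = np.log(expression)
    u, s, vt = np.linalg.svd(log_x, full_matrices=False)
    g = u[:, :n_terms]                          # gene weights of each process
    lam = s[:n_terms, None] * vt[:n_terms, :]   # amplitude of each process per patient
    return g, lam


def patient_signatures(lam, threshold):
    """For each patient, the tuple of processes (alpha >= 1) whose amplitude
    magnitude exceeds the hypothetical cutoff `threshold`."""
    n_terms, n_patients = lam.shape
    return [
        tuple(a for a in range(1, n_terms) if abs(lam[a, k]) > threshold)
        for k in range(n_patients)
    ]


def make_synthetic_cohort(n_genes=50, seed=0):
    """Hypothetical toy data: a shared baseline plus one process that is
    switched on (with either sign) in four of six patients."""
    rng = np.random.default_rng(seed)
    baseline = rng.uniform(1.0, 3.0, size=n_genes)  # ln X^0 for each gene
    r = rng.normal(size=n_genes)
    # Make the process orthogonal to the baseline so the SVD separates them cleanly.
    process = r - (r @ baseline) / (baseline @ baseline) * baseline
    process /= np.linalg.norm(process)
    amplitudes = np.array([0.0, 0.0, 2.0, 2.0, -2.0, -2.0])  # sums to zero
    log_x = baseline[:, None] + process[:, None] * amplitudes[None, :]
    return np.exp(log_x)


if __name__ == "__main__":
    x = make_synthetic_cohort()
    g, lam = surprisal_analysis(x, n_terms=2)
    print(patient_signatures(lam, threshold=1.0))  # [(), (), (1,), (1,), (1,), (1,)]
```

In this toy cohort, the process is orthogonal to the baseline and its amplitudes sum to zero, so the SVD recovers it exactly. The first two patients have an empty signature, meaning they are at baseline, while the other four share process 1 with opposite signs. On real data, the terms mix and the choice of cutoff matters. The compaction corresponds to describing thousands of genes per patient with only a few λ values.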

Original language: American English
Pages (from-to): 7694-7699
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Issue number: 30
State: Published - 24 Jul 2018

Bibliographical note

Publisher Copyright:
© 2018 National Academy of Sciences. All Rights Reserved.


Keywords

  • Cancer diagnostics
  • Information theory
  • Intertumor heterogeneity
  • Patient-specific gene expression signatures
  • Surprisal analysis

