Abstract
We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of five molecular data platforms in The Cancer Genome Atlas (TCGA), namely mRNA and miRNA expression, single nucleotide variants, DNA methylation, and copy number alterations, comprehensive sample-, gene-, and probe-level studies were performed to quantify the degree of similarity between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve.

Gao et al. performed a systematic analysis of the effects of synchronizing the large-scale, widely used, multi-omic dataset of The Cancer Genome Atlas to the current human reference genome. For each of the five molecular data platforms assessed, they demonstrated a very high concordance between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons.
Original language | American English |
---|---|
Pages (from-to) | 24-34.e10 |
Journal | Cell Systems |
Volume | 9 |
Issue number | 1 |
DOIs | |
State | Published - 24 Jul 2019 |
Bibliographical note
Funding Information: We thank the U.S. National Cancer Institute for funding through grants 1U24CA210999-01, 1U24CA210974-01, 1U24CA211006-01, 1U24CA210949-01, 1U24CA210978-01, 1U24CA210952-01, 1U24CA210989-01, 1U24CA210957-01, 1U24CA210990-01, 1U24CA211000-01, 1U24CA210950-01, 1U24CA210969-01, and 1U24CA210988-01. We are grateful for advice and dialogue from numerous colleagues at our respective institutions, TCGA and GDAN collaborators, and the technical support team from GDC, and especially to the NCI Office of Cancer Genomics for steadfast organizational support. H.L. and M.S.N., working group leaders; D.I.H., manuscript coordinator. Analyses of miRNA Expression: S.R., G.R., and T.K., section leaders; D.B., A.J.M., R.A., S.S., and M.S.N., contributors. Analyses of Somatic Copy Number Alterations: A.C. and G.G., section leaders; K.H. and M.S.N., contributors. Analyses of DNA Methylation: B.P.B., W.Z., and T.C.S., section leaders; H.S., P.W.L., T.K., A.C., and M.S.N., contributors. Analyses of mRNA Expression: J.P. and S.B., section leaders; K.H., S.C., D.H., and H.L., contributors. Analyses of Somatic Mutations: L.B.W., section leader; L.D., B.B., R.J., M.B., M.W., and H.L., contributors. All authors contributed to project administration and helpful discussion. The authors declare no competing interests.
Publisher Copyright:
© 2019 The Authors
Keywords
- DNA methylation
- The Cancer Genome Atlas
- human reference genome
- mRNA expression
- microRNA expression
- quality control
- somatic copy number alteration
- somatic mutation