Accounting for measurement error in biomarker data and misclassification of subtypes in the analysis of tumor data

Daniel Nevo*, David M. Zucker, Rulla M. Tamimi, Molin Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

7 Scopus citations


A common paradigm in dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors into subtypes using marker data on the tumor, and then to analyze each of the clusters separately. A more specific target is to investigate the association between risk factors and specific subtypes and to use the results for personalized preventive treatment. This task is usually carried out in two steps: clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is the measurement error in the biomarker values. The second is the misclassification error when assigning observations to clusters. We consider the case with a specified set of relevant markers and propose a unified single-likelihood approach for normally distributed biomarkers. As an alternative, we consider a two-step procedure with the tumor type misclassification error taken into account in the second-step risk factor analysis. We describe our method for binary data and also for survival analysis data using a modified version of the Cox model. We present asymptotic theory for the proposed estimators. Simulation results indicate that our methods significantly lower the bias, at the price of a small increase in variance. We present an analysis of breast cancer data from the Nurses' Health Study to demonstrate the utility of our method.

Original language: American English
Pages (from-to): 5686-5700
Number of pages: 15
Journal: Statistics in Medicine
Issue number: 30
State: Published - 30 Dec 2016

Bibliographical note

Publisher Copyright:
Copyright © 2016 John Wiley & Sons, Ltd.


  • classification
  • clustering
  • heterogeneity
  • measurement error
  • risk-factor analysis

