TY - JOUR
T1 - Imputing Phenotypes for Genome-wide Association Studies
AU - Hormozdiari, Farhad
AU - Kang, Eun Yong
AU - Bilow, Michael
AU - Ben-David, Eyal
AU - Vulpe, Chris
AU - McLachlan, Stela
AU - Lusis, Aldons J.
AU - Han, Buhm
AU - Eskin, Eleazar
N1 - Publisher Copyright: © 2016 American Society of Human Genetics
PY - 2016/7/7
Y1 - 2016/7/7
N2 - Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing associated loci to triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset.
UR - http://www.scopus.com/inward/record.url?scp=84989871804&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2016.04.013
DO - 10.1016/j.ajhg.2016.04.013
M3 - Article
C2 - 27292110
AN - SCOPUS:84989871804
SN - 0002-9297
VL - 99
SP - 89
EP - 103
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 1
ER -