Identity inference of genomic data using long-range familial searches

Yaniv Erlich*, Tal Shor, Itsik Pe’er, Shai Carmi

*Corresponding author for this work

Research output: Contribution to journal / Article / peer-review

Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data of 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of the searches for individuals of European descent will result in a third-cousin or closer match, which theoretically allows their identification using demographic identifiers. Moreover, the technique could implicate nearly any U.S. individual of European descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. On the basis of these results, we propose a potential mitigation strategy and policy implications for human subject research.

Original language: American English
Pages (from-to): 690-694
Number of pages: 5
Issue number: 6415
State: Published - 9 Nov 2018

Bibliographical note

Publisher Copyright:
© 2018 American Association for the Advancement of Science. All rights reserved.

