Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25–29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.
Bibliographical note
Funding Information:
Acknowledgements
We thank the many individuals whose sequence data are aggregated in gnomAD for their contributions to research, and the users of gnomAD for their collaborative feedback. We are grateful to the families at the participating Simons Simplex Collection (SSC) sites, as well as the SSC principal investigators. We thank T. Hefferon of the NIH National Center for Biotechnology Information for his help hosting gnomAD-SV on dbVar. We have complied with all relevant ethical regulations. Research and contributing authors were supported by resources from the Broad Institute, the National Institutes of Health (NIH) (R01MH115957 to M.E.T., B.N. and D.G.M.; UM1HG008895 to M.J.D., B.N., S.G., E.S.L., S.K., M.E.T.; R01HD081256, P01GM061354, R01HD091797, R01HD096326, R01MH111776, R01HD099547 to M.E.T.; U01MH105669 to M.J.D., B.N. and M.E.T.; P50HD028138 to B.N. and M.E.T.; P01HD068250 to H.B.) and the Simons Foundation for Autism Research Initiative (SFARI #573206 to M.E.T.). R.L.C. was supported by NHGRI T32HG002295 and NSF GRFP #2017240332. H.B. was supported by NIDCR K99DE026824. A.V.K. was supported by NHGRI K08HG010155. M.E.T. was supported by Desmond and Ann Heathwood. MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-00107, and UL1-TR-001420. MESA family is conducted and supported by the NHLBI in collaboration with MESA investigators. Support is provided by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071258 and R01HL071259, by the National Center for Research Resources, grant UL1RR033176, and the National Center for Advancing Translational Sciences ULTR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center.
© 2020, The Author(s).