Adenosine-to-inosine editing is one of the most frequent post-transcriptional modifications, manifested as A-to-G mismatches when comparing RNA sequences with their source DNA. Recently, a number of RNA-seq data sets have been screened for the presence of A-to-G editing, and hundreds of thousands of editing sites identified. Here we show that existing screens missed the majority of sites by ignoring reads with excessive ('hyper') editing that do not easily align to the genome. We show that careful alignment and examination of the unmapped reads in RNA-seq studies reveal numerous new sites, usually many more than originally discovered, and in precisely those regions that are most heavily edited. Specifically, we discover 327,096 new editing sites in the heavily studied Illumina Human BodyMap data and more than double the number of detected sites in several published screens. We also identify thousands of new sites in mouse, rat, opossum and fly. Our results establish that hyper-editing events account for the majority of editing sites.
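To make the notion of an A-to-G mismatch concrete, the following is a minimal illustrative sketch in Python. It is not the pipeline used in this work, and the abstract does not specify one. The function names, the ungapped same-length comparison, and the thresholds `min_fraction` and `min_purity` are all assumptions chosen for illustration. It handles only the forward strand; for transcripts from the opposite strand, the same edits would appear as T-to-C mismatches.

```python
from typing import List


def a_to_g_mismatches(reference: str, read: str) -> List[int]:
    """Return 0-based positions where the reference has A and the read has G.

    Assumes the read is already placed against the reference without gaps.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()))
        if ref_base == "A" and read_base == "G"
    ]


def is_hyper_edited(
    reference: str,
    read: str,
    min_fraction: float = 0.05,  # hypothetical: A-to-G mismatches per read base
    min_purity: float = 0.8,     # hypothetical: share of mismatches that are A-to-G
) -> bool:
    """Flag a read whose mismatches are both numerous and mostly A-to-G."""
    a2g = len(a_to_g_mismatches(reference, read))
    total = sum(1 for r, q in zip(reference.upper(), read.upper()) if r != q)
    if total == 0:
        return False  # a perfectly matching read carries no editing signal
    return a2g / len(read) >= min_fraction and a2g / total >= min_purity


if __name__ == "__main__":
    ref = "ACGTAAGCTAAGGCTAACGT"
    read = "GCGTGAGCTGAGGCTGGCGT"
    print(a_to_g_mismatches(ref, read))
    print(is_hyper_edited(ref, read))
```

A read like the one in the example carries so many mismatches that a standard aligner with a low mismatch allowance would likely leave it unmapped, which is why such reads are absent from screens that consider only mapped reads.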
Bibliographical note
Funding Information:
We thank Nurit Paz-Yaakov for help with experimental procedures, Lily Bazak for assistance with the Alu analysis, Shahar Alon for assistance with the miRNA data set and Oliver Keller for early work on the data. We thank Eli Eisenberg and Jin Billy Li for a critical reading of an early version of the manuscript. We thank the NICHD Brain and Tissue Bank for Developmental Disorders at the University of Maryland, Baltimore, MD, USA for providing a human brain tissue sample. S.C. thanks the Human Frontier Science Program for financial support. This work was supported by the Legacy Heritage Biomedical Science Partnership, Israel Science Foundation (grant no. 1466/10), the European Research Council (grant no. 311257) and the I-CORE Program of the Planning and Budgeting Committee in Israel, and the Israel Science Foundation (grants no. 41/11 and 1796/12).