Many human introns carry out a function, in the sense that they are critical to maintain normal cellular activity. Their identification is fundamental to understanding cellular processes and disease. However, being noncoding elements, such functional introns are poorly predicted based on traditional approaches of sequence and structure conservation. Here, we generated a dataset of human functional introns that carry out different types of functions. We showed that functional introns share common characteristics, such as higher positional conservation along the coding sequence and reduced loss rates, regardless of their specific function. A unique property of the data is that if an intron is unknown to be functional, it still does not mean that it is indeed non-functional. We developed a probabilistic framework that explicitly accounts for this unique property, and predicts which specific human introns are functional. We show that we successfully predict function even when the algorithm is trained on introns with a different type of function. This ability has many implications in studying regulatory networks, gene regulation, the effect of mutations outside exons on human disease, and on our general understanding of intron evolution and their functional exaptation in mammals.
Bibliographical notePublisher Copyright:
© 2017 The Author(s).