A gamma mixture model better accounts for among site rate heterogeneity

Itay Mayrose, Nir Friedman, Tal Pupko*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

104 Scopus citations

Abstract

Motivation: Variation of substitution rates across nucleotide and amino acid sites has long been recognized as a characteristic of molecular sequence evolution. Evolutionary models that account for this rate heterogeneity usually use a gamma density function to model the rate distribution across sites. This density function, however, may not fit real datasets, especially when there is a multimodal distribution of rates. Here, we present a novel evolutionary model based on a mixture of gamma density functions. This model better describes the among-site rate variation characteristic of molecular sequence evolution. The use of this model may improve the accuracy of various phylogenetic methods, such as reconstructing phylogenetic trees, dating divergence events, inferring ancestral sequences and detecting conserved sites in proteins.

Results: Using diverse sets of protein sequences we show that the gamma mixture model better describes the stochastic process underlying protein evolution. We show that the proposed gamma mixture model fits protein datasets significantly better than the single-gamma model in 9 out of 10 datasets tested. We further show that using the gamma mixture model improves the accuracy of model-based prediction of conserved residues in proteins.
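The contrast between a single gamma density and a mixture of gamma densities can be illustrated with a small sketch. This is not the authors' implementation: the shape/rate parameterization, the component weights, the specific parameter values, and the helper names (`gamma_pdf`, `gamma_mixture_pdf`, `mixture_mean`, `count_modes`) are all assumptions chosen for illustration. The example only shows that a single gamma density is unimodal, whereas a two-component mixture with the same mean rate can be bimodal.

```python
import math


def gamma_pdf(r, shape, rate):
    """Gamma density with the given shape and rate, evaluated at rate r."""
    if r <= 0:
        return 0.0
    # Work in log space to avoid overflow for large shape values.
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1) * math.log(r) - rate * r)
    return math.exp(log_pdf)


def gamma_mixture_pdf(r, components):
    """Weighted sum of gamma densities.

    components: list of (weight, shape, rate); weights are assumed to sum to 1.
    """
    return sum(w * gamma_pdf(r, a, b) for w, a, b in components)


def mixture_mean(components):
    """Mean of the mixture: weighted sum of component means (shape / rate)."""
    return sum(w * a / b for w, a, b in components)


def count_modes(components, lo=0.01, hi=6.0, steps=600):
    """Count interior local maxima of the mixture density on a regular grid."""
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    ys = [gamma_mixture_pdf(x, components) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


# Hypothetical parameter values, chosen only so that both models have mean 1.
single = [(1.0, 2.0, 2.0)]                          # one gamma, mean 1
mixture = [(0.5, 20.0, 100.0), (0.5, 36.0, 20.0)]   # slow sites + fast sites

print("single-gamma modes:", count_modes(single))
print("mixture modes:     ", count_modes(mixture))
print("mixture mean rate: ", mixture_mean(mixture))
```

In this toy setting, one component stands in for a class of slowly evolving sites and the other for a class of rapidly evolving sites. A single gamma density cannot place probability mass around both rates at once.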

Original language: English
Pages (from-to): ii151-ii158
Journal: Bioinformatics
Volume: 21
Issue number: SUPPL. 2
State: Published - Sep 2005

Bibliographical note

Funding Information:
We thank D. Burstein, A. Doron-Faigenboim, M. Ninio, E. Privman, N. Rubinstein and A. Stern for their insightful comments. This work is supported in part by ISF grant number 1208/04 to N.F. and T.P. T.P. was also supported by a grant in Complexity Science from the Yeshaia Horvitz Association.
