Implicit biology in peptide spectral libraries

Manor Askenazi*, Michal Linial

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

3 Scopus citations

Abstract

Mass spectral libraries are collections of mass spectra curated specifically to facilitate the identification of small molecules, metabolites, and short peptides. One of the most comprehensive peptide spectral libraries is curated by NIST and contains upward of half a million annotated spectra, dominated by human and model organisms including budding yeast and mouse. Although the library was motivated primarily by the technological goal of increasing sensitivity and specificity in spectral identification, we have found that it constitutes a surprisingly rich source of biological knowledge. In this Article, we show that data mining of these published libraries under strict empirical thresholds reveals many characteristics of protein biology. In particular, we demonstrate that the size and increasingly comprehensive nature of these libraries, generated from whole-proteome digests, enable inference not only from the presence but, crucially, also from the absence of spectra for individual peptides. We illustrate implicit biological trends in which a significant absence of spectra is accounted for by complex post-translational modifications and overlooked proteolytic sites. We conclude that many subtle biological signatures, such as genetic variants, regulated proteolysis, and post-translational modifications, are exposed through the systematic mining of spectral collections originally compiled as general-purpose, technology-oriented resources.

Original language: English
Pages (from-to): 7919-7925
Number of pages: 7
Journal: Analytical Chemistry
Volume: 84
Issue number: 18
State: Published - 18 Sep 2012
