When less is more: Improving classification of protein families with a minimal set of global features

Roy Varshavsky*, Menachem Fromer, Amit Man, Michal Linial

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Sequence-derived structural and physicochemical features have been used to develop models for predicting protein families. Here, we test the hypothesis that high-level functional groups of proteins may be classified by a very small set of global features directly extracted from sequence alone. To this end, we represent each protein using a small number of normalized global sequence features and classify the proteins into functional groups using support vector machines (SVM). Furthermore, the contribution of specific subsets of features to the classification quality is thoroughly investigated. The representation of proteins using global features provides effective information for protein family classification, with results comparable to those obtained by representation using local sequence alignment scores. Moreover, a combination of global and local sequence features significantly improves classification performance.
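
The representation step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the features used, so the three features here (sequence length, hydrophobic fraction, charged fraction), the residue groupings, the helper names `global_features` and `zscore_columns`, and the toy sequences are all assumptions chosen for illustration, not the authors' feature set.

```python
from statistics import mean, pstdev

# Hypothetical residue groupings, chosen only for illustration.
HYDROPHOBIC = set("AVILMFWC")
CHARGED = set("DEKR")


def global_features(seq):
    """Return a small vector of whole-sequence (global) features."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq) / n
    charged = sum(aa in CHARGED for aa in seq) / n
    return [float(n), hydrophobic, charged]


def zscore_columns(rows):
    """Normalize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


# Toy sequences, not real proteins.
toy_sequences = ["MKTLLVAAGL", "DEDKRRKEEDSS", "AVILMFWCAVILGG"]
raw = [global_features(s) for s in toy_sequences]
normalized = zscore_columns(raw)
```

Vectors such as `normalized` could then be passed, together with functional-group labels, to any SVM implementation (for example `sklearn.svm.SVC`, if scikit-learn is available) to train a classifier.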

Original language: English
Title of host publication: Algorithms in Bioinformatics - 7th International Workshop, WABI 2007, Proceedings
Publisher: Springer Verlag
Pages: 12-24
Number of pages: 13
ISBN (Print): 9783540741251
State: Published - 2007
Event: 7th International Workshop on Algorithms in Bioinformatics, WABI 2007 - Philadelphia, PA, United States
Duration: 8 Sep 2007 to 9 Sep 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4645 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Workshop on Algorithms in Bioinformatics, WABI 2007
Country/Territory: United States
City: Philadelphia, PA
Period: 8/09/07 to 9/09/07

Keywords

  • Feature selection
  • Olfactory receptor
  • Porins protein family
  • Support vector machines (SVM)
