Modeling dependencies in protein-DNA binding sites

Yoseph Barash, Gal Elidan, Nir Friedman*, Tommy Kaplan

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

171 Scopus citations

Abstract

The availability of whole genome sequences and high-throughput genomic assays opens the door for in silico analysis of transcription regulation. This includes methods for discovering and characterizing the binding sites of DNA-binding proteins, such as transcription factors. A common representation of transcription factor binding sites is a position specific score matrix (PSSM). This representation makes the strong assumption that binding site positions are independent of each other. In this work, we explore Bayesian network representations of binding sites that provide different tradeoffs between complexity (number of parameters) and the richness of dependencies between positions. We develop the formal machinery for learning such models from data and for estimating the statistical significance of putative binding sites. We then evaluate the ramifications of these richer representations in characterizing binding site motifs and predicting their genomic locations. We show that these richer representations improve over the PSSM model in both tasks.
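
The independence assumption mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the toy sites, the pseudocount, the function names, and the choice of a simple chain structure (each position depends only on the previous one) are all assumptions made for illustration. It contrasts a PSSM, which estimates each position separately, with one of the simplest possible dependency models.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def fit_pssm(sites, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities, one position at a time."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pssm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pssm

def pssm_log_prob(pssm, seq):
    """Log-probability under the PSSM: a sum of independent per-position terms."""
    return sum(math.log(pssm[i][b]) for i, b in enumerate(seq))

def fit_chain(sites, pseudocount=1.0):
    """Illustrative chain-structured network: position i depends on position i-1."""
    first = fit_pssm(sites, pseudocount)[0]
    cond = []
    for i in range(1, len(sites[0])):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        parent_counts = Counter(s[i - 1] for s in sites)
        table = {}
        for a in ALPHABET:
            total = parent_counts[a] + pseudocount * len(ALPHABET)
            table[a] = {b: (pair_counts[(a, b)] + pseudocount) / total
                        for b in ALPHABET}
        cond.append(table)
    return first, cond

def chain_log_prob(model, seq):
    """Log-probability under the chain model: P(x1) * prod P(x_i | x_{i-1})."""
    first, cond = model
    log_p = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        log_p += math.log(cond[i - 1][seq[i - 1]][seq[i]])
    return log_p

# Hypothetical aligned sites in which positions 0 and 1 always agree.
sites = ["AAG", "AAG", "CCG", "CCG"]
pssm = fit_pssm(sites)
chain = fit_chain(sites)
```

On this toy data the PSSM assigns the same score to `"AAG"` (seen) and `"ACG"` (never seen), because it cannot represent the agreement between the first two positions, while the chain model scores `"AAG"` higher. The cost is more parameters: each PSSM column has 3 free parameters, whereas each conditional table in the chain has 4 × 3 = 12, which is the kind of complexity tradeoff the abstract refers to.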

Original language: American English
Pages: 28-37
Number of pages: 10
State: Published - 2003
Event: Seventh Annual International Conference on Research in Computational Molecular Biology - Berlin, Germany
Duration: 10 Apr 2003 → 13 Apr 2003

Conference

Conference: Seventh Annual International Conference on Research in Computational Molecular Biology
Country/Territory: Germany
City: Berlin
Period: 10/04/03 → 13/04/03

Keywords

  • Bayesian networks
  • DNA sequence motifs
  • Transcription factors binding sites
