TY - JOUR
T1 - Deciphering eukaryotic gene-regulatory logic with 100 million random promoters
AU - de Boer, Carl G.
AU - Vaishnav, Eeshit Dhaval
AU - Sadeh, Ronen
AU - Abeyta, Esteban Luis
AU - Friedman, Nir
AU - Regev, Aviv
N1 - Publisher Copyright: © 2019, The Author(s), under exclusive licence to Springer Nature America, Inc.
PY - 2020/1/1
Y1 - 2020/1/1
AB - How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.
UR - http://www.scopus.com/inward/record.url?scp=85076087934&partnerID=8YFLogxK
U2 - 10.1038/s41587-019-0315-8
DO - 10.1038/s41587-019-0315-8
M3 - Article
C2 - 31792407
AN - SCOPUS:85076087934
SN - 1087-0156
VL - 38
SP - 56
EP - 65
JO - Nature Biotechnology
JF - Nature Biotechnology
IS - 1
ER -