TY - JOUR
T1 - Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S - a novel motif scan algorithm with optional secondary structure constraints
AU - Niv, Masha Y.
AU - Skrabanek, Lucy
AU - Roberts, Richard J.
AU - Scheraga, Harold A.
AU - Weinstein, Harel
PY - 2008/5/1
Y1 - 2008/5/1
N2 - Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
AB - Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
KW - Physicochemical properties
KW - Protein motif
KW - Regular expression
KW - Restriction endonucleases
KW - Secondary structure
KW - Specificity-determining positions
UR - http://www.scopus.com/inward/record.url?scp=41149161800&partnerID=8YFLogxK
U2 - 10.1002/prot.21777
DO - 10.1002/prot.21777
M3 - Article
C2 - 17972284
AN - SCOPUS:41149161800
SN - 0887-3585
VL - 71
SP - 631
EP - 640
JO - Proteins: Structure, Function, and Bioinformatics
JF - Proteins: Structure, Function, and Bioinformatics
IS - 2
ER -