Motivation: Secondary structures are key descriptors of a protein's fold and topology. In recent years, they have underpinned computationally intensive tasks such as searching for structural homologues, fold prediction and protein design. Their popularity stems from the appealing regularity of their geometric and chemical patterns. However, the definition of secondary structures is inherently subjective. An unsupervised, de novo discovery of these structures would shed light on their nature and improve the way they are used in structural bioinformatics algorithms.

Methods: We developed a new method for the unsupervised partitioning of undirected graphs, based on patterns of small recurring network motifs. Our input was the network of all hydrogen bonds (H-bonds) and covalent interactions of protein backbones. The method can also be applied to other biological and non-biological networks.

Results: In a fully unsupervised manner, and without assuming any explicit prior knowledge, we rediscovered conventional α-helices, parallel β-sheets, anti-parallel β-sheets and loops, as well as various non-conventional hybrid structures. The relation between connectivity and crystallographic temperature factors establishes the existence of novel secondary structures.
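The abstract does not spell out the partitioning algorithm. As a rough illustration of the general idea of describing each node of a backbone network by the small motifs it takes part in, the sketch below builds a toy residue-level graph (covalent chain edges (i, i+1) plus a given list of H-bond pairs), counts short simple cycles through each node as a stand-in for network motifs, and groups nodes whose counts are identical. This is a hypothetical sketch under stated assumptions, not the authors' method. The function names, the residue-level granularity, the use of cycles as motifs and the grouping rule are all illustrative choices. The actual input was the network of all backbone H-bonds and covalent interactions derived from protein structures.

```python
from collections import defaultdict


def build_backbone_graph(n_residues, hbonds):
    """Hypothetical toy network: covalent chain edges (i, i+1) plus the given
    H-bond pairs, stored as undirected adjacency sets keyed by residue index."""
    adj = defaultdict(set)
    for i in range(n_residues - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    for u, v in hbonds:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def cycles_through(adj, start, max_len):
    """Count simple cycles of length 3..max_len that pass through `start`.
    Depth-first search finds each cycle once per direction, so counts are halved."""
    counts = defaultdict(int)

    def dfs(node, path, visited):
        for nxt in adj[node]:
            if nxt == start and len(path) >= 3:
                counts[len(path)] += 1  # closed a cycle with len(path) edges
            elif nxt not in visited and len(path) < max_len:
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(start, [start], {start})
    return {k: c // 2 for k, c in sorted(counts.items())}


def motif_signature(adj, node, max_len=5):
    """Assumed node descriptor: counts of cycles of length 3, 4, ..., max_len."""
    counts = cycles_through(adj, node, max_len)
    return tuple(counts.get(k, 0) for k in range(3, max_len + 1))


def partition_by_signature(adj, max_len=5):
    """Crude unsupervised partition: nodes with identical signatures share a group."""
    groups = defaultdict(list)
    for node in sorted(adj):
        groups[motif_signature(adj, node, max_len)].append(node)
    return dict(groups)


if __name__ == "__main__":
    # Two consecutive helix-like H-bonds, (i, i+4), on a six-residue chain.
    adj = build_backbone_graph(6, [(0, 4), (1, 5)])
    print(partition_by_signature(adj))
    # {(0, 1, 1): [0, 5], (0, 1, 2): [1, 4], (0, 0, 2): [2, 3]}
```

In this toy case, the two overlapping five-membered H-bond rings also create a four-membered cycle (0, 4, 5, 1). Nodes are therefore separated by where they sit within the overlapping rings. Grouping by exact signature is a deliberately simple substitute for a real clustering step.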
Funding information:
This work was partially funded by the Israel Ministry of Science and Technology. O.R. thanks Yossi Shaul for fruitful discussions about the problem of assigning secondary structures.