Overview
Computational protein folding is the study of how to predict the three-dimensional structure of a protein given only its primary structure and a database of proteins whose primary, secondary, and tertiary structures are known.
From transporting oxygen to our vital organs to digesting food, proteins play hundreds of crucial roles in our bodies. Proteins are composed of chains of amino acids, one bonded to the next. This linear sequence is called the primary structure of a protein. Complex interactions among the amino acids in the chain give rise to its tertiary structure: a precise three-dimensional shape that allows it to play specific roles in the chemical reactions occurring in the body.
Within the tertiary structure of many proteins, two arrangements of amino acids appear very commonly: α-helices and β-sheets. Tertiary structure is so detailed that it frequently has to be approximated. When we cannot usefully describe the tertiary structure of a protein, we often fall back on its secondary structure. Formally, the secondary structure of a protein can be thought of as a mapping from each amino acid in its primary structure to one of three secondary-structure classes: α-helix, β-sheet, or loop. A loop is not a pattern in its own right but simply the fallback label when neither of the other two arrangements is present.
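As a rough illustration of this mapping, the sketch below pairs each residue of a primary structure with one of the three classes. The single-letter codes (H, E, L), the function names, and the example sequence with its labels are all assumptions made up for illustration; they are not taken from any real protein or from the methods discussed here.

```python
from enum import Enum
from itertools import groupby
from typing import List, Tuple


class SS(Enum):
    """The three secondary-structure classes (letter codes are an assumption)."""
    HELIX = "H"   # alpha-helix
    SHEET = "E"   # beta-sheet
    LOOP = "L"    # fallback: neither helix nor sheet


def annotate(primary: str, labels: str) -> List[Tuple[str, SS]]:
    """Pair every amino acid with its secondary-structure class."""
    if len(primary) != len(labels):
        raise ValueError("need exactly one label per amino acid")
    return [(aa, SS(code)) for aa, code in zip(primary, labels)]


def segments(labels: str) -> List[Tuple[SS, int, int]]:
    """Collapse per-residue labels into runs of (class, start, end_exclusive)."""
    runs = []
    position = 0
    for code, group in groupby(labels):
        length = len(list(group))
        runs.append((SS(code), position, position + length))
        position += length
    return runs


# Hypothetical 10-residue sequence and labels, invented for this example.
primary = "MKTAYIAKQR"
labels = "LLHHHHHEEL"

mapping = annotate(primary, labels)
runs = segments(labels)
```

Here `mapping` holds one (amino acid, class) pair per residue, while `runs` groups consecutive residues that share a class, which is often the more readable view of the same information.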
Our comps project focused on the data-mining aspects of protein folding.
Namely, given a large set of proteins whose primary and tertiary structures are known, how can we predict the tertiary structure of a protein for which only the primary structure is known?
We investigated work by Karplus et al. at the University of California, Santa Cruz, who have published several papers detailing data-mining approaches to the protein folding problem. We also looked at Rost and Sander's artificial-neural-network-based approach to secondary structure prediction, and at Martin, Gibrat, and Rodolphe's use of hidden Markov models to predict secondary structure.