Predicting Tertiary Structure
The Tertiary Structure Problem:
Given a target protein t whose primary structure is known and a set of proteins with known primary and tertiary structures, how can we determine the tertiary structure of t?
As it turns out, predicting the tertiary structure of an unknown protein is quite hard. The exact position of each amino acid is determined not only by interactions between molecules in the protein, but also by interactions with the protein's environment. There exist algorithms that attempt to answer this question from first principles (the basic chemical principles that govern the amino acids), but for this project we chose to focus on data mining algorithms.
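To make the data mining framing concrete, the problem can be read as a lookup: given the target's sequence, find the known protein whose sequence is most similar and use its structure as a starting template. The following is a minimal sketch under that assumption. The names KnownProtein, similarity and best_template, the toy sequences, and the .pdb file names are all hypothetical, and difflib's string ratio is only a crude stand-in for a real alignment score.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class KnownProtein:
    name: str
    sequence: str   # primary structure, one letter per amino acid
    structure: str  # placeholder for tertiary-structure data (hypothetical file name)


def similarity(a: str, b: str) -> float:
    """Crude sequence similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_template(target: str, library: List[KnownProtein]) -> KnownProtein:
    """Return the known protein whose sequence is most similar to the target."""
    return max(library, key=lambda p: similarity(target, p.sequence))


# Toy data, invented purely for illustration.
library = [
    KnownProtein("toy_a", "MKTAYIAKQR", "toy_a.pdb"),
    KnownProtein("toy_b", "GSHMLEDPVV", "toy_b.pdb"),
]
template = best_template("MKTAYLAKQR", library)
print(template.name, template.structure)
```

A lookup like this only works when a close relative of the target is already in the library, which is why comparing against whole families of proteins becomes attractive.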
Research led us to work by Karplus et al. that uses hidden Markov models and evolutionary information to inform tertiary structure prediction. The paper "Predicting Protein Structure using only Sequence Information" describes an algorithm that uses SAM (the Sequence Alignment and Modeling system) to align and compare proteins not only to each other but also to families of proteins. The authors go on to select from their best matches manually; we opted to focus exclusively on the computational aspects of the problem.
The Karplus paper takes advantage of the evolutionary relationships between proteins. As organisms evolve, there is significant pressure on the tertiary structure of a given protein to remain similar, if not the same, because its tertiary structure allows it to participate in critical reactions. There is, however, far less pressure to conserve primary structure: mutations in primary structure can be tolerated as long as the tertiary structure remains functional.
This evolutionary relationship between proteins means that related organisms (humans and chimpanzees, for example) share proteins with similar tertiary structures that perform similar functions but are likely to have slightly different primary structures. By using SAM to compare the target protein to sets of related proteins, Karplus et al. are able to identify the structure of more complex proteins.
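The idea of comparing a target to a family rather than to a single protein can be illustrated with a position-specific profile. The sketch below is a deliberate simplification: it is not SAM, and it is not a full hidden Markov model (it has no insert or delete states and assumes gap-free sequences of equal length). The function names, the pseudocount, the uniform background, and the toy families are assumptions made for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def build_profile(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column amino-acid probabilities from equal-length, gap-free sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all family sequences must have the same length")
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        # The pseudocount keeps unseen residues from getting probability zero.
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds_score(target: str, profile: List[Dict[str, float]]) -> float:
    """Sum over columns of log(P(residue | column) / uniform background)."""
    if len(target) != len(profile):
        raise ValueError("target and profile must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    # Non-standard residue letters in the target raise KeyError in this sketch.
    return sum(math.log(col[aa] / background) for aa, col in zip(target, profile))


# Toy families, invented purely for illustration.
family_a = ["MKTAY", "MKSAY", "MRTAY"]
family_b = ["GSHLL", "GSHIL", "GTHLL"]
target = "MKTAF"

score_a = log_odds_score(target, build_profile(family_a))
score_b = log_odds_score(target, build_profile(family_b))
print("family_a" if score_a > score_b else "family_b")
```

Because the profile pools evidence from every member of the family, the target can score well even though it differs from each individual member at some position, which is the intuition behind matching against families.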
Our implementation of this work is in sam.py in the prediction directory.