Bioinformatics is a fast-growing interdisciplinary science that seeks to uncover knowledge from biological data using computational techniques. Genetic sequencing methods have improved so rapidly over the last decade that scientists now have a plethora of data that is often difficult to interpret or to use in practical applications. This is where computer scientists can have a huge impact by helping to answer many important questions in biology.
One important application of bioinformatic techniques is to use genetic data to reconstruct accurate phylogenetic trees. A phylogenetic tree is a graphical representation showing the inferred evolutionary relationships among species or other entities that share a common ancestor. The only figure in Darwin's On the Origin of Species is a phylogenetic tree, and these trees remain highly important for understanding many aspects of evolutionary biology today. Phylogenetic trees can be used to understand important evolutionary events in a population's past, such as lateral gene transfer or gene duplication events, and they can be used to find conserved genome sites and predict gene functions. Reconstruction of the phylogenetic tree of the influenza virus is what made it possible to predict future strains, and these predictions are used to prepare an effective flu shot each year.
In this project, you will explore methods to reconstruct phylogenetic trees from various types of data. Several algorithms are used for tree construction, including character-based methods such as maximum parsimony, maximum likelihood, and compatibility, and distance-based methods such as additive trees, neighbor-joining, and others. Evolutionary algorithms have also been used to construct trees with some success.
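To make the distance-based approach concrete, the following is a minimal sketch of neighbor-joining in Python. It is an illustration under stated assumptions, not the implementation expected for the project: the function name `neighbor_joining`, the input format (a dictionary keyed by pairs of taxon labels), the internal node labels `node0`, `node1`, ..., and the small five-taxon distance matrix are all hypothetical choices made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor-joining.

    names: list of taxon labels (strings).
    dist:  dict mapping a pair (a, b) of labels to their distance.
    Returns a list of (node, node, branch_length) edges.
    Internal nodes get the hypothetical labels node0, node1, ...
    """
    # Store distances under unordered keys so (a, b) and (b, a) coincide.
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums: total distance from each node to all the others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}

        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        def q(pair):
            return (n - 2) * d[frozenset(pair)] - r[pair[0]] - r[pair[1]]

        i, j = min(combinations(nodes, 2), key=q)
        dij = d[frozenset((i, j))]

        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges.append((i, u, li))
        edges.append((j, u, lj))

        # Distances from the new node to every remaining node.
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((u, k))] = 0.5 * (
                d[frozenset((i, k))] + d[frozenset((j, k))] - dij
            )
        nodes = rest + [u]

    # Join the last two nodes with a single edge.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hypothetical example data: an additive distance matrix on five taxa.
example_names = ["a", "b", "c", "d", "e"]
example_dist = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}
example_edges = neighbor_joining(example_names, example_dist)
for left, right, length in example_edges:
    print(f"{left} -- {right}: {length}")
```

An unrooted binary tree on n taxa has 2n - 3 edges, so the five-taxon example yields seven. Because the example matrix is additive, the path lengths in the resulting tree reproduce the input distances; with real sequence data this generally holds only approximately.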
You will implement several of these methods and then compare and analyze the results. This in itself leads to more interesting problems, such as how to decide which tree is best among several options, or how to compare trees at all, even when the correct tree is known a priori. Here too there are several techniques to explore, including similarity measures, distance measures, and various algorithms that determine the consensus tree.
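As one possible example of a distance measure between trees, the sketch below computes the Robinson-Foulds distance, which counts the nontrivial bipartitions (splits) of the taxa that appear in one tree but not the other. The choice of this particular metric is an assumption made for illustration; the project does not prescribe it. The nested-tuple tree encoding and the function names `leaves`, `splits`, and `robinson_foulds` are likewise hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels of a tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the nontrivial bipartitions induced by the tree's internal edges.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the result does not depend on where the
    nested-tuple encoding happens to be rooted.
    """
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - clade if ref in clade else clade
        # Skip trivial splits that separate zero or one taxon from the rest.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree1, tree2):
    """Count the splits present in exactly one of the two trees."""
    if leaves(tree1) != leaves(tree2):
        raise ValueError("trees must be defined on the same set of taxa")
    return len(splits(tree1) ^ splits(tree2))


# Hypothetical example trees on five taxa.
tree_x = ((("a", "b"), "c"), ("d", "e"))
tree_y = ((("a", "c"), "b"), ("d", "e"))
print(robinson_foulds(tree_x, tree_x))  # identical trees
print(robinson_foulds(tree_x, tree_y))  # trees that differ in one split each
```

A distance of zero means the two trees have the same unrooted topology. This measure ignores branch lengths, so it is only one of several ways the comparison could be made.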
Further directions/extensions to the project may also include
You will work with a librarian in the fall to do a thorough literature search for currently used algorithms/techniques in phylogenetic reconstruction, but here are a few starting resources: