2023–24 Projects:
Advisor: Layla Oesper
Many human diseases have genetic origins. A gene is a sequence of DNA that codes for a protein that is associated with a particular biological function (you can think of genes and proteins as having a one-to-one relationship). A gene in which a mutation, or aberration, disrupts normal function is called a disease gene. While some human diseases like cystic fibrosis or Huntington’s disease are caused by a mutation to a single gene, many other diseases such as cancer or diabetes are complex diseases and result from mutations to many genes. Identifying the genes associated with a particular disease is an essential step towards the development of treatments or diagnostic tests for these diseases.
Recent advances in DNA sequencing technologies have allowed for unprecedented analysis of the genetic origins of such complex diseases. However, the human genome has ~20,000 genes, so prioritization of candidate disease genes before experimental testing is essential. One recent approach to prioritizing such disease genes utilizes a particular type of biological data encoded in a large network or graph called a protein-protein interaction (PPI) network. In these networks each vertex represents a protein (or the gene that codes for that protein) and each edge represents a pair of proteins that have been shown to have some form of biological interaction. The idea is that the topology of a large PPI network and the location of the known disease genes in that network may be useful for identifying and ranking other potential disease genes. For example, a naive approach may be to rank genes based on how many known disease genes they have as direct neighbors.
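To make the naive neighbor-counting idea concrete, here is a minimal Python sketch. It assumes the PPI network is given as a list of undirected edges between gene names; the function name rank_by_disease_neighbors, the toy edges, and the single-letter gene labels are all hypothetical and chosen only for illustration. This is one possible reading of the naive approach, not a recommended method for the project.

```python
from collections import defaultdict


def rank_by_disease_neighbors(edges, known_disease_genes):
    """Rank candidate genes by their number of known disease gene neighbors.

    edges: iterable of (gene_u, gene_v) pairs, treated as undirected.
    known_disease_genes: iterable of gene names already linked to the disease.
    Returns a list of (gene, score) pairs, highest score first.
    """
    known = set(known_disease_genes)

    # Build an adjacency map: gene -> set of directly interacting genes.
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Score only candidates, i.e. genes not already known to be disease genes.
    scores = {
        gene: len(adjacent & known)
        for gene, adjacent in neighbors.items()
        if gene not in known
    }

    # Sort by descending score; break ties by name so the output is reproducible.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy network with hypothetical gene labels (not real data).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
toy_known = {"A", "D"}

ranking = rank_by_disease_neighbors(toy_edges, toy_known)
print(ranking)  # [('C', 2), ('B', 1), ('E', 1)]
```

In this toy example gene C ranks first because it interacts directly with both known disease genes, while B and E each touch only one. A score like this uses only immediate neighbors and tends to favor highly connected proteins, which is one reason to look at methods that use more of the network's topology.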
(Left) Yeast Protein-Protein Interaction Network, (Right) Human Protein-Protein Interaction Network.
In this project you will investigate graph-based methods for disease gene prioritization using large-scale protein-protein interaction networks. In particular you will:
Enthusiasm for working on biologically relevant problems is encouraged, but previous biology experience is NOT required for this project. Experience working with large datasets will be useful, but not required. Other courses that could be useful for this project include algorithms, linear algebra, AI, data mining, and computational biology.
Below are a few papers about existing work in disease gene prioritization. These are only intended to give you a minimal starting point for your literature search; they are certainly neither the only nor necessarily the best sources for ideas. You will be finding and reading many additional papers!
Tuesday/Thursday 1:15pm - 2:15pm for Fall and Winter