Network Analysis Algorithms for Disease Gene Prioritization
Biological research has advanced considerably over the last 50 years. We now have vast amounts of data but lack good ways to use it to its full potential. One form this data takes is the protein-protein interaction network, a large graph in which nodes are proteins and each edge represents an interaction between two proteins. Our research examines how graph analysis algorithms can help biologists study genetic diseases more efficiently.
We implemented three algorithms, PageRank, Random Walk with Restart, and Diffusion Kernel, and evaluated their performance on different diseases to predict previously unstudied candidate genes that might be related to these diseases. These algorithms were originally developed for purposes far removed from computational biology: PageRank was developed by Google's founders to rank webpages, Random Walk with Restart has been applied to image segmentation, and the Diffusion Kernel derives from models of heat flow. Nevertheless, these general-purpose algorithms can be applied to the biological sciences. Using a large protein-protein interaction network with 20,000 nodes and 12 million interactions, together with sets of known disease genes, we predicted candidate genes for further research on ischemic stroke, endometriosis, and lymphoma. In our presentation, we will discuss the biological data we used, the performance of the algorithms and their sensitivity to graph shape, and the novel genes we predicted.
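To make the idea of propagating scores from known disease genes concrete, here is a minimal sketch of Random Walk with Restart on a toy five-node network. It is an illustration under stated assumptions, not the implementation used in this work: the function name, the restart probability of 0.3, the convergence tolerance, and the toy adjacency matrix are all hypothetical choices made for the example.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart_prob=0.3,
                             tol=1e-10, max_iter=1000):
    """Score every node by its proximity to a set of seed nodes.

    adjacency    : square symmetric array; entry (i, j) > 0 if i and j interact
    seeds        : indices of known disease genes (assumed non-empty)
    restart_prob : chance of jumping back to a seed at each step (assumed value)
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]

    # Column-normalize so each column is a probability distribution
    # over the neighbors of that node. Isolated nodes keep a zero column.
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    transition = adjacency / col_sums

    # The walker starts (and restarts) uniformly over the seed nodes.
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart_prob) * (transition @ p) + restart_prob * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p


# Toy network: a simple chain 0 - 1 - 2 - 3 - 4 (hypothetical, not real data).
toy_adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])

seed_nodes = [0]  # pretend node 0 is the only known disease gene
scores = random_walk_with_restart(toy_adjacency, seed_nodes)

# Rank the non-seed nodes from most to least promising candidate.
candidates = [i for i in np.argsort(-scores) if i not in seed_nodes]
print(candidates)
```

In this toy chain, nodes closer to the seed receive higher scores, which is the intuition behind prioritizing candidate genes by network proximity to known disease genes. A network of the size described above would likely call for a sparse matrix representation rather than the dense array used here.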