Carleton Comps Project: Search Engine

Project: Building a web search engine

Advisor: Dave Musicant

Meeting time: TTh 3:10-4:55

Final Results

I. Background

Google has made searching the web a snap. If you want to index your own intranet, Google will do so, but for a very large fee. A variety of open-source web search engines are available (ht://Dig, WAIS, etc.), though each seems to be lacking in one way or another. In this project we will create a new web search engine and test it on the Carleton intranet.

II. The Project

Here is a list of the concepts and technologies that will be necessary.

Web and database programming. PHP and MySQL will be used to create a dynamic web interface that is functional and attractive.
Java. A considerable amount of preprocessing must be done ahead of time so that the search engine can respond immediately to a user's request. Doing this preprocessing in Java will make our tools cross-platform.
Algorithms. A number of techniques have been published for ranking web pages based not just on the keywords being searched for, but also on the pages that link to them. We will try some of these techniques and compare the results.
Satisfaction survey. To determine how well our ranking techniques work relative to each other and to the standard Carleton search engine, we will survey users in a scientific manner and compare their responses.
Parsing. We will need to parse HTML both to find important words and to identify links. Additional functionality can be provided for parsing non-text file formats such as PDF and DOC.
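
To make the link-based ranking idea in the Algorithms item concrete, here is a minimal Java sketch of a power-iteration ranking in the spirit of PageRank (see References). It is an illustration under stated assumptions, not the project's implementation: the class name PageRankSketch, the adjacency-array input, the damping factor of 0.85, and the fixed iteration count are all choices made for this example.

```java
import java.util.Arrays;

public class PageRankSketch {

    /**
     * Computes a rank for each page by repeated redistribution of rank
     * along links. links[i] holds the indices of the pages that page i
     * links to. Pages with no outgoing links spread their rank evenly
     * over all pages (one common convention, assumed here).
     */
    public static double[] rank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // total rank held by pages with no out-links

            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    // Each page passes an equal share of its rank to every page it links to.
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }

            for (int i = 0; i < n; i++) {
                // Mix the link-based rank with a uniform "random jump" term.
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical three-page site: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
        int[][] links = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(rank(links, 0.85, 50)));
    }
}
```

In this toy graph, page 2 is linked to by both other pages and so ends up with the largest rank. A real search engine would combine a score like this with keyword relevance rather than use it alone.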

The final project will be a completed collection of tools that can be used to set up a local search engine.
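
As an example of one such tool, the link-identification step of the parsing component could be sketched in Java as follows. This is a deliberately simple illustration, not the project's parser: the class name LinkExtractorSketch and the regular expression are assumptions made for this example, and the pattern handles only quoted href attributes inside anchor tags. A production crawler would more likely use a full HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorSketch {

    // Matches href="..." or href='...' inside an <a ...> tag.
    // Group 1 is the quote character, group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the href values of all anchor tags, in document order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment used only for demonstration.
        String html = "<p>See <a href=\"about.html\">About</a> and "
                + "<A HREF='http://example.edu/news'>News</A>.</p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The extracted targets could then be resolved against the page's own URL and fed into the link graph that the ranking algorithms operate on.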

III. References

L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web.

S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Mining the link structure of the World Wide Web. IEEE Computer, August 1999.