Carleton Comps Project: Augmented Reality on a Mobile Phone

Final Results

Text Recognition in Images

Advisor: Jeff Ondich

I. Background

Optical Character Recognition (OCR) is the process of turning images of letters or words into digital text. The earliest OCR systems were developed shortly after the invention of the computer, and the subject has been an active area of research ever since. For the simplest and most common application of OCR technology—converting printed textual documents into text files—there are dozens of applications available. OCR of scanned documents has also made possible projects like Project Gutenberg, the conversion of medical records from paper to digital, and the move to on-line legal search engines like LexisNexis and Westlaw.

As powerful and important as document scanning may be, OCR can be extended to accomplish many other tasks that would otherwise require human intervention. For example:

[Image caption: Can we automatically identify these guys?]
Many new applications make use of geo-tagged objects (i.e., objects marked with their geographical positions) such as photographs, videos, blogs, and news stories. Last year's augmented reality comps project, for an example close to home, used geo-tagged Flickr photos to give people easy access to photos taken near their current locations. Many photographs are geo-tagged automatically by the cameras that take them, but many are not. If you can find techniques for automatically geo-tagging photographs, for example by reading street signs or other text visible in the image, you can enhance the usefulness of any application that puts such resources to use.

[Image caption: Where in the world, for example, are Jeff and David?]
Can you apply OCR to the computer's screen buffer to identify the word currently under the mouse pointer? If you can, then you can provide information about that word (e.g., a dictionary definition or part of a Wikipedia page) to the user via a tool-tip or a pop-up window.
With more communities placing surveillance cameras at busy intersections, there is (rightly or wrongly) a call for OCR to read license plates in the video feeds.
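
As a rough illustration of the screen-buffer example above, the sketch below assumes an OCR pass has already produced a list of recognized words, each with a bounding box in screen coordinates. The `WordBox` type and the `word_under_pointer` function are hypothetical names invented for this sketch, not part of any existing library; the OCR step itself is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WordBox:
    """Hypothetical OCR result: one recognized word and its pixel rectangle."""
    text: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        # The rectangle includes its left/top edges and excludes right/bottom.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def word_under_pointer(words: Sequence[WordBox], x: int, y: int) -> Optional[str]:
    """Return the text of the first word whose box contains (x, y), or None."""
    for word in words:
        if word.contains(x, y):
            return word.text
    return None


# Made-up boxes standing in for the output of an OCR pass over the screen.
words = [
    WordBox("optical", left=10, top=10, width=60, height=12),
    WordBox("character", left=75, top=10, width=80, height=12),
]
found = word_under_pointer(words, 80, 15)
missing = word_under_pointer(words, 5, 5)
```

Once the word is known, looking up a definition and showing the tool-tip are separate steps that do not depend on the OCR engine.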

II. The Project

For this project, you will develop an optical character recognition system, including an infrastructure for applying OCR to problems like those described above. During this project, you will:

Research existing OCR libraries and algorithms.
Design a uniform interface that will enable you to apply distinct OCR libraries to the same problems. This will enable you to compare the performance of your own OCR system to existing libraries (e.g. the Tesseract library).
Write an OCR library to, at minimum, recognize clearly printed black text on a white background.
Apply your OCR library to one or more of the applications listed above (or similar applications of your own devising).
Evaluate the effectiveness of your library compared to existing libraries.
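
One possible shape for the uniform interface and the evaluation step is sketched below. It is only a starting point under stated assumptions: `OCREngine`, `FixedOutputEngine`, and `compare_engines` are hypothetical names, the stand-in engine ignores its image and returns preset text, and character error rate (edit distance divided by the length of the expected text) is one metric among several you might choose. A real adapter for Tesseract or for your own library would implement `recognize` by calling that library.

```python
from abc import ABC, abstractmethod


class OCREngine(ABC):
    """Hypothetical common interface; each OCR library gets one adapter."""

    name = "unnamed"

    @abstractmethod
    def recognize(self, image):
        """Return the text found in `image` as a string."""


class FixedOutputEngine(OCREngine):
    """Stand-in for illustration: ignores the image and returns preset text."""

    def __init__(self, name, text):
        self.name = name
        self.text = text

    def recognize(self, image):
        return self.text


def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_error_rate(expected, actual):
    """Edit distance divided by the length of the expected text."""
    if not expected:
        return 0.0 if not actual else 1.0
    return edit_distance(expected, actual) / len(expected)


def compare_engines(engines, samples):
    """Mean character error rate per engine over (image, expected_text) pairs."""
    if not samples:
        raise ValueError("need at least one sample")
    results = {}
    for engine in engines:
        rates = [character_error_rate(expected, engine.recognize(image))
                 for image, expected in samples]
        results[engine.name] = sum(rates) / len(rates)
    return results


# The image is None here because the stand-in engines never look at it.
engines = [FixedOutputEngine("perfect", "hello"),
           FixedOutputEngine("noisy", "hallo")]
scores = compare_engines(engines, [(None, "hello")])
```

Because every engine is reached through the same `recognize` method, the comparison code never needs to know which library produced the text.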

III. References

In the fall, you'll work with a librarian to do a thorough literature search to find out what others have done in this area. In the meantime, here are a few relevant resources.

The Tesseract open source OCR project.
Optical Character Recognition, by Line Eikvil. This apparently self-published article from 1993 is a reasonably elementary introduction to OCR algorithms.
S. Impedovo, L. Ottaviano & S. Occhiegro, "Optical Character Recognition -- A Survey", International Journal of Pattern Recognition and Artificial Intelligence, Vol. 5, 1991, pp. 1-24.
Optical Character Recognition: An Illustrated Guide to the Frontier, by Stephen V. Rice, George Nagy, and Thomas A. Nartker. Kluwer Academic Publishers, Boston, 1999. This book is in the Carleton library.