Text Recognition in Images
Advisor: Jeff Ondich
I. Background
Optical Character Recognition (OCR) is the process of turning images of letters or words into digital text.
The earliest OCR systems were developed shortly after the invention
of the computer, and the subject has been an active area of research ever since.
For the simplest and most common application of OCR technology, converting
printed textual documents into text files, there are dozens of applications
available. OCR of scanned documents has also made possible projects
like Project Gutenberg, the conversion of medical records from paper to digital,
and the move to online legal search engines like LexisNexis and Westlaw.
As powerful and important as document scanning may be, OCR can be extended
to accomplish many other tasks that would otherwise require
human intervention. For example:
- Can we automatically identify these guys?
- Many new applications make use of geo-tagged objects (i.e., objects marked
with their geographical positions) such as photographs, videos, blogs, and
news stories. Last year's augmented reality comps project, for an example
close to home, made use of geo-tagged Flickr photos to give people easy
access to photos taken near their current locations.
Although many photographs are automatically geo-tagged by the cameras that
take them, many are not. If you can find techniques for automatically
geo-tagging photographs, you can enhance the usefulness of any application
that puts such resources to use.
Where in the world, for example, are Jeff and David?
- Can you apply OCR to the computer's screen buffer to identify
the word currently under the mouse pointer? If you can, then you can
provide information about that word (e.g. a dictionary definition
or part of a Wikipedia page) to the user via a tool-tip or
a pop-up window.
- With more communities placing surveillance cameras at busy intersections,
there is (rightly or wrongly) a call for OCR to read license plates
in the video feeds.
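
As a rough illustration of the mouse-pointer idea above: once an OCR pass has produced words with bounding boxes, finding the word under the pointer reduces to a point-in-rectangle test. The sketch below assumes the OCR step already exists and returns boxes in screen pixel coordinates; the `WordBox` type and the `word_under_pointer` function are hypothetical names invented for this example, not part of any existing library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WordBox:
    """A recognized word and its bounding box in screen pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        # The right and bottom edges are exclusive, so adjacent boxes never overlap.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def word_under_pointer(boxes: Sequence[WordBox], x: int, y: int) -> Optional[str]:
    """Return the text of the first box containing (x, y), or None if there is none."""
    for box in boxes:
        if box.contains(x, y):
            return box.text
    return None


# Example with made-up boxes; a real system would obtain these from an OCR engine.
boxes = [WordBox("hello", 10, 10, 50, 12), WordBox("world", 70, 10, 50, 12)]
print(word_under_pointer(boxes, 75, 15))
```

A linear scan is enough for the handful of words near the pointer; capturing the screen buffer and running OCR on it are the hard parts and are left out here.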
II. The Project
For this project, you will develop an optical character recognition system,
including an infrastructure for applying OCR to problems like those
described above. During this project, you will:
- Research existing OCR libraries and algorithms.
- Design a uniform interface that will enable you to apply distinct
OCR libraries to the same problems. This will enable you to compare the performance
of your own OCR system to that of existing libraries (e.g. the Tesseract library).
- Write an OCR library to, at minimum, recognize clearly printed black text on a
white background.
- Apply your OCR library to one or more of the applications listed above (or
similar applications of your own devising).
- Evaluate the effectiveness of your library compared to existing libraries.
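
One possible shape for the uniform interface and the evaluation step is sketched below. It is only a starting point under stated assumptions: the `OcrEngine` protocol, the `FixedOutputEngine` stand-in, and the choice of character error rate (edit distance divided by the length of the correct text) as the metric are all assumptions made for this example, not requirements of the project. A wrapper around your own library or around Tesseract would implement the same `recognize` method.

```python
from typing import Dict, Protocol


class OcrEngine(Protocol):
    """Hypothetical common interface: anything with recognize(path) -> text."""

    def recognize(self, image_path: str) -> str: ...


class FixedOutputEngine:
    """Stand-in engine that returns canned text, used here in place of a real library."""

    def __init__(self, outputs: Dict[str, str]) -> None:
        self._outputs = outputs

    def recognize(self, image_path: str) -> str:
        return self._outputs.get(image_path, "")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(engine: OcrEngine, truth: Dict[str, str]) -> float:
    """Total edit distance over total length of the correct text, across all images."""
    errors = 0
    total = 0
    for path, expected in truth.items():
        errors += edit_distance(engine.recognize(path), expected)
        total += len(expected)
    return errors / total if total else 0.0


# Made-up ground truth and engine output, for illustration only.
truth = {"a.png": "hello", "b.png": "world"}
noisy = FixedOutputEngine({"a.png": "hallo", "b.png": "world"})
print(character_error_rate(noisy, truth))
```

Because every engine is reached through the same method, the evaluation code never needs to know which library produced the text, which is what makes a side-by-side comparison straightforward.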
III. References
In the fall, you'll work with a librarian to do a thorough literature
search to find out what others have done in this area. In the meantime, here
are a few relevant resources.
- The Tesseract open-source OCR project.
- Optical Character Recognition, by Line Eikvil. This apparently self-published
article from 1993 is a reasonably elementary introduction to OCR algorithms.
- S. Impedovo, L. Ottaviano & S. Occhiegro, "Optical Character Recognition -- A Survey",
International Journal of Pattern Recognition and Artificial Intelligence,
Vol. 5, 1991, pp. 1-24.
- Optical Character Recognition: An Illustrated Guide to the Frontier, by Stephen V. Rice,
George Nagy, and Thomas A. Nartker. Kluwer Academic Publishers, Boston, 1999. This book is in
the Carleton library.