Identifying authors

Due midnight Monday 9/30/02. You may work in groups or individually.

What to do

Select and download corpora written by three authors with distinctive styles. Also select, more or less at random, at least five paragraphs apiece from other works by these same authors, and set those paragraphs aside.

Now, devise and implement a scheme for using n-gram data from the three corpora to guess the author of a given paragraph. Feed your fifteen paragraphs into your guessing code and record the results. If your code computes some sort of measurement of similarity between the paragraph and each of the corpora, report the similarity measures as well.
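If you want a starting point, here is a minimal sketch of one possible scheme, not the required one. It assumes character trigrams as the n-grams and cosine similarity between n-gram count vectors as the similarity measure; both are illustrative choices, and the function names (ngram_counts, cosine_similarity, guess_author) are made up for this example. Word n-grams, smoothed probabilities, or any other measure you can justify would be just as reasonable.

```python
import math
from collections import Counter


def ngram_counts(text, n=3):
    """Count overlapping character n-grams (assumed choice: characters, n=3).

    Text is lowercased and runs of whitespace are collapsed to one space.
    """
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors.

    Returns 0.0 if either vector is empty.
    """
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_author(paragraph, corpora, n=3):
    """Guess which corpus a paragraph most resembles.

    corpora maps an author name to the full text of that author's corpus.
    Returns (best_author, scores), where scores maps each author name to
    the similarity between the paragraph and that author's corpus.
    """
    target = ngram_counts(paragraph, n)
    scores = {
        author: cosine_similarity(target, ngram_counts(text, n))
        for author, text in corpora.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Given a dictionary of your three corpora keyed by author name, calling guess_author(paragraph, corpora) on each of your fifteen paragraphs returns both the guess and the per-author similarity scores, which are the two things asked for above. Note that this sketch recounts each corpus on every call; for real corpora you would probably want to compute the counts once and reuse them.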

What to hand in

Have fun, start early, and keep in touch.

Jeff Ondich, Department of Mathematics and Computer Science, Carleton College, Northfield, MN 55057, (507) 646-4364, jondich@carleton.edu