CS 117: AI Author

Overview

Artificial intelligence is an active and exciting area of computer science. Can computers be made to do tasks that seem "intelligent"? Some of the goals (and partial successes) of artificial intelligence include important areas such as medical diagnosis, drug discovery, and robotics. For this assignment, we will visit the part of artificial intelligence referred to as "natural language processing," i.e. getting a computer to work with a natural language such as English.

More specifically, you will write a program called AIAuthor that will actually write original prose in the style of a chosen human author. Can computers actually be creative? We'll find out!

Details

Find a source text. Specifically, find a text file that contains some text that you want the computer to creatively imitate. Project Gutenberg is a great place to start.

The technique that you will use works conceptually as follows:

Here's an example of some text that my program generates, using a key of 5 on Grimm's fairy tales:

I will you learn home to taken his way, and around him, and left his come intender him in they cried: 'The little child,' said the other; 'let us for the cloak, and brough the peasant was to King.

Alternatively, when I tried it with a key of 8, here's what I got one of the times that I ran it:

The servants running across the stables to salt and butter to eat and drink, the head you must get up quickly, and fell fainting to the test.

Finally, you may find for this assignment that instead of using arrays, the ArrayList class will be easier. We will talk about this in class, but you can also read about it in your textbook.

Making it run faster

The above technique works, but you may find it runs somewhat slowly on large source texts. Get the program to work using the above technique first. If you submit a working version of the program using the above ideas, and your program is well written (style, etc.), you'll get 15 of the 20 points for the project. To get the remaining 5 points, implement the following ideas to speed up your program.

The main speed bottleneck in your program is due to the fact that in order to generate each character, you need to do a sequential search through the entire text to look for your seed. Your program would run considerably faster if you created a list of all the possible seeds in your program followed by the characters that come next, all in sorted order. You could then use a binary search to find your seed, and quickly find all characters that follow it.

The technique to clearly use, then, is to sort all of the seed strings (and the associated next characters) before starting. The catch is that the various sorts that we have already considered (insertion sort and selection sort) are too slow given the size of the source text. Try it and see! Instead, you should use a technique called radix sort, which works quite well when the data that you use all has the same length. We'll talk about radix sort in class, but you can also find a description of radix sort here. (This linked description shows how to do it for a string of digits where the only allowable characters are 0 through 9; you'll need to do it for a string of characters where any legitimate ASCII value is allowed.)

Final notes

Make sure that you use good object-oriented design. Think carefully about how to layout your program so that you use classes and objects appropriately, and so that your methods remain short and self-contained. Follow the course style guidelines.

Remember to get a slow version of the program working first before going for the fast one. Start early, and good luck!