Skip Site NavigationCarleton College: Home
 
 

Overview

Artificial intelligence is an active and exciting area of computer science. Can computers be made to do tasks that seem "intelligent"? Some of the goals (and partial successes) of artificial intelligence include important areas such as medical diagnosis, drug discovery, and robotics. For this assignment, we will visit the part of artificial intelligence referred to as "natural language processing," i.e. getting a computer to work with a natural language such as English.

More specifically, you will write a program called AIAuthor that will actually write original prose in the style of a chosen human author. Can computers be creative? We'll find out!

This is a team assignment.

Details

Find a source text. Specifically, find a text file that contains some text that you want the computer to creatively imitate. Project Gutenberg is a great place to start.

The technique that you will use works conceptually as follows:

  • Ask the user to specify a source text file, to specify an integer to server as a window size (which we'll denote as w throughout this description), and the number of characters that the program should generate.
  • Read the source text into memory as a single string. Unlike previous examples of reading text (like for the spelling dictionary), we actually want to retain spaces in the original document. Therefore, you should read in an entire line at a time, instead of a word at time. Check out the Scanner class nextLine method.

    You will also want to use a StringBuilder instead of a String to hold the data as you read it in. The problem with using a String to hold the data is that every time you read a new line, you need to append it to the old one. Every time that you append to a String, Java creates a brand new one by copying the old String and then appending the new text. This is extemely slow. A StringBuilder is much faster way to manage appending multiple strings of text. When you're done appending, you can convert the StringBuilder back to a normal String. Here's an example:

    StringBuilder sbtext = new StringBuilder();
    sbtext.append("hello");
    sbtext.append(" goodbye ");
    sbtext.append("friends");
    String text = sbtext.toString(); // converts the StringBuilder to a normal string
    

    Be careful when you append the lines together that you append a space appropriately at the end of each line. If you don't handle this right, the last character of each line and the first character of the next next line will always appear right next to each other.

  • Once you have the source text in memory, choose a random substring of length w to start as the seed. You may find the substring method of String to be helpful here. Print this random substring to the screen.
  • Look through the entire source text to find all occurrences of this seed in order to determine what character follows. For example, if w=5, and your seed is memor, you may find that the text memor appears multiple times throughout your document sometimes followed by an i (as in the word "memories") and sometimes followed by a y (as in the word "memory"). Determine all characters (including punctuation, etc.) that follow your seed, and choose one randomly. Make sure that you do this in such a way that if the i appears more often than the y for example, it is more likely to be chosen. Print this letter to the screen. Again, you may find the substring method of String to be useful.
  • Create a new seed. This new seed should consist of all characters from your previous seed except the first, with the character you chose in the previous step appended at the end. For example, if your previous seed was memor and you chose an i as the next character, your new seed is emori. Repeat the above process to generate another random character.

Here's an example of some text that my program generates, using a window size of 5 on Grimm's fairy tales:

I will you learn home to taken his way, and around him, and left his come intender him in they cried: 'The little child,' said the other; 'let us for the cloak, and brough the peasant was to King.

Alternatively, when I tried it with a window size of 8, here's what I got one of the times that I ran it:

The servants running across the stables to salt and butter to eat and drink, the head you must get up quickly, and fell fainting to the test.

For part 1 of this assignment, your program should at least be able to read in a source text, ask for a window size, and choose a random seed from the text (which should be printed to the screen). Of course, you can turn in more of the entire assignment (or the whole thing) for part 1 if you like.

Final notes

Make sure that you use good design. Think carefully about how to layout your program so that you use classes and objects appropriately, and so that your methods remain short and self-contained.

Once you get your program running, try to get your program to write the most funny or insightful text that you can. I have created a Moodle forum for this assignment called "AI Author Results." Post your favorite results there so we can see them.

Available from: Tuesday, May 8 2007, 04:30 PM
Due date: Monday, May 14 2007, 11:55 PM