In-Class Regular Expressions Activity

Task 1: String Matching & Regular Expressions

The website http://regexpal.com/ lets you try out regular expressions on sample text. We will use this site in the steps below. Enter regular expressions in the top box and sample text in the bottom box.

  1. Download act_regex1.py and run the program. After you enter your name, it should print out a poem based on your name, followed by a line of stats. Copy the text of the poem (and the stats line) into the bottom box of the regexpal website.
  2. Enter the following lines into the top box. What do you think brackets do?
    • n
    • fn
    • [fn]
    • [aeiou]
  3. Get all the words that prepend the letters b, f, or m to your name (or the suffix of your name). (For example, I would want to match badrian, fadrian, and madrian.)
  4. Look back at act_regex1.py. Notice how there are two functions for detecting whether a particular character is a vowel. In the function bananaFana(), change the call to isVowel() to instead use isVowelRegEx(), and then run the program again. What's going on?
  5. Enter the following lines into the top box. What do you think \w does? What do you think the + sign does?
    • f
    • f\w
    • f\w\w
    • f\w+
  6. Enter the following lines into the top box. What do you think \s does?
    • \s
    • \sm
  7. Get all the words that start with b, f, or m.
  8. Enter the following line into the top box. What do you think $ does?
    • .$
  9. Get all the words that appear at the end of a line.
  10. What type of things does \d match? How can you match runs of multiple such things?
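
Once you have made your guesses in regexpal, you can check them in Python as well. The sketch below is only an illustration: the sample text is made up (substitute your own poem), and it assumes Python's built-in re module, whose $ only matches at the end of every line when the re.MULTILINE flag is given. Regexpal's settings may behave differently.

```python
import re

# Hypothetical sample text; paste in the poem printed by act_regex1.py instead.
sample = "fee fi mo madrian\nbanana fana 42 badrian"

# [fm] matches exactly one character that is either f or m.
bracket_hits = re.findall(r"[fm]", sample)
print(bracket_hits)        # ['f', 'f', 'm', 'm', 'f']

# \w matches one word character; + repeats the preceding item one or more times.
word_hits = re.findall(r"f\w+", sample)
print(word_hits)           # ['fee', 'fi', 'fana']

# \s matches one whitespace character (space, tab, or newline).
space_hits = re.findall(r"\sm", sample)
print(space_hits)          # [' m', ' m']

# . matches any character except a newline; $ anchors the match at the end.
# With re.MULTILINE, $ applies at the end of every line, not just the last one.
line_end_hits = re.findall(r".$", sample, flags=re.MULTILINE)
print(line_end_hits)       # ['n', 'n']

# \d matches a single digit, so \d+ matches a run of digits.
digit_hits = re.findall(r"\d+", sample)
print(digit_hits)          # ['42']
```

Note that re.findall() returns every non-overlapping match as a list of strings, which is roughly what regexpal shows by highlighting.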

Task 2: Other Regular Expressions, and Match Objects

Now download act_regex2.py. Download and save poem.txt in the same directory as act_regex2.py. Open poem.txt in your text editor, so you can refer back to it as the program operates on it.

  1. The first argument to the function printFirstRegexMatch() is a string representing a regular expression. The r before the opening quotation mark indicates that the literal is a raw string literal, meaning that Python does not treat backslashes as the start of an escape sequence. This way, we can put backslashes into our regular expressions without needing to double them.
  2. Run the program and see what it does. Based on your knowledge of regular expressions, you should be able to explain why it prints out the string that it does.
  3. But where does it get the start- and end-index information? There are comments in the code that explain that. Ask questions if you don't understand what's going on.
  4. Once you've figured out how the program works, change the first argument of the call to printFirstRegexMatch() to be each of the following regular expressions:
    • r'\sg\w+'
    • r'\s[gG]\w+'
    • r'out'
    • r'\sout'
    • r'\w+out'
    • r'[\s\w]out'
    • r'[\s\w][Oo]ut'
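
If you want to experiment with raw strings and match objects away from act_regex2.py, here is a minimal sketch. The helper describe_first_match() and the sample line are hypothetical, written for this illustration; they are not the code in act_regex2.py, and poem.txt is not reproduced here. The sketch assumes only the standard re.search() call and the group(), start(), and end() methods of the match object it returns.

```python
import re

# A raw string keeps the backslash as an ordinary character, so these are equal.
print(r"\s" == "\\s")   # True

def describe_first_match(pattern, text):
    """Hypothetical helper: return (matched text, start, end) or None."""
    match = re.search(pattern, text)   # a match object, or None if no match
    if match is None:
        return None
    # group() is the matched text; start() and end() are indices into text,
    # where end() is one past the last matched character.
    return match.group(), match.start(), match.end()

# Made-up sample line, used only for this illustration.
line = "The cat went out and about"

print(describe_first_match(r"\sout", line))    # (' out', 12, 16)
print(describe_first_match(r"\w+out", line))   # ('about', 21, 26)
print(describe_first_match(r"\d", line))       # None
```

Because end() is one past the last matched character, line[start:end] gives back exactly the matched text.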

Task 3: Match Iterators

Download act_regex3.py and put it in the same directory as everything else. This program works similarly to the last one, but it does something slightly different.

Read through the source code and change the regular expression that's being used. (Replace it with some of the examples from the previous task.) Again, ask questions if you don't quite get what's going on.
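
For comparison, here is one way to loop over every match in a string rather than only the first. This is a sketch under assumptions: the helper all_matches() and the sample sentence are invented for illustration, and act_regex3.py may be organized differently. It relies on the standard re.finditer() function, which yields one match object per non-overlapping match.

```python
import re

def all_matches(pattern, text):
    """Hypothetical helper: list (matched text, start, end) for every match."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

# Made-up sample sentence, used only for this illustration.
sample = "A goose and a Gander go gently"

# One of the patterns from the previous task: whitespace, then g or G,
# then one or more word characters.
for found, start, end in all_matches(r"\s[gG]\w+", sample):
    print(repr(found), start, end)
# ' goose' 1 7
# ' Gander' 13 20
# ' go' 20 23
# ' gently' 23 30
```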

Once you've figured out how the program works, it's time to design your own regular expressions. Change the regular expression so that the program does the following:

  1. Print all occurrences of the substring it.
  2. Print all occurrences of the word it. A word should be surrounded by whitespace; it's okay in this case to include the whitespace in your matches. There are five instances of the word it.
  3. Print all words that contain it followed by at least one other letter. There are six such words. Can you make a pattern that will match the whole word but not include the whitespace on the ends?
  4. Print all words that end in ing. There are two such words.
  5. Print all phrases surrounded by double quotes (all occurrences of speech). There is only one phrase.
  6. Print all contractions (words with a single-quote character in the middle). Remember that a single quote inside a single-quoted Python string must be preceded by a \ (or you can delimit the string with double quotes instead). There are two such words (She'd and I'll), but write the expression to return any contraction.
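
As a hint for the quoting issue in the last step, the sketch below shows two ways to write a pattern that contains a single quote. The sample text and the possessive pattern are made up for illustration and are deliberately not the answer to the contraction exercise.

```python
import re

# Made-up sample text, used only for this illustration.
sample = "the dog's bone and the cat's toy"

# Option 1: delimit the raw string with double quotes, so ' needs no escaping.
double_quoted = r"\w+'s"

# Option 2: keep single-quote delimiters and put a backslash before the quote.
# In a raw string the backslash stays in the pattern, and the regex \' still
# matches a literal single quote.
single_quoted = r'\w+\'s'

print(re.findall(double_quoted, sample))   # ["dog's", "cat's"]
print(re.findall(single_quoted, sample))   # ["dog's", "cat's"]
```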