CS 204: Software Design

Mailing addresses and automated testing

The problem

When you order software online, you are usually asked for your mailing address, even though you're going to download the software. Some people reject this request, considering it none of the vendor's business. Others enter gibberish or false addresses. But in my experience, most people dutifully fill in their addresses. From the vendor's point of view, there are at least two good reasons to ask for an address: taxes and market research. (My company pays less tax on non-US sales, and more tax on Minnesota sales, so we need documentation. Furthermore, we can learn a lot about the most promising places to advertise by tracking sales by country.)

For this assignment, I want you to imagine the following problem. You are given a text file, each line of which consists of:

e-mail address [tab] mailing address

The task is to identify the country represented by each line, along with the state or province if the country is the US, Canada, or Australia. This task can be trickier than it sounds. For example, there's a France Avenue in the Twin Cities, and the two-letter abbreviations of Canada and California are the same. Furthermore, people often leave off the country name, especially if they're in the US.
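
To see why the France Avenue case is a trap, here is a minimal sketch of a deliberately naive matcher. The function name, the tiny country table, and both addresses are made up for illustration; this is not a suggested solution.

```python
# Hypothetical two-entry country table, just large enough to show the problem.
COUNTRY_NAMES = {"france": "FR", "canada": "CA"}


def naive_country(address):
    """Return the code of the first country name found anywhere in the address.

    Deliberately naive: a street name that contains a country name
    is mistaken for the country itself.
    """
    lowered = address.lower()
    for name, code in COUNTRY_NAMES.items():
        if name in lowered:
            return code
    return "??"


# A real French address is classified correctly...
print(naive_country("10 Rue de Rivoli, 75001 Paris, France"))
# ...but so, wrongly, is a Minnesota address on France Avenue.
print(naive_country("3400 France Avenue S, Minneapolis, MN 55435"))
```

Both calls report FR, so a matcher like this needs at least some notion of where in the address a country name may appear.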

You should assume that the output format for this program looks like this:

e-mail address [tab] mailing address [tab] country abbreviation [[tab] state/province abbreviation]

That is, each line will repeat the data from the corresponding line of the input file, then add the ISO 3166-1 alpha-2 (two-letter) country code computed by your program, and then (if the country is the US, Canada, or Australia) the state or province abbreviation. US and Canadian abbreviations are two letters; some Australian ones, such as NSW, are longer. If the program is unable to identify the country, use ?? as the country code. If the program identifies the US, Canada, or Australia, but is not able to identify the state or province, use ?? for the state/province.
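
The output rules above can be sketched as a small Python function. The function name, its parameters, and the sample e-mail address are assumptions made for illustration; how the country and state are actually identified is left out entirely.

```python
# Countries whose output lines carry a fourth, state/province field.
REGION_COUNTRIES = {"US", "CA", "AU"}
UNKNOWN = "??"


def format_output_line(input_line, country=None, region=None):
    """Append the country code (and, where required, the region) to one input line.

    input_line is "e-mail address<TAB>mailing address", with or without
    a trailing newline. Pass None for anything that could not be identified.
    """
    fields = [input_line.rstrip("\n"), country or UNKNOWN]
    if country in REGION_COUNTRIES:
        # Only the US, Canada, and Australia get a state/province field.
        fields.append(region or UNKNOWN)
    return "\t".join(fields)


print(format_output_line("a@example.com\t1 Main St, Northfield, MN", "US", "MN"))
print(format_output_line("a@example.com\tSydney", "AU"))
print(format_output_line("a@example.com\tasdfgh"))
```

Note that an unidentified country produces three fields, not four: the state/province field appears only once the country is known to be one of the three.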

Country names (in English and French) and the corresponding abbreviations can be found on this list of ISO country codes. You can search for state and province abbreviations, too.

Your jobs

  1. Your first job is not to write a program to solve this problem. Instead, I want you to develop test data and an automated system for performing your tests. You may assume that the program you are testing reads from standard input and writes to standard output, that the input format is as described above, and that the output format will be:

    e-mail address [tab] mailing address [tab] country abbreviation [[tab] state/province abbreviation]

    where the state/province information will be present only if the country is the US, Canada, or Australia.

    I will leave it to you to decide how your testing system should operate. However, I am looking for a thorough collection of common and boundary test cases, useful test reports, and ease of use. I especially encourage you to consider using a Makefile to help you make your tests easy to run.

    Turn in your test code and data in a folder called "addresstest".

  2. Your second job, not surprisingly, is to write a Python program to solve the country/state identification problem described above. Turn in your source code plus a readme.txt in a folder called "addresses". In the readme.txt, describe your program's command-line syntax, and write a brief paragraph about whether you found your test system helpful in the development of the program.
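
One possible shape for the automated tests in job 1 is sketched below, assuming only what the assignment states: the program under test reads standard input and writes standard output. The function names, the report format, and the stand-in program STUB are all hypothetical; a Makefile target could invoke a script like this.

```python
import subprocess
import sys

# Stand-in for the program under test (for demonstration only):
# it echoes each input line and always reports an unknown country.
STUB = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.rstrip('\\n') + '\\t??')",
]


def run_case(command, input_line, expected_line):
    """Feed one input line to the program; return (passed, actual_output)."""
    result = subprocess.run(
        command,
        input=input_line + "\n",
        capture_output=True,
        text=True,
        timeout=10,
    )
    actual = result.stdout.rstrip("\n")
    return actual == expected_line, actual


def report(command, cases):
    """Print one PASS/FAIL line per case and return the number of failures."""
    failures = 0
    for name, input_line, expected_line in cases:
        passed, actual = run_case(command, input_line, expected_line)
        if passed:
            print(f"PASS {name}")
        else:
            failures += 1
            print(f"FAIL {name}: expected {expected_line!r}, got {actual!r}")
    return failures


# Two made-up cases: the stub gets the first right and the second wrong.
CASES = [
    ("gibberish", "a@example.com\tasdfgh", "a@example.com\tasdfgh\t??"),
    ("explicit country", "b@example.com\tParis, France", "b@example.com\tParis, France\tFR"),
]
print(report(STUB, CASES), "failure(s)")
```

Replacing STUB with the real command line, for example a list such as ["python3", "addresses.py"], would run the same cases against an actual solution. Keeping the cases in a separate data file rather than in the script is probably easier to maintain as the collection grows.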

Have fun.