Assignment 1 - Getting Started With C

Due: Thursday, September 14, at 10:00pm

Starter code: starting-c-package.tar
Upload solutions via Moodle as: starting-c.tar

Goals

  • Learn some of the fundamentals of C programming
  • Start thinking of your data in terms of bytes
  • Get used to using some simple program testing automation

Collaboration policy

For this first assignment, you may work alone or with a partner, but you must type up all of the code yourself. (It is therefore unexpected for two code submissions to be completely identical.) You may also discuss the assignment at a high level with other students. You should list any student with whom you discussed the assignment, and the manner of discussion (high level, partner, etc.) in comments at the top of your C source files.

Rubric

This assignment is worth a total of 12 points. They are allocated as follows:

1 - author name(s) in a comment at the top of each C source file
4 - "depunctuate" program correctness
5 - "sorter" program correctness
2 - code quality

Learning a new language

The following is taken from Jeff Ondich’s guidance on learning a new language, with minor adaptations.

Once you have learned a couple of programming languages, getting started in a new language is mostly a matter of finding good reference materials, getting a source of sample programs, and writing a bunch of small programs to get the syntax and core libraries under control.

One approach is to write a few small programs to make sure you can handle the basics. You could simply do the first 10 problems at Project Euler. Alternatively, you could ramp up faster by writing programs specifically aimed at teaching yourself key elements of the new language. For example:

  • Output: Write a “Hello, world!” program to print a simple text message to standard output.
  • Input: Write a program that asks the user for their name, age in years, and some non-integer number (e.g., their hourly wage), and prints the information back.
  • A function: Write a recursive factorial function as an example of a function that takes an integer parameter and returns an integer.
  • Arithmetic and conditionals: Write a change-making program. Here’s a description of such a program as assigned to a CS 111 class many years ago.
  • Input/output parameters: Write a function that takes two parameters and swaps them. Whether this is possible depends on whether the programming language supports pass-by-reference or pointers.
  • File input, loops, and command-line-arguments: Read the contents of one text file and write the same contents, in all uppercase, to a second text file. Both the input and the output file names should be specified as command-line arguments.
  • Lists/arrays: Given a text file consisting of one word on each line, read in the list of words, sort them into alphabetical order, and print the sorted list. Let the user specify the text file as a command-line argument.
  • String manipulation and searching: Count the number of times each word in a text file appears. Print the results sorted in decreasing order by word count. Let the user specify the text file as a command-line argument.
  • Dictionaries/hash tables: Do the word-counting exercise mentioned in the “string manipulation” item above, but keep track of the counts using a hash table (also known as a dictionary in some languages, including Python). This should run much faster than the one using lists, especially on large files with many unique words.
  • Classes: If your language has some form of object orientation (which C does not!), create a class called Circle with instance variables to keep track of the center and radius of a circle. The class should have a suitable constructor (or constructors), plus methods getArea, getCircumference, and a collection of appropriate accessors. Your program should read a list of circles from a text file whose lines consist of three numbers separated by spaces (e.g., “3.2 4 2.7” represents a circle of radius 2.7 centered at the coordinate (3.2, 4)), instantiating a Circle object for each one. Once you have a list of Circle objects, run through the list reporting the center, radius, area, and circumference of each circle. Let the user specify the file of circle data as a command-line argument.
  • Pointers, references, and memory allocation: Read a list of integers from a file into a linked list of your own construction. Sort the linked list (insertion sort and merge sort both work well with linked lists) and print the sorted list. Let the user specify the file as a command-line argument.
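For the input/output parameters exercise, note that C passes every argument by value and has no pass-by-reference, so a swap function is only possible through pointers. A minimal sketch (the name `swap_ints` is illustrative, not required):

```c
/* C copies arguments into the function, so swapping the copies would have
   no effect on the caller. Instead, the caller passes the addresses of its
   two variables, and the function exchanges the values stored there. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Called as `swap_ints(&x, &y);`, it exchanges the values of `x` and `y` in the caller.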

After getting these basics under control, you can start exploring the language’s standard libraries. For example, it is useful to know more about string manipulation, the file system (how to create/delete/move files, traverse a directory tree, etc.), simple GUIs and line graphics, invoking other programs, networking, etc. However, this is a longer-term project. If you have a good personal project (or class assignment!) to work on, most of these libraries come up naturally.

Your assignment

For this first assignment, you will write C versions of the file/loops/command-line and lists/arrays programs described above.

Specifically, you’ll write the following programs:

  1. depunctuate.c: This program’s command-line syntax will be:

    ./depunctuate inputfile outputfile
    

The program will copy the contents of inputfile to outputfile unchanged, except that only digits (ASCII 48-57), tabs and spaces (9, 32), newlines and carriage returns (10, 13), and letters (65-90 uppercase, 97-122 lowercase) will be written to the output file. All other bytes are omitted.
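One way to express this filtering rule is as a small predicate. This is a sketch, not a required design, and the name `keep_char` is hypothetical. It checks the four whitespace bytes explicitly rather than calling `isspace`, because in the default "C" locale `isspace` also accepts vertical tab (11) and form feed (12), which the list above does not include. It also rejects anything outside 0-127 up front, which keeps the `<ctype.h>` calls safe even if a caller passes a plain `char` that happens to be negative.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if byte ch belongs in the output file: digits (48-57),
   tab (9), newline (10), carriage return (13), space (32), and
   letters (65-90, 97-122). Everything else, including EOF, is rejected. */
bool keep_char(int ch)
{
    if (ch < 0 || ch > 127) {
        return false;  /* not ASCII (or EOF) */
    }
    if (ch == '\t' || ch == '\n' || ch == '\r' || ch == ' ') {
        return true;   /* the only whitespace bytes the spec allows */
    }
    /* In the default "C" locale, isalpha matches only A-Z and a-z. */
    return isdigit(ch) || isalpha(ch);
}
```

Inside a copy loop, each byte read with `fgetc` (stored in an `int` so that `EOF` can be recognized) would be written only when this predicate returns true.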

  2. sorter.c: This program’s command-line syntax will be:

    ./sorter textfile
    

The program will read the lines of the text file into an array, sort them lexicographically (i.e., using the return value of strcmp or strncmp as the comparison function), and print the sorted list to standard output. You may assume that:

  • the input file is an ASCII file; that is, every byte in the file has a value between 0 and 127
  • lines are delimited by the newline char '\n' (ASCII 10), except possibly for the last line in the file, which may or may not end in '\n'
  • no line contains more than 200 bytes, including the '\n'
  • there are no more than 500 lines in the input file
  • it’s okay to use an O(N^2) sorting algorithm, where N is the number of lines in the file
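The sorting step might look like the following insertion sort, which is O(N^2) in the worst case, as permitted above. It is a sketch under a few assumptions: the lines are stored in a fixed-size two-dimensional array sized from the stated limits (500 lines of at most 200 bytes, plus a terminating '\0'), and the names `sort_lines`, `MAX_LINES`, and `LINE_CAP` are illustrative.

```c
#include <string.h>

#define MAX_LINES 500   /* at most 500 lines in the input file */
#define LINE_CAP 201    /* 200 bytes per line (including '\n') + '\0' */

/* Sorts lines[0..n-1] into strcmp order using insertion sort:
   each line is shifted left past every earlier line that compares
   greater than it. */
void sort_lines(char lines[][LINE_CAP], int n)
{
    char tmp[LINE_CAP];
    for (int i = 1; i < n; i++) {
        strcpy(tmp, lines[i]);
        int j = i - 1;
        while (j >= 0 && strcmp(lines[j], tmp) > 0) {
            strcpy(lines[j + 1], lines[j]);  /* shift larger line right */
            j--;
        }
        strcpy(lines[j + 1], tmp);
    }
}
```

Because `strcmp` compares byte values, every uppercase letter (65-90) sorts before every lowercase letter (97-122), so "Zebra" comes before "apple". Copying whole rows with `strcpy` keeps the code simple; sorting an array of pointers to the rows instead would avoid the copying.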

Getting the starter package

For many assignments this term, you’ll receive some starter code, some testing tools, or miscellaneous other materials to help you get started. These will generally be delivered to you via downloadable tar files. As noted in this handy tutorial from Indiana University, you can extract the files and folders contained in a tar file by using the command:

tar xvf whatever.tar

To get started on this first assignment:

  1. Log in to mantis.mathcs.carleton.edu using VS Code and open your cs208 folder. Go back and look at Lab 1 if you have any questions about how to do this.

  2. In your VS Code terminal, run:

    wget https://cs.carleton.edu/faculty/tamert/courses/cs208-f23/resources/assignments/starting-c-package.tar
    
  3. Still in your VS Code terminal, extract the starting-c-package folder:

    tar xvf starting-c-package.tar
    

This will create a folder named starting-c-package with some stuff in it.

  4. Read the readme.txt file and get started.

Automated testing

In the starting-c-package.tar file linked at the top of this page, you will find:

  • Makefile: a file that you’ll use to perform some very simple automated tests for your depunctuate.c and sorter.c programs
  • readme.txt: an explanation of how to run the tests
  • some test data files

Note that for most assignments, I will only give you very simple tests as part of an assignment’s starter package. The grader and I will certainly add some more sophisticated tests to explore the boundaries of a given assignment. You are, of course, free to use the testing infrastructure from the starter package to add your own tests. Getting used to automated testing and to writing detailed tests of your own will serve you well in the long run.

Submitting your work

You’ll need to follow these steps to submit your work:

  1. Put your source files (depunctuate.c and sorter.c) in a folder named starting-c (note the change of folder name from the starter code).

  2. Change directories (cd) to the parent directory of starting-c.

  3. Create a tar file, as follows:

    tar cvf starting-c.tar starting-c
    
  4. Download the .tar file to your local machine (in VS Code while connected to mantis.mathcs.carleton.edu, you can right-click on starting-c.tar and select “Download”).

  5. Use the Moodle web interface to submit your .tar file.

Advice

  • There are some really helpful examples on the Sample Programs page. Make sure to take a look and check your notes for any we specifically referenced in class!

  • To help simplify the code for depunctuate.c, try looking at the manual pages for isspace, isalpha, and isdigit. (For example, in a terminal, type man isalpha.)

  • Similarly, take a look at character escape sequences in C. You might not need these for the current assignment, but if you do, the most relevant ones are '\n' (newline, ASCII 10), '\r' (carriage return, ASCII 13), and '\t' (tab, ASCII 9). Writing character constants and <ctype.h> calls instead of raw ASCII codes makes your code much easier to read and understand. This code, for example:

    char ch;
    /* ...ch gets a value... */
    if (isdigit(ch))
    {
        /* do something */
    }
    else if (ch == '\n')
    {
        /* something else */
    }
    /* more code follows */
    

    is much better than this code:

    char ch;
    /* ...ch gets a value... */
    if (ch >= 48 && ch <= 57)
    {
        /* do something */
    }
    else if (ch == 10)
    {
        /* something else */
    }
    /* more code follows */
    

  • Add one test case to each program consisting of an empty input file. You always want your programs to be able to deal with this very common boundary case.
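For sorter.c, one way to read the lines so that an empty input file is handled naturally: `fgets` returns NULL at end of file, so on an empty file the loop body never runs and the line count is 0. This sketch (the names `read_lines` and `LINE_CAP` are hypothetical) also strips a trailing '\n' from each line, so the last line compares the same way whether or not it ended in '\n'. With that choice, the program must print its own '\n' after each sorted line.

```c
#include <stdio.h>
#include <string.h>

#define LINE_CAP 201    /* 200 bytes per line (including '\n') + '\0' */

/* Reads up to max_lines lines from fp into lines[], removing any trailing
   '\n'. Returns the number of lines read; an empty file yields 0. */
int read_lines(FILE *fp, char lines[][LINE_CAP], int max_lines)
{
    int count = 0;
    while (count < max_lines && fgets(lines[count], LINE_CAP, fp) != NULL) {
        /* strcspn gives the index of the first '\n', or the string length
           if there is none (e.g., a final line with no newline). */
        lines[count][strcspn(lines[count], "\n")] = '\0';
        count++;
    }
    return count;
}
```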

Was it all over too soon?

Want an extra C challenge to fill your quiet hours? Try writing the word-counting program, described above, for fun. I will not grade this program, nor will you get any course credit for it. However, the practice won’t hurt.

What follows is a more detailed description of the program, named wordcounter.c.

This program’s command-line syntax will be:

./wordcounter textfile

The program will read the words from the input file, count the number of times each word occurs in the file, and print to standard output a list of words and their counts, in the following format:

the,37
and,12
for,6
in,4
of,4
were,4
...

That is, each line of output consists of a word, then a comma, then the base-10 count of the number of occurrences of that word, with no spaces. The lines of output are sorted in reverse order of their counts, with ties broken by putting words in alphabetical order (see in/of/were above). You should assume that:

  • A “word” consists of any contiguous block of Latin letters (a-z, A-Z), delimited by non-Latin-letter characters (e.g., punctuation, spaces) or the beginning or end of the file. (Note that with this definition, contractions like “don’t” will generate weird “words” like “don” and “t”–don’t try to fix that!)

  • No word will be longer than 30 bytes (including the null-termination).

  • The input file will contain no more than 1000 distinct words.
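For anyone attempting the challenge, here is one possible sketch of the counting and ordering logic. It is not a full program: to stay short it scans an in-memory string instead of a file, and `struct word_count`, `count_words`, `sort_counts`, and `print_counts` are hypothetical names. It assumes words are case-sensitive ("The" and "the" are counted separately), since the description does not say otherwise. It finds words with a linear search, which is adequate for at most 1000 distinct words (the hash-table version mentioned earlier should be faster). It orders them with `qsort`, using a comparator that puts higher counts first and breaks ties with `strcmp`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 30      /* longest word, including the '\0' */
#define MAX_WORDS 1000   /* most distinct words in the input */

struct word_count {
    char word[MAX_WORD];
    int count;
};

/* Records one occurrence of word; returns the new number of distinct words. */
static int add_word(struct word_count table[], int n, int max_words,
                    const char *word)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(table[i].word, word) == 0) {
            table[i].count++;
            return n;
        }
    }
    if (n < max_words) {
        strcpy(table[n].word, word);
        table[n].count = 1;
        n++;
    }
    return n;
}

/* Splits text into maximal runs of Latin letters (a-z, A-Z) and counts them.
   Returns the number of distinct words stored in table. */
int count_words(const char *text, struct word_count table[], int max_words)
{
    char word[MAX_WORD];
    int len = 0;
    int n = 0;
    for (const char *p = text; ; p++) {
        unsigned char c = (unsigned char)*p;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
            if (len < MAX_WORD - 1) {   /* spec guarantees words fit */
                word[len++] = (char)c;
            }
        } else {
            if (len > 0) {              /* a non-letter ends the word */
                word[len] = '\0';
                n = add_word(table, n, max_words, word);
                len = 0;
            }
            if (c == '\0') {
                break;                  /* end of text */
            }
        }
    }
    return n;
}

/* qsort comparator: higher counts first; equal counts in strcmp order. */
static int compare_counts(const void *a, const void *b)
{
    const struct word_count *x = a;
    const struct word_count *y = b;
    if (x->count != y->count) {
        return (x->count < y->count) ? 1 : -1;
    }
    return strcmp(x->word, y->word);
}

void sort_counts(struct word_count table[], int n)
{
    qsort(table, (size_t)n, sizeof table[0], compare_counts);
}

/* Prints "word,count" lines in the required format. */
void print_counts(const struct word_count table[], int n)
{
    for (int i = 0; i < n; i++) {
        printf("%s,%d\n", table[i].word, table[i].count);
    }
}
```

A full wordcounter.c would read characters from the file with `fgetc` instead of walking a string, then call `sort_counts` and `print_counts`.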