C: Escape sequences

This is an individual assignment.

Overview

Many programming languages use escape sequences in strings to represent special characters. Escape sequences are sequences of two or more characters starting with a backslash (\). For example, the sequence \n usually represents a newline, while the escape sequence \\ represents a backslash.

In this assignment, you will write two functions to translate escape sequences to and from their explicit representation. This assignment is intended to give you more practice with dynamic memory allocation in C, as well as character strings.

Getting started

Look for a GitHub repository called username_escape, where username is your GitHub user name. Once you've found your repository, clone it from GitHub.

Download the files escape.h, escape.c, and escape_test.c. Put them in your git repository, and commit and push them.

Escaping and unescaping strings

escape.h contains the declarations for two functions: escape and unescape. Your job is to implement both of these functions.

escape should take a normal C string, possibly including special characters, and "escape" it by converting each special character to its backslashed form. Note that both escape and unescape should allocate a new string for their return value, NOT modify the original argument. For example, if escape is passed the string:

H e l l o \n w o r l d \0

then it should return a string with the newline character converted to two characters:

H e l l o \\ n w o r l d \0

Note we do not modify the \0 character at the end of the string. This is a special character called the null terminator that is at the end of every string in C. It's not part of the string per se; rather, it's used by functions like printf to know when the string is over. \0 is actually just the byte 0, so you can iterate through a string quite easily by remembering that 0 is false in C:

char *s = ...;
for (int i = 0; s[i]; i++) {
    ... // do something with s[i]
}
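As a small illustration of this loop idiom, here is a sketch that counts how often a given character appears in a string. The function name count_char is made up for this example and is not part of the starter files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in escape.h): count how many times the
   character c appears in the string s. The loop ends at the null
   terminator, because s[i] is then 0, which C treats as false. */
size_t count_char(const char *s, char c) {
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; s[i]; i++) {
        if (s[i] == c) {
            count++;
        }
    }
    return count;
}
```

For instance, count_char("a\tb\tc", '\t') walks five characters, stops at the terminator, and returns 2.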

Put another way, for any string we can type into C, we should be able to print it as typed by escaping it, then passing the result to printf:

char *s = "My\tname\tis\tDave";
printf("%s\n", s);         // prints:   My      name    is      Dave
char *s_escaped = escape(s);
printf("%s\n", s_escaped); // prints:   My\tname\tis\tDave

Note that the string returned by escape is longer than the original string. escape should allocate a string of the correct size (but no larger!) and return that allocated string. (Don't forget to leave room for the null terminator!)
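One way to get the size exactly right is to make a first pass over the input that only counts bytes, and allocate afterwards. The sketch below assumes, purely for illustration, that newline, tab, and backslash are the only special characters; the name escaped_size is hypothetical, and your own set of special characters may well be larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the number of bytes needed to hold the escaped
   form of s, including the null terminator. ASSUMPTION: only newline,
   tab, and backslash count as special characters here. */
size_t escaped_size(const char *s) {
    assert(s != NULL);
    size_t size = 1;                 /* room for the null terminator */
    for (size_t i = 0; s[i]; i++) {
        if (s[i] == '\n' || s[i] == '\t' || s[i] == '\\') {
            size += 2;               /* a backslash plus one more character */
        } else {
            size += 1;               /* ordinary character, copied as-is */
        }
    }
    return size;
}
```

With this sketch, the 11-character string "Hello\nworld" needs 13 bytes: 10 ordinary characters, 2 for the escaped newline, and 1 for the terminator.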

The unescape function should perform the reverse transformation; i.e., it should take an explicitly escaped string, and convert the escape sequences into characters. So calling unescape on the following string:

H e l l o \\ n w o r l d \0

should allocate and return the string:

H e l l o \n w o r l d \0

In particular, we should be able to perform a round-trip in either direction:

#include <string.h>
#include <assert.h>

...

char *s = "images\\foo.jpg";
char *s_escaped = escape(s);
char *s_escaped_unescaped = unescape(s_escaped);
assert(strcmp(s, s_escaped_unescaped) == 0);

(Don't know strcmp or assert? Try man strcmp or man assert.)

A few important details:

Note that escape_test.c does not correctly free up memory allocated by the escape and unescape functions. Make sure to add code to escape_test.c that does so appropriately.
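The general pattern is that whoever receives an allocated string is responsible for freeing it exactly once, after its last use. Here is a minimal sketch of that pattern; copy_string is a hypothetical stand-in for escape or unescape (it just returns an allocated copy), and check_copy is likewise made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for escape/unescape: returns a newly allocated
   copy of s, or NULL if allocation fails. The caller owns the result. */
char *copy_string(const char *s) {
    assert(s != NULL);
    char *copy = malloc(strlen(s) + 1);   /* +1 for the null terminator */
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

/* Returns 1 if the allocated copy matches s, 0 otherwise.
   The copy is freed before returning, so nothing leaks. */
int check_copy(const char *s) {
    char *copy = copy_string(s);
    if (copy == NULL) {
        return 0;
    }
    int same = (strcmp(s, copy) == 0);
    free(copy);                           /* freed once, after its last use */
    return same;
}
```

Note that the comparison result is saved in a variable before free is called; reading copy after freeing it would be a memory error.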

Switch statements

C provides a control structure called the switch statement for quickly choosing among many options. It's basically a streamlined if/else if/else statement. The idea is that you start with some expression, and switch to a particular piece of code based on that expression's value:

switch (e) {
case 1:
    // e is 1! do something!
    break;
case 2:
    // e is 2! do something else!
    break;
...
default:
    // nothing matches! fall-back case here!
    break;
}

Switch statements are sometimes useful for string processing, since the switch expression is allowed to be a char (chars are just small ints anyway). You don't need to use one for this assignment, but you can if you want to try out something new.

Note that each case has a break statement at the end. If you don't have this, execution "falls through" to the next case, which is weird and often not what you actually want. I recommend always using break so you don't run into this behavior.
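To show what a switch on a char can look like, here is a sketch of a function that maps a special character to the character that follows the backslash in its escaped form. The name escape_letter and the choice of which characters to handle are assumptions made for this example, not something the starter files require.

```c
/* Hypothetical helper: for a special character c, return the character
   that follows the backslash in its escaped form; return 0 if c is not
   one of the characters handled here. ASSUMPTION: only newline, tab,
   and backslash are handled in this sketch. */
char escape_letter(char c) {
    char letter;
    switch (c) {
    case '\n':
        letter = 'n';
        break;
    case '\t':
        letter = 't';
        break;
    case '\\':
        letter = '\\';
        break;
    default:
        letter = 0;      /* not a special character */
        break;
    }
    return letter;
}
```

Every case ends in break, so exactly one assignment runs for any value of c.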

For more information on switch statements, read section 6.4.2 of Scott, or pages 84-87 of Kochan.

Memory errors

Your code should have no memory leaks or memory errors when run using valgrind. We will be checking this during grading. We'll also be running your code with our own main function, so having an empty main will not get you past this requirement. You can run valgrind by doing:

valgrind --leak-check=full ./escape_test

Turning in your assignment

Commit and push your code to GitHub. You must include the following:


Written by Laura Effinger-Dean. Lightly modified by Dave Musicant.