Assignment 4 - Assembly-to-C Puzzles

Due: Thursday, October 12, at 10:00pm
Update: This assignment is now due at 10:00pm on Tuesday, October 17, instead of Thursday, October 12. Please still give yourself a Fall Break!

Starter code: asm-to-c-package.tar
Upload solutions via Moodle as: asm-to-c.tar (see below)

Goals

This assignment is designed to help you with the following:

  • getting familiar with the basics of x86-64 assembly language
  • practicing how C code constructs are translated into equivalent assembly code

Rubric

This assignment is worth a total of 16 points. They are allocated as follows:

  1 - author name (and collaborators) in a comment at the top of each .c file
  0 - puzzle0 (we'll do this together in class)
2.5 - puzzle1
2.5 - puzzle2
2.5 - puzzle3
2.5 - puzzle4
2.5 - puzzle5
2.5 - puzzle6

Note that for each of puzzle1–puzzle6, the 2.5 points are distributed into 1.5 points for code and 1 point for an explanation.

You are expected to be able to generate your .tar file for submission. Read the “Handing it in” section early in case you want to verify that you’re doing it correctly.

Background

What does a compiler do?

That question has a long, complicated answer. In brief, though, our compiler (gcc) takes C source files as input and produces an executable program as output. The executable program contains, among other things, machine language instructions whose behavior implements the computations articulated in the original C code.

Machine language is just bits, like anything else in the computer, but this means it’s hard to read. So, if we want to understand the correspondence between C code and its equivalent machine language, we’re better off asking gcc to output assembly language code instead. Assembly isn’t particularly easy to read either, but it’s a lot easier than machine language.

As a general rule, each assembly language instruction corresponds to exactly one machine language instruction, and vice versa. There are some exceptions (e.g., sometimes one assembly language instruction is an alias for a sequence of two or three machine language instructions), but as a rough guide, you can think of assembly and machine code as being in one-to-one correspondence. As a result, by understanding the assembly generated by gcc, we will be very close to understanding the machine code as well.

This assignment

For this assignment, you are going to practice understanding the correspondence between simple C code and its equivalent assembly language by solving a sequence of puzzles. For each puzzle, you will read some given assembly and try to come up with the original C code that generated it. This is a simple form of reverse engineering, and it’s pretty fun, too!

Although we could use gcc on mantis, for this assignment we’re instead going to use an extremely handy tool called the Compiler Explorer. You’ll put some C code into the input panel, and the output panel will show you the assembly generated by the selected compiler. As you adjust your C code, you’ll be able to watch the changes in the assembly, and then compare your assembly code to the puzzle’s code.
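
To make the C-to-assembly correspondence concrete, here is a minimal sketch of the kind of input you might try first. The function name add is hypothetical and is not one of the puzzles. The assembly in the comment is only an approximation of what an x86-64 gcc with optimization enabled (for example -O1) tends to produce; your output will vary with the compiler version and flags, and without optimization it will be noticeably longer.

```c
// Hypothetical warm-up function (not taken from any puzzle).
// Paste it into the Compiler Explorer input panel and look at the output panel.
int add(int a, int b) {
    return a + b;
}

// With optimization enabled, the output may look roughly like this
// (Intel syntax; details depend on compiler version and flags):
//
//   add:
//           lea     eax, [rdi+rsi]   ; a arrives in edi, b in esi; the sum goes in eax
//           ret                      ; the return value is whatever is in eax
```

Try changing the + to a - or a *, or adding a third parameter, and watch which lines of the assembly change.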

Your job

In the provided asm-to-c-package.tar package, you will find several files named puzzle0.asm, puzzle1.asm, etc. For each puzzle, you need to do the following:

  1. Study the puzzleN.asm code to understand what it does. You should try to understand it holistically rather than just line-by-line, so you’ll be able to describe the code’s purpose in a single short sentence.

  2. Write an equivalent C function (or sometimes two functions). Then, use the Compiler Explorer to compile your code to assembly, and see how closely your assembly matches the contents of puzzleN.asm. Refine your code until you feel the match-up is close enough. (Exact matches are great, of course, but close matches might also be correct. Very slight changes in source code can lead to changes in the assembly code, even if the code’s end result is unchanged.)

  3. Write a one-sentence description of the purpose of the code for the current puzzle. For example, you can imagine a description like: “This function takes one positive integer parameter n and returns the nth prime number.”

  4. Put your code in a file named puzzleN.c, and put your name(s) and one-sentence description in a comment at the top of the source code.
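
Putting steps 2 through 4 together, a finished puzzleN.c might be laid out something like the sketch below. Everything in it is a placeholder: the author line, the function name sum_to_n, and the function body are made up for illustration and do not correspond to any actual puzzle. The exact comment layout is also just one reasonable choice.

```c
/*
 * puzzleN.c
 * Author: Your Name (collaborators: Partner Name)
 *
 * Description: This function takes one nonnegative integer parameter n
 * and returns the sum 1 + 2 + ... + n.
 */

// Hypothetical solution body; a real puzzle will have different code.
int sum_to_n(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;   // accumulate each value from 1 up to n
    }
    return total;
}
```

Note that the file contains only the function(s) the puzzle needs, which is also what you would paste into the Compiler Explorer.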

Getting started

You should refer back to Lab 4 if you have any issues or questions about getting Compiler Explorer set up.

We will solve puzzle0.asm together in class. Ask any questions you have about this process!

In working through these puzzles, I encourage you to try out tons of simple C programs, for example ones with if statements, with for/while loops, and any other constructs you can think of, to see what they look like in assembly. Don’t be afraid to experiment!
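
As a starting point for that kind of experimentation, here is a small set of hypothetical functions, one per construct. None of them comes from a puzzle, and the names are arbitrary. Compile them one at a time and look for the comparison and jump instructions that each construct produces.

```c
// An if statement: look for a comparison followed by a conditional
// jump or a conditional move in the assembly.
int abs_value(int x) {
    if (x < 0) {
        return -x;
    }
    return x;
}

// A while loop: counts how many times n can be halved before reaching 0.
// Look for a jump that goes backward to an earlier label.
int halvings(int n) {
    int steps = 0;
    while (n > 0) {
        n /= 2;
        steps++;
    }
    return steps;
}

// A for loop over an array: look for how the index or pointer advances
// on each iteration and how the loop bound is checked.
int sum_array(const int *values, int length) {
    int total = 0;
    for (int i = 0; i < length; i++) {
        total += values[i];
    }
    return total;
}
```

Small rewrites are worth trying too, such as turning the for loop into an equivalent while loop, to see whether the assembly changes at all.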

Note: You can follow the instructions below to build the .tar file you will submit, containing only puzzle0.c, and email it to me by Tuesday, October 10th at 10pm. I’ll reply on Wednesday to let you know if you’ve successfully built the .tar file. (I won’t pre-grade any of your solutions, though.)

Handing it in

Put all of your .c files in a folder named asm-to-c (you can rename the folder you un-tar’ed, or make a new one). Then, from inside that folder, type the following commands to build your submission:

cd ..
tar -cvf asm-to-c.tar asm-to-c/

To submit this assignment, you should upload your asm-to-c.tar to the ASM-to-C Puzzles assignment on Moodle.