Assembly to C
Starter code: asm-to-c-package.tar
Upload via Moodle as: asm-to-c.tar
Goals
- Get familiar with x86_64 assembly language basics
- Practice recognizing how C code constructs get translated into equivalent assembly language
Rubric
Background
What does a compiler do?
That question has a long, complicated answer. But in brief, our compiler (gcc) takes C sources as input, and produces an executable program as output. The executable program contains, among other things, machine language instructions whose behavior implements the computations articulated in the original C code.
Machine language is just bits, and is thus hard to read. So if we want to understand the relationship between C code and the machine language generated from it, we're better off asking gcc to output assembly language code instead. Assembly isn't particularly easy to read, but it's a lot easier than machine language. And as a general rule, each assembly language instruction corresponds to one machine language instruction, and vice versa. There are some exceptions (e.g. sometimes one assembly language instruction is an alias for a sequence of two or three machine language instructions), but as a rough guide, you can think of assembly and machine language instructions as being in one-to-one correspondence. As a result, by understanding the assembly language generated by gcc, we will be very close to understanding the machine language as well.
For this assignment, you are going to practice understanding the correspondence between simple C code and its equivalent assembly language by solving a sequence of puzzles. For each puzzle, you will read some given assembly language and try to come up with the original C code that generated it. This is a form of reverse engineering, and it's pretty fun.
Though we could use gcc on mantis, for this assignment we're instead going to use an extremely handy tool called the Compiler Explorer. You'll put some C code into the input panel, and the output panel will show you the assembly language generated by the selected compiler. As you adjust your C code, you'll be able to watch the changes in the assembly language, and then compare your assembly code to the puzzle's code.
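As a first experiment, you might paste a tiny function like the one below into the input panel. This is a hypothetical warm-up, not one of the puzzles, and the name sum_to is made up for illustration. On x86-64 Linux, the first integer argument arrives in the rdi register and the return value leaves in rax, so look for those registers (or their 32-bit halves, edi and eax), plus a compare and a conditional jump that implement the loop. The exact instructions you see will depend on the compiler version and options.

```c
/* Hypothetical warm-up function (not one of the puzzles).
 * Adds up the integers 1 through n and returns the total.
 * For n <= 0 the loop body never runs, so the result is 0. */
long sum_to(long n) {
    long total = 0;
    for (long i = 1; i <= n; i++) {
        total += i;   /* watch for an add instruction inside the loop */
    }
    return total;
}
```

Try small edits, such as changing long to int or i <= n to i < n, and watch which assembly lines change in response.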
Your assignment
In the asm-to-c-package.tar package, you will find several files named puzzle0.asm, puzzle1.asm, etc. For each puzzle, your job will go like this:
- Study the puzzleN.asm code to understand what it does. Ideally, you'll understand it holistically rather than just line-by-line, so you'll be able to describe the code's purpose in a single short sentence.
- Write an equivalent C function (or sometimes two functions). Then use the Compiler Explorer to compile it to assembly, and see how closely your assembly matches the contents of puzzleN.asm. Refine your code until you feel the match-up is close enough. (Exact matches are great, of course, but close matches might also be correct. Very slight changes in source code can change the assembly code, even if the code's end result is unchanged.)
- Put your C code into a file called puzzleN.c (i.e. puzzle0.c for puzzle0.asm, puzzle1.c for puzzle1.asm, etc.). In a comment at the top of puzzleN.c, write a sentence or two explaining what this code does. You may then add anything else you want to say about the differences between your assembly code and the contents of puzzleN.asm.
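A puzzleN.c file might be laid out roughly as follows. The function here is a made-up placeholder (larger_of is not the answer to any puzzle), and the note about differences is only an example of the kind of remark you could include. What matters is the shape: an explanatory comment at the top, followed by the function or functions you wrote.

```c
/*
 * puzzleN.c
 *
 * Returns the larger of its two integer arguments.
 *
 * Differences from puzzleN.asm (example note only): my assembly
 * orders the two return paths differently, but the result is the same.
 */
int larger_of(int a, int b) {
    if (a > b) {
        return a;
    }
    return b;
}
```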
Compiler Explorer settings
To create the puzzles, I used Compiler Explorer with the following settings. You should use the same settings.
- Choose the C language above the input panel
- Choose the x86-64 gcc 11.2 compiler above the output panel
- Enter "-Og" (that's a minus sign, a capital O for "Optimization" and a little g) in the "Compiler options" field above the output panel
- In the "Filters" dropdown list above the output panel, check "Unused labels", "Library functions", "Directives", and "Comments" to remove all sorts of clutter from your assembly output.
Here's how all of that looks.