Exploiting buffer overflows

This assignment is an adaptation and contraction of Aaron Bauer's adaptation of a lab developed for Carnegie Mellon University's 15-213 (Introduction to Computer Systems) course, the course for which our textbook was written.

You may work alone or with one other classmate on this assignment.

Goals

Rubric

Maximum score: 25 points

10 - phase 1
10 - phase 2
5 - phase 3

Your progress through the phases will be automatically tracked as it was during the bombs assignment. For this assignment, your entire score will be based on your completion of the three phases of the assignment, and you don't need to hand in anything else.

Check your progress here: http://cs208.mathcs.carleton.edu:1866/progress

Get your attack target

This assignment involves generating three attacks on a program that has a security vulnerability. As in the bombs lab, you will analyze a pre-compiled executable and devise appropriate inputs to achieve particular goals.

Sections 3.10.3 and 3.10.4 of the textbook will provide useful reference material for this assignment.

You can obtain your target by filling out the form at http://cs208.mathcs.carleton.edu:1866/. You will need to be on campus or connected to the Carleton VPN to access this page. Once you have filled out and submitted the form, the server will build your target and return it to your browser in a file named targetN.tar, where N is the unique ID of your target.

Your targetN.tar contains a whole bunch of files because we're doing only a portion of the original lab. The files you will need are:

About ctarget

ctarget reads strings from standard input. It does so using the function getbuf:

    unsigned getbuf() {
        char buf[BUFFER_SIZE];
        Gets(buf);
        return 1;
    }

The function Gets is similar to the standard library function gets—it reads a string from standard input (terminated by ‘\n’ or end-of-file) and stores it (along with a null terminator) at the specified destination address. In this code, you can see that the destination is an array buf, declared as having BUFFER_SIZE bytes. At the time your targets were generated, BUFFER_SIZE was a compile-time constant specific to your version of the program.

Functions Gets and gets have no way to determine whether their destination buffers are large enough to store the string they read. They simply copy sequences of bytes, possibly overrunning the bounds of the storage allocated at the destinations.
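To make the missing bounds check concrete, here is a minimal sketch of a Gets-style copy loop. The function name copy_line_unchecked is hypothetical, and it reads from an in-memory string rather than standard input so that it is easy to experiment with; it is not the source of Gets in ctarget.

```c
#include <stddef.h>

/* Hypothetical Gets-style copy: copies bytes from input into dest until a
   newline or the end of the input, then adds a null terminator.
   There is no size parameter, so the function cannot tell when dest is full.
   Returns the number of bytes written, terminator included. */
size_t copy_line_unchecked(const char *input, char *dest) {
    size_t i = 0;
    while (input[i] != '\0' && input[i] != '\n') {
        dest[i] = input[i];   /* no bounds check: the caller must size dest */
        i++;
    }
    dest[i] = '\0';
    return i + 1;
}
```

The only protection is for the caller to compare the return value against the buffer size after the fact, by which point any overrun has already happened.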

(Note that getbuf() is really weird, in that it reads data into buf, and then just returns. That seems dumb, right? What you need to imagine is that this same operation—declare a buffer and read data into it—is happening in the context of a real-life program, where the dangerous Gets(buf) line is followed by code that does something important with the contents of buf. This scenario is disturbingly common in important code that runs the world.)

If the string typed by the user and read by getbuf is sufficiently short, it is clear that getbuf will return 1, as shown by this execution example, in which the text after "Type string:" is the user's input (note also that your cookie value will be whatever is in your cookie.txt file):

    ./ctarget
    Cookie: 0x1a7dd803
    Type string: Keep it short!
    No exploit. Getbuf returned 0x1
    Normal return

Typically an error occurs if you type a long string:

    ./ctarget
    Cookie: 0x1a7dd803
    Type string: This is not a very interesting string, but it has the property ...
    Ouch!: You caused a segmentation fault!
    Better luck next time
    FAILED

As the error message indicates, overrunning the buffer typically causes the program state to be corrupted, leading to a memory access error.
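The corruption can be modeled safely without touching a real stack. The sketch below uses a pretend frame (a byte array holding a small buffer followed directly by a saved return address) and shows that an unchecked copy of a long string changes the saved address. All names and sizes here are hypothetical; the layout of getbuf's real frame is something you will need to work out from the disassembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for illustration; the real BUFFER_SIZE differs per target. */
#define DEMO_BUFFER_SIZE 8
#define DEMO_FRAME_SIZE (DEMO_BUFFER_SIZE + sizeof(uint64_t))

/* A pretend stack frame: a buffer followed directly by a saved return address. */
typedef struct {
    unsigned char bytes[DEMO_FRAME_SIZE];
} demo_frame;

/* Zero the frame and store a saved return address just past the buffer. */
void demo_frame_init(demo_frame *f, uint64_t return_address) {
    memset(f->bytes, 0, sizeof f->bytes);
    memcpy(f->bytes + DEMO_BUFFER_SIZE, &return_address, sizeof return_address);
}

/* Copy input without checking it against the buffer size, as Gets does.
   The copy is clamped to the whole frame only so the demo stays well defined. */
void demo_unchecked_copy(demo_frame *f, const char *input) {
    size_t n = strlen(input) + 1;              /* include the null terminator */
    if (n > sizeof f->bytes) n = sizeof f->bytes;
    memcpy(f->bytes, input, n);
}

/* Read back the saved return address from the pretend frame. */
uint64_t demo_saved_return(const demo_frame *f) {
    uint64_t addr;
    memcpy(&addr, f->bytes + DEMO_BUFFER_SIZE, sizeof addr);
    return addr;
}
```

After demo_frame_init(&f, 0x401000), copying "short" leaves demo_saved_return(&f) unchanged, while copying a string longer than the buffer replaces it with bytes from the string. In the real program, returning to such an address is what usually triggers the segmentation fault.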

ctarget takes the following command-line arguments:

Your job

Your task is to be clever with the strings you feed ctarget so that it does interesting (and unintended) things. These are called exploit strings. As with the bomb-defusing assignment, you are looking for one successful exploit string for each of the three phases of this assignment.

Your exploit strings will typically contain byte values that do not correspond to the ASCII values for printable characters. The program hex2raw will enable you to generate these raw strings. See Appendix A for more information on how to use hex2raw.

When you have correctly solved one of the phases, your target program will tell you so. For example:

    cat ctarget.phase2 | ./hex2raw | ./ctarget
    Cookie: 0x1a7dd803
    Type string:Touch2!: You called touch2(0x1a7dd803)
    Valid solution for level 2 with target ctarget
    PASS: Sent exploit string to server to be validated.
    NICE JOB!

Unless you run ctarget with the -q flag, your exploit string will be sent to the assignment server and tested. The server will then update your status on the progress page.

There is no penalty for making mistakes in this assignment. Feel free to mess with ctarget using any strings you like.

A few more observations:

Phase 1: make ctarget call the wrong function

For phase 1, your exploit string will redirect the program to execute an existing function that it's not intended to execute. Function getbuf is called within ctarget by a function test:

    1 void test() {
    2     int val;
    3     val = getbuf();
    4     printf("No exploit. Getbuf returned 0x%x\n", val);
    5 }

When getbuf executes its return statement, the program ordinarily resumes execution within function test (at line 4 of this function). We want to change this behavior. Within the file ctarget, there is code for a function touch1:

    void touch1() {
        vlevel = 1; /* Part of validation protocol */
        printf("Touch1!: You called touch1()\n");
        validate(1);
        exit(0);
    }

Your task is to get ctarget to execute the code for touch1 when getbuf executes its return statement, rather than returning to test. Note that your exploit string may also corrupt parts of the stack not directly related to this modified return, but that's OK with us, since touch1 causes the program to exit directly.
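One detail that is easy to get wrong when writing an address into an exploit string is byte order: x86-64 is little-endian, so an 8-byte address sits in memory least significant byte first. The sketch below formats a 64-bit value in that order, in the hex format that hex2raw accepts. The function name format_le64 is hypothetical, and the address in the usage note is an arbitrary placeholder, not the address of touch1 in your target.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a 64-bit value as eight space-separated hex
   bytes, least significant byte first (the order x86-64 uses in memory).
   out must have room for at least 24 characters (23 plus the terminator). */
void format_le64(uint64_t value, char *out) {
    for (int i = 0; i < 8; i++) {
        unsigned byte = (unsigned)((value >> (8 * i)) & 0xff);
        out += sprintf(out, i == 7 ? "%02x" : "%02x ", byte);
    }
}
```

For example, format_le64(0x4011f0, s) fills s with "f0 11 40 00 00 00 00 00".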

Some Advice:

Phase 2: make ctarget call the wrong function with an int parameter

Phase 2 involves placing a small amount of machine-language code on the stack as part of your exploit string, and then inducing the program to execute this injected code.

Within ctarget there is code for a function touch2:

    void touch2(unsigned val) {
        vlevel = 2; /* Part of validation protocol */
        if (val == cookie) {
            printf("Touch2!: You called touch2(0x%.8x)\n", val);
            validate(2);
        } else {
            printf("Misfire: You called touch2(0x%.8x)\n", val);
            fail(2);
        }
        exit(0);
    }

Your task is to get ctarget to execute the code for touch2 rather than returning to test. In this case, however, you must make it appear to touch2 as if you have passed your cookie as its argument.

Some Advice:

Phase 3: make ctarget call the wrong function with a string parameter

Like Phase 2, Phase 3 involves a code injection attack, but this time passing a string rather than an integer as a parameter.

ctarget contains functions hexmatch and touch3:

    /* Compare string to hex representation of unsigned value */
    int hexmatch(unsigned val, char *sval) {
        char cbuf[110];
        /* Make position of check string unpredictable */
        char *s = cbuf + random() % 100;
        sprintf(s, "%.8x", val);
        return strncmp(sval, s, 9) == 0;
    }

    void touch3(char *sval) {
        vlevel = 3; /* Part of validation protocol */
        if (hexmatch(cookie, sval)) {
            printf("Touch3!: You called touch3(\"%s\")\n", sval);
            validate(3);
        } else {
            printf("Misfire: You called touch3(\"%s\")\n", sval);
            fail(3);
        }
        exit(0);
    }

Your task is to get ctarget to execute the code for touch3 rather than returning to test. You must make it appear to touch3 as if you have passed a string representation of your cookie as its parameter.
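To see what "string representation" means here, note that hexmatch formats the cookie with "%.8x" (eight lowercase hex digits, no "0x" prefix) and compares 9 bytes, so the null terminator has to match too. The sketch below produces that string and then renders its bytes in the hex format hex2raw accepts. Both function names are hypothetical, and the cookie in the usage note is the example value from this handout, not yours.

```c
#include <stdio.h>

/* Hypothetical helper: format a cookie the way hexmatch does, as eight
   lowercase hex digits with no "0x" prefix.
   out must hold at least 9 bytes (8 digits plus the null terminator). */
void cookie_to_string(unsigned cookie, char out[9]) {
    snprintf(out, 9, "%.8x", cookie);
}

/* Hypothetical helper: render the 9 bytes of that string (terminator
   included) as space-separated hex pairs.
   out must hold at least 27 bytes (26 characters plus the terminator). */
void cookie_string_to_hex(const char str[9], char *out) {
    for (int i = 0; i < 9; i++) {
        out += sprintf(out, i == 8 ? "%02x" : "%02x ", (unsigned char)str[i]);
    }
}
```

For the example cookie 0x1a7dd803, cookie_to_string yields "1a7dd803" and cookie_string_to_hex yields "31 61 37 64 64 38 30 33 00". Where those bytes need to live in memory when touch3 runs is the part left for you to figure out.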

Some Advice:

Appendix A: Using hex2raw

hex2raw is a command-line utility that takes as input a hex-formatted string. In this format, each byte value is represented by two hex digits. For example, the string “012ABC” could be entered in hex format as "30 31 32 41 42 43 00".

The hex characters you pass to hex2raw should be separated by whitespace (blanks or newlines). We recommend separating different parts of your exploit string with newlines while you’re working on it. hex2raw supports C-style block comments, so you can mark off sections of your exploit string. For example:

48 c7 c1 f0 11 40 00 /* mov $0x4011f0,%rcx */

Be sure to leave space around both the starting and ending comment strings ("/*", "*/"), so that the comments will be properly ignored. If you generate a hex-formatted exploit string in the file ctarget.phase1, you can apply the raw string to ctarget in several different ways:

  1. You can set up a series of pipes to pass the string through hex2raw:

    cat ctarget.phase1 | ./hex2raw | ./ctarget
  2. You can store the raw string in a file and use I/O redirection:

    ./hex2raw < ctarget.phase1 > ctarget.phase1.raw
    ./ctarget < ctarget.phase1.raw

    This approach can also be used when running from within GDB:

    gdb ctarget
    (gdb) run < ctarget.phase1.raw
  3. You can store the raw string in a file and provide the filename as a command-line argument:

    ./hex2raw < ctarget.phase1 > ctarget.phase1.raw
    ./ctarget -i ctarget.phase1.raw

    This approach also can be used when running from within GDB.
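The hex format described in this appendix can be sketched as a small parser. This is only an illustration of the format (two hex digits per byte, whitespace between bytes, C-style block comments skipped); it is not the source of the real hex2raw, and the function names are hypothetical. It is also slightly more lenient than the description above, since it accepts byte pairs that are not separated by whitespace.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Hypothetical sketch of the conversion: turn hex-formatted text into raw
   bytes. Returns the number of bytes written, or -1 if the input is
   malformed or does not fit in out. */
long hex_text_to_raw(const char *text, unsigned char *out, size_t out_cap) {
    size_t n = 0;
    const char *p = text;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;                                   /* skip separators */
        } else if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            if (end == NULL) return -1;            /* unterminated comment */
            p = end + 2;
        } else if (isxdigit((unsigned char)p[0]) &&
                   isxdigit((unsigned char)p[1])) {
            if (n >= out_cap) return -1;           /* output buffer full */
            out[n++] = (unsigned char)(hex_digit(p[0]) * 16 + hex_digit(p[1]));
            p += 2;
        } else {
            return -1;                             /* not a hex pair */
        }
    }
    return (long)n;
}
```

Given the input "30 31 32 41 42 43 00", this writes the seven bytes of the string "012ABC" (terminator included) and returns 7.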

Appendix B: Generating Byte Codes

Using gcc as an assembler and objdump as a disassembler makes it convenient to generate the byte codes for instruction sequences. For example, suppose you write a file example.s containing the following assembly code:

    # Example of hand-generated assembly code
    pushq $0xabcdef     # Push value onto stack
    addq $17,%rax       # Add 17 to %rax
    movl %eax,%edx      # Copy lower 32 bits to %edx

The code can contain a mixture of instructions and data. Anything to the right of a ‘#’ character is a comment. You can now assemble and disassemble this file:

    gcc -c example.s
    objdump -d example.o > example.d

The generated file example.d contains the following:

    example.o:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <.text>:
       0: 68 ef cd ab 00    pushq $0xabcdef
       5: 48 83 c0 11       add $0x11,%rax
       9: 89 c2             mov %eax,%edx

The lines at the bottom show the machine code generated from the assembly language instructions. Each line has a hexadecimal number on the left indicating the instruction’s starting address (starting with 0), while the hex digits after the ‘:’ character indicate the byte codes for the instruction. Thus, we can see that the instruction pushq $0xabcdef has hex-formatted byte code 68 ef cd ab 00.

From this file, you can get the byte sequence for the code:

68 ef cd ab 00 48 83 c0 11 89 c2

This string can then be passed through hex2raw to generate an input string for the target programs. Alternatively, you can edit example.d to omit extraneous values and to contain C-style comments for readability, yielding:

    68 ef cd ab 00 /* pushq $0xabcdef */
    48 83 c0 11    /* add $0x11,%rax */
    89 c2          /* mov %eax,%edx */

This is also a valid input you can pass through hex2raw before sending to one of the target programs.

A note about learning vulnerability exploitation

This stuff feels weird at first. But you can do it! Draw pictures, step slowly through the code, take a look at the stack and the registers, figure out where function parameters are stored, and just gradually put together a picture of what's going on for yourself. Talk to each other, talk to me, step away and take a walk, and come back to it again.

Another note about learning vulnerability exploitation

Learning how to exploit software vulnerabilities helps you to understand those vulnerabilities and how to prevent them. At the same time, of course, this knowledge could potentially be used to harm other people. Don't do that.

Have fun!