Assignment 6 - Exploiting Buffer Overflows
Due: Tuesday, October 31, at 10:00pm
This assignment is an adaptation of Jeff Ondich’s adaptation+contraction of Aaron Bauer’s adaptation of a lab developed for Carnegie Mellon University’s 15-213 (Introduction to Computer Systems) course, which is the course for which our textbook was written.
You may work on this assignment alone or with a single partner.
Goals
This assignment is designed to help you with the following:
- learning some of the ways that carefully designed malicious input can cause a vulnerable program to behave in ways not intended by the program’s creators
- using that knowledge to learn how to avoid creating similar vulnerabilities in your own programs
- learning about how x86-64 machine language is structured
- deepening your understanding of the stack structures used to implement C function-calling in x86-64
- further strengthening your gdb skills
Rubric
The maximum score for this assignment is 15 points:
5 - phase #1
5 - phase #2
5 - phase #3
Your progress through the phases will be automatically tracked as it was during your zoo escape.
Submission
For this assignment, your entire score will be based on your completion of the three phases, and you don’t need to hand in anything else.
Check your progress here: http://cs208.mathcs.carleton.edu:1866/progress.
Buffer overflow attacks
Suppose you write a program that includes this function:
int some_function(int a, int b)
{
    int j;
    char str[100];
    int k;
    /* other stuff here */
}
When you call some_function, a stack frame gets allocated on the system stack (typically via sub $0xSOMETHING, %rsp) to store the return address, the local variables, and the values saved from the miscellaneous registers that some_function needs to use (so those registers, like %rbx and %rbp, can be restored to their original values before the function returns). Depending on the specific code in some_function, the parameters a and b might also get stored in the stack frame.
As you might imagine, it would be typical for a compiler to position the local variables j, str, and k adjacent to one another in the memory occupied by the stack frame.
Suppose then that some_function has been written sloppily and reads user input into str without checking to make sure the user input fits in the 100 bytes of str. The excess bytes in the user input will spill over the end of str and corrupt the contents of j or k, depending on whether one of them is stored in memory immediately after str. If the user input is long enough, it might also corrupt a lot more of the stack (including the stack frames of previous function calls).
As mentioned above, another piece of data in the stack frame is the return address. That is, the stack frame includes the 8-byte address of the instruction immediately following the call instruction that brought us to some_function in the first place. When some_function executes its ret instruction to return to the function from which it was called, ret uses the return address as the destination to which to return.
If our users are clever enough, they can type input that will:
- overflow str, and
- overwrite the return address with the address of some other function.
This would cause some_function to “return” to the wrong place, which could lead to all sorts of mayhem depending on what code resides at that wrong place.
This kind of sneaky user is engaged in what is known as a buffer overflow attack (closely related to the stack overflow from which many programmers’ favorite website derives its name). In this assignment, you are going to play the role of buffer-overflow attacker. In the process, you’ll get more familiar with the way function calling works at the assembly language level.
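To make the overwrite concrete without actually smashing a real stack, here is a small simulation in C. It is only a sketch under simplifying assumptions: the struct fake_frame, the functions careless_read and demo_overwrite, and the addresses are all made up for illustration, and a real stack frame may have padding or saved registers between the buffer and the return address.

```c
#include <stdint.h>
#include <string.h>

#define STR_SIZE 16   /* stand-in for the 100-byte str */

/* Hypothetical model of a stack frame: a buffer followed directly by a
   saved return address. */
struct fake_frame {
    char     str[STR_SIZE];
    uint64_t return_address;
};

/* Copy len input bytes to the start of the frame without checking len
   against STR_SIZE, the way a careless reader would. The copy is capped
   at the size of the whole frame so the simulation itself stays in bounds. */
void careless_read(struct fake_frame *frame, const unsigned char *input, size_t len)
{
    if (len > sizeof *frame)
        len = sizeof *frame;
    memcpy(frame, input, len);
}

/* Feed the frame STR_SIZE filler bytes followed by the 8 bytes of
   new_address, lowest-order byte first, and report the return address
   that the frame holds afterward. */
uint64_t demo_overwrite(uint64_t new_address)
{
    struct fake_frame frame;
    unsigned char input[STR_SIZE + 8];

    memset(&frame, 0, sizeof frame);
    frame.return_address = 0x401000;      /* made-up legitimate address */

    memset(input, 'A', STR_SIZE);         /* fills str exactly */
    for (int i = 0; i < 8; i++)           /* spills into return_address */
        input[STR_SIZE + i] = (unsigned char)(new_address >> (8 * i));

    careless_read(&frame, input, sizeof input);
    return frame.return_address;
}
```

On a little-endian machine such as x86-64, demo_overwrite(0x4017c0) returns 0x4017c0: the bytes that did not fit in str landed on top of the saved return address.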
What to do
For this assignment, you will need to:
- Get your attack target: Fill out the form at http://cs208.mathcs.carleton.edu:1866/ to download your targetN.tar file. This file is analogous to the zooN.tar file you obtained during the zoo escape assignment.
- Move your targetN.tar file to mantis: Copy the file and expand it there (with tar xvf targetN.tar). The targetN.tar file contains a whole bunch of files, some of which you will not need because we’re doing only a portion of the original lab.
- Explore the files we’re using: The files you will need are:
  - ctarget: the executable program that you will attack; this program is vulnerable to code injection attacks
  - ctarget.phaseN for N = 1, 2, and 3: files where you can put your solution to the phases as you work (analogous to the passcodes.txt file from the zoo assignment)
  - cookie.txt: an 8-digit hex code that you will use as a unique identifier in your attacks
  - hex2raw: a utility program that will help you to generate attack strings
  You can ignore farm.c and all files of the form rtarget*.
- Complete each phase: For each phase, you’ll want to follow these sub-steps:
  - Use gdb (and gcc for some phases) to figure out what bytes you want to write into (and past) the input buffer so as to corrupt the stack frame in whatever way is required to achieve your goal.
  - Store the bytes you want to input as space-delimited bytes (2 hex digits each) in the ctarget.phaseN file (with N for whichever phase you’re working on).
    Note that your exploit strings must not contain the byte value 0x0a, as this is the ASCII code for newline ('\n') and ctarget will consider it the end of your input.
  - Perform your attack like so:
    cat ctarget.phaseN | ./hex2raw | ./ctarget
    If you’re successful, ctarget will let you know. Note that we’ll see more about what cat and pipes (|) do in the next couple of weeks.
  - Make sure you run the attack on mantis so that your success gets recorded.
  - Check out your progress at http://cs208.mathcs.carleton.edu:1866/progress.
    Unless you run ctarget with the -q flag (to run it “quietly”), your exploit string will be sent to the assignment server and tested. The server will then update your status on the progress page. There is no penalty for making mistakes in this assignment, nor are mistakes even recorded anywhere. Experiment at will!
    The progress server expects you to work on mantis.mathcs.carleton.edu. You can certainly do your work on any Linux x86-64 computer, but eventually, you’ll need to submit your solution to each phase from mantis.
- Celebrate!
General information about ctarget
The ctarget program reads strings from standard input. It does so using the function getbuf:
unsigned getbuf()
{
    char buf[BUFFER_SIZE];
    Gets(buf);
    return 1;
}
The function Gets is similar to the standard library function gets: it reads a string from standard input (terminated by '\n' or EOF) and stores it, including a null terminator, at the specified destination address. In this code, you can see that the destination is an array buf, declared as having BUFFER_SIZE bytes. At the time your ctarget was generated, BUFFER_SIZE was a compile-time constant unique to your version of the program.
The functions Gets and gets have no way to determine whether their destination buffers are large enough to store the string they read. They just keep copying input bytes to the buffer until they encounter '\n' or EOF. People who program using these input functions are at risk of buffer-overflow attacks.
(Note that getbuf() is really weird in a “this-is-a-classroom-exercise” sort of way. It reads data into the local variable buf, and then immediately returns without doing anything with the data. That seems silly, right? What you should imagine is that this same operation (declare a buffer and read data into it) is happening in the context of a real-life program, where the dangerous Gets(buf) line is followed by code that does something important with the contents of buf. This scenario, uncontrolled input followed by critical computation, is disturbingly common in important code that runs the world.)
For this assignment, we will refer to the bytes entered by the user as the exploit string. If your exploit string is sufficiently short, it won’t overrun the buffer buf, so getbuf will finish normally and return 1, as shown by this execution example (the user input is Look! A beaver!):
./ctarget
Cookie: 0x1a7dd803
Type string: Look! A beaver!
No exploit. Getbuf returned 0x1
Normal return
Typically, though, an error occurs if you type a long string:
./ctarget
Cookie: 0x1a7dd803
Type string: Three little beavers jumping on the bed, one fell off and bumped her head, so Sadie called the doctor and the doctor said, "Why are there beavers in this bed?"
Ouch!: You caused a segmentation fault!
Better luck next time
FAILED
As the error message indicates, overrunning the buffer typically causes the program state to be corrupted, leading to a memory access error.
Because many of the bytes you are going to want to feed to ctarget will not be printable ASCII characters, you will usually use the technique mentioned in the previous section and elaborated in Appendix A: store your bytes in hex in a text file called ctarget.phaseN and use hex2raw to convert your text data to the desired bytes before piping (via |) the result to ctarget’s stdin. Here, for example, is what you might see if you have a correct solution to Phase 2:
cat ctarget.phase2 | ./hex2raw | ./ctarget
Cookie: 0x1a7dd803
Type string:Touch2!: You called touch2(0x1a7dd803)
Valid solution for level 2 with target ctarget
Pass: Sent exploit string to server to be validated.
NICE JOB!
Note that ctarget has some command-line flags that may be handy. You can execute ctarget -h to read about them.
Phase 1: Make ctarget call the wrong function
For Phase 1, your exploit string will redirect the program to execute an existing function that it’s not intended to execute. Function getbuf is called within ctarget by a function test:
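1 void test()
2 {
3     int val;
4     val = getbuf();
5     printf("No exploit. Getbuf returned 0x%x\n", val);
6 }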
When getbuf executes its return statement, the program ordinarily resumes execution within function test (at line 5 of this function). We want to change this behavior. Within the file ctarget there is code for a function touch1:
void touch1()
{
    vlevel = 1; /* part of validation protocol */
    printf("Touch1!: You called touch1()\n");
    validate(1);
    exit(0);
}
Your task is to get ctarget to execute the code for touch1 when getbuf executes its return statement, rather than returning to test. Note that your exploit string may also corrupt parts of the stack not directly related to this modified return statement, but that’s okay with us, given that touch1 causes the program to exit directly (via a call to exit(0)).
Some advice:
- All of the information you need to devise your exploit string for this phase can be determined by examining a disassembled version of ctarget. Use objdump -d ctarget to get this disassembled version.
  More specifically, I recommend using objdump -d ctarget > disassembly.s instead, so you can open the disassembled code in an editor like VS Code to make it easier to explore.
- The idea is to position a byte representation of the starting address for touch1 so that the ret instruction at the end of the code for getbuf will transfer control to touch1.
- Be careful about byte ordering.
- You will certainly want to use gdb to step the program through the last few instructions of getbuf to make sure it is doing what you want. You’ll be modifying the contents of the stack on purpose, so don’t forget x/1ss $rsp, x/20dw 0x1234567, and similar gdb commands to see whether you’re modifying it the way you intend.
- The placement of buf within the stack frame for getbuf depends on the value of the compile-time constant BUFFER_SIZE, as well as the stack allocation strategy used by gcc. You will need to examine the disassembled code to determine its position.
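Since byte ordering is an easy place to slip, here is a small C sketch that writes an 8-byte value as the space-separated hex pairs that hex2raw expects, lowest-order byte first. The function name format_le64 and the address used in the example are hypothetical; your own addresses will come from your disassembly.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the 8 bytes of value into out as space-separated two-digit hex
   pairs, lowest-order byte first. out needs room for 24 bytes:
   8 pairs, 7 spaces, and 1 null terminator. */
void format_le64(uint64_t value, char out[24])
{
    char *p = out;
    for (int i = 0; i < 8; i++) {
        unsigned byte = (unsigned)((value >> (8 * i)) & 0xff);
        p += sprintf(p, i == 0 ? "%02x" : " %02x", byte);
    }
}
```

For the made-up address 0x401234, format_le64 produces 34 12 40 00 00 00 00 00, which is the order in which those bytes sit in memory on x86-64.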
Phase 2: Make ctarget call the wrong function with an int parameter
Phase 2 involves placing a small amount of machine-language code on the stack as part of your exploit string, and then inducing the program to execute this injected code.
Within ctarget there is code for a function touch2:
void touch2(unsigned val)
{
    vlevel = 2; /* part of validation protocol */
    if (val == cookie)
    {
        printf("Touch2!: You called touch2(0x%.8x)\n", val);
        validate(2);
    }
    else
    {
        printf("Misfire: You called touch2(0x%.8x)\n", val);
        fail(2);
    }
    exit(0);
}
Your task is to get ctarget to execute the code for touch2 rather than returning to test. In this case, however, you must make it appear to touch2 as if you have passed your cookie as its argument.
Some advice:
- Your program’s cookie is in the file cookie.txt included in your targetN.tar.
- You will want to position a byte representation of the address of your injected code in such a way that the ret instruction at the end of the code for getbuf will transfer control to it.
- As you hopefully remember, the first argument to a function is passed in register %rdi. Your injected code should set %rdi to your cookie, and then use a ret instruction to transfer control to the first instruction in touch2.
- Do not attempt to use jmp or call instructions in your exploit code. The encodings of destination addresses for these instructions are difficult to formulate. Instead use ret instructions for all transfers of control, even when you are not returning from a call (recall that a ret pops the return address from the stack and then jumps to it).
- See the instructions in Appendix B on how to use tools to generate the byte-level representations of instruction sequences.
Phase 3: Make ctarget call the wrong function with a string parameter
Like Phase 2, Phase 3 involves a code injection attack, but this time it requires passing a string rather than an integer as a parameter.
The ctarget program contains functions hexmatch and touch3:
/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char *sval)
{
    char cbuf[110];
    /* Make the position of the check string unpredictable */
    char *s = cbuf + random() % 100;
    sprintf(s, "%.8x", val);
    return strncmp(sval, s, 9) == 0;
}

void touch3(char *sval)
{
    vlevel = 3; /* part of validation protocol */
    if (hexmatch(cookie, sval))
    {
        printf("Touch3!: You called touch3(\"%s\")\n", sval);
        validate(3);
    }
    else
    {
        printf("Misfire: You called touch3(\"%s\")\n", sval);
        fail(3);
    }
    exit(0);
}
Your task is to get ctarget to execute the code for touch3 rather than returning to test. You must make it appear to touch3 as if you have passed a string representation of your cookie as its parameter.
Some advice:
- You will need to include a string representation of your cookie in your exploit string. The string should consist of the eight hex digits (ordered from most to least significant) without the leading 0x.
- Recall that a string is represented in C as a sequence of bytes followed by a byte with value 0. Type man ascii on any Linux/Mac/WSL terminal to see the byte representations of the characters you need.
- Your injected code should set register %rdi to the address of your cookie string.
- When functions hexmatch and strncmp are called, they push data onto the stack, overwriting portions of memory that held the buffer used by getbuf. As a result, you will need to be careful where you place the string representation of your cookie.
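If looking up each character with man ascii gets tedious, a few lines of C can do the translation. This is only a sketch: the function name cookie_string_bytes is made up, and it assumes (as on mantis) that unsigned is 32 bits wide, so "%.8x" produces exactly eight characters, the same format hexmatch uses.

```c
#include <stdio.h>

/* Render cookie as a hex2raw-style byte list: the ASCII codes of its
   8-character lowercase hex string, followed by the terminating 0 byte.
   out needs room for 27 bytes: 9 pairs, 8 spaces, 1 null terminator. */
void cookie_string_bytes(unsigned cookie, char out[27])
{
    char text[9];
    snprintf(text, sizeof text, "%.8x", cookie);

    char *p = out;
    for (int i = 0; i < 9; i++)      /* i == 8 is the '\0' terminator */
        p += sprintf(p, i == 0 ? "%02x" : " %02x",
                     (unsigned)(unsigned char)text[i]);
}
```

For the sample cookie 0x1a7dd803 shown earlier, this gives 31 61 37 64 64 38 30 33 00. Note that a string is stored in reading order, character by character, so the little-endian reversal that applies to integers and addresses does not apply here.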
Appendix A: Using hex2raw
Provided with this assignment, hex2raw is a command-line utility that takes as input a hex-formatted string. In this format, each byte value is represented by two hex digits. For example, the string "012ABC" could be entered in hex format as "30 31 32 41 42 43 00".
The hex characters you pass to hex2raw should be separated by whitespace (spaces or newlines). We recommend separating different parts of your exploit string with newlines while you’re working on it. Conveniently, hex2raw supports C-style block comments, so you can mark off sections of your exploit string. For example:
48 c7 c1 f0 11 40 00 /* mov $0x4011f0, %rcx */
Be sure to leave space around both the starting and ending comment strings ("/*", "*/") so that the comments will be properly ignored.
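To demystify the format, here is a rough C sketch of the kind of conversion described above. It is not the source of hex2raw, whose implementation we have not shown; the function name hex_to_raw and its error handling are assumptions made for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Convert whitespace-separated two-digit hex pairs into raw bytes,
   skipping C-style block comments. Returns the number of bytes written
   to out, or -1 if the input is malformed or out (of size cap) is too
   small. */
long hex_to_raw(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    const char *p = text;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            if (end == NULL)
                return -1;                /* unterminated comment */
            p = end + 2;
        } else if (isxdigit((unsigned char)p[0]) &&
                   isxdigit((unsigned char)p[1])) {
            if (n == cap)
                return -1;                /* output buffer is full */
            char pair[3] = { p[0], p[1], '\0' };
            out[n++] = (unsigned char)strtoul(pair, NULL, 16);
            p += 2;
        } else {
            return -1;                    /* not a hex pair */
        }
    }
    return (long)n;
}
```

Given the text "30 31 32 41 42 43 00", this sketch produces the seven bytes of the C string "012ABC", terminator included.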
Don’t forget that byte order matters in many contexts. If, say, a phase is expecting a 4-byte int to be stored at a particular memory location, you’ll want to make sure you put the lowest-order byte of the int first in memory.
If you generate a hex-formatted exploit string in the file ctarget.phase1, you can provide the raw string to ctarget in a couple of different ways:
- You can set up a series of “pipes” to pass the string to hex2raw and pass its output to ctarget:
  cat ctarget.phase1 | ./hex2raw | ./ctarget
- You can store the raw string in a file and provide the filename as a command-line argument:
  ./hex2raw < ctarget.phase1 > ctarget.phase1.raw
  ./ctarget -i ctarget.phase1.raw
  This approach also can be used when running from within gdb. Note that the directions of the < and > are very important here.
Appendix B: Generating byte codes
For Phases 2 and 3, you need to send machine-language instructions as part of your input to ctarget. But how can you determine which bytes comprise your desired instructions?
You can use gcc as an assembler and objdump as a disassembler to make it convenient to generate the bytes in your desired instruction sequences. For example, suppose you write a file example.s containing the following assembly code:
# Example of hand-generated assembly code
    pushq $0xABCDEF     # push value onto stack
    addq  $17, %rax     # add 17 to %rax
    movl  %eax, %edx    # copy lower 32 bits of %rax to %edx
This code can contain a mixture of instructions and data. Anything to the right of a # character is a comment. You can now assemble and disassemble this file:
gcc -c example.s
objdump -d example.o > example.d
The generated file example.d contains the following:
example.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   68 ef cd ab 00          pushq  $0xabcdef
   5:   48 83 c0 11             add    $0x11,%rax
   9:   89 c2                   mov    %eax,%edx
The lines at the bottom show the machine code generated from the assembly language instructions. Each line has a hex number on the left indicating the instruction’s starting address (starting with 0), and the hex digits after the : character indicate the actual bytes that make up the machine-language version of the instruction. Thus, we can see that the instruction pushq $0xABCDEF has hex-formatted machine-language code 68 ef cd ab 00.
From this file, you can get the byte sequence for the entire code file:
68 ef cd ab 00 48 83 c0 11 89 c2
This string can then be passed through hex2raw to generate an input string for the target programs. Alternatively, you can edit example.d to omit extraneous values and to contain C-style comments for readability, yielding:
68 ef cd ab 00 /* pushq $0xabcdef */
48 83 c0 11 /* add $0x11,%rax */
89 c2 /* mov %eax,%edx */
This is also valid input that you can pass through hex2raw before sending to one of the target programs.
Some final notes
- Sections 3.10.3 and 3.10.4 of the textbook provide useful reference material for this assignment.
- This stuff feels weird at first, but you can figure it out! Draw pictures, step slowly through the code, take a look at the stack and registers, figure out where function parameters are stored, and just gradually put together a picture of what’s going on for yourself. Talk to each other, talk to me, step away and take a walk, and come back to it again.
- Learning how to exploit software vulnerabilities helps you to understand those vulnerabilities and how to prevent them. At the same time, of course, this knowledge could potentially be used to harm other people. Don’t do that.
Have fun and good luck!