Brute-force password cracking

Folder: passwords
Files: passwords/summary.txt, cracked1.txt, cracked2.txt, cracked3.txt, passwords.py

Partner or solo, as you wish.

Goals

Learn some basic facts about how passwords are stored
Analyze the effects of salt as a protection against brute-force password cracking
Think about why we store hashed passwords instead of just the passwords

Rubric

2 - all passwords cracked for part 1 1 - some passwords cracked for part 2 1 - some passwords cracked for part 3 1 - all passwords cracked for part 4 3 - timing reports 6 - analysis

Background

User names on Linux systems are typically stored in a file named /etc/passwd. (Check out this description of the /etc/passwd file format.) This file is world-readable, so if you want to see all the users on a particular system you're logged into, just do cat /etc/passwd.

It used to be that the password hashes were also stored in /etc/passwd, but that changed some years ago for obvious security reasons. These days, it is more typical for the password hashes to be stored in a file named /etc/shadow. (Here's a description of the shadow file format.)

Check out the difference in permissions between these two files on mantis or mirage:

ls -l /etc/passwd ls -l /etc/shadow

For this assignment, you're going to do a little password cracking and thinking about the time and space complexity of password cracking.

Part 1: Unsalted one-word passwords

A typical line in /etc/shadow might look like this:

jondich:182072537ada59e4d6b18034a80302ebae935f66adbdf0f271d3d36309c2d481::0:99999:7:::

This colon-delimited set of fields includes the user name, the SHA-256 hash of the user's password, and then miscellaneous other stuff that won't concern us.

As we'll see, this is not quite a modern format for a variety of reasons, including the fact that the cryptographic hash function has not been explicitly specified and there's no salt included. But this is pretty close to how passwords are stored on most Linux systems.

Consider example password file #1. Your job for Part 1 is to:

Figure out as many of the one-word passwords as possible
Collect timing information (see What to hand in below for details)

A little help

Every one of the passwords in this file is one all-lowercase word taken randomly from this file of words.
For your debugging assistance: the password for "jondich" is "marmot".
You can load the words file into a python list quickly using this line:
words = [line.strip().lower() for line in open('words.txt')]
Working in python3 means that you have to do some pretty nutty type conversions between the password string and the hash value, etc. Here's a short sample program to help you get started.
You can time a Unix command-line program by putting "time " in front of the command (like "time python yourprogram.py"). It produces output like this:
real 0m0.022s user 0m0.002s sys 0m0.006s
(This was the output for "ls", so the numbers are very small.) There's a nice description of the meaning of this output on stackoverflow, not surprisingly. When you report times, you should report user time.

Part 2: Unsalted two-word passwords

Same task as Part 1, but using example password file #2. This time, all the passwords are two random words concatenated (e.g. "cowgecko"). The password for jondich is still "marmot".

Part 3: Salted passwords

Now, let's switch to a slightly different password file format:

jondich:$5$e75fa822$8a604057b98aff07885d29eea97e885e::0:99999:7:::

For this phase, use example password file #3. The passwords in this file are one-word passwords as in Part 1.

In the hash field, we have an 8-digit hexadecimal number known as "salt", then a dollar sign, and then the hash of the salt concatenated with the password (i.e. H(salt || pw)). As before, the "jondich" password is "marmot", so you can use that to check your hash computation code. Also, as before, the hash function we're using is SHA-256.

(WARNING: This hashing technique is designed to introduce you to the basic idea of salted passwords. However, the technique and salt sizes used here are not ready for prime-time. There are a couple more steps we need to take—longer salts, multiple rounds of hashing, etc.—before we're getting close to best practice for password storage.)

As in parts 1 and 2:

Figure out as many of the passwords as possible
Collect timing information (see What to hand in below for details)

Part 4: password-cracking software

There are many password-cracking software packages out there in the world for use by ethical and not-so-ethical hackers. Two of the most popular are hashcat and john the ripper.

For this part of the assignment, read up on one of these tools and then use it to crack the passwords from part 3 above.

NOTE: Sometimes, you need to modify the file format to use hashcat or john on a set of password hashes. Making sure your file format works is part of the job for Part 4.

What to hand in?

In your "passwords" folder, put:

Your source code in a program named passwords.py
A file for Part 1 named cracked1.txt, consisting of lines of the form "username:password", like

jondich:marmot

for each of the passwords you discovered.
A file for Part 2 named cracked2.txt, using the same format as cracked1.txt.
A file for Part 3 named cracked3.txt, using the same format as cracked1.txt.
A file named summary.txt, showing:

Part 1 Total time: [user time from a "time" command] Number of hashes computed: [count of the # of hashes computed] Passwords cracked: [number cracked] Time per hash computed: [seconds per hash] Time per password cracked: [seconds per password] Passwords cracked per number of hashes computed: [passwords per hash] Part 2 Total time: [user time from a "time" command] Number of hashes computed: [count of the # of hashes computed] Passwords cracked: [number cracked] Time per hash computed: [seconds per hash] Time per password cracked: [seconds per password] Passwords cracked per number of hashes computed: [passwords per hash] Part 3 Total time: [user time from a "time" command] Number of hashes computed: [count of the # of hashes computed] Passwords cracked: [number cracked] Time per hash computed: [seconds per hash] Time per password cracked: [seconds per password] Passwords cracked per number of hashes computed: [passwords per hash] Part 4 Which software did you use? What commands did you issue to do your password-cracking? How many passwords did you crack? How long did it take? Analysis: - Did your time per hash computed change between phases? By what factor? Why? - Did your time per password crack change between phases? By what factor? Why? - Suppose you wanted to precompute all the hashes for each possible password so you could just look up the password in a table indexed by the hash. How much memory would be required for each phase? - How well did hashcat or john do compared to your python program? Give details. - Give 3-4 reasons we should store password hashes and not the passwords themselves. Think in terms of threats, who the attackers might be, etc.

A couple clarifications

For the memory question in summary.txt, as a rough estimate you may assume:
- Each hash string is 32 bytes long.
- Each password is 16 bytes long.
- Each mapping of a hash to a password takes up an additional 32 bytes of overhead beyond the space required to store the hash and the password.
How many passwords do you need to crack for Parts 2, 3, and 4 to get those points' worth of credit?

For Part 2, I'm going to say 50 passwords is sufficient. With a reasonable approach in python, you should be able to pull this off within an hour or two of runtime.

For Parts 3 and 4, I'm running some tests of my own right now. I'll edit this paragraph once I have clarity.