Lab: a little data representation
Today, you'll save text and integer data in files and use the hexdump command to study the details of how the data items get stored.
I don't expect you to fully understand everything here after one class day. Keep reading, keep experimenting, collect your questions and ask them—gradually, things will start to make sense.
1. Characters
Do the following. Collect questions, play around, take notes. After about 20 minutes, we'll discuss.
- Connect to mantis in VS Code
- Create a new text file named something.txt. Type two or three short lines of ASCII text (just a few English words should do fine) and save.
- In the VS Code terminal, make sure you are cd'd to the directory containing something.txt and then run
hexdump -C something.txt
- Do the hex values you see in your file correspond to the characters you entered? Do the characters come in the order you expect? Do you see any newline characters? (You might find it helpful to open an ASCII chart in a browser tab or view the chart in a terminal by running "man ascii".)
- Add the word "résumé" to something.txt. Note that é is not an ASCII character. Save and run "hexdump -C something.txt" again. Which bytes correspond to the é characters?
- On the bottom right status bar of VS Code, click on "UTF-8", select "Save with encoding", and then select "UTF-16 LE".
- Run hexdump again. What changed? Which bytes correspond to which characters?
- Again at the bottom right of VS Code, click on "UTF-16 LE", select "Save with encoding", and then select "UTF-16 BE".
- Run hexdump again. What changed? Which bytes correspond to which characters? How is this different from UTF-8? From UTF-16 LE?
- Later, do a little internet exploration to figure out the difference between Unicode code points and character encodings.
2. Integers
- Grab a copy of integers.c and save it in your mantis working directory.
- Read it, predict what it will do, and run it:
gcc -Wall -Werror -o integers integers.c
./integers > output.dat
- Look at the output.dat file's contents using hexdump. How does what the C program did correspond to what you see in the output file?
- (By the way, what did the ">" sign do in the command above?)
- In integers.c, change "j = 25" to "j = -25", save, recompile, rerun, and hexdump. What changed? Why did it change exactly like that?