Lab: a little data representation, part 1
Today, you'll save text and integer data in files and use the hexdump command to study the details of how the data items get stored.
I don't expect you to fully understand everything here after one class day. Keep reading, keep experimenting, collect your questions and ask them—gradually, things will start to make sense.
Characters
Do the following. Collect questions, play around, take notes. After about 20 minutes, we'll discuss.
- Connect to mantis in VS Code
- Create a new text file named something.txt. Type two or three short lines of ASCII text (just a few English words should do fine) and save.
- In the VS Code terminal, make sure you are cd'd to the directory containing something.txt and then run
hexdump -C something.txt
- Do the hex values in the hexdump output correspond to the characters you entered? Do the characters come in the order you expect? Do you see any newline characters? (You might find it helpful to open an ASCII chart in a browser tab or view the chart in a terminal by running "man ascii".)
- Add the word "résumé" and the Greek letters "αβγδ" to something.txt. Note that é and the Greek letters are not ASCII characters. Save and run "hexdump -C something.txt" again. Which bytes correspond to é, α, β, γ, and δ?
- On the bottom right status bar of VS Code, click on "UTF-8", select "Save with encoding", and then select "UTF-16 LE".
- Run hexdump again. What changed? Which bytes correspond to which characters?
- Again at the bottom right of VS Code, click on "UTF-16 LE", select "Save with encoding", and then select "UTF-16 BE".
- Run hexdump again. What changed? Which bytes correspond to which characters? How is this different from UTF-8? From UTF-16 LE?
- Later, do a little internet exploration to figure out the difference between Unicode code points and character encodings.
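If you'd like a second way to check what hexdump shows you, here is a minimal Python sketch that prints each character's Unicode code point next to its bytes in UTF-8, UTF-16 LE, and UTF-16 BE. The function names (encoded_bytes, describe) are made up for this illustration and are not part of the lab. Note also that an editor may write a byte order mark at the start of a UTF-16 file; Python's "utf-16-le" and "utf-16-be" codecs do not, so your hexdump output may start with a couple of extra bytes.

```python
# Sketch: compare one character's code point with its bytes in three encodings.
# The helper names here are hypothetical, chosen only for this example.

def encoded_bytes(ch, encoding):
    """Return the bytes of one character as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in ch.encode(encoding))


def describe(text, encodings=("utf-8", "utf-16-le", "utf-16-be")):
    """Return one tuple per character: (char, code point, hex bytes per encoding)."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"  # the code point is just a number
        rows.append((ch, code_point, *[encoded_bytes(ch, e) for e in encodings]))
    return rows


if __name__ == "__main__":
    # Columns: character, code point, UTF-8, UTF-16 LE, UTF-16 BE
    for row in describe("eéα"):
        print(*row, sep="\t")
```

Try it on your own text and compare each row with the corresponding bytes in your hexdump output. The code point column stays the same no matter which encoding you choose; only the bytes change.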