Lab 3: A little data representation
This lab will take place in two parts. First, we’ll look at character encodings, and then move on to exploring how integers are stored.
There will likely be parts of this lab that you don’t understand right away. Keep reading, keep experimenting, collect your questions, and ask them! Gradually, things will start to make more sense.
Part 1: Character encodings
Do the following. Remember to collect questions, play around, and take notes. After about 20 minutes, we’ll discuss.
- Connect to mantis in VS Code.
- Create a new text file named something.txt. Type two or three short lines of ASCII text (e.g., just a few English words) and save your file.
- In the VS Code terminal, make sure you are cd’d into the directory containing something.txt and run this command: hexdump -C something.txt
- Do the hex values you see in your file correspond to the characters you entered? Do the characters come in the order you expect? Do you see any newline characters? (You may find it helpful to open an ASCII chart in a browser tab or view the chart in a terminal by running man ascii.)
- Copy the word “résumé” and the Greek letters “αβγδ” from this page into something.txt. Note that é and the Greek letters are not ASCII characters. Save and run hexdump -C something.txt again. Which bytes correspond to é, α, β, γ, and δ?
Now, let’s explore some alternate encodings.
- On the bottom-right status bar of VS Code, click on UTF-8, select Save with encoding, and then select UTF-16 LE.
- Run hexdump again. What changed? Which bytes correspond to which characters?
- Again at the bottom right of VS Code, click on UTF-16 LE, select Save with encoding, and then select UTF-16 BE.
- Run hexdump again. What changed? Which bytes correspond to which characters? How is this different from UTF-8? What about UTF-16 LE?
- You are hopefully starting to make some sense of the difference between Unicode code points and the byte sequences that different encoding formats, like UTF-8, UTF-16 LE, and UTF-16 BE, produce for them. If any of this is at all fuzzy, do a little internet exploration (after class) to figure out the differences between them.
Part 2: Integers
For this part, you’ll again work through a set of instructions, and answer some questions by writing C code.
a) Representing integers
- Grab a copy of integer_rep.c and save it in your mantis working directory.
- Read it, predict what it will do, and run it:
    gcc -Wall -Werror -o integer_rep integer_rep.c
    ./integer_rep > output.txt
- Display output.txt’s file contents using hexdump. How does what the C program did correspond to what you see in the output file?
- As an aside, what did the > symbol do in the command above?
- In integer_rep.c, change j = 25 to j = -25, save, recompile, rerun, and check the output again with hexdump. What changed? Why did it change exactly like that?
b) A handy tool
If you want to know the exact bits contained in an int, do this:
int j = 314;
printf("0x%08X\n", j);
It gets slightly weirder for long (note the l before the X):
long k = 314159;
printf("0x%016lX\n", k);
It gets even weirder for char, as we’ll explore shortly.
c) Some questions
Take some time to try using the sizeof C function (which is actually an operator, not a function, but it behaves enough like one that we’ll pretend it is) to answer the following questions.
- How many bytes are in an int?
- How many bytes are in a long?
- How many bytes are in a char?
- How many bytes are in an unsigned int?
- How many bytes are in an unsigned long?
- How many bytes are in an unsigned char?
- If you do this:
    int j = -1;
    unsigned k = -1;
  what bits are in j? What about k?
- What is going on here?!?
    char c1 = 0x41;
    printf("c1 as char: %c\n", c1);
    printf("c1 as decimal int: %d\n", c1);
    printf("c1 as hexadecimal int: %X\n", c1);
    char c2 = 0xCE;
    printf("c2 as char: %c\n", c2);
    printf("c2 as decimal int: %d\n", c2);
    printf("c2 as hexadecimal int: %X\n", c2);
- What about this one?!?
    int s = -1;
    int t = (s >> 4);
    printf("s (-1): 0x%08X\n", s);
    printf("s >> 4: 0x%08X\n", t);
- Do the same thing as before, but with s and t declared as unsigned. Before you run it, make a prediction as to whether it will be different or the same. Was your prediction correct?