CS208, Wednesday 19 Jan 2022
Fire up VS Code now
You'll need it later in class
Questions
- quiz
- (how many of you can't see the ASCII chart in the quiz?)
- C
- anything else?
Binary numbers
- If you have a 7-bit box, you can store any of 2^7 = 128 bit patterns in it
- If you add 1 bit to your box (so it's now an 8-bit box), you double the number of possible bit patterns to 2^8 = 256
- ASCII is a 7-bit character set, with bit patterns 0000000 = 0 to 1111111 = 127
- 16 bits? 2^16 = 65,536
- 32 bits? 2^32 = (2^16)^2 = 4,294,967,296
- 32-bit two's complement? -2,147,483,648 to +2,147,483,647
- 64 bits? yikes, that's a lotta bit patterns!
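If you want to check these numbers yourself, here is a minimal Python sketch. The function names pattern_count and twos_complement_range are made up for illustration; they are not from any library.

```python
def pattern_count(bits):
    """Number of distinct bit patterns an n-bit box can hold."""
    return 2 ** bits


def twos_complement_range(bits):
    """Smallest and largest integers representable in n-bit two's complement."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Each extra bit doubles the count.
for bits in (7, 8, 16, 32, 64):
    print(f"{bits:2d} bits: {pattern_count(bits):,} patterns")

low, high = twos_complement_range(32)
print(f"32-bit two's complement: {low:,} to {high:+,}")
```

The last loop iteration answers the "64 bits?" question: 2^64 = 18,446,744,073,709,551,616.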
A little more history
- ASCII
- ASCII extensions
- Unicode: late 1980s
Encodings
- Unicode (1988-present) specifies codepoints for characters ("this character goes with that integer")
- Unicode does a bunch of other stuff too (how to draw characters, how to sort strings, etc.)
- reminder: an encoding is a specific byte-for-byte scheme for storing a particular kind of data as a sequence of bytes (in memory, in a file, in a network stream, etc.)
- example: one way to encode integers is "32-bit little endian two's complement"
- character encodings: given a character's codepoint, how do I store that character?
- what character encoding names will you see?
- UTF-8 (with or without BOM)
- UCS-2 (obsolete, superseded by UTF-16)
- UTF-16 (UTF-16LE, UTF-16BE)
- UTF-32 (UTF-32LE, UTF-32BE)
- ASCII
- ISO-8859-1
- Big5 (traditional Chinese characters)
- GB 18030 (simplified Chinese characters)
- etc. etc. etc.
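To make "encoding" concrete, here is a small Python sketch, assuming Python 3.8 or later (for bytes.hex with a separator). It stores an integer as 32-bit little-endian two's complement, then stores one character (é, codepoint U+00E9) under several of the encodings listed above. The helper names encode_int32_le and hex_bytes are made up for this example; struct.pack and str.encode are standard Python.

```python
import struct


def encode_int32_le(n):
    """Encode n as 32-bit little-endian two's complement.

    In the struct format "<i", "<" means little-endian and "i" means
    a 4-byte signed integer.
    """
    return struct.pack("<i", n)


def hex_bytes(data):
    """Show a bytes object as space-separated hex pairs."""
    return data.hex(" ")


# Same encoding scheme, two different integers.
print(hex_bytes(encode_int32_le(1)))    # 01 00 00 00
print(hex_bytes(encode_int32_le(-2)))   # fe ff ff ff

# Same codepoint, several different character encodings.
ch = "\u00e9"  # é
for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "iso-8859-1"):
    print(f"{name:10s} {hex_bytes(ch.encode(name))}")
```

Note that the codepoint never changes (it's always U+00E9); only the bytes used to store it change from one encoding to the next.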
Experiment
Open up my tiny encoding tool
Copy these four items into a text file in VS Code and save it as "experiment-utf8.txt" or something like that.
- ABC
- résumé
- Привет
- 😀
Then use "hexdump -C filename" to take a look.
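If you'd like something to compare the hexdump output against, this Python sketch (again assuming Python 3.8+) prints the UTF-8 bytes for each of the four items. It covers only the text of each item, so your file will also contain whatever dashes, spaces, and newline bytes you saved along with them. It also assumes the é is stored as the single codepoint U+00E9. The helper name utf8_hex is made up for this example.

```python
def utf8_hex(text):
    """UTF-8 bytes of text, shown as space-separated hex pairs."""
    return text.encode("utf-8").hex(" ")


items = ["ABC", "r\u00e9sum\u00e9", "Привет", "\U0001F600"]

for item in items:
    # Print codepoints in U+XXXX form so the output is plain ASCII.
    codepoints = " ".join(f"U+{ord(c):04X}" for c in item)
    byte_count = len(item.encode("utf-8"))
    print(f"{len(item)} codepoint(s): {codepoints}")
    print(f"  {byte_count} byte(s): {utf8_hex(item)}")
```

Watch how the bytes-per-character count grows: 1 byte each for ABC, 2 bytes for é and for each Cyrillic letter, and 4 bytes for the emoji.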
Then Save-As to formats UTF-8 with BOM