CS 231: Computer Security

base64: an example

See http://en.wikipedia.org/wiki/Base64 for more detail.

  1. Suppose we have this string, encoded in UTF-8, and we want to convert it into an ASCII-only form for storage or transmission.

    abç ✂

    Note that the final character is U+2702 BLACK SCISSORS.

  2. Here's the UTF-8 version of this string:

    Character   UTF-8 bits                    UTF-8 hex
    a           01100001                      0x61
    b           01100010                      0x62
    ç           11000011 10100111             0xC3 0xA7
    (space)     00100000                      0x20
    ✂           11100010 10011100 10000010    0xE2 0x9C 0x82
  3. Now smoosh all the bits together:

    0110000101100010110000111010011100100000111000101001110010000010

    Break them into 6-bit chunks:

    011000 010110 001011 000011 101001 110010 000011 100010 100111 001000 0010

    If the last group isn't 6 bits, pad it with zeros:

    011000 010110 001011 000011 101001 110010 000011 100010 100111 001000 001000

    And for our convenience in the next step, let's convert those into decimal integers:

    24 22 11 3 41 50 3 34 39 8 8
  4. Use the six-bit integers as indexes into this chart:

     0 A    16 Q    32 g    48 w
     1 B    17 R    33 h    49 x
     2 C    18 S    34 i    50 y
     3 D    19 T    35 j    51 z
     4 E    20 U    36 k    52 0
     5 F    21 V    37 l    53 1
     6 G    22 W    38 m    54 2
     7 H    23 X    39 n    55 3
     8 I    24 Y    40 o    56 4
     9 J    25 Z    41 p    57 5
    10 K    26 a    42 q    58 6
    11 L    27 b    43 r    59 7
    12 M    28 c    44 s    60 8
    13 N    29 d    45 t    61 9
    14 O    30 e    46 u    62 +
    15 P    31 f    47 v    63 /

    Which in this case yields:

    YWLDpyDinII
  5. Finally, if the number of characters in the result is not a multiple of 4, pad it out with enough equals-signs to make it so:

    YWLDpyDinII=

    And that, friends, is the base64 encoding of the byte sequence we started with; that is, the UTF-8 encoding of the string

    abç ✂
  6. Here's a little bit of Python code that will compute the same result:

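    import base64

    # The UTF-8 bytes of the string from step 1 (Python 3: b64encode works on bytes)
    originalBytes = b'ab\xC3\xA7 \xE2\x9C\x82'
    print('The original string: %s' % originalBytes.decode('utf-8'))

    # Encode the bytes; the result is ASCII-only bytes
    encodedBytes = base64.b64encode(originalBytes)
    print('The base64-encoded version of the string: %s' % encodedBytes.decode('ascii'))

    # Decode again to recover the original UTF-8 bytes
    decodedBytes = base64.b64decode(encodedBytes)
    print('The re-decoded version of the string: %s' % decodedBytes.decode('utf-8'))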
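
  7. The library call hides steps 2 through 5, so here is a minimal sketch that carries them out by hand. The names ALPHABET and manual_b64encode are made up for this example and are not part of the base64 module; the sketch assumes Python 3 and the standard alphabet from the chart in step 4.

```python
import base64

# The chart from step 4: index 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> +, 63 -> /
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def manual_b64encode(data: bytes) -> str:
    """Hypothetical helper: base64-encode bytes by following steps 3-5 literally."""
    # Step 3: smoosh all the bits together, 8 bits per byte.
    bits = "".join(format(byte, "08b") for byte in data)
    # Step 3: if the last group isn't 6 bits, pad it with zeros.
    bits += "0" * (-len(bits) % 6)
    # Step 3: break into 6-bit chunks and convert each to an integer.
    indexes = [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]
    # Step 4: use the six-bit integers as indexes into the chart.
    encoded = "".join(ALPHABET[i] for i in indexes)
    # Step 5: pad with equals-signs to a multiple of 4 characters.
    encoded += "=" * (-len(encoded) % 4)
    return encoded


# Step 2: the UTF-8 bytes of the string from step 1 (written with escapes).
original_bytes = "ab\u00e7 \u2702".encode("utf-8")

manual = manual_b64encode(original_bytes)
print(manual)
# Compare against the standard library's answer.
print(manual == base64.b64encode(original_bytes).decode("ascii"))
```

    Run on the example string, this prints YWLDpyDinII= followed by True. Working on a string of '0' and '1' characters is slow and is done here only because it mirrors the walkthrough; a real encoder would use integer shifts and masks instead.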