==== UTF-8 ====

http://www.cl.cam.ac.uk/~mgk25/unicode.html

==== Thumbnail sketch of UTF-8 ====

From: http://www.pemberley.com/janeinfo/latin1.html

In UTF-8, each 16-bit Unicode character is encoded as a sequence of one,
two, or three 8-bit bytes, depending on the value of the character.  The
following table shows the format of such UTF-8 byte sequences.  The "free
bits" shown as x's in the table are concatenated in the order shown and
interpreted from most significant to least significant:

<file>
 Binary format of bytes in sequence:
                                        Number of    Maximum expressible
 1st byte     2nd byte    3rd byte      free bits:      Unicode value:

 0xxxxxxx                                  7           007F hex   (127)
 110xxxxx     10xxxxxx                  (5+6)=11       07FF hex  (2047)
 1110xxxx     10xxxxxx    10xxxxxx     (4+6+6)=16      FFFF hex (65535)
</file>
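
The table can be turned into code almost line by line. The following is a minimal sketch in Python, not a reference implementation; the function name ''encode_bmp'' is made up for illustration, and the check that rejects the surrogate range D800 to DFFF hex is an addition not shown in the table (those values are not valid in UTF-8).

```python
def encode_bmp(cp: int) -> bytes:
    """Encode one 16-bit code point as UTF-8, following the table above.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only handles values 0000 to FFFF hex")
    if 0xD800 <= cp <= 0xDFFF:
        # Assumption added here: surrogates are not encodable in UTF-8.
        raise ValueError("surrogate code points cannot be encoded")
    if cp <= 0x7F:
        # 0xxxxxxx: 7 free bits
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx: 5 + 6 free bits
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 free bits
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(encode_bmp(0x20AC).hex())  # EURO SIGN, prints e282ac
```

The result should agree with Python's built-in ''chr(cp).encode("utf-8")'' for every value the sketch accepts.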

The value of each individual byte indicates its UTF-8 function, as
follows:

<file>
 00 to 7F hex   (0 to 127):  first and only byte of a sequence.
 80 to BF hex (128 to 191):  continuing byte in a multi-byte sequence.
 C2 to DF hex (194 to 223):  first byte of a two-byte sequence.
 E0 to EF hex (224 to 239):  first byte of a three-byte sequence.
</file>
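
The byte ranges listed above can be checked with a few comparisons. This is a small sketch, assuming Python; the function name ''classify_byte'' and the label strings are invented for this example, and every byte value outside the four listed ranges is simply reported as "other".

```python
def classify_byte(b: int) -> str:
    """Return the UTF-8 role of a single byte value (0 to 255).

    Hypothetical helper; the labels are chosen for this example only.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "single"        # first and only byte of a sequence
    if b <= 0xBF:
        return "continuation"  # continuing byte in a multi-byte sequence
    if 0xC2 <= b <= 0xDF:
        return "lead-2"        # first byte of a two-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "lead-3"        # first byte of a three-byte sequence
    return "other"             # C0, C1, and F0 to FF hex


# C3 A9 is the two-byte sequence for U+00E9.
print([classify_byte(b) for b in b"\xc3\xa9"])  # ['lead-2', 'continuation']
```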

Other byte values are either not used when encoding 16-bit Unicode
characters (F0 to F4 hex, which start the four-byte sequences used for
values above FFFF hex) or are not part of any well-formed UTF-8 sequence
(C0, C1, and F5 to FF hex).

See the links to UTF-8 standards documents below for further details.
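
Putting the sequence formats and byte roles together, a decoder for one- to three-byte sequences might look like the sketch below. It is an illustration in Python under stated assumptions, not a complete UTF-8 decoder: the name ''decode_bmp'' is hypothetical, four-byte sequences are deliberately rejected, and the checks for overlong forms and surrogates go beyond what this page describes.

```python
def decode_bmp(data: bytes) -> list:
    """Decode one- to three-byte UTF-8 sequences into a list of integers.

    Hypothetical helper for illustration; raises ValueError on anything else.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            need, cp, lowest = 0, b, 0x00
        elif 0xC2 <= b <= 0xDF:
            need, cp, lowest = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            need, cp, lowest = 2, b & 0x0F, 0x800
        else:
            # Stray continuation byte, C0/C1, or F0 to FF hex.
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")
        tail = data[i + 1 : i + 1 + need]
        if len(tail) != need or any(not 0x80 <= t <= 0xBF for t in tail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        for t in tail:
            # Append the 6 free bits of each continuing byte.
            cp = (cp << 6) | (t & 0x3F)
        # Assumption added here: reject overlong forms and surrogates.
        if cp < lowest or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"ill-formed sequence at offset {i}")
        out.append(cp)
        i += 1 + need
    return out


print([hex(c) for c in decode_bmp(b"A\xc3\xa9\xe2\x82\xac")])
# ['0x41', '0xe9', '0x20ac']
```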