Most developers know that a character like "A" is represented inside microprocessor systems as a simple number. That's called encoding. Everything started with teletype machines, which used a 5-bit encoding. The so-called Baudot code (named after Émile Baudot, who patented it in 1874) made it possible to encode the 26 uppercase characters of the Latin alphabet, the ten digits from 0 to 9, some punctuation, and some non-printable control characters within that limited 32-character space, using a sophisticated shifting system comparable to our modern caps lock function.

Later, when the first computers with screens and printers came along, the Baudot code was extended from 5 to 7 bits, giving a 128-character space. Now almost everything could be encoded without shifting and tricks (or so they thought): upper- and lowercase letters, digits, special characters, punctuation, mathematical signs, and more control characters. Again, everybody was amazed, until people hit the flaws of the ASCII system when they wanted to do text I/O in languages other than English.

That gave birth to the 8-bit code page system: in every code page, the lower half, characters #0 to #127, remained identical to the ASCII code. The upper half, characters #128 to #255, was used for encoding national specifics, for example à, é, ô, or ç in France, ä, ö, ü, ß in Germany, and so on. To display or print a text correctly, you had to know which code page was used for encoding, so that you could use the same one for decoding. The ISO 8859 standards superseded and re-ordered these code pages with the intention of reducing the national chaos, but without success. And we haven't yet talked about all the languages that use non-Latin characters, mostly in Asia. The Unicode standard with its UTF encodings was then created to encode everything: Latin, Cyrillic, Chinese, Japanese, Serbian, etc., plus punctuation, plus graphical block characters, plus smileys, plus..., plus...
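The code-page ambiguity described above is easy to reproduce. Here is a minimal sketch (Python, used purely for illustration) that decodes one and the same byte, #233, with two different Windows code pages and gets two different characters:

```python
# One single byte value...
raw = bytes([0xE9])  # byte #233

# ...decodes to different characters depending on the code page:
print(raw.decode("cp1252"))  # Western European code page: 'é'
print(raw.decode("cp1251"))  # Cyrillic code page:         'й'

# The lower half is safe ground: byte #65 is 'A' in both code pages,
# because characters #0 to #127 are identical to ASCII everywhere.
assert bytes([0x41]).decode("cp1252") == bytes([0x41]).decode("cp1251") == "A"
```

This is exactly why a text file written on a French machine could turn into gibberish on a Russian one: the bytes stayed the same, only the decoding table changed.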
A 32-bit space is a great thing to encode more than 4 billion symbols, but it's not practical because each symbol would eat up 4 precious bytes. Thus, UTF-8, the most widespread of these encodings, applies a few tricks. First of all, the numbers #0 to #127 still represent - you guessed it - the ASCII character set, which remains the biggest common denominator of character encoding. Then, the high bits of the first byte indicate whether the current symbol is encoded in one, two, three, or four bytes (a step back to Baudot's shifting system, if you will). And the encoding has been chosen in such a way that the more common a symbol is, the fewer bytes are required to encode it, which saves storage space and transmission bandwidth.

All that to tell you that yes, even today in 2021, English is the predominant common language and the ASCII character set is still the golden, unambiguous standard for character encoding. That's why in this article, we'll look deeper into the relationship between a character and its numerical representation.
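Those tricks are easy to watch in action. The following sketch (again Python, just for illustration) encodes four characters of increasing "exoticness" and shows that UTF-8 really does spend between one and four bytes per symbol, with the leading bits of the first byte announcing the sequence length:

```python
# Each character's UTF-8 encoding, from 1 to 4 bytes long:
for ch in "A", "é", "€", "🙂":
    data = ch.encode("utf-8")
    # Print the code point, the byte count, and the first byte in binary;
    # the run of leading 1-bits announces how many bytes follow.
    print(f"U+{ord(ch):05X} -> {len(data)} byte(s), first byte {data[0]:08b}")

# An ASCII character is still just its ASCII number, in a single byte:
assert "A".encode("utf-8") == b"\x41" and ord("A") == 65
```

Running this prints one byte for "A" (first byte `01000001`), two for "é", three for "€", and four for the smiley, which is the whole variable-length scheme in a nutshell. The last line also previews this article's actual topic: in Python, `ord()` and `chr()` convert between a character and its number.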