Character Sets

Contents
  1. Character Sets
  2. ASCII
  3. Unicode

1. Character Sets

A character is an individual letter, number, or symbol. Computers work entirely using binary math and logic, but they need to be able to display written human languages (and their characters) – otherwise they wouldn’t be much use to us! Computers represent each character we use with a unique binary number. A character set is the table a computer uses to convert these unique binary numbers into the characters that we use.
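For illustration (Python here, not part of the original notes), the mapping between characters and their unique numbers can be explored with the built-in ord() and chr() functions, which look up Unicode code points:

```python
# A character set maps each character to a unique number.
code = ord("A")     # character -> number
print(code)         # 65
print(bin(code))    # 0b1000001 (the binary the computer stores)
print(chr(65))      # number -> character: "A"
```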


2. ASCII

ASCII is a character set that was created in the 1960s in the US. ASCII is an acronym for the American Standard Code for Information Interchange. It was based on telegraph code, and each character is represented by 7 bits. Using 7 bits means that ASCII can represent 128 characters (lowercase letters, uppercase letters, numbers, a range of symbols, and control characters). The characters in the ASCII character set follow a logical order: e.g. A (65 in denary) is followed by B (66 in denary), so it is possible to deduce the binary representation of any character if you have a reference point.
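As a quick sketch (in Python, assumed for illustration), the alphabetical ordering means a letter's code can be deduced from a known reference such as A = 65:

```python
# ASCII letters are in alphabetical order, so codes can be deduced
# from one reference point (A = 65 in denary).
reference = ord("A")                          # 65
for letter in "ABC":
    code = ord(letter)
    print(letter, code, format(code, "07b"))  # show the 7-bit binary
# "C" is two letters after "A", so its code is 65 + 2 = 67
print(reference + 2 == ord("C"))              # True
```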

Limitations of ASCII

Using only 7 bits, ASCII’s 128 characters can only cover the English language. As more countries began to use computers, and interest grew in using them to communicate, there was a need to represent a larger number of characters. ASCII Extended was created to make use of 8 bits. This allowed 256 characters to be used and included a range of additional Latin characters, including those with accents.
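To illustrate the extra 8-bit range (Python again, using the Latin-1 codec as one common example of an 8-bit ASCII extension; neither is named in the original notes):

```python
# Latin-1 is an 8-bit extension of ASCII: one byte per character,
# with accented letters in the range 128-255.
text = "café"
data = text.encode("latin-1")
print(list(data))   # [99, 97, 102, 233] - the accented é is above 127
```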


3. Unicode

Over time ASCII Extended became too limiting, so a new character set called Unicode was created. All modern computers worldwide are capable of using the Unicode character set. Unicode can get quite complicated, but to keep it simple for the exam, we’ll say each character uses 16 bits, which can represent around 65,000 characters. The first 128 characters in Unicode match those in ASCII (and the first 256 match the Latin-1 version of ASCII Extended) to maintain backward compatibility.
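A small Python sketch of the 16-bit simplification described above (the characters chosen are just examples, not from the original notes):

```python
# With 16 bits there are 2**16 = 65,536 possible values, enough for
# characters well beyond the Latin alphabet.
for ch in "A£Ω":
    print(ch, ord(ch), format(ord(ch), "016b"))  # 16-bit binary
print(2 ** 16)  # 65536 - roughly the "65,000 characters" figure
```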

Benefits of Unicode

  • Contains a large number of the world’s symbols.
  • Almost all devices can decode text sent to them using Unicode.
  • Backwards compatible with ASCII (the first 128 characters use the same unique binary numbers!)
  • Contains emoji.
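The backwards-compatibility point in the list above can be demonstrated directly (a Python sketch, using the common UTF-8 encoding of Unicode as an example):

```python
# Because the first 128 Unicode characters share ASCII's codes,
# plain English text produces identical bytes in both encodings.
ascii_bytes = "Hello".encode("ascii")
utf8_bytes = "Hello".encode("utf-8")
print(ascii_bytes == utf8_bytes)   # True
```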