6. Coding — Introduction

6.1. What’s the big picture?

The word “code” has lots of meanings in computer science. It’s often used to talk about programming, and a program can be referred to as “source code”. However, in this chapter (and the next three chapters), we will use it to talk about representing information in useful ways, such as secret codes. In the previous chapter we looked at using binary representations to store all kinds of data: numbers, text, images and more. But often simple binary representations aren’t so useful. Sometimes they take up too much space, sometimes small errors in the data can cause big problems, and sometimes we worry that someone else could easily read our messages. Most of the time all three of these things are a problem! The codes that we will look at overcome all of these problems, and are widely used for storing and transmitting important information.

The three main reasons that we use more complex representations of binary data are:

  • Compression: this reduces the amount of space the data needs (for example, coding an audio file using MP3 compression can reduce it to well under 10% of its original size)
  • Encryption: this changes the representation of data so that you need to have a “key” to unlock the message (for example, whenever your browser uses “https” instead of “http” to communicate with a website, encryption is being used to make sure that anyone eavesdropping on the connection can’t make any sense of the information)
  • Error Control: this adds extra information to your data so that if there are minor failures in the storage device or transmission, it is possible to detect that the data has been corrupted, and even reconstruct the information (for example, every bar code has an extra digit added to it so that if the bar code is scanned incorrectly in a checkout, it makes a warning sound instead of charging you for the wrong product).
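
The check digit mentioned in the last point can be sketched in a few lines. This is a minimal illustration that assumes the 13-digit EAN-13 bar code format (the text above doesn't name a particular format), and the function names are made up for this example. The first 12 digits are multiplied alternately by 1 and 3 and added up; the check digit is whatever brings that total up to a multiple of 10.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the check digit for the first 12 digits of an EAN-13 code."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    # The check digit tops the total up to the next multiple of 10.
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Return True if the 13th digit matches the digit computed from the first 12."""
    if len(code) != 13 or not code.isdigit():
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(is_valid_ean13("4006381333931"))  # True: the code was read correctly
print(is_valid_ean13("4006381833931"))  # False: one digit was misread (3 became 8)
```

A scanner that gets False here knows something went wrong and can ask for a rescan, but this simple scheme can't tell which digit was wrong, so it detects the error rather than correcting it.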

Often all three of these are applied to the same data; for example, a photo taken on a camera is often compressed using JPEG, stored on the camera card with error correction, and stored on a backup disk with encryption so that if the disk were stolen the data couldn’t be accessed.
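
To get a feel for what compression does, here is a small sketch using Python's built-in zlib module. Note the assumptions: zlib is a general-purpose lossless method, not the JPEG or MP3 methods mentioned above (those discard some detail to save even more space), and the sample data and function name are invented for this example. Repetitive data like this compresses very well; typical real data will compress less.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    compressed = zlib.compress(data)
    # Lossless compression: decompressing must give back exactly the original.
    assert zlib.decompress(compressed) == data
    return len(compressed) / len(data)


# Hypothetical sample data: one sentence repeated many times.
sample = b"the quick brown fox jumps over the lazy dog. " * 200
print(f"compressed to {compression_ratio(sample):.1%} of the original size")
```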

Without these forms of coding, digital devices would be very slow, have limited capacity, be unreliable, and be unable to keep your information private.

6.2. The whole story!

The idea of encoding data to make the representation more compact, robust or secure is centuries old, but the solid theory needed to support codes in the information age was developed in the 1940s. This is not surprising, considering that technology played such an important role in World War II, where efficiency, reliability and secrecy were all very important. One of the most celebrated researchers in this area was Claude Shannon, who developed the field of “information theory”, which is all about how data can be represented effectively.

A key concept in Shannon’s work is a measure of information called “entropy”, which establishes mathematical limits such as how small a file can be compressed, and how many extra bits must be added to a message to achieve a given level of reliability. While the idea of entropy is beyond the scope of this section, there are some fun games that provide a taste of how you could measure information content by guessing what letter comes next; there’s an Unplugged activity called Twenty Guesses, and an online game for guessing sentences.
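
For the curious, here is a rough sketch of how entropy can be estimated for a piece of text. It makes a big simplifying assumption: it only looks at how often each character appears, ignoring the context that the guessing games rely on, so it is a crude estimate and not a full measure of the information in real language. The function name is made up for this example. The result is the average number of bits needed per character if every character were coded independently.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text: str) -> float:
    """Estimate entropy from single-character frequencies (a simplification)."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    # Each character with probability p = c/n contributes p * log2(1/p) bits.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(entropy_bits_per_symbol("aaaa"))  # 0.0: completely predictable
print(entropy_bits_per_symbol("abab"))  # 1.0: two equally likely characters
print(entropy_bits_per_symbol("abcd"))  # 2.0: four equally likely characters
```

The more predictable the text, the lower the number, which matches the intuition from the guessing games: letters that are easy to guess carry little information.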

6.3. Further reading

James Gleick’s book The Information: A History, a Theory, a Flood provides an interesting view of the history of several areas relating to coding.