2. Algorithms

2.1. What’s the big picture?

Every computer device you have ever used, from your school computers to your calculator, has been using algorithms to tell it how to do whatever it was doing. Algorithms are a very important topic in Computer Science because they help software developers create efficient and error free programs. The most important thing to remember about algorithms is that there can be many different algorithms for the same problem, but some are much better than others!

Click an image to play sorting animations

Animation of Selection Sort

Selection sort

Animation of Quick Sort

Quick sort

Animations provided by David Martin from www.sorting-algorithms.com

Computers are incredibly fast at manipulating, moving and looking through data. However the amount of data computers use is often so large that it doesn’t matter how fast the computer is, it will take it far too long to examine every single piece of data (companies like Google, Facebook and Twitter process about 1 billion things per day). This is where algorithms come in. If a computer is given a better algorithm to process the data then it doesn’t matter how much information it has to look through, it will still be able to do it in a reasonable amount of time.

If you have read through the Introduction chapter you may remember that the speed of an application on a computer makes a big difference to a human using it. If an application you create is too slow, people will get frustrated with it and won’t use it. It doesn’t matter if your program can solve all their life problems, if it takes too long they will simply get bored and close it!

2.1.1. Algorithms, Programs and Informal Instructions

At this stage you might be thinking that algorithms and computer programs kind of sound like the same thing, but they are actually two very distinct concepts. Descriptions of these and another important concept, Informal Instructions, are below. They are each different ways of describing how to do something:

  • Informal Instruction: An instruction using natural language. They are un-precise so computers cannot understand them, but humans are able to use their own intelligence to interpret them. This is the least precise of our three descriptions for doing things, and is typically used to give a very simple description of the general idea of an algorithm.
  • Algorithm: step by step process that describes how to solve a problem and/or complete a task, which will always give a result. They are more precise than Informal Instructions and do not require any knowledge or intelligence to follow, however they are still not precise enough for a computer to follow. These are often expressed using pseudo-code, which matches a programming language fairly closely, but leaves out details that could easily be added later by a programmer, and doesn’t specify the kinds of commands that can be used.
  • Program: a specific implementation of an algorithm, which is written in a specific programming language and will give a certain result. This is the most precise of these three descriptions and computers are able to follow and understand these.

For example…

  • Informal Instruction: “Get me a glass of water”. A human can understand what this means and can figure out how to accomplish this task by thinking, but a computer would have no idea how to do this!
  • Algorithm: 1) Go to the kitchen. 2) Pick up a glass. 3) Turn on the tap. 4) Put the glass under the running water and remove it once it is almost full. 5) Turn off the tap. 6) Take the glass back to the person who gave the instruction. A human could follow these instructions easily, but a computer could not figure out exactly what to do.
  • Program: A computer program, written in a programming language, which would tell a robot exactly how to retrieve a glass of water and bring it back to the person who asked for the water.

2.1.2. Algorithm cost

When Computer Scientists are comparing algorithms they often talk about the ‘cost’ of an algorithm. The cost of an algorithm can be interpreted in several different ways, but it is always related to how well an algorithm performs based on the size of its input, n. In this chapter we will talk about the cost of an algorithm as either the time it takes a program (which performs the algorithm) to complete, and the number of comparisons the algorithm makes before it finishes.

The amount of time a program which performs the algorithm takes to complete may seem like the simplest cost we could look at, but this can actually be affected by a lot of different things, like the speed of the computer being used, or the programming language the program has been written in. This means that if the time the program takes to complete is used to measure the cost of an algorithm it is important to use the same program and the same computer (or another computer with the same speed) for testing the algorithm with different numbers of inputs.

The number of comparisons an algorithm makes however will not change depending on the speed of a computer, or the programming language the program using the algorithm is written in. Some algorithms will always make the same number of comparisons for a certain input size, while others might vary.

If you want to find out more about how the cost of an algorithm is described in industry, with ‘Big-O Notation’, then check out “The Whole Story!” section at the end of this chapter.

2.1.3. Searching and Sorting

In this chapter we will look at two of the most common and important types of algorithms, Searching and Sorting. You probably come across these kinds of algorithms every time you use a computer without even realising!

2.2. Searching

Searching through collections of data is something computers have to do all the time. It happens every time you type in a search on Google, or when you type in a file name to search for on your computer. Computers deal with such huge amounts of data that we need fast algorithms to help us find information quickly. Lets investigate searching with a game…

You may have noticed that the numbers on the monsters and pets in the game were in a random order, which meant that finding the pet was basically luck! You might have found it on your first try, or if you were less lucky you might have had to look inside almost all the presents before you found it. This might not seem like such a bad thing since you had enough lives to look under all the boxes, but imagine if there had been 1,000 boxes, or worse 1,000,000! It would have taken far too long to look through all the boxes and the pet might have never been found.

Now this next game is slightly different. You have less lives, which makes things a bit more challenging, but this time the numbers inside the boxes will be in order. The monsters, or maybe the pet, with the smallest number is in the present on the far left, and the one with the largest number is in the present on the far right. Let’s see if you can collect all the pets without running out of lives…

Now that you have played through the whole game (and hopefully found all of the lost pets!) you may have noticed that even though you had less lives in the second part of the game, and lots of presents to search through, you were still able to find the pet. Why was this possible?

2.2.1. Two contrasting search algorithms

Since the boxes in the first game were in a random order there really wasn’t any strategy you could have used to find the pet, except simply keep opening presents one by one until you found the pet. This is very similar to the Linear Search Algorithm (sometimes called a sequential search). In plain english, this algorithm is as follows:

  • Check if the first item in a list is the item you are searching for, if it is the one you are looking for, you are done.
  • If it isn’t the item you are searching for move on and check the next item.
  • Continue checking items until you find the one you are searching for.

If you used this algorithm you might get lucky and find what you are looking for on your first go, but if you were really unlucky you might have to look through everything in your list before you found the right object! For a list of 10 items this means on average you would only have to look at 5 items to find what you were looking for, but for a list of 10000 you would have to look through on average 5000.

Curiosity: Bozo Search

If you watched the video at the beginning of the chapter you might be thinking that what you did in the present searching game sounds more like Bozo Search than Linear Search, but actually Bozo Search is even sillier than this! If you were doing a Bozo Search then after unwrapping a present and finding a monster inside, you would wrap the present back up before you moved on to the next one! This means you might end up checking the same present again and again and again and you might never find the pet, even with a small number of presents!

A much better algorithm to use is called Binary Search. In the second part of the present searching game the boxes were in order, which meant you were able to be more clever when you were searching for the pet, and you might have been using a Binary Search without realising...

If you used a Binary Search on each of the levels then you would have always had enough lives to find the pet! Informally, the Binary Search algorithm is as follows.

  • Look at the item in the centre of the list and compare it to what you are searching for
  • If it is what you are looking for then you are done.
  • If it is larger than the item you are looking for then you can ignore all the items in the list which are larger than that item (if the list is from smallest to largest this means you can ignore all the items to the right of the centre item).
  • If it is smaller then you can ignore all the items in the list which are smaller than that centre item.
  • Now repeat the algorithm on the remaining half of the list, checking the middle of the list and choosing one of the halves, until you find the item you are searching for.

Binary Search is a very powerful algorithm. If you had 1000 presents to Search through it would take you at most 10 checks for Binary search to find something and Linear search would take at most 1000 checks, but if you doubled the number of presents to search through how would this change the number of checks made by Binary Search and Linear search?

Hopefully you’ve noticed that the answer for each of these algorithms would be different.

It is important to remember that you can only perform a Binary Search if the items you are searching through are sorted into order. This makes the sorting algorithms we will look at next even more important because without sorting algorithms we wouldn’t be able to use Binary Search to quickly look through data!

The following files will run linear and binary search in various languages:

2.3. Sorting algorithms

Sorting is another very important area of algorithms. Computers often have to sort large amounts of data into order based on some attribute of that data, such as sorting a list of files by their name or size, or emails by the date they were received, or a customer list according to people’s names. Most of the time this is done to make searching easier. For example you might have a large amount of data and each piece of data could be someone’s name and their phone number. If you want to search for someone by name it would help to first have the data sorted alphabetically according to everyones names, but if you then wanted to search for a phone number it would be more useful to have the data sorted according to people’s phone numbers.

Like searching there are many different sorting algorithms, but some take much longer than others. In this section you will be introduced to two slower algorithms and one much better one.

2.3.1. Scales Interactive

Throughout this section you can use the sorting interactive to test out the algorithms we talk about. When you’re using it make sure you take note of the comparisons at the bottom of the screen, each time you compare two boxes the algorithm is making ‘one comparison’ so the total number of comparisons you have to make with each algorithm is the cost of that algorithm for the 8 boxes.

Use the scales to compare the boxes (you can only compare two boxes at a time) and then arrange them along the bottom of the screen. Arrange them so that the lightest box is on the far left and the heaviest is on the far right. Once you think they are in order click ‘Test order’.

If the interactive does not run properly on your computer you can use a set of physical balance scales instead — just make sure you can only tell if one box is heavier than the other, not their exact weight (so not digital scales that show the exact weight).

2.3.2. Selection Sort

One of the most intuitive ways to sort a group of boxes into order, from lightest to heaviest, is to start by first finding the lightest (or the heaviest) box and placing that to the side. Try this with the scales interactive.

After finding the lightest box simply repeat the process again with the remaining boxes until you find the second lightest, now place that to the side alongside the lightest box. If you keep repeating this process you will eventually find you have placed each box into order. Try sorting the whole group of boxes in the scales interactive into order using this method and count how many comparisons you have to make.

Tip: Start by moving all the boxes to the right of the screen and then once you have found the lightest box place it to the far right (if you want to find the heaviest first instead then move them all to the left).

If you record how many comparisons you had to make each time to find the next lightest box you might notice a pattern (hint: finding the lightest should take 7 comparisons, and then finding the second lightest should take 6 comparisons…). If you can see the pattern then how many comparisons do you think it would take to then sort 9 boxes into order? What about 20? If you knew how many comparisons it would take to sort 1000 boxes, then how many more comparisons would it take to sort 1001 instead?

This algorithm is called Selection sort, because each time you look through the list you are ‘selecting’ the next lightest box and putting it into the correct position. If you go back to the algorithms racing interactive at the top of the page you might now be able to watch the selection sort list and understand what it is doing at each step.

The selection sort algorithm can be described as follows.

  • Find the smallest item in the list and place it to one side. This will be your sorted list.
  • Next find the smallest item in the remaining list, remove it and place it into your sorted list beside the item you previously put to the side.
  • Repeat this process until all items have been selected and moved into their correct position in the sorted list.

You can swap the word ‘smallest’ for ‘largest’ and the algorithm will still work, as long as you are consistent it doesn’t matter if you are looking for the smallest or the largest item each time.

2.3.3. Insertion Sort

This algorithm works by removing each box from the original group of boxes and inserting it into its correct position in a new sorted list. Like Selection Sort, it is very intuitive and people often perform it when they are sorting objects themselves, like cards in their hands.

Try this with the scales interactive. Start by moving all the boxes to one side of the screen, this is your original, and unsorted, group. Now choose a box at random and place that on the other side of the screen, this is the start of your sorted group.

To insert another box into the sorted group, compare it to the box that is already in the sorted group and then arrange these two boxes in the correct order. Then to add the next box compare it to these boxes (depending on the weight of the box you might only have to compare it to one!) and then arrange these three boxes in the correct order. Continue inserting boxes until the sorted list is complete. Don’t forget to count how many comparisons you had to make!

This algorithm is called Insertion Sort. If you’re not quite sure if you’ve got the idea of the algorithm yet then have a look at this animation from Wikipedia.

Insertion sort can be described with informal instructions as follows:

  • Take an item from your unsorted list and place it to the side, this will be your sorted list.
  • One by one, take each item from the unsorted list and insert it into the correct position in the sorted list.
  • Do this until all items have been sorted.

People often perform this when they physically sort items. It can also be a very useful algorithm to use if you already have a sorted set of data and want to add a new piece of data into the set. For example if you owned a library and purchased a new book you wouldn’t do a Selection Sort on the entire library just to place this new book, you would simply insert the new book in its correct place.

2.3.4. Quicksort

Insertion and Selection Sort may seem like logical ways to sort things into order, but they both take far too many comparisons when they are used for large amounts of data. Remember computers often have to search through HUGE amounts of data, so even if they use a good searching algorithm like Binary Search to look through their data, if they use a bad sorting algorithm to first sort that data into order then finding anything will take far too long!

A much better sorting algorithm is Quicksort! (the name is a bit of a giveaway)

This algorithm is a little more complicated, but is very powerful. To do this algorithm with the sorting interactive, start by randomly choosing a box and placing it on the scales. Now compare every other box to the one you selected; heavier boxes should be put on the right of the second row and lighter boxes are put on the left. When you are done, place the box you were comparing everything else to between these two groups, but to help you keep track of things, put it in the row below. The following example shows how it might look after this step. Note that the selected block is in the right place for the final sorted order, and everything on either side will remain on the side that it is on.

Quicksort interactive in progress

Now apply this process to each of the two groups of boxes (the lighter ones, then the heavier ones). Keep on doing this until they are all sorted. The boxes should then be in sorted order!

It might be worth trying this algorithm out a few times and counting the number of comparisons you perform each time. This is because sometimes you might be unlucky and happen to pick the heaviest, or the lightest box first. On the other hand you might be very lucky and choose the middle box to compare everything to first. Depending on this the number of comparisons you perform will change.

Quicksort can be described in the following way:

  • Choose an item from the list and compare every other item in the list to this (this item is often called the pivot).
  • Place all the items that are greater than it into one subgroup and all the items that are smaller into another subgroup. Place the pivot item in between these two subgroups.
  • Choose a subgroup and repeat this process. Eventually each subgroup will contain only one item and at this stage the items will be in sorted order.

The following files will run quicksort in various languages:

2.4. Other topics in algorithms

  • There is another searching algorithm which performs even better than Binary Search. It is called Hashing and can be investigated with the CS Unplugged Battleships Game.
  • There are some problems for which no good algorithms have been found (and many people believe they will never be found). For more on these kinds of algorithms see the Complexity and Tractability chapter in the Field Guide.

2.5. The whole story!

We’ve only really scraped the surface of algorithms in this chapter, as there are millions of different algorithms for millions of different problems! Algorithms are used in maths, route planning, network planning and operation, problem solving, artificial intelligence, genetic programming, computer vision, the list goes on and on! But by going through this chapter you should have gained an understanding of the key concepts of algorithms and will be well prepared to tackle more complicated ones in the future.

In this chapter we have only talked about the number of comparisons an algorithm makes, and the amount of time a program takes to complete as ‘costs’ of algorithms. There are actually many other ways of measuring the cost of an algorithm. These include the amount of memory the algorithm uses and its computational complexity. Computer Scientists use ‘Big O notation’ to more accurately describe the performance or complexity of an algorithm, and you are likely to come across this notation very quickly when investigating the performance of algorithms. It characterises the resources needed by an algorithm and is usually applied to the execution time required, or sometimes the space used by the algorithm.

Here are some Big O examples:

  • O(1) - An algorithm with O(1) complexity will always execute in the same amount of time regardless of how much data you give it to process
  • O(n) - The amount of time an algorithm with O(n) complexity will take to execute will increase linearly and in direct proportion to the amount of data you give it to process. Remember that Big O describes the worst case scenario so the algorithm might sometimes take less time, but the greatest amount of time it can take will increase in direct proportion to the amount of data it is given.
  • O(n2) - The performance of an algorithm with this complexity is directly proportional to the square of the size of the input data set.
  • O(2n) - The amount of time an algorithm with this complexity will take to complete will double with each additional element added to the data set! Does this remind you of any of the algorithms you have looked at in this chapter?

Big O Notation however requires some advanced mathematics to explore thoroughly so has been intentionally left out of this main chapter, but if you want to learn more check out the Useful Links section. These topics are looked at in more depth in the Complexity and Tractability chapter.

To make things even more complicated, in practice algorithms are running on computers that have cached memory and virtual memory, where the time to access a particular value can be particularly short, or particularly long. There is a whole range of algorithms that are used for this situation to make sure that the algorithm still runs efficiently in such environments. Such algorithms are still based on the ideas we’ve looked at in this chapter, but require some clever adjustments to ensure that they work well.

2.6. Further reading