The Sweet Learning Computer

Royal Society #summerscience 8 July 2016 - The Sweet Computer

How do machines learn? Don’t they just blindly follow rules? You can build a machine from nothing but cups and sweets that learns to beat humans at simple games. It learns from its mistakes (because you eat its sweets when it loses!). Let’s see how, by building one to play Ladder or Hexapawn.

The Sweet Learning Computer is just a tastier version of Donald Michie’s Matchbox Learning Computer (MENACE), which learned to play Noughts and Crosses. Donald Michie also worked at Bletchley Park during the war, helping to crack German ciphers. The idea of using such a machine to play Hexapawn was Martin Gardner’s.

Learning to Play Ladder (or Chocolate Chilli)

Ladder is a simple game, based on one called Chocolate Chilli, turned into a board game. It is played on a line of 9 squares plus a target square, and the first player has a winning strategy. The aim is for the Sweet Learning Computer to work out that strategy.
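The claim that the first player can always win can be checked by working backwards from the target. Here is an illustrative Python sketch of that idea; it assumes a position is described by how many places the X still is from the target, and the function names are hypothetical. A position is a win for the player about to move if at least one move leaves the opponent in a losing position.

```python
# Ladder: move the X 1, 2 or 3 places; whoever lands on the target wins.
# Assumption: a position is the number of places left between the X and the target.


def winning_positions(max_distance: int) -> dict[int, bool]:
    """Map each distance to True if the player about to move can force a win."""
    wins = {0: False}  # X already on the target: the previous player has won
    for d in range(1, max_distance + 1):
        # Winning if some legal move leaves the opponent in a losing position.
        wins[d] = any(not wins[d - m] for m in (1, 2, 3) if m <= d)
    return wins


def first_player_wins(board_squares: int) -> bool:
    # The X starts on the first square, board_squares places from the target.
    return winning_positions(board_squares)[board_squares]


print(first_player_wins(9))  # True
print(first_player_wins(5))  # True
```

The same check confirms that the smaller 5-square board used later in the demonstration is also a first-player win.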

You will need

To demonstrate the Sweet Learning Computer learning the game, you will need:

A set of 9 transparent plastic cups
Starburst Sweets (Original) or other similar wrapped sweets in the same colours: 9 Purple, 8 Red and 7 Green.
A large 9 x 1 board (e.g. a garden size noughts and crosses board laid out in a line rather than a square), or squares cut from card
A target square (e.g. colour a paper plate) for the winning square on the board.
An X piece (e.g. write X on the back of a paper plate)
Pen and squared paper to plot the progress of the sweet computer learning.

Here are sheets for the game positions to download and print:

JPG of Ladder game states

For a workshop, you will need one set for each pair of students, with a tabletop version of the board.

See the CS4FN blog post for an overview of the Ladder Sweet Learning Computer.

How to play

Place the 9 squares of the board in a line with the target at the top as a 10th square. Place the X piece on the first square at the other end from the target.

Two players take turns to move the single X piece 1, 2 or 3 places up the ladder towards the target, without going past it. The winner is the player who moves the X piece onto the target.

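(In Chocolate Chilli you have a pile of chocolate buttons and a chilli. On each move you eat 1, 2 or 3 buttons. If there are no buttons left when it is your turn, you lose and have to eat the chilli. With a pile of 9 buttons this is the same game as Ladder: eating the last button is like moving the X onto the target.)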
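The rules can be written down as a short Python sketch. Here a player is any function that is given the number of places left to the target and returns 1, 2 or 3; `play_ladder`, `legal_moves` and `random_player` are hypothetical names for illustration. Moves that would take the X past the target are treated as illegal, an assumption consistent with the cups described below, which hold fewer sweets near the target.

```python
import random
from typing import Callable

# A player is any function that is given the number of places left to the target
# and returns how many places to move the X (1, 2 or 3).
Player = Callable[[int], int]


def legal_moves(distance: int) -> list[int]:
    """Moves of 1, 2 or 3 places that do not take the X past the target."""
    return [move for move in (1, 2, 3) if move <= distance]


def play_ladder(first: Player, second: Player, board_squares: int = 9) -> int:
    """Play one game of Ladder; return 0 if the first player wins, 1 if the second does."""
    distance = board_squares  # the X starts on the square furthest from the target
    players = (first, second)
    turn = 0
    while True:
        move = players[turn](distance)
        if move not in legal_moves(distance):
            raise ValueError(f"illegal move {move} with {distance} places to go")
        distance -= move
        if distance == 0:  # this player moved the X onto the target
            return turn
        turn = 1 - turn


def random_player(distance: int) -> int:
    return random.choice(legal_moves(distance))


print(play_ladder(random_player, random_player))  # 0 or 1, depending on the random moves
```

Reading the distance as the number of chocolate buttons left, the same function plays Chocolate Chilli: whoever eats the last button wins.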

Setting up the Sweet Learning Computer

You need 9 pictures of the board, each with the X on a different one of the 9 squares, and with coloured arrows showing the possible moves: purple for one place, red for two and green for three.

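Pair a cup with each picture and put one sweet into it for each arrow, in the matching colour. Every cup therefore holds three sweets except the two nearest the target: the cup two places from the target holds two (purple and red) and the cup one place away holds just one (purple).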
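A minimal Python sketch of this setup shows where the sweet counts in the shopping list come from: one sweet per legal move gives 9 purple, 8 red and 7 green. It assumes each cup is labelled by how many places the X is from the target; `set_up_cups` and `COLOUR_FOR_MOVE` are hypothetical names.

```python
from collections import Counter

COLOUR_FOR_MOVE = {1: "purple", 2: "red", 3: "green"}


def set_up_cups(board_squares: int = 9) -> dict[int, list[str]]:
    """One cup per position, keyed by how many places the X is from the target.
    Each cup gets one sweet for every move that does not go past the target."""
    return {
        distance: [COLOUR_FOR_MOVE[move] for move in (1, 2, 3) if move <= distance]
        for distance in range(1, board_squares + 1)
    }


cups = set_up_cups()
print(cups[1])  # ['purple']
print(cups[2])  # ['purple', 'red']
print(Counter(sweet for cup in cups.values() for sweet in cup))
# Counter({'purple': 9, 'red': 8, 'green': 7})
```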

How the Sweet Computer Plays

The Sweet Computer plays first. When it is the machine’s turn, find the cup for the current position, shut your eyes and take a sweet from it at random. Place the sweet next to the cup. The machine makes the move shown by the colour of the sweet: a purple sweet means move one place, a red sweet two places and a green sweet three places. Move the X that many places. Turn then passes to the human, who ignores the sweets and plays whatever move they like, before play passes back to the machine.

If there are no sweets in the cup for the current position, the machine resigns immediately.
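One machine turn might be sketched in Python as follows. It assumes cups are keyed by how many places the X is from the target, and `machine_move` is a hypothetical name. The drawn sweet is only set aside, so the sketch leaves it in the cup; whether it gets eaten is decided after the game.

```python
import random
from typing import Optional

MOVE_FOR_COLOUR = {"purple": 1, "red": 2, "green": 3}


def machine_move(cups: dict[int, list[str]], distance: int) -> Optional[tuple[str, int]]:
    """Draw a sweet at random from the cup for the current position.

    Returns (colour, places to move), or None if the cup is empty and the
    machine resigns. The sweet is not removed from the cup here."""
    cup = cups[distance]
    if not cup:
        return None  # no sweets left: resign
    colour = random.choice(cup)  # shut your eyes and take a sweet
    return colour, MOVE_FOR_COLOUR[colour]


cups = {4: [], 3: ["purple", "red", "green"]}
print(machine_move(cups, 3))  # one of ('purple', 1), ('red', 2) or ('green', 3)
print(machine_move(cups, 4))  # None: the machine resigns
```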

How the Sweet Computer Learns

If the human wins, eat the sweet for the last move the machine made: it will never make that bad move again. Put the other sweets back in their cups.

If the machine wins, put all the sweets back. They were a good way to play.
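The learning rule itself is only a few lines. In this Python sketch (hypothetical names, cups keyed by how many places the X is from the target) the game's history is a list of (position, colour) pairs for the sweets the machine drew, in order.

```python
def learn_from_game(cups: dict[int, list[str]],
                    history: list[tuple[int, str]],
                    machine_won: bool) -> None:
    """Apply the learning rule after one game.

    If the human won, eat the sweet for the machine's last move; otherwise all
    the sweets simply go back in their cups, so nothing changes."""
    if machine_won or not history:
        return
    last_distance, last_colour = history[-1]
    cups[last_distance].remove(last_colour)  # eat one sweet of that colour


cups = {9: ["purple", "red", "green"], 5: ["purple", "red", "green"]}
# Machine: purple from 9 (to 8). Human: 3 places (to 5). Machine: red from 5 (to 3).
# Human: 3 places onto the target, so the red sweet in cup 5 is eaten.
learn_from_game(cups, [(9, "purple"), (5, "red")], machine_won=False)
print(cups[5])  # ['purple', 'green']
print(cups[9])  # ['purple', 'red', 'green']
```

Only the last move is punished: the earlier purple move from 9 was a good one and keeps its sweet.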

Demonstrating the Sweet Computer Learning

To demonstrate the game and how the Sweet Computer learns, play a few games on the full board. Then, to show it learning, focus on a board with just 5 squares (the 5 nearest the target) and play lots of games. In a workshop, students can do this in pairs. Fairly quickly the sweets for immediately losing moves will be eaten, until the machine always wins. The cups and sweets now represent a winning strategy for playing first on a 5-square board.

Now add the other 4 squares back and continue to play. If the machine ends up in a position whose cup has no sweets, it resigns, so the sweet for its previous move is eaten. In this way the learning back-propagates along the board towards the starting square, until the machine has worked out a perfect strategy for the 9-square board.
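To see the whole process end to end, here is a simulation sketch in Python. It makes two assumptions that are not part of the activity itself: the human is replaced by a hypothetical opponent that picks any legal move at random, and cups are keyed by how many places the X is from the target. The 5-square games use the 5 cups nearest the target, then the full board continues with the same cups. All names are hypothetical.

```python
import random

COLOUR_FOR_MOVE = {1: "purple", 2: "red", 3: "green"}
MOVE_FOR_COLOUR = {colour: move for move, colour in COLOUR_FOR_MOVE.items()}


def set_up_cups(board_squares: int) -> dict[int, list[str]]:
    return {d: [COLOUR_FOR_MOVE[m] for m in (1, 2, 3) if m <= d]
            for d in range(1, board_squares + 1)}


def play_and_learn(cups: dict[int, list[str]], board_squares: int,
                   rng: random.Random) -> bool:
    """Play one game with the machine moving first, apply the learning rule,
    and return True if the machine won."""
    distance = board_squares
    history = []  # (distance, colour) for each sweet the machine drew
    while True:
        if not cups[distance]:  # empty cup: the machine resigns
            machine_won = False
            break
        colour = rng.choice(cups[distance])
        history.append((distance, colour))
        distance -= MOVE_FOR_COLOUR[colour]
        if distance == 0:
            machine_won = True
            break
        # Assumed human opponent: any legal move, chosen uniformly at random.
        distance -= rng.choice([m for m in (1, 2, 3) if m <= distance])
        if distance == 0:
            machine_won = False
            break
    if not machine_won and history:
        last_distance, last_colour = history[-1]
        cups[last_distance].remove(last_colour)  # eat the sweet for its last move
    return machine_won


rng = random.Random(2016)  # fixed seed so runs are repeatable
cups = set_up_cups(9)
short_board = [play_and_learn(cups, 5, rng) for _ in range(500)]   # 5 squares first
full_board = [play_and_learn(cups, 9, rng) for _ in range(5000)]   # then all 9
print("wins in last 100 games, 5 squares:", sum(short_board[-100:]))
print("wins in last 100 games, 9 squares:", sum(full_board[-100:]))
```

Only moves that hand the human a winning position ever get eaten, so the sweets for good moves survive; once the bad moves in the cups the machine actually uses are gone, it wins every game.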

Plot wins and losses on squared paper, drawing an up diagonal for a machine win and a down diagonal for a loss. At first there will be lots of losses, but gradually the machine will win more and more, so the graph will head upwards.

Be very careful not to get mixed up over whose turn it is. If you do, you may eat sweets you shouldn’t, and the winning strategy could be lost!
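The graph is a running total: each win moves one square up and each loss one square down. This Python sketch uses `itertools.accumulate` to compute the height to plot after each game; `learning_curve` is a hypothetical name and the seven results in the example are made up, not recorded games.

```python
from itertools import accumulate


def learning_curve(results: list[bool]) -> list[int]:
    """Height of the squared-paper graph after each game:
    one square up for a machine win, one square down for a loss."""
    return list(accumulate(1 if machine_won else -1 for machine_won in results))


# Made-up run of seven games: lost, lost, won, lost, won, won, won.
print(learning_curve([False, False, True, False, True, True, True]))
# [-1, -2, -1, -2, -1, 0, 1]
```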

Things to think about

What is the strategy the machine works out for winning? Can you write it down as an algorithm?
What happens when the machine plays second? There is no guaranteed winning strategy for the second player, so what does it ultimately learn to do?
Can you change the learning algorithm so that it learns to give itself as much chance of winning as possible if the first player makes a mistake?
- Possible variations include adding extra sweets of the winning colour when it wins, never eating the last sweet of a cup, …and more complicated things…
Can you create a Sweet Learning Computer to play Chocolate Chilli?
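As an illustration of the variations suggested above, this Python sketch changes the learning rule in two of the listed ways: extra sweets after a win, and never eating the last sweet in a cup. It is only one possible reading of those ideas; `learn_with_variations` and its `bonus_sweets` parameter are hypothetical, and cups are assumed to be keyed by how many places the X is from the target.

```python
def learn_with_variations(cups: dict[int, list[str]],
                          history: list[tuple[int, str]],
                          machine_won: bool,
                          bonus_sweets: int = 1) -> None:
    """A hypothetical variant of the learning rule:
    - on a win, add bonus_sweets extra sweets of each colour the machine played;
    - on a loss, eat the sweet for its last move, unless it is the last one in that cup."""
    if machine_won:
        for distance, colour in history:
            cups[distance].extend([colour] * bonus_sweets)
    elif history:
        distance, colour = history[-1]
        if len(cups[distance]) > 1:  # never eat the last sweet in a cup
            cups[distance].remove(colour)


cups = {9: ["purple", "red", "green"], 5: ["purple", "red", "green"], 3: ["green"]}
# A win: purple from 9 (to 8), human to 5, purple from 5 (to 4), human to 3, green onto the target.
learn_with_variations(cups, [(9, "purple"), (5, "purple"), (3, "green")], machine_won=True)
print(cups[9])  # ['purple', 'red', 'green', 'purple']
print(cups[3])  # ['green', 'green']

# A loss whose last machine move used the only sweet left in its cup: the sweet is kept.
lost = {3: ["red"]}
learn_with_variations(lost, [(3, "red")], machine_won=False)
print(lost[3])  # ['red']
```

Adding sweets makes good moves more likely to be drawn, while keeping the last sweet means the machine never resigns; whether that helps when playing second is one of the questions above.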

Learning to Play Hexapawn

Hexapawn is a slightly more complex game, with a winning strategy for the second player. It takes a little more work to set up, but the Sweet Learning Computer works in the same way.

Resources

You will need:

A set of 24 transparent plastic cups
Starburst Sweets (Original) or other similar wrapped sweets in the same colours: 14 Red, 14 Green, 14 Orange and 13 Purple
A large 3 x 3 board (e.g. a garden size noughts and crosses board)
3 X and 3 O pieces (e.g. write X and Os on the back of paper plates)
Pen and squared paper to plot the progress of the sweet computer learning.