Haupz Blog

... still a totally disordered mix

Menace

2025-12-15 — Michael Haupt

Watching a German quiz show a while back, I learned, from this most unexpected of sources, that a man named Donald Michie had devised, in the early 1960s, an entire machine learning solution for playing Tic-Tac-Toe. It applies reinforcement learning to improve over time, to the point where games against a strong human player who knows the optimal strategy end in a draw every time.

The model is called MENACE (Matchbox Educable Noughts And Crosses Engine). What's particularly fun about it is that it is built from matchboxes filled with coloured beads. Each box represents one possible board position, and each bead colour stands for one of the squares. To decide on the next move, a bead is drawn at random from the box for the current position. After a game, beads that led to a win are multiplied in their boxes, which increases the chance of those moves being picked again. Beads that led to a loss are removed.
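To make the bead mechanics concrete, here is a minimal Python sketch of a MENACE-style learner. It is not Michie's exact setup: the starting bead count (4 per move), the rewards (3 beads added per move of a won game, 1 per move of a drawn game, 1 removed per move of a lost game), the random-playing opponent, and all names such as `Menace` and `train` are hypothetical choices for illustration. It also does not merge rotated or mirrored positions, so it uses far more "matchboxes" than a physical build would, and it keeps at least one bead in every box rather than letting a box run empty.

```python
import random

# All eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

# Hypothetical tuning values (assumptions, not taken from the post).
INITIAL_BEADS = 4   # beads per legal move in a fresh matchbox
WIN_BONUS = 3       # beads added for each move of a won game
DRAW_BONUS = 1      # beads added for each move of a drawn game
LOSS_PENALTY = 1    # beads removed for each move of a lost game


def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


class Menace:
    """Matchbox learner playing X: one box per board layout, one bead colour per empty cell."""

    def __init__(self, rng):
        self.rng = rng
        self.boxes = {}    # board string -> {cell index: bead count}
        self.history = []  # (board, cell) pairs played in the current game

    def choose(self, board):
        key = ''.join(board)
        if key not in self.boxes:
            # A fresh matchbox gets the same number of beads for every empty cell.
            self.boxes[key] = {i: INITIAL_BEADS for i, c in enumerate(board) if c == ' '}
        box = self.boxes[key]
        cells = list(box)
        # Drawing a bead at random: each cell's chance is proportional to its bead count.
        cell = self.rng.choices(cells, weights=[box[c] for c in cells])[0]
        self.history.append((key, cell))
        return cell

    def learn(self, result):
        """Reinforce every move of the finished game; result is 'win', 'draw' or 'loss'."""
        for key, cell in self.history:
            box = self.boxes[key]
            if result == 'win':
                box[cell] += WIN_BONUS
            elif result == 'draw':
                box[cell] += DRAW_BONUS
            else:
                # Simplification: keep at least one bead so the box never runs empty.
                box[cell] = max(1, box[cell] - LOSS_PENALTY)
        self.history = []


def play(menace, rng):
    """Play one game: MENACE is X and moves first, the opponent picks random empty cells."""
    board = [' '] * 9
    for turn in range(9):
        if turn % 2 == 0:
            board[menace.choose(board)] = 'X'
        else:
            board[rng.choice([i for i, c in enumerate(board) if c == ' '])] = 'O'
        w = winner(board)
        if w:
            return 'win' if w == 'X' else 'loss'
    return 'draw'


def train(games, seed=0):
    """Play and learn from a number of games; return the learner and the list of results."""
    rng = random.Random(seed)
    menace = Menace(rng)
    results = []
    for _ in range(games):
        result = play(menace, rng)
        menace.learn(result)
        results.append(result)
    return menace, results
```

Usage: `menace, results = train(20000)` followed by comparing `results[:2000].count('loss')` with `results[-2000:].count('loss')` should show the learner losing less often once its boxes have filled up with beads for good moves. Nothing in the code "understands" Tic-Tac-Toe; the improvement comes purely from shifting bead counts.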

This illustrates nicely that reinforcement learning is about statistics. It is so simple that children can understand it.

Tags: the-nerdy-bit