reasoning-games — search agents for Connect Four & Uno

What it is

Two regimes of game AI

Connect Four is perfect information with no chance — the textbook home of minimax with alpha–beta pruning. On small boards the agent searches all the way to terminal positions, so its moves are provably optimal; on the full 6×7 board it searches to a fixed depth with a threat heuristic.

Uno (a faithful but simplified variant) adds the two things minimax can't see: hidden opponent hands and a random draw pile. The agent handles both by determinization — it samples many full game states consistent with what it can observe, searches each by Monte-Carlo rollouts, and averages.

Honest scope. The early sketch for this project named Uno, Quoridor and Ghosts. What is actually built and tested is two games: Connect Four (perfect-information minimax) and simplified Uno (chance + hidden information). Quoridor and Ghosts are not implemented; Connect Four is the perfect-information game. The simplifications to Uno are listed in the README.

Results

Real win rates vs a random agent

From python demo.py — sides swapped each game, fixed seeds. These are measured, not estimated.

Match	Search agent	Games	Win rate
Connect Four 6×7	minimax depth-4 + alpha–beta	50	100%
Uno (2p, 7-card)	determinized Monte-Carlo	40	90%

In the test suite, the exact (unbounded-depth) minimax agent wins 27+/30 on a small board where it plays perfectly, and alpha–beta is checked to return the identical value to plain minimax.

Try it

Play Connect Four against alpha–beta

You are purple and move first; click a column to drop a disc. The opponent is a small alpha–beta search reimplemented in JavaScript just for this page. The canonical agent is the Python one in this repo — this is only a playable illustration.

Your move.

The stack

From a game state to a chosen move

interface

Game / Agent

A shared interface — legal_moves, apply, is_terminal, winner — so every game plugs into every agent and the match runner.

Minimax + alpha–beta

Exact to terminal on small boards; depth-limited with a heuristic on the full board. Centre-first ordering and a transposition table; value equals plain minimax.

heuristic

Threat counting

Score non-terminal Connect Four positions by counting length-4 windows that still belong to a single player.

uncertainty

Determinization

For Uno: sample full states consistent with what the agent can see, fixing opponents' hand sizes and the public discard.

rollouts

Monte-Carlo

Evaluate each move by greedy playouts across the sampled worlds, then average — the standard PIMC approach for card games.

harness

Match runner

Self-play and agent-vs-agent matches with side-swapping and seeds, so reported win rates are reproducible.

Run it

Locally

# install test deps (runtime is stdlib-only)
pip install -r requirements.txt

# run the 27 tests
python -m pytest -q

# search agents vs random, with real win rates
python demo.py