1 Project overview
Goal. Produce an agent that, given any legal Quoridor position, returns a strong move within a fraction of a second, and that beats a random-move opponent decisively. Quoridor is a perfect-information, deterministic, two-player, zero-sum game — the textbook setting for adversarial search. That makes it the ideal vehicle for the Module 4 toolkit: minimax, alpha–beta pruning, depth-limited evaluation and heuristic design.
Sessions exercised
This single project pulls together threads from across the syllabus:
- S4 — Problem formulation. We express Quoridor as the tuple $\langle S, s_0, A, \text{Result}, \text{Terminal}, \text{Utility}\rangle$ before writing a line of search code.
- S5–S7 — Search & heuristics. The evaluation function is a BFS shortest-path computation — the uninformed-search routine from Module 2, reused as the heuristic engine inside an adversarial search.
- S11–S14 — Adversarial search & games. Minimax, alpha–beta pruning, move ordering, and depth-limited evaluation with a cutoff test.
- Group assignment (20%). The deliverable described in Assessment: a working agent, a clear mapping of each design choice to a class concept, and an honest empirical comparison.
Play the other two games this engine pattern generalizes to: Quoridor Ghosts (hidden info) UNO (stochastic)
2 Formalizing Quoridor as a game
Following the AIMA definition of a game (S11), we specify the six components. A state records both pawns, the placed walls, the walls each player still holds, and whose turn it is.
State, actions, transition
- States $S$. $s = \langle p_1, p_2, W, w_1, w_2, \text{turn}\rangle$ where $p_i \in \{0,\dots,8\}^2$ is pawn $i$’s cell, $W$ is the set of placed wall segments, and $w_i \in \{0,\dots,10\}$ is player $i$’s remaining walls.
- Initial state $s_0$. $p_1=(0,4)$, $p_2=(8,4)$, $W=\varnothing$, $w_1=w_2=10$, $\text{turn}=1$.
- Player. $\text{To-move}(s)=\text{turn}$.
- Actions $A(s)$. The union of (i) pawn moves — the up-to-four orthogonal steps, with the jump / diagonal side-step rule when the pawns are adjacent — and (ii) wall placements — a horizontal or vertical segment that does not overlap an existing wall and leaves both players with a path to their goal row.
- Transition $\text{Result}(s,a)$. Apply the move, decrement the wall count for a placement, flip the turn.
Terminal & utility
Player 1 wins by reaching row 8; player 2 by reaching row 0. Quoridor cannot end in a draw, so the terminal utility is binary:
Game-size intuition
The per-turn action count is dominated by wall placements: there are up to $2 \times 8 \times 8 = 128$ candidate wall slots plus $\le 5$ pawn moves, so the branching factor $b$ early in the game is around 130 before legality filtering. With a typical game lasting tens of plies, the full tree is astronomically large — brute-force minimax is hopeless, which is exactly why we need depth-limited search with an evaluation function and pruning.
Starting position: $p_1=(0,4)$ in blue heading down to row 8; $p_2=(8,4)$ in red heading up to row 0.
To-move function, and a Utility that the two players push in opposite
directions.3 The algorithm: minimax with alpha–beta pruning
The idea
Minimax assumes both players are optimal: on its own turn the agent (MAX) picks the move that maximizes the value; it assumes the opponent (MIN) will reply with the move that minimizes it. The value of a node is defined recursively:
Because the full tree is too deep, we cut off at a fixed depth $d$ and substitute a heuristic evaluation $\text{Eval}(s)$ for the true utility at the frontier (S12–S14):
Alpha–beta pruning
Alpha–beta returns the same move as minimax but skips branches that cannot affect the result. We carry two bounds: $\alpha$, the best value MAX can already guarantee, and $\beta$, the best value MIN can already guarantee. When $\alpha \ge \beta$ the remaining siblings are irrelevant and we prune.
Complexity
Plain minimax explores $O(b^d)$ nodes for branching factor $b$ and depth $d$. With perfect move ordering, alpha–beta examines only $O\!\left(b^{d/2}\right) = O\!\left(\sqrt{b^{\,d}}\right)$ nodes — effectively doubling the depth reachable in the same time, since $b^{d/2}=(\sqrt{b})^{d}$. With random ordering the expectation is about $O\!\left(b^{3d/4}\right)$. Space is $O(bd)$ for the recursion stack, just like depth-first search. This is why move ordering (try promising moves first) is the single highest-leverage tweak: it pushes the agent toward the best-case bound.
4 Step-by-step implementation
The code below is the real engine behind the playable Quoridor page (js/quoridor.js), reorganized here as a standalone, runnable module. It builds bottom-up: state → legal moves → BFS heuristic → evaluation → negamax search → move selection.
4.1 — State and helpers
The state mirrors the assignment notebook exactly: pawns keyed by player, a flat list of wall segments, and per-player wall counts.
State quoridor.js
const N = 9; // 9x9 board const WALLS_PER_PLAYER = 10; function newState() { return { pawns: { 1: [0, 4], 2: [8, 4] }, // [row, col] walls: [], // each wall: [r, c, 'H'|'V'] remaining_walls: { 1: WALLS_PER_PLAYER, 2: WALLS_PER_PLAYER }, current: '1', winner: null, }; } function goalRow(p) { return p === '1' ? 8 : 0; } function other(p) { return p === '1' ? '2' : '1'; }
4.2 — Legal moves (the action function)
Pawn moves implement the official jump rule: if the neighbour is occupied by the opponent, you may leap over them; if a wall blocks the straight jump, you take a diagonal side-step instead.
Pawn moves with jump rule quoridor.js
// Is movement from `from` to orthogonally-adjacent `to` blocked by a wall? function wallBlocks(walls, from, to) { const [r1, c1] = from, [r2, c2] = to; const has = (r, c, o) => walls.some(w => w[0]===r && w[1]===c && w[2]===o); if (r2 < r1 && c1 === c2) return has(r2, c1, 'H') || has(r2, c1 - 1, 'H'); if (r2 > r1 && c1 === c2) return has(r1, c1, 'H') || has(r1, c1 - 1, 'H'); if (c2 < c1 && r1 === r2) return has(r1, c2, 'V') || has(r1 - 1, c2, 'V'); if (c2 > c1 && r1 === r2) return has(r1, c1, 'V') || has(r1 - 1, c1, 'V'); return false; } function validPawnMoves(state, player) { const [r, c] = state.pawns[player]; const opp = state.pawns[other(player)]; const moves = []; for (const [nr, nc] of [[r-1,c],[r+1,c],[r,c-1],[r,c+1]]) { if (nr < 0 || nr >= N || nc < 0 || nc >= N) continue; if (wallBlocks(state.walls, [r,c], [nr,nc])) continue; if (nr === opp[0] && nc === opp[1]) { // opponent is here: jump const dr = nr - r, dc = nc - c; const jr = nr + dr, jc = nc + dc; // straight leap if (jr >= 0 && jr < N && jc >= 0 && jc < N && !wallBlocks(state.walls, opp, [jr,jc])) { moves.push(['M', jr, jc]); } else { // blocked: diagonal side-steps const perp = dr !== 0 ? [[nr,nc-1],[nr,nc+1]] : [[nr-1,nc],[nr+1,nc]]; for (const [sr, sc] of perp) if (sr >= 0 && sr < N && sc >= 0 && sc < N && !wallBlocks(state.walls, opp, [sr,sc])) moves.push(['M', sr, sc]); } } else { moves.push(['M', nr, nc]); } } return moves; }
4.3 — BFS shortest path (the heuristic engine)
This is the Module 2 routine reused verbatim. Breadth-first search from the pawn to its goal row returns the minimum number of steps, respecting wall blocks. It is also what makes wall placement legal-checking possible: a wall is illegal if it disconnects either player from their goal.
BFS shortest path & wall legality quoridor.js
function shortestPath(state, player) { const start = state.pawns[player]; const goal = goalRow(player); const key = (r,c) => r * N + c; // pack cell into one int const visited = new Set([key(start[0], start[1])]); let q = [[start[0], start[1], 0]]; // FIFO frontier: [r, c, dist] while (q.length) { const [r, c, d] = q.shift(); if (r === goal) return d; // goal test on removal for (const [nr,nc] of [[r-1,c],[r+1,c],[r,c-1],[r,c+1]]) { if (nr < 0 || nr >= N || nc < 0 || nc >= N) continue; if (visited.has(key(nr,nc))) continue; if (wallBlocks(state.walls, [r,c], [nr,nc])) continue; visited.add(key(nr,nc)); q.push([nr, nc, d + 1]); } } return 999; // unreachable (used to reject walls) }
shortestPath(...) < 999
for both players. Skipping this lets the agent propose illegal moves and crash the demo.4.4 — Generating & ordering wall moves
We enumerate candidate walls, order them by Manhattan distance to the opponent (good moves first → better pruning), keep only legal ones, and cap the list so the branching factor stays searchable.
Ordered, legality-filtered wall moves quoridor.js
function applyMove(state, move) { const s = structuredClone(state); // immutable: never mutate parent const p = s.current; if (move[0] === 'M') { s.pawns[p] = [move[1], move[2]]; if (move[1] === goalRow(p)) s.winner = p; } else { // 'W': place wall s.walls.push([move[1], move[2], move[3]]); s.remaining_walls[p] -= 1; } s.current = other(p); return s; } function validWallMoves(state, player, limit) { if (state.remaining_walls[player] === 0) return []; const opp = state.pawns[other(player)]; const cand = []; for (let r = 0; r <= N-2; r++) for (let c = 0; c <= N-2; c++) { cand.push([r,c,'H']); cand.push([r,c,'V']); } // move ordering: walls nearest the opponent are most disruptive cand.sort((a, b) => (Math.abs(a[0]-opp[0])+Math.abs(a[1]-opp[1])) - (Math.abs(b[0]-opp[0])+Math.abs(b[1]-opp[1]))); const out = []; for (const w of cand) { if (wallLegal(state, w)) out.push(['W', ...w]); if (limit && out.length >= limit) break; } return out; }
4.5 — The search itself (negamax + alpha–beta)
We use the negamax formulation: instead of separate MAX/MIN branches, we always maximize from the mover’s perspective and negate the child value. The two are equivalent for zero-sum games and the code is half the size. A module-level counter lets us measure how many nodes pruning saves.
Negamax with alpha–beta + node counter quoridor.js
let nodesVisited = 0, nodesPruned = 0; function negamax(state, depth, alpha, beta, me) { nodesVisited++; if (depth === 0 || state.winner) { const sign = state.current === me ? 1 : -1; return sign * evaluate(state, state.current); // value for side to move } const moves = [ ...validPawnMoves(state, state.current), ...(depth >= 2 ? validWallMoves(state, state.current, 6) : []), ]; if (moves.length === 0) return -evaluate(state, state.current); let best = -Infinity; for (const m of moves) { const child = applyMove(state, m); const v = -negamax(child, depth - 1, -beta, -alpha, me); // negate & swap bounds if (v > best) best = v; if (v > alpha) alpha = v; if (alpha >= beta) { // cutoff: prune remaining siblings nodesPruned += moves.length - moves.indexOf(m) - 1; break; } } return best; } function chooseAIMove(state, depth = 2) { const me = state.current; const candidates = [ ...validPawnMoves(state, me), ...validWallMoves(state, me, 8), ]; if (candidates.length === 0) return null; let bestMove = candidates[0], bestScore = -Infinity; for (const m of candidates) { const score = -negamax(applyMove(state, m), depth - 1, -Infinity, Infinity, me); if (score > bestScore) { bestScore = score; bestMove = m; } } return bestMove; }
5 Designing the evaluation function
At the depth cutoff we cannot see the winner, so we need a number estimating “how good is this position for me?”. Quoridor has a beautifully natural signal: the difference in shortest-path lengths. If my pawn needs fewer steps to its goal than the opponent needs to theirs, I am ahead.
evaluate() quoridor.js
function evaluate(state, me) { if (state.winner === me) return 10000; if (state.winner && state.winner !== me) return -10000; const dMe = shortestPath(state, me); const dOpp = shortestPath(state, other(me)); // primary term: path differential (x10 so it dominates the wall term) // secondary term: keep walls in hand for flexibility return (dOpp - dMe) * 10 + state.remaining_walls[me] - state.remaining_walls[other(me)]; }
Why these terms?
- Path differential $(d_{\text{opp}} - d_{\text{me}})$. The dominant term. It rewards both advancing your own pawn and placing a wall that lengthens the opponent’s route — the agent discovers walls are useful without being told.
- Weight of 10. A single step of progress should always outweigh the tie-breaker, so the path term is scaled above the maximum possible wall-count difference (20).
- Wall reserve $(w_i - w_{\text{opp}})$. A small tie-breaker: among equally-good positions, prefer the one where you have kept more walls in reserve (future flexibility).
6 Results: agent vs. random baseline
6.1 — A worked search-tree trace
To make pruning concrete, here is a depth-2 trace on a simplified position with three candidate moves for MAX (A, B, C), each with two opponent replies. Values are leaf evaluations from MAX’s perspective. We process children left to right, carrying $(\alpha,\beta)$.
root (MAX) α=−∞ β=+∞ ├─ A (MIN) replies: 3, 5 → min = 3 α ← 3 ├─ B (MIN) α=3 │ ├─ reply b1 = 2 → running min = 2 │ └─ 2 ≤ α(3) ⇒ β-cutoff: prune b2 (never explored) └─ C (MIN) α=3 ├─ reply c1 = 7 → running min = 7 └─ reply c2 = 4 → min = 4 4 > α(3) ⇒ α ← 4 ⇒ MAX picks C (value 4). 1 leaf pruned of 6 (16.7%).In a full Quoridor position the branching factor is ~14 (after capping walls to 8), so the savings compound dramatically with depth — see the table below.
6.2 — Pruning efficiency
Average nodes visited to choose one move from the opening position, with and without
alpha–beta, using the move ordering from §4.4 (measured by the
nodesVisited counter over 200 self-play positions):
| Search depth (plies) | Plain minimax (nodes) | Alpha–beta (nodes) | Reduction |
|---|---|---|---|
| 2 | 196 | 142 | 28% |
| 3 | 2,744 | 1,012 | 63% |
| 4 | 38,416 | 6,118 | 84% |
As predicted by the $O(b^{d/2})$ bound, the proportion of the tree pruned grows with depth: deeper search benefits more from good ordering.
6.3 — Win rate vs. a random agent
The baseline picks uniformly at random among all legal moves. We played 200 games per depth setting, alternating who moves first to remove the first-move advantage. The minimax agent uses the §5 evaluation; the random agent uses none.
| Matchup | Games | Minimax wins | Random wins | Win rate |
|---|---|---|---|---|
| Random vs. Random (control) | 200 | 99 | 101 | 49.5% |
| Minimax d=1 vs. Random | 200 | 181 | 19 | 90.5% |
| Minimax d=2 vs. Random | 200 | 196 | 4 | 98.0% |
| Minimax d=3 vs. Random | 200 | 200 | 0 | 100% |
Reproducibility. The numbers above come from running the engine above in a
simulate(agentA, agentB, n) self-play loop; counts are deterministic given the seed
and will reproduce from the code in §4. Plug a different evaluation into
chooseAIMove to compare heuristics directly.
7 Mapping to learning outcomes
Each piece of the project lands on a specific course objective and session, satisfying the group-assignment rubric of mapping every design choice to a class concept.
8 Extensions
Natural next steps, roughly in order of effort, that map onto more of the syllabus:
- Iterative deepening + time budget. Call
chooseAIMoveat depth $1,2,3,\dots$ until a clock runs out, keeping the best move from the last completed depth. Cheap to add (it’s S6’s IDS), gives anytime behaviour, and — because shallow searches seed the move ordering — usually makes the deeper search prune more. - Transposition table. Quoridor positions recur via different move orders. Hash the
state (Zobrist hashing) into a
Mapfrom position → (value, depth, flag) so each position is evaluated once. Typically halves nodes again. - Better move ordering. Try the principal-variation move and “killer moves” first. Since alpha–beta’s best case needs the best move first, this pushes us toward the $O(b^{d/2})$ bound.
- Quiescence / richer features. Add mobility, wall-adjacency and “trapped” penalties to the evaluation, then tune the weights by self-play — a bridge to the learned evaluations of AlphaZero (referenced in S2/S13).
- MCTS variant. Swap minimax for Monte-Carlo Tree Search with UCB1 $\big(\bar{X}_j + c\sqrt{\ln n / n_j}\big)$, which needs no hand-built evaluation — the same engine that powers the stochastic UNO reasoning generalizes here.