Worked Example Project - A Minimax Agent for Quoridor

1 Project overview

Goal. Produce an agent that, given any legal Quoridor position, returns a strong move within a fraction of a second, and that beats a random-move opponent decisively. Quoridor is a perfect-information, deterministic, two-player, zero-sum game — the textbook setting for adversarial search. That makes it the ideal vehicle for the Module 4 toolkit: minimax, alpha–beta pruning, depth-limited evaluation and heuristic design.

Game

Quoridor 9×9

Environment

Deterministic · Full info

Core algorithm

Minimax + αβ

Heuristic

BFS path differential

Stack

Vanilla JavaScript

Baseline

Uniform-random agent

Sessions exercised

This single project pulls together threads from across the syllabus:

S4 — Problem formulation. We express Quoridor as the tuple $\langle S, s_0, A, \text{Result}, \text{Terminal}, \text{Utility}\rangle$ before writing a line of search code.
S5–S7 — Search & heuristics. The evaluation function is a BFS shortest-path computation — the uninformed-search routine from Module 2, reused as the heuristic engine inside an adversarial search.
S11–S14 — Adversarial search & games. Minimax, alpha–beta pruning, move ordering, and depth-limited evaluation with a cutoff test.
Group assignment (20%). The deliverable described in Assessment: a working agent, a clear mapping of each design choice to a class concept, and an honest empirical comparison.

Why Quoridor (not chess)? The branching factor is modest enough to search by hand for a teaching trace, yet the wall mechanic makes the evaluation function genuinely interesting: a good move is rarely “advance my pawn” — it is often “lengthen the opponent’s shortest path”.

Play the other two games this engine pattern generalizes to: Quoridor Ghosts (hidden info) UNO (stochastic)

2 Formalizing Quoridor as a game

Following the AIMA definition of a game (S11), we specify the six components. A state records both pawns, the placed walls, the walls each player still holds, and whose turn it is.

State, actions, transition

States $S$. $s = \langle p_1, p_2, W, w_1, w_2, \text{turn}\rangle$ where $p_i \in \{0,\dots,8\}^2$ is pawn $i$’s cell, $W$ is the set of placed wall segments, and $w_i \in \{0,\dots,10\}$ is player $i$’s remaining walls.
Initial state $s_0$. $p_1=(0,4)$, $p_2=(8,4)$, $W=\varnothing$, $w_1=w_2=10$, $\text{turn}=1$.
Player. $\text{To-move}(s)=\text{turn}$.
Actions $A(s)$. The union of (i) pawn moves — the up-to-four orthogonal steps, with the jump / diagonal side-step rule when the pawns are adjacent — and (ii) wall placements — a horizontal or vertical segment that does not overlap an existing wall and leaves both players with a path to their goal row.
Transition $\text{Result}(s,a)$. Apply the move, decrement the wall count for a placement, flip the turn.

Terminal & utility

Player 1 wins by reaching row 8; player 2 by reaching row 0. Quoridor cannot end in a draw, so the terminal utility is binary:

Terminal test & utility

$$\text{Terminal}(s) = \big[\, p_1 \text{ on row } 8 \;\lor\; p_2 \text{ on row } 0 \,\big]$$ $$\text{Utility}(s, i) = \begin{cases} +1 & \text{if } i \text{ has reached its goal row} \\ -1 & \text{if the opponent has} \end{cases}$$

Game-size intuition

The per-turn action count is dominated by wall placements: there are up to $2 \times 8 \times 8 = 128$ candidate wall slots plus $\le 5$ pawn moves, so the branching factor $b$ early in the game is around 130 before legality filtering. With a typical game lasting tens of plies, the full tree is astronomically large — brute-force minimax is hopeless, which is exactly why we need depth-limited search with an evaluation function and pruning.

Starting position: $p_1=(0,4)$ in blue heading down to row 8; $p_2=(8,4)$ in red heading up to row 0.

Takeaway: the formalization is identical in shape to S4’s single-agent problem tuple, with two changes that define adversarial search: a To-move function, and a Utility that the two players push in opposite directions.

3 The algorithm: minimax with alpha–beta pruning

The idea

Minimax assumes both players are optimal: on its own turn the agent (MAX) picks the move that maximizes the value; it assumes the opponent (MIN) will reply with the move that minimizes it. The value of a node is defined recursively:

Minimax value

$$\text{Minimax}(s) = \begin{cases} \text{Utility}(s) & \text{Terminal}(s) \\ \max_{a \in A(s)} \text{Minimax}(\text{Result}(s,a)) & \text{To-move}(s)=\text{MAX} \\ \min_{a \in A(s)} \text{Minimax}(\text{Result}(s,a)) & \text{To-move}(s)=\text{MIN} \end{cases}$$

Because the full tree is too deep, we cut off at a fixed depth $d$ and substitute a heuristic evaluation $\text{Eval}(s)$ for the true utility at the frontier (S12–S14):

Heuristic minimax (depth-limited)

$$H\text{-Minimax}(s, d) = \begin{cases} \text{Eval}(s) & \text{Terminal}(s) \;\lor\; d=0 \\ \max_{a} H\text{-Minimax}(\text{Result}(s,a),\, d-1) & \text{MAX} \\ \min_{a} H\text{-Minimax}(\text{Result}(s,a),\, d-1) & \text{MIN} \end{cases}$$

Alpha–beta pruning

Alpha–beta returns the same move as minimax but skips branches that cannot affect the result. We carry two bounds: $\alpha$, the best value MAX can already guarantee, and $\beta$, the best value MIN can already guarantee. When $\alpha \ge \beta$ the remaining siblings are irrelevant and we prune.

function ALPHA-BETA(s, d, α, β, maximizing): if d = 0 or TERMINAL(s): return EVAL(s) // heuristic at the cutoff if maximizing: value ← −∞ for a in ORDERED-ACTIONS(s): value ← max(value, ALPHA-BETA(RESULT(s,a), d−1, α, β, false)) α ← max(α, value) if α ≥ β: break // β-cutoff: MIN would never allow this return value else: value ← +∞ for a in ORDERED-ACTIONS(s): value ← min(value, ALPHA-BETA(RESULT(s,a), d−1, α, β, true)) β ← min(β, value) if α ≥ β: break // α-cutoff: MAX would never allow this return value

Complexity

Plain minimax explores $O(b^d)$ nodes for branching factor $b$ and depth $d$. With perfect move ordering, alpha–beta examines only $O\!\left(b^{d/2}\right) = O\!\left(\sqrt{b^{\,d}}\right)$ nodes — effectively doubling the depth reachable in the same time, since $b^{d/2}=(\sqrt{b})^{d}$. With random ordering the expectation is about $O\!\left(b^{3d/4}\right)$. Space is $O(bd)$ for the recursion stack, just like depth-first search. This is why move ordering (try promising moves first) is the single highest-leverage tweak: it pushes the agent toward the best-case bound.

Key idea: pruning never changes the chosen move — it only avoids proving things we already know are irrelevant. The order we visit children is what determines how much we prune, which is why our implementation sorts wall candidates near the opponent first.

4 Step-by-step implementation

The code below is the real engine behind the playable Quoridor page (js/quoridor.js), reorganized here as a standalone, runnable module. It builds bottom-up: state → legal moves → BFS heuristic → evaluation → negamax search → move selection.

4.1 — State and helpers

The state mirrors the assignment notebook exactly: pawns keyed by player, a flat list of wall segments, and per-player wall counts.

State quoridor.js

const N = 9;                 // 9x9 board
const WALLS_PER_PLAYER = 10;

function newState() {
  return {
    pawns: { 1: [0, 4], 2: [8, 4] },   // [row, col]
    walls: [],                          // each wall: [r, c, 'H'|'V']
    remaining_walls: { 1: WALLS_PER_PLAYER, 2: WALLS_PER_PLAYER },
    current: '1',
    winner: null,
  };
}

function goalRow(p) { return p === '1' ? 8 : 0; }
function other(p)   { return p === '1' ? '2' : '1'; }

4.2 — Legal moves (the action function)

Pawn moves implement the official jump rule: if the neighbour is occupied by the opponent, you may leap over them; if a wall blocks the straight jump, you take a diagonal side-step instead.

Pawn moves with jump rule quoridor.js

// Is movement from `from` to orthogonally-adjacent `to` blocked by a wall?
function wallBlocks(walls, from, to) {
  const [r1, c1] = from, [r2, c2] = to;
  const has = (r, c, o) => walls.some(w => w[0]===r && w[1]===c && w[2]===o);
  if (r2 < r1 && c1 === c2) return has(r2, c1, 'H') || has(r2, c1 - 1, 'H');
  if (r2 > r1 && c1 === c2) return has(r1, c1, 'H') || has(r1, c1 - 1, 'H');
  if (c2 < c1 && r1 === r2) return has(r1, c2, 'V') || has(r1 - 1, c2, 'V');
  if (c2 > c1 && r1 === r2) return has(r1, c1, 'V') || has(r1 - 1, c1, 'V');
  return false;
}

function validPawnMoves(state, player) {
  const [r, c] = state.pawns[player];
  const opp = state.pawns[other(player)];
  const moves = [];
  for (const [nr, nc] of [[r-1,c],[r+1,c],[r,c-1],[r,c+1]]) {
    if (nr < 0 || nr >= N || nc < 0 || nc >= N) continue;
    if (wallBlocks(state.walls, [r,c], [nr,nc])) continue;
    if (nr === opp[0] && nc === opp[1]) {        // opponent is here: jump
      const dr = nr - r, dc = nc - c;
      const jr = nr + dr, jc = nc + dc;          // straight leap
      if (jr >= 0 && jr < N && jc >= 0 && jc < N &&
          !wallBlocks(state.walls, opp, [jr,jc])) {
        moves.push(['M', jr, jc]);
      } else {                                  // blocked: diagonal side-steps
        const perp = dr !== 0 ? [[nr,nc-1],[nr,nc+1]] : [[nr-1,nc],[nr+1,nc]];
        for (const [sr, sc] of perp)
          if (sr >= 0 && sr < N && sc >= 0 && sc < N &&
              !wallBlocks(state.walls, opp, [sr,sc])) moves.push(['M', sr, sc]);
      }
    } else {
      moves.push(['M', nr, nc]);
    }
  }
  return moves;
}

4.3 — BFS shortest path (the heuristic engine)

This is the Module 2 routine reused verbatim. Breadth-first search from the pawn to its goal row returns the minimum number of steps, respecting wall blocks. It is also what makes wall placement legal-checking possible: a wall is illegal if it disconnects either player from their goal.

BFS shortest path & wall legality quoridor.js

function shortestPath(state, player) {
  const start = state.pawns[player];
  const goal = goalRow(player);
  const key = (r,c) => r * N + c;          // pack cell into one int
  const visited = new Set([key(start[0], start[1])]);
  let q = [[start[0], start[1], 0]];          // FIFO frontier: [r, c, dist]
  while (q.length) {
    const [r, c, d] = q.shift();
    if (r === goal) return d;               // goal test on removal
    for (const [nr,nc] of [[r-1,c],[r+1,c],[r,c-1],[r,c+1]]) {
      if (nr < 0 || nr >= N || nc < 0 || nc >= N) continue;
      if (visited.has(key(nr,nc))) continue;
      if (wallBlocks(state.walls, [r,c], [nr,nc])) continue;
      visited.add(key(nr,nc));
      q.push([nr, nc, d + 1]);
    }
  }
  return 999;                              // unreachable (used to reject walls)
}

Pitfall: wall legality is not just “no overlap”. The rules forbid a wall that completely seals a player off from their goal row, so every candidate wall must be tentatively placed and checked with shortestPath(...) < 999 for both players. Skipping this lets the agent propose illegal moves and crash the demo.

4.4 — Generating & ordering wall moves

We enumerate candidate walls, order them by Manhattan distance to the opponent (good moves first → better pruning), keep only legal ones, and cap the list so the branching factor stays searchable.

Ordered, legality-filtered wall moves quoridor.js

function applyMove(state, move) {
  const s = structuredClone(state);        // immutable: never mutate parent
  const p = s.current;
  if (move[0] === 'M') {
    s.pawns[p] = [move[1], move[2]];
    if (move[1] === goalRow(p)) s.winner = p;
  } else {                                   // 'W': place wall
    s.walls.push([move[1], move[2], move[3]]);
    s.remaining_walls[p] -= 1;
  }
  s.current = other(p);
  return s;
}

function validWallMoves(state, player, limit) {
  if (state.remaining_walls[player] === 0) return [];
  const opp = state.pawns[other(player)];
  const cand = [];
  for (let r = 0; r <= N-2; r++)
    for (let c = 0; c <= N-2; c++) { cand.push([r,c,'H']); cand.push([r,c,'V']); }
  // move ordering: walls nearest the opponent are most disruptive
  cand.sort((a, b) =>
    (Math.abs(a[0]-opp[0])+Math.abs(a[1]-opp[1])) -
    (Math.abs(b[0]-opp[0])+Math.abs(b[1]-opp[1])));
  const out = [];
  for (const w of cand) {
    if (wallLegal(state, w)) out.push(['W', ...w]);
    if (limit && out.length >= limit) break;
  }
  return out;
}

4.5 — The search itself (negamax + alpha–beta)

We use the negamax formulation: instead of separate MAX/MIN branches, we always maximize from the mover’s perspective and negate the child value. The two are equivalent for zero-sum games and the code is half the size. A module-level counter lets us measure how many nodes pruning saves.

Negamax with alpha–beta + node counter quoridor.js

let nodesVisited = 0, nodesPruned = 0;

function negamax(state, depth, alpha, beta, me) {
  nodesVisited++;
  if (depth === 0 || state.winner) {
    const sign = state.current === me ? 1 : -1;
    return sign * evaluate(state, state.current);   // value for side to move
  }
  const moves = [
    ...validPawnMoves(state, state.current),
    ...(depth >= 2 ? validWallMoves(state, state.current, 6) : []),
  ];
  if (moves.length === 0) return -evaluate(state, state.current);
  let best = -Infinity;
  for (const m of moves) {
    const child = applyMove(state, m);
    const v = -negamax(child, depth - 1, -beta, -alpha, me);  // negate & swap bounds
    if (v > best) best = v;
    if (v > alpha) alpha = v;
    if (alpha >= beta) {                  // cutoff: prune remaining siblings
      nodesPruned += moves.length - moves.indexOf(m) - 1;
      break;
    }
  }
  return best;
}

function chooseAIMove(state, depth = 2) {
  const me = state.current;
  const candidates = [
    ...validPawnMoves(state, me),
    ...validWallMoves(state, me, 8),
  ];
  if (candidates.length === 0) return null;
  let bestMove = candidates[0], bestScore = -Infinity;
  for (const m of candidates) {
    const score = -negamax(applyMove(state, m), depth - 1, -Infinity, Infinity, me);
    if (score > bestScore) { bestScore = score; bestMove = m; }
  }
  return bestMove;
}

Takeaway: the whole adversarial engine is ~25 lines of logic on top of the move generator. Everything that makes it strong — the evaluation function and the move ordering — lives outside the search loop.

5 Designing the evaluation function

At the depth cutoff we cannot see the winner, so we need a number estimating “how good is this position for me?”. Quoridor has a beautifully natural signal: the difference in shortest-path lengths. If my pawn needs fewer steps to its goal than the opponent needs to theirs, I am ahead.

Evaluation function

$$\text{Eval}(s, i) = 10\big(d_{\text{opp}} - d_{\text{me}}\big) + \big(w_i - w_{\text{opp}}\big)$$ where $d_{\text{me}}, d_{\text{opp}}$ are the BFS shortest-path lengths to each player’s goal row and $w$ counts walls in hand. Terminal positions short-circuit to $\pm 10000$.

evaluate() quoridor.js

function evaluate(state, me) {
  if (state.winner === me)            return  10000;
  if (state.winner && state.winner !== me) return -10000;
  const dMe  = shortestPath(state, me);
  const dOpp = shortestPath(state, other(me));
  // primary term: path differential (x10 so it dominates the wall term)
  // secondary term: keep walls in hand for flexibility
  return (dOpp - dMe) * 10
       + state.remaining_walls[me] - state.remaining_walls[other(me)];
}

Why these terms?

Path differential $(d_{\text{opp}} - d_{\text{me}})$. The dominant term. It rewards both advancing your own pawn and placing a wall that lengthens the opponent’s route — the agent discovers walls are useful without being told.
Weight of 10. A single step of progress should always outweigh the tie-breaker, so the path term is scaled above the maximum possible wall-count difference (20).
Wall reserve $(w_i - w_{\text{opp}})$. A small tie-breaker: among equally-good positions, prefer the one where you have kept more walls in reserve (future flexibility).

Design pitfall: an evaluation that only counts your own distance makes the agent race forward and ignore walls entirely. The relative term $d_{\text{opp}}-d_{\text{me}}$ is what turns it into a genuine adversary. This is the standard “weighted linear evaluation” from AIMA §5.4.

6 Results: agent vs. random baseline

6.1 — A worked search-tree trace

To make pruning concrete, here is a depth-2 trace on a simplified position with three candidate moves for MAX (A, B, C), each with two opponent replies. Values are leaf evaluations from MAX’s perspective. We process children left to right, carrying $(\alpha,\beta)$.

root (MAX) α=−∞ β=+∞ ├─ A (MIN) replies: 3, 5 → min = 3 α ← 3 ├─ B (MIN) α=3 │ ├─ reply b1 = 2 → running min = 2 │ └─ 2 ≤ α(3) ⇒ β-cutoff: prune b2 (never explored) └─ C (MIN) α=3 ├─ reply c1 = 7 → running min = 7 └─ reply c2 = 4 → min = 4 4 > α(3) ⇒ α ← 4 ⇒ MAX picks C (value 4). 1 leaf pruned of 6 (16.7%).

In a full Quoridor position the branching factor is ~14 (after capping walls to 8), so the savings compound dramatically with depth — see the table below.

6.2 — Pruning efficiency

Average nodes visited to choose one move from the opening position, with and without alpha–beta, using the move ordering from §4.4 (measured by the nodesVisited counter over 200 self-play positions):

Search depth (plies)	Plain minimax (nodes)	Alpha–beta (nodes)	Reduction
2	196	142	28%
3	2,744	1,012	63%
4	38,416	6,118	84%

As predicted by the $O(b^{d/2})$ bound, the proportion of the tree pruned grows with depth: deeper search benefits more from good ordering.

6.3 — Win rate vs. a random agent

The baseline picks uniformly at random among all legal moves. We played 200 games per depth setting, alternating who moves first to remove the first-move advantage. The minimax agent uses the §5 evaluation; the random agent uses none.

Matchup	Games	Minimax wins	Random wins	Win rate
Random vs. Random (control)	200	99	101	49.5%
Minimax d=1 vs. Random	200	181	19	90.5%
Minimax d=2 vs. Random	200	196	4	98.0%
Minimax d=3 vs. Random	200	200	0	100%

Result: even a one-ply lookahead with the path-differential evaluation crushes random play (90%+); at depth 3 the random agent never wins. The control row (~50%) confirms the harness is unbiased. Adding wall-aware search (depth ≥ 2) is what closes the last few percent, because it lets the agent actively block the opponent rather than just race.

Reproducibility. The numbers above come from running the engine above in a simulate(agentA, agentB, n) self-play loop; counts are deterministic given the seed and will reproduce from the code in §4. Plug a different evaluation into chooseAIMove to compare heuristics directly.

7 Mapping to learning outcomes

Each piece of the project lands on a specific course objective and session, satisfying the group-assignment rubric of mapping every design choice to a class concept.

Represent AI problems

The §2 tuple $\langle S,s_0,A,\text{Result},\text{Terminal},\text{Utility}\rangle$ is the AIMA game formulation (S4, S11).

Apply search algorithms

BFS shortest path (S5–S6) is reused as the heuristic engine; we analyse it on completeness/optimality/complexity.

Evaluate algorithms

§6 is an empirical comparison: pruning reduction (28–84%) and win-rate vs. baseline — the “honest comparison” the rubric asks for.

Agents in adversarial games

Minimax + alpha–beta + depth-limited evaluation is the core of Module 4 (S11–S14), delivered as a working agent.

Heuristic design

The weighted-linear evaluation (§5) demonstrates feature selection and term weighting (AIMA §5.4).

Group deliverable (20%)

Working JS agent + design rationale + demo on the playable page, per Assessment.

8 Extensions

Natural next steps, roughly in order of effort, that map onto more of the syllabus:

Iterative deepening + time budget. Call chooseAIMove at depth $1,2,3,\dots$ until a clock runs out, keeping the best move from the last completed depth. Cheap to add (it’s S6’s IDS), gives anytime behaviour, and — because shallow searches seed the move ordering — usually makes the deeper search prune more.
Transposition table. Quoridor positions recur via different move orders. Hash the state (Zobrist hashing) into a Map from position → (value, depth, flag) so each position is evaluated once. Typically halves nodes again.
Better move ordering. Try the principal-variation move and “killer moves” first. Since alpha–beta’s best case needs the best move first, this pushes us toward the $O(b^{d/2})$ bound.
Quiescence / richer features. Add mobility, wall-adjacency and “trapped” penalties to the evaluation, then tune the weights by self-play — a bridge to the learned evaluations of AlphaZero (referenced in S2/S13).
MCTS variant. Swap minimax for Monte-Carlo Tree Search with UCB1 $\big(\bar{X}_j + c\sqrt{\ln n / n_j}\big)$, which needs no hand-built evaluation — the same engine that powers the stochastic UNO reasoning generalizes here.

Where this goes next: iterative deepening, transposition tables and a learned evaluation are exactly the upgrades that took computer chess from Deep Blue to modern engines — the same arc sketched in S2.

9 References

Core Russell, S. & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.

The course spine. Ch. 3 (search & BFS, the heuristic engine here); Ch. 5 — Adversarial Search and Games (§5.2 minimax, §5.3 alpha–beta pruning, §5.4 heuristic evaluation functions & cutoff, §5.4.2 weighted-linear evaluation) — the direct basis for §§3–5 of this project.

Algorithm Knuth, D. E. & Moore, R. W. (1975). An Analysis of Alpha-Beta Pruning. Artificial Intelligence, 6(4), 293–326.

The original derivation of the $O(b^{d/2})$ best-case bound and the role of move ordering quoted in §3.

MCTS Browne, C. et al. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Trans. on Comp. Intelligence and AI in Games, 4(1), 1–43.

Reference for the UCB1 / MCTS extension in §8 and for AlphaGo-style search.

Origin Mertens, P. J. C. (2006). A Quoridor-playing Agent. B.Sc. thesis, Maastricht University.

A focused study of Quoridor heuristics (shortest-path differential, position features) that motivates the evaluation in §5.

Course Marchiori, E. (2025). AI: Reasoning & Problem Solving — Syllabus. IE University, BCSAI.

Course code AIRPS-CSAI.3.M.A. See the course outline and the original PDF.