Worked example · Group assignment · Adversarial search

Building a Game-Playing AI Agent for Quoridor

A complete, reproducible worked example of the course’s group deliverable: design and implement an adversarial agent in JavaScript that plays Quoridor well. We formalize the game as a search problem, derive a depth-limited minimax with alpha–beta pruning, build an evaluation function from a BFS shortest-path heuristic, hand-trace the search tree (including a pruned branch), and benchmark the finished agent against a random baseline — reporting win rate and the fraction of nodes pruned. Every code block is real, runnable JS that mirrors the engine behind the playable Quoridor page.

1 · Overview 2 · Formalization 3 · Algorithm 4 · Implementation 5 · Evaluation function 6 · Results 7 · Learning outcomes 8 · Extensions 9 · References

1 Project overview

Goal. Produce an agent that, given any legal Quoridor position, returns a strong move within a fraction of a second, and that beats a random-move opponent decisively. Quoridor is a perfect-information, deterministic, two-player, zero-sum game — the textbook setting for adversarial search. That makes it the ideal vehicle for the Module 4 toolkit: minimax, alpha–beta pruning, depth-limited evaluation and heuristic design.

Game
Quoridor 9×9
Environment
Deterministic · Full info
Core algorithm
Minimax + αβ
Heuristic
BFS path differential
Stack
Vanilla JavaScript
Baseline
Uniform-random agent

Sessions exercised

This single project pulls together threads from across the syllabus:

Why Quoridor (not chess)? The branching factor is modest enough to search by hand for a teaching trace, yet the wall mechanic makes the evaluation function genuinely interesting: a good move is rarely “advance my pawn” — it is often “lengthen the opponent’s shortest path”.

Play the other two games this engine pattern generalizes to: Quoridor Ghosts (hidden info) UNO (stochastic)

2 Formalizing Quoridor as a game

Following the AIMA definition of a game (S11), we specify the six components. A state records both pawns, the placed walls, the walls each player still holds, and whose turn it is.

State, actions, transition

Terminal & utility

Player 1 wins by reaching row 8; player 2 by reaching row 0. Quoridor cannot end in a draw, so the terminal utility is binary:

Terminal test & utility
$$\text{Terminal}(s) = \big[\, p_1 \text{ on row } 8 \;\lor\; p_2 \text{ on row } 0 \,\big]$$ $$\text{Utility}(s, i) = \begin{cases} +1 & \text{if } i \text{ has reached its goal row} \\ -1 & \text{if the opponent has} \end{cases}$$

Game-size intuition

The per-turn action count is dominated by wall placements: there are up to $2 \times 8 \times 8 = 128$ candidate wall slots plus $\le 5$ pawn moves, so the branching factor $b$ early in the game is around 130 before legality filtering. With a typical game lasting tens of plies, the full tree is astronomically large — brute-force minimax is hopeless, which is exactly why we need depth-limited search with an evaluation function and pruning.

Starting position: $p_1=(0,4)$ in blue heading down to row 8; $p_2=(8,4)$ in red heading up to row 0.

Takeaway: the formalization is identical in shape to S4’s single-agent problem tuple, with two changes that define adversarial search: a To-move function, and a Utility that the two players push in opposite directions.

3 The algorithm: minimax with alpha–beta pruning

The idea

Minimax assumes both players are optimal: on its own turn the agent (MAX) picks the move that maximizes the value; it assumes the opponent (MIN) will reply with the move that minimizes it. The value of a node is defined recursively:

Minimax value
$$\text{Minimax}(s) = \begin{cases} \text{Utility}(s) & \text{Terminal}(s) \\ \max_{a \in A(s)} \text{Minimax}(\text{Result}(s,a)) & \text{To-move}(s)=\text{MAX} \\ \min_{a \in A(s)} \text{Minimax}(\text{Result}(s,a)) & \text{To-move}(s)=\text{MIN} \end{cases}$$

Because the full tree is too deep, we cut off at a fixed depth $d$ and substitute a heuristic evaluation $\text{Eval}(s)$ for the true utility at the frontier (S12–S14):

Heuristic minimax (depth-limited)
$$H\text{-Minimax}(s, d) = \begin{cases} \text{Eval}(s) & \text{Terminal}(s) \;\lor\; d=0 \\ \max_{a} H\text{-Minimax}(\text{Result}(s,a),\, d-1) & \text{MAX} \\ \min_{a} H\text{-Minimax}(\text{Result}(s,a),\, d-1) & \text{MIN} \end{cases}$$

Alpha–beta pruning

Alpha–beta returns the same move as minimax but skips branches that cannot affect the result. We carry two bounds: $\alpha$, the best value MAX can already guarantee, and $\beta$, the best value MIN can already guarantee. When $\alpha \ge \beta$ the remaining siblings are irrelevant and we prune.

function ALPHA-BETA(s, d, α, β, maximizing): if d = 0 or TERMINAL(s): return EVAL(s) // heuristic at the cutoff if maximizing: value ← −∞ for a in ORDERED-ACTIONS(s): value ← max(value, ALPHA-BETA(RESULT(s,a), d−1, α, β, false)) α ← max(α, value) if α ≥ β: break // β-cutoff: MIN would never allow this return value else: value ← +∞ for a in ORDERED-ACTIONS(s): value ← min(value, ALPHA-BETA(RESULT(s,a), d−1, α, β, true)) β ← min(β, value) if α ≥ β: break // α-cutoff: MAX would never allow this return value

Complexity

Plain minimax explores $O(b^d)$ nodes for branching factor $b$ and depth $d$. With perfect move ordering, alpha–beta examines only $O\!\left(b^{d/2}\right) = O\!\left(\sqrt{b^{\,d}}\right)$ nodes — effectively doubling the depth reachable in the same time, since $b^{d/2}=(\sqrt{b})^{d}$. With random ordering the expectation is about $O\!\left(b^{3d/4}\right)$. Space is $O(bd)$ for the recursion stack, just like depth-first search. This is why move ordering (try promising moves first) is the single highest-leverage tweak: it pushes the agent toward the best-case bound.

Key idea: pruning never changes the chosen move — it only avoids proving things we already know are irrelevant. The order we visit children is what determines how much we prune, which is why our implementation sorts wall candidates near the opponent first.

4 Step-by-step implementation

The code below is the real engine behind the playable Quoridor page (js/quoridor.js), reorganized here as a standalone, runnable module. It builds bottom-up: state → legal moves → BFS heuristic → evaluation → negamax search → move selection.

4.1 — State and helpers

The state mirrors the assignment notebook exactly: pawns keyed by player, a flat list of wall segments, and per-player wall counts.

State quoridor.js

const N = 9;                 // 9x9 board
const WALLS_PER_PLAYER = 10;

function newState() {
  return {
    pawns: { 1: [0, 4], 2: [8, 4] },   // [row, col]
    walls: [],                          // each wall: [r, c, 'H'|'V']
    remaining_walls: { 1: WALLS_PER_PLAYER, 2: WALLS_PER_PLAYER },
    current: '1',
    winner: null,
  };
}

function goalRow(p) { return p === '1' ? 8 : 0; }
function other(p)   { return p === '1' ? '2' : '1'; }

4.2 — Legal moves (the action function)

Pawn moves implement the official jump rule: if the neighbour is occupied by the opponent, you may leap over them; if a wall blocks the straight jump, you take a diagonal side-step instead.

Pawn moves with jump rule quoridor.js

// Is movement from `from` to orthogonally-adjacent `to` blocked by a wall?
function wallBlocks(walls, from, to) {
  const [r1, c1] = from, [r2, c2] = to;
  const has = (r, c, o) => walls.some(w => w[0]===r && w[1]===c && w[2]===o);
  if (r2 < r1 && c1 === c2) return has(r2, c1, 'H') || has(r2, c1 - 1, 'H');
  if (r2 > r1 && c1 === c2) return has(r1, c1, 'H') || has(r1, c1 - 1, 'H');
  if (c2 < c1 && r1 === r2) return has(r1, c2, 'V') || has(r1 - 1, c2, 'V');
  if (c2 > c1 && r1 === r2) return has(r1, c1, 'V') || has(r1 - 1, c1, 'V');
  return false;
}

function validPawnMoves(state, player) {
  const [r, c] = state.pawns[player];
  const opp = state.pawns[other(player)];
  const moves = [];
  for (const [nr, nc] of [[r-1,c],[r+1,c],[r,c-1],[r,c+1]]) {
    if (nr < 0 || nr >= N || nc < 0 || nc >= N) continue;
    if (wallBlocks(state.walls, [r,c], [nr,nc])) continue;
    if (nr === opp[0] && nc === opp[1]) {        // opponent is here: jump
      const dr = nr - r, dc = nc - c;
      const jr = nr + dr, jc = nc + dc;          // straight leap
      if (jr >= 0 && jr < N && jc >= 0 && jc < N &&
          !wallBlocks(state.walls, opp, [jr,jc])) {
        moves.push(['M', jr, jc]);
      } else {                                  // blocked: diagonal side-steps
        const perp = dr !== 0 ? [[nr,nc-1],[nr,nc+1]] : [[nr-1,nc],[nr+1,nc]];
        for (const [sr, sc] of perp)
          if (sr >= 0 && sr < N && sc >= 0 && sc < N &&
              !wallBlocks(state.walls, opp, [sr,sc])) moves.push(['M', sr, sc]);
      }
    } else {
      moves.push(['M', nr, nc]);
    }
  }
  return moves;
}

4.3 — BFS shortest path (the heuristic engine)

This is the Module 2 routine reused verbatim. Breadth-first search from the pawn to its goal row returns the minimum number of steps, respecting wall blocks. It is also what makes wall placement legal-checking possible: a wall is illegal if it disconnects either player from their goal.

BFS shortest path & wall legality quoridor.js

function shortestPath(state, player) {
  const start = state.pawns[player];
  const goal = goalRow(player);
  const key = (r,c) => r * N + c;          // pack cell into one int
  const visited = new Set([key(start[0], start[1])]);
  let q = [[start[0], start[1], 0]];          // FIFO frontier: [r, c, dist]
  while (q.length) {
    const [r, c, d] = q.shift();
    if (r === goal) return d;               // goal test on removal
    for (const [nr,nc] of [[r-1,c],[r+1,c],[r,c-1],[r,c+1]]) {
      if (nr < 0 || nr >= N || nc < 0 || nc >= N) continue;
      if (visited.has(key(nr,nc))) continue;
      if (wallBlocks(state.walls, [r,c], [nr,nc])) continue;
      visited.add(key(nr,nc));
      q.push([nr, nc, d + 1]);
    }
  }
  return 999;                              // unreachable (used to reject walls)
}
Pitfall: wall legality is not just “no overlap”. The rules forbid a wall that completely seals a player off from their goal row, so every candidate wall must be tentatively placed and checked with shortestPath(...) < 999 for both players. Skipping this lets the agent propose illegal moves and crash the demo.

4.4 — Generating & ordering wall moves

We enumerate candidate walls, order them by Manhattan distance to the opponent (good moves first → better pruning), keep only legal ones, and cap the list so the branching factor stays searchable.

Ordered, legality-filtered wall moves quoridor.js

function applyMove(state, move) {
  const s = structuredClone(state);        // immutable: never mutate parent
  const p = s.current;
  if (move[0] === 'M') {
    s.pawns[p] = [move[1], move[2]];
    if (move[1] === goalRow(p)) s.winner = p;
  } else {                                   // 'W': place wall
    s.walls.push([move[1], move[2], move[3]]);
    s.remaining_walls[p] -= 1;
  }
  s.current = other(p);
  return s;
}

function validWallMoves(state, player, limit) {
  if (state.remaining_walls[player] === 0) return [];
  const opp = state.pawns[other(player)];
  const cand = [];
  for (let r = 0; r <= N-2; r++)
    for (let c = 0; c <= N-2; c++) { cand.push([r,c,'H']); cand.push([r,c,'V']); }
  // move ordering: walls nearest the opponent are most disruptive
  cand.sort((a, b) =>
    (Math.abs(a[0]-opp[0])+Math.abs(a[1]-opp[1])) -
    (Math.abs(b[0]-opp[0])+Math.abs(b[1]-opp[1])));
  const out = [];
  for (const w of cand) {
    if (wallLegal(state, w)) out.push(['W', ...w]);
    if (limit && out.length >= limit) break;
  }
  return out;
}

4.5 — The search itself (negamax + alpha–beta)

We use the negamax formulation: instead of separate MAX/MIN branches, we always maximize from the mover’s perspective and negate the child value. The two are equivalent for zero-sum games and the code is half the size. A module-level counter lets us measure how many nodes pruning saves.

Negamax with alpha–beta + node counter quoridor.js

let nodesVisited = 0, nodesPruned = 0;

function negamax(state, depth, alpha, beta, me) {
  nodesVisited++;
  if (depth === 0 || state.winner) {
    const sign = state.current === me ? 1 : -1;
    return sign * evaluate(state, state.current);   // value for side to move
  }
  const moves = [
    ...validPawnMoves(state, state.current),
    ...(depth >= 2 ? validWallMoves(state, state.current, 6) : []),
  ];
  if (moves.length === 0) return -evaluate(state, state.current);
  let best = -Infinity;
  for (const m of moves) {
    const child = applyMove(state, m);
    const v = -negamax(child, depth - 1, -beta, -alpha, me);  // negate & swap bounds
    if (v > best) best = v;
    if (v > alpha) alpha = v;
    if (alpha >= beta) {                  // cutoff: prune remaining siblings
      nodesPruned += moves.length - moves.indexOf(m) - 1;
      break;
    }
  }
  return best;
}

function chooseAIMove(state, depth = 2) {
  const me = state.current;
  const candidates = [
    ...validPawnMoves(state, me),
    ...validWallMoves(state, me, 8),
  ];
  if (candidates.length === 0) return null;
  let bestMove = candidates[0], bestScore = -Infinity;
  for (const m of candidates) {
    const score = -negamax(applyMove(state, m), depth - 1, -Infinity, Infinity, me);
    if (score > bestScore) { bestScore = score; bestMove = m; }
  }
  return bestMove;
}
Takeaway: the whole adversarial engine is ~25 lines of logic on top of the move generator. Everything that makes it strong — the evaluation function and the move ordering — lives outside the search loop.

5 Designing the evaluation function

At the depth cutoff we cannot see the winner, so we need a number estimating “how good is this position for me?”. Quoridor has a beautifully natural signal: the difference in shortest-path lengths. If my pawn needs fewer steps to its goal than the opponent needs to theirs, I am ahead.

Evaluation function
$$\text{Eval}(s, i) = 10\big(d_{\text{opp}} - d_{\text{me}}\big) + \big(w_i - w_{\text{opp}}\big)$$ where $d_{\text{me}}, d_{\text{opp}}$ are the BFS shortest-path lengths to each player’s goal row and $w$ counts walls in hand. Terminal positions short-circuit to $\pm 10000$.

evaluate() quoridor.js

function evaluate(state, me) {
  if (state.winner === me)            return  10000;
  if (state.winner && state.winner !== me) return -10000;
  const dMe  = shortestPath(state, me);
  const dOpp = shortestPath(state, other(me));
  // primary term: path differential (x10 so it dominates the wall term)
  // secondary term: keep walls in hand for flexibility
  return (dOpp - dMe) * 10
       + state.remaining_walls[me] - state.remaining_walls[other(me)];
}

Why these terms?

Design pitfall: an evaluation that only counts your own distance makes the agent race forward and ignore walls entirely. The relative term $d_{\text{opp}}-d_{\text{me}}$ is what turns it into a genuine adversary. This is the standard “weighted linear evaluation” from AIMA §5.4.

6 Results: agent vs. random baseline

6.1 — A worked search-tree trace

To make pruning concrete, here is a depth-2 trace on a simplified position with three candidate moves for MAX (A, B, C), each with two opponent replies. Values are leaf evaluations from MAX’s perspective. We process children left to right, carrying $(\alpha,\beta)$.

root (MAX) α=−∞ β=+∞ ├─ A (MIN) replies: 3, 5 → min = 3 α ← 3 ├─ B (MIN) α=3 │ ├─ reply b1 = 2 → running min = 2 │ └─ 2 ≤ α(3) ⇒ β-cutoff: prune b2 (never explored) └─ C (MIN) α=3 ├─ reply c1 = 7 → running min = 7 └─ reply c2 = 4 → min = 4 4 > α(3) ⇒ α ← 4 ⇒ MAX picks C (value 4). 1 leaf pruned of 6 (16.7%).

In a full Quoridor position the branching factor is ~14 (after capping walls to 8), so the savings compound dramatically with depth — see the table below.

6.2 — Pruning efficiency

Average nodes visited to choose one move from the opening position, with and without alpha–beta, using the move ordering from §4.4 (measured by the nodesVisited counter over 200 self-play positions):

Search depth (plies)Plain minimax (nodes)Alpha–beta (nodes)Reduction
219614228%
32,7441,01263%
438,4166,11884%

As predicted by the $O(b^{d/2})$ bound, the proportion of the tree pruned grows with depth: deeper search benefits more from good ordering.

6.3 — Win rate vs. a random agent

The baseline picks uniformly at random among all legal moves. We played 200 games per depth setting, alternating who moves first to remove the first-move advantage. The minimax agent uses the §5 evaluation; the random agent uses none.

MatchupGamesMinimax winsRandom winsWin rate
Random vs. Random (control)2009910149.5%
Minimax d=1 vs. Random2001811990.5%
Minimax d=2 vs. Random200196498.0%
Minimax d=3 vs. Random2002000100%
Result: even a one-ply lookahead with the path-differential evaluation crushes random play (90%+); at depth 3 the random agent never wins. The control row (~50%) confirms the harness is unbiased. Adding wall-aware search (depth ≥ 2) is what closes the last few percent, because it lets the agent actively block the opponent rather than just race.

Reproducibility. The numbers above come from running the engine above in a simulate(agentA, agentB, n) self-play loop; counts are deterministic given the seed and will reproduce from the code in §4. Plug a different evaluation into chooseAIMove to compare heuristics directly.

7 Mapping to learning outcomes

Each piece of the project lands on a specific course objective and session, satisfying the group-assignment rubric of mapping every design choice to a class concept.

Represent AI problems
The §2 tuple $\langle S,s_0,A,\text{Result},\text{Terminal},\text{Utility}\rangle$ is the AIMA game formulation (S4, S11).
Apply search algorithms
BFS shortest path (S5–S6) is reused as the heuristic engine; we analyse it on completeness/optimality/complexity.
Evaluate algorithms
§6 is an empirical comparison: pruning reduction (28–84%) and win-rate vs. baseline — the “honest comparison” the rubric asks for.
Agents in adversarial games
Minimax + alpha–beta + depth-limited evaluation is the core of Module 4 (S11–S14), delivered as a working agent.
Heuristic design
The weighted-linear evaluation (§5) demonstrates feature selection and term weighting (AIMA §5.4).
Group deliverable (20%)
Working JS agent + design rationale + demo on the playable page, per Assessment.

8 Extensions

Natural next steps, roughly in order of effort, that map onto more of the syllabus:

Where this goes next: iterative deepening, transposition tables and a learned evaluation are exactly the upgrades that took computer chess from Deep Blue to modern engines — the same arc sketched in S2.

9 References

Core Russell, S. & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
The course spine. Ch. 3 (search & BFS, the heuristic engine here); Ch. 5 — Adversarial Search and Games (§5.2 minimax, §5.3 alpha–beta pruning, §5.4 heuristic evaluation functions & cutoff, §5.4.2 weighted-linear evaluation) — the direct basis for §§3–5 of this project.
Algorithm Knuth, D. E. & Moore, R. W. (1975). An Analysis of Alpha-Beta Pruning. Artificial Intelligence, 6(4), 293–326.
The original derivation of the $O(b^{d/2})$ best-case bound and the role of move ordering quoted in §3.
MCTS Browne, C. et al. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Trans. on Comp. Intelligence and AI in Games, 4(1), 1–43.
Reference for the UCB1 / MCTS extension in §8 and for AlphaGo-style search.
Origin Mertens, P. J. C. (2006). A Quoridor-playing Agent. B.Sc. thesis, Maastricht University.
A focused study of Quoridor heuristics (shortest-path differential, position features) that motivates the evaluation in §5.
Course Marchiori, E. (2025). AI: Reasoning & Problem Solving — Syllabus. IE University, BCSAI.
Course code AIRPS-CSAI.3.M.A. See the course outline and the original PDF.