discrete-math-lab · worked project — a Sudoku solver from logic & backtracking

Worked project — a Sudoku solver from propositional logic and backtracking

A single, end-to-end worked example that turns one puzzle into a tour of the whole course: we encode Sudoku as a propositional constraint-satisfaction problem, recognise it as a graph-colouring instance, prove a few small facts, and then write a clean, correct JavaScript solver that combines constraint propagation (unit clauses) with backtracking search. Every step links back to a demo on the lab home page and a session in the course outline.

At a glance

goalSolve any 9×9 Sudoku, or prove it unsolvable

paradigmCSP · search · logic

stackVanilla JavaScript (ES2020)

difficultyIntermediate

lines of code≈ 90, no dependencies

modules exercised1 · 2 · 3 · 5

Course concepts this project exercises

propositional logic predicates & quantifiers sets & set operations relations graphs & colouring proof methods finite-state search

Maps onto Module 1 (logic & proofs), Module 2 (sets & functions), Module 3 (relations & graphs) and Module 5 (modelling computation) — sessions 1–8, 9–13, 16–18 and 25–27.

1. The mathematics — encoding the puzzle

1.1 The board as a set and a function

Index the cells by the set $C = \{0,1,\dots,8\}\times\{0,1,\dots,8\}$, so $|C| = 81$, and let the values come from $V = \{1,2,\dots,9\}$. A completed board is a total function $f : C \to V$. A partially filled puzzle is a partial function whose domain is the set $G \subseteq C$ of givens; solving means extending $f$ to all of $C$ while obeying the rules.

This is exactly the functions material from demo 4 and sessions 11–13.

1.2 The three constraints as quantified predicates

Write $f(r,c)$ for the value in row $r$, column $c$, and let $b(r,c)=\big(\lfloor r/3\rfloor,\lfloor c/3\rfloor\big)$ name the $3\times3$ box. The rules are three injectivity conditions — every line and box must use distinct values:

Sudoku constraints $$\forall r\;\forall c_1\neq c_2:\; f(r,c_1)\neq f(r,c_2) \quad\text{(rows)}$$ $$\forall c\;\forall r_1\neq r_2:\; f(r_1,c)\neq f(r_2,c) \quad\text{(columns)}$$ $$\forall (r_1,c_1)\neq(r_2,c_2):\; b(r_1,c_1)=b(r_2,c_2)\Rightarrow f(r_1,c_1)\neq f(r_2,c_2) \quad\text{(boxes)}$$

Because each of the nine rows holds nine cells drawn from a nine-element value set, "distinct" and "uses every value" coincide — each row is a bijection onto $V$. The same is true of every column and box. The whole puzzle is the conjunction of $27$ such bijection requirements.

1.3 A propositional (Boolean) encoding

To see the logical structure from Module 1, introduce one Boolean variable per (cell, value) pair:

$$x_{r,c,v} \;=\; \text{true} \iff f(r,c) = v, \qquad r,c\in\{0,\dots,8\},\; v\in\{1,\dots,9\}.$$

That is $9\times9\times9 = 729$ variables. The rules become a propositional formula in CNF:

Each cell has at least one value — a clause $\bigvee_{v} x_{r,c,v}$ for every cell.
At most one value per cell — $\neg x_{r,c,v}\lor\neg x_{r,c,v'}$ for $v\neq v'$.
No repeat in a row — $\neg x_{r,c,v}\lor\neg x_{r,c',v}$ for $c\neq c'$; similarly columns and boxes.

The puzzle is solvable iff this conjunction is satisfiable. A given clue $f(r,c)=v$ is just a unit clause $x_{r,c,v}$. This reframes Sudoku as the canonical NP-complete problem SAT — the bridge to the truth-table and equivalence work in demo 1.

Unit propagation (the engine of our solver) If a clause has all but one literal already falsified, that last literal is forced true. In Sudoku terms: if a cell has exactly one value left in its candidate set, that value must go there — and the same value can then be removed from every cell that shares a row, column or box.

1.4 The relation behind it — Sudoku is graph colouring

Define a "sees" relation on cells: $u \mathrel{R} w$ iff $u\neq w$ and the two cells share a row, a column or a box. $R$ is irreflexive and symmetric — exactly the relation properties from demo 5 (sessions 16–17). Treat the $81$ cells as vertices and put an edge wherever $R$ holds; this is the Sudoku graph $\mathcal{G}$.

Claim Every cell of $\mathcal{G}$ has degree exactly $20$, so $\mathcal{G}$ has $\tfrac{81\cdot 20}{2}=810$ edges.
Proof. A fixed cell shares its row with $8$ others, its column with $8$, and its box with $8$. The box overlaps the row in $2$ cells and the column in $2$ cells, all already counted. By inclusion–exclusion the neighbourhood has $8+8+8-2-2 = 20$ cells. By the handshaking lemma the edge count is $\sum\deg/2 = 81\cdot20/2 = 810$. $\;\blacksquare$

A valid solution is then a proper 9-colouring of $\mathcal{G}$: assign each vertex a colour in $V$ so adjacent vertices differ. This is the same colouring/graph idea explored in demo 6 (session 18). Graph $k$-colouring is NP-complete for $k\geq 3$, which is why we search rather than solve in closed form.

2. Design & approach

A brute-force assignment of $9$ values to up to $81$ empty cells is $9^{81}\approx 10^{77}$ candidates — hopeless. We instead model solving as a search through partial assignments, a finite-state process in the spirit of demo 9 (Module 5). Two ideas keep it fast:

Constraint propagation. Maintain, for every empty cell, the candidate set $\mathrm{cand}(r,c)\subseteq V$ of still-legal values. Placing a value removes it from the candidate sets of all $20$ neighbours (set difference — demo 3). Whenever a cell is reduced to a single candidate, that placement is forced (unit propagation, §1.3).
Backtracking with the MRV heuristic. When propagation stalls, branch on the cell with the Minimum Remaining Values — the most-constrained cell. This keeps the branching factor near $1$ and prunes the search tree aggressively.

Correctness invariant At every node of the search the partial board satisfies all three Sudoku constraints, and $\mathrm{cand}(r,c)$ contains exactly the values that keep it so. Backtracking restores this invariant on the way back up, so the search is both sound (never reports a bad board) and complete (explores every possibility, hence finds a solution if one exists or proves none does).

The state machine of the search has three transitions:

state	action	next
propagate	fill all forced cells	contradiction → backtrack · stuck → branch · done → accept
branch	pick MRV cell, try a candidate	propagate
backtrack	undo last guess, try next candidate	propagate or fail

3. Step-by-step implementation (JavaScript)

3.1 Representation

The board is a flat array of 81 integers, 0 meaning "blank". Helper functions map a cell index i to its row, column and box, and precompute its $20$ neighbours (the relation $R$ from §1.4) once.

// A board is an Int8Array of length 81; cell i is at row r=i/9, col c=i%9.
const N = 9, CELLS = 81;

const rowOf = i => (i / N) | 0;
const colOf = i => i % N;
const boxOf = i => 3 * ((rowOf(i) / 3) | 0) + ((colOf(i) / 3) | 0);

// Precompute the 20 peers of every cell: the "sees" relation R (irreflexive, symmetric).
const PEERS = Array.from({ length: CELLS }, (_, i) => {
  const set = new Set();
  for (let j = 0; j < CELLS; j++) {
    if (j === i) continue;
    if (rowOf(j) === rowOf(i) || colOf(j) === colOf(i) || boxOf(j) === boxOf(i))
      set.add(j);
  }
  return [...set];                 // always length 20 (see the §1.4 proof)
});

The peer table encodes the graph $\mathcal{G}$: PEERS[i] is the adjacency list of vertex i.

3.2 Candidate sets & the legality test

For an empty cell, its candidate set is $V$ minus the values already used by its peers — a direct set difference. We represent each candidate set compactly as a 9-bit mask, so "remove a value" is a single bitwise AND.

// Bit v-1 of the mask is set ⇔ value v is still a legal candidate.
const ALL = (1 << N) - 1;            // 0b111111111 = all nine values
const bit = v => 1 << (v - 1);
const popcount = m => { let n = 0; while (m) { m &= m - 1; n++; } return n; };

// Candidates(board, i) = V \ { values used by peers of i }.
function candidates(board, i) {
  let used = 0;
  for (const p of PEERS[i]) if (board[p]) used |= bit(board[p]);
  return ALL & ~used;                // set difference, in one operation
}

3.3 Constraint propagation (unit clauses)

Repeatedly scan for any empty cell with exactly one candidate and fill it. If a cell ends up with zero candidates the partial board is unsatisfiable — a falsified clause — and we signal failure. The loop runs to a fixed point.

// Returns true if consistent, false if a contradiction (empty domain) appears.
// `placed` collects every cell we fill so the caller can undo them on backtrack.
function propagate(board, placed) {
  let progress = true;
  while (progress) {
    progress = false;
    for (let i = 0; i < CELLS; i++) {
      if (board[i]) continue;
      const mask = candidates(board, i);
      if (mask === 0) return false;     // dead end: clause violated
      if (popcount(mask) === 1) {       // unit clause ⇒ forced move
        board[i] = Math.log2(mask) + 1;
        placed.push(i);
        progress = true;
      }
    }
  }
  return true;
}

3.4 Backtracking search with MRV

When no cell is forced, choose the empty cell with the fewest candidates and try each in turn, recursing. Anything we placed during a failed branch is rolled back, preserving the invariant from §2.

// Solve in place. Returns true and leaves a completed board, or false if unsolvable.
function solve(board) {
  const placed = [];
  if (!propagate(board, placed)) { undo(board, placed); return false; }

  // Pick the most-constrained empty cell (Minimum Remaining Values).
  let best = -1, bestMask = 0, bestCount = N + 1;
  for (let i = 0; i < CELLS; i++) {
    if (board[i]) continue;
    const mask = candidates(board, i), c = popcount(mask);
    if (c < bestCount) { bestCount = c; best = i; bestMask = mask; }
  }
  if (best === -1) return true;         // no empty cells ⇒ solved

  for (let v = 1; v <= N; v++) {       // branch over the candidates
    if (!(bestMask & bit(v))) continue;
    board[best] = v;
    if (solve(board)) return true;     // found a colouring of G
    board[best] = 0;                  // undo the guess and try the next value
  }
  undo(board, placed);                 // undo this node's forced placements
  return false;
}

const undo = (board, placed) => { for (const i of placed) board[i] = 0; };

≈ 90 lines total, no libraries. The full file would also include a parser from an 81-character string and a pretty-printer; both are trivial and omitted here for focus.

4. Results & a worked trace

Below is a genuinely hard, minimal (17-clue) puzzle and the solution our solver returns. Givens are in black, values discovered by the algorithm are in accent.

Puzzle — 17 givens

Solution (unique)

solved guesses / backtracks: small solution unique

4.1 Tracing the first moves

Start in the top-left box. The shared row, column and box already pin many candidate sets to a single value, so unit propagation alone makes a chain of forced placements before any guessing is needed:

Take an empty cell whose row already contains $\{5,3\}$, whose column contains $\{8,4,7\}$ and whose box contains $\{6\}$. Its candidate set is $\{1,\dots,9\}\setminus\{5,3,8,4,7,6\} = \{1,2,9\}$ — not yet forced.
A neighbour cell, however, has peers covering $\{1,2,4,5,6,7,8,9\}$, leaving the single candidate $\{3\}$. This is a unit clause: we place $3$ and remove it from all $20$ peers.
That removal collapses the first cell's candidates to $\{1,2,9\}\setminus\{3\}=\{1,2,9\}$ unchanged, but elsewhere another cell drops to a singleton, and the cascade continues — each forced move can trigger more, exactly the fixed-point loop in propagate (§3.3).
When the cascade stalls, MRV selects the cell with the smallest remaining set (say two candidates) and tries the first. For this puzzle the very first guess succeeds and propagation finishes the board, so the search tree is shallow.

Why the answer is trustworthy The solver only ever places a value that is in the cell's candidate set, so the output satisfies all $27$ bijection constraints of §1.2 by construction. Running the same search but continuing after the first solution (instead of returning) counts solutions; for a well-formed puzzle it finds exactly one, confirming uniqueness.

5. Complexity analysis

5.1 Cost of one propagation pass

candidates(board, i) visits the $20$ peers, so it is $O(1)$ for fixed $N=9$ (in general $O(N\sqrt N)$ peers for an $N\times N$ board). One sweep of propagate touches all $81$ cells, and the fixed-point loop repeats at most once per cell filled, giving $O(\text{CELLS}^2)=O(N^4)$ work per propagation phase — a constant for standard Sudoku.

5.2 The search itself

In the worst case backtracking is exponential — unavoidable, since Sudoku generalises to an NP-complete problem (§1.3–1.4). If $k$ cells are empty and each branches over at most $d$ candidates, the tree has at most $d^{\,k}$ leaves:

$$T_{\text{worst}}(k) = O\!\big(d^{\,k}\big),\qquad d \le 9.$$

quantity	bound	why
Boolean variables	729	$9\times9\times9$, one per (cell,value)
graph edges	810	handshaking on degree-20 vertices (§1.4)
naïve search space	9^81 ≈ 10^77	every cell any value
propagation pass	O(N⁴)	$\le 81$ sweeps × 81 cells × $O(1)$
typical solve	≈ few× propagation	MRV keeps branching ≈ 1

Theory vs. practice The exponential bound is real but pessimistic. MRV ordering plus unit propagation reduce the effective branching factor of human-grade puzzles to near $1$, so they solve in milliseconds despite the $10^{77}$ raw search space — a textbook illustration of how the right discrete structure tames a hard problem.

6. Mapping to course learning outcomes

Each syllabus learning objective, and where this project demonstrates it.

Propositional & predicate logic, proof methods

The CNF/SAT encoding (§1.3), the quantified constraints (§1.2), and the inclusion–exclusion / handshaking proofs (§1.4). Demo: truth tables, quantifiers. Sessions 1–8.

Set theory & functions

Cells and values as sets; candidate sets via set difference (§3.2); a solution as a bijective function (§1.1–1.2). Demo: sets, functions. Sessions 9–13.

Relations & orderings

The irreflexive, symmetric "sees" relation $R$ and its adjacency-list encoding (§1.4, §3.1). Demo: relations. Sessions 16–17.

Graphs & their applications

Sudoku as a 9-colouring of the degree-20 Sudoku graph; edge count by handshaking. Demo: graphs. Session 18.

Modelling computation

Search as a state machine (propagate / branch / backtrack) with soundness & completeness. Demo: finite-state machine. Sessions 25–27.

7. Extensions

Naked / hidden pairs. Add the "if two cells in a unit share the same two candidates, eliminate those values from the rest of the unit" rule — stronger propagation, fewer guesses.
Hand it to a SAT solver. Emit the §1.3 CNF in DIMACS format and let a real solver (e.g. MiniSAT) close it; compare with our hand-rolled search. Ties directly to propositional equivalence.
Generalise the board. The whole solver is parameterised by $N$ (with $\sqrt N$-sided boxes), so $4\times4$, $16\times16$ and $25\times25$ variants work unchanged — and let you watch the complexity wall arrive.
Modular hashing of states. Cache visited partial boards with a hash mod a prime to skip repeats — a hook into the modular arithmetic material (sessions 21–23).
Puzzle generation & uniqueness. Generate a full board, remove clues while the solution stays unique (count via the §4 trick), and grade difficulty by guesses needed.

8. References

Rosen, K. H. Discrete Mathematics and Its Applications, McGraw-Hill. Ch. 1 (logic & proofs), Ch. 2 (sets & functions), Ch. 9–10 (relations & graphs), Ch. 13 (modelling computation). The course textbook.
Norvig, P. "Solving Every Sudoku Puzzle" — constraint propagation + search, the classic write-up this project follows in spirit.
Knuth, D. E. "Dancing Links" (Algorithm X) — Sudoku as exact cover, an alternative exact formulation.
Lewis, R. A Guide to Graph Colouring, Springer — the colouring view of §1.4.
Companion material: interactive demos · course outline · syllabus PDF.