Course Structure - AI: Reasoning & Problem Solving

Course overview

This is an introductory course to Artificial Intelligence that establishes the foundations of problem representation, search algorithms, and reasoning agents. Teaching is collaborative, active and applied in the IE tradition: students do not just study algorithms, they implement them in Python and reason about which strategy fits which problem. Work alternates between individual coding assignments (search, CSPs) and a group project (an agent that plays an adversarial game), culminating in a final exam over the whole syllabus.

Code

AIRPS-CSAI.3.M.A

Area

Computer Science

Credits

6.0 ECTS

Sessions

Year / Sem.

3rd · S1

Language

English

By the end of this course you will be able to…

After the course, students hold the theoretical knowledge to: understand how a broad range of AI problems are represented; reason about the application of different search and reasoning strategies to real-world problems; and represent and apply intelligent agents in adversarial games and decision making. They also gain practical Python experience with: writing and visualising search algorithms; solving real-world constraint-satisfaction problems; and developing agents that participate in adversarial games — exactly the deliverable embodied by the Quoridor / Ghosts / UNO agents on this site.

Instructor: Eugenio Marchiori, PhD — startup advisor / angel investor in AI (e.g. Portia Labs), formerly Principal Software Engineer (Director) at Google. PhD on authoring educational video games and simulations. Office hours on request: emarchiori@faculty.ie.edu.

GenAI policy. As a course about how these systems work, students are recommended not to use GenAI for coursework — especially coding — though it is not prohibited (with appropriate acknowledgement). GenAI may not be used for the final exam; inappropriate use is academic misconduct. A suggested acknowledgement format names the tool, the prompts used and how the output was incorporated; if you used no AI, a short "no AI content" disclosure is recommended. Acknowledging AI does not affect your grade.

Prerequisites & how this course connects

This is a third-year, first-semester compulsory course, so it assumes you arrive with the programming and mathematical maturity built earlier in the BCSAI degree:

Python fluency — classes, recursion, dictionaries/sets and the standard library; every assignment is delivered as readable, tested Python.
Data structures & algorithms — queues, stacks, priority queues (heaps), graphs and trees, plus big-O analysis. Search is graph traversal with a smarter frontier.
Discrete maths & logic — sets, relations, propositional logic and proof, which Module 6 generalises to first-order logic.
Probability — sample spaces, conditional probability and expectation, the backbone of stochastic games (Module 4) and reasoning under uncertainty (Module 6).

Looking forward, this course is the classical-AI foundation that later electives in machine learning, reinforcement learning and multi-agent systems build on: MDPs and the maximum-expected-utility principle introduced in S25 become the formal starting point of RL, and the agent abstraction recurs throughout.

Weekly study load. Of the 150 total hours, only ~36 h are lectures — roughly half (75 h) is independent study. Across a ~15-week semester that is about 5 hours of self-study per week on top of class, rising around the three coding deliverables (Assignment 1, the CSP assignment, and the group agent). Treat the cited AIMA chapters as the spine of that independent work and read ahead of each session.

LO Learning objectives

Acquire a global perspective of Artificial Intelligence — its history, scope and uses.
Understand agents: their representations, environments and applications.
Apply search algorithms to problem solving, including the identification and evaluation of such algorithms.
Learn about constraint satisfaction problems and apply solution strategies in practice.
Acquire basic game-theory concepts and work in groups to develop agents to play games.
Understand decision making under uncertainty and the basics of probabilistic programming.

MTD Teaching methodology & workload

The IE method is collaborative, active and applied: students build knowledge by participating in the whole process. The professor leads and guides through a mix of lectures, in-class exercises, asynchronous and field work, and group work. The estimated 150-hour workload breaks down as follows:

Lectures

24% · 36h

Exercises / async / field work

6% · 9h

Group work

20% · 30h

Individual studying

50% · 75h

Total: 100% · 150 hours. The bars above show the share of estimated student time each activity should consume — note that half the workload is independent study, reflecting the depth of the Russell & Norvig material.

EVA Assessment & grading

Evaluation is continuous. All individual and group assignments carry optional extra tasks, and some work is presented to the class. The grade is composed of five weighted components:

Final exam

30%

Individual assignments

20%

Group assignments

20%

Participation / presentations

20%

Quizzes

10%

30%

Final exam

Demonstrates understanding of the whole syllabus and application of concepts to limited situations. No GenAI permitted.

20%

Individual assignments

Coding: search-strategy implementation & comparison (Assignment 1), and a CSP applied to a real-world problem.

20%

Group assignment

Build an agent that plays an adversarial game in Python — the Quoridor / Ghosts / UNO deliverable, demonstrated on simulated situations.

20%

Participation / presentations

Active in-class engagement plus the final group speaker presentations (sessions 26–28).

10%

Quizzes

Short checks on lecture content across the term.

What each component looks for — rubric & tips

Final exam (30%)

Format: in-person comprehensive written exam over the whole syllabus, applying concepts to limited situations (trace an algorithm, design a heuristic, formulate a CSP, do a Bayesian update). No GenAI.
Looks for: correctness, the right algorithm for the problem shape, and clear reasoning about completeness/optimality/complexity.
Tip: practise hand-tracing BFS/UCS/A*, alpha-beta and AC-3 on small instances — the exam rewards process, not just the final answer.

Individual assignments (20%)

Format: annotated Python + a short write-up — the search implementation/comparison (Assignment 1) and a CSP applied to a real-world problem.
Looks for: working code against a shared interface, an honest empirical comparison (nodes expanded, cost, runtime) and a clear modelling rationale.
Tip: visualise the frontier/search and justify why a strategy wins, not just that it does.

Group assignment (20%)

Format: a Python agent that plays an adversarial game, demonstrated live on simulated situations — the Quoridor / Ghosts / UNO deliverable.
Looks for: a sound search/heuristic design, mapping each design choice to a class concept, robustness in the demo, and clear division of work.
Tip: start the engine early (S15→S22) and profile depth vs. time before tuning the heuristic.

Participation / presentations (20%)

Format: active in-class engagement plus the final group speaker presentations (S26–28).
Looks for: contribution to discussion and debriefs, and a well-organised, well-argued presentation with a working demo.
Tip: optional extra tasks on assignments are an easy way to stand out.

Quizzes (10%)

Format: short checks on lecture content spread across the term.
Looks for: retention of definitions and core formulas.
Tip: keep up with the cited AIMA chapter each week and the quizzes take care of themselves.

Attendance. The 80% attendance rule is mandatory: students who fail it fail both the ordinary and extraordinary calls for the year and must re-enroll the following year.

Passing & re-sit. Four chances to pass across two academic years (ordinary + extraordinary June/July re-sits). The re-sit is a single comprehensive exam held in person (Segovia or Madrid); continuous evaluation is not carried over, the minimum passing grade is 5 and the maximum obtainable is 8.0 ("notable"). Retakers (3rd call) may reach 10.0 and must confirm criteria with the professor. Grade appeals require attending the review session first.

30 Program — the full 30 sessions

All sessions are live in-person. They are grouped below into the six thematic modules from the subject description. Each module opens with an overview and learning outcomes; every individual session is then expanded as a timeline item.

Lecture / discussion Assignment / lab / project Exam

MODULE 1

Introduction & Intelligent Agents

Sessions 1–4

What AI is, where it came from, and the central abstraction the rest of the course rests on: the agent that perceives an environment through sensors and acts on it through actuators to maximise a performance measure. We classify environments and agent architectures so that later we can pick the right algorithm for a given problem shape.

Place AI historically and distinguish "thinking/acting humanly vs. rationally".
Define a rational agent and the PEAS (Performance, Environment, Actuators, Sensors) framing.
Classify environments: observable, deterministic, episodic, static, discrete, single-/multi-agent.
Map a real task onto a formal problem: states, initial state, actions, transition model, goal test, path cost.

S1Introduction: FoundationsLecture / Discussion

Set expectations and lay the philosophical and technical groundwork for what "artificial intelligence" means.

Introduction. Course logistics, the three coding deliverables and the GenAI policy. Sets the contract: you will not just read about algorithms, you will implement and compare them.
Expectations & contents. The arc from search → games → CSPs → reasoning under uncertainty. Each block answers a different question: find a path, beat an opponent, satisfy constraints, and act despite incomplete knowledge.
Philosophical & technical foundations of AI. The four classic schools come from crossing two axes — thinking vs. acting and humanly vs. rationally — giving cognitive modelling, the laws of thought, the Turing test, and the rational-agent view. The course commits to the last: build agents that do the right thing, measured against a performance measure, rather than imitate people.

Key idea: we engineer rationality — doing the "right thing" given what the agent knows — not human-likeness.

Takeaway: "rational" is relative to a performance measure, the percept history, and the agent's knowledge — it is not omniscience and it is not human imitation. Connects to: the rational-agent definition formalised in S3.

Reading: Russell & Norvig (AIMA), Ch. 1 — Introduction (esp. 1.1 foundations, 1.2 acting/thinking humanly/rationally).

S2History & State of the ArtLecture / Discussion

Trace how the field rose, fell and rose again, and survey what AI does today.

Inception of AI. The 1956 Dartmouth workshop named the field; early wins were symbolic — the Logic Theorist, GPS (General Problem Solver) and Rosenblatt's perceptron — built on search and logic rather than data.
Ups and downs of the industry. Over-promising and limits (e.g. Minsky & Papert's perceptron critique) triggered the AI winters of funding collapse; expert systems boomed and busted in the 1980s; the 2010s deep-learning resurgence came from data + GPUs + backprop at scale.
State-of-the-art applications. Game-playing milestones map onto this course: Deep Blue (1997, alpha-beta search → Module 4), AlphaGo/AlphaZero (2016+, MCTS + learned evaluation → S13), and today's foundation models. Game milestones are recurring public benchmarks of progress.

Key idea: capability has historically tracked the availability of data, compute and the right problem abstraction — not just clever algorithms.

Takeaway: the classical techniques in this course are not "old news" — Deep Blue's alpha-beta and AlphaGo's MCTS are exactly Modules 4–13, and they still power production systems. Connects to: minimax/alpha-beta (S12) and MCTS (S13).

Reading: AIMA Ch. 1.3–1.4 (history & state of the art); Clancy, Playing with Reality (intro) for the cultural lens on why games drive AI.

S3Agents, Rationality & EnvironmentsLecture / Discussion

Define the agent abstraction and the dimensions along which environments vary.

Understanding agents. The perceive–decide–act loop; percept sequences and the agent function.
Rationality and agents. A rational agent maximises the expected performance measure given its percepts and knowledge.
Properties of environments. Fully/partially observable, deterministic/stochastic, episodic/sequential, static/dynamic, discrete/continuous, single-/multi-agent.
Structure of agents. The agent = architecture + program.

Definition — Rational agent

For each possible percept sequence, a rational agent selects the action expected to maximise its performance measure, given the evidence from the percept sequence and any built-in knowledge: $a^\ast = \arg\max_{a} \; \mathbb{E}\big[\,U \mid a, \text{percepts}\,\big]$. Note the dependence on the environment type: a vacuum-cleaner agent that is rational in a deterministic, fully observable world may be irrational in a stochastic one.

Pitfall: environment properties are not "nice-to-knows" — they decide which algorithm is even applicable. A partially observable, stochastic, multi-agent environment (like UNO) rules out the simple offline search of Module 2.

Takeaway: classify the environment first; the PEAS description and the six environment axes determine the whole rest of the design. Connects to: agent types & problem formulation (S4).

Reading: AIMA Ch. 2 — Intelligent Agents (2.2 good behaviour/rationality, 2.3 environment properties, 2.4 agent structure).

S4Agents Part 2 — Types & Problem FormulationLecture / Discussion

Catalogue agent architectures and formalise a task as a search problem.

Types of agents. A ladder of increasing capability: simple-reflex (condition–action rules on the current percept only), model-based reflex (keeps internal state to handle partial observability), goal-based (searches for action sequences that reach a goal), utility-based (chooses among goals by expected utility), and learning agents (improve any component from experience).
Inner working of agents. A model-based agent maintains state $s_{t+1}=f(s_t, a_t, \text{percept}_t)$ using a transition model and a sensor model, so it can act sensibly when it cannot see the whole world — the seed of belief-state reasoning (S10) and the Ghosts agent.
Formulation of agent problems. Turn a task into a search problem: states, an initial state, actions, a transition model (Result(s,a)), a goal test, and a path-cost function. Worked example (8-puzzle): a state is a tile arrangement, an action slides the blank, the goal test is "tiles in order", and step cost is 1 — nothing more is needed for Module 2's algorithms to run.

Key idea: a well-posed problem is the tuple $\langle S, s_0, A, \text{Result}, \text{Goal}, c\rangle$. Get this right and the algorithm choice almost falls out.

Pitfall: a sloppy state representation (carrying irrelevant detail) explodes the state space; an over-abstracted one can make the goal unreachable. Abstraction is a deliberate modelling choice.

Takeaway: picking the right agent type and a clean problem formulation is the highest-leverage design decision — it fixes the search space the rest of the course operates on. Connects to: the generic search skeleton (S5).

Reading: AIMA Ch. 2.4 (agent programs/types); Ch. 3.1 (problem-solving agents & problem formulation).

MODULE 2

Problem Solving & Search

Sessions 5–8

The workhorse of classical AI: treat a problem as a graph of states and find a path from start to goal. We build up from uninformed strategies (BFS, DFS, uniform-cost / Dijkstra, iterative deepening) to informed, heuristic-guided search (greedy, A*), then implement and compare them in Python (Assignment 1). The BFS shortest-path routine you implement here is exactly the heuristic engine inside the Quoridor agent.

Implement the generic graph-search / tree-search skeleton with a frontier and explored set.
Analyse algorithms on four axes: completeness, optimality, time, space.
Choose between uninformed strategies based on cost structure and memory limits.
Design admissible, consistent heuristics and explain why they make A* optimal.

S5Introduction to Search AlgorithmsLecture / Discussion

Frame problem solving as search over a state-space graph and meet the first strategies.

Understanding search algorithms. Search tree vs. state space; the frontier (open list) and explored set (closed list); the generic best-first template.
Basic search algorithms. Breadth-first search (BFS) and how the frontier data structure (a FIFO queue) determines the strategy.

Algorithm — Breadth-First Search (BFS)

Put the start node on a FIFO frontier; mark it explored.
Pop the front node; if it passes the goal test, return its path.
Expand it, pushing each unexplored successor to the back of the queue.
Repeat until goal or empty frontier.

Complete and optimal for unit step costs; time and space $O(b^d)$ for branching factor $b$ and goal depth $d$ — the memory cost is the real bottleneck.

trace (b=2): pop A → push B,C | pop B → push D,E | pop C → push F,G | pop D (goal!) ⇒ path A→B→D, expanded shallowest-first

Complexity / pitfalls: BFS holds the entire frontier in memory — $O(b^d)$ nodes. At $b=10,\,d=10$ that is ~10 billion nodes; you run out of RAM long before time. This is precisely why IDS (S6) exists.

Takeaway: the frontier's data structure is the strategy — swap the FIFO queue for a stack and you have DFS, for a priority queue and you have UCS/A*. Connects to: uninformed search (S6).

Reading: AIMA Ch. 3.1–3.4 (problem-solving agents, example problems, search algorithm infrastructure, uninformed search).

S6Uninformed Search AlgorithmsLecture / Discussion

Cover cost-sensitive and memory-frugal uninformed strategies.

Dijkstra & Depth-first search. Uniform-cost search (UCS = Dijkstra on a search tree) expands the lowest-cost frontier node; DFS uses a LIFO stack — linear memory but neither complete (infinite paths) nor optimal.
Iterative deepening search (IDS). Repeated depth-limited DFS that gets BFS's optimality with DFS's $O(bd)$ memory.

Algorithm — Uniform-Cost Search (Dijkstra)

Expand the frontier node with least path cost $g(n)$ using a priority queue keyed on $g(n)$. Goal-test on removal (not generation). Complete and optimal for non-negative step costs.

Key idea: IDS is the practical default when the state space is large and the depth is unknown — the re-expansion overhead is dominated by the last, largest layer.

Complexity / pitfalls: plain DFS can loop forever on graphs with cycles and is non-optimal; always track an explored set or a depth limit. UCS can stall expanding many zero-/low-cost steps before reaching the goal. IDS re-expands upper layers, but since the bottom layer has $b^d$ nodes the wasted work is only a constant factor (~$\frac{b}{b-1}$).

Takeaway: match the strategy to the cost structure — unit costs → BFS/IDS, varied non-negative costs → UCS, tight memory → IDS. Connects to: adding a heuristic turns UCS into A* (S7).

Reading: AIMA Ch. 3.4 (uninformed strategies: BFS, UCS, DFS, depth-limited & iterative deepening).

S7Heuristic (Informed) SearchLecture / Discussion

Use problem-specific knowledge to guide search toward the goal.

Informed search strategies. Greedy best-first (expand by $h(n)$ alone) vs. A* (expand by $f(n)=g(n)+h(n)$).
Heuristic functions. Admissibility ($h(n)\le h^\ast(n)$, never overestimates) and consistency (the triangle inequality); dominance and how relaxed problems generate heuristics.

Algorithm — A* search

Best-first search ordered by $f(n) = g(n) + h(n)$, where $g$ is cost-so-far and $h$ is the estimated cost-to-goal. If $h$ is admissible, A* is optimal for tree search; if $h$ is also consistent, it is optimal for graph search and never re-expands a node.

Key idea: A* is "BFS that knows where it's going" — UCS pulls toward cheap paths, $h$ pulls toward the goal, and their sum balances both.

how heuristics arise: drop a constraint (relax) → 8-puzzle "misplaced tiles" ($h_1$) and "Manhattan distance" ($h_2$) are exact costs of easier problems, so both are admissible; $h_2$ dominates $h_1$

Complexity / pitfalls: A* stores every generated node, so memory is the usual failure mode — use IDA* or weighted A* when it blows up. An inadmissible $h$ (overestimating) can return a sub-optimal path; greedy best-first ($f=h$) is fast but neither complete nor optimal.

Takeaway: a better (still admissible) heuristic that dominates another expands strictly fewer nodes — heuristic design is where most real speed-ups come from. Connects to: the BFS-distance heuristic you implement in Assignment 1 (S8) and reuse in Quoridor (S12).

Reading: AIMA Ch. 3.5–3.6 (greedy & A* informed search; admissibility, consistency, generating heuristics).

S8Assignment 1 — Implement & Compare Search in PythonIndividual lab

Demonstrate understanding by implementing the strategies and visualising the decision process, costs and applications.

Implementation. Code BFS, DFS, UCS, IDS and A* against a shared problem interface (e.g. getting through a maze under different conditions).
Comparison. Measure nodes expanded, path cost and runtime; visualise the frontier as it grows.
Applications. Discuss where each strategy wins and why.

Cross-link: the BFS shortest-path-to-goal routine here is the same shortest_path_length helper that powers the Quoridor heuristic.

Takeaway: implementing all five strategies behind one problem interface makes the comparison fair — the only thing that changes is the frontier policy. Connects to: debriefed in S14.

Deliverable format: annotated, runnable Python against a shared problem interface, plus visualisations of the growing frontier, path cost and nodes expanded, and a short write-up comparing strategies. Tip: log nodes-expanded and peak-frontier-size, not just runtime — they expose the $O(b^d)$ memory story far better than a stopwatch.

MODULE 3

Search in Complex Environments

Sessions 9–11

Real environments break the tidy assumptions of classical search. We move to local search for optimisation (where the path doesn't matter, only the final state), then to non-deterministic actions (contingency plans via AND-OR trees), partial/no observability (belief-state search), and online search where the agent must act before it fully knows the map.

Apply hill-climbing, simulated annealing and evolutionary methods to optimisation landscapes.
Build contingency plans with AND-OR search for non-deterministic actions.
Reason over belief states (sets of possible physical states) under partial observability.
Distinguish offline planning from online search where exploration and action interleave.

S9Local Search & OptimisationLecture / Discussion

Search when only the goal state matters, including continuous spaces.

Hill-climbing search. Greedy ascent on an objective; the local-maxima, ridge and plateau traps; random-restart hill climbing.
Simulated annealing. Accept worse moves with probability $p=e^{-\Delta E / T}$, cooling $T$ over time to escape local optima.
Evolutionary algorithms. Populations, selection, crossover and mutation as parallel stochastic search.
Local search in continuous spaces. Gradient ascent and step-size selection.

Key idea: when the state space is enormous but a "goodness" score is cheap to compute, keep one (or a few) states in memory and iteratively improve — trade optimality guarantees for scale.

simulated annealing: ΔE>0 (better) → always accept; ΔE<0 (worse) → accept with prob e^(ΔE/T); as T→0 it becomes pure hill-climbing

Complexity / pitfalls: hill-climbing gets stuck on local maxima, ridges and plateaus — random restarts or sideways moves help. Simulated annealing only converges to the global optimum with an impractically slow cooling schedule; in practice it is a heuristic. Evolutionary methods need careful representation or crossover destroys good structure.

Takeaway: local search keeps O(1) states in memory and ignores the path entirely — ideal for optimisation (n-queens, scheduling) where only the final configuration matters. Connects to: min-conflicts for CSPs (S17) is local search in disguise.

Reading: AIMA Ch. 4.1–4.2 (local search & optimisation; search in continuous spaces).

S10Search with Non-Deterministic ActionsLecture / Discussion

Plan when actions can have several possible outcomes and when the world is only partly visible.

Non-deterministic actions. Solutions become contingency plans (trees of if-then branches) rather than fixed sequences.
AND-OR search trees ("try, try again"). OR-nodes are the agent's choices; AND-nodes are the environment's possible responses — a solution must succeed on every AND branch.
No / partial observation. Search over belief states — sets of physical states consistent with the percepts so far.

Definition — Belief state

Under partial observability the agent tracks a set $b \subseteq S$ of states it could be in. Actions and percepts update this set; a plan must drive the belief state into the goal regardless of which member is the true state. This is precisely the reasoning the Ghosts agent uses over hidden ghost identities.

Complexity / pitfalls: the belief-state space is the power set of physical states — up to $2^{|S|}$ — so naive belief-state search is far larger than the underlying problem. AND-OR solutions are trees, not paths, and must handle every contingency branch; forgetting one branch yields a plan that fails non-deterministically.

Takeaway: with uncertainty the answer stops being a single sequence and becomes a conditional plan (or a policy) over what you might observe. Connects to: partially observable games (S15) and probabilistic reasoning (S23).

Reading: AIMA Ch. 4.3–4.4 (search with non-deterministic actions & AND-OR; search with partial observations / belief states).

S11Online Search AgentsLecture / Discussion

Act in unknown environments where you must move to discover the map.

Online search problems and agents. The agent only learns Result(s,a) by executing $a$. Performance is the competitive ratio — total cost incurred divided by the cost an omniscient offline agent would pay. LRTA* (learning real-time A*) stores and updates a running cost estimate per state, so repeated visits get smarter and it provably reaches the goal in safely explorable spaces.

Key idea: an online agent can't precompute a full plan — it must commit to a step, observe, and revise. Exploration cost is unavoidable when the model is unknown.

Complexity / pitfalls: with adversarial or irreversible maps (dead ends, one-way doors) the competitive ratio can be unbounded — no online algorithm can guarantee good performance unless the space is "safely explorable". Backtracking physically costs real actions, unlike in offline search.

Takeaway: online search is the bridge from planning to learning — the explore/exploit tension and value-update idea reappear as the foundation of reinforcement learning. Connects to: MEU and MDP policies (S25); MCTS playouts (S13) are online value estimation.

Reading: AIMA Ch. 4.5 (online search agents & unknown environments; LRTA*).

MODULE 4

Adversarial Search & Games

Sessions 12–15

The heart of the group project. When another rational agent is actively working against you, search becomes a minimax problem. We add alpha-beta pruning and heuristic cut-offs to make it tractable, Monte-Carlo Tree Search for huge branching factors, and chance nodes / belief states for stochastic and partially observable games. Then the group assignment is launched (S15) and presented later (S26–28). All three of this site's games live here: Quoridor Ghosts UNO

Compute optimal play in deterministic zero-sum games with minimax.
Prune provably irrelevant branches with alpha-beta and order moves to maximise cuts.
Replace exact evaluation with a heuristic evaluation function at a depth cut-off.
Handle chance (expectiminimax) and hidden information; know the limits of game search.

S12Game Theory & Heuristic Alpha-Beta SearchLecture / Discussion

Formalise two-player zero-sum games and make exhaustive search practical.

Game theory. Players, moves, terminal utilities, zero-sum vs. general-sum; the game tree.
Optimal decisions in games. The minimax value: MAX maximises, MIN minimises, assuming perfect play.
Heuristic alpha-beta tree search. Prune branches that cannot affect the result; cut off at a depth limit and apply an evaluation function.

Algorithm — Minimax with alpha-beta pruning

Minimax value: $\text{minimax}(s)=\text{Utility}(s)$ at terminals; otherwise $\max_a \text{minimax}(\text{Result}(s,a))$ at MAX nodes and $\min_a(\dots)$ at MIN nodes. Alpha-beta carries the best-so-far bounds $\alpha$ (MAX) and $\beta$ (MIN) and prunes a subtree once $\alpha \ge \beta$. With good move ordering it explores $O(b^{d/2})$ nodes instead of $O(b^d)$ — effectively doubling the searchable depth.

Pseudocode — Alpha-beta (MAX node)

v ← −∞
for each action a: v ← max(v, MIN-VALUE(Result(s,a), α, β))
if v ≥ β: return v // β-cut: MIN above won't allow this
α ← max(α, v)
return v

MIN-VALUE is symmetric ($\min$, $\alpha$-cut when $v\le\alpha$). Correct move ordering (try likely-best moves first) is what delivers the $O(b^{d/2})$ best case.

Complexity / pitfalls: alpha-beta returns the same value as minimax — pruning is exact, never approximate. But its $O(b^{d/2})$ best case needs near-perfect ordering; worst-case ordering gives no speed-up at all. A depth cut-off introduces the horizon effect: a disaster pushed just beyond the search depth looks invisible.

Cross-link: the Quoridor agent runs depth-2 alpha-beta with the evaluation $10\,(p_\text{opp}-p_\text{me}) + \Delta\text{walls}$, where $p$ is the BFS shortest-path-to-goal length.

Takeaway: minimax assumes a perfectly rational opponent; alpha-beta makes it tractable and an evaluation function makes it finite-depth — three ideas that together built every classical game AI through Deep Blue. Connects to: MCTS (S13) when the branching factor defeats alpha-beta.

Reading: AIMA Ch. 5.1–5.3 (game theory, optimal/minimax decisions, alpha-beta pruning).

S13Monte Carlo Tree Search & Stochastic GamesLecture / Discussion

Search games with huge branching or random elements without a full tree.

Monte Carlo Tree Search (MCTS). Select–expand–simulate–backpropagate; balancing exploration/exploitation with the UCB1 rule $\bar{x}_i + c\sqrt{\ln N / n_i}$. The engine behind AlphaGo.
Stochastic games & evaluation functions. Chance nodes and expectiminimax: average child values weighted by outcome probability.

Definition — Expectiminimax

Add chance nodes whose value is the probability-weighted average of their children: $\text{expectiminimax}(s)=\sum_{r} P(r)\,\text{expectiminimax}(\text{Result}(s,r))$ at chance nodes, with the usual max/min at the players' nodes. This is how the UNO agent reasons about random draws.

Algorithm — MCTS (one iteration)

Select. From the root, descend by UCB1 $\;\bar{x}_i + c\sqrt{\tfrac{\ln N}{n_i}}\;$ until a node with untried moves.
Expand. Add one new child for an untried move.
Simulate. Play out to a terminal state with a fast (often random) policy.
Backpropagate. Push the result up the visited path, updating wins $\bar{x}_i$ and visits $n_i$.

The $\bar{x}_i$ term exploits good moves; the $\sqrt{\ln N/n_i}$ term explores rarely-tried ones — the same explore/exploit balance as online search (S11).

Complexity / pitfalls: MCTS is anytime and needs no evaluation function, but weak playout policies give noisy estimates and slow convergence; the exploration constant $c$ must be tuned. Expectiminimax cost is $O(b^d n^d)$ with $n$ chance outcomes — chance nodes multiply the branching, so depth is very limited.

Key idea: when you can't evaluate states well but can simulate games cheaply, let random playouts estimate value — MCTS converges to minimax with enough samples.

Takeaway: alpha-beta needs a good evaluation function and modest branching; MCTS trades that for cheap simulations and huge branching (Go) — pick by what you can compute. Connects to: chance reasoning powers the UNO draw model.

Reading: AIMA Ch. 5.4–5.5 (Monte Carlo tree search; stochastic games & expectiminimax).

S14Assignment 1 Debrief & Search RecapInteractive debrief

Consolidate the search module by reviewing student implementations.

Discuss implementations and questions. Compare design choices, bugs and visualisations from Assignment 1.
Recap search algorithms. One coherent map from uninformed → informed → local → adversarial search.

Key idea: every algorithm so far is a different policy over the same frontier — what you put in it and how you order it is the algorithm.

Reading: review AIMA Ch. 3–5.

S15Partially Observable Games + Group Assignment LaunchLecture + project kickoff

Tackle hidden-information games and start the group agent project.

Partially observable games. Games of imperfect information (card games, hidden pieces); reasoning over information sets and belief states.
Limitations of game search algorithms. Horizon effect, evaluation-function error, and intractable branching.
Group assignment (1/2). Develop a Python agent that participates in an adversarial game.

Complexity / pitfalls: in imperfect-information games you cannot evaluate a single state — you must reason over an information set of states consistent with what you've seen, which is why naive minimax misplays card games. Determinization (sampling possible hidden states and averaging) is a common practical workaround.

Cross-link: the project deliverable on this site — Ghosts (hidden identities, belief-state weighting) and UNO (imperfect information + chance) are textbook partially-observable games; Quoridor is the perfect-information control case.

Takeaway: know the limits — the horizon effect, evaluation-function error and intractable branching are why we reach for alpha-beta cut-offs, MCTS and belief states rather than exact minimax. Connects to: build phase in S22, presented in S26–28.

Reading: AIMA Ch. 5.6 (partially observable games) & 5.7 (limitations). Deliverable format: a Python agent + a short design doc mapping each choice to a class concept; demoed live on simulated situations. Tip: agree the game-state interface and a baseline random agent on day one so you always have something to beat.

MODULE 5

Constraint Satisfaction Problems

Sessions 16–18, 24

A different problem shape: instead of a path, we want an assignment of values to variables that satisfies all constraints (scheduling, map-colouring, Sudoku, timetabling). The structure of the constraint graph lets us prune massively via inference (arc consistency) and smart ordering, and even solve by local search. The individual CSP assignment is set in S18 and debriefed in S24.

Model a real-world problem as variables, domains and constraints.
Propagate constraints with inference (node/arc consistency, AC-3) to shrink domains.
Order variables/values (MRV, degree, least-constraining-value) to speed backtracking.
Exploit problem structure and apply local search (min-conflicts) to large CSPs.

S16Defining CSPsLecture / Discussion

Formalise the CSP model and the idea of inference by constraint propagation.

Definitions and examples. Variables $X$, domains $D$, constraints $C$; map-colouring, N-queens, Sudoku, scheduling.
Inference in CSPs. Node, arc and path consistency; using the AC-3 algorithm to prune domains before/while searching.

Definition — CSP & arc consistency

A CSP is $\langle X, D, C\rangle$: assign each $X_i$ a value from $D_i$ so every constraint in $C$ holds. An arc $X_i \to X_j$ is consistent if for every value of $X_i$ there is some allowed value of $X_j$. AC-3 repeatedly enforces this, deleting unsupported values until a fixpoint.

Algorithm — AC-3

Put every arc $(X_i,X_j)$ on a queue.
Pop an arc; REVISE $X_i$'s domain by deleting any value with no support in $D_j$.
If anything was deleted, re-enqueue all arcs $(X_k,X_i)$ pointing into $X_i$.
Repeat to a fixpoint; an emptied domain proves the CSP is unsatisfiable. Runs in $O(c\,d^2)$ for $c$ constraints and domain size $d$.

Complexity / pitfalls: arc consistency prunes but does not solve — an arc-consistent CSP can still have no solution (you usually still need search). Forward checking is the cheap special case that only checks arcs into the just-assigned variable.

Takeaway: a CSP factors a goal test into many small constraints, and that structure is exactly what lets inference shrink the search before you ever guess. Connects to: backtracking + ordering (S17).

Reading: AIMA Ch. 6.1–6.2 (defining CSPs; constraint propagation & inference).

S17Backtracking & Local Search for CSPsLecture / Discussion

Search the assignment space efficiently using ordering heuristics and structure.

Variable & value ordering. Minimum-remaining-values (MRV) & degree heuristics for variables; least-constraining-value for values.
Intelligent backtracking. Backjumping and conflict-directed backjumping instead of naive chronological backtracking.
Local search for CSPs. The min-conflicts heuristic — astonishingly effective on large problems like N-queens.
Structure of problems. Tree-structured CSPs solve in linear time; cutset conditioning and tree decomposition.

Algorithm — Backtracking search

If the assignment is complete, return it.
Pick an unassigned variable (MRV); try each value (LCV order).
If the value is consistent, assign it, run inference (forward checking / AC-3), and recurse.
On failure, undo and try the next value; if none work, backtrack.

Complexity / pitfalls: chronological backtracking thrashes — it re-discovers the same conflict repeatedly; conflict-directed backjumping leaps straight to the culprit variable. MRV and degree are tie-breaking heuristics, not guarantees; min-conflicts can cycle on some structured problems and benefits from random restarts.

Key idea: "inference + smart ordering" turns exponential backtracking into something practical — most of the win is pruning domains before you guess.

Takeaway: exploit structure — a tree-structured constraint graph solves in $O(n d^2)$ via a single ordering pass, and cutset conditioning reduces near-trees to that case. Connects to: applied in the individual CSP assignment (S18).

Reading: AIMA Ch. 6.3–6.5 (backtracking search & ordering; local search for CSPs; the structure of problems).

S18Individual Assignment — CSPs in PracticeIndividual lab

Apply CSP modelling and solving to a real-world problem in Python.

Application of CSP to a real-world problem. Model it as $\langle X, D, C\rangle$, choose ordering/inference, and code the solver in Python.

Key idea: the hard part is the modelling — once variables, domains and constraints are right, a generic backtracking-with-inference solver does the rest.

Takeaway: good CSP modelling often means choosing the representation that yields the tightest constraints (e.g. all-different over per-pair inequalities) so inference does more work. Connects to: debriefed in S24.

Deliverable format: a Python CSP solver (backtracking + MRV/LCV + AC-3 or forward checking) applied to a real-world problem (timetabling, allocation, Sudoku-style), plus a write-up of the model $\langle X,D,C\rangle$ and the inference/ordering choices. Tip: validate your constraints on a tiny hand-solvable instance before scaling up. Debriefed in S24.

S24CSP Assignment Debrief & StudyInteractive debrief

Review the individual CSP solutions and deepen understanding.

Debrief & study of the CSP assignment. Compare modelling choices, inference strategies and performance on the real-world problem.

Reading: revisit AIMA Ch. 6. (Scheduled within Module 6's calendar window but belongs to the CSP thread.)

MODULE 6

Knowledge, Reasoning & Uncertainty

Sessions 19–23, 25

From search to representation: logical agents and first-order logic let an agent store facts and derive new ones by inference; planning turns goals into action sequences; and probability theory handles the uncertainty that pervades the real world, leading to multi-agent decision making and probabilistic programming.

Represent knowledge in propositional and first-order logic and perform inference.
Formulate and solve automatic planning problems.
Quantify uncertainty with probability and reason with Bayesian updates.
Frame multi-agent decisions and express models as probabilistic programs.

S19Knowledge, Reasoning & Planning — Logical AgentsLecture / Discussion

Build agents that maintain a knowledge base and reason over it.

Logical agents. A knowledge base is a set of sentences in a formal language; the agent TELLs it percepts and ASKs it queries. Propositional logic gives connectives ($\neg,\wedge,\vee,\Rightarrow,\Leftrightarrow$); entailment $\text{KB}\models\alpha$ holds when $\alpha$ is true in every model of the KB. Inference checks entailment by model enumeration (truth tables, sound & complete but $O(2^n)$), resolution (refutation-complete on CNF), or fast forward/backward chaining on Horn clauses (linear time). The Wumpus-world agent is the canonical example.

Key idea: a logical agent knows things and derives consequences — entailment ("does the KB guarantee $\alpha$?") replaces brute simulation.

Complexity / pitfalls: propositional satisfiability is NP-complete, so model enumeration blows up; restrict to Horn clauses for tractable chaining, or use modern SAT/DPLL solvers. Don't confuse entailment (semantic) with derivation $\vdash$ (syntactic) — a sound, complete proof system makes them coincide.

Takeaway: logic lets an agent represent partial knowledge and derive guaranteed consequences — truth that holds in all consistent worlds, not just the simulated one. Connects to: first-order logic (S20) lifts this to objects and relations.

Reading: AIMA Ch. 7 — Logical Agents (propositional logic, entailment, resolution, chaining, DPLL).

S20Knowledge & Reasoning (Part 2) — First-Order LogicLecture / Discussion

Gain the expressive power of objects, relations and quantifiers.

First-order logic (FOL). Ontologically richer than propositional logic: a world of objects, relations/predicates over them, and functions. Quantifiers $\forall$ ("for all") and $\exists$ ("there exists") range over objects, so one sentence captures infinitely many propositional facts. Example: "everyone who passes the exam is happy" is the single sentence $\forall x\,(\text{Passes}(x)\Rightarrow\text{Happy}(x))$.

Key idea: FOL lets one sentence ($\forall x\ \dots$) state a fact about every object, instead of one propositional clause per object.

Pitfalls: mixing quantifiers is the classic trap — $\forall x\,\exists y\,\text{Loves}(x,y)$ ("everyone loves someone") differs from $\exists y\,\forall x\,\text{Loves}(x,y)$ ("someone is loved by all"). $\forall$ pairs with $\Rightarrow$, $\exists$ pairs with $\wedge$; swapping them is a common modelling error.

Takeaway: the expressive jump from propositional to first-order logic is what makes general knowledge representation possible — at the cost of harder (semidecidable) inference. Connects to: FOL inference (S21).

Reading: AIMA Ch. 8 — First-Order Logic (syntax, semantics, quantifiers, knowledge engineering).

S21Knowledge & Reasoning (Part 2) — First-Order Logic, cont.Lecture / Discussion

Perform inference in first-order logic.

First-order logic (Part 2). Unification finds a substitution $\theta$ making two literals identical, e.g. $\text{Knows}(\text{John},x)$ and $\text{Knows}(y,\text{Mary})$ unify with $\theta=\{x/\text{Mary},\,y/\text{John}\}$. Generalised Modus Ponens then fires rules; forward chaining derives all consequences (data-driven), backward chaining works goals (query-driven, as in Prolog), and resolution gives a refutation-complete general method.

Key idea: unification — finding a substitution that makes two expressions identical — is the engine that drives every FOL inference rule.

Complexity / pitfalls: FOL inference is only semidecidable — if $\text{KB}\models\alpha$ a complete procedure will find it, but if not it may run forever. Standardise variables apart and apply the occurs-check during unification to avoid binding a variable inside a term containing it.

Takeaway: unification + chaining/resolution is the same pattern-match-and-substitute mechanism behind logic programming and rule engines. Connects to: planning & knowledge representation (S23).

Reading: AIMA Ch. 9 — Inference in First-Order Logic (unification, generalised Modus Ponens, forward/backward chaining, resolution).

S22Group Assignment — Application of AI to Games (2/2)Group project work

Continue developing the adversarial-game agent in Python.

Development of an adversarial-game agent. Integrate search, heuristics and (where relevant) belief-state reasoning into a working Python agent; test on simulated situations ahead of the final presentations.

Cross-link: this is the build phase for the Quoridor, Ghosts and UNO agents on this site.

Reading: AIMA Ch. 5 (reference); group rubric.

S23Uncertain Knowledge & ReasoningLecture / Discussion

Represent and plan with incomplete knowledge, then add probability.

Knowledge representation. Ontologies, categories, events and reasoning about the world.
Automatic planning. PDDL-style action schemas; forward/backward state-space planning and plan search.
Probabilistic reasoning. Probability axioms, conditional probability and Bayes' rule.

Definition — Bayes' rule

$P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E)}$ — update belief in hypothesis $H$ given evidence $E$. This is the formal core of belief-state updates like those the Ghosts agent approximates over hidden ghost types.

Bayes worked example: P(disease)=0.01, P(+|disease)=0.99, P(+|healthy)=0.05 ⇒ P(disease|+) = (.99·.01)/(.99·.01+.05·.99) ≈ 0.17 — base rates dominate

Pitfalls: the base-rate fallacy — a 99%-accurate test on a rare disease still yields mostly false positives. Probabilities must be normalised over a mutually exclusive, exhaustive set; classical planning's closed-world & deterministic assumptions break under genuine uncertainty, which is why we move to probability.

Takeaway: Bayes' rule is the single mechanism that turns a sensor model $P(E\mid H)$ and a prior $P(H)$ into an updated belief $P(H\mid E)$ — the formal core of every belief-state update. Connects to: MEU & decisions (S25).

Reading: AIMA Ch. 10–12 (knowledge representation; classical planning & PDDL; quantifying uncertainty & Bayes).

S25Probabilistic Programming & Multi-Agent DecisionsLecture / Discussion

Combine probability with decisions, and across multiple agents.

Multi-agent decision making. Decisions when several rational agents interact; utilities, equilibria and the link back to game theory.
Probabilistic programming. Expressing generative probabilistic models as programs and inferring posteriors.

Definition — Maximum Expected Utility (MEU)

A rational decision maximises expected utility: $a^\ast = \arg\max_a \sum_{s'} P(s' \mid s, a)\,U(s')$. MEU is the bridge from probabilistic belief to action — the principle behind decision networks and MDP policies.

Complexity / pitfalls: exact Bayesian inference is NP-hard in general, so probabilistic-programming runtimes lean on approximate inference (MCMC, variational) — results are samples with variance, not exact posteriors. In multi-agent settings a single MEU action is ill-defined: you need equilibrium concepts (Nash) because the "best" action depends on others' choices.

Key idea: probabilistic programs let you write a model the way you'd write code, and ask the runtime to do the Bayesian inference for you.

Takeaway: MEU closes the loop of the whole course — belief (probability) + preferences (utility) ⇒ action — and is the exact objective that Markov decision processes and reinforcement learning optimise. Connects to: back to game theory (S12) and forward to RL electives.

Reading: AIMA Ch. 16–18 (making simple/complex decisions & MDPs; multi-agent decision making).

CAPSTONE

Presentations, Conclusion & Final Exam

Sessions 26–30

The course closes with the group agents being presented and demonstrated live, a wrap-up session, and the comprehensive final exam.

Present and defend a working adversarial-game agent to peers.
Demonstrate the agent on simulated game situations.
Integrate the whole syllabus and apply it to limited exam situations.

S26Final Group Speaker Presentations (1)Presentation

Present the application of AI to games and demonstrate the agents working on simulated situations.

Application of AI to games. Algorithm, heuristic and mapping to class concepts, with a live demo.

Cross-link: the matching slide deck follows the rubric — organisation (5pts), per-game algorithm/heuristic/concept-mapping, and presentation quality — with iframed live demos.

Presentation rubric & tips. Graders look for: a clear problem/game framing, the chosen algorithm and heuristic with justification, an explicit mapping to course concepts (which session each design choice comes from), a working live demo, and even speaking time across the group. Tip: rehearse the demo with a fallback recording, and open by naming the environment type (observable? stochastic? multi-agent?) — it frames every later choice.

S27Final Group Speaker Presentations (2)Presentation

Continue the group presentations and live agent demonstrations.

Application of AI to games. Remaining groups present and demonstrate their agents.

S28Final Group Speaker Presentations (3)Presentation

Conclude the round of presentations with live demonstrations on simulated situations.

Application of AI to games. Final groups present; Q&A and peer comparison.

S29Group Assignment ConclusionInteractive session

Wrap up the group project and consolidate lessons learned.

Conclusion. Synthesis across the games, retrospective and reflection on the whole course arc.

S30Final ExamExam

Demonstrate understanding of the whole syllabus and application of concepts to limited situations.

Final exam. Comprehensive, covering agents, search, games, CSPs and reasoning under uncertainty. Worth 30% of the grade; no GenAI permitted.

Reminder: the 80% attendance rule must be met to be eligible for both the ordinary and the June/July re-sit calls.

KEY Key concepts — glossary

A quick reference to the core vocabulary used across the course and in the agent code.

Rational agent: An agent that, for each percept sequence, selects the action expected to maximise its performance measure given its knowledge.

PEAS: The frame for specifying a task environment: Performance measure, Environment, Actuators, Sensors.

Agent function: The mapping from any percept sequence to an action; implemented concretely by the agent program running on an architecture.

Environment properties: Six axes that determine algorithm choice: fully/partially observable, deterministic/stochastic, episodic/sequential, static/dynamic, discrete/continuous, single-/multi-agent.

Agent types: Simple-reflex, model-based reflex, goal-based, utility-based and learning agents — a ladder of increasing capability.

Problem formulation: Casting a task as $\langle S,s_0,A,\text{Result},\text{Goal},c\rangle$: states, initial state, actions, transition model, goal test, path cost.

Transition model: $\text{Result}(s,a)$ — the state(s) reached by doing action $a$ in state $s$; deterministic for a single state, non-deterministic for a set.

State space: The set of all states reachable from the initial state via the actions; search explores its implicit graph.

Branching factor $b$: The (average) number of successors per node; with goal depth $d$ it governs the $O(b^d)$ cost of uninformed search.

Completeness: A search algorithm is complete if it is guaranteed to find a solution whenever one exists.

Optimality: An algorithm is optimal if the solution it returns has the lowest possible path cost.

Frontier (open list): The set of generated-but-not-yet-expanded nodes; the data structure used (queue/stack/priority queue) defines the search strategy.

BFS: Breadth-first search: FIFO frontier; complete and optimal for unit step costs; $O(b^d)$ memory.

DFS: Depth-first search: LIFO frontier; linear memory but neither complete nor optimal in general.

UCS / Dijkstra: Uniform-cost search: expands the least-$g(n)$ frontier node; optimal for non-negative costs.

Iterative deepening (IDS): Repeated depth-limited DFS; BFS optimality with DFS $O(bd)$ memory.

Heuristic $h(n)$: An estimate of the cost from node $n$ to the goal; the knowledge that makes informed search efficient.

Admissible heuristic: Never overestimates the true cost-to-goal ($h(n)\le h^\ast(n)$); guarantees A* tree-search optimality.

Consistent heuristic: Satisfies the triangle inequality $h(n)\le c(n,n') + h(n')$; guarantees A* graph-search optimality.

A* search: Best-first search on $f(n)=g(n)+h(n)$; optimal with an admissible/consistent heuristic.

Dominance: Heuristic $h_2$ dominates $h_1$ if $h_2(n)\ge h_1(n)$ for all $n$ (both admissible); the dominant one expands no more nodes.

Greedy best-first: Informed search ordered by $h(n)$ alone; fast but neither complete nor optimal.

Local search: Search that keeps only the current state(s) and moves to neighbours; ignores the path, ideal for optimisation.

Hill climbing: Local search that greedily moves to the best neighbour; prone to local maxima, ridges and plateaus.

Evolutionary algorithm: Population-based stochastic search using selection, crossover and mutation to explore in parallel.

Online search: Search that interleaves computation and action in an unknown environment; judged by competitive ratio vs. the offline optimum.

Contingency plan: A solution to a non-deterministic problem: a branching tree of if-then actions rather than a fixed sequence.

Simulated annealing: Local search that accepts worse moves with probability $e^{-\Delta E/T}$, cooling $T$ to escape local optima.

Belief state: The set of physical states an agent could be in under partial observability; search/planning operate over it.

AND-OR tree: Search structure for non-deterministic actions: OR = agent choices, AND = environment outcomes that must all succeed.

Zero-sum game: A game where one player's gain exactly equals the other's loss; utilities sum to a constant.

Minimax: Optimal decision rule for two-player zero-sum games: MAX maximises, MIN minimises, assuming perfect opponent play.

Alpha-beta pruning: Prunes branches that cannot change the minimax value; with good ordering explores $O(b^{d/2})$ nodes.

Horizon effect: A flaw of depth-limited game search: a damaging event just beyond the cut-off is invisible to the evaluation function.

UCB1: The selection rule in MCTS, $\bar{x}_i + c\sqrt{\ln N / n_i}$, balancing exploitation of good moves with exploration of rare ones.

Information set: In imperfect-information games, the set of game states a player cannot distinguish given what they have observed.

Chance node: A game-tree node for a random event (dice, draw); its value is the probability-weighted average of its children.

Evaluation function: A heuristic estimate of a non-terminal game state's utility, applied at a depth cut-off.

Expectiminimax: Minimax extended with chance nodes whose value is the probability-weighted average of children.

MCTS: Monte Carlo Tree Search: select–expand–simulate–backpropagate with UCB1; strong on huge branching factors.

CSP: Constraint satisfaction problem $\langle X,D,C\rangle$: assign values to variables so all constraints hold.

Arc consistency / AC-3: Inference that deletes variable-domain values with no consistent partner across a constraint.

Backtracking search: Depth-first assignment of CSP variables with inference and ordering heuristics; backtracks on dead ends.

MRV / LCV: Minimum-remaining-values (choose the most constrained variable) and least-constraining-value (try the least restrictive value) heuristics.

Min-conflicts: Local-search CSP heuristic: reassign a conflicted variable to the value minimising total conflicts.

Forward checking: The cheap inference step that, after assigning a variable, deletes inconsistent values from neighbouring domains.

Conflict-directed backjumping: Intelligent backtracking that jumps directly to the variable responsible for a conflict instead of the previous one.

Tree-structured CSP: A CSP whose constraint graph is a tree; solvable in $O(n d^2)$ with a single ordering + arc-consistency pass.

Knowledge base (KB): The set of sentences an agent believes; queried with ASK and updated with TELL.

Propositional logic: Logic of true/false atoms combined with connectives; sound and complete but inference is NP-complete (SAT).

Resolution: A single, refutation-complete inference rule on clauses in conjunctive normal form (CNF).

Forward / backward chaining: Linear-time inference on Horn clauses: data-driven (forward) or goal-driven (backward, as in Prolog).

Quantifiers: $\forall$ (universal, pairs with $\Rightarrow$) and $\exists$ (existential, pairs with $\wedge$) range over objects in first-order logic.

Automatic planning: Finding an action sequence to reach a goal, e.g. via PDDL action schemas and state-space plan search.

Conditional probability: $P(A\mid B)=P(A\wedge B)/P(B)$ — the probability of $A$ given that $B$ is known to hold.

Prior / posterior: Belief before evidence ($P(H)$) and after evidence ($P(H\mid E)$); Bayes' rule maps one to the other.

First-order logic (FOL): Logic with objects, relations/predicates, functions and quantifiers $\forall,\exists$; far more compact than propositional logic.

Entailment ($\models$): $\text{KB}\models\alpha$ means $\alpha$ is true in every model in which the knowledge base is true.

Unification: Finding a substitution that makes two logical expressions identical; the engine of FOL inference.

Bayes' rule: $P(H\mid E)=P(E\mid H)P(H)/P(E)$; the rule for updating belief given evidence.

Maximum Expected Utility: Rational choice principle: pick the action maximising $\sum_{s'}P(s'\mid s,a)U(s')$.

Utility function: A numeric measure of preference over outcomes; a rational agent maximises its expected utility.

MDP: Markov decision process $\langle S,A,P,R,\gamma\rangle$ — the formal model for sequential decisions under uncertainty; its optimal policy maximises expected discounted reward.

Probabilistic programming: Expressing a generative probability model as a program and letting the runtime infer posteriors (often by approximate inference).

Nash equilibrium: A multi-agent strategy profile where no player can do better by unilaterally changing strategy.

Competitive ratio: For online algorithms, the ratio of the cost actually incurred to the cost an omniscient offline optimum would pay.

REF Annotated bibliography

Recommended · core text

Stuart Russell & Peter Norvig (2022). Artificial Intelligence: A Modern Approach (4th ed., global). Pearson. ISBN 9781292401133 (Digital).

"AIMA" — the standard reference for the entire course. Chapters map almost one-to-one onto the modules above: agents (Ch. 1–2), problem-solving & search (Ch. 3–4), adversarial games (Ch. 5), CSPs (Ch. 6), logic & FOL (Ch. 7–9), and uncertainty / decisions (Ch. 12–18). The chapter readings cited on each session refer to this book.

Recommended · context

Kelly Clancy (2024). Playing with Reality: How Games Shape Our World. ISBN 978024154550 (Digital).

A cultural and historical lens on games — why game-playing has been both a benchmark and a driver of AI progress. Good companion reading for the adversarial-games module and the group project's framing of Quoridor, Ghosts and UNO as testbeds for reasoning.

Policies: the University's Code of Conduct, Attendance Policy and Ethics Code apply; the Program Director may add indications. See the original syllabus PDF for the authoritative wording on attendance, re-sits and behaviour rules.