AI Reasoning & Problem Solving · 2025
Playable website: ../website/index.html
Total time ≤ 10 min talk + 5 min Q&A (per rubric)
- my_agent()
- play_game driver - identical across all three notebooks
- GameClient + play_game across games
- get_valid_moves, simulate_move, is_terminal
- shortest_path_length() as a path-completeness check before placing a wall (matches the rule "walls cannot fully block any player")

def negamax(state, depth, alpha, beta):
    if depth == 0 or state.is_terminal():
        return eval(state)
    for m in order_moves(state):  # walls near opponent first
        v = -negamax(apply(state, m), depth-1, -beta, -alpha)
        alpha = max(alpha, v)
        if alpha >= beta: break
    return alpha
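To make the negamax pseudocode above concrete, here is a minimal runnable sketch on a toy game rather than on Quoridor itself. The game (take 1 to 3 stones from a pile; whoever takes the last stone wins) and the helper names `moves`, `evaluate`, and `best_move` are assumptions chosen for illustration and do not come from the notebooks. As negamax requires, the evaluation is always taken from the point of view of the side to move.

```python
from typing import List

INF = 10**9  # stand-in for infinity; larger than any evaluation below


def moves(pile: int) -> List[int]:
    """Legal moves in the toy game: take 1, 2 or 3 stones."""
    return [k for k in (1, 2, 3) if k <= pile]


def evaluate(pile: int) -> int:
    """Score from the side to move. An empty pile means the previous
    player took the last stone, so the side to move has lost."""
    return -1 if pile == 0 else 0


def negamax(pile: int, depth: int, alpha: int, beta: int) -> int:
    if depth == 0 or pile == 0:
        return evaluate(pile)
    best = -INF
    # Crude move ordering: try the largest take first.
    for k in sorted(moves(pile), reverse=True):
        # Negate the child's value and swap/negate the window.
        v = -negamax(pile - k, depth - 1, -beta, -alpha)
        best = max(best, v)
        alpha = max(alpha, v)
        if alpha >= beta:  # cut-off: the opponent will avoid this line
            break
    return best


def best_move(pile: int, depth: int) -> int:
    """Pick the take whose resulting position is worst for the opponent."""
    return max(moves(pile), key=lambda k: -negamax(pile - k, depth - 1, -INF, INF))
```

In this toy game every pile that is a multiple of 4 is lost for the side to move, which gives an easy sanity check on the search before swapping in a real state class and evaluation.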
The notebook tip is exactly this:
eval(state) = 10 * (opponent_path - my_path)
            + (my_walls_left - opp_walls_left)

- opponent_path - my_path is computed via BFS (already provided as shortest_path_length)

| Concept | Where it appears |
|---|---|
| Uninformed search (BFS) | Shortest-path distance computation |
| Adversarial search / minimax | Move selection in my_agent |
| Alpha-beta pruning | Negamax cut-offs on the frontier |
| Heuristic evaluation | Path-difference + wall-count combination |
| Move ordering | Walls sorted by distance to opponent before alpha-beta |
| Branching-factor management | Capping the wall-move candidate list (top-K) |
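The BFS distance, the wall-legality check, and the path-difference evaluation described above can be sketched as follows. This is a stand-in under stated assumptions, not the notebook's `shortest_path_length`: the board is assumed to be a 9x9 grid, a wall is modelled as a set of blocked edges between adjacent cells, and the names `bfs_path_length`, `wall_keeps_paths_open`, and `evaluate` are hypothetical.

```python
from collections import deque


def bfs_path_length(start, goal_row, blocked, n=9):
    """Fewest steps from start (row, col) to any cell in goal_row.
    blocked holds frozenset({cell_a, cell_b}) edges that walls cut.
    Returns None when the goal row is unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (r, c), dist = queue.popleft()
        if r == goal_row:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < n and 0 <= nxt[1] < n):
                continue  # off the board
            if nxt in seen or frozenset(((r, c), nxt)) in blocked:
                continue  # already visited, or a wall is in the way
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def wall_keeps_paths_open(wall_edges, blocked, players, n=9):
    """True if every (position, goal_row) pair still has a path
    after the candidate wall's edges are added."""
    trial = set(blocked) | set(wall_edges)
    return all(bfs_path_length(pos, goal, trial, n) is not None
               for pos, goal in players)


def evaluate(me, opp, blocked, my_walls_left, opp_walls_left, n=9):
    """Path-difference heuristic; me and opp are (position, goal_row)."""
    my_path = bfs_path_length(me[0], me[1], blocked, n)
    opponent_path = bfs_path_length(opp[0], opp[1], blocked, n)
    return 10 * (opponent_path - my_path) + (my_walls_left - opp_walls_left)
```

Running the legality check before a wall move enters the candidate list means the search never has to evaluate a position in which a path length is undefined.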
The second loss condition (capturing all of the opponent's evil pieces loses the game) makes Ghosts a true bluffing game.
# Per candidate move
child = apply(state, move)
if child.winner:
    score = eval(child)
else:
    score = min(eval(apply(child, om)) for om in opp_moves[:16])
return argmax(moves by score)
Belief reasoning is encoded in the threat term of the evaluation: hidden opponent pieces are penalised at full weight, revealed-evil pieces at reduced weight. This is the imperfect-information "trick" the notebook hints at.
| Concept | Where it appears |
|---|---|
| Adversarial search | 2-ply min over opponent responses |
| Imperfect information / belief state | Threat weighting based on whether opponent piece type is revealed |
| Expectiminimax-style averaging | Treating hidden pieces as a weighted mixture |
| Heuristic design with multiple objectives | Material + advancement + threat - balanced via weights |
| Game decomposition (setup vs play) | Separate strategy phase: fixed setup, search-based play |
| "Reverse" terminal states | Captured-all-evils-loses → forces non-greedy capture policy |
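The two-ply selection rule used for Ghosts (score each of our moves by the worst outcome over a capped list of opponent replies, then take the best) can be written as a small game-agnostic function. The sketch below is an illustration only: the callback names `legal_moves`, `apply_move`, `is_over`, and `evaluate` are hypothetical, the evaluation is assumed to be from our point of view at every ply, and the toy game exists purely to exercise the function.

```python
def pick_move_two_ply(state, legal_moves, apply_move, is_over, evaluate, opp_cap=16):
    """Return the move that maximises the worst-case evaluation
    after one opponent reply (replies capped at opp_cap)."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(state):
        child = apply_move(state, move)
        replies = legal_moves(child)[:opp_cap]
        if is_over(child) or not replies:
            score = evaluate(child)  # game ended on our move
        else:
            # Assume the opponent picks the reply that hurts us most.
            score = min(evaluate(apply_move(child, om)) for om in replies)
        if score > best_score:
            best_move, best_score = move, score
    return best_move


# Toy game (assumption): we add 1 or 3 to a total; the opponent may then
# subtract nothing or twice what we just added.
# State = (total, last_added, we_are_to_move).
def toy_moves(state):
    _, last, ours = state
    return [1, 3] if ours else [0, 2 * last]


def toy_apply(state, m):
    total, _, ours = state
    return (total + m, m, False) if ours else (total - m, 0, True)


def toy_over(state):
    return False  # the toy game has no terminal states


def toy_eval(state):
    return state[0]  # higher total is better for us


choice = pick_move_two_ply((0, 0, True), toy_moves, toy_apply, toy_over, toy_eval)
```

A one-ply greedy agent would add 3 here; the two-ply rule prefers adding 1 because the opponent's best reply then costs less. The same reasoning is what discourages greedy captures when capturing can lose the game.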
- color, value, type (normal/action/wild)
- get_valid_moves() (legal plays + draw)

We deliberately did not use deep search here. Hidden hands plus a 100+-card stochastic deck make minimax a poor fit; a thoughtful greedy policy with the right features beats it in wall-clock practice for this assignment.
plays = [m for m in valid_moves if m['type'] == 'play']
if not plays: return {'type': 'draw', 'count': 1}
plays.sort(key=score, reverse=True)
return plays[0]
| Feature | Weight |
|---|---|
| Card is normal numeric | + card value (dump high cards first) |
| Card is Draw 2 | +30 |
| Card is Skip | +20 |
| Card is Reverse | +15 |
| Card is Wild | -50 (save them) |
| Card is Wild +4 | -80 (rarer still) |
| Card matches top by color (not just value) | +5 (keep flexibility) |
| Some opponent has ≤2 cards AND we have an aggressive card | +40 |
Information used: get_hand_sizes() tells us when to switch to aggression; get_discard_pile() is available for full card counting (future work).
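One way the `score` key used in the sort above could be built from the feature table is sketched here. The move and card dictionary layout (`move["card"]` with `color`, `value`, and `type` keys), the value strings such as `"draw2"` and `"wild4"`, and the choice of which cards count as "aggressive" are all assumptions made for illustration; the notebook's actual field names may differ.

```python
# Weights copied from the feature table; the string keys are assumed names.
ACTION_WEIGHTS = {"draw2": 30, "skip": 20, "reverse": 15}
WILD_WEIGHTS = {"wild": -50, "wild4": -80}
AGGRESSIVE = {"draw2", "skip", "wild4"}  # assumption: cards that hurt the next player


def score_play(move, top_card, opponent_hand_sizes):
    """Heuristic value of playing move['card'] on top_card."""
    card = move["card"]
    kind, value = card["type"], card["value"]
    if kind == "normal":
        score = int(value)  # dump high cards first
    elif kind == "action":
        score = ACTION_WEIGHTS.get(value, 0)
    else:
        score = WILD_WEIGHTS.get(value, -50)  # save wilds
    # Bonus for matching by color rather than only by value.
    if card.get("color") is not None and card.get("color") == top_card.get("color"):
        score += 5
    # Switch to aggression when any opponent is close to going out.
    if value in AGGRESSIVE and min(opponent_hand_sizes, default=99) <= 2:
        score += 40
    return score


def choose_move(valid_moves, top_card, opponent_hand_sizes):
    """Greedy policy: best-scoring play, or draw when nothing is playable."""
    plays = [m for m in valid_moves if m["type"] == "play"]
    if not plays:
        return {"type": "draw", "count": 1}
    return max(plays, key=lambda m: score_play(m, top_card, opponent_hand_sizes))
```

Keeping the weights in dictionaries makes it cheap to retune them, or to add a card-counting feature later, without touching the selection logic.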
| Concept | Where it appears |
|---|---|
| Imperfect-information game | We can't see opponents' hands - have to reason from hand sizes and discard pile |
| Stochastic environment | Drawing from a shuffled deck = chance nodes |
| Heuristic / rule-based policy | Why we picked heuristics over minimax when the state space is huge |
| Card counting / belief tracking | get_discard_pile() exposes the played history |
| Aggression switching | Threshold rule on opponent hand size (greedy + threshold = simple form of game-state policy) |
| Trade-off: complexity vs. responsiveness | Same reason we capped Quoridor's wall search |
| | Quoridor | Ghosts | UNO |
|---|---|---|---|
| Info | Perfect | Hidden types | Hidden hands |
| Chance | None | None | Deck draws |
| Players | 2 | 2 | 4 |
| Algorithm | α-β minimax (d=2) | 2-ply min over opp responses | Greedy policy |
| Heuristic | Path-diff + walls | Material + advancement + belief threat | Action-card valuation |
| Key class concept | BFS + α-β | Belief state | Heuristic under uncertainty |
Questions?
Source notebooks: Quoridor_Assignment_Standalone.ipynb, Ghosts_Assignment_Standalone.ipynb, Uno_Assignment_Standalone.ipynb
Playable website: ../website/index.html