05 · Function Approximation

Tabular methods break when the state space is huge or continuous. The fix: represent v̂(s, w) or q̂(s, a, w) with a parameterised function — features × weights, or a neural net. Then learn w by gradient descent on the TD or MC error.

Linear value function: v̂(s, w) = w^T φ(s)
Semi-gradient TD update: w ← w + α [ r + γ v̂(s′, w) − v̂(s, w) ] ∇_w v̂(s, w)
For linear: ∇_w v̂(s,w) = φ(s)

▶ live demo — approximating a noisy target function

A toy 1-D value function is hidden inside v*(s) = sin(3s) + 0.3·s. We collect noisy samples and fit v̂(s,w) using different feature bases. Watch generalisation behaviour.

Feature basis # features 20 α (LR) 0.05 Noise σ 0.20 Samples/step 10

Samples seen0

MSE—

‖w‖—

d (params)—

True v*(s) v̂(s, w) — learned Noisy samples Active features (faded)

MSE over time

The four families

Polynomial	φ(s) = [1, s, s², s³, …]. Easy. Bad at expressing local detail without huge order. Numerically unstable.
RBF (Gaussian)	φᵢ(s) = exp(−\|s−cᵢ\|² / 2σ²). Local features. Good interpolation. Choose centres & widths.
Tile coding	Sparse binary features from overlapping grids. Fast lookup. Used in Mountain Car & much classic RL.
Fourier	φᵢ(s) = cos(iπs). Global basis. Surprisingly strong on bounded domains.
Neural Net	Non-linear, learns features. Powerful but loses convergence guarantees (deadly triad).

The Deadly Triad: bootstrapping + off-policy + function approximation can diverge. DQN works around this with replay buffers and target networks.

← MC / TD · Next: REINFORCE →