05 · Function Approximation

Tabular methods break when the state space is huge or continuous. The fix: represent v̂(s, w) or q̂(s, a, w) with a parameterised function — features × weights, or a neural net. Then learn w by gradient descent on the TD or MC error.

Linear value function:  v̂(s, w) = wT φ(s)
Semi-gradient TD update:  w ← w + α [ r + γ v̂(s′, w) − v̂(s, w) ] ∇w v̂(s, w)
For linear:  ∇w v̂(s,w) = φ(s)
▶ live demo — approximating a noisy target function

A toy 1-D value function is hidden inside v*(s) = sin(3s) + 0.3·s. We collect noisy samples and fit v̂(s,w) using different feature bases. Watch generalisation behaviour.

Samples seen0
MSE
‖w‖
d (params)
True v*(s) v̂(s, w) — learned Noisy samples Active features (faded)

MSE over time

The four families

Polynomialφ(s) = [1, s, s², s³, …]. Easy. Bad at expressing local detail without huge order. Numerically unstable.
RBF (Gaussian)φᵢ(s) = exp(−|s−cᵢ|² / 2σ²). Local features. Good interpolation. Choose centres & widths.
Tile codingSparse binary features from overlapping grids. Fast lookup. Used in Mountain Car & much classic RL.
Fourierφᵢ(s) = cos(iπs). Global basis. Surprisingly strong on bounded domains.
Neural NetNon-linear, learns features. Powerful but loses convergence guarantees (deadly triad).
The Deadly Triad: bootstrapping + off-policy + function approximation can diverge. DQN works around this with replay buffers and target networks.

← MC / TD  ·  Next: REINFORCE →