05 · Function Approximation
Tabular methods break when the state space is huge or continuous. The fix: represent v̂(s, w) or q̂(s, a, w) with a parameterised function — features × weights, or a neural net. Then learn w by gradient descent on the TD or MC error.
Linear value function: v̂(s, w) = wT φ(s)
Semi-gradient TD update: w ← w + α [ r + γ v̂(s′, w) − v̂(s, w) ] ∇w v̂(s, w)
For linear: ∇w v̂(s,w) = φ(s)
Semi-gradient TD update: w ← w + α [ r + γ v̂(s′, w) − v̂(s, w) ] ∇w v̂(s, w)
For linear: ∇w v̂(s,w) = φ(s)
▶ live demo — approximating a noisy target function
A toy 1-D value function is hidden inside v*(s) = sin(3s) + 0.3·s. We collect noisy samples and fit v̂(s,w) using different feature bases. Watch generalisation behaviour.
Samples seen0
MSE—
‖w‖—
d (params)—
True v*(s)
v̂(s, w) — learned
Noisy samples
Active features (faded)
MSE over time
The four families
| Polynomial | φ(s) = [1, s, s², s³, …]. Easy. Bad at expressing local detail without huge order. Numerically unstable. |
| RBF (Gaussian) | φᵢ(s) = exp(−|s−cᵢ|² / 2σ²). Local features. Good interpolation. Choose centres & widths. |
| Tile coding | Sparse binary features from overlapping grids. Fast lookup. Used in Mountain Car & much classic RL. |
| Fourier | φᵢ(s) = cos(iπs). Global basis. Surprisingly strong on bounded domains. |
| Neural Net | Non-linear, learns features. Powerful but loses convergence guarantees (deadly triad). |
The Deadly Triad: bootstrapping + off-policy + function approximation can diverge. DQN works around this with replay buffers and target networks.