⛰️ Ascent tabular RL on MountainCar
GitHub ↗

An underpowered car can't climb the hill with raw engine power — the agent must learn to build momentum by rocking back and forth. Train a tabular agent live and watch its value surface, policy, and strategy emerge.

Agent

Episode 0 / 0ε 1.000

The climb

Learning curve (reward per episode — higher = fewer steps)

Greedy evaluation

Phase portrait (position × velocity — the momentum loops)

What the agent learned — x: position (valley → goal), y: velocity (− bottom, + top)

Value surface V(s)=maxₐ Q
Greedy policy push left idle push right
Visit frequency (log)