Ascent — watch a tabular RL agent learn to climb

An underpowered car can't climb the hill with raw engine power — the agent must learn to build momentum by rocking back and forth. Train a tabular agent live and watch its value surface, policy, and strategy emerge.
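To make the momentum argument concrete, here is a minimal sketch of the car's dynamics. It assumes the classic mountain-car benchmark constants (force 0.001, gravity 0.0025, goal at position 0.5); this page does not state the values its simulation uses, so treat every number and function name below as an assumption.

```python
import math

# Assumed constants from the classic mountain-car benchmark, not read from this page.
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.6, 0.5
MAX_SPEED = 0.07
FORCE, GRAVITY = 0.001, 0.0025

def step(position, velocity, action):
    """Advance one time step. action: 0 = push left, 1 = idle, 2 = push right."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity

def steps_to_goal(policy, position=-0.5, velocity=0.0, max_steps=200):
    """Run a fixed policy; return the step count on reaching the goal, else None."""
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= GOAL_POS:
            return t
    return None

def always_right(position, velocity):
    return 2  # raw engine power only

def with_momentum(position, velocity):
    return 2 if velocity >= 0 else 0  # push the way the car is already moving

print("always right:", steps_to_goal(always_right))
print("with momentum:", steps_to_goal(with_momentum))
```

Under these assumed constants the engine force is weaker than gravity on the steep part of the slope, so the always-right policy should time out, while the hand-written rocking policy should reach the goal. The learning agent has to discover something like the second policy on its own.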

Agent

Algorithm, Learning rate α, Discount γ, Exploration decay, Episodes, Reward shaping (speed bonus)
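The controls above map onto a standard tabular update. The sketch below assumes one-step Q-learning with ε-greedy exploration; the page does not say which algorithms the selector offers, and the grid size, the multiplicative ε decay, and the form of the speed bonus are all hypothetical choices made for illustration.

```python
import random

N_POS, N_VEL, N_ACTIONS = 20, 20, 3            # hypothetical grid; actions: left, idle, right
MIN_POS, MAX_POS, MAX_SPEED = -1.2, 0.6, 0.07  # assumed classic mountain-car bounds

def make_table():
    """One list of action values per (position bin, velocity bin) cell."""
    return {(i, j): [0.0] * N_ACTIONS for i in range(N_POS) for j in range(N_VEL)}

def discretize(position, velocity):
    """Map a continuous state to its cell in the table."""
    i = int((position - MIN_POS) / (MAX_POS - MIN_POS) * N_POS)
    j = int((velocity + MAX_SPEED) / (2 * MAX_SPEED) * N_VEL)
    return min(i, N_POS - 1), min(j, N_VEL - 1)

def epsilon_greedy(q, state, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q[state][a])

def q_update(q, state, action, reward, next_state, done, alpha, gamma):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])

def decay_epsilon(epsilon, decay, floor=0.01):
    """Hypothetical schedule: multiply by `decay` after each episode."""
    return max(floor, epsilon * decay)

def shaped_reward(velocity, bonus=0.0):
    """Hypothetical speed bonus: -1 per step plus a reward for moving fast."""
    return -1.0 + bonus * abs(velocity)

q = make_table()
s, s_next = discretize(-0.5, 0.0), discretize(-0.49, 0.01)
q_update(q, s, 2, shaped_reward(0.01), s_next, False, alpha=0.1, gamma=0.99)
print(s, q[s])
```

A larger α makes each update move the table entry further toward the new target, γ sets how much future steps count, and the decay rate controls how quickly the agent shifts from random pushes to exploiting what it has learned.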

Episode 0 / 0 · ε 1.000

The climb

Learning curve (reward per episode — higher = fewer steps)

Greedy evaluation

Phase portrait (position × velocity — the momentum loops)

What the agent learned — x: position (valley → goal), y: velocity (− bottom, + top)

Value surface V(s) = maxₐ Q(s, a)

Greedy policy (push left, idle, push right)

Visit frequency (log)
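
All three panels can be derived from two tables: the action values and a per-cell visit counter. The sketch below is an illustration under assumptions, not the page's own code; the layout (q[i][j] holding three action values for position bin i and velocity bin j) and the use of log(1 + n) for the visit map are hypothetical, and the tiny table is hand-made example data.

```python
import math

ACTIONS = ("push left", "idle", "push right")

def summarize(q, visits):
    """Derive the three heatmaps from action values and visit counts."""
    value = [[max(cell) for cell in row] for row in q]                    # V(s) = max_a Q(s, a)
    policy = [[ACTIONS[cell.index(max(cell))] for cell in row] for row in q]  # argmax_a Q(s, a)
    log_visits = [[math.log1p(n) for n in row] for row in visits]         # log(1 + n), safe at n = 0
    return value, policy, log_visits

# Hand-made 1 x 2 table, purely illustrative.
q = [[[-3.0, -2.5, -1.0], [-0.5, -4.0, -2.0]]]
visits = [[0, 99]]

value, policy, log_visits = summarize(q, visits)
print(value)   # [[-1.0, -0.5]]
print(policy)  # [['push right', 'push left']]
print(log_visits)
```

The log scale matters because a handful of cells near the valley floor tend to collect most of the visits; on a linear scale the rarely visited cells near the goal would be indistinguishable from cells never visited at all.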