From-Scratch Build · Reinforcement Learning

Mesh Parking RL

Parking is a deceptively hard control problem: a car can't slide sideways, the goal pose is tight, and one wrong move means a collision. This build teaches an agent to park by laying a mesh over the lot — discretising space into cells the agent can reason about — and learning a policy on it. It comes with a paper-style write-up, because the explanation is half the work.

Python · Reinforcement learning · Spatial mesh · Path planning · Reward shaping · Write-up

What it is

Learning to park on a grid of cells

A parking lot is continuous, but an agent learns far more easily over something countable. The core idea here is the mesh: overlay the lot with a grid of cells, and the messy continuous problem — where exactly is the car, where exactly is the space — becomes a tractable one the agent can plan and learn over. The mesh is the bridge between physical geometry and a learnable state space.

On top of that representation, an RL agent explores manoeuvres, gets rewarded for edging towards a clean parked pose and penalised for clipping obstacles, and gradually converges on a policy that parks reliably. The accompanying write-up lays out the method and results the way a short paper would.

grid
the single design choice — meshing continuous space into discrete cells — that turns an intractable problem into a learnable one.

The stack

From lot geometry to parked car

Representation first, then the agent, then the explanation.

representation

Spatial mesh

Discretise the lot into a grid of cells, each carrying occupancy and cost — the foundation everything else stands on.
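
As a rough illustration of what such a mesh could look like, the sketch below builds an occupancy grid and a per-cell cost array with NumPy. The cell labels, the `build_mesh` and `to_cell` helpers, the lot dimensions and the cost values are all assumptions made for this example, not the build's actual code.

```python
import numpy as np

FREE, BLOCKED, GOAL = 0, 1, 2  # hypothetical cell labels


def to_cell(x, y, cell_m):
    """Map a continuous position in metres to a (row, col) cell index."""
    return int(y // cell_m), int(x // cell_m)


def build_mesh(width_m, height_m, cell_m, obstacles, goal):
    """Discretise a rectangular lot into a grid of labelled cells.

    obstacles: list of (x_min, y_min, x_max, y_max) rectangles in metres.
    goal: (x, y) centre of the parking slot in metres.
    """
    cols = int(np.ceil(width_m / cell_m))
    rows = int(np.ceil(height_m / cell_m))
    occupancy = np.full((rows, cols), FREE, dtype=np.int8)

    # Cell-centre coordinates, used to test which cells an obstacle covers.
    xs = (np.arange(cols) + 0.5) * cell_m
    ys = (np.arange(rows) + 0.5) * cell_m
    cx, cy = np.meshgrid(xs, ys)

    for x0, y0, x1, y1 in obstacles:
        inside = (cx >= x0) & (cx <= x1) & (cy >= y0) & (cy <= y1)
        occupancy[inside] = BLOCKED

    occupancy[to_cell(goal[0], goal[1], cell_m)] = GOAL

    # Illustrative cost: 1 per traversable cell, infinite for blocked cells.
    cost = np.where(occupancy == BLOCKED, np.inf, 1.0)
    return occupancy, cost


occupancy, cost = build_mesh(
    width_m=10.0, height_m=6.0, cell_m=1.0,
    obstacles=[(2.0, 0.0, 4.0, 2.0)],
    goal=(8.5, 0.5),
)
print(occupancy)
```

A cell here is marked blocked when its centre falls inside an obstacle rectangle. A finer `cell_m` gives a more faithful picture of the lot at the price of a larger state space.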

dynamics

Car model

Non-holonomic motion: the car can drive and steer but not slide sideways, which is what makes parking hard.
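
One common way to express this constraint is the kinematic bicycle model, sketched below. It is offered as an assumption about how such a car model might be written, not as the model this build uses; the `step_bicycle` name, the wheelbase and the time step are illustrative.

```python
import math


def step_bicycle(x, y, heading, speed, steer, wheelbase=2.5, dt=0.1):
    """Advance a kinematic bicycle model by one time step.

    The position changes only along the current heading. There is no
    lateral velocity term, which is the non-holonomic constraint.
    Angles are in radians; a negative speed means reversing.
    """
    x_new = x + speed * math.cos(heading) * dt
    y_new = y + speed * math.sin(heading) * dt
    heading_new = heading + (speed / wheelbase) * math.tan(steer) * dt
    # Wrap the heading into [-pi, pi).
    heading_new = (heading_new + math.pi) % (2 * math.pi) - math.pi
    return x_new, y_new, heading_new


# Facing along +x with the wheel turned: the car still moves only forwards
# in this step, while its heading starts to rotate.
print(step_bicycle(0.0, 0.0, 0.0, speed=1.0, steer=0.5))
```

Because the heading only changes while the car is moving, reaching a pose directly to one side takes a sequence of forward and reverse arcs, which is exactly the manoeuvring a parking agent has to discover.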

agent

RL policy

Learn, by trial and error over the mesh, the sequence of moves that lands the car in the goal pose.

signal

Reward shaping

Reward progress towards the slot and alignment with the goal orientation; penalise collisions and dithering, so that good behaviour is learnable.
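
A minimal sketch of a reward along these lines is shown below. The `shaped_reward` function, its weights, the step cost and the terminal bonus and penalty are hypothetical values chosen for illustration; the write-up's actual reward terms may differ.

```python
import math


def shaped_reward(prev_pose, pose, goal_pose, collided, parked,
                  w_dist=1.0, w_heading=0.5, step_cost=0.05,
                  collision_penalty=10.0, success_bonus=10.0):
    """Reward for one transition; poses are (x, y, heading) tuples."""
    if collided:
        return -collision_penalty
    if parked:
        return success_bonus

    def dist(p):
        return math.hypot(goal_pose[0] - p[0], goal_pose[1] - p[1])

    def heading_err(p):
        # Smallest absolute angle between the car heading and goal heading.
        d = (goal_pose[2] - p[2] + math.pi) % (2 * math.pi) - math.pi
        return abs(d)

    progress = dist(prev_pose) - dist(pose)                 # > 0 when closer
    alignment = heading_err(prev_pose) - heading_err(pose)  # > 0 when straighter
    # The constant step cost makes dithering in place a net loss.
    return w_dist * progress + w_heading * alignment - step_cost


goal = (5.0, 0.0, 0.0)
print(shaped_reward((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), goal, False, False))
print(shaped_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), goal, False, False))
```

Rewarding the change in distance and heading error, rather than their absolute values, means the agent gains nothing by hovering near the slot: only actual progress pays.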

baseline

Planning comparison

Hold the learned policy up against a classical grid path-planner to see what RL adds — and costs.
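
The source does not say which planner serves as the baseline, so the sketch below uses plain breadth-first search as a stand-in. It finds a shortest 4-connected path over grid cells; the `bfs_path` name and the example grid are assumptions for illustration.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Shortest 4-connected path on a grid; grid[r][c] == 1 means blocked.

    Returns a list of (row, col) cells from start to goal, or None if the
    goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != 1 and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
path = bfs_path(grid, start=(0, 0), goal=(2, 3))
print(path)
```

A planner like this treats the car as a point that can step to any neighbouring cell, so it ignores heading and turning radius. Its path length is best read as an optimistic reference for the learned policy rather than a manoeuvre the car could drive as-is.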

communication

Paper write-up

Method, experiments and results documented like a short paper — the deliverable that makes the work legible.

Architecture

How a parking policy is learned

From an empty lot to a trained agent, the build follows a clear sequence:

  1. Mesh the lot

    Lay a grid over the space and mark which cells are free, blocked, or the goal.

  2. Place the car

    Give the agent a start pose and the car's motion constraints.

  3. Explore

    Let the agent try manoeuvres across the mesh, collecting rewards and recording collisions.

  4. Shape & learn

    Update the policy from the shaped reward so successful approaches get reinforced.

  5. Document

    Run the trained agent, measure its success rate, and write up the method and findings.
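
The five steps above can be sketched end to end with tabular Q-learning on a toy mesh. This is a deliberately simplified stand-in, not the build's implementation: the state is just the car's cell (no heading), the four actions are single-cell moves rather than steering manoeuvres, and the layout, reward values and hyperparameters are all assumptions.

```python
import random

FREE, BLOCKED, GOAL = 0, 1, 2
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

# Step 1: a tiny hand-written mesh (hypothetical layout).
MESH = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 2],
]
START = (0, 0)  # Step 2: the start cell


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    off_grid = not (0 <= r < len(MESH) and 0 <= c < len(MESH[0]))
    if off_grid or MESH[r][c] == BLOCKED:
        return state, -5.0, True   # a collision ends the episode
    if MESH[r][c] == GOAL:
        return (r, c), 10.0, True  # parked
    return (r, c), -0.1, False     # small step cost


def greedy_action(q, state):
    """Index of the highest-valued action; unseen pairs default to 0."""
    return max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))


def train(episodes=3000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # maps (state, action_index) -> estimated return
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 50:
            # Step 3: epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy_action(q, state)
            nxt, reward, done = step(state, ACTIONS[a])
            # Step 4: Q-learning update from the reward signal.
            best_next = 0.0 if done else q.get((nxt, greedy_action(q, nxt)), 0.0)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state, steps = nxt, steps + 1
    return q


def greedy_rollout(q, max_steps=50):
    """Step 5: run the greedy policy once and report whether it parks."""
    state = START
    for _ in range(max_steps):
        state, reward, done = step(state, ACTIONS[greedy_action(q, state)])
        if done:
            return reward > 0
    return False


q_table = train()
parked = greedy_rollout(q_table)
print("parked:", parked)
```

With a single fixed start cell the rollout is deterministic, so it yields one pass or fail. A success rate in the sense of step 5 would average the same check over many start poses.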

Reflection

What rebuilding it taught me