Mesh Parking RL — Built From Scratch

What it is

Learning to park on a grid of cells

A parking lot is continuous, but an agent learns far more easily over something countable. The core idea here is the mesh: overlay the lot with a grid of cells, and the messy continuous problem — where exactly is the car, where exactly is the space — becomes a tractable one the agent can plan and learn over. The mesh is the bridge between physical geometry and a learnable state space.

On top of that representation, an RL agent explores manoeuvres, gets rewarded for edging towards a clean parked pose and penalised for clipping obstacles, and gradually converges on a policy that parks reliably. The accompanying write-up lays out the method and results the way a short paper would.

grid

The single design choice that turns an intractable problem into a learnable one: meshing continuous space into discrete cells.

The stack

From lot geometry to parked car

Representation first, then the agent, then the explanation.

representation

Spatial mesh

Discretise the lot into a grid of cells, each carrying occupancy and cost — the foundation everything else stands on.
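As a minimal sketch of what such a mesh could look like, the snippet below rasterises a rectangular lot into square cells with NumPy. The function names (build_mesh, to_cell), the cell labels and the rectangle format for obstacles are all assumptions made for illustration, not the project's actual code.

```python
import numpy as np

# Hypothetical cell labels, chosen for this sketch only.
FREE, BLOCKED, GOAL = 0, 1, 2

def build_mesh(width_m, height_m, cell_m, obstacles, goal):
    """Discretise a width_m x height_m lot into square cells of side cell_m.

    obstacles: list of (x_min, y_min, x_max, y_max) rectangles in metres.
    goal: (x, y) point in metres inside the target slot.
    Returns (occupancy, cost) arrays indexed as [row, col].
    """
    cols = int(np.ceil(width_m / cell_m))
    rows = int(np.ceil(height_m / cell_m))
    occupancy = np.full((rows, cols), FREE, dtype=np.int8)

    # Mark every cell that an obstacle rectangle touches as blocked.
    for x0, y0, x1, y1 in obstacles:
        c0, r0 = int(x0 // cell_m), int(y0 // cell_m)
        c1, r1 = int(np.ceil(x1 / cell_m)), int(np.ceil(y1 / cell_m))
        occupancy[r0:r1, c0:c1] = BLOCKED

    # Mark the single cell containing the goal point.
    occupancy[int(goal[1] // cell_m), int(goal[0] // cell_m)] = GOAL

    # Unit traversal cost everywhere except blocked cells, which are impassable.
    cost = np.where(occupancy == BLOCKED, np.inf, 1.0)
    return occupancy, cost

def to_cell(x, y, cell_m):
    """Map a continuous position in metres to its (row, col) cell index."""
    return int(y // cell_m), int(x // cell_m)
```

For example, build_mesh(10.0, 5.0, 0.5, [(2.0, 1.0, 3.0, 2.0)], (9.0, 4.0)) yields a 10 by 20 grid. Marking any touched cell as blocked is the conservative choice: it shrinks free space slightly but never hides an obstacle.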

dynamics

Car model

Non-holonomic motion: the car can drive and steer but not slide sideways, which is what makes parking hard.
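One common way to express this constraint is the kinematic bicycle model, sketched below. The source does not say which car model was used, so the model choice, the step function name, the 2.5 m wheelbase and the 0.1 s time step are assumptions for illustration.

```python
import math

def step(x, y, theta, v, steer, wheelbase=2.5, dt=0.1):
    """Advance a kinematic bicycle model by one time step.

    x, y      : rear-axle position in metres
    theta     : heading in radians
    v         : signed speed in m/s (negative means reversing)
    steer     : front-wheel steering angle in radians
    wheelbase : axle-to-axle distance in metres (assumed value)
    """
    # Position can only change along the current heading: no sideways term.
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    # Heading changes only while the car is moving and the wheels are turned.
    theta += (v / wheelbase) * math.tan(steer) * dt
    # Wrap the heading into [-pi, pi).
    theta = (theta + math.pi) % (2 * math.pi) - math.pi
    return x, y, theta
```

With the car facing along the x axis and the wheels straight, step(0.0, 0.0, 0.0, 1.0, 0.0) moves it 0.1 m in x and leaves y untouched. No single input shifts the car sideways, which is why reaching a slot takes a sequence of forward and reverse arcs.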

agent

RL policy

Learn, by trial and error over the mesh, the sequence of moves that lands the car in the goal pose.
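The source does not name the learning algorithm. As one illustration of trial-and-error learning over a discrete mesh, here is a tabular Q-learning sketch in which a state is a hypothetical (row, col, heading_bin) tuple. The action set, the function names and the hyperparameters are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical discrete manoeuvres available at each step.
ACTIONS = ["fwd", "fwd_left", "fwd_right", "rev", "rev_left", "rev_right"]

def make_q():
    """Q-table mapping state -> list of action values, defaulting to zeros."""
    return defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    return max(range(len(ACTIONS)), key=values.__getitem__)

def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.95):
    """One-step Q-learning update towards the bootstrapped target."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

A table is workable only while the mesh and the heading discretisation stay small. A finer mesh would usually push this towards function approximation instead.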

signal

Reward shaping

Reward progress towards the slot and towards the goal orientation; penalise collisions and dithering, so that good behaviour is learnable.
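A shaped reward along these lines might look like the sketch below. The function name, the weights and the terminal values of plus or minus 10 are illustrative assumptions, not figures from the project.

```python
def shaped_reward(prev_dist, dist, heading_err, collided, parked,
                  w_progress=1.0, w_heading=0.1, step_cost=0.05):
    """Per-step reward for a parking agent (all weights are assumed values).

    prev_dist, dist : distance to the slot before and after the move, in metres
    heading_err     : difference between car heading and slot heading, in radians
    collided        : True if the move hit an obstacle
    parked          : True if the car reached the goal pose
    """
    if collided:
        return -10.0  # terminal penalty for clipping an obstacle
    if parked:
        return 10.0   # terminal bonus for a clean parked pose
    progress = prev_dist - dist  # positive when the car got closer
    # The constant step cost makes dithering in place strictly unprofitable.
    return w_progress * progress - w_heading * abs(heading_err) - step_cost
```

With these defaults, closing 1 m on the slot with perfect alignment earns 0.95, while standing still earns -0.05. The relative sizes matter more than the absolute numbers: a collision has to cost more than any amount of progress a risky shortcut could buy.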

baseline

Planning comparison

Hold the learned policy up against a classical grid path-planner to see what RL adds — and costs.
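The source does not say which planner served as the baseline. A* over a 4-connected grid is a typical choice and is sketched here under that assumption; the astar function and the 0/1 grid encoding are hypothetical.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (blocked) cells.

    start, goal: (row, col) tuples. Returns a list of cells from start to
    goal inclusive, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > cost[cur]:
            continue  # stale queue entry, a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_g = g + 1
            if new_g < cost.get(nxt, float("inf")):
                cost[nxt] = new_g
                came_from[nxt] = cur
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None
```

A planner like this treats the car as a point that can step to any neighbouring cell, so its paths ignore the non-holonomic constraint. That gap is a large part of what a comparison with the learned policy would expose.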

communication

Paper write-up

Method, experiments and results documented like a short paper — the deliverable that makes the work legible.

Architecture

How a parking policy is learned

From an empty lot to a trained agent, the build follows a clear sequence:

Mesh the lot
Lay a grid over the space and mark which cells are free, blocked, or the goal.
Place the car
Give the agent a start pose and the car's motion constraints.
Explore
Let the agent try manoeuvres across the mesh, collecting rewards and collision penalties along the way.
Shape & learn
Update the policy from the shaped reward so successful approaches get reinforced.
Document
Run the trained agent, measure success rate, and write up the method and findings.
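For the final measurement step, a small harness like the one below could estimate success rate over repeated trials. It assumes a hypothetical run_episode callable that runs one parking attempt with the supplied random generator and returns True on success; nothing here reflects the project's real evaluation code.

```python
import random

def success_rate(run_episode, n_trials=100, seed=0):
    """Fraction of trials in which run_episode(rng) reports a successful park.

    run_episode : callable taking a random.Random and returning a bool
                  (hypothetical interface for this sketch)
    seed        : fixed so that repeated evaluations are comparable
    """
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_trials) if run_episode(rng))
    return successes / n_trials
```

Passing the generator into each episode keeps start poses reproducible, so the learned policy and the baseline planner can be scored on the same set of starts.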

Reflection

What rebuilding it taught me

Representation is the real decision. Choosing the mesh resolution shapes everything downstream — too coarse and you can't park, too fine and learning crawls.
Non-holonomic constraints bite. A car that can't move sideways turns "just go to the slot" into a genuine sequencing puzzle.
Reward shaping is steering. The agent does exactly what the reward rewards; getting parking behaviour means designing the signal carefully.
RL and planning are complementary. A classical planner gives a baseline; RL earns its keep where the dynamics get awkward.
Writing it up clarifies it. Producing the paper-style explanation forced me to actually justify every choice, not just make it work.
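The resolution trade-off in the first point can be made concrete by counting states. The sketch below assumes a state is a cell plus a discretised heading; the 20 m by 10 m lot and the 16 heading bins are made-up figures for illustration.

```python
import math

def state_count(width_m, height_m, cell_m, heading_bins):
    """Number of (row, col, heading_bin) states for a given mesh resolution."""
    cols = math.ceil(width_m / cell_m)
    rows = math.ceil(height_m / cell_m)
    return rows * cols * heading_bins

# Halving the cell size quadruples the number of states to learn over.
for cell_m in (1.0, 0.5, 0.25):
    print(cell_m, state_count(20.0, 10.0, cell_m, 16))
```

On this hypothetical lot, 1 m cells give 3,200 states and 0.25 m cells give 51,200. That quadratic growth is the mechanism behind "too fine and learning crawls".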
