From-Scratch Build · Reinforcement Learning
Parking is a deceptively hard control problem: a car can't slide sideways, the goal pose is tight, and one wrong move means a collision. This build teaches an agent to park by laying a mesh over the lot — discretising space into cells the agent can reason about — and learning a policy on it. It comes with a paper-style write-up, because the explanation is half the work.
What it is
A parking lot is continuous, but an agent learns far more easily over something countable. The core idea here is the mesh: overlay the lot with a grid of cells, and the messy continuous problem — where exactly is the car, where exactly is the space — becomes a tractable one the agent can plan and learn over. The mesh is the bridge between physical geometry and a learnable state space.
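To make the mesh idea concrete, here is a minimal sketch of mapping a continuous position onto a grid cell. The `Mesh` class, its field names, and the lot dimensions are hypothetical choices for illustration, not details taken from the build.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mesh:
    """Uniform grid over a rectangular lot (all sizes in metres, assumed)."""
    width_m: float
    height_m: float
    cell_m: float  # edge length of one square cell

    @property
    def cols(self) -> int:
        return math.ceil(self.width_m / self.cell_m)

    @property
    def rows(self) -> int:
        return math.ceil(self.height_m / self.cell_m)

    def cell_of(self, x: float, y: float) -> tuple[int, int]:
        """Map a continuous (x, y) position to a (row, col) cell index."""
        if not (0 <= x < self.width_m and 0 <= y < self.height_m):
            raise ValueError("position outside the lot")
        return int(y // self.cell_m), int(x // self.cell_m)

    def centre_of(self, row: int, col: int) -> tuple[float, float]:
        """Return the continuous (x, y) centre of a cell."""
        return (col + 0.5) * self.cell_m, (row + 0.5) * self.cell_m


# Hypothetical 20 m x 10 m lot with 0.5 m cells.
mesh = Mesh(width_m=20.0, height_m=10.0, cell_m=0.5)
cell = mesh.cell_of(3.2, 7.9)     # (15, 6)
centre = mesh.centre_of(*cell)    # (3.25, 7.75)
```

Cell size is the main design choice in a sketch like this: smaller cells track the geometry more closely but enlarge the state space the agent has to learn over.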
On top of that representation, an RL agent explores manoeuvres, gets rewarded for edging towards a clean parked pose and penalised for clipping obstacles, and gradually converges on a policy that parks reliably. The accompanying write-up lays out the method and results the way a short paper would.
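One way such a reward could look is sketched below. The function name, the weights, and the terminal bonus and penalty values are all assumptions for illustration; the build's actual reward terms are not specified here.

```python
import math


def wrap_angle(angle: float) -> float:
    """Normalise an angle in radians to the range [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def shaped_reward(prev_dist: float, dist: float, heading_err: float,
                  collided: bool, parked: bool,
                  heading_weight: float = 0.1, step_cost: float = 0.01,
                  collision_penalty: float = 10.0,
                  park_bonus: float = 10.0) -> float:
    """Hypothetical shaped reward for one step.

    prev_dist / dist: distance to the slot before and after the move.
    heading_err: signed difference between car heading and slot heading.
    """
    if collided:
        return -collision_penalty      # clipping an obstacle
    if parked:
        return park_bonus              # reached the goal pose
    progress = prev_dist - dist        # positive when the car moved closer
    misalignment = abs(wrap_angle(heading_err))
    # The small per-step cost is an assumed way to discourage dithering.
    return progress - heading_weight * misalignment - step_cost


closer = shaped_reward(5.0, 4.0, heading_err=0.0, collided=False, parked=False)
farther = shaped_reward(4.0, 5.0, heading_err=0.0, collided=False, parked=False)
```

With these assumed weights, a move that closes the distance scores positive and a move that opens it scores negative, so the agent gets a learning signal well before it ever reaches the slot.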
The stack
Representation first, then the agent, then the explanation.
Discretise the lot into a grid of cells, each carrying occupancy and cost — the foundation everything else stands on.
Model non-holonomic motion: the car can drive and steer but not slide sideways, which is what makes parking hard.
Learn, by trial and error over the mesh, the sequence of moves that lands the car in the goal pose.
Reward progress towards the slot and orientation; penalise collisions and dithering, so good behaviour is learnable.
Hold the learned policy up against a classical grid path-planner to see what RL adds — and costs.
Method, experiments and results documented like a short paper — the deliverable that makes the work legible.
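The non-holonomic constraint in the list above can be illustrated with the standard kinematic bicycle model. This is a sketch under assumptions: the build may use a different motion model, and the `Pose` type, wheelbase, and time step are hypothetical values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    x: float      # metres
    y: float      # metres
    theta: float  # heading in radians, 0 = facing along +x


def step(pose: Pose, speed: float, steer: float,
         wheelbase: float = 2.5, dt: float = 0.1) -> Pose:
    """One Euler step of the kinematic bicycle model (rear-axle reference).

    speed: signed forward speed in m/s (negative means reversing).
    steer: front-wheel steering angle in radians.
    """
    x = pose.x + speed * math.cos(pose.theta) * dt
    y = pose.y + speed * math.sin(pose.theta) * dt
    theta = pose.theta + (speed / wheelbase) * math.tan(steer) * dt
    return Pose(x, y, theta)


start = Pose(0.0, 0.0, 0.0)
# Steering hard while driving forward turns the heading, but the position
# still advances only along the current heading: no sideways displacement.
turned = step(start, speed=1.0, steer=0.5)
# With zero speed, steering alone changes nothing.
parked_still = step(start, speed=0.0, steer=0.5)
```

Position can only change along the current heading, and heading can only change while the car is moving. That coupling is why reaching a slot beside the car takes a sequence of forward and reverse arcs rather than a single sideways move.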
Architecture
From an empty lot to a trained agent, the build follows a clear sequence:
Lay a grid over the space and mark which cells are free, blocked, or the goal.
Give the agent a start pose and the car's motion constraints.
Let the agent try manoeuvres across the mesh, collecting reward and incurring collision penalties along the way.
Update the policy from the shaped reward so successful approaches get reinforced.
Run the trained agent, measure success rate, and write up the method and findings.
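The sequence above can be sketched end to end on a toy lot. Everything in this block is an assumption made for illustration: tabular Q-learning stands in for whatever algorithm the build uses, and the lot layout, four-heading state, action set, reward values, and hyperparameters are hypothetical.

```python
import random

# Hypothetical lot: '#' blocked, '.' free, 'G' goal cell.
LOT = [
    "......",
    ".##.#.",
    "....#.",
    ".#....",
    "...#G.",
]
ROWS, COLS = len(LOT), len(LOT[0])
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # N, E, S, W as (d_row, d_col)
GOAL = (4, 4, 2)                               # goal cell, facing south
STARTS = [(0, 0, 1), (0, 5, 2), (2, 0, 0)]     # assumed start poses
# Each action: (move along heading, heading change after the move).
# forward, reverse, forward then turn left, forward then turn right.
ACTIONS = [(1, 0), (-1, 0), (1, -1), (1, 1)]


def blocked(r, c):
    return not (0 <= r < ROWS and 0 <= c < COLS) or LOT[r][c] == "#"


def env_step(state, action):
    """Return (next_state, reward, done) for one move on the mesh."""
    r, c, h = state
    move, turn = ACTIONS[action]
    dr, dc = HEADINGS[h]
    nr, nc, nh = r + move * dr, c + move * dc, (h + turn) % 4
    if blocked(nr, nc):
        return state, -10.0, True            # collision ends the episode
    if (nr, nc, nh) == GOAL:
        return (nr, nc, nh), 10.0, True      # parked in the goal pose
    # Shaped reward: Manhattan progress towards the slot minus a step cost.
    before = abs(r - GOAL[0]) + abs(c - GOAL[1])
    after = abs(nr - GOAL[0]) + abs(nc - GOAL[1])
    return (nr, nc, nh), 0.1 * (before - after) - 0.05, False


def greedy(q, state):
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))


def train(episodes=5000, alpha=0.2, gamma=0.95, epsilon=0.2,
          max_steps=60, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = rng.choice(STARTS)
        for _ in range(max_steps):
            explore = rng.random() < epsilon
            a = rng.randrange(len(ACTIONS)) if explore else greedy(q, state)
            nxt, reward, done = env_step(state, a)
            best_next = 0.0 if done else max(
                q.get((nxt, b), 0.0) for b in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def success_rate(q, starts=STARTS, max_steps=60):
    """Fraction of start poses from which the greedy policy parks."""
    wins = 0
    for state in starts:
        for _ in range(max_steps):
            state, reward, done = env_step(state, greedy(q, state))
            if done:
                wins += 1 if reward > 0 else 0
                break
    return wins / len(starts)


if __name__ == "__main__":
    q_table = train()
    print(f"success rate: {success_rate(q_table):.2f}")
```

The heading is part of the state because the goal is a pose, not just a cell: the car only counts as parked when it sits in the goal cell facing the assumed goal direction. The same `env_step` function could also serve a classical grid planner, which would keep a comparison against the learned policy on equal footing.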
Reflection