Smart Recycling Sorting Station — Built From Scratch

What it is

A kiosk that sorts your recycling for you

Most people want to recycle correctly and still get it wrong — the rules differ by item and by city. This station closes that gap. A camera captures whatever you present, an AI classifier decides what category of waste it is, and a display walks you through the rest: which coloured bin to use, whether you got it right, and a small reward when you do.

I built it to learn the full shape of a real-time vision application — not just "run a model on an image", but the whole choreography of camera, inference, hardware signals and an interface that guides a person through a multi-step interaction.

The core idea I wanted to learn: a vision product is mostly state management. The model call is one line; the hard part is the loop that captures, classifies, shows the right screen, waits for the person to act, and resets cleanly for the next user.

The stack

Tools under the hood

The point of this rebuild was the toolchain. Here is what each piece actually does in the system.

capture

Camera module

Grabs frames of whatever the user holds up, feeding the classifier a clean image at the right moment.

vision

YOLO classifier

A fast object-detection model that proposes candidate labels for the item; the station acts on the highest-confidence prediction.
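Picking the highest-confidence prediction can be sketched in a few lines. This is only an illustration: the `Detection` type and the labels are hypothetical stand-ins, not the output format of any particular YOLO library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Detection:
    """One candidate from the detector (hypothetical shape, not a library type)."""
    label: str
    confidence: float  # expected range: 0.0 to 1.0


def top_prediction(detections: Sequence[Detection]) -> Optional[Detection]:
    """Return the highest-confidence detection, or None if nothing was found."""
    if not detections:
        return None
    return max(detections, key=lambda d: d.confidence)


# Made-up candidates for a single frame.
candidates = [
    Detection("plastic_bottle", 0.91),
    Detection("glass_bottle", 0.34),
]
best = top_prediction(candidates)
```

Returning `None` for an empty list keeps "nothing in view" distinct from "something in view, low confidence", which matters when deciding whether to ask the second model.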

reasoning

Vision language model

A second opinion for ambiguous items — a vision-capable LLM that classifies trash the detector isn't sure about.

flow

Hardware loop

The heartbeat of the kiosk: it watches sensors, triggers capture, runs classification, and drives the interaction phases.
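One pass of such a loop might look like the sketch below. The function name, the screen names and the injected callables are all assumptions made for illustration; the verify and reward phases are left out to keep it short.

```python
from typing import Callable


def run_cycle(
    item_present: Callable[[], bool],
    capture: Callable[[], bytes],
    classify: Callable[[bytes], str],
    show: Callable[[str], None],
) -> bool:
    """Run one pass of the loop. Returns True if an item was handled."""
    if not item_present():
        show("show_trash")  # nothing in view: stay on the idle screen
        return False
    show("processing")
    frame = capture()
    category = classify(frame)
    show(f"bin:{category}")  # guidance screen for the detected category
    return True


# Stand-ins for real hardware so the sketch runs anywhere.
screens: list[str] = []
handled = run_cycle(
    item_present=lambda: True,
    capture=lambda: b"fake-frame",
    classify=lambda frame: "paper",
    show=screens.append,
)
```

Passing the sensor, camera, classifier and display in as plain callables is one way to keep the loop as the only place that knows the order of operations.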

display

Media display

Shows the guidance screens: "show trash", "processing", "throw in the blue/brown/yellow bin", "great job" and "try again".
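Choosing a screen can be reduced to a small lookup. In this sketch the phase names, the screen identifiers and above all the category-to-colour mapping are assumptions; which colour takes which material depends on local rules.

```python
from typing import Optional

# Assumed mapping from waste category to bin colour; real rules vary by city.
BIN_COLOUR = {"paper": "blue", "organic": "brown", "plastic": "yellow"}


def screen_for(phase: str, category: Optional[str] = None,
               correct: Optional[bool] = None) -> str:
    """Pick the screen to show for a phase (phase names are hypothetical)."""
    if phase == "idle":
        return "show_trash"
    if phase == "classifying":
        return "processing"
    if phase == "guiding":
        if category not in BIN_COLOUR:
            raise ValueError(f"no bin known for category {category!r}")
        return f"throw_in_{BIN_COLOUR[category]}_bin"
    if phase == "verified":
        return "great_job" if correct else "try_again"
    raise ValueError(f"unknown phase {phase!r}")
```

For example, `screen_for("guiding", "paper")` returns `"throw_in_blue_bin"` under the assumed mapping.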

api

State server

A small API exposes the current system phase and state, so the display and any external dashboard stay in sync.
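A state endpoint of this kind can be sketched with only the Python standard library. The `/state` path, the field names and the port are assumptions for illustration, not the project's actual API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

_lock = threading.Lock()
_state = {"phase": "idle", "category": None}  # assumed fields


def set_state(phase, category=None):
    """Called by the hardware loop whenever the phase changes."""
    with _lock:
        _state["phase"] = phase
        _state["category"] = category


def snapshot() -> dict:
    """Return a copy so callers cannot mutate the shared state."""
    with _lock:
        return dict(_state)


class StateHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/state":
            self.send_error(404)
            return
        body = json.dumps(snapshot()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Block and serve the state endpoint; run this in its own thread."""
    HTTPServer((host, port), StateHandler).serve_forever()
```

With `serve()` running in a background thread, a display or dashboard could poll `GET /state` and would receive something like `{"phase": "guiding", "category": "paper"}`.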

The interaction

One trip through the kiosk

Each user moves through a clear state machine, from presenting an item to being rewarded.

Show trash
The idle screen invites the user to hold an item up to the camera.
Capture
The camera grabs a frame once an item is detected in view.
Classify
The YOLO model, backed by the vision LLM for hard cases, decides the waste category.
Guide
The display tells the user which coloured bin the item belongs in.
Verify
The system checks whether the item went into the right bin and shows "great job" or "try again".
Reward & reset
A reward is granted for correct sorting, then the kiosk returns to idle for the next person.
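The six steps above map naturally onto a transition table. The sketch below is one possible encoding: the event names are invented for illustration, and sending a wrong attempt back to the guidance phase is an assumption, since the steps do not say what follows "try again".

```python
from enum import Enum


class Phase(Enum):
    SHOW_TRASH = "show_trash"
    CAPTURE = "capture"
    CLASSIFY = "classify"
    GUIDE = "guide"
    VERIFY = "verify"
    REWARD = "reward"


# (current phase, event) -> next phase. Event names are assumptions.
TRANSITIONS = {
    (Phase.SHOW_TRASH, "item_detected"): Phase.CAPTURE,
    (Phase.CAPTURE, "frame_ready"): Phase.CLASSIFY,
    (Phase.CLASSIFY, "category_known"): Phase.GUIDE,
    (Phase.GUIDE, "item_dropped"): Phase.VERIFY,
    (Phase.VERIFY, "correct"): Phase.REWARD,
    # Assumption: a wrong bin sends the user back to the guidance screen.
    (Phase.VERIFY, "incorrect"): Phase.GUIDE,
    (Phase.REWARD, "done"): Phase.SHOW_TRASH,
}


def step(phase: Phase, event: str) -> Phase:
    """Return the next phase; unknown events leave the phase unchanged."""
    return TRANSITIONS.get((phase, event), phase)
```

Keeping every legal move in one dictionary makes the reset path explicit: the only way back to `SHOW_TRASH` is through `REWARD`, so a half-finished interaction cannot silently skip a phase.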

Two models, one decision

Why pair a detector with a vision LLM

Using two models instead of one was the most interesting design choice, and it solves a real reliability problem:

Speed first: the YOLO detector is fast and cheap, so it handles the common, clear-cut items instantly.
Reasoning as backup: when confidence is low or the item is unusual, the vision LLM steps in to reason about what it's actually looking at.
Resilient calls: network requests retry with backoff on throttling and server errors, so a flaky connection doesn't strand a user mid-interaction.
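The three points above combine into a short routine: trust the detector when it is confident, otherwise ask the vision LLM, and wrap that remote call in retries. This is a sketch under assumptions: the 0.6 threshold, the `RemoteError` type, the set of retryable status codes and the added jitter are all illustrative choices, and `detector` and `vlm` are placeholders for the real model calls.

```python
import random
import time
from typing import Callable, Tuple

CONFIDENCE_THRESHOLD = 0.6  # assumed cut-off; tune on real data
RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # throttling and server errors


class RemoteError(Exception):
    """Hypothetical error carrying an HTTP-style status code."""

    def __init__(self, status: int):
        super().__init__(f"remote call failed with status {status}")
        self.status = status


def with_backoff(call: Callable[[], str], attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry `call` on retryable errors, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except RemoteError as err:
            last_try = attempt == attempts - 1
            if err.status not in RETRYABLE_STATUS or last_try:
                raise
            delay = base_delay * (2 ** attempt)
            # Random jitter so many kiosks do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


def classify(frame: bytes,
             detector: Callable[[bytes], Tuple[str, float]],
             vlm: Callable[[bytes], str]) -> str:
    """Fast path first; fall back to the vision LLM on low confidence."""
    label, confidence = detector(frame)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label
    return with_backoff(lambda: vlm(frame))
```

Client errors such as a 400 are re-raised immediately, since retrying a malformed request would only delay the user further.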

In my rebuild the hardware loop is the centre of gravity — every other module is something it calls at the right phase, which kept the system easy to reason about.

Reflection

What rebuilding it taught me

The model is the easy part. Capture timing, phase transitions and clean resets are where a real-time vision kiosk actually lives.
A cheap model and a smart model beat either alone. A fast detector plus a reasoning fallback gives both speed and accuracy without paying for the slow path on every item.
Rewards change behaviour. Closing the loop with positive feedback is what turns a classifier into something people actually want to use.
State must be shared. A small API exposing the current phase kept the display, the hardware and any outside observer in step.
