
From-Scratch Build · Applied Computer Vision

Smart Recycling Sorting Station

Hold a piece of trash up to a camera and the station tells you which bin it belongs in — then rewards you for getting it right. I built this from scratch to learn how a camera, an AI vision classifier and a guided on-screen flow come together into a working kiosk.

Computer vision · YOLO + vision LLM · Camera capture · State machine · Reward loop

What it is

A kiosk that sorts your recycling for you

Most people want to recycle correctly and still get it wrong — the rules differ by item and by city. This station closes that gap. A camera captures whatever you present, an AI classifier decides what category of waste it is, and a display walks you through the rest: which coloured bin to use, whether you got it right, and a small reward when you do.

I built it to learn the full shape of a real-time vision application — not just "run a model on an image", but the whole choreography of camera, inference, hardware signals and an interface that guides a person through a multi-step interaction.

The core idea I wanted to learn: a vision product is mostly state management. The model call is one line; the hard part is the loop that captures, classifies, shows the right screen, waits for the person to act, and resets cleanly for the next user.
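As a rough illustration of that point, here is a minimal sketch of one pass through the loop. The screen names follow the guidance screens this page lists, but `run_session`, its four injected callables, and the idea that disposal is reported as a bin colour are all assumptions made for the example, not the kiosk's actual code.

```python
from typing import Callable


def run_session(
    capture: Callable[[], bytes],
    classify: Callable[[bytes], str],
    show: Callable[[str], None],
    wait_for_disposal: Callable[[], str],
) -> bool:
    """Run one user interaction; return True if the item went in the right bin.

    All four callables are hypothetical stand-ins for the camera, the
    classifier, the display and whatever senses which bin was used.
    """
    show("show_trash")                 # idle screen inviting the user
    frame = capture()                  # grab a frame of the presented item
    show("processing")
    target_bin = classify(frame)       # the "one line" model call
    show(f"throw_in_{target_bin}")     # e.g. throw_in_blue
    chosen_bin = wait_for_disposal()   # block until the person acts
    correct = chosen_bin == target_bin
    show("great_job" if correct else "try_again")
    show("show_trash")                 # reset cleanly for the next user
    return correct
```

Everything except the `classify(frame)` line is sequencing, which is the sense in which the product is mostly state management.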

The stack

Tools under the hood

The point of this rebuild was the toolchain. Here is what each piece actually does in the system.

capture

Camera module

Grabs frames of whatever the user holds up, feeding the classifier a clean image at the right moment.

vision

YOLO classifier

A fast object-detection model that proposes what the item is, returning the highest-confidence prediction.

reasoning

Vision language model

A second opinion for ambiguous items — a vision-capable LLM that classifies trash the detector isn't sure about.

flow

Hardware loop

The heartbeat of the kiosk: it watches sensors, triggers capture, runs classification, and drives the interaction phases.

display

Media display

Shows the guidance screens — show trash, processing, throw in the blue/brown/yellow bin, great job, try again.

api

State server

A small API exposes the current system phase and state, so the display and any external dashboard stay in sync.
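The page does not say which framework the state server uses, so the sketch below swaps in Python's standard-library `http.server` purely for illustration. The `/state` path, the `phase` and `category` field names, and the default port are assumptions. The idea it shows is a lock-protected state object that the hardware loop updates and that the display or a dashboard can poll as JSON.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KioskState:
    """Thread-safe holder for the current phase (field names are hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {"phase": "show_trash", "category": None}

    def update(self, **fields):
        with self._lock:
            self._data.update(fields)

    def snapshot(self):
        with self._lock:
            return dict(self._data)  # copy, so callers cannot mutate shared state


def make_handler(state):
    """Build a request handler class bound to one KioskState instance."""

    class StateHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/state":  # assumed endpoint path
                self.send_error(404)
                return
            body = json.dumps(state.snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the kiosk console quiet

    return StateHandler


def serve(state, host="127.0.0.1", port=8000):
    """Start the server on a background thread and return it."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In this arrangement the hardware loop would call something like `state.update(phase="processing")` at each transition, and clients would poll `GET /state` rather than being pushed updates.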

The interaction

One trip through the kiosk

Each user moves through a clear state machine, from presenting an item to being rewarded.

  1. Show trash (live)

    The idle screen invites the user to hold an item up to the camera.

  2. Capture (live)

    The camera grabs a frame once an item is detected in view.

  3. Classify (live)

    The YOLO model, backed by the vision LLM for hard cases, decides the waste category.

  4. Guide (live)

    The display tells the user which coloured bin the item belongs in.

  5. Verify (live)

    The system confirms correct disposal and shows "great job" or "try again".

  6. Reward & reset (live)

    A reward is granted for correct sorting, then the kiosk returns to idle for the next person.
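The six steps above can be written down as an explicit transition table. This is a sketch under stated assumptions: the phase names mirror the steps, but the event names are invented for the example, and sending an incorrect disposal straight back to idle is a guess, since the page only says the user sees "try again".

```python
from enum import Enum, auto


class Phase(Enum):
    SHOW_TRASH = auto()   # idle, inviting the user
    CAPTURE = auto()
    CLASSIFY = auto()
    GUIDE = auto()
    VERIFY = auto()
    REWARD = auto()


# (current phase, event) -> next phase. Event names are hypothetical.
TRANSITIONS = {
    (Phase.SHOW_TRASH, "item_detected"): Phase.CAPTURE,
    (Phase.CAPTURE, "frame_ready"): Phase.CLASSIFY,
    (Phase.CLASSIFY, "classified"): Phase.GUIDE,
    (Phase.GUIDE, "item_dropped"): Phase.VERIFY,
    (Phase.VERIFY, "correct"): Phase.REWARD,
    (Phase.VERIFY, "incorrect"): Phase.SHOW_TRASH,  # assumption: no reward, back to idle
    (Phase.REWARD, "reward_done"): Phase.SHOW_TRASH,
}


def next_phase(current: Phase, event: str) -> Phase:
    """Return the next phase, ignoring events that are not valid in this phase."""
    return TRANSITIONS.get((current, event), current)
```

Keeping the transitions in one table means an out-of-order event, such as a stray sensor trigger during `GUIDE`, leaves the kiosk where it is instead of jumping to an unexpected screen.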

Two models, one decision

Why pair a detector with a vision LLM

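Using two models instead of one was the most interesting design choice, and it solves a real reliability problem: the fast YOLO detector handles the items it recognises confidently, while the vision LLM gives a second opinion on the ambiguous ones the detector isn't sure about.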

In my rebuild the hardware loop is the centre of gravity — every other module is something it calls at the right phase, which kept the system easy to reason about.
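One plausible way to wire the detector and the vision LLM together is a confidence threshold: trust the detector's top prediction when it is confident enough, and otherwise ask the LLM. The sketch below assumes exactly that rule. The `Prediction` type, the `detect` and `ask_vision_llm` callables, and the `0.6` cut-off are hypothetical placeholders, not values taken from the build.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Prediction:
    label: str
    confidence: float  # expected in the range 0.0 to 1.0


def decide_category(
    frame: bytes,
    detect: Callable[[bytes], Sequence[Prediction]],
    ask_vision_llm: Callable[[bytes], str],
    min_confidence: float = 0.6,  # assumed threshold, to be tuned on real data
) -> Tuple[str, str]:
    """Return (category, source), where source names the model that decided."""
    predictions = detect(frame)
    # Keep only the detector's highest-confidence prediction, if there is one.
    best = max(predictions, key=lambda p: p.confidence, default=None)
    if best is not None and best.confidence >= min_confidence:
        return best.label, "detector"
    # No detections, or none confident enough: fall back to the second opinion.
    return ask_vision_llm(frame), "vision_llm"
```

Returning the source alongside the category makes it easy to log how often the slower fallback path is taken, which is useful when choosing the threshold.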

Reflection

What rebuilding it taught me