From-Scratch Build · Human–Computer Interaction
A kitchen counter that becomes an interactive recipe. Instead of squinting at a phone with messy hands, you cook while the next step is projected directly onto the workspace — and a vision model watches the counter to keep up with you. I built this from scratch to learn how tracking, projection and vision language models combine into a tangible interface.
What it is
This is a tangible user interface for cooking. A tracking system knows where your tools and ingredients are, an agent-based program holds the recipe state, and a projector draws instructions, timers and highlights straight onto the same surface you're working on. A vision language model interprets what the camera sees in the kitchen, so the interface can react to the real scene rather than a fixed script.
The premise that drew me in: text recipes force you to context-switch constantly — read, look away, cook, look back. Projecting the guidance into the workspace removes that gap. I wanted to build the plumbing that makes the counter itself the screen.
The core idea I wanted to learn: ambient interfaces win by being where your attention already is. The technical challenge is making the projected layer agree with the physical layer — the recipe step has to land next to the bowl it's talking about.
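One way to make "land next to the bowl" concrete is a minimal sketch that assumes a flat counter. Under that assumption, tracker coordinates map to projector pixels through a single 3×3 planar homography found during a calibration step. The matrix values, the 40-pixel offset and the function names are hypothetical. A real setup would estimate the matrix from a few known point pairs rather than write it by hand.

```python
# Hypothetical calibration result: a 3x3 planar homography mapping
# counter coordinates (e.g. millimetres from the tracker) to projector pixels.
Homography = list[list[float]]


def apply_homography(h: Homography, x: float, y: float) -> tuple[float, float]:
    """Map one counter point into projector pixels (including the homogeneous divide)."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def label_anchor(
    h: Homography,
    obj_x: float,
    obj_y: float,
    offset_px: tuple[float, float] = (40.0, 0.0),
) -> tuple[float, float]:
    """Pixel position for a step label: beside the tracked object, not on top of it."""
    px, py = apply_homography(h, obj_x, obj_y)
    return (px + offset_px[0], py + offset_px[1])


if __name__ == "__main__":
    # Toy calibration: scale by 2 and shift by (100, 50); no perspective term.
    H = [[2.0, 0.0, 100.0], [0.0, 2.0, 50.0], [0.0, 0.0, 1.0]]
    print(label_anchor(H, 300.0, 200.0))  # (740.0, 450.0)
```

Offsetting in projector pixels rather than counter units keeps the label a fixed visual distance from the object. Whether that suits a given setup depends on how large the projected text needs to be.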
The stack
The point of this rebuild was the toolchain. Here is what each piece actually does in the system.
Locates tangible objects on the counter — tools, containers, markers — and reports their positions to the rest of the system.
Carries tracking data between components and forwards it over UDP to the projection engine.
Holds the recipe state machine — which step you're on — and decides what the projector should show next.
Interprets the camera view of the kitchen so the interface can read and reason about the real cooking scene.
Paints the recipe steps, timers and highlights onto the cooking surface, aligned with the physical workspace.
Spoken and tonal feedback complements the projection, so you get a nudge even when you're not looking down.
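The middle layer's job of carrying tracking data and forwarding it over UDP to the projection engine can be sketched as follows. The JSON datagram schema, the `TrackedObject` fields and the helper names are assumptions for illustration, not the original system's wire format. UDP fits this role because a late position update is already stale, so dropping it is better than queueing it.

```python
import json
import socket
from dataclasses import asdict, dataclass


@dataclass
class TrackedObject:
    """One tangible object reported by the tracker (hypothetical schema)."""
    object_id: str  # e.g. "bowl" or a marker ID
    x: float        # position on the counter plane
    y: float
    angle: float    # rotation in degrees


def encode_frame(frame_no: int, objects: list[TrackedObject]) -> bytes:
    """Serialise one tracking frame as a compact JSON datagram."""
    payload = {"frame": frame_no, "objects": [asdict(o) for o in objects]}
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def decode_frame(data: bytes) -> tuple[int, list[TrackedObject]]:
    """Inverse of encode_frame, as the projection engine would run it."""
    payload = json.loads(data.decode("utf-8"))
    return payload["frame"], [TrackedObject(**o) for o in payload["objects"]]


def forward(sock: socket.socket, addr: tuple[str, int],
            frame_no: int, objects: list[TrackedObject]) -> None:
    """Fire-and-forget send: no acknowledgement, no retry for lost frames."""
    sock.sendto(encode_frame(frame_no, objects), addr)


if __name__ == "__main__":
    frame = encode_frame(1, [TrackedObject("bowl", 312.0, 148.5, 90.0)])
    print(frame)
    # b'{"frame":1,"objects":[{"object_id":"bowl","x":312.0,"y":148.5,"angle":90.0}]}'
```

In use, the tracker side would open a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` once and call `forward` every frame. The frame number lets the receiver ignore datagrams that arrive out of order.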
Pipeline
Each moment of cooking flows through the same sense-interpret-project loop.
The tracking system reports where objects are on the counter.
The vision model interprets the camera view to understand the current scene.
The agent program advances the recipe state and picks the next instruction.
The relevant step, timer or highlight is projected onto the workspace.
Audio feedback reinforces the visual cue for hands-busy moments.
As you finish a step, the loop repeats with the next one.
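The loop above can be sketched as a single `tick` function over a small step-based state machine. This is a minimal sketch under stated assumptions. The vision model is stubbed as a set of scene labels (such as `"eggs_in_bowl"`), and object positions arrive from the tracker as a dict. `Step`, `RecipeAgent`, `tick` and the label names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Position = tuple[float, float]


@dataclass
class Step:
    instruction: str
    target: str                            # tracked object the text should sit beside
    done_when: Callable[[set[str]], bool]  # hypothetical check on vision-model labels


@dataclass
class RecipeAgent:
    steps: list[Step]
    index: int = 0

    def current(self) -> Optional[Step]:
        return self.steps[self.index] if self.index < len(self.steps) else None

    def update(self, scene: set[str]) -> bool:
        """Advance at most one step per tick if the scene shows it is done."""
        step = self.current()
        if step is not None and step.done_when(scene):
            self.index += 1
            return True
        return False


def tick(agent: RecipeAgent, positions: dict[str, Position], scene: set[str],
         fallback: Position = (0.0, 0.0)) -> list[tuple]:
    """One sense -> interpret -> project pass; returns projector and audio commands."""
    advanced = agent.update(scene)                # agent advances the recipe state
    step = agent.current()
    if step is None:
        return [("audio", "Recipe complete")]
    where = positions.get(step.target, fallback)  # beside the object if it is tracked
    commands: list[tuple] = [("project", step.instruction, where)]
    if advanced:                                  # audio cue only when the step changes
        commands.append(("audio", step.instruction))
    return commands


if __name__ == "__main__":
    agent = RecipeAgent([
        Step("Crack 2 eggs into the bowl", "bowl", lambda s: "eggs_in_bowl" in s),
        Step("Whisk for 30 seconds", "whisk", lambda s: "whisked" in s),
    ])
    positions = {"bowl": (312.0, 148.5), "whisk": (420.0, 90.0)}
    print(tick(agent, positions, set()))
    # [('project', 'Crack 2 eggs into the bowl', (312.0, 148.5))]
    print(tick(agent, positions, {"eggs_in_bowl"}))
    # [('project', 'Whisk for 30 seconds', (420.0, 90.0)), ('audio', 'Whisk for 30 seconds')]
    print(tick(agent, positions, {"whisked"}))
    # [('audio', 'Recipe complete')]
```

Advancing at most one step per tick keeps a noisy scene label from skipping several steps at once. The trade-off is that finishing two steps in quick succession costs one extra frame per step.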
Why it works
The interesting claim behind this interface is that moving the recipe off a screen and onto the counter measurably helps people cook:
In my rebuild I concentrated on the tracking-to-projection loop, then layered the vision model on top so the interface could respond to the real scene rather than a fixed sequence.
Reflection