
From-Scratch Build · Human–Computer Interaction

Smart Cooking Projection Interface

A kitchen counter that becomes an interactive recipe. Instead of squinting at a phone with messy hands, you cook while the next step is projected directly onto the workspace — and a vision model watches the counter to keep up with you. I built this from scratch to learn how tracking, projection and vision language models combine into a tangible interface.

Motion tracking · Projection · Vision LLM · Agent simulation · Audio feedback

What it is

The recipe lives on the counter

This is a tangible user interface for cooking. A tracking system knows where your tools and ingredients are, an agent-based program holds the recipe state, and a projector draws instructions, timers and highlights straight onto the same surface you're working on. A vision language model interprets what the camera sees in the kitchen, so the interface can react to the real scene rather than a fixed script.

The premise that drew me in: text recipes force you to context-switch constantly — read, look away, cook, look back. Projecting the guidance into the workspace removes that gap. I wanted to build the plumbing that makes the counter itself the screen.

The core idea I wanted to learn: ambient interfaces win by being where your attention already is. The technical challenge is making the projected layer agree with the physical layer — the recipe step has to land next to the bowl it's talking about.

The stack

Tools under the hood

The point of this rebuild was the toolchain. Here is what each piece actually does in the system.

tracking

Motion capture

Locates tangible objects on the counter — tools, containers, markers — and reports their positions to the rest of the system.

middleware

ROS bridge

Carries tracking data between components and forwards it over UDP to the projection engine.
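To make the UDP hop concrete, here is a minimal sketch of how one tracked pose could be packed into a datagram and sent to the projection engine. The ROS side is left out entirely. The JSON message shape, the field names and the helper names (`encode_pose`, `decode_pose`, `send_pose`) are assumptions for illustration, not the format the build actually uses.

```python
import json
import socket


def encode_pose(name, x, y, theta):
    """Pack one tracked object's pose into a small JSON datagram (assumed format)."""
    return json.dumps({"id": name, "x": x, "y": y, "theta": theta}).encode("utf-8")


def decode_pose(datagram):
    """Inverse of encode_pose, as the projection engine might run it."""
    return json.loads(datagram.decode("utf-8"))


def send_pose(sock, addr, name, x, y, theta):
    """Fire-and-forget send: UDP gives no delivery guarantee, so every message
    carries the full pose and a lost packet is simply replaced by the next one."""
    sock.sendto(encode_pose(name, x, y, theta), addr)
```

A sender would create its socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `send_pose` once per tracking update. Sending complete poses rather than deltas is what makes dropped packets harmless here.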

logic

Agent-based program

Holds the recipe state machine — which step you're on — and decides what the projector should show next.
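A recipe state machine of this kind can be sketched in a few lines. The class names, the fields and the `("step_done", index)` event shape below are hypothetical, chosen only to show the idea of one current step that advances on a confirming event.

```python
from dataclasses import dataclass


@dataclass
class Step:
    instruction: str   # text the projector should show
    target: str        # tracked object the text should sit next to
    timer_s: int = 0   # optional countdown, 0 means no timer


@dataclass
class RecipeAgent:
    steps: list
    index: int = 0

    @property
    def done(self):
        return self.index >= len(self.steps)

    def current(self):
        """The step to project right now, or None once the recipe is finished."""
        return None if self.done else self.steps[self.index]

    def on_event(self, event):
        """Advance only when the event confirms the step we are currently on;
        stale or out-of-order events leave the state untouched."""
        if not self.done and event == ("step_done", self.index):
            self.index += 1
        return self.current()
```

Tying each event to a step index keeps a late or duplicated detection from skipping a step.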

perception

Vision language model

Interprets the camera view of the kitchen so the interface can read and reason about the real cooking scene.
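One way to turn a vision model's free-text answer into something the agent can reason about is to ask for JSON and parse defensively. The sketch below assumes a hypothetical `ask_model(prompt, image_bytes)` callable that returns the model's reply as a string; the prompt wording and the `{"items": [...]}` schema are also assumptions, and no specific model API is implied.

```python
import json

# Assumed prompt and reply schema, for illustration only.
PROMPT = (
    "List the cooking items visible on the counter. Reply with JSON only: "
    '{"items": [{"name": "...", "state": "..."}]}'
)


def parse_scene(reply):
    """Turn the model's text reply into {name: state}.
    Returns an empty dict if no usable JSON object is found."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {}
    items = data.get("items", []) if isinstance(data, dict) else []
    if not isinstance(items, list):
        return {}
    return {
        item["name"]: item.get("state", "unknown")
        for item in items
        if isinstance(item, dict) and "name" in item
    }


def observe_scene(image_bytes, ask_model):
    """Query the injected (hypothetical) model callable and parse its reply."""
    return parse_scene(ask_model(PROMPT, image_bytes))
```

Returning an empty scene on a malformed reply lets the rest of the loop keep running on tracking data alone.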

display

Projector

Paints the recipe steps, timers and highlights onto the cooking surface, aligned with the physical workspace.
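Aligning projected content with the physical workspace usually comes down to mapping counter coordinates to projector pixels. A common way to do that for a flat surface is a 3x3 homography; whether this build uses one is an assumption on my part. The sketch below applies an already calibrated matrix `H` (calibration itself is not shown), and the function names and the 12 cm label offset are hypothetical.

```python
def apply_homography(H, x, y):
    """Map a counter point (x, y) to projector pixels using a 3x3 matrix H
    given as nested lists. Includes the perspective divide."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


def label_anchor(H, obj_xy, offset_m=(0.12, 0.0)):
    """Projector pixel for a label placed beside a tracked object.
    The offset is applied in counter coordinates (metres) so the gap between
    label and object stays physically constant across the surface."""
    return apply_homography(H, obj_xy[0] + offset_m[0], obj_xy[1] + offset_m[1])
```

Applying the offset before the transform, rather than in pixels afterwards, is what keeps the step text landing next to the bowl even where the projection is stretched.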

feedback

Audio cues

Spoken and tonal feedback complements the projection, so you get a nudge even when you're not looking down.
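For the tonal half of the feedback, a short beep can be synthesized with the standard library alone. This is a sketch under assumed parameters (an 880 Hz sine, 16-bit mono); the function name `tone_wav` is hypothetical, and playback and speech are out of scope here.

```python
import io
import math
import struct
import wave


def tone_wav(freq_hz=880.0, duration_s=0.2, rate=44100, volume=0.5):
    """Return the bytes of a mono 16-bit WAV file containing a sine tone."""
    n_frames = int(rate * duration_s)
    frames = b"".join(
        struct.pack("<h", int(volume * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 16-bit samples
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()
```

Different pitches could then distinguish events such as "step complete" from "timer finished" without the cook having to look down.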

Pipeline

From the counter to a projected step

Each moment of cooking flows through the same sense-interpret-project loop.

  1. Track live

    The tracking system reports where objects are on the counter.

  2. See live

    The vision model interprets the camera view to understand the current scene.

  3. Decide live

    The agent program advances the recipe state and picks the next instruction.

  4. Project live

    The relevant step, timer or highlight is projected onto the workspace.

  5. Speak live

    Audio feedback reinforces the visual cue for hands-busy moments.

  6. Advance live

    As you finish a step, the loop repeats with the next one.
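The six steps above can be sketched as a single loop with each stage passed in as a function. All five callables (`track`, `see`, `decide`, `project`, `speak`) and their signatures are hypothetical stand-ins for the real components, and the `max_ticks` guard is my own addition.

```python
def run_step(track, see, decide, project, speak):
    """One pass through the loop. Returns False once the recipe is finished."""
    positions = track()                       # 1. where objects are on the counter
    scene = see()                             # 2. what the vision model reports
    instruction = decide(positions, scene)    # 3. next instruction, or None when done
    if instruction is None:
        return False
    project(instruction, positions)           # 4. draw it next to the right object
    speak(instruction)                        # 5. reinforce it with audio
    return True


def run_recipe(track, see, decide, project, speak, max_ticks=1000):
    """6. Repeat until decide() signals the end. Returns the number of passes
    that produced an instruction."""
    ticks = 0
    while ticks < max_ticks and run_step(track, see, decide, project, speak):
        ticks += 1
    return ticks
```

Injecting the stages as plain functions makes it easy to run the loop against recorded tracking data or a stubbed vision model before any hardware is connected.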

Why it works

What projecting the recipe actually changes

The interesting claim behind this interface is that moving the recipe off a screen and onto the counter measurably helps people cook.

In my rebuild I concentrated on the tracking-to-projection loop, then layered the vision model on top so the interface could respond to the real scene rather than a fixed sequence.

Reflection

What rebuilding it taught me