Gesture-Driven DJ Visual Interface

What it is

The DJ becomes the performance

This is a tangible interface for DJing built around one ambition: make the performer visible. A motion-capture system tracks marker gloves on the DJ's hands; a real-time audio analyser listens to the music; and a WebGL fluid simulation, projected onto the DJ's workspace, responds to both. The result is that gestures and sound continuously reshape a living visual that the audience sees.

It runs in four interaction modes — knobs change the music, music changes the visuals, gestures change the visuals, and two-handed gestures draw EQ curves back onto the music. I built it to understand how to fuse two real-time input streams, motion and audio, into a single coherent output.

The core idea I wanted to learn: expressive interfaces are about mapping, not sensing. Tracking a hand is easy; deciding how a hand's motion should bend a fluid simulation — and when the audio should override it — is the whole craft.

The stack

Tools under the hood

The point of this rebuild was the toolchain. Here is what each piece actually does in the system.

sensing

Motion capture

Tracks marker gloves on the DJ's hands, reporting position and orientation so gestures become data.

middleware

ROS bridge

Processes the capture stream and forwards hand coordinates to the visual engine over UDP.
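As a rough sketch of the receiving end of that UDP link: the source does not describe the packet layout, so the format below (one byte for the hand id, then x, y, z as little-endian 32-bit floats) is an assumption made purely for illustration, as are the names encodeHandPacket and decodeHandPacket. A regular web page cannot open a raw UDP socket, so a setup like this would likely need a small relay in front of the browser; that relay is also an assumption.

```javascript
// Hypothetical packet layout (not taken from the original project):
//   byte 0      : hand id (0 = left, 1 = right)
//   bytes 1..12 : x, y, z as little-endian float32
const PACKET_BYTES = 13;

// Pack one hand sample into a datagram payload (the sending side).
function encodeHandPacket(hand, x, y, z) {
  const view = new DataView(new ArrayBuffer(PACKET_BYTES));
  view.setUint8(0, hand);
  view.setFloat32(1, x, true);
  view.setFloat32(5, y, true);
  view.setFloat32(9, z, true);
  return new Uint8Array(view.buffer);
}

// Unpack a datagram payload; returns null for packets of the wrong size.
function decodeHandPacket(bytes) {
  if (bytes.byteLength !== PACKET_BYTES) return null; // drop malformed datagrams
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    hand: view.getUint8(0) === 0 ? "left" : "right",
    x: view.getFloat32(1, true),
    y: view.getFloat32(5, true),
    z: view.getFloat32(9, true),
  };
}
```

In a Node.js relay, the message handler of a socket created with dgram.createSocket("udp4") receives a Buffer, which is a Uint8Array and could be passed to decodeHandPacket directly.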

audio

Audio analyser

Listens to the live music with low latency and extracts a signal used to trigger and modulate the visuals.
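The source does not say which feature the analyser extracts. A minimal sketch, assuming the signal is a simple loudness estimate: the root-mean-square (RMS) level of one block of samples. The name rmsLevel is hypothetical; in a browser, the samples could come from the Web Audio API's AnalyserNode.getFloatTimeDomainData.

```javascript
// Root-mean-square level of one block of audio samples in [-1, 1].
// Returns 0 for silence and 1 for a full-scale square wave.
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}
```

Shorter blocks react faster but give a noisier level, so block size is one of the knobs that trades latency against stability.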

visuals

WebGL fluid simulation

A browser-based fluid simulation in JavaScript — the canvas the gestures and audio paint onto.

mapping

Pointer mapping

Hand coordinates are mapped onto pointer positions in the simulation; an audio threshold decides when a "click" fires.
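The coordinate half of that mapping can be sketched as a linear rescale from the tracked table area to canvas pixels. Everything specific here is an assumption for illustration: the TABLE calibration rectangle, its units, the flipped vertical axis, and the name handToPointer.

```javascript
// Hypothetical calibration: the rectangle of the table in capture-space metres.
const TABLE = { xMin: -0.6, xMax: 0.6, yMin: -0.4, yMax: 0.4 };

const clamp01 = (v) => Math.min(1, Math.max(0, v));

// Linearly map a hand position over the table to a pixel position on the canvas.
// Capture-space y is assumed to grow away from the DJ while canvas y grows
// downward, so the vertical axis is flipped. Positions outside the table are
// clamped to the nearest edge.
function handToPointer(hand, canvasWidth, canvasHeight, table = TABLE) {
  const u = clamp01((hand.x - table.xMin) / (table.xMax - table.xMin));
  const v = clamp01((hand.y - table.yMin) / (table.yMax - table.yMin));
  return { x: u * canvasWidth, y: (1 - v) * canvasHeight };
}
```

Clamping keeps the pointer on the canvas when a hand drifts past the table edge. Whether to clamp or to lift the pointer in that case is itself a mapping decision.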

display

Projection

The finished visual is projected onto the DJ's table, putting the performance in the same space as the performer.

Pipeline

From a hand to a wave of colour

Motion and audio travel separate paths and meet inside the visual engine, where they're fused into one image.

Track: motion capture reports the position of the gloved hands.
Bridge: a ROS node streams the hand coordinates to the visual engine over UDP.
Listen: the audio analyser turns the live music into a triggering signal.
Map: hand coordinates become pointer positions; the audio signal decides when to click.
Simulate: the fluid simulation reacts, following the hands and bursting on audio triggers.
Project: the visual is projected onto the DJ's table for the audience to see.
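The point where the two paths meet can be sketched as a small per-frame fusion step. This is an illustration under assumptions, not the project's code: the simulation object with movePointer and splat methods is a hypothetical stand-in for whatever pointer API the fluid simulation exposes, and the default threshold is arbitrary. The sketch also assumes the click should fire only when the audio level crosses the threshold upward, so a sustained loud passage gives one burst rather than one per frame.

```javascript
// Creates a per-frame fusion step. `simulation` is a hypothetical interface:
//   movePointer(x, y) : steer the fluid toward the pointer
//   splat(x, y)       : inject a burst of colour (the "click")
function createFusion(simulation, threshold = 0.3) {
  let wasAbove = false; // whether the audio level was above threshold last frame

  return function step(pointer, audioLevel) {
    // Motion path: the fluid always follows the hand.
    simulation.movePointer(pointer.x, pointer.y);

    // Audio path: fire a click only on the rising edge of the threshold.
    const isAbove = audioLevel >= threshold;
    const fired = isAbove && !wasAbove;
    if (fired) simulation.splat(pointer.x, pointer.y);
    wasAbove = isAbove;
    return fired;
  };
}
```

Each call takes the latest pointer position and the latest audio level, so the hands supply where and the audio supplies when.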

Four modes

The loops between sound and motion

What makes this more than a visualiser is that influence runs in both directions. Gestures reach both the music and the visuals, and the music itself drives the visuals:

Knob changes the music: the DJ shapes sound the classic way, by turning physical controls.
Music changes the visuals: low-latency audio analysis drives the fluid simulation in time with the track.
Gesture changes the visuals: tracked hands stir and steer the projected fluid directly.
Gesture changes the music: two-handed gestures draw EQ curves that are applied back to the audio.
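The source does not specify how a two-handed gesture becomes an EQ curve. One plausible sketch, offered purely as an assumption: treat each hand as a control point, where horizontal position picks a place on the frequency axis and height sets a gain in decibels, then interpolate linearly between the two points to get one gain per band. All names and ranges here (eqCurveFromHands, five bands, plus or minus 12 dB, hand coordinates normalised to 0..1) are hypothetical.

```javascript
// Build per-band gains (in dB) from two hands in normalised coordinates:
//   x in [0, 1] : position along the frequency axis (0 = lowest band)
//   y in [0, 1] : height, mapped to gain (0 -> -maxDb, 0.5 -> 0 dB, 1 -> +maxDb)
function eqCurveFromHands(left, right, bandCount = 5, maxDb = 12) {
  const toDb = (y) => (y * 2 - 1) * maxDb;
  // Order the control points by x so the curve runs low to high frequency.
  const [a, b] = left.x <= right.x ? [left, right] : [right, left];

  const gains = [];
  for (let i = 0; i < bandCount; i++) {
    const x = bandCount === 1 ? 0.5 : i / (bandCount - 1);
    let t;
    if (b.x === a.x) {
      t = 0.5; // hands stacked vertically: use the average gain
    } else {
      // Interpolate between the hands; stay flat outside them.
      t = Math.min(1, Math.max(0, (x - a.x) / (b.x - a.x)));
    }
    gains.push(toDb(a.y) + t * (toDb(b.y) - toDb(a.y)));
  }
  return gains;
}
```

In a browser, gains like these could be written to the gain parameter of one peaking BiquadFilterNode per band; that wiring is likewise an assumption rather than something the source describes.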

In my rebuild I focused first on the visual loop — mapping hands to pointers and gating clicks on the audio threshold — because getting that fusion right is what sells the whole performance.

Reflection

What rebuilding it taught me

Fusion beats either signal alone. Hands give you where; audio gives you when. Combining them is what makes the visual feel alive.
Latency is the enemy. A visual that lags the beat even slightly stops feeling connected to the music, so every stage from capture to projection has to be fast.
Mapping is a creative act. The same tracking data can feel mechanical or expressive depending entirely on how you map it onto the simulation.
Projection collapses the gap. Putting the visual in the DJ's own space is what turns an operator into a performer.
