
From-Scratch Build · Computer Vision

Marker Tracking Lab

A vision pipeline that spots printed fiducial markers in a camera feed, works out exactly where each one sits in 3D, and lines that up with a motion-capture system's world frame — so a robot, a camera and a tracker all agree on where things are. Built from scratch to learn how machines locate objects in a shared space.

Python · OpenCV · ArUco · ROS · Motion capture

What it is

Where is that thing, exactly?

The lab's job is pose estimation: given a video frame, find each printed marker and report not just where it is on screen but its full 3D pose, meaning position in metres plus orientation. Those poses are published as transforms so they can be drawn in a 3D viewer and consumed by anything else on the robotics bus.

The second half is the part I most wanted to build: frame alignment. A camera sees the world from its own viewpoint; a motion-capture rig has its own origin. To make them cooperate, the camera frame has to be expressed in the tracker's coordinates. Once aligned, a marker detected by the camera and a robot tracked by motion capture live in the same map.
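As a rough illustration of what "same map" means in practice: once the camera's pose in the tracker frame is known, a marker pose measured by the camera can be chained through it. The sketch below uses plain 4x4 homogeneous matrices. The frame names, the yaw-only rotation and all numbers are hypothetical simplifications chosen for readability, not values from this build.

```python
import math

def make_transform(yaw_rad, tx, ty, tz):
    """Build a 4x4 homogeneous transform: rotation about Z by yaw, plus a translation.

    Yaw-only rotation is a simplification for this sketch; a real camera
    mount generally needs a full 3D rotation.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def compose(a, b):
    """Matrix product a @ b for 4x4 nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]

def translation(t):
    """Extract the (x, y, z) translation from a 4x4 transform."""
    return (t[0][3], t[1][3], t[2][3])

# Hypothetical alignment: camera pose expressed in the tracker's frame.
T_tracker_camera = make_transform(math.pi / 2, 1.0, 2.0, 0.5)
# Hypothetical detection: marker pose expressed in the camera's frame.
T_camera_marker = make_transform(0.0, 0.3, 0.0, 1.2)

# Chaining the two expresses the marker in the tracker's frame.
T_tracker_marker = compose(T_tracker_camera, T_camera_marker)
print(translation(T_tracker_marker))
```

Reading the subscripts right to left is a handy check: `T_tracker_camera` followed by `T_camera_marker` only composes because the inner frame names match.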

The core idea I wanted to learn: a fiducial marker is a cheap, reliable anchor. Because its size and pattern are known, one camera image is enough to recover a full 3D pose — and once you can express that pose in a shared world frame, separate sensing systems suddenly speak the same language.
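To see why a known size is enough to fix scale, here is a small sketch of the simplest case: a square marker facing the camera head-on, projected through an ideal pinhole model with no distortion. The focal length, principal point, marker size and depth are made-up values. A full pose solver handles tilted markers as well; this only covers the fronto-parallel special case.

```python
def project(point, fx, fy, cx, cy):
    """Ideal pinhole projection of a camera-frame point (metres) to pixels."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def marker_corners(side, depth):
    """Corners of a square marker facing the camera, centred on the optical axis."""
    h = side / 2.0
    return [(-h, -h, depth), (h, -h, depth), (h, h, depth), (-h, h, depth)]

def depth_from_width(side, pixel_width, fx):
    """Invert the projection for the fronto-parallel case: Z = fx * side / width."""
    return fx * side / pixel_width

# Hypothetical intrinsics and scene, for illustration only.
FX = FY = 800.0          # focal length in pixels
CX, CY = 320.0, 240.0    # principal point in pixels
SIDE = 0.10              # printed marker side length in metres
TRUE_DEPTH = 2.0         # distance from camera to marker in metres

pixels = [project(c, FX, FY, CX, CY) for c in marker_corners(SIDE, TRUE_DEPTH)]
pixel_width = pixels[1][0] - pixels[0][0]   # apparent width of the top edge
estimated_depth = depth_from_width(SIDE, pixel_width, FX)
print(pixel_width, estimated_depth)
```

Without the known side length, the same pixel width could belong to a small marker nearby or a large one far away, which is exactly the ambiguity the printed size removes.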

The stack

Tools under the hood

This rebuild sits at the meeting point of optics, computer vision and robotics plumbing. Here is what each piece does.

capture

Industrial camera

A machine-vision camera driver delivers a steady, high-resolution image stream into the pipeline.

vision

OpenCV ArUco

Detects fiducial markers and, with the camera's intrinsics, solves each marker's 3D pose from a single frame.

calibration

Camera calibration

The camera matrix and distortion coefficients that turn raw pixels into accurate, undistorted measurements.

middleware

ROS + TF

Poses are published on the transform tree so every node shares one consistent picture of where things are.

ground truth

Motion capture

A tracker provides the authoritative world frame that the camera is aligned to.

visualisation

3D viewer

A robotics visualiser draws markers, frames and the camera live, so alignment errors are obvious at a glance.
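For the calibration piece above, it can help to see what the camera matrix and distortion coefficients actually do to a point. The sketch below follows the radial and tangential distortion model documented by OpenCV, with coefficients in its usual (k1, k2, p1, p2, k3) order. The function name and all numeric values are hypothetical, and this is a forward model for intuition rather than the code used in this build.

```python
def distort_and_project(x, y, camera, dist):
    """Map normalised image coordinates (x = X/Z, y = Y/Z) to pixel coordinates.

    camera = (fx, fy, cx, cy); dist = (k1, k2, p1, p2, k3).
    """
    fx, fy, cx, cy = camera
    k1, k2, p1, p2, k3 = dist
    r2 = x * x + y * y
    # Radial term: scales the point towards or away from the image centre.
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    # Tangential terms: model a lens that is not perfectly parallel to the sensor.
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    # The camera matrix turns distorted normalised coordinates into pixels.
    return (fx * x_d + cx, fy * y_d + cy)

# Hypothetical intrinsics and a mild barrel distortion.
CAMERA = (800.0, 800.0, 320.0, 240.0)
NO_DISTORTION = (0.0, 0.0, 0.0, 0.0, 0.0)
BARREL = (-0.2, 0.0, 0.0, 0.0, 0.0)

ideal = distort_and_project(0.1, 0.0, CAMERA, NO_DISTORTION)
barrel = distort_and_project(0.1, 0.0, CAMERA, BARREL)
print(ideal, barrel)
```

Undistortion runs this mapping in reverse, which is why a wrong coefficient quietly shifts every corner the detector reports.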

Architecture

Six stages, one map

The lab runs as a few independent ROS nodes you bring up in order. Keeping them separate means you can debug the camera without touching detection, or detection without the tracker.

  1. Camera node (live)

    Starts the machine-vision camera and exposes its rectified image stream to the rest of the graph.

  2. Calibration (live)

    Feeds the camera matrix and distortion coefficients in, so measurements are metrically accurate.

  3. Marker detection (live)

    Finds allowed markers in the rectified feed and publishes their poses to the transform tree.

  4. Tracker client (live)

    Brings in the motion-capture world frame and the poses it reports.

  5. Frame alignment (live)

    A static transform that expresses the camera in the tracker's coordinates, fusing both worlds.

  6. Web bridge (live)

    A bridge server so the live data can be reached from outside the robotics graph.

    A bridge server so the live data can be reached from outside the robotics graph.
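One small piece of glue sits between the detection and transform steps above. OpenCV's pose solvers return rotation as a Rodrigues (axis-angle) vector, while transforms on the TF tree carry a quaternion in (x, y, z, w) order. A minimal conversion might look like the sketch below. The function name is hypothetical, and it assumes the detector hands back an `rvec` of that form.

```python
import math

def rvec_to_quaternion(rvec):
    """Convert an axis-angle (Rodrigues) vector to a unit quaternion (x, y, z, w).

    The vector's direction is the rotation axis and its length is the
    rotation angle in radians.
    """
    rx, ry, rz = rvec
    angle = math.sqrt(rx * rx + ry * ry + rz * rz)
    if angle < 1e-12:
        # No measurable rotation: return the identity quaternion.
        return (0.0, 0.0, 0.0, 1.0)
    # sin(angle / 2) scales the unit axis, which is rvec / angle.
    s = math.sin(angle / 2.0) / angle
    return (rx * s, ry * s, rz * s, math.cos(angle / 2.0))

# Example: a quarter turn about the camera's Z axis.
quarter_turn = rvec_to_quaternion((0.0, 0.0, math.pi / 2))
print(quarter_turn)
```

Getting the component order wrong (w first instead of last) is an easy mistake that shows up in the viewer as frames pointing in odd directions.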

How it runs

Calibrate, detect, align

Getting trustworthy poses is a discipline, not a one-liner. The order matters because every later step assumes the earlier ones are correct: calibrate first, then detect, then align.

In my rebuild I treated calibration as the foundation — a beautiful detector on a badly calibrated camera just produces confident, wrong numbers.
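One common way to put a number on calibration quality is the RMS reprojection error: project the known calibration-target points back through the estimated camera model and measure how far they land from where they were detected. The helper below is a generic sketch of that metric with made-up pixel coordinates, not the check used in this build. OpenCV's calibration routine reports a value of this kind as its return value.

```python
import math

def rms_reprojection_error(observed, reprojected):
    """Root-mean-square pixel distance between detected and re-projected points."""
    if len(observed) != len(reprojected) or not observed:
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for (u_obs, v_obs), (u_rep, v_rep) in zip(observed, reprojected):
        total += (u_obs - u_rep) ** 2 + (v_obs - v_rep) ** 2
    return math.sqrt(total / len(observed))

# Hypothetical detections and model predictions, in pixels.
observed = [(100.0, 100.0), (200.0, 100.0)]
reprojected = [(103.0, 104.0), (200.0, 100.0)]

error = rms_reprojection_error(observed, reprojected)
print(error)
```

A low value does not prove the calibration is good everywhere in the image, but a high one is a clear sign that downstream poses should not be trusted yet.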

Reflection

What rebuilding it taught me