
From-Scratch Build · Robotics Data

Robot Manipulator Datasets

Two datasets captured from a six-axis robot arm and its software twin — one watching the system scale up under load, one recording the arm repeating four motions. Rebuilt from scratch to understand how robotics data is collected, formatted and used.

6-axis arm · Digital twin · CSV time-series · Labelled data · ML-ready

What it is

Data from a robot that copies itself

The subject is a small six-axis robot manipulator — an arm with six joints — paired with a digital twin, a live software copy of that arm running on a server. As the real arm moves, the twin mirrors it, and the system records what's happening: how loaded the machines are, how the network behaves, how long each instance takes to respond. Those recordings are the datasets.

I built this to learn the unglamorous but essential half of robotics and machine learning: the data. Not the model, not the robot, but the columns in a CSV file, the sampling intervals, the labels, and the choices behind them that determine whether anything you train later is any good.

The core idea I wanted to learn: a dataset is a designed artefact, not a byproduct. Choosing what to sample, how often, and how to label it is the real work — and it's what turns raw robot telemetry into something a classifier can actually learn from.

The stack

What the data is made of

Two datasets, captured two different ways. Here is what each piece actually is.

subject

Six-axis arm

A compact desktop manipulator with six rotational joints. Its geometry is described by a robot model file, so the twin and any viewer know exactly how the arm is shaped.

source

Digital twin

A software replica of the arm running on edge infrastructure. The datasets are telemetry from this twin and the system hosting it, not raw video of the hardware.

dataset A

Scalability traces

The system is pushed harder over time by adding a new virtual robot instance at a fixed interval. The data records how resource use and timing respond as load climbs.

dataset B

Motion traces

The arm performs four distinct movements, each repeated twenty times. One CSV per movement captures the resulting time-series — clean, repeatable, comparable.
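To make "repeatable, comparable" concrete, here is a minimal sketch of splitting one movement's rows into its twenty repetitions so the runs can be compared side by side. The column names (`repetition`, `t_s`, `joint1_deg`) and the generated values are placeholders of mine; I am assuming the real files carry some way to tell repetitions apart, which may not be a dedicated column.

```python
from collections import defaultdict

REPETITIONS = 20  # from the description: each movement is repeated twenty times

def make_fake_motion_rows(samples_per_rep=5):
    """Build placeholder rows; real files will have different columns and values."""
    rows = []
    for rep in range(REPETITIONS):
        for i in range(samples_per_rep):
            rows.append({"repetition": rep, "t_s": i * 0.1, "joint1_deg": 10.0 * i})
    return rows

def split_by_repetition(rows):
    """Group rows into one list per repetition so the runs can be compared."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["repetition"]].append(row)
    return dict(groups)

groups = split_by_repetition(make_fake_motion_rows())
# One summary number per repetition: the peak angle reached by the first joint.
peak_per_rep = {rep: max(r["joint1_deg"] for r in g) for rep, g in groups.items()}
print(len(groups), "repetitions, peak joint1 angle in rep 0:", peak_per_rep[0])
```

With real data, the spread of a summary like `peak_per_rep` across repetitions is a quick check on how repeatable the motion actually was.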

format

CSV time-series

Everything is plain comma-separated values: rows over time, columns of measurements. Portable, diff-able, and openable in anything from a spreadsheet to pandas.
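As a small illustration of "rows over time, columns of measurements", the sketch below parses a tiny in-memory CSV using only Python's standard library. The headers (`timestamp_s`, `cpu_percent`, `latency_ms`) and the numbers are invented for the example and are not the dataset's real columns.

```python
import csv
import io

# Hypothetical sample; the real files' column names and values may differ.
SAMPLE = """timestamp_s,cpu_percent,latency_ms
0,12.5,4.1
1,13.0,4.3
2,15.2,5.0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]

rows = load_rows(SAMPLE)
mean_latency = sum(r["latency_ms"] for r in rows) / len(rows)
print(len(rows), "rows, mean latency", round(mean_latency, 2), "ms")
```

For a file on disk, the same `csv.DictReader` works on an open file handle instead of the `io.StringIO` wrapper.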

labels

Labelled variant

The largest scalability file ships in a labelled version, with each row tagged so it can directly train a supervised classifier such as a random forest.
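A rough sketch of what "directly train a supervised classifier" could look like with scikit-learn's `RandomForestClassifier`. Everything about the data here is an assumption: the rows are synthetic, the three features (instance count, CPU, latency) are stand-ins, and the label rule is made up so the example has something to learn. It is not the labelling scheme used in the real file.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

random.seed(0)

# Synthetic stand-in for labelled rows: [instance_count, cpu_percent, latency_ms].
# The label rule (1 when latency exceeds 50 ms) is invented for this sketch.
X, y = [], []
for _ in range(400):
    instances = random.randint(1, 20)
    cpu = min(100.0, 5.0 * instances + random.uniform(-5, 5))
    latency = 3.0 * instances + random.uniform(-3, 3)
    X.append([instances, cpu, latency])
    y.append(1 if latency > 50.0 else 0)

# Hold out a quarter of the rows to evaluate on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("held-out accuracy:", round(accuracy, 3))
```

One caveat for real time-series rows: a random split can put near-identical neighbouring samples on both sides, so splitting by time is often the more honest evaluation.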

Architecture

How the data is organised

The two datasets are kept separate because they answer different questions. The scalability set comes in three resolutions; the motion set is split by movement.

  1. Micro sampling live

    Scalability dataset where a new robot instance is added every 60 seconds — the fine-grained view, lots of detail over a short window.

  2. Small sampling live

    Same experiment, a new instance every 300 seconds — a middle resolution that trades detail for a longer, calmer trace.

  3. Big sampling live

    A new instance every 3600 seconds — the coarse, long-horizon view of how the system scales over hours.

  4. Labelled big set live

    The big dataset with per-row labels added, ready to train and evaluate a supervised classifier.

  5. Motion CSVs live

    Four files, one per movement, each capturing the arm repeating that motion twenty times. They ship alongside a reference clip and the robot model.

  6. Generation script live

    A small Python script that produced the scalability traces — the reproducible recipe behind the numbers, not a hand-edited file.
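The original generation script is not reproduced here, so the following is only a sketch of the load schedule the list describes: one extra instance per fixed interval, using the 60, 300 and 3600 second intervals. The function names, the two-column layout, the sampling period, and the choice to start with one instance at time zero are all assumptions of mine.

```python
import csv
import io

# Seconds between new instances for the three scalability sets described above.
INTERVALS_S = {"micro": 60, "small": 300, "big": 3600}

def instances_at(t_s, interval_s):
    """Instances running at time t_s: one at t=0 (assumed), plus one per elapsed interval."""
    return t_s // interval_s + 1

def write_trace(interval_s, duration_s, sample_every_s):
    """Return CSV text with one row per sample. The columns are hypothetical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["t_s", "instances"])
    for t in range(0, duration_s, sample_every_s):
        writer.writerow([t, instances_at(t, interval_s)])
    return buffer.getvalue()

trace = write_trace(INTERVALS_S["micro"], duration_s=180, sample_every_s=30)
print(trace)
```

A real script would record measured metrics next to the instance count; this only shows how the step-wise load ramp can be expressed.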

How it's used

From CSV to a trained model

A dataset is only as useful as what you can do with it. These two were shaped with concrete uses in mind.

In my rebuild I focused on the data-handling path: load the CSVs, understand each column, line the three sampling resolutions up against each other, and confirm the labelled set really is ready to train on without further cleaning.
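One way to line the three sampling resolutions up, sketched under assumptions: because each set adds instances at a different pace, wall-clock time is a poor shared axis, so this example averages a metric per instance count and keeps only the counts present in every trace. The `instances` and `latency_ms` fields and all values are placeholders; the real files may not expose an instance count directly.

```python
from collections import defaultdict

def mean_by_instances(rows, metric):
    """Average `metric` over all rows that share the same instance count."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["instances"]].append(row[metric])
    return {n: sum(values) / len(values) for n, values in buckets.items()}

def align(traces, metric):
    """Build {instance_count: {trace_name: mean}} for counts present in every trace."""
    per_trace = {name: mean_by_instances(rows, metric) for name, rows in traces.items()}
    common = set.intersection(*(set(means) for means in per_trace.values()))
    return {n: {name: per_trace[name][n] for name in per_trace} for n in sorted(common)}

# Placeholder rows; real values would come from the three scalability CSVs.
traces = {
    "micro": [{"instances": 1, "latency_ms": 4.0}, {"instances": 2, "latency_ms": 6.0}],
    "small": [{"instances": 1, "latency_ms": 5.0}, {"instances": 2, "latency_ms": 7.0},
              {"instances": 3, "latency_ms": 9.0}],
    "big": [{"instances": 1, "latency_ms": 3.0}, {"instances": 1, "latency_ms": 5.0},
            {"instances": 2, "latency_ms": 8.0}],
}
table = align(traces, "latency_ms")
print(table)
```

Instance counts that appear in only some traces (3 in the placeholder `small` set) are dropped, so every row of the result compares like with like.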

Reflection

What rebuilding it taught me