Distributed Device Agent Platform

What it is

An agent on every device

The platform has two halves. On each device runs an agent — a small long-lived process that connects to a message broker, publishes a retained description of itself (metadata, status, current state, configuration), and listens for commands. On the network sits a central command plane that discovers every agent, pushes configuration, and triggers installs, restarts and upgrades.

I built this to answer a question I kept hitting in hobby hardware projects: once you have more than one or two devices, how do you manage them without SSH-ing into each box by hand? The answer turned out to be a publish/subscribe contract and a registry of installable capabilities.

The core idea I wanted to test: a device fleet becomes manageable the moment every node speaks the same MQTT contract. Retained messages mean a freshly connected console instantly knows the full state of the fleet: no polling, no separate source-of-truth database, just the broker.
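To make the retained-message claim concrete, here is a toy in-memory stand-in for a broker, written in Python to match the project's packaging. It is a sketch under stated assumptions, not the project's code and not a real MQTT client: the `InMemoryBroker` class, the `topic_matches` helper and the `fleet/<agent-id>/<kind>` topic layout are all hypothetical. It models two behaviours that MQTT brokers do provide: the broker keeps the last retained payload per topic and hands it to any later matching subscriber, and an empty retained payload clears that topic.

```python
from typing import Callable

Handler = Callable[[str, bytes], None]


def topic_matches(pattern: str, topic: str) -> bool:
    """MQTT-style filter match: '+' matches one level, a trailing '#' matches the rest."""
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, p in enumerate(p_parts):
        if p == "#":
            return True  # '#' covers this level and everything below it
        if i >= len(t_parts):
            return False
        if p != "+" and p != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


class InMemoryBroker:
    """Toy broker: stores the last retained payload per topic and replays it on subscribe."""

    def __init__(self) -> None:
        self.retained: dict[str, bytes] = {}
        self.subscribers: list[tuple[str, Handler]] = []

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        if retain:
            if payload:
                self.retained[topic] = payload
            else:
                # An empty retained payload deletes the retained message.
                self.retained.pop(topic, None)
        for pattern, handler in self.subscribers:
            if topic_matches(pattern, topic):
                handler(topic, payload)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self.subscribers.append((pattern, handler))
        # Replay retained state straight away: a late console sees the whole fleet.
        for topic, payload in list(self.retained.items()):
            if topic_matches(pattern, topic):
                handler(topic, payload)


broker = InMemoryBroker()
broker.publish("fleet/led-01/status", b"online", retain=True)
broker.publish("fleet/proj-01/status", b"online", retain=True)
broker.publish("fleet/led-01/log", b"transient line")  # not retained, so not replayed

seen = {}
broker.subscribe("fleet/+/status", lambda t, p: seen.__setitem__(t, p))
print(seen)  # both status topics arrive, although they were published before subscribe
```

The console side never asks the agents anything; one wildcard subscription is enough. A real deployment would use an actual broker (Mosquitto, for example) and an MQTT client library rather than this in-process model.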

The stack

Tools under the hood

The whole point of this rebuild was the toolchain that makes a self-describing agent possible. Here is what each piece actually does.

transport

MQTT

A lightweight publish/subscribe messaging protocol. Every agent and the console meet at a shared broker; nobody connects to anybody directly.

runtime

Agent core

The process on each device. Connects, publishes retained metadata/status/state/cfg, and runs the command handlers.

control

Central command

The operator-facing plane that lists agents, edits their config and issues fleet-wide commands.

extensibility

Component registry

A loader that lets an agent install and run pluggable capabilities at runtime instead of baking everything into one binary.

packaging

pyproject + systemd

Each piece is its own Python package; the agent ships as a systemd service so it survives reboots.

telemetry

Metrics + logs

CPU, memory and disk percentages stream as per-metric telemetry topics, and logs are batched back over MQTT; each stream is switched on or off by its own config setting.
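A minimal sketch of config-gated telemetry, assuming a hypothetical config shape (`{"telemetry": {"cpu": True, ...}}`) and a hypothetical topic layout (`fleet/<agent-id>/telemetry/<metric>`); neither comes from the original. Only the disk reading uses a real call (`shutil.disk_usage`); CPU and memory are stubbed with fixed numbers so the example runs anywhere.

```python
import json
import shutil
from typing import Callable, Dict


def disk_percent(path: str = "/") -> float:
    """Real stdlib reading: share of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return round(100.0 * usage.used / usage.total, 1)


# Hypothetical reader table. CPU and memory are fixed stubs; a real agent might read
# them with psutil.cpu_percent() and psutil.virtual_memory().percent instead.
READERS: Dict[str, Callable[[], float]] = {
    "cpu": lambda: 12.5,
    "memory": lambda: 41.0,
    "disk": disk_percent,
}


def collect_telemetry(agent_id: str, cfg: dict, readers=READERS) -> Dict[str, str]:
    """Return {topic: json_payload} for every metric whose config flag is on."""
    enabled = cfg.get("telemetry", {})
    out = {}
    for name, read in readers.items():
        if enabled.get(name, False):  # missing flag means off: telemetry is opt-in
            out[f"fleet/{agent_id}/telemetry/{name}"] = json.dumps({"percent": read()})
    return out


cfg = {"telemetry": {"cpu": True, "memory": False, "disk": True}}
msgs = collect_telemetry("led-01", cfg)
print(sorted(msgs))  # ['fleet/led-01/telemetry/cpu', 'fleet/led-01/telemetry/disk']
```

Defaulting every flag to off is the design choice that keeps an unconfigured device silent until someone asks it to talk.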

Architecture

Plug-in components

The agent is deliberately boring — its job is connectivity, lifecycle and command handling. The interesting behaviour lives in components that the agent can install on demand. To prove the model I built a few against a shared base contract.

Component base (live)
The shared contract every component implements: lifecycle hooks, context, and MQTT-aware logging.
LED strip (live)
Drives an addressable LED strip with a library of effects: fades, wipes, rainbows, sparkle, theatre chase.
Projector (live)
Controls a projector over a serial connection: power and input switching through the same agent.
ROS bridge (live)
Bridges the agent to a robotics middleware bus so robot nodes can be driven as fleet components.
Visualisation (live)
A component that surfaces what the fleet is doing, for debugging the live system.
Foraging behaviour (demo)
A sample mobile-robot behaviour, included to exercise the platform end-to-end rather than as core infrastructure.
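The component model above can be sketched as an abstract base class plus a name-keyed registry. Everything here is hypothetical: `Component`, `ComponentContext`, `register`, `install` and the `fleet/<agent-id>/log` topic are illustrative names, not the project's API. The registry is a plain in-process dict; a packaged version might instead discover components through `importlib.metadata.entry_points`, but that is an assumption, not something the original states.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Type


@dataclass
class ComponentContext:
    """Hypothetical context the agent hands to each component."""
    agent_id: str
    config: dict
    publish: Callable[[str, str], None]  # (topic, payload)

    def log(self, message: str) -> None:
        # "MQTT-aware" logging: log lines go out over the broker, not to a local file.
        self.publish(f"fleet/{self.agent_id}/log", message)


class Component(ABC):
    """Shared contract: every component implements these lifecycle hooks."""
    name: str = "component"

    def __init__(self, ctx: ComponentContext) -> None:
        self.ctx = ctx

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


REGISTRY: Dict[str, Type[Component]] = {}


def register(cls: Type[Component]) -> Type[Component]:
    """Class decorator that makes a component installable by name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class LedStrip(Component):
    name = "led_strip"

    def start(self) -> None:
        effect = self.ctx.config.get("effect", "rainbow")
        self.ctx.log(f"LED strip started with effect {effect}")

    def stop(self) -> None:
        self.ctx.log("LED strip stopped")


def install(name: str, ctx: ComponentContext) -> Component:
    """Look up a registered component, instantiate it and start it."""
    component = REGISTRY[name](ctx)
    component.start()
    return component


sent: List[tuple] = []
ctx = ComponentContext("led-01", {"effect": "sparkle"}, lambda t, p: sent.append((t, p)))
strip = install("led_strip", ctx)
strip.stop()
print(sent)
```

Because the agent only ever calls `start` and `stop`, adding a new device type means adding a subclass, which matches the "component, not core" point made later.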

How it runs

The lifecycle of one agent

Bringing a device online follows a fixed path. The MQTT contract is the same for every node, which is what makes the fleet uniform:

Connect: the agent authenticates to the broker with its own credentials and a unique identity.
Announce: it publishes retained metadata, status, state and cfg so the console sees it immediately.
Obey: it subscribes to a command channel — ping, restart, install, uninstall, upgrade, refresh.
Report: when enabled, it streams batched logs and per-metric telemetry, each toggled by its own config.
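The Announce and Obey steps can be sketched as a small class that publishes its four retained topics and dispatches JSON commands to handlers. This is a hedged illustration, not the project's agent: the `Agent` class, the topic names (`fleet/<id>/metadata`, `.../reply`, and so on) and the `{"cmd": ..., "args": ...}` payload shape are assumptions, and only `ping`, `refresh` and `install` are wired up. The `publish` callback stands in for a real MQTT client.

```python
import json
from typing import Callable, Dict, List, Tuple

Publish = Callable[[str, str, bool], None]  # (topic, payload, retain)


class Agent:
    """Sketch of the announce + command half of the agent lifecycle."""

    def __init__(self, agent_id: str, publish: Publish) -> None:
        self.agent_id = agent_id
        self.publish = publish
        self.cfg = {"telemetry": {}, "logs": False}
        self.state = {"components": []}
        self.handlers: Dict[str, Callable[[dict], str]] = {
            "ping": lambda args: "pong",
            "refresh": self._refresh,
            "install": self._install,
        }

    def topic(self, kind: str) -> str:
        return f"fleet/{self.agent_id}/{kind}"

    def announce(self) -> None:
        # Retained, so a console that subscribes later still sees this agent.
        self.publish(self.topic("metadata"), json.dumps({"id": self.agent_id}), True)
        self.publish(self.topic("status"), "online", True)
        self.publish(self.topic("state"), json.dumps(self.state), True)
        self.publish(self.topic("cfg"), json.dumps(self.cfg), True)

    def _refresh(self, args: dict) -> str:
        self.announce()
        return "refreshed"

    def _install(self, args: dict) -> str:
        # A real agent would fetch and start the component; here it is only recorded.
        self.state["components"].append(args["name"])
        self.publish(self.topic("state"), json.dumps(self.state), True)
        return "installed " + args["name"]

    def on_command(self, payload: str) -> None:
        cmd = json.loads(payload)
        handler = self.handlers.get(cmd.get("cmd"))
        result = handler(cmd.get("args", {})) if handler else "unknown command"
        # Replies are events, not state, so they are not retained.
        self.publish(self.topic("reply"), result, False)


log: List[Tuple[str, str, bool]] = []
agent = Agent("led-01", lambda t, p, r: log.append((t, p, r)))
agent.announce()
agent.on_command(json.dumps({"cmd": "ping"}))
agent.on_command(json.dumps({"cmd": "install", "args": {"name": "led_strip"}}))
for entry in log:
    print(entry)
```

The split between retained state topics and non-retained reply topics is the point: state is what a new console should see, replies are only meaningful to whoever sent the command.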

In my rebuild I focused on getting one agent fully self-describing and remotely commandable before adding a second — the contract is what scales, not the count.

Reflection

What rebuilding it taught me

Retained messages are a free state store. Because the broker holds the last value, a new console doesn't ask "what's out there" — it just subscribes and the fleet describes itself.
A contract beats a codebase. The agent stays tiny; capabilities arrive as components implementing a shared base. Adding a new device type meant writing a component, not touching the core.
Lifecycle commands are the hard part. Running install, upgrade and restart over the network safely, with deduplication, is where most of the real engineering went.
Telemetry must be opt-in. Logs and metrics gated by config keep quiet devices quiet; you turn the firehose on only for the node you're debugging.
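Deduplication matters because MQTT's QoS 1 delivery is at-least-once, so the same command can arrive twice. A minimal sketch, assuming each command carries a unique `id` field (hypothetical; the original does not say how the platform keys its deduplication), is a bounded window of recently seen ids:

```python
from collections import OrderedDict
from typing import Callable


class CommandDeduplicator:
    """Reject repeated command ids, remembering only the most recent `capacity` ids."""

    def __init__(self, capacity: int = 256) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def should_run(self, command_id: str) -> bool:
        if command_id in self._seen:
            self._seen.move_to_end(command_id)  # keep a repeatedly resent id in the window
            return False
        self._seen[command_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest id
        return True


def handle(cmd: dict, dedup: CommandDeduplicator, run: Callable[[dict], None]) -> bool:
    """Run a command at most once per id; return True if it actually ran."""
    if not dedup.should_run(cmd["id"]):
        return False
    run(cmd)
    return True


ran = []
dedup = CommandDeduplicator(capacity=2)
for cmd in [
    {"id": "a1", "cmd": "upgrade"},
    {"id": "a1", "cmd": "upgrade"},  # duplicate delivery: skipped
    {"id": "b2", "cmd": "restart"},
]:
    handle(cmd, dedup, ran.append)
print([c["cmd"] for c in ran])  # ['upgrade', 'restart']
```

This window lives in memory, so it is lost when the process restarts; for a `restart` command in particular, a redelivery after the restart would slip through unless the seen ids were persisted somewhere.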
