From-Scratch Build · Observability

Observability Stack with OpenTelemetry

A full telemetry pipeline built from scratch: an OpenTelemetry Collector scraping host metrics and feeding them into a Grafana LGTM backend — Loki, Grafana, Tempo and Mimir — so I can see exactly what a machine is doing, live, on a dashboard.

OpenTelemetry · Grafana · Loki · Tempo · Mimir · OTLP · Docker Compose

What it is

Metrics, logs and traces in one place

Observability is the ability to ask, after the fact, why a system behaved the way it did, using the signals it emits. Those signals come in three flavours: metrics (numbers over time, like CPU usage), logs (timestamped text), and traces (the path of a request through a system). This build wires up a complete pipeline for metrics, from collection to dashboard, on a backend that can also store logs and traces.

I built it to learn the modern, vendor-neutral way of doing this. OpenTelemetry is the open standard for collecting telemetry; its Collector is a single agent that gathers signals and forwards them anywhere. Here it forwards to Grafana's LGTM stack — an all-in-one image bundling Grafana (the UI), Loki (logs), Tempo (traces) and Mimir (metrics).

The core idea I wanted to learn: instrumentation should be decoupled from storage. The Collector speaks the open OTLP protocol, so the thing producing telemetry never needs to know which database eventually stores it. Swap the backend and nothing upstream changes.
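As a rough illustration of that decoupling, the backend appears in only one place in a Collector config: the exporters block. The fragment below is a sketch, not the build's actual file; the commented-out hostname is hypothetical.

```yaml
exporters:
  otlp:
    # The endpoint this build targets: the LGTM container, over OTLP gRPC.
    endpoint: otel-lgtm:4317
    # Pointing at another OTLP-capable backend would change only this line,
    # for example (hypothetical host):
    # endpoint: other-backend.example:4317
```

Receivers and processors stay untouched in either case, which is the point of putting an open protocol between the two halves.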

The stack

Tools under the hood

Two containers, one open protocol between them. Here is the role of each piece.

agent

OTel Collector

The data-collecting workhorse. It scrapes metrics from configured receivers, batches them, and exports them over OTLP to the backend.

receiver

hostmetrics

Reads the machine's own vitals every 10 seconds — CPU, memory, disk, filesystem, network and load.

receiver

prometheus

A second receiver that scrapes a Prometheus-style endpoint, letting the Collector pull in app metrics alongside host metrics.

backend

Grafana LGTM

An all-in-one image: Grafana for dashboards, plus Loki, Tempo and Mimir as the stores for logs, traces and metrics.

protocol

OTLP gRPC

The OpenTelemetry Protocol over gRPC on port 4317: the open wire format carrying telemetry from Collector to backend.

packaging

Docker Compose

Runs both containers on a shared network with a persistent volume for Grafana, so dashboards survive restarts.

The pipeline

From a CPU tick to a dashboard

The Collector's config file describes a pipeline in three stages — receivers, processors, exporters. Telemetry flows through it like water through pipes:

  1. Receive

    The hostmetrics receiver scrapes CPU, memory, disk, filesystem, network and load every 10s; a prometheus receiver pulls a scrape target on the side.

  2. Process

    A batch processor groups readings together before export, cutting overhead and smoothing out network chatter.

  3. Export

    The otlp exporter ships the batched metrics over gRPC to the LGTM container at otel-lgtm:4317.

  4. Store

    Mimir, inside the LGTM image, persists the metrics; Loki and Tempo stand ready for logs and traces on the same backend.

  5. Visualise

    Grafana on port 3000 queries the store and renders dashboards from series like system_cpu_time and system_memory_usage.
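The three Collector stages above could be expressed in a config file along these lines. This is a minimal sketch under stated assumptions rather than the build's exact file: the Prometheus job name and scrape target are hypothetical, and the insecure TLS setting assumes plain gRPC inside a private container network.

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s   # matches the 10-second scrape described above
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
      load:
  prometheus:
    config:
      scrape_configs:
        - job_name: app                 # hypothetical job name
          scrape_interval: 10s          # assumed; not stated in the write-up
          static_configs:
            - targets: ["app:9090"]     # hypothetical scrape target

processors:
  batch:                        # default batching settings

exporters:
  otlp:
    endpoint: otel-lgtm:4317
    tls:
      insecure: true            # assumption: no TLS between the two containers

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, prometheus]
      processors: [batch]
      exporters: [otlp]
```

The service.pipelines block is what actually connects the pieces: a receiver, processor or exporter that is defined but not listed in a pipeline does nothing.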

How it runs

Two containers, one command

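The stack comes up with a single docker compose up -d. The interesting part is how the pieces find each other: both containers sit on a shared Compose network, so the Collector reaches the backend by its service name at otel-lgtm:4317.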
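A Compose file for this kind of setup might look roughly like the sketch below. The image names are the publicly published ones, but the config filename, the volume name and the Grafana data path inside the LGTM image are assumptions, not details taken from this build.

```yaml
services:
  otel-lgtm:
    image: grafana/otel-lgtm
    ports:
      - "3000:3000"                    # Grafana UI
    volumes:
      - grafana-data:/data/grafana     # assumed data path inside the image

  otel-collector:
    # The contrib distribution is the one that ships the hostmetrics receiver.
    image: otel/opentelemetry-collector-contrib
    volumes:
      # Hypothetical local filename, mounted at the image's default config path.
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
    depends_on:
      - otel-lgtm

volumes:
  grafana-data:                        # named volume so dashboards survive restarts
```

Compose places both services on a default network and registers each service name in DNS, which is why the exporter can address the backend simply as otel-lgtm:4317 without any published port for OTLP.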

Grafana opens at localhost:3000 with a default admin / admin login, where the live host metrics are ready to chart.

Reflection

What rebuilding it taught me