From-Scratch Build · Observability
A full telemetry pipeline built from scratch: an OpenTelemetry Collector scraping host metrics and feeding them into a Grafana LGTM backend — Loki, Grafana, Tempo and Mimir — so I can see exactly what a machine is doing, live, on a dashboard.
What it is
Observability is the ability to ask, after the fact, why a system behaved the way it did — using the signals it emits. Those signals come in three flavours: metrics (numbers over time, like CPU usage), logs (timestamped text), and traces (the path of a request through a system). This build wires up a complete pipeline that collects them and puts them on a dashboard.
I built it to learn the modern, vendor-neutral way of doing this. OpenTelemetry is the open standard for collecting telemetry; its Collector is a single agent that gathers signals and forwards them anywhere. Here it forwards to Grafana's LGTM stack — an all-in-one image bundling Grafana (the UI), Loki (logs), Tempo (traces) and Mimir (metrics).
The core idea I wanted to learn: instrumentation should be decoupled from storage. The Collector speaks the open OTLP protocol, so the thing producing telemetry never needs to know which database eventually stores it. Swap the backend and nothing upstream changes.
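As a rough illustration of that decoupling, here is how the export side of a Collector config might look. The commented-out endpoint is a made-up placeholder, not part of this build. The point is only that moving to a different OTLP-speaking backend would touch this one block and nothing upstream of it.

```yaml
exporters:
  otlp:
    # This build's backend: the LGTM container, reached over gRPC.
    endpoint: otel-lgtm:4317
    # Hypothetical alternative: any other OTLP-capable backend.
    # endpoint: other-backend.example:4317
```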
The stack
Two containers, one open protocol between them. Here is the role of each piece.
OpenTelemetry Collector: the data-collecting workhorse. It gathers metrics through its configured receivers, batches them, and exports them over OTLP to the backend.
hostmetrics receiver: reads the machine's own vitals (CPU, memory, disk, filesystem, network and load) every 10 seconds.
prometheus receiver: a second receiver that scrapes a Prometheus-style endpoint, letting the Collector pull in app metrics alongside host metrics.
Grafana LGTM (grafana/otel-lgtm): an all-in-one image with Grafana for dashboards, plus Loki, Tempo and Mimir as the stores for logs, traces and metrics.
OTLP: the OpenTelemetry Protocol, the open wire format carrying telemetry from Collector to backend, here over gRPC on port 4317.
Docker Compose: runs both containers on a shared network with a persistent volume for Grafana, so dashboards survive restarts.
The pipeline
The Collector's config file describes a pipeline in three stages: receivers, processors and exporters. Telemetry flows through them like water through pipes, then on into the backend for storage and display:
The hostmetrics receiver scrapes CPU, memory, disk, filesystem, network and load every 10s; a prometheus receiver pulls a scrape target on the side.
A batch processor groups readings together before export, cutting overhead and smoothing out network chatter.
The otlp exporter ships the batched metrics over gRPC to the LGTM container at otel-lgtm:4317.
Mimir, inside the LGTM image, persists the metrics; Loki and Tempo stand ready for logs and traces on the same backend.
Grafana on port 3000 queries the store and renders dashboards from series like system_cpu_time and system_memory_usage.
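The Collector side of this flow can be sketched as a config file along the following lines. This is a reconstruction from the description above rather than the exact file: the scrape job name and target are hypothetical, and the plaintext TLS setting is an assumption about traffic inside the compose network. The hostmetrics and prometheus receivers ship in the Collector's contrib distribution.

```yaml
# otel-collector-config.yaml (sketch)
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
      load:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app            # hypothetical job name
          scrape_interval: 10s
          static_configs:
            - targets: ["example-app:9464"]  # hypothetical scrape target

processors:
  batch:          # default batching settings

exporters:
  otlp:
    endpoint: otel-lgtm:4317
    tls:
      insecure: true   # assumption: plaintext gRPC between the two containers

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, prometheus]
      processors: [batch]
      exporters: [otlp]
```

Only components listed under service.pipelines are actually started, so a receiver that is defined but not wired into a pipeline collects nothing.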
How it runs
The stack comes up with a single docker compose up -d; the interesting parts are how the pieces find each other:
The grafana/otel-lgtm image exposes Grafana on 3000 and OTLP inputs on 4317 (gRPC) and 4318 (HTTP).
The Collector mounts otel-collector-config.yaml read-only and starts with --config pointed at it.
depends_on makes the backend start first.
The grafana_data volume keeps dashboards and settings across restarts.
Grafana opens at localhost:3000 with a default admin / admin login, where the live host metrics are ready to chart.
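Put together, a compose file matching that description might look roughly like this. The service names, the Collector image, and the in-container paths are my assumptions for the sketch. Only the ports, the config filename, depends_on and the grafana_data volume come from the setup described above.

```yaml
# docker-compose.yaml (sketch)
services:
  otel-lgtm:
    image: grafana/otel-lgtm
    ports:
      - "3000:3000"   # Grafana UI
      - "4317:4317"   # OTLP over gRPC
      - "4318:4318"   # OTLP over HTTP
    volumes:
      - grafana_data:/data/grafana   # assumed data path inside the image

  otel-collector:
    image: otel/opentelemetry-collector-contrib   # assumed: contrib build, which includes hostmetrics
    command: ["--config=/etc/otelcol-contrib/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
    depends_on:
      - otel-lgtm

volumes:
  grafana_data:
```

Compose puts both services on a default shared network, which is why the Collector can reach the backend by its service name at otel-lgtm:4317. Note that depends_on in this short form only orders container start-up; it does not wait for the backend to be ready to accept data.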
Reflection