Observability Stack with OpenTelemetry

What it is

Metrics, logs and traces in one place

Observability is the ability to ask, after the fact, why a system behaved the way it did, using the signals it emits. Those signals come in three flavours: metrics (numbers over time, such as CPU usage), logs (timestamped text) and traces (the path of a request through a system). This build wires up a pipeline that collects those signals and puts them on a dashboard. So far it feeds metrics, with the log and trace stores in place and ready.
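It can help to picture the three signal types as three record shapes. The sketch below uses hypothetical Python dataclasses (MetricPoint, LogRecord, Span) that only loosely echo the OpenTelemetry data model. The real protobuf messages carry many more fields, such as resources and attributes, and the values here are made up.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricPoint:
    """A number sampled at a moment in time, e.g. CPU utilization."""
    name: str
    value: float
    time_unix_nano: int

@dataclass
class LogRecord:
    """A timestamped piece of text."""
    body: str
    time_unix_nano: int
    severity: str = "INFO"

@dataclass
class Span:
    """One hop of a request; spans sharing a trace_id form the request's path."""
    trace_id: str
    span_id: str
    name: str
    start_unix_nano: int
    end_unix_nano: int
    parent_span_id: Optional[str] = None

    def duration_ms(self) -> float:
        return (self.end_unix_nano - self.start_unix_nano) / 1e6

now = time.time_ns()
cpu = MetricPoint("system.cpu.utilization", 0.42, now)
log = LogRecord("checkout failed: card declined", now, severity="ERROR")
# A parent span and a child span inside it: together they trace one request.
root = Span("t1", "s1", "GET /checkout", now, now + 120_000_000)
child = Span("t1", "s2", "charge card", now + 10_000_000, now + 90_000_000, parent_span_id="s1")
```

The parent/child link through parent_span_id is what lets a trace viewer draw the request's path as a tree.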

I built it to learn the modern, vendor-neutral way of doing this. OpenTelemetry is the open standard for collecting telemetry. Its Collector is a single agent that gathers signals and forwards them to whichever backends its exporters are configured for. Here it forwards to Grafana's LGTM stack: an all-in-one image bundling Grafana (the UI), Loki (logs), Tempo (traces) and Mimir (metrics).

The core idea I wanted to learn: instrumentation should be decoupled from storage. The Collector speaks the open OTLP protocol, so the thing producing telemetry never needs to know which database eventually stores it. Swap the backend and nothing upstream changes.
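A minimal Python sketch of that decoupling follows. The producer talks only to an abstract exporter, so swapping the destination is a one-line change at the call site. Exporter, StdoutExporter, InMemoryExporter and produce are hypothetical names for illustration, not the Collector's actual plugin interface (which is written in Go).

```python
from typing import Protocol

class Exporter(Protocol):
    """Anything that can accept a batch of data points."""
    def export(self, batch: list[dict]) -> None: ...

class StdoutExporter:
    """Stands in for one backend: prints each point."""
    def export(self, batch: list[dict]) -> None:
        for point in batch:
            print(point["name"], point["value"])

class InMemoryExporter:
    """Stands in for another backend: keeps points in a list."""
    def __init__(self) -> None:
        self.stored: list[dict] = []

    def export(self, batch: list[dict]) -> None:
        self.stored.extend(batch)

def produce(exporter: Exporter) -> None:
    # The producer builds the same data whichever backend ends up receiving it.
    batch = [{"name": "system.memory.usage", "value": 512.0}]
    exporter.export(batch)

memory = InMemoryExporter()
produce(memory)            # stored for later inspection
produce(StdoutExporter())  # printed instead; produce() itself is unchanged
```

In the real stack the shared contract is the OTLP wire format rather than a Python interface, but the effect is the same: the producer never names the store.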

The stack

Tools under the hood

Two containers, one open protocol between them. Here is the role of each piece.

agent

OTel Collector

The data-collecting workhorse. It runs the configured receivers to collect metrics, batches them in a processor and exports them over OTLP to the backend.
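The Collector's behaviour is declared in otel-collector-config.yaml. That file isn't reproduced here, so below is a hedged Python mirror of what such a config plausibly contains. The component keys follow the Collector's documented configuration schema:

* hostmetrics, with collection_interval and scrapers
* prometheus, with scrape_configs
* batch
* otlp, with endpoint and tls.insecure

The prometheus job name and target, and the use of tls.insecure, are assumptions. The helper undefined_components is hypothetical. It mimics the kind of reference check the Collector performs when it loads a config, where every component named in a pipeline must be defined.

```python
# Hypothetical mirror of otel-collector-config.yaml as a Python dict.
CONFIG = {
    "receivers": {
        "hostmetrics": {
            "collection_interval": "10s",
            "scrapers": {"cpu": {}, "memory": {}, "disk": {},
                         "filesystem": {}, "network": {}, "load": {}},
        },
        "prometheus": {
            "config": {
                "scrape_configs": [
                    # Placeholder job and target, not the original values.
                    {"job_name": "example-app",
                     "static_configs": [{"targets": ["app:9100"]}]},
                ]
            }
        },
    },
    "processors": {"batch": {}},
    "exporters": {
        # Plain-text gRPC to the backend's container name (assumed: no TLS).
        "otlp": {"endpoint": "otel-lgtm:4317", "tls": {"insecure": True}},
    },
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["hostmetrics", "prometheus"],
                "processors": ["batch"],
                "exporters": ["otlp"],
            }
        }
    },
}

def undefined_components(config: dict) -> list[str]:
    """Return pipeline references that have no matching top-level definition."""
    missing = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    missing.append(f"{pipeline_name}.{kind}.{component}")
    return missing
```

The service.pipelines section is what actually connects the pieces. A receiver that is defined but never listed in a pipeline simply does nothing.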

receiver

hostmetrics

Reads the machine's own vitals every 10 seconds — CPU, memory, disk, filesystem, network and load.
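To make "reads the machine's own vitals" concrete, here is a toy imitation using only the Python standard library. It is not how the receiver is implemented, and it covers only CPU count, load average and disk usage. The function names sample_host and scrape are hypothetical. os.getloadavg is unavailable on Windows, so the sketch assumes a Unix-like host.

```python
import os
import shutil
import time

def sample_host(path: str = "/") -> dict:
    """One snapshot of a few host vitals, standard library only (Unix-like systems)."""
    load_1m, load_5m, load_15m = os.getloadavg()  # not available on Windows
    disk = shutil.disk_usage(path)
    return {
        "time_unix_nano": time.time_ns(),
        "cpu_count": os.cpu_count(),
        "load_1m": load_1m,
        "load_5m": load_5m,
        "load_15m": load_15m,
        "disk_used_bytes": disk.used,
        "disk_free_bytes": disk.free,
    }

def scrape(interval_s: float, samples: int) -> list[dict]:
    """Take `samples` snapshots spaced `interval_s` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(sample_host())
        if i < samples - 1:
            time.sleep(interval_s)  # plays the role of the receiver's collection_interval
    return readings
```

Calling scrape(10, 3) mirrors the 10-second cadence and takes about 20 seconds.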

receiver

prometheus

A second receiver that scrapes a Prometheus-style endpoint, letting the Collector pull in app metrics alongside host metrics.
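A "Prometheus-style endpoint" serves plain text: optional # HELP and # TYPE comment lines, then one sample per line in the form name{label="value",...} value [timestamp]. The simplified reader below is a sketch. It ignores escaped quotes inside label values and discards the HELP/TYPE metadata, so it is not a full implementation of the format. parse_exposition is a hypothetical name, and the sample text is made up.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (name, labels, value) for every sample line in the text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        # float() also accepts the format's special values such as +Inf and NaN
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples

sample = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
process_open_fds 12
"""
```

Calling parse_exposition(sample) yields one tuple per sample line. Each label set becomes a dict, which is roughly what the Collector turns into metric attributes.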

backend

Grafana LGTM

An all-in-one image: Grafana for dashboards, plus Loki, Tempo and Mimir as the stores for logs, traces and metrics.

protocol

OTLP gRPC

The OpenTelemetry Protocol (OTLP) over gRPC on port 4317: the open wire format carrying telemetry from the Collector to the backend.
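The gRPC transport on 4317 needs generated protobuf stubs. OTLP also defines a JSON encoding over HTTP, which the LGTM image accepts on port 4318, so a standard-library sketch can show what an OTLP metrics message looks like. The field names follow the OTLP JSON encoding. The helper names, the scope name and the metric name are hypothetical. The sketch also assumes port 4318 is published to the host.

```python
import json
import time
import urllib.request

def gauge_payload(name: str, value: float, service: str) -> dict:
    """Build an OTLP/JSON metrics export request holding one gauge data point."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "manual-sketch"},
                "metrics": [{
                    "name": name,
                    "gauge": {"dataPoints": [{
                        "asDouble": value,
                        # 64-bit timestamps are encoded as decimal strings (protobuf JSON mapping)
                        "timeUnixNano": str(time.time_ns()),
                    }]},
                }],
            }],
        }]
    }

def build_request(payload: dict, base_url: str = "http://localhost:4318") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the OTLP/HTTP metrics endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/metrics",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the stack running, urllib.request.urlopen(build_request(gauge_payload("demo.queue.depth", 3.0, "demo-service"))) would hand the point straight to the backend, bypassing the Collector. The nesting (resource, then scope, then metric, then data point) is the same data model that gRPC carries in binary form.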

packaging

Docker Compose

Runs both containers on a shared network with a persistent volume for Grafana, so dashboards survive restarts.

The pipeline

From a CPU tick to a dashboard

The Collector's config file describes a pipeline in three stages — receivers, processors, exporters. Telemetry flows through it like water through pipes:

Receive live
The hostmetrics receiver scrapes CPU, memory, disk, filesystem, network and load every 10s; a prometheus receiver pulls a scrape target on the side.
Process live
A batch processor groups readings together before export, cutting overhead and smoothing out network chatter.
Export live
The otlp exporter ships the batched metrics over gRPC to the LGTM container at otel-lgtm:4317.
Store
Mimir, inside the LGTM image, persists the metrics; Loki and Tempo stand ready for logs and traces on the same backend.
Visualise
Grafana on port 3000 queries the store and renders dashboards from series like system_cpu_time and system_memory_usage.
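The first three stages can be simulated in a few lines. Receiving is an iterable of readings, processing is a batcher and exporting is a callback. Store and Visualise live in the backend and are not modelled. BatchProcessor and run_pipeline are hypothetical names. The batcher is only loosely modelled on the Collector's batch processor, which is configured with keys such as send_batch_size and timeout.

```python
import time
from typing import Callable, Iterable

class BatchProcessor:
    """Toy batcher: flush when the batch is full or older than timeout_s."""

    def __init__(self, export: Callable[[list], None], max_size: int = 8, timeout_s: float = 0.2):
        self.export = export
        self.max_size = max_size
        self.timeout_s = timeout_s
        self.buffer: list = []
        self.started = 0.0

    def add(self, item) -> None:
        if not self.buffer:
            self.started = time.monotonic()  # age of the batch starts with its first item
        self.buffer.append(item)
        # A real batcher would also flush on a timer; this toy only checks age when an item arrives.
        if len(self.buffer) >= self.max_size or time.monotonic() - self.started >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.export(self.buffer)
            self.buffer = []

def run_pipeline(readings: Iterable, processor: BatchProcessor) -> None:
    """Receive -> process -> export: push readings through the batcher, then flush the rest."""
    for reading in readings:
        processor.add(reading)
    processor.flush()

exported = []
run_pipeline(range(10), BatchProcessor(exported.append, max_size=4, timeout_s=60))
```

Ten readings leave as three export calls instead of ten. That reduction is exactly what the batch stage is for.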

How it runs

Two containers, one command

The stack comes up with a single docker compose up -d; the interesting parts are how the pieces find each other:

The backend: the grafana/otel-lgtm image exposes Grafana on 3000 and OTLP inputs on 4317 (gRPC) and 4318 (HTTP).
The collector: the second container mounts otel-collector-config.yaml read-only and starts with --config pointed at it.
The wiring: both join a shared Docker network so the Collector can reach the backend by its container name. depends_on makes Compose start the backend first, but it only orders startup and does not wait until the backend is ready to accept data.
Persistence: a named grafana_data volume keeps dashboards and settings across restarts.
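Because start order is not readiness, a client may briefly find the backend's ports closed. The sketch below polls a TCP port until it accepts connections. wait_for_port is a hypothetical helper, and an open port is only a rough proxy for "ready".

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True  # something is listening
        except OSError:
            time.sleep(interval_s)  # refused or timed out: try again shortly
    return False
```

From inside the shared network this would be wait_for_port("otel-lgtm", 4317). From the host it would be wait_for_port("localhost", 3000). Within Compose itself, the equivalent is a healthcheck on the backend combined with depends_on using condition: service_healthy.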

Grafana opens at localhost:3000 with a default admin / admin login, where the live host metrics are ready to chart.
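Before logging in, the UI can be checked from a script. Grafana serves a health endpoint at /api/health that returns JSON including a database field. The helpers below are a hypothetical sketch that assumes the default localhost:3000 address.

```python
import json
import urllib.request

def grafana_health(base_url: str = "http://localhost:3000", timeout_s: float = 5.0) -> dict:
    """Fetch Grafana's health JSON (needs the stack to be running)."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout_s) as resp:
        return json.load(resp)

def is_healthy(health: dict) -> bool:
    """Treat Grafana as up when it reports its database as ok."""
    return health.get("database") == "ok"
```

With the containers up, is_healthy(grafana_health()) should return True once Grafana has finished starting.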

Reflection

What rebuilding it taught me

The Collector is the keystone. One agent with a receiver/processor/exporter pipeline replaces a tangle of bespoke agents — and its config is just YAML.
OTLP is the decoupler. Because the open protocol sits between producer and store, the backend becomes a swappable detail rather than a lock-in.
"LGTM" is four tools in a trench coat. Grafana provides the UI while Loki, Tempo and Mimir each own one signal. Bundling them turns standing up a full backend into pulling a single image.
Batching is not optional. The batch processor looks trivial but it's what keeps a 10-second scrape interval from drowning the exporter in tiny requests.
