From-Scratch Build · Observability

Service Health Checker

A tiny service that quietly polls a list of websites, asks each one "are you up?", and publishes the answers as metrics a monitoring system can graph and alert on. Built from scratch to learn how uptime monitoring actually works under the hood.

Python · Flask · Prometheus · Docker · Monitoring

What it is

A heartbeat monitor for your services

When you run several websites, the question that matters most is the simplest: are they up right now? This build answers it on demand. It holds a small list of target URLs, and whenever asked, it sends each one a quick HTTP request and records whether it responded and, if so, with what status code.

Crucially, it doesn't store or graph anything itself. Instead it exposes a single /metrics endpoint that speaks the Prometheus exposition format — plain text lines like http_status{target="site"} 200. A monitoring system scrapes that endpoint on a schedule, builds the history, draws the dashboards and fires the alerts. The checker's whole job is to be the honest little sensor at the bottom of that stack.
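To make the format concrete, here is a minimal sketch of how a set of results could be turned into exposition text. The function names (`format_metrics`, `escape_label`) and the `# HELP` wording are my own assumptions for illustration, not necessarily what the build uses; only the `http_status{target="..."}` line shape comes from the description above.

```python
def escape_label(value: str) -> str:
    """Escape a label value: backslash, double quote and newline."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_metrics(statuses: dict[str, int], metric: str = "http_status") -> str:
    """Render one labelled sample line per target, preceded by HELP/TYPE comments."""
    lines = [
        f"# HELP {metric} Last HTTP status code per target (0 = no response).",
        f"# TYPE {metric} gauge",
    ]
    for target, code in statuses.items():
        lines.append(f'{metric}{{target="{escape_label(target)}"}} {code}')
    # The text format expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"


print(format_metrics({"site": 200, "blog": 0}), end="")
```

The `# HELP` and `# TYPE` comment lines are optional in the text format, but they let the scraper know the metric is a gauge, which fits a value that can go up or down between scrapes.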

The core idea I wanted to learn: good monitoring separates measuring from storing. This service only measures — it exports a number on demand. By following Prometheus's pull model, a fifty-line script plugs straight into a full observability pipeline.

The stack

Tools under the hood

The point of this rebuild was how little it takes to be a proper metrics source. Here is what each piece does.

web server

Flask

A minimal Python web framework. It exists here to serve exactly one route — /metrics — and nothing more.
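A single-route Flask app can be very small. The sketch below assumes a placeholder `collect_metrics` function standing in for the real probing step; that name and its hard-coded return value are hypothetical and only there so the example runs on its own.

```python
from flask import Flask, Response

app = Flask(__name__)


def collect_metrics() -> str:
    """Placeholder for the real probing step; returns exposition text."""
    return 'http_status{target="site"} 200\n'


@app.route("/metrics")
def metrics() -> Response:
    # Prometheus reads plain text; version=0.0.4 identifies the text format.
    return Response(
        collect_metrics(),
        content_type="text/plain; version=0.0.4; charset=utf-8",
    )
```

Saved as, say, `checker.py`, this could be started with `flask --app checker run`. Any path other than /metrics returns a 404, which matches the "one route and nothing more" idea.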

probing

HTTP requests

For each target the checker sends a GET request with a timeout and reads back the status code — or marks it down if nothing answers.
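A probe along these lines might look like the sketch below. It uses the standard library's `urllib` so it runs without extra packages; the original build may well use a different HTTP client. The function name `probe` and the 5-second default timeout are assumptions.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrives."""
    try:
        # urlopen sends a GET by default and follows redirects.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just with a 4xx/5xx code: still a real status.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, malformed URL or bad reply.
        return 0
```

`HTTPError` has to be caught before the broader `OSError`, because it is a subclass of it; otherwise a 404 or 500 would be flattened into "down". Note that because redirects are followed, the code returned is that of the final response.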

format

Prometheus metrics

Results are emitted as Prometheus exposition text — one labelled line per target — the lingua franca of modern monitoring.

config

Target list

A simple map of name-to-URL defines what gets checked. Adding a site to monitor is a one-line edit.
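As an illustration, the map could be a plain dictionary like the one below. The variable name `TARGETS` and every name and URL in it are placeholders, not the real list.

```python
# Hypothetical targets: the names and URLs below are placeholders.
TARGETS: dict[str, str] = {
    "site": "https://example.com",
    "blog": "https://blog.example.com",
    # Adding a site to monitor is one more line:
    "docs": "https://docs.example.com",
}

for name, url in TARGETS.items():
    print(f"{name} -> {url}")
```

The key becomes the `target` label on the metric, so short, stable names work better than full URLs.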

packaging

Docker

The whole thing ships as a container, so it drops into any host or monitoring network with no Python setup required.

integration

Shared network

It joins the monitoring stack's own network, so the scraper can reach it by name and pull metrics on its schedule.

How a check works

From a scrape to a number

Every time the monitoring system comes knocking, the same quick cycle runs — and it's deliberately stateless:

  1. Scraper requests metrics (live)

    The monitoring system hits the /metrics endpoint on its regular interval.

  2. Probe each target (live)

    The checker sends an HTTP request to every URL in its list, one after another, with a timeout guarding each.

  3. Record the status (live)

    It captures each response's status code — or a zero when a target fails to answer at all.

  4. Format as metrics (live)

    The results become labelled Prometheus lines, one per target, ready to parse.

  5. Return the text (live)

    The endpoint replies with plain text, and the scraper stores the snapshot in its time-series database.

  6. Latency & TLS checks (future)

    The checker could also export response time and certificate expiry per target. That is left as future work; for now it reports only each target's status code, with 0 meaning down.
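Steps 2 to 5 above can be sketched as one stateless function. This is an illustration under assumptions rather than the build's actual code: `scrape_once` and `fake_probe` are hypothetical names, and the probe is passed in as a parameter so the example runs offline with a stand-in instead of real HTTP calls.

```python
from typing import Callable, Mapping


def scrape_once(targets: Mapping[str, str], probe: Callable[[str], int]) -> str:
    """Run one stateless check cycle and return Prometheus exposition text."""
    lines = []
    for name, url in targets.items():      # step 2: one target after another
        try:
            code = probe(url)              # step 3: status code on success
        except Exception:
            code = 0                       # ...or zero when nothing answers
        lines.append(f'http_status{{target="{name}"}} {code}')  # step 4
    return "\n".join(lines) + "\n"         # step 5: plain text for the scraper


def fake_probe(url: str) -> int:
    """Stand-in for a real HTTP GET so the sketch runs offline."""
    if "down" in url:
        raise TimeoutError("no answer")
    return 200


text = scrape_once(
    {"site": "https://example.com", "api": "https://down.example.com"},
    fake_probe,
)
print(text, end="")
```

Nothing is kept between calls: each scrape builds its result from scratch, so restarting the checker loses no history. Probing sequentially also means the worst-case response time is roughly the number of targets times the timeout, which is worth keeping below the scraper's own timeout.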

Why the pull model

A sensor, not a dashboard

Keeping the checker deliberately dumb is the design — and it pays off in three ways:

A failed target reports 0 instead of crashing — so "the site is down" is itself a clean, alertable signal rather than a gap in the data.

Reflection

What building it taught me