Service Health Checker — Built From Scratch

What it is

A heartbeat monitor for your services

When you run several websites, the question that matters most is the simplest: are they up right now? This build answers it continuously. It holds a small list of target URLs, and whenever asked, it sends each one a quick HTTP request and records whether it responded — and with what status code.

Crucially, it doesn't store or graph anything itself. Instead it exposes a single /metrics endpoint that speaks the Prometheus exposition format — plain text lines like http_status{target="site"} 200. A monitoring system scrapes that endpoint on a schedule, builds the history, draws the dashboards and fires the alerts. The checker's whole job is to be the honest little sensor at the bottom of that stack.
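To make the format concrete, here is a minimal sketch of turning a set of results into exposition text. The function names and the HELP wording are my own for illustration; only the metric name http_status and the target label come from the example line above.

```python
def _escape_label(value: str) -> str:
    # The text format requires backslash, double quote and newline
    # to be escaped inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_metrics(statuses: dict[str, int]) -> str:
    """Render {target name: status code} as Prometheus exposition text."""
    lines = [
        "# HELP http_status Last HTTP status code per target, 0 if no response.",
        "# TYPE http_status gauge",
    ]
    for name, code in statuses.items():
        lines.append(f'http_status{{target="{_escape_label(name)}"}} {code}')
    # The format is line-based, so the final line is newline-terminated too.
    return "\n".join(lines) + "\n"


print(format_metrics({"site": 200, "blog": 0}), end="")
```

The HELP and TYPE comment lines are optional, but they let the scraper treat the value as a gauge and attach a description to it.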

The core idea I wanted to learn: good monitoring separates measuring from storing. This service only measures — it exports a number on demand. By following Prometheus's pull model, a fifty-line script plugs straight into a full observability pipeline.

The stack

Tools under the hood

The point of this rebuild was how little it takes to be a proper metrics source. Here is what each piece does.

web server

Flask

A minimal Python web framework. It exists here to serve exactly one route — /metrics — and nothing more.
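A one-route Flask app can be as small as the sketch below. It is an illustration rather than the actual source: collect_metrics is a hypothetical placeholder standing in for the real probing and formatting work, and it returns a fixed line so the example runs on its own.

```python
from flask import Flask, Response

app = Flask(__name__)


def collect_metrics() -> str:
    # Hypothetical placeholder: the real checker would probe its targets here.
    return 'http_status{target="site"} 200\n'


@app.route("/metrics")
def metrics() -> Response:
    # Prometheus text exposition format is plain text, version 0.0.4.
    return Response(
        collect_metrics(),
        content_type="text/plain; version=0.0.4; charset=utf-8",
    )
```

Because the handler builds its answer from scratch on every request, the app object holds no state between scrapes.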

probing

HTTP requests

For each target the checker sends a GET request with a timeout and reads back the status code, or records 0 if nothing answers.
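A single probe might look like the sketch below. To stay dependency-free it uses the standard library's urllib, which is my assumption and may differ from the HTTP client the build actually uses. The 5-second default timeout is also an assumption.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no usable response arrives."""
    try:
        # urlopen follows redirects, so the code reported is the final one.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx or 5xx reply still means the server answered; keep its code.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, garbled reply or bad URL.
        return 0
```

HTTPError is caught first because it is a subclass of URLError. Without that ordering, a 503 would be reported as 0 and an erroring server would look the same as an unreachable one.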

format

Prometheus metrics

Results are emitted as Prometheus exposition text — one labelled line per target — the lingua franca of modern monitoring.

config

Target list

A simple map of name-to-URL defines what gets checked. Adding a site to monitor is a one-line edit.
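As a rough picture of that map, here is a sketch with made-up names and example.com URLs. The validate_targets helper is not part of the original build; it is an optional extra I added to catch a mistyped URL at start-up rather than at scrape time.

```python
from urllib.parse import urlsplit

# Hypothetical targets, for illustration only.
TARGETS: dict[str, str] = {
    "site": "https://example.com/",
    "blog": "https://blog.example.com/",
    # Adding a site to monitor is one more line:
    "shop": "https://shop.example.com/health",
}


def validate_targets(targets: dict[str, str]) -> None:
    """Raise ValueError if any target is not an absolute http(s) URL."""
    for name, url in targets.items():
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"target {name!r} has an invalid URL: {url!r}")


validate_targets(TARGETS)
```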

packaging

Docker

The whole thing ships as a container, so it drops into any host or monitoring network with no Python setup required.

integration

Shared network

It joins the monitoring stack's own network, so the scraper can reach it by name and pull metrics on its schedule.

How a check works

From a scrape to a number

Every time the monitoring system comes knocking, the same quick cycle runs — and it's deliberately stateless:

Scraper requests metrics (live)
The monitoring system hits the /metrics endpoint on its regular interval.
Probe each target (live)
The checker sends an HTTP request to every URL in its list, one after another, with a timeout guarding each.
Record the status (live)
It captures each response's status code, or 0 when a target fails to answer at all.
Format as metrics (live)
The results become labelled Prometheus lines, one per target, ready to parse.
Return the text (live)
The endpoint replies with plain text, and the scraper stores the snapshot in its time-series database.
Latency & TLS checks (future)
Response time and certificate expiry per target are not exported yet. They are left as future work; for now the checker reports status codes only.
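The live steps above can be sketched as one stateless function. This is an illustration under assumptions, not the original code: the name scrape is mine, the probe is passed in as a parameter so the example runs without touching the network, and target names are assumed to be plain identifiers that need no escaping.

```python
from typing import Callable, Mapping


def scrape(targets: Mapping[str, str], probe: Callable[[str], int]) -> str:
    """Probe every target in order and return the exposition text for one scrape."""
    lines = []
    for name, url in targets.items():
        try:
            code = probe(url)
        except Exception:
            # One misbehaving target must not break the whole scrape.
            code = 0
        lines.append(f'http_status{{target="{name}"}} {code}')
    return "\n".join(lines) + "\n"


def fake_probe(url: str) -> int:
    # Stand-in for a real HTTP probe: one host "times out", the rest answer 200.
    if "down" in url:
        raise TimeoutError("no answer")
    return 200


print(scrape({"site": "https://example.com/",
              "api": "https://down.example.com/"}, fake_probe), end="")
```

Nothing is kept between calls, so two scrapes a second apart are two independent measurements. Because targets are probed one after another, the worst-case scrape time is roughly the per-probe timeout multiplied by the number of targets, which is worth keeping below the scraper's own timeout.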

Why the pull model

A sensor, not a dashboard

Keeping the checker deliberately dumb is the design — and it pays off in three ways:

Stateless by design: it stores nothing. Every scrape is a fresh measurement, so there's no database to corrupt or back up.
Standards-based: because it speaks the Prometheus format, any compatible monitoring stack can use it with zero custom glue.
Trivially composable: graphing, alerting and history all live in the monitoring system — this service just feeds it honest numbers.

A failed target reports 0 instead of crashing — so "the site is down" is itself a clean, alertable signal rather than a gap in the data.

Reflection

What building it taught me

Measuring and storing are different jobs. The Prometheus pull model clicked once I saw that my service only needs to answer "what's true right now" — everything else is someone else's concern.
A standard format is a superpower. Emitting a handful of well-formed metric lines was all it took to plug a hand-written script into a real observability pipeline.
Failure is a value, not an error. Reporting 0 for an unreachable target turns an outage into a first-class signal you can alert on, instead of a hole in the graph.
Small and containerised travels well. Wrapping fifty lines in Docker meant the checker drops into any monitoring network without needing a Python setup on the host.
