Lost & Found — a CRUD app with a matching engine at its heart

What it is

An app whose heart is a matcher

On the surface it's CRUD: people file lost reports and found reports, each describing an item — what it is, where, when, what colour — persisted in SQLite. But a list of reports nobody can cross-reference is useless. The real product is the engine that scores every found item against the lost ones and surfaces the likely matches, ranked by confidence.

The engine is real Python, in lostfound/, with pytest tests. The interactive demo lower down runs a faithful re-implementation of the same scoring in your browser, over the exact data/sample_reports.json that ships in the repo — so the numbers you see here match what demo.py prints.

The data model

One table of reports

field	meaning
type	`lost` or `found`
category	coarse class — umbrella, phone, keys, …
description	free text — the main similarity signal
location	where, e.g. `library-3rd-floor` (hyphen tokens)
date	ISO date the item was lost / found
color	optional dominant colour
status	`open` → `matched` / `closed`

The score

Five signals, one confidence

For a given report, every open report of the opposite type is scored by combining five signals into a single confidence in [0, 1]. Weights sum to 1, so the score stays in range.

signal	weight	how it's computed
text	0.40	TF-IDF cosine similarity of the descriptions
category	0.25	1 if categories agree, else 0
location	0.15	Jaccard overlap of location tokens
date	0.10	1.0 same day, decaying to 0 over a 30-day window
color	0.10	1 agree, 0 clash, 0.5 when unknown

The date is a hard veto. A found item dated before the loss — or more than 30 days after — is dropped entirely, because a thing cannot be found before it is lost. Candidates at or above the threshold (0.45) are auto-suggested; confirming a match closes both reports.

# lostfound/matcher.py — the combine step
score = sum(self.weights[k] * signals[k] for k in self.weights)
# date_signal returns None to VETO an impossible candidate
if date_score is None: continue

Try it

Match a lost item, live

Pick a lost report. The demo scores it against every open found report in the sample data, exactly as the Python engine does — same weights, same TF-IDF, same date veto.

Lost report

Reflection

What building it taught me

The matcher is the product. CRUD stores the reports; the value is entirely in connecting them.
Fuzzy beats exact. Real descriptions never match word-for-word, so TF-IDF cosine does the heavy lifting.
Weighting the signals is the craft. How much category, location and timing each count decides whether matches feel right.
Some constraints are hard, not soft. "Found before lost" isn't a low score — it's impossible, so the date vetoes the candidate outright.
Thresholds are a UX decision. Suggest too eagerly and it's noise; too cautiously and reunions never happen.