Built From Scratch · Python · Matching

Lost & Found

Someone loses a blue umbrella on the third floor; someone else hands one in at reception. The whole point of a lost-and-found is to connect those two reports — and that connection is a matching problem. Behind a simple CRUD form sits a genuinely interesting question: when are two descriptions of a thing actually the same thing?

Python 3 · SQLite · TF-IDF / scikit-learn · Multi-signal scoring · 23 pytest tests

What it is

An app whose heart is a matcher

On the surface it's CRUD: people file lost reports and found reports, each describing an item — what it is, where, when, what colour — persisted in SQLite. But a list of reports nobody can cross-reference is useless. The real product is the engine that scores every found item against the lost ones and surfaces the likely matches, ranked by confidence.

The engine is real Python, in lostfound/, with pytest tests. The interactive demo lower down runs a faithful re-implementation of the same scoring in your browser, over the exact data/sample_reports.json that ships in the repo — so the numbers you see here match what demo.py prints.

The data model

One table of reports

field        meaning
type         lost or found
category     coarse class: umbrella, phone, keys, …
description  free text, the main similarity signal
location     where, e.g. library-3rd-floor (hyphen-separated tokens)
date         ISO date the item was lost / found
color        optional dominant colour
status       open / matched / closed
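
As a rough illustration of the table above, here is one way the schema could look in SQLite. The table name `reports`, the `id` column, and the `NOT NULL` / `CHECK` constraints are assumptions made for this sketch, not necessarily what the repo defines; the sample row is invented.

```python
import sqlite3

# Hypothetical schema: column names follow the field table, everything else
# (table name, id column, constraints) is an assumption for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    id          INTEGER PRIMARY KEY,
    type        TEXT NOT NULL CHECK (type IN ('lost', 'found')),
    category    TEXT NOT NULL,
    description TEXT NOT NULL,
    location    TEXT NOT NULL,
    date        TEXT NOT NULL,   -- ISO date string, e.g. 2024-05-01
    color       TEXT,            -- optional, NULL when unknown
    status      TEXT NOT NULL DEFAULT 'open'
                CHECK (status IN ('open', 'matched', 'closed'))
)
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(SCHEMA)

# Invented sample row: a lost blue umbrella.
conn.execute(
    "INSERT INTO reports (type, category, description, location, date, color) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("lost", "umbrella", "blue umbrella", "library-3rd-floor", "2024-05-01", "blue"),
)

# New reports start out open because of the column default.
row = conn.execute("SELECT type, status, color FROM reports").fetchone()
print(row)
```

Keeping `color` nullable is what lets the colour signal distinguish "unknown" from "clash" later on.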

The score

Five signals, one confidence

For a given report, every open report of the opposite type is scored by combining five signals into a single confidence in [0, 1]. Each signal lies in [0, 1] and the weights sum to 1, so the weighted sum stays in range.

signal    weight  how it's computed
text      0.40    TF-IDF cosine similarity of the descriptions
category  0.25    1 if categories agree, else 0
location  0.15    Jaccard overlap of location tokens
date      0.10    1.0 same day, decaying to 0 over a 30-day window
color     0.10    1 agree, 0 clash, 0.5 when unknown
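
The four non-date signals in the table can be sketched in a few lines. The project itself uses scikit-learn for TF-IDF; to stay dependency-free, this sketch swaps in a small standard-library TF-IDF, so its text scores will not match scikit-learn's exactly. All function names here are hypothetical, and the case-insensitive comparison of categories and colours is an assumption.

```python
import math
import re
from collections import Counter
from typing import Optional


def _words(text: str) -> list:
    """Lowercase and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_signal(a: str, b: str, corpus: list) -> float:
    """Cosine similarity of TF-IDF vectors (stdlib stand-in, not scikit-learn)."""
    docs = [_words(d) for d in corpus]
    n = len(docs)
    # Document frequency: in how many corpus descriptions each word appears.
    df = Counter(w for d in docs for w in set(d))

    def vec(text: str) -> dict:
        tf = Counter(_words(text))
        # Smoothed idf, so unseen words do not divide by zero.
        return {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}

    va, vb = vec(a), vec(b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm = math.sqrt(sum(w * w for w in va.values())) * math.sqrt(
        sum(w * w for w in vb.values())
    )
    return dot / norm if norm else 0.0


def category_signal(a: str, b: str) -> float:
    """1 if the categories agree, else 0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def location_signal(a: str, b: str) -> float:
    """Jaccard overlap of hyphen-separated location tokens."""
    ta, tb = set(a.lower().split("-")), set(b.lower().split("-"))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def color_signal(a: Optional[str], b: Optional[str]) -> float:
    """1 agree, 0 clash, 0.5 when either colour is unknown."""
    if not a or not b:
        return 0.5
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


# library-3rd-floor vs library-2nd-floor share 2 of 4 distinct tokens.
print(location_signal("library-3rd-floor", "library-2nd-floor"))  # 0.5
```

Returning 0.5 for an unknown colour keeps a missing field neutral: it neither rewards nor penalises the candidate.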

The date is a hard veto. A found item dated before the loss — or more than 30 days after — is dropped entirely, because a thing cannot be found before it is lost. Candidates at or above the threshold (0.45) are auto-suggested; confirming a match closes both reports.
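
A minimal sketch of a date signal with this veto behaviour follows. The text only says the score decays to 0 over 30 days; the linear shape used here is an assumption, as are the function and constant names.

```python
from datetime import date
from typing import Optional

WINDOW_DAYS = 30  # from the text: candidates more than 30 days after are dropped


def date_signal(lost_on: date, found_on: date, window: int = WINDOW_DAYS) -> Optional[float]:
    """Return a score in [0, 1], or None to veto an impossible candidate."""
    gap = (found_on - lost_on).days
    if gap < 0 or gap > window:
        # Found before it was lost, or too long afterwards: veto.
        return None
    # Assumed linear decay: 1.0 on the same day, 0.0 at exactly `window` days.
    return 1.0 - gap / window


lost = date.fromisoformat("2024-05-01")
print(date_signal(lost, date.fromisoformat("2024-05-01")))  # 1.0
print(date_signal(lost, date.fromisoformat("2024-05-16")))  # 0.5
print(date_signal(lost, date.fromisoformat("2024-04-30")))  # None
```

Using `None` rather than 0 for the veto matters: a score of 0 would only lower the total by the date weight, while `None` lets the caller drop the candidate outright.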

# lostfound/matcher.py: the combine step (excerpt)
# date_signal returns None to veto an impossible candidate
if date_score is None:
    continue
# weighted sum of the five signals
score = sum(self.weights[k] * signals[k] for k in self.weights)
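
To make the arithmetic concrete, here is a standalone sketch of the combine step followed by the 0.45 threshold and ranking. The weights and threshold come from the text above; the function names, candidate ids, and signal values are invented for illustration.

```python
from typing import Optional

WEIGHTS = {"text": 0.40, "category": 0.25, "location": 0.15, "date": 0.10, "color": 0.10}
THRESHOLD = 0.45


def combine(signals: dict, weights: dict = WEIGHTS) -> Optional[float]:
    """Weighted sum of the signals, or None if the date signal vetoed the pair."""
    if signals["date"] is None:
        return None
    return sum(weights[k] * signals[k] for k in weights)


def rank(candidates: dict, threshold: float = THRESHOLD) -> list:
    """Keep candidates at or above the threshold, best first."""
    scored = []
    for cid, signals in candidates.items():
        score = combine(signals)
        if score is not None and score >= threshold:
            scored.append((cid, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented signal values for three found reports scored against one lost report.
candidates = {
    # 0.24 + 0.25 + 0.075 + 0.09 + 0.10 = 0.755, so it is suggested
    "found-1": {"text": 0.6, "category": 1.0, "location": 0.5, "date": 0.9, "color": 1.0},
    # strong on every other signal, but the date veto removes it
    "found-2": {"text": 0.9, "category": 1.0, "location": 1.0, "date": None, "color": 1.0},
    # 0.08 + 0 + 0.075 + 0.10 + 0.05 = 0.305, below the threshold
    "found-3": {"text": 0.2, "category": 0.0, "location": 0.5, "date": 1.0, "color": 0.5},
}

matches = rank(candidates)
print(matches)
```

Only `found-1` survives: `found-2` shows that no amount of textual similarity can override the veto, and `found-3` shows that a same-day find with a different category does not reach the threshold.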

Try it

Match a lost item, live

Pick a lost report. The demo scores it against every open found report in the sample data, exactly as the Python engine does — same weights, same TF-IDF, same date veto.

Reflection

What building it taught me