Built From Scratch · Python · Matching
Someone loses a blue umbrella on the third floor; someone else hands one in at reception. The whole point of a lost-and-found is to connect those two reports — and that connection is a matching problem. Behind a simple CRUD form sits a genuinely interesting question: when are two descriptions of a thing actually the same thing?
What it is
On the surface it's CRUD: people file lost reports and found reports, each describing an item — what it is, where, when, what colour — persisted in SQLite. But a list of reports nobody can cross-reference is useless. The real product is the engine that scores every found item against the lost ones and surfaces the likely matches, ranked by confidence.
The engine is real Python, in lostfound/, with pytest tests. The interactive demo lower down runs a faithful re-implementation of the same scoring in your browser, over the exact data/sample_reports.json that ships in the repo — so the numbers you see here match what demo.py prints.
The data model
| field | meaning |
|---|---|
| type | lost or found |
| category | coarse class — umbrella, phone, keys, … |
| description | free text — the main similarity signal |
| location | where, e.g. library-3rd-floor (hyphen tokens) |
| date | ISO date the item was lost / found |
| color | optional dominant colour |
| status | open → matched / closed |
The score
For a given report, every open report of the opposite type is scored by combining five signals into a single confidence in [0, 1]. Weights sum to 1, so the score stays in range.
| signal | weight | how it's computed |
|---|---|---|
| text | 0.40 | TF-IDF cosine similarity of the descriptions |
| category | 0.25 | 1 if categories agree, else 0 |
| location | 0.15 | Jaccard overlap of location tokens |
| date | 0.10 | 1.0 same day, decaying to 0 over a 30-day window |
| color | 0.10 | 1 agree, 0 clash, 0.5 when unknown |
The date is a hard veto. A found item dated before the loss — or more than 30 days after — is dropped entirely, because a thing cannot be found before it is lost. Candidates at or above the threshold (0.45) are auto-suggested; confirming a match closes both reports.
# lostfound/matcher.py — the combine step score = sum(self.weights[k] * signals[k] for k in self.weights) # date_signal returns None to VETO an impossible candidate if date_score is None: continue
Try it
Pick a lost report. The demo scores it against every open found report in the sample data, exactly as the Python engine does — same weights, same TF-IDF, same date veto.
Reflection