Notification Triage — Built From Scratch

What it is

Sorting the urgent from the noise

Notifications arrive in a flat, undifferentiated stream — a flight delay sits next to a game invite sits next to a two-factor code. The system here reads what each one actually says and assigns it a priority, so the stream can be reordered by importance instead of arrival time. It's triage: not deleting anything, just deciding what deserves your attention first.

The interesting part is that priority lives in the language. "Your account was accessed from a new device" and "New devices are on sale" share words but not urgency. Telling them apart needs a semantic representation, not keyword spotting — which is exactly what makes it a good study.
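To make the keyword-spotting problem concrete, here is a small sketch using the two example messages above. The `keyword_rule` helper and its crude plural stripping are hypothetical, written only for this illustration; they are not part of the project.

```python
import re

def stems(text):
    """Lowercase word tokens with a crude plural strip (illustration only)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]

def keyword_rule(text, keywords=("new", "device")):
    """Hypothetical keyword rule: fire when every keyword is present."""
    present = set(stems(text))
    return all(k in present for k in keywords)

alert = "Your account was accessed from a new device"
promo = "New devices are on sale"

# Words the two messages have in common, and what is left over in each.
shared = set(stems(alert)) & set(stems(promo))
alert_context = set(stems(alert)) - shared
promo_context = set(stems(promo)) - shared

print(keyword_rule(alert), keyword_rule(promo))  # True True
print(sorted(shared))         # ['device', 'new']
print(sorted(alert_context))  # ['a', 'accessed', 'account', 'from', 'was', 'your']
print(sorted(promo_context))  # ['are', 'on', 'sale']
```

The rule fires on both messages. What separates them is the surrounding context ("account", "accessed" versus "sale"), which is the information a representation built over whole messages can use.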

0.941

held-out accuracy (16/17) on a 25% stratified split of the seed data — against a ~0.42 majority-class baseline.
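The arithmetic behind that figure can be checked with a short sketch. It assumes the split was made with scikit-learn's `train_test_split` (the write-up only says "stratified"); the texts and `random_state` are placeholders, and only the 20 / 20 / 25 class counts come from the seed set.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Placeholder texts; only the class counts mirror the seed set.
labels = ["high"] * 20 + ["medium"] * 20 + ["low"] * 25
texts = [f"placeholder notification {i}" for i in range(len(labels))]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

n_test = len(y_test)  # 25% of 65 is 16.25, rounded up to 17
test_counts = Counter(y_test)

# Majority-class baseline: always predict the most common training label.
majority_label, _ = Counter(y_train).most_common(1)[0]
baseline_acc = test_counts[majority_label] / n_test

reported_acc = 16 / n_test

print(n_test)                           # 17
print(majority_label)                   # low
print(round(baseline_acc, 3))           # 0.412
print(round(reported_acc, 3))           # 0.941
```

Under these assumptions the held-out set has 17 items, so a single error gives 16/17, and the always-predict-"low" baseline lands at about 0.41, close to the quoted figure.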

The stack

From raw alert to ranked queue

Encode the message, score it, order the feed — all in scikit-learn.

data

65 labelled notifications

A hand-built seed set tagged high / medium / low (20 / 20 / 25), committed under data/.

embed

TF-IDF + LSA

Uni/bi-gram TF-IDF compressed by truncated SVD into a ~40-dim semantic vector — Latent Semantic Analysis.

signal

Lexical urgency layer

Interpretable cues: urgency-lexicon hits, ALL-CAPS ratio, codes, money, deadline mentions, sender weight.

classify

Logistic regression

Embedding + signals concatenated, mapped to P(low/medium/high). Trains in under a second on CPU.

rank

Continuous score

Probabilities collapse to one priority score in [0,1]; the inbox sorts by it, breaking ties by arrival index so the result is a strict total order.

boost

Topic similarity

Cosine similarity in LSA space to a user's important topics nudges relevant messages up — semantic, not keyword.
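One way the boost step could work, as a minimal sketch. The additive formula, the weight of 0.1, the toy corpus, and the topic strings are all assumptions made for illustration; the write-up only says that cosine similarity in LSA space nudges relevant messages up.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

# Invented messages, standing in for a real feed.
corpus = [
    "Your flight to Berlin is delayed by two hours",
    "Gate change for your flight, now boarding at gate 12",
    "Your bank payment of 40 euros was received",
    "Payment failed, please update your bank card",
    "Maria liked your photo",
    "Weekly newsletter: five articles for you",
    "Sam invited you to play a game",
    "50% off everything this weekend",
]

# Uni/bi-gram TF-IDF followed by truncated SVD (4 dims here; ~40 in the write-up).
lsa = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    TruncatedSVD(n_components=4, random_state=0),
)
message_vecs = lsa.fit_transform(corpus)

def topic_boost(base_scores, message_vecs, topic_vecs, weight=0.1):
    """Add weight * (best cosine similarity to any topic), clipped to [0, 1]."""
    sims = cosine_similarity(message_vecs, topic_vecs).max(axis=1)
    boost = weight * np.clip(sims, 0.0, 1.0)  # negative similarity never demotes
    return np.clip(np.asarray(base_scores) + boost, 0.0, 1.0), sims

# Hypothetical user topics, embedded with the same fitted LSA pipeline.
topic_vecs = lsa.transform(["flight delay gate", "bank payment"])
base_scores = np.full(len(corpus), 0.5)  # stand-in classifier scores
boosted, sims = topic_boost(base_scores, message_vecs, topic_vecs)

for text, sim, score in zip(corpus, sims, boosted):
    print(f"{score:.3f}  (sim {sim:+.2f})  {text}")
```

Clipping the similarity at zero keeps the boost one-directional: a message unrelated to the user's topics keeps its classifier score rather than being pushed down.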

Architecture

How a notification gets ranked

Each incoming message runs the same read-score-place pipeline:

Ingest
Capture the notification's text and sender as it arrives.
Embed
TF-IDF (1,2-grams) then truncated SVD into a ~40-dim LSA vector.
Signal
Add the lexical layer — urgency hits, caps, codes, money, deadlines, sender weight — and concatenate.
Score
Logistic regression outputs class probabilities, collapsed to one priority score in [0,1].
Place
Sort the feed by score into a strict total order, ties broken by arrival index.
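The steps above can be sketched end to end in scikit-learn. This is an illustration under stated assumptions, not the project's code: the training examples and the urgency lexicon are invented, the lexical layer is reduced to four text-only cues (sender weight and deadline mentions are left out), and collapsing the probabilities as P(high) + 0.5 * P(medium) is one plausible choice that the write-up does not specify.

```python
import re

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical urgency lexicon; the project's real word list is not shown.
URGENCY_WORDS = {"urgent", "alert", "down", "expires", "failed", "security"}

def lexical_features(texts):
    """One row per message: lexicon hits, ALL-CAPS ratio, code flag, money flag."""
    rows = []
    for text in texts:
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(len(words), 1)
        rows.append([
            sum(w.lower() in URGENCY_WORDS for w in words),
            sum(len(w) > 1 and w.isupper() for w in words) / n_words,
            float(bool(re.search(r"\b\d{4,8}\b", text))),
            float(bool(re.search(r"[$€£]\s?\d", text))),
        ])
    return np.array(rows, dtype=float)

# Toy stand-in for the seed set (invented examples, not the real data).
train = [
    ("URGENT: database is down", "high"),
    ("Security alert: login from a new device", "high"),
    ("Payment failed, update your card now", "high"),
    ("Your verification code is 482913", "high"),
    ("Your order ships tomorrow", "medium"),
    ("Reminder: meeting at 3pm", "medium"),
    ("Your invoice of $40 is ready", "medium"),
    ("Password reset link expires soon", "medium"),
    ("Alex liked your post", "low"),
    ("Weekly newsletter is here", "low"),
    ("50% off this weekend", "low"),
    ("New game invite from Sam", "low"),
]
train_texts, train_labels = map(list, zip(*train))

model = Pipeline([
    ("features", FeatureUnion([
        ("lsa", Pipeline([
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
            ("svd", TruncatedSVD(n_components=5, random_state=0)),  # ~40 in the write-up
        ])),
        ("lexical", FunctionTransformer(lexical_features)),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(train_texts, train_labels)

# Assumed collapse: expected value of a per-class weight.
CLASS_WEIGHTS = {"low": 0.0, "medium": 0.5, "high": 1.0}

def score(texts):
    proba = model.predict_proba(texts)
    weights = np.array([CLASS_WEIGHTS[c] for c in model.named_steps["clf"].classes_])
    return proba @ weights

def place(scores):
    """Indices sorted by descending score, ties broken by arrival index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))

inbox = ["Sam liked your photo", "URGENT: site is down", "Your order ships today"]
scores = score(inbox)
for rank, i in enumerate(place(scores), start=1):
    print(rank, f"{scores[i]:.3f}", inbox[i])
```

Using the tuple `(-score, arrival_index)` as the sort key is what turns a score with possible ties into a strict total order.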

Real output

A mixed batch, ranked

Running python demo.py feeds eight notifications through the pipeline (a security alert, a password reset, a newsletter, a social like, a calendar reminder, a production-down page, a sale, a delivery) and prints the ranked inbox. This is the actual output:

 #  score  tier    message
──────────────────────────────────────────────────────────
 1  1.000  high    URGENT: production is down, customers cannot…
 2  0.999  high    Security alert: your account was accessed from…
 3  0.532  medium  Your package will be delivered today between 2pm…
 4  0.509  medium  Your password reset link expires in 30 minutes
 5  0.497  medium  Reminder: dentist appointment tomorrow at 10am
 6  0.046  low     Maria liked your photo
 7  0.021  low     Weekly newsletter: 5 articles we think you'll…
 8  0.011  low     50% off everything this weekend only, shop the sale

Honest by design

What "semantic" does and doesn't mean

There's no large language model here. "Semantic analysis" means TF-IDF + Latent Semantic Analysis (truncated SVD over the term-document matrix) for a dense embedding, plus a small lexical urgency layer — both fed to a logistic-regression classifier in scikit-learn.

That's enough to separate "your account was accessed from a new device" from "new devices are on sale". It trains in under a second, runs entirely on CPU, and every feature is inspectable. The whole thing is small on purpose.

Reflection

What rebuilding it taught me

Priority is semantic. Keyword rules break instantly; the same words carry opposite urgency depending on meaning.
The costly error is asymmetric. Missing one urgent message is far worse than over-ranking a trivial one — the metric has to reflect that.
Triage, not deletion. The goal is reordering attention, which is a gentler and more useful framing than filtering.
Thresholds are policy. How aggressively you promote or bury is a product decision dressed as a hyperparameter.
Feedback closes the loop. User dismissals and opens are the cheapest, truest labels you'll ever get.
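The asymmetric-cost point above can be made concrete with a small sketch comparing plain accuracy to a cost-weighted error and to recall on the "high" class. The cost values and the two sets of predictions are invented for illustration; choosing real costs is the policy decision the reflection describes.

```python
# Hypothetical cost of predicting `pred` when the truth is `true`.
# Burying an urgent message is priced far above over-ranking a trivial one.
COST = {
    ("high", "low"): 10.0,
    ("high", "medium"): 4.0,
    ("medium", "low"): 2.0,
    ("medium", "high"): 1.0,
    ("low", "medium"): 1.0,
    ("low", "high"): 1.5,
}

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_cost(y_true, y_pred):
    """Average misclassification cost; correct predictions cost nothing."""
    return sum(COST.get((t, p), 0.0) for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label="high"):
    """Share of truly `label` items that were predicted as `label`."""
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in preds_for_label) / len(preds_for_label)

y_true = ["high", "high", "medium", "low", "low", "low"]
misses_urgent = ["low", "high", "medium", "low", "low", "low"]       # buries one urgent
over_ranks = ["high", "high", "medium", "medium", "low", "low"]      # promotes one trivial

for name, y_pred in [("misses_urgent", misses_urgent), ("over_ranks", over_ranks)]:
    print(name, round(accuracy(y_true, y_pred), 3),
          round(mean_cost(y_true, y_pred), 3), recall(y_true, y_pred))
# misses_urgent 0.833 1.667 0.5
# over_ranks 0.833 0.167 1.0
```

Both sets of predictions make exactly one mistake and score the same accuracy, yet the cost-weighted error and the high-class recall separate them clearly.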