From-Scratch Build · Natural Language Processing

Notification Triage

Your phone buzzes forty times a day and maybe two of them actually matter. This Python build reads the content of each notification, scores it on a continuous priority scale, and returns a ranked inbox — so the security alert rises and the marketing blast sinks. A small, honest NLP problem with an outsized effect on attention.

Python · scikit-learn · TF-IDF + LSA · Logistic regression · Priority ranking · CPU-only

What it is

Sorting the urgent from the noise

Notifications arrive in a flat, undifferentiated stream — a flight delay sits next to a game invite sits next to a two-factor code. The system here reads what each one actually says and assigns it a priority, so the stream can be reordered by importance instead of arrival time. It's triage: not deleting anything, just deciding what deserves your attention first.

The interesting part is that priority lives in the language. "Your account was accessed from a new device" and "New devices are on sale" share words but not urgency. Telling them apart needs a semantic representation, not keyword spotting — which is exactly what makes it a good study.

0.941
held-out accuracy (16/17) on a 25% stratified split of the seed data — against a ~0.42 majority-class baseline.
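The 16/17 figure follows from the split arithmetic: scikit-learn rounds a 25% test fraction of 65 rows up to 17. A minimal sketch to check that, using placeholder texts and an assumed `random_state` (the real seed rows and the seed actually used are not shown here):

```python
from sklearn.model_selection import train_test_split

# Class counts taken from the seed set described on this page: 20 / 20 / 25.
labels = ["high"] * 20 + ["medium"] * 20 + ["low"] * 25
# Placeholder texts; the real notifications live under data/.
texts = [f"notification {i}" for i in range(len(labels))]

# random_state=0 is an assumption made only so the sketch is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

n_test = len(y_test)
accuracy_16_of_n = round(16 / n_test, 3)
print(n_test, accuracy_16_of_n)
```

With a test set this small, a single misclassified notification moves accuracy by about six points, so the headline number is best read as "one miss" rather than as a precise estimate.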

The stack

From raw alert to ranked queue

Encode the message, score it, order the feed — all in scikit-learn.

data

65 labelled notifications

A hand-built seed set tagged high / medium / low (20 / 20 / 25), committed under data/.

embed

TF-IDF + LSA

Uni/bi-gram TF-IDF compressed by truncated SVD into a ~40-dim semantic vector — Latent Semantic Analysis.

signal

Lexical urgency layer

Interpretable cues: urgency-lexicon hits, ALL-CAPS ratio, codes, money, deadline mentions, sender weight.

classify

Logistic regression

Embedding + signals concatenated, mapped to P(low/medium/high). Trains in under a second on CPU.

rank

Continuous score

Probabilities collapse to one priority score in [0,1]; the inbox sorts by it, with ties broken by arrival index, which makes the ranking a strict total order.

boost

Topic similarity

Cosine similarity in LSA space to a user's important topics nudges relevant messages up — semantic, not keyword.
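The topic boost can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the toy corpus, the function name `topic_boost`, the additive formula, and the `alpha` weight are all hypothetical, and the LSA dimension is shrunk to 3 because the toy corpus is tiny.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

# Invented toy corpus, used only to fit an LSA space for the sketch.
corpus = [
    "Your flight to Lisbon is delayed by two hours",
    "Flight delay update: new departure time posted",
    "Security alert: your account was accessed from a new device",
    "New devices are on sale this weekend",
    "Maria liked your photo",
    "Weekly newsletter: articles we think you will like",
]

# Uni/bi-gram TF-IDF followed by truncated SVD, i.e. LSA.
lsa = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    TruncatedSVD(n_components=3, random_state=0),
)
lsa.fit(corpus)


def topic_boost(base_score, message, topics, alpha=0.1):
    """Nudge a score upward by similarity to the closest important topic.

    The additive form and alpha=0.1 are assumptions made for illustration.
    """
    msg_vec = lsa.transform([message])
    topic_vecs = lsa.transform(topics)
    best = float(cosine_similarity(msg_vec, topic_vecs).max())
    # Negative similarity never lowers the score; the result stays in [0, 1].
    return float(np.clip(base_score + alpha * max(best, 0.0), 0.0, 1.0))


boosted = topic_boost(0.5, "Your flight is delayed", ["flight delay"])
unrelated = topic_boost(0.5, "zzz qqq", ["flight delay"])
print(boosted, unrelated)
```

Because the comparison happens in the reduced space rather than on raw tokens, a message can be pulled toward a topic it shares no exact phrase with. A message made entirely of unseen words maps to the zero vector and receives no boost.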

Architecture

How a notification gets ranked

Each incoming message runs the same read-score-place pipeline:

  1. Ingest

    Capture the notification's text and sender as it arrives.

  2. Embed

    TF-IDF (1,2-grams) then truncated SVD into a ~40-dim LSA vector.

  3. Signal

    Add the lexical layer — urgency hits, caps, codes, money, deadlines, sender weight — and concatenate.

  4. Score

    Logistic regression outputs class probabilities, collapsed to one priority score in [0,1].

  5. Place

    Sort the feed by score into a strict total order, ties broken by arrival index.
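Steps 4 and 5 can be sketched as below. The page does not state how the three class probabilities collapse into one number, so the expected-value weighting (low = 0, medium = 0.5, high = 1) is an assumption; the names `Scored`, `priority_score`, and `rank` are hypothetical, and the probabilities in the usage are made up.

```python
from dataclasses import dataclass

# Assumed tier weights; the source only says the score lands in [0, 1].
TIER_WEIGHT = {"low": 0.0, "medium": 0.5, "high": 1.0}


@dataclass(frozen=True)
class Scored:
    arrival_index: int
    message: str
    score: float


def priority_score(probs):
    """Collapse {tier: probability} into a single score in [0, 1]."""
    return sum(TIER_WEIGHT[tier] * p for tier, p in probs.items())


def rank(items):
    """Highest score first; earlier arrival wins ties."""
    return sorted(items, key=lambda s: (-s.score, s.arrival_index))


# Made-up probabilities, for illustration only.
batch = [
    Scored(0, "Maria liked your photo",
           priority_score({"low": 0.9, "medium": 0.1, "high": 0.0})),
    Scored(1, "Security alert: new sign-in",
           priority_score({"low": 0.0, "medium": 0.0, "high": 1.0})),
    Scored(2, "URGENT: production is down",
           priority_score({"low": 0.0, "medium": 0.0, "high": 1.0})),
    Scored(3, "Dentist appointment tomorrow",
           priority_score({"low": 0.0, "medium": 1.0, "high": 0.0})),
]

ranked = rank(batch)
for position, item in enumerate(ranked, start=1):
    print(position, f"{item.score:.3f}", item.message)
```

Sorting on the pair (negated score, arrival index) is what makes the order strict: two notifications can share a score, but never an arrival index.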

Real output

A mixed batch, ranked

Running python demo.py feeds eight notifications (a security alert, a password reset, a newsletter, a social like, a calendar reminder, a production-down page, a sale, a delivery) through the pipeline and prints the ranked inbox. This is the actual output:

#  score  tier    message
──────────────────────────────────────────────────────────
1  1.000  high    URGENT: production is down, customers cannot…
2  0.999  high    Security alert: your account was accessed from…
3  0.532  medium  Your package will be delivered today between 2pm…
4  0.509  medium  Your password reset link expires in 30 minutes
5  0.497  medium  Reminder: dentist appointment tomorrow at 10am
6  0.046  low     Maria liked your photo
7  0.021  low     Weekly newsletter: 5 articles we think you'll…
8  0.011  low     50% off everything this weekend only, shop the sale

Honest by design

What "semantic" does and doesn't mean

There's no large language model here. "Semantic analysis" means TF-IDF + Latent Semantic Analysis (truncated SVD over the term-document matrix) for a dense embedding, plus a small lexical urgency layer — both fed to a logistic-regression classifier in scikit-learn.

That's enough to separate "your account was accessed from a new device" from "new devices are on sale". It trains in under a second, runs entirely on CPU, and every feature is inspectable. The whole thing is small on purpose.
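To make the shape of that model concrete, here is a minimal sketch of the embed, signal, and classify stages wired together in scikit-learn. It is not the project's source: the urgency lexicon, the regexes, the helper names `lexical_signals` and `build_model`, and the nine toy rows are all invented for illustration. Sender weight is left out because the sketch only sees message text, and the LSA dimension is 4 rather than ~40 because the toy data is so small.

```python
import re

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical lexicon and patterns; the real lists are not shown on this page.
URGENCY_WORDS = {"urgent", "alert", "down", "security", "breach", "immediately"}
DEADLINE_RE = re.compile(r"\b(today|tomorrow|tonight|expires|deadline)\b", re.I)
CODE_RE = re.compile(r"\b\d{4,8}\b")             # e.g. a two-factor code
MONEY_RE = re.compile(r"[$€£]\s?\d|\d+\s?%")     # prices or percentages


def lexical_signals(texts):
    """One row of interpretable cues per message."""
    rows = []
    for text in texts:
        tokens = re.findall(r"[A-Za-z']+", text)
        letters = [c for c in text if c.isalpha()]
        caps_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
        rows.append([
            sum(t.lower() in URGENCY_WORDS for t in tokens),  # lexicon hits
            caps_ratio,                                       # ALL-CAPS ratio
            float(bool(CODE_RE.search(text))),
            float(bool(MONEY_RE.search(text))),
            float(bool(DEADLINE_RE.search(text))),
        ])
    return np.array(rows, dtype=float)


def build_model(n_components=4):
    # LSA branch: uni/bi-gram TF-IDF compressed by truncated SVD.
    lsa = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        TruncatedSVD(n_components=n_components, random_state=0),
    )
    # FeatureUnion concatenates the embedding with the lexical signals.
    features = FeatureUnion([
        ("lsa", lsa),
        ("signals", FunctionTransformer(lexical_signals)),
    ])
    return Pipeline([
        ("features", features),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Invented toy rows, three per tier; not the 65-row seed set.
toy_texts = [
    "URGENT: production is down",
    "Security alert: new device sign-in",
    "Your verification code is 482913",
    "Your package will be delivered today",
    "Reminder: dentist appointment tomorrow",
    "Your invoice for March is ready",
    "Maria liked your photo",
    "Weekly newsletter: articles for you",
    "50% off everything this weekend",
]
toy_labels = ["high"] * 3 + ["medium"] * 3 + ["low"] * 3

model = build_model()
model.fit(toy_texts, toy_labels)
proba = model.predict_proba(["URGENT: the server is down"])
print(dict(zip(model.classes_, proba[0].round(3))))
```

Keeping the lexical layer as a plain function that returns named columns is what keeps it inspectable: each column maps to one cue, and the logistic-regression coefficient on that column can be read directly.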

Reflection

What rebuilding it taught me