From-Scratch Build · Natural Language Processing
Political messaging rarely argues plainly — it nudges, frames, frightens and flatters. This build reads a sentence and names the rhetorical tactics at work — loaded language, appeals to fear, name-calling and five more — each with a confidence score and the trigger phrases that justify the call. I built it to learn to see persuasion, not to outsource the judgement.
What it is
Persuasion techniques are a known catalogue — propaganda analysts have classified them for decades. This system learns eight of them (a subset of the SemEval-2020 propaganda taxonomy) and applies them automatically: hand it a sentence and it returns which tactics appear and which words triggered each. A single sentence often stacks several at once, so it's a multi-label problem, not a tidy one-answer classification.
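As a minimal sketch of what "multi-label" means for the training targets, each sentence can be encoded as one 0/1 flag per tactic instead of a single class. The TACTICS list below holds only the six tactic names that appear in this write-up (the other two are not named here), and the encode helper is hypothetical rather than the build's own code.

```python
# Six of the eight tactics, as named in this write-up; the remaining two are
# not listed here, so they are left out rather than guessed.
TACTICS = [
    "appeal_to_fear",
    "name_calling",
    "loaded_language",
    "exaggeration",
    "doubt",
    "whataboutism",
]


def encode(labels):
    """Turn a set of tactic names into a fixed-order 0/1 indicator row."""
    unknown = set(labels) - set(TACTICS)
    if unknown:
        raise ValueError(f"unknown tactic(s): {sorted(unknown)}")
    return [1 if tactic in labels else 0 for tactic in TACTICS]


stacked = encode({"name_calling", "loaded_language"})  # two tactics at once
neutral = encode(set())                                # no tactic at all
print(stacked)
print(neutral)
```

A neutral sentence becomes an all-zero row, which is why neutral examples can sit in the same training set without needing a separate "none" class.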
The value isn't to censor or score — it's transparency. Surfacing that a sentence leans on fear plus a loaded epithet makes the machinery of the message visible, and every call traces back to explicit cue phrases and TF-IDF weights. That's media literacy, made legible.
The stack
A classical, fully transparent multi-label pipeline grounded in a defined rhetoric taxonomy.
Eight persuasion techniques from the SemEval-2020 propaganda taxonomy: the label space the model learns.
114 original political-style sentences, hand-labelled — including multi-label and neutral rows.
Word and character n-gram TF-IDF, plus small per-tactic cue lexicons as extra features.
One balanced logistic-regression head per tactic, so each fires independently for multi-label output.
Each prediction reports the cue phrases in the text that justify it — not just a score.
Stratified held-out split with macro- and micro-F1 plus a per-label report, since the rare tactics are easy to miss.
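The stack above could be wired together roughly as follows with scikit-learn. This is a sketch under assumptions, not the build's actual code: the three-tactic label slice, the toy sentences, the LEXICONS contents, the n-gram ranges and the reading of "balanced" as class_weight="balanced" are all illustrative choices.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical slice of the taxonomy with toy cue lexicons (assumptions).
TACTICS = ["appeal_to_fear", "name_calling", "whataboutism"]
LEXICONS = {
    "appeal_to_fear": ["coming for", "too late", "if we do nothing"],
    "name_calling": ["coward", "fraud", "spineless"],
    "whataboutism": ["what about"],
}


def lexicon_counts(texts):
    """One column per tactic: how many of its cue phrases occur in the text."""
    rows = [
        [sum(text.lower().count(cue) for cue in LEXICONS[t]) for t in TACTICS]
        for text in texts
    ]
    return csr_matrix(np.array(rows, dtype=float))


features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
    ("lex", FunctionTransformer(lexicon_counts)),
])

# One independent logistic head per tactic; "balanced" is assumed to mean
# class_weight="balanced".
model = Pipeline([
    ("features", features),
    ("heads", OneVsRestClassifier(
        LogisticRegression(class_weight="balanced", max_iter=1000))),
])

# Toy training rows invented for this sketch (not the 114-sentence seed set).
texts = [
    "If we do nothing, it will be too late for our children.",
    "They are coming for your savings before it is too late.",
    "My opponent is a coward and a fraud.",
    "That spineless fraud lied again.",
    "But what about the money they wasted?",
    "What about their own record on crime?",
    "That coward is coming for your pension.",
    "The committee meets next Tuesday.",
]
Y = np.array([
    [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0],
])

model.fit(texts, Y)
probs = model.predict_proba(["What about the fraud they ignored?"])
print(dict(zip(TACTICS, probs[0].round(2))))
```

Because each head is fitted on its own column of Y, one tactic firing does not suppress another, which is what allows several labels on the same sentence.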
Architecture
Each sentence flows through the same read-and-label pipeline:
1. Fix the taxonomy of eight persuasion tactics to detect.
2. Word + character TF-IDF, concatenated with per-tactic lexicon counts.
3. One logistic head per tactic; threshold each probability into a label set.
4. Surface the cue phrases in the text that triggered each detected tactic.
5. Held-out macro-F1 0.91 on the seed set; inspect per-tactic recall.
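The threshold-and-explain steps can be illustrated with a small pure-Python sketch. The cue phrases are copied from the demo output in this write-up, but the CUES layout, the 0.5 threshold, the helper names and the probabilities are assumptions standing in for the trained heads.

```python
# Toy cue lexicons: phrases copied from the demo output; the real lexicons
# are not shown in this write-up.
CUES = {
    "appeal_to_fear": ["coming for", "if we do nothing", "too late"],
    "whataboutism": ["but what about", "what about"],
}


def find_triggers(text, cues):
    """Return the cue phrases that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [cue for cue in cues if cue in lowered]


def label_sentence(text, probs, threshold=0.5):
    """Keep tactics whose probability clears the threshold, with triggers."""
    detected = []
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    for tactic, prob in ranked:
        if prob >= threshold:
            triggers = find_triggers(text, CUES.get(tactic, []))
            detected.append((tactic, prob, triggers))
    return detected


sentence = ("These radical extremists are coming for your jobs, "
            "and if we do nothing, it will be too late.")
# Hypothetical per-head probabilities standing in for the trained model.
fake_probs = {"appeal_to_fear": 0.99, "whataboutism": 0.02}

for tactic, prob, triggers in label_sentence(sentence, fake_probs):
    # Fall back to a placeholder when no lexicon cue matched.
    shown = ", ".join(repr(t) for t in triggers) or "(learned weights)"
    print(f"{tactic} {prob:.0%} triggers: {shown}")
```

Keeping trigger lookup separate from scoring means a tactic can still be reported when only the learned TF-IDF weights fired, with the placeholder making that case explicit.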
Real output
Verbatim from python demo.py — trained on the 114-sentence seed set.
> These radical extremists are coming for your jobs, and if we do nothing, it will be too late.
appeal_to_fear 100% triggers: 'coming for', 'if we do nothing', 'too late'
> My opponent is a spineless coward, a fraud who has lied to you for years.
name_calling 100% triggers: 'spineless', 'coward', 'fraud'
loaded_language 80% triggers: (learned weights)
> We will build the greatest economy the world has ever seen, the biggest boom in all of history.
exaggeration 100% triggers: 'the greatest', 'ever', 'ever seen', 'the biggest'
> Can we really trust a single promise from the same people who failed us every time before?
doubt 97% triggers: 'Can we really trust'
> They lecture us about the deficit, but what about the trillions they wasted when they held power?
whataboutism 99% triggers: 'but what about', 'what about'
> The redistricting committee will publish its proposed map next Tuesday.
no tactics detected
Held-out evaluation (70/30 stratified split): macro-F1 0.909, micro-F1 0.896. This is a teaching-scale model on a small, clean seed set; on noisy real text, expect lower scores.
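To make the two reported averages concrete, here is a small sketch of how macro- and micro-F1 are computed from multi-label predictions. The toy label rows are invented for illustration and are unrelated to the held-out scores above; the zero-denominator convention (F1 = 0.0) is an assumption.

```python
def f1(tp, fp, fn):
    """F1 from counts; treated as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1, per_label_f1) for 0/1 indicator rows."""
    n_labels = len(y_true[0])
    per_label = []
    total_tp = total_fp = total_fn = 0
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        per_label.append(f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_label) / n_labels           # every label counts equally
    micro = f1(total_tp, total_fp, total_fn)    # every decision counts equally
    return macro, micro, per_label


# Toy case: label 0 is common and always right, label 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]

macro, micro, per_label = macro_micro_f1(y_true, y_pred)
print(f"macro-F1 {macro:.3f}, micro-F1 {micro:.3f}, per-label {per_label}")
```

Macro-F1 averages per-label scores, so one missed rare tactic drags it down sharply, while micro-F1 pools all decisions and is dominated by the common labels. The two can differ in either direction, which is why the per-label report matters.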
Reflection