Natural Language Processing, made tangible.
Every core concept from your NLP syllabus, distilled into a live demo you can break, replay, and inspect. Built as a static site you can host on GitHub Pages.
Text Preprocessing
Tokenize, stem, lemmatize, strip stopwords — all live.
TF-IDF & Retrieval
Rank documents against a query with cosine similarity.
Word Embeddings
Walk through a 2D vector space and try analogies.
POS Tagging (Viterbi)
Watch the trellis fill in word by word.
RNN Hidden States
Step a recurrent net through a sequence.
Self-Attention
Render a real attention heatmap from your sentence.
The NLP Pipeline
Almost every NLP system, from a spam filter to GPT, follows the same five stages. Click any stage to see what happens inside.
From Rules to Foundation Models
Seven decades of NLP, condensed. Click any era to load its defining ideas, systems, and breakthroughs.
Text Preprocessing Playground
Type any text. Watch every preprocessing stage update live. Toggle steps on and off to feel what each one does.
① Sentence segmentation
② Tokenization
③ Normalized tokens
④ Final pipeline output
Stemming chops suffixes (jumping → jump) — fast but crude. Lemmatization uses a dictionary + POS to return real words (was → be, better → good).
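Below is a minimal, standard-library sketch of the same stages. It rests on three assumptions:

- a toy stopword list;
- a crude suffix-stripping stemmer, far simpler than the Porter stemmer;
- a hypothetical three-entry `LEMMAS` table standing in for a real dictionary-plus-POS lemmatizer.

None of these names come from the playground itself.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # toy list
LEMMAS = {"was": "be", "better": "good", "mice": "mouse"}      # hypothetical lookup table

def segment(text):
    # Split after ., ! or ? followed by whitespace (naive: also splits "Dr. Smith")
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words (allowing one internal apostrophe, e.g. don't) or single punctuation marks
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

def crude_stem(word):
    # Strip the first matching suffix, but keep at least 3 characters
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text, lemmatize=False):
    out = []
    for sent in segment(text):
        for tok in tokenize(sent.lower()):
            if re.fullmatch(r"[^\w\s]", tok) or tok in STOPWORDS:
                continue  # drop punctuation and stopwords
            out.append(LEMMAS.get(tok, tok) if lemmatize else crude_stem(tok))
    return out

print(preprocess("The dog was jumping. Cats sleep!"))
print(preprocess("The cat was better.", lemmatize=True))
```

Note the trade-off from the paragraph above. The stemmer leaves "was" untouched and would turn "houses" into "hous". The lookup table maps "was" to "be", but only for words it already knows.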
Regex Playground
Regex is the workhorse of preprocessing, scraping, and tokenization. Try a pattern, see matches highlight live.
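To experiment outside the browser, Python's built-in `re` module supports the same idea. The named-group pattern below is an illustrative assumption, not the playground's own pattern. It labels e-mail addresses, hashtags, numbers and words, and `m.lastgroup` reports which alternative matched.

```python
import re

# Illustrative pattern; at each position the alternatives are tried left to right
TOKEN_RE = re.compile(r"""
    (?P<EMAIL>[\w.+-]+@[\w-]+\.[\w.]+)   # user@example.com
  | (?P<HASHTAG>\#\w+)                   # #nlp (the # must be escaped in VERBOSE mode)
  | (?P<NUMBER>\d+(?:\.\d+)?)            # 42, 3.14
  | (?P<WORD>\w+(?:'\w+)?)               # plain words, don't
""", re.VERBOSE)

def regex_tokens(text):
    # m.lastgroup is the name of the group that produced the match
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

print(regex_tokens("Mail ana@uni.edu about #nlp by 3.5 pm"))
```

Order matters here. If `WORD` came before `NUMBER`, the text "3.5" would match as the word "3" followed by the word "5".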
TF-IDF & Cosine Similarity
Build a tiny search engine. Each document becomes a sparse vector whose weights reflect how distinctive its words are. Documents are then ranked by the cosine similarity between their vector and the query's.
Documents
Query
Ranked results
TF-IDF weight matrix (top terms)
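Here is a sketch of the same ranking in plain Python. It assumes raw term counts for tf and `log(N / df)` for idf. That is one of several common weighting variants, and the demo's exact formula may differ. The function names are illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    # idf(t) = log(N / df(t)); a term found in every document gets weight 0
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / c) for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    # Sparse dot product over shared terms, divided by both vector norms
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(docs, query):
    vecs, idf = tfidf_vectors(docs)
    # Query words that never occur in the collection have no idf and are ignored
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    return sorted(((cosine(q, v), i) for i, v in enumerate(vecs)), reverse=True)

docs = ["the cat sat on the mat", "dogs chase cats", "the stock market fell"]
print(rank(docs, "cat on a mat"))
```

Without stemming, "cat" and "cats" count as different terms. This is one reason the preprocessing stage matters for retrieval.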
Multinomial Naive Bayes Classifier
A spam classifier you can train in your browser. Watch the priors and per-word likelihoods update as you flip labels.
Training data
Classify a new message
Decision math
Top discriminative words per class
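Below is a minimal sketch of multinomial Naive Bayes with add-one smoothing. It works in log space so that products of many small probabilities don't underflow. Whitespace tokenization and the helper names `train_nb` / `predict_nb` are assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(examples):
    # examples: list of (text, label); returns log-priors, per-class word counts, vocabulary
    class_docs = Counter(label for _, label in examples)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(examples)) for c, n in class_docs.items()}
    return log_prior, word_counts, vocab

def predict_nb(model, text):
    log_prior, word_counts, vocab = model
    scores = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        score = log_prior[c]
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are skipped
                # Laplace-smoothed likelihood P(w | c) = (count + 1) / (total + |V|)
                score += math.log((counts[w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get), scores

train = [("win money now", "spam"), ("free money offer", "spam"),
         ("meeting at noon", "ham"), ("lunch at noon tomorrow", "ham")]
model = train_nb(train)
print(predict_nb(model, "free money"))
```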
Logistic Regression — see the decision boundary
The sigmoid squashes any score into a probability. Drag the weights and bias to move the boundary; click the canvas to add new points.
Sigmoid σ(z) = 1 / (1 + e⁻ᶻ)
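The following sketches the model behind the canvas. It assumes two input features and trains with plain batch gradient descent on cross-entropy loss, whereas the demo lets you set the weights by hand. The boundary is the line where w·x + b = 0, which is exactly where σ gives 0.5.

```python
import math

def sigmoid(z):
    # Numerically stable form of 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    # z = w . x + b; points with z > 0 fall on the positive side of the boundary
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(points, labels, lr=0.5, epochs=500):
    # Batch gradient descent on mean binary cross-entropy
    w, b = [0.0, 0.0], 0.0
    n = len(points)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in zip(points, labels):
            err = predict_proba(w, b, x) - y  # dLoss/dz for cross-entropy
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

w, b = train([(0, 0), (0, 1), (2, 2), (3, 2)], [0, 0, 1, 1])
print(w, b, predict_proba(w, b, (3, 3)))
```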
Lexicon-based Sentiment Scorer
A miniature VADER-style scorer. Each word carries a valence; negators flip, intensifiers amplify, and the sum tells you the mood.
Per-word contribution
Negation window: a "not" flips the sign of the next 3 tokens. Booster: words like very and absolutely multiply the next word's valence. This is a simplified version of how real lexicon scorers such as VADER handle context.
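Here is a sketch of the two rules just described, using a hypothetical five-word lexicon and hypothetical booster multipliers. The code also makes one counting assumption: the booster word itself uses up one of the three negated tokens.

```python
# Hypothetical toy values; VADER itself relies on a much larger human-rated lexicon
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0, "love": 3.0}
NEGATORS = {"not", "never", "no"}
BOOSTERS = {"very": 1.5, "absolutely": 2.0}
NEGATION_WINDOW = 3

def sentiment(text):
    negate_left = 0  # how many upcoming tokens still get their sign flipped
    boost = 1.0      # multiplier set by the previous token (1.0 = no booster)
    contributions = []
    for tok in text.lower().split():
        if tok in NEGATORS:
            negate_left = NEGATION_WINDOW
            continue
        valence = LEXICON.get(tok, 0.0) * boost
        boost = BOOSTERS.get(tok, 1.0)  # a booster only affects the very next token
        if negate_left > 0:
            valence = -valence
            negate_left -= 1
        contributions.append((tok, valence))
    return sum(v for _, v in contributions), contributions

print(sentiment("the food was not very good"))
```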
Walk the Vector Space
A small pre-computed embedding projected into 2D. Click a word for its nearest neighbours, or run an analogy: king − man + woman ≈ ?
Nearest neighbours of king
Analogy
The 2D layout is a projection of 50-dim toy vectors. A 2D projection discards information, so on-screen distances only roughly reflect cosine similarity. Words that appear in similar contexts still cluster together.
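The analogy arithmetic can be sketched with hypothetical 3-dim vectors whose axes loosely mean royalty, male and female. They stand in for the demo's 50-dim toy vectors, which aren't reproduced here.

The code excludes the three input words from the candidates. This is the usual convention, because with real embeddings one of the inputs is often the vector nearest to the raw result.

```python
import math

# Hypothetical toy vectors (axes: royalty, male, female)
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.3],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(vec, exclude=(), k=3):
    # Rank every known word by cosine similarity to vec
    scored = [(cosine(vec, v), w) for w, v in VECTORS.items() if w not in exclude]
    return [w for _, w in sorted(scored, reverse=True)[:k]]

def analogy(a, b, c):
    # a - b + c, e.g. king - man + woman; the three inputs are excluded from the answer
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    return nearest(target, exclude={a, b, c}, k=1)[0]

print(analogy("king", "man", "woman"))
```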
HMM Part-of-Speech Tagging — Viterbi Trellis
An HMM picks the single most probable tag sequence. The Viterbi algorithm fills a grid of best scores left-to-right, then back-traces the winning path.
Tagged output
Viterbi trellis (best log-prob to reach each state)
Tag set: DET NOUN VERB ADJ. Emission and transition probabilities are toy values you can tweak in app.js. Hover a cell to see its predecessor and the move that produced it.
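Below is a compact Viterbi sketch in log space. Its probabilities are invented for illustration and are not the values in `app.js`. `FLOOR` is an assumed small emission probability for word/tag pairs that are not listed.

With these numbers, the ambiguous word `runs` is tagged VERB after a NOUN. The transition scores outweigh its NOUN emission.

```python
import math

TAGS = ["DET", "NOUN", "VERB", "ADJ"]
# Invented toy probabilities for this sketch
START = {"DET": 0.6, "NOUN": 0.2, "VERB": 0.1, "ADJ": 0.1}
TRANS = {
    "DET":  {"DET": 0.01, "NOUN": 0.6, "VERB": 0.09, "ADJ": 0.3},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.6, "ADJ": 0.1},
    "VERB": {"DET": 0.5, "NOUN": 0.2, "VERB": 0.1, "ADJ": 0.2},
    "ADJ":  {"DET": 0.01, "NOUN": 0.8, "VERB": 0.09, "ADJ": 0.1},
}
EMIT = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "runs": 0.2},
    "VERB": {"runs": 0.6, "sees": 0.4},
    "ADJ":  {"big": 0.6, "small": 0.4},
}
FLOOR = 1e-6  # assumed emission probability for unlisted (tag, word) pairs

def viterbi(words):
    # trellis[i][tag] = (best log-prob of a path ending in tag at word i, backpointer)
    first = words[0]
    trellis = [{t: (math.log(START[t]) + math.log(EMIT[t].get(first, FLOOR)), None)
                for t in TAGS}]
    for word in words[1:]:
        prev = trellis[-1]
        column = {}
        for t in TAGS:
            # Best predecessor p maximises score(p) + log P(t | p)
            best = max(TAGS, key=lambda p: prev[p][0] + math.log(TRANS[p][t]))
            score = (prev[best][0] + math.log(TRANS[best][t])
                     + math.log(EMIT[t].get(word, FLOOR)))
            column[t] = (score, best)
        trellis.append(column)
    # Back-trace from the best state in the last column
    tag = max(TAGS, key=lambda t: trellis[-1][t][0])
    path = [tag]
    for column in reversed(trellis[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1], trellis

print(viterbi("the dog runs".split())[0])
```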
Named Entity Recognition
A rule + gazetteer NER that finds PERSON, LOCATION, ORGANIZATION, DATE and MONEY. Type something; watch entities highlight.
Annotated output
Entity list
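Here is a rule + gazetteer sketch along the same lines. The gazetteer entries, the MONEY and DATE regexes, and the overlap rule are all illustrative assumptions; under that rule the earliest match wins, and the longer one wins a tie. The case-sensitive month list will, for instance, misfire on a sentence-initial "May".

```python
import re

# Hypothetical gazetteer entries, for illustration only
GAZETTEER = {
    "PERSON": ["Ada Lovelace", "Alan Turing"],
    "LOCATION": ["London", "Paris"],
    "ORGANIZATION": ["Google", "United Nations"],
}
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
RULES = [
    ("MONEY", re.compile(r"[$€£]\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:million|billion))?")),
    ("DATE", re.compile(rf"\b(?:\d{{1,2}} )?(?:{MONTHS})(?: \d{{1,2}})?(?:,? \d{{4}})?\b"
                        r"|\b\d{4}-\d{2}-\d{2}\b")),
]

def find_entities(text):
    spans = []
    # Gazetteer lookup: exact, case-sensitive, whole-word matches
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(rf"\b{re.escape(name)}\b", text):
                spans.append((m.start(), m.end(), label))
    # Pattern rules for MONEY and DATE
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    # Resolve overlaps: earliest start first, longer span breaks ties
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    result, last_end = [], -1
    for start, end, label in spans:
        if start >= last_end:
            result.append((text[start:end], label))
            last_end = end
    return result

print(find_entities("Ada Lovelace met Alan Turing in London on March 3, 2024."))
```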
A Tiny Feed-Forward Network
Two inputs, a hidden layer of 4 units, one output. Drag the inputs and watch activations flow through. Edges glow with the size of the weight × activation product.
Activation function
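Below is a forward pass through the same 2-4-1 shape, using hypothetical fixed weights. The list `edge_signal` holds the weight × activation products on the hidden-to-output edges, the quantity the edge glow is described as showing. The sigmoid on the output unit is an assumption.

```python
import math

# Hypothetical fixed weights (the demo's values may differ)
W1 = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.2], [0.9, 0.1]]  # 4 hidden units x 2 inputs
B1 = [0.0, 0.1, -0.1, 0.05]
W2 = [0.7, -0.5, 0.4, 0.6]  # 1 output unit x 4 hidden units
B2 = -0.2

ACTIVATIONS = {
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
}

def forward(x, act="tanh"):
    f = ACTIVATIONS[act]
    # Hidden layer: h_j = f(sum_i W1[j][i] * x_i + B1[j])
    hidden = [f(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    # Weight x activation product on each hidden-to-output edge
    edge_signal = [w * h for w, h in zip(W2, hidden)]
    # Sigmoid output so the result reads as a probability
    out = 1.0 / (1.0 + math.exp(-(sum(edge_signal) + B2)))
    return hidden, edge_signal, out

for act in ACTIVATIONS:
    print(act, round(forward([0.5, -1.0], act)[2], 3))
```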
RNN Hidden States — step by step
An Elman RNN processes one token at a time. Each step combines the new input with the previous hidden state through tanh. Press Step and watch the state evolve.
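The update rule fits in a few lines. The weights, the 3-unit hidden size and the one-hot inputs are all hypothetical. The example shows that the same input token produces a different hidden state at different steps, because h(t-1) feeds back in.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b):
    # Elman update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_xh[j], x))
            + sum(w * hi for w, hi in zip(W_hh[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]

def run_rnn(inputs, W_xh, W_hh, b):
    h = [0.0] * len(b)  # initial hidden state h_0 = 0
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states

# Hypothetical weights: 2-dim one-hot inputs, 3 hidden units
W_xh = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
W_hh = [[0.2, 0.0, -0.1], [0.3, 0.4, 0.0], [0.0, -0.5, 0.6]]
b = [0.0, 0.0, 0.0]
tokens = [[1, 0], [0, 1], [1, 0]]  # e.g. "a b a" as one-hot vectors
for t, h in enumerate(run_rnn(tokens, W_xh, W_hh, b), start=1):
    print(t, [round(v, 3) for v in h])
```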
Self-Attention — visualised
Every token computes a Query, Key and Value vector. Each token's Query is compared, by dot product, with every Key. A softmax over those scores then decides how much of each Value to mix in. The heatmap below is computed from your sentence in the browser.
Attention heatmap (head 1)
How to read it
Row = the token doing the looking (query). Column = the token being looked at (key). A bright cell means "this row is paying a lot of attention to that column when forming its new representation."
Q · K · V
Transformer block — schematic
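Here is a sketch of single-head scaled dot-product attention, softmax(QKᵀ / √d_k)·V, as defined in the original Transformer paper. The random embeddings and projection matrices are placeholders for learned weights. `weights` is the matrix a heatmap like the one in this section would display.

```python
import math
import random

def matmul(A, B):
    # Plain-list matrix product: (n x m) @ (m x p) -> (n x p)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, Wq, Wk, Wv):
    # X has one row per token; Wq, Wk, Wv project tokens to queries, keys, values
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(Wk[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]  # row = query token, column = key token
    return weights, matmul(weights, V)

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Placeholder setup: 3 tokens with 4-dim embeddings, projected to d_k = 2
rng = random.Random(0)
X = random_matrix(rng, 3, 4)
weights, out = self_attention(X, random_matrix(rng, 4, 2),
                              random_matrix(rng, 4, 2), random_matrix(rng, 4, 2))
for row in weights:
    print([round(w, 2) for w in row])
```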
Byte-Pair Encoding — watch the merges
BPE, or a close variant of it, is how GPT-style models and Llama split text. BERT uses WordPiece, a closely related subword method. Start from characters, then greedily merge the most frequent adjacent pair. After enough merges, common words become single tokens and rare words split into known sub-pieces.
Learned merges (in order)
Final vocabulary
Tokenize new text
Why subwords? Whole-word vocabularies grow huge and can't handle unseen words (doomscroll, antivax). Character vocabularies are tiny, but sequences become very long. BPE strikes a balance: real-world vocabulary sizes typically range from about 30k to 100k or more.
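Below is a minimal BPE learner and tokenizer. It runs on the small word-frequency corpus used in Sennrich et al.'s BPE paper (`low`, `lower`, `newest`, `widest`). Two details are implementation choices here: the `</w>` end-of-word marker, and the tie-breaking rule, under which the first pair counted wins. Production tokenizers such as GPT-2's operate on bytes rather than characters.

```python
from collections import Counter

def merge_word(symbols, pair):
    # Replace each left-to-right, non-overlapping occurrence of pair with one symbol
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_freqs, num_merges):
    # Start from characters plus an end-of-word marker
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_word(symbols, best)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    # Replay the learned merges, in order, on a new word
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_word(symbols, pair)
    return list(symbols)

merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=10)
print(merges[:3])
print(apply_bpe("lowest", merges))
```

The unseen word "lowest" still tokenizes cleanly into pieces learned from other words, which is the point made in the "Why subwords?" note.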
N-gram Language Model
The original next-word predictor. Count how often each word follows the previous n−1 words, then sample from those distributions. Add-one (Laplace) smoothing gives every unseen n-gram a small non-zero probability.
Generated text
Next-word distribution after current context
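Here is a sketch of the simplest case, a bigram model (n = 2) with add-one smoothing. Whitespace tokenization and the function names are assumptions. A seeded `random.Random` makes the sampling repeatable.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    counts = defaultdict(Counter)  # counts[prev][next]
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    vocab = sorted(set(tokens))
    return counts, vocab

def prob(counts, vocab, prev, word):
    # Add-one (Laplace): (c(prev, word) + 1) / (c(prev) + |V|)
    c = counts.get(prev, Counter())
    return (c[word] + 1) / (sum(c.values()) + len(vocab))

def generate(counts, vocab, start, length, seed=0):
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        # Sample the next word from the smoothed distribution after the last word
        weights = [prob(counts, vocab, out[-1], w) for w in vocab]
        out.append(rng.choices(vocab, weights=weights)[0])
    return out

counts, vocab = train_bigram("the cat sat the cat ran".split())
print(prob(counts, vocab, "the", "cat"), generate(counts, vocab, "the", 5))
```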
Levenshtein Edit Distance
The minimum number of insertions, deletions and substitutions to turn one string into another. Solved with dynamic programming — the trace through the grid reveals the alignment.
DP table (red = substitute, blue = insert, green = delete on the optimal path)
Operations
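Below is the standard dynamic-programming solution with unit cost for all three operations; some variants charge 2 for a substitution. A back-trace then recovers one optimal alignment. Ties between equally good moves are broken in a fixed order, so other optimal paths may exist.

```python
def levenshtein(a, b):
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitute
    # Back-trace one optimal path from the bottom-right corner
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ops.append(("match" if a[i - 1] == b[j - 1] else "substitute", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", a[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", b[j - 1]))
            j -= 1
    return d[n][m], ops[::-1]

dist, ops = levenshtein("kitten", "sitting")
print(dist, [op for op in ops if op[0] != "match"])
```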
Topic Modeling — toy LDA
LDA assumes each document is a mixture of topics and each topic is a distribution over words. We use a simple co-occurrence + clustering heuristic to expose the same idea: documents get assigned to topics, topics surface their characteristic words.
Topics (top words by weight)
Document → topic mixture
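For contrast with the demo's heuristic, here is a sketch of actual LDA inference using collapsed Gibbs sampling, a different technique from the one the demo uses. The hyperparameters `alpha` and `beta`, the iteration count and the toy corpus are assumptions. With so little data, the topics found will vary with the seed.

```python
import random

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(k)]  # word counts per topic
    nk = [0] * k                       # total tokens per topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)  # random initial assignment
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][widx[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], widx[w]
                # Remove this token's current assignment from the counts
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                # Collapsed conditional: p(topic j) ∝ (n_dj + α)(n_jw + β) / (n_j + Vβ)
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    # Smoothed document-topic mixtures and the top words of each topic
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    top_words = [sorted(vocab, key=lambda w: -nkw[j][widx[w]])[:3] for j in range(k)]
    return theta, top_words

docs = [["ball", "goal", "team"], ["goal", "team", "match"],
        ["vote", "party", "election"], ["party", "vote", "law"]]
theta, top_words = lda_gibbs(docs, k=2)
print(top_words)
```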
Beam Search
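Greedy decoding picks the single best next token; beam search keeps the k best running sequences and prunes the rest at every step. A wider beam usually finds sequences with higher total probability, though not always better-reading text. The work per step grows linearly with the beam width, since each of the k beams is expanded over the vocabulary.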
Beam expansion tree
Final beams (sorted by log-probability)
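Below is a small beam search over a hypothetical hand-written next-token table. The table is built so that greedy decoding (k = 1) commits to `A`. It therefore misses the more probable sequence `B z`, which a beam of width 2 finds.

```python
import math

# Hypothetical next-token probabilities, keyed on the previous token only;
# the table covers exactly two decoding steps
MODEL = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 0.9, "w": 0.1},
}

def beam_search(k, steps):
    beams = [(0.0, ["<s>"])]  # (log-probability, token sequence)
    for _ in range(steps):
        candidates = [
            (score + math.log(p), seq + [tok])
            for score, seq in beams
            for tok, p in MODEL[seq[-1]].items()
        ]
        # Keep only the k highest-scoring sequences (sorted() is stable on ties)
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return beams

print(beam_search(k=1, steps=2))  # greedy: A x, probability 0.6 * 0.5
print(beam_search(k=2, steps=2))  # beam: B z, probability 0.4 * 0.9
```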
Extractive Summarization (TextRank-style)
No language model required. Build a sentence-similarity graph, run PageRank, take the top sentences. Surprisingly effective for news, less so for opinion or dialogue.
Ranked sentences
Generated summary
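Here is a sketch of the TextRank recipe. Sentence similarity is word overlap divided by the sum of the log sentence lengths, the measure from the TextRank paper. Weighted PageRank with damping 0.85 then ranks the sentences.

One behaviour is an added assumption: sentences with no edges spread their score evenly across all sentences, which keeps the scores summing to 1.

```python
import math
import re

def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def similarity(a, b):
    # Overlap / (log|a| + log|b|), guarded against empty and one-word sentences
    ta, tb = re.findall(r"[a-z']+", a.lower()), re.findall(r"[a-z']+", b.lower())
    if not ta or not tb:
        return 0.0
    denom = math.log(len(ta)) + math.log(len(tb))
    return len(set(ta) & set(tb)) / denom if denom > 0 else 0.0

def textrank(text, top_n=2, d=0.85, iters=50):
    sents = split_sentences(text)
    n = len(sents)
    W = [[similarity(sents[i], sents[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_w = [sum(row) for row in W]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Assumption: sentences with no edges spread their score evenly
        dangling = sum(s for s, w in zip(scores, out_w) if w == 0)
        scores = [
            (1 - d) / n
            + d * (sum(W[j][i] / out_w[j] * scores[j] for j in range(n) if out_w[j] > 0)
                   + dangling / n)
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    summary = [sents[i] for i in sorted(ranked[:top_n])]  # restore original order
    return ranked, scores, summary

text = ("Cats are small pets. Many people keep cats as pets. "
        "Cats and dogs are popular pets. The stock market fell today.")
print(textrank(text)[2])
```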
Span-based Question Answering
SQuAD-style QA finds the answer as a contiguous span inside a given passage. This toy version scores spans by word overlap and by proximity to question keywords. Like real start/end pointer networks, it outputs a (start, end) span, but it uses a far simpler model.
Answer
Top candidate spans
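Below is a toy version in the same spirit, where every heuristic is an assumption. It works in two steps:

1. Pick the sentence that shares the most keywords with the question.
2. Score each short span in that sentence by its inverse distance to the keywords. A span is a candidate only if it contains no question keywords and does not start or end on a stopword.

Real pointer networks learn their start and end scores instead.

```python
import re

STOP = {"the", "a", "an", "is", "was", "in", "of", "to", "and", "what", "when",
        "where", "who", "which", "how", "did", "does", "do"}

def words(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def answer(question, passage, max_len=3):
    keywords = {w for w in words(question) if w not in STOP}
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]
    # 1) Keep the sentence sharing the most keywords with the question
    best = max(sentences, key=lambda s: len(keywords & set(words(s))))
    toks = words(best)
    anchors = [i for i, t in enumerate(toks) if t in keywords]
    # 2) Score (start, end) spans that avoid keywords and don't start/end on stopwords
    candidates = []
    for start in range(len(toks)):
        for end in range(start, min(start + max_len, len(toks))):
            span = toks[start:end + 1]
            if span[0] in STOP or span[-1] in STOP or any(t in keywords for t in span):
                continue
            # Closer to the question's keywords means a higher score
            score = sum(1.0 / min(abs(a - start), abs(a - end)) for a in anchors)
            candidates.append((score, start, end, " ".join(span)))
    candidates.sort(reverse=True)
    return candidates[:3]

passage = "The Eiffel Tower is in Paris. Construction was completed in 1889."
print(answer("In what year was construction completed?", passage))
```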
BLEU & ROUGE — generation metrics
BLEU rewards n-gram precision against one or more references and penalizes outputs that are too short. ROUGE is recall-oriented. Both are crude approximations of human judgment, but they remain the default metrics in machine translation (BLEU) and summarization (ROUGE).
BLEU score
ROUGE-1 / ROUGE-2 / ROUGE-L
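Here are sentence-level sketches of both metrics, assuming whitespace tokenization:

- **BLEU:** clipped n-gram precision up to 4-grams, combined by geometric mean with the brevity penalty. It uses a single reference and no smoothing.
- **ROUGE-N:** n-gram recall.
- **ROUGE-L:** an F1 score based on the longest common subsequence (LCS), one of its common variants.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    # Sentence-level BLEU, single reference, no smoothing
    c, r = candidate.split(), reference.split()
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(r, n)
        clipped = sum(min(cnt, ref[g]) for g, cnt in cand.items())  # modified precision
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_p += math.log(clipped / total) / max_n  # geometric mean of p_1..p_N
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / len(c))  # brevity penalty
    return bp * math.exp(log_p)

def rouge_n(candidate, reference, n=1):
    # Recall: matched reference n-grams / all reference n-grams
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum(min(cnt, cand[g]) for g, cnt in ref.items())
    return overlap / max(sum(ref.values()), 1)

def lcs_len(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    # F1 of LCS-based precision and recall
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return 2 * p * rec / (p + rec)

ref = "the cat sat on the mat"
print(bleu("the cat sat", ref), rouge_n("the cat sat", ref), rouge_l("the cat sat", ref))
```

Unsmoothed sentence-level BLEU is 0 whenever any n-gram order has no match. Here, "the cat sat" has no 4-grams at all, which is why corpus-level BLEU or a smoothed variant is normally reported.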
Confusion Matrix & Per-class metrics
Live confusion matrix computed from your Naive Bayes classifier (above), evaluated on a fixed, held-out test set that the classifier never trains on. Watch precision, recall and F1 shift as you edit the training data.
Test set
Confusion matrix (predicted × actual)
Per-class metrics
Macro-average
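Below is a sketch of this panel's metrics, with the confusion matrix keyed `(predicted, actual)` to match the layout. The example labels are made up. Macro-averaging gives every class equal weight, however many test examples it has.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    # Keyed (predicted, actual) to match the "predicted x actual" layout
    return Counter(zip(predicted, actual))

def per_class_metrics(actual, predicted):
    cm = confusion_matrix(actual, predicted)
    labels = sorted(set(actual) | set(predicted))
    metrics = {}
    for c in labels:
        tp = cm[(c, c)]
        fp = sum(cm[(c, a)] for a in labels if a != c)  # predicted c, actually another class
        fn = sum(cm[(p, c)] for p in labels if p != c)  # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    # Macro-average: unweighted mean over classes of precision, recall and F1
    macro = tuple(sum(m[k] for m in metrics.values()) / len(labels) for k in range(3))
    return metrics, macro

actual = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam"]
print(per_class_metrics(actual, predicted))
```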
Open challenges in modern NLP
Even with foundation models, the field still grapples with these. Click any tile to read the core tension.
Course Flashcards
All question/answer pairs from your course's Anki deck. Click a card to flip it. Filter by tag, shuffle the deck, and track which ones you've nailed.