Interactive course companion

Natural Language Processing
— made tangible.

Every core concept from your NLP syllabus, distilled into a live demo you can break, replay, and inspect. Built as a static site you can host on GitHub Pages.

Interactive demos

Sessions covered

314

Flashcards loaded

Foundations

Text Preprocessing

Tokenize, stem, lemmatize, strip stopwords — all live.

Statistical NLP

TF-IDF & Retrieval

Rank documents against a query with cosine similarity.

Representations

Word Embeddings

Walk through a 2D vector space and try analogies.

Sequence labeling

POS Tagging (Viterbi)

Watch the trellis fill in word by word.

Deep learning

RNN Hidden States

Step a recurrent net through a sequence.

Modern NLP

Self-Attention

Render a real attention heatmap from your sentence.

Session 01 · Introduction

The NLP Pipeline

Almost every NLP system, from a spam filter to GPT, follows the same five stages. Click any stage to see what happens inside.

Data Collection
& Cleaning

→

Preprocessing

→

Feature
Engineering

→

Modeling

→

Monitoring
& Evaluation

Session 02 · History

From Rules to Foundation Models

Seven decades of NLP, condensed. Click any era to load its defining ideas, systems, and breakthroughs.

Session 03 · Fundamental Concepts

Text Preprocessing Playground

Type any text. Watch every preprocessing stage update live. Toggle steps on and off to feel what each one does.

① Sentence segmentation

② Tokenization

③ Normalized tokens

④ Final pipeline output

Stemming chops suffixes (jumping → jump) — fast but crude. Lemmatization uses a dictionary + POS to return real words (was → be, better → good).

Session 06 · Regular Expressions

Regex Playground

Regex is the workhorse of preprocessing, scraping, and tokenization. Try a pattern, see matches highlight live.
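As a rough sketch of the two jobs described above (tokenizing and highlighting matches), here is a small Python example using the standard `re` module. The token pattern and the helper names `tokenize` and `match_spans` are illustrative assumptions, not the playground's own code.

```python
import re

# Assumed toy pattern: a word (optionally with an apostrophe part) or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

def match_spans(pattern, text):
    """Return (start, end, matched_text) for every match, e.g. to drive highlighting."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

tokens = tokenize("Don't panic: it's only NLP!")
spans = match_spans(r"\d{4}", "Founded in 1998, relaunched in 2015.")
print(tokens)
print(spans)
```

The spans are character offsets, which is what a highlighter needs to wrap each match in the original string.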

Session 07 · Information Retrieval

TF-IDF & Cosine Similarity

Build a tiny search engine. Each document becomes a sparse vector weighted by how distinctive its words are; the query is scored by cosine similarity.
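The idea above can be sketched in a few lines of plain Python. This is a minimal version under stated assumptions: whitespace tokenization, length-normalized term frequency, and a smoothed idf of log((1 + N) / (1 + df)) + 1, which is one common variant among several. The function names and sample documents are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, plus the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumed variant): rare terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Score every document against the query; best match first."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(query.lower().split())
    q_len = sum(q_tf.values())
    # Query terms never seen in the collection are dropped.
    q_vec = {t: (c / q_len) * idf[t] for t, c in q_tf.items() if t in idf}
    return sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)

DOCS = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
ranking = rank("cat on a mat", DOCS)
print(ranking)
```

Note that "cats" in the second document does not match the query term "cat": without stemming or lemmatization, the two are different vector dimensions.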

Documents

Query

Ranked results

TF-IDF weight matrix (top terms)

Session 09 · Naive Bayes

Multinomial Naive Bayes Classifier

A spam classifier you can train in your browser. Watch the priors and per-word likelihoods update as you flip labels.
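For reference, the priors and per-word likelihoods can be computed with a short Python sketch like the one below. It assumes whitespace tokenization, add-one smoothing, and skipping words that never appeared in training; the training messages and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def log_posteriors(text, model):
    """Unnormalized log P(class) + sum of log P(word | class), with add-one smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:  # assumption: ignore out-of-vocabulary words
                score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return scores

TRAIN = [
    ("win cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("lunch meeting at noon", "ham"),
    ("see you at lunch", "ham"),
]
model = train_nb(TRAIN)
scores = log_posteriors("claim your free cash", model)
prediction = max(scores, key=scores.get)
print(prediction, scores)
```

Working in log space avoids underflow when many small likelihoods are multiplied together.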

Training data

Classify a new message

Decision math

Top discriminative words per class

Session 10 · Logistic Regression

Logistic Regression — see the decision boundary

The sigmoid squashes any score into a probability. Drag the weights and bias to move the boundary; click the canvas to add new points.

Left-click = ★ class · Shift-click = ▲ class · Right-click = remove

w₁ (x weight) 1.0

w₂ (y weight) -1.0

b (bias) 0.0

Sigmoid σ(z) = 1 / (1 + e⁻ᶻ)
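A minimal Python sketch of the sigmoid, the resulting class probability, and the average log loss follows. The default weights mirror the slider values shown above (w₁ = 1.0, w₂ = -1.0, b = 0.0); the function names and sample points are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Squash any real score into (0, 1). Fine for moderate z; very negative z can overflow."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, y, w1=1.0, w2=-1.0, b=0.0):
    """Probability of the positive class for the point (x, y)."""
    return sigmoid(w1 * x + w2 * y + b)

def log_loss(points, w1, w2, b):
    """Average binary cross-entropy over (x, y, label) triples, label in {0, 1}."""
    eps = 1e-12  # keep log() away from zero
    total = 0.0
    for x, y, label in points:
        p = min(max(predict_proba(x, y, w1, w2, b), eps), 1 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(points)

points = [(2.0, 0.0, 1), (0.0, 2.0, 0)]
loss = log_loss(points, 1.0, -1.0, 0.0)
print(predict_proba(2.0, 0.0), loss)
```

The decision boundary is the line where w₁x + w₂y + b = 0, which is exactly where the predicted probability equals 0.5.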

Loss—

Accuracy—

Session 11 · Sentiment Analysis

Lexicon-based Sentiment Scorer

A miniature VADER-style scorer. Each word carries a valence; negators flip, intensifiers amplify, and the sum tells you the mood.

—

Per-word contribution

Negation window: a "not" flips the sign of the next 3 tokens. Booster: words like very, absolutely multiply the next word's valence. This mirrors how real lexicon scorers handle context.
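The negation window and booster rules can be sketched as follows. The lexicon values, booster multipliers, and the choice to let a booster use up one slot of the negation window are all assumptions made for this illustration, not the demo's actual tables.

```python
# Toy lexicon and modifier tables (hypothetical values).
LEXICON = {"good": 2.0, "great": 3.0, "love": 3.0, "bad": -2.0, "terrible": -3.0}
NEGATORS = {"not", "never", "no"}
BOOSTERS = {"very": 1.5, "absolutely": 2.0}
NEGATION_WINDOW = 3  # a negator flips the next 3 tokens

def score(text):
    """Return (total, per-word contributions) for a whitespace-tokenized sentence."""
    contributions = []
    negate_left = 0  # tokens remaining inside the current negation window
    boost = 1.0      # multiplier applied to the next word only
    for tok in text.lower().split():
        if tok in NEGATORS:
            negate_left = NEGATION_WINDOW
            contributions.append((tok, 0.0))
            continue
        if tok in BOOSTERS:
            boost = BOOSTERS[tok]
            contributions.append((tok, 0.0))
            if negate_left > 0:
                negate_left -= 1  # the booster itself occupies one window slot
            continue
        valence = LEXICON.get(tok, 0.0) * boost
        boost = 1.0
        if negate_left > 0:
            valence = -valence
            negate_left -= 1
        contributions.append((tok, valence))
    return sum(v for _, v in contributions), contributions

total, parts = score("not very good")
print(total, parts)
```

Here "very" scales "good" from 2.0 to 3.0, and the preceding "not" flips the result to -3.0.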

Session 13 · Word Embeddings

Walk the Vector Space

A small pre-computed embedding projected into 2D. Click a word for its nearest neighbours, or run an analogy: king − man + woman ≈ ?

Nearest neighbours of king

Analogy

The 2D layout is a projection of 50-dim toy vectors. Distances on screen approximate cosine similarity — words with similar contexts cluster together.
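To make the analogy arithmetic concrete, here is a sketch with hand-made 3-dimensional vectors. These are not the demo's 50-dim vectors; they are invented so that the first axis loosely encodes royalty and the second gender. The helper names are also assumptions.

```python
import math

# Hypothetical toy vectors: [royalty, gender, food-ness].
VECTORS = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, -1.0, 0.0],
    "man":    [0.0, 1.0, 0.0],
    "woman":  [0.0, -1.0, 0.0],
    "prince": [1.0, 1.0, 0.3],
    "apple":  [0.0, 0.0, 1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(vector, exclude=(), k=3):
    """Words ranked by cosine similarity to `vector`, skipping excluded words."""
    scored = [(cosine(vector, v), w) for w, v in VECTORS.items() if w not in exclude]
    return sorted(scored, reverse=True)[:k]

def analogy(a, b, c):
    """Solve a - b + c ≈ ? and return the closest word not used in the query."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    return nearest(target, exclude={a, b, c}, k=1)[0][1]

answer = analogy("king", "man", "woman")
neighbours = nearest(VECTORS["king"], exclude={"king"}, k=2)
print(answer, neighbours)
```

Excluding the query words matters: with real embeddings, the nearest vector to king − man + woman is often "king" itself.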

Session 14 · POS Tagging

HMM Part-of-Speech Tagging — Viterbi Trellis

An HMM picks the single most probable tag sequence. The Viterbi algorithm fills a grid of best scores left-to-right, then back-traces the winning path.

Tagged output

Viterbi trellis (best log-prob to reach each state)

Tag set: DET NOUN VERB ADJ. Emission and transition probabilities are toy values you can tweak in app.js. Hover a cell to see its predecessor and the move that produced it.
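A compact Viterbi sketch over the same four tags is shown below. The start, transition, and emission probabilities are made-up toy values (not the ones in app.js), and unseen word/tag pairs fall back to a small assumed constant.

```python
import math

TAGS = ["DET", "NOUN", "VERB", "ADJ"]
# Hypothetical toy probabilities.
START = {"DET": 0.6, "NOUN": 0.2, "VERB": 0.1, "ADJ": 0.1}
TRANS = {
    "DET":  {"DET": 0.01, "NOUN": 0.60, "VERB": 0.01, "ADJ": 0.38},
    "NOUN": {"DET": 0.10, "NOUN": 0.20, "VERB": 0.60, "ADJ": 0.10},
    "VERB": {"DET": 0.50, "NOUN": 0.20, "VERB": 0.10, "ADJ": 0.20},
    "ADJ":  {"DET": 0.01, "NOUN": 0.80, "VERB": 0.09, "ADJ": 0.10},
}
EMIT = {
    "DET":  {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.1, "run": 0.1},
    "VERB": {"barks": 0.6, "runs": 0.3, "dog": 0.1},
    "ADJ":  {"big": 0.6, "small": 0.4},
}
UNK = 1e-6  # assumed emission probability for unseen (tag, word) pairs

def viterbi(words):
    """Return (best tag path, trellis of best log-probs per position and tag)."""
    lp = math.log
    trellis = [{t: lp(START[t]) + lp(EMIT[t].get(words[0], UNK)) for t in TAGS}]
    back = [{}]
    for word in words[1:]:
        prev_col, col, ptr = trellis[-1], {}, {}
        for t in TAGS:
            # Best predecessor tag for reaching tag t at this position.
            best_prev = max(TAGS, key=lambda p: prev_col[p] + lp(TRANS[p][t]))
            col[t] = prev_col[best_prev] + lp(TRANS[best_prev][t]) + lp(EMIT[t].get(word, UNK))
            ptr[t] = best_prev
        trellis.append(col)
        back.append(ptr)
    # Back-trace from the best final state.
    path = [max(TAGS, key=lambda t: trellis[-1][t])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return list(reversed(path)), trellis

tags, trellis = viterbi("the big dog barks".split())
print(tags)
```

Each trellis column stores, per tag, the best log-probability of any path ending in that tag, which is the quantity the grid cells display.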

Session 15 · Named Entity Recognition

Named Entity Recognition

A rule + gazetteer NER that finds PERSON, LOCATION, ORGANIZATION, DATE and MONEY. Type something; watch entities highlight.

Annotated output

Entity list

Session 16 · Neural Networks

A Tiny Feed-Forward Network

Two inputs, a hidden layer of 4 units, one output. Drag the inputs and watch activations flow through. Edges glow with the size of the weight × activation product.
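A forward pass through a network of this shape can be sketched in plain Python. The weights below are arbitrary illustrative values, the output unit is assumed to be linear, and `edge_products` is a hypothetical name for the weight × activation values that would drive the edge glow.

```python
import math

def relu(z):
    return max(0.0, z)

def forward(x, W1, b1, W2, b2, activation=math.tanh):
    """2 inputs -> 4 hidden units -> 1 linear output (the linear output is an assumption)."""
    hidden = [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    output = sum(w * h for w, h in zip(W2, hidden)) + b2
    # weight x input product for every input-to-hidden edge
    edge_products = [[w * xi for w, xi in zip(row, x)] for row in W1]
    return hidden, output, edge_products

# Arbitrary illustrative weights.
W1 = [[0.8, -0.5], [0.3, 0.9], [-0.7, 0.2], [0.1, -0.4]]
b1 = [0.0, 0.1, -0.1, 0.0]
W2 = [0.6, -0.3, 0.5, 0.9]
b2 = 0.05

x = [0.5, -0.3]  # the default slider values shown above
hidden, output, edges = forward(x, W1, b1, W2, b2)
hidden_relu, _, _ = forward(x, W1, b1, W2, b2, activation=relu)
print(hidden, output)
```

Swapping `activation` between `math.tanh` and `relu` changes only the hidden activations; the edge products depend on the inputs and weights alone.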

Input x₁ 0.5

Input x₂ -0.3

Activation function

Session 18 · Beyond RNNs

RNN Hidden States — step by step

An Elman RNN processes one token at a time. Each step combines the new input with the previous hidden state through tanh. Press Step and watch the state evolve.
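One way to write the update is h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b). The sketch below steps that recurrence over a three-token sequence; the weight values and 2-dimensional inputs are assumptions chosen only to keep the arithmetic readable.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One Elman step: combine the new input with the previous hidden state, then tanh."""
    h = []
    for i in range(len(h_prev)):
        z = b[i]
        z += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        z += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(z))
    return h

# Hypothetical toy parameters: 2-dim inputs, 2-dim hidden state.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.6]]
b = [0.0, 0.1]

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per token
h = [0.0, 0.0]                                  # initial hidden state
states = []
for x in inputs:
    h = rnn_step(x, h, W_xh, W_hh, b)
    states.append(h)
print(states)
```

The same weights are reused at every step; only the hidden state carries information forward.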

Session 19 · Transformers (Attention Is All You Need)

Self-Attention — visualised

Every token computes a Query, Key and Value vector. Scaled dot products between Q and K, passed through a softmax, decide how much of each Value to mix in. The heatmap below is computed from your sentence in the browser.

Attention heatmap (head 1)

How to read it

Row = the token doing the looking (query). Column = the token being looked at (key). A bright cell means "this row is paying a lot of attention to that column when forming its new representation."
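A single attention head can be sketched in plain Python as softmax(QKᵀ / √d_k) · V. The token embeddings and identity projection matrices below are assumptions chosen to keep the example checkable by hand; a trained model would learn W_q, W_k and W_v.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def self_attention(X, W_q, W_k, W_v):
    """Return (attention weights, new token representations) for one head."""
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
        for q_row in Q
    ]
    weights = [softmax(row) for row in scores]  # row = query token, column = key token
    return weights, matmul(weights, V)

# Hypothetical 2-dim embeddings for three tokens; identity projections for readability.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
weights, output = self_attention(X, I, I, I)
print(weights)
```

Each row of `weights` sums to 1, so every row of the heatmap is a probability distribution over the tokens being looked at.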

Q · K · V

Transformer block — schematic

Input embedding + positional encoding

↓

Multi-head self-attention

↓

Add & LayerNorm

↓

Feed-forward (MLP)

↓

Add & LayerNorm

↓

Output → next layer

Modern tokenization · Beyond Session 03

Byte-Pair Encoding — watch the merges

BPE is how GPT and Llama split text, and BERT's WordPiece tokenizer is a close relative. Start from characters, then greedily merge the most frequent adjacent pair. After enough merges, common words become single tokens, while rare words split into known sub-pieces.

Number of merges 10

Learned merges (in order)

Final vocabulary

Tokenize new text

Why subwords? Whole-word vocabularies blow up and can't handle unseen words (doomscroll, antivax). Character vocabularies are tiny but sequences become very long. BPE strikes the balance — typical real-world vocab sizes are 30k–100k.
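The merge loop can be sketched as follows. This is a simplified character-level version with an assumed `</w>` end-of-word marker and a deterministic tie-break; production tokenizers typically work on bytes and apply merges by rank. The corpus and function names are illustrative.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in a symbol tuple with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Greedily learn up to `num_merges` merges from a list of words."""
    corpus = Counter(tuple(w) + ("</w>",) for w in words)  # word -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken by comparing the pairs themselves (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges, corpus

def tokenize(word, merges):
    """Apply the learned merges, in order, to a new word."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

WORDS = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges, corpus = learn_bpe(WORDS, 10)
print(merges)
print(tokenize("lowest", merges))
```

The unseen word "lowest" is never in the training list, yet it still tokenizes into pieces learned from "low" and the "-est" words.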

Statistical NLP · Pre-neural language modelling

N-gram Language Model

The original next-word predictor. Count how often each word follows the previous n−1 words, then sample from those distributions. Add-one (Laplace) smoothing gives unseen continuations a small non-zero probability.
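A minimal sketch of counting, smoothing and sampling is shown below. The tiny corpus, the function names, and the use of a seeded `random.Random` for reproducible sampling are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def train_ngrams(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def next_word_probs(counts, context, vocab):
    """Add-one smoothed distribution over the vocabulary for a given context."""
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values()) + len(vocab)
    return {w: (seen[w] + 1) / total for w in vocab}

def generate(counts, seed, vocab, n, steps, rng):
    """Sample `steps` more words; the seed must contain at least n-1 tokens."""
    out = list(seed)
    for _ in range(steps):
        context = out[-(n - 1):] if n > 1 else []
        probs = next_word_probs(counts, context, vocab)
        words = sorted(probs)
        out.append(rng.choices(words, weights=[probs[w] for w in words])[0])
    return out

tokens = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(tokens))
counts = train_ngrams(tokens, 2)  # bigram model
probs = next_word_probs(counts, ["the"], vocab)
text = generate(counts, ["the"], vocab, 2, 5, random.Random(0))
print(probs)
print(" ".join(text))
```

After "the", the corpus contains "cat" twice and "mat" once, so with six vocabulary words the smoothed probabilities are 3/9 and 2/9, and every unseen continuation gets 1/9.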

n = seed:

Generated text

Next-word distribution after current context

Foundational · Used in spell-check, MT eval, DNA alignment

Levenshtein Edit Distance

The minimum number of insertions, deletions and substitutions to turn one string into another. Solved with dynamic programming — the trace through the grid reveals the alignment.
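The dynamic program and its back-trace can be sketched as below, assuming unit cost for all three operations. The function names and the order in which ties are resolved during the back-trace are choices made for this example; other optimal alignments may exist.

```python
def edit_table(source, target):
    """dp[i][j] = edit distance between source[:i] and target[:j]."""
    rows, cols = len(source) + 1, len(target) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete everything
    for j in range(cols):
        dp[0][j] = j  # insert everything
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete
                dp[i][j - 1] + 1,         # insert
                dp[i - 1][j - 1] + cost,  # match or substitute
            )
    return dp

def backtrace(source, target, dp):
    """Walk from the bottom-right cell to the origin and list the operations in order."""
    ops, i, j = [], len(source), len(target)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            ops.append(("match", source[i - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("substitute", source[i - 1] + "->" + target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", source[i - 1]))
            i -= 1
        else:
            ops.append(("insert", target[j - 1]))
            j -= 1
    return list(reversed(ops))

dp = edit_table("kitten", "sitting")
ops = backtrace("kitten", "sitting", dp)
print(dp[-1][-1], ops)
```

The bottom-right cell holds the distance; the number of non-match operations on the traced path always equals it.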

source: target:

DP table (red = substitute, blue = insert, green = delete on the optimal path)

Operations

Notebook 07 · Latent Dirichlet Allocation

Topic Modeling — toy LDA

LDA assumes each document is a mixture of topics and each topic is a distribution over words. We use a simple co-occurrence + clustering heuristic to expose the same idea: documents get assigned to topics, topics surface their characteristic words.

K (number of topics) =

Topics (top words by weight)

Document → topic mixture

Notebook 20 · Text Generation

Beam Search

Greedy decoding picks the single best next token; beam search keeps the k best running sequences and prunes at every step. A wider beam usually finds higher-probability sequences, at a cost that grows roughly linearly with the beam width.
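The difference between greedy decoding and beam search shows up even on a tiny hand-built model. In the sketch below, `MODEL` is a hypothetical table of next-token probabilities keyed by the previous token, constructed so that the locally best first word leads to a worse overall sequence.

```python
import math

# Hypothetical next-token distributions, keyed by the previous token.
MODEL = {
    "<s>": {"the": 0.5, "a": 0.4, "dog": 0.1},
    "the": {"dog": 0.4, "cat": 0.35, "mat": 0.25},
    "a":   {"cat": 0.9, "dog": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
    "mat": {"</s>": 1.0},
}

def beam_search(model, beam_width, steps, start="<s>", end="</s>"):
    """Keep the `beam_width` highest log-probability sequences after every step."""
    beams = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == end:
                candidates.append((logp, seq))  # finished sequences carry over unchanged
                continue
            for token, p in model[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + [token]))
        # Prune: sort by score and keep only the best `beam_width` candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams

greedy = beam_search(MODEL, beam_width=1, steps=3)
beam = beam_search(MODEL, beam_width=2, steps=3)
print(greedy[0])
print(beam[0])
```

With width 1 the search commits to "the" (0.5) and ends at probability 0.2; with width 2 it keeps "a" alive and finds "a cat" at 0.36.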

beam width = seed: steps:

Beam expansion tree

Final beams (sorted by log-probability)

Notebook 21 · Text Summarization

Extractive Summarization (TextRank-style)

No language model required. Build a sentence-similarity graph, run PageRank, take the top sentences. Surprisingly effective for news, less so for opinion or dialogue.
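The three steps above can be sketched in plain Python. This version assumes Jaccard word overlap as the sentence similarity (the original TextRank paper uses a length-normalized overlap instead), a damping factor of 0.85, and a fixed number of power iterations; the sentences are invented.

```python
def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank over a weighted sentence-similarity graph."""
    n = len(sentences)
    words = [set(s.lower().split()) for s in sentences]
    # Edge weight = Jaccard overlap between the two sentences' word sets (assumed measure).
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = words[i] | words[j]
            if i != j and union:
                sim[i][j] = len(words[i] & words[j]) / len(union)
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

def summarize(sentences, k):
    """Take the k top-ranked sentences and return them in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

SENTENCES = [
    "the storm hit the coast on monday",
    "the storm damaged homes along the coast",
    "officials said the storm flooded coast roads",
    "my favourite recipe uses fresh basil",
]
scores = textrank(SENTENCES)
summary = summarize(SENTENCES, 2)
print(scores)
print(summary)
```

The off-topic fourth sentence shares no words with the others, so it receives only the baseline (1 − damping) / n score and never makes the summary.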

summary length: 3 sentences

Ranked sentences

Generated summary

Notebook 22 · Question Answering

Span-based Question Answering

SQuAD-style QA finds the answer as a contiguous span inside a given passage. This toy version scores spans by word-overlap and proximity to question keywords — same shape as real start/end pointer networks, far simpler model.

Question:

Answer

—

Top candidate spans

Evaluation · MT, summarization, captioning

BLEU & ROUGE — generation metrics

BLEU rewards precision of n-grams against a reference. ROUGE rewards recall. Both are crude approximations of human judgment, but they're the default benchmarks in MT and summarization.
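The precision/recall contrast can be made concrete with a simplified sentence-level sketch. It assumes a single reference, n-grams up to 2 rather than the usual 4, no smoothing, and ROUGE-N as plain recall; standard implementations differ in these details, and the function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_overlap(cand, ref, n):
    """Return (clipped matches, candidate n-gram count, reference n-gram count)."""
    c, r = ngrams(cand, n), ngrams(ref, n)
    matches = sum(min(count, r[g]) for g, count in c.items())
    return matches, sum(c.values()), sum(r.values())

def bleu(cand, ref, max_n=2):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matches, total, _ = clipped_overlap(cand, ref, n)
        if matches == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_sum += math.log(matches / total) / max_n
    # Candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_sum)

def rouge_n_recall(cand, ref, n=1):
    """Fraction of reference n-grams that the candidate recovers."""
    matches, _, ref_total = clipped_overlap(cand, ref, n)
    return matches / ref_total if ref_total else 0.0

ref = "the cat sat on the mat".split()
short = "the cat".split()
repeat = "the the the the".split()
print(bleu(short, ref), rouge_n_recall(short, ref))
print(clipped_overlap(repeat, ref, 1), rouge_n_recall(repeat, ref))
```

Clipping is why "the the the the" earns a unigram precision of only 2/4: the reference contains "the" twice, so only two of the four copies count.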

Candidate:

Reference:

BLEU score

—

ROUGE-1 / ROUGE-2 / ROUGE-L

Evaluation · Classification

Confusion Matrix & Per-class metrics

Live confusion matrix computed from your Naive Bayes classifier (above) — fixed test set, no peeking at training labels. Watch precision/recall/F1 shift as you edit the training data.
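Per-class precision, recall and F1 all fall out of the confusion-matrix counts. The sketch below keys the matrix by (predicted, actual) to match the layout named below; the labels and predictions are invented test data, and the function names are assumptions.

```python
from collections import Counter

def confusion(actual, predicted):
    """Counts keyed by (predicted label, actual label)."""
    return Counter(zip(predicted, actual))

def per_class_metrics(actual, predicted):
    cm = confusion(actual, predicted)
    labels = sorted(set(actual) | set(predicted))
    metrics = {}
    for label in labels:
        tp = cm[(label, label)]
        fp = sum(cm[(label, a)] for a in labels if a != label)  # predicted label, was something else
        fn = sum(cm[(p, label)] for p in labels if p != label)  # was label, predicted something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

def macro_f1(metrics):
    """Unweighted mean of per-class F1, so small classes count as much as large ones."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)

actual    = ["spam", "spam", "spam", "spam", "ham", "ham"]
predicted = ["spam", "spam", "spam", "ham",  "ham", "spam"]
metrics = per_class_metrics(actual, predicted)
print(metrics)
print(macro_f1(metrics))
```

Accuracy here is 4/6, but the macro average exposes that the smaller "ham" class is handled noticeably worse than "spam".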

Test set

Confusion matrix (predicted × actual)

Per-class metrics

Macro-average

Sessions 24–25 · Challenges & Summary

Open challenges in modern NLP

Even with foundation models, the field still grapples with these. Click any tile to read the core tension.

Study mode · powered by your nlp.tsv

Course Flashcards

All question/answer pairs from your course's Anki deck. Click the card to flip. Filter by tag, shuffle, and track which ones you've nailed.

