Machine learning · interactive demo

When every $1 matters.

A thousand synthetic credit-card transactions, about 1.5% of them fraudulent, scored by a calibrated model. Drag the threshold and watch the confusion matrix, precision, recall, and the live alert stream update in real time.


Threshold dashboard

Synthetic dataset, deterministic across reloads. The "model" is a calibrated scorer with realistic overlap — frauds tend to score high, legits low, but the two distributions overlap enough that no threshold is perfect.
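Below is a minimal sketch of how a dataset like this could be generated, assuming each class draws its scores from its own Beta distribution. The demo's actual generator isn't described, so the function name `make_transactions`, the Beta parameters, and the seed are all hypothetical, and these scores are not calibrated. The fixed seed is what keeps the data identical on every reload.

```python
import random


def make_transactions(n=1000, fraud_rate=0.015, seed=42):
    """Return n (score, is_fraud) pairs with overlapping score distributions.

    Hypothetical generator: per-class Beta distributions stand in for the
    demo's undocumented model, and the scores are NOT calibrated.
    """
    rng = random.Random(seed)            # fixed seed: same data on every run
    n_fraud = round(n * fraud_rate)      # 15 of 1,000 at the defaults
    labels = [True] * n_fraud + [False] * (n - n_fraud)
    rng.shuffle(labels)
    data = []
    for is_fraud in labels:
        if is_fraud:
            score = rng.betavariate(5, 2)    # frauds skew high (mean ~0.71)
        else:
            score = rng.betavariate(1.5, 8)  # legits skew low (mean ~0.16)
        data.append((score, is_fraud))
    return data


transactions = make_transactions()
print(len(transactions), sum(is_fraud for _, is_fraud in transactions))  # 1000 15
```

Fixing the fraud count exactly, rather than flipping a biased coin per row, keeps the class balance at precisely 1.5% instead of merely close to it.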

Threshold: 0.50. Raise it to flag fewer transactions (precision↑); lower it to flag more (recall↑).
Precision: of the transactions flagged, the share that's truly fraud
Recall: of all fraud, the share you caught
F1: harmonic mean of precision and recall
Accuracy: misleading on imbalanced data
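These four cards reduce to simple ratios over the confusion-matrix counts. Here is a minimal sketch; the function name `threshold_metrics` and the convention of reporting 0.0 when a ratio is undefined (for example, precision when nothing is flagged) are assumptions, not necessarily what the dashboard does.

```python
def threshold_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts.

    Undefined ratios (zero denominator) are reported as 0.0, one common convention.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged, share truly fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # of all fraud, share caught
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of P and R
    accuracy = (tp + tn) / (tp + fp + fn + tn)      # share of all calls that were right
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Flag nothing on 1,000 transactions with 15 frauds:
print(threshold_metrics(tp=0, fp=0, fn=15, tn=985))
# {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'accuracy': 0.985}
```

A model that never flags anything scores 98.5% accuracy while catching zero fraud. That is exactly why the accuracy card carries a warning.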

Confusion matrix

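Rows are the true label, columns the prediction; each cell shows a live count at the current threshold.
Truth legit, predicted legit: true negative
Truth legit, predicted fraud: false alarm
Truth fraud, predicted legit: missed fraud
Truth fraud, predicted fraud: caught fraud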
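Filling the four cells takes one pass over the scored transactions. The sketch below assumes the data arrives as hypothetical `(score, is_fraud)` pairs and that a transaction is flagged when its score is greater than or equal to the threshold. The demo may treat the boundary differently.

```python
def confusion_matrix(data, threshold):
    """Count (caught, false alarm, missed, true negative) at a threshold.

    data: iterable of (score, is_fraud) pairs. Assumes score >= threshold means "flag".
    """
    caught = false_alarm = missed = true_negative = 0
    for score, is_fraud in data:
        flagged = score >= threshold
        if flagged and is_fraud:
            caught += 1          # true positive
        elif flagged:
            false_alarm += 1     # false positive
        elif is_fraud:
            missed += 1          # false negative
        else:
            true_negative += 1   # true negative
    return caught, false_alarm, missed, true_negative


sample = [(0.9, True), (0.6, False), (0.3, True), (0.1, False)]
print(confusion_matrix(sample, 0.5))  # (1, 1, 1, 1)
```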

ROC curve

Each point is a different threshold. The marker is your current pick.

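The panel reports the AUC (area under the ROC curve), a single threshold-independent summary of how well the scores rank fraud above legit.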
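One way to trace such a curve is to treat every distinct score as a candidate threshold and plot the false-positive rate against the true-positive rate. AUC is then the area under those points, computed with the trapezoid rule. This is a minimal, deliberately unoptimized sketch (quadratic in the number of transactions, which is fine for 1,000). The function names are hypothetical, and it assumes both classes are present.

```python
def roc_points(data):
    """(false-positive rate, true-positive rate) for every distinct threshold.

    data: list of (score, is_fraud) pairs containing at least one of each class.
    """
    positives = sum(1 for _, is_fraud in data if is_fraud)
    negatives = len(data) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in sorted({score for score, _ in data}, reverse=True):
        tp = sum(1 for s, y in data if y and s >= t)
        fp = sum(1 for s, y in data if not y and s >= t)
        points.append((fp / negatives, tp / positives))
    return points  # ends at (1.0, 1.0): the lowest threshold flags everything


def auc(points):
    """Area under a piecewise-linear curve via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


sample = [(0.9, True), (0.6, False), (0.3, True), (0.1, False)]
print(auc(roc_points(sample)))  # 0.75
```

The 0.75 matches the pairwise reading of AUC: of the four (fraud, legit) pairs in the sample, the fraud outscores the legit in three.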

All 1,000 transactions, projected

Two abstract feature axes. Frauds tend to cluster in the upper right and legits in the lower left. Each marker's outline shows what the current threshold predicts for that transaction.

Live transaction stream

Reading the dashboard

Why threshold is the whole game.

A binary classifier doesn't give you an answer; it gives you a score between 0 and 1. Someone has to decide where to draw the line between "legit" and "fraud", and that line is the threshold.

On a balanced dataset, 0.5 is a reasonable default. On imbalanced data it usually isn't. In real credit-card fraud, positives are well under 1% of all transactions (this demo uses about 1.5%), so a calibrated model's scores pile up near zero: few transactions ever reach 0.5, and a 0.5 cutoff can miss much of the fraud. Lower the threshold and you catch more fraud (recall climbs), but you also raise more false alarms (precision falls). Raise it and you flag almost nothing: precision looks great, but real fraud slips through.
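The trade-off can be reproduced on a toy list of scored transactions. This is a minimal sketch with made-up scores, not the demo's data, and the function name `sweep` is hypothetical. As the threshold drops, recall can only rise or stay flat. Precision usually falls, though it is not guaranteed to fall at every step.

```python
def sweep(data, thresholds):
    """Return (threshold, n_flagged, precision, recall) for each threshold."""
    total_fraud = sum(1 for _, is_fraud in data if is_fraud)
    rows = []
    for t in thresholds:
        flagged = [is_fraud for score, is_fraud in data if score >= t]
        caught = sum(flagged)  # True counts as 1
        precision = caught / len(flagged) if flagged else 0.0
        recall = caught / total_fraud if total_fraud else 0.0
        rows.append((t, len(flagged), precision, recall))
    return rows


# Made-up scores: 3 frauds among 8 transactions.
toy = [(0.95, True), (0.70, True), (0.65, False), (0.40, True),
       (0.35, False), (0.20, False), (0.15, False), (0.05, False)]
for t, n, p, r in sweep(toy, [0.9, 0.5, 0.3, 0.1]):
    print(f"t={t:.2f}  flagged={n}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, dropping the threshold from 0.9 to 0.1 takes recall from 0.33 to 1.00, while precision slides from 1.00 to 0.43.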

The right threshold is a business decision, not a statistical one: how much does a missed fraud cost compared with a false alarm that annoys a customer? The dashboard above lets you feel that trade-off rather than read about it. Watch the confusion matrix as you drag, and notice that accuracy barely moves across most of the range. With about 1.5% fraud, a model that flags nothing is already about 98.5% accurate, which is why we don't use accuracy on this problem.
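One way to turn that business question into a number is to assign a cost to each kind of mistake and pick the threshold that minimizes total cost over the data. This is a minimal sketch. The dollar figures and the function name `cheapest_threshold` are hypothetical, and cost minimization is one common approach, not necessarily how the demo or any particular team sets its threshold.

```python
def cheapest_threshold(data, cost_missed, cost_false_alarm):
    """Return (threshold, total_cost) minimizing cost over the given data.

    Candidates are every distinct score (flag score >= t) plus infinity (flag nothing).
    """
    candidates = sorted({score for score, _ in data}) + [float("inf")]
    best = None
    for t in candidates:
        missed = sum(1 for s, y in data if y and s < t)
        false_alarms = sum(1 for s, y in data if not y and s >= t)
        cost = cost_missed * missed + cost_false_alarm * false_alarms
        if best is None or cost < best[1]:  # ties keep the lower threshold
            best = (t, cost)
    return best


# Made-up scores: 3 frauds among 8 transactions.
toy = [(0.95, True), (0.70, True), (0.65, False), (0.40, True),
       (0.35, False), (0.20, False), (0.15, False), (0.05, False)]
# Hypothetical costs: a missed fraud costs $100, a false alarm $10.
print(cheapest_threshold(toy, cost_missed=100, cost_false_alarm=10))  # (0.4, 10)
# Make false alarms expensive and the best threshold moves up.
print(cheapest_threshold(toy, cost_missed=1, cost_false_alarm=100))   # (0.7, 1)
```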

What the metrics mean here