Machine Learning · Classification

Predicting term-deposit subscriptions

A supervised-learning study on the UCI Bank Marketing dataset: which clients of a Portuguese bank will subscribe to a term deposit after a phone campaign?

The problem

Marketing calls are expensive, so the bank wants to focus on the clients most likely to say yes. We frame this as binary classification: from a client's profile and campaign history, predict y ∈ {no, yes}.

One field, duration (call length), is dropped: it is only known after the call ends, so it would leak the outcome. Keeping it is a classic mistake on this dataset. Everything below uses only information available before a call is placed.
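As a minimal sketch of that step, assuming the data is loaded into a pandas DataFrame: the three rows below are made-up stand-ins, and LEAKY_COLUMNS is a hypothetical name, not something taken from the project's code.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the UCI Bank Marketing schema.
df = pd.DataFrame({
    "age": [41, 29, 55],
    "job": ["admin.", "technician", "retired"],
    "duration": [210, 95, 640],  # call length, only known once the call is over
    "y": ["no", "no", "yes"],
})

# Hypothetical constant: features that are not available before a call is placed.
LEAKY_COLUMNS = ["duration"]

# Features exclude both the leaky column and the target itself.
X = df.drop(columns=LEAKY_COLUMNS + ["y"])
# Encode the target as 1 for "yes" and 0 for "no".
y = (df["y"] == "yes").astype(int)

print(list(X.columns))
```

Dropping the column before any split or fit means no later step can pick it up by accident.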

Class balance

The target is imbalanced: only about one client in nine subscribes. We therefore judge models on ROC-AUC and on recall/F1 for the positive class, not raw accuracy (predicting "no" for everyone would already score ~89%).
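The arithmetic behind that baseline can be checked in a few lines. The counts below are illustrative only (one subscriber per nine clients, matching the rough ratio quoted above), not the dataset's actual totals.

```python
# Illustrative counts: 1 "yes" for every 8 "no", i.e. about one client in nine.
n_yes, n_no = 1, 8

# A model that always answers "no" gets every "no" right and every "yes" wrong.
accuracy = n_no / (n_yes + n_no)
recall_yes = 0 / n_yes  # it never finds a single subscriber

print(f"accuracy={accuracy:.3f}, recall(yes)={recall_yes:.1f}")
# prints: accuracy=0.889, recall(yes)=0.0
```

High accuracy with zero recall on the class the bank cares about is why accuracy alone is not used to rank models here.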

Target distribution
Subscription outcome across the full dataset.

Models & results

Three classifiers share one preprocessing pipeline (standardised numerics + one-hot categoricals), trained on a stratified 75/25 split.
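One way such a shared pipeline and split could look, sketched with scikit-learn. The library choice is an assumption, the rows are synthetic, and LogisticRegression is only a placeholder, since the three classifiers are not named here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in rows; the real features come from the Bank Marketing data.
df = pd.DataFrame({
    "age": [25, 38, 47, 52, 31, 44, 29, 60, 35, 41, 58, 33],
    "campaign": [1, 3, 2, 1, 4, 2, 1, 1, 5, 2, 1, 3],
    "job": ["admin.", "services", "retired", "admin.", "student", "services",
            "student", "retired", "admin.", "services", "retired", "student"],
    "y": [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0],
})
X, y = df.drop(columns="y"), df["y"]

numeric_cols = ["age", "campaign"]
categorical_cols = ["job"]

# Standardise numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# stratify=y keeps the yes/no ratio the same on both sides of the 75/25 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder classifier: any estimator can sit behind the same `preprocess` step.
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
model.fit(X_train, y_train)

# Predicted probability of "yes" for each held-out client.
proba_yes = model.predict_proba(X_test)[:, 1]
```

Putting the preprocessing inside the Pipeline means the scaler and encoder are fitted on the training split only, so no test-set statistics leak into training.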

Model | ROC-AUC | Accuracy | Precision (yes) | Recall (yes) | F1 (yes)
ROC curves
ROC curves for all three models.
Confusion matrix
Confusion matrix for the best model.
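For reference, the metrics named in the results table can be computed as in the sketch below. It assumes scikit-learn's metrics module and uses eight hand-made predictions rather than the project's real outputs; the 0.5 threshold is likewise an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-made example: 6 "no" clients (0) and 2 "yes" clients (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
p_yes = [0.10, 0.20, 0.30, 0.40, 0.60, 0.20, 0.70, 0.45]  # predicted P(yes)

# Hard labels at an assumed 0.5 threshold.
y_pred = [int(p >= 0.5) for p in p_yes]

# ROC-AUC uses the scores directly: 11 of the 12 (yes, no) pairs are ranked correctly.
auc = roc_auc_score(y_true, p_yes)

# The remaining metrics use the thresholded labels; pos_label=1 targets the "yes" class.
acc = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, pos_label=1)
recall = recall_score(y_true, y_pred, pos_label=1)
f1 = f1_score(y_true, y_pred, pos_label=1)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(f"AUC={auc:.3f} acc={acc:.2f} P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
print(cm)
```

In this toy case accuracy is 0.75 while precision, recall and F1 for "yes" are all 0.5, which shows how accuracy can look better than the positive-class metrics on imbalanced data.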

What drives the prediction

Permutation importance measures how much test ROC-AUC drops when a feature is shuffled; here it is computed on the best model. Macro-economic indicators and the timing and outcome of prior contact dominate, rather than the client's demographics.

Feature importance
Top features by permutation importance.
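Permutation importance as described above can be sketched with scikit-learn's permutation_importance. The two features here are synthetic ("signal" drives the label, "noise" does not), and the model is a placeholder rather than the project's best model.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
signal = rng.normal(size=n)  # synthetic feature that determines the label
noise = rng.normal(size=n)   # synthetic feature unrelated to the label
X = np.column_stack([signal, noise])
y = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

# Shuffle one column at a time on the test set and record the drop in ROC-AUC.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)

for name, mean_drop in zip(["signal", "noise"], result.importances_mean):
    print(f"{name}: mean ROC-AUC drop = {mean_drop:.3f}")
```

Shuffling the informative column should cost far more ROC-AUC than shuffling the unrelated one. Measuring on the held-out split means the ranking reflects what the model relies on for unseen clients.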

Reproduce it

Clone the repo, then pip install -r requirements.txt and python src/train.py to regenerate every figure and the metrics on this page. The walk-through lives in analysis.ipynb.