MortgagePredict — Built From Scratch

What it is

Prediction with consequences attached

On the surface it's a textbook supervised problem: take a table of applicant and loan attributes (income, loan amount, debt-to-income (DTI) ratio, credit score, employment, term) and predict approval. The pipeline is a scikit-learn ColumnTransformer (scale numerics, one-hot encode categoricals) feeding a logistic-regression or gradient-boosting classifier. On a held-out test set the logistic-regression model reaches 0.787 accuracy and 0.866 ROC-AUC.

But the headline isn't that number. The data carries a deliberate, measurable bias against one group. The model never sees the protected attribute — and is still biased, because bias leaks through correlated features. This build measures that with fairness metrics implemented from their definitions, explains every prediction, and shows a mitigation that closes the gap at a cost you can argue about.

why?

The question every prediction must answer: in lending, an unexplained or discriminatory decision isn't just bad practice, it's often illegal.

Note: the dataset is synthetic. It comes from a realistic generator with an injected group bias, because no clean public mortgage file could be downloaded without authentication. It is labelled synthetic throughout the code and README.
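A generator of this kind can be sketched in a few lines. The version below is a hypothetical stand-in, not the project's actual generator: the function names, the feature distributions and the `group_penalty` parameter are all assumptions chosen for illustration. It injects bias in two ways: a direct penalty on group B's approval odds, and a proxy effect through lower average credit scores.

```python
import math
import random


def make_synthetic_applications(n, seed=0, group_penalty=1.5):
    """Return n synthetic applications with an injected bias against group B.

    Hypothetical sketch: distributions and coefficients are illustrative only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = "A" if rng.random() < 0.5 else "B"
        # Proxy feature: group B draws from a lower credit-score distribution,
        # so a group-blind model can still pick up the group signal.
        credit_score = rng.gauss(700 if group == "A" else 670, 50)
        dti = min(max(rng.gauss(0.35, 0.10), 0.05), 0.80)
        # Legitimate signal: better credit and lower DTI raise approval odds.
        logit = (credit_score - 680) / 40 - 4 * (dti - 0.35)
        # Injected bias: a direct penalty on the historical decision.
        if group == "B":
            logit -= group_penalty
        p_approve = 1 / (1 + math.exp(-logit))
        rows.append({
            "group": group,
            "credit_score": round(credit_score),
            "dti": round(dti, 3),
            "approved": int(rng.random() < p_approve),
        })
    return rows


def approval_rate(rows, group):
    """Share of approved applications within one group."""
    outcomes = [r["approved"] for r in rows if r["group"] == group]
    return sum(outcomes) / len(outcomes)
```

Comparing `approval_rate(rows, "A")` with `approval_rate(rows, "B")` on a few thousand rows gives the raw gap that the audit later has to detect in the model's predictions.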

The stack

From application row to trustworthy decision

Accuracy, then the things that make accuracy usable.

data

Applicant features

Income, loan size, DTI, credit score, employment and term — plus a protected group attribute, with a measurable bias baked in.

prep

ColumnTransformer

StandardScaler on numerics, OneHotEncoder on categoricals. The protected attribute is excluded — the model is group-blind.

model

LogReg + Gradient boosting

A transparent linear model and a strong tabular learner, scored on accuracy, ROC-AUC and a confusion matrix.
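The three scores named here map directly onto scikit-learn's metric functions. The helper below is a sketch with a hypothetical name (`score_model`) and an assumed 0.5 decision threshold; the four-row example is toy data, not project output.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score


def score_model(y_true, proba, threshold=0.5):
    """Accuracy and confusion matrix at one threshold, plus threshold-free ROC-AUC."""
    y_pred = [int(p >= threshold) for p in proba]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # ROC-AUC is computed from the scores, not the thresholded decisions.
        "roc_auc": roc_auc_score(y_true, proba),
        # Rows are true classes, columns are predicted classes, in order [0, 1].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]).tolist(),
    }


# Toy example: one positive (score 0.35) falls below the threshold.
report = score_model([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Accuracy depends on the chosen threshold while ROC-AUC does not, which is why the two are reported side by side.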

audit

Fairness metrics

Demographic-parity difference, equal-opportunity (TPR) gap and disparate-impact ratio — coded from their definitions.
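Written out from their definitions, the three metrics need nothing beyond counting. This is a sketch with hypothetical function names. The sign conventions are assumptions: differences are taken as group A minus group B and the ratio as group B over group A, which matches a reported ratio below 1 but is not confirmed by the source.

```python
def selection_rate(y_pred, groups, group):
    """P(prediction = 1 | group): share of a group's applicants that are approved."""
    picked = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(picked) / len(picked)


def true_positive_rate(y_true, y_pred, groups, group):
    """P(prediction = 1 | label = 1, group): approvals among truly creditworthy."""
    hits = [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1]
    return sum(hits) / len(hits)


def fairness_report(y_true, y_pred, groups, privileged="A", unprivileged="B"):
    """Demographic-parity difference, equal-opportunity gap, disparate-impact ratio."""
    sr_priv = selection_rate(y_pred, groups, privileged)
    sr_unpriv = selection_rate(y_pred, groups, unprivileged)
    tpr_priv = true_positive_rate(y_true, y_pred, groups, privileged)
    tpr_unpriv = true_positive_rate(y_true, y_pred, groups, unprivileged)
    return {
        "demographic_parity_difference": sr_priv - sr_unpriv,
        "equal_opportunity_gap": tpr_priv - tpr_unpriv,
        "disparate_impact_ratio": sr_unpriv / sr_priv,
    }


# Toy data: group A is approved 3 times out of 4, group B once out of 4.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
audit = fairness_report(y_true, y_pred, groups)
```

On the toy data the selection rates are 0.75 and 0.25, so the parity difference is 0.5 and the disparate-impact ratio is about 0.33, well below the 0.8 four-fifths line.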

explain

Feature attribution

Global importance (coefficients + permutation) and a signed, per-applicant breakdown of why a score came out as it did.
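For the logistic model, a signed per-applicant breakdown can be read straight off the linear form: each feature contributes its coefficient times its standardised value to the log-odds. The sketch below assumes that approach; the function name, coefficients and applicant values are invented for illustration, and the gradient-boosting model would need a different attribution method.

```python
import math


def explain_linear_decision(coefficients, intercept, standardized_features):
    """Split a logistic-regression score into signed per-feature contributions.

    Each contribution is coefficient * standardised value, in log-odds units.
    """
    contributions = {
        name: coefficients[name] * standardized_features[name]
        for name in coefficients
    }
    logit = intercept + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))
    # Largest absolute effect first, keeping the sign for the explanation.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {"probability": probability, "logit": logit, "contributions": ranked}


# Hypothetical coefficients and one hypothetical applicant (values are z-scores).
coefs = {"credit_score": 1.2, "dti": -0.8, "income": 0.5}
applicant = {"credit_score": -1.0, "dti": 1.0, "income": 0.2}
explanation = explain_linear_decision(coefs, intercept=-0.1, standardized_features=applicant)
```

Here the below-average credit score and above-average DTI both push the log-odds down, and the slightly above-average income pushes it up by a smaller amount. The intercept plus the contributions sum exactly to the model's logit.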

fix

Bias mitigation

Per-group decision thresholds that close the parity gap — and surface the accuracy / equal-opportunity tradeoff honestly.
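One simple way to pick per-group thresholds is to fix a target selection rate and, within each group, set the threshold at the score of the last applicant who should be approved. The sketch below assumes that approach; the source does not say how the thresholds were actually chosen, and all function names are hypothetical.

```python
def threshold_for_rate(scores, target_rate):
    """Threshold such that approving score >= threshold selects about target_rate.

    Tied scores at the cut-off can push the realised rate slightly above target.
    """
    ranked = sorted(scores, reverse=True)
    k = round(target_rate * len(ranked))
    if k <= 0:
        return float("inf")  # approve nobody
    return ranked[k - 1]


def fit_group_thresholds(scores, groups, target_rate):
    """One threshold per group, each tuned to the same target selection rate."""
    return {
        g: threshold_for_rate([s for s, gg in zip(scores, groups) if gg == g], target_rate)
        for g in sorted(set(groups))
    }


def apply_group_thresholds(scores, groups, thresholds):
    """Approve an applicant when their score clears their own group's threshold."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Toy scores: a single 0.5 cut-off approves 3 of 4 in group A but 2 of 4 in group B.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
thresholds = fit_group_thresholds(scores, groups, target_rate=0.5)
decisions = apply_group_thresholds(scores, groups, thresholds)
```

With a 50% target, group A's threshold rises to 0.8 and group B's stays at 0.5, so both groups end up with two approvals out of four. The cost is that a group A applicant scoring 0.7 is now declined, which is where the accuracy and equal-opportunity tradeoff comes from.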

Architecture

How a prediction is made — and justified

Each application flows through a pipeline built for scrutiny, not just scoring:

Prepare
Scale numerics and one-hot encode categoricals via a ColumnTransformer; drop the protected attribute.
Train
Fit logistic-regression and gradient-boosting classifiers on the outcomes.
Audit
Measure demographic parity, equal-opportunity gap and disparate impact between the protected groups.
Explain
Attribute the model's behaviour globally, and trace each individual decision to the features that drove it.
Mitigate
Apply per-group thresholds to close the parity gap — and report what it costs.

Results

Real numbers from `train.py`

25% stratified hold-out. Synthetic data, but these are genuine model outputs — nothing here is invented.

accuracy

0.787

LogisticRegression on the test set (GradientBoosting: 0.780).

roc-auc

0.866

Ranking quality of the logistic model (GradientBoosting: 0.860).

raw bias

0.297

Approval-rate gap baked into the data: 0.642 for group A vs 0.345 for group B.

Fairness, before → after mitigation

Demographic-parity difference: +0.130 → +0.001. Per-group thresholds equalise selection rates almost exactly.
Disparate-impact ratio: 0.769 → 0.999. Before, it fails the EEOC four-fifths (0.8) rule; after, it passes.
Equal-opportunity gap: −0.018 → −0.133, and accuracy 0.787 → 0.766. The catch, stated plainly: chasing demographic parity worsens the TPR gap and costs accuracy. When base rates differ, group thresholds that equalise selection rates will generally not equalise true-positive rates as well.
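The tension can be made explicit with a short decomposition. The notation is introduced here for illustration and is not taken from the code: p_g is the base rate of truly approvable applicants in group g, and s_g is the model's selection rate for that group.

```latex
% p_g = P(Y = 1 | G = g): base rate;  s_g = P(\hat{Y} = 1 | G = g): selection rate
\begin{align*}
s_g &= \mathrm{TPR}_g \, p_g + \mathrm{FPR}_g \, (1 - p_g), \qquad g \in \{A, B\} \\
\text{Demographic parity:} \quad s_A &= s_B \\
\text{Equal opportunity:} \quad \mathrm{TPR}_A &= \mathrm{TPR}_B = T \\
\text{Both together require:} \quad
\mathrm{FPR}_B \, (1 - p_B) - \mathrm{FPR}_A \, (1 - p_A) &= T \, (p_A - p_B)
\end{align*}
```

When the base rates differ and T is positive, the right-hand side is nonzero, so the false-positive terms have to differ between groups to compensate. With a single score per applicant, moving a group's threshold shifts its TPR and FPR in the same direction, so the threshold that matches selection rates is generally not the one that matches true-positive rates. That is consistent with the TPR gap widening once parity is enforced.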

Reflection

What building it taught me

Group-blind ≠ fair. Dropping the protected attribute wasn't enough: the model still showed a 0.13 parity gap and failed the four-fifths rule, because bias rides in on correlated features.
Fairness must be measured, not assumed. A 0.79 accuracy hid serious disparate impact; you only see it when you compute demographic parity, the TPR gap and disparate impact explicitly.
Explainability is a requirement, not a nicety. In lending, "the model said so" is not an acceptable reason — so every prediction decomposes into signed, named contributions.
There is no free lunch in fairness. Equalising selection rates worsened the equal-opportunity gap and cost accuracy. When base rates differ, the definitions conflict — the honest move is to surface the tradeoff, not bury it.
Deployment changes the brief. The goal isn't the best leaderboard number — it's a model you can stand behind to a regulator and an applicant.