From-Scratch Build · Machine Learning

MortgagePredict

Will this loan be approved, and should it be? MortgagePredict learns to forecast mortgage outcomes from applicant and loan features, but accuracy is only half the story. I rebuilt it from scratch because lending is where machine learning meets real consequences, and a model you can't explain or trust to be fair has no business making the call.

Python · scikit-learn · Logistic + Gradient boosting · Fairness metrics · Explainability · Bias mitigation

What it is

Prediction with consequences attached

On the surface it's a textbook supervised problem: take a table of applicant and loan attributes (income, loan amount, debt-to-income ratio or DTI, credit score, employment, term) and predict approval. The pipeline is a scikit-learn ColumnTransformer (scale numerics, one-hot encode categoricals) feeding a logistic-regression or gradient-boosting classifier. On a held-out test set the logistic model reaches 0.787 accuracy and 0.866 ROC-AUC.

But the headline isn't that number. The data carries a deliberate, measurable bias against one group. The model never sees the protected attribute and is still biased, because bias leaks through correlated features. This build measures that with fairness metrics implemented from their definitions, explains every prediction, and shows a mitigation that closes the parity gap at a cost you can argue about.

why?
the question every prediction must answer — in lending, an unexplained or discriminatory decision isn't just bad practice, it's often illegal.

Note: the dataset is synthetic. It comes from a realistic generator with an injected group bias, because no clean public mortgage dataset can be downloaded without authentication. It is labelled synthetic throughout the code and README.

The stack

From application row to trustworthy decision

Accuracy, then the things that make accuracy usable.

data

Applicant features

Income, loan size, DTI, credit score, employment and term — plus a protected group attribute, with a measurable bias baked in.

prep

ColumnTransformer

StandardScaler on numerics, OneHotEncoder on categoricals. The protected attribute is excluded — the model is group-blind.

model

LogReg + Gradient boosting

A transparent linear model and a strong tabular learner, scored on accuracy, ROC-AUC and a confusion matrix.
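As a rough illustration of the prep and model steps, here is a minimal sketch of a group-blind pipeline with a stratified 25% hold-out. The column names, the `make_toy_data` helper and the toy data itself are assumptions made up for this example; they are not the project's schema or its synthetic generator, and the numbers this prints will not match the results reported on this page.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, chosen for illustration only.
NUMERIC = ["income", "loan_amount", "dti", "credit_score", "term"]
CATEGORICAL = ["employment"]
PROTECTED = "group"
TARGET = "approved"


def make_toy_data(n=400, seed=0):
    """Tiny stand-in table so the sketch runs; not the project's generator."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "income": rng.normal(70_000, 20_000, n),
        "loan_amount": rng.normal(250_000, 80_000, n),
        "dti": rng.uniform(0.1, 0.6, n),
        "credit_score": rng.normal(680, 60, n),
        "term": rng.choice([15, 30], n),
        "employment": rng.choice(["salaried", "self_employed", "contract"], n),
        "group": rng.choice(["A", "B"], n),
    })
    score = (df["credit_score"] - 680) / 60 - 4 * (df["dti"] - 0.35)
    df["approved"] = (score + rng.normal(0, 1, n) > 0).astype(int)
    return df


def build_pipeline():
    # remainder="drop" is the default, so unlisted columns never reach the model.
    prep = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("prep", prep), ("clf", LogisticRegression(max_iter=1000))])


df = make_toy_data()
X = df.drop(columns=[TARGET, PROTECTED])  # group-blind: protected attribute removed
y = df[TARGET]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_pipeline().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # probability of approval
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("roc-auc: ", roc_auc_score(y_test, proba))
print(confusion_matrix(y_test, pred))
```

Swapping `LogisticRegression` for `GradientBoostingClassifier` in `build_pipeline` would give the second model; the protected column is kept aside in `df` so it can still be used for auditing.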

audit

Fairness metrics

Demographic-parity difference, equal-opportunity (TPR) gap and disparate-impact ratio — coded from their definitions.
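The three metrics are short enough to write down directly from their definitions. The sketch below is one possible way to do it, not necessarily how the project's code is organised; the function names and the convention of treating group A as the reference group are assumptions for this example.

```python
import numpy as np


def approval_rate(y_pred, mask):
    """Share of applicants selected by `mask` that the model approves."""
    return float(np.mean(y_pred[mask]))


def true_positive_rate(y_true, y_pred, mask):
    """Among applicants in `mask` whose true label is 1, the share approved."""
    positives = mask & (y_true == 1)
    return float(np.mean(y_pred[positives]))


def fairness_report(y_true, y_pred, group, reference="A", other="B"):
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    ref, oth = group == reference, group == other
    rate_ref = approval_rate(y_pred, ref)
    rate_oth = approval_rate(y_pred, oth)
    tpr_ref = true_positive_rate(y_true, y_pred, ref)
    tpr_oth = true_positive_rate(y_true, y_pred, oth)
    return {
        # Difference in approval rates between the two groups.
        "demographic_parity_diff": rate_ref - rate_oth,
        # Difference in true-positive rates (equal opportunity).
        "equal_opportunity_gap": tpr_ref - tpr_oth,
        # Ratio of approval rates; 1.0 means parity.
        "disparate_impact_ratio": rate_oth / rate_ref,
    }


# Hand-checkable toy example: four applicants per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
report = fairness_report(y_true, y_pred, group)
print(report)
```

In the toy example group A is approved 3 times out of 4 and group B once out of 4, so the parity difference is 0.5 and the impact ratio is one third; both qualified A applicants are approved against one of two qualified B applicants, so the equal-opportunity gap is also 0.5.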

explain

Feature attribution

Global importance (coefficients + permutation) and a signed, per-applicant breakdown of why a score came out as it did.
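For the logistic model, a signed per-applicant breakdown can be read straight off the linear form: each feature contributes its coefficient times its standardized value to the log-odds. The sketch below shows that idea alongside the two global views, on made-up numeric data with hypothetical feature names; the gradient-boosting model would need a different attribution method, which is not shown here.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["income", "dti", "credit_score"]  # hypothetical subset of columns

# Made-up data: approval rises with income and credit score, falls with DTI.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
signal = 0.5 * X[:, 0] - 1.0 * X[:, 1] + 1.5 * X[:, 2]
y = (signal + rng.normal(0, 0.5, 500) > 0).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
scaler, clf = model[0], model[1]

# Global view 1: coefficients on standardized inputs are directly comparable.
coefs = dict(zip(FEATURES, clf.coef_[0]))

# Global view 2: score drop when one feature's values are shuffled.
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
perm_importance = dict(zip(FEATURES, perm.importances_mean))


def explain_row(row):
    """Signed contribution of each feature to one applicant's log-odds."""
    z = scaler.transform(row.reshape(1, -1))[0]
    contributions = clf.coef_[0] * z
    log_odds = float(clf.intercept_[0] + contributions.sum())
    return dict(zip(FEATURES, contributions)), log_odds


contribs, log_odds = explain_row(X[0])
for name, value in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>12}: {value:+.3f}")
print(f"{'log-odds':>12}: {log_odds:+.3f}")
```

The contributions plus the intercept add up exactly to the model's log-odds for that applicant, which is what makes the breakdown a faithful account of the score and not an approximation.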

fix

Bias mitigation

Per-group decision thresholds that close the parity gap — and surface the accuracy / equal-opportunity tradeoff honestly.
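One simple way to realise per-group thresholds is to pick, for each group, the score cut-off that approves the same share of applicants. The sketch below does that with quantiles on simulated scores; the target rate (the overall approval rate under a single 0.5 threshold), the function names and the score distributions are all assumptions for illustration, and the project may select its thresholds differently.

```python
import numpy as np


def threshold_for_rate(scores, target_rate):
    """Score cut-off that approves roughly `target_rate` of these applicants."""
    return float(np.quantile(scores, 1.0 - target_rate))


def fit_group_thresholds(scores, group, target_rate):
    scores, group = np.asarray(scores), np.asarray(group)
    return {
        g: threshold_for_rate(scores[group == g], target_rate)
        for g in np.unique(group)
    }


def apply_group_thresholds(scores, group, thresholds):
    scores, group = np.asarray(scores), np.asarray(group)
    cutoffs = np.array([thresholds[g] for g in group])
    return (scores >= cutoffs).astype(int)


def parity_gap(decisions, group):
    return abs(decisions[group == "A"].mean() - decisions[group == "B"].mean())


# Simulated model scores: group B is scored lower on average.
rng = np.random.default_rng(0)
group = np.repeat(["A", "B"], 1000)
logits = np.where(group == "A", 0.5, -0.3) + rng.normal(0, 1, 2000)
scores = 1.0 / (1.0 + np.exp(-logits))

single = (scores >= 0.5).astype(int)           # one threshold for everyone
target = float(single.mean())                  # keep overall approval volume
thresholds = fit_group_thresholds(scores, group, target)
adjusted = apply_group_thresholds(scores, group, thresholds)

print("gap, single threshold:   ", round(parity_gap(single, group), 3))
print("gap, per-group threshold:", round(parity_gap(adjusted, group), 3))
print("thresholds:", thresholds)
```

Equalising approval rates this way does nothing to guarantee equal true-positive rates or unchanged accuracy, which is why those need to be re-measured afterwards. It also requires knowing each applicant's group at decision time, which is a design and legal question in its own right.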

Architecture

How a prediction is made — and justified

Each application flows through a pipeline built for scrutiny, not just scoring:

  1. Prepare

    Scale numerics and one-hot encode categoricals via a ColumnTransformer; drop the protected attribute.

  2. Train

    Fit logistic-regression and gradient-boosting classifiers on the approval outcomes.

  3. Audit

    Measure demographic parity, equal-opportunity gap and disparate impact across the protected group.

  4. Explain

    Attribute the model's behaviour globally, and trace each individual decision to its driving features.

  5. Mitigate

    Apply per-group thresholds to close the parity gap — and report what it costs.

Results

Real numbers from train.py

25% stratified hold-out. Synthetic data, but these are genuine model outputs — nothing here is invented.

accuracy

0.787

LogisticRegression on the test set (GradientBoosting: 0.780).

roc-auc

0.866

Ranking quality of the logistic model (GradientBoosting: 0.860).

raw bias

0.297

Approval-rate gap baked into the data: 0.642 for group A vs 0.345 for group B.
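The raw-bias figure follows directly from the two approval rates, and the same two numbers also give the disparate-impact ratio of the data itself. The short check below only re-derives values from the rates quoted above; these are rates in the labels, not model predictions.

```python
# Approval rates reported for the raw data (labels, not model predictions).
rate_a = 0.642
rate_b = 0.345

parity_gap = rate_a - rate_b      # demographic-parity difference
impact_ratio = rate_b / rate_a    # disparate-impact ratio

print(f"parity gap:   {parity_gap:.3f}")    # 0.297
print(f"impact ratio: {impact_ratio:.3f}")  # 0.537
```

For context, a commonly cited rule of thumb (the "four-fifths rule") treats an impact ratio below 0.8 as a warning sign, so the injected bias is well inside the range an audit should catch.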

Fairness, before → after mitigation

Reflection

What building it taught me