From-Scratch Build · Machine Learning
Will this loan be approved — and should it be? MortgagePredict learns to forecast mortgage outcomes from applicant and loan features, but the accuracy is only half the story. I rebuilt it from scratch because lending is where machine learning meets real consequences, and a model you can't explain or trust to be fair has no business making the call.
What it is
On the surface it's a textbook supervised problem: take a table of applicant and loan attributes (income, loan amount, DTI, credit score, employment, term) and predict approval. The pipeline is a scikit-learn ColumnTransformer (scale numerics, one-hot encode categoricals) feeding a logistic-regression or gradient-boosting classifier. On a held-out test set the logistic model reaches 0.787 accuracy and 0.866 ROC-AUC.
But the headline isn't that number. The data carries a deliberate, measurable bias against one group. The model never sees the protected attribute — and is still biased, because bias leaks through correlated features. This build measures that with fairness metrics implemented from their definitions, explains every prediction, and shows a mitigation that closes the gap at a cost you can argue about.
Note: the dataset is synthetic, produced by a realistic generator with an injected group bias, because no clean public mortgage file could be downloaded without authentication. It is labelled synthetic throughout the code and README.
The stack
Accuracy, then the things that make accuracy usable.
Income, loan size, DTI, credit score, employment and term — plus a protected group attribute, with a measurable bias baked in.
StandardScaler on numerics, OneHotEncoder on categoricals. The protected attribute is excluded — the model is group-blind.
A transparent linear model and a strong tabular learner, scored on accuracy, ROC-AUC and a confusion matrix.
Demographic-parity difference, equal-opportunity (TPR) gap and disparate-impact ratio — coded from their definitions.
Global importance (coefficients + permutation) and a signed, per-applicant breakdown of why a score came out as it did.
Per-group decision thresholds that close the parity gap — and surface the accuracy / equal-opportunity tradeoff honestly.
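For the signed, per-applicant breakdown, one common approach with a linear model is to multiply each coefficient by the applicant's standardized feature value, which gives that feature's contribution to the log-odds relative to an average applicant. The sketch below assumes that approach; the text does not say which method the build uses, and the feature subset and toy data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
features = ["income", "dti", "credit_score"]  # hypothetical subset of the real columns
X = np.column_stack([
    rng.normal(70_000, 20_000, 500),
    rng.uniform(0.1, 0.6, 500),
    rng.normal(680, 50, 500),
])
# Placeholder labels so there is something to fit (not the project's generator).
latent = (X[:, 2] - 680) / 50 - 5 * (X[:, 1] - 0.35) + rng.normal(0, 1, 500)
y = (latent > 0).astype(int)

scaler = StandardScaler().fit(X)
Z = scaler.transform(X)
clf = LogisticRegression(max_iter=1000).fit(Z, y)

def explain(z_row):
    """Signed log-odds contribution of each feature for one standardized applicant."""
    contributions = clf.coef_[0] * z_row
    return dict(zip(features, contributions))

applicant = Z[0]
parts = explain(applicant)
# Intercept plus contributions reproduces the model's own log-odds.
log_odds = clf.intercept_[0] + sum(parts.values())

for name, value in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>12}: {value:+.3f}")
print(f"{'intercept':>12}: {clf.intercept_[0]:+.3f}")
print(f"{'log-odds':>12}: {log_odds:+.3f}")
```

Because the contributions sum exactly to the decision function, the breakdown is faithful for the logistic model; a gradient-boosting model would need a different attribution method.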
Architecture
Each application flows through a pipeline built for scrutiny, not just scoring:
Scale numerics and one-hot encode categoricals via a ColumnTransformer; drop the protected attribute.
Fit logistic-regression and gradient-boosting classifiers on the outcomes.
Measure demographic parity, equal-opportunity gap and disparate impact across the protected group.
Attribute the model globally, and each individual decision to its driving features.
Apply per-group thresholds to close the parity gap — and report what it costs.
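The final step above can be sketched as follows. This is one simple way to pick per-group thresholds, assuming the goal is equal approval rates at roughly the same overall approval volume; the score distributions are invented for the example and the build's actual threshold search may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
group = rng.choice(["A", "B"], n)
# Hypothetical model scores: group B skews lower, mimicking bias leaked via proxies.
scores = np.clip(rng.normal(np.where(group == "A", 0.60, 0.45), 0.18), 0.0, 1.0)

def approval_rates(decisions, group):
    """Share of approved applicants within each group."""
    return {g: float(decisions[group == g].mean()) for g in np.unique(group)}

# Baseline: one threshold for everyone.
baseline = scores >= 0.5
target = float(baseline.mean())  # keep overall approval volume roughly unchanged

# Per-group threshold: the (1 - target) quantile of that group's own scores,
# so about `target` of each group lands at or above its cutoff.
thresholds = {
    g: float(np.quantile(scores[group == g], 1 - target)) for g in np.unique(group)
}
cutoffs = np.array([thresholds[g] for g in group])
mitigated = scores >= cutoffs

before = approval_rates(baseline, group)
after = approval_rates(mitigated, group)
gap_before = abs(before["A"] - before["B"])
gap_after = abs(after["A"] - after["B"])
print("thresholds:", thresholds)
print(f"parity gap before={gap_before:.3f} after={gap_after:.3f}")
```

Equalizing approval rates this way moves some decisions away from the score-optimal cutoff, so accuracy and the true-positive-rate gap should be recomputed on labelled data after applying the thresholds; that is the cost the step refers to.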
Results
train.py · 25% stratified hold-out. Synthetic data, but these are genuine model outputs; nothing here is invented.
Accuracy 0.787: LogisticRegression on the test set (GradientBoosting: 0.780).
ROC-AUC 0.866: ranking quality of the logistic model (GradientBoosting: 0.860).
Approval-rate gap of 0.297 baked into the data: 0.642 for group A vs 0.345 for group B.
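The approval-rate gap is a demographic-parity difference computed on the labels, and the same definitions apply to model predictions. The sketch below writes the three fairness metrics in plain Python and then applies the arithmetic to the two rates quoted above. The function names, the toy arrays and the choice of group A as the reference group are assumptions for illustration.

```python
def rate(values):
    """Mean of a list of 0/1 values."""
    return sum(values) / len(values)

def selection_rate(y_pred, group, g):
    """P(prediction = 1 | group = g)."""
    return rate([p for p, gi in zip(y_pred, group) if gi == g])

def true_positive_rate(y_true, y_pred, group, g):
    """P(prediction = 1 | label = 1, group = g)."""
    return rate([p for t, p, gi in zip(y_true, y_pred, group) if gi == g and t == 1])

def demographic_parity_difference(y_pred, group, a="A", b="B"):
    return selection_rate(y_pred, group, a) - selection_rate(y_pred, group, b)

def equal_opportunity_gap(y_true, y_pred, group, a="A", b="B"):
    return (true_positive_rate(y_true, y_pred, group, a)
            - true_positive_rate(y_true, y_pred, group, b))

def disparate_impact_ratio(y_pred, group, a="A", b="B"):
    # Ratio of the lower-rate group to the reference group (A assumed as reference).
    return selection_rate(y_pred, group, b) / selection_rate(y_pred, group, a)

# Tiny hand-checkable example (invented data).
y_true = [1, 1, 0, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

dp = demographic_parity_difference(y_pred, group)   # 0.75 - 0.25 = 0.5
eo = equal_opportunity_gap(y_true, y_pred, group)   # 1.0 - 0.5 = 0.5
di = disparate_impact_ratio(y_pred, group)          # 0.25 / 0.75
print(dp, eo, round(di, 3))

# The same arithmetic on the approval rates reported for the data.
rate_a, rate_b = 0.642, 0.345
print(round(rate_a - rate_b, 3))  # 0.297
print(round(rate_b / rate_a, 3))  # 0.537
```

A ratio of about 0.54 sits well below the 0.8 level that the four-fifths rule of thumb commonly treats as a warning sign.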
Reflection