BCSAI · AI: Computer Vision · Group Project

Knowing when a driver stops paying attention.

A real-time fatigue detection system that watches for closing eyes, yawns, and a drooping head — and only switches on once the driver performs the right gesture sequence. Classical computer vision and a deep CNN, working together.

● ALERT

Eye Aspect Ratio — 0.31

A toy illustration of the eye-aspect-ratio signal — the real system runs at video frame rate.

01 — Overview

Two ways of seeing, combined into one verdict.

The system simulates an in-car monitoring scenario using a webcam or recorded video. It detects fatigue with classical computer-vision heuristics and with a trained deep-learning eye classifier. The recommended setup is a hybrid mode that runs both together, so each covers the other's blind spots.

02 — How it works

Three detection pipelines.

CLASSICAL

Geometry of a tired face

Eye aspect ratio for eye closure
Yawning detection from mouth opening
Head-pose estimation for nodding off
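
As an illustration of the eye-closure signal, here is a minimal sketch of the commonly used eye-aspect-ratio formula, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), computed from six eye-contour landmarks. The landmark ordering, the 0.2 threshold, the 15-frame window, and the names eye_aspect_ratio and ClosureCounter are assumptions for this example, not values taken from the project.

```python
import math


def eye_aspect_ratio(eye):
    """Return the EAR for six (x, y) landmarks ordered p1..p6.

    Assumed ordering: p1 and p4 are the eye corners, p2/p3 lie on the
    upper lid, and p6/p5 lie directly below them on the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


class ClosureCounter:
    """Report closure only after EAR stays low for several frames in a row.

    A single low frame is usually just a blink, so the counter resets
    whenever the eye reopens.
    """

    def __init__(self, threshold=0.2, min_frames=15):  # assumed values
        self.threshold = threshold
        self.min_frames = min_frames
        self.run = 0

    def update(self, ear):
        self.run = self.run + 1 if ear < self.threshold else 0
        return self.run >= self.min_frames


# Toy landmarks for an open eye: 6 units wide, 2 units tall.
open_eye = [(0, 0), (2, 1), (4, 1), (6, 0), (4, -1), (2, -1)]
ear = eye_aspect_ratio(open_eye)
```

The ratio falls toward zero as the lids close and is roughly invariant to how far the face is from the camera, which is why a fixed threshold can work as a first approximation.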

DEEP LEARNING

A CNN that reads eyes

CNN eye classifier — open vs. closed
Trained on the MRL Eye Dataset
Robust where geometry alone is noisy

HYBRID ★

The recommended mode

Fuses classical + deep-learning outputs
Fewer false alarms, fewer misses
Default for the live demo workflow
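
The page does not say how the two outputs are fused, so the following is only one plausible sketch: map the eye aspect ratio to a 0..1 "closed" score and average it with the CNN's closed-eye probability. The function names, the EAR calibration points (0.15 closed, 0.30 open), and the equal weighting are all hypothetical.

```python
def ear_to_closed_score(ear, closed_ear=0.15, open_ear=0.30):
    """Map EAR linearly to [0, 1], where 1 means 'clearly closed'.

    closed_ear and open_ear are assumed calibration points.
    """
    score = (open_ear - ear) / (open_ear - closed_ear)
    return min(1.0, max(0.0, score))


def fused_closed_score(ear, p_closed, weight_dl=0.5):
    """Weighted average of the CNN probability and the classical score."""
    classical = ear_to_closed_score(ear)
    return weight_dl * p_closed + (1.0 - weight_dl) * classical


def eyes_closed(ear, p_closed, decision_threshold=0.5):
    """Final per-frame verdict from the fused score."""
    return fused_closed_score(ear, p_closed) >= decision_threshold
```

With a soft average like this, one pipeline has to be fairly confident to outvote the other, which is one way a hybrid can trade off false alarms against misses.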

03 — Gesture-based activation

It won't watch you until you ask it to.

The detector starts inactive by default. It only arms after the driver performs a correct sequence of gestures — the right gestures, in the right order, within a time limit — handled by a small state machine.

STATE 00

Inactive

System idle, no monitoring.

STATE 01

Gesture sequence

Correct gestures, correct order, under a time constraint.

STATE 02

Armed → ALERT

Normal driving condition confirmed.

STATE 03

DROWSY

Fatigue detected — the system raises the flag.
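
The four states above can be sketched as a small state machine. This is not the project's state_machine.py; the class name, the gesture labels, and the 5-second limit are assumptions chosen for illustration. Timestamps are passed in explicitly so the logic can be exercised without a camera.

```python
from enum import Enum, auto


class State(Enum):
    INACTIVE = auto()   # STATE 00
    SEQUENCE = auto()   # STATE 01: gesture sequence in progress
    ALERT = auto()      # STATE 02: armed, driver attentive
    DROWSY = auto()     # STATE 03: fatigue detected


class ActivationStateMachine:
    def __init__(self, sequence=("open_palm", "fist", "thumbs_up"),
                 time_limit=5.0):  # hypothetical gestures and limit
        self.sequence = tuple(sequence)
        self.time_limit = time_limit
        self._reset()

    def _reset(self):
        self.state = State.INACTIVE
        self._index = 0
        self._started_at = None

    def on_gesture(self, gesture, now):
        """Feed one recognized gesture with its timestamp in seconds."""
        if self.state in (State.ALERT, State.DROWSY):
            return self.state  # already armed, gestures are ignored
        if (self.state is State.SEQUENCE
                and now - self._started_at > self.time_limit):
            self._reset()  # too slow: start over
        if gesture != self.sequence[self._index]:
            self._reset()  # wrong gesture: start over
            if gesture != self.sequence[0]:
                return self.state
        if self._index == 0:
            self._started_at = now
            self.state = State.SEQUENCE
        self._index += 1
        if self._index == len(self.sequence):
            self.state = State.ALERT
        return self.state

    def on_fatigue(self, fatigued):
        """Feed the fatigue verdict; it has no effect until armed."""
        if self.state in (State.ALERT, State.DROWSY):
            self.state = State.DROWSY if fatigued else State.ALERT
        return self.state
```

For example, with the default sequence, calling on_gesture("open_palm", 0.0), on_gesture("fist", 1.2), and on_gesture("thumbs_up", 2.5) moves the machine from INACTIVE through SEQUENCE to ALERT, after which on_fatigue(True) switches it to DROWSY.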

04 — Model performance

The eye classifier, by the numbers.

The CNN eye classifier was trained and evaluated on the MRL Eye Dataset. The dataset is used only for training and evaluation — never at runtime.

Metrics reported, each as a percentage: Accuracy · Precision · Recall · F1-score
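
For reference, the four metrics can be computed from binary predictions as sketched below. The function name, the choice of "closed" as the positive class (label 1), and the toy labels are assumptions for the example. The toy labels are not the project's evaluation data or results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels only: 1 = closed eye, 0 = open eye.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Precision answers "when the model says closed, how often is it right?", while recall answers "of the truly closed eyes, how many did it catch?". For a drowsiness alarm, low recall means missed events and low precision means false alarms.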

05 — Architecture

How the project is laid out.

.
├── src/
│   ├── main.py              # entry point — webcam or video
│   ├── fatigue/             # basic.py (classical) · dl.py (CNN)
│   ├── gestures/            # classifier.py · state_machine.py
│   └── utils/               # dataset download helpers
├── scripts/
│   ├── train_eye_classifier.py
│   └── evaluate.py
├── models/
│   ├── eye_classifier.pt    # trained CNN weights
│   └── hand_landmarker.task # gesture landmarks
├── tests/                   # test_gesture.py · test_sequence.py
└── requirements.txt

Run modes — classical (traditional CV only) · dl (deep learning only) · both (combined, recommended). Launch with python -m src.main --mode both.
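
A minimal sketch of how the --mode flag could be parsed with argparse. Only the flag name and its three values come from the text above. The default value, the help text, and the helper enabled_pipelines are assumptions, and the real src/main.py may differ.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="python -m src.main",
        description="Real-time driver fatigue detection.",
    )
    parser.add_argument(
        "--mode",
        choices=("classical", "dl", "both"),
        default="both",  # assumed default, since hybrid is recommended
        help="classical: traditional CV only; dl: deep learning only; "
             "both: combined (recommended)",
    )
    return parser


def enabled_pipelines(mode):
    """Translate the mode string into which pipelines should run."""
    return {
        "classical": mode in ("classical", "both"),
        "dl": mode in ("dl", "both"),
    }


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--mode", "dl"])
pipelines = enabled_pipelines(args.mode)
```

Restricting the flag with choices makes argparse reject an unknown mode with a usage message instead of failing later inside the detection loop.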