Real-time Face-Mask Detection

TensorFlow + SSD MultiBox detector for real-time face-mask classification, trained with transfer learning on a Kaggle dataset.

  • Python
  • TensorFlow
  • OpenCV
  • SSD MultiBox
  • Computer Vision

/ Outcomes

  • 89% accuracy on the held-out Kaggle test split
  • Real-time detection at standard webcam frame rates with the trained SSD model
  • Transfer learning + augmentation reduced the amount of training data required without sacrificing accuracy

Overview

A real-time computer-vision build: a face-mask detector built on an SSD MultiBox detector, designed to run at webcam frame rates and trained with transfer learning so it could converge on a modestly sized dataset.
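The transfer-learning setup can be sketched as: load a pretrained backbone, freeze it, and train only a small new head. This is a simplified, classification-style illustration — the backbone choice (MobileNetV2) and all hyperparameters here are assumptions, and the actual project fine-tuned an SSD detection head rather than a plain classifier.

```python
import tensorflow as tf

def build_mask_classifier(weights="imagenet", input_shape=(224, 224, 3)):
    """Freeze a pretrained backbone and attach a small trainable head.

    Illustrative sketch: MobileNetV2 stands in for the pretrained
    backbone; the original project fine-tuned an SSD detection head.
    """
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False, weights=weights, input_shape=input_shape
    )
    backbone.trainable = False  # keep pretrained features fixed

    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # mask / no-mask
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Freezing the backbone is what collapses training time: only the final dense layer's weights are updated, so the model can converge on a small dataset without overfitting the feature extractor.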

Approach

  • Single-shot detector over two-stage. SSD MultiBox runs detection and classification in one forward pass, which is what made the real-time target reachable on commodity hardware.
  • Transfer learning over training from scratch. I started from pretrained backbone weights and fine-tuned the detection head for the mask/no-mask task, which collapsed training time and improved generalization on the limited Kaggle data.
  • Aggressive augmentation against the bias of the dataset. Public mask datasets skew toward well-lit, front-facing portraits. Augmentations (rotation, brightness, partial occlusion) widened the distribution the model saw during training.

What I built

  • The training pipeline in Python + TensorFlow, with image preprocessing and the augmentation chain in OpenCV
  • The fine-tuned SSD MultiBox detector head, trained against the Kaggle face-mask dataset
  • A real-time inference loop that takes a webcam stream, runs SSD on each frame, and overlays the bounding box with the predicted class

Results

  • 89% accuracy on the held-out test split.
  • The detector ran at standard webcam frame rates, which was the gating constraint for “real-time.”
  • Augmentation + transfer learning meant the model held up on inputs visibly outside the dataset’s typical lighting and angle.

Lessons

The interesting work was not the detection — it was the augmentation pipeline. The dataset’s narrow distribution was the real ceiling on accuracy, and pushing it through synthetic variation did more for generalization than any architecture tweak. Computer-vision quality is a data problem before it is a model problem.