Real-time Face-Mask Detection
TensorFlow + SSD Multibox detector for real-time face-mask classification, trained with transfer learning on a Kaggle dataset.
Outcomes
- 89% accuracy on the held-out Kaggle test split
- Real-time detection at standard webcam frame rates with the trained SSD model
- Transfer learning and augmentation reduced the amount of training data needed without sacrificing accuracy
Overview
A real-time computer-vision project: a face-mask detector built on an SSD Multibox detector, designed to run at webcam frame rates and trained with transfer learning so it could converge on a modestly sized dataset.
Approach
- Single-shot detector over a two-stage one. SSD Multibox predicts box locations and class scores in a single forward pass, which is what made the real-time target reachable on commodity hardware.
- Transfer learning over training from scratch. I started from pretrained backbone weights and fine-tuned the detection head for the mask/no-mask task, which sharply reduced training time and improved generalization on the limited Kaggle data.
- Aggressive augmentation to counter dataset bias. Public mask datasets skew toward well-lit, front-facing portraits. Augmentations (rotation, brightness, partial occlusion) widened the distribution the model saw during training.
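The three augmentations named above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual chain: it uses NumPy alone in place of OpenCV to keep the example dependency-light, and the function names and parameter ranges (±15 degrees, 0.6 to 1.4 brightness, 30% maximum occlusion) are hypothetical.

```python
import numpy as np


def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` and clip back to the uint8 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def rotate(image, angle_deg):
    """Rotate about the image centre with nearest-neighbour sampling.

    Output pixels whose source falls outside the image are left black.
    """
    h, w = image.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, find the source pixel it samples.
    src_x = np.rint(np.cos(theta) * dx + np.sin(theta) * dy + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * dx + np.cos(theta) * dy + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


def occlude(image, rng, max_fraction=0.3):
    """Black out one random rectangle spanning at most `max_fraction` of each side."""
    h, w = image.shape[:2]
    box_h = int(rng.integers(1, max(2, int(h * max_fraction) + 1)))
    box_w = int(rng.integers(1, max(2, int(w * max_fraction) + 1)))
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))
    out = image.copy()
    out[top:top + box_h, left:left + box_w] = 0
    return out


def augment(image, rng):
    """Apply rotation, brightness jitter, and occlusion with illustrative ranges."""
    angle = rng.uniform(-15.0, 15.0)   # assumed range, not from the project
    factor = rng.uniform(0.6, 1.4)     # assumed range, not from the project
    out = rotate(image, angle)
    out = adjust_brightness(out, factor)
    return occlude(out, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.full((64, 64, 3), 128, dtype=np.uint8)  # stand-in for a real image
    print(augment(frame, rng).shape)
```

One caveat: for detection data, the bounding-box labels would have to be rotated along with the pixels. This sketch transforms the image only.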
What I built
- The training pipeline in Python + TensorFlow, with image preprocessing and the augmentation chain in OpenCV
- The fine-tuned SSD Multibox detector head, trained against the Kaggle face-mask dataset
- A real-time inference loop that takes a webcam stream, runs SSD on each frame, and overlays the bounding box with the predicted class
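The overlay step of the inference loop can be sketched as a small post-processing function. It assumes the detector returns normalized [ymin, xmin, ymax, xmax] boxes with class ids and scores, which is the convention of the TensorFlow Object Detection API. Whether this project used that API is not stated, and `LABELS`, `Overlay`, `to_overlays`, and the 0.5 score threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical label map; the class ids of the original model are not stated.
LABELS = {1: "mask", 2: "no_mask"}


@dataclass
class Overlay:
    left: int
    top: int
    right: int
    bottom: int
    text: str


def to_overlays(boxes, classes, scores, frame_width, frame_height, min_score=0.5):
    """Convert raw detections into pixel-space rectangles with caption text.

    boxes   -- normalized [ymin, xmin, ymax, xmax] per detection (assumed layout)
    classes -- integer class id per detection
    scores  -- confidence per detection, in 0..1
    """
    overlays = []
    for box, class_id, score in zip(boxes, classes, scores):
        if score < min_score:
            continue  # drop low-confidence detections
        ymin, xmin, ymax, xmax = box
        overlays.append(
            Overlay(
                left=int(xmin * frame_width),
                top=int(ymin * frame_height),
                right=int(xmax * frame_width),
                bottom=int(ymax * frame_height),
                text=f"{LABELS.get(int(class_id), 'unknown')} {score:.2f}",
            )
        )
    return overlays


if __name__ == "__main__":
    demo = to_overlays([[0.25, 0.5, 0.75, 1.0]], [1], [0.9], 640, 480)
    print(demo)
```

In a webcam loop, each `Overlay` could then be drawn onto the frame with OpenCV's `cv2.rectangle` and `cv2.putText` before the frame is displayed.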
Results
- 89% accuracy on the held-out test split.
- The detector ran at standard webcam frame rates, which was the gating constraint for “real-time.”
- Augmentation + transfer learning meant the model held up on inputs visibly outside the dataset’s typical lighting and angle.
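One way to check a frame-rate claim like the one above is to time the per-frame processing function directly. The sketch below is an assumption about how such a measurement could be done, not a record of how it was done here. `measure_fps` and its `warmup` parameter are hypothetical, and the source gives no specific frames-per-second figure.

```python
import time


def measure_fps(process_frame, frames, warmup=5):
    """Return the average frames per second of `process_frame` over `frames`.

    The first `warmup` frames are processed but not timed, since the first
    few inferences of a model are often slower than steady state.
    """
    frames = list(frames)
    for frame in frames[:warmup]:
        process_frame(frame)
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


if __name__ == "__main__":
    # Stand-in workload: sleep 10 ms per frame instead of running a detector.
    fps = measure_fps(lambda frame: time.sleep(0.01), range(25))
    print(f"{fps:.1f} frames per second")
```

Comparing the measured rate against the camera's delivery rate shows whether detection or capture is the bottleneck.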
Lessons
The interesting work was not the detection; it was the augmentation pipeline. The dataset’s narrow distribution was the real ceiling on accuracy, and widening it with synthetic variation did more for generalization than any architecture tweak. Computer-vision quality is a data problem before it is a model problem.