Real-time Face-Mask Detection

TensorFlow + SSD MultiBox detector for real-time face-mask classification, trained with transfer learning on a Kaggle dataset.

  • Python
  • TensorFlow
  • OpenCV
  • SSD MultiBox
  • Computer Vision

/ Outcomes

  • 89% accuracy on the held-out Kaggle test split
  • Real-time detection at standard webcam frame rates with the trained SSD model
  • Transfer learning + augmentation reduced the amount of training data required without sacrificing accuracy

Overview

A real-time computer-vision build: a face-mask detector built on an SSD MultiBox detector, designed to run at webcam frame rates and trained with transfer learning so it could converge on a modestly sized dataset.
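The transfer-learning setup can be sketched as: load a pretrained backbone, freeze it, and train only a small new head. This is a simplified, classification-style illustration — the backbone choice (MobileNetV2) and all hyperparameters here are assumptions, and the actual project fine-tuned an SSD detection head rather than a plain classifier.

```python
import tensorflow as tf

def build_mask_classifier(weights="imagenet", input_shape=(224, 224, 3)):
    """Freeze a pretrained backbone and attach a small trainable head.

    Illustrative sketch: MobileNetV2 stands in for the pretrained
    backbone; the original project fine-tuned an SSD detection head.
    """
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False, weights=weights, input_shape=input_shape
    )
    backbone.trainable = False  # keep pretrained features fixed

    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # mask / no-mask
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Freezing the backbone is what collapses training time: only the final dense layer's weights are updated, so the model can converge on a small dataset without overfitting the feature extractor.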

Approach

  • Single-shot detector over two-stage. SSD MultiBox runs detection and classification in one forward pass, which is what made the real-time target reachable on commodity hardware.
  • Transfer learning over training from scratch. I started from pretrained backbone weights and fine-tuned the detection head for the mask/no-mask task, which collapsed training time and improved generalization on the limited Kaggle data.
  • Aggressive augmentation against the bias of the dataset. Public mask datasets skew toward well-lit, front-facing portraits. Augmentations (rotation, brightness, partial occlusion) widened the distribution the model saw during training.

What I built

  • The training pipeline in Python + TensorFlow, with image preprocessing and the augmentation chain in OpenCV
  • The fine-tuned SSD MultiBox detector head, trained against the Kaggle face-mask dataset
  • A real-time inference loop that takes a webcam stream, runs SSD on each frame, and overlays the bounding box with the predicted class

Results

  • 89% accuracy on the held-out test split.
  • The detector ran at standard webcam frame rates, which was the gating constraint for “real-time.”
  • Augmentation + transfer learning meant the model held up on inputs visibly outside the dataset’s typical lighting and angle.

Lessons

The interesting work was not the detection — it was the augmentation pipeline. The dataset’s narrow distribution was the real ceiling on accuracy, and pushing it through synthetic variation did more for generalization than any architecture tweak. Computer-vision quality is a data problem before it is a model problem.