04 Case Study

Network Attack Detection Using Machine Learning

ML classifier for malicious network traffic — KNN-driven, with integrated logging for per-prediction performance validation.

  • Python
  • Scikit-learn
  • KNN
  • Machine Learning

Outcomes

  • 94% classification accuracy on the held-out test split with tuned KNN
  • Compared multiple classical models; KNN won on both accuracy and inference cost
  • Integrated per-prediction logging so the classifier was observable, not opaque

Overview

Intrusion-detection systems sit on a tight trade-off: too sensitive and the false alarms train operators to ignore them; too lenient and a real attack walks through. The objective here was a small, observable, defensible classifier: accurate enough to be trusted, simple enough to ship, and instrumented well enough to keep earning that trust over time.

I treated it as a supervised classification study: build a clean feature pipeline, run multiple classical models against the same splits, choose on a defensible metric, and log every prediction the model makes.

Approach

Three steps drove the work:

  • Feature pipeline first. Network features came in messy: categorical fields, numeric columns on mismatched scales, correlated columns. Cleaning that up before any model selection is what made the comparison fair.
  • Bake-off across classical models. Trained KNN, SVM, decision tree, and random forest on the same processed splits with the same evaluation harness. No hand-tuning that wasn’t applied to all of them.
  • Choose on inference cost as well as accuracy. A high-accuracy classifier that’s slow at inference is a worse production system than one that trails by a couple of points but is fast enough to run inline. KNN won on both axes for this dataset.
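
The bake-off idea can be sketched as a single harness that fits every candidate on the same split and records accuracy, per-sample prediction latency, and a confusion matrix. This is a minimal illustration, not the project's actual harness: the data is synthetic (`make_classification` stands in for the processed network features), the hyperparameters are scikit-learn defaults, and `run_bakeoff` is a hypothetical helper name.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed network features (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models; hyperparameters here are illustrative defaults.
candidates = {
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}


def run_bakeoff(models, X_train, y_train, X_test, y_test):
    """Fit every model on the same split and collect comparable metrics."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Time only the prediction step to approximate inference cost.
        start = time.perf_counter()
        y_pred = model.predict(X_test)
        elapsed = time.perf_counter() - start
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "ms_per_sample": 1000 * elapsed / len(X_test),
            "confusion": confusion_matrix(y_test, y_pred),
        }
    return results


results = run_bakeoff(candidates, X_train, y_train, X_test, y_test)
for name, r in results.items():
    print(f"{name:14s} acc={r['accuracy']:.3f} {r['ms_per_sample']:.4f} ms/sample")
```

Because every model sees identical splits and the same metric code, differences in the printed numbers come from the models themselves rather than from the evaluation setup. The ranking on synthetic data says nothing about the ranking on real traffic.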

What I built

  • Feature pipeline in Python — categorical encoding, normalization, correlation pruning — applied uniformly across train and test
  • Model bake-off harness that ran each candidate model with consistent train/eval/log behaviour, producing a comparable per-class confusion matrix for each
  • Tuned KNN classifier with distance weighting and a k chosen from validation performance — the production candidate
  • Inline logging layer that records the prediction, the input feature snapshot, and the model’s confidence on every call, so a deployed instance is observable rather than opaque
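
A rough sketch of how the pipeline, the tuned KNN, and the logging layer could fit together in scikit-learn follows. Several things are assumptions for illustration: the three synthetic columns and their names, the grid of k values, the use of `predict_proba` (the distance-weighted neighbour vote share) as the "confidence", and the helper name `predict_and_log`. Correlation pruning is left out to keep the example short.

```python
import json
import logging

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("ids.predictions")  # hypothetical logger name

# Synthetic data (assumption): column 0 is a categorical protocol code,
# columns 1 and 2 are numeric features on very different scales.
rng = np.random.default_rng(0)
n = 400
protocol = rng.integers(0, 3, size=n)
duration = rng.exponential(2.0, size=n)
bytes_sent = rng.normal(500, 150, size=n) + 400 * (protocol == 2)
X = np.column_stack([protocol, duration, bytes_sent]).astype(float)
y = (bytes_sent > 700).astype(int)  # 0 = benign, 1 = attack (toy labels)
X_train, y_train, X_test = X[:300], y[:300], X[300:]

# Encoding and scaling live inside the pipeline, so they are fitted on
# training data only and then applied identically at prediction time.
preprocess = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), [0]),
    ("numeric", StandardScaler(), [1, 2]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("knn", KNeighborsClassifier(weights="distance")),
])

# Choose k by cross-validated performance on the training data.
search = GridSearchCV(pipeline, {"knn__n_neighbors": [3, 5, 7, 9, 11]}, cv=5)
search.fit(X_train, y_train)


def predict_and_log(model, features):
    """Predict one sample and log prediction, confidence, and inputs."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    proba = model.predict_proba(row)[0]
    best = int(np.argmax(proba))
    record = {
        "prediction": model.classes_[best].item(),
        "confidence": float(proba[best]),
        "features": row[0].tolist(),
    }
    logger.info(json.dumps(record))
    return record


record = predict_and_log(search, X_test[0])
```

Emitting one JSON line per call keeps the log machine-readable, which makes it straightforward to aggregate later by predicted class or confidence band.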

Results

  • 94% classification accuracy on the held-out test set with the tuned KNN
  • KNN beat SVM, decision tree, and random forest on accuracy and on inference latency for this feature set
  • The integrated logging surfaced a class of problems that would have been invisible without it: input drift, encoding mismatches, and low-confidence predictions clustering on a specific attack class
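
One way low-confidence clustering might be spotted from per-prediction logs is to compute, for each predicted class, the share of predictions below a confidence threshold. The sketch below assumes one JSON record per line with `prediction` and `confidence` fields; the sample records, the class names, and the 0.6 threshold are made up for illustration and are not results from this project.

```python
import json
from collections import Counter

# Hypothetical log lines, one JSON record per prediction (made-up values).
log_lines = [
    '{"prediction": "dos", "confidence": 0.98}',
    '{"prediction": "probe", "confidence": 0.52}',
    '{"prediction": "probe", "confidence": 0.55}',
    '{"prediction": "benign", "confidence": 0.91}',
    '{"prediction": "probe", "confidence": 0.95}',
    '{"prediction": "dos", "confidence": 0.58}',
]


def low_confidence_share(lines, threshold=0.6):
    """Return, per predicted class, the fraction of predictions below threshold."""
    total = Counter()
    low = Counter()
    for line in lines:
        record = json.loads(line)
        total[record["prediction"]] += 1
        if record["confidence"] < threshold:
            low[record["prediction"]] += 1
    return {label: low[label] / total[label] for label in total}


shares = low_confidence_share(log_lines)
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:8s} {share:.0%} of predictions below threshold")
```

A class whose share sits well above the others is a candidate for closer inspection, for example by checking whether its features are being encoded or scaled differently at inference time.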

Lessons

The instrumentation mattered as much as the model. An IDS classifier without per-prediction logging is a black box; you can’t tune it, can’t audit it, can’t explain a missed detection. Bake observability into the model layer from day one and the rest of the system stops being a guessing game.