04 Case Study
Network Attack Detection Using Machine Learning
An ML classifier for malicious network traffic, built on KNN, with integrated per-prediction logging for performance validation.
/ Outcomes
- 94% classification accuracy on the held-out test split with tuned KNN
- Compared four classical models (KNN, SVM, decision tree, random forest); KNN won on both accuracy and inference cost
- Integrated per-prediction logging so the classifier was observable, not opaque
Overview
Intrusion-detection systems sit on a tight trade-off: too sensitive and you train operators to ignore them, too lenient and a real attack walks through. The objective here was a small, observable, defensible classifier: accurate enough to be trusted, simple enough to ship, and instrumented well enough to keep earning that trust over time.
I treated it as a supervised classification study: build a clean feature pipeline, run multiple classical models against the same splits, choose on a defensible metric, and instrument every decision the model makes.
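The "clean feature pipeline" step can be sketched in a few lines. This is a minimal, standard-library illustration, not the project's code: it assumes rows arrive as Python dicts, the class name `FeaturePipeline` and the column names in the usage are hypothetical, and correlation pruning is left out for brevity. What it shows is the rule that makes a comparison fair: every statistic (category vocabulary, min/max) is learned from the training split and reused unchanged on the test split.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


class FeaturePipeline:
    """Hypothetical sketch: one-hot encode categorical columns and min-max scale
    numeric columns, learning every statistic from the training split only."""

    def __init__(self, categorical: Sequence[str], numeric: Sequence[str]):
        self.categorical = list(categorical)
        self.numeric = list(numeric)

    def fit(self, rows: List[Row]) -> "FeaturePipeline":
        # Sorted vocabulary per categorical column keeps the output column order stable.
        self.vocab = {c: sorted({str(r[c]) for r in rows}) for c in self.categorical}
        # (min, max) per numeric column, taken from the training rows only.
        self.ranges = {
            c: (min(float(r[c]) for r in rows), max(float(r[c]) for r in rows))
            for c in self.numeric
        }
        return self

    def transform(self, rows: List[Row]) -> List[List[float]]:
        out = []
        for r in rows:
            vec: List[float] = []
            for c in self.numeric:
                lo, hi = self.ranges[c]
                span = hi - lo
                # Constant training columns map to 0.0 instead of dividing by zero.
                # Test values outside the training range are not clipped.
                vec.append((float(r[c]) - lo) / span if span else 0.0)
            for c in self.categorical:
                # A category unseen in training becomes an all-zero block.
                vec.extend(1.0 if str(r[c]) == v else 0.0 for v in self.vocab[c])
            out.append(vec)
        return out
```

Usage: fit on the training rows, then call `transform` on both splits with the same fitted object. With training rows `{"proto": "tcp", "bytes": 0}` and `{"proto": "udp", "bytes": 100}`, the test row `{"proto": "udp", "bytes": 50}` becomes `[0.5, 0.0, 1.0]`.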
Approach
Three steps drove the work:
- Feature pipeline first. Network features came in messy: categorical fields, numeric features on very different scales, and correlated columns. Cleaning that up before any model selection is what made the comparison fair.
- Bake-off across classical models. Trained KNN, SVM, decision tree, and random forest on the same processed splits with the same evaluation harness. No hand-tuning that wasn’t applied to all of them.
- Choose on inference cost as well as accuracy. A high-accuracy classifier that’s slow at inference is a worse production system than one a couple of points behind that fits inline. KNN won on both axes for this dataset.
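The "same splits, same harness" rule and the accuracy-plus-latency choice can be made concrete with a small harness. This is a sketch under stated assumptions: `evaluate`, `bake_off`, the `Classifier` protocol, and the "most accurate model within a per-sample latency budget" selection rule are all hypothetical stand-ins for the project's criterion. Any object with `fit`/`predict` works as a candidate, so in practice the models could be scikit-learn's `KNeighborsClassifier`, `SVC`, `DecisionTreeClassifier`, and `RandomForestClassifier`.

```python
import time
from typing import Dict, List, Protocol, Sequence, Tuple


class Classifier(Protocol):
    """Any model exposing scikit-learn-style fit/predict methods."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "Classifier": ...

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]: ...


def evaluate(model: Classifier, X_train, y_train, X_test, y_test) -> Tuple[float, float]:
    """Fit on the shared training split; return (accuracy, ms per test sample)."""
    model.fit(X_train, y_train)
    start = time.perf_counter()
    preds = model.predict(X_test)
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    return accuracy, 1000.0 * elapsed / len(X_test)


def bake_off(models: Dict[str, Classifier], X_train, y_train, X_test, y_test,
             latency_budget_ms: float) -> Tuple[str, Dict[str, Tuple[float, float]]]:
    """Evaluate every candidate identically, then pick the most accurate model
    whose per-sample inference latency fits the inline budget."""
    results = {name: evaluate(m, X_train, y_train, X_test, y_test)
               for name, m in models.items()}
    eligible = [name for name, (_, ms) in results.items() if ms <= latency_budget_ms]
    if not eligible:
        raise ValueError("no candidate fits the latency budget")
    winner = max(eligible, key=lambda name: results[name][0])
    return winner, results
```

Filtering on latency before ranking on accuracy encodes the point above: a slightly less accurate model that fits inline beats a more accurate one that does not. A single timed batch is noisy, so a real harness would repeat the timing and report a median.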
What I built
- Feature pipeline in Python — categorical encoding, normalization, correlation pruning — applied uniformly across train and test
- Model bake-off harness that ran each candidate model with consistent train/eval/log behaviour, producing a comparable per-class confusion matrix for each
- Tuned KNN classifier with distance weighting and a k chosen from validation performance — the production candidate
- Inline logging layer that records the prediction, the input feature snapshot, and the model’s confidence on every call, so a deployed instance is observable rather than opaque
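The tuned KNN and the inline logging layer can be sketched together as a from-scratch, standard-library illustration (scikit-learn's `KNeighborsClassifier(weights="distance")` is the library route to the same weighting). The names `LoggedKNN`, `choose_k`, and the logger name `ids.knn` are hypothetical, and so is the definition of confidence: here it is assumed to be the winning class's share of the distance-weighted vote, since the write-up doesn't specify how confidence was computed.

```python
import json
import logging
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

logger = logging.getLogger("ids.knn")  # hypothetical logger name


class LoggedKNN:
    """Distance-weighted k-nearest-neighbours classifier that logs every prediction."""

    def __init__(self, k: int = 5, eps: float = 1e-9):
        self.k = k
        self.eps = eps  # avoids division by zero when a query matches a training point

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "LoggedKNN":
        self.X = [list(map(float, row)) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x: Sequence[float]) -> Tuple[str, float]:
        # Sort training points by Euclidean distance and keep the k closest.
        nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))[: self.k]
        votes = defaultdict(float)
        for d, label in nearest:
            votes[label] += 1.0 / (d + self.eps)  # closer neighbours count for more
        label = max(votes, key=votes.get)
        confidence = votes[label] / sum(votes.values())  # weighted vote share in (0, 1]
        # One JSON record per call: prediction, confidence, and the input snapshot.
        logger.info(json.dumps({"prediction": label,
                                "confidence": round(confidence, 4),
                                "features": list(x)}))
        return label, confidence

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        return [self.predict_one(x)[0] for x in X]


def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7)) -> int:
    """Pick k by validation accuracy; ties go to the earliest (smallest) candidate."""
    def accuracy(k: int) -> float:
        preds = LoggedKNN(k).fit(X_train, y_train).predict(X_val)
        return sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
    return max(candidates, key=accuracy)
```

Because the logger is a standard `logging` logger, a deployment can route the records to a file or collector by attaching a handler and setting the level to `INFO`; with no configuration, the `INFO` records are simply dropped.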
Results
- 94% classification accuracy on the held-out test set with the tuned KNN
- KNN beat SVM, decision tree, and random forest on accuracy and on inference latency for this feature set
- The integrated logging surfaced a class of bugs that would have been invisible without it: input drift, encoding mismatches, and low-confidence predictions clustering on a specific attack class
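As an illustration of how two of those bug classes become visible, the sketch below post-processes prediction logs. It assumes each log line is a JSON object with `prediction`, `confidence`, and `features` fields, a hypothetical schema covering the three things the logging layer records. The 0.6 threshold, the class label `dos` in the usage, and the mean-shift drift signal are illustrative choices, not the checks the project actually ran.

```python
import json
from collections import Counter
from statistics import mean
from typing import Dict, Iterable, List


def load_records(lines: Iterable[str]) -> List[dict]:
    """Parse one JSON prediction record per non-empty log line."""
    return [json.loads(line) for line in lines if line.strip()]


def low_confidence_by_class(records: Iterable[dict], threshold: float = 0.6) -> Dict[str, float]:
    """Fraction of each predicted class's calls with confidence below `threshold`.
    One class sitting far above the others is the 'low-confidence clustering' signal."""
    total, low = Counter(), Counter()
    for r in records:
        total[r["prediction"]] += 1
        if r["confidence"] < threshold:
            low[r["prediction"]] += 1
    return {label: low[label] / n for label, n in total.items()}


def mean_shift(records: List[dict], train_means: List[float]) -> List[float]:
    """Per-feature gap between live and training means: a crude input-drift signal."""
    columns = zip(*(r["features"] for r in records))
    return [mean(col) - m for col, m in zip(columns, train_means)]
```

Usage: for records where half of the `dos` predictions fall below the threshold and none of the `normal` ones do, `low_confidence_by_class` returns `{"dos": 0.5, "normal": 0.0}`. A per-feature mean shift is deliberately simple; a distribution test per feature would catch drift that leaves the mean unchanged.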
Lessons
The instrumentation mattered as much as the model. An IDS classifier without per-prediction logging is a black box; you can’t tune it, can’t audit it, can’t explain a missed detection. Bake observability into the model layer from day one and the rest of the system stops being a guessing game.