Network Attack Detection in Software-Defined Networks

Machine-learning intrusion-detection study comparing KNN, SVM, Decision Tree, and Random Forest on NSL-KDD for SDN environments.

  • Python
  • Scikit-learn
  • KNN
  • SVM
  • Random Forest
  • NSL-KDD

Outcomes

  • Reduced false-positive rate by 15% over the baseline classifier
  • Compared four classical models on NSL-KDD; KNN achieved the best end-to-end accuracy
  • Feature-selection pass cut input dimensionality without measurable accuracy loss

Overview

Software-defined networks centralize control-plane decisions, which is great for programmability and terrible for the blast radius of an undetected intrusion. The goal here was to put a practically deployable intrusion-detection system in front of an SDN controller — small enough to run inline, accurate enough to be trusted, and explicit about which signal is driving each decision.

I treated the problem as a supervised classification study on the NSL-KDD benchmark, then narrowed to the model that held up under realistic class imbalance.
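
The class-imbalance point is easier to see with the data in hand. Below is a minimal loading sketch, assuming the standard KDDTrain+.txt layout (41 unnamed feature columns, then an attack label, then a difficulty score); the file path, the generic column names, and the binary attack/normal collapse are illustrative assumptions, not the exact preprocessing used.

```python
import pandas as pd

# NSL-KDD's KDDTrain+.txt ships with no header row: 41 features,
# then the attack label, then a difficulty score. Feature names are
# elided here as f0..f40; the path is an assumption.
cols = [f"f{i}" for i in range(41)] + ["label", "difficulty"]
train = pd.read_csv("KDDTrain+.txt", header=None, names=cols)

# Raw labels are individual attack names (neptune, smurf, ...) plus
# "normal". Collapsing to attack vs. normal makes the imbalance that
# drove the per-class evaluation visible at a glance.
train["is_attack"] = (train["label"] != "normal").astype(int)
print(train["label"].value_counts())      # per-attack counts, heavily skewed
print(train["is_attack"].value_counts())  # binary view: attack vs. normal
```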

Approach

I framed the work in three passes:

  • Bake-off across classical models. I trained K-Nearest Neighbors, Support Vector Machine, Decision Tree, and Random Forest on the same NSL-KDD splits. Each got the same preprocessing pipeline so the comparison was honest (a sketch of that shared pipeline follows this list).
  • Feature selection over kitchen-sink inputs. NSL-KDD has 41 features, many of which are correlated or stale relative to modern SDN traffic. I dropped the ones that did not contribute to per-class precision and rechecked accuracy after each pass.
  • Tune for false-positive cost, not just accuracy. In an inline IDS, a noisy classifier is worse than a slightly less accurate one — you train operators to ignore it. I tuned thresholds and class weights against false-positive rate explicitly.
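
Concretely, the bake-off stage looked roughly like the sketch below: one shared transformer, four estimators behind it. It continues from the loading sketch in the overview; the positions of the symbolic features, the hyperparameters, and the CV setup are illustrative assumptions rather than the exact configuration used.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

feature_cols = [f"f{i}" for i in range(41)]
X_train, y_train = train[feature_cols], train["is_attack"]

# NSL-KDD's three symbolic features (protocol_type, service, flag)
# sit at positions 1-3; one-hot those, scale everything else. The
# same transformer fronts every model, so preprocessing cannot favor
# one classifier over another.
symbolic = ["f1", "f2", "f3"]
pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), symbolic),
    ("num", StandardScaler(), [c for c in feature_cols if c not in symbolic]),
])

models = {
    "knn":  KNeighborsClassifier(n_neighbors=5, weights="distance"),
    "svm":  SVC(kernel="rbf", class_weight="balanced"),
    "tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "rf":   RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                   random_state=0),
}

# Identical pipeline per model: any accuracy gap comes from the
# classifier, not from the preprocessing.
for name, clf in models.items():
    pipe = Pipeline([("pre", pre), ("clf", clf)])
    scores = cross_val_score(pipe, X_train, y_train, cv=5)
    print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```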

What I built

  • The end-to-end training and evaluation pipeline in Python + Scikit-learn, with reproducible splits and per-model evaluation reports
  • A feature-selection step using mutual information and correlation pruning, scored against a held-out validation slice (sketched below, after this list)
  • A comparison harness that surfaced precision/recall/FPR per attack class across all four models, so the model choice could be defended on more than headline accuracy
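
The feature-selection step could look roughly like this, continuing from the pipeline sketch above. The mutual-information floor, the |r| cutoff, and the 80/20 split are illustrative assumptions; the real thresholds were chosen by rechecking validation accuracy after each pass.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split

# Numeric matrix from the shared preprocessing; OneHotEncoder may
# return a sparse matrix, so densify for the correlation pass.
X_num = pre.fit_transform(X_train)
if hasattr(X_num, "toarray"):
    X_num = X_num.toarray()
y = y_train.to_numpy()

# Held-out validation slice for scoring each pruned feature set.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_num, y, test_size=0.2, stratify=y, random_state=0)

# Pass 1: score every column's mutual information with the label and
# drop the ones contributing essentially nothing, ordered high to low.
mi = mutual_info_classif(X_tr, y_tr, random_state=0)
keep = [i for i in np.argsort(mi)[::-1] if mi[i] > 1e-3]

# Pass 2: walking from highest-MI to lowest, drop any feature that is
# near-duplicated (|r| > 0.95) by an already-kept, higher-MI feature.
selected = []
for i in keep:
    r = [abs(np.corrcoef(X_tr[:, i], X_tr[:, j])[0, 1]) for j in selected]
    if not r or max(r) < 0.95:
        selected.append(i)

print(f"kept {len(selected)} of {X_tr.shape[1]} columns")
# Re-check validation accuracy on X_val[:, selected] after each pass.
```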

Results

  • KNN had the best end-to-end accuracy of the four classical models I tried. With distance-weighted voting and tuned k, it edged Random Forest on overall accuracy and noticeably beat SVM on minority attack classes (a sketch of the tuning loop follows this list).
  • False-positive rate dropped 15% versus the baseline classifier after the feature-selection and threshold-tuning pass — the metric that matters most for an inline IDS.
  • The study produced a reusable harness for re-running the same comparison on a different traffic capture with minimal changes.
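
For concreteness, here is roughly what the KNN tuning and the false-positive measurement could look like, continuing from the feature-selection sketch (X_tr, X_val, y_tr, y_val, selected). The parameter grid and the candidate alert thresholds are assumptions for illustration; the sketch relies on KNeighborsClassifier.predict_proba, which returns (weighted) neighbor-vote fractions.

```python
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Tune k and the voting scheme for accuracy first.
grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [3, 5, 7, 11], "weights": ["uniform", "distance"]},
    scoring="accuracy", cv=5)
grid.fit(X_tr[:, selected], y_tr)
knn = grid.best_estimator_

# For an inline IDS the operative metric is FPR = FP / (FP + TN):
# raise the alert threshold above 0.5 until the false-alarm rate is
# acceptable, then report the accuracy paid for it.
proba = knn.predict_proba(X_val[:, selected])[:, 1]
for thresh in (0.5, 0.6, 0.7, 0.8):
    pred = (proba >= thresh).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_val, pred).ravel()
    print(f"t={thresh:.1f}  FPR={fp / (fp + tn):.4f}  "
          f"acc={(tp + tn) / (tp + tn + fp + fn):.4f}")
```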

Lessons

The most useful insight was not which model won, but how unstable the ranking is when you change the evaluation metric. KNN won on accuracy; on raw recall for the rarest attack classes a tuned Random Forest was sometimes better. Picking a single “best model” without first nailing down the cost function is how you ship the wrong classifier.