A Robust Comparison of the KDDCup99 and NSL-KDD Intrusion Detection Datasets by Utilizing Principal Component Analysis and Evaluating the Performance of Various Machine Learning Algorithms

Authors

  • Suchet Sapre, Aspiring Scientists' Summer Internship Program, 2019
  • Dr. Pouyan Ahmadi, Department of Information Sciences and Technology, Volgenau School of Engineering, George Mason University
  • Dr. Khondkar Islam, Department of Information Sciences and Technology, Volgenau School of Engineering, George Mason University

DOI:

https://doi.org/10.13021/jssr2019.2681

Abstract

Intrusion attacks on IoT networks have grown exponentially in recent years, creating an immediate need for sophisticated intrusion detection systems (IDSs). The vast majority of current IDSs are data-driven, so the quality of the data acquired from IoT network traffic is one of the most important aspects of this area of research. Two of the most cited intrusion detection datasets are KDDCup99 (KDD) and NSL-KDD (NSL). The main goal of our project was to conduct a robust comparison of the two datasets by evaluating the performance of various machine learning classifiers with a larger set of classification metrics than previous researchers used. We did this using numerous Python ML packages and data visualization tools. From our research, we concluded that the NSL dataset is of higher quality than the KDD dataset, because the classifiers trained on the KDD dataset exhibited a bias towards the redundancies within it. Contrary to previous reports, however, our research also showed that the algorithms trained on the NSL dataset were unable to consistently detect U2R (user-to-root) and R2L (remote-to-local) attacks, even though the NSL dataset was created for this purpose. Overall, because dataset quality correlates with the strength of IDSs, our comprehensive analysis gives researchers a greater source of information when deciding how to develop their intrusion detection datasets and models in the future.
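
To illustrate why a larger set of classification metrics matters for rare attack classes such as U2R and R2L, the following minimal sketch computes per-class precision, recall, and F1 alongside overall accuracy. It is not the authors' code: the function name `per_class_report`, the lowercase class labels, and the toy label lists are all assumptions made up for illustration, and the numbers are not results from this study.

```python
from collections import Counter

# Assumed label names for the five traffic classes in KDD-style datasets.
CLASSES = ["normal", "dos", "probe", "r2l", "u2r"]


def per_class_report(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy and per-class precision, recall, and F1."""
    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(n for (t, p), n in pairs.items() if t == p) / len(y_true)
    return accuracy, report


# Invented toy data: a classifier that labels every R2L and U2R record
# as normal traffic but gets every other record right.
y_true = ["normal"] * 50 + ["dos"] * 40 + ["probe"] * 6 + ["r2l"] * 3 + ["u2r"] * 1
y_pred = ["normal"] * 50 + ["dos"] * 40 + ["probe"] * 6 + ["normal"] * 4

accuracy, report = per_class_report(y_true, y_pred)
# Macro recall averages recall over classes, so rare classes count equally.
macro_recall = sum(report[c]["recall"] for c in CLASSES) / len(CLASSES)

print(f"accuracy     = {accuracy:.2f}")
print(f"macro recall = {macro_recall:.2f}")
for c in CLASSES:
    print(f"{c:>6}: recall = {report[c]['recall']:.2f}")
```

In this toy case the overall accuracy is 0.96 while recall for both R2L and U2R is zero, which is the kind of gap that accuracy alone would hide and that per-class metrics expose.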

Published

2019-11-19

Section

Abstracts from the 2019 Aspiring Scientists' Summer Internship Program
