A Robust Comparison of the KDDCup99 and NSL-KDD Intrusion Detection Datasets by Utilizing Principal Component Analysis and Evaluating the Performance of Various Machine Learning Algorithms
In recent years, intrusion attacks on IoT networks have grown exponentially, creating an immediate need for sophisticated intrusion detection systems (IDSs). The vast majority of current IDSs are data-driven, so the quality of the data acquired from IoT network traffic is one of the most important aspects of this area of research. Two of the most cited intrusion detection datasets are KDDCup99 (KDD) and NSL-KDD (NSL). The main goal of our project was to conduct a robust comparison of the two datasets by evaluating the performance of various machine learning classifiers with a larger set of classification metrics than previous researchers used. This was done using numerous Python ML packages and data visualization tools. From our research, we conclude that the NSL dataset is of higher quality than the KDD dataset, because the classifiers trained on the KDD dataset exhibited a bias towards the redundancies within it. Contrary to the findings of other researchers, however, our research also showed that the algorithms trained on the NSL dataset were unable to consistently detect U2R (user to root) and R2L (remote to local) attacks, even though the NSL dataset was created for this purpose. Overall, because dataset quality correlates with the strength of IDSs, our comprehensive analysis gives researchers a greater source of information when deciding how to develop their intrusion detection datasets and models in the future.
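To illustrate why a larger set of classification metrics matters for rare attack classes such as U2R and R2L, the following minimal sketch computes overall accuracy alongside per-class precision, recall, and F1. It is not the code used in this project: the function name `per_class_report` and the toy labels are hypothetical, and the label counts are invented purely for illustration rather than drawn from KDD or NSL.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return (accuracy, report), where report maps each label to its
    precision, recall, and F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per label.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, report


# Hypothetical toy labels (illustrative only): a classifier that never
# predicts the rare "u2r" class and labels those records "normal" instead.
y_true = ["normal"] * 90 + ["dos"] * 8 + ["u2r"] * 2
y_pred = ["normal"] * 90 + ["dos"] * 8 + ["normal"] * 2

accuracy, report = per_class_report(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

In this toy case the overall accuracy is high even though the recall for the rare class is zero, which is the kind of failure that accuracy alone would hide and that per-class metrics expose.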