Understanding the Effectiveness of State-of-the-art Deep Learning Models on Vulnerability Detection

Authors

  • Ansh Agrawal, Department of Computer Science, George Mason University, Fairfax, VA
  • Xiaokuan Zhang, Department of Computer Science, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2025.5364

Abstract

Software vulnerability detection is a critical area in cybersecurity, and recent work has explored deep learning (DL) to automate it. However, existing DL-based methods often suffer from low performance on real-world datasets and fail to capture the relationships between functions in large codebases. The original VulBG study addressed this with a Behavior Graph Model that extracts semantic slices of code and connects them, thereby enhancing the effectiveness of baseline DL models in detecting vulnerabilities. Because the scripts in the original repository were limited or incomplete, this replication reimplemented key components of VulBG from scratch: data loading, slicing, embedding with CodeBERT, clustering via K-means, and graph embedding using Node2Vec. A neural network classifier was then trained on both baseline features and behavior-based graph features. The model was evaluated on real-world C function datasets using accuracy, precision, recall, and F1-score. The replicated model achieved an F1-score of 0.5791, closely matching the original study, and showed improved recall from the behavior-based features. This replication supports the claim that incorporating inter-function semantic relationships via Behavior Graphs can improve DL-based vulnerability detection. It also highlights practical challenges in reproducibility and suggests potential improvements such as enhanced slicing techniques and model fusion with pretrained embeddings.
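
The idea of connecting functions through shared behaviors can be illustrated with a minimal Python sketch. It is not the authors' or the VulBG implementation. It assumes that slice embeddings already exist (in the study they come from CodeBERT; here they are toy 2-D points) and that cluster centroids have already been fitted (in the study by K-means; here they are hard-coded). The function names `nearest_centroid`, `build_behavior_graph`, and `shared_behaviors`, and the example C function names, are hypothetical.

```python
from collections import defaultdict
from math import dist


def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))


def build_behavior_graph(slice_vectors, centroids):
    """Link each function to the behavior clusters its slices fall into.

    slice_vectors maps a function name to a list of slice embeddings.
    The result maps each function name to {cluster_id: slice_count}.
    """
    graph = {}
    for func, vectors in slice_vectors.items():
        edges = defaultdict(int)
        for vec in vectors:
            edges[nearest_centroid(vec, centroids)] += 1
        graph[func] = dict(edges)
    return graph


def shared_behaviors(graph, func_a, func_b):
    """Return the behavior clusters that two functions have in common."""
    return set(graph[func_a]) & set(graph[func_b])


# Toy data (assumed): two behavior centroids and three functions.
centroids = [(0.0, 0.0), (10.0, 10.0)]
slice_vectors = {
    "parse_header": [(0.1, -0.2), (9.5, 10.1)],
    "copy_buffer": [(10.2, 9.8)],
    "log_message": [(0.3, 0.1)],
}

graph = build_behavior_graph(slice_vectors, centroids)
print(graph)
print(shared_behaviors(graph, "parse_header", "copy_buffer"))
```

In this toy graph, `parse_header` and `copy_buffer` are connected through one shared behavior cluster, while `copy_buffer` and `log_message` share none. A graph embedding method such as Node2Vec would then be run on a graph of this kind to produce the behavior-based features that are combined with the baseline features.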

Published

2025-09-25

Section

College of Engineering and Computing: Department of Computer Science