A Novel Approach to Detecting AI-Generated SQL Queries and Explanations By Utilizing Parallel Dual-Encoder Architecture

Authors

  • Hem Akarapu, Tesla STEM High School, Redmond, WA
  • Rithvik Pathuri, Rock Ridge High School and Academies of Loudoun, Ashburn, VA
  • Ethan Zhang, College of Computing, Georgia Institute of Technology, Atlanta, GA
  • Mihai Boicu, Department of Information Sciences and Technology, George Mason University, Fairfax, VA

Abstract

AI code generation not only disrupts programming education by facilitating academic dishonesty but also poses risks to software development by introducing unintended code into complex systems. For SQL, these risks are more severe: whereas coding errors in application code typically cause crashes, faulty AI-generated queries can corrupt, delete, or expose sensitive data, potentially costing companies billions in losses and regulatory penalties. Existing detectors for AI-generated code are general-purpose and are not tailored to SQL. This research presents a novel detection system for AI-generated SQL queries and their explanations. It uses a parallel dual-encoder architecture that processes SQL code through specialized code transformers (DeepSeek-Coder-V2, StarCoder 2) while simultaneously analyzing the explanatory text with language models (DeBERTaV3, LLaMA 3). The detector was trained and evaluated on LeetCode SQL problems, using human solutions drawn from LeetCode posts and AI-generated solutions that the research team produced with GPT-o3, Claude 4 Opus, Gemini 2.5 Pro, and DeepSeek R1. Our Multi-Model Feature Fusion with Binary Classifier Training achieved 98% accuracy and a 98% F1-score, with consistent performance across all tested AI models. This framework gives educators and organizations a reliable tool for maintaining academic integrity and database security, and it establishes a foundational methodology for research on AI-generated code detection. Future work involves extending the dataset to cover syntactic and structural differences across SQL dialects such as PostgreSQL, MySQL, and SQLite.
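The overall shape of the pipeline described in the abstract (two encoders run in parallel, their outputs fused, and a binary classifier trained on the fused features) can be illustrated with a minimal sketch. This is not the authors' implementation: the hashed bag-of-tokens encoders below are hypothetical stand-ins for the pretrained transformers named above, fusion by concatenation is an assumption, and the logistic-regression classifier, the dimension `DIM`, and all function names are introduced here purely for illustration.

```python
import hashlib
import math

DIM = 32  # hypothetical embedding size per encoder (assumption, not from the paper)


def hashed_embedding(tokens, salt, dim=DIM):
    """Stand-in encoder: hashed bag-of-tokens, L2-normalised.

    A real system would return a pooled transformer embedding here.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5((salt + tok).encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def encode_sql(query):
    """Code branch: tokenise the SQL query and embed it."""
    spaced = query.lower().replace("(", " ( ").replace(")", " ) ")
    return hashed_embedding(spaced.split(), salt="sql:")


def encode_text(explanation):
    """Text branch: tokenise the explanation and embed it."""
    return hashed_embedding(explanation.lower().split(), salt="txt:")


def fuse(query, explanation):
    """Feature fusion, assumed here to be simple concatenation."""
    return encode_sql(query) + encode_text(explanation)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, features):
    """Probability that a (query, explanation) pair is AI-generated."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_classifier(feature_rows, labels, epochs=200, lr=0.5):
    """Binary classifier: logistic regression fitted with plain SGD."""
    weights = [0.0] * len(feature_rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(feature_rows, labels):
            grad = predict_proba((weights, bias), x) - y  # d(log-loss)/dz
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias
```

In this sketch the two branches never see each other's input; they interact only through the concatenated vector passed to the classifier, which is the property the term "parallel dual-encoder" suggests. Swapping in real pretrained encoders would change `encode_sql` and `encode_text` but leave `fuse` and the classifier interface unchanged.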

Published

2025-09-25

Issue

Section

College of Engineering and Computing: Department of Information Sciences and Technology