Evaluating LLMs as SQL Tutors: A Comparative Study of Adaptive, Feedback-Driven Learning using Commercial and Research-Phase Models

Pranav Anandh; Mihai Boicu

doi:10.13021/jssr2025.5174

Authors

Pranav Anandh, Garnet Valley High School, Glen Mills, PA
Mihai Boicu, Department of Information Sciences and Technology, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2025.5174

Abstract

Large Language Models (LLMs) are increasingly used in education to help people learn various topics, including SQL. While most research focuses on how well AI models generate correct SQL queries, less attention has been paid to how useful they are as tutors, especially in helping learners fix mistakes through feedback. In this study, several commercial LLMs (GPT-4, Claude, Grok, Llama, and Gemini) and several experimental models (FISQL, CoSQL, DB-GPT, and S3SQL) were evaluated by grading their feedback against a rubric.

A set of 10 SQL learning tasks was created, spanning five key topics: selection, aggregation, joins, set operations, and subqueries. For each task, the models were given an incorrect query, and one researcher graded their responses on a 5-point rubric with four weighted areas: accuracy (15%), clarity (20%), guidance (25%), and adaptiveness (40%). Commercial models were generally accurate and fluent, but their ability to guide learners through mistakes varied; Grok (4.675) was the best-performing commercial model. Research models such as S3SQL and FISQL gave strong, structured feedback, especially when they were aware of the database schema, but were not built for full back-and-forth conversations. Among these, S3SQL gave the strongest (4.21) and most structured feedback, based on published findings.
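As a rough illustration of how the four weighted areas could combine into a single 5-point score, the sketch below assumes each area is graded from 0 to 5 and the overall score is a plain weighted sum. The abstract does not state the exact aggregation method, so this is an assumption. The function name `weighted_score` and the example grades are hypothetical and are not data from the study.

```python
# Rubric weights as stated in the abstract; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.15,
    "clarity": 0.20,
    "guidance": 0.25,
    "adaptiveness": 0.40,
}


def weighted_score(area_scores):
    """Combine per-area grades (each assumed to be 0-5) into one 0-5 score.

    Assumption: the overall score is a simple weighted sum of the areas.
    """
    if set(area_scores) != set(WEIGHTS):
        raise ValueError("expected exactly the four rubric areas")
    for area, score in area_scores.items():
        if not 0 <= score <= 5:
            raise ValueError(f"{area} score must be between 0 and 5")
    return sum(WEIGHTS[area] * score for area, score in area_scores.items())


# Hypothetical grades for one model response (not data from the study).
example = {"accuracy": 5, "clarity": 4, "guidance": 4, "adaptiveness": 3}
print(round(weighted_score(example), 3))
```

Under this assumed scheme, adaptiveness carries the most weight, so a response that is accurate but does not adapt to the learner's specific mistake loses more points than one with a minor clarity problem.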

Overall, our findings show that while LLMs can support SQL education, they vary in adaptability and in how they give feedback. This study provides a framework for evaluating adaptability and tutoring functionality in both commercial and experimental LLMs, to support future developments in SQL education. Next steps include proposing a new model that combines the features of the best-performing models to further enhance SQL tutoring.

assip