Evaluating the Effectiveness of AI Assistants in Enhancing Conceptual Understanding of Machine Learning Classification Topics

Authors

  • Ella Oliver, Thomas Jefferson High School for Science and Technology, Fairfax, VA
  • Mihika Ranjan, Novi High School, Novi, MI
  • Sumaya Binte Zilani Choya, Department of Information Sciences and Technology, George Mason University, Fairfax, VA
  • Mihai Boicu, Department of Information Sciences and Technology, George Mason University, Fairfax, VA

DOI

https://doi.org/10.13021/jssr2025.5162

Abstract

Bridging the gap between knowledge and application is essential for effective learning, especially for machine-learning classification methods. While existing research has examined the effects of AI assistants on outcomes such as student performance and knowledge retention, little work has specifically addressed their impact on learning machine-learning classification topics. In a pre-experiment, 20 peer researchers rated 15 fundamental classification concepts on a five-point difficulty scale. Support Vector Machines (SVMs), supervised machine-learning algorithms that classify data by finding the optimal hyperplane that maximally separates the classes in a high-dimensional space, were rated the most difficult topic (M = 3.38). This study analyzes the accuracy of, and variation among, AI assistants used as tools for understanding machine-learning classification topics, specifically SVMs. Building on this finding, we compared four AI assistants, ChatGPT, Google Gemini, Claude, and DeepSeek, using rubric-driven prompts for an SVM-based sentiment-shift detection task on the Amazon Fine Food Reviews dataset. Each assistant was evaluated on coding quality, reproducibility, explanatory clarity, and predictive performance (macro-F1, accuracy, recall), with rubric scores out of 100. ChatGPT achieved the highest macro-F1 (0.82) and accuracy (0.84) and an overall rubric score of 88, significantly outperforming Google Gemini (macro-F1 = 0.80, score = 79) and Claude (macro-F1 = 0.59, score = 72). DeepSeek performed poorly, with a macro-F1 below 0.50. ChatGPT's responses also showed excellent process organization and clarity of detail. These findings imply that AI assistants differ substantially in their capacity to facilitate SVM learning. Future research will investigate adaptive prompting techniques tailored to learners' skill levels and extend this framework to additional classification methods.
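To make the evaluation task concrete, the sketch below shows one plausible SVM sentiment-classification pipeline of the kind the abstract describes; it is not the authors' code. It assumes scikit-learn and pandas, a CSV export of the Amazon Fine Food Reviews dataset with "Text" and "Score" columns (as in the public Kaggle release), and a simple binarization of star ratings into positive and negative labels; the study's exact preprocessing and labeling scheme are not specified in the abstract.

```python
# Minimal sketch of an SVM sentiment-classification pipeline on the
# Amazon Fine Food Reviews dataset. File name, column names, and the
# rating-binarization rule below are assumptions, not the paper's setup.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Load reviews; "Reviews.csv" matches the public Kaggle release.
df = pd.read_csv("Reviews.csv")

# Drop neutral 3-star reviews and binarize the remaining star ratings
# (1 = positive, 0 = negative) -- one plausible labeling scheme.
df = df[df["Score"] != 3].copy()
df["label"] = (df["Score"] > 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df["Text"], df["label"],
    test_size=0.2, random_state=42, stratify=df["label"],
)

# TF-IDF features feeding a linear-kernel SVM, which learns the
# hyperplane that maximally separates the two sentiment classes.
model = make_pipeline(TfidfVectorizer(max_features=50_000), LinearSVC())
model.fit(X_train, y_train)

# Report the three metrics named in the abstract.
pred = model.predict(X_test)
print("macro-F1:    ", f1_score(y_test, pred, average="macro"))
print("accuracy:    ", accuracy_score(y_test, pred))
print("macro recall:", recall_score(y_test, pred, average="macro"))
```

A linear kernel is the usual choice for high-dimensional sparse TF-IDF features, and macro-averaging weights both sentiment classes equally, which matters because positive reviews dominate this dataset.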

Published

2025-09-25

Section

College of Engineering and Computing: Department of Information Sciences and Technology