Does Initial Student Performance in Online Courses Lead to Student Failure?


  • Andy Qin
  • Sameeha Malik
  • Suheera Malik
  • Dr. Mihai Boicu



Failing a course is costly for students: it costs $1,700 on average in tuition fees, puts future financial aid at risk, and forfeits course credits that must then be retaken (College Post 2021). Diagnosing likely failure early in a course, however, opens a window for intervention and additional assistance. This study predicted student failure (a final grade of 73% or below) from five features that gauge student performance on assignments allowing multiple attempts: mean grade, mean duration, mean number of attempts per assignment, mean of the maximum grade per assignment, and mean duration of the attempt that achieved the maximum grade. The data came from 243 students and over 1000 assignment attempts. Because different models can correctly classify different failing students (Er 2012; He et al. 2020), four machine learning algorithms were tested on predicting student failure at three time steps during the course (one assignment completed, two completed, etc.). Logistic Regression served as the baseline for comparison; the other models were K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). Each model was evaluated with 10-fold cross-validation on accuracy, precision, recall, and F1-score. Logistic Regression peaked at 82.9% accuracy with 60.9% recall, K-Nearest Neighbor at 99.6% accuracy with 100% recall, Support Vector Machine at 97.5% accuracy with 91.6% recall, and Multilayer Perceptron at 95.8% accuracy with 85.7% recall. Because recall measures the share of failing students who are correctly identified, K-Nearest Neighbor shows the best performance, identifying all at-risk students at its peak.
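The five features and the failure label described above can be illustrated with a minimal sketch. It is not the study's code: the record layout (assignment id, grade in percent, duration in minutes), the function names attempt_features and is_failing, and the sample values are all hypothetical, and the sketch assumes that assignment ids sort in course order and that ties for the maximum grade go to the earliest attempt.

```python
from collections import defaultdict
from statistics import mean


def attempt_features(attempts, first_k):
    """Aggregate one student's attempts on their first `first_k` assignments.

    `attempts` is a list of (assignment_id, grade_percent, duration_minutes)
    tuples in the order they were made. This layout is an assumption made
    for illustration, not the study's data format.
    """
    by_assignment = defaultdict(list)
    for assignment_id, grade, duration in attempts:
        by_assignment[assignment_id].append((grade, duration))

    # Keep only the first k assignments (assumes ids sort in course order).
    kept = [by_assignment[a] for a in sorted(by_assignment)[:first_k]]
    all_attempts = [att for group in kept for att in group]
    # Highest-graded attempt per assignment; max() keeps the earliest on ties.
    best = [max(group, key=lambda att: att[0]) for group in kept]

    return {
        "mean_grade": mean(g for g, _ in all_attempts),
        "mean_duration": mean(d for _, d in all_attempts),
        "mean_attempts": mean(len(group) for group in kept),
        "mean_max_grade": mean(g for g, _ in best),
        "mean_best_duration": mean(d for _, d in best),
    }


def is_failing(final_grade):
    """Failure label from the abstract: a final grade of 73% or below."""
    return final_grade <= 73.0


# Hypothetical student: two attempts on assignment 1, one on assignment 2.
attempts = [(1, 60, 10), (1, 80, 20), (2, 90, 30)]
features = attempt_features(attempts, first_k=2)
```

Calling attempt_features with first_k set to 1, 2, and so on would give one feature vector per time step, which could then be passed to any of the four classifiers.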





College of Engineering and Computing: Department of Information Sciences and Technology