Using Machine Learning Models to Estimate Final Student Grade from Initial Performance in a Course


  • Laura Zhang, Aspiring Scientists' Summer Internship Program Intern
  • Dr. Mihai Boicu, Aspiring Scientists' Summer Internship Program Mentor



Learning Management Systems (LMS) contain large amounts of student data that machine learning can use to help improve student learning. Earlier studies have applied various prediction methods to LMS data (e.g., online activity, participation in discussions, and assessment grades) to identify at-risk students (Conijn et al., 2017; Marbouti et al., 2016). My research explores whether early academic performance in a course can predict later performance in that course. Three prediction models were created using three common machine learning algorithms: K-Nearest Neighbors, Naive Bayes Classifier, and Support Vector Machine. The data used in the models were pairs of 5-week grades and final course grades from several sections of the same course. To assess the accuracy of the models, the predicted final grade percentages were grouped into letter-grade classes. The performance metrics did not indicate high predictive power (accuracy: ~65%, precision: ~63%, recall: ~65%), so ensemble voting was tested to combine the results of the three models. Ensemble voting produced only a minimal improvement in prediction accuracy. Further experimentation extended the time frame from 5 weeks to 10 weeks. With 10-week grades as input, the models performed better (accuracy: ~74%, precision: ~76%, recall: ~74%). The false positives and false negatives in the predictions were also analyzed, because false positives are preferred over false negatives in this study. The results showed roughly equal numbers of false positives and false negatives. While the cumulative grade is easy for an instructor to use for identifying students at risk, the main result of this study is that the cumulative grade alone is not a sufficiently reliable indicator of final results in this course. Future research is needed to identify other features that may predict final course results.
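
The evaluation steps described above (grouping predicted percentages into letter-grade classes, combining three models by voting, and scoring accuracy) can be illustrated with a minimal Python sketch. This is not the study's implementation: the letter-grade cutoffs, the function names, and the sample predictions are all hypothetical, and simple majority (hard) voting is assumed because the abstract does not specify the voting scheme.

```python
# Hypothetical letter-grade cutoffs; the study does not state its grading scale.
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def to_letter(percent):
    """Map a final-grade percentage to a letter-grade class."""
    for floor, letter in CUTOFFS:
        if percent >= floor:
            return letter
    return "F"


def majority_vote(*model_predictions):
    """Hard voting: for each student, keep the most common predicted class.

    On a tie, the class predicted by the earliest-listed model wins.
    """
    return [max(votes, key=votes.count) for votes in zip(*model_predictions)]


def accuracy(predicted, actual):
    """Fraction of students whose predicted class matches the actual class."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up predicted final percentages for four students (illustration only).
knn_pred = [88.0, 72.5, 61.0, 93.0]
nb_pred = [91.0, 68.0, 58.0, 95.0]
svm_pred = [86.0, 74.0, 65.0, 89.0]
actual_letters = ["B", "C", "F", "A"]

# Group each model's percentage predictions into letter-grade classes.
per_model = [[to_letter(p) for p in preds] for preds in (knn_pred, nb_pred, svm_pred)]

# Combine the three models and score the ensemble.
ensemble = majority_vote(*per_model)
print(ensemble, accuracy(ensemble, actual_letters))
```

In this toy data the third student is predicted to earn a D but actually earns an F, which is the kind of missed at-risk student that the false-positive and false-negative analysis is meant to surface.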





College of Engineering and Computing: Department of Information Sciences and Technology