A Comparison Between Manual Free Response Classification and Machine Learning and Natural Language Processing Techniques in Terms of Speed and Accuracy


  • Thrisha Sakamuri
  • Dr. Mihai Boicu




Several past studies have shown that machine learning techniques, ranging from natural language processing to random forests, achieve accuracy rates of roughly 85% to 90% when processing academic free response answers (Somers et al., 2021; Nawaz et al., 2022; Bashir et al., 2021; Gomes Rocha et al., 2021). The purpose of our ongoing study is to use natural language processing to analyze feedback-based free response answers. We used 81 answers to the question, “Provide up to 3 things that you want to keep (loved) about the research modules and why”. Our goal is to compare the speed and accuracy of a natural-language-based classifier against manual classification. The answers were first processed manually to create a gold standard: we split each answer into its main points and then classified each answer into broader categories. We then began a similar process on the split answers using natural language processing. We are currently extracting verb phrases and using them as potential generalized categories for each answer. We will compare these generalized categories with our manually identified categories and then try to use a classifier to group the extracted categories. Our goal is to provide a fast and sound method of pulling the main points from longer answers. With this procedure, instructors could spend much less time analyzing survey responses from large groups of students.
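The splitting and comparison steps described above can be illustrated with a minimal sketch. It is not the study's implementation. The function names, the separator rules, and the category labels are hypothetical, chosen only for illustration, and the verb phrase extraction step is omitted because it depends on an NLP library. The sketch splits an answer into candidate main points and measures how often automatically assigned categories agree with the manual gold standard.

```python
import re


def split_main_points(answer: str) -> list[str]:
    """Split a free response answer into candidate main points.

    Assumption (not from the study): points are separated by newlines,
    semicolons, or list markers such as "1." or "2)".
    """
    parts = re.split(r"(?:^|\s)\d+[.)]\s+|[;\n]+", answer)
    return [p.strip() for p in parts if p and p.strip()]


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Return the fraction of items whose predicted category matches the gold one."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labeled item is required")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


# Illustrative input only; this is not data from the survey.
points = split_main_points("1. The videos because they were clear; 2. The quizzes")

# Hypothetical category labels for four main points.
gold_labels = ["content", "format", "content", "instructor"]
predicted_labels = ["content", "format", "pacing", "instructor"]
score = accuracy(gold_labels, predicted_labels)
```

Plain accuracy treats every category equally. With only a few dozen answers and uneven category sizes, a per-category breakdown may be more informative than a single number.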





College of Engineering and Computing: Department of Information Sciences and Technology