A comparison of LLM and human answers to StackOverflow programming questions

Authors

  • Richa Gupta Department of Computer Science, George Mason University, Fairfax, VA
  • Maisha Farzana Department of Computer Science, George Mason University, Fairfax, VA
  • Sajed Jalil Department of Computer Science, George Mason University, Fairfax, VA
  • Thomas D. LaToza Department of Computer Science, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2023.3896

Abstract

In recent years, large language models (LLMs) have been increasingly used by developers to help write and debug code and to learn new programming languages and frameworks. However, LLMs are prone to hallucination, producing fictitious content, and often confidently give incorrect or misleading answers. To assess how LLM responses compare to human responses to common programming questions, we are designing a study comparing LLM answers to human answers to StackOverflow questions. We sampled a total of 300 StackOverflow questions, 100 each from the top-voted, middle-voted, and lowest-voted questions within a specified time window, to compare across varying types of questions. To compare LLM answers to human answers, we are using thematic analysis to develop a codebook that identifies the elements in which answers differ as well as differences in quality. We hope our results will highlight LLMs' potential to complement human expertise in programming.
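The stratified sampling described above could be carried out along the following lines. This is a minimal sketch, assuming the question metadata has already been exported (for example, via the StackExchange Data Explorer) to a CSV file; the file name and the "question_id" and "score" column names are hypothetical, not part of the study's published materials.

```python
# Sketch: select top-, middle-, and lowest-voted questions from an exported dump.
import pandas as pd

def sample_questions(csv_path: str, per_stratum: int = 100) -> pd.DataFrame:
    """Return per_stratum questions from each of three vote strata."""
    # Sort all exported questions by vote score, highest first.
    questions = pd.read_csv(csv_path).sort_values("score", ascending=False)
    n = len(questions)

    top = questions.head(per_stratum)        # highest-voted questions
    bottom = questions.tail(per_stratum)     # lowest-voted questions
    mid_start = n // 2 - per_stratum // 2    # slice centered on the median score
    middle = questions.iloc[mid_start:mid_start + per_stratum]

    return pd.concat([top, middle, bottom], keys=["top", "middle", "bottom"])

# Example usage (file name is illustrative only):
# sample = sample_questions("stackoverflow_questions_2023.csv")
# print(sample.groupby(level=0).size())
```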

Published

2023-10-27

Section

College of Engineering and Computing: Department of Computer Science
