A comparison of LLM and human answers to StackOverflow programming questions
In recent years, developers have increasingly used large language models (LLMs) to write and debug code and to learn new programming languages and frameworks. However, LLMs are prone to hallucination: they can generate fictitious content and often produce incorrect or misleading answers with confidence. To assess how LLM responses to common programming questions compare with human responses, we are designing a study that compares LLM answers and human answers to StackOverflow questions. We sampled 300 StackOverflow questions posted within a specified time range, 100 each from the top-voted, middle-voted, and lowest-voted questions, so that we can compare across varying types of questions. To compare the two sets of answers, we are using thematic analysis to develop a codebook that identifies the elements on which they differ, as well as differences in quality. We hope our future results will highlight the potential of LLMs to complement human expertise in programming.
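As an illustration only, the three-tier sampling described above could be sketched as follows. This is one possible reading, not the procedure actually used in the study: it assumes each question is a dictionary with a hypothetical `score` field holding its vote count, and it takes the 100 highest-ranked, the 100 lowest-ranked, and the 100 questions centered on the middle of the ranking. The function name `sample_by_vote_tier` is likewise hypothetical.

```python
from typing import Dict, List


def sample_by_vote_tier(questions: List[dict], per_tier: int = 100) -> Dict[str, List[dict]]:
    """Pick per_tier questions from the top, middle, and bottom of the vote ranking."""
    if per_tier <= 0:
        raise ValueError("per_tier must be positive")
    if len(questions) < 3 * per_tier:
        raise ValueError("need at least 3 * per_tier questions so the tiers do not overlap")

    # Rank from most to least voted; 'score' is an assumed field name.
    ranked = sorted(questions, key=lambda q: q["score"], reverse=True)

    # Start of the window of per_tier questions centered in the ranking.
    mid_start = (len(ranked) - per_tier) // 2

    return {
        "top": ranked[:per_tier],
        "middle": ranked[mid_start:mid_start + per_tier],
        "lowest": ranked[-per_tier:],
    }
```

With the default `per_tier=100`, the three tiers together give the 300 questions mentioned above. Filtering to the chosen time range would happen before this step and is not shown here.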
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.