Using large language models to author debugging hypotheses

Authors

  • Gavin Crigger Department of Computer Science, George Mason University, Fairfax, VA
  • Raymond Fu Department of Computer Science, George Mason University, Fairfax, VA
  • Abdulaziz Alaboudi Department of Computer Science, George Mason University, Fairfax, VA
  • Thomas D. LaToza Department of Computer Science, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2023.3895

Abstract

Debugging hypotheses are a central part of the debugging process, guiding developers to look for specific evidence to accept or reject their hypotheses about the cause of a defect. Finding a relevant debugging hypothesis quickly can greatly reduce the time developers spend debugging. Techniques can match a debugging context to a debugging hypothesis that might help explain the defect, but they require an extensive database of debugging hypotheses to be effective in practice. To create such a database, we investigated using large language models (LLMs) to author hypotheses. We experimented with multiple approaches to generating hypotheses and found that LLMs could partially generate hypotheses, with a moderate amount of assistance required from human developers. The LLMs ChatGPT and Phind were effective at creating several elements of a debugging hypothesis but made mistakes that required human intervention. While both could easily generate hypotheses in the correct format, they struggled to integrate information when constructing the hypotheses themselves. Although LLMs were helpful for sketching parts of debugging hypotheses, it is important to consider ways to incorporate developer input to revise and refine them.
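As a concrete illustration of what generating a hypothesis "in the correct format" can involve, the sketch below prompts a chat-completion LLM to draft a single debugging hypothesis from a short debugging context. This is a minimal sketch assuming an OpenAI-style chat API; the hypothesis fields (symptom, suspected cause, evidence to check), the prompt wording, and the model name are illustrative assumptions, not the specific approach evaluated in this work.

```python
"""Illustrative sketch (not the authors' implementation): asking a chat LLM to
draft one debugging hypothesis in a fixed, labeled format from a short
debugging context. A developer is expected to review and revise the draft."""
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical hypothesis format chosen for illustration.
HYPOTHESIS_PROMPT = (
    "Given the debugging context below, write one debugging hypothesis with "
    "exactly three labeled fields:\n"
    "Symptom: <observable faulty behavior>\n"
    "Suspected cause: <specific code-level explanation>\n"
    "Evidence to check: <what the developer should inspect to accept or reject it>\n\n"
    "Debugging context:\n{context}"
)

def draft_hypothesis(context: str, model: str = "gpt-4o-mini") -> str:
    """Return the model's draft of a single formatted debugging hypothesis."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": HYPOTHESIS_PROMPT.format(context=context)}],
        temperature=0.2,  # keep drafts focused rather than creative
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(draft_hypothesis(
        "Clicking 'Save' in the settings dialog silently discards changes; "
        "the network tab shows the PUT request returning HTTP 400."
    ))
```

In a workflow like the one described above, a developer would review the drafted fields, correct any misintegrated details from the debugging context, and store the revised hypothesis in the database used to match debugging contexts to hypotheses.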

Published

2023-10-27

Issue

Section

College of Engineering and Computing: Department of Computer Science

Categories