Using large language models to author debugging hypotheses
Debugging hypotheses are a central part of the debugging process: they guide developers to look for specific evidence that accepts or rejects a hypothesis about the cause of a defect. Finding a relevant debugging hypothesis quickly can greatly reduce the time developers spend debugging. Existing techniques can match a debugging context to a debugging hypothesis that might help explain the defect, but they require an extensive database of debugging hypotheses to be effective in practice. To build such a database, we investigated using large language models (LLMs) to generate hypotheses. We experimented with multiple approaches to generating hypotheses and found that LLMs could partially generate them, with a moderate amount of assistance from human developers. The LLMs ChatGPT and Phind were effective at creating several elements of a debugging hypothesis, but they made mistakes that required human intervention. Although both could easily produce hypotheses in the correct format, both struggled to integrate information into coherent hypotheses. LLMs were helpful for sketching parts of debugging hypotheses, so it is important to consider ways to incorporate developer input to revise and refine what they produce.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.