Constructing a Comprehensive Dataset of Reproducible Flaky Tests
Abstract
Regression testing is an essential part of software development: it checks that new changes do not break existing features and thereby supports software reliability. The process relies on the assumption that test results are consistent: the same test should always produce the same outcome for the same version of the code. Flaky tests, which pass or fail nondeterministically, violate this assumption and reduce software reliability. These inconsistent tests hinder the accurate identification of regressions and slow down development. While datasets of flaky tests exist, the tests in them can be difficult to reproduce because of their inherent nondeterminism, and existing datasets mostly contain only two categories of flaky tests. To address these problems, we aim to create a more comprehensive dataset of reproducible flaky tests that includes other categories as well. We study issue reports and flaky test fixes from various open-source projects to create a script that can reliably reproduce some of the flaky test failures described in these reports. Our dataset includes flaky test categories rarely found in prior datasets and contributes to a deeper understanding of these test failures. By creating this dataset, we hope to facilitate the development of advanced techniques for flaky test detection and resolution, ultimately improving software reliability.
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.