Constructing a Comprehensive Dataset of Reproducible Flaky Tests

Chuyan Wang; Mahbub-Ul-Hoque Sumon; Md Erfan; Md Mahmudul Hasan Pious; Suzzana Rafi; August Shi; Wing Lam

Authors

Chuyan Wang Department of Computer Science, George Mason University, Fairfax, VA
Mahbub-Ul-Hoque Sumon Genex Infosys, Bangladesh
Md Erfan Department of Computer Science and Engineering, University of Barishal, Barishal, Bangladesh
Md Mahmudul Hasan Pious Department of Computer Science, George Mason University, Fairfax, VA
Suzzana Rafi Department of Computer Science, George Mason University, Fairfax, VA
August Shi Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX
Wing Lam Department of Computer Science, George Mason University, Fairfax, VA

Abstract

Regression testing is an important part of software development: it ensures that new software changes do not break existing features and helps maintain software reliability. It relies on the assumption that test results are consistent, that is, the same test should always produce the same outcome on the same version of the code. Flaky tests, which nondeterministically pass or fail, violate this assumption and reduce software reliability. They make it harder to accurately identify software regressions and slow down development. Although datasets of flaky tests exist, these tests can be difficult to reproduce because of their inherent nondeterminism, and existing datasets mostly contain only two categories of flaky tests. To address this problem, we aim to create a more comprehensive dataset of reproducible flaky tests that also covers other categories. We study issue reports and flaky-test fixes from various open-source projects and create scripts that can reliably reproduce some of the flaky-test failures described in these reports. Our dataset includes categories of flaky tests rarely found in prior datasets and contributes to a deeper understanding of these test failures. We hope this dataset will facilitate the development of advanced techniques for detecting and resolving flaky tests, ultimately improving software reliability.
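The definition of flakiness above, where a test yields different outcomes on unchanged code, can be illustrated with the simplest rerun-based check: execute a test repeatedly and label it flaky if both passes and failures are observed. The Python sketch below is illustrative only and is not the authors' reproduction scripts; the names `rerun`, `RerunResult`, `hypothetical_flaky_test`, and `hypothetical_stable_test` are assumptions, and the nondeterminism is simulated with a seeded random number generator so the example is repeatable.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class RerunResult:
    passes: int
    failures: int

    @property
    def is_flaky(self) -> bool:
        # Flaky: both outcomes were observed for the same, unchanged code.
        return self.passes > 0 and self.failures > 0

    @property
    def failure_rate(self) -> float:
        total = self.passes + self.failures
        return self.failures / total if total else 0.0


def rerun(test: Callable[[], None], runs: int) -> RerunResult:
    """Run `test` `runs` times without changing any code and tally outcomes.

    A run passes if the test returns normally and fails if it raises
    AssertionError; any other exception propagates to the caller.
    """
    passes = failures = 0
    for _ in range(runs):
        try:
            test()
            passes += 1
        except AssertionError:
            failures += 1
    return RerunResult(passes, failures)


# Hypothetical source of nondeterminism (e.g., timing or ordering),
# simulated with a seeded RNG so that this sketch is repeatable.
_rng = random.Random(0)


def hypothetical_flaky_test() -> None:
    # Fails whenever the simulated condition draws below 0.3 (about 30% of runs).
    assert _rng.random() >= 0.3


def hypothetical_stable_test() -> None:
    # Deterministic: always passes.
    assert sorted([3, 1, 2]) == [1, 2, 3]


if __name__ == "__main__":
    flaky = rerun(hypothetical_flaky_test, 100)
    stable = rerun(hypothetical_stable_test, 100)
    print(flaky.is_flaky, round(flaky.failure_rate, 2))
    print(stable.is_flaky, stable.failure_rate)
```

Reruns can only show that a test is flaky, never that it is not: a test that fails rarely may need many runs before any failure appears, which is one reason scripts that reliably reproduce a specific failure are valuable.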


assip