Robust Statistical Learning for Neuroimaging: Mitigating Artifacts in Conditional Generative AI Data Augmentation

Andrew Chen; Charlie Wang; Yang Long; Zhiling Gu; Lily Wang

doi:10.13021/jssr2025.5332

Authors

Andrew Chen Department of Statistics, George Mason University, Fairfax, VA
Charlie Wang Department of Statistics, George Mason University, Fairfax, VA
Yang Long Department of Statistics, George Mason University, Fairfax, VA
Zhiling Gu Department of Biostatistics, Yale University, New Haven, CT
Lily Wang Department of Statistics, George Mason University, Fairfax, VA


Abstract

Generative AI-based data augmentation can address sample-size and privacy limitations in neuroimaging, but models such as Conditional Variational Autoencoders (CVAEs) often produce artifacts (e.g., checkerboard patterns). These artifacts act as outliers that violate the Gaussian-error assumption of least-squares regression and bias statistical inference. To mitigate this bias, we developed a simulation framework to evaluate different loss functions for distributed surface-based regression. The framework generates noisy synthetic neuroactivity maps using a CVAE, combines them with original data at varying ratios, and performs distributed image-on-scalar regression (DISR) with bivariate penalized splines over triangulation (BPST), comparing least-squares and Huber loss functions. Results, measured by Mean Integrated Squared Error (MISE) across Monte Carlo replicates, show that synthetic artifacts inflate estimation errors in least-squares regression. In contrast, Huber regression consistently outperformed least-squares on both synthetic and mixed datasets, retaining accuracy with minimal efficiency loss despite artifact contamination. These findings establish Huber regression as a robust solution to artifact-induced outliers and underscore the need for robust statistical learning when using generative models in neuroimaging. The proposed method provides a scalable framework for improving statistical power in data-scarce biomedical imaging applications.
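The comparison of least-squares and Huber losses under artifact contamination can be illustrated with a small simulation. The Python code below is a minimal sketch under stated assumptions and is not the authors' pipeline. It replaces the CVAE with a hypothetical generator in which each synthetic sample may carry a random-sign offset of amplitude 4 that alternates across a 1-D pixel grid (a 1-D stand-in for a checkerboard artifact). It fits a separate linear image-on-scalar model at each pixel, with no BPST smoothing and no distributed fitting, and approximates the MISE of the slope function by a Riemann sum averaged over Monte Carlo replicates. The function names (`weighted_line_fit`, `huber_line_fit`, `true_coefs`, `simulate_mise`), the coefficient functions, and all numeric settings (noise SD 0.5, artifact amplitude 4, 20 pixels, 30 replicates) are illustrative assumptions. The Huber threshold c = 1.345 is the conventional choice giving about 95% efficiency under Gaussian errors.

```python
import math
import random
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def huber_line_fit(x, y, c=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimate of (b0, b1) by iteratively reweighted least squares."""
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))  # least-squares start
    for _ in range(max_iter):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Robust scale, re-estimated each iteration: median absolute residual
        # rescaled by 0.6745 so it matches the SD under Gaussian errors.
        scale = statistics.median([abs(ri) for ri in r]) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 inside the threshold c * scale, shrinking as c * scale / |r| outside.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        new_b0, new_b1 = weighted_line_fit(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def true_coefs(s):
    """Hypothetical smooth intercept and slope functions on [0, 1]."""
    return math.sin(2 * math.pi * s), math.cos(2 * math.pi * s)


def simulate_mise(n_real, n_syn, artifact_prob, n_pix=20, reps=30, seed=0):
    """Monte Carlo MISE of the slope function for least squares vs. Huber."""
    rng = random.Random(seed)
    grid = [(j + 0.5) / n_pix for j in range(n_pix)]
    sq_err = {"least_squares": 0.0, "huber": 0.0}
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_real + n_syn)]
        # Only synthetic samples may carry an artifact, with a random sign per sample.
        amp = [0.0] * n_real + [
            rng.choice((-4.0, 4.0)) if rng.random() < artifact_prob else 0.0
            for _ in range(n_syn)
        ]
        for j, s in enumerate(grid):
            b0, b1 = true_coefs(s)
            sign = 1.0 if j % 2 == 0 else -1.0  # alternating, checkerboard-like offset
            y = [b0 + b1 * xi + rng.gauss(0.0, 0.5) + sign * ai
                 for xi, ai in zip(x, amp)]
            ls_slope = weighted_line_fit(x, y, [1.0] * len(x))[1]
            sq_err["least_squares"] += (ls_slope - b1) ** 2
            sq_err["huber"] += (huber_line_fit(x, y)[1] - b1) ** 2
    # Riemann sum over the grid approximates the integral on [0, 1];
    # dividing by reps averages over Monte Carlo replicates.
    return {k: v / (n_pix * reps) for k, v in sq_err.items()}


if __name__ == "__main__":
    for p in (0.0, 0.3):
        print(p, simulate_mise(n_real=30, n_syn=30, artifact_prob=p))
```

Running the script prints the approximate MISE for clean (`artifact_prob=0.0`) and contaminated (`artifact_prob=0.3`) synthetic samples mixed 1:1 with real ones. Under these assumptions, the contaminated least-squares MISE should exceed the Huber MISE, while in the clean case the two should be close. The `n_real` and `n_syn` arguments can be varied to mimic different augmentation ratios.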
