Robust Statistical Learning for Neuroimaging: Mitigating Artifacts in Conditional Generative AI Data Augmentation
DOI: https://doi.org/10.13021/jssr2025.5332

Abstract
Generative AI–based data augmentation can address sample-size and privacy limitations in neuroimaging, but models such as Conditional Variational Autoencoders (CVAEs) often produce artifacts (e.g., checkerboard patterns). These artifacts act as outliers, violating the Gaussian-error assumption of least-squares regression and biasing statistical inference. To mitigate this bias, we developed a simulation framework to evaluate different loss functions for distributed surface-based regression. The framework generates noisy synthetic neuroactivity maps using a CVAE, combines them with original data at varying ratios, and performs distributed image-on-scalar regression (DISR) with bivariate penalized splines over triangulation (BPST), comparing least-squares and Huber loss functions. Results, measured by Mean Integrated Squared Error (MISE) across Monte Carlo replicates, show that synthetic artifacts inflate the estimation error of least-squares regression. In contrast, Huber regression consistently outperformed least squares on both synthetic and mixed datasets, retaining accuracy with minimal efficiency loss despite artifact contamination. These findings establish Huber regression as a robust solution to artifact-induced outliers and underscore the need for robust statistical learning when using generative models in neuroimaging. The proposed method provides a scalable framework for improving statistical power in data-scarce biomedical imaging applications.
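The comparison described above can be illustrated with a deliberately simplified toy, not the authors' pipeline. This is a minimal sketch under stated assumptions: the "image" is a 1-D grid of locations rather than a cortical surface, each location is fitted independently (no BPST smoothing, no triangulation, no distributed computation), and the CVAE is replaced by draws from the true model to which sign-alternating checkerboard spikes are added in a random subset of synthetic maps. The function names `simulate`, `huber_irls` and `mise`, the threshold c = 1.345 with a normalized-MAD scale, and all parameter values are hypothetical choices for illustration. MISE is approximated by the grid average of the squared error of the slope function, averaged over Monte Carlo replicates.

```python
import numpy as np


def simulate(rng, n_real=40, ratio=1.0, n_pix=24, artifact_prob=0.3, amp=8.0):
    """Hypothetical toy model: Y_i(s) = beta0(s) + x_i * beta1(s) + eps_i(s)."""
    s = np.linspace(0.0, 1.0, n_pix)
    beta0, beta1 = np.sin(2 * np.pi * s), np.cos(np.pi * s)
    n_syn = int(round(ratio * n_real))          # synthetic maps per real map
    n = n_real + n_syn
    x = rng.normal(size=n)
    Y = beta0[None, :] + x[:, None] * beta1[None, :] + rng.normal(size=(n, n_pix))
    # Assumed artifact model: +/- amp alternating across locations (a 1-D
    # "checkerboard"), applied to a random subset of the synthetic rows only.
    checker = np.where(np.arange(n_pix) % 2 == 0, 1.0, -1.0)
    hit = rng.random(n_syn) < artifact_prob
    signs = rng.choice([-1.0, 1.0], size=n_syn)
    Y[n_real:] += (amp * hit * signs)[:, None] * checker[None, :]
    X = np.column_stack([np.ones(n), x])
    return X, Y, beta1


def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-6):
    """Huber M-estimate via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale: normalized median absolute deviation of the residuals.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0:
            break
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, c/|u| outside it.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta


def mise(n_rep=30, seed=0, **kw):
    """Monte Carlo MISE of the slope function for least squares vs Huber."""
    rng = np.random.default_rng(seed)
    err_ls, err_hub = [], []
    for _ in range(n_rep):
        X, Y, beta1 = simulate(rng, **kw)
        b_ls = np.linalg.lstsq(X, Y, rcond=None)[0][1]  # all locations at once
        b_hub = np.array([huber_irls(X, Y[:, j])[1] for j in range(Y.shape[1])])
        # Grid mean of squared error approximates the integral over [0, 1].
        err_ls.append(np.mean((b_ls - beta1) ** 2))
        err_hub.append(np.mean((b_hub - beta1) ** 2))
    return float(np.mean(err_ls)), float(np.mean(err_hub))


ls_clean, hub_clean = mise(artifact_prob=0.0)  # mixed data, no artifacts
ls_art, hub_art = mise(artifact_prob=0.3)      # mixed data with artifacts
print(f"clean:        LS {ls_clean:.4f}  Huber {hub_clean:.4f}")
print(f"contaminated: LS {ls_art:.4f}  Huber {hub_art:.4f}")
```

In this toy setting one would expect the contaminated least-squares MISE to be several times larger than the Huber MISE, while on clean Gaussian data Huber should cost only a few percent in efficiency; the exact numbers depend on the assumed artifact amplitude, contamination rate and mixing ratio.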
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.