Evaluation of Synthetic Population Data Created Using Generative Adversarial Networks
DOI:
https://doi.org/10.13021/jssr2021.3201
Abstract
Generating realistic synthetic populations is essential for many agent-based models to produce accurate predictions. Synthetic population data are difficult to generate because the underlying data are high-dimensional and irregularly distributed. Deep generative models have been proposed to address this difficulty because they can model arbitrary distributions with greater flexibility. This study compares and evaluates synthetic populations generated with different generative adversarial network (GAN) models. We use the Public Use Microdata Sample (PUMS) for Fairfax County, Virginia to evaluate the performance of a tabular GAN, a conditional tabular GAN (CTGAN), and CopulaGAN, a variant of the CTGAN. Metrics from the TableEvaluator and SDV Python libraries are used to measure correlations and probability distributions of population attributes. To compare models, we computed F1 scores for several classifiers (logistic regression, random forest, decision tree, and multi-layer perceptron); we then averaged the Jaccard similarity, a metric we used to measure the closeness between the real and synthetic F1 scores for each category. We found that the CTGAN and the CopulaGAN both outperformed the tabular GAN, and that the CTGAN's average similarity score narrowly exceeded the CopulaGAN's by 2%. Our approach can be applied to other regions of the United States and can be used to model populations accurately when only a small sample of the population is available.
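The abstract does not spell out how the F1 scores and the Jaccard similarity are combined, so the following is only a minimal sketch under stated assumptions. It assumes two classifiers of the same type, one trained on real records and one trained on synthetic records, both scored on the same held-out real rows; the Jaccard similarity is taken over the sets of (row index, predicted label) pairs from the two classifiers. The helper names and the toy label lists are hypothetical and are not results from the study.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 score for one positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard_similarity(pred_a, pred_b):
    """|A & B| / |A | B| over sets of (row index, predicted label) pairs."""
    set_a = set(enumerate(pred_a))
    set_b = set(enumerate(pred_b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Toy, made-up labels for eight held-out real rows (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical predictions from a classifier trained on real data.
pred_real = [1, 0, 1, 0, 0, 0, 1, 1]
# Hypothetical predictions from the same classifier type trained on synthetic data.
pred_synth = [1, 0, 0, 0, 0, 1, 1, 1]

f1_real = f1_binary(y_true, pred_real)
f1_synth = f1_binary(y_true, pred_synth)
similarity = jaccard_similarity(pred_real, pred_synth)

print(f"F1 (trained on real):      {f1_real:.2f}")
print(f"F1 (trained on synthetic): {f1_synth:.2f}")
print(f"Jaccard similarity:        {similarity:.2f}")
```

Under these assumptions, a synthetic population that preserves the structure of the real data should yield F1 scores close to the real-data baseline and a Jaccard similarity near 1; repeating the comparison for each classifier type and averaging gives one summary score per GAN model.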
License
Copyright (c) 2022 SRIHAN KOTNANA, Taylor Anderson, Andreas Züfle, Hamdi Kavak
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.