A Direct Survey-Based Synthetic Population Generation Approach for Small Area Estimates
DOI: https://doi.org/10.13021/jssr2025.5151
Abstract
Small area estimation (SAE) is a statistical technique for analyzing health outcomes and characteristics in small geographic areas. While the Centers for Disease Control and Prevention (CDC) sets a widely accepted standard for SAE using a multilevel regression approach, its methods are complex and not readily available for public use. To address this gap, we propose an open and more accessible approach that directly integrates public health survey data into the widely used iterative proportional fitting (IPF) technique to generate small area estimates for public health applications. We apply this method to estimate the prevalence of cancer, diabetes, and chronic obstructive pulmonary disease (COPD), incorporating demographic factors such as age, race, income, gender, and education, along with health indicators including smoking status, body mass index, health insurance coverage, and urban living status. Compared with New York county-level data, our cancer estimates achieved an R² of 0.555, close to the 0.61 achieved by CDC’s estimates. For diabetes, comparison with Florida county-level data yielded an R² of 0.437 (CDC: 0.475). For COPD, our estimates achieved an R² of 0.462, exceeding the CDC’s 0.426. These results indicate that our approach can generate SAEs that are comparable in accuracy to, and in the case of COPD slightly more accurate than, the CDC gold standard. This study contributes to advancing SAE by offering an open and publicly available alternative for generating estimates that does not require complex statistical expertise, expanding access to tools that support public health research and decision-making.
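To illustrate the IPF step named in the abstract, here is a minimal sketch of classical two-dimensional IPF in Python with NumPy. It is not the authors' implementation: the function name `ipf`, the two variables (age group and smoking status), the survey counts, the area margins, and the per-cell prevalence values are all hypothetical and chosen only for illustration. The sketch assumes a strictly positive seed table and margins that sum to the same total.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale a seed table until its margins match the target margins."""
    table = seed.astype(float).copy()
    for _ in range(max_iter):
        # Scale each row so that row sums match the row targets.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale each column so that column sums match the column targets.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Column sums are exact after the step above, so check the rows.
        if np.allclose(table.sum(axis=1), row_targets, rtol=0, atol=tol):
            break
    return table

# Hypothetical survey cross-tabulation (the seed):
# rows = three age groups, columns = non-smoker / smoker.
seed = np.array([[40.0, 10.0],
                 [30.0, 20.0],
                 [20.0, 30.0]])

# Hypothetical known margins for one small area (both sum to 1000).
row_targets = np.array([500.0, 300.0, 200.0])  # residents per age group
col_targets = np.array([700.0, 300.0])         # non-smokers, smokers

fitted = ipf(seed, row_targets, col_targets)

# Hypothetical survey-based disease prevalence for each cell.
cell_prevalence = np.array([[0.02, 0.05],
                            [0.06, 0.12],
                            [0.10, 0.20]])

# Area-level estimate: average of cell prevalence weighted by fitted counts.
area_prevalence = (fitted * cell_prevalence).sum() / fitted.sum()
print(fitted.round(1))
print(round(area_prevalence, 4))
```

The fitted table keeps the association structure of the survey seed while matching the area's known margins, which is what allows survey-level health indicators to be carried down to a small area. A real application would use more than two variables and would need to handle empty cells in the seed.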
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.