Analysis of COVID-19 Genome Data in the US
Phylogenetic analysis of SARS-CoV-2, the virus that causes COVID-19, is vital for identifying its strains, where and when the first case of each strain originated, and how each strain spreads. We extracted COVID-19 phylogenetic data from the GISAID website for two US states, Louisiana and Oregon, and parsed it into a usable format. The data cover December 2019 to June 2021 and include the geographic flows of many COVID-19 strains and lineages, with origin and destination locations, strain names, recorded dates, and variants. Using network analysis and visualization tools, we create spatial visualizations of the phylogenetic trees in the data. Future work will enrich these visualizations to examine the connectivity, disease, and sociodemographic characteristics of the regions where new strains emerge. By analyzing the thousands of cases provided by GISAID, each of which includes the location and timestamp of its node, we hypothesize that the nodes where mutations originate share socioeconomic and geographic characteristics. We aim for our visualizations to help communicate the geographic flows of the different COVID-19 strains and to identify the relationship between strains and the characteristics of the regions in which they emerge. We hope that this research will help identify regions at risk for the emergence of new strains, and that the results will deepen our understanding of COVID-19 so that we are better prepared for the next pandemic.
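The kind of flow data described above (origin, destination, strain or lineage, recorded date) could be summarized roughly as follows. This is only a minimal sketch under stated assumptions: the record layout, the function names `earliest_origin` and `edge_weights`, and every value in `flows` are hypothetical placeholders, not the authors' pipeline and not real GISAID records.

```python
from collections import Counter
from datetime import date

# Hypothetical flow records: (origin, destination, lineage, recorded date).
# All names and dates are placeholders for illustration only.
flows = [
    ("region_1", "region_2", "lineage_A", date(2021, 1, 14)),
    ("region_2", "region_3", "lineage_A", date(2021, 2, 3)),
    ("region_4", "region_3", "lineage_B", date(2020, 11, 20)),
    ("region_3", "region_2", "lineage_B", date(2021, 1, 5)),
]

def earliest_origin(records):
    """Map each lineage to the (date, origin) of its earliest recorded flow."""
    first = {}
    for origin, _dest, lineage, when in records:
        if lineage not in first or when < first[lineage][0]:
            first[lineage] = (when, origin)
    return first

def edge_weights(records):
    """Count how many recorded flows connect each origin-destination pair."""
    return Counter((origin, dest) for origin, dest, _lineage, _when in records)

first_seen = earliest_origin(flows)
weights = edge_weights(flows)
```

Here `first_seen` approximates the node where each lineage is first recorded, and `weights` gives directed edge counts that a network or mapping tool could draw as geographic flows.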
Copyright (c) 2022 Nicole Liang, Taylor Anderson, Andreas Züfle, Hamdi Kavak, Amira Roess
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.