Understanding Public Discourse Surrounding Vaccine Hesitancy in Social Media
Vaccine hesitancy is one of the most pressing and widely researched public health issues of the modern age, and understanding the discourse surrounding it is vitally important to public safety. Twitter allows researchers to study discussions of vaccine hesitancy in real time. However, because many different research groups have investigated the topic, there is now a glut of labeled tweets that follow different labeling schemes. This study proposes the use of additional textually derived metrics to improve our understanding of how tweets are labeled and of the determinants of vaccine hesitancy. We collected and combined public datasets from multiple sources, standardized their labels, and augmented them with derived metrics such as the percentage of spelling errors, ease of reading, and overall sentiment. Both machine learning and deep learning models were evaluated with and without the additional metrics. For the machine learning models, accuracy, precision, recall, and F1 all decreased when the additional metrics were included. For the deep learning models, performance did not differ with or without the metrics. Our results suggest that, despite continued advances in AI technology, deciphering human communication through short bursts of text remains a challenge for the research community.
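To make the derived metrics concrete, the following is a minimal sketch of how the percentage of spelling errors, ease of reading, and overall sentiment might be computed for a single tweet. It is an illustration under stated assumptions, not the pipeline used in the study: the abstract does not name the tools involved, so the word list `TOY_VOCAB`, the lexicon `TOY_SENTIMENT`, the vowel-group syllable heuristic, and all function names are hypothetical. Ease of reading is approximated here with the standard Flesch Reading Ease formula, which may differ from the measure the authors used.

```python
import re

# Hypothetical toy resources for illustration only; a real pipeline would
# use a full dictionary and an established sentiment lexicon or model.
TOY_VOCAB = {"the", "vaccine", "is", "safe", "dangerous", "i", "am", "not"}
TOY_SENTIMENT = {"safe": 1, "dangerous": -1}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def spelling_error_percent(text, vocab):
    """Percentage of tokens that do not appear in the given word list."""
    words = tokenize(text)
    if not words:
        return 0.0
    misspelled = sum(1 for w in words if w not in vocab)
    return 100.0 * misspelled / len(words)


def flesch_reading_ease(text):
    """Flesch Reading Ease score using the heuristic syllable counter."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def lexicon_sentiment(text, lexicon):
    """Mean per-token polarity; tokens missing from the lexicon score 0."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(lexicon.get(w, 0) for w in words) / len(words)


def derive_metrics(text, vocab=TOY_VOCAB, lexicon=TOY_SENTIMENT):
    """Bundle the three derived metrics for one tweet into a dict."""
    return {
        "spelling_error_pct": spelling_error_percent(text, vocab),
        "reading_ease": flesch_reading_ease(text),
        "sentiment": lexicon_sentiment(text, lexicon),
    }


if __name__ == "__main__":
    print(derive_metrics("The vaccine is safe. The vacine is dangerous."))
```

In a setup like this, the resulting values would be appended to each tweet's feature vector so that a classifier can be trained once with and once without them, mirroring the comparison described above.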
Copyright (c) 2022 Kuleen Sasse, Ron Mahabir, Olga Gkountouna
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.