Understanding Public Discourse Surrounding Vaccine Hesitancy in Social Media


  • Ron Mahabir Aspiring Scientists' Summer Internship Program Mentor
  • Olga Gkountouna Aspiring Scientists' Summer Internship Program Co-mentor
  • Kuleen Sasse Aspiring Scientists' Summer Internship Program Intern




Vaccine hesitancy is one of the most pressing and widely researched issues of the modern age, and understanding the discourse surrounding it is vitally important to public safety. Twitter allows researchers to study discussions of vaccine hesitancy in real time. However, because many research groups have investigated the topic independently, there is now a glut of labeled tweets annotated under different labeling schemes. This study proposes using additional textually derived metrics to improve our understanding of how tweets are labeled and of the determinants of vaccine hesitancy. We collected and combined public datasets from multiple sources, standardized their labels, and augmented them with derived metrics such as the percentage of spelling errors, ease of reading, and overall sentiment. Both machine learning and deep learning models were evaluated with and without the additional metrics. For the machine learning models, accuracy, precision, recall, and F1 all decreased when the additional metrics were included. For the deep learning models, performance did not differ with or without the metrics. Our results suggest that, despite continued advances in AI technology, deciphering human communication from short bursts of text remains a challenge for the research community.
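To make the idea of textually derived metrics concrete, the following is a minimal sketch of how the three metrics named above might be computed for a single tweet. It is not the pipeline used in this study: the tiny vocabulary and sentiment word lists are hypothetical placeholders, the syllable counter is a rough vowel-group heuristic, and the readability score uses the standard Flesch Reading Ease formula as one possible choice of "ease of reading" measure. All function names are illustrative.

```python
import re

# Hypothetical toy resources (assumptions for illustration only). A real
# analysis would use a full dictionary and an established sentiment tool.
VOCAB = {"vaccines", "are", "safe", "and", "good", "the", "is", "bad"}
POSITIVE = {"safe", "good"}
NEGATIVE = {"bad", "dangerous"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def spelling_error_percent(text, vocab=VOCAB):
    """Percentage of tokens that are not found in the vocabulary."""
    words = tokenize(text)
    if not words:
        return 0.0
    misspelled = sum(1 for w in words if w not in vocab)
    return 100.0 * misspelled / len(words)


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def lexicon_sentiment(text):
    """(positive hits - negative hits) / token count, in [-1, 1]."""
    words = tokenize(text)
    if not words:
        return 0.0
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score / len(words)


def derive_metrics(text):
    """Bundle the three derived metrics for one tweet into a dict."""
    return {
        "spelling_error_pct": spelling_error_percent(text),
        "reading_ease": flesch_reading_ease(text),
        "sentiment": lexicon_sentiment(text),
    }


if __name__ == "__main__":
    print(derive_metrics("Vaccines are safe and gooood."))
```

In a setup like the one described, values of this kind would be appended to each tweet's features so that a model can be trained once with them and once without, and the two sets of scores compared.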





College of Science: Department of Computational and Data Sciences