An Investigation into the Use of Newspaper Data and Large Language Models for Understanding the Occurrence of Flooding in the Caribbean

Authors

  • Krish Kalla, Phillips Exeter Academy, Exeter, NH
  • Ron Mahabir, Department of Computational and Data Sciences, George Mason University, Fairfax, VA
  • Maction Komwa, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA
  • Olga Gkountouna, Department of Computational and Data Sciences, George Mason University, Fairfax, VA

DOI

https://doi.org/10.13021/jssr2025.5268

Abstract

Flooding remains the most prevalent weather-related disaster worldwide, causing over $10 billion in property damage annually and significant loss of life, particularly in low- and middle-income countries (LMICs), which account for nearly 90% of those affected. Traditional flood-defense projects and early-warning systems, while effective, often far exceed the financial capacity of LMICs. To help prioritize limited resources, we propose leveraging local newspaper archives, an abundant but underutilized data source, to identify and map flood-prone areas dynamically. In this study, we extend our previous analysis of Trinidad and Tobago's two leading newspapers (Trinidad Express and Trinidad Newsday) through the end of 2024, expanding our dataset to over 5,500 extracted flood-event records. We applied four large language models (LLMs) for automated information extraction: OpenAI's ChatGPT o3, Anthropic's Claude 4, Microsoft's Phi 4, and Meta's Llama 4. ChatGPT and Claude each achieved over 90% accuracy in detecting flood mentions and correctly geocoding their locations, while Phi 4 and Llama 4 reliably identified flood events but struggled to assign precise geographic coordinates. By combining the high-precision extractions of ChatGPT and Claude with GIS visualization, we generated interactive maps that reveal temporal and spatial patterns of flooding, highlighting the communities most at risk and the predominant causes (e.g., river overflow, infrastructure failure, extreme rainfall) in each area. These results demonstrate that, even with modest budgets, governments and disaster-management agencies in LMICs can apply natural language processing to freely accessible local news to target interventions where they are needed most, thereby reducing both economic losses and human suffering.
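The evaluation and mapping steps summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' pipeline. The `FloodRecord` schema, the `score` and `events_by_place_and_cause` helpers, the 5 km geocoding tolerance, and the use of hand-labelled records as ground truth are all hypothetical assumptions made for illustration. The sketch shows one way to score a model's per-article output for flood detection and geocoding, and to tally detected events by place and cause as input for a map layer.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FloodRecord:
    """Hypothetical schema: one LLM-extracted record per newspaper article."""
    article_id: str
    is_flood: bool                   # does the article report a flood event?
    location: Optional[str] = None   # place name assigned by the model
    lat: Optional[float] = None      # geocoded latitude (decimal degrees)
    lon: Optional[float] = None      # geocoded longitude (decimal degrees)
    cause: Optional[str] = None      # e.g. "river overflow", "extreme rainfall"


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a spherical Earth (mean radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def score(pred: dict[str, FloodRecord], gold: dict[str, FloodRecord],
          max_km: float = 5.0) -> tuple[float, float]:
    """Return (detection accuracy, geocoding accuracy) against labelled records.

    Detection: share of shared articles where is_flood matches the label.
    Geocoding: among labelled floods the model also detected, share whose
    predicted point lies within max_km of the labelled point. Assumes every
    labelled flood has coordinates; the max_km threshold is an assumption.
    """
    ids = gold.keys() & pred.keys()
    if not ids:
        return 0.0, 0.0
    detect_ok = sum(pred[i].is_flood == gold[i].is_flood for i in ids)
    both = [i for i in ids if gold[i].is_flood and pred[i].is_flood]
    geo_ok = sum(
        pred[i].lat is not None and pred[i].lon is not None
        and haversine_km(pred[i].lat, pred[i].lon, gold[i].lat, gold[i].lon) <= max_km
        for i in both
    )
    return detect_ok / len(ids), (geo_ok / len(both) if both else 0.0)


def events_by_place_and_cause(records: Iterable[FloodRecord]) -> Counter:
    """Count detected flood events per (location, cause) pair for a map layer."""
    return Counter((r.location, r.cause) for r in records if r.is_flood and r.location)
```

With hand-labelled records in `gold` and one model's output in `pred` (both keyed by article ID), `score(pred, gold)` returns the two accuracy figures, and `events_by_place_and_cause(pred.values())` yields counts that could be joined to geocoded points in a GIS tool. A distance threshold is used instead of exact coordinate matching because a correctly identified place can be geocoded to slightly different points; the appropriate tolerance would depend on how a study defines a correct location.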

Published

2025-09-25

Section

College of Science: Department of Computational and Data Sciences