An Investigation into the use of Newspaper Data and Large Language Models for Understanding the Occurrence of Flooding in the Caribbean
DOI: https://doi.org/10.13021/jssr2025.5268
Abstract
Flooding remains the most prevalent weather-related disaster worldwide, causing over $10 billion in annual property damage and significant loss of life, particularly in low- and middle-income countries (LMICs), which account for nearly 90% of those affected. Traditional flood-defense projects and early-warning systems, while effective, often far exceed the financial capacity of LMICs. To help prioritize limited resources, we propose leveraging local newspaper archives, an abundant and underutilized data source, to identify and map flood-prone areas dynamically. In this study, we extend our previous analysis of Trinidad and Tobago's two leading newspapers (Trinidad Express and Trinidad Newsday) through the end of 2024, increasing our dataset to over 5,500 extracted flood-event records. We applied four large language models (LLMs) for automated information extraction: ChatGPT o3, Anthropic's Claude 4, Microsoft's Phi 4, and Meta's Llama 4. Both ChatGPT and Claude achieved over 90% accuracy in detecting flood mentions and correctly geocoding their locations, while Phi 4 and Llama 4 reliably identified flood events but struggled to assign precise geographic coordinates. By combining the high-precision extractions of ChatGPT and Claude with GIS visualization, we generated interactive maps that reveal temporal and spatial patterns of flooding, highlighting the communities most at risk and the predominant causes (e.g., river overflow, infrastructure failure, extreme rainfall) in each area. These results demonstrate that, even with modest budgets, governments and disaster-management agencies in LMICs can harness natural-language processing on freely accessible local news to target interventions where they are needed most, thereby reducing both economic losses and human suffering.
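The aggregation step implied by the abstract (ranking communities by flood frequency and finding the predominant cause in each) might look roughly like the sketch below. This is not the authors' pipeline: the record fields (`community`, `year`, `cause`), the `summarize` helper, and the placeholder records are all assumptions made for illustration, and the LLM extraction and geocoding steps are taken as already done.

```python
from collections import Counter, defaultdict

# Placeholder records in a shape an LLM extraction step might emit.
# Field names and values are hypothetical, not taken from the study's data.
records = [
    {"community": "Community A", "year": 2022, "cause": "river overflow"},
    {"community": "Community A", "year": 2023, "cause": "river overflow"},
    {"community": "Community A", "year": 2024, "cause": "extreme rainfall"},
    {"community": "Community B", "year": 2024, "cause": "infrastructure failure"},
]


def summarize(records):
    """Return communities ordered by event count, each with its most common cause."""
    event_counts = Counter(r["community"] for r in records)
    causes = defaultdict(Counter)
    for r in records:
        causes[r["community"]][r["cause"]] += 1
    return {
        community: {
            "events": n,
            "main_cause": causes[community].most_common(1)[0][0],
        }
        for community, n in event_counts.most_common()
    }


summary = summarize(records)
for community, info in summary.items():
    print(community, info["events"], info["main_cause"])
```

A table of this kind could then be joined to geocoded coordinates and passed to a GIS tool for mapping; that step is omitted here because the abstract does not say which tooling was used.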
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.