An Investigation into the use of Newspaper Data and Large Language Models for Understanding the Occurrence of Flooding in the Caribbean
Abstract
Flooding is the most common and widespread weather-related natural disaster, causing over $10 billion in property damage and extensive loss of human life and livestock globally. Governments and institutions often struggle to provide adequate flood protection and relief, particularly in low- and middle-income countries (LMICs), where nearly 90% of those affected reside. Early warning systems and infrastructure projects are common solutions, but they are often prohibitively expensive for LMICs. A more cost-effective alternative is the development of timely flood maps, which rely on up-to-date flood data. Traditional data collection methods, such as deploying large survey teams, are costly and impractical, while satellite imagery and radar data are often obscured by cloud cover and difficult to interpret.
Newspaper reports on flooding, however, are an underutilized data source. They are widely available, have extensive archives, and can be curated online. In this study, we use newspaper data from Trinidad and Tobago, a small Caribbean island nation where flooding remains a significant issue. Using three large language models (LLMs), ChatGPT, Phi3, and LLaMa, we extract the flood locations and causes reported in online newspapers. These data are then used to create an updated flood risk map that reflects the frequency and causes of flooding at the community level. Our results identify the most at-risk communities and the causes of flooding in each. Across the LLMs compared, accuracy in identifying flood locations ranges from 89% to 94%, with the models within 10% of one another. This approach enhances flood mapping by providing current, actionable data to governments and institutions, supporting better flood management and response strategies.
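As a rough illustration of the aggregation and evaluation steps described in the abstract, the following Python sketch counts flood reports and causes per community and scores predicted locations against hand labels. It is not the authors' implementation: the function names `summarize_floods` and `location_accuracy`, the `(community, cause)` record format, and the placeholder community names are all assumptions made for this example, and the LLM extraction step itself is not shown.

```python
from collections import Counter, defaultdict


def summarize_floods(records):
    """Count flood reports and causes per community.

    `records` is an iterable of (community, cause) pairs, assumed to be
    the output of an LLM extraction step run on each article
    (hypothetical format, not taken from the study).
    """
    counts = Counter()
    causes = defaultdict(Counter)
    for community, cause in records:
        # Normalize case and whitespace so spelling variants merge.
        key = community.strip().title()
        counts[key] += 1
        causes[key][cause.strip().lower()] += 1
    # Communities are ordered from most to least frequently reported.
    return {
        name: {"reports": n, "top_cause": causes[name].most_common(1)[0][0]}
        for name, n in counts.most_common()
    }


def location_accuracy(predicted, labelled):
    """Fraction of articles whose predicted community matches the label."""
    if len(predicted) != len(labelled):
        raise ValueError("predicted and labelled must have the same length")
    if not labelled:
        return 0.0
    hits = sum(
        p.strip().lower() == t.strip().lower()
        for p, t in zip(predicted, labelled)
    )
    return hits / len(labelled)


# Placeholder records for illustration only, not data from the study.
records = [
    ("Community A", "heavy rainfall"),
    ("community a", "Heavy rainfall"),
    ("Community B", "blocked drains"),
    ("Community A", "river overflow"),
]
summary = summarize_floods(records)
```

Here `summary` maps each community to its report count and most frequently cited cause, which is the kind of table a community-level flood frequency map could be drawn from. Exact string matching in `location_accuracy` is a simplification; real place names would likely need a gazetteer or fuzzy matching.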
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.