Unstructured data about historical events is abundant (news articles, government reports, and local bulletins), but extracting this information manually at scale is not feasible. Our methodology starts from news reports where flooding is a primary subject. We use the Google Read Aloud user-agent to isolate the primary text of articles in 80 languages, which is then standardized into English via the Cloud Translation API.
The most critical step of the extraction process is performed by the Gemini large language model (LLM). We engineered a prompt that guides Gemini through a strict verification process:
- Classification: The model distinguishes between reports of actual, ongoing, or past floods and articles that merely discuss future warnings, policy meetings, or general risk modeling.
- Temporal reasoning: Gemini anchors relative references (e.g., “last Tuesday”) against an article’s publication date to determine precise event timing.
- Spatial precision: The system identifies granular locations (neighborhoods and streets) and maps them to standardized spatial polygons using Google Maps Platform.
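To make the temporal reasoning step concrete, here is a minimal Python sketch of anchoring a phrase like "last Tuesday" to an article's publication date. In Groundsource this reasoning is carried out by Gemini through the prompt; the deterministic function below, its name `resolve_last_weekday`, and the reading of "last Tuesday" as the most recent Tuesday strictly before publication are illustrative assumptions, not the production logic.

```python
from datetime import date, timedelta

# Monday=0 ... Sunday=6, matching date.weekday().
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_last_weekday(publication_date: date, weekday_name: str) -> date:
    """Return the most recent `weekday_name` strictly before `publication_date`.

    Assumption: "last Tuesday" means the nearest Tuesday in the past,
    never the publication day itself.
    """
    target = WEEKDAYS.index(weekday_name.lower())
    # Days to step back; a result of 0 would mean "today", so use a full week.
    days_back = (publication_date.weekday() - target) % 7 or 7
    return publication_date - timedelta(days=days_back)


# Article published on Thursday, 16 May 2024, mentioning "last Tuesday".
event_day = resolve_last_weekday(date(2024, 5, 16), "Tuesday")
print(event_day.isoformat())
```

Everyday language is less tidy than this: some writers use "last Tuesday" for the Tuesday of the previous week, which is one reason a model that reads the surrounding context is a better fit than a fixed rule.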
The technical validation of Groundsource confirms its reliability for high-stakes research. In manual reviews, we found that 60% of extracted events were accurate in both location and timing. Crucially, 82% were accurate enough to be practically useful for real-world analysis — for example, by capturing the correct administrative district or pinpointing the event within a single day of its reported peak.
The coverage provided by Groundsource represents a massive expansion over existing archives. By transforming unstructured media into data, we have generated 2.6 million events, a significant increase compared to the records found in traditional monitoring systems. Furthermore, spatiotemporal matching shows that Groundsource captured between 85% and 100% of the severe flood events recorded by GDACS between 2020 and 2026, demonstrating its effectiveness in identifying high-impact disasters alongside smaller, localized events.
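As a rough illustration of what spatiotemporal matching can look like, the sketch below counts a reference event as captured when at least one extracted event falls within a distance and time window of it. The `FloodEvent` type, the point-based representation of events, and the 50 km and 3 day thresholds are assumptions chosen for the example; the actual criteria used for the GDACS comparison are not described here.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class FloodEvent:
    lat: float  # degrees
    lon: float  # degrees
    day: date


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers on a spherical Earth (R = 6371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def is_match(ref: FloodEvent, cand: FloodEvent,
             max_km: float = 50.0, max_days: int = 3) -> bool:
    """True when the candidate is close to the reference in both space and time.

    The default thresholds are illustrative assumptions.
    """
    close_in_time = abs((ref.day - cand.day).days) <= max_days
    close_in_space = haversine_km(ref.lat, ref.lon, cand.lat, cand.lon) <= max_km
    return close_in_time and close_in_space


def capture_rate(reference: list[FloodEvent], extracted: list[FloodEvent]) -> float:
    """Fraction of reference events matched by at least one extracted event."""
    if not reference:
        return 0.0
    hits = sum(any(is_match(r, e) for e in extracted) for r in reference)
    return hits / len(reference)


# Two made-up reference events; only the first has a nearby extracted event.
reference = [FloodEvent(0.0, 0.0, date(2022, 1, 10)),
             FloodEvent(10.0, 10.0, date(2022, 6, 1))]
extracted = [FloodEvent(0.1, 0.1, date(2022, 1, 11))]
print(capture_rate(reference, extracted))
```

A real comparison would likely match against polygons rather than points and would need to decide how to treat events that span several days, so the reported range depends on those choices.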

