We will develop technology for geospatial database generation agents (GDGA) by combining six key components demonstrated in Phase I: 1) intelligent crawling to identify and efficiently process open-source intelligence (OSINT) sources relevant to a specific location or geographic feature; 2) spatio-temporal parsing, which builds on novel techniques to recognize and resolve location names and time expressions, with an automated recognition engine built using conditional random fields; 3) text parsing, which uses linguistic processing to segment texts into lexical, syntactic, and semantic units and to identify a wide range of location features expressed in tables or natural-language sentences; 4) a coordination component that uses semantic queries to relate locations, times, and descriptive features to each other, both within and across document boundaries; 5) a reasoning component that provides a modular framework for assigning confidence scores and selecting the best factoids from a set of potentially conflicting candidates; and 6) a data management layer that supports user interaction at multiple levels of granularity and enables discovery, visualization, and export of data using open standards. In Phase II we will refine these components, integrate them into both a web service and a standalone application, and demonstrate a broad class of GDGA applications.
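As an illustration of the kind of logic the reasoning component (item 5) might host, the sketch below resolves conflicting factoids by source-weighted voting. This is an assumption for illustration only: the abstract does not specify the scoring method. The `Factoid` fields, the `select_best` function, and the per-source reliability weights are hypothetical; in a modular framework this scorer would be one interchangeable strategy among several.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Factoid:
    # Hypothetical record for one extracted fact about a location.
    location: str   # resolved toponym or gazetteer ID (assumed)
    attribute: str  # feature type, e.g. "population"
    value: str      # extracted value as text
    source: str     # document or site the factoid came from


def select_best(candidates, source_reliability, default=0.5):
    """Pick one value per (location, attribute) pair by weighted voting.

    Each candidate adds its source's reliability weight (or `default` for
    unknown sources) to the value it asserts. The winning value's
    confidence is its weight divided by the total weight for that pair.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for f in candidates:
        weight = source_reliability.get(f.source, default)
        votes[(f.location, f.attribute)][f.value] += weight

    best = {}
    for key, scores in votes.items():
        total = sum(scores.values())
        value, score = max(scores.items(), key=lambda kv: kv[1])
        best[key] = (value, score / total)
    return best
```

For example, if two sources with reliabilities 0.9 and 0.6 report one value and a third source with reliability 0.8 reports another, the first value wins with confidence 1.5 / 2.3. Weighted voting is only one plausible choice; a probabilistic model could replace it behind the same interface.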
Keywords: Information Extraction, Toponym Resolution, Web Mining, Intelligent Agents