Phase II year
2019
(last award dollars: 1689510692)
Data scientists and analysts across the military, intelligence, and law enforcement communities are building machine learning models to classify and predict, but often struggle to create relevant, labeled training data, particularly for human-generated, domain-specific text. Data labeling can be outsourced to several services, but this is not an option for national security data, which often cannot be shared and requires specialized knowledge to interpret. Thresher's QuickCode creates labeled training data for machine learning algorithms from unstructured text of any size, in any language. Thresher's patented technology allows experts to generate training data in a fraction of the time required for hand labeling, with comparable accuracy. The 480th USAF ISR Wing generates intelligence reports for a wide group of customers. Analysts generating these reports are tasked with providing historical context; however, the data in their historical archives is not tagged in a way that allows them to collect, review, and analyze it within the time available. Thresher's proposal identifies a method for using QuickCode to tag the 480th's historical data and create a predictive model to tag future reports, providing the 480th's analysts with a research tool that dramatically reduces the time required to create their reports.