Tympana - Machine Learning Assisted Data Annotation
Award last edited on: 9/5/22

Sponsored Program
Awarding Agency
Total Award Amount
Award Phase
Solicitation Topic Code
Principal Investigator
Paul Yacci

Company Information

DataCicada LLC

4277 County Road
Canandaigua, NY 14424
(585) 350-8757
Location: Single
Congr. District: 27
County: Ontario

Phase I

Contract Number: DE-SC0022459
Start Date: 2/14/22    Completed: 11/13/22
Phase I year
Phase I Amount
Artificial Intelligence and Machine Learning tools are increasingly used to solve complex problems across diverse applications. While these tools are routinely used in domains with high volumes of data, they have not been deeply integrated with the heterogeneous data of the complex sciences. Furthermore, humans play a critical role in validating the ground-truth data that models learn from, as well as in validating that those models have learned the correct, unbiased concepts from the data. Typically, subject matter experts are highly knowledgeable about their specific domain of interest (particle physics, viral genomics, etc.) but do not have the resources to hand-curate the large-scale datasets necessary to make their data Artificial Intelligence-ready. Labeling data to make it Artificial Intelligence-ready reduces the time and complexity of building Machine Learning models, allowing scientists to apply their domain knowledge at scale to new data points. Using a cloud-hosted solution with a web interface, subject matter experts will provide annotations in a customizable, domain-dependent user interface (image viewer, graph selection tools, sequence annotation tools, geospatial map, etc.), and the platform will learn from these annotations to apply machine labels to new data points with high confidence. Through active learning, the platform will ask the expert to focus only on the data points that provide the highest value for the learning algorithm. Over time, the models will pre-annotate data for the user to review and eventually reach a point where they can self-annotate with confidence. Phase I will result in building and testing a scientific machine-learning-assisted data annotation platform to demonstrate the feasibility of this approach, focused on systems biology and bioenergy applications. It is anticipated that the platform will be usable with biological systems data, such as plant and microbial systems data, especially data relevant to sustainable energy.
Innovations in the life sciences, from sequencing data to advanced imaging data to SARS-CoV-2 data, have implications for the public good and therefore for the economy as well. Once the core learning approach is developed during Phase I, only minor modifications to the user interface will be required to apply the platform to new scientific application areas. This will be the focus of Phase II, which will integrate the platform with areas such as chemical, geochemical, and biogeochemical data.
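
The abstract describes three stages for unlabeled data: asking the expert about the most valuable points, pre-annotating others for review, and self-annotating when confidence is high. The sketch below illustrates one way such a triage step could look. It assumes entropy-based uncertainty sampling, a common active-learning query strategy that the award text does not confirm as the project's method. The function names, queue names, query budget, and 0.95 threshold are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def triage(
    predictions: Dict[str, Sequence[float]],
    query_budget: int = 2,          # hypothetical: items the expert labels per round
    auto_threshold: float = 0.95,   # hypothetical: confidence needed to self-annotate
) -> Dict[str, List[str]]:
    """Split unlabeled items into three queues based on model confidence."""
    auto: List[str] = []
    uncertain: List[str] = []
    for item_id, probs in predictions.items():
        if max(probs) >= auto_threshold:
            auto.append(item_id)        # model is confident enough to label alone
        else:
            uncertain.append(item_id)

    # Most uncertain (highest entropy) items come first.
    uncertain.sort(key=lambda i: entropy(predictions[i]), reverse=True)

    return {
        "ask_expert": uncertain[:query_budget],
        "pre_annotate_for_review": uncertain[query_budget:],
        "auto_label": auto,
    }


# Illustrative class probabilities for five unlabeled items (made-up values).
predictions = {
    "sample_a": [0.98, 0.01, 0.01],
    "sample_b": [0.40, 0.35, 0.25],
    "sample_c": [0.70, 0.20, 0.10],
    "sample_d": [0.34, 0.33, 0.33],
    "sample_e": [0.55, 0.44, 0.01],
}
queues = triage(predictions, query_budget=2, auto_threshold=0.95)
print(queues)
```

In a full loop, the expert's answers for the `ask_expert` queue would be added to the training set, the model retrained, and the triage repeated, so the `auto_label` queue would be expected to grow as the model improves.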

Phase II

Contract Number: ----------
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
Phase II Amount