DOE 2022 Tympana - Machine Learning Assisted Data Annotation

Tympana - Machine Learning Assisted Data Annotation
Award last edited on: 9/5/22

Awarding Agency

DOE

Total Award Amount

$254,627

Award Phase

Solicitation Topic Code

C53-01a

Principal Investigator

Paul Yacci

DataCicada LLC

4277 County Road
Canandaigua, NY 14424

(585) 350-8757

solutions@datacicada.com

www.datacicada.com

Location: Single
Congr. District: 27
County: Ontario

Phase I

Contract Number: DE-SC0022459
Start Date: 2/14/22 Completed: 11/13/22

Phase I year

2022

Phase I Amount

$254,627

Artificial Intelligence and Machine Learning tools are increasingly used to solve complex problems across diverse applications. While these tools are routinely used in domains with high volumes of data, they have not been deeply integrated with the heterogeneous data of the complex sciences. Furthermore, humans play a critical role in validating the ground-truth data that models learn from, as well as in validating that those models have learned the correct, unbiased concepts from the data. Typically, subject matter experts are highly knowledgeable about their specific domain of interest (particle physics, viral genomics, etc.) but do not have the resources to hand-curate the large-scale datasets necessary to make their data Artificial Intelligence-ready. Labeling data to make it Artificial Intelligence-ready reduces the time and complexity of building Machine Learning models, allowing scientists to apply their domain knowledge at scale to new data points. Using a cloud-hosted solution with a web interface, subject matter experts will provide annotations in a customizable, domain-dependent user interface (image viewer, graph selection tools, sequence annotation tools, geospatial map, etc.), and the platform will learn from those annotations to apply machine labels to new data points with high confidence. Through active learning, the platform will ask the expert to focus only on the data points that provide the highest value for the learning algorithm. Over time, the models will pre-annotate data for the user to review and eventually reach a point where they can self-annotate with confidence. Phase I will result in building and testing a scientific machine-learning-assisted data annotation platform to demonstrate the feasibility of this approach, focused on systems biology and bioenergy applications. It is anticipated that data from biological systems such as plants and microbes, especially data relevant to sustainable energy, will be usable with this platform.
Innovations in the life sciences, from sequencing data to advanced imaging data to SARS-CoV-2 data, have implications for the public good and therefore for the economy as well. Once the core learning approach is developed during Phase I, only minor modifications to the user interface will be required to apply it to new scientific application areas. This will be the focus of Phase II, which will extend the platform to areas such as chemical, geochemical, and biogeochemical data.
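
As an illustration only, the active-learning loop described in the abstract can be sketched as a triage step over model predictions. This is not the Tympana implementation, which the award record does not describe. It is a minimal example of uncertainty sampling, one common active-learning strategy. The function names, the `auto_threshold` and `budget` parameters, and the sample item ids are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def triage(predictions, auto_threshold=0.95, budget=2):
    """Split unlabeled items into machine-labeled, expert-queried, and deferred.

    predictions: dict mapping item id -> list of class probabilities.
    auto_threshold and budget are hypothetical knobs, not values from the award.
    """
    auto, uncertain = {}, []
    for item_id, probs in predictions.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= auto_threshold:
            # Confident enough for the model to self-annotate.
            auto[item_id] = best
        else:
            uncertain.append((entropy(probs), item_id))
    # Most uncertain first: assumed here to be the most informative to label.
    uncertain.sort(reverse=True)
    ask_expert = [item_id for _, item_id in uncertain[:budget]]
    deferred = [item_id for _, item_id in uncertain[budget:]]
    return auto, ask_expert, deferred


# Made-up predictions for four unlabeled items over three classes.
predictions = {
    "img-1": [0.98, 0.01, 0.01],
    "img-2": [0.40, 0.35, 0.25],
    "img-3": [0.70, 0.20, 0.10],
    "img-4": [0.34, 0.33, 0.33],
}
auto, ask_expert, deferred = triage(predictions)
print(auto, ask_expert, deferred)
```

In this sketch, items the model is already confident about receive a machine label, the expert is asked about only the few most ambiguous items, and the rest wait for the next round after the model is retrained on the new annotations.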

Phase II

Contract Number: ----------
Start Date: 00/00/00 Completed: 00/00/00

Phase II year

----

Phase II Amount

----

SBIR-STTR Award
