SBIR-STTR Award

Mission-relevant Artifact Repository in StreamlinedML (MARS)
Award last edited on: 11/6/2023

Sponsored Program
SBIR
Awarding Agency
DOD : AF
Total Award Amount
$1,233,337
Award Phase
2
Solicitation Topic Code
AF221-D015
Principal Investigator
Brianna Maze

Company Information

NextGen Federal Systems LLC

1399 Stewartstown Road Suite 350
Morgantown, WV 26505
(304) 413-0208
N/A
www.nextgenfed.com
Location: Multiple
Congr. District: 02
County: Monongalia

Phase I

Contract Number: FA8750-22-C-0521
Start Date: 8/29/2022    Completed: 11/29/2024
Phase I year
2022
Phase I Amount
$1
Direct to Phase II

Phase II

Contract Number: FA8750-22-C-0521
Start Date: 8/29/2022    Completed: 11/29/2024
Phase II year
2022
Phase II Amount
$1,233,336
Data-driven machine learning (ML) approaches have been shown to outperform statistical approaches in most settings. However, results in the literature have historically been difficult to reproduce when the code or datasets are not available and the runtime environment is not well documented. To combat this, the ML community is moving towards an ‘open-source’ approach, where code and data are encouraged to be publicly available to ensure reproducibility. With this open-source approach, researchers can gain understanding and apply lessons learned from another domain to their own. However, there is currently no single catalog of all datasets and models that are available for use, only one-off repositories scattered across various sites. Each of these repositories provides its own pipeline for transforming data, training a model, performing prediction, and evaluating a model. Building out a pipeline for each problem is costly and time-consuming, and it leads to results that are not reproducible and not easily comparable against similar algorithms. The DoD community would benefit from a searchable catalog of ML artifacts that can be used across domains. Such a catalog of DoD ML artifacts will provide deduplication of work, rapid experimentation and prototyping of ML techniques, and life-cycle management of ML artifacts, all on a government-owned platform. This, in turn, will help accelerate the advancement of AI and ML across DoD agencies. The proposed MARS solution enhances and extends the powerful StreamlinedML (SML) ecosystem to produce a mission-relevant catalog of ML artifacts.
The SML ecosystem is “a set of tools and processes that standardize, accelerate, and reduce the cost of developing and deploying custom ML capabilities in DoD environments.” Its stated goal is to “become a common standard for development and acquisition of ML technologies while increasing technology sharing and discoverability within the DoD.” Implementing this solution requires a unique combination of innovative software, DoD community engagement, and pragmatic IT deployment. The MARS system includes the following components: an Enhanced Artifact Catalog & Discovery capability, which uses mission-relevant metadata and taxonomy to provide enhanced discovery of models and datasets, along with ML-based semantic search of the catalog; a Guided UI for Discovery & ML Pipeline Authoring, which incorporates ML best practices and smart recommendation systems to inform users of useful approaches, models, and datasets for their ML pipelines; a Curated Artifact Repository with DoD-specific state-of-the-art (SOTA) models, transformations, and datasets wrapped with MISTK and registered with the SML framework; and Tailorable Deployment Packages that provide cloud-native, hardened images and scripts for quickly deploying a “MARS in a box” instance specific to a given DoD customer.