SBIR-STTR Award

Model Zero - Reinforcement Learning for Policy Transfer into High-Fidelity Environments
Award last edited on: 11/6/2023

Sponsored Program: SBIR
Awarding Agency: DOD: AF
Total Award Amount: $999,822
Award Phase: 2
Solicitation Topic Code: AF212-D001
Principal Investigator: William Li

Company Information

Heron Systems Inc

22685 Three Notch Road Unit B
California, MD 20619
(301) 866-0330
bd@heronsystems.com
www.heronsystems.com
Location: Single
Congr. District: 05
County: St. Mary's

Phase I

Contract Number: FA8750-22-C-0504
Start Date: 1/10/2022; Completed: 7/10/2023
Phase I Year: 2022
Phase I Amount: $1
Direct to Phase II
Direct to Phase II

Phase II

Contract Number: FA8750-22-C-0504
Start Date: 1/10/2022; Completed: 7/10/2023
Phase II Year: 2022
Phase II Amount: $999,821

Recent demonstrations of superhuman performance using reinforcement learning have shown the power of Artificial Intelligence (AI) for solving complex, high-dimensional problems that require long-term decision making. However, most reinforcement learning approaches require hundreds of millions or even billions of training samples to achieve high-performing policies. In this paper, we propose a novel model-based reinforcement learning algorithm called Model Zero, which learns a world model from a lower-fidelity simulation and transfers that world model into a high-fidelity simulation. Model Zero leverages the knowledge it acquired in the low-fidelity simulation and continues to refine its world model after being transferred into the high-fidelity simulation. A key advantage of this model-based approach is that the method is general and can be applied to a range of deterministic, high-fidelity, physics-based environments in which training samples are computationally expensive to obtain.
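The abstract gives no implementation details, so the following is only a toy sketch of the general idea of transferring a learned world model across simulator fidelities. It is not Model Zero itself. It assumes a hypothetical deterministic 1-D linear system in which the "low-fidelity" simulator uses slightly wrong coefficients. The names `LOW_FI`, `HIGH_FI`, `make_env`, `collect`, `sgd_fit`, `mse`, and `run_demo`, and all parameter values, are illustrative. The sketch fits a linear dynamics model on many cheap low-fidelity samples, then continues training on a small high-fidelity batch and compares the result against training from scratch on the same batch. It covers only the world-model step, not planning or policy learning.

```python
import random

# Hypothetical toy parameters (not from the award): the "low-fidelity"
# simulator is a slightly wrong version of the "high-fidelity" one.
LOW_FI = (0.90, 0.50)
HIGH_FI = (0.95, 0.45)


def make_env(a, b):
    """Deterministic 1-D dynamics: next_state = a * state + b * action."""
    def step(state, action):
        return a * state + b * action
    return step


def collect(step, n, rng):
    """Sample n (state, action, next_state) transitions from a simulator."""
    data = []
    for _ in range(n):
        s, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append((s, u, step(s, u)))
    return data


def sgd_fit(params, data, lr=0.1):
    """One SGD pass fitting the linear world model s' ~ a_hat*s + b_hat*u."""
    a_hat, b_hat = params
    for s, u, s_next in data:
        err = a_hat * s + b_hat * u - s_next
        a_hat -= lr * err * s  # step against d(0.5*err^2)/d(a_hat) = err*s
        b_hat -= lr * err * u  # step against d(0.5*err^2)/d(b_hat) = err*u
    return (a_hat, b_hat)


def mse(params, data):
    """Mean squared one-step prediction error of the world model."""
    a_hat, b_hat = params
    return sum((a_hat * s + b_hat * u - sn) ** 2 for s, u, sn in data) / len(data)


def run_demo(seed=0, n_low=5000, n_high=20):
    rng = random.Random(seed)
    low_env, high_env = make_env(*LOW_FI), make_env(*HIGH_FI)

    # 1) Learn a world model cheaply from many low-fidelity samples.
    low_model = sgd_fit((0.0, 0.0), collect(low_env, n_low, rng))

    # 2) Transfer: keep learning from a small, "expensive" high-fidelity batch.
    high_batch = collect(high_env, n_high, rng)
    warm = sgd_fit(low_model, high_batch)   # starts from low-fidelity knowledge
    cold = sgd_fit((0.0, 0.0), high_batch)  # starts from scratch

    # Evaluate both models on held-out high-fidelity transitions.
    test = collect(high_env, 1000, rng)
    return low_model, mse(warm, test), mse(cold, test)


if __name__ == "__main__":
    low_model, warm_err, cold_err = run_demo()
    print("low-fidelity model:", low_model)
    print("high-fidelity test MSE, warm start:", warm_err)
    print("high-fidelity test MSE, cold start:", cold_err)
```

Under these assumptions, the warm-started model should reach a much lower high-fidelity prediction error than the cold-started one after the same 20 high-fidelity samples. In a realistic setting, the world model would more likely be a neural network and would also be used for planning or policy optimization.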