The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is a coupled guidance and control flight software solution that enables multi-spacecraft systems to be truly autonomous in their station-keeping and self-distribution. The proposed innovation is a deep reinforcement learning (DRL) agent that learns to autonomously determine and command maneuvers that yield a desired spacecraft formation. The proposed project could reduce customers' cost of operations by removing humans from the closed-loop control system. The proposed innovation scales to systems of large numbers of spacecraft without scaling the cost of flight operations. Other potential benefits to customers are risk reductions: a DRL agent does not require an exact model of spacecraft subsystems and orbital dynamics and can learn in real time, making it robust to off-nominal system performance and unexpected perturbations. Benefits of this project may include a DRL agent discovering novel guidance and control solutions for mission designs that are not known from legacy orbital dynamics approaches. Potential broader societal impacts include enabling Deep Space Gateway operations and science missions that sample in-situ, simultaneous measurements over a large area, resulting in valuable science data returns for research or commercial applications in Earth orbit or deep space.

This Small Business Innovation Research Phase I project will demonstrate the technical feasibility of implementing DRL as a solution for truly autonomous spacecraft guidance and control. The challenge motivating this project is the control of multi-spacecraft systems, where maneuver planning is neither intuitive nor straightforward due to the nonlinear equations of relative motion. Historically, solutions have been found by making simplifying assumptions of circular orbits and linearized equations of relative motion. Such assumptions are avoided in this research plan.
The primary research objective is to train a DRL agent using the high-fidelity model of NASA's General Mission Analysis Tool (GMAT). First, the problem of achieving a particular formation or distribution will be formulated as a Markov Decision Process. Next, software infrastructure will be developed using TensorFlow, Python, and GMAT. Within this framework, the DRL agent will be trained to learn a policy that maneuvers the spacecraft into a specified formation, subject to operations constraints such as available propellant and actuation limits. Anticipated technical results include comparisons of on-policy versus off-policy approaches to achieving a coordinated spacecraft mission, demonstration of the technical feasibility of DRL-based flight software for guidance and control, and a characterization of the limitations of DRL-based control.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
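To make the Markov Decision Process framing concrete, the following is a minimal toy sketch, not the project's implementation: the class name, parameters, and double-integrator dynamics are all hypothetical placeholders for the GMAT-backed, high-fidelity environment the abstract describes, and the random policy stands in for a trained DRL agent. It does illustrate the state/action/reward loop and the two operations constraints named above (propellant budget and actuation limits).

```python
import numpy as np


class FormationEnv:
    """Toy MDP for formation-keeping (hypothetical stand-in for a GMAT-backed
    environment; dynamics are a simple double integrator, not orbital motion)."""

    def __init__(self, target_separation=1.0, dt=0.1, max_thrust=0.5, fuel=10.0):
        self.target = target_separation  # desired inter-spacecraft distance
        self.dt = dt
        self.max_thrust = max_thrust     # actuation limit (operations constraint)
        self.fuel0 = fuel                # propellant budget (operations constraint)
        self.reset()

    def reset(self):
        # State: deputy position and velocity relative to the chief, plus fuel.
        self.pos = np.array([2.0, 0.0])
        self.vel = np.zeros(2)
        self.fuel = self.fuel0
        return self._obs()

    def _obs(self):
        return np.concatenate([self.pos, self.vel, [self.fuel]])

    def step(self, action):
        # Clip the commanded thrust to the actuation limit.
        u = np.clip(action, -self.max_thrust, self.max_thrust)
        burn = np.linalg.norm(u) * self.dt
        if burn > self.fuel:             # cannot burn more than remaining propellant
            u = np.zeros(2)
            burn = 0.0
        self.fuel -= burn
        # Placeholder dynamics: double integrator instead of relative orbital motion.
        self.vel = self.vel + u * self.dt
        self.pos = self.pos + self.vel * self.dt
        # Reward: negative formation error minus a small fuel-use penalty.
        err = abs(np.linalg.norm(self.pos) - self.target)
        reward = -err - 0.1 * burn
        done = self.fuel <= 0.0
        return self._obs(), reward, done


env = FormationEnv()
obs = env.reset()
rng = np.random.default_rng(0)
total_reward = 0.0
for _ in range(50):
    action = rng.uniform(-1.0, 1.0, size=2)  # random policy as agent placeholder
    obs, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

In this formulation, a DRL training loop (on-policy or off-policy) would replace the random action draw with a learned policy and use the reward signal to update it; the reward shaping here (formation error plus fuel penalty) is one illustrative choice, not the project's.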