Phase II Amount
$1,149,868
Cloud computing has the potential to serve as a cost-effective and energy-efficient computing paradigm for scientists to accelerate discoveries. Extensive use of commercial cloud computing resources in the scientific community could lower costs, accelerate research, and enhance collaboration. However, cloud computing utilization is often suboptimal. Users typically overprovision to accommodate potential surges in server use, as well as to ensure that stateful applications, which cannot tolerate any downtime, are not interrupted. To reduce wasteful spending and enable more efficient usage of cloud resources, a technology is being developed that consolidates idle workloads and over-sized software containers to take advantage of deeply discounted server space such as the Spot Market. The technology combines two modules. The first module spawns containers on discounted VM instances (Spot Instances) and dynamically relocates containers between such instances based on availability and price. The second module packs idle containers onto a small number of VMs during idle periods and relocates containers onto different VMs when workload increases, without any service interruption. This lack of service disruption is a fundamental departure from current market solutions, whose cloud optimization requires manually re-architecting cloud infrastructure and incurs significant downtime during testing and redeployment. In Phase I, live migration of government High-Performance Computing (HPC) workloads within a single public cloud was demonstrated. The measured savings were up to 80% compared to on-demand costs, with the same performance (i.e., up to 5x the amount of compute for the same cost). The ability to perform similar migrations with similar value in other public clouds is required to address substantial commercial opportunities and DOE user needs.
Also, demonstrating a successful hybrid cloud live migration of workloads between an on-premises private cloud and the public cloud could lead to significant cost savings without changing a line of application code, presenting a potential approach for migrating to the cloud in an inexpensive and low-risk manner. Finally, extending the platform to support GPUs could allow a potentially large number of highly compute-intensive DOE applications to run in the spot market of multiple public GovClouds at significant cost savings. During the Phase II award, multi-public cloud support will be developed, hybrid cloud support established, and capabilities extended to GPU processing.
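As an illustration only, the relationship between the reported savings and the compute obtainable for a fixed budget, together with the idea of packing idle containers onto a small number of VMs, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and is not the project's implementation: the function names, the VM class, the use of integer millicore demands, and the choice of a first-fit decreasing heuristic are all hypothetical and are not described in the abstract.

```python
from dataclasses import dataclass, field


def compute_multiplier(savings_fraction: float) -> float:
    """Compute obtainable for a fixed budget, relative to on-demand pricing.

    A savings fraction s means each unit of compute costs (1 - s) of the
    on-demand price, so the same budget buys 1 / (1 - s) times as much.
    """
    if not 0 <= savings_fraction < 1:
        raise ValueError("savings_fraction must be in [0, 1)")
    return 1.0 / (1.0 - savings_fraction)


@dataclass
class VM:
    """Hypothetical VM with a fixed CPU capacity in millicores."""
    capacity: int
    containers: list = field(default_factory=list)
    used: int = 0


def pack_idle_containers(demands: dict, vm_capacity: int) -> list:
    """Assumed heuristic (first-fit decreasing), not the project's algorithm.

    Containers are sorted by demand, largest first, and each is placed on
    the first VM with enough spare capacity; a new VM is opened otherwise.
    """
    vms = []
    ordered = sorted(demands.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in ordered:
        if demand > vm_capacity:
            raise ValueError(f"container {name!r} exceeds VM capacity")
        for vm in vms:
            if vm.used + demand <= vm.capacity:
                break  # first VM with room
        else:
            vm = VM(capacity=vm_capacity)  # no VM had room: open a new one
            vms.append(vm)
        vm.containers.append(name)
        vm.used += demand
    return vms


if __name__ == "__main__":
    # 80% savings corresponds to about 5x the compute for the same cost.
    print(round(compute_multiplier(0.80), 2))

    # Five idle containers (millicores) consolidated onto 1000-millicore VMs.
    idle = {"a": 500, "b": 300, "c": 200, "d": 600, "e": 400}
    for vm in pack_idle_containers(idle, vm_capacity=1000):
        print(vm.containers, vm.used)
```

In this example, five idle containers that might otherwise occupy five separate VMs fit onto two. A production system would also need to account for memory, network, and the cost of live migration itself, none of which this sketch models.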