DOE 2021 Log-driven Infrastructure Analytics and Management (LIAM)

Log-driven Infrastructure Analytics and Management (LIAM)
Award last edited on: 1/5/2023

Awarding Agency

DOE

Total Award Amount

$1,900,000

Award Phase

Solicitation Topic Code

C51-05b

Principal Investigator

Partha Bhaumik

Ennetix Inc (AKA: Putah Green Solutions)

1477 Drew Avenue Suite 106
Davis, CA 95618

(530) 574-7084

info@ennetix.com

www.ennetix.com

Location: Multiple
Congr. District: 03
County: Yolo

Phase I

Contract Number: DE-SC0021575
Start Date: 2/22/2021 Completed: 10/21/2021

Phase I year

2021

Phase I Amount

$250,000

The ubiquity of cloud-delivered applications and services and the always-on nature of personal and business communications have driven data traffic to grow at unprecedented rates and created virtualized, dynamic, and distributed application-delivery infrastructures. Assuring availability, security, and performance in such an environment poses a real challenge to IT departments. Therefore, traditional IT Ops has given way to DevOps to speed up IT's service response to rapidly changing demands from its stakeholders. The rate of configuration changes, which include software updates, in a DevOps environment is by design an order of magnitude greater than in a traditional IT Ops environment. Now, IT organizations are trying to leverage machine learning and advanced analytics to further automate and improve the responsiveness of infrastructure services. The impetus for configuration changes originates not just from their stakeholders, but also from the increasing use of software-defined elements in the infrastructure. This new trend is referred to as Algorithmic IT Ops (AIOps). An environment that uses automated provisioning and software-defined orchestration cannot ignore the impact of frequent configuration changes/updates (manifested in system/server logs) on application infrastructure performance. Additionally, in a distributed infrastructure, the impact of third-party-managed services on application and network performance is extremely significant. Thus, it becomes imperative to understand what relevant events are occurring outside the enterprise's traditional infrastructure boundaries and how those events impact its ability to meet its performance objectives. Information provided by non-traditional, textual data sources, e.g., API logs, outage updates, emails, and incident reports, that manifest outages and issues on third-party-managed infrastructures, becomes critical in infrastructure performance analytics.
Today's performance-management tools primarily use numerical network-traffic-related data and limited textual data such as syslogs in silos. Mining pertinent information from textual log/event data and correlating it with numerical performance data on the same analytics platform will lead to much faster troubleshooting of application/service infrastructure performance issues. Considering these realities, in this Phase I SBIR project, Ennetix will develop a novel, log-driven infrastructure analytics and management service, called LIAM, to enhance availability, security, and performance of distributed infrastructures, and greatly accelerate root-cause analysis of infrastructure problems. LIAM will mine non-traditional textual data, such as system/server logs, configuration change logs, outage reports, and event reports from other IT service management products, and correlate them with numerical network trace and server/host performance data. LIAM will feature advanced machine-learning techniques based on topic mining, novelty detection, and clustering, and it will be built on a scalable architecture to accommodate other user-defined categorical data sources. LIAM will bring useful additional context to analyzing performance anomalies to reduce application/service interruptions and accelerate root-cause identification and service restoration. The proposed solution will greatly benefit IT administrators and managers at DOE and other government organizations through a new approach for infrastructure performance management in today's cloud-based, dynamic, and distributed IT infrastructures. The wider benefits of this effort will extend well beyond the immediate DOE scientific community, and on to other enterprises, network operators, and cloud-service providers.
In particular, many digital enterprises and commercial cloud-service providers can leverage the proposed service to proactively troubleshoot performance issues for their distributed application/service delivery infrastructures.
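
The idea of correlating textual log events with numerical performance data can be illustrated with a minimal sketch. It is not Ennetix's implementation: the function names, the z-score threshold, the 10-minute look-back window, and all sample data are assumptions made for illustration. The sketch flags latency samples that deviate strongly from the mean, then lists the log events that occurred shortly before each flagged sample.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def find_anomalies(samples, threshold=2.0):
    """Return timestamps whose value lies more than `threshold`
    sample standard deviations away from the mean (simple z-score)."""
    values = [value for _, value in samples]
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [ts for ts, value in samples if abs(value - mu) / sigma > threshold]


def correlate(anomalies, events, window=timedelta(minutes=10)):
    """Map each anomaly timestamp to the log messages recorded
    within `window` before it (candidate context for root cause)."""
    return {
        ts: [msg for ev_ts, msg in events
             if timedelta(0) <= ts - ev_ts <= window]
        for ts in anomalies
    }


# Hypothetical data: one latency sample per minute, with a spike at minute 7.
t0 = datetime(2021, 3, 1, 12, 0)
latency_ms = [(t0 + timedelta(minutes=i), v)
              for i, v in enumerate([20, 21, 19, 22, 20, 21, 20, 95, 22, 20])]

# Hypothetical textual events from different sources.
events = [
    (t0 - timedelta(minutes=30), "software update applied on host web-01"),
    (t0 + timedelta(minutes=5), "config change: updated load-balancer pool"),
    (t0 + timedelta(minutes=8), "outage report: third-party DNS degraded"),
]

anomalies = find_anomalies(latency_ms)
context = correlate(anomalies, events)
for ts, messages in context.items():
    print(ts.isoformat(), messages)
```

With this sample data, only the configuration change five minutes into the trace falls inside the look-back window of the latency spike. A production system would need a more robust anomaly detector and a learned, rather than fixed, notion of relevance.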

Phase II

Contract Number: DE-SC0021575
Start Date: 4/4/2022 Completed: 4/3/2024

Phase II year

2022

Phase II Amount

$1,650,000

Digital transformation of enterprises and emergence of cloud-delivered applications and services have created virtualized, dynamic, and distributed IT infrastructures. Assuring availability, security, and performance in such an environment poses a real challenge to IT departments. Traditional IT Ops has given way to DevOps to speed up IT's service response to rapidly changing demands from its stakeholders. The rate of configuration changes in a DevOps environment is an order of magnitude greater than in a traditional IT Ops environment. Now, IT organizations are trying to leverage machine learning and advanced analytics to further automate and improve the responsiveness of infrastructure services. This new trend is referred to as Artificial Intelligence for IT Ops (AIOps). A DevOps environment that uses automated provisioning and software-defined orchestration cannot ignore the impact of frequent configuration changes/updates (manifested in system/server logs) on application infrastructure performance. Information provided by non-traditional, textual data sources, e.g., syslogs, API logs, and outage reports, that manifest issues on infrastructures, becomes critical in infrastructure performance analytics. Today's performance-management tools primarily use numerical network-traffic-related data and limited textual data such as syslogs in silos. Mining pertinent information from textual log/event data and correlating it with numerical performance data on the same analytics platform will lead to faster troubleshooting of application/service infrastructure performance issues. Considering these realities, in this Phase II SBIR project, Ennetix will develop a novel, log-driven infrastructure analytics and management service, called LIAM, to enhance availability, security, and performance of modern IT infrastructures, and greatly accelerate root-cause analysis of issues.
LIAM will mine non-traditional textual data, such as system/server logs, configuration change logs, outage reports, and event reports from other IT management platforms, and correlate them with numerical network trace and server/host performance data. LIAM will feature advanced machine-learning techniques based on topic mining, novelty detection, and clustering, and it will be built on a scalable architecture to accommodate other user-defined categorical data sources. LIAM will bring useful additional context to analyzing performance anomalies to reduce application/service interruptions and accelerate root-cause identification and service restoration. During Phase I of this SBIR project, requirements analysis and design of the LIAM platform were conducted, a working prototype was developed, and evaluation studies were performed to determine LIAM's effectiveness in supporting IT operations through faster root-cause analytics and troubleshooting of modern IT infrastructures. These feasibility and performance evaluation studies were accomplished using live data gathered from a large campus IT infrastructure (namely, UC Davis). Outcomes of the Phase I R&D efforts and evaluation studies have confirmed the viability of LIAM as a commercial-grade solution. In this Phase II project (as a continuation of Phase I), the goal is to significantly expand LIAM with analytical features, AI/ML models, third-party integrations, automation methods, and innovative visualizations. A commercial-grade LIAM solution will be developed that IT operations teams can use to proactively manage the performance of distributed infrastructures. Early trials will be conducted to demonstrate the functionality and performance of LIAM on live networks and pave the way to successful market entry and deployment at premier R&E organizations such as UC Davis.
The proposed solution will greatly benefit IT administrators and managers at DOE and other organizations through a new approach for IT management which considers various data sources (both textual and numerical) along with traffic data and significantly reduces operational expenditures. The wider benefits of this effort will extend well beyond the immediate DOE scientific community, and on to other enterprises, network operators, and cloud-service providers, who will be able to leverage the proposed LIAM solution to proactively manage their cloud-based, distributed, and dynamic application-delivery infrastructures.
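
The abstract names clustering and novelty detection over textual logs without describing the models used. As a rough illustration only, the sketch below substitutes a much simpler stand-in technique: it masks variable tokens (numbers, IPv4 addresses, hex values) to reduce each log line to a template, groups lines by template, and reports templates that never appeared in a baseline period. The regular expression, function names, and log lines are all hypothetical.

```python
import re
from collections import Counter

# Tokens treated as variable parts of a log line: IPv4 addresses,
# hex literals, and standalone integers (an illustrative choice).
_VARIABLE = re.compile(r"\b(?:\d{1,3}(?:\.\d{1,3}){3}|0x[0-9a-fA-F]+|\d+)\b")


def template(line):
    """Replace variable tokens with a placeholder to get a log template."""
    return _VARIABLE.sub("<*>", line)


def cluster(lines):
    """Group log lines by template and count the members of each group."""
    return Counter(template(line) for line in lines)


def novel_templates(baseline, current):
    """Return templates present in `current` but never seen in `baseline`."""
    known = set(cluster(baseline))
    return [t for t in cluster(current) if t not in known]


# Hypothetical log lines from a baseline period and the current period.
baseline = [
    "Accepted connection from 10.0.0.5 port 5222",
    "Accepted connection from 10.0.0.9 port 6001",
    "worker 17 restarted",
]
current = [
    "Accepted connection from 10.0.1.2 port 7000",
    "worker 3 restarted",
    "config reload failed: syntax error at line 42",
]

print(cluster(baseline))
print(novel_templates(baseline, current))
```

Here the only novel template is the failed configuration reload, which is the kind of event an operator would want surfaced next to a performance anomaly. Real log formats vary widely, so the masking rules would need tuning per source.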
