Current data analysis methodologies in high-energy physics often fall short when managing large-scale processing tasks over distributed datasets used by distributed members of a collaboration or working group. There is no common semantics for describing an analysis workflow and its attributes across the myriad of complex process types comprising a typical physics study. Without a formal syntax, clarity and composition of methodologies, reproducibility of results, and portability of execution are difficult to achieve over the lifetime of a typical high-energy physics experiment. This project will develop process-oriented programming methods and environments for the production and analysis of distributed datasets in high-energy particle physics. The result will be an "Analysis Process Management" system comprising a reduction engine for process execution, a toolkit for user composition of processes, and a robust set of client tools for analysis, monitoring, and debugging of running processes. The focus of Phase I was on modeling the workflow and replacing the execution module. A prototype system was developed, which provided scalability, recoverability, and process clarity to the ATLAS experiment. Phase II will focus on software-controlled process orchestration, specifically the management and analysis of such processes. A foundation of generic tooling will be created that operators could use to assess and manipulate distributed datasets.
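
To illustrate the notion of user-composed analysis processes driven by a reduction engine, the following is a minimal sketch in Python under stated assumptions: the Process and Workflow names, the sequential run loop, and the completed-step bookkeeping are hypothetical illustrations of the composition toolkit described above, not the project's actual interface.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List


    @dataclass
    class Process:
        """A named analysis step with explicit inputs and outputs (hypothetical)."""
        name: str
        inputs: List[str]
        action: Callable[[Dict[str, object]], Dict[str, object]]


    @dataclass
    class Workflow:
        """An ordered composition of processes sharing one data context (hypothetical)."""
        processes: List[Process] = field(default_factory=list)

        def add(self, process: Process) -> "Workflow":
            self.processes.append(process)
            return self

        def run(self, context: Dict[str, object]) -> Dict[str, object]:
            """Execute each process in turn, recording completed steps so that a
            failed run could in principle be resumed (recoverability)."""
            completed: List[str] = []
            for proc in self.processes:
                missing = [key for key in proc.inputs if key not in context]
                if missing:
                    raise KeyError(f"{proc.name}: missing inputs {missing}")
                context.update(proc.action(context))
                completed.append(proc.name)
            context["_completed"] = completed
            return context


    if __name__ == "__main__":
        # Toy composition: select events above an energy cut, then count them.
        select = Process(
            name="select",
            inputs=["events"],
            action=lambda ctx: {"selected": [e for e in ctx["events"] if e > 50.0]},
        )
        count = Process(
            name="count",
            inputs=["selected"],
            action=lambda ctx: {"n_selected": len(ctx["selected"])},
        )
        result = Workflow().add(select).add(count).run({"events": [12.0, 75.5, 90.1]})
        print(result["n_selected"])  # -> 2

In a sketch like this, the explicit declaration of each step's inputs is what gives the workflow a common, inspectable description, which is the property the proposed system aims to provide for far larger, distributed analyses.
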
Commercial Applications and Other Benefits as described by the awardee: A system that improves clarity of expression in applications with large datasets should be of interest in both the physics and business communities. Specific opportunities for commercialization include supply chain management, epidemiology, and computational biology.