The goal of this SBIR project is to develop a new turnkey HPC system that automatically finds mathematical invariants in large datasets and then uses those invariants to identify new relationships between the datasets of different users. Specific goals are: 1) to compute the fundamental, symbolic invariants of any dataset submitted to the system; 2) to measure the proximity between datasets as the commonality between their invariant properties; 3) to map the world of datasets, finding clusters, new relationships, and hidden connections; and 4) to incentivize our existing users to perform more analysis in cluster and cloud environments. Data-driven analysis underlies a growing portion of scientific and engineering discovery today. While our capacity for collecting, processing, and storing data is growing exponentially, our ability to understand the data and convert it into meaningful knowledge is not keeping pace. Using a new HPC algorithm, we propose to identify mathematical invariants in large datasets and use these invariants as "genomes" of datasets to match them to other datasets. This capability will open the door to identifying previously unknown relationships between seemingly unrelated datasets collected by scientists who are not even aware of each other. Furthermore, the desire to map one's research and data to the global community (without giving that data away) will increase the incentive for HPC-based analysis in the science and engineering domains. Qualifications: The Principal Investigator's research on this technology has been featured in popular news outlets, including a listing in Discover Magazine's Top 25 Stories of 2009 and coverage by NPR's RadioLab program "Limits of Science." Additionally, the Principal Investigator has developed prototype software that has attracted over 20,000 people working from home, school, and small-business environments. The software requires no training or prior experience with HPC tools.
Nutionian Inc., established by the PI, is focused on bringing easy-to-use HPC data analytics to non-expert users, encouraging HPC analysis, and identifying new applications. Motivation: Despite advances in generating, collecting, and storing experimental data, the technology to convert data into meaningful analytical relations has not kept pace. A survey of our current users shows broad interest in performing this analysis on an HPC system to accelerate their research. These users span a range of areas, including biology, engineering, manufacturing, and economics, with diverse applications ranging from analyzing heat transfer dynamics to inferential sensing and time-series prediction. The proposed data genome project will prompt users to perform further analysis in order to connect to the genome and gain deeper insight more easily, without ever needing to disclose private data.