The broader impact/commercial potential of this Small Business Innovation Research (SBIR) project is to address challenges in understanding, analyzing, and visualizing the large sets of unsorted, noisy data associated with massive next-generation sequencing (NGS). NGS projects frequently focus on pathogen transmission patterns, drug resistance, and general epidemiology and employ a process called "clustering"; however, current clustering tools are rudimentary, unintuitive, and poorly documented, and they provide little help with data management and visualization. The goal is to develop software for Clustering and Associating Sequences in a Personalized Environment (CASPER). This software will bring much-needed state-of-the-art software engineering and visualization technology to NGS sequence analysis, helping researchers find correlations across disparate data types that are currently overlooked. Further, this software addresses commercial demand for integrated bioinformatics tools that speed discovery by using contemporary, innovative technologies to enhance the end-user experience. This will increase the ability of researchers to combat major health challenges, perform biological research, and develop effective interventions to prevent and treat illness.

This SBIR Phase I project proposes to develop a bioinformatics application designed for biological researchers to explore the evolutionary relationships in very large sequence data sets. These data are commonly associated with multiple annotations, and there are time-consuming hurdles in acquiring a meaningful visual representation of their relationships, especially in combination with geospatial, demographic, and/or temporal data. Further, while many bioinformatics applications/approaches focus on achieving a single analytical task, the proposed software focuses extensively on the end user, so that efficient and accurate data processing is combined with rich and meaningful graphical outputs.
In addition, the software will provide a graphical database management system (GDBS) built around the researcher's data as it is imported, resulting in fewer errors. A database linked to analytical results allows for rapid result filtering as well as instantaneous updates as data sets expand over time. Integrated visualization tools allow researchers to produce varied network graphics that can show how results change over time. In Phase I, the goal is to develop a framework that optimizes the end-user experience (e.g., speed, intuitive design, useful formatting of results). The project brings together a powerful and unique group of scientists in the fields of software design, computer modeling, data visualization, bioinformatics, genetic analysis, and epidemiology.