There is a significant academic and commercial need for new tools that provide high-dimensional data visualizations coupled to analytical data mining techniques. We believe that visualization is the interface to analysis and provides guidance in the discovery process. As a major aim, we will investigate and evaluate new visualization tools, some of which are proprietary, capable of displaying an arbitrary number of dimensions of data simultaneously. To do this, we will use the large public NCI DIS dataset of compounds that have been tested against a battery of 60 cancer cell lines. In addition to tool evaluation using this dataset, a lesser aim will be knowledge discovery in the dataset. We propose calculation of the Molconn-Z chemical descriptors and the combined data mining of these descriptors and the associated cell line data. This activity is aimed at the discovery of new compound cancer activity patterns that may be useful in a clinical setting. In a follow-on Phase II research study, we will integrate the selected visualization and analytic tools into a robust integrated data mining package for commercial use. Proposed commercial applications: The Specific Aims of this Phase I proposal will allow us to evaluate the commercial potential of high-dimensional visualization and analysis tools using the publicly available NCI DIS dataset, as well as to mine this dataset for potential new discoveries.
Thesaurus Terms: antineoplastic, cancer information system, computer data analysis, computer system design /evaluation, data management, imaging /visualization /scanning, information retrieval, cell line, neoplasm /cancer pharmacology