There is a significant commercial and academic need for new tools that provide quantitative cluster comparison metrics. It is important for pharmaceutical and biotechnology companies to be able to critically evaluate the utility of using different clustering techniques on large high dimensional datasets, in order to make the most informed decisions based upon the clustering results. We propose to evaluate and build bluster comparison metrics, integrating them with high dimensional visualization techniques, so that not only an overall scope, but the cluster distributions can be compared in an intuitive visual fashion. In carrying out our analysis, we will focus on the NCI (approximately 1,400) compound, subset, 118 known mechanism of action compound gene expression dataset analyzed by Scherf, et.al (2000). IN A FOLLOW ON Phase II SBIR Proposal, we will create a robust software package for commercial release where cluster comparison metrics are integrated with the most valuable visualization tools we identify in the Phase I research. PROPOSED COMMERCIAL APPLICATIONS: The Specific Aims of this Phase I proposal will allow us to create new tools where cluster comparison metrics are integrated with high dimensional visualization techniques, so that not only an overall score, but the cluster distributions can be compared in an intuitive visual fashion. We will use the publicly available NCI DIS compound subset, gene expression dataset of Scherf, e.g. al. (2000) to carry out these aims, as ell as data mine this dataset for new discoveries.
Thesaurus Terms: artificial intelligence, cancer information system, computer data analysis, computer program /software, computer system design /evaluation, data collection methodology /evaluation, mathematics informatics, information retrieval