This SBIR project seeks to build on the success of Ultimode Systems' data-mining tool ACPro, an automatic clustering/segmentation software package. ACPro generates well-defined, interpretable, probabilistic models of data for segmentation purposes. This project would (1) add high-dimensional and large-sample capability to ACPro, (2) allow incremental updating of ACPro models, and (3) give ACPro a reconfigurable capability so that it can be tuned to its computational platform. We claim these three capabilities are the minimum set of requirements to ready a probabilistic modeling tool such as ACPro for deployment in NASA data-understanding applications. The combined capabilities would allow ACPro to be applied in constrained environments, such as on board satellites, where power and data transmission are at a premium. The reconfigurable front-end would allow custom tuning of ACPro to parallel processes used in commercial data mining, networks of workstations for internet mining, and the space-hardened processor arrays planned for NASA deep-space missions. Finally, the incremental capability would allow ACPro to operate in a data ``monitoring'' mode. This project would develop these capabilities to beta-test stage and concurrently develop demonstrations of the tool on high-profile commercial and NASA databases, accompanied by complementary interpretative tools to aid the knowledge discovery process.
Potential Commercial Applications: High-dimensional, large-sample and incremental capabilities have the following commercial applications: (1) use as a consulting tool for our high-end clients, (2) licensing to database and data-mining software houses, and (3) expansion into new markets where large samples and high dimensionality are the norm. Ultimode's growing client list includes mining companies with datasets from satellites and aircraft-borne instruments, and telecommunications companies with large customer databases. While these existing clients could be better served with these new tools, the ability to handle larger datasets would greatly facilitate our expansion into other markets such as internet-related modeling, marketing, manufacturing, banking and finance. Moreover, through our growing partnership with a leading data-mining tools provider, and through future partnerships with database vendors, the reconfiguration capability would allow distribution of high-performance code across a range of commercial platforms. Many database companies are still early in their commercial expansion into the data-mining area. With the success of our research, we would have a unique combination of interpretative clustering, large-sample and high-dimensional clustering, and parallel processing, and would thus be competitively placed.