Voci Technologies Incorporated (Voci) is the leading small business developing accelerated Human Language Technology based solutions. Voci is partnering with Richard M. Stern, a Voci advisor and Professor at Carnegie Mellon University (CMU), to develop an Automated Speaker Clustering System (ASCS). The proposed ASCS will be developed by integrating Voci's best-in-class, patent-pending HyperVox technology with the latest speaker identification (SID) capabilities from CMU. The proposed system is uniquely architected to provide tuning parameters that enable tradeoffs between false positive and false negative rates, and to simulate the impact of improvements to individual components of the ASCS, which is essential to developing reliable performance specifications. The proposed ASCS uses a parallel set of proprietary techniques to optimize the extraction of voice features in both batch and streaming modes. The resulting voice features are fused with a reliable word list to provide a clustering decision together with a confidence estimate on the match between the audio sample and the nearest speaker cluster. At the end of Phase I the team will demonstrate the automated clustering of audio files. The team believes its final ASCS implementation will be able to automatically cluster tens to hundreds of thousands of audio files per hour with useful true/false positive rates.
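The tradeoff between false positive and false negative rates described above can be illustrated with a minimal sketch. It is not Voci's HyperVox method or CMU's SID approach, neither of which is described here. It assumes, purely for illustration, that each audio file is reduced to a feature vector, that each speaker cluster is represented by a centroid, that the match confidence is the cosine similarity to the nearest centroid, and that a single hypothetical `threshold` parameter serves as the tuning knob. All function and parameter names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_cluster(features, centroids, threshold):
    """Return (cluster_id, confidence) for the best-matching centroid.

    cluster_id is None when the best score falls below the threshold,
    meaning the sample is not assigned to any existing cluster.
    Assumption: confidence is simply the cosine similarity.
    """
    best_id, best_score = None, -1.0
    for cluster_id, centroid in centroids.items():
        score = cosine(features, centroid)
        if score > best_score:
            best_id, best_score = cluster_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


def error_rates(scored_pairs, threshold):
    """Compute (false_positive_rate, false_negative_rate) at a threshold.

    scored_pairs is a list of (score, same_speaker) tuples from labeled
    trial data, where same_speaker is True for genuine matches.
    """
    fp = sum(1 for s, same in scored_pairs if s >= threshold and not same)
    fn = sum(1 for s, same in scored_pairs if s < threshold and same)
    negatives = sum(1 for _, same in scored_pairs if not same)
    positives = sum(1 for _, same in scored_pairs if same)
    fp_rate = fp / negatives if negatives else 0.0
    fn_rate = fn / positives if positives else 0.0
    return fp_rate, fn_rate
```

Under these assumptions, raising `threshold` accepts fewer matches, which lowers the false positive rate and raises the false negative rate; lowering it has the opposite effect. Sweeping the threshold over labeled trial scores with `error_rates` is one simple way to pick an operating point.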
Benefit: The intent of this effort is to produce a dual-use capability that meets the needs of the US Navy, DoD, and commercial applications. A critical consideration in any commercial product is that it is open and easy to integrate with other systems. Voci envisions this powerful new Automated Speaker Clustering System (ASCS) technology being embedded in existing Voci products, enhancing these systems' ability (i) to provide an additional security layer without incurring the cost of integrating into an existing interactive voice response (IVR) system, (ii) to more effectively identify individuals of interest for the purpose of preventing fraud and other crimes, and (iii) to render customer relationship management (CRM) systems more effective in dealing with individual customers. These application spaces can be considered part of the enterprise analytics space, which was estimated to be over $10B in revenue in 2010.
Keywords: (1) Automated Audio Clustering, (2) Human Language Technology, (3) Speaker Identification, (4) Gender Identification, (5) Language Identification, (6) Word Spotting, (7) Confidence Score, (8) Large Vocabulary Continuous Speech Recognition (LVCSR)