The objective of this proposal is to demonstrate the feasibility of developing variable-speed speech synthesis technology. We plan to use open-source text-to-speech (TTS) systems because they often provide flexibility and interoperability, which are essential for research-oriented work. To modify speaking speed, we plan to focus on time-domain time-scale modification algorithms, which provide good quality at lower computational complexity than other approaches such as sinusoidal models or vocoder-based methods. We will test time-domain methods including synchronized overlap-add (SOLA), pitch-synchronous overlap-add (PSOLA), and waveform similarity overlap-add (WSOLA). We will apply a linear scaling factor, which modifies the duration regardless of whether the speech segment is silence, a transient, or a sustained vowel. We will also apply different scaling factors to different types of speech segments. During the optional six months, we will focus on creating multiple voices by modifying the voice type, gender, dialect (accent), and perceived emotion of the speech. Based on source-filter models, we will investigate algorithms for modifying source and filter characteristics, from which many different voices can be generated.
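As an illustration of the linear scaling described above, the following is a minimal sketch of plain overlap-add (OLA) time-scale modification, the simplest relative of SOLA and WSOLA. It omits the waveform-similarity search that those methods use to align frames, so it is not the proposal's algorithm. The function name `ola_time_stretch`, the `rate` parameter, and the frame length of 256 samples are assumptions chosen for the example.

```python
import math


def ola_time_stretch(signal, rate, frame_len=256):
    """Uniformly time-scale `signal` (a list of floats) by plain overlap-add.

    rate > 1 shortens the signal (faster speech); rate < 1 lengthens it.
    Hypothetical helper for illustration; SOLA/WSOLA would additionally
    shift each frame to the best-matching position before adding it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")

    synthesis_hop = frame_len // 2           # output frames overlap by 50%
    analysis_hop = synthesis_hop * rate      # input frames advance `rate` times as fast

    # Periodic Hann window used to cross-fade overlapping frames.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]

    n_frames = max(1, int((len(signal) - frame_len) / analysis_hop) + 1)
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len                   # running sum of window weights

    for k in range(n_frames):
        src = int(round(k * analysis_hop))   # where the frame is read
        dst = k * synthesis_hop              # where the frame is written
        for n in range(frame_len):
            if src + n >= len(signal):
                break
            out[dst + n] += window[n] * signal[src + n]
            norm[dst + n] += window[n]

    # Divide by the accumulated window weight so the overall gain stays at 1.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Example: a 200 Hz tone sampled at 8 kHz, played twice as fast and half as fast.
tone = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
fast = ola_time_stretch(tone, rate=2.0)
slow = ola_time_stretch(tone, rate=0.5)
```

Because every frame is scaled by the same factor, silences, transients, and sustained vowels are all stretched or compressed equally. A non-uniform scheme would vary `rate` from frame to frame according to the segment type.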
Benefit: The time-scale modification technology will be of substantial commercial value. Transforming a speech or audio signal to an alternative time scale can be a useful digital audio effect. It can be used for fast browsing of speech material in digital libraries and distance learning, fast or slow playback on telephone answering machines and dictaphones, accelerated aural reading for the blind, and editing audio/visual recordings to fit allocated time slots in the radio and television industry. Beyond generating multiple voices, the ability to change the voice characteristics of TTS speech will enable new applications in various fields. It will be an innovative technology for businesses in virtual world environments, the children's toy industry, the web-based application software industry, the online gaming industry, the online service and entertainment industry, the movie industry, and the animation (cartoon) industry.
Keywords: Text-to-Speech, Speaking Speed, Voice Conversion, Variable Speed, Multiple Voices