Voice transformation alters one person's voice so that it sounds like that of another speaker. This can be done by mapping the voice quality and speaking style of the source speaker to those of the target speaker. In Phase I, we will investigate state-of-the-art technologies based on the source-filter model. For vocal tract modeling and mapping, we will test the linear prediction model and the harmonic noise model. For excitation modeling and mapping, we will consider the LF (Liljencrants-Fant) model and sinusoidal models. For speaking style mapping, we will examine the feasibility of various intonation and speaking rate mapping methods, including statistical models such as CART (classification and regression trees) and multiplicative or sum-of-products models. The transformation results will be evaluated by human listeners as well as by automatic speaker identification algorithms. We will also investigate methods for detecting when voice transformation software has been employed. The final report of Phase I will include the recommended mapping algorithms and preliminary speech samples transformed using those algorithms. It will also contain requirements and specifications for the voice transformation system to be implemented during Phase II. Finally, it will describe potential risk factors that may affect performance.
Keywords: Voice Transformation, Voice Conversion, Voice Mimicry, Prosody, Speaking Style, Vocal Tract, Excitation, Source-Filter Model
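As an illustration of the linear prediction model named above as one candidate for vocal tract modeling, the following is a minimal sketch of frame-level LPC analysis using the autocorrelation method and the Levinson-Durbin recursion. It is not the proposed system. The function names (autocorrelation, levinson_durbin), the predictor sign convention, and the synthetic test frame are assumptions made for this example only, and a practical analysis would also apply windowing and pre-emphasis.

```python
def autocorrelation(frame, max_lag):
    """Return the (unnormalized) autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Assumed convention: the predictor is
        x_hat[n] = sum_{j=1..order} a[j-1] * x[n-j]
    Returns (a, err), where err is the prediction residual energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: stop early
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


if __name__ == "__main__":
    # Hypothetical test frame: impulse response of a one-pole filter,
    # standing in for a single vocal tract resonance.
    frame = [0.9 ** n for n in range(200)]
    r = autocorrelation(frame, 2)
    coeffs, residual_energy = levinson_durbin(r, 2)
    print(coeffs, residual_energy)
```

In a source-filter setting, the coefficients returned here would describe the vocal tract filter for one frame, and the prediction residual would serve as the excitation estimate to be modeled separately.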