Patent References
  Sound-synchronized video system
  Method and apparatus for producing audio-visual synthetic speech
  Method and apparatus for synthetic speech in facial animation
  Method for generating photo-realistic animated characters
  Coarticulation method for audio-visual text-to-speech synthesis
  Method and apparatus for diphone aliasing
  Talking facial display method and apparatus
  6317716
  Speech driven lip synthesis using viseme based hidden markov models
  Face synthesis system and methodology

Application No. 09494582 filed on 01/31/2000

US Classes
  704/276, Pattern display
  704/258, Synthesis
  704/260, Image to speech
  704/270, Application

Examiners
  Primary: To, Doris H.
  Assistant: Opsasnick, Michael N.
International Classes
  G10L 11/00
  G11B 27/032

Claims

We claim:

1. A method of translingual synthesis of visual speech from a given audio signal in a first language, comprising the steps of:
  receiving input audio and text of the first language;
  generating a phonetic alignment based on best phone boundaries using the speech recognition system of the second language and its own set of phones and mapping to convert the phones from the second language to the phones in the first language so as to get an effective alignment in the phone set of the first language;
  performing a phone to viseme mapping to get a corresponding visemic alignment which generates a sequence of visemes which are to be animated to get a desired video; and
  animating the sequence of viseme images to get a desired video synthesized output aligned with the input audio signals of the first language.

2. The method of translingual synthesis of visual speech of claim 1, wherein the step of performing phone to viseme mapping is performed using a viseme database in the second language.

3. The method of translingual synthesis of visual speech of claim 1, wherein the step of performing phone to viseme mapping is performed using a viseme database in the first language.

4. A computer implemented method of implementing audio driven facial animation system in a first language, referred to as the novel language using a speech recognition system of a second language, referred to as the base language, the method comprising the steps of:
  determining whether a correspondence exists between an audio speech signal of the novel language and a phone of the base language, and, if there is no correspondence between audio data of the novel language and a phone of the base language, identify a closest phone of the base language which best matches that of the novel language;
  writing a word of the novel language into a base language database and adding it to a new vocabulary of a speech recognition system of the base language; and
  using the new vocabulary to generate a time alignment of the audio speech signal with a corresponding word of the base language vocabulary.

5. The computer implemented method of implementing audio driven facial animation system of claim 4, wherein the phonetically closest phone is chosen.

6. The computer implemented method of implementing audio driven facial animation system of claim 4, wherein the visemically closest phone is chosen.

7. The computer implemented method of implementing audio driven facial animation system of claim 4, the corresponding word of the base language vocabulary is a phonetic word.

8. The computer implemented method of implementing audio driven facial animation system of claim 4, the corresponding word of the base language vocabulary is a visemic word.

9. The computer implemented method of implementing audio driven facial animation system of claim 8, further comprising the step of using the time alignment system of the audio speech signal with a corresponding visemic word of the base language vocabulary to drive images in video animation for generating an animated video in the facial animation system in the first language.
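The data flow recited in claims 1 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the feature sets, the phone and viseme labels, the `Segment` type, and the hand-written alignment that stands in for the output of a base-language speech recognizer. The "closest phone" rule used is a simple feature-mismatch count, which is only one possible reading of "phonetically closest".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str     # phone or viseme label
    start_ms: int  # segment start, milliseconds
    end_ms: int    # segment end, milliseconds


# Hypothetical articulatory features per phone (illustrative only).
BASE_FEATURES = {
    "p": {"bilabial", "stop"},
    "t": {"alveolar", "stop"},
    "d": {"alveolar", "stop", "voiced"},
    "aa": {"vowel", "open"},
}
NOVEL_FEATURES = {
    "p": {"bilabial", "stop"},
    "dd": {"retroflex", "stop", "voiced"},  # no exact base-language match
    "aa": {"vowel", "open"},
}

# Hypothetical phone-to-viseme table for the novel language.
PHONE_TO_VISEME = {"p": "lips_closed", "dd": "tongue_up", "aa": "jaw_open"}


def closest_base_phone(novel_phone):
    """Return the base phone with the smallest feature mismatch."""
    target = NOVEL_FEATURES[novel_phone]
    # The symmetric difference holds features found in only one of the sets.
    return min(sorted(BASE_FEATURES), key=lambda b: len(BASE_FEATURES[b] ^ target))


def to_base_spelling(novel_phones):
    """Write a novel-language word using base-language phones."""
    return [closest_base_phone(p) for p in novel_phones]


def relabel(alignment, novel_phones):
    """Swap base phone labels back to novel phones, keeping the timings."""
    if len(alignment) != len(novel_phones):
        raise ValueError("alignment and phone sequence differ in length")
    return [Segment(p, s.start_ms, s.end_ms) for p, s in zip(novel_phones, alignment)]


def to_visemes(phone_alignment):
    """Map phones to visemes and merge adjacent segments with equal visemes."""
    merged = []
    for seg in phone_alignment:
        viseme = PHONE_TO_VISEME[seg.label]
        if merged and merged[-1].label == viseme:
            merged[-1] = Segment(viseme, merged[-1].start_ms, seg.end_ms)
        else:
            merged.append(Segment(viseme, seg.start_ms, seg.end_ms))
    return merged


def frames(viseme_alignment, fps=25):
    """Pick one viseme label per video frame from the timed alignment."""
    total_ms = viseme_alignment[-1].end_ms
    out = []
    for i in range(total_ms * fps // 1000):
        t = i * 1000 // fps
        for seg in viseme_alignment:
            if seg.start_ms <= t < seg.end_ms:
                out.append(seg.label)
                break
    return out


# A novel-language word as a phone sequence (hypothetical).
word = ["p", "aa", "dd", "aa"]
base_word = to_base_spelling(word)

# Stand-in for the base-language recognizer's time alignment (hand-written).
base_alignment = [
    Segment("p", 0, 100),
    Segment("aa", 100, 300),
    Segment("d", 300, 400),
    Segment("aa", 400, 600),
]

viseme_track = to_visemes(relabel(base_alignment, word))
frame_labels = frames(viseme_track, fps=10)
```

In this sketch the word is first respelled in base-language phones so that a base-language recognizer could align it, the timings are then carried back to the novel-language phones, and the viseme track drives one image label per frame. Swapping `PHONE_TO_VISEME` for a table keyed on base-language phones would correspond to the variant of claim 2, and replacing the feature sets with visual features would correspond to the "visemically closest" choice of claim 6.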