Translingual visual speech synthesis

Patent 6813607 Issued on November 2, 2004. Estimated Expiration Date: January 31, 2020. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.

Patent References

Sound-synchronized video system
Patent #: 5608839
Issued on: 03/04/1997
Inventor: Chen

Method and apparatus for producing audio-visual synthetic speech
Patent #: 5657426
Issued on: 08/12/1997
Inventor: Waters, et al.

Method and apparatus for synthetic speech in facial animation
Patent #: 5878396
Issued on: 03/02/1999
Inventor: Henton

Method for generating photo-realistic animated characters
Patent #: 5995119
Issued on: 11/30/1999
Inventor: Cosatto, et al.

Coarticulation method for audio-visual text-to-speech synthesis
Patent #: 6112177
Issued on: 08/29/2000
Inventor: Cosatto, et al.

Method and apparatus for diphone aliasing
Patent #: 6122616
Issued on: 09/19/2000
Inventor: Henton

Talking facial display method and apparatus
Patent #: 6250928
Issued on: 06/26/2001
Inventor: Poggio, et al.

Patent #: 6317716

Speech driven lip synthesis using viseme based hidden markov models
Patent #: 6366885
Issued on: 04/02/2002
Inventor: Basu, et al.

Face synthesis system and methodology
Patent #: 6449595
Issued on: 09/10/2002
Inventor: Arslan, et al.

Inventors

Application

No. 09494582 filed on 01/31/2000

US Classes:

704/276, Pattern display
704/258, Synthesis
704/260, Image to speech
704/270, Application

Examiners

Primary: To, Doris H.
Assistant: Opsasnick, Michael N.

Attorney, Agent or Firm

Foreign Patent References

  • 0674315 DE 09/01/1995
  • 05-298346 JP 11/01/1993
  • WO9946732 WO 09/01/1999

International Classes

G10L 11/00
G11B 27/032

Claims




We claim:

1. A method of translingual synthesis of visual speech from a given audio signal in a first language, comprising the steps of:

receiving input audio and text of the first language;

generating a phonetic alignment based on best phone boundaries using the speech recognition system of the second language and its own set of phones and mapping to convert the phones from the second language to the phones in the first language so as to get an effective alignment in the phone set of the first language;

performing a phone to viseme mapping to get a corresponding visemic alignment which generates a sequence of visemes which are to be animated to get a desired video; and

animating the sequence of viseme images to get a desired video synthesized output aligned with the input audio signals of the first language.

2. The method of translingual synthesis of visual speech of claim 1, wherein the step of performing phone to viseme mapping is performed using a viseme database in the second language.

3. The method of translingual synthesis of visual speech of claim 1, wherein the step of performing phone to viseme mapping is performed using a viseme database in the first language.

4. A computer implemented method of implementing audio driven facial animation system in a first language, referred to as the novel language using a speech recognition system of a second language, referred to as the base language, the method comprising the steps of:

determining whether a correspondence exists between an audio speech signal of the novel language and a phone of the base language, and, if there is no correspondence between audio data of the novel language and a phone of the base language, identify a closest phone of the base language which best matches that of the novel language;

writing a word of the novel language into a base language database and adding it to a new vocabulary of a speech recognition system of the base language; and

using the new vocabulary to generate a time alignment of the audio speech signal with a corresponding word of the base language vocabulary.

5. The computer implemented method of implementing audio driven facial animation system of claim 4, wherein the phonetically closest phone is chosen.

6. The computer implemented method of implementing audio driven facial animation system of claim 4, wherein the visemically closest phone is chosen.

7. The computer implemented method of implementing audio driven facial animation system of claim 4, the corresponding word of the base language vocabulary is a phonetic word.

8. The computer implemented method of implementing audio driven facial animation system of claim 4, the corresponding word of the base language vocabulary is a visemic word.

9. The computer implemented method of implementing audio driven facial animation system of claim 8, further comprising the step of using the time alignment system of the audio speech signal with a corresponding visemic word of the base language vocabulary to drive images in video animation for generating an animated video in the facial animation system in the first language.
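The data flow recited in claims 1 and 4 through 6 can be illustrated with a minimal sketch. It is not the patented implementation: the recognizer is not modeled, the phone alignment is supplied as made-up input, and every name here (Segment, BASE_TO_NOVEL, PHONE_TO_VISEME, BASE_FEATURES, closest_base_phone, relabel, merge_repeats, viseme_frames) is hypothetical, as are the phone symbols, feature sets, and the 25 frames-per-second rate. Closeness is approximated by counting differing articulatory features, which is only one possible reading of "phonetically closest".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str    # phone or viseme symbol
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical toy tables; a real system would cover full phone inventories.
BASE_TO_NOVEL = {"T": "tt", "AA": "aa", "M": "m"}  # base phone -> novel phone
PHONE_TO_VISEME = {"tt": "V_alveolar", "aa": "V_open", "m": "V_closed"}

# Hypothetical articulatory features used to pick the closest base phone.
BASE_FEATURES = {
    "T": {"stop", "alveolar", "voiceless"},
    "D": {"stop", "alveolar", "voiced"},
    "M": {"nasal", "bilabial", "voiced"},
}


def closest_base_phone(novel_features):
    """Return the base phone whose feature set differs least from the novel phone's."""
    return min(sorted(BASE_FEATURES),
               key=lambda p: len(BASE_FEATURES[p] ^ novel_features))


def relabel(segments, table):
    """Map each segment label through a table, keeping the time boundaries."""
    return [Segment(table[s.label], s.start, s.end) for s in segments]


def merge_repeats(segments):
    """Merge adjacent segments that carry the same label."""
    merged = []
    for seg in segments:
        if merged and merged[-1].label == seg.label:
            merged[-1] = Segment(seg.label, merged[-1].start, seg.end)
        else:
            merged.append(seg)
    return merged


def viseme_frames(visemes, fps=25):
    """Pick the viseme shown in each video frame, sampled at the frame start time."""
    if not visemes:
        return []
    frames, i = [], 0
    for n in range(round(visemes[-1].end * fps)):
        t = n / fps
        while i < len(visemes) - 1 and t >= visemes[i].end:
            i += 1
        frames.append(visemes[i].label)
    return frames


# Alignment as a base-language recognizer might emit it (made-up values).
base_alignment = [Segment("T", 0.00, 0.08), Segment("AA", 0.08, 0.20),
                  Segment("M", 0.20, 0.32)]
novel_alignment = relabel(base_alignment, BASE_TO_NOVEL)
visemic_alignment = merge_repeats(relabel(novel_alignment, PHONE_TO_VISEME))
frames = viseme_frames(visemic_alignment)
```

In this sketch the time boundaries found by the base-language aligner are carried through both relabeling steps unchanged, so the resulting viseme sequence stays synchronized with the original audio. A novel-language phone with no base counterpart, such as one with the hypothetical features {"stop", "retroflex", "voiceless"}, would be assigned the base phone "T" by closest_base_phone.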

Other References

  • R.E. Donovan, et al., “The IBM Trainable Speech Synthesis System”, International Conference on Speech and Language Processing, 1998.
  • E.D. Petajan, et al., “An Improved Automatic Lipreading System to Enhance Speech Recognition”, Proc. CHI, 1988, pp. 19-25.
  • T. Chen et al., “Audio-Visual Integration in Multimodal Communication”, Proceedings of the IEEE, vol. 86, No. 5, May 1998.
  • F. Lavagetto, et al., “Lipreadable Frame Animation Driven by Speech Parameters”, 1994 International Symposium on Speech, Image Processing and Neural Networks, Apr. 13-16, 1994, Hong Kong.