Speech driven lip synthesis using viseme based hidden markov models

Patent 6366885, issued on April 2, 2002. Estimated Expiration Date: August 27, 2019. The estimated expiration date is calculated from simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors that might affect the term of a patent.
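As a rough illustration of those simple term provisions: for U.S. applications filed on or after June 8, 1995, the base patent term ends 20 years after the filing date. The application behind this patent was filed on 08/27/1999, which gives the August 27, 2019 estimate. The sketch below assumes exactly that rule and nothing more. The function name `estimated_expiration` is hypothetical, it ignores every adjustment listed above, and its handling of a February 29 filing date is an assumption.

```python
from datetime import date


def estimated_expiration(filing_date: date, term_years: int = 20) -> date:
    """Naive estimate: filing date plus `term_years` years.

    Ignores term adjustments, terminal disclaimers, and maintenance fees.
    """
    try:
        return filing_date.replace(year=filing_date.year + term_years)
    except ValueError:
        # Filing date was Feb 29 and the target year is not a leap year;
        # this sketch assumes the term then ends on Feb 28.
        return filing_date.replace(year=filing_date.year + term_years, day=28)


print(estimated_expiration(date(1999, 8, 27)))  # 2019-08-27
```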

Patent References

Method and apparatus for producing audio-visual synthetic speech
Patent #: 5657426
Issued on: 08/12/1997
Inventor: Waters, et al.

Automated synchronization of video image sequences to new soundtracks
Patent #: 5880788
Issued on: 03/09/1999
Inventor: Bregler

Automated speech alignment for image synthesis
Patent #: 5884267
Issued on: 03/16/1999
Inventor: Goldenthal, et al.

Technique for providing a computer generated face having coordinated eye and head movement
Patent #: 6052132
Issued on: 04/18/2000
Inventor: Christian, et al.

Image synthesis
Patent #: 6208356
Issued on: 03/27/2001
Inventor: Breen, et al.

Inventors

Application

No. 384763 filed on 08/27/1999

US Classes:

704/270 Application
704/235 Speech to image
704/258 Synthesis

Examiners

Primary: Dorvil, Richemond
Assistant: Nolan, Daniel

Attorney, Agent or Firm

International Classes

G10L 021/06
G10L 015/14
G11B 027/00

Abstract

A method of speech-driven lip synthesis that applies viseme-based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence; the corresponding audio features are used to calculate the HMM state output probabilities or the outputs of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (for an HMM-based model) or with the nodes of the network (for a neural-network-based system), and the resulting viseme sequence is then used for animation.
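The two phases described in the abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patented implementation. The phoneme-to-viseme table, the viseme names, and the quantized acoustic symbols (`a0`, `a1`, `a2`) are hypothetical. Each viseme is modeled as a single HMM state with discrete output probabilities rather than, for example, multi-state HMMs with continuous densities, and the neural-network variant is not shown. Training estimates each viseme state's output probabilities by counting acoustic symbols over frames already aligned to visemes; synthesis uses Viterbi decoding to find the most likely viseme sequence for new acoustic input.

```python
import math
from collections import Counter, defaultdict

# Hypothetical many-to-one phoneme -> viseme grouping (illustrative only;
# not the mapping defined in the patent).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "uw": "rounded", "ow": "rounded",
}


def to_visemes(phonemes):
    """Collapse a phoneme transcript into the corresponding viseme transcript."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


def train_emissions(aligned, symbols, smoothing=1.0):
    """Estimate P(acoustic symbol | viseme state) from frame-level alignments.

    `aligned` holds one (viseme, acoustic_symbol) pair per frame. Additive
    (Laplace) smoothing keeps unseen symbols from getting zero probability.
    """
    counts = defaultdict(Counter)
    for viseme, sym in aligned:
        counts[viseme][sym] += 1
    emissions = {}
    for viseme, c in counts.items():
        total = sum(c.values()) + smoothing * len(symbols)
        emissions[viseme] = {s: (c[s] + smoothing) / total for s in symbols}
    return emissions


def viterbi(obs, states, start, trans, emit):
    """Return the most likely viseme state sequence for the symbols in `obs`."""
    # Log probabilities avoid numeric underflow on long utterances.
    scores = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    backptrs = []
    for o in obs[1:]:
        prev_scores, cur, ptr = scores[-1], {}, {}
        for s in states:
            best = max(states, key=lambda p: prev_scores[p] + math.log(trans[p][s]))
            cur[s] = prev_scores[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        scores.append(cur)
        backptrs.append(ptr)
    # Trace back from the best-scoring final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Training phase (toy data): phonemes are mapped to visemes, and each frame's
# hypothetical quantized acoustic symbol is paired with its viseme.
states = ["bilabial", "open", "rounded"]
symbols = ["a0", "a1", "a2"]
train_visemes = to_visemes(["m", "aa", "uw", "b", "ae", "ow"])
train_audio = ["a0", "a1", "a2", "a0", "a1", "a2"]
emit = train_emissions(zip(train_visemes, train_audio), symbols)

# Assumed start and transition probabilities (staying in a viseme is favored).
start = {s: 1 / len(states) for s in states}
trans = {p: {s: 0.5 if p == s else 0.25 for s in states} for p in states}

# Synthesis phase: align new acoustic input with the most likely visemes.
path = viterbi(["a0", "a0", "a1", "a2"], states, start, trans, emit)
print(path)  # ['bilabial', 'bilabial', 'open', 'rounded']
```

With this toy training data, each viseme state emits its matching symbol with probability 0.6, and decoding `a0 a0 a1 a2` yields bilabial, bilabial, open, rounded; that viseme sequence is what would drive the mouth shapes in an animation. Mapping phonemes to visemes first means that p, b, and m share one state, which is the reduction from phonemes to visually distinct classes that the abstract describes.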

Other References

  • Chen et al. ("Audio-Visual Integration in Multimodal Communication," IEEE Proceedings, vol. 86, no. 5, May 1998).
  • Goldschen et al. ("Rationale for Phoneme-Viseme Mapping and Feature Selection in Visual Speech Recognition," Aug. 28-Sep. 8, 1995).