Method and apparatus for producing audio-visual synthetic speech
Patent 5657426 Issued on August 12, 1997. Estimated Expiration Date: August 12, 2014. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.
A method and apparatus provide a video image of facial features synchronized with synthetic speech. Text input is transformed into a string of phonemes and timing data, which are transmitted to an image generation unit. At the same time, a string of synthetic speech samples is transmitted to an audio server. The audio server produces signals for an audio speaker, causing the audio signals to be continuously audibilized; additionally, the audio server initializes a timer. The image generation unit reads the timing data from the timer and, by consulting the phoneme and timing data, determines the position of the phoneme currently being audibilized. The image generation unit then calculates the facial configuration corresponding to that position in the string of phonemes and causes the facial configuration to be displayed on a video device.
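The lookup step in the abstract, finding which phoneme is sounding at the current timer value, can be illustrated with a minimal Python sketch. It is not the patented implementation. The phoneme durations, the mouth-shape labels, and all function names below are hypothetical, and a real system would read the elapsed time from the audio server's timer rather than take it as an argument.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical phoneme/timing data: (phoneme, duration in milliseconds).
PHONEMES = [("h", 60), ("eh", 90), ("l", 70), ("ow", 140)]

# Hypothetical mouth shapes keyed by phoneme. A real system would hold a
# full facial configuration for each phoneme instead of a label.
MOUTH_SHAPES = {
    "h": "open-slight",
    "eh": "open-mid",
    "l": "tongue-up",
    "ow": "rounded",
}


def end_times(phonemes):
    """Cumulative end time (ms) of each phoneme, measured from the start of speech."""
    return list(accumulate(duration for _, duration in phonemes))


def phoneme_index_at(elapsed_ms, ends):
    """Index of the phoneme sounding at elapsed_ms, or None once speech has ended.

    Phoneme i occupies the half-open interval [ends[i-1], ends[i]).
    """
    i = bisect_right(ends, elapsed_ms)
    return i if i < len(ends) else None


def facial_configuration_at(elapsed_ms, phonemes=PHONEMES):
    """Mouth shape to display at elapsed_ms, or None after the last phoneme."""
    i = phoneme_index_at(elapsed_ms, end_times(phonemes))
    if i is None:
        return None
    return MOUTH_SHAPES[phonemes[i][0]]
```

Because the display loop in this sketch derives the face from the timer on every frame, a slow frame only delays the picture and does not cause the image to drift out of step with the audio.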
Other References
Morishima S, Aizawa K, Harashima H; An Intelligent Facial Image Coding Driven by Speech and Phoneme; ICASSP '89 Feb. 1989
Waters K; A Muscle Model for Animating Three Dimensional Facial Expression; ACM Computer Graphics vol. 21 No. 4 Apr. 1987
K. Aizawa, H. Harashima, and T. Saito, "Model-Based Synthesis Image Coding (MBASIC) System for a Person's Face," In Signal Processing Image Communication, vol. 1, pp. 139-152, 1989
I. Carlbom, W. Hsu, G. Klinker, R. Szeliski, K. Waters, M. Doyle, J. Gettys, K. Harris, T. Levergood, R. Palmer, M. Picart, D. Terzopoulos, D. Tonnesen, M. Vannier, and G. Wallace, "Modeling and Analysis of Empirical Data in Collaborative Environments," Communications of the ACM (CACM), 35(6):74-84, Jun. 1992
H. Choi, S. Harashima, "Analysis and Synthesis of Facial Expression in Knowledge-Based Coding of Facial Image Sequences," In International Conference on Acoustics and Signal Processing, pp. 2737-2740, 1991
N. Duffy, "Animation Using Image Samples," Processing Images of Faces, Ablex, New Jersey, pp. 179-201, 1992
L. Hight, "Lip-Reader Trainer: A Computer Program for the Hearing Impaired," Proc. of the Johns Hopkins First National Search for Applications of Personal Computing to Aid the Handicapped, pp. 4-5, 1981
J. Lewis and F. Parke, "Automatic Lip-Synch and Speech Synthesis for Character Animation," In CHI+CG '87, pp. 143-147, Toronto, 1987
J. Moore and V. O'Connor, "Towards an Integrated Computer Package for Speech Therapy Training," Microtech Report, Bradford College of Art, 1986
M. Oka, K. Tsutsui, A. Ohba, Y. Kurauchi, and T. Tago, "Real-Time Manipulation of Texture-Mapped Surface," Computer Graphics, 21(4):181-188, 1987
F. Parke, "A Model of the Face that Allows Synchronized Speech," Journal of Computers and Graphics, 1(2):1-4, 1975
F. Parke, "Parameterized Models for Facial Animation," IEEE Computer Graphics and Applications, 2(9):61-68, 1982
"Expression control using synthetic speech." --Wyvill, et al, Department of Computer Science, University of Calgary, Calgary, Alberta, Canada, T2N 1N4, ACM Siggraph '89 Course Notes, State Of The Art In Facial Animation, 16th Annual Conf. On Computer Graphics And Interactive Techniques, Boston, Massachusetts 31 Jul.-4 Aug. 1989, pp. 163-175
"Animating speech: an automated approach using speech synthesised by rules" --Hill, et al, The Visual Computer (1988) ACM Siggraph '89 Course Notes, State Of The Art In Facial Animation, 16th Annual Conf. On Computer Graphics And Interactive Techniques, Boston, Massachusetts 31 Jul.-4 Aug. 1989, pp. 176-18