Methods for controlling the generation of speech from text representing one or more names
Patent 5832435 Issued on November 3, 1998. Estimated Expiration Date: January 29, 2017. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.
Improved automated synthesis of human-audible speech from text is disclosed. Comprehensibility of the synthesized text is enhanced through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. A preferred embodiment implements prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings.
Other References
Taylor et al, "An interactive synthetic speech generation system," IEE Colloquium on 'Systems and applications of man-machine interaction using speech I/O', p. 6/1-3, Mar. 1991
Bachenko et al, "Prosodic phrasing for speech synthesis of written telecommunications by the deaf," IEEE Global Telecommunications Conference. Globecom '91, pp. 1391-5 vol. 2, Dec. 1991
Chen et al, "A first study of neural net based generation of prosodic and spectral information for Mandarin text-to-speech," ICASSP-92, pp. 45-8 vol. 2, Mar. 1992
Bang et al, "A text-to-speech system for Spanish with a frequency domain based prosodic modification algorithm," ICASSP '93, pp. II-183--II-186, Apr. 1993
Chen et al, "Word recognition based on the combination of a sequential neural network and the GPDM discriminative training algorithm," Neural Networks for Signal Processing. Proceedings of the 1991 IEEE Workshop, pp. 376-84, Oct. 1991
Hwang et al, "Neural-network based F0 text-to-speech synthesizer for Mandarin," IEE Proceedings-Vision, Image, and Signal Processing, vol. 141, iss. 6, pp. 384-90, Dec. 1994
Julia Hirschberg and Janet Pierrehumbert, "The Intonational Structuring of Discourse", Association for Computational Linguistics: 1986 (ACL-86) pp. 1-9
S.J. Young, F. Fallside, "Synthesis by Rule of Prosodic Features in Word Concatenation Synthesis", Int. Journal Man-Machine Studies, (1980) v12, pp. 241-258
A.W.F. Huggins, "Speech Timing and Intelligibility", Attention and Performance VII, Hillsdale, NJ: Erlbaum 1978, pp. 279-297
S.J. Young and F. Fallside, "Speech Synthesis from Concept: A Method for Speech Output From Information Systems", J. Acoust. Soc. Am. 66(3), Sep. 1979, pp. 685-695
B.G. Greene, J.S. Logan, D.B. Pisoni, "Perception of Synthetic Speech Produced Automatically by Rule: Intelligibility of Eight Text-to-Speech Systems", Behavior Research Methods, Instruments & Computers, v18, 1986, pp. 100-107
B.G. Greene, L.M. Manous, D.B. Pisoni, "Perceptual Evaluation of DECtalk: A Final Report on Version 1.8*", Research on Speech Perception Progress Report No. 10, Bloomington, IN. Speech Research Laboratory, Indiana University (1984), pp. 77-127
Kim E.A. Silverman, Doctoral Thesis, "The Structure and Processing of Fundamental Frequency Contours", University of Cambridge (UK) 1987
J.C. Thomas and M.B. Rosson, "Human Factors and Synthetic Speech", Human Computer Interaction--INTERACT '84, North Holland Elsevier Science Publishers (1984) pp. 219-224
Y. Sagisaka, "Speech Synthesis From Text", IEEE Communications Magazine, vol. 28, iss. 1, Jan. 1990, pp. 35-41
E. Fitzpatrick and J. Bachenko, "Parsing for Prosody: What a Text-to-Speech System Needs from Syntax", pp. 188-194, 27-31 Mar. 1989
Moulines et al., "A Real-Time French Text-To-Speech System Generating High-Quality Synthetic Speech", ICASSP 90, pp. 309-312, vol. 1, 3-6 Apr. 1990
Willemse et al, "Context Free Wild Card Parsing In A Text-To-Speech System", ICASSP 91, pp. 757-760, vol. 2, 14-17 May, 1991
James Raymond Davis and Julia Hirschberg, "Assigning Intonational Features in Synthesized Spoken Directions", 26th Annual Meeting of Assoc. Computational Linguistics; 1988, pp. 1-9
K. Silverman, S. Basson, S. Levas, "Evaluating Synthesizer Performance: Is Segmental Intelligibility Enough", International Conf. on Spoken Language Processing, 1990
J. Allen, M.S. Hunnicutt, D. Klatt, "From Text to Speech: The MITalk System", Cambridge University Press, 1987
T. Boogaart, K. Silverman, "Evaluating the Overall Comprehensibility of Speech Synthesizers", Proc. Int'l Conference on Spoken Language Processing, 1990
K. Silverman, S. Basson, S. Levas, "On Evaluating Synthetic Speech: What Load Does It Place on a Listener's Cognitive Resources", Proc. 3rd Austral. Int'l Conf. Speech Science & Technology, 199