Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains
Patent 6324510 Issued on November 27, 2001. Estimated Expiration Date: November 6, 2018. Estimated Expiration Date is calculated based on simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors which might affect the term of a patent.
A method of organizing an acoustic model for speech recognition comprises calculating a measure of acoustic dissimilarity between subphonetic units. A clustering technique is then applied recursively to the subphonetic units, based on the calculated measure, to automatically generate a hierarchically arranged model. Each application of the clustering technique produces another level of the hierarchy, with the levels progressing from the least specific to the most specific. A technique for adapting the structure and size of a trained acoustic model to an unseen domain using only a small amount of adaptation data is also disclosed.
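The recursive clustering idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure. It assumes each subphonetic unit is summarized by a diagonal Gaussian (mean and variance vectors), uses symmetric Kullback-Leibler divergence as the dissimilarity measure, and uses average-linkage agglomerative merging as the clustering technique. The abstract names none of these choices. The unit names, the `branching` parameter, and all function names are hypothetical.

```python
from itertools import combinations

def sym_kl(a, b):
    """Symmetric KL divergence between two diagonal Gaussians (assumed measure)."""
    (mean_a, var_a), (mean_b, var_b) = a, b
    total = 0.0
    for m1, v1, m2, v2 in zip(mean_a, var_a, mean_b, var_b):
        d2 = (m1 - m2) ** 2
        total += 0.5 * (v1 / v2 + v2 / v1 - 2.0) + 0.5 * d2 * (1.0 / v1 + 1.0 / v2)
    return total

def cluster_distance(c1, c2, models):
    """Average pairwise dissimilarity between two clusters of units."""
    pairs = [(u, v) for u in c1 for v in c2]
    return sum(sym_kl(models[u], models[v]) for u, v in pairs) / len(pairs)

def agglomerate(units, models, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[u] for u in units]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], models),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def build_hierarchy(units, models, branching=2):
    """Recursively cluster units; each recursion adds one, more specific, level."""
    assert branching >= 2
    units = list(units)
    node = {"units": units, "children": []}
    if len(units) > 1:
        k = min(branching, len(units))
        for cluster in agglomerate(units, models, k):
            node["children"].append(build_hierarchy(cluster, models, branching))
    return node

# Hypothetical units: two states each of two phones, as (mean, variance) pairs.
models = {
    "AA-b": ([0.0, 0.0], [1.0, 1.0]),
    "AA-m": ([0.2, 0.1], [1.0, 1.0]),
    "S-b": ([5.0, 5.0], [1.0, 1.0]),
    "S-m": ([5.3, 4.9], [1.0, 1.0]),
}
tree = build_hierarchy(sorted(models), models)
for child in tree["children"]:
    print(child["units"])
```

In this sketch the root node covers all units (least specific) and each leaf covers a single unit (most specific). A larger `branching` value gives a shallower, wider tree.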
Other References
Jurgen Fritsch, Michael Finke, "ACID/HNN: Clustering Hierarchies of Neural Networks for Context-Dependent Connectionist Acoustic Modeling," IEEE International Conference on Acoustics, Speech and Signal Processing, Conference 23 (New York, New York), p. 505-508, (1998)
J. Fritsch, M. Finke, A. Waibel, "Effective Structural Adaptation of LVCSR Systems to Unseen Domains Using Hierarchical Connectionist Acoustic Models," Proceedings of the International Conference on Spoken Language Processing, p. 2919-2922, (Nov. 30-Dec. 4, 1998)
Paul, D.B., "Extensions to Phone-State Decision-Tree Clustering: Single Tree and Tagged Clustering," IEEE Comp. Soc. Press, IEEE International Conference on Acoustics, Speech, and Signal Processing (Los Alamitos, US), p. 1487-1490, (1997)
Franco, H., "Context-Dependent Connectionist Probability Estimation in a Hybrid Markov Model-Neural Net Speech Recognition System", Computer Speech and Language, vol. 8, No. 3, Jul. 1994
Fritsch, J., et al, "Context-Dependent Hybrid HME/HMM Speech Recognition Using Polyphone Clustering Decision Trees", Proc. of ICASSP '97, Apr. 21-24, 1997
Kershaw, D. J., et al, "Context-Dependent Classes in a Hybrid Recurrent Network HMM Speech Recognition System", Tech. Rep. CUED/F-INFENG/TR217, CUED, Cambridge, England, Jul. 1995
Thomson, D. L., "Ten Case Studies of the Effect of Field Conditions on Speech Recognition Errors", Proceedings of the IEEE ASRU Workshop, Dec. 17, 1997
Schurmann, J., et al. "A Decision Theoretic Approach to Hierarchical Classifier Design", Pattern Recognition, 17 (3), 1984
Fritsch, J., "ACID/HNN: A Framework for Hierarchical Connectionist Acoustic Modeling", Proceedings of IEEE ASRU Workshop, Dec. 14-17, 1997
Leggetter, C.J., et al, "Speaker Adaptation of HMM's Using Linear Regression", Tech. Rep. CUED/F-INFENG/TR181, CUED, Jun. 1994