Method of iterative noise estimation in a recursive framework
Patent 7139703. Issued on November 21, 2006. Estimated expiration date: September 6, 2022. The estimated expiration date is calculated from simple USPTO term provisions; it does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors that might affect the term of a patent.
A method and apparatus estimate additive noise in a noisy signal using an iterative technique within a recursive framework. In particular, the noisy signal is divided into frames, and the noise in each frame is determined based on the noise in another frame and the noise determined in a previous iteration for the current frame. In one particular embodiment, the noise found in a previous iteration for a frame is used to define an expansion point for a Taylor series approximation that is used to estimate the noise in the current frame. In one embodiment, noise estimation employs a recursive Expectation-Maximization framework with a maximum likelihood (ML) criterion. In a further embodiment, noise estimation employs a recursive Expectation-Maximization framework based on a maximum a posteriori (MAP) criterion.
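The idea in the abstract can be illustrated with a minimal sketch. It is not the patent's update equations: it assumes a scalar log-domain mixing model y = x + log(1 + exp(n - x)), a clean-speech value fixed at a known mean `mu_x`, and a simple quadratic penalty that ties each frame's noise estimate to the previous frame's estimate. The names `track_noise`, `obs_var`, `prior_weight`, and `iterations` are hypothetical. Within each frame, the estimate from the previous iteration serves as the expansion point of a first-order Taylor approximation; across frames, the previous frame's estimate supplies the starting point and the penalty term.

```python
import math


def softplus(z):
    # log(1 + exp(z)), written to stay stable for large |z|
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # derivative of softplus; the tanh form avoids overflow in exp
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def track_noise(frames, mu_x, n_init=0.0, obs_var=1.0,
                prior_weight=1.0, iterations=3):
    """Return one noise estimate per frame (toy scalar model, assumed)."""
    estimates = []
    n_prev = n_init
    for y in frames:
        # Recursive part: start from the estimate of the previous frame.
        n = n_prev
        for _ in range(iterations):
            # Iterative part: linearize the observation model around the
            # estimate from the previous iteration (the expansion point).
            pred = mu_x + softplus(n - mu_x)
            slope = sigmoid(n - mu_x)
            # Minimize the linearized squared error plus a penalty that
            # keeps the new estimate close to the previous frame's value.
            num = slope * (y - pred + slope * n) / obs_var + prior_weight * n_prev
            den = slope * slope / obs_var + prior_weight
            n = num / den
        estimates.append(n)
        n_prev = n
    return estimates


if __name__ == "__main__":
    mu_x, true_noise = 2.0, 5.0
    # Noise-free synthetic observations generated from the assumed model.
    frames = [mu_x + softplus(true_noise - mu_x)] * 200
    print(track_noise(frames, mu_x)[-1])
```

In this sketch, raising `prior_weight` makes the estimate change more slowly from frame to frame, while more `iterations` let the expansion point move further within a single frame. A fuller treatment would replace the fixed `mu_x` with a mixture model of clean speech and derive the update from an ML or MAP criterion, as the abstract describes.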