Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling.

Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

Automatic speech recognition (ASR) systems lack joint optimization during decoding over the acoustic, lexical and language models; for instance the ASR will often prune words due to acoustics using short-term context, prior to rescoring with long-term context. In this work we model the automated speech transcription process as a noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR. The proposed system can exploit long-term context using a neural network language model and can better choose between existing ASR output possibilities as well as re-introduce previously pruned and unseen (out-of-vocabulary) phrases. The system provides significant corrections under poorly performing ASR conditions without degrading any accurate transcriptions. The proposed system can thus be independently optimized and post-process the output of even a highly optimized ASR. We show that the system consistently provides improvements over the baseline ASR. We also show that it performs better when used on out-of-domain and mismatched test data and under high-error ASR conditions. Finally, an extensive analysis of the type of errors corrected by our system is presented.
