
Miranda J., Neto J.P., Black A.W.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

pp 8131



We propose a method that combines the audio of a lecture with its supporting slides to improve automatic speech recognition (ASR) performance. We treat the lecture speech and the slides as parallel streams that contain redundant information. We integrate the two streams by first aligning the speech with the slide words and then biasing the recognizer’s language model towards those words, which corrects errors in the ASR transcripts. We obtain a 5.9% relative word error rate (WER) improvement on a lecture test set compared to a speech-recognition-only system.
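
To make the reported metric concrete, the following is a minimal Python sketch of how a word error rate and a relative WER improvement are typically computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_wer_improvement` and the toy sentences are assumptions introduced for illustration, and the toy numbers are unrelated to the 5.9% result above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Toy example (invented sentences, not data from the paper).
reference = "the hidden markov model decodes the speech signal"
speech_only = "the hidden mark of model decodes the speech signal"
with_slides = "the hidden markov model decodes a speech signal"

baseline = word_error_rate(reference, speech_only)
improved = word_error_rate(reference, with_slides)
gain = relative_wer_improvement(baseline, improved)
```

A relative improvement is measured against the baseline error rather than in absolute points, so the same relative figure corresponds to a smaller absolute WER change when the baseline is already low.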