Introducing Whisper

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.

Other existing approaches frequently use smaller, more closely paired audio-text training datasets,[^reference-1][^reference-2][^reference-3] or use broad but unsupervised audio pretraining.[^reference-4][^reference-5][^reference-6] Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper’s zero-shot performance across many diverse datasets, we find it is much more robust and makes 50% fewer errors than those models.
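
Word error rate (WER) is the standard metric behind comparisons like the 50% figure above. As a minimal sketch of how such errors are counted, the snippet below uses the third-party `jiwer` package; the transcripts are hypothetical examples, not data from our evaluation.

```python
# pip install jiwer
from jiwer import wer

# Hypothetical reference transcript and two model outputs; a real
# evaluation averages WER over every utterance in a dataset.
reference = "the quick brown fox jumps over the lazy dog"
hyp_specialized = "the quick brown fox jumps over a lazy dog"  # one substitution
hyp_zero_shot = "the quick brown fox jumps over the lazy dog"  # exact match

# WER = (substitutions + insertions + deletions) / words in reference
print(f"specialized: {wer(reference, hyp_specialized):.2%}")  # 11.11%
print(f"zero-shot:   {wer(reference, hyp_zero_shot):.2%}")    # 0.00%
```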

A third of Whisper’s audio dataset is non-English, and the model is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech-to-text translation, and it outperforms the supervised SOTA on CoVoST2 to English translation zero-shot.
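
Since one model handles both tasks, switching between them is a matter of the requested decoding task. Here is a minimal sketch using the open-source `whisper` Python package; the checkpoint name and the input file `speech_fr.mp3` are illustrative choices, not fixed requirements.

```python
# pip install openai-whisper
import whisper

# Load one of the released checkpoints; "base" trades accuracy for speed.
model = whisper.load_model("base")

# Task 1: transcribe in the original language (auto-detected here).
transcription = model.transcribe("speech_fr.mp3")
print(transcription["language"], transcription["text"])

# Task 2: the same model, asked to translate the speech into English.
translation = model.transcribe("speech_fr.mp3", task="translate")
print(translation["text"])
```

When the source language is known in advance, passing `language="fr"` to `transcribe` skips the detection step.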
