
Introducing Whisper


Other existing approaches frequently use smaller, more closely paired audio-text training datasets,[^reference-1][^reference-2][^reference-3] or broad but unsupervised audio pretraining.[^reference-4][^reference-5][^reference-6] Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper's zero-shot performance across many diverse datasets, we find it is much more robust and makes 50% fewer errors than those models.

About a third of Whisper's audio dataset is non-English, and it is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech-to-text translation and outperforms the supervised SOTA on CoVoST2-to-English translation zero-shot.
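The two tasks described above, transcription in the original language and translation to English, are both exposed by the open-source `whisper` Python package. The sketch below is illustrative, not from the original post: the helper names, the audio file path, and the choice of the `"base"` model size are assumptions; only `whisper.load_model` and `model.transcribe(..., task=...)` come from the package's documented API.

```python
def choose_task(to_english: bool) -> str:
    # Whisper exposes two decoding tasks: "transcribe" keeps the
    # spoken language; "translate" always produces English text.
    return "translate" if to_english else "transcribe"


def run_whisper(audio_path: str, to_english: bool = False) -> str:
    # Hypothetical wrapper; requires `pip install openai-whisper`.
    import whisper

    model = whisper.load_model("base")  # model size is a free choice
    result = model.transcribe(audio_path, task=choose_task(to_english))
    return result["text"]
```

Calling `run_whisper("speech.mp3")` would transcribe the audio in its original language, while `run_whisper("speech.mp3", to_english=True)` would translate the speech to English in one step.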
