OpenAI debuts Whisper API for speech-to-text transcription and translation

To coincide with the rollout of the ChatGPT API, OpenAI today launched the Whisper API, a hosted version of the open source Whisper speech-to-text model that the company released in September.

Priced at $0.006 per minute, Whisper is an automatic speech recognition system that OpenAI claims enables "robust" transcription in multiple languages, as well as translation from those languages into English. It accepts files in a variety of formats, including M4A, MP3, MP4, MPEG, MPGA, WAV and WEBM.
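To make the pricing and format details above concrete, here is a minimal sketch of calling the hosted Whisper API with the official `openai` Python package (v1 client). It assumes an `OPENAI_API_KEY` environment variable; the `estimate_cost` and `transcribe` helper names are illustrative, not part of OpenAI's API, though `whisper-1` is the model identifier the endpoint uses.

```python
# Sketch of using the hosted Whisper API; assumes `pip install openai`
# and an OPENAI_API_KEY environment variable.

# Formats and per-minute price as announced by OpenAI.
SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}
PRICE_PER_MINUTE_USD = 0.006

def estimate_cost(duration_seconds: float) -> float:
    """Estimate the Whisper API cost for a clip, billed at $0.006/minute."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE_USD, 6)

def transcribe(path: str) -> str:
    """Transcribe an audio file. Translation into English works the same
    way via `client.audio.translations.create`."""
    from openai import OpenAI  # imported here so the sketch runs without it

    ext = path.rsplit(".", 1)[-1].lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext}")

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text

if __name__ == "__main__":
    # A one-hour recording costs about $0.36 at $0.006 per minute.
    print(f"${estimate_cost(3600):.2f}")
```

At this price point, transcribing a full hour of audio costs roughly $0.36, which is the kind of per-unit economics that makes bulk transcription workloads plausible for the enterprises discussed below.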

Countless organizations have developed highly capable speech recognition systems, which sit at the core of software and services from tech giants like Google, Amazon and Meta. But what makes Whisper different, according to OpenAI president and chairman Greg Brockman, is that it was trained on 680,000 hours of multilingual and "multitask" data collected from the web, which he says results in improved recognition of unique accents, background noise and technical jargon.

"We released a model, but that actually was not enough to get the whole developer ecosystem to build around it," Brockman said in a video call with TechCrunch yesterday afternoon. "The Whisper API is the same large model that you can get open source, but we've optimized it to the extreme. It's much, much faster and extremely convenient."

To Brockman's point, there are plenty of barriers when it comes to enterprises adopting voice transcription technology. According to a 2020 Statista survey, companies cite accuracy, accent- or dialect-related recognition issues and cost as the top reasons they haven't embraced tech like speech-to-text.

Whisper has its limitations, though, particularly in the area of "next-word" prediction. Because the system was trained on a large amount of noisy data, OpenAI cautions that Whisper might include words in its transcriptions that weren't actually spoken, likely because it's both trying to predict the next word in the audio and trying to transcribe the audio recording itself. Moreover, Whisper doesn't perform equally well across languages, suffering from a higher error rate for speakers of languages that aren't well represented in the training data.

That last bit is nothing new in the world of speech recognition, unfortunately. Biases have long plagued even the best systems, with a 2020 Stanford study finding that systems from Amazon, Apple, Google, IBM and Microsoft made far fewer errors (about 19% fewer) with white users than with Black users.

Despite this, OpenAI sees Whisper's transcription capabilities being used to improve existing apps, services, products and tools. Already, the AI-powered language learning app Speak is using the Whisper API to power a new in-app virtual speaking companion.

If OpenAI can break into the speech-to-text market in a major way, it could be quite profitable for the Microsoft-backed company. According to one report, the segment could be worth $5.4 billion by 2026, up from $2.2 billion in 2021.

"Our picture is that we really want to be this universal intelligence," Brockman said. "We really want to, very flexibly, be able to take in whatever kind of data you have, whatever kind of task you're trying to accomplish, and be a force multiplier on that attention."
