aiOla Launches Speech Recognition Model 50% Faster than OpenAI’s ‘Whisper’


(Photo = aiOla)

Israeli artificial intelligence (AI) startup aiOla has released a speech recognition model that is 50% faster than OpenAI’s ‘Whisper’. This makes it possible to build AI systems that can understand and answer users’ questions in near real time.

VentureBeat reported on the 1st (local time) that aiOla had released ‘Whisper-Medusa’, an open source speech recognition model that gains its speed by modifying the Whisper architecture.

Whisper converts user audio into text. In a voice assistant pipeline, that text is sent as a query to a large language model (LLM), and the LLM’s answer is then converted from text back into audio.

It has become the standard in speech recognition thanks to its ability to process complex speech in multiple languages and accents in near real time. It is downloaded more than 5 million times a month and powers tens of thousands of apps.

Whisper-Medusa (left) and Whisper speed comparison (Photo = aiOla)

aiOla’s Whisper-Medusa modifies the Whisper architecture and adds a ‘multi-head attention’ mechanism.

Multi-head attention splits ‘self-attention’, the mechanism by which each element of the input sequence is related to the other elements in the sequence, into multiple heads and runs them in parallel. This lets the model capture different kinds of dependencies between input tokens and combine information from several sources at the same time.

The explanation given is that handling more complex relationships between input tokens improves expressive power, while applying attention to multiple parts of the input at once increases processing speed.
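To make the idea of splitting self-attention into heads concrete, here is a minimal Python sketch. It is not aiOla's or OpenAI's code. It assumes identity projections (no learned query, key, value or output weights), and the function names `attention` and `multi_head_self_attention` are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x, num_heads):
    """Slice each token vector into num_heads parts, attend per part, concatenate.

    Assumption: identity projections, for illustration only. A real model
    applies learned weight matrices before and after this step.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        part = [tok[lo:hi] for tok in x]  # this head's view of every token
        head_outputs.append(attention(part, part, part))
    # Concatenate the per-head results back into d_model-sized vectors.
    return [sum((head_outputs[h][t] for h in range(num_heads)), [])
            for t in range(len(x))]


tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
mixed = multi_head_self_attention(tokens, num_heads=2)
```

Each head works on its own slice of the vectors and does not depend on the others, which is why the heads can be computed in parallel on suitable hardware.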

The architectural changes allow Whisper-Medusa to predict 10 tokens at a time instead of 1, resulting in 50% faster speech prediction and generation runtime without any performance degradation. aiOla plans to extend Whisper-Medusa to a 20-head version that can predict 20 tokens at a time.
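The effect of predicting several tokens per pass can be illustrated with simple arithmetic. The sketch below is not taken from aiOla's code. It assumes the best case, in which every predicted token is kept, and it counts decoder passes only, so it is not a runtime estimate. The function names are hypothetical.

```python
import math


def decoding_passes(num_tokens, tokens_per_pass):
    """Number of decoder passes needed to emit num_tokens tokens.

    Assumption: best case, every token proposed in a pass is accepted.
    """
    if num_tokens < 0 or tokens_per_pass < 1:
        raise ValueError("num_tokens must be >= 0 and tokens_per_pass >= 1")
    return math.ceil(num_tokens / tokens_per_pass)


def chunk_tokens(tokens, tokens_per_pass):
    """Group a token sequence into the batches emitted by each pass."""
    return [tokens[i:i + tokens_per_pass]
            for i in range(0, len(tokens), tokens_per_pass)]


one_at_a_time = decoding_passes(100, 1)   # 100 passes
ten_at_a_time = decoding_passes(100, 10)  # 10 passes
```

The reported 50% gain is an end-to-end figure. It covers work that this pass count ignores, such as encoding the audio, so the two numbers are not directly comparable.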

Whisper-Medusa is currently available on Hugging Face for research and commercial use.

Reporter Park Chan cpark@aitimes.com
