EchoSpeech: Revolutionizing Communication with Silent-Speech Recognition Technology


Researchers at Cornell University have developed EchoSpeech, a silent-speech recognition interface that uses acoustic sensing and artificial intelligence to continuously recognize up to 31 unvocalized commands based on lip and mouth movements. This low-power, wearable interface can run on a smartphone and requires only a few minutes of user training data before it recognizes commands.

Ruidong Zhang, a doctoral student of information science, is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” which will be presented at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) this month in Hamburg, Germany.

“For individuals who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back,” Ruidong Zhang said, highlighting the technology’s potential applications with further development.

Real-World Applications and Privacy Benefits

In its current form, EchoSpeech could be used to communicate with others via smartphone in environments where speech is inconvenient or inappropriate, such as noisy restaurants or quiet libraries. The silent speech interface can also be paired with a stylus and used with design software like CAD, significantly reducing the need for a keyboard and a mouse.

Equipped with microphones and speakers smaller than pencil erasers, the EchoSpeech glasses function as a wearable AI-powered sonar system, sending and receiving sound waves across the face and detecting mouth movements. A deep learning algorithm then analyzes these echo profiles in real time with roughly 95% accuracy.

“We’re moving sonar onto the body,” said Cheng Zhang, assistant professor of information science and director of Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab.
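To make the sonar idea concrete, here is a minimal sketch of how an echo profile can be computed in general: emit a known chirp, record what comes back, and cross-correlate the two so that peaks mark the delay of each reflection. This is a standard sonar technique used for illustration only. The article does not describe EchoSpeech's actual signal processing, and the sample rate, chirp frequencies, and function names below are all assumptions.

```python
import math

SAMPLE_RATE = 48_000      # assumed sample rate in Hz (not from the article)
SPEED_OF_SOUND = 343.0    # metres per second in air at room temperature


def make_chirp(n_samples, f_start, f_end, sample_rate):
    """Return a linear frequency sweep from f_start to f_end (Hz)."""
    duration = n_samples / sample_rate
    sweep_rate = (f_end - f_start) / duration
    chirp = []
    for i in range(n_samples):
        t = i / sample_rate
        chirp.append(math.sin(2 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)))
    return chirp


def simulate_received(transmitted, delay, attenuation, total_len):
    """Simulate one reflection: the chirp returns delayed and weaker."""
    received = [0.0] * total_len
    for i, sample in enumerate(transmitted):
        received[delay + i] += attenuation * sample
    return received


def echo_profile(transmitted, received):
    """Cross-correlate received audio with the transmitted chirp.

    The value at index `lag` is large when a copy of the chirp
    starts `lag` samples into the recording.
    """
    n_lags = len(received) - len(transmitted) + 1
    return [
        sum(tx * received[lag + i] for i, tx in enumerate(transmitted))
        for lag in range(n_lags)
    ]


def lag_to_distance(lag, sample_rate):
    """Convert a round-trip delay in samples to a one-way distance in metres."""
    return (lag / sample_rate) * SPEED_OF_SOUND / 2


# Hypothetical near-ultrasonic chirp and a single simulated reflection.
chirp = make_chirp(256, 16_000, 20_000, SAMPLE_RATE)
received = simulate_received(chirp, delay=40, attenuation=0.3, total_len=512)

profile = echo_profile(chirp, received)
peak_lag = max(range(len(profile)), key=lambda lag: abs(profile[lag]))
print("strongest echo at lag", peak_lag)
print("approximate distance (m):", lag_to_distance(peak_lag, SAMPLE_RATE))
```

In a real system the recording would contain many overlapping reflections that shift as the lips and jaw move, and a sequence of such profiles over time is the kind of input a deep learning model could classify into commands.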

Existing silent-speech recognition technology typically relies on a limited set of predetermined commands and requires the user to face or wear a camera. Cheng Zhang explained that this is neither practical nor feasible, and that it also raises significant privacy concerns for both the user and the people they interact with.

EchoSpeech’s acoustic-sensing technology eliminates the need for wearable video cameras. Furthermore, since audio data is smaller than image or video data, it requires less bandwidth to process and can be transmitted to a smartphone via Bluetooth in real time, according to François Guimbretière, professor in information science.

“And since the information is processed locally on your smartphone instead of uploaded to the cloud,” he said, “privacy-sensitive information never leaves your control.”
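The bandwidth point can be illustrated with back-of-the-envelope arithmetic. The sketch below compares uncompressed bit rates for an audio stream and a video stream. Every number in it (sample rate, bit depth, channel count, resolution, frame rate) is an assumed, illustrative value and not a figure from the EchoSpeech work. Real systems also compress both kinds of data, so the actual gap will differ.

```python
def raw_audio_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def raw_video_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed values for illustration only.
audio_bps = raw_audio_bitrate(sample_rate_hz=48_000, bits_per_sample=16, channels=2)
video_bps = raw_video_bitrate(width=640, height=480, bits_per_pixel=24, frames_per_second=30)

ratio = video_bps / audio_bps
print(f"audio: {audio_bps / 1e6:.3f} Mbit/s")
print(f"video: {video_bps / 1e6:.3f} Mbit/s")
print(f"video needs about {ratio:.0f}x the raw bandwidth")
```

Under these assumptions the raw video stream is roughly two orders of magnitude larger than the raw audio stream, which is the kind of gap that makes a low-power wireless link and on-phone processing more plausible for audio than for camera data.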

