Sangbae Jeon, CSO of Gaudio Lab, “We’ll showcase cutting-edge AI audio technology at CES…even pioneering the content market”


Sangbae Jeon, CSO of Gaudio Lab, explains AI audio generation technology. (Photo = Gaudio Lab)

Gaudio Lab (CEO Oh Hyeon-oh), a company specializing in artificial intelligence (AI) audio, emerged as a ‘surprise star’ at CES 2024, held in Las Vegas, USA, early this year.

Microsoft CEO Satya Nadella visited the booth and showed interest in the sound generation AI ‘FALL-E’, drawing attention from Korean and international media. The solution is a technology that creates appropriate sound effects from image prompts.

Gaudio Lab won a CES Innovation Award again this year and will take part in CES 2025, to be held in January next year. Naturally, attention is focused on what technology the company will introduce this time.

Sangbae Jeon, Gaudio Lab’s Chief Science Officer (CSO), said, “If the Foley solution that attracted attention at the last CES was a technology for generating sound, the technology to be introduced this time is a more advanced one that combines sound separation and generation.”

He also emphasized, “If done well, this is a technology that can bring innovation to the content market,” adding, “AI audio technology will be able to pioneer a new content market.”

This year’s award-winning work, ‘Gaudio Music Placement’, is a technology that goes beyond images and uses video as the prompt. The AI engine not only recommends and places background music suited to the video, but also solves many problems at the video production stage, such as background music replacement, dubbing, subtitles, sound effect selection, noise removal, and dialogue separation.

Currently, ‘Gaudio Music Replacement’, which offers some of these functions, has already been commercialized. Replacement is, literally, a solution that recombines and rearranges the sounds in a video, and it is currently the leader in this field.

To explain the purpose of this solution, CSO Jeon Sang-bae cited reality travel entertainment shows such as ‘2 Days & 1 Night’ and ‘New Journey to the West’ as examples.

Travel entertainment shows use nearly 100 songs per episode. Complicated problems arise when such programs are exported overseas.

Even if a song poses no problem domestically, it often becomes an issue overseas, where different copyright laws apply. In such cases the background music has to be replaced before export, which is a more difficult task than one might think.

CSO Jeon said, “Not only do we have to cleanly separate the music from the dialogue, we also have to replace it with similar music and remaster everything one piece at a time,” adding, “This work causes delays in exports.”

Music Replacement is a technology that automates this entire process with AI. Broadcasters are, of course, its main customers. He said it has made it possible to export content that previously could not even be considered for export.

This demand does not arise only in Korea. There are many cases where background music has to be replaced in order to upload foreign content to OTT platforms such as Netflix. In sports broadcasting, there are quite a few cases where the music differs between the live broadcast and rebroadcasts.

In other words, although it is not a widely known field, there is significant demand for this work around the world.

Music Placement, which also received a CES Innovation Award, is a solution that prevents the need for such replacement work from the start. It supports choosing music free of copyright issues during the production process itself. Considerable demand is expected in the YouTube creator market.

As such, the solutions to be showcased at this CES are built on the company’s world-class sound source technology.

CSO Jeon explained, “There are quite a few things to consider in the audio generation process,” adding, “Therefore, it is not a field that just anyone can easily enter.” In fact, Gaudio Lab, founded in 2015, introduced multimodal-based sound effect generation technology earlier than Eleven Labs, which is now drawing attention in the global voice field.

The small size of the available databases compared with voice, and the many different types of dialogue, background music (BGM), and noise, were cited as barriers to entry. He explained that sophisticated sound source separation technology is essential to building sound-generating AI.

He also explained that creating the sounds used in movies is no simple task. To add sound to a scene in which ‘three people are conducting an interview in an office’, you need not only the dialogue but also various sounds such as typing on a keyboard, the hum of a heater, and the whir of a projector. Even outside noise must be considered. If any of these are missing, the video will inevitably feel unnatural.

Sound also changes depending on direction and distance; sounds coming from closer sources, for example, are louder. He added that there are bound to be differences depending on the genre of the video, whether it is a film or a drama.
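
To make the idea concrete, the toy sketch below layers several sound elements for such an interview scene, scaling each by a simple inverse-distance gain so that closer sources end up louder in the mix. This is only a minimal illustration of the reasoning involved, not Gaudio Lab’s engine; the file names, distances, and gain model are assumptions.

```python
# Toy scene mix: layer dialogue, keyboard, and heater tracks with
# distance-based gains so closer sources come out louder.
# File names and distances are illustrative assumptions, not real assets.
import numpy as np
import soundfile as sf

elements = [
    ("dialogue.wav", 1.0),   # people speaking close to the camera
    ("keyboard.wav", 2.0),   # typing a little farther away
    ("heater.wav",   4.0),   # heater humming in the corner
]

tracks, rate = [], None
for path, distance in elements:
    audio, sr = sf.read(path, dtype="float32", always_2d=False)
    if audio.ndim > 1:               # fold stereo sources down to mono
        audio = audio.mean(axis=1)
    rate = rate or sr                # assume all files share one sample rate
    gain = 1.0 / max(distance, 1.0)  # simple inverse-distance attenuation
    tracks.append(audio * gain)

# Pad every element to the longest one and sum them into a single scene mix.
length = max(len(t) for t in tracks)
mix = np.zeros(length, dtype=np.float32)
for t in tracks:
    mix[: len(t)] += t

mix /= max(1.0, float(np.abs(mix).max()))  # normalize only if it would clip
sf.write("office_scene_mix.wav", mix, rate)
```

A real engine would have to infer those distances, off-screen sources, and genre conventions from the video itself, which is exactly the ‘context’ reading he describes next.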

For this reason, he concluded, “To truly complete video-to-audio technology, AI must be able to read the ‘context’ of the video itself.”

Gaudio Lab’s ultimate goal is to build technology that reads the context of an input video and generates perfect sound, and he said the company is getting close to that goal. In fact, sound effects created with its Foley solution will be featured in the Korean film ‘Ghost Train’, scheduled for release in 2025.

CSO Jeon recalled saying in an interview last year, “Gaudio Lab’s sound will be in most movies within five to ten years,” adding, “So far, the journey has been smooth.”

Meanwhile, he revealed that new, unexpected businesses have been discovered in the course of advancing the technology. Karaoke is a representative example.

Existing karaoke rooms relied on MIDI sound sources, so they could never shed the characteristic machine-like sound. Gaudio Lab’s sound source technology, however, cleanly separates pitch, tempo, vocals, instrumental (MR), and more from commercial song recordings, and supports amplifying, attenuating, or removing only specific parts.

Based on this technology, an AI-powered car infotainment solution will be unveiled for the first time at this CES. It lets users separate vocals and instruments such as drums and bass according to their preference and play them individually through the car’s speakers, or enjoy karaoke with the vocal-removal function.
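
For comparison, the crude pre-AI trick for karaoke-style vocal removal was simple center-channel cancellation, sketched below under the assumption of a stereo file named ‘popular_song.wav’; it only cancels whatever happens to be panned dead center and degrades everything else.

```python
# Classic (pre-AI) karaoke trick: cancel the center-panned vocal by
# subtracting one stereo channel from the other. This is a crude baseline,
# not Gaudio Lab's separation engine; the file name is an assumption.
import soundfile as sf

audio, sr = sf.read("popular_song.wav", dtype="float32", always_2d=True)
if audio.shape[1] < 2:
    raise ValueError("Center-channel cancellation needs a stereo recording.")

left, right = audio[:, 0], audio[:, 1]
instrumental = (left - right) / 2.0   # anything panned dead center cancels out
sf.write("no_vocals_crude.wav", instrumental, sr)
```

AI source separation of the kind described above instead isolates individual stems; open-source research models such as Demucs similarly split a song into vocal, drum, bass, and other tracks, which is closer in spirit to the per-instrument control described here.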

CSO Jeon emphasized, “Gaudio Lab is running toward the ‘peak of technology’, where sound effects can be generated from a single input video,” adding, “We have now entered the stage of pioneering a new market with this technology.”

Reporter Jang Se-min semim99@aitimes.com
