Building a Transcription Application


An ML application that transcribes audio files into text in English, French, Spanish, and Arabic.

Transcribing audio files into text is generally a time-consuming task, especially if you have a large volume of audio data to transcribe. Fortunately, machine learning (ML) algorithms can be used to automate this process and make it much faster and more efficient.

In this article, I'll discuss an ML application I built that can transcribe audio files into text in four different languages: English, French, Spanish, and Arabic. The front end of the application is built with HTML, CSS, and JavaScript, while the back end is written in Python.

The application works by first taking an audio file as input. The audio file is then converted into a digital signal using a library called Librosa. This digital signal is then passed through a neural network that has been trained on speech recognition data in the four languages.
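To make the "digital signal" concrete: an audio file is just a sampled waveform, i.e. an array of amplitude values taken `sample_rate` times per second. The article loads files with Librosa; the sketch below shows the same idea using only the standard library, by writing a short synthetic tone to a WAV file and reading the samples back (`tone.wav` is a stand-in filename).

```python
import math
import struct
import wave

sample_rate = 16000  # samples per second
duration_s = 0.1

# synthesize a 440 Hz tone as 16-bit integer samples
samples = [
    int(32767 * math.sin(2 * math.pi * 440 * t / sample_rate))
    for t in range(int(sample_rate * duration_s))
]

# write it as a mono 16-bit WAV file
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(sample_rate)
    w.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# reading it back yields the digital signal the neural network consumes
# (librosa.load("tone.wav") would return the same samples scaled to [-1, 1])
with wave.open("tone.wav", "rb") as w:
    n = w.getnframes()
    signal = struct.unpack(f"<{n}h", w.readframes(n))

print(len(signal))  # 1600 samples = 0.1 s at 16 kHz
```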

The neural network consists of several layers of nodes that perform various computations on the input signal. The output of the neural network is a sequence of phonemes, which are the smallest units of sound in a language.
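A toy sketch of this decoding step (not the trained network itself): assume the network emits, for each short audio frame, a probability distribution over a hypothetical phoneme inventory. Taking the most probable phoneme per frame and collapsing repeats and silence yields the phoneme sequence.

```python
# toy phoneme inventory for the French word "bonjour"; "sil" is silence
PHONEMES = ["sil", "b", "on", "zh", "ur"]

# hypothetical per-frame network output (each row is a distribution over PHONEMES)
frame_probs = [
    [0.90, 0.05, 0.02, 0.02, 0.01],
    [0.10, 0.80, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.05, 0.10, 0.75, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.70, 0.10],
    [0.05, 0.05, 0.05, 0.15, 0.70],
]

def decode(frames):
    """Pick the most probable phoneme per frame, collapse repeats, drop silence."""
    best = [PHONEMES[max(range(len(p)), key=p.__getitem__)] for p in frames]
    collapsed = [ph for i, ph in enumerate(best) if i == 0 or ph != best[i - 1]]
    return [ph for ph in collapsed if ph != "sil"]

print(decode(frame_probs))  # ['b', 'on', 'zh', 'ur']
```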

Once the phonemes have been identified, they're combined into words using a language model. The language model uses statistical analysis to determine the most likely sequence of words that matches the phonemes.
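A minimal sketch of that language-model step, with hypothetical data: a pronunciation dictionary maps phoneme groups to candidate words, and simple word frequencies stand in for the statistical model that breaks ties between homophones.

```python
# hypothetical pronunciation dictionary: phoneme group -> candidate words
PRONUNCIATIONS = {
    ("hh", "ay"): ["hi", "high"],
    ("n", "ow"): ["no", "know"],
}
# relative word frequencies standing in for a real statistical language model
WORD_FREQ = {"hi": 0.6, "high": 0.4, "no": 0.7, "know": 0.3}

def phonemes_to_words(phoneme_groups):
    """For each phoneme group, pick the candidate word the model deems most likely."""
    words = []
    for group in phoneme_groups:
        candidates = PRONUNCIATIONS.get(tuple(group), [])
        if candidates:
            words.append(max(candidates, key=lambda w: WORD_FREQ.get(w, 0.0)))
    return " ".join(words)

print(phonemes_to_words([["hh", "ay"], ["n", "ow"]]))  # "hi no"
```

A production system would score whole word sequences (e.g. with n-gram or neural language models) rather than each word in isolation.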

Finally, the transcribed text is displayed on the front end of the application using HTML, CSS, and JavaScript.

HTML




<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Transcription Service</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <div class="container">
    <h1 class="title">Transcription Service</h1>
    <form id="transcription-form" class="form">
      <div class="form-group">
        <label for="audio-file">Audio file</label>
        <input type="file" id="audio-file" name="audio" accept="audio/*">
      </div>
      <div class="form-group">
        <label for="language">Language</label>
        <select id="language" name="language">
          <option value="en">English</option>
          <option value="fr">French</option>
          <option value="es">Spanish</option>
          <option value="ar">Arabic</option>
        </select>
      </div>
      <button type="submit" class="btn">Transcribe</button>
    </form>
    <h2 class="subtitle">Transcription Result:</h2>
    <div id="transcription-result" class="transcription"></div>
  </div>
  <script src="script.js"></script>
</body>
</html>







CSS

* {
box-sizing: border-box;
}

body {
font-family: sans-serif;
margin: 0;
padding: 0;
}

.container {
max-width: 600px;
margin: 0 auto;
padding: 20px;
}

.title {
text-align: center;
font-size: 36px;
margin-bottom: 20px;
}

.form {
display: flex;
flex-direction: column;
gap: 10px;
margin-bottom: 20px;
}

.form-group {
display: flex;
flex-direction: column;
}

label {
margin-bottom: 5px;
}

input[type="file"],
select {
border: 1px solid #ccc;
padding: 10px;
border-radius: 5px;
}

.btn {
background-color: #0066CC;
color: #fff;
border: none;
padding: 10px 20px;
border-radius: 5px;
cursor: pointer;
font-size: 18px;
}

.btn:hover {
background-color: #0052A3;
}

.subtitle {
font-size: 24px;
margin-bottom: 10px;
}

.transcription {
border: 1px solid #ccc;
padding: 10px;
border-radius: 5px;
font-size: 16px;
min-height: 100px;
}

JavaScript

const form = document.querySelector('#transcription-form');
const transcriptionResult = document.querySelector('#transcription-result');

form.addEventListener('submit', async (e) => {
e.preventDefault();

const formData = new FormData(form);
const response = await fetch('/api/transcribe', {
method: 'POST',
body: formData,
});
const data = await response.json();
transcriptionResult.innerText = data.transcription;
});

Python

# importing libraries
import os
import smtplib
from email.message import EmailMessage
from textwrap import wrap

import PySimpleGUI as sg
import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence

def main():
    # First the window layout in 2 columns
    File_Choosing_Column = [
        [
            sg.Text("Audio Folder"),
            sg.In(size=(25, 1), enable_events=True, key="-FOLDER-"),
            sg.FolderBrowse(),
        ],
        [
            sg.Listbox(
                values=[], enable_events=True, size=(40, 20), key="-FILE LIST-"
            )
        ],
    ]
    # For now will only show the name of the file that was chosen
    Transcript_Viewer = [
        [sg.Text("Choose an audio file from the list on the left:")],
        [sg.InputText(key="text1")],
        [sg.Text(size=(40, 1), key="-TOUT-")],
        [sg.Image(key="-WAVF-")],
    ]
    # ----- Full layout -----
    layout = [
        [sg.Text("Transcribe any audio (must be in WAV format)")],
        [sg.Button("Exit"), sg.Button("Run")],
        [sg.HorizontalSeparator()],
        [
            sg.Column(File_Choosing_Column),
            sg.VSeperator(),  # (sic: PySimpleGUI's own spelling)
            sg.Column(Transcript_Viewer),
        ],
    ]
    # Create the window
    window = sg.Window("Transcriptor", layout)
    path = None
    # Create an event loop
    while True:
        event, values = window.read()
        if event == "-FOLDER-":
            # Folder name was filled in: list the audio files in the folder
            folder = values["-FOLDER-"]
            try:
                file_list = os.listdir(folder)
            except OSError:
                file_list = []
            fnames = [
                f
                for f in file_list
                if os.path.isfile(os.path.join(folder, f))
                and f.lower().endswith((".wav", ".mp3", ".aac", ".aiff", ".flac", ".m4a", ".ogg", ".pcm", ".wma"))
            ]
            window["-FILE LIST-"].update(fnames)
        elif event == "-FILE LIST-":
            # A file was chosen from the listbox
            try:
                filename = os.path.join(values["-FOLDER-"], values["-FILE LIST-"][0])
                window["-TOUT-"].update(filename)
                path = filename
            except IndexError:
                pass
        if event == "Run" and path:
            whole_text = "\n".join(wrap(RunIt(path), 58))
            window["text1"].update(whole_text)
            # email the finished transcription
            sender = "NotifAccForPy@gmail.com"
            receiver = "clintfordcadio@gmail.com"
            password = "ldhjjzjcffqqdizu"
            msg = EmailMessage()
            msg["Subject"] = "Text transcription complete"
            msg["From"] = sender
            msg["To"] = receiver
            msg.set_content("Here is your finished transcription:\n\n" + whole_text)
            with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
                smtp.login(sender, password)
                smtp.send_message(msg)
        if event == "Exit" or event == sg.WIN_CLOSED:
            window.close()
            break

def RunIt(path):
    # normalize Windows path separators
    path = path.replace("\\", "/")
    base, ext = os.path.splitext(path)
    # convert any non-WAV input to WAV so speech_recognition can read it
    if ext.lower() in (".mp3", ".m4a", ".aac", ".aiff", ".flac", ".ogg", ".wma"):
        dst = base + ".wav"
        AudioSegment.from_file(path).export(dst, format="wav")
        path = dst
    # create a speech recognition object
    r = sr.Recognizer()
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)
    # split the audio where the silence lasts long enough and get chunks
    chunks = split_on_silence(
        sound,
        # experiment with this value on your target audio file
        min_silence_len=500,
        # adjust this per requirement
        silence_thresh=sound.dBFS - 14,
        # keep 500 ms of silence, adjustable as well
        keep_silence=500,
    )
    # create a directory to store the audio chunks
    folder_name = base + "-chunks"
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export the audio chunk and save it in the `folder_name` directory
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
        # try converting it to text ("en", "fr", "es", or "ar")
        try:
            text = r.recognize_google(audio_listened, language="fr")
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text

main()

One of the key challenges of building this application was training the neural network on speech recognition data in multiple languages. I had to gather large amounts of data in each language and then use that data to train a separate neural network for each language.

I also had to carefully design the language model to ensure that it could accurately match phonemes to words in each language. This required a deep understanding of the unique characteristics of each language, such as its grammar, syntax, and vocabulary.

Despite these challenges, I was able to successfully build an ML application that can transcribe audio files into text in four different languages. This application has many potential use cases, such as transcribing interviews, lectures, and meetings.

In conclusion, using machine learning algorithms to automate speech recognition is an exciting field with many promising applications. My application, which can transcribe audio files into text in English, French, Spanish, and Arabic, is only one example of the potential of this technology. As more data becomes available and our understanding of speech recognition improves, we can expect to see even more advanced applications in the future.

Thanks a lot for reading!
