How I Made AI Watch YouTube for Me

Get the Right Tools

Like many developers (if not all), I’m not a fan of setting up environments. After some digging, I discovered evadb, an easy-to-use database system built specifically for AI applications: it can handle video analytics and supports LLMs like OpenAI’s gpt-3.5-turbo. Everything I need here installs from PyPI: evadb, pytube, and youtube-transcript-api.

Here are all my imports:

import evadb
import os
import pandas as pd
from pytube import YouTube, extract
from youtube_transcript_api import YouTubeTranscriptApi

And here’s how I start a local evadb connection, set up my OpenAI API key, and point at the video I want to watch:

# openai api key (replace the placeholder with your actual secret key)
os.environ["OPENAI_KEY"] = "sk-..."

# establish evadb connection
cursor = evadb.connect().cursor()

# url of the video you wish to watch
video_link = "https://www.youtube.com/watch?v=***"

Get video transcript

Approach 1: download the video transcript online

I can download the video’s transcript directly using its URL:

# extract video id from url
video_id = extract.video_id(video_link)

# get the complete transcript as a list of lines
transcript_lines = YouTubeTranscriptApi.get_transcript(video_id)

# concatenate the lines into a single string, separated by spaces
transcript = " ".join(pd.DataFrame(transcript_lines)["text"])

Approach 2: download the video and run a speech recognizer

What if the video doesn’t have a transcript available? In that case, I can download the raw video directly from YouTube and load it into an evadb table:

# download the YouTube video to a local mp4 file
YouTube(video_link).streams.filter(file_extension="mp4", progressive=True).first().download(filename="online_video.mp4")

# load the video into an evadb table called youtube_video
cursor.load("online_video.mp4", "youtube_video", "video").execute()

Then I run a speech recognizer on the downloaded video. Thanks to evadb, I can grab OpenAI’s whisper model from Hugging Face in 2 lines, literally:

# load whisper-base huggingface model into evadb as a user defined function (UDF)
args = {"task": "automatic-speech-recognition", "model": "openai/whisper-base"}
cursor.create_udf("SpeechRecognizer", type="HuggingFace", **args).execute()

Now it’s time to run the speech recognizer on the video. What’s cool about this is that video frames are all stored as 2D arrays in table rows, so we can query frames and apply functions to them just like in any conventional database system. Plus, I can do all of this with Python-ish function calls instead of complex SQL queries:

# generate transcript from raw video
df = cursor.table("youtube_video").select("SpeechRecognizer(audio)").df()
transcript = df["text"][0]

Save video transcript

Now that I have the transcript, I can load it into an evadb table:

# save transcript to a .csv file (without the pandas index column)
pd.DataFrame([{"text": transcript}]).to_csv("transcript.csv", index=False)

# load the csv file into a table called transcript
cursor.query("CREATE TABLE IF NOT EXISTS transcript (text TEXT(50));").execute()
cursor.load("transcript.csv", "transcript", "csv").execute()

Ask LLM

Now that I have the video’s transcript, I can ask anything about the video:

# ask a question about the video
query = input("ask a question about the video: ")

# get the answer from the llm and print it
answer = cursor.table("transcript").select(f"ChatGPT('{query}', text)").df()["chatgpt.response"][0]
print(answer)
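
One question is rarely enough, so here’s a small usage sketch that wraps the same call in a loop (no new APIs, just the calls from above):

# keep answering questions until the user types 'quit'
while True:
    query = input("ask a question about the video (or 'quit' to stop): ")
    if query.strip().lower() == "quit":
        break
    answer = cursor.table("transcript").select(f"ChatGPT('{query}', text)").df()["chatgpt.response"][0]
    print(answer)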

Moreover, I can ask ChatGPT to generate a blog post about the video:

# generate a detailed, well-formatted blog post
answer = cursor.table("transcript").select("ChatGPT('create a detailed and formatted blog post in .md format', text)").df()["chatgpt.response"][0]
print(answer)
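
Since the response already comes back as markdown, writing it to disk takes one more step (blog_post.md is just an illustrative filename):

# save the generated post as a markdown file
with open("blog_post.md", "w") as f:
    f.write(answer)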
