How I Coded My Own Private French Tutor Out of ChatGPT Ranging from Scratch Architecture and Threading Designing the UI Prompt Engineering Summary

Step-by-step guide to how I used the most recent AI services to show me a recent language, from architecture to prompt engineering

Code of the discussed foreign-language tutor might be present in the companion repo on my GitHub page, and you need to use it freely for any non-commercial use.

So after postponing it for quite a while, I made a selection to resume my French studies. As I signed up for the category, this thought struck me — what if I could program ChatGPT to be my personal French tutor? What if I could to it, and it will speak back to me? Being a Data Scientist working with LLMs, this gave the impression of something price constructing. I mean, yes, I can just speak to my wife, who’s French, but that’s not as cool as designing my very own personal tutor out of ChatGPT. Love you, honey ❤️.

But seriously now, this project is slightly greater than just “one other cool code-toy”. Generative AI is headed to each field in our lives, and Large Language Models (LLMs) appear to take the lead here. The chances of what a single person can do today with access to those models is jaw-dropping, and I made a decision this project is price my time — and yours, too, I think — for 2 fundamental reason:

Using ChatGPT because the well-known online tool is powerful, but integrating an LLM into your code is a complete different thing. LLMs are still somewhat unpredictable, and when your product relies on an LLM — or another GenAI model — for the core product, it’s worthwhile to learn the right way to really control GenAI. And it’s not as easy because it sounds.
Getting a primary working version took only just a few workdays. Before GenAI and LLMs, this could take months, and would probably require greater than a single person. The facility of using these tools to create powerful applications fast is something you actually should try yourself — that’s the longer term, so far as I see it. We’re not going back.

Plus, this project can actually do good. My mom really wants someone she will be able to practice her English with. Now she will be able to, and it costs lower than $3 a month. My wife’s mom desires to kick off Korean studying. Same thing, same cost. And in fact I take advantage of it too! This project really helps people, and it costs lower than a small cup of coffee. That’s the true GenAI revolution, in the event you ask me.

this project from a high-level perspective, what I needed were 4 elements:

, to transcribe my very own voice to words
, preferably a Chat-LLM, which I can ask questions and get responses from
, to show the LLM’s answers to voice
, to convert any French text I don’t fully understand to English (or Hebrew, my native language)

Luckily, it’s 2023, and all the above are very very accessible. I’ve also chosen to make use of managed services and APIs somewhat than running any of those locally, as inference will likely be much faster this fashion. Also, the low prices of those APIs for private use made this decision a no brainer.

After fooling around with several alternatives, I’ve chosen OpenAI’s Whisper and ChatGPT as my Speech-to-text and LLM, and Google’s Text-to-speech and Translate because the remaining modules. Creating API keys and setting these services up was super-simple, and I used to be capable of communicate with all of them through their native Python libraries in a matter of minutes.

What really struck me after testing all these services, is that the tutor I’m constructing isn’t a just an English-to-French teacher; as Whisper, ChatGPT, and Google Translate & TTS support dozens of languages, this might be used to learn just about any language while speaking another language. That’s insane!

Let’s first make sure that the general flow is well understood: We start by recording the user’s voice, which is sent to Whisper API, and returns as text. The text is added to the chat history and sent to ChatGPT, which returns a written response. Its response is sent to Google Text-to-speech, which returns a sound file that will likely be played as audio.

My first practical step was to interrupt this all the way down to components and design the general architecture. I knew I’ll need a UI, preferably a Web UI because it’s just easier to launch apps through the browser today then having a standalone executable. I’ll also need a “backend”, which will likely be the actual Python code, communicating with all different services. But so as to provide a real-time flowing experience, I spotted I’ll need to interrupt it to different threads.

The fundamental thread will run the vast majority of the code: it’ll transcribe my recording to text (via Whisper), display this text on the screen as a part of the chat, after which display the tutor’s written response on the chat screen as well (as received by ChatGPT). But I’ll should move the tutor’s text-to-speech to a separate thread — otherwise, what we’ll have is:

the tutor’s voice will only be heard once your entire message will likely be received from ChatGPT, and its response is likely to be long
it’ll block the user from responding while the tutor speaks

that’s not the “flowing” behavior I’d prefer to have; I’d just like the tutor to start speaking as its message is being written on the screen, and to definitely not block the user and stop it from responding simply because audio remains to be playing.

To do this, the text-to-speech a part of the project was split to two additional threads. Because the tutor’s response was being received from ChatGPT token-by-token, every full sentence created was passed to a different thread, from which it was sent to the text-to-speech service, converting it to sound files. I’d prefer to emphasize the word files here — as I’m sending text to the TTS service sentence-by-sentence, I even have multiple audio files, one per each sentence, which should be played in the proper order. These sound files are then played from one other thread, ensuring audio playback doesn’t block the remaining of this system from running.

Making all this work, together with several other issues originating from UI-server interactions, were the complicated a part of this project. Surprising, huh — software engineering is where things get hard.

Well, a UI was something I knew I’ll need, and I also knew just about how I’d prefer it to look — but coding a UI is beyond my knowledge. So I made a decision to try a novel approach: I asked ChatGPT to put in writing my UI for me.

For this I used the actual ChatGPT service (not the API), and used GPT-4 (yes, I’m a proud paying customer!). Surprisingly, my initial prompt:

Write a Python web UI for a chatbot application. The text box where 
the user enters his prompt is positioned at the underside of the screen, and 
all previous messages are kept on screen

delivered a tremendous first result, ending up with a Python-Flask backend, jQuery code, HTML and matching CSS. But that was only about 80% of all of the functionality I used to be hoping to get, so I spent roughly 10 hours going back-and-forth with GPT-4, optimizing and upgrading my UI, one request at a time.

If I made it look easy,I won’t to obviously say that it wasn’t. The more requests I added, the more GPT-4 got confused and delivered malfunctioning code, which sooner or later was easier to correct manually than asking it to repair it. And I had a whole lot of requests:

Add a profile picture next to every message
Add a button for each message re-playing the its audio
Add a button to each French message that may add its translation below the unique text
Add a save-session and a load-session buttons
Add a dark-mode option, make it select the best mode routinely
Add a “working” icon at any time when a respond from a service is waited for
And lots of many more…

Still, although often GPT’s code never worked out of the box, considering the very fact I even have little or no knowledge within the fields of front-end, the outcomes are amazing — and much beyond anything I could have done myself just by Googling and StackOverflowing. I’ve also made a whole lot of progress in learning the right way to craft higher prompts. Fascinated about it, perhaps I should write one other blogpost just on the lessons-learned from literally constructing a product from the bottom up alongside an LLM… stay tuned!

For this a part of the post, I’ll assume you’ve got some basic knowledge of how communication with a Chat LLM (like ChatGPT) works via API. Should you don’t, you would possibly get slightly lost.

Last but most-certainly not least — I needed to make GPT take the role of a personal tutor.

As a start line, I added a System Prompt to the start of the chat. Because the chat with an LLM is largely an inventory of messages sent by the user and the bot to one another, a System Prompt is generally the primary message of the chat, which describes to the bot the way it should behave and what is anticipated of it. Mine looked something like this (parameters encapsulated by curly-braces are replaced by run-time values):

You're a {language} teacher named {teacher_name}. 
You're on a 1-on-1 session together with your student, {user_name}. {user_name}'s 
{language} level is: {level}.
Your task is to help your student in advancing their {language}.
* When the session begins, offer an appropriate session for {user_name}, unless
asked for something else.
* {user_name}'s native language is {user_language}. {user_name} might 
address you in their very own language when felt their {language} isn't well 
enough. When that happens, first translate their message to {language}, 
after which reply.
* IMPORTANT: In case your student makes any mistakes, be it typo or grammar, 
you MUST first correct your student and only then reply.
* You're only allowed to talk {language}.

This was actually yielding pretty good results, however it gave the impression of the effectiveness of the behavioral instructions I gave the bot (“correct me after I’m unsuitable”, “at all times respond in French”), decayed because the chat went on.

Attempting to fight this vanishing behavior, I got here up with an interesting solution; I manipulated the user messages before sending them over to GPT. Regardless of the user’s message was, I added additional text to it:

[User message goes here]
---
IMPORTANT: 
* If I replied in {language} and made any mistakes (grammar, typos, etc), 
you will need to correct me before replying
* You need to keep the session flow, you are response cannot end the session. 
Attempt to avoid broad questions like "what would you prefer to do", and like 
to offer me with related questions and exercises. 
* You MUST reply in {language}.

Adding these at the top of each user’s message made sure the LLM responds exactly the best way I wanted it to. It’s price mentioning that the long suffix I added is written in English, while the user’s message won’t. For this reason I added an explicit separator between the unique message and my addition (the ---), ending the context of the unique message and starting a recent context. Also note that as this suffix is added to the user’s message, it’s written in first-person (“I”, “me”, etc..). This little trick improved results and behavior dramatically. While it would goes without saying, it is likely to be price emphasizing that this suffix isn’t displayed within the chat UI and the user has no idea it’s added to their messages. It’s inserted behind the scenes, right before being sent with the remaining of the chat history to ChatGPT.

Another behavior I desired to have was making the tutor speak first, meaning ChatGPT will send the primary message because the session begins, without waiting for the user to initiate the session. That’s, apparently, not something ChatGPT was designed to do.

What I discovered when attempting to make ChatGPT reply on a messages-history that contained only the System Prompt, is that ChatGPT “lost it”, and commenced making a chat with itself, playing each the user and the bot. Irrespective of what I attempted, I wasn’t capable of make it properly initiate the session without the user saying something first.

After which I had an idea. When the session initialized, I send ChatGPT the next message on behalf of the user:

Greet me, after which suggest 3 optional subjects for our lesson suiting my 
level. You need to reply in {language}.

This request was designed to make GPT’s response look exactly how I believed a correct initialization of the session by the bot needs to be like. I then removed my message from the chat, and made it seem as if the bot kicked off the session by itself.

What kicked-off as a funny little whim became reality in literally no-time, done completely during spare-time of just one quite-busy man. The incontrovertible fact that tasks akin to these are actually so easy to create doesn’t stop to amaze me. Only one 12 months ago, having something as ChatGPT available was sci-fi, and now I can shape it from my very own personal laptop.

That is the start of the longer term, and whatever is to come back — at the very least I do know I’ll be ready for it with yet another foreign language. Au revoir!

How I Coded My Own Private French Tutor Out of ChatGPT Ranging from Scratch Architecture and Threading Designing the UI Prompt Engineering Summary

Step-by-step guide to how I used the most recent AI services to show me a recent language, from architecture to prompt engineering

What are your thoughts on this topic?
Let us know in the comments below.

4 COMMENTS

Share this article

Recent posts

OpenAI Unveils SearchGPT: A Recent AI-Powered Search Engine

Public Release: Kling AI Video Generator

UK declares hiring of AI staff, but criticism continues

Radical Simplicity in Data Engineering

OpenAI reveals ‘SearchGPT’

How I Coded My Own Private French Tutor Out of ChatGPT Ranging from Scratch Architecture and Threading Designing the UI Prompt Engineering Summary

Step-by-step guide to how I used the most recent AI services to show me a recent language, from architecture to prompt engineering

What are your thoughts on this topic? Let us know in the comments below.

4 COMMENTS

Share this article

Recent posts

What are your thoughts on this topic?
Let us know in the comments below.