
Summarising Best Practices for Prompt Engineering


More sophisticated approaches for solving much more complex tasks are actively being developed. While they significantly outperform simpler prompting in some scenarios, their practical usage remains somewhat limited. I'll mention two such techniques: self-consistency and Tree of Thoughts.

The authors of the self-consistency paper proposed the following approach: instead of relying on a single model output, sample the model multiple times and aggregate the results through majority voting. Relying on both intuition and the success of ensembles in classical machine learning, this method enhances the model's robustness.

Self-consistency. Figure 1 from the Self-Consistency Improves Chain of Thought Reasoning in Language Models paper

You can also apply self-consistency without implementing the aggregation step: for tasks with short outputs, ask the model to suggest several options and pick the best one.
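To make the aggregation variant concrete, here is a minimal sketch, assuming the pre-1.0 openai Python SDK and a toy task with a short, exact answer; it simply takes a majority vote over several sampled completions (the helper name and parameters are my own, not from the paper):

```python
import os
from collections import Counter

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")


def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    """Sample the model several times and return the most frequent answer."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        n=n_samples,      # request several independent completions
        temperature=0.7,  # non-zero temperature so the samples differ
    )
    answers = [choice.message.content.strip() for choice in response.choices]
    return Counter(answers).most_common(1)[0][0]  # majority vote
```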

Tree of Thoughts (ToT) takes this idea a step further. It proposes applying tree-search algorithms to the model's "reasoning thoughts", essentially backtracking when the model stumbles upon poor assumptions.

Tree of Thoughts. Figure 1 from the Tree of Thoughts: Deliberate Problem Solving with Large Language Models paper

If you're interested, take a look at Yannic Kilcher's video reviewing the ToT paper.

For our particular scenario, full Chain-of-Thought reasoning is not necessary, but we can still prompt the model to tackle the summarization task in two phases: first condense the entire job description, and then summarize that intermediate summary with a focus on job responsibilities.

Output for prompt v5, containing step-by-step instructions. Image by author, created using ChatGPT
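The exact wording of prompt v5 is not reproduced here, but a sketch of such step-by-step instructions might look roughly like this (the placeholder text is mine):

```python
prompt_template = """You will be given a job description delimited by <>.
Complete the task in two steps:
Step 1: Summarize the whole job description in a few sentences.
Step 2: Summarize the result of Step 1, focusing only on the job responsibilities.
Return the summary from Step 2 as the final answer.

<{job_description}>"""

prompt = prompt_template.format(job_description="...")  # paste a real job description here
```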

In this particular example, the results didn't change significantly, but this approach works really well for many tasks.

Few-shot Learning

The last technique we'll cover is called few-shot learning, also known as in-context learning. It is as simple as including several examples in your prompt to give the model a clearer picture of your task.

These examples should not only be relevant to your task but also diverse enough to capture the variety in your data. "Labeling" data for few-shot learning might be a bit more difficult when you're using CoT, particularly if your pipeline has many steps or your inputs are long. Nonetheless, the results are usually well worth the effort. Also, keep in mind that labeling a few examples is far cheaper than labeling an entire training/testing set, as in traditional ML model development.

If we add an example to our prompt, the model will understand the requirements even better. For instance, if we show that we want the final summary in bullet-point format, the model will mirror our template.

This prompt is quite overwhelming, but don't be afraid: it is just the previous prompt (v5) plus one labeled example with another job description, in the For example: 'input description' -> 'output JSON' format.

Output for prompt v6, containing an example. Image by author, created using ChatGPT
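The actual job descriptions and JSON schema from prompt v6 are not reproduced here, but a rough sketch of embedding one labeled example in that 'input description' -> 'output JSON' format could look like this:

```python
# One labeled example is embedded directly in the prompt; the new job description
# is appended at the end in place of the final <...> block.
few_shot_prompt = """Summarize the job description delimited by <> as JSON with a single
key "responsibilities" containing a bullet-point list.

For example: <We are looking for a data analyst to build dashboards, run A/B tests,
and present findings to stakeholders.> ->
{"responsibilities": ["- Build dashboards", "- Run A/B tests", "- Present findings"]}

<...>"""
```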

Summarizing Best Practices

To summarize the best practices for prompt engineering, consider the following:

  • Don't be afraid to experiment. Try different approaches and iterate gradually, correcting the model and taking small steps at a time;
  • Use separators in the input (e.g. <>) and ask for structured output (e.g. JSON);
  • Provide a list of steps to complete the task. Whenever feasible, give the model a set of actions to follow and let it output its "internal thoughts";
  • For short outputs, ask for multiple suggestions;
  • Provide examples. If possible, show the model several diverse examples that represent your data, along with the desired output.

I'd say that this framework offers a sufficient basis for automating a wide range of day-to-day tasks, such as information extraction, summarization, and text generation (e.g. emails). However, in a production environment, it is still possible to optimize models further by fine-tuning them on specific datasets to boost performance. Moreover, plugins and agents are developing rapidly, but that's a whole different story altogether.

Prompt Engineering Course by DeepLearning.AI and OpenAI

Along with the previously mentioned talk by Andrej Karpathy, this blog post draws its inspiration from the ChatGPT Prompt Engineering for Developers course by DeepLearning.AI and OpenAI. It is absolutely free, takes just a couple of hours to complete, and, my personal favorite, it lets you experiment with the OpenAI API without even signing up!

It's a great playground for experimenting, so definitely check it out.

Wow, we covered quite a lot of information! Now, let's move forward and start building the application using the knowledge we've gained.

Generating OpenAI Key

To get started, you'll need to register an OpenAI account and create your API key. OpenAI currently offers $5 of free credit, valid for three months, to every individual. Follow the introduction to the OpenAI API page to register your account and generate your API key.

Once you have a key, create an OPENAI_API_KEY environment variable so you can access it in code with os.getenv('OPENAI_API_KEY').
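A quick sanity check in Python might look like this (the key in the comment is obviously a placeholder):

```python
import os

# Assumes you already ran something like `export OPENAI_API_KEY="sk-..."` in your shell.
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set")
```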

Estimating the Costs with Tokenizer Playground

At this stage, you might be wondering how much you can do with just the free trial and what options are available after the initial three months. It's a fair question to ask, especially when you consider that LLMs cost millions of dollars!

Of course, those millions are about training. Inference requests, it turns out, are quite affordable. While GPT-4 may be perceived as expensive (although the price is likely to decrease), gpt-3.5-turbo (the model behind default ChatGPT) is still sufficient for the majority of tasks. In fact, OpenAI has done an incredible engineering job, given how inexpensive and fast these models are now, considering their original size in billions of parameters.

The gpt-3.5-turbo model comes at a price of $0.002 per 1,000 tokens.

But how much is that? Let's see. First, we need to know what a token is. In simple terms, a token is a part of a word. In English, you can expect roughly 14 tokens for every 10 words.

To get a more accurate estimate of the number of tokens for your specific task and prompt, the best approach is to try it out! Luckily, OpenAI provides a tokenizer playground that can help you with this.
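If you prefer to count tokens in code rather than in the browser, OpenAI's open-source tiktoken library reports the same numbers (this library is my addition; the text above only mentions the playground):

```python
import tiktoken  # pip install tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Summarize the job description below, focusing on responsibilities."
print(len(encoding.encode(prompt)))  # number of tokens this text occupies
```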

Side note: Tokenization for Different Languages

Thanks to the widespread use of English on the Web, English benefits from the most efficient tokenization. As highlighted in the "All languages are not tokenized equal" blog post, tokenization is not a uniform process across languages, and some languages may require many more tokens to represent the same content. Keep this in mind if you want to build an application that involves prompts in multiple languages, e.g. for translation.

To illustrate this point, let's take a look at the tokenization of pangrams in different languages. In this toy example, English required 9 tokens, French 12, Bulgarian 59, Japanese 72, and Russian 73.

Tokenization for different languages. Screenshot of the OpenAI tokenizer playground

Cost vs Performance

As you may have noticed, prompts can become quite lengthy, especially when they incorporate examples. By increasing the length of the prompt we potentially improve the quality, but the cost grows at the same time, since we use more tokens.

Our latest prompt (v6) consists of roughly 1.5k tokens.

Tokenization of the prompt v6. Screenshot of the OpenAI tokenizer playground

Considering that the output length is typically in the same range as the input length, we can estimate an average of around 3k tokens per request (input tokens + output tokens). Multiplying this number by the price above, we find that each request costs about $0.006, or 0.6 cents, which is quite affordable.
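The same back-of-the-envelope calculation in code, assuming the flat gpt-3.5-turbo rate quoted above applies to both input and output tokens:

```python
PRICE_PER_1K_TOKENS = 0.002  # USD, gpt-3.5-turbo rate quoted above


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough per-request cost, assuming one flat price for input and output tokens."""
    return (input_tokens + output_tokens) / 1000 * PRICE_PER_1K_TOKENS


print(estimate_request_cost(1500, 1500))  # ~0.006 USD per request
```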

Even if we assume a slightly higher cost of 1 cent per request (corresponding to roughly 5k tokens), you would still be able to make 100 requests for just $1. Moreover, OpenAI offers the flexibility to set both soft and hard limits: with soft limits you receive notifications when you approach your defined limit, while hard limits prevent you from exceeding the specified threshold.

For local use of your LLM application, you can comfortably configure a hard limit of $1 per month, ensuring that you stay within budget while enjoying the benefits of the model.

Streamlit App Template

Now, let's build a web interface to interact with the model programmatically, eliminating the need to manually copy prompts every time. We will do this with Streamlit.

Streamlit is a Python library that lets you create simple web interfaces without HTML, CSS, or JavaScript. It is beginner-friendly and enables the creation of browser-based applications with minimal Python knowledge. Let's now create a simple template for our LLM-based application.

First, we need the logic that will handle communication with the OpenAI API. In the example below, I assume that a generate_prompt() function is defined and returns the prompt for a given input text (e.g. similar to what you saw before).
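The original snippet does not render here, so below is a minimal sketch of what that logic might look like, assuming the pre-1.0 openai SDK and the generate_prompt() helper mentioned above:

```python
import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")


def ask_chatgpt(input_text: str) -> tuple[str, str]:
    """Build the prompt for the given input and return (prompt, model answer)."""
    prompt = generate_prompt(input_text)  # assumed to be defined elsewhere
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output suits an extraction-style task
    )
    return prompt, response.choices[0].message.content
```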

And that's it! You can learn more about the different parameters in OpenAI's documentation, but things work well right out of the box.

With this code in place, we can design a simple web app. We need a field to enter some text, a button to process it, and a couple of output widgets. I prefer to have access to both the full model prompt and the output, for debugging and exploration purposes.

The code for the entire application looks something like the sketch below and can be found in this GitHub repository. I have added a placeholder function called toy_ask_chatgpt(), since sharing my OpenAI key is not a good idea. Currently, this application simply copies the prompt into the output.
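Since the embedded snippet does not render here, below is a hedged sketch of what such a template might look like; the widget layout and the toy_ask_chatgpt() placeholder follow the description above, but this is not necessarily the exact code from the repository:

```python
import streamlit as st


def toy_ask_chatgpt(input_text: str) -> tuple[str, str]:
    """Placeholder: build the prompt and echo it back instead of calling the API."""
    prompt = f"Summarize the job description below, focusing on responsibilities:\n<{input_text}>"
    return prompt, prompt  # the real app would return (prompt, model answer)


st.title("Job Description Summarizer")

input_text = st.text_area("Paste a job description here", height=300)

if st.button("Summarize"):
    prompt, answer = toy_ask_chatgpt(input_text)
    with st.expander("Full prompt sent to the model"):
        st.text(prompt)
    st.subheader("Model output")
    st.write(answer)
```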

Without the function definitions and placeholders, it is only about 50 lines of code!

And thanks to a recent Streamlit update, it is now possible to embed the app right in this article, so you should be able to see it right below.

Now you see how easy it is. If you wish, you can deploy your app with Streamlit Cloud. But be careful, since every request costs you money if you put your API key there!

In this blog post, I listed several best practices for prompt engineering. We discussed iterative prompt development, using separators, requesting structured output, Chain-of-Thought reasoning, and few-shot learning. I also provided you with a template to build a simple web app using Streamlit in under 100 lines of code. Now, it's your turn to come up with an exciting project idea and turn it into reality!

It's truly amazing how modern tools allow us to create complex applications in just a few hours. Even without extensive programming knowledge, deep proficiency in Python, or a thorough understanding of machine learning, you can quickly build something useful and automate some of your tasks.

Don't hesitate to ask me questions if you're a beginner and want to create a similar project. I'll be more than happy to help and will respond as soon as possible. Best of luck with your projects!
