How to Improve LLM Responses With Better Sampling Parameters

A deep dive into stochastic decoding with temperature, top_p, top_k, and min_p

10 min read

11 hours ago

Figure: Example Python code from the OpenAI Python SDK in which the chat completion API is called with the parameters temperature and top_p.

When calling the OpenAI API with the Python SDK, have you ever wondered what exactly the temperature and top_p parameters do?

When you ask a Large Language Model (LLM) a question, the model outputs a probability for every possible token in its vocabulary.

After sampling a token from this probability distribution, we append the selected token to our input prompt so that the LLM can output the probabilities for the next token.
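The loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `VOCAB` and `toy_logits` are hypothetical stand-ins for a real tokenizer and model, and the scores they produce are made up.

```python
import math
import random

# Hypothetical five-token vocabulary, for illustration only.
VOCAB = ["the", "cat", "sat", "down", "<eos>"]


def toy_logits(tokens):
    """Stand-in for an LLM forward pass: one raw score per vocabulary entry.

    A real model conditions on the whole context; this toy only looks at
    the sequence length so the example stays tiny and deterministic.
    """
    favoured = len(tokens) % len(VOCAB)
    return [3.0 if i == favoured else 0.5 for i in range(len(VOCAB))]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_tokens, max_new_tokens=8, seed=0):
    """Sample one token at a time and append it to the running context."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]
        tokens.append(next_token)  # the sampled token becomes part of the input
        if next_token == "<eos>":
            break
    return tokens


print(generate(["the"]))
```

Each iteration recomputes the distribution from the extended context, which is why a single sampled token can change everything that follows.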

This sampling process can be controlled by parameters such as the well-known temperature and top_p.
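As a rough preview of what these two parameters do to a distribution, here is a minimal sketch using made-up logits for four tokens. The helper names `apply_temperature` and `apply_top_p` are my own for this illustration, and inference engines may differ in details such as tie handling or the order in which the steps are applied.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_temperature(logits, temperature):
    """Divide the logits by the temperature before the softmax.

    Values below 1 sharpen the distribution, values above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    return softmax([x / temperature for x in logits])


def apply_top_p(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, zero out the rest, and renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens
for t in (0.5, 1.0, 2.0):
    print("temperature", t, [round(p, 3) for p in apply_temperature(logits, t)])
print("top_p 0.9", [round(p, 3) for p in apply_top_p(softmax(logits), 0.9)])
```

With these logits, a lower temperature concentrates more probability on the first token, and top_p = 0.9 removes the least likely token entirely before the remaining probabilities are rescaled.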

In this article, I’ll explain and visualize the sampling strategies that define the output behavior of LLMs. By understanding what these parameters do and setting them according to our use case, we can improve the output generated by LLMs.

For this article, I’ll use vLLM as the inference engine and Microsoft’s new Phi-3.5-mini-instruct model with AWQ quantization. To run this model locally, I’m using my laptop’s NVIDIA GeForce RTX 2060 GPU.

· Understanding Sampling With Logprobs
∘ LLM Decoding Theory
∘ Retrieving Logprobs With the OpenAI Python SDK
· Greedy Decoding
· Temperature
· Top-k Sampling
· Top-p Sampling
· Combining Top-p…
