
Prompt Engineering Guide for Data Analysts


Photo by Emiliano Vittoriosi on Unsplash

Getting the most out of LLMs as a Data Analyst with Prompt Engineering

Large Language Models (LLMs) are on the rise, driven by the popularity of OpenAI’s ChatGPT, which took the internet by storm. As a practitioner in the data field, I look for ways to best utilize this technology in my work, especially for insightful-yet-practical work as a Data Analyst.

LLMs can solve tasks without additional model training via “prompting” techniques, in which the problem is presented to the model as a text prompt. Getting to “the right prompts” is necessary to ensure the model provides high-quality and accurate results for the tasks assigned.

In this article, I will be sharing the principles of prompting, techniques to build prompts, and the roles Data Analysts can play in this “prompting era”.

Quoting Ben Lorica from Gradient Flow, “prompt engineering is the art of crafting effective input prompts to elicit the desired output from foundation models.” It is the iterative process of developing prompts that can effectively leverage the capabilities of existing generative AI models to accomplish specific objectives.

Prompt engineering skills can help us understand the capabilities and limitations of a large language model. The prompt itself acts as an input to the model, so it directly shapes the model output. A good prompt will get the model to produce the desired output, whereas working iteratively from a bad prompt will help us understand the limitations of the model and how to work with it.

Isa Fulford and Andrew Ng, in the ChatGPT Prompt Engineering for Developers course, mentioned two main principles of prompting:

  • Principle 1: Write clear and specific instructions
  • Principle 2: Give the model time to “think”

I believe prompting is like giving instructions to a naive “machine kid”.

The kid is very intelligent, but you need to be clear about what you want from it (by providing explanations, examples, a specified output format, etc.) and give it some space to digest and process it (specify the problem-solving steps, ask it to work through them slowly). The kid, given its exposure, can also be very creative and imaginative in providing answers, which we call hallucination in LLMs. Understanding the context and providing the right prompt can help avoid this problem.

Prompt engineering is a growing field, with research on this topic increasing rapidly from 2022 onwards. Some of the state-of-the-art prompting techniques commonly used are n-shot prompting, chain-of-thought (CoT) prompting, and generated knowledge prompting.

A sample Python notebook demonstrating these techniques is shared under this GitHub project.

1. N-shot prompting (Zero-shot prompting, Few-shot prompting)

Known for its variations, Zero-shot prompting and Few-shot prompting, the N in N-shot prompting represents the number of “training” examples or clues given to the model to make predictions.

Zero-shot prompting is where a model makes predictions without any additional training. This works for common, straightforward problems like classification (e.g. sentiment analysis, spam classification), text transformation (e.g. translation, summarizing, expanding), and simple text generation, on which the LLM has been largely trained.

Zero-shot prompting: Straightforwardly ask the model about the sentiment (Image by Author)
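As a rough illustration, here is a minimal zero-shot sentiment prompt in Python. The OpenAI client setup, the gpt-3.5-turbo model name, and the sample review are assumptions made for this sketch, not code taken from the article’s notebook.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

review = "The dashboard loads quickly and the charts are easy to customize."

# Zero-shot: only the task instruction is given, no labelled examples.
prompt = f"""Classify the sentiment of the following product review
as 'positive', 'negative', or 'neutral'. Reply with one word only.

Review: {review}"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)  # e.g. "positive"
```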

Few-shot prompting uses a small number of examples (typically between two and five) to adapt the model’s output to a more context-specific problem. These examples are meant to steer the model toward better performance.

Few-shot prompting: Give examples of what we expect the model output to look like
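A few-shot version of the same idea might look like the sketch below; the support-ticket examples and labels are made up purely for illustration, under the same assumed client setup as above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Few-shot: a handful of labelled examples steer the model toward
# the context-specific labels and output format we want.
prompt = """Classify each support ticket as 'billing', 'bug', or 'feature request'.

Ticket: "I was charged twice for my subscription this month."
Label: billing

Ticket: "The export button crashes the app on mobile."
Label: bug

Ticket: "It would be great to have a dark mode."
Label: feature request

Ticket: "My invoice shows the wrong company name."
Label:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)  # expected: "billing"
```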

2. Chain-of-Thought (CoT) prompting

Chain-of-Thought prompting was introduced by Google researchers in 2022. In Chain-of-Thought prompting, the model is prompted to produce intermediate reasoning steps before giving the final answer to a multi-step problem. The idea is that a model-generated chain of thought mimics an intuitive thought process when working through a multi-step reasoning problem.

Chain-of-Thought prompting helps drive the model to break down problems accordingly

This method enables models to decompose multi-step problems into intermediate steps, allowing them to solve complex reasoning problems that are not solvable with standard prompting methods.
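One way such a prompt could be phrased, under the same assumptions as the earlier sketches (OpenAI client, gpt-3.5-turbo, made-up numbers):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Chain-of-Thought: ask for intermediate reasoning steps before the final answer.
prompt = """A store sold 200 units in January. Sales grew 10% in February
and then fell 5% in March. How many units were sold in March?

Think step by step: first compute February's sales, then March's,
and only then state the final answer on its own line."""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)  # should reason 200 -> 220 -> 209
```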

Some further variations of Chain-of-Thought prompting include:

  • Self-consistency prompting: Sample multiple diverse reasoning paths and select the most consistent answers. By using a majority voting system, the model can arrive at more accurate and reliable answers (a minimal sketch follows this list).
  • Least-to-Most prompting (LtM): Specify the chain of thought to first break a problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. This technique is inspired by real-world educational strategies for children.
  • Active Prompting: Scaling the CoT approach by determining which questions are the most important and helpful ones for human annotation. It first calculates the uncertainty among the LLM’s predictions, then selects the most uncertain questions, and these questions are sent for human annotation before being put into a CoT prompt.
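As a sketch of the first variation only, self-consistency can be approximated by sampling the same chain-of-thought prompt several times at a non-zero temperature and taking a majority vote over the final answers. The ask_once helper, the five samples, and the crude last-line parsing are assumptions for illustration, not a reference implementation.

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = """A store sold 200 units in January. Sales grew 10% in February
and then fell 5% in March. How many units were sold in March?
Think step by step, then give the final number on the last line."""

def ask_once() -> str:
    """Sample one reasoning path and return its final line."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": QUESTION}],
        temperature=0.7,  # non-zero so the reasoning paths actually differ
    )
    return response.choices[0].message.content.strip().splitlines()[-1]

# Sample several diverse reasoning paths and keep the most common final answer.
answers = [ask_once() for _ in range(5)]
majority_answer, votes = Counter(answers).most_common(1)[0]
print(answers)
print(f"Self-consistent answer ({votes}/5 votes): {majority_answer}")
```

In practice you would parse the numeric answer more robustly before voting, since differently worded final lines would otherwise split the vote.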

3. Generated knowledge prompting

The idea behind generated knowledge prompting is to ask the LLM to generate potentially useful information about a given query/prompt, and then leverage that knowledge as additional input for generating the final response.

For example, say you want to write an article about cybersecurity, particularly cookie theft. Before asking the LLM to write the article, you can ask it to generate the dangers of, and protections against, cookie theft. This will help the LLM write a more informative blog post.

Generated knowledge prompting: (1) Ask the model to generate some content
Generated knowledge prompting: (2) Use the generated content as input to the model
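A minimal two-step sketch of this flow, reusing the cookie-theft example; the get_completion helper and both prompts are illustrative assumptions rather than the article’s actual notebook code.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def get_completion(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text response."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

# Step 1: ask the model to generate background knowledge first.
knowledge = get_completion(
    "List the main dangers of cookie theft and the most effective protections against it."
)

# Step 2: feed that generated knowledge back in as context for the real task.
article = get_completion(
    f"""Using the notes below, write a short, informative blog post
about cookie theft for a non-technical audience.

Notes:
{knowledge}"""
)
print(article)
```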

Additional tactics

On top of the techniques above, you can also use the tactics below to make prompting more effective:

  • Use delimiters like triple backticks (```), angle brackets (<>), or XML-style tags to indicate distinct parts of the input, making it cleaner for debugging and helping to avoid prompt injection.
  • Ask for structured output (e.g. HTML/JSON format); this is useful for feeding the model output into another machine process.
  • Specify the intended tone of the text to get the tonality, format, and length of model output that you need. For example, you can instruct the model to formalize the language, generate no more than 50 words, etc.
  • Modify the model’s temperature parameter to play around with the model’s degree of randomness. The higher the temperature, the more random (and less accurate) the output will be, and the more likely the model is to hallucinate. A sketch combining several of these tactics follows this list.
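Here is a small sketch combining several of these tactics: tag delimiters around untrusted text, a requested JSON structure, an explicit tone and length constraint, and a low temperature. The prompt wording, the feedback text, and the JSON keys are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

feedback = "Love the new report builder, but exports to Excel keep timing out."

# Tag delimiters mark the untrusted user text, the prompt requests structured
# JSON output, and it constrains tone and length explicitly.
prompt = f"""You are a concise, formal analyst assistant.
Summarize the customer feedback delimited by <feedback> tags in at most 20 words,
then return JSON with the keys "summary" and "sentiment".

<feedback>{feedback}</feedback>"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # low temperature keeps the structured output stable
)
print(response.choices[0].message.content)
```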


Photo by Camylla Battani on Unsplash

As you can probably infer from the examples above, prompt engineering requires a very specific technical communication craft. While you still need business context and problem-solving skills, it is a new kind of craft that is not entirely covered as part of a conventional data analytics skillset.

Data Analysts can leverage their context knowledge, problem-solving skills, and statistical/technical capabilities, with the addition of effective communication, for prompt engineering. These are the key tasks related to prompt engineering (and LLMs) that could potentially be done by Analysts:

  • Specifying LLM problems to be solved. With an understanding of LLM concepts, we can define the actions to be executed by the model (i.e. whether it is a text classification, generation, or transformation problem) and the right question, with reference points, to be put in as the prompt.
  • Iterative prompting. In developing a data model, we often go through an iterative process. After building the initial model, we evaluate the result, refine it, and retry it along the way. Similarly for a prompt, we analyze where the result does not give what we want, and refine it with clearer instructions, additional examples, or specified steps. This requires critical reasoning, which most Data Analysts are already good at.
  • Prompt versioning and management. With iterative prompting, you will end up with numerous prompt attempts and the model capabilities and/or limitations you have identified. It is important to keep track of and document these findings for team learning and continuous improvement, as with any other data analysis.
  • Designing for safe prompting. Although they have shown impressive capabilities, LLMs are still at a very early stage and are vulnerable to loopholes and limitations. There is the hallucination problem, where models provide highly misleading information, and also the prompt injection risk of having untrusted text used as part of the prompt. Depending on the use case of the model and prompting, Analysts can advise on programmatic safeguards to limit prompt usage and on detection of problematic prompts.

On top of leveraging their existing skills, Analysts need to hone their communication skills and their ability to break down problems in order to provide better prompts.

Large Language Models have shown promising results in performing numerous kinds of language tasks, and prompt engineering is the key to unlocking these capabilities. Prompt engineering is about communicating effectively with an AI to achieve the desired results.

Several techniques can be used for prompt engineering, but the foundational principle is consistent. It is about providing clear instructions to the model and helping it digest and process these instructions. Data Analysts can leverage their context knowledge and problem-solving skills to frame the right prompts, and their technical capabilities to design prompt safeguards.

For further resources on prompt engineering, check out:

I think this area will grow even further in the next few years, and I am excited to see and participate in the evolution.
