An Agentic Approach to Reducing LLM Hallucinations


Tip 2: Use structured outputs

Using structured outputs means forcing the LLM to output valid JSON or YAML text. This lets you reduce useless rambling and get "straight-to-the-point" answers about what you want from the LLM. It also helps with the following tips, because it makes the LLM's responses easier to verify.

Here is how you can do that with Gemini's API:

import json

import google.generativeai as genai
from pydantic import BaseModel, Field

from document_ai_agents.schema_utils import prepare_schema_for_gemini


class Answer(BaseModel):
    answer: str = Field(..., description="Your Answer.")


model = genai.GenerativeModel("gemini-1.5-flash-002")

answer_schema = prepare_schema_for_gemini(Answer)

question = "List all the reasons why LLMs hallucinate"

context = (
    "LLM hallucination refers to the phenomenon where large language models generate plausible-sounding but"
    " factually incorrect or nonsensical information. This can occur due to various factors, including biases"
    " in the training data, the inherent limitations of the model's understanding of the real world, and the "
    "model's tendency to prioritize fluency and coherence over accuracy."
)

messages = (
    [context]
    + [
        f"Answer this question: {question}",
    ]
    + [
        f"Use this schema for your answer: {answer_schema}",
    ]
)

response = model.generate_content(
    messages,
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": answer_schema,
        "temperature": 0.0,
    },
)

response = Answer(**json.loads(response.text))

print(f"{response.answer=}")

Here, "prepare_schema_for_gemini" is a utility function that prepares the schema to match Gemini's weird requirements. You can find its definition in the document_ai_agents repository linked at the end of this article.
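As a rough, hedged sketch of what such a helper might do (the exact implementation lives in the repository's schema_utils module, and the set of stripped keys below is an assumption): it can take the Pydantic model's JSON schema and remove the keys that Gemini's response_schema does not accept.

# Hypothetical sketch of a schema-preparation helper, not the repository's exact code.
# Assumption: Gemini's response_schema rejects some standard JSON-schema keys
# (for example "title" and "default"), so we drop them recursively.
from pydantic import BaseModel


def prepare_schema_for_gemini(model_cls: type[BaseModel]) -> dict:
    schema = model_cls.model_json_schema()

    def clean(node):
        if isinstance(node, dict):
            return {
                key: clean(value)
                for key, value in node.items()
                if key not in ("title", "default")
            }
        if isinstance(node, list):
            return [clean(item) for item in node]
        return node

    return clean(schema)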

This code defines a Pydantic schema and sends it as part of the request in the field "response_schema". This forces the LLM to follow the schema in its response and makes its output easier to parse.

Tip 3: Use chain of thought and better prompting

Sometimes, giving the LLM space to work out its response before committing to a final answer can help produce higher-quality responses. This technique is called chain-of-thought and is widely used because it is effective and very easy to implement.

We can also explicitly ask the LLM to answer with "N/A" if it can't find enough context to produce a high-quality response. This gives it an easy way out instead of trying to answer questions it has no answer to.

For example, let's look at this simple question and context:

Context

Thomas Jefferson (April 13 [O.S. April 2], 1743 – July 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third president of the United States from 1801 to 1809.[6] He was the primary author of the Declaration of Independence. Following the American Revolutionary War and before becoming president in 1801, Jefferson was the nation's first U.S. secretary of state under George Washington and then the nation's second vice president under John Adams. Jefferson was a leading proponent of democracy, republicanism, and natural rights, and he produced formative documents and decisions at the state, national, and international levels. (Source: Wikipedia)

Question

What year did davis jefferson die?

A naive approach yields:

Response

answer='1826'

This is clearly false, as Jefferson Davis is not even mentioned in the context at all. It was Thomas Jefferson who died in 1826.

If we change the response schema to use chain-of-thought:

class AnswerChainOfThoughts(BaseModel):
    rationale: str = Field(
        ...,
        description="Justification of your answer.",
    )
    answer: str = Field(
        ..., description="Your Answer. Answer with 'N/A' if answer is not found"
    )

We are also adding more detail about what we expect as output when the question is not answerable using the context: "Answer with 'N/A' if answer is not found".
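The call itself stays the same as in Tip 2; only the schema changes. A minimal sketch, assuming the model, context, and question variables from the earlier snippet are still in scope:

# Same structured-output call as in Tip 2, now with the chain-of-thought schema.
# Assumes `model`, `context` and `question` from the Tip 2 snippet are in scope.
answer_cot_schema = prepare_schema_for_gemini(AnswerChainOfThoughts)

response = model.generate_content(
    [
        context,
        f"Answer this question: {question}",
        f"Use this schema for your answer: {answer_cot_schema}",
    ],
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": answer_cot_schema,
        "temperature": 0.0,
    },
)

answer_cot = AnswerChainOfThoughts(**json.loads(response.text))
print(answer_cot.rationale)
print(answer_cot.answer)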

With this new approach, we get the following rationale (remember, chain-of-thought):

The provided text discusses Thomas Jefferson, not Jefferson Davis. No information about the death of Jefferson Davis is included.

And the final answer:

answer='N/A'

Great! But can we use a more general approach to hallucination detection?

We can, with Agents!

Tip 4: Use an Agentic approach

We will build a simple agent that implements a three-step process:

  • The first step is to include the context and ask the question to the LLM, in order to get the first candidate response and the relevant context it used for its answer.
  • The second step is to reformulate the question and the first candidate response as a declarative statement.
  • The third step is to ask the LLM to verify whether the relevant context entails the candidate response. This is called "self-verification": https://arxiv.org/pdf/2212.09561

To implement this, we define three nodes in LangGraph. The first node will ask the question while including the context, the second node will reformulate the question and answer using the LLM, and the third node will check the entailment of the statement with respect to the input context.
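These nodes communicate through a shared state object and rely on a few Pydantic schemas that are not reproduced in this article; their exact definitions live in the linked repository. Below is a minimal sketch with field names inferred from the node code and example outputs shown later. Note that, unlike the Tip 3 version, the agent's chain-of-thought schema also carries a relevant_context field, which the verification step uses.

# Minimal sketch of the shared state and output schemas, with field names inferred
# from the node code and example outputs below; the authoritative definitions are
# in the document_ai_agents repository.
from typing import Optional

from pydantic import BaseModel, Field


class AnswerChainOfThoughts(BaseModel):
    # Same idea as in Tip 3, plus the relevant context used to produce the answer.
    rationale: str = Field(..., description="Justification of your answer.")
    relevant_context: str = Field(..., description="Context used to answer the question.")
    answer: str = Field(..., description="Your Answer. Answer with 'N/A' if answer is not found")


class AnswerReformulation(BaseModel):
    declarative_answer: str = Field(
        ..., description="The question and its answer rewritten as a single assertion."
    )


class VerificationChainOfThoughts(BaseModel):
    rationale: str = Field(..., description="Justification of your decision.")
    entailment: str = Field(..., description="'Yes' if the context entails the assertion, 'No' otherwise.")


class DocumentQAState(BaseModel):
    question: str
    pages_as_base64_jpeg_images: list[str] = Field(default_factory=list)
    pages_as_text: list[str] = Field(default_factory=list)
    answer_cot: Optional[AnswerChainOfThoughts] = None
    answer_reformulation: Optional[AnswerReformulation] = None
    verification_cot: Optional[VerificationChainOfThoughts] = None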

The first node can be defined as follows:

    def answer_question(self, state: DocumentQAState):
        logger.info(f"Responding to question '{state.question}'")
        assert (
            state.pages_as_base64_jpeg_images or state.pages_as_text
        ), "Input text or images"
        messages = (
            [
                {"mime_type": "image/jpeg", "data": base64_jpeg}
                for base64_jpeg in state.pages_as_base64_jpeg_images
            ]
            + state.pages_as_text
            + [
                f"Answer this question: {state.question}",
            ]
            + [
                f"Use this schema for your answer: {self.answer_cot_schema}",
            ]
        )

        response = self.model.generate_content(
            messages,
            generation_config={
                "response_mime_type": "application/json",
                "response_schema": self.answer_cot_schema,
                "temperature": 0.0,
            },
        )

        answer_cot = AnswerChainOfThoughts(**json.loads(response.text))

        return {"answer_cot": answer_cot}

And the second as:

    def reformulate_answer(self, state: DocumentQAState):
        logger.info("Reformulating answer")
        if state.answer_cot.answer == "N/A":
            return

        messages = [
            {
                "role": "user",
                "parts": [
                    {
                        "text": "Reformulate this question and its answer as a single assertion."
                    },
                    {"text": f"Question: {state.question}"},
                    {"text": f"Answer: {state.answer_cot.answer}"},
                ]
                + [
                    {
                        "text": f"Use this schema for your answer: {self.declarative_answer_schema}"
                    }
                ],
            }
        ]

        response = self.model.generate_content(
            messages,
            generation_config={
                "response_mime_type": "application/json",
                "response_schema": self.declarative_answer_schema,
                "temperature": 0.0,
            },
        )

        answer_reformulation = AnswerReformulation(**json.loads(response.text))

        return {"answer_reformulation": answer_reformulation}

The third one as:

    def verify_answer(self, state: DocumentQAState):
        logger.info(f"Verifying answer '{state.answer_cot.answer}'")
        if state.answer_cot.answer == "N/A":
            return
        messages = [
            {
                "role": "user",
                "parts": [
                    {
                        "text": "Analyse the following context and the assertion and decide whether the context "
                        "entails the assertion or not."
                    },
                    {"text": f"Context: {state.answer_cot.relevant_context}"},
                    {
                        "text": f"Assertion: {state.answer_reformulation.declarative_answer}"
                    },
                    {
                        "text": f"Use this schema for your answer: {self.verification_cot_schema}. Be Factual."
                    },
                ],
            }
        ]

        response = self.model.generate_content(
            messages,
            generation_config={
                "response_mime_type": "application/json",
                "response_schema": self.verification_cot_schema,
                "temperature": 0.0,
            },
        )

        verification_cot = VerificationChainOfThoughts(**json.loads(response.text))

        return {"verification_cot": verification_cot}

Full code: https://github.com/CVxTz/document_ai_agents

Notice how each node uses its own schema for structured output and its own prompt. This is possible thanks to the flexibility of both Gemini's API and LangGraph.
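For reference, here is one way these three nodes could be assembled into a graph. This is a hedged sketch rather than the repository's actual construction code; DocumentQAAgent is an assumed name for the class that holds the three methods above, and document_text is a placeholder for the page text.

# Illustrative wiring of the three nodes into a LangGraph state graph (a sketch;
# the repository's actual graph construction may differ). `DocumentQAAgent` is an
# assumed name for the class holding the three methods above.
from langgraph.graph import END, StateGraph

agent = DocumentQAAgent()

builder = StateGraph(DocumentQAState)
builder.add_node("answer_question", agent.answer_question)
builder.add_node("reformulate_answer", agent.reformulate_answer)
builder.add_node("verify_answer", agent.verify_answer)

builder.set_entry_point("answer_question")
builder.add_edge("answer_question", "reformulate_answer")
builder.add_edge("reformulate_answer", "verify_answer")
builder.add_edge("verify_answer", END)

graph = builder.compile()

# `document_text` stands for the text we want to answer over, e.g. the Jefferson
# paragraph used in the example below.
document_text = "Thomas Jefferson (April 13, 1743 - July 4, 1826) was an American statesman..."

result = graph.invoke(
    {"question": "What year did davis jefferson die?", "pages_as_text": [document_text]}
)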

Let's work through this code using the same example as above ➡️
(Note: we are not using chain-of-thought on the first prompt, so that the verification gets triggered for our tests.)

Context

Thomas Jefferson (April 13 [O.S. April 2], 1743 – July 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third president of the United States from 1801 to 1809.[6] He was the primary author of the Declaration of Independence. Following the American Revolutionary War and before becoming president in 1801, Jefferson was the nation's first U.S. secretary of state under George Washington and then the nation's second vice president under John Adams. Jefferson was a leading proponent of democracy, republicanism, and natural rights, and he produced formative documents and decisions at the state, national, and international levels. (Source: Wikipedia)

Question

What year did davis jefferson die?

First node result (First answer):

relevant_context='Thomas Jefferson (April 13 [O.S. April 2], 1743 – July 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third president of the United States from 1801 to 1809.'

answer='1826'

Second node result (Answer Reformulation):

declarative_answer='Davis Jefferson died in 1826'

Third node result (Verification):

rationale='The context states that Thomas Jefferson died in 1826. The assertion states that Davis Jefferson died in 1826. The context does not mention Davis Jefferson, only Thomas Jefferson.'

entailment='No'

So the verification step rejected the initial answer (no entailment between the two). We can now avoid returning a hallucination to the user.
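How you act on that signal is up to your application. As one possibility, a small hypothetical helper (not part of the repository) could fall back to 'N/A' whenever verification fails:

# Hypothetical post-processing helper (not from the repository): only return the
# candidate answer when the verification node confirmed entailment.
def final_answer(state: DocumentQAState) -> str:
    if state.answer_cot is None or state.answer_cot.answer == "N/A":
        return "N/A"
    if state.verification_cot is None or state.verification_cot.entailment != "Yes":
        # Verification failed: treat the candidate answer as a likely hallucination.
        return "N/A"
    return state.answer_cot.answer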

Bonus Tip: Use stronger models

This tip is not always easy to apply due to budget or latency limitations, but you should know that stronger LLMs are less prone to hallucination. So, if possible, use a more powerful LLM for your most sensitive use cases. You can check a hallucination benchmark here: https://github.com/vectara/hallucination-leaderboard. We can see that the top models on this benchmark (fewest hallucinations) also rank at the top of conventional NLP leaderboards.

Source: https://github.com/vectara/hallucination-leaderboard (License: Apache 2.0)

In this tutorial, we explored strategies to improve the reliability of LLM outputs by reducing the hallucination rate. The main recommendations include careful formatting and prompting to guide LLM calls, and using a workflow-based approach in which agents are designed to verify their own answers.

This involves multiple steps:

  1. Retrieving the exact context elements used by the LLM to generate the answer.
  2. Reformulating the answer for easier verification (in declarative form).
  3. Instructing the LLM to check for consistency between the context and the reformulated answer.

While all of these tips can significantly improve accuracy, you should keep in mind that no method is foolproof. There is always a risk of rejecting valid answers if the LLM is overly conservative during verification, or of missing real hallucination cases. Therefore, rigorous evaluation of your specific LLM workflows is still essential.

Full code: https://github.com/CVxTz/document_ai_agents

Thanks for reading!
