In this post, I will share how to build an AI journal with LlamaIndex. We will cover one essential function of this AI journal: asking for advice. We will start with the most basic implementation and iterate from there, and we will see significant improvements for this function once we apply design patterns like Agentic RAG and multi-agent workflows.
You can find the source code of this AI Journal in my GitHub repo here.
Overview of AI Journal
I want to build my own principles by following Ray Dalio’s practice. An AI journal will help me self-reflect, track my improvement, and even give me advice. The overall function of such an AI journal looks like this:
Today, we will only cover the implementation of the seek-advice flow, which is represented by the purple circles in the diagram above.
Simplest Form: LLM with Large Context
In the most straightforward implementation, we can pass all the relevant content into the context and attach the question we want to ask. We can do that in LlamaIndex with just a few lines of code.
import pymupdf
from llama_index.llms.openai import OpenAI

path_to_pdf_book = './path/to/pdf/book.pdf'

def get_openai_llm():
    # Assumes an OpenAI API key is configured in the environment; the model name is only an example
    return OpenAI(model="gpt-4o")

def load_book_content():
    text = ""
    with pymupdf.open(path_to_pdf_book) as pdf:
        for page in pdf:
            # Drop characters that cannot be encoded instead of keeping the bytes representation
            text += page.get_text().encode("utf-8", errors="ignore").decode("utf-8")
    return text
system_prompt_template = """You are an AI assistant that provides thoughtful, practical, and *deeply personalized* suggestions by combining:
- The user's personal profile and principles
- Insights retrieved from *Principles* by Ray Dalio
Book Content:
```
{book_content}
```
User profile:
```
{user_profile}
```
User's question:
```
{user_question}
```
"""
def get_system_prompt(book_content: str, user_profile: str, user_question: str):
    system_prompt = system_prompt_template.format(
        book_content=book_content,
        user_profile=user_profile,
        user_question=user_question
    )
    return system_prompt
def chat():
    llm = get_openai_llm()
    user_profile = input(">>Tell me about yourself: ")
    user_question = input(">>What do you want to ask: ")
    user_profile = user_profile.strip()
    book_content = load_book_content()
    response = llm.complete(get_system_prompt(book_content, user_profile, user_question))
    return response
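As a quick sanity check, the naive flow can be run directly; this is just a minimal sketch using nothing beyond the functions defined above:

if __name__ == "__main__":
    # Ask for the profile and question interactively, then print the LLM's answer
    print(chat())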
This approach has downsides:
- Low Precision: Loading the entire book as context can cause the LLM to lose focus on the user’s question.
- High Cost: Sending such a large payload in every LLM call means high cost and slow responses.
With this approach, if you pass the entire content of Ray Dalio’s Principles book, responses to questions like “How to handle stress?” become very general. Such responses, disconnected from my question, made me feel that the AI was not listening to me. Even though the answer touches on many important concepts, I would like the advice to be more targeted to the question I raised. Let’s see how we can improve it with RAG.
Enhanced Form: Agentic RAG
So, what is Agentic RAG? Agentic RAG combines dynamic decision-making with data retrieval. In our AI journal, the Agentic RAG flow looks like this:

- Question Evaluation: Poorly framed questions lead to poor retrieval results. The agent evaluates the user’s question and asks clarifying questions if it believes that is necessary.
- Question Rewrite: Rewrite the user’s query so that it projects onto the indexed content in the semantic space. I found this step essential for improving retrieval precision. Say your knowledge base consists of Q/A pairs and you index the question part to search for answers; rewriting the user’s statement into a proper question will help you find the most relevant content.
- Query Vector Index: Many parameters can be tuned when constructing such an index, including chunk size, overlap, and the index type itself. For simplicity, we are using VectorStoreIndex here, which applies a default chunking strategy (see the sketch after this list for how chunking could be customized).
- Filter & Synthesize: Instead of a complex re-ranking process, I explicitly instruct the LLM to filter and find relevant content in the prompt. I see the LLM picking up the most relevant content, even though it sometimes has a lower similarity score than other chunks.
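As a reference for the tuning knobs mentioned in the Query Vector Index step, here is a minimal sketch of how chunking could be customized in LlamaIndex; the chunk_size and chunk_overlap values are arbitrary examples rather than recommendations:

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

def create_index_with_custom_chunking(content: str) -> VectorStoreIndex:
    # Use an explicit splitter instead of the default chunking strategy
    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
    return VectorStoreIndex.from_documents(
        [Document(text=content)],
        transformations=[splitter],
    )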
With this Agentic RAG, you can retrieve content highly relevant to the user’s questions and generate more targeted advice.
Let’s look at the implementation. With the LlamaIndex SDK, creating and persisting an index in your local directory is straightforward.
from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(api_key="ak-xxxx")
PERSISTED_INDEX_PATH = "/path/to/the/directory/persist/index/locally"

def create_index(content: str):
    documents = [Document(text=content)]
    vector_index = VectorStoreIndex.from_documents(documents)
    vector_index.storage_context.persist(persist_dir=PERSISTED_INDEX_PATH)

def load_index():
    storage_context = StorageContext.from_defaults(persist_dir=PERSISTED_INDEX_PATH)
    index = load_index_from_storage(storage_context)
    return index
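A small piece of glue code (hypothetical, reusing load_book_content from earlier) can build the index on the first run and reuse the persisted copy afterwards:

import os

def get_or_create_index():
    # Build and persist the index once, then load it from disk on subsequent runs
    if not os.path.exists(PERSISTED_INDEX_PATH):
        create_index(load_book_content())
    return load_index()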
Once we have an index, we can create a query engine on top of it. The query engine is a powerful abstraction that lets you adjust the retrieval parameters (e.g., top k) and the synthesis behaviour after content retrieval. In my implementation, I override the response_mode to NO_TEXT because the agent will process the book content returned by the function call and synthesize the final result. Having the query engine synthesize the result before passing it to the agent would be redundant.
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.indices.vector_store import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import ResponseMode

TOP_K = 3  # number of chunks to retrieve; tune to your needs

def _create_query_engine_from_index(index: VectorStoreIndex):
    # configure the retriever
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=TOP_K,
    )
    # return the raw retrieved content without LLM synthesis; the agent will synthesize it later
    response_synthesizer = get_response_synthesizer(response_mode=ResponseMode.NO_TEXT)
    # assemble the query engine
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer
    )
    return query_engine
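To sanity-check what the retriever returns under NO_TEXT synthesis, you can query the engine directly and inspect the source nodes; this is just a quick sketch with an example query:

query_engine = _create_query_engine_from_index(load_index())
response = query_engine.query("Mistakes are opportunities to learn and improve.")

# With ResponseMode.NO_TEXT the synthesized answer is empty; the raw chunks live in source_nodes
for node in response.source_nodes:
    print(node.score, node.get_content()[:100])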
The prompt looks like the following:
You are an assistant that helps reframe user questions into clear, concept-driven statements that match
the style and topics of Principles by Ray Dalio, and look up the Principles book for relevant content.
Background:
Principles teaches structured thinking about life and work decisions.
The key ideas are:
* Radical truth and radical transparency
* Decision-making frameworks
* Embracing mistakes as learning
Task:
- Task 1: Clarify the user's question if needed. Ask follow-up questions to make sure you understand the user's intent.
- Task 2: Rewrite the user's question into a statement that matches how Ray Dalio frames ideas in Principles. Use a formal, logical, neutral tone.
- Task 3: Look up the Principles book with the rewritten statements. You must provide at least {REWRITE_FACTOR} rewritten versions.
- Task 4: Find the most relevant passages from the book content as your final answer.
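For completeness, the QUESTION_REWRITE_PROMPT referenced below can be produced from this template; the template variable name and the REWRITE_FACTOR value here are assumptions, not part of the original code:

REWRITE_FACTOR = 3  # how many rewritten statements the agent should produce

# QUESTION_REWRITE_PROMPT_TEMPLATE is assumed to hold the prompt text shown above
QUESTION_REWRITE_PROMPT = QUESTION_REWRITE_PROMPT_TEMPLATE.format(REWRITE_FACTOR=REWRITE_FACTOR)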
Finally, we can construct the agent with those functions defined.
from typing import List

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool

def get_principle_rag_agent():
    index = load_index()
    query_engine = _create_query_engine_from_index(index)

    def look_up_principle_book(original_question: str, rewrote_statement: List[str]) -> List[str]:
        result = []
        for q in rewrote_statement:
            response = query_engine.query(q)
            content = [n.get_content() for n in response.source_nodes]
            result.extend(content)
        return result

    def clarify_question(original_question: str, your_questions_to_user: List[str]) -> str:
        """
        Clarify the user's question if needed. Ask follow-up questions to make sure you understand the user's intent.
        """
        response = ""
        for q in your_questions_to_user:
            print(f"Question: {q}")
            r = input("Response:")
            response += f"Question: {q}\nResponse: {r}\n"
        return response

    tools = [
        FunctionTool.from_defaults(
            fn=look_up_principle_book,
            name="look_up_principle_book",
            description="Look up the Principles book with rewritten queries and get suggestions from the Principles book by Ray Dalio."),
        FunctionTool.from_defaults(
            fn=clarify_question,
            name="clarify_question",
            description="Clarify the user's question if needed. Ask follow-up questions to make sure you understand the user's intent.",
        )
    ]

    agent = FunctionAgent(
        name="principle_reference_loader",
        description="A helpful agent that, based on the user's question, looks up the most relevant content in the Principles book.\n",
        system_prompt=QUESTION_REWRITE_PROMPT,
        tools=tools,
    )
    return agent

rag_agent = get_principle_rag_agent()
response = await rag_agent.run(chat_history=chat_history)
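The last two lines above assume a running event loop and an existing chat_history; a fuller sketch of invoking the agent might look like this (the chat_history construction is an assumption):

import asyncio

from llama_index.core.llms import ChatMessage

async def main():
    rag_agent = get_principle_rag_agent()
    # Seed the conversation with the user's question
    chat_history = [ChatMessage(role="user", content="How to handle stress?")]
    response = await rag_agent.run(chat_history=chat_history)
    print(response)

asyncio.run(main())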
Here are a few observations I had during the implementation:
- One interesting fact I found is that providing an unused parameter, original_question, in the function signature helps. When I don't include such a parameter, the LLM sometimes ignores the rewrite instruction and passes the original question in the rewrote_statement parameter. Having the original_question parameter somehow emphasizes the rewriting task to the LLM.
- Different LLMs behave quite differently given the same prompt. I found DeepSeek V3 much more reluctant to trigger function calls than other model providers. That doesn't necessarily mean it isn't usable; if a function call should be initiated 90% of the time, it should be part of the workflow instead of being registered as a function call. Also, compared to OpenAI's models, I found Gemini good at citing the source of the book when it synthesizes the results.
- The more content you load into the context window, the more inference capability the model needs. A smaller model with less inference power is more likely to get lost in the large context provided.
However, to complete the seek-advice function, you will need multiple agents working together instead of a single agent. Let's talk about how to chain your agents together into workflows.
Final Form: Agent Workflow
Before we start, I recommend the article Building Effective Agents by Anthropic. Its one-line takeaway is the distinction between workflows, where steps follow predefined code paths, and agents, which dynamically direct their own process. In LlamaIndex, you can do both: it allows you to create an agent workflow with more automatic routing, or a customized workflow with more explicit control over step transitions. I will provide an example of both implementations.

Let's take a look at how you can build a dynamic workflow. Here is a code example.
interviewer = FunctionAgent(
    name="interviewer",
    description="Useful agent to clarify the user's questions",
    system_prompt=_interviewer_prompt,
    can_handoff_to=["retriever"],
    tools=tools,
)

retriever = FunctionAgent(
    name="retriever",
    description="Useful agent to retrieve the Principles book's content.",
    system_prompt=_retriever_prompt,
    can_handoff_to=["advisor"],
    tools=tools,
)

advisor = FunctionAgent(
    name="advisor",
    description="Useful agent to advise the user.",
    system_prompt=_advisor_prompt,
    can_handoff_to=[],
    tools=tools,
)

workflow = AgentWorkflow(
    agents=[interviewer, advisor, retriever],
    root_agent="interviewer",
)
response = await workflow.run(user_msg="How to handle stress?")
It is dynamic because the agent transitions rely on the LLM's function calls. Under the hood, the LlamaIndex workflow exposes agent descriptions as functions to the LLM. When the LLM triggers such an "agent function call", LlamaIndex routes to the corresponding agent for the next processing step. The previous agent's output is added to the workflow's internal state, and the following agent picks that state up as part of the context in its call to the LLM. You can also leverage the state and memory components to manage the workflow's internal state or load external data (see the documentation here).
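For example, reusing the same Context across runs keeps the conversation state between user turns; a minimal sketch based on the documented Context usage:

from llama_index.core.workflow import Context

# One Context instance carries the workflow's state and memory across runs
ctx = Context(workflow)
first = await workflow.run(user_msg="How to handle stress?", ctx=ctx)
follow_up = await workflow.run(user_msg="Can you make that more concrete?", ctx=ctx)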
However, as I have suggested, you can explicitly control the steps in your workflow to achieve more control. With LlamaIndex, this can be done by extending the Workflow object. For example:
from typing import List

from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class ReferenceRetrievalEvent(Event):
    question: str

class Advice(Event):
    principles: List[str]
    profile: dict
    question: str
    book_content: str

class AdviceWorkFlow(Workflow):
    def __init__(self, verbose: bool = False, session_id: str = None):
        state = get_workflow_state(session_id)
        self.principles = state.load_principle_from_cases()
        self.profile = state.load_profile()
        self.verbose = verbose
        super().__init__(timeout=None, verbose=verbose)

    @step
    async def interview(self, ctx: Context, ev: StartEvent) -> ReferenceRetrievalEvent:
        # Step 1: The interviewer agent asks clarifying questions to the user
        interviewer = get_interviewer_agent()
        question = await _run_agent(interviewer, question=ev.user_msg, verbose=self.verbose)
        return ReferenceRetrievalEvent(question=question)

    @step
    async def retrieve(self, ctx: Context, ev: ReferenceRetrievalEvent) -> Advice:
        # Step 2: The RAG agent retrieves relevant content from the book
        rag_agent = get_principle_rag_agent()
        book_content = await _run_agent(rag_agent, question=ev.question, verbose=self.verbose)
        return Advice(principles=self.principles, profile=self.profile,
                      question=ev.question, book_content=book_content)

    @step
    async def advice(self, ctx: Context, ev: Advice) -> StopEvent:
        # Step 3: The adviser agent gives advice based on the user's profile, principles, and book content
        advisor = get_adviser_agent(ev.profile, ev.principles, ev.book_content)
        result = await _run_agent(advisor, question=ev.question, verbose=self.verbose)
        return StopEvent(result=result)
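Running the customized workflow then looks like the following; how session_id is produced depends on your own get_workflow_state helper:

async def seek_advice(user_msg: str, session_id: str):
    workflow = AdviceWorkFlow(verbose=True, session_id=session_id)
    # user_msg is delivered to the interview step through the StartEvent
    result = await workflow.run(user_msg=user_msg)
    return result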
The returned event type controls the workflow's step transitions. For instance, the retrieve step returns an Advice event, which triggers the execution of the advice step. You can also use the Advice event to pass along any information you need.
During the implementation, if you are annoyed by having to restart the whole workflow just to debug some steps in the middle, context serialization is essential for failing over the workflow execution. You can store your state in a serialized format and recover your workflow by deserializing it back into a Context object. Your workflow will continue executing based on that state instead of starting over.
workflow = AgentWorkflow(
    agents=[interviewer, advisor, retriever],
    root_agent="interviewer",
)

try:
    handler = workflow.run(user_msg="How to handle stress?")
    result = await handler
except Exception as e:
    print(f"Error during initial run: {e}")
    await fail_over()
    # Optional: serialize and save the context for debugging
    ctx_dict = handler.ctx.to_dict(serializer=JsonSerializer())
    json_dump_and_save(ctx_dict)

# Resume from the saved context
ctx_dict = load_failed_dict()
restored_ctx = Context.from_dict(workflow, ctx_dict, serializer=JsonSerializer())
handler = workflow.run(ctx=restored_ctx)
result = await handler
Summary
In this post, we have discussed how to use LlamaIndex to implement an AI journal's core function. The key learnings include:
- Using Agentic RAG to leverage the LLM's ability to dynamically rewrite the original query and synthesize the result.
- Using a customized workflow to gain more explicit control over step transitions, and building dynamic agent workflows when necessary.
The source code of this AI journal is in my GitHub repo here. I hope you enjoy this article and this small app I built. Cheers!