In this post, I will share how to build an AI journal with LlamaIndex. We will cover one essential function of this AI journal: asking for advice. We will start with the most basic implementation and iterate from there, and we will see significant improvements for this function once we apply design patterns like Agentic RAG and multi-agent workflows.
You can find the source code of this AI Journal in my GitHub repo here.
Overview of AI Journal
I want to build my own principles by following Ray Dalio’s practice. An AI journal will help me self-reflect, track my improvement, and even give me advice. The overall function of such an AI journal looks like this:
Today, we will only cover the implementation of the seek-advice flow, which is represented by the purple circles in the diagram above.
Simplest Form: LLM with Large Context
In the most straightforward implementation, we can pass all the relevant content into the context and attach the question we want to ask. We can do that in LlamaIndex with just a few lines of code.
import pymupdf
from llama_index.llms.openai import OpenAI

path_to_pdf_book = './path/to/pdf/book.pdf'

def get_openai_llm():
    # Assumes an OpenAI API key is configured in the environment; the model name is only an example
    return OpenAI(model="gpt-4o")

def load_book_content():
    text = ""
    with pymupdf.open(path_to_pdf_book) as pdf:
        for page in pdf:
            # Drop characters that cannot be encoded instead of keeping the bytes representation
            text += page.get_text().encode("utf-8", errors="ignore").decode("utf-8")
    return text
system_prompt_template = """You are an AI assistant that provides thoughtful, practical, and *deeply personalized* suggestions by combining:
- The user's personal profile and principles
- Insights retrieved from *Principles* by Ray Dalio
Book Content:
```
{book_content}
```
User profile:
```
{user_profile}
```
User's question:
```
{user_question}
```
"""
def get_system_prompt(book_content: str, user_profile: str, user_question: str):
    system_prompt = system_prompt_template.format(
        book_content=book_content,
        user_profile=user_profile,
        user_question=user_question
    )
    return system_prompt
def chat():
    llm = get_openai_llm()
    user_profile = input(">>Tell me about yourself: ")
    user_question = input(">>What do you want to ask: ")
    user_profile = user_profile.strip()
    book_content = load_book_content()
    response = llm.complete(get_system_prompt(book_content, user_profile, user_question))
    return response
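As a quick sanity check, the naive flow can be run directly; this is just a minimal sketch using nothing beyond the functions defined above:

if __name__ == "__main__":
    # Ask for the profile and question interactively, then print the LLM's answer
    print(chat())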
This approach has downsides:
- Low Precision: Loading the entire book as context can cause the LLM to lose focus on the user’s question.
- High Cost: Sending such a large payload in every LLM call means high cost and slow responses.
With this approach, if you pass the entire content of Ray Dalio’s Principles book, responses to questions like “How to handle stress?” become very general. Such responses, disconnected from my question, made me feel that the AI was not listening to me. Even though the answer touches on many important concepts, I would like the advice to be more targeted to the question I raised. Let’s see how we can improve it with RAG.
Enhanced Form: Agentic RAG
So, what is Agentic RAG? Agentic RAG combines dynamic decision-making with data retrieval. In our AI journal, the Agentic RAG flow looks like this:

- Question Evaluation: Poorly framed questions lead to poor retrieval results. The agent evaluates the user’s question and asks clarifying questions if it believes that is necessary.
- Question Rewrite: Rewrite the user’s query so that it projects onto the indexed content in the semantic space. I found this step essential for improving retrieval precision. Say your knowledge base consists of Q/A pairs and you index the question part to search for answers; rewriting the user’s statement into a proper question will help you find the most relevant content.
- Query Vector Index: Many parameters can be tuned when constructing such an index, including chunk size, overlap, and the index type itself. For simplicity, we are using VectorStoreIndex here, which applies a default chunking strategy (see the sketch after this list for how chunking could be customized).
- Filter & Synthesize: Instead of a complex re-ranking process, I explicitly instruct the LLM to filter and find relevant content in the prompt. I see the LLM picking up the most relevant content, even though it sometimes has a lower similarity score than other chunks.
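As a reference for the tuning knobs mentioned in the Query Vector Index step, here is a minimal sketch of how chunking could be customized in LlamaIndex; the chunk_size and chunk_overlap values are arbitrary examples rather than recommendations:

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

def create_index_with_custom_chunking(content: str) -> VectorStoreIndex:
    # Use an explicit splitter instead of the default chunking strategy
    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
    return VectorStoreIndex.from_documents(
        [Document(text=content)],
        transformations=[splitter],
    )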
With this Agentic RAG, you can retrieve content highly relevant to the user’s questions and generate more targeted advice.
Let’s look at the implementation. With the LlamaIndex SDK, creating and persisting an index in your local directory is straightforward.
from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(api_key="ak-xxxx")
PERSISTED_INDEX_PATH = "/path/to/the/directory/persist/index/locally"

def create_index(content: str):
    documents = [Document(text=content)]
    vector_index = VectorStoreIndex.from_documents(documents)
    vector_index.storage_context.persist(persist_dir=PERSISTED_INDEX_PATH)

def load_index():
    storage_context = StorageContext.from_defaults(persist_dir=PERSISTED_INDEX_PATH)
    index = load_index_from_storage(storage_context)
    return index
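A small piece of glue code (hypothetical, reusing load_book_content from earlier) can build the index on the first run and reuse the persisted copy afterwards:

import os

def get_or_create_index():
    # Build and persist the index once, then load it from disk on subsequent runs
    if not os.path.exists(PERSISTED_INDEX_PATH):
        create_index(load_book_content())
    return load_index()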
Once we have an index, we can create a query engine on top of it. The query engine is a powerful abstraction that lets you adjust the retrieval parameters (e.g., top k) and the synthesis behaviour after content retrieval. In my implementation, I override the response_mode to NO_TEXT because the agent will process the book content returned by the function call and synthesize the final result. Having the query engine synthesize the result before passing it to the agent would be redundant.
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.indices.vector_store import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import ResponseMode

TOP_K = 3  # number of chunks to retrieve; tune to your needs

def _create_query_engine_from_index(index: VectorStoreIndex):
    # configure the retriever
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=TOP_K,
    )
    # return the raw retrieved content without LLM synthesis; the agent will synthesize it later
    response_synthesizer = get_response_synthesizer(response_mode=ResponseMode.NO_TEXT)
    # assemble the query engine
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer
    )
    return query_engine
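To sanity-check what the retriever returns under NO_TEXT synthesis, you can query the engine directly and inspect the source nodes; this is just a quick sketch with an example query:

query_engine = _create_query_engine_from_index(load_index())
response = query_engine.query("Mistakes are opportunities to learn and improve.")

# With ResponseMode.NO_TEXT the synthesized answer is empty; the raw chunks live in source_nodes
for node in response.source_nodes:
    print(node.score, node.get_content()[:100])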
The prompt looks like the following:
You are an assistant that helps reframe user questions into clear, concept-driven statements that match
the style and topics of Principles by Ray Dalio, and look up the Principles book for relevant content.
Background:
Principles teaches structured thinking about life and work decisions.
The key ideas are:
* Radical truth and radical transparency
* Decision-making frameworks
* Embracing mistakes as learning
Task:
- Task 1: Clarify the user's question if needed. Ask follow-up questions to make sure you understand the user's intent.
- Task 2: Rewrite the user's question into a statement that matches how Ray Dalio frames ideas in Principles. Use a formal, logical, neutral tone.
- Task 3: Look up the Principles book with the rewritten statements. You must provide at least {REWRITE_FACTOR} rewritten versions.
- Task 4: Find the most relevant passages from the book content as your final answer.
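For completeness, the QUESTION_REWRITE_PROMPT referenced below can be produced from this template; the template variable name and the REWRITE_FACTOR value here are assumptions, not part of the original code:

REWRITE_FACTOR = 3  # how many rewritten statements the agent should produce

# QUESTION_REWRITE_PROMPT_TEMPLATE is assumed to hold the prompt text shown above
QUESTION_REWRITE_PROMPT = QUESTION_REWRITE_PROMPT_TEMPLATE.format(REWRITE_FACTOR=REWRITE_FACTOR)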
Finally, we can construct the agent with those functions defined.
from typing import List

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool

def get_principle_rag_agent():
    index = load_index()
    query_engine = _create_query_engine_from_index(index)

    def look_up_principle_book(original_question: str, rewrote_statement: List[str]) -> List[str]:
        result = []
        for q in rewrote_statement:
            response = query_engine.query(q)
            content = [n.get_content() for n in response.source_nodes]
            result.extend(content)
        return result

    def clarify_question(original_question: str, your_questions_to_user: List[str]) -> str:
        """
        Clarify the user's question if needed. Ask follow-up questions to make sure you understand the user's intent.
        """
        response = ""
        for q in your_questions_to_user:
            print(f"Question: {q}")
            r = input("Response:")
            response += f"Question: {q}\nResponse: {r}\n"
        return response

    tools = [
        FunctionTool.from_defaults(
            fn=look_up_principle_book,
            name="look_up_principle_book",
            description="Look up the Principles book with rewritten queries and get suggestions from the Principles book by Ray Dalio."),
        FunctionTool.from_defaults(
            fn=clarify_question,
            name="clarify_question",
            description="Clarify the user's question if needed. Ask follow-up questions to make sure you understand the user's intent.",
        )
    ]

    agent = FunctionAgent(
        name="principle_reference_loader",
        description="A helpful agent that, based on the user's question, looks up the most relevant content in the Principles book.\n",
        system_prompt=QUESTION_REWRITE_PROMPT,
        tools=tools,
    )
    return agent

rag_agent = get_principle_rag_agent()
response = await rag_agent.run(chat_history=chat_history)
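The last two lines above assume a running event loop and an existing chat_history; a fuller sketch of invoking the agent might look like this (the chat_history construction is an assumption):

import asyncio

from llama_index.core.llms import ChatMessage

async def main():
    rag_agent = get_principle_rag_agent()
    # Seed the conversation with the user's question
    chat_history = [ChatMessage(role="user", content="How to handle stress?")]
    response = await rag_agent.run(chat_history=chat_history)
    print(response)

asyncio.run(main())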
Here are a few observations I had during the implementation:
- One interesting fact I found is that providing an unused parameter, original_question, in the function signature helps. When I don't include such a parameter, the LLM sometimes ignores the rewrite instruction and passes the original question in the rewrote_statement parameter. Having the original_question parameter somehow emphasizes the rewriting task to the LLM.
- Different LLMs behave quite differently given the same prompt. I found DeepSeek V3 much more reluctant to trigger function calls than other model providers. That doesn't necessarily mean it isn't usable; if a function call should be initiated 90% of the time, it should be part of the workflow instead of being registered as a function call. Also, compared to OpenAI's models, I found Gemini good at citing the source of the book when it synthesizes the results.
- The more content you load into the context window, the more inference capability the model needs. A smaller model with less inference power is more likely to get lost in the large context provided.
However, to complete the seek-advice function, you will need multiple agents working together instead of a single agent. Let's talk about how to chain your agents together into workflows.
Final Form: Agent Workflow
Before we start, I recommend the article Building Effective Agents by Anthropic. Its one-line takeaway is the distinction between workflows, where steps follow predefined code paths, and agents, which dynamically direct their own process. In LlamaIndex, you can do both: it allows you to create an agent workflow with more automatic routing, or a customized workflow with more explicit control over step transitions. I will provide an example of both implementations.

Let's take a look at how you can build a dynamic workflow. Here is a code example.
interviewer = FunctionAgent(
    name="interviewer",
    description="Useful agent to clarify the user's questions",
    system_prompt=_interviewer_prompt,
    can_handoff_to=["retriever"],
    tools=tools,
)

retriever = FunctionAgent(
    name="retriever",
    description="Useful agent to retrieve the Principles book's content.",
    system_prompt=_retriever_prompt,
    can_handoff_to=["advisor"],
    tools=tools,
)

advisor = FunctionAgent(
    name="advisor",
    description="Useful agent to advise the user.",
    system_prompt=_advisor_prompt,
    can_handoff_to=[],
    tools=tools,
)

workflow = AgentWorkflow(
    agents=[interviewer, advisor, retriever],
    root_agent="interviewer",
)
response = await workflow.run(user_msg="How to handle stress?")
It is dynamic because the agent transitions rely on the LLM's function calls. Under the hood, the LlamaIndex workflow exposes agent descriptions as functions to the LLM. When the LLM triggers such an "agent function call", LlamaIndex routes to the corresponding agent for the next processing step. The previous agent's output is added to the workflow's internal state, and the following agent picks that state up as part of the context in its call to the LLM. You can also leverage the state and memory components to manage the workflow's internal state or load external data (see the documentation here).
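For example, reusing the same Context across runs keeps the conversation state between user turns; a minimal sketch based on the documented Context usage:

from llama_index.core.workflow import Context

# One Context instance carries the workflow's state and memory across runs
ctx = Context(workflow)
first = await workflow.run(user_msg="How to handle stress?", ctx=ctx)
follow_up = await workflow.run(user_msg="Can you make that more concrete?", ctx=ctx)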
However, as I have suggested, you can explicitly control the steps in your workflow to achieve more control. With LlamaIndex, this can be done by extending the Workflow object. For example:
from typing import List

from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class ReferenceRetrievalEvent(Event):
    question: str

class Advice(Event):
    principles: List[str]
    profile: dict
    question: str
    book_content: str

class AdviceWorkFlow(Workflow):
    def __init__(self, verbose: bool = False, session_id: str = None):
        state = get_workflow_state(session_id)
        self.principles = state.load_principle_from_cases()
        self.profile = state.load_profile()
        self.verbose = verbose
        super().__init__(timeout=None, verbose=verbose)

    @step
    async def interview(self, ctx: Context, ev: StartEvent) -> ReferenceRetrievalEvent:
        # Step 1: The interviewer agent asks clarifying questions to the user
        interviewer = get_interviewer_agent()
        question = await _run_agent(interviewer, question=ev.user_msg, verbose=self.verbose)
        return ReferenceRetrievalEvent(question=question)

    @step
    async def retrieve(self, ctx: Context, ev: ReferenceRetrievalEvent) -> Advice:
        # Step 2: The RAG agent retrieves relevant content from the book
        rag_agent = get_principle_rag_agent()
        book_content = await _run_agent(rag_agent, question=ev.question, verbose=self.verbose)
        return Advice(principles=self.principles, profile=self.profile,
                      question=ev.question, book_content=book_content)

    @step
    async def advice(self, ctx: Context, ev: Advice) -> StopEvent:
        # Step 3: The adviser agent gives advice based on the user's profile, principles, and book content
        advisor = get_adviser_agent(ev.profile, ev.principles, ev.book_content)
        result = await _run_agent(advisor, question=ev.question, verbose=self.verbose)
        return StopEvent(result=result)
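Running the customized workflow then looks like the following; how session_id is produced depends on your own get_workflow_state helper:

async def seek_advice(user_msg: str, session_id: str):
    workflow = AdviceWorkFlow(verbose=True, session_id=session_id)
    # user_msg is delivered to the interview step through the StartEvent
    result = await workflow.run(user_msg=user_msg)
    return result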
The returned event type controls the workflow's step transitions. For instance, the retrieve step returns an Advice event, which triggers the execution of the advice step. You can also use the Advice event to pass along any information you need.
During the implementation, if you are annoyed by having to restart the whole workflow just to debug some steps in the middle, context serialization is essential for failing over the workflow execution. You can store your state in a serialized format and recover your workflow by deserializing it back into a Context object. Your workflow will continue executing based on that state instead of starting over.
workflow = AgentWorkflow(
    agents=[interviewer, advisor, retriever],
    root_agent="interviewer",
)

try:
    handler = workflow.run(user_msg="How to handle stress?")
    result = await handler
except Exception as e:
    print(f"Error during initial run: {e}")
    await fail_over()
    # Optional: serialize and save the context for debugging
    ctx_dict = handler.ctx.to_dict(serializer=JsonSerializer())
    json_dump_and_save(ctx_dict)

# Resume from the saved context
ctx_dict = load_failed_dict()
restored_ctx = Context.from_dict(workflow, ctx_dict, serializer=JsonSerializer())
handler = workflow.run(ctx=restored_ctx)
result = await handler
Summary
In this post, we have discussed how to use LlamaIndex to implement an AI journal's core function. The key learnings include:
- Using Agentic RAG to leverage the LLM's ability to dynamically rewrite the original query and synthesize the result.
- Using a customized workflow to gain more explicit control over step transitions, and building dynamic agent workflows when necessary.
The source code of this AI journal is in my GitHub repo here. I hope you enjoy this article and this small app I built. Cheers!