Supercharge Your LLM Apps Using DSPy and Langfuse


As illustrated in figure 1, DSPy is a PyTorch-like/Lego-like framework for building LLM-based apps. Out of the box, it comes with:

  • Signatures: These are specifications that define the input and output behaviour of a DSPy program. They can be defined using a short-hand notation (like “query -> answer”, where the framework mechanically understands that query is the input while answer is the output) or using a declarative specification with Python classes (more on this in later sections; see the short sketch after this list).
  • Modules: These are layers of predefined components for powerful concepts like Chain of Thought, ReAct and even straightforward text completion (Predict). These modules abstract away underlying brittle prompts while still providing extensibility through custom components.
  • Optimizers: These are unique to the DSPy framework and draw inspiration from PyTorch itself. Optimizers make use of annotated datasets and evaluation metrics to help tune/optimize our LLM-powered DSPy programs.
  • Data, Metrics, Assertions and Trackers are some of the other components of this framework which act as glue and work behind the scenes to complement the overall framework.
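
To make the first two concepts concrete, here is a minimal sketch of the same signature expressed both ways and wrapped in a module (the QueryAnswer class and the commented-out call are illustrative and not part of any later snippet):

import dspy

# short-hand signature: the framework infers `query` is the input, `answer` the output
qa_predict = dspy.Predict("query -> answer")

# the same intent as a declarative, class-based signature
class QueryAnswer(dspy.Signature):
    """Answer the query concisely."""
    query = dspy.InputField()
    answer = dspy.OutputField(desc="a short, factual answer")

# wrap the signature in a predefined module such as Chain of Thought
qa_with_reasoning = dspy.ChainOfThought(QueryAnswer)
# qa_with_reasoning(query="What is DSPy?")  # requires an LM configured via dspy.settings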

To construct an app/program using DSPy, we follow a modular yet step-by-step approach (as shown in figure 1 (right)). We first define our task, which helps us clearly specify our program’s signature (input and output specifications). This is followed by building a pipeline program that makes use of one or more abstracted prompt modules and a language model module, in addition to retrieval model modules. Once we have all of this in place, we prepare some examples along with the required metrics to evaluate our setup, which are used by optimizers and assertion components to compile a robust app.
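As a rough sketch of that last compilation step (not used in the example later in this post), a few annotated examples and a simple metric can be handed to one of DSPy’s optimizers; the example data and exact_match_metric below are hypothetical placeholders:

import dspy
from dspy.teleprompt import BootstrapFewShot

# a tiny annotated dataset (hypothetical examples)
trainset = [
    dspy.Example(
        query="What does RLHF stand for?",
        answer="Reinforcement Learning from Human Feedback"
    ).with_inputs("query"),
]

# a simple metric: does the gold answer appear in the prediction?
def exact_match_metric(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# the optimizer uses the examples and metric to tune a dspy.Module
optimizer = BootstrapFewShot(metric=exact_match_metric)
# compiled_program = optimizer.compile(my_program, trainset=trainset)  # needs a configured LM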

Langfuse is an LLM Engineering platform designed to empower developers in building, managing, and optimizing LLM-powered applications. While it offers both managed and self-hosted solutions, we’ll concentrate on the self-hosting option in this post, giving you complete control over your LLM infrastructure.

Key Highlights of Langfuse Setup

Langfuse equips you with a set of powerful tools to streamline the LLM development workflow:

  • Prompt Management: Effortlessly version and retrieve prompts, ensuring reproducibility and facilitating experimentation (see the short sketch after this list).
  • Tracing: Gain deep visibility into your LLM applications with detailed traces, enabling efficient debugging and troubleshooting. The intuitive out-of-the-box UI enables teams to annotate model interactions to develop and evaluate training datasets.
  • Metrics: Track crucial metrics such as cost, latency, and token usage, empowering you to optimize performance and control expenses.
  • Evaluation: Capture user feedback, annotate LLM responses, and even set up evaluation functions to continuously assess and improve your models.
  • Datasets: Manage and organize datasets derived from your LLM applications, facilitating further fine-tuning and model enhancement.
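
To give a flavour of the first two highlights from code, here is a minimal sketch using the Langfuse Python SDK (the prompt name and the decorated function are illustrative assumptions, not part of the workshop code):

from langfuse import Langfuse
from langfuse.decorators import observe

langfuse = Langfuse()  # reads the LANGFUSE_* environment variables

# version a prompt once, then retrieve it anywhere it is needed
langfuse.create_prompt(
    name="workshop-qa",
    prompt="Answer the question using the workshop notes: {{question}}",
    labels=["production"],
)
prompt = langfuse.get_prompt("workshop-qa")

# trace a function end-to-end with a single decorator
@observe()
def answer_question(question: str) -> str:
    # ... call your LLM of choice here ...
    return "stub answer"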

Effortless Setup

Langfuse’s self-hosting solution is remarkably easy to set up, leveraging a docker-based architecture that you can quickly spin up using docker compose. This streamlined approach minimizes deployment complexities and allows you to concentrate on building your LLM applications.

Framework Compatibility

Langfuse seamlessly integrates with popular LLM frameworks like LangChain, LlamaIndex, and, of course, DSPy, making it a versatile tool across a wide range of LLM development setups.

By integrating Langfuse into your DSPy applications, you unlock a wealth of observability capabilities that enable you to monitor, analyze, and optimize your models in real time.

Integrating Langfuse into Your DSPy App

The integration process is straightforward and involves instrumenting your DSPy code with Langfuse’s SDK.

import dspy
from dsp.trackers.langfuse_tracker import LangfuseTracker

# configure tracker
langfuse = LangfuseTracker()

# instantiate openai client
openai = dspy.OpenAI(
    model='gpt-4o-mini',
    temperature=0.5,
    max_tokens=1500
)

# dspy predict supercharged with automatic langfuse trackers
openai("What's DSPy?")

Gaining Insights with Langfuse

Once integrated, Langfuse provides a lot of actionable insights into your DSPy application’s behavior:

  • Trace-Based Debugging: Follow the execution flow of your DSPy programs, pinpoint bottlenecks, and discover areas for improvement.
  • Performance Monitoring: Track key metrics like latency and token usage to ensure optimal performance and cost-efficiency.
  • User Interaction Evaluation: Understand how users interact with your LLM app, identify common queries, and uncover opportunities for enhancement.
  • Data Collection & Fine-Tuning: Collect and annotate LLM responses, building valuable datasets for further fine-tuning and model refinement (a short sketch of pushing such annotated pairs into a Langfuse dataset follows this list).
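
For that last point, here is a minimal sketch, assuming the Langfuse Python SDK, of pushing a reviewed question/answer pair into a named dataset (the dataset name and the item contents are hypothetical):

from langfuse import Langfuse

langfuse = Langfuse()  # reads the LANGFUSE_* environment variables

# create (or reuse) a dataset for curated Q&A pairs
langfuse.create_dataset(name="workshop-qa-curated")

# add a reviewed interaction as a dataset item
langfuse.create_dataset_item(
    dataset_name="workshop-qa-curated",
    input={"question": "What is RLHF?"},
    expected_output={"answer": "Reinforcement Learning from Human Feedback"},
)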

Use Cases Amplified

The combination of DSPy and Langfuse is especially valuable in the following scenarios:

  • Complex Pipelines: When dealing with complex DSPy pipelines involving multiple modules, Langfuse’s tracing capabilities become indispensable for debugging and understanding the flow of data.
  • Production Environments: In production settings, Langfuse’s monitoring features ensure your LLM app runs smoothly, providing early warnings of potential issues while keeping track of the costs involved.
  • Iterative Development: Langfuse’s evaluation and dataset management tools facilitate data-driven iteration, allowing you to continuously refine your LLM app based on real-world usage (see the feedback-scoring sketch after this list).
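
One simple way to close that feedback loop is to attach user feedback as a score on an existing trace; a minimal sketch with the Langfuse Python SDK (the trace id and score name are placeholders) could look like this:

from langfuse import Langfuse

langfuse = Langfuse()  # reads the LANGFUSE_* environment variables

# attach user feedback to a trace recorded earlier
langfuse.score(
    trace_id="replace-with-a-real-trace-id",
    name="user-feedback",
    value=1,  # e.g. 1 = helpful, 0 = not helpful
    comment="Answer matched the workshop notes",
)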

To truly showcase the power and versatility of DSPy combined with the excellent monitoring capabilities of Langfuse, I’ve recently applied them to a novel dataset: my recent LLM workshop GitHub repository. This recent full-day workshop contains plenty of material to get you started with LLMs. The aim of this Q&A bot was to help participants during and after the workshop with answers to a number of NLP- and LLM-related topics covered in the workshop. This “meta” use case not only demonstrates the practical application of these tools but also adds a touch of self-reflection to our exploration.

The Task: Constructing a Q&A System

For this exercise, we’ll leverage DSPy to build a Q&A system capable of answering questions about the content of my workshop (notebooks, markdown files, etc.). This task highlights DSPy’s ability to process and extract information from textual data, a crucial capability for a wide range of LLM applications. Imagine having a personal AI assistant (or co-pilot) that can help you recall details from your past work, identify patterns, and even surface forgotten insights! It also presents a strong case for how such a modular setup can be easily extended to any other textual dataset with little to no effort.

Let us begin by setting up the required objects for our program.

import os
import dspy
import chromadb
from chromadb.utils import embedding_functions
from dsp.trackers.langfuse_tracker import LangfuseTracker

config = {
    'LANGFUSE_PUBLIC_KEY': 'XXXXXX',
    'LANGFUSE_SECRET_KEY': 'XXXXXX',
    'LANGFUSE_HOST': 'http://localhost:3000',
    'OPENAI_API_KEY': 'XXXXXX',
    'OPENAI_BASE_URL': 'XXXXXX',
    'OPENAI_PROVIDER': 'XXXXXX',
    'CHROMA_DB_PATH': './chromadb/',
    'CHROMA_COLLECTION_NAME': "supercharged_workshop_collection",
    'CHROMA_EMB_MODEL': 'all-MiniLM-L6-v2'
}

# set environment variables for Langfuse and OpenAI
os.environ["LANGFUSE_PUBLIC_KEY"] = config.get('LANGFUSE_PUBLIC_KEY')
os.environ["LANGFUSE_SECRET_KEY"] = config.get('LANGFUSE_SECRET_KEY')
os.environ["LANGFUSE_HOST"] = config.get('LANGFUSE_HOST')
os.environ["OPENAI_API_KEY"] = config.get('OPENAI_API_KEY')

# setup Langfuse tracker
langfuse_tracker = LangfuseTracker(session_id='supercharger001')

# instantiate language model for DSPy
llm_model = dspy.OpenAI(
    api_key=config.get('OPENAI_API_KEY'),
    model='gpt-4o-mini'
)

# instantiate chromadb client and embedding function
chroma_emb_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name=config.get('CHROMA_EMB_MODEL')
)
client = chromadb.HttpClient()

# setup chromadb collection
collection = client.create_collection(
    config.get('CHROMA_COLLECTION_NAME'),
    embedding_function=chroma_emb_fn,
    metadata={"hnsw:space": "cosine"}
)

Once we have these clients and trackers in place, let us quickly add some documents to our collection (refer to this notebook for a detailed walkthrough of how I prepared this dataset in the first place).

# Add to collection
# note: `nb_scraper.notebook_md_dict` and `doc_ids` are prepared in the
# dataset-preparation notebook referenced above
collection.add(
    documents=[v for _, v in nb_scraper.notebook_md_dict.items()],
    ids=doc_ids,  # should be unique for every doc
)
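
If you are not following that notebook, one plausible way to construct the unique ids is to combine each document’s index with its notebook name; this matches the id format seen in the retrieval output below (e.g. 6_module_03_03_RLHF_phi2) but is an assumption rather than the workshop code:

# hypothetical reconstruction of `doc_ids` from the scraped notebook names
doc_ids = [
    f"{idx}_{name}"
    for idx, name in enumerate(nb_scraper.notebook_md_dict.keys())
]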

The next step is to simply connect our ChromaDB retriever to the DSPy framework. The following snippet creates an RM object and tests whether retrieval works as intended.

from dspy.retrieve.chromadb_rm import ChromadbRM
from IPython.display import display, Markdown

retriever_model = ChromadbRM(
    config.get('CHROMA_COLLECTION_NAME'),
    config.get('CHROMA_DB_PATH'),
    embedding_function=chroma_emb_fn,
    client=client,
    k=5
)

# Test Retrieval
results = retriever_model("RLHF")
for result in results:
    display(Markdown(f"__Document__::{result.long_text[:100]}... \n"))
    display(Markdown(f">- __Document id__::{result.id} \n>- __Document score__::{result.score}"))

The output looks promising given that, without any intervention, ChromaDB is able to fetch the most relevant documents.

Document::# Quick Overview of RLFH

The performance of Language Models until GPT-3 was kind of fantastic as-is. ...

- Document id::6_module_03_03_RLHF_phi2
- Document score::0.6174977412306334

Document::# Getting Started : Text Representation Image

The NLP domain ...

- Document id::2_module_01_02_getting_started
- Document score::0.8062083377747705

Document::# Text Generation ...

- Document id::3_module_02_02_simple_text_generator
- Document score::0.8826038964887366

Document::# Image DSPy: Beyond Prompting
...

- Document id::12_module_04_05_dspy_demo
- Document score::0.9200280698248913

The final step is to piece all of this together into a DSPy program. For our simple Q&A use case we prepare a standard RAG program leveraging ChromaDB as our retriever and Langfuse as our tracker. The following snippet presents the PyTorch-like approach of developing LLM-based apps without worrying about brittle prompts!

# RAG Signature
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    query = dspy.InputField()
    answer = dspy.OutputField(desc="often less than 50 words")

# RAG Program
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, query):
        context = self.retrieve(query).passages
        prediction = self.generate_answer(context=context, query=query)
        return dspy.Prediction(context=context, answer=prediction.answer)

# make the LM and retriever available to DSPy modules
dspy.settings.configure(lm=llm_model, rm=retriever_model)

# compile a RAG
# note: we are not using any optimizers for this example
compiled_rag = RAG()

Phew! Wasn’t that quick and easy to do? Let us now put this into action using a few sample questions.

my_questions = [
    "List the models covered in module03",
    "Brief summary of module02",
    "What is LLaMA?"
]

for query in my_questions:
    # Get the prediction. This contains `pred.context` and `pred.answer`.
    pred = compiled_rag(query)

    display(Markdown(f"__Question__: {query}"))
    display(Markdown(f"__Predicted Answer__: _{pred.answer}_"))
    display(Markdown("__Retrieved Contexts (truncated):__"))
    for idx, cont in enumerate(pred.context):
        print(f"{idx+1}. {cont[:200]}...")
        print()
    display(Markdown('---'))

The output is indeed quite on point and serves the purpose of being an assistant for this workshop material, answering questions and guiding the attendees nicely.
