Anatomy of LLM-Based Chatbot Applications: Monolithic vs. Microservice Architectural Patterns

A Practical Guide to Building Monolithic and Microservice Chatbot Applications with Streamlit, Huggingface, and FastAPI

Image generated by the author using Midjourney V5.1 with the prompt: “isometric highly realistic view of a laptop, screen has the image of a brilliant, multi coloured rubik’s cube that’s illuminated from inside, brilliant, warm, cheerful lighting. 8k, hdr, unreal engine”

With the arrival of OpenAI’s ChatGPT, chatbots have exploded in popularity! Every business is looking for ways to incorporate ChatGPT into its customer-facing and internal applications. And with open-source chatbots catching up so rapidly that even Google engineers appear to have concluded that they and OpenAI have “no moat,” there has never been a better time to be in the AI industry!

As a Data Scientist building such an application, one of the critical decisions is choosing between a monolithic and a microservices architecture. Both architectures have pros and cons; ultimately, the choice depends on the business’s needs, such as scalability and ease of integration with existing systems. In this blog post, we’ll explore the differences between these two architectures with live code examples using Streamlit, Huggingface, and FastAPI!

First, create a new conda environment and install the necessary libraries.

# Create and activate a conda environment
conda create -n hf_llm_chatbot python=3.9
conda activate hf_llm_chatbot

# Install the necessary libraries
pip install streamlit streamlit-chat "fastapi[all]" "transformers[torch]"

Monolithic architecture

In a monolithic application, all the code related to the application is tightly coupled into a single, self-contained unit. Image by the author

Monolithic architecture is an approach that involves building the entire application as a single, self-contained unit. This approach is simple and easy to develop, but it can become complex as the application grows. In a monolithic architecture, all application components, including the user interface, business logic, and data storage, are tightly coupled, so any change made to one part of the app can ripple through the entire application.

Let’s use Huggingface and Streamlit to build a monolithic chatbot application below. We’ll use Streamlit to build the frontend user interface, while Huggingface provides an extremely easy-to-use, high-level abstraction over various open-source LLM models called pipelines.
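Before wiring the pipeline into a Streamlit app, here is a minimal, standalone sketch of how the conversational pipeline works (assuming a transformers version from around the time of writing, such as the 4.28 release pinned in the docs link later in this post, which still ships the “conversational” task and the Conversation class; the example messages are placeholders):

# Minimal sketch of the Huggingface conversational pipeline
# (assumes a transformers version, e.g. 4.28, that still provides the
# "conversational" task and the Conversation class).
from transformers import Conversation, pipeline

# Download the model weights and wrap them in a high-level pipeline
chatbot = pipeline("conversational", model="facebook/blenderbot-400M-distill")

# Start a conversation with a first user message (placeholder text)
conversation = Conversation("Hi! What do you think of chatbots?")
conversation = chatbot(conversation)
print(conversation.generated_responses[-1])

# Add a follow-up message to the same Conversation object;
# the pipeline tracks the full history of inputs and responses for us
conversation.add_user_input("Can you tell me more?")
conversation = chatbot(conversation)
print(conversation.generated_responses[-1])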

First, let’s create a file utils.py containing three helper functions common to the frontend in both the monolithic and microservices architectures.

  1. clear_conversation(): This function deletes all the stored session_state variables in the Streamlit frontend. We use it to clear the entire chat history and start a new chat thread.
  2. display_conversation(): This function uses the streamlit_chat library to create a beautiful chat interface with the entire chat thread displayed on screen from the newest to the oldest message. Since the Huggingface pipelines API stores past_user_inputs and generated_responses in separate lists, we also use this function to build a single interleaved_conversation list containing the entire chat thread, so we can download it if needed.
  3. download_conversation(): This function converts the entire chat thread into a pandas DataFrame and downloads it as a CSV file to your local computer.
# %%writefile utils.py
from datetime import datetime

import pandas as pd
import streamlit as st
from streamlit_chat import message


def clear_conversation():
    """Clear the conversation history."""
    if (
        st.button("🧹 Clear conversation", use_container_width=True)
        or "conversation_history" not in st.session_state
    ):
        st.session_state.conversation_history = {
            "past_user_inputs": [],
            "generated_responses": [],
        }
        st.session_state.user_input = ""
        st.session_state.interleaved_conversation = []


def display_conversation(conversation_history):
    """Display the conversation history in reverse chronological order."""
    st.session_state.interleaved_conversation = []

    for idx, (human_text, ai_text) in enumerate(
        zip(
            reversed(conversation_history["past_user_inputs"]),
            reversed(conversation_history["generated_responses"]),
        )
    ):
        # Display the messages on the frontend
        message(ai_text, is_user=False, key=f"ai_{idx}")
        message(human_text, is_user=True, key=f"human_{idx}")

        # Store the messages in a list for download
        st.session_state.interleaved_conversation.append([False, ai_text])
        st.session_state.interleaved_conversation.append([True, human_text])


def download_conversation():
    """Download the conversation history as a CSV file."""
    conversation_df = pd.DataFrame(
        reversed(st.session_state.interleaved_conversation), columns=["is_user", "text"]
    )
    csv = conversation_df.to_csv(index=False)

    st.download_button(
        label="💾 Download conversation",
        data=csv,
        file_name=f"conversation_{datetime.now().strftime('%Y%m%d%H%M%S')}.csv",
        mime="text/csv",
        use_container_width=True,
    )

Next, let’s create a single monolith.py file containing our entire monolithic application.

  1. OpenAI’s ChatGPT API costs money for every token in both the query and the response. Hence, for this small demo, I chose to use an open-source model from Huggingface called “facebook/blenderbot-400M-distill”. You can find the entire list of over 2,000 open-source models trained for the conversational task on the Huggingface model hub. For more details on the conversational task pipeline, refer to Huggingface’s official documentation. When open-source models inevitably catch up to the proprietary models from OpenAI and Google, I’m sure Huggingface will be THE platform for researchers to share those models, given how much they’ve revolutionized the field of NLP over the past few years!
  2. main(): This function builds the frontend app’s layout using Streamlit. We’ll have a button to clear the conversation and one to download it. We’ll also have a text box where the user can type their query; upon pressing enter, we’ll call the monolith_llm_response function with the user’s input. Finally, we’ll display the entire conversation on the frontend using the display_conversation function from utils.
  3. monolith_llm_response(): This function is responsible for the chatbot logic using Huggingface pipelines. First, we create a new Conversation object and initialize it with the entire conversation history up to that point. Then, we add the latest user_input to that object, and finally, we pass this conversation object to the Huggingface pipeline that we created two steps back. Huggingface automatically adds the user input and the generated response to the conversation history!
# %%writefile monolith.py
import streamlit as st
import utils
from transformers import Conversation, pipeline

# https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/pipelines#transformers.Conversation
chatbot = pipeline(
    "conversational", model="facebook/blenderbot-400M-distill", max_length=1000
)


@st.cache_data()
def monolith_llm_response(user_input):
    """Run the user input through the LLM and return the response."""
    # Step 1: Initialize the conversation history
    conversation = Conversation(**st.session_state.conversation_history)

    # Step 2: Add the latest user input
    conversation.add_user_input(user_input)

    # Step 3: Generate a response
    _ = chatbot(conversation)

    # User input and generated response are automatically added to the conversation history
    # print(st.session_state.conversation_history)


def main():
    st.title("Monolithic ChatBot App")

    col1, col2 = st.columns(2)
    with col1:
        utils.clear_conversation()

    # Get user input
    if user_input := st.text_input("Ask your question 👇", key="user_input"):
        monolith_llm_response(user_input)

    # Display the entire conversation on the frontend
    utils.display_conversation(st.session_state.conversation_history)

    # Download conversation code runs last to ensure the latest messages are captured
    with col2:
        utils.download_conversation()


if __name__ == "__main__":
    main()

That’s it! We can run this monolithic application with streamlit run monolith.py and interact with the application in a web browser! We could also quickly deploy this application as-is to a cloud service like Google Cloud Run, as described in my previous blog post, and interact with it over the web!

Monolithic Streamlit App interface. Image by the author

Microservices architecture

In a microservices application, each component is split up into its own smaller, independent service. Image by the author

Microservices architecture is an approach that involves breaking the application down into smaller, independent services. Each application component, such as the user interface, business logic, and data storage, is developed and deployed independently. This approach offers flexibility and scalability, as we can modularly add more capabilities and horizontally scale each service independently of the others by adding more instances.

Let’s split the Huggingface model inference out of our monolithic app into a separate microservice using FastAPI, and the Streamlit frontend into another microservice below. Since the backend in this demo only contains the LLM model, our backend API server is the same as the LLM service in the image above. We can directly re-use the utils.py file we created above in the frontend microservice!

First, let’s create a backend.py file that will serve as our FastAPI microservice running the Huggingface pipeline inference.

  1. We first create the pipeline object with the same model that we selected earlier, “facebook/blenderbot-400M-distill”.
  2. We then create a ConversationHistory Pydantic model so that we can receive the inputs required for the pipeline as a payload to the FastAPI service. For more information on the FastAPI request body, please take a look at the FastAPI documentation.
  3. It’s good practice to reserve the root route in APIs for a health check, so we define that route first.
  4. Finally, we define a route called /chat, which accepts the API payload as a ConversationHistory object and converts it to a dictionary. Then we create a new Conversation object and initialize it with the conversation history received in the payload. Next, we add the latest user_input to that object and pass the conversation object to the Huggingface pipeline. Finally, we return the latest generated response to the frontend.
# %%writefile backend.py
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel, Field
from transformers import Conversation, pipeline

app = FastAPI()

# https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/pipelines#transformers.Conversation
chatbot = pipeline(
    "conversational", model="facebook/blenderbot-400M-distill", max_length=1000
)


class ConversationHistory(BaseModel):
    past_user_inputs: Optional[list[str]] = []
    generated_responses: Optional[list[str]] = []
    user_input: str = Field(example="Hello, how are you?")


@app.get("/")
async def health_check():
    return {"status": "OK!"}


@app.post("/chat")
async def llm_response(history: ConversationHistory) -> str:
    # Step 0: Receive the API payload as a dictionary
    history = history.dict()

    # Step 1: Initialize the conversation history
    conversation = Conversation(
        past_user_inputs=history["past_user_inputs"],
        generated_responses=history["generated_responses"],
    )

    # Step 2: Add the latest user input
    conversation.add_user_input(history["user_input"])

    # Step 3: Generate a response
    _ = chatbot(conversation)

    # Step 4: Return the last generated response to the frontend
    return conversation.generated_responses[-1]

We can run this FastAPI app locally using uvicorn backend:app --reload, or deploy it to a cloud service like Google Cloud Run, as described in my previous blog post, and interact with it over the web! You can test the backend using the API docs that FastAPI automatically generates at the /docs route by navigating to http://127.0.0.1:8000/docs.

FastAPI docs for the backend. Image by the author
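If you prefer testing the backend from code rather than the auto-generated Swagger UI, here is a small sketch that posts a payload matching the ConversationHistory model to the /chat route (assuming the backend is running locally on port 8000 via uvicorn; the example messages are placeholders):

import requests

# Payload fields mirror the ConversationHistory Pydantic model defined in backend.py
payload = {
    "past_user_inputs": ["Hi there!"],
    "generated_responses": ["Hello! How can I help you today?"],
    "user_input": "Can you recommend a good book?",
}

# Assumes the backend is running locally via `uvicorn backend:app --reload`
response = requests.post("http://127.0.0.1:8000/chat", json=payload)
response.raise_for_status()

# The /chat route returns the latest generated response as a plain JSON string
print(response.json())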

Finally, let’s create a frontend.py file that contains the frontend code.

  1. main(): This function is almost identical to main() in the monolithic application, except for one change: we call the microservice_llm_response() function when the user enters any input.
  2. microservice_llm_response(): Since we split the LLM logic out into a separate FastAPI microservice, this function uses the conversation history stored in the session_state to post a request to the backend FastAPI service, and then appends both the user’s input and the response from the FastAPI backend to the conversation history to maintain the memory of the entire chat thread.
# %%writefile frontend.py
import requests
import streamlit as st
import utils

# Replace with the URL of your backend
app_url = "http://127.0.0.1:8000/chat"


@st.cache_data()
def microservice_llm_response(user_input):
    """Send the user input to the LLM API and return the response."""
    payload = st.session_state.conversation_history
    payload["user_input"] = user_input

    response = requests.post(app_url, json=payload)

    # Manually add the user input and generated response to the conversation history
    st.session_state.conversation_history["past_user_inputs"].append(user_input)
    st.session_state.conversation_history["generated_responses"].append(response.json())


def main():
    st.title("Microservices ChatBot App")

    col1, col2 = st.columns(2)
    with col1:
        utils.clear_conversation()

    # Get user input
    if user_input := st.text_input("Ask your question 👇", key="user_input"):
        microservice_llm_response(user_input)

    # Display the entire conversation on the frontend
    utils.display_conversation(st.session_state.conversation_history)

    # Download conversation code runs last to ensure the latest messages are captured
    with col2:
        utils.download_conversation()


if __name__ == "__main__":
    main()

That’s it! We can run this frontend application with streamlit run frontend.py and interact with the application in a web browser! As my previous blog post described, we could also quickly deploy it to a cloud service like Google Cloud Run and interact with it over the web!

Which architecture to choose?

The answer depends on the requirements of your application. A monolithic architecture can be a great place for a Data Scientist to start: build an initial proof-of-concept quickly and get it in front of business stakeholders. But, inevitably, if you plan to productionize the application, a microservices architecture is usually a better bet than a monolithic one, because it allows for more flexibility and scalability and lets different specialized developers focus on building the various components. For example, a frontend developer might use React to build the frontend, a Data Engineer might use Airflow to write the data pipelines, and an ML engineer might use FastAPI or BentoML to deploy the model-serving API with custom business logic.

Moreover, with microservices, chatbot developers can easily incorporate new features or change existing ones without affecting the entire application. This level of flexibility and scalability is crucial for businesses that want to integrate the chatbot into existing applications. Dedicated UI/UX designers, data engineers, data scientists, and ML engineers can each focus on their areas of expertise to deliver a polished product!

Conclusion

In conclusion, both monolithic and microservices architectures have pros and cons, and the choice between the two depends on the business’s specific needs. Nevertheless, I prefer a microservices architecture for chatbot applications due to its flexibility, scalability, and the fact that I can delegate frontend development to more qualified UI/UX folks 🤩.
