4 Approaches to Build on Top of Generative AI Foundational Models


If some of the terminology I use here is unfamiliar, I encourage you to read my earlier article on LLMs first.

There are teams that are employing ChatGPT or its competitors (Anthropic, Google’s Flan T5 or PaLM, Meta’s LLaMA, Cohere, AI21Labs, etc.) for real rather than for cutesy demos. Unfortunately, informative content about how they’re doing so is lost amidst marketing hype and technical jargon. As a result, I see folks who are getting started with generative AI take approaches that experts in the field will tell you are not going to pan out. This article is my attempt at organizing this space and showing you what’s working.

Photo by Sen on Unsplash

The bar to clear

The problem with many of the cutesy demos and hype-filled posts about generative AI is that they hit the training dataset: they don’t really tell you how well it will work when applied to the chaos of real human users and truly novel input. Typical software is expected to work at 99%+ reliability. For example, it was only when speech recognition crossed this accuracy bar on phrases that the market for Voice AI took off. The same goes for automated captioning, translation, etc.

I see two ways in which teams are addressing this issue in their production systems:

  • Human users are more forgiving if the UX is in a situation where they already expect to correct errors (this seems to be what helps GitHub Copilot) or where it’s positioned as being interactive and helpful but not ready to use (ChatGPT, Bing Chat, etc.)
  • Fully automated applications of generative AI are mostly in the trusted-tester stage today, and the jury is out on whether these applications are actually able to clear this bar. That said, the results are promising and trending upwards, and it’s likely only a matter of time before the bar is met.

Personally, I have been experimenting with GPT 3.5 Turbo and Google Flan-T5 with specific production use cases in mind, and learning quite a bit about what works and what doesn’t. None of my models have crossed the 99% bar. I also haven’t yet gotten access to GPT-4 or to Google’s PaLM API at the time of writing (March 2023). I’m basing this article on my experiments, on published research, and on publicly announced projects.

With all uses of generative AI, it helps to keep firmly in mind that the pretrained models are trained on web content and can be biased in multiple ways. Safeguard against those biases in your application layer.

Approach 1: Use the API Directly

The first approach is the simplest because many users encountered GPT through the interactive interface offered by ChatGPT. It seems very intuitive to try out various prompts until you get one that generates the output you want. This is why you have a lot of LinkedIn influencers publishing ChatGPT prompts that work for sales emails or whatever.

When it comes to automating this workflow, the natural method is to use the REST API endpoint of the service and directly invoke it with the final, working prompt:

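import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
# Assumption: legacy (pre-1.0) openai SDK and its Edits endpoint; adapt to the SDK you use.
response = openai.Edit.create(
    model="text-davinci-edit-001",
    input="It was so great to meet you .... ",
    instruction="Summarize the text below in the form of an email that is 5 sentences or less.",
)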

However, this approach doesn’t lend itself to operationalization. There are several reasons:

  1. The underlying models keep improving. Sudden changes in the deployed models broke many production workloads, and people learned from that experience. ML workloads are brittle enough already; adding further points of failure in the form of prompts that are tuned to specific models is not wise.
  2. It’s rare that the instruction and input are plain strings as in the example above. Most often, they include variables that come from users. These variables have to be incorporated into the prompts and inputs. And as any programmer knows, injection by string concatenation is rife with security problems. You put yourself at the mercy of the guardrails placed around the Generative AI API when you do this. As when guarding against SQL injection, it’s better to use an API that handles variable injection for you.
  3. It’s rare that you will be able to get a prompt to work in one shot. More common is to send multiple prompts to the model, and get the model to modify its output based on these prompts. These prompts themselves may have some human input (such as follow-up inputs) embedded in the workflow. Also common is for the prompts to provide a few examples of the desired output (called few-shot learning).
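The second reason can be made concrete with a minimal sketch that uses only the Python standard library. The helper name `build_prompt` and the `<<< >>>` delimiters are hypothetical choices for illustration, not part of any vendor API. Delimiting user text this way keeps instructions and data visibly separate, but it does not by itself prevent prompt injection.

```python
from string import Formatter


def build_prompt(template: str, **variables: str) -> str:
    """Fill a prompt template, refusing missing or unexpected variables."""
    # Collect the placeholder names that the template actually declares.
    expected = {name for _, name, _, _ in Formatter().parse(template) if name}
    supplied = set(variables)
    if expected != supplied:
        raise ValueError(
            f"missing={sorted(expected - supplied)}, "
            f"unexpected={sorted(supplied - expected)}"
        )
    # Wrap each user-supplied value in delimiters so that it reads as data
    # rather than as part of the instruction (a hypothetical convention).
    wrapped = {k: f"<<<{v.strip()}>>>" for k, v in variables.items()}
    # str.format does not re-parse substituted values, so braces typed by
    # a user cannot introduce new placeholders.
    return template.format(**wrapped)


TEMPLATE = "Summarize the text below as an email of 5 sentences or less.\nText: {body}"

prompt = build_prompt(TEMPLATE, body="It was so great to meet you ....")
print(prompt)
```

Keeping the template fixed and validating the variable names means a typo in a variable name fails loudly instead of silently sending a half-filled prompt to the model.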

A way to resolve all three of these problems is to use langchain.

Approach 2: Use langchain

Langchain is rapidly becoming the library of choice that lets you invoke LLMs from different vendors, handle variable injection, and do few-shot training. Here’s an example of using langchain:

from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

# Each example pairs a question with a worked, step-by-step answer.
examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
]

# Template used to render each individual example.
example_prompt = PromptTemplate(input_variables=["question", "answer"],
                                template="Question: {question}\n{answer}")

# The few-shot prompt renders every example, then appends the new question.
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))

I strongly recommend using langchain vs. using a vendor’s API directly. Then, make sure that everything you do works with at least two APIs, or use an LLM checkpoint that will not change under you. Either of these approaches will keep your prompts and code from being brittle to changes in the underlying LLM. (Here, I’m using API to mean a managed LLM endpoint.)

Langchain today supports APIs from OpenAI, Cohere, HuggingFace Hub (and hence Google Flan-T5), etc. and LLMs from AI21, Anthropic, OpenAI, HuggingFace Hub, etc.
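The advice to check every prompt against at least two APIs can be sketched as a small portability test. Everything here is hypothetical: the two "vendors" are stub functions standing in for real managed endpoints, and `check_prompt_portability` is a name made up for this illustration rather than a langchain function.

```python
from typing import Callable, Dict

# A "backend" is any callable that maps a prompt string to a completion string.
LLMBackend = Callable[[str], str]


def fake_vendor_a(prompt: str) -> str:
    # Stand-in for a call to one managed LLM endpoint.
    return "Paris"


def fake_vendor_b(prompt: str) -> str:
    # Stand-in for a second vendor; note the different formatting of its answer.
    return " paris.\n"


def normalize(completion: str) -> str:
    # Remove whitespace, a trailing period, and case differences between vendors.
    return completion.strip().rstrip(".").lower()


def check_prompt_portability(prompt: str, expected: str,
                             backends: Dict[str, LLMBackend]) -> Dict[str, bool]:
    """Run the same prompt on every backend and report which ones pass."""
    return {name: normalize(call(prompt)) == expected
            for name, call in backends.items()}


results = check_prompt_portability(
    "What is the capital of France? Answer with one word.",
    expected="paris",
    backends={"vendor_a": fake_vendor_a, "vendor_b": fake_vendor_b},
)
print(results)
```

In a real project the stubs would be replaced by thin wrappers around whichever two endpoints you use, and the check would run as part of the regular test suite so that a model update on either side is noticed early.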

Approach 3: Finetune the Generative AI Chain

This is the leading-edge approach in that it’s the one I see used by most of the sophisticated production applications of generative AI. As just an example (no endorsement), finetuning is how a startup consisting of Stanford PhDs is approaching standard enterprise use cases like SQL generation and record matching.

To understand the rationale behind this approach, it helps to know that there are four machine learning models that underpin ChatGPT (or its competitors):

  1. A Large Language Model (LLM) is trained to predict the next word of text given the previous words. It does this by learning word associations and patterns on a vast corpus of documents. The model is large enough that it learns these patterns in different contexts.
  2. A Reinforcement Learning based on Human Feedback Model (RL-HF) is trained by showing humans examples of generated text, and asking them to approve text that is pleasing to read. The reason this is needed is that an LLM’s output is probabilistic: it doesn’t predict a single next word; instead, it predicts a set of words, each of which has a certain probability of coming next. The RL-HF uses human feedback to learn how to select the continuation that will generate the text that appeals to humans.
  3. An Instruction Model is a supervised model that’s trained by showing prompts (“generate a sales email that proposes a demo to the engineering leadership”) and training the model on examples of sales emails.
  4. A Context Model is trained to carry on a conversation with the user, allowing them to craft the output through successive prompts.

In addition, there are guardrails (filters on both the input and output). The model declines to answer certain kinds of queries, and retracts certain answers. In practice, these are both machine learning models that are constantly updated.
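To illustrate what "probabilistic output" means in the second model above, here is a minimal sketch using only the standard library. The scores are made up for the example and the code is not how any particular LLM is implemented; it only shows raw scores being turned into a probability distribution over next words, from which one continuation is drawn.

```python
import math
import random
from typing import Dict


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    scaled = {w: s / temperature for w, s in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - peak) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def sample_next_word(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one continuation according to its probability."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Made-up scores for the word that follows "Thank you for your".
logits = {"time": 2.0, "email": 1.5, "patience": 1.0, "banana": -3.0}
probs = softmax(logits)
rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
samples = [sample_next_word(probs, rng) for _ in range(5)]
print(probs)
print(samples)
```

Because several continuations carry non-trivial probability, two runs with different seeds can produce different text; the human-feedback step exists to steer which of these plausible continuations is preferred.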

Step 2: How RL-HF works. Image from Stiennon et al, 2020

There are open-source generative AI models (Meta’s LLaMA, Google’s Flan-T5) which allow you to pick up at any of the above steps (e.g. use steps 1–2 from the released checkpoint, train 3 on your own data, don’t do 4). Note that LLaMA doesn’t permit commercial use, and Flan-T5 is a year old (so you are compromising on quality). To know where to break off, it helps to understand the cost/benefit of each stage:

  • If your application uses very different jargon and words, it may be helpful to build an LLM from scratch on your own data (i.e., start at step 1). The problem is that you may not have enough data, and even if you have enough data, the training is going to be expensive (on the order of 3–5 million dollars per training run). This seems to be what Salesforce has done with the generative AI they use for developers.
  • The RL-HF model is trained to appeal to a group of testers who may not be subject-matter experts, or representative of your own users. If your application requires subject-matter expertise, you may be better off starting with an LLM and branching off from step 2. The dataset you need for this is much smaller: Stiennon et al 2020 used 125k documents and presented a pair of outputs for each input document in each iteration (see diagram). So, you need human labelers on standby to rate about 1 million outputs. Assuming that a labeler takes 10 min to rate each pair of documents, the cost is that of 250 human-months of labor per training run. I’d estimate $250k to $2m depending on location and skillset.
  • ChatGPT is trained to respond to thousands of different prompts. Your application, on the other hand, probably requires only one or two specific ones. It can be convenient to train a model such as Google Flan-T5 on your specific instruction and input. Such a model can be much smaller (and therefore cheaper to deploy). This advantage in serving costs explains why step 3 is the most common point of branching off. It’s possible to fine-tune Google Flan-T5 for your specific task with about 10k documents using HuggingFace and/or Keras. You’d do this on your usual ML framework such as Databricks, Sagemaker, or Vertex AI and use the same services to deploy the trained model. Because Flan-T5 is a Google model, GCP makes training and deployment very easy by providing pre-built containers in Vertex AI. The cost would be perhaps $50 or so.
  • Theoretically, it’s possible to train a different way to maintain conversational context. However, I haven’t seen this in practice. What most people do instead is to use a conversational agent framework like Dialogflow that already has an LLM built into it, and design a custom chatbot for their application. The infra costs are negligible and you don’t need any AI expertise, just domain knowledge.
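The labeling-effort estimate in the second bullet is a short piece of arithmetic, sketched below so the inputs can be varied. The figure of 160 working hours per month is an assumption introduced for this sketch, and the function name is hypothetical. With 10 minutes per pair the formula gives roughly 520 person-months; with about 5 minutes per pair it gives roughly 260, close to the 250 quoted above. Treat the result as an order-of-magnitude guide that is sensitive to these inputs.

```python
def labeling_effort(outputs_to_rate: int,
                    minutes_per_pair: float,
                    hours_per_month: float) -> dict:
    """Estimate human effort for pairwise rating of model outputs."""
    pairs = outputs_to_rate / 2              # each comparison covers two outputs
    hours = pairs * minutes_per_pair / 60
    return {"pairs": pairs, "hours": hours,
            "person_months": hours / hours_per_month}


# Figures from the text: about 1 million outputs, 10 minutes per pair.
# 160 working hours per month is an assumption made for this sketch.
estimate = labeling_effort(1_000_000, minutes_per_pair=10, hours_per_month=160)
print(estimate)

# Same workload with a faster rating time of 5 minutes per pair.
faster = labeling_effort(1_000_000, minutes_per_pair=5, hours_per_month=160)
print(faster)
```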

It is possible to break off at any of these stages. Limiting my examples to publicly published work in medicine:

  1. This Nature article builds a custom 8.9-billion parameter LLM from 90 billion words extracted from medical records (i.e., they start from step 1). For comparison, PaLM is 540 billion parameters and the “small/efficient” PaLM is 62 billion parameters. Obviously, cost is a constraint in going much bigger in your custom language model.
  2. This MIT CSAIL study forces the model to closely hew to existing text and also does instruction fine-tuning (i.e., they are starting from step 2).
  3. DeepMind’s MedPaLM starts from an instruction-tuned variation of PaLM called Flan-PaLM (i.e. it starts after step 3). They report that 93% of healthcare professionals rated the AI as being on par with human answers.

My advice is to choose where to break off based on how different your application space is from the generic web text on which the foundational models are trained. Which model should you fine-tune? Currently, Google Flan T5 is the most sophisticated fine-tuneable model available and open for commercial use. For non-commercial uses, Meta’s LLaMA is the most sophisticated model available.

A word of caution though: when you tap into the chain using open-source models, the guardrail filters won’t exist, so you will need to add your own toxicity safeguards. One option is to use the detoxify library. Make sure to include toxicity filtering around any API endpoint in production; otherwise, you’ll end up having to take it back down. API gateways can be a convenient way to make sure that you are doing this for all your ML model endpoints.
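A minimal sketch of such a safeguard follows. The wrapper, the refusal message, and the toy scorer are all hypothetical and exist only so the example runs without downloading a model. In practice the `score` argument could wrap a real classifier; with detoxify, my understanding is that `Detoxify("original").predict(text)["toxicity"]` returns a score between 0 and 1, but check the library's README before relying on that.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."


def with_toxicity_filter(generate: Callable[[str], str],
                         score: Callable[[str], float],
                         threshold: float = 0.5) -> Callable[[str], str]:
    """Wrap a text generator so that input and output are both screened."""
    def guarded(prompt: str) -> str:
        if score(prompt) >= threshold:       # filter on the input
            return REFUSAL
        completion = generate(prompt)
        if score(completion) >= threshold:   # filter on the output
            return REFUSAL
        return completion
    return guarded


# Toy stand-ins so the sketch runs without any model download.
def toy_score(text: str) -> float:
    return 1.0 if "idiot" in text.lower() else 0.0


def toy_generate(prompt: str) -> str:
    return f"Echo: {prompt}"


safe_generate = with_toxicity_filter(toy_generate, toy_score)
print(safe_generate("Hello there"))
print(safe_generate("You idiot"))
```

Screening both directions mirrors the guardrails described earlier: the input check declines certain queries, and the output check retracts certain answers.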

Approach 4: Simplify the issue

There are smart ways to reframe the problem you are solving in such a way that you can use a Generative AI model (as in Approach 3) but avoid problems with hallucination, etc.

For example, suppose you want to do question-answering. You could start with a powerful LLM and then struggle to “tame” the wild beast so that it does not hallucinate. A much simpler approach is to reframe the problem. Change the model from one that predicts the output text to a model that has three outputs: the URL of a document, the starting position within that document, and the length of text. That’s what Google Search is doing here:

Google’s Q&A model predicts a URL, starting position, and length of text. This avoids problems with hallucination.

At worst, the model will show you irrelevant text. What it cannot do is hallucinate, because you don’t allow it to actually predict text.
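The three-output framing can be sketched with a small data structure. The class name, the one-document corpus, and the predicted values are hypothetical; the point is only that whatever the model predicts, the rendered answer is a slice of an existing document.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ExtractiveAnswer:
    url: str      # which document the answer comes from
    start: int    # character offset where the answer begins
    length: int   # number of characters in the answer


def render(answer: ExtractiveAnswer, corpus: Dict[str, str]) -> str:
    """Look up the span; the result is always a substring of a real document."""
    document = corpus[answer.url]
    return document[answer.start:answer.start + answer.length]


# Hypothetical one-document corpus and a hypothetical model prediction.
corpus = {"https://example.com/turing": "Alan Turing was born in London in 1912."}
prediction = ExtractiveAnswer(url="https://example.com/turing", start=24, length=6)
print(render(prediction, corpus))
```

A wrong prediction here would select a different, possibly irrelevant, slice of the document, which is the worst case described above.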

A Keras sample that follows this approach tokenizes the inputs and context (the document that you are finding the answer within):

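from transformers import AutoTokenizer

model_checkpoint = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# `examples` is a batch with parallel lists under "question" and "context".
# Leading whitespace carries no information, so strip it before tokenizing.
examples["question"] = [q.lstrip() for q in examples["question"]]
examples["context"] = [c.lstrip() for c in examples["context"]]
tokenized_examples = tokenizer(
    examples["question"],
    examples["context"],
    truncation="only_second",  # illustrative setting: truncate only the context
)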

and then passes the tokens into a Keras regression model whose first layer is the Transformer model that takes in these tokens and that outputs the position of the answer within the “context” text:

from transformers import TFAutoModelForQuestionAnswering
import tensorflow as tf
from tensorflow import keras

model = TFAutoModelForQuestionAnswering.from_pretrained(model_checkpoint)
optimizer = keras.optimizers.Adam(learning_rate=5e-5)
# No loss is passed: Hugging Face TF models fall back to their built-in task loss.
model.compile(optimizer=optimizer)
model.fit(train_set, validation_data=validation_set, epochs=1)

During inference, you get the predicted locations:

inputs = tokenizer([context], [question], return_tensors="np")
outputs = model(inputs)
start_position = tf.argmax(outputs.start_logits, axis=1)
end_position = tf.argmax(outputs.end_logits, axis=1)

You’ll note that the sample doesn’t predict the URL: the context is assumed to be the result of a typical search query (such as that returned by a matching engine or vector database), and the sample model only does extraction. However, you can build the search into the model as well by having a separate layer in Keras.


There are four approaches that I see being used to build production applications on top of generative AI foundational models:

  1. Use the REST API of an all-in model such as GPT-4 for one-shot prompts.
  2. Use langchain to abstract away the LLM, input injection, multi-turn conversations, and few-shot learning.
  3. Finetune on your custom data by tapping into the set of models that comprise an end-to-end generative AI model.
  4. Reframe the problem into a form that avoids the dangers of generative AI (bias, toxicity, hallucination).

Approach #3 is what I see most commonly used by sophisticated teams.

