MPT-7B, The Time of Commercially Usable Language Models Has Come

An introduction and development guide for open-source LLM — MPT-7B

Image generated by Midjourney

Mosaic is a startup specializing in AI models, and in this article we'll introduce their newly released MPT-7B model series. These are fully open-source, commercially usable models trained from scratch on 1 trillion tokens in 9.5 days, a development process far more complex and expensive than that of the models fine-tuned from LLaMA or Pythia that I introduced in my previous articles. This is a remarkable feat for a startup, especially considering that they trained on as much as a trillion tokens at roughly 200K USD in hardware cost. The base model's capability is comparable to the 7-billion-parameter LLaMA model, and alongside it they have also fine-tuned other models to support the developer community, including an Instruct model, a Chat model, and a StoryWriter model.

The MPT-7B model is the first open-source language model with performance comparable to the LLaMA-7B model in Mosaic's evaluation tests, and based on the results and training scale it looks higher quality and more stable than Pythia, StableLM, and many other open-source models so far. Other models, such as those from RedPajama and OpenLLaMA, have only been snapshots of models still in training that haven't been fully released yet. This is the first one we have actually received as a complete model, benchmarked to show that it is largely on par with LLaMA. MPT-7B is also the first such model available for commercial use, and we can fine-tune it ourselves on our own data for business purposes. This is important progress in the field of AI and machine learning, and I can't wait to think about a potential AI business built on it without a large investment.

Zero-shot accuracy of MPT-7B vs. LLaMA-7B vs. other open source models on academic tasks (From Mosaic).

One of the great things about the MPT-7B release is that it includes the StoryWriter model MPT-7B-StoryWriter-65k+, which was trained with the ALiBi architecture, allowing the context to be extended to extreme lengths. If you wanted to fine-tune a long-context model, you could do that today. This is exactly what they have done with the StoryWriter model, where they took the base model and fine-tuned it with a context length of 65,000+ tokens. To put that into perspective, the original LLaMA model only accepts 2,048 tokens, StableLM was trained for 4,096 tokens, and for ChatGPT and GPT-4 the numbers are 4,096 and 8,000-32,000 respectively, depending on which version you have access to.

The 65K+-token input feature makes MPT far more impressive among these models

There is an impressive example on their blog showing that the Mosaic team once prompted the model with the entire book of The Great Gatsby, and it successfully generated a decent epilogue based on the 67,873-token input. I have never seen any other closed or open model capable of doing that, which makes me wonder which method will be cheaper for in-context learning in the future: OpenAI's embeddings or the StoryWriter model. If you are interested in in-context learning, please refer to my previous article.
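If you want to experiment with the extended context yourself, the StoryWriter weights load through the standard transformers API, and the maximum sequence length can be raised through the model config. The snippet below is a minimal sketch following the pattern documented on the model's HuggingFace page; the max_seq_len value is only illustrative, and a context this long requires a GPU with plenty of memory.

import torch
import transformers

# Raise the maximum sequence length; ALiBi lets the model extrapolate beyond
# the context length it was fine-tuned on.
config = transformers.AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True
)
config.max_seq_len = 65536  # illustrative value, limited mainly by GPU memory

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter",
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # MPT models use the GPT-NeoX tokenizer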

They have also trained a 7-billion-parameter instruct model, MPT-7B-Instruct, which is a short-form instruction-following model. It is fine-tuned from the base model on open-source datasets, mainly an augmentation of Databricks' Dolly-15K dataset. As a result, they get a larger instruction dataset while keeping a commercially usable license. As I discussed in the article on Dolly 2.0, the key enabler of Dolly's commercial usability is its license-clean dataset, which was not generated by other AI models like ChatGPT but created by humans. Thanks to that, when you play with it and ask it questions, you don't get the familiar "As an AI language model, I can't ..." kind of answers. However, the dataset is not as big as the ones the Vicuna and Koala models use, and the team is planning to extend the training data to make this instruct model more competitive.

An example of the Instruct model on a JSON conversion task

There is also another variant of MPT-7B called MPT-7B-Chat, which offers seamless, engaging multi-turn interactions for users. Please note that this chatbot model is the only one not allowed for commercial use.

Furthermore, the MPT-7B model's optimization stack includes FlashAttention and low-precision LayerNorm, which are part of the reason for its faster inference speed compared with other 7-billion-parameter models on the HuggingFace hub.
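If your GPU environment has the required kernels installed, the optimized attention path can be switched on through the model config. The following is a minimal sketch of that pattern as shown on the MPT-7B HuggingFace page; it assumes a CUDA GPU and that the triton FlashAttention implementation is available in your environment.

import torch
import transformers

# Ask the custom MPT model code to use its optimized (triton / FlashAttention) attention kernel
config = transformers.AutoConfig.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    config=config,
    torch_dtype=torch.bfloat16,  # low precision for faster inference and lower memory use
    trust_remote_code=True,
)
model.to(device="cuda:0")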

Here are the MPT-7B resources you can learn from:

For a model this suitable for commercial use, easy and cheap deployment should be another critical characteristic. Fortunately, MPT-7B has been engineered to be fast, easy, and inexpensive to deploy for inference, thanks to its seamless compatibility with the HuggingFace PreTrainedModel base class.

https://colab.research.google.com/drive/16D9tjggLukD38Un0hC-Gss3mrehPXng_?usp=sharing

Please feel free to copy it to your own space, but note that to run this model you need a Colab Pro account or local GPU support with the resources the pre-trained MPT-7B-Instruct requires: at minimum a T4 GPU with 15GB of memory and 22GB of RAM.
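Before loading the weights, it is worth confirming that your Colab runtime actually has a GPU attached and enough system RAM. These are standard Colab shell commands rather than part of the notebook itself:

# Show the attached GPU (expect at least a T4 with ~15GB of memory)
!nvidia-smi

# Show available system RAM (the Instruct model needs roughly 22GB)
!free -h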

Let's walk through the code in the notebook.

!pip install requests torch transformers einops

from typing import Any, Dict, Tuple
import warnings
import datetime
import os
from threading import Event, Thread
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer

import textwrap

INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)

class InstructionTextGenerationPipeline:
    ...

Above are string constants that define the keys used in the prompt format consumed by the InstructionTextGenerationPipeline class.

INSTRUCTION_KEY, RESPONSE_KEY, and END_KEY are used as keys to identify specific sections of the prompt. INTRO_BLURB is a string that provides some introductory text for the prompt. PROMPT_FOR_GENERATION_FORMAT is a string that defines the format of the prompt passed to the language model.
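For example, formatting a sample instruction with this template produces the exact text that is fed to the model:

prompt = PROMPT_FOR_GENERATION_FORMAT.format(instruction="Summarize ALiBi in one sentence.")
print(prompt)
# Below is an instruction that describes a task. Write a response that appropriately completes the request.
# ### Instruction:
# Summarize ALiBi in one sentence.
# ### Response: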

A class named InstructionTextGenerationPipeline is defined to generate text from an instruction using a pre-trained transformer language model. The class uses the transformers library to load the pre-trained model and tokenizer, and defines a __call__ method that takes an instruction string as input and generates a response string using the language model.
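The class body is omitted in the snippet above, so below is a rough, hypothetical sketch of what it looks like, based on the attributes the rest of the notebook relies on (model, tokenizer, format_instruction, and generate_kwargs). The actual implementation in the notebook differs in details such as device handling and default generation parameters.

# A hypothetical sketch of the pipeline class, not the notebook's exact code
class InstructionTextGenerationPipeline:
    def __init__(self, model_name, torch_dtype=torch.bfloat16, trust_remote_code=True):
        # Load the pre-trained MPT model and its tokenizer from the HuggingFace hub
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch_dtype, trust_remote_code=trust_remote_code
        )
        if torch.cuda.is_available():
            self.model = self.model.to("cuda")
        self.model.eval()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        # Default generation settings; process_stream() below overrides them per call
        self.generate_kwargs = {
            "max_new_tokens": 256,
            "temperature": 0.3,
            "top_p": 0.95,
            "top_k": 0,
            "do_sample": True,
            "use_cache": True,
            "eos_token_id": self.tokenizer.eos_token_id,
            "pad_token_id": self.tokenizer.eos_token_id,
        }

    def format_instruction(self, instruction):
        # Wrap the raw instruction in the prompt template defined above
        return PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction)

    def __call__(self, instruction, **generate_kwargs):
        prompt = self.format_instruction(instruction)
        input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids.to(self.model.device)
        kwargs = {**self.generate_kwargs, **generate_kwargs}
        output_ids = self.model.generate(input_ids, **kwargs)
        # Decode only the newly generated tokens, dropping the echoed prompt
        return self.tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)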

generate = InstructionTextGenerationPipeline(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
stop_token_ids = generate.tokenizer.convert_tokens_to_ids(["<|endoftext|>"])

# Define a custom stopping criteria that halts generation on the end-of-text token
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop as soon as the most recently generated token is one of the stop tokens
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

def process_stream(instruction, temperature, top_p, top_k, max_new_tokens):
    # Tokenize the formatted instruction prompt
    input_ids = generate.tokenizer(
        generate.format_instruction(instruction), return_tensors="pt"
    ).input_ids
    input_ids = input_ids.to(generate.model.device)

    # Initialize the streamer and stopping criteria
    streamer = TextIteratorStreamer(
        generate.tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
    )
    stop = StopOnTokens()

    # Use greedy decoding for very low temperatures, sampling otherwise
    if temperature < 0.1:
        temperature = 0.0
        do_sample = False
    else:
        do_sample = True

    gkw = {
        **generate.generate_kwargs,
        **{
            "input_ids": input_ids,
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": do_sample,
            "top_p": top_p,
            "top_k": top_k,
            "streamer": streamer,
            "stopping_criteria": StoppingCriteriaList([stop]),
        },
    }

    response = ''

    # Run generation in a background thread so we can consume the stream as it arrives
    def generate_and_signal_complete():
        generate.model.generate(**gkw)

    t1 = Thread(target=generate_and_signal_complete)
    t1.start()

    # Collect the streamed tokens into the final response string
    for new_text in streamer:
        response += new_text

    return response

Now we can call the process_stream() function with the proper arguments and see how the model responds to our instructions.

instruction = "Write a travel blog about a 3-day trip to The Philippines. You need to describe it day by day."
temperature = 0.3
top_p = 0.95
top_k = 0
max_new_tokens = 2000
response = process_stream(instruction, temperature, top_p, top_k, max_new_tokens)

wrapped_text = textwrap.fill(response, width=100)
print(wrapped_text + '\n\n')

The travel blog for 3 days in the Philippines looks quite nice and reads very much like a typical traveler's account.

Response from MPT-7B-Instruct

You can try many more instructions once your Colab or local machine has successfully deployed the model, and adjust the parameters in the code to see different behaviors. From my tests so far, the text and code completion is good enough, but the reasoning and math have not matured enough to handle real business instruction tasks. From Mosaic's official blog, we know they are trying to put more training data into these fine-tuned models.

MPT-7B's base model, fine-tuning code, datasets, training, and inference are all open-source and free for commercial use, so you can now start to consider training and releasing your own private model for your AI business at a reasonable cost.

That’s it.

Hope you find something useful in this article, and thanks for reading!
