Building your own chat interface to your data WITHOUT the OpenAI API
Step 1: Collect your data and install dependencies
Step 2: Define your system and construct index
Step 3: Use your system

What you can do with OpenAI’s models is fascinating. Also, tools such as LangChain and llama-index make it very easy to get a basic ChatGPT-like system up and running in a few lines of code. However, most of the examples build on OpenAI’s APIs, which is not practical in all cases, e.g. because you cannot send your data into the cloud.

Query your system in a ChatGPT-like way on your own data and without OpenAI.

I recently tried to get a basic system running using Databricks’ dolly, and it needed a little bit of trial and error. So here is my quick tutorial on how to use a custom LLM to build a chat interface to your own data!

For the first step, just collect the data you want to use and place it into a directory on your local machine. In my case, these were a bunch of markdown files I pulled from the docs of our data curation tool Highlight (check it out too ;-)).
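If the folder also contains subdirectories or files you do not want to index, a small helper can narrow the list down first. The following is a minimal sketch and not part of the original setup; the function name `collect_files` and the default suffixes are assumptions chosen for illustration.

```python
from pathlib import Path


def collect_files(folder, suffixes=(".md", ".txt")):
    """Return a sorted list of regular files in `folder` with one of `suffixes`.

    Hypothetical helper: the name and the default suffixes are illustrative.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # Keep only regular files (no subdirectories) with a matching extension
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

A list built this way could be handed to a directory reader in place of a plain glob over the folder, which would also pick up subdirectories.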

Next, install everything you need:

pip install torch transformers langchain llama-index==0.6.0.alpha3
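Since the setup depends on a pinned llama-index version, it can help to check what actually got installed before going on. This is a small sketch using only the Python standard library; the helper name `installed_versions` is an assumption for illustration.

```python
from importlib import metadata


def installed_versions(packages=("torch", "transformers", "langchain", "llama-index")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If llama-index reports a version other than the pinned one, the imports used in this tutorial may not resolve, because the package layout changed between releases.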

For the second step, copy the following code and adjust the path to your input folder. It uses the Hugging Face transformers library to generate embeddings for retrieval and Databricks’ dolly to generate the final output.

from pathlib import Path
import torch
from transformers import pipeline
from langchain.llms.base import LLM
from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTVectorStoreIndex, PromptHelper, LLMPredictor, ServiceContext
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser.simple import SimpleNodeParser
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

INPUT_FOLDER = "path/to/your/data/folder"

index_files = list(Path(INPUT_FOLDER).glob("*"))

max_input_size = 2048
num_output = 256
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

pipe = pipeline("text-generation", model="databricks/dolly-v2-3b", trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

class CustomLLM(LLM):
    model_name = "databricks/dolly-v2-3b"

    def _call(self, prompt, stop=None):
        # Generate at most num_output new tokens with the local dolly pipeline
        response = pipe(prompt, max_new_tokens=num_output)[0]["generated_text"]
        return response

    @property
    def _identifying_params(self):
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self):
        return "custom"

# define our LLM
llm_predictor = LLMPredictor(llm=CustomLLM())

node_parser = SimpleNodeParser(text_splitter=TokenTextSplitter(chunk_size=512, chunk_overlap=max_chunk_overlap))
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model, prompt_helper=prompt_helper, node_parser=node_parser, chunk_size_limit=512)
# Load your data
documents = SimpleDirectoryReader(input_files=index_files).load_data()

index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()

Run the code to build your document index.
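To get a feeling for what building and querying such an index involves, here is a conceptual sketch in plain Python: documents are split into overlapping chunks, each chunk is turned into a vector, and a query retrieves the chunks whose vectors are most similar to it. This is not how llama-index is implemented internally. The bag-of-words `toy_embedding` is only a stand-in for the Hugging Face embedding model, chunks are counted in whitespace-separated words rather than tokens, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into chunks of `chunk_size` words, consecutive chunks sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embedding(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(word.strip(".,!?").lower() for word in text.split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=1):
    """Return the `top_k` chunks most similar to the query."""
    query_vec = toy_embedding(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embedding(chunk)),
        reverse=True,
    )
    return ranked[:top_k]
```

In the real system the retrieved chunks are placed into the prompt together with the question, which is why the chunk size and the number of output tokens have to fit into the model's maximum input size.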

We’re already done! For the third step, you can now use the query_engine object to ask questions about your data!
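If you want to ask several questions in a row, a thin wrapper around the query call is enough. The sketch below assumes only that the engine has a `query` method whose result can be converted to a string; the function name `chat` is hypothetical.

```python
def chat(query_engine, questions):
    """Send each question to `query_engine.query` and collect (question, answer) pairs."""
    history = []
    for question in questions:
        question = question.strip()
        if not question:
            continue  # skip empty input
        answer = str(query_engine.query(question))
        history.append((question, answer))
    return history
```

Because `questions` can be any iterable, the same function works for a fixed list of questions or for lines read interactively from the terminal.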

An example I tried on our docs is something like this:

print(query_engine.query("Summarize typical use cases of the similarity map in few sentences."))

The answer here is:

The Similarity Map helps to explore simulation similarities and find explanations for observable phenomens.
It can be used to group designs by similarity, find outliers and detect correlations between features and target values.

That is fairly accurate, although the answer is a little too specific to a concrete use case. Note that, of course, a very important aspect is that curating the data to feed into the system is still a challenge. If you want to learn more about this topic, feel free to get in touch, e.g. via our solutions page, or simply try our free data curation tool Highlight.
