Find out how to Convert Any Text Right into a Graph of Concepts


If you were to ask GPT how to create a graph of knowledge from a given text, it may suggest a process like the following.

  1. Extract concepts and entities from the body of work. These are the nodes.
  2. Extract relations between the concepts. These are the edges.
  3. Populate nodes (concepts) and edges (relations) in a graph data structure or a graph database.
  4. Visualise, for some artistic gratification if nothing else.

Steps 3 and 4 sound understandable. But how do you achieve steps 1 and 2?

Here’s a flow diagram of the method I devised to extract a graph of concepts from any given text corpus. It is similar to the above method but for a few minor differences.

Diagram created by the Author using draw.io
  1. Split the corpus of text into chunks. Assign a chunk_id to each of these chunks.
  2. For every text chunk, extract concepts and their semantic relationships using an LLM. Let’s assign this relation a weightage of W1. There can be multiple relationships between the same pair of concepts. Every such relation is an edge between a pair of concepts.
  3. Consider that the concepts that occur in the same text chunk are also related by their contextual proximity. Let’s assign this relation a weightage of W2. Note that the same pair of concepts may occur in multiple chunks.
  4. Group similar pairs, sum their weights, and concatenate their relationships. So now we have only one edge between any distinct pair of concepts. The edge has a certain weight and a list of relations as its name.
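Steps 2 to 4 above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the helper name `aggregate_edges` and the weightage values for W1 and W2 are my own assumptions for the example.

```python
from itertools import combinations

# Illustrative weightages; the actual values are a tuning choice.
W1, W2 = 4, 1

def aggregate_edges(llm_edges, chunks):
    """Merge LLM-extracted edges (weight W1) with contextual-proximity
    edges (weight W2) into a single edge per distinct concept pair.

    llm_edges: list of dicts with node_1, node_2, edge.
    chunks: dict mapping chunk_id -> list of concepts found in that chunk.
    """
    merged = {}  # sorted (node_a, node_b) -> {"weight": ..., "relations": [...]}
    for e in llm_edges:
        key = tuple(sorted((e["node_1"], e["node_2"])))
        entry = merged.setdefault(key, {"weight": 0, "relations": []})
        entry["weight"] += W1
        entry["relations"].append(e["edge"])
    # Concepts co-occurring in the same chunk get a proximity edge.
    for concepts in chunks.values():
        for pair in combinations(sorted(set(concepts)), 2):
            entry = merged.setdefault(pair, {"weight": 0, "relations": []})
            entry["weight"] += W2
    return merged
```

For example, an LLM edge between "Mary" and "lamb" that also co-occur in one chunk ends up with weight W1 + W2, while a pair related only by proximity keeps weight W2.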

You can see the implementation of this method as Python code in the GitHub repository I share in this article. Let us briefly walk through the key ideas of the implementation in the next few sections.

To demonstrate the method here, I am using the following review article published in PubMed/Cureus under the terms of the Creative Commons Attribution License. Credit to the authors at the end of this article.

The Mistral and the Prompt

Step 1 in the above flow chart is easy. Langchain provides a plethora of text splitters we can use to split our text into chunks.
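Conceptually, the chunking step amounts to the following sketch. Langchain's splitters handle separators and token budgets far more robustly; this plain-Python version (the function name and the chunk_size/overlap defaults are my own choices for illustration) just shows the idea of slicing the corpus and tagging every chunk with a chunk_id.

```python
import uuid

def make_chunks(text, chunk_size=1500, overlap=150):
    """Slice the corpus into overlapping chunks, each with a unique chunk_id."""
    chunks = []
    start = 0
    while start < len(text):
        body = text[start:start + chunk_size]
        chunks.append({"chunk_id": uuid.uuid4().hex, "text": body})
        start += chunk_size - overlap
    return chunks
```

The overlap keeps sentences that straddle a boundary visible to both neighbouring chunks.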

Step 2 is where the real fun starts. To extract the concepts and their relationships, I am using the Mistral 7B model. Before converging on the variant of the model best suited for our purpose, I experimented with the following:

Mistral Instruct
Mistral OpenOrca, and
Zephyr (Hugging Face version derived from Mistral)

I used the 4-bit quantised versions of these models (so that my Mac doesn’t start hating me), hosted locally with Ollama.
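Talking to a locally hosted model boils down to one HTTP call against Ollama's default endpoint. This is a sketch assuming an Ollama server is running on the default port; the helper names `build_payload` and `generate` are mine, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model, system_prompt, user_prompt):
    # stream=False asks Ollama for one complete JSON reply instead of chunks
    return {"model": model, "system": system_prompt,
            "prompt": user_prompt, "stream": False}

def generate(model, system_prompt, user_prompt):
    """Send the prompts to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(model, system_prompt, user_prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server down, `generate` will simply fail to connect, so only the payload construction is exercised here.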

These models are all instruction-tuned models with a system prompt and a user prompt. They all do a fairly good job of following the instructions and formatting the answer neatly in JSONs if we tell them to.

After a few rounds of trial and error, I finally converged on the Zephyr model with the following prompts.

SYS_PROMPT = (
    "You are a network graph maker who extracts terms and their relations from a given context. "
    "You are provided with a context chunk (delimited by ```) Your task is to extract the ontology "
    "of terms mentioned in the given context. These terms should represent the key concepts as per the context. \n"
    "Thought 1: While traversing through each sentence, Think about the key terms mentioned in it.\n"
    "\tTerms may include object, entity, location, organization, person, \n"
    "\tcondition, acronym, documents, service, concept, etc.\n"
    "\tTerms should be as atomistic as possible\n\n"
    "Thought 2: Think about how these terms can have one on one relation with other terms.\n"
    "\tTerms that are mentioned in the same sentence or the same paragraph are typically related to each other.\n"
    "\tTerms can be related to many other terms\n\n"
    "Thought 3: Find out the relation between each such related pair of terms. \n\n"
    "Format your output as a list of json. Each element of the list contains a pair of terms "
    "and the relation between them, like the following: \n"
    "[\n"
    "   {\n"
    '       "node_1": "A concept from extracted ontology",\n'
    '       "node_2": "A related concept from extracted ontology",\n'
    '       "edge": "relationship between the two concepts, node_1 and node_2 in one or two sentences"\n'
    "   }, {...}\n"
    "]"
)

USER_PROMPT = f"context: ```{input}``` \n\n output: "
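The model's reply is only supposed to be a JSON list, so it pays to parse it defensively: one malformed reply should not halt a run over hundreds of chunks. A minimal sketch of such a parser (the function name and the fallback-to-empty-list behaviour are my own choices):

```python
import json

def parse_graph_json(response_text, chunk_id):
    """Parse the model's reply into edge records, tagging each with its
    chunk_id. Return [] when the reply is not valid JSON, so a single
    bad chunk does not abort the whole extraction run."""
    try:
        records = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    return [
        {**r, "chunk_id": chunk_id}
        for r in records
        if isinstance(r, dict) and {"node_1", "node_2", "edge"} <= r.keys()
    ]
```

Records missing any of the three expected keys are dropped rather than propagated into the graph.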

If we run our (not fit for) nursery rhyme through this prompt, here is the result.

[
    {
        "node_1": "Mary",
        "node_2": "lamb",
        "edge": "owned by"
    },
    {
        "node_1": "plate",
        "node_2": "food",
        "edge": "contained"
    }, . . .
]

Notice that it even guessed ‘food’ as a concept, which was not explicitly mentioned in the text chunk. Isn’t this excellent!

If we run this through every text chunk of our example article and convert the JSON into a Pandas data frame, here is what it looks like.
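That conversion, plus the grouping from step 4, might look like the following. The two rows here are illustrative stand-ins for the real extraction output, and the column used for the weight is my own naming choice.

```python
import pandas as pd

# Illustrative edge records, as collected across the text chunks
results = [
    {"node_1": "Mary", "node_2": "lamb", "edge": "owned by", "chunk_id": "c1"},
    {"node_1": "plate", "node_2": "food", "edge": "contained", "chunk_id": "c1"},
]
df = pd.DataFrame(results)

# One row per distinct pair: sum the weights, concatenate the relation texts
df["weight"] = 1
edges = (
    df.groupby(["node_1", "node_2"], as_index=False)
      .agg(weight=("weight", "sum"), edge=("edge", ", ".join))
)
```

Duplicate pairs across chunks collapse into a single row whose weight reflects how often the pair appeared.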
