For years, spaCy was the de facto NLP library for both beginners and advanced users. It made it easy to dip your toes into NLP, even if you weren’t a deep learning expert. However, with the rise of ChatGPT and other LLMs, it seems to have been pushed aside.
While LLMs like Claude or Gemini can do all sorts of NLP tasks automagically, you don’t always want to bring a rocket launcher to a fist fight. GLiNER is spearheading the return of smaller, focused models for traditional NLP techniques like entity and relationship extraction. It’s lightweight enough to run on a CPU, yet powerful enough to have built a thriving community around it.
Released earlier this year, GLiNER2 is a big step forward. Where the original GLiNER focused on entity recognition (spawning various spin-offs like GLiREL for relations and GLiClass for classification), GLiNER2 unifies named entity recognition, text classification, relation extraction, and structured data extraction into a single framework.
The core shift in GLiNER2 is its schema-driven approach, which allows you to define extraction requirements declaratively and execute multiple tasks in a single inference call. Despite these expanded capabilities, the model remains CPU-efficient, making it an ideal solution for transforming messy, unstructured text into clean data without the overhead of a large language model.
As a knowledge graph enthusiast at Neo4j, I’ve been particularly drawn to the newly added structured data extraction via the extract_json method. While entity and relation extraction are useful on their own, the ability to define a schema and pull structured JSON directly from text is what really excites me. It’s a natural fit for knowledge graph ingestion, where clean, predictable structure maps directly onto nodes and relationships.
In this blog post, we’ll evaluate GLiNER2’s capabilities, specifically the model fastino/gliner2-large-v1, with a focus on how well it can help us build clean, structured knowledge graphs.
Dataset selection
We’re not running formal benchmarks here, just a quick vibe check to see what GLiNER2 can do. Here’s our test text, pulled from the Ada Lovelace Wikipedia page:
Augusta Ada King, Countess of Lovelace (10 December 1815–27 November 1852), also known as Ada Lovelace, was an English mathematician and writer chiefly known for her work on Charles Babbage’s proposed mechanical general-purpose computer, the analytical engine. She was the first to recognise that the machine had applications beyond pure calculation. Lovelace is often considered the first computer programmer. Lovelace was the only legitimate child of poet Lord Byron and reformer Anne Isabella Milbanke. All her half-siblings, Lord Byron’s other children, were born out of wedlock to other women. Lord Byron separated from his wife a month after Ada was born and left England forever. He died in Greece during the Greek War of Independence, when she was eight. Lady Byron was anxious about her daughter’s upbringing and promoted Lovelace’s interest in mathematics and logic, to prevent her from developing her father’s perceived insanity. Despite this, Lovelace remained interested in her father, naming one son Byron and the other, for her father’s middle name, Gordon. Lovelace was buried next to her father at her request. Although often ill in her childhood, Lovelace pursued her studies assiduously. She married William King in 1835. King was a Baron, and was created Viscount Ockham and 1st Earl of Lovelace in 1838. The name Lovelace was chosen because Ada was descended from the extinct Barons Lovelace. The title given to her husband thus made Ada the Countess of Lovelace.
At 322 tokens, it’s a solid chunk of text to work with. Let’s dive in.
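All the snippets below assume a loaded extractor. Here’s a minimal setup sketch; the package name, import, and from_pretrained call follow the GLiNER2 README at the time of writing, so double-check against the current docs:

# pip install gliner2
from gliner2 import GLiNER2

# Load the model evaluated in this post; it's small enough to run on CPU.
extractor = GLiNER2.from_pretrained("fastino/gliner2-large-v1")

# `text` holds the Ada Lovelace excerpt above (truncated here for brevity).
text = "Augusta Ada King, Countess of Lovelace (10 December 1815 - 27 November 1852), ..."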
Entity extraction
Let’s start with entity extraction. At its core, entity extraction is the process of automatically identifying and categorizing key entities within text, such as people, locations, organizations, or dates. The original GLiNER already handled this well, but GLiNER2 takes it further by letting you add descriptions to entity types, giving you finer control over what gets extracted.
entities = extractor.extract_entities(
text,
{
"Person": "Names of individuals, including nobility titles.",
"Location": "Countries, cities, or geographic places.",
"Invention": "Machines, devices, or technological creations.",
"Event": "Historical events, wars, or conflicts."
}
)

Providing custom descriptions for each entity type helps resolve ambiguity and improves extraction accuracy. This is especially useful for broad categories like Event: on its own, the label won’t tell the model whether to include wars, ceremonies, or personal milestones. Adding “Historical events, wars, or conflicts.” clarifies the intended scope.
Relation extraction
Relation extraction identifies relationships between pairs of entities in text. For example, in the sentence “Steve Jobs founded Apple,” a relation extraction model would identify the relationship Founded between the entities Steve Jobs and Apple.
With GLiNER2, you define only the relation types you want to extract; you can’t constrain which entity types are allowed as the head or tail of each relation. This simplifies the interface but may require post-processing to filter unwanted pairings (see the sketch after the results below).
relations = extractor.extract_relations(
text,
{
"parent_of": "An individual is the parent of one other person",
"married_to": "An individual is married to a different person",
"worked_on": "An individual contributed to or worked on an invention",
"invented": "An individual created or proposed an invention",
"alias": "Entity is an alias, nickname, title, or alternate reference for an additional entity",
"same_as": "Entity is an alias, nickname, title, or alternate reference for an additional entity"
}
)

The extraction correctly identified the key relationships: Lord Byron and Anne Isabella Milbanke as Ada’s parents, her marriage to William King, Babbage as inventor of the analytical engine, and Ada’s work on it. Notably, the model detected Ada Lovelace as an alias of Augusta Ada King, but same_as wasn’t captured despite having an identical description. The choice doesn’t seem random, as the model always populates the alias relationship but never same_as.
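Because head and tail types can’t be constrained at extraction time, a small post-processing step can drop unwanted pairings. Below is a minimal sketch, under the assumption that relations come back as {relation_type: [(head, tail), ...]} pairs, the same shape the Cypher import at the end of this post consumes; the person list is hardcoded here for illustration:

# Relations where both endpoints must be people (illustrative choice).
PERSON_ONLY = {"parent_of", "married_to"}

# In practice, take these from the Person entities extracted earlier.
person_names = {
    "Augusta Ada King", "Ada Lovelace", "Lord Byron",
    "Anne Isabella Milbanke", "William King",
}

# Keep a (head, tail) pair only if its relation type allows it.
filtered = {
    rel_type: [
        (head, tail)
        for head, tail in pairs
        if rel_type not in PERSON_ONLY
        or (head in person_names and tail in person_names)
    ]
    for rel_type, pairs in relations.items()
}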
Conveniently, GLiNER2 lets you combine tasks in a single schema, so you can get entity types alongside relation types in a single pass. However, the operations are independent: entity extraction doesn’t filter or constrain which entities appear in relation extraction, and vice versa. Think of it as running both extractions in parallel rather than as a pipeline.
schema = (extractor.create_schema()
.entities({
"Person": "Names of individuals, including nobility titles.",
"Location": "Countries, cities, or geographic places.",
"Invention": "Machines, devices, or technological creations.",
"Event": "Historical events, wars, or conflicts."
})
.relations({
"parent_of": "An individual is the parent of one other person",
"married_to": "An individual is married to a different person",
"worked_on": "An individual contributed to or worked on an invention",
"invented": "An individual created or proposed an invention",
"alias": "Entity is an alias, nickname, title, or alternate reference for an additional entity"
})
)
results = extractor.extract(text, schema)

The combined extraction now gives us entity types, which are distinguished by color. However, several nodes appear isolated (Greece, England, Greek War of Independence), since not every extracted entity participates in a detected relationship.
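If isolated nodes are undesirable, they’re easy to detect before import. A short sketch, assuming the combined results expose entities keyed by label and relations as (head, tail) pairs, matching the $data structure the Cypher import below expects:

# Every entity name that participates in at least one relation.
connected = {
    name
    for pairs in results["relation_extraction"].values()
    for pair in pairs
    for name in pair
}

# Entities never mentioned in any relation, e.g. Greece or England.
isolated = {
    label: [name for name in names if name not in connected]
    for label, names in results["entities"].items()
}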
Structured JSON extraction
Perhaps the most powerful feature is structured data extraction via extract_json. This mimics the structured output functionality of LLMs like ChatGPT or Gemini but runs entirely on CPU. Unlike entity and relation extraction, it lets you define arbitrary fields and pull them into structured records. The syntax follows a field_name::type::description pattern, where type is str or list.
results = extractor.extract_json(
text,
{
"person": [
"name::str",
"gender::str::male or female",
"alias::str::brief summary of included information about the person",
"description::str",
"birth_date::str",
"death_date::str",
"parent_of::str",
"married_to::str"
]
}
)
Here we’re experimenting with some overlap: alias, parent_of, and married_to could also be modeled as relations. It’s worth exploring which approach works better for your use case. One interesting addition is the description field, which pushes the boundaries a bit: it’s closer to summary generation than pure extraction. Here’s the output:
{
"person": [
{
"name": "Augusta Ada King",
"gender": null,
"alias": "Ada Lovelace",
"description": "English mathematician and writer",
"birth_date": "10 December 1815",
"death_date": "27 November 1852",
"parent_of": "Ada Lovelace",
"married_to": "William King"
},
{
"name": "Charles Babbage",
"gender": null,
"alias": null,
"description": null,
"birth_date": null,
"death_date": null,
"parent_of": "Ada Lovelace",
"married_to": null
},
{
"name": "Lord Byron",
"gender": null,
"alias": null,
"description": "reformer",
"birth_date": null,
"death_date": null,
"parent_of": "Ada Lovelace",
"married_to": null
},
{
"name": "Anne Isabella Milbanke",
"gender": null,
"alias": null,
"description": "reformer",
"birth_date": null,
"death_date": null,
"parent_of": "Ada Lovelace",
"married_to": null
},
{
"name": "William King",
"gender": null,
"alias": null,
"description": null,
"birth_date": null,
"death_date": null,
"parent_of": "Ada Lovelace",
"married_to": null
}
]
}
The results reveal some limitations. All gender fields are null; even though Ada is explicitly called a countess, the model doesn’t infer that she’s female. The description field captures only surface-level phrases (“English mathematician and writer”, “reformer”) rather than generating meaningful summaries, which makes it less useful for workflows like Microsoft’s GraphRAG that depend on richer entity descriptions. There are also clear errors: Charles Babbage and William King are incorrectly marked as parent_of Ada, and Lord Byron is labeled a reformer (that’s Anne Isabella). These parent_of errors didn’t come up during relation extraction, so perhaps that’s the better method here. Overall, the results suggest the model excels at extraction but struggles with reasoning or inference, likely a tradeoff of its compact size.
Moreover, all attributes are optional, which makes sense and simplifies things. However, you have to be careful, as sometimes the name attribute can be null, making the record invalid. Lastly, we could use something like Pydantic to validate the results, cast values to appropriate types like floats or dates, and handle invalid records, as sketched below.
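Here’s a minimal Pydantic (v2) sketch along those lines. The field names mirror our extraction schema; the date format ("%d %B %Y") is an assumption that happens to match dates like “10 December 1815” in our text:

from datetime import date, datetime
from typing import Optional

from pydantic import BaseModel, ValidationError, field_validator

class Person(BaseModel):
    name: str  # required: records with a null name fail validation
    alias: Optional[str] = None
    description: Optional[str] = None
    birth_date: Optional[date] = None
    death_date: Optional[date] = None

    @field_validator("birth_date", "death_date", mode="before")
    @classmethod
    def parse_date(cls, value):
        if value is None or isinstance(value, date):
            return value
        # "10 December 1815" -> date(1815, 12, 10); assumed format
        return datetime.strptime(value, "%d %B %Y").date()

valid_people = []
for record in results["person"]:
    try:
        valid_people.append(Person(**record))
    except ValidationError:
        pass  # skip invalid records, e.g. a null name or unparseable date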
Constructing knowledge graphs
Since GLiNER2 allows multiple extraction types in a single pass, we can combine all of the above methods to build a knowledge graph. Rather than running separate pipelines for entity, relation, and structured data extraction, a single schema definition handles all three. This makes it straightforward to go from raw text to a rich, interconnected representation.
schema = (extractor.create_schema()
.entities({
"Person": "Names of individuals, including nobility titles.",
"Location": "Countries, cities, or geographic places.",
"Invention": "Machines, devices, or technological creations.",
"Event": "Historical events, wars, or conflicts."
})
.relations({
"parent_of": "An individual is the parent of one other person",
"married_to": "An individual is married to a different person",
"worked_on": "An individual contributed to or worked on an invention",
"invented": "An individual created or proposed an invention",
})
.structure("person")
.field("name", dtype="str")
.field("alias", dtype="str")
.field("description", dtype="str")
.field("birth_date", dtype="str")
)
results = extractor.extract(text, schema)
How you map these outputs to your graph (nodes, relationships, properties) depends on your data model. In this example, we use the following data model:

You may notice that we include the original text chunk in the graph as well, which allows us to retrieve and reference the source material when querying the graph, enabling more accurate and traceable results. The import Cypher query looks like the following:
import_cypher_query = """
// Create Chunk node from text
CREATE (c:Chunk {text: $text})
// Create Person nodes with properties
WITH c
CALL (c) {
UNWIND $data.person AS p
WITH p
WHERE p.name IS NOT NULL
MERGE (n:__Entity__ {name: p.name})
SET n.description = p.description,
n.birth_date = p.birth_date
MERGE (c)-[:MENTIONS]->(n)
WITH p, n WHERE p.alias IS NOT NULL
MERGE (m:__Entity__ {name: p.alias})
MERGE (n)-[:ALIAS_OF]->(m)
}
// Create entity nodes dynamically with __Entity__ base label + dynamic label
CALL (c) {
UNWIND keys($data.entities) AS label
UNWIND $data.entities[label] AS entityName
MERGE (n:__Entity__ {name: entityName})
SET n:$(label)
MERGE (c)-[:MENTIONS]->(n)
}
// Create relationships dynamically
CALL (c) {
UNWIND keys($data.relation_extraction) AS relType
UNWIND $data.relation_extraction[relType] AS rel
MATCH (a:__Entity__ {name: rel[0]})
MATCH (b:__Entity__ {name: rel[1]})
MERGE (a)-[:$(toUpper(relType))]->(b)
}
RETURN DISTINCT 'import completed' AS result
"""
The Cypher query takes the GLiNER2 output and stores it in Neo4j. We could also include embeddings for the text chunks, entities, and so on.
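To actually run the import, here’s a sketch using the official neo4j Python driver; the connection details are placeholders, and results is assumed to be the combined GLiNER2 output with person, entities, and relation_extraction keys:

from neo4j import GraphDatabase

URI = "neo4j://localhost:7687"  # placeholder
AUTH = ("neo4j", "password")    # placeholder

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.execute_query(
        import_cypher_query,
        text=text,     # the source chunk
        data=results,  # combined extraction output
        database_="neo4j",
    )

Adding embeddings would follow the same pattern: compute a vector for the chunk text with your embedding model of choice and SET it as a node property in the same query.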
Summary
GLiNER2 is a step in the right direction for structured data extraction. With the rise of LLMs, it’s easy to reach for ChatGPT or Claude whenever you need to pull information from text, but that’s often overkill. Running a multi-billion-parameter model to extract a few entities and relationships feels wasteful when smaller, specialized tools can do the job on a CPU.
GLiNER2 unifies named entity recognition, relation extraction, and structured JSON output into a single framework. It’s well-suited for tasks like knowledge graph construction, where you need consistent, schema-driven extraction rather than open-ended generation.
The model has its limitations: it works best for direct extraction rather than inference or reasoning, and results can be inconsistent. But the progress from the original GLiNER to GLiNER2 is encouraging, and hopefully we’ll see continued development in this space. For many use cases, a focused extraction model beats an LLM that’s doing far more than you need.
