In recent years, there has been a proliferation of articles, LinkedIn posts, and marketing materials presenting graph data models from different perspectives. This article will refrain from discussing specific products and instead focus solely on comparing the RDF (Resource Description Framework) and LPG (Labelled Property Graph) data models. To be clear, the choice between RDF and LPG is not mutually exclusive; they can be employed in conjunction. The suitable selection depends on the particular use case, and in some instances both models may be necessary; no single data model is universally applicable. Indeed, polyglot persistence and multi-model databases (databases that support different data models within the database engine or on top of it) are gaining popularity as enterprises recognise the importance of storing data in diverse formats to maximise its value and prevent stagnation. For instance, storing time series financial data in a graph model is rarely the most efficient approach: it would yield minimal value compared to storing it in a time series database, which enables rapid, multi-dimensional analytical queries.
The aim of this discussion is to provide a comprehensive comparison of the RDF and LPG data models, highlighting their distinct purposes and overlapping usage. Articles on this topic often present biased evaluations promoting their authors' own tools, and such comparisons are frequently flawed, comparing apples to wheelbarrows rather than apples to apples. This subjectivity can leave readers perplexed and unsure about the writer's intended message. In contrast, this article aims to offer an objective evaluation, focusing on the strengths and weaknesses of both the RDF and LPG data models, rather than acting as promotional material for any tool.
Quick recap of the data models
Both RDF and LPG are descendants of the graph data model, although they possess different structures and characteristics. A graph comprises vertices (nodes) and edges that connect two vertices. Various graph types exist, including undirected graphs, directed graphs, multigraphs, and hypergraphs. The RDF and LPG data models both adopt the directed multigraph approach, in which edges have a "from" and "to" ordering, and the same pair of vertices may be joined by an arbitrary number of distinct edges.
The RDF data model represents data as a set of triples reflecting the natural language structure of subject-verb-object, with the subject, predicate, and object represented as such. Consider the following simple example: Jeremy was born in Birkirkara. This sentence can be represented as an RDF statement, or fact, with the following structure: Jeremy is the subject resource, the predicate (relation) is bornIn, and the object is the value Birkirkara. The object node can either be a URI (Uniform Resource Identifier) or a datatype value (such as an integer or a string). If the object is a semantic URI, or, as they are also known, a resource, then the object can lead to further facts, such as Birkirkara being a locality in Malta. This data model allows resources to be reused and interlinked in the same RDF-based graph, or in any other RDF graph, internal or external. Once a resource is defined and a URI is "minted", this URI becomes immediately available and can be used in any context deemed necessary.
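As a sketch in Turtle syntax (the `ex:` namespace and property names are assumed for illustration), the fact that Jeremy was born in Birkirkara, plus one follow-up fact about the Birkirkara resource, could be written as:

```turtle
@prefix ex: <http://example.org/> .

ex:Jeremy ex:bornIn ex:Birkirkara .
ex:Birkirkara ex:locatedIn ex:Malta .
```

Because `ex:Birkirkara` is a resource rather than a plain string, it can be the subject of further triples, in this graph or any other.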
On the other hand, the LPG data model encapsulates a set of vertices, a set of edges, label assignment functions for vertices and edges, and a key-value property assignment function for vertices and edges. For the previous example, the representation would be as follows:
(person:Person {name: "Jeremy"})
(city:City {name: "Birkirkara"})
(person)-[:BORN_IN]->(city)
Consequently, the primary distinction between RDF and LPG lies in how nodes are connected. In the RDF model, relationships are triples in which predicates define the connection. In the LPG data model, edges are first-class citizens with their own properties. Therefore, in the RDF data model predicates are globally defined in a schema and reused across data graphs, whilst in the LPG data model each edge is uniquely identified.
Schema vs Schema-less. Do semantics matter at all?
Semantics is a branch of linguistics and logic concerned with meaning, in this case the meaning of data, enabling both humans and machines to interpret the context of the data and any relationships within that context.
Historically, the World Wide Web Consortium (W3C) established the Resource Description Framework (RDF) data model as a standardised framework for data exchange on the Web. RDF facilitates seamless data integration and the merging of diverse sources, while simultaneously supporting schema evolution without necessitating modifications to data consumers. Schemas1, or ontologies, serve as the foundation for data represented in RDF, and through these ontologies the semantic meaning of the data can be defined. This capability makes data integration one of the many suitable applications of the RDF data model. Through various W3C groups, standards were established for how schemas and ontologies can be defined, primarily RDF Schema (RDFS), the Web Ontology Language (OWL), and more recently SHACL. RDFS provides the low-level constructs for defining ontologies, such as a Person entity with the properties name, gender, and knows, and the expected type of each node. OWL provides constructs and mechanisms for formally defining ontologies through axioms and rules, enabling the inference of implicit data. Whilst OWL axioms are taken as part of the knowledge graph and used to infer additional facts, SHACL was introduced as a schema language for validating constraints, better known as data shapes (think of it as "what should a Person consist of?"), against the knowledge graph. Furthermore, through additional features of the SHACL specification, rules and inference axioms can also be defined using SHACL.
In summary, schemas facilitate the enforcement of correct instance data. This is possible because RDF permits any value to be defined within a fact, provided it adheres to the specifications. Validators, such as built-in SHACL engines or OWL constructs, are responsible for verifying the data's integrity. Given that these validators are standardised, all triple stores, that is, those adhering to the RDF data model, are encouraged to implement them. Nevertheless, this does not negate the concept of flexibility. The RDF data model is designed to accommodate the evolution of data within the schema's boundaries. Consequently, while the RDF data model strongly encourages the use of schemas (or ontologies) as its foundation, experts discourage forgoing them entirely. This endeavour does require an upfront effort and collaboration with domain experts to construct an ontology that accurately reflects the use case and the data that will be stored in the knowledge graph. Nonetheless, the RDF data model offers the flexibility to create and define RDF-based data independently of a pre-existing ontology, or to develop an ontology iteratively throughout a data project. Moreover, schemas are designed for reuse, and the RDF data model facilitates this reusability. It is noteworthy that an RDF-based knowledge graph typically encompasses both instance data (such as "Giulia and Matteo are siblings") and ontology/schema axioms (such as "Two people are siblings when they have a parent in common").
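As a sketch of such a data shape (the `ex:` namespace and property names are assumed for illustration), a SHACL answer to "what should a Person consist of?" might look like:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;      # validate every instance of ex:Person
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;             # every Person must have at least one name
    ] ;
    sh:property [
        sh:path ex:bornIn ;
        sh:class ex:City ;          # bornIn must point to a City resource
    ] .
```

A standard-compliant SHACL engine would report any Person instance violating these constraints, without the shape restricting what other facts may be attached to the resource.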
Nonetheless, the importance of ontologies extends beyond providing a data structure; they also impart semantic meaning to the data. For instance, in constructing a family tree, an ontology enables the explicit definition of relationships such as aunt, uncle, cousin, niece, nephew, ancestor, and descendant without the need for this data to be explicitly defined in the knowledge graph. Consider how this idea could be applied in various pharmaceutical scenarios, to mention just one vertical domain. Reasoning is a fundamental component that renders the RDF data model a semantically powerful model for designing knowledge graphs. Ontologies provide a specific data point with all the necessary context, including its neighbourhood and its meaning. For instance, if there is a literal node with the value 37, an RDF-based agent can comprehend that the value 37 represents the age of a named person, who is in turn the nephew of another named person.
In contrast, the LPG data model offers a more agile and simple deployment of graph data. LPGs place less focus on schemas (they support only some constraints and "labels"/classes). Graph databases adhering to the LPG data model are known for the speed with which data can be prepared for consumption, owing to their schema-less nature. This makes them a more suitable choice for data architects seeking rapid deployment. The LPG data model is particularly advantageous in scenarios where the data is not intended to grow or change significantly. For instance, a modification to a property would necessitate refactoring the graph to update nodes with the newly added or updated key-value property. While LPG gives the illusion of providing semantics through node and edge labels and corresponding functions, it does not inherently do so. LPG functions simply return a map of values associated with a node or edge. Nonetheless, this is ideal for use cases that must run fast graph algorithms, as the data is available directly in the nodes and edges and there is no need for further graph traversal.
Nevertheless, one fundamental feature of the LPG data model is the ease and flexibility with which granular attributes or properties can be attached to either vertices or edges. For instance, if there are two person nodes, "Alice" and "Bob," with an edge labelled "marriedTo," the LPG data model can accurately and simply state that Alice and Bob were married on February 29, 2024. In contrast, the RDF data model can only achieve this through various workarounds, such as reification, which results in more complex queries compared to the LPG counterpart.
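As a sketch of this contrast (labels and property names assumed), the LPG edge carries the date directly, whereas standard RDF reification must restate the triple as a resource before the date can be attached:

```cypher
// LPG: the marriage date lives on the edge itself
CREATE (alice:Person {name: "Alice"})-[:MARRIED_TO {date: date("2024-02-29")}]->(bob:Person {name: "Bob"})
```

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

# RDF: the statement is reified as a resource, then annotated
ex:m1 a rdf:Statement ;
    rdf:subject   ex:Alice ;
    rdf:predicate ex:marriedTo ;
    rdf:object    ex:Bob ;
    ex:date       "2024-02-29"^^xsd:date .
```

Queries over the reified form must navigate the intermediate `ex:m1` resource, which is precisely the extra complexity the paragraph above refers to.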
Standards, Standardisation Bodies, and Interoperability
In the previous section we described how the W3C provides standardisation groups pertaining to the RDF data model. For instance, a W3C working group is actively developing the RDF* standard, which incorporates the concept of statement-level annotation (attaching attributes to facts/triples) within the RDF data model. This standard is expected to be adopted and supported by all triple stores, tools, and agents based on the RDF data model. Nevertheless, the process of standardisation can be protracted, frequently leading to delays that leave such vendors at a disadvantage.
Nonetheless, standards facilitate much-needed interoperability. Knowledge graphs built upon the RDF data model can be ported easily between different applications and triple stores, as there is no vendor lock-in and standardised serialisation formats are provided. Similarly, they can be queried with one standard query language, SPARQL, which is used across the various vendors. Whilst the query language is the same, vendors opt for different underlying implementations, much as any database engine (SQL or NoSQL) is implemented differently, to enhance performance and speed.
Most LPG graph implementations, although open source, utilise proprietary or custom languages for storing and querying data, without adherence to a standard. This practice decreases the interoperability and portability of data between different vendors. However, ISO recently approved and published ISO/IEC 39075:2024, which standardises the Graph Query Language (GQL) based on Cypher. As the charter rightly points out, the graph data model has unique advantages over relational databases, such as fitting data that is meant to have hierarchical, complex, or arbitrary structures. Nevertheless, the proliferation of vendor-specific implementations overlooks a crucial functionality: a standardised approach to querying property graphs. It is therefore paramount that property graph vendors align their products with this standard.
Recently, OneGraph2 was proposed as an interoperable metamodel intended to overcome the choice between the RDF data model and the LPG data model. Moreover, extensions to openCypher have been proposed3 as a means of querying over RDF data. This vision aims to pave the way for having data in both RDF and LPG combined in a single, integrated database, ensuring the benefits of both data models.
Other notable differences
Notable differences, mostly in query languages, exist to support the two data models. Nevertheless, we strongly argue against the notion that a set of query language features should dictate which data model to use. Nonetheless, we will discuss some of the differences here for a more complete overview.
The RDF data model offers a natural way of supporting globally unique resource identifiers (URIs), which manifest in three distinct characteristics. Within the RDF domain, a set of facts described by RDF statements (i.e. triples) having the same subject URI is referred to as a resource. Data stored in RDF graphs can be conveniently split into multiple named graphs, ensuring that each graph encapsulates distinct concerns. For instance, using the RDF data model it is easy to construct graphs that store data or resources, metadata, and audit and provenance data separately, whilst updates and queries can be seamlessly executed across these multiple graphs. Moreover, graphs can establish interlinks with resources located in graphs hosted on different servers. Querying these external resources is facilitated through federation within the SPARQL protocol. Given the adoption of URIs, RDF embodies the original vision of Linked Data4, a vision that has since been adopted, to an extent, as a guiding principle in the FAIR principles5, Data Fabric, Data Mesh, and HATEOAS, amongst others. Consequently, the RDF data model serves as a versatile framework that can seamlessly integrate with these visions without the need for any modifications.
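As a sketch (the endpoint URL, graph name, and vocabulary are assumed for illustration), a federated SPARQL query can join data in a local named graph with a resource hosted on a remote server via the SERVICE keyword:

```sparql
PREFIX ex: <http://example.org/>

SELECT ?person ?population WHERE {
  GRAPH ex:people {                      # local named graph holding instance data
    ?person ex:bornIn ?city .
  }
  SERVICE <https://example.org/sparql> { # remote endpoint queried in the same request
    ?city ex:population ?population .
  }
}
```

The same query shape works against any standards-compliant endpoint, which is the portability argument made above.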
LPGs, on the other hand, are better geared towards path traversal, shortest path, and variable-length path queries. Whilst these functionalities can be regarded as specific implementations in the query language, they are pertinent considerations when modelling data in a graph, since they are also advantages over traditional relational databases. SPARQL, through the W3C recommendation, has limited support for path traversal6, and some vendor triple store implementations do support and implement variable-length paths7 (although not as part of the SPARQL 1.1 recommendation). At the time of writing, the SPARQL 1.2 recommendation will not incorporate this feature either.
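As a sketch of the difference (labels and property names assumed), Cypher expresses a bounded variable-length traversal directly, while the SPARQL 1.1 recommendation offers only property paths, whose repetition operators are unbounded:

```cypher
// Cypher: descendants of Carter between 1 and 3 hops away
MATCH (a:Person {name: "Carter"})-[:HAS_CHILD*1..3]->(d:Person)
RETURN d.name
```

```sparql
PREFIX ex: <http://example.org/>

# SPARQL 1.1 property path: one or more hasChild hops, no hop-count bound
SELECT ?descendant WHERE {
  ex:Carter ex:hasChild+ ?descendant .
}
```

The bounded form (`*1..3`) and path-length reporting are exactly the conveniences that standard SPARQL lacks and that some triple stores add as vendor extensions.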
Data Graph Patterns
The following section describes various data graph patterns and how well they fit, or do not fit, each of the data models discussed in this article.
| Pattern | RDF data model | LPG data model |
|---|---|---|
| Relationship semantics | Through schemas, properties are globally defined with semantic characteristics such as domains and ranges, and algebraic properties such as inverse-of, reflexivity, and transitivity; informative annotations on property definitions are also allowed. | Semantics of relations (edges) are not supported in property graphs. |
| Multilingual text | String data can have a language tag attached, which is taken into account during processing. | Can be a custom field or relationship (e.g. label_en, label_mt), but with no special treatment. |
| Class hierarchies | Automatic inferencing and reasoning; can handle complex classes. | Can model hierarchies, but not hierarchies of classes of individuals; would require explicit traversal of classification hierarchies. |
| Attributes on relationships | Requires workarounds like reification and complex queries. | Can make direct assertions over them: natural representation and efficient querying. |
| Property inheritance | Properties are inherited through defined class hierarchies. Moreover, the RDF data model can represent subproperties. | Must be handled in application logic. |
| N-ary relations | Generally, binary relationships are represented as triples, but N-ary relations can be modelled via blank nodes, additional resources, or reification. | Can often be translated into additional attributes on edges. |
| Constraints and validation | Available through schema definitions: RDFS, OWL, or SHACL. | Supports minimal constraints such as value uniqueness, but generally requires validation through schema layers or application logic. |
| Provenance and fact-level metadata | Can be done in various ways, including a separate named graph with links to the principal resources, or through reification. | Can add properties to nodes and edges to capture context and provenance. |
| Inference | Automates the inferencing of inverse relationships, transitive patterns, complex property chains, disjointness, and negation. | Either requires explicit definition in application logic, or has no support at all (disjointness and negation). |
Semantics in Graphs — A Family Tree Example
A comprehensive exploration of the application of the RDF data model and semantics within an LPG application can be found in various articles published on Medium, LinkedIn, and other blogs. As outlined in the previous section, the LPG data model is not specifically designed for reasoning purposes. Reasoning involves applying logical rules to existing facts in order to deduce new knowledge; this is important as it helps uncover hidden relationships that were not explicitly stated before.
In this section we will demonstrate how axioms are defined for a simple yet practical example of a family tree. A family tree is an ideal candidate for any graph database due to its hierarchical structure and its flexibility in being defined within any data model. For this demonstration, we will model the Pewterschmidt family, a fictional family from the popular animated television series Family Guy.
In this case, we are creating just one relationship, called 'hasChild'. So, Carter has a child named Lois, and so on. The only other attribute we add is the gender (Male/Female). For the RDF data model, we have created a simple OWL ontology:

The current schema enables us to represent the family tree in the RDF data model. With ontologies, we can begin defining additional properties whose data can be deduced from the initial data. We introduce the following properties:
| Property | Comment | Axiom | Example |
|---|---|---|---|
| isAncestorOf | A transitive property which is also the inverse of the isDescendentOf property. OWL engines automatically infer transitive properties without the need for rules. | hasChild(?x, ?y) -> isAncestorOf(?x, ?y) | Carter -hasChild-> Lois -hasChild-> Chris, therefore Carter -isAncestorOf-> Chris |
| isDescendentOf | A transitive property, inverse of isAncestorOf. OWL engines automatically infer inverse properties without the need for rules. | (inferred via the inverse axiom) | Chris -isDescendentOf-> Peter |
| isBrotherOf | A subproperty of isSiblingOf and disjoint with isSisterOf, meaning that the same person cannot be both the brother and the sister of another person at the same time, and cannot be the brother of themselves. | hasChild(?x, ?y), hasChild(?x, ?z), hasGender(?y, Male), notEqual(?y, ?z) -> isBrotherOf(?y, ?z) | Chris -isBrotherOf-> Meg |
| isSisterOf | A subproperty of isSiblingOf and disjoint with isBrotherOf, meaning that the same person cannot be both the brother and the sister of another person at the same time, and cannot be the sister of themselves. | hasChild(?x, ?y), hasChild(?x, ?z), hasGender(?y, Female), notEqual(?y, ?z) -> isSisterOf(?y, ?z) | Meg -isSisterOf-> Chris |
| isSiblingOf | A super-property of isBrotherOf and isSisterOf. OWL engines automatically infer super-properties. | (inferred via the subproperty axioms) | Chris -isSiblingOf-> Meg |
| isNephewOf | A property that infers the nephews of aunts and uncles, based on the child's gender. | isSiblingOf(?x, ?y), hasChild(?x, ?z), hasGender(?z, Male), notEqual(?y, ?x) -> isNephewOf(?z, ?y) | Stewie -isNephewOf-> Carol |
| isNieceOf | A property that infers the nieces of aunts and uncles, based on the child's gender. | isSiblingOf(?x, ?y), hasChild(?x, ?z), hasGender(?z, Female), notEqual(?y, ?x) -> isNieceOf(?z, ?y) | Meg -isNieceOf-> Carol |
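As a sketch of how the first of these axioms might be declared in Turtle (the namespace is assumed, since the original ontology is shown only as an image), isAncestorOf is made transitive and the inverse of isDescendentOf, and isBrotherOf is declared a subproperty of isSiblingOf disjoint with isSisterOf:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/family#> .

ex:isAncestorOf a owl:ObjectProperty , owl:TransitiveProperty ;
    owl:inverseOf ex:isDescendentOf .

ex:isBrotherOf a owl:ObjectProperty ;
    rdfs:subPropertyOf ex:isSiblingOf ;
    owl:propertyDisjointWith ex:isSisterOf .
```

An OWL reasoner applies these declarations directly; only the gender-conditioned rules (isBrotherOf, isNephewOf, and so on) need the explicit rule bodies listed in the table.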
These axioms are imported into a triple store, where the engine applies them to the explicit facts in real time. Through these axioms, triple stores allow the querying of inferred/hidden triples. Therefore, if we want to get the explicit facts about Chris Griffin, the following query can be executed:
SELECT ?p ?o WHERE {
    :ChrisGriffin ?p ?o EXPLICIT true
}

If we need to get the inferred values for Chris, the SPARQL engine will provide us with 10 inferred facts:
SELECT ?p ?o WHERE {
    :ChrisGriffin ?p ?o EXPLICIT false
}

This query will return all implicit facts for Chris Griffin. The image below shows the discovered facts. These are not explicitly stored in the triple store.

These results could not be produced by a property graph store, as no reasoning can be applied automatically.
The RDF data model empowers users to discover previously unknown facts, a capability that the LPG data model lacks. LPG implementations can bypass this limitation by developing complex stored procedures; however, unlike in RDF, these stored procedures will vary (where they are possible at all) across different vendor implementations, rendering them non-portable and impractical.
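As a sketch of such a workaround (labels assumed), an LPG query can only emulate the transitive isAncestorOf inference by traversing hasChild edges explicitly at query time:

```cypher
// Emulates isAncestorOf: every person reachable via one or more
// hasChild edges is a descendant of Carter.
MATCH (a:Person {name: "Carter"})-[:HAS_CHILD*]->(d:Person)
RETURN DISTINCT d.name AS descendant
```

Unlike the OWL axiom, this logic lives in every query (or in a vendor-specific stored procedure) rather than in the data model itself, which is why it does not port between implementations.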
Take-home message
In this article, the RDF and LPG data models have been presented objectively. On the one hand, the LPG data model offers rapid deployment of graph databases without the need for a sophisticated schema to be defined (i.e. it is schema-less). Conversely, the RDF data model requires a more time-consuming bootstrapping process for graph data, or knowledge graphs, due to its schema definition requirement. Nevertheless, the decision to adopt one model over the other should consider whether the additional effort is justified by the meaningful context it provides to the data. This consideration is driven by the specific use case. For instance, in social networks where neighbourhood exploration is a primary requirement, the LPG data model may be more suitable. On the other hand, for more advanced knowledge graphs that necessitate reasoning or data integration across multiple sources, the RDF data model is the preferred choice.
It is crucial to avoid letting personal preferences for query languages dictate the choice of data model. Regrettably, many available articles primarily function as marketing tools rather than educational resources, hindering adoption and creating confusion within the graph database community. Moreover, in an era of abundant and accessible information, it would be better for vendors to refrain from promoting misinformation about opposing data models. A general misconception promoted by property graph evangelists is that the RDF data model is overly complex and academic, leading to its dismissal. This assertion is based on preferential prejudice. RDF is both a machine- and human-readable data model that is close to business language, especially through the definition of schemas and ontologies. Furthermore, adoption of the RDF data model is widespread. For instance, Google uses the RDF data model as its standard for representing meta-information about web pages via schema.org. There is also the belief that the RDF data model can only function with a schema. This is a misconception: data defined using the RDF data model can also be schema-less. It is acknowledged, however, that all semantics would then be lost and the data reduced to plain graph data. This article has also mentioned how the OneGraph vision aims to establish a bridge between the two data models.
To conclude, technical feasibility alone should not drive decisions about which graph data model to select. Reducing higher-level abstractions to primitive constructs often increases complexity and can impede solving specific use cases effectively. Decisions should be guided by use case requirements and performance considerations rather than merely by what is technically possible.
1 Schemas and ontologies are used interchangeably in this article.
2 Lassila, O. et al. The OneGraph Vision: Challenges of Breaking the Graph Model Lock—In. https://www.semantic-web-journal.net/system/files/swj3273.pdf.
3 Broekema, W. et al. openCypher Queries over Combined RDF and LPG Data in Amazon Neptune. https://ceur-ws.org/Vol-3828/paper44.pdf.
4 https://www.w3.org/DesignIssues/LinkedData.html
5 https://www.go-fair.org/fair-principles