How to Build a Graph-Based Recommendation Engine Using EDG and Neo4j


In this tutorial, I’ll show you how to manage a taxonomy in EDG and publish it to a Neo4j instance, where it can be populated with additional data to power a recommendation engine. The taxonomy, which is built and maintained in TopQuadrant’s EDG, defines the structure. A set of (fake) academic journal articles serves as the instance data that populates Neo4j. I’ll use a small hierarchy of STEM categories as the taxonomy to organize the articles. This data is covered under the Creative Commons CC0 1.0 Universal Public Domain Dedication.

What’s the point of all of this? The point is that a lot of meaning lives in the taxonomy itself. Each article is tagged with the most specific category that applies, but because the taxonomy encodes parent–child relationships, we can infer higher-level associations automatically. For instance, if an article is tagged with a narrow category, it is also about that category’s parent and grandparent, even if it isn’t explicitly tagged that way. The taxonomy doesn’t just classify; it enables reasoning over how topics relate, so the data source only needs to record the most relevant tag, and the hierarchy fills in the rest.

We’re separating the instance-level information about what an individual article is about from the meta-information about the topics themselves and how they relate to each other.

The reasons you’d want to build with this kind of architecture are:

  1. Inferencing: Tag with one concept but use the taxonomy to associate many other concepts with the content. Instead of tagging an article with both a narrow concept and its parent, I can just tag it with the narrow concept. The taxonomy knows that the narrow concept is a branch of the parent, so the parent concept can be inferred from the taxonomy.
  2. Aligning multiple systems: I can use one taxonomy to build a recommendation engine in Neo4j and a GraphRAG application in GraphDB. One team can use vector-based tagging on content stored in SharePoint while another uses NLP rule-based tagging on content stored in Adobe Experience Manager (AEM). All of these apps are aligned because they all use the same reference data.
  3. Change management: If I want to recategorize a concept as a branch of one parent rather than another, I just need to change its parent in the taxonomy. If I don’t have a separate taxonomy, I’d have to retag every document tagged with that concept. If I have multiple downstream apps using the same list of terms, this becomes a nightmare: I’d have to retag every entity tagged with that concept in every application and make sure all the other tags associated with that document are correct.
  4. Play to tools’ strengths: EDG is great at managing metadata and taxonomies and ensuring those things are aligned and governed well. Neo4j and other graph databases are great at high-performance graph analytics at scale but struggle with the metadata management side of things. With this setup, we can get the best of both worlds.

There are other architectural approaches to building something like this, of course, and there are drawbacks to the approach I outline here. Some of the main ones include:

  1. Overkill for simple use cases: This tutorial uses a simple demo, but the architecture makes the most sense when your data and use cases are complex. Most graph databases, including Neo4j, let you define a schema or basic ontology and represent taxonomies with hierarchical relationships. If your data is relatively simple, your taxonomy is simple, or only one team needs to use it, you may not need this many tools.
  2. Skillset and learning curve: Using EDG and Neo4j together assumes familiarity with two different paradigms: ontology modeling in RDF/SHACL and graph querying in property graphs/Cypher. Many teams are comfortable with one but not the other.
  3. More moving parts: Keeping a taxonomy separate from the data you are tagging means you need to make sure that the tags align with the taxonomy. If they drift, the graph stops fitting together cleanly in the database.
  4. Vendor lock-in: Both Neo4j and EDG are commercial products, so there is always going to be some lock-in and potential migration costs. The standards underlying EDG (RDF, SHACL, and SPARQL) are open standards from the W3C, which does mitigate overall technical lock-in.

Neo4j is a labeled property graph (LPG) database. EDG is a knowledge graph curation tool based on RDF and SHACL. LPGs and RDF are two different graph technologies that, historically, haven’t been compatible. EDG has recently added a Neo4j integration feature, however, which allows users to build using both technologies.

Below is a visual representation of how these two technologies can work together.

At the bottom, in pink, you have data storage. I have this split into internal data and external data. Internal data is the raw data you might be storing in a data lake, a content management system (CMS) like SharePoint, or a relational database. There may also be external datasets you want to integrate into your app. These could be public, free data sources like Wikidata, upper-level ontologies like gist, or proprietary reference datasets like SNOMED or MedDRA (medical taxonomies).

EDG can then act as the semantic layer between the underlying data and downstream apps. You can manage your ontologies, taxonomies, reference data, and metadata in one place and push what you need to applications like Neo4j as needed. You can also load data directly from your underlying data sources into Neo4j or any other application.

Step 1: Get free versions of EDG and Neo4j

First, we need to get free versions of these products to play around with.

For EDG, you’ll need to go to this website and request a free trial. You’ll get a link to download EDG along with a license in an email. After the download completes, there is an executable file in the edg folder, also called edg. Double-click it and EDG should start running in your browser. If you don’t have Java installed, it will prompt you to install Java first.

EDG will then open in your browser in a new tab at an address like http://localhost:8083/. But it will say it isn’t registered. Click on Product Registration and then upload the license file that was also sent in the email. Then click “Register Product”.

After uploading the license, you can return to the home screen by clicking the TopQuadrant logo in the top left corner. Now you should be able to see the main EDG landing page.

Now we need a free version of Neo4j. Go to this link to start your free trial. If you don’t have an account already, you will need to make one. After you create a Neo4j account you’ll land on a screen like this:

Click “Create instance” and then select the free option.

When you click “Create instance” you will be shown your username and password. The username is usually just “Neo4j” but the password is unique, so write it down somewhere.

Step 2: Set up integration

In EDG, in the top right corner, click on the user icon (it looks like a person). Then click “Server Administration”. This will take you to a screen with a bunch of options. Click “Product Configuration Parameters”. On the left toolbar you will see a bunch of integration options. Click “Neo4j”.

You can configure this to push to multiple Neo4j databases, but for this tutorial we’ll just point to the Neo4j instance we just created. On the right side of the empty Neo4j database line there is a plus sign. Click that and you will be prompted to enter the Neo4j credentials.

You can name this configuration anything, but I chose “neo4jtest1”. The ID should be autofilled by EDG. For the Neo4j database URL, you will need to check the instance you created in Neo4j. It’ll look something like this: neo4j+s://cd227570.databases.neo4j.io.

Click “Create and Select”. Now you will need to enter your password. This is the one that Neo4j gave you when you created your Neo4j instance.

Now we’re all configured.

Step 3: Import taxonomy

Go to my GitHub and download this taxonomy. This is a list of STEM topics in a hierarchy, i.e. a taxonomy.

Click “New +” at the top of the screen in EDG, then “Import asset collections from TriG or Zip file”. Select the zip file you got from my GitHub and load it into EDG. Click Finish. When you go to the taxonomy you should see a hierarchical list of a bunch of different STEM categories.

Step 4: Push taxonomy to Neo4j

Click the cloud dropdown to manage integrations. In the dropdown menu you will see the option to “Link to Neo4j Database”.

When you click this you’ll be able to choose which Neo4j integration you want to use. Click the one you created in Step 2 above.

After you select the Neo4j integration, the integration between this taxonomy and your Neo4j instance will be created. It’ll look like the popup below. Click the integration to navigate to it. In my example below it is called “Integration with Neo4j database neo4jtest1”. Then click “Okay”.

The integration will now appear in the editor and we can change any settings if we want. You’ll notice next to the cloud dropdown there is an icon for pushing to integrated systems that looks like a cloud with an arrow on it.

Click edit and then scroll down to “included classes”. This is where we specify which classes in our taxonomy we want to push to this Neo4j instance. For this tutorial, select “Concept”. This should include everything in the taxonomy. This may seem unnecessary, but it is important for large taxonomies with many kinds of classes.

Also set “always overwrite” to “True”. This ensures that when we push, we overwrite whatever is in the Neo4j instance.

Now click “Save Changes”.

Back in the editor interface, click the cloud push icon that’s in the top toolbar now that we have established a Neo4j integration. A popup should appear that looks like the image below. If we had multiple integrations configured with multiple different applications, we’d see all of them here. For this tutorial, you should just see the one you made and it should be automatically selected. Now click “Okay”.

You should see a progress bar as your concepts get pushed to Neo4j.

Step 5: Explore data in Neo4j

Now return to your Neo4j Aura instance. If you click Instances on the left toolbar you will see the instance we created in Step 1. Now you will see that there are Nodes and Relationships in it!

You can click “Connect” and then “Explore”, which will take you to a visual representation of your graph.

Below is the visual explorer of Neo4j Aura. You can just search on the generic term “Resource – BROADER – Resource” to see all the concepts we pushed from EDG along with their parent concepts.
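If you prefer a table to the visual explorer, the same parent–child pairs can be listed from the Query tool. This is a minimal sketch that assumes the pushed nodes carry the Concept label and prefLabel property used by the queries later in this tutorial; the column aliases are my own.

```cypher
// List each pushed concept next to its direct parent
MATCH (child:Concept)-[:BROADER]->(parent:Concept)
RETURN child.prefLabel AS concept, parent.prefLabel AS parentConcept
ORDER BY parentConcept, concept;
```

Top-level concepts have no outgoing BROADER relationship, so they appear only in the parentConcept column.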

Step 6: Upload articles to Neo4j

Download a list of journal articles from my GitHub here. This is a short list of fake academic journal articles. The idea here is that we want the taxonomy to come from EDG but the article metadata to come from somewhere else.

Now in Neo4j, click “Import” on the left toolbar and “New data source”. A list of options will appear. You could import your instance data from anywhere, but for this tutorial we’ll just upload the csv file directly. The source of the data doesn’t matter; what matters is that the instance data is tagged with terms that come from the taxonomy that we’re managing in EDG. That’s how we can align the article metadata with our taxonomy and broader semantic layer.

Upload the csv you downloaded from my GitHub. You’ll then be asked how you would like to define your model. Select “Generate from schema”.

You’ll see Articles.csv pop up as a node. Click the node. You’ll need to specify which property you want to use as the primary key. There is a property in this list of articles called “id”, which we’ll use as the primary key. To set this as the key, click the key icon in the bottom right for the “id” row. Then select “Run Import”.

You will be prompted to enter the password for this instance, which is the one you wrote down at the beginning. It’ll take a second to run, but then you will get this popup of Import results.

You can see that 15 nodes were created. The csv file contained 15 articles and each of them became a node. Now we can return to the Explore feature and search for “Articles.csv”. You’ll see Articles show up in the visual in pink alongside the STEM categories in green. This is great, but they are not yet linked. To connect the instance data (articles) to the categories, we need to run a Cypher query.
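Before linking anything, it can be worth confirming the import from the Query tool as well. The sketch below assumes the node label is Articles.csv and that the tag column was imported as a topicUri property, which is what the linking query in Step 7 relies on; the aliases are illustrative.

```cypher
// Count imported articles and how many of them carry a topicUri
MATCH (a:`Articles.csv`)
RETURN count(a)          AS articles,
       count(a.topicUri) AS articlesWithTopicUri;
```

count(a.topicUri) only counts non-null values, so if the two numbers differ, some articles were imported without a tag.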

Step 7: Connect instance data with taxonomy

Click Query in the left toolbar. In the query box enter:

// 1) Match every imported article node that has a topicUri
MATCH (a:`Articles.csv`)
WHERE a.topicUri IS NOT NULL

// 2) Find the corresponding Concept by its uri property
MATCH (c:Concept {uri: a.topicUri})

// 3) Create the TAGGED_WITH relationship (idempotent)
MERGE (a)-[:TAGGED_WITH]->(c)

// 4) Return a sanity check
RETURN count(*) AS totalTaggedRelationships;

It should look like this:

Then press “Run”. Right under the query you’ll see a message that says “Created 15 relationships”. That’s a good sign. Now return to the Explorer and search for “Articles.csv – TAGGED_WITH – Resource”. You’ll see that all of those pink nodes are now connected to our green taxonomy!
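If fewer relationships are created than there are articles, the tags and the taxonomy have probably drifted apart. One way to spot this, sketched here under the same label and property assumptions as the linking query above, is to list the articles whose topicUri does not match any Concept node:

```cypher
// Find articles whose topicUri has no matching Concept in the taxonomy
MATCH (a:`Articles.csv`)
WHERE a.topicUri IS NOT NULL
OPTIONAL MATCH (c:Concept {uri: a.topicUri})
WITH a, c
WHERE c IS NULL
RETURN a.id AS id, a.title AS title, a.topicUri AS unmatchedTopicUri;
```

An empty result means every tagged article lines up with a concept that was pushed from EDG.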

Step 8: Build a recommendation engine

We’re going to run some very basic similarity queries to demonstrate how you’d use the graph we just built for recommendations. First, let’s look at an article and which category it’s tagged with. Enter this Cypher query into the query interface. This will list the categories that the article “Advances in Mathematical Software Studies #7” was tagged with.

MATCH (a:`Articles.csv` {title: 'Advances in Mathematical Software Studies #7'})
MATCH (a)-[:TAGGED_WITH]->(c:Concept)
RETURN a.title AS article, c.prefLabel AS tag, c.uri AS uri
ORDER BY tag;

You should see the following output, with the category “Mathematical Software”.
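The article has only one direct tag, but the hierarchy implies more. As a rough sketch, again assuming the labels and relationship types used above, a variable-length BROADER pattern returns every ancestor of that tag, which is the inference described at the start of this tutorial:

```cypher
// Walk up the taxonomy from the article's direct tag to all of its ancestors
MATCH (a:`Articles.csv` {title: 'Advances in Mathematical Software Studies #7'})
MATCH (a)-[:TAGGED_WITH]->(tag:Concept)
MATCH path = (tag)-[:BROADER*1..]->(ancestor:Concept)
RETURN tag.prefLabel      AS directTag,
       ancestor.prefLabel AS inferredTopic,
       length(path)       AS levelsUp
ORDER BY levelsUp;
```

None of the inferred topics are stored on the article itself; they come entirely from the taxonomy pushed from EDG.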

Suppose we want to find articles similar to this page-turner because we want to recommend them to potential readers. We can look for other articles that are also tagged with Mathematical Software, but we can also take advantage of the taxonomical structure we have in our graph. Mathematical Software is a subclass of Computer Science, according to the STEM taxonomy. You can return to EDG to explore the categories and their children. For our recommendation engine, to find articles similar to our article, we want to find other articles that are tagged with Mathematical Software, but ALSO articles tagged with other branches of computer science.

We can do that with the following Cypher query:

// 0) Seed article by its real label
MATCH (me:`Articles.csv` {title: 'Advances in Mathematical Software Studies #7'})  

// 1) get each tagged topic plus its parent
MATCH (me)-[:TAGGED_WITH]->(child:Concept)-[:BROADER]->(parent:Concept)  

// 2) find every topic under that same parent (siblings plus the seed's own topic),
//    then every other article tagged with one of them
MATCH (siblingChild:Concept)-[:BROADER]->(parent)
MATCH (rec:`Articles.csv`)-[:TAGGED_WITH]->(siblingChild)
WHERE rec <> me

// 3) compute suggestion rating
WITH rec, count(DISTINCT parent) AS rating  

// 4) now pull in all the direct tags on each recommended article
OPTIONAL MATCH (rec)-[:TAGGED_WITH]->(t:Concept)  

// 5) return title, rating, and full tag list
RETURN 
  rec.title                        AS suggestion,
  rating                            AS sharedParentCount,
  collect(DISTINCT t.prefLabel)    AS allTaggedTopics
ORDER BY rating DESC, suggestion
LIMIT 5;

You should get the following results:

There are no other articles tagged with Mathematical Software, but there are articles tagged with other branches of computer science. “Advances in Computers and Society Studies” is an article tagged with the category “Computers and Society”. It is recommended because the graph knows that both Mathematical Software and Computers and Society are branches of Computer Science.
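The same hierarchy can also drive coarser views of the data. As a rough sketch, assuming the labels and relationship types used in the queries above, the following rolls each article up to its direct tag and every ancestor of that tag, then counts articles per topic. The aliases are my own.

```cypher
// Count articles per topic, including articles tagged with any descendant topic
MATCH (a:`Articles.csv`)-[:TAGGED_WITH]->(:Concept)-[:BROADER*0..]->(topic:Concept)
RETURN topic.prefLabel AS topic, count(DISTINCT a) AS articles
ORDER BY articles DESC, topic;
```

The *0.. lower bound includes the direct tag itself, so a leaf topic is counted alongside its parents.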

Step 9: Adjusting our taxonomy

I mentioned earlier that one reason you’d want to separate your taxonomy from your graph database is so you can make changes to your taxonomy and easily see the downstream effects in your apps. Let’s try that.

Suppose we want to recategorize Mathematical Software as a branch of mathematics rather than a branch of computer science. To do this in our taxonomy, we just drag and drop the term in the tree structure in EDG.

Now push the taxonomy back into Neo4j using the same cloud button.

Now when we return to Neo4j and run the recommendation query again, the results are totally different. This is because our original article was tagged with Mathematical Software, which we’ve now classified as a branch of mathematics. The other articles that are recommended to us are other articles about math, not computer science.
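One way to confirm that the push changed the hierarchy in Neo4j is to look up the concept’s current parent. This is a minimal sketch, assuming the concept’s prefLabel is exactly “Mathematical Software” as shown in the Step 8 output:

```cypher
// Check which parent the re-pushed concept now sits under
MATCH (c:Concept {prefLabel: 'Mathematical Software'})-[:BROADER]->(p:Concept)
RETURN c.prefLabel AS concept, p.prefLabel AS parent;
```

If the old parent still shows up, the push probably has not completed or was sent to a different integration.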

Conclusion

This simple demo shows how a taxonomy can bring structure, flexibility, and intelligence to your data applications. By separating your taxonomy (in EDG) from your instance metadata (in Neo4j), you gain the ability to infer relationships, align systems, and evolve your model over time, without having to retag or rebuild downstream apps. The result is a modular architecture that makes your graph smarter as your understanding of the domain grows.

About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.
