## Demystifying Graph Neural Networks — Part 1

## Key concepts for getting began

Graph Neural Networks (GNNs) are gaining attention in data science and machine learning but still remain poorly understood outside expert circles. To know this exciting approach, we must start with the broader field of Graph Machine Learning (GML). Many online resources speak about GNNs and GML as in the event that they are interchangeable concepts or as if GNNs are a panacea approach that makes other GML approaches obsolete. This is just not the case. Certainly one of GML’s primary purposes is to compress large sparse graph data structures to enable feasible prediction and inference. GNNs are one strategy to accomplish this, perhaps probably the most advanced way, but not the one way. Understanding it will help create a greater foundation for future parts of this series, where we’ll cover specific varieties of GNNs and related GML approaches in additional detail.

On this post, we’ll:

- Go over a transient recap on graph data structures
- Cover GML tasks and the varieties of problems they solve
- Examine the concept of compression and its importance in driving different GML approaches, including GNNs

Should you’re reading this text, you likely have already got some background on graph data structures. If not, I like to recommend reading this resource on property graphs or this resource on graph database concepts. I’ll give a *very* transient recap here:

A graph consists of nodes connected by relationships. There are a few other ways to model graph data. For simplicity, I’ll use the property graph model, which has three primary components:

- that represent entities (sometimes called vertices),
- that represent associations or interactions between nodes (sometimes called edges or links), and
- that represent attributes of nodes or relationships.

At its core, Graph machine learning (GML) is the appliance of machine learning to graphs specifically for predictive and prescriptive tasks. GML has a wide range of use cases across supply chain, fraud detection, recommendations, customer 360, drug discovery, and more.

Probably the greatest ways to know GML is thru the different sorts of ML tasks it could accomplish. I break this out for supervised and unsupervised learning below.

## Supervised GML Tasks

The below diagram outlines three of probably the most common GML tasks for supervised learning:

To expand further:

- Predicting a discrete or continuous node property. One can consider node property prediction as , similar to whether an account on a financial services platform needs to be classified as fraud or the way to categorize a product on a web based retail store.
- Predicting whether or not a relationship should exist between two nodes and potentially some properties concerning the relationship. Link prediction is useful for applications like entity resolution, where we would like to predict whether two nodes reflect the identical underlying entity; suggestion systems where we would like to predict what a user will wish to purchase or interact with next; and bioinformatics, for predicting things like protein and drug interactions. For every case, we care about .
- Predicting a discrete or continuous property of a graph or subgraph. Graph property prediction is helpful in domains where you would like to reasonably than modeling entities as nodes inside a bigger graph representing a whole dataset. Use cases include material sciences, bioinformatics, and drug discovery, where individual graphs can represent molecules or proteins that you would like to make predictions about.

## Unsupervised GML Tasks

Below are 4 of probably the most common GML tasks for unsupervised learning:

To elaborate on these further:

- : Reducing dimensionality while maintaining essential signals is a central theme for GML applications. Graph representation learning does this explicitly by generating low-dimensional features from graphstructures, normally to make use of them for downstream exploratory data evaluation (EDA) and ML.
- Community detection is a clustering technique for identifying groups of densely interconnected nodes inside a graph. Community detection has various practical applications in anomaly detection, fraud and investigative analytics, social network evaluation, and biology.
- Similarity in GML refers to finding and measuring similar pairs of nodes in a graph. Similarity is applicable to many use cases, including suggestion, entity resolution, and anomaly and fraud detection. Common Similarity techniques include node similarity algorithms, topological link prediction, and K-Nearest-Neibor (KNN).
- I’m grouping these together as they have an inclination to be less related to ML tasks and more with analytical measures. Nevertheless, they still technically fit here, so I’ll cover them for completeness. Centrality finds essential or influential nodes in a graph. Centrality is ubiquitous throughout many use cases, including fraud and anomaly detection, suggestion, supply chain, logistics, and infrastructure problems. Pathfinding is used to seek out the bottom cost paths in a graph or to guage the standard and availability of paths. Pathfinding can profit many use cases coping with physical systems similar to logistics, supply chain, transportation, and infrastructure.

I got here across this interesting blog post by Matt Ranger which explains this point beautifully: One of the significant objectives with GML, and to a big extent natural language processing too, is compressing large sparse data structures while maintaining essential signals for prediction and inference.

Consider a regular graph represented by an adjacency matrix, a square matrix where each row and column represents a node. If a relationship goes from node A to node B, the cell on the intersection of row A and column B is 1; otherwise, it’s 0. Below is an illustration of some small regular graphs and their adjacency matrices.

Notice that lots of the cells within the above adjacency matrices are 0. Should you scale this to large graphs, particularly those present in real-world applications, the proportion of zeros increases, and the adjacency matrix becomes mostly zeros.

This happens because as these graphs grow, the common degree centrality grows much slower or by no means. In social networks, that is evidenced by concepts like the Dunbar Number, a cognitive limit to the variety of individuals with whom one can maintain stable social relationships. You’ll be able to intuit this for other varieties of graphs too, similar to graphs of economic transactions or graphs of user purchases for suggestion systems. As these graphs grow, the variety of potential unique transactions or purchases a single individual can take part in grows much faster than their capability to accomplish that. I.e. If there are six products on an internet site, one user buying half of them is possible, but when there are a whole lot of hundreds, then not a lot. Consequently, you find yourself with very large and sparse data structures.

Should you could use these sparse data structures directly for machine learning, you wouldn’t need GNNs or any GML — you’d just plug them as features into conventional ML models. Nevertheless, this isn’t possible. It wouldn’t scale, and even beyond that, it will cause mathematical issues around convergence and estimation that will render ML models ill-specified and infeasible. Consequently, a fundamental key to GML is compressing these data structures; arguably, it’s your entire point of GML.

At the best level, three GML approaches exist for accomplishing this compression.

Classic graph algorithms include things like PageRank, Louvain, and Dijkstra’s Shortest Path. They might be used independently for unsupervised community detection, similarity, centrality, or pathfinding. The outcomes of classic algorithms may also be used as features for conventional downstream models, similar to linear and logistic regressions, random forests, or neural networks to perform GML tasks.

Classic graph algorithms are inclined to be easy, easy to start with, and comparatively interpretable and explainable. Nevertheless, they will require more manual work and material expertise (SME) than other GML approaches. This makes classic graph algorithms good first selections in experimentation and development to assist understand what works best in your graph. They can even do well in production for easier problems, but more complex use cases may require graduating to a different GML approach.

Graph embeddings are a type of representation learning. Some graph embedding techniques leverage GNN architectures while others don’t. The latter group, non-GNN, is the main focus of this approach. These embedding techniques as an alternative depend on matrix factorization/decomposition, random projections, random walks, or hashing function architectures. Some examples include Node2vec (random walk based), FastRP (random projection and matrix operations), and HashGNN (hashing function architecture).

Graph embedding involves generating numeric or binary feature vectors to represent nodes, relationships, paths, or entire graphs. The foremost of those, node embedding, is amongst probably the most fundamental and commonly used. The fundamental idea is to generate a vector for every node such that the similarity between vectors (e.g. dot product) approximates the similarity between nodes within the graph. Below is an illustrative example of the small Zachary’s karate club network. Note how the adjacency matrix is compressed to 2-d embedding vectors for every node and the way those vectors cluster together to reflect the graph community structure. Most real-world embeddings could have greater than two dimensions (128 to 256 or higher) to represent larger real-world graphs with thousands and thousands or billions of nodes, but the essential intuition is similar.

The identical logic as above applies to relationship, path, and whole graph embeddings: similarity within the embedding vectors should approximate similarity within the graph structure. This accomplishes the compression while maintaining essential signals, making the embeddings useful for various downstream ML tasks.

Non-GNN embeddings can profit from reduced manual workload and required SME in comparison with classic graph algorithms. While non-GNN embeddings often require hyperparameter tuning to get right, they have an inclination to be easier to automate and generalize across different graphs. Moreover, some non-GNN embeddings like FastRP and HashGNN can scale incredibly well to large graphs on commodity hardware since they don’t require model training. This is usually a massive profit over GNN-based approaches.

Nevertheless, non-GNN embeddings include some trade-offs too. They’re less interpretable and explainable than classic graph algorithms attributable to the more generalized mathematical operations involved. Also they are generally transductive, though recent improvements in Neo4j Graph Data Science allow a few of them to effectively behave inductively in certain applications. We are going to cover transductive and inductive settings in additional depth later on this series; it has to do with the flexibility to predict on latest unseen data and is a vital point of consideration for GML.

A GNN is a neural network model that takes graph data as input, transforms it into intermediate embeddings, and feeds the embeddings to a final layer aligned to a prediction task. This prediction task might be supervised (node property prediction, link prediction, graph property prediction) or unsupervised (clustering, similarity, or just a final output embedding for representation learning). So unlike classic algorithms and non-GNN embeddings, which pass results as features to downstream ML models, particularly for supervised tasks, GNNs are fully end-to-end graph native solutions.

GNNs have a wide range of advantages related to being complete end-to-end solutions. Notably, intermediate embeddings are learned during training and, in theory, robotically infer a very powerful information from the graph. Most up-to-date GNNs are also inductive attributable to having a trained model.

GNNs also include some weaknesses. This includes high complexity, scaling difficulties, and low interpretability and explainability. GNNs can even have limitations around depth attributable to over-smoothing and other mathematical principles.

I’ll discuss GNNs more in my next blog, *GNNs: What They Are and Why They Matter*. Within the meantime, . Data scientists and engineers can find technical documentation for getting began here.

Biggest takeaways from this post:

- Graph Machine Learning (GML) is a broad field with many use case applications and comprising multiple different supervised and unsupervised ML tasks
- Certainly one of the first purposes of GML is compressing large sparse graph structures while maintaining essential signals for prediction and inference.
- GNNs are one among multiple GML approaches that accomplish this compression.