Introduction to Graph Analysis using cuGraph
What is Graph Analysis?
Let’s dive right into a problem
What happens when the data set gets larger?
What makes cuGraph special?

When analyzing data, sometimes the relationships between the data elements are as important as the scope of the question being asked. In those cases, it is best to represent the data as a graph and apply graph analytics.

An example from the COVID pandemic is contact tracing, a concept everybody became aware of. That process involves collecting information about who a person was in contact with and who those people were in contact with, in effect constructing a social network where the nodes represent people and the edges represent contact. It is a case where the relationship, who contacts whom, is the focus of the analysis more than the characteristics of the person in question. Graph analysis is the application of graph techniques and algorithms to answer questions related to the relationships between data objects.

This blog is a brief introduction to graph analysis and is the kickoff to a series of blogs covering graph analysis using cuGraph. Later blogs will cover cuGraph in depth, with code, scale, and performance details.

The series will cover topics such as:

  • Node Similarity
  • Node and Edge Centrality
  • Path Finding
  • Community detection
  • Sampling
  • Graphs and types of graphs
  • … and many more

A full discussion of graph analytics can easily morph into an entire book (see the great work by Stanley Wasserman and Katherine Faust, for instance) or a PhD-level paper on a graph algorithm or its performance. However, it is enlightening to first look at the questions graph analytics can answer and ignore the details of how the algorithms work.

Graph algorithms help answer questions as diverse as:

  • How are diseases spread in a community?
  • How do international trade routes develop?
  • What are the most important papers in a citation list?
  • Which members are redundant or indispensable in an organization?
  • Based on information flow, how might a department function if specific people leave?
  • How to get from point A to point B at the least cost (a shortest-path problem, a close relative of the classic traveling salesman problem)

The next blog will dive into this in greater detail, but for now here are a few key graph parts:

  • Nodes are actors in the data, also called vertices in graph theory.
  • Edges indicate the interactions between nodes, also called links.
  • Attributes are characteristics of nodes and edges, sometimes called features or properties.

Let us consider the following very simple graph that consists of two nodes, A and B, and a single edge that has a weight attribute of 3.

Simple Graph Example of two nodes with a relationship between them containing a weight of 3
Figure 1: Simple graph example
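
To make the three graph parts concrete, here is a minimal sketch of the Figure 1 graph in plain Python. It is not cuGraph code, and the edge is treated as directed from A to B purely for illustration.

```python
# Nodes (vertices): the actors in the data.
nodes = {"A", "B"}

# Edges (links): one interaction from A to B.
# Attributes: each edge carries a dict of properties; here a weight of 3.
edges = {("A", "B"): {"weight": 3}}

# An adjacency view turns "who does A connect to?" into a dictionary lookup.
adjacency = {node: {} for node in nodes}
for (src, dst), attrs in edges.items():
    adjacency[src][dst] = attrs

print(adjacency["A"])  # {'B': {'weight': 3}}
```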

RAPIDS cuGraph is part of NVIDIA’s suite of accelerated data science libraries, which aims to accelerate most aspects of data science with GPUs. RAPIDS cuGraph focuses on accelerating graph algorithms and workflows. You don’t need to be a data scientist or a GPU expert to benefit quickly from cuGraph. The simplest platform for using cuGraph is a Linux computer with an NVIDIA GPU (Pascal or later generation). RAPIDS cuGraph provides an intuitive and easy-to-use Python interface as well as a set of Jupyter notebook examples.

Only three steps are needed to use cuGraph: (1) Load the graph data; (2) Create a Graph; and (3) Call an algorithm.

Jupyter notebook example of running PageRank algorithm with cuGraph
Figure 2: Jupyter notebook based example of using cuGraph

Okay, really four steps, because the packages must be imported first.
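
As a rough sketch of those steps in code (not a copy of the notebook in Figure 2), the function below imports the packages, loads an edge list with cuDF, builds a directed graph, and calls PageRank. The file name, column names, and sample rows are made up for illustration, the exact keyword arguments may differ between cuGraph releases, and the GPU part only runs where RAPIDS is installed.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_edges(path):
    """Write a tiny, made-up edge list (source, target, weight) as CSV."""
    rows = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 1.0), (2, 0, 1.0)]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source", "target", "weight"])
        writer.writerows(rows)
    return len(rows)


def pagerank_with_cugraph(path):
    """Run the steps on a GPU machine. The imports are local so this
    file still loads where RAPIDS is not installed."""
    # Step 0: import the packages.
    import cudf
    import cugraph

    # Step 1: load the graph data into a GPU DataFrame.
    edges = cudf.read_csv(str(path))

    # Step 2: create a directed Graph from the edge list.
    graph = cugraph.Graph(directed=True)
    graph.from_cudf_edgelist(edges, source="source", destination="target")

    # Step 3: call an algorithm; the result has one row per vertex.
    return cugraph.pagerank(graph)


sample_path = Path(tempfile.gettempdir()) / "sample_edges.csv"
edge_count = write_sample_edges(sample_path)
```

On a machine with an NVIDIA GPU and RAPIDS installed, calling `pagerank_with_cugraph(sample_path)` should return a table with a score for each vertex.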

Given a social dataset, the question we want to ask, and understand, is: who is the most popular person, and who is the least popular?

Consider a simple dataset that comes from work by JL Moreno in the 1940s looking at the dining-table partners in a dormitory at a New York State Training School. He collected data by asking all twenty-six girls in a dormitory for their first and second choice of dining partners. The raw data structure is very simple. Each resident’s answer is a row in a comma-separated file. The ‘source’ is the actor answering the question, the ‘target’ is the actor they would like to dine with, and the weight is either a 1 for first choice or a 2 for second choice. From the sample data shown below, you can see that Ada would like to sit with Cora as her first choice and with Louise as her second choice. In the top left corner of Figure 4 below, we can see edges from Ada to Cora and Louise with the corresponding edge attributes.

Example of rows in dorm preference data set
Figure 3: Sample data rows
Visualization of complete dining partner preference dataset
Figure 4: New York State Training School dormitory
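
The row format described above can be read with a few lines of standard-library Python. This sketch uses only Ada's two rows, which are the ones spelled out in the text; the header names are an assumption about the file layout.

```python
import csv
import io

# Only Ada's two rows are described in the text; the header names are assumed.
raw = """source,target,weight
Ada,Cora,1
Ada,Louise,2
"""

edges = [
    (row["source"], row["target"], int(row["weight"]))
    for row in csv.DictReader(io.StringIO(raw))
]

# First choice is the edge with weight 1, second choice has weight 2.
choices = {}
for source, target, weight in edges:
    choices.setdefault(source, {})[weight] = target

print(choices["Ada"])  # {1: 'Cora', 2: 'Louise'}
```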

To determine who is the most important, we will look at five different centrality measures:

Table of centrality measures with their definitions and uses
Centrality Measures and Uses

In Figure 4 above it is visually apparent that Eva, Marion, and Edna are the most popular dining partners. We can discern this by counting the number of edges pointing to them. It is no surprise that the graph calculations in Figure 5 below validate our observations. In a larger dataset, visualization is harder or impossible.

Results of highest centrality calculations for dining preference data set
Figure 5: Sorted Results calculated by running cuGraph’s centrality notebook
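
Counting the edges that point to each person is an in-degree calculation. The sketch below shows the idea in plain Python on a made-up edge list (not the Moreno data); it is not how cuGraph computes degree on the GPU.

```python
from collections import Counter

# Made-up (source, target) preference pairs, not the Moreno data.
edges = [
    ("a", "d"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("a", "e"),
    ("e", "b"),
]

# In-degree: how many edges point at each node.
in_degree = Counter(target for _, target in edges)

# Nodes that nobody chose never appear as a target, so add them with 0.
nodes = {n for edge in edges for n in edge}
for node in nodes:
    in_degree.setdefault(node, 0)

# Most-chosen first; ties are broken alphabetically.
ranking = sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))
print(ranking)
```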

As expected, both Eva and Marion topped all of the measures here. The next question to ask is: how do their roles differ?

Betweenness and PageRank scores suggest that Eva is the most important to the lines of communication and social structure of the dormitory. Eigenvector and Katz scores indicate that Marion wields the most influence over both the popular students and the entire population.
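
For readers curious what a score like PageRank is doing, here is a small power-iteration sketch in plain Python. It is a textbook formulation on a made-up graph, with an assumed damping factor of 0.85 and a fixed iteration count; it is not cuGraph's implementation.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a list of (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over everyone.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Made-up graph: "hub" is chosen by three nodes and chooses "x" back.
edges = [("x", "hub"), ("y", "hub"), ("z", "hub"), ("hub", "x")]
scores = pagerank(edges)
top = max(scores, key=scores.get)
print(top)
```

In this toy graph, "x" ends up well ahead of "y" and "z" even though all three have the same out-degree, because "x" is the only one chosen by the highly ranked "hub".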

Conversely, the lowest scorers in these measures are also of interest.

Many residents have low connectedness, as shown by the in-degree distribution chart in Figure 6 below. A total of 10 out of the 26 students were chosen as a preferred partner only once or not at all. The pool of potentially isolated residents is over one third of the population.

Histogram illustrating the distribution of residents chosen as preferred dining partners
Figure 6: In-Degree Histogram
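
A histogram like Figure 6 is just a count of how many residents share each in-degree value. The sketch below uses made-up in-degree values with hypothetical resident labels, and then checks the "over one third" statement using the 10-out-of-26 figure quoted above.

```python
from collections import Counter

# Made-up in-degree per resident (how many times each was chosen).
in_degree = {"r1": 0, "r2": 1, "r3": 1, "r4": 2, "r5": 4, "r6": 0}

# Histogram: in-degree value -> number of residents with that value.
histogram = Counter(in_degree.values())

# Residents chosen once or not at all form the potentially isolated pool.
isolated = sorted(name for name, degree in in_degree.items() if degree <= 1)

for degree in sorted(histogram):
    print(degree, "#" * histogram[degree])

# The figures quoted in the text: 10 of 26 residents is more than one third.
quoted_share = 10 / 26
print(quoted_share > 1 / 3)  # True
```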

Let us sort the graph algorithm results in ascending order (smallest at top), as shown in Figure 7 below, so that we can evaluate the least popular actors.

Results of actors with the lowest centrality calculations in the dining preference data set
Figure 7: Results calculated by running cuGraph’s centrality notebook and sorting by lowest popularity

Note: because of the small dataset size, degree centrality does not correlate perfectly with the lowest degree.

We can ask a range of additional questions beyond just popularity. For example:

  • From the data, many students share the lowest degree centrality, which identifies Alice, Laura, Ella, and Cora as being among the most isolated.
  • Those like Betty and Ruth, who had the lowest PageRank and eigenvector centrality scores, lack the influence and connections to approach others if they are bullied.
  • Those with low degree who are connected to others with low degree, like Ada and Cora, could suddenly become very isolated if either one left the dorm.
  • Laura has minimal links, and only to highly connected students. She seems to have no peer group and could be a bystander to the mutually connected groups. Her degree centrality is minimal, but she scores in the middle on the connectedness measures.

While the questions above are answered by observing these measures, further analysis of this dataset could answer questions like:

  • Whose removal would most disrupt the social structure within the dorm?
  • Who brokers social status within the dorm?
  • Who would ascend in prominence if a specific person left the dorm?
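
One simple way to approach the first of these questions is to remove each node in turn and count how many disconnected pieces remain. The sketch below does that on a made-up graph, ignoring edge direction; it is only one possible proxy for "disruption", not a method taken from the cuGraph notebooks.

```python
def component_count(nodes, edges):
    """Count weakly connected components with an iterative traversal."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)  # ignore direction
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbors[node] - seen)
    return count


def fragments_after_removal(nodes, edges, removed):
    """Number of components left once one node and its edges are removed."""
    kept_nodes = [n for n in nodes if n != removed]
    kept_edges = [(a, b) for a, b in edges if removed not in (a, b)]
    return component_count(kept_nodes, kept_edges)


# Made-up graph: "m" holds three otherwise separate groups together.
nodes = ["p", "q", "m", "r", "s"]
edges = [("p", "m"), ("q", "m"), ("r", "m"), ("r", "s")]
impact = {n: fragments_after_removal(nodes, edges, n) for n in nodes}
print(impact)  # {'p': 1, 'q': 1, 'm': 3, 'r': 2, 's': 1}
```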

Let us look at a larger dataset, one that contains over 40 thousand edges and is illustrated in Figure 8 below. At that scale it becomes nearly impossible to visually discern the characteristics of individual nodes. Imagine what a graph with over one million edges would look like.

The figure depicts the email communication network from the infamous Enron corporate scandal of the early 2000s. For our analysis, we will repeat the same measurements used on the simple dataset above. However, we must rely on some additional external enrichment data to see if the answers make sense.

Visualization of Enron email communication graph
Figure 8: Visualization of Enron email contacts
Image: Peter Prevos, Lucid Manager, May 3, 2023, lucidmanager.org/data-science/analyse-enron-corpus

This dataset contains email metadata made public during the Enron scandal trial.

Analysis of email traffic (a type of graph analysis) helped convict many of the conspirators by establishing hidden relationships, identifying key actors, and isolating small groups within the full email corpus.

  • The scandal was widely impactful:
    – It led to the bankruptcy of Enron in 2001.
    – It resulted in billions of dollars in lawsuits, including a $40 billion suit from shareholders.
    – Arthur Andersen, a top-five accounting firm, was dissolved.
    – The Sarbanes-Oxley Act, among other reforms, was passed to regulate financial record keeping.
    – Several Enron officers were sentenced to prison terms.
  • The data contains 40,777 unique email connections (the edges) between 6,187 different email addresses (the nodes).
  • Weights containing the number of emails between addresses are included in the connection data but are not used in this analysis. (Later blogs will use the weights.)

Table of Nodes (vertices) with highest centrality measures in the Enron email data
Figure 9: Sorted Results of Centrality Algorithms for Enron data calculated by running cuGraph’s centrality notebook

Note: Katz centrality has been omitted from these calculations because it does not converge properly on this dataset. More on that in a later blog.

Highest centrality addresses in Enron email data with owner roles
Figure 10: Important Enron vertices with addresses and roles

Within a matter of seconds (depending on your GPUs), cuGraph can identify important actors in this graph and give clues to their role in the community it represents. In this example:

  • Tana.jones@enron.com is connected to the most actors in the graph.
  • Jeff.dasovich@enron.com has the highest betweenness and PageRank. That address is the most influential within the core network and the most important to the graph’s structure. It lies on the shortest paths between many other nodes in the graph.
  • Pete.davis@enron.com has the highest eigenvector centrality. That node has the most influence over the entirety of the graph rather than just the other important nodes.
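
To illustrate what "lying on the shortest paths between other nodes" means, here is a compact sketch of betweenness centrality using Brandes' algorithm for an unweighted, directed graph. The graph is made up, the scores are unnormalized, and this CPU version only illustrates the measure; cuGraph's GPU implementation is not shown.

```python
from collections import deque


def betweenness(nodes, edges):
    """Unnormalized betweenness for an unweighted directed graph (Brandes)."""
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)

    score = {n: 0.0 for n in nodes}
    for start in nodes:
        # Breadth-first search that counts shortest paths from start.
        order = []
        predecessors = {n: [] for n in nodes}
        paths = {n: 0 for n in nodes}
        distance = {n: -1 for n in nodes}
        paths[start], distance[start] = 1, 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in successors[node]:
                if distance[nxt] < 0:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
                if distance[nxt] == distance[node] + 1:
                    paths[nxt] += paths[node]
                    predecessors[nxt].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {n: 0.0 for n in nodes}
        for node in reversed(order):
            for prev in predecessors[node]:
                dependency[prev] += paths[prev] / paths[node] * (1 + dependency[node])
            if node != start:
                score[node] += dependency[node]
    return score


# Made-up mail flow: every path between the two sides passes through "c".
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
scores = betweenness(nodes, edges)
print(scores["c"])  # 4.0
```

Here "c" sits on all four shortest paths (a to d, a to e, b to d, b to e), so it is the only node with a non-zero score.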

There are many platforms and tools for graph analytics, but cuGraph has a number of significant advantages:

  • There are no licensing fees to use cuGraph.
  • RAPIDS cuGraph integrates with widely used open-source packages like Dask, PyTorch, and scikit-learn, with more on the way.
  • It is part of the RAPIDS open-source GPU-accelerated ecosystem, allowing start-to-finish GPU acceleration.
  • Abundant support is available in the RAPIDS community and from NVIDIA via GitHub.
  • It is easy to set up. This article is not focused on setup or installation, but the cuGraph documentation contains instructions for running in different environments.
  • Scalability and performance are key strengths:
    – A single 32GB GPU can process graphs of up to 250 million edges.
    – NVIDIA engineers have scale-tested cuGraph on multi-node, multi-GPU systems on graphs of over one trillion edges.
  • User-friendly interface: there is a Python API similar to NetworkX, and for integrators there are both C and C++ APIs.
  • RAPIDS cuGraph is available on Amazon AWS, Google Colab, and Paperspace.

RAPIDS cuGraph provides a full suite of documentation and examples.

As mentioned in the introduction, this is the first in a series of blogs covering graph analysis using cuGraph. The next blog will explore the concept of similarity, followed by blogs on centrality, traversal, and other classes of graph algorithms. The series will also dive into the notion of graph ETL and the steps needed to process the data before creating a graph. Lastly, the series will include new relevant topics and requests accepted into the cuGraph GitHub issues list at https://github.com/rapidsai/cugraph/issues.

For further reading on the New York State Training School study by JL Moreno, his book “Who Shall Survive” is online at: https://archive.org/details/whoshallsurviven00jlmo/mode/2up

“Social Network Analysis: Methods and Applications” by Stanley Wasserman and Katherine Faust is an excellent book.

Lastly, “Disrupting Dark Networks” by Sean Everton is an informative read on the application of network analysis.

The Enron image is from https://lucidmanager.org/data-science/analyse-enron-corpus, and is under a Creative Commons Attribution-ShareAlike 4.0 International License.
