
Roman Numeral Analysis with Graph Neural Networks


An Introductory Guide

In this article, I would like to describe my journey in developing a model for automatic harmonic analysis. Personally, I am interested in understanding music deeply. Questions like “Why are things structured the way they are?” and “What was the composer or artist thinking when writing the piece?” are important to me. Naturally, the place to start was to analyze the underlying harmony of a piece.

Scavenging through my old notebooks from the conservatory, I stumbled upon the technique we used to annotate and analyze small musical excerpts. It is called Roman Numeral analysis. The concept may seem a bit complicated if you have never heard of it before, but please bear with me.

My goal is to build a system that can automatically analyze musical scores. Given a score, the system returns the same score with an additional staff containing the chords in Roman numeral notation. This should work mainly for classical tonal music but is not necessarily limited to that.

In the rest of this article, I will introduce the concepts of Roman Numerals and Graph Neural Networks, and discuss some details about the model I developed and its results. I hope you enjoy!

Introduction to Roman Numerals

Roman Numeral analysis is a technique used to understand and analyze the chords and harmonic progressions in music, particularly in Western classical music and popular music. Chords are represented using Roman numerals instead of traditional musical notation.

In Roman Numeral analysis, each chord is assigned a Roman numeral based on its position and function within a given key. The Roman numerals represent the scale degrees of the key, with uppercase numerals representing major chords and lowercase numerals representing minor chords.

For example, in the key of C major, the C major chord is represented by the Roman numeral “I” (uppercase “I” denotes a major chord). The D minor chord is represented by “ii” (lowercase “ii” denotes a minor chord). The G major chord is represented by “V” (uppercase “V” denotes a major chord) since it is the fifth chord in the key of C major.

A Roman Numeral analysis example for two bars of four-part harmony in C major.

Roman numerals are always relative to a key. So if the key is C major, the Roman numeral “V” is the dominant, i.e. the G major chord. Chords also have different qualities, for example minor or major. In Roman numerals, uppercase letters stand for major quality and lowercase letters for minor quality.
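The case convention can be made concrete with a small sketch. This is only an illustration and not part of any model: the function name `diatonic_triad_numeral` and the lookup tables are assumptions introduced here, only major keys with natural-note tonics are handled, and the "°" mark for the diminished triad on the seventh degree is a common convention that the text above does not discuss.

```python
# Semitone offsets of the major scale degrees above the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]
# Hypothetical lookup table: natural-note tonics only.
PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def diatonic_triad_numeral(key_tonic: str, degree: int) -> str:
    """Roman numeral of the diatonic triad on `degree` (1-7) of a major key."""
    tonic = PITCH_CLASSES[key_tonic]
    scale = [(tonic + step) % 12 for step in MAJOR_SCALE_STEPS]
    # Stack two thirds on top of the root, staying inside the scale.
    root = scale[degree - 1]
    third = scale[(degree + 1) % 7]
    fifth = scale[(degree + 3) % 7]
    numeral = NUMERALS[degree - 1]
    if (third - root) % 12 == 4:
        return numeral                 # major third: uppercase
    if (fifth - root) % 12 == 6:
        return numeral.lower() + "°"   # minor third + diminished fifth
    return numeral.lower()             # minor triad: lowercase


print([diatonic_triad_numeral("C", d) for d in range(1, 8)])
# ['I', 'ii', 'iii', 'IV', 'V', 'vi', 'vii°']
```

In C major this reproduces the examples from the text: “I” for C major, “ii” for D minor, and “V” for G major.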

In music analysis, the lowest note is often a point of reference for the character of a chord. Roman numerals are able to convey this information too. In the example above, the bass (lowest chord note) of the second chord is F sharp, but the root of the chord is D; therefore the chord is in first inversion, indicated with the number 6.
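As a rough sketch of how the inversion figure follows from the bass note, the snippet below looks up which chord tone is in the bass. The function name `inversion_figure` and the figure tables are assumptions made for illustration; the figures themselves (6 and 64 for triads, 7, 65, 43 and 2 for seventh chords) are the usual figured-bass shorthand.

```python
# Figures for root position, first, second (and third) inversion.
TRIAD_FIGURES = ["", "6", "64"]
SEVENTH_FIGURES = ["7", "65", "43", "2"]


def inversion_figure(chord_pcs, bass_pc):
    """Return (inversion number, figure) for a chord.

    chord_pcs: pitch classes stacked in thirds from the root, e.g. D major
               is [2, 6, 9] (D, F sharp, A).
    bass_pc:   pitch class of the lowest sounding note.
    """
    inversion = chord_pcs.index(bass_pc)  # position of the bass within the chord
    figures = TRIAD_FIGURES if len(chord_pcs) == 3 else SEVENTH_FIGURES
    return inversion, figures[inversion]


# D major triad with F sharp in the bass: first inversion, figure 6.
print(inversion_figure([2, 6, 9], 6))      # (1, '6')
# D dominant seventh (D, F sharp, A, C) with F sharp in the bass.
print(inversion_figure([2, 6, 9, 0], 6))   # (1, '65')
```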

Another interesting notation capability of Roman numerals is related to borrowed chords. This effect is called a secondary degree. Implicitly, every (primary) Roman numeral has a secondary degree of the tonic (i.e. I or i); however, when the secondary degree is annotated, we are told which scale degree is momentarily acting as the tonic. The third chord in the example above has a dominant seventh as its primary degree and the dominant of C major as its secondary degree. The V65 indicates a major chord with a seventh in first inversion.
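To see what a secondary degree does to the actual chord root, here is a minimal sketch. The helper `chord_root` and its tables are hypothetical, and it makes a simplifying assumption: both the home key and the momentary tonic are treated as major scales. Note names are printed with sharps only, which ignores proper pitch spelling.

```python
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]
DEGREES = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6, "VII": 7}
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_root(key_tonic_pc, primary, secondary="I"):
    """Pitch class of the root of `primary`/`secondary` in a major key."""
    # The secondary degree picks the scale degree that momentarily acts as tonic.
    temporary_tonic = (key_tonic_pc + MAJOR_SCALE_STEPS[DEGREES[secondary] - 1]) % 12
    # The primary degree is then counted from that temporary tonic.
    return (temporary_tonic + MAJOR_SCALE_STEPS[DEGREES[primary] - 1]) % 12


print(PITCH_NAMES[chord_root(0, "V")])        # G  (V in C major)
print(PITCH_NAMES[chord_root(0, "V", "V")])   # D  (V/V in C major)
```

With the implicit secondary degree “I”, the function simply returns the ordinary scale degree, which matches the idea that every numeral has the tonic as its default secondary degree.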

Roman Numeral analysis helps musicians and music theorists understand the structure and relationships between chords in a piece of music. It allows them to identify common chord progressions, analyze harmonic patterns, and make comparisons between different musical compositions. It is a useful tool for composers, arrangers, and performers to understand the underlying harmony and make musical decisions based on that knowledge.

Automatic Roman Numeral Analysis

Now that we have a basis for what Roman Numeral analysis looks like in practice, we can discuss how to automate it. In this article, we will cover a method to predict Roman numerals from symbolic music, i.e. digital scores (MusicXML, MIDI, MEI, Kern, MuseScore, etc.). Please note that you can obtain some of these formats from any score editor software such as Finale, Sibelius, MuseScore, or any other. Usually, the software allows for an export to an uncompressed MusicXML format. If you don’t have any of these editors, I suggest using MuseScore.

Let’s now discuss the representations in more depth. In contrast to audio representations, where music can be seen as a digital sequence at the waveform level or a 2-D spectrogram in the frequency domain, the symbolic representation has individual note events carrying information such as onset time, duration, and pitch spelling (names of notes). Symbolic representations have often been treated as a pseudo-audio representation by separating the score into quantized time frames, for example a pianoroll (like the figure shown below). However, some recent works have proposed a graph representation of a score where every note is a vertex in the graph and edges represent relations between notes. For the latter, scores can be transformed into this graph structure, which is particularly useful when a Machine Learning model is involved.

Different representations of the score excerpt shown in the middle. Top: quantized time frame representation, bottom: graph representation.
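For readers who have not seen a quantized time frame representation before, the following is a minimal sketch of how a pianoroll could be built from note events. The `Note` type, the `piano_roll` function and the half-beat step are assumptions chosen for illustration. Notice that the pitch is stored as a MIDI number, so the pitch spelling carried by the symbolic score is lost in this representation.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    onset: float     # start time in beats
    duration: float  # length in beats
    pitch: int       # MIDI pitch number (no pitch spelling)


def piano_roll(notes: List[Note], step: float = 0.5) -> List[List[int]]:
    """Quantize note events into a binary (pitch x time frame) matrix."""
    lowest = min(n.pitch for n in notes)
    highest = max(n.pitch for n in notes)
    n_frames = round(max(n.onset + n.duration for n in notes) / step)
    roll = [[0] * n_frames for _ in range(highest - lowest + 1)]
    for n in notes:
        first = round(n.onset / step)
        last = round((n.onset + n.duration) / step)
        for frame in range(first, last):
            roll[n.pitch - lowest][frame] = 1  # the note sounds in this frame
    return roll


# Three notes: C4 for one beat, E4 for two beats, then D4 on the second beat.
notes = [Note(0.0, 1.0, 60), Note(0.0, 2.0, 64), Note(1.0, 1.0, 62)]
for row in reversed(piano_roll(notes)):  # highest pitch printed first
    print(row)
```

Each row is one pitch and each column one time frame, so a single long note is spread over several cells and the boundaries between individual notes are no longer explicit.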

Given a symbolic score, the graph is constructed by modelling three relationships between notes.

  • Notes starting at the same time, i.e. same onset.
  • A note starting when the other ends, i.e. consecutive notes.
  • A note starting while the other is sounding, i.e. during connection.
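The three relationships above can be sketched as a small edge-building routine. This is an illustration under assumptions, not the exact construction used by the model: the function name `build_score_graph`, the tuple layout, and the edge type labels are made up here, and onsets are compared with plain equality, which presumes they are already quantized.

```python
def build_score_graph(notes):
    """Build typed, directed edges between notes.

    notes: list of (onset, duration, pitch) tuples, times in beats.
    Returns a dict mapping edge type to a list of (source, target) indices.
    """
    edges = {"onset": [], "consecutive": [], "during": []}
    for i, (onset_i, duration_i, _) in enumerate(notes):
        offset_i = onset_i + duration_i
        for j, (onset_j, _, _) in enumerate(notes):
            if i == j:
                continue
            if onset_j == onset_i:
                edges["onset"].append((i, j))        # start together
            elif onset_j == offset_i:
                edges["consecutive"].append((i, j))  # j starts when i ends
            elif onset_i < onset_j < offset_i:
                edges["during"].append((i, j))       # j starts while i sounds
    return edges


notes = [(0.0, 1.0, 60), (0.0, 2.0, 64), (1.0, 1.0, 62)]
print(build_score_graph(notes))
# {'onset': [(0, 1), (1, 0)], 'consecutive': [(0, 2)], 'during': [(1, 2)]}
```

The double loop is quadratic in the number of notes, which is fine for a short excerpt; a full score would call for sorting notes by onset first.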

The graph of the score can be used as input to a Graph Neural Network, which implicitly learns by propagating information along the edges of the graph. But before we explain how a model works on scores, let’s first briefly explain how Graph Neural Networks work.

So, what exactly are Graph Neural Networks? At their core, GNNs are a class of deep learning models designed to handle data represented as graphs. Just like real-world networks, graphs consist of interconnected nodes or vertices, each with its own unique features. GNNs leverage this interconnectedness to capture rich relationships and dependencies, enabling them to perform analysis and prediction tasks.

But how do GNNs work? Imagine a musical score where each note is a node, and note relations represent the connections between them. Traditional models would treat each note individually, ignoring the musical context. GNNs, however, embrace this context by considering both the individual note features (e.g., pitch spelling, duration) and their relationships (same onset, consecutive) simultaneously. By aggregating information from neighbouring nodes, GNNs empower us to understand not only individual notes but also the dynamics and patterns within the whole network.

To achieve this, GNNs employ a series of iterative message-passing steps. During each step, nodes gather information from their neighbours, update their own representations, and propagate these updated features further through the network. This iterative process allows GNNs to capture and refine information from nearby nodes, gradually building a comprehensive understanding of the whole graph.
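A toy version of one message-passing step helps to see how information spreads. The sketch below deliberately has no learned weights: each node averages its neighbours' features and then averages that with its own features. Real GNN layers apply trainable transformations and non-linearities instead, so the function `message_passing_step` and its update rule are simplifying assumptions.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing.

    features: list of feature vectors, one per node.
    edges:    list of directed (source, target) pairs.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in features]
    counts = [0] * len(features)
    for source, target in edges:  # each node gathers its neighbours' features
        for k in range(dim):
            sums[target][k] += features[source][k]
        counts[target] += 1
    updated = []
    for node, own in enumerate(features):
        if counts[node] == 0:  # isolated node: nothing to aggregate
            updated.append(list(own))
            continue
        neighbour_mean = [s / counts[node] for s in sums[node]]
        updated.append([(a + b) / 2 for a, b in zip(own, neighbour_mean)])
    return updated


# A chain of three notes 0 - 1 - 2; only note 0 carries a signal at first.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
h0 = [[1.0], [0.0], [0.0]]
h1 = message_passing_step(h0, edges)
h2 = message_passing_step(h1, edges)
print(h1)  # [[0.5], [0.25], [0.0]]
print(h2)  # [[0.375], [0.25], [0.125]]
```

After one step the signal has reached only the direct neighbour; after two steps it has reached node 2 as well, which is why stacking several steps widens the context each note can see.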

The message-passing process, when done iteratively within the network, is usually called graph convolution. A popular graph convolution block, which we also used in our music analysis model, is called SageConv, from the famous GraphSAGE paper. We won’t cover the details here, but there are many resources covering the functionality of GraphSAGE.

The beauty of GNNs lies in their ability to extract meaningful representations from graph data. By learning from the local context and combining it with global information, GNNs can uncover hidden patterns, make accurate predictions, and even generate new insights. This makes them invaluable in a wide range of domains, from social network analysis to drug discovery, traffic prediction to fraud detection, and now to music analysis.

The model used for Roman Numeral analysis is called ChordGNN.
As the name suggests, ChordGNN is a model for automatic Roman Numeral analysis based on Graph Neural Networks. A particularity of this model is that it leverages note-wise information but produces onset-wise predictions, i.e. a Roman numeral is predicted for every unique onset event of the score. This means that multiple notes at the same onset share the same Roman numeral, just as when annotating a musical score. However, by using graph convolution, information from every note is propagated through the neighboring notes and onsets.

ChordGNN model architecture illustration.

ChordGNN is based on a Graph Convolutional Recurrent Neural Network architecture and consists of stacked GraphSAGE convolutional blocks that operate on the note level.

The graph convolution is followed by an Onset-Pooling layer that contracts the note representations to the onset level, resulting in a vector embedding for every unique onset of the score. This is an important step because it moves the representation from a graph to a sequence.
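The contraction from notes to onsets can be sketched as a group-by operation. The snippet below averages the embeddings of all notes that share an onset; the choice of the mean as the pooling function, as well as the function name `onset_pool`, are assumptions made for illustration rather than a description of the model's actual layer.

```python
def onset_pool(note_embeddings, note_onsets):
    """Pool note-level embeddings into one embedding per unique onset.

    note_embeddings: list of vectors, one per note.
    note_onsets:     onset time of each note, in the same order.
    Returns (sorted unique onsets, pooled embeddings in the same order).
    """
    groups = {}
    for embedding, onset in zip(note_embeddings, note_onsets):
        groups.setdefault(onset, []).append(embedding)
    unique_onsets = sorted(groups)  # time order turns the graph into a sequence
    pooled = []
    for onset in unique_onsets:
        members = groups[onset]
        pooled.append([sum(column) / len(members) for column in zip(*members)])
    return unique_onsets, pooled


embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
onsets = [0.0, 0.0, 1.0]  # the first two notes start together
print(onset_pool(embeddings, onsets))
# ([0.0, 1.0], [[2.0, 3.0], [5.0, 6.0]])
```

Three note embeddings become two onset embeddings, already ordered in time and ready to be consumed by a sequence model.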

The embeddings obtained by the Onset-Pooling, which are also ordered by time, are then fed to a sequential model, such as a GRU stack. Finally, simple Multi-Layer Perceptron classifiers are added for each of the attributes that describe a Roman numeral. Therefore, ChordGNN is also a Multi-Task model.

ChordGNN does not directly predict the Roman numeral for each position of the score but rather predicts the degree, local key, quality, inversion and root instead. The predictions of the individual attribute tasks are then combined into a single Roman numeral prediction. Let’s see what the output predictions look like.
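To give an idea of how separate attribute predictions could be merged into one label, here is a hedged sketch. The dictionary keys, the quality vocabulary and the function `assemble_roman_numeral` are all hypothetical and do not reflect ChordGNN's actual output format; the root is carried along but not needed to render the label in this simplified version.

```python
TRIAD_FIGURES = ["", "6", "64"]
SEVENTH_FIGURES = ["7", "65", "43", "2"]
# Hypothetical quality vocabulary, for illustration only.
SEVENTH_QUALITIES = {"dominant seventh", "major seventh", "minor seventh"}
LOWERCASE_QUALITIES = {"minor", "minor seventh", "diminished"}


def assemble_roman_numeral(prediction):
    """Combine per-task predictions for one onset into a single label."""
    numeral = prediction["degree"]
    if prediction["quality"] in LOWERCASE_QUALITIES:
        numeral = numeral.lower()  # lowercase numerals mark minor-type chords
    is_seventh = prediction["quality"] in SEVENTH_QUALITIES
    figures = SEVENTH_FIGURES if is_seventh else TRIAD_FIGURES
    label = numeral + figures[prediction["inversion"]]
    if prediction["secondary_degree"] != "I":  # tonic is the implicit default
        label += "/" + prediction["secondary_degree"]
    return prediction["local_key"] + ": " + label


prediction = {
    "local_key": "C",
    "degree": "V",
    "secondary_degree": "V",
    "quality": "dominant seventh",
    "inversion": 1,
    "root": "D",  # not used for rendering here
}
print(assemble_roman_numeral(prediction))  # C: V65/V
```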

In this section, we will look at some of ChordGNN’s predictions and compare them with an analysis done by a human. Below is an example of the first bars of Haydn’s string quartet op.20 №3 movement 4.

A comparison between the human annotation and ChordGNN on a passage of Haydn’s string quartet op.20 №3 movement 4.

In this example, we can observe several things. In measure 2, the human annotation marks a tonic in first inversion; however, the viola at that point is lower than the cello, and therefore the chord is actually in root position. ChordGNN is able to predict this correctly. Subsequently, ChordGNN predicts a harmonic rhythm of eighth notes, which disagrees with the annotator’s half-note marking. Analyzing the underlying harmony in that passage, we can justify ChordGNN’s choices.

The human annotation suggests that the entire second half of the 2nd measure represents a vii° chord. However, it should not be in first inversion, as the cello plays an F# as the lowest note (which is the root of the vii°). Still, there are two conflicting interpretations of the segment. First, the vii° on the third beat can be seen as a passing chord between the surrounding tonic chords, leading to a dominant chord in the next measure. Alternatively, the vii° could already be part of a prolonged dominant harmony (with passing chords on the offbeats) leading to the V7. The ChordGNN solution accommodates both interpretations, as it does not attempt to group chords at a higher level, treating each eighth note as an individual chord rather than a passing event.

A comparison between the human annotation and ChordGNN on a passage of Mozart’s Piano Sonata K279 movement 1. Image by the author.

Above is another example comparing the predictions of ChordGNN with the original analysis of a Mozart Piano Sonata. In this case, ChordGNN’s analysis is a bit more simplistic, choosing to omit some chords. This happens on two different occasions with the dominant seventh in third inversion (V2). This is a reasonable assumption for ChordGNN since the bass is missing. Another disagreement between the annotation and the prediction occurs at the half cadence towards the end. ChordGNN treats the C# of the melody as a passing note, whereas the annotator chooses to specify the extension of #11.

In this article, we discussed a new method for automating Roman Numeral analysis using Graph Neural Networks. We discussed how the ChordGNN model works and showcased some of its predictions.

E. Karystinaios, G. Widmer. Roman Numeral Analysis with Graph Neural Networks: Onset-wise Predictions from Note-wise Features. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023.

All images and graphics in this article were created by the author.
