Machine Learning & Algorithmic Music Composition

Abstract

Machine learning and algorithmic systems are not foreign to the field of music composition. Researchers, musicians, and aspiring artists have used algorithmic composition as a tool for music production for many years, and as the technology advances, so does our understanding of the art these algorithms output and the implications that come with it. Who owns the art? Is it creative? This research paper explores how machine learning and algorithms are used to implement music compositional systems, as well as the discourse that exists now and will grow in the coming years as this technology becomes more accessible. Case studies such as Magenta’s NSynth system and Amper’s system are examined to illustrate both the support for and the disapproval of algorithms for music composition.

Introduction

The process and study of algorithmic music composition has been around for centuries, for algorithmic music is understood as a set of rules or procedures followed to put together a piece of music (Simoni). These algorithms can be simple or complicated; historically they were predictable procedures for composing music, carried out by hand. More recently, however, research into computational, machine-learned processes of music creation has become prevalent. How is machine learning applied to the field of music production? For the sake of this research paper, concepts of music theory and musical genre are set aside, since the discourse being expanded upon here is algorithmic music composition in relation to technology. Algorithmic composition now consists of methodical procedures carried out through computer processing, which has made algorithms in musical contexts more sophisticated and complex (“Algorithms in Music”).
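As a toy illustration of such a rule-based procedure, the sketch below composes a short melody by randomly walking over the C-major scale. The rules and scale choice are invented for illustration and do not come from any historical method described here.

```python
# A toy rule-based composer: build a melody by a random walk over the
# C-major scale. The rule set (stay in the scale, move by a step or a
# skip) is a hypothetical example of an algorithmic procedure.
import random

C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]

def compose(length=8, seed=0):
    random.seed(seed)
    idx, melody = 0, []
    for _ in range(length):
        melody.append(C_MAJOR[idx % len(C_MAJOR)])
        idx += random.choice([-1, 1, 2])  # rule: move down a step, up a step, or up a skip
    return melody

print(compose())  # e.g. a short stepwise melody drawn from the scale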

Technical Overview

Though the functionality of algorithmic music composition systems differs with the utility of the technology (e.g., a tool for creation versus a system that generates a piece at random), they share the same internal structure. Machine learning is defined as a set of techniques and algorithms that carry out tasks by learning from data, housed inside the artificial intelligence system they are designed for. Machine learning researchers are primarily interested in understanding data-driven algorithms. In technological-algorithmic music composition, defined as the creation of methodological procedures (“Algorithms in Music”), data is collected from hundreds of types of music and coded into categories so that data retrieval is organized and automated on the machine’s end once an input requests the machine to output content. This process can be identified as classification, a type of algorithm that sorts features of data into categories. If an algorithm were implemented for an artist to use as a tool to create a machine-generated melody, the algorithm would look through the scanned and classified melody data it holds and produce a melody that not only borrows sonic elements from that data but is also sonically representative of a combination of the data it has learned.
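To make the classification step concrete, here is a minimal sketch assuming each piece of music has already been reduced to a small numeric feature vector. The feature names, values, and genre labels are hypothetical placeholders, not any real system’s schema.

```python
# A minimal sketch of classifying music into categories, assuming each
# clip is summarized by hypothetical features [tempo, avg_pitch, brightness].
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: rows are clips, columns are feature values
features = np.array([
    [120.0, 60.0, 0.8],   # clip labeled "rock"
    [125.0, 62.0, 0.9],   # clip labeled "rock"
    [70.0,  48.0, 0.3],   # clip labeled "ambient"
    [65.0,  45.0, 0.2],   # clip labeled "ambient"
])
labels = ["rock", "rock", "ambient", "ambient"]

# Fit a simple nearest-neighbor classifier over the labeled clips
clf = KNeighborsClassifier(n_neighbors=1).fit(features, labels)

# File a new, unlabeled clip with the music it most resembles
new_clip = np.array([[118.0, 59.0, 0.75]])
print(clf.predict(new_clip))  # -> ['rock']
```

In a production system the features would be learned from raw audio rather than hand-picked, but the principle of filing new inputs alongside their nearest labeled neighbors is the same.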

A common architecture for generating new data is the generative adversarial network (GAN), a deep neural network made up of two networks that feed off of one another (“A Beginner’s Guide to GANs”). One network, called the generator, generates the new data, while the other, called the discriminator (part of a discriminative algorithm), evaluates an input for authenticity (“A Beginner’s Guide to GANs”). The generating network begins producing the requested content (in this case, a musical component) at random, and the discriminator network feeds data back to the generator by critiquing what is being produced (“A Beginner’s Guide to GANs”). From there, the generating network fine-tunes what it generates until the discriminator’s critiques taper off, which suggests that the generator has produced something convincing enough for the discriminating network to identify it as an authentic work (“A Beginner’s Guide to GANs”).
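The following is a minimal sketch of that generator/discriminator feedback loop, written in PyTorch over toy one-dimensional vectors standing in for audio. The layer sizes, learning rates, and sine-wave “real” data are illustrative assumptions, not the design of any system discussed in this paper.

```python
# A minimal GAN training loop on toy 1-D "audio" vectors.
import torch
import torch.nn as nn

SAMPLE_LEN, NOISE_DIM = 64, 16

# Generator: maps random noise to a fake "audio" vector
G = nn.Sequential(nn.Linear(NOISE_DIM, 128), nn.ReLU(),
                  nn.Linear(128, SAMPLE_LEN), nn.Tanh())
# Discriminator: scores how "real" a vector looks (1 = real, 0 = fake)
D = nn.Sequential(nn.Linear(SAMPLE_LEN, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

# Stand-in "real" data: 32 copies of a sine wave
real_batch = torch.sin(torch.linspace(0, 6.28, SAMPLE_LEN)).repeat(32, 1)

for step in range(200):
    # 1. Discriminator critiques: learn to tell real samples from generated ones
    fake = G(torch.randn(32, NOISE_DIM)).detach()
    d_loss = bce(D(real_batch), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Generator fine-tunes: produce samples the discriminator accepts as real
    fake = G(torch.randn(32, NOISE_DIM))
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Each pass through the loop mirrors the critique cycle described above: the discriminator first learns to separate real from generated samples, then the generator adjusts to fool it.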


Functionality and Utility

Wave animation: visual of raw audio data and the hundreds of components that make up each sonic component within a one-second audio file (“WaveNet”)

As an application to automated music production, there has been heavy discourse about how approachable generative networks are for research on audio, compared with the more common case of digital images (“The NSynth Dataset”). Audio signals are harder to code and classify because of the properties that make up sound; previous efforts to synthesize data-driven audio were limited by subjective forms of categorization, such as textural sonic make-up, or by training small parametric models (“The NSynth Dataset”). Researchers at Google’s Magenta project created NSynth, an open-source audio dataset that contains over 300,000 musical notes, each representative of different pitches, timbres, and frequencies (“The NSynth Dataset”). The creation of NSynth was Magenta’s attempt at making data-driven audio generation as approachable and accessible as possible, without the usual technical limitations (“The NSynth Dataset”). By making this technology more accessible, the developers at Magenta were also developing new ways for humans to use technology as a tool for human expression (“NSynth”). NSynth uses deep neural networks to create sounds as authentic and original as human-synthesized sounds by mimicking a WaveNet expressive model, a deep learning generative model of raw audio waveforms that generates sound (speech or music) resembling the original source of sound. WaveNet works as a convolutional neural network in which the input passes through various hidden layers to generate an output as close as possible to the input (“WaveNet”).

Architecture animation: demonstration of the WaveNet system inputting and outputting media (“WaveNet”)
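To give a rough sense of the convolutional structure described above, here is a small PyTorch sketch of stacked dilated causal convolutions. The channel counts, depth, and random input are assumptions for illustration, not DeepMind’s actual WaveNet configuration.

```python
# A rough sketch of the WaveNet idea: stacked 1-D convolutions with
# growing dilation so each output sample "sees" a long window of past audio.
import torch
import torch.nn as nn

class TinyWaveNet(nn.Module):
    def __init__(self, channels=32, layers=6):
        super().__init__()
        self.input_proj = nn.Conv1d(1, channels, kernel_size=1)
        # Dilation doubles each layer (1, 2, 4, ...) so the receptive field grows fast
        self.hidden = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2**i)
            for i in range(layers)
        )
        self.output_proj = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, x):
        # x: (batch, 1, time) raw waveform
        h = self.input_proj(x)
        for conv in self.hidden:
            # Left-pad so the convolution is causal (no peeking at future samples)
            pad = (conv.dilation[0] * (conv.kernel_size[0] - 1), 0)
            h = torch.relu(conv(nn.functional.pad(h, pad)))
        return self.output_proj(h)  # next-sample prediction per timestep

wave = torch.randn(1, 1, 1024)    # a short stretch of toy audio
print(TinyWaveNet()(wave).shape)  # torch.Size([1, 1, 1024])
```

Because the dilation doubles at each layer, six layers with kernel size 2 already let each output sample draw on the previous 64 input samples, which is how WaveNet-style models capture long spans of audio without enormous filters.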

Magenta describes its WaveNet-inspired model as a compression of the original data, whose output is as similar as possible to the input (“NSynth”). Below are images provided by Magenta that demonstrate the process of inputted audio being coded, classified, and compressed, then outputted as a reconstructed sound that resembles the original input:

The process of GANs at work reproducing an inputted sound (“NSynth”)

Magenta provides a clip of the original bass audio, which is embedded, compressed, and then reconstructed as the corresponding output clip (“NSynth”).
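That compress-then-reconstruct flow can be sketched as a tiny autoencoder. The chunk size, code size, and synthetic “bass” tone below are hypothetical stand-ins for NSynth’s far larger WaveNet-style model.

```python
# A minimal autoencoder sketch of the embed/compress/reconstruct flow.
import torch
import torch.nn as nn

CHUNK, CODE = 256, 16  # 256 audio samples squeezed into a 16-number code

encoder = nn.Sequential(nn.Linear(CHUNK, 64), nn.ReLU(), nn.Linear(64, CODE))
decoder = nn.Sequential(nn.Linear(CODE, 64), nn.ReLU(), nn.Linear(64, CHUNK))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

# Toy "bass" waveform standing in for a real audio chunk
bass = torch.sin(torch.linspace(0, 50.0, CHUNK)).unsqueeze(0)

for step in range(500):
    code = encoder(bass)            # compress: embed the input audio
    reconstruction = decoder(code)  # reconstruct: output resembling the input
    loss = nn.functional.mse_loss(reconstruction, bass)
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())  # reconstruction error shrinks as training proceeds
```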

Algorithmic music composition can be used in various ways, much as other forms of art can be produced by technology depending on how the technology is implemented for content creation. Some algorithmic music composition systems are used as tools to create sounds generated from a pool of data reflecting other sounds, the way Magenta’s NSynth works, while other systems function as stand-alone machines synthesized to output an entire generated track. Researchers at Sony used the Flow Machines software, which houses data on 13,000 musical tracks, to create a song that mimics the work of The Beatles (Vincent). The resulting track, “Daddy’s Car,” was composed by the software and produced by composer Benoit Carre, who inputted the desired style of music and wrote the lyrics (Vincent). Though multi-layered and complex, Sony’s experiment sets a revolutionary standard for algorithmic music composition, for better and for worse. For the better, it demonstrates that a machine is capable of creating music. However, legal questions come into play regarding copyright and artistic autonomy. When an AI can generate a track from scratch simply upon the request of a particular genre, drawing on the many kinds of data it has learned, artists risk losing the rights to their songs, and it becomes nearly impossible to determine the origins of the borrowed and influencing musical components in the newly composed AI track.

Regarding the generative adversarial networks that algorithmic music composition systems run on, there comes a point where the data collected during machine learning and classification is only representative of music that sounds alike. In other words, a computer scientist can build an algorithm for an AI to produce a song but feed that AI musical data from 1,000 songs that are all sonically similar. Even if those 1,000 songs are randomly selected for classification and deep learning, if they are fairly similar in genre, whatever song the machine produces will only reflect what the GAN’s generator “thinks” music is, and that notion of what music “is” will be based on the structures the network most commonly observes. More avant-garde tracks will not be representative of what algorithmic music composition systems are capable of. With this understanding, a general “standard” for what music is, based on how the generator in the GAN functions, will take hold and produce tracks that sound broadly similar to one another. This is evident not only in critiques that algorithmically composed tracks are too generic and lack substance (Deahl), but also in the fact that more avant-garde, non-generic tracks are more creative and multi-dimensional in the quality of the art and the meaning behind it.

Ethical Considerations

Many ethical considerations surround the discourse of creativity and artificial intelligence. Art is and has been considered a personal, human-to-human experience in which an artist creates a body of work to express an idea or emotion to an audience. From there, the art finds its place in a broader cultural context, where it provides meaning and metaphorical reflection. This is creativity, defined as implementing original ideas based on one’s imagination, and the term “imagination” generally denotes the human experience and the human mind. Once technology enters the mix of artistic creation, regardless of whether it is used as a tool by the artist or as a stand-alone machine that generates content through algorithms, cultural discussions outside the realm of computer science begin to question whether technology-influenced art is “creative” or even “art.”

Much of the discussion around AI and machines already involves misunderstandings and misconceptions, with media narratives portraying AI as autonomous beings that will take over the human race, propagated by people unaware of how AI is developed and how it functions. Though the threat of automation does exist, people fail to understand that AI and machines are not autonomous; they are not self-thinking beings. The actions performed by machines are a product of human development, of language and actions coded into algorithms as a reflection of the human experience. Perhaps people are afraid to approach the fundamental aspects of artificial intelligence, or find de-blackboxing artificial intelligence unappealing. In the past twenty-plus years, however, our knowledge of natural language processing, machine learning, and artificial intelligence has expanded at such an exponential rate that we have reached a point where society is trying to catch up with the development and growth of the technology.

Computer-generated music is a product of humans, not of the machines themselves, because machines are not autonomous, self-thinking beings. Through generative adversarial networks, artificially intelligent machines learn to understand and classify data so that, once a human gives an input, they can produce or reproduce whatever is being requested. However, if the machine isn’t responsible for creating the music, who is? Computer scientists are not the ones actually producing the data that the algorithms are fed; they merely create the algorithms. Nor can anyone cite the music that a machine generates through algorithms, because what is generated is an inspired mix of data from hundreds and hundreds of other artists’ works.

The criteria for creativity and aesthetics, especially in the art world, are subjective to the artist and audience (Simoni). Earlier system models of generative algorithmic composition are outnumbered by more recent ones, raising concerns that these systems are dulling the creative process (Simoni). This dulling of creativity is an ethical debate in the art world, where people fear not only that machine-composed art will oversaturate the field, but also that it will lower the expectations of creativity itself. Already, when art critics review AI-generated bodies of work and are told the work is AI-produced, they criticize it harshly, calling it one-dimensional, boring, and unimaginative, as if it were a knockoff of already existing artists (GumGum Insights). Projects such as GumGum’s self-painting machine sparked controversy over how an image created from a collection of data in a GAN can be considered “creative,” with critics suggesting it is not creative at all because no artist was directly involved in the creative process (“ART.IFICIAL”). Another consideration is the lack of source credibility in citing where the artistic inspiration comes from: it is hard to determine both which works influenced a machine’s creative output and how much of whose work served as an influential source for the GAN-generated art. The same arguments apply to algorithmic music composition. Whatever algorithm is implemented to produce musical content, there is evidently a pool of data collected and classified for the generative adversarial network to work from in creating music. Technically speaking, the collected data is visible and citable; the output of algorithmic music composition, however, is not entirely traceable back to its original sources. Music production software like Logic is already readily accessible to consumers and allows producers to generate auto-populated drum patterns unique to each user, thanks to deep neural networks that rely on large amounts of data to output what the user requests (Deahl).

For the many arguments made against algorithmic composition, there are many made in support of it. Advocates urge the use of algorithmic music composition programs as tools that both enable artists to create more and make music composition accessible to non-musicians (Deahl). Amper, an AI-driven music composition tool, is easier to use than Google’s NSynth and can generate music automatically in fewer than three manual commands, with a generative adversarial network creating a unique sound every time (Deahl). Artists such as Taryn Southern use such tools to produce meaningful music, letting machine-powered algorithms draw musical inspiration from their data to produce a unique track, rather than to harm the art industry (Deahl). Southern’s production practices tie into the discourse surrounding remix culture: what about a remix is original or stolen? At what point must the publisher of an art piece cite the original content it draws inspiration from? In algorithmic music composition, should we cite the inspiring sources, and if so, how? Should there be a way to identify AI-generated music? With the rapid development of this technology, a new standard of music production may emerge around the awareness (or lack thereof) of algorithmic music composition.

Conclusion

Algorithmic music composition is a tool that has been around for quite some time; it is the use of algorithmic music composition systems, as tools or as stand-alone music production machines, that is evolving rapidly. These systems primarily function through generative adversarial networks, in which a body of gathered data is learned by the system. From there, once an input is requested (e.g., “produce rock music”), the system identifies previously classified sounds coded as “rock music,” compresses them, and then outputs a sound that borrows from the copious data it has learned and stored. Current research on algorithmically composed music demonstrates that such systems have positives and negatives: algorithmic music composition can become a tool for creative expansion and accessibility for aspiring artists, yet it can also hinder creative development through limited source credibility and sonic uniqueness. Machine learning and algorithmic computational systems are embedded in the process of algorithmic music composition, but the ongoing debate over whether the work they produce is creative will remain subjective until legal frameworks bring clarity to who owns AI-influenced music.

Works Cited

“A Beginner’s Guide to Generative Adversarial Networks (GANs).” Skymind, https://skymind.ai/wiki/generative-adversarial-network-gan.

“Algorithms in Music.” NorthWest Academic Computing Consortium, http://musicalgorithms.ewu.edu/musichist.html.

“ART.IFICIAL: How Artificial Intelligence Is Paving the Way for the Future of Creativity.” GumGum, https://gumgum.com/artificial-creativity.

Deahl, Dani. “How AI-Generated Music Is Changing the Way Hits Are Made.” The Verge, 31 Aug. 2018, https://www.theverge.com/2018/8/31/17777008/artificial-intelligence-taryn-southern-amper-music.

“NSynth: Neural Audio Synthesis.” Magenta, 6 Apr. 2017, https://magenta.tensorflow.org/nsynth.

Simoni, Mary. “Chapter 2: The History and Philosophy of Algorithmic Composition.” Algorithmic Composition: A Gentle Introduction to Music Composition Using Common LISP and Common Music, MI: Michigan Publishing, 2003, https://quod.lib.umich.edu/s/spobooks/bbv9810.0001.001/1:5/–algorithmic-composition-a-gentle-introduction-to-music?rgn=div1;view=fulltext.

“The NSynth Dataset.” Magenta, 5 Apr. 2017, https://magenta.tensorflow.org/datasets/nsynth.

Vincent, James. “This AI-Written Pop Song Is Almost Certainly a Dire Warning for Humanity.” The Verge, 26 Sept. 2016, https://www.theverge.com/2016/9/26/13055938/ai-pop-song-daddys-car-sony.

“WaveNet: A Generative Model for Raw Audio.” DeepMind, https://deepmind.com/blog/wavenet-generative-model-raw-audio/.