Tracking the Path of Spotify Music: Design Principles and Technologies that Make Spotify Workable

Abstract

Streaming technologies are becoming more popular and more essential as the Internet develops. Cheap, convenient and fast, streaming music is gradually replacing physical music formats and players such as vinyl records, CD players, and phonographs. Music streaming platforms not only give users access to a wide variety of songs without limitations of time and space, but also create a social network environment in which users can share music. Leading companies in the industry include Spotify, Apple Music, Amazon Music, and Pandora. In addition, streaming music services offer full-featured mobile apps, so users do not need to sit at a PC to enjoy music. Competition among these companies makes consumers the biggest beneficiaries.

In this research paper, I use Spotify as a case study to explore the question “How does Spotify implement as many design principles as possible?” The paper is divided into three main parts: 1) the two main infrastructures that keep Spotify running, namely the proliferating data infrastructure and the audio and streaming infrastructure (Eriksson et al., 2019); 2) three ways in which Spotify works as an interface: between websites, between humans and devices, and between humans and the large computing system; 3) Spotify as a sociotechnical system. My goal is to make visible the series of unobservable, systematic actions triggered by “one click” of a button on Spotify’s GUI.

Introduction:

Streaming (adj.): relating to or being the transfer of data (such as audio or video material) in a continuous stream especially for immediate processing or playback (Merriam-Webster).

  • How Does Music Streaming Work?

With streaming, “the client browser or plug-in can start displaying the data before the entire file has been transmitted”. During streaming, the audio file is transmitted in small packets, which compose metafiles, and is then decoded by the codec. As the buffer fills with decoded data, the data is turned into music and the computer plays it directly (White, 2015). “Each of the scores of available audio codecs are specialized to work with particular audio file formats, such as mp3, ogg, etc. As the buffer fills up, the codec processes the file through a digital-to-analog converter, turning file data into music, and while the server continues to send the rest of the file” (White, 2015).

Unlike downloaded music, which is stored permanently on the device’s hard drive and can be accessed without an internet connection once stored, streaming music is delivered over Wi-Fi or mobile data, and users do not “own” the music. As long as a steady stream of packets reaches the computer, the user hears the music without interruption.

  • Digitization of Sound Wave and Compression of Audio Files

The digitization of sound waves follows Shannon’s model of information transmission. A digital recording represents sound as a sequence of discrete events encoded in the binary language of computers. Digitization involves two main steps: sampling, the measurement of air pressure amplitude at equally spaced moments in time, and quantization, the “translation” of each sample’s amplitude into an integer in binary form. Sample rate refers to the number of samples taken per second (samples/s) and is measured in hertz (Hz). Bit depth refers to the number of bits used per sample. The physical process of measuring the changing air pressure amplitude over time can be modeled by the mathematical process of evaluating a sine function at particular points along the horizontal axis, as shown in the accompanying graphic (Digital Sound & Music).
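The two digitization steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real recorder: the 440 Hz tone, the 44,100 Hz sample rate, the 16-bit depth and both function names are values and names chosen only for this example.

```python
import math


def sample_and_quantize(freq_hz, sample_rate, bit_depth, duration_s):
    """Sample a sine tone and quantize each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1          # largest positive integer level
    n_samples = round(sample_rate * duration_s)   # how many samples fit in the duration
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                       # sampling: equally spaced moments in time
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # modeled air pressure, in [-1, 1]
        samples.append(round(amplitude * max_level))     # quantization: amplitude -> integer
    return samples


def uncompressed_bitrate(sample_rate, bit_depth, channels):
    """Bits per second needed to store the raw samples without compression."""
    return sample_rate * bit_depth * channels


# Assumed example values: a 440 Hz tone, CD-style sampling, 10 ms of audio.
pcm = sample_and_quantize(freq_hz=440, sample_rate=44100, bit_depth=16, duration_s=0.01)
print(len(pcm), pcm[:5])
print(uncompressed_bitrate(44100, 16, 2))  # 1411200 bits per second for stereo
```

A higher sample rate yields more integers per second and a higher bit depth yields larger integers, which is why raw audio grows quickly and is usually compressed before it is streamed.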

In order to reduce file size and stream more efficiently over the network, digital audio is often compressed by lowering its sample rate, bit depth, or bit rate, which can degrade the quality of the music. Compression can be either lossless or lossy: lossless compression reduces file size while preserving audio quality, and “the file can be restored back to its original state; lossy compression permanently removes data (by reducing original bit depth)” (BBC).
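The difference between the two kinds of compression can be made concrete with a small sketch. It uses Python's standard `zlib` module for the lossless case and a deliberately crude bit-depth reduction for the lossy case. Both are stand-ins chosen for illustration: real lossy codecs such as Ogg Vorbis rely on far more sophisticated methods, and the sample data here is invented.

```python
import zlib

# Invented stand-in for a run of 8-bit audio samples.
original = bytes(range(256)) * 64

# Lossless: the compressed data can be restored to exactly the original bytes.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(original), len(packed), restored == original)

# Lossy (crude illustration): reduce bit depth from 8 bits to 4 bits per sample.
reduced = bytes(sample >> 4 for sample in original)   # keep only the top 4 bits
approx = bytes(sample << 4 for sample in reduced)     # scale back to the 8-bit range
print(approx == original)                             # the discarded bits are gone for good
```

The lossless round trip returns identical data, while the bit-depth reduction can only return an approximation, which matches the distinction quoted from the BBC.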

Two Main Infrastructures of Spotify:

When digital audio is packaged into files and “become music at Spotify, aggregation of data occurs on, at, and via many computational layers” (Eriksson et al., 2019). The platformization of Spotify hides the complex data exchange process behind a well-designed GUI, so its design principles are invisible to consumers. Two main infrastructures, the proliferating data infrastructure and the audio and streaming infrastructure, together with several detailed design principles, are what make Spotify workable.

  • Proliferating Data Infrastructure

Following an “end-to-end server and client model”, Spotify’s proliferating data infrastructure, as described by Eriksson et al., is the foundation of its service. This infrastructure enables communication between Spotify’s servers and its clients’ devices: at one end lie the servers and data centers that “send out music files and fetch back user data” (Eriksson et al., 2019); at the other end lie users’ playback devices, such as a PC, a smartphone or a tablet. Spotify’s back end is hosted on Google’s public cloud, meaning that it “operates based on Google’s cloud compute, storage, and networking services, as well as its data services, such as Pub/Sub, Dataflow, Big Query, and Dataproc” (Datacenter Knowledge).

Data is the most important part of this infrastructure, and Spotify builds channels for exchanging it. Eriksson et al. describe the transmission of information between users and Spotify’s servers as an “event delivery system”. Most events produced within Spotify are generated by its users. User data is defined as sets of “structured events that are caused at some point in time as a reaction to some predefined activity” (Eriksson et al., 2019). When a user performs an action, for instance clicking the play button in the app, a piece of information (“an event”) signaling “playing the music” is sent from the user’s device to the server via the internet.
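A “structured event” of this kind might look like the following sketch. The class name, the field names and the sample values are all hypothetical, chosen only to illustrate the idea of a timestamped record produced in reaction to a user action; Spotify's real event schema is not described in the sources used here.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class PlaybackEvent:
    """Hypothetical structured event; every field name is an assumption."""
    event_type: str    # the predefined activity, e.g. "play"
    user_id: str       # who triggered the event
    track_uri: str     # what the event is about
    timestamp: float   # the point in time at which it occurred


def encode_event(event):
    """Serialize the event into a JSON string that could be sent to a server."""
    return json.dumps(asdict(event), sort_keys=True)


event = PlaybackEvent(
    event_type="play",
    user_id="user-123",
    track_uri="spotify:track:0WVAQaxrT0wsGEG4BCVSn2",
    timestamp=time.time(),
)
payload = encode_event(event)
print(payload)
```

Because every event has the same structure, a server receiving millions of such records can aggregate them without needing to know anything about the interface that produced them.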

  • Audio and Streaming Infrastructure

The second infrastructure is the audio and streaming infrastructure. Spotify balances file size against network speed well: its streaming service has very low latency, meaning that the delay between a user requesting a song and hearing it is almost imperceptible. Spotify owes its low-latency streaming to the Ogg Vorbis format, an open-source lossy audio compression method “that offers roughly the same sound quality as mp3, but with a much smaller file size”. Note that music files are not permanently stored on the destination device. Instead, the buffer stores a few seconds of sound before sending it to the speaker, so Spotify’s client “fetches the first part of a song from its infrastructural back end and starts playing a track as soon as sufficient data has been buffered as to make stutter unlikely to occur” (Eriksson et al., 2019). The small file size lets Spotify’s servers send the file quickly, so music plays almost instantly after the user clicks the play button.
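The buffering behavior can be illustrated with a toy simulation. It is a sketch under simplifying assumptions rather than Spotify's client logic: time is divided into ticks, audio arrives in whole chunks, playback consumes one chunk per tick, and the function name and the threshold values are hypothetical.

```python
def simulate_playback(arrivals, start_threshold):
    """Simulate a playback buffer.

    arrivals[i] is the number of audio chunks received during tick i.
    Playback starts once the buffer holds start_threshold chunks and then
    consumes one chunk per tick. Returns (tick playback started, stall count).
    """
    buffered = 0
    started_at = None
    stalls = 0
    for tick, arrived in enumerate(arrivals):
        buffered += arrived
        if started_at is None:
            if buffered >= start_threshold:
                started_at = tick      # enough data buffered: begin playing
            continue
        if buffered > 0:
            buffered -= 1              # play one chunk
        else:
            stalls += 1                # buffer ran dry: audible stutter
    return started_at, stalls


# A steady stream with a three-chunk head start plays without stalling.
print(simulate_playback([2, 2, 1, 1, 0, 1, 1], start_threshold=3))
# Starting too eagerly on a poor connection leads to stutter.
print(simulate_playback([1, 0, 0, 0], start_threshold=1))
```

The trade-off is visible in the two calls: a larger threshold delays the start slightly but makes stutter less likely, which is the balance the quoted passage describes.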

Spotify offers several bitrates, each corresponding to a streaming quality level.

This design lets users adjust the music quality to their needs. Alternatively, they can “turn on the automatic quality streaming” so that the app automatically selects the best bitrate for the device’s current network conditions. This is especially convenient for mobile users, who do not need to worry about using up their data allowance by accidentally selecting the “extreme” quality setting.
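The effect of bitrate on mobile data use is simple arithmetic, sketched below. The quality labels and the bitrates of 96, 160 and 320 kbit/s are illustrative assumptions, not figures taken from this paper's sources, and the function name is hypothetical.

```python
# Assumed quality tiers in kilobits per second (illustrative values only).
QUALITY_KBPS = {"normal": 96, "high": 160, "extreme": 320}


def megabytes_per_minute(bitrate_kbps):
    """Data transferred by one minute of audio at the given bitrate."""
    bits = bitrate_kbps * 1000 * 60    # kilobits/s -> bits in one minute
    return bits / 8 / 1_000_000        # bits -> bytes -> megabytes


for label, kbps in QUALITY_KBPS.items():
    print(f"{label}: {megabytes_per_minute(kbps):.2f} MB per minute")
```

Under these assumed values, the highest tier uses more than three times as much data per minute as the lowest, which is why an automatic setting matters on a limited mobile plan.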

Sum Up: How Does Spotify Track Streams?

The scale of the data that passes through Spotify is enormous: in 2016, the company “handled more than thirty-eight terabytes of incoming data per day,” while permanently storing more than “70 petabytes of…data about songs, playlists, etc.” (Sarrafi, 2016).

What happens when the user clicks the “play” button? How does Spotify deal with data at such a large scale? How can Spotify make sure the data is transmitted accurately? How does Spotify make sure the data is on the right path and heading to the correct destination? De-blackboxing the design of Spotify comes down to tracking the path of its data files.

In order to deliver music worldwide, Spotify uses SIR (SDN Internet Router). To recap, “every computer on the Internet has an IP, that IP belongs to an IP network and that IP network belongs to an organization” (Spotify Labs). Spotify therefore identifies users by their IP addresses. First, Spotify has two transit providers to make sure it can reach all clients. Transit providers are companies that own very large networks and allow other organizations to connect to those networks for a fee so that they can reach the rest of the world. Second, Spotify uses Content Delivery Networks (CDNs), which are extremely well-connected networks, to reach faraway users and to help supply the bandwidth required to send users their music, so that users do not have to wait for their bits to travel all over the world. Spotify has physical data centers in London, Stockholm, Ashburn (VA) and San Jose (CA). In addition to using Google Cloud Platform, Spotify connects to at least five internet exchange points (IXPs), located in Frankfurt (DE-CIX), Stockholm (Netnod), Amsterdam (AMS-IX), London (LINX), and Ashburn (EQIX-ASH). The service is also attached to some subscriber networks, such as broadband or mobile providers, to shorten the distance to its users and speed up delivery (Dbarrosop, 2016).

Spotify splits traffic between data centers. When a Spotify client connects to the service, a combination of techniques is used to make sure that the connection is made to the best possible data center. In addition, because Spotify connects to the networks of many organizations, it needs to know which of those connections are suitable for reaching the connecting client (Spotify Labs). By applying SIR, Spotify can monitor the available paths, choose the best one based on real-time metrics, and thus provide clients with better and more accurate service.
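The idea of choosing a path from real-time metrics can be sketched as follows. This is not SIR's actual algorithm, which is not detailed here: the metric names, the path names, the sample numbers and the rule of preferring low packet loss and then low latency are all assumptions made for the sake of the example.

```python
def choose_best_path(paths):
    """Pick the reachable path with the lowest loss, breaking ties by latency."""
    usable = [p for p in paths if p["reachable"]]
    if not usable:
        return None                    # no connection can reach this client
    return min(usable, key=lambda p: (p["loss_pct"], p["latency_ms"]))


# Hypothetical measurements for three ways of reaching one client network.
paths = [
    {"name": "transit-1", "reachable": True, "latency_ms": 80, "loss_pct": 0.0},
    {"name": "ixp-peer", "reachable": True, "latency_ms": 20, "loss_pct": 0.0},
    {"name": "transit-2", "reachable": False, "latency_ms": 10, "loss_pct": 0.0},
]

best = choose_best_path(paths)
print(best["name"])
```

Re-running the selection whenever the measurements change is what lets a router of this kind react to congestion or outages without manual intervention.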

As Nicole Starosielski claims, “a simple ‘click’ on a computer commonly activates vast infrastructures whereby information is pushed through routers, local internet networks, IXPs, long-haul backbone systems, coastal cable stations, undersea cables and data warehouses at the speed of light” (Eriksson et al., 2019).

Detailed Design Principles that Enable Spotify’s Different Features

  • Spotify as Interface for Agencies

  1. Interface Between Different Websites

Linked Data and Data Integration

Spotify links and combines data from different sources, which enables two of its key features: collaboration with other companies and the playlist function. This section focuses on the former; the latter is discussed later.

Spotify operates data integrations with many companies. The most visible one in the app’s interface is the “infrastructural tie-in” with Facebook. Spotify merges its login system with Facebook’s so that users can sign up with either their email or their Facebook account. By linking their account to Facebook, users can “display their Facebook name, picture, and find their friends easily on Spotify” (Spotify). They can also share playlists with their friends and see what their friends are listening to. Two huge databases are connected by a user’s simple click on “sign up with Facebook”. The collaboration is a win-win deal for both companies.

Interoperability

Interoperability is “the ability of different information systems, devices and applications (‘systems’) to access, exchange, integrate and cooperatively use data in a coordinated manner, within and across organizational, regional and national boundaries” (himss.org). Sharing music and personal profiles is one example of Spotify’s interoperability. Users can share Spotify music via multiple platforms such as “Skype, Tumblr, Twitter, Telegram, etc.” When users click one of the sharing options, the link directs them to the corresponding website, which triggers the information integration process. They can also share songs via a Spotify Uniform Resource Identifier (URI). A URI is convenient because it takes users directly to the Spotify application without going through a web page first, whereas an HTTP song link directs users to a web page. For example, an HTTP link looks like “https://open.spotify.com/album/3Nlbg1BHLXDKqQVQ9ErCmg”, while a URI looks like “spotify:track:0WVAQaxrT0wsGEG4BCVSn2?context=spotify%3Aplaylist%3A37i9dQZF1DX0BcQWzuB7ZO”. Most interestingly, users can embed Spotify music and playlists in their personal websites by copying the embed code of the music. Copying and clicking a link inside the application is analogous to opening a new window in the web browser.
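The relationship between the two link styles can be sketched with a small parser. The function names are hypothetical, and the mapping from a URI of the form `spotify:<type>:<id>` to a web link of the form `https://open.spotify.com/<type>/<id>` is an assumption based on the pattern of the two examples quoted in this section.

```python
def parse_spotify_uri(uri):
    """Split a URI such as 'spotify:track:<id>?context=...' into (type, id)."""
    base = uri.split("?", 1)[0]        # drop the optional '?context=...' part
    parts = base.split(":")
    if len(parts) != 3 or parts[0] != "spotify":
        raise ValueError(f"not a recognized Spotify URI: {uri!r}")
    _, kind, item_id = parts
    return kind, item_id


def uri_to_http_link(uri):
    """Build the assumed web-page equivalent of a Spotify URI."""
    kind, item_id = parse_spotify_uri(uri)
    return f"https://open.spotify.com/{kind}/{item_id}"


uri = ("spotify:track:0WVAQaxrT0wsGEG4BCVSn2"
       "?context=spotify%3Aplaylist%3A37i9dQZF1DX0BcQWzuB7ZO")
print(parse_spotify_uri(uri))
print(uri_to_http_link(uri))
```

Both forms carry the same identifier; what differs is the scheme, which decides whether the operating system hands the link to the Spotify application or to the web browser.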

In sum, data integration and interoperability are inseparable: it is data integration that enables interoperability between Spotify and different websites. The sharing of databases adds many “social” functions, and thus affordances, to Spotify.

  2. Interface Between Users and Devices

Viewed from the offline side, Spotify serves as an interface between users and devices, that is, as a product of human-computer interaction. This part continues from the point at which the computer has finished “communicating” with Spotify’s servers. After the packets arrive at the computer, the system decodes them and sends the result to a buffer, then to the speaker. Taking the desktop application as an example, Spotify’s GUI enables humans to interact with the computer through sight, hearing and touch. It serves both as a control interface through which users send commands and as a monitoring interface through which users check how well the computer carries out those commands. Although it does not invent any new function for computers, it helps users to “develop and assemble” different functions of the computer. Clicking the play button (through the touchpad, tactile) engages the RAM, the buffer, the speaker, the monitor, and so on. For example, once the computer receives the user’s command and reacts to it, the interface informs the user that the song has started to play in three ways (on the monitor, visual): 1. The play button changes from the “play” icon to the “pause” icon; 2. The progress bar moves and the remaining time of the song changes; 3. A speaker icon appears on the album cover. All of these, accompanied by the sound of the music (auditory), are signs that the user is interacting with the computer and that the computer is playing the requested song.

  3. Interface Between Users and the Underlying System

Playlist

The playlist is the building block of Spotify. It connects different modules of Spotify, for example one playlist to another, or a singer to a music genre. Playlists are everywhere in Spotify’s desktop application: the home page displays different types of playlists as square photos; the main navigation bar on the left side contains a list of playlists and the “add new playlist” function; and when the user searches for a song, an artist or a keyword, the result is a screen full of playlists. A playlist simulates an album, in which different songs are connected by the same singer. Spotify’s playlists connect songs based on different elements, such as mood, weather, or genre. For example, the songs in the playlist named “Christmas Hits” are connected by the Christmas element. A user who opens this playlist is likely to be led to another playlist related to Mariah Carey (who is famous for singing Christmas songs) or to “Christmas Classic” (because the two playlists share the “Christmas” element). Thus, playlists link the data of artists, songs, keywords and so on.

As Eriksson et al. indicate in their study, the streaming metaphor itself implies a continuous flow of music, reminiscent of a never-ending playlist (Eriksson et al., 2019). Building playlists is not a new function invented by Spotify, since “early media players such as Winamp provided functionality for reaggregating tracks into customized playlists–an approach to music that built on previous assembling practice and technology” (Eriksson et al., 2019). Spotify uses playlists to guide users from one module to another. Playlists allow users to mix and match their favorite tracks and repackage them into personalized playlists. They also enable Spotify to recommend and create playlists for users based on data from existing playlists. The creation and re-creation of playlists is a non-stop data aggregation process that keeps Spotify’s different functions working.

Music Recommendation

Following the launch of “expert playlists for every mood and moment”, Spotify took a step toward “algorithmic and human curated recommendations”. It “not only delivers music, but also frames and shapes data” (Eriksson et al., 2019). Recommendation is also one of the ways in which Spotify enables a “deep and unique conversation” between users and the complicated technological system.

Spotify recommends music in various ways: weekly updated playlists such as “Discover Weekly” and “Release Radar”, as well as song and album recommendations such as “Top recommendations for you”, “Similar to…” and “Because you listened to…”.

Take “Discover Weekly” as an example. It is based on three recommendation models. 1. Collaborative filtering: this model makes predictions based on users’ historical behavior on Spotify, such as “whether a user saved the track to their own playlist, or visited the artist’s page after listening to a song” (Ciocca, 2017); note that the playlist appears again here. Say user Pipi listens to tracks A, B and C, and user Sisi listens to tracks B, C and D. Because their histories overlap, they are paired up, and Spotify recommends D to Pipi and A to Sisi, after making sure that neither of them has already listened to the recommended track. The whole process is implemented with matrix math and Python libraries. 2. Natural language processing (NLP) models: these models analyze text. By crawling the web for blog posts and other written texts about artists and music, Spotify can figure out how people define and describe a specific song or musician. The Echo Nest, for example (Whitman, 2012), organizes this data into a chart called “cultural vectors” or “top terms”. Each artist has a set of top terms, each with an associated weight that reflects how likely people are to use that term to describe the artist. Spotify then uses these charts to build vectors that determine the similarity of two songs. 3. Raw audio models: these models analyze the characteristics of the track itself. Spotify uses convolutional neural networks to analyze similarities between tracks in characteristics such as “time signature, key, mode, tempo, and loudness” (Ciocca, 2017). It then recommends songs to users based on their listening history.
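The intuition behind collaborative filtering can be sketched with plain Python sets. This is a deliberately simplified neighbor-based stand-in, not the matrix math that the text attributes to Spotify: the listening histories are invented, the third user is hypothetical, and Jaccard overlap is an assumed similarity measure.

```python
def jaccard(a, b):
    """Overlap between two listening histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories):
    """Suggest tracks heard by the most similar user but not by the target."""
    mine = histories[target]
    others = [user for user in histories if user != target]
    nearest = max(others, key=lambda user: jaccard(mine, histories[user]))
    return sorted(histories[nearest] - mine)   # only tracks the target has not heard


# Invented listening histories for three hypothetical users.
histories = {
    "pipi": {"A", "B", "C"},
    "sisi": {"B", "C", "D"},
    "lulu": {"X", "Y"},
}

print(recommend("pipi", histories))
print(recommend("sisi", histories))
```

Users whose histories overlap are treated as having similar taste, so each is offered what the other has heard and they have not, while a user with no overlap contributes nothing to the recommendation.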

(“Cultural vectors” or “top terms,” as used by the Echo Nest)
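One common way to compare weighted term lists like these is cosine similarity, sketched below. The choice of cosine similarity is an assumption, since the sources used here do not name the measure, and the artists, terms and weights are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    shared_terms = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                     # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical "top terms" with weights for three artists.
artist_a = {"dance pop": 0.9, "teen pop": 0.5}
artist_b = {"dance pop": 0.8, "swedish": 0.6}
artist_c = {"death metal": 1.0}

print(round(cosine_similarity(artist_a, artist_b), 3))
print(cosine_similarity(artist_a, artist_c))
```

Artists described with the same heavily weighted terms score close to 1, while artists with no terms in common score 0, which gives a numeric notion of how similarly people write about them.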

Through layers of analysis and calculation over different sources of data, Spotify makes personalized recommendations for users. This is also a process of database integration and of interoperation across different companies.

  • Spotify as a Sociotechnical System

Spotify only allows its users to download music within the application, which means that users cannot export their libraries (online or offline) outside of Spotify’s ecosystem. Music libraries can only be synced between devices that have the Spotify app. This may look like nothing more than a way for Spotify to retain its customers, but the whole story is not that simple. It has a lot to do with music industry licensing agreements and digital rights management (DRM).

A proprietary format is a format in which a particular software program accepts or outputs data (The Law Dictionary). Music streams from Spotify are encoded in the Ogg Vorbis format and protected by DRM encryption. Ogg Vorbis is an open-source, patent-free alternative for lossy compression, which means that Spotify’s software developers do not need to pay license fees to support Ogg in their application, nor do they need to publish their changes when they modify the code to suit their own needs (Mitchell, 2014). Spotify’s audio files are encoded by its own engineers on the basis of the original Ogg format, so its music cannot be decoded or decrypted by other software. To put it simply, users cannot play Spotify’s music in other media players, let alone burn it onto a CD.

Spotify is available in most of Europe and the Americas, Australia, New Zealand, and parts of Africa and Asia. Its content can be accessed through both the app and the web player: the app runs on iOS, Android, Mac and Windows as well as on several sound systems, TVs and car stereo systems; the web player is supported by web browsers including Chrome, Firefox, Edge and Opera (Spotify). In other words, Spotify is coded specifically to operate on those systems and devices.

The Digital Performance Right in Sound Recordings Act (DPRA) and the Digital Millennium Copyright Act (DMCA) require many music streaming providers to pay a sound recording performance royalty in addition to the musical work royalty (Richardson, 2014). Spotify “must pay licensing fees to copyright holders (record labels, such as Warner Brothers and SONY) for each song played, whether offered to a paying, or to a non-paying customer,” which makes the freemium model expensive to sustain. That is also the reason why Spotify charges for premium services.

Also, if the user opens the “About Spotify” tab, he or she can see a series of logos for Universal Music Group, EMI, Warner Music Group, etc. under “Content provided by”, meaning that the music on Spotify comes from these outside providers.

Spotify “opens its Application Program Interface (API) to external developers, whose applications could retrieve data from the Spotify music catalog”. However, developers have to follow a set of rules in order to develop their apps, such as agreeing to the terms of use and creating client IDs. Their apps also have to “go through a rigorous Spotify approval process before being released on the platform” (Myers, 2011).

Spotify’s collaboration with other companies and websites is supported by its API, while its proprietary format protects the artists and the content of its software.

Conclusion:

From producing, compressing and packaging music files to transmitting, decoding and playing them, Spotify does not invent anything new for the music streaming industry, but it does a great job of working as an interface for “connecting and coordinating”. It connects websites, people, and the physical functions of devices. It coordinates the division of labor among the software’s functions, the passage of data packets, and the relationship between its service and the sociotechnical environment. With the purpose of connecting and coordinating, and following the rules of the larger sociotechnical context, Spotify applies as many design principles as possible to provide its users with the most pleasing, convenient and personalized music streaming service. It is not the technology itself that makes everything a black box for consumers (Martin Irvine), but the way Spotify applies the design rules. Spotify is a product of human-computer interaction.

References and Citations:

  1. Brusilovsky, Peter. The Adaptive Web. 2007, p. 325. ISBN 978-3-540-72078-2.
  2. Richardson, J. H. “The Spotify Paradox: How the Creation of a Compulsory License Scheme for Streaming On-Demand Music Services Can Save the Music Industry.” SSRN Electronic Journal, 2014, doi: 10.2139/ssrn.2557709.
  3. Eriksson, M., R. Fleischer, A. Johansson, P. Snickars, and P. Vonderau. Spotify Teardown: Inside the Black Box of Streaming Music. Cambridge, MA: The MIT Press, 2019.
  4. White, Ron. How Computers Work. 9th ed. Indianapolis, IN: Que Publishing, 2007. Excerpts.
  5. Denning, Peter J., and Craig H. Martell. Great Principles of Computing. Cambridge, MA: MIT Press, 2015. Chapters 4, 5, 6. Excerpts in pdf.
  6. Whitman, Brian. “How Music Recommendation Works – and Doesn’t Work.” Variogram by Brian Whitman, 11 Dec. 2012, https://notes.variogr.am/2012/12/11/how-music-recommendation-works-and-doesnt-work/.
  7. “Streaming.” Merriam-Webster, https://www.merriam-webster.com/dictionary/streaming.
  8. “5.1.2 Digitization.” Digital Sound & Music, 23 Jan. 2018, http://digitalsoundandmusic.com/5-1-2-digitization/.
  9. “Encoding Audio and Video – Revision 5 – GCSE Computer Science – BBC Bitesize.” BBC, https://www.bbc.co.uk/bitesize/guides/z7vc7ty/revision/5.
  10. Sverdlik, Yevgeniy. “How Much Is Spotify Paying for Google Cloud?” Data Center Knowledge, 7 Mar. 2016, https://www.datacenterknowledge.com/archives/2016/03/07/how-much-is-spotify-paying-for-google-cloud.
  11. Dbarrosop. “SDN Internet Router – Part 1.” Spotify Labs, 28 Jan. 2016, https://labs.spotify.com/2016/01/26/sdn-internet-router-part-1/.
  12. Dbarrosop. “SDN Internet Router – Part 2.” Spotify Labs, 2 Feb. 2016, https://labs.spotify.com/2016/01/27/sdn-internet-router-part-2/.
  13. Ciocca, Sophia. “How Does Spotify Know You So Well?” Medium, 5 Apr. 2018, https://medium.com/s/story/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe.