Music To My Ears: De-Blackboxing Spotify’s Recommendation Engine

Proma Huq
CCTP-607 Big Ideas in Tech: AI to the Cloud
Dr. Martin Irvine
Georgetown University 2019


The average individual makes over 70 conscious decisions a day (Iyengar, 2011). To steer consumers through this maze of choices, recommendation system algorithms are perhaps among the most ubiquitous applications of machine learning for online products and services (McKinsey, 2013). A prime example is Spotify’s recommendation engine, which harnesses deep learning systems and neural networks to generate accurate content suggestions for customized playlists, such as its “Discover Weekly” series. To further explore the paradigm of recommendation systems, the research question for this case study is “How Does Spotify’s Recommendation Algorithm Provide Accurate Content?” Key insights include de-blackboxing the algorithm, explaining the processes of collaborative filtering and matrix factorization, and providing a deeper understanding of how Spotify gathers “taste analysis data” in order to deliver a positive user experience.


On any given day, the average individual makes a range of conscious decisions about their media consumption. As we navigate these omnipresent choices in our increasingly interconnected world, recommendation system algorithms that are now ubiquitous in our lives steer us towards them, like invisible Internet elves enabling our every whim or flight of fancy. Would you like to watch this show on Netflix? Perhaps you’d like to buy this on Amazon? Or the ever-present, “based on your interest in X, how about Y?” These helpful suggestions are part of an overarching “data-driven and analytical” approach to consumer decision-making (Ramachandran, 2018), fueled by recommendation engines. This is especially salient when it comes to media consumption, as these algorithms allow users to find relevant and enjoyable content, thereby increasing user satisfaction and driving engagement for a given product or service (Johnson, 2014). The manner in which a successfully designed recommendation algorithm drives growth in an organization is evident in the case of Spotify.

With over 200 million users worldwide, 96 million of whom are premium subscribers (Spotify, 2019; Fortune, 2019), Spotify is a treasure trove of big data, with a recommendation algorithm that is ripe for de-blackboxing. Considering the current media affordances in terms of the technology of streaming music, Spotify and its recommendation algorithms are changing the way we discover, listen to and interact with music. Based on this, the primary research question for this paper is “How Does Spotify’s Recommendation Algorithm Provide Accurate Insights?”

Spotify in Numbers

Launched in 2008, Spotify is a Swedish start-up and audio streaming platform that offers a “freemium” subscription service, earning revenue by means of advertising and premium subscription fees (Spotify, 2019). The free version, aptly named “Spotify Free”, only allows users to shuffle-play songs, and listening is interspersed with ads. At $9.99/month, Spotify Premium gives users the freedom of choice, ad-free unlimited access and higher quality audio. The company offers over 40 million songs with an estimated 5 million playlists that are curated and edited (Spotify, 2018).


Spotify boasts a wide range of genre, mood and even occasion-based playlists. In fact, as I write this paper, I’m listening to “Chill Lofi Study Beats”, a Spotify-curated playlist that has 422,812 followers and is one of several playlists in the “Study” genre offered by Spotify.

Figure 1: Playlist Screenshot

In July 2015, Spotify launched its Discover Weekly playlist. As its title suggests, Discover Weekly is an algorithm-generated playlist that is released (or, in colloquial music terms, “dropped”) every Monday, bringing listeners up to two hours of custom, curated music recommendations. Spotify also offers other customized recommendations in Daily Mixes, Release Radar and Recommended suggestions of playlists or artists. Users claimed it was “scary” how well Spotify was able to discern their musical tastes and that the platform “knew” or “got” them. According to Spotify, by 2016 Discover Weekly had “reached nearly 5 billion tracks streamed” since its launch, a clear sign of its success as an algorithmic product offering.

Source: Vox Creative

De-blackboxing The Recommendation Algorithm 

The primary aim of recommendation algorithms is to analyze user data in order to provide personalized recommendations. At Spotify, Discover Weekly and other playlists are created using collaborative filtering, based on the user’s listening history, in tandem with songs enjoyed by users who seem to have a similar history. Additionally, Spotify uses “Taste Analysis Data” to establish a Taste Profile. This technology, developed by Echo Nest (Titlow, 2016), groups the music users frequently listen to into clusters rather than genres, as the human categorization of music is largely subjective. Examples of this are evident in Spotify’s Discover Weekly and Daily Mix playlist suggestions. Clustering algorithms like Spotify’s group data based on their similarities. Alpaydin describes clustering as an “exploratory data analysis technique where we identify groups naturally occurring in the data” (Alpaydin, 2016 p. 115). Services like Spotify can cluster songs, genres and even playlist tones in order to train machine learning algorithms to predict preferences and future listening patterns.

Figure 2: How Discover Weekly Works. Source: Pasick, 2015.
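
The clustering idea described above can be illustrated with a minimal k-means sketch. This is not Spotify's or Echo Nest's actual method. The track names, the two feature values per track (loosely, "energy" and "acousticness") and the function name kmeans are all hypothetical, chosen only to show how groups can emerge from the data itself rather than from genre labels.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical tracks described by (energy, acousticness); not real Spotify data.
tracks = {
    "lofi_beat_1": (0.20, 0.80),
    "lofi_beat_2": (0.25, 0.75),
    "lofi_beat_3": (0.15, 0.85),
    "house_track_1": (0.90, 0.10),
    "house_track_2": (0.85, 0.15),
    "house_track_3": (0.95, 0.05),
}
names = list(tracks)
points = list(tracks.values())

# Seed the two centroids with the first and last track (an arbitrary choice).
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

No genre label is supplied anywhere in the sketch; the two groups fall out of the feature values alone, which is the sense in which clusters can replace subjective human categories.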

Machine learning algorithms in recommender systems are typically classified under two main categories: content based and collaborative filtering (Johnson, 2014). Traditionally, Spotify has relied primarily on collaborative filtering approaches for its recommendations. This works well for Spotify, as it revolves around the strategy of determining user preference from historical behavioral data patterns. For example, if two users listen to the same sets of songs or artists, their tastes are likely to align. Christopher Johnson, former Director of Data Science at Spotify, who worked on the launch of Discover Weekly, outlines the differences between the two in his paper on Spotify’s algorithm. According to Johnson, a Content Based strategy relies on analyzing factors and demographics that are directly associated with the user or product, such as the age, sex and demographic of the user or a song’s genre or period, such as music from the 1970s or 1980s. Recommendation systems that are based on Collaborative Filtering take consumer behavior data and utilize it to predict future behavior (Johnson, 2014). This consumer behavior leaves a trail of data, generated through implicit and explicit feedback (Ciocca, 2017). Unlike Netflix, which from its inception used a five-star rating system (WSJ, 2018), Spotify relied primarily on implicit feedback to train its algorithm. Examples of implicit feedback include playing a song on repeat or skipping it entirely after the first 10 seconds. User data is also gleaned from explicit feedback (Pasick, 2015), such as the heart button on Discover Weekly, or liked songs that are automatically saved to the library and the “Liked from Radio” playlist. The diagram below (see Figure 3) shows the range of approaches that collaborative filtering and other recommendation algorithms can take.
Spotify uses an amalgamation of 4 approaches – Attribute based, CF (item by item), CF (user similarity) and Model based. 

Figure 3: Recommender Algorithm Approaches. Source: Zheng, 2015.
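
The user-similarity form of collaborative filtering described above can be sketched in a few lines. This is an illustrative toy, not Spotify's implementation. The user names, song identifiers and play counts are hypothetical, and cosine similarity over play counts is an assumption standing in for whatever similarity measure a production system would use.

```python
import math

# Hypothetical implicit feedback: play counts per user (not real Spotify data).
plays = {
    "ana":   {"song_a": 12, "song_b": 7, "song_c": 3},
    "ben":   {"song_a": 9, "song_b": 5, "song_d": 8},
    "chloe": {"song_e": 15, "song_f": 4},
}

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse play-count vectors."""
    shared = set(u) & set(v)
    dot = sum(u[s] * v[s] for s in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, plays, top_n=3):
    """Score songs the target has not played by summing other users'
    play counts, weighted by how similar each user is to the target."""
    scores = {}
    for other, history in plays.items():
        if other == target:
            continue
        similarity = cosine_similarity(plays[target], history)
        for song, count in history.items():
            if song not in plays[target]:
                scores[song] = scores.get(song, 0.0) + similarity * count
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Drop songs that only came from users with no overlap at all.
    return [song for song in ranked if scores[song] > 0][:top_n]

print(recommend("ana", plays))
```

Here "ana" and "ben" overlap on two songs, so the one song "ben" has played that "ana" has not is recommended, while songs from "chloe", who shares nothing with "ana", receive a score of zero and are dropped.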

Spotify further analyzes and applies user data by using a matrix decomposition method, which is also known as matrix factorization. The approach of matrix factorization aims to find answers by ‘decomposing’ (hence the term matrix decomposition) the data into two separate segments (Alpaydin, 2016). The first segment defines the user in terms of marked factors, each of which is weighted differently. The second segment maps between factors and products, which in the Spotify universe are songs, artists, albums and genres, thus defining a factor in terms of the products offered. In his book, Alpaydin provides an example to further elaborate on this as applied to movies. In the following diagram (see Figure 4) each customer has only watched a small percentage of the movies and the overall movies have only been watched by a small percentage of the customers. Based on these assumptions, the learning algorithm needs to be able to generalize and predict successfully.

Figure 4: Matrix decomposition for movie recommendations.
Source: Alpaydin, 2016.

Each row of the data matrix X corresponds to a customer and each column to a movie (or, for the scope of this case study, a piece of music). Most of this data, however, is missing, as the customer has not yet watched many of the movies, which is where the recommendation system comes in. The matrix is then factored into two parts, F and G, where F describes customers in terms of factors and G describes factors in terms of movies/music. Spotify uses a matrix factorization application called Logistic Matrix Factorization, or Logistic MF, to generate lists of related artists, for example for Artist ‘Radio’ playlists, based on binary preference data (Johnson, 2014). This matrix is established by calculating millions of recommendations based on the behavior and preferences of millions of other users, an example of which can be seen below, in Figure 6.
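
To make the decomposition concrete, the following sketch multiplies two small hand-written factor matrices back together. All values are hypothetical and chosen by hand for illustration (in practice the factors are learned from data), and the names F, G, X_hat and best_unheard simply mirror the notation used above.

```python
# Hypothetical, hand-picked factors for illustration; real factors are learned.
F = [  # users x factors: how strongly each user leans toward each factor
    [1.0, 0.0],  # user 0 leans entirely toward factor 0
    [0.0, 1.0],  # user 1 leans entirely toward factor 1
    [0.5, 0.5],  # user 2 sits between the two
]
G = [  # factors x items: how strongly each item expresses each factor
    [0.9, 0.8, 0.1],  # factor 0 is expressed by items 0 and 1
    [0.2, 0.1, 0.9],  # factor 1 is expressed by item 2
]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x d) matrix using plain lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Reconstructed score matrix: one predicted score for EVERY user/item cell,
# including cells for which no listening data was ever observed.
X_hat = matmul(F, G)

def best_unheard(user, heard):
    """Return the highest-scoring item the user has not yet heard."""
    candidates = [j for j in range(len(G[0])) if j not in heard]
    return max(candidates, key=lambda j: X_hat[user][j])

for row in X_hat:
    print([round(score, 2) for score in row])
print(best_unheard(0, heard={0}))
```

The point of the sketch is that the product of F and G has no missing cells: once the two smaller matrices are known, a score exists for every user and every item, which is what lets the system generalize from a sparse data matrix.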


Each row of this sample matrix represents one of Spotify’s 200 million users. Conversely, each column represents one of the 40 million songs in their database.


Figure 6: Spotify Matrix Snapshot. Source: Ciocca, 2017.

This is followed by the data being run through a matrix factorization formula, resulting in two different vectors, identified in this diagram (see Figure 7) as X and Y. In terms of Spotify, X represents the user and their preferences, while Y embodies the song, representing a single song profile (Ciocca, 2017).

Figure 7: User/Song Matrix. Source: Johnson, 2015.
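
How the user vectors X and song vectors Y might be learned from binary listening data can be sketched as follows. This is a deliberately simplified stand-in for Logistic MF, not Johnson's exact formulation: it assumes a sigmoid of the dot product as the probability of a listen, omits bias terms and confidence weighting, treats unplayed songs as negatives, and uses plain stochastic gradient steps. The toy matrix, hyperparameters and function names are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def log_loss(listens, X, Y):
    """Negative log-likelihood over the observed (non-None) cells."""
    total = 0.0
    for u, row in enumerate(listens):
        for i, label in enumerate(row):
            if label is None:
                continue
            p = sigmoid(dot(X[u], Y[i]))
            total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total

def train(listens, k=2, lr=0.1, reg=0.01, epochs=300, seed=0):
    """Learn one k-dimensional vector per user (X) and per song (Y)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in listens]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in listens[0]]
    for _ in range(epochs):
        for u, row in enumerate(listens):
            for i, label in enumerate(row):
                if label is None:
                    continue  # unobserved cell: contributes nothing to training
                err = label - sigmoid(dot(X[u], Y[i]))
                x_old = X[u][:]
                # Gradient step on the log-likelihood with L2 regularization.
                X[u] = [x + lr * (err * y - reg * x) for x, y in zip(X[u], Y[i])]
                Y[i] = [y + lr * (err * x - reg * y) for x, y in zip(x_old, Y[i])]
    return X, Y

# Hypothetical binary data: rows are users, columns are songs.
# 1 = listened, 0 = did not listen, None = held out (unknown).
listens = [
    [1, None, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

X0, Y0 = train(listens, epochs=0)  # the untrained starting vectors
X, Y = train(listens)
loss_before = log_loss(listens, X0, Y0)
loss_after = log_loss(listens, X, Y)

held_out = sigmoid(dot(X[0], Y[1]))   # song 1: liked by a similar user
unrelated = sigmoid(dot(X[0], Y[2]))  # song 2: user 0 did not listen
print(loss_before, loss_after, held_out, unrelated)
```

The held-out cell for user 0 and song 1 is never seen during training, yet the learned vectors score it above a song that user 0 is known not to have played, because user 1, whose listening otherwise matches, did play it.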

Navigating Key Challenges: ConvNet & NLP

In previous years, Spotify encountered a “cold start problem” (Schrauwen, 2014): when no prior behavioral or user data was available, it could not use its existing CF-trained models. Consequently, Spotify was unable to provide recommendations for brand new artists or for old or unpopular music. To navigate this, Spotify harnessed convolutional neural networks (known as CNNs or ConvNets), the same deep neural network technology used in facial recognition software. In the case of Spotify, the CNN is trained on audio, analyzing raw audio data instead of pixels. The audio frames pass through the convolutional layers of the network and into a “global temporal pooling layer” (Dieleman, 2014), which aggregates the learned features across the course of a single track. By identifying a song’s key characteristics, such as time, tone and tempo, the neural network “understands” the song, thereby allowing Spotify to identify and recommend similar songs and artists to targeted users (those with similar past listening behavior), which improves the accuracy of its recommendations. Additionally, for further accuracy, Spotify uses Natural Language Processing (NLP), analyzing the “playlist itself as a document” (Johnson, 2015) and using each song title, artist or other textual evidence as input to its machine learning recommendation algorithm.
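
The role of the global temporal pooling layer can be illustrated with a heavily simplified sketch. It is not the network described by Dieleman: a real model stacks several layers of learned filters over a spectrogram, whereas here there is a single convolutional layer with two hand-written filters. The choice of pooling statistics (mean, maximum and L2 norm) is an assumption, and all names and input values are hypothetical.

```python
import math

def conv1d_over_time(frames, kernel):
    """Slide one filter along the time axis and apply a ReLU.
    `frames` is a list of time steps, each a list of frequency-bin
    magnitudes; `kernel` spans len(kernel) consecutive frames."""
    width = len(kernel)
    activations = []
    for t in range(len(frames) - width + 1):
        total = sum(
            kernel[dt][f] * frames[t + dt][f]
            for dt in range(width)
            for f in range(len(kernel[dt]))
        )
        activations.append(max(0.0, total))  # ReLU non-linearity
    return activations

def global_temporal_pooling(activations):
    """Collapse the whole time axis into a few summary statistics."""
    mean = sum(activations) / len(activations)
    peak = max(activations)
    l2 = math.sqrt(sum(a * a for a in activations))
    return [mean, peak, l2]

def track_embedding(frames, kernels):
    """Fixed-length description of a track, whatever its duration."""
    features = []
    for kernel in kernels:
        features.extend(global_temporal_pooling(conv1d_over_time(frames, kernel)))
    return features

# Hypothetical two-bin "spectrogram": energy alternates between low and high bins.
short_track = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
long_track = short_track * 3  # same pattern, three times the duration

# Two hand-written filters; a trained network would learn these values.
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],  # responds to "low bin, then high bin"
    [[0.0, 1.0], [1.0, 0.0]],  # responds to "high bin, then low bin"
]

print(conv1d_over_time(short_track, kernels[0]))
print(track_embedding(short_track, kernels))
print(track_embedding(long_track, kernels))
```

Both tracks yield a vector of the same length even though one is three times longer, which is what allows songs of any duration to be compared with one another without any listening history.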

Outliers: This is Not My Jam!

As a by-product of this training, Spotify is smart enough to recognize and distinguish outliers. Alpaydin explains this as another application area of machine learning, termed outlier detection, “where the aim this time is to find instances that do not obey the general rule”; these exceptions are informative in certain contexts (Alpaydin, 2016 p. 72). For example, let’s imagine I recently watched Bohemian Rhapsody, the Queen movie, and happened to listen to a song by the band once, deviating from my usual stream of microgenres such as nu-disco, house and electro-funk. If Spotify, based on that outlier, now kept sending me recommendations to listen to Queen or other 1970s bands, I as a user might not obtain high levels of satisfaction from the service, thereby losing interest in it and feeling that Spotify doesn’t “get” me. In the diagram below (see Figure 8) the user in question has a taste profile that primarily consists of the genres of funk/soul, indie folk and folk. The outlier in this case is a kid’s song, perhaps played for the author’s daughter a few times. The algorithm must be trained to recognize the dominant patterns in a user’s listening data and to exclude such outliers from its recommendations.

Figure 8: Spotify core preference diagram. Source: Pasick, 2015
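
One very simple way to separate a core taste profile from outliers is a share-of-plays threshold, sketched below. This is an illustrative assumption rather than Spotify's actual outlier detection. The play counts, the 5% cut-off and the function name core_taste_profile are hypothetical.

```python
from collections import Counter

def core_taste_profile(play_history, min_share=0.05):
    """Split genres into a core profile and outliers, based on the
    fraction of all plays that each genre accounts for."""
    counts = Counter(play_history)
    total = sum(counts.values())
    core = {
        genre: count / total
        for genre, count in counts.items()
        if count / total >= min_share
    }
    outliers = sorted(genre for genre in counts if genre not in core)
    return core, outliers

# Hypothetical listening history, one entry per play (not real user data).
history = (
    ["funk/soul"] * 40
    + ["indie folk"] * 35
    + ["folk"] * 22
    + ["children's music"] * 3
)

core, outliers = core_taste_profile(history)
print(core)
print(outliers)
```

With these numbers the three main genres form the core profile, while the handful of children's songs falls below the threshold and would be ignored when generating recommendations.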

Ethical Implications 

Similar to the manner in which deep neural networks established paradigms for a “good” selfie by virtually eliminating people of color in a ConvNet training experiment (Karpathy, 2015), the defined parameters for recommendation algorithms can have a larger effect on music. The potential shortfalls of collaborative filtering are most apparent when a machine learning design is trained only to exhibit certain results based on preexisting pattern recognition. Despite the fact that Spotify aims to neutralize this by factoring in other methods of data analysis, machine learning recommendation algorithms can still potentially bury other data or, in the case of Spotify, other music, based on probabilistic inferences and predictions.


In tandem with advances in technology and media affordances, future implications of machine learning include more personalized, immersive user experiences with progressively complex features. With recommendation algorithms choosing what content we watch, what we listen to and even our romantic relationships (du Sautoy, 2018), guiding users towards certain choices and away from others eliminates free choice, so to speak, by ‘pigeonholing’ users. It is important to remember, however, that ultimately these algorithms are trained and designed. Despite the often hyperbolic coverage it receives, the overarching umbrella of AI in and of itself relies heavily on machine learning and ML fairness. Marcus claims that “the logic of deep learning is such that it is likely to work best in highly stable worlds” (Marcus, 2018). However, in today’s world of fluid musical genres, and especially while applying the concepts of pattern recognition, machine learning and collaborative filtering, most of the user-generated data is still subjective: a microcosm of the larger sociotechnical system we live in.

Works Cited

Alpaydin, E. (2016). Machine Learning. Cambridge, Massachusetts. The MIT Press.

Ciocca, S. (2017). How Does Spotify Know You So Well? Medium.

Ek, D. (2019). The Path Ahead: Audio First. Spotify Blog.

HBS (2018). Spotify May Know You Better Than You Realize. Harvard University.

Iyengar, S. (2011). How to Make Choosing Easier. TED, New York.

Johnson, C. (2014). Algorithmic Music Recommendations at Spotify.

Johnson, C. (2014). Logistic Matrix Factorization for Implicit Feedback Data. Spotify.

Johnson, C. (2015). From Idea to Execution: Spotify’s Discover Weekly.

Karpathy, A. (2015). What a Deep Neural Network Thinks About Your #selfie.

Marcus, G. (2018). Deep Learning: A Critical Appraisal. arXiv.org.

McKinsey (2013). How Retailers Can Keep Up With Consumers.

Navisro Analytics (2012). Collaborative Filtering and Recommendation Systems.

Pasick, A. (2015). The Magic That Makes Spotify’s Discover Weekly Playlists So Damn Good. Quartz.

Ramachandran, S., & Flint, J. (2018). At Netflix, Who Wins When It’s Hollywood vs. the Algorithm? The Wall Street Journal.

Schrauwen, B., & Oord, V. D. A. (2014). Deep Content-Based Music Recommendation. Ghent University.

Zheng, Y. (2015). Matrix Factorization In Recommender Systems. DePaul University.