
Music To My Ears: De-Blackboxing Spotify’s Recommendation Engine

Proma Huq
CCTP-607 Big Ideas in Tech: AI to the Cloud
Dr. Martin Irvine
Georgetown University 2019

Abstract

The average individual makes over 70 conscious decisions a day (Iyengar, 2011). To steer consumers through this maze of choices, recommendation system algorithms are perhaps one of the most ubiquitous applications of machine learning for online products and services (McKinsey, 2013). A prime example is Spotify’s recommendation engine, which harnesses deep learning systems and neural networks to map accurate content suggestions for customized playlists, such as its “Discover Weekly” series. To further explore the paradigm of recommendation systems, the research question for this case study is “How Does Spotify’s Recommendation Algorithm Provide Accurate Content?” Key insights include de-blackboxing the algorithm and its use of collaborative filtering and matrix factorization, providing a deeper understanding of how Spotify gathers “taste analysis data” to deliver a positive user experience.

Introduction

On any given day, the average individual makes a range of conscious decisions about their media consumption. As we navigate these omnipresent choices in our increasingly interconnected world, now-ubiquitous recommendation system algorithms steer us through them, like invisible Internet elves enabling our every whim or flight of fancy. Would you like to watch this show on Netflix? Perhaps you’d like to buy this on Amazon? Or the ever-present, “based on your interest in X, how about Y?” These helpful suggestions are part of an overarching “data-driven and analytical” approach to consumer decision-making (Ramachandran, 2018), fueled by recommendation engines. This is especially salient when it comes to media consumption, as these algorithms allow users to find relevant and enjoyable content, thereby increasing user satisfaction and driving engagement for a given product or service (Johnson, 2014). The manner in which a successfully designed recommendation algorithm drives growth in an organization is evident in the case of Spotify.

With over 200 million users worldwide, 96 million of whom are premium subscribers (Spotify, 2019; Fortune, 2019), Spotify is a treasure trove of big data, with a recommendation algorithm that is ripe for de-blackboxing. Considering the current media affordances of streaming music technology, Spotify and its recommendation algorithms are changing the way we discover, listen to and interact with music. Based on this, the primary research question for this paper is “How Does Spotify’s Recommendation Algorithm Provide Accurate Insights?”

Spotify in Numbers

Launched in 2008, Spotify is a Swedish start-up – an audio streaming platform that offers a “freemium” tiered subscription service, earning revenue through advertising and premium subscription fees (Spotify, 2019). The free version, aptly named “Spotify Free”, only allows users to shuffle-play songs, and listening is interspersed with ads. At $9.99/month, Spotify Premium gives users the freedom of choice, ad-free unlimited access and higher quality audio. The company offers over 40 million songs and an estimated 5 million curated and edited playlists (Spotify, 2018).

 

Spotify boasts a wide range of genre-, mood- and even occasion-based playlists. In fact, as I write this paper, I’m listening to “Chill Lofi Study Beats”, a Spotify-curated playlist that has 422,812 followers and is one of several playlists in Spotify’s “Study” genre.

Figure 1: Playlist Screenshot

In July 2015, Spotify launched its Discover Weekly playlist. As its self-evident title suggests, Discover Weekly is an algorithm-generated playlist that is released (or, in colloquial music terms, “dropped”) every Monday, bringing listeners up to two hours of custom, curated music recommendations. Spotify also offers other customized recommendations in Daily Mixes, Release Radar and Recommended playlists or artists. Users have claimed it was “scary” how well Spotify was able to discern their musical tastes and that the platform “knew” or “got” them. According to Spotify, by 2016 Discover Weekly had “reached nearly 5 billion tracks streamed” since its launch, a clear sign of its success as an algorithmic product offering.

Source: Vox Creative

De-blackboxing The Recommendation Algorithm 

The primary aim of a recommendation algorithm is to analyze user data in order to provide personalized recommendations. At Spotify, Discover Weekly and other playlists are created using collaborative filtering, based on the user’s listening history in tandem with songs enjoyed by users who have a similar history. Additionally, Spotify uses “Taste Analysis Data” to establish a Taste Profile. This technology, developed by Echo Nest (Titlow, 2016), groups the music users frequently listen to into clusters rather than genres, as the human categorization of music is largely subjective. Examples of this are evident in Spotify’s Discover Weekly and Daily Mix playlist suggestions. Clustering algorithms like Spotify’s group data points based on their similarities. Alpaydin describes clustering as an “exploratory data analysis technique where we identify groups naturally occurring in the data” (Alpaydin, 2016, p. 115). Services like Spotify can cluster songs, genres and even playlist tones in order to train machine learning algorithms to predict preferences and future listening patterns.
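To make the clustering idea concrete, here is a minimal sketch in Python. The audio features, values and number of clusters are illustrative assumptions on my part – this is not Spotify’s actual pipeline – but it shows how songs described by numeric features could be grouped by similarity alone, with no genre labels involved, and how a new track would be assigned to its nearest cluster.

```python
# Minimal, hypothetical sketch of clustering songs by audio features.
# The features and cluster count are illustrative; Spotify's real
# pipeline (Echo Nest taste profiles) is far more elaborate.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one song: [tempo (scaled), energy, danceability, acousticness]
songs = np.array([
    [0.72, 0.90, 0.85, 0.05],   # electro-funk track
    [0.70, 0.88, 0.80, 0.10],   # nu-disco track
    [0.35, 0.20, 0.30, 0.90],   # acoustic folk track
    [0.30, 0.25, 0.28, 0.85],   # indie folk track
    [0.55, 0.60, 0.95, 0.15],   # house track
])

# Group the songs based purely on feature similarity,
# with no human-assigned genre labels involved.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(songs)
print("cluster assignments:", kmeans.labels_)

# A new, unseen track is assigned to whichever cluster it sits closest to.
new_track = np.array([[0.68, 0.85, 0.82, 0.08]])
print("new track belongs to cluster:", kmeans.predict(new_track)[0])
```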

Figure 2: How Discover Weekly Works. Source: Pasick, 2015.

Machine learning algorithms in recommender systems typically fall under two main categories: content-based and collaborative filtering (Johnson, 2014). Traditionally, Spotify has relied primarily on collaborative filtering for its recommendations. This works well for Spotify, as the approach revolves around determining user preference from historical behavioral data patterns; for example, if two users listen to the same sets of songs or artists, their tastes are likely to align.

Christopher Johnson, former Director of Data Science at Spotify, who worked on the launch of Discover Weekly, outlines the differences between the two in his paper on Spotify’s algorithm. According to Johnson, a content-based strategy relies on analyzing factors directly associated with the user or product, such as a user’s age, sex and demographics, or a song’s genre or era (music from the ’70s or ’80s, say). Recommendation systems based on collaborative filtering instead take consumer behavior data and use it to predict future behavior (Johnson, 2014). This consumer behavior leaves a trail of data, generated through implicit and explicit feedback (Ciocca, 2017). Unlike Netflix, which from its nascence used a five-star rating system (WSJ, 2018), Spotify relied primarily on implicit feedback to train its algorithm. Examples of user data based on implicit feedback include playing a song on repeat or skipping it entirely after the first 10 seconds. User data is also gleaned from explicit feedback (Pasick, 2015), such as the heart button on Discover Weekly or liked songs that automatically save to the library and the “Liked from Radio” playlist. The diagram below (see Figure 3) illustrates several of the approaches collaborative filtering and recommendation algorithms can take. Spotify uses an amalgamation of four approaches: attribute-based, item-item CF, user-similarity CF and model-based.

Figure 3: Recommender Algorithm Approaches. Source: Zheng, 2015.
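As a rough illustration of the collaborative filtering ideas above, the sketch below builds a tiny implicit-feedback matrix of play counts and scores song-to-song similarity with cosine similarity – a simple item-item CF approach. The users, songs and counts are invented, and this is of course not Spotify’s production system.

```python
# Hypothetical sketch of item-item collaborative filtering on implicit feedback.
# Rows are users, columns are songs, values are play counts (implicit signals).
import numpy as np

plays = np.array([
    # song_a  song_b  song_c  song_d
    [10,      8,      0,      0],   # user 1
    [ 7,      9,      1,      0],   # user 2
    [ 0,      0,      6,      9],   # user 3
    [ 0,      1,      8,      7],   # user 4
])

def cosine_similarity(a, b):
    """Cosine of the angle between two song (column) vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

n_songs = plays.shape[1]
sim = np.zeros((n_songs, n_songs))
for i in range(n_songs):
    for j in range(n_songs):
        sim[i, j] = cosine_similarity(plays[:, i], plays[:, j])

# Recommend for user 1: score unheard songs by their similarity to songs
# the user already plays, weighted by how much they were played.
user = plays[0]
scores = sim @ user
scores[user > 0] = -np.inf          # don't re-recommend what they already play
print("recommended song index for user 1:", int(np.argmax(scores)))
```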

Spotify further analyzes and applies user data using a matrix decomposition method, also known as matrix factorization. Matrix factorization aims to find answers by “decomposing” (hence the term matrix decomposition) the data into two separate segments (Alpaydin, 2016). The first segment defines the user in terms of marked factors, each of which is weighted differently. The second segment maps between factors and products – which in the Spotify universe are songs, artists, albums and genres – thus defining each factor in terms of the products offered. In his book, Alpaydin elaborates on this with an example applied to movies. In the following diagram (see Figure 4), each customer has watched only a small percentage of the movies, and each movie has been watched by only a small percentage of the customers. Based on these assumptions, the learning algorithm needs to be able to generalize and predict successfully.

Figure 4: Matrix decomposition for movie recommendations.
Source: Alpaydin, 2016.

Each row of the data matrix X contains one customer’s data on the movies (or, for the scope of this case study, music). Most of this data, however, is missing, as the customer has not yet watched many of the movies – which is where the recommendation system comes in. The matrix is then factored into two components, F and G, where F maps customers to factors and G maps factors to movies (or music). Spotify uses a matrix factorization application called Logistic Matrix Factorization, or Logistic MF, to generate lists of related artists – for example for artist ‘Radio’ playlists – based on binary preference data (Johnson, 2014). This matrix is established by calculating millions of recommendations based on the behavior and preferences of millions of other users, an example of which can be seen below in Figure 6.

 

Each row of this sample matrix represents one of Spotify’s 200 million users, and each column represents one of the 40 million songs in their database.

 

Figure 6: Spotify Matrix Snapshot. Source: Ciocca, 2017.

The data is then run through a matrix factorization formula, resulting in two sets of vectors, identified in this diagram (see Figure 7) as X and Y. In Spotify’s terms, X represents the user and their preferences, while Y embodies the song, with each row representing a single song profile (Ciocca, 2017).

Figure 7: User/Song Matrix. Source: Johnson, 2015.
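To make the factorization step concrete, here is a simplified sketch that learns low-dimensional user vectors (X) and song vectors (Y) from the same kind of play matrix using plain squared-error gradient descent – a stand-in for, not a reproduction of, Spotify’s Logistic MF. A predicted affinity score is then just the dot product of a user vector and a song vector. The data, number of factors and learning rate are illustrative assumptions.

```python
# Simplified matrix-factorization sketch: learn user vectors (X) and song
# vectors (Y) so that X @ Y.T approximates the observed play matrix.
# This uses plain squared-error gradient descent, not Spotify's Logistic MF.
import numpy as np

rng = np.random.default_rng(0)
plays = np.array([
    [10, 8, 0, 0],
    [ 7, 9, 1, 0],
    [ 0, 0, 6, 9],
    [ 0, 1, 8, 7],
], dtype=float)

n_users, n_songs = plays.shape
n_factors = 2                                            # latent "taste" dimensions
X = rng.normal(scale=0.1, size=(n_users, n_factors))    # user vectors
Y = rng.normal(scale=0.1, size=(n_songs, n_factors))    # song vectors

lr, reg = 0.01, 0.1
for _ in range(2000):
    error = plays - X @ Y.T                  # reconstruction error
    X += lr * (error @ Y - reg * X)          # gradient step for users
    Y += lr * (error.T @ X - reg * Y)        # gradient step for songs

# Predicted affinity of user 0 for every song: dot products of the vectors.
print("predicted scores for user 0:", np.round(X[0] @ Y.T, 2))
```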

Navigating Key Challenges: ConvNet & NLP

In previous years, Spotify encountered a “cold start” problem (Schrauwen, 2014): when no prior behavioral or user data was available, its existing collaborative-filtering models could not be applied. Consequently, faced with a cold start, Spotify was unable to provide recommendations for brand-new artists or for older, unpopular music. To navigate this, Spotify harnessed convolutional neural networks – known as CNNs or ConvNets – the same deep neural network technology used in facial recognition software. In Spotify’s case, the CNN is trained on audio, conducting raw audio data analysis instead of examining pixels. The audio frames pass through the convolutional layers of the neural network architecture and into a “global temporal pooling layer” (Dieleman, 2014), which computes the learned features across the entire course of a single track. By identifying a song’s key characteristics, such as time signature, tone and tempo, the neural network “understands” the song, allowing Spotify to identify and recommend similar songs and artists to targeted users – those who display the same past behavioral data – thus improving accuracy. Additionally, for further accuracy, Spotify uses NLP, or Natural Language Processing, analyzing the “playlist itself as a document” (Johnson, 2015), using each song title, artist name and other textual evidence as part of its machine learning recommendation algorithm.
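The convolutional idea can be sketched in a few lines of Python. The toy model below is my own hedged approximation of the concept (not Spotify’s or Dieleman’s actual network): 1-D convolutions slide along the time axis of a mel-spectrogram, and a global temporal pooling step then summarizes the learned features across the whole track, so that songs of any length map to a fixed-size embedding that can be compared for similarity.

```python
# Toy sketch of a convolutional audio model with global temporal pooling,
# loosely following the idea described above (not the actual Spotify model).
import torch
import torch.nn as nn

class AudioConvNet(nn.Module):
    def __init__(self, n_mels: int = 128, n_filters: int = 64, embed_dim: int = 40):
        super().__init__()
        # Convolutions slide along the time axis of the spectrogram.
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, n_filters, kernel_size=4), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(n_filters, n_filters, kernel_size=4), nn.ReLU(),
        )
        # Mean and max pooling over time are concatenated, then projected.
        self.fc = nn.Linear(n_filters * 2, embed_dim)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, n_mels, time_frames), any number of frames
        h = self.conv(spectrogram)
        pooled = torch.cat([h.mean(dim=2), h.max(dim=2).values], dim=1)
        return self.fc(pooled)               # fixed-size track embedding

# A 30-second clip and a 3-minute track both yield same-size embeddings,
# which can then be compared (e.g., cosine similarity) to find similar songs.
model = AudioConvNet()
short_clip = torch.randn(1, 128, 1200)
long_track = torch.randn(1, 128, 7000)
print(model(short_clip).shape, model(long_track).shape)
```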

Outliers: This is Not My Jam!

As a by-product of this training, Spotify is smart enough to recognize and distinguish outliers. Alpaydin explains this as “another application area of machine learning, termed outlier detection, where the aim this time is to find instances that do not obey the general rule—those are the exceptions that are informative in certain contexts” (Alpaydin, 2016, p. 72). For example, let’s imagine I recently watched Bohemian Rhapsody, the Queen movie, and happened to listen to a song by the band once, deviating from my usual stream of microgenres such as nu-disco, house and electro-funk. If Spotify, based on that outlier, now kept sending me recommendations to listen to Queen or other ’70s bands, I as a user might not obtain high levels of satisfaction from the service, thereby losing interest in it and feeling Spotify doesn’t “get” me. In the diagram below (see Figure 8), the user in question has a taste profile that primarily consists of funk/soul, indie folk and folk. The outlier in this case is a kids’ song, perhaps played for the author’s daughter a few times. The algorithm must be trained to follow the trail of pattern recognition while discounting such outliers in its recommendations.

Figure 8: Spotify core preference diagram. Source: Pasick, 2015
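A hedged sketch of how such an outlier might be flagged: represent each recent play as a point in a taste-feature space and mark plays that sit unusually far from the centroid of the listener’s usual listening. The features, values and threshold below are illustrative assumptions, not Spotify’s actual rule.

```python
# Hypothetical sketch of outlier detection in a listener's taste profile:
# flag plays that sit far from the centroid of the user's usual listening.
import numpy as np

# Recent plays as (made-up) audio-feature vectors:
# [energy, danceability, acousticness]
recent_plays = np.array([
    [0.90, 0.85, 0.05],   # nu-disco
    [0.88, 0.80, 0.10],   # house
    [0.85, 0.90, 0.08],   # electro-funk
    [0.87, 0.83, 0.07],   # nu-disco
    [0.20, 0.30, 0.95],   # kids' song played for a daughter (the outlier)
])

centroid = recent_plays.mean(axis=0)
distances = np.linalg.norm(recent_plays - centroid, axis=1)

# Flag anything much farther from the centroid than is typical.
threshold = distances.mean() + 1.5 * distances.std()
outliers = np.where(distances > threshold)[0]
print("outlier play indices:", outliers)
```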

Ethical Implications 

Similar to the manner in which deep neural networks established paradigms for a “good” selfie by virtually eliminating people of color in a ConvNet training experiment (Karpathy, 2015), the defined parameters of recommendation algorithms can have a larger effect on music. The potential shortfalls of collaborative filtering become apparent when machine learning design is trained only to exhibit certain results based on preexisting pattern recognition. Even though Spotify aims to neutralize this by factoring in other methods of data analysis, machine learning recommendation algorithms can still bury other data – or, in Spotify’s case, other music – based on probabilistic inferences and predictions.

Conclusion

In tandem with advances in technology and media affordances, future implications of machine learning include more personalized, immersive user experiences with progressively complex features. With recommendation algorithms choosing what content we watch, what we listen to and even our romantic relationships (du Sautoy, 2018), guiding users towards certain choices and away from others erodes free choice, so to speak, ‘pigeonholing’ users. It is important to remember, however, that ultimately these algorithms are trained and designed by people. Despite the often hyperbolic coverage it receives, the overarching umbrella of AI in and of itself relies heavily on machine learning and on ML fairness. Marcus claims that “the logic of deep learning is such that it is likely to work best in highly stable worlds” (Marcus, 2018). In today’s world of fluid musical genres, however, and especially when applying the concepts of pattern recognition, machine learning and collaborative filtering, most user-generated data is still subjective – a microcosm of the larger sociotechnical system we live in.

Works Cited

Alpaydin, E. (2016). Machine Learning. Cambridge, Massachusetts. The MIT Press.

Ciocca, S. (2017). How Does Spotify Know You So Well? Medium. https://medium.com/s/story/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe

Ek, D. (2019). The Path Ahead: Audio First. Spotify Blog. https://newsroom.spotify.com/2019-02-06/audio-first/

HBS (2018). Spotify May Know You Better Than You Realize. Harvard University. Retrieved from https://digit.hbs.org/submission/spotify-may-know-you-better-than-you-realize/

Johnson, C. (2014). Algorithmic Music Recommendations at Spotify

Johnson, C. (2014) Logistic Matrix Factorization for Implicit Feedback Data. Spotify https://web.stanford.edu/~rezab/nips2014workshop/submits/logmat.pdf

Johnson, C. (2015). From Idea to Execution: Spotify’s Discover Weekly. Retrieved from https://www.slideshare.net/MrChrisJohnson/from-idea-to-execution-spotifys-discover-weekly

Karpathy, A. (2015) “What a Deep Neural Network Thinks About Your #selfie,”  http://karpathy.github.io/2015/10/25/selfie/

Iyengar, S. (2011). How to Make Choosing Easier. TED, New York. https://www.ted.com/talks/sheena_iyengar_choosing_what_to_choose

Marcus, G. (2018).”Deep Learning: A Critical Appraisal” ArXiv.Org.

McKinsey (2013). How Retailers Can Keep Up With Consumers.
https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers

Navisro Analytics (2012). Collaborative Filtering and Recommendation Systems https://www.slideshare.net/navisro/recommender-system-navisroanalytics

Pasick, A. (2015). The Magic That Makes Spotify’s Discover Weekly Playlists So Damn Good. Quartz. https://qz.com/571007/the-magic-that-makes-spotifys-discover-weekly-playlists-so-damn-good/

Ramachandran, S., & Flint, J. (2018). At Netflix, Who Wins When It’s Hollywood vs. the Algorithm? The Wall Street Journal.

Van den Oord, A., Dieleman, S., & Schrauwen, B. (2014). Deep Content-Based Music Recommendation. Ghent University. https://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf

Zheng, Y. (2015). Matrix Factorization In Recommender Systems. DePaul University.

 

Deep Learning and Deep Insights

A lot of the hyperbolic coverage on the implications of AI, in tandem with the overarching umbrella under which products and services are labeled as “AI”, is rooted in misinformation. De-blackboxing the myths and paving the way for a clearer personal understanding of artificial intelligence and similar concepts has been my primary goal as our class navigated through the field. Fittingly, the European Union’s guidelines for developing ethical applications of AI provide a succinct summation of the range of concepts covered over the course of our class. Some of the key points were as follows:

  • Human Agency & Oversight (the very first concept we tackled in this class, that human autonomy allows for design and intervention)
  • Privacy and Governance
  • Transparency
  • Diversity, Non-discrimination and fairness (linking back to the work we did on ML fairness)

This journey of insights led me to narrow down my fields of interest – NLP, ML and deep learning systems – and to determine my final project on Spotify and the manner in which its algorithms are changing the way we interact with music. Spotify uses “taste analysis data”, a technology developed by Echo Nest (Titlow, 2016), which groups the music users frequently listen to into clusters (not genres, as human categorization of music is largely subjective). Examples of this are Spotify’s Discover Weekly and Daily Mix playlists, as well as the end-of-year “Wrapped” playlists, where Spotify provides each user with insights about their music habits. Essentially, Discover Weekly is Spotify’s unique version of the recommendation engine – similar to the way Amazon recommends new books (and just about everything else under the sun) online and, recently, in its brick-and-mortar Amazon Bookstores: “if you like this, try this!”


According to Marcus, “in speech recognition, for example, a neural network learns a mapping between a set of speech sounds and a set of labels (such as words or phonemes)” (Marcus, 2018). For the purpose of my project, I aim to determine how deep learning systems and neural networks learn to map songs for the “Discover Weekly” playlist – for example, to determine which set of categories a certain song belongs to. Marcus also claims that “the logic of deep learning is such that it is likely to work best in highly stable worlds”, which is problematic both for the scope of my project (especially in today’s world of fluid musical genres) and for the larger sociotechnical system we live in.


Air, Water and Facebook

Last week’s gnoviCon had some fascinating insights on the impact of Big Data and its potential to overturn established concepts of trust, privacy and security. The theme for this year was Big Tech, Data & Democracy and the keynote speaker, Siva Vaidhyanathan, touched on all three of those points in his talk on Facebook and its role in a democratic society.

A forerunner in accumulating and storing Big Data, Facebook is unparalleled in advertising, even more so as a political tool. The 2016 presidential elections marked the first time in modern history that a political party’s campaign strategy involved investing heavily not in television ads, but in social media. With Facebook having 2.3 billion users, Vaidhyanathan challenged the audience to think of a company with a similar reach – “I don’t know anything that’s touched that many people, except maybe air and water. It goes air, water, Facebook!” he mused.

Recent events such as the Facebook–Cambridge Analytica scandal, coupled with the affordances of targeted advertising, raise the question of whether Facebook and other social media sites are helping or hindering democracy. In this era of “fake news”, even unpopular opinions can be amplified on Facebook by the algorithm: if your “friends” interact with such a post, it gets more traction. Panelist Ethan Porter argued, “It’s incumbent upon social media companies to invest in combating misinformation. The good news however, is that people can become more politically informed by using social media”. This is an example of by-product learning (Prior, 2013): the act of learning political information from an unintended source. In recent times, most people learn political facts as a by-product of non-political routines such as scrolling through Facebook, circling back to the point that Big Tech companies are responsible for the security and visibility of sensitive data.

Another point raised was that, with the advent of big tech and the pernicious side effects of ubiquitous computing, we are so overwhelmed by “a constant barrage of stimulation – a mini Times Square in our pockets – that it habituates us to the immediacy of its call,” said Vaidhyanathan. Our investment in social media has resulted in disinvestment in social institutions such as science and health technology that help us collectively work to solve problems, adding to the vast galaxy of big data along the way.

References

Prior, M. (2014). Post-Broadcast Democracy. Chapter 1, “Introduction,” and Chapter 3, “Broadcast Television, Political Knowledge, and Turnout.”

The Consumer & The Cloud

Most of the ubiquitous computing we engage with in our everyday lives works in tandem with “the Cloud” – a seemingly abstract technology that perplexes some people. What exactly is the cloud? Despite its illusory name, it is not a fluffy contraption in the sky beaming up all our data and then raining it back down on us when we need it. In de-blackboxed terms, cloud computing is simply storing and accessing data over the Internet instead of on your specific device – your phone, laptop or smart TV.

The NIST definition of cloud computing consists of five characteristics, which help to further de-blackbox the technology (Ruparelia, 2016). These characteristics are:

Cloud Computing Characteristics

  • Ubiquitous Access
  • On-Demand Availability based on consumer self-service
  • Pooling of resources
  • Rapid Elasticity
  • Measured Service Usage

An example of cloud computing that consumers may be familiar with is Apple’s iCloud, which is essentially a storage service. As of 2016, the service had 782 million users – an astronomical amount of data (Apple, 2019). Each iCloud account gets 5GB of storage for free, for email, documents, photos and backup data. For more storage – say, if you have 10,000 photos on your iPhone – you pay a small monthly fee. Another popular example is Google Drive, where one can access documents and media files remotely, free of the chains of traditional hardware-based storage.

A cloud computing technology I was not familiar with, however, is Amazon Web Services. AWS is primarily a B2B service, operating beyond the view of the end user yet integral to the functionality of many of the services we use. AWS provides web-hosting services for a plethora of Fortune 500 companies. Despite the competitive rivalry between Netflix and Amazon, I was surprised to learn that Netflix is in fact hosted on Amazon Web Services! The vast reach of AWS is further explored in a short clip from Patriot Act, Hasan Minhaj’s weekly stand-up show, which happens to be on Netflix.

The convergence of these technologies paves the way for a litany of potential social, cultural, ethical and, of course, technological implications. First, the benefits: cloud computing has brought several positive factors into our lives. On a micro level, employees can work remotely and students can collaborate on the same document from multiple locations, increasing efficiency. Large companies can benefit from economies of scale, managing consumer accounts and media services on one platform. A potential privacy concern also emerges from this: who truly owns all this data? In an age of frequent privacy violations such as data breaches, this raises large societal questions about data and privacy. Essentially, consumers choosing to use this technology may all be at the mercy of the oligopoly of the “big four”!

Practice Makes Perfect? AI Bias and ML Fairness

While the words “Artificial Intelligence” may conjure up alarmist imagery of a dystopian future (as seen in Hollywood movies like Blade Runner, or shows like Westworld), perhaps the real concerns are two-pronged: 1) AI bias and machine learning fairness, and 2) the capability of technology to mislead the public. With the prevalence of surveillance technology applications like Amazon Rekognition, it is now easier than ever for law enforcement and businesses to track and identify individuals. If Alexa and the Echo are Amazon’s ears of surveillance, Rekognition is now the eyes – but can we always trust what they are seeing?

Studies have shown that AI is less able to discern and identify people of color, especially women, marginalizing them and potentially putting them in harm’s way as a result of misidentification. The ACLU’s perspective on Rekognition is that “the rights of immigrants, communities of color, protesters and others will be put at risk if Amazon provides this powerful surveillance system to government agencies”. This technology can be used to target other minority communities as well, due to existing societal or police bias. Human bias can also find its way into the deep learning process, as much of ML fairness depends on the paradigms of training – which is done by humans, not, as many believe, conjured by magic.

With the advent of new media technology, deepfakes are also a rising ethical issue that may have a political impact as well. An early example of this is the viral video “Golden Eagle Snatches Kid” – a humorous, harmless fake. However, the stakes escalate when such videos depict people of political significance espousing polarizing views. A lot of the “fake news” that floats around on Facebook, Twitter and other social media platforms has now evolved from Photoshop to video, making it more believable, as the viewer has seen it with their own eyes. This can pave the way for ethical and political implications for elections, with consequences for entire nations that can snowball into a global impact.

So how do we work towards preventing these ethical violations? Practice makes perfect, and machine learning fairness will only develop further with the faces the algorithms practice on. The more they practice, the better they will learn to recognize – which opens up another Pandora’s box: what are the ethical implications of where they get the data?!

References:

https://www.wsj.com/articles/deepfake-videos-are-ruining-lives-is-democracy-next-1539595787

https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921 

https://www.perpetuallineup.org/findings/racial-bias

“Hey Siri…how can you hear me?”

In the context of this class, I may have spoken about the studio apartment I share with my three roommates – Siri, Alexa and Google. Fascinated by the capabilities and affordances of each (essentially they all fall under the category of “voice-activated digital assistants”, but each does something slightly different), I came to own all three. The product stack of these assistants can also operate as home automation hubs, with the capacity to link everything from your lights to your doorbell and alarm system, bringing to mind the imminent dystopian future depicted in Geico’s “Future Son” commercial.

Of all the devices, I use the iPhone and HomePod the most for everyday AI interactions; both products run the software agent (or chatbot) Siri. The concepts we have learned so far are a toolbox that allows us to de-blackbox the technology and its unobservable layers. Firstly, let’s start with what is visible (other than the product itself): the UI or application, the top layer of the stack, is the only part that humans can see. Behind this lie several layers that work via speech recognition and NLP – data processes which boomerang back an answer to your request or question, starting with the wake phrase “Hey Siri!” So how does the analog-to-digital, then digital-to-analog (in terms of alarms, lights, etc.) conversion work? According to Apple’s Machine Learning Journal, the “Hey Siri” detector uses a Deep Neural Network (DNN) to convert the acoustic pattern of your voice – the analog signal, once digitized – into a score indicating whether the wake phrase was spoken.
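As a very rough sketch of the triggering idea – with the caveat that the window length, threshold and scoring function below are all my own illustrative stand-ins, not Apple’s – each short audio frame gets a score for how “Hey Siri”-like it sounds, the scores are smoothed over a sliding window, and the device wakes only when the smoothed score crosses a threshold.

```python
# Illustrative sketch of a wake-word trigger: per-frame "Hey Siri" scores
# are smoothed over a sliding window and compared against a threshold.
# The scores here are made up; in the real system a DNN produces them.
import numpy as np

def frame_scores_from_dnn(n_frames: int) -> np.ndarray:
    """Stand-in for the DNN: returns a fake probability per 10 ms frame."""
    rng = np.random.default_rng(1)
    scores = rng.uniform(0.0, 0.2, size=n_frames)    # background noise
    scores[60:80] = rng.uniform(0.8, 1.0, size=20)   # someone says the phrase
    return scores

def wake_word_detected(scores: np.ndarray, window: int = 15,
                       threshold: float = 0.7) -> bool:
    """Trigger if the average score over any sliding window exceeds threshold."""
    for start in range(len(scores) - window + 1):
        if scores[start:start + window].mean() > threshold:
            return True
    return False

scores = frame_scores_from_dnn(200)
print("wake word detected:", wake_word_detected(scores))
```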

The voice capabilities of Siri on the iPhone were mostly “de-blackboxed” for me, especially after last week’s unit. However, I was curious as to how Siri on my HomePod overcomes the myriad challenges it faces from itself (loud music) and the surrounding environment – noise, television chatter, conversations, etc. How can Siri hear me when I am yelling at it to turn off my alarm from the bathroom (he lives in the living room), while it’s playing my podcast? Apple describes this as a “far-field setting”, which works by integrating various multichannel signal processing technologies that suppress or filter noise. Here is a helpful diagram of the process:

The fact that my HomePod is, for the most part, accurately able to decode my requests in different conditions is thanks to the above process. It was helpful to learn and understand the behind-the-scenes “magic” instead of just assuming it works! As the Machine Learning Journal article said, “next time you say ‘Hey Siri’ you may think of all that goes on to make responding to that phrase happen, but we hope that it ‘just works!’”

References

Hoy, Matthew B. (2018). “Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants”. Medical Reference Services Quarterly. 37 (1): 81–88.

Siri Team. (2017). Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant. Apple Machine Learning Journal, vol. 1, no. 6.

Information Gatherers: The Hunt to Communicate

“Man the food-gatherer reappears incongruously as information gatherer” (McLuhan, 1967) (Gleick, pg 7).

The quote above struck me as especially poignant in today’s world of ubiquitous computing. Information is everywhere, enabling communication that bridges perceptions, distances and even foreign languages.

A relevant example that, in my opinion, may act as a microcosm of some of the theories we explored in the readings is Google Translate, an app that allows you to translate one language into another in real time. This is both reflective of and dependent on Shannon’s theory, as it requires the following:

Info Source (language input) > Transmitter > SIGNAL > Receiver > Destination (new language)

It is not, as it may seem, “a room full of bilingual elves” working behind the scenes to convert one language into meaning for the receiver, but rather a microcosm of the manner in which Shannon’s theory works. Much like the diagram Kevin drew for us last week to explain how FaceID works on an Apple iPhone, a seemingly obscured process goes on as the message is transmitted from point A to point B – in this case, from a native speaker of English to a native speaker of French, in the receiver’s own language, so that the two can communicate. Here the input (the source language) is decoded and re-encoded to reflect the chosen, pre-programmed target language. I found this video explaining how Google Translate works quite illuminating:

Therefore, my understanding is that whether the information transmitted is received successfully depends largely on the receiver, circling back to the concept of entropy in conjunction with the freedom of choice one has in constructing a message.
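To make the entropy point concrete, here is a short sketch of Shannon’s formula, H = −Σ p(x) log₂ p(x): the more evenly a source spreads its choices (the more “freedom of choice” the sender has), the higher the entropy. The word distributions below are invented purely for illustration.

```python
# Shannon entropy of a message source: H = -sum(p * log2(p)).
# More evenly spread choices => more freedom of choice => higher entropy.
import math

def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A source that almost always sends the same word is very predictable...
skewed = [0.97, 0.01, 0.01, 0.01]
# ...while a source choosing freely among four words is not.
uniform = [0.25, 0.25, 0.25, 0.25]

print(f"skewed source:  {entropy(skewed):.2f} bits per symbol")
print(f"uniform source: {entropy(uniform):.2f} bits per symbol")   # 2.00 bits
```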

Works Cited

• Irvine, Martin. “Introduction to the Technical Theory of Information” Feb. 4, 2019.

• James Gleick, The Information: A History, a Theory, a Flood (New York, NY: Pantheon, 2011).

• Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication (Champaign, IL: University of Illinois, 1949).

Categories and Complexities: Machine Learning parameters

Proma Huq & Deborah Oliveros

What is a “good” selfie? What’s a “bad” one? Under the lens of machine learning, the parameters for determining a good selfie from a bad one are debatable. As with any labeling of data sets, Karpathy is aware of how skewed that categorization might be. What was interesting to us was how similar the two sets of data were to each other. When the author asks, “if I gave you any of these images, could you tell what category it belongs to?”, we would not be able to do so in most cases (based on his pre-existing concepts of a good or bad selfie). While this may be subjective for the naked human eye, or tainted by biases, it is easier for the machine to differentiate them by building a graphical model, connecting all the given information to determine a category based on pre-existing parameters.

The following are some observations we had based on Karpathy’s article, in the context of the concepts from Alpaydin’s chapters:

  • Depending on what kind of data we provide and how we categorize that data, the machine’s responses and categories may be positive or negative in a societal context. To reiterate and simplify machine learning: we tell the machine what is “good” and what is “bad” and then the machine learns to differentiate based upon those parameters and subsequent learned patterns.

Case study example: Snapchat & Face Recognition

As we learned from Alpaydin, in the case of Snapchat’s face recognition for augmented reality filters (Instagram does it too, but Snapchat did it first!), the “input” is the image captured in the selfie, and the “category” or class to be recognized is the set of faces in the image, so that the filters can be applied to change them. The learning program therefore needs to learn to match the face image to the filter points in order to transform it into whatever augmented reality filter the user chooses that day.

There are several challenges with this seemingly simple application: faces are 3D, and people often record short videos with the filter, not just a photo. Other challenging factors include accessories (hats, sunglasses), hairstyles, facial hair, smiling or frowning, and differences in angle and/or lighting. So how does the machine navigate all of this?

Facial recognition, in tandem with an “active shape model”, learns to map your face and apply the filter based on the “landmarks” of its terrain.
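As a hedged illustration of the landmark idea (the coordinates are invented, and real pipelines like Snapchat’s are far more sophisticated), once a model has located a few facial landmarks, placing a filter is mostly geometry – for example, centering a pair of virtual sunglasses between the detected eyes and rotating them to match the tilt of the head.

```python
# Toy sketch: place a "sunglasses" overlay using two detected eye landmarks.
# The landmark coordinates are invented; a real active shape model would
# supply dozens of points for the whole face.
import math

left_eye = (120.0, 200.0)    # (x, y) pixel coordinates from the face model
right_eye = (180.0, 210.0)

# Center the overlay between the eyes.
center_x = (left_eye[0] + right_eye[0]) / 2
center_y = (left_eye[1] + right_eye[1]) / 2

# Rotate it to match the tilt of the head and scale it to the eye distance.
dx = right_eye[0] - left_eye[0]
dy = right_eye[1] - left_eye[1]
angle_degrees = math.degrees(math.atan2(dy, dx))
width = 2.2 * math.hypot(dx, dy)   # sunglasses a bit wider than the eye span

print(f"place overlay at ({center_x:.0f}, {center_y:.0f}), "
      f"rotated {angle_degrees:.1f} degrees, width {width:.0f} px")
```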

ML and Ethical Issues

When it comes to examples of facial recognition and categorization, the concept of machine learning fairness comes into play. Some examples that come to mind are facial recognition systems mis-gendering African Americans, flagging Asian people in pictures as blinking, or, most recently, police in the UK using facial recognition on one of the busiest streets in London over the holidays to detect suspicious behavior and facial expressions – despite the fact that data drawn from criminal records has a long-documented history of discrimination against and targeting of minorities.

• Interesting fact: the top 100 selfies were all women, but none of them was noticeably a woman of color, which tells us (based on Karpathy’s grading system of views and likes) that his first assessment of how to take a good selfie, “be female,” should possibly be amended to “be a white female.” I doubt there is data showing that white women take more selfies or that their selfies are liked more than those of women of color; however, based on a very flawed categorization, that is what the machine picked up on. It’s almost funny when he says, “I was quite concerned for a moment there that my fancy 140-million ConvNet would turn out to be a simple amount-of-skin-texture-counter”, because he was worried it would rank the best selfies as the ones showing more skin. However, he fails to address that it resulted in an “amount-of-skin-shade-counter” that considers white the best. Also, how about the cropping suggestion that leaves the girl out of the picture with the cars, and then the clearly non-white guy out of the picture with Marilyn Monroe? This illustrates my previous point.

By design, these systems are theoretically meant to be objective. That is because the convolutions, or filters, react to ‘stimulations’ so minute that they do not necessarily rely on environmental context to determine a category. Conversely, when it all comes together, that objectivity is lost through human interaction – the categorization of that stimulus into labels. That is where ethical questioning has a crucial role: we need categorization, but we also need to be aware of how representative of reality that categorization is, and what the possible outcomes and impacts on decision-making are when machine learning acts on it.

Works Cited

Andrej Karpathy, “What a Deep Neural Network Thinks About Your #selfie,” Andrej Karpathy Blog (blog), October 25, 2015, http://karpathy.github.io/2015/10/25/selfie/.

Ethem Alpaydin, Machine Learning: The New AI. Cambridge, MA: The MIT Press, 2016.

Aubrey, A. “How Do Snapchat Filters Work?” Dev.To. 2018.

AI’s PR crisis: Reframing the Narrative

This week’s readings provided a conceptual parallel to last week’s assigned Guardian article by Naughton, delving deeper into the pernicious effects of the manner in which the media presents AI to the greater, more credulous public. Responsible framing and media narratives play a huge role in de-blackboxing these AI systems and shedding light on the behind-the-scenes action, if you will, of machine learning. In accordance with Gerbner’s cultivation theory (Gerbner, 1976), entertainment media also has an effect on this, as audiences often cement their sense of reality based on the content they consume, without prior knowledge or insight into the way these systems work.

The often sensationalized depictions further perpetuate the concept of “sociotechnical blindness” (Johnson & Verdichio, 2016), in which most people are unaware of the key role played by humans in the design and machine learning of AI systems. Johnson & Verdichio suggest circumventing the “semantic gap” that has been created by referring to AI entities as autonomous, or by suggesting that they, by themselves, are “intelligent”. The example of the “autonomous” Roomba was especially salient here, as the article sheds light on its workings by elaborating on the following simple format:

Environmental cues + Software (or internal programming)
= Movement across the room

While the Roomba may be seen as “unpredictable”, the limits of its behavior are known and visible to the naked eye, and therefore easier to de-blackbox. Johnson & Verdichio explain this by saying, “we know the Roomba will not climb up the walls or fly because we can see that it doesn’t have the mechanical parts necessary for such behavior”. The point is that AI has something of a PR crisis that needs to be reframed – it is often conceptualized as “the big unknown”, whereas further illuminating the connection between AI and society in mass media would aid greater understanding of the narrative.

Examples of machine learning and inductive bias can also be applied to concepts explored in the other readings. According to Alpaydin, “the aim of machine learning is rarely to replicate the training data but the correct prediction of new cases” (Alpaydin, p. 39). However, correlation does not always prove to be causation, so I am interested in delving deeper into instances where inductive bias works against machine learning, as in the Fibonacci sequence example.
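A small sketch of that distinction, with invented data: a very flexible model can replicate noisy training points almost perfectly yet typically does worse on new cases – especially ones just outside the training range – than a simpler model that matches the underlying structure, which is Alpaydin’s point about generalization and one way an inductive bias can help or hurt.

```python
# Replicating the training data vs. predicting new cases (invented data).
# A flexible model can fit noisy training points almost perfectly yet do
# much worse on cases it has not seen, especially just outside the data.
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=10)   # noisy linear trend

x_new = np.linspace(0.0, 1.2, 60)    # new cases, some beyond the training range
y_new_true = 2 * x_new

simple = np.polyfit(x_train, y_train, deg=1)     # matches the true structure
flexible = np.polyfit(x_train, y_train, deg=6)   # flexible enough to chase noise

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

print("training error -> simple:", round(mse(simple, x_train, y_train), 4),
      "| flexible:", round(mse(flexible, x_train, y_train), 4))
print("new-case error -> simple:", round(mse(simple, x_new, y_new_true), 4),
      "| flexible:", round(mse(flexible, x_new, y_new_true), 4))
```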

AI and Convenience

I live in a studio apartment, but in my small space I have three very helpful roommates: Siri, Alexa and Google. Each morning, Siri wakes me up with an alarm (and then two more, if I’m being honest) and plays my music on the HomePod. Alexa runs me through the weather and the day’s news. Google tells me about my commute – he’s the most reticent of the group, overshadowed by his showier friends, but still very helpful when it comes to my passion for cooking. According to Alpaydin, the term for my usage of all these devices is ubiquitous computing – “using a lot of computers for all sorts of purposes all the time without explicitly calling them computers” (Alpaydin, 2016). They each serve a different function, despite often overlapping, and all ultimately add convenience to our modern, interconnected society.

My tech-reliant morning routine is a microcosm of Alpaydin’s hypothesis that we create space in our lives for the convenience of technology driven by artificial intelligence, simply because “..we want to have products and services specialized for us. We want our needs to be understood and our interests to be predicted” (Alpaydin, 2016). Am I aware that all of this data is being stored, and that there is not one but three devices in my home listening to my every word, lying in wait for the wake word (“Hey Siri”, “Alexa…”, “Hey Google”)? Yes. But despite concerns that my privacy may be violated, or that I am too dependent on these technologies, it is now shockingly easy for these big corporations to be let into our homes to collect data, because we as a society now prioritize convenience above all.

These ethical issues concerning privacy and surveillance, in tandem with the growth of AI and data mining practices, are cropping up at a time when machine learning is already having “a measurable impact on most of us” (Naughton, 2019). At present, we already see the advent of “programs that learn to recognize people from their faces… with promises to do more in the future” (Alpaydin, 2016). Alpaydin further elaborates on this, differentiating between writing programs and collecting data. An example of a potential machine learning algorithm in action is evident in the recent “Ten Year Challenge” that is rampant on social media, primarily on Facebook. The challenge is a seemingly harmless way to do a before-and-after, a #TransformationTuesday in viral meme form. However, the data that this trend leaves in its wake can serve as an example of machine learning within the bounds of a specific data set – in this case, 10 years. “Supporters of facial recognition technologies said they can be indispensable for catching criminals…But critics warned that they can enable mass surveillance or have unintended effects that we can’t yet fully fathom” (Fortin, 2019). This ties back to Naughton’s point that the “soft” media coverage of artificial intelligence drives a narrative of AI as a solution to all our problems, without focusing on potential harmful effects. In Naughton’s words, this narrative is “explicitly designed to make sure that societies don’t twig this until it’s too late to do anything about it” – similar to where most of us find ourselves at present, highly dependent on technology.

Ultimately, an interesting facet of these introductory readings is reflected in a statement from the essay “Do Artifacts Have Politics?” (Winner, 1986): “in our times, people are often willing to make drastic changes in the way they live to accommodate technological innovation, while at the same time resisting similar kinds of changes justified on political grounds.” Although the article is dated, the author’s foresight and message are still salient today. In the context of our class, would we give up the convenience that artificial intelligence brings to our modern lives if, say, one or more of these technologies were not made ethically? Perhaps not, as we are over-reliant on technology. But how far would we give up our privacy for the sake of convenience?

References

Alpaydin, E. (2016). Machine learning: the new AI. Cambridge, MA: MIT Press.

Fortin, J. (2019). Are ‘10-Year Challenge’ Photos a Boon to Facebook’s Facial Recognition Technology?. NYTimes.com. Retrieved from: https://www.nytimes.com/2019/01/19/technology/facebook-ten-year-challenge.html

Naughton, J. (2019). ‘Don’t Believe the Hype: The Media Are Unwittingly Selling Us an AI Fantasy’ The Guardian, January 13, 2019.

Winner, L. (1986). ‘Do Artifacts Have Politics?’ Chicago, IL: University of Chicago Press.