
Deblackboxing the Facebook News Feed Algorithm as a System for Attention Manipulation

Abstract

With approximately two billion monthly active users, Facebook's News Feed has become an influential information platform in our social life. This paper briefly introduces the history of the Facebook News Feed algorithm and then links the algorithm to the attention economy. Machine learning, as a powerful tool of data analysis, is utilized by Facebook to optimize its ability to attract people's attention. By unpacking the Facebook News Feed algorithm, this paper illustrates the ways in which the News Feed manipulates users' attention.

Introduction

Facebook, mainly known as a social media and social networking company, is also an important information platform for many users. This role is embodied in the Facebook News Feed, a place where you can see updates, news, and videos from various publishers, including family, friends, and media agencies. The News Feed, according to Facebook, shows you the stories that are most relevant to you. This raises a question: how does Facebook define "relevance"?

  • A Brief History of Facebook News Feed Algorithm

The top secret behind "relevance" is the News Feed algorithm. It was developed by Facebook in 2006 and has gone through dozens of updates over the years. In September 2006, Facebook officially launched the News Feed, and the algorithm behind it has undergone several changes since then. In February 2009, the "Like" button was added, which made Facebook the first platform to practice an algorithm-based news feed. Prior to the addition of the "Like" button, the only way for users to interact with other users was to comment on a status or post.

In October 2009, Facebook took a bold step: it introduced a new sorting order to the algorithm, changing the whole picture of the News Feed by replacing the original chronological default with a popularity-oriented ranking. Six years later, Facebook announced further changes to the News Feed ranking algorithm; for example, the algorithm gives extra weight to each user's most recent 50 interactions on the network in determining what they see in their News Feed.

In 2015, Facebook expanded the sources of posts pushed to users' News Feeds, which means users could see posts from other users not on their "Friends" lists. In 2016, Facebook launched the Audience Optimization tool, allowing publishers to reach a specific audience based on interests, demographics, and geographic location. In the same year, Facebook unveiled the elements that are given more weight in the computation: user interest in the creator, post performance among other users, previous content performance of the creator, the type of post the user prefers, and how recent the post is (Lua, 2019). In addition, when you click on a post, Facebook measures how much time you spend on it, even if you don't respond to it. In March 2017, Facebook revamped the News Feed algorithm and decided to weigh "Reactions" more heavily than "Likes". In 2019, Facebook introduced a new metric called "Click-Gap" that compares the popularity of sites and posts on Facebook with their popularity on the internet as a whole. If a post is perceived as popular only on Facebook and nowhere else online, its reach in the News Feed will be limited.

In the beginning, the methodology behind the News Feed was the EdgeRank algorithm, which determined the order of posts. It is an adaptation of PageRank, the ranking algorithm used by Google's search engine. What differentiates the two algorithms is the context they operate in: EdgeRank ranks content within a social network, while PageRank is mainly used to order web pages.
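The formulation of EdgeRank that Facebook engineers described publicly around 2010 is a sum of three factors over every edge (interaction) connecting a viewer to a post; the exact weights and decay function were never disclosed, so the expression below is only the publicly stated skeleton:

```latex
\text{EdgeRank} \;=\; \sum_{e \,\in\, \text{edges}} u_e \, w_e \, d_e
```

Here $u_e$ is the affinity score between the viewing user and the edge's creator, $w_e$ is the weight of the edge type (a comment counting more than a like, a like more than a click), and $d_e$ is a time-decay factor that lowers the score of older interactions.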

Driven by machine learning, the Facebook algorithm evolves and changes as a result of an ever-increasing set of data (Introna, 2016). Around 2011, Facebook stopped using the EdgeRank system in favor of a machine learning algorithm that takes more than 100,000 factors into account (McGee, 2013), making it more accurate in predicting what users want to see in their News Feed.

  • Attention Economy

The attention economy, first theorized by Simon, focuses on how people's limited attention is allocated among content (Simon, Deutsch & Shubik, 1971). It is a result of the rise of the attention industry over the past century: an overwhelming amount of information stimuli compete for people's cognitive resources, giving rise to an economy of attention (Shapiro & Varian, 2007). The attention economy can be traced back to the nineteenth century, when the first newspaper fully dependent on advertisers was created in New York, but the business model that converts attention into revenue was not fully realized until the twentieth century. With the arrival of the digital era, communication technologies give everyone a loudspeaker, allowing content to be distributed worldwide with little effort.

At present, the attention economy has invaded every facet of our lives. Consumers can be affected in many respects, such as what they think and buy, the outcome of elections, and political discourse (Huberman, 2017). Tim Wu (2017), the author of The Attention Merchants, wrote that the attention industry has asked for and gained more and more of our waking moments in exchange for new conveniences and diversions, creating a grand bargain that has transformed our lives.

In the era of the attention economy, big tech companies like Facebook take advantage of their technology to hold people's attention on their platforms and then make money from advertisers. Engineers continuously adjust the weights of the algorithm to keep pace with Facebook's business model and maximize revenue.

Machine Learning and Facebook News Feed Algorithm

Facebook has a large data set featuring 100 billion ratings, more than a billion users, and millions of items (Kabiljo & Ilic, 2015). With the rapid expansion of this data set, machine learning has become increasingly essential to the Facebook algorithm because it excels at dealing with enormous amounts of data. Machine learning is a method of data analysis that aims to construct a program that fits the given data (Alpaydin, 2016). It is an important branch of artificial intelligence.

Figure 1. System map of FBLearner.

Driven by the business model, the Facebook News Feed algorithm is designed and optimized to drive engagement. On Facebook, user engagement can be measured by many signals, such as views, clicks, likes, comments, and shares. By taking each signal as a metric, Facebook can frame the News Feed as a machine learning problem, where the inputs are the various pieces of content on Facebook and the output is the probability of an engagement event. According to an official document released by Facebook, general models are trained to determine the various user and environmental factors that ultimately determine the rank order of content. When a user opens Facebook, the model generates a personalized set of the most relevant posts, images, and other content to display from thousands of publishers, as well as the best ordering of the chosen content (Hazelwood et al., 2018). The ML models implemented in the prediction and ranking algorithms are illustrated in the following paragraphs.

  • Ranking Algorithm

The ranking algorithm is a process that ranks all available posts that can be displayed on a user's News Feed according to how likely the user is to respond positively. Specifically, the algorithm calculates a ranking score for each post based on two factors: probability and value.

Probability represents the chance that users will react to the story in the way each event suggests; value represents the weight given to an event.

Figure 2. Ranking score calculation table. Retrieved from https://learning.oreilly.com/videos/the-artificial-intelligence/9781492025979/9781492025979-video320235

Here is an example of the ranking model. Suppose there is an 11% chance that users will click on a post, a 2.2% probability that they will like it, and a 0.099% chance that they will hide the story. Each event is given a weight based on its importance, and the weighted sum becomes the final ranking score, which in this example is 0.2277. The posts in the inventory are then ranked according to their final scores.

The probability column is calculated by machine learning models, while the value column is based on user studies and product decisions. Since interaction is valued more highly in the algorithm, the weights of "Like" and "Comment" are much higher than that of "Click", while the weight of "Hide" is negative.
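A minimal sketch of that weighted-sum calculation, with hypothetical event probabilities and weights (the real values, and the full set of engagement events, are not public), might look like the following:

```python
# Hypothetical ranking-score calculation: score = sum(probability * value).
# The events, probabilities, and weights below are illustrative only.

def ranking_score(predictions: dict, values: dict) -> float:
    """Combine ML-predicted engagement probabilities with product-defined weights."""
    return sum(prob * values[event] for event, prob in predictions.items())

predictions = {          # output of the machine learning models
    "click": 0.11,
    "like": 0.022,
    "comment": 0.005,
    "hide": 0.00099,
}
values = {               # assumed weights from user studies / product decisions
    "click": 0.2,
    "like": 4.0,
    "comment": 10.0,
    "hide": -50.0,
}

print(round(ranking_score(predictions, values), 4))
```

Every candidate post in the inventory gets such a score, and the feed is simply the inventory sorted by it; with weights like these, interaction-heavy events such as comments contribute far more than a mere click, and a likely hide drags the score down.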

In 2016, Facebook stated the "core values" it uses when determining what shows up in a user's feed. Facebook emphasized that posts from friends and family will be at the top of one's News Feed, followed by posts that "inform" and posts that "entertain." Other core values include posts that represent all ideas and posts with "authentic communication." Facebook also said it emphasizes the user's ability to hide posts and to prioritize their own feed with the "See First" function.

  • Decision Tree Models

Decision trees are commonly used as a predictive model, mapping the possible outcomes of a series of related choices. The decision tree is one of the oldest methods in machine learning, and it remains one of the most common predictive modeling approaches due to its non-linearity and fast evaluation (Ilic & Kuvshynov, 2017).

Decision trees are a type of supervised machine learning in which the inputs and their corresponding outputs are labelled in advance. A decision tree finds the most similar training instances through a sequence of tests on different input attributes. It is a flowchart-like structure in which each internal decision node applies a splitting test on an attribute and one of the branches is taken depending on the outcome, while each leaf represents a class label, that is, a decision or final outcome (Alpaydin, 2016). The paths from root to leaf represent classification rules. When the flow reaches a leaf, the search stops. Through this procedure, we can find the most similar training instances and obtain the probability of each outcome.

Figure 3. An example: the probability of clicking on a notification. Retrieved from https://code.fb.com/ml-applications/evaluating-boosted-decision-trees-for-billions-of-users/

Decision tree models are based on the idea that a user's future behavior is generally consistent with his or her past actions. It is a powerful predictive model and is currently implemented in the Facebook News Feed algorithm. The figure above shows an example of a simple decision tree that generates the probability of clicking on a notification. This tree has the following attributes: 1) the number of clicks on notifications from a specific user today; 2) the number of likes that the story from the notification has; 3) the total number of notification clicks from this specific user. The input data passes through the decision nodes, where its values are checked against the parameters, and eventually we get the probability of clicking on a notification. With the decision tree, we can predict the probability of the user clicking on other notifications in the future. Decision trees can also be used to predict the probability of clicking on ads in the News Feed.
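A toy version of that notification-click tree can be written as a few nested threshold tests; the thresholds and leaf probabilities below are invented for illustration, and Facebook's production models are boosted ensembles of such trees with far more attributes (Ilic & Kuvshynov, 2017):

```python
def p_click_notification(clicks_today: int, story_likes: int, total_clicks: int) -> float:
    """Toy decision tree: each internal node tests one attribute,
    each leaf returns a predicted click probability."""
    if clicks_today > 2:                 # root split: engagement today
        if story_likes > 50:             # popular story -> high probability leaf
            return 0.8
        return 0.6
    else:
        if total_clicks > 100:           # long-term engagement
            return 0.4
        return 0.05                      # rarely clicks notifications

print(p_click_notification(clicks_today=3, story_likes=120, total_clicks=400))  # -> 0.8
```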

  • Collaborative Filtering

Collaborative filtering (CF) is a recommendation technique that helps people discover the items that are most relevant to them. It is based on the idea that the best recommendations come from people who share similar interests (Kabiljo & Ilic, 2015). Collaborative filtering is commonly implemented in e-commerce applications and online news aggregators, and Facebook has a collaborative filtering recommender system that is used in many areas of the site.

There are three types of CF: user-based collaborative filtering, item-based collaborative filtering, and model-based collaborative filtering. The difference between the three is nuanced. User-based collaborative filtering first finds neighbors who share similar interests with the target user by comparing the posts they liked, then recommends posts to the target user based on the preferences of those neighbors. Item-based collaborative filtering calculates a similarity score between two posts based on all users' reactions to them, and then recommends to the target user posts similar to the ones he or she has liked. Model-based collaborative filtering trains a model on data extracted from the target user's prior reactions and then predicts his or her future actions with the trained model. Collaborative filtering differs from content-based recommendation because collaborative filtering connects a post with those who liked it, instead of focusing on the post itself.

Figure 4. Facebook Collaborative Filtering. Retrieved from https://code.fb.com/core-data/recommending-items-to-more-than-a-billion-people/

In the example, Facebook uses Apache Giraph, an iterative graph processing system built for big data, to analyze the social graph formed by users and their connections. It is able to break down this complicated structure and find the posts most relevant to the target user based on the results generated by collaborative filtering.
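As a rough illustration of item-based collaborative filtering (the simplest of the three to sketch), the snippet below computes the cosine similarity between two posts from a tiny, made-up user-reaction matrix; Facebook's production system, as described above, relies on matrix factorization over billions of users on Apache Giraph rather than this brute-force comparison:

```python
import numpy as np

# Rows = users, columns = posts; 1 means the user liked the post. Toy data only.
reactions = np.array([
    [1, 1, 0, 0],   # user A
    [1, 1, 1, 0],   # user B
    [0, 0, 1, 1],   # user C
])

def item_similarity(matrix: np.ndarray, i: int, j: int) -> float:
    """Cosine similarity between two posts based on who reacted to them."""
    a, b = matrix[:, i], matrix[:, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

print(item_similarity(reactions, 0, 1))  # posts 0 and 1 liked by the same users -> 1.0
print(item_similarity(reactions, 0, 2))  # less overlap -> 0.5
```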

Attention Manipulation

Attention manipulation is a strategic action intended to influence how a user allocates his or her attention. When a consumer's attention is limited, her ultimate purchasing decisions may hinge on what she pays attention to, which in turn incentivizes firms to engage in attention manipulation (Persson, 2017). In order to convert users' attention into revenue, the News Feed changes the way people receive information, which also shapes what they see of the world around them. Algorithms are not just abstract computational processes; they also have the power to enact material realities by shaping social life to various degrees (Beer, 2013; Kitchin & Dodge, 2011).

As a result of the attention economy, the News Feed creates problems for users in three respects. First of all, this manipulation happens without meaningful consent. Facebook never asks users whether they are willing to participate; users automatically give their consent to this kind of attention manipulation the moment they sign up for Facebook and accept the user agreement. Facebook does not give users the option of turning off the News Feed algorithm; instead, it assumes that they are comfortable with it. This lack of consent deprives users of a rightful choice.

The second problem is the loss of agency. People are getting accustomed to being fed recommended messages, which is a serious concern. The News Feed algorithm tries to make decisions on users' behalf. After interviewing 25 Facebook users, researchers found that several participants expressed unease and discomfort about their perception that the Facebook algorithm controls what they see and what they do not get to see (Bucher, 2017). The feeling of being controlled goes hand in hand with the loss of agency. The algorithm, rather than users themselves, decides what information they receive on Facebook, which might erode people's ability to decide for themselves what information is most relevant to them.

Another problem resulting from the News Feed as an instrument of attention manipulation is emotional contagion. Big tech companies like Facebook can lead people to experience emotions without their awareness. According to a study by social scientists at Cornell, the University of California, San Francisco (UCSF), and Facebook, emotions can spread among users of online social networks (Segelken & Shackford, 2014). People with weaker judgment are especially vulnerable to this kind of attention manipulation.

Conclusion

Since its inception, the Facebook News Feed algorithm has undergone great changes in order to stay consistent with Facebook's business model. Although Facebook has partially opened the curtains on the algorithm, it is still difficult for ordinary users to learn about the hundreds of thousands of weights inside it. Due to the incredible scale of Facebook's data set, the algorithm is now built with machine learning models, which are distinguished by their ability to handle big data. Unpacking the ranking algorithm, decision tree models, and collaborative filtering helps us get deeper into how the algorithm works.

The Facebook News Feed has become the world's biggest information distribution platform. By now, there is a wide variety of content on the News Feed: text, photos, videos, events, and groups. This diversity demands more complexity from the ranking algorithm; for outsiders, it makes deblackboxing the algorithm an even greater challenge.

The real problem is that there is far less accessible information about the parameters. As the News Feed algorithm starts to supplant traditional editorial story selection, users cannot gain the kind of insight into its story curation system that we once had into the principles editors relied on (DeVito, 2017). Education that teaches people how to resist attention manipulation is not yet in place.

But it is time to act. As Cal Newport (2019) suggests, we should transform the way we think about the different flavors of one-click approval indicators that populate the social media universe. The first rule is to learn how they work and never fall into the trap.

 

References

Alpaydin, E. (2016). Machine learning. Cambridge, Massachusetts ; London, England: The MIT Press.

DeVito, M. A. (2017). From editors to algorithms. Digital Journalism, 5(6), 753-773. doi:10.1080/21670811.2016.1178592

Beer, D. (2013). Popular culture and new media. Basingstoke [u.a.]: Palgrave Macmillan.

Bucher, T. (2017). The algorithmic imaginary: Exploring the ordinary affects of Facebook algorithms. Information, Communication & Society, 20(1), 30-44. doi:10.1080/1369118X.2016.1154086

Wallaroo Media. Facebook News Feed algorithm history. Retrieved from https://wallaroomedia.com/facebook-newsfeed-algorithm-history/

Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., . . . Wang, X. (2018). Applied machine learning at Facebook: A datacenter infrastructure perspective. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), 620-629. doi:10.1109/HPCA.2018.00059. Retrieved from https://ieeexplore.ieee.org/document/8327042

Huberman, B. (2017). Big data and the attention economy. Ubiquity, 2017(December), 1-7. doi:10.1145/3158337

Ilic, A., & Kuvshynov, O. (2017). Evaluating boosted decision trees for billions of users. Retrieved from https://code.fb.com/ml-applications/evaluating-boosted-decision-trees-for-billions-of-users/

Introna, L. D. (2016). Algorithms, governance, and governmentality. Science, Technology, & Human Values, 41(1), 17-49. doi:10.1177/0162243915587360

Kabiljo, M., & Ilic, A. (2015). Recommending items to more than a billion people. Retrieved from https://code.fb.com/core-data/recommending-items-to-more-than-a-billion-people/

Kitchin, R., & Dodge, M. (2011). Code/space: Software and everyday life. Cambridge, MA: MIT Press.

Lua, A. (2019). Decoding the Facebook algorithm: A fully up-to-date list of the algorithm factors and changes. Retrieved from https://buffer.com/library/facebook-news-feed-algorithm

McGee, M. (2013). EdgeRank is dead: Facebook’s news feed algorithm now has close to 100K weight factors. Retrieved from https://marketingland.com/edgerank-is-dead-facebooks-news-feed-algorithm-now-has-close-to-100k-weight-factors-55908

Newport, C. (2019). Digital minimalism: Choosing a focused life in a noisy world. Penguin Business.

Persson, P. (2017). Attention manipulation and information overload. Cambridge, MA: National Bureau of Economic Research.

Segelken, H. R., & Shackford, S. (2014). News Feed: 'Emotional contagion' sweeps Facebook. Cornell Chronicle. Retrieved from http://news.cornell.edu/stories/2014/06/news-feed-emotional-contagion-sweeps-facebook

Shapiro, C., & Varian, H. R. (2007). Information rules. Boston, MA: Harvard Business School Press.

Simon, H. A., Deutsch, K. W., & Shubik, M. (1971). Designing organizations for an information-rich world. In M. Greenberger (Ed.), Computers, communications, and the public interest (pp. 37-72). Baltimore, MD: Johns Hopkins Press. Retrieved from http://www.econis.eu/PPNSET?PPN=487583434

Citton, Y. (2017). The ecology of attention (B. Norman, Trans.). Cambridge, UK: Polity.

Wu, T. (2017). The attention merchants: The epic scramble to get inside our heads. New York: Vintage Books.

Machine Learning: Clarification is Needed

AI is a trending topic both in the academic world and in our daily life, but it remains a huge blackbox that people without a science background can barely understand. Among AI technologies, machine learning is a method of data analysis that recognizes patterns and automates analytical model building. Machine learning is a great tool that helps researchers deal with incredibly large amounts of data. However, Lipton and Steinhardt have pointed out some troubling trends in machine learning scholarship. The four problems they focus on in their paper, including the use of "mathiness" and the misuse of language, are pertinent explanations for the misunderstandings in this field. In addition, it is noticeable that some papers involve a huge amount of computing resources. Such research is difficult to reproduce and verify, which has the potential to bring about a Matthew effect and the monopolization of academic research. Two questions might be asked: Is it necessary to use so many computing resources for machine learning? And how can we get meaningful results from data? Clarification is needed here in terms of the process of machine learning.

At the beginning of this semester, we read articles that give more detailed information about AI/ML from the technical perspective, which helps us go deeper into the applications that we use every day on our phones or websites. I have to admit that it is hard for me to understand the whole set of procedures inside the technologies we talked about in class; for example, I have not yet figured out convolutional neural networks in facial recognition technology. However, I realized that learning the fundamental design principles of a technology equips me with the ability to identify the problems in it. For instance, biased facial recognition probably comes from biased data. This directs my attention to the stage of data preparation and allows me to think more about what I can do.

All through this semester, we have discussed major ethical and political issues related to artificial intelligence, including biased data, privacy, attention manipulation, and the loss of human agency. In order to figure out what is going wrong with a specific technology, the first step is to understand the architecture and algorithms of the blackbox. The more clarification there is, the more we can do.


What Do We See in the Digital Age

Micro-targeting is a technique first used by commercial marketers, but it is now employed by political parties in campaigns to track individual voters and identify potential supporters. It would not be possible on a large scale without the development of large-scale databases containing data about as many voters as possible. These databases track voters' habits the way companies do when they analyze consumer behavior: they collect data about an individual voter based on the information he or she shares online, consciously or unconsciously. Voters' demographic information, along with hundreds of other variables, is gathered and analyzed by data scientists. The outcomes are used by political parties to "better" communicate with voters through direct mail, phone calls, emails, and social media. In this way, political parties can have a significant impact on voters.

Big data and ML algorithms have penetrated our daily life in ways we barely notice. This reminds me of agenda-setting theory, first raised by Lippmann and then formally developed by McCombs and Shaw in a study of the 1968 American presidential election. Agenda-setting theory describes the "ability (of the news media) to influence the importance placed on the topics of the public agenda." It shows the power of media agencies: although they cannot decide how you think, they can decide what you think about. When agenda-setting theory meets big data, the whole landscape changes significantly. As soon as we talk, share, and purchase online, our information is recorded instantly. The huge amount of data about one specific person is connected into a coherent picture, and through analysis of that picture, companies can roughly know who you are and what you like. After that, they feed you what you may like and profit from your attention. As Bernardo A. Huberman said, attention is what we value most in the digital age. Focus is always finite and ephemeral, which is why companies try their best to obtain and hold consumers' attention through various methods, including lurid headlines, targeted advertising, and so on.

It is easier than before to get information about the world beyond my own, but it is also getting harder for me to see the whole picture of this world in the digital age. I have to be skeptical about what I see.

 


Facial Recognition and Cloud Computing

As we know, facial recognition is based on artificial intelligence and machine learning. Machine learning involves recognizing patterns in a great amount of existing data with a set algorithm until the system is capable of making predictions about new data. In machine learning, a convolutional neural network (CNN) is a class of deep artificial neural networks that has been successfully applied to analyzing visual imagery, and facial recognition is one of its applications. To enhance the capability of this technology, cloud-based facial recognition systems have emerged.

According to the National Institute of Standards and Technology (NIST), cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. It has five desirable characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. In a facial recognition system implemented on cloud infrastructure, the facial recognition engine is located in the cloud rather than in the local processing unit used in the traditional approach.

Moving both the facial recognition engine and the facial recognition database onto the cloud helps to render a seamless system. This model is employed by several commercial applications to carry out security checks. The query face is captured by the user and transmitted to the cloud server, where it is authenticated against the gallery faces in the facial recognition database located in the cloud.

New faces are enrolled through the user interface, or user application. In order to carry out a task such as face tagging, the user interface communicates with a cloud-based web API (application programming interface) that fronts the facial recognition engine and a database of faces. The user interface enrolls new faces and encodes the face image, which is then sent to the cloud-based API and processed by the facial recognition engine, which runs a pre-defined facial recognition algorithm. The query face from the user interface is then compared by the engine against a gallery of images. Once a conclusive match is determined, the query face is classified as belonging to a particular individual or not, and the result is sent back to the user interface.
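A minimal sketch of the client side of this exchange, assuming a purely hypothetical cloud endpoint (the URL, JSON fields, and response format below are invented for illustration and do not correspond to any particular vendor's API):

```python
import base64
import json
import urllib.request

# Hypothetical cloud face-recognition endpoint; not a real service.
API_URL = "https://example-cloud-face-api.test/v1/identify"

def identify_face(image_path: str) -> dict:
    """Encode a query face and send it to the cloud-based recognition engine."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    payload = json.dumps({"image": encoded}).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)   # e.g. {"match": true, "person_id": "..."}

# result = identify_face("query_face.jpg")
```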

Cloud-based facial recognition systems bring various benefits that come from the inherent characteristics of cloud computing. They have the advantage of real-time processing. On-demand self-service allows customers to quickly procure and access the services they want. Moreover, cloud computing makes the system broadly accessible, in the sense that cloud services can be quickly and reliably integrated with other applications. In addition, cloud services facilitate high scalability, ensuring that the system can be adapted to a wide user base.

References

Nayan B. Ruparelia, Cloud Computing (Cambridge, MA: MIT Press, 2016), Introduction and chapters 1-3.

Derrick Roundtree and Ileana Castrillo, The Basics of Cloud Computing: Understanding the Fundamentals of Cloud Computing in Theory and Practice (Amsterdam; Boston: Syngress/Elsevier, 2014), Introduction and chapter 2.

Vinay, A., Shekhar, V. S., Rituparna, J., Aggrawal, T., Murthy, K. N. B., & Natarajan, S. (2015). Cloud based big data analytics framework for face recognition in social networks using machine learning. Procedia Computer Science, 50, 623-630. doi:10.1016/j.procs.2015.04.095

Can AI Be an Artist?

The topic that interests me most this week is job displacement caused by artificial intelligence. Many people are concerned about it because AI is replacing humans in various fields. To some people's surprise, this happens not only in industries requiring merely repetitive work; it is not uncommon even in industries that emphasize creativity. I once had experience with an AI application that creates posters algorithmically. Engineers feed the AI a great number of images and slogans to train it, and then it can generate posters based on your needs. Although it sounds fascinating, the outcome is nowhere near creative. The basic principle of this application is to put different elements together in a poster. The result is fine, but not organic.

Last year, I attended a speech given by Kai-Fu Lee, CEO of Sinovation Ventures (创新工场), former president of Google China, and author of AI Superpowers. He measures jobs along two dimensions, compassion and creativity (or strategy), and divides them into four quadrants. The more compassion and creativity a job needs, the more difficult it is for AI to replace it. According to Lee, the job of CEO is the least likely to be taken over by AI, but in his prediction, artists are less secure than scientists. I have to disagree with him.

Let's take painting as an example. Painting skill is not the most important factor when we evaluate a painting. What really matters is the meaning it conveys or the way it is socially embedded. AI cannot understand human values and cultures the way humans do, at least for now.

At present, we should not be too concerned about whether AI will replace jobs that call for creativity. However, there is no denying that AI can enhance creative work; for instance, it can help with music composition and script writing. In task-oriented areas, AI may eventually replace human labor entirely. AI can also help people make better decisions if it is employed properly.

 

References

Janna Anderson, Lee Rainie, and Alex Luchsinger, “Artificial Intelligence and the Future of Humans,” Pew Research Center: Internet and Technology, December 10, 2018.

https://www.technologyreview.com/s/612913/a-philosopher-argues-that-an-ai-can-never-be-an-artist/

https://www.forbes.com/sites/solrogers/2018/12/21/does-ai-enhance-creativity/#5ee9d37017d0

Speech and PowerPoints by Kai-Fu Lee.

Deblackboxing Siri as a virtual assistant

The virtual assistant is an emerging topic in the field of artificial intelligence. A virtual assistant can perform tasks for its users based on verbal commands and is normally embedded in digital devices like smartphones, personal computers, and smart speakers. Apple's Siri is one of them. Siri is voice-activated by a personalized "Hey Siri" trigger and then provides information or performs tasks as commanded. The procedure is composed of various layers, and each layer is responsible for a specific task or tasks, so it is clearest to deblackbox it layer by layer.

According to Apple's patent application for "An intelligent automated assistant system", a system for operating an intelligent automated assistant includes:

  • one or more processors that start with the Detector

A deep neural network (DNN) is used to detect "Hey Siri." First, the microphone turns your voice into a stream of waveform samples, and these samples are converted into a sequence of frames through spectrum analysis. The DNN converts each of these acoustic patterns into a probability distribution, and "Hey Siri" is detected if the outputs of the acoustic model fit the right sequence for the target phrase. Once Siri is activated, it can perform tasks as requested.

  • memory storing instructions that cause the processors to perform operations, including
  • obtaining a text string from a speech input received from a user

For example, if I want my iPhone to call my mom while I am driving, I would say "Hey Siri" to activate Siri and then say "call Mom" to give a command. Through speech recognition, my speech is turned into a text string that can be processed by the processor.

  • interpreting the received text string to derive a representation of user intent

Through NLP, the processor interprets "call Mom" as an instruction to dial the person saved as "Mom" in the contacts.

  • identifying at least one domain, a task, and at least one parameter for the task, based at least in part on the representation of user intent

After interpretation, this layer links my instruction to the "Phone" domain and opens the "Phone" function.

  • performing the identified task

My iPhone calls "Mom" using the phone number saved in my contacts.

  • providing an output to the user, wherein the output is related to the performance of the task.

The procedure above is a simplified version of how Siri receives and performs our verbal instructions. It is worth noting that nested within each layer are further complicated layers waiting to be deblackboxed.
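A drastically simplified, self-contained sketch of that layered flow in Python (every function below is a hand-written stand-in for a real component; the contact data, threshold, and parsing rule are invented for illustration):

```python
# Toy sketch of Siri's layered pipeline: detector -> speech recognition ->
# intent parsing -> task execution. Each stage is a stand-in for a much
# more complex component (a DNN, a speech recognizer, an NLP system).

CONTACTS = {"Mom": "+1-202-555-0123"}        # assumed local contact store

def detect_trigger(acoustic_score: float) -> bool:
    """Detector layer: the DNN yields a probability that the audio is 'Hey Siri'."""
    return acoustic_score > 0.9

def transcribe(audio: str) -> str:
    """Speech-recognition layer (stand-in): audio -> text string."""
    return audio                              # pretend the audio is already text

def parse_intent(text: str) -> dict:
    """NLP layer: derive a representation of user intent."""
    if text.lower().startswith("call "):
        return {"domain": "Phone", "task": "place_call", "contact": text[5:]}
    return {"domain": "unknown", "task": None}

def perform_task(intent: dict) -> str:
    """Task-execution layer: identify the domain and carry out the task."""
    if intent["domain"] == "Phone":
        number = CONTACTS.get(intent["contact"], "unknown")
        return "Calling " + intent["contact"] + " at " + number
    return "Sorry, I can't help with that."

# End-to-end run for the "call Mom" example
if detect_trigger(acoustic_score=0.97):
    print(perform_task(parse_intent(transcribe("call Mom"))))
```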

 

References

https://patents.google.com/patent/AU2011205426B2/en

Apple Machine Learning Journal (Vol. 1, Issue 9, April 2018): "Personalized Hey Siri."

Deblackboxing Deep Learning Machine Translation

Major AI players such as Google and Facebook are moving toward deep learning. Deep learning approaches are especially promising because they learn and because they are not fixed to any specific task. Deep learning architectures such as deep neural networks have been applied to fields including natural language processing.

A deep learning machine translation system is, at its simplest, composed of an "encoder" and a "decoder". The encoder converts a sequence into a vector, a kind of representation the computer can work with, and the decoder converts the vector back into a sequence, which is the output perceivable by users. The encoder and decoder are typically long short-term memory recurrent neural networks (LSTM-RNNs), whose advantage is that they are good at dealing with long sentences. The way an LSTM-RNN deals with the complexity of sentences is to associate a word with other words in the input. The main characteristic of the deep learning translation approach is that it is based on the hypothesis that words appearing in similar contexts may have a similar meaning. The system thus tries to identify and group words appearing in similar translational contexts in what are called "word embeddings" (Poibeau, 2017). In other words, this approach understands a word by embedding it into its context, which enhances the accuracy of translation.
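Below is a bare-bones PyTorch sketch of the encoder-decoder idea; the vocabulary sizes, dimensions, and single LSTM layer are arbitrary toy values, and production systems such as Google's stack many more layers and add an attention mechanism:

```python
import torch
import torch.nn as nn

# Toy sizes: source/target vocabularies, embedding and hidden dimensions.
SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, src_ids):
        # Source sentence -> sequence of vectors -> final hidden state ("the vector")
        _, (h, c) = self.lstm(self.embed(src_ids))
        return h, c

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, tgt_ids, state):
        # Vector -> target-language word scores, one step per target token
        output, state = self.lstm(self.embed(tgt_ids), state)
        return self.out(output), state

encoder, decoder = Encoder(), Decoder()
src = torch.randint(0, SRC_VOCAB, (1, 7))        # a toy 7-token source sentence
logits, _ = decoder(torch.randint(0, TGT_VOCAB, (1, 5)), encoder(src))
print(logits.shape)                              # torch.Size([1, 5, 1200])
```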

I would like to use English-Chinese translation as an example. Translating Chinese can often be tricky because it has a different writing system and grammar, and one word can have several different meanings and pronunciations. To deal with this, Google's neural machine translation relies on eight-layer LSTM-RNNs to produce a more accurate translation with respect to context.

There is also an interesting component called the "attention mechanism" used in deep learning machine translation. By learning from a large amount of data, this model learns to decide which parts of the input to focus on while generating from one language to another, and this helps align the input with the output. In other words, it helps to make sure that the output represents the main meaning of the input. However, not every language has the same word order as English, which raises the difficulty of alignment; English-Japanese translation is a good example, since Japanese has a quite different syntax.
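In the standard Bahdanau-style formulation of this mechanism (given here as a general sketch rather than the exact variant any particular product uses), the decoder at step $i$ computes a weight $\alpha_{ij}$ for every encoder state $h_j$ from its own previous state $s_{i-1}$, and uses the weighted sum $c_i$ as the context for producing the next word:

```latex
\alpha_{ij} = \frac{\exp\!\big(\operatorname{score}(s_{i-1}, h_j)\big)}
                   {\sum_{k}\exp\!\big(\operatorname{score}(s_{i-1}, h_k)\big)},
\qquad
c_i = \sum_{j} \alpha_{ij}\, h_j
```

The softmax makes the weights sum to one, so the decoder effectively "looks back" at the most relevant source words at each step instead of relying on a single fixed vector for the whole sentence.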

Deep learning is a good tool for translation, but it is not perfect. It can still make significant errors that a human translator would never make, such as mistranslating proper names or rare terms, or translating sentences in isolation rather than considering the context of the paragraph or page. So there is still a long way to go.

 

References

Ethem Alpaydin,  Machine Learning: The New AI. Cambridge, MA: The MIT Press, 2016.

Thierry Poibeau, Machine Translation (Cambridge, MA: MIT Press, 2017).

How Google Translate Works: The Machine Learning Algorithm Explained (Code Emporium). Video.

 

Daily experience with two data systems

Unicode Emoji

I believe most of you have had the experience of sending emojis when chatting with friends to better express your feelings, only to find that your friends sometimes receive emojis that look quite different from what you thought you sent. For example, if you send a neutral face, the face will vary from platform to platform: as shown in the screenshot, the presentation of a neutral face differs significantly across the Android, Microsoft, Apple, and Samsung emoji systems. In some extreme cases, your friends see nothing but unreadable codes or empty boxes.

How do emojis get lost in translation? Behind the emojis you see on your screen is the Unicode standard, a way of representing the written characters of any language by specifying a code range for a language family and standard bytecode definitions for each character in the language (Irvine, 2019). Unicode defines the basic emoji symbols that are available; Apple, Google, Microsoft, and Samsung then draw their own interpretations. That is why a neutral face looks different on an Android phone than it does on an iPhone.
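A quick way to see the shared layer underneath the different renderings: in Python you can inspect the single Unicode code point that every platform receives for the neutral face; the glyph drawn from that code point is what varies from vendor to vendor.

```python
import unicodedata

emoji = "\U0001F610"                       # NEUTRAL FACE
print(emoji)                               # 😐 (rendered differently per platform)
print(unicodedata.name(emoji))             # NEUTRAL FACE
print(f"U+{ord(emoji):04X}")               # U+1F610
print(emoji.encode("utf-8"))               # b'\xf0\x9f\x98\x90'
```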

Database management system

A database system has four components: users, the database application, the database management system (DBMS), and the database. As an important part of a database system, the DBMS is a computer program used to create, process, and administer the database (Kroenke, 2017).

In a DBMS environment, there are three types of users: application programmers, database administrators, and end users. Application programmers write programs in various programming languages to interact with databases. Database administrators take responsibility for managing the entire DBMS. End users interact with the database management system by conducting operations on the database, such as retrieving, updating, and deleting data.

DBMSs are highly applicable in our daily life. For example, universities use a DBMS to manage student information, course registrations, colleges, and grades. A university employs a database application to keep track of things, so that a staff member can easily retrieve and update student information on her computer with software. Application programs read or modify database data by sending SQL statements to the DBMS, which receives the requests and translates them into actions on the database (Kroenke, 2017). The database, which stores a collection of related tables (like Student, Courses, Department, and Deposits), then carries out those actions and sends back the data the staff member needs.
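A small illustration of that request-and-translate loop, using Python's built-in sqlite3 module as a stand-in DBMS; the Student table and its columns are assumptions made for the example:

```python
import sqlite3

# An in-memory database stands in for the university's DBMS-managed database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The application program sends SQL statements; the DBMS turns them into
# actions on the stored tables.
cur.execute("CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
cur.execute("INSERT INTO Student (name, major) VALUES (?, ?)", ("Ada Lovelace", "CCT"))
conn.commit()

# A staff member's lookup request, expressed as a query the DBMS executes.
cur.execute("SELECT name, major FROM Student WHERE name = ?", ("Ada Lovelace",))
print(cur.fetchone())                      # ('Ada Lovelace', 'CCT')
conn.close()
```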

References

Irvine, “Introduction to Data Concepts and Database Systems.”

John D. Kelleher and Brendan Tierney, Data Science (Cambridge, Massachusetts: The MIT Press, 2018).

David M. Kroenke et al., Database Concepts, 8th ed. (New York: Pearson, 2017). Excerpt.

What is DBMS? https://www.guru99.com/what-is-dbms.html

An Application of Information Theory in Music

This week's reading introduced Shannon's information theory. What is fascinating in his argument is that information is independent of meaning. He held the idea that information can be measured and standardized. Information theory allows us to have a deeper understanding of information and data in a fundamental way.

In his paper, Shannon argued that "the fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point" (Shannon, 1948). Music, for example, can be thought of as the transmission of information from one point to another. To put it in terms of a communication system, the sound of music is a message and an encoder generates a distinct signal for the message. Signals pass through a channel that connects transmitter and receiver, and a decoder on the receiver end converts the signals back into sound waves that we can perceive.

According to Shannon, "information is entropy." Entropy is a measure of disorder or uncertainty about the state of a system: the more disordered a set of states is, the higher the entropy. Shannon considered entropy to be the measure of the inherent information in a source (Gleick, 2011). Denning also pointed out that information exists as physically observable patterns. Building on this, Febres and Jaffé found a way to classify different musical genres automatically.

Febres and Jaffé approached music classification through the entropy of MIDI files. A MIDI file is a digital representation of a piece of music that can be read by a wide variety of computers, music players, and electronic instruments. Each file contains information about a piece of music's pitch, velocity, volume, vibrato, and so on, which enables the music to be reproduced accurately from one point to another. In fact, a MIDI file is composed of an ordered series of 0s and 1s, which allowed the researchers to compress each set of symbols into the minimum number necessary to generate the original music. They then measured the entropy associated with each piece of music based on this fundamental set, and eventually found that music from the same genre shared similar values of second-order entropy. This case is an application of information theory, and it is inspiring that information theory has the potential to be applied in many other fields.
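Shannon's measure itself is straightforward to compute. The sketch below estimates the first-order entropy, H = -Σ p log2 p, of a short symbol string; Febres and Jaffé computed the same basic quantity, at higher order, over the 0s and 1s of MIDI files:

```python
import math
from collections import Counter

def entropy(symbols) -> float:
    """First-order Shannon entropy H = -sum(p * log2 p), in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy("00000000"))           # 0.0 bits: perfectly ordered
print(entropy("01010101"))           # 1.0 bit: two equally likely symbols
print(entropy("0123456701234567"))   # 3.0 bits: eight equally likely symbols
```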

 

References

Peter J. Denning and Craig H. Martell. Great Principles of Computing, Chap. 3, “Information.”

Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication (Champaign, IL: University of Illinois, 1949).

James Gleick, The Information: A History, a Theory, a Flood. (New York, NY: Pantheon, 2011).

Martin Irvine, "Introduction to the Technical Theory of Information."

Musical Genres Classified Using the Entropy of MIDI Files, MIT Technology Review https://www.technologyreview.com/s/542506/musical-genres-classified-using-the-entropy-of-midi-files/

Understanding biased AI from a sociotechnical perspective

In October 2018, Reuters revealed that Amazon had built a hiring tool that was found to be biased against women. The author of the article "Amazon scraps secret AI recruiting tool" explained, for example, that if the word "women's" or the names of certain women's colleges appeared in a candidate's resume, the candidate was ranked lower. This case shows that artificial intelligence is not neutral and that machine learning has limitations.

The first question I want to ask is what led to this bias. Did Amazon's hiring tool learn it by itself? To answer these questions, we first need to figure out how it works. According to the report, the hiring tool employed machine learning to give job candidates scores ranging from one to five stars. As we know from Alpaydin's work, machine learning techniques find patterns in large amounts of data. The hiring tool is an application of machine learning: it was trained on the resumes submitted to the company over a 10-year period, with the goal of finding patterns in those resumes and then calculating the qualification of a candidate based on the observed patterns.

So one could say that Amazon's hiring tool taught itself that male candidates were more qualified for tech-related jobs. However, the real problem goes beyond that simple conclusion. Johnson and Verdicchio argue that AI systems should be thought of as sociotechnical ensembles, which are combinations of artefacts, human behavior, social arrangements, and meaning. Any computational artefact has to be embedded in some context in which human beings work with the artefact to accomplish tasks, and they warn that we cannot solve sociotechnical issues by focusing only on the technical part of the problem. In this case, the historical resumes are the context: most of them came from male candidates, and this gender imbalance over the past decade had a significant impact on the results of machine learning. That is what made the AI biased, and it reflects the limitations of artificial intelligence. The hiring AI can therefore only be used as a supplementary tool. To address the problem, a sociotechnical approach is needed; for example, people from diverse areas should work together to lessen the biased variables in the original data. But we need to keep in mind that artificial intelligence is not necessarily neutral. There is still a long way to go.
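The mechanism behind this is easy to reproduce on toy data: when the historical labels are skewed, a model learns to penalize whatever feature correlates with the under-represented group. The sketch below does this with scikit-learn and a single made-up feature ("resume mentions a women's college"); it illustrates the general phenomenon and is not a reconstruction of Amazon's actual tool:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy historical data: column 0 = resume mentions a women's college (1/0),
# column 1 = years of relevant experience. Labels reflect past hiring decisions
# that skewed male, so feature 0 is spuriously associated with rejection.
X = np.array([
    [0, 8], [0, 6], [0, 7], [0, 5], [0, 9],   # group without the keyword: mostly hired
    [1, 8], [1, 6], [1, 7], [1, 5], [1, 9],   # equally experienced group: mostly rejected
])
y = np.array([1, 1, 1, 1, 0,
              0, 0, 1, 0, 0])

model = LogisticRegression().fit(X, y)
print(model.coef_)   # the weight on feature 0 comes out negative:
                     # the model has "learned" the historical bias, not merit.
```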

 

References

Johnson, D., & Verdicchio, M. (2017). Reframing AI discourse. Minds and Machines, 27(4), 575-590. doi:10.1007/s11023-017-9417-6

Alpaydin, E. (2016). Machine learning: the new AI. MIT Press.

Amazon scraps secret AI recruiting tool that showed bias against women https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G