Category Archives: Final Project

AI Algorithm Interpretation: Possible or Not?

Jillian Wu

How to interpret algorithms is a growing problem, one that involves governments, organizations, and individuals alike. Motivated by the GDPR, this article asks whether all algorithms are explainable and understandable. It introduces different methods for interpreting AI programs and discusses the black boxes that remain unexplained across different algorithms.


In recent years, AI has developed extremely quickly. Combined with advances in Big Data (BD), Computer Vision (CV), Machine Learning (ML), and Deep Learning (DL), AI algorithms have made machines remarkably capable: they can drive cars, detect faces, and interact with humans. This high speed, however, has brought many problems. Around the world there are cases of algorithms implicated in financial fraud, racism, and privacy violations. In China, it was reported that users get a cheaper deal when they order rides online from Android phones rather than Apple ones (Wu, 2021). Dark-skinned people are offered fewer health services than white people because an algorithm scores white patients as having lower health risks (Ledford, 2021). A self-driving Uber crashed and killed a pedestrian (Gonzales, 2019). Such algorithms harm both individuals and society. People cannot see the principles by which these programs work, and the terminology of the AI domain is complicated and chaotic. As a result, people have begun to fear AI and to overstate its capabilities. The reasons are complicated: big companies may want to maintain their monopolies, media agencies may profit from the hype, and people hold many fantasies about the future of artificial intelligence (even though current AI is still weak AI, far from the strong AI portrayed in movies). How should these potential threats be handled? Governments can help. The European Union acted first: in May 2018 it issued the General Data Protection Regulation (GDPR), which established seven principles to regulate future AI, including the right to an explanation of automated decisions (Vincent, 2019).

This article focuses on the explanation of automated decisions. Under the regulation, people can demand to know how AI algorithms work and why the data fed to AI systems perpetuate biases and discrimination. If a decision is explainable, it becomes easier to assess an algorithm's pros, cons, and risks, so that people can decide to what extent, and on what occasions, algorithms can be trusted. Practitioners will also know which aspects to work on to improve their algorithms.

What is Interpretability?

There is no universal definition of interpretability. According to Miller (2018), "interpretability is the degree to which a human can understand the cause of a decision." Another definition holds that "interpretability is the degree to which a human can consistently predict the model's result" (Kim et al., 2016).

People used to absorb data from the real world instinctively and process them with their brains. Through these activities, humans could easily understand the world and make decisions.

With human development, however, the demand for data has increased significantly and data have become massive. The tools to collect and process them have followed: computers. The interpretation process has therefore changed. People first digitize the data and then use algorithms (the black box) to process them. Even though human beings are always the destination (Molnar, 2019), people are eager to know what happens inside the black box. What is its logic for making decisions? (Humans, after all, can explain the logic behind their own decisions.) They would prefer all algorithms to be interpretable. Nevertheless, for now, not all black boxes can be explained. Many models and applications are uninterpretable because the tasks and targets themselves are insufficiently understood: the better the modeler understands the mission of an algorithm, the better the algorithm will perform.

How to interpret Algorithms?

Since algorithm interpretability is a relatively new topic, researchers have built different systems and standards for assessing it. This article introduces two classifications. The first, according to Kabul, divides interpretation into three stages; it is friendlier to people who are not experts in AI and ML.

1) Pre-modeling

“Understanding your data set is very important before you start building models” (Kabul, 2017). Interpretable methods before modeling mainly involve data preprocessing and data cleansing. Machine learning is designed to discover knowledge and rules in data, and an algorithm will not work well if the modeler knows little about that data. The key to interpretation before modeling is a comprehensive understanding of the data's distribution characteristics, which helps the modeler anticipate potential problems and choose the most reasonable model or approach for the best possible solution. Data visualization is an effective pre-modeling interpretability method (Kabul, 2018). Some regard visualization as the last step of data mining, used to present analysis results; but when a programmer starts an algorithm project, the data come first. It is necessary to establish a sound understanding of the data through visualization, especially when the data volume is large or the dimensionality is high. With that understanding in hand, the coding that follows becomes far easier.
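As a sketch of what this pre-modeling step can look like in practice, the short Python example below summarizes a numeric feature and prints a crude text histogram before any model is built. The data values and function names are invented for illustration; real projects would typically use a plotting library instead of text output.

```python
# Pre-modeling sketch: understand a feature's distribution before modeling.
# The ages below are made-up example data.

def summarize(values):
    """Return basic distribution statistics for a numeric feature."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"n": n, "min": min(values), "max": max(values),
            "mean": mean, "std": variance ** 0.5}

def text_histogram(values, bins=5):
    """A crude text histogram: one line per bin, one '#' per observation."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the max value
        counts[idx] += 1
    return ["{:6.1f}-{:6.1f} | {}".format(lo + i * width,
                                          lo + (i + 1) * width,
                                          "#" * c)
            for i, c in enumerate(counts)]

ages = [23, 25, 31, 35, 35, 41, 44, 52, 60, 61]
stats = summarize(ages)
print(stats)
for line in text_histogram(ages):
    print(line)
```

Even this rough view reveals skew, outliers, and gaps that would otherwise surprise the modeler later.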

2) Modeling

Kabul categorizes models as “white box (transparent) and black box (opaque) models based on their simplicity, transparency, and explainability” (Kabul, 2017). It is easier to interpret white-box algorithms such as decision trees than black-box algorithms such as deep neural networks, since the latter have vastly more parameters. In a decision tree, the movement of data can be clearly traced, so an accountability system is easy to build into the program. For example, one can see at a glance that “there is a conceptual error in the ‘Proceed’ calculation of the tree shown below; the error relates to the calculation of ‘costs’ awarded in a legal action” (Wikipedia, n.d.).
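To make that traceability concrete, here is a hand-written toy decision tree in Python. The loan-approval rules, feature names, and thresholds are entirely hypothetical; the point is that every prediction returns the exact path of rules that produced it, which is what makes a white-box model auditable.

```python
# A toy white-box model: each prediction carries the chain of rules
# that produced it, so any single decision can be audited step by step.
# Feature names and thresholds are invented for illustration.

def predict_loan(income, debt_ratio):
    """A hand-written two-level decision tree returning (decision, trace)."""
    trace = []
    if income >= 50_000:
        trace.append("income >= 50000")
        if debt_ratio < 0.4:
            trace.append("debt_ratio < 0.4")
            return "approve", trace
        trace.append("debt_ratio >= 0.4")
        return "review", trace
    trace.append("income < 50000")
    return "deny", trace

decision, path = predict_loan(income=62_000, debt_ratio=0.25)
print(decision, "via", " -> ".join(path))
```

A real decision tree would be learned from data (e.g., with a library such as scikit-learn), but the learned tree can be printed and audited in exactly this rule-path form.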


3) Post-modeling

Explanation at this stage is used to “inspect the dynamics between input features and output predictions” (Kabul, 2017). Because every model has its own distinct rules, the interpretation methods here are model-based. This stage is essentially the same as post-hoc interpretability, which is discussed further in the following section.

The second classification comprises two groups of interpretability techniques and “distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post hoc)” (Molnar, 2019); each group can be further subdivided (Du et al., 2019). This classification is the more technical way of explaining algorithms.

1) Intrinsic interpretability

Intrinsic interpretability builds interpretability into the algorithm itself: the model is self-explanatory by virtue of its structure. It is simpler than the post-hoc kind and includes programs such as decision trees and rule-based models (Molnar, 2019), which were explained in the previous section.

2) Post-hoc interpretability

Post-hoc interpretability is flexible: programmers can use any preferred method to explain different models, and the same model can receive multiple explanations. This brings three advantages: a) the same explanatory model can be applied to different DL models; b) it can yield more comprehensive interpretations for a given learning algorithm; c) it can be used with all data forms, such as vectors (Ribeiro et al., 2016a). It has shortcomings as well. “The main difference between these two groups lies in the trade-off between model accuracy and explanation fidelity” (Du et al., 2019): relying on external models and structures is not only arduous but can also lead to fallacies. The typical example is Local Interpretable Model-agnostic Explanations (LIME), a third-party model that explains DL algorithms by “training local surrogate models to explain individual predictions” (Ribeiro et al., 2016b). In LIME, the modeler changes the data input and analyzes how the predictions change accordingly. For a diagnosis DL program, LIME may delete some data columns to see whether the results differ from a human decision; if the results change, the deleted data are probably vital to the algorithm, and vice versa. Because it can be applied to tabular data, text, and images, LIME has recently become popular. Nonetheless, it is not perfect. Some argue that it only helps practitioners pick better data, and that the supervised learning LIME relies on does not do the essential job: it cannot reveal how decisions are made or how decisions incentivize behavior.
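The perturb-and-observe idea behind LIME can be sketched in a few lines. The "black box" below is only a stand-in linear scorer, not a real DL model, and the medical feature names and weights are invented; the point is how hiding one input at a time reveals each feature's local influence on a single prediction. (The real LIME library additionally fits a weighted local surrogate model over many random perturbations.)

```python
# A stripped-down sketch of the perturbation idea behind LIME:
# zero out one feature at a time and watch how the model's output moves.
# black_box is a stand-in scorer; names and weights are illustrative.

def black_box(features):
    """Stand-in model: a fixed weighted score over three named features."""
    weights = {"age": 0.2, "blood_pressure": 0.5, "cholesterol": 0.3}
    return sum(weights[name] * value for name, value in features.items())

def feature_influence(features):
    """For each feature, record how far the prediction shifts when
    that feature is hidden (set to 0.0)."""
    baseline = black_box(features)
    return {name: baseline - black_box({**features, name: 0.0})
            for name in features}

patient = {"age": 1.0, "blood_pressure": 1.0, "cholesterol": 1.0}
influence = feature_influence(patient)
print(influence)  # the larger the shift, the more influential the feature
```

Note what this does and does not show: it ranks features by local influence on one prediction, but it says nothing about why the model weighs them that way, which is exactly the criticism raised above.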


Although methods to explain ML programs are currently booming, it is still difficult to interpret some Deep Learning algorithms, especially Deep Neural Networks (DNNs). This fact relates to the idea of reframing AI discourse: "machine autonomy" is not the same as human autonomy. Although designers set the patterns for an AI system, the AI becomes an entity of its own, run by rules that may behave unexpectedly when it encounters real problems. This does not mean the AI can decide where to go by itself, but that it becomes an independent program when there is no intervention.

Black box in DNNs Algorithms

After the EU released the GDPR, Pedro Domingos, a professor of computer science at the University of Washington, said on Twitter that "GDPR makes Deep Learning illegal." From his perspective, DL algorithms are unexplainable.

The black box is still there. In 2020, Twitter's image-cropping algorithm was found to be racist: it automatically favored white faces over Black faces (Hern, 2020). Twitter soon apologized and released an investigation and improvement plan. However, in that investigation the modelers stated that their “analyses to date haven’t shown racial or gender bias” (Agrawal & Davis, 2020), which means they could not determine what led to the bias or where the potential harm came from. In the future they intend to change the design principle to “what you see is what you get” (Agrawal & Davis, 2020); in other words, they are giving up the unexplainable algorithm in favor of an intrinsically interpretable model. This is not the only example. According to the ACLU, Amazon Rekognition shows a strong racial bias. Although Amazon responded that the ACLU misused and misrepresented its algorithm, “researchers at MIT and the Georgetown Center on Privacy and Technology have indicated that Amazon Rekognition is less efficient at identifying people who are not white men” (Melton, 2019).

All these cases happened in DL algorithms. The black box in DNNs comes from the way they work: imitating the human brain, they build artificial neurons and connect them into networks of several layers, so that the learning algorithm can develop recognition of what it has learned and processed (Alpaydin, 2016). “They are composed of layers upon layers of interconnected variables that become tuned as the network is trained on numerous examples” (Dickson, 2020). The theory of DL algorithms is not difficult to explain; in this video, the host thoroughly describes its principles and introduces simple DL algorithms.
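A minimal forward pass makes "layers of interconnected variables" concrete. The tiny two-layer network below uses fixed, invented weights; in training, these are the numbers that get tuned on many examples, and it is their sheer quantity and entanglement in real networks that makes the result opaque.

```python
# A minimal forward pass through a two-layer network. The weights are
# fixed and illustrative; training would tune them on many examples.
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One dense layer: every neuron mixes every input, then squashes it."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.2]                                        # input features
hidden = layer(x, [[0.8, -0.4], [0.3, 0.9]], [0.1, -0.2])  # 2 hidden neurons
output = layer(hidden, [[1.5, -1.1]], [0.05])          # 1 output neuron
print(output)  # a single probability-like score
```

With two layers and nine parameters, every number here can still be inspected by hand; at 17 billion parameters, that inspection becomes impossible, which is the black box the next section describes.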

However, real cases are much more complex. In 2020, Microsoft released what was then the largest DL model for NLP, Turing Natural Language Generation (T-NLG), with 17 billion parameters. Other algorithms also contain billions of parameters, such as Nvidia's Megatron-LM and OpenAI's GPT-2. How these large algorithms use and combine their parameters to make decisions is currently impossible to explain. “A popular belief in the AI community is that there’s a tradeoff between accuracy and interpretability: At the expense of being uninterpretable, black-box AI systems such as deep neural networks provide flexibility and accuracy that other types of machine learning algorithms lack” (Dickson, 2020). A vicious circle therefore forms: people keep building DNNs to solve complicated problems they cannot clearly explain, and then build new models to try to interpret the first ones. These explanation models are themselves fiercely disputed. Professor Rudin argues that the approach is fundamentally flawed, because the explanation model guesses rather than deduces: “Explanations cannot have perfect fidelity with respect to the original model. If the explanation was completely faithful to what the original model computes, the explanation would equal the original model, and one would not need the original model in the first place, only the explanation” (Rudin, 2019). It is therefore still hard to de-blackbox DL algorithms. Moreover, the black box also resides in proprietary algorithms, such as Twitter's and Amazon's mentioned above: companies hide their code to keep an edge over competitors (Dickson, 2020), which puts de-blackboxing out of reach. Even though these companies are minding their own businesses, the potential risks are not automatically eliminated.


Interpretability (de-blackboxing) is needed by everyone. Companies need it to improve the quality of their algorithms and thereby their profits. Individuals need it to ensure that their rights are not harmed and that they are treated equally. Governments need it to build more reliable institutions for people and society. Although there are many methods for interpreting algorithms, none of them is universal, and how to make all algorithms interpretable remains to be explored. Governments and corporations should think harder before deploying DL algorithms, and a consensus should be reached about the role algorithms play in shaping society.



Agrawal, P., & Davis, D. (2020, October 1). Transparency around image cropping and changes to come. Twitter Blog.

Alpaydin, E. (2016). Machine learning: The new AI. MIT Press.

Dickson, B. (2020, August 6). AI models need to be ‘interpretable’ rather than just ‘explainable.’ The Next Web.

Du, M., Liu, N., & Hu, X. (2019). Techniques for interpretable machine learning. Communications of the ACM, 63(1), 68–77.

Gonzales, R. (2019, November 7). Feds Say Self-Driving Uber SUV Did Not Recognize Jaywalking Pedestrian In Fatal Crash. NPR.

Hern, A. (2020, September 21). Twitter apologises for “racist” image-cropping algorithm. The Guardian.

Kabul, I. K. (2017, December 18). Interpretability is crucial for trusting AI and machine learning. The SAS Data Science Blog.

Kabul, I. K. (2018, March 9). Understanding and interpreting your data set. The SAS Data Science Blog.

Kim, B., Koyejo, O., & Khanna, R. (2016). Examples are not enough, learn to criticize! Criticism for Interpretability. Neural Information Processing Systems.

Ledford, H. (2021, May 8). Millions of black people affected by racial bias in health-care algorithms. Nature.

Melton, M. (2019, August 13). Amazon Rekognition Falsely Matches 26 Lawmakers To Mugshots As California Bill To Block Moves Forward. Forbes.

Miller, T. (2018). Explanation in Artificial Intelligence: Insights from the Social Sciences. ArXiv:1706.07269 [Cs].

Molnar, C. (2019). Interpretable machine learning. A Guide for Making Black Box Models Explainable.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016a). Model-Agnostic Interpretability of Machine Learning. ArXiv:1606.05386 [Cs, Stat].

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016b). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. ArXiv:1602.04938 [Cs, Stat].

Vincent, J. (2019, April 8). AI systems should be accountable, explainable, and unbiased, says EU. The Verge.

Wikipedia. (n.d.). Decision tree.

Wu, T. (2021, April 21). Apple users are charged more on ordering a cab than Android users. The Paper.

De-blackboxing “the Cloud” and the Principle of Scalability


The abstractness of the term “the cloud” has left many unknowns in a technology that is rapidly evolving and present in most of the computational advancements we use on a regular basis. The nature and characteristics of the cloud create a mystery around the systems and infrastructure, both computational and physical, that accompany cloud computing. By de-blackboxing and navigating the main features, characteristics, and concepts of cloud computing, this piece emphasizes that the vast production of data can also lead to the overuse of data centers and physical resources that ultimately have an impact on the environment.


Figure 1. via GIPHY

Cloud computing has expanded and been put to great use over the past decade. Personal users, businesses, educational institutions, governmental institutions, and even health care establishments rely on the efficiency, safety, and operability of “the cloud” for day-to-day functions and operations. The effectiveness and performance of the cloud slowly became available to anyone with a smart device, as big tech and software companies not only use cloud computing technology in their products but also create it, develop it, and hold major decisions over it. With the rapid evolution of technology, ever more data is constantly transferred, saved, uploaded, and downloaded, in amounts so large that only powerful “high-performance computing” systems such as cloud computing can “handle the big data problems we have today in reasonable time” (Alpaydin, 2017, p. 176). The concept of accessing your data from a non-specific location, without a specific device, floppy disk, or USB stick, was not fathomable a few decades ago. The idea of an “invisible” cloud where everything and anything can be manipulated, stored, and redistributed made people's fast-paced lives even more accommodating. And it is not just personal use: businesses, companies, and large corporations no longer have to invest in thousands of computers, maintenance and support staff, or their own data servers and space, since someone else can provide that service to them (De Bruin & Floridi, 2017; Bojanova et al., 2013). An intangible, invisible “cloud”. Or is it? To what extent is it as abstract as most people think? De-blackboxing cloud computing, or “the cloud”, is critical to understanding its implications both virtually and in the real, physical world.
This piece further investigates how, and to what extent, the use and consumption of cloud computing affect physical infrastructures in terms of their environmental impact.

What is “The Cloud”?

One of the biggest cloud computing companies, Amazon, through Amazon Web Services (AWS), defines cloud computing as “the on-demand delivery of IT resources via the Internet” that provides access to tech services on an as-needed basis (AWS, 2019). Among the plethora of things cloud computing can be and is used for are “data backup, disaster recovery, email services, sharing virtual desktops, big data analytics, customer-facing web applications. It can also be used for personalized treatment of patients, fraud detection for finance companies, provide online games for millions of people/players around the world and more” (Theocharaki, 2021).

The National Institute of Standards and Technology (NIST) defines cloud computing as:

 Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models. (Ruparelia, 2016, p. 4)

As a technology it is still new and constantly evolving, and because of the black-box mystery attached to it, one cannot say there is an exact definition, only an overall concept of what cloud computing is and does. Overall, the cloud is “actually a service or group of services” (Rountree et al., 2014, p. 1), in which a collection of common technologies work together so that data and computation are handled in large, remote, off-site data centers. According to NIST, cloud computing can be distinguished by three main components: key cloud characteristics, cloud deployment models, and cloud service models (Alpaydin, 2017; Rountree et al., 2014). Behind this “modern buzzphrase” of cloud computing “hides a rich tradition of information sharing and distributed computing” (Denning & Martell, 2015), whose vast unknowns behind the border of the box gave it its famous name: “the Cloud”.

History of The Cloud

Figure 2. Project MAC’s, IBM 7094

In the 1950s and 1960s, big companies such as IBM had already figured out a business model for cloud computing through “rented computation time on large mainframe computers”, and researchers such as John McCarthy, a leading artificial intelligence computer scientist at Stanford, investigated the “ideas of computation as a utility function” (De Bruin & Floridi, 2017, p. 24). In the mid-1960s the Massachusetts Institute of Technology (MIT) built Project MAC (an acronym for “multiple-access computer” or “man and computer”), which conceptualized the “idea of building systems that could share computing power among many users” (Denning & Martell, 2015, p. 28). Project MAC led to the invention of Multics, an early operating system that allowed memory, disk, and CPU to be shared among many people, with the incentive of splitting costs and therefore lowering the price each individual paid.

Figure 3. The H6180 Multics at MIT

The supply of computing power would be used as a utility, a commodity anyone could use. Towards the end of the decade, ARPANET (the Advanced Research Projects Agency Network) followed the same essence of utility, resource sharing and wide accessibility: as long as you were connected to the network, you could connect with any host and therefore any service. This soon evolved into what we now know as the TCP/IP protocols, officially adopted and standardized by ARPANET in 1983. TCP/IP allowed message exchange without knowing someone's actual location, only their IP address, and was based on open standards that could be used in open-source software (Denning & Martell, 2015; Irvine, 2021; Nelson, 2009). After the Domain Name System (DNS) was adopted a year later, host names mapped to their own numeric IP addresses, creating even more flexibility between communication and the location of Internet resources (Denning & Martell, 2015).

By the 1990s, when the World Wide Web was taking over, just as cloud computing would gain fame in the early 2000s, the de-blackboxing of such types of computing and the knowledge behind their functionality paved the way for how they would be understood by the general public. The WWW created further transparency and 'manipulation' of information objects across networks and the Internet, especially after the creation of Uniform Resource Locators (URLs) and the Digital Object Identifier (DOI) system, which gave unique identifiers and 'names' to 'things' on the Internet, creating unique digital objects (Nelson, 2009; Denning & Martell, 2015).

The client-server architecture used by most web services in the cloud even today can be attributed to MIT's Multics, which developed the idea of sharing the resources of a mainframe system among multiple users; to Xerox Palo Alto Research Center's “Alto”, a network of independent graphic workstations connected together on an Ethernet; and to another MIT creation, the X Window client-server system, which provided pre-established client-server communication protocols, allowing new service providers to use their own hardware and user interfaces without the extra hassle of designing new protocols (Denning & Martell, 2015, p. 30).

Figure 4. Xerox PARC’s Alto system

With the creation of more and different tech products, such as PCs, tablets, smartphones, and email services, cloud computing gained huge interest as it managed to adapt to and support these expansions. In 2006, Google's then CEO, Eric Schmidt, popularized the term that most people now use, “The Cloud”, which has become a part of pretty much everything we do that is related to technology in one way or another (De Bruin & Floridi, 2017, pp. 23-24).

Architecture & Functionality

Almost everything we do or use in our day-to-day technology is in one way or another part of a process of cloud computing. Our email services, video streaming on Netflix or YouTube, smartphone speech recognition, sharing files on Google Drive, uploading them to Dropbox, sending photos, attending online school on Zoom, or working with ten other people on the same project at the same time on one platform: all of this relies on “the cloud” for daily functioning we now almost take for granted. The perplexity of the systems and processes that make up “the cloud”, and the fact that it encompasses such an interconnected vastness of services, frameworks, and paths, makes it all the more complicated to untangle and understand. Yet precisely because the definition of “the cloud” is so broad, it does not follow that everything on the Internet, or every Web-based application or product, is a cloud application.

The five main characteristics that a service/software/system/platform needs in order to be considered part of cloud computing are: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service (Ruparelia, 2016; Rountree, 2014).

On-demand self-service is the idea that users can access the service on their own, without a “middleman”. Broad network access refers to the accessibility of cloud services that require only a network connection: the better the connection (e.g., a LAN, or a good Internet link), the better and more efficient the services will be; access should also be supported from any type of device (smartphones, tablets, computers, etc.). Resource pooling benefits the provider's side: since customers will not always need all the resources available to them at the same time, resources not in use by one customer can serve another. This saves resources and lets providers serve more customers than they could if all resources were constantly reserved for one user even while idle.

Rapid elasticity is the ability of the cloud to grow to meet the user's demand. Through automation and orchestration, when resources have been used to full capacity, the system automatically seeks more capacity. On the customer's end this looks like unimaginable space in the cloud; in reality, for the providers, more demanded space means more physical resources: compute, hard disks, servers, and so on. The key, however, is that in order to “save on consumption costs” such as power, electricity, and cooling, “even though the resources are available, they are not used until needed”; similarly, once “the usage has subsided, the capacity shrinks as needed to ensure that resources are not wasted” (Rountree, 2014). Measured service is the fifth characteristic required for a platform to be considered cloud computing: providers can meter usage, such as time (how long someone has been using the service) or the amount of data used (how much space it takes up). This is also what determines rates and pricing plans. If you have ever been notified that you are running out of cloud storage on an Apple device, or needed to update your payment plan on Google Drive, and paid money that 'magically' increased your space in “the cloud”, then you have experienced this 'phenomenon', one could even say 'luxury', of measured service and rapid elasticity (Rountree, 2014).
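The interplay of rapid elasticity and measured service described above can be sketched as a toy simulation: capacity grows when demand exceeds it, shrinks when demand subsides, and everything provisioned is metered for billing. The class name, scaling thresholds, and prices are invented for illustration, not drawn from any real provider.

```python
# Toy sketch of rapid elasticity + measured service: capacity follows
# demand up and down, and provisioned units are metered for billing.
# Thresholds and prices are invented for illustration.

class ElasticPool:
    def __init__(self, capacity=2, unit_price=0.10):
        self.capacity = capacity        # provisioned resource units
        self.unit_price = unit_price    # price per unit-hour (measured service)
        self.billed_units = 0

    def handle(self, demand):
        """Scale capacity toward the demand, then meter what is provisioned."""
        if demand > self.capacity:
            self.capacity = demand              # scale out to meet demand
        elif demand < self.capacity // 2:
            self.capacity = max(demand, 1)      # scale in when usage subsides
        self.billed_units += self.capacity      # meter the provisioned units
        return self.capacity

pool = ElasticPool()
for hour_demand in [1, 5, 9, 3, 1]:             # demand per hour
    pool.handle(hour_demand)
print("final capacity:", pool.capacity,
      "bill: $%.2f" % (pool.billed_units * pool.unit_price))
```

The customer only sees the bill growing and shrinking with use; the provider sees the physical servers, power, and cooling behind every unit of "capacity", which is precisely the invisible side of elasticity.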

Cloud Service Models

As previously mentioned, the kinds of services offered by cloud computing are called “cloud service models” and are broadly categorized by the types of services they offer, their target audience, responsibilities and tasks, costs, etc. The three basic service models are Infrastructure as a Service, Platform as a Service, and Software as a Service.

Infrastructure as a Service (IaaS) provides “basic infrastructure services to customers” and the hardware infrastructure, both physical and virtual machines (networking, servers, storage, plants, etc.), on a utility basis. Examples include IT systems monitoring, backup and recovery, and platform and web hosting (Rountree, 2014, p. 7; Ruparelia, 2016, pp. 21 & 131). “Real life” applications include file synchronization with Dropbox, printing with Google Print, hosting on Amazon EC2 or HP Cloud, and storage on Amazon Cloud Drive or Apple's iCloud (Ruparelia, 2016). Platform as a Service (PaaS) “provides an operating system, development platform, and/or a database platform”. This creates the ideal environment for installing and running software and for developing applications, eliminating the need for a company (the client) to build its own infrastructure in order to develop those apps. Other uses include development in languages such as C, C++, and Java, and database services for business intelligence and data warehousing. Software as a Service (SaaS) provides “application and data services”, supplying hosted applications without the need to install or download them, pay extra for them, or give up space on your hard drive. From the application skeleton itself to all the data that comes with it, SaaS means the cloud provider is responsible for maintaining all the platforms and infrastructure the service needs. SaaS is “the original cloud service model […] and remains the most popular”, as it “offers the largest number of provider options” (Rountree, 2014, p. 7). Its use cases include billing and invoicing, asset and content management, image rendering, email and instant messaging, and more. Applications of SaaS include email services such as Gmail, Hotmail, and Outlook; collaborative suites such as Google Docs and Microsoft Office 365; content management such as Box and SpringCM; and customer relationship management with Salesforce (Ruparelia, 2016).

Figure 5. Cloud Service Models diagram by David Choo

In Cloud Computing (2016), Ruparelia and a few others identify further, more specific service offerings in terms of their abstraction levels. Information as a Service (INaaS) and Business Process as a Service (BPaaS) are two of those. INaaS provides business information relevant to the specific customer, whether at the individual, business, or corporate level: market information, price and stock price information, information validation, business processes and tasks, patients' health information, real-time flight information, etc. (Ruparelia, 2016, p. 177; Mosbah et al., 2013, p. 3). BPaaS aids business process outsourcing by carrying out business functions that rely on large amounts of service and data: e-commerce, payroll and printing, tax computation, auditing, health pre-screening, ticketing and billing, etc. Google's AdSense and IBM's Blue are examples (Ruparelia, 2016, p. 193; Mosbah et al., 2013, p. 3).


Cloud Deployment Models


With the wide variety of cloud computing options and services, each individual, business, organization or corporation differs in what they need cloud services for. In order to support the environment in which personal or business use is needed or wanted, a certain kind of cloud environment must be implemented through different deployment models. The four deployment models of the cloud are public, private, community and hybrid.

The public cloud model is the most commonly thought of, as all of its services, systems and operations take place in a hosted external service provider. The infrastructure of the cloud is owned by the cloud service organizations, which are responsible for administering and managing the provided service; it can apply across abstraction levels and is available via the Internet. Some examples of the public cloud model are Google Docs, Microsoft Office 365 and Amazon Cloud Player (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014). In the private cloud model, all the services, systems and resources are provided and located in the individual’s, company’s or organization’s private cloud, with zero access to the public. Private clouds can be accessed through a local area network (LAN), a wide area network (WAN) or a virtual private network (VPN), and are managed, operated and maintained by the individuals in question (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014). The community cloud model is a semi-public cloud, or a “broader version of a private cloud” (Ruparelia, 2016, 32), shared among members of a group or organization that have some sort of shared goals, missions or concerns. It is specific to groups that, perhaps for security and safety reasons, do not want to use the public cloud, and the responsibility of maintenance is shared among the members who have access to it. Examples of its use include a publishing cloud, a health industry cloud or a banking regulation cloud (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014). Finally, the last deployment model is the hybrid cloud.

Figure 6. Representation of cloud variety by Ruparelia et al.

The hybrid cloud entails a combination of two or more of the aforementioned cloud models that are not mixed but linked together to work more efficiently, to achieve their specific goals and operations, and to allow data and application portability. A hybrid cloud can consist of public and private clouds, and the mixing and matching allows its users and customers more flexibility and choice in what they do and how they use their cloud services (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014).

Figure 7. A great depiction of The Relationship between Services, Deployment Models, and Providers by Mosbah et al.


Data Centers, Principle of Scalability and Cloud Computing Emissions

Given the ambiguity around what “the cloud” really is, the notion that it might really just be a cloud, an invisible mass of data, information, systems and software, there is a lot of misunderstanding about its functions, operations and, of course, consequences. However, for the computational and electronic aspects of cloud computing to take place, there needs to be some physical support behind the cloud products and services and the overall system. With the mass production, circulation, consumption and manipulation of data in unquantifiable amounts, technological challenges come into play. Scaling out is a main concern of cloud computing that is getting more and more attention, addressed not only by people in the tech and science fields but also by natural and environmental scientists and even pop culture. Cloud infrastructure relies on commodity equipment, which means that in order to “add capacity, you need to scale out instead of scaling up” (Rountree, 2014, 16). Scaling out can put extra pressure and burden on the data centers and facilities that host the cloud’s infrastructure and bring “increased environment-related costs in resources such as power and cooling” (Rountree, 2014, 16), amongst a variety of other things.

Data centers are physical locations, sites and spaces, the true “home” of cloud computing, where all the countless servers and processors are housed. Data centers are spread out across different areas and cities, remote or otherwise, in the U.S. and all over the world. The various data centers can communicate and collaborate with each other through a network through which “tasks are automatically distributed and migrated from one to the other” (Alpaydin, 2017, 152).

“As cloud computing adoption increases, the energy consumption of the network and of the computing resources that underpin the cloud is growing and causing the emission of enormous quantities of CO2”, explain Gattulli et al. in their research on cloud computing emissions (Gattulli et al., 2014). In the past decade alone, “data center energy usage has decoupled from growth in IT workloads”, with public cloud vendors, also among the biggest (tech) companies in the world, deploying large amounts of new cloud systems and networks and leaving an environmental impact that is often harder to assess, because of the nature of this technology, than other sorts of emissions (Mytton, 2020). “Although the large cloud vendors are amongst the largest purchasers of renewable electricity, customers do not have access to the data they need to complete emissions assessments under the Greenhouse Gas Protocol”, leading scientists and researchers such as Gattulli and Mytton to find new ways and methods to control IT emissions and lessen the environmental impact of our overreliance on this technology. Over the past five or so years, the Information and Communication Technology sector’s carbon emissions alone have amounted to 1.4%–2% of total global greenhouse gas emissions, “approximately 730 million tonnes CO2 equivalent (Mt CO2-eq)” (Ericsson, 2021; Gattulli et al., 2014). Data centers used for the public internet alone consumed 110 TWh in 2015, almost 1% of the world’s electricity consumption (Ericsson, 2021). Often, we do not think of all the daily services and products we use that ultimately rely on the cloud for their functions, such as video streaming platforms, gaming, uses of AI and machine learning, cryptocurrencies, etc.
In 2017 for example, Justin Bieber’s song “Despacito”, “consumed as much electricity as Chad, Guinea‑Bissau, Somalia, Sierra Leone and the Central African Republic put together in a single year” through streams and downloads (five billion) and Bitcoin mining “accounted for 0.2 percent of global electricity usage in mid-2018” (Ericsson, 2021).

Figure 8. Representation of the Carbon footprint of ICT and data traffic development by Ericsson

Figure 9. Distribution of ICT’s carbon footprint in 2015 by Ericsson


The technological evolutions of the past decades have led to the amazing invention of cloud computing. The “explosive growth of data and the need for this data to be securely stored yet accessible anywhere, anytime” has led to higher demand for, and even a need of, cloud computing (Bojanova et al., 2013). Of course, this has created a cycle of data and data services constantly being re-born and re-distributed across the broader network and cloud. The mystery behind what cloud computing and “the cloud” are doesn’t necessarily help with understanding and conceptualizing the physical and material aspect of this technology. This further hides the implications of disregarding the fact that cloud computing isn’t so much in “the cloud” as in physical locations on earth, which keep getting larger and more numerous with the exponential increase in demand for cloud computing services. As it happens, the data centers that hold and are the backbone of cloud computing, as well as all the other external ‘expenditures’ such as electricity, maintenance, etc., have much heavier implications for the environment than we assume from a conceptually intangible technological advancement. Recent research and environmental analyses support the idea that low-carbon cloud computing solutions, renewable energy sources, and access to data about cloud computing emissions and power usage effectiveness can increase awareness and understanding of what is going on behind the scenes of this technology that we truly hold so dear to us (Mytton, 2020; Gattulli et al., 2014; Ericsson, 2021).



Alpaydin, Ethem. (2016). Machine Learning: The New AI. Cambridge, MA: The MIT Press.

Amazon’s Amazon Web Services 

Bojanova, I., Zhang, J., and Voas, J. (2013).  “Cloud Computing,” in IT Professional, vol. 15, no. 2, pp. 12-14, doi: 10.1109/MITP.2013.26.

De Bruin, Boudewijn and Floridi, Luciano. (2017). The Ethics of Cloud Computing. Science and Engineering Ethics, vol. 23, no. 1 (February 1, 2017): 21–39.

Denning, Peter J.  and Martell, Craig H.. (2015). Great Principles of Computing. Cambridge, MA: The MIT Press. 

Ericsson. (2021). ICT and the Climate. Ericsson.

Gattulli, M., Tornatore, M., Fiandra, R., and Pattavina, A. (2014). “Low-Emissions Routing for Cloud Computing in IP-over-WDM Networks with Data Centers,” in IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 28-38, doi: 10.1109/JSAC.2014.140104.

Irvine, M. (2021) What is Cloud Computing? AI/ML Applications Now Part of Cloud Services. Class notes:

Mosbah, Mohamed Magdy, Soliman, Hany and El-Nasr, Mohamad Abou. (2013). Current Services in Cloud Computing: A Survey. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 3, No. 5. DOI: 10.5121/ijcseit.2013.3501

Mytton, D. (2020). Assessing the suitability of the greenhouse gas protocol for calculation of emissions from public cloud computing workloads. Journal of Cloud Computing, 9(1) doi:

Nelson, M. (2009). Building an Open Cloud. Science, 324(5935) from

Ruparelia, Nayan B. (2016). Cloud Computing. Cambridge, MA: MIT Press.

Rountree, Derrick and Castrillo, Ileana. (2014). The Basics of Cloud Computing: Understanding the Fundamentals of Cloud Computing in Theory and Practice. Amsterdam; Boston: Syngress / Elsevier.


Theocharaki, D. (2021). Cloud Monopoly. Class notes:


Figure 1: GIF from Giphy 

Figure 2: Photo of Project MAC’s, IBM 7094 from Multicians

Figure 3: Photo of H6180 Multics at MIT from 

Figure 4: Photo of Xerox PARC’s Alto system from Wired article “The 1970s Conference That Predicted the Future of Work” by Leslie Berlin 

Figure 5: Photo of Cloud Service Models diagram by David Choo

Figure 6: Screenshot from Ruparelia, Nayan B. (2016). Cloud Computing. Cambridge, MA: MIT Press, 2016. 

Figure 7: The Relationship between Services, Deployment Models, and Providers by Mosbah, Mohamed Magdy, Soliman, Hany and El-Nasr Mohamad Abou. (2013). Current Services in Cloud Computing: A Survey. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.3,No.5 DOI : 10.5121/ijcseit.2013.3501

Figure 8: Representation of the Carbon footprint of ICT and data traffic development from Ericsson. (2021). ICT and the Climate. Ericson.

Figure 9: Distribution of ICT’s carbon footprint in 2015 from Ericsson.






Tracking and Sharing: A Method to Improve Recommendation Accuracy?

Fudong Chen


This article attempts to answer the following question: whether and how a recommendation system can recommend topic-related content that has never appeared in the system. To answer it, the article gives a brief description of the recommendation system and concludes that without relevant data, the system cannot recommend relevant content. The article then focuses on data external to the app and deblackboxes the digital fingerprint to show that it is possible to improve the recommendation system by tracking users and sharing data. Finally, the article discusses data privacy and expresses some concerns.


The recommendation systems of various apps, based on machine learning and algorithms, bring us a lot of convenience. Shopping apps recommend products we need, video apps recommend videos that attract us, and search engines guess what we want to search for. The prediction or inference of our needs through machine learning and various algorithms is actually well understood, since it is based on the behavioral data we create ourselves. For example, if I buy a science fiction novel, the shopping app recommends other science fiction novels; likewise, if I click on a cat-related video, the app will recommend more cat videos. Admittedly, different recommendation system algorithms use different recommendation strategies, but most of the strategies are explainable and understandable from a human perspective. But in daily life we may meet the following situation: we discuss a topic with friends (maybe on other apps, or even in person), and the topic has never been discussed or searched for on a given app, but after a while, topic-related advertisements or videos are recommended on that app. This coincidence naturally makes us question: can the recommendation system recommend topic-related content that has never appeared in the system, or is our mobile app monitoring us all the time and extracting keywords for recommendation? This article will try to answer this question. First, it explains the composition and data sources of recommendation systems in general. Then, starting from data sources, it explains how large Internet companies build user portraits in multiple dimensions and deblackboxes the methods of tracking users. Finally, it discusses the impact of mobile phone fingerprints (digital fingerprints) on data privacy.

How does the recommendation system work?

Before discussing whether the recommendation system can make recommendations as accurate as monitoring, we need to briefly describe how it works. Simply put, a recommendation system is divided into three aspects: data, algorithm, and architecture. The data provides information and is the input of the recommendation system. It contains user and content attribute information as well as user behavior and preference information, such as clicking on a certain type of video, purchasing a certain type of goods, etc. The algorithm provides the logic for processing the data, that is, how to process the data to get the desired output. Take the most commonly used algorithm in recommendation systems, collaborative filtering, as an example. Collaborative filtering is based on an assumption: if A and B have similar historical annotation patterns or behavioral habits for some content, then they will have similar interests in content. It generally uses a nearest-neighbor algorithm to calculate the distance between users based on their historical preference information, and then uses the weighted product reviews of the users who are the nearest neighbors to predict the target user’s preference for a specific product. The system recommends products or content to target users based on the result. The architecture specifies how data flows and is processed: the process by which data travels from the client to the storage unit (database) and back to the client.

In other words, the recommendation system categorizes raw data to form user portraits, attaches model tags or labels (i.e., patterns) to each user, and then recommends content based on various algorithms, such as the collaborative filtering just mentioned.
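To make the nearest-neighbor idea concrete, the sketch below (a toy example with made-up users, items and ratings, not any production system) predicts a target user's interest in an item as the similarity-weighted average of other users' ratings:

```javascript
// Toy user-item rating matrix (rows: users, columns: items).
const ratings = {
  alice: { scifi: 5, romance: 1, cats: 4 },
  bob:   { scifi: 4, romance: 2, cats: 5 },
  carol: { scifi: 1, romance: 5 },
};

// Cosine similarity over the items two users have both rated.
function similarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const item of Object.keys(a)) {
    if (item in b) {
      dot += a[item] * b[item];
      na += a[item] ** 2;
      nb += b[item] ** 2;
    }
  }
  return dot === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Predict `user`'s rating for `item` as the similarity-weighted
// average of the ratings of users who have rated that item.
function predict(user, item) {
  let weighted = 0, total = 0;
  for (const [other, prefs] of Object.entries(ratings)) {
    if (other === user || !(item in prefs)) continue;
    const sim = similarity(ratings[user], prefs);
    weighted += sim * prefs[item];
    total += sim;
  }
  return total === 0 ? null : weighted / total;
}
```

Here `predict('carol', 'cats')` leans on alice and bob, the only users who rated that item, weighting each neighbor's rating by how similar their taste is to carol's. Real systems add many refinements (mean-centering, neighborhood cutoffs), but the core logic is this weighted vote.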

Figure 1, data processing

As Figure 1 above shows, the original data covers four aspects:

User data refers to the user’s personal information, such as gender, age, registration time, mobile phone model, etc.

Content data refers to the content provided by the app. For example, the content data of shopping apps such as Taobao and Amazon relates to products and product reviews, while the content data of video apps such as TikTok and Netflix relates to videos and video reviews.

User behavior logs refer to what the user did on the app, such as what videos they searched for, what videos they shared, or what product they purchased.

External data is data provided by other apps. A single app can only collect data on a certain aspect of the user’s preferences. For example, a video app can only describe what type of content the user prefers in the video field. But if we integrate data from other, different types of apps, the user’s data dimensions will be greatly enriched.

The fact labels are derived by cleaning the original data, and include dynamic and static portraits:

A static portrait refers to attributes of the user that are independent of the product scene, such as age and gender. Such information is relatively stable.

A dynamic portrait refers to the user’s behavioral data on the app. Explicit behaviors (preferences clearly expressed by the user) include likes, shares, etc. It is worth mentioning that for comments, NLP is needed to determine whether the user’s sentiment is positive, negative or neutral. Implicit behaviors (where the user does not clearly express a preference) include the time the user spends watching a video, clicks, etc.

Model labels are obtained from fact labels through weighted calculation and cluster analysis: each dimension is given a weight, a score is calculated, and users are then classified into groups (cluster analysis) based on the results.
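A minimal sketch of the weighting step, with entirely hypothetical weights, signals and label names (real systems learn these rather than hard-coding them):

```javascript
// Hypothetical weights for explicit and implicit fact-label signals.
const weights = { like: 3, share: 5, watchSeconds: 0.01 };

// Weighted score across the user's behavioral dimensions.
function interestScore(events) {
  return events.like * weights.like +
         events.share * weights.share +
         events.watchSeconds * weights.watchSeconds;
}

// Threshold the score into a coarse model label (a stand-in for
// the cluster assignment a real pipeline would compute).
function modelLabel(events) {
  const s = interestScore(events);
  if (s >= 20) return 'cat-video-fan';
  if (s >= 5) return 'casual-viewer';
  return 'uninterested';
}
```

A user with 4 likes, 2 shares and 300 seconds of watch time scores 4×3 + 2×5 + 300×0.01 = 25 and lands in the highest-interest label.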

In short, the recommendation system processes the data layer by layer using various models and algorithms, and then returns the corresponding recommendation results. But in any case, the recommendation system cannot produce recommendations out of nothing. It needs various input data, processes that data according to algorithms designed by humans, and returns results according to certain logic. Therefore, for a single app, if we have not discussed a topic on the app (that is, there is no corresponding data for the recommendation system), it is reasonable that the app will not return recommendations related to that topic.

However, as Figure 1 shows, the data source is not limited to the app itself. If there is corresponding external data, the recommendation system has the ability to recommend the content corresponding to that external data. In fact, technically speaking, large Internet companies such as Google, Alibaba and ByteDance usually have multiple apps in different fields, which can share user data and expand the dimensions of user portraits through user account information and digital fingerprints. Take Alibaba as an example. Ali’s apps include maps, health, payment, a video platform and even Weibo, a social platform, so Ali’s portraits of Chinese users can cover many dimensions. It is worth mentioning that for different apps with common accounts, it is straightforward to match the account against the database. However, some Ali-owned apps, such as AutoNavi Maps, do not require users to log in to their accounts. Does Ali have a way to track these users? The answer is yes. For users who use an app without logging in to a personal account, the app can identify or track them by the fingerprint of the smartphone.

How to track users?

Existing tracking mechanisms are usually based on either tagging or fingerprinting (Klein & Pinkas, n.d.). “Tracking” here is used in the same sense as “recognize” or “identify” above.

The typical tagging method is cookies. Cookies are data stored on the user’s local terminal: a small piece of text information sent by the server to the client browser and stored locally on the client as a basis for the server to identify the user’s identity status. Their main use is to remember helpful things like your account login info, or what items were in your online shopping cart (Cover Your Tracks, n.d.). But now, whether on a PC browser or a mobile phone, many users choose to delete or block cookies, which makes identifying users via cookies much less effective.

The typical fingerprinting technology is browser fingerprinting, a concept proposed by Eckersley in 2010 (Eckersley, n.d.). When a user uses a browser to access a server, the server obtains browser feature identifiers, canvas feature values, some hardware information and system information, and generates a unique string for the user’s browser through a specific fingerprint-generation algorithm. The accuracy of user identification based on browser fingerprints depends on the identification ability of the fingerprint, which in turn depends on its degree of uncertainty: the higher the uncertainty, the higher its uniqueness, and the stronger the identification ability. For example, whether a user shares cookies is one measurement of a fingerprint. Some people are willing to share and others are not, so if we know this measurement, we can narrow down which group the user belongs to; and the more measurements we have, the more likely users are to be identified. By source of acquisition, measurements are divided into HTTP headers and JavaScript. For HTTP headers: when connecting to a server, browsers send the user agent, the desired language for a webpage, and the type of encoding supported by the browser, among other headers. JavaScript is a programming language used to develop web pages, and the server can obtain device information through JavaScript commands: for example, the user agent through navigator.userAgent, and the time zone through Intl.DateTimeFormat().resolvedOptions().timeZone. The following figure shows my fingerprint information on the website AmIUnique:
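A minimal sketch of what such a collection script might gather. The `navigator` and `screen` objects exist only inside a real browser, so here they are passed in as parameters; the field names are illustrative, not any specific vendor's script:

```javascript
// Collect a few of the JavaScript-accessible measurements the text
// describes into one fingerprint record.
function collectFingerprint(nav, scr) {
  return {
    // Also sent as an HTTP header, but readable from JavaScript too:
    userAgent: nav.userAgent,
    language: nav.language,
    // Time zone, as in the Intl example above:
    timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    // Screen resolution from the Screen object's width/height:
    screen: `${scr.width}x${scr.height}`,
  };
}
```

In a browser this would be called as `collectFingerprint(navigator, screen)` and the resulting record posted back to the server; each extra field makes the combination more likely to be unique.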

Figure 2, some measurements of fingerprints, source:

All the measurements in Figure 2 aim to pin down the uniqueness of the user. Worth mentioning are the Canvas and WebGL measurements: when drawing a 2D or 3D picture on different operating systems and different browsers, including PC and mobile phone, the generated image content is actually not exactly the same, even if it looks the same to our eyes. So by extracting the picture information from Canvas and WebGL, we can uniquely identify and track the user.

Deblackbox the digital fingerprints

Above, we discussed the measurements of browser fingerprints. In fact, the digital fingerprints of mobile phones and browser fingerprints have many similarities, especially those related to JavaScript. Although different algorithms use different measurements to track mobile phones, these digital fingerprint models all follow a generic methodology, shown below:

Figure 3, Generic methodology of digital fingerprints, source: (Baldini & Steri, 2017)

Meanwhile, we can also deblackbox the digital fingerprint by following the browser’s fingerprint recognition process.

Figure 4, Browser fingerprint recognition process

Looking at the two figures together, digital fingerprint recognition involves three entities: the mobile phone on the client side (corresponding to the browser), the apps on the server side (corresponding to the website), and the database (SQL). In fact, for mobile phone fingerprints, in addition to the above-mentioned measurements similar to browser fingerprints, such as device information and user configuration, there are also many measurements of the phone’s components (hardware). But all the data needs to be digitized before proceeding to the next step; therefore, apps usually use digital information that can be directly obtained for identification.

When the user enters the app, the identification process begins. After the user accesses it, the app sends files such as HTML, CSS and JavaScript to the client, and usually the fingerprint-collection script is sent to the user along with them. The fingerprint-collection script is defined by the app developer. Simple features can be obtained directly through an API: for example, the user agent through the userAgent property of the navigator object, and the screen resolution through the width and height properties of the Screen object. The client (here, the phone) then sends the fingerprint’s digital information to the app according to the script’s commands. Note that because JavaScript and HTML do not require permission to run, users cannot perceive this process. The digital information is then sent to the database and matched by instance-based algorithms and machine learning algorithms. Instance-based algorithms are often used for static fingerprints: the collected fingerprint feature values are converted into string form and concatenated, and the concatenated string is transformed into a fixed-length number through a hash algorithm. If the number matches one of the instances in the database, the user is identified. However, because feature values change frequently, static fingerprints can track a user only for a short time. Most of the time, companies use dynamic fingerprints and match them with deep learning. In simple terms, dynamic fingerprinting compares each feature value of the fingerprint and sets a threshold: when the similarity between the fingerprint to be matched and a fingerprint in the database reaches the threshold, the two are confirmed to match; otherwise the fingerprint to be matched is inserted into the database as a new one. There are many methods to set the threshold, such as statistical analysis methods, distance algorithms, random forest algorithms, LSTM algorithms and so on.

Back to the original question. When people’s user portraits are enriched, the portraits include not only behavioral data but also interpersonal relationship data and data about the relationships between your devices (PC, phone and so on) and accounts. For example, if you shared a shopping link with a friend a long time ago, your user portrait and your friend’s will be considered related. So when you discuss a topic with your friend, your friend may have left data on the topic online, and the recommendation system can draw on the relationship between you and your friend, as well as other data such as location or coexisting on the same local area network. It is then reasonable that after discussing the topic, the recommendation system recommends the relevant content to your friend and also to you at the same time.

Discussion of data privacy and sharing personal data

Whether under the opt-out privacy policies of the United States or the principle of informed consent represented by Europe, I think the key to data privacy lies in being informed and having a choice. Take cookies, which record user data: sharing cookies lets users enjoy convenience on a website and get a better experience, while not sharing cookies does not prevent them from using the site’s main services. More importantly, the user has the right to choose whether to share cookies or not; even if the website does not provide an opt-out, users can manually disable cookies through browser settings. But the emergence of digital fingerprints breaks this principle of being informed and having a choice. Now, whether on a website or an app, on a PC or a mobile phone, companies can collect digital fingerprint information to identify users without the user’s perception. Secondly, regarding data sharing, apps usually present a privacy policy statement before use. No matter the type of app, data sharing will be mentioned: consenting to the privacy statement means allowing the company to share the data within the company and its affiliates. If the user rejects the privacy policy, he will not be able to use the services of the app at all. This effectively removes the user’s choice. Additionally, some companies let users turn off ad personalization themselves; apps from Google and Ali all have this option. But this option does not guarantee that these companies will not collect your data. For example, the Taobao app clearly states that service logs, device-related information, and device location information collected when you use the app will all be used for personalized recommendations, and that you can make independent decisions on recommended content by turning off personalized recommendations (in my view, this turns off the recommendations rather than refusing to have the information collected).
Take another example: Google’s privacy policy update of June 2015 indicates that they use “technologies to identify your browser or device” (Privacy Policy – Privacy & Terms – Google, n.d.). In fact, according to an interview with ByteDance employees, the above-mentioned information is classified as level-2 information, which means we cannot find a specific person in reality from this kind of information alone. It contains information like consumer behavior, geographic location and browsing history, and it can point to a specific account, though not directly to the owner of the account. After special approval, this kind of information can be shared with related companies or different departments in the same company. In other words, the data we generate in an app, and the user portraits generated from it, may be used and analyzed by other apps of the same company. In addition, the combination of data sharing and user-tracking techniques also makes the app’s permission-acquisition policy useless. For example, even if I forbid a shopping app from obtaining my current location, it can still get the desired data through a map app. Likewise, the content I post on social media can be learned and analyzed by other apps, even if I do not log in to those apps with a social media account. In fact, when a user logs in to TikTok for the first time (also called a cold start), the user may be recommended the accounts of classmates or friends they know in reality before they generate their first bit of behavioral data in TikTok. This is brought about by tracking techniques and data sharing. In addition, although apps’ permission acquisition is transparent, and sensitive permissions must be confirmed by the user each time they are used (for example, acquiring microphone permission requires asking for user consent, and a second confirmation is required when the microphone is actually used, which is also why I think it is currently impossible to use the mobile phone to monitor keywords for advertising recommendations), some mobile phone components that are considered not sensitive and do not require permission may also be used to violate privacy. According to Zheng, there is technology to eavesdrop on part of the voice information from a mobile phone’s speaker through the accelerometer, a motion sensor of the phone (Zheng et al., n.d.).


In this article, we asked a question based on a daily phenomenon: whether a mobile app has the ability to make recommendations as accurate as monitoring would allow. First, we introduced the basic composition and operation of the recommendation system, and concluded that it cannot produce recommendations out of nothing: it needs various input data and processes that data according to algorithms designed by humans, so the result must be related to the input data. From the perspective of data sources, we deblackboxed the process of digital fingerprinting and argued that data sharing between apps in different fields, together with user-tracking techniques, can enrich user portraits and enable accurate recommendations. Finally, the article expressed concerns about the impact of digital fingerprints on data privacy, and holds that data privacy in the mobile phone field needs more research and corresponding restrictive measures.


Baldini, G., & Steri, G. (2017). A Survey of Techniques for the Identification of Mobile Phones Using the Physical Fingerprints of the Built-In Components. 19(3), 29.

Eckersley, P. (n.d.). How Unique Is Your Web Browser? 19.

Klein, A., & Pinkas, B. (n.d.). DNS Cache-Based User Tracking. 15.

Laperdrix, P., Rudametkin, W., & Baudry, B. (n.d.). Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints. 18.

Privacy Policy – Privacy & Terms – Google. (n.d.). Retrieved May 13, 2021, from

Zheng, T., Zhang, X., Qin, Z., Li, B., Liu, X., & Ren, K. (n.d.). Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer. 18.

Cover Your Tracks. (n.d.). Retrieved May 13, 2021, from

Anand, S. A., & Saxena, N. (n.d.). Speechless: Analyzing the Threat to Speech Privacy from Smartphone Motion Sensors. 18.

FP-STALKER: Tracking Browser Fingerprint Evolutions. (n.d.). 14.

Das, A., Borisov, N., & Chou, E. (n.d.). Every Move You Make: Exploring Practical Issues in Smartphone Motion Sensor Fingerprinting and Countermeasures. 21.

Hauk, C. (2021, January 14). Browser Fingerprinting: What Is It and What Should You Do About It? Pixel Privacy.

A Survey of Data, Algorithms, and Machine Learning and their Role in Bias

Matthew Leitao


Algorithms and machine learning models continue to proliferate through the many contexts of our lives. This makes understanding how these models work, and why they can be biased, critically important to navigating the current information age. In this paper, I explain how data is created, how models are run, and when and where biases are introduced into a model. Though this is a survey of the types of machine learning models out there, I follow the running example of a machine learning model that evaluates resumes: how it handles the data, and how difficult it can be at times to arrive at an objective outcome.


Algorithms are everywhere: in our phones, in our homes, and in the systems that run our everyday lives. Algorithms are the reason why our digital maps know the best routes, and how Amazon always seems to know what other items we might like. These algorithms can make life easier for many people around the world, but what is usually thought of as "algorithms" is really the partnership between algorithms and machine learning. An algorithm is a formula for making decisions and determinations about something, whereas machine learning is a technique we use to create these algorithms (Denning & Martell, 2015). These machine learning models can be complex, containing hundreds of features and millions of rows, and come to results that are highly consistent and accurate. They also have a wide variety of uses, from determining insurance premiums (O’Neil, 2016a), writing articles (Marr, 2020), and improving the administration of medicine (Ackerman, 2021; Ngiam & Khor, 2019), to setting bail (Kleinberg, Lakkaraju, Leskovec, Ludwig, & Mullainathan, 2018). Though there are many benefits to using algorithms and machine learning, there are times when they cause more harm than good, such as when algorithms give men higher credit limits than women (Condliffe, 2019), bias hiring and recruitment practices (Hanrahan, 2020), and fail to identify black faces (Simonite, 2019). This matters because, as these systems continue to expand into new fields, individuals rely on the judgments of these algorithms to make important decisions (sometimes even more than they rely on the judgment of other people) (Dietvorst, Simmons, & Massey, 2015; Logg, Minson, & Moore, 2019). These important decisions made by an algorithm can become biased, because the algorithm does not eliminate systemic bias but instead multiplies it (O’Neil, 2016b).
To understand why some models work better than others, and where systemic bias comes from, I will de-blackbox data, algorithms, and machine learning models, using a potential resume sorter as an example.

The Data

We have come to live in a period of history that some refer to as the information age, as data becomes one of the most valuable commodities to have and own (Birkinshaw, 2014). Data alone is not useful, but data in context and in relationship to other information is why companies spend millions of dollars a year harvesting information from as many sources as possible. A clear example is how labeling data can alter our perception of the magnitude of a specific number. Take the number 14: if I label it as 14 days versus 14 years, 14 days seems negligible compared to 14 years. But if I then add another piece of information, such as 14 days as President versus 14 years in an entry-level position, the 14 days carries more weight than the 14 years. This is how data works: by quantifying existing information in ways that let us analyze the differences in the number 14 across the various contexts in which it exists.

The first part of the data process is quantifying the phenomenon of interest. Using the example of a job application, one of the properties that needs to be quantified properly is years of experience. Just as in the previous example of number magnitudes, not all experience is weighed the same, so different features, or variables, need to be created to differentiate these types of experience. This could be done by categorizing prior work as entry-level or managerial, or by its degree of relevance to the position being offered. As with all translations, something is lost when changing from one language to another. How would one categorize freelance experience or unpaid work? These examples highlight how even capturing what appears to be the correct objective information may, through the process of quantifying, take highly complex information and flatten thoughts, expertise, and experiences, ultimately biasing the outcome (O’Neil, 2016b). This standardizes information into a format that is understandable to a computer but may not accurately represent the reality from which the information is derived. This is why companies and researchers attempt to collect many different types of information: it gives a well-rounded context to the data and allows for a fuller picture. A good example of these types of profiles is checking what your Google and Facebook ad profiles have to say about you (Haselton, 2017; Holmes, 2020).

Figure 1. Picture taken from Holmes (2020).
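The quantification step described above can be sketched in code. Everything here, the categories, weights, and feature names, is hypothetical, meant only to show how a rich work history gets flattened into a few numbers:

```python
# Assumed multipliers for how much each category of experience "counts".
EXPERIENCE_WEIGHTS = {"entry": 1.0, "managerial": 2.5}

def encode_experience(jobs):
    """Flatten a work history into two numeric features.

    Freelance or unpaid work has no obvious category here -- exactly the
    kind of information the quantification step can silently lose.
    """
    weighted_years = sum(
        job["years"] * EXPERIENCE_WEIGHTS.get(job["level"], 1.0) for job in jobs
    )
    relevance = max((job["relevance"] for job in jobs), default=0.0)
    return {"weighted_years": weighted_years, "relevance": relevance}

applicant = [
    {"years": 3, "level": "entry", "relevance": 0.6},
    {"years": 2, "level": "managerial", "relevance": 0.9},
]
features = encode_experience(applicant)
# -> {'weighted_years': 8.0, 'relevance': 0.9}
```

Note that any job whose level falls outside the two assumed categories silently defaults to the entry-level weight, which is one concrete way flattening biases the outcome.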

The information a company has on an individual may not be explicitly given but rather inferred from other pieces of information, especially when the target data is unavailable. This can be done by making two assumptions: first, that information does not exist in isolation; second, that relationships between variables occur in a systematic way. Both of these points will be addressed more specifically in the next section about modeling. This is to say that the more information received, the better the inferences we can make about certain qualities of the person. Going back to the example of a job application: if someone reports working for a specific company, say Netflix, then knowing other employees who have come from Netflix allows us to make inferences about the applicant's work ethic. People do this all the time when taking suggestions from their friends on items to buy or places to eat. Though these inferences may be faulty, especially considering that people's tastes differ, by collecting more information people can make better judgments based on what is available.

There are, however, major issues when it comes to data and data relationships.
First, the "garbage in, garbage out" problem. Data is only as good as the quality of the information being put into it. This issue shows up most directly when the data being captured does not reflect the truth, or when the question being asked does not accurately measure the construct it is thought to measure (O’Neil, 2016b). In the example of the job application, if someone is asked what experience they have coding in Python, their answer may be only two years, but their true understanding of coding may come from 11 years working with JavaScript, C++, and SQL. A question aimed directly at one expertise may gloss over a more fundamental understanding of coding in general.

Second, previous biases may be reflected in the data, which the data itself does not account for. This has become extremely salient in the past few years, with the rise of the Black Lives Matter movement bringing to light systemic issues in understanding outcomes in relation to race. An example from a paper by Obermeyer and colleagues (2019) showed that black patients are chronically sicker than their white counterparts who were assigned the same health risk score. Because black patients generally spend less money on medicine, and because health care costs were used as the measurement of level of sickness, black patients were rated as healthier. This does not reflect the truth about the severity of the illnesses black and white individuals face, but rather cultural differences in seeking standard health care. It is important, when collecting data, that the data represents what you believe it represents and that a more holistic picture is understood, especially before embarking on creating your model.

Figure 2. Taken from Obermeyer et al., 2019


“All models are wrong, but some are useful” – George Box

Modeling is how we turn data into algorithms. Each piece of data gives us an important set of information in context, but it is how these data interact that allows us to make predictions and inferences, and to act on them. It is important to note that models are initially agnostic to what the data is or represents; the chief concern of a model is the potential relationship the data may have with a specified result. Modeling assumes that data varies in a systematic way, meaning there is a discernible pattern that can be used to predict certain outcomes. This assumption means that data cannot occur in isolation and that there are relationships between phenomena that can explain why something is the way it is. The distinction between these two concerns leads to the initial distinction between the types of models used: predictive versus inferential.

Prediction Versus Inference

Predictive models care about one thing: being accurate. This may mean that a model finds a connection between applicants with the letter Z in their name and potential hire-ability. Though this may seem a silly example, it correctly illustrates the point that these models worry only about the outcome and about maximizing its predictability. There are many benefits to this, as people may not mind how a computer generates an outcome for a cancer screening, only that the screening is accurate.

Inference, on the other hand, concentrates on the relationships between variables. An example would be how much five years of field experience matters compared to a college degree in the subject. This type of modeling attempts to discern meaningful connections using semantically meaningful data. It is more useful when attempting to find the cause of a particular outcome and to understand how one thing relates to another.

Most modeling you find in business is predictive, whereas in academia and policy inferential models are much more important. The type of model you decide on will ultimately impact the outcome you arrive at.

Types of Modeling

Modeling is the attempt to create an algorithm that can accurately predict a certain phenomenon. Each model is then judged on whether it can discern outcomes better than chance, and various methods are used to achieve this. Most modeling in computer science involves taking one part of the data to build the model and another part of the data to test it. Modeling can be broken down into two broad categories: classification and regression.

Classification, also referred to as discriminant analysis, attempts to create boundaries around data in an effort to correctly sort it into predefined categories (Denning & Martell, 2015). Most of these techniques are also categorized as non-parametric, meaning that the model does not make assumptions about the structure of the data before attempting to sort it into groups. There are several classification techniques, but one of the most easily understood and widely used is the decision tree. Decision trees are essentially a sequence of logical statements attempting to create rules around a certain phenomenon: does the oven feel hot? If yes, then the oven might be on. Though models get more complex than this, and the number of rules may increase, the goal stays the same: to sort data into categories as accurately as possible using these rules. The program attempts to create a model that increases accuracy while reducing error as much as possible. This may be more or less possible depending on the data, and the divisions created may not be meaningful in any way.

Figure 3. An example of a decision tree for a Job Recruiter
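A learned decision tree ultimately reduces to nested if/else rules like those a recruiter might follow. A hand-written sketch, with entirely hypothetical thresholds and field names, might look like:

```python
def recruiter_tree(applicant: dict) -> str:
    """A hand-written decision tree for sorting job applicants.

    Real trees are learned from data, but every learned tree reduces to
    nested rules like these; the thresholds here are invented.
    """
    if applicant["years_experience"] >= 5:
        # Experienced applicants: the degree decides the next step.
        if applicant["has_degree"]:
            return "interview"
        return "phone screen"
    # Less experienced applicants only advance via a referral.
    if applicant["referral"]:
        return "phone screen"
    return "reject"

decision = recruiter_tree(
    {"years_experience": 6, "has_degree": True, "referral": False}
)
# -> "interview"
```

The danger the text describes is visible even in this toy: an applicant with unconventional but valuable experience that the features fail to capture falls straight through to "reject".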

The other type of modeling is regression modeling, or linear modeling. This type of modeling makes assumptions about the data's structure, which is why it is referred to as parametric modeling (Alpaydin, 2016). These assumptions state that the data should follow a normal distribution, and that the data varies in a systematic, linear way.

Figure 4. Standard Normal Bell Curve taken from

Though there are whole courses devoted to regression, what regression attempts to do is use the variation within one variable to explain some of the variation in another. How much of the increased desire for ice cream can I explain by the increase in summer temperatures? In the example of the job applicant, greater years of experience may make a more capable candidate. The issue with this, obviously, is that it relies on the data being linear, presupposing that a candidate with 20 years of experience is always better than a candidate with 10. There are ways around this assumption, but most modeling done with these techniques does not account for them. In regression, technically, the more items used the better we are able to predict the outcome, though this does not mean each variable contributes a significant amount. Different weights are placed on the different variables that make up these regression formulas, indicating that certain variables contribute more to the result than others. A statistical weight is a number that you multiply with an observed variable (e.g., years of experience) in an attempt to create a formula; each weight then represents how much a certain value contributes to the predicted outcome (e.g., hire-ability). There are a couple of ways to do regression, using a frequentist approach or a Bayesian approach, but regardless, what you are attempting to do is explain the variation in the target variable.

Figure 5. Taken from
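The weighted-sum idea behind a regression formula can be written out directly. The weights, intercept, and variable names below are invented for illustration, not taken from any real model:

```python
# Hypothetical learned weights: each says how much one observed variable
# contributes to the predicted outcome.
WEIGHTS = {"years_experience": 0.4, "degree_level": 1.2, "referral": 0.8}
INTERCEPT = 0.5  # baseline score when all variables are zero

def hireability_score(candidate: dict) -> float:
    """Regression-style prediction: intercept plus weight * variable,
    summed over every variable in the formula."""
    return INTERCEPT + sum(WEIGHTS[k] * candidate.get(k, 0.0) for k in WEIGHTS)

score = hireability_score(
    {"years_experience": 10, "degree_level": 2, "referral": 1}
)
# 0.5 + 0.4*10 + 1.2*2 + 0.8*1 = 7.7
```

Fitting a regression means finding the weight values that best explain the variation in the target; the formula's linearity is exactly the assumption discussed above, since ten extra years of experience always add the same amount to the score.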

The last type of modeling I want to discuss is the Neural Network. These are a bit complicated, so I have included a video explaining them in greater detail. To summarize, neural networks are a series of interconnected nodes that adjust the statistical weights of the connecting variables/nodes in an attempt to find the best possible configuration for predicting the outcome. These statistical weights start out arbitrary but adjust through iterations to create the best model possible. This type of model is being utilized to create complex networks and formulas to predict things such as heart disease (Reuter, 2021). The unfortunate part of Neural Networks is their hidden layers: nodes whose processing occurs behind the scenes, which makes Neural Networks difficult to interpret beyond the predicted outcome.

Figure 6. A simple Neural Network taken from
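A minimal forward pass through a network with one hidden layer, written from scratch, can make this concrete. The weights are random numbers standing in for the arbitrary starting values mentioned above; training would adjust them over many iterations:

```python
import math
import random

def sigmoid(x: float) -> float:
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass through a network with a single hidden layer.

    The hidden activations are the 'hidden layer' described in the text:
    they carry no human-readable meaning, which is what makes the model
    hard to interpret beyond its final prediction.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_weights
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

random.seed(0)
# Arbitrary starting weights: 3 inputs -> 4 hidden nodes -> 1 output.
hidden_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
output_w = [random.uniform(-1, 1) for _ in range(4)]
prediction = forward([0.2, 0.7, 0.1], hidden_w, output_w)  # a value in (0, 1)
```

Training would repeat this pass millions of times, nudging every weight to reduce the gap between `prediction` and the known outcome; the final weights explain nothing on their own, which is the interpretability problem in miniature.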

All of these different types of modeling are ultimately tools to understand the relationships within the data. This brings us back to the concept of “Garbage in, garbage out”. The most important part of the model is the information that is being used to create it, without good information, we can’t get a useful model.


There are many different techniques for building useful algorithms with machine learning. As previously stated, the data we feed into these models impacts the eventual outcome. This is why it is so important to understand what we are attempting to predict and to control for it. Coming back again to the resume sorter explains why getting the data right is so important. In 2016 it was found that applicants with ethnic-sounding names were not called back at the same rate as those who had whitened their names, even with equivalent resumes (Kang, DeCelles, Tilcsik, & Jun, 2016). When creating these algorithms with machine learning, the data we provide the model determines the result we ultimately achieve. If certain types of people or certain types of experience are not accounted for in the data, or are misrepresented, then those biases are amplified through the process of machine learning. If an applicant, for instance, takes an opportunity that is hard to define, that opportunity may become disadvantageous when looking for a job in the future. If an applicant does not conform to the standard array of qualities, that applicant may be rejected for being different rather than for not being right for the position. Though all models may be wrong, models used incorrectly may cause more harm than good. This is to say the solution may be to create the world we want to see rather than the world we have. By creating data and training machine learning models on these ideals, algorithms will be created that reflect what we want, rather than amplify what already is. To get there, we have to understand the relationships between the factors already existing in our data, such as how we weight certain experiences or achievements.

Algorithms have revolutionized the world for the better and will continue to do so as data becomes more abundant and machine learning models become more complex. Understanding how data is used, and why things are the way they are, gives us healthy skepticism for the next time an algorithm tells us what to do.


Ackerman, D. (2021). System detects errors when medication is self-administered. MIT News, pp. 2–5. Retrieved from

Alpaydin, E. (2016). Machine Learning. Cambridge, Massachusetts: The MIT Press.

Birkinshaw, J. (2014). Beyond the Information Age. Wired, 8. Retrieved from

Condliffe, J. (2019). The Week in Tech: Algorithmic Bias Is Bad. Uncovering It Is Good. The New York Times, 13–15. Retrieved from

Denning, P. J., & Martell, C. H. (2015). Great Principles Of Computing. The MIT Press. Cambridge, Massachusetts: The MIT Press.

Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126.

Hanrahan, C. (2020). Job recruitment algorithms can amplify unconscious bias favouring men, new research finds. Women’s CVs are ranked lower than men’s CVs across each job type. ABC News, 10–13. Retrieved from

Haselton, T. (2017). How to find out what Facebook knows about you. CNBC, pp. 1–12. Retrieved from

Holmes, A. (2020). Clicking this link lets you see what Google thinks it knows about you based on your search history — and some of its predictions are eerily accurate. Business Insider, 1–8. Retrieved from

Kang, S. K., DeCelles, K. A., Tilcsik, A., & Jun, S. (2016). Whitened Résumés: Race and Self-Presentation in the Labor Market. Administrative Science Quarterly, 61(3), 469–502.

Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., & Mullainathan, S. (2018). Human Decisions and Machine Predictions. Quarterly Journal of Economics, 237–293.

Logg, J. M., Minson, J. A., & Moore, D. A. (2019). Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151(December 2018), 90–103.

Marr, B. (2020). What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence? Forbes, 2–8. Retrieved from

Ngiam, K. Y., & Khor, I. W. (2019). Big data and machine learning algorithms for health-care delivery. The Lancet Oncology, 20(5), e262–e273.

O’Neil, C. (2016a). How algorithms rule our working lives. The Guardian, pp. 1–7. Retrieved from

O’Neil, C. (2016b). Weapons of Math Destruction (Vol. 78). New York, New York: Crown.

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.

Reuter, E. (2021). Mayo Clinic finds algorithm helped clinicians detect heart disease, as part of broader AI diagnostics push. MedCity News, 1–5. Retrieved from

Simonite, T. (2019). The Best Algorithms Still Struggle to Recognize Black Faces | WIRED. Wired, 1–7. Retrieved from


De-blackboxing the Role of Machine Learning in Computational Propaganda

Hans Johnson



The COVID-19 pandemic has brought about drastic societal changes. One of the most evident is increased time spent at home. Consequently, many indoor forms of entertainment have witnessed substantial growth in popularity, for example, online video streaming, video gaming, and particularly social media use.1 Unfortunately, increased traffic on social media has magnified exposure to information operations (disinformation). Information operations come in many different forms, yet one of the most prolific on social media has been computational propaganda. Within this subset of disinformation, machine learning is used to full effect to amplify influential information, incite emotional responses, and even interact with legitimate users of social media platforms.2 A combination of machine learning and pattern recognition techniques is utilized in this process, several of the most eminent being NLP (Natural Language Processing) and StyleGAN, a style-based Generative Adversarial Network. This research project will: 1) give a brief history of the evolution of propaganda and how historical and modern propaganda differ in scope, 2) provide a foundational understanding of NLP and StyleGAN, and 3) describe how NLP and StyleGAN are used to disseminate, or otherwise amplify, information.


Propaganda has been used throughout human history by various state and non-state actors to influence human behavior and achieve a desired outcome.3 Propaganda can be seen or heard in symbols (images, words, movies, music, etc.). However, what separates propaganda from regular human discourse is its deliberateness. Propaganda is, at its core, the deliberate creation or alteration of symbols (information) to influence behavior.4 This is why propaganda in the digital age is so troubling; after all, computing is the structuring, management, and processing of information, now in vast quantities.5

The mass production of influential information and symbols began with the printing press. By the 1890s, a single issue of a newspaper could number over one million copies, allowing media to reach larger audiences than ever before.6 Newspapers were known to influence public opinion, particularly leading up to, and during, times of war. The cartoon depicted below is an editorial published in the Pennsylvania Gazette in 1754, which helped incite resistance in the British colonies against French colonial expansion in North America.7

In the late 19th century, the Spanish-American War was agitated by the newspaper moguls William Randolph Hearst and Joseph Pulitzer, who began publishing what was known as “yellow journalism.”8 This form of journalism published crude and exaggerated articles meant to sensationalize information and provoke emotional responses in readers. The illustration in the newspaper below exhibits how false information can travel faster than the truth. In 1898, the USS Maine sank due to an explosion of unknown origin. Yet, before a formal investigation was conducted, newspapers circulated claims that the ship had sunk due to a bomb or torpedo from the Spanish navy. An investigation at the time concluded the ship was sunk by a sea mine; however, in 1976, a later investigation showed the ship sank as a result of an internal explosion.9 If such information had been available at the time, the war might never have happened.

A turning point for propaganda came when real images first began to appear in newspapers. Moments captured in real time have a profound impact on the human psyche. In 1880, the first halftone was printed in the Daily Graphic, beginning what is now known as photojournalism.10 The picture below is the halftone printing in the Daily Graphic of New York's shanty town.

During World War I, posters were the primary transmitters of propagandist material,11 although the information often originated from the target population's own government. A combination of image and text sends a powerful message, the simple phrases directing the viewer to feel a certain way about an image. The poster on the right depicts German troops in WWI committing what look to be war crimes, coaxing viewers to join the military.12 The use of posters, cartoons, and images continued throughout the first half of the 20th century, and continues to this day.

As we transition into the digital age, information reaches audiences across the world at unprecedented speeds, and it seems information has outpaced society's capacity to process it. As literacy rates drastically increased over the past two centuries, so too did access to information and, consequently, to propaganda. What is more troubling, however, is that tertiary education completion among adults has not increased proportionately. While nearly 100% of Americans aged 15 or older are literate, only approximately one-fourth complete tertiary education.13 14 This creates a serious conundrum as information proliferates: society's capacity to process that information in a comprehensive and objective manner remains insufficient. Below are two graphs, one depicting higher education completion among adults, the second showing US household access to information technology.

What is even more concerning is the growing capability of entities to target specific demographics with tailored, generated information. The capacity to profile groups began early in the 20th century, by means of surveys which collected data on public opinion, consumer habits, and elections.15 In 1916, the Literary Digest began public polling on presidential candidates.16 This practice was further augmented by the Gallup Polls in the 1930s, which took into account more than just public opinion on elections, including the economy, religion, and public assistance programs.17 Understanding public sentiment was an important step in the evolution of influencing human behavior.

Currently, human behavior can be categorized, documented, and influenced based on our most intricate and personal habits. This is made possible by increased storage capacity in cloud infrastructure, machine learning, and deep neural networks. Furthermore, most of this information is gathered without user consent or knowledge.

Although this data is not always used simply to influence consumer habits, it can be used to disrupt social cohesion, instill distrust in democratic institutions, and incite violence based on race, religion, and political disposition. Malicious entities, many originating from Russia, have infiltrated social media circles in the United States, creating false personas which present as activist groups of various motivations. Much of this intentional malicious activity is made possible through NLP and StyleGAN.18 NLP is likely used in several ways by propagandists: most importantly, to translate propaganda from one language to another; secondly, to train semi-automated chatbots to interact with legitimate users. With this in mind, we will provide a base understanding of NLP and StyleGAN.

Natural Language Processing (NLP) 

NLP is essentially the intersection between linguistics and computer science.19 Natural written and spoken language is encoded as data within a computer via acoustic receptors or typed text, then decoded by a program which passes this data through a Deep Neural Network (DNN). This DNN routes the data through hidden layers of mathematical operations and feeds it into a statistical model which produces the most accurate representation of said data. This method of machine learning has improved over the years, with IBM's statistical word-level translations being some of the first NLP software.

IBM's Word-Based NLP

IBM's software was successful for three reasons: it could be improved, modified, and revised.20 This statistical-model-based ML began in the 1960s and utilized rule-based algorithms known as “word tagging.”21 Word tagging assigned words grammatical designations like noun, verb, or adjective in order to construct functional sentences. Yet, as one can imagine, words are used in a multitude of ways in the English language, which created limitations. One issue with IBM's model was its translation of single words rather than entire sentences. Another issue which plagued IBM's early statistical models was inadequate access to data.22 Now, in machine learning, the opposite is the case: there is such a multitude of data that it must be cleaned and carefully chosen to meet certain needs. The diagram below depicts IBM's statistical model process.23
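A toy version of rule-based word tagging can show both the idea and its limits. The lexicon and tag names below are invented for illustration, not IBM's actual rules:

```python
# A fixed lexicon mapping each known word to one grammatical tag.
LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "oven": "NOUN",
    "runs": "VERB", "sleeps": "VERB",
    "hot": "ADJ", "fast": "ADJ",
}

def tag(sentence: str):
    """Assign each word a part-of-speech tag from the fixed lexicon.

    Unknown words get 'UNKNOWN', exposing the core limitation of
    rule-based tagging: real words take different grammatical roles
    depending on context, which a one-word-one-tag lexicon cannot capture.
    """
    return [(word, LEXICON.get(word, "UNKNOWN")) for word in sentence.lower().split()]

tags = tag("The cat sleeps")
# -> [('the', 'DET'), ('cat', 'NOUN'), ('sleeps', 'VERB')]
```

A word like "fast" (adjective, adverb, verb, or noun in English) would always receive the same tag here, which is exactly the ambiguity problem that pushed NLP from word-level rules toward sentence-level statistical models.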

Google’s GNMT

Google's Neural Machine Translation (GNMT) system is making strides in increasing the accuracy of machine translation and speech recognition in several ways. First, GNMT encodes and decodes entire sentences or phrases, rather than performing word-to-word translation like IBM's early NLP.24 Secondly, GNMT employs Recurrent Neural Networks (RNNs) to map the meaning of sentences from an input language to an output language. This form of translation is much more efficient than word-to-word or even phrase-based methods. Below is an example of GNMT actively translating Chinese text, a language historically difficult for NLP software, shown directly from Google's AI blog:25

Yet, the sentence-based GNMT is coming very close to decoding complex Chinese text. The concept of translating entire sentences to produce meaning in another language was proposed as early as 1955:

“Thus may it be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication—the real but as yet undiscovered universal language—and then re-emerge by whatever particular route is convenient” 26

Other developments in NLP are numerous, and one is particularly concerning as it pertains to propaganda: GPT3.


Generative Pre-trained Transformer 3, or “GPT3,” is the third version of a text generator which utilizes machine learning and DNNs to produce and predict text. The capabilities of GPT3 include answering questions, composing essays, and even writing computer code.27 Yet, unlike GNMT, the intellectual property surrounding GPT3 is kept mostly secret, with the exception of some application programming interfaces. GPT3 utilizes over 175 billion parameters in its weighting system to determine the most plausible output, more than ten times the number of its next-highest competitor.28 GPT3, with its Q&A feature, can answer common-sense questions, setting it apart from other AI software, as seen here:29

Yet for all its practical applications, there are also some serious deficiencies. The text it generates can sometimes be far from the desired outcome. At times, bias becomes evident, and even racially discriminatory.30 Additionally, instead of correcting the shortcomings of previous versions, GPT3 simply offers a wider range of weighting parameters.

Generative Adversarial Networks

Generative Adversarial Networks consist of two DNNs, one generative and the other discriminative. The generative network creates a piece of media, whether images, sound waves, or text, which is then analyzed by the discriminative network. This adversary compares the media to real samples: if the output is deemed real, the generative network wins; if it is judged fake, the generative network runs its algorithms again and produces new samples.
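The adversarial game described above can be made concrete with a deliberately tiny example: a one-parameter "generator" tries to produce numbers that look like samples from a real data distribution, while a logistic "discriminator" tries to tell real from fake. Real GANs use deep networks on both sides; this numpy sketch (with invented data and learning rates) only shows the alternating update loop.

```python
import numpy as np

rng = np.random.default_rng(1)
REAL_MEAN = 3.0          # the "real data" distribution: N(3, 1)
theta = 0.0              # generator parameter: G(z) = theta + z
w, b = 0.0, 0.0          # discriminator: D(x) = sigmoid(w*x + b)
lr, batch = 0.05, 64

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

for step in range(2000):
    real = rng.normal(REAL_MEAN, 1.0, batch)
    fake = theta + rng.normal(0.0, 1.0, batch)

    # --- Discriminator step: push D(real) up and D(fake) down ---
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    grad_w = np.mean(-(1 - d_real) * real) + np.mean(d_fake * fake)
    grad_b = np.mean(-(1 - d_real)) + np.mean(d_fake)
    w -= lr * grad_w
    b -= lr * grad_b

    # --- Generator step: move theta so the fakes fool the discriminator ---
    d_fake = sigmoid(w * fake + b)
    theta -= lr * np.mean(-(1 - d_fake) * w)

print(f"generator mean after training: {theta:.2f} (target {REAL_MEAN})")
```

Each round where the fakes are judged fake produces a gradient that nudges the generator, which is exactly the "runs its algorithms again, then produces more samples" loop from the text; when training goes well, the generator parameter drifts toward the real distribution's mean.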


StyleGAN is a derivative of Generative Adversarial Networks that is able to produce high-definition, artificially created images of human faces which appear real to an untrained eye. StyleGAN also generates more unique images than other GAN architectures.31 Here are some examples of human faces produced with StyleGAN:32

Machine Learning in Computational Propaganda

One of the many benefits of open-source information and free software is enriching the lives of the less privileged segments of society, even though an unintended consequence of open-source technology is its use by nefarious entities. Currently, Google Translate, GPT2, and StyleGAN are open source. This means malicious actors can utilize the technology with virtually no cost in R&D or use. The possible applications of such technology to propaganda are many.

Role of NLP

NLP is perhaps the most concerning of the machine learning techniques available to foreign entities. Language has historically been one of the main barriers limiting the spread of information. Now, with GNMT becoming progressively more accurate, malign actors can translate vast amounts of foreign-language text more efficiently and quickly than ever before. This serves a dual purpose: GNMT can be used by foreign actors to send more complex messages that are less distinguishable from native writing in the target language, and it becomes easier to research divisive topics in various regions of the world. Below are two propaganda ads originating from Russian sources from 2015–2016.33 Both ads cover politically divisive topics, one being LGBTQ rights, and the other police brutality against minorities.

The text contained in the previous ads is crudely translated, likely indicating NLP was used for their production. It is possible crude translation is why these ads were discovered in the first place. Many of the Russian ads from 2015-2017 released by the US Senate Intelligence Committee contain frequent grammatical and translation errors.34 Furthermore, most, if not all the ads relate to race, religion, sexuality, or politics. 

Role of GPT2

GPT2 can be utilized by malign actors in multiple ways. One possible use is to train chatbots to interact with legitimate users in social media platforms.35 The Q & A feature of GPT is what makes such interactions possible by directing chatbots to comment on specifically tagged posts, popularize hashtags, and potentially respond to emphatic replies from users.36 Secondly, GPT2 can boost the relevance of posts which fake accounts are trained to popularize. Lastly, GPT2 can be used to create fake profile biographical information to afford more legitimacy to fake accounts. 

Role of StyleGAN

The role of StyleGAN in computational propaganda is to add legitimacy to fake profiles.37 As seen above in the collage of AI-generated photos, real can be difficult to differentiate from fake. Adding a human face to profiles is particularly useful for creating false personas whose mission is to produce content to be amplified by either autonomous or semi-autonomous accounts. Below is a fake Twitter account generated entirely autonomously:38

The Limitations of NLP and StyleGAN in Propaganda

The R&D behind NLP and StyleGAN is complex, but their use in spreading information is simple: create false personas, then like, share, comment, and react. While the applications of NLP and StyleGAN for proliferating fake news are numerous, what is more concerning is the amplification of factual yet divisive news. By simply reinforcing already existing divisions, computational propaganda self-proliferates. Propaganda is most successful concerning topics which are already extreme points of contention.

“Propaganda is as much about confirming rather than converting public opinion. Propaganda, if it is to be effective must, in a sense, preach to those who are already partially converted”  – Welch, 2004 39

The previous statement has become particularly evident in the past few years in American politics, concerning the sense of tribalism in race, religion, and sexuality.40 Take, for example, the following Russian propaganda ad: 

In hindsight, the ad does not seem to send such a divisive message. After all, most could get behind supporting homeless veterans. However, what differentiates this ad from the previous ones is its subtlety. The ad received nearly 30,000 views and 67 clicks, far more than the police brutality and LGBTQ ads, likely because it was not identified as early as its counterparts. Secondly, if one takes note of the date on the ad, it was created not long after the Baltimore riots in the aftermath of Freddie Gray’s death.41 The ad is also tailored to target African-American audiences. The timing of information appears to be just as important as the message, and with modern technology, timing is almost never an issue.


Machine learning plays a fundamental role in amplifying information, but a limited role in creating it. Successful conspiracy theories require time to fabricate and, even more importantly, human rather than artificial intelligence.42 In fact, overuse of AI in spreading information can be detrimental to an operation, as over-activity flags the associated accounts or posts.43 After analyzing a multitude of Russian propaganda ads from 2015-2017 released by the Senate Intelligence Committee (provided by social media platforms), it became apparent that the ads which were discovered contained poor grammar. This gap in the data may suggest foreign entities are using machine learning to analyze which ads are taken down and which remain.44 Additionally, the data carries a rather obvious bias: it consists almost entirely of sponsored ads paid for in Russian rubles, which are easily trackable. Also absent from the released data was a very modern, influential form of propaganda: memes. In recent years, the Russian Internet Research Agency has garnered a strong following for its troll accounts on Instagram, which reach millennial audiences of varying demographics using memes and pop culture, memes which are likely curated entirely by individuals, not AI.45 The human element of propaganda remains just as relevant as it did in the 20th century, and will likely continue well into the 21st century.

End Notes

  1. Samet, A. (2020, July 29). How the coronavirus is changing us social media usage. Insider Intelligence.
  2.  Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media.
  3.  Smith, B. L. (n.d.-a). Propaganda | definition, history, techniques, examples, & facts. Encyclopedia Britannica. Retrieved May 11, 2021, from
  4.  Smith, B. L. (n.d.-a). Propaganda | definition, history, techniques, examples, & facts. Encyclopedia Britannica. Retrieved May 11, 2021, from
  5. What is computing? – Definition from techopedia. (n.d.). Techopedia.Com. Retrieved May 11, 2021, from
  6. Newspaper history. (n.d.). Retrieved May 11, 2021, from
  7. The story behind the join or die snake cartoon—National constitution center. (n.d.). National Constitution Center – Constitutioncenter.Org. Retrieved May 11, 2021, from
  8. Milestones: 1866–1898—Office of the Historian. (n.d.). Retrieved May 11, 2021, from
  9. Milestones: 1866–1898—Office of the Historian. (n.d.). Retrieved May 11, 2021, from
  10. The “daily graphic” of new york publishes the first halftone of a news photograph: History of information. (n.d.). Retrieved May 11, 2021, from
  11. Posters: World war i posters – background and scope. (1914).
  12. Will you fight now or wait for this. (n.d.). Retrieved May 11, 2021, from
  13. Roser, M., & Ortiz-Ospina, E. (2013). Tertiary education. Our World in Data.
  14. Roser, M., & Ortiz-Ospina, E. (2016). Literacy. Our World in Data.
  15. Smith, B. L. (n.d.-b). Propaganda—Modern research and the evolution of current theories. Encyclopedia Britannica. Retrieved May 11, 2021, from
  16. The “literary digest” straw poll correctly predicts the election of woodrow wilson: History of information. (n.d.). Retrieved May 11, 2021, from
  17. Gallup, Inc. (2010, October 20). 75 years ago, the first Gallup poll. Gallup.com.
  18.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference
  19.  What is natural language processing? (n.d.). Retrieved May 11, 2021, from
  20.  P. 118 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.
  21. A beginner’s guide to natural language processing. (n.d.). IBM Developer. Retrieved May 11, 2021, from
  22. A beginner’s guide to natural language processing. (n.d.). IBM Developer. Retrieved May 11, 2021, from
  23.  P. 118 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.
  24. Le, Q. V., & Schuster, M. (n.d.). A neural network for machine translation, at production scale. Google AI Blog. Retrieved May 11, 2021, from
  25.  Le, Q. V., & Schuster, M. (n.d.). A neural network for machine translation, at production scale. Google AI Blog. Retrieved May 11, 2021, from
  26.  P. 64 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017
  27. Marr, B. (n.d.). What is gpt-3 and why is it revolutionizing artificial intelligence? Forbes. Retrieved May 11, 2021, from
  28. Vincent, J. (2020, July 30). OpenAI’s latest breakthrough is astonishingly powerful, but still fighting its flaws. The Verge.
  29.  Sharma, P. (2020, July 22). 21 openai gpt-3 demos and examples to convince you that ai threat is real, or is it ? [Including twitter posts]. MLK – Machine Learning Knowledge.
  30. Vincent, J. (2020, July 30). OpenAI’s latest breakthrough is astonishingly powerful, but still fighting its flaws. The Verge.
  31.  P. 1 Karras, T., Laine, S., & Aila, T. (2018). A style-based generator architecture for generative adversarial networks.
  32.  P. 3 Karras, T., Laine, S., & Aila, T. (2018). A style-based generator architecture for generative adversarial networks.
  33. Social media advertisements | permanent select committee on intelligence. (n.d.). Retrieved May 11, 2021, from
  34. Social media advertisements | permanent select committee on intelligence. (n.d.). Retrieved May 11, 2021, from
  35.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference
  36. Xu, A. Y. (2020, June 10). Language models and fake news: The democratization of propaganda. Medium.
  37. O’Sullivan, D. (2020, September 1). After FBI tip, Facebook says it uncovered Russian meddling. CNN.
  38. O’Sullivan, D. (2020, September 1). After FBI tip, Facebook says it uncovered Russian meddling. CNN.
  39.  P. 214 Welch, D. (2004). Nazi Propaganda and the Volksgemeinschaft: Constructing a People’s Community. Journal of Contemporary History, 39(2), 213-238. doi: 10.2307/3180722
  40. Pew Research Center. (2014, June 12). Political polarization in the American public. Pew Research Center – U.S. Politics & Policy.
  41. Peralta, E. (n.d.). Timeline: What we know about the freddie gray arrest. NPR.Org. Retrieved May 11, 2021, from
  42. Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media.
  43. Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media.
  44.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 4826–4832.
  45. Thompson, N., & Lapowsky, I. (2018, December 17). How russian trolls used meme warfare to divide america. Wired.

De-Blackboxing AI/ML in Credit Risk Management

Hao Guo


This essay discusses Artificial Intelligence (AI)’s applications in the financial field, specifically the banking sector’s use of AI in credit risk management. It starts with a short description of fundamental AI and financial background knowledge, followed by the shortcomings of traditional banking, and then emphasizes the importance of applying AI techniques in credit risk assessment along with their positive consequences. The major component focuses on introducing different ML models applied in the process of credit risk management and then de-blackboxes the AI techniques by analyzing each ML model in detail. The Zen Risk toolkit developed by Deloitte helps visualize the de-blackboxing process through a real-life case. The article also lists current concerns and potential risks of applying AI-generated models in the financial field at the end to provide a more comprehensive analysis.


In the era of data explosion, artificial intelligence (AI), as a tool that is great at processing massive amounts of data on a limited timeline, has been used in various fields to save manpower and material resources. Since credit risk management is a sector of the financial field composed mainly of data and models, the intervention and integration of artificial intelligence technology have become an inevitable trend. AI-generated models can improve efficiency and increase productivity while reducing operating costs. AI and ML are game-changers in the field of risk management because they can properly address the risks faced by financial institutions, which deal with large amounts of complex information every day (Trust, 2020). Even though there are many prominent benefits of applying AI techniques in financial sectors, only 32% of financial services firms have applied them in their industries, mainly for data prediction, financial product recommendations, and voice recognition. For now, the most common AI usage in the financial industry is chatbots, which provide simple financial guidance to customers. A more complex application of AI in banking is identifying and eliminating fraud. However, the real potential market lies in risk management, which is highly related to financial institutions’ revenue (Archer Software, 2021). To encourage financial institutions, especially the banking sector, to stay competitive in the market, make more profit, and provide more comprehensive services for customers, de-blackboxing AI-generated models becomes a priority. How will AI/ML models transform the banking system and further refine credit risk management? What is the black box in this particular process? How do people interpret it? What are the benefits and negative results of this technique? Those are the questions waiting to be addressed.

  1. AI usage in Financial systems

1.1 Artificial Intelligence

Artificial Intelligence is the technology that allows a machine to mimic the decision-making process and problem-solving ability of human beings by processing a massive amount of data with rapid, iterative algorithms to eventually seek out patterns automatically. In short, as Alan Turing described, AI is “a system that acts like humans.” The most familiar applications we encounter today, like Siri and Amazon’s recommendation system, are ANI (Artificial Narrow Intelligence), which differs from AGI (Artificial General Intelligence) because it specializes in a particular field instead of mastering a variety of sectors (Artificial Intelligence (AI), 2021).

1.2 Financial Systems

A financial system is a network of financial institutions, such as insurance companies, stock exchanges, and investment banks, which allows organizations and individuals to perform capital transfers (Corporate Finance Institute, 2021). With the data explosion, processing financial dossiers has become more complex and time-consuming than ever before. The most arresting feature of the past financial system is its high dependency on human ingenuity (Part of the Future of Financial Services series from the World Economic Forum and Deloitte, 2018).

1.3 The Application of AI in Financial Service Spectrums

The introduction of AI brought sky-high efficiency to the financial system. Based on a CB Insights report (2018), over 100 companies indicated that their applications of AI improved their performance in many aspects. Figure 1 maps out some of these companies and the areas in which they operate AI techniques. Most AI financial services fall into nine categories: 1) Credit Scoring / Direct Lending 2) Assistants / Personal Finance 3) Quantitative & Asset Management 4) Insurance 5) Market Research / Sentiment Analysis 6) Debt Collection 7) Business Finance & Expense Reporting 8) General Purpose / Predictive Analytics 9) Regulatory, Compliance, & Fraud Detection. Artificial intelligence spans nearly the entire financial services spectrum. Credit risk management, categorized under Credit Scoring / Direct Lending, is the top priority among those many AI regulatory areas.

Figure 1. The AI in Fintech Market Map, 2017

  1. AI in Banking Majors

2.1 Banks Income Structure

The core of the banking business model is lending. Banks create monetary value using the income earned from lending instruments and customer-facing activities. Other profit-making approaches include customer deposits, mortgages, personal loans, lines of credit, bank fees, interbank lending, and currency trading (Survivor, 2021). The majority of financial services in the banking sector are associated with lending, which relies heavily on the credit of the obligor. Back in 2018, 29% of customers claimed a preference for using credit cards for daily consumption (Schroer, 2021). As of 2020, 44% of U.S. consumers carry mortgages, and the number is growing steadily at a rate of 2% annually (Stolba, 2021). Widespread repayment failure by borrowers can cause banks to go bankrupt.

2.2 The Shortcoming of Traditional Lending Assessment

The audit process of issuing a loan requires a lot of manpower because customer files are often crowded with noisy components. The slightest misjudgment will result in a wrong decision, causing profit loss and injuring the borrower’s interests. For the individual borrower, the risk profile formed can affect one’s life to a great extent: for example, whether the individual can drive and live safely, the possibility of being educated, and the chance of receiving medical treatment. For business borrowers, the risk picture involves more complex situations, because their data spans a variety of parameters and needs a longer period and more manpower and material resources to generate a holistic risk profile. Credit risk can affect a borrower’s financial status, lose the lender’s capital, and damage both reputations (PyData, 2017). Suboptimal underwriting, inaccurate portfolio monitoring methodologies, and inefficient collection models can aggravate these lending problems (Bajaj, n.d.).

Figure 2. Overview of common steps in the lending process. ([[Graph]], n.d.)

2.3 The Importance of Applying AI in Lending Assessment

Processing a large number of credit assessments in a limited timeline is a priority for banks to solve. Because credit information exists in the form of dynamic data, AI has full leeway: the prominent feature of AI techniques is interpreting massive data in a short time with near-perfect accuracy (Bajaj, n.d.). AI helps banks streamline and optimize credit decisions on a wider range, transforming noisy information into quantitative signals to better portray consumers’ risk portfolios. An AI-generated WAP (mobile banking application) can help banks know their borrowers’ financial conditions more deeply, yet with more privacy, by monitoring users’ financial behaviors. AI-driven assessment can better analyze borrowers’ banking data, track their financial activities, and further avoid giving risky loans and reduce the possibility of encountering credit fraud (Use Cases of AI in the Banking Sector, 2021). AI is replacing many financial positions, like data science analysts and financial risk managers (FRMs), by providing safer, smarter, and more effective financial services to consumers (Schroer, 2021).

Figure 3. Machine learning surfaces insights within large, complex data sets, enabling more accurate risk (McKinney, n.d.)

  1. AI in Credit Risk Management

In the video above (RISKROBOT TM – Explainable AI Automation in Credit Risk Management – SPIN Analytics Copyright 2020, 2018), the broadcaster introduces RISKROBOT as a classic example of an AI technique for computing credit risk and provides a cursory description of the steps AI needs to portray a consumer’s credit risk profile and generate a report. In another presentation, made by PyData (2017), the reporter takes ZOPA as an example to introduce its ML-driven credit risk management process. By comparison, it is quite obvious that the fundamental procedures of applying AI techniques in credit risk management are relatively comparable.

3.1 AI Decision-Making’s Involvement in Credit Risk Management

Unlike traditional banks, which take days or even weeks to evaluate and process borrowing formalities, AI-driven banks provide extensively automated and nearly real-time services for individual borrowers and SME lending. Following local data-sharing regulations, AI-assisted banks generate more accurate assessment results by evaluating clients’ traditional data sources, like bank transaction activities, FICO scores, and tax return histories, together with new data sources, like general location data and utility information, more quickly, massively, and extensively. AI decision-making in credit risk management observes clients from sophisticated perspectives and decreases the possibility of offering a risky loan by screening out potential fraudsters (Agarwal et al., 2021).

Credit Qualification

Instead of using a rule-based linear regression model, AI-driven banks build complex models to analyze both structured and unstructured data collected from users’ browsing histories and social media, performing an objective and comprehensive analysis of individuals and SMEs who lack official credit records or authentic credit information reports. When building and refining the ML quantitative model, customers with significant loan-risk characteristics are automatically filtered out by early algorithms. Potential default borrowers with wavering financial portraits require manual verification in the early stage and are learned by the ML model as it categorizes more comparable cases in the self-auditing process (Agarwal et al., 2021).

Limit Assessment and Pricing

AI/ML techniques allow banks to analyze borrowers’ off-the-record financial condition by applying optical character recognition (OCR) to extract data from non-documentation sources, like e-commerce expenditures in customers’ email and their telecom records. The ML model in this intervention can dissect loan applicants’ actual financial disposal power to offer a rational loan amount that does not exceed the borrower’s repayment ability, and further use NLP (natural language processing) to determine the repayment interest (Agarwal et al., 2021).

Fraud Management

ML models are also effective in detecting the five costliest frauds: 1) identity theft, 2) employee fraud, 3) third-party or partner fraud, 4) customer fraud, and 5) payment fraud like money laundering (Agarwal et al., 2021). The Chinese bank Ping An applies facial recognition to gauge the confidence level of borrowers’ financial statements. Its AI-driven facial recognition software can detect and process 54 subtle expressions in 1/15 to 1/25 of a second by tracing eye movements (Weinland, 2018).

Figure 4. The combination of AI and analytics enhances the onboarding journey for each new customer. (McKinsey & Company, 2021)

3.2 ML Models in Credit Risk Management

Support Vector Machine (SVM)

SVM is a supervised machine learning algorithm often used for classification (Ray, 2020). Using the concept of Structural Risk Minimization (SRM), SVM maps the data into a high-dimensional space and finds the hyperplane that best separates the two classes with a linear model in that space. SVM helps analyze credit risk by keeping classification decisions within a rational breadth (Iyyengar, 2020).

Figure 5. (Kaggle: Credit Risk (Model: Support Vector Machines), 2020)
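A minimal sketch of this idea uses scikit-learn's SVC on synthetic "borrower" data. The two features and the ground-truth rule below are invented for illustration; the point is that the SVM learns a separating hyperplane between repaid and defaulted loans and can then classify a new applicant.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic borrowers: [income (k$), debt ratio]; label 1 = repaid, 0 = defaulted
n = 200
income = rng.uniform(20, 120, n)
debt_ratio = rng.uniform(0.0, 1.0, n)
X = np.column_stack([income, debt_ratio])
y = (income * (1 - debt_ratio) > 40).astype(int)  # invented ground-truth rule

# A linear kernel finds the maximum-margin separating hyperplane
clf = SVC(kernel="linear").fit(X, y)

applicant = np.array([[75.0, 0.3]])   # hypothetical new applicant
print("predicted class:", clf.predict(applicant)[0])
print("training accuracy:", clf.score(X, y))
```

Because the invented rule is not perfectly linear, the linear hyperplane only approximates it; a nonlinear kernel (the "high-dimensional space" the text mentions) would tighten the fit.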

Decision Tree (DT)

Decision trees (like CRT, CHAID, QUEST, C5.0) make predictions by applying pre-programmed decision rules extracted from data features, generating tree-like structures terminated by leaf nodes that correspond to input variables. Starting from the top (root) component, one tracks down each branch, which represents a specific feature of the borrower, to find the predicted value (credit risk) (Fenjiro, 2018).

Figure 6. Decision Tree in loaning approval case, 2018
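The root-to-leaf process described above can be sketched with scikit-learn. The applicant features and approval rule here are invented; `export_text` prints the learned tree so the branches can be read top-down exactly as the text describes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic applicants: [credit score, annual income (k$)]; 1 = approve, 0 = reject
n = 300
score = rng.uniform(300, 850, n)
income = rng.uniform(10, 150, n)
X = np.column_stack([score, income])
y = ((score > 600) & (income > 30)).astype(int)   # invented approval rule

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The learned rules, readable top-down from the root, branch by branch
print(export_text(tree, feature_names=["credit_score", "income"]))
print("decision for (720, 55):", tree.predict([[720.0, 55.0]])[0])
```

Because the invented rule is itself two axis-aligned thresholds, a shallow tree recovers it almost exactly; real credit data needs deeper trees or ensembles.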


Neural Networks (NN)

The neural network technique is a processor that simulates the activity of the human brain to collect detected information and store knowledge. Its three major layers are the input layer, the hidden layer, and the output layer. Other ML models, like metaheuristic algorithms (MA), are also fit for analyzing credit risk; which one to apply depends on the hands-on situation, based on the misclassification level, the accuracy of the algorithms, and the computational time (Iyyengar, 2020).

Figure 7. The neural network layers for credit risk evaluation
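The three layers can be sketched as a tiny numpy network: an input layer holding borrower features, one hidden layer, and an output layer producing a default probability. The data and the ground-truth rule are invented, and the network is trained with plain gradient descent; real credit models are far larger.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented training data: 2 borrower features -> label 1 = default, 0 = repay
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy ground truth

# Input layer (2 units) -> hidden layer (8 units) -> output layer (1 unit)
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 0.5
for _ in range(500):
    # forward pass through the three layers
    h = np.tanh(X @ W1 + b1)          # hidden layer
    p = sigmoid(h @ W2 + b2).ravel()  # output layer: default probability

    # backward pass (binary cross-entropy gradients)
    dz2 = (p - y)[:, None] / len(X)
    dW2 = h.T @ dz2; db2 = dz2.sum(0)
    dh = dz2 @ W2.T * (1 - h ** 2)
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# final forward pass to measure accuracy
h = np.tanh(X @ W1 + b1)
p = sigmoid(h @ W2 + b2).ravel()
acc = ((p > 0.5) == (y == 1)).mean()
print(f"training accuracy: {acc:.2f}")
```

The hidden layer is what lets the network "store knowledge" beyond a straight line; stacking more such layers is what turns this into the deep networks used in practice.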

3.3 De-Blackbox AI in Credit Risk Management

Case study of Zen Risk

The AI techniques used in credit risk management are a double-edged sword: they perform extremely efficiently, but the process is not transparent enough for either lenders or borrowers to probe to the bottom. Deloitte designed a de-blackboxing tool especially for revealing the mystery of the AI-driven credit risk assessment process. The platform, called Zen Risk, aims to help clients access, compare, and study modern ML models so they can better understand and analyze the AI techniques applied in credit risk management, and also make more accurate predictions. As a de-blackboxing toolkit, Zen Risk promises its clients complete transparency, an auditable process, and clear output. This Zen Risk case study opens the black box from the perspectives of the applied models and features, along with explanations of the general outcome and individual forecasts (Phaure & Robin, 2020).

The process starts with an advanced data pre-processing stage, where data filtering, classification, cleansing, and outlier identification happen. Clients with prepared data sources can then determine the best-matching ML model to use (like the NN, DT, SVM, and MA mentioned above). Zen Risk visualizes the model-choosing process so users better understand what happens when different ML models compute the data. The solution can be a single integrated model, or hybrid models when a more comprehensive investigation is sought. The straightforward solutions generally fall into simple ML models like boosting (e.g., LightGBM) and neural networks. Heterogeneous classifiers, individual classifiers, and homogeneous classifiers are the most common methods used in this stage. When encountering complex situations, applying post-hoc explanation algorithms (like LIME and SHAP) is necessary to perform a more engaged data interaction (Phaure & Robin, 2020).

Taking the tree-like model as an example, the algorithm performed in the first stage can present the importance of each feature by assessing its quantitative value. The computing process captures the impact of manipulating a variable on the model’s evaluation metric: during the transformation, the decrease in model quality is associated with the variable’s importance and influence. For example, if value = 600 is the threshold for loan rejection, then the features of credit amount and age are more highly correlated with whether a loan is approved than the feature of loan purpose (Phaure & Robin, 2020).

Figure 8. Deloitte artificial intelligence credit risk
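The permutation idea described above can be sketched directly: shuffle one feature's column, re-score the model, and treat the drop in quality as that feature's importance. Here synthetic data and a hand-built linear scorer (with the value >= 600 rejection threshold from the text) stand in for the real tree model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic applicants: columns = [credit_amount, age, loan_purpose_code]
n = 500
X = np.column_stack([
    rng.uniform(0, 1000, n),              # credit_amount
    rng.uniform(18, 75, n),               # age
    rng.integers(0, 5, n).astype(float),  # loan_purpose (uninformative here)
])
y = (0.8 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(0, 30, n) > 600).astype(int)

def model(X):
    """Stand-in scoring model mirroring the ground truth (value >= 600 -> 1)."""
    return (0.8 * X[:, 0] + 4.0 * X[:, 1] >= 600).astype(int)

base_acc = (model(X) == y).mean()

importance = {}
for j, name in enumerate(["credit_amount", "age", "loan_purpose"]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])      # break this feature's link to y
    importance[name] = base_acc - (model(Xp) == y).mean()

print(importance)
```

Permuting `credit_amount` wrecks the accuracy, permuting `age` hurts it moderately, and permuting `loan_purpose` changes nothing, which is exactly the ranking of importance the paragraph describes.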

LIME is a local model categorized as a post-hoc, model-agnostic explanation technique: it explains an individual prediction of the de-blackboxed ML credit risk model in the light of an approximate, easy-to-decrypt surrogate (Misheva, 2021). Unlike tree models, the underlying model remains encrypted as a black box; LIME only allows users to study what is happening inside by providing a similar transparent model (like a linear regression or decision tree). The figure below shows the LIME model applied to one of these stand-ins (an XGBoost model) to explain whether the input data (borrower) has risky potential or not (Phaure & Robin, 2020).

Figure 9. Deloitte artificial intelligence credit risk LIME model presented by XGBOOST Model 
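LIME's core recipe can be sketched without the library: perturb the instance, query the black box, weight the perturbed points by proximity, and fit a small weighted linear model whose coefficients explain that one prediction. The black-box function below is invented for illustration; real use would wrap the bank's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    """Hypothetical opaque credit model: probability of default."""
    return 1 / (1 + np.exp(-(0.04 * X[:, 0] - 0.9 * X[:, 1] + 2 * np.sin(X[:, 1]))))

x0 = np.array([55.0, 4.0])   # the one applicant we want to explain

# 1. Perturb around the instance and query the black box
Z = x0 + rng.normal(scale=[5.0, 0.5], size=(500, 2))
f = black_box(Z)

# 2. Weight samples by proximity to x0 (Gaussian kernel)
d2 = (((Z - x0) / [5.0, 0.5]) ** 2).sum(1)
w = np.exp(-d2)

# 3. Fit a weighted linear surrogate: its coefficients are the local explanation
A = np.column_stack([np.ones(len(Z)), Z - x0])
W = np.diag(w)
coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ f)

print("local intercept (approx. prediction at x0):", coef[0])
print("local effects of features 0 and 1:", coef[1], coef[2])
```

The surrogate is only valid near `x0`, which is why LIME is called a local explanation: a different applicant gets a different linear model, even though the black box itself never changes.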

The most promising model is Shapley value analysis (SHAP), which calculates the portion each feature contributes to an individual prediction, something hard to accomplish with a simple linear function model. Unlike LIME, which presents various factors for an individual, SHAP presents a unique value that indicates a direct answer for a specific individual. In the SHAP function, f represents the model over the full feature set and i the added feature. The next figure shows the result, generated by the SHAP model, that a borrower is too risky to be granted a loan (Phaure & Robin, 2020).
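The underlying quantity is the classic Shapley value, phi_i = sum over subsets S of F\{i} of |S|!(|F|-|S|-1)!/|F|! * [f(S u {i}) - f(S)], where F is the full feature set and i the feature being added. For a tiny model it can be computed exactly by enumerating subsets; real SHAP libraries approximate this. The three-feature linear risk model below is invented for illustration.

```python
import numpy as np
from itertools import combinations
from math import factorial

features = ["credit_amount", "age", "purpose"]
x = np.array([800.0, 22.0, 3.0])          # the applicant to explain
baseline = np.array([300.0, 45.0, 2.0])   # "average applicant" reference values
weights = np.array([0.004, -0.03, 0.0])   # invented linear risk model

def f(subset):
    """Model output when only features in `subset` take the applicant's
    values; the rest stay at the baseline (a common SHAP convention)."""
    z = baseline.copy()
    for j in subset:
        z[j] = x[j]
    return float(weights @ z)

n = len(features)
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            coef = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi[i] += coef * (f(S + (i,)) - f(S))

for name, v in zip(features, phi):
    print(f"{name}: {v:+.3f}")
```

The phi values sum exactly to the gap between this applicant's score and the baseline score, which is what lets SHAP attribute one specific prediction, feature by feature, as the text describes.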

  1. Concerns

Non-transparency and Data Bias

32% of financial institutions express fears about applying AI techniques and ML models in credit risk assessment. AI-driven models are accurate in providing final outputs (making loan decisions), but the complex calculation turns the entire thinking process into a black box that is hard to decrypt. It is difficult for financial institutions to explain the reasons for disqualification to borrowers beyond providing a numerical result, and also hard for financial servicers to report to their superiors why the models produce these predictions (scores) (Kerr-Southin, 2021). The non-transparency of AI-generated models makes it even harder to detect and correct data bias, which can deepen discrimination.


The algorithms in an AI-generated credit risk assessment model are programmed by human beings, and the programmers’ proficiency directly affects the performance of the model. Model risk can severely harm a financial institution because the loss is often too large to recover. Insignificant-seeming mistakes, like hiring inexperienced modelers and operators, skipping back-testing, or operational problems in the model, can result in irretrievable damage. One large US bank lost $6 billion due to value-at-risk model risk. Under regulations protecting customer and company privacy, these failed modeling examples are plentiful but cannot be publicly studied, which blocks the way of learning from past experiences. Constant trial and error has become the only effective solution at present (McKinsey & Company et al., 2015).


A detailed analysis of several common artificial intelligence (AI) lending risk models in the financial field makes it easy to see that, because a large amount of complex data is involved, machines can reach conclusions faster and more accurately than ever under the correct model, while it becomes harder for humans to explain the reasoning behind the scenes. AI/ML in credit risk management frankly cannot be de-blackboxed to the bottom; we can only analyze individual models to understand more. It is worth noting that although this technology has brought many benefits to financial organizations and individuals, such as saving manpower and materials and decreasing time costs, there are also hidden concerns and potential risks, such as the deepening of discrimination caused by its non-transparency.


  1. Agarwal, A., Singhal, C., & Thomas, R. (2021, March). AI-powered decision making for the bank of the future. McKinsey & Company.
  2. Archer Software. (2021, January 18). How AI is changing the risk management? Cprime | Archer.
  3. Artificial Intelligence (AI). (2021, May 4). IBM.
  4. Bajaj, S. (n.d.). AI, machine learning, and the future of credit risk management. Birlasoft.
  5. CB Insights. (2018, July 20). The AI In Fintech Market Map: 100+ Companies Using AI Algorithms To Improve The Fin Services Industry. CB Insights Research.
  6. Corporate Finance Institute. (2021, January 27). Financial System.
  7. Deloitte France. (2018, December 11). Zen Risk [Video]. YouTube.
  8. Fenjiro, Y. (2018, September 7). Machine learning for Banking: Loan approval use case. Medium.
  9. [Graph]. (n.d.). SAS.
  10. Iyyengar, A. (2020, August 18). 40% of Financial Services Use AI for Credit Risk Management. Want to Know Why? Aspire Systems.
  11. Kabari, L. G. (n.d.). The neural network layers for credit risk evaluation [Graph].
  12. Kaggle: Credit risk (Model: Support Vector Machines). (2020). [Graph]. Kaggle.
  13. Kerr-Southin, M. (2021, January 22). How FIs use AI to manage credit risk. Brighterion.
  14. McKinney. (n.d.). Figure 3. Machine learning surfaces insights within large, complex data sets, enabling more accurate risk [Graph].
  15. McKinsey & Company. (2021). Figure 4. The combination of AI and analytics enhances the onboarding journey for each new customer. [Graph].
  16. McKinsey & Company, Härle, P., Havas, A., Kremer, A., Rona, D., & Samandari, H. (2015). The future of bank risk management.
  17. Misheva, B. H. (2021, March 1). Explainable AI in Credit Risk Management. ArXiv.Org.
  18. Part of the Future of Financial Services series from the World Economic Forum and Deloitte. (2018, September 7). The new physics of financial services: How artificial intelligence is transforming the financial ecosystem. Deloitte United Kingdom.
  19. Phaure, H., & Robin, E. (2020, April). deloitte_artificial-intelligence-credit-risk.pdf.
  20. PyData. (2017, June 13). Soledad Galli – Machine Learning in Financial Credit Risk Assessment [Video]. YouTube.
  21. Ray, S. (2020, December 23). Understanding Support Vector Machine(SVM) algorithm from examples (along with code). Analytics Vidhya.
  22. RISKROBOT TM – Explainable AI Automation in Credit Risk Management – SPIN Analytics Copyright 2020. (2018, November 27). [Video]. YouTube.
  23. Schroer, A. (2021, May 8). AI and the Bottom Line: 15 Examples of Artificial Intelligence in Finance. Built In.
  24. Stolba, S. L. (2021, February 15). Mortgage Debt Sees Record Growth Despite Pandemic. Experian.
  25. Survivor, T. W. S. (2021, February 10). How Do Banks Make Money: The Honest Truth. Wall Street Survivor.
  26. The AI In Fintech Market Map. (2017, March 28). [Graph]. CBINSIGHTS.
  27. Trust, D. B. (2020, October 17). Applying AI to Risk Management in Banking and Finance. What’s the latest? Deltec Bank & Trust.
  28. Use Cases of AI in the Banking Sector. (2021, April 21). USM.
  29. Weinland, D. (2018, October 28). Chinese banks start scanning borrowers’ facial movements. Financial Times.

Chatbots and Emojis For an Improved Human Experience

 Chirin Dirani


With the growing use of conversational user interfaces comes the need for a better understanding of the social and emotional characteristics embedded in online dialogues. Specifically, text-based chatbots face the challenge of conveying human-like behavior while being restricted to one channel of interaction: text. The aim of this paper is to investigate whether or not it is possible to normalize and formalize the use of emojis for a comprehensive and complete means of communication. In an effort to answer this question, the paper will investigate, as a primary source, the findings of a 2021 study from the University of Virginia, How emoji and word embedding helps to unveil emotional transitions during online messaging. The study found that chatbot design can be enhanced so that chatbots understand the “affective meaning of emojis.” With that ability, chatbots will be better able to understand the social and emotional state of their users and subsequently conduct a more naturalistic conversation with humans. The paper concludes by calling for more empirical research on chatbots using emojis for emotionally intelligent online conversations.


Throughout history, humans have established relationships using explicit means of communication, such as words, and implicit ones, such as body language. Most of these relationships were developed through face-to-face interactions. Body language delivers important visual clues to what is said; in fact, small clues such as facial expressions or gestures add a great deal of meaning to our words. In the last few decades, with the growing use of different forms of technology, people have shifted to communicating through online text and voice messaging. Chatbots, the common name for voice assistants, virtual assistants, and text chat, are an important technology with various implementations. Despite the widespread use of this service, especially to support businesses, chatbot technology still lacks efficiency due to the absence of body language. In this paper, I will explore the impact of using other informal means of communication in the chatbot texting service to replace body language and identify emotions. Using emojis to infer emotional changes during chatbot texting is one of the means I propose. In an effort to make text-based chatbots’ conversations with humans more efficient, I will try to answer the question of whether or not it is possible to normalize and formalize the use of emojis for a comprehensive and enhanced means of communication. For this purpose, I will start by looking into the history of the text chatbot service, and will then deblackbox the different layers, levels, and modules composing this technology. The aim of this paper is to contribute to finding solutions for the urgent challenges facing one of the fastest-growing services in the field of technology today.

Definition of Chatbots

According to Lexico, a chatbot is “A computer program designed to simulate conversation with human users, especially over the internet.” It is both an artificial intelligence program and a Human–Computer Interaction (HCI) model. This program uses natural language processing (NLP) and sentiment analysis to conduct an online conversation with humans or other chatbots via text or oral speech. Michael Mauldin coined the term “ChatterBot” for this kind of software in 1994, after creating the first Verbot, Julia. Today, chatbot is the common name for artificial conversation entities, interactive agents, smart bots, and digital assistants. Thanks to their flexibility, chatbot digital assistants have proved useful in many fields, such as education, healthcare, and business, and are used by organizations and governments on websites, in applications, and on instant messaging platforms to promote products, ideas, or services.

The interactivity of technology, in combination with artificial intelligence (AI), has greatly improved the ability of chatbots to emulate human conversations. However, chatbots still cannot match human conversational skills, because today’s chatbots are not fully developed to infer their user’s emotional state. Progress is achieved every day, and chatbots are gradually getting more intelligent and more aware of their interlocutor’s feelings.

The Evolution History of Chatbots

What is known today as a benchmark for Artificial Intelligence (AI), the “Turing test,” is rooted in Alan Turing’s well-known paper published in 1950, Computing Machinery and Intelligence. The overall idea of Turing’s paper is that machines, too, can think and are intelligent. We can consider this the starting point of bots in general. For Turing, “a machine is intelligent when it can impersonate a human and can convince its interlocutor, in a real-time conversation, that they are interacting with a human.”

In 1966, the German computer scientist and Massachusetts Institute of Technology (MIT) professor Joseph Weizenbaum built on Turing’s idea to develop the first chatterbot program in the history of computer science: ELIZA. This program was designed to emulate a therapist who would ask open-ended questions and respond with follow-ups. The main idea behind the software was to make ELIZA’s users believe that they were conversing with a real human therapist. For this purpose, Weizenbaum programmed ELIZA to recognize certain keywords in the input and regenerate an answer using those keywords from a pre-programmed list of responses. Figure 1 illustrates a human conversation with ELIZA. It clearly shows how the program picks up a word and responds by asking an open-ended question. For example, when the user said, “He says that I’m depressed much of the time,” ELIZA took the word “depressed” and used it to formulate its next response, “I am sorry to hear that you are depressed.” These open-ended questions created an illusion of understanding and of interaction with a real human being, even though the whole process was automated. PARRY, a more advanced successor to ELIZA, was created in 1972 and designed to act like a patient with schizophrenia. Like ELIZA, PARRY was a chatbot with limited capabilities in terms of understanding language and expressing emotions. In addition, PARRY responded slowly and could not learn from the dialogue.
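The keyword-and-template mechanism described above can be sketched in a few lines. This is an illustrative miniature, not Weizenbaum's actual script; the rules and wording are invented to mirror the example in the text:

```python
import re

# Minimal ELIZA-style rules: match a keyword in the user's input and
# substitute it into a canned response template.
RULES = [
    (re.compile(r"I'?m (depressed|sad|unhappy)", re.IGNORECASE),
     "I am sorry to hear that you are {0}."),
    (re.compile(r"my (mother|father|family)", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT = "Please go on."   # fallback keeps the conversation open-ended

def eliza_reply(text):
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1).lower())
    return DEFAULT

print(eliza_reply("He says that I'm depressed much of the time."))
# -> I am sorry to hear that you are depressed.
```

The fallback response is what produced ELIZA's illusion of attentiveness: even with no keyword matched, the program always had something therapist-like to say.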

Figure 1: A human conversation with ELIZA

The British programmer Rollo Carpenter was the first pioneer to use AI for a chatbot, Jabberwacky, back in 1982. Carpenter aimed at simulating a natural human chat that could pass the Turing test. “Jabberwacky was written in CleverScript, a language based on spreadsheets that facilitated the development of chatbots, and it used contextual pattern matching to respond based on previous discussions.” Like its predecessors, Jabberwacky could not respond at high speed or deal with large numbers of users.

The real evolution of chatbot technology happened in 2001, when it was made available on messengers such as America Online (AOL) and Microsoft (MSN). This new generation of chatbots “retrieved information from databases about movie times, sports scores, stock prices, news, and weather.” The improvement paved the way for real development in machine intelligence and human–computer communication.

A further improvement to AI chatbots came with the development of smart personal voice assistants built into smartphones and home speaker devices. These voice assistants receive voice commands, answer in a digital voice, and carry out tasks such as monitoring home automation devices, calendars, email, and other applications. Multiple companies introduced their own voice assistants: Apple Siri (2010), IBM Watson (2011), Google Assistant (2012), Microsoft Cortana (2014), and Amazon Alexa (2014). The main distinction between the new generation of chatbots and the old one is the quick, meaningful response to the human interlocutor.

By all means, 2016 was the year of chatbots. That year saw substantial development in AI technology, in addition to the introduction of the Internet of Things (IoT) to the field of chatbots. AI changed the way people communicated with service providers, since “social media platforms allowed developers to create chatbots for their brand or service to enable customers to perform specific daily actions within their messaging applications.” The integration of chatbots in the IoT scenario opened the door wide for the implementation of such systems. Thanks to developments in natural language processing (NLP), today’s chatbots, compared to ELIZA, can share personal opinions and are more relevant in their conversation; however, they can be vague and misleading as well. The important point to note here is that chatbots are still being developed and, as a technology, have not yet realized their full potential. This brief historical overview tells us that although the technology has experienced rapid development, it still promises a world of possibilities if properly utilized.

Chatbot Categories  

There are several ways to categorize chatbots (see Figure 2). First, they can be categorized by purpose, as either assistants or interlocutors. Assistant chatbots are developed to help users in their daily activities, such as scheduling an appointment, making a phone call, or searching for information on the internet. Second, chatbots can be grouped by communication technique: text, voice, image, or all of them together. Recently, chatbots have become able to respond to a picture, comment on it, and even express emotions towards it. Under this category also fall interpersonal chatbots, which offer services without being a friendly companion; intrapersonal chatbots, which are close companions living in their user’s domain; and inter-agent chatbots, which can communicate with other chatbots, such as Alexa and Cortana. The third categorization relates to the chatbot’s knowledge domain, that is, the access range provided to the bot. Based on the scope of this access, a bot can be either generic or domain-specific: generic bots can answer questions from any domain, while domain-specific chatbots respond only to questions about a particular field of knowledge. The fourth categorization is by class. Informative chatbots give information to their user, usually stored in a fixed source; chat-based (conversational) chatbots conduct a natural, human-like conversation with their user; and task-based chatbots handle different functions and are excellent at requesting information and responding to the user appropriately. It is also worth mentioning that the method a chatbot uses to generate its response classifies it as rule-based, retrieval-based, or generative.
This paper will focus on the class of bots that use text as their means of communication.

Figure 2: Chatbot categories

The Chatbots Technology

Depending on the algorithms and techniques, there are two main approaches for developing chatbot technology; the pattern matching and the pattern recognition using machine learning (ML) algorithms. In what follows, I will provide a brief description of each technique, however, this paper is concerned with AI/ML pattern recognition chatbots. 

Pattern Matching Model  

This technique is used in rule-based chatbots such as ELIZA, PARRY, and Jabberwacky. Such chatbots “match the user input to a rule pattern and select a predefined answer from a set of responses with the use of pattern matching algorithms.” In contrast to knowledge-based chatbots, rule-based ones are unable to generate new answers, because their knowledge comes from their developers, who encoded it in the shape of conversational patterns. Although these bots respond quickly, their answers are automated and not spontaneous like those of knowledge-based chatbots. Three main languages are used to develop chatbots with the pattern-matching technique: Artificial Intelligence Markup Language (AIML), RiveScript, and ChatScript.

Pattern Recognition Model: AI/ML Empowered

The main distinction between pattern matching and pattern recognition bots, or in more precise terms, rule-based and knowledge-based bots, is the presence of Artificial Neural Network (ANN) algorithms in the latter. By using AI/ML algorithms, these relatively new bots can extract content from their users’ input through natural language processing (NLP) and can learn from conversations. These bots need an extensive training data set, as they do not rely on a predefined response for every input. Today, developers use ANNs in the architecture of ML-empowered chatbots. It is useful to mention here that retrieval-based chatbots use ANNs to select the most relevant response from a set of responses, whereas generative chatbots synthesize their reply using deep learning techniques. The focus of this paper is on chatbots using deep learning methods, since this is the dominant technology in today’s chatbots.

Deblackboxing Chatbot technology

Uncovering the different layers, levels, and modules in the chatbots will help us to better understand this technology and the way it works. In fact, there are many designs that vary depending on the type of chatbot. The following description reveals the key design principles and main architecture that applies to all chatbots.

Figure 3: Demonstration of the general architecture of an AI chatbot across the entire process. Source: How emoji and word embedding helps to unveil emotional transitions during online messaging

Analyzing Figure 3, we can see the different layers of operation within a chatbot: the user interface layer, the user message analysis layer, the dialog management layer, the backend layer, and finally the response generation layer. The chatbot process begins when the software receives the user’s input as text through an application. The input is then sent to the user message analysis component to find the user’s intention, following pattern matching or machine learning approaches. In this layer, Natural Language Processing (NLP) breaks the input down, comprehends its meaning, and checks and corrects spelling mistakes. The user’s language is identified and translated into the language of the chatbot by Natural Language Understanding (NLU), a “subset of NLP that deals with the much narrower, but equally important facet of how to best handle unstructured inputs and convert them into a structured form that a machine can understand” and act on accordingly. The dialog management layer then controls and updates the conversation context and asks follow-up questions after the intent is recognized. After intent identification, the chatbot either responds directly or retrieves the information needed to fulfill the user’s intent from the backend through external Application Programming Interface (API) calls or database requests. Once the appropriate information is extracted, it is forwarded to the dialog management module and then to the response generation module, which uses Natural Language Generation (NLG) to turn structured data into text output that answers the main query.
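The layered flow just described can be sketched as a chain of small functions. Everything here is illustrative scaffolding (the intent names, the hard-coded city, and the canned backend data are invented), not a real chatbot framework:

```python
# Schematic sketch of the chatbot layers: message analysis -> dialog
# management -> backend -> response generation. All names are illustrative.
def analyze_message(text):
    """User message analysis layer: NLU maps raw text to an intent."""
    text = text.lower()
    if "weather" in text:
        return {"intent": "get_weather", "slots": {"city": "Washington"}}
    return {"intent": "unknown", "slots": {}}

def query_backend(intent):
    """Backend layer: stand-in for an external API call or database request."""
    if intent["intent"] == "get_weather":
        return {"forecast": "sunny", "city": intent["slots"]["city"]}
    return None

def generate_response(intent, data):
    """Response generation layer: NLG turns structured data into text."""
    if data is None:
        return "Sorry, I did not understand that."
    return f"The forecast for {data['city']} is {data['forecast']}."

def dialog_manager(user_text):
    """Dialog management layer: orchestrates the other layers in order."""
    intent = analyze_message(user_text)
    data = query_backend(intent)
    return generate_response(intent, data)

print(dialog_manager("What is the weather today?"))
# -> The forecast for Washington is sunny.
```

In a production system each function would be a full module (an NLU model, a context store, real API clients), but the control flow between the layers is essentially this pipeline.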

Chatbot architecture is supported today by three important technology trends: AI/ML algorithms, big data, and cloud computing. AI/ML enables intelligent algorithms capable of learning on the go; these are the artificial neural networks (ANNs), trained on data, that give chatbot outputs greater “accuracy” (a lower error rate). Big data, in turn, feeds the data-hungry ANN algorithms with the large amounts of data that enrich a chatbot’s backend storage. Finally, the vast amount of output data from AI-trained chatbots needs the scalability and extensibility offered by cloud computing, in the shape of cheap, extensible storage. This unique combination “offers huge advantages in terms of installation, configuration, updating, compatibility, costs and computational power” for chatbots.

The above shows that chatbot technology is complex and intricate. Nevertheless, it is at the same time flexible and can easily be further developed and upgraded with new layers. After analyzing the chatbot layers and gaining a better understanding of the role each one plays, we can incorporate our desired upgrades and prepare for a new generation of chatbots that are able to relate to their interlocutors’ emotions over text.

Discussion around the Main Argument:

As humans, we develop relationships through everyday face-to-face interactions. Body language delivers important visual clues to what we say; in fact, small clues such as facial expressions or gestures add a great deal of meaning to our words. In the 1960s, Professor Albert Mehrabian formulated the 7-38-55% communication rule about the role of nonverbal communication and its impact during face-to-face exchanges (see Figure 4). According to this rule, “only 7% of communication of feelings and attitudes takes place through the words we use, while 38% takes place through tone and voice and the remaining 55% of communication take place through the body language we use.”

Figure 4: Theory of communication

In the last twenty years, with the growing use of different forms of technology, people have shifted to communicating through text and voice messaging in the online space. Chatbots are one of the important technologies with various implementations. Despite their widespread use, chatbots still lack efficiency due to the absence of body language and of the ability to infer the emotions, feelings, and attitudes of their interlocutor. To solve this issue, many researchers have proposed different scenarios. Our guide in this discussion is the primary source, “How emoji and word embedding helps to unveil emotional transitions during online messaging.” This study, the first of its kind, from the University of Virginia, suggests that using emojis and word embedding to model emotional changes during social media interactions is an alternative approach to making text-based chatbot technology more efficient. The study also advocates for extended affective dictionaries that include emojis, which would help chatbots work more efficiently. The study “explores the components of interaction in the context of messaging and provides an approach to model an individual’s emotion using a combination of words and emojis.” According to the study, detecting the user’s emotion during the dialogue session will improve chatbots’ ability to have a “more naturalistic communication with humans.”

Moeen Mostafavi and Michael D. Porter, the researchers who conducted this project, believe that tracking a chatbot user’s emotional state during the communication process needs a “dynamic model.” For this model they consulted Affect Control Theory (ACT) to track the changes in the user’s emotional state after every dialogue turn with the chatbot. Figure 5 demonstrates the interaction between a customer and a chatbot using emojis. This interesting study concludes with an important finding: chatbot design can be enhanced so that chatbots understand the “affective meaning of emojis.” However, dictionaries need to be extended to support the researchers’ use of ACT in new designs for chatbot behaviors. The researchers argue that the increasing use of emojis in social media communication today will make it easier to add them to dictionaries in support of these efforts.
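To illustrate what an emoji-aware affective dictionary might look like in practice, the sketch below scores each dialogue turn by averaging hand-made valence values for known words and emojis. The dictionaries and numbers are invented for illustration and are not taken from Mostafavi and Porter's study, which uses ACT and word embeddings rather than a simple lookup:

```python
# Tiny invented "affective dictionary": valence in [-1, 1] for a few
# emojis and words (illustrative only, not the study's actual data).
EMOJI_VALENCE = {"😀": 0.8, "🙂": 0.4, "😐": 0.0, "😟": -0.4, "😡": -0.8}
WORD_VALENCE = {"great": 0.6, "thanks": 0.5, "problem": -0.3, "angry": -0.7}

def message_valence(message):
    """Average the affective scores of any known words and emojis."""
    scores = []
    for token in message.split():
        for char in token:                     # emojis are single characters
            if char in EMOJI_VALENCE:
                scores.append(EMOJI_VALENCE[char])
        word = token.strip(".,!?'").lower()
        if word in WORD_VALENCE:
            scores.append(WORD_VALENCE[word])
    return sum(scores) / len(scores) if scores else 0.0

# Track the user's emotional trajectory across dialogue turns,
# as a chatbot's dialog manager might.
turns = ["Thanks, that's great! 😀", "There is a problem 😟", "I'm angry 😡"]
trajectory = [message_valence(t) for t in turns]
print(trajectory)  # drifts from positive toward negative
```

A chatbot watching this trajectory fall could switch to a more apologetic response strategy, which is the kind of emotionally adaptive behavior the study envisions.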

As this research paper demonstrates, the flexibility of chatbots and recent technological advances make it easy for chatbot designers to incorporate the use of emojis in a more intelligent manner. This integration would increase the tool’s ability to understand, analyze, and respond to the emotional changes the human on the other end of the chat is experiencing. Nevertheless, I suggest that the challenge of this process lies in building a rich foundation for these emojis in the dictionaries, which requires collaboration at higher levels.

Figure 5: An interaction between a user and a Chatbot using emojis. Source: How emoji and word embedding helps to unveil emotional transitions during online messaging


Taking into account the significant financial and human capital investment committed to the development of chatbots and other AI-driven conversational user interfaces, it is necessary to understand this complex technology. The focus of the chatbot community has so far concentrated on language factors, such as NLP. This paper argues that it is equally important to start investing heavily in the social and emotional factors in order to enhance the abilities of text-based AI-driven chatbots. Chatbots have a long way to go before they realize their full potential and pass the Turing test; however, promising improvements have surfaced in the last few years. The goal of this paper was to investigate whether or not it is possible to harness an already available tool, emojis, to enhance the communication power of chatbots. A unique, newly published primary source was investigated to help answer this question. Understanding the evolution history of chatbots, their categories, and their technology was important in order to deblackbox this complex technology. This clarity helps us realize that adding emojis to this complex process is not easy but still not impossible, given the additional support provided by three important technologies: AI/ML, big data, and cloud computing. Using emojis in chatbots involves modifying the main chatbot architecture by adding a new layer, and traditional dictionaries will need to be extended with emojis to support the process. The primary source found evidence that implementing this new approach will provide chatbots with new abilities and make them more intelligent. To conclude, there is still a need for more empirical research on chatbots’ use of emojis as leverage.
The process is not easy, but given the huge investment in and growing need for chatbots in many fields, the potential outcomes of such research would be groundbreaking and would transform the human experience with chatbots as a tool of support.


Adamopoulou, Eleni, and Lefteris Moussiades. “Chatbots: History, Technology, and Applications.” Machine Learning with Applications 2 (December 15, 2020): 100006.

The British Library. “Albert Mehrabian: Nonverbal Communication Thinker.” The British Library. Accessed May 8, 2021.

“Chatbot.” In Wikipedia, May 6, 2021.

“CHATBOT | Definition of CHATBOT by Oxford Dictionary on Lexico.Com Also Meaning of CHATBOT.” Accessed May 8, 2021.

Fernandes, Anush. “NLP, NLU, NLG and How Chatbots Work.” Medium, November 9, 2018.

Hoy, Matthew B. “Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants.” Medical Reference Services Quarterly 37, no. 1 (January 2, 2018): 81–88.

“Jabberwacky.” In Wikipedia, November 15, 2019. 

Jurafsky, Daniel, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2008.

Kar, Rohan, and Rishin Haldar. “Applying Chatbots to the Internet of Things: Opportunities and Architectural Elements.” International Journal of Advanced Computer Science and Applications 7, no. 11 (2016): 8.

Mell, Peter, and Tim Grance. “The NIST Definition of Cloud Computing.” National Institute of Standards and Technology, September 28, 2011.

Molnár, György, and Zoltán Szüts. “The Role of Chatbots in Formal Education.” In 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), 000197–000202, 2018.

Mostafavi, Moeen, and Michael D. Porter. “How Emoji and Word Embedding Helps to Unveil Emotional Transitions during Online Messaging.” ArXiv:2104.11032 [Cs, Eess], March 23, 2021.

Shah, Huma, Kevin Warwick, Jordi Vallverdú, and Defeng Wu. “Can Machines Talk? Comparison of Eliza with Modern Dialogue Systems.” Computers in Human Behavior 58 (May 1, 2016): 278–95.

Shum, Heung-yeung, Xiao-dong He, and Di Li. “From Eliza to XiaoIce: Challenges and Opportunities with Social Chatbots.” Frontiers of Information Technology & Electronic Engineering 19, no. 1 (January 1, 2018): 10–26.


Deblackboxing “Translation” Paper on Amazon Echo Plus


Heba Khashogji


Smart speakers have gained wide popularity because they offer users greater convenience. Still, some users feel that this type of device violates their privacy and that there is no point in using it. In this paper, we will discuss the Amazon Echo Plus, its main components, and how it works step by step, following the “deblackboxing” method.


At present, smart speakers have become widely used among people, and according to Jakob and Wilhelm (2020), Amazon dominates the smart speaker market along with Google, Xiaomi, and others, and among these speakers is the Amazon Echo family.

In this paper, we will talk specifically about the Amazon Echo Plus “smart speaker.” This speaker is powered by Amazon’s cloud-based voice service, known as Alexa. Smart speakers have many uses, including healthcare for the elderly: Ries and Sugihara (2018) and Robinson et al. (2014) claimed that the technology has proven able to provide healthcare thanks to its existing functions, and the Amazon Echo Plus is used as an alternative healthcare provider to humans in the early stages of dementia. These devices also use the Internet of Things (IoT), which helps control home appliances by voice recognition. Through such devices, one can also listen to music on demand, by any artist or genre, from platforms such as Spotify, Amazon Music, and Apple Music.

  1. Systems Thinking

Amazon Echo Plus starts working when it hears the word “Alexa” from a user. The word Alexa refers to the virtual assistant from Amazon. The alert word “Alexa” can be changed later to “Echo,” “Amazon,” or “Computer.”

When the virtual assistant hears the alert word, it starts working, and the ring at the top lights up in blue color. Then Echo Plus can be asked any question, for example, about the weather, and it answers the weather with a summary of what the weather will be like during the day.

Echo Plus has a built-in hub that supports and controls ZigBee smart devices, such as light bulbs and door locks, which can be bound to the home assistant by asking Alexa to “discover the devices.” When the user asks Amazon Echo Plus a question by voice command, Echo Plus records the audio and sends it to the Amazon cloud servers. These servers convert the recorded voice into text, which is analyzed so that Alexa can find the best way to answer it. The answer is converted back to audio, and this information is sent to the Echo Plus smart speaker, which plays the audio response (Rak et al., 2020).

Amazon Echo Plus features local voice control, which allows us to control home devices without any internet connection. However, to listen to music from Spotify or Amazon, an internet connection is required.

  2. Design Thinking and Semiotic Thinking

Below is a simple example that shows how the Amazon Echo Plus works. For illustration, we will assume in this example that the skill responds with “Hello world”:

First, to start the device, the user speaks to it; when the device hears the wake word “Alexa,” it starts to listen. Second, the Amazon Echo Plus sends the speech to the Alexa service in the cloud to recognize it. The speech is converted into text, and natural language processing is performed to identify the purpose of the request. Third, Alexa sends a JSON file containing the request to a Lambda function to handle it. AWS Lambda is an Amazon Web Services offering that runs the user’s code only when needed, so there is no need to run servers continuously. In our example, the Lambda function returns “Welcome to the Hello world” and sends it to the Alexa service. Fourth, Alexa receives the JSON response and converts the resulting text into an audio file. Finally, the Amazon Echo Plus receives the audio and plays it for the user. Figure 1 shows how the user interacts with the Amazon Echo Plus device (Amazon Alexa, n.d.).
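The round trip above can be sketched in simplified form. The snippet below is a hypothetical illustration using plain Python dictionaries to stand in for the JSON exchanged between the Alexa service and the Lambda function; the real Alexa Skills Kit defines a fuller request/response schema and its own handler SDK.

```python
# Simplified sketch of the Alexa skill round trip described above.
# The dict shapes loosely mirror Alexa's request/response JSON, but the
# field names here are illustrative, not the full official schema.

def lambda_handler(event, context=None):
    """Hypothetical Lambda function: inspect the intent and build a reply."""
    intent = event["request"]["intent"]["name"]
    if intent == "HelloWorldIntent":
        text = "Welcome to the Hello world"
    else:
        text = "Sorry, I don't know that one."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text}
        },
    }

# After speech-to-text, the Alexa service would send something like this:
request = {"request": {"intent": {"name": "HelloWorldIntent"}}}
reply = lambda_handler(request)
print(reply["response"]["outputSpeech"]["text"])  # Welcome to the Hello world
```

The text returned here is what Alexa would then convert back into an audio file for the speaker to play.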

Figure 1: User Interaction with Amazon Echo Plus (Alexa Developer, n.d.)


  3. JSON (Intent/Response)

“JavaScript Object Notation” (JSON) is a data format used chiefly by web applications for communication. JSON syntax is based on JavaScript object notation syntax (Wazeed, 2018):

  • Data is in name/value pairs. Example: {"fruit": "Banana"}.
  • Data is separated by commas. Example: {"fruit": "Banana", "color": "yellow"}.
  • Curly braces hold objects.
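These rules can be checked with Python's standard json module; the fruit object below is the one from the bullets above.

```python
import json

# A JSON object: data in name/value pairs, separated by commas,
# held together by curly braces.
text = '{"fruit": "Banana", "color": "yellow"}'

obj = json.loads(text)       # parse JSON text into a Python dict
print(obj["fruit"])          # Banana
print(json.dumps(obj))       # serialize back to a JSON string
```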

Figure 2 shows an example of JSON code. Inside the intents array there is a HelloWorldIntent and one of the built-in intents, AMAZON.HelpIntent. AMAZON.HelpIntent responds to sentences containing words or phrases that indicate the user needs help, such as “help.” Alexa creates an intent JSON file after it converts speech to text.

Figure 2: An Example of JSON Code (Ralevic, 2018)

  4. Text to Speech System

Text-to-speech is done in several stages. The input to a Text-to-Speech (TTS) system is text, which is analyzed and converted into an acoustic description, after which the speech waveform is generated. The main units of a text-to-speech architecture are as follows (Isewon et al., 2014). Figure 3 shows a text-to-speech system:

Figure 3: Text to Speech System (Isewon et al., 2014)

  • Natural Language Processing Unit (NLP): It produces a phonetic transcription of the input text. The primary operations of the NLP unit are as follows:
    • Text analysis: First, the text is decomposed into tokens. Token-to-word conversion creates the orthographic form of each token. For example, the token “Mr” is transformed into “Mister”; this is called expansion.
    • Application of the pronunciation rules: after the first stage is complete, the pronunciation rules are applied. In some cases a letter corresponds to no sound (for example, “g” in “sign”), or multiple characters correspond to a single phoneme (such as “ch” in “teacher”). There are two approaches to determining pronunciation:
      • Dictionary-based with morphological components: as many words as possible are stored in a dictionary. Pronunciation rules determine the pronunciation of words that are not found in the dictionary.
      • Rule-based: pronunciations are created from phonological knowledge; only words whose pronunciation is an exception are included in the dictionary.

If the dictionary-based method has a sufficiently large phonetic dictionary, it is more exact than the rule-based method.
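A minimal sketch of the two approaches combined: dictionary lookup first, with a crude rule-based fallback. The dictionary entries and letter rules below are invented for illustration; real systems use dictionaries with hundreds of thousands of entries and far richer rules.

```python
# Tiny illustrative pronunciation dictionary (ARPAbet-like symbols).
PRON_DICT = {
    "teacher": "T IY CH ER",   # "ch" maps to a single phoneme, CH
    "sign":    "S AY N",       # the "g" corresponds to no sound
}

# Crude letter-to-sound fallback rules for words not in the dictionary.
LETTER_RULES = {"a": "AE", "b": "B", "c": "K", "d": "D", "t": "T"}

def pronounce(word):
    """Dictionary lookup first; fall back to per-letter rules."""
    word = word.lower()
    if word in PRON_DICT:
        return PRON_DICT[word]
    return " ".join(LETTER_RULES.get(ch, ch.upper()) for ch in word)

print(pronounce("sign"))  # dictionary hit: S AY N
print(pronounce("cat"))   # rule-based fallback: K AE T
```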

  • Prosody Generation: after the pronunciation is specified, the prosody is created. Prosody is essential for conveying an affective state. If a person says, “It is a delicious pizza,” the intonation can reflect whether that person likes the pizza or not. Prosody generation in a TTS system depends on many factors, such as intonation modeling (phrasing and accentuation) and amplitude and length modeling (including sound length and pauses, which determine syllable length and speech tempo) (Isewon et al., 2014).
  • Digital Signal Processing Unit (DSP): It converts the symbolic information received from the NLP unit into understandable speech.
  5. Convert Text to Tokens

Alexa divides speech into tokens according to the following (Gonfalonieri, 2018; Trivedi et al., 2018):

  1. Wake-up word: The wake-up word “Alexa” tells the Amazon Echo Plus to start listening to the user’s commands.
  2. Launch word: A launch word is a transitional action word indicating to Alexa that a skill invocation will likely follow. Typical launch words include “tell,” “ask,” and “open.”
  3. Invocation name: To initiate an interaction with a skill, the user says the skill’s invocation name. For example, to use the weather skill, a user could say, “Alexa, what’s the weather today?”
  4. Utterance: An utterance is the user’s spoken request. Utterances can invoke a skill and provide input to a skill.
  5. Prompt: A string of text that is spoken to the user to request information. You include prompt text in your response to a user request.
  6. Intent: An action that fulfills the user’s spoken request. Intents can optionally contain arguments called slots.
  7. Slot value: Slots are input values provided in the user’s spoken request. These values help Alexa determine the user’s intent.
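As a rough illustration of categories 1–3, a toy splitter can pick out the wake-up word and launch word from the start of a request. This is a hypothetical simplification; Alexa's real parsing is statistical, not rule-based.

```python
# Toy tokenizer for the first categories above: wake word, launch word,
# and the remainder of the request. Purely illustrative.
LAUNCH_WORDS = {"tell", "ask", "open"}

def split_request(utterance):
    """Toy parse: wake word, optional launch word, rest of the request."""
    words = utterance.lower().replace(",", "").split()
    wake = words[0] if words and words[0] == "alexa" else None
    rest = words[1:]
    launch = rest[0] if rest and rest[0] in LAUNCH_WORDS else None
    if launch:
        rest = rest[1:]
    return wake, launch, " ".join(rest)

print(split_request("Alexa, ask Horoscope for Friday"))
# ('alexa', 'ask', 'horoscope for friday')
```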

Figure 4 shows the user providing input information: the travel date, Friday. This value fills an intent slot, which Alexa passes to Lambda to be processed by the skill code.

Figure 4: Dividing Words Into Tokens ( Amazon Alexa, n.d.)

  6. Speech Recognition

Speech recognition is the machine’s ability to identify words and phrases in spoken language and convert them into text that the machine can handle (Trivedi et al., 2018). There are three approaches a computer can use to match speech with stored phonetic representations:

  • Acoustic-phonetic approach: the Hidden Markov Model (HMM) is used in this approach. An HMM is a non-deterministic probability model for speech recognition. It consists of two kinds of variables: the hidden states, which are the phonemes stored in computer memory, and the visible observations, which are frequency segments of the digital signal. Each phoneme has a probability, and each segment is matched with a phoneme according to that probability. The matched phonemes are then assembled into words according to the language’s grammar rules, which are stored in advance.
  • Pattern recognition approach: Speech recognition is an area of pattern recognition, falling under what is known as supervised learning. In supervised learning we have a dataset where both the input (the audio signal) and the output (the text corresponding to the audio signal) are known. The dataset is divided into two sets, a training set and a testing set, and learning proceeds in two phases. In the training phase, the training set is fed into a specified model and trained over a number of iterations to produce the trained model. The trained model is then evaluated on the test set to ensure that it operates properly. At recognition time, the user’s voice is matched against the previously trained patterns until the recognized sentence is produced as text (Trivedi et al., 2018).
  • Artificial intelligence approach: it is based on combining the main knowledge sources: acoustic knowledge based on spectral measurements, semantic knowledge about meaning, and syntactic knowledge about word order.
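To make the acoustic-phonetic idea concrete, here is a minimal Viterbi decoder over a toy HMM in which the hidden states are phonemes and the observations are crude frequency-band labels. All states and probabilities below are invented for illustration.

```python
# Toy HMM: hidden phoneme states, observed frequency-band labels.
states = ["AE", "T"]
start_p = {"AE": 0.6, "T": 0.4}
trans_p = {"AE": {"AE": 0.3, "T": 0.7}, "T": {"AE": 0.6, "T": 0.4}}
emit_p = {"AE": {"low": 0.7, "high": 0.3}, "T": {"low": 0.2, "high": 0.8}}

def viterbi(observations):
    """Return the most probable phoneme sequence for the observations."""
    # prob[s] = best probability of any path ending in state s
    prob = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prob[p] * trans_p[p][s])
            new_prob[s] = prob[best_prev] * trans_p[best_prev][s] * emit_p[s][obs]
            new_path[s] = path[best_prev] + [s]
        prob, path = new_prob, new_path
    best = max(states, key=lambda s: prob[s])
    return path[best]

print(viterbi(["low", "high"]))  # ['AE', 'T']
```

A real recognizer works the same way in spirit, but over thousands of states and probabilities learned from data rather than hand-written tables.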

Figure 5 shows a typical speech recognition system.

Figure 5: Typical Speech Recognition System (Samudravijaya, 2002)

  7. User-Speaker Interaction

Amazon Echo Plus has powerful microphones. The device does not need to be manually activated; the microphone is always on, waiting for the wake-up word “Alexa” (Jakob & Wilhelm, 2020). Figure 6 shows the voice processing system. The microphones in the Echo Plus convert the voice signal, which is a continuous (analog) signal, into a digital signal. The process of converting an analog signal to a digital signal has three stages:

  • Sampling: Samples are taken at equal time intervals, at a rate called the sampling frequency. By Nyquist’s theorem, the sampling frequency must be at least twice the maximum frequency of the input signal.
  • Quantization: The second step assigns a numerical value to the voltage level. This process searches for the value closest to the signal amplitude out of a specific number of possible values covering the whole amplitude range. The number of quantizer levels must be a power of 2 (such as 128 or 256).
  • Coding: After the closest discrete value is identified, a binary numerical value is assigned to it. The quantization and encoding process cannot be entirely exact and only provides an approximation of the real values. The higher the quantizer resolution, the closer this approximation is to the real value of the signal (Pandey, 2019).
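The three stages can be sketched numerically. This toy example samples a 1 kHz sine at 8 kHz (comfortably above the Nyquist rate), quantizes each sample to one of 256 levels, and encodes each level as an 8-bit binary number; the rates and values are illustrative.

```python
import math

SAMPLE_RATE = 8000   # Hz, more than 2x the 1 kHz signal frequency (Nyquist)
LEVELS = 256         # quantizer levels, a power of 2 (8 bits)

def sample(freq_hz, n_samples):
    """Sampling: read the analog signal at equal time intervals."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

def quantize(x):
    """Quantization: map an amplitude in [-1, 1] to the nearest of 256 levels."""
    return min(LEVELS - 1, int((x + 1) / 2 * LEVELS))

def encode(level):
    """Coding: assign each discrete level an 8-bit binary number."""
    return format(level, "08b")

samples = sample(1000, 4)
digital = [encode(quantize(s)) for s in samples]
print(digital[0])  # sin(0) = 0.0 -> level 128 -> '10000000'
```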

Figure 6: Voice Processing System (Abdullah et al., 2019)

According to Abdullah et al. (2019), audio is preprocessed to remove noise and then passed to the signal processing phase. Preprocessing involves applying a low-pass filter to remove noise from the voice background. A low-pass filter is a frequency filter that passes signals with frequencies below the cutoff frequency and blocks frequencies above it, as shown in figure 7.

Signal processing, on the other hand, is a major component of voice processing: it captures the most important parts of the input signal. Its central operation is the Fast Fourier Transform (FFT). The Fourier transform converts a signal from the time domain to the frequency domain, and the FFT is an algorithm that computes the discrete transform faster than computing it directly (Maklin, 2019). Taking the FFT and its magnitude generates a frequency-domain representation of the audio called a magnitude spectrum.
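The magnitude-spectrum step can be sketched with a naive discrete Fourier transform; an FFT computes exactly the same values, only faster. The signal below is a pure tone chosen so that its energy lands in a single frequency bin.

```python
import cmath, math

def dft_magnitudes(signal):
    """Naive discrete Fourier transform magnitudes. An FFT computes the
    same result in O(N log N) time instead of O(N^2)."""
    n = len(signal)
    mags = []
    for k in range(n // 2):  # only the first half is unique for real input
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(abs(s))
    return mags

# 64 samples of a tone that completes 8 cycles in the window:
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
spectrum = dft_magnitudes(signal)        # the "magnitude spectrum"
peak_bin = spectrum.index(max(spectrum))
print(peak_bin)  # 8 -> energy is concentrated at the tone's frequency bin
```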

Figure 7: Ideal Low Pass Filter (Obeid et al., 2017)

  8. Ethics and Policy

Intelligent systems, including Internet of Things (IoT) systems, manage a very large amount of personal data, which many users with limited experience are unaware of. These devices also control most home appliances, such as air conditioning, lighting, and washing machines, which makes this type of system questionable in terms of security and privacy (Rak et al., 2020). One of the main reasons users hesitate to adopt IoT systems is that these systems collect, process, and share personal user data with other parties; many IoT systems collect user data without their knowledge or consent (Thorburn et al., 2019).

A woman from Oregon discovered that her smart assistant had recorded a voice call between her and her husband and sent the recording to one of the contacts on her phone. Violations like these have led to the adoption of privacy regulations such as the European General Data Protection Regulation (GDPR). The GDPR is a European Union law whose main aim is to give people control over their personal data and prevent that data from being shared without their consent. It consists of provisions and requirements related to the processing of the personal data of people located in the EU (Thorburn et al., 2019).

To this end, Echo Plus always listens for its wake word “Alexa” and starts to work when it thinks it has heard it; it then begins to record the voice and receive commands, which can be seen from the blue light of the ring at the top of the device. Otherwise it records nothing and simply waits for the wake word. Amazon uses encryption to protect the audio recordings that Alexa uploads, and these audio files can be deleted whenever the user wants. Amazon Echo Plus also allows the user to stop recording by pressing the “mute button,” which prevents the microphone from hearing anything, even the wake word; the ring then turns red (Crist & Gebhart, 2018).


This paper discussed the most popular smart speaker, the Amazon Echo Plus. It explained how the device works and its main components, covering natural language processing, converting speech to text, and converting text to speech. Finally, the paper elaborated on ethics and how the device tries to provide more privacy for users.


  1. Abdullah, H., Garcia, W., Peeters, C., Traynor, P., Butler, K. R., & Wilson, J. (2019). Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems. arXiv:1904.05734v1.
  2. Alexa Developer (n.d.). Build an Engaging Alexa Skill Tutorial. Retrieved from
  3. Amazon Alexa (n.d.). Build an Engaging Alexa Skill Tutorial. Retrieved from
  4. Crist, R., & Gebhart, A. (2018, September 21). Retrieved from
  5. Gonfalonieri, A. (2018, November 21). How Amazon Alexa works? Your guide to Natural Language Processing (AI). Retrieved from Towards Data Science:
  6. Isewon, I., Oyelade, J., & Oladipupo, O. (2014). Design and Implementation of Text To Speech Conversion for Visually Impaired People. International Journal of Applied Information Systems (IJAIS).
  7. Jakob, D., & Wilhelm, S. (2020). Amazon Echo: A Benchmarking Model Review. Retrieved from
  8. Maklin, C. (2019, December 19). Fast Fourier Transform. Retrieved from
  9. Obeid, H., Khettab, H., Marais, L., & Hallab, M. (2017). Evaluation of Arterial Stiffness by Finger-Toe Pulse Wave Velocity: Optimization of Signal Processing and Clinical Validation. Journal of Hypertension. DOI:10.1097/HJH.0000000000001371.
  10. Pandey, H. (2019, November 25). Analog to Digital Conversion. Retrieved from
  11. Rak, M., Salzillo, G., & Romeo, C. (2020). Systematic IoT Penetration Testing: Alexa Case Study. Italian Conference on Cyber Security (pp. 190-200). Ancona.
  12. Ralevic, U. (2018, July 24). How To Build A Custom Amazon Alexa Skill, Step-By-Step: My Favorite Chess Player. Retrieved from
  13. Ries, N., & Sugihara (2018, December 10). Robot revolution: Why technology for older people must be designed with care and respect. Retrieved from
  14. Robinson, H., MacDonald, B., & Broadbent, E. (2014). The role of healthcare robots for older people at home: A review. International Journal of Social Robotics, 6(4), 575-591.
  15. Samudravijaya, K. (2002). Automatic Speech Recognition. Tata Institute of Fundamental Research. Retrieved from
  16. Thorburn, R., Margheri, A., & Paci, F. (2019). Towards an integrated privacy protection framework for IoT: contextualising regulatory requirements with industry best practices. DOI:10.1049/cp.2019.0170.
  17. Trivedi, A., Pant, N., Pinal, P., Sonik, S., & Agrawal, S. (2018). Speech to text and text to speech recognition systems-A review. IOSR Journal of Computer Engineering (IOSR-JCE), 36-43.
  18. Wazeed (2018, June 6). JavaScript JSON. Retrieved from

Final Paper: De-Blackboxing Facial Recognition

Chloe Wawerek

De-Blackboxing Facial Recognition

As the Key to Ethical Concerns 


Facial recognition has received attention in recent news for biases in machines’ ability to detect faces and the repercussions this has for minorities. After reviewing the design principles behind facial recognition through material on the history and evolution of AI, I am confident that the ethical issues facing facial recognition lie not in the technology itself but in how humans shape the technology. Case studies from Pew and MIT emphasize that skewed data affects how algorithms process and learn, which leads some technology to have what are called ingrained biases. This becomes easier to address once facial recognition is de-blackboxed, which is what I aim to do in this paper.

  1. Introduction

There are certain things that we take for granted as humans. The ability to see, speak, and comprehend are just a few things we can do but have difficulty explaining how we do them. Scientists are trying to replicate what minds can do, like the tasks mentioned above, in computers; this effort constitutes a broad field better known as AI. However, in technology everything is an embodiment of electricity designed to represent 1s and 0s, also known as bits. Strings of bits are then defined by programs to be symbols, like numbers. Computing therefore represents how humans impose a design on electricity to perform as logic processors. As a result, AI programs do not have a human’s sense of relevance, i.e. common sense, because among other things they do not know how to frame a situation: implications tacitly assumed by human thinkers are ignored by the computer because they have not been made explicit (Boden). In the words of Grace Hopper, “The computer is an extremely fast moron. It will, at the speed of light, do exactly what it is told to do—no more, no less.” So, the question is how humanity started concerning itself with the ability of computers to recognize faces. I want to examine how facial recognition works and de-blackbox this ability down to the design processes that set the foundation for the technology. Starting from the concept of computation, I will trace the evolution of facial recognition to highlight the root issues regarding this technology. Fundamentally, computers have a vision problem because they cannot understand visual images as humans do. Computers need to be told exactly what to look for, and what not to, when identifying images and solving problems: hence, extremely fast morons. Understanding this, we need to look deeper into why issues exist if humans set the precedent for what computers should see versus what they do see.

  2. Facial Recognition

2.1 Computation to AI

The designs for computing systems and AI have been developed by means of our common human capacities for symbolic thought, representation, abstraction, modeling, and design (Irvine). Computing systems are human-made artifacts composed of elementary functional components that act as an interface between the functions performed by those components and the surroundings in which they operate (Simon). Those functions combine, sequence, and activate symbols that mean (“data representations”) and symbols that do (“programming code”) in automated processes for any programmable purpose (Irvine). Computers, then, are nothing more than machines for following instructions, and those instructions are what we call programs and algorithms. Roughly speaking, all a computer can do is follow lists of instructions such as the following:

  • Add A to B
  • If the result is bigger than C, then do D; otherwise, do E
  • Repeatedly do F until G
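A literal (and entirely hypothetical) rendering of such an instruction list as code, with arbitrary choices for what D, E, F, and G stand for:

```python
def follow_instructions(a, b, c):
    """A literal rendering of the instruction list above."""
    result = a + b                 # Add A to B
    if result > c:                 # If the result is bigger than C...
        result = result * 2        # ...then do D (here: double it)
    else:
        result = result - 1        # ...otherwise do E (here: subtract one)
    while result % 10 != 0:        # Repeatedly do F until G
        result += 1                # (here: count up until a multiple of 10)
    return result

print(follow_instructions(3, 4, 5))  # 3+4=7 > 5 -> 14 -> count up to 20
```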

Computers, then, can reliably follow very simple instructions very, very quickly, and they can make decisions if those decisions are precisely specified (Woodbridge). If we are to build intelligent machines, their intelligence must ultimately reduce to simple, explicit instructions like these, which begs the question: can machines produce intelligent behavior simply by following lists of instructions? Well, AI takes inspiration from the brain. If we can understand how the brain handles information processing that surpasses engineered products (vision, speech recognition, learning), we can define solutions to these tasks as formal algorithms and implement them on computers (Alpaydin). Currently, a machine is said to have AI if it can interpret data, potentially learn from the data, and use that knowledge to adapt and achieve specific goals. However, within this definition there exist different interpretations of AI: strong vs. weak. Strong AI is when a program can understand in a similar way to a human. Weak AI is when a program can only simulate understanding. Scientists are still wrestling with the issue of AI comprehension, which involves understanding the human world and the unwritten rules that govern our relationships within it, by testing programs with Winograd schemas (Woodbridge).

Example: Question – Who [feared/advocated] violence?

Statement 1a: The city councilors refused the demonstrators a permit because they feared violence.

Statement 1b: The city councilors refused the demonstrators a permit because they advocated violence.

These problems consist of building computer programs that carry out tasks that currently require brain function, like driving cars or writing interesting stories. To do so, scientists use a process called machine learning, which aims to construct a program that fits a given dataset by creating a learning program that is a general model with modifiable parameters. Learning algorithms adjust the parameters of the model by optimizing a performance criterion defined on the data (Alpaydin). In layman’s terms, machine learning refers to algorithms that give computers the ability to learn from data and then make predictions and decisions, maximizing correct classification while minimizing errors. A machine learning algorithm involves two steps to choose the best function, from a set of possible functions, for explaining the relationships between features in a dataset: training and inference.

  1. The first step, training, involves allowing a machine learning algorithm to process a dataset and choose the function that best matches the patterns in the dataset. The extracted function is encoded in a computer program in a particular form known as a model. The training process proceeds by taking inputs, creating outputs, and comparing those outputs to the correct outputs listed in the dataset. Training is finished and the model is fixed once the algorithm has found a function that is sufficiently accurate, meaning the generated output matches the correct output listed in the dataset.
  2. The next step is inference, in which the fixed model is applied to new examples for which we do not know the correct output value and therefore want the model to generate estimates on its own.
    1. A machine learning algorithm uses two sources of information to select the best function: the dataset and a set of assumptions (the inductive bias) that lead it to prefer some functions over others, irrespective of the patterns in the dataset. The dataset and the inductive bias counterbalance each other; an algorithm with a strong inductive bias pays less attention to the dataset when selecting a function (Kelleher).
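The two steps can be illustrated with a deliberately tiny "machine learning algorithm": training searches a family of threshold functions for the one with the fewest errors on a labeled dataset, and inference applies the fixed model to new inputs. The dataset and function family below are invented for illustration (here, restricting the model to simple thresholds is the inductive bias).

```python
# Training set: (input, correct output) pairs, e.g. exam score -> pass/fail.
train = [(35, 0), (42, 0), (55, 1), (61, 1), (70, 1)]

def train_model(dataset):
    """Training: pick the threshold (the 'function') with the fewest errors."""
    best_t, best_errors = None, None
    for t in range(0, 101):                      # candidate functions
        errors = sum((x >= t) != bool(y) for x, y in dataset)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t                                # the fixed model

def infer(model, x):
    """Inference: apply the fixed model to a new, unlabeled example."""
    return 1 if x >= model else 0

model = train_model(train)
print(infer(model, 65))  # 1 -> predicted "pass" for an unseen input
```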

Neural networks, a commonly used form of machine learning algorithm that takes inspiration from structures in the brain, are what this paper will focus on in its de-blackboxing of facial recognition. A neural network uses a divide-and-conquer strategy to learn a function: each neuron in the network learns a simple function, and the overall (more complex) function defined by the network is created by combining these simpler functions. In brief, neural networks are organized in layers connected by links; each neuron takes a series of inputs and combines them to emit a signal as an output, with both inputs and outputs represented as numbers. Between the input and output are hidden layers that sum the weighted inputs and then apply a bias. The weights and biases are initially set to random numbers when a neural network is created; then an algorithm trains the network using labeled examples from the training data. Training starts from scratch by initializing filters at random and then adjusting them slightly through a mathematical process, telling the system what the actual image is, e.g. a toad vs. a frog (supervised learning). Finally, an activation function (transfer function) is applied to each output, performing a final mathematical modification to get the result.
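The weighted-sum-plus-bias computation can be shown for a toy network. The weights, biases, and step activation below are arbitrary illustrative choices; in a real network the weights would be learned, and the activation would typically be a ReLU or sigmoid.

```python
def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs, plus bias, through an activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 if total > 0 else 0.0   # step activation (transfer function)

# A two-layer "network": a hidden layer of two neurons, then an output neuron.
def tiny_network(x):
    h1 = neuron(x, [0.5, -0.5], 0.0)
    h2 = neuron(x, [-1.0, 1.0], 0.1)
    return neuron([h1, h2], [1.0, 1.0], -1.5)  # fires only if both hidden fire

print(tiny_network([1.0, 0.95]))  # 1.0 -> both hidden neurons fire
print(tiny_network([2.0, 1.0]))   # 0.0 -> only one hidden neuron fires
```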

2.2 Computer Vision

Computer vision is extracting high-level understanding from digital videos and images. The first step is to make digital photos, and to do so we need a digital camera. When taking a photo, the light of the desired image passes through the camera’s lens, diaphragm, and open shutter to hit millions of tiny microlenses that capture the light and direct it properly. The light then goes through a hot mirror that lets visible light pass and reflects invisible infrared light that would distort the image. The remaining light then goes through a layer that measures the colors captured; this layer mimics human eyesight in being able to distinguish only visible light and identify the colors red, green, and blue, another explicit instance of human design in our computational systems. The usual design is the Bayer array, a matrix of green, red, and blue filters, with like colors separated and never touching, and with double the number of green. Finally, the light strikes the photodiodes, which measure its intensity: it first hits the silicon at the “P-layer,” which transforms the light’s energy into electrons, creating a negative charge. This charge is drawn into the diode’s depletion area by the electric field the negative charge creates with the “N-layer’s” positive charge. Each photodiode collects photons of light as long as the shutter is open; the brighter a part of the photo is, the more photons have hit that section. Once the shutter closes, the pixels hold electrical charges proportional to the amount of light received. The readout can then go through one of two processes, CCD (charge-coupled device) or CMOS (complementary metal-oxide semiconductor). In either process the pixel charges go through an amplifier that converts this faint static electricity into a voltage proportional to the size of each charge (White). The electricity is then converted into data, most commonly encoded as hex code.
Data is always something with humanly imposed structure, that is, an interpretable unit of some kind understood as an instance of a general type. Data is inseparable from the concept of representation. In simplest terms, each color is composed of three channels, red, green, and blue, each taking one of 256 values (0-255). To alter a picture’s colors, one changes the numbers associated with those channels: black is 0 in all three channels, the absence of color, and white is 255 in all three.

There are several methods a computer can then use to extract meaning from digital images and gain vision. The ultimate goal is context sensitivity: being aware of its surroundings, i.e. understanding social and environmental factors so that the machine reacts appropriately. To do so, machine learning relies on pattern recognition. Pattern recognition consists of classifying data into categories determined by decision boundaries. The process first starts with sensing/acquisition. This step uses a transducer such as a camera or microphone to capture signals (e.g., an image) with enough distinguishing features. The next step, preprocessing, makes the data easier to segment, for example by normalizing pixels into digits by dividing the RGB code of each pixel by 256. This is followed by segmentation, which partitions an image into regions that are meaningful for a particular task: the foreground, comprising the objects of interest, and the background, everything else. In this step the program determines whether it will use region-based segmentation, in which similarities are detected, or boundary-based segmentation, in which discontinuities are detected. Following segmentation is feature extraction, where features are identified. Features are characteristic properties of the objects whose values should be similar for objects in a particular class and different from the values for objects in other classes (or from the background). Finally, the last step is classification, which assigns objects to categories based on the feature information, evaluating the evidence presented and deciding which class each object should be assigned to, depending on whether the values of its features fall inside or outside the tolerance of that class.

For computer recognition, some of the machine learning methods built on pattern recognition include color mark tracking, which searches pixel by pixel through RGB values for the color it is looking for. Prewitt operators are used to find the edges of objects (as when a self-guided drone flies through an obstacle) by searching in patches. To do so, scientists employ a technique called convolution, in which a rule defines an edge by a number indicating the color difference between a pixel on the left and a pixel on the right. Building on this concept, the Viola-Jones face detection method uses the same techniques to identify multiple features of a face by scanning every patch of pixels in a picture, such as finding lines for noses and islands for eyes (CrashCourse). The last method, and the one we will focus on, is convolutional neural networks (ConvNets). This method follows the neural network concept explained in section 2.1 but has many complex layers, each outputting a new image through different learned convolutions (edges, corners, shapes, simple objects such as mouths or eyebrows), until there is a layer that puts all the previous convolutions together. ConvNets are not required to be many layers deep, but they usually are in order to recognize complex objects and scenes, hence why the technique is considered deep learning.
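The convolution behind Prewitt edge detection can be sketched on a tiny grayscale image: a 3×3 kernel slides over the image, and its response is large wherever pixel values change from left to right. The image values below are invented for illustration.

```python
# Prewitt kernel for vertical edges: responds to left/right differences.
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]

def convolve_at(image, row, col, kernel):
    """Apply a 3x3 kernel to the patch centred at (row, col)."""
    return sum(image[row + dr][col + dc] * kernel[dr + 1][dc + 1]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))

# A toy image: dark (0) on the left, bright (9) on the right.
image = [[0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9]]

print(convolve_at(image, 1, 1, PREWITT_X))  # 0  -> flat region, no edge
print(convolve_at(image, 1, 2, PREWITT_X))  # 27 -> strong left/right change: an edge
```

Sliding the kernel over every patch turns the image into a map of edge responses, which is exactly the kind of feature map a ConvNet layer produces.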

The image taken from Andrej Karpathy’s blog on ConvNets shows how ConvNets operate. On the left is the image, and the ConvNet is fed raw image pixels, represented as a 3-dimensional grid of numbers. For example, a 256×256 image would be represented as a 256x256x3 array (the 3 for red, green, and blue). Then convolutions are performed: small filters slide over the image spatially. These filters respond to different features in the image, such as an edge, an island, or a region of a specific color. In the first column there are 10 responses, indicating that 10 filters help identify what the image is. In this way the original (256,256,3) image is transformed into a (256,256,10) “image,” where the original image information is discarded and the ConvNet keeps only the 10 filter responses at every position in the image. The next 14 columns repeat the same operation to produce each new column, gradually detecting more and more complex visual patterns until the last set of filters puts all the previous convolutions together and makes a prediction (Karpathy).

Pattern recognition is not 100% accurate. The choice of features that create the decision boundaries and decision space results in a confusion matrix that tells us what the algorithm got right and wrong. This inability to be 100% accurate is related to the “curse of dimensionality”: the more features we add to make decisions more precise, the more complicated the classification becomes, which is why experts employ the “Keep It Simple, Scientist” method. Faces are even more difficult than other images because differences in pose and lighting, or additive features like hats, glasses, or beards, cause significant changes in the image and in the algorithm’s understanding of it. However, scientists can program algorithms like ConvNets to be mostly right by identifying features and through repetitive training, which helps the algorithm gradually figure out what to look for.

3. Conclusion

Facial recognition is nothing but pattern recognition. ConvNets are just one of many methods that organizations use to recognize faces. Computers are given an algorithm that learns from a training data set before being applied to a test set. These algorithms only extrapolate from the training data, trying to get the closest approximation to whatever we want. When the outputs fail, we are not getting a good correspondence between what we put in and reality, and we need to go back, redesign, and obtain better approximations to get accurate predictions. It is not the technology itself that is wrong but the data humans feed it: garbage in, garbage out.

Thus we see ethical issues today regarding AI perpetuating racial and gender biases. A Pew Research Center study shows large gender disparities in facial recognition technology’s ability to identify males and females, traced back to faulty training data, while a 2018 study showed glaring racial discrepancies between darker and lighter skin tones. Knowing the design principles behind facial recognition, the key to solving this issue is accurate training data that reflects the population the technology will be used on. To that end, organizations should diversify both their training data and the field itself by encouraging and supporting people of color and women. Governments should enact regulations to ensure transparency and accountability in AI technology and prevent the use of facial recognition in justice and policing without a standard for accuracy. Other concerns arise when organizations collect images without the consent of the people in them and add those images to facial recognition databases. This is not a fault of the technology itself but of how organizations apply it, so similar solutions revolve around regulation and transparency. The future does not need to look bleak if people gain a shared understanding of what really drives these issues.
The first step is understanding that facial recognition is not a black box that cannot be demystified. It is just extremely fast pattern recognition, run by algorithms on sometimes skewed data. Understanding the design principles behind anything can better shape solutions to the problems that exist.


Alpaydin, Ethem. Machine Learning: The New AI. MIT Press, 2016.

Besheer Mohamed, et al. “The Challenges of Using Machine Learning to Identify Gender in Images.” Pew Research Center: Internet, Science & Tech, 5 Sept. 2019.

Boden, Margaret. AI: Its Nature and Future. Oxford University Press, 2016.

Buolamwini, Joy, and Timnit Gebru. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. 2018, p. 15.

CrashCourse. Computer Vision: Crash Course Computer Science #35. 2017. YouTube.

Dougherty, Geoff. Pattern Recognition and Classification. Springer. Accessed 26 Feb. 2021.

Irvine, Martin. “CCTP-607: Leading Ideas in Technology: AI to the Cloud.” Google Docs. Accessed 3 May 2021.

Karpathy, Andrej. “What a Deep Neural Network Thinks about Your #selfie.” Andrej Karpathy Blog, 25 Oct. 2015.

Kelleher, John. Deep Learning. MIT Press, 2019.

Simon, Herbert. The Sciences of the Artificial. 3rd ed., MIT Press, 1996.

White, Ron, and Tim Downs. How Digital Photography Works. 2nd ed., Que Publishing, 2007.

Wooldridge, Michael. A Brief History of Artificial Intelligence. 1st ed., Flatiron Books, 2020.