
Final Paper: De-Blackboxing Facial Recognition

Chloe Wawerek

De-Blackboxing Facial Recognition as the Key to Ethical Concerns

Abstract

Facial recognition has received attention in recent news for issues regarding biases in machines' ability to detect faces and the repercussions this has for minorities. After reviewing the design principles behind facial recognition through material on the history and evolution of AI, I am confident that the ethical issues facing facial recognition lie not in the technology itself but in how humans shape the technology. This, coupled with case studies on research conducted by Pew and MIT, emphasizes that skewed data affects how algorithms process and learn, which leads some technology to have what are called ingrained biases. This, though, becomes easier to solve after de-blackboxing facial recognition, which is what I aim to do in this paper.

  1. Introduction

There are certain things that we take for granted as humans. The ability to see, speak, and comprehend are just a few things we can do but have difficulty explaining how we do. Scientists are trying to replicate what minds can do, like the above-mentioned tasks, in computers, and that effort constitutes a broad field better known as AI. However, in technology everything is an embodiment of electricity designed to represent 1s and 0s, also known as bits. Strings of bits are then defined by programs to be symbols, such as numbers. Computing therefore represents how humans impose a design on electricity to perform as logic processors. As a result, AI programs do not have a human's sense of relevance, i.e., common sense, because, among other things, they do not know how to frame a situation: implications tacitly assumed by human thinkers are ignored by the computer because they have not been made explicit (Boden). In the words of Grace Hopper, “The computer is an extremely fast moron. It will, at the speed of light, do exactly what it is told to do—no more, no less.” So, the question is how humanity started concerning itself with the ability of computers to recognize faces. I want to examine how facial recognition works and de-blackbox this ability down to the design processes that set the foundation for the technology. Starting from the concept of computation, I will trace the evolution of facial recognition to highlight the root issues regarding this technology. Fundamentally, computers have a vision problem because they cannot understand visual images as humans do. Computers need to be told exactly what to look for, and what not to look for, in identifying images and solving problems; hence, extremely fast morons. Understanding this, we need to look deeper into why issues exist if humans set the precedent for what computers should see versus what they do see.

  2. Facial Recognition

2.1 Computation to AI

The designs for computing systems and AI have been developed by means of our common human capacities for symbolic thought, representation, abstraction, modeling, and design (Irvine). Computation systems are human-made artifacts composed of elementary functional components that act as an interface between the functions performed by those components and the surroundings in which they operate (Simon). Those functions combine, sequence, and make active symbols that mean (“data representations”) and symbols that do (“programming code”) in automated processes for any programmable purpose (Irvine). Computers, then, are nothing more than machines for following instructions, and those instructions are what we call programs and algorithms. Roughly speaking, all a computer can do is follow lists of instructions such as the following:

  • Add A to B
  • If the result is bigger than C, then do D; otherwise, do E
  • Repeatedly do F until G
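
To make this concrete, here is a minimal sketch (my own illustration, not drawn from any of the readings) of those three instruction patterns written as ordinary Python; the variable names and values are invented.

```python
# My own toy illustration: the three instruction patterns above as Python.
a, b, c = 2, 3, 4

result = a + b              # "Add A to B"
if result > c:              # "If the result is bigger than C, then do D..."
    print("do D")
else:                       # "...otherwise, do E"
    print("do E")

count = 0
while count < 5:            # "Repeatedly do F until G" (here G is "count reaches 5")
    count += 1              # "do F"
```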

Computers, then, can reliably follow very simple instructions very, very quickly, and they can make decisions if those decisions are precisely specified (Wooldridge). If we are to build intelligent machines, then their intelligence must ultimately reduce to simple, explicit instructions like these, which begs the question: can humans produce intelligent behavior simply by following lists of instructions? Well, AI takes inspiration from the brain. If we can understand how the brain handles information processing that surpasses engineered products – vision, speech recognition, learning – we can define solutions to these tasks as formal algorithms and implement them on computers (Alpaydin). Currently, a machine is said to have AI if it can interpret data, potentially learn from the data, and use that knowledge to adapt and achieve specific goals. Even under this definition, however, there exist different interpretations of AI: strong versus weak. Strong AI is when a program can understand in a way similar to a human, whereas weak AI is when a program can only simulate understanding. Scientists are still wrestling with the issue of AI comprehension, which involves understanding the human world and the unwritten rules that govern our relationships within it, by testing programs with Winograd Schemas (Wooldridge).

Example: Question – Who [feared/advocated] violence?

Statement 1a: The city councilors refused the demonstrators a permit because they feared violence.

Statement 1b: The city councilors refused the demonstrators a permit because they advocated violence.

These problems consist of building computer programs that carry out tasks that currently require brain function, like driverless cars or writing interesting stories. To do so, scientists use a process called machine learning, which aims to construct a program that fits a given dataset by creating a learning program that is a general model with modifiable parameters. Learning algorithms adjust the parameters of the model by optimizing a performance criterion defined on the data (Alpaydin). In layman's terms, machine learning refers to algorithms that give computers the ability to learn from data and then make predictions and decisions, maximizing correct classification while minimizing errors. A machine learning algorithm involves two steps to choose the best function, from a set of possible functions, for explaining the relationships between features in a dataset: training and inference.

  1. The first step, training, involves allowing a machine learning algorithm to process a dataset and choose the function that best matches the patterns in that dataset. The extracted function is encoded in a computer program in a particular form known as a model. The training process then proceeds by taking inputs, creating outputs, and comparing those outputs to the correct outputs listed in the dataset. Training is finished and the model is fixed once the machine learning algorithm has found a function that is sufficiently accurate, i.e., one whose generated outputs match the correct outputs listed in the dataset.
  2. The next step is inference, in which the fixed model is applied to new examples for which scientists do not know the correct output value and therefore want the model to generate estimates of that value on its own (a minimal sketch in code follows this list).
    1. A machine learning algorithm uses two sources of information to select the best function. One is the dataset; the other is a set of assumptions (the inductive bias) that lead it to prefer some functions over others, irrespective of the patterns in the dataset. The dataset and the inductive bias counterbalance each other: a strong inductive bias pays less attention to the dataset when selecting a function (Kelleher).
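
As a rough sketch of these two steps, and assuming nothing fancier than ordinary least-squares line fitting as the "learning algorithm," the following illustration (my own, with invented numbers) shows training followed by inference.

```python
import numpy as np

# Invented training data: inputs x with known correct outputs y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Training: choose the parameters (slope, intercept) that best match the
# patterns in the dataset. The inductive bias here is the assumption that
# the relationship is a straight line.
slope, intercept = np.polyfit(x, y, deg=1)

# Inference: apply the now-fixed model to new inputs whose correct outputs
# we do not know, and let it generate estimates on its own.
new_x = np.array([6.0, 7.0])
estimates = slope * new_x + intercept
print(estimates)
```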

Neural networks, the commonly used form of machine learning algorithm that this paper focuses on in its de-blackboxing of facial recognition, take inspiration from structures that occur in the brain. A neural network uses a divide-and-conquer strategy to learn a function: each neuron in the network learns a simple function, and the overall (more complex) function, defined by the network, is created by combining these simpler functions. In brief, neural networks are organized in layers of neurons connected by links that take a series of inputs and combine them to emit a signal as an output; both inputs and outputs are represented as numbers. Between the input and output are hidden layers that sum the weighted inputs and then apply a bias. The weights and biases are initially set to random numbers when a neural network is created; then an algorithm trains the network using labeled examples from the training data. Training starts from scratch by initializing filters at random and then changing them slightly through a mathematical process that compares the network's output with what the image actually is, e.g., a toad versus a frog (supervised learning). Finally, an activation function (transfer function) is applied to each output, performing a final mathematical modification to get the result.
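
A minimal sketch of this forward pass, assuming made-up weights, biases, and a ReLU activation (none of this comes from the sources; it only illustrates "weighted sum plus bias, then activation"):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)              # a common activation (transfer) function

rng = np.random.default_rng(0)
inputs = np.array([0.5, -1.2, 3.0])        # inputs and outputs are just numbers

# Weights and biases start as random numbers; training would adjust them.
w_hidden = rng.normal(size=(4, 3))         # 4 hidden neurons, each seeing 3 inputs
b_hidden = rng.normal(size=4)
w_out = rng.normal(size=(1, 4))
b_out = rng.normal(size=1)

hidden = relu(w_hidden @ inputs + b_hidden)    # each neuron: weighted sum + bias, then activation
output = w_out @ hidden + b_out                # combine the simpler functions into the overall one
print(output)
```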

2.2 Computer Vision

Computer vision is the extraction of high-level understanding from digital videos and images. The first step is to make digital photos, and to do so we need a digital camera. When taking a photo, the light of the desired image passes through the camera's lens, diaphragm, and open shutter to hit millions of tiny microlenses that capture the light and direct it properly. The light then goes through a hot mirror that lets visible light pass and reflects invisible infrared light that would distort the image. The remaining light goes through a layer that measures the colors captured. This layer mimics human eyesight, which can only distinguish visible light and identify the colors red, green, and blue, another explicit presentation of human design in our computational systems. The usual design is the Bayer array, a matrix of green, red, and blue filters arranged so that no two filters of the same color touch, with twice as many green filters as red or blue. Finally, the light strikes the photodiodes, which measure its intensity: it first hits the silicon at the “P-layer,” which transforms the light's energy into electrons, creating a negative charge. This charge is drawn into the diode's depletion area because of the electric field the negative charge creates with the “N-layer's” positive charge. Each photodiode collects photons of light as long as the shutter is open; the brighter a part of the photo is, the more photons have hit that section. Once the shutter closes, the pixels hold electrical charges proportional to the amount of light received. The charges then pass through one of two processes, CCD (charge-coupled device) or CMOS (complementary metal-oxide semiconductor). In either process the pixels go through an amplifier that converts this faint static electricity into a voltage in proportion to the size of each charge (White). The electricity is then converted into data, most commonly represented as hex code. Data is always something with humanly imposed structure, that is, an interpretable unit of some kind understood as an instance of a general type; data is inseparable from the concept of representation. In simplest terms, each color is a combination of 256 possible values (0 to 255) for each of red, green, and blue, so to alter a picture's colors one changes the numbers associated with those colors. Black is 0 for all three, the absence of light, and white is 255 for all three.
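
As a small illustration of how those 0-255 red, green, and blue values map onto hex code (my own toy example, not drawn from White):

```python
# My own toy example: mapping 0-255 red, green, blue values to a hex code.
def rgb_to_hex(r, g, b):
    return "#{:02X}{:02X}{:02X}".format(r, g, b)

print(rgb_to_hex(0, 0, 0))         # black: all three channels at 0
print(rgb_to_hex(255, 255, 255))   # white: all three channels at 255
print(rgb_to_hex(255, 0, 0))       # pure red
```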

There are several methods a computer can then use to extract meaning from digital images and gain vision. The ultimate goal is context sensitivity, meaning awareness of surroundings, i.e., understanding social and environmental factors so that the machine reacts appropriately. To do so, machine learning relies on pattern recognition. Pattern recognition consists of classifying data into categories determined by decision boundaries. It involves a process that starts with sensing/acquisition. This step uses a transducer such as a camera or microphone to capture signals (e.g., an image) with enough distinguishing features. The next step, preprocessing, makes the data easier to segment, for example converting pixels into digits by dividing the RGB code of each pixel by 256. Segmentation follows, partitioning an image into regions that are meaningful for a particular task: the foreground, comprising the objects of interest, and the background, everything else. In this step the program determines whether it will use region-based segmentation, in which similarities are detected, or boundary-based segmentation, in which discontinuities are detected. Following segmentation is feature extraction, where features are identified. Features are characteristic properties of the objects whose values should be similar for objects in a particular class and different from the values for objects in another class (or from the background). The last step is classification, which assigns objects to categories based on the feature information by evaluating the evidence presented and deciding which class each object should be assigned to, depending on whether the values of its features fall inside or outside the tolerance of that class.
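
To make the five steps concrete, here is a toy walk-through on an invented 8x8 grayscale "image"; the thresholds and features are assumptions chosen only for illustration.

```python
import numpy as np

image = np.zeros((8, 8), dtype=np.uint8)     # 1. sensing/acquisition (faked here)
image[2:6, 2:6] = 200                        #    a bright square on a dark background

pixels = image / 255.0                       # 2. preprocessing: scale values to [0, 1]

foreground = pixels > 0.5                    # 3. segmentation (region-based threshold)

area = foreground.sum()                      # 4. feature extraction: object area...
mean_brightness = pixels[foreground].mean()  #    ...and mean brightness

# 5. classification: assign a class if the feature value falls inside a tolerance.
label = "large object" if area > 10 else "small object"
print(area, round(mean_brightness, 2), label)
```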

Some of the machine learning methods for computer recognition through pattern recognition include color mark tracking, which searches pixel by pixel through RGB values for the color it is looking for. Prewitt operators are used to find the edges of objects (as when a self-guided drone flies through an obstacle course) by searching in patches. To do so, scientists employ a technique called convolution, in which a rule defines an edge by a number indicating the color difference between a pixel on the left and a pixel on the right. Building on this concept, the Viola-Jones face detection method uses the same techniques to identify the multiple features that make up a face by scanning every patch of pixels in a picture, such as finding lines for noses and islands for eyes (CrashCourse). The last method, and the one we will focus on, is convolutional neural networks (ConvNets). This method follows the neural network concept explained in 2.1 but has many complex layers, each outputting a new image through different learned convolutions, such as edges, corners, shapes, and simple objects (mouths, eyebrows), until there is a layer that puts all the previous convolutions together. ConvNets are not required to be many layers deep, but they usually are in order to recognize complex objects and scenes, which is why the technique is considered deep learning.
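
A hedged sketch of the convolution idea behind edge detection: a Prewitt-style filter applied to one invented 3x3 patch, where a large response signals a left-to-right brightness change (this is my illustration, not CrashCourse's code).

```python
import numpy as np

prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]])    # compares pixels on the left with pixels on the right

patch = np.array([[10, 10, 200],
                  [10, 10, 200],
                  [10, 10, 200]])     # invented patch: dark on the left, bright on the right

response = np.sum(prewitt_x * patch)  # one step of the convolution at this position
print(response)                       # a large value signals a vertical edge here
```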

An image from Andrej Karpathy's blog on ConvNets shows how they operate. On the left is the image, and the ConvNet is fed raw image pixels, which are represented as a three-dimensional grid of numbers. For example, a 256×256 image would be represented as a 256x256x3 array (the 3 for red, green, blue). Then convolutions are performed, meaning small filters slide over the image spatially. These filters respond to different features in the image: an edge, an island, or a region of a specific color. Each column in the figure represents the 10 responses of 10 filters that help identify what the image is. In this way the photo is transformed from the original (256,256,3) image into a (256,256,10) “image,” where the original image information is discarded and the ConvNet keeps only the 10 responses of the filters at every position in the image. The next 14 columns repeat the same operation, gradually detecting more and more complex visual patterns until the last set of filters puts all the previous convolutions together and makes a prediction (Karpathy).
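
The shape bookkeeping Karpathy describes can be sketched as follows; this is not his code, just an assumed single convolutional layer with ten random 3x3 filters, showing how a (256, 256, 3) input becomes a (256, 256, 10) stack of filter responses. The explicit loop is slow but mirrors the "slide the filter over every position" description.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))          # raw pixels: height x width x RGB
filters = rng.random((10, 3, 3, 3))        # 10 small filters, each 3x3 over 3 channels

padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))   # keep the spatial size at 256x256
responses = np.zeros((256, 256, 10))

for i in range(256):                       # slide every filter over every position
    for j in range(256):
        patch = padded[i:i + 3, j:j + 3, :]
        responses[i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))

print(responses.shape)                     # (256, 256, 10): only the filter responses remain
```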

Pattern recognition is not 100% accurate. The choice of features that create the decision boundaries and decision space results in a confusion matrix that tells us what the algorithm got right and wrong. This inability to be 100% accurate reflects the “Curse of Dimensionality”: the more features we add to make the decisions more precise, the more complicated the classification becomes, which is why experts employ the “Keep It Simple, Scientist” method. Faces are even more difficult than other images because differences in pose and lighting, or additive features like hats, glasses, or even beards, cause significant changes in the image and in how algorithms understand it. However, scientists can program algorithms like ConvNets to be mostly right by identifying features and through repetitive training, which helps the algorithm gradually figure out what to look for, a form of supervised learning on labeled examples.
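
A confusion matrix can be illustrated with invented numbers for a hypothetical face / not-face classifier; the counts below are made up purely to show how accuracy is read off the matrix.

```python
import numpy as np

#                      predicted: face   predicted: not face
confusion = np.array([[88,               12],     # actually a face
                      [ 7,               93]])    # actually not a face

accuracy = np.trace(confusion) / confusion.sum()
print(accuracy)    # about 0.9: mostly right, never 100%
```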

3. Conclusion

Facial recognition is nothing but pattern recognition. ConvNets are just one of many methods that organizations use to recognize faces. Computers are given an algorithm to learn from a training dataset before being applied to a test set. These algorithms only extrapolate predictions from the training data, trying to get the closest approximation to whatever we want. When the outputs fail, we are not getting a good correspondence between what we inputted and reality; we need to go back, redesign, and get better approximations to produce accurate projections. It is not the technology itself that is wrong but the data humans feed it: garbage in, garbage out. Thus we see ethical issues today regarding AI perpetuating racial and gender biases. Pew research shows large disparities in facial recognition technology's ability to identify males and females based on faulty data, while a 2018 MIT study showed glaring discrepancies between darker and lighter skin tones (Buolamwini and Gebru). Now, knowing the design principles behind facial recognition, accurate training data that reflects the population this technology will be used on is key to solving this issue. To do so, organizations should diversify training data and the field itself by encouraging and supporting racial and gender minorities. Governments should enact regulations to ensure transparency and accountability in AI technology and prevent the use of facial recognition in justice and policing without a standard for accuracy. Other concerns derive from organizations obtaining images without the consent of the people in them and using those images in facial recognition databases. This is not a fault of the technology itself but of the application of the technology by organizations, so similar solutions revolve around regulation and transparency. The future does not need to look bleak if people gain a shared understanding of what really drives these issues. The first step is understanding that facial recognition is not a blackbox that cannot be demystified. It is instead just extremely fast pattern recognition utilizing algorithms on sometimes skewed data. Understanding the design principles behind anything can better shape solutions to the problems that exist.

References

Alpaydin, Ethem. Machine Learning: The New AI. MIT Press, 2016, https://drive.google.com/file/d/1iZM2zQxQZcVRkMkLsxlsibOupWntjZ7b/view?usp=drive_open&usp=embed_facebook.

Besheer Mohamed, et al. “The Challenges of Using Machine Learning to Identify Gender in Images.” Pew Research Center: Internet, Science & Tech, 5 Sept. 2019, https://www.pewresearch.org/internet/2019/09/05/the-challenges-of-using-machine-learning-to-identify-gender-in-images/.

Boden, Margaret. AI: Its Nature and Future. Oxford University Press, 2016, https://drive.google.com/file/d/1P40hHqgDjysytzQfIE7ZXOaiG0Z8F2HR/view?usp=drive_open&usp=embed_facebook.

Buolamwini, Joy, and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” Proceedings of Machine Learning Research, vol. 81, 2018, http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf.

CrashCourse. Computer Vision: Crash Course Computer Science #35. 2017. YouTube, https://www.youtube.com/watch?v=-4E2-0sxVUM.

Dougherty, Geoff. Pattern Recognition and Classification. Springer, https://drive.google.com/file/d/1BT-rDW-mvnCOtUvvm-2xBwzF8_KJcKyI/view?usp=drive_open&usp=embed_facebook. Accessed 26 Feb. 2021.

Irvine, Martin. “CCTP-607: Leading Ideas in Technology: AI to the Cloud.” Google Docs, https://drive.google.com/file/d/1Hk8gLXcgY0G2DyhSRHL5fPQn2Z089akQ/view?usp=drive_open&usp=embed_facebook. Accessed 3 May 2021.

Karpathy, Andrej. “What a Deep Neural Network Thinks about Your #selfie.” Andrej Karpathy Blog, 25 Oct. 2015, https://karpathy.github.io/2015/10/25/selfie/.

Kelleher, John. Deep Learning. MIT Press, 2019, https://drive.google.com/file/d/1VszDaSo7PqlbUGxElT0SR06rW0Miy5sD/view?usp=drive_open&usp=embed_facebook.

Simon, Herbert. The Sciences of the Artificial. 3rd ed., MIT Press, 1996, https://drive.google.com/file/d/1jXPTxnsDzOA2AKuWsGBqF1sPV07sMQtO/view?usp=drive_open&usp=embed_facebook.

White, Ron, and Tim Downs. How Digital Photography Works. 2nd ed, Que Publishing, 2007, https://drive.google.com/file/d/1Bt5r1pILikG8eohwF1ZnQuv5eNL9j8Tv/view?usp=sharing&usp=embed_facebook.

Wooldridge, Michael. A Brief History of Artificial Intelligence. 1st ed., Flatiron Books, 2020, https://drive.google.com/file/d/1zSrh08tm9WbYtERSNxEWvItnKdJ5qmz_/view?usp=sharing&usp=embed_facebook.

Synthesis of Learning

I’m with some older family members and friends this weekend, and like most family gatherings I’m asked what I am doing in life. After telling them that I’m in school, they asked what I was learning, and it was amazing how in depth I could explain concepts like AI/ML. It was also great to see their interest in trying to understand it and to clear up some confusion regarding AI/ML and its ethical implications. Below are some concepts that I refined after looking through my notes, to take with me to future family gatherings, and I figure this is a good start on my learning synthesis. I’ll finish with some thoughts on how I want to approach my final paper:

Before we talk about artificial intelligence (AI) we need to understand the role of computers. Computers are nothing more than machines for following instructions, and those instructions are what we call programs and algorithms. AI then seeks to make computers do the sorts of things that minds can do, with two main aims: to get useful things done (technical) and to help answer questions about human beings and other living things (scientific). Doing so, though, requires intelligence that must ultimately be reduced to simple, explicit instructions for computers to process, which presents the fundamental challenge of AI: can you produce intelligent behavior simply by following lists of instructions?

A machine is said to have AI if it can interpret data, potentially learn from the data, and use that knowledge to adapt and achieve specific goals. The type of AI we are dealing with today is nothing like what is found in movies and science fiction novels. Today we are working with narrow/weak AI, in which scientists try to build computer programs that carry out tasks that currently require thought. Some examples include filtering spam messages, recognizing faces in pictures, and creating usable translations. Each requires a similar design process and works because of machine learning (ML), a subset of AI. ML is used when we believe there is a relationship between observations of interest but do not know exactly what it is. Using artificial neural networks, ML can predict new cases of a certain instance through pattern recognition. Artificial neural networks are three layers connected by links that loosely replicate the brain's neural networks. The first layer is the input layer, which gives a numeric value to the input. The second is the hidden layer, which classifies data and transfers the inputs into the last layer, the output layer. To do so, the hidden layer applies a bias to the weighted inputs, and an algorithm trains the neural network using labeled data from a training set. The output is a result of the hidden layer's interpretation, i.e., its learning from the training data.

These programs, or “sets of instructions,” do not understand the decisions they make but can simulate understanding, which can be dangerous when society puts trust in systems it does not understand. Some ethical concerns include the inherent bias in existing AI because of biases in training data. Others revolve around privacy rights regarding AI used in facial recognition, as well as destructive uses of new AI innovations in natural language processing like GPT-3. I think one topic for a final paper could be examining some of these concerns more thoroughly, specifically regarding facial recognition and the ideas behind big data and the unregulated collection of all this data. At the same time, I’m interested in the design aspects of cloud computing and AI with regard to another big issue, the environmental impact. The amount of energy consumed by AI models like GPT-3 is alarming as climate change becomes the focus of governments and corporations.

References:

Alpaydin, Ethem. 2016. Machine Learning: The New AI. MIT Press Essential Knowledge Series. Cambridge, MA: MIT Press. https://drive.google.com/file/d/1iZM2zQxQZcVRkMkLsxlsibOupWntjZ7b/view?usp=drive_open&usp=embed_facebook.

Boden, Margaret. 2016. AI-Its Nature and Future. Great Britain: Oxford University Press. https://drive.google.com/file/d/1P40hHqgDjysytzQfIE7ZXOaiG0Z8F2HR/view?usp=drive_open&usp=embed_facebook.

CrashCourse. 2017. Machine Learning & Artificial Intelligence: Crash Course Computer Science #34. https://www.youtube.com/watch?v=z-EtmaFJieY&t=2s.
———. 2019. What Is Artificial Intelligence? Crash Course AI #1. https://www.youtube.com/watch?v=a0_lo_GDcFw&list=PL8dPuuaLjXtO65LeD2p4_Sb5XQ51par_b&index=2.

Wooldridge, Michael. 2020. A Brief History of Artificial Intelligence. 1st ed. New York: Flatiron Books. https://drive.google.com/file/d/1zSrh08tm9WbYtERSNxEWvItnKdJ5qmz_/view?usp=sharing&usp=embed_facebook.

Big Data

I enjoyed Kitchin's definition of Big Data, which seems to be the community standard: Big Data differs from other data in traits like volume, velocity, variety, exhaustivity, resolution, indexicality, relationality, flexibility, and scalability, among others that the Wikipedia article also included. To me, Big Data is data that shares these traits, which traditional computer processors and memory cannot handle. Handling it requires AI/ML computation to deal with the abundance, exhaustivity and variety, timeliness and dynamism, messiness and uncertainty, and high relationality that is Big Data. My question: does the concept of Big Data necessarily imply the use of AI/ML on the data acquired? I know Kitchin characterizes Big Data, as opposed to data, as being generated continuously, seeking to be exhaustive and fine-grained in scope, and flexible and scalable in its production. Does that mean that the Big Data we know today really only emerged from innovations in AI/ML? I believe the answer is yes, but would like confirmation.

Furthermore, Big Data is a relatively new phenomenon under the above definition because it is the result of two enabling changes in society that Denning identified. First, the expansion of the internet into a billion computing devices, i.e., the Internet of Things, enables access to vast amounts of data. Second, the digitization of almost everything has resulted in an explosion of innovation in network-based big data applications and the automation of cognitive tasks. As a result, the emergence of Big Data from societal change is spurring more societal change. “Revolutions in science have often been preceded by revolutions in measurement,” as Sinan Aral puts it. Science is only one thing Big Data has changed: Kitchin argues that Big Data will move scientific approaches toward a data-driven science method blending aspects of abduction, induction, and deduction, “born from data” rather than from theory. In society, Big Data is spurring changes in social networks and content providers' ability to attract and hold consumers' attention in the digital economy (Huberman). In government, Big Data can be a tool that enables surveillance and monitoring at unprecedented levels (Johnson). In education, Big Data is enabling different methods of learning and instruction through personalized paths based on data analytics of users' interactions with existing courses (Opening Statement). The applications of Big Data touch nearly every aspect of our society, but are there certain parts of society that, as with cloud computing, should not utilize Big Data?

Big Data is important because data is viewed as a prized resource that can optimize efficiency and profits for organizations or enable surveillance and security for governments. This, and the relative newness of the technology, has led to a Wild West in terms of the lack of regulations and limits on the collection of data, the encroachment on consumers' privacy and security rights, and the lack of transparency in models. So I will echo a question made in the closing statement of the ACM Ubiquity symposium: do regulatory initiatives even have the support to confront the ethical challenges of Big Data?

 

“Big Data.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Big_data&oldid=1016820836.
Kitchin, Rob. 2014. “Big Data, New Epistemologies and Paradigm Shifts.” Big Data & Society 1 (1). https://drive.google.com/file/d/1MkRUzSYCu1LKXWxkR6COus26Wae1srXC/view?usp=drive_open&usp=embed_facebook.
Kitchin, Rob. 2014. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences (excerpts). London: Sage. https://drive.google.com/file/d/1T2JGeIHWkez0ecTgkWXJd5q4wWWkCOWl/view?usp=drive_open&usp=embed_facebook.
O’Neil, Cathy. 2016. Weapons of Math Destruction. New York: Crown. https://drive.google.com/file/d/1ps92pvLRVWCbno4CrVCyYwCukDUWBJsh/view?usp=drive_open&usp=embed_facebook.
“Ubiquity: Big Data.” n.d. Accessed April 10, 2021. https://ubiquity.acm.org/article.cfm?id=3158352.
“Ubiquity: Big Data and the Attention Economy.” n.d. Accessed April 10, 2021. https://ubiquity.acm.org/article.cfm?id=3158337.
“Ubiquity: Big Data, Digitization, and Social Change.” n.d. Accessed April 10, 2021. https://ubiquity.acm.org/article.cfm?id=3158335.

The Cloud

To understand how AI/ML and data systems are implemented in cloud architecture, we first have to discuss what the “cloud” is. Cloud computing, as defined by the National Institute of Standards and Technology, is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models. In brief, the five essential characteristics are on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. The three service models are Infrastructure as a Service, Platform as a Service, and Software as a Service. Finally, the four deployment models are public, private, community, and hybrid. The readings provide more detail, specifically Rountree's Basics of Cloud Computing on the basic definitions of the cloud, followed by more in-depth treatment of the abstractions (service models) and deployment models in Ruparelia's Cloud Computing. For me, though, the cloud is the ability to manipulate data with on-demand availability of computer system resources from hosting companies (Amazon, Google, IBM, Microsoft). The cloud takes away the need for one's own physical infrastructure by virtualizing everything from hardware to applications.

How that applies to AI/ML is the main question, so let's turn to IBM's virtual assistant, Watson. Most of the case studies were based on business interactions, and this virtual assistant was built to help companies “provide customers with fast consistent and accurate answers across any application, device or channel.” I took this to mean a business version of Apple's Siri working for companies' customer service operations. IBM states that Watson utilizes what it calls AutoML with “meta-learning” techniques, which I take as a form of blackboxing. I think Watson is another neural network that uses techniques similar to those in the virtual assistant discussion in week 8, in which virtual assistants receive an input and run it through various hidden layers in a neural network to produce an output. However, Watson does this in the form of conversational user interfaces, i.e., advanced chatbots. IBM argues that it leads in this technology for companies due to advances in its natural language processing capabilities and new natural language understanding. Since all my information comes from IBM's own publishing, I would note that it may be biased.

How this combines with cloud computing is that businesses can pay for hosting companies' computing services like Watson to collect, organize, and analyze their data and provide appropriate responses, without investing in their own physical and virtual resources to do the same. This produces a new source of efficiency and innovation. Companies like Siemens have used this tool to create CARL, an interactive HR chatbot for employees. CARL, which stands for Cognitive Assistant for Interactive User Relationship and Continuous Learning, runs on IBM Cloud and is powered by IBM Watson. This is a good way of viewing the interaction between the cloud and AI/ML: Watson provides the adaptability behind the process, i.e., the algorithms, NLP, and AI/ML to answer users' inquiries, while the cloud provides the scalability to accommodate users, languages, and topics through IBM's vast databases, storage centers, and computation services.

Now the ethics behind this are interesting. As opposed to other emerging technologies that can be used for nefarious purposes, like AI/ML, cloud computing to me is just an extension of computing services on a different platform. So the risk I see is in the inherent design of the system due to its reliance on the internet, namely vulnerability to attacks and concerns over security and privacy. Regarding vulnerability to attacks, I mostly agree with Floridi's argument in Ethics of Cloud Computing that we have to look more at the users and impose proscriptive measures preventing certain businesses that handle private and personal information from using these systems, for now. I believe that companies should continue to advance their software and find resolutions to the inherent vulnerabilities in cloud computing, and until then certain businesses should avoid using it. Regarding security and privacy, if only four companies own and operate the cloud services that are slowly becoming the go-to “unifying” architecture, I instantly think of security and privacy violations of users' data. In this sense I disagree with Floridi and argue that something beyond technologically neutral regulation should be considered, something that can both limit big tech's monopoly over this service and sustain innovation. There is a middle ground; we just need more dialogue and more de-blackboxing of these concepts for the general public.

 

Ethics for AI

In my opinion the biggest issue is data abuse, which Pew research defines as data use and surveillance in complex systems designed for profit by companies or for power by governments. Below I examine how both the private and public realms exploit data, through my interpretations of certain cases.

In terms of the private realm, the ability of companies to create algorithms that perpetuate filter bubbles and echo chambers, thereby increasing polarization, is my main concern. These algorithms take the judgment out of information and replace it with predictive analysis of what a person would like to read, listen to, or watch, reinforcing the user's opinions and thereby holding their attention. These big data companies, notably coined the “frightful five,” have the power to influence and control the platforms used today for public discourse, and in doing so are collecting vast amounts of information that they profit from: “They are benefitting from the claim to empower people, even though they are centralizing power on an unprecedented scale.” These companies must be held accountable for their role in sowing discord and be regulated to prevent their accumulation of unfettered power. Looking specifically at Facebook's “Ad Preferences,” we can examine this problem more thoroughly. Facebook categorizes and identifies users through interactions on Facebook, enhanced observation through Facebook's Pixel application, and the ability to monitor users offline. With these inferences Facebook uses its deep learning models to label people for specific targeting purposes. This effort to curate advertisements and clickbait is an alarming invasion of privacy that 51% of users are not comfortable with, yet it is still being done. What regulations can we make to impose transparency on big tech's application of data? Should we break up big tech? Should big tech be liable for the content on its platforms?

In terms of the public realm, the ability of governments to create a surveillance state in which constant monitoring, predictive analysis, and censorship hinder the freedom of human agency is what I view as the most alarming utilization of AI. We see it developing in China with the social credit system and the exportation of “safe cities” to developing nations across the world. The extreme of China may not become a reality in America because of differing ideals, but that does not mean the government will not use AI in some form to secretly monitor citizens or at the very least violate privacy rights. Looking at Hao's and Solon's articles, we see how data collection for face recognition fell down a slippery slope. Organizations are downloading images without users' consent to collect and hoard, often without proper labeling, for unimaginable uses in the future, namely surveillance. Critics, rightly so, are arguing against the legality of this collection and its distribution to law enforcement agencies, which exacerbates “historical and existing bias” that harms communities that are already “over-policed and over-surveilled.” What regulations should be imposed to prevent the exploitation of biometrics? Can we retroactively delete our images from these databases, as was mentioned regarding IBM, or will we forever have our biometrics stored? What laws can we make regulating companies' and governments' cooperation over the collection of our data for their own purposes?

Data abuse persists because of an uneducated public and insubstantial regulatory laws. The current façade of AI ethics is just that, a façade, a temporary band-aid on a growing problem. Governments, supported by companies, need to create epistemic communities to foster discourse on standards and norms. The first step is creating a shared language of AI that can facilitate discussions between politicians and tech and educate the public. After coming to a consensus on definitions and terms, norms can be established. These norms do not require new ideas but can work off the framework of the bioethics principles plus one that Floridi and Cowls argue for: beneficence, non-maleficence, autonomy, justice, and explicability. Creating these norms unifies attitudes toward the development of AI so that it can be controlled and understood. From these norms we then need to establish laws that limit the acquisition of data without authorization from users, require notification when algorithms are perpetuating inherent biases, provide concise but understandable explanations of what algorithms are doing, and establish watchdogs against the exploitation of AI to cause harm. This is just the regulatory side of AI, because to approach it from the scientific side would be to demand that AI understand what values and ethics are and implement them in choices, a feat that scientists struggle with. So my question today: knowing that AI is inherently flawed because it lacks emotional intelligence, what tasks should we prevent it from doing? Or rather, what tasks should we prevent it from being the sole decision maker in?

References: 

Floridi, Luciano, and Josh Cowls. 2019. “A Unified Framework of Five Principles for AI in Society.” Harvard Data Science Review 1 (1). https://doi.org/10.1162/99608f92.8cd550d1.
Solon, Olivia. n.d. “Facial Recognition’s ‘Dirty Little Secret’: Social Media Photos Used without Consent.” NBC News. Accessed March 19, 2021. https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921.
Hao, Karen. 2021. “This Is How We Lost Control of Our Faces.” MIT Technology Review. https://drive.google.com/file/d/1bULmbuWbLb0G27xsgGtJQeuLTIfVhbJn/view?usp=embed_facebook.
Nemitz, Paul. 2018. “Constitutional Democracy and Technology in the Age of Artificial Intelligence.” https://drive.google.com/file/d/10-9tiJBJ-oXjvpY7it18p5cL6JN9UNCa/view?usp=embed_facebook.
Pew Research Center. 2018. “Artificial Intelligence and the Future of Humans.” Pew Research Center: Internet, Science & Tech (blog). December 10, 2018. https://www.pewresearch.org/internet/2018/12/10/artificial-intelligence-and-the-future-of-humans/.
———. 2019a. “Facebook Algorithms and Personal Data.” Pew Research Center: Internet, Science & Tech (blog). January 16, 2019. https://www.pewresearch.org/internet/2019/01/16/facebook-algorithms-and-personal-data/.
———. 2019b. “The Challenges of Using Machine Learning to Identify Gender in Images.” Pew Research Center: Internet, Science & Tech (blog). September 5, 2019. https://www.pewresearch.org/internet/2019/09/05/the-challenges-of-using-machine-learning-to-identify-gender-in-images/.
“The Ethical Character of Algorithms—and What It Means for Fairness, the Character of Decision-Making, and the Future of News.” n.d. The Ethical Machine. Accessed March 19, 2021. https://ai.shorensteincenter.org/ideas/2019/1/14/the-ethical-character-of-algorithmsand-what-it-means-for-fairness-the-character-of-decision-making-and-the-future-of-news-yak6m.

Hey Google …

I chose to explore how Google Assistant works because I am an Android user. I have a Samsung Galaxy S9 and had never used my virtual assistant, to the point that I had to Google how to turn it on. After playing around with it, I tested all the functions Google Assistant said it could do: search the Internet, schedule events and alarms, adjust hardware settings on the user's device, show information from the user's Google account, engage in two-way conversations, and much more!

Google Assistant can do all of this through its own natural language processing. From my understanding it follows the same kind of logic that we've been learning about in the last couple of weeks. The premise is this:

  1. Using a speech-to-text platform, Google Assistant first converts spoken language into text that the system can understand (Week 6; Crash Course #36). A quick rundown of speech recognition: using a spectrogram, spoken vowels and whole words are converted into frequencies. These frequencies are consistent for each vowel and form what is termed a phoneme. Knowing these phonemes, computers can convert speech into text, which is then further broken down into data components identifiable through Unicode (a toy spectrogram sketch follows this list).
  2. Once it identifies the command or question, Google Assistant takes the user's input and runs it through a complex neural network with multiple hidden layers. I'm unsure what specific type of neural network Google uses, but as a quick rundown of neural networks: there is an input layer, hidden layer(s), and an output layer connected through links like brain neurons. Algorithms learn from datasets in the hidden layers to create an output from the inputs given (Week 5; Machine Learning 3+4).
  3. Google goes through different processes for different inputs, whether a command or a question, producing the output and any other required actions. It then uses speech synthesis, the reverse of the speech recognition process, to present the output to the user.
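
As a toy illustration of step 1 (noted in that item), the following sketch turns a synthetic waveform into a spectrogram with short-time FFTs; the signal, frame size, and sample rate are assumptions, and a real recognizer would of course do far more.

```python
import numpy as np

sample_rate = 16000
t = np.arange(0, 1.0, 1 / sample_rate)                     # one second of synthetic audio
waveform = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

frame_size = 400                                           # 25 ms frames at 16 kHz
usable = len(waveform) // frame_size * frame_size
frames = waveform[:usable].reshape(-1, frame_size)
spectrogram = np.abs(np.fft.rfft(frames, axis=1))          # frequency content of each frame

print(spectrogram.shape)   # (number of frames, frequency bins) handed to the recognizer
```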

**I appreciated the figures presented in the beginning and took time to understand them a little more; I think the best in terms of understanding are Fig. 1, Fig. 39, and Fig. 47 (I tried to paste them in my post, but I don't think it worked).


Some defining notes from Google's patent:

  • Google assistant has various embodiments of a computing device that can work independently or interact with each other.
  • The various embodiments Google Assistant can take on allow it to access, process, and/or otherwise utilize information from various devices, as well as store memory across these different embodiments.
  • Google’s assistant adapts to its users by applying personal information, previous interactions, and physical context to provide more personalized results and improve efficiency.
  • Using active ontologies and the adaptability to the user mentioned above, it can predict and anticipate the next text using an active input elicitation technique.

*In short, Google Assistant knows a lot about us and is constantly gathering more data to improve its interface and understanding based on patterns.

Compared to other virtual assistants like Apple's Siri or Amazon's Alexa, Google Assistant is more intelligent because it uses Google's own servers, which are capable of searching Google's entire knowledge base for answers. However, Google Assistant is not as smart as GPT-3 (I use the term smart loosely). GPT-3 is the most advanced natural language processing system on the planet. Developed by OpenAI and released last year, it is the closest humans have come to a machine capable of producing coherent responses to any English task. It can do this because it has far more parameters, about 175 billion, to train and learn from. It really is just a bigger version of its predecessor GPT-2 and thus has the same shortfalls GPT-2 faced regarding comprehension and understanding.

There are a lot of metaphors out there for what GPT-3 is, and the one I like most is that it is an improv actor. It can write articulate responses that mimic a coherent entity, but it does not understand the meaning behind the text it is writing. The lack of logic and reasoning is evident in its shortfalls regarding semantics and culture. I do not want to completely detract from this momentous step, but after further reading I agree with scientists that maybe a new approach is warranted; there comes a point when simply building bigger will not work. The Computerphile video put it in an interesting context: if you want to get to space, you cannot just keep building bigger rockets; you have to re-approach the situation. I think this is the point we are at, especially when faced with issues arising from the amount of energy needed to conduct ever more training computation, as well as the racist and sexist biases inherent in the training data.

References:

Computerphile. 2020. GPT3: An Even Bigger Language Model – Computerphile. https://www.youtube.com/watch?v=_8yVOC4ciXc.
“Google Assistant.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Google_Assistant&oldid=1010208194.
“GPT-3, Bloviator: OpenAI’s Language Generator Has No Idea What It’s Talking about.” n.d. MIT Technology Review. Accessed March 13, 2021. https://www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/.
“OpenAI’s New Language Generator GPT-3 Is Shockingly Good—and Completely Mindless | MIT Technology Review.” n.d. Accessed March 13, 2021. https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/.
“US9548050.Pdf.” n.d. Accessed March 13, 2021. https://patentimages.storage.googleapis.com/82/3a/54/65f4b75dc45072/US9548050.pdf.
“Virtual Assistant.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Virtual_assistant&oldid=1011400052.
“Why GPT-3 Is the Best and Worst of AI Right Now | MIT Technology Review.” n.d. Accessed March 13, 2021. https://www.technologyreview.com/2021/02/24/1017797/gpt3-best-worst-ai-openai-natural-language/.

Google Translate VS. Me

Natural languages are filled with ambiguity that lies in understanding context, which is what makes them extremely difficult for computers to understand and translate. As a student of the Persian Farsi language, I constantly run into issues similar to those computers face when translating, and honestly, they probably do a better job at it than I can… (Score 1 for Google)

I went through language training from late 2016 to early 2017. At that time my professors yelled at us for using Google Translate, but after today's readings, I look back and see that I could have learned from Google Translate how to better translate Farsi into English myself. Like computer vision, natural language processing uses neural networks, in this case to translate. Natural language processing (NLP) is a big field concerned with the interactions between computers and human language: how to program computers to process and analyze large amounts of natural language data. The goal is to make a computer capable of “understanding” content in writing, including the nuances associated with languages. Within NLP is computational linguistics, which approaches linguistic questions through computational modelling. One of those computational models of natural language is machine translation (MT), which examines the use of software to translate text or speech from one language to another. That is what I want to focus on first in this post.

This is done through an encoder-decoder model, specifically a recurrent neural network (RNN). Remembering how neural networks work for pattern recognition from last week's reading, the broad concepts did not come as a surprise. What differentiates the RNN from last week's readings is the bidirectional function, in which the program can go back into its hidden layer continually and modify it before creating its output. How this works, from my understanding:

  • We have an English sentence that gets encoded with numeric values (sequence-to-vectors), so the computer understands it.
  • These numeric values (vectors) go through the neural network hidden layer using an attention mechanism to align inputs and outputs based on the weighted distribution for the most probable translation.
    1. The bidirectional programming looks at the words before and after, finding the important words through the attention mechanism. This increases the computer's ability to understand semantics and to translate sentences longer than 15-20 words by aligning inputs and outputs.
  • The highest value is then decoded into a word (vector-to-sequence) and generated as the output.

*This is done word by word (a toy sketch of the attention step follows).
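
Here is the promised toy sketch of the attention step, with invented vectors; it is not Google's system, only the "score each encoder state, turn the scores into a weighted distribution, and combine" idea.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
encoder_states = rng.normal(size=(5, 8))   # one 8-dimensional vector per source word (5 words)
decoder_state = rng.normal(size=8)         # the state producing the next target word

scores = encoder_states @ decoder_state    # how well each source word matches right now
weights = softmax(scores)                  # the weighted distribution over the inputs
context = weights @ encoder_states         # what the decoder "attends to" for this word

print(weights.round(2), context.shape)
```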

The above is an explanation of Google's Neural Machine Translation system, which is one way to deal with the ambiguities that lie in natural languages. I relate to this process in how I understand and translate languages. I'm no expert in understanding Farsi, but I approach it by identifying the important words, mainly the nouns and verbs, as the attention mechanism would do. Then I try to find the context of the sentence by pairing the words I do know with ideas of what the words before and after could be. Where Google and I differ is that I sometimes leave a word ambiguous, because knowing it will not help me understand the sentence or would make it more difficult to understand. I can remember teachers telling me not to worry about all the words but to grab the concept. I can mitigate the ambiguities because I understand the context behind the content, sometimes interpreting it differently but still able to convey my idea without being as precise as MT needs to be. (Score 1 for me)

Another way to deal with ambiguities in NLP, which I believe is used in Google's system, is the concept behind BabelNet and WordNet. I originally thought these were just huge databases of word synonyms, like a better version of thesaurus.com, but the more I understand what NLP and MT need in order to function, the more I understand how difficult it is for a computer to find the meanings behind words. From my understanding, BabelNet and WordNet are lexicons that create deeper links than just synonyms by capturing semantic relationships. I think programs like these help computers understand and generate the sentences needed in chatbot conversations by relating words to other words and thereby to concepts.
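
A small illustration of the WordNet idea, assuming NLTK's WordNet interface is available (the word chosen is arbitrary): words link to synonym sets and to broader concepts (hypernyms), not just to a flat list of synonyms.

```python
import nltk
nltk.download("wordnet", quiet=True)       # fetch the WordNet data if it is not already present
from nltk.corpus import wordnet

frog = wordnet.synsets("frog")[0]          # the first sense of "frog"
print(frog.lemma_names())                  # synonyms grouped into the same sense
print(frog.hypernyms())                    # the broader concept this sense belongs to
```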

We see an advancement of this through the case studies, in which neural networks are used to train a program to guess the next word based on relational semantics and training data. Known as GPT-2, this is the latest evolution in NLP, one that, eerily enough, can create news articles that mimic human writing. As impressive as this is, it also brings a sense of caution about the exploitation of this technology to mass-produce targeted fake news, the reason OpenAI initially did not release the code behind it. Another difficulty is that even though it is capable of writing human-like content, the computer still does not understand anything beyond word association. Just as with the difficulties of computer vision, the lack of understanding permeates and frustrates researchers.

Questions

  1. Of the four philosophies guiding NLP mentioned in the Hao article, which one does Machine Translation fit under? What does BabelNet/WordNet fit under?
  2. It seems like the resounding issue with NLP is the same as with computer vision: lack of understanding. Do you think that, with the increasing availability and amount of data, some of the approaches, specifically training neural networks, can improve computers' ability to understand or feign understanding?
  3. What is the most critical issue facing MT today? What is the most critical issue facing NLP today?
  4. Can I create my own program that can convert my speech into text?

References:

“A Neural Network for Machine Translation, at Production Scale.” n.d. Google AI Blog (blog). Accessed March 6, 2021. http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html.

“An AI That Writes Convincing Prose Risks Mass-Producing Fake News.” n.d. MIT Technology Review. Accessed March 6, 2021. https://www.technologyreview.com/2019/02/14/137426/an-ai-tool-auto-generates-fake-news-bogus-tweets-and-plenty-of-gibberish/.

“Better Language Models and Their Implications.” 2019. OpenAI. February 14, 2019. https://openai.com/blog/better-language-models/.

“Computational Linguistics.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Computational_linguistics&oldid=1008316235.

CrashCourse. 2017. Natural Language Processing: Crash Course Computer Science #36. https://www.youtube.com/watch?v=fOvTtapxa9c.

———. 2019. Natural Language Processing: Crash Course AI #7. https://www.youtube.com/watch?v=oi0JXuL19TA.

CS Dojo Community. 2019. How Google Translate Works – The Machine Learning Algorithm Explained! https://www.youtube.com/watch?v=AIpXjFwVdIE.

“Machine Translation.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=999926842.

“Natural Language Processing.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=1009043213.

“The Technology behind OpenAI’s Fiction-Writing, Fake-News-Spewing AI, Explained.” n.d. MIT Technology Review. Accessed March 6, 2021. https://www.technologyreview.com/2019/02/16/66080/ai-natural-language-processing-explained/.

Learning to Read

Before delving into the key issues and points from Karpathy's article, we need to deconstruct pattern recognition to its basics. Pattern recognition is a subset of machine learning because it is a process that gives computers the ability to learn from data, which can then be used to make predictions and decisions. This process consists of classifying data into categories determined by decision boundaries. The goal is to maximize correct classification while minimizing errors. To do so it goes through a step-by-step process, most notably laid out in Dougherty's reading. The entire method boils down to this:

  • Sensing/Acquisition – uses a transducer such as a camera or microphone to capture signals (e.g., an image) with enough distinguishing features.
  • Preprocessing – makes the data easier to segment, for example by normalizing pixel values (dividing each RGB value by 256 to get a number between 0 and 1).
  • Segmentation – partitions a signal into regions that are meaningful for a particular task—the foreground, comprising the objects of interest, and the background, everything else.
    1. Region-based = similarities are detected.
    2. Boundary-based = discontinuities are detected.
  • Feature Extraction –
    1. Features are characteristic properties of the objects whose value should be similar for objects in a particular class, and different from the values for objects in another class (or from the background). Examples: Continuous (numbers) or Categorical (nominal, ordinal)
  • Classification – assigns objects to certain categories based on the feature information by evaluating the evidence presented and decides regarding the class each object should be assigned, depending on whether the values of its features fall inside or outside the tolerance of that class.

I interpret the first four steps as preparing the data and the features that the algorithm will apply to the data, and the final step is where the action occurs, in a simple and fast manner. Using a picture as an example, this happens by sending the data in each pixel through this process, as sketched below. Now that we know what pattern recognition consists of, we can examine Karpathy's explanation of convolutional neural networks (ConvNets), which are just another form of pattern recognition, specifically a type of classification method. Other methods include decision trees, random forests (which are just ensembles of decision trees), support vector machines, and neural networks.
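
To tie the steps together, here is a minimal, hypothetical sketch in Python of the pipeline on a tiny grayscale "image": preprocessing scales the pixel values, segmentation thresholds foreground from background, feature extraction computes one simple property, and classification checks that feature against a tolerance band. The pixel values, the threshold, and the class boundaries are all invented for illustration; real systems use far richer features and classifiers.

```python
# A tiny 4x4 grayscale "image" (0 = black, 255 = white) -- invented data.
image = [
    [  0,  10, 200, 210],
    [  5,  15, 220, 230],
    [  0,  20, 215, 225],
    [ 10,   5, 205, 240],
]

# Preprocessing: scale pixel values to the 0-1 range.
scaled = [[value / 255 for value in row] for row in image]

# Segmentation (region-based): pixels brighter than a threshold are "foreground".
foreground = [value for row in scaled for value in row if value > 0.5]

# Feature extraction: one simple feature, the mean brightness of the foreground.
mean_brightness = sum(foreground) / len(foreground)

# Classification: assign a class depending on whether the feature value
# falls inside the (made-up) tolerance band for that class.
label = "bright object" if 0.75 <= mean_brightness <= 1.0 else "dim object"

print(round(mean_brightness, 2), label)
```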

To understand ConvNets we should start with understanding neural networks. Neural networks are organized in layers of units connected by links; each unit takes a series of inputs and combines them to emit a signal as an output, and both inputs and outputs are represented as numbers. Between the input and output are hidden layers that sum the weighted inputs, add a bias, and then apply an activation function (also called a transfer function) that performs a final mathematical modification to produce the result. The weights and biases are initially set to random numbers when a neural network is created; then an algorithm trains the network using labeled examples from the training data. Training starts from scratch by initializing filters at random and then changing them slightly through a mathematical process, using feedback about what the actual image is, e.g. a toad vs. a frog (supervised learning?). A ConvNet follows the same principle but has more hidden layers performing more data analysis to recognize complex objects and scenes; this is also termed deep learning.
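
Here is a minimal sketch of that idea in Python: one "neuron" that sums weighted inputs, adds a bias, and applies an activation function, and a small hidden layer built from several such neurons. The inputs, weights, and biases below are random placeholders, not a trained network.

```python
import math
import random

def sigmoid(x):
    """Activation (transfer) function: squashes any number into the 0-1 range."""
    return 1 / (1 + math.exp(-x))

def neuron(inputs, weights, bias):
    """Sum the weighted inputs, add the bias, then apply the activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Inputs could be pixel values; weights and biases start out random,
# and training would nudge them toward values that give the right answers.
inputs = [0.2, 0.8, 0.5]
hidden_layer = [
    {"weights": [random.uniform(-1, 1) for _ in inputs],
     "bias": random.uniform(-1, 1)}
    for _ in range(4)
]

hidden_outputs = [neuron(inputs, unit["weights"], unit["bias"]) for unit in hidden_layer]
print(hidden_outputs)  # four numbers between 0 and 1, passed on to the next layer
```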

Karpathy was able to highlight this through a practical example using selfies, which I found both amusing and enjoyable. I think the key point he raises, echoed in the other readings, is that pattern recognition is not 100% accurate. The features we choose create the decision boundaries and the decision space, and evaluating the resulting classifier produces a confusion matrix that tells us what the algorithm got right and wrong. A related obstacle is the "curse of dimensionality": the more features we add to make decisions more precise, the more complicated the classification becomes, which is why experts employ the K.I.S.S. method. However, we can program algorithms like ConvNets to be mostly right by identifying features and, through repetitive training, assisting the algorithm to gradually figure out what to look for; this I believe is termed supervised learning, or maybe reinforcement learning? In sum, a ConvNet is a form of pattern recognition used as a tool for machine learning that still has obstacles to overcome but is now being used to convert handwriting into text, spot tumors in CT scans, monitor traffic flows on roads, and propel self-driving cars; the possibilities are endless!
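
Since the confusion matrix comes up here, a quick sketch of how one is built from a classifier's predictions; the labels and predictions below are invented for illustration.

```python
from collections import Counter

# Hypothetical ground-truth labels and a classifier's predictions.
actual    = ["toad", "frog", "frog", "toad", "frog", "toad"]
predicted = ["toad", "frog", "toad", "toad", "frog", "frog"]

# Count every (actual, predicted) pair: matching pairs are what the model got right.
confusion = Counter(zip(actual, predicted))
for (truth, guess), count in sorted(confusion.items()):
    print(f"actual={truth:5s} predicted={guess:5s} count={count}")
```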

Questions:

Understanding the definitions of supervised vs. unsupervised learning, does that mean supervised learning is pattern recognition? Does that then mean unsupervised learning does not exist? If it does exist, what are some examples?

Where does reinforcement learning fall under supervised or unsupervised?

Are features another term for bias and weights?

References:

Alpaydin, Ethem. 2016. Machine Learning: The New AI. MIT Press Essential Knowledge Series. Cambridge, MA: MIT Press. https://drive.google.com/file/d/1iZM2zQxQZcVRkMkLsxlsibOupWntjZ7b/view?usp=drive_open&usp=embed_facebook.

CrashCourse. 2017a. Machine Learning & Artificial Intelligence: Crash Course Computer Science #34. https://www.youtube.com/watch?v=z-EtmaFJieY&t=2s.

———. 2017b. Computer Vision: Crash Course Computer Science #35. https://www.youtube.com/watch?v=-4E2-0sxVUM.

———. 2019. How to Make an AI Read Your Handwriting (LAB) : Crash Course Ai #5. https://www.youtube.com/watch?list=PL8dPuuaLjXtO65LeD2p4_Sb5XQ51par_b&t=67&v=6nGCGYWMObE&feature=youtu.be.

Dougherty, Geoff. 2013. Pattern Recognition and Classification: An Introduction. Excerpt. Accessed February 26, 2021. https://drive.google.com/file/d/1BT-rDW-mvnCOtUvvm-2xBwzF8_KJcKyI/view?usp=drive_open&usp=embed_facebook.

Karpathy, Andrej. 2015. “What a Deep Neural Network Thinks about Your #selfie.” Accessed February 26, 2021. https://karpathy.github.io/2015/10/25/selfie/.

Data – It's All Starting to Make Sense!!!

I think something just clicked! I'm slowly starting to make sense of binary and how it is converted into the symbols we see on our screens. First, I need to define data, which is always something with humanly imposed structure, that is, an interpretable unit of some kind understood as an instance of a general type. Data is inseparable from the concept of representation. This representation must be universal so that devices can communicate with other devices; enter Unicode. Unicode is literally just that, a universal code that assigns a number (and thus a string of binary digits) to each symbol, number, letter, etc., and its current code space has room for 1,114,112 code points. The first 128 characters come from ASCII, which uses a 7-bit structure and so can only represent 128 symbols. Extended ASCII uses 8 bits for 256 symbols, but there are so many more symbols that need associated binary digits, so Unicode developed a worldwide system that is most commonly encoded as UTF-8 or UTF-16 but can go up to UTF-32. So, for me to understand this better, here is an example:

It is easy to use binary digits to represent numbers with the whole 64, 32, 16, 8, 4, 2, 1 (7-bit) place-value sequence, in which 82 would be represented as 1010010, but to represent letters or symbols there needs to be a universal agreement on what the binary digits stand for. So each letter or symbol is given a number, such that "A" means 65, which in binary is 1000001.
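
Here is a small Python sketch of exactly this mapping: each character has an agreed-upon number (its code point), and an encoding like UTF-8 turns that number into the bytes that actually get stored.

```python
# Every character maps to an agreed-upon number (its code point)...
print(ord("A"))          # 65
print(bin(ord("A")))     # 0b1000001 -- the 7-bit pattern for "A"

# ...and UTF-8 turns code points into bytes. ASCII characters stay one byte,
# while characters outside ASCII take two to four bytes.
print("A".encode("utf-8"))   # b'A' (one byte, value 65)
print("é".encode("utf-8"))   # b'\xc3\xa9' (two bytes)
print("😀".encode("utf-8"))  # four bytes for an emoji
```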

Now, if every computer uses this same method of symbology (i.e., the same character encoding), then they can communicate with each other, which is why a universal standard is so important. The next question is how you get a computer to actually create the symbol "A." I understand that the binary digits 1000001 = A, but how does the letter pop up on my screen; how is it converted from binary digits into the shape I see, i.e. rendered? Professor Irvine mentioned it in his intro, where it seems like software interprets the code and then displays the text on a screen, so maybe it's next week's lesson?

This is just for text, though. Understanding photos is not that different, which is wild! After reading How Digital Photography Works, I no longer need to hire a professional photographer to take photos for me; I know how to alter pictures! Joking, but the basics are there and I've de-blackboxed it! In simplest terms, each color is composed of a value from 0 to 255 for each of red, green, and blue. So, to alter a picture's colors, one just needs to change the numbers associated with those colors: black is 0 for all three, which is the absence of light, and white is 255 for all three. Getting from an image that I see to something digital, though, goes through some cool science that, if I did not know better, I would call a form of magic. The down and dirty: after light passes through a camera's lens, diaphragm, and open shutter, it hits millions of tiny microlenses that capture the light and direct it properly. The light then goes through a hot mirror that lets visible light pass and reflects invisible infrared light that would otherwise distort images. Then it passes through a layer that filters the colors captured; the usual design is the Bayer array, which alternates green, red, and blue filters so that no two of the same color touch, with double the number of greens. Finally, it strikes the photodiodes, which measure the intensity of the light: the light first hits the silicon at the "P-layer," which transforms the light's energy into electrons, creating a negative charge. This charge is drawn into the diode's depletion area because of the electric field the negative charge creates with the "N-layer's" positive charge. Each photodiode collects photons of light as long as the shutter is open; the brighter a part of the photo is, the more photons have hit that section. Once the shutter closes, the pixels hold electrical charges that are proportional to the amount of light received. The readout then goes through one of two different processes, either CCD (charge-coupled device) or CMOS (complementary metal-oxide semiconductor). In either process the charges go through an amplifier that converts this faint static electricity into a voltage in proportion to the size of each charge. A digital camera literally converts light into electricity! MAGIC! Joking, it's science once you understand it! My question then is how the computer arrives at the binary digits associated with the electric current: more precisely, where in this process does the voltage become a recognizable number on the 0–255 red, green, blue scale?
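
To see the "change the numbers, change the colors" idea in code, here is a minimal sketch using a plain list of RGB triples as a stand-in for real image data; the pixel values and the +50 adjustment are invented. Real photo editors do the same arithmetic across millions of pixels.

```python
# A 2x2 "image" as rows of (red, green, blue) triples, each channel 0-255.
image = [
    [(255,   0,   0), (  0, 255,   0)],   # red pixel, green pixel
    [(  0,   0, 255), (255, 255, 255)],   # blue pixel, white pixel
]

def boost_red(pixel, amount=50):
    """Alter a color by changing its numbers: raise the red channel, capped at 255."""
    r, g, b = pixel
    return (min(r + amount, 255), g, b)

warmer = [[boost_red(p) for p in row] for row in image]
print(warmer)
```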

Now that I understand the different types of data, how do we access and store it? The Crash Course videos, after watching them a couple of times, provided the answers! Data is structured to make it more accessible. It is stored across a memory hierarchy that is the evolution of years of research on storing data, which originated with paper punch cards (wild). RAM sits near the top of the hierarchy: it is volatile memory built from integrated circuits with the lowest "seek time" (the time it takes to find the data), but it loses its contents when the power goes off. For long-term storage there are hard disk drives, which keep data on spinning magnetic platters and so have much longer seek times, and solid-state drives (SSDs), which are nonvolatile flash memory with no moving parts, faster than hard disks but still slower than RAM. I had these mixed up at first because I am more familiar with hearing the generic term "hard drive," and I associate disk technology with old computers. So my question is: what type of storage do most modern computers use, or do they use both? Anyway, after understanding where data is located, the next step is understanding how it is organized in that storage system. Data is stored in file formats like JPEG, TXT, WAV, and BMP, which are stored back-to-back in a file system. A directory file, or root file, is kept at the front of storage (location 0) and lists the names of all the other files to help identify them. Modern file systems store files in blocks with slack space so that a user can add more data to a file; if it exceeds its slack space the system allocates another block, possibly elsewhere on the drive. This fragmentation of data can be undone by a defragmentation process that reorders the blocks to facilitate ease of access and retrieval.
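
The block-and-slack-space idea can be sketched with a little arithmetic: if a file system allocates storage in fixed-size blocks, a file occupies whole blocks and the unused tail of its last block is slack space. The 4096-byte block size and the file sizes below are assumptions for illustration.

```python
import math

BLOCK_SIZE = 4096  # bytes per block -- a common choice, assumed here for illustration

def allocation(file_size):
    """Return (blocks used, slack space in bytes) for a file of the given size."""
    blocks = math.ceil(file_size / BLOCK_SIZE)
    slack = blocks * BLOCK_SIZE - file_size
    return blocks, slack

for size in (100, 4096, 5000, 123456):
    blocks, slack = allocation(size)
    print(f"{size:>6} bytes -> {blocks} block(s), {slack} bytes of slack")
```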


References:

CrashCourse. 2017a. Data Structures: Crash Course Computer Science #14. https://www.youtube.com/watch?v=DuDz6B4cqVc&list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo&index=15.

———. 2017b. Memory & Storage: Crash Course Computer Science #19. https://www.youtube.com/watch?v=TQCr9RV7twk&list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo&index=20.

———. 2017c. Files & File Systems: Crash Course Computer Science #20. https://www.youtube.com/watch?v=KN8YgJnShPM&list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo&index=21.

“FAQ – UTF-8, UTF-16, UTF-32 & BOM.” n.d. Accessed February 21, 2021. https://unicode.org/faq/utf_bom.html.

Irvine, Martin. 2020. Irvine 505 Keywords Computation. https://www.youtube.com/watch?v=AAK0Bb13LdU&feature=youtu.be.

The Tech Train. 2017. Understanding ASCII and Unicode (GCSE). https://www.youtube.com/watch?v=5aJKKgSEUnY.

White, Ron, and Timothy Edward Downs. 2007. How Digital Photography Works. 2nd ed. Excerpt. Accessed February 21, 2021. https://drive.google.com/file/d/1Bt5r1pILikG8eohwF1ZnQuv5eNL9j8Tv/view?usp=sharing&usp=embed_facebook.

Information, What is it? Also a brief history of an untold legend

I think the first step to answering these questions is to figure out what information theory is. From my understanding, it is the explanation for how we can impose human logic and symbolic values on electronic and physical media. From this we get an offshoot that I think falls within information theory, something called E-information, which is the digital-electronics concept, i.e., electrical-engineering information, where E-information = mathematics + the physics of signals + time. The whole point of this process is to preserve the pattern and quality of a signal unit so that it can be successfully received at the other end. The signal-code transmission model, I think, was the result of the information theory built by Claude Shannon for transmitting electronic signals most efficiently over networks or broadcast radio waves, merged with the question of how to represent data in discrete electronic units. This model is a way of transmitting error-free electronic signals in telecommunication systems, but it leaves out the meanings and social uses of communication because they are assumed or presupposed. As a result, I believe information theory describes how our primary symbol systems "encode" but does not provide a meaning for the symbols. This is what I got from the first text, the introduction to the technical theory of information.

For the next one, following a timeline of Shannon, we see how information theory came to exist, starting with the Differential Analyzer, which was coordinated by a hundred relays, intricately interconnected, switching on and off in particular sequences. From this, and from his enjoyment of logic puzzles, Shannon realized in a deeply abstract way that the possible arrangements of switching circuits lined up with symbolic logic, particularly Boole's algebra. Here we get his master's thesis on a machine that could solve logic puzzles, the essence of the computer. Then he got interested in why telephony, radio, television, and telegraphy, all following the same general form of communication, always suffer distortion and noise (static). But he took a job at Princeton, and then WWII kicked off and he was assigned to "Project 7," which applied mathematics to fire-control mechanisms for anti-aircraft guns to "apply corrections to the gun control so that the shell and the target will arrive at the same position at the same time." The problem was similar to what plagued communication by telephone: interfering noise and the problem of smoothing the data to eliminate or reduce tracking errors. We then go on a quick history of the telephone and are reintroduced to Shannon reading a text published in the Bell System Technical Journal about the Baudot code. Here information is the stuff of communication, in which communication takes place by means of symbols that convey some type of meaning. Fast forward to 1943, and Shannon is working as a cryptanalyst and enjoying tea with Alan Turing, where they talked not about their work but about the possibility of machines that learn. Here Shannon develops a model for communication, and then I get lost in the math, but from my understanding Shannon showed that natural-language text is statistically redundant and so can be encoded more efficiently for transmission or storage, and he developed a way to ensure reliable end-to-end transmission.
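
Part of the math I got lost in boils down to entropy, a measure of how unpredictable a message source is, which sets a limit on how compactly it can be encoded. Here is a small sketch computing the entropy of the character frequencies in a piece of text; this is my own illustration of the formula H = -Σ p·log2(p), not Shannon's original derivation, and the sample strings are invented.

```python
import math
from collections import Counter

def entropy_bits_per_character(text):
    """Shannon entropy H = -sum(p * log2(p)) over the character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

english = "natural language text is statistically redundant"
random_ish = "qzxjvkwypbgmfdchul"

# English-like text has lower entropy per character than the ~4.7 bits needed
# to pick uniformly among 26 letters, which is why it compresses well.
print(round(entropy_bits_per_character(english), 2))
print(round(entropy_bits_per_character(random_ish), 2))
```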

From all of this we get the Internet, which is designed with no central control. It is a distributed packet-switched network that ensures end-to-end connectivity, so that any device can connect to any other device. I'm still having difficulty understanding what E-information is well enough to properly compare it to the Internet. The Internet is controlled by no one and everyone, in the sense that Internet packets (I think these are the information) are structure-preserving structures between senders and receivers. At the same time, from my understanding, E-information is like the Internet in that it uses electricity to create regular, interpretable patterns that are designed to be communicable through a physical system to a human user. If this is right, then wouldn't the Internet be a form of E-information?


Lingering Questions:

Where does E-information fit in the information theory?

What did the math Shannon developed to achieve error-free transmission actually accomplish? I did not understand the mathematical side of his theories and would like a better explanation.

Where is the ALU and how does the ALU on my computer turn logic gates into actual images and symbols?

Why are there so many logic gates?

Evolution of Computation

This week provided almost an evolutionary timeline of computation, which was super cool. Starting the readings with Prof. Irvine's video, we understand that computation is information processing in the sense of information transformation for all types of digitally encodable symbolic forms. This is done through binary electronics, in which we impose a structure on electricity as sequences of on/off states and then assign symbolic values to those physical units. From this we created the modern digital computer and computer systems that "orchestrate" (combine, sequence, and make active) symbols that mean (data representations) and symbols that do (programming code) in automated processes for any programmable purpose.

Then we go into computing principles with Denning and Martell, who further define computing as dependent on specific practices and principles, where each category of principle is a perspective on computing. The figure in their opening chapter, which maps out these categories of principles, sums up the initial chapter.

Then we make a big jump into machine learning, the next step of computation, in part because the world has regularities: we can collect data of example observations and analyze it to discover relationships. Machine learning involves the development and evaluation of algorithms that enable a computer to extract (or learn) functions from a dataset (a set of examples). This is done through algorithms that induce (or extract) a general rule (a function) from a set of specific examples (the dataset) plus assumptions (the inductive bias), as sketched below. Following this is deep learning, another derivative of computation, introduced as the subfield of machine learning that focuses on the design and evaluation of training algorithms and model architectures for modern neural networks, using mathematical models loosely inspired by the brain. I see this as an evolution of machine learning because deep learning can learn useful features from low-level raw data, and complex non-linear mappings from inputs to outputs, rather than having a human hand-engineer every feature (correct me if I'm wrong, but features are the input variables for each example in a dataset). Deep learning was spurred on by big data, which raises some notable ethical questions regarding privacy that I would love to further dissect. Overall this means that deep learning's ability to compute information is much faster and more accurate than many other machine learning models that use hand-engineered features.
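
To ground "inducing a function from a set of examples," here is a minimal sketch of learning a line y = w·x + b from a handful of made-up (x, y) pairs by gradient descent; the data, the learning rate, and the number of iterations are all assumptions for illustration, and the inductive bias is the assumption that a straight line is the right kind of rule.

```python
# Made-up dataset of examples: inputs x and outputs y roughly following y = 2x + 1.
examples = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]

# The "general rule" being induced is a line y = w*x + b; w and b are the learned parameters.
w, b = 0.0, 0.0
learning_rate = 0.01

for _ in range(5000):
    # Gradient of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / len(examples)
    grad_b = sum(2 * (w * x + b - y) for x, y in examples) / len(examples)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # ends up close to the underlying rule y = 2x + 1
```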

It is honestly inspiring and jaw-dropping to see the jump from Dartmouth to machine learning and now deep learning. So many questions still exist, but now I have a decent grasp that the devices I'm using to create this post consist of humans imposing symbolic meaning on electricity that at its root is just 1s and 0s, which through the layers of my computer system become comprehensible images. From that we have evolved computers from devices that store and transport data to actual machines capable of learning from data. I'm still curious about the nature of deep learning and its difference from, and applicability to, our issues today as opposed to machine learning. Also, what is noise?

Best,

Chloe


References

Alpaydin, Ethem. 2016. Machine Learning: The New AI. MIT Press Essential Knowledge Series. Cambridge, MA: MIT Press. https://drive.google.com/file/d/1iZM2zQxQZcVRkMkLsxlsibOupWntjZ7b/view?usp=drive_open&usp=embed_facebook.
 
Denning, Peter, and Craig Martell. 2015. Great Principles of Computing. MIT Press. https://drive.google.com/file/d/1RWhHfmv4oJExcpaCpMe85MLtOgATLZ5Z/view?usp=drive_open&usp=embed_facebook.
 
Kelleher, John. 2019. Deep Learning. MIT Press. https://drive.google.com/file/d/1VszDaSo7PqlbUGxElT0SR06rW0Miy5sD/view?usp=drive_open&usp=embed_facebook.
 
Irvine, Martin. 2020. Irvine 505 Keywords Computation. https://www.youtube.com/watch?v=AAK0Bb13LdU&feature=youtu.be.

What is AI? - Chloe Wawerek

To better understand this course, a solid definition of AI and its history is needed, and the following is what I gained from this week's readings. From my understanding, AI is any machine capable of interpreting data, potentially learning from the data, and using that knowledge to adapt and achieve specific goals. AI's roots lie in the history of computers, which are machines that can reliably follow very simple instructions very, very quickly, and can make decisions as long as those decisions are precisely specified. With that being said, the question then posed is: can computers produce intelligent behavior simply by following lists of instructions like these? I think machine learning is a step in that direction, but the issue that AI faces is twofold:

1) Either we have a recipe for the problem that works in principle but doesn't work in practice, because it would require impossibly large amounts of computing time and memory,

2) or we have no real idea what a recipe for solving the problem might look like.

Because of these issues, AI research is currently focused on developing AGI, the ability of computers to have the same intellectual capabilities as humans, without concerning itself with issues such as consciousness or self-awareness (weak AI). AI research draws on various disciplines including physics, statistics, psychology, cognitive science, neuroscience, linguistics, computer science, and electrical engineering, which is wild. From my understanding, the furthest we have gotten toward even weak AI is machine learning: constructing a program that fits the given data by creating a learning program that is a general template with modifiable parameters, as sketched below.
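
As a toy illustration of "a general template with modifiable parameters," here is a sketch where the template is a simple threshold rule and "learning" means choosing the parameter value that makes the fewest mistakes on made-up labeled data. The scores, labels, and search range are all invented; real machine learning uses much richer templates and smarter ways of searching for parameters.

```python
# Made-up labeled data: exam scores and whether the student passed (1) or not (0).
data = [(35, 0), (42, 0), (50, 0), (58, 1), (63, 1), (71, 1), (80, 1)]

# The general template: predict "pass" when the score reaches a threshold parameter.
def template(score, threshold):
    return 1 if score >= threshold else 0

# "Learning" = adjusting the modifiable parameter to best fit the given data.
def errors(threshold):
    return sum(template(score, threshold) != label for score, label in data)

best_threshold = min(range(0, 101), key=errors)
print(best_threshold, errors(best_threshold))  # 51 here: the smallest threshold with zero errors
```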

Combing Simon and Alpaydin’s work we see that machine learning is a requirement for AI and that it is based off the human brain. In fact all of AI takes inspiration from the brain hence the various disciplines involved in its advancement. Though Simon poses an interesting hypothesis that intelligence is the work of symbol systems, and by comparing the human brain to a computer system, both symbol systems in work, computers are therefore intelligent? The logic comes form the argument that logic is computation and since both the brain and computers work to compute data they are therefor intelligent. *please correct me if I’m wrong in this analysis.* I can see the reasoning behind this but I believe there are so many more gaps to fill, but with AI being so vast this could be a correct interpretation of weak AI or Narrow AI but what about the Grand Dream?

Having the understanding that computers work off of binary codes and require instructions and guidance to complete tasks, it seems improbable that the Grand Dream will come to fruition. Even machine learning requires some sort of template of instructions for the computer to work from. I think that with the amount of data available and continually growing, computers can start inferring patterns and making predictions, but humans, as much as they are predictable, are also unpredictable. Additionally, there are certain unwritten rules that govern relationships, which is why maybe we can get to a point where computers become indistinguishable from humans in conversation, but can they pass the Winograd schemas? I think the inability to relate to humans on an emotional level is what will always prevent this sort of self-awareness in machines.

Questions that I still have:

  1. What is the difference between an algorithm and a program?
  2. What is the difference between an artifact and a symbol?
  3. How does electromagnetic energy convert to symbols, i.e. binary codes?
    1. Specifically how did the Turing Machine work and why was that the foundation of computers?
  4. Is cybernetics just another word for neural networking? 
  5. Why is the divide between cybernetics and symbolic computing in AI so hotly debated? What really is the difference?

References: