
Week 12 Reflection

As this course nears its end, this week's reflection looks at the bigger picture. While we've discussed an array of topics from 'Big Data' to NLP, we have consistently seen deep-rooted issues in our current developments in AI/ML. These issues include the failure to distinguish between explanation and speculation, the failure to identify the sources of empirical gains, the use of mathematics that obfuscates or impresses rather than clarifies, e.g. by confusing technical and non-technical concepts, and the misuse of language, e.g. by choosing terms of art with colloquial connotations or by overloading established technical terms (Lipton). This course has aimed at deblackboxing, or providing clarity on technologies that seem purposefully explained in a confusing manner. While this week's readings address these issues and possible solutions (like XAI), the topic raises the question: where do we start?

For students studying Computer Science or similar subjects at the degree level, securing an internship or full-time role at a company working on developments in AI/ML is an accomplishment. The competition is so great that new hires most likely would not dare to challenge their bosses or their company ethically. As time passes, complacency grows, and thankfulness for a paycheck keeps people from speaking up until, finally, the damage has already been done. How do we teach students at the university level not only how to right the wrongs of current AI/ML development practices, but also how to implement better practices at their future places of work?

If, as the readings suggest, we rely on some sort of independent governance to ensure sound practice, what incentivizes companies to hire and pay these outside bodies? Why would they want to be more restricted in the work they do when they have been getting away with more cost-effective, profit-driving techniques so far? With laws like Section 230 in place, and less directly relevant rulings like NYT v. Sullivan, it is clear that there is very little accountability from the government regarding the influence technology has on individuals' lives. Without some sort of ruling or law from the government demanding better practice, or immense social pressure (which will not materialize any time soon because of the blackboxing of our technologies), it does not seem likely that our technologies will improve ethically any time soon.

As we've seen from articles like this, large companies' ethics boards do not seem to have much impact on how those companies actually operate. Instead, they are a pretty line on a reporting PowerPoint, or a get-out-of-jail-free card. Clearly, the hope that AI/ML development will become more ethical on its own is not panning out. So, to circle back to my initial question, where do we go from here?

Week 11

Many of the readings this week focus on the shifting definition of big data. To begin with the more literal definition of "big": digital data can be defined as big data when it is huge in volume (terabytes or petabytes of data), high in velocity (created in or near real time), and diverse in variety and type (structured or unstructured in nature, temporally or spatially referenced). In addition, big data strives to capture entire populations or systems, making it exhaustive in scope; aims to be as detailed as possible; is relational in nature, allowing for the conjoining of different datasets; and is scalable (it can expand in size rapidly) (Big Brother). Essentially, we try to record relevant data that we can combine with other potentially relevant data, all with the hope of answering questions about populations and systems, which leads to the next definition of big data: data tasked with giving deep and new insights into human behavior. In this case, data is not "big" in its volume, velocity, or variety, but "big" in that, in theory, huge amounts of data are available to anyone in the world over the internet (in reality, some data remains private) (Digitization).

With our goals of generating relevant insights in mind, data science steps in to produce these insights. Driven by practical problems, data science is required to transform big data into useful, valuable information and involves finding relevant data, data preparation, data analysis, and data visualization (Huberman). The applications-driven nature of data science means that visualization is extremely important for understanding the output of the applications stage and communicating the results to clients and stakeholders. Given the absurdly large and complex amount of data, data scientists tackle the scientific challenge of formulating methods to represent complex and entangled systems. Data scientists utilize big data every day to generate insights. In one example, it was discovered that people tend to tell lies on Facebook while their Google searches reflect deep personal truths (Huberman).
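To make these stages concrete, here is a toy sketch in Python of the find/prepare/analyze/visualize cycle described above (the table, column names, and "insight" are all made up for illustration, not taken from the readings):

import pandas as pd
import matplotlib.pyplot as plt

# 1. Find relevant data (in practice: scraped, logged, or purchased;
#    here a tiny made-up table stands in for it).
df = pd.DataFrame({
    "age_group": ["18-24", "18-24", "25-34", "25-34", "35-44", None],
    "hours_online": [6.5, 7.0, 5.0, 4.5, 3.0, 2.0],
})

# 2. Prepare: drop incomplete records.
df = df.dropna(subset=["age_group"])

# 3. Analyze: a simple aggregate "insight".
hours_by_age = df.groupby("age_group")["hours_online"].mean()

# 4. Visualize: communicate the result to clients and stakeholders.
hours_by_age.plot(kind="bar", title="Average hours online by age group")
plt.tight_layout()
plt.show()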

In our everyday life, we use technology in ways that add to big data and data scientists’ work. For example, almost every time we use social media, go online shopping, or surf the web, data is being collected on us. Our locations, spending histories, interests, and sometimes even information about our body types are collected. This data is sold to many different private companies with the hopes of generating more insights about human behavior (and in most cases, generating more money). In today’s day and age, big data is used in governments, the education system, media, the healthcare industry, and a variety of other places.

 

Huberman, Bernardo A. “Big Data and the Attention Economy: Big Data (Ubiquity Symposium).” Ubiquity, December 2017, 1–7.
Johnson, Jeffrey. “Big Data: Big Data or Big Brother? That Is the Question Now.” Ubiquity, August 2018, 1–10.
Johnson, Jeffrey. “Big Data, Digitization, and Social Change: Big Data (Ubiquity Symposium).” Ubiquity, December 2017, 1–8.
Kitchin, R. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE Publications, 2014. 

Week 10

In thinking about some of the consequences that could result from the convergence of technologies onto one overall “unifying” architecture from one of the “big four” companies (Google, AWS, IBM, Microsoft), my immediate thought is to consider the dangers. First, cloud computing has seen plenty of controversy in terms of data security: “2010, for instance, witnessed a huge cyber attack on the popular cloud email services of Gmail, and the sudden discontinuation of cloud services to WikiLeaks by Amazon. There followed the 2013 NSA spying scandal, the 2014 nude photo iCloud hack and the Sony hack, with hackers increasingly turning to the cloud.” If all users of an architecture had to use technology provided by only one company, data breaches could be much more widescale than they already are, and affect more people than they would if people had various companies to choose services from.

Not only would we need to worry about security in terms of data breaches, but also in terms of what we as users “own” and what the company providing the services owns: “cloud computing suits the interests and values of those who adopt a deflated view of the value of ownership and an inflated view of freedom” (De Bruin 2010). In other words, cloud computing is designed for people “who care less about where, for instance, a certain photograph is stored and who owns it and care more about having the opportunity and freedom to do things with it.” This can be extremely dangerous, especially if we do not look into the nitty-gritty of who owns our information and where it is stored, because we may be handing our work, our data, and our reliance over to a company we do not fully trust. If we disagree with this company’s regulations for storing or accessing our information but need cloud services, we may be at a loss, left with the choice of either succumbing to regulations we do not align with or going without the technology that would meet our specific hopes and goals.

As with all monopolies, payment becomes a huge concern. If one company controls the market, what stops it from charging whatever prices it chooses to store our data in the cloud? Again, users are left with the hard decision of conforming to absurd prices or forgoing needed technology. Especially considering that many companies intentionally blackbox cloud computing services, many users will not be able to conceptualize fair pricing, and companies could be taking advantage of users for more than just a hot dog.

Lastly, what concerns me about a single provider is what happens when we experience hiccups or outages in our systems: “To minimize the risk of interrupted service due to power outages, datacentres are located near power plants and data are stored on various different physical locations—the greater the number of locations where your data are stored, the more you pay…even then, things may go wrong. Cloud services may face problems as a result of which they become temporarily unavailable. For the numerous companies dependent on cloud services, this means interruption of their websites, their customer services and/or their sales administrations.” If we all rely on the same services, does that mean a more substantial piece of the internet goes down than would if we had options? And again, more issues arise at the intersection of outages and pricing: “small start-up companies are typically affected most: cloud companies require their customers to pay more to store data in more datacentres to diminish the risk, but smaller companies are less likely to be able to afford this.” Too much control for one company is never a good thing, and it can have serious financial, security, and independence consequences for users.

de Bruin, B. (2016). The Ethics of Cloud Computing, Science and Engineering Ethics, volume 23, 21–39.

de Bruin, B. (2010). The liberal value of privacy. Law and Philosophy, 29(5), 505–534.

Laziness and Magic

Probably the most important ideological issues I have noticed with the advancement of AI/ML applications are the lack of accountability and the deep-seated nature of the problems being tackled.

To begin, companies are money hungry, and employees are enough of a mixture of lazy and eager-to-please that shortcuts get taken and questionable data gets used to train and test our systems. While early data collection efforts were extremely cautious and used photoshoots with consenting individuals, time, money, and lack of diversity became an issue. So, employees began scraping the web and used images of faces from websites like Flickr (where many photos are registered under Creative Commons licenses) to build huge datasets of faces they could train on. This is where the issues begin. By 2007, researchers began downloading images directly from Google, Flickr, and Yahoo without concern for consent. “They found that researchers, driven by the exploding data requirements of deep learning, gradually abandoned asking for people’s consent. This has led more and more of people’s personal photos to be incorporated into systems of surveillance without their knowledge…People were extremely cautious about collecting, documenting, and verifying face data in the early days, says Raji. ‘Now we don’t care anymore. All of that has been abandoned,’ she says. ‘You just can’t keep track of a million faces. After a certain point, you can’t even pretend that you have control’” (Francisco). For a person completing potentially revolutionary work to say ‘they don’t care’ clearly suggests that they are not held accountable for their actions and that there are no rules in place to do so. If the people creating the databases say they ‘can’t even pretend they have control,’ should we be rethinking the processes we are defining?

Once a model is trained on the data, biases usually show up: “There are two main ways that bias shows up in training data: either the data you collect is unrepresentative of reality, or it reflects existing prejudices” (AI Bias). Truthfully, as long as humans are developing AI/ML technologies, I do not think there will be tech that is fully free from human bias. Maybe with developer teams that are diverse enough we can thwart the issue, but to say a human-created technology will be free from humanity’s imperfections seems like a lofty goal. Similar to the lack of accountability shown when collecting data, it seems that no one is responsible for the outputs of AI/ML applications: “really strange phenomena start appearing, like auto-generated labels that include offensive terminology” (AI Ethics-Washing). How does anything that humans have created with a specific goal in mind contain “strange phenomena”? Computers do not develop their own brains in the process of creating these applications and decide to be offensive; rather, it is humans creating the applications that lead to these offensive labels.

While we talk about how the imperfections that come with being human are present in AI/ML technology, our designs are also based on the culture we are a part of, which not only differs from region to region but also changes over time. For example, take the study in which people were asked about “moral decisions that should be followed by self-driving cars. They asked millions of people from around the world to weigh in on variations of the classic “trolley problem” by choosing who a car should try to prioritize in an accident. The results show huge variation across different cultures.” All humans have their own experiences: different upbringings, traumas, family histories, health conditions. To say we can all universally agree on the difficult decisions AI/ML applications should make is impossible. Even if we could, it is probable that five or twenty years down the road, that decision would no longer be agreed upon.

In my opinion, stricter rules and regulations need to be placed on employees of large tech companies. Employees of pharmaceutical companies work with many patients’ personal data and obtain consent from patients to enroll in clinical trials and share data with the companies. These employees must adhere to extremely strict guidelines set forth by the FDA and HIPAA, and employees of tech companies should have to do the same. “A recent study out of North Carolina State University also found that asking software engineers to read a code of ethics does nothing to change their behavior.” Much like the Terms & Conditions, no one reads it and almost no one cares about the contents. High-stakes ethics rules and regulations need to be enforced. If we do not enforce them at the employee/developer level, how can we expect users to use these applications ethically?

 

Francisco, Olivia Solon. “Facial Recognition’s ‘Dirty Little Secret’: Social Media Photos Used without Consent.” NBC News. Accessed March 19, 2021. https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921.
MIT Technology Review. “In 2020, Let’s Stop AI Ethics-Washing and Actually Do Something.” Accessed March 20, 2021. https://www.technologyreview.com/2019/12/27/57/ai-ethics-washing-time-to-act/.
MIT Technology Review. “This Is How AI Bias Really Happens—and Why It’s So Hard to Fix.” Accessed March 20, 2021. https://www.technologyreview.com/2019/02/04/137602/this-is-how-ai-bias-really-happensand-why-its-so-hard-to-fix/.

Week 8 Reflection

Stephanie

Designing a virtual assistant is no simple task, but to do so, we would need to think about how virtual assistants work. They work via text (online chat, especially in an instant messaging app or another app, SMS Text, e-mail), voice (Amazon Alexa, Siri, Google Assistant), and through taking and/or uploading images (Samsung Bixby on the Samsung Galaxy S8). As a broad overview, virtual assistants use NLP to match user text or voice input to executable commands. These assistants continue to learn over time by using artificial intelligence techniques including machine learning. 

Before Apple integrated its hands-free virtual assistant, it allowed users to invoke Siri by first pressing the home button and then saying “Hey Siri.” This is an important step in the process of developing hands-free virtual assistants because it tells us how Apple trained its technologies. The users’ “Hey Siri” utterances were used for the initial training set for the US English detector model. Apple also included general speech examples, as used for training the main speech recognizer. To check the initial automatic transcripts for accuracy, Apple hired a team of people to verify that the data that would be the foundation of the program was correct.

Apple products, like many virtual assistant products, are built with a microphone. This is responsible for capturing audio, which turns our voices into a stream of instantaneous waveform samples at a rate of 16000/second. After accumulating these waveforms, they are converted into a sequence of frames that each describes the sound spectrum of approximately 0.01 sec. These are fed into a Deep Neural Network acoustic model, which converts the acoustic patterns into a probability distribution over a set of speech sound classes. For example, those used in the phrase “Hey Siri” (accounting for silence) total to about 20 sound classes. 
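A rough sketch of that framing step might look like the following (the sample rate and frame length follow the description above, but the code is illustrative rather than Apple's implementation, and the DNN acoustic model itself is left out):

import numpy as np

SAMPLE_RATE = 16000              # samples per second, as described above
FRAME_SIZE = SAMPLE_RATE // 100  # ~0.01 seconds of audio per frame (160 samples)

def frames_from_waveform(waveform):
    """Split a 1-D array of waveform samples into ~0.01 s frames."""
    n_frames = len(waveform) // FRAME_SIZE
    return np.reshape(waveform[:n_frames * FRAME_SIZE], (n_frames, FRAME_SIZE))

# A fake one-second waveform standing in for microphone input.
waveform = np.random.uniform(-1.0, 1.0, SAMPLE_RATE)
frames = frames_from_waveform(waveform)
print(frames.shape)  # (100, 160): 100 frames, each covering ~0.01 s of audio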

In order to keep the technology hands free and therefore activate upon command, a small speech recognizer runs all the time and listens for just its ‘wake word’.  In iPhones, this is known as the Always On Processor (AOP). While Apple uses “Hey Siri,” other well-known wake words include “OK Google” or “Hey Google”, “Alexa”, and “Hey Microsoft.” When the speech recognizer detects the wake word(s), the device parses the speech that follows as a command or query.

Once the acoustic patterns of our voice at each instant are converted into a probability distribution over speech sounds, a temporal integration process computes a confidence score that the phrase you uttered was in fact the wake word. If the score is high enough, the virtual assistant wakes up. It is also important to note that the threshold to decide whether to activate Siri is not a fixed value.
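As a toy illustration of that last step, temporal integration could be as simple as averaging recent per-frame wake-word probabilities and comparing the result to a threshold (the window size, threshold, and probabilities below are made up; Apple's actual integration and adaptive threshold are more sophisticated):

import numpy as np

def wake_confidence(frame_probs, window=50):
    """Toy temporal integration: average the per-frame probability that the
    wake phrase is present over a sliding window of recent frames."""
    recent = frame_probs[-window:]
    return float(np.mean(recent))

THRESHOLD = 0.8  # illustrative only; the real threshold is not a fixed value

# Fake per-frame probabilities standing in for acoustic-model output.
frame_probs = np.concatenate([np.full(60, 0.05), np.full(50, 0.95)])

if wake_confidence(frame_probs) > THRESHOLD:
    print("Wake word detected: parse the speech that follows as a command")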

The Deep Neural Network acoustic model, once trained not only on the wake word but also on some sort of corpus of speech, allows virtual assistants to provide a sound class label for each frame and ultimately estimate the probabilities of the states given the local acoustic observations. “The output of the acoustic model provides a distribution of scores over phonetic classes for every frame. A phonetic class is typically something like ‘the first part of an /s/ preceded by a high front vowel and followed by a front vowel.’”

Once the spoken question or task has been captured as speech waves and processed through the DNN, Apple turns to Wolfram Alpha’s Knowledge Base, which it licenses. This knowledge base is able to respond to fact-based questions, with an example from Wikipedia as such: “How old was Queen Elizabeth II in 1974?” Wolfram Alpha displays its “input interpretation” of such a question, using standardized phrases such as “age | of Queen Elizabeth II (royalty) | in 1974”, the answer to which is “Age at start of 1974: 47 years”, plus a biography link.

In terms of a virtual assistant’s voice, after the databases have been trained, many companies hire local voice talent and have them read books, newspapers, web articles, and more. These recordings are transcribed to match words to sounds in order to identify phonemes, the individual sounds that make up all speech. “They try to capture these phonemes spoken in every imaginable way: trailing off at the end of the word, harder at the beginning, longer before a pause, rising in a question. Each utterance has a slightly different sound wave…every sentence Siri speaks contains dozens or hundreds of these phonemes, assembled like magazine cut-outs in a ransom note. It’s likely that none of the words you hear Siri say were actually recorded the way they’re spoken” (Wired). As companies continue to hunt for the right voice talent, they run the speech of those who audition through the models they’ve built, looking for phoneme variability: “essentially, the sound-wave difference between the left and right side of each tiny utterance. More variability within a phoneme makes it hard to stitch a lot of them together in a natural-sounding way, but you’d never hear the problems listening to them speak. Only the computer sees the difference” (Wired). Once a person is found who sounds right to both human and computer, they record for weeks at a time, and that becomes the voice of the virtual assistant.

References

Apple Machine Learning Research. “Hey Siri: An On-Device DNN-Powered Voice Trigger for Apple’s Personal Assistant.” Accessed March 15, 2021. https://machinelearning.apple.com/research/hey-siri.

“How Apple Finally Made Siri Sound More Human.” Wired. Accessed March 14, 2021. https://www.wired.com/story/how-apple-finally-made-siri-sound-more-human/.

“Virtual Assistant.” In Wikipedia, March 10, 2021. https://en.wikipedia.org/w/index.php?title=Virtual_assistant&oldid=1011400052.

“WolframAlpha.” In Wikipedia, March 13, 2021. https://en.wikipedia.org/w/index.php?title=WolframAlpha&oldid=1011910358.

Questions: 

I’m having a hard time understanding the mathematical side of the DNN. Can you please explain?
“The DNN consists mostly of matrix multiplications and logistic nonlinearities. Each “hidden” layer is an intermediate representation discovered by the DNN during its training to convert the filter bank inputs to sound classes. The final nonlinearity is essentially a Softmax function (a.k.a. a general logistic or normalized exponential), but since we want log probabilities the actual math is somewhat simpler.”
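Numerically, the quoted passage seems to boil down to something like the following sketch: each hidden layer is a matrix multiplication followed by a logistic (sigmoid) nonlinearity, and the output is a (log-)softmax over sound classes. The layer sizes and random weights here are made up purely for illustration:

import numpy as np

def sigmoid(x):
    # The logistic nonlinearity applied after each hidden-layer matrix multiply.
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(x):
    # Log probabilities over classes, computed in a numerically stable way.
    x = x - np.max(x)
    return x - np.log(np.sum(np.exp(x)))

rng = np.random.default_rng(0)

# Made-up sizes: 40 filter-bank inputs, one hidden layer of 128 units,
# 20 output sound classes (roughly the number mentioned for "Hey Siri").
W1, b1 = rng.normal(size=(128, 40)), np.zeros(128)
W2, b2 = rng.normal(size=(20, 128)), np.zeros(20)

frame = rng.normal(size=40)                # one frame of filter-bank features
hidden = sigmoid(W1 @ frame + b1)          # matrix multiply + logistic nonlinearity
log_probs = log_softmax(W2 @ hidden + b2)  # log probabilities over sound classes
print(log_probs.shape, np.exp(log_probs).sum())  # (20,) and ~1.0: a proper distribution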

Week 7 Reflections

One thing I have been interested in for a while is how devices like Amazon Alexa, Google Home, or Siri take in and process our words into text and then provide us with answers. The Computer Science Crash Course video explains that the acoustic signals of words are captured by a computer’s microphone. This signal is the magnitude of displacement of a diaphragm inside the microphone as sound waves cause it to oscillate. This gives us graphable data, with the horizontal axis representing time and the vertical axis representing the magnitude of displacement (amplitude). The sound pieces that make up words are called phonemes. Speech recognition software knows what all these phonemes look like because, in English, there are roughly 44 of them, so the software essentially tries to pattern match. To separate words from one another, figure out when sentences begin and end, and convert speech into text, techniques include labeling words with parts of speech and constructing a parse tree (which not only tags every word with a likely part of speech, but also reveals how the sentence is constructed).

“You shall know a word by the company it keeps.” But, to make computers understand distributional semantics, we have to express the concept in math. One simple technique is to use Count Vectors.  A count vector is the number of times a word appears in the same article or sentence as other common words. But an issue presented with count vectors is that we have to store a LOT of data, like a massive list of every word we’ve ever seen in the same sentence, and that’s unmanageable. To try to solve this problem, we use an encoder-decoder model: the encoder tells us what we should think and remember about what we just read and the decoder uses that thought to decide what we want to say or do. In order to define the encoder, we need to create a model that can read in any input we give it, i.e. a sentence. To do this, a type of neural network called a Recurrent Neural Network (RNN) was devised. RNNs have a loop in them that lets them reuse a single hidden layer, which gets updated as the model reads one word at a time. Slowly, the model builds up an understanding of the whole sentence, including which words came first or last, which words are modifying other words and other grammatical properties that are linked to meaning. 
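As a tiny illustration of count vectors, co-occurrence counts can be built with nothing more than dictionaries (the three "sentences" here are made up):

from collections import Counter, defaultdict

sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a dog",
]

# For each word, count how often every other word appears in the same sentence.
co_counts = defaultdict(Counter)
for sentence in sentences:
    words = set(sentence.split())
    for w in words:
        for other in words:
            if other != w:
                co_counts[w][other] += 1

print(co_counts["cat"])  # the count vector for "cat" over its sentence companions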

Stepping away from the more technical side of NLP and the devices we currently use, I wanted to note that I love the idea of a positive feedback loop. Because people say words in slightly different ways due to things like accents and mispronunciations, transcription accuracy is greatly improved when combined with a language model, which can take statistics about sequences of words. The more we use these devices that try to recognize speech and hear new accents, mispronunciations, etc, the better we can train our devices to understand what we are saying. Scary? Maybe. But also cool.

I’m extremely excited to be reading about natural language processing this week, as I loved the intro course I took in NLP last semester. One of the later assignments we had that reminded me of the Crash Course videos and some of the reading was called “Read training data for the Viterbi tagger.” For context, the Viterbi algorithm is essential for POS tagging but also great for signal processing (cell phone signal decoding), DNA sequencing, and WiFi error correction. Here were the instructions for the assignment:

  • Read the training data
  • Split the training file into a list of lines. 
  • For each line that contains a tab (“\t”), split it by tab to collect the word and part of speech tag.
  • Use a dictionary to track frequencies for:
    • Each word as each tag
    • Each transition from the last tag to the next tag
    • Sentence starting probabilities for each tag
  • Divide by the total number of words to make probabilities and put them into the same nested dictionary structure used by the Viterbi tagger.
  • Now test the tagger:
    • read the test file
    • Tag each sequence of words using the viterbi code
    • Report in a comment: For how many tokens did the tagger find the right solution?
  • Add an evaluation by sentences: for how many sentences is the tagger 100% correct? (include code to calculate this and report the accuracy in a comment)


Here is what my code looked like:
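In outline, the training step went something like the sketch below (the file name and the exact nested-dictionary layout here are illustrative stand-ins rather than the assignment's precise format):

from collections import defaultdict

TRAIN_FILE = "train.txt"  # stand-in: one "word<TAB>tag" pair per line, blank line between sentences

emission_counts = defaultdict(lambda: defaultdict(float))    # tag -> word -> count
transition_counts = defaultdict(lambda: defaultdict(float))  # previous tag -> tag -> count
start_counts = defaultdict(float)                            # tag -> sentence-start count
total_words = 0.0

prev_tag = None
with open(TRAIN_FILE, encoding="utf-8") as f:
    for line in f.read().split("\n"):
        if "\t" in line:
            parts = line.split("\t")
            word, tag = parts[0], parts[1]
            emission_counts[tag][word] += 1
            if prev_tag is None:
                start_counts[tag] += 1        # first word of a sentence
            else:
                transition_counts[prev_tag][tag] += 1
            prev_tag = tag
            total_words += 1
        else:
            prev_tag = None                   # blank line marks a sentence boundary

# Divide by the total number of words to turn counts into probabilities,
# keeping the same nested-dictionary structure for the Viterbi tagger.
emissions = {t: {w: c / total_words for w, c in words.items()}
             for t, words in emission_counts.items()}
transitions = {p: {t: c / total_words for t, c in tags.items()}
               for p, tags in transition_counts.items()}
starts = {t: c / total_words for t, c in start_counts.items()}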

 

Questions:

Google’s version of this is called Knowledge Graph. At the end of 2016, it contained roughly 70 billion facts about, and relations between, different entities… Can you speak more about knowledge graphs, what is necessary to create one, and how they are stored? How does Google use this?

Citations

CrashCourse. Natural Language Processing: Crash Course AI #7, 2019. https://www.youtube.com/watch?v=oi0JXuL19TA.
———. Natural Language Processing: Crash Course Computer Science #36, 2017. https://www.youtube.com/watch?v=fOvTtapxa9c.
Google Docs. “Poibeau-Machine Translation-MIT-2017.Pdf.” Accessed March 8, 2021. https://drive.google.com/file/d/1vOZvxGA-1Uf2HL1MAYqx8Silk9g1r6e9/view?usp=drive_open&usp=embed_facebook.

Weekly Takeaways

Karpathy’s article provides readers with an interesting example to deblackbox Convolutional Neural Networks and, in this case, how they can be used to do feature detection, pattern recognition, and probabilistic inference/prediction for classifying selfie photos. To put it simply, Karpathy introduces readers to ConvNets by providing an example with animals. Basically, the process is a large collection of filters that are applied on top of each other. After we have trained on a dataset, when we attempt to test it, we send in a raw image, which is represented as a 3-dimensional grid of numbers, and a convolution operation is repeated over and over a few tens of times (depending on how many layers we’ve decided to run it through). Small filters slide over the image spatially, and this operation is repeated again and again, with yet another set of filters being applied to the previous filtered responses. The goal of this process is to gradually detect more and more complex visual patterns until the last set of filters is computing the probability of entire visual classes in the image.

To get into greater detail of what Karpathy explains through selfies, I will refer to the Crash Course videos and start with the idea of preprocessing. When raw images are fed to be tested, the computer reads the image by looking at pixels. Each pixel is a grayscale value between 0 and 255.  To normalize each pixel value and make them easier for the neural network to process, we’ll divide each value by 255.  That will give us a number between 0 and 1 for each pixel in each image and makes the data easier to process. Carrie Anne of Crash Course does a great job of going into detail about this pixel processing. In effect, an artificial neuron, which is the building block of a neural network, takes a series of inputs and multiplies each by a specified weight, and then sums those values altogether. These input weights are equivalent to kernel values; neural networks can learn their own useful kernels that are able to recognize interesting features in images. These kernels contain the values for a pixel-wide multiplication, the sum of which is saved into the center pixel of an image. We then perform a convolution, which is the operation of applying a kernel to a patch of pixels in order to create a new pixel value. Convolutional neural networks use banks of those neurons to process image data and after being digested by different learned kernels, output a new image.
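Putting those two steps together, a minimal sketch of normalizing a patch of pixels and applying one kernel (a single convolution step) might look like this, with made-up pixel and kernel values:

import numpy as np

# A made-up 3x3 grayscale patch with pixel values between 0 and 255.
patch = np.array([[  0, 128, 255],
                  [ 64, 192,  32],
                  [255,  16, 100]], dtype=float)

# Normalize each pixel to a value between 0 and 1.
patch = patch / 255.0

# A 3x3 kernel: the "input weights" of an artificial neuron.
# This particular kernel is a simple vertical-edge detector.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

# One convolution step: pixel-wise multiply and sum, saved as the new
# value for the centre pixel of this patch.
new_center_value = float(np.sum(patch * kernel))
print(new_center_value)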

The issues I found with Karpathy’s article I will present in the form of questions:

  • Karpathy writes that we don’t know what the filters should be looking for, instead, we initialize them all randomly and then train them over time….could that be a waste of resources/time/space? How do we “not know” what the filters should be looking for when it is clear to the human brain that the ‘things’ we want to differentiate are distinct in some way/shape/form?
  • Karpathy also says our algorithm needs to be trained on a very diverse but similar set of data, but in the example he used, how many variations of frogs/dogs, or more broadly, humans are there? I think there is a huge potential exclusivity, and I find it interesting that the best-ranked selfies are almost entirely skinny white females.
  • Also, this may be an unimportant point, but I dislike the way Karpathy trained the data and believe faulty reasoning could lead to reliance on technology that is improperly trained. In his example, it should be about more than how many followers you have or likes you received. What if this is the user’s 10th post of the day or 13th selfie in a row and people are not inclined to like the photo? What if Instagram was down or the algorithm hid the photo from viewers and it did not get many likes? I don’t think describing factors like this as “an anomaly” and claiming “it will be right more times than not” is a fair enough argument when the stakes are higher. 
  • While this was not raised in Karpathy’s article but rather the Artificial Intelligence Crash Course video, the speaker said it is extremely expensive to label a dataset and that is why for this lab we used prelabeled data. Could financial concerns like this lead to larger-scale ethical issues? Is there any way around this?

 

CrashCourse. Computer Vision: Crash Course Computer Science #35, 2017. https://www.youtube.com/watch?v=-4E2-0sxVUM.
———. How to Make an AI Read Your Handwriting (LAB) : Crash Course Ai #5, 2019. https://www.youtube.com/watch?list=PL8dPuuaLjXtO65LeD2p4_Sb5XQ51par_b&t=67&v=6nGCGYWMObE&feature=youtu.be.
“What a Deep Neural Network Thinks about Your #selfie.” Accessed March 1, 2021. https://karpathy.github.io/2015/10/25/selfie/.

The Basics of Human Communication – Digitally

Best put, the term “data” is inseparable from the concept of representation. In the contexts of computing and information, data is always a humanly imposed structure that must be represented through an interpretable unit of some kind.

All text characters work with reference to an international standard for representing text characters in standard bytecode definitions. In effect, Unicode works through bytecode characters, which are designed to be interpreted as a data type for creating instances of a character, followed by interpretation in the software stack design, which projects character shapes as pixel patterns on the specific screen of a device as its output. Unlike some of the other types of communication we have spoken about, like images or gifs or mp3 files, Unicode provides a set of software-interpretable numbers for representing the form of the whole representable character, whereas a binary media file (like those previously mentioned) has no predefined form or size (for memory). I find a personal example of using Unicode funny: in a web design computer science course I took, I was taught to include a UTF-8 declaration to help with the setup of a website. Not until reading the Wikipedia page did I realize that line named one of the most commonly used encodings.
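A quick sketch of what that means in practice: to Unicode, a character is a number (a code point), and UTF-8 is just one standard way of writing that number out as bytes:

char = "é"

# Unicode assigns the character a number (its code point)...
print(ord(char))             # 233, i.e. U+00E9

# ...and UTF-8 is one standard way of encoding that number as bytes.
print(char.encode("utf-8"))  # b'\xc3\xa9' (two bytes)

# Decoding the bytes with the same encoding recovers the character.
print(b"\xc3\xa9".decode("utf-8"))  # é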

The second way in which we define a concept of data is through database management systems. This relies on a client/server relationship: the client side comprises the software interfaces for creating, managing, and querying the database on a user’s or manager’s local PC, while the server side is the database management system installed and running on a computer in a data center or a Cloud array of servers. An example of a relational database model is SQL; SQL uses “Structured Query Language to create a database instance, input data, manage updates, and output data-query results to a client interface,” with which the client can “‘query’ (ask questions or search) data in the system.” As an aside to this definition of a DBMS as a concept of data, I think something that has helped me deblackbox this idea is the database course I am currently taking. We have not even started to learn SQL; instead, we are given a problem and hand-draw the given data and its relations to the other data. This signifies to me just how much of a human-centered process database design is. It is not just magical Oracle taking care of everything; “a well-designed database is a partial map of human logic and pattern recognition for a defined domain.” I think the understanding I have gained can be summed up by Kelleher’s Data Science: “One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. Human analysts are needed to frame the problem, to design and prepare the data, to select which ML algorithms are most appropriate, to critically interpret the results of the analysis, and to plan the appropriate action to take based on the insight(s) the analysis has revealed.”
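As a small illustration of the create/insert/query cycle, Python's built-in sqlite3 module can stand in for a full client/server DBMS (SQLite runs in-process rather than in a data center, and the table here is made up, but the SQL idea is the same):

import sqlite3

# SQLite runs locally rather than on a remote server, but the SQL
# create/insert/query cycle mirrors a client querying a DBMS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a database instance (a table) and input data.
cur.execute("CREATE TABLE students (name TEXT, course TEXT)")
cur.execute("INSERT INTO students VALUES (?, ?)", ("Ada", "Databases"))
conn.commit()

# 'Query' (ask questions of) the data and output results to the client.
for row in cur.execute("SELECT name FROM students WHERE course = ?", ("Databases",)):
    print(row)   # ('Ada',)

conn.close()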

Interestingly enough, digital images seem to be defined by data in a way that combines how we use DBMSs and Unicode (for text). “Digital cameras store photographs in digital memory, much like you save a Word document or a database to your computer’s hard drive. Storage begins in your camera and continues to your personal computer.” To get into the nitty-gritty, an image is stored as a huge array of numbers, and in digital photography the three colors (red, blue, and green) can each have any of 256 shades, with black as 0 and the purest rendition of that color as 255. The colors on your computer monitor or the little LCD screen on the back of your camera are grouped in threes to form tiny full-color pixels, millions of them. When you combine the values of the adjoining colors in a pixel, you suddenly have about 17 million possible colors at your disposal for each pixel. Essentially, we express colors as numbers, and these pixelated colors form an image. This is similar to Unicode’s expression of each character we attempt to encode as a number, not a glyph.
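A tiny sketch of an image as "a huge array of numbers": a 2×2 image where each pixel is a red/green/blue triple of values between 0 and 255 (the pixel values are arbitrary):

import numpy as np

# A 2x2 image: each pixel is a (red, green, blue) triple in the range 0-255.
image = np.array([
    [[255,   0,   0], [  0, 255,   0]],   # a red pixel, a green pixel
    [[  0,   0, 255], [255, 255, 255]],   # a blue pixel, a white pixel
], dtype=np.uint8)

print(image.shape)  # (2, 2, 3): height, width, three color channels
print(256 ** 3)     # 16777216 possible colors per pixel (~17 million)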

Kelleher, J. D., and B. Tierney. Data Science. MIT Press, 2018. 

 

“Unicode.” In Wikipedia, February 21, 2021. https://en.wikipedia.org/w/index.php?title=Unicode&oldid=1008164095.

White, R., and T. E. Downs. How Digital Photography Works. Que, 2005. 
 
 
 
Question:
 
If we say all data must be able to be represented in order to be considered data, why is there a separate definition for “data” as representable logical structures and as conceptual structures?

E-information vs Text Messages

My best friend just sent me a text message about her day. The perceptible text in this message, i.e. the information, is encoded as bytes, transmitted in data packets, and then interpreted in software for rendering on my phone screen. When E-information is transmitted and received, it is being used as a symbolic medium to encode and decode information to be interpreted by my best friend and me. We are simply observing and contextualizing representations in and representations out. The meanings of messages like the one my friend sent, and the assumed background knowledge behind them, cannot be represented in E-information. Rather, E-information is a sublayer in the technical design process that is all about encoding and representing interpretable representations. Behind the scenes of this text message are interconnected modules, from wireless receivers to pixel-mapped screens. As I read this text message from my friend, I am able to gather meaning from it because I am a cognitive agent that understands material sign structures in living contexts of interpretation and understanding. Electrical signal patterns are designed to be “communicable” “internally” (unobservably) throughout the components of a physical system (transducers, processors, memory units, digital network connections, interfaces), and “communicable” “externally” to the human users of meaningful signs and symbols through the system’s channels for “outputting” perceptible representations (usually patterns of pixels, and sound through audio interfaces).

I quite like this example: 

Consider the parallels with the way we perceive and “decode” symbolic units using “natural” energy (light and sound) as medium. What we observe are effects and inferences from detected information: the light waves hitting our retinas from these text characters with features that we map to patterns, the acoustic waves of speech sounds and musical notes, the patterns of light registered from all the ways we use meaningful visual “information.” Using E-information-designed electronic devices like radios, TVs, and computer screens, we observe the effects of information shaped and “communicable” for internal processing in the systems. Meaning-making, the act of creating, expressing, and understanding meaning, is likewise unobservable (we can’t probe our minds or find the neural structures in our brains at the precise milliseconds we engage in conversations, interpret a page in a novel, recognize a song, or navigate directions along city streets with a GPS map), but we make reliable inferences from all the symbolic representations that we use every day.

As stated above, E-information is simply a layer in the process of sending and receiving text messages. But once those messages are received, it is up to the human brain to connect these symbols, whether they are pixels that make up images, letters, emojis, or however else we humans define and practice communication. For humans to be able to interpret these representations, we require a method of creating recognizable patterns with perceptible distinctions within the symbolic system being used (in this case, text). If the states in a substrate are randomly fluctuating, humans cannot recognize the patterns needed for interpreting and contextualizing.

AI is…simple?

In a world where everything seems chaotic and many things seem to happen randomly, it is quite comforting to hear that “machine learning, and prediction, is possible because the world has regularities. Things in the world change smoothly.” Of course, in this case Ethem Alpaydin is speaking about the ways in which we can train our AI to complete a task or make a prediction, but nevertheless these systems are trained on data from the world we live in. In fact, the smoothness assumptions our sensory organs and brain rely on are so important because they are the kind of assumptions our learning algorithms need: a learning algorithm makes a set of assumptions about the data in order to find a unique model. And while we probably all hold the view that many of the technologies we see today are extremely complex (which they are), the reason we can train on data to make predictions is that we are, in effect, trying to find a simple explanation for that data. Beyond technology itself, preferring simple explanations is human nature; the philosopher William of Ockham argued that we should eliminate unnecessary complexity. In fact, barcodes and single fonts are the ideal images because there is no need for learning: we can simply template match them.

One could argue simplicity is why the binary system works so well for our electronics. This system is discrete, i.e., able to be distinct and differentiated. “We need designer electronics to impose a shape, a pattern, a structure, on a type of natural energy that is nothing like human logic or meaningful symbolic patterns,” Professor Irvine states. And the simplest electrical pattern we can design and control is switch states (like on/off, open/closed, etc.). Given this, the binary system, which has only two positions and two values, is an efficient way to turn digital binary computers into symbol processors. Binary and base-2 math give us a mapping system with a one-to-one correspondence and, overall, present a solution to a symbolic representation and symbolic processing problem. Through this process, we can make electricity hold a pattern in order to represent something non-electronic (i.e. something more human). The binary system provides us with a unified subsystem on which we can build many layers and thus create data structures in defined patterns of bytes.
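A small sketch of that one-to-one mapping: a human symbol (a character) corresponds to a number, and that number can be written out as a pattern of eight on/off switch states (one byte):

char = "A"

code_point = ord(char)            # the character as a number: 65
bits = format(code_point, "08b")  # that number as a pattern of 8 on/off states
print(code_point, bits)           # 65 01000001

# The same pattern of switch states maps back to the same symbol.
print(chr(int(bits, 2)))          # A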

When applying this to deblackboxing, in which we open up a computer or program whose inputs and operations are otherwise not visible to the user or another interested party, we can see that simple systems are at the heart of our technologies. The principles of computing (communication, computation, coordination, recollection, evaluation, design) are useful here, as “some people see computing as computation, others as data, networked coordination, or automated systems. The framework can broaden people’s perspectives about what computing really is.”

Again, when we look specifically at the principle of computation, we can see that at their heart our systems are composed of layers of binary maps: yes/no’s, 0/1’s, on/offs. It’s beautiful, but there is no ‘magic’ underneath the hood of our systems. We store and train on data, use math, and develop algorithms to create the technologies we have today.

 

Duality in Development of Technology

It is likely that the most important reason we develop technology (AI, ML, etc.) is for the benefit of humans, physically, emotionally, and mentally. The social reception of technologies, therefore, is the basis for determining useful and useless advancements in tech. It can be argued that we as a society have held the view (explicitly and implicitly) that technology is an independent thing that can cause social, political, or economic effects on its own. The utopian/dystopian framing creates hype, hope that developments in technology will bring about a better world, and hysteria that regards technology as independent, uncontrollable, and influential. On the other hand, I would argue that people who are somewhat knowledgeable about developments in technology can understand the human power, both negative and positive, behind the technology. We can see in films like The Facebook Dilemma the power a human-created algorithm has on politics around the world, or in facial recognition scanners programmed in ways that work better on specific races. While it is easy to live in the bliss of having tasks become easier and more automated, overall the picture is not as dreamlike or uninterpretable as it once was. It may be more difficult to place blame on specific people, but nonetheless we can see there is a clear human-powered bias in many of our technologies.

In addition to this idea of hype and hysteria, while I do understand the almost ridiculous technology and automation we see in movies (especially for the time in which those movies were produced), I believe we are unaware of revolutionary technologies currently in the works. While talk of self-driving cars has been increasingly popular, we fail to recognize the long history of automated vehicles. When I attended a tour of the Google office in Chelsea, New York, a Googler said technologies like the Google Home had been in the works for over ten years, and that there are plenty more technologies in development right now that no one knows about but that will become all the rage in ten years’ time.

The two frameworks for producing human-level intelligent behavior in a computer seem to be locked in a popularity contest. First there is the mind model, or symbolic AI, which utilizes a series of binary yes/no, true/false, 0/1 operations to arrive at a conclusion or action. It uses symbols to represent what the system is reasoning about. Symbolic AI was the most widely adopted approach to building AI systems from the mid-1950s until the late 1980s, and it is beneficial because we can use it to explicitly understand the goal of our technology and the AI’s decisions. The alternative to the mind model is the brain model, which aims at simulating the human nervous system. Obviously, as the brain is extremely complex, it is not yet possible to replicate human-level intelligent behavior, but developers have created technology inspired by the human brain. For example, neural networks are based on a collection of connected units or nodes modeled after the neurons in a biological brain.

What I am interested in learning more about is the list of tasks that Michael Wooldridge describes in A Brief History of Artificial Intelligence. At the ‘nowhere near solved’ level, he lists interpreting what is going on in a photo as well as writing interesting stories. Notably, a scandal broke out in which women searching the word ‘bra’ in the Photos app were shown photos of themselves in a bra or bathing suit. And we continue to see information read from photos like this: I can type ‘dog’ and get many dog photos from my camera roll, and so on. And while I have not been able to do so myself, in an Intro NLP course last semester we trained a system on a large dataset and could generate extremely simple sentences using bigrams or trigrams (a minimal sketch of the idea follows below). While technology cannot create an interesting story out of nothing, it also cannot do anything without data, and storytelling is no different. Data like this was also used to predict the order of a sentence and the part of speech of each word, making Wooldridge’s examples of “Who [feared/advocated] violence?” or “What is too [small/large]?” questions that a more experienced developer would be able to program for. So I suppose my question is: are there truly limits to what we can automate and create? It seems that as time progresses we continue to do things we once thought were impossible.
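For what the bigram idea looks like, here is a minimal sketch (the toy corpus is made up and far smaller than anything we actually trained on):

from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: which word tends to follow which.
following = defaultdict(Counter)
for first, second in zip(corpus, corpus[1:]):
    following[first][second] += 1

# Generate a very simple "sentence" by always taking the most likely next word.
word, sentence = "the", ["the"]
for _ in range(6):
    word = following[word].most_common(1)[0][0]
    sentence.append(word)
print(" ".join(sentence))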

“A Brief History of Autonomous Vehicle Technology.” Wired, March 31, 2016. https://www.wired.com/brandlab/2016/03/a-brief-history-of-autonomous-vehicle-technology/.

MIT Technology Review. “A US Government Study Confirms Most Face Recognition Systems Are Racist.” Accessed February 1, 2021. https://www.technologyreview.com/2019/12/20/79/ai-face-recognition-racist-us-government-nist-study/.

The Guardian. “Apple Can See All Your Pictures of Bras (but It’s Not as Bad as It Sounds),” October 31, 2017. http://www.theguardian.com/technology/shortcuts/2017/oct/31/apple-can-see-bra-photos-app-recognises-brassiere.

FRONTLINE PBS, Official. The Facebook Dilemma, Part One (Full Film) | FRONTLINE, 2018. https://www.youtube.com/watch?v=T48KFiHwexM.