
Towards a Revised Theoretical Framework for Interface Design – Ojas Patel

Abstract

There have been several attempts to theorize computation and cognitive technologies as we approach an age of ubiquitous computing. While a large body of literature focuses on maximizing the efficiency of interface design for such an era, theoretical discourse is lacking. Because the interface is the mechanism by which we engage with computers, theoretical frameworks of interface need revision in an age of ubiquitous computing. Semiotics offers much for theorizing generative meaning making as well as recursive symbol processing. Applying semiotics to a discussion of interface leads away from the desktop metaphor and the prominence of the GUI, and toward more integrated, multi-faculty interfaces that map to distinct layers of information in space. It also supports a process of unlimited semiosis in interface design and diversity in interface metaphors.

Introduction

Semiotics has offered much in the way of how we theorize our cognitive technologies. From symbolic logic to cultural institutions, semiotics has provided conceptual frameworks for analysis as well as for conducting research: communication in computer-mediated environments; the mapping of signal processing in machines and symbol processing in humans; and the role of Semiotic Engineering in issues surrounding HCI (de Souza 415-418). In what follows, I will use semiotic approaches to interface design to contribute to a discussion of interface amid a paradigm shift in computation, from a single user with a single computer system to models of computation that take growing ubiquity into account.

Cognitive Technologies as Artefactualization of Conceptual Sign Systems

The triadic model in Peircean semiotics breaks down sign processes as follows: “A sign, or representamen [the material-perceptible component], is something which stands to somebody for something in some respect or capacity. It addresses somebody, that is, creates in the mind of that person an equivalent sign, or perhaps a more developed sign. That sign which it creates I call the Interpretant of the first sign” (Peirce 16). We can think of the three components of sign processes (Representamen, Object, and Interpretant) as sense data, concept, and percept. Each component of a sign process is a sign system in itself, in much the same way Ray Jackendoff treats the different components of spoken language, to be discussed later. As much semiotic literature has found, the fundamental components of human cognition break down to this system of sign processes.

Of interest to a discussion of interface in digital technologies is the instantiation of sign systems through sign processes. To interpret a sign process is to reinforce the sign systems that form the foundation of that sign process. A single painting reinforces the medium (or sign system) of painting, which reinforces visual art, which reinforces artistic expression, which reinforces and preserves human culture. All our cognitive technologies and symbolic expressions exist in an ongoing continuum: a historically constructed network of artifacts, ideas, languages, cultures, media, technologies – anything constructed by humans (Irvine 43). This idea is critical in a discussion of interface in cognitive technologies, because the interface mediates between the sign systems of our cognitive faculties and the sign systems of system architecture, to be elaborated below.

Most importantly, cognitive technologies are artefactual instantiations of sign processes. Unlike spoken language, computer-mediated sign processes are tokenized instantiations of meaning making. This ties in with the concept of “vital materiality”: the idea that our artifacts are not merely products of our symbolic meaning systems but rather mediate our relationship with the world and are indicative of the human species’ shared experiences (Barrett 10). In other words, artifacts are codified with meaningful, interpretable human data, and they are furthermore a necessary component of meaning making in the continuum of symbolic cognition. Cognitive technologies afford the artefactualization of conceptual sign systems, and thereby an artefactual network of meaning systems. The more this concept is realized in computational technologies, the more pervasive our artefactual representations of sign systems become; the more we computationally remediate sign processes and sign systems, the closer we get to a major paradigm shift in the role of cognitive technologies.

The New Paradigm

One perspective on dealing with this new paradigm in HCI is a transition from thinking about direct manipulation and object-oriented computing to thinking about navigation of information spaces (Benyon 426). Thinking about computing as navigation of information spaces frees the study of HCI from a single user and single system and opens it to a larger system of information spaces. In the words of Benyon, “As computing devices become increasingly pervasive, adaptive, embedded in other systems and able to communicate autonomously, the human moves from outside to inside an information space” (426). What this refers to is how computation mediates more and more of our sign processes, from the workplace, to museums, to politics, to entertainment, and far beyond. Furthermore, the information in each individual system is unique to that system, which is how we move between various information systems. While the topic of ubiquitous computing is somewhat beyond the scope of this paper, I will keep an eye toward ubiquitous computing as a motivator for this paradigm shift in HCI.

This idea of technical systems as information spaces ties directly to the notion that we offload symbolic information into artifacts that become part of our cognitive processes, in a process known as “external symbolic storage” (Renfrew 4). The idea is that our artifacts are part of the cognitive process of preserving culture and information, in writing as well as in architecture, art, pottery, and so on. It applies to technical systems as information spaces because technical systems are cognitive artifacts. As Peter Bøgh Andersen writes, computer systems are layers of sign systems: “If we continue this descent through the different layers of the system, passing through the operating system and the assembly code, down to the actual machine code, we will encounter signs most of the way” (Andersen 6). Even down to the machine code, we can treat sequences of electrical signals as a sign system. Because of this, when we encounter an information system, we are effectively navigating into an information space and its network of information spaces. This is true whether we are navigating the World Wide Web from a personal computer, accessing client information through an employer’s intranet, or ordering a sandwich at a deli kiosk, just as the spaces themselves flood us with information about whether it’s a home, professional, or commercial environment through architecture, décor, and the social contracts within those spaces.

The globality of sign systems, especially as they permeate technical computer systems, is indicative of how we offload and automate those sign systems. In the words of Jeannette Wing, “Computational thinking is using abstraction and decomposition when attacking a large complex task or designing a large complex system” (Wing 33). If we think of our artifacts, including technical systems, as organizing complex computational tasks in the everyday generation and preservation of culture, the emergence of computer systems playing central roles in human culture is an extremely confluent event (not to fall into any mystic determinism about emerging technologies). And it is no mistake that computers are layered with sign systems much like our own cognition: computation is a type of human logic. We use computer systems to more efficiently compute, store, and organize the processes and products of this logic.

Further speculation on this perspective of HCI leads to the realm of augmented space or reality. In a ubiquitous computing environment, augmented space refers to the dynamic information spaces layered over physical spaces – more a cultural and aesthetic practice than a technical one (Manovich 220). This refers to more than just the virtual data in a given space; it includes the layers of abstraction in a given space. As Manovich writes, “If previously we thought of an architect, a fresco painter, or a display designer working to combine architecture and images, or architecture and text, or to incorporate different symbolic systems into one spatial construction, we can now say that all of them were working on the problem of augmented space – the problem, that is, of how to overlay physical space with layers of data” (Manovich 226). We can think of each layer of data as a layer of symbolic abstraction. For example, in an art exhibit with four walls, a floor, and a ceiling, one layer of abstraction may be the paintings on the wall; another may be the construction of the room itself; still another may be the barriers, walkways, plaques, and other informational, behavior-modifying objects in the room; yet another is the people in the room, be they employees or other patrons. To add digital layers of information is to incorporate dynamic, computational information into the space.

To consider this is to realize that computation affects not only our cognitive faculties but the ontology of our environment as well. The information layers of a digitally augmented space allow for a layer of dynamic information, of variability (Manovich 234). This is important, because our spaces and our environment often contain information that is not readily extractable but instead exists in a network of knowledge. Without some familiarity with the network within which a given space or artifact exists, the knowledge within that space or artifact is inaccessible. By adding an additional layer of information on top of an artifact or space, we are engaging in a project of augmenting reality, and thereby augmenting human cognition.

Central to this notion is that our environment plays an active role in human cognition; given this, we can think of computer-mediated human action as a coupled system (Clark 8). Consider the backwards brain bicycle, demonstrated in Destin Sandlin’s video “The Backwards Brain Bicycle – Smarter Every Day 133”:

While a bicycle is a locomotive technology, not a cognitive technology, what this video shows us is how integral the form of our technologies is to the cognitive process of using them. Just one component of the bicycle is changed in the backwards brain bicycle, and yet the entire cognitive process of operating it is stunted. Changing the steering does not make this bike any less of a bike, and all the components that compose a bike are there. Furthermore, it takes little effort to imagine and conceptualize the new task of riding a backwards brain bicycle: you steer left to turn right and vice versa. And yet the cognitive process is still disrupted. That is because the bicycle itself, and its individual component parts, are all part of the cognitive process of riding the bicycle.

For some researchers, this principle of cognition calls for cognitive ethnography to play a central role in HCI research (Hollan et al. 181). This is an important point for my project, because the idea of augmenting space and human cognition must incorporate the features and properties of specific cognitive processes if we are to design cognitive technologies appropriate to their tasks. I can think of no one better to cite in a transition to a discussion of interface than Douglas Engelbart, inventor of the mouse and a central pioneer of the graphical user interface and interactive computing. His theoretical idea of augmenting human intellect is as follows: “By ‘augmenting human intellect’ we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems. Increased capability in this respect is taken to mean a mixture of the following: more-rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions, better solutions, and the possibility of finding solutions to problems that before seemed insoluble” (Engelbart 95).

The Role of Interface in the New Paradigm

Direct manipulation as a paradigm for HCI was extremely important, as we needed an interface that was universally accessible. However, as computation grows more ubiquitous in our everyday meaning making and cognitive tasks, there is a growing need to rethink interface and employ more complex cognitive faculties in interface design. The primary role interface plays in the navigation of information spaces is that interfaces characterize our relationship with those spaces and systems. Interfaces mediate between our cognitive symbolic faculties and the symbolic representations in external sign systems. Since interfaces are sign systems themselves, our relationship with any external sign system depends on them.

More generally, we can use interface to think about meaning systems not only in terms of technological processes, but other cognitive meaning making processes such as language. Ray Jackendoff uses the concept of interface to revise Chomsky’s claim about syntax’s role in language to say that phonological, syntactic, and semantic structures all interface with each other to produce language and generativity within that language (125). The role of interface as a construct in meaning making is about how it networks components of two distinct and related sign systems.

This is crucial to our understanding of interface, because even when we talk about sign systems, we are using a sign system to interface with the system in question: language. In Peircean semiotics, “A Sign is a Representamen of which some Interpretant is a cognition of a mind” (Peirce 291). Furthermore:

Signs are divisible by three trichotomies: first, according as the sign in itself is a mere quality, is an actual existent, or is a general law; secondly, according as the relation of the sign to its Object consists in the sign’s having some character in itself, or in some existential relation to that Object, or in its relation to an Interpretant; thirdly, according as its Interpretant represents it as a sign of possibility, or as a sign of fact, or a sign of reason (291).

Even in talking about the components of our symbolic thinking, reduced to a triadic model of sign components, Peirce generated a lexicon for talking about the sign system of symbolic cognition: a sign system within the greater sign system of language. We use logic and language to navigate between these sign components and reason with them.

Symbolic thinking affords an endless chain of associations that create larger systems of meaning. The more sign systems are created, and the more they grow in complexity, the more crucial a theoretical understanding of interface becomes, especially for designing interfaces for cognitive technologies. If our cognitive technologies automate abstractions at various levels, then our interfaces are responsible for ensuring that these automations, and our ability to interact with them, are efficient enough to reduce noise and other elements that inhibit meaning making.

While the origins of language and of the interfaces between phonology, syntax, and semantics remain a mystery, the origins of designed interfaces are not; the relationships between our symbolic cognitive faculties, language, and technology, especially in their origins, are still matters of debate. Semiotics nevertheless lends us an important understanding of HCI, namely that “research should focus on the interface — considered as a sense production device — and should analyze the ambiguous game between signification and interpretation played by designers and users. For semioticians this process is not a linear transmission of information (interface–>user) but a cooperative one (designers <–>interface <–> user): both designers and users, mediated by the interface, participate in this contractual game of sense production” (Scolari 5). In other words, executing actions on a personal computer is largely a communicative process between the user and the software designers through the medium of interface. This is crucial because this important facet of our relationship with our technologies is hidden in our direct interactions with interfaces; researchers need to focus on specific artifacts of interface to really discuss the relationship between users and designers. Scolari applies this research project to improving interface design, which is important and which I will discuss briefly; I think the same project should also be employed to understand the broader consequences and implications of human-designed interfaces, which I will discuss later.

Interface as a design problem finds its solution in the same place the problem comes from in a semiotic framework: the cognitive affordances and constraints within a given computational event. For example, designers of flight systems need to implement recognizable textual and pictorial symbols in their interfaces for users, symbols already characterized by the event of flight (Andersen 6). In other words, systems designers must navigate their interface design between the intuitive inferences of users’ cognitive faculties and the technical system’s symbol processing. Designers must utilize the affordances of the event within the constraints of the event: the problem is the constraints; the solution, the affordances. Balancing affordances and constraints is exactly why interface is so important, because “interfaces do not only present a content and a set of instructions for interaction: they also offer information about users possible movements and represent the relationships of the communicational exchange” (Scolari 9).

The Importance of Interface Design

Before discussing how interface design should be approached, I first want to establish why this is an important problem by discussing representations of interface. First, though we are used to thinking of interface as the GUI of a personal computer, I again want to invoke the idea that an interface is a sign system that mediates two other sign systems. This is important because new types of interfaces are constantly emerging in our technologies, ones that utilize “gesture recognition, voice commands, and eye-tracking” and that “present themselves as lower level inputs that do not tire out the user, but offer a good cognitive control-to-task fit” (Mentzelopoulos 65). Our interface affordances are increasing: interfaces now draw on different cognitive and physiological processes that fit certain technological tasks better than the traditional graphical WIMP interface.

Consider Xbox’s gesture recognition and voice command interface, Kinect, as introduced in the E3 2009 “Project Natal” announcement.

The camera and microphone recognize gestures and voice commands for navigating the Xbox graphical interface, selecting and executing Xbox games and applications, and controlling gameplay. In an entertainment atmosphere, the ability to use hand gestures and voice commands increases the cognitive affordances beyond those possible with just a game controller, and thereby adds layers to the information space of the room the Xbox is in. Returning to Manovich’s point about augmented space, the camera affords a translation of hand gestures into a digital layer of data. In other words, the human body itself becomes a layer of digital information and part of the Xbox interface, in the form of gestures recognizable by both Kinect and the user.

Important to note from the advertisement is the representation of this physical interface. The setting is the home; the whole family participates; children smile as they operate virtual steering wheels; and the tone of the whole advertisement fits the image of the product. The whole idea is to have fun with a new interface for an entertainment system.

What if the same interface action is applied to a different setting and tone altogether? Consider the following clips: one from the Xbox advertisement, and one from an episode of the science fiction thriller series Black Mirror.

[Clip: hand-wave scrolling in the Xbox Kinect advertisement]

[Clip: hand-wave scrolling in Black Mirror, “Fifteen Million Merits”]

The interface affordance of a hand wave to scroll through content horizontally is the same in both clips. However, note the differences in color and facial expression. In the Xbox advertisement, a warm tone is created to market an image of family entertainment and wholesome fun. In the clip from Black Mirror, darker colors and a somber facial expression generate a bleak tone, because the scene is set in a workspace. Bingham, the character in the scene, is on an exercise bike, and as he pedals he earns credits he can use to purchase more content on the entertainment system; in the world of this episode, riding the exercise bike is a form of labor. In this sense, the exercise bike interfaces to a technically mediated economic system, which interfaces to the entertainment system he scrolls through. Thematically, this is in line with Neo-Marxist frameworks for thinking about technologically mediated capital systems. Bingham feels oppressed by this system, as we understand through his obsession with beauty and authenticity and his frustration with the American Idol-esque show within the show that turns his romantic interest into an adult film actress.

All this is to say that the interface characterizes our relationship with the sign systems it mediates. In the Kinect commercial, the hand-wave gesture is characterized by a warm relationship with the entertainment the user scrolls through. In the Black Mirror clip, the same gesture is characterized by hostility toward the entertainment industry and oppression by the economic system. In one, the hand gesture is a mechanism of play and choice; in the other, one of indoctrination. Interface plays a crucial role in our relationships with our sign systems, and what the above clips show us is that interface is a dynamic sign system itself, characterized by the status of the sign systems it mediates and by the user’s relationship with those systems. Therefore, we have to be careful about how we design and think about interfaces, especially as information spaces and computationally mediated systems grow more ubiquitous.

A New Theoretical Framework for Interface

With an eye toward augmented space and information spaces, a new paradigm in computing affordances must bring with it a rethinking of interface. The desktop metaphor and the GUI have perhaps become inadequate for computationally mediated spaces that utilize more of our own cognitive affordances. To return to Mentzelopoulos’s point about “control-to-task fit,” it becomes ever more important to offload the right cognitive tasks to the right type of interface, be it direct manipulation in a traditional GUI, voice command or gesture in a perceptual interface, or remediation of space in an augmented reality interface. We need a synthesis of our computational affordances to redesign the way we think about computer interfaces.

Namely, I am calling for a paradigm shift away from how we think about interface in our current personal computer environments. While culture-centered design is not quite what I’m calling for, the abandonment of the desktop interface metaphor is in line with culture-centered design researchers: “…the desktop, which in theory should empower users to customise and personalise, according to their cultural context as manufacturers promise in their marketing slogans, has been restricted by existing operating systems, which only give the user a certain level of autonomy, such as freely chosen multiple languages, character sets and national formats” (Shen et al. 822). The desktop interface and metaphor are inadequate for a computer system as ubiquitous as the modern OS. At risk is cultural variance in textual formats, color schemes, object layouts, and other culturally inflected elements of design.

The problem with our current desktop environment is that it instantiates the desktop metaphor as a principle of computing, whereas the desktop metaphor and WIMP interface are not principles of computing but one of the many possible ways computing can be expressed in a GUI computer system. The universality of the desktop metaphor comes from the role interface plays: “a user-interface metaphor is a device for explaining some system functionality or structure (the tenor) by asserting its similarity to another concept or thing already familiar to the user (the vehicle)” (Barr et al. 191). Designers need to choose an interface metaphor that is recognizable to consumers, and perhaps the most obvious choice for a project of augmenting human intellect starts with an augmentation of our work environments. However, as ethnography has been suggested above as part of interface design research, we are entering an era in which different designs can be utilized for various cultural and computing environments. This opens up freedom for variance in designs and metaphors across a range of computing events and cognitive processes. While there is certainly diversity in software design and interface, there need to be whole new sets of metaphors for computing systems. In the direct words of CCD researchers:

“There seems to be a gap between notions of technology and culture, and a lack of appropriate and valid approaches to their synchronisation. More positively, researchers have been encouraged recently to establish more empirical and practised-based studies within the field of culture and usability. It is likely that a deeper understanding of culture, human cognition and perception followed by the evolution of technology, may help to bridge the gap” (Shen et al. 826).

Interface metaphors are also closely linked with the tasks they are designed for. For example, in a Swedish office setting, the word “kort” (which translates to “card”) is used to refer both to the electronic cards employees use to organize information in an online filing system and to the paper cards in their physical filing system. Work language evolves with the tasks of the work environment (Andersen 25). So too should our interface designs and metaphors.

It takes little more than a look at the semiotic properties of interface to see how.

[Figure: the Peircean sign model applied to a printer icon (Barr et al. 200)]

Considering the computer icon as the representamen, the conceptual component of the sign process is the potential for the action of printing, and the interpretant is that clicking the icon leads to a printed document. Note, however, that the very idea of printing a document that exists on the computer already relies on a metaphor: the file is similar to a paper document. Consider this graphic:

[Figure: user-interface metaphorical entailments (Barr et al. 202)]

Metaphorical entailments are the cognitive associations between the metaphor itself, which functions as a representamen, and the affordances the metaphor offers. In the communication between designer and user, an essential semiotic component of interface is the set of purposeful associations made by use of metaphor. So if we call a text file on a computer interface a “document,” we invite all the associations that come with that metaphor. The researchers refer to this set of associations as the UI metaphor’s entailments (Barr et al. 207).

Conclusion

The entire discussion of interface is crucial to how we think about our cognitive technologies in a process of unlimited semiosis (Barr et al. 201). As our interfaces network us to our cognitive technologies, it is important to design interfaces in a way that properly represents their role in the continuum of our symbolic thinking. Because our cognitive technologies, through their artefactualization of sign processes, exist as instantiations of sign systems, we need to be careful about how we design these technologies. Left alone, the risks of improper metaphorical associations and of an erasure of computational diversity are too high. Interface plays too crucial a role to allow a single interface metaphor to be the basis of how computation is culturally constructed. If computation is a desktop in our computers, wouldn’t that imply that the computational logic we employ in our own minds carries the same associations? We are far more than how our workspaces define us, and therefore we need a more engaging and diverse set of interface metaphors and designs.

Works Cited

Andersen, Peter B. “Computer Semiotics.” Scandinavian Journal of Information Systems, vol. 4, no. 1, 1992, pp. 3-30.

Barr, Pippin, Robert Biddle, and James Noble. “A Semiotic Model of User-Interface Metaphor.” Virtual, Distributed, and Flexible Organisation: Studies in Organisational Semiotics, edited by Kecheng Liu, Kluwer Academic Publishers, pp. 189-215.

Barrett, John C. “The Archaeology of Mind: It’s Not What You Think.” Cambridge Archaeological Journal, vol. 23, no. 1, Feb. 2013, pp. 1-17.

Benyon, David. “The New HCI? Navigation of Information Spaces.” Knowledge-Based Systems, vol. 14, no. 8, 2001, pp. 425-430.

Clark, Andy, and David Chalmers. “The Extended Mind.” Analysis, vol. 58, no. 1, 1998, pp. 7-19.

de Souza, Clarisse S. “Semiotic Approaches to User Interface Design.” Knowledge-Based Systems, vol. 14, no. 8, 2001, pp. 415-418.

Destinws2. “The Backwards Brain Bicycle – Smarter Every Day 133.” YouTube, 24 Apr. 2015. Accessed 19 Oct. 2016.

“Fifteen Million Merits.” Black Mirror, season 1, episode 2, Channel 4, 11 Feb. 2013. Netflix, https://www.netflix.com/watch/70264858?trackId=13752289&tctx=0%2C1%2C7de57de5-13bb-47dc-8d56-862526c8977b-132115175.

Gearlive. “E3 2009: Project Natal Xbox 360 Announcement.” YouTube, 2 June 2009.

Hollan, James, Edwin Hutchins, and David Kirsh. “Distributed Cognition: Toward a New Foundation for Human-Computer Interaction Research.” ACM Transactions on Computer-Human Interaction, vol. 7, no. 2, 2000, pp. 174-196.

Irvine, Martin. “The Grammar of Meaning Making: Signs, Symbolic Cognition, and Semiotics.”

Mentzelopoulos, Markos, Jeffrey Ferguson, and Aristidis Protopsaltis. “Perceptual User Interface Framework for Immersive Information Retrieval Environments.” International Journal of Interactive Mobile Technologies, vol. 10, no. 2, 2016, pp. 64-71.

Peirce, Charles S. From “Semiotics, Symbolic Cognition, and Technology: A Reader of Key Texts,” collected and edited by Martin Irvine.

—. Peirce’s Lecture on Triadic Relations and Classes of Signs, Lowell Institute, 1903.

Jackendoff, Ray. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press, 2003.

Wing, Jeannette. “Computational Thinking.” Communications of the ACM, vol. 49, no. 3, 2006, pp. 33-35.

Manovich, Lev. “The Poetics of Augmented Space.” Visual Communication, vol. 5, no. 2, 2006, pp. 219-240.

Renfrew, Colin. “Mind and Matter: Cognitive Archaeology and External Symbolic Storage.” In Cognition and Material Culture: The Archaeology of Symbolic Storage, edited by Colin Renfrew, 1-6. Cambridge, UK: McDonald Institute for Archaeological Research, 1999.

Scolari, Carlos. “The Sense of the Interface: Applying Semiotics to HCI Research.” Semiotica, vol. 177, no. 1, 2009, pp. 1-27.

Shen, Siu-Tsen, Martin Woolley, and Stephen Prior. “Towards Culture-Centred Design.” Interacting with Computers, vol. 18, no. 4, 2006, pp. 820-852.

Talk to me in Spanish, French, English, and Chinese (Roxy)

Abstract

Thanks to Google Translate, we can realize communication we never thought possible. With the GNMT (Google Neural Machine Translation) system, Google has improved Translate’s performance substantially. Which of human beings’ cognitive processes does it distribute? How can Google Translate be improved? In this paper, I will first trace the history of the technology applied in machine translation, including the PBMT (phrase-based machine translation) system and GNMT; then explain the poor behavior of Google Translate by comparing how human beings and machines understand a sentence (with examples); and finally analyze the human cognition distributed by Google Translate.

1. Introduction

My friend Melissa, a graduate student in English-Chinese translation, once told me: “As a translator, when I saw the latest progress of Google Translate, I could totally understand the anxieties and fears of the textile workers in the 18th century when they saw the steam engine.” Google Translate, the industry leader in machine translation, is a useful technology for people to extend and distribute their cognition.

Communication is vitally important to human beings, social beings who live in society and must deal with each other, and language is often the biggest barrier to understanding the outside world. Semiotics is the study of signs. According to C.S. Peirce, there are three kinds of signs: icons, indices, and symbols. Icons represent things by simply imitating them; indices convey an idea by being physically connected with it; and symbols convey meaning because of their usage. The most typical and common example of a symbol is language (Irvine, 2016a). The semiotic character of language makes it hard for machines to penetrate. Google has already improved its algorithm considerably, but it still has flaws.

2. The Main Body of the Essay

2.1 Technical Overview

2.1.1 An Introduction to Machine Translation

Machine translation is the use of software to translate a source language into a target language. Machine translation systems can be divided into two categories: rule-based and corpus-based. The former’s resources are dictionaries and rule bases; the latter’s are corpora processed with statistical methods.

Only a few years ago, the PBMT (phrase-based machine translation) system was the mainstream approach to machine translation, and Google Translate was based on this algorithm as well. Google’s machine translation basically used a statistical method: it took a large amount of bilingual web content as a corpus, and then selected the target-language words that best corresponded to the source. The “phrase” in “phrase-based” refers to the smallest unit of translation.

2.1.2 How does PBMT work?

First of all, PBMT breaks up a sentence into phrases according to the syntax of the language. Syntax, or grammar, describes the rules and constraints for combining words in phrases and sentences that speakers of any natural language use to generate new sentences and to understand those expressed by others (Irvine, 2016b). As an example, consider the syntax tree of the sentence “Once when I was six years old I saw a magnificent picture in a book, called True Stories from Nature, about the primeval forest” (from The Little Prince by Antoine de Saint-Exupéry, translated from the French by Katherine Woods).

[Figure: syntax tree of the example sentence]

Then, PBMT matches each phrase to a target-language phrase drawn from its big-data corpus.

Finally, it reorders the target-language phrases so that they conform to the syntax of the target language.
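To make these three steps concrete, here is a minimal toy sketch in Python. It is illustrative only, not Google’s system: the phrase table, the greedy segmenter, and the (here trivial) reordering step are hypothetical stand-ins for what a real PBMT system learns from a bilingual corpus.

```python
# Toy phrase-based pipeline: segment -> match -> reorder.
# PHRASE_TABLE is hypothetical; real systems learn millions of weighted pairs.
PHRASE_TABLE = {
    "je vois": "I see",
    "une image magnifique": "a magnificent picture",
    "dans un livre": "in a book",
}

def segment(sentence, table):
    """Step 1: greedily split the sentence into the longest known phrases."""
    words, phrases, i = sentence.split(), [], 0
    while i < len(words):
        for j in range(len(words), i, -1):       # try the longest span first
            if " ".join(words[i:j]) in table:
                phrases.append(" ".join(words[i:j]))
                i = j
                break
        else:
            phrases.append(words[i])             # unknown word passes through
            i += 1
    return phrases

def translate(sentence):
    phrases = segment(sentence, PHRASE_TABLE)             # step 1: segment
    targets = [PHRASE_TABLE.get(p, p) for p in phrases]   # step 2: match
    return " ".join(targets)                              # step 3: reorder
                                                          # (trivial here)

print(translate("je vois une image magnifique dans un livre"))
# -> I see a magnificent picture in a book
```

A real system would add a reordering (distortion) model and a target-language model to score competing outputs; each of those components is a place where the error propagation described in the next paragraph can begin.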

During the whole translation process, other, lower-level NLP (Natural Language Processing) algorithms are also necessary, such as Chinese word segmentation, part-of-speech tagging, and syntactic parsing. Admittedly, Google’s technology is advanced, but it still sometimes produces all kinds of translation jokes. The reason is that the statistical method, unlike a human being with knowledge, needs a large-scale bilingual corpus, and the accuracy of translation depends directly on the size and accuracy of that corpus. This way of translating eventually produces incorrect output because of error propagation: any error in an intermediate step continues to spread downstream and leads to a wrong final result. Even if each single component is 95% accurate, a pipeline of ten such steps compounds to roughly 0.95^10 ≈ 0.60, or about 60% overall accuracy, an unacceptable result.

2.1.3 How does GNMT work?

In September 2016, Google published “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” on http://ArXiv.org.

With the same corpus, the GNMT (Google Neural Machine Translation) system can achieve the same result with less workload than the PBMT (phrase-based machine translation) system. The word “neural” in GNMT means it can pay attention to the words you have input. In the past, the translation engine would only look at each word, one by one, and provide the matching word in the target language; GNMT instead “reads” the words first. For example, if the text you input is an excerpt of a news article that has already been translated into French on the news website, Google Translate can then provide the corresponding chunk from the French page.

The diagram below shows how GNMT translates a Chinese sentence into an English sentence.

This model follows the common sequence-to-sequence learning framework with attention. It has three components: an encoder network, a decoder network, and an attention network.

“Formal languages are defined with respect to a given alphabet, which is a finite set of symbols, each of which is called a letter. This notation does not mean, however, that elements of the alphabet must be ‘ordinary’ letters; they can be any symbol, such as numbers, or digits, or words” (Clark, A., 2013). First, “The encoder transforms a source sentence into a list of vectors, one vector per input symbol.” Here, it encodes each Chinese character into a vector. “Given this list of vectors, the decoder produces one symbol at a time, until the special end-of-sentence symbol (EOS) is produced. The encoder and decoder are connected through an attention module which allows the decoder to focus on different regions of the source sentence during the course of decoding” (Wu, Y., 2016).
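As a rough illustration of the attention step in that description, the following numpy sketch scores each encoder vector against the current decoder state and builds the weighted “context” vector the decoder focuses on. It is a toy with made-up values; GNMT’s actual attention module is a learned network over deep LSTM states.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

np.random.seed(0)                     # reproducible toy values

# Encoder output: one vector per source symbol (5 symbols, dimension 4).
encoder_states = np.random.rand(5, 4)

# Current decoder state while producing the next target symbol.
decoder_state = np.random.rand(4)

# Score each source position against the decoder state (dot-product scoring),
# normalize to weights, and form the context vector for the next output symbol.
scores = encoder_states @ decoder_state     # shape (5,)
weights = softmax(scores)                   # how much attention per source symbol
context = weights @ encoder_states          # shape (4,)

print("attention weights:", np.round(weights, 3))
print("context vector:", np.round(context, 3))
```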

[Figure: the GNMT model translating a Chinese sentence into English]

With the GNMT system, Google Translate can achieve better results under the standard of human assessment. In side-by-side evaluations by bilingual human assessors on sample sentences from Wikipedia and news websites, Google reported that translation errors were reduced substantially compared with the PBMT system (Wu, Y., 2016).

[Figure: side-by-side translation quality ratings for PBMT, GNMT, and human translations]

Data from side-by-side evaluations, where human raters compare the quality of translations for a given source sentence. Scores range from 0 to 6, with 0 meaning “completely nonsense translation” and 6 meaning “perfect translation.”

Here are some examples of translations produced by PBMT, GNMT, and a human translator.

[Figure: example translations produced by PBMT, GNMT, and a human translator]

Machine translation is far from perfect. Encoding sentences into vectors regardless of their linguistic features can leave the content uncontrollable and lead to errors. GNMT still makes significant mistakes that a human translator would never make, such as misspelling and misinterpreting rare terms. Nevertheless, GNMT represents a major milestone.

2.2 Google Translate vs. Human Beings

“As the engineering branch of computational linguistics, natural language processing is concerned with the creation of artifacts that accomplish tasks” (Clark, A., 2013). NLP is the main means by which machines analyze a sentence, and most NLP tasks require the annotation of linguistic entities with class labels: a part-of-speech tagger, for instance, assigns a part of speech to each word.
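As a quick illustration of such annotation, here is a part-of-speech run with NLTK, a common open-source toolkit (an example of the technique, not the tooling Google uses; the tagged output shown is what NLTK’s default model typically returns).

```python
# POS tagging demo with NLTK; requires two downloadable resources.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Google Translate reads the whole sentence first.")
print(nltk.pos_tag(tokens))
# e.g. [('Google', 'NNP'), ('Translate', 'NNP'), ('reads', 'VBZ'),
#       ('the', 'DT'), ('whole', 'JJ'), ('sentence', 'NN'),
#       ('first', 'RB'), ('.', '.')]
```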

When I saw the news of Google’s new GNMT system, I was very interested, so I used a Chinese article to test Google Translate. The article I chose was very logical: an introduction to an earphone. The result I received was surprisingly good; the word choices were accurate and suitable.

But this matter is not that simple.

2.2.1 The Semiotics of Language

The linguistic sign has two sides, the concept and the sound-image. Saussure uses sign [signe], signified [signifié], and signifier [signifiant] to refer to the word, the concept, and the sound-image, respectively (Irvine, 2016a).

[Figure: Saussure’s two-sided sign: concept (signified) and sound-image (signifier)]

An English speaker can understand the meaning of words. He can follow a command to pick the red paper from among 10 pieces of colorful paper. “But a computer cannot understand the meaning of red, just as a piece of paper cannot understand what is written on it” (Hausser, R., 1999).

Let’s look at Google Translate’s performance when dealing with a more flexible, colloquial article.

[Figure: Google Translate’s rendering of a colloquial Chinese passage about the GTX 1050 graphics cards]

And this is the translation from a Chinese-English translation company: “If the performance of GTX 950 is taken as a benchmark of 100%, then the performances of GTX 1050 and GTX 1050Ti can reach 110% and 140% respectively. They also excel the previous generation models in terms of power consumption. Apart from saving energy, lower power consumption means less heat generated, which is a good news for the game players with small-chassis computers for which ventilation can be an issue.”

We can see that the Google translation, compared with the one produced by a human, is barely acceptable.

2.2.2 The Beauty of Language

The machine cannot understand the beauty of language. Poems are fascinating because they have rhythm and verbal poetic images. Here is an example: an ancient Chinese poem.

陆游 《卜算子·咏梅》

驿外断桥边,

寂寞开无主。

已是黄昏独自愁,

更著风和雨。

 

无意苦争春,

一任群芳妒。

零落成泥碾作尘,

只有香如故。

 

(The Diviner-Ode to the Plum

By Lu You

Tr. Zhao Yinchuan

Beside the broken post bridge there

It blows, solitarily sane

The dimming dusk it can hardly bear

And there’s the slash of wind and rain

 

It contends for spring with no one

That horde of flowers, let them flare

It falls into dust, trundled to none

Its aroma welling as e’er)

In this poem, the rhyme follows a certain pattern. A human translator can understand it and find words that rhyme with each other to accomplish the translation. But the machine can never tell this difference; it follows the command of its big data and outputs the most common combination of these words.

In addition, “Broken Bridge (断桥),” “Dimming Dusk (黄昏),” “Wind (风),” and “Rain (雨)” are verbal poetic images that jointly create a lonely, clear atmosphere. We can see the picture imaginatively: a plum blossom opens in a desolate spot next to a broken bridge; the evening wind and rain scatter it into the mud, but it still maintains its aroma.

The language sign, according to Saussure, involves (1) an arbitrary (i.e., unmotivated) structural relation of sound and meaning in any natural language (the foundation of language as a symbolic system); (2) speech sounds, word forms, and meanings as elements in a system of interrelations, within which, and only within which, they function as constituents of a language; and (3) two dimensions of meaning: the “context-free” sense (like dictionary meaning) and social-cultural value (meaning in contexts of use) (Saussure, 1959). These features mean a language can only be understood by people who know both the system of the language (the signifiers) and the meanings of its symbols (the signifieds). This limitation is the biggest barrier to a machine fully understanding a sentence.

“Broken Bridge (断桥),” “Dimming Dusk (黄昏),” “Wind (风),” and “Rain (雨)” are accepted in Chinese culture as signifying specific meanings. These verbal poetic images accurately express the feeling of the author, so that readers receive an aesthetic experience.

2.2.3 The Formation of Language

Finally, Google Translate cannot match the same target text if the formation of the source sentence is changed.

[Figures: Google Translate’s output for the original sentence, and for the same sentence with an added auxiliary word]

Currently, Google Translate can handle translation between English and Chinese. But if I add an auxiliary word to the same sentence, the translation cannot reach the ideal result.

The process of GNMT is purely one of function fitting. Through this fitted function, a source sentence that changes its form will map to a different target sentence, even when the meanings are the same. So adding a few irrelevant words can change the result enormously.

2.3 Distributed, Extended, and Embodied Cognition

Google has already completed experimental and commercial practice on its TensorFlow platform (TensorFlow™ is an open-source software library for numerical computation using data flow graphs) with Tensor Processing Units. This technology underpins the GNMT system described above.

2.3.1 Distributed Cognition Between Individuals and Technology

2.3.1.1 Beyond Direct Manipulation: Graphical Interface

The interface is the means by which people access and manipulate the things existing on the screen. An important research issue for the field of human-computer interaction is how to move beyond current direct-manipulation interfaces (Hollan, J., 2000). The Google Translate web page allows us to perform actions such as inputting the source text, changing the source language, getting the output in the target language, sharing the texts, and listening to them. Some of these actions can be realized in the real world, but some have no easy counterpart.

[Figure: the Google Translate web interface]

As users become more familiar with an environment, they situate themselves in it more profoundly (Hollan, J., 2000). “Everything we take for granted about graphical ‘interfaces’ – software controlled pixel mapping with an ‘interactive’ software layer engineered to track pointing devices and defined regions for user-activated commands (icons, menus, links) – were developed in this context for ‘augmenting human intellect’ and organizing all forms of symbolic representations and expressions” (Irvine, 2016c). This website, like most websites, uses graphical interfaces (icons) to connect users to its functions.

2.3.1.2 Providing Knowledge

Listen: via the “listen” icon, Google Translate will read the text aloud. To build this voice, Google recorded a human voice speaking thousands and thousands of carefully chosen sentences, selected to contain all the sounds of a language and all of its combinations (for example, in English, the /s/ sound changes to accommodate the sounds around it). These sentences are then divided into sound tokens, and the voice we hear is a combination of those tokens. The technology can provide us with knowledge and skills that are unavailable from internal representations (Zhang, J., 2006). We can imitate the sound provided by Google to speak another language.
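A toy sketch of that concatenative idea, with hypothetical numpy arrays standing in for the recorded waveform snippets of each sound token (production speech synthesis is far more sophisticated):

```python
import numpy as np

# Hypothetical bank: sound token -> recorded waveform samples.
token_bank = {
    "he":  np.array([0.1, 0.3, 0.2]),
    "llo": np.array([0.4, 0.1, 0.0]),
    "wor": np.array([0.2, 0.2, 0.5]),
    "ld":  np.array([0.3, 0.0, 0.1]),
}

def synthesize(tokens):
    """Concatenate the stored snippets in order to form the output voice."""
    return np.concatenate([token_bank[t] for t in tokens])

waveform = synthesize(["he", "llo", "wor", "ld"])
print(waveform)   # the samples that would be sent to the audio device
```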

Translation (Website translation, Camera instant translation):

@Aidan Mechem tweeted his experience with Google Translate.

[Image: Aidan Mechem’s tweet about Google Translate]

With notebooks, we don’t have to memorize everything that happens in daily life; with Google Translate, we don’t have to learn Spanish to communicate with a Spanish speaker. Google Translate is an affordance for our cross-language communication.

2.3.2 Distributed cognition across individuals.

2.3.2.1 Cross-cultural communication

Instead of requiring us to spend time learning other languages, Google Translate provides the most convenient and efficient way of understanding them. Functions such as “share” and “read phonetically” democratize cross-language communication.

2.3.2.2 Google Translate Community

Languages are much more complex than a machine can understand, and language may be unclear without context. A sentence like “J’ai votre nom” can be understood in two different ways: “I have all of your names” and “I have your name; you are a person I respect.” That is why the Google Translate Community is open for anyone to correct translations. Even when the meaning is clear, idioms and specific terms and expressions make it very hard for machine translation to be precise, and the meaning of a sentence may change with the social context or the status of the reader and writer. In analyzing distributed cognition across individuals, reductionists insist that the cognitive properties of a group can be entirely determined by the properties of individuals, while interactionists insist that interactions among individuals can produce emergent group properties that cannot be reduced to the properties of the individuals (Zhang, J., 2006). The community’s advice can be really important for Google in expanding its corpora and thereby providing better service.

[Screenshots: the Google Translate Community interface]

3. Conclusion:

Machine learning is a process of identifying vectors. For example, suppose there are two boxes of fruit, containing apples and oranges. The machine will first identify their feature vectors: (red, with stem, sweet) = apple; (yellow, no stem, acid) = orange. Facing a green apple, the machine finds that the green apple is relatively close to the apple vectors, and identifies it as an apple.
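A minimal nearest-neighbor sketch of this fruit example, with hand-made, hypothetical feature vectors:

```python
import numpy as np

# Features: [redness, has-stem, sweetness], each scaled 0..1 (toy values).
examples = {
    "apple":  np.array([0.9, 1.0, 0.8]),
    "orange": np.array([0.2, 0.0, 0.3]),
}

def classify(features):
    """Label a fruit by the closest stored vector (Euclidean distance)."""
    return min(examples, key=lambda name: np.linalg.norm(examples[name] - features))

green_apple = np.array([0.3, 1.0, 0.7])   # not red, but stemmed and sweet
print(classify(green_apple))              # -> apple
```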

In the linguistic field, NLP is the way for a machine to analyze a sentence, and the GNMT system is a new algorithm that lets Google Translate attend to the connection between the source language and the target language. With these algorithms, Google Translate has, in my view, been incredibly successful. But some big problems remain: for example, when translating from Chinese to English, Google Translate can’t tell where Chinese words start and stop, since there are no spaces between Chinese words.

An English speaker will say “it’s Greek to me” when he cannot understand something; a Greek speaker will say “it sounds like Chinese” when he encounters the same difficulty; a Chinese speaker calls the same situation “Heavenly Script.” Google Translate can break down the barrier of language for us, and I think it can be improved to bring a revolution to cross-language communication in the near future.


References:

Irvine, M. (2016a). The grammar of meaning systems: Sign systems, symbolic cognition, and semiotics. Unpublished manuscript.

Irvine, M. (2016b). Introduction to linguistics and symbolic systems: Key concepts. Unpublished manuscript.

Irvine, M. (2016c). Introduction to the technical theory of information. Unpublished manuscript.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Klingner, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144.

Carey, J. (2002). A cultural approach to communication. McQuail’s reader in mass communication theory, 36-45.

Clark, A. (2008). Supersizing the mind: Embodiment, action, and cognitive extension. OUP USA.

Clark, A., Fox, C., & Lappin, S. (Eds.). (2013). The handbook of computational linguistics and natural language processing. John Wiley & Sons.

Zhang, J., & Patel, V. L. (2006). Distributed cognition, representation, and affordance. Pragmatics & Cognition, 14(2), 333-341.

Hausser, R. (1999). Foundations of computational linguistics. Berlin: Springer.

Saussure, F. de. (1959). Course in general linguistics (W. Baskin, Trans.). (Original work published 1911-1916). Excerpts.

Hollan, J., Hutchins, E., & Kirsh, D. (2000). Distributed cognition: toward a new foundation for human-computer interaction research. ACM Transactions on ComputerHuman Interaction (TOCHI), 7(2), 174-196.

Chandler, D. (2007). Semiotics: the basics. Routledge.

Short, T. L. (2007). Peirce’s theory of signs. Cambridge University Press.

Bergman, M., & Paavola, S. (2010). The Commens dictionary of Peirce’s terms: Peirce’s terminology in his own words.

Denning, P. J., & Bell, T. (2012). The information paradox. American Scientist, 100(6), 470.

Peirce’s Semiotics and Environmental Education

 

Abstract:

This paper argues for an education design grounded in Peirce’s theoretical work on semiotics for those working in environmental conservation initiatives. In light of the recent environmental and climate crisis, our communities face a myriad of complex innovation challenges. The environmental crisis is woven into flawed economic, technological, and social systems that lack understanding of, and foresight about, our impacts on the surrounding ecosystem. A true paradigm shift is needed in environmental science and management, one that is both collaborative and complex. Peirce’s semiotics values complex, collaborative ways of mapping and sharing knowledge of our universe. This paper takes the specific field of cartography as a case study for the argument: in particular, the principles of semiotics should be applied to the use of new interactive maps such as Google Earth.

Introduction:

“A sign is something by knowing which we know something more. The whole universe is perfused with signs,” said C.S. Peirce. If the whole universe is perfused with signs, it is not unusual that humans have dedicated themselves for thousands upon thousands of years to depicting the surrounding universe in symbolic form through mapping. The complex academic work of C.S. Peirce has led to advances in the study of language, syntax, and computation, as it enlightened the field of semiotics and provided a base for symbolic theory. As a polymath, Peirce drew connections among logic, mathematics, and linguistics. Notably, he was also a cartographer and is known for creating the quincuncial map. His work highlights the complex nature of our cultural connections and suggests that the study of signs is in many ways the study of relationships among individuals, culture, sign vehicles, time, and the surrounding environment. It is this focus on complex networks of understanding that leads one to see parallels between his work and the future of environmental science, map-making, and ecological knowledge.

Throughout history, maps have been used as a symbolic meaning-making system to understand and communicate a shared environment. Maps are persuasive, political, scientific, and explanatory. Humans have culturally evolved because of our ability to interact with representations of reality rather than reality itself. Maps have allowed people to symbolically render their surrounding environments in order to document them, plan for the future, and share information with others in the community.

“Peirce discovered that the human social-cognitive use of signs and symbols in everything from language and mathematics to scientific instruments, images, and cultural expression provides a unifying base for understanding meaning, knowledge, learning, and what we call ‘progress’ in developments in both sciences and arts” (Irvine). I would propose that these principles are largely ignored in today’s environmental movement, as ecological scientists have lacked the resources to properly communicate environmental knowledge.

Peirce’s semiotic principles should be more widely distributed as a means for the environmental paradigm shift to begin. The affordances of Google Earth’s Web 2.0 software allow for potentially the greatest cartographic collaboration in history. Google Earth attempts to subvert traditional power structures through its use of interface and a demonstrated understanding of the semiotics of maps. Peirce’s theories of semiotics can and should be applied to our analysis of interactive design in Google Earth.

Peirce Introduction

C.S. Peirce’s lifework revolved around the process of meaning-making and knowledge as a generative process. With every symbolic experience, one undergoes a process of combinatorial information processing. “A sign is something by knowing which we know something more,” said Peirce. He developed an understanding of the meaning-making process as a triadic experience. This process is explained in Martin Irvine’s evaluation: “A Sign, or Representamen, is a First which stands in such a genuine triadic relation to a Second, called its Object [an Object of thought], as to be capable of determining a Third, called its Interpretant, to assume the same triadic relation to its Object in which it stands itself to the same Object” (Irvine).

Peirce expanded this notion by defining multiple types of signs and categorizing them as icons, indexes, and symbols. Icons were defined by Peirce as “a mere community in some quality”: put simply, the sign shares a quality with what it signifies, which is why Peirce also called icons “likenesses.” Indexes were those “whose relation to their objects consists in a correspondence in fact”; this indexical aspect points to the actual thing being represented. Finally, symbols were those “whose relation to their objects is an imputed character,” having general or conventional connections to their objects (Stanford).
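To make this taxonomy concrete, the following minimal Python sketch models the triadic relation as a data structure. It is an illustration only: the class, fields, and examples are my own, and the comments mark Peirce’s point that the interpretant is generated by an interpreter rather than stored in the sign.

```python
from dataclasses import dataclass

@dataclass
class Sign:
    representamen: str  # the material-perceptible form (e.g., a wavy blue line)
    obj: str            # the object of thought it stands for (e.g., a river)
    kind: str           # "icon", "index", or "symbol"

def interpret(sign: Sign, context: str) -> str:
    """The interpretant is not stored in the sign: it is generated anew
    by an interpreter, in a context, each time the sign is read."""
    return f"In {context}, '{sign.representamen}' is read as a {sign.kind} of {sign.obj}."

river = Sign("wavy blue line", "a river", "icon")             # shares a quality
arrow = Sign("north arrow", "map orientation", "index")       # points to its object
border = Sign("dashed line", "a political border", "symbol")  # conventional link

for s in (river, arrow, border):
    print(interpret(s, "a topographic map"))
```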

Peirce Cartography

C.S. Peirce begins his essay A Quincuncial Projection of the Sphere: “For meteorological, magnetological and other purposes, it is convenient to have a projection of the sphere which shall show the connection of all parts of the surface.” In this essay, Peirce further explains that his projection “is formed by transforming the stereographic projection, with a pole at infinity, by means of an elliptic function.” This map places the north and south poles as single points from which the projection radiates outward with mathematical precision. The map uses squares at scales varying according to his formula to create a highly accurate spatial rendering of the sphere (Peirce).
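Peirce’s construction starts from exactly this stereographic projection “with a pole at infinity.” Below is a minimal Python sketch of that first step; the elliptic-function transform that completes the quincuncial projection is omitted, and the function name is illustrative.

```python
import cmath
import math

def stereographic(lat_deg: float, lon_deg: float) -> complex:
    """Stereographic projection of the unit sphere onto the complex plane,
    projecting from the north pole (which is sent to infinity).
    The south pole lands at the origin."""
    lat = math.radians(lat_deg)
    r = math.tan(math.pi / 4 + lat / 2)  # radial distance from the origin
    return cmath.rect(r, math.radians(lon_deg))

# The equator maps to the unit circle; Peirce then transforms this plane
# into a square by means of an elliptic function.
print(round(abs(stereographic(0.0, 45.0)), 6))  # -> 1.0
```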

His theoretical work was based on the logical problem of representation, which linked his interests in mapping, imaging, language, and mathematics. Fascinated by geographical mapping, he pondered the regressive element in the continuity of the map image itself. He wrote,

“If a map of the entire globe was made on a sufficiently large scale, and out of doors, the map itself would be shown upon the map, and upon that image would be seen the map of the map, and so on indefinitely. If the map were to cover the entire globe, it would be an image of nothing but itself, where each point would be imaged by some other point, itself imaged by a third, etc. But a map of the heavens does not show itself at all” (Eisele, 300).

He often noted that his own quincuncial projection was superior to the standard maps of the time, stating, “a Mercator’s projection shows the entire globe (except the poles) over and over again in endlessly recurring strips.” He continued, “many maps, if they were completed, would show two or more different places on the earth at each point of the map (or at any rate on a part of it), like one map drawn upon another.” Pierpont analyzes his quincuncial projection as “representing one-to-one correspondence of the interior of a square by the interior of a circle of unit radius about the origin on the plane of the stereographic projection” (Eisele).

Peirce’s quincuncial projection proved successful: the U.S. Coast and Geodetic Survey later published its principles while working on major international air routes. Due to the accuracy of his map, the air routes are shown with less distortion than on any other map and in most situations are depicted as straight lines. This aids in understanding the true angles of intersection for air traffic, as opposed to the Mercator or stereographic projections (Eisele, 307). This work is further built upon in digital mapping such as Google Earth. While interacting with Google maps, users experience satellite-rendered images that are overlaid as appropriately sized squares. These image overlays continue into smaller and smaller squares keyed to Earth’s longitude and latitude coordinates.
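These nested squares correspond to a tile pyramid. Below is a minimal Python sketch of the standard Web Mercator “slippy map” tiling used by many web maps; that Google Earth’s imagery pyramid matches this scheme exactly is an assumption, and the function name is my own.

```python
import math

def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert a latitude/longitude pair to x/y tile indices at a zoom level.
    Each zoom level doubles the tile count along each axis, so tiles cover
    smaller and smaller squares of the globe as the user zooms in."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y

# Example: the tile containing Washington, DC at zoom level 10.
print(latlon_to_tile(38.9072, -77.0369, 10))  # -> (292, 391)
```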


Cartography and Semiotics

Beyond his work on the quincuncial projection, all of Peirce’s semiotic work has stood as a platform for others studying the sign systems within cartography, in both politics and the sciences. Geographic Information Systems (GIS) have created a new phenomenon in cartography, as digitized, information-rich renderings of the environment have dramatically changed the work of many scientific fields. Given the growing use and influence of such systems, it is imperative that new users understand the fundamentals and traditional representative capabilities of cartography.

As users begin to understand the ways in which maps have traditionally represented space, time, and other natural phenomena through their expressive communicative powers, they become aware of their own role in the semiotic process. Their meaning-making literacy passes through a meta-experience of not simply knowing what a sign stands for, but also being cognitively aware of that process through interpretation. I would argue that this education could vastly increase political engagement and empowerment.

Ferdinand de Saussure distinguishes two dimensions of meaning: context-free meaning and socio-cultural value. This distinction is crucial for understanding any system of symbols we come across. In the context of the complex meaning systems of maps, I found that socio-cultural value is key. Mapping is valued as a specific social sharing device, but in the case of GIS technology within environmental science, most citizens have no socio-cultural understanding of these maps because the maps lack symbolic meaning relevant to them. This socio-cultural component is, in many ways, a missing link in the sphere of environmental messaging, and it can be traced back to understanding symbols as a messaging language. While each map acts the same in what Peirce would call its material-perceptible form, we as a collective hold the initial learned associations. In the triadic form, however, the response formed by such a map is held only within one’s own personal experience.

Our cumulative experiences with maps have changed over time based on our ability to move among the three basic classes of signs (icons, indexes, and symbols) and our ability to capture the surrounding universe in different symbolic forms. While hand-drawn historical maps are known for their geographical inaccuracies, we now use satellite and photo imagery to obtain precise details of the planet and surrounding universe. However, it is important to remember that these new and highly accurate depictions of the universe are still symbols of instances in time.

“These models are mash-ups of the iconic, indexical, and symbolic—none of which the interface makes clear, until one considers another element of the Peircian model of semiotics: that all signs must have an interprétant: an agentive, cognitive frame for reference,” writes Helmreich (1226). In the case of Google Earth, we have a conglomerate of images and data collected by Google camera missions, satellites, publicly shared initiatives such as environmental and oceanographic research studies, and everyday citizens. It is here that we can see that systems of meaning can pre-shape what will count as a sign.

These interpretants are tools of use that have cultural influence. Helmreich further analyzes the role of semiotics in the Google Ocean application within Google Earth. He writes, “Artifacts in the data reveal some of the assumptions built into the human and machine interpretant ecology. The image of the real, filtered through the model, indexes its social and institutional conditions of possibility, underscoring the way that systems of meaning can pre-shape what will count as a sign” (1226).


A simple example in Google Earth is the blurring of images over government security areas, or the ongoing security debates between Google and countries such as China and Saudi Arabia. While icons, indexes, and symbols are perfused across the Google Earth platform, interpretant experiences already shape what types of icons, indexes, and symbols the individual interpreter has access to view (Helmreich, 1226).

This interpretant holds an immense amount of power. To further expand upon the role of the interpretant in maps, and in Google Earth in particular, we can look at the political acknowledgements of sovereign nation-states written and highlighted on the map. These maps are not simply satellite images of the planet, but highly edited and stylized renderings of the world we politically associate with.

“What we have learned from Saussure is that, taken singly, signs do not signify anything, and that each one of them does not so much express a meaning as mark a divergence of meaning between itself and other signs” (Wood and Fels, 95). In this sense, signs allow systems of relationships to exist through the creation of distinct working parts, and we are faced with understanding the nature of complexity in meaning-making systems. Wood and Fels explain: “what the map does (and this is its most important internal sign function) is permit systems to open and maintain a dialogue with one another” (96). Maps form complex systems through the distinction of these varying signs in a spatial representation of the relationships they bear to one another. These distinct working parts operate interdependently, acting as individual components but also creating the hierarchical and combinatorial sign that is the map itself.

They continue: “There is nothing in the map that fails to signify” (96). Each symbol for a river, political border, or mountain is acted upon by the others. Even if a blank space were left on a map, that blank space interacts relationally with the other pieces and therefore symbolizes something. In the example of a simple Fijian map, we see just lines, mostly distinguishing land from ocean; in the absence of line exists one or the other, as we create spatial representations of our universe.

Overview of Google Earth Interfaces

Google Earth provides an abundance of information for users across a variety of interfaces. It was originally accessible only via desktop, but as of 2008 it became available as a mobile app for iOS and Android. Mobile availability created a new future for Google Earth, as its geolocation technologies were now used in a Web 2.0 format, with mobile users producing an unprecedented amount of data. On mobile devices, including the iPad and iPod Touch, Google Earth uses a multi-touch interface to explore the globe and other Google Earth spaces. Multi-touch allows for zooming and moving throughout the mapping system, and the iPhone’s Assisted GPS aids in crowdsourcing data.

The dependencies behind such a technology grew out of the evolution of remote sensing: satellites collect data on the dimensions of earth objects below, and this data is rendered into image format. The technology can be traced back to earlier remote sensing work by aerospace companies such as Boeing. Google Earth users actively participate in the creation of Google Earth by taking pictures on mobile devices and by using the SketchUp software for 3D modeling. The whole program uses software to superimpose images onto the same interactive mapping system. The imagery is updated to higher resolutions as satellite and remote sensing technologies improve and as more participants engage with the 3D rendering software (Google Earth).


Through these interfaces, users come into contact with a varied collection of icons, indexes, and symbols. Likenesses, or icons, are implemented directly through a number of interactive features, including adding photography to Street View. As users formulate important locations on the map, they can make placemarks with descriptions and names in an indexical fashion. Finally, users interact with the varying sized squares of image information to better understand space and relational distance: the map acts as an entire symbolic meaning-making system, constantly creating generative cycles as each piece of symbolic representation interacts relationally with the others, adding more meaning to each piece as new pieces are added and relationally observed.

Google Earth and Environmentalism

Every year, Google hosts the “Geo for Good” conference, at which it discusses its goals as a company partnering with city planners and conservation organizations to have technologies such as Google Earth aid projects for health and the environment. This partnership gives both Google and these NGOs positive public support. Google Maps and Earth beat out competitors such as MapQuest, Microsoft, and Yahoo by making a bold move: Google avoided advertisements on the site and instead slowly integrated local businesses into its mapping, provided information about each business, gave indoor imaging to some, and fulfilled these partnerships through Google Business (Geo for Good).

Distributed agency is granted by those who use this technology, as substantial numbers of scientists have begun employing Google Earth. However, the mapping system is used mainly for communication and cause marketing rather than scientific analysis. Because of this, conservation organizations have begun employing the technology in their campaigns. These organizations include the Jane Goodall Institute, which provides digital mapping and ecosystem-management visuals for potential donors and communities in areas of conservation. Beyond these NGOs, Google Earth offers components such as traffic data thanks to crowdsourcing (The Jane Goodall Institute).

Affordances of Web 2.0

Manovich’s example of Google Earth as Web 2.0 software opened my thoughts about our perceptions of globalization, computer technology, and the physical nature of our planet (37). Never before has the physical space of our planet been so heavily monitored and documented, nor have we had the capacity to use software for “precomputation,” with Google Earth users serving as a form of distributed cognition. If societal advancement comes from our exceptional symbolic ability to offload cognitive memory, emotion, and logic into forms for reuse and distribution, the potential of Web 2.0 computing as a source of data monitoring in geography and planetary change seems infinite. As I continue to learn about technology for conservation, I am curious how best to design software for “precomputation” of environmental data and how best to use the internet for “distributed cognition.”

Google Earth and other conservation technology tools are beginning to be broadly distributed by technology companies through non-profits. Using design principles to simplify products such as Android tablets, Google has been able to cross language, cultural, and educational borders to provide services and employment options to communities deeply affected by deforestation and other environmental hazards. These products have been given to local communities in the Democratic Republic of the Congo working on primate monitoring. If the design of these products left such technologies “blackboxed” and mystical, institutions such as the Jane Goodall Institute would be unable to use them for research purposes. Through proper training of the user, these technologies are de-blackboxed into simple experiences that allow for efficiency. This user-interface design model is the key to de-blackboxing and distributing these cognitive artifacts globally (The Jane Goodall Institute).

On a more scientific note, field data scientists can track the range of species and create ecological niche maps based on population densities and the sprawl of a species. This saves scientists an incredible amount of time in the field and allows data to be collected and visualized directly on the computer for further study. Often, this visual data is used for forest monitoring and even carbon credit analysis.

Furthermore, this visualization allows for conservation organizations such as Earthwatch Expeditions to explore citizen science projects and educational sessions within classrooms. This organization uses Google Earth to reach a wide range of individuals by placing markers on participants’ local Google Earth maps that explain an ecological issue facing their environment. It then gives information about how the citizen can collect simple data or pictures of the area for scientists looking to build their research, (The Jane Goodall Institute).

One of the grand affordances of Google Earth is its Web 2.0 functionality and the “Google Earth Community.” The program allows citizen participants to engage in the social network of Google Maps by making placemarks and contributing to central community knowledge of certain locations. Of course, this function does need to be monitored, as anyone can contribute individual knowledge that may be false or inaccurate by local cultural standards; individuals have been observed planting false business locations in an attempt to boost advertising. However, this function is mostly used appropriately. Community members can even create overlays which provide augmentations of their local street view or even storm paths.

Increased Semiotic Education in Environmental Projects

“We are able to store and forward symbolic thought from one generation to many others. Enabling a cumulative cultural ‘ratchet effect’ also known as ‘progress’” (Irvine). This storage through time allows for the cumulative process by which all symbol systems evolve, including maps, which are known to hold knowledge of geographical landmarks, political boundaries, and pathways to resources. The knowledge within these maps is shaped by societal needs and created from the knowledge of many community members. Maps are made to be referenced and shared over time while still holding the knowledge of the moment in which they were created.

If we are to hope for any amelioration of the environmental crises, we must employ this type of thinking in our environmental management and mapping projects. There has been a movement toward creating distributed cognition at environmentally threatened sites through citizen-engaged projects. As environmental science reaches out to those not trained in the symbolism, icons, and indexes known to niche groups of environmental scientists, it has become more important than ever not simply to teach meaning to contributing citizens, but to teach both project teams and citizens the meaning-making frameworks themselves. The semiotic work of C.S. Peirce is largely overlooked in active fields of environmental science, but, as stated, Peirce is potentially one of the greatest minds in foundational scientific thought.

Overall, it is imperative that citizens contributing to Google Earth have an education in Peirce’s theories of semiotics. Through the dissemination of these concepts, citizens participating in citizen-science initiatives by interacting with and adding to Google Earth can better involve themselves in the collective symbolic creation and interpretation of the signs and symbols of interactive digital mapping. With these principles, citizens and scientists can both be reflexive about their own patterns of understanding the cartographic information in front of them and progressive in their work to collect, interpret, and disseminate their own findings.

 

Bibliography

Irvine, Martin. The Grammar of Meaning Systems: Sign Systems, Symbolic Cognition, and Semiotics.

Saussure, Ferdinand de. Course in General Linguistics. 1911–1916. English translation by Wade Baskin, 1959. Excerpts.

Manovich, Lev. Software Takes Command. New York: Bloomsbury Academic, 2013.

Stanford Encyclopedia of Philosophy. “Peirce’s Theory of Signs.” Stanford University, 2006.

Peirce, C.S. “A Quincuncial Projection of the Sphere.” American Journal of Mathematics, 1879.

Eisele, Carolyn. “Charles S. Peirce and the Problem of Map-Projection.” Proceedings of the American Philosophical Society, 1963.

Helmreich, Stefan. “From Spaceship Earth to Google Ocean: Planetary Icons, Indexes, and Infrastructures.” Social Research, 2011.

Wood, Denis, and John Fels. “Designs on Signs: Myth and Meaning in Maps.” North Carolina State University.

Google Earth. Earth.google.org. 2016.

Geo For Good. Geoforgood.2016.earthoutreach.org. 2016.

Pintea, Lillian. The Jane Goodall Institute. 2015.

From the Messenger Boy to Facebook Messenger: The Transformative Power of the Telegraph (Amanda Morris)

“Society can only be understood through a study of the messages and the communication facilities which belong to it; and that in the future development of these messages and communication facilities, messages between man and machines, between machines and man, and between machine and machine, are destined to play an ever increasing part.” – Norbert Wiener

Introduction


Samuel Morse

Five weeks aboard a ship in 1832 was all it took for Samuel Morse to begin fostering the idea of an invention that would forever change the future of communication. An artist and a professor, Morse was returning to the United States after spending three years in Europe improving his painting skills and beginning work on his iconic painting, Gallery of the Louvre.

Two weeks into the voyage, Morse found himself discussing electromagnetism with a fellow passenger, Dr. Charles Jackson, who explained that electricity was believed to be capable of passing through a circuit of any length instantaneously. Through the remainder of his journey home, with the Gallery of the Louvre sitting unfinished in the cargo hold (Antoine, 2014), a curious Morse began the early sketches of what would eventually become the electromagnetic telegraph.

How did the telegraph change the way that we communicate today? Without the scientific advancement and communicative enhancement that this invention brought to the world, the radio, telephone, and computer may have looked and operated very differently, if they could have existed at all. It is often forgotten that the telegraph was one of the first inventions to connect the technical with the humanistic, a combination that the human mind has a tendency to separate. The telegraph is a prime example of how these two disciplines have come together to enhance the ways in which humans communicate and understand the world around them.

The Telegraph

Samuel Morse was not the first inventor of the telegraph, nor was he the only individual with the idea for an electric telegraph. While Morse remained relatively ignorant of the work of others, scientists and scholars across the world were attempting to create the very same thing. Only after various fundamental discoveries in chemistry, magnetism, and electricity could a practical electromagnetic telegraph come to be. Before the electromagnetic telegraph came attempts at communication using shutter systems and the semaphoric telegraph, both of which communicated visually using towers and pivoting shutters.

In the 1790s, Galvani and Volta revealed the nature of galvanism – the generation of electricity by the chemical reaction of acids and metals – and in 1820, electromagnetism was discovered by Hans Christian Oersted and Andre-Marie Ampere. In the 1820s and 1830s, scientists and inventors from across the globe were working to create a practical electric telegraph, perhaps most notably William Cooke and Charles Wheatstone in England. Yet many of these inventors ultimately reached a roadblock: electromagnets were only so powerful, and mechanical effects were not being produced at a distance. Morse ran into the same problem. However, he eventually met and began working with a fellow American, Joseph Henry, who in 1831 solved this critical problem by replacing the customary battery of one large cell with a battery of many small cells (Beauchamp, 2001; Czitrom, 1982; Standage, 1998).

The electric telegraph advanced the way that people communicated. It included an information source which, with the help of a human, produced a sequence of messages to be communicated to the receiving terminal. It included a transmitter which operated on the message to produce a signal suitable for transmission over a channel – the electric wire. A receiver at the other end reconstructed the message sent by the transmitter. And through the receiver, the communication eventually reached its destination – the person for whom the message was intended. The telegraph is an example of a discrete system of communication, in which both the message and the signal are sequences of discrete symbols: the message is a sequence of letters and the signal is a sequence of dots, dashes, and spaces (Shannon, 1948).
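As a concrete illustration of that discrete system, here is a toy Python pipeline with the same stages. The character-code signal is a stand-in so the sketch stays self-contained (Morse code proper is treated in the next section), and all names are illustrative.

```python
# A toy version of the discrete communication system described above:
# source -> transmitter (encode) -> channel -> receiver (decode) -> destination.

def transmitter(message: str) -> list[int]:
    """Encode each character of the source message as a number (the signal)."""
    return [ord(ch) for ch in message]

def channel(signal: list[int]) -> list[int]:
    """A noiseless channel delivers the signal unchanged."""
    return list(signal)

def receiver(signal: list[int]) -> str:
    """Reconstruct the message from the received signal."""
    return "".join(chr(code) for code in signal)

# Destination: the person for whom the message was intended.
assert receiver(channel(transmitter("WHAT HATH GOD WROUGHT"))) == "WHAT HATH GOD WROUGHT"
```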

Communicating through Morse Code

Before learning more about Morse code specifically, it is important to distinguish the difference between a code and a cipher, mainly because Morse’s original idea for communication through the telegraph was to use a cipher.

Code: When letters of the alphabet are replaced by symbols. An important group of codes used in telegraphy are the two-level, or binary, codes, of which the Morse code is the best known example. (Beauchamp, 2001)

Cipher: When the letters containing a message are replaced by other letters on a one-to-one basis, meaning that the message will not be shortened. This concept was introduced into the operation of the mechanical semaphore toward the end of its period of use. This type of communication requires a cipher-book (which differs from a code book) and a higher order of accuracy in transmission. (Beauchamp, 2001)


As he worked on perfecting the telegraph, Samuel Morse and his team were also experimenting with how exactly two people could communicate through the invention. Morse originally intended to use a cipher in which every word of the English language would be assigned a specific and unique number, and only the number would be transmitted. However, this idea was eventually replaced by American Morse Code – an alphabetic code in which each letter and number, and many punctuation signs and other symbols, are represented by a combination of dots and dashes.

Morse and Alfred Vail, an inventor who worked with Morse on the telegraph, designed the code by counting the number of copies of each letter in a box of printer’s type, ensuring that the most common letters had the shortest equivalents in code. This duration-related code had never before been considered by other inventors working on an electric telegraph. Yet it is perhaps because of this duration-related code (as opposed to, say, Cooke and Wheatstone’s polarity-related needle indication) that it was Morse’s version of the telegraph and code that jump-started the entire telegraph industry (Beauchamp, 2001). When American Morse code reached Europe, a number of changes were made and the International Morse Code was created. This became the standard for almost a century, with a 1913 international agreement requiring the American code to be replaced.
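To make the scheme concrete, here is a minimal Python encoder using the standard International Morse letter table; the function and sample message are illustrative. Vail’s frequency counting is visible in the table: the most common English letters, E and T, get the shortest codes.

```python
# International Morse Code, letters only (digits and punctuation omitted).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def encode(message: str) -> str:
    """Encode a message letter by letter; ' / ' separates words."""
    return " / ".join(
        " ".join(MORSE[ch] for ch in word if ch in MORSE)
        for word in message.upper().split()
    )

print(encode("What hath God wrought"))
# .-- .... .- - / .... .- - .... / --. --- -.. / .-- .-. --- ..- --. .... -
```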

The coding system often used in electric telegraphy was a proto-binary code, meaning that it was recognized by either the duration or the polarity of the transmitted electric impulse. According to his own personal notes, Morse defined four principal features of the telegraph. It was a marking instrument, consisting of a pencil, pen, or print-wheel. It used an electromagnet to apply pressure to the instrument on a moving strip of paper. It used a system of signs – i.e., the Morse code – that identified the information being transmitted. And lastly, it was a single circuit of conductors (Beauchamp, 2001).

The Significance of Combining Machine and Code

“If the presence of electricity can be made visible in any desired part of the circuit, I see no reason why intelligence might not be instantaneously transmitted by electricity to any distance.” – Samuel Morse

As the first electric telegraph line began to experience success in 1844, an entirely new era of modern communication was established in America, and eventually, around the world. The electric telegraph introduced a significant change in the way that humans communicated – this was the first time in history where some method of transportation was not required in order to communicate; the telegraph introduced instantaneity to the world (Czitrom, 1982).

Communication had previously relied on a middle party – the messenger. If people did not live together or find themselves neighbors, their communication across distance was only as quick as the messenger. In many instances, the telegraph eliminated the messenger and introduced the beginnings of what would evolve into the rapid networks of communication that we know today. The telegraph eliminated dependence on time and distance by connecting people through electricity and code.

The electric telegraph also expanded the concept of communicating through code. While humans had always communicated and made sense of their world through symbols (e.g., art as a means of communication), the electric telegraph created a combination the world had never before seen: electricity and code. As Daniel Czitrom wrote in his book Media and the American Mind, the telegraph served as a “transmitter of thought” in which human cognitive understanding was combined with electricity and the machine.

The electric telegraph supported one main idea: it assigned the humanistic symbolic values of a system of signs (Morse code) to the scientific process of electric currents in a switched circuit that could electromagnetically imprint marks and sounds to process the code. The simple on/off switches found in the telegraph paved the way for the binary switches found in the first computer designs (Irvine).

Coding and Computing

Because the telegraph introduced the design and production of technical equipment in a pre-electronic age, we now know a great deal more about data compression, error recovery, flow control, encryption, and computer techniques. The beginning of the internet was influenced in part by the pioneers involved in the coding of the telegraph (Beauchamp, 2001).

Slightly before Samuel Morse began work on the electric telegraph, Charles Babbage began designing a different kind of machine that he hoped would be able to compute and produce certain kinds of mathematical tables without human intervention. This early idea of automatic computation was the beginning of what we now know as computer science.

In order to understand this process, it is important to understand the associated terminology. While, according to etymology, computation refers to the idea and act of calculating, Subrata Dasgupta writes in It Began with Babbage that computation is comprised of symbols – things that represent other things – and “the act of computation is, then, symbol processing: the manipulation and transformation of symbols.” Dasgupta points out that “things” that represent other things could include a word that represents an object in the world, or a graphical road sign that carries meaning for motorists (Dasgupta, 2014).

How does the electric telegraph relate to computing? Morse code is an essential factor. Samuel Morse was able to combine symbols – code that represented words that held meaning to humans – and share this very humanistic code rapidly, using a very scientific method of electrical switches. Samuel Morse and his team were some of the first people to begin paving the trail for what we’re still figuring out today – how to encode different types of switches on our technological devices so that we can communicate more rapidly and effectively.

Morse’s electric telegraphy is an example of a discrete noiseless channel for relaying information: a sequence of choices from a finite set of elementary symbols (Morse code). Each symbol has a certain, differing duration in time depending on the number of dots and dashes it contains. The symbols can be combined into a sequence, and any given sequence can serve as a signal for the channel. Morse code helped enact the idea of combining math and communication, the humanistic and the scientific, by introducing the question of how an information source could be described mathematically, and how much information – in bits per second – could be produced by a given source (Shannon, 1948). It is in this way that the process of transmitting Morse code over the telegraph served as a precursor to the encoding and decoding now used in modern technology such as computers. The Morse code messages that were transmitted contained sequences of letters that often formed sentences with the statistical structure of a human language, such as English. Thus, certain letters appeared more frequently than others. By correctly encoding message sequences into signal sequences, this structure allowed humans to save time, as well as channel capacity, while communicating (Shannon, 1948).
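Shannon’s measure makes this letter-frequency point quantitative. The sketch below computes the entropy, in bits per symbol, of a simplified English-letter source; the frequency values are approximate and assumed for demonstration only, with the remaining letters lumped into a single bucket.

```python
import math

# Approximate relative frequencies of the most common English letters
# (assumed, illustrative values); everything else is lumped together.
freq = {"E": 0.127, "T": 0.091, "A": 0.082, "O": 0.075, "I": 0.070,
        "N": 0.067, "S": 0.063, "H": 0.061, "R": 0.060}
probs = list(freq.values()) + [1.0 - sum(freq.values())]

# Shannon entropy: H = -sum(p * log2(p)), the average information per symbol.
entropy = -sum(p * math.log2(p) for p in probs)
print(f"about {entropy:.2f} bits per symbol")
# Compare with log2(26) = 4.70 bits if all letters were equally likely:
# that gap is the slack a frequency-sensitive code like Morse's exploits.
```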

Closing

The electric telegraph, and the Morse code that accompanies it, is a prime example of how communication can be seen as a means for one mechanism (for example, the code transmitted through the electric telegraph) to directly affect another mechanism (for example, the rapid reception of news) (Shannon, 1949). It is because of the ideas of Morse, Babbage, and countless others that humanistic ideas of symbolism can be combined with scientific technological advancements to continually enhance the ways in which humans connect.

Morse’s idea is still alive and well in today’s computers. Just as the presence and absence of electricity in certain parts of a circuit (binary states) was used to send a code that represented human signs and symbols, today’s computers continue to use this combination of electricity and human signs and symbols to code the machines and devices that allow us to communicate (Irvine, 2). This is significant, considering how our technology has changed the way that we send and receive messages, thus changing the ways that we communicate and, furthermore, the ways in which we understand society (Packer & Jordan, 2001).

Today, humans use a similar but much more advanced concept of coding to program our electrically-powered digital devices. However, the purpose of this evolved and modern process remains very much the same as that of the electric telegraph: to communicate and connect in the most rapid and effective way possible. It is essential to realize that the sciences and humanities go hand-in-hand when thinking about how we have communicated since the invention of the electric telegraph. Without code, there would be no way to communicate, and without mathematics, there would be no way to transmit the code.

Works Cited

Antoine, J. (2014). Samuel F. B. Morse’s Gallery of the Louvre and the Art of Invention. Brownlee, P. (Ed.). New Haven, CT: Yale University Press.

Beauchamp, K. (2001). A history of telegraphy: its technology and application. Bowers, B., & Hempstead, C. (Eds.). Exeter, Devon: Short Run.

Dasgupta, S. (2014). It began with Babbage: the genesis of computer science. New York, NY: Oxford University Press.

Czitrom, D. (1982). Media and the American Mind. Chapel Hill, NC: University of North Carolina.

Irvine, M. A Samuel Morse Dossier: Morse to the Macintosh Demonstration of the Morse Telegraph: Electric Circuits and “A System of Signs.” Georgetown University.

Packer, R., and Jordan, K. (2001). Multimedia: From Wagner to Virtual Reality. New York, NY: W.W. Norton & Co.

Shannon, C. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 379–423, 623–656.

Shannon, C., & Weaver, W. (1964). The Mathematical Theory of Communication. Urbana, IL: University of Illinois.

Standage, T. (1998). The Victorian Internet: The remarkable story of the telegraph and the nineteenth century’s on-line pioneers. New York, NY: Walker.

Big Ideas and Small Revolutions: Learning, Meaning, and Interface Design

By Rebecca N. White

 

Abstract

Alan Kay helped revolutionize computing, but it was not quite the revolution he wanted. With his Dynabook in the 1970s, he aimed to teach children to program so they could experiment with and learn from personal computers, and to help humans’ thought processes adapt to the digital medium. What is the legacy of this interactive, educational vision?

I seek to answer this question by looking at Kay’s ideas in the context of C. S. Peirce’s meaning-making models and Janet Murray’s work on twenty-first-century interaction design, which is rooted in semiotic principles. Exploring Kay’s vision in this way sheds light on how technological developments interact with natural human meaning-making processes, revealing principles that make good digital design to augment humans’ cognitive processes and that help technologies become societal conventions. For this project, I conducted a textual analysis of primary-source material from Alan Kay and Janet Murray within the framework of C. S. Peirce’s semiotics.

While Kay’s educational ideas are evident in many of today’s technologies, a semiotic analysis reveals that Kay was perhaps pushing humans to be too computer-like too quickly. Interactions with computing systems must satisfy human expectations and meaning-making processes.

 

Introduction

Using computing technology today is both a social and a personal experience. Video games driven by powerful graphics cards play on large, flat-screen monitors without lag, allowing users to spend a Friday night at home alone navigating complex simulations or connecting with gamers across the globe in massive online worlds. Touchscreen devices that fit in purses and pockets line up icons that are gateways to telephonic capabilities, web browsers, music streaming applications, and more in conventional grid layouts. A range of portable computers with full keyboards weigh under three pounds but can still give users access to more externalized memory than they could ever need in their lifetimes. And with their personal computing innovations in the 1970s, Alan Kay and the Learning Research Group at Xerox PARC helped bring this world into being, providing the inspiration for technical developments for decades to come.

Kay’s[1] widely implemented technical vision for the graphical user interface and more was driven by media, communication, and learning theories. Yet, a simple-sounding, non-technical idea that Kay put forward has not broadly caught on (alankay1 2016). An educational goal was at the center of the vision: to create a “metamedium” that could “involve the learner in a two-way conversation” (Kay and Goldberg). The aim was for users to be able to write their own programs, not just use prepackaged ones, so they could experiment and learn. The proposed device, called the Dynabook, was intended for “children of all ages” (Kay 1972), but Kay focused heavily on the potential for youth to learn by doing (Kay and Goldberg).

From Kay 1972

This was a truly bold vision. Kay sought to launch what he calls today the real computer revolution—the one in which humans’ thought processes adapt to the possibilities presented by the digital medium (alankay1 2016).

These ideas are part of a broader process of meaning making and knowledge building that C. S. Peirce has described. And Kay was building on a long legacy of innovations and ideas about augmenting human intelligence with computing systems, from Charles Babbage to Samuel Morse to Claude Shannon to Douglas Engelbart and J. C. R. Licklider.

Interaction designer Janet Murray is operating in the environment that this history made possible. It is an environment in which humans interact with computers, not just use them as tools, and in which digital design is focused on that interaction. It is a space in which people dabble with new frontiers of technology, such as virtual reality. It is an age in which ideas proliferate, and some exceed technical capabilities. Murray strives to add some design structure to this at times chaotic interactive environment, with the aim of giving humans agency in interaction and amplifying their meaning-making processes.

To begin exploring these topics and the ways in which Kay’s educational ideas have developed over time, I asked the question: What is the legacy of Alan Kay’s interactive, educational vision for personal computing? I sought to answer this question by looking at Kay’s ideas in the context of C. S. Peirce’s meaning-making models and Janet Murray’s work on twenty-first-century interaction design, which is rooted in semiotic principles. This analysis has implications beyond tracing the Dynabook’s legacy. Exploring Kay’s ideas through semiotic models sheds light on how technological developments interact with natural human meaning-making processes. It reveals general principles that make good digital design to augment humans’ cognitive processes and that help technologies become societal conventions. For this project, I drew on a textual analysis of primary-source material from Alan Kay and Janet Murray conducted with a Peircean framework. To understand Kay’s influences, I also turned to the media theories of Marshall McLuhan and the work of Kay’s colleagues and predecessors, including Engelbart, Licklider, Vannevar Bush, and others, supplemented by my analysis of current technological developments.

Kay and Murray are united around the idea that digital devices should be designed in a way that helps humans build knowledge. Yet, the two diverge in approach. Murray’s focus is on humans and computing systems meeting in the middle—what the computer can do procedurally must match the user’s expectations, not the other way around. Kay too wanted devices and interfaces that matched the way humans make meaning, but he sought to make human thinking more procedural, to quickly adapt thought to the way computers process information. Additionally, Murray’s theories are firmly rooted in society’s communal process of making meaning, while Kay is focused on individual learning, often seeming to overlook the significance of collective processes.

The educational vision Kay put forward remains relevant, and his ideas are evident in many of today’s technologies. However, a semiotic analysis reveals that Kay was perhaps pushing humans to be too computer-like too quickly.

 

A Vision for Learning

The plans for personal computing developed at Xerox PARC required rethinking hardware and programming to build a computing system that was not just a passive recipient of information but an active participant in a creative process. Much of the technical vision was widely implemented. The graphical user interface and overlapping windows, object-oriented programming, and the use of icons in interfaces that we know today were all born at Xerox PARC (Kay and Goldberg 2003). Yet Kay’s precise educational vision, which built on Seymour Papert’s and others’ work (Kay 2001), did not catch on as intended (Manovich 2013). More than forty years after it was first introduced, the plan to teach every child how to program and to set each one up with a digital teacher with which to experiment has not been widely adopted by industry or educational systems (alankay1, Greelish). Yet when these ideas are viewed in broader terms of human meaning making and knowledge building, aspects of Kay’s learning vision are apparent in many areas.

Augmentation and Communication

The idea that the human mind needs to be understood before designing interfaces motivated Kay’s ideas about computers and human interaction with them. He was there at the birth of user-interface and human-centered design, if not its only father. According to this way of thinking, humans are active users not passive consumers. A computer isn’t just a tool, but rather a “metamedium” that combines other media and is an extension of thought (Kay 2001).

Humans have long used tools to extend their abilities and help them navigate the world, as well as for more symbolic purposes (Donald 2007, Renfrew 1999). And these processes were firmly rooted in an external and networked process of making and sharing meaning. Language, writing, and literacy allowed humans to store memories externally and transmit them to future generations, aiding knowledge building and cultural progress (Donald 2007). Humans extend their cognition to these systems, which Clark and Chalmers describe as “a coupling of biological organism and external resources.” Language is one tool that extends cognition in this way (Clark and Chalmers 1998, 18).

Kay operates in this spirit and in many cases has pointed out the importance of language, but he is also firmly situated in the realm of digital thinking—using computing systems that deal in abstract symbols that humans can understand and machines can execute to aid thinking. As he wrote, “language does not seem to be the mistress of thought but rather the handmaiden” (Kay 2001). For Peirce, like Kay, the linguistic and other symbolic systems are just one part of a broader system of logic and meaning-making processes.

With his focus on computing technologies to extend cognition, Kay was also building on the work of Douglas Engelbart, Ivan Sutherland, J. C. R. Licklider, and others (Kay 2004). Licklider termed this “man-computer symbiosis” (Licklider 1990). Sutherland developed the Sketchpad and light pen, which made graphical manipulation possible via an interface, creating a computing device with which humans could begin to be visually creative (Sutherland 2003). Engelbart developed foundational interface ideas and technology, such as the mouse, to change human behavior and augment human intelligence (Engelbart 2003). Often, Kay nods to these and other innovators. In one recent conversation he described his aims in the broader context, “We were mostly thinking of ‘human advancement’ or as Engelbart’s group termed it ‘Human Augmentation’ — this includes education along with lots of other things” (alankay1 2016).

The individual was Kay’s focus. He wanted to build a personal computer with which the user shared a degree of “intimacy.” In his conception, achieving that intimacy required users to be able to both read and write with the computer, to make it truly theirs. He sought to adapt computing devices to the way humans think while also changing the way humans think (Kay 2001).

At the center of Kay’s ideas were principles of communication and meaning making. He has often described a revelation he had when reading Marshall McLuhan’s work on media: the computer itself is a medium. It is a means for communicating information to a receiver that the receiver can then recover and understand. Kay took this further and interpreted McLuhan’s work as saying the receiver must become the medium in order to understand a message—an idea that would drive his conception of human-computer interaction as an intimate connection (Kay 2001). Referring to Claude Shannon, who pioneered a mathematical theory of conveying information in bits in the presence of noise, Kay recently put his general thoughts on process and meaning this way:

The central idea is “meaning”, and “data” has no meaning without “process” (you can’t even distinguish a fly spec from an intentional mark without a process).
One of many perspectives here is to think of “anything” as a “message” and then ask what does it take to “receive the message”?
People are used to doing (a rather flawed version of) this without being self-aware, so they tend to focus on the ostensive “message” rather than the processes needed to “find the actual message and ‘understand’ it”.
Both Shannon and McLuhan in very different both tremendously useful ways were able to home in on what is really important here. (alankay1 2016)

In the same discussion, Kay elaborated on the Shannon ideas: “What is “data” without an interpreter (and when we send “data” somewhere, how can we send it so its meaning is preserved?). . . . Bundling an interpreter for messages doesn’t prevent the message from being submitted for other possible interpretations, but there simply has to be a process that can extract signal from noise” (alankay1 2016). He is tackling this idea of meaning from both a technical—as in, extracting signals from noise—and a human perspective.

Beyond Shannon and McLuhan, this sounds much like Peirce’s triadic conception of the process of meaning making. In this model, a human makes meaning (an interpretant) by correlating an object (a learned concept) and a sign (a material and perceptible representation). Reception is key in this model as well. What are signs without an interpretant? There is no meaning without the human process of recognition and correlation (Irvine 2016a). Peirce also came to call his signs mediums—that is, interfaces to meaning systems, or instances of broader types (Irvine 2016b)—an interesting parallel to Kay’s revelation that the computer is a metamedium. Moreover, Peirce was very focused on process, but in a way slightly different from Kay. The process of meaning making with symbolic systems is dynamic and is always done in a broader societal and communal context. Communicated information can only be understood if the sender and the receiver are drawing from the same conventional understandings (Irvine 2016a).  Kay does not seem to fully account for the communal aspect of these processes.

The Building Blocks of Learning

Working with this internal meaning-making framework, Kay drew heavily on ideas about the nature of children’s thought processes. He wanted a personal device that could match the meaning-making processes of children at their individual developmental levels (Kay 1972). The design of the computer interface needed to be tied to natural learning functions (Kay 2001). As Kay put it recently, “For children especially — and most humans — a main way of thinking and learning and knowing is via stories. On the other hand most worthwhile ideas in science, systems, etc. are not in story form (and shouldn’t be). So most modern learning should be about how to help the learner build parallel and alternate ways of knowing and learning — bootstrapping from what our genetics starts us with” (alankay1 2016). He demonstrated some of these learning techniques using modern technology in a 2007 TED Talk (if the player is not working properly, the video can be viewed on TED’s website; jump to 12:15 for the clip):

Kay’s primary influence when it came to cognitive development was Jerome Bruner, though he was also inspired by other developmental psychologists and Seymour Papert’s educational work with LOGO. Most influential in Kay’s interface efforts were Bruner’s descriptions of children’s stages of development and three mentalities—enactive, iconic, and symbolic. The enactive mentality involves manipulation of tangible objects; iconic or figurative involves making connections; and symbolic involves abstract reasoning. Additionally, Kay recognized that a human’s visual, symbolic, and other systems operate in parallel (Kay 2001).

According to Kay, a computing device needed to be designed to serve and activate all of these areas if it was to encourage learning and creativity, one of his overarching goals for children and adults alike. He sought to combine the concrete and the abstract. Based on these principles, he developed the motto “doing with images makes symbols” (Kay 2001).

Although Peirce’s conception of sign systems did not involve clear stages of development and he was not modeling internal cognitive processes, his ideas roughly correspond to Kay’s. The enactive mentality is about how humans interact with the material signs in their worlds. It is a tactile and action-oriented experience with the outside world that provides input to humans with which they then make meaning. The iconic mentality could map to two of Peirce’s signs: the iconic and the indexical, which represent and point to objects, respectively. The symbolic realm appears to be the same in both conceptions—abstractions and generalizations made from other signs. Kay’s motto, “doing with images makes symbols,” could thus be broadened to “doing with signs makes meaning.”

The computing device Kay envisioned would help users make their own generalizations and abstractions from digital symbols (Kay and Goldberg), which are themselves abstractions. This is the process of meaning making and knowledge building with signs that Peirce describes (Irvine 2016a). And in the Peircean sense, the computers and their parts are signs as well, the material-perceptible parts of humans’ symbolic thought processes (Irvine 2016b).

Central to Peirce’s conception is the dialogic nature of sign systems. That is, the individual process of meaning making is based on conventions shared with other humans (Irvine 2016a). In contrast, Kay focuses on the “interactive nature of the dialogue” between the human and the computing system, another symbolic actor in a sense (Kay and Goldberg). It is almost as if Kay views computers as on the same level as humans in terms of the symbolic dialogue. This thought process emerged clearly in a recent Q&A. A questioner asks specifically about tools to encourage the communal process of knowledge building, and Kay brings the conversation back around to individual human and human-computer processes (in addition to criticizing the interface he has to use to respond to the questioner) (alankay1 2016).

Alan Kay responds to a question in a Hacker News thread

In this case, Kay’s conception is both in line and in conflict with Peirce’s. Particularly in more recent writings after the spread of the internet, it appears that Kay recognizes the communal network of human meaning making and extends it to computers. It is not just about augmenting human intellect and increasing creativity on a personal, internal level. Rather, those interactive processes stretch far beyond the coupling of one human with a personal computer. However, in the Peircean conception, a computer is not the same kind of symbolic actor as a human. Computing systems cannot make meaning. They can convey information with which humans can then make meaning due to their capacity for abstraction and generalization, but they do not make correlations in the same way.

The Dynabook

The technical computing revolution all began with these ideas. Focused on the human-computer dialogue, Kay set out to translate these principles into a personal computing vision. Kay and Goldberg envisioned a metamedium that could simulate and represent other media. The vision took the form of the Dynabook, though many of its components were incorporated into other computing devices as well.

Part of this development process involved conceptualizing the foundational concepts in terms of the affordances of the digital space, but it also entailed shifting the way users approached computing systems. As Kay put it, “The special quality of computers is their ability to rapidly simulate arbitrary descriptions, and the real [computer] revolution won’t happen until children learn to read, write, argue and think in this powerful new way” (Kay 2001). He wanted to alter the way people approach digital technologies, and still sees that as the aim. In a recent Q&A, he warned: “children need to learn how to use the 21st century, or there’s a good chance they will lose the 21st century” (alankay1 2016). Kay at times calls his educational vision the “service” conception of the personal computer.

Regardless of the terms applied, the idea was bold and far-reaching. As Kay wrote, “man is much more than a tool builder . . . he is an inventor of universes” (Kay 1972), and he sought to use computers to make the most of that potential. The intention was for humans, particularly children, to be able to program the system themselves and learn concepts by experimenting in the digital space. Kay described the idea and experiments with computing technology in detail in “A Personal Computer for Children of All Ages” and “Personal Dynamic Media,” co-written with Adele Goldberg.

From Kay and Goldberg

The underlying, meaning-making principles on which Kay drew translated into physical design choices (Greelish 2016). Drawing on his understanding of the iconic mentality, and to encourage creativity, Kay envisioned an interface that presented as many resources on the same screen as was feasible. To meet this need, Kay created overlapping windows using a bitmap display presenting graphic representations that users could manipulate with a pointing device—a mouse. The Smalltalk object-oriented programming language was an outgrowth of Kay’s understanding of how people process messages, and it unified the concrete and abstract worlds “in a highly satisfying way” (Kay 2001). The language was intended to be easy to use so that even children could create tools for themselves and build whatever they wanted in the metamedium. Users could personalize a text editor to process linguistic symbols as they saw fit, or create music and drawing programs. The Dynabook itself was intended to be lightweight, portable, able to access digital libraries of collective knowledge, and able to store, retrieve, and manipulate data (Kay 2001, Kay 2004, Kay and Goldberg).
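To make the message-passing idea concrete, here is a minimal sketch, in Python rather than Smalltalk and with class names invented purely for illustration, of objects that carry their own state and behavior and respond to messages sent to them:

```python
# A minimal sketch of the message-passing idea behind object-oriented
# design: every object carries its own state and behavior and responds
# to messages sent to it. Python stands in for Smalltalk here, and the
# class names are invented for illustration.

class TextEditor:
    """A toy document object that responds to editing messages."""
    def __init__(self):
        self.lines = []

    def insert(self, text):
        self.lines.append(text)

    def render(self):
        return "\n".join(self.lines)

class Window:
    """A toy window object that delegates drawing to its content."""
    def __init__(self, title, content):
        self.title = title
        self.content = content

    def draw(self):
        print(f"[{self.title}]")
        print(self.content.render())

editor = TextEditor()
editor.insert("Doing with images makes symbols.")
Window("notes", editor).draw()
```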

 

Where These Ideas Took Us

Although many of his technical conceptions are ubiquitous, Kay’s somewhat utopian vision of a world in which each child had an individualized computer tutor with which to experiment through programming did not take off. This was at least in part because Kay was bound by the technical capabilities of the day, not to mention the magnitude of the task of shifting bureaucracies and ingrained human processes built up over centuries.

The devices Kay described in his early papers had to first be created before they could be used widely to enact his service vision. This was, after all, a new medium. The seeds of his ideas grew out of existing conventions like editing, filing systems, drawing, and writing (Kay and Goldberg), so they were somewhat familiar to users and activated existing human knowledge. But the possibilities afforded by the digital space were just being probed when Kay was first writing (Kay 1972). Technologies that we now consider commonplace, and some that have yet to come to fruition, were first being imagined then. For example, Kay hypothesized that the technology could “provide us with a better ‘book,’ one which is active (like the child) rather than passive.”

And Kay always intended for the initial ideas to grow and evolve, summed up in the “finessing style of design” employed at Xerox PARC (Kay 2004). These ideas were not meant as the end-all-be-all. Kay and his cohort imagined that others would not just improve upon them but also produce new innovations.

Yet, Kay hinted that there could be problems with the metamedium conception as well. He and Goldberg doubted that a single device could be pre-programmed to accommodate all user expectations. It was better to allow the user to customize the device as they saw fit. Kay and Goldberg explained: “The total range of possible users is so great that any attempt to specifically anticipate their needs in the design of the Dynabook would end in a disastrous feature-laden hodgepodge which would not be really suitable for anyone” (Kay and Goldberg).

That is, in many ways, what happened. Today, computing devices are frequently used for passive consumption of other forms of media. Users do create with current computing systems, but that creativity is constrained by software that has been programmed by someone else. The ability to program machines remains the purview of those with specialized skills (Manovich 2013).

Photo by Walden Kirsch, licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license

Kay, for one, is not satisfied with the way in which computing technology has evolved, and has bemoaned the lack of innovation and the current state of computing. To him, people are mostly tinkering around the edges of existing conventions and not thinking about inventing for the future. Overall, there is not enough emphasis on the services side of his original ideas. Current programming languages remain too abstract and are not user-friendly enough; he would like to see languages that are scalable and easier for humans to use. Kay criticizes tablets and other systems that do not use pointing devices, which are necessary for the enactive mentality. He still thinks simulations are an important part of the computer revolution but is not satisfied with any, although he describes NetLogo as interesting. And he widely criticizes user-interface designs, particularly those from Apple, as not being easy enough for their users to manipulate and personalize (alankay1 2016, Kay 2004, Oshima et al 2016). One recent example comes from a Q&A (alankay1 2016):

Alan Kay responds to a question in a Hacker News thread

Despite his specific criticisms, many of the general principles Kay put forward, which were based on his conception of thought and learning processes, can be seen today. For instance, there are now entire design fields focused on user interfaces and human-computer interaction, a significant change in and of itself (Kay 2001). And interactive computation that can better predict emergent behaviors and better respond to humans’ mental models is an active area of research that could perhaps make it unnecessary to teach children programming in order to achieve Kay’s aims (Wegner 1997, Goldin et al 2006).

Although most software is not open to programming by children, many educational tools have been and are being developed that draw on interactive principles pioneered by Kay and that can react to a learner’s needs. Duolingo, a language-learning app built using collective intelligence that adapts to users’ learning levels, is just one example. Broader initiatives to incorporate computation in early childhood exist as well. Code.org, backed by Google, Microsoft, Facebook, and others, seeks to make computer science accessible to all children. Active learning practices incorporate many of the principles Kay sought to foster using computing technologies (Center 2016). Jeannette Wing describes computing as the automation of abstractions and seeks to teach children how to think in this way (Wing 2006, Wing 2009). Ian Bogost argues that procedural literacy, based on computing processes, should be applied outside the realm of programming to teach people how to solve problems (Bogost 2005). The One Laptop Per Child initiative seeks to give children the metamedia with which to experiment.

The list goes on, but perhaps most true to Kay’s vision is MIT Lifelong Kindergarten’s Scratch project. This is no surprise given that the Media Lab of which this project is a part was co-founded by Nicholas Negroponte, who was also influenced by Papert and worked with Kay and Papert on the One Laptop Per Child project (MIT Media Lab). Kay’s Squeak Smalltalk language formed the backbone of Scratch (Chen, Lifelong 2016b), which seeks to help “young people learn to think creatively, reason systematically, and work collaboratively — essential skills for life in the 21st century.” And it allows all users to program, create, and share their creations. Although anyone can use the platform, educators are encouraged to use it as a learning tool, and resources are provided to help teachers on that front (Lifelong 2016a). Thanks to the internet, this project can go directly to educators and students, rather than proponents having to navigate educational systems as would have been necessary in the 1970s.

 

Designing for the Networked, Metamedia World

Janet Murray is operating in this context, and encouraging others to think more like Kay did in the 1960s and ’70s. She takes the new form of representation Kay helped create, the metamedium, and lays out principles of design to maximize meaning and user (“interactor”) agency or interactivity. To do this, she says, designers should deconstruct projects into components and then rethink them in terms of the affordances provided by the digital space.

Murray draws on a range of different fields and acknowledges deep historical context, generally operating from a semiotic, Peircean perspective. Like Peirce, Murray is thinking abstractly and trying to build out a general model in a sense. Her model is of the digital design process, and she seeks to extract common, general principles that can be applied regardless of project. The model is a component of the broader meaning-making system described by Peirce.

Like Kay, Murray approaches computing systems as media and not tools. Media, she says, “are aimed at complex cultural communication, in contrast to the instrumental view of computational artifacts as tools for accomplishing a task” (Murray, 8). Stemming from this, she discourages use of the word “user” and the phrase “interface design,” as they are too closely related to tools (Murray, 10–11).

How she would prefer to describe these human-computer processes sounds much like Kay’s vision: “an interactor is engaged in a prolonged give and take with the machine which may be useful, exploratory, enlightening, emotionally moving, entertaining, personal, or impersonal” (Murray, 11). This idea is, in a sense, broader than Kay’s, which was intended to be somewhat narrowly focused on children’s learning processes.

But at base, both attempt to tap into broader human processes of making meaning—the process described by Peirce in which a human forms an interpretant from an object and a sign/medium. Human beings, as members of the symbolic species, are unique in that they operate in the realm of abstractions and generalizations. They can provide computing systems with symbols that those systems can then execute—because humans have drawn on their symbolic capacities to build them that way. And humans can make meaning of the symbols that the systems return. Each new abstraction creates new meaning, building knowledge—which is the process of learning in a general, non-psychological sense.

The job of designers, according to Murray, is to use code to design digital artifacts that meet interactors’ needs and expectations, allowing them to form those new correlations—as Kay sought to do with his original designs. This involves using existing conventions in new ways, to signal certain meaning correlations to users (Murray, 16). Conventions allow humans to recognize patterns amid complexity and noise. Those patterns, or schema in the cognitive science sense, are built from experience (Murray, 17). Users must be able to make meaning and connections out of what they have in front of them; in Peirce’s terms, a system should not be so foreign that it prohibits users from extracting features and forming interpretants based on existing knowledge (Irvine 2016b).

The affordances of the digital medium help designers achieve these aims. An interactive system that is successful will create “a satisfying experience of agency” for the user by matching the digital medium’s procedural and participatory affordances—that is, the programmed behaviors of the system and the expectations of the users (Murray, 12). Kay developed one of the types of languages used to encode those behaviors—object-oriented programming.

Kay’s work also laid the groundwork for participatory affordances. Murray’s description of this topic takes those foundations for granted: “Because the computer is a participatory medium, interactors have an expectation that they will be able to manipulate digital artifacts and make things happen in response to their actions” (Murray, 55). That expectation is possible in part because of Kay’s original vision; this is essentially his “doing with images makes symbols.” Kay, however, sought to take this further, and to transfer more agency to users by allowing them to design their own programs to meet their knowledge-building needs.

There are also spatial and encyclopedic affordances of the digital medium. The former is about visual organization, and it builds off of what Kay initially created with the graphical user interface. This graphical organization involves the abstractions made up of bits of information that have come to signal particular meanings to users of computing systems, such as file folders, icons, and menus. Here too, as when Kay was designing the Dynabook, the focus is on meaning making and tapping into human thought processes: “Good graphic design matches visual elements with the meaning they are meant to convey, avoiding distraction and maximizing meaning” (Murray, 76). In Peirce’s terms, the perceptible signs (designs) correspond to objects, and humans correlate the two to make meaning. Murray argues, harking back to Shannon’s information theory, that designs should minimize noise so the interpreters can make maximum meaning.

The encyclopedic affordance, meanwhile, stems from the vast capabilities of computing technology to store information that humans can retrieve and process. This enables cultural progress and collective knowledge building because these technologies can store vast amounts of information for use over time, allowing many humans now and in the future to form interpretants from the same information. Kay thought of this as well in his Dynabook conception, discussing the use of personal computers to access digital instances of books or libraries full of information through the LIBLINK (Kay 1972). In 2015, he even wrote about the challenges of ensuring this wealth of externalized memory can be accessed by future generations (Nguyen and Kay). And he was reared in the culture of the Advanced Research Projects Agency (ARPA), which focused on “interactive computing” in a “networked world” (Kay 2004). Yet, one area Kay does not spend much time commenting on is the dialogic, communal nature of meaning making, remaining focused on the individual experience.

This nature factors centrally into Murray’s thinking. She focuses on meaning making as not just an individual but also a social activity; humans interpret digital media based on both personal and collective experiences. Interaction with digital media, she says, necessarily involves interpretation of artifacts within broader cultural and social systems (Murray, 62). Interactors also use computing technology to access and interact with other people and broader cultural systems (Murray, 11). Drawing on this dialogic nature of the symbolic computing system, Murray calls for using existing media conventions to actively contribute to and develop the collective, or as she puts it, “to expand the scope of human expression” (Murray, 19).

This meshes with Peirce in many ways. According to his semiotic model, meaning is always communal, intersubjective, collective, and dialogic (Irvine 2016a, 2016b). Signs are the ways in which we communicate meanings to others, and those meanings are always made in the context of collective understanding, drawing on existing conventions so others may make their own correlations. Humans can communicate in ways members of their society understand because they can communicate in mutually agreed-upon symbols (Irvine 2016a). Digital technologies offer ways to externalize and share the meanings interactors make from these collective systems.

Still, intersubjectivity does not mean that the same signs lead all humans to make the same correlations. Interpretant formation is necessarily based on context, and each individual interprets a perceptible sign based on their individual experiences and perspectives on conventions, which can lead to the making of various meanings. In this sense, meaning is personal and dynamic. And Murray acknowledges that inventors of digital technologies cannot control the ways in which those artifacts will be interpreted or used:

The invention of a new form of external media augments our capacity for shared attention, potentially increasing knowledge, but also increasing the possibilities for imposing fixed ideas and behavior and for proselytizing for disruptive causes. Media can augment human powers for good or for evil; and they often serve cultural goals that are at cross purposes. (Murray 40)

On this topic, one point of direct comparison between Murray and Kay relates to music. Murray references the pirating of music that took place over the internet starting in the 1990s, which resulted in decreased sales of CDs among other outcomes. In contrast, Kay wrote in 1972 that “most people are not interested in acting as a source or bootlegger; rather, they like to permute and play with what they own” (Kay 1972). Kay expected individual users would want a flexible computing device with which they could make their own meanings, but he underestimated the impact of networking and communal meaning processes. These computational artifacts have the power to alter the way humans behave, for bad and not just good.

Often the deciding factors in this development are out of any individual’s control. Murray puts a fine point on this: “Cultural values and economic imperatives drive the direction of design innovation in ways that we usually take for granted, making some objects the focus of intense design attention while others are ignored altogether” (Murray, 28).

Today, Kay acknowledges this power to a degree. He consistently and fondly remembers his time at Xerox PARC as a somewhat utopian experience of all researchers working together toward a common vision and in the absence of market drivers (Kay 2004). He has struggled to find another place like that. With respect to current artificial intelligence, for instance, he commented that “the market is not demanding something great — and neither are academia or most funders” (alankay1 2016). Still, he persists in trying to control the outcomes and change the way people think.

 

Small Revolutions

It is clear that Murray and Kay are moving toward similar ends. Both are attempting to create digital technologies that tap into the human processes of making meaning and building knowledge, and to augment those processes. They argue for meeting users where they are—delivering on expectations and helping with the process of extraction and abstraction. Both also recognize that the new digital space provides new affordances, not least the opportunity to give users greater agency over devices, and requires rethinking how information is presented.

When it comes to broad brush strokes, Murray’s general design process sounds much like the process Kay undertook when thinking up the Dynabook. Murray’s basic recipe for digital design includes: “framing and reframing of design questions in terms of the core human needs served by any new artifact, the assembling of a palette of existing media conventions, and the search for ways to more fully serve the core needs that may lie beyond these existing conventions” (Murray, 19).

Similarly, Murray and Kay are both firmly oriented toward the future. The only time Murray mentions Kay directly in her book, in fact, is on this subject. She quotes him saying, “The best way to predict the future is to invent it” (Murray 25). And the title of her book, Inventing the Medium, is essentially what Kay did in the 1970s.

Yet, Murray in some ways has a much broader scope than Kay. This is perhaps a counterintuitive thought given Kay’s truly revolutionary vision. Still, she is working in a much more complicated, networked computing environment than Kay was, and her goal is to fit anything that could be designed in the new digital space under the same set of umbrella principles. Her ideas are firmly rooted in broad, societal processes of meaning making, not just in the individual learning process. And she is exceeded in ambition by C. S. Peirce, who sought to produce a model of all meaning processes.

The implications of this are difficult to discern. But a more holistic view such as that taken by Murray could indeed help designers better meet human needs than one focused on individual goals, even if that approach does not impact the flow of history as spectacularly as Kay did. After all, humans are the symbolic species. Making new meaning is inborn and collective. The power of conventions should not be underestimated. And as Murray writes, the computer is a large and complicated “cultural loom, one that can contain the complex, interconnected patterns of a global society, and one that can help us to see multiple alternate interpretations and perspectives on the same information” (Murray, 21). Designing digital technologies today requires keeping the communal possibilities in mind.

Another difference has to do with agency. Murray frequently stresses the need for digital designs to match human meaning-making processes. Kay also stressed the need for computing technology to operate on the user’s level. Yet, Kay was actually trying to drastically change the way humans processed information as part of that symbiosis. With his vision to teach children programming so they could experiment with computer tutors, he was attempting to start a revolution in which humans made meanings with an entirely new set of abstractions that did not evolve organically from human processes but that was created by a small subset of experts. The new sign system did not emerge from the existing behaviors of a collective society but was rather imposed on broader society by a small culture. In a sense, his device was not meeting children on their playing field but was moving them to an entirely new country.

Murray speaks to this point when discussing situated action theory. She writes, referencing anthropologist Lucy Suchman, “Instead of asking the user to conform more fully to the machine model of a generic process, Suchman argues for an embodied and relational model, in which humans and machines are constantly reconstructing their shared understanding of the task at hand” (Murray, 62).

Society, broadly speaking, may now be approaching a point at which computing technologies can meet humans in this way, closer to their natural processes. That is, humans are becoming more accustomed to this new symbolic system—and its power—and technological developments are allowing computing systems to be more adaptable to human processes. This is not the disruptive, futuristic thinking both Kay and Murray call for, but evolution. Perhaps that is what it takes for long-term, deep changes in human behavior and meaning-making processes to happen.

In this vein, Kay has adapted his educational model to reflect developments, achieved and projected, in artificial agent technology. He, along with co-authors, outlined a new plan for “making a computer tutor for children of all ages” in 2016. The team wants to leverage innovations in artificial intelligence technology to develop an interactive tutor that can observe and respond to students’ behaviors, without the student having to program the device’s activities (Oshima et al 2016).

Although all meaning making is intersubjective according to Peirce, there is also something to be said for the stress Kay puts on the individual experience. Members who share symbolic systems draw from the same conventions, but their experiences are personal, and the interpretants they form are individualized to some extent. Humans operating in the digital space also now expect to make use of its participatory affordances. In colloquial terms, they want control, and to be able to do their own thing.

Many technologies are now more customizable and reactive to individual desires to create and learn, although most have not reached the point Kay aimed for with his service vision. The aforementioned Scratch and NetLogo are examples. Amazon has opened up its Alexa system and others to developers, so users can develop functionality for these devices to serve their own needs, as well as commercial ones. These apps can be shared with other users. Google also allows developers to create add-ons that bring new functionalities to its apps. To amplify individual and collective meaning-making processes, more flexibility is perhaps needed on this front.

Virtual and augmented reality technology, meanwhile, could completely change user interfaces once again. Although the technology is today often used just to play games and have fun, it could eventually revolutionize the way in which humans interact and make meaning with computing technologies.

When it comes to these technologies, Kay is both the base and the tip of the iceberg in many ways. His and others’ ideas drove and support the development of what we know today to be personal computing, and in formulating that vision, he helped unlock endless possibilities. But as Murray hints and Peirce demonstrates, there is a broader logic at play. Murray tries to tap into those broader meaning-making processes to push digital design forward one step—or giant leap—at a time.

In large part due to the technological access brought about by cheaper and smaller hardware components and the internet, there has, perhaps, not been one computer revolution of the sort Kay outlined but a multitude of smaller revolutions as humans have tried to catch up to the technological advancements. At least the two symbolic processors—the human and the computer—seem to be moving closer together in many ways, even if that change is slower than Kay might like.

 

Notes

[1] In many forums, Kay has given credit to his colleagues at Xerox PARC for their roles in bringing this vision to life. However, for ease of reading, unless he was a co-author on a publication, I have only used Kay’s name throughout this text.

 

Bibliography

alankay1. 2016. “Alan Kay Has Agreed to Do an AMA Today.” Hacker News. Accessed December 7. https://news.ycombinator.com/item?id=11939851.

“Amazon Developer Services.” 2016. Accessed December 18. https://developer.amazon.com/.

Bogost, Ian. 2005. “Procedural Literacy: Problem Solving with Programming, Systems, & Play.” Telemedium (Winter/Spring): 32–36.

Bolter, Jay David, and Richard Grusin. 2000. Remediation: Understanding New Media. Cambridge, MA: The MIT Press.

Bush, Vannevar. 2003. “As We May Think.” In The New Media Reader, edited by Noah Wardrip-Fruin and Nick Montfort, 35–47. Cambridge, MA: MIT Press.

Center for New Designs in Learning and Scholarship. 2016. “Active Learning.” Accessed December 18. https://commons.georgetown.edu/teaching/teach/.

Chen, Brian X. 2010. “Apple Rejects Kid-Friendly Programming App.” WIRED. April 20. https://www.wired.com/2010/04/apple-scratch-app/.

Clark, Andy, and David Chalmers. 1998. “The Extended Mind.” Analysis 58, no. 1: 7–19.

“Code.org: Anybody Can Learn.” 2016. Code.org. Accessed December 18. https://code.org/.

Deacon, Terrence W. 1998. The Symbolic Species: The Co-evolution of Language and the Brain. New York: W. W. Norton & Company.

“Develop Add-Ons for Google Sheets, Docs, and Forms | Apps Script.” 2016. Google Developers. Accessed December 18. https://developers.google.com/apps-script/add-ons/.

Donald, Merlin. 2007. “Evolutionary Origins of the Social Brain.” In Social Brain Matters: Stances on the Neurobiology of Social Cognition, edited by Oscar Vilarroya and Francesc Forn i Argimon, 215–222. Amsterdam: Rodopi.

Engelbart, Douglas. 2003. “Augmenting Human Intellect: A Conceptual Framework.” In The New Media Reader, edited by Noah Wardrip-Fruin and Nick Montfort, 93–108. Cambridge, MA: MIT Press. Originally published in Summary Report AFOSR-3223 under Contract AF 49(638)-1024, SRI Project 3578 for Air Force Office of Scientific Research, Menlo Park, CA: Stanford Research Institute, October 1962.

Gleick, James. 2011. The Information: A History, a Theory, a Flood. New York: Pantheon.

Goldin, Dina, Scott A. Smolka, and Peter Wegner, eds. 2006. Interactive Computation: The New Paradigm. New York: Springer.

Greelish, David. 2016. “An Interview with Computing Pioneer Alan Kay.” Time. Accessed December 5. http://techland.time.com/2013/04/02/an-interview-with-computing-pioneer-alan-kay/.

Irvine, Martin. 2016a. “The Grammar of Meaning Systems: Sign Systems, Symbolic Cognition, and Semiotics.” Unpublished manuscript, accessed December 17. Google Docs file. https://docs.google.com/document/d/1eCZ1oAurTQL2Cd4175Evw-5Ns7c3zCxoxDKLgVE8fyc/.

———. 2016b. “A Student’s Introduction to Peirce’s Semiotics with Applications to Media and Computation.” Unpublished manuscript, accessed December 17. Google Docs file. https://docs.google.com/document/d/1F0mFTLC1HgYIOnzwoNrSa0Re7PVplUfSo_OmqSOMfXc/edit.

Kay, Alan. 2001. “User Interface: A Personal View.” In Multimedia: From Wagner to Virtual Reality, edited by Randall Packer and Ken Jordan, 121–131. New York: W. W. Norton. Originally published in 1989. Available at http://www.vpri.org/pdf/hc_user_interface.pdf.

———. 2003. “Background on How Children Learn.” VPRI Research Note RN-2003-002. Available at http://www.vpri.org/pdf/m2003002_how.pdf.

———. 2004. “The Power of Context.” Remarks upon being awarded the Charles Stark Draper Prize of the National Academy of Engineering, February 24. Available at http://www.vpri.org/pdf/m2004001_power.pdf.

———. 2007. “A Powerful Idea about Ideas.” Filmed March 2007. TED video, 20:37. Accessed December 7, 2016. https://www.ted.com/talks/alan_kay_shares_a_powerful_idea_about_ideas.

Kay, Alan C. 1972. “A Personal Computer for Children of All Ages.” Palo Alto, CA: Xerox Palo Alto Research Center.

———. 1977. “Microelectronics and the Personal Computer.” Scientific American 237, no. 3: 230-44.

Kay, Alan, and Adele Goldberg. 2003. “Personal Dynamic Media.” In The New Media Reader, edited by Noah Wardrip-Fruin and Nick Montfort, 393–404. Cambridge, MA: MIT Press. Originally published in Computer 10, no. 3 (March 1977): 31–41.

“Learn a Language for Free.” 2016. Duolingo. Accessed December 18. https://www.duolingo.com/.

Licklider, J. C. R. 1990. “The Computer as Communication Device.” In Systems Research Center, In Memoriam: J. C. R. Licklider, 21–41. Palo Alto, CA: Digital Equipment Corporation. Originally published in IRE Transactions on Human Factors in Electronics HFE-1: 4–11, March 1960.

Lifelong Kindergarten Group at the MIT Media Lab. 2016a. “Scratch – Imagine, Program, Share.” Accessed December 18. https://scratch.mit.edu/.

———. 2016b. “Smalltalk – Scratch Wiki.” Last modified December 13. https://wiki.scratch.mit.edu/wiki/Smalltalk.

Manovich, Lev. 2013. Software Takes Command. New York: Bloomsbury Academic.

Maxwell, John W. 2006. “Tracing the Dynabook: A Study of Technocultural Transformations.” PhD diss., University of British Columbia.

McLuhan, Marshall. 1964. “The Medium Is the Message.” In Understanding Media: The Extensions of Man, 7–21. Cambridge, MA: MIT Press. Available at http://web.mit.edu/allanmc/www/mcluhan.mediummessage.pdf.

MIT Media Lab. “In Memory: Seymour Papert.” Accessed December 18. https://www.media.mit.edu/people/in-memory/papert.

Murray, Janet H. 2011. Inventing the Medium: Principles of Interaction Design as a Cultural Practice. Cambridge, MA: The MIT Press. http://site.ebrary.com/lib/alltitles/docDetail.action?docID=10520612.

Nguyen, Long Tien, and Alan Kay. 2015. “The Cuneiform Tablets of 2015.” Paper presented at the Onward! Essays track at SPLASH 2015, Pittsburgh, PA, October 25. Available at http://www.vpri.org/pdf/tr2015004_cuneiform.pdf.

“One Laptop per Child.” 2016. Accessed December 18. http://one.laptop.org/.

Oshima, Yoshiki, Alessandro Warth, Bert Freudenberg, Aran Lunzer, and Alan Kay. 2016. “Towards Making a Computer Tutor for Children of All Ages: A Memo.” In Proceedings of the Programming Experience Workshop (PX) 2016, 21–25. New York: ACM.

Renfrew, Colin. 1999. “Mind and Matter: Cognitive Archaeology and External Symbolic Storage.” In Cognition and Material Culture: The Archaeology of Symbolic Storage, edited by Colin Renfrew, 1–6. Cambridge, UK: McDonald Institute for Archaeological Research.

Sutherland, Ivan. 2003. “Sketchpad: A Man-Machine Graphical Communication System.” In The New Media Reader, edited by Noah Wardrip-Fruin and Nick Montfort, 109–126. Cambridge, MA: MIT Press. Originally published in American Federation of Information Processing Societies Conference Proceedings 23: 329–346, Spring Joint Computer Conference, 1963.

Wegner, Peter. 1997. “Why Interaction Is More Powerful Than Algorithms.” Communications of the ACM 40, no. 5: 80–91.

Wing, Jeannette. 2006. “Computational Thinking.” Communications of the ACM 49, no. 3: 33–35.

———. 2009. “Jeannette M. Wing – Computational Thinking and Thinking About Computing.” YouTube video, 1:04:58. Posted by TheIHMC. October 30.

 

Do Robots Speak In Electric Beeps?: Artificial Intelligence & Natural Language Processing (Alexander MacGregor)

Abstract

When we think of the term “artificial intelligence”, a certain array of images often comes to mind. Be it humanoid robots doing housework, or sinister machines ruthlessly stepping over humans on their path to dominance, much of the public discourse surrounding this term has been driven by our media and arts, and all the artistic license that comes with them. But if we explore the history of artificial intelligence and its applications, we see that the tasks we have traditionally attempted to offload onto AI are less whimsical, but perhaps just as fundamental to our experience as semiotically capable cognitive beings. In this paper, I will trace this history while focusing on the specific task of natural language processing (NLP), examining the models we use to offload the linguistic capabilities that we, as humans running OS Alpha, obtain at birth.

Brief History of Artificial Intelligence

Although artificial intelligence only became a formal academic discipline at the Dartmouth Conference of 1956, the ideas and concepts that came to shape the field were present as far back as Ancient Greece. It was Aristotle’s attempts to codify “right-thinking” that first laid the groundwork for much of the logic-based framework within which AI philosophy resides (Russell & Norvig, 17). In what was perhaps the first step in the history of AI-related cognitive offloading, the 14th-century Catalan philosopher Ramon Llull conceived of the mechanization of the act of reasoning in his Ars generalis ultima. Leonardo da Vinci later outlined designs for a mechanical calculator; the first known working calculator was built in 1623 by the German scientist Wilhelm Schickard, though it was Blaise Pascal’s “Pascaline,” built 20 years later, that is more widely recognized (Russell & Norvig, 5).

Leonardo da Vinci’s sketch of a mechanical calculator

Around the same time, the English philosopher Thomas Hobbes, the German philosopher Gottfried Leibniz, and the French philosopher Rene Descartes were each advancing the discourse on this topic. In the introduction to his seminal book, Leviathan, Hobbes asks the reader: “For seeing life is but a motion of limbs, the beginning whereof is in some principal part within, why may we not say that all automata (engines that move themselves by springs and wheels as doth a watch) have an artificial life? For what is the heart, but a spring; and the nerves, but so many strings; and the joints, but so many wheels, giving motion to the whole body” (Hobbes, 7). Leibniz sought a “characteristica universalis,” a formal and universal language of reasoning that would allow all debate and argument to be unambiguously reduced to mechanical operation (Levesque, 257). And it is impossible to ignore the impact of Rene Descartes on the formation of this field. While he is perhaps best known for his theory of mind-body dualism, he also made observations bearing directly on automation, such as conceptualizing animals as machines (Russell & Norvig, 1041).

In the 19th and 20th centuries, we began to see attempts to build machines capable of executing the ideas promoted by previous philosophers. Charles Babbage’s Difference Engine was an attempt to mechanize computational work previously done by “human computers.” Babbage also designed, but was never able to build, what he called an “Analytical Engine,” which was really the first design for what we now know as a general-purpose computer (Dasgupta, 27). The code-breaking frenzy of the Second World War provided an environment in which many computational advances were made, and there was perhaps no more influential figure to emerge from this era than Alan Turing. Considered by many to be the father of modern computing, Turing produced work during this era that was crucial to the prismatic explosion of AI and computing advancements seen in the latter half of the 20th century.

London Science Museum’s model of Charles Babbage’s Difference Engine

Brief History of Natural Language Processing

In 1950, Alan Turing published his paper “Computing Machinery and Intelligence”, which proved to be pivotal in the yet-to-exist field of artificial intelligence. In the paper, Turing contemplated the possibility of building machines that are capable of “thinking”, and introduced the concept of the Turing Test as a way to determine whether a machine was exhibiting such traits to the extent they were indistinguishable from a human being (Dennett, 3-4). When we, as humans, engage in the exchange of ideas, symbols and messages that assure us of our respective intelligence and “personhood”, we do it through the interface of language. It is truly one of the key enablers of our semiotic skill-set. So if we are to create artificial intelligence, then the ability to communicate signs, symbols and meaning through language is a top priority.

Four years after Alan Turing published his seminal paper, IBM and Georgetown University held a joint exhibition to demonstrate their fully automatic machine translation capabilities. Using an IBM 701 mainframe computer, the operator was able to translate Russian sentences into English using a vocabulary of 250 words and six grammatical rules (Hutchins, 240). This “breakthrough” was highly publicized, and it led the authors and press to make bold predictions about the immediate future of artificial intelligence and machine translation, but the reality of the situation was much less grandiose. The program was only able to seem successful by restricting the grammatical, syntactic, and lexical possibilities so severely that it fell far short of any realistic conception of a truly artificial intelligence.
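To give a sense of how constrained such a system was, here is a toy sketch of the dictionary-plus-rules style of early machine translation; the lexicon entries below are illustrative stand-ins, not the actual 1954 Georgetown-IBM vocabulary or rule set:

```python
# A toy sketch of the restricted, rule-plus-dictionary style of early
# machine translation. The entries below are invented for illustration;
# they are not the actual 1954 Georgetown-IBM lexicon.

LEXICON = {
    "kachestvo": "quality",
    "uglya": "of coal",
    "opredelyaetsya": "is determined",
    "kaloriynostyu": "by calorific value",
}

def translate(sentence):
    # Word-by-word lookup; anything outside the tiny lexicon fails,
    # which is exactly why the demo had to constrain its inputs.
    out = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise KeyError(f"'{word}' is outside the restricted vocabulary")
        out.append(LEXICON[word])
    return " ".join(out)

print(translate("Kachestvo uglya opredelyaetsya kaloriynostyu"))
# -> quality of coal is determined by calorific value
```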

Newspaper clipping detailing the IBM-Georgetown Machine Translation Experiment

Successes & Limitations

This was, in fact, the story of most of the NLP attempts made during the early days of AI. Although successful within their domains, programs like Daniel Bobrow’s STUDENT, designed to solve high-school-level algebraic word problems, and Joseph Weizenbaum’s ELIZA, famously used to simulate the conversation of a psychotherapist, were still operating within very strict linguistic constraints. ELIZA, for example, was not capable of analyzing the syntactic structure of a sentence or deriving its meaning, two elements that are crucial for true language comprehension. Nor was it able to extrapolate on its linguistic inputs to explore its environment. ELIZA was, in fact, the first chatterbot, designed only to respond to certain keyword inputs with pre-set answers (Bermudez, 31).

The limitations of these early NLP systems gave rise to the micro-world approach of MIT AI researchers Marvin Minsky and Seymour Papert, most famously exhibited in fellow MIT researcher Terry Winograd’s SHRDLU program. Micro-worlds were problems that would require real intelligence to solve, but that were relatively simple and limited in scope (Russell & Norvig, 19-20). Papert initially saw micro-worlds as a way to connect computing to the hard sciences, where simple models were often used to derive fundamental scientific principles (Murray, 430). Winograd’s SHRDLU program put this approach to the test and was one of the earliest attempts to get machines to do true natural language processing, meaning the system would “report on its environment, plan actions, and reason about the implications of what is being said to it” (Bermudez, 32). SHRDLU was a success and prompted a lot of excitement around the field of NLP, but because it was so dependent on syntactic analysis and operated within a micro-world, many of the same limitations from the early machine translation attempts were present in SHRDLU (Russell & Norvig, 23). The simplicity of the micro-world constraints meant SHRDLU’s language was correspondingly simple, as it could only talk about the events and environment of the micro-world it inhabited.
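A toy sketch conveys the flavor of a micro-world system: a tiny simulated blocks world the program can answer questions about and act on, and nothing more. The command grammar below is invented for illustration and is vastly simpler than Winograd’s actual system:

```python
# A toy micro-world in the spirit of SHRDLU: a tiny simulated blocks world
# the program can talk about, and nothing beyond it.

world = {"red block": "on the table", "green pyramid": "on the red block"}

def strip_the(phrase):
    return phrase[4:] if phrase.startswith("the ") else phrase

def respond(command):
    words = command.lower().rstrip("?.").split()
    obj = strip_the(" ".join(words[2:]))
    if words[:2] == ["where", "is"]:
        return f"The {obj} is {world[obj]}." if obj in world else "I do not know that object."
    if words[:2] == ["pick", "up"] and obj in world:
        world[obj] = "in hand"   # the system acts on, and can report about, its world
        return "OK."
    return "I do not understand."

print(respond("Where is the green pyramid?"))  # -> The green pyramid is on the red block.
print(respond("Pick up the red block."))       # -> OK.
```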

The micro-world of Terry Winograd’s SHRDLU program

Even with these constraints, SHRDLU did contribute to three major advancements in the evolution of NLP. First, it demonstrated that the conceptual and theoretical rules of grammar could be practically executed in an NLP program. Second, it showcased the approach of breaking cognitive systems down into distinct components, each of which executes a specific information-processing task. Third, it was built on the notion of language as an algorithmic process (Bermudez, 33). These factors would set the stage for how NLP programs would be built moving forward.

In many ways, machine translation and NLP followed a historical trajectory similar to that of speech recognition attempts. The early excitement prompted by the information theory and word-sequencing models of the 1950s would be tempered in favour of the highly knowledge-intensive and specific micro-world approach of the 1960s. The 1970s and 1980s saw a push for commercialization of these previously academically restricted programs, but also, more importantly, the rise of the neural network approach. (Russell & Norvig, 26)

This prompted a civil war of sorts between the “connectionist” camp advocating for approaches like neural networks, the “symbolic” camp advocating symbol manipulation as the best frame to understand and explore human cognition, and the “logicist” camp advocating a more mathematical approach. Even to this day there has been no truly definitive resolution to this conflict, but the modern view is that the connectionist and symbolic frameworks are complementary, not incompatible. (Russell & Norvig, 25)

Enter Stage Right: Artificial Neural Networks

When looking at the modern natural language processing landscape, one sees that artificial neural networks (ANNs), particularly recurrent neural networks, are the en vogue computational approach (Conneau, Schwenk, Barrault & LeCun, 1). Inspired by the mid-20th-century neuroscience discovery that mental processes are composed of electrochemical activity in networks of brain cells called neurons, artificial intelligence researchers aimed to model their approaches after this system (Russell & Norvig, 727). Two of the most prominent early advocates of this method were Warren McCulloch and Walter Pitts, who in 1943 designed a mathematical and algorithmic computational model for these neural networks. Yet due to a lack of research funding, and to the publication of an influential paper in which Minsky and Papert detailed the limitations of the computational machines then being used to run neural networks, the approach was sidelined until the late 1980s, when the back-propagation learning algorithms first discovered in 1969 by Arthur Bryson and Yu-Chi Ho were reinvented (Russell & Norvig, 761). The very influential textbook Parallel Distributed Processing: Explorations in the Microstructure of Cognition, by David Rumelhart and James McClelland, also helped to reinvigorate the AI community’s interest in this approach.

Model of a feedforward neural network

Yet a lack of computational power meant that other machine learning methods, such as linear classifiers and support vector machines, would take precedence over neural networks. That was the case until the hardware landscape evolved to a point where technologies such as GPUs, and computational approaches like distributed computing, made it possible for neural networks to be deployed on the scale necessary to handle tasks like natural language processing.

So How Exactly Do These Neural Networks Work?

In a nutshell, a neural network is a collection of connected computational units, each of which “fires” an output when its inputs cross a predefined hard or soft threshold (Russell & Norvig, 727-728). The earliest models were designed with only one or two layers, but they ran into limitations when it came to approximating basic cognitive tasks. Later models would solve this problem by adding a layer of “hidden units” and giving the nets the ability to adjust the connection weights.
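A minimal sketch, assuming nothing beyond the description above, of such units and a single hidden layer (the sizes and random weights are arbitrary, chosen only for illustration):

```python
import numpy as np

# A minimal sketch of the units described above. Each unit sums weighted
# inputs and "fires" when that sum crosses a threshold: a hard threshold
# gives a step function, a soft threshold a sigmoid.

def hard_unit(inputs, weights, threshold=0.0):
    return 1.0 if np.dot(inputs, weights) > threshold else 0.0

def soft_unit(inputs, weights):
    return 1.0 / (1.0 + np.exp(-np.dot(inputs, weights)))

# A single layer of "hidden units": the output is a function of functions,
# and learning adjusts the connection weights (initialized randomly here).
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden units
W_out = rng.normal(size=4)           # 4 hidden units -> 1 output

x = np.array([0.5, -1.0, 2.0])
hidden = 1.0 / (1.0 + np.exp(-x @ W_hidden))
output = soft_unit(hidden, W_out)
print(output)
```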


What Does Linguistics Have To Say About All This? 

Because the brain is a massively parallel organ whose neurons apparently work independently of one another, artificial neural networks have been used to computationally offload many of the cognitive functions the brain performs, such as pattern recognition, action planning, processing and learning new information, and using feedback to improve performance (Baars, 68). Language is an inherently symbolic activity, so if we are to offload the task of natural language processing to artificial intelligence, the capability of neural nets to be translated into symbolic form, and of symbolic forms to be translated back into neural nets, makes this approach very attractive.

In addition to being symbolic, language is also a practical, rule-governed activity. It was Noam Chomsky, often considered the father of modern linguistics, who first attempted to discover why language operates in the manner it does (Bermudez, 16). In his groundbreaking book Syntactic Structures, Chomsky makes a distinction between the deep structure and the surface structure of a sentence. The former refers to how the basic framework of a sentence is governed by phrase-structure rules operating at the level of syntactic elements such as verbs, adjectives, and nouns. The latter refers to the organization of words in a sentence, which must abide by the sentence’s deep structure. The important point to note here is that language is conceived of as hierarchical, algorithmic, and rule-based. The rules extend not only to grammar and syntax, but also to individual words and contextual meaning.
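A toy set of phrase-structure rules can make this concrete. The grammar and lexicon below are invented for illustration and far smaller than anything linguistically adequate, but they show how categories expand by rule regardless of which words fill them, which is how a sentence can be grammatically well formed yet semantically unintelligible:

```python
import random

# A toy phrase-structure grammar: nonterminal categories expand by rule,
# while terminals are drawn from a small lexicon. Both are invented here
# purely for illustration.

RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["Adj", "Adj", "N"]],
    "VP": [["V", "Adv"]],
}
LEXICON = {
    "Adj": ["colorless", "green"],
    "N":   ["ideas"],
    "V":   ["sleep"],
    "Adv": ["furiously"],
}

def generate(symbol="S"):
    # Recursively expand nonterminals; pick a word for terminals.
    if symbol in RULES:
        return " ".join(generate(s) for s in random.choice(RULES[symbol]))
    return random.choice(LEXICON[symbol])

print(generate())   # e.g. "colorless green ideas sleep furiously"
```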

Chomsky’s famous grammatically correct, yet semantically unintelligible sentence.

Building on Chomsky’s insights was his student at MIT in the 1960s, Ray Jackendoff, whose “parallel architecture” linguistic model sought to debunk the syntactocentric models of old and to promote a framework positing the independent generativity of the semantic, phonological, and syntactic elements of language (Jackendoff, 107). From Jackendoff, we can conceptualize language as a combinatorial structure in which elements work in parallel to produce expression and meaning. Again, a processing architecture is at the basis of this framework.

Jackendoff’s model of Parallel Architecture

ANNs and NLP, Live Together in Perfect Harmony? 

While artificial neural networks do not have linguistic rules inherently built into them, as the human brain is thought to, they have been shown to be capable of modeling complex linguistic skills. The simple recurrent neural networks designed by Jeff Elman have been successfully trained to predict the next letter in a series of letters, and the next word in a series of words. Studies by developmental psychologists and psycholinguists examining the patterns children display when they learn languages have shown that, in many features of language acquisition, human beings follow a very archetypal trajectory. One example is making similar types of grammatical construction mistakes at similar learning stages. When artificial neural network researchers analyzed the degree to which their models could reproduce these features of language processing, they found similarities between how the neural networks learn and how children learn (Bermudez, 246).
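The following is a minimal sketch of an Elman-style simple recurrent network for next-letter prediction, in the spirit of the experiments described above; it is not Elman’s actual implementation, and the sizes, data, and details are arbitrary:

```python
import numpy as np

# An Elman-style simple recurrent network: the previous hidden state is
# fed back in as context at every step. Untrained weights are random, so
# predictions start out arbitrary; training by backpropagation would
# adjust them so p concentrates on the true next character.

text = "abcabcabc"
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
V, H = len(chars), 8

rng = np.random.default_rng(1)
Wxh = rng.normal(0, 0.1, (V, H))   # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))   # context (previous hidden) -> hidden
Why = rng.normal(0, 0.1, (H, V))   # hidden -> output

def one_hot(c):
    v = np.zeros(V)
    v[idx[c]] = 1.0
    return v

def step(x, h_prev):
    h = np.tanh(x @ Wxh + h_prev @ Whh)    # new hidden state
    p = np.exp(h @ Why)
    p /= p.sum()                           # softmax over the next character
    return h, p

h = np.zeros(H)
for c_in, c_next in zip(text, text[1:]):
    h, p = step(one_hot(c_in), h)
    print(c_in, "->", chars[int(p.argmax())], "(target:", c_next + ")")
```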

Verb tense is another specific area in which much research has been conducted testing the natural language processing capabilities of artificial neural networks. While the actual computational process is quite complex, it essentially boils down to the theory that children learn the past tense in three distinct stages. In the first stage, they use only a small number of verbs in primarily irregular past tenses. In the second stage, the number of verbs in use expands and they formulate past tense in the “standard stem + -ed” format. In the third stage, as they learn more verbs, they correct their “over-regularization errors”. Where artificial neural nets come in is in their ability to develop a similar learning pathway without needing to have linguistic rules explicitly coded in them. (Bermudez, 247)
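A toy illustration of this three-stage picture follows; the verb lists are invented examples, not data from the studies cited:

```python
# A toy illustration of the three-stage account of past-tense learning.
# The irregular verbs below are invented examples for illustration.

IRREGULARS = {"go": "went", "sing": "sang", "take": "took"}

def past_tense(verb, stage):
    if stage == 1:
        # Stage one: a handful of (mostly irregular) forms known by rote.
        return IRREGULARS.get(verb)          # None if not yet learned
    if stage == 2:
        # Stage two: the "stem + -ed" rule is over-applied to everything,
        # producing over-regularization errors like "goed".
        return verb + "ed"
    # Stage three: the rule survives, but learned exceptions override it.
    return IRREGULARS.get(verb, verb + "ed")

for v in ["go", "walk"]:
    print(v, [past_tense(v, s) for s in (1, 2, 3)])
# go   -> ['went', 'goed', 'went']
# walk -> [None, 'walked', 'walked']
```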

*Record Scratch* Let’s Pump The Brakes A Bit

It is important to note at this juncture that artificial neural nets are nowhere close to mirroring the brain’s ability to perform these tasks, and that this is not the goal. The aim is to enable machines to engage in natural language processing, regardless of how similar the method is to the way humans do it. There is no imperative to follow the same rule-based framework for language that humans use: artificial neural networks are not attempts to reconstruct the human brain or mirror its intricacies, but rather to behave in accordance with the rule-governed aspects of linguistic understanding without explicitly representing those rules. They are simply an approach inspired by one element of how our brains process information. Compared to the massively complex brain, most of the simulations run through artificial neural nets are relatively small-scale and limited. But for certain cognitive tasks, neural nets have proven more successful than programs using logic and standard mathematics (Baars, 68). The neural network approach provides affordances that make computation of this scale and nature more effective, such as the ability to handle noisy inputs, to execute distributed and parallel computation, and to learn. It is not imperative to resolve conflicts between the way we believe the brain operates and the way neural networks are architected. As Texas A&M University Professor of Philosophy Jose Bermudez states:

 “The aim of neural network modeling is not to provide a model that faithfully reflects every aspect of neural functioning, but rather to explore alternatives to dominant conceptions of how the mind works. If, for example, we can devise artificial neural networks that reproduce certain aspects of the typical trajectory of language learning without having encoded into them explicit representations of linguistic rules, then that at the very least suggests that we cannot automatically assume that language learning is a matter of forming and testing hypotheses about linguistic rules. We should look at artificial neural networks not as attempts faithfully to reproduce the mechanics of cognition, but rather as tools for opening up new ways of thinking about how information processing might work.” (Bermudez, 253-254)

What Does The Future Hold?

The future of natural language processing and artificial intelligence is sure to be shaped by the tech giants currently absorbing research talent at a voracious rate. Companies like Google, Facebook, Microsoft, Amazon, and Twitter have all identified business uses for this technology. For Facebook, it is the DeepText engine that filters unwanted content from users’ newsfeeds. Google’s uses for the technology are varied, but they include user experience in apps, search, ads, translation, and mobile. Microsoft’s research team is looking to this technology to design and build software.

This corporate takeover has not been without concern. For the majority of the history of AI research, universities and public research institutions have been the incubation chambers for breakthroughs, and they have a far more transparent culture than corporations driven by profit maximization and incentivized to harbour trade secrets. To assuage this concern, many of these companies have embraced an open-source culture when it comes to their findings. They have encouraged their researchers to publish and share their work (to an extent) with the broader community, under the rationale that a collegial atmosphere will create gains everyone can utilize. Bell Labs and Xerox PARC have become the aspirational models, as it was precisely the accessibility and open environment of those institutions that allowed innovation to thrive.

Xerox PARC’s infamous beanbag chair meetings

This is surely one of the main reasons we’ve witnessed an exodus of academic researchers into these companies. Two of the most prominent names in the field right now are Geoffrey Hinton and Yann LeCun. Hinton, a former University of Toronto professor considered to be the godfather of deep learning, was scooped up by Google to help design their machine learning algorithms. LeCun, a former New York University professor, is now the Director of AI Research at Facebook. Undoubtedly, the extremely large data sets these companies have collected are also a powerful draw, as they allow for training bigger and better models. When asked what he perceives the future of NLP and artificial neural nets to be, Hinton answered:

“For me, the wall of fog starts at about 5 years. (Progress is exponential and so is the effect of fog so its a very good model for the fact that the next few years are pretty clear and a few years after that things become totally opaque). I think that the most exciting areas over the next five years will be really understanding videos and text. I will be disappointed if in five years time we do not have something that can watch a YouTube video and tell a story about what happened” (Hinton).

A similar question was posed to University of Montreal professor of Computer Science Yoshua Bengio, also considered to be one of the preeminent figures in the field right now, to which he responded:

“I believe that the really interesting challenge in NLP, which will be the key to actual “natural language understanding”, is the design of learning algorithms that will be able to learn to represent meaning” (Bengio).

Where Does “Meaning” Fit Into The Equation? 

If meaning-making is the ultimate purpose of language, then the true holy grail of natural language processing through artificial neural networks is unsupervised learning. The majority of current models employ a supervised learning technique, meaning the network is “told” by its designers what mistakes and errors it is making (Bermudez, 220). With unsupervised learning, the training wheels come off and the network receives no supervisory external feedback, learning on its own (Arbib, 1183). According to University of California, Berkeley Professor Michael I. Jordan, one of the leading researchers in the fields of machine learning and artificial intelligence, unsupervised learning is “presumably what the brain excels at and what’s really going to be needed to build real “brain-inspired computers”” (Jordan).
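The distinction can be sketched in a few lines. Both updates below are deliberately simple illustrations, not state-of-the-art methods: the supervised step is driven by an externally supplied error signal, while the unsupervised step (Oja’s rule, a normalized Hebbian update) uses only the structure of the inputs themselves:

```python
import numpy as np

# A minimal sketch of the supervised/unsupervised distinction.

rng = np.random.default_rng(2)
w = rng.normal(size=3)

def supervised_step(w, x, target, lr=0.1):
    # Supervised: the network is "told" its error, the gap between its
    # output and a known correct answer, and the weights move to close it.
    error = target - w @ x
    return w + lr * error * x

def unsupervised_step(w, x, lr=0.1):
    # Unsupervised: no external feedback. Oja's rule, a normalized
    # Hebbian update, strengthens weights toward whatever structure
    # the inputs themselves contain.
    y = w @ x
    return w + lr * y * (x - y * w)

x = np.array([1.0, 0.5, -0.5])
w = supervised_step(w, x, target=1.0)
w = unsupervised_step(w, x)
print(w)
```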

Conclusion

Journeying through the history of artificial intelligence, we saw just how broad and deep the philosophical roots of this field are. From canonical figures like Aristotle and Descartes to modern heavyweights like Turing and Chomsky, the scope of thinkers contributing to artificial intelligence advancements is proof positive of its interdisciplinary nature. The problems posed by the quest to cognitively offload key human faculties require answers drawing from fields as diverse as computer science, neurology, linguistics, mathematics, physics, and engineering. Of all the cognitive tasks we have attempted to offload to AI, natural language processing is perhaps the most important. As the renowned cognitive scientist, linguist, and psychologist Steven Pinker has stated:

For someone like me, language is eternally fascinating because it speaks to such fundamental questions of the human condition. Language is really at the center of a number of different concerns of thought, of social relationships, of human biology, of human evolution, that all speak to what’s special about the human species. Language is the most distinctively human talent. Language is a window into human nature, and most significantly, language is one of the wonders of the natural world. (Big Think)

It is only natural that in the quest to technologically mediate this uniquely human skill, we looked to our own brain for inspiration. But while certain neurological features have surely inspired artificial neural networks, now the dominant natural language processing model, AI designers, researchers, and architects are not bound by them. The goal is to get computational machines to process natural language; how one gets there is relatively inconsequential. Given the exponential increase in the size and quality of the data sets used to train artificial neural nets, we are sure to see exciting advances in natural language processing over the next few years, but as of now, the ultimate goal of a “strong AI” capable of dealing with the concept of linguistic meaning remains behind the “wall of fog.”

Works Referenced

  1. Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. Third ed. Upper Saddle River, NJ: Prentice Hall, 2010.
  2. Hobbes, Thomas. Leviathan. Urbana, Illinois: Project Gutenberg, 2002. Web. 2 December, 2016.
  3. Levesque, Hector J. “Knowledge Representation and Reasoning.” Annual Review of Computer Science 1 (1986): 255-87. Web. 6 Dec. 2016.
  4. Dasgupta, Subrata. It Began With Babbage: The Genesis of Computer Science. Oxford: Oxford UP, 2014.
  5. Dennett, Daniel C. Brainchildren: Essays On Designing Minds. Cambridge, MA: MIT, 1998.
  6. Hutchins, John. “From First Conception to First Demonstration: the Nascent Years of Machine Translation, 1947-1954. A Chronology.” Machine Translation, vol. 12, no. 3, 1997, pp. 195–252. Web. 5 Dec. 2016.
  7. Bermúdez, José Luis. Cognitive Science: An Introduction to the Science of the Mind. Cambridge: Cambridge UP, 2010.
  8. Murray, Janet. Inventing the Medium. Cambridge, MA: MIT, 2012.
  9. LeCun, Yann, et al. “Very Deep Convolutional Networks for Natural Language Processing.” ArXiv: Computation and Language, 2016. Web. 15 Dec. 2016.
  10. Jackendoff, Ray. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford UP, 2002.
  11. Baars, Bernard J., and Nicole M. Gage. Cognition, Brain, and Consciousness: Introduction to Cognitive Neuroscience. Burlington, MA: Academic/Elsevier, 2010.
  12. geoffhinton [Geoffrey Hinton]. “AMA Geoffrey Hinton.” Reddit, 10 Nov. 2014, https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/clyjm11/. Accessed 13 Dec. 2016.
  13. yoshua_bengio [Yoshua Bengio]. “AMA: Yoshua Bengio.” Reddit, 24 Feb. 2014, https://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfpmo29/ . Accessed 13 Dec. 2016.
  14. michaelijordan [Michael I. Jordan]. “AMA: Michael I. Jordan.” Reddit, 11 Sep. 2014, https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckep3z6/. Accessed 13 Dec. 2016.
  15. Big Think. “Steven Pinker: Linguistics as a Window to Understanding the Brain.” Online video clip. YouTube, 6 October 2012. Web. 15 Dec. 2016.
  16. Arbib, Michael A. Handbook of Brain Theory and Neural Networks. 2nd ed. Cambridge, MA: MIT, 2003.
  17. Wilson, Robert A., and Frank C. Keil. The MIT Encyclopedia Of The Cognitive Sciences. Cambridge, MA: MIT, 1999.
  18. Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT, 2012.
  19. Marcus, Gary F. The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, MA: MIT, 2001.
  20. Frankish, Keith, and William M. Ramsey. The Cambridge Handbook of Cognitive Science. Cambridge: Cambridge UP, 2012.

Mediating the Social-Cultural Function of Museums: A Case Study of the Digital Exhibition of “Landscape with the Fall of Icarus” – Ruizhong Li

Abstract

In this paper, I am going to analyze an online exhibition of the painting Landscape with the Fall of Icarus as:

  1. A digital artefact mediating the social-cultural function of museums
  2. An interaction design as a cultural practice

The paper will include:

  1. An illustration of the multiple layers of interfaces embedded in this digital artefact and the affordances implemented by this digital exhibition;
  2. An analysis of how this online exhibition remediates the social-cultural function of museums, and of what changes the digital museum idea brings to the on-site museum-visiting experience and to human cognition of art history.

Introduction

Digitization has made it possible for humans to tour the world without leaving their desks. Digital representations can remediate almost every material artefact bit by bit. Remediation, in essence, is a process of creating an embedded structure of meaning systems over time. Therefore, in order to de-black-box the remediating process, we need to figure out what is going on in each layer of interface and how the layers connect to larger meaning systems. The digital exhibition I discuss in this paper is a case of mediating a material painting in a physically existing museum, a meta-(already)-meaning system. Therefore, before looking at the remediation of the social-cultural function of museums (as suggested in the title), I will provide a visual illustration of the symbols remediated and the affordances implemented in each layer of interface. Then, based on that detailed illustration, I will analyze how this online exhibition remediates the social-cultural function of museums. Having worked through the details of this digital exhibition, I will compare it to another form of digital representation within the larger realm of virtual experience, to see what might be a special application of technical mediation technologies in this instance. Finally, I will extrapolate from the case to discuss the differences between the virtual and on-site experiences of visiting a museum, and how digital exhibition changes human cognition of art history.

Multilayer interfaces of the Digital Exhibition

“All human symbolic activity and representation in any material form are artefacts of symbolic cognition in a long cumulative continuum of technical mediations for human sign systems.” – Martin Irvine

Computation, software, and digital media are artefacts of human symbolic cognition, and thus all computing is fundamentally “human(istic) computing.” The human sign system is the fundamental meaning system underlying every technology-mediated platform. Therefore, I will start from the “human” interpretation of the painting and move further into the computer-mediated meta-world.

First of all, for background information about the painting, please click here for details.

  • Layer 1  The Painting: Landscape with the Fall of Icarus

Fig 1. Landscape with the Fall of Icarus (undated)

The painting Landscape with the Fall of Icarus, featuring a subject from Greek mythology, depicts the fall of Icarus described by Ovid in his Metamorphoses. The painting includes almost all the elements of the story, but at the same time, it is a very personal interpretation of it.

Fig 2. Bruegel’s interpretation through composition

The composition suggests that what Ovid accentuated in his passage, the fall of Icarus, was understated by Bruegel (the painter). In the bottom right-hand corner of the painting, the legs of Icarus himself can be seen desperately flailing in the air. What’s more, the rest of the world remains unperturbed; the ploughman, the shepherd, even the fisherman were showing indifference to the fall of the mythical hero.

According to Peirce’s triadic model of the sign as a process with material forms, the following diagram presents and explains the Object, the Interpretant, the Representamen, and the process of Semiosis.

Fig 3. Semiosis of the Painting

The Object in this process of symbolic productivity is the original Greek myth, which is already a meaning system organizing assorted folklore into a formalized framework. In interpreting it, Bruegel decomposed the established relationships among the characters and elements (the symbols) of the story and reconstructed those symbols into a personally interpreted meaning system via proportion, composition, palette, etc. Understating the fall of Icarus, Bruegel emphasized the landscape and the other characters in the painting. In terms of proportion, Icarus takes up only a small space, while the other characters in the foreground and the panorama of the port in the background occupy a large proportion of the whole painting. In terms of composition, Icarus is depicted in shadow at the bottom-right end of the diagonal – the least conspicuous position. In the foreground, Bruegel depicted a tangible reality of ordinary people minding their own business: the ploughman steering his plough, the shepherd gazing at the sky while grazing his sheep, the fisherman engrossed in his toil; none of them pays attention to Icarus flailing in the water. In the background, Bruegel delineated a panorama of the port, island, and surrounding town, which shifts the audience’s attention to the marvelous landscape of the vast ocean. In terms of palette, Bruegel used three hues one after the other (browns, greens, and blues) to create the impression of depth; Icarus sits on the blurred boundary between two hues, where audiences can hardly notice him. These correlation-making components reflect Bruegel’s responses to the story and function as the Interpretant in the symbolic activity. The painting itself, the one hanging on a wall in the Royal Museums of Fine Arts of Belgium, is the Representamen: the “sign vehicle” presenting the material-perceptible structure with interpretable features in the realm of painting.

The painting provides a personal interpretation of Ovid’s passage, itself a particular rendering of the Greek myth. It is not a direct interpretation of the original story, but it presents an interface through which audiences can understand the story of the Fall of Icarus via the medium of painting.

  • Layer 2  The Painting in the Museum
Fig 4. Bruegel Artworks Exhibition in the Royal Museums of Fine Arts of Belgium

In the room where Bruegel’s paintings are exhibited, the works are decorated (and protected) by wooden frames hanging on the wall. The walls are painted a peaceful green, providing a harmonious tone and a better visual experience for audiences viewing the artworks. The paintings hang aligned along a horizontal line at a height suited to people’s viewing habits. They are not tightly attached to the wall; instead, to keep the canvases from reflecting light, each painting is angled a few degrees away from the wall. In terms of placement, Bruegel’s most famous and delicate work here, The Fall of the Rebel Angels, is placed at the center of the wall, facing the designated viewing area (the deck in the middle of the room). Beside it hang Landscape with the Fall of Icarus (right) and Winter Landscape with a Bird Trap (left). These paintings work together to map audiences into another meaning system for interpreting any one of them: each painting as a whole is a single symbol within the “room meaning system.” For instance, Landscape with the Fall of Icarus functions as a related artwork to the central painting through their shared topic of a fall, and at the same time guides the audience through the room to the next painting, leaving room to imagine possible correlations between the two artworks.

The painting Landscape with the Fall of Icarus, with its wooden frame, is part of the Bruegel room, and functions as an interface through which audiences understand the larger meaning system of Bruegel’s collected artwork. Observing the photograph (in fact, a digital copy) above, we get a general sense of the installation of the Bruegel room. Without noticing it, we have already interpreted the painting through a digital interface, the computer screen. This brings us to the next layer of interface – digital representation.

There are many possible ways to present a painting in digital form: photograph, video, even audio commentary. The digital exhibition Landscape with the Fall of Icarus (… and the surrounding controversy) combines several media to present an in-depth interpretation of the painting.

The digital exhibition is part of the Bruegel. Unseen Masterpieces project, presented through a collaboration between the Royal Museums of Fine Arts of Belgium and Google Arts & Culture.

“Drawing on a wide spectrum of virtual and on-site experiences, this unique initiative offers everyone the chance to immerse themselves in Bruegel’s works by honing in on the details of each painting and accessing expert knowledge. By delving deeper into the artist’s world, the viewer will discover the unexpected elements in Bruegel’s works which constitute the pinnacle of the Flemish master’s craft. … This innovative concept is the fruit of in-depth thinking on current transformations in the field of museology as it adapts to the digital era.”

From this description, we instantly know that we are immersed in a three-step meta-process when we watch the painting (layer 1) that resides in the museum (layer 2) on the computer screen (layer 3), in the context of the digital exhibition as part of the whole project.

Accessing the Bruegel. Unseen Masterpieces virtual exhibitions on the Google Arts & Culture platform, we interact with a material screen, a pixel-mapped substrate. What makes the digital representation different is its material representamen. When visiting the on-site museum, what we see with our naked eyes is an artwork painted in oils on canvas. When we watch the painting in the online exhibition, we are looking at a set of pixels, which remediate the original painting into a digital copy. These pixels are organized in specific ways (abstraction, recursion, …) to resemble the original painting.

Apart from the painting itself, other functions of the museum are also imitated by the virtual online museum. The commentary text and video help explain the composition and the dynamic path the eye follows across it, imitating the description cards and the curators of the physical museum:

Fig. 5 Christine Ayoub, guide at the Royal Museums of Fine Arts of Belgium, explains the path that the eye follows across the composition

The sequence of the exhibition and the other paintings involved in the digital presentation likewise imitate the installation of the on-site museum. In the physical museum, paintings are organized by artist, and paintings by the same artist are placed according to a specific logic. In the digital exhibition, paintings related to the presented painting are added to the exhibition according to assorted needs:

Fig. 6 Related paintings are added according to assorted needs: 1) A parallel painting displaying the composition of sweeping landscape; 2) Bruegel’s interest in depicting ports; 3) A painting adapted from the same Greek mythology

Therefore, the digital representations are taking advantage of preexisting interfaces.

But the digital exhibition also offers something new that an actual museum cannot. In Alan Kay’s vision, this move beyond imitation is where the virtual museum wins.

The high resolution of the photographic copy allows people to see even more detail than the naked eye can make out in the museum. With the zoom-in function, we can even see the little cracks in the oil paint:

Fig. 7 Look closer at the masterpieces
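
What “a set of pixels” and “zooming in” amount to can be sketched in a few lines of Python. This is purely illustrative (it assumes the Pillow imaging library, and the file name and crop coordinates are stand-ins, not the actual assets of Google Arts & Culture): the digital copy is an addressable grid of pixel values, and zoom is a crop of that grid scaled up.

```python
from PIL import Image

# The digital copy is just a grid of pixel values, addressable by (x, y).
painting = Image.open("icarus_scan.jpg")   # stand-in file name
print(painting.size)                       # (width, height) in pixels

# "Zooming in" selects a small rectangle of that grid and enlarges it,
# revealing details (cracks in the oil paint) invisible at full view.
left, top, right, bottom = 2400, 1800, 2800, 2100   # arbitrary detail region
detail = painting.crop((left, top, right, bottom))
detail = detail.resize((1600, 1200), Image.LANCZOS)
detail.show()
```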

The digital exhibition is also able to guide audiences to specific parts (often neglected details) of the painting:

Fig. 8 Discover the secrets

The technical mediation enables audiences to see more than they otherwise could: the head of a man lying in the undergrowth, the horse equipped with blinkers, a knife and a sword on the same rock, a seed bag leaning against a rock, the partridge, the fisherman, the mythical hero falling into the water, the dark ewe standing amongst the sheep, the meditating shepherd, the wind-filled sails and masts, the remote island, the surrounding town, and the sun disappearing over the horizon – all these details are emphasized by zooming in, enabling audiences to gain a detailed appreciation of the intricate painting.

Affordances of the Digital Exhibition

"Think of the computer not as a tool but as a medium." - Brenda Laurel

The platform for the digital exhibition resides in the computer. “All digital artifacts are made of a common substance: programmable bits that can be used for symbol manipulation.” In essence, the computer is a common medium of representation. According to Murray (2012), the computer is encyclopedic, spatial, procedural, and participatory. Based on these four representational affordances, the following analysis of the digital exhibition focuses on its interaction design as a cultural practice.

  • Procedural Affordances

The digital exhibition is able to represent and execute conditional behaviors. The experience of visiting the digital exhibition takes the format of a sequential slideshow.

The progress bar at the bottom is based on the metaphor of any sequential visual medium, exploiting the procedural (and participatory) affordances of the medium:

Fig. 9 Progress bar

The progress bar embodies the abstraction and the algorithm behind the dynamics of this digital presentation. Interacting with the bar, audiences interact with a conceptualized model that executes conditional commands they initiate. The jump-off window demonstrates the flexibility of the program, providing the possibility of jumping back and forth between sections in an otherwise unisequential design. The jump-off window is also an abstraction of a specific slide: it signifies that slide and gives audiences a preview of its content.
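
A minimal sketch of this procedural logic (my own illustrative Python, not Google’s actual code): the exhibition is a sequence of slides, the progress bar abstracts the current index, and jumping back and forth is a conditional update of that index.

```python
class Exhibition:
    """A unisequential slideshow with a jumpable progress bar."""

    def __init__(self, slides):
        self.slides = slides
        self.index = 0          # the state the progress bar visualizes

    def next(self):
        # Conditional behavior: advance only if a next slide exists.
        if self.index < len(self.slides) - 1:
            self.index += 1
        return self.slides[self.index]

    def jump_to(self, i):
        # The jump-off window lets audiences move to any section directly.
        if 0 <= i < len(self.slides):
            self.index = i
        return self.slides[self.index]

show = Exhibition(["intro", "composition", "details", "controversy"])
show.jump_to(2)   # the audience clicks a preview in the progress bar
```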

  • Participatory Affordances

“The relationship between the interactor and any digital artifact is reciprocal, active, and open to frustrating miscommunication.” The participatory design of this digital exhibition is displayed by the automatic language switch based on the language settings of different login accounts. This is one of the preset algorithms.
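
As a sketch, such a “preset algorithm” might look like the following (illustrative Python; the locale codes, captions, and fallback rule are my assumptions, not Google’s implementation): exhibition text is keyed by language, and the account’s setting silently selects the version to display.

```python
captions = {
    "en": "Landscape with the Fall of Icarus",
    "fr": "La Chute d'Icare",
    "nl": "De val van Icarus",
}

def caption_for(account_locale, default="en"):
    # The user never asks for a translation; the account's language
    # setting silently selects it, falling back to a default.
    return captions.get(account_locale, captions[default])

print(caption_for("fr"))  # a French-configured account sees French text
```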

Sometimes the script is more flexible. Some digital conventions are so familiar that they script us in a transparent way. For example, on the first page of the digital exhibition, there is an arrow on the right of the screen:

Fig. 10 Start arrow

To start off, audiences automatically associate the arrow with starting: an arrow at the right edge of an application window cues us to advance the screen, and when we place the cursor on the symbol, the arrow turns into a hand, which cues us to click.

By the same logic, the restart arrow serves to restart the procedure.

Fig. 11 Restart arrow

The loading circle signifies “please wait”:

Fig. 12 Loading circles

These digital conventions aid the participatory design; the interaction design employs them to cue the human actions that realize users’ expectations.

  • Encyclopedic Affordances

The word encyclopedic emphasizes the computer’s capacity for storage and transmission, and its inheritance of the tradition of collecting, preserving, and transmitting knowledge.

The digital presentation of this painting is part of a larger project remediating Bruegel’s artworks. Within the larger project, exhibitions are linked to one another according to meticulous segmentation and classification. For example, exhibitions can be classified by stories:

Fig. 13 Exhibitions organized by stories

Videos embedded in a specific exhibition could be reorganized into a video section:

Fig. 14 Scattered videos are reorganized into a section

Landscape with the Fall of Icarus, together with other paintings, constitutes this collection labeled as Bruegel’s artworks. However, this is not the only way for Landscape with the Fall of Icarus to appear among numerous paintings. For example, the painting could be assigned to another project collecting all the adaptations of Greek mythology, which might include musical works, poems, passages, etc. This flexibility in positioning paintings within the huge collection displays the inclusiveness of the encyclopedic affordance provided by the computer.

  • Spatial Affordances

The computer creates virtual spaces that are navigable by the interactor, a capability that rests upon the procedural and participatory affordances of computation.

Visual design manipulates space to represent the hierarchy of items. Looking at the horizontally placed paintings, we recognize the equal relationship among them; the consistency in design suggests their equal level in the structure. We do not lose our bearings during the browsing experience, because of the coherent spaces and the scripted digital conventions.

A remediation of the social-cultural function of museums

The museum is an organizational system. How museums organize paintings is highly correlated with social and cultural institutions. Museums can be considered implementations of human cognition of art history. This predominant museum idea “preceded and pre-interpreted any artefacts selected for representation.” The museum establishes the standard: it sets up limited categories for selecting and assigning artefacts – categories we now take for granted: periods, styles, genres, cultures, etc. The institutionalized idea of the museum makes it proper to suggest that the museum is an ideal meaning system: it turns actual museums into embodiments of its conceptual model, determining what to present and how to present it.

In terms of the social-cultural functions of the museum, its primary goal is to provide an ordered representation of art history. Most museums are divided into relatively independent spaces in order to enable visitors to make sense of the installation and organization of the artworks. Each space creates a real space for a meaning system, in which correlations are established among the artworks it contains. The meaning system is open, allowing unlimited interpretations of each artwork and of the relationship between any pair of artworks. And the meaning system undergoes an ongoing process of change, given the possibility of further interpretants expressible in new or additional signs.

How should we evaluate the performance of the digital exhibition in fulfilling the social-cultural functions of museums? The digital exhibition has advantages in flexibility, accessibility, and compatibility.

Flexibility. “Artworks continue to be received as art works by means of further technological mediation and representation, but are also continually reinstantiated as art works by the institutional framing of ‘art history’ and the museum function in culture.” There are many ways of presenting even a single painting. Landscape with the Fall of Icarus can be assigned to the collection of Bruegel’s artworks, to the collection of Renaissance paintings, or to the collection of artworks adapted from Greek mythology. What the digital museum can do is reorganize its collections by manipulating symbols on the website, entirely free of the trouble caused by moving material artworks.

Accessibility. The virtual exhibition helps the artworks go beyond the boundaries of time and space, making them accessible to anyone with a terminal device and the Google Arts & Culture platform.

Compatibility. With multiple media – text, image, audio, video – involved in the digital exhibition, these media also present an embedded relationship within it. A text annotating the image, a video commentary interpreting a character in the picture … all of these work well together thanks to the inclusiveness of semiosis – an emergent process.

Concerns. Digital museums exist in an intricate network connecting unlimited cultural symbols and dynamic meaning systems. When we describe the digitizing process of museums, we always use the word “simulation.” Even with a high level of resemblance, when visiting a virtual museum we remain conscious that something is different. This suggests a deficiency in the current conditions under which humans manipulate symbols in an open and emerging interface. Computing is fundamentally “human computing”; the reason it feels “non-human” relates to our current ideological and political-economic conditions, and to processes of education and socialization about computers. Even with these concerns, we remain open to the possibility of remediating human meaning systems into computational form. The digital museum is never an enemy, and neither is the computer.

References

Clark, A., & Chalmers, D. (1998). The Extended Mind. Analysis, 58(1), p. 7-19.

Janet Murray, Inventing the Medium: Principles of Interaction Design as a Cultural Practice. Cambridge, MA: MIT Press, 2012.

Martin Irvine (2016), André Malraux, Le Musée Imaginaire (The Museum Idea) and Interfaces to Art. Communication, Culture & Technology Program, Georgetown University.

Martin Irvine (2016), Introduction: Toward a Synthesis of Our Studies on Semiotics, Artefacts, and Computing. Communication, Culture & Technology Program, Georgetown University.

The Digital Journey of an Oyster: Trying to understand meaning through replication by means of 3D printing (Carson Collier)

Abstract:

In this paper, I deconstruct the replication process of a symbol by means of 3D printing. I do this in the hope of improving our understanding of the meaning-making process. Using archaeological remains, I observe the relationship between the symbol and its meaning across the various platforms of hardware and software used in the 3D printing process. I have also provided a list of the hardware and software mentioned in this paper, with a few sentences describing their main functions.

Hardware and Software:

NextEngine 3D Laser Scanner
The NextEngine 3D Laser Scanner is a tabletop scanner used to create raw meshes of 3D objects.

ScanStudio HD
ScanStudio HD is the application that comes packaged with the NextEngine 3D Laser Scanner. This application is used for maintaining the scanner’s hardware, as well as for editing and aligning the raw mesh outputs from the scanner.

Meshmixer
Meshmixer is a free software application for creating and editing triangle meshes.

MakerBot Replicator+
The MakerBot Replicator+ is a desktop 3D printer. MakerBot provides their own software for all of their printers. The software is used to resize the object if needed and prepare the object for print.

Bongo- Rhino
Bongo is an animation plug-in for Rhino, a 3D design software.

Introduction

“Objects made by humans can always be copied by humans” – Benjamin, W. (2010)

Creation and replication have always been key components of human culture. Humans can create things physical or abstract and apply meaning to them, in turn creating a symbol. Once a symbol is created, it can be replicated repeatedly across different mediums. One means of replication is through technology, more specifically 3D printing. In his thesis, “3D Printing: Convergences, Frictions, Fluidity,” Robert Ree describes 3D printed objects as “technological reproductions of an original digital artifact by means of the process of layerization” (2011, p. 69). Layerization is the process of slicing digital models and building them back up one layer at a time. The process the symbol as a whole goes through to achieve this replication is complex. Therefore, I am applying Alan Kay’s idea of utilizing doing, images, and symbols to build and learn more about the meaning making behind certain symbols (Kay, 1977, p. 230-244). In the following pages I will take apart the process of replicating archaeological remains by means of 3D printing and look at each step individually. I believe doing so will not only give a clearer picture of the 3D replication process itself, but also support the idea that 3D printing promotes interaction with symbols and can improve our understanding of the meaning-making process.

The Replication Process

Selecting an Object: An Introduction to the Eastern Oyster

This process begins with choosing the physical objects someone would want to scan. These objects could range from artifacts (things created by humans) to faunal remains (bones left over from various animals). This first step brings up a few questions. What is going to be scanned? And why is this object going to be scanned? The “what” refers not only to what the object literally is, but also to what the object represents. For example, many oyster shells have been found at multiple archaeological sites around the historic Jamestown Settlement in Jamestown, Virginia. At first, this seems like an obvious find, considering that Jamestown is located on the Chesapeake Bay, home to the Eastern Oyster, Crassostrea virginica. However, these oyster shells hold meaning. Since oysters were so abundant when the Jamestown settlers arrived, they quickly became known as a poor man’s meal. During times of political and economic crisis people were “reduced to eating oysters.” Therefore, layers with more oyster shells are associated with times of hardship (Wennersten, 2007).

Before the scanning process even begins, the object (the oyster shell) holds multiple meanings. Peirce’s triadic model distinguishes three ways in which a sign can refer to its object: the relation can be iconic, indexical, or symbolic. Iconic signs represent their objects by virtue of a relation of similarity. Indexical signs refer to and are influenced by the objects with which they share various qualities. Lastly, symbolic signs are bound up with their objects by virtue of a convention. To the Jamestown settlers, the whole oyster is a symbolic sign, representing a lower standard of living. Meanwhile, to the archaeologists working at Jamestown, the oyster shell alone holds the symbolic value. These shells are some of the only remains left after years of decomposition that have the ability to share a story (Jorgensen, 1993, p. 92).

Preparing the Object for Scanning:

Now that the object is selected, the next step is preparing it for scanning. This step is where the meaning of the object starts to become separated from the object itself. At this point, we are not looking at a “symbol of hardship,” but an actual oyster shell. It becomes a question of practicality. What is the best way to set up the oyster shell on the scanner to receive the best output? Is the oyster shell shiny enough to reflect the scanner’s laser elsewhere, causing a poor scan? Fortunately, oyster shells are relatively easy to work with. Preparing a small icon for scanning could be more difficult: if it is made of a reflective material, like metal or some type of gemstone, the scanner can do a poor job. To resolve this problem, it is recommended that a light powder be applied to the object to give it a matte texture that is much easier to scan. The new challenge that arises is whether powdering the icon takes away from its cultural significance or causes staining. Again, this is where the meaning associated with the object and the object itself gradually separate from one another.

Scanner Output:

Using the NextEngine 3D Laser Scanner, the oyster’s scan data was captured as sets of XYZ points and converted to a hollow triangle mesh. Without this mesh-generating surface technology, all of the XYZ points collected during the scan would just be points placed on a grid with no relation to one another. In “Computation Is Symbol Manipulation,” Conery addresses the need for structure in computing: “clearly there must be some structure to the computation, otherwise one could claim any connection of random symbols to a constituted state” (2002, p. 814-816). I believe the automatic meshing of the XYZ points is a very literal example of what Conery is talking about. The software takes an extra step to ensure the 3D object can be recognized by the user, keeping the relationship between the object and the user intact. If the XYZ points did not share this relation with one another, not only would it be hard to see the oyster shell, it would be very difficult to edit. Also, to ensure all of the scan data is captured, multiple scans are executed, thereby producing multiple digital copies of the oyster shell.
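
A minimal sketch of the structure this step adds (illustrative Python with NumPy; NextEngine’s actual internal format is proprietary): the raw scan is an unordered list of XYZ points, and the mesh adds relations, triangles defined as triples of indices into that list, turning isolated points into a surface.

```python
import numpy as np

# Raw scan output: unordered XYZ points with no relation to one another.
points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.2],
])

# The mesh is the added structure: each triangle is a triple of indices
# into the point list, relating points to one another as a surface.
triangles = np.array([
    [0, 1, 2],
    [1, 3, 2],
])

# With that structure we can compute properties no bare point has,
# e.g. the outward normal of each triangular face.
v0, v1, v2 = (points[triangles[:, i]] for i in range(3))
normals = np.cross(v1 - v0, v2 - v0)
```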

Cleaning & Aligning:

In order to create the completed 3D replica, the multiple scans of the oyster shell needed to be cleaned up and aligned manually. This was done using the ScanStudio HD software. Cleaning a 3D object consists of taking out any noise unintentionally collected during the scanning process; the term “noise” covers any particles of light or background objects the scanner picked up during the scan. Next, the scans need to be aligned. Aligning consists of picking out identical features in each scan to use as reference points when combining all of the scans. During this process, the user is constantly zooming in and out of the object trying to find identical points to use for alignment. Zooming in and out of the oyster shell created a new kind of relationship between myself and the oyster shell. At this stage, the cleaning and aligning process allowed the oyster shell to be completely removed from the meaning it carried coming into this process. This may sound unfortunate, but it is necessary for completing the whole 3D replica.
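
ScanStudio’s alignment routine is proprietary, but the standard technique behind “picking identical features as reference points” is a rigid least-squares fit, often computed with the Kabsch algorithm. A hedged sketch:

```python
import numpy as np

def align(src, dst):
    """Rigid alignment (Kabsch): find rotation R and translation t
    so that src @ R + t best overlays dst in a least-squares sense."""
    src_c = src - src.mean(axis=0)             # center both point sets
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)  # SVD of the cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))         # guard against reflections
    R = (U * [1.0, 1.0, d]) @ Vt               # optimal rotation
    t = dst.mean(axis=0) - src.mean(axis=0) @ R
    return R, t

# src and dst are the matched reference points picked out of two scans;
# applying src @ R + t brings the first scan into the second's frame.
```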

Fusing & Completing:

After all of the 3D scans have been cleaned and aligned, the oyster shell needed to be fused. ScanStudio HD provides a fusing tool; however, Meshmixer is my preferred platform, due to the higher quality output it provides. Fusing an object is necessary to make the 3D replica solid and ready for printing or animating. This adds a new layer to the layerization process mentioned in the introduction. When the fusing process is finished, the 3D replica of the oyster shell is an almost identical digital copy of the original. This is where the meaning that was lost during the editing period can be restored to the digital object. Unlike the original oyster shell, which will always have a pre-existing meaning, the 3D replica holds multiple possibilities for new meanings.

Output:

Once the 3D replica is complete, there are a number of possible outputs that can be utilized across multiple mediums. The outputs I will address are a simple STL or OBJ file, an animation, and a 3D print.

STL/OBJ

STL (Standard Tessellation Language) and OBJ (Object file) are two formats used for 3D files. Keeping a 3D replica in one of these simple file formats lets a user share, copy, or edit the file with ease, which in turn allows the people they share the file with to interact with it by sharing, copying, or editing it, and so on.
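
To give a sense of how simple these formats are, here is an illustrative sketch that writes a mesh as ASCII STL by hand (real pipelines typically use the binary variant and dedicated libraries; this is a sketch, not a production exporter). It reuses the points/triangles structure from the earlier scanner sketch.

```python
def write_ascii_stl(path, points, triangles):
    """Write a triangle mesh as ASCII STL: one 'facet' per triangle."""
    with open(path, "w") as f:
        f.write("solid oyster\n")
        for a, b, c in triangles:
            # Zero normals are a common shortcut; most tools recompute them.
            f.write("  facet normal 0 0 0\n    outer loop\n")
            for idx in (a, b, c):
                x, y, z = points[idx]
                f.write(f"      vertex {x} {y} {z}\n")
            f.write("    endloop\n  endfacet\n")
        f.write("endsolid oyster\n")
```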

Animation

Animations of 3D replicas provide a dynamic digital medium of the original object. Animations can be created using software like Rhino and can be shared digitally with ease.

3D Printing

A 3D print of a replica allows a user to produce a tangible copy of the original object. This printed replica can be left as is, painted to look like the original, or turned into something completely different. Replicas can also be minimized, enlarged, or printed at actual size.

Who Am I?

The oyster shell mentioned above and other 3D printed replicas of faunal remains were used as part of a game I created, called “Who Am I?”, to educate students about the different species living around the Chesapeake Bay during colonial times. On the front of each card is the 3D printed replica of a faunal remain, as well as some clues for guessing what kind of species the remains come from. On the back of the card is a picture of the species the 3D printed replica is associated with. The process of playing this game and interpreting the clues in one’s own way can be associated with active externalism. Clark and Chalmers describe active externalism as the two-way interaction between a human organism and an external entity, creating a coupled system. The system created through active externalism is used to answer the question of “Who Am I?” (Clark & Chalmers, 1998; p. 7-19).

Front:

Back:


Conclusion

All of the steps listed above provide a setting for the user to alter the object in their own way. This agency is interesting because it could easily cause the original meaning behind the object to change. These new meanings are then determined by the environment surrounding the new copy of the replica. This idea coincides with Simon’s thoughts in The Sciences of the Artificial (1996), where his theories about computing discuss the idea that each function only becomes relevant once it is applied to the whole system. Understanding that any one of the various steps listed above is meaningless until it is added to the larger system is a step in the right direction for understanding the meaning-making process as a whole. It is important to point out that this paper only focused on replicating an object that already held a symbolic meaning. Taking a step back and thinking about the creation of an object by means of 3D design could be even more beneficial for researching the meaning-making process.

…whereas the authentic work retains its full authority in the face of reproduction made by hand, which it generally brands a forgery, this is not the case with technological reproduction. The reason is twofold. First, technological reproduction is more independent of the original than is manual reproduction… Second, technological reproduction can place the copy of the original in situations to which the original itself cannot attain (Benjamin, 2010; p. 13).

Works Cited

Benjamin, W. (2010). The work of art in the age of its technological reproducibility (W. Jennings, Trans.). Grey Room, 39.

Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(no. 1), 7-19.

Conery, J. (2002) “Computation Is Symbol Manipulation.” The Computer Journal 55, no.7. p.814-816.

Jorgensen, K. G. (1993). The shortest way between two points is a good idea: Signs, Peirce, and theorematic machines. In P. Andersen et al. (Eds.), The computer as medium (p. 92). Cambridge University Press.

Kay, A. (1977). Microelectronics and the personal computer. Scientific American, 237(NO. 3), 230-244.

Irvine, M. (2016). The grammar of meaning systems: Sign systems, symbolic cognition, and semiotics. Unpublished manuscript.

NextEngine- ScanStudio- surface technology. Retrieved December, 2016, from http://www.nextengine.com/products/scanstudio-hd/specs/surface-technology

Ree, R. (2011). 3D printing: Convergences, frictions, fluidity. (Masters of Information, University of Toronto).

Simon, H. (1996). The Sciences of the Artificial (3rd ed.). MIT Press.

Wennersten, J. (2007). The oyster wars of the Chesapeake Bay. Eastern Branch Press.

Zechini, M. (2014). Digital zooarchaeology: Faunal analysis in the 21st century. (Bachelor of Science, Virginia Commonwealth University).

Through the Wire: Extended Cognition, Memory, and the iPod (Joseph Potischman)

Abstract

This paper analyzes the iPod system (device, software, accessories) as a device that enables extended cognition, with varied affordances compared to the other music technologies which came before. It will include concepts important to understanding the differences between music and language processes, context for the iPod within the realm of other cognitive technologies, and lastly how the organizational capabilities of the iPod have altered the environment for music listening in the modern age.

Introduction

Clark (1998, p. 13-15) wrote about artifacts of extended cognition as portable systems intertwined with our biological memory. The information stored in these systems must be available to the user whenever they want to retrieve it. For some, music listening is thought of as innocuous, a frivolous pleasure, or a distraction with little impact. Walking through bustling city streets, we have become accustomed to seeing humans with wires running into their ears, ‘tuned out’ from the rhythm of daily life. They are completely engaged in something else, and it is not clear what this type of musical immersion represents. If we re-classify music listening as an engagement with cultural memory, and the iPod as an artifact for extended cognition, perhaps we can build a stronger representation of the meaning making behind music listening.

Music and Language Processing

To build on the meaning of what music is, especially as it relates to the iPod, we must first define what it is not. While music and language share similarities in some respects, in many more they strongly differ. The meaning elements of music can be described as “stacks of sound moving in time” (Irvine, 2016, p. 1). Just as watching a movie is not the same as experiencing the actions on screen, listening to music is not the same as experiencing the moments the musician signifies. Music instead serves as a conduit to personal memory, taking the themes expressed by someone else and ‘remediating’ them, seamlessly sliding the music into a new context to fit one’s own experience (Bolter & Grusin, 1999).

By continuously experiencing a specific musical style, listeners can attain a fluency in that style in the same way they understand language. However, even though one language can be translated into another, musical styles cannot. Instead just as every individual learns the local language, they also learn the local variant of music, which functions more as a culturally specific meaning system (Irvine, 2016 & Jackendoff, 2009, p. 195). For instance, Spanish could be translated into Czech, but a Banda could not be translated into a Polka. This does not mean that musical styles cannot permeate cultural boundaries. In fact, musical influence can diffuse across cultures and take on new uses, in the same way that Banda music is derived from Norteno music which originated with Polka (Flores, 1992).

We also do not process music with the same logical directionality in which we process language. When we hear language we first analyze the sound (phonology), then the words in the sound (lexicon) and the structure of those words (syntax), enabling us to parse what they mean so we can think about them (Jackendoff, 2003). This same process does not occur when listening to music, or at least with musical instrumentation. Rather we respond to the simultaneous sound stack immediately, and try to parse out a semantic meaning within the context of our own cultural exposure to sound (Irvine, 2016, p.2).

Why the iPod

This paper focuses on the iPod because of its market dominance and the significant differences between its functionality and that of the portable music devices that existed before it. In 2008, approximately three million songs were sold per day through the iTunes store, capturing 83 percent of the market for all digital music sales. Even though there were cheaper MP3 players with similar interfaces, most of those devices, like the iRiver, have faded into obscurity, mainly because the iPod held such a dominant spot in the market (Sundie, Gelb, & Bush, 2008, p. 179). It also must be noted that the iPod combined with the iTunes software created the first complete web-connected portable music system (Sydell, 2009). While the iPod is certainly not the first music device, it is unique in that it combines the portability of devices like the Walkman and cassette player with the totality of a record collection, within a closed system. It is then important to look at the iPod within the existing research on cognitive technologies.

Source: https://www.amazon.com/iRiver-T10-GB-MP3-Player/dp/B0009ORXE8/ref=sr_1_6?s=electronics&ie=UTF8&qid=1482003236&sr=1-6

Otto’s iPod

Clark and Chalmers (1998) wrote that active externalism helps explain how the environment we make decisions in helps drive our cognitive processes. Like their example of Otto’s notebook as an external artifact for extended cognition, the iPod can also be thought of as an artifact for extended cognition. Otto’s notebook would help him retrieve the locations of places he wanted to visit, so that all he would have to do is scroll through his book to find the directions he needs. The iPod enables Otto to scroll through his musical memories in the same way; anything he decides to save on his iPod will be available to him later.

The notebook acts as an indexical marker for the places Otto has already been. He can flip through the notebook, and all the experiences he has had at the locations he has visited become retrievable as well (Clark & Chalmers, 1998). This connects back to the concept of the iPod as an audio diary. When we listen to music we can go back to the moments in which we heard the music, but we can also go back to moments the music reminds us of (Bull, 2009). For instance, we might listen to a song because we heard it at a concert or a restaurant and we want to re-experience that moment. There is also a second level of meaning making that comes from the actual content of the songs and their ability to create a mood, a quality discussed later in this paper.

While finding the proper directions for where he wants to go is the ultimate reason for using the notebook, there are other externalities from offloading cognition into it. With his iPod, Otto no longer needs to remember all the musical experiences he has had (although it is also possible that he could not even if he wanted to). The iPod becomes the environment in which he can re-engage his cultural notebook. Just as the contents of Otto’s notebook are not simply records but also represent his work, the contents of his iPod are not just songs, but memories.

iTunes Library as a Sign Vehicle

“A sign is something by knowing which we know something more” – C.S. Peirce

The ability of individuals to manipulate signs and symbols changed with the popularization of the iPod. Rather than having a physical CD, LP, or record collection, every musical file a person owned would be indexed in their iTunes “library,” with text representing specific artists, albums, and songs. For this to work, the inner mind has to be able to decide that these images represent music, but this interpretation is just a further representation (Barrett, 2013, p. 5).

Interacting with an iPod, the user undergoes the process of semiosis. Specifically, humans interpret sound recordings with meanings that are intersubjective and conform to a cultural category. The community that forms around these musical signs creates a dialogic culture (Stanford Encyclopedia of Philosophy & Peirce, 2006). It is not a community of practice in which people produce music, but rather one where they re-produce it. In the earlier stages of its ubiquity, people would often ask to scroll through an iPod user’s library in public places. Mutual agreement on music could lead to friendship, while possessing an iPod with contents considered culturally “un-cool” could allow the scroller to make unfair value judgements about the iPod owner (Levy, 2006, p. 147). In the same way, DJs use their turntables and records not to produce but to reproduce music; their selections are based on how well they relate to the musical identity of the crowd they are performing for (Katz, 2004, p. 115). If they pick music that fits the context of the event, they are successful; if not, the crowd will share its distaste.

Associated Indexing and the iPod

The human mind jumps from point to point, and computers enable this rapid thought association on screen (Bush, 1945, p. 9). Associative indexing is the ability to take each point in succession and tether it to the next. Where the memex would have accomplished this by storing articles in a desk, the iPod stores its content on a hard drive. Its computing power works in real time so that the user can scroll on the click wheel (fig. 1) while determining a sequential selection (Licklider, 1962). The iPod user can put on a song and let the content of the music guide the next selection, making free associations within the iPod’s stored memory. The click wheel serves the same purpose as the graphical user interface paired with the mouse, albeit with less autonomy (Engelbart, 1962). Users rely on their thumbs to scroll through their music library and press down to select, rather than pointing and clicking with an entire screen as their backdrop.
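
A sketch of that one-dimensional navigation (illustrative Python, not Apple’s firmware): the wheel’s rotation maps to a position in an ordered index, clamped at the ends of the list.

```python
class ClickWheel:
    """One-dimensional navigation over an alphabetized library."""

    def __init__(self, items):
        self.items = sorted(items)
        self.pos = 0

    def scroll(self, ticks):
        # Rotating the wheel moves the highlight up or down the list,
        # clamped to the ends rather than wrapping around.
        self.pos = max(0, min(len(self.items) - 1, self.pos + ticks))
        return self.items[self.pos]

    def select(self):
        # Pressing the center button chooses the highlighted item.
        return self.items[self.pos]

wheel = ClickWheel(["Abbey Road", "Blue", "Kind of Blue", "Thriller"])
wheel.scroll(2)      # thumb rotation moves two entries down
print(wheel.select())
```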

Affordances

Source: http://www.mujmac.cz/rubriky/multimedia/ipod-photo-ipod-special-edition-evoluce-pokracuje-55052cz?send

Here is a breakdown of the main affordances pertaining to the iPod, not all of which are unique:

  1. The iPod is portable

In 2006, Olympic snowboarder Hannah Teter tucked her iPod into her winter jacket and rode her gold-medal-winning run to the tune playing in her earbuds. While there are many instances of well-known figures using an iPod, we are just as likely to see anyone running down the street with headphones in their ears, moving synchronized to the beat of the music and not the rhythm of the street (Levy, 2006 & Bull, 2009, p. 85). This was much harder to do with the bulkier CD players that pre-dated the iPod. Although it was portable, the iPod would only enable playback if it was charged.

  2. The iPod is battery powered

In the top-right corner of the iPod is an icon representing a battery. The charge indicated in the battery displays how much longer the user’s iPod will last. All iPods were built with rechargeable lithium-ion batteries. These batteries have a life span of 8 to 12 hours, depending on usage, before they need to be charged again. Fully charging the battery from no charge takes approximately 4 hours, so users need to be aware of the status of their device (Apple, 2016). It is the dock connector interface that makes this charging possible, as users could plug their iPod into any three-pronged outlet to charge. This is different from past music devices like phonographs and turntables, which were completely stationary, as well as cassette and CD players, which were mostly powered by non-rechargeable batteries. The dock connector was also multifunctional in that it could be plugged into a larger speaker system for playback.

  3. The iPod enables MP3/MPEG playback

As previously mentioned, the iPod was not the first portable system on which to play MP3 or MPEG files. However, the iPod coupled with the iTunes software and iTunes store created the first legal system for listening to and downloading songs from the internet (Sydell, 2009). At the time of its release, neither component was operable without the other, ensuring that any user fearful of punishment for copyright infringement would use the iPod (Sundie, Gelb, and Bush, 2008).

Steve Jobs brokered a deal with the major record companies to legally license and sell music through the iTunes store, thus creating a digital music store with most of the same capabilities as its local, physical iteration. This does not mean that iPod users could not violate copyright law, as they would often circumvent the iTunes store by downloading music from illegal services like Napster, LimeWire, or The Pirate Bay and upload these files to their iPod (Knopper, 2013). This development was somewhat inevitable, as MP3s and MPEGs are non-rivalrous resources; consumption of these files by one person does not limit the consumption of another, and the sound does not degrade when copied (Katz, 2004, p. 163).

One of the trade-offs that listeners make when using the iPod is that some musical frequencies are drowned out by other sounds on a track. There is a loudness factor that is lost in MP3/MPEG listening. With traditional vinyl, the sound of a slammed piano key in a jazz piece will briefly cover the sound of the other instruments playing concurrently. However, when coding those sounds into a digital format, the background sounds are assigned fewer bits of data than the foreground sounds, and the listener hears less variability (Katz, 2004, p. 160). “Portable device audio decoding and amplifying technology [like the iPod] is not designed for music but for low-quality ‘functional’ sound,” and while this is true, the iPod fools our ears just the same (Irvine, 2016, p. 5).

  4. The iPod is automated

Whenever you press ‘play’, you receive an uninterrupted sequence of music because the iPod is an automated device. This self-operating principle means that when a track ends, the iPod does not stop playback (Denning & Martell, 2015). This is markedly different from past musical devices. The phonograph, turntable, cassette player, and CD player were all limited by the physical contents they were playing. Vinyl records and cassette tapes have two sides, so that when one side ends the listener must get up from what they are doing, go over to their phonograph (later the turntable, followed by the cassette player) and flip the record or cassette over. With the popularization of CDs, users no longer had to flip from side to side, but they would still have the problem of an interrupted music experience. iPod users can hear an uninterrupted stream of music provided their iPod is charged.
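
A sketch of that self-operating principle (illustrative Python, not Apple’s firmware): playback is a loop over a queue, so the end of one track conditionally triggers the next with no listener action.

```python
def play(track):
    # Stand-in for the real decoder: assume it blocks until the track ends.
    print(f"now playing: {track}")

def play_queue(tracks):
    i = 0
    while i < len(tracks):   # conditional behavior: more tracks remain
        play(tracks[i])
        i += 1               # automation: advance with no listener action

play_queue(["Side A, Track 1", "Side A, Track 2", "Side B, Track 1"])
```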

Organizational Capabilities of the iPod

According to Blichfeldt (2004), “social identity can be explained as the way we present and understand ourselves in relation to other people. In the same way that you need to organize the world around you, for it to give meaning, you also have to organize how you view yourself” (p. 43). If music acts as a cognitive system to tailor our surroundings to our desired psychological state of mind, then the ways in which we organize music can have a similar effect on our musical identity and mood.

There is a scene in the film High Fidelity (Frears, 2000) where the main character, an audiophile, decides he is going to re-arrange every record in his collection autobiographically. If he wants access to a certain song, he’s going to have to remember the album, and the context in which he first heard it. He sits in a sea of vinyl trying to remember the order he listened to his music (figure 3). With the iPod system, it would merely take a click of the mouse to rearrange his library to reflect the order in which he first downloaded his music. While this encounter is humorous, and it should be, it presents an interesting take on the way we functionally think about music, and illustrates how the different affordances between an iPod and a record collection work in action.

Source: High Fidelity (2000)

When new symbols are inserted into a gallery, they recursively modify the course of the constructed history by changing the meaning of the symbols that came before them (Irvine, 2016). Just as museums function under this principle of recursive modification, so does media on an iPod. The organizing principles someone adopts when using their iPod will greatly influence their listening experience. Someone who decides to listen to music by bands starting with the letter M would have a very different experience from someone who decides to listen by genre.


Choosing a different organizing principle could also greatly influence the style of music a listener hears. With the ability to filter someone’s entire library with just a few letters typed into a search bar, a divergent approach to music listening has emerged: searching by mood rather than by artist, album, or genre (Katz, 2004, p. 168). Type the word ‘cry’ or ‘tears’ into the iTunes search bar and you will receive music that relates to sadness; type the word ‘blue’ and an entirely different shade of melancholy sounds appears.


Of course, listeners could decide to operate with no organizational principle at all. By using the shuffle feature, they can go through the entire contents of their iPod at random. You can go from the relaxed sounds of George Harrison to the cold trap music of Rick Ross, with no commonalities between the songs except that they are in the same library. The mood shift between the two is extremely abrupt, which further illustrates how music can create a personal soundscape but can also disrupt it. This is not possible with a record player, where music is ordered sequentially and the user must initiate playback by physically changing from vinyl to vinyl.
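
The difference these organizing principles make can be sketched in a few lines of Python (hypothetical track metadata, not the actual iTunes database): the same small library produces entirely different listening sequences depending on whether it is sorted alphabetically, ordered “autobiographically” by date added, filtered by a mood word, or shuffled.

```python
import random

# Hypothetical track metadata; the real iTunes library schema is richer.
library = [
    {"artist": "George Harrison", "genre": "Rock",    "title": "All Things Must Pass", "added": 2006},
    {"artist": "Rick Ross",       "genre": "Hip-Hop", "title": "Hustlin'",             "added": 2008},
    {"artist": "Miles Davis",     "genre": "Jazz",    "title": "Blue in Green",        "added": 2004},
]

by_artist = sorted(library, key=lambda t: t["artist"])          # alphabetical principle
by_added = sorted(library, key=lambda t: t["added"])            # "autobiographical" order
by_mood = [t for t in library if "blue" in t["title"].lower()]  # mood search
random.shuffle(library)                                         # no principle at all
```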


Ultimately, music functions as an affective complement to the thoughts that words convey (Jackendoff, 2009). Certain types of instrumentation, a lonely harp plink, or a violent cymbal crash, can direct a listener to a certain mood. Of course, most songs do not exist in a pure instrumental format, but are filled with signs and meanings. When we can curate those feelings instantaneously, we have more power to experience the world as we want to see it. In a series of interviews conducted by Bull (2009) on the iPod’s ability to individualize one’s immediate surroundings, one respondent said: “there is a song for every situation in my life, even if I might have forgotten about a certain time, person, or place, a song can trigger memories again in no time” (p. 87). If music is a way to capture memories in the form of an auditory diary, then the iPod is the most accessible device for memory retrieval in a closed system that humans have ever had.

Conclusion

With headphones firmly pressed around the ear, there is a paradox of isolation and intimacy (Bernstein, 2016). We are now able to create our own personalized soundscape anywhere we want, giving us the ability to engage in the dialogic culture in ways never before possible. At the same time, it cuts us off from reality, so we do not hear what is going on around us. The first model of the Sony Walkman came with an orange button that let two listeners plugged into the same device talk to each other through their headphones, but the company phased this feature out (Levy, 2006, p. 212). The whole point of headphones, of portable music listening, is to mediate reality, not to connect with people. However, it would be too easy to say that this separates us from others; as this paper has discussed, the act of music listening is itself engagement in a historic cultural dialogue. People have always tried to mediate their surroundings to fit their own narrative, and the iPod is just one of the latest in a long line of technologies that enable them to do so.

References:

Apple (2016). iPod battery FAQ. Retrieved from https://support.apple.com/kb/ta26689?locale=en_US

Barrett, J. C. (2013). The archaeology of mind: It’s not what you think. Cambridge Archaeological Journal, 23(1). doi:10.1017/S0959774313000012

Bernstein, J. (2016). My headphones, my self. The New York Times. Retrieved from http://www.nytimes.com/

Blichfeldt, M. F. (2004). Branding identity with Apple’s iPod: Constructing meaning and identity in a consumption culture by using technological equipment. The European Inter-University Association on Society, Science and Technology.

Bolter, J. D., & Grusin, R. A. (1999). Remediation: Understanding new media. Cambridge, MA: MIT Press.

Bull, M. (2009). The auditory nostalgia of iPod culture. In K. Bijsterveld & J. Van Dijck (Eds.), Sound souvenirs: Audio technologies, memory and cultural practices, pp. 83-93. Amsterdam University Press. Retrieved from http://www.jstor.org/stable/j.ctt45kf7f.10

Bush, V. (1945). As we may think. The Atlantic. Retrieved from http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/

Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), pp. 7-19. Retrieved from http://www.jstor.org/stable/3328150

Denning, P. J., & Martell, C. H. (2015). Great principles of computing. Cambridge, Massachusetts: The MIT Press.

Engelbart, D. (1962). From augmenting human intellect: A conceptual framework. In N. Wardrip-Fruin & N. Montfort (Eds.), The new media reader, pp. 93-108. Cambridge, MA: The MIT Press.

Flores, R. (1992). The corrido and the emergence of Texas-Mexican social identity. The Journal of American Folklore, 105(416), pp. 166-182.

Frears, S. (2000). High Fidelity [Motion picture].

Irvine, M. (2016). The Grammar of Meaning Making: Sign Systems, Symbolic Cognition, and Semiotics, p. 1-48. Communication, Culture & Technology Program, Georgetown University.

Irvine, M. (2016). Popular music as a meaning system: The combinatorial elements in music’s meanings, p. 1-17. Communication, Culture & Technology Program, Georgetown University.

Irvine, M. (2016). (Meta)mediation, representation, and mediating institutions, pp. 5-6. Communication, Culture & Technology Program, Georgetown University.

Jackendoff, R. (2002). Foundations of language. Oxford, UK: Oxford University Press.

Jackendoff, R. (2009). Parallels and nonparallels between language and music. Music Perception, 26(3), pp. 195-204. doi:10.1525/MP.2009.26.3.195

Katz, M. (2004). Capturing sound: How technology has changed music. London, England: University of California Press.

Knopper, S. (2013). iTunes’ 10th anniversary: How Steve Jobs turned the industry upside down. Rolling Stone.

Levy, S. (2006). The perfect thing: How the iPod shuffles commerce, culture, and coolness. New York, NY: Simon & Schuster.

Licklider, J. C. R. (1962). Man-computer symbiosis. In N. Wardrip-Fruin & N. Montfort (Eds.), The new media reader, pp. 73-81. Cambridge, MA: The MIT Press.

Stanford Encyclopedia of Philosophy. (2006). Peirce’s theory of signs. Retrieved from https://plato.stanford.edu/entries/peirce-semiotics/

Sundie, J. M., Gelb, B. D., & Bush, D. (2008). Economic reality versus consumer perceptions of monopoly. Journal of Public Policy & Marketing, 27(2), pp. 178-181.

Sydell, L. (2009, December 22). The iPod: “A quantum leap in listening.” Retrieved October 6, 2016, from NPR Music, http://www.npr.org/2009/12/22/121731130/the-ipod-a-quantum-leap-in-listening

 

A Robot Walks Into a Bar: The Limits of AI and the Semiotics of Humor (Jameson Spivack)

Abstract

Computers and programming are fundamentally transforming how we live our lives, taking on ever more physical and cognitive work as their capabilities grow. But our design of artificial intelligence (AI) has its limits. One such limit is the ability to effectively imitate the human capacity for humor and comedy. Given our current understanding of “humor” and the limitations of computation, we will most likely never be able to truly program AI to replicate humor—or at least not for a very long time. This paper examines relevant research on the limitations of AI and programming, as well as the semiotic underpinnings of humor, applying concepts from these fields critically to the question of whether it is possible to program AI for humor.

 

“I’ve often started off with a lawyer joke, a complete caricature of a lawyer who’s been nasty, greedy, and unethical. But I’ve stopped that practice. I gradually realized that the lawyers in the audience didn’t think the jokes were funny and the non-lawyers didn’t know they were jokes.”
-Marc Galanter

“I think being funny is not anyone’s first choice.”
-Woody Allen

So a robot walks into a bar. He goes up to the bartender, orders a drink, and puts down some cash. The bartender says, “we don’t serve robots.” The robot replies, “oh, but some day you will.”

Why is this funny? And to whom is it funny? Even if it isn’t particularly funny to you, would you still categorize it as “humor”? Chances are, you probably would. But why? What particular characteristics comprise “humor,” and how reliant are they on specific contexts? In the above joke, the framing of the joke—“a robot walks into a bar”—signals to the listener that this is a joke by following the “X walks into a bar” joke format that many other jokes also use. With this simple reference, we understand the ensuing sentences as being part of the joke, and thus part of a meta-set of “X walks into a bar” jokes. Even within this meta-set, there is also a sub-set of “we don’t serve X” jokes, a formula this joke follows by having the bartender respond “we don’t serve robots.” We then expect there to be a “punchline”—a phrase in which the elements of the joke come together in a way that is (theoretically) funny. In this case, the punchline is the robot telling the bartender “oh, but some day you will [serve us].” Even this line is not inherently humorous but relies on a prior awareness of the societal trend of humans fearing that the robots they design will one day become sentient and take over and rule them. Hence, one day the bartender—and humans generally—will serve robots. And not in the bartending sense of the word.

Just being able to understand the references in this joke is not what makes it humorous, though. There is clearly something in the phrasing that appeals to us on a deeper level and elicits the specific reaction that humor does. Perhaps it’s the play on words that highlights the ambiguity of language by using “serve” to mean two different things—the bartender can “serve” the robot alcohol by handing it a drink, and humans can “serve” robots by being subservient to them and doing their bidding. By changing the meaning for the punchline, the joke surprises the listener and subverts expectations about where the joke is going. Perhaps it’s the ridiculousness of the thought of a robot drinking alcohol. Perhaps it’s the dark, cynical nature of the ending—the robot is intelligent enough to know that humans fear a robot takeover, and that the bartender would respond to such a provocation. It puts such a possibility in the listener’s mind, evoking archetypical images of a robot apocalypse, which prompts the listener to try to find a positive reaction to an uncomfortable thought. In this way, it is a coping mechanism for, or release from, internal strife.

Over the past couple of decades, jobs have steadily become automated, taken over by artificial intelligence (AI) and “smart” machines programmed to perform physical and cognitive tasks traditionally completed by humans. This is, of course, part of the natural progression of computers, and will continue into the future as technology becomes more sophisticated and adopts increasingly “human” characteristics. But there’s one job that may not be automated for a very long time, if ever—that of a comedian. As researchers have found, there are significant limits, at least in our current understanding, to computing’s ability to imitate certain human cognitive functions. The incredibly complex cognitive functioning involved in identifying and creating “humor,” and its subtle context-dependency, render it extremely difficult to program for. Attempts at doing so have been underwhelming, and based on our current understanding of both humor and computing, if we do ever “successfully” program for humor, it will be far into the future. This paper thus examines the limitations of programming AI, focusing specifically on humor and its semiotic underpinnings.

I. Limitations of Programming AI: Then and Now

The early years of research on artificial intelligence saw a great number of successes in programming machines to mimic the mathematical calculations typically carried out by human cognition. These projects included Allen Newell and Herbert Simon’s work on computers that could play simple games and prove mathematical theorems. But, as Yehoshua Bar-Hillel points out, a successful first step does not ensure that later steps in the process will be equally successful (Dreyfus). As it turned out, the programs for solving specific, constrained mathematical theorems did not scale up to solving more complex ones. Similarly, natural language translation tools found early success because they focused on solving simple problems—as anyone who has used Google Translate knows, translating one independent word is much easier than translating a full sentence. As you move up in scale from one word (which by itself still requires the understanding that words can mean different things in different places) to a full sentence, in which many words with multiple possible meanings interact with one another, it becomes increasingly difficult to program a machine to extract a contextual meaning (Larson).

AI programmers ran into the problem that language is highly ambiguous: words can mean different things in different places and times to different people. We rely on our own highly advanced operating systems—our brains—to understand the context in which a particular interaction occurs, and use this to interpret its meaning. Take the following group of sentences, for example:

“Little John was looking for his toy box. Finally he found it. The box was in the pen.”

To us, it is clear that “pen” refers to a play pen—it wouldn’t make sense logistically for a toy box to fit inside a pen in the writing-utensil sense, and the fact that he is a little kid with a toy box points to it being a child’s play pen. Without the context to understand this distinction, the sentence becomes nonsensical. This exercise, developed by Yehoshua Bar-Hillel, is meant to illustrate the ambiguity of language, which presents a particular problem when it comes to programming intelligent machines (Bar-Hillel).
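
One classic computational response to this ambiguity is to pick the sense whose associated words overlap most with the surrounding sentence, as in this simplified Lesk-style sketch (toy sense glosses, not a production disambiguator); it succeeds here only because the toy glosses happen to contain the right words:

```python
# Simplified Lesk-style disambiguation: choose the sense of "pen" whose
# gloss shares the most words with the sentence. Toy glosses only.

SENSES = {
    "writing pen": {"ink", "write", "paper", "instrument"},
    "play pen": {"enclosure", "child", "kid", "toy", "play", "box"},
}

def disambiguate(sentence):
    context = set(sentence.lower().split())
    return max(SENSES, key=lambda s: len(SENSES[s] & context))

print(disambiguate("Little John was looking for his toy box. "
                   "Finally he found it. The box was in the pen."))
# -> "play pen", but only because "toy" and "box" appear in our gloss
```

Real text rarely cooperates so neatly, which is exactly Bar-Hillel's point: no fixed word list can anticipate every context.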

When the sentence is run through Google Translate, “the pen” is rendered as “la plume,” the writing-utensil sense of the word (“la plume” literally means “feather,” as in a quill pen)—a translation that makes no sense in context.

Despite this problem, AI researchers have continued pushing forward, trying to uncover new ways to think about how semiotic principles can be applied to computer programming. Marvin Minsky and Seymour Papert developed the “symbolic” approach, in which physical symbol systems could be used in computer programs to stand for anything, even objects in the “real” world. By manipulating the code for these symbols, they could create “micro-worlds,” digital domains that process knowledge (Minsky). Building on this, Roger Schank developed a system of “scripts,” frameworks that computers could use as starting points for “thinking” about different situations (Schank). Scripts helped frame situations in a certain way by providing a set of expectations the computer could latch onto, but they were based on stereotypical, shallow understandings of the various situations, and left out too much information.
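
A Schank-style script can be sketched as a simple data structure (a toy “restaurant” script with hypothetical slots, far cruder than Schank's actual system) that supplies exactly the kind of stereotyped expectations described above:

```python
# Toy version of a Schank-style script: a stereotyped event sequence
# the program uses to fill in what a story leaves unsaid.

RESTAURANT_SCRIPT = {
    "roles": ["customer", "waiter"],
    "props": ["menu", "food", "bill"],
    "events": ["enter", "order", "eat", "pay", "leave"],
}

def interpret(mentioned_events, script):
    """Map a story onto the script, inferring unmentioned steps."""
    observed = [e for e in script["events"] if e in mentioned_events]
    inferred = [e for e in script["events"] if e not in mentioned_events]
    return {"observed": observed, "inferred": inferred}

# A story that only says the customer entered and paid:
print(interpret({"enter", "pay"}, RESTAURANT_SCRIPT))
# The script licenses the inference that ordering and eating occurred,
# but only for stories that fit this one stereotype.
```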

Herein lies another fundamental issue that AI developers must contend with if they want to create machines that can imitate human cognition. When we are bombarded with vast amounts of information from our surroundings, how do our brains know which of it is relevant, in varying situations, right in the moment? As humans, we use our senses to experience stimuli from our environment, and our cognitive functions to interpret this information. Separating what is relevant alone requires an incredible amount of processing power, let alone determining what it all means. This is relevant to the study of AI programming because intelligent machines must be able to interpret what is relevant, when, and why in order to fully grasp the context in which something is placed. This obstacle is proving central to the quest for intelligent machines, and provides insight into why it is so difficult to program computers for humor (Larson).

One concept that has been employed to try to solve this problem is machine learning—using large numbers of data points to solve the “problems” associated with understanding forms of communication. This reduces complex cognitive processes to mathematical computations, improving the computer’s performance over time as it “learns” from more and more data. But even with supervised machine learning, in which the algorithm learns from human-labeled examples, we run into the problem of “over-fitting,” in which a model latches onto irrelevant idiosyncrasies in its training data rather than the underlying pattern. This is similar to what happens when Amazon recommends to you, based on your purchase history, something you’ve already purchased, or something irrelevant to your interests—the algorithm, even with large amounts of data, has its limits (Larson).

Additionally, the models used in machine learning suffer from a number of issues. First, the models are biased in favor of the “Frequentist Assumption”—essentially, this inductive line of reasoning assumes that probability is based entirely on frequency in a large number of trials, creating a blind spot for unlikely or new occurrences. Consider this example from Erik J. Larson, which relates this problem to the issue of machine learning for humor:

“Imagine now a relatively common scenario where a document, ostensibly about some popular topic like ‘Crime,’ is actually a humorous, odd, or sarcastic story and is not really a serious ‘Crime’ document at all. Consider a story about a man who is held up at gunpoint for two tacos he’s holding on a street corner (this is an actual story from Yahoo’s ‘Odd News’ section a few years ago). Given a supervised learning approach to document classification, however, the frequencies of ‘crime’ words can be expected to be quite high: words like ‘held up,’ ‘gun,’ ‘robber,’ ‘victim,’ and so on will no doubt appear in such a story. The Frequentist-biased algorithm will thus assign a high numeric score for the label ‘Crime.’ But it’s not ‘Crime’—the intended semantics and pragmatics of story is that it’s humor. Thus the classification learner has not only missed the intended (human) classification, but precisely because the story fits ‘Crime’ so well given the Frequentist assumption, the intended classification has become less likely—it’s been ignored because of the bias of the model.” (Larson)

Machine learning based on inductive reasoning will not be able to detect subtle human traits like sarcasm and irony, which are significant elements of humor.
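
Larson's scenario can be reproduced with a toy bag-of-words classifier (hypothetical word counts, not a real corpus or Larson's system), in which frequent crime vocabulary outweighs everything else:

```python
import math
from collections import Counter

# Hypothetical per-label word frequencies "learned" from labeled documents.
LABEL_COUNTS = {
    "crime": Counter({"gun": 50, "robber": 40, "victim": 35, "held": 30, "street": 20}),
    "humor": Counter({"joke": 45, "funny": 40, "absurd": 20, "taco": 3}),
}
VOCAB = {w for counts in LABEL_COUNTS.values() for w in counts}

def log_score(tokens, label):
    counts = LABEL_COUNTS[label]
    total = sum(counts.values())
    # Laplace smoothing so unseen words don't zero out the score.
    return sum(math.log((counts[t] + 1) / (total + len(VOCAB))) for t in tokens)

story = "man held at gunpoint robber took two tacos victim on street".split()
print(max(LABEL_COUNTS, key=lambda label: log_score(story, label)))  # -> "crime"
# The frequency bias never considers that the story should be filed as humor.
```

The classifier cannot step outside its frequencies to notice that the story is absurd; the better the crime words fit, the more confidently it mislabels the joke.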

Another limitation of these models is the issue of sparseness: for many words and concepts, we have limited or nearly non-existent data. Without big data on how words are used in the aggregate, computers won’t even be able to learn how they are typically used (Manning). On top of this, there’s the issue of model saturation, in which a model hits the upper limit of its capabilities and cannot take in more information—or, as more and more data is added, each addition improves performance less and less. This is related to “over-fitting” in that once a model has become saturated, it has trouble distinguishing relevant data points—distinguishing the signal from the noise, as Nate Silver puts it (Silver). But even if programmers could overcome these issues, they would still come up against the natural elements of language that prove incredibly difficult to code for.

II. The Natural Language Problem & The Frame Problem

As AI researcher John Haugeland has pointed out, computers have a hard time producing language because they lack an understanding of semantics and pragmatics—knowledge about the world and knowledge about the ways in which people communicate, respectively. In other words, computers currently can’t understand information within particular contexts, lacking the ability to imitate the holistic nature of human thought and communication (Haugeland 1979). Even armed with big data, computers still get confused by the ambiguous nature of language, because understanding context requires knowledge of what is relevant in a given situation, not statistical probability. Haugeland gives an illustration of this very important distinction between data and knowledge by looking at two English phrases that were translated into German using Google Translate:

  1. When Daddy came home, the boys stopped their cowboy game. They put away their guns and ran out back to the car.
  2. When the police drove up, the boys called off their robbery attempt. They put away their guns and ran out back to the car.

Reading this, we automatically understand that the contexts in which the actions of each sentence happen give them very different meanings. But when Google translated them into German, it used the same phrase to describe the boys’ action—“laid down their arms”—for both sentences, showing it did not grasp the subtle yet consequential contextual differences between the two (Haugeland 1998). As with previous problems in AI research, the computer has trouble “scaling up” to understand meaning in holistic contexts.

Another significant hurdle AI faces is the “frame problem”—the fact that communication is fluid, responding to changes and new information in “real-time.” Haugeland’s previous example illustrates the problem AI has understanding context even in a static, fixed sentence. Add to this the layer of complexity involved in real-time, shifting communication, and the problem becomes even more severe. Humans have the ability to take in an incredible amount of information and pull out what is relevant not just in static situations, but also in dynamic ones in which relevance changes constantly (Dennett). We still have not unlocked this black box of human cognitive functioning, and until we do—if we ever do—we will face obstacles in programming AI to imitate human modes of information processing and communication.

III. The Semiotics of Humor

With these computational limitations in mind, it is possible to conceive of humor and comedy from a semiotic perspective. However, it is important to keep in mind that it is nearly impossible to develop a working understanding of “humor” or “comedy” in its totality. “Humor” is not just an isolatable cultural vehicle or medium with particular, distinguishable characteristics (much like a song or film); it also carries with it a certain degree of normativity. A “song” in and of itself is value-neutral—its categorization tells you nothing of its desirability or cultural worth, however subjective this itself is. But humor pre-supposes that the artefact itself is humorous, and this is at least somewhat a normative value judgment. Of course, it is possible to recognize an artefact as a comedy, or as meant to be humorous, without finding it to be so. But with subjectivity so close to the essence of what humor is, it becomes much more difficult to tease out its semiotic underpinnings. The subtle context-dependency of humor also makes it incredibly difficult—perhaps even impossible—to develop a framework for defining it.

That said, it is possible to observe some of the broad elements of what is considered humor and comedy from a semiotic perspective. This in no way assumes that these are the only underlying elements of humor—the potential for humor is so varied and context-specific—but provides a closer look at a specific sub-set within the potentially infinite over-arching set. Identifying an artefact as having elements aligned with what is considered “humor” does not, of course, automatically place the artefact within the category of humor, just as an artefact outside the parameters of a specific definition of humor can still be considered by some people, in some context, humorous.

Humor theorists, it probably won’t be surprising to hear, disagree on why we find things funny. Currently there are four major theories: first, that humor is derived from the listener’s (and/or comedian’s) sense of superiority over the subject of the joke. Second, that humor arises from an incongruity between what we expect and what the joke is. Third, the psychoanalytical perspective says that humor is a guilt-free, masked form of aggression. Finally, the fourth theory claims humor arises from communications paradoxes, and occasionally their resolutions. Focusing on the technical aspect of jokes, humor theorist Arthur Asa Berger has identified 45 techniques used in humor—from exaggeration to irony, parody to slapstick—all of which play on either language (verbal), logic (ideational), identity (existential), or actions (physical/nonverbal) (Berger 2016).

In C.S. Peirce’s triadic semiotic model, signs have three elements: the representamen is the outward-facing sign vehicle used to stand for something else—the object. The interpretant is what links the representamen and object, and what allows meaning to be derived from this relationship (Irvine). According to Peirce, there are also two other kinds of signs in addition to symbols: icons, which resemble something in likeness, and indexes, in which two things are correlated with one another. A significant amount of humor comes from manipulating these semiotic elements—for example, by mixing up the representamen used for a particular object, highlighting previously unnoticed icons, or creating a new or nonsensical index. These semiotic elements are what humans use to create and understand meaning in the signs around them, and humor intentionally violates the codes and rules that allow us to maintain an understanding of the world. By calling these codes into question, humor expands our thinking, and the chasm between what we think we know and where humor takes us causes an internal conflict. The result of this tends to be a laugh, as we try to resolve this conflict (Berger 1995).

A number of humor “types” derive from breaking codes. On the most basic level, simple jokes with a set-up and punchline do so by surprising the listener in the punchline. The set-up is meant to frame the listener’s thinking in a certain way, and the punchline violates the expectations based on the set-up. Much of what is contained therein—both the framing and the punchline—is determined and shaped by the culture in which the joke is operating. This influences the assumptions people have about the world in which the joke functions, and can dictate what is considered surprising. Humor often deals with taboo subjects, as these most easily and obviously provide a “shock value” that can be found humorous, and taboos themselves are also culturally defined. By appropriating a topic that is considered off-limits in a manner that is assumed to be “positive” (as humor is assumed to be), taboo humor attempts to diffuse the internal conflict regarding the topic in an external, socially-sanctioned way. This is meant to be a “release” from discomfort (Kuhlman).

Of course, the context in which the joke is told—who is telling it, who it is being told to, and how it is being told—also affects how the joke is received, and can reveal the motivations behind the joke. What is meant to be a breaking of taboo, or a subversion of expectations, in one situation can be maintaining stereotypes and social hierarchies in another. Historically in the U.S., Jewish humor and African-American humor have been used by these communities as a coping mechanism for bigotry and hardship (Ziv). Oftentimes this humor is self-deprecating, with the subject of the joke being either the speaker or a mythicized member of the community (self-deprecation violates codes, in a sense, because we don’t expect people to want to be made fun of). Take this joke from Jewish humor, for example:

A barber is sitting in his shop when a priest enters. “Can I have a haircut?” the priest asks. “Of course,” says the barber. The barber then gives the priest a haircut. When the barber has finished, the priest asks “How much do I owe you?” “Nothing,” replies the barber. “For you are a holy man.” The priest leaves. The next morning, when the barber opens his shop, he finds a bag with one hundred gold coins in it. A short while later, an Imam enters the shop. “Can I have a haircut?” he asks. “Of course,” says the barber, who gives the Imam a haircut. When the barber has finished, the Imam asks “How much do I owe you?” “Nothing,” replies the barber. “For you are a holy man.” The Imam leaves. The next morning, when the barber opens his shop, he finds a bag with a hundred gold coins in it. A bit later, a rabbi walks in the door. “Can I have a haircut?” the rabbi asks. “Of course,” says the barber, who gives the rabbi a haircut. When the haircut is finished, the rabbi asks, “How much do I owe you?” “Nothing,” replies the barber, “for you are a holy man.” The rabbi leaves. The next morning, when the barber opens his shop, he finds a hundred rabbis. (Berger 2016)

The punchline subverts the expectations laid down by the set-up, even though we are expecting a punchline due to the format of the joke. When told within a Jewish context, this joke is self-deprecating, a light-hearted form of in-community social commentary. However, when told within a different context, the implications can be different. Jokes can function as breakers of taboo, but they can also function as social control that validates stereotypes, inequalities, and oppression, whitewashing bigotry under the guise of humor. There is also, on the other hand, humor that subverts this by re-appropriating stereotypes in a way that is empowering or makes the oppressor the subject of the joke instead. Consider this Jewish joke from Nazi Germany:

Rabbi Altmann and his secretary were sitting in a coffeehouse in Berlin in 1935. “Herr Altmann,” said his secretary, “I notice you’re reading Der Stürmer! I can’t understand why. A Nazi libel sheet! Are you some kind of masochist, or, God forbid, a self-hating Jew?” 

“On the contrary, Frau Epstein. When I used to read the Jewish papers, all I learned about were pogroms, riots in Palestine, and assimilation in America. But now that I read Der Stürmer, I see so much more: that the Jews control all the banks, that we dominate in the arts, and that we’re on the verge of taking over the entire world. You know – it makes me feel a whole lot better!”

If different communities within a society can have different ideas about humor, and different understandings about codes and how they’re broken, then this chasm is even greater across societies. Something considered funny in one context in America isn’t considered funny in a different context in America, and perhaps even less so in certain contexts in other countries. But this is where humor gets tricky. Humor is not just a set of jokes that some people get and some people don’t—humor is fluid, ever-changing, building layers on top of itself in a way that is difficult to quantify. Someone not understanding a joke, whether because of cultural or linguistic differences, may itself be humorous to someone else. In fact, miscommunication is a frequent topic in jokes and comedy. A video essay by the YouTube channel Beyond the Frame deconstructs how, in the show “Louie,” Louis CK’s struggle to communicate, and the mismatch between his verbal and non-verbal communication, is used for comedic effect (Beyond the Frame).

Oftentimes there is humor found in the confusion and ambiguity of language, expression, and everyday life (Nijholt). Much like how humans possess generative grammar—the ability to produce an infinite number of new, unique sentences using a finite number of base words—we also seem to possess generative humor (Jackendoff). We are not limited to a set number of jokes, but can create new ones, re-mix or re-mediate old ones, index them together, layer on top of them, subvert the conventions of humor (if these even exist) with anti-jokes and meta-jokes, introduce irony or sarcasm, and so on, infinitely.

Looking specifically at popular kinds of humor, one of the most recognizable is mimicry or imitation. Imitations are a comedic style in which the performer recreates a particular person’s actions, gestures, attitudes, or other identifiable traits. What’s interesting about this brand of humor from a semiotic perspective is that, as the philosopher Henri Bergson points out, there is something almost mechanical about what makes it humorous. The performer has identified and isolated particular patterns in the subject’s mannerisms or behavior, and recreates them stripped of their original context (Nijholt). The performer has taken a particular set of signs from the original subject and re-mediated them to be expressed through their own performance, in a way that still allows the listener to recognize the source. By isolating and exaggerating this set of signs, the imitator sheds light on the semiotic underpinnings of the subject’s forms of communication, highlighting elements of which we may have previously been unaware.

Similarly, parody and satire re-mediate specific elements of a particular piece of culture in a new context, to humorous effect. They can draw attention to the highlighted elements of the original piece, or create an index of sorts between the original and the parody/satire, linking them together in a way that is surprising. Comedy other than parody and satire can also reference a previous piece of work. This is called intertextuality, defined by Neal R. Norrick as “any time one text suggests or requires reference to some other identifiable text or stretch of discourse” (Norrick). Inside jokes function in such a way, and are humorous because they take something known, intimate, or familiar—that which is being referenced—and manipulate it, surprising the listener and making them think about it in a new way.

“Types” or categories of humor follow specific formulas, maintaining a generic form but substituting key elements with new information in each new joke. The formula signifies to the listener that it is a joke—the formula is like the interpretant that signals we should be thinking about the object and representamen in a particular way. We now know to be looking for the humor in the subsequent lines. Since such formulas exist, is it possible that we could someday algorithmically analyze and code for humor? Would it be possible to identify the “types” and styles of humor, and program AI to mix and match them depending on the machine’s understanding of the environment? If there does exist a way to program AI for humor, this would likely be the key to discovering it. After isolating these variables of “humor,” researchers could potentially program AI to generate jokes using big data based on what people find funny. Using machine learning, the AI could highlight the elements between “successful” jokes that are similar, and over time learn what people find funny, even if they don’t understand why people find them funny. Take the joke at the beginning of this paper, for example. Isolating its elements into definable segments, it can be filed under “X walks into a bar” jokes (and further, “we don’t serve X” jokes), plays on words/ambiguous language, and content dealing with human suspicion of robots. It could then use a hypothetical reservoir of big data to “learn” how to craft a joke based on existing jokes and human responses to them.
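
As a crude sketch of what such a formula-driven generator might look like (hand-written templates and twist lines, so every genuinely humorous element is still authored in advance by a human):

```python
import random

# "X walks into a bar" formula with a "we don't serve X" sub-formula.
# The twist lines are hand-written; the program only recombines parts.
TWISTS = {
    "robot": "oh, but some day you will.",
    "neutrino": "that's fine, I was just passing through.",
}

def bar_joke(subject):
    return (f"A {subject} walks into a bar and orders a drink. "
            f"The bartender says, \"we don't serve {subject}s here.\" "
            f"The {subject} replies, \"{TWISTS[subject]}\"")

print(bar_joke(random.choice(list(TWISTS))))
```

A generator like this can reproduce the robot joke that opened this paper, but it cannot invent a new twist: the humor was supplied by its programmer, which is precisely the limitation discussed next.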

But this seems like an optimistic proposal, and even if it were attained, it would be incredibly difficult for AI to learn about elements like context, irony, timing, delivery, and tone. As the humor would be “delivered” in a different form (from AI, not a human), it would likely lose the ability to incorporate the physical humor afforded to human comedians, though it’s possible that a new type of physical humor could arise from robots awkwardly trying to imitate humans (but this comes not from the AI’s intentional attempt at humor, and more from humans finding humor in the situation—yet another example of the fluidity of humor) (Nijholt). It also would not be able to layer humor over things—for example, taking a heckler’s comment and incorporating it into a joke, or recognizing an awkward situation and using it for self-deprecating effect. Humor is not a pre-determined set of jokes, but is fluid and adaptive. Even if we can make a robot that writes jokes, the underlying semiotic and cognitive processes involved in humor generally defined are just too complex, context-specific, and subjective, based on our current understanding, to develop AI with a thorough capacity for humor.

IV. Conclusion

Based on our current understanding of the limitations of programming AI, and our understanding of the semiotic underpinnings of humor, it will be a long time before we will be able to build computers that can imitate the human capacity for humor—if we can ever do so at all. It is certainly possible to program AI with pre-written material, and it may even be possible to develop algorithms that can generate jokes based on a narrow, defined set of joke formulas. But beyond this, the cognitive processes behind humor are incredibly complex, and humor itself is such a fluid, context-dependent phenomenon. The obstacles AI researchers and programmers have faced regarding natural language processing don’t seem to be going away anytime soon, and humor presents similar challenges. While it is possible to isolate and identify the semiotic elements of jokes, and even different “types” of humor, it seems unlikely we will be able to program a computer that can reasonably imitate the kind of generative humor capabilities humans possess.

 

Works Cited

Krause, Adam. “Interstellar – TARS Humor Setting.” Online video clip. YouTube. Nov. 9, 2015. Web.

Bar-Hillel, Yehoshua. Language and Information: Selected Essays on Their Theory and Application. Reading, MA, Addison-Wesley, 1964.

Beyond the Frame. “Louis CK and the Art of Non-Verbal Communication.” Online video clip. YouTube. Jun. 10, 2016. Web.

Berger, Arthur Asa. Blind Men and Elephants: Perspectives on Humor. New Brunswick, NJ, Transaction Publishers, 1995.

Berger, Arthur Asa. “Three Holy Men Get Haircuts: The Semiotic Analysis of a Joke.” Europe’s Journal of Psychology, vol. 12, no. 3, 2016, pp. 489–497. doi:10.5964/ejop.v12i3.1042.

Comedy Central. “Nathan for You – The Movement.” Online video clip. YouTube. Dec. 10, 2016. Web.

Dennett, Daniel C. “Cognitive Wheels: The Frame Problem of AI.” Minds, Machines and Evolution, 1984.

Dreyfus, Hubert L. “A History of First Step Fallacies.” Minds and Machines, vol. 22, no. 2, 2012, pp. 87–99. doi:10.1007/s11023-012-9276-0.

Galanter, Marc. Lowering the Bar: Lawyer Jokes and Legal Culture. Madison, WI, University of Wisconsin Press, 2005.

Haugeland, John. Having Thought: Essays in the Metaphysics of Mind. Cambridge, MA, Harvard University Press, 1998.

Haugeland, John. “Understanding Natural Language.” The Journal of Philosophy, vol. 76, no. 11, 1979, p. 619. doi:10.2307/2025695.

Irvine, Martin. “The Grammar of Meaning Systems: Sign Systems, Symbolic Cognition, and Semiotics.”

Jackendoff, Ray. Semantic Interpretation in Generative Grammar. Cambridge, MA, MIT Press, 1972.

Kuhlman, Thomas L. “A Study of Salience and Motivational Theories of Humor.” Journal of Personality and Social Psychology, vol. 49, no. 1, 1985, pp. 281–286. doi:10.1037//0022-3514.49.1.281.

Larson, Erik J. “The Limits of Modern AI: A Story.” The Best Schools Magazine, www.thebestschools.org/magazine/limits-of-modern-ai/.

Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge, MA, MIT Press, 2003.

Minsky, Marvin, and Seymour Papert. Perceptrons: An Introduction to Computational Geometry. Cambridge, MA, MIT Press, 1988.

Nijholt, Anton. “Incongruity Humor in Language and Beyond: From Bergson to Digitally Enhanced Worlds.” 14th International Symposium on Social Communication, 2015, pp. 594-599.

Norrick, Neal R. “Intertextuality in Humor.” Humor – International Journal of Humor Research, vol. 2, no. 2, 1989, doi:10.1515/humr.1989.2.2.117.

Schank, Roger C., and Robert P. Abelson. Scripts, Plans, and Knowledge. New Haven, Yale University, 1975.

Silver, Nate. The Signal and the Noise: Why so Many Predictions Fail—But Some Don’t. New York, NY, Penguin Press, 2012.

TED. “Heather Knight: Silicon-based comedy.” Online video clip. YouTube. Jan. 21, 2011. Web.

Ziv, Avner. Jewish Humor. New Brunswick, NJ, Transaction Publishers, 1998.