
Designing for Learning


Max Wilson


E-Learning is already a large and booming industry. With a wealth of software options and providers in the individual, academic-institution, and corporate-training spaces, what are the core elements behind e-learning technology writ large? This paper approaches the question through an overview of learning norms and case studies of a basic language-learning program, an academically oriented course-delivery company, and a corporate skills-training platform. Analysis reveals that each is a different iteration of a Learning Management System. Differentiated primarily by content, the unifying elements of content delivery and management through SCORM offer companies and users a plethora of options for the combinatorial design of applications and platforms tailored directly to customer needs.


“The Return on Investment promise of higher education is gone.”  Melissa Bradley, Adjunct Professor, McDonough School of Business, speaking at the McGowan Symposium on Business and Ethics at Duke University, November 9, 2018.

Traditional four-year higher education programs are being scrutinized for their applicability in a world where workers need to continually learn and adapt to new challenges alongside the technological tools that are changing how we live, work, and play. High sticker prices are met with skepticism when the compensation of the jobs they lead to pales in comparison to that of self-taught “basement” programmers. As concern about affordable access to relevant education has grown, so have the availability of computers, the level of internet access, and the power and affordability of content hosting and delivery services. In 2011, two Stanford professors experimented with recording their classes and releasing the videos online. Shortly afterwards they left their teaching roles to start Coursera, an online platform with the explicit purpose of sharing the best university courses with the world at large.[1]

Coursera was revolutionary when it went live because of the high-profile institutions and professors behind the courses being taught; however, the fundamentals of the service were nothing groundbreaking. Distance learning is not a new concept, dating back to early mail-based correspondence programs.[2] Computer-based learning was also not new: Rosetta Stone was providing customers with boxed sets of CD-ROMs in 1992 that promised immersive language learning through your newest home appliance, the personal computer.

Over the past decade, those seeking to learn new skills for work or life have faced an ever-growing number of options, formats, and providers to choose from. Furthermore, these e-learning solutions for individuals represent just a sliver of a marketplace dominated by corporate buyers seeking to onboard new hires efficiently, offer employees cost-effective and convenient professional development, and upskill workers as new technologies continue to challenge employees’ existing skill sets.

From a business or social standpoint, these e-learning solutions simply repackage age-old content and teaching styles into a new medium for consumption. Since the final product is minimally differentiated from the service it replaces, are the multitude of platforms and systems on the market truly different from one another? Below, I examine three e-learning systems that, from a user perspective, constitute very different use cases. By de-blackboxing these software platforms, I expect to find a very consistent set of modular technical building blocks that are fundamentally simple to develop, select from, and recombine in a combinatorial manner. I believe a thorough understanding of the core elements of these Learning Management Systems will empower me in my future work to help clients select their ideal providers and design for success in their workforce development programs.

Professional Relevance and Market Insight

Through gathering sources and researching the technical elements of these platforms, I found no true market leader in the Learning Management System (LMS) space. Instead, I found many extremely similar platforms making even more similar claims of valuable user outcomes, filled with the buzziest of terms, including “artificial intelligence,” “machine learning,” and “innovative.” The preponderance of relatively equal market options, particularly for business-oriented products, tells me three things about the nature of LMS software:

  1. The fundamental technical modules that comprise an LMS are relatively simple, easy to build, and do not represent a significant hurdle to the development of a market ready LMS.
  2. LMS platforms meet basic customer needs adequately, but are largely not differentiated in the user experience they provide, otherwise one would be likely to rise to the top of the market.
  3. Companies using an LMS desire enough specific, tailored features that designing a universally applicable, off-the-shelf (OTS) LMS is very difficult.

In my postgraduate career as a Digital Strategy and Change Management Consultant, I will be involved in creating content for client LMSs, selecting and implementing externally produced LMSs, and designing internally created ones. A thorough understanding of the modular elements of LMS software will help me improve upon the existing approaches on the market and those being developed for proprietary use by e-learning companies or internal training departments.

Comparative Case Studies

Digital Artifacts, Learning Interfaces, and Cognition

The three e-learning software platforms I will be addressing are simply the most recent iteration of educational methods and environments. Early human life was full of experiential lessons. Over time we developed language to communicate immediate threats and advance collective security through sharing lessons experienced by others in the community. Even as we used language to communicate, we were also depicting our experiences in cave paintings and other visual, symbolic representations of the world around us. There is much debate about the purpose of these symbols, but a common theory holds that some were likely used to communicate important information about the world: which animals were of value or posed a threat, the timing of seasons, family lineages, and so on. This early development of visual learning is a crucial precursor to modern e-learning.[3]

E-Learning rests on a foundation of teaching techniques that have their roots in the early communication of risks, history, and the lessons of nature. Teaching psychology singles out three distinct types of learning and learners: visual, tactile, and auditory. As research on learning styles has become more widely known, classroom teachers have worked to adapt their curricula to find a happy compromise among all three, ensuring every student receives some portion of learning in their best style.[4] Fundamental to learning through a digital interface is participation with the content. Icons and images support visual learning; the need to select answers on the screen or type out responses provides some degree of tactile engagement; and videos, audio support, and “interaction sounds” (clicks, dings, etc.) trigger auditory learners.[5]

In addition, the affordances of the digital medium, as defined by Janet Murray, seem to align well with providing learners on digital platforms elements suitable to each learning style. As we will see, each of the following platforms accesses and delivers (1) encyclopedic knowledge, requires (2) spatial interpretation of the icons, images, and elements used in the lesson, follows (3) procedures in learning familiar to a broad spectrum of users, and engages the user in (4) participatory interaction with the content in order to move through lessons.[6] Given the affordances inherent in the medium of these platforms, let’s see how they differentiate themselves from one another from a user perspective and, more importantly, how they differ, or not, within their separate black boxes.


As a great example of nothing being “new,” Duolingo is the modern, mobile, lightweight kid cousin of the pioneer of computer-based foreign language training, Rosetta Stone. Duolingo offers English-speaking users the ability to study any of 32 languages, including two fictional ones, Klingon and High Valyrian, and non-English-speaking users the ability to learn English as well as a few select other languages depending on their native tongue. The process of learning is one of context-based vocabulary acquisition through repetition and translation. While the platform’s effectiveness for true language learning is highly debated, it is nonetheless a popular piece of mobile software for engaging in a form of language learning. So how does this software deliver lessons to users?

One hint to the inner workings of the app is how the structure of lessons is displayed and made available to users. Each lesson “group” is displayed as an element of a tree of groups. The tree hierarchy shows the evolution of the lessons from one group to the next, visually hinting at the progressive nature of the education method. Additionally, only the immediate next tier of lessons, along with all previously completed lesson groups, is available for users to work through. This provides a clear structure to the training program by preventing users from jumping ahead without first going through the introductory lessons. All told, this progressive hierarchy of lessons suggests a similar hierarchy on the backend of the program.




It is useful to think about what this sort of language training would look like in the analog world to understand how it is built and organized behind the scenes.

In a physical environment, the Duolingo lessons are most similar to successive sets of themed flash cards. Each lesson takes the user through a discrete set of new words and phrases, in a predefined order that slowly builds user familiarity with new elements. As the user progresses, these elements are combined to make more complicated elements that introduce new concepts. Even as the content evolves, the method of delivery remains the same. Users are presented with a phrase and several translation options to choose from, a phrase and a keyboard to fill in the missing elements, or simply an array of icons on which they must tap pairs of words with their English equivalents. All of these elements are simply digital versions of a deck of flashcards, supplemented by the fill-in-the-blank and translation exercises one would expect from an entry-level classroom language course.

Given this modularized lesson format, we can envision a rather simple artifact based hierarchical database working behind the scenes. Each lesson group must have an associated set of digital artifacts, our flash cards, with individual lesson elements encoded into each. These lesson elements are then queued up and presented to the user as they work through a lesson set. If a user makes a mistake and provides an incorrect answer that individual lesson element is “reshuffled” into the remaining “cards” of elements and presented to the user again before they complete the lesson.
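The queue-and-reshuffle behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not Duolingo’s actual implementation; the card contents and the `answer_fn` callback are hypothetical stand-ins for the lesson elements and the learner’s responses.

```python
import random
from collections import deque

def run_lesson(cards, answer_fn, rng=None):
    """Present each 'flash card' in order; a missed card is shuffled
    back into the remaining deck and shown again before the lesson ends."""
    rng = rng or random.Random(0)
    queue = deque(cards)
    attempts = 0
    while queue:
        card = queue.popleft()
        attempts += 1
        if not answer_fn(card):
            # Reinsert the missed card at a random spot among the remaining cards
            remaining = list(queue)
            remaining.insert(rng.randrange(len(remaining) + 1), card)
            queue = deque(remaining)
    return attempts

# A simulated learner who misses "la mujer" once before getting it right
misses = {"la mujer": 1}
def answer(card):
    if misses.get(card, 0) > 0:
        misses[card] -= 1
        return False
    return True

total = run_lesson(["el hombre", "la mujer", "el nino"], answer)  # 3 cards + 1 retry
```

The lesson only ends when every element has been answered correctly, which is exactly the “reshuffled into the remaining cards” behavior a user experiences.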

This retrieval system is a straightforward, preprogrammed arrangement of elements, functioning similarly to pages in a book. From a content development standpoint, the biggest hurdle for Duolingo would have been the creation of the first set of language lessons: the ordering and presentation of the vocabulary and lesson elements. Once one language was complete, the sequential hierarchy of elements could be repurposed through roughly direct translation into each subsequent language offering. While limited in scope of content, the content retrieval and delivery mechanism (starting with the user’s selection of content, then presentation through a mobile or desktop interface, and sequenced completion of a discrete set of procedural elements) is a fundamental system of processes behind all the e-learning platforms reviewed here.

Where Duolingo also provides a simplified example of a more complex operation performed by our next few cases is in the Learning Management System, or LMS. One of the affordances of providing content through a digital system that is not offered by traditional book based individual learning is the ability to track and certify progress toward an objective in a dynamic manner. Users, as identified by their username and login credentials, become a discrete tracked element in the Duolingo LMS. As users complete lessons, a record of completion is made in their user accounts. These completion “notes” indicate to the system which lesson groups should remain open to the user the next time they log in, and serve as the keys needed by users to access the next set of content in the lessons. Duolingo takes this tracking a step further into the user experience, making it an element of their method of gamification.
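A minimal sketch of such a completion record and unlock mechanism might look like the following. The group names and prerequisite tree are invented for illustration and do not reflect Duolingo’s real course data.

```python
# Hypothetical lesson-group tree: each group lists its prerequisite groups.
PREREQS = {
    "Basics 1": [],
    "Basics 2": ["Basics 1"],
    "Phrases":  ["Basics 1"],
    "Food":     ["Basics 2", "Phrases"],
}

def available_groups(completed):
    """A group is open if every prerequisite appears in the user's
    completion record; completed groups stay open for review."""
    return {
        g for g, reqs in PREREQS.items()
        if g in completed or all(r in completed for r in reqs)
    }

def complete(record, group):
    """Write a completion 'note' to the user's record, which acts as
    the key that unlocks the next tier of lessons."""
    if group not in available_groups(record):
        raise ValueError(f"{group} is still locked")
    return record | {group}

record = set()
record = complete(record, "Basics 1")
record = complete(record, "Basics 2")
```

The user record itself is the only state the system needs: the tree of open and locked groups shown in the interface is simply recomputed from it at login.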

When users sign up for the software, they indicate a desired amount of progress they would like to make in their language learning each day. Upon completion of each sub-lesson, they receive positive feedback in the form of a dial increasing towards their daily goal. Gamification is a fundamental element used by e-learning software to enhance stickiness with users, ideally increasing user engagement with the software and thereby improving outcomes. Duolingo not only incorporates this gamified element of daily achievement into the in-app user experience but also capitalizes on the user’s indicated commitment to remind them, via pop-up notification on their mobile device, to return to the app and complete their lessons for the day.

Next, we will see how the fundamental LMS elements that make Duolingo successful provide the foundation for more in-depth e-learning solutions.


While Duolingo gamifies language learning to keep users engaged, cutting out traditional classroom elements such as lectures and assignments, Coursera leans into the value of the classroom experience. With a stated purpose of bringing the best university classes to students at every stage and place in life, Coursera mimics the in-class learning experience, including videos of lectures, assignments, peer discussion, and exams. With over 150 university partners, Coursera’s content offerings are its source of differentiation among e-learning companies. However, while the content is interesting, it is not inherent to the technology behind Coursera; instead, it is a product of successful sales and marketing. At its core, Coursera is another example of a software product that is essentially a Learning Management System.

A Learning Management System is the interface between e-learning content developers/admin and users that serves as a repository and distribution hub for the educational content offered. LMS software also tracks information about user progress through courses, completion and performance on assignments and exams, and other details about user engagement including time spent on the platform. While our next case is on a specific corporate oriented LMS software product, Coursera offers an example of a custom LMS product.

To meet the specific needs of its stakeholders, Coursera has developed its own proprietary LMS, which comprises some of the most common technical modules seen in LMS software:

  1. A repository of course content sectioned off according to content developer, learning topic, and a variety of paywalls.
  2. A portal for development of content by university partners with the support of Coursera IT and content teams.
  3. A multimedia hosting and delivery service for efficient storage and consistent streaming of video content.
  4. A learner communication system for learners registered in the same courses to communicate with each other about course content and collaborate on group exercises.
  5. A learner progress tracking system with a consistent testing mechanism.
  6. A gamification engine that introduces game like elements into the learner experience to increase engagement and stickiness of the platform.
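To make the modularity concrete, here is a toy sketch of how two of these modules, the content repository and the progress tracker, might compose into an LMS that serves each learner the next unfinished unit. All class and method names are hypothetical illustrations, not Coursera’s actual architecture.

```python
from dataclasses import dataclass, field

@dataclass
class ContentRepository:
    """Module 1: stores course content, keyed by course identifier."""
    courses: dict = field(default_factory=dict)
    def publish(self, course_id, units):
        self.courses[course_id] = list(units)

@dataclass
class ProgressTracker:
    """Module 5: records which units each user has completed."""
    records: dict = field(default_factory=dict)
    def mark_complete(self, user, unit):
        self.records.setdefault(user, set()).add(unit)

@dataclass
class LMS:
    """The LMS itself just wires independent modules together."""
    repo: ContentRepository
    tracker: ProgressTracker
    def next_unit(self, user, course_id):
        done = self.tracker.records.get(user, set())
        for unit in self.repo.courses.get(course_id, []):
            if unit not in done:
                return unit
        return None  # course finished (or unknown course)

lms = LMS(ContentRepository(), ProgressTracker())
lms.repo.publish("ml-101", ["lecture-1", "quiz-1", "lecture-2"])
lms.tracker.mark_complete("ada", "lecture-1")
```

Because each module hides its internals behind a small interface, any one of them (a different repository, a gamification engine, a communication system) could be swapped in or added without rewriting the rest, which is the combinatorial design point made throughout this paper.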

The value of using a proprietary LMS to manage the delivery of university content to learners is the creation of a consistent user experience across all courses. Through consistent supporting elements to the content, users develop a relationship and familiarity with the Coursera way of teaching. However, not all academic endeavors lend themselves to logically ordered learning and evaluation, making electronic delivery of diverse subjects challenging.

Coursera’s initial offerings were focused on math and the computer sciences, subjects well suited to empirical evaluation. Assignments and exams in these areas can be developed and administered as multiple-choice or fill-in questions with a limited number of answers (e.g., 2 × 4 = 8). However, higher education is not limited to STEM, so universities and students interested in Coursera understandably expected access to a more holistic range of courses. This demand posed a new problem for the Coursera model: how to grade the assignments of 30,000+ students per course when “Barack Obama,” “Obama,” and “the President” were all acceptable answers?

Instead of waiting for, or developing in house, the machine based human language analysis required to assess open ended writing prompts, Coursera turned to its vast array of students as a potential solution. Building on the learner communication system of the platform, they introduced a peer-to-peer grading system. Facing a variety of levels of user commitment, challenges in consistent interpretation of rubrics, and often opaque user incentives, this peer grading element is an area where Coursera continues to iterate. More important for this analysis however, is how this need to introduce a system for human grading highlights a clear constraint of e-learning.

Just as language software like Duolingo will likely always fall short in comparison to a real-world language immersion program, computer-based learning will continue to struggle to assess more than discrete, measurable answers. While this shortfall will certainly hamper the value of such technology for teaching complex transferable skills like advanced creative problem solving and human-centered design, the bulk of e-learning applications, exemplified by the next case, competently meet discrete customer requirements.


When most people hear the phrase “corporate training,” an image comes to mind of a bland hotel conference room, filled with rows and rows of chairs, facing a drop-down screen featuring a text-heavy PowerPoint. Just as employees dread these boring training sessions, so do their employers, who have to shell out money for trainers, space, and supplies while giving up days of employee productivity. Employee onboarding and training comprise a large part of the e-learning market, and Lessonly is just one example of software available to corporations.

As an off-the-shelf LMS, Lessonly offers the same core capabilities as the Coursera LMS: easy, non-technical, drag-and-drop content creation; tracking of learner progress and performance; assessment and feedback; and content hosting. Fundamentally, Lessonly stores, organizes, and retrieves content for learners just like Duolingo and Coursera. Where it differs is in its customers’ requirement, in this case companies, that it fit into the network of tools and systems they use to run their business.

Lessonly and other OTS LMS options on the market achieve the fit desired by companies through support for third-party plugins. Through plugins for common business-stack applications like Zenefits, Salesforce, and Slack, companies can integrate training with employees’ daily activities. Completion of programs can be tied to sales outcomes in Salesforce; managers can assign trainings to Slack teams to keep everyone up to speed with new practices and procedures; and progress along various career development pathways feeds into performance metrics in Zenefits, the human capital management platform.

Through integration with existing corporate technology stacks, Lessonly functions as just one more modular application within a larger bundle of tools used to manage company operations. What is interesting about Lessonly is not how it is unique but how it is the same as so many products on the market. While variants may be focused on managing traditional education environments, sales and customer service training, or positioned as an alternative to traditional education entirely, they all provide the same content creation, management, and delivery elements described above. So how does such a universal set of features actually work?

Sharable Content Object Reference Model (SCORM)[7]

Behind all three of these platforms is content built according to SCORM, a model that guides developers in creating units of learning content that can easily be shared across systems. “Sharable content objects” created according to this model can be interpreted by different operating systems and content delivery mechanisms, enabling users working through a variety of interfaces to interact with centralized content stored in an LMS.

At a basic technical level, SCORM guides developers on how to package content for delivery and run-time communication. The packaging specifications enable the LMS to know which units of content should be accessed in response to user prompts, what type of content each unit is, the name needed for retrieval, and so on. This is essentially the naming and identification scheme of the flash cards in the Duolingo context and of the videos and exercises in the Coursera and Lessonly cases. Run-time communication feeds into the delivery and tracking features of the LMS. What prompts does the user receive while the content is running? What information is recorded as the user works through the system? How is completion of the content handled in the user’s record in the LMS?
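In a real SCORM 1.2 player, the run-time API is a JavaScript object exposing calls such as LMSInitialize, LMSSetValue, and LMSFinish, and content reports progress through data-model elements like cmi.core.lesson_status. The Python sketch below only mimics that call sequence to show what the LMS ends up recording; it is a toy stand-in, not a conforming implementation.

```python
class ScormRuntime:
    """Toy stand-in for the SCORM 1.2 run-time API: the content unit
    writes to the 'cmi' data model and the LMS persists it on commit."""
    def __init__(self):
        self.cmi = {"cmi.core.lesson_status": "not attempted"}
        self.persisted = {}

    def LMSInitialize(self):
        # Launching a unit moves its status from "not attempted" to "incomplete"
        self.cmi["cmi.core.lesson_status"] = "incomplete"
        return "true"

    def LMSSetValue(self, element, value):
        self.cmi[element] = value
        return "true"

    def LMSCommit(self):
        # Write the working data model through to the user's LMS record
        self.persisted = dict(self.cmi)
        return "true"

    def LMSFinish(self):
        return self.LMSCommit()

# A content unit reporting a score and completion back to the LMS
rt = ScormRuntime()
rt.LMSInitialize()
rt.LMSSetValue("cmi.core.score.raw", "87")
rt.LMSSetValue("cmi.core.lesson_status", "completed")
rt.LMSFinish()
```

Because every compliant unit speaks this same small vocabulary, the LMS never needs to know anything about the content itself, which is precisely what makes sharable content objects portable across platforms.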

As long as content follows the SCORM model, it can be used, sold, and repurposed by companies, platforms, and users across any number of compliant platforms.

LMS and the Future of Learning

When set next to each other, the cases of Duolingo, Coursera, and Lessonly illustrate how the e-learning environment is notable more for its content than its technology. At a basic level, each of these applications takes traditional models of education and learning and simply repackages educational units for delivery as digital artifacts. Relying on SCORM, or one of a few similar models for sharable content, each application is built around a relatively simple LMS responsible for handling the inputs of content developers and the interactive consumption of that content by learners.

Because SCORM is universally available guidance for unit creation, the differentiating elements of these platforms are not the underlying technology but the content itself and the interface through which makers and users interact with it. Coursera’s commitment to university partners for creating the best content is similar to the battle between Netflix and Hulu over content aggregation and creation rather than over less differentiable features. However, due to the complex and dynamic nature of learning, e-learning systems that can combine this simple LMS technology with tools for greater tailoring of content to individual user needs and environments stand a chance to break away from the pack of basic offerings.

Already we see discrete software applications taking advantage of GPS and Augmented Reality technology to provide users with a significantly more tangible learning experience in museums and national parks. Similar to the Netflix algorithms that interpret user preferences to suggest content in line with viewer tastes, the data gleaned from the delivery of dynamic rather than static content and its associated assessment measures could be used to identify the optimum method of content delivery for each learner. With so many potential applications of the basic LMS skeleton in education and business, the question of creating a unique LMS or adopting an OTS version must be grounded in the end goal. Starting from an objective-setting analysis of the purpose of the LMS and the affordances desired for a particular use case, the basic building blocks of an e-learning platform are readily available to build on. The real challenge remains whether companies, institutions, and individuals can pause to think about what they really need from their learning programs or whether they will jump at the newest and shiniest set of marketing pitches.

[1] Leber, Jessica. “The Technology of Massive Open Online Courses.” MIT Technology Review. Accessed December 13, 2018.

[2] “Correspondence Education | Britannica.Com.” Accessed December 14, 2018.

[3] Tattersall, Ian. “How We Came to Be Human.” Scientific American, 2006, 66–73.

[4] Dunn, Rita, Jeffrey S. Beaudry, and Angela Klavas. “Survey of Research on Learning Styles.” California Journal of Science Education II, no. 2-Spring, 2002 (n.d.): 75–98.

[5] There is much debate about whether the participatory nature of digital interfaces triggers the learning and memory centers of the brain as well as the tactile use of pencil and paper does; for brevity’s sake, that is a debate we must put aside for another time.

[6] Janet Murray, Inventing the Medium: Principles of Interaction Design as a Cultural Practice. Cambridge, MA: MIT Press, 2012. Selections from the Introduction and chapters 1-2.

[7] “SCORM Explained: In Depth Review of the SCORM ELearning Standard.” Accessed December 15, 2018.


How Can You Mute Your Voice on iPhone?


Banruo Xiao


The purpose of this paper is to de-blackbox the mute function embedded in the social media applications that users tap many times a day on their smartphones. The paper shows that the one-click effect is neither easy nor simple; rather, it is a complicated process in which the Internet, software, and hardware work together to achieve the result. To this end, the paper addresses the Internet and the software and hardware of a smartphone separately, explaining how each functions and how they work together.


Technology often brings surprise and excitement to users in their daily lives. It is hard to imagine how people could talk to each other across long distances before the cell phone and the Internet were created. Now we have the smartphone, of which the iPhone is a typical example. Numerous applications, following the creation of the smartphone, are designed to make using the Internet more convenient. Users do not even need to pay to make online phone calls on social media applications. Technology can do more than that: when someone makes a call, he or she can mute his or her own voice while the call continues, with a simple click on the mute function. It is remarkable that one can hear the recipient’s voice while no one can hear one’s own. This paper focuses on the mute function that each social media application embeds and on how it technically works from the designer’s point of view. The paper is divided into two parts. The first part discusses how users can make online phone calls through the Internet. The second part pays attention to how the function of muting one’s voice with a simple click is technically achieved on the iPhone.

How can people make online phone call?

It is common sense that we are now in a digital age. The Internet connects devices together. Users can acquire and share information and communicate through the Internet. It seems that we are all on the Internet. However, from the computer scientist’s and Internet developer’s point of view, the Internet is designed as a complex system containing multiple layers and various modules. The layers and modules work together to provide a user-friendly interface for people who know nothing about computer or Internet design, like me. In other words, the design of the Internet is far more complex than users realize. Although the Internet is a product of complex design thinking, it follows many universal principles.

  • How the process works and the definition of some key terms

To allow users to communicate online, whether sending messages or making online phone and video calls, Internet-networked devices rely on protocols, the methods for sending and receiving data packets (Irvine, 2018). Transmission Control Protocol (TCP) and Internet Protocol (IP) are the two most important communication protocols. TCP breaks information and messages into pieces called packets and reassembles the packets into the original information. IP is responsible for ensuring the data packets are sent to the right destination. The Internet works primarily “end to end” to make sure data packets are sent and received correctly from one connecting point to another (Gralla, Troller, 2006). In this sense, the Internet is also known as a packet-switched network. To understand and interpret the protocols, devices must have a socket or a TCP/IP software stack.
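The break-up and reassembly that TCP performs can be caricatured in a few lines of Python. Real TCP uses byte-level sequence numbers, acknowledgments, and retransmission, none of which are modeled here; the sketch only shows why sequence numbers let a receiver restore the original message even when packets arrive out of order.

```python
import random

def packetize(message, size):
    """Split a message into numbered packets, as TCP breaks a byte
    stream into segments; sequence numbers let the receiver reorder."""
    return [
        {"seq": i, "payload": message[i:i + size]}
        for i in range(0, len(message), size)
    ]

def reassemble(packets):
    """Sort by sequence number and concatenate, restoring the original
    message regardless of the order in which packets arrived."""
    return "".join(p["payload"] for p in sorted(packets, key=lambda p: p["seq"]))

pkts = packetize("hello over the internet", 5)
random.Random(1).shuffle(pkts)   # simulate out-of-order arrival over IP routes
restored = reassemble(pkts)
```

IP, by contrast, only carries each packet to the right destination; it is the sequence numbers added at the TCP layer that make the end-to-end reconstruction possible.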

The technique that allows users to make online phone calls is Voice over Internet Protocol (VoIP), which uses TCP/IP to deliver voice messages. Relying on VoIP, the process of making an online phone call is simple. One speaks into the microphone attached to the device. The VoIP phone transforms the voice signal into digital data and compresses it for easier delivery through the Internet. The compressed, digitized voice signal is then broken down into packets. The voice packets are sent to the IP voice gateway nearest to the destination. The IP gateway takes the voice packets and, through a process of recombining, uncompressing, and converting them back to their original form, sends the voice signal through the normal Public Switched Telephone Network. The recipient can listen through speakers and a sound card or through an earphone connected to the device via a USB port (Gralla, Troller, 2006). Currently, most smartphones adopt Voice over Long Term Evolution (VoLTE), which uses VoIP to achieve network communication (2017).
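The digitize-compress-packetize steps of this pipeline can be sketched as follows. The 8-bit quantizer and run-length “codec” here are deliberately toy stand-ins for the real audio codecs used by VoIP and VoLTE; the point is only the order and reversibility of the stages.

```python
def digitize(analog_samples, levels=256):
    """Quantize analog amplitudes (floats in [-1, 1]) to 8-bit values,
    a crude stand-in for the codec's analog-to-digital step."""
    return [min(levels - 1, int((s + 1) / 2 * levels)) for s in analog_samples]

def compress(samples):
    """Toy run-length compression standing in for a real voice codec:
    repeated samples collapse to (value, count) pairs."""
    out = []
    for s in samples:
        if out and out[-1][0] == s:
            out[-1][1] += 1
        else:
            out.append([s, 1])
    return out

def decompress(runs):
    """The gateway's inverse step: expand runs back to the sample stream."""
    return [s for s, n in runs for _ in range(n)]

voice = [0.0, 0.0, 0.5, 0.5, 0.5, -0.25]   # microphone amplitudes
digital = digitize(voice)                    # analog -> digital
runs = compress(digital)                     # digital -> compact payload
received = decompress(runs)                  # gateway restores the stream
```

Quantization loses a little fidelity (as any real codec does), but everything after digitization is reversible, which is why the voice emerging from the gateway sounds like the voice that went in.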

  • A broader picture of the process: modularity, layering, and the Internet's extensibility and scalability

The complete process of making an online phone call reflects numerous Internet design principles, and the central insight is that the Internet is not a single integrated product. Behind the interface we usually see, many layers and modules, each serving different objects and purposes, work together to form the whole. Modularity and layering shape the architecture of the Internet; the goal behind these principles is to make components more independent while still allowing them to work together efficiently.

According to Barbara van Schewick, modularity employs abstraction, information hiding, and a strict separation of concerns to make the Internet more user-friendly. More specifically, modularity separates visible information from hidden information: users need to see only the visible information to accomplish their goals, while designers can access the hidden information to develop their modules. In this case, from the user's side, the only actions are opening an application and calling someone. The TCP/IP and VoIP machinery is hidden, and only the application designer needs to know how it works.

Layering, in turn, is a special form of modularity that constrains the dependencies among modules. A lower layer may interact only with its neighbors and provides services to the layer above it, while higher layers are protected from changes in lower layers. Layering helps reduce the complexity of the network, and the end-to-end argument determines where the functionality of each layer belongs. In this case, TCP and IP are two layers that operate separately, yet they work together whenever a voice signal needs to be sent to its destination.
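The constraint that each layer talks only to its immediate neighbor can be made concrete with a toy protocol stack. The class names and the string "wrapping" are purely illustrative; real stacks wrap binary headers rather than text.

```python
class PhysicalLayer:
    def send(self, bits):
        # Lowest layer: raw transmission over the wire.
        return f"wire({bits})"

class IPLayer:
    def __init__(self, lower):
        self.lower = lower  # knows only the layer directly below it
    def send(self, packet):
        return self.lower.send(f"ip[{packet}]")

class TCPLayer:
    def __init__(self, lower):
        self.lower = lower
    def send(self, data):
        return self.lower.send(f"tcp[{data}]")

# Each layer wraps the data and hands it down to its neighbor only.
stack = TCPLayer(IPLayer(PhysicalLayer()))
print(stack.send("voice"))  # wire(ip[tcp[voice]])
```

Because `TCPLayer` holds a reference only to the layer beneath it, replacing the physical layer (say, WiFi for Ethernet) requires no change to the layers above, which is exactly the protection from lower-layer changes described above.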

Modularity and layering also give the Internet room to grow. The ability to add an effectively unlimited number of modules and layers solves two design problems: scalability (how the design scales to an unlimited number of connections) and extensibility (how new modules and layers can be added to the common architecture). As long as the protocols function correctly, neither problem needs further concern.

  • The mute function from the user's perspective

Once the process of making an online phone call is clear, it is straightforward to understand how the mute function works. Basically, mute behaves like an on/off toggle switch. From the user's side, simply tapping the mute icon on the screen turns off the microphone embedded in the phone. In terms of the process explained in the previous section, no further voice signal is digitized or compressed, so the subsequent steps simply never happen. From the developer's point of view, however, the process is not that simple: many questions must be answered before this "one-tap" effect is achieved. For example, how does touching the screen turn on the mute function at all?
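The toggle-switch behavior is easy to sketch. In this hypothetical model, muting does not delete anything downstream; it simply stops the signal from entering the pipeline at its source:

```python
class Microphone:
    def __init__(self):
        self.muted = False

    def toggle_mute(self):
        # The mute icon flips this single piece of state.
        self.muted = not self.muted

    def capture(self, sound):
        # When muted, no signal is digitized, so nothing ever
        # reaches the compression and packetizing steps.
        return None if self.muted else sound

mic = Microphone()
assert mic.capture("hello") == "hello"
mic.toggle_mute()
assert mic.capture("hello") is None
```

Cutting the pipeline off at the first stage is why all the later steps (compression, packetizing, sending) end automatically, as described above.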

  • How to actually achieve a mute status on a smartphone

Before addressing that question, this section first "de-productizes" a smartphone: examining its components shows how each part cooperates to satisfy the user's demands.

  • The components of a smartphone

The first and most obvious component is the display, an interactive interface that lets users interact with the device. Today there are two main types of display: one based on LCDs and the other based on LEDs. According to Apple's official website, the newest iPhone uses an LCD-based display, meaning that the light users see is generated by a light source on the other side of the display shining through a set of filters (FOSSBYTES, 2017). The next component is the battery; in most brands of smartphone it is a built-in rechargeable lithium-ion battery.

Perhaps the most important item in a phone is the "system on a chip," or SoC, which comprises the CPU, GPU, an LTE modem used for communication, a display processor, a video processor, and other bits of silicon that turn it into a functional system. Apple's own chipsets use ARM's system architecture.

In addition, each device contains Random Access Memory (RAM) and storage memory. RAM works with the CPU to increase processing efficiency and extend battery life, while storage memory, which comes in various capacities, is used for internal storage. On the outside, all smartphones come with rear-facing and front-facing cameras, each comprising up to three main parts: a light-detecting sensor, the lens, and the image processor.

In addition, there are five main sensors that allow a smartphone to provide its touch-enabled functionality. They are: “

  1. Accelerometer: Used by applications to detect the orientation of the device and its movements, as well as allow features like shaking the phone to change music.
  2. Gyroscope: Works with the Accelerometer to detect the rotation of your phone, for features like tilting phone to play racing games or to watch a movie.
  3. Digital Compass: Helps the phone to find the North direction, for map/navigation purposes.
  4. Ambient Light Sensor: This sensor is automatically able to set the screen brightness based on the surrounding light, and helps conserve battery life. This would also explain why your smartphone’s brightness is reduced in low-light environments, so it helps to reduce the strain on your eyes.
  5. Proximity Sensor: During a call, if the device is brought near your ears, it automatically locks the screen to prevent unwanted touch commands.” (FOSSBYTES, 2017)

Indeed, an iPhone contains far too many components to list them all here. Other elements crucial to the mute function include the three microphones, the earpiece speaker, the lower speaker enclosure, the top speaker assembly, and board chips containing the gigabit LTE transceiver, the modem, the WiFi/Bluetooth module, and the touch controller.

  • Touch screen and how it works with other parts to achieve the mute function

The component most obviously involved, though, is the touch screen. To support touch commands, the screen includes a layer of capacitive material, with the iPhone's capacitors arranged in a coordinate grid. "Its circuitry can sense changes at each point along the grid. In other words, every point on the grid generates its own signal when touched and relays that signal to the iPhone's processor. This allows the phone to determine the location and movement of simultaneous touches in multiple locations" (How the iPhone Works, 2007). The touch screen detects touch in two ways: mutual capacitance and/or self-capacitance. "In mutual capacitance, the capacitive circuitry requires two distinct layers of material. One houses driving lines, which carry current, and the other houses sensing lines, which detect the current at nodes. Self-capacitance uses one layer of individual electrodes connected with capacitance-sensing circuitry. Both of these possible setups send touch data as electrical impulses" (How the iPhone Works, 2007). Later iPhone models combine the capacitive touch-sensing layer and the LCD display layer into a single layer.

The iPhone's processor, together with software on the logic board chip, interprets input from the touch screen. The capacitive material sends raw touch-location data as electrical impulses to the processor, and the processor asks software located in memory to interpret the raw data as a command or gesture. The interpretation process analyzes the size, shape, and location of the affected area to determine which gesture the user made, combining the physical movement with information about which application the user was using and what that application was doing. The processor may then send commands to the screen and other hardware. In the mute function's case, when a user on a call taps mute in an application, the processor follows the steps above and sends a command to turn off the microphone. Meanwhile, other hardware, including the RAM, LTE transceiver, WiFi/Bluetooth module, and modem, continues working to transfer the Internet signal, as discussed in the first part of this paper. In general, on the hardware side, the processor on the logic board chip is the most important component for carrying out all the steps required to fulfill the mute command.
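The "raw location data to command" step can be modeled as simple hit-testing: the software checks which on-screen control the touch coordinates fall inside and dispatches that control's command. The button names and coordinate rectangles below are entirely hypothetical.

```python
# Hypothetical screen layout: each control occupies a rectangle of
# (x_min, x_max) and (y_min, y_max) coordinates on the touch grid.
BUTTONS = {
    "mute":    {"x": (10, 60),  "y": (400, 450)},
    "hang_up": {"x": (70, 120), "y": (400, 450)},
}

def interpret_touch(x, y):
    """Map a raw touch coordinate to the command of the control it hits."""
    for name, area in BUTTONS.items():
        if area["x"][0] <= x <= area["x"][1] and area["y"][0] <= y <= area["y"][1]:
            return name
    return None  # touch landed outside every control

assert interpret_touch(30, 420) == "mute"     # inside the mute button
assert interpret_touch(200, 200) is None      # empty area: no command
```

Real gesture interpretation also weighs the size, shape, and movement of the touched area, as the paragraph notes, but location-based dispatch is the core of the "one tap becomes one command" behavior.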

  • Design principles and concepts: affordance (the icon), interface (touch screen and others), modularity, computational thinking

The whole process at work in the iPhone illustrates several design principles and concepts. For example, the mute icon, used by almost every social application that embeds the function, clearly signals that no more speech will be transmitted. In design terms, according to Martin Irvine (2018), something affords an action or a certain interpretation when its use seems to be an "obvious" inference; such an artefact is called an affordance, and it leaves visual cues about how to use it. In fact, the "obvious" inference never arises automatically; it is a product of socialization, and humans acquire it through social learning. The touch screen that enables users to operate their smartphone can likewise be seen as an interface, defined as anything that connects two different systems across their boundaries. In this case, the touch screen is the interface connecting the user and the smartphone; more than that, it is an interactive interface that allows the user to act on the device. Finally, the idea of modularity applies here as well: each component of the iPhone runs individually, but the components work together to carry out the command.



Overall, the process of muting one's voice involves Internet protocols that digitize, compress, and send the voice signal to its destination. At the same time, relying on the touch screen as an interactive interface, the smartphone lets users simply touch the display, which brings the processor, microphone, speaker, and software together to complete the process. Most fascinating of all, Internet design and hardware design share similar design principles, suggesting that universal design principles form a solid foundation for any technological design.


van Schewick, B. (2012). Internet Architecture and Innovation. Cambridge, MA: The MIT Press. Excerpt from Chap. 2, "Internet Design Principles."

Elnashar, A., El-Saidny, M. A., & Mahmoud, M. (2017). Practical Performance Analyses of Circuit-Switched Fallback and Voice Over LTE. IEEE Transactions on Vehicular Technology, 66(2), 1748–1759.

M. P. (2017, November 03). Inside the iPhone X: First teardown reveals two batteries. Retrieved from

Gamet, J. (n.d.). iPhone 4: Finding the Hidden Hold Button. Retrieved from

Gralla, P., & Troller, M. (2007). How the Internet Works (8th ed.). Indianapolis, IN: Que Pub.

Wilson, T. V., Chandler, N., Fenlon, W., & Johnson, B. (2007, June 20). "How the iPhone Works." Retrieved 13 December 2018.

Irvine, M. (2018). "Introduction to Affordances and Interfaces."

Irvine, M. (2018). "The Internet: Design Principles and Extensible Futures (Why Learn This?)"

(n.d.). Apple iPhone 7 Teardown. Retrieved from


Meitu app re-examined, from a design perspective


Xiaoman Chen


Meitu has been the leading photo-editing app in China for ten years and is now expanding its presence worldwide. Its popular functions include photo editing, one-touch beautification, and an AI art painter. The biggest reason for its rise lies in its organic combination of cutting-edge technologies such as image enhancement programs, facial recognition, and image generation. Meitu itself made no novel technological breakthroughs, but by optimizing the use of existing technologies it displays the power of combinatorial design. Another reason is its wide applicability: it is available both for PCs and for mobile devices running various operating systems. In terms of user interface, Meitu also sets multiple forms of constraints to ensure the usability of the application. As a phenomenally popular image-editing app that interacts with individuals and with society as a whole, Meitu is gradually reshaping the face of daily life and culture. This paper scrutinizes the design aspects behind its popularity: technical dependence, user interface, and Meitu's social role.


In China, it is a typical scene: people stretch out their arms, take a selfie in less than three seconds, and then spend half an hour retouching blemishes or adding filters before posting it on social media. Image-editing apps are widely used in China, whose selfie consumers may be the most advanced in the world. According to the statistics, each user has more than two photo-editing apps installed on their phone. Among them all, Meitu is undoubtedly the superstar of this realm.


Figure 1. Statistics: each user installed an average of 2.4 apps. (Image source:

Figure 2. Meitu logo. (Image source:

Meitu started off as a PC photo editor in 2008 and later launched on mobile devices in 2011. A powerful photo editor gone viral, the Meitu app has built a rich portfolio of offerings by integrating advanced photo-imaging technologies into one platform and introducing them into everyday life through an intuitive interface. With a mission of making the world a more beautiful place, it has led mainstream aesthetic trends and helped improve users' social lives. That is why, after ten years of growth and in the face of newborn competitors, it remains the dominant image-processing app in China and worldwide. So far, Meitu ranks No. 1 by average daily new users and No. 2 by weekly active penetration rate. Globally, it is available in more than 26 countries and has been downloaded more than one billion times. Meitu is a phenomenon.

Figure 3. Photo editing app–daily new user average. (Image source:

Figure 4. Photo editing app–weekly active penetration rate. (Image source:

In order to get a general understanding of how it works, first we can take a quick tour on its interface and main features.

Tap the pink icon bearing the characters "Meitu," which mean "beautiful picture," and we arrive at the main page, which offers multiple options for photo editing. All of the modules are presented with understandable icons that imply their functions: a camera for taking photos, a magic wand for basic editing, a woman's figure for reshaping portraits, grids for collages, a notebook for tips, and so on.

Figure 5. Meitu chart-flow

Among all the modules on the main page, "Edit" and "Beautify" are the two essential parts that aggregate the core functions. "Edit" gathers a number of relatively standard photo-editing options, such as "auto-enhance," "crop," "brightness," "text," and "eraser." "Beautify," by contrast, retouches the human face or body shape: enhancing skin, erasing acne and wrinkles, slimming the face, making one look taller, and so on. To operate these functions, all we need to do is drag a bar on the page or simply touch the screen. The core functions are essentially Photoshop operations, but in a "one-touch" form. Another star feature in Meitu is ArtBot Andy, an AI robot that repaints selfies in a choice of styles and visual effects. At the time of writing, a total of 712,042,886 users had witnessed the ArtBot's "tech magic."

Overall, the interface of Meitu is straightforward; details of the functions mentioned above are discussed in the following sections. Meitu did not invent any novel technologies throughout its development, yet it remains a user favorite. What makes it possible? What makes it user-friendly? This article answers those questions by re-examining the Meitu app from a design perspective: combination, constraints, and the sociotechnical system.

What makes Meitu possible?

  1. Technology combination

In his book The Nature of Technology, W. Brian Arthur argues that a core principle of technology is the combination of existing elements, which are themselves technologies. Every novel technology, according to Arthur, is born as such a hybrid of existing components. So was the Meitu app.

When people call the Meitu app an innovator, they are talking about its innovation in presenting multiple functions together. As a commercial success in the mobile photo-editing market, Meitu clearly understands the power of combining existing technologies. Throughout its development, Meitu has produced no new technological breakthroughs of its own; instead it adopts existing technologies onto one platform, where they all come into play collaboratively.

Figure 6. Meitu neural network inference framework. (Image source:

  • Image processing programs

"P图," a phrase meaning "to edit photos," was not popular in China until the rise of Meitu. The "P" comes from the name of a professional graphics editor, Photoshop, developed by Adobe Systems. The first version of Adobe Photoshop was released in the late 1980s, when photo-editing computer programs were first emerging. It does have magical features, precisely modifying a picture with a variety of color adjustments, filter toning, highlight coverage, local processing, free transformation, and more. However, such a complex system requires a large amount of manual adjustment and is not something every user can tackle.

As mobile devices became ubiquitous, photo-editing apps emerged one after another. They can access the camera at any time by calling the system API (application programming interface) and collect images from it for processing. The magical features of photo apps are actually realized by integrated, stylized, template-based series of image-processing programs.

Figure 7. MTenhance: Image Enhancement. (Image source:

Removing defects from an image, such as spots and acne on a face, is usually a matter of changing the color and grayscale of the surrounding skin. If a part of the photo shows a relatively large difference in gray level, it is detected as "noise." The secret to clearing acne, therefore, is actually the "noise reduction" of image processing. This is the most common way to handle images and includes various algorithms, such as the averaging filter, which replaces a pixel's original value with the average of the surrounding gray levels in order to lower the gray-level difference and make the noise less obvious. Through the user interface, users who remove spots with a simple drag on the adjustment bar are actually setting a threshold for the noise reduction.
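A minimal version of this thresholded averaging filter can be written directly on a grid of gray levels. This sketch is deliberately simplified (pure Python lists, a 3x3 neighborhood, border pixels skipped); production filters work on full image arrays with more sophisticated kernels.

```python
def mean_filter(gray, threshold=30):
    """Replace pixels that differ sharply from their neighbours' average,
    smoothing 'noise' such as spots while leaving even areas untouched.
    The threshold models the value set by the user's adjustment bar."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighbours = [gray[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            avg = sum(neighbours) // len(neighbours)
            if abs(gray[y][x] - avg) > threshold:
                out[y][x] = avg  # the "spot" disappears into its surroundings
    return out

skin = [[120, 120, 120],
        [120, 255, 120],   # 255 = a bright "spot" of noise
        [120, 120, 120]]
assert mean_filter(skin)[1][1] == 120
```

Dragging the bar toward a stronger effect corresponds to lowering the threshold, so that smaller gray-level differences also get averaged away.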

Similarly, most popular filters are characterized by overexposure, low contrast, and offsets of shadow and highlight hue. In 2011, Instagram pioneered the "automatic filter" by combining brightness, contrast, and saturation adjustments in a single process.

Simply put, in terms of basic photo editing, Meitu is a lightweight version of Photoshop with lower technical requirements. Although image-editing software like Meitu seems newer and more powerful, its basic image-editing programs were already mature and are similar to those of the past.

  • Face recognition

Another important feature of Meitu is reshaping one's face and eyes and adding make-up or accessories automatically. This function is realized by a facial recognition system capable of identifying or verifying a person from a digital image or video source. Facial recognition can be traced back some 60 years and has been used in many areas: security systems, financial authentication, brand and PR agencies, and so on. In the past decade it has been used in entertainment apps with increasing frequency.

Figure 8. MT Face: Face-related technology. (Image source:

The realization of face adjustment follows a clear structure: face detection, key-point positioning, region harmonizing, color blending, and edge blending. Face detection determines how many faces appear in a photo; it has now largely overcome the problems caused by face angles, changes of expression, light intensity, and so forth. The basis of face detection is key-point positioning, that is, finding where the nose is and where the eyes are, a process often achieved with neural-network machine learning. After key-point positioning, the operation of "reshaping" the face has a foundation: once the outline of the face is found, its shape can be changed through certain calculations and graphics transformations, and the same is true for the eyes, eyebrows, and mouth. Then, to coordinate the face with the picture into which it will be implanted, the system color-harmonizes the face to ensure color consistency and image fusion. The last step is to implant the facial features into the prepared template.
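The five-stage structure above can be sketched as a simple pipeline of functions. Every function here is a hypothetical stand-in (real detection and blending involve trained models and pixel operations); the point is only the shape of the flow: detect each face, then push it through the stages in order.

```python
# Hypothetical stage functions, one per step of the pipeline described above.
def detect_faces(image):
    return [{"box": (40, 40, 120, 120)}]          # one face found

def locate_keypoints(face):
    face["eyes"] = [(60, 70), (100, 70)]          # key-point positioning
    return face

def reshape(face):
    face["reshaped"] = True                       # outline transformed
    return face

def harmonize_color(face):
    face["color_matched"] = True                  # color consistency
    return face

def blend_edges(face):
    face["blended"] = True                        # fused into the template
    return face

def adjust_face(image):
    results = []
    for face in detect_faces(image):
        for stage in (locate_keypoints, reshape, harmonize_color, blend_edges):
            face = stage(face)
        results.append(face)
    return results

done = adjust_face("selfie.jpg")
assert done[0]["blended"] and done[0]["reshaped"]
```

Structuring the feature as a fixed sequence of independent stages is itself an instance of the modularity principle discussed in the first paper: each stage can be improved (say, a better key-point model) without touching the others.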

Figure 9. Deep Neural Network. (Image source:

Nowadays, with the help of deep neural network technology, Meitu can carry out this adjustment process in real time. Turn on the front camera, and we see a beautified self.

  • Image segmentation and generation

Andy the ArtBot, billed as the world's first AI painting robot, is now a superstar within the Meitu app. Meitu Inc. has been researching and developing artificial intelligence for years: in 2010 it established the Meitu Imaging Laboratory (MTlab), and in 2012 MTlab began to focus on artificial intelligence and deep learning. Andy is MTlab's latest outcome, and more specifically its latest successful case of combining image segmentation and image generation technology.

Figure 10. MTgenerate: Image generation. (Image source:

Such techniques have previously been applied to painting. Google's AutoDraw, for example, matches a user's sketch with an existing image in its database to complete the picture, and at the Davos Forum Kai-Fu Lee demonstrated a robotic arm doing a painting. Compared with image matching and robotic-arm painting, completely repainting a portrait, as Andy does, is not brand new but is somewhat more complicated.

By dismantling the technologies used in Andy, we find the following steps:

First, Andy learned from a large corpus of illustrations, on the basis of which it created a generic painting model. That is the long-term process of image generation. The core of the image generation technology is the production network "Draw Net," developed by MTlab, which constructs painting models through big-data analysis and deep learning. The artistic styles of those models are all generated from a database of various compositions and brush strokes.

Second, after seeing the user's selfie, Andy uses facial recognition technology (discussed above) to grasp the contours and facial features of the user.

Then it locates the hair, clothes and background areas by using image segmentation technology.

Finally, by using the painting model to present the main features, Andy finishes his job.

This is not the first time Meitu has applied image generation and segmentation technology. At the beginning of 2017 the app launched a "hand-drawn" feature, its initial attempt at combining facial and segmentation technologies. Before Andy's birth, generation technology also powered a playful feature that showed users what they would look like if they were European: through big data and deep learning, the "machine" mastered the facial features of people from different countries and then used Draw Net to generate a network that converts the user's Asian features into European ones. Andy, in this sense, is a mature form of this combinatorial revolution.

  2. Software & hardware dependence

With these existing technologies borrowed, the software is ready to be implemented. The next step is to find a "medium" in which to place it. Here, the medium refers to two things: software and hardware.

On the software side, the medium is the operating system (OS): a collection of software that manages the hardware and provides services for programs like Meitu. It hides hardware complexity, manages computational resources, and provides isolation and protection. Meitu is no different from any other program. To use Meitu on a laptop or smartphone, people do not have to speak binary or comprehend the program's machine code; they understand it through a streamlined graphical user interface (GUI). Through this interface we can work on an image with a mouse or a finger, clicking and seeing the changes happen right in front of us. All of this translation work is done by the translator in the device, the operating system. Most of us use one every day: Windows, Mac, Linux, Android, iOS, and so on. These operating systems share key elements. The first is abstraction: they reduce the machine to a set of manageable concepts such as processes, threads, files, sockets, and memory. The second is mechanism: the main actions an OS performs include creating, scheduling, opening, writing, and allocating.

Hardware, as the name implies, includes the tangible components of a computer: for a personal computer, the motherboard, central processing unit, memory, storage, monitor, mouse, and keyboard; for a mobile device, the display, camera, application processor, sensors, and memory. Although these seem to have nothing to do with Meitu itself, software and hardware are the two prerequisites for Meitu and every other application.

The secret behind Meitu's large user base lies partly in its low hardware and software requirements. In 2008 Meitu was born as desktop photo-editing software. Unlike Photoshop, which places relatively high demands on hardware (processor, RAM, hard-disk space, and so on), Meitu is much more lightweight. Moreover, compared with its mobile competitors, Meitu is compatible with multiple operating systems: Android, iOS, Windows, iPad, and Windows Phone.

"Meiyan camera," also called the beauty camera, is a trendy function in the Meitu app that gives users auto-beautified selfies in real time. The operating process of the Meiyan camera is a good case study for wrapping up all the combinatorial components in the Meitu app.

Figure 11. Flowchart: how beauty camera works

When users turn on the beauty camera and start taking photos, Meitu connects to the camera and to the ambient light sensor embedded in the smartphone to detect the surrounding environment. If the light is too dark and causes too much noise, the application automatically turns on noise reduction (image denoising) and exposure correction to make sure the result is noise-free and bright. Meanwhile, to realize the beautification function, the app turns on facial recognition to grasp the main physical features in the picture and then applies other image-editing techniques to achieve effects such as smoothing skin and enlarging eyes. Thanks to Meitu's synergy and combinatorial design, the whole process of taking a photo and retouching it runs smoothly within three seconds.
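The per-frame decision flow just described can be sketched as a small conditional pipeline. The step names and the light threshold are invented for illustration; a real implementation would read the sensor in lux and run actual image-processing stages.

```python
def beauty_camera_frame(frame, ambient_light):
    """Sketch of the beauty camera's per-frame flow: denoise and correct
    exposure only when the ambient light sensor reports a dark scene,
    then always run the face-beautification steps."""
    steps = []
    if ambient_light < 50:  # arbitrary "too dark" threshold for the sketch
        steps += ["denoise", "exposure_correction"]
    steps += ["face_detection", "smooth_skin", "enlarge_eyes"]
    return steps

# In low light the extra cleanup stages run first; in bright light they are skipped.
assert beauty_camera_frame("frame", ambient_light=20)[0] == "denoise"
assert "denoise" not in beauty_camera_frame("frame", ambient_light=300)
```

Gating the expensive cleanup stages on the sensor reading is one reason the whole pipeline can stay within the "three seconds" the paragraph mentions.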

What makes Meitu usable?

As a platform gathering various powerful technologies, the Meitu app has another important responsibility: increasing the usability of the application, which means hiding the complexity of the technical parts and reducing the possibilities of misleading users or jeopardizing efficiency. To achieve this, application designers need to make the interface clear and intuitive, and here the concept of "constraints" comes into play.

According to Donald Norman's classic The Design of Everyday Things, a constraint limits the actions a user can take on a system. By restricting users' behavior, designers help users understand the state of the system they are in and thus reduce the chance of errors. Throughout the interface of the Meitu app, we can easily find constraints being applied.

  • Paths

Paths help users keep control within a limited range of a variable. Usually they take the form of a progress bar or channel, whose shape restricts the user's action to linear motion. Most image-editing functions in Meitu embed intensity bars so that the user can make adjustments intuitively. The interface is neat and clean, with only a linear bar on it; as the figure shows, users can hardly misuse this mode.
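In code, a path constraint amounts to clamping whatever the user does to a legal range. A minimal sketch (the function name and range are hypothetical):

```python
def set_brightness(value, low=-100, high=100):
    """A path constraint: however far the user drags, the value stays in range."""
    return max(low, min(high, value))

assert set_brightness(250) == 100    # dragging past the end has no effect
assert set_brightness(-300) == -100
assert set_brightness(42) == 42      # values inside the range pass through
```

Because out-of-range input is impossible by construction, the downstream image-processing code never has to handle an invalid brightness, which is exactly the error-reduction role Norman assigns to constraints.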

Figure 12. Screenshot: brightness intensity bar

  • Barriers

Barriers are designed to redirect user actions that are heading toward a relatively negative or unsuccessful result. If a user presses the back button in the middle of editing an image, a dialogue box pops up asking whether he or she really wants to quit. In another case, when image quality could be harmed by an edit, the system informs the user with an attention box. With barriers applied to the interface, users are given more transparency about the consequences they face and more agency to make a choice.

Figure 13. Screenshot: barrier settings in Meitu

  • Symbols

In design, symbols take the form of text, sound, visual images, and so on, and are used to categorize, clarify, and caution users about certain actions. The "Undo/Redo" options at the top of the screen are a good example of symbols used as constraints in Meitu. When the user cannot undo or redo any further, the "back" or "forward" symbol is grayed out as a caution. Similarly, a direct text notification can serve as a constraint when the system is unable to complete a certain action.
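The graying-out behavior follows directly from the state of an edit history. A minimal sketch (class and method names are invented for the example): the icons are enabled exactly when the corresponding stack is non-empty.

```python
class EditHistory:
    def __init__(self):
        self.undo_stack, self.redo_stack = [], []

    def apply(self, edit):
        self.undo_stack.append(edit)
        self.redo_stack.clear()      # a new edit invalidates the redo chain

    def can_undo(self):
        # Drives whether the "back" icon is active or grayed out.
        return bool(self.undo_stack)

    def can_redo(self):
        return bool(self.redo_stack)

    def undo(self):
        if self.can_undo():
            self.redo_stack.append(self.undo_stack.pop())

h = EditHistory()
assert not h.can_undo()              # "back" icon starts grayed out
h.apply("crop")
assert h.can_undo()
h.undo()
assert h.can_redo()                  # "forward" icon lights up after an undo
```

The UI layer only has to query `can_undo()` and `can_redo()` each time the history changes; the symbol's grayed state is thus always an honest report of what the system can actually do.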

Figure 14. Screenshot: symbol setting in Meitu

Such conversational cues are a necessity, because through them the interface gives users a way to converse with their devices. Without them, the interface would function inefficiently, and the endless possibilities of how to use it would frustrate users. From this perspective, Meitu does well at implementing multiple forms of constraints.

How Meitu exerts influence on society?

"If someone shoots another with a gun, who is doing the shooting: the person or the gun?" Latour asked. This seemingly absurd question highlights the necessity of thinking about the relationship between humans and non-human artefacts. From a sociological perspective, humans and technical objects are not separate but intermingled. Latour makes this intertwining clear at a conceptual level by introducing "technical mediation": the gun is a mediator that actively contributes to the way the end is realized. The same is true of the Meitu app. In a sociotechnical context, we can never ignore its interaction with other components or how they influence one another.

  • Beauty Obsession

There is a name for the new kind of face perfected by the Meitu app, with enlarged eyes, a sharpened chin, and pale skin, which you now see everywhere on the Internet and even in real life: "wang hong lian" (the Internet-celebrity face). The trend is fueled by a centuries-old obsession with flawless skin and big eyes. On the one hand, Meitu leads this trend and, through its embedded technical tricks, continuously consolidates public views about the concept of beauty. On the other, the epidemic of this stereotype in turn forces Meitu to keep upgrading its popular face-reshaping functions.

  • Culture penetration

During the 2018 Spring Festival, Meitu launched a celebratory activity called "winning gift money with your face score." Using the AI ArtBot function, users sent a selfie portrait to the system, which calculated the user's facial attractiveness and awarded gift money according to the rating. Debuting on 15 February, the activity attracted many users, and the two million yuan set aside for the Spring Festival was claimed very quickly. Meitu had clearly added new fun to a traditional custom with the help of technology.

  • Commercial pressure

Guided by mass culture, Meitu is undoubtedly transforming from an industrial product into a consumer product. The emergence of social media gives it greater generality and deeper penetration. By frequently sharing selfies retouched with Meitu on social media, young people unintentionally help the provider promote its products and services, and thereby add to the commercial pressure.

In ten years, Meitu has been installed on more than one billion phones, mostly in Asia. It has been estimated that more than half of the selfies uploaded to Chinese social media have been retouched with Meitu. Its popularity is the result of synergies among different actors and organizations, meaning that we users and the application itself are constantly and mutually co-evolving. From the very birth of any software application, it is influenced by users’ needs, market conditions, technological development, and so on. In the case of the Meitu app, the development of mobile devices, software technology, target customers’ behavior (mainly women under 30), and the current social background directly decide how the app is designed and updated. On the other hand, Meitu is also changing, or becoming, a part of Chinese culture.



As the pioneer photo-editing app in China, Meitu is innovative in combining and transforming multiple existing technologies on one platform, shortening the distance between users and emerging technologies within an intuitive interface. Before selfie apps were everywhere, Meitu set a basic model for its followers and explored various possibilities for the future of photo editing: face-related technologies, image generation, motion capture… it is fair to say that Meitu’s biggest achievement is introducing cutting-edge technologies into everyday life, with much fun.

Works Cited

Arthur, W. B. (2009). The nature of technology: What it is and how it evolves. Simon and Schuster.

Augusteijn, M. F., & Skufca, T. L. (1993). Identification of human faces through texture-based feature recognition and neural network technology. In Neural Networks, 1993., IEEE International Conference on (pp. 392-398). IEEE.

Berg, L. (2018). Young consumers in the digital era: The selfie effect. International Journal of Consumer Studies, 42(4), 379-388.

China’s Selfie Obsession | The New Yorker. (n.d.). Retrieved December 3, 2018, from

Fan, H., & Ling, H. (2017, October). Parallel tracking and verifying: A framework for real-time and high accuracy visual tracking. In Proc. IEEE Int. Conf. Computer Vision, Venice, Italy.

Jiguang (n.d.). Retrieved December 5, 2018, from

Latour, B. (1994). On technical mediation. Common Knowledge, 3(2), 29-64.

Li, P., Ling, H., Li, X., & Liao, C. (2015). 3D hand pose estimation using randomized decision forest with segmentation index points. In Proceedings of the IEEE international conference on computer vision (pp. 819-827).

Liang, P., Blasch, E., & Ling, H. (2015). Encoding color information for visual tracking: Algorithms and benchmark. IEEE Transactions on Image Processing, 24(12), 5630-5644.

Manovich, L. (2013). Software takes command (Vol. 5). A&C Black.

Meitu – Beauty Themed Photo & Video Apps. (n.d.). Retrieved December 5, 2018, from

MTlab. (n.d.). Retrieved December 11, 2018, from

Norman, D. (2016). The design of everyday things. Verlag Franz Vahlen GmbH.

Norman, D. A. (1999). Affordance, conventions, and design. Interactions, 6(3), 38-43.

Photoshop system requirements. (n.d.). Retrieved December 8, 2018, from

Rankings – Cheetah Data. (n.d.). Retrieved December 4, 2018, from

Schwebs, T. (2014). Affordances of an App: A reading of The Fantastic Flying Books of Mr. Morris Lessmore. Barnelitterært Forskningstidsskrift, 5(1), 24169.

Van den Muijsenberg, H. J. (2013). Identifying Affordances in Adobe Photoshop (Master’s thesis).

Varagur, K. (2016, May 2). “Auto-Beautification” Selfie Apps, Popular in Asia, Are Moving into the West. Retrieved December 3, 2018, from

You, H., & Chen, K. (2007). Applications of affordance and semantics in product design. Design Studies, 28(1), 23–38.

Zheng, A., Cheung, G., & Florencio, D. (2018). Joint Denoising/Compression of Image Contours via Shape Prior and Context Tree. IEEE Transactions on Image Processing, 27(7), 3332-3344.

De-Blackboxing the Health Features of Apple Watch


Tianyi Zhao


Apple Watch, first launched in 2015 and now in its fourth generation as of 2018, is the smartwatch designed and marketed by Apple Inc. A young and distinctive product in Apple’s ecosystem, with a well-designed physical appearance and practical functions, Apple Watch has been highly praised and favored by customers. Its success rests on its distinctive design principles and theories. This paper examines Apple Watch Series 4 from the perspectives of the overall system view, modularity, affordances, and constraints. It then focuses on the health functions by analyzing the watch’s interface and its internet with other Apple devices.

I. System View and Modularity

“Good design is a renaissance attitude that combines technology, cognitive science, human need, and beauty to produce something that the world didn’t know it was missing.” – Paola Antonelli, 2001

And here comes Apple Watch.

The smartwatch, stripped of the sparkling diamonds, precious metals, and intricate complications of the mechanical watch, appears as an integral whole, seemingly indivisible: a black-boxed device. However, looking inside Apple Watch reveals a complex system with a relational structure of “interconnected, interdependent or mutually constitutive elements,” decomposable into many subsystems, each of which can be managed semi-autonomously from the whole system. (Irvine, 2018)

Leveraging the principle of modular design makes it simple to unveil the complexity hidden by the various designs. Inside Apple Watch’s elaborate, tiny body, the system can be broken down into eight main modules, with components including the case, display screen, S4 chip, the Digital Crown, three heart sensors, speaker, microphone, battery, and removable bands, which are “orchestrated” into a combined system. The main modules and their features are as follows:

Case – Made of aluminum with three choices of colors: space grey, silver and gold.

Display Screen – There are two screen sizes, 44mm and 40mm: an LTPO OLED Retina display with a resolution of 368 by 448 (44mm) or 324 by 394 (40mm). It supports Force Touch with 1,000 nits of brightness.

S4 Chip – A 64-bit dual-core processor. It is a complete System in Package (SiP), with the entire system fabricated onto a single component, allowing Apple to pack multiple functions and capabilities into a tiny body.

The Digital Crown – Made of aluminum with a built-in titanium electrode that measures heart rate at a finger’s touch. During navigation, haptic feedback delivers a precise, click-like feel as the user scrolls, without obstructing the display.

Heart Sensors – There are two kinds of sensors: electrical and optical. The two electrical heart sensors, the Digital Crown electrode on the side and the back crystal electrode, generate an electrocardiogram that can be sent to the user’s doctor. The optical heart sensor, included since the first generation, supports the user’s quick checks.

Figure 1. Heart Sensors on Apple Watch Series 4


Speaker & Microphone – The large speaker sits on the left side; the microphone has been relocated to the right side to reduce echo.

Battery – A built-in rechargeable lithium-ion battery rated for up to 18 hours.

Removable Bands – Watch bands come in abundant colors and materials; they can be swapped and slid into the band slots to satisfy the need for personalization.


II. Affordances and Constraints

Donald A. Norman defines affordances as the “perceived and actual properties of the thing, primarily those fundamental properties that determine just how the thing could possibly be used.” (Norman, 9) Apple Watch, designed as a wearable technology, is convenient for users to carry every day thanks to its small size and light weight. As for the detailed components, their affordances have been made perceivable and discoverable to ensure effectiveness. The display screen affords touching, while where to touch depends on the context determined by the user interface. With a textured grip and its extrusion from the side of the case, the Digital Crown clearly invites both rotation and pressing; the red circle on top of the crown, which indicates the electrocardiogram, also guides users to touch it with a finger. Likewise, the flat button located under the Digital Crown is easy to recognize because it is slightly curved. The affordances of Apple Watch also reveal the indispensability of visibility, which holds that systems are more usable when they clearly indicate their status and the possible actions that can be performed. (Lidwell, 202)

Conversely, constraints “limit the possible actions that can be performed on a system.” (Lidwell, 50) For all its convenience, the minuscule screen of Apple Watch greatly restricts user interaction. Firstly, information presentation and data input are highly limited. To deliver information accurately and in time while maximizing the reader’s understanding, each alert or notification shows only keywords, and the lack of a keyboard forces users to depend fully on voice dictation. Secondly, the tiny size creates the “fat finger problem”: a finger placed over the watch screen often covers almost 30 percent of the interaction space. Thirdly, Apple Watch performs as an accessory or companion to iPhone rather than a standalone product; this heavy dependence is embodied in the mandatory pairing with an iPhone for any functions beyond the clock. Lastly, the limited battery life is a significant issue. A maximum of 18 hours can hardly satisfy daily needs, especially for people who stay outdoors for long periods. Apple Watch markets itself as a good companion to “help you stay even more active, healthy, and connected,”1 but the restricted battery life cannot uphold that promise.


III. Health Function

Apple Watch has become more than a time-telling wearable: it is also a message notifier and a consumer medical device. Health and fitness have become the new selling points to attract customers. In this part, the author studies how users interact with Apple Watch during workouts and the graphic design principles behind application icons.

A. User Interaction

Due to the limited space and tiny body, many health-function interactions are black-boxed inside Apple Watch, simplifying the user’s daily life. “Now, with the potential of Health Records information paired with HealthKit data, patients are on the path to receiving a holistic view of their health. With the Health Records API open to our incredible community of developers and researchers, consumers can personalize their health needs with the apps they use every day,” said Jeff Williams, Apple’s Chief Operating Officer. The analysis of how Apple Watch achieves this covers gesture interaction and semiotic icons.

1. Gesture Interaction

Since Alan Kay proposed the concept of “metamedia” interfaces, the awareness that displays are not simply passive representational substrates of results or states, but can be designed to take input as instructions or choices back into a computing process, has prevailed. Apple Inc. went further than the creation of iPad in 2010: it has reinvented the watch with an interactive display and built-in modules and functions to maximize human-computer interaction, effectively optimizing the user experience. The watch not only supports the “swipe,” “tap,” and “press” gestures of a smartphone, but also recognizes physical states, including wake and sleep, fall detection, heart-rate monitoring, and automatic workout detection. The technology hidden behind these features is the combination of an accelerometer, which measures changes in motion, and a gyroscope, which detects the rate of rotation along three different axes.

Quick Wake-Up and Sleep – Apple Watch can be activated simply by raising the wrist or pressing the Digital Crown; likewise, it goes to sleep when the user lowers the wrist. This accurate detection of rotation simplifies use when the user is doing sports or otherwise in motion.

Fall Detection and Emergency – The accelerometer and gyroscope, combined with Apple’s long-term research and its own algorithms, work together to achieve fall detection. The accelerometer can measure up to 32 Gs of force, which lets Apple Watch recognize the large impact spike that a hard fall creates. Simultaneously, the gyroscope measures the rotation rate along three axes: picture “an axis going horizontally across the screen (the X axis); another one going vertically up the display (the Y); and finally a third sticking straight out through, and perpendicular to, the screen (the Z).” (Rob Verger) If a hard fall is detected, the system immediately shows a fall alert so that the user can call for emergency help or dismiss the warning. If the user is moving, Apple Watch waits for the user’s response. However, if the watch detects that the user has been immobile for about a minute, it automatically calls emergency services and sends a message to the emergency contacts with the current location.
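The escalation logic just described can be sketched in simplified form. This is only a hedged Python illustration, not Apple’s actual algorithm; the 16 g impact threshold, the function names, and the exact one-minute window are assumptions made for the example.

```python
IMPACT_THRESHOLD_G = 16.0   # assumed threshold; the accelerometer can measure up to 32 Gs
IMMOBILE_WINDOW_S = 60      # assumed: about a minute of immobility triggers the automatic call

def detect_hard_fall(accel_g, rotation_abrupt):
    """Assumed heuristic: a large acceleration spike plus an abrupt wrist rotation."""
    return accel_g >= IMPACT_THRESHOLD_G and rotation_abrupt

def fall_response(user_responded, seconds_immobile):
    """Decide what the watch does after a hard fall has been detected."""
    if user_responded:
        return "dismissed"                             # user tapped the alert or called for help
    if seconds_immobile >= IMMOBILE_WINDOW_S:
        return "call_emergency_and_message_contacts"   # automatic escalation with location
    return "wait_for_response"                         # user may still be moving
```

A real implementation would, of course, run continuously against streaming sensor data rather than single readings.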

Figure 2. The Alert Interface after Apple Watch’s Fall Detection


Heart Rate Monitoring – As each heartbeat transmits an electrical impulse, Apple Watch can accurately read and record these impulses by closing the circuit between the user’s heart and both arms through its three sensors. The watch captures and records heart rate continuously whenever it is worn. When it detects unusually high or low heart rates or irregular rhythms such as atrial fibrillation, it alerts the user to the irregularities so that timely action can be taken.

Figure 3. Reading Heart Rate with Digital Crown Electrode


Figure 4. Real-time Heart Rate Result


Figure 5. The Alert Interface When Irregular Heart Rate Detected
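The alerting behavior shown in the figures above can be reduced to a small decision rule. The threshold values below are assumptions chosen for illustration (Apple’s actual thresholds are user-adjustable), and the function name is invented:

```python
HIGH_BPM = 120   # assumed resting high-rate threshold
LOW_BPM = 40     # assumed resting low-rate threshold

def heart_rate_alert(bpm, at_rest=True, rhythm_irregular=False):
    """Return an alert label for one heart-rate reading, or None if nothing is unusual."""
    if rhythm_irregular:
        return "irregular_rhythm"      # e.g. a pattern suggestive of atrial fibrillation
    if at_rest and bpm >= HIGH_BPM:
        return "high_heart_rate"
    if at_rest and bpm <= LOW_BPM:
        return "low_heart_rate"
    return None                        # elevated rates during workouts are expected
```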


Automatic Workout Detection – Apple Watch is capable of detecting the user’s activities, including walking, running, swimming, elliptical, and rowing. When the watch senses the user’s current activity, it reminds the user to start or end a workout, guessing the specific workout type, and gives credit for exercise already under way. Based on personal experience, the procedure is so easy and comfortable that the user is not distracted while warming up or cooling down.

Figure 6. Reminders of Start and End Work Out



2. The Graphic Design of Icons

The Home Screen of Apple Watch displays all applications downloaded on the watch; they appear in circular shape as adapted versions of their smartphone counterparts. Because of the limited screen size, Home Screen icon design faces many restrictions. Generally, there are three main rules for Home Screen icon design: simplicity, focus point, and similarity.

Figure 7. Home Screen Icons Collection


Simplicity – The icon includes a single element that concentrates the essence of the application in a simple and unique shape. For example, the icon of Phone app is designed with the element of phone; Workout app is obviously displayed with a person in motion; and an envelope signifies the email app effectively.

Figure 8. Icons of Phone, Workout and Email


Focus Point – A successful icon captures the user’s attention and clearly identifies the application with a single focal point. Among the examples discussed above, the phone, the person in motion, and the white envelope serve as focal points that guide the user to an accurate understanding.

Similarity – Application icons on watchOS should maintain similarity with their iOS versions. This principle requires designers to create an association between them through a similar appearance and color palette, which helps users recognize apps quickly and correctly. For instance, the iOS icon and the watchOS icon share the same blue background, but the brand name has been shortened from “Booking” on iOS to “B.” on watchOS to reduce the user’s visual load on such a small screen.

Figure 9. iOS icon vs watchOS icon



B. Apple Ecosystem with Internet

Through years of invention and iteration, Apple Inc. has established an ecosystem of distinctive products. Apple Watch, focusing on health functions, has not only expanded the market among sports and fitness lovers but also improved common users’ healthy lives. At both the hardware and software levels, Apple Watch realizes real-time detection and recording of the user’s exercise status and health conditions by building an internet with other Apple devices, such as iPhone and MacBook.

Hardware Internet – The first thing a user does with a new Apple Watch is pair it with an iPhone by scanning the animation on the watch. After setting up the watch via iPhone, the user can start using it over the Bluetooth connection with the iPhone. Likewise, Apple Watch can also pair with a MacBook to simplify usage; for example, a MacBook can wake automatically when the user is wearing a paired Apple Watch.

Figure 10. The Pairing Process of Apple Watch and iPhone


Software Internet – The Bluetooth connection enables Apple Watch to record and update fitness and health conditions. HealthKit, an Apple-designed framework that takes a collaborative approach to building a personalized health and fitness experience covering activity, mindfulness, nutrition, and sleep, provides a central repository for health and fitness data on iPhone and Apple Watch. In other words, Apple Watch is an activity detector and data recorder, while iPhone serves as a data manager. This internet distributes different tasks to Apple Watch and iPhone, and in turn the complementary relationship deepens the significance and indispensability of the internet.

Figure 11. Monthly Data Collection of Heart Rate
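The division of labor just described, watch as detector and recorder, phone as data manager, can be modeled in a toy form. This is a conceptual Python sketch, not the HealthKit API; the class and method names are invented for illustration.

```python
class Watch:
    """Detects activity and records samples locally."""
    def __init__(self):
        self.samples = []

    def record(self, bpm):
        self.samples.append(bpm)


class Phone:
    """Serves as the central repository and data manager."""
    def __init__(self):
        self.repository = []

    def sync_from(self, watch):
        # In reality this transfer happens over the Bluetooth link.
        self.repository.extend(watch.samples)
        watch.samples.clear()

    def average_heart_rate(self):
        return sum(self.repository) / len(self.repository) if self.repository else None
```

The complementary roles are visible even in this sketch: the watch holds only transient samples, while the phone aggregates them for longer-term views such as the monthly chart in Figure 11.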

Many people believe Apple Watch helps them curb iPhone addiction and cultivate better habits of self-management. However, within this over-tight internet with iPhone, Apple Watch remains an accessory rather than a standalone product. For best performance, Apple Watch should stay within Bluetooth range, and Bluetooth and Wi-Fi must remain enabled on the phone. Personally, I see two ways to make Apple Watch an independent product. Firstly, Apple could take full advantage of iCloud for setup and syncing: instead of pairing with an iPhone, the watch would pair with the user’s iCloud ID, and data updates could likewise go through iCloud rather than Bluetooth. Secondly, Apple Watch should be differentiated from iPhone, because receiving the same notification twice, on both the watch and the phone, is tedious. Redundant third-party applications could be trimmed from watchOS, while missing apps such as Notes, Podcasts, and Voice Memos should be added. With essential apps lacking and functions redundantly repeated, Apple Watch cannot escape its over-dependence on iPhone.



Apple Watch has been one of the most distinctive Apple products since its debut in 2015. The design principles hidden inside its tiny body are varied and complex, making the wearable more multi-functional and practical. De-blackboxing Apple Watch from a system view reveals a surprising complexity of subsystems and modules. Modularity allows users to manage the complex structure of the watch by breaking its functions into separate, interconnected components; it becomes clear how the display screen, chip, battery, heart-rate sensors, the Digital Crown, and other sub-components work together. The affordances and constraints generated by elaborate design help Apple optimize the watch to satisfy users’ demands through annual evolution. The health function, as the most attractive selling point, transforms Apple Watch from a time-reader into the “ultimate workout partner.” User interaction is based on multiple forms of gesture detection working with built-in algorithms, and the graphic design of the home screen icons simplifies daily use, in keeping with the health and fitness highlights. Finally, as a key product in the Apple ecosystem, users enjoy the convenience brought by the watch’s dependence on iPhone; however, the over-tight internet means that Apple Watch’s future market share will be tied entirely to the prevalence of iPhone. Thus I propose two solutions, upgrading with iCloud and differentiating some functions from iPhone, which would help Apple Watch escape its awkward positioning as an individual product and embrace further breakthroughs.


Works Cited

Allison, Conor. “And finally: Apple Watch Series 4 detects AFib with 98% accuracy, says Heart Study.” Wearable, Sep. 16, 2018.

Bower, Graham. “iPhone dependence is killing Apple Watch. Here’s how Cupertino could fix it.” Cultofmac, Apr. 19, 2018.

Hale, James Loke. “How Does Apple Watch Health Work? Here’s Everything The Device Can Do For Your Well-Being.” Bustle, Apr. 19, 2018.

Irvine, Martin. “Introduction to Affordances, Constraints and Interfaces.” 2018.

Kaufman, Lori. “How to Rearrange the App Icons on Apple Watch.” How-to Geek, Dec. 17, 2015.

Lidwell, William., et al. Universal Principles of Design. Rev. and updated ed., Rockport Publishers, 2010.

Michaels, Mary M. “The Apple Watch Case Study: What we can learn and apply from an affordance analysis.” Human Factors International, 2015.

Norman, Donald A. The Design of Everyday Things. Basic Books, 2003.

Jeffries, Adrianne. “Why There Aren’t More Apple Watch Apps, According to Apple Watch Developers.” Motherboard, July 21, 2015.

Verger, Rob. “The Apple Watch learned to detect falls using data from real human mishaps.” Popular Science, Oct. 2, 2018.

Wallas, Paul. “Apple Watch; UI and UX Review.” Medium, May 28, 2017. Accessed on Dec. 12, 2018.

Apple Developer. Accessed on Dec. 12, 2018.

Airbnb — Create a world that inspires human connection


Zijing Liu


      As travelers search for accommodation, instead of looking for hotels, people have another choice: to stay in a stranger’s home. It is cheaper, offers a sense of home, and, perhaps most importantly, a kitchen. That is Airbnb. What has made Airbnb such a worldwide success? This paper will de-blackbox the technological design principles behind Airbnb and how these principles combine systematically into a whole symbolic cognitive artefact. The writer draws on the knowledge learned in this course and on scholarly articles, exploring the points from a designer’s perspective rather than merely a user’s.

Figure 1 symbolic meaning of Airbnb 


      Airbnb operates an accommodation marketplace that allows hosts to list their available places to be rented by users seeking short-term lodging. It serves as an intermediary connecting people who want to rent out their dwellings with people who need a place to stay. It is dedicated to creating a world that inspires human connection and to redefining what it means to be home.

      The technologies embedded in Airbnb include a search engine, an online accommodation database, a digital calendar, a digital map with GPS, digital media, a translation tool, a messenger, online transactions, and visualization tools. As Brian Arthur pointed out in The Nature of Technology, technologies, all technologies, are combinations. (Arthur, 2009) None of these technologies was invented by Airbnb; Airbnb combines existing technologies in a dynamic balance and interaction to achieve the designers’ intention, to show the “magic.” There is a built-in “ratchet effect” in human systems of artefacts and technologies. As Prof. Irvine indicates, the “ratchet” metaphor describes “a memory function in technology development that enables a society to use the ‘mental models’ of already developed technologies as the starting point of new developments” (Irvine).

      How did Airbnb combine these technological components? Why has it become such a powerful and popular application? The author will deproductize its interface, affordances, and design principles across multiple layers.


From Webpage to Mobile Application

The first and most basic definition of technology is a means to fulfill a human purpose.

— W. Brian Arthur

      Airbnb started as a simple website providing bed and breakfast, satisfying the need for accommodation among travelers who cannot afford hotels. Airbnb was born to fulfill a human purpose. The interface of Airbnb is clean and simple, guiding users to enter the required information, since modules should be designed to hide their internal complexity and interact with other modules through simple interfaces. (Lidwell-Holden-Butler, 2003)

Figure 2   Airbnb history

      The Airbnb website was built in 2008, and Airbnb follows web design principles from web browsers to mobile applications. Firstly, it is a distributed network system across unlimited client/server implementations. (Irvine) A “server” is just a computer on a network that serves up responses to other computers. Since Airbnb is a global application for traveling, the only way to connect remote users with remote resources is in an open and scalable way. Secondly, the design is extensible to unforeseeable future applications. (Irvine) Airbnb launched in 2008, when the Internet had already evolved for almost 30 years. The Internet has bred numerous applications, one of which is Airbnb, sharing house-rental information throughout the world; the globalization of the Internet provides worldwide access to Airbnb. Thirdly, the design is scalable for adding new users, nodes, agents, and Web-deliverable services. (Irvine) For future expansion, it allows the system to alter the number of users, resources, and computing entities. So it did: in March 2009, Airbnb had 2,500 listings and close to 10,000 registered users, but now Airbnb provides access to more than 5 million unique places to stay in over 81,000 cities and 191 countries.

      As Airbnb grew in popularity, it launched internationally and released an iPhone application in 2010. The major differences between the Airbnb website and application result from the different screen sizes, since the flexibility-usability tradeoff must be considered in the design process. (Lidwell-Holden-Butler, 2003) Computers have larger screens and can present more functions, more flexibility, at once. Mobile devices such as smartphones have smaller screens; although they cannot afford as many functions at once, they have higher usability and are more efficient to handle. The superior affordance of the Airbnb webpage over the mobile application, due to the screen-size constraint, is the collaboration among different modules: if the user changes the housing price range, the house listings and their locations on the digital map change with it. Mobile application users can view only one page at a time, but the number of matching houses is shown as soon as the price range changes, which helps users narrow their targets down to specific houses efficiently.

Figure 3  flexibility-usability tradeoff on the interfaces of Airbnb website and app


What makes Airbnb worldwide

The best design is that you do not even aware of it.

— Donald A. Norman

      Everyone has cultural biases, expectations, and value judgments that result from living in a particular society or subgroup. It is the job of the designer to identify and consciously examine these biases so they can become the subject of active choices rather than passive acceptance. (Murray, 1997, p. 29) It is hard to design software that supports people of different countries, backgrounds, and cultures. Airbnb serves as a cognitive artefact, an aspect of the material world that has been modified over the history of its incorporation into goal-directed human action. (Cole, 1996) The Airbnb interface is an artificial device designed to maintain, display, or operate upon information in order to serve a representational function, transforming the properties of the artefact’s representational system to match the properties of users’ internal cognitive systems. (Norman, 1991) The mapping between the representing world and the represented world is faithful.

      Firstly, the huge success of Airbnb is partly attributable to its universal cognitive-symbolic design (as shown in Figure 4). Culture is considered to be composed entirely of learned symbols and shared systems of meaning, the ideal aspect of culture, located in the head. (Cole, 1996) In the digital interface, signs can function as either (or both) symbolic “content” (rendered text, images, video, etc.) or action translators (icons, links, gestural controls) for initiating computational processes designed to render back other patterns of symbolic representation. (Irvine) Airbnb uses commonly agreed icons and images to show available amenities (e.g., free parking, washer, Wifi), so that it is readily understandable to people all over the world.

Figure 4   universal cognitive-symbolic design

      Secondly, Airbnb serves as a “metamedium,” a medium designed for representing, processing, and interpreting other media. (Manovich, 2012) Airbnb supports multiple languages, currencies, and payout methods, thus expanding its potential user base. There is also a translation tool (Google Translate, I believe) embedded in the digital interface, so guests are able to read reviews in different languages with a single click. This affordance theoretically connects people all over the world, since with a digital translator language is no longer a problem. It is worth mentioning that Airbnb supports multiple transaction tools (as shown in Figure 5) and adjusts the payment method automatically based on region. When I used Airbnb in China, the payment method was Alipay, the dominant transaction tool run by Alibaba; when I chose it, the app jumped to Alipay to complete the payment. After I came to the U.S., I found the payment method changed automatically to Google Pay. This demonstrates that Airbnb uses the smartphone’s GPS to locate the user’s current country or region and adjusts the transaction tool based on the transmitted GPS information.


Figure 5   payment methods Airbnb support (resource: Airbnb Help)
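The region-based selection observed above can be sketched as a simple lookup. This is a guess at the behavior for illustration, not Airbnb’s actual code; the mapping and the default value are assumptions.

```python
# Assumed, simplified mapping; Airbnb supports many more methods per region.
REGIONAL_PAYMENT = {
    "CN": "Alipay",       # dominant transaction tool in China
    "US": "Google Pay",
}

def payment_method(region_code, default="Credit card"):
    """Pick a payment tool from the user's current region (e.g. derived from GPS)."""
    return REGIONAL_PAYMENT.get(region_code, default)
```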

      Thirdly, Airbnb cooperates with social media platforms, such as Facebook, to strengthen the connection among users. As stated on the Airbnb official website, “Social Connections shows you how you’re connected to others, either directly or through mutual friends, depending on your Facebook privacy settings. It also highlights your Airbnb activity, which may include your username, Facebook profile photo, recent locations you visited, and your Facebook friends who are also on Airbnb.” (Airbnb Help) This means that when two Airbnb users are friends on Facebook, they automatically become friends on Airbnb.

      Hence, Airbnb exploits the unlimited connectivity of the Internet, using various affordances to eliminate the cultural and social barriers between hosts and guests. “Culture and media technologies are co-produced or co-constitutive” (Irvine), and together they form a co-mediation system.


Airbnb as house searching media — Explore & Trips

The possibilities were inherent in the modularity of the design itself.

— Carliss Y. Baldwin & Kim B. Clark

      “Modularity allows us to manage a larger and more complex whole structure by dividing up its functions into separate, interconnected components, layers, and subprocesses.” (Irvine) Airbnb separates its functionality into several modules, forming “a hybrid structure containing interconnected, independent and hierarchical elements”. (Clark, 2000)

      In the “Explore” module, the most visible embedded technology is the search engine, which lets users enter a destination. What is important but commonly overlooked is the “autocomplete” function: when users type the first few letters, the search engine automatically suggests results corresponding to the information already entered (as shown in Figure 6). This applies another essential design principle, recognition over recall: minimize the need to recall information from memory whenever possible. (Lidwell-Holden-Butler, 2003) Behind the search engine is a database of cities that offer Airbnb lodging resources. Airbnb thus stands on the strong foundation of the highly connected Internet, which allows it to deliver the targeted information users need within its own network.

Figure 6   search engine in Airbnb
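The autocomplete behavior can be reduced to a prefix match against the city database. A minimal sketch, with an invented city list standing in for Airbnb's database:

```python
# Minimal autocomplete sketch: suggest cities whose names begin with the
# typed prefix, supporting recognition over recall. The city list is
# illustrative only.

CITIES = ["New York", "New Orleans", "Newark", "Beijing", "Berlin", "Boston"]

def autocomplete(prefix: str, cities=CITIES, limit: int = 5) -> list:
    """Return up to `limit` city names starting with the prefix
    (case-insensitive), in database order."""
    p = prefix.lower()
    return [c for c in cities if c.lower().startswith(p)][:limit]
```

Typing "new" would surface "New York", "New Orleans", and "Newark" without the user having to recall an exact spelling. A production system would use an indexed structure (such as a trie) and rank suggestions by popularity, but the principle is the same.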

      When users want to narrow down their target lodging resources by adding more conditions, the Airbnb interface provides a sense of control. Users can choose their desired conditions (e.g. range of dates, number of guests, home type, and amenities) and, even more importantly, whether or not to show the filters on the interface.

      To show the results, Airbnb uses digital media to help users find lodging. Each accommodation listing contains detailed photos of the abode, a digital map of the location (the exact address is provided only after booking, to protect hosts’ privacy), and an online forum of reviews. All of these media are simulations of traditional media (e.g. printed photos and maps). Because the “digitization” process is hidden, the affordances become invisible to users, and we take them for granted.

      Another important module is the visualization tool embedded in “Filters”: a line chart relating the price range to the number of houses. It shows how many houses fall within each price range and the overall lodging level in the city. Clearly, the different modules — map, photos, and price visualization — function together in a dynamic balance.

      In the “Trips” module, Airbnb stores users’ travel history, shown through a combination of photos and digital maps. As long as users log in to their Airbnb account, they are able to track their travel history and Airbnb orders. All of this history is stored in the cloud, so users can log in from anywhere and recover their information. The “cloud” is a collection of servers that act in a coordinated way; the data is retained by Airbnb and does not disappear. This affordance, in turn, illustrates the vast space and possibility of the Internet: millions of users are able to save their information without worrying about loss or damage.


Airbnb as social media — Saved & Inbox

New media always remediate the old ones.

      — Lev Manovich

      People usually need to discuss before traveling together, because they cannot make a final decision about where to stay by themselves. Therefore, Airbnb adds affordances that let it serve as social media, allowing users to share information online. Accommodations in the same city are saved as one collection, and users are able to “invite friends” to each collection via social media platforms (e.g. Facebook, SMS, WhatsApp, Wechat, QQ), email (e.g. Gmail), or by directly copying the link.

      Communication between hosts and guests is particularly important. Based on this demand, Airbnb expanded its affordances to include a messenger, so that hosts and guests can send real-time messages and achieve instant international communication. This affordance is especially crucial for people traveling to a different country, who may not share the same communication tool and cannot get quick responses via email. Take me as an example. When my mother and I traveled to New York and booked a house on Airbnb — the first time I booked lodging in a different country through Airbnb — communication became a problem because the host did not use Wechat, and email was inconvenient for instant messaging. We discussed all the pre-arrival issues via Airbnb and informed the host as soon as we arrived in New York. Despite the time difference, everything was settled thanks to barrier-free communication in the Airbnb messenger. (See Figure 7)

Figure 7   Instant messaging between hosts and guests

      In addition, the digital media interface of Airbnb reflects the four affordances brought up by Murray: the encyclopedic, spatial, procedural, and participatory. (Murray, 1997) Airbnb constructed an online community where hosts can connect with other hosts (e.g. share stories, ask for advice). It is encyclopedic since users can obtain answers or advice on almost any topic from others who have run into similar situations. Hosts and guests are also free to view reviews of each other: users have access to reviews from all over the world, and at the same time their own reviews are open to anyone within the network. The spatial affordance refers to virtual spaces the designers created that are also navigable by the interactors. (Murray, 1997, p.70) Users can access unlimited resources through many-to-many communication on the World Wide Web; the infinite space of the Internet is displayed and navigated through the graphical user interface, though we are hardly aware of it. The procedural property is the ability to represent and execute conditional behaviors. (Murray, 1997, p.51) When users run into unusual problems, such as canceling a reservation or property damage, they can get help directly from the Airbnb team via social media contact. Finally, participation in digital media increasingly means social participation. (Murray, 1997, p.56) In fact, social participation is expected of every Airbnb user: after finishing a trip, both hosts and guests are encouraged to leave reviews for each other. Hosts are willing to do so because it earns them more exposure, while guests build more complete profiles that raise their credibility. Accordingly, the Airbnb online forum has grown larger, attracting still more users to the application.

      The designer must script both sides, interactor and digital artifact, so that the actions of humans and machines are meaningful to one another. (Murray)


Airbnb as hierarchical model — Profile

The word adds another dimension to the world of humans.

— Michael Cole

      A personal profile is crucial for building credibility between hosts and guests. When my mother and I each requested the same apartment in New York, the host approved my request but declined my mother’s, because I had a more complete profile with reviews from former hosts, while my mother was a new user whose profile was blank.

      Users have to complete their profile step by step, which reflects another important but commonly overlooked design principle, progressive disclosure: separate information into multiple layers and only present the layers that are necessary or relevant. (Lidwell-Holden-Butler, 2003) Airbnb segments the needed profile information into several parts: first signing up for an account, then providing detailed information (e.g. name, headshot, identification), and finally a payment method. Airbnb shows which step the user is on, so users know how many steps are left. Progressive disclosure guides users through complex procedures with simple operations. (As shown in Figures 8 & 9)

Figure 8  Progressive Disclosure in Profile

Figure 9  Profile completion procedure

      Further, progressive disclosure is an efficient design principle for hiding infrequently used controls and information. For instance, notifications, currency, payment methods, and terms of service are hidden in “Settings”, and detailed notifications and terms of service are hidden a layer further down. Airbnb thus builds the “Profile” module as a hierarchy of multiple layers and hides less-used functions to manage complexity.
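The stepwise profile flow can be modeled as an ordered list of steps in which only the first unfinished step is presented. A minimal sketch; the step names and fields are invented for illustration, not Airbnb's actual sign-up schema:

```python
# Sketch of progressive disclosure in a sign-up flow: information is split
# into ordered layers, and only the current step's fields are shown.
# Step names and fields are hypothetical.

STEPS = [
    ("account", ["email", "password"]),
    ("details", ["name", "headshot", "identification"]),
    ("payment", ["payment_method"]),
]

def current_step(completed: dict):
    """Return (step number, total steps, step name, fields) for the first
    unfinished step, or None when the profile is complete."""
    for i, (name, fields) in enumerate(STEPS):
        if not all(f in completed for f in fields):
            return (i + 1, len(STEPS), name, fields)
    return None
```

Showing `step number / total steps` is what lets the interface tell users how many steps remain, while the unfinished fields drive which controls are rendered.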


Expansion of Airbnb — Open Homes & Experiences

Redefine what it means to be home.

— Airbnb

      Open Homes is a program that Airbnb launched in 2012 in the aftermath of Hurricane Sandy. It offers free, temporary accommodations to those who lost their homes due to conflict, disaster, or illness. The goal of the program is to grow a community of hosts who believe that offering a welcoming space can help someone rebuild their life. (Airbnb Help)

      By operating Open Homes, Airbnb goes far beyond an enterprise that earns profits from hospitality services. It has devoted effort to philanthropy that depends strongly on information flow, working closely with nonprofit organizations such as the International Rescue Committee and Mercy Corps to develop the program.

      Open Homes connects organizations seeking short-term stays and volunteers offering up their homes for a specific cause. When volunteers sign on, they’ll be able to specify the cause they’d like to donate their room or home to. Nonprofits looking to set up a family or individual for a few days or weeks while they suss out more permanent housing will be able to view lists of potential volunteers. The new platform automates much of the work that Airbnb has been doing manually up until this point.

      Experiences is a program launched in 2016. It offers a deep dive into the local host’s world through special knowledge, unique skills, and inside access to local places and communities that guests could not find on their own, creating lasting connections. It was built on a distributed network that utilizes the same affordances as accommodation search: photos showing attractive characteristics, a digital calendar and map for checking availability, and a GPS system that makes recommendations based on the current location.

      Information flows in and out; as a consequence, we shall be living in an infosphere that will become increasingly synchronized (time), delocalized (space), and correlated (interactions). (Floridi, 2010)



Good design is aimed simultaneously at perfecting the object and at improving the overall practice of the field.

— Janet H. Murray

      Airbnb has imperfections due to its constraints. Primarily, unlike hotels, its facilities are not as complete: neither housekeeping nor room service is provided during a stay, and there is no place to keep luggage after checking out, so Airbnb is not well suited to long-term stays. Moreover, the cleaning fee and service fee do not show up until the final step of booking, which causes problems when users have finally made their decision only to find a bunch of extra fees. Furthermore, news reports have revealed that some hosts installed hidden cameras to secretly monitor guests’ every move, severely violating the law and ethics as well as guests’ privacy. In addition, Airbnb does not cover all loss and damage for hosts, although it provides a “Host Guarantee Program”; meanwhile, there is no protection for guests whose items are lost or stolen.

      The popularity of Airbnb has also brought a series of social problems. Firstly, because Airbnb has drawn away a significant number of customers, many hotels have been driven out of business and hotel employees have lost their jobs. The American Hotel and Lodging Association — which includes juggernauts Marriott International and Hilton Worldwide — has pressured local governments by accusing Airbnb of evading taxes. Hospitality Net reports that local, state, and federal governments miss out on $226 million in tax revenues per year from the reduction in hotel stays in New York City alone. (See Figure 10) Secondly, Airbnb has driven up real estate prices in cities such as Amsterdam, because local hosts can afford to pay more for a flat when they can rent it out. Thirdly, as listings multiply, Airbnb has caused overcrowding problems in local communities, including noisy parties and parking congestion.

Figure 10  Protesters gather outside of New York Governor Andrew Cuomo’s office on third avenue in New York. (Sources: Frank Franklin II / AP)



Technology is never neutral or independently determinative.

— Martin Irvine

      In conclusion, Airbnb has designed “a sociotechnical system of interconnected agency and co-dependency.” (Irvine) All the technologies in the lower layers of the system — the network’s core — provide general services that can be used by all applications. (Schewick, 2012) Digital photos, maps, calendars, and GPS can easily be found in other applications (Uber, Facebook, Amazon, and so on). As a result, having application designers (who know the needs of their applications) design application-specific functionality is more efficient than asking the designers of lower layers to anticipate the needs of future applications. (Schewick, 2012) Airbnb’s designers do not need to think about how to design GPS or any other underlying technology; instead, they only need to focus on the higher layers of the system and on how to combine the different affordances to achieve their goals.

      Application autonomy — a hierarchical relationship between the application and the network — perfectly describes the whole Airbnb design system. The Airbnb interface is in control, and the network plays a serving role. Lower layers are responsible for very general building blocks, which Airbnb’s designers can use to realize application-specific needs in the higher layers of the system. By putting Airbnb, running on end hosts, in control, the principle of application autonomy effectively puts control over the use of the Internet in the hands of users.

      Computational thinking — using abstraction and decomposition to solve problems — is also applied in the design of Airbnb. Initially, Airbnb built its website to solve the problem that travelers could not afford hotel prices while renters needed extra money to pay their rent. Airbnb decomposed the problem into small pieces: on one hand, travelers looked for places for short-term stays; on the other hand, homeowners had extra space and wanted extra income. To think computationally is to interpret a problem as an information process and then seek to discover an algorithmic solution. (Denning, 2015) Airbnb solved the problem by using algorithms to establish a platform that bridges travelers and homeowners so that both could benefit. All of these services act by using software on the network to generate the connectivity needed to join the two ends of a relationship. This relationship, in turn, can become a service, as in the examples above, or remain a relationship without any exchange of products or services.

      Further, Airbnb utilizes abstraction in its house-searching function. It “codes” residence resources with a number — the price — on the digital map. Thus, when users check the map of a destination city, they see a map full of abstract prices and corresponding lodging locations. Users can easily check a house’s location and price by hovering over its photo, which highlights the corresponding price icon. At the same time, the average nightly price is provided above the line chart for reference. (See Figure 11)

Figure 11  “Coding” houses with prices on digital map
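The abstraction at work here can be sketched directly: each full listing is reduced to a (location, price) marker for the map, and the same data yields the average nightly price shown above the chart. The listing data below is invented for illustration:

```python
# Sketch of the price "coding" abstraction: full listings are reduced to
# map markers labeled only with price. Listing data is hypothetical.

listings = [
    {"id": 1, "lat": 40.75, "lng": -73.99, "price": 120},
    {"id": 2, "lat": 40.76, "lng": -73.98, "price": 180},
    {"id": 3, "lat": 40.74, "lng": -74.00, "price": 90},
]

def price_markers(listings):
    """Abstract each listing into a ((lat, lng), price-label) map marker."""
    return [((l["lat"], l["lng"]), "$" + str(l["price"])) for l in listings]

def average_nightly_price(listings):
    """Average nightly price, as shown above the filter line chart."""
    return sum(l["price"] for l in listings) / len(listings)
```

The point of the abstraction is that everything else about a listing (photos, reviews, amenities) is hidden behind the single price symbol until the user asks for more.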

      Computer simulations of physical media can add many exciting new properties to the media being simulated. (Manovich, 2013, p.86) Airbnb employs simulation, that is, modeling physical objects in the real world and their interactions. (Evans, 2011) Airbnb constructed an online “Community”, where users are free to browse others’ conversations, ask questions, and leave comments in a “Discussion Room” — a simulation of real-world correspondence. The Airbnb Community borrowed concepts from the real world, where conversations and discussion rooms originate, imitating their affordances to build a virtual ecosystem in which conversations take place online in virtual rooms. (See Figure 12)

Figure 12  Airbnb Community: simulation to conversation and discussion rooms



The Internet of Thinking: Problematizing the Semiotic Processes Behind Google Arts and Culture



On February 1st, 2011, the Google Cultural Institute launched its long-awaited digital venture, the “Google Art Project”. Originally a partnership with 17 museums from Europe and the U.S., the platform offered a new perspective on how museums are used, and will be used, on the Web. Google’s endeavor, now titled “Google Arts and Culture”, has become the epicenter of a debate within the professional museum field, where scholars see a paradigm shift toward a destabilized way of apprehending works of art. I argue that the driving force behind such a shift is the set of design principles at play. By comparing the design of Arts and Culture to the Metropolitan Museum of Art’s website and the online database Artstor, I will show how poorly Google implements design principles that facilitate user interpretation of, and access to, art and culture. I will end by using my analysis, as well as my background in art history, to offer suggestions on how to improve the visualizations on Google’s platform so that they incorporate such semiotic layers.

The success of Google Arts and Culture was perhaps unintentional. The enterprise was developed as a side project by Google’s Group Marketing Manager Amit Sood, who envisioned a platform that provided a “number of digital reproductions of works from participating museum institutions, which can then be visualized in high resolution and explored through a drag-and-drop, zoom-in-and-out interface.”[1] High-resolution images could easily be found online, yet Google created pictures composed of as many as seven billion pixels. Also dubbed “gigapixel” images, these provide the user an even better experience than the “real, breathing thing”, because the quality allows the user to see a “microscopic view of details in brushwork and surface condition that previously were difficult or impossible to see with the naked eye.”[2] Yet among these (and more) digital feats that characterize Google’s platform, the question arises: where does the platform mediate and provide interpretable information for users? Establishing any sort of meaning and dialogue between works of art recalls Malraux’s idea of the musée imaginaire, an ‘imaginary museum’ or ‘museum without walls’.[3] The concept imagines an ideal collection of all artworks, held in the imagination, where dialogue rests on the view of art and art history as essential for establishing connections between works.[4] The Metropolitan Museum of Art’s website and Artstor are designed with that very intention in mind: providing access to material and information so users can construct their own meaning systems.

The Metropolitan Museum of Art

Some background on how both sites obtain and organize their metadata will clarify how each institution designs its platform. Understanding data storage, creation, and dissemination will become a key point of comparison with Google Arts and Culture. The Metropolitan Museum of Art, also known as “the Met”, has a very thorough website designed as a one-stop shop for anyone curious about the museum’s vast collections, archives, exhibitions, or visiting the Met museums themselves. The homepage is divided into multiple sections that merge together as you scroll up or down the page. The Met’s metadata comes from its own database, built through the digitization of its collections. According to its website, “The Metropolitan Museum of Art creates, organizes, and disseminates a broad range of digital images and data that document the rich history of the Museum, its collection, exhibitions, events, people, and activities.”[5] Such an undertaking was managed by the Thomas J. Watson Library Office of Digital Projects, which “provided access to research materials from the Libraries of The Metropolitan Museum of Art, selected materials from Metropolitan Museum of Art curatorial departments, and partner libraries and archives.”[6]

Don Undeen was the information architect at the Met for 4 years before coming to Georgetown University. In an email, he explains how the Met creates, stores and updates its metadata:

“The Met stores all the data about its collection in a proprietary database system called TMS (The Museum System, from Gallery Systems). It’s a big complicated relational Database that runs on an Oracle database. The collection managers and curators from all the museum departments keep this system updated as objects are acquired, new information is gained, they are placed in various exhibitions, travel on loan, etc. So the database is more than just the information that visitors to the website see, but is actually for managing the objects in the collection as well.”[7]
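The relational structure Undeen describes can be illustrated with a toy schema: objects, exhibitions, and a join table linking them, so that a single query answers questions like "which objects are in this exhibition?" The schema below is invented for illustration; TMS's actual schema is proprietary, and the sample rows are only examples.

```python
# Toy relational sketch in the spirit of the TMS description above.
# The schema and data are illustrative, not TMS's real design.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE objects (id INTEGER PRIMARY KEY, title TEXT, artist TEXT);
CREATE TABLE exhibitions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE exhibition_objects (
    exhibition_id INTEGER REFERENCES exhibitions(id),
    object_id INTEGER REFERENCES objects(id)
);
""")
con.execute("INSERT INTO objects VALUES (1, 'Young Woman with a Water Pitcher', 'Vermeer')")
con.execute("INSERT INTO exhibitions VALUES (1, 'Sample Dutch Painting Show')")
con.execute("INSERT INTO exhibition_objects VALUES (1, 1)")

# Join across the three tables to list an exhibition's objects.
rows = con.execute("""
    SELECT o.title FROM objects o
    JOIN exhibition_objects eo ON eo.object_id = o.id
    JOIN exhibitions e ON e.id = eo.exhibition_id
    WHERE e.name = 'Sample Dutch Painting Show'
""").fetchall()
```

Because the relationships live in join tables rather than in the object records themselves, the same database can drive the public website, loan tracking, and exhibition management at once, which is Undeen's point.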

The museum’s collection falls into two categories: images of works believed to be in the public domain, and those under copyright or other restrictions. To bridge this division, The Metropolitan Museum of Art implemented a new policy in February 2017 known as “Open Access”, which allows artworks in the public domain to be freely available for unrestricted use and makes data from the entire online collection available under Creative Commons Zero (CC0).[8] Images of artwork believed to be in the public domain are also available on ITHAKA-Artstor and the Google Cultural Institute, reinforcing a shared platform of metadata that also plays into Malraux’s concept of a museum without walls.


Artstor is also committed to disseminating scholarship and teaching through digital images and media. A nonprofit organization, Artstor comprises the Artstor Digital Library, which includes millions of high-quality images for education and research across disciplines from a wide variety of contributors around the world. It also developed JSTOR Forum, a software platform that allows institutional users to catalog, manage, and distribute digital media collections and make them more discoverable. In 2016, Artstor formed a strategic alliance with ITHAKA, a not-for-profit organization whose mission is to preserve and expand access to knowledge and which is home to other services for higher education such as JSTOR and Portico. Whereas the Met uses its own collection as metadata, Artstor relies on contributors to promote and share information. Contributors are listed on the website as museums, artists, artists’ estates, photographers, scholars, special collections, and photo archives. As such, Artstor draws on several different kinds of databases to bring information onto one platform, and it has specific guidelines on how to sort, group, classify, and organize the information it receives. According to the website, “…the Digital Library can be searched and browsed by object-type classification (e.g. painting, architecture, etc.), country/region, and earliest and latest date. The classification terms are applied from an in-house controlled list (painting, sculpture, etc.); the country terms are from the Getty Research Institute’s Thesaurus of Geographic Names (TGN); and numeric earliest and latest dates are created for each record. The Artstor Advanced Search and Browse functions depend on these consistent access points to improve access to all collections.”[9] To alleviate problems associated with information overload, Artstor uses the following techniques:

  1. Clustering duplicates and details representing a unique work “behind” a lead image to represent the entire work and provide the highest quality
  2. Collaborative filtered groups determine which images are saved in conjunction with other specific images by users making groups
  3. A controlled vocabulary from Getty Research Institute’s Union List of Artist’s Names (ULAN) allows searching for works by any part or variant of an artist’s name to find images linked to the ULAN creator record
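The third technique — controlled vocabulary — can be sketched as a lookup table mapping every variant spelling of an artist's name to one canonical record. The variants and the record identifier below are hypothetical placeholders, not actual ULAN data:

```python
# Sketch of controlled-vocabulary name resolution in the spirit of ULAN:
# any variant of an artist's name resolves to one canonical creator record.
# The variants and record id ("ULAN-0001") are illustrative placeholders.

VARIANTS = {
    "vermeer": "ULAN-0001",
    "johannes vermeer": "ULAN-0001",
    "jan vermeer": "ULAN-0001",
    "vermeer van delft": "ULAN-0001",
}

def creator_record(name: str):
    """Resolve a name variant (case- and whitespace-insensitive) to its
    canonical record id, or None if the name is not in the vocabulary."""
    return VARIANTS.get(name.strip().lower())
```

Searching for "Jan Vermeer" or "Vermeer van Delft" then retrieves the same linked images, which is what makes the consistent access points Artstor describes possible.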

The Google Behind Arts and Culture

How do both of these platforms compare to Google Arts and Culture? To understand Google’s platform, one needs to understand how Google as a company thinks and creates its platforms. It is hard to imagine life before Google. Indeed, Google is intertwined in almost anything, if not everything, we do. We retrieve information ranging from something trivial like DIY tutorials to helping students (like me) write papers like this one. Google was founded in the late 1990s amid the revolution to commercialize the Web. The rise of this mega-tech company started a new way of searching and categorizing ideas and issues through algorithms. In their book Google and the Culture of Search, Hillis et al. aptly state how “Google operates as a nexus of power and knowledge newly constituted through extremely rapid changes in networked media technologies…”.[10] Google is an online platform that is also a kind of interface that (should) facilitate meaning. An interface is a metaphor for discovering “important cognitive and technical patterns that apply to all kinds of symbolic artefacts: books, photographs, artworks, music, architecture (3D built space), and, more recently, the symbolic substrate of pixel-based screens that can represent any 2D digital object. ‘Interface’ is also part of a web of related conceptual metaphors: medium/mediation, affordance, window, node, link, relay.”[11] Google’s search model (PageRank), one component of its interface, relies on relevancy, instantaneity, and generic individualization, which ultimately skews how Google operates its database and, more broadly, its platforms. By creating algorithms that preemptively select what you would search for, Google narrows the scope of your search results to what you want to see.
Such a concept ties into Manovich’s ‘paradox’ of personalized technologies, wherein “…following an interactive path, one does not construct a unique self but instead adopts already pre-established identities.”[12] Ultimately, Google tells the viewer what information means, whereas a proper interface gives the viewer access to materials and information so they can create their own meaning system. Studying the search technology behind Google has broader implications for how users, searchers, and viewers navigate, classify, and evaluate Web content on all of Google’s platforms, including Arts and Culture.
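The PageRank idea mentioned above can be sketched as power iteration over a tiny link graph: a page's rank is accumulated from the pages that link to it, weighted by those pages' own ranks. This is a toy illustration of the published algorithm, not Google's production system, and the four-node graph is invented:

```python
# Toy PageRank sketch (power iteration on a tiny link graph). Illustrative
# only; Google's production ranking involves far more signals.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            if not outs:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

# "c" receives links from both "b" and "d", so it ends up ranked highest.
r = pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

Even this toy version shows the key property the essay gestures at: rank is computed from the link structure itself, so what surfaces first is decided by the algorithm, not by the viewer.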

To extrapolate Google’s design principles, one needs to look at the motive behind Arts and Culture as well as how it receives its data. First and foremost, Google Arts and Culture is a digitized open archive of sorts (the word ‘open’ complicates the matter and will be examined further below). The original 17 institutions that were part of the project have expanded to over a thousand collections, museums, and other cultural institutions since the initial launch seven years ago. Don Undeen rightly proposes the following definition of the project: “The Google Art project is more of a Content Management system for web-facing educational art websites. Its goal is to amass collections from as many sources as possible and present it nicely, at the expense of having complete records, complete collections, or making it available for re-use.”[13] Google therefore gets its data from institutions that would like to participate and apply for a kind of ‘membership’ to have their artwork displayed on Google’s platform. Arts and Culture imports museum collections to its platform and has the liberty to interpret, place, and treat these objects as it sees fit. Participating institutions provide their collections on this free platform, and Google provides the technical service of archiving their data. This not only gains visibility for the institutions but also adds to Google’s branding. It is a win-win for both parties involved.

The services provided on the Arts and Culture platform also use Google’s existing technologies, just bundled and combined in a different way. Combinatorial or cumulative designs are found on an interactive screen interface, where “we become conductors of a complex, unobservable orchestra of actions for transforming signs and symbols through ongoing computation and combinations with other symbolic structures, and combinations with many other conductors.”[14] Google bundled together technologies that already existed before creating Arts and Culture. For instance, “Street View”, “Nearby”, and all maps come from Google Maps, basic information such as museum hours from Google Search, and data on images partly from Google Images. At first glance, the design of the platform seems transparent and natural, but it is not: Google’s software embedded in Arts and Culture simply fetches the basic scripted metadata from the other platforms embedded within it. Arts and Culture thus becomes an example of a ‘meta-medium’, a phrase coined by Alan Kay to describe a computer that is “no longer a single medium but a medium for other media processed by user-activated software in the same substrate used for display.”[15] The Metropolitan Museum of Art’s website and Artstor are also meta-mediums, but for a different purpose. The motivation behind the Met collections database and Artstor rests on their priority to be a resource for scholars and the public, being as open as possible and erring on the side of more sharing; the Met’s Open Access API further reinforces this commitment. Arts and Culture, however, has no terms-of-service statement regarding re-use, and Google has even made it hard to download the images on the site. Try right-clicking to save an image; it does not work anywhere that I can see. In a sense, it is a “Dead End.”

A Problem of Design

Such back-end technologies impact every aspect of how Google designs and visualizes its metadata on Arts and Culture. The design of Arts and Culture is rather interesting when compared with other Google platforms. All of Google’s products and services follow a strict design guide called the Visual Assets Guidelines. According to Google’s art director Chris Bettig, the guide “defines our visual design (across platforms) and provides the assets / information needed in order to create any visuals for Google products or marketing collateral.”[16] Among the many guidelines in this comprehensive report, a few items stood out to me. Google’s core design philosophy has always been to create products that are built at large scale and that are “mobile-first”: two concepts that are not embedded into Arts and Culture. Not only does Google violate its own design principles; the company also fails to incorporate interactive design features intended to provide the user with information that facilitates interpretation and usable representation.

One of the key components of Arts and Culture is the continuous scroll feature, which had New York Times critic Roberta Smith saying that “the images start sliding past like butter.” Each image is compressed into a flat tile that takes its place next to another flat tile, maybe of the same artist, maybe of a different one? I don’t know. A featured theme of “Vermeer” is displayed on Arts and Culture’s home page. Does it have anything to do with the featured stories below? I don’t know either, and that’s the problem. Arts and Culture is a database, a platform, and most importantly, an interface. As stated earlier, an interface is intended to facilitate meaning for viewers so that they in turn can establish connections and create nodes of meaning within their own information networks. The visualizations of Arts and Culture inhibit such meaning-making. The flat tiles of images give the viewer some information and no information at the same time. If you click the theme, you are overburdened by information: “Who was Johannes Vermeer?”, “The complete works in augmented reality”, “The Mona Lisa of the North”, “Every painting in one place”, “Vermeer on screen”, “Create your own masterpiece”, “Vermeer in pop culture”, “Justin Richburg x Vermeer”, “The Devil is in the Details” — the list goes on until you get to “explore more”, which honestly is pointless given how much information Google has already thrown in your face.
By breaking up the information into seemingly “informational” nodes, Google makes it “difficult to identify the explicit transfer of knowledge because there is very limited interpretative text explaining the conceptual threads that tie items together.”[17] Barranha and Martins both argue that virtual museums or collections such as Arts and Culture “…opt (or should opt) for architecture which is flexible, transparent, distributed and open to collaboration and multiple realizations.”[18] The generalized “themes” created by Google also recall Malraux’s musée imaginaire, in which western culture has embedded a notion of stylistic continuities and generalities that result in a thematized style that ultimately leaves little to learn about.

The same search for Vermeer on the Met Collection and Artstor reveals a completely different perspective, and a different story behind each website’s design principles. The Met’s search returns 38 Vermeer paintings, which can be filtered by “object type/material”, “geographic location”, “date/era” and “department”, and sorted by “Relevance”, “Title”, “Date”, “Artist” and “Accession Number.” The same search on Artstor yields many more results and similar classifications with additional categories: “Contributor” and “Collection Type”. Such classifications draw a parallel to the three sign functions, icon, index and symbol, used in the semiotic process. As Irvine states, “they function as exemplary types (prototypes, ideal model forms) that the American viewer is given access to as part of a democratic cultural history, and thus open onto the whole network of symbolic values for art history.”[19] A viewer using the Met and Artstor can therefore understand the messages and meanings represented in the details of the artworks because of those categories of description. Such connections are not made apparent on Google’s interface. It is unclear what concepts and relationships are being used, or what organizing principles are being deployed, when grouping the flat tiles into chunks on-screen or using the infinite scroll feature. When you click on “Every painting in one place,” you see Vermeer’s paintings have been thematized once again according to “Flirtation”, “Storytelling”, “Concentration”, “Correspondence”, etc. These themes further reaffirm Manovich’s concept of creating “pre-established identities.” The viewer is told Google’s perspective on how to understand Vermeer rather than being shown the different types of interpretants available at their disposal. Google is actively changing how one interprets artists, paintings and art history in general by creating these preemptive themes and categories.
Google becomes “a zoo of oddities” and an “endless seeing of the Internet,” where Arts and Culture represents “a kind of cultural illness” that has infatuated the general public.[20]

Another Perspective

There are other ways of designing and using Google’s technologies that can incorporate the semiotic layers missing from Arts and Culture but apparent on the Met’s and Artstor’s websites. The Met has gracefully and easily employed the basic meaning-making ‘system’ within the design of its website. While Google has shuffled its metadata into themes and groups, the Met re-organizes and interprets its various layers of data in a comprehensive and interactive Heilbrunn Timeline of Art History. The timeline “pairs essays and works of art with chronologies telling the story of art and global culture through the Museum’s collection.”[21] While Arts and Culture is like Artstor in the sense that it is a database, it is also a quasi-archive. Alexandra Lussier-Craig speaks to how Arts and Culture (titled the Art Project back in 2015) behaves as an archive:

“It behaves similarly to archives, in the plural, in the way that it treats the individual collections. The items in the Art Project are primarily arranged according to the institution that contributed the collection […] items are treated as though they originate in museum collections.”

Lussier-Craig adds that even though the Art Project groups items according to the museum or institution that contributed them, within “each of these collections the itemsʼ arrangement is far less structured. The arrangement of items within each collection is algorithmically generated[…]”[22] To combine the qualities of the Met’s archives and Artstor’s database with Google’s technologies, I propose that Arts and Culture create a sophisticated timeline of sorts. Arts and Culture already has the metadata available to create this timeline. The timeline would be illustrated to incorporate time period and geographic region together on one screen, creating an indexical relationship among artworks. The contributing institutions would be another layer added onto the timeline, locating the artworks in physical space. I would get rid of Street View, as it adds no information other than how you would see the artwork in person. Audio and videos already on the platform could be incorporated under each artwork on the timeline. This chronological design and visualization would not only create usable information but also honor Google’s “large scale” design principle. Another feature that might be interesting to add: take a picture of an artwork in front of you and have the app situate it within the timeline. This could create a real-time index to help the viewer understand the artwork before them.


Google Arts and Culture is founded on the principle of helping users “discover artworks, collections and stories from all around the world in a new way.” Yet Google’s design, visualization and organization of its metadata creates a “content management platform” for web-facing institutions. From the outset, the search engine was designed to avoid the subjectivity, maintenance expense, slow indexing speed and limited scalability common to human-maintained directory sites.[23] Google’s focus on instantaneity and scalability neglects basic design principles that facilitate a user’s ability to understand and contextualize art and culture. The platform becomes a winding road of information with no directionality, which is rather ironic for a company that prides itself on mastering the art of design. The Arts and Culture platform becomes an interesting case study of the power of design, and of seemingly trivial details, in shaping our ability to form connections and create meaning. I wish to see the changes, if any, that Google will make to a platform with such immense potential.


[1] Agostino, C. “Distant Presence and Bodily Interfaces: ‘Digital-Beings’ and Google Art Project.” Museological Review, no. 19 (2015): 65.

[2] Beil, Kim. “Seeing Syntax: Google Art Project and the Twenty-First-Century Period Eye.” Afterimage: The Journal of Media Arts and Cultural Criticism. 40.4 (January/February 2013): 22-27.

[3] Irvine, Martin. “Malraux and the Museé Imaginaire: (Meta)Mediation, Representation and Mediating Institutions.” Google Docs. Accessed December 11, 2018.


[4]  Kristoffermilling. “Malraux and the Musee Imaginaire: The ‘Museum without Walls.’” Culture in Virtual Spaces (blog), June 17, 2014.

[5] “Image and Data Resources.” The Metropolitan Museum of Art, i.e. The Met Museum. Accessed December 9, 2018.

[6] “TJWL Office of Digital Projects.”

[7] Undeen, Don. Dec. 10, 2018. Gmail Interview.


[9] “Metadata Policy & Standards | Artstor.” Accessed December 9, 2018.

[10] Hillis, Ken, Michael Petit, and Kylie Jarrett. Google and the Culture of Search. 1st ed. New York, NY, 10001: Routledge, 2012.

[11] Manovich, Lev. “The Language of New Media.” MIT Press, 2001, p. 226.

[12] Manovich, p. 129.

[13] Undeen, Don. Dec 10, 2018. Email Interview.

[14] Irvine, Martin. “Introduction to Affordances, Constraints, and Interfaces.” Google Docs. Accessed December 9, 2018. Page 7.

[15] Irvine, p. 7.

[16] “Google Visual Asset Guidelines.” CB. Accessed December 12, 2018.

[17] Lussier-Craig, Alexandra. “Googling Art: Museum Collections in the Google Art Project,” n.d., 7.

[18] Barranha, Helena, and Susana Martins. “Beyond the Virtual: Intangible Museographies and Collaborative Museum Experience.”, January 2015.

[19] Irvine, Martin. Art and Media Interfaced: From Studying Interfaces to Making Interfaces. Accessed December 11, 2018.

[20] “Is Google Bringing Us Too Close to Art?” The Daily Dot, March 21, 2013.

[21] “Home | Heilbrunn Timeline of Art History | The Metropolitan Museum of Art.” The Met’s Heilbrunn Timeline of Art History. Accessed December 13, 2018.

[22] Lussier-Craig, p. 8.

[23] Hillis, p. 36.



“About the Google Cultural Institute.” Accessed December 12, 2018.

Agostino, C. “Distant Presence and Bodily Interfaces: ‘Digital-Beings’ and Google Art Project.” Museological Review, no. 19 (2015): 63–69.

Beil, Kim. “Seeing Syntax: Google Art Project and the Twenty-First-Century Period Eye.” Afterimage: The Journal of Media Arts and Cultural Criticism. 40.4 (January/February 2013): 22-27.

Barranha, Helena, and Susana Martins. “Beyond the Virtual: Intangible Museographies and Collaborative Museum Experience.”, January 2015.

“Google Visual Asset Guidelines.” CB. Accessed December 12, 2018.

Hillis, Ken, Michael Petit, and Kylie Jarrett. Google and the Culture of Search. 1st ed. New York, NY: Routledge, 2012.

“Home | Heilbrunn Timeline of Art History | The Metropolitan Museum of Art.” The Met’s Heilbrunn Timeline of Art History. Accessed December 13, 2018.

“Image and Data Resources.” The Metropolitan Museum of Art, i.e. The Met Museum. Accessed December 9, 2018.

Irvine, Martin. “Introduction to Affordances, Constraints, and Interfaces.” Google Docs. Accessed December 9, 2018.

Irvine, Martin. “Malraux and the Museé Imaginaire: (Meta)Mediation, Representation and Mediating Institutions.” Google Docs. Accessed December 11, 2018.

Irvine, Martin. Art and Media Interfaced: From Studying Interfaces to Making Interfaces. Accessed December 11, 2018.

“Is Google Bringing Us Too Close to Art?” The Daily Dot, March 21, 2013.

Kristoffermilling. “Malraux and the Musee Imaginaire: The ‘Museum without Walls.’” Culture in Virtual Spaces (blog), June 17, 2014.

Lussier-Craig, Alexandra. “Googling Art: Museum Collections in the Google Art Project,” n.d., 114.

Manovich, Lev. “The Language of New Media.” MIT Press, 2001, 68.

“Metadata Policy & Standards | Artstor.” Accessed December 9, 2018.

Proctor, Nancy. “The Google Art Project: A New Generation of Museums on the Web?” Curator: The Museum Journal 54, no. 2 (April 1, 2011): 215–21.

“TJWL Office of Digital Projects.” Accessed December 9, 2018.

Undeen, Don. Dec. 10, 2018. Gmail Interview.

Surveillance Capitalism as a Result of Internet Personalization



It is no secret that companies use technology to track users’ activity online. Often, companies use language such as “personalization” or “optimization” to justify the collection of users’ behavioral data. This verbiage frames technological surveillance as a Faustian bargain in which users cede some of their privacy to obtain an optimized, more personal experience. However, Shoshana Zuboff argues in her paper “Big other: surveillance capitalism and the prospects of an information civilization” that the collection and selling of user data has created a new form of “surveillance capitalism,” in which users’ quotidian behavior is commodified. Zuboff’s paper is divided into four components of computer-mediated transactions that contribute to a state of surveillance capitalism: data extraction and analysis, monitoring and contracts, personalization and customization, and continuous experiments (Zuboff, 2015). This essay will illuminate how the affordances and designed history of “personalization and customization” on the internet have contributed to the rise of surveillance capitalism.


When one engages in contemporary discussion about data privacy and collection at the hands of technology corporations, a ubiquitous example is often given as the one true parable of creepy, invasive behavioral data collection: ad retargeting. Someone will mention that they were looking at a pair of shoes on one website, and then a few days later, they saw the same pair of shoes pop up as a banner ad while they were browsing Facebook. To most people this seems like the apex of invasive behavioral advertising. In actuality, the practice of retargeting only begins to describe the ways in which corporations gather and analyze behavioral data from people on the internet. Most internet users have little knowledge of the actual scope and extent to which corporations collect and analyze behavioral data about them, and this lack of knowledge is largely the result of designed obscurement. This asymmetry of knowledge about data collection and analysis is one of the basic tenets of Shoshana Zuboff’s definition of “surveillance capitalism,” the “fully institutionalized new logic of accumulation” that drives most tech companies (Zuboff, 2015).

Because the internet’s complexity demands multiple layers of modular abstraction, reinforced by pressure from the consumer economy to productize these modules, it is no wonder that the internet helped enable a system of “surveillance capitalism.” The internet, and its associated and similarly mythologized “big data,” is often viewed as a singular being with its own agency (Zuboff, 2015). However, if we adopt a sociotechnical systems view, we can see that the internet and the data it collects are a designed system, the product of various technological affordances and design ideologies. Once we view the internet and the web as a designed system, rather than a divine monolith, we can begin to see which actors are exerting their agency onto different parts of the system. Using this perspective, we can begin to understand how the modern internet became a vehicle for this type of surveillance capitalism.

At first glance, the term surveillance capitalism seems to invoke dystopian views of Big Brother watching your every move and forcing you to buy things. In fact, Zuboff purposefully invokes some of this imagery in the title of her paper, “Big Other” (Zuboff, 2015). One might think, “this seems like histrionic language to use to describe the mutually beneficial trade-off of free services for some advertising data.” But Gary Marx, a surveillance expert at MIT, reminds us that “While coercion and violence remain significant factors in social organization, softer, more manipulative, engineered, connected, and embedded forms of lower visibility have infiltrated our world. These are presumed to offer greater effectiveness and legitimacy than Orwell’s social control as a boot on the human face” (Marx, 2015). Corporations are not using physical coercion or presence to force behavioral changes, as one might imagine with traditional concepts of surveillance; rather, they are using surveilled data to design systems of advertisements that subtly, and effectively, manipulate people.


How has the design of personalization as a key affordance of the internet created opportunities for surveillance capitalism to exist? To outline the affordances and design of every personalization module that contributes to one’s experience of a personalized internet, and thus enables a practice of surveillance capitalism, would require multiple volumes. Thus, here I aim to de-blackbox the designs of two key personalization features of the internet to illuminate how they contribute to internet surveillance capitalism: internet browser cookies and geolocation data.

HTTP Cookies

As mentioned before, the internet is often conceptualized as a massive entity or space that users can “visit,” “surf,” or “go to.” While a spatial allegory helps one organize and process the information available on the world wide web, defining the internet as a monolithic structure disguises the communicative nature on which the internet was founded. For personalization to occur, and by extension for one to be surveilled, the internet obviously must be different and unique for each user.

One of the key concepts of the modern web experience is that your browser remembers who you are. Users of the world wide web can make accounts for almost any website so that they can interact with the website and have the state of their interactions saved. When I log in to Facebook, I expect to see my unique, personalized feed of friends and family. To think that the internet could exist in any other way seems almost absurd. This ability for the web to remember who you are, to stay logged into a certain account or hold goods in a virtual shopping cart, is largely attributable to cookies. A cookie, also referred to as an HTTP cookie or a browser cookie, is information about a user that a website stores in the user’s browser or hard drive, so that when the user returns to that site later, the site can read the information in the cookie to remember who the user is (“Internet Cookies,” 2013).

Lou Montulli, borrowing from a designed solution in computer science called a “magic cookie,” was the first person to implement cookies in web browsers, at Netscape. The cookie was designed to allow the web browser to remember a user’s preferences (Hill, 2015). Many different features and types of cookies have since developed from that use case, but they all share the common feature of being a “small piece of data that a server sends to the user’s web browser.” Cookies can be either “first-party cookies,” meaning the cookie’s domain is the same as the page a user is on and information is sent only to the server that set it, or “third-party cookies,” which are mostly used for tracking and advertising (“HTTP cookies”).
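To make this exchange concrete, here is a minimal sketch of the cookie lifecycle using Python’s standard http.cookies module; the cookie name and domain are hypothetical examples, not taken from any real site.

```python
from http.cookies import SimpleCookie

# Server side: a first-party site sets a cookie in its HTTP response.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
response_cookie["session_id"]["domain"] = "shop.example.com"  # first-party domain
header = response_cookie.output(header="Set-Cookie:")
print(header)  # Set-Cookie: session_id=abc123; Domain=shop.example.com

# Browser side: on the next visit, the browser sends the cookie back
# in its request headers, and the server parses it to "remember" the user.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123")
print(request_cookie["session_id"].value)  # abc123
```

A third-party tracking cookie works through the same mechanism; the difference is simply that the Domain attribute belongs to an advertiser embedded in many sites rather than to the site the user deliberately visited.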

Figure 1. A common pop-up explaining that a website is going to use cookies.

Cookies are a basic form of surveillance that most people explicitly consent to in various types of pop-ups, because cookies allow a user to skip repetitive processes like filling out content preferences or location information. However, the affordances of tracking and personalization that cookies bring to web browsers can allow third parties to create profiles that surveil and map users across myriad sites (Hill, 2015). Using NoScript, the Electronic Frontier Foundation found that visiting CareerBuilder.com exposed their browser to “10 (!) different tracking domains.” These third-party cookies are embedded in sites across the web, allowing the tracking organizations to build robust profiles of behavioral data about a user’s experience on the web (Eckersley, 2009). Most collectors and aggregators claim that this information is kept anonymous, but research has shown that “leakage” of personally identifiable information via online social networks can link user identities “with user actions both within OSN sites and elsewhere on non-OSN sites” (Krishnamurthy & Wills, 2009).

Thus, the design of the cookie itself does not create the issue of surveillance; rather, it is the network of actors taking advantage of the browser cookie’s technological affordances that creates a scenario in which a user can be identified, profiled and tracked throughout their journey on the web. As the EFF recognizes, “all of this tracking follows from the design of the Web as an interactive hypertext system, combined with the fact that so many websites are willing to assist advertisers in tracking their visitors” (Eckersley, 2009). Cookies did not create an environment in which surveillance capitalism was inevitable, but the design of cookies as a primary module of the world wide web did contribute to its growth. Because “behavioral tracking companies can put whatever they want in the fine print of their privacy policies, and few of the visitors to CareerBuilder or any other website will ever realize that the trackers are there, let alone read their policies,” third parties can continue to use data from cookies to model and sell the quotidian activity of a web user without the user ever knowing that their identity was surveilled and sold (Eckersley, 2009).

Location Data Sharing

Smartphones have become ubiquitous tools that help us navigate the world around us. Need to find the closest matcha store to you? Pull up Google Maps and have it lead the way. But, actually, that’s a pretty far walk and the sky looks a bit ominous. Open your weather app to check the weather in your area. Turns out it should start raining any second, so you decide to call an Uber to pick you up at your exact location. To ask how these apps on your smartphone helped mediate your journey home seems like a simplistic question. Obviously, the app just asked to use the GPS data that your phone collects. Much like allowing cookies on web browsers, one usually has to accept some sort of push notification or pop-up to allow an app to communicate with the phone’s GPS.

The designed interface of these notifications can be vague about what a user’s location data is used for. And much as third-party cookie tracking on web browsers led to the development of a marketplace and industry around users’ behavioral data, a third-party marketplace likewise emerged around the buying and selling of users’ location data.

Within Apple’s Human Interface Guidelines for iOS, Apple recognizes that designers need to request permission to access personal information such as location. Within the iOS design guidelines, apps are encouraged to “provide custom text (known as a purpose string or usage description string) for display in the system’s permission request alert, and include an example.” This string is presented in a standard iOS system-provided alert, so the permission request will be familiar to an iOS user (“Requesting Permission,” 2018).
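For illustration, the purpose string lives in the app’s Info.plist. The NSLocationWhenInUseUsageDescription key below is Apple’s actual key for when-in-use location access; the description text is a hypothetical example of the vague wording this essay describes.

```xml
<!-- Info.plist fragment (hypothetical description text) -->
<key>NSLocationWhenInUseUsageDescription</key>
<string>We use your location to personalize your experience.</string>
```

Nothing in this string obligates the developer to disclose that the resulting location data may be sold onward.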

Figure 2. A notification asks the user to share location data

However, within Apple’s design guidelines, nothing is mentioned about a requirement to let a user know if their personal data will then be sold to third-parties. As the New York Times reported, “Of the 17 apps that The Times saw sending precise location data, just three on iOS and one on Android told users in a prompt during the permission process that the information could be used for advertising. Only one app, GasBuddy, which identifies nearby gas stations, indicated that data could also be shared to ‘analyze industry trends’” (Valentino-DeVries, Singer, Keller, & Krolik, 2018). This sharing of location data from app companies to third-parties is not a cottage industry:

At least 75 companies receive anonymous, precise location data from apps whose users enable location services to get local news and weather or other information, The Times found. Several of those businesses claim to track up to 200 million mobile devices in the United States — about half those in use last year. The database reviewed by The Times — a sample of information gathered in 2017 and held by one company — reveals people’s travels in startling detail, accurate to within a few yards and in some cases updated more than 14,000 times a day (Valentino-DeVries, Singer, Keller, & Krolik, 2018).

Companies that sell and analyze this location data might claim that the data is all surrendered consensually, but as is apparent with the vague guidelines for “requesting permission” that app developers must use to access the iPhone’s location measurements, it is likely that users are unaware that their movements are being commodified.

Apps that use location data invoke the history of increasing personalization as validation. People do not object to application-based surveillance because they believe that the deal is designed to benefit them. The designed experience of enabling an application to utilize personal information, including “current location, calendar, contact information, reminders, and photos,” is meant to highlight the benefits of personalization while neglecting to specifically outline the ways in which the company behind an app may use that personal data as a commodity to profit from.


Both cookies and the ability to access location data allow a user to have a more personalized, unique experience with the internet. I am not trying to argue that these designed features of the world wide web and smartphones inherently create a form of malevolent surveillance. With both browser cookies and location data sharing, users of the world wide web and the appified internet generally have to opt-in to be surveilled. However, the design of these systems of surveillance obscures the extent to which the user is being surveilled. Most users are told that a website or app will collect behavioral or location data to “optimize” or “personalize” the user’s experience. This asymmetry of knowledge between the user and the surveilling company creates a state in which users can continue to be surveilled.



Eckersley, P. (2009, September 21). How Online Tracking Companies Know Most of What You Do Online (and What Social Networks Are Doing to Help Them). Retrieved December 11, 2018, from

Hill, S. (2015, March 29). The History of Cookies and Their Effect on Privacy. Retrieved December 11, 2018, from

HTTP cookies. (n.d.). Retrieved December 12, 2018, from

Internet Cookies. (2013, July 29). Retrieved December 11, 2018, from

Valentino-DeVries, J., Singer, N., Keller, M. H., & Krolik, A. (2018, December 10). Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret. The New York Times.

Krishnamurthy, B., & Wills, C. E. (2009, August 17). On the Leakage of Personally Identifiable Information Via Online Social Networks, 6.

Marx, G. (2016). Windows into the soul: Surveillance and society in an age of high technology. Chicago: The University of Chicago Press.

Requesting Permission – App Architecture – iOS – Human Interface Guidelines – Apple Developer. (n.d.). Retrieved December 12, 2018, from

Zuboff, S. (2015). Big other: surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 30(1), 75–89.


Translate Like A Human


— De-blackboxing Google Translate

Huazhi Qin


Machines are gradually taking over translation tasks in real life. As machine translation (MT) has developed, different methodologies have been applied to the field, generating multiple distinct translation systems. Rule-based machine translation (RBMT), statistical machine translation (SMT), and neural machine translation (NMT) are the three most important systems. Among them, Google Translate uses Google’s neural machine translation (GNMT) system, one of the state-of-the-art NMT models, to achieve a breakthrough in MT. GNMT is a model integrating four components: recurrent neural networks, long short-term memory, an encoder-decoder architecture, and an attention mechanism. However, the accuracy of Google Translate still faces challenges in terms of its internal translation process and its integration with audio and image input.



According to Russell, machine translation (MT) utilizes the power of machines to achieve “automatic translation of text from one natural language (the source language) to another (the target language)” (Russell et al., 2010). As interactions across the world increase, the demand to overcome language barriers has expanded. Because human translation requires a great deal of effort and time, people sought help from computers to take over this task. How to improve machines’ performance in translation has become one of the most important topics in computer science.

Since the 1950s, scholars have tried applying different methodologies to machine translation (MT) to bridge the gap between machine and human translation, and they have developed multiple distinct translation systems. Among them, rule-based machine translation (RBMT), statistical machine translation (SMT), and neural machine translation (NMT) are the three core systems.

In 2016, Google introduced its updated translation service using the Google Neural Machine Translation system (GNMT), which marked a great improvement in machine translation. By integrating deep learning technology, Google Translate implemented an “attentional encoder-decoder networks” model, which reduced translation errors by an average of 60% compared to Google’s phrase-based production system (Wu et al., 2016).

Nevertheless, current machine translation systems are still criticized for their limited accuracy.


Rule-Based Machine Translation (RBMT) — Linguistics

1.Translation process

Rule-Based Machine Translation (RBMT) is the oldest approach to machine translation. Basically, it simulates the process of constructing and deconstructing a sentence based on language-specific rules, following a type of automatic translation process called Bernard Vauquois’ Pyramid (Figure 1). The whole translation process proceeds in three steps – analysis, transfer, and generation – based on two sources: dictionaries and grammars. The implementation of linguistic rules is the core feature.

Figure 1 Bernard Vauquois’ Pyramid (source:

According to Evans, a language is composed of primitives (the smallest units of meaning) and the means of combination (rules for building new language elements by combining simpler ones) (Evans, 2011). RBMT also focuses on these two elements. To be more specific, the machine first analyzes the grammatical category and links for every word of the sentence in the source language from the perspectives of morphological, semantic, and syntactic rules (Figure 2). Secondly, every word in the source language is transferred to adequate lexical items in the target language according to dictionaries. Finally, the complete target sentence is generated by synthesizing every part from step two according to the grammatical rules of the target language.

Figure 2 Analysis of the sentence in the source language (source:
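The analysis-transfer-generation pipeline can be sketched with a toy example in Python. The four-word lexicon and the single adjective-noun reordering rule below are invented for illustration; they stand in for the large dictionaries and grammars a real RBMT system requires.

```python
# Toy English-to-French RBMT sketch (hypothetical mini-lexicon).
LEXICON = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}
ADJECTIVES = {"noir"}  # target-side adjectives, for the reordering rule


def translate_rbmt(sentence: str) -> str:
    # Analysis: tokenize the source sentence.
    words = sentence.lower().split()
    # Transfer: look up each word in the bilingual dictionary.
    target = [LEXICON.get(w, w) for w in words]
    # Generation: apply one grammar rule of the target language,
    # English ADJ + NOUN becomes French NOUN + ADJ.
    out, i = [], 0
    while i < len(target):
        if i + 1 < len(target) and target[i] in ADJECTIVES:
            out.extend([target[i + 1], target[i]])
            i += 2
        else:
            out.append(target[i])
            i += 1
    return " ".join(out)


print(translate_rbmt("the black cat sleeps"))  # le chat noir dort
```

Even this tiny sketch hints at the limitation discussed below: every word and every reordering rule must be written by hand, and any sentence outside the lexicon or the rule set fails.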

2. Limitations

When RBMT transfers meaning, it shows obvious limitations in three aspects.

Firstly, the sheer quantity of dictionaries and grammatical rules required is hard to fulfill, and the manual development of linguistic rules can be costly.

Secondly, RBMT is a largely language-specific system, which means it often does not generalize to other language pairs.

Thirdly, it only works for plainly-structured sentences and struggles with complicated ones, especially ambiguous and idiomatic texts. Human languages are full of special cases, regional variations, and flat-out rule-breaking (Geitgey, 2016).


Statistical Machine Translation (SMT) – Probability Calculation

1.Translation process

Statistical machine translation (SMT) dominated the field of MT from the 1980s to the 2000s. Unlike RBMT, no linguistic or semantic knowledge is needed in SMT. Rather, parallel corpora become the foundation of machine translation. In addition, SMT systems are not specially designed for any specific pair of languages.

Regarding the translation process, SMT applies a statistical model to machine translation and generates translations based on the analysis of a bilingual text corpus (Synced, 2017). The key feature is the introduction of statistics and probability.

There are again three steps in the process: 1) break the original sentence into chunks; 2) list all possible interpretation options for each chunk (Figure 3); 3) generate all possible sentences and find the one with the highest probability. The “highest probability” sentence is the one that sounds the “most human” (Geitgey, 2016).

Figure 3. A large number of possible interpretations
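The three steps above can be sketched in a few lines of Python. The phrase table, its probabilities, and the crude “language model” score are all invented for illustration; a real SMT system learns these from millions of sentence pairs.

```python
import itertools

# Toy phrase table: each source chunk maps to candidate target phrases with
# translation probabilities (values invented for illustration).
PHRASE_TABLE = {
    "quiero": [("I want", 0.6), ("I love", 0.4)],
    "ir":     [("to go", 0.7), ("go", 0.3)],
}

def lm_score(sentence):
    """Toy 'language model': reward the candidate that sounds most fluent."""
    return 1.2 if "want to go" in sentence else 1.0

def translate(chunks):
    best, best_p = None, 0.0
    # Step 3: generate every combination of options and keep the most probable.
    for combo in itertools.product(*(PHRASE_TABLE[c] for c in chunks)):
        sentence = " ".join(phrase for phrase, _ in combo)
        p = lm_score(sentence)
        for _, translation_prob in combo:
            p *= translation_prob
        if p > best_p:
            best, best_p = sentence, p
    return best

print(translate(["quiero", "ir"]))  # picks the combination scoring highest
```

Even this toy shows why SMT needs so much parallel data: every probability in the table has to be estimated from observed translations.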

2. Limitations

Although statistical machine translation overcomes many shortcomings of RBMT, it still faces many challenges, especially in terms of resources and human intervention.

As regards resources, although no linguistic rules are required, statistical machine translation needs a great deal of training data in the form of doubly-translated texts (Geitgey, 2016). As for human intervention, an SMT system consists of numerous separate sub-components and relies on multiple intermediary steps (Figure 4), which requires a lot of work from engineers (Zhou et al., 2018). Excessive human intervention inevitably influences translation results.

Figure 4. SMT consists of many intermediary steps


Neural Machine Translation – Google Translate

Neural machine translation (NMT) is generally considered to have been born in 2013, when two scientists applied deep neural networks to machine translation and proposed a novel end-to-end encoder-decoder structure. In the following years, sequence-to-sequence learning using the recurrent neural network (RNN) and long short-term memory (LSTM) was gradually integrated into NMT. (Synced, 2017)

However, NMT systems were criticized for being computationally expensive both in training and in translation inference. NMT systems also lacked practicality in some cases, especially when encountering rare words. (Wu et al., 2016) Thus, the original NMT was rarely put into practice due to its poor performance in translation speed and accuracy.

In 2016, the Google Brain team announced Google’s neural machine translation (GNMT) system, which addressed many of these issues. GNMT helped Google Translate achieve state-of-the-art translation results, reducing translation errors by an average of 60% compared to Google’s previous phrase-based production system. (Wu et al., 2016) Below, I will de-blackbox Google Translate, one of the most advanced applications of NMT, to elaborate how NMT works.

1. De-blackboxing Google Translate

According to the Google Brain team, Google’s neural machine translation (GNMT) model consists of a deep LSTM network with 8 encoder and 8 decoder layers, using residual connections as well as attention connections from the decoder network to the encoder. (Wu et al., 2016) There are four major features in GNMT: the recurrent neural network, long short-term memory, the encoder-decoder architecture, and the attention mechanism.

A. Recurrent neural network (RNN)

Unlike previous machine translation systems, which treated each word in isolation, people understand sentences, contexts, and information based on their understanding of what came before. In other words, human thoughts have persistence. The introduction of the recurrent neural network (RNN) gives machine translation an ability to remember, letting the machine think more like a human. A recurrent neural network contains loops which allow information to persist. (Github, 2015) It also means that previous calculations can influence the results of future outputs.

However, a traditional RNN sometimes faces the problem of long-term dependencies, where the machine has to trace further back to narrow down and determine the next word. (Github, 2015) For instance, consider predicting the last word in the text “I was born and grew up in China… I can speak Chinese.” The nearby word “speak” only delivers the clue that the next word is most likely a language; the more distant context “China” helps narrow it down to the specific word “Chinese.” In short, the gap between relevant pieces of information becomes wider.
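The loop that gives an RNN its memory can be sketched in a few lines. The weights below are random stand-ins for learned parameters, and the sizes are tiny for illustration.

```python
import math, random

random.seed(0)

HIDDEN, INPUT = 4, 3  # tiny sizes for illustration

# Randomly initialised weights (a real model would learn these from data).
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(INPUT)] for _ in range(HIDDEN)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

def rnn_step(x, h):
    """One loop iteration: the new hidden state mixes the current input x
    with the previous hidden state h -- this carry-over is the 'memory'."""
    return [math.tanh(sum(W_xh[i][j] * x[j] for j in range(INPUT)) +
                      sum(W_hh[i][j] * h[j] for j in range(HIDDEN)))
            for i in range(HIDDEN)]

# Feed a sequence one word-vector at a time; h carries context forward.
h = [0.0] * HIDDEN
for x in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    h = rnn_step(x, h)
print(h)  # the final state depends on the whole sequence, not just the last input
```

Because each step squashes the past through the same weights, information from many steps back fades, which is exactly the long-term dependency problem that motivates LSTMs below.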

B. Long short-term memory (LSTM) (Figure 5)

In order to address this issue, long short-term memory (LSTM) networks are applied to machine translation. At any given point, an LSTM accepts the latest input vector and produces the intended output using a combination of that input and some ‘context’.

Figure 5. An unfolded LSTM

The horizontal line running through the top of the diagram is the cell state. It conveys information straight down the entire chain. The structures consisting of a sigmoid neural net layer and a pointwise multiplication operation are called gates. The three gates in an LSTM regulate the information flow, deciding what old information should be kept and what new information should be included in the next cell state. When generating the results, the gates output only what is needed. (Github, 2015) The whole process is trained on a large number of example inputs and finally generates a filtered version. (Srjoglekar246, 2017)

As regards the actual translation process, for instance, the cell state might include the gender of the present subject in order to generate the proper pronouns. When a new subject is encountered, the gender information of the old subject is excluded. Then, a word relevant to the verb might be generated in the output step, since a verb is most likely to follow a subject. (Github, 2015)
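The gating arithmetic can be shown with a toy, scalar version of one LSTM step. Real cells operate on vectors with learned weight matrices; the weights below are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step on scalars; w holds (input, recurrent, bias) weights per gate.
    Each gate is a sigmoid in [0, 1] that scales how much information passes."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])   # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])   # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2]) # candidate info
    c = f * c_prev + i * g      # cell state: keep some old info, add some new
    h = o * math.tanh(c)        # output only what the output gate lets through
    return h, c

# Invented weights; a trained model would learn when to forget and remember.
w = {"f": (0.5, 0.1, 0.0), "i": (0.8, 0.1, 0.0),
     "o": (1.0, 0.1, 0.0), "g": (0.9, 0.1, 0.0)}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 1.0):
    h, c = lstm_step(x, h, c, w)
print(h, c)
```

The line `c = f * c_prev + i * g` is the diagram’s horizontal conveyor belt: old context is scaled by the forget gate and new information is admitted by the input gate.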

C. Encoder-decoder architecture

Based on LSTMs, Google Translate built up its encoder-decoder architecture. Encoding can be seen as the process and result of the analysis; decoding is the direct generation of the target sentence. Since the decoder network is broadly similar to the encoder, I will only discuss the encoder network in detail below.

At the beginning, the sentence is input into the system word by word. Encoding means that each word is turned into a set of numbers. (Geitgey, 2016) The numbers represent the relative position of each word in a word-embedding table and reflect its similarity to other words. (Systransoft, 2016) (Figure 6)

Figure 6. The encoding process
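The idea that these numbers encode similarity can be sketched with a toy embedding table; the vectors below are invented, whereas a real system learns them from data.

```python
import math

# Toy word-embedding table: each word is a short vector whose "position"
# encodes similarity to other words (values invented for illustration).
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def encode(sentence):
    """The encoding step: replace each word with its set of numbers."""
    return [EMBEDDINGS[w] for w in sentence.split()]

# Related words sit close together in the embedding space; unrelated ones do not.
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]))  # near 1
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["apple"]))  # much smaller
```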

There are two approaches Google Translate uses to improve the “quality” of those numbers. The first is bi-directional input, which means that the entire sentence is also input in reverse order. The words that follow likewise influence the meaning and “context” of the sentence, so the “position” of each word is output more accurately.

The second is the principle of layering. According to Universal Principles of Design, layering refers to the process of organizing information into related groupings in order to manage complexity and reinforce relationships in the information. (Lidwell, 2010) The encoder network is essentially a series of 8 stacked LSTMs. (Figure 7) Every layer is impacted by the layer below it. The pattern of the data becomes more and more abstract as the information rises to higher layers, which helps represent the contextual meanings of words in the sentence. (Srjoglekar246, 2017)

Figure 7. GNMT’s encoder network
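The layering principle can be sketched with a toy scalar cell standing in for an LSTM (all weights invented): the output sequence of each layer becomes the input sequence of the layer above, eight times, as in GNMT’s encoder.

```python
import math

def layer_step(x, h, w):
    """A toy recurrent cell: one scalar stands in for a full LSTM step."""
    return math.tanh(w * x + 0.5 * h)

def run_layer(inputs, w):
    """Run one layer over the whole sequence, carrying its hidden state along."""
    h, outputs = 0.0, []
    for x in inputs:
        h = layer_step(x, h, w)
        outputs.append(h)
    return outputs

sequence = [1.0, -0.5, 0.3]                           # toy "sentence"
weights = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]    # one invented weight per layer

for w in weights:                       # a stack of 8 layers, as in GNMT's encoder
    sequence = run_layer(sequence, w)   # each layer consumes the layer below
print(sequence)  # an increasingly abstract representation of the input
```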

In short, the encoder-decoder architecture is shown in Figure 8.

Figure 8 GNMT’s encoder-decoder architecture (Schuster, 2016)

D. Transformer – a Self-Attention Mechanism

However, the outputs of the encoding process bring many complexities and uncertainties to the decoder network, especially when the source sentence is long. (Cho et al., 2014) In order to process the encodings better, Google Translate built a self-attention mechanism called the Transformer between the two phases. (Uszkoreit, 2017)

The Transformer enables the neural network to focus on the relevant parts of the input when encoding. (Synced, 2017) (Figure 9) To determine the level of relevancy, the Transformer lets the system look back at the input sentence at each step of the decoder stage. Each decoder output then depends on a weighted combination of all the input states. (Olah & Carter, 2017)

Figure 9. The integration of the Transformer (the purple lines denote the weights)
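The weighted combination can be sketched in a few lines; the toy vectors below are invented, and real systems compute the relevance scores with learned projections rather than a raw dot product.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """One attention step: score each input state against the decoder state,
    normalise the scores into weights, and return the weighted combination."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]            # relevance via dot product
    weights = softmax(scores)                        # the "purple lines"
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states))
               for k in range(len(decoder_state))]   # weighted sum of inputs
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy encoded words
weights, context = attend([1.0, 0.0], encoder_states)
print(weights)  # the highest weight falls on the most relevant input state
```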

2. Limitations

Although GNMT is the state-of-the-art model in the current MT field, the accuracy and reliability of its translation results still face many challenges.

Regarding the system itself, as mentioned above, the filtering process is based on examples. Thus, it is important to collect a large amount of training and test data that provides a diverse vocabulary and its usage in various contexts. In addition, it is hard to detect mistakes and inaccuracies in the outputs, and then difficult to correct them, especially omissions of information. (Zhou et al., 2018) Meanwhile, the rare-word problem, monolingual data usage, the memory mechanism, prior knowledge integration, the coverage problem, and so forth also need further improvement. (Synced, 2017)

Furthermore, in addition to text input, Google Translate accepts input in the form of audio and images, which raises higher requirements for natural language processing. According to information theory, omissions and errors will occur in the step that transfers audio and image information into the source text the system processes, and the accuracy of the results will inevitably suffer.



Although still facing challenges, Google’s neural machine translation system overcomes numerous shortcomings of RBMT, SMT, and the original NMT and makes huge improvements in data volume, fluency, accuracy, and more. It brings new possibilities to the field of machine translation. The field is undergoing fast-paced development, and it is reasonable to believe that the application of NMT will continue to achieve greater breakthroughs and lead the future path of machine translation.



Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014, September 3). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Retrieved from

Evans, D. (2011). Introduction to Computing: Explorations in Language, Logic, and Machines. Lexington, KY: Creative Commons. pp. 20-21.

Geitgey, A. (2016, August 21). Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences. Retrieved from

How does Neural Machine Translation work? (2016, October 13). Retrieved from

Lidwell, William, Kritina Holden, and Jill Butler. Universal Principles of Design. Revised. Beverly, MA: Rockport Publishers, 2010.

Olah, C., & Carter, S. (2017, August 31). Transformer: A Novel Neural Network Architecture for Language Understanding. Retrieved from

Russell, S., Davis, E., & Norvig, P. (2010). Artificial intelligence: a modern approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Srjoglekar246. (2017, February 19). Understanding the new Google Translate. Retrieved from

Synced. (2017, August 17). History and Frontier of the Neural Machine Translation. Retrieved from

Schuster, M., & Le, Q. (2016, September 27). A Neural Network for Machine Translation, at Production Scale. Retrieved from

Understanding LSTM Networks. (2015, August 27). Retrieved from

Uszkoreit, J. (2017, August 31). Transformer: A Novel Neural Network Architecture for Language Understanding. Retrieved from

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., . . . Hughes, M. (2016, October 8). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Retrieved from

Yeen, J. (2017, October 06). AI Translate: Bias? Sexist? Or this is the way it should be? Retrieved from

Zhou, S., Kurenkov, A., & See, A. (2018). Has AI surpassed humans at translation? Not even close! Retrieved from

Socio-technological thinking behind browser


CCTP-820 Leading by design

Professor Irvine

The Web and web browsers, Image 1


Today we interact with web browsers all the time. In the morning, people open up Safari on their iPhones to search for today’s weather, or turn on their laptops and use Chrome to catch up on the news they missed while asleep. With web browsers running on all kinds of electronic devices, we can say that the whole world is at our fingertips. Browsing online has become an important part not only of people’s daily lives, but of human society as a whole. The web browser was first designed as a tool – the only tool at the time – to connect to the Web. As the technology developed, the web browser came to serve as more than a useful everyday tool and medium for accessing websites; it became a digital mediation. People may understand how to use web browsers with simple clicks, but not the socio-technological design ideas behind them, or their transformation from a medium into a digital mediation that changes the world we live in. To understand this, we need to go deeper than the surface and de-blackbox the invisible parts. This paper provides an overview of the history of the Web and web browsers, introduces the principles behind them, and examines the socio-technical design thinking and its impact behind web browsers.


A web browser is an internet-based application used to access and view websites on multiple electronic devices (Techterms). According to statistics from StatCounter, the major web browsers in use today include, but are not limited to, Chrome, Safari, Firefox, Opera, UC Browser, and Internet Explorer. As the global market share graph on StatCounter shows, these six web browsers make up around 92% of global desktop browser usage (StatCounter).

The main function of a web browser is to access and view websites. As Ron White explains in his book How the Internet Works, the brief process behind the “clicks” is that the browser processes the HTML or hyperlink – the web source – sends a request through a router to the site’s server, and then displays the webpage in the browser window (White, 2007).
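The request the browser sends can be sketched with the standard library; the URL below is illustrative, and a real browser adds many more headers.

```python
# A minimal sketch of the plain-text HTTP GET a browser composes after a click.
from urllib.parse import urlparse

def build_get_request(url):
    """Compose the HTTP/1.1 GET request that travels to the site's server."""
    parts = urlparse(url)
    path = parts.path or "/"
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"      # which site on the server we want
            f"Connection: close\r\n\r\n")    # blank line ends the request

print(build_get_request("http://example.com/index.html"))
```

The server’s reply to this text is the HTML file the rendering process described later turns into a visible page.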

From the user’s viewpoint, we can only see the visible user interface of the browser window – the search bar, the address bar, the bookmark drop-down list, etc. – while the other parts remain “blackboxed.” It is hard to explore and examine the social and technical design thinking behind a browser from the user interface alone, since the web browser is not only a thing but a product of its socio-cultural milieu. For this reason, we start by examining the history behind web browsers, since that history plays an important role in technological development and built the web browsers we see today.

The Web

Web browsers cannot exist without the Web. The Web was designed and invented by Tim Berners-Lee in 1989. As Berners-Lee states in his book Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web, his initial design idea for the Web was an open, united, and global place to share and access information, “a space in which anything could be linked to anything” (Berners-Lee, 1999). So what is the technological process behind this information-sharing web? According to Professor Irvine, the World Wide Web is “a group of protocol layers, designed to enable intercommunications between internet servers (and services) and individual connected devices with the Internet/Web software (the “client” system, with softwares connecting to Web service)” (Irvine, 2018). This indicates that the Web is, again, not a simple thing, but is created from different layers and modules. This protocol-layered architecture enables the Web to be extensible and scalable for new applications, because the layers are independent, which allows other applications to run on top of them.

The Web is designed as a hypermedia system, which allows it to include non-linear text and other media such as graphics, sound, or video. As Irvine explains, the “open, standard-based, device-independent” Web architecture includes a network system built across a client/server implementation; unlimited modules and layers that can be added in the future; extensibility for future applications; a model of interoperability for all software and hardware manufacturers; and scalability for new services and users (Irvine, 2018).

The client/server architecture

Returning to the client/server architecture mentioned above: it is the key architecture of the Web. According to Berners-Lee, the client/server model has two important parts, the server side and the client side. The web server provides service to the clients. The client requests a service; the HTTP software on the server responds to the client’s file request and sends the requested data packed as an HTML file. The graphics, sound, or text are then rendered and displayed as formatted content in the web window interface on the client side. In this process, the HTML file is the part that enables the assumed interaction. As Irvine explains, HTML is the core markup language of the Web, designed to describe hypertext and documents. The HTML file is the carrier of content such as images and text that will be displayed on webpages. This also reflects the hypermedia design thinking behind the Web.

Client/server architecture, Image 2

The Web Browser

You affect the world by what you browse. —— Berners-Lee

People all know the basic functions of web browsers, which allow us to search online, view websites, shop online, and watch videos. The browser has come a long way to reach what we see today. As introduced in A Brief History of Web Browsers and How They Work (McPeak, 2018), the brief history runs as follows:

The earliest web browser was WorldWideWeb, later renamed Nexus, designed by Berners-Lee in 1990 as a text-based application to display content and access the Web.

Then in 1993 came Mosaic, the first browser to enable the display of graphic content.

In 1995, Microsoft introduced its first web browser, Internet Explorer.

In 1996, Opera was first introduced to the public.

In 2003, Apple launched Safari, which has since dominated the iOS market.

More recently, in 2015, Microsoft Edge was introduced.

For us as users, what we can see of a browser is the user interface (UI), the surface of the web browser. Generally, web browsers all contain an address bar where users can enter a URL, a search bar, a home button, a refresh button, back and forward buttons, a tab management area, and bookmark options.

For users, all it takes is a couple of “clicks” to find what they want and enjoy the content in the browser window. But if we view this process from a technical point of view, it too involves multiple layers and modules. As introduced earlier with the client/server architecture, the web browser, as a carrier of web applications, follows the same process and structure. The basic process is that the browser acts as the client, retrieves the hypermedia content from the server side, and then displays it in the user interface.

Generally, web browsers contain seven components, listed from the top layer to the bottom: the user interface, the browser engine, the rendering engine, the networking component, the JavaScript engine, the UI backend, and the data storage.

Browser layers, Image 3

The UI (user interface) part, as introduced above, is what is visible to users when they open a browser. Let’s take a closer look at the process of opening and displaying a web page at the “code” level. As White explains, the process starts by directing the browser with a hyperlink or a typed URL. The browser sends the address to a network, or a cable-connected ISP (Internet Service Provider). The internet provider sends the address to the nearest DNS (Domain Name System) server to find the numeric IP (Internet Protocol) address behind the URL. With the IP address, the browser sends an HTTP request to the web server through a router. The web server reads the address and returns a signal acknowledging that it received the request and connected successfully. Then the displaying process starts. After the connection succeeds, the web server sends back files containing HTML or CSS to the browser, which in turn requests all the documents needed to compose the whole web page. As the files arrive, they are displayed on the screen, and they are also stored in a cache to save loading time when the same page is opened in the future (White, 2007). The JavaScript interpreter enables the display of JavaScript animations and interactive elements, while the browser engine works as the middleman between the user interface and the rendering engine. The rendering engine reads the HTML documents to build a Document Object Model tree and display the content (McPeak, 2018).
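The rendering engine’s first job, turning HTML into a Document Object Model tree, can be sketched with Python’s built-in HTML parser; the tiny page below is invented for illustration.

```python
# Sketch of the rendering engine's DOM-building step, using only the
# standard library's HTML parser.
from html.parser import HTMLParser

class DOMSketch(HTMLParser):
    """Records the nesting structure of tags as they open and close."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tree = []

    def handle_starttag(self, tag, attrs):
        self.tree.append("  " * self.depth + tag)  # indent by nesting depth
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

parser = DOMSketch()
parser.feed("<html><body><h1>Hello</h1><p>World</p></body></html>")
print("\n".join(parser.tree))
```

A real rendering engine goes much further (applying CSS, laying out boxes, painting pixels), but the tree of nested nodes is the starting point.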

The process of opening and displaying web pages demonstrates the layered design thinking behind web browsers, as well as the computational thinking that uses abstraction to show users a simple, easy output while hiding the heavy, complicated process behind it, for a better and easier consumer experience.

Cookies and Tracking issues

As mentioned earlier, the last layer in the operating process of web browsers is the data storage component. This leads to the contentious feature of the Web and the web browser – tracking – which raises privacy concerns among users. The tracking feature represents the socio-technical design thinking behind web browsers.

This “haunted-ad” situation has happened to most modern internet users: after searching for organic coconut water on Google, using Safari on a laptop, ads for that specific brand of coconut water appear in the sidebar when opening Facebook on the laptop, and even in Instagram sponsored stories on your cell phone. This is one manifestation – though not the only one – of browser tracking.

To figure out what browser tracking is and how it works, we should first look at cookies. The name “cookie,” introduced by Lou Montulli, derives from the term “magic cookie,” which describes a packet of data passed between programs (Stuart, 2002).

Cookies, or “cookie.txt,” as White introduces them, are small pieces of information stored on devices by the websites people visit. The cookies are sent back to the website each time the user visits, notifying it of previous activity, so that the web server can get an idea of user preferences – which page is visited most, what articles the user read on the last visit – in order to provide a better experience and meet the needs of users (“What are cookies,” 2018). For example, cookies are commonly used on online shopping websites. These cookies record the personal information the user voluntarily entered, as well as the items in the electronic cart, so that the next time the user opens the same shopping website, the items placed in the cart last time are still there, in case the user still wants to purchase them. This is another example of browser tracking using cookies.

How cookies work, Image 4
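The round trip described above can be sketched with Python’s standard-library cookie parser; the cookie name and value are invented for illustration.

```python
# Sketch of the cookie round trip: the server sets a cookie, the browser
# stores it and sends it back on the next visit.
from http.cookies import SimpleCookie

# 1) The server's response header sets a cookie on the first visit:
set_cookie_header = "cart_item=coconut_water; Path=/"
jar = SimpleCookie()
jar.load(set_cookie_header)           # the browser parses and stores it

# 2) On the next visit, the browser echoes the name/value pair back:
next_request_header = jar.output(attrs=[], header="Cookie:")
print(next_request_header)
```

Only the name/value pair travels back to the server; attributes like `Path` stay in the browser and control when the cookie is sent.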

Cookies and caches seem very similar since both store data, but the biggest difference is that they serve different purposes. A cache, as introduced earlier, stores web HTML documents and images as a way to reduce loading time, while cookies track user characteristics and preferences. Tracking user preferences and web activity can be seen as a way to serve users better. For example, streaming websites can make more accurate video suggestions, search engines can suggest more relevant searches, or a site can simply show the right language. This feature of the web browser embodies computational thinking: simplifying complicated features and presenting only simple, abstracted outputs to users.

But this convenient feature can also be used for advertising and commercial purposes. Even though each cookie from a different website carries only a small piece of information, together they can be aggregated to create a profile or a unique ID of the user. The “haunted-ad” situation is possible because of the unique ID provided by third-party cookies. Third-party cookies appear when websites feature content from external websites, such as sidebar or banner advertisements (“Internet Safety”). This indicates that the browser tracking feature also reflects the social design thinking of meeting social needs by simplifying the process of advertising.


The web browser we use today is no longer just a simple tool, a thing, but rather a digital mediation formed by multiple layers and modules and embedded with socio-technical design thinking. The development of the Web and web browsers has changed society and people’s behaviors. Going through the history and background of the Web and web browsers, we can clearly see that the architecture they follow reflects basic technological design thinking, and that their evolved features enabled them to meet social needs, as proof of the embedded social design thinking.

Works Cited


“Web Browser.” TechTerms. Accessed December 10, 2018.

“Browser, OS, Search Engine including Mobile Usage Share.” StatCounter Global Stats. Accessed December 11, 2018.

White, Ron. How the Internet Works. 9th ed. Que Publishing, 2007

Irvine, Martin.The World Wide Web: From Open Extensible Design to Fragmented “Appification”. November 2018

Berners-Lee, Tim. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web. New York, NY: Harper Business, 2000.

McPeak, Alex, and Thomas Volt. “A Brief History of Web Browsers and How They Work.” September 07, 2018. Accessed December 11, 2018.

Stuart, Andrew. “Where Cookie Comes from.” ZATZ, June 1, 2002.

“What Are Cookies?” Indiana University. January 8, 2018. Last Modified January 8, 2018.

“Internet Safety: Understanding Browser Tracking.” GCF Global. Accessed December 9, 2018.


Image 1,,2817,1815833,00.asp

Image 2,

Image 3,

Image 4,

Learning Management Systems – Outline


Topic Inspiration: E-Learning has become a central tool for life-long learning, workforce development, and as a replacement or supplement to traditional classroom academic learning.

Platforms to compare in the space:

  1. Duolingo – Freemium, mobile-only language learning
  2. Coursera – Full academic and certificate oriented education platform / online “school”
  3. Lessonly – Workforce training platform

Underlying design: Learning Management Systems

Key Components:

  1. Hierarchy file management system – affordances and constraints of complex file hierarchies
  2. Gamification
  3. Testing and repetition

Topics of content motivate the design:

  1. Workforce training
    1. Shorter
    2. Targeted
    3. Certificate or skill authentication vs. Coursera’s traditional orientation

Focus on the learning user interaction with core course content, leaving out the admin interaction, education management aspects.

The affordances of the underlying technologies are implemented in different cases for the topic design case

  1. Mobile language app vs a complex linear algebra course



  1. Class reading on interactive software design – always on and responsive
  2. Icon display
  3. Hierarchy affordances