
Design Thinking: Is Uber’s UX diminished by Pool?

By Dan Epelle Evelyn

CCTP 820: Leading by Design: Principles of Technical and Social Systems

Georgetown University, Fall 2019

Depending on what is important to you, Uber’s Pool ride-sharing option offers passengers affordability and flexibility, and even claims to be convenient for drivers. Its advantages reach further than cost-effectiveness for riders: pooling connects people in a shared social setting, where two or more riders share a short trip and cut their costs by doing so.

News stories[1] have even been published about relationships that grew out of connections made via Uber Pool. Think of the algorithm that executes the Uber Pool ride option: an intelligent system that lets passengers share a trip’s cost while heading in the same direction. We see here how such an algorithm is designed to intentionally put passengers heading in the same direction together. This calculated attempt is also embedded in Uber’s stated mission to make commuting affordable for riders and effortless for drivers[2]. Five years have now passed since the service was first launched, so measuring the effect of the Uber Pool ride-sharing service on Uber’s User Experience (UX) is feasible.
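To make the idea of "heading in the same direction" concrete, here is a minimal sketch in Python. It is not Uber’s algorithm, which is not public. It simply assumes that two trips can share a car when their compass bearings differ by less than a chosen angle. The function names, the 30-degree threshold and the sample coordinates are all illustrative assumptions.

```python
import math

def bearing_deg(origin, destination):
    """Initial compass bearing (0-360 degrees) from origin to destination.

    Both points are (latitude, longitude) pairs in decimal degrees.
    """
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360

def same_direction(trip_a, trip_b, max_angle_deg=30.0):
    """True if two (pickup, dropoff) trips point roughly the same way.

    max_angle_deg is a hypothetical tolerance, not a known Uber parameter.
    """
    diff = abs(bearing_deg(*trip_a) - bearing_deg(*trip_b)) % 360
    return min(diff, 360 - diff) <= max_angle_deg

# Hypothetical trips around Washington DC, as (pickup, dropoff) pairs.
northbound_1 = ((38.90, -77.04), (38.95, -77.04))
northbound_2 = ((38.91, -77.05), (38.96, -77.05))
southbound = ((38.95, -77.04), (38.90, -77.04))

print(same_direction(northbound_1, northbound_2))  # both head north
print(same_direction(northbound_1, southbound))    # opposite headings
```

A real matcher would also have to weigh detours, pickup times and seat capacity. The point here is only that "same direction" can be reduced to a measurable comparison.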

Getting picked up at your doorstep is the opportunity cost of choosing the Uber Pool option, which saves money but requires waiting and/or walking a distance to meet a driver. In this document, I reflect on the User Experience of Uber since the introduction of its carpooling function. I use design thinking to de-blackbox and analyze the socio-technical components behind the Uber experience. In doing so, I consider the technologies available to Uber that enabled the design of this innovation, and I identify ways in which the design might be recombined for an improved User Experience (UX). Drawing on both quantitative and qualitative evidence, I provide support for the hypothesis that Uber Pool diminishes Uber’s User Experience (UX) as a result of an algorithm that is weighted in favor of the driver’s convenience over the rider’s experience.


In collecting data for this paper, I have drawn on my knowledge of the Uber Pool experience as an observer and user of Uber as a service, combined with theories, design principles for building application software, and academic resources from the CCTP 820 Leading by Design course taught by the Communications, Culture, and Technology (CCT) program’s Founding Director, Prof. Martin Irvine, at Georgetown University. This includes ideas from class discussions, weekly blog reflections, and conversations debating the subject with peers and industry professionals, as well as, for depth, a case study directly connected to the design of Uber Pool by Uber Technologies.



Uber as an App is designed to be simple: push to start, one button. As an application of technology in transportation, Uber is also within reach of a great many of us: anyone who owns a mobile phone, has internet connectivity, and has downloaded the Uber App for any purpose, whether as a rider, driver, partner or consultant. This number is in the hundreds of millions. The Uber App is built to provide safe transport of persons across distances, and when those persons are grouped together, such a combinatory arrangement can plausibly also help reduce traffic congestion in urban areas, especially where road network problems are directly correlated with car traffic. Uber has delivered this service in over 600 cities globally since it was founded in 2009 by startup gurus Travis Kalanick and Garrett Camp.

Its socio-economic value has accelerated the company’s status from the pride of San Francisco[3] to one of the fastest-growing startups in the world[4]. All of these conveniences packaged by Uber, however, come with a need for consumers to compute the trade-offs when determining how much of Uber’s advertised convenience they can afford.

Some of these computations and deliberations around trade-offs became even more pronounced with the introduction of Uber Pool in 2014. Uber itself has had to make adjustments, which might suggest to an outsider that the company acknowledges the inconvenience brought about by Uber Pool since its release. Of course, adjusting rather than scrapping is what one does with a new feature that has otherwise become the most profitable for the company following its release. Some insights presented by research nonetheless favor the notion that Uber’s user experience (UX) is burdened by Uber Pool. What can we expect from anything that must yield profits as a business, gain socio-economic significance as a service, and provide convenience for millions of humans as a technology? If Uber were a person, it might be full of stress in this regard. That stress, according to real-world data, is associated with anxieties common to the ride-share experience: passengers aren’t told beforehand where their co-riders are sitting inside the vehicle, making it difficult for them to confidently reach for the door with the vacant seat[5]. This comes after perhaps walking a short distance to meet the driver, who by design isn’t obligated to come to your location, because you did not pay for it.

In all, the inconveniences posed by Uber Pool are felt mostly by riders, and this undercuts the ease of use the brand has promised since its inception. Although recent updates to the Pool service show that Uber is concerned about its UX design, few changes have been made specifically to reconcile the difference.

Fig 1.3 – Simulating constraints and setbacks with Uber Pool (Source: Medium)

If there is support for the claim that Uber prioritizes the experience of drivers over the convenience of riders, what could possibly be the design justification for Uber Pool? How does Uber justify a design that intentionally directs rider traffic to drivers, increasing demand and offering ‘reduced costs’ to passengers at the expense of their own comfort? A key component of Uber’s service is its promise of ease of use and increased convenience. In what ways does walking a distance to your driver, or wandering with strangers to unfamiliar drop-off locations, sound convenient?

In addition to these budding problem statements, Uber Pool is also reportedly growing less accurate in coordinating navigation data, which can sometimes leave a rider irate, anxious and confused throughout a ride. I have personally wondered on one such occasion if the predicament was worth the $0.87 saved for choosing to pool with others. It is essential to know that this choice exists because Uber presented an option which is designed to be perceived as convenient for riders.

“I have personally wondered on one such occasion if the predicament was worth the $0.87 saved for choosing to pool with others” – Uber Pool Reviews  


Universal Design Principles: Uber as an App

In today’s climate, humans are growing more and more dependent on technology. Apps that confidently and repeatedly promise convenience, while offering a user experience that demands minimal human effort, will most likely succeed. Uber has capitalized on this from the start, and its commitment to making commuting more convenient than the traditional cab is now globally acknowledged as successful. After success, however, comes failure, a given with nearly all technologies that have been designed and pushed on the market as super convenient and overly efficient. After taking hundreds of Uber rides in dozens of locations across Africa, Europe and the United States of America, I confidently report that user experience is not the same in any two cities, nor do I expect it to be, given the complexities involved in the cross-cultural translation of design for products and services in general. The main ideas that prompted this research into Uber’s diminishing User Experience (UX) with respect to Uber Pool, however, came from firsthand knowledge of the shortcomings of Uber Pool specifically in Washington DC, Los Angeles, and New York City.

Fig 1.3 – “Don’t book an Uber Pool if you’re in a rush” – Uber Pool Review via Twitter.

A transportation solution designed for a clustered and compact city will have considerable limitations if (or when) it is applied in a less dense, more widely spread population cluster. These limitations might take the form of a need to vary design combinations and employ additional technology that supports and satisfies a location-specific need. To re-imagine Uber as an App and its implementation of universal design principles, one can think about a few questions that support the design of applications built as technology for transportation.

  • In what ways has Uber combined and executed a special implementation of technology for transportation?
  • In the use of general technology for App development, which features make the Uber App unique?
  • To what extent is Uber customizable to fit the unique needs of people and cities?

Some obvious combinations are manifest in the User Interface (UI) layer of the Uber App. All transactional functions are implemented via a payment gateway in handshake with banks and other associated financial institutions. Uber has no special in-app bank and operates a cashless system (in some countries), but it combines technology for card transaction services by leveraging an Application Programming Interface (API) to simplify the implementation and maintenance of financial services. Maps that draw on the iOS and Android geolocation features are also implemented as a GPS approximation and navigation function in the design of the Uber App. On iOS, this technology is implemented with the MapKit and Core Location frameworks, which allow Uber to customize observable features like device tracking and routing, while a scheduler adds riders based on real-time simulations and approximations on a server. On an Android device, Google’s location APIs are the equivalent technology in Uber’s technology stack.

All in-app communications, including driver-rider communication, corporate communications, disputing cancellation fees or reporting a lost item after a ride, are handled with conventional mobile text messaging and telecommunications technology. Uber employs Twilio, Apple Push Notification service, and Google Cloud Messaging to make these features part of the Uber experience. Uber also dabbles in gamification when designing algorithms that execute business development strategies, like the sale of Uber Pass, the latest combination, which offers rides discounted by up to 10% each month for a flat monthly subscription fee of $14. Again, this is a designed effort to stimulate some sense of convenience, while guaranteeing steady monthly earnings for the company. Lastly, all of the clicks and swipes within the App are also implementations of the standard operating principle of a mobile phone.
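The Uber Pass figures quoted above invite a quick break-even check. The sketch below assumes, for simplicity, that the discount is exactly 10% on every ride (the source says "up to" 10%) and that the fee is the $14 quoted here. The function names are illustrative and are not part of any Uber API.

```python
def break_even_spend(monthly_fee, discount_rate):
    """Monthly ride spend at which the discount exactly repays the fee."""
    if not 0 < discount_rate <= 1:
        raise ValueError("discount_rate must be in (0, 1]")
    return monthly_fee / discount_rate

def net_saving(monthly_spend, monthly_fee=14.0, discount_rate=0.10):
    """Money saved (positive) or lost (negative) by subscribing for a month."""
    return monthly_spend * discount_rate - monthly_fee

# With a $14 fee and an assumed flat 10% discount, the pass pays for
# itself only once a rider spends about $140 a month on rides.
print(round(break_even_spend(14.0, 0.10), 2))
print(round(net_saving(80.0), 2))   # a light rider loses money
print(round(net_saving(200.0), 2))  # a heavy rider comes out ahead
```

Under these assumptions, a rider who spends less than roughly $140 a month pays more for the subscription than it returns, which fits the reading that the pass guarantees steady earnings for the company.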

These make up some of the most obvious technological features combined and designed as a socio-technical system for transportation. But what other invisible elements constitute the combinatory design principle that enables the Uber experience, the elements that we cannot merely see?

“Pattern recognition is an essential skill for creators. See the patterns in user behavior and how to change them. Understand the implicit patterns of use, layout, and function in your work. Then, make them explicit.” — 77 Things, Uber Technologies. 

Socio-Technical analysis and Uber’s BlackBox

Fig 1.5 – What other invisible elements constitute the combinatory design principle that enables the Uber experience? – Uber’s Blackbox

The boundary mapped by Uber in a socio-technical system suggests how modern and native to America the App really is, based on the technologies it adopts and the human actors at play, individually or as corporations, that directly service Uber. In orchestrating the Uber experience, we see how modular and customizable Uber is as an App. Considerable changes in Uber’s design are first tested in some regions before spreading to other locations within its global footprint. Uber showed this layer of controlled operations by making Uber Pool available in a few US cities first, as with any new function, before spreading it to other locations, provided it succeeded in the select locations.

Uber, like every other company, keeps trade secrets of its own which empower the company to succeed and keep its competitors on their toes. To point out one way in which I have determined that these hidden configurations exist, I draw insight from a notable competitor.

Lyft is taking steps to uniquely combine transport technology to gain a competitive advantage in the growing industry. Ridesharing on Lyft boasts greater convenience: its service posits that picking up a rider from their location should be prioritized, but at a small extra cost, which Uber waives when a rider walks in the case of Uber Pool. There are obvious differences, and this orchestrated complexity depends on how each company combines and coordinates its own computation for maximum profit in the ride-sharing business.


Below is an instance analyzing the rideshare option on Uber and Lyft. The locations are set identically, with rides requested at the same time of day, commuting to the same destination.

The table below compares all identifiable differences and similarities in the featured App data:

Fig 1.7 – App Data Analysis: Random sampling Lyft versus Uber ride-share options.

In the random sample analyzed above, we visualize some important decisions a rider might make each time ride-sharing becomes an available option. Depending on what is most important in the moment, a rider might opt for either of the service providers above, and the user’s experience with either service depends on the outcome of that decision.

“User Experience UX is an iterative process where you take an understanding of the users and their context as a starting point for all design and development” – 

Fig 1.6 – Uber at scale showing ventures and sub-ventures of the brand.



Uber is built as a mobile application and adopts universal design principles to enable basic user interaction, such as the swipe, click and touch features of a mobile device, for ease of use. Uber has also scaled into a venture with several sub-ventures offering services that meet the logistics needs of billions of people around the world.

At this time, there isn’t room for riders to customize and coordinate the Uber App, so the constraints faced by riders who opt for Uber Pool will persist until Uber addresses them directly to improve its dwindling user experience. In the meantime, however, we can challenge the complexity of the Uber App design and ask why functions like Uber Pool are designed the way they are and not some other way. Uber already attempts to give answers by providing a ton of options within its App, enabling users to take part in decision making and giving them a sense of inclusion in the design process.

  • How much do you value your time at the moment?
  • How much are you willing to spend?
  • Are you open to the concept of social networking, chatting with strangers or meeting your next best friend?
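The first two questions above can be folded into a single number that transport economists often call a generalized cost: the fare plus the rider’s time priced at a personal hourly value. The sketch below is an illustration under stated assumptions, not Uber’s or Lyft’s pricing logic. The fares and durations are hypothetical and chosen only so that the pooled option is $0.87 cheaper, echoing the saving mentioned earlier.

```python
from dataclasses import dataclass

@dataclass
class RideOption:
    name: str
    fare: float      # dollars
    wait_min: float  # minutes spent waiting for the car
    walk_min: float  # minutes spent walking to the pickup point
    ride_min: float  # minutes spent in the vehicle

def generalized_cost(option, value_of_time_per_hour):
    """Fare plus the rider's total trip time priced at their hourly value."""
    total_min = option.wait_min + option.walk_min + option.ride_min
    return option.fare + value_of_time_per_hour * total_min / 60

def cheapest(options, value_of_time_per_hour):
    """Option with the lowest generalized cost for this rider."""
    return min(options, key=lambda o: generalized_cost(o, value_of_time_per_hour))

# Hypothetical numbers: pooling saves $0.87 but costs 10 extra minutes.
solo = RideOption("solo", fare=10.87, wait_min=3, walk_min=0, ride_min=17)
pool = RideOption("pool", fare=10.00, wait_min=6, walk_min=4, ride_min=20)

print(cheapest([solo, pool], value_of_time_per_hour=3).name)   # time is cheap
print(cheapest([solo, pool], value_of_time_per_hour=12).name)  # time is dear
```

With these numbers, the pooled ride only wins for a rider who values their time at less than about $5.22 an hour (the $0.87 saving divided by the ten extra minutes). The third question, openness to meeting strangers, is left out of the sketch because it has no obvious price.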

“It’s like playing with Lego: the basic brick doesn’t change, but the builder uses it to create, unleashing its potential. Our components are basic at their core, but also highly customizable through style overrides and can be configured in many ways” – Uber Technologies.

The invisible parts that make up what we see and know remain the most powerful in the design of Uber as a product, technology, App or service. These parts are responsible for creating the experience we now perceive as a burden in terms of user experience. In the creation of a product or service, User Experience (UX) design takes into consideration all of the end user’s needs in formulating the values, meaning, and relevance embedded in the overall experience of using the product or service. It is the process design teams use to create products that provide meaningful and relevant experiences to users. This involves the design of the entire process of acquiring and integrating the product, including aspects of branding, design, usability, and function.

Key findings from this paper are summarized thus:

  • The affordances or constraints of native mobile device features, and most of the hidden layers behind the Uber app, have nothing to do with Uber.
  • Modularity allows Uber to manage a larger and more complex whole structure by dividing up its functions into separate, interconnected components, layers, and subprocesses.
  • Data network design is organized as a stack of layers for the abstraction of functions into separately managed modules that pass information “up” the stack for the functions of the whole network.

“No product is an island. A product is more than the product. It is a cohesive, integrated set of experiences. Think through all of the stages of a product or service – from initial intentions through final reflections, from first usage to help, service, and maintenance. Make them all work together seamlessly.” — Don Norman, inventor of the term “User Experience”






Augmented Reality – Interactive and Immersive Design


We have probably all witnessed the applications and possibilities that Augmented Reality (AR) has brought about in many aspects of daily life, such as entertainment and gaming. But Augmented Reality is more than a BeautyCam filter or cutting fruit in Fruit Ninja: its applicability extends to practical fields including medicine, the military and education. Although we might have unintentionally encountered some common applications of AR already, we are not necessarily aware that they are AR-based, because the terminology seems elusive and abstract. For example, what exactly is being augmented? What are the ways of augmenting? What is the ultimate purpose of the augmentation?

The definitions of AR vary, but in essence they all point to one characteristic: Augmented Reality can be perceived as a medium where digital information overlaps with the physical environment (Craig, 2013). In Understanding Augmented Reality: Concepts and Applications, Craig proposed that “the ultimate goal of augmented reality is to provide the user with a view of the surroundings enriched by virtual objects”. Indeed, humans have been modifying the conditions of their surroundings to make living easier since day one. However, it was not until the emergence of the Information Age that the majority of these alterations shifted from securing survival to gaining as much information as possible. Today, digital computers allow enormous amounts of information to be retrieved, saved and made available for manipulation at speed. One can easily find traces of this in AR applications. To take the simplest example, digital maps allow us to gain information about a place where we are not physically present. While using the application, we comprehend the place faster than we would by actually travelling there to gather the same information. In his 1962 work Augmenting Human Intellect: A Conceptual Framework, Engelbart defined “augmenting human intellect” as increasing a person’s capability to approach a complex problem, to gain comprehension suited to particular needs, and eventually to resolve the previously complex problem. Based on this connection, the ultimate goal of AR is to challenge and redefine the existing reality and to derive corresponding solutions to emerging problems. Through this ongoing process, not only is the amount of information augmented, but human intellect is as well.

In this paper, I will discuss two of the essential design principles developers adopt to improve the usability of Augmented Reality applications. Whether sorted by hardware device, software base, field of application or otherwise, the number of applications is practically innumerable. Applications and settings can be infinite, depending on human initiative and the technological bedrock. For this reason, this paper will focus only on the mobile augmented reality (MAR) experience.

1.1 Interaction design

In any interactive design, it takes computer intellect to form a platform and human intellect to comprehend it. Perhaps we can compare this experience to watching a movie: while the lighting, angles and tones of the set can be as compelling as possible, it is the viewers’ interpretation of how what they observe constructs meaning in a real-world situation that helps them understand the story it intends to convey.

1.2 Elements to interact

Although AR is designed to be interactive, this process is not always visible. It is hard to be fully aware of the interactions going on in space and time. For instance, it remains ambiguous to most people what “reality” is being augmented and what the virtues of the augmentation are. To better understand AR and take an active role in the interactive process, one must determine what there is to interact with and which underlying design techniques enable it.

The definition of Interaction Design (IxD) is abstract, yet the name is largely self-explanatory. To interact successfully, both product and user need to contribute their share of effort. Gillian Crampton Smith proposed four dimensions of Interaction Design, 1) words, 2) visual representations, 3) physical objects/space and 4) time, to which Kevin Silver later added a fifth, 5) behavior. The first four dimensions encompass what products and services (digital or non-digital) have to offer, while the fifth dimension (behavior) stresses the importance of the user interface: users are encouraged to realize their goals and objectives as fully as possible by using the products.

1.1.1 Words

History and culture endow characters and letters with specific meanings. In IxD, words serve as one of the essential elements for improving usability. Like any other application, a successful AR application should have enough words to explain the instructions and clarify the usage, allowing users to form an understanding of what the next step is and what goals can be achieved using the application. The wording should be concise enough to make the objectives clear without overwhelming users with information.

1.1.2 Visual Representations

“Humans are visual animals”: this statement holds true in the context of using applications. In line with the first element, visual representations adorn applications with cognitive symbolism. For instance, we have long known that hands can be used to grab and drop objects, so when the cursor turns into a hand shape, we know it means the targeted files can be moved to almost any other spot on the screen. In short, an affordance that suits its intended usage is appreciated in an application. Most of the time, instead of giving wordy instructions on how to proceed, a designer can simply present a button-like representation, and usability is enhanced by the suitable affordance. (See Fig. 1)

(Figure. 1. In Apple’s Measure app, words serve as descriptions and cinemagraphs indicate the possible movements the application can recognize. Visual representations such as images and videos deliver instant instructions to users.)

1.1.3 Physical Objects/Space

The third dimension takes context into account: it is the physical environment within which users interact with the products. Since this paper mainly discusses design principles of Augmented Reality applications on mobile devices, the object (the device as a virtual window through which users experience the products) is a mobile device such as a laptop or smartphone, and the range of space (the physical environment where users use the products) can be as broad as anywhere the desired enhancement of the environment can be achieved. (See Fig. 2)

(Figure. 2. Real-time maps allow users to gather information about anywhere on the map. This can be done on almost any digital device and from anywhere, as it is a personal context.)

1.1.4 Time

This concept can be interpreted as the time that users spend interacting with the application. Users get feedback (audible or visual) from the application and, over time, participate in a complete interactive process with it. Reasonably timed feedback from products is crucial to this dimension, as users gain further instruction and information from feedback on their actions, and so the steps of interaction unfold. The amount of time depends on the capability of the specific application and the depth of purpose that users intend to achieve. (See Fig. 3)

(Figure. 3. Amazon’s Augmented Reality function allows users to view products on the intended surface before purchase. The products move around the room with the motion of the user’s fingers and, when placed, the device vibrates to indicate that the product has “dropped” onto the surface. When the device does not detect a surface, words and visual representations on the screen explain the error that occurred.)

1.1.5 Behavior

 In relation to the user interface, behaviors are considered as a range of actions conducted by users to interact with the product, including operation, presentation and reaction (Kevin Silver, 2007). In IxD, the first four dimensions integrate at this step, shaping users’ behaviors (i.e., the predefined possibilities or constraints of command) and encouraging users to create a personalized experience. 

1.1 Summary

Computers are a participatory medium (Murray, 2012). As AR is a computational system, interactivity is an innate feature of the augmented reality experience. The level and quality of interaction depend on both the computer and the human interface, that is, on how effectively an application presents its ideas and provides clues to users. The first three dimensions (words, visual representations, physical objects/space) can be directly improved in the design process, while the last two dimensions (time, behavior) are bound up with the user interface, so they are influenced but not straightforwardly altered by technical modifications. However, they can be steered toward positive and compelling interactions with the application if the elemental designs are successful (e.g., timely feedback).

2.1 Immersion technology

Most forms of media make use of particular human senses. We can read a book or listen to the radio. Within the framework of Augmented Reality (AR), however, using only eyes, ears and hands would not give users an optimal experience. That is what differentiates AR from other forms of media: an immersive user experience (Craig, 2013).

As far as the current development of AR applications on mobile devices is concerned, to assert that AR provides complete immersion would be somewhat unrealistic. Unlike Virtual Reality (and even VR has its limitations, e.g., locational mobility), Augmented Reality leaves users connected to the physical world, meaning the environment it shapes has sensory boundaries. Applications and settings of AR can be infinite, depending on human initiative and the technological bedrock. However, full immersion has yet to be achieved.

Notwithstanding the efforts in progress, total immersion in most AR applications has not been successfully achieved in the physical world, due to both limitations in design techniques and human factors (Aukstakalnis, 2016). So far, there are few academic works and practical studies on the theme of total immersion in AR. Whether total immersion in MAR is a realistic goal remains in question. However, the existing literature discusses feasible means of enhancing immersion in AR applications by improving the organizational/modular system (hardware and software components). Given that this paper only discusses MAR, the ensuing discussion will center on the software layer.

2.1.1 Sensory design

Visual design covers a range of standards in software components such as image processing and recognition. Studies have indicated that humans gather most of their information (80-85%) through the visual system, far more than through the other senses (Politzer, 2015). Human eyes receive light reflected by objects, which then stimulates the cognitive system in the brain to process and recognize those objects. For this reason, visual design is a pivotal factor in AR immersion, because it decides whether users are able to believe in the digitized environment with little maladjustment. (See Fig. 4 and 5)

As Fig. 4 and 5 show, the digitized information (the animated character and the spider) is a virtual layer that overlaps the actual environment captured by the camera. Because the figures do not blend seamlessly into the real-world environment, the degree of immersion (the level of belief in the virtual environment) is low compared to applications that take in-depth simulation (e.g., brightness and contrast similar to those of the actual environment) into consideration. (See Fig. 6)

(Figure. 6)

In contrast, Civilisations AR, an application launched by the BBC, packs more pixels into a given screen area, improving the quality and authenticity of the digitized information and thus telling a more compelling story.

Other improvements to the software layer can also be made to generate a more immersive experience for users, such as feedback timing, which affects the latency in human-computer interaction (HCI).

2.2 Limitations

To achieve immersion, developers must consider improvements in both hardware (e.g., head-up displays, two-handed panels, etc.) and software components. Because of the limited scope of this paper, and because few instances and cases exist in current business and academic fields, the last part could not be developed comprehensively. For lack of time, other feasible software improvements (e.g., audio effects, object recognition, etc.) were also not presented. Moreover, since the users’ interaction plays a vital role in creating experiences, human factors also need to be taken into account: AR is essentially a hybrid image of the digital and the physical, so users might choose to process some of the information but not all of it (Bolter et al., 2013).

3.1 Discussion

As discussed in the previous paragraphs, augmented reality is both an interactive and a partially immersive experience. Although the question of whether AR can achieve total immersion remains unresolved, the subject itself is designed to be heuristic, which means that in the AR design process, the main concern is not user experience but rather the challenges AR can pose to reality and the solutions derived for the resulting problems. With further development, AR has the potential to become ubiquitous. Advanced immersion design (images, audio, feedback) builds on interaction design and provides users with a better AR experience. Conversely, interaction design helps users develop a more refined immersive experience.



Engelbart, D. C., and Friedewald, Michael. Augmenting Human Intellect: A Conceptual Framework. Fremont, CA: Bootstrap Alliance, 1997. Print.

Interaction Design Foundation, The Encyclopedia of Human-Computer Interaction, 2nd. Ed.

Sziebig, Gabor. (2009). Achieving Total Immersion: Technology Trends behind Augmented Reality- A Survey.

Fischer, Jan & Bartz, D. & Strasser, W. (2005). Stylized augmented reality for improved immersion. IEEE Proceedings. VR 2005. Virtual Reality, 2005. 195-325. 10.1109/VR.2005.71.

Jacobs, Marco, Livingston, Mark, and State, Andrei. “Managing Latency in Complex Augmented Reality Systems.” Proceedings of the 1997 Symposium on Interactive 3d Graphics. ACM, 1997. 49–ff. Web.

Aukstakalnis, Steve. Practical Augmented Reality: A Guide to the Technologies, Applications, and Human Factors for AR and VR. Old Tappan, NJ: Addison-Wesley Professional, 2016.

Dunleavy, Matt. “Design Principles for Augmented Reality Learning.” TechTrends: Linking Research and Practice to Improve Learning 58.1 (2014): 28–34. Web.

Murray, Janet H. Inventing the Medium: Principles of Interaction Design as a Cultural Practice. Cambridge, Mass: MIT Press, 2012. Print.

Craig, Alan B. Understanding Augmented Reality: Concepts and Applications. Waltham, MA: Morgan Kaufmann / Elsevier, 2013.

Bolter, Jay David, Maria Engberg, and Blair MacIntyre. “Media Studies, Mobile Augmented Reality, and Interaction Design.” Interactions 20, no. 1 (January 2013): 36–45.

Choudary, Omar et al. “MARCH: Mobile Augmented Reality for Cultural Heritage.” Proceedings of the 17th ACM International Conference on Multimedia. ACM, 2009. 1023–1024. Web.

Unlocking Facial Recognition – The design and implications of Facial Recognition and how privacy may or may not be compromised by using it. 

Eish Sumra, Leading By Design Final Project, December 2019


Facial recognition is both an innovative and intimidating feature growing in prevalence on smartphones. Most newly released phone models include some form of facial recognition or fingerprint sensor to unlock the device. This raises many questions about how these features are designed, both as an interactive element of hardware and software and in how they deal with the data they need to function correctly. To establish these processes we must de-black box the product by looking specifically at Apple’s popular Face ID: how it works, what ecosystems of data it collects and creates, and what implications there are for privacy rights. Then, by including debate from commentators and rhetoric from a prominent court case, a clear narrative on the burden of protection can be built to discover which party (the user, the company, or the government) is responsible for the security of a person’s information.


In the past decade alone, how we unlock our smartphones has changed as much as the hardware itself. From simple two-button unlocking systems to passcodes, to ‘connect the dots’, to fingerprint sensors, to now Apple’s prominent ‘Face ID’ – we have seen the idea of phone privacy redefine how we interact with our devices. Originally, pushing buttons was our only way of keeping a phone locked when offline; now, with elaborate hardware/software interaction, companies have created interactive ways to open your phone and supposedly protect it from others.

However, impressive and future-forward as facial recognition and fingerprint sensors may seem, the information needed to make them work is quite personal and, if stolen or shared inappropriately, could end up in the hands of external parties who can use it for other means. While there are few mainstream stories of anyone exploiting these processes, that doesn’t necessarily mean we are all safe from cyber misuse. Additionally, these seemingly exciting features lead to our extremely personal and individual information (fingerprints, facial identification) being shared with a company or a product. This in itself is a potentially dangerous precedent for tech companies to set: encouraging the sharing of deeply personal information in order to use a basic and necessary function on our phones. These processes are optional, and one may disable facial recognition or fingerprint sensing if one doesn’t want it. However, the vast population of smartphone users still uses one or both functions, perhaps without truly understanding what information is being shared and how their personal data is being used, not to mention the implications this has for their own levels of privacy.

This paper investigates both Face ID and Touch ID, how they work and the implications of the mechanisms at play. Little is understood by consumers about what is happening when one opens their phones, yet the information being used is important and specific to each person. By reaching into the black box of phone security and unpacking the various levels of design and technical systems, we are able to compare the safety of both functions and discover how using them affects the users. By using information shared by the companies of Google and Apple, we can establish the journey of our personal data and compare the accessibility of such data. 

Once the mechanisms are understood we can break them down and look at design flaws and security implications of both. Then we can ask the question: “Who is responsible when it comes to our privacy on smartphones?” Do governments and policymakers have to ensure boundaries are put in place to protect consumers? Is it the duty of phone manufacturers to stick to their promises of secure usage or is the burden on us, the general public, to consistently scrutinize these functions and the companies at play, and avoid/use the devices depending on our own moral position on privacy? With three key players in this field, sometimes it can be unclear who is at fault if anyone is at fault when private information is shared. However, the lines can be seen clearly once the mechanisms of these functions are truly understood and the commitments of tech companies evaluated accordingly. In my research, I hope to lay out the narrative as it pertains to the simple act of unlocking a phone, however, this debate is wide-ranging and includes other applications and other smart devices such as the Amazon Echo or Google Home. Privacy is perhaps one of the biggest issues facing our time, as the more we are connected with one another, the more exposed we are to threats, hacking and exploitation. Our human interactions are becoming ever more interwoven with technology, from our speaking patterns and conversations being heard by smart speakers, to films of private property being seen by smart security cameras. I hope to lay out what our privacy expectations should be and how we can ensure the safety of billions of people online. 

How does Face ID Work? 

Face ID is set up using the front camera, which projects over 30,000 infrared dots onto your face, tracking the undulations and specificities of your profile. You have to move your head in a circular motion, up and down, so the dots can cover as much surface area as possible, building a digital picture of your individual face. The infrared map of your face is translated into a mathematical representation for your device. What makes Face ID particularly interesting is its ability to detect your face even when you are wearing a hat, scarf, or sunglasses, or have grown a beard.

Face ID is just one of many facial recognition systems currently in use by phone manufacturers. Each one uses different technology, but all share the same goal: to create a system for translating the contours of a head into a mathematical code that can be understood by an operating system. As a phone function, it is impressive both in its specificity in identifying different faces and in its ability to continually update and modify the mathematical mapping, allowing natural or frivolous changes to one’s appearance to happen without affecting the phone’s ability to recognize identity.
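The idea of a "mathematical code" that keeps adapting to a changing face can be made concrete with a toy sketch. The Python below is not Apple's algorithm, whose details are not described in this paper. It assumes, purely for illustration, that a face scan has already been reduced to a short list of numbers; the function names, the threshold, the update rate and the sample vectors are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(template, scan, threshold=0.5):
    """Accept the scan if it lies close enough to the enrolled template.

    The threshold is an illustrative value, not a real product setting.
    """
    return distance(template, scan) <= threshold


def adapt(template, scan, rate=0.1):
    """Nudge the stored template a small step toward a scan that matched.

    This mimics, in the simplest possible way, a template that follows
    gradual changes in appearance.
    """
    return [t + rate * (s - t) for t, s in zip(template, scan)]


# Hypothetical feature vectors standing in for processed depth maps.
enrolled = [0.20, 0.80, 0.50, 0.10]
same_person_new_beard = [0.25, 0.75, 0.55, 0.15]
different_person = [0.90, 0.10, 0.20, 0.70]

if matches(enrolled, same_person_new_beard):
    # Only scans that already matched are allowed to update the template.
    enrolled_updated = adapt(enrolled, same_person_new_beard)
```

In this sketch a small change in appearance stays within the threshold and pulls the template slightly toward the new look, while a very different face is rejected and leaves the template untouched.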

As a design concept, I have been drawn to facial recognition because it has become standard across many disciplines, not just personal technology. In airports, it is used in immigration lines; in some countries it is used through closed-circuit cameras on streets; and it is used by everyday apps such as the Apple Photos app, which identifies faces and collects pictures into albums relating to specific people who crop up multiple times in one’s camera roll. It is also an intensely personal entity to involve in technological processes, one which, depending on your views, can be a great way to ensure privacy for your device or can be vulnerable to exploitation. Using our class materials, we can determine facial recognition to be a great piece of interactive design. Measured against Ben Shneiderman’s ‘Eight Golden Rules of Interface Design’, the Face ID example of facial recognition stands up to many of the tests.

The first rule is “Strive for consistency,” which states: “consistent sequences of actions should be required in similar situations.” Naturally, Apple has integrated Face ID into multiple uses, such as downloading an app, opening an app or paying for items using Apple Pay. The interface is the same each time, and the function works with similar speed and exacting results in each form of use. The second rule is “Seek universal usability,” which Shneiderman explains as follows: “Recognize the needs of diverse users and design for plasticity, facilitating the transformation of content. Novice to expert differences, age ranges, disabilities, international variations, and technological diversity each enrich the spectrum of requirements that guides design.” Once again, Face ID is designed to let all users access their phones using the software. This is because mapping infrared dots on an object doesn’t discriminate; it is literally a map of lines, shapes, and depth, which means that regardless of aging skin, the color of your skin, or the uniqueness of your appearance, the sensors should be able to map and therefore respond to whatever is put in front of them. The information released about Face ID also states that: “Accessibility is an integral part of Apple products. Users with physical limitations can select “Accessibility Options” during enrollment. This setting doesn’t require the full range of head motion to capture different angles and is still secure to use but requires more consistency in how you look at your iPhone or iPad Pro.

Face ID also has an accessibility feature to support individuals who are blind or have low vision. If you don’t want Face ID to require that you look at your device with your eyes open, you can open Settings > General > Accessibility, and disable Require Attention for Face ID. This is automatically disabled if you enable VoiceOver during the initial set up.” This inclusion of ‘accessibility’ into design thinking further enhances the strength of its interactiveness. 

Rules three and four are “Offer informative feedback” and “Design dialogs to yield closure”. With Face ID, the system tells you to move closer or hold up your phone at a different angle through direct verbal communication or through simple signs such as the ‘shaking’ of the padlock icon, showing the phone isn’t receiving the information it needs to open. The ‘dialog’ of the function is clear. The padlock icon is shown on the lock screen when the screen is woken by a physical movement or touch; the padlock then opens when the user’s face is identified by the sensor; finally, the screen tells you to ‘swipe up’ because your phone is unlocked and you are able to access your apps. It is a simple set of processes, but one consistent with good design.

Number five is an obvious rule: “Prevent errors”, which is something Apple in particular has worked hard to do, and many other manufacturers are attempting the same. The rule states: “As much as possible, design the interface so that users cannot make serious errors; for example, gray out menu items that are not appropriate and do not allow alphabetic characters in numeric entry fields.” Facial recognition services should not respond to non-human entities, nor should they respond in any way to an entity that is human but not the user themself. Face ID prevents this by ensuring that, in order to open, an eye or iris is detected in the mapping of a face. The final three rules are “permit easy reversal of actions,” “keep users in control” and “reduce short-term memory load.” All three are met easily. You can lock your phone using a button, only the user can (or should) open the phone using their face, and no knowledge of the function is needed apart from the general awareness that the phone can be opened using one’s face. This design feature clearly presents as a well-designed, simple-to-navigate function and so is a strong addition to any smartphone from a pure usability standpoint.

Is Facial Recognition Safe? 

The burning question around facial recognition is not whether it is an interesting feature, or whether it works correctly. Facial recognition can be argued to be a much-needed form of protection for one’s device, much more than a passcode that can be viewed by others or figured out through other methods. The phone is in essence protected by your face being on your body only. However, the clear question is what happens to the data your phone collects? How vulnerable is the system of facial replication? Most importantly, can anyone access the digital map created of a user’s face?  

Companies have thought long and hard about how to design facial recognition on personal devices and how to ensure the information gathered doesn’t make a user susceptible to malicious intent. Apple has released a comprehensive guide to how Face ID specifically protects your data. One key passage which sticks out is the following: 

“Face ID data doesn’t leave your device and is never backed up to iCloud or anywhere else. Only in the case that you wish to provide Face ID diagnostic data to AppleCare for support will this information be transferred from your device. And even in this case, data isn’t automatically sent to Apple; you can first review and approve the diagnostic data before it’s sent.”

What is key here is the idea of consent. The user can review and approve any data sent to Apple; apart from this, the data does not leave the device, nor is it uploaded to any cloud system. This is a surprising and ultimately impressive design decision by Apple. I went into my investigation thinking that Apple surely collected data externally in order to help its software run better, but it does not. Instead, the software ensures that Face ID’s ability to learn from data and improve its efficacy is contained within the confines of the device, with no interaction with any other system within the family of Apple products or mechanisms. The company ensures that a separation between operations remains, instead of actively using the data we give our devices in a centralized control system (such as the cloud).

The same rules apply for non-Apple applications, with the privacy report stating that: “Within supported apps, you can enable Face ID for authentication. Apps are only notified as to whether the authentication is successful. Apps can’t access Face ID data associated with the enrolled face.”
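The boundary described in that passage, where an app learns only whether authentication succeeded, can be sketched as a simple interface. This is a minimal Python illustration and not Apple's API: the class `OnDeviceFaceStore`, the function `app_unlock_vault`, the threshold and the numeric templates are hypothetical, and a Python attribute with a leading underscore is only a convention, not real hardware isolation.

```python
class OnDeviceFaceStore:
    """Hypothetical stand-in for the protected on-device component.

    It keeps the enrolled template to itself and exposes a single
    yes/no answer to callers.
    """

    def __init__(self, enrolled_template, threshold=0.5):
        self._template = list(enrolled_template)  # never returned to callers
        self._threshold = threshold               # illustrative value only

    def authenticate(self, scan) -> bool:
        """Return True or False; no template data crosses this boundary."""
        dist = sum((t - s) ** 2 for t, s in zip(self._template, scan)) ** 0.5
        return dist <= self._threshold


def app_unlock_vault(store, scan):
    """A third-party app: it sees only the boolean result."""
    return "vault opened" if store.authenticate(scan) else "access denied"


store = OnDeviceFaceStore([0.20, 0.80, 0.50, 0.10])
owner_result = app_unlock_vault(store, [0.25, 0.75, 0.55, 0.15])
stranger_result = app_unlock_vault(store, [0.90, 0.10, 0.20, 0.70])
```

The design point is that the app's code path contains nothing but the success or failure signal, so a compromised or over-curious app has no face data to leak.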

Some writers and technology experts have questioned how far Apple can go to protect its users, with author Jake Laperruque writing an opinion piece for Wired magazine about how Face ID could be a weapon for mass surveillance. Laperruque says that: “Apple doesn’t currently have access to the faceprint data that it stores on iPhones. But if the government attempted to force Apple to change its operating system at the government’s behest – a tactic the FBI tried once already in the case of the locked phone of San Bernardino killer Syed Rizwan Farook – it could gain that access. And that could theoretically make Apple an irresistible target for a new type of mass surveillance order.” The author goes on to say that: “To many these mass scans are unconstitutional and unlawful, but that has not stopped the government from pursuing them. Nor have those concerns prevented the secretive FISA Court from approving the government’s requests, all too often with the public totally unaware that mass scans continue to sift through millions of Americans’ private communications.”

Despite his concerns, Laperruque says that Apple has been a fierce protector of privacy rights and that the problems or concerns could arise from governments using the data or having access to such data through coercion. He points out that this should be the focus of any hesitation from the public to use facial recognition: “The public should demand that Congress rein in the government’s ever-growing affinity for mass scan surveillance. Limiting or outlawing the controversial Upstream program when the authority it’s based on expires this December would be an excellent start, but facial recognition scans may soon be as big a component of mass surveillance, and the public need to be ready.”

So, in order to be trusted, companies such as Apple have made sure that they promote their belief in user security and continue to fight for the interests of consumers rather than those who wish to exploit the data. Google is another company that has openly committed to advancing its technology so that its recognition software cannot be exploited. With its most recent smartphone, the Pixel 4, the firm prioritized speed when it decided to let the software respond to faces with closed eyes. This could, in theory, mean that a third party can gain access to a phone by holding it up to the user while they are sleeping. However, the backlash from customers was swift, and Google worked on adapting the software to protect against such instances. The Guardian newspaper covered this shift and reported that: “Google has announced an update that will offer a more secure option. ‘We’ve been working on an option for users to require their eyes to be open to unlock the phone, which will be delivered in a software update in the coming months,’ it told technology website The Verge. ‘In the meantime, if any Pixel 4 users are concerned that someone may take their phone and try to unlock it while their eyes are closed, they can activate a security feature that requires a pin, pattern or password for the next unlock.’ Google’s initial decision was based on a tradeoff between speed and security, with the company focusing more on speed than Apple had when it launched its competing system in 2017 alongside the iPhone X.”

This in itself is an encouraging turn of events, with Google admitting that privacy is more important than operational speed or comparative advantage against other smartphones. 


Facial recognition can be an intimidating function for anyone to use. We have placed the most identifiable part of ourselves in the hands of a company, without much say in how they use that data, only whether they use it. While multiple experiments show there are chinks in the armor of the software, this is in no way enough to diminish the extraordinarily specific readings it can provide the phone with. Certainly, it is easy to see why using your face is a better way to ensure your phone works only in your own hands, compared to numerical entry or pattern recognition. Our face is not something that can be easily replicated, nor is it something which one can ‘figure out’ like a date of birth or a preferred set of digits. As a design concept it is well thought out and well executed, while being a very natural way to open a phone, as you are usually facing your phone when you are trying to open it. Apple, Google and other phone providers have acted well to make sure that the data isn’t shared or stored in the cloud; it exists only on the physical device, and any updates or modifications to the data are kept in the same ecosystem. Only information about how well the devices are responding to faces is shared with companies so they can evaluate the efficacy of the feature. Privacy is maintained because the feature is designed to exist in the singular module of a phone instead of the vast online systems which connect our phones to each other and to the digital world. It therefore works in tandem with the phone’s operations but does not depend on any operations which take place outside the hardware.
It is natural to fear the loss of such information. However, companies have taken it upon themselves to write clear rubrics to ensure that users feel safe, in contrast to social media companies who regularly hide information regarding security from the public in order to freely collect masses of data which they can use to further exploit users’ behavior and online lives.

Additionally, another win for individual privacy came earlier this year. While on a federal/international level there are not mandated laws protecting civilians from having governmental/public institutions insisting on using one’s information to open a phone, in California, there was a ruling to the contrary. Forbes reported that: “A California judge has ruled that American cops can’t force people to unlock a mobile phone with their face or finger. The ruling goes further to protect people’s private lives from government searches than any before and is being hailed as a potential landmark decision.”

The article goes on to state that: “But in a more significant part of the ruling, Judge Westmore declared that the government did not have the right, even with a warrant, to force suspects to incriminate themselves by unlocking their devices with their biological features. Previously, courts had decided biometric features, unlike passcodes, were not “testimonial.” That was because a suspect would have to willingly and verbally give up a passcode, which is not the case with biometrics. A password was therefore deemed testimony, but body parts were not, and so not granted Fifth Amendment protections against self-incrimination.” This is a powerful step forward for individual rights of usage. 

What we can gather from the world slowly adapting to these new phone features is that the rights of users are of paramount importance to tech companies, as without the trust of consumers their business plans would be defunct and their products widely susceptible to skepticism and criticism, two things which are huge barriers to sales. It is therefore in the interest of these firms to ensure privacy and security when handling data and to ensure the general public is aware of its rights. Furthermore, legal institutions have begun to link these features to the right to privacy and constitutional law, evolving the influence of technology on national legal systems.

The burden is and should always be placed on the manufacturers, whose influence on the world seems only to be getting more and more powerful. Their products have become so integral to modern life that many can and do forget the implications some functions have for their individual rights and their right to privacy. We must ensure that people are educated correctly about using such devices and the mechanisms at play; however, the responsibility to protect users lies with the tech companies and those who utilize facial recognition in these products. Our legal system must hold them to account while also ensuring that no governmental or corporate actors can gain access to our data through coercion or theft, which provides consumers with an extra level of protection (at least in liberal democracies such as the U.S.). Yet, as shown by the California court case, the company in question must fight for the rights of its users and ensure that its devices are not leaving anyone exposed to negative forces. Facial recognition in the form of unlocking a phone seems like an innocent feature, and an impressive one as well. However, in the wrong hands, it could be a dangerous weapon. All liberal actors must work together to build a framework of information and protection that allows this feature to continue being used while putting the needs and rights of the user first.


Brewster, Thomas. “Feds Can’t Force You To Unlock Your IPhone With Finger Or Face, Judge Rules.” Forbes, Forbes Magazine, 14 Jan. 2019,

Laperruque, Jake. “Apple’s FaceID Could Be a Powerful Tool for Mass Spying.” Wired, Conde Nast, 14 Mar. 2018,

Apple. “About Face ID Advanced Technology.” Apple Support, 29 Oct. 2019,

Shneiderman, Ben. “The Eight Golden Rules of Interface Design.” Ben Shneiderman,

Pieter Vermaas, Peter Kroes, Ibo van de Poel, Maarten Franssen, and Wybo Houkes. A Philosophy of Technology: From Technical Artefacts to Sociotechnical Systems. San Rafael, CA: Morgan & Claypool Publishers, 2011.

Ron White, “How the World Wide Web Works.” From: How Computers Work. 9th ed. Que Publishing, 2007.

Richard N. Langlois, “Modularity in Technology and Organization.” Journal of Economic Behavior & Organization 49, no. 1 (September 2002): 19-37.

How to improve the safety of ride sharing by design

Jun Nie


According to the principle of modular design, this paper divides the ride-sharing process into six different parts. By comparing the safety measures used by the Chinese ride-hailing app Didi Chuxing and the American company Uber in the different modules, it offers some suggestions for Didi’s relaunched ride-sharing product.

Key words

Safety, Female, Sexual assault, Design, Social responsibility


As ride-hailing applications become more and more popular, the safety problems of passengers, especially the personal safety of women, have become more and more serious with the repeated exposure of sexual assault cases. Didi in China even had to suspend its hitch service after two women were sexually assaulted and brutally murdered by drivers within three months of each other. Recently, Didi hitch, with a number of added security measures and technologies, resumed trial operations. Faced with these security measures, which are closely related to people’s travel safety, we cannot help wondering how much protection these special designs can provide for passengers, and what gaps are not fully covered or connected closely enough. This article will help users gain a deeper understanding of the travel products they use every day by sorting out the design principles and safety technologies behind the simple interface, in the hope that users can make better use of them to protect themselves during ride sharing.

  • Background information

The emergence of ride-hailing software has brought a lot of convenience to our lives, and the emergence of ride-sharing mode has reduced people’s travel costs through resource sharing. Didi is the most popular car-hailing app in China, like Uber in the U.S.

As the largest ride-hailing platform in China, Didi has cornered more than 90 percent of the Chinese market. However, Didi’s rapid rise has also exposed problems and flaws in its product design and business logic, most notably the lack of safety guarantees for passengers, especially women. In 2018, an airline stewardess in Zhengzhou was brutally killed while riding a Didi hitch, and within three months, a woman in Wenzhou was raped and killed by her driver after repeatedly asking for help in vain on the way. The tragedies forced Didi to suspend the hitch service and reform the safety aspects of its product design and service systems. Meanwhile, Uber Technologies Inc. said it received 5,981 reports of sexual assault during 2017 and 2018, which underscores a risk that has been a chief criticism of ride-hailing companies around the world.

  • Existing problems and practical significance

By analyzing the rape and murder of a girl in Yueqing, Wenzhou, on August 24, 2018, we can see that flaws in Didi’s product design and service made it easier for the perpetrator’s plot to succeed. Didi had failed to respond to a passenger’s complaint about the driver the day before, and the victim was attacked the next day. The driver’s information was not available to the family until four hours after the incident, because “the information involves the user’s privacy and the front-line customer service staff does not have rights to access”. After the police intervened, they were required to provide a letter of introduction and a police officer’s ID for identity verification before they received the license plate number and driver’s information, which delayed the rescue. These problems exposed huge defects in Didi’s user evaluation and feedback system and its customer service system, and the emergency intensified the conflicts between user privacy, information security and the emergency management system in an unprecedented way. Although design cannot fundamentally prevent similar tragedies, optimizing the design and connecting the modules more closely can greatly improve passengers’ safety and decrease the risks of use. Therefore, the discussion of this issue is of practical significance.

On Nov 20, 2019, Didi hitch began trial operations in seven Chinese cities after a long and multi-faceted adjustment. This time, Didi requires multiple security steps before the formal use of the hitch service: real-name authentication and face recognition, looking through the new safety features, authorizing the automatic recording function, learning basic safety knowledge, and completing six safety quizzes. Besides, everyone needs to read and check the “Didi hitch travel initiative”, “Lift platform privacy policy”, “Lift information platform user agreement”, etc., and receives free casualty insurance provided by the platform.

These cumbersome procedures seem to have temporarily calmed users’ anxiety and show great sincerity in demonstrating Didi’s efforts to ensure passengers’ safety. However, from the user’s perspective, the design principles and technologies behind these security measures remain “black boxes.” Without a deeper understanding of the safety measures used in every part of the ride, it will be difficult for users to fully trust Didi’s hitch service again. Therefore, the whole process (issuing an order to the driver, picking up the passenger, driving along the route, delivering the passenger to the destination, payment, and evaluation) should be divided into different modules supported by various safety methods. The technologies and design principles used to ensure passengers’ safety need to be explained to users in more detail.

In addition, apart from the cumbersome procedures that have caused some users’ dissatisfaction, the newly launched Didi hitch is controversial because it limits women’s access to rides to the hours between 5 a.m. and 8 p.m. Critics say Didi’s adjustment is misplaced: because the company knows that bad actors cannot be eradicated, it finds it simpler to get rid of female users. Banning women from using hitch rides late at night and early in the morning seems to reduce the likelihood of female victimization, but it is suspected of discriminating against women and violating their rights. It is absurd to deny women the hitch service during certain periods in the name of safety protection. In essence, it avoids the problem and is a stopgap for the trial operation stage. For long-term operation to run smoothly, a more reasonable solution must be found.

Therefore, this article divides the safety problems during the ride into six different modules and makes a specific analysis of the safety measures and technical support required by each part. By comparing the security measures used by Didi and Uber in each section, we can better understand the design logic and practical effect behind these black boxes. Finding the differences between them can help both companies improve further and learn from each other. Modular design makes partial optimization possible: even if some modules are still insufficient, we can improve the performance of the whole system through targeted adjustment. Besides, effective interaction and connection between module interfaces can make the user experience more fluent, and passengers will then gain a more comprehensive security guarantee as well.

  • Safety measures during the process
  1. Driver access qualification screening and review

The first step to effectively ensure passenger’s safety is to verify the identity of registered hitch drivers. The issue concerns what personal information is collected about drivers during the review and how much that information can ensure that drivers meet safety standards.

In the August 2018 incident, an investigation showed that the driver had a number of bad loans. Many tragedies might have been avoided if driver admission had been strictly screened from the beginning. In addition to requiring all drivers to upload their ID cards, driving licenses and registered vehicle information for re-registration, the newly launched Didi hitch has tried to cooperate with third parties, such as the public security organs and the list of dishonest persons, to conduct comprehensive background checks on registered car owners. All users, including passengers, are required to submit ID cards for real-name authentication and face recognition to ensure the authenticity of registration information. At the same time, Didi took the lead in the industry in launching a “video verification function”, which requires identity information to be collected dynamically in the form of video, in order to prevent identity fraud and other black-market cheating.

“In 2017, Uber kicked off a comprehensive effort across the company to focus on safety.” As part of its effort to “strengthen background screenings for drivers”, Uber runs a rigorous background-check process on an ongoing basis. “Although the criteria for background check varies by state, Uber mostly conducts digital background checks via a startup called Checkr.” Checkr screens applicants by using Social Security numbers to identify associated addresses and then reviews driving and criminal histories in national, state and local databases. Before their first trip, every prospective US driver must pass a Motor Vehicle Record (MVR) review and a thorough background check for issues including, but not limited to, driving violations, impaired driving, and violent crime; the review is then repeated annually. Under Uber’s standards, a prospective driver is disqualified for any felony conviction in the last 7 years, including sexual assault, sex crimes against children, murder/homicide, terrorism, and kidnapping.

In addition to performing annual background checks, Uber was the first US ridesharing company to implement continuous driver screening technology, which monitors and flags new criminal offenses through a number of data sources to make sure that every driver continues to meet its standards. Both its real-time tracking of crimes and its annual review of driving qualifications have helped Uber screen out a large number of drivers who do not meet the standards because of their criminal records. “During 2017 and 2018, more than one million prospective drivers did not make it through Uber’s screening process, and more than 40,000 drivers have been removed from the app due to continuous screening.” Strict access rules and a rigorous removal system thus help ensure that Uber has reliable, qualified drivers to provide safe driving services for passengers.

Judging from the screening of drivers’ qualifications, Didi still has much to implement. With China’s large population and incomplete credit investigation system, it is difficult to find a reliable third party for reviewing personal information. Real-time monitoring of crime data and timely removal of drivers can close the gap between annual qualification reviews, and a more detailed screening system can deter opportunism and criminal behavior. Uber’s dynamic monitoring and continuous screening design is well worth Didi’s reference. Regularly reporting the number of drivers removed for failing to meet the standard not only protects users’ right to know, but also gives the public useful feedback on driver background checks and screenings.

  2. Order delivery and demand matching

Didi hitch is different from taxis and ride-hailing (operating vehicles). The private nature of the vehicles makes regulation difficult. The ride-sharing model based on private cars is also different from Uber’s ride-sharing service. Didi hitch is a non-profit product that aims to make full use of private cars that regularly commute to and from a fixed location, providing convenience for others with the same travel needs. Therefore, whether a driver’s own scheduled trip and a passenger’s destination lie along the same route has become an important criterion for order delivery.

In the safety overhaul, Didi hitch’s “nearby pickup function” provides four commonly used locations for each car owner, such as the company, home, or parents’ home, which can be modified twice every 14 days. Car owners can only pick up passengers between these fixed locations. Setting common locations can effectively prevent drivers from choosing orders aimlessly or with ill intent, reducing security risks to a certain extent. In addition, Didi limits the number of orders a driver can receive each day and sets up a dual confirmation mechanism between passengers and drivers according to local conditions and relevant regulations on private minibus sharing: “the driver is not allowed to choose the passenger while the passenger can choose the driver.” Under this design principle, the car owner can invite multiple passengers whose destinations are on the way, and each passenger can then accept one of the invitations or refuse. Passengers can judge whether to travel together according to the destination, the owner’s trust value and the number of trips. To solve the problem of delayed passenger confirmation, Didi has also designed a variety of reminder mechanisms to ensure that passengers deal with invitations in a timely manner.

In addition, in order to provide more security for women who ride late at night, Didi’s delivery system has made “safety” a precondition since last year, adding “safe order” steps to the previous “global optimal principle”. At present, according to the actual situation of drivers and passengers, Didi’s order system evaluates more than 200 factors, including the passenger’s gender, travel habits, order distance, starting and ending positions, the driver’s gender, driving habits, historical order information and complaint records, to determine whether the driver and the passenger are a suitable match.

Female passengers who are not familiar with the design principle may complain that they have to wait longer than adult men when ordering a car in the middle of the night, because drivers with good service quality and low complaint rates may be far away from them. Although this design principle prolongs the waiting time, it can improve the safety of women’s late-night travel to some extent. After the company started implementing the “safe order” in September 2018, the number of sex-related crimes on Didi in the first half of 2019 dropped by 70 percent compared with the same period of the previous year.

However, the accuracy of the “safe order” is limited by how much detail is known about both drivers and passengers. Passengers who lack real-name registration, are newly registered or have placed few orders are difficult to match accurately with the right driver, which affects travel safety. In addition, if adult men do not use the “call for other” function when they order a car for female or underage relatives, the system cannot identify the occupant accurately, and those passengers will miss the protection of the “safe order”.

3. Identity verification, monitor and path tracking

After placing an order, passengers wait for the driver to reach the pickup location; during this time, the display of personal information and the communication interface between drivers and passengers need special design. At present, neither the passenger’s nor the driver’s version of Didi displays a specific profile picture or name. The passenger’s mobile phone number is encrypted, so the driver and passenger can only contact each other through a virtual intermediary number, which expires 30 minutes after the order is paid; after that, they cannot contact each other again.
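The expiry rule for the virtual number can be illustrated with a minimal sketch. Only the 30-minute lifetime comes from the description above; the function name `relay_is_active` and the assumption that the number stays usable for the whole trip before payment are hypothetical and do not describe Didi’s actual implementation.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 30-minute lifetime is taken from the text; everything else is illustrative.
RELAY_LIFETIME = timedelta(minutes=30)


def relay_is_active(paid_at: Optional[datetime], now: datetime) -> bool:
    """Return True while driver and passenger may still use the virtual number.

    Assumption: before the order is paid (paid_at is None) the number is
    always usable; after payment it stays valid for RELAY_LIFETIME only.
    """
    if paid_at is None:
        return True
    return now < paid_at + RELAY_LIFETIME
```

For example, with a payment time of 08:00, the check succeeds at 08:29 and fails from 08:30 onward, after which the two parties have no channel to reach each other.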

Since Didi hitch needs to be booked in advance, the interval between booking and setting out is long. To prevent a different person from taking the driver’s place, Didi requires drivers to pass face recognition multiple times: when they invite passengers and when they arrive at the passenger’s starting point. After boarding the car, drivers and passengers can choose whether to use the app’s recording function to record the journey. These recordings are encrypted and uploaded to the platform in real time, and recordings without travel disputes are automatically deleted after 7 days. If someone needs to listen to a recording to reconstruct events later, Didi will request the user’s authorization and listen to it in a secure and confidential environment. Similarly, Uber uses “Phone number anonymization” and “Real-time identification” to keep passengers’ numbers private and ensure that the right driver is behind the wheel.

Another safety issue is fatigued driving. While Uber uses a “Driving-hours tool” to prevent drowsy driving, which requires drivers to rest for 6 straight hours after a total of 12 hours of driving, Didi designed a special “Orange video dashcam” to monitor the driver’s state at all times. If the driver blinks frequently and slowly in a short period of time, closes his eyes for a long time or opens his mouth to yawn, the recorder sends voice alerts in time and, if necessary, reports the situation to Didi’s safety response center via the 4G network.
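The driving-hours rule attributed to Uber above (12 hours of driving, then 6 straight hours of rest) can be sketched as a simple counter. The two constants come from the text; the class `DrivingClock` and the assumption that breaks shorter than 6 hours do not reset the counter are illustrative choices, not Uber’s documented logic.

```python
from dataclasses import dataclass

DRIVE_LIMIT_H = 12.0    # total driving hours allowed (from the text)
REQUIRED_REST_H = 6.0   # uninterrupted rest needed to reset (from the text)


@dataclass
class DrivingClock:
    driven_h: float = 0.0

    def drive(self, hours: float) -> float:
        """Log driving time and return how many of those hours were allowed."""
        allowed = max(0.0, min(hours, DRIVE_LIMIT_H - self.driven_h))
        self.driven_h += allowed
        return allowed

    def rest(self, hours: float) -> None:
        """Reset the clock only after one uninterrupted rest of 6+ hours.

        Assumption: shorter breaks leave the accumulated hours unchanged.
        """
        if hours >= REQUIRED_REST_H:
            self.driven_h = 0.0

    @property
    def must_rest(self) -> bool:
        return self.driven_h >= DRIVE_LIMIT_H
```

A driver who drives 10 hours, takes a 2-hour break and tries to drive 5 more hours would be allowed only 2 of them under this sketch, and could go online again only after a full 6-hour rest.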

Besides, all Uber rides are tracked by GPS from start to finish. The “RideCheck” uses sensor and GPS data to detect if a trip goes unusually off-course or a possible crash has occurred. When a potential crash or suspicious trip issue is detected, both the rider and the driver will receive a notification asking if everything is OK. Passengers can share their trip with designated loved ones who can follow their trip on a map in real time and know when they’ve arrived as well. These kinds of real-time monitoring and close connection can bring passengers a sense of security and quickly locate the passengers and vehicles in case of an accident.

4. Emergency rescue

Although both Didi and Uber have adopted a variety of technologies and measures to ensure the safety of passengers, emergencies are inevitable. The key to the safety design is how to intervene quickly to help passengers or drivers, or how to locate the vehicle quickly and rescue them after a crime has occurred. Didi requires passengers to set up emergency contacts before use. Once the passenger turns on the “escort mode”, the vehicle’s route is automatically shared with the emergency contact, and the platform also monitors the track in real time and intervenes in case of abnormalities.

In the most urgent situations, Didi’s passengers can choose to call 110 or send a text message to the police. The vehicle, driver and current location are displayed on the page to help passengers communicate with the police. Meanwhile, all emergency contacts set up in advance receive a text message asking for help. Uber also has an “In-App Emergency Button” in the Safety Toolkit, which connects riders and drivers directly to 911 with a simple swipe. Its customer support team is specially trained to respond to urgent safety issues and provides 24/7 incident support to passengers and drivers.

Previously, Didi outsourced customer service to a non-specialist third-party service platform, and front-line customer service staff had little authority: they could only pass feedback up to their superiors, which wasted a lot of rescue time as information was relayed from layer to layer. Since last September, Didi has upgraded its customer service capabilities. To ensure the professional handling of security incidents, Didi’s customer service system is divided into a security system and a service system. All safety-related complaints are immediately transferred to the corresponding security team, whose specialized staff handle them. To better assist the police in retrieving evidence in emergencies, Didi has also set up a dedicated 24/7 police liaison team. To balance the protection of personal privacy against the needs of police evidence collection, Didi divides user information into three security levels with different retrieval processes, ensuring that 98% of the primary and secondary information can be obtained within 10 minutes. Hopefully, these lessons learned from bitter experience can help Didi make targeted adjustments and improvements, and provide reliable assistance and timely rescue to passengers in times of emergency.

5. Passenger feedback and rating system

Since 2016, Didi has scored passengers and car owners through the “trust value”, hoping to encourage users to abide by the rules of the platform and travel in harmony by establishing a credit mechanism similar to the penalty-point system that traffic management departments use for driving violations. The “trust value” applies to both vehicle owners and passengers. Each user starts with 12 points, which measure the user’s reliability on the platform. Car owners lose 2, 3 or 5 points for lateness, malicious comments, harassment and other behaviors that result in complaints from passengers. Passengers likewise lose 2 or 3 points if a driver complains about tardiness or malicious comments. When the “trust value” falls below 7 points, the user receives a platform warning; when it falls below 5 points, the user is banned for a certain period of time; when the user receives a major complaint or the value falls below zero twice, the user is permanently banned.
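The thresholds of the “trust value” can be summarized in a short sketch. The starting value, the deduction sizes and the warning and ban thresholds follow the description above; the class `TrustAccount` and its field names are hypothetical. The “below zero twice” rule is deliberately left out, because the description does not say how a score is restored between the two occurrences.

```python
from dataclasses import dataclass

INITIAL_TRUST = 12               # starting points (from the text)
ALLOWED_DEDUCTIONS = {2, 3, 5}   # per-complaint deductions (from the text)


@dataclass
class TrustAccount:
    value: int = INITIAL_TRUST
    major_complaint: bool = False

    def deduct(self, points: int) -> None:
        """Apply one complaint-related deduction of 2, 3 or 5 points."""
        if points not in ALLOWED_DEDUCTIONS:
            raise ValueError("deductions are 2, 3 or 5 points")
        self.value -= points

    def status(self) -> str:
        """Map the current value to the platform's response."""
        if self.major_complaint:
            return "permanent ban"
        if self.value < 5:
            return "temporary ban"
        if self.value < 7:
            return "warning"
        return "ok"
```

Under these rules, a user who receives two 3-point deductions drops to 6 points and gets a warning, and one further 2-point deduction brings a temporary ban.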

The recently launched Didi hitch upgraded the “trust value” to a “behavior score” on a much finer-grained scale. More behavior, evaluation and complaint data are included in the score, and users are evaluated comprehensively along four dimensions: performance, compliance, friendliness and cleanliness. Only those who abide by the agreement, do not cancel orders or arrive late, obey the rules of the platform, keep the vehicle interior clean and tidy, and communicate with passengers in a friendly manner during the journey can get a higher “behavior score”. Users with high scores are entitled to priority in passenger travel display, priority in receiving orders, early access to new product functions, and other benefits. Users with lower scores are subject to stricter restrictions, such as delayed delivery of orders, cancellation of double compensation, and restrictions on receiving orders in special situations. Drivers and passengers with behavior scores below 400 are no longer able to use the hitch service.

Uber uses a two-way rating system to keep both the rider and driver experience safe, comfortable, and enjoyable. Compared with Didi’s, its scoring system is characterized by filtering out ratings that should not count, with a certain tolerance, while passengers’ riding behavior is in turn constrained by drivers’ ratings. According to Uber’s guidance for drivers, the overall partner rating is the average of the individual ratings (from 1 to 5 stars) given by riders on up to 500 of the most recent trips. Cancelled trips and unaccepted trip requests do not count toward the overall rating, and individual ratings unrelated to the driver’s service are removed automatically when applicable. To avoid a low rating, drivers should keep both the inside and outside of the car clean, try not to call riders excessively or right away, wait to begin the trip until they have confirmed the rider’s name, ask riders if they have a preferred route, and avoid asking for 5 stars. Sometimes things go wrong anyway; drivers who face such situations are encouraged to keep a good attitude and focus on the things they can control.
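The rolling average behind the overall partner rating can be sketched with a fixed-size window. The 500-trip window and the 1 to 5 star range come from the description above; the class `DriverRating` is a hypothetical illustration, and the removal of unrelated ratings is not modelled.

```python
from collections import deque
from typing import Optional


class DriverRating:
    """Average of the most recent `window` star ratings (500 in the text)."""

    def __init__(self, window: int = 500) -> None:
        # A deque with maxlen drops the oldest rating automatically
        # once the window is full.
        self._ratings = deque(maxlen=window)

    def add(self, stars: int) -> None:
        if not 1 <= stars <= 5:
            raise ValueError("ratings run from 1 to 5 stars")
        self._ratings.append(stars)

    def overall(self) -> Optional[float]:
        """Return the current average, or None if no trip has been rated."""
        if not self._ratings:
            return None
        return sum(self._ratings) / len(self._ratings)
```

Because old ratings fall out of the window, a single bad trip stops affecting the score once enough newer trips have been rated, which is one reason drivers need not worry too much about an isolated low rating.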

A good rating system can keep drivers motivated: even in adverse circumstances, drivers need not worry too much about damage to their score, so the driving service can maintain a consistent quality. Nevertheless, “Respect is a two-way street, and so is accountability,” Kate Parker, Uber’s head of safety brand and initiatives, said. Drivers therefore have the option to anonymously give riders a low rating if they habitually leave trash behind or disrespect drivers. On May 28, 2019, Uber announced that “riders with ratings that are ‘significantly below average’ may lose access to the app, part of a rollout of the company’s updated community guidelines, which riders must abide by to continue using the service.”

6. Safety awareness and social responsibility

It is relatively easy to design products or procedures to keep passengers safe, but it is harder to “design” in people’s mind and heart. In order to make passengers more aware of self-protection, build safer communities and create a more harmonious social atmosphere, Didi and Uber need to assume more social responsibilities in terms of safety awareness.

For Didi, the new security quiz and pre-use safety training are a good start, but Uber has already accumulated a lot of experience in this area through its own practices. Firstly, Uber has formulated clear and universally applicable Community Guidelines, which provide drivers and riders with prevention information and education materials. Secondly, Uber offers a number of safety tips to reduce travel risk for passengers. For example, passengers should take a second to double-check the driver’s information in the app. Only when the license plate number, the car make and model, and the driver’s photo all match should passengers hop in. If everyone forms such a habit before every trip, stays alert at all times, and has some common sense about self-help, then with the technical support and safety measures provided by the app, the possibility of incidents in the car will be greatly reduced. Besides, Uber is committed to helping stop incidents before they happen by partnering with and learning from women’s safety groups, building tools and policies, and promoting safety. Specifically, Uber pledged $5 million through 2022 to women’s safety organizations such as “Futures Without Violence”, “RAINN” and “A Call to Men”, partnered with the National Sexual Violence Resource Center and the Urban Institute to create a taxonomy that categorizes misconduct and sexual assault incidents, and invited law enforcement and 100+ women’s safety and advocacy organizations to help develop its processes and technology.

These measures show that Uber is taking on corporate social responsibility and contributing to a safer community by connecting various organizations. As mentioned above, even the best product design and the most thorough security measures cannot eradicate evil. However, improved safety awareness among passengers, zero tolerance of sexual assault and violence in society, and sound post-incident judicial procedures and psychological counseling systems can greatly increase the cost of violations, reduce the motivation for crime, and help prevent a recurrence of the Didi tragedy.

  • Limitations and suggestions

According to the user feedback collected by Didi, many of the complaints come from cumbersome safety measures, excessive restrictions on the car owners of hitch rides and inadequate regulations on passenger behavior. Didi still has a lot to do in the trial operation and nationwide promotion phase, and it is constantly soliciting design schemes and rectification opinions from the public, such as “whether men need the guarantee of relatives and opposite sex friends to drive a hitch”.

Although Didi has used some videos and articles to explain its new online security measures and technologies, its account of the design principles is not detailed or transparent enough to reassure people who were shocked by the earlier incidents. Tim Berners-Lee, who proposed the Contract for the Web, argues that “we need platforms to open their black boxes and clearly explain how they are minimizing or eliminating risks their products pose to society.” Therefore, Didi still needs to put more effort into explaining the design principles and operating logic behind the user interface. “Releasing reports” to demonstrate the progress it has made and taking on more social responsibility are good ways to build a good corporate image and restore consumers’ confidence.

In addition, during the pilot period Didi restricts female passengers to using the service between 5 a.m. and 8 p.m., which is contrary to the spirit of the Contract and to the direction in which companies should develop, because “Companies must understand that long-term success means building products that are good for society and that people can trust them.” The stopgap measure not only fails to solve the problem fundamentally, but also exposes the security concerns that remain. Thus, Didi needs to “tackle the negative (even if unintended) consequences of platform design” and explore better solutions in the coming days.


The tragedy on Didi in 2018 caused widespread concern because of its bad influence, but there are also many incidents and disputes that people do not know about, which may derive from the design of the platform, the lack of safety measures, and the darker side of human nature when there is no outside supervision. This paper breaks the ride-hailing process into different modules, from passengers ordering in the app to drivers delivering passengers safely. It sorts out the security issues involved in each module and the technical support needed to solve them and, by comparing Didi Chuxing, a Chinese ride-hailing app, with Uber in the U.S., identifies what each can learn from the other and where each can improve.

The modular analysis of the ride process above shows that there are many similarities between Didi’s and Uber’s security measures, which also means that the technologies used to increase safety are nothing new. The main difference lies in how these modules interact with each other and how they can be combined more efficiently, so that comprehensive security protection is provided to every user by design.


Martin Irvine, Introduction to Modularity and Abstraction Layers

Lidwell, William, Kritina Holden, and Jill Butler. Universal Principles of Design. Revised. Beverly, MA: Rockport Publishers, 2010.

Richard N. Langlois, “Modularity in Technology and Organization.” Journal of Economic Behavior & Organization 49, no. 1 (September 2002): 19-37.

Carliss Y. Baldwin and Kim B. Clark, Design Rules, Vol. 1: The Power of Modularity. Cambridge, MA: The MIT Press, 2000. 

Tim Berners-Lee (24 November 2019). “I Invented the World Wide Web. Here’s How We Can Fix It”. The New York Times. Retrieved 15 December 2019.

Morgan Winsor (6 December 2019). “Uber reveals nearly 6,000 incidents of sexual assaults in new safety report”. ABC News. Retrieved 15 December 2019.

Hamza Shaban (29 May 2019). “Uber will ban passengers with low ratings”. The Washington Post. Retrieved 15 December 2019.

Sara Ashley O’Brien and Kaya Yurieff (3 November 2017). “What we know (and don’t know) about Uber background checks”. CNN Business. Retrieved 15 December 2019.

Heather Somerville (5 December 2019). “Uber Safety Report Details Sexual Assaults in U.S. Over Two Years”. The Wall Street Journal. Retrieved 16 December 2019.

Ahiza Garcia and Sara O’Brien (6 December 2019). “Uber releases safety report revealing 5,981 incidents of sexual assault”. CNN Business. Retrieved 15 December 2019.

Lora Kolodny (5 December 2019). “Here’s what Uber is doing to solve its sexual assault problem after reporting more than 3,000 incidents last year”. CNBC. Retrieved 16 December 2019.

Uber Technologies, Inc. (5 December 2019). “US Safety Report”. Uber. Retrieved 16 December 2019.

“Star ratings-A closer look at the ratings system”. Uber. Retrieved 16 December 2019.

“Driver Safety-Is Uber Safe for Drivers”. Uber. Retrieved 16 December 2019.

“How Driver Screenings work”. Uber. Retrieved 16 December 2019.

Sachin Kansal (17 September 2019). “RideCheck: Connecting you with help when you need it”. Retrieved 16 December 2019.


Fan App: How Design Promotes Interaction

Xiebingqing Bai


This paper analyzes how fan apps are developed and serve their functions from a design perspective, mainly using theories of socio-technical systems, interface design and HCI. The major research question is how fan apps use design principles to engender better interaction. As part of the wider fan economy, a fan app provides multiple services to fans and is crucial for the B to C business model. Fan apps emerge in a complex social and technical context and combine a wide range of technologies to perform better interactive functions. Since a fan app is a complex system, it embodies design principles such as modularity and hierarchy to organize diverse functions and affordances. As a main tool for fans to interact with artists and their community, a fan app integrates many interface design and human-computer interaction principles in detail. Through many design cues, a fan app can use its interface to affect users psychologically and socially, thus promoting deeper communication within a community. The app will be deblackboxed and three case studies will be analyzed to conclude which aspects designers should ideally consider when developing a fan app.

Fan app in sociotechnical system

With the development of the fan economy and the entertainment industry, many kinds of fan apps keep emerging. There are official fan apps through which a single artist or a specific group interacts with fans, and third-party apps that gather many celebrities. A fan app is not only a platform for fans to interact with artists, but also an efficient tool for forming and solidifying a fan community. In this paper, I focus on the first type of fan app, which serves a single celebrity or group. These apps are mainly developed by the artist’s company and act as a main outlet for bonding with fans. A fan app is part of a bigger socio-technical system, which involves all dimensions of mediation and interface, including social, technical, cultural, political, economic and demographic aspects. As a specific technological artifact, it mediates both the underlying medium function and social and cultural institutions. Its emergence relies on Internet development, the smartphone industry, talent agencies, the entertainment ecosystem and the distinctive fan culture in Asia.

Surprisingly, although the fan economy is very mature in many countries, the typical cases of the type of fan app I discuss have developed only in China, because the actors in the socio-technical system influence one another differently there. Firstly, there is strong market demand in Asia for close interaction between artists and fans, which is very different from western countries. In Korea and Japan, the fan economy developed in the era of the personal computer, so official websites became the main outlets. Fans use the websites to view the latest information and purchase merchandise, and that mode of communication remains today. The Chinese fan economy, however, developed at a different time, when smartphones started to boom and apps became the major way to get relevant information. As a result, fans in China mainly use multiple apps as their main information outlets, which explains the emergence and prosperity of fan apps. We can see that this specific technological artifact is closely tied to the development of the computer and smartphone industries. Socio-technical relations are networks of distributed agency with multiple kinds of agents. Technology and human agents have mutual relations, and technological actors can shift the goals of actants (Latour, 1999). Fan apps also have a delicate relationship with other social media. On the one hand, they compete with social media like Weibo for users and attention. On the other hand, fan apps link directly to other social media, and fans are encouraged to carry out tasks for the fan community on those platforms.

That fan apps mainly prosper in China also has a demographic reason. Compared to other countries with vibrant fan economies, China’s huge population guarantees sufficient user growth and smooth daily operation. Steady user growth and adequate user-generated content are very important for an app, because an app is an isolated information outlet targeted at a vertical market demand, while the web holds virtually unlimited information and tends to be open to a wider audience. Since the fan community for a single artist is a relatively limited group of people, only a huge population base can make this kind of fan app viable. In the wider fan economy, a fan app not only promotes deeper interaction between artist and fans, but also facilitates the bonding of the fan community. Fans gradually form a habit of use and become accustomed to contributing to the community regularly. In this way, talent agencies gain more loyal consumers for future purchases and activities. As an artifact and interface, a fan app constantly mediates Chinese culture, population, institutions, economy, social atmosphere, the smartphone industry and Internet development, and it relies on many invisible forces to make it possible. As Gregory Bateson said, “What can be studied is always a relationship or an infinite regress of relationships. Never a ‘thing’.” (Bateson, 1972) We need to study it within both the socio-political system and the technical system.

Combinational technologies in fan app

Although a fan app appears to be a simple interface, it is actually a combination of many cumulative technologies, located both in the device and in network services. By deblackboxing it, we find that a fan app combines technologies in a scalable and extensible way, and that these technologies function together to form a complete system and platform. A fan app lets users upload text, audio, picture and video posts, watch streaming content, search for friends based on location, join instant chats, shop online, play simple games and interact with a virtual simulation of the artist. The microphone and camera capture audio and video, a converter encodes the signals into binary code, and the device sends the data over a specific radio frequency. The app also borrows technology from photo-editing apps to retouch images by changing the color values of pixels, and audio messages can be edited by trimming their binary data.

Most fan apps feature streaming video about a celebrity’s daily life and special activities, which can be watched before it has been completely downloaded. Streaming technology is a combination of Internet technology and audiovisual technology. Like other Internet traffic, streamed video is divided into small packets by the TCP/IP protocols and reassembled on the end user’s device; on top of that layer, streaming usually relies on dedicated protocols such as RTMP and HLS, which deliver the video in small pieces that can be played as they arrive. A streaming player preloads some video data into a buffer on the user’s device. When the bandwidth drops, the program plays data from the buffer to keep the video continuous and reduce lag. The buffer usually does not need as much memory as we might imagine, since it uses a loop structure that discards data that has already been played and constantly frees space for new content. Since streaming needs the data to arrive faster than it is played, we can regard buffering as a kind of redundancy that compensates for potential network interference, which is a common design strategy for steady information transmission.
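The loop structure of the buffer can be illustrated with a small ring buffer. This is a minimal sketch of the general technique, not the code of any particular player; the class `RingBuffer` and the use of strings as stand-ins for video chunks are assumptions made for illustration.

```python
class RingBuffer:
    """Fixed-size buffer that reuses the slots of chunks already played."""

    def __init__(self, capacity: int) -> None:
        self._slots = [None] * capacity
        self._read = 0    # index of the next chunk to play
        self._count = 0   # number of chunks currently buffered

    def push(self, chunk) -> bool:
        """Store a downloaded chunk; return False if the buffer is full."""
        if self._count == len(self._slots):
            return False
        write = (self._read + self._count) % len(self._slots)
        self._slots[write] = chunk
        self._count += 1
        return True

    def pop(self):
        """Play the oldest chunk and free its slot; None means an underrun."""
        if self._count == 0:
            return None
        chunk = self._slots[self._read]
        self._slots[self._read] = None
        self._read = (self._read + 1) % len(self._slots)
        self._count -= 1
        return chunk
```

Memory use stays constant no matter how long the video is, because each played chunk frees a slot for the next download. A `None` from `pop` corresponds to the moment a viewer sees the video stall: the network has not refilled the buffer fast enough.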

Moreover, fan apps often use GPS to locate users and let them find other users nearby, further promoting bonding within the community. A GPS receiver in the phone uses signals from GPS satellites to determine the user’s coordinates on the earth. These apps can also recognize who you are the next time you log in by means of a cookie. A cookie is a small file stored on the client device and associated with a particular website, which records the user’s information and preferences. A cookie can hold only a few kilobytes, so keeping a growing record of a frequent user’s activity in the cookie itself is impractical. The common design is therefore to store only a unique ID in the cookie on the device, while the full record is kept on the company’s server. The next time users log in, they are identified by that ID.
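The ID-based design can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of any particular fan app: the in-memory dictionary SESSIONS stands in for the company's server-side storage, and the function names log_in and identify are hypothetical.

```python
import secrets

# Hypothetical server-side store: session ID -> full user record.
SESSIONS = {}


def log_in(user_name, preferences):
    """Create a server-side record and return the ID to place in the cookie."""
    session_id = secrets.token_hex(16)  # random 32-character identifier
    SESSIONS[session_id] = {"user": user_name, "preferences": preferences}
    return session_id  # only this short value is stored on the device


def identify(cookie_value):
    """Look up the full record from the ID the device sends back."""
    return SESSIONS.get(cookie_value)


# Hypothetical usage: the device keeps only the ID; the record stays on the server.
cookie = log_in("fan01", {"theme": "blue"})
record = identify(cookie)
```

The record on the server can grow freely, while the cookie on the device stays the same small size.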

In most fan apps, users have direct access to other social media. If you want to purchase something in a fan app, you are linked to a payment app such as Alipay. This linkage between apps is achieved through an API (Application Programming Interface). An API lets an app connect to other services and bring more functions to people without its developers needing to know the other service’s complex source code, which makes it convenient to combine functions from multiple platforms. Some fan apps create a digital simulation of the artist for fans to click and interact with. When users click on the simulation, they can enjoy a virtual conversation with the artist. That function relies on touchscreen technology. Each pixel on the screen has a coordinate, and the display is matched with a mapped grid pattern. When we touch the pixels of the simulation image, a grid of nearly invisible conductors layered over the display senses the electrical change caused by the finger and detects its location; a microcontroller then translates the location and the system sends a response to the display according to where the touch occurred. In this way, animation effects are triggered as feedback when we press the matched area on the screen. What’s more, like other apps, a fan app has an option in its settings module that lets users clear the cache. A cache is a store that keeps copies of recently used data apart from their original source so that they can be retrieved quickly; the cache users clear in an app holds items such as images and video the app has already downloaded. The same principle works in hardware: since the CPU is much faster than main memory, retrieving data from main memory takes a while, so a CPU cache keeps the results of recent computation, which can be fetched faster if the CPU needs them again. Although we cannot perceive this difference in processing, the system is designed to optimize computing performance. Combining technologies of receiving, transmission, presentation, location and processing, a fan app uses fan culture as a lens to orchestrate all of these cumulative and scalable technologies.
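The step from a detected touch location to an on-screen response can be sketched as a simple hit test. The region names, coordinates and screen size below are hypothetical and are not taken from any of the apps discussed; the sketch only shows how a touch coordinate is matched to the area drawn at that location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangular screen area, in pixels, that responds to touch."""
    name: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def hit_test(regions, x, y):
    """Return the name of the topmost region under the touch point, if any."""
    for region in reversed(regions):  # later entries are drawn on top
        if region.contains(x, y):
            return region.name
    return None


# Hypothetical layout on a 360 x 640 screen.
REGIONS = [
    Region("home_feed", 0, 0, 360, 640),
    Region("artist_avatar", 260, 400, 100, 200),
]
touched = hit_test(REGIONS, 300, 450)  # a touch inside the avatar area
```

The app would then start the animation or conversation attached to whichever region name is returned.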

The modularity and affordances of fan app

Most fan apps have a clear hierarchy of layers and modules. The complexity of user actions is abstracted into a few large concepts and modules. In a fan app called Vae+, the design hierarchy has four layers overall, and similar functions are gathered into subgroups.

Vae+ is a fan app designed for a single Chinese singer, Vae. It is a complex system composed of multiple modular parts, each serving a particular function. As Langlois said, “one way to manage complexity is to reduce the number of distinct elements in the system by grouping elements into—by hiding elements within—a smaller number of subsystems.” (Langlois, 2002, p. 20) The app has gone through several iterations and keeps adding functions. Modularity allows repeated reconfiguration for different purposes, such as adding more subgroups to the community module. With clear abstraction layers, all functions can be easily indexed. Other fan apps have a similar modular design, although their levels of complexity differ. Another fan app, M77, is designed for the Chinese actress Shuang Zheng. Its modules are posts, announcements, badges, messages and settings. In general, fan apps have a home page for the latest news and announcements, a module for communication within the fan community, and a module for purchasing items.

Fan apps also have obvious affordances and constraints. As part of the smartphone system, a fan app inherits built-in affordances from the touchscreen, camera, microphone and keyboard, and its purpose gives it further, particular affordances. Norman distinguished between the real affordances and the perceived affordances of artifacts. He said “in graphical, screen-based interfaces, the designer primarily can control only perceived affordances.” (Norman, 1999, p. 39) Perceived affordances matter more because they determine the most likely relationship between user and artifact and how people will first use the artifact. All digital artifacts belong to a shared landscape of potential affordances, and fan apps also embody the procedural, participatory, encyclopedic and spatial design strategies (Murray, 2012). Take Vae+ as an example: every task inside it belongs to a set of procedures. Every module and design cue encourages a participatory user experience. The information on the home page and in the posting modules is comprehensive for this particular community. The virtual appearance of the artist conveys a spatial sense by depicting the architecture behind him in detail. Some legacy affordances of newspapers and maps have been transferred into digital formats and refined in the process. Although a fan app is developed for a narrow, vertical need, it offers users a diverse range of possibilities to navigate. “Direct perception of possibilities for action is what the concept of affordance is about.” (Murray, 2012) In this app, users can view information, interact with the artist, post thoughts and artworks, join communities, communicate with others and play games. Users can even edit audio content and polish photos. Most of these affordances are in line with users’ mental models. There are constraints as well: users can upload only one type of post in each subgroup of the community module, and there is a size limit for video uploads.

Information richness results from the many affordances these apps have. According to media richness theory, fans turn to fan apps partly because of the richness of this kind of medium. Media richness is evaluated on four aspects: the availability of instant feedback, the capacity to transmit multiple cues, the use of natural language rather than numbers, and the personal focus of the medium (Trevino, 1990). Compared with other outlets for communicating with an artist, such as websites and social media, fan apps carry much richer information. Content that users post can be commented on or liked instantly by other users. Users are also more likely to find opportunities to interact with the artist in the app, and they get instant feedback during a virtual conversation with the simulation of the artist. The diverse affordances let users generate many kinds of input, including text, images, audio messages, videos and behaviors. And since a fan app is designed for one specific community, users can post more personal thoughts and their context is better understood.

Interface design and human-computer interaction of fan app

The two keys of interface design are page flow and page layout. The former establishes a clear architecture and strict logic, while the latter integrates scattered information and sets clear primary and secondary relations. In this part, I use three fan apps, Vae+, M77 and Yianfan, as case studies. Vae+ and M77 are each designed for a single artist, while Yianfan is for an artist group. The first principle of interface design is consistency, which means consistent actions, terminology and layout. These fan apps are designed around the characteristics and style of a particular artist. The overall interface of Vae+ is simple and elegant, in keeping with the personality of the singer Vae. The app uses soft blue as its dominant hue: blue for fonts and buttons, and white for the main layout. At the bottom are five buttons for the major modules, originally filled with white; when users click one, it immediately fills with blue. In the UGC (user-generated content) module, the submodules for different topics are represented by icons drawn in blue lines.

On the right there is a blue semicircle. When users click it, it turns into a cartoon face of the artist, and clicking the face then brings up the full simulation image of the artist so that users can start a virtual conversation. That image is also set against a blue background, with buildings and stars in white. The buildings are landmarks of Beijing, where the artist lives, and such details help fans relate to him. This part of the design also reflects the principle of preventing errors. Triggering the simulation image is a major action in this app, and users are quite likely to click the blue semicircle inadvertently while viewing the home page, so the designer adds one step to avoid this error. To trigger the virtual conversation, users must first click the blue semicircle and then click the artist’s cartoon face. Requiring two steps avoids many of the errors that a single step would allow. To prevent errors in a design, then, we can add confirmation steps in addition to placing constraints on the artifact.
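The two-step trigger can be modeled as a small state machine. The sketch below is only an illustration of the principle: the class name AvatarLauncher and the state names are hypothetical, and the rule that clicking anywhere else cancels the first step is an assumption, not something observed in Vae+.

```python
class AvatarLauncher:
    """Two-step confirmation: semicircle first, then the cartoon face."""

    def __init__(self) -> None:
        self.state = "semicircle"  # what is currently shown on the right edge

    def click(self, target: str) -> str:
        """Advance the state for a click on `target` and return the new state."""
        if self.state == "semicircle":
            if target == "semicircle":
                self.state = "cartoon_face"      # first step taken
        elif self.state == "cartoon_face":
            if target == "cartoon_face":
                self.state = "conversation"      # second step confirms
            else:
                self.state = "semicircle"        # assumed: other clicks cancel
        return self.state


# Hypothetical usage: an accidental click never opens the conversation by itself.
launcher = AvatarLauncher()
after_accident = [launcher.click("semicircle"), launcher.click("home_feed")]
after_intent = [launcher.click("semicircle"), launcher.click("cartoon_face")]
```

A single stray click only moves the interface to the intermediate state, from which it can fall back without launching the full simulation.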


In M77, the overall interface design is likewise consistent with the personality and style of the artist, a young woman. Most visual elements are purple, with some variation in shade. The menus are underlined and filled in purple. Since “bubble” and “planet” are core concepts in this app, the designers make many buttons and visual elements circular to emphasize them, and a bubble sound effect plays when users click the major buttons. In the diary section, every date is also enclosed in a bubble shape. Consistency, in this sense, lies not only in design details such as fonts and terms, but also in how a product is used and experienced as a whole and how a concept is embodied consistently throughout it.

Yianfan is developed for a group of teenage boy artists, so the app is set in a high school and users are cast as the artists’ parents. The underlying concept is that users raise their kids and watch them grow. The overall visual style is lively to match the artists’ young age, with red and yellow as the dominant colors. The icons draw on many high school elements, such as a schoolbag, a brush pot and a notebook. There is also a module in which the designers create a virtual land modeled after a school, which generates a relatable atmosphere for users. The whole layout promotes a sense of child raising and fosters a special bond between fans and artists.

What’s more, these apps all use a stack style and a list style to display posts and announcements. The stack style is mainly used for the artist’s posts and official videos; in Yianfan it is also used for music. Users swipe from left to right to view the next item in the stack. The list style is used for users’ posts. The stack style is more eye-catching, so users immediately notice the artist’s posts, which are what they care about most. This embodies the design principle of visibility: the important parts should be made visible and salient. The list style can hold more items and is suitable for showcasing user-generated content. The consistent display styles also foster a viewing habit that directs users’ attention to what the designers want them to notice. All in all, consistency in essence means representing a set of symbols inside an app in a uniform way, with a stable relationship between signifiers and the signified. Consistency throughout the design helps users understand and apply the underlying design logic and concept of an app, and at the same time reduces the possibility of errors. Consistent design matters more in fan apps than in other kinds of apps, because it not only makes the user experience smoother but also immerses fans in a world with a distinctive atmosphere and visual features, promoting the perception of deep bonding within the community.

The most remarkable thing about fan apps is how they use design methods and cues to encourage longer use and capture more user attention. The first method is to identify and classify users in many ways. Besides being marked with a level, users in M77 can light up particular badges when they complete certain missions. Different badges are designed for different kinds of users, such as users with a special talent or expertise and users who have attended certain activities. In Yianfan, users are identified by their grade in the virtual school. These identifications generate competition: to reach a higher level or a specific virtual accomplishment in the app, users will deliberately commit to certain tasks.

In addition, constant rewarding feedback is crucial for sustained attention. In Vae+, the designers use pine cones as virtual money, which fans can spend on many things in the app. The longer users stay in the app, the more pine cones they collect. There is also a ranking list showing the users who contribute most to a particular task. Unlike other fan apps, Vae+ lets users purchase the merchandise in its shopping module only with in-app credits, not real money, and credits are earned through particular user behaviors in the app. This design changes what a shopping module usually is, reframing a shopping context as a rewarding context. As Murray said, “Innovative design is often the result of reframing familiar activities, such as rethinking the context in which they can be performed.” (Murray, 2012, p. 26) In M77, reaching a certain level unlocks the corresponding privileges. An interesting detail is that you can see only the privilege for the level just above yours, and nothing beyond that. This sense of mystery stimulates a desire for continued exploration. In Yianfan, there is a module where users win a random gift. Each day, when you click on that module, a virtual gift is presented. This gives users a strong reason to return regularly, because you never know what gift you will get next. When you accomplish the goals set in the app and receive rewarding feedback as confirmation, you acquire a sense of ritual and fulfillment. Rewarding feedback also serves, to a degree, as guidance on how users should use the app. Human expectations of artifacts are formed through learning and gradually become habitual. Constant rewarding feedback can thus form usage habits without users being aware of it.

Additionally, interaction is the most significant function of a fan app. Good interaction design cultivates positive two-way communication and promotes collective engagement. From Douglas Engelbart to Xerox PARC and Steve Jobs, designers have put more and more emphasis on human input in the field of human-computer interaction. As Murray said, designers need to turn “users” into “interactors” (Murray, 2012). The interaction process also needs to be consistent with users’ expectations to create a sense of direct manipulation. In Vae+, as in other social media, you can comment on and like a post, an announcement or a piece of news. You can click on the simulation image of the artist to start a conversation, and at each turn several options are offered for the user to choose as a response to the artist. The simulation image also has animation effects to emulate a realistic conversation. The app provides an internal game module where users play with each other, so fans in the community can interact further even without the artist. In Yianfan, you can send virtual flowers to the artists. Every time you enter the app, a recording of one artist’s voice plays automatically to welcome you. Sometimes interactive videos are released in the app; at certain points in the video users choose among several options, and different users see different results depending on their choices. Fans can also write letters to their favorite artist through the app. And there is a space full of virtual gifts for users to explore, designed as a playland, with cues such as arrows showing how to interact with the interface.

In these apps, confirmative feedback is essential for confident interaction. When a button is clicked, a special effect appears as confirmation. In Vae+, a button fills with blue when you click on it. In M77, a button turns purple with an animation effect, such as a spin or swing, and a sound effect, and this sound effect accompanies every step in the app. In Yianfan, a button is filled or underlined in red. Confirmative feedback is part of user-centered design and answers users’ psychological needs.

According to situated action theory, use is a complex set of actions shaped by the surrounding social and material world, rather than something controlled mainly by the way the computer is pre-scripted (Suchman, 2011). So designers need to know not only the purpose of an artifact, but also under what circumstances people will use it and how it could be used. In a fan app, one person’s behavior is part of a collective behavior and is constantly affected by others. Many functions in a fan app are achieved through computer-supported collaborative work. In Vae+, user-generated content is divided into subgroups such as daily life, jokes, voices, games, photos, literature, popular topics, questions and announcements, and people can post only one type of content in each group. That restriction is reinforced by the large volume of posts that follow it. How one person behaves is unconsciously defined and confined by others. In M77, content that the artist has commented on or liked is displayed prominently, and this content motivates others to create similar posts to increase their chance of being noticed by the artist. To promote certain behaviors, designers should create a context for users rather than just specific modules and elements.

All in all, fan app design should consider three levels of core human need (Murray, 2012). First, the app needs multiple functions: providing relevant information about activities and tours, posting user-generated content in many forms, communicating with other people, interacting with artists, purchasing items and completing certain tasks. These functions set particular goals for users of the product, and for fan apps the interactive function is the most important. Second, designers should investigate the underlying social context, relationships and values the app reflects and incorporate that context into the overall design. Fan culture has its own conventions and behavior standards, which vary subtly between fan communities. The context of this kind of app also differs from that of other social media platforms, where conflicts are more likely to appear and public topics are more welcome. Communication in a fan app is generally more personal and younger in tone, with its own discourse system. The app design should align with the particular system of symbols and communication of the specific fan culture and online community. For example, automatically playing the artist’s voice immediately brings users into a certain context when they open the app. Inserting songs and special effects can create an immersive atmosphere. And the overall design needs to be centered on an underlying concept that fits the context, such as a planet, a village or a school (a metaphor the whole community agrees on). Third, a fan app is part of general and enduring human activities and values. Humans have tended to live in communities for thousands of years, and the same holds in cyberspace, but on the open web it is hard to find a group you can commit to because users are anonymous.
Nowadays we are no longer connected only by geographical proximity; we are grouped by common interests and values. Fan culture is a powerful glue that binds people tightly together. Fans need positive feedback when they continuously support an artist, but in China offline fan activities do not have as mature an operating pattern as in Japan. What fans actually need is not face-to-face interaction but a sense of constant interaction and fulfillment that gives them a reason to keep going. Fan apps therefore meet fans’ demand to belong to a community in cyberspace, stabilizing a usage habit and giving them positive feedback from their favorite artist for their continued online behavior.


Fan apps are digital artifacts embedded in China’s particular socio-technical environment, and they have developed from simple versions into more interactive iterations. A fan app uses fan culture as a core value to bring a diverse range of technologies together, and applies modularity to organize its functions in a hierarchical system and achieve information richness. For a good user experience, design details should adhere to the consistency principle and match the app’s multiple affordances. Different display styles should be chosen to make important content conspicuous. Multiple design strategies should be applied to motivate user engagement, such as identifying different users with unique marks and providing constant rewarding feedback. Interaction design should fit users’ mental models and give confirmative feedback. Designers should also investigate the surrounding context of user behaviors and the three levels of core human need to work out the underlying design logic. Today, interactivity matters much more for fan apps than for other social media. How to design a better interactive interface and how to use design cues to encourage deeper interaction remain the enduring questions of fan app design.


Galitz, W. O. (2007). The essential guide to user interface design: An introduction to GUI design principles and techniques (3rd ed). Indianapolis, IN: Wiley Pub.

Latour, B. (1999). Pandora’s hope: Essays on the reality of science studies. Cambridge, Mass: Harvard University Press.

Murray, J. H. (2012). Inventing the medium: Principles of interaction design as a cultural practice. Cambridge, Mass: MIT Press.

Lo, S.-K., & Lie, T. (2008). Selection of communication technologies—A perspective based on information richness theory and trust. Technovation, 28(3), 146–153. 

Suchman, L. (2011). Anthropological relocations and the limits of design. Annual Review of Anthropology, 40(1), 1–18.

Complex engineered systems: Science meets technology. (2006). Berlin ; New York: Springer.

Norman, D. A. (2002). The design of everyday things (1st Basic paperback). New York: Basic Books.

Kornberger, M., & Clegg, S. (2003). The architecture of complexity. Culture and Organization, 9(2), 75–91.

Feldman, S. (2004). A conversation with Alan Kay. Queue, 2(9), 20.

Couldry, N. (2012). Media, society, world: Social theory and digital media practice. Cambridge ; Malden, MA: Polity.

Loginov, E. L., Shkuta, T. A., & Loginova, V. L. (2018). The supersystem of the digital economy: Functioning and development based on the principle of self-organizing integration. Market Economy Problems, (4), 48–53.

Baldwin, C. Y., & Clark, K. B. (2002). The option value of modularity in design: An example from design rules, volume 1: the power of modularity. SSRN Electronic Journal.

Norman, D. A. (2011). Living with complexity. Cambridge, Mass: MIT Press.

Stone, D. L., & Open University. (2005). User interface design and evaluation. Amsterdam ; Boston, Mass: Elsevier : Morgan Kaufmann.

Arthur, W. B. (2009). The nature of technology: What it is and how it evolves. New York, NY: Free Press.

Anderson, J., & Rainie, L. (2012). The future of apps and Web. Pew Research Center’s Internet & American Life Project.

Nothing New: Deblackbox the New Oriental TOEFL-Preparation Website


Against the background of an economic boom, studying in the USA has become more common, so more people need to take the TOEFL test. Many websites designed for TOEFL preparation have appeared and compete in the market, and the New Oriental website is a major one among them. This paper deblackboxes the New Oriental website in three ways: with the ten Usability Heuristics proposed by Jakob Nielsen, to explain why the website’s interface is designed as it is; with its multimedia and HTML code, to demonstrate how it achieves its functions and contents; and with its sociotechnical background, to illustrate how the website’s appearance became possible. The deblackboxing finds that the sociotechnical background is important and necessary for the website’s development, and that there is nothing innovative in its design principles or in its multimedia functions and contents. The only new thing about the New Oriental website is the way it integrates these principles, functions and contents. The pros and cons of this “nothing new” characteristic, in the New Oriental website as well as in other products, are analyzed. Considering both the advantages and disadvantages of “nothing new”, how to integrate design principles and technologies within a given sociotechnical background is the key to a product’s success in the short term.



With the development of technology, test preparation has moved beyond the classroom. Nowadays, students can prepare for tests with online resources, including videos, audio, texts and images. Some collect learning resources themselves from sharing platforms, while others pay to receive them from professional test-preparation institutions.

Many English test-preparation websites have appeared and thrived in China, especially those for Test of English as a Foreign Language (TOEFL) preparation. That is because over the past ten years more and more Chinese students have chosen to study in the United States. In 2018/19, 369,548 Chinese students were studying in the USA (IIE, 2019). Most of these students need to take the TOEFL, but English education in China is not specifically designed to prepare them for it. As a result, many private institutions offering TOEFL preparation services thrive in the market. Because of fierce competition and high profits, these institutions invest heavily in improving their services, including the websites where students can learn English and prepare for the TOEFL. These websites are designed specifically for the TOEFL, with abundant learning resources in the form of text, graphics and audio.

In this paper, one of the major TOEFL-preparation websites in China, New Oriental, will be deblackboxed. The design principles, functions, contents and socio-technical background behind it will be analyzed to answer two questions: why the website is designed in this way, and what made it possible.


2. Deblackbox

2.1 Implementation of Usability Heuristics

Back in 1994, Jakob Nielsen proposed ten general principles for interaction design to enhance a product’s usability, which means that users can use the product with effectiveness, efficiency and satisfaction (ISO, 2018). These ten principles are called “heuristics”. They serve as guidelines for UX designers in designing and testing websites and applications, and the New Oriental website is also designed according to them.

Visibility of system status

Figure 1 A loading icon in the red frame

Every time a user clicks through to a new page, a book with turning pages appears in the area that is being loaded (Figure 1). This book is appropriate feedback that tells users what is going on in the system. In addition, the loading indication lasts only for a reasonable time, since a long wait would itself confuse users. In the New Oriental website, if loading takes more than 10 seconds, the page turns into a “No Internet” page (Figure 2) to tell users that there is a problem with their internet connection. Other causes of long loading times have their own corresponding notification pages.

Figure 2 No Internet page
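The status logic described above can be summarized in a short sketch. This is not the website's actual code: the function name page_to_show and the returned labels are hypothetical, and only the 10-second threshold comes from the observed behavior.

```python
def page_to_show(elapsed_seconds: float, loaded: bool,
                 timeout_seconds: float = 10.0) -> str:
    """Decide what the user should see while a page is being fetched."""
    if loaded:
        return "content"        # the requested page has arrived
    if elapsed_seconds > timeout_seconds:
        return "no_internet"    # waiting any longer would confuse the user
    return "loading_icon"       # the book with turning pages


# Hypothetical usage: the same request observed at three moments.
statuses = [
    page_to_show(2.0, loaded=False),
    page_to_show(12.0, loaded=False),
    page_to_show(3.5, loaded=True),
]
```

At every moment the user sees exactly one of the three states, which is what keeps the system status visible.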

Match between system and the real world

The New Oriental website “speaks” the users’ language. The words, phrases and concepts it uses are familiar to its users, people who are preparing for the TOEFL. No system-oriented terms that could cause confusion or misunderstanding are used in this website.

Figure 3 Terms in the navigation bar 

User control and freedom

When users click on the wrong term, they land on a page they did not intend to read. To return to the previous page, they can simply click the back button in their browser, or they can click the “新东方在线TOEFL” icon (“New Oriental TOEFL” in English, indicated by the red frame in Figure 4) at the top left to return to the home page. This icon is always at the top left of every page of the website, so whichever page users are on, they can conveniently return to the home page by clicking it.

Figure 4 The “New Oriental TOEFL” icon in the red frame

Consistency and standards

This website is consistent both internally and externally. Internally, every function and every piece of content has a single, consistent title. The website also uses the same typography for contents and titles, and the same color to highlight the page the user is currently on. Externally, the website uses the same or similar icons as other websites for the same functions. For example, there is a shopping cart icon at the top right of the page (Figure 4), which is the same as or similar to the shopping cart icons on other websites (Figure 5).

Figure 5 Different kinds of shopping cart icon

Error prevention

The concept of error prevention is implemented in many parts of the New Oriental website’s design. For instance, the gap between different terms is large enough to keep users from clicking the wrong one by mistake. Different functions are arranged in different blocks (Figure 6), which is logically clear and lowers the chance of entering the wrong page or getting lost in a mess of functions.

Figure 6 Different blocks of functions

Recognition rather than recall

This website lays out its functions and contents so that users can recognize the ones they need, rather than giving them only an input box to type into. An input box increases users’ memory load because it requires them to recall what they are looking for. Here, users can find the functions and resources they need simply by browsing.

Flexibility and efficiency of use

Designers increase the flexibility and efficiency of websites and applications by adding accelerators, which can be hot keys or options in the interface. In the New Oriental website, the hot keys defined by the user’s computer system can also be used. The usage history recorded in a cookie increases efficiency as well: for example, users can be logged in automatically on their next visit using the information in the cookie.

The other Usability Heuristics, including aesthetic and minimalist design, helping users recognize, diagnose and recover from errors, and help and documentation, are also implemented in the New Oriental website. Together, the ten heuristics ensure the website’s usability.


2.2 Multimedia

Deblackboxing the implementation of the ten Usability Heuristics in the New Oriental website explains why its interface is designed the way it is. However, the website has other characteristics besides the usability of its interface: it also affords different components to enable “semiotic interactions with software and transformable representations for anyone taking up the role of the cognitive interpreting agent” (Irvine, 2018, p. 5). Therefore, a deeper deblackboxing is needed to give insight into the website’s contents and functions. The New Oriental website not only has exercise sections where users can take tests, but also has sections that serve other goals, such as an advertising board to sell test-preparation courses, a message board for online communication between users, and a personal information section where users register their information or track their study outcomes. Multimedia plays an important role in the semiotic interactions between these sections and users. Each medium used in this website is analyzed in the following paragraphs.

2.2.1 Text

As pointed out by Ron White, “the Web page itself consists of an HTML text file. HTML is a collection of codes enclosed in angle brackets- < >- that control the formatting of text in the file” (2008, p. 370). Like other websites, the New Oriental website is built with HTML code. This section illustrates how HTML code presents text in the New Oriental website.

Figure 7 A sentence in the web page

Figure 8 The codes of the sentence module

Figure 7 shows a sentence in the text of the page. The blue area in Figure 8 indicates its corresponding HTML code. As Figure 8 shows, 11 lines of HTML code, from <p style> to </p>, are needed to present the one sentence in Figure 7.

Figure 9 The area of the sentence

The first 5 lines define the area in which the sentence is located (the blue area in Figure 9) and the characteristics of this area. The next two lines define the typography, font size and color of the sentence. The string of the sentence itself is in the 8th and 9th lines, between <strong style= “max-width: 100%;”> and </strong>.
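The relationship between the formatting tags and the sentence they wrap can be sketched in a few lines of Python. This is only an illustration: the markup in `SAMPLE_HTML` is a hypothetical stand-in modeled on the description above (only the `<p style>` and `<strong style="max-width: 100%;">` tags come from the figures), and the names `StrongTextExtractor` and `extract_strong_text` are made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real page's code is not reproduced here.
SAMPLE_HTML = (
    '<p style="font-size: 16px; color: #333333;">'
    '<strong style="max-width: 100%;">A sample sentence.</strong>'
    '</p>'
)


class StrongTextExtractor(HTMLParser):
    """Collect the text that appears inside <strong> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <strong> tags currently open
        self.pieces = []  # text fragments found inside <strong>

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text while at least one <strong> is open.
        if self.depth > 0:
            self.pieces.append(data)


def extract_strong_text(html):
    parser = StrongTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.pieces)


print(extract_strong_text(SAMPLE_HTML))
```

The style attributes never reach the printed result, which mirrors the point made above: most of the lines describe how the sentence looks, and only the text between the tags is the sentence itself.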

2.2.2 Graphic

Figure 10 A picture in the web page

Figure 11 The codes of the picture module

Figure 10 is a picture in the website page, and Figure 11 indicates its corresponding HTML code. As Figure 11 shows, 6 lines of HTML code, from <p style> to </p>, present the one picture in Figure 10. Compared with the code for text, the code for a picture looks shorter and simpler. In fact, it is more complex, because a picture cannot be written in the code directly. If a designer wants to put a picture in a website, the picture must be stored on a server and referred to by its uniform resource locator (URL), which can then be written in the code as text (White, 2008, p. 370). Graphics do not exist in a website’s HTML code but on the website’s graphics server. For example, the Figure 10’s URL is The title of the picture is typed before the picture’s URL, although the title does not appear in the website interface unless the user moves the cursor over the picture. When a user opens this page, the browser sends a request to the New Oriental website’s server asking for the page’s document. The server then sends the document back to the Internet Protocol (IP) address of the user’s browser via the internet. The browser reads the HTML code, finds the picture’s URL, and sends a further request to the server that holds the picture, which sends the picture back to the user’s browser.

2.2.3 Audio


Figure 12 The audio player in the New Oriental website

Figure 13 The codes of the audio module

The code of the audio player module looks quite complicated, but most of the code in Figure 13 defines the user interface of this module, including the button, the progress bar, and the color. Only three lines of code (indicated by the blue area in Figure 13) hold the audio’s URL. The HTML5 <audio> element specifies a standard way to embed audio in a web page, so that audio files no longer have to be played in a browser with a plug-in (such as Flash). As with graphics, when a user opens this page, the browser sends a request to the New Oriental website’s server asking for the page’s document, and the server sends the HTML document back to the Internet Protocol (IP) address of the user’s browser. The browser then finds the audio’s URL in the HTML code and requests the audio file from the server that holds it, which sends the audio to the user’s browser.
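Because pictures and audio live in separate files that the HTML only points to, the media belonging to a page can be listed by scanning its markup for URLs. The Python sketch below shows one way to do that under stated assumptions: `PAGE_URL`, `PAGE_HTML`, the file names, and the class `MediaURLCollector` are all hypothetical and do not come from the New Oriental website.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page address and markup, used only for illustration.
PAGE_URL = "http://www.example.com/lessons/page.html"
PAGE_HTML = """
<p><img src="/images/photo.jpg" title="A sample picture"></p>
<audio controls><source src="audio/listening.mp3" type="audio/mpeg"></audio>
"""


class MediaURLCollector(HTMLParser):
    """Record the absolute URL of every picture or audio file a page refers to."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.media = []  # list of (tag, absolute URL) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "audio", "source"):
            src = dict(attrs).get("src")
            if src:
                # Relative addresses are resolved against the page's own URL.
                self.media.append((tag, urljoin(self.page_url, src)))


collector = MediaURLCollector(PAGE_URL)
collector.feed(PAGE_HTML)
for tag, url in collector.media:
    print(tag, url)
```

Each URL in the list stands for one more file that has to travel to the browser in addition to the HTML document itself.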

2.3 Sociotechnical Background

After deblackboxing the New Oriental website’s GUI and its contents and functions, the website’s interface design and technical foundation become clearer. However, as a public website built on the internet, its sociotechnical background should also be analyzed. The third step of the deblackboxing is to analyze the technological and social background that makes this website’s development possible.

Figure 14 Sociotechnical background of the New Oriental’s website

2.3.1 Social Background

China’s economic boom has increased the number of middle-class and upper-class families that have the money to support their children’s education outside the public school system. With the internationalization of higher education, the number of Chinese students studying in the United States has been increasing rapidly. According to statistics from the Menhukaifang Report, from 2006 to 2016, the total number of overseas students sent by Chinese universities rose from 67,723 to 328,547 (Lu, 2018). Most students who want to study in the United States have to take the TOEFL test, so this surge increases the demand for TOEFL preparation institutions that can offer professional test-preparation services. Also, the number of Internet users in China reached 649 million in December of 2014, and the proportions of Internet access through desktop computers and laptop computers were 70.8% and 43.2% respectively (CNNIC, 2015). In sum, the demand for professional TOEFL preparation services and the popularization of computers laid the foundation for the New Oriental website’s development. Moreover, the New Oriental website cooperates with Educational Testing Service (ETS) to offer official exercises to its users. This international cooperation could improve the quality and effectiveness of its test preparation.

2.3.2 Technical Background

The use of the New Oriental website is inseparable from many internet technologies. Its URL directs a user’s browser to its site. A user can type “” into the address bar of the browser, or search for “New Oriental” in any search engine, to enter this website. Each part of the URL has its meaning (White, 2008). For example, “http” means that the browser uses the Hypertext Transfer Protocol (HTTP) to request the page; “liuxue.koolearn” is the domain name, and this name is unique.
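The parts of a URL can be separated with Python’s standard library, as in the brief sketch below. The address in `EXAMPLE_URL` is a made-up placeholder and not the New Oriental website’s real address.

```python
from urllib.parse import urlsplit

# Placeholder address for illustration only.
EXAMPLE_URL = "http://www.example.com/toefl/index.html"

parts = urlsplit(EXAMPLE_URL)

print("scheme (protocol):", parts.scheme)   # how the browser talks to the server
print("host (domain name):", parts.netloc)  # which server to contact
print("path:", parts.path)                  # which document on that server
```

The scheme names the protocol, the host names the server, and the path names the document on that server.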

The data (text, graphics and audio) is transmitted in the form of packets over the internet between different servers and browsers. Transmission Control Protocol (TCP) and Internet Protocol (IP) govern how the packets are addressed, routed, and reassembled so that the data arrives correctly. Those packets are made up of sequences of binary digits (1s and 0s), which can be sent through electric wires, fiber optic cables, and wireless networks.
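The idea of cutting data into numbered packets and putting it back together can be sketched as follows. This is a deliberately simplified model: real TCP/IP packets also carry addresses, checksums, and acknowledgements, none of which are shown, and the function names `to_packets`, `reassemble`, and `to_bits` are invented for this example.

```python
def to_packets(data, size):
    """Cut the data into numbered pieces of at most `size` bytes."""
    return [
        (sequence, data[start:start + size])
        for sequence, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets):
    """Put the pieces back in order, whatever order they arrived in."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


def to_bits(payload):
    """Show the payload as the 1s and 0s that travel over the wire."""
    return "".join(f"{byte:08b}" for byte in payload)


message = b"New Oriental"
packets = to_packets(message, 4)

# Simulate packets arriving out of order, then rebuild the message.
arrived = list(reversed(packets))
print(reassemble(arrived))
print(to_bits(packets[0][1]))
```

The sequence number is what lets the receiver rebuild the original data even when the pieces arrive in a different order.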

All of these technologies together serve not only the New Oriental website, but also other websites.


3. Conclusion

With this deblackboxing, we know that the New Oriental website is not simply a website built with HTML, but rather an interface that connects users to a complex, dynamic internet world, where servers, browsers, packets, and code work continuously and cooperatively to present the interface. In the New Oriental website, the ten Usability Heuristics are used in the interface design to improve its usability; multimedia, such as text, graphics and audio, enriches the website’s contents and functions; and the sociotechnical background makes the website’s emergence and development possible.

Although the New Oriental website offers a unique experience for users, there is nothing innovative in its design and technology. The same design principles can be found in other well-designed websites; HTML is a common coding language among most websites on the internet; and multimedia has been used in online education for a long time. These existing building blocks reduce the time and funds spent on the research and development of new products. However, they also make products easily replaceable. There are many websites with functions similar to the New Oriental’s, such as KMF, Xiaozhan TOEFL and New Channel. The lack of innovation in the technology and design of these websites intensifies their competition.

Innovating is not easy. Designing a new format for a website that differs from the one suggested by design principles is risky to some extent. If the new design is not more understandable and learnable than one built on the principles, users will be confused and may even stop using it, which is quite likely given how fierce the competition for users’ time between applications and websites is nowadays. Inventing a new function or even a new technology is also risky and difficult. Billions of dollars are invested in research and development (R&D) by tech companies around the world, especially giant tech companies like Google, Facebook and Amazon. For example, in 2018, Google’s parent company spent 21.4 billion U.S. dollars on R&D (Statista, 2018). However, R&D is a long-term strategy that may not bring an instant lucrative breakthrough, and whether a new technology will be invented successfully and bring benefits to its investors remains unknown (HowDo, n.d.).

Therefore, in the short term, the creative integration of existing design principles and technologies within a specific social and technological background becomes the key to a product’s success. This is the most important suggestion provided by the deblackboxing of the New Oriental website.



Blake, R. J., & Guillén Gabriel A. (2008). Brave new digital classroom: technology and foreign language learning. Washington, DC: Georgetown University Press.

China Internet Network Information Center (CNNIC). (2015). “35th Statistical Report on Internet Development in China”. Retrieved from

(2015). The Internet: HTTP & HTML. Retrieved from

Hang Lu. (2018). Chinese universities in the last ten years (2006-2016): Development Status, Characteristics and Prospects of Students Studying in the United States —— the Analysis based on the “Open Report”. Modern Education Management. 2018(2).

HowDo. (n.d.). Learning from R&D leaders. Retrieved from

IIE. (2019). Number of International Students in the United States Hits All-Time High. Retrieved from

ISO 9241-11:2018. (2018). Retrieved from

Martin Irvine. (2018). Computing with Symbolic-Cognitive Interfaces for All Media Systems: The Design Concepts that Enabled Modern “Interactive” “Metamedia” Computers.

Martin Irvine. (2018). The Internet: Design Principles and Extensible Futures.

Ohanian, T. (2017). 10 Usability Heuristics for User Interfaces in Web Design. Retrieved from

Statista. (2018). Annual research and development expenditure of Alphabet from 2013 to 2018 (in million U.S. dollars). Retrieved from

White, R. (2008). How computers work. Indianapolis, IN: Que Pub.

World Leaders in Research-Based User Experience. (n.d.). 10 Heuristics for User Interface Design: Article by Jakob Nielsen. Retrieved from

W3C. (n.d.). HTML5 Audio. Retrieved from

My Television is a Computer: Design Thinking in Apple TV’s Interface


            For the past decade or so, television has been undergoing a shift from a broadcast medium to a digitally networked one. And while such a shift entails certain technical advances and the designs therein, it also indicates a need for redesigning the user interface of the television. Unsurprisingly, Apple has engaged with the design of digital television interfaces from the very beginning, primarily through their Apple TV product. This paper takes the Apple TV, along with the constituent parts involved in its human-computer interaction, as a case study in the design problems posed by applying computational media to television. First, it considers the Apple Remote and the challenges of using a remote, as opposed to a keyboard and mouse, as an input device for a computational machine. Second, it considers the two graphical user interfaces designed for the Apple TV, the Front Row interface and the tiled grid. This section deals with the design principles common to 10-foot user interfaces, as well as with the ways in which these two interfaces accomplished different purposes within the sociotechnical system built around Apple TV.


In Apple’s never-ending quest to achieve unparalleled dominance in every segment of the media market, their forays into television proved multiple and varied in success. Even laying aside attempts which largely failed, such as Macintosh TV and the Apple Interactive Television Box, the history of Apple TV (or iTV, as it was initially called) is convoluted and filled with eccentricities. The Apple TV as a product, initially introduced in 2007, underwent a number of iterations, to the point that, branding aside, the Apple TV which one might purchase today, in 2019, is designed to accomplish very different tasks than the one released in 2007. These iterations mirror trends in consumer computing and the rise of streaming video services. Even given these technical evolutionary stages, however, one persistent design problem (and its solution) separates out Apple TV as indicative of digital television and distinct from other technologies in the late 2000’s through present day: how to design a computer as a TV. From the perspective of 2019, this question may seem quaint, or even banal. After all, the answer seems straightforward enough: simply connect the TV to a laptop through an HDMI cord, or use a technology that post-dates Apple TV, such as Google Chromecast. But neither of these solutions actually solves the problem that Apple was tackling. Instead, they create systems where the television screen acts as a mirror for the PC or laptop monitor. To manipulate the program, users turn to their laptop. In other words, the television is a glorified monitor, and not a unique computing machine with both input devices and interfaces designed for the medium of the TV. The problem of TV interface design is one which defines the cultural moment of 2007 to 2009, when the role of the television decidedly changed from that of a broadcast medium to a medium which took full advantage of the affordances of networked computing (Braun 2013).
These affordances, which according to Simon and Rose (2010) had been incorporated in limited ways with television technologies since the 1990’s, required distinct sorts of design choices for the expanded application of computational affordances in the last half of the 2000’s: designing the television as a system of information and designing the television as a user interface. The first goal of this paper, then, is to explore design choices which informed technical components of Apple TV employed in several iterations: data synchronization and data streaming. This section will also discuss the consequences of those choices in terms of design and user experience, as well as reasons for the ultimate triumph of streaming over synchronization. Second, this paper will seek to understand the design principles behind Apple TV as a computer designed for the television and according to its affordances and constraints; which is to say, without textual input or an X/Y pointing device (i.e. a mouse). This section will discuss the Apple remote as an input device and the principles of 10-foot user-interfaces at large, as well as the specific design of the Apple TV interface.

Computing TV

            It takes no great technologist to determine that the inputting mechanism for a television significantly differs from the inputting mechanism for a computer. To understand the exact interface problems which the Apple TV ultimately addressed, however, one must first understand the gap in the market Apple was attempting to fill through this product: on-demand television. To be fair, pay-per-view television was developed in the United States as early as the 1950’s (Smith 2001). Pre-digital on-demand television involved calling in to the pay-per-view provider by telephone in order to request the desired program. With the advent of digitally transmitted television in the 1990’s, as well as the boom in consumer computing in the late 20th century, the constraints of broadcast and cable television for providing on-demand video wavered in the face of new digital affordances. For Apple in 2007, during the incunabula stage of the Apple TV, great strides were being made toward designing an interface wherein users could take advantage of iTunes’ ability to offer video on demand not on their laptop, but on their television (Chamberlain 2010). In a certain sense, with digital television technologies taking the position of market dominance previously held by cable, Apple needed a way to compete with the affordance of digital TV to provide on-demand video within its own technical system. Apple had been providing on demand video rentals and purchases themselves for about 4 years, at the time, through their innovative iTunes platform, but the platform suffered severe limitations due to some of the design choices of other Apple products at the time. In other words, before the release of the iPhone in 2007, Apple’s only networked devices – that is, the only devices with the capability to access the Internet – were its laptop and desktop products. 
In those early days, the constraints of the iPod required a level of dedication from the user in order to purchase or consume on-demand music or video. Anyone who owned an iPod in its earliest iterations surely remembers having to buy a song or movie on the desktop or laptop iTunes interface, next plugging the iPod into the computer’s USB port, waiting for the iPod to sync, and finally dragging the music file from the iTunes library to the iPod device so as to signal the initiation of copying the song or movie file to the iPod from the personal computer. Only once this process was completed could the user consume their “on-demand” product away from their personal computer.

Now, this account of the iTunes store only has to do with the design problems of Apple TV insofar as it sets the scene by describing the state of Apple’s dealings with on-demand video at the time of the Apple TV’s release. Particularly, this early approach to using Internet-based marketplaces, accessed on personal computers and not media players themselves, explains Apple’s decision to design the first generation of Apple TV effectively as a storage and playback unit for media files bought on the iTunes store. Just as iPod users bought content from their personal computer, and then connected their iPod to the computer in order to copy files to the media player, first generation Apple TV users originally had no other option than to purchase media on a personal computer (Cheng 2007). Oddly, this unfortunate legacy from iPods and other Apple media players was not a technical constraint – as even the first Apple TV’s were networked computers with Internet access, which Apple capitalized on in promoting access to YouTube from the Apple TV – but instead a simple instance of technological “lock-in,” where a technological practice persists, not because it offers any advantages, but because it is an established method for doing something (David 1985). Fortunately for the user, this inconvenience lasted less than a year, as a software update in January 2008 provided users with the opportunity to access the iTunes store directly from the Apple TV interface, removing the inconvenient need for a personal computer to mediate this access.

Of course, this Apple TV differed significantly from the ones on the sterile white shelves of Apple stores today, if not in its technical capacity, then in its technical function. In other words, even though Apple TV employed data streaming in order to play YouTube videos, it primarily relied on the technique of data synchronization for playing videos purchased from its own store (Pegoraro 2007). In streaming, media files are disassembled into Internet packets upon sending, passed from router to router, and reassembled upon receiving; data synchronization, by contrast, involves the data set on one device being exactly copied, or mirrored, on another. And while data synchronization offered some advantages, such as ensuring that a user’s iTunes library on a personal computer matched their library on their Apple TV, its major drawback, from a design perspective, was that it required the files to be downloaded to the Apple TV itself. In some respects, this constraint only becomes evident looking through the rearview mirror: the design of the fifth generation Apple TV, with only 32 or 64 GB of storage, relies on its capability to stream music and video, and therefore does not require much internal storage. By comparison, the first generation Apple TV shipped with a 40 GB hard drive and was later offered with 160 GB. To be fair, the first generation Apple TV offered capabilities for both streaming and synchronization; however, synchronization was the default mode, as streaming was foreign to many users (Pegoraro 2007). Regardless, only after streaming made data synchronization largely obsolete would models with less storage space appear more tenable. The second generation of Apple TV, however, left behind the relic of data synchronization in favor of streaming for all time-based media.
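The storage consequences of the two techniques can be made concrete with a toy comparison in Python. This is a sketch under loud assumptions: the library contents, the byte sizes, and the functions `synchronize` and `stream` are all hypothetical, and neither function reflects how Apple actually implemented either technique.

```python
def synchronize(library, device_storage):
    """Mirror every file in the library onto the device's own storage."""
    device_storage.clear()
    device_storage.update(library)
    # The device must hold as many bytes as the whole library contains.
    return sum(len(data) for data in device_storage.values())


def stream(media, chunk_size):
    """Yield the media one chunk at a time, as it would arrive over a network."""
    for start in range(0, len(media), chunk_size):
        yield media[start:start + chunk_size]


# Hypothetical library: the titles and sizes are made up for illustration.
library = {"movie_a": bytes(5000), "movie_b": bytes(3000)}

stored = synchronize(library, {})
largest_chunk = max(len(chunk) for chunk in stream(library["movie_a"], 512))

print("bytes held on the device after synchronization:", stored)
print("largest number of bytes held at once while streaming:", largest_chunk)
```

In this toy model the synchronized device needs room for the entire library, while the streaming device only ever holds one chunk at a time, which is the intuition behind the smaller storage of later models.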

Input and Interface Design for Apple TV

            Even with the technical capacity to use personal and consumer computing products as a gateway to content for the Apple TV, which a traditional television could then display on its screen, Apple still needed to overcome significant design challenges in terms of the input and interface of this media player. Particularly considering the emergence of on-demand movies and media on digital cable in the 1990’s and 2000’s, Apple needed a human-computer interface which appealed to the brand’s design sensibilities and also provided users with straightforward navigation techniques with which they could browse, locate, and access media easily and without frustration. In a word, Apple set out to design an interface and user input system which allowed the user to take full advantage of the computational power of the Apple TV, while still perpetuating the user experience of using a television, and not a computer; which is to say, the primary input mechanism was a remote and not a mouse and keyboard. Because interfaces facilitate human-computer interaction, the following discussion of user-facing software and input devices often moves back and forth between the two; the software must accommodate the constraints of the input device, and the input device should maximize the affordances of the software. At the same time, they can be more or less taken as individual modules in the combinatorial design of the Apple TV, and therefore I will address them as such. To be clear, by discussing the design of Apple TV’s interface, I am not only referring to the design of the Graphical User Interface (GUI), but to the deep cultural history of the word as the meeting and joining point for two disparate artefacts (Irvine n.d.). Specifically, Daniel Chamberlain’s definition of interface helpfully charts out the territory to be covered throughout this paper.
He writes, “In a material sense we can think of those interfaces as consisting of three parts—a physical means of interacting with a screen-based display driven by dedicated software” (Chamberlain, p. 85, 2010, original emphases). Particularly, I am concerned with understanding the design of the GUI as interfacing the physical means of interacting (the Apple Remote) to the networked content (movies, music, etc.) on their screen-based display. The previous discussion took the dedicated software to task, albeit in limited scope, because the software, in the case of television’s adaptation of digital affordances, is not necessarily unique to the television medium, and therefore of secondary concern for this paper. In the following paragraphs, then, I consider the design problems and solutions of input and interface for Apple TV from both the design of the Apple Remote, as an input technology, as well as the Front Row interface and the subsequent tiled app interface of tvOS.

Apple Remote

Figure 1: First Generation Apple Remote

What’s a television without a remote? Ever since the television remote was invented in the latter half of the 20th century, its status as the ubiquitous mode for interacting with the television has remained unchallenged. For Apple to carry the experience of television watching through to Apple TV, they were wise to adopt this piece of hardware in their sociotechnical system, even if from the second generation onward they enabled Apple TV with Bluetooth capabilities to connect with Bluetooth QWERTY keyboards for the convenience of those particularly fed up with the obstinance of “typing” with the Apple Remote. The Apple Remote was largely designed so as to allow a user to manipulate the Front Row interface, through which the user interacted with the Apple TV. Until a major design change in the fourth generation of the Apple Remote, its design visually borrowed from the iPod family, boasting a minimalistic six buttons and the iconic “wheel” of the iPod (although this wheel did not function as a wheel and was simply four buttons positioned along the circumference of a circle). Because of their limited number, many of the buttons performed more than one function, depending on whether the user was playing media or navigating the menu.

Figure 2: Second Generation Apple Remote

The fast-forward and rewind buttons, for example, doubled as left and right navigators, as the increase and decrease volume buttons doubled as up and down navigators. Through this technique, the design choice of positioning these buttons along the wheel proves more helpful than a simple perpetuation of Apple’s iconic brand and design philosophy. In other words, while the wheel did in fact preserve Apple’s visual aesthetic in the Apple Remote, it also allowed these buttons to perform semiotic double duty. If rewind/fast-forward and increase/decrease volume had been positioned in rows or columns, the implicit directionality of left/right and up/down would have been lost. For the buttons to perform double duty in the hypothetical column and row setup, they would need to be labeled as such so that the user might properly interpret their function. By positioning them along the wheel, the designers at Apple assume that their users have already been conditioned to understand the significance of directionality, and they design their hardware accordingly. Bruno Latour (under the pseudonym Jim Johnson) calls this phenomenon pre-inscription: the information or learning which the user is assumed to have before interacting with a technology (Johnson 1988).
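The double duty of the wheel buttons amounts to a lookup that depends on what the user is currently doing. A minimal Python sketch of that idea follows; the mode names, the button names, and the function `interpret` are hypothetical labels chosen for this illustration, not Apple’s own terms or firmware logic.

```python
# Each wheel position means something different depending on the current mode.
# All names here are illustrative assumptions.
BUTTON_ACTIONS = {
    "menu": {
        "up": "move up",
        "down": "move down",
        "left": "move left",
        "right": "move right",
    },
    "playback": {
        "up": "volume up",
        "down": "volume down",
        "left": "rewind",
        "right": "fast-forward",
    },
}


def interpret(button, mode):
    """Return what a press of `button` means in the given mode."""
    return BUTTON_ACTIONS[mode][button]


print(interpret("right", "menu"))
print(interpret("right", "playback"))
```

The same physical position keeps its direction in both modes, which is why no extra labels are needed for the second meaning.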

The obvious limitation of such a remote, however, was its limited affordances for inputting textual data, which inevitably created problems for any sort of search function. For a technology which organized entire catalogs of films and music, this constraint was fairly significant. Even with the addition of connectivity to Bluetooth keyboards in the second generation, Apple had designed the product on the assumption that the Apple Remote would function as the primary input device; any choice to allow the user to use a keyboard would be an extra flourish and affordance. For this reason, Apple had to ensure that whatever graphical user interface it designed or used would be entirely navigable through the affordances of the Apple Remote. This meant not only foregoing the QWERTY keyboard, but also abandoning any sort of mouse or cursor. Fortunately for Apple, interfaces which solved this problem proved not to be design impossibilities. And while, as we will see, the first interface did not endure until the present day, it characterized much of the early user experience of Apple TV. Furthermore, many of the principles of which it consisted can still be identified in the design of Apple TV’s current interface.

Front Row Interface

Figure 3: Front Row Interface

While the Front Row interface was always conceived of by Apple as a multimedia player, it pre-dated the Apple TV by around two years. And even while the user might have originally interacted with the Front Row interface through the medium of a Macintosh computer, Front Row’s release coincided with the release of the first generation Apple Remote, as the two were designed for one another. The Front Row interface was Apple’s first attempt at developing a “10-foot user interface” (10-foot UI). 10-foot UI’s emerged with the rise of smart TV’s and the need to account not only for the manipulation of icons and symbols with a remote device rather than a mouse and keyboard, but also for the increased distance of the user from the screen when watching television as opposed to working on a computer (the 10 feet in 10-foot UI refers to this distance) (Lal 2013). This distance generated several constraints on transitioning computing technology to the medium of the television (the first being the use of the remote as an input device): screens needed to be treated as single entities, and not windows; icons needed to be larger; and the interface needed to make clear to users which icon or symbol they were interacting with by highlighting or otherwise indicating the icon at hand (Lal 2013).

According to Michael Moyer (2009), through the course of their development, 10-foot UI’s solved the problem by employing one of two major methods, the first of which quickly proved inferior to the second. The more obvious, but less successful design of 10-foot UI’s involved developing browsers for the television screen. As alluded to above, this involved enlarging the search bar and other icons, so as to ensure its visibility from the couch. Ultimately, however, it failed to provide a way to input text without requiring either the extreme patience of selecting every letter individually out of the entire alphabet for textual input or requiring the connection of a Bluetooth keyboard. For this reason, the second common design for 10-foot UI’s – a widget-based design – remains the industry standard. The widget system involves sorting out each program or feature into separate icons, not dissimilar to the manner in which apps are presented on smartphones, which the user can then sort through in order to select their desired function. Particularly in a digital economy where many Internet-based services are not housed by companies which manufacture communication technologies themselves, the widget design seems to offer many advantages. For example, instead of relying on a browser to mediate access to Netflix, Hulu, YouTube, Spotify, or any other number of media companies, the widget offers the user direct access to the content therein. Perhaps the greatest testimonial to these advantages, however, can be found in the fact that many of our touchscreen technologies, such as smartphones and tablets, employ the design of widget-based interfaces, at least in part, even when interfaces which rely on searching or textual input more heavily are not constrained by the physical and technical limits of their input systems.

Apple’s Front Row interface, however, needed to solve the problem of accessing the growing number of media types available for purchase or general consumption on Apple’s platforms. This included music, movies, TV shows, podcasts, and pictures. For all intents and purposes, the “widgets” which a user could choose between acted as access points, not for different media services or companies, but to discrete libraries for different types of media housed within Apple’s platforms. In other words, a user could toggle between text reading “Music,” “Movies,” “TV Shows,” and so on, while the correlating icon cycles along with the highlighted text to the left, in order to access the respective libraries. Upon accessing said libraries, the user would go through a similar process to select the actual file that they intended to stream or to play.

Leaving the Front Row and the Constraints of Widgets

Figure 4: tvOS Interface with “tiled widgets”

When Apple retired the Front Row interface in 2011 in favor of its OS X Lion for the Macintosh and tvOS for Apple TV (released the following year), it also abandoned the text-based widgets which defined the previous system. Instead, Apple opted to display software options through tiles of widget icons, with which the user interacted in essentially the same manner as he or she had grown accustomed to while using Front Row (i.e. using the remote to move vertically or horizontally, highlighting an icon to select along the way).

While this interface design afforded no new technical capacities for the Apple TV, it signaled an important shift in the economics of the product. As mentioned above, the Front Row interface was basically designed so that Apple users could organize and access a diverse number of media types, all of which (or at least most of which) were under Apple’s umbrella. Of course, Apple could have simply kept the Front Row interface and added the products from new developers to the list of text which users could scroll through (i.e. Netflix, Spotify, and so on), but by choosing to design the interface with tiled widgets, which looked so similar to app icons on the ubiquitous iPhone, Apple subtly indicated a shift in its thinking about the Apple TV as a product. In other words, Apple TV was no longer a product which enabled consumption of media purchased through Apple’s platforms in the living room but was now a platform which facilitated the consumption of all digital and streaming content. From a developer’s point of view, this opened up the Apple TV from being an in-house Apple device to being one through which a diverse number of developers could promote and distribute their products. However, this new focus of the Apple TV as a product did nothing to de-black box it from a consumer perspective: even as Apple TV expanded the accessibility of its interface to non-Apple products which used Internet protocols to stream media, it still severely limited any access to the Internet writ large, particularly as opposed to browser designs for 10-foot UI’s. The interface constrains the user from accessing any aspects of the web other than those expressly designed to be accessed by the apps represented by the widget icons.


Apple TV proves an interesting case study as a smart TV technology not only because it serves as an archetypical example of the design problems all smart TV’s encounter – namely, the problem of distance and the problem of input – but also because of the interesting transition in this product’s specific history wherein it shifts from being designed as a product to mediate access to in-house Apple media, to being a meta-medium which facilitates access to non-Apple software and products. By considering the design problems encountered and overcome by the Apple TV, one discovers new answers to the old question, “why is this technology designed this way and not another way?” Particularly, Apple TV makes evident a unique symbiosis of the input technology and the interface design, where the interface accounts for the constraints of the input device.

Perhaps it is worth noting that the Apple TV product design represents only a single approach to reconciling television and personal (entertainment) computing, of which several product designs remain popular. Of course, there are other products on the market which solve these design problems in essentially the same way as Apple TV, such as Roku set-top boxes and the Amazon Fire TV Cube. Other products, however, like Google Chromecast, entirely sidestep the design problems introduced by 10-foot UI and remote control input and simply connect a personal computer to a TV, whereby the computer functionally acts as an input device and no new interfaces need to be designed. Comparatively, many companies now manufacture TVs which themselves possess computational power and can connect to the Internet. For smart TV products such as these, the design not only needs to accommodate the limited user input of a remote and the 10-foot UI design, but also must consider how to seamlessly integrate the computational affordances of the smart TV with those of the cable TV system.

Considering the design of the Apple TV as primarily defined by input and interface problems, however, ultimately addresses only a limited (albeit unique and important) set of principles guiding the design of this product and others like it. For example, it deals with the combinatorial and modular design of Apple TV only in an indirect and incomplete sense. And while an analysis which takes combinatorial and modular design into greater account might contribute a study which more fully de-black boxes this product, it would not necessarily address the unique design problems which we have explored above. When the input/interface system is better understood, the unique qualities of computers designed for video consumption on television monitors reveal an interesting moment of designing computational media not only according to its own affordances and constraints, but also according to the affordances and constraints of input devices.


Braun, J. (2013). Going over the Top: Online Television Distribution as Sociotechnical System. Communication, Culture and Critique, 6(3), 432–458.

Chamberlain, D. (2010). Television Interfaces. Journal of Popular Film and Television, 38(2), 84–88.

Cheng, J. (2007, March 27). Apple TV: An in-depth review. Retrieved December 14, 2019, from Ars Technica website:

David, P. A. (1985). Clio and the Economics of QWERTY. The American Economic Review, 75(2), 332–337.

Irvine, M. (n.d.). Introduction to Affordances, Constraints, and Interfaces.

Johnson, J. (1988). Mixing Humans and Non-Humans Together. 35, 298–310.

Lal, R. (2013). Digital Design Essentials: 100 Ways to Design Better Desktop, Web, and Mobile Interfaces. Rockport Publishers.

Moyer, M. (2009). The Everything TV. Scientific American, 301(5), 74–79. Retrieved from JSTOR.

Pegoraro, R. (2007). Apple Tries to Bridge Computer Desk, Living Room. Retrieved from

Simon, R., & Rose, B. (2010). Mixed-Up Confusion: Coming to Terms with the Television Experience in the Twenty-First Century. Journal of Popular Film and Television, 38(2), 52–53.

Smith, R. A. (2001). Play-by-Play: Radio, Television, and Big-Time College Sport. Baltimore: Johns Hopkins University Press.

Universal design principles in game design: take Candy Crush as an example


In July 2018, research on Candy Crush, a typical “match 3” game, estimated its revenue over the previous 12 months at about $930 million, from both purchases by its users and payments from its advertisers. According to the game’s developer, a whopping 9.2 million users also spend more than 3 hours a day on the game (Cheema, 2019). The growth of internet technology gave rise to both stand-alone games and online games. Tracing the possibility and playability of games, I can see how the game market became prevalent and mature, as well as the design logic behind it. Curious about mobile games and their development, I hope to illustrate in this paper how universal design principles and the design of PCs and smartphones are applied in game design.

In the following part of this paper, I’ll first explain the current market situation and consumer habit to summarize the premise and growing environment of the game market. Then, I’ll focus on the de-blackboxing of video games developed in the past 10 years, using Candy Crush as the main example and conducting comparison and contrast between different devices and some contemporary games. Eventually, I hope to illustrate the specific adoption of universal design principles.


Game market analysis

The United States was the top country for mobile game development, with genres ranging from browser games and PvP (player versus player) games to MOBA (multiplayer online battle arena) games. The history of online games started in the 1970s and went through a flourishing period in the early 21st century. The development of mobile games has benefited from the development of mobile phones, including higher resolution, accurate screen response, and high-speed networks.

Graph 1: Worldwide distribution of games market revenue from 2015 to 2019, data collected by Newzoo and graph created by

Over the most recent 5 years, smartphone games and PC games have consistently held a large share of the total game market. Also, according to the research by, 53% of game developers were developing games for PC and Mac, followed by 38% for smartphones and tablets. PC and smartphones have always been the two most popular platforms for game development, which also creates a need for multi-platform compatibility and investment. Moreover, 72.3% of mobile users in the U.S. are mobile phone gamers (IAB(Trends), 2016). Although there is not a huge overlap between mobile games and PC games, there is a tendency to extend games across multiple platforms, both for promotion and for technology development.

Graph 2: iPhone’s top grossing mobile gaming apps in the U.S., data collected by Newzoo and graph created by

Among all mobile games released in the U.S. market, Candy Crush holds the second and the fourth highest daily revenue. It is interesting that Candy Crush has much simpler game logic, lower technical demands, and lower production costs than Fortnite, the top earner, yet still performs very well in the market. What makes a simple “match 3” game go viral and become popular nationwide? The rest of the paper addresses this question by de-blackboxing the game’s design principles and how they fit consumers’ psychology.

Consumer insight

Graph 3: Gamer demographics in the U.S. as of 2017. Data collected by Pew Research Center and graph created by

According to the research, 64% of the general U.S. population are game players. Most of them are younger (below 50) and do not have an especially high level of education. Women tend to rate more highly games that are well designed, have fantasy elements, involve community participation, or have a storyline, while most men weigh competitiveness above those factors.

Therefore, it is important for game designers to target users accurately by identifying their familiarity with the internet, learning ability, and plot preferences. Generation Z players tend to be accustomed to learning on computers and pick up new things and new ways of operating on their own relatively easily. In addition, a large group of game users witnessed the whole creation and development of the internet and internet products. As a result, almost all current game players have a general understanding of computing and a strong capacity for self-directed learning.

Furthermore, some game players, most of them aged 16 to 34, spend time watching other players’ game videos on YouTube, including reviews, trailers, and instructions. In conclusion, game designers should both guarantee the quality of the game and make it possible for players to pick it up and figure it out quickly. Drawing on long-accumulated computer-use habits serves the needs of both players and designers; from this standpoint, designers take up the universal design principles and apply them to new game development.

Match 3 game developing analysis

Game developers are always looking for ways to attract new users and keep them active. A previous research paper compared 7 different “match 3” games. First, social network sites (SNS) enable people to share their profiles and daily lives within a limited community. SNS accounts (such as Facebook) are usually external tools that can be displayed in a player’s game profile and thus link the player with real-life friends. With only a few clicks, players can form an emotional connection with the game and draw their friends to it (Omori & Felinto, 2012). Second, games that demand only flexible concentration and have a low barrier to entry easily attract more “un-hardcore” players. Because players can spend a few minutes on such a game anywhere at their convenience, it attracts them simply through its entertainment value and casualness. Then, through the interaction between the game and the player and through the bonus mechanism, continuous gratification keeps players connected for longer. Beyond those, privacy and security, marketing virality, and other factors also affect how active users are and how attractive the game is. In conclusion, besides the game interface, mechanics, gameplay, and story are also taken into account when evaluating a good game. In the rest of the paper, I’ll focus on design principles, such as software and hardware dependence, module interaction, and network communication, and analyze their specific application in the game industry.


The term affordance was first introduced by Gibson in 1977 to describe the interaction between the environment and the user from the perspective of ecological psychology. The prevalence of smartphones has drawn considerable attention to affordance as a way of exploring users’ perception and adoption of ICT (Information and Communication Technology) products (Leonardi, 2011), which can in turn affect the usability of those products. The graphic interface design and interface interaction of games benefit mostly from the capabilities of the smartphone screen.

Graphic User Interface

The icon of the game is made up of three basic items from the game: two normal candies and a special one (which has a special function). Here are several more examples of the icon design in the game.

With the development of smartphone screens, screen resolution has improved rapidly, from 1136*640 on the iPhone 5 to 2436*1125 on the iPhone X. Apple has roughly doubled each dimension in order to present objects more sharply. The newest 13-inch MacBook Pro has a still higher resolution of 2560*1600. Beyond the difference in resolution between mobile devices and PCs, the screens also have different aspect ratios. The following is a comparison of the same module (the first stage of the map) of the game’s entrance interface. The left one, displayed on a webpage, is closer to a square and is organized as a flat layout, while the right one, on my iPhone 8, not only takes a rectangular shape but also uses a layout with depth that simulates the foreshortening of natural perspective. Moreover, when zooming in on the two screenshots, I can see worse color fringing on the PC website, since its display area has fewer pixels with which to achieve a smooth color transition. However, one advantage of the PC webpage over the mobile version is that it reacts to both the cursor position and the click, while the mobile version can only sense a touch or press. Both cursor and touch require detection within a specific area, which I’ll describe more thoroughly in the following parts.


The vibration of a smartphone is a form of haptic feedback. Inside a phone, there is usually a vibration motor that produces the thousands of vibrations of our phone each day. In Candy Crush, vibration can signal an effective swipe together with the elimination of an array of candies. After a swipe triggers the background program, the program analyzes whether the player’s action is effective and sends signals to the vibration program. If the swipe is classified as effective, the game completes the exchange and the vibration, along with a slight sound effect from the smartphone’s speaker. If it is classified as noneffective, the game neither completes the exchange nor plays the sound effect, but it still activates the vibration motor to acknowledge the swipe.
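
The branching described above can be written down as a small decision function. This is only a sketch of the behavior as described in this section, not Candy Crush’s actual code; the names `Feedback` and `feedback_for_swipe` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Which feedback channels fire after a swipe (illustrative names)."""
    swap_candies: bool  # complete the exchange on screen
    play_sound: bool    # slight sound effect from the speaker
    vibrate: bool       # pulse from the vibration motor


def feedback_for_swipe(is_effective: bool) -> Feedback:
    """Map the result of the swipe analysis to the feedback the player gets."""
    if is_effective:
        # Effective swipe: exchange, sound, and vibration all happen.
        return Feedback(swap_candies=True, play_sound=True, vibrate=True)
    # Noneffective swipe: no exchange and no sound, but the motor still
    # fires to acknowledge that the swipe itself was registered.
    return Feedback(swap_candies=False, play_sound=False, vibrate=True)
```

The point of the sketch is that vibration is the one channel present in both branches: it confirms the gesture, while the exchange and the sound confirm the outcome.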

Beyond that, the vibration motor is a functional module built into the smartphone, which not only adds convenience for smartphone users but also creates more possibilities for application designers. I still remember that when I first set up my iPhone, it asked me to customize some functions related to my personal habits. Vibration feedback is one such example. The iPhone provides several vibration patterns for users to choose from and apply to different occasions. Vibration is not only feedback or a reminder from the phone but also an interaction between the user and the device, one that lets users know they are participating in the design of their devices and improves the user experience.


Designers of physical products and internet applications have long shared the understanding that a design should be strongly consistent both with pre-existing works and with other parts of the same system. Replicating and transferring older services into a new context greatly increases the usability and learnability of a new application or product. Lidwell, Holden, and Butler (2010) classify consistency into 4 types: aesthetic consistency, functional consistency, internal consistency, and external consistency.

Aesthetic consistency

The aesthetic design of an application’s interface gives users a strong visual impression of its basic characteristics and functions. Previous researchers have concluded that visual aesthetics plays an important role in users’ evaluation of an interactive system (Tractinsky, Cokhavi, Kirschenbaum & Sharfi, 2006). Aesthetic consistency in Candy Crush is reflected both in its icon design and in its uniform style throughout the game.

For the simplest element, a candy, the designers build a candy world by animating geometric figures with designed lighting effects that mimic the shape of real candy. Each pixel on the screen can display 16,777,216 colors by varying the proportion of the three primary colors (red, green, and blue), each across 256 levels of intensity. As human eyes are most sensitive to those colors, this variety is more than enough for human eyes to recognize. Combined with the high resolution of current electronic devices, the icons can present a 3D effect. By arranging the colors on different pixels, the designers create vivid color effects, including the highlights and shadows of an object, and can thus simulate what we see in everyday life. Nowadays, designers try to simulate human vision in order to focus on “human-centered” interaction and promote communication between users and the machine. Creating an interface from the users’ point of view helps users understand the system without knowing the algorithm behind the interface or following complicated instructions. It helps users figure out what something does, how it works, and what operations and interactions are possible in the system (Norman, 2013). Beyond the candies displayed in the game icon and the level layouts, the obstacles and boosters are also vivid simulations of real-world objects that users can recognize.
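
The figure of 16,777,216 colors is simply 256 levels for each of the three channels multiplied together. The short sketch below checks the arithmetic and shows one common way to pack three 8-bit channel values into a single number; the helper name `pack_rgb` is my own, and the 8-bits-per-channel layout is an assumption about a typical display rather than something specific to Candy Crush.

```python
LEVELS_PER_CHANNEL = 256   # intensity levels for each primary color
CHANNELS = 3               # red, green, blue

# 256 * 256 * 256 possible combinations per pixel
total_colors = LEVELS_PER_CHANNEL ** CHANNELS


def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit channel values into one 24-bit color number."""
    for value in (red, green, blue):
        if not 0 <= value < LEVELS_PER_CHANNEL:
            raise ValueError("each channel must be between 0 and 255")
    return (red << 16) | (green << 8) | blue


print(total_colors)              # 16777216
print(pack_rgb(255, 255, 255))   # 16777215, the largest 24-bit value (white)
```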

Functional consistency

In an interactive system, there are always some signifiers that users can easily perceive and that make their functions explicit in the right place. Functional consistency uses symbols from previous social experience and asks users to draw on that previous knowledge when dealing with a new application. Common signifiers to some extent teach people how to control those functions in the new environment.

There are several main special candies in Candy Crush. Building on the basic candy forms, there are advanced types of candies, such as striped candies (with either horizontal or vertical stripes), wrapped candies (which look like a bag of candies), and color bombs (a black chocolate ball covered with colorful sprinkles). Striped candies allow users to eliminate an entire row or column of candies. We learn the direction by observing the direction of the stripes: candies with horizontal stripes clear all candies in their row, and candies with vertical stripes clear all candies in their column. Wrapped candies explode and eliminate the 8 candies around them in a 3*3 square; they are designed like a shopping bag that can explode at any time and cause heavy damage around it. Color bombs clear all candies of one color after being swapped with an adjacent candy of that color. The many colorful sprinkles attached to the chocolate ball suggest its ability to absorb a large number of candies at once. Beyond those, there are also various kinds of boosters, blockers, and obstacles with different designs and functions. Candy Crush has hardly any instructions telling users how to create or use specific items or boosters. Good functional consistency also means creating appropriate affordances so that what users want to do is doable with the technology used.
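
The effects of the three special candies can be summarized as three small functions that return the board cells each one clears. This is a sketch built only from the descriptions above, not the game’s real implementation; the function names, the `(row, column)` cell format, and the list-of-lists board are all assumptions made for illustration.

```python
def striped_cells(row, col, direction, n_rows, n_cols):
    """Cells cleared by a striped candy: its whole row or whole column."""
    if direction == "horizontal":
        return {(row, c) for c in range(n_cols)}
    if direction == "vertical":
        return {(r, col) for r in range(n_rows)}
    raise ValueError("direction must be 'horizontal' or 'vertical'")


def wrapped_cells(row, col, n_rows, n_cols):
    """Cells cleared by a wrapped candy: the 3*3 square centred on it.

    The set includes the candy's own cell plus its (up to) 8 neighbours,
    clipped at the edges of the board.
    """
    return {
        (r, c)
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if 0 <= r < n_rows and 0 <= c < n_cols
    }


def color_bomb_cells(board, color):
    """Cells cleared by a color bomb swapped with a candy of `color`."""
    return {
        (r, c)
        for r, line in enumerate(board)
        for c, candy in enumerate(line)
        if candy == color
    }
```

Each function mirrors its signifier: the stripe direction selects the row or column branch, the bag shape maps to a local square, and the sprinkled ball maps to a search over the entire board.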

Internal consistency

We can see the internal consistency everywhere in a design: The icon of the game shares the same elements with the compositions, the character design and environment design both achieve a playful and colorful theme, not to mention the sound effects. All those elements are combined together and provide the users with a joyful consistency that also cultivates trust with the users.

External consistency

It is not easy for various internet products to share similar design standards, but it is essential that users carry over a continuous sense of their usage habits, so that game designers can apply that knowledge to new games. In the early 2000s, as emerging mobile games were carried along by improvements in smartphone technology, designers were also exploring the best ways to support effective communication between users and machines and to encourage appropriate feedback from the system. Before Candy Crush, Fruit Ninja and Angry Birds promoted interaction between users and the game by means of touchscreen control. The development of the touchscreen enabled a mass of interactive applications and better exploited the affordances and possibilities of the smartphone.

A smartphone screen, like other devices that support touch input, is made up of many substrate layers, of which the top few are transparent and detect electrical changes. When our fingers touch the screen, a touch sensor detects the presence or movement of an object within a touch-sensitive area. It can therefore capture the location, proximity, pressure magnitude, and so on of the object’s movement. In this way, our bodies connect with the computing system and become part of the design.

In Candy Crush, after the player makes a “swiping gesture”, the screen provides feedback by exchanging the positions of two candies. However, the rules of the game restrict which exchanges are effective. If the exchange produces a match of three identical candies in the same row or column, or combines two special candies to produce a stronger effect, the exchange is considered effective. Otherwise, the two candies return to their original positions, and this noneffective swipe is not counted against the move limit.
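
The rule for an effective exchange can be sketched as a check for three identical candies in a line after a trial swap. The code below is a simplified illustration under stated assumptions, not the game’s algorithm: the board is a list of lists of color letters, it is assumed to contain no match before the swap, the special-candy combination case is left out, and the names `has_match` and `try_swap` are hypothetical.

```python
def has_match(board):
    """Return True if any row or column holds 3 identical candies in a line."""
    n_rows, n_cols = len(board), len(board[0])
    for r in range(n_rows):
        for c in range(n_cols):
            candy = board[r][c]
            # Three in a row, starting at (r, c) and going right.
            if c + 2 < n_cols and candy == board[r][c + 1] == board[r][c + 2]:
                return True
            # Three in a column, starting at (r, c) and going down.
            if r + 2 < n_rows and candy == board[r + 1][c] == board[r + 2][c]:
                return True
    return False


def try_swap(board, first, second, moves_left):
    """Exchange two adjacent candies if that creates a match.

    Returns the resulting board and the remaining moves. A noneffective
    swap returns the original board and does not use up a move.
    """
    (r1, c1), (r2, c2) = first, second
    if abs(r1 - r2) + abs(c1 - c2) != 1:
        raise ValueError("only adjacent candies can be exchanged")
    swapped = [row[:] for row in board]  # work on a copy
    swapped[r1][c1], swapped[r2][c2] = swapped[r2][c2], swapped[r1][c1]
    if has_match(swapped):
        return swapped, moves_left - 1   # effective: keep the swap
    return board, moves_left             # noneffective: candies return


board = [
    ["R", "G", "R"],
    ["G", "R", "G"],
    ["B", "B", "Y"],
]
after, moves = try_swap(board, (0, 1), (1, 1), moves_left=10)
print(after[0], moves)   # ['R', 'R', 'R'] 9
```

Swapping the two bottom-right candies instead creates no line of three, so `try_swap` hands back the unchanged board with all 10 moves still available.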

Some touch sensors include several electrodes on different substrates. When a finger approaches the screen, electrical changes occur on different layers of the touch-sensitive area, which enables the screen to recognize our gestures and the direction of movement. Fruit Ninja, similarly, uses a screen swipe to simulate a chopping action when a fruit is “killed”. The user first traces a path on the screen; after the sensor detects and decodes it as a continuous action, the feedback is a cutting effect that follows the line on the screen. The same goes for Angry Birds, in which a successful action requires detecting the path of movement as well as the intensity and duration of the touch. In each case, by copying some properties of familiar objects or operations, these games mimic users’ everyday habits and maintain a good user experience.

People are better at recognizing things they have previously experienced than recalling them from memory (Lidwell, Holden & Butler, 2010). In a complex system, it is important to make unfamiliar things recognizable. Often, encountering familiar options encourages users to accept a new product and speeds their decision-making. Maintaining consistency across applications is thus important: it lets designers adopt earlier game design constructions and apply that experience to the creation of new products.

Feedback Loop

Candy Crush is a game of luck, as success in a level depends on the random candy pattern you are given. In fact, the pattern is not completely random. Game researchers classify the design as a behaviorist psychology strategy: it stimulates a positive feedback loop that encourages repetitive behavior. Levels are divided into several difficulties. When a new element is introduced, the game stays at an entry level to help players develop skill in recognizing and applying that element. Then the difficulty rises to reduce satisfaction and arouse players’ competitiveness. At some point, an easier level is needed again to maintain the player’s activity and confidence.

As for how level difficulty is controlled, many aspects of the algorithm design decide the overall difficulty of a level. For example, players cannot predict the falling pattern of the candies. Every time a group of candies is cleared, candies drop from the very top and become part of the new candy pattern. The game can therefore plan the dropping candies in a way that decides how likely a user is to complete the current level. Users, for their part, can either choose a riskier approach without predicting possible falling patterns, or resist change and maintain an equilibrium strategy. This is a mutual feedback process that also requires long-term memory and analysis of the player’s operating habits. It keeps players assessing their plans and choices of strategy, and this uncertainty has long enticed players to stick with the game.
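
The refill step described above can be sketched as a column-by-column drop. How Candy Crush actually chooses the new candies is not public, so the uniform random choice below is only a stand-in; a weighted or planned choice at that same line is where the difficulty tuning discussed in this section would plausibly live. The function name `drop_and_refill` and the use of `None` for cleared cells are assumptions for illustration.

```python
import random


def drop_and_refill(board, colors, rng):
    """Let remaining candies fall and fill the gaps from the top.

    `board` is a list of rows (row 0 is the top); cleared cells are None.
    `rng` is a random.Random instance, passed in so runs can be repeated.
    """
    n_rows, n_cols = len(board), len(board[0])
    result = [[None] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        # Candies that survived the clear, in top-to-bottom order.
        survivors = [board[r][c] for r in range(n_rows) if board[r][c] is not None]
        missing = n_rows - len(survivors)
        # Stand-in for the game's own (unknown) choice of falling candies.
        new_candies = [rng.choice(colors) for _ in range(missing)]
        column = new_candies + survivors  # new candies enter above the old ones
        for r in range(n_rows):
            result[r][c] = column[r]
    return result


board = [
    ["R", None],
    [None, "G"],
    ["B", "Y"],
]
filled = drop_and_refill(board, ["R", "G", "B", "Y"], random.Random(0))
```

In the example, the "R" in the first column falls one cell to rest on the "B", and one new candy enters at the top of each column.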


Video games have been developing for as long as internet and computational technologies have existed. Game design is built on the affordances and capabilities of current technology, which decide how far and at what stage of development a game can be supported. Learning from previous experience, game designers pass that knowledge on to new games and innovative applications. Although Candy Crush seems to have adopted a simple design logic, its development is deeply rooted in long-standing, primitive patterns of human behavior. The game company was thus able to keep development costs low and still see the game become popular all of a sudden.

The universal design principles and examples above indicate the adaptability and continuity of some basic design rules: affordance, to make the best use of hardware and software capacity in user interaction; consistency, to stimulate recognition, build trust with users, and make the game more controllable and learnable; and appropriate feedback, which keeps users engaged and active by memorizing and analyzing their previous behavior patterns. This article breaks daily gaming habits down into calculable and controllable rules and techniques, making what used to be obscure and hidden design logic more transparent. It thus summarizes several widely applied design principles in the game design industry, which leave room to further optimize PC and mobile video games and also help with forecasting and evaluating consumers’ future behavior.



Anderson, G. S., Varonis, E. M., & Varonis, M. E. (2015). Deconstructing candy crush: what instructional design can learn from game design. The international journal of information and learning technology.

Cheema, S. (2019, June 27). A whopping 9.2 million people play ‘Candy Crush’ for 3 hours daily. Retrieved from

Dondlinger, M. J. (2007). Educational video game design: A review of the literature. Journal of applied educational technology, 4(1), 21-31.

Guard, D. B., & Trend, M. (2018). U.S. Patent No. 9,965,106. Washington, DC: U.S. Patent and Trademark Office.

Heaven, D. (2014). Engineered compulsion: why Candy Crush is the future of more than games. New Scientist, 222(2971), 38-41.

Jerald, J. (2015). The VR Book: Human-centered design for virtual reality. Morgan & Claypool.

Leonardi, P.M. (2011), “When flexible routines meet flexible technologies: affordance, constraint, and the imbrication of human and material agencies”, MIS Quarterly, Vol. 35 No. 1, pp. 147-167.

Lidwell, W., Holden, K., & Butler, J. (2010). Universal principles of design. Beverly, Mass: Rockport Publishers.

Omori, M. T., & Felinto, A. S. (2012). Analysis of motivational elements of social games: a puzzle match 3-games study case. International Journal of Computer Games Technology, 2012, 9.

Smith, D. (2014, April 1). This is what Candy Crush Saga does to your brain | Dana Smith. Retrieved December 15, 2019, from

Tractinsky, N., Cokhavi, A., Kirschenbaum, M., & Sharfi, T. (2006). Evaluating the consistency of immediate aesthetic perceptions of web pages. International journal of human-computer studies, 64(11), 1071-1083.

Tsai, J. and Ho, C. (2013), “Does design matter? Affordance perspective on smartphone usage”, Industrial Management & Data Systems, Vol. 113 No. 9, pp. 1248-1269.

2018 Video Game Industry Statistics, Trends & Data – The Ultimate List. (2019, November 5). Retrieved December 13, 2019, from

Tracking the Path of Spotify Music: Design Principles and Technologies that Make Spotify Workable


Streaming technologies are becoming more popular and essential as the Internet develops. Cheap, convenient, and fast, streaming music is gradually replacing physical music formats and players, such as vinyl records, CD players, and phonographs. Music streaming platforms not only provide users with a wide variety of songs without limitations of time and space, but also construct a social network environment where users can share music. Leading companies in the industry include Spotify, Apple Music, Amazon Music, and Pandora. In addition, streaming music service companies offer users mobile apps with rich functions, meaning that users do not need to stick to a PC to enjoy the music. The competition among companies makes consumers the biggest beneficiaries.

In this research paper, I use a case study of the Spotify app to examine the question “How does Spotify implement as many design principles as possible?” This paper is divided into three main parts: 1) two main infrastructures that keep Spotify running: proliferating data infrastructure as well as audio and streaming infrastructure (Eriksson et al., 2017); 2) three ways Spotify works as an interface: between websites, between humans and devices, and between humans and the large computing system; 3) Spotify as a sociotechnical system. My goal for this paper is to visualize the unobservable series of systematic actions triggered by “one click” of a button on the Spotify GUI.


Streaming (adj.): relating to or being the transfer of data (such as audio or video material) in a continuous stream especially for immediate processing or playback (Merriam-Webster)

  • How Does Music Streaming Work?

With streaming, “the client browser or plug-in can start displaying the data before the entire file has been transmitted”. During streaming, the audio file is transmitted in small packets, which compose metafiles, and is then decoded by the codec. Once the buffer fills with decoded results, the data are turned into music and the computer plays it directly (White, 2015). “Each of the scores of available audio codecs are specialized to work with particular audio file formats, such as mp3, ogg, etc. As the buffer fills up, the codec processes the file through a digital-to-analog converter, turning file data into music, and while the server continues to send the rest of the file” (White, 2015).

Different from downloaded music, which is stored permanently in the device’s hard drive and whose access does not require the connection to the internet once stored, streaming music works through wi-fi or mobile data, and the users do not “own” the music. As long as there is a steady stream of packets delivered to the computer, the user will hear the music without any interruptions.
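The buffering behavior described above can be illustrated with a small simulation. This is only a sketch under my own assumptions: the function name `simulate_streaming` and the `min_buffered` threshold are hypothetical, one packet arrives per step, and playback consumes one packet per step once it has started. It is not how any particular client is implemented.

```python
from collections import deque


def simulate_streaming(packets, min_buffered=3):
    """Play packets in arrival order, starting once enough are buffered.

    Returns (start_index, played): the index of the arriving packet at which
    playback began (None if it only began after the stream ended) and the
    packets in the order they were played.
    """
    buffer = deque()
    played = []
    start_index = None

    for index, packet in enumerate(packets):
        buffer.append(packet)  # a new packet arrives from the server
        if start_index is None and len(buffer) >= min_buffered:
            start_index = index  # enough data buffered: playback can begin
        if start_index is not None:
            played.append(buffer.popleft())  # play the oldest buffered packet

    # The server has finished sending; play whatever is still buffered.
    while buffer:
        played.append(buffer.popleft())
    return start_index, played
```

Calling `simulate_streaming(list(range(10)), min_buffered=3)` starts playback when the third packet arrives, long before the whole "file" has been received, and still plays every packet in order.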

  • Digitization of Sound Wave and Compression of Audio Files

The digitization of a sound wave follows Shannon’s model of information transmission. Digitization produces an analog of the sound: the wave is recorded as a sequence of discrete events and encoded in the binary language of computers. It involves two main steps: sampling, the measurement of air pressure amplitude at equally spaced moments in time, and quantization, the “translation” of each sample’s amplitude into an integer in binary form. Sample rate refers to the number of samples taken per second (samples/s), expressed in hertz (Hz). Bit depth refers to the number of bits used per sample. “The physical process of measuring the changing air pressure amplitude over time can be modeled by the mathematical process of evaluating a sine function at particular points across the horizontal axis” (Digital Sound & Music).
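Sampling and quantization can be sketched in a few lines of Python. The function name and parameters below are my own illustrative choices, and the "sound" is a pure sine tone rather than a real recording: the function evaluates the sine at equally spaced moments (sampling) and maps each amplitude to one of 2^bit_depth integer levels (quantization).

```python
import math


def sample_and_quantize(freq_hz, sample_rate, bit_depth, duration_s):
    """Sample a sine tone and quantize each sample to an unsigned integer."""
    levels = 2 ** bit_depth                      # number of distinct integer values
    sample_count = int(sample_rate * duration_s)
    samples = []
    for i in range(sample_count):
        t = i / sample_rate                      # equally spaced moments in time
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # value in [-1, 1]
        # Map [-1, 1] onto the integer range 0 .. levels - 1.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples
```

For instance, `sample_and_quantize(1, 8, 3, 1.0)` takes eight samples of a 1 Hz tone and stores each as a 3-bit integer between 0 and 7. Raising the sample rate or the bit depth gives a closer approximation of the wave at the cost of more data.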

To reduce file size and stream more efficiently over the network, the sample rate, bit depth, and bit rate of digital audio are often reduced, which inevitably degrades the quality of the music. Compression can be lossless or lossy: lossless compression shrinks the file while retaining the quality of the audio, and, what is more, “the file can be restored back to its original state; lossy compression permanently removes data (by reducing original bit depth)” (BBC).
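The difference between the two kinds of compression can be shown with a minimal sketch. I use Python's standard `zlib` module as a stand-in for a lossless codec and a simple bit-depth reduction as a stand-in for a lossy one; neither is the codec a streaming service actually uses, and the helper names are hypothetical.

```python
import zlib


def lossless_roundtrip(samples):
    """Compress 16-bit samples with zlib and restore them exactly."""
    raw = b"".join(s.to_bytes(2, "little") for s in samples)
    restored = zlib.decompress(zlib.compress(raw))
    return [int.from_bytes(restored[i:i + 2], "little")
            for i in range(0, len(restored), 2)]


def lossy_roundtrip(samples, from_bits=16, to_bits=8):
    """Drop the low-order bits of each sample, then scale back up."""
    shift = from_bits - to_bits
    reduced = [s >> shift for s in samples]   # the discarded bits are gone for good
    return [s << shift for s in reduced]
```

With the samples `[0, 1000, 32767, 12345, 65535]`, the lossless round trip returns the original list, while the lossy round trip returns `[0, 768, 32512, 12288, 65280]`: close to the original, but the removed detail cannot be recovered.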

Two Main Infrastructures of Spotify:

When digital audio is packaged into files and “become music at Spotify, aggregation of data occurs on, at, and via many computational layers” (Eriksson et al., 2017). The platformization of Spotify hides the complex data exchange process behind the well-designed GUI, so that its design principles are invisible to consumers. The two main infrastructures, proliferating Data Infrastructure and Audio and Streaming Infrastructure, and several detailed design principles, are what make Spotify workable.

  • Proliferating Data Infrastructure

Following an “end-to-end server and client model”, Spotify’s proliferating data infrastructure, as described by Eriksson et al., is the foundation of its service. This infrastructure enables communication between Spotify’s servers and its clients’ devices: at one end lie the servers and data centers that “send out music files and fetch back user data” (Eriksson et al., 2019); at the other end lie users’ playback devices, such as a PC, a smartphone, or a tablet. Spotify is synced to the Spotify Cloud Server, which runs on Google’s public cloud, meaning that it “operates based on Google’s cloud compute, storage, and networking services, as well as its data services, such as Pub/Sub, Dataflow, Big Query, and Dataproc” (Datacenter Knowledge).

Data is the most important part of this infrastructure. Spotify builds ways for data exchange. Eriksson et al. see the transmission of information and data between users and Spotify’s server as an “event delivery system”. Most events that are produced within Spotify are generated from Spotify’s users. They define user data as sets of “structured events that are caused at some point in time as a reaction to some predefined activity” (Eriksson et al., 2019). When a user performs an action, for instance, clicking the play button on the app, a piece of information (“an event”) signaling “playing the music” is sent from the user’s device to the server via the internet.
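To make the idea of a “structured event” concrete, here is a minimal sketch of how a play action might be packaged on the device and unpacked on the server. The field names (`user_id`, `action`, `track_id`, `timestamp`) and the JSON encoding are my own assumptions for illustration; the source does not describe Spotify's actual event schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Event:
    """A hypothetical user event: who did what, to which track, and when."""
    user_id: str
    action: str
    track_id: str
    timestamp: float


def encode_event(event):
    """Serialize an event to bytes, as a client might before sending it."""
    return json.dumps(asdict(event)).encode("utf-8")


def decode_event(payload):
    """Rebuild the event from bytes, as a server might after receiving it."""
    return Event(**json.loads(payload.decode("utf-8")))
```

An event such as `Event("user-1", "play", "track-42", 1576200000.0)` survives the round trip through `encode_event` and `decode_event` unchanged, which is the property an event delivery system depends on.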

  • Audio and Streaming Infrastructure

The second infrastructure is the audio and streaming infrastructure. Spotify balances file size against network speed well: its streaming service has very low latency, meaning that the delay between a user requesting a song and hearing it is almost imperceptible. Spotify owes its low-latency streaming to the Ogg Vorbis format, an open-source lossy audio compression method “that offers roughly the same sound quality as mp3, but with a much smaller files size”. Note that music files are not permanently stored on the destination device. Instead, the buffer stores a few seconds of sound before sending it to the speaker, so Spotify’s client “fetches the first part of a song from its infrastructural back end and starts playing a track as soon as sufficient data has been buffered as to make stutter unlikely to occur” (Eriksson et al., 2017). The small file size lets Spotify’s servers send the file quickly, so music plays almost instantly after the user clicks the play button.

Spotify offers several bitrates, each corresponding to a streaming quality level.

This design lets users adjust the music quality to their needs, or they can “turn on the automatic quality streaming” so that the app automatically selects the best bitrate for the device’s current network conditions. This is especially convenient for mobile users, who do not need to worry about suddenly using up their data allowance by accidentally selecting the “extreme” quality setting.
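The trade-off behind automatic quality selection can be sketched with some simple arithmetic. The quality names and bitrates in `QUALITY_KBPS`, the `headroom` factor, and both function names are illustrative assumptions of mine, not values or logic taken from Spotify.

```python
# Assumed, illustrative bitrates in kilobits per second.
QUALITY_KBPS = {"low": 24, "normal": 96, "high": 160, "extreme": 320}


def track_megabytes(kbps, seconds):
    """Data used by a track: kilobits/s * seconds / 8 = kilobytes, / 1000 = MB."""
    return kbps * seconds / 8 / 1000


def pick_quality(bandwidth_kbps, headroom=1.5):
    """Pick the highest quality whose bitrate fits the bandwidth with headroom."""
    ordered = sorted(QUALITY_KBPS.items(), key=lambda item: item[1])
    choice = ordered[0][0]  # fall back to the lowest quality
    for name, kbps in ordered:
        if kbps * headroom <= bandwidth_kbps:
            choice = name
    return choice
```

Under these assumptions, a three-minute track uses about 7.2 MB at 320 kbit/s but only about 2.16 MB at 96 kbit/s, and a connection measured at 200 kbit/s would be given the "normal" setting rather than "extreme".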

Sum Up: How Does Spotify Track Streams?

The scale of data that passes through Spotify is enormous: in 2016, the company “handled more than thirty-eight terabytes of incoming data per day, while permanently storing more than 70 petabytes of…data about songs, playlists, etc.” (Sarrafi, 2016).

What happens when the user clicks the “play” button? How does Spotify deal with such a large scale of data? How can Spotify make sure the data are transmitted accurately? How does Spotify make sure the data are on the right path and headed to the correct destination? De-blackboxing the design of Spotify comes down to tracking the path of its data files.

In order to deliver music worldwide, Spotify applies SIR (SDN Internet Router). To recap, “every computer on the Internet has an IP, that IP belongs to an IP network and that IP network belongs to an organization” (Spotify Labs). So, Spotify identifies users by their IP addresses. First, Spotify has two transit providers to make sure it can reach all clients. Transit providers are companies that own very large networks and allow other organizations to connect to their network for a fee so they can reach the rest of the world. Second, Spotify uses Content Delivery Networks (CDNs), which are extremely well-connected networks, to reach faraway users and help with the bandwidth required to send users the music, so that users do not have to wait for their bits to travel all over the world. Spotify has physical data centers in London, Stockholm, Ashburn (VA), and San Jose (CA). In addition to using Google Cloud Platform, Spotify also uses at least five internet exchange points (IXPs), located in Frankfurt (DE-CIX), Stockholm (Netnod), Amsterdam (AMS-IX), London (LINX), and Ashburn (EQIX-ASH). The service is also attached to some subscriber networks, such as broadband or mobile providers, to shorten the distance to its users and speed up delivery (Dbarrosop, 2016).

Spotify splits traffic between data centers. When a Spotify client connects to the service, a combination of techniques is used to make sure that the connection is made to the best possible data center. Also, when Spotify connects to other organizations’ networks, it needs to know which of those connections are suitable for reaching the connecting client (Spotify Labs). By applying SIR, Spotify can monitor the available paths, choose the best one based on real-time metrics, and thus provide clients with better and more reliable service.
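The idea of choosing the best available path from real-time metrics can be illustrated with a toy example. Everything here is hypothetical: the `Path` fields, the metric names, and the rule of preferring the lowest packet loss and then the lowest latency are my own simplification, not a description of how SIR is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """A hypothetical route to a client with a few measured metrics."""
    name: str
    reachable: bool
    latency_ms: float
    loss_pct: float


def choose_path(paths):
    """Return the reachable path with the least loss, then the least latency."""
    usable = [p for p in paths if p.reachable]
    if not usable:
        raise ValueError("no usable path to the client")
    return min(usable, key=lambda p: (p.loss_pct, p.latency_ms))
```

Given a transit path with 80 ms latency and 0.5% loss, an exchange-point path with 20 ms latency and no loss, and a faster path that cannot reach the client, `choose_path` returns the exchange-point path.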

As Nicole Starosielski claims, “a simple ‘click’ on a computer commonly activates vast infrastructures whereby information is pushed through router, local internet networks, IXPs, long-haul backbone systems, coastal cable stations, undersea cables and data warehouses at the speed of light” (Eriksson et al. 2017).

Detailed Design Principles that Enable Spotify’s Different Features

  • Spotify as Interface for Agencies

  1. Interface Between Different Websites

Linked Data and Data Integration

Spotify links and combines data from different sources, enabling two of its key features: collaboration with other companies and the playlist function. In this section, I focus on the former; the latter is discussed later.

Spotify operates data integrations with many companies; the most observable one (across the app’s interface) is the “infrastructural tie-in” with Facebook. Spotify merges its login system with Facebook’s so that users can sign up with either their email or their Facebook account. By linking their account to Facebook, users can “display their Facebook name, picture, and find their friends easily on Spotify” (Spotify). They can also share their playlists with friends and see what their friends are listening to. Two huge databases are connected by a user’s simple click on “sign up with Facebook”. The collaboration is a win-win deal for both companies.


Interoperability is “the ability of different information systems, devices and applications (‘systems’) to access, exchange, integrate and cooperatively use data in a coordinated manner, within and across organizational, regional and national boundaries.” Sharing music and personal profiles is one example of Spotify’s interoperability. Spotify’s users can share music via multiple platforms such as “Skype, Tumblr, Twitter, Telegram, etc.” When users click one of the sharing options, the link directs them to the corresponding website, which triggers the information integration process. They can also share songs via a Spotify Uniform Resource Identifier (URI). This link is convenient because it takes users directly to the Spotify application without going through the web page first (whereas an HTTP song link directs users to a web page). For example, the link of this song is “”; the URI is “spotify:track:0WVAQaxrT0wsGEG4BCVSn2?context=spotify%3Aplaylist%3A37i9dQZF1DX0BcQWzuB7ZO”. Most interestingly, users can embed Spotify music and playlists on their personal websites by copying the embed code of the music. Copying and clicking the link inside the application is analogous to opening a new window in the web browser.
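The relationship between a Spotify URI and a web link can be sketched as a small conversion function. I am assuming here that web links follow the pattern `https://open.spotify.com/<type>/<id>`; the function name is hypothetical, and the sketch simply drops the optional `?context=...` part of the URI.

```python
def uri_to_web_link(uri):
    """Convert 'spotify:<type>:<id>[?query]' into an assumed web link."""
    core = uri.split("?", 1)[0]          # ignore any ?context=... suffix
    parts = core.split(":")
    if len(parts) != 3 or parts[0] != "spotify":
        raise ValueError(f"not a Spotify URI: {uri!r}")
    _, kind, identifier = parts
    return f"https://open.spotify.com/{kind}/{identifier}"
```

Applied to the URI quoted above, the function returns a link ending in `/track/0WVAQaxrT0wsGEG4BCVSn2`. The same identifier appears in both forms, which is what lets the application and the web page point to the same song.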

In sum, data integration and interoperability are inseparable: it is data integration that enables the interoperability between Spotify and different websites. The sharing of databases adds many “social” functions, and thus affordances, to Spotify.

2. Interface Between Users and Devices

Viewed offline, Spotify serves as an interface between users and devices, a product of human-computer interaction. This part continues from the point where the computer has finished “communicating” with Spotify’s servers. After the packets arrive at the computer, the system decodes them and sends the result to a buffer, then to the speaker. Take the desktop application for example: the Spotify GUI enables humans to interact with the computer through sight, hearing, and touch. It serves both as a controlling interface through which users send commands, and as a checking interface through which users monitor how well the computer carries out those commands. Although it does not invent any new function for computers, it helps users to “develop and assemble” different functions of the computer. The action of clicking the play button (through the touchpad, tactile) triggers the RAM, the buffer, the speaker, the monitor… For example, once the computer receives the user’s command and reacts to it, the interface informs the user that the song has started to play in three ways (on the monitor, visual): 1. The play button changes from “pause” to “play”; 2. The progress bar moves and the remaining time of the song changes; 3. The speaker icon pops up on the album cover. All of these, accompanied by the sound of the music (auditory), are signs that the human is interacting with the computer and that the computer is playing the song they want.

3. Interface Between Users and the Underlying System


The playlist is the building block of Spotify. It connects different modules of Spotify, for example, one playlist to another or a singer to a music genre. Playlists are everywhere in Spotify’s desktop application: the home page displays different types of playlists as square photos; the main navigation bar on the left side holds a list of playlists and the “add new playlist” function; and when the user searches for a song, an artist, or a keyword, what appears is a screen full of playlists. A playlist simulates an album, in which different songs are connected by the same singer. Spotify’s playlists connect songs based on different elements, such as mood, weather, genre… For example, the songs in the playlist named “Christmas Hits” are connected by the Christmas element. When the user clicks into this playlist, he or she is likely to be led to another playlist related to Mariah Carey (who is famous for singing Christmas songs) or to “Christmas Classic” (because these two playlists share the “Christmas” element). Thus, playlists link the data of artists, songs, keywords, and so on.

As Eriksson et al. indicate in their study, the streaming metaphor itself implies a continuous flow of music, reminiscent of a never-ending playlist (Eriksson et al., 2019). Building playlists is not a new function invented by Spotify, since “early media players such as Winamp provided functionality for reaggregating tracks into customized playlists–an approach to music that built on previous assembling practice and technology” (Eriksson et al., 2019). Spotify uses playlists to guide users from one module to another. Playlists allow users to mix and match their favorite tracks and rewrap them into personalized playlists. They also enable Spotify to recommend and create playlists for users based on data from preexisting playlists. The creation and recreation of playlists is a non-stop data aggregation process that keeps Spotify’s different functions working.

Music Recommendation

Following the launch of “expert playlists for every mood and moment”, Spotify took a step toward “algorithmic and human curated recommendations”. It “not only delivers music, but also frames and shapes data” (Eriksson et al. 2017). Recommendation is also how Spotify enables a “deep and unique conversation” between users and the complicated technological system.

Spotify recommends music in various ways: the weekly updated playlists such as “Discover Weekly” and “Release Radar” as well as song and album recommendation such as “Top recommendations for you”, “Similar to…” and “Because you listened to…”.

Take “Discover Weekly” for example. It is based on three recommendation models:

  1. Collaborative Filtering: it makes predictions based on users’ historical behavior on Spotify, such as “whether a user saved the track to their own playlist (see, playlist appears again!), or visited the artist’s page after listening to a song” (Ciocca, 2017). Say user Pipi listens to tracks A, B, and C, user Sisi listens to tracks D, E, and F, and Spotify pairs them up because it judges their behavior to be similar. Spotify will then recommend B to Sisi and F to Pipi, after making sure that neither of them has already listened to the recommended track. The whole process is carried out with matrix math and Python libraries.
  2. Natural Language Processing (NLP) models: these analyze text. By searching the web for blog posts and other written texts about artists and music, Spotify can figure out how people define and describe a specific song or musician. Take the Echo Nest for example (Whitman, 2012): it puts Spotify’s data into a chart called “cultural vectors” or “top terms”. Each artist has their own top terms, each with an associated weight indicating how likely people are to describe the artist with that term. Basically, Spotify uses these charts to create vectors that determine the similarity of two songs.
  3. Raw Audio Models: these analyze the characteristics of the track itself. Spotify uses convolutional neural networks to analyze similarities between tracks in characteristics such as “time signature, key, mode, tempo, and loudness” (Ciocca, 2017). Then, it recommends songs to users based on their listening history.

(“Cultural vectors” or “top terms,” as used by the Echo Nest)
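Two of the ideas above, pairing users with similar listening histories and comparing weighted “top terms”, both come down to measuring similarity between vectors. The following is a minimal sketch using cosine similarity in plain Python. The function names, users, tracks, terms, and weights are all hypothetical, and a production system would work on far larger matrices with dedicated libraries.

```python
import math


def history_similarity(a, b):
    """Cosine similarity between two sets of listened tracks."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_tracks(target, histories):
    """Suggest tracks the most similar other user has heard but target has not."""
    mine = histories[target]
    scored = [(history_similarity(mine, tracks), user)
              for user, tracks in histories.items() if user != target]
    if not scored:
        return []
    best_score, neighbour = max(scored)
    if best_score == 0.0:
        return []                       # nobody similar enough to learn from
    return sorted(histories[neighbour] - mine)


def term_similarity(a, b):
    """Cosine similarity between two {term: weight} dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With histories `{"u1": {"A", "B", "C"}, "u2": {"B", "C", "D"}, "u3": {"X", "Y"}}`, user u1 is paired with u2 because they share two tracks, so u1 is recommended D and u2 is recommended A, while u3 receives nothing. Likewise, two artists whose top terms are both weighted toward “pop” and “christmas” score close to 1, and an artist described only as “metal” scores 0 against them.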

Through layers of analysis and calculation across different sources of data, Spotify makes personalized recommendations for users, which is also a process of database integration and of interoperation across different companies.

  • Spotify as a Sociotechnical System

Spotify only allows its users to download music within the application, which means that users cannot export their user libraries (both online and offline) outside of Spotify’s ecosystem. Music libraries can only be synced between devices with Spotify app. It seems a good way for Spotify to secure their clients, right? But the whole story is not that simple. It has a lot to do with the music industry licensing agreements and digital rights management (DRM).

A proprietary format is a data format that a particular software program accepts as input or produces as output (The Law Dictionary). Music streams from Spotify, protected by DRM, are encrypted in the Ogg Vorbis format. Ogg Vorbis is an open-source and patent-free alternative for lossy compression, which means that Spotify’s software developers do not need to pay license fees to support Ogg in their application, nor do they need to publish their changes when they modify the code to suit their own needs (Mitchell, 2014). Spotify’s audio files are encoded by its own engineers based on the original Ogg format, so its music cannot be decoded or decrypted by other software. To put it simply, users cannot use other media players to play Spotify’s music, let alone burn it onto a CD.

Spotify is available in most of Europe and the Americas, Australia, New Zealand, and parts of Africa and Asia. Its content can be accessed through both the app and the web player: the app runs on iOS, Android, Mac, and Windows systems as well as on several sound systems, TVs, and car stereo systems; the web player is supported by web browsers including Chrome, Firefox, Edge, and Opera (Spotify). To be more specific, Spotify is coded to operate on those systems and devices.

The enactment of the Digital Performance Right in Sound Recordings Act (DPRA) and the Digital Millennium Copyright Act (DMCA) forces many music streaming providers to pay a sound performance royalty in addition to the musical work royalty (Richardson, 2014). Spotify “must pay licensing fees to copyright holders (record labels, such as Warner Brothers and SONY) for each song played, whether offered to a paying, or to a non-paying customer”, which makes the freemium model expensive to run. That is also the reason why Spotify charges for premium services.

Also, if the user opens the “About Spotify” tab, he or she can see a series of logos for Universal Music Group, EMI, Warner Music Group, etc. under “Content provided by”, meaning that the music on Spotify comes from elsewhere.

Spotify “opens its Application Program Interface (API) to external developers, whose applications could retrieve data from the Spotify music catalog”. But developers have to follow a set of rules in order to develop their apps, such as agreeing to the terms of use and creating client IDs. Also, their apps have to “go through a rigorous Spotify approval process before being released on the platform” (Myers, 2011).

Its collaboration with other companies and websites is supported by API. Spotify uses proprietary format to protect the artists and the content of its software.


From producing, compressing, and packaging music files to transmitting, decoding, and playing them, Spotify does not invent anything new for the music streaming industry, but it does a great job of working as an interface for “connecting and coordinating”. It connects websites, people, and the physical functions of devices. It coordinates the division of labor among the software’s functions, the passage of data packets, and the relationship between its service and the sociotechnical environment. With the purpose of connecting and coordinating, and following the rules of the larger sociotechnical context, Spotify applies as many design principles as possible to give its users the most pleasing, convenient, and personalized music streaming service. It is not the technology itself that makes everything a black box for consumers (Martin Irvine), but the way Spotify applies the design rules. Spotify is a product of human-computer interaction.

Reference and Citations:

  1. Peter Brusilovsky (2007). The Adaptive Web. p. 325. ISBN 978-3-540-72078-2.
  2. Richardson, J. H. (2014). The Spotify Paradox: How the Creation of a Compulsory License Scheme for Streaming On-Demand Music Services Can Save the Music Industry. SSRN Electronic Journal. doi: 10.2139/ssrn.2557709
  3. Eriksson, M., Fleischer, R., Johansson, A., Snickars, P., & Vonderau, P. (2019). Spotify Teardown: Inside the Black Box of Streaming Music. Cambridge, MA: The MIT Press.
  4. Ron White, How Computers Work. 9th ed. Indianapolis, IN: Que Publishing, 2007. Excerpts.
  5. Peter J. Denning and Craig H. Martell. Great Principles of Computing. Cambridge, MA: MIT Press, 2015. Review chapters 4, 5, 6. Excerpts in pdf.
  6. Brianwhitman, Author. “How Music Recommendation Works – and Doesn’t Work.” Variogram by Brian Whitman, 11 Dec. 2012,
  7. “Streaming.” Merriam-Webster, Merriam-Webster,
  8. “5.1.2 Digitization.” Digital Sound & Music, 23 Jan. 2018,
  9. “Encoding Audio and Video – Revision 5 – GCSE Computer Science – BBC Bitesize.” BBC News, BBC,
  10. Sverdlik, Yevgeniy. “How Much Is Spotify Paying for Google Cloud?” Data Center Knowledge, 7 Mar. 2016,
  11. Dbarrosop. “SDN Internet Router – Part 1.” Labs, 28 Jan. 2016,
  12. Dbarrosop. “SDN Internet Router – Part 2.” Labs, 2 Feb. 2016,
  13. Ciocca, Sophia. “How Does Spotify Know You So Well?” Medium, Medium, 5 Apr. 2018,

What Makes TikTok Possible: The Technologies and Design Principles Behind It


Technology is not only the fusion of existing technologies, but also an extension of human cognitive ability. As one of the most popular and time-wasting applications, TikTok has gained huge success all over the world. People spend a great deal of time on TikTok without realizing it. Why is it so hard for people to quit TikTok? This paper approaches the question through a detailed analysis of the design principles and technologies behind the app. TikTok integrates a variety of technologies and combines them into a multi-functional social video application. Through the example of TikTok, we can further understand that, on the one hand, technologies are always combinations of technologies that already exist, harnessed to new phenomena in society. On the other hand, technology is bound to bring change for both better and worse, so we need to think critically when we analyze and view technology.


Launched in 2016, Douyin has become one of the most popular apps in China. To expand into the global market, TikTok, the English version of Douyin, was launched in 2017. According to market research company Sensor Tower, TikTok was downloaded more than Facebook, Instagram, and YouTube in the first three months of 2018, reaching 45.8 million downloads. Since Douyin and TikTok differ little in product design and functionality, this essay uses “TikTok” to refer to both versions.

Figure 1. Top non-game apps by downloads(Source:

Figure 2. TikTok new installs by month in 2018(source:

TikTok is a short-form mobile video app aimed mostly at young people. Users can shoot a variety of 15-second short music videos, with content forms including dance performance, script imitation, talent shows, emotional expression, skill-sharing, and life records. Unlike most video apps, TikTok doesn’t have a “start” button. Once the app is open, the video starts playing automatically. You can scroll through different videos by swiping up and down, just as you scroll through pictures on Instagram. When you follow a particular account, TikTok also feeds you similar accounts. Whether in the car, while eating, or even at work, you can always open the app to browse funny videos. TikTok reached 500 million monthly active users as of June 2018 (SCMP, 2018), most of whom are under 25.

Figure 3. Age group of douyin users(source:

TikTok is popular for three reasons: immersive interaction design, diverse functions, and fragmented propagation. Through 15-second short videos, people can not only share their lives but also discover all kinds of funny things. The rest of this essay de-blackboxes TikTok in three aspects to see what makes it addictive: interaction design, systems and modularity, and short-form video technology.

Interaction design

Interaction design is the practice of designing interactive digital products, environments, systems, and services (Cooper, 2007). The interface is significant in interaction design because it is what the user sees and operates; it sits between the machine and the person, like the knobs and dials on a toaster, or the icons on a computer screen (Murray, 2011). Interaction design strives to create and build meaningful relationships between users and products. Its goals can be analyzed from the perspectives of “usability” and “user experience,” focusing on human-centered user needs.

The interaction design of TikTok can be divided into three aspects. Firstly, TikTok plays videos in full screen, which draws users’ attention immediately and significantly reduces the cost of learning how to use the app. As Janet Murray said: “A better design value than intuitive is transparent: a good interface should not call attention to itself but should let us direct our attention to the task” (Murray, 2011). A well-orchestrated user interface is transparent (Cooper, 2007). The first time people enter TikTok, they can learn how to use it right away without thinking, because the interface is arranged in a neat and easily understood way. No matter how cool your interface is, less of it would be better (Cooper, 2007). On the main interface, we see only the necessary icons, including post video, like, comment, and share. Another attractive design choice is that when users view or write comments, only a single pop-up appears and the video continues playing. This pop-up design creates a continuous watching environment, because even when users can see the video on only half of the screen, they can still hear its sound.

Figure 4. TikTok home page and comment page.

Secondly, TikTok has efficient gesture interaction, which follows conventions that users already know. Constraints limit the possible actions that can be performed on a system. Proper application of constraints makes designs easier to use and dramatically reduces the probability of error during interaction (Lidwell, Holden, & Butler, 2010). When using phones, we are used to scrolling up and down to move through a page. TikTok applies the same convention; that is, the user switches between videos simply by swiping up and down. Each swipe brings new content, which is efficient and straightforward. If users want to know more about the account they are watching, they can swipe left to see all the videos on that account. Users can reach most of TikTok’s functionality just by swiping and tapping on a page.

Figure 5. TikTok gesture interaction.

Before TikTok, most video apps, such as Snapchat and Instagram, usually designed a home page that showed video thumbnails, and users viewed a video by tapping its cover. TikTok, however, cuts out the step between “click the video” and “watch the video”, which reduces the time users need to enter videos and switch between them.

Figure 6. Snapchat and Instagram interface.

Another unique gesture is double-clicking. By double-clicking on the screen, people can like and save the video. In design, it is crucial to show the effect of an action. Without feedback, one is always wondering whether anything has happened. A little red heart icon appears on the screen after a double-click to give users feedback, so that they can be sure their action has registered. Once the user double-clicks the video, TikTok stores it in the like list so that users can watch it whenever they want.

Figure 6. Double-clicking effect in TikTok.

Thirdly, the design is immersive. After opening TikTok, the user cannot choose what the next video is. TikTok plays popular content in full screen based on its recommendation algorithm, which gives users a feeling of surprise. This feeling of surprise is the result of rewarding stimuli. Rewards are attractive; they are motivating and make us exert an effort. Anything that makes an individual come back for more is a positive reinforcer and therefore a reward (Schultz, 2015). One important function of the positive reward is to maintain an active repertoire of behavior (Ferster & Skinner, 1957). Thus, unknown videos get users addicted to swiping up the screen to watch more. Besides, videos play on a loop until people swipe up or down. In this way, people immerse themselves in the current video without distractions. If the user lets a video loop more than three times, TikTok assumes that the user likes it. Consequently, the “share” icon changes into the icon of another social media platform to encourage the user to share the video there.

Figure 7. The change of share icon in TikTok.

TikTok also highlights music attributes, such as the display of music information. A critical reason for TikTok’s success is its soundtracks. The soundtrack of every video can be reused directly by other people, which lowers the cost of shooting videos, encourages users to create, and produces hot spots. After the user clicks the record icon, the app jumps to the music selection page. Users can press “use this sound” to use the same soundtrack in their own videos.

Figure 8. Soundtrack link in TikTok.

System and modularity

Technologies are built from a hierarchy of technologies. A technology consists of a main assembly and supporting assemblies, and each assembly or subsystem must be organized this way too (Arthur, 2009). All the functions of TikTok are made up of several small modules. The video shooting function of TikTok consists of a video module and an audio module. After users input a video and choose music, TikTok decodes and combines them and finally outputs the complete video on the platform. The video editing technology can be divided into modules such as face recognition, real-time capture, and beauty algorithms. Working at the same time, these modules provide the filters and effects used on the video.

Figure 9. TikTok effects and beauty function.

The technology of recommending videos is also a consequence of modularity. What you watch next is decided by different algorithm modules that weigh signals such as the content of the videos that you like, the number of comments, and the number of shares. What can be studied is always a relationship or an infinite regress of relationships. Never a ‘thing’ (Bateson, 2000). TikTok is not an isolated product but the result of various interdependent subsystems and modules working together.
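
One way to picture how such signals could be combined is a weighted score. The sketch below is not TikTok's algorithm: the `Candidate` fields, the `WEIGHTS` values, and the normalization are assumptions made only to illustrate several signal modules feeding one ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    video_id: str
    topic_match: float  # 0..1, similarity to content the viewer liked (assumed input)
    comments: int
    shares: int


# Assumed weights, chosen only for illustration.
WEIGHTS = {"topic_match": 0.6, "comments": 0.2, "shares": 0.2}


def normalize(value: int, maximum: int) -> float:
    # Scale a raw count into 0..1 relative to the largest count in the batch.
    return value / maximum if maximum else 0.0


def rank(candidates: List[Candidate]) -> List[str]:
    max_comments = max((c.comments for c in candidates), default=0)
    max_shares = max((c.shares for c in candidates), default=0)

    def score(c: Candidate) -> float:
        return (
            WEIGHTS["topic_match"] * c.topic_match
            + WEIGHTS["comments"] * normalize(c.comments, max_comments)
            + WEIGHTS["shares"] * normalize(c.shares, max_shares)
        )

    # Highest score first: this is the order in which videos would be shown.
    return [c.video_id for c in sorted(candidates, key=score, reverse=True)]
```

With these weights, a video that closely matches the viewer's taste can outrank one that merely has more comments and shares.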

Figure 10. Main function and module of TikTok.

Short-form video technology

There are two types of TikTok video: live video and recorded video. In this part, I explain how TikTok delivers each of them.

As an application, TikTok operates on top of the layers of the internet. Each layer only processes data to and from the layers it connects to, and is designed not to “know” or have to deal with all the complex variables handled by the other layers (Irvine). When we launch TikTok, the application layer provides the main interface through internet protocols such as HTTP. When we watch videos, the transport layer carries the requested data from the database across the internet and delivers it to the application layer. The core transport protocol is TCP, part of the TCP/IP suite. To be specific, TikTok sends requests to its back end through the internet. The back end is the system that supports the TikTok app running on mobile phones. The back end then sends a request to its database. The database finds what the user needs and sends it to the back end, which returns it to the interface on the phone. All of this happens within about a second to produce what we see on TikTok.
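
The request path from the app to the back end to the database can be sketched as three small classes. This is a simplified in-memory model, not a description of TikTok's servers: the class names, the `/feed` path, and the response fields are all assumptions, and no real network call is made.

```python
from typing import Dict, List


class Database:
    """Stand-in for the video database."""

    def __init__(self, videos: Dict[str, List[str]]) -> None:
        self._videos = videos

    def query(self, user_id: str) -> List[str]:
        return self._videos.get(user_id, [])


class BackEnd:
    """Receives requests from the app and talks to the database."""

    def __init__(self, database: Database) -> None:
        self._database = database

    def handle_request(self, request: Dict[str, str]) -> Dict[str, object]:
        # Hypothetical route; unknown paths get a 404-style response.
        if request.get("path") != "/feed":
            return {"status": 404, "videos": []}
        videos = self._database.query(request["user_id"])
        return {"status": 200, "videos": videos}


class App:
    """The interface on the phone; it only knows about the back end."""

    def __init__(self, back_end: BackEnd) -> None:
        self._back_end = back_end

    def open_feed(self, user_id: str) -> List[str]:
        response = self._back_end.handle_request(
            {"path": "/feed", "user_id": user_id}
        )
        return response["videos"]
```

As in the layered model, the `App` never touches the `Database` directly; each part only talks to the part next to it.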

Influencers or celebrities usually initiate live streaming videos on TikTok. When they use equipment to broadcast video, an encoder collects the data and compresses it into a video stream that can be transmitted and watched. The encoder itself may be inside the camera, but it can also be a stand-alone device, computer software, or a mobile application. After that, the video data are packaged using a real-time transmission protocol for transmission over the internet (Miller, 2018). Then, the media server in the cloud receives all the video data and turns them into streaming video.

Recorded videos on TikTok consist of a continuous sequence of images. The capture chip inside the phone acts as a server for internet video. It receives the analog signal recorded through TikTok and turns it into digital information at a rate of 30 frames a second (White, 2007). Then, the capture device sends the information through a compression standard. The compression algorithm divides the video into frames and transmits only the parts that differ between frames, so less data is sent and videos play smoothly. Videos on TikTok use the H.264 compression standard. The H.264 standard represents coding efficiency enhancement and flexibility for effective use over a wide variety of network types and application domains. It differs from previous technology in enhanced motion prediction capability, use of a small block-size exact-match transform, adaptive in-loop deblocking filter, and enhanced entropy coding methods (Wiegand, Sullivan, Bjontegaard, & Luthra, 2003).
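
The idea of sending only what changes between frames can be illustrated with a toy delta encoder. This is a deliberate simplification and not H.264, which uses motion prediction, block transforms, and entropy coding. Here a frame is assumed to be a flat list of pixel values, and the function names `encode` and `decode` are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[int]      # one frame as a flat list of pixel values (assumption)
Delta = Dict[int, int]  # pixel index -> new value, for changed pixels only


def encode(frames: List[Frame]) -> Tuple[Frame, List[Delta]]:
    """Keep the first frame whole, then store only the changed pixels."""
    if not frames:
        raise ValueError("need at least one frame")
    key_frame = list(frames[0])
    deltas: List[Delta] = []
    previous = key_frame
    for frame in frames[1:]:
        if len(frame) != len(previous):
            raise ValueError("all frames must have the same size")
        deltas.append(
            {i: new for i, (old, new) in enumerate(zip(previous, frame)) if old != new}
        )
        previous = frame
    return key_frame, deltas


def decode(key_frame: Frame, deltas: List[Delta]) -> List[Frame]:
    """Rebuild every frame by applying each delta to the previous frame."""
    frames = [list(key_frame)]
    for delta in deltas:
        frame = list(frames[-1])
        for index, value in delta.items():
            frame[index] = value
        frames.append(frame)
    return frames
```

When consecutive frames are nearly identical, each delta holds only a handful of entries, which is why less data needs to be transmitted.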

One of the highlights of TikTok is the vibrant and exciting soundtrack that accompanies its videos. The audio we hear on our phones and computers is digital-analog audio (White, 2007). To provide music within the limits of a mobile app, the audio module in TikTok processes sound in three steps. Firstly, it treats sound as a signal and digitizes it by sampling. Secondly, it encodes the digital signal as binary bits. Thirdly, TikTok records the sampled and quantized data in a specific format so that the music can be played, copied, and retrieved.
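
The three steps above (sampling, quantizing, and encoding as bits) can be sketched with the standard library alone. The function names and the choice of a sine wave as the input sound are assumptions for illustration; real audio formats add compression and metadata on top of this.

```python
import math
from typing import List


def sample(frequency_hz: float, sample_rate_hz: int, n_samples: int) -> List[float]:
    # Step 1: measure the signal at regular intervals (values in -1..1).
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def quantize(samples: List[float], bits: int) -> List[int]:
    # Step 2: map each sample to one of 2**bits integer levels.
    max_code = 2 ** bits - 1
    codes = []
    for s in samples:
        clipped = min(1.0, max(-1.0, s))  # keep values inside -1..1
        codes.append(round((clipped + 1.0) / 2.0 * max_code))
    return codes


def to_bits(codes: List[int], bits: int) -> str:
    # Step 3: write each level as a fixed-width binary string.
    return "".join(format(code, f"0{bits}b") for code in codes)
```

More bits per sample give finer levels and better sound, at the cost of more data to store and transmit.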


In conclusion, there are two reasons why TikTok is addictive: first, excellent interaction design that brings a good user experience and a sense of agency; second, precise algorithms and technical support. The example of TikTok shows that what makes it popular is not the invention of new technology but its ability to combine various existing technologies. All the video-related features can be found on TikTok, so users can satisfy their needs in one application. Although TikTok differs in some ways from other video applications, the design principles behind them are the same. They are all sociotechnical artifacts. As Vermaas has said, technology is an expression of our endeavors to adapt to the world in which we live to meet our needs and desires. Technological action may, therefore, be termed a form of goal-oriented human behavior aimed at primarily resolving practical problems (Vermaas, Kroes, van de Poel, Franssen, & Houkes, 2011). TikTok acts as a medium between culture and society. Through videos, it spreads cultural values as well as popular topics and establishes connections between strangers. Beyond the content that it displays, the significant function of TikTok is to extend people’s recognition of the diversity of the world.

Admittedly, TikTok has changed how people entertain themselves and socialize online. But we also need to be cautious about its fragmentation of information. Fragmented information could make people used to receiving short pieces of content and make it harder for them to concentrate on long, complete works such as books.

Works Cited

Arthur, W. B. (2009). The nature of technology: What it is and how it evolves. Simon and Schuster.

Bateson, G. (2000). Steps to an ecology of mind: Collected essays in anthropology, psychiatry, evolution, and epistemology. University of Chicago Press.

Borak, M. (2018, July 24). Douyin is the most downloaded app in the Apple App Store · TechNode. Retrieved from

Cooper, A., Reimann, R., & Cronin, D. (2007). About face 3: the essentials of interaction design. John Wiley & Sons.

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement.

Graziani, T. (2018, October 21). How Douyin became China’s top short-video App in 500 days. Retrieved from

Iqbal, M. (2019, February 27). TikTok Revenue and Usage Statistics (2019). Retrieved from

Irvine, M. The Internet: Design Principles and Extensible Futures.

Lidwell, W., Holden, K., & Butler, J. (2010). Universal principles of design, revised and updated: 125 ways to enhance usability, influence perception, increase appeal, make better design decisions, and teach through design. Rockport Pub.

Miller, J. (2018, December 6). Live Video Streaming: How It Works: Wowza. Retrieved from

Murray, J. H. (2011). Inventing the medium: principles of interaction design as a cultural practice. MIT Press.

Schultz, W. (2015). Neuronal reward and decision signals: from theories to data. Physiological reviews, 95(3), 853-951.

Tung, H., & Zhang, Z. (2018, July 24). 8 Lessons from the rise of Douyin (Tik Tok) · TechNode. Retrieved from

Vermaas, P., Kroes, P., van de Poel, I., Franssen, M., & Houkes, W. (2011). A philosophy of technology: from technical artefacts to sociotechnical systems. Synthesis Lectures on Engineers, Technology, and Society, 6(1), 1-134.

White, R. (2014). How computers work: the evolution of technology. Pearson Education.

Wiegand, T., Sullivan, G. J., Bjontegaard, G., & Luthra, A. (2003). Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 560-576.

Yang, Y. (2018, July 17). Tik Tok racks up 500 million global MAU as short video craze continues. Retrieved from