A Sociotechnical Approach to Software as Politics

Mariana Leyton Escobar


This essay uses secondary sources, mainly Gabriella Coleman’s “Coding Freedom: The Ethics and Aesthetics of Hacking” (2012) and Manuel Castells’ “The Internet Galaxy: Reflections on the Internet, Business, and Society” (2003), to perform a preliminary analysis of how the development of software came to be the center of two ways of thinking about technology. The main concern is how this question could be explored with the method proposed by actor-network theory. The findings set the stage for a more focused analysis based on primary data collection.

In a compelling anthropological account of the evolution of the free and open source software culture, Gabriella Coleman shares the following poem:

Programmers’ art as

that of natural scientist

is to be precise.

complete in every detail of description, not

leaving things to chance.

reader, see how yet

technical communicants

deserve free speech rights;

see how numbers, rules,

patterns, languages you don’t

yourself speak yet.

still should in law be

protected from suppression,

called valuable speech!

(Schoen, cited in Coleman 2012, p. 161)

This poem is compelling in itself, making the case for why programming code should be considered free speech and thus protected as such. Indeed, as Coleman (2012) explores in her study, the free and open source movement developed a culture around “broad, culturally familiar visions of freedom, free speech, rights, and liberalism that harks back to constitutional ideals” (p. 2). In this sense, the poem is made more complex because it represents a movement with specific ideals. But it goes beyond that: the poem, written in 1999, was actually part of a larger, worldwide protest against the arrest of the then sixteen-year-old free and open source software developer Jon Johansen (Coleman, 2012).

One of the ways in which DVDs are protected so that they are not copied and distributed without permission is to build encryption into them, something known as digital rights management (DRM). DRM refers to access control technologies developed to restrict the use of proprietary hardware and copyrighted works, and the Digital Millennium Copyright Act (DMCA) of 1998 established them as software not to be meddled with. In other words, the DMCA establishes, among other things, that the “production and dissemination of technology, devices, or services intended to circumvent” measures that control access to copyrighted works, such as DRM technologies, is a crime.

Johansen had written, along with two anonymous developers, a piece of software called DeCSS that would allow people to unlock the encryption encoded in DVDs to control their distribution. The poem is in fact a transcoding of the code of that software (Coleman, 2012, pp. 161, 170). A piece of the code can be seen in the image below, a snapshot of a page in Coleman’s book.

A piece of the contested code (Coleman, 2012).

The poem then becomes a technical artifact that is part of a complex sociotechnical system built around the philosophy of creating and sharing software free of restrictive intellectual property rights.

That sentence contains several components that will be expanded in this essay in order to explore the free and open source software movement as a sociotechnical system that has emerged in parallel to the commercial software sociotechnical system with the development and expansion of personal computers, the Internet, and the web. By following the actor-network theory method proposed in science and technology studies (STS), the essay will offer a preliminary analysis of how the development of software came to be the center of two ways of thinking about technology. Using secondary sources, it will evaluate the types of nodes and links that would need to be followed to explore this question in a subsequent, more focused study.

Sociotechnical Systems

“To conceive of humanity and technology as polar opposites is, in effect, to wish away humanity: we are sociotechnical animals, and each human interaction is sociotechnical. We are never limited to social ties. We are never faced only with objects.” (Latour, 1999, p. 214)

In 1980, Langdon Winner, an STS scholar, published the influential essay “Do Artifacts Have Politics?”, arguing that they do. In his view, technology should not be seen from a deterministic perspective, by which it is expected to have specific impacts on society. At the same time, he calls attention to the fact that social determinist theories of technology, which consider not the technology but the socioeconomic system in which it is embedded, go too far in removing any interest from the technology itself. Without denying the usefulness of a social constructivist approach, Winner argued that to understand how artifacts have politics, the technological artifacts themselves have to be taken seriously. Without focusing on a specific technology, his argument is that artifacts have politics insofar as they are the result of structuring design decisions, decisions that, once the artifact is finalized and put in the world, influence “how people are going to work, communicate, travel, consume, and so forth over a very long time” (Winner, 1980, p. 5).

A good example of both ideas, that a technological artifact can structure how people organize and that this influence can last for a long time, is the QWERTY keyboard configuration. The QWERTY design does not favor any specific design requirement, either for the users or for the hardware (or now software) that holds it, and yet it has not changed since its inception and is likely to last. Paul David (1985) offers a great account of the “one damn thing follows another” story that led to this situation, based on the concept of path dependence. This economic concept explains how certain outcomes can result from “historical accidents” or chance “rather than systemic forces” (p. 332).

Among the three factors David identifies as determinant in the history of the QWERTY keyboard is “technical interrelatedness” (p. 334), the need for system compatibility or interoperability among the different parts of a technical system. The typewriter in this case was considered an instrument of production, as it was at first mostly bought by businesses that would invest in training workers to memorize and efficiently use the QWERTY keyboard. Thus, the compatibility that was valued by the time the market for typewriters started to grow circa 1890 was that of the keyboard with human memory. In this way, not only the keyboard, but a specific design of a keyboard, structured the organization and budget of businesses in a way that eventually led to our still using a layout designed for typing with ten fingers on phones on which we type with two thumbs. This back and forth, with technology structuring social forces and then being shaped by those very forces, is at the very center of what is meant by a sociotechnical system.

A definition

Through a philosophical characterization of technical artifacts (as opposed to natural or social objects) and their context of use, Vermaas et al. (2011) propose a baseline concept of the matter at hand. To begin with, a system can be defined as “an entity that can be separated into parts, which are all simultaneously linked to each other in a specific way” (p. 68). A sociotechnical system is a hybrid system, that is, a system in which the components that make it up are essentially different, or in the authors’ words, “components which, as far as their scientific description goes, belong in very many different ‘worlds’” (p. 69). A sociotechnical system is then a hybrid system in which certain components are described by the natural sciences and others by the social sciences (ibid.). In such a system, there can be many users at one time, and they can take on the role of user, operator, or both (ibid.). A sociotechnical artifact is then the “redefinition of technology” as a node in a sociotechnical system (Irvine, 2016).

The Social and the Technical?

Recognizing the effect that the cultural structuring of technological innovations could have, and that social and/or cultural developments could be understood by looking at the technical base of such developments, Régis Debray (1999) proposed mediology as a methodology to explore “the function of a medium in all its forms, over a long-time span (since the birth of writing), and without becoming obsessed by today’s media” (p. 1). Indeed, Debray did not refer to a study focused on “the media,” but to one focused on the relationship between what he calls “social functions” such as “religion, ideology, art, politics” and the “means and medium/environment [milieux] of transmission and transport” (ibid.). The focus of this methodology is on the relations between “the social” and “the technical,” but it expands the definition of the latter to include not just the technical artifact, the medium, but also its environment.

While Debray’s (1999) proposal expands what is to be understood by “the technical,” he maintains a duality between that and “the social,” something that actor-network theory (ANT), another method to explore sociotechnical systems, removes. Bruno Latour, one of the key proponents of this approach, argues that such dualism needs to be discarded because it is misguided and has only served to hide a more complex reality: that humans are “sociotechnical animals, and each human interaction is sociotechnical” (1999, p. 214). In Pandora’s Hope (1999), Latour offers a “mythical history of collectives” in which he explores eleven levels through which humans and non-humans (actants) are theorized to have co-evolved, as well as four interpretations of what technical mediation means, to explain how humans and non-humans can “fold into each other.” His theoretical analysis aims to show that humans and non-humans are part of one and the same process, one that has unfolded throughout history and resulted in the current “collective,” ANT’s term for the assemblage of humans and non-humans, used instead of the term “society.”


Technical mediation and four moments of association in ANT

The four ways in which technology acts as a mediator are key to using ANT as a method of analysis, as they are the means by which agency is distributed in a network. Collectives change as humans and non-humans articulate different associations among themselves according to specific purposes:

  • Translation: the means by which two or more actors (human or non-human) articulate their individual goals.
  • Composition: the means by which the articulated individual goals become a different, composite one through successive translation.
  • Enrollment: the process by which the association formed jointly produces outputs through a blackboxed process (a process in which only inputs and outputs can be observed, while the process between them is not easily discernible). This moment can vary depending on how many components are coming together, their type of goals, etc. Once the actors align their goals and create a blackbox, they become, as one, a new actant, leading to the last step.
  • Displacement: the creation of a new hybrid, a composite of human(s) and non-human(s), which forms a new collective with distinct goals and capacities. (Latour, 1999, pp. 176–198)

ANT as a methodology can then be used to understand how agency is distributed in different phenomena (not just “social” phenomena but hybrid phenomena) of which sociotechnical artifacts are a part. To apply it, Latour (2007) explains, it is necessary to be extremely observant and to collect all data showing traces of human or non-human components establishing links with each other to pursue certain goals. By doing this, and through a process of thorough description of thick data, he suggests it is possible to understand how agency is distributed among humans, non-humans, mediators, events, and the blackboxes that hide some assemblage of them (Latour, 2007).

By retracing these links, reversing the blackboxing, and exploring their historicity, we can use ANT to understand why sociotechnical systems work the way they do, at what moments there were alternatives, and in what way the system found some level of equilibrium by blackboxing some assemblages. In this case the focus will be on understanding how the development of software came to be the center of two ways of thinking about technology.
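As a rough illustration of this vocabulary, and not a procedure that Latour or this essay specifies, an actor-network can be sketched as a small graph in which human and non-human actants are nodes, associations are links, and a blackbox is a composite node that hides its members until it is reopened. All names below (`Actant`, `Network`, `blackbox`, `open_blackbox`) are hypothetical, and the example nodes are only a simplified reading of the DeCSS case described earlier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actant:
    """A node in the network; humans and non-humans are treated alike."""
    name: str
    human: bool


@dataclass
class Network:
    """A toy actor-network: undirected associations between actants."""
    links: set = field(default_factory=set)     # frozenset pairs of Actants
    hidden: dict = field(default_factory=dict)  # blackbox -> folded contents

    def associate(self, a, b):
        self.links.add(frozenset((a, b)))

    def neighbors(self, actant):
        return {other for link in self.links if actant in link
                for other in link if other != actant}

    def blackbox(self, name, members):
        """Fold members into one new actant; only outside links stay visible."""
        box = Actant(name, human=False)
        members = set(members)
        inner = {l for l in self.links if l <= members}
        outer = {l for l in self.links if l & members and not l <= members}
        self.links -= inner | outer
        for link in outer:
            (outside,) = link - members
            self.associate(box, outside)
        self.hidden[box] = (members, inner, outer)
        return box

    def open_blackbox(self, box):
        """Reverse the blackboxing and restore the original associations."""
        members, inner, outer = self.hidden.pop(box)
        self.links = {l for l in self.links if box not in l} | inner | outer
        return members


# Illustrative nodes only (an assumption, not data collected for the essay).
johansen = Actant("Johansen", human=True)
decss = Actant("DeCSS source code", human=False)
dvd = Actant("DVD encryption", human=False)
dmca = Actant("DMCA", human=False)
poem = Actant("Schoen's poem", human=False)

net = Network()
net.associate(johansen, decss)
net.associate(decss, dvd)
net.associate(dvd, dmca)
net.associate(poem, decss)

box = net.blackbox("DeCSS tool", [johansen, decss])
visible = net.neighbors(box)        # what an observer sees from outside
reopened = net.open_blackbox(box)   # retracing the links hidden inside
```

While the blackbox is closed, only its outside associations (here the DVD encryption and the poem) can be observed; reopening it restores the link between the developer and the code. A real ANT study would of course rest on thick description rather than a data structure, so this is only a mnemonic for the terms.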


ANT is a theory filled with new terminology that can be very confusing. As a thorough account of it goes beyond the scope of this essay, I include as a supplement to this section a selection from the glossary Latour provides in Pandora’s Hope (1999).


Blackboxing

An expression from the sociology of science that refers to the way scientific and technical work is made invisible by its own success. When a machine runs efficiently, when a matter of fact is settled, one need focus only on its inputs and outputs and not on its internal complexity. Thus, paradoxically, the more science and technology succeed, the more opaque and obscure they become. (p. 304)


Collective

Unlike society*, which is an artifact imposed by the modernist settlement*, this term refers to the associations of humans and nonhumans*. While a division between nature* and society renders invisible the political process by which the cosmos is collected in one livable whole, the word “collective” makes this process central. Its slogan could be “no reality without representation.” (p. 305)


Concrescence

A term employed by Whitehead to designate an event* without using the Kantian idiom of the phenomenon*. Concrescence is not an act of knowledge applying human categories to indifferent stuff out there but a modification of all the components or circumstances of the event. (p. 305)


Event

A term borrowed from Whitehead to replace the notion of discovery and its very implausible philosophy of history (in which the object remains immobile while the human historicity of the discoverers receives all the attention). Defining an experiment as an event has consequences for the historicity* of all the ingredients, including nonhumans, that are the circumstances of that experiment (see concrescence). (p. 306)


Historicity

A term borrowed from the philosophy of history to refer not just to the passage of time (1999 after 1998) but to the fact that something happens in time, that history not only passes but transforms, that it is made not only of dates but of events*, not only of intermediaries* but of mediations*. (p. 306)


Mediation

The term “mediation,” in contrast with “intermediary*,” means an event* or an actor* that cannot be exactly defined by its input and its output. If an intermediary is fully defined by what causes it, a mediation always exceeds its condition. The real difference is not between realists and relativists, sociologists and philosophers, but between those who recognize in the many entanglements of practice* mere intermediaries and those who recognize mediations. (p. 307)


Nature

Like society*, nature is not considered as the commonsense external background of human and social action but as the result of a highly problematic settlement* whose political genealogy is traced throughout the book. The words “nonhumans*” and “collective*” refer to entities that have been freed from the political burden of using the concept of nature to shortcut due political process. (p. 309)


Nonhuman

This concept has meaning only in the difference between the pair “human-nonhuman” and the subject-object dichotomy. Associations of humans and nonhumans refer to a different political regime from the war forced upon us by the distinction between subject and object. A nonhuman is thus the peacetime version of the object: what the object would look like if it were not engaged in the war to shortcut due political process. The pair human-nonhuman is not a way to “overcome” the subject-object distinction but a way to bypass it entirely. (p. 308)


Settlement

Shorthand for the “modernist settlement,” which has sealed off into incommensurable problems questions that cannot be solved separately and have to be tackled all at once: the epistemological question of how we can know the outside world, the psychological question of how a mind can maintain a connection with an outside world, the political question of how we can keep order in society, and the moral question of how we can live a good life (to sum up, “out there,” “in there,” “down there,” and “up there”). (p. 310)


Society

The word does not refer to an entity that exists in itself and is ruled by its own laws by opposition to other entities, such as nature; it means the result of a settlement* that, for political reasons, artificially divides things between the natural and the social realms. To refer not to the artifact of society but to the many connections between humans and nonhumans*, I use the word “collective*” instead. (p. 311)


Computing and the Internet — Communities, Programming, and Values

The history of computing and the Internet has been told from many perspectives over the years, and a theme that emerges consistently is how different communities of users emerged and co-evolved alongside the technology in different ways. This section will highlight how this co-evolution is not determined by the technologies themselves, but by the interactions between actors who use, tinker with, and expand on the technology, and how the technology changes along with these actions. In this way, computing, networking, and software can be seen as sociotechnical artifacts that are part of a sociotechnical system. They don’t evolve on their own and don’t determine what people do with them. Users and technologies come together to develop a sociotechnical system in which users can use and/or create applications for computers and the Internet, which in turn shape the way in which users and technologies assemble. In this process, blackboxing can take place in a variety of places, but the focus here will be on how the development of software came to be the center of two ways of thinking about technology.

In Great Principles of Computing, Denning and Martell (2015) explain how computing can be understood as a science in itself because, in its most abstract conception, it is a matter of processing information. As such, computing can be applied to a number of different domains (such as security, artificial intelligence, data analytics, networking, robotics, etc.) because, as a method to process and generate information, it is about following certain principles that can be combined in a number of different ways in different domains to achieve different objectives (Denning & Martell, 2015, pp. 13–15). Computing as a method in itself then does not determine what can be done, but can guide its application through principles based on communication, computation, recollection, coordination, evaluation, and design (ibid.). As such, computing opens up a world of opportunities for those interested in developing a computing application for a specific domain. This is what Mahoney (2005) explores in the different histories that emerged as communities of practitioners got together to develop specific domains, thus bringing more attention to those aspects facilitated by computing. He focuses on the different aspects of computing that were developed by different groups, such as data processing and management, whether for the scientists and engineers creating it, for the private sector, or for government.


Software is how we “put the world into computers” (Mahoney, 2005).

Mahoney emphasizes that historians of computing are only beginning to explore the history of software. While he stresses the importance of shifting the focus from the machine to its use, history, and design in order to understand this history properly, he also says that “associated tasks such as analysis, programming, or operation” need to be understood. This echoes Latour’s call to analyze the traces of all activities in a sociotechnical system. For Mahoney, understanding the history of software was important because software is what “actually gets things done outside the world of the computer itself,” and the communities that develop software are the ones filling the gap between “what we can imagine computers doing and what we can actually make them do” (Mahoney, 2005, p. 128). He says that in not understanding this history, we miss that this process is not determined, and so we don’t learn what the alternatives are. This is important because software is how we “put the world into computers,” and doing that depends on “how we can represent in the symbols of computation portions of the world of interest to us and how we can translate the resulting transformed representation into desired actions” (Mahoney, 2005, pp. 128–9). The history of computing, then, is not just about transistors, chips, and screens, but about how different groups of people used such components to develop some areas, based on the principles of computing and in terms of their own interests, in a way that selects how to represent the world.

To put this more concisely, Alan Kay described the computer as a meta-medium, a medium whose content is “a wide range of already-existing and not-yet-invented media” (Manovich, 2013, p. 44). Because computing doesn’t set rules for what can be done with computers, only principles to follow when using computing in general (Denning & Martell, 2015), the range of the “not-yet-invented media” remains wide. Moreover, technology in general (not just computers) also follows two key principles, “cumulative combinatorial design” and “recursiveness,” which explain that technologies are made of components of previously made technologies, and can themselves be used as components later on (Arthur, 2011).

To the extent that the computer was developed to be a general-purpose machine, and the Internet was designed as a general-purpose “dumb network” meant only to transport data, users can develop applications for this meta-medium by developing software. In doing so, and following the principles mentioned, users can use software to represent and combine the formats of previously existing media, remix them, and expand on them, thus contributing to the meta-medium. If the computer allowed users to manipulate information more easily, the Internet added to that by allowing users to do so while connecting with each other.

In that way, in Software Takes Command, Manovich (2013) shares Mahoney’s concern for software by explaining that “software has become our interface to the world, to others, to our memory and our imagination: a universal language through which the world speaks, and a universal engine on which the world runs,” and yet its history has remained mostly unexplored (p. 2). For Manovich, the key thing to understand about software and its representational function is that, by digitizing information so it can speak the language of computers, we transform it in a substantial way:

“In new media lingo, to “transcode” something is to translate it into another format. The computerization of culture gradually accomplishes similar transcoding in relation to all cultural categories and concepts. That is, cultural categories and concepts are substituted, on the level of meaning and/or the language, by new ones which derive from computer’s ontology, epistemology and pragmatics. New media thus acts as a forerunner of this more general process of cultural re-conceptualization.” (Manovich, 2002, p. 64)

Under this light, the focus on software development is rightly emphasized, as it turns out that writing code and algorithms to “put the world into computers” entails a decision-making process about what to represent of the world and how to do it. To the extent that more of our activities are mediated by software-based technologies, they are being mediated by decisions that had to weigh alternative ways of representing the world to begin with. At the same time, our networked technologies have developed in such a way that the use of some software is highly distributed across the globe, and so the interactions among users and developers of software can be seen as a sociotechnical system made of a wide array of human and non-human components, including the designers and users of software, as well as all the components necessary for software to exist and function.


Software as a Sociotechnical System

To explore software as a sociotechnical system then would entail exploring the history of computing and the development of the Internet, along with a whole array of details depending on what aspect of software one is interested in.

In this case, the focus is on how software became the center of two ways of thinking about technology as evidenced by the emergence of a community that values the “free and open” aspects of software, while another one emerged that valued the commercial aspects of software while promoting the idea of “quality software.”

Gabriella Coleman’s (2012) anthropological account of the free and open source software community, and of the way in which its members developed technological and material practices along with their own vision of liberal ideas, together with Manuel Castells’ (2003) sociological explanation of how four layers of “Internet culture” developed with the emergence and initial expansion of the Internet, will serve as secondary data to explore the initial nodes and links in the sociotechnical network that would need to be followed to account for such development. While neither author uses ANT, both emphasize the interaction of networked individuals and collectives with technology and, without falling into a techno-deterministic approach, give technology sufficient “importance” to guide an ANT analysis that would put such technology on the same footing as human actors.

Emergence of a community

While free and open source software is not a new concept, as this was the method used to develop and share software in the initial stages of computing, the focus on it as a philosophy for thinking about technology developed more recently. Coleman (2012) explores how a community around free and open source software (FOSS) developed internationally as self-identified hackers were able to connect with each other around FOSS projects and thus develop two main components: a material one based on the practice of developing software, and their own vision of liberal ideals. As she explores the ways in which the FOSS community struggled with intellectual property laws in order to promote a system of software development that did not necessarily commodify it, she finds that the community values the liberal idea of free speech but opposes that of commodifying everything. A romanticized interpretation of liberalism is what soothes this tension (pp. 3–4). In her account, besides the history of encounters with the law (part of which is explored in the first segment of this essay), another important moment emerges as software commercialization begins to boom and the still small community of software developers develops two ways of thinking about software. In 1976, after it became clear that hackers were sharing the source code for Microsoft products, Bill Gates wrote a letter to the then-called “hobbyists” in an attempt to explain why developing software outside of a commercial venture would not be sustainable, as it would not produce “quality software” (p. 65). A decade and many more developments later, Richard Stallman was establishing the Free Software Foundation, the GNU Manifesto, and the General Public License.

For Castells (2003), four main “Internet cultures” emerged as the Internet propagated: the techno-elites, the hackers, the virtual communitarians, and the entrepreneurs. The techno-elites were the original Internet architects and the community that spread from there, which valued meritocracy and openness both in their method of work and in their design, which is why the Internet is based on open standards. For hackers, however, open source was not enough: it also had to be free, not in terms of cost, but in terms of freedom to share, understand, and tinker. Castells argues that while the Internet was developed with open Internet protocols, the concern was radicalized by the hacker culture with the “struggles to defend the openness of the UNIX source code” (pp. 39–43). Such struggles eventually turned into the movement for free and open source software explored by Coleman. The other two layers also help to understand the sociotechnical context of these developments. On the one hand, the virtual community aspect of Internet culture calls attention to the ease with which users can form networked communities across the globe, an important aspect of the FOSS movement. On the other, the entrepreneurial layer brings to the fore the opposing force that made software the focus of a discursive battle (pp. 52–60). With the advent of digital technologies, a market for new digital products emerged, and with it the eagerness to protect the intellectual property of those products.

An encoded poem as a piece of the sociotechnical

From both accounts, the free and open source software community must be read as global and as part of a network that includes the history of computing and the Internet, the history of the expansion of these technologies, the history of intellectual property law (as well as its global expansion), and their different ideological, cultural, economic, and political contexts. As explored by Coleman (2012), the FOSS culture has spread not only through the development of software but through the sharing of such development, online and offline, as she discovers the importance of in-person events for these hackers. As theorized by Castells (2009), the power that networked communities can leverage with the Internet and related technologies has changed, and it has the potential to have global impacts. To the extent that the FOSS community continues to expand and to openly challenge liberal ideals and ways of thinking about software and technology in general, understanding this complex sociotechnical network is pressing.

The poem quoted above, under this light, becomes a much more complex piece of the sociotechnical puzzle. It is an expression in the name of freedom that not only makes a cultural and political statement by equating code with speech, but also takes the form of a protest artifact by being the transcoding of a piece of contested software. That software is itself a transcoding of one way to represent the world in the world of networked computers, one way that turned out to activate a network of legal, economic, and political arrangements that, in affecting that piece of software, affect all other coded speech. In such a way, this artifact does indeed have politics, but under the light of ANT, it does so in a much more complex way than it sounds.


References

Arthur, W. B. (2011). The Nature of Technology: What It Is and How It Evolves (Reprint ed.). New York: Free Press.
Castells, M. (2003). The Internet Galaxy: Reflections on the Internet, Business, and Society (1st ed.). Oxford: Oxford University Press.
Castells, M. (2009). Communication Power. Oxford: Oxford University Press.
Coleman, E. G. (2012). Coding Freedom: The Ethics and Aesthetics of Hacking. Princeton: Princeton University Press.
David, P. A. (1985). Clio and the Economics of QWERTY. The American Economic Review, 75(2), 332–337.
Debray, R. (1999, August). What is Mediology? Le Monde Diplomatique.
Denning, P. J., & Martell, C. H. (2015). Great Principles of Computing. Cambridge, Mass.: The MIT Press.
Irvine, M. (2016). Understanding Media, Mediation, and Sociotechnical Artefacts: From Concepts and Hypotheses to Methods for De-Blackboxing. Communication, Culture & Technology, Georgetown University.
Latour, B. (1999). Pandora’s Hope: Essays on the Reality of Science Studies. Cambridge, Mass.: Harvard University Press.
Latour, B. (2007). Reassembling the Social: An Introduction to Actor-Network-Theory (1st ed.). Oxford; New York: Oxford University Press.
Mahoney, M. S. (2005). The histories of computing(s). Interdisciplinary Science Reviews, 30(2), 119–135.
Manovich, L. (2002). The Language of New Media (Rev. ed.). Cambridge, Mass.: The MIT Press.
Manovich, L. (2013). Software Takes Command (INT ed.). New York; London: Bloomsbury Academic.
Vermaas, P., Kroes, P., Franssen, M., Poel, I. van de, & Houkes, W. (2011). A Philosophy of Technology: From Technical Artefacts to Sociotechnical Systems. San Rafael, Calif.: Morgan & Claypool Publishers.
Winner, L. (1980). Do Artifacts Have Politics? Daedalus, 109(1), 121–136.

Researching design principles of Magoosh test preparation portal from technical and social perspectives

Course: CCTP – 820: Leading by Design: Principles of Technical and Social Systems

Instructor: Prof. Martin Irvine

From: Galib Abbaszade

Fall 2016


Final Paper

Subject: Researching design principles of Magoosh test preparation portal from technical and social perspectives


Magoosh is an online test preparation portal that helps students pass the standardized tests (such as the SAT, GRE, GMAT, and TOEFL) required for admission to undergraduate and graduate schools. This research paper examines the capacities of the Magoosh portal from the viewpoint of its design principles, which give “every student access to effective, affordable and engaging test prep tools” and methods. The teaching mechanisms provided by Magoosh are described from the angle of design thinking. To illustrate (to de-blackbox) the tutorial techniques that Magoosh’s instructors deliver through the portal’s features, this research discusses the design principles, the modularity of the site, abstraction methods, symbolic technologies, and the mathematical algorithms for assessing students’ learning progress that are applied in the portal. To support the research hypothesis, evidence such as supporting data and graphics, media and images, and references and links is embedded in the body of the text.


To meet university admission requirements, students need to obtain certain scores on tests such as the SAT and GRE. Drawing on the power and reach of the Internet, Bhavin Parikh, CEO of the Internet-based company Magoosh, and three of his companions launched this online tutorial portal in 2009 to focus on test preparation.

Because the tests are built around English language proficiency, the whole test preparation process is also conducted in English. This provides equal conditions for everyone who uses Magoosh’s preparation services, whatever their country, language, or cultural background.

In Persian, Magoosh means a wise, highly learned, and generous person. This interactive portal is designed on a combinatorial modularity approach and cognitive abstraction principles, and it provides its services by using both technological features and the accumulated experience of previous test takers and test organizers. In general, the designers of this portal did not create anything new, not even a single part of the whole system. Rather, they wisely used existing hardware and software modules to construct the portal “building” from the available blocks, such as search engines, communication and computational devices, commonly accepted media symbols, and photo and video tools. Once the portal was constructed from these “blocks”, its engineers “breathed” life into it by loading it with knowledge gained from the experience of organizations and individuals affiliated with the test-taking process. With this knowledge they could design the portal as an interactive application that is able to assess a test taker’s level of knowledge and make recommendations for the rest of the preparation period. The mathematical algorithms used by Magoosh engineers make it possible to raise or lower the difficulty of assignments immediately in response to a test taker’s accuracy rate. This progressive and regressive evaluation of answer accuracy also gives test takers the same conditions they should expect at the real exam.

To make the portal more accessible and give customers all of its options, company engineers designed a smartphone application in 2014. Using modular design principles, the portal designers could adapt the complicated test preparation process to mobile devices and let students save time by memorizing words for the verbal section or math formulas for the analytical section of their chosen exam at any convenient time and place.

While the analytical sections of the test preparation process are mostly built on mathematical rules and equations, the verbal sections are built on different types of semantic mechanisms and programs that evaluate test takers’ grammatical accuracy. The portal engineers customize various symbols and signs to help test takers practice reacting quickly to questions that must be answered within a short time frame, usually about 50 seconds each. Because all of these exams are taken through interactive Internet programs, Magoosh can use the same approach and mechanisms for tutoring and create conditions similar to the real exam environment, which gives it more opportunities to increase its number of clients.

From a social perspective, the portal brings more equity, equality, openness, and fairness to the learning process. Anyone connected to the Internet, whatever their cultural and professional background and wherever they are in the world, gets the same access to the portal’s preparation tools and the same opportunity to build the skills needed to obtain the required test scores.

Analyzing Magoosh design principles and features

This section discusses the main elements used to design the architecture of the Magoosh portal: (a) modularity, abstraction and layers as core components for designing the complex systems of the Magoosh portal; (b) cognitive artifacts and symbolic technologies; (c) media features as symbolic cognition tools for the learning process; and (d) algorithmic modules for enhancing interactions between users and the portal’s teaching techniques. At the end of this section, possible social aspects arising from the functioning of the portal are also reviewed.

(a) Modularity, abstraction and layers as core components for designing the complex systems of the Magoosh portal

The Magoosh portal is a clear example of using modular design principles to construct a multimedia online portal with multiple layers and an architectural hierarchy. The main page of the site introduces general information about the portal and shows the symbols of the tests, which act as links that open the subsystem for each test.

Assembling existing parts into a new functional system means using a combinatorial design approach to build new technologies. In this portal we observe that Magoosh engineers combined different sophisticated abstraction layers into functioning subsystems that provide a range of tutorial services.


The combinatorial design mechanism of the whole site is the principal approach for constructing the learning process of each test preparation course. (From here on, for clearer illustration, this paper discusses the design principles using the Graduate Record Examination (GRE) test.)


The GRE remains the most popular test after the TOEFL for entering higher education programs at universities in the United States.

Figures: GRE test takers by major; gender distribution

To provide a complete GRE tutorial course, the constructors of the site interconnected multiple modules (or subsystems) with numerous hidden components and functions. A combinatorial approach and modular design require that the components and modules of the whole system be connected properly through well-fitted interfaces. In addition, a working system depends on certain standards for the interoperability of its components. The Magoosh portal illustrates that combining existing technologies is an effective way to create new technology. Combinatorial design generates further modular combinations and increases the benefits and efficiency for users during the test preparation period. Each segment of the whole system may be used separately or together with a group of other modules, which provides a variety of options for the learning process.

(b) Cognitive artifacts and symbolic technologies

It is now reasonable to examine the interface design of the portal as a cognitive artifact, that is, how it presents itself to its users. The constructors of this portal brought together two important combinatorial design principles: the use of cognitive artifacts, and simplification that eases the whole teaching process and increases its efficiency. Recognizing the importance of this tutorial portal, its architects could satisfy users’ expectations through the site’s features. They made test preparation mechanisms more affordable and friendly.


The constructors of the site gathered 100 million test questions and answers in one site and provide 2 million hours of video lessons. Rather than generalizing the learning process as other tutorial approaches do, the Magoosh portal takes an individual approach, tuning the whole course to the particular needs of each student.


Using a range of media elements, such as symbolic techniques and audio and video lessons, the designers of the site make multimedia cognitive artifacts serve the tutorial needs of users. In addition, students can get individual feedback that helps them focus on their constraints and the specific skills they lack.


Through this approach each user of the site can obtain plenty of the required information, various sample tests, and assessments of her/his answers, shown in graphs and diagrams, to analyze progress across the whole learning process. To ease the course of study, the engineers of the Magoosh portal take a minimalist approach and use flat design techniques for the portal’s applications, visible in solid-colored blocks and regular geometric shapes. This helps students focus their attention on the core information or features and not be distracted by other elements on the site. The constructors of the site use graphics, images, and other digital media tools to shape the test preparation process in a more practical way. To make it more friendly, interactive, and attractive, they created an efficient interface and tools with clear affordances.


(c) Media features as symbolic cognition tools for learning process

To boost the efficiency of the learning process, the designers of the site make extensive use of different types of cognitive technologies and media features, from simple recognizable images to audio and video lessons. This even-handed approach makes the whole education process more understandable and places everyone in the same condition to be properly prepared for the real tests. It also helps the symbolic thought represented in Magoosh’s learning and software technologies to be cumulative. The portal’s media features enhance users’ cognitive capacities and serve to improve two important learning skills, understanding and memorizing. From the perspective of technological evolution, the media features used in the Magoosh portal may be marked as an outstanding “symbolic species” within the development of human cognitive learning tools. The engineers of the portal used the instruments of symbolic thought, such as language, analytical and verbal assignments, images, abstraction techniques, and technical mediation, to increase students’ skills in abstraction and reasoning. To some degree, this portal brought together various types of cognitive symbolic systems, such as semiotics, languages, and images, and combined them through technical modules and software to produce a new educational platform for efficient learning. This combinatorial approach opens new horizons for using interconnected technological modules to develop students’ learning abilities. In other words, the symbol systems employed in the portal, and their reflexivity (the capacity to represent other symbols), increase students’ comprehension and responsiveness. Because the test questions are drawn from various academic and social fields, the symbol systems used in the Magoosh portal also have a collective character.


In designing the preparation tests, Magoosh engineers drew on the experience of their colleagues and partners working in the same field. In this transfer, and in addition to other organizational conditions, certain attributes of media and symbolic artefacts (such as “store and forward” capabilities) help the site’s constructors present previously obtained practical knowledge to new students. Thus the Magoosh portal as a whole emerged as a symbolic cognition system and a combination of abstraction layers that boosts the learning process. By presenting questions through conventional representations of symbolic thought (such as images and geometrical figures), the designers also preserve the links between the original sources of previous exam data and contemporary testing systems.


However, test requirements keep changing, and test preparation tutorials must be adjusted to these changing demands. The designers of the portal therefore rely on the reflexive nature of symbols, which makes it possible to “re-encode one set of symbols to other ones” and create a new combinatorial system in which one media system (or meta-media) serves to represent other media systems.

Because the Issue and Argument tasks (essays) are an important part of the GRE exam, the designers of the portal also pay special attention to preparation courses for them, providing audio and video lessons with rich explanations and step-by-step instruction in writing techniques.


(d) Algorithmic modules for enhancing interactions between users and portal teaching techniques

Another important feature of the Magoosh portal is its ability to respond to each individual’s needs in the learning and preparation process. One of the main goals of Magoosh engineers is to tune the portal’s capacities and attributes to the specific preparation requirements of every student. To accomplish this, the engineers make extensive use of technology and media features and employ combinatorial modular design principles to compose different abstraction layers and test levels. From a technical perspective, the engineers of the site used special mathematical algorithms that track each student’s progress individually and react to the accuracy rate of her/his answers to the practice questions. This approach aims to replicate the algorithms used in the real test, which is run by the Educational Testing Service (ETS), the official body that assesses and grades the test scores submitted to universities.
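The idea of reacting to a student’s accuracy rate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm used by Magoosh or by ETS: the five difficulty levels, the five-answer window, and the 80% and 40% thresholds are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveSession:
    """Tracks one student's answers and picks the next difficulty level.

    Levels run from 1 (easiest) to 5 (hardest). The bounds, window size,
    and thresholds are illustrative assumptions, not Magoosh's values.
    """
    level: int = 3
    window: int = 5          # how many recent answers are averaged
    raise_at: float = 0.8    # accuracy at or above this raises the level
    lower_at: float = 0.4    # accuracy at or below this lowers the level
    history: list = field(default_factory=list)

    def record(self, correct: bool) -> int:
        """Store one answer and return the difficulty for the next question."""
        self.history.append(correct)
        recent = self.history[-self.window:]
        if len(recent) == self.window:
            accuracy = sum(recent) / self.window
            if accuracy >= self.raise_at and self.level < 5:
                self.level += 1
                self.history.clear()  # start a fresh window at the new level
            elif accuracy <= self.lower_at and self.level > 1:
                self.level -= 1
                self.history.clear()
        return self.level


session = AdaptiveSession()
for answer in [True, True, True, True, True]:
    next_level = session.record(answer)
print(next_level)  # five correct answers in a row move the student up one level
```

The sketch makes the “progressive and regressive” behavior concrete: a run of accurate answers moves the student to harder questions, and a run of inaccurate answers moves the student back.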


The portal also gives students other individual options, such as audio and video lessons organized by topic and the chance to contact a mentor by e-mail or messenger to discuss a personal learning strategy, get advice on improving certain skills, or understand recommended techniques, all of which are important for achieving the required scores.

After signing in, a student can choose the “Custom practice” option under the “Practice” button in the toolbar, which opens a menu of choices for tuning the learning process to her/his personal needs. From this menu s/he can select options such as “Section” (math or verbal, with different subtypes), “Difficulty” (the level of complexity), “Number” and “Time” for the test, and “Mode” (whether to provide explanations for correct answers). This options page in the Magoosh portal is an example of a separate combinatorial modular section, or meta-media system, with multiple media subsystems embedded in the page.
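As a rough illustration of how such a settings page can act as a separate module, the options named above can be modeled as a small settings object that selects questions from a question bank. The class names, field names, and sample questions below are hypothetical and are not taken from Magoosh’s software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Question:
    section: str      # "math" or "verbal"
    difficulty: str   # "easy", "medium", or "hard"
    prompt: str


@dataclass
class PracticeSettings:
    """Mirrors the menu options: Section, Difficulty, Number, Time, Mode."""
    section: str
    difficulty: Optional[str] = None   # None means any difficulty
    number: int = 10                   # how many questions to serve
    seconds_per_question: int = 90     # hypothetical default time limit
    show_explanations: bool = True     # the "Mode" option


def build_session(bank: List[Question], settings: PracticeSettings) -> List[Question]:
    """Return the questions that match the student's chosen settings."""
    matches = [
        q for q in bank
        if q.section == settings.section
        and (settings.difficulty is None or q.difficulty == settings.difficulty)
    ]
    return matches[:settings.number]


bank = [
    Question("math", "easy", "What is 15% of 80?"),
    Question("math", "hard", "How many primes lie between 50 and 100?"),
    Question("verbal", "medium", "Choose the word closest in meaning to 'laconic'."),
]
session = build_session(bank, PracticeSettings(section="math", difficulty="hard"))
print([q.prompt for q in session])
```

Because the settings object is independent of the question bank, the same selection step could be reused by a web page or a mobile app, which is the kind of modular reuse the paragraph above describes.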


Upon completing the practice test, the student can choose the “Review” option from the main toolbar and see her/his progress or regress over the course of study. The site also provides charts and schematic images that visualize the analysis of the study process.

Once the student has identified her/his weaknesses and areas for improvement, s/he can return to the “Lessons” section and choose the appropriate field for additional tutorial classes and tests.

In addition to the general introduction, the “Lessons” section of the site is composed with all the requirements of the real exam in mind and lists all the themes and sub-sections that may appear on the official test. For example, the math section covers sub-sections such as General Math Strategies, Arithmetic and Fractions, and Percentage and Ratios. On the same page, a student can switch to the Verbal lessons and sub-sections, such as the several variations of Text Completion, Sentence Equivalence, Vocabulary, and Reading Comprehension. In general, this page comprises all the tools, strategies, samples, and information required to pass the official GRE exam and achieve higher scores.

The designers of the site use different types of practical cases, games, and features (such as flashcards for memorizing) to make the whole learning process more efficient and enjoyable.


To increase accessibility and convenience, Magoosh engineers also launched a smartphone application for their portal in 2014. This application replicates the most important features of the main portal and adds more options for using memorization flashcards for both main sections of the test, analytical and verbal.


A rich source of practical information is also found under the “Resources” button on the main toolbar. This section contains important options such as Study Plans, where a student can get plentiful recommendations on how to make the preparation strategy more efficient within a given period and which sources are best for particular targets, taking into account each student’s level, from beginner to advanced. The section also has additional tools for memorizing formulas and expanding vocabulary. Finally, it provides other resources that enrich the preparation course, such as Testimonials, where students can learn other Magoosh students’ opinions of Magoosh’s services and opportunities and find out more about other Magoosh products.

Social impact

From the point of view of social implications, this site brings more equality and equity to a range of students with different backgrounds, financial means, and access to sources of knowledge. It also opens borders for international students to compete with each other, which can raise the average intellectual (GPA and IQ) level of admitted students at each university. Being relatively cheap compared with in-person tutorial courses, it makes it possible for applicants with low incomes to gain access to better international schools. Students with a technical background who use this tutorial site may improve their knowledge of English grammar, and those with a social science background can upgrade their skills in math and fast calculation. It also globalizes the education process, giving students from different continents and countries a chance to interact and study at the same universities, learn more about each other, and maintain international relations. To a certain degree, Magoosh, through its professional and social activities, is connecting people and preparing better-educated future generations.


The site does a good job of giving alumni of the preparation programs a stage to share their ideas with newly enrolled students. The social aspects of the site are implemented through links such as “press” and “blog”, which can also be opened from other social networking portals. This greatly increases the trust of new students who would like to learn more about the site from independent sources. It is fair to say that openness on the portal and the provision of platforms for students to socialize are implemented impressively. The blog page resembles the rest of the site in all its features and acts as a media subsystem within a bigger meta-media system. It is built on all the options provided by the site itself to offer students easy access to the information they want.



Using the clear example of the Magoosh portal, this research shows that combinatorial design principles bring together existing technical and software modules to create new technology that can be employed in different areas of public life. The number of Internet users is growing daily and hourly. Built on Internet-based mechanisms and providing online tutorial courses, Magoosh has become a great global resource for those who are eager to advance their education and careers through university programs. Magoosh can also be viewed as a platform for social activities, which helps people around the world connect and get to know each other.



Deacon, T. W. (1998). The Symbolic Species: The Co-evolution of Language and the Brain. New York: W. W. Norton & Company.

Denning, P. J., Martell, C. H., & Cerf, V. (2015). Great Principles of Computing. Cambridge, Massachusetts: The MIT Press.

Latour, B. (1999). Pandora’s Hope: Essays on the Reality of Science Studies (1 edition). Cambridge, Mass: Harvard University Press.

Manovich, L. (2013). Software Takes Command (INT edition). New York ; London: Bloomsbury Academic.

Norman, D. A. (2010). Living with Complexity. Cambridge, Mass: The MIT Press.

Norman-Cognitive-Artifacts.pdf. (n.d.). Retrieved September 27, 2016, from

McLuhan, M., & Gordon, W. T. (2003). Understanding Media: The Extensions of Man : Critical Edition (Critical edition). Corte Madera, CA: Gingko Press.

Donald A. Norman, Living with Complexity. Cambridge, MA: The MIT Press, 2010.

Donald A. Norman, “Cognitive Artifacts” In Designing Interaction, edited by John M. Carroll, 17-38. New York, NY: Cambridge University Press, 1991.

Richard N. Langlois, “Modularity in Technology and Organization”. Journal of Economic Behavior & Organization 49, no. 1 (September 2002)

Carliss Y. Baldwin and Kim B. Clark, Design Rules, Vol. 1: The Power of Modularity. Cambridge, MA: The MIT Press, 2000.

Kate Wong, “The Morning of the Modern Mind: Symbolic Culture.” Scientific American 292, no. 6 (June 2005)

Michael Cole, “On Cognitive Artifacts”, From Cultural Psychology: A Once and Future Discipline. Cambridge, MA: Harvard University Press, 1996.

James Hollan, Edwin Hutchins, and David Kirsh. “Distributed Cognition: Toward a New Foundation for Human-computer Interaction Research”. ACM Transactions, Computer-Human Interaction 7, no. 2 (June 2000)

Regis Debray, “What is Mediology?”, Le Monde Diplomatique, Aug., 1999. Trans. Martin Irvine.

WeChat, in a system design perspective

Chen Shen

Abstract Ever since its debut in 2011, the Chinese messenger app WeChat has rapidly evolved into one of the largest social networks worldwide. Moreover, WeChat has integrated many key functions, successfully eliminating users’ need to switch to other apps. While its counterparts in other parts of the world specialized in their respective fields, WeChat evolved toward generality, supporting all kinds of plug-in-like apps. Especially after it added mobile payment, and the Chinese government began to cooperate with WeChat and use it as a portal to public services, WeChat came to play a core role in Chinese mobile life. This highly integrated mobile environment has had profound impacts on Chinese users as well as on society as a whole. By integrating social networking, media, business, advertising, and public services, the platform is creating possibilities that are unimaginable for other apps and even for other societies. This paper analyzes WeChat from a system design perspective and discusses the dependence, affordance, and emergence of WeChat, providing a non-determinist way to understand WeChat’s prevalence in China.

What is WeChat? While Chinese young people build their online life upon it, people from other parts of the world have hardly heard of it. In short, WeChat is a Chinese social media app that integrates lots of the core functions of popular everyday apps. But after we examine the system design and underlying structures of WeChat with the concepts and paradigms gained in this course, we may find WeChat is much more than that.

 Introduction to WeChat

WeChat started off as an IM (Instant Messaging) app in 2011. Currently, it is the dominant social media and IM app in China. By MAU (monthly active users), WeChat ranked No. 4 worldwide, behind only the Facebook series.


Figure 1. Monthly active users of selected social networks and messaging services. Image from

WeChat is 6 years younger than Facebook (counting from 2005, when Facebook opened to public registration), and at first it targeted only the Chinese market. Since WeChat is a purely mobile app without a corresponding website, we can compare the mobile MAU of Facebook and WeChat and notice that WeChat’s MAU increased at an even faster rate than Facebook’s.


Figure 2. Number of mobile monthly active Facebook users worldwide from 1st quarter 2009 to 3rd quarter 2016 (in millions). Image from


Figure 3. Number of monthly active WeChat users from 2nd quarter 2010 to 3rd quarter 2016 (in millions). Image from

Next we take a quick look at the WeChat interface (we will discuss its functions later in the paper). It is fairly easy to register a new WeChat account (we encourage you to do so right away to get a better understanding of it; the app markets of all major smartphone operating systems provide a free download). One can create a WeChat account using a QQ account (QQ is a Chinese PC-based IM program by the same company, shown in Fig. 1, with 650 million MAU) or a mobile phone number. Once logged in, the first step is to add contacts. It is easy to import QQ contacts and mobile contacts in batch, keeping existing contacts alive on the new platform. There are also ways to add new contacts, and one of the easiest is scanning a QR code (we will discuss QR codes in detail later). Every account has a unique QR code; one can press the “+” in the top right corner and then press “Scan QR Code” (e.g. this one, the author’s account) to send a friend request. If the other user confirms it, the two can start chatting. The whole interface is clearly designed for mobile use, with big icons, no dense text, and all buttons gathered in the right and bottom parts of the screen for one-handed navigation.


Figures 4 and 5. Downloading WeChat and scanning a QR code

The chat screen is not that different from WhatsApp or Messenger. The majority of the screen is dedicated to messages, with four icons listed at the bottom: Text/Voice Switch, Input Window, Emoji, and Attach.


Figures 6 and 7. The chat interface and layout of WeChat

Back on the main page, right next to Chats are Contacts, Discover, and Me. Contacts is mainly for managing contacts. Discover has the Moments function, which lets users share photos and browse friends’ Moments. Me serves as WeChat’s settings page. Four clear-cut pages separate the different interaction scenarios, with a maximum menu depth of three, meaning that users can navigate to any function within three clicks.
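The “menu depth” claim can be made concrete with a short Python sketch that treats the navigation as a tree and measures its deepest path. The four top-level tabs and the Moments entry come from the description above; every other entry is a hypothetical placeholder rather than WeChat’s actual menu.

```python
def menu_depth(menu: dict) -> int:
    """Return how many clicks the deepest function in a nested menu needs.

    Each key is a menu label and each value is its submenu;
    an empty dict marks a function with no further submenu.
    """
    if not menu:
        return 0
    return 1 + max(menu_depth(submenu) for submenu in menu.values())


# Top-level tabs are from the text; nested entries are illustrative only.
wechat_menu = {
    "Chats": {"Conversation": {"Attach": {}}},
    "Contacts": {"Contact profile": {}},
    "Discover": {"Moments": {"Post photo": {}}},
    "Me": {"Settings": {"Privacy": {}}},
}

print(menu_depth(wechat_menu))  # 3 for this sample tree
```

A designer could run the same check on a proposed menu structure to confirm that no function sits more than three clicks away.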

The interface of WeChat so far appears straightforward. Before we proceed, here are some data about WeChat:

  • Daily active users increased 64% in 2015
  • 25% of the WeChat users open this app more than 30 times daily (2015)
  • In the first quarter of 2016, WeChat generated 1.8 billion online revenue
  • During the Spring Festival of 2016, WeChat users sent and received “Red Packets” (celebration messages with digital cash) a total of 32.1 billion times
  • Adult users read articles on WeChat for 40 minutes daily on average (2015)
  • WeChat has portals to 85 thousand mobile apps

Why is such a plain looking app so powerful? What is the underlying power that made it the fourth biggest social media network in the world? In the following sections, we will discuss three aspects of WeChat: dependence, affordance, and emergence.

Dependence. What made WeChat possible

“I will call this mechanism evolution by combination, or more succinctly, combinatorial evolution.” (W. Brian Arthur)

As Arthur put it in The Nature of Technology, “Novel technologies must somehow arise by combination of existing technologies”. We can see the same mechanism in both the birth and the growth of WeChat, because WeChat itself came with no novel functions but with a new way of combining the technologies existing at the time, one that enabled new synergies between the elements.

Before WeChat’s launch, BlackBerry users enjoyed an IM app called BlackBerry Messenger (BBM). It had all the functions that WeChat 1.0 had, with the constraint that BBM could only be used on BlackBerry phones, which accounted for only 16% of the global market. The platform limited BBM’s potential to spread.


Figure 8. Global market share held by smartphone operating systems. Image from

Then, in the second half of 2010, an app named Kik launched. Kik supported all the basic messenger functions, and it supported adding friends directly from mobile contacts. Unlike BBM, Kik fully afforded cross-platform communication. Downloads skyrocketed to over a million within two weeks of release. Because of this splendid performance and unmatched edge, Kik was banned on the RIM platform (for BlackBerry).

Three months later, Talkbox launched with “Push to Talk”, but it was not multi-platform at the start. BBM, Kik, and Talkbox, which were themselves combinations of existing technologies, thus each had their own advantages and disadvantages.

At the beginning of 2011, WeChat 1.0 launched. Compared with the current version, 1.x could only be described as minimalist, but it “inherited” both Kik’s cross-platform compatibility and Talkbox’s Push-to-Talk versatility. Compared with its foreign competitors, WeChat had the unique advantage of the gargantuan user base of QQ, which was produced by the same company. Because of that, WeChat could seamlessly inherit a user base of more than half a billion and enjoy a huge head start over its domestic counterparts. As WeChat gained the ability to import mobile contacts in batch, everyone the user actually knew in person came within reach of WeChat.

These are the technological dependences of WeChat. It is fair to acknowledge that WeChat did not bring novel technologies per se, yet uniting existing elements to create a new environment is also a kind of innovation.

Compared with traditional social networks like Facebook and Pinterest, WeChat is different because of its hardware dependence. WeChat is “mobile native” rather than “mobile migrate”. From the beginning, WeChat was a smartphone app, meaning that everyone who uses WeChat meets a list of hardware requirements: speaker, microphone, GPS, camera, etc. As a result, WeChat can simply assume that every user has full access to voice messages and similar features, which is a great edge over website-based messenger apps. For example, if a Messenger user sends a voice message and the receiver gets the notification on the Facebook website while using a public computer with no speaker, the data is transmitted but the information is not delivered.

WeChat’s rapid rise also has its historical dependence. Unlike in most developed countries, voicemail never became popular in China in the age of landlines. There were many reasons behind this technical gap: for example, the relatively short interval between the popularization of landlines and that of mobile phones, and the fact that landline service was largely monopolized at the time by state-owned companies that lacked the motivation to popularize new services. As a result, Chinese society gradually built up a huge appetite for voice messaging. WeChat then became the outlet for this pent-up demand, which helped it spread.

We can also talk about the social and cultural dependence of WeChat. The design of successful software cannot be truly universal; it must adapt to the cultural context of its target market. Yet people tend to compare the nuances of apps or services in absolute terms. Sometimes it is wrong to regard such differences as advantages or disadvantages; they are rather active choices. For example, many messenger apps use specific symbols to indicate the current status of a message the user has sent. In Facebook Messenger, one can tell whether the recipient has read the message. WhatsApp even uses three different indicators to communicate status more reliably and effectively.


Figure 8. Status indicators of WhatsApp

From a software development perspective, this function is very easy to add, but WeChat has never adopted it because of the social and cultural context of its main target market. In China, messages created by such apps are filtered and censored. If WeChat had a sent check, users could discover that a message was blocked rather than lost. If a sender tried to resend multiple times and still could not send the message, he might realize it had been filtered, which neither WeChat nor the government wants. So the sent indicator is incompatible with the Chinese social context. Instead, WeChat indicates a sending failure only when it is due to the network connection and prompts the user to resend.


Figure 9,10. How certain words are filtered in WeChat


Figure 11. Resend indicator in WeChat

As regards the read check, it is more a matter of cultural difference. China has long been regarded as a nepotism society, in which openly ignoring someone is very aggressive and shameful for both parties to the conversation. Without a read check, if one reads a message and chooses not to reply, both sides avoid the loss of face, which can be a much greater issue than the message itself. This psychology is deeply rooted in China. Compared to the Sun- and Apollo-worshiping Western culture, Eastern cultures are more Moon-oriented, emphasizing the value of vagueness and ambiguity to the extent that they are regarded as aesthetic objects. Many studies have been done on this cultural feature, and we do not need to discuss it in depth. But one explanation by Hayao Kawai in Japanese Psyche: Major Motifs in the Fairy Tales of Japan is particularly helpful for understanding the Eastern spirit: in “nothing has happened”, nothing is interpreted as a special subject rather than as null. When A ignores a message B sent, A is actually replying with the message “nothing”, leaving B in ambiguity (which is a good thing), free to interpret the situation as a superposition of being blocked and being ignored. In conclusion, the read check is poorly suited to Eastern cultures, and WeChat’s lack of status indicators is actually by design.

In this part, we examined the dependences of WeChat from technical, historical, social, and cultural perspectives. WeChat has other dependences as well, but we can already see clearly that there is no room for a deterministic explanation of WeChat’s success, which is in practice a combination of functions, constraints, compromises, and contexts.

Affordance. What made WeChat magical

Human brains and computers will be coupled together very tightly; the resulting partnership will think as no brain has ever thought and process in a way not approached by information handling machines today.                                                                                                 —J.C.R. Licklider

When talking about WeChat’s affordance, we can divide the subject into two parts: affordances for developers and affordances for users.

The greatest thing about WeChat may be its integration. From the last section, we can see that WeChat started out rather simple, with no extraordinary function or service. But from version 1.0 to the current 6.5.1, WeChat has kept integrating useful functions into the platform. As a result of this steady evolution, WeChat is now called “an App to rule them all” in China. While in the U.S. one may need a dozen apps for daily life, in China WeChat alone is sufficient.



Figure 12,13,14. The function integration and comparison of WeChat

The logic behind this is the open API structure. No new app in China, whatever its functions, can compete with the dominant user base of WeChat. If users can access its service via WeChat, that means an influx of millions of users. We can liken the platform effect of WeChat to that of web portals in the early days of the WWW, when major portals like AOL became popular by providing access to other content. When WeChat opened its API to other apps, many third-party developers began to add features to the already vast complex. Many successful app developers also believed in a better future if their products had an instance on the WeChat platform. Thus the integration of WeChat began. For example, Group Buy, the Chinese counterpart of Yelp and number one in this market in China, had long had its own website and mobile app. But in later versions, WeChat and Group Buy cooperated in depth and added a Group Buy portal to WeChat. Through the portal, WeChat users can access Group Buy functions without leaving the WeChat platform, and without even installing the Group Buy app in the first place. This means Group Buy can potentially share the vast user base of WeChat, a win-win situation for both parties.


Figure 15. WeChat official API web page

Group Buy was already influential and famous before the grafting, a dominant player in its own market. Yet it could not resist the prospect of cooperating with WeChat. A similar “immigration” happened to other leading apps as well: Didi (the Chinese Uber), 58 (the leading housekeeping app in China), and Meituan (a leading take-out service in China) successively joined the league and kept expanding the platform.

For less famous apps, the motivation can be even stronger: once the portal is established, they can quickly make the transition from start-up app to industry leader. This has happened many times in WeChat Games, the game platform of WeChat. The majority of games in WeChat Games are not developed by WeChat. But once a game is integrated into the WeChat platform, it immediately has users, a payment method, a promotion platform, a multiplayer cooperation and competition platform, and so on. From a game purist’s perspective, many of the popular games in WeChat Games are rather dull, with poor graphics and monotonous game mechanics. But WeChat turned them into social network games, which play a totally different role for users. It is fair to say that through integration WeChat is reshaping the landscape in many app fields.

Of all the capabilities WeChat has integrated over the years, payment is a particular game-changer. It arrived in version 5.0. Once users bind a bank account to WeChat, the app turns into an online transaction platform. Once again, supporting online transactions was nothing new for a mobile app; in China, Alibaba had offered Alipay for this purpose long before WeChat. But by combining its user base with its portal to other apps, WeChatPay created a new payment environment.

By the time WeChatPay launched, Alibaba already held more than half of the Chinese online payment market. Alipay’s secret weapon is Taobao and Tmall, the dominant e-commerce platforms in China. As WeChat is to QQ, Alipay is the natural extension of Taobao and Tmall onto mobile devices. But this was also Alipay’s limit: it is more an extension of traditional online payment for online shopping than a new model of paying. Both Taobao and Tmall are platforms based on physical commodities, so WeChatPay seized the market for service-based transactions, which no unified payment platform had dominated before. It also introduced the innovation of sending “red packets” containing digital cash to individuals or groups, creating a whole new model of transaction. The amounts involved in both the service-based market and red packets were small compared to physical commodities, but in this way WeChatPay effectively fostered users’ habit of paying within WeChat and then extended that habit to other areas.


Figure 16. WeChat Pay functions usage percentage. Image from 

Taobao achieved total sales of 15 billion dollars in a single day on 11 November 2016, Bachelor’s Day, with more than 80% of that done on mobile phones. How can WeChat compete with that? One thing WeChat is trying is integrating JD, one of the biggest online supermarkets in China, into the platform. As a result, users can buy everyday items directly in WeChat. Compared to Taobao and Tmall, JD offers far fewer types of merchandise; we can liken JD to Target and Taobao to eBay. The Chinese name of Taobao means “treasure hunting”: a vast range of choices is available if you are good at hunting. But in the mobile context this can be a drawback. A typical treasure hunt on the desktop involves more time, comparison of commodities across different pages, and bargaining with the seller. The mobile context affords none of these. So mobile buyers tend to buy well-known commodities; they value quality over variety and familiarity over novelty. Here JD fits the slot perfectly. As a unified supermarket, JD has better quality control over its commodities than Taobao, though with far fewer choices. Together with WeChat, it provides an easy purchasing model in which customers buy daily consumables without much deliberation.


Figure 17. WeChat Pay users and main purchase categories. Image from 

By fostering new mobile payment models like red packets, opening new payment environments like the service-based market, integrating supermarkets like JD, and blocking the portal to Taobao, WeChatPay doubled its user rate from 2015 to 2016. And given WeChat’s high penetration and payment capability, the government is using it as an interface to the smart city. In many Chinese cities, such as Beijing, Guangzhou, Shanghai, and Wuhan, users can access mobile public services via the WeChat portal. For example, WeChat users in Beijing can pay utility bills, pay traffic fines, make hospital appointments, check out books, handle visa services, and use many other public services within the WeChat platform. The list is expanding rapidly, as is the number of supported cities. And because this kind of service is mainly built with webpage-based technology, which is easy to develop and maintain, such services act like optional plug-ins for WeChat, making it even more flexible and extensible. It is no longer science fiction that one can access all services, both public and commercial, through portals enabled by WeChat.

When talking about WeChat’s affordance, we cannot ignore the QR code. QR stands for Quick Response; the technology originated in Japan in the 1990s. A QR code is a label generated by an algorithm to be read optically and decoded. It can encode fairly complex text (1850 characters) into a small label attached to other things. Because of the high redundancy in the QR encoding algorithm, a code remains readable when no more than 30% of its surface is damaged, making it well suited to printed outdoor use. Today, China is among the countries that have best integrated this technology into society, and WeChat has been compatible with it from the start.

Every WeChat user has a unique QR code, so in a typical scenario when two people meet and want to exchange contact information, one presents the QR code while the other scans it. In an instant, a friend request is sent and a connection established. But QR in China is much more than that. Of the following pictures, the first shows the British Embassy in China and the second a sweet potato vendor. These two extremes give a quick demonstration of the QR craze in China.


Figure 18,19. The QR code used by British Embassy, and by sweet potato vendor

The U.S. also used QR codes for a while, but they did not catch on. There are several reasons for that.

  1. Lack of technical dependence. When QR was introduced in the U.S., no standard reader was available. Neither Android nor the iPhone had a reader built in. One had to install an additional app, which could do nothing but read QR codes, to retrieve the information.
  2. Lack of a universal portal. Even when users read a QR code with a special reader, the most they could do was access a URL, which is not a big leap from traditional text-based information.
  3. Lack of regulation. QR was introduced to the U.S. in its early stages, when the standard protocol was not yet fully implemented. Many custom-made QR codes failed to produce universally readable information.

By comparison, when we analyze why QR is so prevalent in China, we find other historical and social reasons. The first is that a traditional QR code leads to a URL, which is a string of Latin letters. In the English-speaking world the URL itself is symbolically meaningful, but not for Chinese readers, especially the vast population with no English literacy. As a result, any method that can automatically turn information into a URL is crucial and easily popularized. Another key reason is timing. QR entered China when the O2O model was on the rise, and individual retailers had the strongest motivation to spread their product or promotion information this way. So in China the popularity of QR was driven not by WeChat or any other tech giant but by numerous retailers trying to use new technology to boost their sales.

From a technical perspective, the QR code is simple and outdated, but from a sociotechnical perspective, a simple technology taken up by self-motivated individuals can achieve much more than it seems to afford.

QR also profoundly expands the possibilities of WeChat. With a built-in reader, WeChat can decode all kinds of information embedded in a code: a URL leading to a ticket sale, a transaction indicator leading to a purchase, contact information for users to follow, a verification code for logging into a system, or just some text to read. WeChat can process them all in the blink of an eye.

In this part, we analyzed some major affordances enabled by WeChat. With the application portal, the payment method, and the QR reading ability, WeChat users can process both online and offline information and services, crossing the traditional barriers between different economic modes. They can also share all formats of media through WeChat’s “share to” function. In the next part, we will examine the emergent features of the WeChat society.

Emergence. What made WeChat phenomenal

We shape our tools and thereafter our tools shape us.                                     — Marshall McLuhan

By emergence, we mean collective behavior or properties of a system that cannot be deduced by analyzing the system’s constituent parts. Emergence is the result of synergies rather than a mere bundling of the elements within a system. Although this paper focuses on the system design of WeChat, emergence is rarely the result of direct design. As we saw in the example of the QR code in China, it was not WeChat that invented and promoted all the innovative and pervasive uses of QR codes; rather, decentralized, self-motivated agents, equipped with the QR generating and reading abilities enabled by WeChat, co-created a QR-omnipresent China. By this logic, in the complex sociotechnical system of users, companies, apps, and government sectors, WeChat gradually begins to play the role of an enabler.

This can be a new stage for a mobile app, one at which it really begins to affect society, not by providing functions for individuals to use but by providing a sociotechnical context for individuals to exploit as they co-evolve with the platform.

In this way, WeChat reshapes China in many ways, both visible and invisible. This is not the focus of the paper, so we will not delve into details. But from a system design perspective, it is important to see the potential that arises when a designed system evolves into a decentralized adaptive system.

WeChat reshapes customs.

In China, the Spring Festival customs are among the most enduring. People go back to their hometowns at this time of year for celebration and family reunion, and the big cities seem evacuated. For the past thirty years, the CCTV Spring Festival Gala has been the core of traditional family activity: family members gather to watch TV until midnight comes and the new year officially begins. This tradition has remained stable even in the age of new media, when TV is steadily losing its appeal. But even this massively collective tradition has changed over the past three years.

With WeChat’s Red Packet function, it is easy to send digital cash to friends along with a greeting, and one can also send red packets to groups. For example, you can seal 200 CNY into a red packet with ten shares and send it to a group of 20. It then becomes something like a game of chance: each group member who sees the message can open the packet and receive a random amount of money, starting from 1 cent, and the first ten people to open the red packet divide up the 200 CNY. Traditionally, elders give the younger generation cash at this time of year in the hope of a blessed year ahead, so the red packet blends smoothly into the custom by providing an alternative way for Chinese people to give blessing money to others. And since this is a time of year one is supposed to spend with family, it is a good way to stay connected with friends. But the gambling nature of the group red packet has slowly turned it into a game, in which group members send a packet with a small amount and whoever gets the highest or lowest share sends the next one. Some companies also use this occasion to send out large bonuses. As a result, everyone keeps a close watch on their smartphone for fear of missing a large packet.
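The random division of a group red packet can be sketched in a few lines of Python. WeChat’s actual allocation rule is not described here, so the rule below is an assumption made purely for illustration: each opener draws a whole number of cents between 1 and twice the current average share, while always leaving at least 1 cent for every share still unopened. The function name split_red_packet and its parameters are hypothetical.

```python
import random

def split_red_packet(total_cents, shares, rng=None):
    """Divide total_cents into `shares` random amounts of at least 1 cent each.

    Illustrative rule only (an assumption, not WeChat's documented algorithm).
    """
    if shares < 1 or total_cents < shares:
        raise ValueError("need at least 1 cent per share")
    rng = rng or random.Random()
    amounts = []
    remaining = total_cents
    for left in range(shares, 1, -1):
        # Upper bound: twice the current average share, capped so that every
        # share still unopened can receive at least 1 cent.
        upper = min(2 * remaining // left, remaining - (left - 1))
        amount = rng.randint(1, max(1, upper))
        amounts.append(amount)
        remaining -= amount
    amounts.append(remaining)  # the last opener takes whatever is left
    return amounts

# The example from the text: 200 CNY (20000 cents) sealed into ten shares.
packet = split_red_packet(20000, 10)
print([cents / 100 for cents in packet])
```

Whatever the random draws, the ten shares always add up to exactly 200 CNY and no share is empty, which is what keeps the game fair enough to be played repeatedly in a group.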

In 2014, the first Spring Festival in which this new tradition emerged, 5 million people took part in the game. In 2016, it was 516 million, nearly half of the Chinese population. People sent and received red packets 32.1 billion times, ten times more than in 2015.

To further the trend, tech giants cooperate with CCTV or other TV stations to add QR codes to the gala. At certain points in the show, anyone who shakes their phone has a chance of winning all kinds of bonuses: cash, tickets, coupons, or collectibles. At the peak of the Spring Festival Gala, there were 810 million phones shaking in one minute. Even if each user was shaking two phones at once (which is very common in that context), that still means some 400 million people shaking phones at the same time. Simply magnificent.

One can imagine the figure continuing to grow in the coming Spring Festival. But whatever the figure is, a whole new civil custom has emerged within two years, along with a new business worth billions.

WeChat reshapes the economy.

Online commerce platforms like Taobao are the dominant form in China. But in recent years a new form of e-commerce called WeChat Business has been rising. Unlike on Taobao, where the owner has to maintain a web page for the online store, the WeChat Business owner only has to take pictures of the products, write short introductions, and send them to the groups he is in.

WeChat Business has an even lower threshold and focuses more on user experience than on advertising. It also draws on individual credit, which was seriously lacking in the field before. And cash flows more smoothly than on Taobao, because with Alipay a third party holds the money until a purchase is successfully completed, in order to protect consumers.

This trend is encouraging more and more young people to start their own businesses rather than look for jobs. The trend certainly has shortcomings, but its stimulus to innovation is undeniable. With an easier way to connect and interact with customers and to make transactions, China’s young generation is avidly embracing this wave of entrepreneurship.

The trend is not limited to business. Creating and maintaining an official account is increasingly common. Because an official account can push one article every day, many journalists, bloggers, designers, photographers, and writers use WeChat as a staging area to build up their own fan communities. A young woman I know in China, 21 years old, has been writing in her official account for one year. She recently published a book, which already ranks fifth in Chinese sales for youth literature. This was unimaginable before WeChat came along.

More and more ambitious individuals, like her and like the old farmer selling sweet potatoes with a QR code, are seizing the opportunity for self-fulfillment, leading to an effervescent social environment. WeChat cannot take full credit for that, but it is the infrastructure upon which these dreams are built.

WeChat reshapes society.

WeChat has reshaped society in many different ways; here we take only a quick look at Moments. In Moments, users can share pictures, text, or links to articles with their friends, or forward articles they like. As a result, good articles circulate quickly among users. China is a society with relatively low social engagement, partly because of the pragmatic ideology of the time and partly because of the low credibility of official media. So many people, especially the young and liberal, prefer WeChat articles to editorials. By forwarding such articles, users circulate different opinions and insights quickly and widely. Censorship still applies, but the post-censorship mechanism still allows the articles to be read by many. Thus it is a solid step toward a more open society.


The best way to predict the future is to invent it.                                                                  —Alan Kay

Though Chinese technology companies have long been accused of being copycats, WeChat is unprecedented elsewhere in the world. By integrating popular functions, providing a portal to other plug-in apps, and merging the online and offline networks of the individual user, WeChat is in effect pioneering a new kind of application, or rather a mobile platform. It may represent the future trend of social networks and mobile apps, or even a possible form of future digital presence. With the profound social influence of WeChat, China is experiencing a kind of integrated mobile life not yet experienced here in the U.S. And the young generation is using it as a platform for further innovation, thanks to a flatter social structure enabled and connected by WeChat.

The rise and success of WeChat have unique social and historical dependences that may not be repeated, but the rest of the world should pay attention to this emerging app, as well as the emerging social momentum it empowers. It might well be the next Facebook, or something bigger.


Abelson, H., Ledeen, K., & Lewis, H. (2008). Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion (1 edition). Upper Saddle River, NJ: Addison-Wesley Professional.

Arthur, W. B. (2011). The Nature of Technology: What It Is and How It Evolves (Reprint edition). New York: Free Press.

Baldwin, C. Y., & Clark, K. B. (2000). Design Rules, Vol. 1: The Power of Modularity (4th Printing edition). Cambridge, Mass: The MIT Press.

Berners-Lee, T. (2000). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web (1 edition). San Francisco: HarperBusiness.

Chen, S., & He, W. (2014). Study on Knowledge Propagation in Complex Networks Based on Preferences, Taking Wechat as Example. Abstract and Applied Analysis, 2014, e543734.

Collins, H., & Kusch, M. (1999). The Shape of Actions: What Humans and Machines Can Do. MIT Press.

Davis, M. (2001). Engines of Logic: Mathematicians and the Origin of the Computer (Reprint edition). New York: W. W. Norton & Company.

Deacon, T. W. (1998). The Symbolic Species: The Co-evolution of Language and the Brain. New York: W. W. Norton & Company.

Denning, P. J., Martell, C. H., & Cerf, V. (2015). Great Principles of Computing. Cambridge, Massachusetts: The MIT Press.


Gleick, J. (2012). The Information: A History, A Theory, A Flood (2.5.2012 edition). New York: Vintage.

Kawai, H. (1998). Japanese Psyche: Major Motifs in the Fairy Tales of Japan. Woodstock, Conn: Spring Publications.

Latour, B. (1999). Pandora’s Hope: Essays on the Reality of Science Studies (1 edition). Cambridge, Mass: Harvard University Press.

Lidwell, W., Holden, K., & Butler, J. (2010). Universal Principles of Design, Revised and Updated: 125 Ways to Enhance Usability, Influence Perception, Increase Appeal, Make Better Design Decisions, and Teach through Design (Second Edition, Revised and Updated edition). Beverly, Mass.: Rockport Publishers.

Lien, C. H., & Cao, Y. (2014). Examining WeChat users’ motivations, trust, attitudes, and positive word-of-mouth: Evidence from China. Computers in Human Behavior, 41, 104–111.

Manovich, L. (2013). Software Takes Command (INT edition). New York ; London: Bloomsbury Academic.

McLuhan, M., & Gordon, W. T. (2003). Understanding Media: The Extensions of Man: Critical Edition (Critical edition). Corte Madera, CA: Gingko Press.

Murray, J. H. (2011). Inventing the Medium: Principles of Interaction Design as a Cultural Practice (1st edition). Cambridge, Mass: The MIT Press.

Norman, D. (2013). The Design of Everyday Things: Revised and Expanded Edition (Rev Exp edition). New York, New York: Basic Books.

Norman, D. A. (2010). Living with Complexity. Cambridge, Mass: The MIT Press.

Norman-Cognitive-Artifacts.pdf. (n.d.). Retrieved September 27, 2016, from

Peng, X., Zhao, Y. (Chris), & Zhu, Q. (2016). Investigating user switching intention for mobile instant messaging application: Taking WeChat as an example. Computers in Human Behavior, 64, 206–216.

Rammert, W. (2008). Where the action is: distributed agency between humans, machines, and programs. Berlin. Retrieved from

Russell, J. (n.d.). WeChat, China’s top messaging app, no longer tells users when it censors their messages. Retrieved from

Vermaas, P., Kroes, P., Franssen, M., Poel, I. van de, & Houkes, W. (2011). A Philosophy of Technology: From Technical Artefacts to Sociotechnical Systems. San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA): Morgan & Claypool Publishers.

Wang, X., & Gu, B. (2016). The Communication Design of WeChat: Ideological As Well As Technical Aspects of Social Media. Commun. Des. Q. Rev, 4(1), 23–35.

Wang, Y., Fang, W.-C., Han, J., & Chen, N.-S. (2016). Exploring the affordances of WeChat for facilitating teaching, social and cognitive presence in semi-synchronous language exchange. Australasian Journal of Educational Technology.

Wardrip-Fruin, N., & Montfort, N. (Eds.). (2003). The new media reader. Cambridge, Mass: MIT Press.

WeChat: China’s Integrated Internet User Experience. (n.d.). Retrieved December 15, 2016, from

Wen, Z., Geng, X., & Ye, Y. (2016). Does the Use of WeChat Lead to Subjective Well-Being?: The Effect of Use Intensity and Motivations. Cyberpsychology, Behavior, and Social Networking, 19(10), 587–592.

Xu, J., Kang, Q., Song, Z., & Clarke, C. P. (2015). Applications of Mobile Social Media: WeChat Among Academic Libraries in China. The Journal of Academic Librarianship, 41(1), 21–30.

Zhongwei, L., Hao, J., & Yangfan, X. (2015). Tencent WeChat’s Micro-Innovation of Integration and Iteration under Technical Paradigm Transformation *. China Economist, 10(5), 106–122.

Zittrain, J. (2009). The Future of the Internet–And How to Stop It. New Haven, Conn.: Yale University Press.


Research on designing graphical programming language

Reid Wang


As the bridge between human beings and computers, programming languages are designed by computer scientists to replace obscure machine language. Compared to binary code, a programming language, thanks to features such as the use of natural-language words and grammar, greatly enhances the efficiency of human-computer interaction.

However, for most computer users, existing programming languages are still too abstract and difficult to understand, while the need to customize functions has never disappeared. Further, the development of software links every modern industry with computer science, placing higher demands on citizens to understand programming languages better.

This paper discusses whether graphical programming languages, as a possible form of future programming language, can help ordinary users learn and use programming in a friendlier and more engaging way, by analyzing the key principles of designing a programming language that is valuable for daily use.


From the age of vacuum tubes and transistors to the age of integrated circuits, the computing power of computers has ballooned in a short 70 years. The computer itself has also evolved from a gigantic machine occupying a whole room to laptops and tablets that people can carry anywhere. However, whatever form computers take, the basic principle of electronic computer design has never changed. Compared to the first electronic computer, born in Pennsylvania, modern computers are built with new technologies, including graphical operating systems, highly interactive software, and powerful self-learning abilities that people could not have imagined several decades ago, yet they still cannot work without combinations of 0 and 1 that represent basic logic units like “and” and “not”. The problem is that the computer is essentially a machine with a unique mechanism based on binary code, which is totally different from human natural language. Although early computer scientists could give instructions to computers directly in binary code, it is impractical for every developer to learn to do so, not only because of the high learning cost but also because of the complexity of modern software. Therefore, a new form of language is needed that can both embody the logic of the computer and contain features of natural language. That is the programming language.

This paper discusses the relationship between programming language and graphical design. It is divided into four main parts. Part one focuses on computers and programming languages, including the computation mechanism of computer systems and the key features of programming languages. Part two expands the topic to the relationship between computers and human beings, as well as the information flow from machines to human beings. Part three covers the application of design principles to designing graphical programming languages, with a few cases. Part four discusses the social and business impact of popularizing graphical programming languages.

1. Programming language and computers

1) Some basic concepts
Before examining the new form of programming language in detail, it is necessary to say a little about the execution process and working mechanism of programming languages. Most computer users have heard that computers are based on binary code, but few of them understand what “binary” really means.

One critical concept behind the binary system is Boolean algebra. This theory, introduced by George Boole, describes logic in a mathematical way and lays the foundation for the binary system of modern computers. The core of Boole’s theory is the variable with two possible values, representing true and false respectively. A system consisting of n such variables can therefore be in 2^n possible states. A similar system is Morse code, which uses dots and dashes to represent English letters and numbers. In modern computers, controlling the value of each logic unit lets the machine execute complex instructions, since the billions of transistors integrated in a CPU can be combined to produce an enormous number of states.
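The count of 2^n states can be checked with a minimal Python sketch. The function name `all_states` is made up for this illustration; it simply enumerates every true/false assignment of n variables.

```python
from itertools import product

def all_states(n):
    """Return every assignment of n Boolean variables as a tuple of bools."""
    return list(product([False, True], repeat=n))

states = all_states(3)
for state in states:
    # Show each state as a string of 0s and 1s, e.g. 101
    print("".join("1" if bit else "0" for bit in state))

print(len(states))  # 8, i.e. 2 ** 3
```

Each extra variable doubles the number of states, which is why even a modest number of logic units can represent a very large number of combinations.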


A binary-tree model to illustrate the mechanism of Morse code
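The binary-tree view of Morse code can be sketched in Python as follows. This is only an illustration: the convention that a dot goes to one branch and a dash to the other, and the names `build_tree` and `decode`, are assumptions of this sketch.

```python
# International Morse code for the 26 English letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def build_tree(codes):
    """Build a binary tree in which '.' and '-' select the two branches."""
    root = {}
    for letter, code in codes.items():
        node = root
        for symbol in code:
            node = node.setdefault(symbol, {})
        node["letter"] = letter  # the node reached by the full code holds the letter
    return root

TREE = build_tree(MORSE)

def decode(message):
    """Decode space-separated Morse letters by walking the tree from the root."""
    letters = []
    for code in message.split():
        node = TREE
        for symbol in code:
            node = node[symbol]
        letters.append(node["letter"])
    return "".join(letters)

print(decode("... --- ..."))  # SOS
```

Every letter is reached by a sequence of two-way choices, which is the same idea as a sequence of bits.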

A programming language translates in both directions: binary code into terms people can read, and people’s words into instructions for computers. Just as we combine words to express explicitly what we mean, we combine the words of a programming language to express our instructions, and the result is a program. The translation itself is also carried out by a program, called a compiler. The classic book on the subject, Compilers: Principles, Techniques, and Tools, defines a compiler as “a program that reads a program written in one language – the source language – and translates it into an equivalent program in another language – the target language”.

Although all programming languages serve as translators between human beings and computers, the gap between binary code and human cognition means that, in practice, languages with more features of natural language are more readily accepted by learners. However, there is usually a trade-off: the more legible a programming language is, the less efficient it tends to be.

2) Key features of programming language
Today one of the most famous debates among programmers is which programming language, from complex ones like C++ to relatively simple ones like JavaScript, is the best in the world. It is hard to rank the mature commercial programming languages because every language has its own pros and cons. But prevalent programming languages share some common features, and those features are determined by the computer itself.

  • Natural-language-based

Because a computer’s logic is so unlike our own, almost all mature programming languages borrow from human natural languages, especially English. The reason is not hard to see. Users have no need for a description of how the computer works internally. For example, users are not interested in what happens in the BIOS or hardware when they start up a computer; it is enough to show them where the power button is. To eliminate ambiguity, a battery-like icon is often shown beside the power button. For developers, a programming language lets them type single words to invoke complex functions. As with the power button, programmers only need to call certain words instead of knowing everything that happens within the CPU, memory and other hardware. In a simple program like “Hello World”, every word has its own meaning, and learners accept those meanings easily because they are closely related to the words’ original meanings in English. For instance, the function name “printf” simply means printing formatted content on the screen.


Classical “Hello World” program in C

The relationship between programming language and natural language also works in the other direction. Natural language helps a programming language build its system of meaning, and the programming language in turn helps computers handle the words of natural language. As mentioned above, Morse code adopts a simple scheme to signify letters and numbers. The way computers are made to handle words and numbers is similar. An encoding system such as ASCII defines a mapping between characters and numeric codes that computers can read, which can be written in binary or hexadecimal.
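As a small illustration of such a mapping, the Python sketch below lists the ASCII code of each character in a string in decimal, binary and hexadecimal. The helper name `to_codes` is made up for this example; `ord`, `chr` and `format` are built-in Python functions.

```python
def to_codes(text):
    """Return (character, decimal, 8-bit binary, hexadecimal) for each character."""
    rows = []
    for ch in text:
        code = ord(ch)  # numeric code of the character
        rows.append((ch, code, format(code, "08b"), format(code, "02X")))
    return rows

for row in to_codes("Hi"):
    print(row)
# ('H', 72, '01001000', '48')
# ('i', 105, '01101001', '69')

# The mapping works in both directions.
print(chr(int("01001000", 2)))  # H
```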

  • Modular

If we begin to learn a mainstream programming language, we find that most of them, from whatever perspective we view them, are black-boxed and modular in design.

First of all, most programming languages are built around functions and other relatively independent units of code called code blocks. The branch statement, which takes the form “if A, then B, else C”, is a good example. In a program, A, B and C are clearly delimited because each part is strictly enclosed in its own delimiters, such as braces in C. Programmers may still have to write code line by line, but eliminating bugs becomes much more convenient: with each logical part modularized, they can check the program block by block instead of word by word.
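The idea of checking a branch block by block can be sketched in Python, which marks blocks by indentation rather than braces. The pass mark of 60 and all function names here are hypothetical, chosen only to give A, B and C something concrete to do.

```python
def is_passing(score):
    """Block A: the condition."""
    return score >= 60  # hypothetical pass mark

def on_pass(score):
    """Block B: what happens when the condition holds."""
    return f"pass ({score})"

def on_fail(score):
    """Block C: what happens otherwise."""
    return f"fail ({score})"

def grade(score):
    """if A, then B, else C"""
    if is_passing(score):
        return on_pass(score)
    else:
        return on_fail(score)

print(grade(75))  # pass (75)
print(grade(40))  # fail (40)
```

Because each part is a separate block, a wrong result can be traced by testing `is_passing`, `on_pass` and `on_fail` one at a time.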

What’s more, a program cannot run without the support of an operating system. That is why software developed for Windows cannot run on Mac OS. To be specific, the code typed into the visible interface means nothing to the computer by itself, because computers are not manufactured to recognize the words of popular programming languages. Keywords such as “int”, which indicates an integer, are defined by the language and recognized by the compiler, while library functions such as “printf” are declared in external files called header files. In the above example, what makes “printf” available is the header file “stdio.h” included at the top. Programmers do not need to know what the header file contains (in fact, stdio.h declares the standard I/O functions). Common header files together describe the standard library. The standard library tells programmers what is available to them, but not how it is implemented. That is why a program is black-boxed.

  • Hierarchical

Whatever language we use, programming is ultimately about communicating with computers. People cannot control the electronic states of transistors directly. For human beings, therefore, the computer is a huge black-box system.


Natural language to machine language – interface to hardware

Machine language is the language of hardware and natural language is the language of human beings, each representing a different logic. A programming language with more features of natural language can be understood by people more easily, but computers spend more time converting it into machine code, because calling standard libraries and interpreting syntax can be time-consuming. Low-level programming languages that are close to hardware, like assembly language, are often highly efficient but can hardly be understood by people. From this perspective, it is reasonable to say that programming language is hierarchical.
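One layer of this hierarchy can be glimpsed with Python’s standard `dis` module, which lists the lower-level instructions behind a line of source code. Note the assumption in this analogy: these are bytecode instructions for the Python virtual machine, not hardware machine code, and the exact listing differs between Python versions. The function `add_and_double` is a made-up example.

```python
import dis

def add_and_double(a, b):
    """One readable line of high-level code."""
    return (a + b) * 2

# Collect the names of the lower-level instructions behind that one line.
instructions = [ins.opname for ins in dis.get_instructions(add_and_double)]
print(len(instructions), "bytecode instructions for one line of source")

# Print the full listing (output depends on the Python version).
dis.dis(add_and_double)
```

The single readable expression expands into several simpler steps, and each of those is in turn carried out by still lower layers.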

2. Human – computer interaction

1) Information flow
Shannon’s classical information theory provides a model for analyzing what happens when people communicate with computers. Modern computer users are accustomed to receiving information from a pixel-based screen. The problem, however, is that information theory cannot explain how the brain processes information, since the destination of the information, the human brain, is essentially a black box in the theory. Another serious flaw is that information theory ignores the effect of the form that information takes.


Shannon’s information theory model

In fact, the two parts of Shannon’s model that matter most for programming are exactly the information source and the destination. The former concerns computers and interfaces, and the latter concerns human beings.

For human beings, the form of information can profoundly influence the final effect of communication. A famous theory on this point comes from Neil Postman, who regards the medium as metaphor. To be specific, Postman argues that TV, as a new form of media, distracts people from content in a way that newspapers do not. His criticism, however, to some extent indicates that images are more attractive to human beings.

2) Human – computer interface
Since programming languages are designed for people and used by people, the design principles used for everyday objects also apply when evaluating a programming language. Just as natural languages gain thousands of new words and usages every year, programming languages keep growing to support more keywords, functions and libraries. Yet as programming languages have developed, learning and using them has not become easier than it was decades ago. By contrast, operating systems, from DOS to Windows, have greatly lowered the cost of using computers.

Compared to DOS, Windows 95, the first generation of the Windows operating system that could work independently, introduced a brand-new way of communicating with computers. Users no longer need to type word-based instructions and wait for feedback on a monochrome interface; instead, to carry out a function, they only need to click buttons or icons with clear words or images. Later versions of Windows optimized the user experience but never overturned the graphical form of the interface. For programming languages, the history of operating systems indicates at least one thing: graphical design may help users better understand what they can and should do when they encounter a new product. Therefore, graphical programming languages have the potential to revolutionize the way people study computer science and develop software.




From words to graphics – the history of Windows interface

3. Designing graphical programming language

1) Interface design – affordance, constraint and feedback
To design a graphical programming language, the first step is to design an interface that clearly tells users what to do and what they can achieve. For people learning to program, two difficult things are remembering what keywords mean and designing a proper algorithm for a given goal. Designers of graphical programming languages therefore have to address these basic needs.

Blockly, the online graphical programming project developed by Google, is a good example of how to solve the problem. The whole interface of Google Blockly is divided into three parts. The left part provides users with a selection of programming modules available for building their own projects. To help users understand what the available modules do and quickly find the ones they need, the designers of Blockly categorize all modules according to their function. The middle part is the main area, where users assemble their own programs. By dragging the modules and modifying the parameter values in the light-colored areas, users can freely decide what their programs look like. The right part shows the equivalent source code. For users who are interested in learning basic concepts in a more graphical way, the source code can be helpful.

Like Lego, Google Blockly offers users affordances, constraints and feedback. By observing the shape of each module’s edges and reading the instructions, users can form a preliminary impression of programming. When a module is dragged to a location where it makes the whole program work, it lights up and plays a special sound effect. To prevent users from making mistakes, a module does not respond when it is placed in the wrong place.
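The interplay of constraint and feedback described here can be sketched in Python. This is a hypothetical model, not Blockly’s actual API: the `Block` class, the shape names and the functions `can_connect` and `try_connect` are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Block:
    name: str
    output_shape: Optional[str]  # shape of the block's plug, or None if it has none
    input_shape: Optional[str]   # shape of the socket it offers, or None if it has none

def can_connect(parent, child):
    """Constraint: a child fits only if its plug matches the parent's socket."""
    return parent.input_shape is not None and parent.input_shape == child.output_shape

def try_connect(parent, child):
    """Feedback: report a successful snap, or stay silent (None) on a mismatch."""
    if can_connect(parent, child):
        return f"{child.name} snapped into {parent.name}"
    return None

# Hypothetical blocks: an "if" block expects a true/false value.
if_block = Block("if", output_shape=None, input_shape="boolean")
compare = Block("compare", output_shape="boolean", input_shape="number")
number = Block("number", output_shape="number", input_shape=None)

print(try_connect(if_block, compare))  # compare snapped into if
print(try_connect(if_block, number))   # None
```

The shape check plays the role of the puzzle-piece edges: an invalid combination simply cannot be assembled, so a whole class of errors never reaches the learner.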


Interface of Google Blockly

Considering that one of the most important characteristics of a programming language is modularity, the interface design of Google Blockly is well suited to helping people understand what modularity means. In fact, a good interface for a graphical programming language should not only be user-friendly but also reflect the features of the underlying programming language.

One feasible strategy for designing a graphical programming language is gamification. Of course, gamification does not mean designing a game, but absorbing the advantages of games as a new form of media that creates a sense of immersion, which can be explained as “not interrupting us all the time to tell us it does not matter”.

So the first question is what a game is and what makes games different from other forms of media. Generally speaking, a game can be defined, in Jesse Schell’s words, as “a problem-solving activity, approached by a playful attitude”. If we narrow the range down to screen-based games, we find that what makes a game intriguing and playful is its interaction with people.

Here another question emerges: what does interaction mean to players? To answer this question, Richard Bartle proposed a model that divides players into four main types: killers, achievers, socializers and explorers. Although Bartle’s system was initially used to explain players’ psychology in MUD (multi-user dungeon) games, the interaction style of games has not intrinsically changed, so Bartle’s theory is still valuable for researching modern games.


Bartle’s theory of players

Returning to the gamification of programming languages, the first thing we have to do is figure out what role programmers or learners play in Bartle’s system. Whether they are full-fledged programmers or inexperienced beginners, what they face is unknown and waiting to be created. Therefore, we can readily categorize them as explorers. What’s more, gamification means that the response from the computer is important as well. Considering this, it is also necessary to treat users of graphical programming languages as achievers.

So our goal now is to design a fully interactive and immediately responsive programming language with, of course, an attractive interface. A great example of a programming language that meets all these requirements is Scratch.


Interface of Scratch

Scratch is an engaging educational programming language developed by MIT. Like Google Blockly, it adopts a Lego-like design to reflect inner logical structures such as loops and branches in the program. The difference is that it does not strictly imitate traditional programming languages but uses a pseudo-code, natural-language style to create interactive effects. In the example shown in the picture above, one striking feature is that the properties and actions of the cartoon character, whose parameters learners can adjust, are labeled with different colors. In this way, learners can quickly understand the definition and meaning of each statement or function. When the Lego-like code is executed, the animated characters move, stop or change costumes exactly according to what learners have entered.

As mentioned above, users of graphical programming languages are best regarded as explorers and achievers. Users of Scratch are easily motivated because their work is directly reflected in an animation that, without Scratch, could only be achieved by proficient programmers. Scratch also provides a complete community service that allows users to upload their own projects, thus encouraging learners to explore more possibilities with the Scratch language.


Scratch’s powerful community

3) Efficiency
There are many graphical programming languages aimed at learners and light users, such as product managers and UI designers who occasionally need to write short scripts, but professional programmers, hackers and computer science students still prefer to program the old way. Be it Blockly or Scratch, these graphical programming languages are designed as teaching tools rather than production tools. Compared to traditional programming languages, existing graphical programming languages cannot solve problems such as:

  • Low efficiency for complex programs

The low efficiency of graphical programming languages can be explained from two perspectives. On the one hand, a graphical interface means calling more external functions, in other words, involving more black-box systems, which is likely to lower the efficiency of program execution. On the other hand, industrial programs can be hundreds of times as complicated as the programs developed with graphical programming languages. Such programs would be inconvenient to read in graphical form, because too many geometric figures distract programmers from what the program is really about.

  • Programming is not all about programming languages

A prevalent misunderstanding about programming is that learning a programming language equals learning to program. It would be ridiculous to say that anyone who knows English is a novelist, and the same goes for programmers. A graphical programming language may help learners understand the basic concepts of Boolean algebra, but it cannot teach them how to use it. Therefore, a graphical programming language by itself is not enough to help people understand programming and computer science.

For now, the graphical programming language is an excellent teaching tool but not a great production tool. To develop graphical programming languages that can replace traditional ones, designers have to step outside the circle of existing programming languages. However, given the way computers are developing, the co-existence of graphical and traditional programming languages may be the best outcome.

4. Social – technical system

1) Computational thinking
Although a graphical programming language by itself is not enough for anyone who aspires to enter the IT industry, its popularization can encourage people to adopt computational thinking, which, in Jeannette Wing’s words, “involves solving problems, designing systems, and understanding human behavior, by drawing on the concepts fundamental to computer science”.

So how should we explain “the concepts fundamental to computer science”? Consider again the learning process with Scratch. During this process, learners observe, experiment, analyze, summarize and reflect. Also, to achieve an objective like “making the character head upwards for three steps in yellow costume”, they learn to break the problem down into sub-problems like “head upwards”, “three steps” and “yellow costume”. In general, computational thinking is an intelligent, rational and efficient way of thinking.
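This kind of decomposition can be written out as a short Python sketch. It is not Scratch code: the `Character` class and its methods are hypothetical stand-ins for the blocks a learner would drag together, and the coordinate convention (up increases y) is an assumption.

```python
class Character:
    """A hypothetical stand-in for an on-screen character."""

    def __init__(self):
        self.x, self.y = 0, 0
        self.heading = "right"
        self.costume = "default"

    def set_costume(self, color):
        self.costume = color

    def head(self, direction):
        self.heading = direction

    def step(self, count):
        # Translate the current heading into a change of position.
        moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
        dx, dy = moves[self.heading]
        self.x += dx * count
        self.y += dy * count

def move_up_three_steps_in_yellow(character):
    """The whole task, expressed as its three sub-problems."""
    character.set_costume("yellow")  # sub-problem: "yellow costume"
    character.head("up")             # sub-problem: "head upwards"
    character.step(3)                # sub-problem: "three steps"
    return character

cat = move_up_three_steps_in_yellow(Character())
print(cat.costume, (cat.x, cat.y))  # yellow (0, 3)
```

Each sub-problem maps to one small, independently understandable action, which is the habit of mind the paragraph above describes.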

Graphical programming languages unveil the mystery of programming to ordinary people. We do not need everyone to be an eminent programmer who can finish a complete project independently, but computational thinking should be embraced by every modern citizen.

2) Non-elitism
Another interesting thing related to the popularization of graphical programming languages is that a user-friendly interface and appealing interactivity can effectively lower the barrier for developers or, more precisely, for small development teams and individual developers. Those developers may be incapable of developing large software targeting millions of users, but their opportunity is that needs can be personalized. Sometimes a need is so specific or minor that big companies ignore the users who voice it. If we browse the plug-in center of Chrome, we can find many functions beyond our imagination.

Under such circumstances, developers can be users themselves. Users with peculiar needs will develop a small program or write a few lines of script themselves and then share it with others online. Those developers don’t need to study computer science systematically. So the best programming language for them is the simplest and most visual one.

I would like to call the emergence of individual or non-proficient developers non-elitism in the Internet age. Non-elitism means that anyone with a special interest can develop his or her own application, build a personal website or create artwork, thus forming communities around a single topic with strong user engagement.


As a form of language that plays an important role in human-computer interaction, the programming language guides people in communicating with computers and de-blackboxes the complex digital systems in our lives. However, the abstraction and complexity of mainstream programming languages prevent ordinary people from learning them. Compared to traditional programming languages, graphical programming languages are more easily accepted by ordinary people in education because of their clarity and interactivity. Nevertheless, it is still hard to reach a verdict on the future of graphical programming languages in the workplace.


Adams, Ernest. “The designer’s notebook: Postmodernism and the 3 types of immersion.” Retrieved 1.5 (2004): 2015.

Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.

Anderson, Chris. “The long tail.” Wired magazine 12.10 (2004): 170-177.

Arthur, W. B. (2011). The Nature of Technology: What It Is and How It Evolves (Reprint edition). New York: Free Press.

Bartle, Richard. “Hearts, clubs, diamonds, spades: Players who suit MUDs.” Journal of MUD research 1.1 (1996): 19.

Bodker, Susanne. “When second wave HCI meets third wave challenges.” Proceedings of the 4th Nordic conference on Human-computer interaction: changing roles. ACM, 2006.

Boole, George. An investigation of the laws of thought: on which are founded the mathematical theories of logic and probabilities. Dover Publications, 1854.

Bryant, Randal, and David Richard O’Hallaron. Computer systems: a programmer’s perspective. Vol. 281. Upper Saddle River: Prentice Hall, 2003.

Fangohr, Hans. “A comparison of C, MATLAB, and Python as teaching languages in engineering.” International Conference on Computational Science. Springer Berlin Heidelberg, 2004.

Wing, Jeannette. “Computational Thinking.” Communications of the ACM 49, no. 3 (March 2006): 33–35.

Manovich, Lev. Software takes command. Vol. 5. A&C Black, 2013.

Norman, Donald. The Design of Everyday Things: Revised and Expanded Edition. New York: Basic Books, 2013.

Postman, Neil. Amusing ourselves to death: Public discourse in the age of show business. Penguin, 2006.

Prechelt, Lutz. “An empirical comparison of seven programming languages.” Computer 33.10 (2000): 23-29.

Rochkind, Marc J. Advanced UNIX programming. Pearson Education, 2004.

Schell, Jesse. The Art of Game Design: A Book of Lenses. 2nd edition. Natick: A K Peters/CRC Press, 2014.

Shannon, Claude Elwood. “A mathematical theory of communication.” ACM SIGMOBILE Mobile Computing and Communications Review 5.1 (2001): 3-55

Outline of the Final Project (Chen)

Final Paper Outline

WeChat (working title)

1 What is WeChat

A brief history and current statistics

Introduction to interface and functions

De-blackboxing the design and analyzing how the designs provide affordances for the functions

Analyze key design principles

  • Modularity
  • Combinatoriality
  • Hierarchy
  • Abstraction


2 How people use WeChat

Introduction to typical use of WeChat

Interviews as qualitative method, focusing on how different age/culture groups use WeChat

Analyze key usage

  • Messenger / voice message / online phone / video chat / voice-text converter
  • Group
  • Social network / Contacts management / Friends hunting
  • File sharing & transmission / QR reader / Bar code reader
  • Official Accounts / Mobile reader/ Marketing
  • eWallet / Transaction platform / Red Envelope / tickets and coupons
  • Search engine / API / Shopping, gaming


3 When WeChat

Introduction to the mediation of WeChat on user

Cognitive artifact

Influence on people’s expression, case study WeChat emojis


Influence on how media are made to suit the spread model of WeChat

A new agent emerges

Influence on people’s behavior

4 How it does these

Analyze how WeChat integrates and coevolves with existing systems or nets

  • Internet
  • Telecommunication net
  • Other app functions enabled by open system
  • Nets of things


5 Social impact

As a sociotechnical system, how WeChat impacts Chinese society

  • Social inclusion
  • Social justice
  • Social stratification
  • Social convention
  • Emerging jobs and economy


6 Reasons behind its rise

Brief introduction to the reasons why WeChat thrives

  • Technical reasons
  • Social reasons
  • Political reasons
  • Global context


7 What is WeChat

Integrated platform to combine people’s everyday needs into one solution and systematically expand the possibilities of online life

Band in Your Hand: De-blackboxing GarageBand – Jessie


GarageBand is music software for Apple devices such as the iPhone and iPad. It has a library of sound effects and can be used to create songs with multiple audio tracks. In this paper, I discuss the design principles in GarageBand, such as modularity, affordance, and constraint. I also examine whether GarageBand fulfills Alan Kay’s vision of meta-media.

1. Introduction

GarageBand is a music application for OS X and iOS. It enables users to create multi-track songs with many pre-made virtual instruments such as keyboards and guitars. There are thousands of loops in its library of sound effects. It can also serve as a DJ machine. Projects created in GarageBand can be exported in many formats, such as WAV. It provides amateur musicians with powerful tools to play and compose music.

For this paper, I am focusing on the iOS version of GarageBand for iPad. I’m going to examine how design principles apply to the user interface design and the programs of action of GarageBand. I’m also going to demonstrate some principles by re-creating Daft Punk’s song Give Life Back to Music with GarageBand. Here is the video I made for this song.

Give Life Back to Music, re-created by Jieshu Wang with GarageBand, originally by Daft Punk.

Here is the link to the GarageBand file that you can download and import into your GarageBand app.

2. Modularity

Modularity is a method by which designers divide systems into subsystems in order to manage complexity. Every module hides its own complexity inside and interacts with other modules through interfaces. Each module can in turn be divided into sub-modules. In this way, the overall system complexity is reduced[1].

After years of development and updates, GarageBand has become more and more complex. However, as a user, I have never found it complicated to use. That’s because its designers apply the principle of modularity very well. Improvements in one module do not affect other modules, so users don’t need to change their existing habits much to adapt to new functions. Here I will examine the modularity in GarageBand to see how it helps improve user experience and manage system complexity.

2.1. Modules in GarageBand

2.1.1. Sections and tracks

The basic function of GarageBand is to create your own music. Each song or project you create will not impact each other unless you import one project into another one. So, each song can be seen as a module. This is the topmost level of modularity for users.


Each project is a module.

Inside one of the projects, there are two dimensions of modules. They are like two coordinate axes in an XY plane. The vertical axis is for audio tracks, while the horizontal axis is for sections (time).

Sections and tracks in GarageBand as modules. Video/Jieshu Wang

The first dimension of modularity is audio tracks. Within one project, you can add no more than 32 audio tracks, more than enough for most amateur musicians. Each track serves as a module that hides its complexity—its timbre, chords, loops, melodies, and other properties. When you are editing one track, you can play the sound of other tracks in order to synchronize your beats without affecting them.

The second dimension of modules is song sections. Each section is made up of several bars. The default number of bars in one section is eight, but you can easily increase or decrease the number as you wish. Each song can consist of any number of sections. Each section is a module where you can add no more than 32 tracks. While you are in the interface of one section, each audio track can be easily moved, trimmed, looped, cut, and copied, and your actions in one section have no impact on other sections. The exception is adding or deleting tracks, which automatically adds a blank track with the same instrument, or deletes the same track, in the other sections. If you want to edit other sections, you can touch any area in the current section and drag it to the left or right to enter the section after or before the current one.

Dividing a song into sections has another advantage. A song normally lasts several minutes, so squeezing the whole song into the limited width of the iPad touchscreen would make each bar extremely short, too small to recognize. Any small variation in the sound wave would be very hard to locate. Users would have to zoom in many times to find the specific bar they are looking for, and then zoom out and zoom in again to reach another bar. Sections resolve this problem neatly. They provide users with a navigation system like longitude and latitude. For example, only three numbers are needed to locate one specific bar in a song: the ordinal numbers of the section, the track, and the bar within the section. If GarageBand had no sections, or just one section for the entire song, it would be very difficult to locate one bar among hundreds if not thousands of bars in one interface.

In general, if you create a song with five sections, and each section has eight bars and four tracks, then you get 5 × 4 = 20 modules that you can edit separately. For Give Life Back to Music, I created 7 sections and 21 tracks, 147 modules in total. Although modules can be edited independently, they combine organically. When you finish your project, you can export the whole song with the tracks mixed together and the sections seamlessly connected one after another. If there’s a mistake, or a sound effect you’d like to add or change, all you have to do is find the right module and modify it accordingly.
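The arithmetic of sections and tracks, and the three-number address of a bar, can be sketched in Python. The function names are made up for this illustration, and `absolute_bar` assumes every section has the same number of bars (eight by default), which GarageBand does not require.

```python
def module_count(sections, tracks):
    """Number of separately editable section-by-track modules."""
    return sections * tracks

def absolute_bar(section, bar, bars_per_section=8):
    """Convert a 1-based (section, bar) address into a 1-based bar position
    in the whole song, assuming sections of equal length."""
    return (section - 1) * bars_per_section + bar

print(module_count(5, 4))   # 20
print(module_count(7, 21))  # 147
print(absolute_bar(3, 2))   # 18: the 2nd bar of section 3 is the 18th bar of the song
```

Addressing a bar as (section, track, bar) keeps every number small, whereas a single flat index grows with the length of the song.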

2.1.2. Modularity of sound effects

As I discussed above, a song project in GarageBand is divided into modules according to time and tracks. Inside each module, GarageBand provides us with a large number of options of sound effects. Those sound effects are divided into two main modules—Tracks and Live Loops.

In short, Tracks are mainly sound effects that imitate real instruments such as pianos and guitars, while Live Loops are pre-edited loops, each of which consists of rich tracks in different genres or styles such as EDM and Hip Hop.

Two modules of sound effects that you can add into audio tracks: Live Loops and Tracks

Live Loops

Both modules (Live Loops and Tracks) have many sub-modules organized by instrument or genre. In the Live Loops module, there are eleven pre-edited loop modules in different styles: EDM, Hip Hop, Dubstep, RnB, House, Chill, Rock, Electro Funk, Beat Masher, Chinese Traditional, and Chinese Modern. In each module, there are even smaller sub-modules. For example, in the EDM module, there is a default setting that includes eleven mixed tracks with nine pre-edited loops, 11 × 9 = 99 editable modules in total.

 EDM Live Loop has 99 pre-edited modules. Users can add more as they wish.

The basic units of loops all come from the 1,638 so-called Apple Loops stored in GarageBand. Users can choose from those 1,638 loops to mix their own loops, as well as import other audio files as loops. 1,638 is a large number. How can we find a loop that fulfills our need? For convenience, the designers labeled loop units with three types of properties (instruments, genres, and descriptions), forming a three-dimensional selection network. In this way, they organized the user’s action of selecting loops into three modules. For example, if I’d like to use two or three bars of a country music loop played by guitars that would relax my audience, I would choose the keyword “Relaxed” in descriptions, “Country” in genres, and “Guitars” in instruments. Then seven items are left in the list, with labels such as “Cheerful Mandolin”, “Down Home Dobro”, and “Front Porch Dobro”, which are exactly what I need.
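The three-property selection can be sketched in Python as a simple filter. This is a toy model, not GarageBand’s implementation: the `Loop` class and `find_loops` function are invented, the catalog holds only the three loop names mentioned above plus one clearly hypothetical entry, and the tags attached to each loop are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loop:
    name: str
    instrument: str
    genre: str
    description: str

# Toy catalog; the tags are assumed, and the last entry is hypothetical.
CATALOG = [
    Loop("Cheerful Mandolin", "Guitars", "Country", "Relaxed"),
    Loop("Down Home Dobro", "Guitars", "Country", "Relaxed"),
    Loop("Front Porch Dobro", "Guitars", "Country", "Relaxed"),
    Loop("Hypothetical Synth Riff", "Synths", "Electronic", "Intense"),
]

def find_loops(catalog, instrument=None, genre=None, description=None):
    """Keep the loops that match every property the user has selected."""
    results = []
    for loop in catalog:
        if instrument is not None and loop.instrument != instrument:
            continue
        if genre is not None and loop.genre != genre:
            continue
        if description is not None and loop.description != description:
            continue
        results.append(loop.name)
    return results

print(find_loops(CATALOG, instrument="Guitars", genre="Country", description="Relaxed"))
```

Each selected keyword narrows the list independently of the others, so the user can apply the three filters in any order.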


1,638 Apple loops are categorized by three standards: 16 instruments, 14 genres, and 18 descriptions.

Tracks

In the module of tracks, there are thirteen options or sub-modules:

  • Keyboard: Play an on-screen keyboard with piano, organ, and synth sounds.
    • There are seven types of timbre for users to choose from (keyboards, classics, bass, leads, pads, FX, and other), with 133 timbres in total that you can play with a virtual keyboard on the touchscreen.
    • Depending on the timbre, there are many sound properties that you can adjust. For example, for a timbre called “Deep House Bass”, you can modify the filter attack, cutoff, resonance, filter decay, and pitch.
  • Drums: Tap on drums to create a beat. There are eight drum kits and eight drum machines.
  • Amp: Plug in your guitar and play through classic amps and stompboxes. Basically, it’s a virtual guitar amplifier and effects unit. The guitar amps fall into four categories (clean, crunchy, distorted, and processed); altogether there are 32 guitar amps and 16 bass amps.
  • Audio Recorder: Record your voice or any sound. There are nine effects you can choose, such as large room and robot.
  • Sampler: Record a sound, then play it with the onscreen music keyboard.
  • Smart Drums: Place drums on a grid to create beats.
  • Erhu: Tap and slide on strings to bow a traditional Chinese violin.
  • Smart String: Tap to play orchestral or solo string parts.
  • Smart Bass: Tap strings to play bass lines and grooves.
  • Smart Keyboard: Tap chords to create keyboard grooves.
  • Pipa: Tap the string to pluck a traditional Chinese lute.
  • Smart Guitar: Strum an onscreen guitar to play chords, notes, or grooves. There are four styles (acoustic, classic clean, hard rock, and roots rock).
  • Drummer: Create grooves and beats using a virtual session drummer.

In general, the options for one track are shown in the image below.


credit: Jieshu Wang

2.1.3. Modularity of action

Under this modular organization, users’ actions in creating a song are also divided into modules. Users have to divide a song into several sections and edit each track in each section separately. For example, a song of 96 bars can be divided into 12 sections of 8 bars. Let’s say it is a simple pop song with 5 tracks: drum, two guitars, bass, and vocal. In total, there are 12 × 5 = 60 modules that can be edited separately. Accordingly, the user can divide her action into 60 sub-actions. She would start with section A: first the drum module, then the two guitar tracks, then the bass track, and finally the vocal track. While she is editing the bass track of section A, she must play back the three tracks (drum and two guitars) that she has already edited in order to synchronize the beats. This playback function is an interface between modules of action. Similarly, when she is recording her vocal for the fifth track of section A, she must wear earphones to listen to the instrumental accompaniment of the first four tracks, in order to follow the beat and tune of the existing modules. If the user needs backing vocals, she can add an additional vocal track and sing the harmony herself.
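To make the counting concrete, the grid of editable modules can be enumerated in a few lines of Python. This is only a sketch of the example above; the track labels and the `build_modules` helper are hypothetical names of my own.

```python
from itertools import product

SECTION_COUNT = 12    # 96 bars split into sections of 8 bars
BARS_PER_SECTION = 8
TRACKS = ["drum", "guitar 1", "guitar 2", "bass", "vocal"]


def build_modules(section_count, tracks):
    """Return one (section letter, track) pair per editable module."""
    sections = [chr(ord("A") + i) for i in range(section_count)]
    # product() varies the track fastest, so all tracks of section A come first.
    return list(product(sections, tracks))


modules = build_modules(SECTION_COUNT, TRACKS)
print(len(modules))   # 12 sections x 5 tracks = 60
print(modules[:5])    # the five modules of section A, in editing order
```

The order of the list mirrors the editing order in the example: every track of section A is handled before moving on to section B.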

There are also interfaces between different sections. For example, many pop songs use conventional chord progressions such as I-VI-IV-V. In this case, users can simply copy and paste the repeating tracks into new sections. In addition, the drum part usually doesn’t vary much during a song, so users can copy and paste previous drum tracks into later sections, or just loop them to fill the whole song. In Give Life Back to Music, I copied and pasted many tracks, such as the drum tracks and keyboard tracks, in order to save time.

With this modularity of action, the process of creating songs is simplified, and it becomes easier for amateur musicians to manage the complexity of the music.

2.2. GarageBand as a module for other systems

The music industry is a complex sociotechnical system. A lot of technologies, organizations, individuals, commercial companies, and academic institutes are involved in this global system. GarageBand is a part of it, serving as a module for many larger systems.

GarageBand is a module of the iLife software suite, which contains iMovie, iPhoto, iWeb, and other media software. These programs all have their own functions, applications, and purposes. For example, GarageBand is music software, while iPhoto is used to edit images and iWeb is a website creation tool. Meanwhile, they interact with one another through interfaces. For example, song projects created in GarageBand can be imported into iMovie to serve as background music for videos, which in turn can be imported into iWeb as part of a web page. For my video of Give Life Back to Music, I exported the song from GarageBand into iMovie.


GarageBand projects can be imported into iMovie

Moreover, as part of the Apple ecosystem of software and hardware, GarageBand projects can be transferred very easily between Apple devices through AirDrop, a feature that uses Bluetooth and Wi-Fi, as shown below.


A GarageBand project created on an iPad was transmitted to a MacBook using AirDrop. It can be edited further using the MacBook version of GarageBand or other software such as Logic Pro.

In addition, GarageBand can interact with systems outside the Apple ecosystem through interfaces. For example, Voice Synth is a virtual vocoder for the iPad. Since GarageBand has no vocoder function, users who want a vocoder have to turn to third-party applications such as Voice Synth, as shown in the upper panel of the image below. Here, I will show you the interface between GarageBand and Voice Synth. I used the “Robot” effect in Voice Synth to record myself singing “let the music come tonight, we are gonna use it; let the music come tonight, give life back to music”, exported it as a WAV file, uploaded the audio file to my iCloud Drive, and imported it into an audio track in GarageBand, where I could edit it further and mix it with other tracks. Because third-party applications can serve as modules, GarageBand doesn’t need to design its own vocoder module, which might cost a lot of money, and users don’t need to install Voice Synth if they don’t need a vocoder effect (not all users want to distort their voice). The interfaces involved here include protocols shared by the audio processing community, such as audio formats, cloud computing, and data transmission methods. On the other hand, GarageBand projects can also be exported into other apps such as Logic Pro for further manipulation, partially because they share the same audio engine.


GarageBand can also be used as a module in a hardware system. Using an audio interface such as Apogee Jam, users can use GarageBand as a virtual amp for guitars and basses.

3. Affordance

Affordance is a “property in which the physical characteristics of an object or environment influence its function[1].” As Donald A. Norman explains in his book The Design of Everyday Things[2], affordances provide us with clues about how things “could possibly be used”. The design of GarageBand’s user interfaces demonstrates this principle, too.

Many interfaces of virtual instruments imitate the interfaces of real instruments. For example, there is a virtual keyboard in the Keyboard module. This imitation follows people’s existing mental model, so users know how to play the keyboard at first glance.

Interfaces of some keyboards

Icons on the interface follow people’s existing mental models, too. For example, the green triangle indicates “playing the music”, while the red dot indicates “recording.” The virtual wheels and rotary knobs afford rotating, and the black and white keys that imitate a piano afford pressing. When you press one of the keys, it turns darker, imitating the shadow of a real key and indicating that you are “pressing down” on it.


The shadow of the key that users are pressing.

The interfaces for drums also imitate real drums. There are several virtual drumheads that afford knocking. Sometimes, tapping different areas of the same drumhead produces different sound effects, just like a real drum. For example, tapping the center of the biggest drum in the Chinese drum kit produces a deep hit, while tapping the rim of the drum sounds like a clear knock. Moreover, the harder you press the touchscreen, the louder the sound will be. Different gestures produce different effects, too. For example, in the Chinese drum kit, if you drag your finger around the rim of the biggest drum, it sounds like a stick sweeping across a rough surface, a “rattle” sound.


Interfaces of some drums

However, in the “Smart Drums” function, things are different. There are no virtual drums in the interface, only an 8 × 8 matrix. The two dimensions of the matrix are “Simple-Complex” and “Quiet-Loud”. There’s no such thing as a “drum matrix” in real life, but users know how to use the matrix once they see the interface: there are 64 squares in the matrix, and drum components of similar size are arranged to the right of it. The components seem to be waiting to be dragged into the matrix, so they afford dragging. There is also a dice icon on the lower left. A physical die affords rolling, so the perceived affordance of the dice icon is rolling to get a random result. Indeed, when you tap the dice icon, it “rolls” in place, and the drum components randomly “roll” into the matrix, forming a random beat pattern in a metric framework according to your tempo and time signature.
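The dice behaviour can be sketched as random placement on the grid. This is an illustrative Python sketch, not Apple’s algorithm: the component names, the `roll_dice` function, and the rule that each component lands in a distinct square are assumptions of mine.

```python
import random

GRID_SIZE = 8  # the matrix is 8 x 8, so 64 squares
# Hypothetical drum components waiting beside the matrix.
COMPONENTS = ["kick", "snare", "hi-hat", "clap"]


def roll_dice(components, rng=random):
    """Drop each component into a distinct random square.

    Returns {component: (complexity, loudness)}, where each coordinate
    runs from 0 (simple / quiet) to GRID_SIZE - 1 (complex / loud).
    """
    squares = [(x, y) for x in range(GRID_SIZE) for y in range(GRID_SIZE)]
    chosen = rng.sample(squares, k=len(components))
    return dict(zip(components, chosen))


# A seeded generator makes the "roll" repeatable for demonstration.
pattern = roll_dice(COMPONENTS, random.Random(2016))
for name, (complexity, loudness) in pattern.items():
    print(f"{name}: complexity={complexity}, loudness={loudness}")
```

Dragging a component by hand would simply overwrite its coordinates, so the dice is a shortcut for the same action rather than a separate feature.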


Interface of smart drum

Another example of affordance is the interface of guitars. There is a switch icon on the upper right of the screen labeled “chords” and “notes,” which you can tap to switch between chords mode and notes mode. The notes mode imitates the interface of a real guitar, with six strings that you can tap to play or drag to bend the pitch slightly. However, the interface differs from a real guitar. A real guitar player would use his/her left hand to hold the chords and his/her right hand to pluck or strum the strings, but in GarageBand you only see the left part of the neck. Even so, it’s very easy for a guitar player to figure out how to play the virtual strings: by tapping the strings between the frets, which afford tapping.

The chords mode imitates nothing in the real world, but it provides users with a perceived affordance of tapping as well. As you can see from the gif below, there is a rotary knob at the upper center labeled “autoplay”, with which you can choose from four pre-made chord progressions or turn autoplay off. There are eight vertical bars, each labeled with a chord name that depends on your key. For example, in the key of C major, the eight bars are labeled Em, Am, Dm, G, C, F, Bb, and Bdim, all of which are common chords used in C major. If autoplay is off, six strings remain on the screen, affording tapping. If you tap the chord label at the top of a vertical bar, all six strings are “played” at the same time, imitating the sound of strumming. If you tap an individual string in the vertical bar labeled Em, it plays the sound of the corresponding string as if your left hand were holding the Em chord. If you turn autoplay on, all you have to do is tap the chord name, and GarageBand plays one of the pre-made chord progressions.

Interface of the Hard Rock guitar in GarageBand

Other instruments like Smart Strings, Pipa, Erhu, and Smart Bass also have many well-designed affordances.

In short, the designers of GarageBand are really good at using affordance. They imitate real instruments and use many icons, switches, and rotary knobs to integrate many complex functions on a limited screen.


The interface for amps is full of virtual rotary knobs.

However, many of these designs were not created entirely by GarageBand’s designers. For example, there are a lot of music applications that imitate the guitar and the piano, and many of them use interfaces similar to GarageBand’s. But few apps combine keyboards with guitars in one app, and most of them don’t provide as much flexibility as GarageBand. Some professional apps such as Logic Pro provide users with a massive library of sound effects and huge freedom to manipulate music, but they usually cost a lot of money and storage space. Logic Pro X is powerful but costs $199 for OS X, and there’s no iOS version. By contrast, GarageBand cost me just ¥30 (approximately $5) five years ago, and now it’s free for all iPad users!

4. Constraints

The iOS version of GarageBand has many constraints.

First of all, the app size is limited by the maximum size for iOS apps, 4GB. This limit is set by Apple and was raised from 2GB to 4GB in 2015[3]. The current iOS version of GarageBand is 1.28GB, well under the limit, which leaves users more space to store their projects.

Second, the interface area of GarageBand is restricted by the physical size of the iPad’s touchscreen. The most common iPad sizes are 7.9-inch (iPad Mini) with 2048 × 1536 resolution, 9.7-inch with 2048 × 1536 resolution, and 12.9-inch with 2732 × 2048 resolution. The iPad is bigger than a cell phone but smaller than a laptop computer, so each of these devices needs a different design. Everything must fit on the touchscreen. That is one of the reasons why the designers divide songs into sections. If we had a screen two meters long, maybe we could work without sections.

Furthermore, many musical instruments are very long, such as the piano, guitar, and erhu. A common piano has 88 keys, and a common guitar has 18 frets. How can they fit on a small screen? The designers of GarageBand have many good ideas. For keyboards, for example, the default view shows two octaves, from C2 to C4. You can scroll the keyboard to the left or right to play lower or higher pitches. In all, there are 10 octaves. There is also a double-row mode with which you can play four octaves on the screen, as shown below.


The third constraint is that the gestures used in GarageBand are limited by the capacity of the touchscreen. Today, the iPad’s multi-touch screen is very powerful. It can sense the pressure of fingers and respond accordingly. For example, the harder you tap the virtual drums in GarageBand, the louder the sound will be. But GarageBand will not respond to finger pressure lighter than the lower limit or stronger than the upper limit of what the touchscreen can recognize. Besides simple tapping, it also supports other gestures, such as dragging. The designers of GarageBand have to choose gestures that are available on the iPad; otherwise, the gestures will fail. Other versions of GarageBand have their own constraints depending on their platforms. For example, the OS X version of GarageBand doesn’t support multi-touch gestures because the MacBook doesn’t have a touchscreen, but it has a much bigger sound library because the processing capacity of the MacBook is more powerful than that of the iPad.

5. Does GarageBand fulfill Alan Kay’s vision?

Alan Kay envisioned a universal media machine, with which people can remediate all kinds of media and create their own media with unlimited freedom[4]. Does GarageBand fulfill his vision? I don’t think so.

First of all, GarageBand doesn’t provide us with a flexible enough programming environment. In fact, it doesn’t provide any programming environment at all. It gives us a library of sound effects and pre-made loops, but it’s not easy to create your own, and it doesn’t let you freely edit the properties of sound. For example, if I want to edit my voice, there are only nine effects for me to choose from; I can’t modify the acoustic characteristics as I wish. It’s like a coloring book with pre-printed line drawings: you can fill them with colors, but you don’t really “create” the art, and it will not improve your creativity either. You are restricted by the line drawings. It produces an illusion of “creativity.” Most of the time, when we talk about “creating” music in GarageBand, we are just re-mixing pre-existing sound effects stored in GarageBand in pre-made ways. My “re-creation” of Give Life Back to Music is a case in point: there’s nothing creative in it. All the creativity came from Daft Punk.

Second, GarageBand cannot be used to edit media other than music. It has nothing to do with videos, texts, paintings, and so on. It is not a meta-medium.

However, I think GarageBand to some degree democratizes music. For example, I have never succeeded in playing the F chord on a guitar, but I can play it in GarageBand. I can’t sing harmony with myself, but I can record harmonies on different tracks in GarageBand and play them together as if I were singing with myself. I don’t know how to write a song, but when I re-create other people’s songs in GarageBand, I can learn the arrangement and composition of songs by decomposing them.

6. Conclusion

GarageBand is music software with which amateur musicians can create songs on Apple devices. In this paper, I discussed the design principles in the iPad version of GarageBand, such as modularity, affordance, and constraint. In particular, I argued that GarageBand doesn’t fulfill Alan Kay’s vision of a meta-medium, but it does simplify the process of creating music for amateur musicians.


[1] Lidwell, William, Kritina Holden, and Jill Butler. Universal Principles of Design. Gloucester, Mass: Rockport, 2003.

[2] Norman, Donald. The Design of Everyday Things. Basic Books, 2002.

[3] Kumparak, Greg. “iOS Apps Can Now Be Twice As Big.” TechCrunch. Accessed December 18, 2016.

[4] Manovich, Lev. Software Takes Command. International Texts in Critical Media Aesthetics, vol. 5. New York; London: Bloomsbury, 2013.