Author Archives: Rohan Somji

Design Architecture of Spotify for Social Capabilities


“Dear 3,749 people who streamed ‘It’s the End of the World as We Know It’ the day of the Brexit vote, hang in there.” With this 2016 ad, Spotify flexed its ‘user data’ muscles, demonstrating that it can extract information about social affairs, peer into the personal moments of its users, and trace how those affairs influence their music choices.

My aim in this paper is to unearth the layers of design architecture involved in enabling the social capabilities a user has on Spotify. Along the way we will see how it relies on older concepts and systems that were here long before Spotify itself. We will discover that the design at the bottom level of its architecture (its backend: the system, the cloud services, streaming, data storage, etc.) is of the same kind as that used by many social media companies. Yet under the influence of the socio-cultural landscape, Spotify steers itself in a non-social-media direction, maintaining itself as solely a music and podcast streaming platform through design at the top level of its architecture (affordances, constraints, conventions, icons, features, accessibility).



Fig: Spotify Ad

As one of the top music and podcast streaming services available, Spotify has grown its user base substantially over the last few years, owing to increases both in available music and in services. As the user base grew, so did a sense of community and a need for social bonding over music. This involves features to share on social networking platforms, to see what your friends are listening to, to collaborate on creating playlists, to follow each other’s playlists, and so on. The data infrastructure underneath the audio and streaming infrastructure becomes more relevant every day as user data grows and machine learning techniques are applied to it.

This year they are making it even more personal with their “2020 Wrapped”, providing more granular details on both music and podcast listening habits: how many total minutes users listened to podcasts, their top podcast with its listen count, which songs were on repeat, which songs they discovered before those became hits, and more.

Fig: Spotify Wrapped 2020

One of the benefits of using Spotify has always been its mobility. Smartphones made it easy to keep music portable by connecting the user to a central database of music available via streaming over the internet. This means the user listens to music in a number of locations and during a wide range of activities. So do their friends on Spotify. Currently, Spotify lets users add friends via Facebook, follow a friend’s playlists, and get notified of a friend’s listening activity on the desktop app.

How does all this data get tied together to provide a highly structured user dataset, intelligible insights and accurate recommendations? What kind of design architecture allows sensible data to emerge from this constant inflow and outflow of data arriving from multiple separate user interactions? What do the design decisions tell us about how Spotify wishes to operate?


Data Infrastructure

Spotify shut down the last of its US data centers in 2016, freeing itself of on-premise infrastructure and migrating onto Google’s Cloud Platform. The data centers existed to “send out music files and fetch back user data” (Eriksson et al., 2019, 44). Now streaming and the storage of user data and music files happen in Google’s Cloud Storage. The cloud computing services provide the advantage of Google tools like the BigQuery cloud data warehouse, Pub/Sub for messaging and Dataflow for batch and stream processing. In the fourth quarter of 2019, Spotify reported 271 million monthly users and 124 million Premium subscribers, and all this data is stored on Google’s Cloud Platform (Spotify Case Study (n.d.), Google Cloud Customer).

Clicking the play button, tapping on a playlist, using any affordance provided by Spotify results in an “event”. Whenever a user performs an action in the Spotify client—such as listening to a song or searching for an artist—a small piece of information, called an event, is sent to their servers (Maravić, 2016). Spotify calls the process “Event Delivery”, which makes sure that all events get transported safely from users to a central processing system managed on Google Cloud (Maravić, 2016). Cloud Pub/Sub is the transport mechanism for all the events.
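As a sketch, such an event could be nothing more than a small serialized record; the field names below are invented for illustration and are not Spotify’s actual event schema:

```python
import json

def make_event(user_id, event_type, payload):
    # Build a small client-side event record of the kind a client
    # might emit; field names here are hypothetical.
    return json.dumps({
        "user_id": user_id,
        "type": event_type,        # e.g. "song_played", "search"
        "payload": payload,
    })

event = make_event("user-42", "song_played", {"track_id": "abc123"})
```

Each such record is tiny on its own; the engineering challenge is delivering billions of them reliably.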

A pub/sub is a message-oriented middleware system: on one end, publishers send messages to a broker that categorizes published messages into classes (topics); on the other end, subscribers choose which classes of messages to receive (What Is Pub/Sub? | Cloud Pub/Sub Documentation). Each side does this without knowledge of the other. Here, messages can range from commands to payment information to subscriptions to premium accounts.
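The mechanics can be sketched with a toy in-memory broker, where publishers and subscribers know only topics, never each other. This is a simplified stand-in for the idea, not Cloud Pub/Sub’s actual API:

```python
from collections import defaultdict

class PubSub:
    """Toy in-memory broker: publishers and subscribers are decoupled,
    connected only through named topics."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never knows who (if anyone) receives the message.
        for callback in self.subscribers[topic]:
            callback(message)

broker = PubSub()
received = []
broker.subscribe("payments", received.append)
broker.publish("payments", {"user": "u1", "amount": 9.99})
broker.publish("commands", {"cmd": "play"})   # no subscribers; dropped
```

The same decoupling is what lets a production system add new consumers of an event stream without touching the clients that emit it.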

Fig: (What Is Pub/Sub? | Cloud Pub/Sub Documentation)

The pub/sub system, as part of the internet, “uses the bundles of data packed in smaller units” (Irvine, (n.d.), pp. 6). The packet’s framing itself carries no message content; much like a carrier wave in radio signals, it transports data called the “payload”. The packet’s other bits determine its path through the network, while the payload holds the message, which is received and converted for the GUI on the other end (White, 258-259).
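The header/payload split can be illustrated with a toy packet format, far simpler than a real IP packet (which carries many more header fields):

```python
import struct

HEADER = "!4s4sH"   # source address, destination address, payload length

def make_packet(src_ip: bytes, dst_ip: bytes, payload: bytes) -> bytes:
    # Toy packet: a tiny fixed header followed by the payload.
    return struct.pack(HEADER, src_ip, dst_ip, len(payload)) + payload

def parse_packet(packet: bytes):
    # The receiver reads routing info from the header, message from the payload.
    src, dst, length = struct.unpack(HEADER, packet[:10])
    return src, dst, packet[10:10 + length]

pkt = make_packet(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02", b"hello")
```

The point is structural: the network only inspects the header to route the packet, while the payload is opaque data for the endpoints.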


Connecting and Accessing Databases

User details, such as username, country, and email, are stored in a user database. Every time a user logs in, that database is queried (Vesterlund, 2015). Spotify uses Apache’s Cassandra and Hadoop to store user profile attributes and metadata about entities like playlists, artists, etc. (Mishra & Brown, 2015). These Apache software libraries are frameworks that allow for the distributed processing of large data sets across clusters of computers using simple programming models. Spotify uses them for the following (Mishra & Brown, 2015):

  1. Log collection, e.g. “completion of a song or delivery of an ad impression” – Kafka
  2. Real-time event processing, e.g. “clicks” and “search” – Storm
  3. Removing duplicate events and cleaning up data to “generate metadata like genre, tempo” – Crunch
  4. Storing user profile attributes and metadata like playlists, artists, etc. – Hadoop and Cassandra
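The clean-up step in item 3, removing duplicate events, can be sketched as follows; this is a simplified stand-in for what a Crunch batch job would do at scale, and the `event_id` field is a hypothetical name:

```python
def dedupe_events(events):
    # Drop duplicate events by id, keeping the first occurrence,
    # so the same song completion is not counted (or paid for) twice.
    seen = set()
    unique = []
    for ev in events:
        if ev["event_id"] not in seen:
            seen.add(ev["event_id"])
            unique.append(ev)
    return unique

raw = [{"event_id": 1, "type": "song_end"},
       {"event_id": 1, "type": "song_end"},   # retried delivery
       {"event_id": 2, "type": "click"}]
clean = dedupe_events(raw)
```

Duplicates arise naturally in event delivery because clients retry sends when acknowledgments are lost, so deduplication is a standard downstream step.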

It is the Storm pipelines that fetch the metadata back, group it per user and determine user-level attributes to represent a user’s profile, which is then stored in a Cassandra cluster. Spotify calls this the User Profile Store (UPS).
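A toy version of that per-user aggregation might look like this; the attribute chosen (top genre) and the field names are illustrative, not Spotify’s actual profile schema:

```python
from collections import defaultdict, Counter

def build_user_profiles(events):
    # Group playback events per user and derive a simple user-level
    # attribute, a miniature stand-in for a User Profile Store build.
    plays = defaultdict(Counter)
    for ev in events:
        plays[ev["user_id"]][ev["genre"]] += 1
    return {user: {"top_genre": counts.most_common(1)[0][0]}
            for user, counts in plays.items()}

events = [{"user_id": "u1", "genre": "rock"},
          {"user_id": "u1", "genre": "rock"},
          {"user_id": "u1", "genre": "jazz"},
          {"user_id": "u2", "genre": "pop"}]
profiles = build_user_profiles(events)
```

In production, attributes like these would be recomputed continuously from the event stream and written back to Cassandra for low-latency reads.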

Fig: (Mishra & Brown, 2015)

When a user is listening in real-time, Apache Storm is at work. A large volume of data moves across all the libraries, while being worked on by Spotify’s recommender algorithm, to generate “Recently Played” songs, curated playlists like “Discover Weekly”, “Shows to try”, “Recommended Radio”, etc.

The deployment of these databases, along with the cloud-based pub/sub system, is collectively called the Event Delivery system, and it is used to generate and move critical data. This includes the EndSong event (emitted when a Spotify user is done listening to a track), which is used to pay royalties to labels and artists and to calculate Daily Active Users (DAU) and Monthly Active Users (MAU), and the UserCreate event (indicating a new Spotify user account was created) (Janota & Stephenson, 2019).
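Given a stream of EndSong-like events, a DAU/MAU-style metric could be derived roughly as follows; the event fields are invented for illustration:

```python
from datetime import date

def active_users(end_song_events, start, end):
    # Count distinct users with at least one EndSong event in [start, end].
    return len({ev["user_id"] for ev in end_song_events
                if start <= ev["date"] <= end})

events = [{"user_id": "u1", "date": date(2020, 12, 1)},
          {"user_id": "u2", "date": date(2020, 12, 1)},
          {"user_id": "u1", "date": date(2020, 12, 2)}]

dau = active_users(events, date(2020, 12, 1), date(2020, 12, 1))
mau = active_users(events, date(2020, 12, 1), date(2020, 12, 31))
```

This is why deduplicated, reliable event delivery matters: these counts feed both reported business metrics and royalty payments.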


Connecting Users to Each Other

Users can integrate their Spotify profiles with their Facebook account and thereby find all their Facebook friends who are also on Spotify. Within the pub/sub system we mentioned before, these friends are referred to as “topics” that can be “subscribed” to (followed). You can subscribe even without integrating your Facebook account by searching for one another, i.e. querying a database (Setty et al., 2013, pp. 2). Similarly, publicly available user-created playlists can be searched for. Users can also mark their playlists as “collaborative”, allowing others to edit them by granting write permissions.
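Mapping “following” onto pub/sub can be sketched like this: each user acts as a topic, and their followers are its subscribers. This is a toy model of the idea, not Spotify’s implementation:

```python
from collections import defaultdict

class FollowFeed:
    """Sketch of 'following' as pub/sub: following a user means
    subscribing to that user's activity topic."""
    def __init__(self):
        self.followers = defaultdict(set)   # user -> set of followers
        self.feeds = defaultdict(list)      # follower -> notifications

    def follow(self, follower, user):
        self.followers[user].add(follower)

    def played(self, user, track):
        # Publishing an activity event fans out to all subscribers.
        for f in self.followers[user]:
            self.feeds[f].append((user, track))

feed = FollowFeed()
feed.follow("alice", "bob")
feed.played("bob", "Karma Police")
```

The fan-out shape is the same one a “Friend Activity” pane would consume.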

In the desktop version, we see a pane on the right side of the screen called “Friend Activity” that lists the songs friends have been listening to.

Fig: (Johnson, 2019)

This means that real-time user metadata is shared across the platform using the pub/sub cloud systems, through Apache Storm and Cassandra, and is then presented on the graphical user interface of computers. It might seem puzzling that Spotify has not yet integrated this feature into its smartphone app. One possible reason is that UX researchers and engineers are still testing which affordances and user flows best fit a smartphone-specific “Friend Activity” experience, while still maintaining Spotify as primarily a music and podcast streaming platform rather than a social media platform.

Fig: Architecture Supporting Social Interaction (Setty et. al., 2013, pp. 3)

The ‘external database’ mentioned in the figure is now part of Google’s Cloud Storage. All of Spotify’s backend services are operated via Google Cloud services, still using the pub/sub system.

In terms of database management and access, Spotify uses pub/sub systems similar to those of Facebook, Twitter and Google+ (Kermarrec & Triantafillou, 2013, pp. 16:5). Yet Spotify is not a social media platform. It is only at the top level of the design architecture that it fashions itself as a cloud-based music and podcast streaming service. At the bottom levels of the design architecture, in the kind of user data it has, the way it manages it and the way it leverages it, Spotify is remarkably similar to how many social media companies design their systems.



Spotify uses data to calculate royalties, run A/B tests, process payments, serve playlists and suggest new tracks to users (Leenders, 2018). This data is protected by encryption built around per-user keychains: “Each user has their own set of keys that should be used for the encryption”, which reduces the impact of any possible data leak, since leaked data remains unreadable without the corresponding decryption keys. Additionally, it allows Spotify to centrally control the lifecycle of data for individual users.

Padlock is their key management service, which manages keychains for all Spotify users. “This means, for example, every time a user looks at a playlist (even their own), the playlist service makes a call to Padlock to get the keychain of the playlist owner and then decrypts the playlist. Each service that calls Padlock gets its own set of keys” (Leenders, 2018). The keys have other applications as well. For example, when a user opts out of targeted advertising, the corresponding key can be removed, blocking the advertising system’s access to that user’s personal data so it can no longer identify the user as a target.
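The key-removal logic can be sketched with a toy key store. The XOR “cipher” below is only a stand-in for real encryption, and nothing here is Spotify’s actual Padlock API; the point is that revoking a key makes previously stored data unreadable:

```python
import secrets

class ToyKeyStore:
    """Hypothetical per-user key store: each user gets their own key,
    and removing the key renders that user's data inaccessible."""
    def __init__(self):
        self.keys = {}

    def key_for(self, user):
        return self.keys.setdefault(user, secrets.token_bytes(16))

    def revoke(self, user):
        self.keys.pop(user, None)   # e.g. user opted out, or data lifecycle ended

    def encrypt(self, user, data: bytes) -> bytes:
        key = self.key_for(user)
        # XOR with the key bytes: a toy, NOT real cryptography.
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    def decrypt(self, user, blob: bytes) -> bytes:
        if user not in self.keys:
            raise KeyError("key revoked: data is unreadable")
        return self.encrypt(user, blob)   # XOR is its own inverse

store = ToyKeyStore()
blob = store.encrypt("user-1", b"my playlist")
```

Centralizing keys this way means one deletion enforces the opt-out everywhere the data is stored.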

We can de-blackbox ‘Friend Activity’ and its family of features using our knowledge of Spotify’s Padlock system. The same encryption-decryption keys must be called upon when users follow each other or receive notifications about their friends’ activities. A user can opt out of displaying their data in ‘Friend Activity’ tabs at any time; all they have to do is switch off ‘Listening activity’. This would mean that the corresponding key is removed, blocking access to that user’s metadata related to ‘Friend Activity’.





Fig: Social Privacy Settings



Social Savvy Affordances or the lack thereof

It seems, then, that it is a strategic business and political decision that Spotify does not become a social media platform despite the robust data infrastructure it has. With the heavy scrutiny that Big Tech has received in recent years, it seems reasonable to steer clear of the controversial limelight around user data. Privacy, market power, free speech and censorship are key issues that plague social-media platforms (Boskin, 2019). Spotify has avoided being politicized while still growing its user database and its metadata-acquiring capacities. Despite a range of proposed designs, from a ‘strong’ social-features-centric design (Jessica Man) to a ‘weak’ social-features-supplemented design (Cecilia Lu, Sanjana Seshadri), nothing even close has been integrated into Spotify.


Fig: Man





Fig: Lu

Fig: Seshadri

This means that, in terms of UI design, Spotify may not be willing to create affordances that enable large-scale social interactions internally within the app, steering clear of any Facebook-wall-like features. Spotify may be trying to adhere to its privacy statements and avoid political controversies surrounding user data. Searching for people on Spotify is not as straightforward as on Facebook, in terms of both accuracy and simplicity. You cannot comment on, react to or tag a song inside Spotify. The number of actions one can perform on a song one likes is limited. Sharing is not granted internally, nor are there many features to recommend songs to others, notify them, or perform other social actions. This can be explained partially as a flexibility-usability tradeoff (Lidwell & Holden, pp. 86) and partially by Spotify’s reluctance to be more socially savvy, meaning the lack of affordances could actually be a strategic design constraint.

De-blackboxing the app reveals that though it has quite similar structures to social media apps, it is on the user front that Spotify wishes to remain a streaming platform. It is here that we see how technical decisions at the database-structure level and UI design decisions depend not just on universal design principles for efficiency, but also on socio-cultural landscapes.

Spotify allows “sharing” via third-party apps like Facebook, WhatsApp, Tumblr, etc., using web-based APIs, but not internally with friends or followers. It does not access contacts on the mobile phone to connect to other possible Spotify users. Facebook data integration is the only way Spotify connects its users. It is understood that Spotify Wrapped was created to be shared on Instagram Stories. Meanwhile, Spotify acquires better AI systems to improve its recommendation service, continuing to focus on being a streaming platform (Novet, 2017). This can again be noticed in the proliferating affordances for personalized curated music, such as Discover Weekly, Made For You playlists, recommended radio playlists, etc.



Spotify derives its power from its database management systems. All the music and all the user data are stored on Google Cloud using these systems. We saw how the pub/sub system is utilised to operationalise streaming services on Spotify. We saw how various software libraries are used by Spotify to engage, extract and channel the metadata users generate as they interact with the app. These libraries, cloud storage and the pub/sub system form the backbone on which Spotify functions. Accordingly, the UI affordances and constraints the app provides to the user dictate how these systems will be used on the backend. And finally, we saw that how these UI affordances and constraints are placed depends not just on universal design principles but also on the overall strategy of the company: just because Spotify has the user data, the systems and the capacity to provide certain (social-media-like) services does not mean that the top-level design has to exploit the bottom-level design architecture simply because the bottom level has the capacity for it.



Boskin, M. (2019, April 29). Big tech must get its house in order or risk stronger regulation | Michael Boskin. The Guardian.
Greenberg, D., Kosinski, M., Stillwell, D., Monteiro, B., Levitin, D., & Rentfrow, P. (2016). The Song Is You: Preferences for Musical Attribute Dimensions Reflect Personality. Social Psychological and Personality Science, 7.
Jakobsen, A. Y. L. (2018). Eventization of listening: A qualitative study of the importance of events for users of the streaming service Spotify. 125.
Janota, B., & Stephenson, R. (2019, November 12). Spotify’s Event Delivery – Life in the Cloud. Spotify Engineering.
Johnson, D. (2019). How to Find Friends on Spotify. Lifewire.
Kermarrec, A.-M., & Triantafillou, P. (2013). XL peer-to-peer pub/sub systems. ACM Computing Surveys, 46(2), 16:1–16:45.
Leenders, B. (2018, September 18). Scalable User Privacy. Spotify Engineering.
Lidwell, W., Holden, K., & Butler, J. (2010). Universal Principles of Design, Revised and Updated: 125 Ways to Enhance Usability, Influence Perception, Increase Appeal, Make Better Design Decisions. Rockport Publishers.
Lu, C. (2018, December 16). Spotify Mobile Case Study: Integrating Friend Activity. Medium.
Man, J. (2019, August 10). Re-imagining Spotify as a Social Media Platform. Medium.
Maravić, I. (2016, February 25). Spotify’s Event Delivery – The Road to the Cloud (Part I). Spotify Engineering.
Mishra, & Brown. (2015, January 9). Personalization at Spotify using Cassandra. Spotify Engineering.
Novet, J. (2017, May 18). Spotify just bought an AI startup to help it stay ahead of Apple Music. CNBC.
Nudd, T. (2009). Spotify Crunches User Data in Fun Ways for This New Global Outdoor Ad Campaign.
Reach for the Top: How Spotify Built Shortcuts in Just Six Months. (2020, April 15). Spotify Engineering.
White, R. (2015). “How the Internet Works.” Excerpt from How Computers Work (10th ed.). Que Publishing.
Seshadri, S. (n.d.). Spotify. Sanjana Seshadri. Retrieved December 8, 2020.
Setty, V., Kreitz, G., Vitenberg, R., van Steen, M., Urdaneta, G., & Gimåker, S. (2013). The hidden pub/sub of Spotify (Industry article). Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems – DEBS ’13, 231.
Spotify Case Study. (n.d.). Google Cloud. Retrieved December 9, 2020.
Vesterlund, M. (2015, June 23). Switching user database on a running system. Spotify Engineering.
What Is Pub/Sub? | Cloud Pub/Sub Documentation. (n.d.). Google Cloud. Retrieved December 7, 2020.

IBM Cloud as a Modular Design

The first “truly modular” computer design was IBM’s System/360, a broad, compatible family of computers introduced in 1964, so it’s no wonder that IBM continues to make use of the modular design framework to innovate and bring about new services (Baldwin and Clark, 2000, 13).

A module is a unit whose structural elements are powerfully connected among themselves and relatively weakly connected to elements in other units (Baldwin and Clark, 2000, 63). IBM cloud services are offered as public clouds, private clouds and hybrid clouds (combining access to public and private clouds). The IBM Cloud is an integration of cloud services, products and services in the form of applications to manage these clouds and workloads, virtual servers, networks, security integrated to avoid hacks and leaks and data storage on their physical servers that can be accessed by companies remotely.

This solution works well for businesses because of the modular design of these cloud computing services. Each of the functions, applications and technologies provided in a cloud service package is composed of structural elements that are powerfully connected among themselves and relatively weakly connected to elements in other units (Baldwin and Clark, 2000, 63). This allows IBM to continuously innovate and to add, remove and replace products and services while maintaining the larger package.

The cloud consists of at least six distinct overarching service modules: infrastructure, hardware, provisioning, management, integration and security. Each of these in turn has further modules, set up in a hierarchical structure (nesting elements in layers or levels), as in complex systems generally (Irvine, 2). Indeed, it is because of this layering that many modular designs work, for layering is the process of organizing information into related groupings in order to manage complexity and reinforce relationships in the information (Lidwell et al., 2010, 95).
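The mix-and-match property that follows from this modularity can be sketched minimally; the module names below are invented for illustration, not IBM’s actual catalog:

```python
class Module:
    """A unit with strong internal connections and a narrow external
    interface: other units only see what it 'provides'."""
    def __init__(self, name, provides):
        self.name = name
        self.provides = provides   # the narrow interface to other units

def compose(catalog, wanted_services):
    # Mix and match: assemble a custom environment from only the
    # modules whose services the client actually wants.
    return [m for m in catalog if m.provides in wanted_services]

catalog = [Module("storage", "data"),
           Module("security", "auth"),
           Module("provisioning", "vms")]
custom_cloud = compose(catalog, {"data", "auth"})
```

Because each module exposes only a narrow interface, one can be swapped or dropped without redesigning the rest, which is exactly the innovation advantage the modular framework promises.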

It is this modularity that allows users to perform versatile operations such as building their own customised cloud environment. Without this built-in modularity, separate applications could not be mixed and matched to do just as much as the user wants. There is no longer a need for “extensive control” of all elements of a cloud design (Baldwin and Clark, 9). “Standalone” cloud designs are out, and modular cloud services that offer modularity front stage as well as backstage are in, allowing more flexibility for engineers, designers and clients alike.


References –

Carliss Y. Baldwin and Kim B. Clark, Design Rules, Vol. 1: The Power of Modularity. Cambridge, MA: The MIT Press, 2000.

Lidwell, William, Kritina Holden, and Jill Butler. Universal Principles of Design. Revised. Beverly, MA: Rockport Publishers, 2010.

Martin Irvine, Introduction to Modularity and Abstraction Layers (Intro essay).

Universal Design Principles in the Macbook Keyboard

If Norman was correct, then the Touch Bar is yet another step away from good perceived affordances and toward cultural conventions as an approach to designing UIs. The keyboard has physical characteristics that give it certain user benefits in terms of universal design principles. The letter keys are in the center, an affordance to be manipulated by the index and middle fingers of both hands. The physical characteristics of keys influence the way they function and are likely to be used (Lidwell et al., 20). The keys provide feedback in the form of keystroke sounds that give us a sense of completion, signaling that the key was indeed struck well. Of course, the more direct feedback is the screen cursor moving along. Command keys such as ‘return’, ‘caps lock’, ‘shift’, etc. are placed at the sides and are larger than other keys, providing bigger surface areas to tap without looking. In fact, the standardized QWERTY layout lets practiced typists avoid constantly looking at the keyboard while they type. If the function of the UI is to be as transparent as it can be while facilitating our interaction with the computer, then the MacBook Touch Bar is definitely working in the opposite direction.

If we are to understand technologies and societies not as groupings of isolated, independent parts, but as a complex system of relationships, then we must look at technology as undergoing, what Brian Arthur calls, combinatorial evolution (Arthur, 7; Irvine, 1).

The Touch Bar replaces the top row of the keyboard, which contained the function keys F1–F12. Over the years these keys lost their original relevance as terminal keys and began serving other functions within the Mac ecosystem: F1 decreased brightness, F2 increased brightness, F3 launched “Mission Control” (a form of multi-window display on Macs), F4 launched the application menu, and so on. Though these keys relied on convention, i.e. one had to learn the commands and practice them over time to remember, most users who grew up using these keyboards were familiar with them. This meant there was no need to divert attention from the screen while typing. The Touch Bar, introduced on the MacBook Pro in 2016, evolves from the predictive-texting feature of the iPhone, a feature that is useful when the keyboard is seamlessly attached to the screen. But that is not the case with the MacBook, where the keyboard sits at a right angle to the screen. Where one could previously slide a finger over to the brightness keys without looking, one now has to look down at the various symbols, open the control and scroll along a display.

Though there are many constraints designed into the Touch Bar to help us focus (for example, other icons disappear when one of them is tapped), overall it runs into the same historical issue that Norman mentions in The Design of Everyday Things: “Each time a new technology comes along, new designers make the same horrible mistakes as their predecessors. Technologists are not noted for learning from the errors of the past. They look forward, not behind, so they repeat the same problems over and over again” (Norman, xv). Adding an OLED Touch Bar is just that.


References –

Brian Arthur, The Nature of Technology: What It Is and How It Evolves. Excerpts from chapters 1, 2, 4.

Donald A. Norman, The Design of Everyday Things. 2nd ed. New York, NY: Basic Books, 2002. Excerpts from Preface and Chap. 1.

Martin Irvine, Introduction to Design Thinking: Systems and Architectures

William Lidwell, Kritina Holden, and Jill Butler. Universal Principles of Design. Beverly, MA: Rockport Publishers, 2010. Excerpts.

The Internet is Alright

At first, I began getting worked up about ‘appification’ destroying the web. The web works because the open-ended Hypertext Transfer Protocol (HTTP) enables intercommunication between Internet servers (and services) and individual connected devices based on a client-server model (Irvine, 1). This means there has always been enormous potential for building client-side devices in various ways to enable access to the web. Yet any client-side device or software, be it the Apple II, the IBM PC series, or Microsoft’s machines, had to follow the design rules of the World Wide Web. The open, standards-based, device-independent architecture meant that it was distributed, modular, extensible, interoperable, and scalable (Irvine, 2). The networked “hypertext” system is now networked “hypermedia”, with images, videos and audio all available alongside textual information, each hyperlinked. The link-encoded displayable objects produce on-screen indicators in the graphical interface (colored or otherwise marked text strings, icons, navigational indicators) (Irvine, 3). This extensible feature of the web is what gave rise to applications. Since modularity and extensibility were features of the web, the market exploited them by building apps that tap into a portion of the internet to retrieve only the information the specific application requires.

For example, Tinder, a popular matchmaking app, uses the web to store information about its users on servers. When a user logs into Tinder and starts swiping, they are sending network-deliverable files to Tinder’s servers. The app uses the ISP (Internet Service Provider) to reach the nearest DNS node, part of a cooperatively run set of databases (White, 369). The DNS informs the app of the IP address, and the app then sends a request to the Tinder servers, which respond with images and text about people near your location, which you can swipe through. The images and text of people in your area are a very small subset of the vast information available around the web. Now imagine if that were your only access to the web, and anything and everything you wanted to learn about the world came via Tinder. This is “appification”, where applications replace general-purpose browsers as gateways to the web.
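The name-resolution step can be sketched with a toy lookup table standing in for the distributed DNS databases; the hostname and address below are made up:

```python
def resolve(hostname, dns_table):
    # Toy DNS lookup: before an app can request any content, it must
    # translate a hostname into an IP address.
    try:
        return dns_table[hostname]
    except KeyError:
        raise LookupError(f"no record for {hostname}")

dns_table = {"api.example-app.com": "203.0.113.7"}   # hypothetical record
ip = resolve("api.example-app.com", dns_table)
```

Real DNS distributes this table across a hierarchy of name servers, but the contract is the same: name in, address out.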

I began this essay by saying that I was getting worked up. I got worked up because it sounds like Facebook is doing exactly what I tried to show with my hypothetical “Tinder as the sole access to the web” example. Mobile computing has moved, as Zittrain puts it, “from a generative Internet that fosters innovation and disruption, to an appliancized network that incorporates some of the most powerful features of today’s Internet while…heightening its regulability” (Zittrain, 8). Appification is a set of blunt solutions to the problems of overwhelming information and security that were waiting to emerge from the open architecture of the web.

After getting worked up, I began searching for features in applications that compromise open-ended access for ease of use. I began with Google Maps, and I could find every feature I needed, with no difference between the desktop version and the mobile application. The same thing happened with Chrome as a browser on desktop versus the mobile app. And it kept happening as I moved across multiple apps. Apps had almost everything their desktop counterparts had, even though I was sure there had to be some sort of compromise!

I think that as long as we have a search engine (Google or otherwise) that indexes every website as extensively and regularly as the Googlebot (White, 374), the web will always be accessible. I think it will stay on as this vast ocean that we once voyaged across. But we voyaged only because we needed something from it. “People never cared about the Web vs. apps and devices,” commented Mark Walsh, co-founder of “They want free stuff, entertainment, and services when they want them, and on the device they have in front of them” (The Future of Apps and Web). Now that we have plenty right where we are, there is no need to go on long journeys across the web. But if we wish, the indexing search engine will always ensure that there is unlimited access, even if the search engine itself is ‘appified’. The Internet will live.



What does it mean to be on the Internet

The internet looks like a monolith when we engage with it, but many cultural and socio-technical factors play into its being viewed as such. When we think about what it actually is, we cannot point to anything in particular. The internet shows itself to us via our computer and mobile screens. But does this mean that the internet is behind our screens? It is so metaphorically, but what does that mean? GUIs provide us access to the internet, but we need to break the internet down into its multiple components and layers to understand what it means to say “we are on the internet”.

The internet began as the ARPANET, a project begun in the late 1960s to solve a network engineering problem: how to connect mutually incompatible computer systems in different locations with no single point where the network could be broken (Irvine, 4). The US government, via DARPA, funded research projects on solving problems in data communication methods between computers. Telecom companies stayed far away from such research because they had already invested so much in switched-network telecom infrastructure. What came out of these government-funded university research labs were the TCP/IP protocols and data packets as ways of sending messages. The Transmission Control Protocol and the Internet Protocol together form a method for sending and receiving data packets that works end to end regardless of the incompatibility of the computer systems on either side (Irvine, 6). Data packets, as opposed to the single, continuously held, closed-circuit connection used in phone calls, are bundles of data packed in smaller units (Irvine, 6). The packet’s framing itself carries no message content; much like a carrier wave in radio signals, it transports data called the “payload”. The packet’s other bits determine its path through the network (which computer to go to, based on the IP address), while the payload holds the message, which is received and converted for the GUI on the other end (White, 258-259). This is the symbolic-technical aspect of the information system we call the internet.

The physical system comprises the coaxial cables that run underground and carry these data packets to your house, along with the routers, modems, etc., all operating in accordance with the Internet Protocol. The internet is treated like a monolith, as if it were a totalized, unified entity with a force and agency of its own (Irvine, 2). But underneath we see that it is a symbolic-technical system that transfers syntax we convert into meanings using our GUIs, all grounded in physical technologies like high-speed phone lines, fiber-optic connections, and microwave links (White, 280).


References –

Martin Irvine, The Internet: Design Principles and Extensible Futures

Ron White, “How the Internet Works.” Excerpt from How Computers Work. 10th ed. Que Publishing, 2015.

Interaction Design guidelines and principles in YouTube on iOS

I have considered YouTube as a standard mobile app on an iPhone. It is interesting to note how the design theme that Apple requires shapes the way the app is designed. We start by unlocking the phone and looking at the home screen. The home screen is organised using standard-sized icons, and the YouTube icon is designed with two contrasting colours, red and white, which still stick to the basic guidelines developed to engage user attention by limiting intensity and colour (Shneiderman, 86). Here we notice iOS’s aesthetic requirement of rounded icons, which means there are no sharp edges. This is part of a larger consistent aesthetic that Apple would like its app designers to follow so that the entire device functions and appears on a similar note. Apple requires app developers to focus on legible text at every size, precise and lucid icons, fluid motion, and a crisp interface with transitions that provide a sense of depth (Themes – iOS Human Interface Guidelines).

The YouTube app, then, necessarily has all of these thematic features. The icon is a pixel-grid representation, and interaction with it is enabled by capacitive sensors hidden below the glass surface of the screen. These capacitors have their own electric fields, which respond to the tiny electric fields produced by our fingers (Irvine, 9). The grid of wires and electrodes below the top glass detects this and lets the software recognise where the touch was placed along the X-Y axes of the grid. The software then responds, via its code, by changing the entire pixel grid of the screen (launching the app).
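The grid detection described above can be sketched as a toy coordinate mapping (a simplification I am assuming for illustration; real capacitive controllers interpolate signals across many electrodes rather than snapping to one cell):

```python
def locate_touch(x: float, y: float, screen_w: float, screen_h: float,
                 cols: int, rows: int) -> tuple[int, int]:
    """Map a touch coordinate to a sensor-grid cell on the X-Y grid."""
    col = min(int(x / screen_w * cols), cols - 1)
    row = min(int(y / screen_h * rows), rows - 1)
    return col, row

# A tap near the centre of a 375x667-point screen on a 15x20 sensor grid:
print(locate_touch(187.5, 333.5, 375, 667, 15, 20))  # → (7, 10)
```

The software layer then checks which icon occupies that cell and launches the corresponding app.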

On opening the app, we see the same guidelines and principles at work that Shneiderman describes. Interaction design in YouTube follows the guidelines: there are separate designs to engage user attention and separate designs to facilitate data entry. The bottom icons we use to engage are small, the text is legible and large, and the smallest icons relate to settings, which we tamper with the least (Themes – iOS Human Interface Guidelines; Shneiderman, 86). All of this fits the guideline of making the most useful parts the most prominent ones. To facilitate data entry, YouTube provides a search bar where we can enter natural-language text. Since no coding is required and abundant filters are available, data entry is well facilitated (see image below).

When we look at the interaction design principles in place, we realise that YouTube is built for all three levels of users, which means the affordances it provides have to be intuitive and yet allow the expert regular user to engage with the app without getting bogged down by unnecessary content. One good approach the app has taken is that it uses visual cues to guide the user instead of spelling out instructions (a principle many other apps also abide by). Thus we see functions like “tap to play” that are intuitive to novice users, filters in search for intermediate users, and ways to access history and curate content for expert users (Shneiderman, 89) (see image below). Overall it is well designed, guided by the Apple aesthetic along with its adherence to basic interaction design guidelines, and its grounding in the principle of engaging all three user levels (Shneiderman, 89-90) makes it a well-designed app.


Ben Shneiderman, Catherine Plaisant, et al. Designing the User Interface: Strategies for Effective Human-Computer Interaction. 6th ed. Boston: Pearson, 2016.

Themes – iOS Human Interface Guidelines – Apple Developer. (n.d.). Retrieved October 28, 2020.

A Few Glimpses into the Design Evolution of Computers

I think it was the vision of an open, accessible computing device, available to anyone who would want to operate it, that was responsible for the beginning of the era of personal computing. The Memex, the proto-hypertext system conceived as a central, universal system for augmenting human knowledge, was imagined as a democratic pool of knowledge (Bush, 1945, p. 44).

So many design steps, developed in staggered fashion over the past 70 years, are responsible for the way the computer is currently assembled, both physically and digitally. Computers began as numerical and logic processors (Irvine, 2018, 7). But soon, in the 1950s, we find the first manipulations of bits that converted these number crunchers into cognitive-symbolic mediators. The first stage was alphanumeric symbols printed on scrolls of paper using binary representation. By the 1960s we had CRT screens, instruments from physics laboratories, repurposed to display a computer’s inputs and outputs using bit mapping: a type of memory organisation that enables pixmaps. Pixmaps allow two or more colours to be stored per pixel, which is what lets an image be displayed on the CRT.
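The pixmap idea can be illustrated with a toy example (a sketch of my own, not any historical display format): each pixel stores an index into a small colour table, and rendering is just a lookup.

```python
# A toy pixmap: each cell holds an index into a palette, which is how
# early bit-mapped displays encoded more than one colour per pixel.
palette = {0: "black", 1: "white", 2: "red"}

pixmap = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [0, 1, 1, 0],
]

# Rendering is just looking each index up in the colour table:
for row in pixmap:
    print(" ".join(palette[p] for p in row))
```

With only two palette entries this collapses to one bit per pixel; adding entries means more bits per pixel, which is exactly the memory-organisation trade-off bit mapping introduced.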

This was the birth of an interface for interacting directly with the digital, and soon there was a need for smarter “user interfaces”. Douglas Engelbart invented the first mouse using a small block of wood with two wheels on the bottom, set at right angles to each other, so that the corresponding selection on the screen could move across the X-Y axes (of course, the cursor had yet to be invented) (Moggridge, 2007, 27).

At this point, I think, the notion of augmenting human intellect as the defining paradigm of how computing evolves began to shake a little. “It is easy to understand the idea of going for the best, of catering to the expert user, and then providing a path to get there from a simple user interface designed for the beginner. In practice, however, this has proved to be the wrong way round, as it’s not easy to get something right for the beginner when your design is already controlled by something that is difficult to learn.” (Moggridge, 2007, 36). The ideal of maximally augmenting human intellect by catering to the expert user can be seen in one of the first proper assemblies of the modern computer. The NLS, or “oN-Line System”, was the first to make practical use of hypertext links, the mouse, video monitors, information organized by relevance, and other modern computing concepts (Moggridge, 2007, 33-37).

It was Larry Tesler who realized the value of participatory design, and that it was better to observe people interacting with your interface and make changes accordingly. He started performing what we today call usability tests. He is also responsible for the invention and proper integration of the “double-click”, “cut”, “paste”, and cursors (or refinement, in the case of cut-copy-paste).

All these functions, and many other such developments, are responsible for the way the computer now presents itself. For example, there may still be a key called “Insert”, which I last saw on my late-2004 Windows computer. This key was once used to insert characters or approve commands. With the arrival of cut and paste, the Insert key kept falling out of relevance and was repurposed for a few other actions until it became obsolete. It no longer exists on the Mac keyboard.



Bush, Vannevar. “As We May Think.” Atlantic Monthly 176 (July 1945).

Moggridge, Bill. Designing Interactions. Cambridge, MA: The MIT Press, 2007. Excerpts from Chapters 1 and 2: The Designs for the “Desktop Computer” and the first PCs, pp. 17-68.

Irvine, Martin. 2018. Introduction to Symbolic-Cognitive Interfaces for Computer Systems: History of Design Principles (essay).

Irvine Martin. Computer Interface Design Concepts: Major Historical Developments (Original Documents), Compilation.


Around the Code

I had heard the term “source code” numerous times, but to really understand it I had to interact with it. Source code is the set of lines of code that make up a program. In practice, IDEs (Integrated Development Environments) are used to write source code, as they have more functions and better UIs than simple text editors. In fact, IDEs like Xcode even use colour conventions and indexing to make the coding process smoother.

Having an understanding of how coding works can help us deblackbox many of the cognitive symbolic technologies like the applications that run on iOS. By understanding the features and limitations, the affordances and constraints of Xcode, we can see why certain apps in the iOS ecosystem tend to be designed in the way they are.

Like natural languages, programming languages have a grammar too, i.e. syntax rules. They have functions as the equivalent of phrases: the most commonly used lines of code boxed into a single block. They have statements, variables, whitespace, and strings, all of which perform some function in translating actions from human language to machine language.
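As a minimal illustration of these grammatical parts (a hypothetical example of my own, not from any particular codebase), a Python function boxes a few statements into one reusable block:

```python
# A function bundles commonly used lines into a single named block,
# much like a reusable phrase in a natural language.
def greet(name: str) -> str:
    greeting = "Hello, " + name + "!"   # a statement using a variable and strings
    return greeting

print(greet("Ada"))  # → Hello, Ada!
```

Statements, variables, and strings each do their part, and the function name lets us invoke the whole “phrase” at once.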

When we go deeper, we see that any code is, at the basic level, translated into machine language, which is in bits: strings of 1s and 0s, the basic units of information. Numbers still lie at the center of programming languages, and working with numbers is still a basic skill one must learn to be proficient in Python. We can then see Python as part of a cognitive-symbolic continuum (Irvine, 2), with the ratiocinators and analytical engines that once crunched numbers underneath all the programming devices with advanced user interfaces.
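Python makes this number-and-bit substrate easy to see; its built-ins let us move between symbols, numbers, and bit strings:

```python
# Underneath every symbol is a number, and underneath every number, bits:
n = 77
print(bin(n))             # '0b1001101' — the number as a bit pattern
print(ord("M"))           # 77 — the character 'M' is stored as this number
print(chr(77))            # 'M' — and the number maps back to the symbol
print(int("1001101", 2))  # 77 — reading the bit string as a number again
```

A single keystroke thus travels the whole continuum: symbol to number to bits and back.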

I tried Python in Visual Studio Code, and it was interesting how even single lines could be run to see what results they give. It reminded me of the feedback expected of a good GUI, which lets us go about coding more easily without focusing on the syntax all the time.

While reading Evans (2011), I was particularly fascinated by the example he gives for colour. Since computers essentially run on bits, each 1 and 0 can mean yes or no to a variety of colour inputs. This means the computer has a finite range of colour options for a single pixel, and together these varied coloured pixels come together to deliver an image. But what about a painting? In physical form, colours are mixed to form interesting hues, and if we non-experts stand in front of a Van Gogh we cannot really tell the difference between a fake and an original. Yet an original “Starry Night” impacts us profoundly. Evans says that “The set of colors that can be distinguished by a typical human is finite; any finite set is countable, so we can map each distinguishable color to a unique bit sequence.” (Evans, 2011, 12). But does that mean a computer image of a painting that is indistinguishable from the painting has the same richness of colour? Does it have the same impact? I would like to leave you with this question.
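Evans’s mapping from distinguishable colours to bit sequences can be made concrete with standard 24-bit RGB (the specific colour value below is arbitrary, chosen only for illustration):

```python
# Any distinguishable colour can be assigned a finite bit sequence;
# standard 24-bit RGB does exactly this, 8 bits per channel.
def color_to_bits(r: int, g: int, b: int) -> str:
    """Pack an RGB colour into a 24-bit sequence."""
    return f"{r:08b}{g:08b}{b:08b}"

print(color_to_bits(30, 58, 138))  # an arbitrary deep blue, as 24 bits
print(2 ** 24)                     # 16777216 distinct colours in this encoding
```

Sixteen-odd million colours is finite and countable, which is precisely Evans’s point; whether it exhausts the painting is the question I leave open.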


References –

Martin Irvine, “Introduction to Cognitive Artefacts for Design Thinking” (seminar unit intro).

David Evans, Introduction to Computing: Explorations in Language, Logic, and Machines. 2011 edition.


Signal Transmission Theory of Information: A Subsystem for Meanings

Claude Shannon had an electrical engineering problem to solve, namely, “the fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” Though he uses the word “communication”, what he means is “communications system design” (Irvine, 4). Irvine draws the distinction for us between E-information and information, where the former refers to “the physical patterns of quantifiable units”, an engineering perspective, as opposed to the common understanding of information as knowledge, content, or meaning (Irvine, 5).

Shannon’s seminal paper “A Mathematical Theory of Communication” tackled this problem by understanding how to transmit signals from one place to another in a reliable way, such that the decoder at the other end receives the same unperturbed message as the encoder encoded.

The main components of his model are –

INFORMATION SOURCE – where the semantic meaning is composed

MESSAGE – the content to be transmitted

TRANSMITTER – the device that encodes the message and transmits the signal

TRANSMITTED SIGNAL – the encoded signal, the message converted into electromagnetic form

NOISE SOURCE – random noise or electromagnetic interference added in the channel

RECEIVED SIGNAL – the encoded signal as received, along with some noise

RECEIVER – the device that decodes the received signal back into the message

DESTINATION – where the message sent by the information source finally arrives


Shannon understood that the structure of information has nothing to do with its content, and that the simplest way to represent information potential is in bits: yes/no answers that generate 1s and 0s corresponding to on-off states in a circuit. That is, if the transmitter at one end can successfully switch certain circuits on and off at the other end via electromagnetic signals, that represents a successful way of encoding and decoding information with minimal loss (Denning & Bell, 472-3).

To this end he borrowed concepts like entropy, which serves as the measure of information; an entropy threshold then defines the boundary between reliable and unreliable channels of signal transmission (Denning & Bell, 473).
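Shannon’s entropy measure is easy to compute directly; a minimal sketch in Python (my own helper function, not from the readings) shows that a fair coin carries more information per toss than a biased one, because its outcomes are less predictable:

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit per toss for a fair coin
print(entropy([0.9, 0.1]))   # ≈ 0.469 bits for a heavily biased coin
```

The more predictable the source, the less information each symbol delivers; this quantity, not the meaning of the symbols, is what the channel must carry reliably.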

The signal transmission theory is useful when we consider it as a design intervention for the problem of signal coding. But the signal itself is a translation of a prior symbolic message. The engineering problem brackets out the prior conception of the message (in order to focus on its technical difficulties). But when we look at the entire process as a symbolic-cognitive technology, the chain roughly looks like this: cultural-social context – individual – intention – meaning – encoding – signal transmission – decoding – message reception – individual – interpretation and affect within a cultural-social context.
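The middle of that chain (encoding, noisy transmission, decoding) can be simulated in a few lines of Python (a toy sketch under my own assumptions; real systems add error-correcting codes so that noise can be detected and repaired):

```python
import random

# message → encode to bits → noisy channel → decode back to a message
def encode(msg: str) -> list[int]:
    """Turn each character into 8 bits."""
    return [int(b) for ch in msg.encode() for b in f"{ch:08b}"]

def channel(bits: list[int], flip_prob: float, rng: random.Random) -> list[int]:
    """Flip each bit with probability flip_prob (the noise source)."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(bits: list[int]) -> str:
    """Regroup bits into bytes and read them back as text."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int("".join(map(str, c)), 2) for c in chunks).decode(errors="replace")

rng = random.Random(42)
sent = "hello"
received = decode(channel(encode(sent), 0.02, rng))
print(sent, "->", received)  # with noise, the decoded message may differ
```

Everything the code touches is E-information, patterns of bits; the meaning of “hello” is composed before encoding and interpreted after decoding, outside the system entirely.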

References –

Shannon, Claude Elwood. “A Mathematical Theory of Communication.” Bell System Technical Journal 27 (4): 623–666. October 1948. DOI: 10.1002/j.1538-7305.1948.tb00917.x. hdl: 11858/00-001M-0000-002C-4314-2.

Martin Irvine, Introduction to the Technical Theory of Information as Designer Electronics

Peter Denning and Tim Bell, “The Information Paradox.” From American Scientist, 100, Nov-Dec. 2012.


What Does Tinder Afford?

Don Norman was very clear in his distinction between real affordances and perceived affordances. A real affordance is closer to Gibson’s original definition: “An affordance is an action possibility formed by the relationship between an agent and its environment (J. Gibson 1977; J. Gibson 1979) … An agent does not need to be aware of the afforded action, such as the affordance of opening a secret door.” A perceived affordance would be a doorknob popping out of an otherwise plain door. Of course, the door would no longer be a secret door then!

There is a fine line between convention and affordance, as it really depends on what is considered universal. Does a button afford being pressed, or is that a convention learned over 200 years? It is so transparent now that we no longer see it as learned. In practice this question matters a little less (in no way am I suggesting it should be dismissed entirely!): as long as we are sure buttons will prompt the user to “press” without any training, the button is a “perceived affordance”.

The touchscreen by itself affords everything from touching, tapping, scrolling, hitting, licking, and pressing to pushing the palm into it. These are real affordances; they do not necessarily have to make sense, but they are “action possibilities” (Gibson, 1979). The task, then, is to restrict some of these affordances (not constraints; we will talk about constraints later) by narrowing perceived affordances down to real, doable action translations instead of mere action possibilities. Let us consider Tinder. I chose Tinder because it is constrained in the number of actions that can be performed in it. On most days the only actions on this app are swipe left or swipe right. Texting in it (a later phase) may involve some more interaction with the app’s features, but we will consider only the basic actions performed by users on most days.

As an icon on the touchscreen, Tinder does not have a perceived affordance of “tapping”; rather, tapping is a learned convention, and the icon is visual feedback that advertises the affordance (Norman, 40). The icon is a real affordance, but there is no obvious action potential stored within the iconography. The feedback is the app launching and taking over the entire screen. On launching, we see all sorts of icons on the display, which are matters of cultural convention (see fig).

The icons (heart, X, star, back, thunderbolt) require training in cultural conventions, and then some context in how they are used on Tinder. Without performing actions and receiving feedback, one does not immediately grasp what actions are afforded. These conventions can be divided according to the various constraints on offer: physical, logical, and cultural (Norman, 40).

Physical constraints are the limited spaces on the screen where one can tap and expect feedback. Embossed icons or clearly defined buttons are good conventions that call out the command “click here”, or good action translators (Irvine, 5), suggesting that other places are not meant to be tapped. Another physical constraint is the otherwise white background, which offers no perceived action possibilities.

The intuitive swipe-left and swipe-right gestures on each photograph on Tinder are still a learned convention. Perhaps if the photos were presented as a stack (where we could see that more pictures exist behind), the notion of swiping left or right, as if flipping through flashcards, would have been a very good perceived affordance.

An easy-to-spot logical constraint is the photo that appears after the one you swiped: it demands a similar action from you. The logical constraint generates a clear affordance of what to do with the pictures underneath.

Since most GUIs are now touch-based, designers are primarily left to operate with cultural conventions as a way to achieve perceived affordances, but it would be great to see a perceived affordance arise without the use of cultural iconography.


Donald A. Norman, “Affordance, Conventions, and Design.” Interactions 6, no. 3 (May 1999): 38-43.

Gibson, J. J. (1986). The Ecological Approach to Visual Perception. Psychology Press.

Martin Irvine, “Introduction to Affordances and Interfaces.”

What Are Affordances in Web Design? (2014, July 15). Treehouse Blog.