Author Archives: Lili Zhai

Tracking the Path of Spotify Music: Design Principles and Technologies that Make Spotify Workable


Streaming technologies are becoming more popular and essential with the development of the Internet. Cheap, convenient, and fast, streaming music is gradually replacing physical music formats and players, such as vinyl records, CD players, and phonographs. Music streaming platforms not only provide users with a wide variety of songs without the limitations of time and space, but also construct a social network environment where users can share music. Leading companies in the industry include Spotify, Apple Music, Amazon Music, and Pandora. In addition, streaming music services offer mobile apps with rich functions, so users do not need to stay at a PC to enjoy music. The competition among companies makes consumers the biggest beneficiaries.

In this research paper, I use the case study of the Spotify app to investigate the question: “How does Spotify implement as many design principles as possible?” The paper is divided into three main parts: 1) the two main infrastructures that keep Spotify running, the proliferating data infrastructure and the audio and streaming infrastructure (Eriksson et al., 2017); 2) three ways Spotify works as an interface: between websites, between humans and devices, and between humans and the large computing system; 3) Spotify as a sociotechnical system. My goal is to visualize the series of unobservable, systematic actions triggered by “one click” of a button on Spotify's GUI.


Streaming (adj.): relating to or being the transfer of data (such as audio or video material) in a continuous stream especially for immediate processing or playback (Merriam-Webster)

  • How Does Music Streaming Work?

With streaming, “the client browser or plug-in can start displaying the data before the entire file has been transmitted”. During streaming, the audio file is broken into small packets and delivered over the network; a small metafile tells the player where to find the stream and how to play it, and a codec decodes the incoming data. Once the buffer has been filled with decoded data, the data is turned into sound and the computer immediately starts playing the music (White, 2015). “Each of the scores of available audio codecs are specialized to work with particular audio file formats, such as mp3, ogg, etc. As the buffer fills up, the codec processes the file through a digital-to-analog converter, turning file data into music, and while the server continues to send the rest of the file” (White, 2015).

Unlike downloaded music, which is stored permanently on the device's hard drive and can be played without an internet connection once stored, streaming music works over Wi-Fi or mobile data, and users do not “own” the music. As long as a steady stream of packets is delivered to the computer, the user will hear the music without interruption.

  • Digitization of Sound Wave and Compression of Audio Files

The digitization of sound waves follows Shannon's model of information transmission. Digitization produces a digital analog of sound: the sound is recorded as a sequence of discrete events and encoded in the binary language of computers. Digitization involves two main steps: sampling, the measurement of air pressure amplitude at equally spaced moments in time, and quantization, the “translation” of each sample's amplitude into an integer in binary form. Sample rate refers to the number of samples taken per second (samples/s) and is measured in hertz (Hz). Bit depth refers to the number of bits used per sample. The physical process of measuring the changing air pressure amplitude over time can be modeled by the mathematical process of evaluating a sine function at particular points along the horizontal axis (Digital Sound & Music).
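To make sampling and quantization concrete, here is a small Python sketch. It is only an illustration: the 1 kHz tone, the 8 kHz sample rate, and the 16-bit depth are example values I chose, and real music is sampled much faster (CD audio uses 44.1 kHz).

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Sampling: evaluate a sine wave at equally spaced moments in time."""
    n_samples = round(sample_rate_hz * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize(samples, bit_depth):
    """Quantization: map each amplitude in [-1.0, 1.0] to a signed integer."""
    max_code = 2 ** (bit_depth - 1) - 1   # 32767 for 16-bit audio
    return [round(s * max_code) for s in samples]

# A 1 kHz tone sampled at 8 kHz for 1 ms gives 8 samples.
samples = sample_sine(freq_hz=1000, sample_rate_hz=8000, duration_s=0.001)
codes = quantize(samples, bit_depth=16)
print(codes)                      # [0, 23170, 32767, 23170, 0, -23170, -32767, -23170]
print(format(codes[1], "016b"))   # 0101101010000010, the 16-bit binary form of 23170
```

Raising the sample rate adds more points per second along the curve, while raising the bit depth makes each point's integer finer-grained.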

To reduce file size and stream more efficiently over the network, digital audio is often compressed, for example by lowering its sample rate, bit depth, or bit rate. Compression can be lossless or lossy, depending on what happens to audio quality. Lossless compression reduces the file size while preserving the quality of the audio; what is more, “the file can be restored back to its original state; lossy compression permanently removes data (by reducing original bit depth)” (BBC). Lossy compression therefore inevitably degrades the quality of the music to some degree.
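A quick calculation shows why compression matters. The sketch below assumes CD-quality audio (44.1 kHz, 16-bit, stereo) as the uncompressed baseline and compares it with the 160 and 320 kbps streaming rates mentioned later in this post; it is simple arithmetic, not a model of any particular codec.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Uncompressed (PCM) bit rate = sample rate x bit depth x channels, in kbps."""
    return sample_rate_hz * bit_depth * channels / 1000

cd_quality = pcm_bitrate_kbps(44_100, 16, 2)   # 44.1 kHz, 16-bit, stereo
print(cd_quality)                               # 1411.2

for stream_kbps in (160, 320):
    ratio = cd_quality / stream_kbps
    print(f"{stream_kbps} kbps is about {ratio:.1f}x smaller than CD-quality PCM")
```

A 320 kbps stream is roughly 4.4 times smaller than the uncompressed signal, and a 160 kbps stream roughly 8.8 times smaller.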

Two Main Infrastructures of Spotify:

When digital audio is packaged into files and “become[s] music at Spotify, aggregation of data occurs on, at, and via many computational layers” (Eriksson et al., 2017). The platformization of Spotify hides the complex data exchange process behind a well-designed GUI, so that its design principles are invisible to consumers. The two main infrastructures, the proliferating data infrastructure and the audio and streaming infrastructure, together with several detailed design principles, are what make Spotify workable.

  • Proliferating Data Infrastructure

Following an “end-to-end server and client model”, Spotify's proliferating data infrastructure, as described by Eriksson et al., is the foundation of its service. This infrastructure enables communication between Spotify's servers and its clients' devices: at one end lie Spotify's servers and data centers, which “send out music files and fetch back user data” (Eriksson et al., 2019); at the other end lie users' playback devices, such as a PC, a smartphone, or a tablet. Spotify's back end runs on Google Cloud Platform, a public cloud service, meaning that it “operates based on Google's cloud compute, storage, and networking services, as well as its data services, such as Pub/Sub, Dataflow, Big Query, and Dataproc” (Data Center Knowledge).

Data is the most important part of this infrastructure, and Spotify builds dedicated channels for exchanging it. Eriksson et al. describe the transmission of information and data between users and Spotify's servers as an “event delivery system”. Most events produced within Spotify are generated by its users. They define user data as sets of “structured events that are caused at some point in time as a reaction to some predefined activity” (Eriksson et al., 2019). When a user performs an action, for instance clicking the play button in the app, a piece of information (“an event”) signaling “playing the music” is sent from the user's device to the server via the internet.
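As a sketch of what such an event might look like, the Python snippet below packages a “play” action as a structured record and serializes it as JSON for sending to a server. The field names (`event_type`, `user_id`, `track_id`, `timestamp_ms`) are hypothetical; Spotify's real event schemas are not described in the sources above. The track ID is borrowed from the URI quoted later in this post.

```python
import json
import time
from dataclasses import dataclass, asdict, field

@dataclass
class PlaybackEvent:
    """A structured event caused by a predefined user activity (hypothetical schema)."""
    event_type: str          # e.g. "play", "pause", "skip"
    user_id: str
    track_id: str
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))

def serialize(event: PlaybackEvent) -> str:
    """Encode the event as JSON, ready to be sent from the client to the server."""
    return json.dumps(asdict(event), sort_keys=True)

event = PlaybackEvent("play", user_id="user-123", track_id="0WVAQaxrT0wsGEG4BCVSn2",
                      timestamp_ms=1_700_000_000_000)
print(serialize(event))
```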

  • Audio and Streaming Infrastructure

The second infrastructure is the audio and streaming infrastructure. Spotify balances file size against internet speed very well: its streaming service has very low latency, meaning that the delay between a user requesting a song and hearing it is almost imperceptible. Spotify's low-latency streaming owes much to the Ogg Vorbis format, an open-source lossy audio compression method “that offers roughly the same sound quality as mp3, but with a much smaller files size”. Note that music files are not permanently stored on the destination device. Instead, the buffer stores a few seconds of sound before sending it to the speaker, so Spotify's client “fetches the first part of a song from its infrastructural back end and starts playing a track as soon as sufficient data has been buffered as to make stutter unlikely to occur” (Eriksson et al., 2017). The small file size enables Spotify's servers to send the file quickly, so music plays almost instantly after the user clicks the play button.
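The trade-off in that quote, starting quickly versus avoiding stutter, can be simulated in a few lines. This is a toy model under stated assumptions: time moves in one-second steps, the player consumes 20 kB per step (roughly a 160 kbps stream), and the arrival trace is made up. It is not Spotify's client logic.

```python
def simulate_playback(arrivals_kb, start_threshold_kb, playback_kb_per_step):
    """Toy model of a streaming buffer.

    arrivals_kb: kilobytes received in each one-second step (a made-up trace).
    Playback starts once the buffer holds start_threshold_kb; after that the
    player consumes playback_kb_per_step each step, or stalls if it cannot.
    Returns (step when playback started, number of stalled steps).
    """
    buffered = 0
    start_step = None
    stalls = 0
    for step, kb in enumerate(arrivals_kb):
        buffered += kb
        if start_step is None and buffered >= start_threshold_kb:
            start_step = step                     # enough data: start playing
        if start_step is not None:
            if buffered >= playback_kb_per_step:
                buffered -= playback_kb_per_step  # one second of audio is played
            else:
                stalls += 1                       # buffer ran dry: audible stutter
    return start_step, stalls

trace = [20, 5, 5, 40, 40, 40]                # kB arriving each second
print(simulate_playback(trace, 0, 20))        # (0, 2): instant start, two stalls
print(simulate_playback(trace, 40, 20))       # (3, 0): later start, no stalls
```

With the same bumpy network, waiting for a larger buffer delays the start but removes the stutter, which is the balance the quote describes.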

Spotify offers several bitrates, which correspond to different streaming-quality settings.

This design enables users to adjust the music quality according to their needs, or to “turn on the automatic quality streaming” so that the app automatically detects the best bitrate based on the device's internet connection. This is especially convenient for mobile users, who do not need to worry about suddenly running out of mobile data by accidentally selecting the “extreme” quality setting.
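Automatic quality can be sketched as choosing the highest bitrate that fits the measured bandwidth with some headroom. The quality ladder and the 80% safety margin below are assumptions for illustration (only 160 and 320 kbps appear elsewhere in this post); Spotify's actual selection rules are not documented here.

```python
# Hypothetical quality ladder in kbps; only 160 and 320 appear in this post.
QUALITY_LADDER_KBPS = (24, 96, 160, 320)

def choose_bitrate(measured_kbps, ladder=QUALITY_LADDER_KBPS, headroom=0.8):
    """Pick the highest bitrate that fits within a safety margin of the bandwidth."""
    budget = measured_kbps * headroom
    fitting = [b for b in ladder if b <= budget]
    return max(fitting) if fitting else min(ladder)   # fall back to the lowest rate

print(choose_bitrate(2000))   # 320 on a fast Wi-Fi connection
print(choose_bitrate(250))    # 160 on a weaker mobile connection
print(choose_bitrate(20))     # 24: nothing fits, so use the lowest rate
```

The headroom leaves spare bandwidth so that small drops in connection speed do not immediately drain the buffer.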

Sum Up: How Does Spotify Track Streams?

The scale of data that passes through Spotify is enormous: in 2016, the company “handled more than thirty-eight terabytes of incoming data per day,” while permanently storing more than “70 petabytes of…data about songs, playlists, etc.” (Sarrafi, 2016).
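To get a feel for that number, the arithmetic below converts 38 terabytes per day into an average rate. It assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over the day, which real traffic is not, so peak rates would be higher.

```python
TB = 10**12                    # decimal terabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 38 * TB
avg_mb_per_s = daily_bytes / SECONDS_PER_DAY / 10**6
avg_gbit_per_s = avg_mb_per_s * 8 / 1000

print(f"about {avg_mb_per_s:.0f} MB/s, or {avg_gbit_per_s:.1f} Gbit/s, on average")
```

That is roughly 440 megabytes of incoming data every second, around the clock.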

What happens when the user clicks the “play” button? How does Spotify deal with such a large scale of data? How can Spotify make sure the data are transmitted accurately, stay on the right path, and reach the correct destination? De-blackboxing the design of Spotify comes down to tracking the path of its data files.

To deliver music worldwide, Spotify applied SIR (SDN Internet Router). To recap, “every computer on the Internet has an IP, that IP belongs to an IP network and that IP network belongs to an organization” (Spotify Labs). So Spotify can tell from a user's IP address which network, and which organization, the user is connected through. First, Spotify has two transit providers to make sure it can reach all clients. Transit providers are companies that own very large networks and allow other organizations to connect to their networks for a fee so they can reach the rest of the world. Second, Spotify uses Content Delivery Networks (CDNs), extremely well-connected networks, to reach faraway users and help with the bandwidth required to send them music, so that users don't have to wait for their bits to travel all over the world. Spotify has physical data centers in London, Stockholm, Ashburn (VA), and San Jose (CA). In addition to Google Cloud Platform, Spotify also uses at least five internet exchange points (IXPs), located in Frankfurt (DE-CIX), Stockholm (Netnod), Amsterdam (AMS-IX), London (LINX), and Ashburn (EQIX-ASH). The service is also connected directly to some subscriber networks (broadband or mobile providers) to speed up delivery and shorten the distance to its users (Dbarrosop, 2016).

Spotify splits traffic between data centers. When a Spotify client connects to the service, a combination of techniques is used to make sure that the connection is made to the best possible data center. Also, when Spotify connects to other organizations' networks, it needs to know which of those connections are suitable for reaching the connecting client (Spotify Labs). By applying SIR, Spotify can monitor available paths, choose the best one based on real-time metrics, and thus provide clients with better and more reliable service.
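The idea of picking a path from real-time metrics can be illustrated with a toy selection rule. Everything here is hypothetical: the path names, the metrics (latency and link utilization), and the rule itself (skip paths that do not reach the client's network or are nearly full, then take the lowest latency). It shows the general idea and is not SIR's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str              # hypothetical: a transit provider, IXP peer, etc.
    latency_ms: float      # real-time metric: current round-trip latency
    utilization: float     # real-time metric: fraction of link capacity in use
    reaches_client: bool   # does this connection reach the client's IP network?

def best_path(paths, max_utilization=0.9):
    """Keep paths that reach the client and are not congested; pick the fastest."""
    usable = [p for p in paths
              if p.reaches_client and p.utilization < max_utilization]
    if not usable:
        raise ValueError("no usable path to this client network")
    return min(usable, key=lambda p: p.latency_ms)

paths = [
    Path("transit-A", latency_ms=40, utilization=0.50, reaches_client=True),
    Path("ixp-peer", latency_ms=12, utilization=0.95, reaches_client=True),
    Path("transit-B", latency_ms=25, utilization=0.60, reaches_client=True),
    Path("cdn-edge", latency_ms=5, utilization=0.10, reaches_client=False),
]
print(best_path(paths).name)   # transit-B: the peer is congested, the CDN edge can't reach
```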

As Nicole Starosielski claims, “a simple ‘click’ on a computer commonly activates vast infrastructures whereby information is pushed through router, local internet networks, IXPs, long-haul backbone systems, coastal cable stations, undersea cables and data warehouses at the speed of light” (Eriksson et al., 2017).

Detailed Design Principles that Enable Spotify’s Different Features

  • Spotify as Interface for Agencies

  1. Interface Between Different Websites

Linked Data and Data Integration

Spotify links and combines data from different sources, enabling two of its key features: collaboration with other companies and the playlist function. In this section, I focus on the former; the latter will be discussed later.

Spotify operates data integrations with many companies. The most visible one in the app's interface is the “infrastructural tie-in” with Facebook: Spotify merges its login system with Facebook's, so that users can sign up with either their email or their Facebook account. By linking their accounts to Facebook, users can “display their Facebook name, picture, and find their friends easily on Spotify” (Spotify). They can also share their playlists with friends and see what their friends are listening to. Two huge databases are connected by a user's simple click on “sign up with Facebook”, and the collaboration is a win-win deal for both companies.


Interoperability is “the ability of different information systems, devices and applications (‘systems’) to access, exchange, integrate and cooperatively use data in a coordinated manner, within and across organizational, regional and national boundaries.” Sharing music and personal profiles is one form of Spotify's interoperability. Spotify users can share music via multiple platforms such as “Skype, Tumblr, Twitter, Telegram, etc.” When users click one of these sharing options, the link directs them to the related website, which triggers the information integration process. They can also share songs via a Spotify Uniform Resource Identifier (URI). A URI is convenient because it takes users directly to the Spotify application without going through a web page first, whereas an HTTP song link directs users to a web page. For example, the URI of one song is “spotify:track:0WVAQaxrT0wsGEG4BCVSn2?context=spotify%3Aplaylist%3A37i9dQZF1DX0BcQWzuB7ZO”. Most interestingly, users can embed Spotify songs and playlists in their personal websites by copying the embed code. Copying and clicking a link inside the application is the analog of opening a new window in a web browser.
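The parts of a Spotify URI can be pulled apart with Python's standard library. The sketch below assumes the URI follows the `spotify:<type>:<id>` pattern of the example in the paragraph above and that the matching web link has the form `https://open.spotify.com/<type>/<id>`; both are inferred from Spotify's share links rather than from an official specification.

```python
from urllib.parse import parse_qs

def parse_spotify_uri(uri):
    """Split 'spotify:<type>:<id>?<query>' into (type, id, query parameters)."""
    body, _, query = uri.partition("?")
    scheme, kind, item_id = body.split(":", 2)
    if scheme != "spotify":
        raise ValueError(f"not a Spotify URI: {uri}")
    # parse_qs also decodes percent-escapes such as %3A -> ":"
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return kind, item_id, params

def to_web_link(kind, item_id):
    """Assumed HTTP equivalent, which opens in a web browser instead of the app."""
    return f"https://open.spotify.com/{kind}/{item_id}"

uri = ("spotify:track:0WVAQaxrT0wsGEG4BCVSn2"
       "?context=spotify%3Aplaylist%3A37i9dQZF1DX0BcQWzuB7ZO")
kind, item_id, params = parse_spotify_uri(uri)
print(kind, item_id)            # track 0WVAQaxrT0wsGEG4BCVSn2
print(params["context"])        # spotify:playlist:37i9dQZF1DX0BcQWzuB7ZO
print(to_web_link(kind, item_id))
```

The `context` parameter records the playlist the song was shared from, which is how one link carries data about both a track and a playlist.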

In sum, data integration and interoperability are inseparable: it is data integration that enables interoperability between Spotify and different websites. The sharing of databases adds many “social” functions, and thus affordances, to Spotify.

2. Interface Between Users and Devices

Viewed offline, Spotify serves as an interface between users and devices, a product of human-computer interaction. This part continues from the moment the computer finishes “communicating” with Spotify's servers. After the packets arrive at the computer, the system decodes them and sends the result to a buffer, then to the speaker. Taking the desktop application as an example, Spotify's GUI enables humans to interact with the computer through sight, hearing, and touch. It serves both as a controlling interface through which users send commands and as a checking interface through which users monitor how well the computer carries them out. Although it does not invent any new function for computers, it helps users “develop and assemble” the computer's different functions. The action of clicking the play button (through the touchpad, by touch) triggers the RAM, the buffer, the speaker, the monitor, and so on. For example, when the computer receives the user's command and reacts to it, the interface shows that the song has started to play in three visual ways on the monitor: 1. the play button turns into a pause button; 2. the progress bar starts moving and the remaining time of the song changes; 3. a speaker icon appears on the album cover. All of these, accompanied by the sound of the music (auditory), are signs that the human is interacting with the computer and that the computer is playing the song they want.

3. Interface Between Users and the Underlying System


The playlist is the building block of Spotify. It connects different modules of Spotify, for example, one playlist to another or a singer to a music genre. Playlists are everywhere in Spotify's desktop app: the home page displays different types of playlists as square images; the main navigation bar on the left contains a list of playlists and an “add new playlist” function; and when the user searches for a song, an artist, or a keyword, the results screen is full of playlists. A playlist simulates an album, in which different songs are connected by the same singer. Spotify's playlists connect songs based on different elements, such as mood, weather, and genre. For example, the songs in the playlist named “Christmas Hits” are connected by the Christmas element. When users click into this playlist, they are likely to move on to another playlist related to Mariah Carey (who is famous for singing Christmas songs) or to “Christmas Classic” (because the two playlists share the element “Christmas”). Thus, playlists link the data of artists, songs, keywords, and so on.
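The way shared elements link playlists can be modeled as a small graph problem: two playlists are related if their tags overlap, and more overlap means a closer relation. The playlists and tags below are hypothetical, and this is not how Spotify actually computes related playlists.

```python
# Hypothetical playlists, each tagged with the elements that connect its songs.
PLAYLIST_TAGS = {
    "Christmas Hits": {"christmas", "pop"},
    "Christmas Classic": {"christmas", "oldies"},
    "Mariah Carey Essentials": {"christmas", "pop", "mariah carey"},
    "Rainy Day Mood": {"mood", "rain"},
}

def related_playlists(name, tags=PLAYLIST_TAGS):
    """Rank other playlists by how many elements they share with `name`."""
    shared = {other: len(tags[name] & other_tags)
              for other, other_tags in tags.items() if other != name}
    linked = [other for other, count in shared.items() if count > 0]
    return sorted(linked, key=lambda other: (-shared[other], other))

print(related_playlists("Christmas Hits"))
# ['Mariah Carey Essentials', 'Christmas Classic']
```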

As Eriksson et al. point out, the streaming metaphor itself implies a continuous flow of music, reminiscent of a never-ending playlist (Eriksson et al., 2019). Building playlists is not a new function invented by Spotify, since “early media players such as Winamp provided functionality for reaggregating tracks into customized playlists–an approach to music that built on previous assembling practice and technology” (Eriksson et al., 2019). Spotify uses playlists to guide users from one module to another. It allows users to mix and match their favorite tracks and repackage them into personalized playlists. It also enables Spotify to recommend and create playlists for users based on data from existing playlists. The creation and re-creation of playlists is a non-stop data aggregation process that keeps Spotify's different functions working.

Music Recommendation

Following the launch of “expert playlists for every mood and moment”, Spotify took a step toward “algorithmic and human curated recommendations”. It “not only delivers music, but also frames and shapes data” (Eriksson et al., 2017). Recommendation is also how Spotify enables a “deep and unique conversation” between users and its complicated technological system.

Spotify recommends music in various ways: weekly updated playlists such as “Discover Weekly” and “Release Radar”, as well as song and album recommendations such as “Top recommendations for you”, “Similar to…” and “Because you listened to…”.

Take “Discover Weekly” as an example. It is based on three recommendation models:

  1. Collaborative filtering: it makes predictions based on users' historical behavior on Spotify, such as “whether a user saved the track to their own playlist (see, playlist appears again!), or visited the artist's page after listening to a song” (Ciocca, 2017). Say user Pipi listens to tracks A, B, and C, and user Sisi listens to tracks B, C, and D. Because they share B and C, they are paired up as similar users, and Spotify recommends D to Pipi and A to Sisi, after making sure that neither of them has already listened to the track it recommends. The whole process is carried out with matrix math and Python libraries.
  2. Natural Language Processing (NLP) models: these analyze text. By searching the web for blog posts and other writing about artists and music, Spotify can figure out how people define and describe a specific song or musician. The Echo Nest (Whitman, 2012), for example, organizes this data into “cultural vectors” or “top terms”. Each artist has a set of top terms with associated weights, indicating how likely people are to describe the artist with each term. Spotify uses these terms to build a vector and determine how similar two songs are.
  3. Raw audio models: these analyze the characteristics of the track itself. Spotify uses convolutional neural networks to analyze similarities between songs' characteristics, such as “time signature, key, mode, tempo, and loudness” (Ciocca, 2017). It then recommends songs to users based on their listening history.

(“Cultural vectors” or “top terms,” as used by the Echo Nest)
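Two of these ideas can be shown in miniature. The sketch below (1) does user-based collaborative filtering for two listeners, Pipi (tracks A, B, C) and Sisi (tracks B, C, D), by pairing users who share at least two tracks, and (2) compares “top terms” vectors with cosine similarity, a common way to measure how alike two weighted term vectors are. The artists and term weights are made up, and Spotify's production system reportedly relies on large-scale matrix math (Ciocca, 2017) rather than this toy version.

```python
import math

# 1) User-based collaborative filtering on two toy listening histories.
histories = {
    "Pipi": {"A", "B", "C"},
    "Sisi": {"B", "C", "D"},
}

def recommend(user, histories, min_overlap=2):
    """Suggest tracks heard by similar users (sharing >= min_overlap tracks) but not by `user`."""
    mine = histories[user]
    suggestions = set()
    for other, theirs in histories.items():
        if other != user and len(mine & theirs) >= min_overlap:
            suggestions |= theirs - mine        # only tracks `user` has not heard
    return suggestions

# 2) Cosine similarity between sparse "top terms" vectors (term -> weight).
def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Made-up term weights for three hypothetical artists.
artist_x = {"indie": 0.9, "dreamy": 0.6, "guitar": 0.4}
artist_y = {"indie": 0.8, "dreamy": 0.5, "synth": 0.3}
artist_z = {"metal": 0.9, "loud": 0.7}

print(recommend("Pipi", histories), recommend("Sisi", histories))  # {'D'} {'A'}
print(round(cosine(artist_x, artist_y), 2))   # 0.89: similar descriptions
print(cosine(artist_x, artist_z))             # 0.0: no terms in common
```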

Through layers of analysis and calculation on different sources of data, Spotify makes personalized recommendations for users, which is also a process of database integration and of interoperation across different companies.

  • Spotify as a Sociotechnical System

Spotify only allows its users to download music within the application, which means that users cannot export their libraries (online or offline) outside of Spotify's ecosystem. Music libraries can only be synced between devices running the Spotify app. This seems like a good way for Spotify to hold on to its customers, right? But the whole story is not that simple. It has a lot to do with the music industry's licensing agreements and digital rights management (DRM).

A proprietary format is a format in which a particular software program accepts or outputs data (The Law Dictionary). Music streams from Spotify are encoded in the Ogg Vorbis format and protected by DRM encryption. Ogg Vorbis is an open-source, patent-free format for lossy compression, which means that Spotify's software developers do not need to pay license fees to support it in their application, nor do they need to publish their changes when they modify the code for their own needs (Mitchell, 2014). Spotify's audio files are encoded by its own engineers based on the original Ogg format, so its music cannot be decoded or decrypted by other software. To put it simply, users cannot play Spotify's music in other media players, let alone burn it to a CD.

Spotify is available in most of Europe and the Americas, Australia, New Zealand, and parts of Africa and Asia. Its content can be accessed through both the app and the web player: the app runs on iOS, Android, macOS, and Windows, as well as on several sound systems, TVs, and car stereo systems; the web player is supported by browsers including Chrome, Firefox, Edge, and Opera (Spotify). In other words, Spotify is coded to operate on each of those systems and devices.

The enactment of the Digital Performance Right in Sound Recordings Act (DPRA) and the Digital Millennium Copyright Act (DMCA) forces many music streaming providers to pay a performance royalty for sound recordings in addition to the musical work royalty (Richardson, 2014). Spotify “must pay licensing fees to copyright holders (record labels, such as Warner Brothers and SONY) for each song played, whether offered to a paying, or to a non-paying customer”, which makes serving free (freemium) customers expensive. That is also the reason why Spotify charges for premium services.

Also, if the user opens the “About Spotify” tab, they can see a series of logos for Universal Music Group, EMI, Warner Music Group, etc. under “Content provided by”, meaning that the music on Spotify comes from elsewhere.

Spotify “opens its Application Program Interface (API) to external developers, whose applications could retrieve data from the Spotify music catalog”. But developers have to follow a set of rules in order to develop their apps, such as agreeing to the terms of use and creating client IDs. Also, their apps have to “go through a rigorous Spotify approval process before being released on the platform” (Myers, 2011).
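For developers, the first steps look roughly like this: exchange the client ID and secret for an access token, then use the token to read from the catalog. The sketch below only builds the HTTP requests with Python's standard library; the endpoints (`https://accounts.spotify.com/api/token` with the client-credentials grant, and `https://api.spotify.com/v1/tracks/{id}`) follow Spotify's public Web API documentation, but the current docs should be checked before relying on them, and the credentials are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"
API_BASE = "https://api.spotify.com/v1"

def build_token_request(client_id, client_secret):
    """Client-credentials grant: trade the app's ID and secret for an access token."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {credentials}",
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def build_track_request(access_token, track_id):
    """Catalog lookup for a single track, authorized with the bearer token."""
    return urllib.request.Request(
        f"{API_BASE}/tracks/{track_id}",
        headers={"Authorization": f"Bearer {access_token}"},
    )

def fetch_access_token(client_id, client_secret):
    """Actually send the token request (needs network access and real credentials)."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]

req = build_token_request("my-client-id", "my-client-secret")   # placeholders
print(req.get_method(), req.full_url)
```

The client ID mentioned above is exactly what goes into the `Authorization` header here, which is how Spotify knows which registered app is asking for data.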

Its collaboration with other companies and websites is supported by its API. Spotify uses a proprietary format to protect artists and the content of its software.


From producing, compressing, and packaging to transmitting, decoding, and playing music files, Spotify does not invent anything new for the music streaming industry, but it does a great job of working as an interface for “connecting and coordinating”. It connects websites, people, and the physical functions of devices. It coordinates the division of labor among the software's functions, the passage of data packets, and the relationship between its service and the sociotechnical environment. With the purpose of connecting and coordinating, and following the rules of the larger sociotechnical background, Spotify applies as many design principles as possible to provide its users with the most pleasing, convenient, and personalized music streaming service. It is not the technology itself that makes everything a black box for consumers (Martin Irvine), but the way Spotify applies the design rules. Spotify is a product of human-computer interaction.

References and Citations:

  1. Brusilovsky, P. (2007). The Adaptive Web, p. 325. ISBN 978-3-540-72078-2.
  2. Richardson, J. H. (2014). The Spotify Paradox: How the Creation of a Compulsory License Scheme for Streaming On-Demand Music Services Can Save the Music Industry. SSRN Electronic Journal. doi: 10.2139/ssrn.2557709
  3. Eriksson, M., Fleischer, R., Johansson, A., Snickars, P., & Vonderau, P. (2019). Spotify Teardown: Inside the Black Box of Streaming Music. Cambridge, MA: The MIT Press.
  4. White, R. (2007). How Computers Work (9th ed.). Indianapolis, IN: Que Publishing. Excerpts.
  5. Denning, P. J., & Martell, C. H. (2015). Great Principles of Computing. Cambridge, MA: MIT Press. Review chapters 4, 5, 6. Excerpts in pdf.
  6. Whitman, B. (2012, December 11). How Music Recommendation Works – and Doesn't Work. Variogram by Brian Whitman.
  7. Merriam-Webster. (n.d.). Streaming. Merriam-Webster Dictionary.
  8. Digital Sound & Music. (2018, January 23). 5.1.2 Digitization.
  9. BBC Bitesize. (n.d.). Encoding Audio and Video – Revision 5 – GCSE Computer Science. BBC.
  10. Sverdlik, Y. (2016, March 7). How Much Is Spotify Paying for Google Cloud? Data Center Knowledge.
  11. Dbarrosop. (2016, January 28). SDN Internet Router – Part 1. Spotify Labs.
  12. Dbarrosop. (2016, February 2). SDN Internet Router – Part 2. Spotify Labs.
  13. Ciocca, S. (2018, April 5). How Does Spotify Know You So Well? Medium.

Sociotechnical Background of Spotify (Week11)

We treat the internet as a totalized and unified entity, but hidden behind its simplified GUI is a designed, complex sociotechnical system consisting of multiple layers and modules. Like the internet, Spotify, a music streaming platform, can also be understood as a complex sociotechnical system. According to the top-level view of the internet's major dependencies, the operation of internet services on our PCs and devices depends on the interconnection of many system modules. Streaming services, which belong to the digital media and “content” companies module, are part of the internet's services and are also supported by the internet.

Overall, Spotify has two types of licenses for its music: “Sound Recording License agreements, which cover the rights to a particular recording, and Musical Composition License Agreements, which cover the people who own the rights to the song” (CNBC). For the first category, Spotify has deals with the three big record labels: Universal Music Group, Sony Music Entertainment Group, and Warner Music Group. For the second category, there are two main types of licenses Spotify has to secure: performance rights, basically paid to song publishers when the song is streamed, and mechanical royalties, generally paid to songwriters when a song is reproduced. Performance licenses are managed mainly through two firms in the U.S., BMI and ASCAP. Mechanical rights for streaming services are governed in the U.S. by a government agency known as the Copyright Royalty Board.

The origin of Spotify has a lot to do with The Pirate Bay, a website that provides file-sharing links, from which Spotify borrowed many music-sharing technologies. Spotify is based on a client-server structure and follows an “end-to-end” design rule. It streams music from three sources: the local cache, peer-to-peer connections, and Spotify's servers. Spotify has its own servers, where the music data are stored. Before 2014, Spotify mainly relied on peer-to-peer delivery, a mode that does not require a dedicated server. “When the user plays a track from the desktop client, the audio stream comes from three sources: a cached file on the computer, one of Spotify's servers, or from other subscribers through P2P”. In this case, each Spotify user could be both a server that provides service and a client that enjoys it.
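The three sources can be sketched as a fallback chain. The order used here (local cache first, then peers, then Spotify's servers) is an assumption made for illustration, and the dictionaries stand in for real disk storage and network calls.

```python
def fetch_chunk(track_id, chunk_no, local_cache, peers, server):
    """Return (source, data) for one chunk, trying cache, then peers, then the server.

    local_cache, each peer's store and server are dicts keyed by (track_id, chunk_no);
    they stand in for the disk, P2P connections and Spotify's back end respectively.
    """
    key = (track_id, chunk_no)
    if key in local_cache:
        return "cache", local_cache[key]
    for peer_name, peer_store in peers.items():
        if key in peer_store:
            local_cache[key] = peer_store[key]   # keep a copy for next time
            return f"peer:{peer_name}", peer_store[key]
    data = server[key]                           # the server holds every chunk
    local_cache[key] = data
    return "server", data

cache = {("t1", 0): b"intro"}
peers = {"alice": {("t1", 1): b"verse"}}
server = {("t1", 0): b"intro", ("t1", 1): b"verse", ("t1", 2): b"chorus"}

for chunk in range(3):
    print(fetch_chunk("t1", chunk, cache, peers, server)[0])   # cache, peer:alice, server
```

Because every chunk that arrives is cached, a peer that has already played a track can in turn serve it to others, which is what lets each user act as both client and server.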

According to the video from Code.org, when the user requests a song, Spotify's server sends the song broken up into many packets. Each packet is then routed along the “cheapest” path, in terms of time, politics, and relationships, toward the client's IP address. When packets arrive, the Transmission Control Protocol (TCP) takes an inventory and sends back acknowledgments confirming which packets were received. If TCP finds that packets are missing, the song would be incomplete, so TCP reports the missing packets to the server, which then resends them. Once TCP confirms that all packets have arrived, the song can play in full.
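The inventory-and-resend idea can be illustrated with numbered packets. This is a simplified sketch: real TCP numbers bytes rather than whole packets and uses cumulative acknowledgments and timers, but the basic logic of reordering, spotting gaps, and re-requesting is the same.

```python
def packetize(data: bytes, size: int):
    """Split data into numbered packets: (sequence number, payload)."""
    return [(seq, data[offset:offset + size])
            for seq, offset in enumerate(range(0, len(data), size))]

def take_inventory(received, total):
    """Return the sequence numbers that never arrived (to be re-requested)."""
    seen = {seq for seq, _ in received}
    return [seq for seq in range(total) if seq not in seen]

def reassemble(received):
    """Put packets back in order and join their payloads."""
    return b"".join(payload for _, payload in sorted(received))

song = b"spotify-song-bytes"
packets = packetize(song, size=4)                       # 5 packets
arrived = [p for p in reversed(packets) if p[0] != 2]   # out of order, packet 2 lost
missing = take_inventory(arrived, len(packets))
print(missing)                                          # [2]
arrived += [packets[seq] for seq in missing]            # the server resends them
print(reassemble(arrived) == song)                      # True
```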

Spotify used to operate its own data centers and store data on physical servers. In 2016, Spotify announced that it would move much of its data from its own servers to Google Cloud Platform, while its music files would still be hosted on a storage service from Amazon, a dominant cloud hosting player. The move from physical servers to cloud servers makes the hosting of information more scalable and reliable.


“The Internet: Packets, Routing & Reliability”

Martin Irvine, The Internet: Design Principles and Extensible Futures

Denning and Martell, Great Principles of Computing, Chap. 11, “Networking.”

Spotify: From Web Page to App (Sorry I did the week12 reading and question)

  • How does Spotify play music?

Spotify is an online audio streaming platform. It is synced to Spotify's cloud servers, meaning that information is sent from the cloud. According to White, “streaming enables PC to play the file as soon as the first bytes arrive, instead of forcing the PC to wait for an entire multimedia file to finish downloading”. Spotify streams music directly from the server to the device. I notice that when I click a song, there is a one- to two-second delay while the buffer fills before the music plays. What happens is that when I select a song, the “conversation” between the computer and the Spotify server, carried by a “metafile”, instructs my computer how to play the song. The client contacts the server that provides the sound file and passes along information about the internet connection. The server then chooses a suitable quality of audio file based on the connection speed: it sends a higher-quality stream over a fast connection and a lower-quality one over a slow connection. The files are transmitted as a series of packets. Just as in the general information transmission process, when the packets arrive at the computer, the system decodes them and sends the result to a buffer. When the buffer is filled, the data are turned into music. Where many services use mp3, Spotify uses the Ogg Vorbis audio format. “On mobile user can choose what bit rate to stream, in increments up to 320kbps, which is handy especially if you're worried about using up too much mobile data. Desktop playback is at 160kbps or 320kbps for premium users” (cnet).
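The cnet note about mobile data is easy to check with arithmetic. The helper below assumes a constant bitrate and decimal megabytes; real streams vary a little.

```python
def mb_per_hour(bitrate_kbps):
    """Data used by one hour of streaming at a constant bitrate, in decimal MB."""
    kilobits = bitrate_kbps * 60 * 60
    return kilobits / 8 / 1000        # kilobits -> kilobytes -> megabytes

for kbps in (160, 320):
    print(f"{kbps} kbps -> {mb_per_hour(kbps):.0f} MB per hour")   # 72, then 144
```

Doubling the bitrate from 160 to 320 kbps doubles the data used, from about 72 MB to about 144 MB per hour of listening.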

  • Spotify Cache

Spotify also uses a cache to store music on the hard drive, which lets the app keep temporary copies of music for streaming. “When you press play, you hear the music immediately with few interruptions” (Spotify). But we cannot listen to cached music offline. If I want to clear my cache, Spotify allows me to do so without deleting the songs I have downloaded. Is it because the cache is in RAM, which is temporary memory, while downloaded songs are stored permanently on the device's storage? (I am not sure how it works.)
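One way to picture the difference is as two separate stores managed by different rules. The toy model below is an assumption, not Spotify's implementation: streamed tracks go into a size-limited cache that evicts the least recently used track, while downloads sit in a separate store that “clear cache” never touches. Both could live on the same disk; in this model what differs is the policy, not RAM versus hard drive.

```python
from collections import OrderedDict

class TrackStore:
    """Toy model: an LRU cache of streamed tracks plus a separate set of downloads."""

    def __init__(self, cache_limit):
        self.cache_limit = cache_limit
        self.cache = OrderedDict()   # track_id -> audio bytes, least recently used first
        self.downloads = {}          # track_id -> audio bytes, kept until the user removes them

    def play(self, track_id, fetch):
        if track_id in self.downloads:
            return self.downloads[track_id]
        if track_id in self.cache:
            self.cache.move_to_end(track_id)      # mark as recently used
            return self.cache[track_id]
        data = fetch(track_id)                    # stream it from the network
        self.cache[track_id] = data
        if len(self.cache) > self.cache_limit:
            self.cache.popitem(last=False)        # evict the least recently used track
        return data

    def download(self, track_id, fetch):
        self.downloads[track_id] = fetch(track_id)

    def clear_cache(self):
        self.cache.clear()                        # downloads are untouched

store = TrackStore(cache_limit=2)
fetch = lambda track_id: track_id.encode()        # stands in for a network request
for t in ("a", "b", "c"):
    store.play(t, fetch)                          # "a" is evicted when "c" arrives
store.download("d", fetch)
store.clear_cache()
print(list(store.cache), list(store.downloads))   # [] ['d']
```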

  • Spotify Connect

Spotify can be accessed via desktop software, mobile and tablet apps, and a web player. Spotify Connect works over Wi-Fi: “It seeks out compatible devices that are connected to the same wi-fi network and links them together to wirelessly to stream music.” After logging into Spotify, we can listen to music on multiple devices (Spotify Premium only), such as a smartphone, a PC, and home audio systems. We can also switch between compatible devices without pausing the music. For example, if I am playing music on the Spotify desktop app, I can use my Spotify mobile app to control the progress bar on my desktop. I can also “mute” the desktop player and let my mobile phone play the rest of the song. All I need to do is choose the device at the bottom of the screen. Spotify Connect works better than Bluetooth pairing, because the delay when swapping devices is so tiny that we cannot actually discern it. What is more, when I switch between devices, the volume is automatically adjusted.

(what appears on my desktop player)

(what appears on my mobile app player)
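The hand-off between devices can be pictured with a small state model. This is a hypothetical sketch, not Spotify’s protocol: it assumes one playback state per account (track, position, volume, active device) that every logged-in device reads and updates, which would explain why the phone can move the desktop’s progress bar and why the song continues at the same spot on another device. `PlaybackSession` and its fields are illustrative names, and per-device volume memory is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PlaybackSession:
    """One account's playback state, kept in a single place (assumed design)."""
    track: str
    position_s: float = 0.0
    volume: int = 50
    active_device: str = ""
    device_volumes: dict = field(default_factory=dict)  # remembered volume per device

    def seek(self, position_s):
        """Any logged-in device, e.g. the phone, can move the progress bar."""
        self.position_s = position_s

    def transfer(self, device):
        """Move playback to another device; track and position carry over."""
        self.device_volumes[self.active_device] = self.volume
        self.active_device = device
        self.volume = self.device_volumes.get(device, self.volume)


session = PlaybackSession("Song", active_device="desktop", volume=70,
                          device_volumes={"phone": 40})
session.seek(42.0)            # controlled from the phone while the desktop plays
session.transfer("phone")     # the phone continues the same song at 42 s
print(session.active_device, session.position_s, session.volume)  # phone 42.0 40
```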

  • Something interesting I found about the webpage version of Spotify:

When I type something in the search bar, the URL at the top of the screen changes. When I typed “Shawn”, it appeared after “search/”, and songs related to Shawn Mendes popped up. But I am not sure what “open” stands for. Also, each song, artist, and album has its own link. For example, when I copy the links of a song, an artist, and an album into a Word document, they appear like this: (song’s link) (artist’s link) (album’s link)

However, I cannot find any information about them by simply reading the link.
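The structure noticed above can be shown by splitting such a link into its parts. The sketch assumes web-player links of the form https://open.spotify.com/<kind>/<identifier> (for example .../search/Shawn); the track ID in the usage line is a made-up placeholder. It also illustrates why reading a link reveals little: apart from the kind of page, the path holds only an opaque identifier, and “open” is simply part of the host name open.spotify.com, which the function does not interpret.

```python
from urllib.parse import urlparse, unquote


def describe_spotify_link(url):
    """Split a web-player link into (kind, identifier).

    Assumes the path looks like /<kind>/<identifier>; any ?query part is ignored.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"unexpected link format: {url}")
    return parts[0], unquote(parts[1])


print(describe_spotify_link("https://open.spotify.com/search/Shawn"))
# ('search', 'Shawn')
# Hypothetical 22-character track ID, used only as a placeholder:
print(describe_spotify_link("https://open.spotify.com/track/0123456789abcdefghijkl?si=x"))
# ('track', '0123456789abcdefghijkl')
```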




Interactive interface and touch screen

Touch screens play a crucial role in the success of interactive surfaces. We touch them and they respond, and we take such interaction for granted. Touch screens, which are based on the x/y coordinate pixel map of the screen, can be grouped into two types: capacitive screens, which react to changes in the electrical charge sensed by a grid of invisible wires under the screen, and resistive screens, which react to pressure on the screen. Resistive screens are used in some old Nokia phones, the Nintendo DS Lite (a hand-held video game console), the electronic signature pads of delivery companies, car screens, etc. I find them slow and insensitive. Capacitive screens, on the other hand, are used in the iPhone, the navigation kiosks in shopping malls, etc. Although they react quickly and accurately to our touch, they do not respond to objects that do not conduct charge the way our fingers do. When water drops on the screen, or when our hands are wet, the screen becomes slow and less sensitive. With a capacitive screen, we deliver our input by means of gestures. The grid of wires senses the change in the electrostatic field of the touched area, triggered by our “input”, and transmits it to the microcontroller. The microcontroller then functions as a translator, converting the location into information for the inner working parts of the device.
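The step from wire grid to touch location can be sketched as follows. This is a simplified illustration, not any real controller’s firmware: it assumes the controller receives a small grid of capacitance changes (one number per wire crossing) and reports the weighted centre of the cells above a hypothetical threshold as the touch point.

```python
def locate_touch(deltas, threshold=30):
    """Return the (x, y) grid position of a touch, or None if nothing is touched.

    deltas[y][x] is the change in capacitance measured where wire row y
    crosses wire column x. Cells above the threshold are averaged, weighted
    by their strength, so a finger covering two cells lands between them.
    """
    total = sx = sy = 0.0
    for y, row in enumerate(deltas):
        for x, d in enumerate(row):
            if d >= threshold:          # small values are treated as noise
                total += d
                sx += x * d
                sy += y * d
    if total == 0:
        return None
    return (sx / total, sy / total)


grid = [
    [0, 2, 1, 0],
    [1, 50, 50, 3],
    [0, 4, 2, 0],
]
print(locate_touch(grid))  # (1.5, 1.0): between columns 1 and 2 on row 1
```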

The pixel grid of the touch screen works in two directions, for representation and for interaction. The location of an icon inside the pixel grid serves as an index for our action: where we can find the app, where we should point while using the app, the direction of our gesture, the meaning of our gesture, etc. The same principle of mapping gestures to locations applies to video apps such as YouTube. When you are playing a video, a double-tap on the right of the screen means fast-forward and a double-tap on the left means rewind; swiping from right to left jumps to the next video, and vice versa; scrolling up and down controls the volume. All of these gestures are monitored inside the grid. The pixel grid guides both users and the software: it provides an index for the user’s actions, transmits the user’s commands to the software, and helps the device identify the task.
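A toy version of the gesture mapping just described might look like this. It is an illustrative sketch, not YouTube’s code: the 0.3-second double-tap window and the 80-pixel minimum swipe distance are hypothetical thresholds, and it assumes screen coordinates with y growing downward, so an upward swipe has a negative vertical change. Which vertical direction raises the volume is likewise an assumption.

```python
def action_for_swipe(x0, y0, x1, y1, min_distance=80):
    """Map one stroke (start and end points, in pixels) to a video action."""
    dx, dy = x1 - x0, y1 - y0
    if max(abs(dx), abs(dy)) < min_distance:
        return None                       # too short to count as a swipe
    if abs(dx) >= abs(dy):                # mostly horizontal
        return "next video" if dx < 0 else "previous video"
    # mostly vertical; y grows downward, so an upward swipe has dy < 0
    return "volume up" if dy < 0 else "volume down"


def action_for_taps(taps, screen_width, window_s=0.3):
    """taps: list of (time_in_seconds, x_pixel). Two quick taps form a double-tap."""
    if len(taps) < 2:
        return None
    (t1, _), (t2, x2) = taps[-2], taps[-1]
    if t2 - t1 > window_s:
        return None                       # too slow: two single taps
    return "fast-forward" if x2 > screen_width / 2 else "rewind"


print(action_for_swipe(900, 500, 300, 510))               # next video
print(action_for_taps([(0.00, 950), (0.18, 960)], 1080))  # fast-forward
print(action_for_taps([(0.00, 120), (0.20, 110)], 1080))  # rewind
```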

Besides hand gestures, there are other forms of input. Take the process of starting a search: we can type keywords in the search box to find what we want (as in most applications). In the Taobao app, many elements can serve as input: we can drag in a photo of the item we want (input) to trigger a search, or we can first copy the “special command” (input) for a product shared by other users in other apps, and the search is triggered automatically when we open the Taobao app. In the Spotify app, we can scan a “Spotify code” (input) to search for or share a specific song. We can also use voice commands to ask it to play a certain song, which works like Siri. Just as a change in capacitance triggers the touch screen, our different forms of input trigger the “translator” of the app, which then transmits our command to the device. But I am not sure how that process works.

Question: What kind of touch screen does the Kindle use? It seems to function somewhere between these two kinds of screens: it is not as sensitive and smooth as smartphone screens, but it works better than resistive screens and allows multi-touch.


Martin Irvine, (New Intro) From Cognitive Interfaces to Interaction Designs with Touch Screens.

The Computer: From a Big Calculator to a Metamedium

Two conceptual leaps, “augmenting human intellect” and “cognitive design”, together with the introduction of the GUI in the 1980s, helped the computer evolve from a “big calculator” into a metamedium, a human symbol manipulator, and a problem solver.

The computer’s design concept started from a “numerical and logic processing” machine. EDSAC, one of the first working stored-program computers, was huge and immobile. Programs ran in a linear, uninterruptible sequence. The user interface to the machine was its instruction set, and the only way people could control the computer was by writing programs into it. Data and numbers were represented in binary. The computer was relatively passive, since it had little interaction with humans. I see the computer of that early stage as simply a data-processing machine. If I had been a nontechnical user at that time, I would never have thought of having any connection with that machine, since it was neither user-friendly nor smart.

The further development of the computer, especially the development of the graphical user interface, makes the concept of “computers as big calculators” seem too limiting, because calculation is just one facet of the computer’s problem-solving function. The GUI not only retains the original concept of an interface, anything that physically connects different parts of a system, but also enhances human interaction with the computer. It works as a two-way mediator (input and output) between humans and the computer by enabling people to “delegate, extend, and off-load some processes of human symbolic cognition and agency to software.” It takes input from humans: for example, people type words with the keyboard and control the software with the mouse. They can further manipulate the input by changing the font, arrangement, and color of those words through the interface. The interface displays the functions of the software, such as automatic spelling correction, hyperlinks, moving text within a manuscript, etc. In the case of the hyperlink, the GUI imitates the process of reading and the affordances of two human artifacts, the book and the library. Sections of a book are connected physically by paper and glue, and books are connected together by the library. The hyperlink borrows this pattern of physical connection to link information together. It spares people the tedious process of research, letting them acquire the information they need at a glance instead of going through all the resources and filtering them. This is also how the computer helps augment human intellect.

The development of the GUI enables humans to communicate with the computer through a feedback loop of information. It helps the computer accept various kinds of commands from humans, helps humans make use of the computer’s hidden details and functions, and better connects the software inside the computer system.


Martin Irvine, Introduction to Symbolic-Cognitive Interfaces for Computer Systems: History of Design Principles

Lev Manovich, Software Takes Command, pp. 55-106, on the background for Alan Kay’s “Dynabook” Metamedium design concept.

Peter J. Denning and Craig H. Martell. Great Principles of Computing. Cambridge, MA: The MIT Press, 2015, chapters 7 (Memory) and 9 (Design).

Programming language and computing system

This week’s reading changed my understanding of programming languages. In Professor Irvine’s video, he mentions that we use symbols to represent meanings and to represent and interpret other symbols. Language is this kind of symbol, since we use language to represent language. A programming language does not represent what the computer “speaks” and “thinks” (as it might seem) but imitates human patterns of thinking. As Evans points out, “we designed artificial languages for some a specific purpose such as for expressing procedures to be executed by computers”: the computer is a machine that decodes our information and executes our commands, a procedure through which it helps us solve problems. Instead of being chaotic, a programming language is highly organized and follows syntax rules. It deals only with the surface form of the text, and each word or statement generates its own precise meaning. The third function of symbols, mentioned by Professor Irvine, is that they do not represent meanings but perform actions on other symbols. For example, an operating system is designed as a tool for managing and controlling other software applications. In this case, we use a programming language to control the operating system and, through it, software applications.

Unlike human language, a symbolic system that is complex, ambiguous, irregular, and uneconomical, a programming language serves as a more powerful means of abstraction: it is simple, direct, and easy to execute. Last week, we learned about Shannon’s transmission model of communication and information. I see programming as a process of transmitting information from humans to the programming system. Take Scheme as an example (although it is not widely used): we first write a program in Scheme, a high-level language, as the input, the information source. Then the Scheme interpreter decodes the high-level language, that is, the information, and transmits it to the machine processor, which is the information receiver. Finally, the machine executes the command.
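The interpreter’s role in this chain can be illustrated with a toy interpreter for a tiny Scheme-like subset. It is a minimal sketch, not a real Scheme: it assumes only integers, the procedures +, - and *, and nested parentheses, with no variables or definitions. `tokenize` and `parse` turn the text into structure, and `evaluate` carries out the procedure that the structure describes.

```python
import operator
from functools import reduce

# Hypothetical mini-language: integers, + - *, and nested parentheses only.
PROCEDURES = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def tokenize(source):
    """Split program text into tokens such as '(', '+', '1', ')'."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Turn the token list into nested Python lists (the program's structure)."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)                      # discard the closing ")"
        return expr
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return int(token) if token.lstrip("-").isdigit() else token


def evaluate(expr):
    """Apply a procedure to its evaluated operands, innermost expressions first."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    if op == "-" and len(values) == 1:
        return -values[0]                  # (- 5) negates, as in Scheme
    return reduce(PROCEDURES[op], values)


def interpret(source):
    return evaluate(parse(tokenize(source)))


print(interpret("(+ 1 (* 2 3))"))  # 7
print(interpret("(- 10 3 2)"))     # 5
```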

A computing system connects the programming system and the machine by calculating the functions humans put in and transmitting human commands to the machine. Each part of the hardware has its own function in the calculating process, and interfaces connect the separate parts and transmit information between them. For example, RAM stores information, while the CPU calculates. From the block diagram of the CPU and RAM, we can see that the IP (instruction pointer) holds the location of the next instruction, while the SP (stack pointer) holds the address of the most recently stored item on the stack. The ALU takes two input numbers and produces one output number. The CPU and RAM pass information and values back and forth. By looking inside the machine’s hardware, we start to visualize where the information goes after we put code into the programming system. The way the system calculates functions is a mathematical process, and the way each part of the hardware functions is an engineering process. That is why Wing sees computational thinking as complementing and combining mathematical and engineering thinking.
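The interplay of IP, SP, ALU, and RAM can be traced in a toy stack machine. The opcodes PUSH, ADD, MUL, and HALT are hypothetical and do not belong to any real instruction set, and for simplicity the program is kept in a separate Python list rather than in RAM, as a real stored-program computer would do. Here RAM holds the stack, SP points at its most recently stored value, and IP points at the next instruction.

```python
# Hypothetical opcodes for a toy machine (not a real instruction set).
PUSH, ADD, MUL, HALT = "PUSH", "ADD", "MUL", "HALT"


def alu(op, a, b):
    """Arithmetic logic unit: two numbers in, one number out."""
    if op == ADD:
        return a + b
    if op == MUL:
        return a * b
    raise ValueError(f"unknown ALU operation: {op}")


def run_machine(program, ram_size=16):
    ram = [0] * ram_size   # RAM holds the stack values
    ip = 0                 # instruction pointer: index of the next instruction
    sp = -1                # stack pointer: index of the most recently stored value
    while True:
        op, arg = program[ip]
        ip += 1            # advance to the next instruction
        if op == PUSH:
            sp += 1
            ram[sp] = arg                  # CPU writes a value into RAM
        elif op in (ADD, MUL):
            b = ram[sp]                    # CPU reads two values back from RAM
            a = ram[sp - 1]
            sp -= 1
            ram[sp] = alu(op, a, b)        # the ALU result replaces them
        elif op == HALT:
            return ram[sp]
        else:
            raise ValueError(f"unknown instruction: {op}")


# (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (HALT, None)]
print(run_machine(program))  # 20
```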


Martin Irvine, Introduction to Computation and Computational Thinking

Jeannette Wing, “Computational Thinking.” Communications of the ACM 49, no. 3 (March 2006): 33–35.

Peter J. Denning and Craig H. Martell. Great Principles of Computing. Cambridge, MA: MIT Press, 2015. Review chapters 4, 5, 6.

David Evans, Introduction to Computing: Explorations in Language, Logic, and Machines. 2011 edition.

Text message as information

This week’s reading helps us better understand information and its transmission system. By simplifying the process of information transmission, Shannon’s model of the information system opens the black box of our communication devices. Digitization makes texting the most common, convenient, and simplest way of transmitting information.

By texting, we usually mean SMS texting, which is sent over a cellular network. Our cell phones are always exchanging signals with a cell tower over a control channel, even when they are idle. When a text message is typed, it is encoded into bytes according to a code book (a character encoding) and transmitted in data packets, which form the signal the phone sends. The signal travels over the control channel, the medium that carries it, to the short message service center (SMSC), which sends the message on immediately if the receiver is available or stores it until the receiver can be reached. The receiving phone then decodes the data packets; more specifically, its software translates the signal into information the user can understand.
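The store-and-forward step can be sketched with a toy message center. This is a simplified illustration, not the GSM protocol: real SMS encodes text with the GSM 7-bit alphabet or UCS-2 and adds a header to multi-part messages, while this sketch uses UTF-8 and no header for convenience. The 140-byte payload per segment matches the SMS user-data limit, but the `SMSC` class, its method names, and the “online” set are hypothetical.

```python
from collections import defaultdict, deque

MAX_PAYLOAD = 140  # bytes of user data in one SMS segment


def encode(text):
    """Text -> list of byte packets (UTF-8 here; real SMS uses GSM 7-bit or UCS-2)."""
    data = text.encode("utf-8")
    return [data[i:i + MAX_PAYLOAD] for i in range(0, len(data), MAX_PAYLOAD)] or [b""]


def decode(packets):
    """Packets -> text; done on the receiving phone after all segments arrive."""
    return b"".join(packets).decode("utf-8")


class SMSC:
    """Toy store-and-forward message center (hypothetical, greatly simplified)."""

    def __init__(self):
        self.online = set()                 # phones currently reachable
        self.stored = defaultdict(deque)    # recipient -> messages waiting for delivery
        self.inboxes = defaultdict(list)    # recipient -> decoded texts on the phone

    def submit(self, recipient, packets):
        if recipient in self.online:
            self.inboxes[recipient].append(decode(packets))  # forward immediately
        else:
            self.stored[recipient].append(packets)           # keep until reachable

    def connect(self, recipient):
        """The recipient's phone comes back on the network; deliver what was stored."""
        self.online.add(recipient)
        while self.stored[recipient]:
            self.inboxes[recipient].append(decode(self.stored[recipient].popleft()))


center = SMSC()
center.submit("bob", encode("See you at 7?"))  # Bob's phone is off: message is stored
print(center.inboxes["bob"])                   # []
center.connect("bob")
print(center.inboxes["bob"])                   # ['See you at 7?']
print(len(encode("a" * 300)))                  # 3 segments: 140 + 140 + 20 bytes
```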

When the receiver understands the text message and forms a reply, we know that it has been successfully transmitted and received as information. According to the General Definition of Information (GDI), we can understand the information transmitted by a text because it is made of well-formed, meaningful data, at both the syntactic and the semantic level. The syntax of a language is its grammatical rules, which render the meaning of the information, its semantic level. According to Professor Irvine, “meanings are enacted by cognitive agents who use collectively understand material sign structures in living contexts of interpretation and understanding”. Language is a kind of sign system with conventions and a set of man-made rules behind it. What we text serves as symbols of meaning: people recognize the pattern and thus understand the meaning expressed by the signs.

Shannon’s series of approximations to English indicates that the statistical structure of a language is what lets us recognize its patterns. An English speaker can recognize the letters, the words, and thus the sentences. However, if the information source is Chinese and the receiver knows nothing about Chinese, he can recognize the text only by its physical properties, not by its meaning, since Chinese is not part of his knowledge system. In this case, the transmission of information is not successful, because the receiver cannot recognize the pattern of the information from the sender.
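Shannon’s approximations can be reproduced on a small scale. Shannon worked with English letter statistics; this sketch uses an invented sample sentence instead, and builds a first-order approximation (letters drawn by their frequency alone) and a second-order one (each letter drawn according to the letter before it). The second kind of output already contains only letter pairs that occur in the sample, which is the sense in which structure makes patterns recognizable.

```python
import random
from collections import Counter, defaultdict


def first_order(sample, n, rng):
    """Letters drawn independently, weighted by how often they occur in the sample."""
    counts = Counter(sample)
    letters, weights = zip(*counts.items())
    return "".join(rng.choices(letters, weights=weights, k=n))


def second_order(sample, n, rng):
    """Each letter drawn from the letters that follow the previous one in the sample."""
    followers = defaultdict(list)
    for a, b in zip(sample, sample[1:]):
        followers[a].append(b)
    out = [rng.choice(sample)]
    while len(out) < n:
        choices = followers.get(out[-1])
        out.append(rng.choice(choices) if choices else rng.choice(sample))
    return "".join(out)


# Invented sample text; a real experiment would use a large English corpus.
sample = "the cat sat on the mat and the dog sat on the log "
rng = random.Random(1)
print(first_order(sample, 40, rng))   # letter soup with the sample's letter frequencies
print(second_order(sample, 40, rng))  # every adjacent pair also occurs in the sample
```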

This raises a question for me: is there any standard for measuring how successfully information is transmitted?


Martin Irvine, Introduction to the Technical Theory of Information

Luciano Floridi, Information: A Very Short Introduction. Oxford, UK: Oxford University Press, 2010. Read Chapters 1-4. Excerpts.

James Gleick, Excerpts from The Information: A History, a Theory, a Flood. (New York, NY: Pantheon, 2011).

Peter Denning and Tim Bell, “The Information Paradox.” From American Scientist, 100, Nov-Dec. 2012.



The Affordances and Constraints of Instagram

It is hard to say exactly when Instagram became so deeply rooted in my social life. Sometimes when I unlock my phone, I mechanically look for the little purple-and-orange camera icon and open it. So, when it comes to affordances and constraints, I think of Instagram. As Norman points out, “an interface designer should care about the question of whether the user perceive that clicking on that object is a meaningful, useful action, with a known outcome” (40). The designers of Instagram did exactly that and made users focus “on the key functional elements and their relationship to one another in the app”.

First, the design of Instagram draws on many affordances of the human capacity for symbolic expression, from the app icon to the symbolic icons representing each of the app’s interfaces. The app icon, which features a camera lens, reminds users of Instagram’s main function, taking and sharing photos. When the user opens the app, there is a camera icon in the top left corner, which enables the “story” function. Obviously, the camera is a symbol of shooting photos and videos. Similarly, the TV icon in the top right corner represents IGTV, which enables users to upload videos longer than one minute; the TV icon reminds people of channels, programs, and long videos. The home icon is a symbol of the homepage, the magnifier icon represents the search function, and the portrait silhouette represents the user’s personal profile. What I find most interesting is the paper plane icon. When I was in primary school, I used to throw a paper plane with notes inside to my friend when she sat far away from me in the classroom. This icon enables adding others’ posts to my story, sending posts to other users, and commenting on others’ stories. Sending “our current feeling” through Instagram to friends who are far away, such as commenting on their stories and sharing posts with them, is like sending a message on a paper plane. Instagram is really good at connecting the symbolic meaning of objects to icons inside the app, which helps users navigate the application by letting them “subconsciously” find the function they need.

Second, the spatial affordances of Instagram give users a more fluent and comfortable experience. Instagram’s pages are highly organized, both in their overall structure and in the shape of their elements. According to Murray, users in the digital age assume that spatial positioning is meaningful and related to function, and Instagram is structured around its functions. On the home page, the story section runs horizontally, the feed runs vertically, and the navigation bar runs horizontally. The story section is composed of horizontally arranged avatars of the same size, with spaces between them that feel just right. When the user clicks into someone’s stories, several progress bars are arranged horizontally at the top of the page. If the user switches to the next person’s story, the interface turns like a rotating cube, reminding him of turning the page of a book. No matter which page the user is on, avatars are always circular. The “like, comment, send post and save post” icons and the main navigation icons are also arranged horizontally. Instagram’s designers put icons of similar importance and function next to each other to give users hints about their functions.

One of the app’s constraints is that the user can browse only one photo at a time. To see more, he needs to scroll. If a post consists of multiple photos, the user needs to swipe through it to see the others. This constraint actually makes the layout of the application clearer and better defined by giving the user less information at a time. It also reduces the number of fundamental tasks the user needs to do at once. If the user sees that there are multiple photos in one post, he knows to swipe to see more. Moreover, if the user likes a photo, he will click into that blogger’s personal page to see more photos. That is when the user knows he has finished a task and needs to move to the next one.

Another constraint is that Instagram embeds advertisements “implicitly”. A company appears on the Instagram home page as a user: it has an avatar and Instagram posts with captions, and users can like and comment on its posts. But unlike a normal post, an advertisement has a “Shop Now” bar at the bottom. To attract users’ attention, the bar changes color if a user looks at the post for more than two seconds, and it leads users to the company’s shopping website. Since users can simply scroll past the advertisements, this constraint reduces the annoyance they cause.

The affordances and constraints of Instagram have a lot to do with the human sign and symbol system. They bring us back to the core of Instagram, the photo, which is itself a human symbol.

Mediation and Sociotechnical System

The concept of mediation, which at first seemed quite elusive to me, blew my mind after this week’s reading, since it helps us visualize the “invisible conditions and relations” (Irvine, YouTube) behind the interfaces we interact with. Each digital interface has invisible social-cultural relationships behind it, and it is those “already in place” social relations that determine how digital interfaces are designed. Technology, media, and society/culture are parts of a sociotechnical system: they are interdependent and interact with one another.

Websites are interfaces that reconstruct and connect pre-existing media, such as graphics and texts, and gather fragmented information. Meanwhile, they are not simply a “medium” for displaying and transmitting information, which is only part of their “social values, function and power” (Irvine); they also generate novel meanings for social activities. Digital library websites redefine what it means to search for documents and how people do it: when we need sources for an academic essay, the first thing we do is open JSTOR or Google Scholar, not head to the school library. Crowdsourcing websites redefine the social function of the crowd by stressing the mutual benefit between users and crowdsourcers. Social websites alter our ways of communicating and our definition of community. Online shopping websites change our mode and definition of shopping.

Web browsers and search engines, for example Google Chrome with its built-in Google search, further connect websites together. What I find interesting about them is their graphics. The home page of Google Chrome consists of Google Top Stories, a search bar, and two lines of icons linked to our frequently browsed pages.

Each of those functions is represented by a graphic. If those graphics were not hyperlinked to their functions, they would simply be graphics with no meaning; but when they are connected to the sociotechnical system, each represents another whole system behind the Chrome interface. If you click the biggest graphic, you enter the search results page for Dr. Herbert Kleber, the main subject of the Top Stories. Dr. Kleber’s story gives the graphic a new meaning. On the results page, there are more graphics and photos representing his books, his portraits, and news about him. As for the two lines of graphics representing users’ frequently browsed pages, they remind users of what they have visited before and lead them to those pages with a single click. As a result, they serve as reminders of the users’ memories. The social-cultural function of the graphics in Google Chrome is re-mediated by the functions hidden behind the interface. Graphics as people’s cognitive symbols “is the function precedes, and is the precondition for the current technical implementation” (Irvine), that is, the design of the interface of Google Chrome.


Debray, R. (1999) “What is Mediology?”, from Le Monde Diplomatique, Trans. Martin Irvine.

Martin Irvine, “Understanding Media, Mediation, and Sociotechnical Systems: Developing a De-Blackboxing Method”

Martin Irvine, Intro to Media and Technical Mediation (from “Key Concepts in Technology”)

Cognitive Artifacts: From Paper Books to EBooks

“The special characteristics of human enable them to create artifacts that enhance their life.” From engravings on stone to ink on parchment, books, a milestone artifact created by humans, represent human development. Books, paperback or electronic, are cognitive artifacts that support nearly every part of the human symbolic system, from languages and writing to images and information visualization.

System View and Personal View of Books and EBooks

From the system view, books help humans record and spread knowledge, information, and memories, which further enhances human memory, improves learning outcomes, and deepens people’s understanding of the world. Initially, books functioned as recorders. They not only recorded people’s ideas, imaginations, and deductions, but also archived the past events of a society and a culture. Later, books with an educational purpose became textbooks, aiding people’s memorization and study, and books of people’s thoughts became fiction, enriching people’s lives and spiritual worlds. Books translate what seems abstract and invisible (thoughts) into concrete human cognitive symbols (graphics, languages, scientific models).

From the personal view, recording, learning, and disseminating information are tasks for humans. Without books, it would be impossible to complete these tasks: remembering all events, knowledge, and thoughts would be difficult, let alone archiving them, spreading them to other cultures (horizontally), and passing them on to the next generation (vertically). Books take over such tasks and create new ones for humans, such as making, buying, reading, and interpreting books.

As for eBooks, their system view includes everything books can do plus making reading more convenient, which further improves the efficiency of spreading information. From the personal view, eBooks replace the task of finding books in libraries and bookshops with the task of choosing books from a digital library through an interface.

EBooks: Media technologies as “cognitive technologies”

EBooks transform reading from a physical action into a digital one. Considering eBooks as “cognitive technologies” highlights the purpose of their invention and the balance between humans and the world, as well as between the virtual and the real world. Books engage directly with humans, who create, read, and touch them in person; books also give direct feedback through human symbols such as graphics, languages, and scientific models. In terms of the action cycle, creating books is what humans do to the world. Meanwhile, people want reading to be faster, cheaper, more convenient, more flexible, and more accessible, and they want more interaction among readers. Here come eBooks, which function as a virtual world between humans and real books. In comparison, the term “manufactured products” sounds isolated and dead, because it makes technologies less meaningful by cutting their connection with human cognition.


Donald A. Norman (1991), “Cognitive Artifacts.” In Designing Interaction, edited by John M. Carroll, 17-38. New York, NY: Cambridge University Press.

Andy Clark (2008), Supersizing the Mind: Embodiment, Action, and Cognitive Extension. New York, NY: Oxford University Press, USA.

Modularity: the “Survival Philosophy” of Technologies

If a complex system is a society, its subsystems or modules are the citizens of that society. Modular design principles matter to the society of technologies as much as law and social structure matter to human society.

For me, the combination of design principles and the graphical user interface in the design of the PC gives users a brief guide to “how different elements in the system perform with the larger one”, which reduces the difficulty of operation and improves the user experience by exposing simplified, well-characterized information. The same rule also applies to the design of software applications. The interface is like a filter: it discards complex information and leaves the essentials for users. At the same time, it is like a microscope, bringing invisible parts of the system before users’ eyes. To design an app whose interface looks unified, all of its modules, including the invisible parts, have to function well. I will use Apple Music to elaborate my point of view.

“Modules should be designed to hide their internal complexity and interact with other modules through simple interfaces”. Take the PC, for example: the graphical user interface shows users how the different modules, that is, functions, of the computer work in a clear and intriguing way. On one hand, the GUI “hides” complex things behind the monitor, from “rooms of vacuum tubes, programmed via switchboards…” to “piles of chips” and thousands of lines of confusing code, and presents users with an organized interface. It turns complexity into simplicity. On the other hand, the GUI serves as a guide, helping users visualize the modularity of the PC through icons, toolbars, columns, etc. From the graphical user interface, users can tell the function of each module, which echoes the aim of modular design principles: labor is divided clearly among modules, and one function cannot affect another. For instance, installing one more application, say Photoshop, on a PC will not affect the use of an existing application, say Mail. Also, the malfunction of one application will not disrupt the whole system. On a MacBook, when the photo viewer stops responding, a little rainbow wheel appears inside its window, but users can still use Safari. What is more, the GUI surprises and delights users by fulfilling their “whimsical desires” for computers. Users are satisfied by the resolution of the Retina display on the MacBook Pro, while the electronic circuits and other small assemblies behind the screen do not interest them at all.
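The two properties described above, hidden internals behind a small interface and failures that stay inside one module, can be sketched in a few classes. This is a hypothetical illustration, not how macOS is built: `Module`, `Mail`, `PhotoViewer`, and `System` are made-up names, and catching an exception stands in for the operating system isolating a misbehaving app.

```python
class Module:
    """Every module offers the same small public interface: handle(request)."""
    name = "module"

    def handle(self, request):
        raise NotImplementedError


class Mail(Module):
    name = "Mail"

    def __init__(self):
        self._inbox = ["Welcome!"]              # internal detail other modules never see

    def handle(self, request):
        return f"{len(self._inbox)} message(s)"


class PhotoViewer(Module):
    name = "Photos"

    def handle(self, request):
        raise RuntimeError("corrupted image")   # simulated malfunction


class System:
    def __init__(self):
        self.modules = {}

    def install(self, module):
        self.modules[module.name] = module      # adding a module leaves the others untouched

    def run(self, name, request=None):
        try:
            return self.modules[name].handle(request)
        except Exception as exc:                # the failure is contained in one module
            return f"{name} is not responding ({exc})"


system = System()
system.install(Mail())
system.install(PhotoViewer())
print(system.run("Photos"))  # Photos is not responding (corrupted image)
print(system.run("Mail"))    # 1 message(s)
```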

To design an interface that appears unified, two basic rules need to be followed. One is to keep the interface as simple and clear as possible; as Parnas indicates, “Its interface or definition was chosen to reveal as little as possible about its inner workings”. The other is to make sure that each module works well independently and “communicates” properly with other modules. Apple Music is a music app. Drawing on Baldwin, the operation of Apple Music involves two categories of information: visible information and hidden information. The user interface is visible information, and the three elements of visible design rules are intertwined in the design of this app. The architecture of the app presents its functions. It includes lists of “playlists, artists, albums and songs”, photos and names of albums, and a navigation bar consisting of “Library, For You, Browse, Radio and Search”. The hidden information consists of the hidden design parameters. Users can reach those hidden parameters only through visible modules, and some hidden information is open not to users but only to the app’s designers. For example, the “For You” function as a whole is visible information for users, but its background operation is hidden information: For You monitors the songs and albums users have played recently and makes recommendations for them. If a user likes Luis Fonsi, For You will recommend playlists of Zedd, Maroon 5, and other similar singers. For You also finds friends by connecting to Facebook and contacts. All of these require an internet connection. Changes to such information affect only the For You system and do not trigger any changes in distant parts of the application, such as Library. The Search page has a function named “Recent”, which records what the user has searched for recently.
The Search page also has a function named “Trending”, right below “Recent”, which puts everyone’s recent searches together and comes up with a ranking. This shows how modules work both independently and compatibly.
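The contrast between “Recent” and “Trending” can be sketched as two small modules behind one search page. This is an assumption-laden illustration, not Apple Music’s implementation: `RecentSearches`, `Trending`, `SearchPage`, and the five-item limit are hypothetical. Each user’s page owns a private `RecentSearches`, while all pages share one `Trending` counter, so changing how the ranking is computed would not touch the per-user history.

```python
from collections import Counter, deque


class RecentSearches:
    """Per-user module: remembers only this user's last few searches."""

    def __init__(self, limit=5):
        self._items = deque(maxlen=limit)

    def record(self, query):
        if query in self._items:
            self._items.remove(query)   # a repeated search moves back to the top
        self._items.appendleft(query)   # when full, the oldest search drops off the end

    def items(self):
        return list(self._items)


class Trending:
    """Shared module: counts searches from all users and ranks them."""

    def __init__(self):
        self._counts = Counter()

    def record(self, query):
        self._counts[query] += 1

    def top(self, n=3):
        return [query for query, _ in self._counts.most_common(n)]


class SearchPage:
    """The visible page talks to each module only through record/items/top."""

    def __init__(self, trending):
        self.recent = RecentSearches()
        self.trending = trending

    def search(self, query):
        self.recent.record(query)
        self.trending.record(query)


trending = Trending()
alice, bob = SearchPage(trending), SearchPage(trending)
alice.search("Luis Fonsi")
alice.search("Zedd")
bob.search("Zedd")
print(alice.recent.items())  # ['Zedd', 'Luis Fonsi']
print(bob.recent.items())    # ['Zedd']
print(trending.top(2))       # ['Zedd', 'Luis Fonsi']
```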

To sum up, modularity makes the design of a system organized and user friendly.


Baldwin, C. and Clark, K. (2000) “Design Rules, Vol. 1: The Power of Modularity”. Cambridge, MA: The MIT Press

Langlois, R. (2002) “Modularity in Technology and Organization.” Journal of Economic Behavior & Organization 49, no. 1: 19-37.

Lidwell, W., Holden, K., and Butler, J. (2010) “Universal Principles of Design”. Revised. Beverly, MA: Rockport Publishers

“De-redboxing”: the Design of Yelp

The readings give me a clear view of how to decompose the evolution of technologies, or more specifically, how they got to where they are now. I am intrigued by Schumpeter’s comparison between the development of the economy and the renewal of technology. Unlike economic development, driven by external forces, technology “created the new by combining the old”, and I totally agree. It is the accumulation of existing technologies that begets the newest ones.

When I did the reading, one app kept haunting my mind: Yelp, a crowd-sourced review app, especially for restaurants. It is surprising to see how much it has changed since I first used it five years ago. When you open it, you might find that its overall layout looks the same as before: the white search bar and the red frame. What has not changed much is its primary structure, “a backbone that keeps its basic function”: providing people with information about restaurants, including location, price, menu, comments, etc.

Meanwhile, the app’s subassemblies have changed a lot. Yelp’s designers develop new functions by building more subassemblies on top of existing ones. Take the online waitlist function, for example. First, people search for restaurants as usual. The live waiting time is displayed in each restaurant’s preview, and people can join the waitlist by simply clicking the button on the right side of the preview. It is easy for customers to understand and use the whole process because the new function is built on the old ones. The function also bridges the gap between customers and restaurants by letting customers contact restaurants actively and conveniently instead of making phone calls.
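The waitlist can be thought of as a queue added on top of the existing restaurant listing. This toy sketch is hypothetical, not Yelp’s system: it assumes the “live waiting time” is simply the number of parties ahead multiplied by an assumed average of ten minutes per party.

```python
from collections import deque


class Waitlist:
    """Toy online waitlist for one restaurant (hypothetical, not Yelp's system)."""

    def __init__(self, minutes_per_party=10):
        self.minutes_per_party = minutes_per_party  # assumed average turnover
        self._queue = deque()

    def live_wait(self):
        """Estimated wait shown in the restaurant's preview, in minutes."""
        return len(self._queue) * self.minutes_per_party

    def join(self, party):
        """What the 'join waitlist' button might do: returns (position, estimated wait)."""
        wait = self.live_wait()
        self._queue.append(party)
        return len(self._queue), wait

    def seat_next(self):
        """The restaurant seats the party at the front of the line."""
        return self._queue.popleft()


waitlist = Waitlist()
print(waitlist.join("party A"))  # (1, 0)
print(waitlist.join("party B"))  # (2, 10)
print(waitlist.live_wait())      # 20
waitlist.seat_next()
print(waitlist.live_wait())      # 10
```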

Another new function is online delivery. Two years ago, Yelp only had an online ordering function: either people ordered online and picked up in the store, or Yelp gave them a link to another online food-delivery platform. Now, people can enter Yelp’s own delivery page by clicking the “Delivery” icon at the bottom. The latter function is built on the basis of the former. The delivery page looks and works like other food-delivery apps, such as Grubhub. Yelp uses an online food-delivery app as its subassembly, which also embodies the reconfigurability of technology.

The evolution of technology is tricky: it is highly structured, since its subassemblies are structured following the pattern of the backbone, and new subassemblies are built based on the old ones; meanwhile it is reconfigurable and fluid, since there is no restriction of the origin of the subassemblies and how they are structured within the big pattern.


Martin Irvine, Introduction to Design Thinking: Systems and Architectures. Print.

Brian Arthur, The Nature of Technology: What It Is and How It Evolves. New York, NY: Free Press, 2009.