Author Archives: Jieshu Wang

Band in Your Hand: De-blackboxing GarageBand – Jessie

Abstract

GarageBand is music-creation software for Apple devices such as the iPhone and iPad. It has a library of sound effects and can be used to create songs with multiple audio tracks. In this paper, I discuss the design principles at work in GarageBand, such as modularity, affordance, and constraint. I also examine whether GarageBand fulfills Alan Kay's vision of a meta-medium.

1. Introduction

GarageBand is a music application for the OS X and iOS systems. It enables users to create multi-track songs with a large set of pre-made virtual instruments such as keyboards and guitars, and its library of sound effects contains thousands of loops. It can also serve as a DJ machine. Projects created in GarageBand can be exported in many formats, such as WAV. It gives amateur musicians powerful tools to play and compose music.

For this paper, I focus on the iOS version of GarageBand for iPad. I examine how design principles apply to its user interface and its programs of action. I also demonstrate some of these principles by re-creating Daft Punk's song Give Life Back to Music in GarageBand. Here is the video I made for this song.

Give Life Back to Music, re-created by Jieshu Wang with GarageBand, originally by Daft Punk.

Here is the link to the GarageBand file that you can download and import into your GarageBand app.

2. Modularity

Modularity is a method by which designers divide a system into subsystems in order to manage complexity. Each module hides its own complexity inside and interacts with other modules through interfaces, and each module can be divided further into sub-modules. In this way, the overall system complexity is reduced[1].

With years of development and updates, GarageBand has become more and more complex. However, as a user, I have never found it complicated to use. That is because its designers apply the principle of modularity very well. Improvements in one module do not affect other modules, so users do not need to change their existing habits much to adapt to new functions. Here I will examine the modularity in GarageBand to see how it improves user experience and manages system complexity.

2.1. Modules in GarageBand

2.1.1. Sections and tracks

The basic function of GarageBand is to create your own music. The songs or projects you create do not affect one another unless you import one project into another. So each song can be seen as a module. This is the topmost level of modularity for users.


Each project is a module.

Inside a project, there are two dimensions of modules, like the two coordinate axes of an XY plane. The vertical axis is for audio tracks, while the horizontal axis is for sections (time).

Sections and tracks in GarageBand as modules. Video/Jieshu Wang

The first dimension of modularity is audio tracks. Within one project, you can add up to 32 audio tracks, more than enough for most amateur musicians. Each track serves as a module that hides its complexity—its timbre, chords, loops, melodies, and other properties. When you are editing one track, you can play back the other tracks in order to synchronize your beats without affecting them.

The second dimension of modules is song sections. Each section is made up of several bars. The default number of bars in one section is eight, but you can easily increase or decrease the number as you wish. Each song can consist of any number of sections. Each section is a module in which you can add up to 32 tracks. While you are in the interface of one section, each audio track can be easily moved, trimmed, looped, cut, and copied, and your action in one section has no impact on other sections—except adding or deleting tracks, which automatically adds a blank track with the same instrument to, or deletes the same track from, the other sections. If you want to edit another section, you can tap any area in the current section and drag it to the left or right to move to the previous or next section.

Here is another advantage of dividing one song into sections. A song normally lasts several minutes, and given the size constraint of the iPad touchscreen, squeezing the whole song into the limited width of the screen would make each bar extremely short—too small to recognize. Any small variation in the sound wave would be very hard to locate. Users would have to zoom in many times to find a specific bar, then zoom out and in again to reach another bar. Sections solve this problem elegantly by providing a navigation system like longitude and latitude. Only three numbers are needed to locate a specific bar in a song—the ordinal numbers of the section, the track, and the bar within the section. If GarageBand had no sections, or only one section for the entire song, it would be very difficult to locate one bar among hundreds, if not thousands, of bars in a single interface.

In general, if you create a song with five sections, and each section has eight bars and four tracks, then you get 5 × 4 = 20 modules that you can edit separately. For Give Life Back to Music, I created 7 sections and 21 tracks, 147 modules in total. Although the modules can be edited independently, they combine organically. When you finish your project, you can export the whole song with the tracks mixed together and the sections connected seamlessly one after another. If there is a mistake, or a sound effect you would like to add or change, all you have to do is find the right module and modify it.
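To make the grid concrete, here is a minimal sketch in Python of the section-by-track organization described above. It is purely my own illustration—the class, its names, and its methods are invented for this post and have nothing to do with GarageBand's actual implementation.

```python
# A minimal sketch of the section-by-track grid described above.
# This is my own illustration, not GarageBand's data model.

class Project:
    def __init__(self, n_sections, track_names):
        # each cell of the grid is one independently editable "module"
        self.grid = {
            (s, t): []                      # the notes/regions in that cell
            for s in range(n_sections)
            for t in track_names
        }

    def edit(self, section, track, content):
        # editing one module leaves every other module untouched
        self.grid[(section, track)].append(content)

    def export(self):
        # on export, all modules are combined back into one song
        return sorted(self.grid.items())

# the example from the text: 5 sections x 4 tracks = 20 modules
song = Project(5, ["drums", "guitar 1", "guitar 2", "bass"])
song.edit(0, "drums", "8-bar beat")
print(len(song.grid))   # -> 20
```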

2.1.2. Modularity of sound effects

As I discussed above, a song project in GarageBand is divided into modules according to time and tracks. Inside each module, GarageBand provides us with a large number of options of sound effects. Those sound effects are divided into two main modules—Tracks and Live Loops.

In short, Tracks are mainly sound effects that imitate real instruments such as pianos and guitars, while Live Loops are pre-edited loops, each of which consists of multiple tracks in a particular genre or style, such as EDM or hip hop.

 Two modules of sound effects that you can add into audio tracks: Live Loops and Tracks

2.1.2.1. Live Loops

Both modules (Live Loops and Tracks) have many sub-modules organized by instrument or genre. In the Live Loops module, there are eleven pre-edited loop modules in different styles—EDM, Hip Hop, Dubstep, RnB, House, Chill, Rock, Electro Funk, Beat Masher, Chinese Traditional, and Chinese Modern. Within each module there are even smaller sub-modules. For example, the EDM module has a default setting that includes eleven mixed tracks with nine pre-edited loops each—11 × 9 = 99 editable modules in total.

 EDM Live Loop has 99 pre-edited modules. Users can add more as they wish.

The basic units of the loops all come from the 1,638 so-called Apple Loops stored in GarageBand. Users can choose from those 1,638 loops to mix their own loops, as well as import other audio files as loops. 1,638 is a large number—how can we find a loop that fulfills our needs? For convenience, the designers labeled the loop units with three types of properties—instrument, genre, and description—forming a three-dimensional selection network. In this way, they divided the user's action of selecting loops into three modules. For example, if I would like two or three bars of a relaxed country loop played on guitars, I would choose the keyword "Relaxed" under descriptions, "Country" under genres, and "Guitars" under instruments. That leaves seven items in the list, labeled "Cheerful Mandolin", "Down Home Dobro", "Front Porch Dobro", and so on—exactly what I need.


1,638 Apple Loops are categorized along three dimensions: 16 instruments, 14 genres, and 18 descriptions.
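As a rough illustration of this three-dimensional selection network, the Python sketch below filters a tiny, made-up loop library by instrument, genre, and description tags. The loop names and tags are invented for the example; GarageBand's real library is not exposed this way.

```python
# A toy illustration of the three-dimensional loop filter described above.
loops = [
    {"name": "Cheerful Mandolin",  "instrument": "Guitars", "genre": "Country", "description": "Relaxed"},
    {"name": "Down Home Dobro",    "instrument": "Guitars", "genre": "Country", "description": "Relaxed"},
    {"name": "Club Dance Beat 01", "instrument": "Drums",   "genre": "EDM",     "description": "Intense"},
]

def find_loops(loops, **criteria):
    """Return every loop whose tags match all of the given criteria."""
    return [l for l in loops
            if all(l.get(key) == value for key, value in criteria.items())]

print(find_loops(loops, description="Relaxed", genre="Country", instrument="Guitars"))
```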

 2.1.2.2. Tracks

In the Tracks module, there are thirteen options or sub-modules:

  • Keyboard: Play an on-screen keyboard with piano, organ, and synth sounds.
    • There are seven categories of timbre to choose from—keyboards, classics, bass, leads, pads, FX, and other—133 timbres in total that you can play on a virtual keyboard on the touchscreen.
    • Depending on the timbre, there are many sound properties you can adjust. For example, for a timbre called “Deep House Bass”, you can modify filter attack, cutoff, resonance, filter decay, and pitch.
  • Drums: Tap on drums to create a beat. There are eight drum kits and eight drum machines.
  • Amp: Plug in your guitar and play through classic amps and stompboxes. Basically, it is a virtual guitar amplifier and effects unit. There are four categories of guitar amps (clean, crunchy, distorted, and processed)—32 guitar amps and 16 bass amps altogether.
  • Audio Recorder: Record your voice or any sound. There are nine effects you can choose, such as large room and robot.
  • Sampler: Record a sound, then play it with the onscreen music keyboard.
  • Smart Drums: Place drums on a grid to create beats.
  • Erhu: Tap and slide on strings to bow a traditional Chinese violin.
  • Smart String: Tap to play orchestral or solo string parts.
  • Smart Bass: Tap strings to play bass lines and grooves.
  • Smart Keyboard: Tap chords to create keyboard grooves.
  • Pipa: Tap the string to pluck a traditional Chinese lute.
  • Smart Guitar: Strum an onscreen guitar to play chords, notes, or grooves. There are four styles (acoustic, classic clean, hard rock, roots rock).
  • Drummer: Create grooves and beats using a virtual session drummer.

In general, the options for one track are summarized in the image below.

[Diagram: the options available for one track]

credit: Jieshu Wang

2.1.3. Modularity of action

Under this modular organization, users' actions in creating a song are also divided into modules. Users divide a song into several sections and edit each track in each section separately. For example, a song of 96 bars can be divided into 12 sections of 8 bars. Say it is a simple pop song with 5 tracks—drums, two guitars, bass, and vocal. There are 12 × 5 = 60 modules in total that can be edited separately. Accordingly, the user can divide her work into 60 sub-actions. First she would edit section A: the drum module, then the two guitar tracks, then the bass track, and then the vocal track. While she is editing the bass track of section A, she can play back the three tracks (drums and two guitars) she has already edited in order to synchronize the beats. This playback function is an interface between modules of action. Similarly, when she is recording her vocal for the fifth track of section A, she wears earphones to listen to the instrumental accompaniment of the first four tracks, in order to follow the beat and tune of the existing modules. If she needs some backing vocals, she can add an additional vocal track and sing the harmony all by herself.

There are also interfaces between different sections. For example, many pop songs use conventional chord progressions such as I–vi–IV–V. In this case, users can simply copy and paste the repeating tracks into new sections. In addition, the drum part does not vary much during a song, so users can copy and paste earlier drum tracks into later sections, or just loop them to fill the whole song. In Give Life Back to Music, I copied and pasted many tracks, such as the drum and keyboard tracks, in order to save time.
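The reuse of modules across sections can be sketched in the same spirit. The snippet below is only an illustration of the copy-and-paste idea, with a made-up drum pattern and a common I–vi–IV–V progression; it is not how GarageBand stores anything.

```python
# Copy-and-paste reuse: modules edited once are duplicated into later sections.
drum_pattern = ["kick", "hat", "snare", "hat"] * 2        # one 8-beat drum module
progression = ["C", "Am", "F", "G"]                       # I–vi–IV–V in C major

# paste copies of the same modules into all 12 sections
sections = [{"drums": list(drum_pattern), "chords": list(progression)}
            for _ in range(12)]

sections[3]["chords"][0] = "Em"          # editing one section's copy...
print(sections[0]["chords"][0])          # ...leaves the others unchanged -> 'C'
print(sections[0]["drums"] == sections[5]["drums"])   # -> True
```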

With this modularity of action, the process of creating a song is simplified, and it becomes easy for amateur musicians to manage the complexity of the music.

2.2. GarageBand as a module for other systems

The music industry is a complex sociotechnical system. A lot of technologies, organizations, individuals, commercial companies, and academic institutions are involved in this global system. GarageBand is part of it, serving as a module for many larger systems.

GarageBand is a module of the iLife software suite, which contains iMovie, iPhoto, iWeb, and other media software. Each of these applications has its own functions and purposes: GarageBand is for music, iPhoto is used to edit images, and iWeb is a website creation tool. Meanwhile, they interact with one another through interfaces. For example, a song project created in GarageBand can be imported into iMovie to serve as background music for a video, which in turn can be imported into iWeb as part of a web page. For my video of Give Life Back to Music, I exported the song from GarageBand into iMovie.


GarageBand projects can be imported into iMovie

Moreover, as part of Apple's ecosystem of software and hardware, GarageBand projects can be transferred very easily between Apple devices through AirDrop, a feature that uses Bluetooth and Wi-Fi, as shown below.


A GarageBand project created on iPad was transmitted to MacBook using AirDrop. It can be edited further using the MacBook version of GarageBand or other software such as Logic Pro.

In addition, GarageBand can interact with systems outside the Apple ecosystem through interfaces. For example, Voice Synth is a virtual vocoder for iPad. Since GarageBand has no vocoder function, users who want one have to turn to third-party applications such as Voice Synth, as shown in the upper panel of the image below. Here is the interface between GarageBand and Voice Synth: I used the "Robot" effect in Voice Synth to record myself singing "let the music come tonight, we are gonna use it; let the music come tonight, give life back to music", exported it as a WAV file, uploaded the audio file to my iCloud Drive, and imported it into an audio track in GarageBand, where I could edit it further and mix it with other tracks. With third-party applications as modules, GarageBand does not need to build its own vocoder, which might cost a lot of money, and users do not need to install Voice Synth unless they want the vocoder effect—not all users want to distort their voice. The interfaces involved here include protocols shared by the audio-processing community, such as audio file formats, cloud storage, and data transmission methods. Conversely, GarageBand projects can also be exported into other apps such as Logic Pro for further manipulation, partly because they share the same audio engine.

[Image: recording a vocal with the Robot effect in Voice Synth and importing it into GarageBand]

GarageBand can also be used as a module in a hardware system. With an audio interface such as the Apogee Jam, users can turn GarageBand into a virtual amp for guitars and basses.

3. Affordance

Affordance is a "property in which the physical characteristics of an object or environment influence its function[1]." As Donald A. Norman noted in his book The Design of Everyday Things[2], affordance provides us with clues about how things "could possibly be used". The user interface design of GarageBand demonstrates this principle, too.

Many interfaces of the virtual instruments imitate the interfaces of real instruments. For example, there is a virtual keyboard in the Keyboard module. This imitation follows people's existing mental models, so users know how to play the keyboard at first glance.


Interfaces of some keyboards

Icons on the interface follow people's existing mental models, too. For example, the green triangle indicates "play", while the red dot indicates "record." The virtual wheels and rotary knobs afford rotating, and the black and white keys that imitate a piano afford pressing. When you press one of the keys, its hue becomes darker, imitating the shadow of a real key and indicating that you are "pressing down" a key.


The shadow of the key that users are pressing.

The interfaces for drums also imitate real drums. There are several virtual drumheads that afford striking. Sometimes, tapping different areas of the same drumhead produces different sounds, just as on real drums. For example, tapping the center of the drumhead of the biggest drum in the Chinese drum kit produces a deep hit, while tapping the rim sounds like a clear knock. Moreover, the harder you press the touchscreen, the louder the sound will be. Different gestures produce different effects, too. For example, in the Chinese drum kit, if you drag your finger around the rim of the biggest drum, it sounds like a stick sweeping across a rough surface—a "rattle" sound.


Interfaces of some drums

However, in Smart Drums, things are different. There are no virtual drums in the interface, but an 8 × 8 matrix whose two dimensions are "Simple–Complex" and "Quiet–Loud". There is no such thing as a "drum matrix" in real life, but users know how to use it once they see the interface—there are 64 squares in the matrix, and drum components of similar size are arranged to the right of it, as if waiting to be dragged in. So the components afford dragging. There is an icon of dice on the lower left. Physical dice afford rolling, so the perceived affordance of the icon is rolling to get a random result. Indeed, when you tap the dice icon, it "rolls" in place, and the drum components randomly "roll" into the matrix, forming a random beat pattern in a metric framework according to your tempo and time signature.


Interface of smart drum
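The dice behavior can be described as a simple randomization over the grid. The sketch below is my guess at the general idea—the component names and grid semantics are assumptions, not Apple's implementation.

```python
import random

# A rough sketch of what the dice icon appears to do in Smart Drums:
# drop every component onto a random cell of the 8 x 8 grid.
components = ["kick", "snare", "hi-hat", "clap", "tom", "cowbell"]

def roll_the_dice(components, size=8):
    """Place each component at a random (complexity, loudness) position."""
    return {
        name: {"complexity": random.randrange(size),   # 0 = simple ... 7 = complex
               "loudness":  random.randrange(size)}    # 0 = quiet  ... 7 = loud
        for name in components
    }

print(roll_the_dice(components))
```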

Another example of affordance is the interface for guitars. There is a switch on the upper right of the screen labeled "Chords" and "Notes," which you can tap to switch between chords mode and notes mode. The notes mode imitates the interface of a real guitar, with six strings that you can tap to play or drag to bend the pitch slightly. However, the interface differs from a real guitar. A real guitar player uses the left hand to hold the chords and the right hand to pluck or strum the strings, but in GarageBand you only see the left part of the neck. Still, it is very easy for a guitar player to realize how to play the virtual strings—by tapping a string between the frets, which afford tapping.

The chords mode imitates nothing in the real world, but it provides users with a perceived affordance of tapping as well. As you can see from the gif below, there is a rotary knob at the upper center labeled "Autoplay", with which you can choose from four pre-made chord progressions or turn autoplay off. There are eight vertical bars, each with a chord name that depends on your key. For example, in the key of C major, the eight bars are labeled Em, Am, Dm, G, C, F, Bb, and Bdim—all common chords in C major. If autoplay is off, six strings remain on the screen, affording tapping. If you tap the chord label at the top of a vertical bar, the six strings are "played" at the same time, imitating a strum. If you tap individual strings in the bar labeled Em, it plays the sound of the corresponding string as if your left hand were holding the Em chord. If you turn autoplay on, all you have to do is tap a chord name and GarageBand plays a pre-made chord progression.


Interface of the Hard Rock guitar in GarageBand
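Conceptually, the chord strips map each chord name to a set of string pitches, so that tapping the label "strums" them all and tapping one string plays just that note. The sketch below models this idea with standard open-position fingerings I chose myself; the voicings GarageBand actually uses may differ.

```python
# A simplified model of the chord strips in the guitar's chords mode.
# The fingerings are my own choices for illustration, not GarageBand data.
CHORD_NOTES = {
    "Em": ["E2", "B2", "E3", "G3", "B3", "E4"],
    "G":  ["G2", "B2", "D3", "G3", "B3", "G4"],
    "C":  ["C3", "E3", "G3", "C4", "E4"],
}

def strum(chord):
    """Tapping the chord label plays every string of that chord at once."""
    return CHORD_NOTES[chord]

def tap_string(chord, string_index):
    """Tapping one string inside a chord strip plays just that string."""
    return CHORD_NOTES[chord][string_index]

print(strum("Em"))          # all the strings, as if strummed
print(tap_string("Em", 0))  # the lowest string only
```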

Other instruments like Smart Strings, Pipa, Erhu, and Smart Bass also have many well-designed affordances.

In a word, the designers of GarageBand are very good at using affordance. They imitate real instruments and use many icons, switches, and rotary knobs to integrate a great many complex functions into a limited screen space.


The interface for amps is full of virtual rotary knobs.

However, many of these designs are not original to GarageBand. There are a lot of music applications that imitate the guitar and piano, and many of them use interfaces similar to GarageBand's. But few apps combine keyboards with guitars in one app, and most do not provide as much flexibility as GarageBand. Some professional applications such as Logic Pro provide users with a massive library of sound effects and huge freedom to manipulate music, but they usually cost a lot of money and storage space. Logic Pro X is powerful but costs $199 on OS X, and there is no iOS version. By contrast, GarageBand cost me just ¥30 (approximately $5) five years ago, and now it is free for all iPad users!

4. Constraints

The iOS version of GarageBand has many constraints.

First of all, the app size is limited by the maximum size for iOS apps—4 GB. The limit was set by Apple and was increased from 2 GB to 4 GB in 2015[3]. The current iOS version of GarageBand is 1.28 GB, which leaves users enough space to store their projects.

Second, the interface area of GarageBand is restricted by the physical size of the iPad's touchscreen. The most common iPad sizes are 7.9 inches (iPad mini) at 2048 × 1536 resolution, 9.7 inches at 2048 × 1536, and 12.9 inches at 2732 × 2048. These screens are bigger than a cell phone's but smaller than a laptop's, so they require different designs. Everything must fit on the touchscreen. That is one of the reasons sections were designed; if we had a screen two meters long, maybe we could work without sections.

Furthermore, many musical instruments are physically long, such as the piano, guitar, and erhu. A common piano has 88 keys, and a common guitar has 18 frets. How can they fit on a small screen? The designers of GarageBand have many good ideas. For keyboards, for example, the default setting shows two octaves from C2 to C4. You can scroll the keyboard to the left or right to play higher or lower pitches—10 octaves in all. There is also a double-row mode with which you can play four octaves on the screen, as shown below.

[Image: the double-row keyboard mode]
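One way to think about this constraint is that the screen only ever shows a two-octave window into a much longer keyboard, and scrolling simply shifts an octave offset. The sketch below is a hypothetical illustration of that idea, not Apple's code.

```python
# A sketch of how a small screen can cover 10 octaves: only a two-octave
# window is shown at a time, and scrolling just shifts an octave offset.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def visible_keys(base_octave, octaves_shown=2):
    """Return the note names currently on screen, e.g. C2..C4 by default."""
    keys = [f"{name}{base_octave + o}"
            for o in range(octaves_shown)
            for name in NOTE_NAMES]
    return keys + [f"C{base_octave + octaves_shown}"]   # closing C at the top

print(visible_keys(2)[:3], "...", visible_keys(2)[-1])  # ['C2', 'C#2', 'D2'] ... 'C4'
print(visible_keys(3)[0])                                # after scrolling right: 'C3'
```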

The third constraint is that the gestures used in GarageBand are limited by the capabilities of the touchscreen. Today, the iPad's multi-touch screen is very powerful: it can sense finger pressure and respond accordingly. For example, the harder you tap the virtual drums in GarageBand, the louder the sound will be. But GarageBand cannot respond to finger pressure below the lower limit or above the upper limit of what the touchscreen can detect. Besides simple tapping, it also supports other gestures, such as dragging. The designers of GarageBand have to choose gestures that the iPad supports; otherwise the gestures will fail. Other versions of GarageBand have their own constraints depending on their platforms. For example, the OS X version of GarageBand does not support multi-touch gestures because the MacBook has no touchscreen, but it has a much bigger sound library because the MacBook's processing capacity is more powerful than the iPad's.

5. Does GarageBand fulfill Alan Kay’s vision?

Alan Kay envisioned a universal media machine, with which people can remediate all kinds of media and create their own media with unlimited freedom[4]. Does GarageBand fulfill his vision? I don’t think so.

First of all, GarageBand does not provide a flexible enough programming environment. In fact, it does not provide any programming environment at all. It gives us a library of sound effects and pre-made loops, but it is not easy to create your own, and it does not let you edit the underlying properties of a sound. For example, if I want to edit my voice, there are only nine effects to choose from; I cannot modify the acoustic characteristics as I wish. It is like a coloring book with pre-printed line drawings that you can fill in with colors, but you do not really "create" the art, and it will not improve your creativity either. You are restricted by the line drawings. It produces an illusion of "creativity." Most of the time, when we talk about "creating" music in GarageBand, we are just remixing pre-existing sound effects stored in GarageBand in pre-made ways. Take my "re-creation" of Give Life Back to Music: there is nothing creative in it; all the creativity came from Daft Punk.

Second, GarageBand cannot be used to edit media other than music. It has nothing to do with videos, texts, paintings, and so on. It is not a meta-medium.

However, I think GarageBand to some degree democratizes music. For example, I never succeeded in playing the F chord on the guitar, but I can play it in GarageBand. I cannot sing harmony with myself, but I can record harmonies on different tracks in GarageBand and play them together as if I were singing with myself. I do not know how to write a song, but when I re-create other people's songs in GarageBand, I can learn about arrangement and composition by decomposing them.

6. Conclusion

GarageBand is music software with which amateur musicians can create songs on Apple devices. In this paper, I discussed the design principles in the iPad version of GarageBand, such as modularity, affordance, and constraint. In particular, I argue that GarageBand does not fulfill Alan Kay's vision of a meta-medium, but it does simplify the process of creating music for amateur musicians.


References

[1] Lidwell, William, Kritina Holden, and Jill Butler. Universal Principles of Design. Gloucester, Mass: Rockport, 2003.

[2] Norman, Donald. The Design of Everyday Things. Basic Books, 2002. http://proquestcombo.safaribooksonline.com.proxy.library.georgetown.edu/9780465003945.

[3] Kumparak, Greg. “iOS Apps Can Now Be Twice As Big.” TechCrunch. Accessed December 18, 2016. http://social.techcrunch.com/2015/02/12/ios-app-size-limit/.

[4] Manovich, Lev. Software Takes Command. International Texts in Critical Media Aesthetics, volume#5. New York ; London: Bloomsbury, 2013.

Wikipedia on the Mobile Web: a Case Study – Jieshu

I will use Wikipedia in a web browser—Safari on my iPhone—to explore the systems behind the interfaces.

Thanks to my browsing history stored on my iPhone, I do not need to type in the complete URL https://en.m.wikipedia.org/wiki/Main_Page . After I typed "wi", Safari automatically filled the address bar with the whole URL. So the address bar of Safari is an interface to the history database on my iPhone.


The URL of Wikipedia is also an interface to many systems. For example, the "https" in the URL means the page is served over HTTP, the protocol of the World Wide Web, through an encrypted connection[i]—an interface to a complex architecture of protocols and resources linked by hypertext. The "m" means the URL points to the mobile version of the page; in this case, the mobile version is generated by the MobileFrontend extension, a MediaWiki tool that provides mobile-friendly views[ii]. The "wikipedia" in the URL is the domain name, which was registered with Network Solutions in January 2001[iii]. The "org" is a top-level domain used by organizations.
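The same decomposition of the URL can be reproduced with Python's standard library, as a stand-in for what the browser does internally:

```python
from urllib.parse import urlsplit

# Splitting the Wikipedia URL into the parts discussed above.
url = "https://en.m.wikipedia.org/wiki/Main_Page"
parts = urlsplit(url)

print(parts.scheme)                  # 'https' -> HTTP over an encrypted connection
print(parts.netloc)                  # 'en.m.wikipedia.org'
print(parts.netloc.split("."))       # ['en', 'm', 'wikipedia', 'org']
print(parts.path)                    # '/wiki/Main_Page'
```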

After I confirmed the URL, Safari sent the domain name to a DNS resolver run by my Internet provider, Xfinity, which translated it into an IP address: 208.80.153.224. (However, I could not visit Wikipedia through this IP address directly—probably because the server needs the host name in the request to know which of the many sites it hosts to serve. I was shut out of this black box…)
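The name-to-address translation itself can be reproduced in one line with Python's standard library; the address returned today may differ from the one above, since Wikipedia's servers and load balancers change over time.

```python
import socket

# Ask the system's DNS resolver for Wikipedia's IP address.
ip = socket.gethostbyname("en.wikipedia.org")
print(ip)   # e.g. '208.80.153.224'
```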


Using this string of numbers, Safari handed the request to the site's server and waited in a queue for its turn[i]. In less than a second, the page was completely loaded.

[Screenshot: Wikipedia's mobile main page loaded in Safari]

As we can see from the screenshot above, right below the address bar there is an area called a Smart App Banner, an interface prompting me to open Wikipedia's mobile app because Safari detected that the app was installed on my iPhone[iv]. Below the banner are a navigation button, a search bar, today's featured article, and entries that appeared in today's news. These sections are presented graphically in Safari through HTML code—specifically HTML5, which allows content to adapt to different mobile devices.

Below is an image of the HTML code of Wikipedia's main page; the screenshot was made on my laptop since Safari on the iPhone does not let users view page source. The HTML file is an interface to a large system of media resources, including texts, images, sounds, and videos. For example, the logo of Star Trek: First Contact on the upper left is a "hyper image"—an image with a hyperlink. It is specified in the red box in the HTML file, which defines its source in the database (the blue characters), its size (192 × 78 pixels), its location on the page, and its link target (/wiki/File:Star_Trek_First_Contact_logo.jpg), a high-definition version of the logo.

[Screenshot: the HTML source of Wikipedia's main page, viewed on a laptop]
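The HTML "interface" can also be read programmatically rather than visually. The short sketch below lists every image source on the mobile main page, which would include the featured-article logo discussed above (the User-Agent string is just a label I made up for the request):

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class ImageLister(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # print the source and declared size of every <img> element
        if tag == "img":
            a = dict(attrs)
            print(a.get("src"), a.get("width"), a.get("height"))

req = Request("https://en.m.wikipedia.org/wiki/Main_Page",
              headers={"User-Agent": "interface-demo/0.1"})
html = urlopen(req).read().decode("utf-8")
ImageLister().feed(html)
```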

There are also many hypertext links, colored blue, through which you can access other Wikipedia entries. The links in purple are the ones I have clicked before.

Wikipedia is also an interface to the rest of the World Wide Web. Every entry contains links to its information sources. For example, the entry for Star Trek: First Contact has 121 references, each with a hyperlink to a web page outside Wikipedia.

If you have a Wikipedia account, you can log in by pressing the "log in" option that appears after you tap the navigation button. Two interfaces are involved here. One is the auto-filled user name, an interface to the keychain stored in my iCloud. The other is the login itself: by logging into my account, I am interacting with Wikipedia's user management system, which constantly monitors and remembers my actions on Wikipedia, such as changing my profile and editing entries.


In addition, Wikipedia also serves as a system accessible through other interfaces. For example, the interface of Youdao Dictionary on my MacBook has an area for Wikipedia's featured articles. Since today's article is Star Trek, the logo of the film and a brief introduction are shown. By clicking the blue hypertext, you can access the Wikipedia entry for Star Trek. This is possible thanks to Wikipedia's API, another interface, designed for developers to draw data from Wikipedia.


Wikipedia featured article shown in the interface of Youdao Dictionary.
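For illustration, here is a minimal sketch of pulling data through Wikipedia's API with Python's standard library. I do not know which fields Youdao Dictionary actually requests; this example simply fetches the plain-text introduction of an article, using the MediaWiki query API as I understand it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Ask the MediaWiki API for the plain-text intro of one article.
params = urlencode({
    "action": "query",
    "prop": "extracts",
    "exintro": 1,
    "explaintext": 1,
    "format": "json",
    "titles": "Star Trek: First Contact",
})
req = Request("https://en.wikipedia.org/w/api.php?" + params,
              headers={"User-Agent": "interface-demo/0.1"})
data = json.loads(urlopen(req).read().decode("utf-8"))

for page in data["query"]["pages"].values():
    print(page["title"])
    print(page["extract"][:200], "...")
```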

Some discussion of the "apps vs. the web" debate

I do not think apps will replace the web in the future. I have a Wikipedia app installed on my iPhone, but I prefer the web version because I can open a hyperlink in a new page and thus go back to any of my previous pages. In the Wikipedia app, my route is completely linear, and it is very easy to get lost. However, apps have their merits, especially for services. The web is like an open square, while an app is like a luxury spa. The web will continue to thrive on information discovery, while apps will keep growing in service provision, such as e-commerce and food delivery. I think they will coexist for a long time, occupying different niches.


References

[i] White, Ron, and Timothy Edward Downs. How Computers Work. 7th ed. Indianapolis, IN: Que, 2004.

[ii] “Extension:MobileFrontend.” MediaWiki, n.d. https://www.mediawiki.org/wiki/Extension:MobileFrontend.

[iii] “History of Wikipedia.” Wikipedia, November 19, 2016. https://en.wikipedia.org/w/index.php?title=History_of_Wikipedia&oldid=750413918.

[iv] Austin, Alex. “How to Set Up An iOS and Android Smart App Banner.” Accessed November 22, 2016. https://blog.branch.io/how-to-setup-an-ios-and-android-smart-app-banner-with-deep-linking-and-download-tracking.

The Internet: The Most Complex System – Jieshu

What does it mean to be on the Internet? Ten years ago, my answer would be “connecting to a modem or a hub using a cable.” Several days ago, my answer would be “connecting to wireless signals.” After this week’s reading, I would say: “First, you need to distinguish ‘on the internet’ from ‘in the internet.’” Here I will try to explain what it means to be on the Internet.

As I said, first we need to reiterate that as end users of the Internet, we are actually "on" or "attached to" the Internet rather than "in" it. According to van Schewick in her Internet Architecture and Innovation, computers on the Internet are those that "support users and run application programs[i]", such as our PCs, which make up the "edge" of the Internet. In contrast, computers that establish connectivity among the computers attached to the Internet and that form or implement the network are seen as being "in" the Internet, such as the "cable modem termination system operated by a cable provider." So, being on the Internet partially means using computers that "interface directly with the end users[ii]".

Second, since the Internet is a modular system organized in layers, being on the Internet also means interacting with user interfaces while the details of those modules and layers stay black-boxed. For designers, modularity reduces complexity while layering increases modifiability[i]. For end users like us, modularity and layering mean that we can surf the Internet without professional training. An example is my experience as a webmaster without specialized knowledge of the Internet. When I was in college, I had a part-time job as the webmaster for one dormitory building. My job was to ensure everyone in the building was able to connect to the Internet. If a student was offline, I had to find the problem and fix it. Here was my routine when an outage was reported. First I would check whether the cable was correctly connected between the computer and the hub. Then I would ping the gateway to see whether the computer could reach it. Most of the time, the problem was an IP address conflict, because my university assigned a static IP address to each student, and students sometimes mistakenly used other people's addresses. If an IP conflict was the case, I would log in to the host to check the physical address of the computer that was using the IP address in question, then log in to the student information system to see who owned that computer, and call him or her to change the IP address. In retrospect, it is funny that I could do my job well without specialized knowledge of the Internet, which was made possible largely by the black-boxing that comes with modularity and layering. The graphical user interfaces of the computers, the hosts, and the student information system enabled me to interact easily with their huge and complex inner structures.
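Just to illustrate how little of the underlying system I needed to see, the first step of that routine—pinging the gateway—can be written as a few lines of Python on macOS or Linux. The gateway address here is a common default, not my university's:

```python
import subprocess

# Step one of the old troubleshooting routine: can we reach the gateway?
gateway = "192.168.1.1"
result = subprocess.run(["ping", "-c", "3", gateway],
                        capture_output=True, text=True)

if result.returncode == 0:
    print("Gateway reachable -- the problem is probably elsewhere.")
else:
    print("Gateway unreachable -- check the cable or the IP configuration.")
```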

Being on the Internet also means sending and receiving data in the form of packets[iii]. Packet switching is a method of sending messages in units of 1,500 bytes or less[ii], which increases the robustness of the Internet. For example, a web page is made up of many different media objects, such as texts, hyperlinks, images, music, and videos. When I open a web page in my browser, the media objects arrive divided into many small packets. As Abelson, Ledeen, and Lewis note in Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion, "the packets that constitute a message need not travel through the Internet following the same route, nor arrive in the same order in which they were sent." Sometimes errors occur, such as an image that fails to load and leaves a blank space while the other objects remain intact. You can leave it alone and just read the text on the page, or you can reload the image, and it will arrive soon through different routes without reloading the whole page.
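The idea can be illustrated with a toy sketch: a message is cut into chunks of at most 1,500 bytes, the chunks may arrive out of order, and the receiver reassembles them from their sequence numbers. This is only a cartoon of packet switching, not a real network stack.

```python
import random

MTU = 1500  # maximum size of one packet's payload, in bytes

def to_packets(message: bytes):
    # cut the message into numbered chunks of at most MTU bytes
    return [(seq, message[i:i + MTU])
            for seq, i in enumerate(range(0, len(message), MTU))]

def reassemble(packets):
    # put the chunks back in order using their sequence numbers
    return b"".join(payload for seq, payload in sorted(packets))

message = ("A web page is made of many media objects. " * 200).encode("utf-8")
packets = to_packets(message)
random.shuffle(packets)                     # packets need not arrive in order
assert reassemble(packets) == message
print(f"{len(packets)} packets, message intact")
```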

In the dimension of time, being on the Internet is a cross-section of an inevitable yet accidental historical process. Many visionary pioneers envisaged things like the Internet. As we read in past weeks, Vannevar Bush's memex, Licklider's man-computer symbiosis, Engelbart's hypertext, the invention of e-mail[iv]—all paved the way to our ubiquitous networking. Reading the history of the Internet reminds me of the old days when the Internet was a luxury. In 1994, my father brought home two old, malfunctioning computers from Singapore and fixed them—probably the first computers ever to appear in my hometown. He also had a bunch of five-inch floppy disks. Three years later, Internet cafés started to appear in our little town, where you could spend two dollars for one hour on the Internet. I remember spending much of my pocket money in one of them to collect information for a course paper about John Rabe's diary during the Nanjing Massacre in WWII. I stored texts and images on my 3.5-inch floppy disk, which had only 1.44 MB of space—unbelievably small by today's standards, not enough for a single photo shot with my iPhone!

Who would have thought that two decades later there would be something called the Great Firewall of China (GFW)? It demonstrates another aspect of "being on the Internet"—politics. It may sound ridiculous to Western society that people in China have no access to Google, Facebook, Twitter, Instagram, YouTube, Wikipedia, the New York Times, and many other websites, including some academic ones, because they might "jeopardize the country's traditional values and political stance[v]". The Internet in China is jokingly called "the biggest LAN in the world". I cannot even log in to my Georgetown email account in China; I have to purchase a VPN to do that. Most Chinese netizens do not have VPNs, so they use Baidu, a Chinese search engine, instead of Google. The difference between Baidu and Google is obvious. Baidu not only censors information politically but also ranks search results according to paid bids, which are Baidu's biggest source of revenue. In Baidu's search results, it is very difficult to tell the difference between a normal web page and an advertisement. Earlier this year, a 21-year-old college student called Zexi Wei died after an experimental treatment for synovial sarcoma at a hospital he learned of from a promoted result on Baidu[vi]. His death revealed another aspect of the Internet—ethics. The packets constantly being sent and received by our computers have so much power over our lives that they can make some people billionaires while leaving others as poor as church mice, and can even decide life or death.

These facts keep reminding me of the complexity of the Internet, a social-technical-political-economic system[iii] that impacts everybody’s life.


References

[i] Schewick, Barbara van. 2010. “Internet Design Principles.” In Internet Architecture and Innovation, New edition. Cambridge, MA: The MIT Press.

[ii] Abelson, Harold, Ken Ledeen, and Harry R. Lewis. 2008. “The Internet as System and Spirit.” In Blown to Bits: Your Life, Liberty, and Happiness after the Digital Explosion. Upper Saddle River, NJ: Addison-Wesley.

[iii] Irvine, Martin. n.d. “Introducing Internet Design Principles and Architecture: Why Learn This?”

[iv] Campbell-Kelly, Martin, William Aspray, Nathan Ensmenger, and Jeffrey R. Yost. 2013. Computer: A History of the Information Machine. 3 edition. Boulder, CO: Westview Press.

[v] “How to Access Websites from China.” 2016. VPN Critic. November 4. https://vpncritic.com/how-to-access-websites-from-china/.

[vi] “Death of Wei Zexi.” 2016. Wikipedia. https://en.wikipedia.org/w/index.php?title=Death_of_Wei_Zexi&oldid=738455235.

Computing Devices as Metamedia -Jieshu

Most of our contemporary computing devices are metamedia, according to Kay and Manovich. Kay and Goldberg called the computer “a metamedium” whose content is “a wide range of already-existing and not-yet-invented media.” From this week’s reading, I identify some reasons.

First, they can be used to represent other media[i]. PCs, smart phones, and tablets are able to represent images, videos, music, books, and other media that are sampled and discretized into numbers.

Second, modern computing devices can edit, combine, and augment other media "with many new properties", as Manovich notes in Software Takes Command. For example, iMovie on my MacBook can be used to edit videos: you can insert still images, add music or other audio tracks, and key out green screens in the video frames, enhancing the collective performance of the individual media. Another example is an image-editing app called Prisma, with which you can render your picture in the style of famous artworks (as shown below). According to one of Manovich's propositions in New Media: Eight Propositions, the iMovie functions I mentioned above could be done by humans manually, but at a much slower pace—for example, painting shapes onto film by hand and cutting away the remainder with scissors[ii].


A photo rendered with the style of The Great Wave off Kanagawa by Hokusai.

Third, metamedia can create new media that do not exist in the past. For example, computer games are a new genre of medium that emerged from modern computing devices. There’s no counterpart of computer games before the information age. One way to create new media with metamedia is hybridization, mentioned by Manovich in his Software Takes Command, which creatively fuses different media together. In order to design a computer game, game designers need to use specialized computing devices to combine digital 3D models, photography, film, scene design, storytelling, history, music, artworks, and other media together.

Furthermore, as Manovich proposes in Software Takes Command, computing devices have the potential to generate "new media tools". For example, computers can be used to develop new media software and algorithms. A perfect example is Kay's Smalltalk, which was designed to allow users to develop their own software. Musicians used Smalltalk to develop a system called OPUS that could convert the sounds of a keyboard into a digital score, and a seventh-grade girl who had never coded before even made a drawing system with Smalltalk[iii].

The Transition of Computing to “Better Democracy”

Kay's vision was to transform the "universal Turing machine" into a "universal media machine[i]." This transition in the concept of computing allows for "better democracy[ii]", as Manovich put it. In other words, it enables average people to manipulate media much more easily and cost-effectively, without professional training. From then on, companies tried to build personal computing devices with graphical interfaces.

As Kay proposed in A Personal Computer for Children of All Ages, the target price of a Dynabook was $294 in 1972[iv], approximately $1,675 in today's dollars. Thanks to Moore's Law, we can spend much less than that to get a good computer with powerful media-processing abilities today.

Differences Between Two Kinds of Media

In our digital world, there is a lot of media content. Some of it is captured digitally from continuous sources (part of our media and technical mediation continuum), such as JPEG files of digital photos and MP3 files of live music. Some is created entirely in software environments, which is new (specific to computation and digital media), such as images drawn in Photoshop from scratch and music generated by AI. These two categories of media differ in many ways.

First of all, they are generated differently by definition. Media captured digitally are generated through sampling and discretization of analog signals, while media "born digitally" are generated by algorithms—more precisely, through the collaboration of humans and algorithms. Thus, continuous media always have a source in the real world, while media "born digitally" do not necessarily have one. For example, the digital photo of Lenna was scanned from a magazine, so it has a source in the real world—the printed photo in the magazine. By contrast, an image produced in a software environment does not need a source in the real world. Even if it indexes an object in the real world—e.g., a caricature of a real person—its resemblance to its object may vary significantly.


Left panel: a digital photo of astronaut Claude Nicollier repairing the Hubble Telescope (Source: NASA). Right panel: an image depicting the same event created using software. (Credit: Jieshu Wang)

Second, the resolution of continuous media is limited by the devices that capture them and the methods used to sample and digitize them, but media "born digitally" are not tied to any original resolution. For example, Lenna's image was sampled at 512 × 512 pixels. That is to say, if you zoom in on the photo, it becomes fuzzier and fuzzier until individual square pixels are visible. Things are different with images generated by algorithms. For example, an iPad app named Frax can generate fractal images that can be zoomed in and out vastly without any decline in resolution.


Lenna’s image gets fuzzy while being zoomed in.

Video: Zooming in and zooming out images in Frax do not cause a decrease in resolution.
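The underlying reason is that such images are computed from a formula rather than stored as pixels. The tiny escape-time fractal below—generic Mandelbrot code, unrelated to Frax's own algorithm—can be rendered at any zoom level and any pixel count without becoming blocky:

```python
# A tiny escape-time fractal: because the image is computed from a formula,
# any zoom level can be rendered at any resolution with full detail.
def mandelbrot(center, width, pixels, max_iter=60):
    """Render a square view of the Mandelbrot set as rows of characters."""
    cx, cy = center
    step = width / pixels
    rows = []
    for j in range(pixels):
        row = ""
        for i in range(pixels):
            c = complex(cx + (i - pixels / 2) * step,
                        cy + (j - pixels / 2) * step)
            z, n = 0j, 0
            while abs(z) <= 2 and n < max_iter:
                z = z * z + c
                n += 1
            row += "#" if n == max_iter else " "
        rows.append(row)
    return "\n".join(rows)

print(mandelbrot(center=(-0.5, 0.0), width=3.0, pixels=40))       # the whole set
print(mandelbrot(center=(-0.745, 0.113), width=0.01, pixels=40))  # a deep zoom, same detail
```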

Third, continuous media cannot be produced automatically, while media "born digitally" can. For example, as Manovich mentions in New Media: Eight Propositions, 3D non-player characters (NPCs) in computer games move and speak under software control[ii]. In Assassin's Creed, for instance, NPCs are generated randomly and respond to your behavior according to algorithms: running into an NPC raises your notoriety, and walking down the street with high notoriety draws the attention of nearby NPCs and enemies, which might trigger combat. Another example: music captured digitally has to be recorded in a physical concert hall or recording studio, but music generated by algorithms can be composed and produced automatically, without human interference, as long as the necessary models and variables are provided.

Video: This piece of music is produced by Google’s Magenta program, which is designed to use machine learning systems to create art and music systems.
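To make the NPC example concrete, here is a toy, rule-driven sketch of the kind of behavior described above. It is inspired by, but in no way taken from, Assassin's Creed; the thresholds and probabilities are invented.

```python
import random

notoriety = 0

def bump_into_npc():
    # colliding with a passer-by raises the player's notoriety
    global notoriety
    notoriety += 1

def npc_reaction():
    # an NPC generated by the game reacts according to a simple rule
    if notoriety >= 3 and random.random() < 0.5:
        return "attacks you"
    return "ignores you"

for _ in range(4):
    bump_into_npc()
print(npc_reaction())
```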

There is no absolute boundary between them

However, I don’t think there exists an absolute boundary between the two. They are overlapping bands on a continuous spectrum. Here are my reasons.

First of all, they are both sampled and then discretized signals. While continuous media are obviously sampled from analog signals, media “born digitally” can be seen as samplings of continuous algorithms.
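A minimal illustration of sampling and discretization: a continuous function (here a 440 Hz sine wave standing in for an analog signal) is reduced to a finite list of rounded numbers by measuring it at a fixed rate.

```python
import math

sample_rate = 8000          # samples per second
duration = 0.001            # one millisecond of "sound"
freq = 440.0                # concert A

# measure the continuous wave at discrete times and round each measurement
samples = [round(math.sin(2 * math.pi * freq * (n / sample_rate)), 3)
           for n in range(int(sample_rate * duration))]

print(samples)              # 8 discrete numbers standing in for a continuous wave
```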

Second, they both need decoding equipment to convert them into perceptible signals so that human users can interpret them.

Third, media generated entirely by software, with no human involvement, do not exist. Media "born digitally" also need human involvement: software has to be designed by programmers, and the rules it uses to generate media must follow people's social conventions and mental models for media. Moreover, many software-generated media use digitized analog media as building blocks, or they try hard to imitate the effects of analog media. For example, Google Earth uses digitized satellite images to build its 3D maps. GarageBand allows users to choose from sound effects that closely imitate the timbres of real musical instruments. Many Photoshop filters aim to reproduce real painting brushes, even though they can also produce effects that do not exist in real life, as Manovich notes in Software Takes Command[i].

Finally, media "born digitally" can also be sampled using the same methods used to sample analog media: MIDI files can be converted into MP3s, and Frax images can be exported as JPEGs. As Manovich says in Software Takes Command, the newness of new media "lies not in the content but in the software tools used to create, edit, view, distribute, and share this content." Therefore, as long as the two kinds of media can be processed and distributed with the same software, they are unified. Both continuous media and media "born digitally" can be seen as new media.


References

[i] Manovich, Lev. 2013. Software Takes Command. International Texts in Critical Media Aesthetics, volume#5. New York ; London: Bloomsbury.

[ii] Manovich, Lev. 2002. “New Media: Eight Propositions.” In The New Media Reader, edited by Noah Wardrip-Fruin and Nick Montfort. The MIT Press.

[iii] Kay, Alan, and Adele Goldberg. 1977. “Personal Dynamic Media.” Edited by Noah Wardrip-Fruin and Nick Montfort. Computer 10 (3): 31–41.

[iv] Kay, Alan. 1972. “A Personal Computer for Children of All Ages.” Palo Alto, Xerox PARC.

The Transformation of Computation is Still a Long Way to Go – Jieshu

The histories of computation in this week's reading are fascinating. I got a glimpse into the days when the concepts of the personal computer and the network were just starting to form in those great minds, including Alan Kay, Vannevar Bush, and Douglas Engelbart. They contributed a lot to the transformation of computation from the context of the military and government to our daily life, which in turn changed the whole world.

1. Conceptual Transformations

Although today's personal computers and the Internet are taken for granted by many of us, they were very difficult for those early pioneers to conceive. Like many other twentieth-century innovations, "general purpose" computation sprouted from military, government, and business applications. Here, I try to identify some conceptual transformations from this week's reading.

  1. In 1945, in his article As We May Think, published in The Atlantic, Vannevar Bush emphasized the importance of the continuity of scientific records. He then envisioned a hypothetical system called the memex that would be used to store, search, trail, and retrieve information in the form of microfilms[i].
  2. In the 1960s, J. C. R. Licklider proposed a man-computer symbiosis in which humans and computers would interact organically[ii]. He even foresaw applications like video conferencing and virtual intelligent assistants[iii].
  3. Influenced by Bush and funded by Licklider, in the 1960s Doug Engelbart proposed a framework called H-LAM/T and tried to use networks of computers to augment human intelligence, in contrast to the contemporary school of artificial intelligence that tried to replace human intelligence with computers[iv].
  4. At PARC, Alan Kay envisioned a future where computational devices were used as "personal dynamic media" that could enable everyone to handle their own "information-related needs". He wanted to turn the "universal Turing Machine into Universal Media Machine[v]."

2. The New Design that is Old

They also designed many devices and systems definitely ahead of their time, such as the Dynabook by Alan Kay, the memex and hypertext by Bush, and the mouse and computer networks by Engelbart. What amazed me is that many applications and features I considered novel stem from the brainchildren of those pioneers—something I never knew before. For example, in the iPad app Sketches, you can use an Apple Pencil to draw lines that automatically align to form rectangles; you can also rotate them and change their sizes. It is very much like the Sketchpad system and light pen developed by Ivan Sutherland in 1963, although more features are available, such as colors and brushes.

3. Computer as a Metamedium

With the development of computing power, those pioneers' visions gradually came true. In Software Takes Command, Lev Manovich writes that the computer became a "metamedium" whose content was "a wide range of already-existing and not-yet-invented media[v]." This is exactly what characterizes digital media as new media.

In the 1950s and 1960s, people were not that interested in sharing information with computers, because they already had a lot of media, such as TV, photography, and print[iv]. But digital media do not merely imitate what conventional media do; they enable us to create our own media. For example, the iPad app GarageBand can not only imitate real musical instruments like guitars and drums but also record any sound and use it as a new tone to play music.


The Sampler in GarageBand enables you to record any sound and use it as a new tone

4. A Long Way to Go

The two apps I mentioned are both on the path initiated decades ago by those pioneers. However, their visions have not been completely realized. Here are some examples.

  1. Kay envisioned a system in which everyone, including children, could build their own media by programming. But recent computing devices are drifting away from this vision. For example, Apple's products are criticized for being closed and not programming-friendly, yet they are embraced by many consumers. In my view, this may not be a bad thing, because there are increasingly many applications with which you can create your own media without programming, such as iMovie and GarageBand.
  2. Another example is virtual assistants such as Siri, which have much in common with OLIVER (on-line interactive vicarious expediter and responder), proposed by Oliver Selfridge and mentioned by Licklider in The Computer as Communication Device in 1968[vi]. OLIVER was described as being able to improve itself by learning from its experience in your service. This is exactly what artificial intelligence researchers are working on, but have not yet done well, today.
  3. In the "mother of all demos", Engelbart presented a graphic road map that included to-do lists and shopping lists. Even today, I cannot find an application that does this.
  4. Engelbart built a timesharing system called NLS that could be shared by hundreds of users. It surprises me how late this kind of application appeared—and even today, such applications do not fulfill Engelbart's vision. I was once a member of an online cooperative team of over 100 members, and we could not find an appropriate application that allowed us to read and edit the same file at the same time. In 2014, the best app we could find was Youdao Cloud Notebook, but we frustratingly found that files were always overwritten by other people, wasting a lot of our time. Last year we found a better app called Quip, but it still had many problems—not to mention how far it was from the "knowledge navigation and collaboration tool" Engelbart imagined.

References

[i] Vannevar, Bush. 1945. “As We May Think.” Atlantic, July.

[ii] Licklider, J. C. R. 1960. “Man-Computer Symbiosis.” IRE Transactions on Human Factors in Electronics HFE-1 (1): 4–11. doi:10.1109/THFE2.1960.4503259.

[iii] Licklider, J. C. R. 1968. “The Computer as Communication Device.”

[iv] “CHM Fellow Douglas C. Engelbart | Computer History Museum.” 2016. Accessed October 31. http://www.computerhistory.org/atchm/chm-fellow-douglas-c-engelbart/.

[v] Manovich, Lev. 2013. Software Takes Command. International Texts in Critical Media Aesthetics, volume#5. New York ; London: Bloomsbury.

[vi] Licklider, J. C. R. 1968. “The Computer as Communication Device.”

Call Me a Coder: Some Thoughts about Coding – Jieshu

Learning Python on Codecademy was so interesting that I nearly forgot there was a blog post to write. Through learning the basics of Python, I got a glimpse into how a programming language is designed to specify symbols that mean things and symbols that do things. I also learned why it is inevitable to use programming languages, rather than natural languages, to interact with machines if we want them to complete specific tasks.

In Python, the symbols that mean things include variables and strings to which users can assign meanings. For example, we can assign the name of a girl called "Alice" to a variable with code like girl_name = "Alice" and retrieve the second letter of the name on the console with code like print girl_name[1]. But machines do not know the meaning of Alice. They do not know who Alice is; they do not even know whether Alice is the name of a person or a dog. Moreover, we can assign "Alice" to any variable name other than girl_name, and it makes no difference to the computer. Alice and Bob are indistinguishable to machines, differing only in the order of their letters, while in human eyes they are the names of a girl and a boy who have their own stories. Machines follow predetermined and predictable cascades of actions to process and store these symbols—meaningful to humans—in the form of 0s and 1s.
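Here is that example written out, in modern Python 3 syntax (where print is a function rather than a statement, unlike the Codecademy course's Python 2):

```python
girl_name = "Alice"        # a symbol that means something -- to us, not to the machine
print(girl_name[1])        # -> 'l', the second letter (indexing starts at 0)

dog_name = "Alice"         # the machine has no idea these "mean" different things
print(girl_name == dog_name)   # -> True: to the computer they are the same characters
```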

Meanwhile, there are symbols that do things, including the most basic instructions like print and return. The print statement is responsible for displaying whatever follows it on the screen.


print “hello world”

Furthermore, there are symbols that simultaneously mean things and do things. As far as I have learned, a function is one way to combine the two, using symbols that mean things to do things such as calculation and representation.
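A small example of a function that combines the two kinds of symbols—the names mean something to us, and the body does something:

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp_f - 32) * 5.0 / 9

print(fahrenheit_to_celsius(98.6))   # -> 37.0
```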

Here are some thoughts emerging from my learning.

Programming languages have a lot in common with natural languages. For example, both have the kind of tripartite parallel architecture described in Ray Jackendoff's Foundations of Language[i]. In the section on language construction in his Introduction to Computing, David Evans mentioned that the smallest units of meaning in programming languages are called primitives, corresponding to morphemes in natural language. Besides, both have syntax and semantics governed by grammars. Like natural languages, programming languages also have recursive grammars, allowing infinitely many new expressions from a limited set of representations[ii].

Programming languages like Python use many words and abbreviations from natural languages with their original meanings preserved. For example, print means print something on the screen. The three Boolean operators and, or, and not have the same meanings as in English. The def at the head of a function is an abbreviation for "define." Not to mention max(), min(), and abs(), three functions named after commonly used abbreviations. This intuitive naming lowers the learning difficulty.
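A few lines (again my own illustration, in Python 3 syntax) showing how close these names stay to their English meanings:

    print(max(3, 7))            # 7  -- the maximum of the two numbers
    print(min(3, 7))            # 3  -- the minimum
    print(abs(-7))              # 7  -- the absolute value
    print(True and not False)   # True -- the Boolean operators read almost like English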

The contrast between the simplicity of programming languages and the complexity of the tasks they can accomplish really impressed me. I'm not saying that coding is easy to learn; their simplicity lies in the comparison with natural languages. In his Introduction to Computing, David Evans listed reasons why natural languages are not suitable for coding, including complexity, ambiguity, irregularity, lack of economy, and limited means of abstraction[ii]. Compared to natural languages, programming languages are really simple. However, the meanings they are designed to express are complex. Programming languages serve as artifacts onto which we offload and distribute our cognitive effort in order to achieve complex tasks. For example, according to the Wikipedia entry on Python, Wikipedia, Google, CERN, and NASA make use of Python, and Reddit is written entirely in Python. Even machine learning, the most advanced and sophisticated branch of computer science, is rooted in the simple syntax of programming languages.

Although I had never literally coded before, I have been using programming ways of thinking for a long time. That's because so many things we use today are rooted in programming; programming thinking has seeped into our daily life. Here are some examples from my experience:

  1. When I was learning Python on Codecademy, one thing I found interesting was that one way to add a comment is to put three double quotation marks before (and after) it, and that different parts of the code have their own colors. This reminds me of a habit of mine. When I want to add comments to my notes, I put three fullwidth periods "。。。" before the comments and color them blue. The purpose is to tell myself, "These are my comments; do not mistake them for the author's ideas!" Similarly, the purpose of the """ notation in Python is to tell Python, "These are my comments; do not mistake them for code!" (A short example appears after this list.)

Like the “”” notation in Python, I use three fullwidth periods and blue color to notate comments in my notes.

  2. The second example: I operated business accounts on several social media platforms, where I set up many auto-responding rules. For example, if the system receives a message containing "Interstellar" from a user, an article about the physics in the movie Interstellar is automatically sent back. I did this not by typing programming code but by using simple functions that programmers had coded and presented as intuitive graphical interfaces with natural-language instructions, easy for laymen to learn.
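Returning to the first example, here is a minimal sketch of the """ notation. Strictly speaking, a bare triple-quoted string is a string literal rather than a true comment, but Python evaluates it and then simply discards it, so it works like a block comment:

    answer = 42
    """
    This is my comment, not code: Python evaluates this bare triple-quoted
    string and then throws it away, so it behaves like a block comment.
    """
    print(answer)   # the usual single-line comment uses the # character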

What I found difficult about learning Python was remembering the syntax, because it differs from natural language. I wonder: are programmers supposed to remember all of that syntax? Another question is whether it is possible to program in natural language.


References

[i] Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. OUP Oxford.

[ii] Evans, David. 2011. Introduction to Computing: Explorations in Language, Logic, and Machines. United States: CreateSpace Independent Publishing Platform.

Lenna’s Meaning: A Discussion of a Digital Image – Jieshu

The image of Lenna is probably the most transmitted and analyzed digital image in the world, so I think it is a perfect example for discussing the relationship between digital information and symbolic meaning.


Lenna’s Image

In 1973, in order to complete a research paper on image processing, an assistant professor at USC named Alexander Sawchuk scanned a 5.12 × 5.12-inch square of a Playboy centerfold with three analog-to-digital converters[i]. It became the most widely used standard test image thereafter[ii].

1. The Processing and Transmission of Information Are Irrelevant to Meaning

Lenna's image was selected without any specific purpose, demonstrating that digital processing, or at least research on digital processing, is irrelevant to meaning. Sawchuk chose this image because he was tired of the boring test pictures already in his system. Just then, a colleague came by with an issue of Playboy. Attracted, of course, he decided to use Lenna's image from Playboy in his paper. The arbitrariness of the selection shows that the meaning of the image had nothing to do with his research. Even though, in hindsight, it is evident that the image was perfect for testing image-processing algorithms because it mixes different properties very well, such as "light and dark, fuzzy and sharp, detailed and flat[i]", those properties are merely physical attributes of the pixels on the screen when the image is displayed, and they too are irrelevant to its meaning.


Some examples of image processing tests using Lenna’s image. Clockwise from top left: Standard Lena; Lena with a Gaussian blur; Lena converted to polar coordinates; Lena’s edges; Lena spherized, concave; Lena spherized, convex. Source: http://www.cs.cmu.edu

The digital representation of the image is also irrelevant to its meaning. Sawchuk used three analog-to-digital converters in his scanner, responsible for red, green, and blue, respectively. That is to say, each pixel in the scanned image is digitally represented by exactly three numbers[iii].


Each pixel has three numbers representing red, green, and blue respectively.

As we can see from the website of the USC Signal and Image Processing Institute (SIPI), where Sawchuk used to work (shown below), the original image consists of 512 × 512 pixels. Each pixel has three numbers representing the three colors. Each number is 8 bits (1 byte), so each pixel is 3 bytes. In turn, the whole image is 3 × 512 × 512 = 786,432 bytes, that is, 768 KB, as shown in the screenshot below.
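A quick back-of-the-envelope check of that arithmetic in Python (assuming, as above, one byte per color channel and three channels per pixel):

    width = height = 512          # pixels
    bytes_per_pixel = 3           # one byte each for red, green, and blue
    total_bytes = width * height * bytes_per_pixel
    print(total_bytes)            # 786432
    print(total_bytes / 1024)     # 768.0 kilobytes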


A screenshot of the SIPI website including Lenna's picture.

So, basically, the image that we see as a naked girl with a blue feather hat is merely made of 786,432 numbers. In other words, in the digital representation of the image there are only numbers: no naked girl, no hat, no feather, and no symbolic meaning.

2. However, the Meaning Is Preserved, and Extended

If there are only numbers, why do people enjoy talking about the story behind Lenna? Actually, Lenna came to be seen as a symbol of the field of image processing, so important in computer science that she was invited to many academic conferences and was immediately surrounded by enthusiastic fans. Her issue (November 1972) sold over seven million copies, becoming Playboy's best-selling issue ever.

I think the answer lies at three levels.

First, the meaning of the image is preserved "in the physically observable patterns[iv]" of the numbers. According to Paolo Rocchi, "information always has two parts—sign and referent. Meaning is the association between the two[iv]." The associations are stored in our brains. For example, a strong contrast in brightness is perceived by humans as an edge, and when edges form a specific pattern, the pattern is associated with a face.


The red lines show edges that are associated with a human face.
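To illustrate the claim that an edge is just a large change in brightness, here is a minimal sketch (toy numbers of my own, not the actual Lenna data) that finds the dark-to-bright jump in a single row of pixel values:

    import numpy as np

    row = np.array([10, 12, 11, 200, 205, 198], dtype=float)  # dark pixels, then bright pixels
    gradient = np.abs(np.diff(row))    # brightness change between neighboring pixels
    print(gradient)                    # [  2.   1. 189.   5.   7.]
    print(int(np.argmax(gradient)))    # 2 -- the edge sits between the third and fourth pixels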

Sometimes, we don’t need a high resemblance to perceive a pattern as a face.


A hill on Mars, misperceived as a human face due to the pattern caused by strong contrast in brightness.

After we recognize a human face in Lenna's image, a higher level of abstraction (the facial expression, body posture, and accessories) indicates the gender. In this way, we receive the meaning of Lenna's image: a naked girl wearing a feather hat.

Second, what is the meaning of "a naked girl wearing a feather hat"? It means sexual attraction to males. What does it mean to use a sexually attractive image in a highly academic context? It might signal a male-chauvinistic tendency in the academic community, which is why the use of Lenna's image has caused controversy. A high school girl even published an article in the Washington Post discussing the negative impact of Lenna's image on female students who might otherwise pursue computer science[v].

Third, as a standard test image, Lenna's image has been frequently associated with computer science, and over time Lenna became a symbol in the field. In November 1972, people who saw this image would say, "Oh, she is a Playboy Playmate." But in October 2016, people who see this image say, "Oh, this is the famous Lenna. She is somehow important in the history of computer science."

3. Discussion

Although information transmission is irrelevant to human meaning, the design of information transmission is relevant to the human meaning-making process. The reason Sawchuk needed only three converters is rooted in human color perception, which is in turn based on the three types of cone cells in the retina. Each type of cone cell senses a part of the electromagnetic spectrum that is perceived roughly as red, green, or blue. I'm sure the original image on paper reflects infrared light, too, but infrared is outside the human visible spectrum, so information in the infrared band is useless, at least in this context. That's why the RGB system is enough to represent and transmit most of the meaning in images.

Finally, I was wondering: computers can recognize patterns such as faces and houses, too, with the associations stored in algorithms and memory. Does that mean computers are also capable of meaning-making?


References

[i] Hutchinson, Jamie. 2001. "Culture, Communication, and an Information Age Madonna." IEEE Professional Communication Society Newsletter 45 (3).

[ii] “Lenna.” 2016. Wikipedia. https://en.wikipedia.org/w/index.php?title=Lenna&oldid=737697952.

[iii] Prasad, Aditya. 2015. “Ideas: Discrete Images and Image Transforms.” Ideas. September 19. http://adityaarpitha.blogspot.com/2015/09/discrete-images-and-image-transforms.html.

[iv] Denning, Peter J., and Tim Bell. 2012. “The Information Paradox.” American Scientist 100 (6): 470–77.

[v] Zug, Maddie. 2015. “A Centerfold Does Not Belong in the Classroom.” The Washington Post, April 24. https://www.washingtonpost.com/opinions/a-playboy-centerfold-does-not-belong-in-tj-classrooms/2015/04/24/76e87fa4-e47a-11e4-81ea-0649268f729e_story.html?utm_term=.059043a988f1.

Jieshu – Dark Echo: A discussion about Affordance

I will use an iPad game called Dark Echo to exemplify the affordances and constraints I learned about in this week's readings. Dark Echo is one of the most amazing puzzle games I have encountered in years. Basically, it's a two-dimensional escape game. It combines simplicity and complexity, demonstrating many design principles explicitly.

1. Interfaces of Dark Echo

1.1. Click to start

According to Donald Norman, icons on a screen have only perceived affordances, not real affordances, and they don't afford clicking, because you can click anywhere on the screen. Clicking an icon is a cultural convention, a constraint that encourages some actions, such as clicking, while discouraging others, such as sliding[i].


the icon of Dark Echo

The icon of Dark Echo has a mixed affordance:

  • The icon is displayed on an iPad screen, so there is a physical affordance: you can only get feedback by tapping within the area of the iPad's touch screen, which is limited by the physical size of the device.
  • The icon has a cognitive affordance of clicking. In their Distributed Cognition, Representation, and Affordance, Jiajie Zhang and Vimla L. Patel suggested that cognitive affordance is provided by cultural conventions[ii]. According to Norman, a convention is not arbitrary but fits human cognition intelligently[i]. Here I will briefly discuss why the icon cognitively affords clicking.
    • As you can see, the icon is black while the background is gray, forming a clear contrast. In her Inventing the Medium, Janet Murray suggested that if the color or size of an item is different, people expect that different actions will be triggered by clicking or touching it[iii]. So I wondered: if the contrast were small, would the icon still afford clicking? I did two experiments:
      • I changed the wallpaper to black and found that when I tapped my "Game" folder, the folder popped up and turned gray, so the contrast remained strong.
      • I dragged the icon out of the folder and put it on the entirely black wallpaper, and found that I still couldn't help tapping it: even though the boundary of the icon faded into the background, the bright white lines radiating from its center formed a pattern distinct from the background, raising my expectation that it was clickable.
    • The shape of the icon is a rounded rectangle, just like every other icon on the screen. According to Murray, if two items are close together, we assume that they have similar properties and will behave in similar ways, because cultural convention makes us assume that "spatial positioning is meaningful and related to function."

1.2. Right after you start

After you launch the app, the whole screen turns black. Then an image of a headphone appears on the screen.


The icon of a headphone

The image serves as an icon resembling the shape of a real headphone, as well as an index indicating the action of putting on your headphones, because a real headphone affords wearing. Thus, even if you don't read the words under the image, you immediately know that you are advised to put on your headphones in order to get the best game experience (actually, to be scared most thoroughly). It is a cultural convention, a constraint that encourages a specific action: putting on your headphones.

Then the name of the game appears on the screen, with the line "touch to start" under the title.

After touching anywhere on the screen, you enter the level-choosing interface. There are two main levels (Darkness and Light), each with forty sublevels. I have reached the thirteenth sublevel of the Light level, so it automatically shows me where I need to continue.

The string connecting levels for choosing

In the level-choosing interface, sublevels are represented by numbered squares connected by a fine line, forming a string. You can slide the string to the left or right, but not up or down. The squares of locked sublevels contain images of a lock.

Here are some perceived affordances and constraints.

  • The squares of the sublevels afford clicking because their shapes and the numbers inside them highlight them and differentiate them from the black background. Actually, when you tap one of the squares, its outline becomes thicker, just like a physical button being pressed down.

when you touch one square, its line becomes thicker, resembling a physical button being pressed down

  • The line connecting the squares affords sliding in the horizontal direction and restricts actions in the vertical direction. The string stretches from the first to the fortieth sublevel. If you are in the middle, the string extends to the edges of the screen, indicating that there is something more beyond the edge, similar to the swipeable touchscreen slider mentioned in the chapter on affordances in Victor Kaptelinin's contribution to The Encyclopedia of Human-Computer Interaction[iv]. It is also a cultural convention, a constraint that encourages horizontal action, a mental model developed from the one-dimensional extension of strung things such as shell jewelry and knotted cords.
  • The gear icon in the lower-left corner and the icon of two bent arrows afford clicking. Without any instruction, it is crystal clear that the gear is for settings. Another cultural convention.

1.3. When you play

After you enter a sublevel of the game, all you will see is a white icon of a pair of shoe prints in the middle of a totally black background, indicating a dangerous dark room.


The screen affords touching, so basically there is only one thing you can do: touch the screen. A long press means striding in one direction, while a tap means walking gently, avoiding noise that would wake the sleeping monsters. The sound waves of the footsteps are represented by fading white lines stretching out from the shoe prints, echoing and bouncing off the walls, revealing the shape of the dark room.


Just as in real life, the virtual walls you perceive afford stopping, while the virtual paths you perceive afford marching. The puddle you perceive through the sound of water and blue sound-wave lines affords wading. The doors afford opening, and the monsters afford death.

I think one of the most interesting parts of this game is that it vividly instantiates the concept of affordance proposed by James Gibson[iv]. Gibson suggested that there is no need for animals to build a representation of the objective world; the purpose of perception is to gain meaningful information that is important for acting in the environment. By detecting invariants in the array of energy (e.g., ambient light, or sound waves as in Dark Echo), animals can pick up meaningful information about the environment. This information is about affordances, the "action possibilities offered by the environment to the animal." In Dark Echo, you are in a dark room where the lights are out. The only important and meaningful information about the environment lies in the invariants in the ambient sound. When you move around the dark room, you gather the echo information with which to build an internal representation. After that, even if you stop moving and the white lines representing sound waves disappear, you still remember the position of at least the nearest wall.

2. Design Principles Are Not Built-in Properties of Software and Hardware

In Dark Echo, we can see many design principles. Here are some examples:

  • Affordance: the icon of the game, the numbered square for choosing levels, and the gear-shaped icon all afford clicking.
  • Constraint: The horizontal string for choosing sublevels limits the sliding direction.

I understand that these are choices, not simply necessary properties of software and pixel-grid screens. First of all, these principles are not built-in physical properties of the touch screen, because the screen affords an infinite number of touch patterns. Second, they are not necessary properties of software either, because there are "alternative methods that work equally well[i]". For example, the horizontal constraint of the string for level choosing could certainly be replaced by a vertical swipeable slider or a drop-down list. The gear icon for settings could be replaced by the word "settings." The "touch to start" line under the game title could be replaced by an icon of a play button.

3. Designs for Symbolic Expression and Attention Control

In Dark Echo, I think eighty percent of the design is for symbolic expression: for example, the Roman numerals in the squares for level choosing, the shoe prints representing the position of your avatar, and the red radiating lines representing monsters, not to mention the horrible sound effects, especially the scream when you die.


Dying!

However, I doubt that designs for symbolic expression and designs for controlling our attention are mutually exclusive, since designs for symbolic expression also attract our attention.

4. Digression: Distributed Cognition and Democracy

In Distributed Cognition, Representation, and Affordance, the authors mention Hutchins's position that the cognitive properties of a distributed system can be totally different from those of the system's individual components. This prompts me to think about the nature of democracy. Democracy can be seen as a distributed cognitive system whose properties are largely determined by the interactions among its components (e.g., individual persons or organizations). Are the results of a democratic vote always better than those of other political systems? Why do most people consider democracy the best option, better than centralism or constitutional monarchy? How do we organize efficient political, economic, business, educational, scientific-research, and other systems using distributed cognition? These are fascinating questions I'd like to probe.


References

[i] Norman, Donald A. 1999. “Affordance, Conventions, and Design.” Interactions 6 (3): 38–43. doi:10.1145/301153.301168.

[ii] Zhang, Jiajie, and Vimla L. Patel. 2006. "Distributed Cognition, Representation, and Affordance." Pragmatics & Cognition 14 (2): 333–41. doi:10.1075/pc.14.2.12zha.

[iii] Murray, Janet H. 2011. Inventing the Medium: Principles of Interaction Design as a Cultural Practice. Cambridge, US: The MIT Press. http://site.ebrary.com/lib/alltitles/docDetail.action?docID=10520612.

[iv] Kaptelinin, Victor. 2013. "Affordances." In The Encyclopedia of Human-Computer Interaction, 2nd Ed. https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/affordances.

A Simple Sociotechnical Interpretation of Artificial Intelligence – Jieshu

"The price of metaphor is eternal vigilance[i]." –Norbert Wiener

This is a quotation from Professor Irvine's Working with Mediology and Actor Network Theory, and it is also what I would start with if I were to explain a different way of thinking about design to someone who interprets the relationship between technology and society in the fashion of popular literature.

When we try to digest a new thing, metaphor is an oversimplified but useful tool for grasping the basic concepts: for example, the word "digest" I used just now, "black hole" in physics, or "content" in media technology. We don't literally eat something we try to understand, a black hole is not actually a hole, and content is not invariably something inside a container. A black hole in physics is a region of space with a huge mass but a small volume. Content is an interface (or set of interfaces) mediating different components in sociotechnical systems. Metaphor is a black box that conveniently cuts our inquiry short.

Here, I will try to use artificial intelligence (AI) as an example to elaborate how a systems view can help us think differently about future designs. I don't know whether my interpretation is correct, since this week's reading is very abstract, but I'll try to apply the concepts I learned.

When considering the future of AI, there are two opposing camps of thought.

  • One holds that AI will one day outsmart and ultimately destroy us if it is not kept under our control, a position represented by Elon Musk and Stephen Hawking.
  • The other holds that AI will not be smarter than us, and that even if it achieves high intelligence, it will not be an existential risk to us. This camp is mostly represented by computer scientists and engineers who devote all their work to the field. The point of view resembles that of the National Rifle Association, which holds a position of technological neutrality and insists that it is people, not guns, that kill people, a stance discussed both in Bruno Latour's "On Technical Mediation," excerpted from Pandora's Hope[ii], and in Pieter Vermaas et al.'s A Philosophy of Technology.

First of all, I think both camps make the mistake of seeing AI as a mere technical artifact, defined by Pieter Vermaas et al. in A Philosophy of Technology: From Technical Artefacts to Sociotechnical Systems as "physical objects designed by humans that have both a function and a use plan"[iii], instead of as a collective agency in a sociotechnical system[iv]. They interpret AI either as a means to fulfill human expectations and goals or as a natural object with a bunch of functions and structures.

The root of their mistake, I think, lies in the dualistic point of view of seeing society and technology as two separate things. Therefore, in their view, AI belongs to the technological domain, having the potential to impact our social domain.

However, AI is more than that. From a sociotechnical-system point of view, AI is more like an interface that mediates the components of a hybrid system, with people and factors coming from different areas, both technical and social.

AI can be integrated into many systems. Actually, they are woven together seamlessly, serving as both data givers and data takers at the same time, constantly forming new systems. For example, an iPhone user asks Siri for the location of a nearby mall. Here, AI is an interface that combines the user, the iPhone, Siri, and geographic information into a new system that is responsible for what happens next, including the user checking the mall's website, Google Maps suggesting directions, and the user calling an Uber to get there. Another example is the system formed by an AI and its designers. Just like the system formed by a gun and a person, whether the AI will do harm to people is determined not solely by the designers or the AI, but by the new system.

In Artificial Intelligence: A Modern Approach, Stuart Russell takes the dimension of "acting rationally" and sees AI as a rational agent that does the right thing according to the perceived environment[v]. There are two main points here: responding to the environment and doing the right thing. The former requires sensors to perceive the world, and the latter needs powerful algorithms to crunch the data toward optimal decisions. Following the principle of modularity we learned about weeks ago, each part has spawned quite a few branches of inquiry. For example, on the algorithm side, some people prefer supervised learning while others prefer unsupervised learning; both approaches have many smart methods, brilliant proponents, and their own histories. Meanwhile, a lot of standards have been established and shared by people in this community.
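To make those two points concrete, here is a minimal sketch of a perceive-then-act loop. Everything in it (the percepts, the action names, the toy policy) is hypothetical and only illustrates the shape of the idea, not any real AI system:

    def rational_agent(percept, policy):
        """Choose the action the policy scores highest for the given percept."""
        actions = ["turn_left", "turn_right", "go_straight"]
        return max(actions, key=lambda action: policy(percept, action))

    def toy_policy(percept, action):
        """A hand-written stand-in for the 'powerful algorithms' part."""
        if percept == "obstacle_ahead":
            return 1.0 if action == "turn_left" else 0.0
        return 1.0 if action == "go_straight" else 0.0

    print(rational_agent("clear_road", toy_policy))      # go_straight
    print(rational_agent("obstacle_ahead", toy_policy))  # turn_left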

So, AI not only serves as an interface for Siri users but also combines and mediates engineers, scientists, companies, philosophers, smart devices, industries, standards, information, academic journals, conferences, historical events, databases, institutions, and so on: technically, socially, spatially, and temporally.

In system design, we should address two issues: finding the system boundaries and controlling predictability[iii]. As for the question of whether AI will destroy us, I think the scope of the system in question should dynamically include all the stakeholders and factors, with rules and instructions keeping the sociotechnical system running. Laws and standards should also be established to direct these new systems. I am not able to give specific suggestions for future AI, but I believe that its role in the sociotechnical system, and the debates around it, will change as time goes by, because the people discussing these questions and the sociotechnical structure itself are both being continually mediated.


References

[i] Irvine, Martin. n.d. “Working with Mediology and Actor Network Theory: How to De-Blackbox an iPhone.”

[ii] Latour, Bruno. 1999. Pandora’s Hope: Essays on the Reality of Science Studies. Cambridge, Mass: Harvard University Press.

[iii] Vermaas, Pieter E., ed. 2011. A Philosophy of Technology: From Technical Artefacts to Sociotechnical Systems. Synthesis Lectures on Engineers, Technology, and Society, #14. San Rafael, Calif.: Morgan & Claypool Publishers.

[iv] Rammert, Werner. 2008. "Where the Action Is: Distributed Agency Between Humans, Machines, and Programs."

[v] Russell, Stuart J., and Peter Norvig. 2010. Artificial Intelligence: A Modern Approach. 3rd ed. Prentice Hall Series in Artificial Intelligence. Upper Saddle River, N.J: Prentice Hall.

Jieshu Wang – My Personal History of Abacus: From Physical to Mental

I'd like to talk about the abacus, an ancient cognitive artifact that is still widely used in China. I began learning the abacus to do arithmetic when I was six, and interestingly, as time went on, a virtual abacus emerged in my mind every time I needed to calculate, which directly manifests the ideal-material duality of artifacts discussed by John Dewey, Marx, and Hegel[i].

1. Abacus as a Cognitive Artifact

According to Donald A. Norman, "a cognitive artifact is an artificial device designed to maintain, display, or operate upon information in order to serve a representational function[ii]." I will discuss how abaci fit this definition.

The function of abaci is to do elementary arithmetic, specifically, to use beads and their dynamic spatial relationships to represent numbers and their logical relationships. I used to use a 1/4 abacus (one upper bead and four lower beads per rod), which is suited to decimal calculation. It consisted of a wooden frame with beads sliding on vertical rods. Each rod was divided into two parts by a horizontal bar; each bead above the bar represents five, while each lower bead represents one.

The structure of an abacus. Source: www.icespune.com

An abacus can maintain and display information. As long as it stays undisturbed, the number it represents will be recognized by anyone who knows the rules, even though he or she might not know what the number stands for, say, the quantity of cows or the revenue of a pawnshop. Even if the positions of the beads are disturbed (and they are really apt to be messed up), it still maintains and displays the information of the initial state plus the action of the disturbance.

Moreover, abaci can be used to operate on information. Sets of rules must be learned in order to do that. For example, adding seven to a place involves the following rules (a small simulation follows the list):

  • If the place to which you want to add seven has fewer than three lower beads in the up position, move two more lower beads to the up position, and then:
    • If the upper bead is in its default state, i.e., the up position, move it to the low position;
    • If the upper bead is already in the low position, move it back to the up position and add one to the place to the left;
  • If the place to which you want to add seven has at least three lower beads in the up position, move three of them to the low position and add one to the place to the left.
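Here is a small Python simulation of those rules (my own sketch; a place on a 1/4 abacus is represented by how many lower beads, worth one each, are pushed up and whether the upper bead, worth five, is counted):

    def add_seven(lower, upper):
        """Apply the rules above for adding 7 to a single place.

        lower: how many lower beads (worth 1 each) are pushed up, from 0 to 4
        upper: 1 if the upper bead (worth 5) is counted, otherwise 0
        Returns (new_lower, new_upper, carry); the carry is added to the place on the left.
        """
        if lower < 3:
            lower += 2                  # push two more lower beads up
            if upper == 0:
                return lower, 1, 0      # the upper bead was idle: count it (adds five)
            return lower, 0, 1          # the upper bead was counted: release it and carry one
        return lower - 3, upper, 1      # at least three lower beads up: remove three, carry one

    # Sanity check against ordinary arithmetic for every starting digit 0-9.
    for digit in range(10):
        upper, lower = divmod(digit, 5)
        new_lower, new_upper, carry = add_seven(lower, upper)
        assert 10 * carry + 5 * new_upper + new_lower == digit + 7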

These rules sound tricky. A table of pithy formulas, however, is used to memorize them; the entry for adding seven is simply "七去三进一,七上二去五进一" (roughly, "seven: remove three and carry one; seven: add two, remove five, and carry one"). Thus, using an abacus achieves two things:

  • From a system view, the abacus offloads people's cognitive effort, which unaided brain capacity limits in multi-digit and long serial calculations, and ultimately enhances the performance of the aggregate system.
  • From an individual view, the initial task is changed, replaced by a series of more physical tasks as follows:
    • remembering the pithy formula (a process called precomputation by Edwin Hutchins);
    • moving the beads accordingly;
    • and translating the final state of beads into numbers.

2. Distributed Cognition of Abacus

The abacus has a long history in China, dating back to the 2nd century BC. Its shape, configuration, and formulas have also evolved over a long period into their modern state. There are many stories, artworks, and even pieces of music on the theme of the abacus. For example, in the long scroll Along the River During the Qingming Festival, painted by Zhang Zeduan during the Song dynasty, an abacus appears on the counter of a medicine store[iii], as shown in the red circle below. The abacus has become a symbol of accounting in China and part of accountants' cognitive process. In fact, in the absence of electronic calculators, learning the abacus was an essential part of accountant training programs decades ago.

An abacus in Along the River During the Qingming Festival, painted by Zhang Zeduan

So, from a distributed-cognition perspective, the cognitive processes involved in the abacus do not exist only in an individual's mind; they are also distributed across social groups, through time, and within its unique culture.

3. My Mental Abacus

As Hollan, Hutchins, and Kirsh mentioned in their Distributed Cognition: Toward a New Foundation for Human-Computer Interaction Research, the distribution of cognitive processes may involve coordination between internal and external structure[iv]. This reminds me of my mental abacus, an interesting experience of internalizing an external artifact.

I don't know what cognitive processes go on in other people's heads when they calculate mentally, but for me there is a mental abacus in my mind, perhaps due to heavy exposure to abacus training when I was little. That is to say, when I calculate, a vivid, three-dimensional image of an abacus emerges in my head, with all the beads in their default positions. Then I calculate by using a virtual hand to move the virtual beads according to the pithy formulas and physical laws, and then translate the final positions of the virtual beads into numbers. The image can be enhanced by simultaneous actual hand movements. What is stranger is that sometimes I even think my fingers can feel the texture of the virtual beads: plastic, white, lightweight, smooth, and cool like marble, attributes that may be associated with the formation of the mental abacus and whose specific neural circuits are retrieved when the mental abacus emerges.

What is similar to Hutchins's study of navigation aboard US Navy ships is that the cognitive processes required by a task differ from the processes actually used in the task, and just as the navigators feel a bearing as a direction in space relative to their bodies rather than as a number[iv], this cross-modal representation easily goes wrong. If the numbers are too long, my mental abacus easily gets fuzzy.

Nevertheless, I always wonder: if I built a mental slide rule, could I calculate logarithmically?

4. Question

Finally, I want to discuss a question. In his "Cognitive Artifacts" chapter in Designing Interaction, Norman insisted that an artifact cannot change an individual's capacities; rather, it changes the nature of the task performed by the person and, in turn, extends the cognitive capacities of the whole system. I think, however, that this contradicts the co-evolution theory of language and brain in Terrence W. Deacon's The Symbolic Species[v].

According to Cole, language is a cognitive artifact, just like hammers and tables[i]. Meanwhile, drawing on his interdisciplinary study, Deacon stressed the importance of the co-evolution of the human brain with language and symbolic cognition, which then enabled human culture and technologies. This implies that artifacts do enhance individuals' capacities, both cognitively and physiologically, perhaps by rewiring some vital neural circuits; otherwise, co-evolution would not have happened.


References

[i] Cole, Michael. 1996. “On Cognitive Artifacts.” In Cultural Psychology: A Once and Future Discipline. Cambridge, Massachusetts: Harvard University Press.

[ii] Norman, Donald A. 1991. “Cognitive Artifacts.” In Designing Interaction, 17–23. New York: Cambridge University Press.

[iii] Zhou, Raymond. 2014. “Honor the Past, Live in the Present.” China Daily, International Ed., January 4.

[iv] Hollan, James, Edwin Hutchins, and David Kirsh. 2000. “Distributed Cognition: Toward a New Foundation for Human-Computer Interaction Research.” ACM Transactions, Computer-Human Interaction 7 (2): 174–96.

[v] Deacon, Terrence William. 1997. The Symbolic Species: The Co-Evolution of Language and the Brain. 1st ed. New York: W.W. Norton.