Author Archives: Huazhi Qin

Translate Like A Human



— De-blackboxing Google Translate

Huazhi Qin

Abstract

Machines are gradually taking over translation tasks in real life. As machine translation (MT) has developed, different methodologies have been applied to the field, generating multiple distinct translation systems. Rule-based machine translation (RBMT), statistical machine translation (SMT), and neural machine translation (NMT) systems are the three most important. Among them, Google Translate uses Google’s neural machine translation (GNMT) system, one of the state-of-the-art NMT models, to achieve a breakthrough in MT. GNMT is a model integrating four components: recurrent neural networks, long short-term memory, an encoder-decoder architecture, and an attention mechanism. However, the accuracy of Google Translate still faces challenges in terms of its internal translation process and its integration with audio and image input.

 

Introduction

According to Russell et al., machine translation (MT) uses the power of machines to achieve “automatic translation of text from one natural language (the source language) to another (the target language)”. (Russell et al., 2010) As interactions all over the world increase, the demand to overcome language barriers has expanded. Because human translation demands a great deal of effort and time, people have sought help from computers to take over the task. How to improve machines’ performance in translation has become one of the most important topics in computer science.

Since the 1950s, scholars have tried applying different methodologies to machine translation (MT) to bridge the gap between machine and human translation, and they have developed multiple distinct translation systems. Among them, rule-based machine translation (RBMT), statistical machine translation (SMT), and neural machine translation (NMT) systems are the three core systems.

In 2016, Google introduced its updated translation service, Google Translate using the Google Neural Machine Translation (GNMT) system, which marked a great improvement in machine translation. By integrating deep learning technology, Google Translate implemented the “attentional encoder-decoder networks” model, which reduced translation errors by an average of 60% compared to Google’s phrase-based production system. (Wu et al., 2016)

Nevertheless, current machine translation systems are still criticized for their limited accuracy.

 

Rule-Based Machine Translation (RBMT) — Linguistics

1. Translation process

Rule-Based Machine Translation (RBMT) is the oldest approach to making machines translate. Basically, it simulates the process of constructing and deconstructing a sentence based on language-specific rules, following a type of automatic translation process called Bernard Vauquois’ Pyramid (Figure 1). The whole translation process goes through three steps – analysis, transfer, and generation – based on two resources – dictionaries and grammars. The implementation of linguistic rules is its core feature.

Figure 1 Bernard Vauquois’ Pyramid (source: systransoft.com)

According to Evans, a language is composed of primitives (the smallest units of meaning) and means of combination (rules for building new language elements by combining simpler ones). (Evans, 2011) RBMT also focuses on these two elements. To be more specific, the machine first analyzes the grammatical category and links of every word in the source sentence from the morphological, semantic, and syntactic perspectives. (Figure 2) Secondly, every word in the source language is transferred to adequate lexical items in the target language according to dictionaries. At last, the complete target sentence is generated by synthesizing every part from step 2 according to the grammatical rules of the target language.

Figure 2 Analysis of the sentence in the source language (source: systransoft.com)
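To make the three steps concrete, here is a minimal Python sketch of the analysis-transfer-generation pipeline. The three-word dictionary, the category tags, and the single adjective-noun reordering rule are illustrative toys of my own (a real RBMT system encodes thousands of rules), so this is a sketch of the idea rather than a working translator.

    # Toy English-to-French resources; real RBMT systems have vast versions.
    DICTIONARY = {"the": "la", "red": "rouge", "car": "voiture"}
    CATEGORIES = {"the": "DET", "red": "ADJ", "car": "NOUN"}

    def analyze(sentence):
        # Step 1 (analysis): tag each source word with a grammatical category.
        return [(w, CATEGORIES.get(w, "UNK")) for w in sentence.lower().split()]

    def transfer(tagged):
        # Step 2 (transfer): swap each word for its target-language equivalent.
        return [(DICTIONARY.get(w, w), cat) for w, cat in tagged]

    def generate(tagged):
        # Step 3 (generation): apply a target-language grammar rule; French
        # typically places the adjective after the noun (DET ADJ NOUN -> DET NOUN ADJ).
        cats = [cat for _, cat in tagged]
        if cats == ["DET", "ADJ", "NOUN"]:
            tagged = [tagged[0], tagged[2], tagged[1]]
        return " ".join(w for w, _ in tagged)

    print(generate(transfer(analyze("The red car"))))  # -> "la voiture rouge"

Even this toy shows why the rule inventory explodes: gender agreement, verb conjugation, and every idiom would each need more hand-written rules.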

2. Limitations

When RBMT transfers meaning, there are obvious limitations in the following three aspects.

Firstly, the quantity of dictionaries and grammatical rules needed is hard to fulfill, and the manual development of linguistic rules can be costly.

Secondly, RBMT is a largely language-specific system, which means that it often does not generalize to other languages.

Thirdly, it only works for plainly structured sentences and struggles with complicated ones, especially ambiguous and idiomatic texts. Human languages are full of special cases, regional variations, and flat-out rule-breaking. (Geitgey, 2016)

 

Statistical Machine Translation (SMT) – Probability Calculation

1. Translation process

Statistical machine translation (SMT) dominated the field of MT from the 1980s to the 2000s. Unlike RBMT, no linguistic or semantic knowledge is needed in SMT; instead, parallel corpora become the foundation of machine translation. In addition, SMT systems are not specially designed for any specific pair of languages.

Regarding the translation process, SMT applies a statistical model to machine translation and generates translations based on the analysis of bilingual text corpora. (Synced, 2017) The key feature is the introduction of statistics and probability.

There are also three steps in the process: 1) break the original sentence into chunks; 2) list all possible interpretation options for each chunk (Figure 3); 3) generate all possible sentences and find the one with the highest probability. The “highest probability” belongs to the sentence that sounds the “most human”. (Geitgey, 2016)

Figure 3 A large number of possible interpretations (source: medium.com)
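A minimal Python sketch of this ranking might look as follows; the chunk options and all of the probabilities are hand-set toys of mine, standing in for what a real SMT system would learn from bilingual corpora.

    import itertools

    # Step 2: candidate translations for each chunk, with toy translation scores.
    options = [
        {"I": 0.9, "me": 0.1},
        {"want": 0.7, "wish": 0.3},
        {"to go": 0.8, "going": 0.2},
    ]

    # A toy "language model": how human-sounding each full sentence is.
    language_model = {"I want to go": 0.6, "I wish to go": 0.3}

    def score(words):
        # Translation probability times language-model probability.
        p = 1.0
        for opt, w in zip(options, words):
            p *= opt[w]
        return p * language_model.get(" ".join(words), 0.01)

    # Step 3: generate every combination and keep the most probable sentence.
    candidates = itertools.product(*(opt.keys() for opt in options))
    print(" ".join(max(candidates, key=score)))  # -> "I want to go"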

2. Limitations

Although statistical machine translation overcomes many shortcomings of RBMT, it still faces many challenges, especially in terms of sources and human intervention.

As regards sources, although no linguistic rules are required, statistical machine translation requires a great deal of training data in the form of double-translated texts. (Geitgey, 2016) As for human intervention, an SMT system consists of numerous separate sub-components and relies on multiple intermediary steps (Figure 4), which require a lot of work from engineers. (Zhou et al., 2018) Excessive human intervention inevitably influences translation results.

Figure 4 SMT consists of many intermediary steps (source: skynettoday.com)

 

Neural Machine Translation – Google Translate

Neural machine translation (NMT) is generally considered to have been born in 2013, when two scientists applied deep learning neural networks to machine translation and proposed a novel end-to-end encoder-decoder structure. In the next few years, sequence-to-sequence learning using recurrent neural networks (RNN) and long short-term memory (LSTM) was gradually integrated into NMT. (Synced, 2017)

However, NMT systems were criticized for being computationally expensive both in training and in translation inference. NMT systems also lacked practicality in some cases, especially when encountering rare words. (Wu et al., 2016) Thus, the original NMT was rarely put into practice due to its poor performance in translation speed and accuracy.

In 2016, the Google Brain team announced Google’s neural machine translation (GNMT) system, which addressed many of these issues. GNMT helped Google Translate achieve state-of-the-art translation results, reducing translation errors by an average of 60% compared to Google’s previous phrase-based production system. (Wu et al., 2016) Below, I will de-blackbox Google Translate, one of the most advanced applications of NMT, to elaborate how NMT works.

1. De-blackboxing Google Translate

According to the Google Brain team, Google’s neural machine translation (GNMT) system is a model consisting of a deep LSTM network with 8 encoder and 8 decoder layers, using residual connections as well as attention connections from the decoder network to the encoder. (Wu et al., 2016) There are four major features in GNMT: recurrent neural networks, long short-term memory, an encoder-decoder architecture, and an attention mechanism.

A. Recurrent neural network (RNN)

Unlike previous machine translation systems, people understand sentences, contexts, and information based on their understanding of what came before; in other words, human thoughts have persistence. The introduction of the recurrent neural network (RNN) gives machine translation an ability to remember, letting the machine think more like a human. A recurrent neural network contains loops which allow information to persist. (Github, 2015) It also means that previous calculations can influence the results of future outputs.

However, traditional RNNs sometimes face the problem of long-term dependencies, where the machine has to trace further back to narrow down and determine the next word. (Github, 2015) For instance, consider predicting the last word in the text “I was born and grew up in China… I can speak Chinese.” The nearby word “speak” only delivers the clue that the next word is most likely a language; the more distant context “China” is needed to narrow it down to the specific word “Chinese”. In short, the gap between pieces of relevant information becomes wider.
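The loop can be sketched in a few lines of Python. The weights below are random toys (in a real system they are learned), but the sketch shows how the hidden state h carries earlier words forward, and why a very long chain makes distant context hard to preserve.

    import numpy as np

    rng = np.random.default_rng(0)
    W_x = rng.normal(size=(4, 3))  # input-to-hidden weights (toy)
    W_h = rng.normal(size=(4, 4))  # hidden-to-hidden weights (toy)

    def rnn_step(x, h):
        # The new state depends on the current word AND the previous state:
        # this recurrence is the "loop" that lets information persist.
        return np.tanh(W_x @ x + W_h @ h)

    h = np.zeros(4)
    for word_vector in rng.normal(size=(6, 3)):  # a six-word "sentence"
        h = rnn_step(word_vector, h)
    # h now summarizes the whole sequence, but signals passed through many
    # such steps fade, which is the long-term dependency problem.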

B. Long short-term memory (LSTM) (Figure 5)

In order to address this issue, long short-term memory (LSTM) networks are applied to machine translation. At any given point, an LSTM accepts the latest input vector and produces the intended output using a combination of that latest input and some ‘context’.

Figure 5 An unfolded LSTM (source: codesachin.wordpress.com)

The horizontal line running through the top of the diagram, namely the cell state, conveys information straight down the entire chain. The structures consisting of a sigmoid neural net layer and a pointwise multiplication operation are called gates. The three gates in an LSTM regulate the information flow, deciding what old information should be kept and what new information should be included in the next cell state. When generating results, the gates only output what is needed. (Github, 2015) The whole process is trained on a great number of example inputs and finally generates a filtered version. (Srjoglekar246, 2017)

As regards the actual translation process, for instance, the cell state might include the gender of the present subject in order to generate the proper pronouns. When a new subject is encountered, the gender information of the old subject is excluded. Then, a word relevant to a verb might be generated in the output step, since a verb is most likely to follow a subject. (Github, 2015)
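A minimal sketch of one LSTM step, again with random toy weights in place of learned ones, shows the three sigmoid gates doing exactly the filtering described above.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    n = 4  # state size; the input vector is kept the same size for brevity
    Wf, Wi, Wo, Wc = (rng.normal(size=(n, 2 * n)) for _ in range(4))

    def lstm_step(x, h, c):
        z = np.concatenate([x, h])
        f = sigmoid(Wf @ z)                  # forget gate: what old info to keep
        i = sigmoid(Wi @ z)                  # input gate: what new info to write
        o = sigmoid(Wo @ z)                  # output gate: what to reveal
        c_new = f * c + i * np.tanh(Wc @ z)  # next cell state (the top line)
        h_new = o * np.tanh(c_new)           # filtered output
        return h_new, c_new

    h = c = np.zeros(n)
    for x in rng.normal(size=(5, n)):        # a five-word "sentence"
        h, c = lstm_step(x, h, c)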

C. Encoder-decoder architecture

Based on LSTMs, Google Translate built up its encoder-decoder architecture. Encoding can be seen as the process and result of analysis; decoding is the direct generation of the target sentence. Since the decoder network is basically similar to the encoder network, I will only discuss the encoder in detail below.

At the beginning, the sentence is input into the system word by word. In the encoding process, each word is encoded into a set of numbers. (Geitgey, 2016) The numbers represent the relative position of each word in a word-embedding table and reflect its similarity to other words. (Systransoft, 2016) (Figure 6)

Figure 6 The encoding process (source: medium.com)
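A toy version of this lookup, assuming a four-word vocabulary and three-dimensional vectors (GNMT’s vocabulary and vectors are vastly larger, and the values are learned rather than random), might look like this:

    import numpy as np

    vocab = {"i": 0, "love": 1, "cats": 2, "dogs": 3}
    rng = np.random.default_rng(2)
    embeddings = rng.normal(size=(len(vocab), 3))  # one row of numbers per word

    sentence = "i love cats"
    vectors = [embeddings[vocab[w]] for w in sentence.split()]
    # After training, similar words end up with nearby rows, so the vectors
    # for "cats" and "dogs" would be close to each other.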

Google Translate uses two approaches to improve the “quality” of those numbers. The first is bi-directional input, which means that the entire sentence is also input in reverse order. Since the following words also influence the meaning and “context” of a sentence, the “position” of each word can be output more accurately.

The second is the principle of layering. According to Universal Principles of Design, layering refers to the process of organizing information into related groupings in order to manage complexity and reinforce relationships in the information. (Lidwell, 2010) The encoder network is essentially a series of 8 stacked LSTMs. (Figure 7) Every layer is impacted by the layer below it. The patterns in the data become more and more abstract as information moves to the higher layers, which helps represent the contextual meanings of words in the sentence. (Srjoglekar246, 2017)

Figure 7 GNMT’s encoder networks (source: codesachin.wordpress.com)

In short, the encoder-decoder architecture is displayed in Figure 8.

Figure 8 GNMT’s encoder-decoder architecture (Schuster, 2016)

D. Transformer – a Self-Attention Mechanism

However, the outputs of the encoding process bring many complexities and uncertainties to the decoder network, especially when the source sentence is long. (Cho et al., 2014) In order to better process the encodings, Google Translate builds a self-attention mechanism called the Transformer between the two phases. (Uszkoreit, 2017)

The Transformer enables the neural network to focus on the relevant parts of the input when encoding. (Synced, 2017) (Figure 9) To determine the level of relevancy, the Transformer lets the system look back at the input sentence at each step of the decoder stage. Each decoder output then depends on a weighted combination of all the input states. (Olah & Carter, 2017)

Figure 9 The integration of the Transformer (the purple lines denote the weights) (source: googleblog.com)
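The weighted combination itself is easy to sketch: score every encoder state against the current decoder state, turn the scores into weights with a softmax (the purple lines in Figure 9), and take the weighted sum. The vectors below are random toys.

    import numpy as np

    rng = np.random.default_rng(4)
    encoder_states = rng.normal(size=(5, 4))  # one state per source word
    decoder_state = rng.normal(size=4)        # where the decoder is right now

    scores = encoder_states @ decoder_state          # relevance of each input
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax into weights
    context = weights @ encoder_states               # weighted combination
    # `context` is what this decoder step consumes, letting it focus on the
    # most relevant source words instead of one fixed summary vector.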

2. Limitations

Although GNMT is the state-of-the-art model in the current MT field, the accuracy and reliability of its translation results still face many challenges.

Regarding the system itself, as mentioned above, the filtering process is based on examples. Thus, it is important to collect a large amount of training and test data that provides a diverse vocabulary and its usage in various contexts. In addition, it is hard to detect mistakes and inaccuracies in the outputs, and then difficult to correct them, especially omissions of information. (Zhou et al., 2018) Meanwhile, the rare-word problem, monolingual data usage, the memory mechanism, prior knowledge integration, the coverage problem, and so forth also need further improvement. (Synced, 2017)

Furthermore, in addition to text input, Google Translate accepts input in the form of audio and images, which raises higher requirements for natural language processing. According to information theory, omissions and errors will occur in the step that transfers audio and image information into a source the system can process, and the accuracy of the results will inevitably suffer.

 

Conclusion

Although still facing challenges, Google’s neural machine translation system overcomes numerous shortcomings of RBMT, SMT, and the original NMT, and makes huge improvements in terms of data volume, fluency, accuracy, and so on. It brings new possibilities to the field of machine translation. The field is undergoing fast-paced development, and it is reasonable to believe that the application of NMT will continue to achieve greater breakthroughs and lead the future path of machine translation.

 

References

Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014, September 3). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Retrieved from https://arxiv.org/abs/1406.1078

Evans, D. (2011). Introduction to Computing: Explorations in Language, Logic, and Machines. Lexington, KY: Creative Commons. pp. 20-21.

Geitgey, A. (2016, August 21). Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences. Retrieved from https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa

How does Neural Machine Translation work? (2016, October 13). Retrieved from http://blog.systransoft.com/how-does-neural-machine-translation-work/

Lidwell, W., Holden, K., & Butler, J. (2010). Universal Principles of Design (Rev. ed.). Beverly, MA: Rockport Publishers.

Olah, C., & Carter, S. (2017, August 31). Transformer: A Novel Neural Network Architecture for Language Understanding. Retrieved from https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

Russell, S., Davis, E., & Norvig, P. (2010). Artificial intelligence: a modern approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Srjoglekar246. (2017, February 19). Understanding the new Google Translate. Retrieved from https://codesachin.wordpress.com/2017/01/18/understanding-the-new-google-translate/

Synced. (2017, August 17). History and Frontier of the Neural Machine Translation. Retrieved from https://medium.com/syncedreview/history-and-frontier-of-the-neural-machine-translation-dc981d25422d

Schuster, M., & Le, Q. (2016, September 27). A Neural Network for Machine Translation, at Production Scale. Retrieved from https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

Understanding LSTM Networks. (2015, August 27). Retrieved from http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Uszkoreit, J. (2017, August 31). Transformer: A Novel Neural Network Architecture for Language Understanding. Retrieved from https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., . . . Hughes, M. (2016, October 8). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Retrieved from https://arxiv.org/abs/1609.08144

Yeen, J. (2017, October 06). AI Translate: Bias? Sexist? Or this is the way it should be? Retrieved from https://hackernoon.com/bias-sexist-or-this-is-the-way-it-should-be-ce1f7c8c683c

Zhou, S., Kurenkov, A., & See, A. (2018). Has AI surpassed humans at translation? Not even close! Retrieved from https://www.skynettoday.com/editorials/state_of_nmt

The Architecture of Google Drive



Huazhi Qin

Google Drive is a cloud file storage and synchronization service provided by Google. It is a great example for understanding the design rules of the Internet: the open, standards-based, device-independent architecture of the Web.

Its service is made up of four components: the online interface, sync applications, mobile apps, and storage plans. Users can store files on Google’s servers, synchronize files across their devices, and share files with other users, and all of these actions can be done through its interface. Users can set up folders and file their uploaded documents into them based on their needs, so many layers can be seen. The search bar at the top also provides an easy way to find target files among those layers.

Regarding its architecture, Google Drive displays a typical client-server model: the server side provides service-based access to application data for users on the client side. As Professor Irvine says, it can be considered a “hypertext” system. The process by which users upload, store, and download files is actually a process of encoding and decoding between Web browsers and individual devices.

Meanwhile, sharing is one of the main features of Google Drive, connecting unlimited users through its servers. File sharing can be easily accomplished via public folders or shared links. Simultaneous editing is another form of “sharing”: rather than being isolated, users are closely connected even though they upload and edit files on their own devices in different locations. This shows that Google Drive builds up a distributed network system across unlimited client/server implementations. (Irvine) In addition to the website interface, it also offers apps for Windows and macOS computers and for Android and iOS mobile phones and tablets, which means it provides a model of interoperability for any software or hardware manufacturer. (Irvine)

Furthermore, synchronization is another core service. Its current Backup and Sync application can automatically upload files from individual devices to users’ drives whenever the devices connect to the Internet. Real-time file sync also works when users edit files online, which means that behind the screen, the backup process never stops.

At last, for its media functions, a web-based office suite, including Docs, Sheets, and Slides, is integrated into Google Drive. It allows users to create and edit documents, spreadsheets, and presentations online while collaborating in real time with other users. Multiple forms of files and media can also be viewed on the web.

 


 References

  • Martin Irvine, Intro to the Web: Extensible Design Principles and “Appification”
  • “Using Google Drive: A Case Study”. http://www.virtualitworld.co.in/using-google-drive-a-case-study/
  • “Google Drive”. https://wp.stolaf.edu/it/google-drive/

 

Weekly Writing for Week 11



Huazhi Qin

We access the Internet every day. Nearly all the apps on our electronic devices build connections to the Internet and incorporate it as part of their functionality. However, most users merely consume the “content” on it, such as all kinds of media, including messages, videos, texts, and images, displayed in apps or on online platforms. As Professor Irvine said, the Internet or Web for most people is simply what is experienced on, and through, screens and graphical interfaces – the “content” that they can access or transmit and have displayed or played through a network-equipped device. (Irvine) In short, the apps we use are all blackboxed products with a network of interdependencies.

Take Spotify as an example. Spotify provides digital music streaming services: users can access millions of songs, podcasts, and videos from artists all over the world. It is a proprietary multimedia application that uses streaming servers to deliver audio and video to its users. Permission from the major record labels to use their tracks is secured before users ever access them. Spotify also applies digital rights management (DRM) protection to those copyrighted works, and it lists terms and conditions to regulate users’ behavior.

Besides, Spotify builds its own infrastructure on a collection of technologies. For instance, it uses Java as its language, Cassandra for its database, and Pingdom for website monitoring, as well as Google Cloud Dataflow, Docker, Helios, and so forth. (The whole tech stack can be seen at https://stackshare.io/spotify/spotify)

Furthermore, when users use it on different devices, they usually find that it provides largely the same services. To do so, however, Spotify has to make adjustments to different standards; for instance, the audio settings are adjusted based on the platform, device, or network connection.


In addition, according to Spotify itself, it has now built up a community of 191m users, including 87m subscribers, across 78 markets. The Internet mediates different telecommunications regimes in different countries, conflicts over private and governmental investment in and ownership of network infrastructure, agreements on standards, market and business rivalries, intellectual property regimes and control of content, and policy and regulatory issues. To this day, Spotify is still unavailable in China.

References

“Spotify – Spotify Tech Stack.” StackShare, stackshare.io/spotify/spotify.

Martin Irvine, The Internet: Design Principles and Extensible Futures.

 

Interaction Design and Google Maps



Huazhi Qin

As Janet Murray mentions in her book, all things made with electronic bits and computer code belong to a single new medium, the digital medium, with its own unique affordances. (Murray) In other words, a digital artifact can be considered part of a medium with four affordances – encyclopedic, spatial, procedural, and participatory – to a greater or lesser degree. That reminds me of how elaborately the Google Maps app is designed to involve users.

Obviously, Google Maps displays an encyclopedic trait. Its database covers almost every country, state, and city, and even every street and building, all over the world. The services it offers range from location search to route planning and navigation, as well as real-time traffic status. This shows an unequaled storage potential far beyond legacy paper maps. To some extent, the large amount of information included in Google Maps shows its encyclopedic trait in the spatial layer. When “time”, namely real-time information, is displayed in it, the map becomes a dynamic medium and is encyclopedic in the temporal layer as well.

According to Murray, the spatial affordance refers to virtual spaces, created by designers, that are also navigable by the interactors. (Murray) Its graphical user interface exemplifies Google Maps as a spatial medium; to be specific, the search bar, menus, and manipulable icons are all examples. The concepts of modularity, or black-boxing, and semiotics, or human symbol systems, which we learned about in previous weeks, can also be seen here. For instance, multiple specific locations are categorized and folded into restaurants, bars, and so forth. When users want to find a restaurant nearby, the knife-and-fork icon will lead them to what they are looking for.

Furthermore, how Google Maps “teaches” users to use the app shows its participatory affordance. Murray said that “the designer must script both sides, interactor, and digital artifact so that the actions of humans and machines are meaningful to one another”. Google Maps is also a digital design that “is selecting the appropriate convention to communicate what actions are possible in ways that the human can understand”. (Murray) The blue dot shows the user’s current location; the red pin shows the location the user searched for. In real-time status, red routes show traffic congestion, while green lines show where there are no traffic delays.

At last, the procedural affordance can be seen when users enter something ambiguous, in which case they will be led to relevant information based on keywords. Google Maps also shows “no result” to deal with absent information.

Reference

Janet Murray, Inventing the Medium: Principles of Interaction Design as a Cultural Practice. Cambridge, MA: MIT Press, 2012.

 

Graphical User Interface



Huazhi Qin

When first invented, the computer was something too specialized and hard for ordinary people to use. Nowadays, a computer can be operated by everyone without actually “knowing” what a computer is and how it works. In other words, complicated functions can be achieved through a few simple operations.

An easy-to-use interface is essential to make this happen. According to Engelbart, the development of human intellectual capability has gone through four stages – concept manipulation, symbol manipulation, manual external symbol manipulation, and automated external symbol manipulation. (Engelbart) The computer lies in stage 4 and displays a close relationship with the prior three stages, especially symbolic systems. For instance, a musical-note icon represents music software, a book with a letter refers to a built-in dictionary, and the same goes for a microphone icon, a magnifying-lens icon, or a camera icon. This can be seen as a kind of simulation.

In addition, current computers usually display a graphical user interface (GUI), or WIMP system. WIMP stands for “windows, icons, menus, pointer”, which together provide ease of use to non-technical people. Beyond the use of icons (or symbols) mentioned above, a window shows what is running, a text- or icon-based menu organizes and displays the functions users can select, and a pointer visualizes users’ movements. In short, users can easily figure out what to do and what they are doing.

Furthermore, the expanding use of the computer beyond the original boundaries of the military, government, and business also helped open computers to non-technical people. For instance, hyperlinking was brought in to link directly to other documents; photographs can be modified, combined, delivered, or inserted on a computer; and video viewing, editing, and sharing are also included. All of this keeps computers evolving.

Meanwhile, simulation can also be seen in this progressing process. Manovich described the computer as a “remediation machine” in Software Takes Command. (Manovich) This means the computer always imitates older media. Like the symbol manipulation mentioned above, the adoption of common interface conventions and tools provides users with clues for operating something new: users can operate new software based on their past experience. As Kay described the Dynabook, “simulation is the central notion”.

References

Douglas Engelbart, “Augmenting Human Intellect: A Conceptual Framework”

Lev Manovich, Software Takes Command, pp. 55-106, on the background for Allan Kay’s “Dynabook” Metamedium design concept.

Natural Language and Programming Language



Huazhi Qin

This week was my first time learning a programming language, Python. The most impressive point to me is that it is not merely a technique with multiple special terms. Rather, it is actually a language that represents a new way of thinking and expression in modern life. Just as Evans said, “understanding computing illuminates deep insights and questions into the nature of our minds, our cultures and our universe”.

Language acts as a communicative tool in our daily life: it helps us deliver interpretable information among people. Now that the computer has become a part of our life, we have to learn how to communicate with computers. Thus, programming languages were created to be read and written by humans in order to create programs that can be executed by computers.

While taking the Python tutorial, I realized the many differences between natural languages and programming languages. According to Evans, natural languages are not applicable to a computer because of their complexity, ambiguity, irregularity, uneconomic nature, and limited means of abstraction.

First, programming languages must be absolutely explicit about “what”. Natural languages are ambiguous in many cases: one word can carry two or more different meanings. For instance, the pronunciation “ta” in Chinese can represent the three English pronouns “he”, “she”, and “it”. However, every string (or word) in a programming language should lead to only one thing; the string “it” refers only to what you assign to it in the code.

Second, “how” should be described step by step, because unlike human beings, computers act without basic “common sense” or any background knowledge. In other words, steps must be described one by one in order to be processed by computers. In the tutorial, when my code did not run successfully, the reason usually turned out to be that something was “not defined”, which showed that a step was missing from my code.

Also, a programming language should be abstract and describe actions with small representations. According to Evans, natural languages have limited means of abstraction and are always too complex and uneconomic: too many details would be brought into computation if natural language were used. Lots of such replacements can be seen in Python. For instance, I can use the name “x” to represent a whole list of numbers, and a “for” loop can stand for an action repeated for each item in a list, as the small sketch below shows.
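Here is a tiny example of both abstractions; the names and values are my own, just for illustration.

    x = [1, 2, 3, 4]       # the name "x" abstracts away the individual numbers

    total = 0
    for item in x:         # one loop replaces four explicit additions
        total = total + item

    print(total)           # -> 10
    # print(y)             # would fail with NameError: 'y' is not defined,
    #                      # the kind of missing step described above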

However, similarities can still be seen between natural languages and programming languages. In natural language, users attach sound patterns to objects, which generates meaning. In computer science, a language is “a set of surface forms and meanings and a mapping between the surface forms and their associated meanings”. Both kinds of languages help their users deliver meaning.

References

David Evans, Introduction to Computing: Explorations in Language, Logic, and Machines. Oct. 2011 edition.

Weekly Response – Week 7



Huazhi Qin

In everyday life, we take it for granted that we use computers to communicate directly with others. The computer (actually texting software or other chat apps) transmits the messages we send to other people, and when people receive a message, they naturally get what we want to say.

Thus, it is really interesting for me to think about the paradox proposed in The Information Paradox: “how can a system process information without regard to its meaning and simultaneously generate meaning in the experience of its users”. (Denning and Bell, 2012) It is commonly thought that meanings are inherently incorporated in the messages or information we send. But actually, they are not.

Shannon demonstrates in his information theory that information can be transmitted and received accurately by processes that do not depend on the information’s meaning. (Denning and Bell, 2012) A message can be sent to and displayed on another screen whatever its meaning is, and so can a social media post or a digital image. Whatever the meaning, the transmission process can be achieved technologically.

So where is the meaning in this process? According to Denning and Bell, information always has two parts, sign and referent, and meaning is the association between the two. In other words, when people attach signs to referents, a message (or a post, or an image) takes on a particular meaning. As Professor Irvine says, the meaning is not “in” the sign system; it is the system. (Irvine)

In addition, how receivers interpret the message they receive determines whether information is successfully transmitted, not only technologically. Thus, senders and receivers are required to share knowledge and an understanding of the connection between human sign and symbol structures and their physical forms. For instance, in online chatting among the Chinese younger generation, a smiling face often does not mean a “smile”, happiness, or other positive emotions; it is usually used to express impatience or irony. However, older people tend to read it as a real “smile”. In that case, the information is not successfully transmitted, even though the smiley face can be seen by both sender and receiver.

Nowadays, we experience computer-mediated communication every day. In a computer-based context, a communication system can be considered to be accomplished by encoders, channels, and decoders. The metaphors of “encoding” and “decoding” imply that a coding process puts something “in” signal units which is then taken “out”. (Irvine) Two levels of encoding and decoding exist in this process. At the technological level, text is encoded and transmitted in the form of bytes (data) and decoded, or interpreted, by software to be displayed on the screen. At the meaning-transmission level, senders connect their sign and symbolic structures to the texts and generate the meaning, while receivers interpret the texts to get the meaning.
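A tiny Python illustration of the technological level: the bytes move faithfully whatever the message means, while the meaning itself stays with the human sign system.

    message = "🙂 see you at 8"
    encoded = message.encode("utf-8")   # sender side: signs -> bytes
    print(encoded)                      # raw bytes, meaningless to the channel
    decoded = encoded.decode("utf-8")   # receiver side: bytes -> signs
    assert decoded == message           # transmission succeeds regardless of
                                        # whether the smiley reads as warmth or irony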

Thus, only a combination of information theory and semiotics can be said to truly “deliver a message”.

 

References

Martin Irvine, Introduction to the Technical Theory of Information

Peter Denning and Tim Bell, “The Information Paradox.” From American Scientist, 100, Nov-Dec. 2012.

Affordances of books: from paper to digital version



Huazhi Qin

According to Norman, affordances are relationships. (Norman, 1999) To be more specific, an affordance is a relationship between an object and a person, enabling a particular kind of interaction between them, which is determined by the properties of the object and the capabilities of the person interacting with it. The book is one of the most important artifacts, with numerous affordances.

Over human beings’ long history of reading print books, a series of habits has been established, constrained by the properties of physical books and individuals’ abilities. As for appearance, the front cover presents the core information, including title and author, while the back cover mostly remains blank, so readers can easily figure out which face they should start with. In order to turn to the next page, readers flip the page over, rotating it around the bound edge. Besides, information such as the title is printed on the spine, which allows readers to stack books on a bookshelf with only the spine facing outward. Regarding the inside pages, paper and ink constrain readers to read the book under light, and fixed font sizes define the distance between eyes and paper. Margins allow readers to write down their thoughts or take notes, and the table of contents and page numbers afford quickly searching for and locating chapters.

Furthermore, different books vary in weight and size, which influences how and where people read or use them. Lightweight books of medium size are easy to carry in a hand or a bag; they are portable, so people can take one along and read whenever and wherever they want. Other books, like dictionaries, are often too heavy and large to take outside.

However, affordances are not always the same in every case; they can differ between cultures. According to Professor Irvine, the inferences we make are learned from socialization into what is normative in using all the built “stuff” in a culture. (Irvine) For instance, reading directions differ across countries: in Japan, books are read vertically from right to left, while in China they are read horizontally from left to right.

As Professor Irvine said, societies are always hybrid, with many co-existing technologies, contexts of use, and cultural genres. (Irvine) Print books and eBooks now co-exist in our reading experience.

Take the Kindle as an example. How we respond to and interact with it both inherits from how we deal with paper books and changes at the same time. The Kindle tries to provide a similar reading experience. The text is presented following the habitual reading direction, and it simulates how a reader flips a printed page: tap the right side of the screen to turn to the next page, and tap the left side to get back to the previous page. It also employs e-ink technology to make the screen somewhat paper-like.

Meanwhile, many parts of the reading experience have been improved on digital media displays. The Kindle expands the number of books one can bring. “My Library” acts as a virtual bookshelf, indicating where to find the book list. Adjustable font sizes loosen the reading-distance restrictions. Quick searching and locating can be done with keywords or terms rather than chapter titles. Besides, the sentences readers highlight and the notes they take are collected in a “notebook” and can be exported to the computer and other software.

Animoji and Co-mediation System



Huazhi Qin

According to Professor Irvine, culture and media technologies are co-produced, or co-constitutive, and thus form a necessary system of co-mediation. This idea reminds me of Apple’s Animoji technology, which I believe is a great access point for understanding the process and meaning of a co-mediation system.

Animoji is a function Apple first launched in 2017. It allows users to customize their own talking emojis by mirroring their facial expressions and using their voice. This kind of personalized animated sticker makes a great contribution to users’ emotional expressiveness by integrating facial expression into mobile communication.

As Latour mentioned in Pandora’s Hope, a first sense of mediation is “the program of action”. (Latour, 1999) Animoji runs smoothly according to a series of actions. When people want to create and share an Animoji, they just need to open the iMessage app on their iPhone and tap the Animoji icon at the bottom of a friend’s contact page. Then they can choose or create a figure and animate it with their facial expressions, while their voice is recorded at the same time. This new-born Animoji can be shared with the friend or saved as a video.

The whole process can also be described from the perspective of mediation, especially at the layer of the interface. Both human and nonhuman agents exist in this system, and each of them has a goal. As human agents, the designers of Animoji might originally have wanted to expand the ways users express themselves or to bring more fun to improve the user experience, while the users might desire to express and share their emotions in a more direct and interesting way. As for nonhuman agents, the touchscreen is the physical material users can actually “operate”; the camera and built-in voice recording work to mirror facial expressions and record voice; and the Animoji software provides existing and new emoji pieces to select and assemble. Just as Latour said, responsibility for action must be shared among the various actants. (Latour, 1999) All these elements work together and stick to one common goal – creating a customized Animoji.

When the analysis goes deeper, the idea of composition, the second meaning of technical mediation, should be mentioned. Goals are redefined by associations with nonhuman actants, and action is a property of the whole association. (Latour, 1999) The goal of creating an Animoji can be divided into several pieces. The first is to capture one’s moving facial expression, which means Animoji needs face-detection technology; to make this technology work, the TrueDepth camera and depth-sensing technology are included, which can track over 50 facial muscles of one’s face. (Info, 2017) The second, smaller goal is to integrate the captured facial expression into the existing or newly created emoji in real time; thus Animoji needs support to immediately analyze the recorded data, which is accomplished by the A11 Bionic chip. All the actants inside offer one another new possibilities, new goals, and new functions. (Latour, 1999)

Obviously, the whole process and technology mentioned above can also be seen as black-boxing: users can successfully create an Animoji without mastering any of its complicated technology. In addition, the Animoji that one creates and shares with friends can be seen as his or her avatar, and through iMessage, no matter where a person actually is, other people can receive that Animoji and sense his or her present emotion. This somehow displays the third meaning of technical mediation – the folding of time and space. (Latour, 1999)

Just as Latour said, techniques modify the matter of our expression, not only its form.

References

Martin Irvine, “Understanding Sociotechnical Systems with Mediology and Actor Network Theory (with a De-Blackboxing Method)” PDF. 9.

Bruno Latour. “A Collective of Humans and Nonhumans — Following Daedalus’s Labyrinth,” in Pandora’s Hope: Essays on the Reality of Science Studies. (Cambridge, MA: Harvard University Press, 1999), 201.

Info, A. (2017, November 17). IPhone X Animoji Technology – Animoji Info – Medium. Retrieved from https://medium.com/@waynedehenry/iphone-x-animoji-technology-4837b0163473

Evernote as a Cognitive Artifact



Huazhi Qin

In my reading, I found Evernote a great example for understanding the definition and functions of cognitive artifacts.

According to Donald A. Norman, “a cognitive artifact is an artificial device designed to maintain, display, or operate upon information in order to serve a representational function”. (Norman, 1991) Evernote is a widely used app for note-taking. In addition to creating notes, it also allows users to sort notes into notebooks, add tags, give annotations or comments, re-edit, search, and share information.

As Cole described, artifacts are objectifications of human needs and intentions already invested with cognitive and affective content; in other words, an artifact is manufactured for a reason and put into use. (Cole, 1996) Evernote fulfills users’ needs to edit, save, view, and share information. It also connects closely with other components, including the recording system, the camera, and local and cloud storage, in order to provide broader services.

With regard to its functions, as Clark mentioned, cognitive artifacts contribute to “the extension of our bodies, the extension of our senses, and crucially, the use of language as a tool to extend our thought”. (Clark, 2008) From the system view, Evernote enhances users’ capabilities of storage, archiving, memory, and performance. For example, the forms and formats of note-taking are broadly expanded: users are able to record sounds, save websites, and take videos and photos instead of simply typing or writing down text. Moreover, rather than relying on one particular notebook, users can access their notes – the information saved in the cloud database – on multiple devices using the same account.

From the personal view, cognitive artifacts affect how a task is performed. Every user of Evernote can organize notes by applying their own rules and organizational system; for instance, they might maintain their own unique series of tags or implement categorization methods that better suit their needs.

As Norman said, artifacts can distribute actions across time and across people, and change the actions required of the individuals doing the activity. (Norman, 1991) This is similar to what Evernote proposes in its own video: Evernote is the software that takes every note, in every way, and goes everywhere, every time.


References

Andy Clark. (2008). Supersizing the Mind: Embodiment, Action, and Cognitive Extension. New York: Oxford University Press.

Michael Cole. (1996). On Cognitive Artifacts, From Cultural Psychology: A Once and Future Discipline. Cambridge: Harvard University Press.

Donald A. Norman. (1991). Cognitive Artifacts. New York: Cambridge University Press.