Abstract
Thanks to Google Translate, communication that once seemed impossible has become an everyday reality. With the GNMT (Google Neural Machine Translation) system, Google has improved the service's performance considerably. Which human cognitive processes does it distribute? How can Google Translate be improved? In this paper, I first trace the history of the technology applied in machine translation, including the PBMT (phrase-based machine translation) system and GNMT; I then explain the weaknesses of Google Translate by comparing, with examples, how human beings and machines understand a sentence; and finally I analyze the human cognitive processes distributed by Google Translate.
1. Introduction
My friend Melissa, a graduate student in English-Chinese translation, once told me: "As a translator, when I saw the latest progress of Google Translate, I could completely understand the anxieties and fears of the textile workers in the 18th century when they saw the steam engine." Google Translate, the industry leader in machine translation, is a useful technology for people to extend and distribute their cognition.
Communication is essential to human beings, social creatures who must live in society and deal with one another, and language is always the biggest barrier between people and the outside world. Semiotics is the study of signs. According to C. S. Peirce, there are three kinds of signs: icons, indices, and symbols. Icons represent things by imitating them; indices convey ideas by being physically connected with them; and symbols convey meaning by convention, through their use. The most typical and common example of a symbol system is language (Irvine, 2016a). This semiotic character of language makes it difficult for machines to penetrate. Google has already improved its algorithm a great deal, but it still has flaws.
2. The Main Body of the Essay
2.1 Technical Overview
2.1.1 An Introduction to Machine Translation
Machine translation is the use of software to translate a source language into a target language. Machine translation systems can be divided into two categories: rule-based and corpus-based. The former draws its resources from dictionaries and rule bases; the latter from corpora analyzed with statistical methods.
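To make the distinction concrete, here is a minimal, purely illustrative sketch (the tiny dictionary, rule, and parallel corpus below are invented for the example, not drawn from any real system): a rule-based translator applies a lexicon plus an explicit reordering rule, while a corpus-based translator simply picks the translation that occurs most often in aligned bilingual data.

```python
from collections import Counter

# --- Rule-based: a hand-written lexicon plus an explicit reordering rule ---
lexicon = {"voiture": "car", "rouge": "red"}        # toy French->English dictionary

def rule_based(phrase):
    noun, adjective = phrase.split()                # French order: noun then adjective
    return f"{lexicon[adjective]} {lexicon[noun]}"  # rule: English puts the adjective first

# --- Corpus-based: choose the translation seen most often in parallel data ---
parallel_corpus = [("voiture rouge", "red car"),
                   ("voiture rouge", "red car"),
                   ("voiture rouge", "crimson car")]  # invented aligned sentence pairs

def corpus_based(phrase):
    counts = Counter(tgt for src, tgt in parallel_corpus if src == phrase)
    return counts.most_common(1)[0][0]               # the statistically most likely output

print(rule_based("voiture rouge"))    # -> red car (dictionary + grammar rule)
print(corpus_based("voiture rouge"))  # -> red car (corpus statistics)
```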
Until only a few years ago, the PBMT (phrase-based machine translation) system was the mainstream approach to machine translation, and Google Translate was based on this approach as well. Google's system was essentially statistical machine translation: it took a large volume of bilingual web content as its corpus and then selected the target-language expressions that best corresponded to the source text. The "phrase" in "phrase-based" refers to the smallest unit of translation.
2.1.2 How Does PBMT Work?
First of all, PBMT breaks a sentence up into phrases according to the syntax of the source language. Syntax, or grammar, describes the rules and constraints for combining words into phrases and sentences that speakers of any natural language use to generate new sentences and to understand those expressed by others (Irvine, 2016b). As an example, consider the syntax tree of the sentence "Once when I was six years old I saw a magnificent picture in a book, called True Stories from Nature, about the primeval forest" (from The Little Prince by Antoine de Saint-Exupéry, translated from the French by Katherine Woods).
Then, it matches each source phrase to a target-language phrase drawn from its large bilingual corpus.
Finally, the target-language phrases must be reordered so that the output conforms to the syntax of the target language.
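As a minimal sketch of these three steps, under the simplifying assumptions of a hand-made phrase table and trivial reordering (both invented for illustration and far simpler than a real PBMT system):

```python
# Toy phrase-based pipeline: segment -> look up phrases -> reorder.
phrase_table = {            # invented French->English phrase pairs
    "le chat noir": "the black cat",
    "dort": "sleeps",
}

def segment(sentence):
    # Step 1: greedily split the sentence into the longest phrases found in the table.
    words, phrases = sentence.split(), []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in phrase_table:
                phrases.append(chunk)
                i = j
                break
        else:
            phrases.append(words[i])  # unknown word passes through untranslated
            i += 1
    return phrases

def translate(sentence):
    phrases = segment(sentence)                              # step 1: segmentation
    targets = [phrase_table.get(p, p) for p in phrases]      # step 2: phrase lookup
    return " ".join(targets)                                 # step 3: (trivial) reordering

print(translate("le chat noir dort"))   # -> the black cat sleeps
```

Note that the reordering inside "le chat noir" -> "the black cat" is captured only because the whole phrase sits in the table; anything the table does not cover has to be handled by explicit reordering rules.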
Throughout the translation process, PBMT also relies on other lower-level NLP (Natural Language Processing) components, such as Chinese word segmentation, part-of-speech tagging, and syntactic parsing. Admittedly, Google's technology is advanced, but it still sometimes produces translations that read like jokes. The reason is that the statistical method, unlike a human being equipped with knowledge, needs a large-scale bilingual corpus, and the accuracy of the translation depends directly on the size and quality of that corpus. This pipeline also produces incorrect translations through error propagation: any error in an intermediate step continues to spread downstream and distorts the final result. Therefore, even if each individual component is 95% accurate, the accumulation of minor errors can still lead to an unacceptable output.
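To see why a pipeline of individually accurate components can still fail, consider a back-of-the-envelope calculation (the 95% figure is the one quoted above; the five-stage pipeline is an assumed example):

```python
# If each of five pipeline stages (segmentation, tagging, parsing, phrase lookup,
# reordering) is independently 95% accurate, the chance that a sentence passes
# through all of them without any error is only:
per_stage_accuracy = 0.95
stages = 5
print(per_stage_accuracy ** stages)   # ~0.774, so more than one sentence in five picks up an error
```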
2.1.3 How Does GNMT Work?
In September 2016, Google published "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation" on arXiv (Wu et al., 2016).
Trained on the same corpora, the GNMT (Google Neural Machine Translation) system can reach the same or better results with far less engineering work than the PBMT system. The "neural" in GNMT means the model attends to the whole input you have entered. In the past, the translation engine looked at each word, one by one, and offered the closest match for it. GNMT instead "reads" the sentence first. For example, if the text you input is an excerpt from a news article that has already been translated into French by the news website, Google Translate is likely to produce a translation very close to the corresponding chunk on the French page.
The diagram below shows how GNMT translates a Chinese sentence into an English sentence.
This model follows the common sequence-to-sequence learning framework with attention. It has three components: an encoder network, a decoder network, and an attention network.
Formal languages are defined with respect to a given alphabet, which is a finite set of symbols, each of which is called a letter. This does not mean, however, that the elements of the alphabet must be "ordinary" letters; they can be any symbols, such as digits or words (Clark et al., 2013). First, "the encoder transforms a source sentence into a list of vectors, one vector per input symbol"; here, it encodes each Chinese character into a vector. "Given this list of vectors, the decoder produces one symbol at a time, until the special end-of-sentence symbol (EOS) is produced. The encoder and decoder are connected through an attention module which allows the decoder to focus on different regions of the source sentence during the course of decoding" (Wu et al., 2016).
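As a minimal numerical sketch of the attention module described above (random toy vectors stand in for real encoder states and a real decoder state; the dimensions and values are invented, and a production system like GNMT uses learned parameters and much larger vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder output: one vector per source symbol (here, 4 symbols, dimension 8).
encoder_states = rng.normal(size=(4, 8))
# Current decoder state, just before producing the next target symbol.
decoder_state = rng.normal(size=(8,))

# Attention: score each source position, normalize with a softmax,
# and build a weighted "context" vector for the decoder to condition on.
scores = encoder_states @ decoder_state            # one score per source symbol
weights = np.exp(scores) / np.exp(scores).sum()    # softmax -> attention weights
context = weights @ encoder_states                 # weighted sum of encoder states

print(np.round(weights, 3))   # how strongly the decoder "focuses" on each source symbol
print(context.shape)          # (8,) -- a summary of the relevant source regions
```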
With the GNMT system, Google Translate achieves better results under human assessment: judged by bilingual human assessors on sample sentences drawn from Wikipedia and news websites, GNMT reduces translation errors compared with the phrase-based system (Wu et al., 2016).
Data from side-by-side evaluations, where human raters compare the quality of translations for a given source sentence. Scores range from 0 to 6, with 0 meaning "completely nonsense translation" and 6 meaning "perfect translation."
Here are some examples of translations produced by PBMT, GNMT, and a human translator.
Machine translation is still far from perfect. Because sentences are encoded into vectors without regard to their linguistic features, the output can be hard to control, which leads to errors. GNMT still makes significant mistakes that a human translator would never make, such as dropping words and mistranslating rare terms. However, GNMT represents a major milestone.
2.2 Google Translate vs. Human Beings
"As the engineering branch of computational linguistics, natural language processing is concerned with the creation of artifacts that accomplish tasks" (Clark et al., 2013). NLP is the main means by which machines analyze a sentence. Most NLP tasks require annotating linguistic entities with class labels: a part-of-speech tagger, for instance, assigns a part of speech to each word.
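As a toy sketch of what a part-of-speech tagger does (the tiny lexicon below is invented for illustration; real taggers are trained statistically on large annotated corpora rather than looked up from a hand-made dictionary):

```python
# A minimal dictionary-based part-of-speech tagger: each known word is
# assigned a class label; unknown words fall back to "NOUN".
LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "picture": "NOUN",
    "saw": "VERB", "sleeps": "VERB",
    "magnificent": "ADJ",
}

def pos_tag(sentence):
    return [(word, LEXICON.get(word.lower(), "NOUN")) for word in sentence.split()]

print(pos_tag("I saw a magnificent picture"))
# -> [('I', 'NOUN'), ('saw', 'VERB'), ('a', 'DET'), ('magnificent', 'ADJ'), ('picture', 'NOUN')]
```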
When I saw the news of Google's new GNMT system, I was very interested, so I used a Chinese article to test Google Translate. The article I chose was very logical: it is an introduction to an earphone. The result I received was surprisingly good; the word choices were accurate and appropriate.
But this matter is not that simple.
2.2.1 The Semiotics of Language
A linguistic sign unit has two sides: the concept and the sound-image. Saussure uses the terms sign [signe], signified [signifié], and signifier [signifiant] to refer to the word, the concept, and the sound-image respectively (Irvine, 2016a).
An English speaker can understand the meaning of words; he can follow a command to pick the red paper out of ten pieces of colorful paper. "But a computer cannot understand the meaning of red, just as a piece of paper cannot understand what is written on it" (Hausser, 1999).
Let us look at Google Translate's performance when dealing with a more flexible, colloquial article.
And this is the translation from a Chinese-English translation company: "If the performance of GTX 950 is taken as a benchmark of 100%, then the performances of GTX 1050 and GTX 1050Ti can reach 110% and 140% respectively. They also excel the previous generation models in terms of power consumption. Apart from saving energy, lower power consumption means less heat generated, which is a good news for the game players with small-chassis computers for which ventilation can be an issue."
We can see that the Google Translate version, compared with the human translation, is barely acceptable.
2.2.2 The Beauty of Language
The machine cannot understand the beauty of language. Poems are fascinating because they have rhythm and verbal poetic images. Here is an example, an ancient Chinese poem:
陆游 《卜算子·咏梅》
驿外断桥边,
寂寞开无主。
已是黄昏独自愁,
更著风和雨。
无意苦争春,
一任群芳妒。
零落成泥碾作尘,
只有香如故。
(The Diviner-Ode to the Plum
By Lu You
Tr. Zhao Yinchuan
Beside the broken post bridge there
It blows, solitarily sane
The dimming dusk it can hardly bear
And there’s the slash of wind and rain
It contends for spring with no one
That horde of flowers, let them flare
It falls into dust, trundled to none
Its aroma welling as e’er)
In this poem, the rhyme follows a certain pattern. A human translator can understand it and find English words that rhyme with each other to complete the translation. But the machine can never appreciate this difference; it follows the statistics of its training data and outputs the most common combination of these words.
In addition, the verbal poetic images "Broken Bridge (断桥)," "Dimming Dusk (黄昏)," "Wind (风)," and "Rain (雨)" jointly create a lonely, desolate atmosphere. We can picture the scene: a plum blossom opens in a deserted spot beside a broken bridge; the evening wind and rain scatter its petals into the mud, yet it still keeps its fragrance.
The language sign, according to Saussure, has these properties: (1) the structural relation of sound and meaning in any natural language is arbitrary (i.e., unmotivated), which is the foundation of language as a symbolic system; (2) speech sounds, word forms, and meanings are elements in a system of interrelations, within which, and only within which, they function as constituents of a language; and (3) meaning has two dimensions, the "context-free" sense (like a dictionary meaning) and social-cultural value (meaning in contexts of use) (Saussure, 1959). These features mean that language can only be understood by people who know both the system of the language (the signifiers) and the meanings of its symbols (the signifieds). This limitation is the biggest barrier to a machine fully understanding a sentence.
"Broken Bridge (断桥)," "Dimming Dusk (黄昏)," "Wind (风)," and "Rain (雨)" are accepted by Chinese society as signifying specific meanings. These verbal poetic images accurately express the feeling of the author, so that readers receive an aesthetic experience.
2.2.3 The Form of Language
Finally, Google Translate may fail to produce the same target-language output when the form of the source sentence is changed.
Google Translate can now handle translation between English and Chinese. But if I add an auxiliary word to the same sentence, the translation no longer reaches the same quality.
The GNMT process is, at bottom, a process of fitting a function. Through this fitted function, a source sentence whose form changes will be mapped to a different target sentence, even if the two forms have the same meaning. So adding a few semantically irrelevant words can change the result enormously.
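A minimal sketch of why this happens, under the assumption of a toy sentence encoder (random word embeddings averaged into a sentence vector; real GNMT uses learned recurrent encoders, but the sensitivity to surface form is the same in kind): adding one word changes the vector the decoder conditions on, so the fitted function can land on a different output.

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy word embeddings: every distinct surface token gets its own random vector.
embeddings = {}

def embed(word):
    if word not in embeddings:
        embeddings[word] = rng.normal(size=16)
    return embeddings[word]

def encode(sentence):
    # A crude sentence encoder: the average of its word vectors.
    return np.mean([embed(w) for w in sentence.split()], axis=0)

a = encode("I want to go to Beijing")
b = encode("I really want to go to Beijing")   # same meaning, one extra word

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cosine), 3))   # < 1.0: the decoder sees a different input vector
```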
2.3 Distributed, Extended, and Embodied Cognition
Google has already completed experimental and commercial deployment of GNMT on its TensorFlow platform (TensorFlow™ is an open-source software library for numerical computation using data flow graphs) with Tensor Processing Units. This hardware and software stack makes it possible to run the model at the scale and speed a production service requires (Wu et al., 2016).
2.3.1 Distributed cognition between individuals and technology
2.3.1.1 Beyond Direct Manipulation: Graphical Interface
The interface is the means by which people manipulate the things that exist on the screen. An important research issue for the field of human-computer interaction is how to move beyond current direct-manipulation interfaces (Hollan et al., 2000). The Google Translate web page lets us perform actions such as entering the source text, changing the source language, getting the target-language output, sharing the text, and listening to the text. Some of these actions have counterparts in the real world, but some have no easy counterpart.
As users become more familiar with an environment, they situate themselves in it more profoundly (Hollan et al., 2000). "Everything we take for granted about graphical 'interfaces' – software-controlled pixel mapping with an 'interactive' software layer engineered to track pointing devices and defined regions for user-activated commands (icons, menus, links) – were developed in this context for 'augmenting human intellect' and organizing all forms of symbolic representations and expressions" (Irvine, 2016c). This website, like most websites, uses graphical interface elements (icons) to connect users to its functions.
2.3.1.2 Providing Knowledge
Listen: via the "listen" icon, Google Translate reads the text aloud. To build this voice, Google recorded a human speaker reading thousands upon thousands of carefully chosen sentences. The sentences are chosen to cover all the sounds of a language and all the sound combinations in that language (in English, for example, the /s/ sound changes to accommodate the sound before it). The recordings are then divided into sound tokens, and the voice we hear is a recombination of those tokens. The technology can provide us with knowledge and skills that are unavailable from internal representations (Zhang & Patel, 2006). We can imitate the sound provided by Google to speak another language.
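A toy sketch of this concatenative approach (the "sound tokens" here are just short numpy arrays of sine tones standing in for recorded audio snippets; a real text-to-speech system selects and smooths its units far more carefully):

```python
import numpy as np

SAMPLE_RATE = 16000  # samples per second, an assumed value for the toy audio

# Pretend each phoneme-sized token is a tiny recorded waveform (here: sine tones).
def tone(freq, seconds=0.1):
    t = np.linspace(0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    return np.sin(2 * np.pi * freq * t)

sound_tokens = {"HH": tone(220), "EH": tone(330), "L": tone(262), "OW": tone(392)}

def synthesize(phonemes):
    # Concatenative synthesis: look up each token and splice them end to end.
    return np.concatenate([sound_tokens[p] for p in phonemes])

audio = synthesize(["HH", "EH", "L", "OW"])   # "hello" as a phoneme sequence
print(audio.shape)   # (6400,) -- 0.4 seconds of spliced-together tokens
```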
Translation (Website translation, Camera instant translation):
@Aidan Mechem tweeted his experience with Google Translate.
With notebooks, we do not have to memorize everything that happens in daily life; with Google Translate, we do not have to learn Spanish to communicate with a Spanish speaker. Google Translate is an affordance for our cross-language communication.
2.3.2 Distributed cognition across individuals.
2.3.2.1 Cross-cultural communication
Instead of making people spend time learning other languages, Google Translate provides the most convenient and efficient way of understanding them. Functions such as "share" and "read phonetically" democratize cross-language communication.
2.3.2.2 Google Translate Community
Languages are far more complex than a machine can understand, and a sentence may be unclear without context. A sentence like "J'ai votre nom" can be understood in two different ways: "I have all of your names" and "I have your name; you are a person I respect." That is why the Google Translate Community is open for anyone to correct a translation. Even when the meaning is clear, idioms and specialized terms and expressions make it very hard for machine translation to be precise, and the meaning of a sentence may change with the social context or the status of the reader and writer. In analyzing distributed cognition across individuals, reductionists insist that the cognitive properties of a group can be entirely determined by the properties of individuals, while interactionists insist that the interactions among individuals can produce emergent group properties that cannot be reduced to the properties of the individuals (Zhang & Patel, 2006). The community's suggestions can be very important for Google as it expands its corpora and, in turn, provides better service.
3. Conclusion
Machine learning is, at heart, a process of identifying vectors. For example, suppose there are two boxes of fruit, apples and oranges. The machine first identifies their feature vectors: (red, with a stem, sweet) = apple; (yellow, without a stem, sour) = orange. Facing a green apple, the machine finds that the green apple is relatively close to the apple vectors and identifies it as an apple.
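A minimal sketch of this idea (the numeric feature encoding below is invented purely for illustration): each fruit is a feature vector, and a new item is labeled according to whichever known vector it is closest to.

```python
import numpy as np

# Features: (redness, has_stem, sweetness), each on a 0..1 scale (invented encoding).
examples = {
    "apple":  np.array([0.9, 1.0, 0.8]),
    "orange": np.array([0.3, 0.0, 0.2]),
}

def classify(features):
    # Nearest-vector classification: pick the label with the smallest distance.
    return min(examples, key=lambda label: np.linalg.norm(examples[label] - features))

green_apple = np.array([0.1, 1.0, 0.7])   # not red, but it has a stem and is sweet
print(classify(green_apple))              # -> apple
```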
In the linguistic field, NLP is the way machines analyze a sentence, and the GNMT system is a new algorithm that lets Google Translate attend to the connections between the source language and the target language. With these algorithms, Google Translate has, in my view, been remarkably successful. But some big problems remain when translating: from Chinese to English, for example, Google Translate struggles to tell where Chinese words start and stop, since there are no spaces between Chinese words.
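A small illustration of that segmentation problem, using a toy greedy longest-match segmenter over a hand-made dictionary (the dictionary and the classic ambiguous phrase 研究生命的起源, "to study the origin of life," are chosen for illustration; real segmenters use statistical models to resolve such ambiguity):

```python
# A greedy longest-match segmenter with a tiny dictionary. Because "研究生"
# (graduate student) is longer than "研究" (to study), greedy matching picks
# the wrong reading of 研究生命的起源 ("to study the origin of life").
DICTIONARY = {"研究", "研究生", "生命", "命", "的", "起源"}

def segment(text, max_len=3):
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in DICTIONARY:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # unknown character passes through on its own
            i += 1
    return words

print(" | ".join(segment("研究生命的起源")))   # -> 研究生 | 命 | 的 | 起源 (the wrong reading)
```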
An English speaker says "it's Greek to me" when he cannot understand something; a Greek speaker says "it sounds like Chinese" when he encounters the same difficulty; a Chinese speaker calls it "Heavenly Script." Google Translate can break down the barrier of language for us, and I believe it can be improved to bring a revolution to cross-language communication in the near future.
References:
Irvine, M. (2016a). The grammar of meaning systems: Sign systems, symbolic cognition, and semiotics. Unpublished manuscript.
Irvine, M. (2016b). Introduction to linguistics and symbolic systems: Key concepts. Unpublished manuscript.
Irvine, M. (2016c). Introduction to the technical theory of information. Unpublished manuscript.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Klingner, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144.
Carey, J. (2002). A cultural approach to communication. McQuail’s reader in mass communication theory, 36-45.
Clark, A. (2008). Supersizing the mind: Embodiment, action, and cognitive extension. OUP USA.
Clark, A., Fox, C., & Lappin, S. (Eds.). (2013). The handbook of computational linguistics and natural language processing. John Wiley & Sons.
Zhang, J., & Patel, V. L. (2006). Distributed cognition, representation, and affordance. Pragmatics & Cognition, 14(2), 333-341.
Hausser, R. (1999). Foundations of computational linguistics. Berlin: Springer.
Saussure, F. de. (1959). Course in general linguistics (W. Baskin, Trans.). (Original work 1911-1916). Excerpts.
Hollan, J., Hutchins, E., & Kirsh, D. (2000). Distributed cognition: toward a new foundation for human-computer interaction research. ACM Transactions on Computer–Human Interaction (TOCHI), 7(2), 174-196.
Chandler, D. (2007). Semiotics: the basics. Routledge.
Short, T. L. (2007). Peirce’s theory of signs. Cambridge University Press.
Bergman, M., & Paavola, S. (2010). The commens dictionary of Peirce’s terms-Peirce’s terminology in his own words.
Denning, P. J., & Bell, T. (2012). The information paradox. American Scientist, 100(6), 470.