Translate Like A Human



— De-blackboxing Google Translate

Huazhi Qin

Abstract

Machines are gradually taking over translation tasks in real life. As machine translation (MT) has developed, different methodologies have been applied to the field, producing multiple distinct translation systems. Rule-based machine translation (RBMT), statistical machine translation (SMT), and neural machine translation (NMT) are the three most important. Among them, Google Translate uses Google’s neural machine translation (GNMT) system, one of the state-of-the-art NMT models, to achieve a breakthrough in MT. GNMT is a model integrating four components: recurrent neural networks, long short-term memory, an encoder-decoder architecture, and an attention mechanism. However, the accuracy of Google Translate still faces challenges in terms of the internal translation process and the integration of audio and image input.

 

Introduction

According to Russell, machine translation (MT) uses the power of machines to achieve “automatic translation of text from one natural language (the source language) to another (the target language)” (Russell et al., 2010). As interactions around the world have increased, the demand to overcome language barriers has grown. Because human translation requires a great deal of effort and time, people have sought help from computers to take over this task. How to improve machines’ performance in translation has become one of the most important topics in computer science.

Since the 1950s, scholars have applied different methodologies to machine translation (MT) to bridge the gap between machine and human translation, developing multiple distinct translation systems. Among them, rule-based machine translation (RBMT), statistical machine translation (SMT), and neural machine translation (NMT) are the three core systems.

In 2016, Google introduced an updated version of its translation service, Google Translate, built on the Google Neural Machine Translation (GNMT) system, which marked a great improvement in machine translation. By integrating deep learning technology, Google Translate implemented an “attentional encoder-decoder networks” model that reduced translation errors by an average of 60% compared to Google’s phrase-based production system. (Wu et al., 2016)

Nevertheless, current machine translation systems are still criticized for their limited accuracy.

 

Rule-Based Machine Translation (RBMT) — Linguistics

1. Translation process

Rule-based machine translation (RBMT) is the oldest approach to making machines translate. Basically, it simulates the process of constructing and deconstructing a sentence based on language-specific rules, following an automatic translation process known as Bernard Vauquois’ Pyramid (Figure 1). The whole translation process goes through three steps – analysis, transfer, and generation – based on two resources – dictionaries and grammars. The implementation of linguistic rules is its core feature.

Figure 1 Bernard Vauquois’ Pyramid (source: systransoft.com)

According to Evans, a language is composed of primitives (the smallest units of meaning) and means of combination (rules for building new language elements by combining simpler ones). (Evans, 2011) RBMT also focuses on these two elements. To be more specific, the machine first analyzes the grammatical category and links of every word in the source-language sentence according to morphological, semantic, and syntactic rules. (Figure 2) Second, every word in the source language is transferred to the appropriate lexical item in the target language according to dictionaries. Finally, the complete target sentence is generated by synthesizing the parts from the second step according to the grammatical rules of the target language.

Figure 2 Analysis of the sentence in the source language (source: systransoft.com)
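To make the three steps concrete, here is a minimal Python sketch of the analysis-transfer-generation pipeline, assuming a hypothetical three-word English-to-Spanish dictionary and a single reordering rule; real RBMT systems rely on far larger dictionaries and much richer morphological and syntactic rules.

```python
# Toy RBMT pipeline: analysis -> transfer -> generation (illustrative only).

DICTIONARY = {"the": "el", "red": "rojo", "car": "coche"}  # transfer: lexical lookup
ADJ = {"red"}                                              # analysis: tiny adjective lexicon
NOUN = {"car"}

def analyze(sentence):
    """Tag each source word with a coarse grammatical category."""
    tags = []
    for word in sentence.lower().split():
        if word in ADJ:
            tags.append((word, "ADJ"))
        elif word in NOUN:
            tags.append((word, "NOUN"))
        else:
            tags.append((word, "DET"))
    return tags

def transfer(tagged):
    """Replace every source word with its target-language equivalent."""
    return [(DICTIONARY[w], t) for w, t in tagged]

def generate(tagged):
    """Apply a target-language grammar rule: adjectives follow nouns in Spanish."""
    words = tagged[:]
    for i in range(len(words) - 1):
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(w for w, _ in words)

print(generate(transfer(analyze("the red car"))))  # -> "el coche rojo"
```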

2. Limitations

When RBMT transfers meanings, there are obvious limitations in three respects.

First, the sheer quantity of dictionaries and grammatical rules needed is hard to fulfill, and the manual development of linguistic rules is costly.

Second, RBMT is largely a language-specific system, which means it often does not generalize to other languages.

Third, it works only for plainly structured sentences and struggles with complicated ones, especially ambiguous and idiomatic texts. Human languages are full of special cases, regional variations, and outright rule-breaking. (Geitgey, 2016)

 

Statistical Machine Translation (SMT) – Probability Calculation

1. Translation process

Statistical machine translation (SMT) dominated the field of MT from the 1980s to the 2000s. Unlike RBMT, SMT requires no linguistic or semantic knowledge; instead, parallel corpora become the foundation of machine translation. In addition, SMT systems are not designed for any specific pair of languages.

Regarding the translation process, SMT applies a statistical model to machine translation and generates translations based on the analysis of bilingual text corpora. (Synced, 2017) The key feature is the introduction of statistics and probability.

There are also three steps in the process: 1) break the original sentence into chunks; 2) list all possible interpretation options for each chunk (Figure 3); 3) generate all possible sentences and find the most probable one. The “most probable” sentence is the one that sounds the “most human”. (Geitgey, 2016)

Figure 3 A large number of possible interpretations (source: medium.com)
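As an illustration of these three steps, the Python sketch below uses a tiny, made-up table of chunk options and toy bigram “fluency” scores standing in for models that real systems learn from parallel corpora; it enumerates every combination and keeps the highest-scoring one.

```python
import itertools

OPTIONS = {                                   # step 2: possible interpretations per chunk
    "quiero": ["I want", "I love", "I try"],
    "ir": ["to go", "go", "going"],
    "a la playa": ["to the beach", "the beach", "at the beach"],
}

FLUENCY = {                                   # toy bigram scores, standing in for a language model
    ("I want", "to go"): 0.9, ("to go", "to the beach"): 0.8,
    ("I love", "going"): 0.7, ("going", "the beach"): 0.1,
}

def score(candidate):
    """Product of bigram scores; unseen pairs get a small default probability."""
    p = 1.0
    for a, b in zip(candidate, candidate[1:]):
        p *= FLUENCY.get((a, b), 0.01)
    return p

chunks = ["quiero", "ir", "a la playa"]                          # step 1: source split into chunks
candidates = itertools.product(*(OPTIONS[c] for c in chunks))    # step 3: every combination
best = max(candidates, key=score)
print(" ".join(best))                                            # -> "I want to go to the beach"
```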

2. Limitations

Although statistical machine translation overcomes many shortcomings of RBMT, it still faces many challenges, especially in terms of resources and human intervention.

As regards resources, although no linguistic rules are required, statistical machine translation needs a great deal of training data in the form of double-translated (parallel) texts. (Geitgey, 2016) As for human intervention, an SMT system consists of numerous separate sub-components and relies on multiple intermediary steps (Figure 4), which requires a great deal of work from engineers. (Zhou et al., 2018) Excessive human intervention inevitably influences translation results.

Figure 4 SMT consists of many intermediary steps (source: skynettoday.com)

 

Neural Machine Translation – Google Translate

Neural machine translation (NMT) is considered to have been born in 2013, when two scientists applied deep learning neural networks to machine translation and proposed a novel end-to-end encoder-decoder structure. In the following years, sequence-to-sequence learning using recurrent neural networks (RNNs) and long short-term memory (LSTM) was gradually integrated into NMT. (Synced, 2017)

However, NMT systems were criticized for being computationally expensive both in training and in translation inference. They also lacked practicality in some cases, especially when encountering rare words. (Wu et al., 2016) Thus, the original NMT was rarely put into practice because of its poor translation speed and accuracy.

In 2016, the Google Brain team announced Google’s neural machine translation (GNMT) system, which addressed many of these issues. GNMT helped Google Translate achieve state-of-the-art translation results, reducing translation errors by an average of 60% compared to Google’s previous phrase-based production system. (Wu et al., 2016) Below, I will de-blackbox Google Translate, one of the most advanced applications of NMT, to elaborate on how NMT works.

1. De-blackboxing Google Translate

According to the Google Brain team, Google’s neural machine translation (GNMT) system is a model consisting of a deep LSTM network with 8 encoder and 8 decoder layers, using residual connections as well as attention connections from the decoder network to the encoder. (Wu et al., 2016) There are four major features in GNMT: recurrent neural networks, long short-term memory, an encoder-decoder architecture, and an attention mechanism.

A. Recurrent neural network (RNN)

Unlike previous machine translation systems, people understand sentences, contexts, and information based on their understanding of what came before; in other words, human thoughts have persistence. The introduction of the recurrent neural network (RNN) gives machine translation an ability to remember, letting the machine think more like a human. A recurrent neural network contains loops that allow information to persist. (Github, 2015) This also means that previous calculations can influence the results of future outputs.
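The loop can be sketched in a few lines of Python with toy random weights (the sizes and values here are illustrative assumptions, not GNMT’s): the same cell is applied at every word, and the hidden state it passes forward is the “memory” that lets earlier words influence later outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 4, 3
W_h = rng.normal(size=(hidden_size, hidden_size))    # recurrent weights
W_x = rng.normal(size=(hidden_size, embed_size))     # input weights

def rnn_step(h_prev, x_t):
    """New hidden state depends on the previous state and the current word vector."""
    return np.tanh(W_h @ h_prev + W_x @ x_t)

h = np.zeros(hidden_size)                             # empty memory before the sentence starts
for word_vector in rng.normal(size=(5, embed_size)):  # five toy word embeddings
    h = rnn_step(h, word_vector)                      # the loop lets information persist
print(h)                                              # a summary of everything read so far
```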

However, traditional RNNs sometimes face the problem of long-term dependencies, where the machine has to trace further back to narrow down and determine the next word. (Github, 2015) For instance, consider predicting the last word in the text “I was born and grew up in China… I can speak Chinese.” The nearby word “speak” only delivers the clue that the next word is most likely a language; the more distant context “China” is needed to narrow it down to the specific word “Chinese”. In short, the gap between relevant pieces of information can become very wide.

B. Long short-term memory (LSTM) (Figure 5)

To address this issue, long short-term memory (LSTM) networks are applied to machine translation. At any given point, an LSTM accepts the latest input vector and produces the intended output using a combination of that input and some ‘context’.

Figure 5 an unfolded LSTM (source: codesachin.wordpress.com)

The horizontal line running through the top of the diagram, namely the cell state, conveys information straight down the entire chain. The structures consisting of a sigmoid neural net layer and a pointwise multiplication operation are called gates. The three gates in an LSTM regulate the information flow, deciding what old information should be kept and what new information should be included in the next cell state. When generating results, the gates output only what is needed. (Github, 2015) The whole process is trained on a great number of example inputs and finally generates a filtered version. (Srjoglekar246, 2017)

As regards the actual translation process, the cell state might, for instance, include the gender of the present subject so that the proper pronouns can be generated. When a new subject is encountered, the gender information of the old subject is discarded. Then, a word relevant to the verb might be generated in the output step, since a verb is most likely to follow a subject. (Github, 2015)
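A single LSTM update can be sketched as follows, again with toy random weights rather than trained ones; the forget, input, and output gates are the sigmoid-plus-pointwise-multiplication structures described above, and the cell state c corresponds to the horizontal line in Figure 5.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3                                     # hidden size, input size (toy values)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

Wf, Wi, Wo, Wc = (rng.normal(size=(n, n + m)) for _ in range(4))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z)                         # forget gate: what old information to keep
    i = sigmoid(Wi @ z)                         # input gate: what new information to add
    o = sigmoid(Wo @ z)                         # output gate: what to expose this step
    c = f * c_prev + i * np.tanh(Wc @ z)        # new cell state (the horizontal line)
    h = o * np.tanh(c)                          # filtered output
    return h, c

h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(h, c, rng.normal(size=m))      # one step over one toy word vector
print(h, c)
```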

C. Encoder-decoder architecture

Based on LSTMs, Google Translate built its encoder-decoder architecture. Encoding can be seen as the process and result of analysis; decoding is the direct generation of the target sentence. Because the decoder network is structurally similar to the encoder, I will only discuss the encoder network in detail below.

At the beginning, the sentence is fed into the system word by word. In the encoding process, each word is encoded into a set of numbers. (Geitgey, 2016) These numbers represent the relative position of each word in a word-embedding table and reflect its similarity to other words. (Systransoft, 2016) (Figure 6)

Figure 6 the encoding process (source: medium.com)
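The sketch below illustrates this idea with a tiny, hand-made embedding table (the vocabulary and vectors are assumptions for illustration, not Google’s actual table): words with similar meanings end up with similar vectors, which a cosine similarity makes visible.

```python
import numpy as np

EMBEDDINGS = {                                   # toy word-embedding table
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "car":   np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sentence = ["king", "car"]
vectors = [EMBEDDINGS[w] for w in sentence]               # each word becomes a set of numbers
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]))    # high: similar meanings
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["car"]))      # low: unrelated meanings
```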

Google Translate uses two approaches to improve the “quality” of those numbers. The first is bi-directional input, which means the entire sentence is also fed in in reverse order. The words that follow also influence the meaning and “context” of a sentence, so the “position” of each word is output more accurately.

The second is the principle of layering. According to Universal Principles of Design, layering refers to the process of organizing information into related groupings in order to manage complexity and reinforce relationships in the information. (Lidwell, 2010) The encoder network is essentially a stack of 8 LSTM layers. (Figure 7) Every layer is affected by the layer below it, and the patterns in the data become more and more abstract as the information moves to higher layers, which helps represent the contextual meanings of words in the sentence. (Srjoglekar246, 2017)

Figure 7 GNMT’s encoder networks (source: codesachin.wordpress.com)
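Structurally, the encoder can be sketched as follows: the bottom layer reads a toy sentence in both directions, and a stack of layers each refines the output of the layer below. The “layer” here is only a stand-in function to show the wiring, not a real LSTM, and the depth of 8 simply follows the description above.

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_layer(seq, weights):
    """Stand-in for one LSTM layer: mixes each position with what came before it."""
    out, prev = [], np.zeros_like(seq[0])
    for x in seq:
        prev = np.tanh(weights @ (x + prev))
        out.append(prev)
    return out

sentence = [rng.normal(size=4) for _ in range(6)]            # six toy word vectors
forward  = toy_layer(sentence, rng.normal(size=(4, 4)))
backward = toy_layer(sentence[::-1], rng.normal(size=(4, 4)))[::-1]
states = [f + b for f, b in zip(forward, backward)]           # bi-directional bottom layer

for depth in range(7):                                        # seven more stacked layers
    states = toy_layer(states, rng.normal(size=(4, 4)))       # each layer feeds the next

print(len(states), states[0].shape)                           # one context vector per word
```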

In short, the encoder-decoder architecture is shown in Figure 8.

Figure 8 GNMT’s encoder-decoder architecture (Schuster, 2016)

D. Transformer – a Self-Attention Mechanism

However, the outputs of the encoding process bring many complexities and uncertainties to the decoder network, especially when the source sentence is long. (Cho et al., 2014) To better process the encodings, Google Translate builds a self-attention mechanism called the Transformer between the two phases. (Uszkoreit, 2017)

The Transformer enables the neural network to focus on the relevant parts of the input when encoding. (Synced, 2017) (Figure 9) To determine the level of relevancy, it lets the system look back at the input sentence at each step of the decoding stage; each decoder output then depends on a weighted combination of all the input states. (Olah & Carter, 2017)

Figure 9 the integration of Transformer (the purple lines denote the weights) (source: googleblog.com)
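The weighted combination can be sketched in a few lines of Python with random toy vectors: the decoder state scores every encoder state, a softmax turns the scores into weights (the purple lines in Figure 9), and the context passed to the decoder is the weighted sum of all input states.

```python
import numpy as np

rng = np.random.default_rng(3)
encoder_states = rng.normal(size=(6, 4))        # one vector per source word (toy values)
decoder_state  = rng.normal(size=4)             # what the decoder is "thinking" right now

scores  = encoder_states @ decoder_state                  # relevance of each input position
weights = np.exp(scores) / np.exp(scores).sum()           # softmax: attention weights
context = weights @ encoder_states                        # weighted combination of all input states

print(weights.round(2))                                   # higher weight = more attention
print(context)
```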

2. Limitations

Although GNMT is the state-of-the-art model in the current MT field, the accuracy and reliability of its translation results still face many challenges.

Regarding the system itself, as mentioned above, the filtering process is based on examples. It is therefore important to collect a large amount of training and test data that provides a diverse vocabulary and its usage in various contexts. In addition, it is hard to detect mistakes and inaccuracies in the outputs, and then difficult to correct them, especially omissions of information. (Zhou et al., 2018) Meanwhile, the rare-word problem, monolingual data usage, the memory mechanism, prior knowledge integration, the coverage problem, and so forth also need to be further improved. (Synced, 2017)

Furthermore, in addition to text input, Google Translate accepts input in the form of audio and images, which places higher requirements on natural language processing. From the perspective of information theory, omissions and errors occur in the step that transfers audio and image information into the source text the system processes, and the accuracy of the results inevitably suffers.

 

Conclusion

Although it still faces challenges, Google’s neural machine translation system overcomes numerous shortcomings of RBMT, SMT, and the original NMT, and makes huge improvements in terms of the amount of data required, fluency, accuracy, and so on. It brings new possibilities to the field of machine translation. The field is undergoing fast-paced development, and it is reasonable to believe that the application of NMT will continue to achieve greater breakthroughs and lead the future path of machine translation.

 

References

Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014, September 3). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Retrieved from https://arxiv.org/abs/1406.1078

Evans, D. (2011). Introduction to computing explorations in language, logic, and machines. Lexington, KY: Creative commons. Pp. 20-21.

Geitgey, A. (2016, August 21). Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences. Retrieved from https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa

How does Neural Machine Translation work? (2016, October 13). Retrieved from http://blog.systransoft.com/how-does-neural-machine-translation-work/

Lidwell, W., Holden, K., & Butler, J. (2010). Universal principles of design (Rev. ed.). Beverly, MA: Rockport Publishers.

Olah, C., & Carter, S. (2017, August 31). Transformer: A Novel Neural Network Architecture for Language Understanding. Retrieved from https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

Russell, S., Davis, E., & Norvig, P. (2010). Artificial intelligence: a modern approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Srjoglekar246. (2017, February 19). Understanding the new Google Translate. Retrieved from https://codesachin.wordpress.com/2017/01/18/understanding-the-new-google-translate/

Synced. (2017, August 17). History and Frontier of the Neural Machine Translation. Retrieved from https://medium.com/syncedreview/history-and-frontier-of-the-neural-machine-translation-dc981d25422d

Schuster, M., & Le, Q. (2016, September 27). A Neural Network for Machine Translation, at Production Scale. Retrieved from https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

Understanding LSTM Networks. (2015, August 27). Retrieved from http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Uszkoreit, J. (2017, August 31). Transformer: A Novel Neural Network Architecture for Language Understanding. Retrieved from https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., . . . Dean, J. (2016, October 8). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Retrieved from https://arxiv.org/abs/1609.08144

Yeen, J. (2017, October 06). AI Translate: Bias? Sexist? Or this is the way it should be? Retrieved from https://hackernoon.com/bias-sexist-or-this-is-the-way-it-should-be-ce1f7c8c683c

Zhou, S., Kurenkov, A., & See, A. (2018). Has AI surpassed humans at translation? Not even close! Retrieved from https://www.skynettoday.com/editorials/state_of_nmt