
The AI Powers Behind Google Translate

Tianyi Zhao

Abstract

Neural machine translation has developed rapidly in recent years and is gradually displacing statistical machine translation, which had long been dominant in the industry. This paper discusses how deep learning and pattern recognition are applied to machine translation through a case study of Google Translate, and how Google Translate upgrades its neural networks to improve translation accuracy and speed through several design changes. First, a brief history of machine translation and the emergence of neural machine translation are introduced. Second, the focus moves to a comparison between statistical and neural machine translation. Third, the key design improvements of the Google Neural Machine Translation system are discussed in detail, along with the problems they solve. Finally, the author examines pattern recognition in Google Translate, covering both image and speech recognition.

Table of Contents

I. Introduction

II. Machine Translation: From Statistical to Neural Machine Translation System

    1. The Transformation

        1.1 Statistical Machine Translation (SMT)

        1.2 Neural Machine Translation (NMT)

    2. Upgraded Design: Google Neural Machine Translation System (GNMT)

        2.1 Eight-Layered Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs)

        2.2 Bidirectional Encoder on the First Layer

        2.3 Residual Connections

        2.4 Attention Mechanism

   3. How GNMT Optimizes NMT

III. Pattern Recognition

   1. Image Translation

   2. Speech Translation

IV. Conclusion

I.   Introduction

Artificial intelligence (AI) has been changing most of our industries over the past decades. Machine learning, a key subject within AI, refers to the science of getting computers to learn without being explicitly programmed. (Ng, 2016) Its applications range from data mining, such as medical records and web click data, to self-customizing programs such as Amazon’s or Netflix’s recommendation systems. Machine translation, a term first coined by Warren Weaver in his 1949 memorandum on translation, has become one of the ultimate goals that humans strive for in the field of artificial intelligence. At its most basic, machine translation performs the replacement of one language (the source) with another (the target). Machine translation has passed through several stages with evolving approaches, as Figure 1 shows below. Rule-based Machine Translation (RBMT), which emerged in the 1970s, consisted of bilingual dictionaries and a set of linguistic rules for each language, and included three types: direct machine translation, transfer-based machine translation, and interlingual machine translation. The rigidity of RBMT helped breed Example-Based Machine Translation (EBMT), which reused ready-made phrases instead of translating repeatedly. Soon after came the era of Statistical Machine Translation (SMT). Then 2014 was a big year for machine translation, when Neural Machine Translation (NMT) was introduced in a research paper: a single, large neural network that reads a sentence and outputs a correct translation.

Figure 1. A Brief History of Machine Translation

(Source: https://medium.freecodecamp.org/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5)

First proposed by Kalchbrenner and Blunsom (2013), Sutskever et al. (2014), and Cho et al. (2014b), NMT translates sentences through an entirely different process. These scholars put forward the encoder-decoder approach, in which an encoder encodes a sentence into a fixed-length vector and a decoder produces a translation from that encoded representation. Bahdanau et al. (2016) then proposed extending the architecture by “allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.” (Bahdanau et al., 1) This statement provided a solid foundation for the new term attention. The initial proposal of NMT received little attention at first, but Google Translate recognized its potential and future value and was one of the first to develop and apply NMT in a product. In 2016, Google Translate upgraded its previous system by announcing the Google Neural Machine Translation System (GNMT) in a technical report, “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” in which the researchers proposed many improvements to increase the accuracy and speed of NMT.

In this paper, the author focuses on deep learning and pattern recognition as applied to machine translation through a case study of Google Translate, which launched in 2006. Born near the end of the SMT era, Google Translate has experienced the big jump and revolution in the machine translation industry. This paper de-blackboxes Google Translate’s transformation from SMT to NMT and its optimization of the NMT system through several design improvements, including the eight-layered Long Short-Term Memory (LSTM) network, the bidirectional encoder on the first layer, residual connections, and attention. The problems Google Translate has solved are then discussed in turn. Finally, the paper examines the pattern recognition applied in Google Translate, covering both image and speech recognition.

II.   Machine Translation: From Statistical to Neural Machine Translation System

Google Translate was born near the end of the statistical machine translation era, so the transformation from SMT to NMT was a major step forward. However, the initial NMT fell short of Google’s requirements, so Google researched and developed an upgraded version to solve its design problems. This section briefly compares SMT and NMT, then turns to how Google applies new designs to increase accuracy and speed.

1.     The Transformation

1.1 Statistical Machine Translation (SMT)

SMT is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. It uses probabilistic algorithms to teach the machine to translate from a parallel bilingual text corpus. The machine then leverages what it has been taught, the translated text, to predict the translation of new sentences. SMT is data-driven and needs only corpora of the source and target languages. It is normally divided into three types: word-based, phrase-based, and syntax-based SMT. When it launched in 2006, Google Translate used phrase-based machine translation as its key algorithm, which split the text not only into words but also into phrases.

Phrase-based machine translation involves three steps. The first step is to break the source sentence into chunks (phrases). Then all possible translation options for each chunk are listed. Finally, the machine generates all possible output sentences and chooses the one with the highest probability, which means the “most human-like.” (Geitgey, 2016)
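To make the three steps concrete, the toy sketch below (in Python, with a hypothetical three-entry phrase table and made-up probabilities rather than a real SMT model) enumerates the translation options for each chunk and keeps the combination with the highest score.

```python
from itertools import product

# A toy, hypothetical phrase table: each source chunk maps to candidate
# translations with illustrative probabilities (not real SMT model scores).
phrase_table = {
    "quiero":   [("I want", 0.6), ("I love", 0.3)],
    "ir a":     [("to go to", 0.7), ("go to", 0.2)],
    "la playa": [("the beach", 0.8), ("beach", 0.1)],
}

def translate(chunks):
    """Enumerate every combination of phrase options and keep the most probable one."""
    best_sentence, best_score = None, 0.0
    options = [phrase_table[c] for c in chunks]
    for combo in product(*options):
        words = " ".join(text for text, _ in combo)
        score = 1.0
        for _, p in combo:
            score *= p              # independence assumption of this toy model
        if score > best_score:
            best_sentence, best_score = words, score
    return best_sentence, best_score

print(translate(["quiero", "ir a", "la playa"]))
# -> ('I want to go to the beach', ~0.336)
```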

However, many shortcomings appeared in the SMT process. Word or phrase alignment breaks sentences down into independent words or phrases during translation, and a word cannot be considered and translated until the previous one is finished, so corpus collection is costly in time and effort. Additionally, the statistical approach cannot be predominant, because it requires a great deal of training data on bilingual texts and “[it] consists for the most part in developing large bilingual dictionaries manually.” (Poibeau, 139) The numerous separate chunks rely on multiple intermediary steps, which require heavy work from engineers. Moreover, the translation results may have only superficial fluency, which can cause misunderstanding.

1.2 Neural Machine Translation (NMT)

NMT is a more advanced approach than the statistical one. It is inspired by the neural networks of the human brain, so information is passed through different “layers” to be processed before output. Compared to the statistical approach, NMT does not require alignment between the two languages. Instead, it “attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation.” (Bahdanau et al., 1) Most NMT models belong to the encoder-decoder family: the encoder network encodes the source sentence into a specific set of features, while the decoder network decodes them back into text in the target language. By applying deep learning techniques, NMT can teach itself to translate based on statistical models. According to Ethem Alpaydin, the process of NMT starts with multi-level abstraction over lexical, syntactic, and semantic rules. A high-level abstract representation is then extracted, and the translated sentence is generated by “decoding where we synthesize a natural language sentence” in the target language “from such a high-level representation.” (Alpaydin, 109)
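A minimal sketch of such an encoder-decoder pair is shown below in PyTorch; the vocabulary sizes, dimensions, and single-layer LSTMs are illustrative assumptions, not the configuration of any production system. The encoder’s final state summarizes the source sentence, and the decoder turns that summary into a distribution over target-language words.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab=8000, emb=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)

    def forward(self, src_ids):                  # (batch, src_len)
        outputs, state = self.rnn(self.embed(src_ids))
        return outputs, state                    # state summarizes the source sentence

class Decoder(nn.Module):
    def __init__(self, tgt_vocab=8000, emb=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, tgt_ids, state):           # teacher-forced training step
        outputs, state = self.rnn(self.embed(tgt_ids), state)
        return self.out(outputs), state          # logits over the target vocabulary

src = torch.randint(0, 8000, (1, 12))            # a fake 12-token source sentence
tgt = torch.randint(0, 8000, (1, 10))            # a fake 10-token target sentence
_, state = Encoder()(src)
logits, _ = Decoder()(tgt, state)
print(logits.shape)                              # torch.Size([1, 10, 8000])
```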

Compared with SMT, NMT has made great progress. It uses context to find more accurate words and automatically produces more natural syntax, so sentences are smoother and more readable. As Figure 2 shows below, translation accuracy increased dramatically from SMT to NMT.

Figure 2. The Comparison on the Accuracy between Statistical and Neural Machine Translation

(Source: https://www.irjet.net/archives/V5/i10/IRJET-V5I1047.pdf)

2.    Upgraded Design: Google Neural Machine Translation System (GNMT)

Although the initial emergence of NMT brought plenty of improvements to machine translation, many problems remained. The most basic is its computational expense, both in training and in translation inference. Google Translate therefore developed its own system, the Google Neural Machine Translation (GNMT) system (Figure 3), with several breakthroughs and new design choices to enhance translation performance. GNMT has three components: an encoder network, a decoder network, and an attention network. Its features include eight-layered Long Short-Term Memory recurrent neural networks, a bidirectional encoder on the first layer, residual connections, and the attention mechanism.

Figure 3. The Model Architecture of GNMT

(Source: Wu, Yonghui, et al. “Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation.” arXiv.org, Cornell University Library, arXiv.org, Oct. 2016)

2.1 Eight-Layered Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs)

Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states. (Amidi, 6) In other words, their loops allow information to persist across different steps in the network. (Misra, 2019) They give machine translation a capability for memory, working somewhat like a human brain. Nevertheless, traditional RNNs have severe drawbacks: computation is slow, information from long ago is difficult to access, and they cannot consider any future input for the current state. They face the problem of long-term dependencies and the vanishing gradient problem. “When back propagation is calculated… the gradients get really small as the backpropagation algorithm moves through the network,” which causes the earlier layers to learn more slowly than the later ones, as Figure 4 shows. (Misra, 2019)

Figure 4. Decay of Information through Time

(Source: https://towardsdatascience.com/using-rnns-for-machine-translation-11ddded78ddf)

Long Short-Term Memory (LSTM) networks solve the long-term dependency and vanishing gradient problems. The secret lies in the horizontal line, the cell state, running across the top of the network diagram. As Figure 5 shows, information can flow along the cell state without being changed, and the gates connected to the cell state can add or remove information when needed. As a result, LSTMs can remember information over long periods of time.
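The following sketch writes out a single LSTM step by hand to make the role of the gates and the cell state visible; the dimensions and random weights are illustrative only.

```python
import torch

# One LSTM step, written out by hand: the gates read the input x and the previous
# hidden state h, then update the cell state c and the hidden state h.
def lstm_step(x, h, c, W, U, b):
    z = x @ W + h @ U + b                     # all four gates from one matrix product
    i, f, o, g = z.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                         # candidate values to write
    c = f * c + i * g                         # cell state: f keeps old content, i adds new
    h = o * torch.tanh(c)                     # hidden state exposed to the next layer
    return h, c

hidden = 8
x = torch.randn(1, hidden)
h = c = torch.zeros(1, hidden)
W = torch.randn(hidden, 4 * hidden)
U = torch.randn(hidden, 4 * hidden)
b = torch.zeros(4 * hidden)
h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)                       # torch.Size([1, 8]) torch.Size([1, 8])
```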

Figure 5. The Repeating Module of LSTM Network

(Source: https://towardsdatascience.com/using-rnns-for-machine-translation-11ddded78ddf)

Layering is one of the key design principles. In Universal Principles of Design, Lidwell, Holden, and Butler summarize layering as “the process of organizing information into related groupings in order to manage complexity and reinforce relationships in the information.” (Lidwell et al., 122) Deeper networks are generally believed to achieve better performance than shallower ones. Both the encoder and the decoder network in GNMT have eight layers. Every layer takes its input from the layer below it, and the information becomes more abstract as it moves up through the layers.

2.2 Bidirectional Encoder on the First Layer

Normally, there is high uncertainty about where in the source sentence the information required to translate a particular word in the target sentence will appear. Often the source-side information is arranged approximately left-to-right, similar to the target side, but sometimes it is not. Depending on “the language pair the information for a particular output word can be distributed and even be split up in certain regions of the input side.” (Wu et al., 6) GNMT therefore implements one bidirectional layer, followed by seven unidirectional LSTM layers, in the encoder network, which at the time was among the best-performing NMT configurations on their datasets.

The bidirectional layer in the encoder gives the network the best possible context at each point during encoding. Bidirectional connections are used only for the bottom encoder layer in order to guarantee the maximum possible parallelization during computation. Denning and Martell define parallelism as “computations performed cooperatively by multiple, concurrent agents.” (Denning and Martell, 149) Model parallelism is implemented to improve the speed of the gradient computation on each replica. If all LSTM layers used bidirectional connections, however, parallelism among subsequent layers would be reduced, because the next layer would have to wait until the previous layer finished both the forward and backward directions.

Figure 6. The Structure of Bidirectional Connections in the First Layer of the Encoder

(Source: Wu, Yonghui, et al. “Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation.” arXiv.org, Cornell University Library, arXiv.org, Oct. 2016)

Figure 6 shows how bidirectional connections work in the bottom encoder layer. The layer LSTMf processes the source from left to right, while LSTMb works from right to left. The outputs from LSTMf (in pink circles) and LSTMb (in green circles) are first concatenated and then passed to the next layer, LSTM1.
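A rough sketch of this encoder arrangement appears below: one bidirectional bottom layer whose forward and backward outputs are concatenated, followed by unidirectional layers. The depth and sizes are illustrative assumptions rather than GNMT’s actual configuration.

```python
import torch
import torch.nn as nn

class StackedEncoder(nn.Module):
    def __init__(self, emb=256, hidden=512, upper_layers=3):
        super().__init__()
        # bottom layer reads the sentence in both directions
        self.bottom = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        # upper layers are unidirectional, so each can start as soon as the layer below emits output
        self.upper = nn.ModuleList(
            [nn.LSTM(2 * hidden if i == 0 else hidden, hidden, batch_first=True)
             for i in range(upper_layers)]
        )

    def forward(self, x):                       # x: (batch, src_len, emb)
        out, _ = self.bottom(x)                 # (batch, src_len, 2*hidden): forward/backward concatenated
        for layer in self.upper:
            out, _ = layer(out)
        return out

enc = StackedEncoder()
print(enc(torch.randn(2, 15, 256)).shape)       # torch.Size([2, 15, 512])
```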

2.3 Residual Connections

As mentioned above, deeper networks generally give more accurate output than shallower ones. But simply stacking more LSTM layers works only up to a certain depth, beyond which the network “becomes too slow and difficult to train.” (Wu et al., 5) In Google Translate’s large-scale translation experiments, the researchers found that simple stacked LSTM layers work well up to four layers, barely with six layers, and very poorly beyond eight layers. The Google Translate team therefore introduced residual connections.

Figure 7. The Comparison of LSTM with and without Residual Connections

(Source: Wu, Yonghui, et al. “Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation.” arXiv.org, Cornell University Library, arXiv.org, Oct. 2016)

Residual connections allow the input to a bottom LSTM layer (xi0, fed into LSTM1) to be added element-wise to the output of that layer (xi1). The sum is then delivered to the upper LSTM layer as its input. Residual connections greatly improve the gradient flow in the backward pass, allowing engineers to train deeper encoder and decoder networks.
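A minimal sketch of this idea is shown below: each layer’s input is added element-wise to its output before being passed upward, so gradients have a direct path around every layer. The depth and hidden size are illustrative.

```python
import torch
import torch.nn as nn

class ResidualLSTMStack(nn.Module):
    def __init__(self, hidden=512, layers=8):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.LSTM(hidden, hidden, batch_first=True) for _ in range(layers)]
        )

    def forward(self, x):                       # x: (batch, seq_len, hidden)
        for layer in self.layers:
            out, _ = layer(x)
            x = x + out                         # residual: gradients can flow past the layer
        return x

stack = ResidualLSTMStack()
print(stack(torch.randn(2, 15, 512)).shape)     # torch.Size([2, 15, 512])
```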

2.4 Attention Mechanism

Although GNMT advanced translation quality greatly, some uncertainty remains in the accuracy of its output. To improve the architecture, Google launched the Transformer in 2017, a new neural network built around a self-attention mechanism. The attention function is defined as “mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors.” (Vaswani et al., 3) In other words, the attention applied from the bottom layer of the decoder network to the top layer of the encoder network helps rank the importance, or contribution level, of each word in a sentence, often visualized by lines of different thickness. Figure 8 below shows an example of the kind of attention that the model learns without any supervision of the alignment.

Figure 8. Sample Translations Made by the NMT Model with the Self-Attention Mechanism

(Source: https://devblogs.nvidia.com/introduction-neural-machine-translation-gpus-part-3/)

In the encoder network, the self-attention mechanism starts by generating initial embeddings for each word of the source sentence. It then aggregates information from all of the other words, using self-attention to generate a new representation informed by the context. This process repeats multiple times in parallel for all words. (Uszkoreit, 2017) The decoder network works in a similar way but generates one word at a time.
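The core computation can be sketched as the scaled dot-product attention defined by Vaswani et al., Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes below are illustrative, and self-attention simply means that the queries, keys, and values all come from the same sentence.

```python
import math
import torch

def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # similarity of each query to each key
    weights = torch.softmax(scores, dim=-1)             # how much each position attends to the others
    return weights @ V, weights

seq_len, d_model = 6, 64
x = torch.randn(1, seq_len, d_model)                    # word representations of one sentence
out, weights = attention(x, x, x)                       # self-attention: Q, K, V from the same sentence
print(out.shape, weights.shape)    # torch.Size([1, 6, 64]) torch.Size([1, 6, 6])
```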

3.    How GNMT Optimizes NMT

In NMT’s early days, its systems were often less accurate than phrase-based translation systems. The upgraded GNMT, however, solved many of the problems of earlier NMT. The first was slow training and inference: training an NMT system on a large-scale translated dataset takes considerable time and computational resources, and because of the large number of parameters, NMT inference is much slower than that of phrase-based systems. The eight-layered LSTM RNNs with residual connections effectively shorten processing time and increase accuracy. The second problem was ineffectiveness in dealing with rare words: NMT normally lacks robustness in translating rare words, but the attention mechanism, working together with wordpieces, addresses this by capturing them more precisely. Nevertheless, some problems remain to be solved, for example the failure to translate all words in the source sentence.

III.    Pattern Recognition

Pattern recognition refers to the automated recognition of patterns and regularities in data, including images, speech, faces, and so on. The recognition of optical and acoustic information forms two significant segments of pattern recognition, and both have been fully applied in Google Translate as image translation and as speech and conversation translation.

1.    Image Translation

Figure 9. Google Image Translation

(Source:https://egyptinnovate.com/en/%D8%A8%D9%86%D9%83-%D8%A7%D9%84%D8%A3%D9%81%D9%83%D8%A7%D8%B1/google-translate)

Image translation in the Google Translate app interprets characters or text captured with the mobile camera, and the translation appears on the screen in real time. The technology uses optical character recognition, which recognizes printed or written characters from their images. But since there are many fonts and styles for a single character, how can Google Translate identify them accurately? In fact, a character image is not a collection of random dots and strokes in different directions; “it has a regularity that we believe we can capture by using a learning program.” (Alpaydin, 57) For each character, the machine learns from abundant fonts and sizes and then generalizes a shared description that applies to every font of that character. Generally speaking, each character consists of two types of factors: the identity, namely the label of the character, and the appearance, which varies with the printing process. The printed characters in an image may look quite different, for instance in the serifs and stroke widths of Times New Roman, but these differences in appearance are too small to change the character’s identity. (Alpaydin, 61)

By leveraging optical character recognition, Google Translate can quickly capture character features that can be recognized as language, a sequence of words that makes sense in the lexicon and in semantics, and then deliver a real-time translation. For example, when an American travels to Japan and every landmark sign is in Japanese, he can simply hold up his smartphone and open image translation in Google Translate; the camera captures the Japanese characters and directly shows the translation in English.
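As a rough illustration of the OCR step only, the sketch below uses the open-source Tesseract engine through pytesseract; this is an assumption for demonstration, since Google Translate’s own OCR pipeline is not public, and the image path is hypothetical.

```python
from PIL import Image
import pytesseract

# Recognize the Japanese characters in a photo of a sign; the extracted text
# would then be passed on to the translation step.
image = Image.open("landmark_sign.jpg")                 # hypothetical photo of a Japanese sign
text = pytesseract.image_to_string(image, lang="jpn")   # OCR: pixels -> characters
print(text)
```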

2.    Speech Translation

Besides image recognition, speech recognition is heavily used in machine translation as well. In speech recognition, the acoustic signal is identified as a sequence of phonemes, the basic speech sounds. As with the visual recognition discussed above, the same word can be pronounced differently because of age, gender, or accent: the acoustic signal contains features related to the words and other features related to the speaker. In machine translation, the learning program is trained only on the characteristics of the words, not those of the speakers. Google Translate, however, also makes use of the speaker-related features for Conversation Translation, which not only recognizes the input words but also identifies the different people in a dialogue. To continue the example of traveling in Japan: if the American visitor asks a local person who does not speak English for directions, Conversation Translation can provide instant, real-time interpretation between Japanese and English for both sides as the dialogue goes on.

IV.    Conclusion

There is no doubt that the future of machine translation remains burdensome, as numerous problems are left to be solved, for example under-translation, data sparseness, and the difficulty of building knowledge bases. Nevertheless, GNMT has upgraded NMT in many respects. Over the past decade, the Google team has published a tremendous number of high-quality technologies and design ideas, keeping a leading role in machine translation. Since machine translation is one of the ultimate goals that humans strive for in artificial intelligence, translation accuracy and speed will keep increasing, so that machines may eventually do a better job than human beings in the industry.

Works Cited

Amidi, Afshine, and Shervine Amidi. “Super VIP Cheatsheet: Deep Learning.” Stanford University, Nov. 25, 2018.

Alpaydin, Ethem. Machine Learning. MIT Press, 2016.

Britz, Denny, et al. “Massive Exploration of Neural Machine Translation Architectures.” Mar. 2017. https://arxiv.org/pdf/1703.03906.pdf

Castelvecchi, Davide. “Deep learning boosts Google Translate tool.” Nature, Sep. 27, 2016. https://www.nature.com/news/deep-learning-boosts-google-translate-tool-1.20696

Geitgey, Adam. “Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences.” Medium, Aug 21, 2016. https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa

Google Neural Machine Translation. Wikipedia, 2019, https://en.wikipedia.org/wiki/Google_Neural_Machine_Translation

“Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation.” Transactions of the Association for Computational Linguistics, vol. 5, MIT Press, pp. 339–51, doi:10.1162/tacl_a_00065. https://arxiv.org/abs/1611.04558

Herbig, Nico, et al. “Integrating Artificial and Human Intelligence for Efficient Translation.” arXiv.org, Cornell University Library, arXiv.org, Mar. 2019, http://search.proquest.com/docview/2189132198/.

Karami, Omid. “The brief view on Google Translate Machine.” https://pdfs.semanticscholar.org/c6f9/5d543c0b34c4026b9e6cf64decd94b793823.pdf

Le, Quoc V. et al. “A Neural Network for Machine Translation, at Production Scale.” Google AI Blog, Sep. 27, 2016, https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

Lidwell, William., et al. Universal Principles of Design. Rev. and updated ed., Rockport Publishers, 2010.

Misra, Aryan. “Using RNNs for Machine Translation.” Medium, Mar. 10, 2019. https://towardsdatascience.com/using-rnns-for-machine-translation-11ddded78ddf

Wu, Yonghui, et al. “Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation.” arXiv.org, Cornell University Library, arXiv.org, Oct. 2016, http://search.proquest.com/docview/2080906303/.

Papineni, et al. “BLEU: a Method for Automatic Evaluation of Machine Translation.” Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318. https://www.aclweb.org/anthology/P02-1040.pdf

Pestov, Ilya. “A History of Machine Translation from the Cold War to Deep Learning.” Free Code Camp, Mar. 12, 2018 https://medium.freecodecamp.org/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5

Poibeau, Thierry. Machine Translation. MIT Press, 2017.

“Statistical Vs. Neural Machine Translation.” United Language Group. https://unitedlanguagegroup.com/blog/statistical-vs-neural-machine-translation/

Sukhadia, Nihar. “Applications of Artificial Intelligence in Neural Machine Translation.” International Research Journal of Engineering and Technology, Vol. 05, Issue 10, Oct, 2018. https://www.irjet.net/archives/V5/i10/IRJET-V5I1047.pdf

Takano, Tetsuto, and Yamane, Satoshi. “Machine Translation Considering Context Information Using Encoder-Decoder Model.” arXiv.org, Cornell University Library, arXiv.org, Mar. 2019, http://search.proquest.com/docview/2201984233/.

Uszkoreit, Jakob. “Transformer: A Novel Neural Network Architecture for Language Understanding.” Google AI Blog, Aug. 31, 2017. https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

Vaswani, Ashish, et al. “Attention Is All You Need.” arXiv.org, Cornell University Library, arXiv.org, Dec. 2017, http://search.proquest.com/docview/2076493815/.

Machine Translation & Data Privacy

Tianyi Zhao

Artificial intelligence, although developing rapidly today, is still a black box waiting to be explored and exploited. We are currently at the stage of leveraging neural networks, in which machines can learn advanced algorithms from practice and testing. The fields of AI that have impressed me most during the course are machine learning and natural language processing, and the typical application combining the two is machine translation. As deep learning develops, neural networks have been applied to machine translation and are replacing the earlier statistical approach. In the encoder-decoder model, the source sentence is encoded into a fixed-length vector from which a decoder generates the translation. The model uses context to find more accurate words and automatically produces more natural syntax, so sentences are smoother and more readable. Google Translate completed its transformation from statistical to neural machine translation, with multiple input methods, in 2016. However, the technology still has problems with word order and word choice in practice. Pattern recognition has also been applied to machine translation, mainly in two forms, image and speech. The source input may be multimedia, but the output is always text. Personally, I think the next step for machine translation is not only improved accuracy but also more diverse output. In the near future, there may no longer be simultaneous interpreters.

Moreover, improving the accuracy of machine processing requires big data, which raises the prevalent issue of privacy. How can we guarantee that the data used to train machines is collected legally or with authorization? There have been numerous data abuse scandals among the tech giants. In one study of Google Translate URLs, a police investigator was found to have translated requests for assistance made to foreign police forces, and the confidential information was no longer “confidential” because of online translation. Don DePalma of Common Sense Advisory warned that “free machine translation tools such as Google Translate can inadvertently result in a data leak.” (Brown, 2017) As machine learning becomes more popular, what can ordinary users do to protect their data while enjoying the comfort and convenience that machine learning brings?

Works Cited

Brown, Claire. “GDPR: Beware Data Leaks via Online Search and Translation Tools.” Today Translations, Oct. 22, 2017.

Big Data and Privacy

Tianyi Zhao

Figure 1. “On the Internet, nobody knows you’re a dog.” Peter Steiner, The New Yorker, 1993.

(Source: https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog)

The picture shown above is an adage and meme about Internet anonymity by Peter Steiner, published in The New Yorker twenty-six years ago. That anonymity has also been blamed for numerous crimes. In the era of big data, however, everyone on the Internet is so “naked” that each of our traits and habits is represented in binary as 0s and 1s. The data owners benefit from exploiting the datasets we produce. Privacy is no longer private unless we disconnect from the Internet completely.

On the one hand, we enjoy the comfort and convenience brought by big data, which seems to know us better than we know ourselves; on the other hand, we have to give up individual privacy. We detest having our identities sold and exploited, yet we do not really mind that our data is collected, analyzed, and used as long as it is not linked to us as specific individuals, because we tend to believe that if governments or commercial organizations collect more personal data, they will provide better public services and let customers consume better products at lower prices.

However, there have been a number of data abuse scandals in recent years. Facebook was criticized after information from more than 50 million Facebook users was accessed and exploited by the political data firm Cambridge Analytica to target users with tailored advertising in support of Donald Trump’s 2016 presidential campaign. Yahoo has been under tremendous pressure for risking the privacy of 3 billion people. Google faces a big challenge of its own: the launch of the GDPR in the E.U. Controlling over 90% of the market for general web searches in many European countries, compared with 68% in the U.S. market, Google collected and analyzed more data than any other company. On January 21, 2019, Google was fined nearly $57 million by French regulators for violating GDPR rules because it failed to “fully disclose to users how their personal information is collected and what happens to it.” (Romm, 2019) The GDPR is clearly a significant extension of the global process of policy convergence, in which the criteria for convergence are deepening. (Bennett, 2018) Before the GDPR took effect in 2018, around 120 countries had already passed data protection statutes meeting at least the minimum standards of formal international agreements. (Greenleaf, 2017) Steve Wilson coined the term Big Privacy, referring to a data privacy compact for the era of big data and AI. It is designed to enhance transparency about how personal data is collected and created, encourage restraint in how it is used, and grant customers appropriate control over data about them. (Wilson, 2018) Legal systems around the world are improving privacy protection, and the GDPR, although the strictest regime so far, is just a starting point. The technology giants should self-regulate beyond mere legal compliance by embracing the concept of Big Privacy, beginning with compliance with the GDPR.

 

Works Cited

Romm, Tony. “France fines Google nearly $57 million for first major violation of new European privacy regime.” The Washington Post, Jan. 21, 2019.

“Top Data Privacy and Security Scandals.” Datafloq, Nov. 15, 2018.

Bennett, Colin J. “The European General Data Protection Regulation: An instrument for the globalization of privacy standards?” Information Polity, 2018. https://pdfs.semanticscholar.org/3813/041fc44467933d64c54c3e39a467c2be63c3.pdf

Greenleaf, G. “Global Data Privacy Laws 2017: 120 National Data Privacy Laws, Including Indonesia and Turkey.” Privacy Laws & Business International Report, Jan 30, 2017.

Wilson, Steve. “Big Privacy: The data privacy compact for the era of big data and AI.” ZDNet, Dec 5, 2018.

Cloud Computing in AWS

Tianyi Zhao

It has been thirteen years since cloud computing became popular. Cloud computing has achieved rapid development and dramatic change; it is another transformation following the shift from large-scale computers to client servers, this time from shrinking hardware to the cloud as a form of computing. Organizations around the world have invested in cloud computing and continue to go deeper. According to IDG’s 2018 Cloud Computing study, seventy-three percent of organizations have at least one application, or a portion of their computing infrastructure, already in the cloud, while 17% plan to do so within the next twelve months. Although cloud computing has been welcomed by the market, its definition has not been unified across parties. The National Institute of Standards and Technology (NIST) defines it as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources… that can be rapidly provisioned and released with minimal management effort or service provider interaction.” There is no doubt that this wording will need further updates as cloud computing evolves.

Amazon Web Services (AWS), officially launched in 2006, popularized cloud computing and has maintained its leading role: its market share nudged up a percentage point to 34%, remaining bigger than its next four competitors (Microsoft, IBM, Google, and Alibaba) combined. AWS keeps its dominance by offering a wide range of cloud computing facilities and developing a highly scalable, on-demand computing platform that provides the full computing stack in the form of virtual resources.

Figure 1. AWS Architecture Diagram

(Source: https://www.researchgate.net/figure/AWS-provides-various-cloud-computing-services-such-as-Compute-Storage-Networking_fig3_328773947)

Among the AWS products, Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) account for the largest share. The users are mainly the system administrators and developers of the enterprises that purchase the services, which help them move faster, lower IT costs, and scale. Beyond flagship products such as EC2, S3, and RDS, one service, Amazon Transcribe, neatly combines the cloud with machine learning. It is an automatic speech recognition service that takes in audio and automatically generates accurate transcripts, helping developers add speech-to-text capability to their applications, including customer service, subtitling, search, and compliance.

Figure 2. How Amazon Transcribe Works

(Source: https://www.slideshare.net/AmazonWebServices/new-launch-introducing-amazon-transcribe-now-in-preview-mcl215-reinvent-2017)

The figure above shows the general routine of how Amazon Transcribe works:

  • Speech input: store the audio file as an object in an Amazon S3 bucket and specify the language and format of the input file.
  • Speaker identification: identify the individual speakers in an audio clip (between 2 and 10 speakers).
  • Channel identification: split the audio file into multiple channels and transcribe the channels separately; once all channels are transcribed, the transcriptions are merged into a single transcript.

Additionally, Amazon Transcribe can transcribe streaming audio in real time and accept custom vocabularies for higher accuracy. The success of Amazon Transcribe rests on its attention to punctuation, confidence scores, possible alternatives, timestamp generation, custom vocabulary, and multiple speakers, and it is continually learning and improving.
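As a concrete illustration of the routine above, here is a minimal sketch using the AWS SDK for Python (boto3); the bucket, audio file, and job name are hypothetical placeholders, and the settings simply mirror the speaker-identification option described earlier.

```python
import boto3

# Start a transcription job on an audio file already stored in S3.
# Bucket, file, and job names below are hypothetical.
transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.start_transcription_job(
    TranscriptionJobName="demo-call-2019-04-01",
    Media={"MediaFileUri": "s3://my-example-bucket/support-call.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
    Settings={
        "ShowSpeakerLabels": True,   # speaker identification
        "MaxSpeakerLabels": 2,       # between 2 and 10 speakers
    },
)

# Check the job status; when it is COMPLETE, the response includes the transcript location.
job = transcribe.get_transcription_job(TranscriptionJobName="demo-call-2019-04-01")
print(job["TranscriptionJob"]["TranscriptionJobStatus"])
```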

 

Works Cited

AWS. Amazon Transcribe Developer Guide. 2019.

Tiwari, R., et al. “Project Workflow Management: A Cloud Based Solution-Scrum Console.” International Journal of Engineering and Technology (UAE), vol. 7, no. 4, Science Publishing Corporation Inc, 2018, pp. 2457–62, doi:10.14419/ijet.v7i4.15799.

Ruparelia, Nayan B. Cloud Computing. Cambridge, MA: MIT Press, 2016.

Rountree, Derrick, and Ileana Castrillo. The Basics of Cloud Computing: Understanding the Fundamentals of Cloud Computing in Theory and Practice. Amsterdam; Boston: Syngress / Elsevier, 2014.

2018 Cloud Computing Survey. IDG, Aug. 2018.

Bozicevic, Vedran. “State of Cloud Computing Report 2019: Cloud Spending is on the Rise.” GlobalDots, Jan. 2019.

Bailey, James. “AI-Powered Transcription Services Showdown: AWS VS. Google VS. IBM Watson VS. Nuance.” Armedia, Jan. 2019.

Tech Behind Siri

Tianyi Zhao

Siri, launched by Apple Inc. in 2011, is by now a familiar voice assistant. It simplifies navigating our iPhones and completing our requests by listening to and recognizing our voice. For example, Siri can give the weather forecast, call a user’s contacts, or even tell a joke. The technologies behind Siri are mainly speech recognition and natural language processing, two significant branches of machine learning.

 

Speech Recognition and Speaker Recognition

Speech recognition converts the acoustic signal from a human into its corresponding textual form; it primarily examines “what the user says.” In addition to speech recognition, Siri also uses speaker recognition to achieve personalization, which focuses on “who is speaking.” For instance, a user can simply say “Hey Siri” to invoke Siri, but it will not respond if someone other than the enrolled user says the same words. Speaker recognition involves two processes: enrollment and recognition. User enrollment occurs when the user follows the set-up guidance on a new iPhone: by asking the user to say several sample phrases, a statistical model of the user’s voice is created. The five sample phrases requested from the user are shown below, in order:

  1. “Hey Siri”
  2. “Hey Siri”
  3. “Hey Siri”
  4. “Hey Siri, how is the weather today?”
  5. “Hey Siri, it’s me.”

Figure 1.  Block diagram of Personalized Hey Siri

(Source: https://machinelearning.apple.com/2018/04/16/personalized-hey-siri.html)

The figure shows how Personalized Hey Siri proceeds. In feature extraction, the acoustic input is converted into a fixed-length speaker vector that reflects the phonetic content, the background environment, and the user’s identity. The speaker’s characteristics are then emphasized while other factors, such as phonetic and environmental factors, are de-emphasized, so that recognition stays accurate in any circumstances. The five sample phrases therefore generate five speaker vectors, which are stored in the user profile on each Siri-enabled device.
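The matching idea behind this design can be sketched as follows: a new utterance’s speaker vector is compared against the enrolled profile vectors, for example by cosine similarity. The vectors, dimension, and acceptance threshold below are illustrative assumptions, not Apple’s actual model.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
profile = [rng.standard_normal(128) for _ in range(5)]    # 5 enrolled speaker vectors
utterance = profile[0] + 0.1 * rng.standard_normal(128)   # a new "Hey Siri" from the same user

score = max(cosine(utterance, v) for v in profile)        # best match against the profile
print(f"match score: {score:.2f}")
if score > 0.7:                                           # hypothetical acceptance threshold
    print("Speaker accepted: invoke Siri")
```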

Natural Language Processing

Figure 2. Deep Neural Network in Siri

(Source: https://machinelearning.apple.com/2017/10/01/hey-siri.html)

After Siri understands what the user is saying, the converted text is sent to Apple servers, where further natural language processing algorithms examine the intent of the user’s words. Figure 2 shows how the Deep Neural Network (DNN) works in Siri. The DNN “consists mostly of matrix multiplications and logistic nonlinearities. Each ‘hidden’ layer is an intermediate representation discovered by the DNN during its training to convert the filter bank inputs to sound classes. The final nonlinearity is essentially a Softmax function.” (Siri Team, 2017)
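A tiny numerical sketch of that description, with illustrative layer sizes and random weights, looks like this: matrix multiplications, a logistic (sigmoid) nonlinearity for the hidden layer, and a final softmax over hypothetical sound classes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
x = rng.standard_normal(40)              # one frame of filter-bank features
W1, b1 = rng.standard_normal((128, 40)), np.zeros(128)
W2, b2 = rng.standard_normal((20, 128)), np.zeros(20)

h = sigmoid(W1 @ x + b1)                 # hidden representation
probs = softmax(W2 @ h + b2)             # probabilities over 20 hypothetical sound classes
print(probs.sum(), probs.argmax())       # sums to ~1.0; index of the most likely class
```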

 

Works Cited

Alpaydin, Ethem. Machine Learning: the New AI. The MIT Press, 2017.

Siri Team. “Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant.” October 2017. https://machinelearning.apple.com/2017/10/01/hey-siri.html

Siri Team. “Personalized Hey Siri.” April, 2018.

https://machinelearning.apple.com/2018/04/16/personalized-hey-siri.html

Goel, Aman. “How Does Siri Work? The Science Behind Siri.” Magoosh, Feb. 2, 2018.

https://magoosh.com/data-science/siri-work-science-behind-siri/

 

 

Machine Translation, from Statistical to Neural Era

Tianyi Zhao

Translation applications have become increasingly popular as globalization accelerates. From functioning merely as dictionaries for word translation to handling paragraphs and idioms, machine translation has been widely applied as more languages are supported and the technology evolves rapidly. Machine translation has become a significant field of computer science, computational linguistics, and machine learning. Beginning with rule-based systems, machine translation has advanced to statistical and neural approaches, the two prevalent ones today. As deep learning develops, however, neural networks are gradually replacing statistical machine translation.

Statistical Machine Translation

Figure 1. Statistical Machine Translation Pipeline

(Source: https://www.researchgate.net/figure/Basic-Statistical-Machine-Translation-Pipeline_fig2_279181014)

Statistical machine translation uses predictive algorithms to teach the machine to translate from a parallel bilingual text corpus. The machine leverages what it has been taught, the translated text, to predict the translation of new sentences. It is data-driven and needs only corpora of the source and target languages. However, word or phrase alignment breaks sentences down into independent words or phrases during translation, and a word cannot be considered and translated until the previous one has been finished. Besides, corpus collection is costly in time and effort. The statistical approach cannot be predominant, because “[it] consists for the most part in developing large bilingual dictionaries manually.” (Poibeau, 139) Additionally, the translation results may have only superficial fluency, which can cause misunderstanding.

Neural Network Machine Translation

Figure 2. Neural Machine Translation

(Source: https://www.morningtrans.com/welcome-to-the-brave-new-world-of-neural-machine-translation/)

Neural machine translation is a more advanced approach than the statistical one. It is inspired by the neural networks of the human brain, so information is delivered to different “layers” to be processed before output. Compared to the statistical approach, neural machine translation does not require alignment between the languages. Instead, it “attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation.” (Bahdanau, 1) It is an encoder-decoder model, in which the source sentence is encoded into a fixed-length vector from which a decoder generates the translation. By applying deep learning techniques, neural machine translation can teach itself to translate based on statistical models. According to Ethem Alpaydin, the process of neural machine translation starts with multi-level abstraction over lexical, syntactic, and semantic rules. A high-level abstract representation is then extracted, and the translated sentence is generated by “decoding where we synthesize a natural language sentence” in the target language “from such a high-level representation.” (Alpaydin, 109) It uses context to find more accurate words and automatically produces more natural syntax, so sentences are smoother and more readable.

All in all, although statistical machine translation still prevails, it will be superseded by the emerging neural networks. I believe neural machine translation is the near future, because it has the advantages of quality and speed, which are precisely the true values of machine translation.

 

Works Cited

Alpaydin, Ethem. Machine Learning: The New AI. The MIT Press. 2016.

Kaplan, Jerry. Artificial Intelligence: What Everyone Needs to Know. Oxford UP, 2016.

Pino, Juan Miguel, Alexander Sidorov, and Necip Fazil Ayan. “Transitioning entirely to neural machine translation.” Facebook Code, Aug. 3, 2017. https://code.fb.com/ml-applications/transitioning-entirely-to-neural-machine-translation/

Poibeau, Thierry. Machine Translation. MIT Press, 2017.

Hao, Tianyong, et al. “Natural Language Processing Empowered Mobile Computing.” Wireless Communications and Mobile Computing, vol. 2018, Hindawi, 2018, p. 2, doi:10.1155/2018/9130545.

Bahdanau, Dzmitry, et al. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv.org, Cornell University Library, arXiv.org, May 2016, http://search.proquest.com/docview/2079082715/.

“Statistical vs. Neural Machine Translation.” United Language Group.

Information Transmission Model and its Meaning

Tianyi Zhao

This week’s readings mainly unveiled the information transmission model proposed by Claude Shannon, which describes a simple, unidirectional path showing how signs and symbols are encoded, transmitted, and decoded. There are six basic elements: the information source that produces the information; the transmitter that encodes it into signals; the channel over which the signal travels; the receiver that decodes the message from the signals; the destination where the message arrives; and the noise that interferes with the signal as it travels through the channel. In a conversation, for example, the transmitter is the mouth; the other person’s ears are the receivers; the signals are sound waves; and the noise could be distractions from the surrounding environment. The brains are the information source and destination, where ideas are encoded into spoken words and the words heard are decoded.

Instagram is a popular social media application based mainly on photo sharing. When we upload our photos, which are made of pixel patterns, the transmission process sends the pixels through online channels. The pixels are then reconstructed and decoded by the software and displayed on our mobile device as a recognizable photo rather than random pixels. When friends comment on the posted photos, their input words are encoded as bytes, transmitted in packets, and decoded on the device.
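A small sketch of the technical level of this process: a comment is encoded into bytes, split into toy "packets" for transmission, reassembled at the receiver, and decoded back into text at the destination.

```python
comment = "Nice photo!"

encoded = comment.encode("utf-8")       # transmitter: symbols -> bytes for the channel
packets = [encoded[i:i + 8] for i in range(0, len(encoded), 8)]   # toy packetization

received = b"".join(packets)            # receiver reassembles the signal
print(received.decode("utf-8"))         # destination decodes bytes back into symbols
```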

Professor Irvine claims that the meaning is not “in” the system; it is the system. Put another way, a message does not have meaning until people attach signs to referents. Furthermore, whether information is transmitted successfully depends on how the receivers interpret the message. The communicating parties must share a common body of knowledge, that is, they exchange messages and interpret them in an “assumed context.” (Irvine, 12) For example, suppose a Korean friend comments on my photo in Korean. As a receiver who speaks only Chinese and English, I cannot interpret the Korean characters; they are meaningless to me. The transmission fails because my Korean friend and I do not share a common language. It is also clear that all information depends on people’s minds to attach referents and interpret it.

All in all, there are two levels of communication transmission. At the technical level, information is encoded and transmitted as bytes and then decoded for display on the device. At the meaning level, senders attach signs to referents with a specific meaning, while receivers decode them to recover that meaning. The success of transmission relies on both the technical theory of information and semiotics.

 

Works Cited

Irvine, Martin. “Introduction to the Technical Theory of Information.” Feb. 4, 2019.

Denning, Peter J. and Martell, Craig H. Great Principles of Computing. MIT Press, 2015.

Google’s Machine Translation

Tianyi Zhao

The most impressive application of machine learning in natural language processing is machine translation. According to Radiant Insights Inc., the global machine translation market is expected to reach USD 983.3 million by 2022. Google Translate is one of the most prominent products in this global market. In 2016, Google Translate realized its transformation from statistical machine translation to neural machine translation, with multiple input methods.

 

Pattern Recognition

The recognition of optical and acoustic information forms two significant segments of pattern recognition, and both are fully applied in Google Translate as image translation and speech translation. With the camera, the Google Translate app can easily and quickly capture features that can be recognized as language, a sequence of words that makes sense in the lexicon and in semantics, and the real-time translation is then delivered automatically. For example, when you travel to Greece and every landmark sign is in Greek, you can simply hold up your smartphone, and the camera will automatically capture the Greek characters and show the translation in the target language. Of course, there are many fonts and handwritten styles that seem hard to recognize. However, by leveraging the learning program, the machine can quickly identify them through the distinct regularities of each character, which are generalized and shared across all kinds of fonts.

Besides image recognition, speech recognition is heavily used in machine translation as well. The acoustic signal is identified as a sequence of phonemes, the basic speech sounds. As with the visual recognition above, the same word can be pronounced differently because of age, gender, or accent. In machine translation, the learning program is trained only on the features that relate to the words, not on those of the speakers. However, Google Translate uses the speaker-related features as well, for Conversation Translation, which not only recognizes the input words but also identifies the different people in a dialogue. To continue the example in Greece: if you ask a local passerby who does not speak English how to get to your destination, Conversation Translation can provide instant, real-time interpretation between Greek and English as your dialogue goes on.

 

Neural Machine Translation

According to Ethem Alpaydin, the process of neural machine translation starts with multi-level abstraction over lexical, syntactic, and semantic rules. A high-level abstract representation is then extracted, and the translated sentence is generated by “decoding where we synthesize a natural language sentence” in the target language “from such a high-level representation.” (Alpaydin, 109) The era of phrase-based statistical translation is ending: the neural system translates an entire sentence at a time rather than cutting it into words. It uses context to find more accurate words and automatically produces more natural syntax, so sentences are smoother and more readable.

 

Works Cited

Alpaydin, Ethem. Machine Learning: The New AI. The MIT Press. 2016.

Radiant Insights. “Machine Translation Market Size Worth USD 983.3 Million By 2022: Radiant Insights, Inc.” Global Newswire. Dec. 3, 2015.

Wu, Yonghui, et al. “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.” arXiv.org, Cornell University Library, arXiv.org, Oct. 2016.

 

Understanding AI

Tianyi Zhao

My first encounter with artificial intelligence was through the robots in the film series Terminator; then came Westworld. These works visualized a new era in which people live and work with artificial intelligence, as well as the potential problems, such as the AI-human relationship, that humans will face in the near future. The AI robots depicted on screen widened our horizons about what AI is and how it can be applied in reality. This week’s readings, however, systematized my knowledge of AI for the first time. According to Margaret A. Boden, AI “seeks to make computers do the sorts of things that minds can do.” (Boden, 1) Several key points set me thinking.

The virtual machine, an information-processing system that exists in the minds of programmers and users, was a bit hard for me to understand until Boden noted that programming languages are virtual machines as well. My experience of learning Python last semester reminded and inspired me: Python’s instructions have to be translated into machine code before they can run, and the rules for running Python are deeply rooted in the minds of both those who encode and those who decode. Its rapid growth, with advantages such as extensive built-in libraries and shorter code, has made it especially favorable for AI-based projects. With the example of Python, the virtual machine becomes much easier to comprehend.

Figure 1. AlphaGo Beat Top-ranked Professional Player

(Source: https://www.hardwarezone.com.sg/tech-news-googles-alphago-ai-just-beat-number-one-ranked-go-player-world)

Go, an ancient board game that demands complicated prediction and planning, was long thought to be a game only humans could master. However, AlphaGo broke that belief by beating the world’s best professional players starting in 2016. This unexpected success is credited to AI planning techniques. A plan specifies a sequence of actions toward a final goal, and reaching the final goal involves many sub-goals. According to Boden, a planning program needs symbolic operators, a set of prerequisites for each action, and “heuristics for prioritizing the required changes and ordering the actions.” (Boden, 26) Together, these enable AlphaGo to plan every step among many possible moves and reach the final win.

Figure 2. Some of Siri’s Functions

(Source: https://www.apple.com/siri/)

Furthermore, one of the most prevalent applications of AI is natural language processing, an area at the intersection of computer science, AI, and linguistics that aims to develop theories and methods for effective communication between humans and computers in natural language. Apple’s Siri is a typical example. Serving as a personal assistant, Siri can answer a variety of questions and quickly access whatever applications hold the information needed. Its built-in conversational analysis rapidly analyzes the input sentences, spoken or typed, and decides on answers that satisfy the user’s preferences.

All in all, AI is already around us and developing quickly. The concept of the virtual machine, for example a programming language, becomes easier to understand when applied to a specific language such as Python. Planning techniques are prevalent in AI and developing in big strides, as the example of AlphaGo shows, and natural language processing is a widespread application of AI, as the brief analysis of Siri illustrates.

 

Works Cited

Boden, Margaret A. AI: Its Nature and Future. Oxford: Oxford University Press, 2016.

Warwick, Kevin. Artificial Intelligence: The Basics. New York: Routledge, 2012.

Apple. “Siri.” https://www.apple.com/siri/ (2019)