Week 7

Deep Neural Nets and the Future of Machine Translation

GNMT (Google Neural Machine Translation) utilizes an end-to-end deep neural network (E2E/DNN) to perform encoding and decoding at a much faster rate than the traditional statistical models used by other machine translation methods. (Le and Schuster 2016) Compare this with IBM’s statistical word-level translation models, which first had to split a sentence into words and then find the best translation for each word. (Poibeau 117) GNMT translates entire sentences rather than individual words, which allows it to take into account more of the ambiguities of spoken and written language and produce a much more accurate translation. (Le and Schuster 2016)

While GNMT is still essentially a statistical model, it operates on a much grander scale than previous methods. (Poibeau 147) Training with GNMT is also much more efficient than with the earlier statistical models, which lacked the “learning” aspect and required manual fixes whenever design flaws were discovered. (Poibeau 147) A huge difference between neural networks now and the old statistical models is the multidimensional relationship between encoding and decoding, meaning that “…higher linguistic units, like phrases, sentences, or simply groups of words, can be compared in a continuous space…” (Poibeau 149)
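
To make the idea of comparing phrases “in a continuous space” a bit more concrete, here is a tiny sketch (my own illustration, not anything from GNMT itself) that compares made-up phrase vectors with cosine similarity; real systems learn embeddings with hundreds of dimensions from data rather than hand-picking numbers.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-dimensional embeddings for three phrases (made-up values).
vec_cat_on_mat = np.array([0.9, 0.1, 0.4, 0.0])
vec_feline_on_rug = np.array([0.8, 0.2, 0.5, 0.1])
vec_stock_market = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine_similarity(vec_cat_on_mat, vec_feline_on_rug))  # high: similar meaning
print(cosine_similarity(vec_cat_on_mat, vec_stock_market))   # low: unrelated
```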

However, GNMT still requires significant human interaction during the training process, because the neural network needs to target the most relevant data rather than waste computational resources on unnecessary data. (Roza 2019) Adding features also becomes an issue: if new features are added, the old DNN becomes obsolete. (Roza 2019) Another issue facing DNNs is the massive amount of data required to produce the best results; however, it seems the amount of available data is becoming less of an issue than the organization of that data.

Some pressing questions which came to mind during the readings were:

  1. Why is Chinese (Mandarin) lagging behind other languages in GNMT translation quality? (Le and Schuster 2016) Is it possible that data on Mandarin is less accessible to the DNN than data on other languages?  
  2. How fluid is DNN learning? Does the training stop once the DNN is actively deployed? 
  3. How exactly are statistical models different from DNNs? Is it simply that DNNs continue to learn, whereas statistical models rely on a less fluid database? 

References

Le, Quoc, and Mike Schuster. “A Neural Network for Machine Translation, at Production Scale.” Google AI Blog, http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html. Accessed 9 Mar. 2021.

Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.

Roza, Felp. “End-to-End Learning, the (Almost) Every Purpose ML Method.” Medium, 18 July 2020, https://towardsdatascience.com/e2e-the-every-purpose-ml-method-5d4f20dafee4.

Think of things backwards: How does Google Translate work?

Machine translation can be thought of as a simple model: we input words or sentences, the machine analyzes the input, transforms it, and then generates the output, words or sentences in another language. Before neural networks were applied to machine translation, systems typically used one of three architectures, direct, transfer, or interlingua, none of which involves probability and statistics. In fact, “GNMT did not create its own universal interlingua but rather aimed at finding the commonality between many languages using insights from psychology and linguistics” (McDonald, 2017).

Google Translate belongs to the family of statistical MT. When it comes to statistical MT, “All statistical translation models are based on the idea of a word alignment. A word alignment is a mapping between the source words and the target words in a set of parallel sentences,” according to Speech and Language Processing. Google Translate uses a bidirectional RNN to align the input and output. First, it encodes the input sentence into vectors with one RNN for the input language, which handles the encoding. Then the system tries to map those vectors to the many vectors that represent words in the output language (the words here are, in fact, still vectors) to find which alignment is best. It is just like what Speech and Language Processing says: “think of things backwards.” The task is to find the hidden output vectors that can generate the input vectors, and from this step we can see that it is supervised machine learning. After the mapping, or alignment, the matched vectors are decoded into words in the output language.
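
To get a feel for what a word alignment is, here is a toy sketch in the spirit of the classical IBM Model 1 (a statistical alignment model, not the bidirectional-RNN approach Google actually uses): it learns translation probabilities t(target | source) from a tiny made-up parallel corpus with a few rounds of expectation-maximization.

```python
from collections import defaultdict

# A tiny made-up parallel corpus of (source, target) sentence pairs.
corpus = [
    ("the house".split(), "la maison".split()),
    ("the book".split(), "le livre".split()),
    ("a book".split(), "un livre".split()),
]

# Initialize t(target_word | source_word) uniformly.
source_vocab = {s for src, _ in corpus for s in src}
target_vocab = {t for _, tgt in corpus for t in tgt}
t = {(tw, sw): 1.0 / len(target_vocab) for sw in source_vocab for tw in target_vocab}

for _ in range(10):  # a few EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:
        for tw in tgt:
            # How strongly does each source word "explain" this target word?
            norm = sum(t[(tw, sw)] for sw in src)
            for sw in src:
                frac = t[(tw, sw)] / norm
                count[(tw, sw)] += frac
                total[sw] += frac
    # Re-estimate translation probabilities from the expected counts.
    t = {(tw, sw): count[(tw, sw)] / total[sw] for (tw, sw) in count}

# After training, the most probable target word for each source word:
for sw in sorted(source_vocab):
    best = max(target_vocab, key=lambda tw: t.get((tw, sw), 0.0))
    print(sw, "->", best)
```

Even on three sentence pairs, “book” gravitates toward “livre” and “a” toward “un”; words that always co-occur (like “house” with both “la” and “maison”) stay ambiguous, which is exactly why real systems need far more data.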

Question:

Does Google Translate use English as a bridge to link with other languages, like Chinese and Japanese?

How does statistical machine translation deal with syntax and semantics? Does it use probability and statistics to skip over these kinds of problems?

References

A Neural Network for Machine Translation, at Production Scale. (n.d.). Google AI Blog. Retrieved March 9, 2021, from http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

CS Dojo Community. (2019, February 14). How Google Translate Works—The Machine Learning Algorithm Explained! https://www.youtube.com/watch?v=AIpXjFwVdIE

Jurafsky, D., & Martin, J. H. (2000). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall.

McDonald, C. (2017, January 7). Ok slow down. Medium. https://medium.com/@chrismcdonald_94568/ok-slow-down-516f93f83ac8

NLP and Google Translate

NLP is an interdisciplinary field that combines knowledge from linguistics, computer science, and artificial intelligence. There are two main categories of NLP. The first is Natural Language Understanding, which concerns how a computer can pull meaningful information out of a cluster of words and categorize it further, like Gmail recognizing spam emails and moving them to the trash for us. The second category is Natural Language Generation, which is more complicated, since it is also responsible for understanding context and extracting the key information from it. The purpose is to make the machine understand humans’ natural language, so it can build a connection that allows interactions between humans and machines to take place. Several methods have been used to achieve this goal. Morphology allows the computer to learn from different word roots and categorize words based on shared similarity. Distributional semantics learns the meaning of words based on the occurrence and frequency of words that appear in the same sentence or context. There are also encoder-decoder models, where a computer can encode a sentence and decode it into a string of unsupervised clusters. However, since there are too many key details for the machine to remember, RNNs were introduced, where a computer learns just a representation of each word and makes predictions. For example, every time I use Google Docs and repeatedly write about a single word over and over again under a specific topic, it automatically provides suggestions, usually phrases or terms, that are highly relevant to the topic I’m writing about. Right now, since I keep talking about AI-related material, Google Docs gives me suggestions like “computer science”, “machine learning”, and “natural language processing” whenever I type something starting with “machine”, “computer”, or “natural”, even when that is not what I intended to type. 

Since the computer cannot understand human natural languages, it is necessary to transform those characters into a language the computer can understand, for example, numbers like 0 and 1. Phrase structure rules are the grammar for computers, and a parse tree can tag every word with a part of speech and further reveal how the sentence is structured, which allows the computer to access, process, and respond to the information more efficiently. How does Google Translate work exactly? We cannot use word-to-word translation, because natural language requires a reasonable sequence to be understood. So the first step is to encode the natural language into numbers that computers can operate on, which is called the vector mapper; then, in order to get the language we want, the broken-apart pieces have to be generated into a whole sentence again. We simply reverse the whole process, from the recurrent neural network back to the vector mapper. Throughout this process, the neural network allows the computer to learn patterns from a massive number of real examples based on actual conversations and writings. 
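
As a very rough illustration of just the bookkeeping part of the “vector mapper” (turning words into numbers and reversing the process), here is a sketch with made-up sentences; the learned neural representations are of course far richer than integer ids.

```python
# Build a toy vocabulary that maps each word to an integer id, and back.
sentences = ["the cat sat", "the dog sat"]
vocab = {}
for sentence in sentences:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))
inverse_vocab = {idx: word for word, idx in vocab.items()}

def encode(sentence):
    """Turn a sentence into a list of integer ids the model can operate on."""
    return [vocab[word] for word in sentence.split()]

def decode(ids):
    """Reverse the process: turn ids back into words."""
    return " ".join(inverse_vocab[i] for i in ids)

ids = encode("the dog sat")
print(ids)          # [0, 3, 2]
print(decode(ids))  # "the dog sat"
```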

Questions: How does the computer “understand” the grammar, exactly? Or does it in fact not understand at all, and just find patterns after all? 

 

References:

CS Dojo Community. (2019, February 14). How Google Translate Works – The Machine Learning Algorithm Explained! YouTube. https://www.youtube.com/watch?v=AIpXjFwVdIE

CrashCourse. 2017. Natural Language Processing: Crash Course Computer Science #36. https://www.youtube.com/watch?v=fOvTtapxa9c.

CrashCourse. 2017. Machine Learning & Artificial Intelligence: Crash Course Computer Science #34. https://www.youtube.com/watch?v=oi0JXuL19TA.

“Machine Translation.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=999926842.

“Natural Language Processing.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=1009043213.

A Neural Network for Machine Translation, at Production Scale. (2016, September 27). Google AI Blog. https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

 

Lost in Translation – take 2

If not all of us, most of us have definitely used Google Translate at some point in our lives. I don’t doubt that some instances have been successful, but I bet most unfortunately have not. The reason is that natural human language is such a complex labyrinth of a system that sometimes it can only truly be processed in the human mind, since we can understand and interpret the context, meaning, and overall situational understanding that come hand in hand with language. This is something that computers and machines have not been able to achieve perfectly yet, because assigning cultural context, interpretations, etc. is extremely challenging for a machine, and if we think about it, it is challenging even for humans: often, if you are not a native speaker and have not grown up in the country of the language, you miss a lot of cultural nuances, interpretations, signs, etc. As Thierry Poibeau (2017) explains: “Natural language processing is difficult because, by default, computers do not have any knowledge of what a language is. […] There is a dynamic co-construction of interpretation in the brain that is absolutely natural and unconscious” (Poibeau, 2017, 23-26). However, this challenge is a highlight and key point in today’s technology and leads the path for further advancements, as nothing can be done without language. So how does Google Translate work, and why is it often not as reliable as we would like (aka why does the translation never make perfect sense in the final language)? 

Natural Language Processing and Machine Learning play a crucial role in how Google Translate, IPAs, and many more computers and systems are able to read, interpret, understand, and emulate a sentence, whether phonetically or in the context of translation, in order to fit natural language and real human dialogue standards. NLP is “concerned with the interactions between computers and human language. How to program computers to process and analyze large amounts of natural language data.” (Wikipedia, 2021) It is needed and used to understand the context and contextual nuances to which I was referring above. Computational linguistics also plays a role in this “decoding” (pun intended) process and is “concerned with the computational modeling of natural language, the study of appropriate computational approaches to linguistic questions” (Wikipedia, 2021). One of the fundamental problems in NLP was sentence deconstruction: the ability of a computer to break a sentence down into bite-size pieces so that it can be processed more easily. Of course, it also has to follow phrase structure rules in order to encapsulate the grammar and meaning of a language. As we have seen from this week’s and previous weeks’ readings, parse trees, neural networks, natural language understanding (how we get meaning out of combinations of letters), natural language generation (how to generate language from knowledge), distributional semantics (making the machine/computer guess which words have similar meanings by seeing which words appear in the same sentence a lot), count vectors (the number of times a word appears in the same place or sentence as other common words), etc. all build up a system where all this data is pulled and “stored in a web of semantic information where entities are linked to one another in meaningful relationships and informational sentences” (CrashCourse #7, 2019; CrashCourse #36, 2017). 
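
Since phrase structure rules and parse trees keep coming up, here is a small sketch using NLTK’s toy grammar tools; the grammar below is a made-up fragment written only for illustration, nothing close to a full grammar of English.

```python
import nltk

# A tiny phrase-structure grammar: a sentence is a noun phrase plus a verb phrase, etc.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'ball'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
sentence = "the dog chased a ball".split()
for tree in parser.parse(sentence):
    tree.pretty_print()  # draws the parse tree, tagging each word's part of speech
```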

Google Translate uses Machine Translation (constituted by all of the above and more): “the idea is to give the most accurate translation of everyday texts. […] The ultimate goal is to obtain quality of translation equivalent to that of a human being.” (Poibeau, 2017, 14-21). Google Translate relies on NNs, as they look at thousands and thousands of examples to give the best solution/result. Using the Encoder-Decoder model, MT “builds internal representations and generates predictions.” The encoder (a model that can read in the input, aka the sentence) stands for “what we should think and remember about what we just read,” and the decoder “decides what we want to say or do” (CrashCourse #7, 2019). The NN converts the words into a form, numbers, vectors, and matrices, that the computer understands. Recurrent Neural Networks have a loop that allows them to reuse a single hidden layer, which gets updated as the model reads one word at a time; by training the model on which word to predict next, both the encoder RNN and the decoder prediction layer get trained. These RNNs are long short-term memory RNNs (LSTM-RNNs) that can deal with longer sentences (instead of just words) much better (CS Dojo Community, 2019). By doing this consistently, if the computer notices that two words mean something similar, the model makes their vectors more similar and can therefore “predict” the word that will follow the next time it is asked to do so (CrashCourse #7, 2019). In the E-D model, the encoder takes the words/sentences and the RNN turns them into a vector (sequence to vector), and the decoder takes that vector and the RNN turns it into the words of the other language (vector to sequence) (CS Dojo Community, 2019). However, this fails to address the complexity that comes with contextual meaning and understanding, and it can be limiting when we are dealing with longer sentences that have more than 15-20 words. Just because words have similar meanings doesn’t mean that they can necessarily be interchanged in all contexts. (See blog post)

The solution that Google Translate has “come up with” is replacing the RNN with a bidirectional RNN, which uses an attention mechanism between the encoder and the decoder and helps the computer know which words to focus on while generating the words of the other language (CS Dojo Community, 2019). During the translation process, a, let’s say, English sentence is “fed into” the encoder and “translated” (again, pun intended) into a vector, which is then taken by the attention mechanism, which correlates which, let’s say, Greek word will be generated by which English words. The decoder then generates the result of the translation (in Greek) by focusing on one word at a time, as those words have been determined by the attention mechanism. In this specific case, Google Translate actually uses 8 LSTMs, because “deeper networks help better model complex problems” and are “more capable of understanding the semantics of language and grammar” (CS Dojo Community, 2019). 
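
Here is a minimal numpy sketch of the attention idea, dot-product attention over made-up encoder states; real systems learn these vectors and usually use a more elaborate scoring function, so this only shows the “which words to focus on” computation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical encoder states for a 4-word source sentence (one row per word),
# and the decoder's current hidden state while it generates the next target word.
encoder_states = np.random.rand(4, 8)
decoder_state = np.random.rand(8)

# Score each source word against the decoder state, then normalize to weights.
scores = encoder_states @ decoder_state
weights = softmax(scores)           # "which source words to focus on"
context = weights @ encoder_states  # weighted summary fed to the decoder

print("attention weights:", np.round(weights, 3))
print("context vector shape:", context.shape)
```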

What does this data look like? Is it saved as words or as vectors? Are knowledge graphs shared across any type of machine/computer/software, i.e., does Google Translate share its data collection with others? 

References 

Blog post from Prof. Irvine’s class 711: Computing and the Meaning of Code (ironically enough, I thought of the same title for both of these, i.e. “take 2” on this one’s title) 

Crash Course Computer Science: Natural Language Processing (PBS). 

Crash Course AI, 7: Natural Language Processing

Thierry Poibeau, Machine Translation (Cambridge, MA: MIT Press, 2017).

How Google Translate Works: The Machine Learning Algorithm Explained (Code Emporium).

Wikipedia: Machine Translation,

Wikipedia: Computational Linguistics 

Wikipedia: Natural Language Processing 

The Statistics and The Word

For someone who has always had trouble with words, knowing my computer has had the same trouble is a great relief. English is a confusing language; the rules bend and break depending on the context. It’s hard enough for people to learn the context of the language they are studying; a computer, which arguably doesn’t understand the semantic meaning of a word, has to suggest and predict based on the rules we give it and on what has happened before.

How does this work? Why are Google Docs becoming so good at guessing the next word in your sentence? The answer is statistics.

As we write, information flows out of our fingertips toward what we expect to be the eventual end of a sentence. Each sentence has some meaning that arrives in a usually predictable way. (Subject – predicate is how I learned it.) That means that as a sentence unfolds, certain elements of the prose should become more predictable with each step I take. We do this all the time when we guess what will come at the end of a talk, or guess what someone is likely to say next. Computers, like us, build a repertoire of already constructed sentences that gives a good idea of what is likely to come next. The computer builds models based on millions of lines of text, evaluating each and every way these words have been combined, taking into account what has already been written, and then generating the suggestion that has the highest likelihood of being correct.
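
A toy version of that statistical idea, with a tiny made-up corpus standing in for millions of lines of text: count which word tends to follow which, then suggest the most likely continuation.

```python
from collections import Counter, defaultdict

corpus = (
    "the meeting is scheduled for monday . "
    "the meeting is cancelled . "
    "the report is due monday ."
).split()

# Count bigrams: how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def suggest(word):
    """Suggest the most frequent next word seen after `word` in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest("the"))      # 'meeting' (seen twice, vs 'report' once)
print(suggest("meeting"))  # 'is'
```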

We can see this in action with Google Docs: as we are writing, it makes in-the-moment suggestions about what will come next. This does two things. First, it trains the model on the language we expect it to use, in real time. If I choose to write what the AI suggests, then it knows it was correct in how it structured the sentence and in choosing the right model for the future. Second, it shoehorns the user into using more predictable language, which the AI can then predict even better. The more language the AI knows, and the more it knows about how you write, the better it is at predicting how you will compose a document.

Writing is an arduous task; to do it well takes a tremendous amount of effort and time. I imagine that as these AIs advance, writing will become easier, to the point where all we will need to do is give the AI the subject of our writing and the context of why it’s being written, and the AI will be able to give us a decent first draft.

Statistical probabilities are an interesting thing: as models get better, we (people) become more predictable. This lends itself to an eerie feeling that someone knows what we will do. Though this is a topic for another time, the predictability of how you write and speak is critical to how these systems work. The only way AI can write is because what we write, and perhaps how we think, comes in a predictable way. Think about that next time you decide to compose a document: whether your writing foreshadows what is to come next. For now, we can start to rely on machines for that next step.

NLP in Languages

“Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural-language generation” (Wikipedia). I started the weekly readings with the question related to pattern recognition I had proposed last week: how could the computer tell the difference between characters in Japanese and characters in Chinese in Machine Translation (MT) without their codes? I had ignored the fundamental rules by which natural language achieves language understanding, focusing only on the de-blackboxing of Optical Character Recognition (OCR) and Convolutional Neural Networks (ConvNets). Natural languages have two components: “tokens (smallest units of language) and grammar (defines the ordering of tokens)” (CS Dojo Community, 2019). Beyond OCR, the computer can easily tell the difference between characters in Japanese and characters in Chinese, since the two languages have distinct grammars/Phrase Structure Rules. Here are examples in the two languages with the same meaning, “I’ll go to the school”:

  • 我要去学校。(Chinese)
  • 学校へ行きます。(Japanese)

Although they share the same token for “school” (学校), they work with different grammars. In Chinese, the syntax follows the order subject → predicate → object, while in Japanese, the syntax follows the order subject → object → predicate (with the habitual omission of the first-person subject “I”). In other words, in the parse trees of the two languages, verbal phrases appear at the end of sentences in Japanese, while in Chinese they appear in the middle of sentences. To reach coherent syntax and semantics in MT, the computer can therefore recognize the two different languages in a paragraph effortlessly.

In the process of natural language understanding and natural language generation, an Encoder-Decoder Architecture with a Recurrent Neural Network (RNN) is used. (CS Dojo Community, 2019) The computer first converts a source-language sentence into a vector with one RNN. Then, to convert this vector into a target-language sentence, another neural network is introduced. The important point is that, to approach perfection, the algorithms need a huge amount of data to train on.

Moreover, we learned something about speech recognition in previous sections: the computer can decode and encode sound waves by assigning values to the acoustic signal vertically (according to its amplitude). To make this more visible, we use spectrograms to visualize it. As each language has a certain number of sounds, this, again, can be trained using DL algorithms. (CrashCourse #36)
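
For the speech side, here is a small sketch of how a spectrogram might be computed with SciPy; the “audio” here is a synthetic rising tone rather than recorded speech, so it only illustrates the shape of the data.

```python
import numpy as np
from scipy import signal

fs = 16000                      # sample rate in Hz
t = np.linspace(0, 1, fs, endpoint=False)
# Synthetic stand-in for speech: a tone whose frequency rises over time.
audio = np.sin(2 * np.pi * (200 + 300 * t) * t)

# Short-time Fourier analysis: frequencies (Hz), time bins (s), and power.
freqs, times, power = signal.spectrogram(audio, fs=fs, nperseg=512)
print(power.shape)  # (frequency bins, time frames) -- the image we would visualize
```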

Questions:

According to Professor Liang, “there’s probably a qualitative gap between the way that humans understand language and perceive the world and our current models.” Can this gap be closed with more advanced techniques alone? Or do we still need a new linguistic theory to systematize human language? Is the current mainstream linguistic theory enough for NLP?

Reference

CrashCourse. 2017. Natural Language Processing: Crash Course Computer Science #36. https://www.youtube.com/watch?v=fOvTtapxa9c.

CrashCourse. 2017. Machine Learning & Artificial Intelligence: Crash Course Computer Science #34. https://www.youtube.com/watch?v=oi0JXuL19TA.

CS Dojo Community. 2019. How Google Translate Works – The Machine Learning Algorithm Explained! https://www.youtube.com/watch?v=AIpXjFwVdIE.

“Machine Translation.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=999926842.

“Natural Language Processing.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=1009043213.

“The Technology behind OpenAI’s Fiction-Writing, Fake-News-Spewing AI, Explained.” n.d. MIT Technology Review. Accessed March 6, 2021. https://www.technologyreview.com/2019/02/16/66080/ai-natural-language-processing-explained/.

Week 7 Reflections

One thing I have been interested in for a while is how devices like Amazon Alexa, Google Home, or Siri take in our words, process them into text, and then provide us with answers. The Computer Science Crash Course video explained that the acoustic signals of words are captured by a computer’s microphone. This signal is the magnitude of displacement of a diaphragm inside the microphone as sound waves cause it to oscillate. That gives us graphable data, with time on one axis and the magnitude of displacement (amplitude) on the vertical axis. The sound pieces that make up words are called phonemes. Speech recognition software knows what all these phonemes look like because, in English, there are roughly 44 of them, so the software essentially tries to pattern-match. To separate words from one another, figure out when sentences begin and end, and get speech converted into text, the techniques used include labeling words with parts of speech and constructing a Parse Tree (which not only tags every word with a likely part of speech, but also reveals how the sentence is constructed). 

“You shall know a word by the company it keeps.” But, to make computers understand distributional semantics, we have to express the concept in math. One simple technique is to use Count Vectors.  A count vector is the number of times a word appears in the same article or sentence as other common words. But an issue presented with count vectors is that we have to store a LOT of data, like a massive list of every word we’ve ever seen in the same sentence, and that’s unmanageable. To try to solve this problem, we use an encoder-decoder model: the encoder tells us what we should think and remember about what we just read and the decoder uses that thought to decide what we want to say or do. In order to define the encoder, we need to create a model that can read in any input we give it, i.e. a sentence. To do this, a type of neural network called a Recurrent Neural Network (RNN) was devised. RNNs have a loop in them that lets them reuse a single hidden layer, which gets updated as the model reads one word at a time. Slowly, the model builds up an understanding of the whole sentence, including which words came first or last, which words are modifying other words and other grammatical properties that are linked to meaning. 
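
A bare-bones sketch of count vectors, using a couple of made-up sentences: for each word, count which other words appear in the same sentence with it.

```python
from collections import Counter, defaultdict

sentences = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell on the market",
]

# For each word, count the other words that co-occur with it in a sentence.
count_vectors = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for i, w in enumerate(words):
        for j, other in enumerate(words):
            if i != j:
                count_vectors[w][other] += 1

print(count_vectors["cat"])     # keeps company with 'chased', 'dog', 'mouse', ...
print(count_vectors["stocks"])  # keeps very different company: 'fell', 'market', ...
```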

Stepping away from the more technical side of NLP and the devices we currently use, I wanted to note that I love the idea of a positive feedback loop. Because people say words in slightly different ways due to things like accents and mispronunciations, transcription accuracy is greatly improved when combined with a language model, which can take statistics about sequences of words. The more we use these devices that try to recognize speech and hear new accents, mispronunciations, etc, the better we can train our devices to understand what we are saying. Scary? Maybe. But also cool.

I’m extremely excited to be reading about natural language processing this week, as I loved the intro course I took in NLP last semester. One of the later assignments we had that reminded me of the Crash Course videos and some of the reading was called “Read training data for the Viterbi tagger.” For context, the Viterbi algorithm is essential for POS tagging but also great for signal processing (cell phone signal decoding), DNA sequencing, and WiFi error correction. Here were the instructions for the assignment:

  • Read the training data
  • Split the training file into a list of lines. 
  • For each line that contains a tab (“\t”), split it by tab to collect the word and part of speech tag.
  • Use a dictionary to track frequencies for:
    • Each word as each tag
    • Each transition from the last tag to the next tag
    • Sentence starting probabilities for each tag
  • Divide by the total number of words to make probabilities and put them into the same nested dictionary structure used by the Viterbi tagger.
  • Now test the tagger:
    • read the test file
    • Tag each sequence of words using the viterbi code
    • Report in a comment: For how many tokens did the tagger find the right solution?
  • Add an evaluation by sentences: for how many sentences is the tagger 100% correct? (include code to calculate this and report the accuracy in a comment)


Here is what my code looked like:
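
In outline, with a tiny made-up training sample standing in for the real tab-separated file, the training steps might look something like this (a simplified sketch, not the full graded solution, which also ran the Viterbi decoder and the evaluation):

```python
from collections import defaultdict

# Tiny inline stand-in for the real training file, which is tab-separated
# "word<TAB>tag" lines with blank lines between sentences (made-up example data).
training_text = """the\tDET
dog\tNOUN
barks\tVERB

the\tDET
cat\tNOUN
sleeps\tVERB
"""

emission = defaultdict(lambda: defaultdict(int))    # tag -> word -> count
transition = defaultdict(lambda: defaultdict(int))  # previous tag -> next tag -> count
start = defaultdict(int)                            # tag -> count at sentence starts

prev_tag = None
for line in training_text.splitlines():
    if "\t" in line:
        word, tag = line.split("\t")
        emission[tag][word] += 1
        if prev_tag is None:
            start[tag] += 1                # first token of a sentence
        else:
            transition[prev_tag][tag] += 1
        prev_tag = tag
    else:
        prev_tag = None                    # blank line = sentence boundary

def normalize(counts):
    """Turn raw counts into probabilities (the assignment divided by the total
    word count; normalizing per tag here is a simplification)."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

# The nested dictionary structure a Viterbi tagger would read from.
probs = {
    "emission": {tag: normalize(words) for tag, words in emission.items()},
    "transition": {tag: normalize(nexts) for tag, nexts in transition.items()},
    "start": normalize(start),
}
print(probs["start"])             # {'DET': 1.0}
print(probs["emission"]["NOUN"])  # {'dog': 0.5, 'cat': 0.5}
```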

 

Questions:

Google’s version of this is called Knowledge Graph. At the end of 2016, it contained roughly 70 billion facts about, and relations between, different entities… Can you speak more about knowledge graphs, what is necessary to create one, and how they are stored? How does Google use this?

Citations

CrashCourse. Natural Language Processing: Crash Course AI #7, 2019. https://www.youtube.com/watch?v=oi0JXuL19TA.
———. Natural Language Processing: Crash Course Computer Science #36, 2017. https://www.youtube.com/watch?v=fOvTtapxa9c.
Google Docs. “Poibeau-Machine Translation-MIT-2017.Pdf.” Accessed March 8, 2021. https://drive.google.com/file/d/1vOZvxGA-1Uf2HL1MAYqx8Silk9g1r6e9/view?usp=drive_open&usp=embed_facebook.

Machine Translation (Example: Google Translator)

Giving computers the ability to understand and speak a language is called Natural Language Processing (NLP) (NLP:CrashCourseComputerScience#36, 2017) (Daniel Jurafsky, 2000). NLP is considered an interdisciplinary field that fuses computer science and linguistics. NLP explores two ideas: natural language understanding (NLU) and natural language generation (NLG). While NLU deals with how to get the meaning of combinations of letters (AI that filters spam, Amazon search, etc.), NLG generates language from knowledge (AI that performs translation, summarizes documents, powers chatbots, etc.) (NLP-CrashCourseAI#7, 2019). There is an effectively infinite number of ways to arrange words in a sentence, which cannot be given to a computer as a dictionary. In addition, many words have multiple meanings, like “leaves”, causing ambiguity, so computers need to learn grammar (NLP:CrashCourseComputerScience#36, 2017). To take grammar into account while building any language translator, we should first perform syntax analysis; second, semantic analysis must be applied to ensure that the sentence makes sense (Daniel Jurafsky, 2000), (How-Google-Translate-Works, 2019).

Language translation

Machine translation (MT) is a “sub-field of computational linguistics that uses computer software to translate text or speech from one language to another” (Wikipedia, 2021). Language translation (like Google Translator) is one of the most important NLP applications, and it currently depends on neural networks. It takes text as input in one language and produces the result in another language.

The first NLP method for language translation is based on phrase structure rules, which are designed to encapsulate the grammar of a language, producing many rules that together constitute the language’s entire grammar. Using these rules, one constructs a parse tree that tags words with a likely part of speech and reveals how the sentence is built (Daniel Jurafsky, 2000), (Wikipedia, 2021). Treating languages like Lego makes computers adept at NLP tasks (the question “where’s the nearest pizza” can be recognized through “where”, “nearest”, and “pizza”). Using this phrase structure, computers can answer questions like “what’s the weather today?” or execute commands like “set the alarm at 2 pm”. Computers can also use phrase structure to generate natural language text, especially when data is stored in a web of semantic information (NLP:CrashCourseComputerScience#36, 2017). The Knowledge Graph is Google’s version of such a web, and it contains roughly 70 billion facts about, and relationships between, various entities. This methodology was used to create chatbots, which were primarily rule-based. The main problem with this approach is the need to define every possible variation and erroneous input in rules, making the translation model more complex and slower. Fortunately, the Google Neural Machine Translation system (GNMT) arose and has replaced the rule-based approach since 2016.

Deep Neural Network (DNN) Architecture for language translation

Translation requires a profound understanding of the text to be translated (Poibeau, 2017), which can be done using DNN. The language deep learning model (Google Translator, for example) consists of the following parts (How-Google-Translate-Works, 2019), (NLP-CrashCourseAI#7, 2019):

  • Sentence-to-vector mapper (encoder): converts words into a vector of numbers representing them. For this part we can use Recurrent Neural Networks (RNNs), as in Google Translator, to encode words and transform them into representations (vectors) that computers can work with.
  • A step that combines these representations into a shared vector for the complete training sentence.
  • Vector-to-sentence mapper (decoder): another RNN, used to convert the representation back into words.

Both of these RNNs are Long Short-Term Memory networks (LSTMs), which can deal with long sentences. This architecture works well for medium-length sentences (15-20 words), but it fails when the grammar becomes more complex. A word in a sentence depends on the word before it and on the word that comes after it, and replacing those RNNs with bidirectional ones solved that problem.
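
A very small PyTorch sketch of that encoder-decoder shape (random toy data and hypothetical sizes; it only shows the wiring, not a trained translator, and the attention mechanism that links the two halves, discussed next, is omitted):

```python
import torch
import torch.nn as nn

VOCAB_SRC, VOCAB_TGT, EMB, HID = 1000, 1200, 64, 128  # hypothetical sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SRC, EMB)
        # Bidirectional LSTM: reads the sentence forwards and backwards.
        self.rnn = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)

    def forward(self, src_ids):
        outputs, _ = self.rnn(self.embed(src_ids))
        return outputs  # one vector per source word (both directions concatenated)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_TGT, EMB)
        self.rnn = nn.LSTM(EMB, 2 * HID, batch_first=True)
        self.out = nn.Linear(2 * HID, VOCAB_TGT)

    def forward(self, tgt_ids, hidden=None):
        states, hidden = self.rnn(self.embed(tgt_ids), hidden)
        return self.out(states), hidden  # scores over target vocabulary, word by word

# Wiring check with a made-up 5-word source sentence and 4-word target prefix.
src = torch.randint(0, VOCAB_SRC, (1, 5))
tgt = torch.randint(0, VOCAB_TGT, (1, 4))
enc_states = Encoder()(src)
logits, _ = Decoder()(tgt)
print(enc_states.shape, logits.shape)  # torch.Size([1, 5, 256]) torch.Size([1, 4, 1200])
```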

Another problem is which word we should focus on more in a long sentence. Translation now uses an alignment process (Poibeau, 2017), in which inputs and outputs are aligned with each other. These alignments are learned using an extra unit located between the encoder and decoder, called the attention mechanism (How-Google-Translate-Works, 2019). The decoder then produces the translation one word at a time, focusing on the word identified by the attention mechanism. Google Translator (for example) uses eight bidirectional LSTM units supported by the attention mechanism.

However, until now, machine translation models based on deep learning have performed well on simple sentences, but the more complex the sentence, the less accurate the translation (Poibeau, 2017).

References:

  1. How-Google-Translate-Works. (2019). Machine Learning & Artificial Intelligence. Retrieved from YouTube: https://www.youtube.com/watch?v=AIpXjFwVdIE&ab_channel=CSDojoCommunity
  2. Daniel Jurafsky & James H. Martin. (2000). Speech and Language Processing. New Jersey: Prentice Hall.
  3. NLP:CrashCourseComputerScience#36. (2017). Retrieved from YouTube: https://www.youtube.com/watch?v=fOvTtapxa9c
  4. NLP-CrashCourseAI#7. (2019). Retrieved from YouTube: https://www.youtube.com/watch?v=oi0JXuL19TA&ab_channel=CrashCourse
  5. Thierry Poibeau. (2017). Machine Translation. Cambridge, MA: The MIT Press.
  6. Wikipedia. (2021). Machine translation. Retrieved from https://en.wikipedia.org/wiki/Machine_translation

AI/ML/NLP application: Google Translator - Chirin Dirani

Reading about Natural Language Processing (NLP) for this class reminds us of the test proposed by the famous Alan Turing in 1950. He proposed that “the test is successfully completed if a person dialoguing (through a screen) with a computer is unable to say whether her discussion partner is a computer or a human being.” However, having linguistics involved in the NLP field makes achieving this goal a real challenge. I truly believe that NLP will succeed in formalizing mechanisms of understanding and reasoning when it develops into an intelligent program, one that can understand what the discussion partner says and, most importantly, draw conclusions from what has been said in a dialogue that keeps the conversation going. In other words, this will happen when we can’t tell whether a machine or a human is answering all our questions in chatbots, and when a translating program is able to produce the most accurate translation from one language to another without any mistakes. For this assignment, I will not argue whether Turing’s proposal is achievable or not; rather, I will use the knowledge we obtained from the rich materials to describe the main design principles behind Google Translator as one of the AI/ML/NLP applications.

My interest in Google Translator stems from the fact that it is a significant tool I employ in my professional English-Arabic translation work. I witness its rapid and continuous development as I use this fascinating tool every day. Thanks to the readings for this class, I now understand how this system functions and how the neural system developed from translating a sentence piece by piece to translating whole sentences at a time. Thierry Poibeau claims that “Machine translation involves different processes that make it challenging.” In fact, incorporating grammar into the translator’s logic to create meaningful text is what makes translation very challenging. However, “The availability of huge quantities of text, on the Internet, discovering Recurrent Neural Networks (RNN), and the development of the capacity of computers have revolutionized the domain of machine translation systems.” According to Poibeau, using deep learning approaches since the mid-2010s has given the field more advanced results, as deep learning makes it possible to envision systems where very few elements are specified manually and helps the system extrapolate the best representation from the data by itself. 

Google’s translation processes are well clarified in the video How Google Translate Works: The Machine Learning Algorithm Explained. It informs us that language is very diverse and complex, and for that reason, using neural networks (NNs) has proved useful in solving the problem of language translation. As we read last week, “neural networks learn to solve problems by looking at large amounts of examples to automatically extract the most relevant features.” This allows neural networks to learn patterns in data, which enables them to translate a sentence from one language to another on their own. In this context (sentence translation), the neural networks used are Recurrent Neural Networks, because they deal with longer sentences; they are basically long short-term memory networks. To activate these networks, there is a need for an encoder-decoder architecture, where the first RNN encodes/converts the source-language sentence into data the computer can recognize (vectors and matrices) and the second RNN decodes that data (vectors and matrices) into the target-language sentence. Running different types of information at the same time, using deep learning, allows better decision making. The whole translation process involves more complex and abstract processes; however, the encoder-decoder architecture principle is the essence of any machine translation system.

Moving forward, every time I use Google Translator, I will remember that without the revolutionary development in the NLP field (the discovery of RNNs), this tool would not have been available. In conclusion: yes, machine translation systems have created useful tools by all means; however, the current encoder-decoder architecture is efficient for medium-length sentences but not for long ones. This fact leaves translation systems still far from being able to give the most accurate translation of everyday texts.

 

Questions: 

  1. What is the main difference between a ConvNet and an RNN?
  2. Poibeau said that “through deep learning it is possible to infer structure from the data and for that, it is better to let the system determine on its own the best representation for a given sentence.” How can we trust that the system will make the right decisions? 
  3. GPT-2 is a double-edged sword; it can be used for good and for malicious causes. Does it make sense to say that once the algorithms in this unsupervised model can evaluate their own accuracy on training data, that will be the key for GPT-2 to become open source?

References:

Found in translation: More accurate, fluent sentences in Google Translate (2016) https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/ 

How Google Translate Works: The Machine Learning Algorithm Explained (Code Emporium).

Thierry Poibeau, Machine Translation (Cambridge, MA: MIT Press, 2017).

Google Translate VS. Me

Natural languages are filled with ambiguity that lies in understanding context, which is what makes them extremely difficult for computers to understand and translate. As a student of the Persian Farsi language, I am constantly running into issues similar to those computers face when translating, and honestly, they probably do a better job at it than I can… (Score 1 for Google)

I went through the language training from late 2016 to early 2017, and at that time my professors yelled at us for using Google Translate; but after today’s readings, I look back and see that I could have learned from Google Translate how to better translate Farsi to English myself. Like our understanding of computer vision, natural language processing uses neural networks to translate. Natural language processing (NLP) is a big field concerned with the interactions between computers and human language: how to program computers to process and analyze large amounts of natural language data. The goal is to make a computer capable of “understanding” content in writing, including the nuances associated with languages. Within NLP is computational linguistics, which approaches linguistic questions through computational modelling. One of those computational models of natural language is machine translation (MT), which examines the use of software to translate text or speech from one language to another. That is what I want to focus on first in this post.

This is done through an encoder-decoder model, specifically a recurrent neural network (RNN). Remembering how neural networks work for pattern recognition from last week’s readings, it did not come as a surprise to understand the broad concepts here. What differentiates this from last week’s readings is the bidirectional function, in which the program can go back into its hidden layer continually and modify it before creating its output. How this works, from my understanding:

  • We have an English sentence that gets encoded with numeric values (sequence-to-vectors), so the computer understands it.
  • These numeric values (vectors) go through the neural network hidden layer using an attention mechanism to align inputs and outputs based on the weighted distribution for the most probable translation.
    1. The bidirectional programming looks at words before and after, finding the important words through the attention mechanism. This increases the ability for the computer to understand semantics and translate sentences longer than 15-20 words by aligning inputs and outputs.
  • The highest value is then decoded into a word (vector-to-sequence) and generated as the output.

*This is done word by word.

The above is the explanation of Google’s Neural Machine Translation system, which is a way to deal with the ambiguities that lie in natural languages. I kind of relate to this process in how I understand and translate languages. I’m no expert in understanding Farsi, but I approach it by identifying the important words, mainly the nouns or verbs, like the attention mechanism would do. Then I try to find the context of the sentence by pairing the words I do know with ideas of what the words before and after could be. Where Google and I differ is that sometimes a word is left ambiguous to me, because knowing it will not help me understand the sentence, or would make it more difficult to understand. I can remember teachers telling me not to worry about all the words but to grab the concept. I can mitigate the ambiguities because I understand the context behind the content, sometimes interpreting it differently but still able to portray my idea without being as concise as MT needs to be. (Score 1 for me)

Another way to deal with ambiguities in NLP, which I believe is used in Google’s system, is the set of concepts behind BabelNet and WordNet. I originally thought these were just huge databases of synonyms, like a better version of thesaurus.com, but the more I understand what NLP and MT need in order to function, the more I understand how difficult it is for a computer to find the meanings behind words. From my understanding, BabelNet and WordNet are lexicons that create deeper links than just synonyms by capturing semantic relationships. I think that programs like these help computers understand and generate the sentences needed in chatbot conversations by relating words to other words and thereby to concepts.
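
A quick way to see those deeper-than-synonym links is NLTK’s WordNet interface (this assumes the WordNet data has been downloaded with nltk.download('wordnet')):

```python
from nltk.corpus import wordnet as wn

# Senses of "bank": the lexicon separates the river bank from the financial one.
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

# Semantic relations, not just synonyms: a dog "is a kind of" canine, domestic animal, ...
dog = wn.synset("dog.n.01")
print([h.name() for h in dog.hypernyms()])
```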

We see an advancement of this in the case studies, in which neural networks are used to train a program to guess the next word based on relational semantics and training data. Known as GPT-2, this is the latest evolution in NLP, and eerily enough it can create news articles that mimic human writing. As impressive as this is, it also brings a sense of caution about the exploitation of this technology for mass-producing targeted fake news, which is the reason OpenAI did not release the code behind it. Another difficulty is that even though it is capable of writing human-like content, the computers still do not understand anything besides word association. Just as with the difficulties of computer vision, this lack of understanding permeates and frustrates researchers.

Questions

  1. Of the four philosophies guiding NLP mentioned in the Hao article, which one does Machine Translation fit under? What do BabelNet/WordNet fit under?
  2. It seems like the overriding issue with NLP is the same as with Computer Vision: lack of understanding. Do you think that, with the increasing availability and amount of data, some of the approaches, specifically training neural networks, can improve computers’ ability to understand or to feign understanding?
  3. What is the most critical issue facing MT today? What is the most critical issue facing NLP today?
  4. Can I create my own program that can convert my speech into text?

References:

“A Neural Network for Machine Translation, at Production Scale.” n.d. Google AI Blog (blog). Accessed March 6, 2021. http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html.

“An AI That Writes Convincing Prose Risks Mass-Producing Fake News.” n.d. MIT Technology Review. Accessed March 6, 2021. https://www.technologyreview.com/2019/02/14/137426/an-ai-tool-auto-generates-fake-news-bogus-tweets-and-plenty-of-gibberish/.

“Better Language Models and Their Implications.” 2019. OpenAI. February 14, 2019. https://openai.com/blog/better-language-models/.

“Computational Linguistics.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Computational_linguistics&oldid=1008316235.

CrashCourse. 2017. Natural Language Processing: Crash Course Computer Science #36. https://www.youtube.com/watch?v=fOvTtapxa9c.

———. 2019. Natural Language Processing: Crash Course AI #7. https://www.youtube.com/watch?v=oi0JXuL19TA.

CS Dojo Community. 2019. How Google Translate Works – The Machine Learning Algorithm Explained! https://www.youtube.com/watch?v=AIpXjFwVdIE.

“Machine Translation.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=999926842.

“Natural Language Processing.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=1009043213.

“The Technology behind OpenAI’s Fiction-Writing, Fake-News-Spewing AI, Explained.” n.d. MIT Technology Review. Accessed March 6, 2021. https://www.technologyreview.com/2019/02/16/66080/ai-natural-language-processing-explained/.