Google Translate vs. Me

Natural languages are filled with ambiguity that can only be resolved through context, which is what makes them so difficult for computers to understand and translate. As a student of the Persian Farsi language, I constantly run into the same issues computers face when translating, and honestly, they probably do a better job at it than I can… (Score 1 for Google)

I went through language training from late 2016 to early 2017, and at that time my professors yelled at us for using Google Translate. After today's readings, though, I look back and see that I could have learned from Google Translate how to better translate Farsi to English myself. Like our approach to computer vision, natural language processing uses neural networks to translate. Natural language processing (NLP) is a broad field concerned with the interactions between computers and human language: how to program computers to process and analyze large amounts of natural language data. The goal is to make a computer capable of “understanding” written content, including the nuances associated with language. Within NLP sits computational linguistics, which approaches linguistic questions through computational modelling. One application of that modelling is machine translation (MT), the use of software to translate text or speech from one language to another. That is what I want to focus on first in this post.

This is done through an encoder-decoder model, specifically one built from recurrent neural networks (RNNs). Remembering how neural networks work for pattern recognition from last week's reading, the broad concepts here did not come as a surprise. What differentiates an RNN from last week's readings is the bidirectional function, in which the network can repeatedly feed information back through its hidden layer and modify it before producing its output. How this works, from my understanding (a rough code sketch follows the list):

  • We have an English sentence that gets encoded as numeric values (sequence-to-vector) so the computer can work with it.
  • These numeric values (vectors) pass through the network's hidden layers, where an attention mechanism aligns inputs and outputs based on a weighted distribution for the most probable translation.
    1. The bidirectional pass looks at the words before and after each position, finding the important words through the attention mechanism. This improves the computer's grasp of semantics and lets it translate sentences longer than 15–20 words by aligning inputs and outputs.
  • The highest-scoring value is then decoded into a word (vector-to-sequence) and generated as the output.

*This is done word by word.
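To make these steps concrete for myself, here is a tiny NumPy sketch of the encode, attend, and decode stages described above. Everything in it is invented for illustration: the three-word sentence, the made-up target words, the random weights, and the running-average stand-in for a bidirectional encoder. A real system like Google's learns its weights from enormous parallel corpora; this only shows the shape of the computation.

```python
# Toy, NumPy-only sketch of the encode -> attend -> decode steps above.
# All vocabularies, dimensions, and weights are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# 1. "Sequence-to-vector": map each source word to a numeric vector (embedding).
src_vocab = {"i": 0, "speak": 1, "farsi": 2}
tgt_vocab = ["man", "farsi", "harf", "mizanam"]   # hypothetical target words
d = 8                                             # embedding / hidden size
embed = rng.normal(size=(len(src_vocab), d))

sentence = ["i", "speak", "farsi"]
src_vectors = np.stack([embed[src_vocab[w]] for w in sentence])   # (3, d)

# 2. Stand-in "bidirectional encoder": combine a left-to-right and a
#    right-to-left running average so each position sees words before and after.
forward = np.cumsum(src_vectors, axis=0) / np.arange(1, len(sentence) + 1)[:, None]
backward = np.cumsum(src_vectors[::-1], axis=0)[::-1] / np.arange(len(sentence), 0, -1)[:, None]
encoder_states = forward + backward                                # (3, d)

# 3. Attention: score each encoder state against the current decoder state,
#    softmax the scores into weights, and build a weighted "context" vector.
decoder_state = rng.normal(size=d)
scores = encoder_states @ decoder_state
weights = np.exp(scores - scores.max())
weights /= weights.sum()
context = weights @ encoder_states                                 # (d,)

# 4. "Vector-to-sequence": project the context onto the target vocabulary and
#    emit the highest-scoring word. A real decoder repeats this word by word.
output_proj = rng.normal(size=(d, len(tgt_vocab)))
logits = context @ output_proj
print("attention weights:", np.round(weights, 2))
print("predicted word:", tgt_vocab[int(np.argmax(logits))])
```

The attention weights printed at the end are the "weighted distribution" from the second bullet: they say how much each source word contributes to the word being generated at that moment.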

The above is the explanation of Google's Neural Machine Translation system, which is one way to deal with the ambiguities that lie in natural languages. I can kind of relate this process to how I understand and translate languages. I'm no expert in Farsi, but I approach it by identifying the important words, mainly the nouns and verbs, much as the attention mechanism would. Then I try to find the context of the sentence by pairing the words I do know with guesses about what the words before and after could mean. Where Google and I differ is that sometimes I leave a word ambiguous, because knowing it either will not help me understand the sentence or will make it harder to understand. I can remember teachers telling me not to worry about all the words but to grab the concept. I can mitigate the ambiguities because I understand the context behind the content, sometimes interpreting it differently but still able to convey my idea without being as precise as MT needs to be. (Score 1 for me)

Another way to deal with ambiguities in NLP, which I believe is used in Google's system, is the concept behind BabelNet and WordNet. I originally thought these were just huge databases of synonyms, like a better version of thesaurus.com, but the more I understand what NLP and MT need in order to function, the more I understand how difficult it is for a computer to find the meanings behind words. From my understanding, BabelNet and WordNet are lexicons that create deeper links than just synonyms by encoding semantic relationships. I think that resources like this help computers understand and generate the sentences needed in chatbot conversations by relating words to other words and thereby to concepts.
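Out of curiosity, here is a small sketch of what querying WordNet looks like, using the NLTK library's interface (my choice of tool for illustration; I do not actually know what Google uses internally). It shows word senses, definitions, hypernym links, and a similarity score, the kind of deeper-than-synonym relationships I mean. It needs `pip install nltk` plus a one-time download of the WordNet data.

```python
# Small sketch of querying WordNet through NLTK.
import nltk
nltk.download("wordnet", quiet=True)        # one-time download of the WordNet data
from nltk.corpus import wordnet as wn

# The English word "bank" is ambiguous: WordNet lists its distinct senses.
for synset in wn.synsets("bank", pos=wn.NOUN)[:3]:
    print(synset.name(), "->", synset.definition())

# Deeper links than synonyms: a hypernym is the broader concept a sense falls under.
riverbank = wn.synset("bank.n.01")          # the "sloping land by a river" sense
print("hypernyms:", [h.name() for h in riverbank.hypernyms()])

# A rough similarity score between two senses, based on their distance
# in the concept hierarchy.
money_bank = wn.synset("bank.n.02")         # the financial-institution sense
print("sense similarity:", riverbank.path_similarity(money_bank))
```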

We see an advancement of this in the case studies, where neural networks are used to train a program to guess the next word based on relational semantics and training data. Known as GPT-2, this is the latest evolution in NLP, and eerily enough it can create news articles that mimic human writing. As impressive as this is, it also brings a sense of caution about the exploitation of this technology to mass-produce targeted fake news, which is the reason OpenAI initially withheld the full model behind it. Another difficulty is that even though it is capable of writing human-like content, the computer still does not understand anything beyond word association. Just as with the difficulties of computer vision, this lack of understanding permeates and frustrates researchers.
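For a sense of what "guessing the next word" looks like in practice, here is a short sketch that samples a continuation from the small, publicly released GPT-2 model through the Hugging Face transformers library (my choice of tooling, not something from the readings; it needs `pip install transformers torch`). The output usually reads fluently, but the model is only chaining plausible words together, which is exactly the word-association-without-understanding problem above.

```python
# Sample a continuation from the small, publicly released GPT-2 model.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Scientists announced today that"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# The model repeatedly guesses a plausible next word based on the words
# before it, with no understanding of what it is saying.
output_ids = model.generate(
    input_ids,
    max_length=40,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```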

Questions

  1. Of the four philosophies guiding NLP mentioned in the Hao article, which one does Machine Translation fit under? Which one do BabelNet and WordNet fit under?
  2. It seems like the resounding issue with NLP is the same as with computer vision: a lack of understanding. Do you think that, with the increasing availability and amount of data, some of the approaches, specifically training neural networks, can improve computers' ability to understand or feign understanding?
  3. What is the most critical issue facing MT today? What is the most critical issue facing NLP today?
  4. Could I create my own program that converts my speech into text?

References:

“A Neural Network for Machine Translation, at Production Scale.” n.d. Google AI Blog (blog). Accessed March 6, 2021. http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html.

“An AI That Writes Convincing Prose Risks Mass-Producing Fake News.” n.d. MIT Technology Review. Accessed March 6, 2021. https://www.technologyreview.com/2019/02/14/137426/an-ai-tool-auto-generates-fake-news-bogus-tweets-and-plenty-of-gibberish/.

“Better Language Models and Their Implications.” 2019. OpenAI. February 14, 2019. https://openai.com/blog/better-language-models/.

“Computational Linguistics.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Computational_linguistics&oldid=1008316235.

CrashCourse. 2017. Natural Language Processing: Crash Course Computer Science #36. https://www.youtube.com/watch?v=fOvTtapxa9c.

———. 2019. Natural Language Processing: Crash Course AI #7. https://www.youtube.com/watch?v=oi0JXuL19TA.

CS Dojo Community. 2019. How Google Translate Works – The Machine Learning Algorithm Explained! https://www.youtube.com/watch?v=AIpXjFwVdIE.

“Machine Translation.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=999926842.

“Natural Language Processing.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=1009043213.

“The Technology behind OpenAI’s Fiction-Writing, Fake-News-Spewing AI, Explained.” n.d. MIT Technology Review. Accessed March 6, 2021. https://www.technologyreview.com/2019/02/16/66080/ai-natural-language-processing-explained/.