Giving computers the ability to understand and speak a language is called natural language processing (NLP) (NLP:CrashCourseComputerScience#36, 2017), (Daniel Jurafsky, 2000). NLP is an interdisciplinary field that fuses computer science and linguistics. It explores two ideas: natural language understanding (NLU) and natural language generation (NLG). NLU deals with extracting meaning from combinations of letters (AI that filters spam, Amazon search, etc.), while NLG generates language from knowledge (AI that translates, summarizes documents, chatbots, etc.) (NLP-CrashCourseAI#7, 2019). Words can be arranged into sentences in a practically infinite number of ways, so a computer cannot simply be given every valid sentence as a dictionary. In addition, many words have multiple meanings, like “leaves”, causing ambiguity, so computers need to learn grammar (NLP:CrashCourseComputerScience#36, 2017). To take grammar into account when building a language translator, we must first perform syntax analysis; second, semantic analysis must be applied to ensure that the sentence makes sense (Daniel Jurafsky, 2000), (How-Google-Translate-Works, 2019).
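The ambiguity of a word like “leaves” can be illustrated with a minimal sketch: a toy lexicon and a single hand-written syntax rule (the lexicon, tags, and rule here are invented for illustration; real taggers learn such rules from data):

```python
# Toy illustration: "leaves" stays ambiguous until grammatical context
# (here, one simple syntax rule) resolves it.
POS_LEXICON = {
    "the": {"DET"},
    "tree": {"NOUN"},
    "leaves": {"NOUN", "VERB"},  # foliage vs. "she leaves"
    "she": {"PRON"},
    "green": {"ADJ"},
}

def disambiguate(prev_word: str, word: str) -> str:
    """Pick a part of speech for `word` with one rule:
    after a determiner or adjective, prefer the noun reading."""
    tags = POS_LEXICON[word]
    if len(tags) == 1:
        return next(iter(tags))
    prev_tags = POS_LEXICON.get(prev_word, set())
    if prev_tags & {"DET", "ADJ"}:
        return "NOUN" if "NOUN" in tags else next(iter(tags))
    return "VERB" if "VERB" in tags else next(iter(tags))

print(disambiguate("the", "leaves"))  # NOUN: foliage
print(disambiguate("she", "leaves"))  # VERB: departs
```

The same surface word receives different tags purely because of its neighbours, which is why grammar, not a dictionary, is what computers must learn.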
Language translation
Machine translation (MT) is a “sub-field of computational linguistics that uses computer software to translate text or speech from one language to another” (Wikipedia, 2021). Language translation (e.g., Google Translate) is one of the most important NLP applications and currently relies on neural networks: it takes text in one language as input and produces the result in another language.
The first NLP approach to language translation was based on phrase structure rules, which are designed to encapsulate the grammar of a language; together, many such rules constitute the grammar rules of the entire language. Applying these rules produces a parse tree that tags each word with a likely part of speech and reveals how the sentence is built (Daniel Jurafsky, 2000), (Wikipedia, 2021). Treating language like Lego in this way makes computers adept at many NLP tasks (the question “where’s the nearest pizza” can be decomposed into “where”, “nearest”, and “pizza”). Using phrase structure, computers can answer questions like “what’s the weather today?” or execute commands like “set the alarm at 2 pm”. Computers can also use phrase structure to generate natural-language text, especially when the data is stored in a web of semantic information (NLP:CrashCourseComputerScience#36, 2017). Google’s version of this is the Knowledge Graph, which contains around 70 billion facts about, and relationships between, various entities. The same methodology was used to create early chatbots, which were primarily rule-based. The main problem with this approach is the need to encode every possible variation and erroneous input as rules, which makes the translation model ever more complex and slower. Fortunately, the rule-based approach was superseded in 2016 by the Google Neural Machine Translation system (GNMT).
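A minimal sketch of the phrase-structure idea (the grammar, tags, and rules below are a made-up toy, not Google's actual system): a tiny lexicon tags each word, and two rules build a parse tree for a question:

```python
# Toy phrase structure rules: QUESTION -> WH VERB NP, NP -> DET ADJ? NOUN.
# The lexicon and grammar are invented for illustration only.
GRAMMAR_LEXICON = {
    "where": "WH", "is": "VERB", "the": "DET",
    "nearest": "ADJ", "pizza": "NOUN",
}

def parse(sentence: str):
    """Tag every word, then build a parse tree for a WH-question."""
    tagged = [(GRAMMAR_LEXICON[w], w) for w in sentence.lower().split()]
    assert tagged[0][0] == "WH" and tagged[1][0] == "VERB", "not a WH-question"
    noun_phrase = ("NP", tagged[2:])   # the remaining words form a noun phrase
    return ("QUESTION", [tagged[0], tagged[1], noun_phrase])

tree = parse("where is the nearest pizza")
print(tree)
```

From this tree, the computer can pull out exactly the pieces the text mentions: the question word “where”, the modifier “nearest”, and the head noun “pizza”.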
Deep Neural Network (DNN) Architecture for language translation
Translation requires a profound understanding of the text to be translated (Poibeau, 2017), which deep neural networks can provide. A deep learning translation model (Google Translate, for example) consists of the following parts (How-Google-Translate-Works, 2019), (NLP-CrashCourseAI#7, 2019):
- Sentence-to-vector mapper (encoder): converts words into vectors of numbers that represent them. For this part, recurrent neural networks (RNNs) can be used, as in Google Translate, to encode words into representations (vectors) that computers can understand.
- A combiner that merges those representations into a shared vector for the complete training sentence.
- Vector-to-sentence mapper (decoder): another RNN, used to convert the representation back into words.
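The three parts above can be sketched with plain NumPy (toy, untrained random weights; the vocabulary, dimensions, and simple tanh recurrence are assumptions for illustration, much simpler than a production model):

```python
import numpy as np

# Minimal seq2seq sketch: an RNN encoder folds the source sentence into one
# shared vector; an RNN decoder unrolls that vector into output words.
rng = np.random.default_rng(0)
VOCAB = ["<s>", "</s>", "i", "see", "trees", "je", "vois", "des", "arbres"]
EMB_DIM, HID_DIM = 8, 16
embed = rng.normal(size=(len(VOCAB), EMB_DIM))          # word -> vector table
W_in = rng.normal(size=(EMB_DIM, HID_DIM)) * 0.1
W_rec = rng.normal(size=(HID_DIM, HID_DIM)) * 0.1
W_out = rng.normal(size=(HID_DIM, len(VOCAB))) * 0.1

def encode(words):
    """Encoder: fold each word vector into a running hidden state."""
    h = np.zeros(HID_DIM)
    for w in words:
        h = np.tanh(embed[VOCAB.index(w)] @ W_in + h @ W_rec)
    return h  # the shared sentence vector

def decode_step(h, prev_word):
    """Decoder: advance the hidden state, then score every output word."""
    h = np.tanh(embed[VOCAB.index(prev_word)] @ W_in + h @ W_rec)
    scores = h @ W_out
    return h, VOCAB[int(np.argmax(scores))]

sentence_vec = encode(["i", "see", "trees"])   # whole sentence -> one vector
h, first_out = decode_step(sentence_vec, "<s>")
```

With trained weights, repeated `decode_step` calls would emit the translation one word at a time until the end-of-sentence token.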
Both RNNs are long short-term memory (LSTM) networks, which are designed to handle long sentences. This architecture works well for medium-length sentences (15-20 words) but fails when the grammar becomes more complex, because a word in a sentence depends on both the word before it and the word after it. Replacing those RNNs with bidirectional ones solved this problem.
Another problem is deciding which word in a long sentence deserves the most focus. Translation now uses an alignment process (Poibeau, 2017), in which inputs and outputs are aligned with each other. These alignments are learned by an extra unit located between the encoder and decoder, called the attention mechanism (How-Google-Translate-Works, 2019). The decoder then produces the translation one word at a time, focusing on the word selected by the attention mechanism. Google Translate, for example, uses eight bidirectional LSTM layers supported by an attention mechanism.
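The core of the attention mechanism can be sketched in a few lines (toy random states stand in for real encoder/decoder states; dot-product scoring is one common choice, assumed here for simplicity):

```python
import numpy as np

# Attention sketch for one decoder step: score every encoder state against
# the current decoder state, softmax the scores into weights, and take a
# weighted average of the encoder states as the context.
rng = np.random.default_rng(1)
enc_states = rng.normal(size=(5, 16))   # one state per source word
dec_state = rng.normal(size=16)         # decoder state at the current step

scores = enc_states @ dec_state          # dot-product alignment scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax: weights sum to 1

context = weights @ enc_states           # attended summary of the source
focus = int(np.argmax(weights))          # source position the decoder focuses on
```

The `weights` vector is exactly the learned alignment the text describes: for each output word, it says how much each source word matters.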
However, machine translation models based on deep learning still perform well only on relatively simple sentences: the more complex the sentence, the less accurate the translation (Poibeau, 2017).
References:
- How-Google-Translate-Works. (2019). Machine Learning & Artificial Intelligence. Retrieved from YouTube: https://www.youtube.com/watch?v=AIpXjFwVdIE&ab_channel=CSDojoCommunity
- Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing. New Jersey: Prentice Hall.
- NLP:CrashCourseComputerScience#36. (2017). Retrieved from YouTube: https://www.youtube.com/watch?v=fOvTtapxa9c
- NLP-CrashCourseAI#7. (2019). Retrieved from YouTube: https://www.youtube.com/watch?v=oi0JXuL19TA&ab_channel=CrashCourse
- Poibeau, T. (2017). Machine Translation. Cambridge, MA: The MIT Press.
- Wikipedia. (2021). Machine translation. Retrieved from https://en.wikipedia.org/wiki/Machine_translation