Language translation is more complex than a simple word-to-word replacement method. As seen in the readings and videos for this module, translating a text into another language requires more context than a dictionary can provide. This "context" in language is known as grammar. Because computers do not understand grammar, they need a process by which they can deconstruct sentences and reconstruct them in another language in a way that makes sense. Words can have several different meanings and also depend on their structure within a sentence to make sense. Natural Language Processing addresses this problem of complexity and ambiguity in language translation. The PBS Crash Course video breaks down how computers use NLP methods.
Deconstructing sentences into smaller pieces that can be easily processed:
- In order for computers to deconstruct sentences, grammar is necessary
- Phrase Structure Rules were developed to encapsulate the grammar of a language
Using phrase structure rules, computers are able to construct parse trees.
Image retrieved from: https://www.youtube.com/watch?v=fOvTtapxa9c
Parse Trees: link every word with a likely part of speech and show how the sentence is constructed
- This helps computers process information more easily and accurately (a toy example is sketched below)
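To make this concrete, here is a minimal sketch of phrase structure rules and a parse tree using Python's NLTK library. The toy grammar below is my own illustrative assumption, not one taken from the video:

```python
import nltk

# A handful of phrase structure rules encapsulating a tiny slice of
# English grammar (sentence -> noun phrase + verb phrase, and so on).
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N
    VP  -> V NP
    Det -> 'the'
    N   -> 'dog' | 'ball'
    V   -> 'chased'
""")

parser = nltk.ChartParser(grammar)
sentence = "the dog chased the ball".split()

# Each parse tree links every word with a part of speech and shows
# how the sentence is constructed.
for tree in parser.parse(sentence):
    tree.pretty_print()
```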
The PBS video also explains that this is how Siri is able to deconstruct simple word commands. Additionally, speech recognition apps with the best accuracy use deep neural networks.
Looking at how Google Translate's Neural Network works, the Code Emporium video describes a neural network as a problem solver. In the case of Google Translate, the neural network's job, or problem to solve, is to take an English sentence (input) and turn it into a French translation (output).
As we learned from the data structures module, computers do not process information the way our brains do. They process information using numbers (vectors). So, the first step will always be to convert the language into computer language. For this particular task, a Recurrent Neural Network will be used (a neural network designed specifically for sequences such as sentences).
Step 1. Take the English sentence and convert it into computer language (a vector) using a recurrent neural network
Step 2. Convert the vector into a French sentence (using another recurrent neural network)
Image retrieved from: https://www.youtube.com/watch?v=AIpXjFwVdIE
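To make these two steps concrete, here is a minimal PyTorch sketch of an encoder-decoder pair. The vocabulary sizes, dimensions, and token indices are illustrative assumptions, not Google Translate's actual configuration:

```python
import torch
import torch.nn as nn

EN_VOCAB, FR_VOCAB, EMB, HID = 10_000, 12_000, 256, 512  # illustrative sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(EN_VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, english_ids):
        # Step 1: the recurrent network reads the English sentence and
        # summarizes it as a fixed-size vector (its final hidden state).
        _, state = self.rnn(self.embed(english_ids))
        return state

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(FR_VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, FR_VOCAB)

    def forward(self, french_ids, state):
        # Step 2: a second recurrent network unrolls that vector into
        # French, predicting one word at a time.
        output, state = self.rnn(self.embed(french_ids), state)
        return self.out(output), state

encoder, decoder = Encoder(), Decoder()
english = torch.randint(0, EN_VOCAB, (1, 7))      # a 7-word English sentence
state = encoder(english)                          # English -> vector
french_so_far = torch.randint(0, FR_VOCAB, (1, 3))
logits, _ = decoder(french_so_far, state)         # scores for the next French word
```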
According to research from a 2014 paper on Neural Machine Translation, the Encoder-Decoder Architecture model pictured above works best for medium-length sentences of 15-20 words (Cho et al.). The Code Emporium video tested the LSTM-RNN Encoder method on longer sentences and found that the translations did not work as well. This is due to the lack of complexity in this method. Recurrent Neural Networks use past information to generate the present information. The video gives the example:
"While generating the 10th word of the French sentence it looks at the first nine words in the English sentence." The Recurrent Neural Network is only looking at the past words, not the words that come after the current word. In language, both the words that come before and the words that come after are important to the construction of a sentence. A BiDirectional Neural Network is able to do just this: it looks in both directions.
Image retrieved from: https://www.youtube.com/watch?v=AIpXjFwVdIE
Bidirectional neural network (looks at the words that come both before and after each word) vs. standard neural network (only looks at the words that come before it)
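In PyTorch, this difference comes down to a single flag on the recurrent layer. A minimal sketch, with sizes that are illustrative assumptions:

```python
import torch
import torch.nn as nn

words = torch.randn(1, 9, 256)  # a 9-word sentence of 256-dim word vectors

forward_only  = nn.LSTM(256, 512, batch_first=True)
bidirectional = nn.LSTM(256, 512, batch_first=True, bidirectional=True)

out_fwd, _ = forward_only(words)   # each position sees only earlier words
out_bi, _  = bidirectional(words)  # each position sees earlier AND later words

print(out_fwd.shape)  # torch.Size([1, 9, 512])
print(out_bi.shape)   # torch.Size([1, 9, 1024]) -- forward + backward states
```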
Using the BiDirectional model, which words in the original source should be focused on when generating the translation?
Now, the translator needs to learn how to align the input and output. This alignment is learned by an additional unit called an attention mechanism, which determines which French words will be generated from which English words.
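A minimal sketch of the idea, assuming simple dot-product scoring (the 2014-era models use a small learned network for scoring, but the principle is the same): score every English word against the decoder's current state, then turn the scores into alignment weights.

```python
import torch
import torch.nn.functional as F

encoder_states = torch.randn(9, 1024)  # one vector per English word (bidirectional)
decoder_state  = torch.randn(1024)     # state while generating the next French word

scores  = encoder_states @ decoder_state  # how relevant is each English word?
weights = F.softmax(scores, dim=0)        # alignment weights: sum to 1 over 9 words
context = weights @ encoder_states        # weighted summary fed to the decoder

print(weights)  # e.g. a large weights[3] means "focus on the 4th English word"
```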
Google Translate uses this same process, just on a much larger scale.
Google Translate Process & Architecture / Layer Breakdown
Image retrieved from: https://www.youtube.com/watch?v=AIpXjFwVdIE
The English sentence is given to the encoder, which converts the sentence into vectors (each word is assigned a number). An attention mechanism is then used to determine which English words to focus on as each French word is generated. Finally, the decoder produces the French translation one word at a time, focusing on the words chosen by the attention mechanism.
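Putting the three layers together, here is a compact, self-contained sketch of the loop just described: encode once, then repeatedly attend and decode one French word at a time. All sizes, the start/stop token, and the greedy word choice are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
EN_VOCAB, FR_VOCAB, EMB, HID = 1000, 1200, 64, 128  # illustrative sizes
EOS = 0  # hypothetical start/end-of-sentence token

embed_en = nn.Embedding(EN_VOCAB, EMB)
encoder  = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
embed_fr = nn.Embedding(FR_VOCAB, EMB)
decoder  = nn.LSTMCell(EMB + 2 * HID, 2 * HID)
to_vocab = nn.Linear(2 * HID, FR_VOCAB)

english = torch.randint(0, EN_VOCAB, (1, 7))   # a 7-word English sentence
enc_states, _ = encoder(embed_en(english))     # encoder: words -> vectors
enc_states = enc_states.squeeze(0)             # shape (7, 256)

h = torch.zeros(1, 2 * HID)
c = torch.zeros(1, 2 * HID)
word = torch.tensor([EOS])                     # begin with the start token

for _ in range(10):                            # generate up to 10 French words
    # Attention: decide which English words to focus on at this step.
    weights = F.softmax(enc_states @ h.squeeze(0), dim=0)
    context = weights.unsqueeze(0) @ enc_states        # attended summary
    # Decoder: produce the next French word, guided by that summary.
    h, c = decoder(torch.cat([embed_fr(word), context], dim=1), (h, c))
    word = to_vocab(h).argmax(dim=1)           # greedy choice of next word
    if word.item() == EOS:                     # stop at end-of-sentence
        break
```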
Works Cited