Most of us, if not all, have used Google Translate at some point in our lives. I don’t doubt that some of those attempts were successful, but I bet most were not. The reason is that natural human language is such a complex labyrinth that it can sometimes only truly be navigated by the human mind, because we can understand and interpret the context, meaning and situational cues that come hand in hand with language. Computers and machines have not yet been able to achieve this perfectly, as assigning cultural meanings, interpretations, etc. is extremely challenging for a machine. If we think about it, it is challenging even for humans: if you are not a native speaker and did not grow up in the country where the language is spoken, you often miss a lot of cultural nuances, interpretations, signs, etc. As Thierry Poibeau (2017) explains: “Natural language processing is difficult because, by default, computers do not have any knowledge of what a language is. […] There is a dynamic co-construction of interpretation in the brain that is absolutely natural and unconscious” (Poibeau, 2017, 23-26). However, this challenge is a central concern in today’s technology and paves the way for further advancements, since nothing can be done without language. So how does Google Translate work, and why is it often not reliable (aka why does the translation never make perfect sense in the target language)?
Natural Language Processing (NLP) and Machine Learning play a crucial role in how Google Translate, IPAs and many other computers and systems are able to read, interpret, understand and emulate a sentence, whether phonetically or in the context of translation, so that it fits the standards of natural language and real human dialogue. NLP is “concerned with the interactions between computers and human language. How to program computers to process and analyze large amounts of natural language data.” (Wikipedia, 2021) It is needed to understand the context and contextual nuances I was referring to above. Computational linguistics also plays a role in this “decoding” (pun intended) process and is “concerned with the computational modeling of natural language, the study of appropriate computational approaches to linguistic questions” (Wikipedia, 2021). One of the fundamental problems in NLP was sentence deconstruction: the ability of a computer to break a sentence down into bite-size pieces so that it can be processed more easily. Of course, it also has to follow phrase structure rules in order to encapsulate the grammar and meaning of a language. As we have seen in this week’s and previous weeks’ readings, parse trees, neural networks, natural language understanding (how we get meaning out of combinations of letters), natural language generation (how to generate language from knowledge), distributional semantics (making the computer guess which words have similar meanings by seeing which words often appear in the same sentences), count vectors (the number of times a word appears in the same place or sentence as other common words), etc. all build up a system where all this data is pulled together and “stored in a web of semantic information where entities are linked to one another in meaningful relationships and informational sentences” (CrashCourse #7, 2019; CrashCourse #36, 2017).
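To make the idea of count vectors a bit more concrete, here is a minimal sketch in Python. The toy sentences and the function names (count_vectors, cosine) are invented for illustration and do not come from the readings; a real system would use a far larger corpus and a more refined notion of context. Each word gets a vector counting the words that share a sentence with it, and two words count as "similar" when their vectors point in a similar direction.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration (an assumption, not real training data).
sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the car needs fuel",
]

def count_vectors(sentences):
    """Map each word to a count of the words that share a sentence with it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = words[:i] + words[i + 1:]  # every other word in the sentence
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = count_vectors(sentences)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_fuel = cosine(vectors["cat"], vectors["fuel"])
print("cat vs dog: ", round(sim_cat_dog, 3))
print("cat vs fuel:", round(sim_cat_fuel, 3))
```

In this tiny corpus "cat" and "dog" keep similar company (both appear with "drinks" and "chases"), so their similarity comes out higher than that of "cat" and "fuel". The score says nothing about whether the two words are interchangeable, only that they tend to appear in similar sentences.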
Google Translate uses Machine Translation (MT), which is made up of all of the above and more: “the idea is to give the most accurate translation of everyday texts. […] The ultimate goal is to obtain quality of translation equivalent to that of a human being.” (Poibeau, 2017, 14-21). Google Translate relies on neural networks (NN), which look at thousands and thousands of examples to give the best result. By using the Encoder-Decoder model, MT “builds internal representations and generates predictions”. The encoder (a model that can read in the input, aka the sentence) stands for “what we should think and remember about what we just read” and the decoder “decides what we want to say or do” (CrashCourse #7, 2019). The NN converts the words into a form that the computer understands: numbers, vectors and matrices. Recurrent Neural Networks (RNNs) have a loop that allows them to reuse a single hidden layer, which gets updated as the model reads one word at a time; by training the model on which word to predict next, the weights for the encoder RNN and the decoder prediction layer are learned. The RNNs used are long short-term memory RNNs (LSTM-RNNs), which can deal with longer sentences (instead of just words) much better (CS Dojo Community, 2019). By doing this consistently, if the computer notices that two words mean something similar, the model makes their vectors more similar and can therefore “predict” the word that will follow the next time it is asked to do so (CrashCourse #7, 2019). In the Encoder-Decoder model, the encoder RNN takes the words of the sentence and turns them into a vector (sequence to vector), and the decoder RNN takes that vector and turns it into the words of the other language (vector to sequence) (CS Dojo Community, 2019). However, this fails to address the complexity that comes with contextual meaning and understanding, and can be limiting when we are dealing with longer sentences of more than 15-20 words.
Just because words have similar meanings doesn’t mean that they are interchangeable in all contexts (see blog post).
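The sequence-to-vector and vector-to-sequence flow of the Encoder-Decoder model described above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the weights are random and untrained, the vocabularies are tiny and invented, a plain RNN step stands in for an LSTM, and none of the names (encode, decode, rnn_step) come from Google Translate. The output is therefore gibberish; the point is the data flow, not the translation.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible
HIDDEN = 4      # size of the sentence vector (an arbitrary choice)

# Hypothetical toy vocabularies, invented for illustration.
source_vocab = ["i", "like", "tea"]
target_vocab = ["<start>", "<end>", "μου", "αρέσει", "το", "τσάι"]

def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def rnn_step(w_in, w_hid, x, h):
    """One pass through the loop: mix the new word with the remembered state."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_hid, h))]

# Random, untrained weights (training would adjust these numbers).
enc_w_in = random_matrix(HIDDEN, len(source_vocab))
enc_w_hid = random_matrix(HIDDEN, HIDDEN)
dec_w_in = random_matrix(HIDDEN, len(target_vocab))
dec_w_hid = random_matrix(HIDDEN, HIDDEN)
dec_w_out = random_matrix(len(target_vocab), HIDDEN)

def encode(words):
    """Sequence to vector: read one word at a time, reusing one hidden layer."""
    h = [0.0] * HIDDEN
    for word in words:
        x = one_hot(source_vocab.index(word), len(source_vocab))
        h = rnn_step(enc_w_in, enc_w_hid, x, h)
    return h

def decode(sentence_vector, max_len=5):
    """Vector to sequence: emit the highest-scoring word at each step."""
    h = sentence_vector
    previous = target_vocab.index("<start>")
    output = []
    for _ in range(max_len):
        x = one_hot(previous, len(target_vocab))
        h = rnn_step(dec_w_in, dec_w_hid, x, h)
        scores = matvec(dec_w_out, h)
        previous = scores.index(max(scores))
        if target_vocab[previous] == "<end>":
            break
        output.append(target_vocab[previous])
    return output

sentence_vector = encode(["i", "like", "tea"])
translation = decode(sentence_vector)
print(sentence_vector)  # the whole sentence squeezed into HIDDEN numbers
print(translation)      # meaningless here, because nothing has been trained
```

Notice that however long the input sentence is, the decoder only ever receives the same small list of numbers. That fixed-size bottleneck is one way to picture why this design struggles with longer sentences.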
The solution that Google Translate has “come up with” is to replace the RNN with a bidirectional RNN and to use an attention mechanism between the encoder and the decoder, which helps the computer know which words to focus on while generating the words of the other language (CS Dojo Community, 2019). During the translation process, a (let’s say) English sentence is “fed into” the encoder and “translated” (again, pun intended) into a vector, which is then taken by the attention mechanism that works out which (let’s say) Greek word will be generated from which English words. The decoder then generates the result of the translation (in Greek) one word at a time, focusing on the words that the attention mechanism has picked out. In this specific case, Google Translate actually uses 8 LSTM layers because “deeper networks help better model complex problems” and are “more capable of understanding the semantics of language and grammar” (CS Dojo Community, 2019).
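As a rough illustration of what “knowing which words to focus on” means, here is a minimal attention sketch in Python. It assumes the encoder keeps one vector per source word and scores each of them against the decoder’s current state with a simple dot product; that scoring rule is a simplification chosen for readability and is not claimed to be the one Google Translate uses. All the numbers and names (attend, encoder_states, decoder_state) are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that add up to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

def attend(decoder_state, encoder_states):
    """Weight each source word by how well it matches the decoder state."""
    scores = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    size = len(decoder_state)
    # The context is a weighted average of the encoder vectors.
    context = [
        sum(weight * state[i] for weight, state in zip(weights, encoder_states))
        for i in range(size)
    ]
    return weights, context

source_words = ["i", "like", "tea"]
# Hypothetical encoder outputs, one vector per English word.
encoder_states = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical decoder state at the moment it is about to produce the word for "tea".
decoder_state = [0.1, 0.2, 2.0]

weights, context = attend(decoder_state, encoder_states)
focus = source_words[weights.index(max(weights))]
for word, weight in zip(source_words, weights):
    print(word, round(weight, 3))
print("focus:", focus)
```

The weights are recomputed for every output word, so the decoder is no longer limited to one fixed summary of the sentence and can “look back” at different English words as it goes.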
What does this data look like? Is it saved as words or as vectors? Are knowledge graphs shared across different machines, computers and software, i.e. does Google Translate share its data collection with others?
References
Blog post from Prof. Irvine’s class 711: Computing and the Meaning of Code (ironically enough, I thought of the same title for both of these, i.e. “take 2” on this one’s title).
Crash Course Computer Science: Natural Language Processing (PBS).
Crash Course AI, 7: Natural Language Processing.
Thierry Poibeau, Machine Translation (Cambridge, MA: MIT Press, 2017).
How Google Translate Works: The Machine Learning Algorithm Explained (Code Emporium).
Wikipedia: Machine Translation.
Wikipedia: Computational Linguistics.
Wikipedia: Natural Language Processing.