Ambiguity vs. Predictability in ML and Google

Annaliese Blank

Some key themes for this week are grammar and the online translation of language. My goal was to unpack how machine translation and Google Translate work. I use Google Translate all the time, especially its Spanish translations for my travels. I have been to many parts of Mexico and Argentina, where I used Google Translate to build a foundation before staying with my host families. I took Spanish from third grade on, and even in high school Google Translate was at its peak of popularity at my school. It seemed like the perfect solution to so many problems when other websites or textbooks just couldn't get the job done well enough. The key word here is enough: when we unpack it, Google Translate does much more than simple word-for-word substitution. It checks grammar and conversational usage, and it tries to produce the correct verbal translation for the region you are in, since not all areas use the same variety of Spanish.

To further this, I really enjoyed the machine learning piece, and I especially wanted to make connections to machine learning here. All of this got me thinking about translation. A question I'd like to raise is: what exactly is translation, and how can we understand the process through technology other than Google, such as machine translation in general? What are the criteria?

In the machine learning piece, Jurafsky and Martin helped me gather some fundamentals on my inquiries. When we de-black-box this, we can see there is no perfect way to translate something, especially since, as I mentioned before, the "perfect" translation doesn't exist in all locations; not all language is "universal." They write, "Technical texts are not translated in the same way as literary texts. A specific text concerns a world that is remote from the world of the reader in the target language. The translator has to choose between staying close to the original text or making use of paraphrasing to ensure comprehension. The tone and style of the text are highly subjective" (Jurafsky and Martin, p. 19). This got me thinking: how can we trust machine translation or Google Translate so much if 100% accuracy is impossible? Where does this trust reside?

Some other important areas I found really interesting were the discussions of morphology and syntax. Morphology deals with the structure of words, while syntax governs the structure of sentences. For computing, and for machine translation in particular, this is really hard to do, because the one thing they mentioned the most was AMBIGUITY: high amounts of uncertainty. From what I have read and gathered, this is still a main limitation of online translation and could remain a problem for Google Translate in the future; it isn't a problem that can be fully fixed. How does ambiguity persist if predictability prevails?
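To see why ambiguity is so hard for a machine, here is a tiny toy sketch (the word senses and Spanish glosses are illustrative examples I chose, not data from any real system): one English word can map to several unrelated translations, and without context the machine has no way to pick.

```python
# Lexical ambiguity: one English word, several possible Spanish senses.
# This toy dictionary is for illustration only.
senses = {
    "bank": ["banco (financial institution)", "orilla (river bank)"],
    "spring": ["primavera (season)", "resorte (coil)", "manantial (water source)"],
}

def candidate_translations(word):
    # Return every sense the word could have; an empty list means unknown.
    return senses.get(word, [])

# Without surrounding context, the machine cannot tell which sense is meant:
print(candidate_translations("bank"))
```

A real translator resolves this with context (the surrounding words), which is exactly what makes the problem statistical rather than a simple lookup.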

After watching these videos and Crash Course episodes, I feel I have gathered a better understanding of what Google Translate does and how language can be modeled in machine learning and coding. Coding, in its own way, is its own language.

And finally, to describe the levels of a technology for this week, I wanted to continue with Google Translate. In the last Google video they explain the Tap to Translate feature: there is a button that lets you translate a message sent to you, then speak (via voice-over) or type out a reply in English and have it sent back in the designated language. It works on any phrase up to 5,000 characters. The translation options range across English, Spanish, French, Italian, Russian, Chinese, Arabic, etc. Once the translation is complete, you can send it, save it, or drag it wherever you need it in a different app. This is definitely the least complex kind of NLP and is very easy to manage. What happens here is that any of these languages are translated based on the original English entry and then re-configured into the pre-set target language of the user's preference. For AI, this is a game changer, because the translations become predictable once the machine has learned the appropriate inputs. This really helped me understand how the translation process works. Pre-set features, like syntax rules, go hand in hand with producing the best translated result. All of this has led to Google's most recent upgrade, "translate as you type," where, as I mentioned before, the pre-set features of the system heighten this predictability and make translation much easier for anyone.
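The flow described above can be sketched in a few lines. This is a hedged mock-up, not Google's actual code: `tap_to_translate` and `translate_fn` are hypothetical names I made up, and only the 5,000-character limit comes from the video.

```python
MAX_CHARS = 5000  # character limit mentioned in the Google video

def tap_to_translate(message, target_lang, translate_fn):
    # translate_fn stands in for the real translation engine,
    # which we treat as a black box here.
    if len(message) > MAX_CHARS:
        raise ValueError("message exceeds the 5000-character limit")
    return translate_fn(message, target_lang)

# Usage with a fake engine that just tags the text with the target language:
fake_engine = lambda text, lang: f"[{lang}] {text}"
print(tap_to_translate("Hello", "es", fake_engine))  # -> "[es] Hello"
```

The point of the sketch is the shape of the feature: a simple length check, then a hand-off to a pre-trained engine whose outputs are predictable for inputs it has already learned.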

I wanted to take this a step further, so I looked up other ways Google Translate works, such as handwriting translation, creating your own verbal phrases, slowing down the pronunciation of certain phrases, and connecting your style of language from Facebook Messenger to the translation tab. More details can be found here;

I feel I have gathered a better sense of machine learning and Google Translate in relation to AI, but I still feel stuck on artificial vs. natural systems.

Data Structure Crash Course:

  • Arrays – values stored in memory
  • Indexes – positions used to access those values
  • Strings – arrays of characters
  • Null characters – mark the end of a string
  • Matrix – an array of arrays (3 total)
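The notes above can be made concrete with a short sketch. This is my own illustration of the Crash Course concepts, not code from the video; the null character is a C-style convention that Python strings don't actually need.

```python
# An array stores values at consecutive indexes (indexing starts at 0).
scores = [72, 88, 95]
print(scores[1])  # -> 88

# A string can be viewed as an array of characters.
word = "hola"
chars = list(word)  # ['h', 'o', 'l', 'a']
# In C-style memory, a null character '\0' marks where the string ends:
c_style = chars + ['\0']

# A matrix is an array of arrays: here, 3 rows, each an array of 3 values.
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
print(matrix[2][0])  # row 2, column 0 -> 7
```

Two indexes (row, then column) are what make a matrix "an array of arrays" rather than a single flat array.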

Machine Learning & Artificial Intelligence:

  • Algorithms give computers the ability to learn from data and to make decisions
  • Input layer, hidden (neuron) layers, output layer
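To see how the layers fit together, here is a toy forward pass through a network with an input layer, one hidden (neuron) layer, and an output layer. The weights are made-up numbers for illustration; a real network would learn them from data.

```python
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1 / (1 + math.exp(-x))

# Toy network: 2 inputs -> 2 hidden neurons -> 1 output.
# These weights are invented for the example, not learned.
w_hidden = [[0.5, -0.3],   # weights into hidden neuron 1
            [0.8,  0.1]]   # weights into hidden neuron 2
w_output = [1.0, -1.0]     # weights into the single output neuron

def forward(inputs):
    # Each hidden neuron takes a weighted sum of the inputs, then squashes it.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    # The output neuron does the same with the hidden layer's values.
    return sigmoid(sum(w * h for w, h in zip(w_output, hidden)))

decision = forward([1.0, 0.0])
print(round(decision, 3))  # a value between 0 and 1
```

Learning, in this picture, is just the process of nudging those weights until the output-layer decisions match the training data.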

How Google Translate Works:

  • language translations – word by word
  • curated database to help translate pairs
  • tokens – smallest units of language
  • grammar – defines the ordering of tokens
  • syntax analysis – does the structure look correct?
  • semantic analysis – meaning: does this sentence make sense in context?
  • neural network – a component that learns to solve problems by finding patterns in data
  • this helps with the translation process
  • encoder–decoder architecture – the pathway where input is encoded into vectors that are decoded to carry out translations

Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd ed. (Upper Saddle River, N.J: Prentice Hall, 2008).