Deep Neural Nets and the Future of Machine Translation

Google Neural Machine Translation (GNMT) uses an end-to-end (E2E) deep neural network (DNN) to perform encoding and decoding at a much faster rate than the traditional statistical models used by other machine translation methods (Le and Schuster 2016). IBM’s statistical word-level translation models, for example, first had to split the sentence into words and then find the best translation for each word (Poibeau 117). GNMT instead translates entire sentences rather than individual words, which lets it account for more of the ambiguities of spoken and written language and produce a much more accurate translation (Le and Schuster 2016).
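To make the contrast concrete, here is a minimal toy sketch of word-by-word translation next to a translation that looks at the whole sentence. It is not GNMT and not IBM's models: the tiny English-to-French vocabulary, the cue words, and the function names are all hypothetical and chosen only to show why translating each word in isolation mishandles an ambiguous word such as "bank".

```python
# Toy illustration only: hypothetical English->French data, not GNMT or IBM's models.

# Word-level table: each source word maps to one fixed translation.
WORD_TABLE = {
    "the": "la",
    "bank": "banque",
    "is": "est",
    "closed": "fermée",
    "steep": "escarpée",
}

# Senses of the ambiguous word "bank": (cue words that select the sense, translation).
BANK_SENSES = [
    ({"river", "steep"}, "rive"),
    ({"closed", "money"}, "banque"),
]


def translate_word_by_word(sentence: str) -> str:
    """Split the sentence into words and translate each one in isolation."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.lower().split())


def translate_with_context(sentence: str) -> str:
    """Translate each word, letting the rest of the sentence pick the sense of 'bank'."""
    words = sentence.lower().split()
    context = set(words)
    output = []
    for word in words:
        if word == "bank":
            for cues, translation in BANK_SENSES:
                if cues & context:  # a cue word appears somewhere in the sentence
                    output.append(translation)
                    break
            else:
                output.append(WORD_TABLE[word])  # no cue found: fall back to the table
        else:
            output.append(WORD_TABLE.get(word, word))
    return " ".join(output)


print(translate_word_by_word("the bank is steep"))   # la banque est escarpée
print(translate_with_context("the bank is steep"))   # la rive est escarpée
```

The word-by-word version always chooses the financial sense, while the sentence-aware version uses "steep" to choose the riverbank sense. A neural system learns this kind of context sensitivity from data rather than from hand-written cue lists.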

Although GNMT is still essentially a statistical model, it operates on a much larger scale than previous methods (Poibeau 147). Training GNMT is also much more efficient than training the earlier statistical models, which lacked the “learning” aspect and required manual fixes whenever design flaws were discovered (Poibeau 147). A major difference between today’s neural networks and the older statistical models is the multidimensional representation shared between encoding and decoding, which means that “…higher linguistic units, like phrases, sentences, or simply groups of words, can be compared in a continuous space…” (Poibeau 149).
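As a rough sketch of what comparing phrases "in a continuous space" can look like, the example below represents each phrase as a vector and measures closeness with cosine similarity. The three-dimensional vectors are invented by hand for illustration, and the helper names are assumptions. A trained network would learn much larger vectors from data, and the source does not say which similarity measure GNMT uses.

```python
import math

# Hypothetical hand-picked phrase vectors; a trained network would learn these.
PHRASE_VECTORS = {
    "good morning": [0.9, 0.1, 0.2],
    "bonjour": [0.8, 0.2, 0.1],
    "stock market": [0.1, 0.9, 0.7],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means a similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_phrase(query):
    """Return the other phrase whose vector lies closest to the query phrase."""
    query_vector = PHRASE_VECTORS[query]
    candidates = (p for p in PHRASE_VECTORS if p != query)
    return max(candidates, key=lambda p: cosine_similarity(query_vector, PHRASE_VECTORS[p]))


print(nearest_phrase("good morning"))  # bonjour
```

Because whole phrases sit in the same space, a greeting in one language can land near a greeting in another even though the two share no words, which a word-for-word lookup table cannot express.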

However, GNMT still requires significant interaction during the training process, because the neural network must be directed toward the most relevant data rather than wasting computational resources on unnecessary data (Roza 2019). Adding features is also a problem: once new features are added, the old DNN becomes obsolete (Roza 2019). Another issue facing DNNs is the massive amount of data they require to produce the best results; however, the amount of available data seems to be becoming less of an issue than the organization of that data.

Some pressing questions that came to mind during the readings were:

  1. Why is Chinese (Mandarin) lagging behind other languages in GNMT translation quality? (Le and Schuster 2016) Is it possible that data on Mandarin is less accessible to the DNN than data on other languages?
  2. How fluid is DNN learning? Does training stop once the DNN is in active use?
  3. How exactly are statistical models different from DNNs? Is it simply the case that DNNs continue to learn whereas statistical models rely on a less fluid database?


Le, Quoc, and Mike Schuster. “A Neural Network for Machine Translation, at Production Scale.” Google AI Blog, 2016. Accessed 9 Mar. 2021.

Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.

Roza, Felp. “End-to-End Learning, the (Almost) Every Purpose ML Method.” Medium, 18 July 2020.