Machine Translation & Data Privacy

Tianyi Zhao

Artificial intelligence, although growing fast, is still a black box waiting to be explored and exploited. We are currently at the stage of leveraging neural networks, with which machines learn sophisticated behavior from training and testing. The fields of AI that impressed me most during the course are machine learning and natural language processing, and the typical application that combines the two is machine translation. As deep learning has developed, neural networks have been applied to machine translation and are replacing the earlier statistical approach. In the encoder-decoder model, an encoder turns the source sentence into a fixed-length vector, from which a decoder then generates the translation. The model uses context to choose more accurate words and automatically adjusts its output toward sentences that are syntactically more natural, smoother, and more readable. Google Translate, which accepts multiple input methods, made the transition from statistical to neural machine translation in 2016. In practice, however, the technology still has problems with word order and word choice.

Pattern recognition has also been applied to machine translation, mainly in two forms: image and speech. Multimedia is accepted as source input, but the output is always text. Personally, I think the next step for machine translation is not only to improve accuracy but also to diversify the forms of output. In the near future, there may no longer be a need for simultaneous interpreters.
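The encoder-decoder idea mentioned above can be illustrated with a toy sketch. It is only an assumption-laden illustration, not how Google Translate or any production system is built: the vocabulary size, hidden size, weight matrices, and the function names `encode` and `decode` are all hypothetical, and the weights are random rather than trained, so the generated tokens are meaningless. The point is only that sentences of different lengths are compressed into a vector of the same fixed length, from which the decoder produces its output one token at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 10, 8  # hypothetical toy sizes

# Randomly initialised, untrained parameters (illustration only).
E = rng.normal(size=(VOCAB, HIDDEN))             # token embeddings
W_enc = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1  # encoder recurrence
W_dec = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1  # decoder recurrence
W_out = rng.normal(size=(HIDDEN, VOCAB))         # hidden state -> token scores


def encode(tokens):
    """Fold a token sequence of any length into one fixed-length vector."""
    h = np.zeros(HIDDEN)
    for t in tokens:
        h = np.tanh(E[t] + W_enc @ h)
    return h


def decode(context, max_len=5, eos=0):
    """Greedily generate token ids, starting from the encoder's vector."""
    h, prev, out = context, eos, []
    for _ in range(max_len):
        h = np.tanh(E[prev] + W_dec @ h)
        prev = int(np.argmax(h @ W_out))  # pick the highest-scoring token
        if prev == eos:                   # stop at the end-of-sentence token
            break
        out.append(prev)
    return out


short_vec = encode([1, 2, 3])
long_vec = encode([4, 5, 6, 7, 8, 9, 1, 2])
print(short_vec.shape, long_vec.shape)  # both vectors have length HIDDEN
print(decode(short_vec))
```

A three-token sentence and an eight-token sentence both end up as vectors of length 8 here. Squeezing every sentence into one fixed-length vector is a plausible reason such models can struggle with long inputs.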

Moreover, improving the accuracy of machine output requires Big Data, which raises the prevalent issue of privacy. How can we guarantee that the data used to train machines is collected legally and with authorization? There have been numerous data abuse scandals at the tech giants. Research into Google Translate URLs found that a police investigator had used the service to translate requests for assistance made to foreign police forces. Confidential information is no longer "confidential" once it passes through online translation. Don DePalma of Common Sense Advisory warned that “free machine translation tools such as Google Translate can inadvertently result in a data leak” (Brown, 2017). As machine learning becomes more popular, what can public users do to protect their data while enjoying the comfort and convenience that machine learning brings?

