From Natural Language to Natural Language Processing

By Linda Bardha

Language shapes our everyday lives. It helps us express our feelings, desires, and questions to the world around us. We use words, gestures, and tone to convey a broad spectrum of emotion. The unique and diverse ways in which we communicate through written and spoken language are a large part of what helps us bond with each other.

Technology and research in the fields of linguistics, semiotics, and computing have helped us communicate with people from different countries and cultures through translation tools.

Learning a language is a natural process, and we are all born with the ability to do it. But language isn’t just a box of words that you string together, run through a database or dictionary, and “magically” turn into another language. There are rules for combining words into grammatical phrases, sentences, and complex sequences. Language is a system made of subsystems, and each of these subsystems, or layers, plays an important role in the whole architecture. We often overlook this, since we don’t think about it when we speak or write, but a translation tool cannot ignore it. That’s why problems arise when you translate text from one language to another with a tool like Google Translate: each language has its own set of rules and grammar, which makes a correct and cohesive translation harder to produce.
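To make the "box of words" problem concrete, here is a minimal sketch of word-by-word dictionary translation. The three-entry English-to-Spanish dictionary and the function name are invented for illustration; no real translation tool works from a table this small.

```python
# Toy English-to-Spanish dictionary (hypothetical, for illustration only).
TOY_DICTIONARY = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
}


def translate_word_by_word(sentence: str) -> str:
    """Replace each word with its dictionary entry, keeping the source word order."""
    words = sentence.lower().split()
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in words)


naive = translate_word_by_word("The red car")
print(naive)  # el rojo coche
```

Every word was looked up correctly, yet a Spanish speaker would say "el coche rojo", because the adjective normally follows the noun. The lookup has no knowledge of that rule, which is exactly the gap that grammar-aware systems have to fill.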

As the video on Google Translate explains, neural networks help with this transition.

What do we mean by the term “neural network”?

In computer science, an artificial neural network can be used as a classifier. Classification is one of the most prominent problems in supervised machine learning: the aim is to sort objects into classes. As Jurafsky and Martin explain, the term “supervised” refers to the fact that the algorithm is first trained on “tagged” examples for each category (i.e., examples whose classes are made known to the neural network) so that it learns to classify new, unseen examples in the future. Supervised machine learning therefore depends on having a large amount of data available for training. As Poibeau points out in his book, a text corpus is necessary for machine translation. With the increasing number of translations available on the Internet, it is now possible to design statistical models for machine translation directly from that data. This approach, known as statistical machine translation, is the most popular today. Robert Mercer, one of the pioneers of statistical translation, proclaimed: “There is no data like more data.” In other words, for Mercer as well as for followers of the statistical approach, the best strategy for developing a system consists of accumulating as much data as possible. These data must be representative and diversified, but because these are qualitative criteria that are difficult to evaluate, it is the quantitative criterion that continues to prevail.
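The idea of learning from tagged examples can be sketched with a perceptron, the simplest single-unit ancestor of a neural network. This is only an illustration under stated assumptions: the four training sentences, the "en"/"es" labels, and the function names are invented here, and real systems train far larger models on far more data.

```python
from collections import defaultdict

# Tagged training examples: (text, label). Invented toy data.
TRAINING_DATA = [
    ("the car is red", "en"),
    ("the house is big", "en"),
    ("el coche es rojo", "es"),
    ("la casa es grande", "es"),
]


def train_perceptron(examples, epochs=5):
    """Learn one weight per word; positive totals mean 'en', negative mean 'es'."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            target = 1 if label == "en" else -1
            score = sum(weights[word] for word in text.split())
            if target * score <= 0:
                # Wrong or undecided: nudge each word's weight toward the tag.
                for word in text.split():
                    weights[word] += target
    return weights


def classify(weights, text):
    """Label unseen text using the learned weights (unknown words count as 0)."""
    score = sum(weights.get(word, 0.0) for word in text.split())
    return "en" if score > 0 else "es"


weights = train_perceptron(TRAINING_DATA)
print(classify(weights, "the house is red"))  # en
print(classify(weights, "la casa es roja"))   # es
```

Neither test sentence appears in the training data. The classifier labels them from words it saw with tags during training, which is also why Mercer's point holds: a word that never appeared in the tagged data contributes nothing to the decision.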

Other important ingredients of successful statistical machine translation are machine learning algorithms and natural language processing. Machine learning, in the context of text analytics, is a set of statistical techniques for identifying parts of speech, entities, sentiment, and other aspects of text. The techniques can be expressed as a model that is then applied to other text (supervised machine learning). They can also be a set of algorithms that work across large sets of data to extract meaning, which is known as unsupervised machine learning. Unlike supervised machine learning, unsupervised machine learning refers to statistical techniques that get meaning out of a collection of text without pre-training a model. Some are very easy to understand, like “clustering,” which simply means grouping “like” documents together into sets called “clusters.” These clusters can then be arranged by importance using hierarchical clustering, built either from the bottom up or from the top down.
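Bottom-up hierarchical clustering can be sketched in a few lines. The version below is a simplified illustration, not the method of any particular product: it assumes that documents are compared by word overlap (Jaccard similarity) and merged by single linkage, and the four sample documents are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Share of words two documents have in common (0 = none, 1 = same set)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_bottom_up(documents, target_clusters):
    """Start with one cluster per document, then repeatedly merge the two
    most similar clusters until only target_clusters remain."""
    word_sets = [set(doc.lower().split()) for doc in documents]
    clusters = [[i] for i in range(len(documents))]
    while len(clusters) > target_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                sim = max(jaccard(word_sets[i], word_sets[j])
                          for i in clusters[x] for j in clusters[y])
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[documents[i] for i in cluster] for cluster in clusters]


DOCS = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]

for group in cluster_bottom_up(DOCS, 2):
    print(group)
```

No labels are supplied anywhere: the two animal sentences and the two market sentences end up in separate clusters purely because of the words they share. A top-down variant would instead start with one cluster holding everything and split it repeatedly.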

Now that we know that translation requires a corpus, neural networks, and statistical techniques, one more component completes the process: natural language processing. Natural language processing (NLP) broadly refers to the study and development of computer systems that can interpret speech and text as humans naturally speak and type it. Human communication is frustratingly vague at times; we all use colloquialisms, abbreviations, sarcasm, or irony when we speak, and sometimes we even make spelling mistakes. All of these make computer analysis of natural language difficult. There are three main components of a given text that need to be understood for a good translation to happen:

  1. Semantic Information – which is the specific meaning of each individual word.
  2. Syntax Information – which is the set of rules, principles, and processes that governs the structure of a sentence.
  3. Context Information – which is understanding the context that a word, phrase or sentence appears in.
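The role of context can be illustrated with an ambiguous word. The sketch below picks a meaning for "bank" by counting which sense shares the most clue words with the rest of the sentence, a heavily simplified take on dictionary-overlap disambiguation. The two senses, their clue words, and the function name are invented for this example.

```python
# Hypothetical mini sense inventory for the ambiguous word "bank".
SENSES = {
    "financial institution": {"money", "account", "loan", "deposit", "cash"},
    "side of a river": {"river", "water", "shore", "fishing", "boat"},
}


def guess_sense(sentence: str) -> str:
    """Pick the sense whose clue words overlap most with the sentence.
    On a tie, the sense listed first in SENSES wins."""
    context = set(sentence.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))


print(guess_sense("she opened an account at the bank to deposit money"))
# financial institution
print(guess_sense("we went fishing from the bank of the river"))
# side of a river
```

The word itself is identical in both sentences; only the surrounding words differ. A translation system needs this kind of signal because the two senses usually map to different words in the target language.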

Now that we have looked at these concepts and how they work, let’s take a look at the design principles and mechanisms of Neural Machine Translation (NMT):

  • NMT translates whole sentences at a time, rather than piece by piece.
  • This is possible because NMT is built as an end-to-end learning system, which basically means that the system learns over time to create better, more natural translations.
  • NMT models use deep learning and representation learning.
  • NMT requires a lot of processing power, which is still one of its main drawbacks. Its performance and time requirements are even greater than those of statistical machine translation. However, according to Moore’s law as it is commonly paraphrased, processing power doubles roughly every 18 months, which should open new opportunities for NMT in the near future.
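The Moore's law point in the last item comes down to simple compounding. The sketch below takes the 18-month doubling period at face value as an assumption, not as a measured trend, and the function name is made up for this example.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """How many times more processing power is available after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


for years in (1.5, 3, 6):
    print(years, growth_factor(years))
# 1.5 2.0
# 3 4.0
# 6 16.0
```

Under this assumption, a model that is too expensive to run today would have sixteen times more processing power available to it in six years.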

While NMT is still not perfect, especially with large technical documents, it shows progress over its predecessor, Phrase-Based Machine Translation (PBMT), which is one form of statistical machine translation. The focus now is on how to improve it even further, and one suggestion is to look at “hybrid” models, which use both of the previous methods. The efficiency of each approach depends on a number of factors, such as the languages involved and the available linguistic resources (such as corpora).


Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd ed. (Upper Saddle River, N.J: Prentice Hall, 2008). Selections.

Thierry Poibeau, Machine Translation (Cambridge, MA: MIT Press, 2017). Selections.

How Google Translate Works: The Machine Learning Algorithm Explained (Code Emporium). Video.

Barak Turovsky. Found in translation: More accurate, fluent sentences in Google Translate. (Nov. 15, 2016) found at

Seth Redmore. Machine Learning vs. Natural Language Processing, Lexalytics (Sep. 5, 2018) found at

United Language Group. Statistical Vs. Neural Machine Translation found at