Difficulty in Machine Translation, From English to Korean

The two most recent approaches to machine translation are Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). SMT requires an enormous amount of data, while NMT uses a large neural network and deep learning, which makes context-sensitive translation possible.

Korean, along with Chinese and Japanese, is commonly grouped under the CJK languages that share Han ideographs, although modern Korean is written mainly in Hangul. English and Korean differ significantly in structure, including the distribution of subjects, the word order, and the forms of verbs. English uses a Subject-Verb-Object structure, while Korean uses a Subject-Object-Verb structure. For example, ‘I eat an apple’ becomes ‘나는 사과를 먹는다’, which is literally ‘I apple eat’.
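
The word-order difference can be sketched in a few lines of Python. This is only a toy illustration: the function name `svo_to_sov` is hypothetical, it works on English glosses rather than real Korean, and an actual translation system would also have to handle particles, verb endings, and much more.

```python
def svo_to_sov(subject, verb, obj):
    """Reorder an English-style (subject, verb, object) triple
    into Korean-style subject-object-verb order."""
    return [subject, obj, verb]


# English order: I / eat / an apple
# Korean order:  I / an apple / eat
print(svo_to_sov("I", "eat", "an apple"))
```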

Moreover, Korean often omits the subject, which can make the meaning harder to recover. For example, in a sentence whose subject is ‘영희가’ (the name Yeong-hui followed by the subject marker ‘가’), the subject can be dropped entirely, leaving the reader to infer who performed the action. To avoid misunderstanding, the surrounding discourse context has to be considered closely, and this requirement is a challenge for machine translation.

Another difficulty is that Korean speech is divided into polite and impolite forms, depending on who you are talking to. This distinction is extremely important in Korean, because using the wrong form comes across as quite rude, and the differences between the forms are complicated. The most polite and most formal form of speech ends in ‘~습니다’ (or ‘~ㅂ니다’), while the less formal polite form ends in ‘~요’. Both polite forms are used when talking to elders, superiors, or people you are not familiar with. For example, ‘I listen to music’ in Korean is ‘음악을 듣습니다’ or ‘음악을 들어요’ in the polite forms. However, when you talk to a friend, a subordinate, or someone younger than you, the same English sentence is translated as ‘음악을 들어’ instead (the dictionary form of the verb is ‘듣다’), which is an informal and impolite form of speech. Whether to use the polite or impolite form depends entirely on the context: sometimes a younger person can speak to an older person in the impolite form, if they are close friends or the older person agrees to it.
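
The choice of speech level described above can be sketched as a small Python function. This is a minimal sketch under stated assumptions, not a real conjugator: the sentences are a fixed lookup table for ‘I listen to music’, the function and parameter names are hypothetical, and the `formal_setting` flag is an assumption added here to pick between the two polite forms, since the rules above do not say when to prefer one over the other.

```python
# Pre-conjugated forms of "I listen to music" for each speech level.
LISTEN_TO_MUSIC = {
    "formal_polite": "음악을 듣습니다",
    "informal_polite": "음악을 들어요",
    "casual": "음악을 들어",  # casual form; the dictionary form of the verb is 듣다
}


def speech_level(listener_older, listener_superior, familiar,
                 casual_agreed=False, formal_setting=False):
    """Pick a speech level from what is known about the listener.

    casual_agreed models the exception for close friends or an older
    person who has agreed to casual speech.
    formal_setting is an assumed extra input, not part of the rules above.
    """
    needs_polite = listener_older or listener_superior or not familiar
    if casual_agreed:
        needs_polite = False
    if not needs_polite:
        return "casual"
    return "formal_polite" if formal_setting else "informal_polite"


# An older listener with no agreement to speak casually gets a polite form.
level = speech_level(listener_older=True, listener_superior=False, familiar=True)
print(level, LISTEN_TO_MUSIC[level])
```

The sketch makes the problem for machine translation visible: none of the inputs to `speech_level` appear in the English sentence, so a translation system has to recover them from context or guess.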

In addition, the first-person and second-person pronouns differ between polite and impolite speech in Korean. For example, when I enter ‘Do you eat lunch’ into Google Translate, the translation is quite impolite, and it would be very rude to ask an elder this way. If you watch Korean dramas, you may notice that the word ‘너’ (neo), which means ‘you’, is commonly used when a speaker looks down on someone, and it is quite rude in that setting. ‘I’ is ‘저’ in the polite form and ‘나’ in the impolite form, while ‘you’ is ‘당신’ in the polite form and ‘너’ in the impolite form.
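
The pronoun pairs listed above fit naturally into a lookup table. The following is a rough sketch, with the table and function names chosen only for illustration; it covers just the four pronouns mentioned in the text.

```python
# Pronouns from the text, keyed by speech level and English pronoun.
PRONOUNS = {
    "polite": {"I": "저", "you": "당신"},
    "impolite": {"I": "나", "you": "너"},
}


def korean_pronoun(english_pronoun, level):
    """Return the Korean pronoun for 'I' or 'you' at the given speech level."""
    return PRONOUNS[level][english_pronoun]


print(korean_pronoun("you", "polite"))
print(korean_pronoun("you", "impolite"))
```

English has a single ‘you’, so the `level` argument has no counterpart in the source sentence, which is one reason a translation of ‘Do you eat lunch’ can come out impolite.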

There are also differences that depend on the gender of the speaker. In Korean, people seldom call someone older directly by name, even when the age gap is small. For example, a 13-year-old girl must call a 14-year-old girl ‘older sister’, or else it will be rude. In English, the words ‘older sister’ and ‘older brother’ are used by both boys and girls, but Korean uses different words depending on the speaker's gender. If a girl calls her older sister, she must say ‘언니’. If a boy calls his older sister, he must say ‘누나’. If a girl calls her older brother, she must say ‘오빠’. If a boy calls his older brother, he must say ‘형’.
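
The four sibling terms depend on two things, the gender of the speaker and the gender of the older sibling, so they can be sketched as a table keyed by both. The names below are hypothetical and the table covers only the four cases described above.

```python
# (speaker gender, older sibling) -> Korean form of address
OLDER_SIBLING_TERMS = {
    ("female", "sister"): "언니",   # eonni
    ("male", "sister"): "누나",     # nuna
    ("female", "brother"): "오빠",  # oppa
    ("male", "brother"): "형",      # hyeong
}


def older_sibling_term(speaker_gender, sibling):
    """Return the word a speaker uses for an older sister or older brother."""
    return OLDER_SIBLING_TERMS[(speaker_gender, sibling)]


print(older_sibling_term("female", "sister"))
print(older_sibling_term("male", "sister"))
```

The English phrase ‘older sister’ supplies only the second half of the key, so a translation system must find the speaker's gender somewhere else in the text.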

These differences add significantly to the complexity and difficulty of machine translation from English to Korean. Here is a video that discusses some of the problems with English-to-Korean machine translation and why Korean machine translation performs so poorly.



Teller, V. (2000). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Computational Linguistics, 26(4), 638-641.

Kim, S., & Lee, H. (2017). A Study on Machine Translation Outputs: Korean to English Translation of Embedded Sentences. 영어영문학, 22(4), 123–147.