One big issue in machine translation

In the HCI field, there are two inherently difficult problems that I'm aware of.
One is the translation of human language and the other is voice recognition. (Handwriting recognition is a similarly big problem, and there has been no big advancement in any of these fields.)

Today, I tried Google Translate to see how much it has improved.
Google recently updated the interface of its Translate web site. Now you can upload a document, and it automatically translates it for you.
Here is its UI.

So, unlike in the previous version, you paste the URL of the web page you want to translate there and then press the “Upload” button, which is not in the picture above.

Then the translation result looks like this.

Translation Results

Actually, the screenshot was taken after I had tried it once and changed the translated title, which was originally “펜촉 객체를 위한 메모리 관리”, i.e. “Memory Management for Pen-tip Objects”.
Then I deleted the “uploaded” document (strangely, even if you don’t upload a file and just paste a web URL, it still calls the document “uploaded”) and pasted the URL again. Intelligently, it remembered the text I had changed before.

Anyway, what is funny is:

  • Why it translated “Nib Objects” as “pen-tip objects” in Korean
  • The choice of “콘센트”, or concent, for “Outlets”

I believe the first one is due to its vocabulary. Somehow “Nib” is mapped to “펜촉”, or pen-tip.
The 2nd one is more interesting, and I think it shows why machine translation is very hard.
Using “콘센트”, or concent, to mean a power outlet is so-called Konglish. I don’t know how we Koreans started to use that word, but right or wrong, we use it. Google Translate smartly picked that word and translated accordingly.
However, there are one or two big problems here.

  1. Whether to write it in English characters (Outlets) or in Korean characters as it is pronounced (아웃렛츠 or 아웃렛)
  2. Whether to replace it with “콘센트”, or concent

The 1st one is a question of which form is more natural to Korean people. Do they write words borrowed from English or other languages in the original characters or in Korean characters? What makes it more complicated is that we use English characters for some words and Korean characters for others. For something like “Outlets”, which here does not mean the actual outlet you plug a power cable into but is a specific term in Cocoa/Objective-C programming, we usually don’t write it in Korean characters. Then how can a machine figure out that context and choose the better, more natural form?

The 2nd problem is more fundamental. It is about how to translate foreign words at all. Should words be chosen based on how people actually use them in daily life? Then “콘센트” is right. But what about the context? Whether the word ends up as Outlets, 아웃렛 (outlet), or 콘센트 (concent) should be decided only after figuring out the context, and in this case “Outlets” is the better and correct choice.
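To make the two problems above concrete, here is a minimal sketch of choosing a rendering by context. Everything in it is an assumption made for illustration: the glossary entries, the cue words, and the function names are hypothetical, and this is not how Google Translate works. It only shows that the same English word needs different output depending on the surrounding text.

```python
import re

# Hypothetical glossary: (source term, domain) -> preferred rendering for Korean readers.
GLOSSARY = {
    ("outlet", "everyday"): "콘센트",   # the Konglish word used in daily life
    ("outlet", "cocoa"): "Outlet",      # programming term, left in English characters
    ("nib", "everyday"): "펜촉",        # pen-tip
    ("nib", "cocoa"): "Nib",            # Interface Builder file, left in English characters
}

# Hypothetical cue words that hint the text is about Cocoa programming.
DOMAIN_CUES = {"cocoa": {"cocoa", "objective-c", "xcode"}}


def guess_domain(context):
    """Return the first domain whose cue words appear in the context, else 'everyday'."""
    words = set(re.findall(r"[a-z][a-z-]*", context.lower()))
    for domain, cues in DOMAIN_CUES.items():
        if words & cues:
            return domain
    return "everyday"


def render_term(term, context):
    """Pick a rendering for the term based on the guessed domain of the context."""
    domain = guess_domain(context)
    # Unknown terms are returned unchanged rather than guessed.
    return GLOSSARY.get((term.lower(), domain), term)


print(render_term("outlet", "Plug the lamp into the outlet."))
print(render_term("outlet", "Connect the outlet in your Objective-C class."))
```

A keyword list like this is obviously too crude for real text. The hard part is exactly the context detection that the sketch fakes with three cue words.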

How can a machine figure those out? It is a really difficult problem. Each and every language on Earth has different properties. Even within the same Korean language, I mean between North Korea and South Korea, there are differences: North Koreans use “얼음 보숭이”, while South Koreans use “아이스크림”, which is just “ice cream” written in Korean characters as it is pronounced.

I believe Google decided to solve this problem by letting users upload their own translations. In the previous version, you could paste text or the URL of a document to translate, and you could suggest your own, better translation to the Google Translate system. Now the UI is different, but it is still the same approach.

Then, I believe, Google’s AI collects those suggestions and finds the common patterns with a higher usage ratio. The next time someone asks Google Translate to translate the same text, or text with similar phrases in it, Google’s AI chooses the most suitable pattern from its expert-system DB. (If it even uses an expert system.)
But I haven’t seen that happen yet. I have tried it a couple of times over the years, but Google’s system doesn’t seem to look up the translation I entered a few months ago. Still, I have a feeling that the Google people will do that eventually. Otherwise, there would be no reason to let users provide their own translations. (In the current version, the translated document is saved in your account. Previous versions didn’t have that functionality, but they still let you suggest your translation, which means that Google’s system collects your translation for future reference.)
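The approach I am guessing at could look roughly like the sketch below. This is only my assumption of the idea, not Google’s actual system: the class name, the `min_votes` threshold, and the sample suggestion “Nib 객체” are all hypothetical. It collects user suggestions per source phrase and uses the most frequently suggested one once enough people agree.

```python
from collections import Counter, defaultdict


class SuggestionStore:
    """Hypothetical store of user-suggested translations, keyed by source phrase."""

    def __init__(self, min_votes=2):
        # source phrase -> Counter of suggested translations
        self._votes = defaultdict(Counter)
        self._min_votes = min_votes

    def suggest(self, source, translation):
        """Record one user's suggested translation for a source phrase."""
        self._votes[source][translation] += 1

    def best(self, source, machine_output):
        """Return the most suggested translation if it has enough votes,
        otherwise fall back to the machine output."""
        counts = self._votes.get(source)
        if not counts:
            return machine_output
        translation, votes = counts.most_common(1)[0]
        return translation if votes >= self._min_votes else machine_output


store = SuggestionStore(min_votes=2)
store.suggest("Nib Objects", "Nib 객체")
print(store.best("Nib Objects", "펜촉 객체"))  # one vote is not enough yet
store.suggest("Nib Objects", "Nib 객체")
print(store.best("Nib Objects", "펜촉 객체"))  # now the user suggestion wins
```

The threshold is there so that a single odd suggestion does not override the machine output for everybody. Matching “text with similar phrases” rather than exact phrases would need much more than this.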

It will be very interesting to see how Google’s translation system evolves.