Fortnightly Mailing: Machine translation - a crude comparison - statistical method superior to rules-based?

Last September, just prior to Google switching to its own statistical machine translation system for all the language pairs it offers, I set up a crude comparison between the rules-based and statistical methods used at that time by Google for different language pairs. The crudity stemmed in part from my use of the "round trip" comparison method (defects outlined below), from the use of only one sample text, and from the inherent drawback of comparing translation methods across different language pairs, each of which presents different translation challenges.

At the time, Google was offering its statistical method for six pairs: English to Arabic, Arabic to English, English to Russian, Russian to English, English to Chinese (Traditional and Simplified) and Chinese to English. I was curious about whether the statistical method (described here in this 11/5/2008 Linux Insider interview with Google's Peter Norvig) was better than the rules-based approach.

The comparison method was to use Google to do a "round trip" translation of a piece of English text to a test language (Arabic, French, German, Russian, Spanish, Traditional Chinese) and back, and then to ask respondents to judge the resulting texts according to

how much meaning had been lost
the extent of obvious howlers

and then to rank the six texts from best to worst.
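The round trip procedure described above can be sketched roughly as follows. This is a minimal illustration, not the way the comparison was actually run (that was done by hand through Google's web page). The `translate` callable, its `(text, source, target)` signature, the language codes and the `fake_translate` stand-in are all hypothetical; any real translation service would need to be wrapped to fit.

```python
from typing import Callable, Dict, List

# Hypothetical signature: translate(text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, test_lang: str, translate: Translator,
               home_lang: str = "en") -> str:
    """Translate text into test_lang, then translate the result back."""
    outbound = translate(text, home_lang, test_lang)
    return translate(outbound, test_lang, home_lang)


def round_trips(text: str, test_langs: List[str],
                translate: Translator) -> Dict[str, str]:
    """Produce one round trip text per test language."""
    return {lang: round_trip(text, lang, translate) for lang in test_langs}


# Stand-in translator for demonstration only: it tags the text with the
# direction of travel instead of translating it.
def fake_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


sample = "Bouquets, then, to the Western trio for keeping at it."
results = round_trips(sample, ["ar", "fr", "de", "ru", "es", "zh-TW"],
                      fake_translate)

for lang, text in results.items():
    print(lang, "|", text)
```

Each resulting text would then be shown to respondents without its language label, which is the point of the exercise: they judge the English that comes back, not the intermediate translation.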

Some of the deficiencies of "round trip" translation as a comparison method are summarised here.

The round trip translations were presented in random order, and the only way that respondents could have found out which language pair was involved would have been to use Google Translator on the test text, which I doubt any of them bothered to do. All respondents provided a name and email address, and all but two asked to be sent a summary of the results of the comparison, which makes me confident that they were taking the exercise at least reasonably seriously. There were only 13 respondents, but the Russian and Arabic (statistical) translations seemed remarkably superior, being judged:

best or second best by 12 and 10 respondents respectively;
to have suffered only a little loss of meaning by 12 and 11 respondents;
to have few if any howlers by 10 and 8 respondents.

The French (rules-based) and Chinese (statistical) translations were judged to be rather poorer, with the Spanish and German (rules-based) translations judged to be the least good.

This table summarises the results.

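| Language (n=13) | Only a little loss of meaning | Few if any obvious howlers | Judged best | Judged next best | Judged next to worst | Judged worst |
|---|---|---|---|---|---|---|
| Spanish | 1 | 0 | 0 | 1 | 5 | 1 |
| German | 2 | 0 | 0 | 1 | 3 | 8 |
| French | 3 | 2 | 0 | 4 | 1 | 0 |
| Arabic | 11 | 8 | 4 | 6 | 1 | 0 |
| Russian | 12 | 10 | 6 | 6 | 1 | 0 |
| Chinese | 2 | 1 | 0 | 2 | 3 | 1 |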
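As a check on the "best or second best" figures quoted earlier, the table can be tallied with a few lines of Python. This is only a sketch: the counts are copied from the table above, the method labels come from the text, and the names `Row`, `RESULTS`, `top_two` and `ranking` are my own choices for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    method: str          # translation method Google used for this language
    little_loss: int     # "only a little loss of meaning"
    few_howlers: int     # "few if any obvious howlers"
    best: int
    next_best: int
    next_to_worst: int
    worst: int


# Counts copied from the table above (n = 13 respondents).
RESULTS = {
    "Spanish": Row("rules-based", 1, 0, 0, 1, 5, 1),
    "German": Row("rules-based", 2, 0, 0, 1, 3, 8),
    "French": Row("rules-based", 3, 2, 0, 4, 1, 0),
    "Arabic": Row("statistical", 11, 8, 4, 6, 1, 0),
    "Russian": Row("statistical", 12, 10, 6, 6, 1, 0),
    "Chinese": Row("statistical", 2, 1, 0, 2, 3, 1),
}

# Number of respondents who placed each language best or second best.
top_two = {lang: row.best + row.next_best for lang, row in RESULTS.items()}

# Languages ordered by that count, highest first.
ranking = sorted(top_two, key=top_two.get, reverse=True)

for lang in ranking:
    row = RESULTS[lang]
    print(f"{lang:8} {row.method:12} best or second best: {top_two[lang]:2} of 13")
```

Sorting on this one column puts Russian and Arabic well ahead of the rest, with Chinese (also statistical) sitting below French, so the small sample does not show the statistical method winning for every language pair.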

Since Autumn 2007, Google seems to have dispensed entirely with the rules-based system, which I think had been provided to it by Systran, and is offering translations using its statistical method for the 29 language pairs shown in note C below. To give you a feel for the output of the current system, here are links to this piece, translated into the 6 languages used for the comparison: Spanish; German; French; Arabic; Russian; Traditional Chinese.

Notes

A. The text used was a paragraph from an article in the Economist:

"The passing of a United Nations resolution on July 31st to deploy up to 26,000 troops and police in Darfur is a welcome diplomatic breakthrough in trying to end the conflict there. At least 200,000 people have been killed and about 2.5m displaced since hostilities broke out in 2003 in Sudan's western region. The UN, led by America, Britain and France on the Security Council, had been pushing an extremely reluctant Sudanese government into accepting such a force for over a year, so it is a victory for relentless diplomatic pressure. Bouquets, then, to the Western trio for keeping at it."

B. Previous Fortnightly Mailing pieces on Machine Translation

12/6/2005 - Combining human with machine translation;
15/3/2006 - Machine translation;
24/11/2006 - Machine translation - the 2006 NIST Comparisons.

C. Google's currently available language pairs

Arabic to English
Chinese to English
Chinese (Simplified to Traditional)
Chinese (Traditional to Simplified)
Dutch to English
English to Arabic
English to Chinese (Simplified)
English to Chinese (Traditional)
English to Dutch
English to French
English to German
English to Greek
English to Italian
English to Japanese
English to Korean
English to Portuguese
English to Russian
English to Spanish
French to English
French to German
German to English
German to French
Greek to English
Italian to English
Japanese to English
Korean to English
Portuguese to English
Russian to English
Spanish to English
