Paradigm shift on machine translation?

The April-May issue of Multilingual, which I’m just catching up with, features seven articles on machine translation (MT). Having a long-term interest in this area (which is not to say any expertise) and in its potential for less widely spoken languages, and having broached the topic on this blog once previously, I thought I’d take a moment to briefly review these articles. They are (links lead to abstracts for non-subscribers):

The evolution of machine translation — Jaap van der Meer
Machine translation: not a pseudoscience — Vadim Berman
Putting MT to work — Lou Cremers
Monolingual translation: automated post-editing — Hugh Lawson-Tancred
Machine translation: is it worth the trouble? — Kerstin Berns & Laura Ramírez
Challenges of Asian-language MT — Dion Wiggins & Philipp Koehn
Advanced automatic MT post-editing — Rafael Guzmán

In the first of the articles, Jaap van der Meer characterizes changes in attitudes about MT over the last 4 years as “revolutionary” — a move “from complete denial [of MT’s utility] to complete acceptance.” What happened? The answer seems to be a number of events and changes rather than a single triggering factor, perhaps an evolution to a “tipping point” of sorts. There have been ongoing improvements in MT, there was the establishment of the Translation Automation User Society (TAUS) in 2004 which “helped stimulate a positive mindset towards MT,” and the empowerment of internet users in the use of MT. Van der Meer also points out a shift in emphasis from finding “fully automated high quality translation” (FAHQT) to what he calls “fully automated useful translation” (FAUT – an acronym that presumably should not be read in French). The latter is not only a more realistic goal, but also one that reflects needs and uses in many cases.

As for the future, van der Meer sees a “shift from traditional national languages to ever more specialized technical languages.” My question is whether we can at the same time also see significant moves for less widely spoken languages.

Van der Meer’s article sets the tone and has me asking if indeed we are at a point where a fundamental shift is occurring in the way we think of MT. The other articles look at specific issues.

Vadim Berman looks at some hurdles to making MT work, highlighting the importance of educating users – including mention of a recurrent theme: the importance of clean text going into the translation.

Two of the articles, by Lou Cremers and by Kerstin Berns and Laura Ramírez, discuss the practical value of MT in enterprise settings.

Cremers has some interesting thoughts about the utility of MT in an enterprise setting, something that has long seemed impractical, certainly when compared to translation memory (TM). He begins by noting that “a high end MT system will really work if used correctly, and may save a considerable amount of time and money,” and then proceeds to discuss several factors he sees as key to getting good ROI: terminologies and dictionaries; quality input text; volume (pointing out among other things the fact that good MT will tend to lead to a larger amount of text being translated – a key point for considering the value of MT in other spheres of activity, I might add); and workflow.

The “correct use” of MT relates largely to the quality of the text: “surprisingly simple writing rules governing the use of articles and punctuation marks will drastically improve MT output.”

Cremers offers a summation which seems to speak for several of the articles:

It’s not the absolute quality of the MT output that is important, but rather how much time it saves the translator in completing the task. In that way it is not different from TM. In both cases, human intervention is needed to produce high-quality translations.

Berns and Ramírez walk through the costs and benefits of MT in a business context. Here the issue is investing in a system but the reasoning could be applicable to different settings. They suggest that the kind of material to be translated is (unsurprisingly) a good guide to the potential utility of MT:

Do you have large text volumes with very short translation times and a high terminology density? Then it is very likely that MT will be a good solution for you. On the other hand, if you have small text volumes with varying text types and complex sentence structures, then it probably will be too much effort to set up an effective process.

Two of the articles, by Hugh Lawson-Tancred and Rafael Guzmán, discuss “post-editing” as a tool to improve the output of MT.

Lawson-Tancred suggests – contrary to several of the other authors – that the utility of preparing the text going into MT may not be so critical, and that “the monolingual environment of the post-editor is a better place to smooth out the wrinkles of the translation process….” Interestingly, this concept focuses on context, with the basic unit for processing being 5-20 words (that is, between the word level of dictionaries and whole sentences). He concludes by speculating that automated post-editing could “develop into a whole new area of applied computational linguistics.”
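To make the idea of a 5-20 word processing unit a little more concrete, here is a rough sketch of my own, not anything taken from Lawson-Tancred's article. It assumes the simplest possible approach: split the MT output on whitespace and list every contiguous run of 5 to 20 words. The function name `word_spans` and its parameters are hypothetical, and a real post-editing system would presumably choose its units far more selectively.

```python
def word_spans(text, min_len=5, max_len=20):
    """Yield every contiguous run of min_len..max_len words in text.

    Assumption: words are whatever whitespace splitting produces;
    the article does not say how units are actually delimited.
    """
    words = text.split()
    for start in range(len(words)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(words):
                break  # longer runs from this start would overrun the text
            yield " ".join(words[start:end])


sample = "the quick brown fox jumps over the lazy dog"
spans = list(word_spans(sample))
print(len(spans))   # number of candidate units in the sample
print(spans[0])     # the first five-word unit
```

Each span is larger than a dictionary entry but smaller than most sentences, which is the middle ground the article describes.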

Guzmán, who has written a number of other articles on post-editing, discusses the use of TM in the context of verifying (post-editing) the product of MT. This basically involves ways of lining up texts in the source and translated languages for context and disambiguation. There are several examples using Spanish and English.

Finally, Dion Wiggins and Philipp Koehn discuss MT involving Asian languages, which most often entails different scripts. There are examples from several Asian languages illustrating the challenges involved.

This is an interesting set of articles to read to get a sense of the current state of the art in the application of, and applied research on, MT. It’s a bit of a stretch for a non-specialist with limited context like me to wrap his mind around the ensemble of technical concepts and practices. One does come away, though, with the impression that MT is already a practical tool for a range of real-world tasks, and that we will be seeing much more widespread and sophisticated uses of it, often in tandem with allied applications (notably TM and post-editing). Are we seeing a paradigm shift in attitudes about MT?

At this time I’d really like to see a program to encourage young computer science students from diverse linguistic backgrounds in developing countries and indigenous communities to get into the field of research on MT. I’m convinced that it has the potential, if approached strategically, to revolutionize the prospects for minority languages and the ways we think about “language barriers.” That is more than just words – it has to do with education, knowledge and enhanced modes of communication. By extension, the set of human language technologies of which MT is a part can in one way or another play a significant role in the evolution of linguistic diversity and common language(s) over the coming generations.