Code Breakers

Can Google Translate teach us to talk to each other?

Text by Peter Lyle

In these days of end-to-end news stories about hacked databases, beloved brands keeping personal data and similar increasingly present risks of our wired world, I get a feeling of security, and a warm sense of something like nostalgia, when I use one of those longstanding online tools that still execute the single task we ask them, whenever we ask them, and don't try any funny business on the side. Currency converters, clocks, useless weather forecasts, that kind of thing. I use translation websites a lot, and until the theme of this issue of Tank came into focus, I thought of those in exactly the same way. In plain English, that was dumb.

It never makes headlines outside of tech and business blogs, but the field of online translation has actually witnessed a number of quiet revolutions and revelations in recent times. Some wondered whether Google's new voice-to-text translation signaled the coming of a universal human language. A 2008 deal with Wikipedia to translate the site for different languages marked another step towards Google's goal of digitizing all human knowledge.

Even the goals Google failed to reach highlight the slow-burning significance of online translation. One rare huge territory where Google hasn't destroyed the competition is Russia. There, a native search portal called Yandex has been able to maintain its dominant position by being better equipped to deal with Cyrillic characters, Russian grammar and other nationally-specific details. Another is China, where Google has lost market share to a local rival, Baidu, for five successive business quarters now. Again, people seem to warm more to its local flavor. An additional problem seems to be the fallout from Google's prior clashes with the country's authorities, which have seen its software locked out of the official, government-endorsed Chinese version of Android, and so from the new smartphones and tablets of many Chinese.

Though its roots lie in a 40-year-old American system for deciphering Russian, the recent past of "machine translation" has ramifications for geopolitics, virtual politics (namely the never-ending Google-Facebook-etc territorial battle for our eyeballs), personal privacy, and - when you think about it - for the language and culture that future generations will inherit.

Though Facebook and Twitter have both recently launched crowdsourced translation initiatives whose impact is yet to be known, most online translation software is derived from a 1968 system called SYSTRAN, which itself evolved from automatic translation technology developed for US intelligence during the Cold War. SYSTRAN learns huge vocabularies and grammatical rules, then applies them to produce translations. It still powers translations for many vast tech brands, from security firms to portals like Altavista and Yahoo and US and European government networks. It used to be the basis of Google Translate, too, until Google replaced that system with its own translation software in 2006, software that set itself apart from SYSTRAN's emphasis on traditional linguistic rules by being based on the comparison of millions of translated documents, from which it learns.

And, unless you happened to have a special interest in the subject, that was probably that. We all translated happily ever after, using Google Translate ever more as it expanded its suite of languages (there are now over 50, including Jamaican Creole). It gave us a limited sort of comprehension of cool foreign websites, or quotations we wanted to get the gist of, or travel information we were trying to make sense of. It gave us laughs and fury at its absurd failures too, but it was hard to grumble, because it would have been unimaginable a decade earlier, seemed to slowly improve, and was free.

But Google Translate and its quirks are becoming an ever more defining part of our online existence, rather than just a cool tool; our world is becoming a more Google Translated one, whether we like it or not.

When I work with foreign journalists and agencies, web-translated text is increasingly common. A couple of years back, Google Translate announced a partnership with Wikipedia whereby Wikipedia contributors would translate and upload entries to their own-language sites using Google's Translator Toolkit software. Last Spring, they celebrated the project's first 16 million words (in 32 languages). In late summer 2010, in a story since crowingly recounted on numerous professional translation sites, the Vancouver Sun reported that the Canadian Mounted Police had been forced to end a service where instant Google translations were sent to French speakers who requested them. After complaints - they were not only offending much of the population over a sensitive cultural issue, but also failing in their obligation to provide a quality translation under equality laws - they ended up paying for a $3,000-a-day human translation service instead. Earlier this year, Google announced it was ending its Google Translate API, which allowed people to offer auto-translated text on their sites on request, because this bandwidth-intensive, not-for-profit system was being widely used in violation of its terms, and to make money. After industry uproar and conspiracy theories of all sorts, it announced it would maintain the service after all, but now with a usage fee that would help manage abuse of it in future.

In March 2011, Google launched a beta voice-recognition system that translated the words you spoke into text. In an interview at the time, Franz Och, the man behind the modern incarnation of Google Translate, was asked if it was the first step to universal voice translation in real time. "To really do the integrated speech-to-speech translation, where you can have a phone call with someone and it would be interpreted live?" he asked. "I believe that based on the technology that we have, and the improvement rate we have in the core quality of MT [machine translation] and speech recognition, that it should be possible to do that in the not-too-distant future."

Four years earlier, then Google chairman Eric Schmidt said, "Google and other companies are working on statistical machine translation so that we can on demand translate everything all the time…Many, many societies have operated in language-defined communities where they really don't understand and are not particularly sympathetic to other peoples' views because of the barrier of language. We're about to have that breakthrough and it is a huge thing."

Schmidt is obviously a very smart chap, and knew that evoking the idea of universal comprehension was a powerful move - in the Bible, God's decision to divide human communication into different languages is perhaps the next most symbolically significant punishment after the exile of Adam and Eve from the Garden of Eden.

That Old Testament punishment was the "confusion of tongues", inflicted after God saw the endless ambition of the builders of the Tower of Babel. He said that their shared language meant they were capable of achieving their goals, however outrageous - like building a tower that reached to Heaven. When all humans speak the same language, God warns, "nothing will be restrained from them, which they have imagined to do." So from then on, they are scattered into different groups with their own ways of communicating, cursed to divide their energies and misunderstand each other.

Peter Barker, an Australian-born translator who lives and works in Spain, brings the debate firmly back to earth. Barker translates reports, books, pamphlets and everything else, usually via an agency, from English to Spanish and vice-versa. (He's dabbled in French, too.) When I asked if he feared a near future of instant, perfect auto-translation of everything, he said, "It doesn't even make sense. Even if I'm just translating a catalogue or something, so much personal judgment and cultural awareness is involved in every sentence."

Barker does concede, though, that not everybody seems to care. "Look, things are as hard for translators as they are for everyone else. And the knock-on effect of the caution with money in publishing, and in general, at the moment, means that you're not going to get rich even if you're really busy. But machine translation does pose an extra threat, because when people are cutting budgets they might suddenly look at that as a way to save money."

Which, he argues, it isn't - what's fine for getting a basic sense of something is no good at all for doing it properly. When a piece of machine-translated text has been sent to him for a small 'proofing' fee (a fraction of his standard translation rate), he says, "it often takes longer to fix the mangled translation than it would have been to do it from scratch from the original - there are so many red herrings and things that can waste your time."

Many professional translators now use software themselves, Barker reminds me. "Memory translation" software is a variant on the Google approach, in that it stores certain phrases and learns to translate them. That's especially useful in the case of things like legal documents, where identical phrases and clauses can come up again and again over time. Like SYSTRAN, it can also be customized to particular kinds of language and context.

The other thing Peter Barker reminds me about memory software, Google's or anybody else's, is that it remembers everything. That's especially important for people like him, who might be confidentially working on commercially sensitive information for brands, but "everybody else should probably think about it too. Whatever you type in goes into the giant brain."

Your words might float out into the public domain on the one hand, and on the other, your successful solution will inform the next person who wants a translation and will form part of the future canon. It seems rich to get too paranoid about privacy in this case - the whole concept of Google Translate is about learning from as many real uses of language as possible to get better - but it is something to ponder how your thoughts at your desk might become part of the official language of tomorrow, might shape the facts of the world, as translated for all on the web's ubiquitous encyclopedia.

Then it will be the truth, whatever lovers of language have to say about it. On the matter of whether it will be a good translation too, they'll probably be as divided as they are now. For all the pace of recent developments in online translation and all Google's grand sentiments, Franz Och's strategy is rooted in mathematical calculation, not the idea that you can one day make a computer work like a human mind, which the older grammar-and-vocabulary approach to machine translation implied. In other words, if Google is becoming the great, single dictionary of all our futures, online translation is on a course to embrace the idea of slowly getting better, rather than suddenly somehow becoming perfect and eliding all the minute differences in the ways we speak about and make sense of things. For a while yet, we'll still be able to think of it as the useful klutz we call upon to help us make sense of things we can't, or else give us a cheap laugh. For now, one of the most miraculous things about translation technology is the way it somehow leaves us feeling eloquent and wise.
