I was first introduced to the world of automated translation in 1977 via Brigham Young University’s TSI (Translation Sciences Institute), which later spawned ALPS (Automated Language Translation Systems); I worked at both enterprises as a linguistic programmer.
It’s a huge field now, much bigger than it was in the ’60s and ’70s, when the technologies and theories were merely a-borning, and much has been written about automated translation going back to the ’60s and even earlier. The history is out there on the Net if you want to do your own research ¹ (and that doesn’t mean watching two hours of YouTube videos that tell you what you want to hear). There’s also some funny stuff out there. ²
A post from one of my Facebook friends and translation colleagues was the source of the Japanese text below. This is just a raw comparison; you can draw your own conclusions or dig deeper if you want. Or don’t. But it’s something that fascinates me, and I could study it for a lifetime. Wait, I did. Whatevs.
Google Translate began with statistical machine translation (SMT), which analyzes huge bilingual text corpora and generates translations from statistical models built on them. Google later moved to a combination of SMT and neural machine translation (NMT), which uses an artificial neural network to predict the likelihood of a sequence of words.
Facebook began by using Bing Translator (otherwise known as Microsoft Translator) but later developed its own translation engine, first based on SMT and now entirely AI-driven, using neural networks.
DeepL is a relative newcomer to the automated translation scene but has received high praise from translators and governments alike. It uses neural machine translation, but its power comes from the massive Linguee database. Although it currently supports only 11 languages, compared with Google Translate’s 109, its results appear to be consistently better and more natural.
Below you will find two examples of highly colloquial Japanese and the output from the three different translation engines.
(Eeee? Dare? Motte itchatta no wa! Tabun, kamera ni utsutte iru yo ne. Kaeshitee)
What? Who is this? I took it! Maybe it’s on the camera. Give it back
Eh? Who? What I brought! Maybe it’s in the camera. Return
Ehhh? Who is it? I’m the one who took it! Maybe you can see it in the camera. I want it back.
(Son’na koto o suru hito ni wa zettai ni bachi ga ataru yo 〜)
People who do such a thing will never win ~
People who do such a thing will definitely be hit
People who do such things are going to pay dearly for it.
Neural network translation is interesting in that repeated submission of a single phrase can often result in different outputs. For example, 「返してー」 (Kaeshitee), submitted to DeepL several times, results in:
I want it back.
Give it to me.
Give it back to me.
By contrast, the original phrase reduplicated (返してー返して.) produces:
Give it back! Give it back! Give it back!
The technology has made multiple quantum leaps since the earliest forays into automated translation. My Pixel 3 XL phone is many times more powerful, in both storage capacity and processing speed, than the IBM 370/138 that BYU was using to develop its one-to-many interactive translation system based on Junction Grammar. To be honest, I don’t know what kind of hardware these systems are running on, whether distributed systems, mainframes, or supercomputers capable of processing whigabytes of data at speeds there are barely enough Greek prefixes to describe. I just know they’re big, and fast, and only getting bigger and faster all the time.
That said, translation, particularly literary translation, is just as much an art form as it is a mechanical process, one with cognitive components that no computer will ever be able to duplicate. No machine will ever be capable of translating Les Misérables into English, or Harry Potter into Hebrew, while preserving the wonder of the language; I challenge any machine, no matter how sophisticated or fast, to translate something like this:
“I stepped off the train at 8 P.M. Having searched the thesaurus in vain for adjectives, I must, as a substitution, hie me to comparison in the form of a recipe.
Take a London fog 30 parts; malaria 10 parts; gas leaks 20 parts; dewdrops gathered in a brick yard at sunrise, 25 parts; odor of honeysuckle 15 parts. Mix.
The mixture will give you an approximate conception of a Nashville drizzle. It is not so fragrant as a moth-ball nor as thick as pea-soup; but ’tis enough – ’twill serve.
I went to a hotel in a tumbril. It required strong self-suppression for me to keep from climbing to the top of it and giving an imitation of Sidney Carton. The vehicle was drawn by beasts of a bygone era and driven by something dark and emancipated.”
– O. Henry, “A Municipal Report”
The need for human translators is in no danger, and never will be, but technological advances have brought translators both advantages and disadvantages. Back in the day, it was pencil and paper, hard-copy dictionaries, and Rolodexes. Now it’s translation memories, electronic dictionaries, and segmentation systems that allow rapid recall of already-translated words and phrases and best guesses (fuzzy matches) for segments that are close. This speeds up the work and increases consistency, but as a result, translation agencies have taken to telling translators that they’ll pay, for example, 9¢ per word for new material, only 4¢ for fuzzy matches, and almost nothing for 100% matches. Translators therefore have to turn out much more material to earn the same income, and what the agencies don’t care about is that every word still has to be processed and reviewed with the translator’s full skill set, as though it were brand-new.

What’s more, the proliferation of free online translation services means that any schlub in India or China can claim to be a translator and charge 2¢ per word, and the agencies love that. In exchange, though, they’re getting lousy output and dragging down rates of pay for the entire industry, which is exactly why I got out of the business of freelance translation. It’s a crime, and I won’t put up with it.
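To make the fuzzy-match arithmetic concrete, here is a minimal Python sketch of how a translation memory might sort incoming segments into new, fuzzy, and exact matches and price them per word. The 9¢ and 4¢ rates come from the example above; everything else is an assumption for illustration only: the 1¢ exact-match rate (a stand-in for "almost nothing"), the 75% fuzzy cutoff, the sample segments, and the use of Python's `difflib.SequenceMatcher` as the similarity measure. Commercial translation-memory tools use their own scoring, so this is a sketch of the idea, not a description of any particular product.

```python
from difflib import SequenceMatcher

# Per-word rates in cents. "new" and "fuzzy" match the 9¢/4¢ example in the
# post; "exact" is a HYPOTHETICAL stand-in for "almost nothing".
RATES = {"new": 9, "fuzzy": 4, "exact": 1}

# HYPOTHETICAL similarity cutoff for calling a match "fuzzy".
FUZZY_THRESHOLD = 0.75


def best_match(segment: str, memory: list[str]) -> float:
    """Highest similarity (0.0 to 1.0) between segment and any stored segment."""
    return max(
        (SequenceMatcher(None, segment, stored).ratio() for stored in memory),
        default=0.0,
    )


def classify(segment: str, memory: list[str]) -> str:
    """Sort a segment into the exact, fuzzy, or new pay tier."""
    score = best_match(segment, memory)
    if score == 1.0:
        return "exact"
    if score >= FUZZY_THRESHOLD:
        return "fuzzy"
    return "new"


def quote_cents(segments: list[str], memory: list[str]) -> int:
    """Total pay in cents: each segment's word count times its tier's rate."""
    return sum(len(seg.split()) * RATES[classify(seg, memory)] for seg in segments)


if __name__ == "__main__":
    # Made-up translation memory and job, for illustration only.
    memory = [
        "Press the power button to turn on the device.",
        "Remove the battery cover before charging.",
    ]
    job = [
        "Press the power button to turn on the device.",   # exact
        "Press the power button to turn off the device.",  # fuzzy
        "Store the device in a cool, dry place.",          # new
    ]
    for seg in job:
        print(classify(seg, memory), len(seg.split()), seg)
    print("paid:", quote_cents(job, memory), "cents")
    print("if all new:", sum(len(s.split()) for s in job) * RATES["new"], "cents")
```

With these made-up segments the job pays 117¢ instead of the 234¢ it would earn if every word were billed as new, even though the fuzzy segment differs from the stored one by a single word ("off" for "on") that the translator still has to catch and fix.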
The Old Wolf has spoken. Der Alte Wolf hat gesprochen. Le vieux loup a parlé. Il vecchio lupo ha parlato.
¹ If you want to dig into the history of machine translation, you can start here, following the references at the end of the article for more. Warning: It’s a very, very deep rabbit hole.
² I’ve addressed academic nonsense before, but it’s worth a mention here.
Hear, hear! The same lousy pay applies to writing, as you know. There is more demand than ever for internet copy, but the pay is as bad as it was when I tried to make a go of it, and you definitely get what you pay for.
There are too many hacks out there diluting the pool, sadly.
The sense of 「だれ？もっていっちゃったのは！」 is “Who is it? Who took it?” There’s no first-person subject. DeepL got that just as wrong as the other two systems.
For 「返してー」, DeepL’s “I want it back” is again off the mark (as are “I want it back.” — “Give it to me.” — “Give it back to me.”). Facebook’s “Give it back!” is the closest.
I was hoping you would chime in. どうもありがとう (thank you very much)!