There’s a lot of talk about Neural Machine Translation (NMT) these days, but few people actually understand how the technology works. No wonder: there is a lot of mystery around NMT, and explaining it requires some abstract thinking.

In our next three blog posts, we’ll have a closer look under the hood of NMT.

Reports on NMT range from doomsday messaging about the end of the translation industry to sheer euphoria about the quality of its translations. We like to be a bit more realistic about it, but when Google states that it has massively improved its service thanks to NMT, there has to be some reason for excitement. NMT is indeed a huge improvement over statistical machine translation, because it produces translations that sound remarkably natural and human-like.

Check our previous blog post to find out whether neural machine translation is always the best option.

NMT: the nuts and bolts

But first things first. Let’s start with a short definition of NMT.

Neural machine translation (NMT) is an approach to machine translation that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. Deep neural machine translation is an extension of neural machine translation.

(Source: Wikipedia.)

Let’s dive into some of the terms used in that definition.

  • A neural network is a computational system, loosely modeled on the billions of neurons in the human brain, that uses observational data (examples) to learn and make decisions. (See the code sketch after this list for the idea in practice.)

  • Neural networks are a type of machine learning, which is a form of artificial intelligence whereby statistical techniques are used to give computers the ability to learn.

  • Deep neural machine translation takes its name from deep learning, the use of neural networks with many layers. Deep learning makes it possible to recognize patterns in digital representations of sounds, images, and other data.
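To make “learning from examples” concrete, here is a minimal sketch of a tiny neural network in Python with NumPy. This is our own illustration, not code from any actual NMT system: it learns the classic XOR function from four examples by repeatedly adjusting its weights to reduce its prediction error, and every size and number in it is invented.

```python
# A toy neural network that learns XOR from four examples.
# Purely illustrative: real NMT networks are vastly larger.
import numpy as np

rng = np.random.default_rng(0)

# Four training examples (inputs) and their target outputs (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases for a tiny 2 -> 8 -> 1 network.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass: compute the network's current predictions.
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # Backward pass: nudge weights and biases to reduce the error.
    d_out = (pred - y) * pred * (1 - pred)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_hid
    b1 -= 0.5 * d_hid.sum(axis=0)

print(pred.round(2))  # approaches [[0], [1], [1], [0]] as training proceeds
```

The loop is the whole story of learning: predict, measure the error, adjust the weights, repeat. Deep learning stacks many more layers of exactly this kind.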

The architecture of NMT

The beauty of neural machine translation is that it translates entire sentences at a time, rather than just piece by piece, as traditional rule-based and phrase-based statistical systems do. NMT uses the context of the whole sentence to figure out the most relevant translation, which it then rearranges and adjusts until it reads like proper, natural language. This is more or less how humans translate as well. When we want to translate ‘The boy rides the bike.’, we first make a mental representation of what that looks like. Once we have that, we can translate.

NMT uses an encoder-decoder architecture. An encoder neural network reads and encodes a source sentence into a ‘thought vector’ or ‘meaning vector’, which is a sequence of numbers that represents the sentence meaning. A decoder then outputs a translation from the encoded vector.

The whole encoder-decoder system is trained on large amounts of data to maximize the probability of a correct translation for a given source sentence. This approach makes it possible to capture long-range dependencies in languages, such as gender agreement and syntactic structure, which results in much more fluent translations.
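As a rough illustration of that architecture, here is a simplified encoder-decoder sketch in PyTorch. It is our own toy construction, not Google’s actual system: the encoder compresses a source sentence into a thought vector, the decoder generates target words from it, and one training step minimizes cross-entropy, the standard way of “maximizing the probability” of the correct words. The vocabulary sizes and word indices below are made up.

```python
# A toy encoder-decoder for translation, for illustration only.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, HIDDEN = 1000, 1000, 256  # invented sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, src):
        # Read the whole source sentence and compress it into one
        # "thought vector": the final hidden state of the GRU.
        _, thought = self.rnn(self.embed(src))
        return thought

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, tgt, thought):
        # Generate target words starting from the thought vector;
        # returns one score per vocabulary word at each position.
        states, _ = self.rnn(self.embed(tgt), thought)
        return self.out(states)

encoder, decoder = Encoder(), Decoder()

# One toy sentence pair as word indices (the first target index
# plays the role of a start-of-sentence marker).
src = torch.tensor([[4, 17, 89, 2]])
tgt = torch.tensor([[1, 23, 77, 3]])

# Training step: maximize the probability of the correct target words,
# i.e. minimize the cross-entropy between predictions and the truth.
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params)
loss_fn = nn.CrossEntropyLoss()

optimizer.zero_grad()
thought = encoder(src)
logits = decoder(tgt[:, :-1], thought)  # predict each next word
loss = loss_fn(logits.reshape(-1, TGT_VOCAB), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

Real systems add attention mechanisms, more layers, and millions of training sentences on top of this skeleton, but the encode-then-decode shape is the same.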

Neural Machine Translation encoding

The idea of converting sentences or thoughts into numbers can seem a little strange. We’ll dig deeper into that topic in our next blog post.

Stay tuned!