Machine translation checking
Machine translation can now produce texts of acceptable, and sometimes even professional, quality, depending on the language pair, the subject field and the file format. For example, an automatic translation from English to Spanish by DeepL Pro of a description of tourist accommodation will immediately give a fully satisfactory result of professional quality that does not even need to be reread. For some rarer language pairs, machine translation should clearly be avoided. And for a large number of language pairs, on moderately complex subjects, the result will be understandable but unusable in a professional context. For all these projects that deserve a human eye, we offer machine translation checking and correction.
What is neural machine translation?
Machine translation (MT) is the automated process of translating text or speech from one language to another using algorithms and software. Machine translation began with rule-based systems that used grammar rules and dictionaries to translate text. It then progressed to statistical machine translation, which relies on statistical models and large bilingual corpora to predict translations. The latest evolution is neural machine translation (NMT), which uses artificial neural networks to understand and generate translations, producing more fluent and contextually appropriate results.
The typical architecture used for NMT is the encoder-decoder model, combined with an attention mechanism:
- Encoder: A recurrent neural network (RNN), a convolutional neural network (CNN) or a Transformer encodes the source text into a vector representation (a set of vectors) that captures the meaning of the text.
- Decoder: Another neural network uses this vector representation to generate text in the target language. The decoder predicts the next word in the translated sequence based on the words already translated and the contextual representation provided by the encoder.
- Attention mechanism: Allows the model to focus on different parts of the source sentence when generating each word of the target sentence. This improves translation quality, especially for long and complex sentences.
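The attention step above can be sketched in a few lines. This is a toy dot-product attention, not a real NMT component: the encoder vectors, their dimension and the query are invented for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax: turns raw scores into weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, encoder_states):
    # Dot-product attention: score each encoder state against the decoder's
    # query vector, normalise the scores, and return the weighted sum of the
    # encoder states (the "context vector") together with the weights.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# Toy encoder output: one 2-dimensional vector per source token
# (vectors and dimension are invented for illustration).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]  # hypothetical decoder state while generating one target word
context, weights = attention(query, encoder_states)
# the weights are highest for the encoder states most similar to the query
```

The weights show which source tokens the decoder "looks at" for the current target word; in a Transformer the same idea is applied many times in parallel (multi-head attention).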
The NMT model is trained on large parallel texts (texts aligned in two languages). The training process includes the following steps:
- Data Preprocessing: Texts are cleaned, tokenised (cut into words or sub-words), and sometimes enriched with linguistic metadata.
- Supervised Learning: The model learns to translate by adjusting its parameters to minimise the prediction error between its translations and human reference translations. Backpropagation and gradient descent algorithms are used to optimise the weights of the neural network.
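The supervised learning step can be illustrated with a deliberately tiny analogy: fit a single weight so that the prediction matches the reference, by gradient descent on the squared error. The numeric "corpus" is invented; a real NMT model adjusts millions of weights in the same way, with backpropagation computing the gradients.

```python
def train(pairs, lr=0.1, epochs=100):
    # Fit one weight w so that prediction w * x matches reference y.
    w = 0.0
    for _ in range(epochs):
        for x, y in pairs:
            pred = w * x
            grad = 2 * (pred - y) * x  # gradient of (pred - y)**2 w.r.t. w
            w -= lr * grad             # gradient descent update
    return w

# A "parallel corpus" reduced to numeric (input, reference) pairs for the toy.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(pairs)
# w converges towards 2.0, the mapping that minimises the prediction error
```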
At inference time (when translating new texts), the model uses the learned parameters to generate translations. The process can be summarised as follows:
- Encoding: The source text is encoded as a vector representation.
- Decoding: The model generates target text word by word, using attention vectors to focus on the relevant parts of the source text at each stage of generation.
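The word-by-word decoding loop can be sketched as a toy greedy decoder. The probability table below is entirely invented; a real NMT decoder conditions on the encoder's representation and all previously generated words, and usually uses beam search rather than this greedy choice.

```python
# Invented next-word probabilities, keyed by the previous word only.
NEXT_WORD_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "cat": {"</s>": 0.9, "sat": 0.1},
    "dog": {"</s>": 0.8, "ran": 0.2},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def greedy_decode(max_len=10):
    # At each step, pick the most probable next word given the previous one,
    # stopping at the end-of-sentence marker </s>.
    word = "<s>"
    output = []
    for _ in range(max_len):
        word = max(NEXT_WORD_PROBS[word], key=NEXT_WORD_PROBS[word].get)
        if word == "</s>":
            break
        output.append(word)
    return output

print(greedy_decode())  # ['the', 'cat']
```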
Various techniques are used to improve the performance and efficiency of NMT models:
- Regularisation: Techniques such as dropout are used to prevent overfitting.
- Model Ensembles: Combining multiple models to improve robustness and accuracy.
- Pre-training and Fine-tuning: Using models pre-trained on large amounts of unlabelled data, then fine-tuning them on task-specific data.
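As one concrete example of the regularisation technique mentioned above, here is a minimal sketch of "inverted" dropout on a list of activations. The values and the keep/drop pattern are illustrative; real frameworks apply this per layer over large tensors.

```python
import random

def dropout(values, p=0.5, training=True, seed=None):
    # During training, zero each activation with probability p and scale the
    # survivors by 1 / (1 - p), so the expected value is unchanged.
    # At inference time, dropout is disabled and values pass through as-is.
    if not training:
        return list(values)
    rng = random.Random(seed)  # seeded here only to keep the toy reproducible
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]

activations = [0.5, 1.0, 1.5, 2.0]
print(dropout(activations, training=False))  # inference: [0.5, 1.0, 1.5, 2.0]
```

Because some activations are randomly silenced at every training step, the network cannot rely on any single unit, which is what reduces overfitting.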
The limitations of neural machine translation
NMT offers better-quality translations than previous methods, especially for data-rich languages. It can capture contextual nuances and complex syntactic structures, but it struggles with low-resource languages, very long or very complex sentences, and specialised texts requiring specific terminology. It is on these types of projects that we intervene, correcting the errors of neural machine translation.
The importance of human correction
Human revision of a neural machine translation involves linguists or native speakers evaluating the automatically translated text. This method draws on human expertise to capture the nuances and subtleties that machines may miss. Human reviewers can provide valuable feedback on the fluency, accuracy and cultural appropriateness of the translation. This is a substantial error-correction job that makes the text fully understandable. The work is similar to light post-editing and focuses on:
- Spelling, grammatical, syntactic and typographical errors: Machine translation systems rarely make these types of errors, but their syntax is not always optimal.
- Contextual understanding: Machine translation systems may lack contextual understanding, resulting in translation errors.
- Cultural nuances: Unlike a human speaker, a machine translation system struggles to capture idiomatic expressions and cultural references.
- Specialised terminology and jargon: Translation systems may struggle with technical and specialised vocabulary; the human reviewer detects errors and approximations and refers to the source text to correct the translation.
Main advantages of machine translation correction
- Price: At about 4 euro cents per proofread word, machine translation correction is clearly within reach of all budgets.
- Turnaround: On average, a reviewer can correct 7,000 to 9,000 machine-translated words per day, 4 to 6 times more than with 100% human translation.
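Using the indicative figures above (about 4 euro cents per word, and 7,000 to 9,000 corrected words per day), a quick back-of-the-envelope estimate for a project might look like this; the function name and the 8,000-words/day midpoint are illustrative, not a quotation.

```python
def estimate(word_count, rate_eur=0.04, words_per_day=8000):
    # Rough cost and turnaround for machine translation correction.
    cost = word_count * rate_eur
    days = word_count / words_per_day
    return cost, days

cost, days = estimate(20000)
print(f"{cost:.2f} EUR, about {days:.1f} working days")  # 800.00 EUR, about 2.5 working days
```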