Three Quick Tips for Improving MT Quality
What can I do to improve the quality of my KantanMT engine?
Our members at KantanMT.com ask us this question constantly, so we've put together a Quick Guide to what you can do to improve the translation quality of a KantanMT engine and reduce the amount of post-editing required. The good news is that it's easy!
There are a number of ways to improve the performance of your engine:-
More Training Data: As with any statistical machine translation engine, the more training data you use to build your KantanMT engine, the better it can generate translations that mimic your translation style and terminology. We call adding training data in this way retraining your engine.
There are two parameters you can monitor here:-
- Number of Training Words: This is the absolute number of words in your training data. We recommend that your engine should have at least 2 million words if it is to have sufficient domain knowledge to produce quality translations.
- Number of unique words in your training data: The number of unique words in your training data is crucial to building a successful engine. For example, it is estimated that English has approximately 500,000 unique words (estimates for French and German are around 1 million, and other languages vary). A high number of unique words is therefore key to building an 'intelligent' engine, that is, one able to cover the full scope of the language.
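If you want a rough idea of these two figures before uploading anything, they are easy to estimate yourself. The following is only a minimal sketch: `corpus_stats` is a hypothetical helper, the tokenisation is deliberately naive, and KantanMT may well count words differently. The only value taken from this guide is the 2 million word recommendation.

```python
import re

# Recommended minimum from this guide; everything else here is illustrative.
MIN_TRAINING_WORDS = 2_000_000

def corpus_stats(segments):
    """Return (total_words, unique_words) for an iterable of text segments."""
    total = 0
    vocab = set()
    for segment in segments:
        # Naive tokenisation: lowercase, keep runs of letters/digits,
        # allowing apostrophes inside a word (e.g. "don't").
        words = re.findall(r"[^\W_]+(?:'[^\W_]+)*", segment.lower())
        total += len(words)
        vocab.update(words)
    return total, len(vocab)

segments = [
    "Check the brake fluid level.",
    "Replace the brake pads if worn.",
]
total, unique = corpus_stats(segments)
print(f"training words: {total}, unique words: {unique}")
if total < MIN_TRAINING_WORDS:
    print("Below the recommended 2 million training words.")
```

In practice you would feed in the source side of your own training files rather than two sample sentences.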
A word of caution on training data! Your training data should come from the same domain as the content you want to translate. For example, if the domain is automotive, then the training data should be automotive-related. If it is anti-virus software, then the training data should relate specifically to that domain. As with Translation Memory, you will not get good results if you mix TMs from different domains when translating files. The more precise and focused your domain training data, the better the results.
By uploading more training words from a similar domain, you increase the knowledge, understanding and intelligence of your KantanMT engine. In other words, it gets smarter!
So remember, the best results are achieved if you select training data from similar or identical domains. Don't confuse your engine by mixing training data!
Terminology, Terminology, Terminology: Your KantanMT engine just can't get enough of this! To maximise translation quality, upload your client's terminology or glossary files into your Translation Area on KantanMT.com. Your KantanMT engine will then adopt this terminology during the translation process, maximising quality and dramatically reducing post-editing effort. Once again, the more you educate, or train, your engine, the better the end results will be.
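One simple way to see whether the terminology is being adopted is to spot-check the output against the glossary. The sketch below is an assumption-laden illustration rather than a KantanMT feature: `missing_terms` is a hypothetical helper, the glossary is a plain dictionary of made-up source and target terms, and matching is a case-insensitive substring test that ignores inflected forms.

```python
def missing_terms(source, translation, glossary):
    """List glossary entries whose source term occurs in `source` but whose
    approved target term is absent from `translation` (case-insensitive)."""
    src, tgt = source.lower(), translation.lower()
    return [
        (s_term, t_term)
        for s_term, t_term in glossary.items()
        if s_term.lower() in src and t_term.lower() not in tgt
    ]

# Hypothetical English-to-French glossary entries.
glossary = {
    "brake pad": "plaquette de frein",
    "brake fluid": "liquide de frein",
}

issues = missing_terms(
    "Replace the brake pad and check the brake fluid.",
    "Remplacez le patin de frein et vérifiez le liquide de frein.",
    glossary,
)
print(issues)
```

Any pair that comes back marks a segment where the approved term was not used and a post-editor will probably need to step in.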
Automate Post-Editing using PEX: Unless your clients have agreed to a "gist" translation (i.e. a rough translation of the source text that allows the reader to understand the gist of the content), you'll most likely have to apply some level of post-editing to your translation. Don't let this put you off; it is standard procedure with most machine-translated text. You can, however, use a PEX file to automate your post-editing. This can dramatically reduce the amount of manual post-editing required, saving you time and money.
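To give a feel for what automated post-editing means, here is a minimal sketch of rule-based search and replace applied to machine-translated text. It is not the PEX file format, which this guide does not describe: the `RULES` list, the `post_edit` function and the three rules themselves are all hypothetical, and real rules depend on your language pair (the punctuation rule, for example, would be wrong for French).

```python
import re

# Hypothetical post-editing rules: (regex pattern, replacement), applied in order.
RULES = [
    (r"\s+([,.;:!?])", r"\1"),       # remove stray space before punctuation
    (r"\bKantan MT\b", "KantanMT"),  # normalise a product name
    (r" {2,}", " "),                 # collapse repeated spaces
]

def post_edit(text, rules=RULES):
    """Apply each search-and-replace rule to the text in sequence."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

raw = "Open the  Kantan MT dashboard , then click Build ."
print(post_edit(raw))
# prints: Open the KantanMT dashboard, then click Build.
```

The idea is the same whatever the tooling: every correction a rule makes automatically is one a human post-editor no longer has to make by hand.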
That's it! Enjoy!