Human evaluation of machine translation output requires considerable effort and is expensive, often taking days or even weeks to complete. To automate this process, a scoring method was developed that is commonly referred to as the BLEU score.
The BLEU score is quick to use, inexpensive to operate, language independent, and correlates highly with human evaluation. It is the most widely used automated method of determining the quality of machine translation.
The BLEU metric scores a translation on a scale of 0 to 1, though it is frequently displayed as a percentage. The closer the score is to 1, the more closely the translation correlates with a human translation. Put simply, the BLEU metric measures how many words overlap between a given translation and a reference translation, giving higher scores to longer sequences of matching words (n-grams).
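The overlap idea above can be sketched in code. The following is a minimal, simplified sentence-level BLEU: it computes modified n-gram precisions (each candidate n-gram count is clipped by its count in the reference) for n = 1..4, combines them with a geometric mean, and applies a brevity penalty for short candidates. Real BLEU implementations work at corpus level, support multiple references, and apply smoothing; this sketch assumes a single reference and a non-empty, whitespace-tokenised candidate.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Assumes one reference and whitespace tokenisation."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a reference word cannot inflate the score.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # no smoothing: any empty precision zeroes the score
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, while a candidate sharing no words with the reference scores 0.0, matching the 0-to-1 scale described above.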
Expressed as a percentage, BLEU scores range from 0% to 100%. A score below 15% indicates that your KantanMT engine is not performing optimally, and a high level of post-editing will be required to finalise your translations and reach publishable quality.
A score above 50% is very good, and significantly less post-editing will be required to achieve publishable translation quality.
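The interpretation guidance above can be encoded as a small helper. The 15% and 50% thresholds come from the text; the function name and the wording of the returned labels are illustrative assumptions, not part of KantanMT's API.

```python
def post_editing_estimate(bleu_pct):
    """Rough post-editing expectation for a BLEU score given as a
    percentage, using the thresholds described above (15% and 50%).
    Labels are illustrative, not an official KantanMT scale."""
    if bleu_pct < 15:
        return "heavy post-editing required"
    if bleu_pct > 50:
        return "light post-editing required"
    return "moderate post-editing required"
```

For example, an engine scoring 12% falls into the heavy post-editing band, while one scoring 55% falls into the light band.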
There is a high correlation between the number of words used to train a KantanMT engine and its BLEU score. Put simply, the more training data uploaded to KantanMT, the better the BLEU score and, consequently, the generated translations.
To maximise KantanMT's learning and its ability to reproduce your translation style, it is important to upload as much training data as possible.