METEOR (Metric for Evaluation of Translation with Explicit Ordering) evaluates the quality of generated text by assessing how well it aligns with a reference text. It is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations.
It computes a weighted harmonic mean of unigram precision and recall, with recall weighted substantially higher than precision (9:1 in the original formulation), and then applies a fragmentation penalty that lowers the score when the matched unigrams are scattered across the sentence rather than contiguous, which is how the metric rewards correct word order.
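The scoring can be made concrete with a short sketch. The following is a minimal, from-scratch illustration that assumes exact surface-form unigram matching only (the full metric also matches stems and WordNet synonyms, and searches for the alignment with the fewest chunks rather than aligning greedily); the parameter defaults (`alpha=0.9`, `beta=3`, `gamma=0.5`) follow the commonly used METEOR formulation:

```python
def meteor_sketch(hypothesis, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """METEOR-style score using exact unigram matches only.

    alpha weights recall over precision in the harmonic mean; beta and
    gamma shape the fragmentation penalty, as in the original metric.
    """
    hyp = hypothesis.split()
    ref = reference.split()

    # Greedy one-to-one alignment: each hypothesis token grabs the first
    # unused reference token with the same surface form. (Real METEOR
    # instead picks the alignment that minimizes the number of chunks.)
    used = set()
    alignment = []  # (hyp_index, ref_index) pairs, in hypothesis order
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if j not in used and tok == rtok:
                used.add(j)
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Weighted harmonic mean; alpha=0.9 weights recall 9x precision,
    # i.e. F = 10PR / (R + 9P) in the original paper's notation.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # Count chunks: maximal runs of matches that are contiguous in both
    # the hypothesis and the reference.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1

    penalty = gamma * (chunks / matches) ** beta
    return f_mean * (1 - penalty)


# 5 of 6 unigrams match in 2 chunks; prints roughly 0.807
print(meteor_sketch("the cat is on the mat", "the cat sat on the mat"))
```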
Unlike the precision-focused BLEU score and the recall-focused ROUGE score, METEOR was designed to address their shortcomings and to correlate better with human judgment at the sentence or segment level, where n-gram overlap metrics such as BLEU are known to be unreliable.
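For real evaluations there is no need to reimplement the metric: NLTK ships an implementation that includes the stemming and WordNet-synonym matching stages. A minimal usage example, assuming NLTK 3.6 or later (which expects pre-tokenized inputs) with the WordNet data downloaded:

```python
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")  # lexical data for the synonym-matching stage

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()

# The first argument is a list of references, since METEOR supports
# scoring against multiple human translations.
score = meteor_score([reference], hypothesis)
print(f"METEOR: {score:.3f}")
```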