METEOR (Metric for Evaluation of Translation with Explicit Ordering) evaluates the quality of generated text by assessing how well it aligns with a reference text. It is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations.
It computes a weighted harmonic mean of unigram precision and recall, with recall weighted substantially higher than precision (9:1 in the original formulation), and then applies a fragmentation penalty that lowers the score when the matched unigrams are scattered across the sentence rather than contiguous, which is how the metric rewards correct word order.
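The scoring can be made concrete with a short sketch. The following is a minimal, from-scratch illustration that assumes exact surface-form unigram matching only (the full metric also matches stems and WordNet synonyms, and searches for the alignment with the fewest chunks rather than aligning greedily); the parameter defaults (`alpha=0.9`, `beta=3`, `gamma=0.5`) follow the commonly used METEOR formulation:

```python
def meteor_sketch(hypothesis, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """METEOR-style score using exact unigram matches only.

    alpha weights recall over precision in the harmonic mean; beta and
    gamma shape the fragmentation penalty, as in the original metric.
    """
    hyp = hypothesis.split()
    ref = reference.split()

    # Greedy one-to-one alignment: each hypothesis token grabs the first
    # unused reference token with the same surface form. (Real METEOR
    # instead picks the alignment that minimizes the number of chunks.)
    used = set()
    alignment = []  # (hyp_index, ref_index) pairs, in hypothesis order
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if j not in used and tok == rtok:
                used.add(j)
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Weighted harmonic mean; alpha=0.9 weights recall 9x precision,
    # i.e. F = 10PR / (R + 9P) in the original paper's notation.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # Count chunks: maximal runs of matches that are contiguous in both
    # the hypothesis and the reference.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1

    penalty = gamma * (chunks / matches) ** beta
    return f_mean * (1 - penalty)


# 5 of 6 unigrams match in 2 chunks; prints roughly 0.807
print(meteor_sketch("the cat is on the mat", "the cat sat on the mat"))
```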
Unlike the precision-focused BLEU score and the recall-focused ROUGE score, METEOR was designed to address their shortcomings and to correlate better with human judgment at the sentence or segment level, where n-gram overlap metrics such as BLEU are known to be unreliable.
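For real evaluations there is no need to reimplement the metric: NLTK ships an implementation that includes the stemming and WordNet-synonym matching stages. A minimal usage example, assuming NLTK 3.6 or later (which expects pre-tokenized inputs) with the WordNet data downloaded:

```python
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")  # lexical data for the synonym-matching stage

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()

# The first argument is a list of references, since METEOR supports
# scoring against multiple human translations.
score = meteor_score([reference], hypothesis)
print(f"METEOR: {score:.3f}")
```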