ROUGE score

#evaluation #summarization #translation

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a recall-focused score primarily used for evaluating automatic summarization and, sometimes, machine translation.

The key feature of ROUGE is its focus on recall: it measures what fraction of the reference n-grams also appear in the system-generated summary. This makes it especially useful for tasks where coverage of key points is important.
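
To make the recall idea concrete, here is a minimal sketch of n-gram recall. It assumes lowercased whitespace tokenization and a single reference; the helper names `ngrams` and `rouge_n_recall` are illustrative and do not come from any particular library, and real implementations typically add stemming and multi-reference handling.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(reference, candidate, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Counts are clipped, so a repeated n-gram in the reference is only
    matched as many times as it occurs in the candidate.
    Assumption: simple lowercase + whitespace tokenization.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_recall(reference, candidate, n=1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(reference, candidate, n=2))  # 3 of 5 reference bigrams matched
```

Dividing by the number of reference n-grams (rather than candidate n-grams) is what makes the score recall-oriented: a summary is penalized for missing reference content, not for adding extra words.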

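Its main variants:

ROUGE-N computes the overlap of n-grams between the candidate and the reference.
ROUGE-L uses the longest common subsequence (LCS), which rewards matching word order at the sentence level without requiring the matches to be consecutive.
ROUGE-S counts skip-bigrams: pairs of words in sentence order, with gaps allowed between them.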
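
The LCS-based and skip-bigram variants can be sketched in the same spirit. This is a simplified, recall-only illustration under stated assumptions: whitespace tokenization, a single reference sentence, and no limit on the skip distance. The function names are hypothetical, and the original paper additionally defines precision and F-measure forms that are omitted here.

```python
from collections import Counter
from itertools import combinations

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_recall(reference, candidate):
    """LCS length divided by the reference length."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    return lcs_length(ref, cand) / len(ref) if ref else 0.0

def skip_bigrams(tokens):
    """Multiset of all in-order token pairs, with any gap allowed."""
    return Counter(combinations(tokens, 2))

def rouge_s_recall(reference, candidate):
    """Matched skip-bigrams divided by the number of reference skip-bigrams."""
    ref = skip_bigrams(reference.lower().split())
    cand = skip_bigrams(candidate.lower().split())
    total = sum(ref.values())
    return sum((ref & cand).values()) / total if total else 0.0

reference = "police killed the gunman"
candidate = "police kill the gunman"
print(rouge_l_recall(reference, candidate))  # LCS is "police the gunman": 3 / 4
print(rouge_s_recall(reference, candidate))  # 3 of 6 reference skip-bigrams matched
```

Both scores tolerate the changed word in the middle of the sentence, whereas a bigram-based ROUGE-N would lose every bigram that touches it.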

More details here:

https://aman.ai/primers/ai/evaluation-metrics/#rouge
https://aclanthology.org/W04-1013.pdf