ROUGE Score
A recall-oriented family of metrics (Recall-Oriented Understudy for Gisting Evaluation) that measures lexical overlap between generated text and one or more reference summaries.
ROUGE-N measures n-gram recall (ROUGE-1 and ROUGE-2 are the most commonly reported), ROUGE-L scores the longest common subsequence, and ROUGE-S counts skip-bigrams (word pairs in order, with gaps allowed).
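To make the variants concrete, here is a minimal sketch of ROUGE-N and ROUGE-L recall. It assumes lowercased whitespace tokenization and a single reference. The function names are illustrative and do not come from any particular library. Published implementations typically add stemming and multi-reference handling, and they report precision and F-measure as well.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(cand, ref) / len(ref)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"

print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 reference bigrams matched
print(rouge_l_recall(candidate, reference))       # LCS of 5 tokens over 6 reference tokens
```

A single substituted word costs one unigram but two bigrams, which is why ROUGE-2 is usually the stricter of the two.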
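It is the default metric for summarization evaluation but, like BLEU, it rewards lexical overlap rather than meaning, so a faithful paraphrase can score below a summary that reuses the reference's words but gets the facts wrong.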
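The overlap-versus-meaning limitation can be illustrated with a small, self-contained sketch. The sentences are made-up examples, and `unigram_recall` is a hypothetical helper implementing a simplified ROUGE-1 recall with whitespace tokenization and no stemming.

```python
from collections import Counter


def unigram_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap over reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


reference = "the drug reduced symptoms in most patients"
# Same meaning, different wording.
paraphrase = "symptoms improved for the majority of people treated"
# Opposite meaning, nearly identical wording.
contradiction = "the drug did not reduce symptoms in most patients"

paraphrase_score = unigram_recall(paraphrase, reference)
contradiction_score = unigram_recall(contradiction, reference)

print(f"paraphrase:    {paraphrase_score:.2f}")
print(f"contradiction: {contradiction_score:.2f}")
```

In this toy case the contradictory sentence outscores the accurate paraphrase, because it shares six of the seven reference tokens while the paraphrase shares only two.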
Use ROUGE for regression tracking; complement it with faithfulness and factuality checks.