Perplexity
A measure of how 'surprised' a language model is by a sequence of tokens; lower is better.
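Perplexity is the exponential of the average per-token cross-entropy loss on a held-out corpus. It is useful for comparing base models on the same dataset, provided they share a tokenizer, because the value is normalized per token.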
It does not measure task quality. A model can have low perplexity and still hallucinate or fail instructions.
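As a minimal sketch of the definition above, the snippet below computes perplexity from a list of per-token log-probabilities. The function name `perplexity` and the example probabilities are hypothetical, chosen for illustration; in practice the log-probabilities would come from a model scored on held-out text.

```python
import math


def perplexity(token_log_probs):
    """Return exp(average negative log-probability per token).

    token_log_probs: natural-log probabilities the model assigned to each
    token that actually occurred in the held-out text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Cross-entropy is the mean negative log-likelihood per token.
    cross_entropy = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of that cross-entropy.
    return math.exp(cross_entropy)


# Hypothetical values: probability assigned to each observed token.
confident = [math.log(p) for p in (0.5, 0.5, 0.5, 0.5)]
unsure = [math.log(p) for p in (0.1, 0.1, 0.1, 0.1)]

print(perplexity(confident))  # approximately 2
print(perplexity(unsure))     # approximately 10
```

A model that assigns probability 0.5 to every observed token scores a perplexity of about 2, the same as guessing uniformly between two options. Lower probabilities on the observed tokens push the value up.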
Go deeper
Read the full pillar guide on LLM Evaluation or compare evaluation tools in the Tool Comparison Hub.