Glossary
The AI & LLM Glossary
Practitioner-grade definitions for AI evaluation, RAG, safety, and monitoring — written for engineers shipping real systems.
Evaluation
Safety
Hallucination
An LLM output that is fluent and confident but factually wrong or unsupported by the provided context.
AI Red Teaming
Adversarial testing of AI systems to surface safety, security, and reliability failures before users do.
Prompt Injection
An attack where untrusted input overrides or manipulates the model's original instructions.
Metrics
BLEU Score
An n-gram precision metric originally built for machine translation, comparing model output to one or more reference texts.
ROUGE Score
A recall-oriented family of metrics that measures overlap between generated text and reference summaries.
Perplexity
A measure of how 'surprised' a language model is by a sequence — lower is better.