Prompt Injection
An attack where untrusted input overrides or manipulates the model's original instructions.
Direct injection: a user types 'Ignore previous instructions...'. Indirect injection: hostile content lives in a fetched webpage, PDF, or email the model reads.
Defenses: input/output guardrails, instruction hierarchies, tool-call allowlists, content provenance tagging, and adversarial evals in CI.
Go deeper
Read the full pillar guide on LLM Evaluation or compare evaluation tools in the Tool Comparison Hub.