Monitoring

Model Drift

Degradation in model performance over time as production data diverges from training or evaluation data.

Two main types: data drift (the input distribution shifts) and concept drift (the relationship between inputs and labels shifts, so the same input now calls for a different output).

Detection uses statistical measures such as the Population Stability Index (PSI) and the Kolmogorov-Smirnov (KS) test, embedding-distance monitoring, and rolling-window performance metrics on labeled samples.
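
As a rough illustration of the statistical checks named above, the sketch below computes PSI and the two-sample KS statistic for a single numeric feature using only the Python standard library. The function names (psi, ks_statistic), the equal-width binning over the reference range, the bin count, and the epsilon floor are all assumptions made for this example, not a prescribed method; production setups often use quantile bins and a library implementation instead.

```python
import math
import random


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a production sample.

    Assumption: equal-width bins spanning the reference sample's range.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = int((v - lo) / width)
            counts[max(0, min(idx, bins - 1))] += 1
        # The eps floor keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def ks_statistic(x, y):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past v so ties are counted on both sides.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


# Synthetic data for illustration only: one sample drawn from the same
# distribution as the reference, and one with a shifted mean.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same, psi_shifted = psi(reference, same), psi(reference, shifted)
ks_same, ks_shifted = ks_statistic(reference, same), ks_statistic(reference, shifted)
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but these cutoffs are conventions rather than guarantees and are worth calibrating per feature.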

Mitigation: scheduled retraining, prompt refresh for LLM apps, and alerting on eval score regressions.
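
Alerting on eval score regressions can be sketched as a rolling-window check against a fixed baseline. Everything here is hypothetical and chosen for illustration: the class name RollingScoreMonitor, the baseline value, the window size, and the max_drop threshold. A real deployment would route the alert to its own paging or retraining workflow.

```python
from collections import deque


class RollingScoreMonitor:
    """Flags a regression when the rolling mean falls too far below a baseline."""

    def __init__(self, baseline, window=50, max_drop=0.05):
        self.baseline = baseline      # eval score measured at release time (assumed)
        self.max_drop = max_drop      # tolerated absolute drop before alerting
        self.scores = deque(maxlen=window)

    def add(self, score):
        """Record one eval score; return True if the window mean has regressed."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # wait until the window is full to avoid noisy alerts
        mean = sum(self.scores) / len(self.scores)
        return self.baseline - mean > self.max_drop


monitor = RollingScoreMonitor(baseline=0.9, window=5, max_drop=0.05)
warmup = [monitor.add(0.9) for _ in range(5)]  # healthy scores fill the window
alert = monitor.add(0.6)                       # one bad score drags the mean down
```

Waiting for a full window trades detection latency for fewer false alarms; a shorter window reacts faster but is noisier.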
