The AI Reliability Blog

Frameworks, anti-patterns, and field notes from validating AI in production.

Can We Trust AI in Production? A Framework

Four signals every team should measure before declaring an AI feature production-ready.

LLM-as-a-Judge Without the Pitfalls

Position bias, verbosity bias, and self-preference — how to calibrate judges you can actually trust.

RAG Hallucinations: Five Root Causes (and Fixes)

Most hallucinations aren't generation problems — they're retrieval problems wearing a generation costume.

Your Agents Need SLOs, Not Just Metrics

Quality scores tell you what happened. SLOs tell you what's allowed to happen.

Oracle AI Testing

Oracle Fusion AI Agents: A QA Readiness Checklist

Ten things to validate before letting Oracle AI Agents touch your ERP.