
Cutting RAG hallucinations 87% before launch

A regulated FinTech reduced its hallucination rate from 23% to 3%, an 87% relative reduction, with a layered RAG evaluation and reranking strategy.

The problem

A customer-facing financial Q&A bot generated confidently wrong citations during pre-launch user acceptance testing (UAT).

Root cause

Single-pass dense retrieval with weak chunking returned passages that were topically similar but legally incorrect.

Approach

Instrumented retrieval, reranking, and generation independently. Built a 400-question golden set with retrieval ground truth.
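
To illustrate what scoring stages independently against a golden set can look like, here is a minimal sketch that computes recall@k for the raw retriever output and for the reranked output separately. The `GoldenItem` structure, the passage IDs, and the sample questions are hypothetical and chosen for illustration; the case study does not describe the team's actual data model or metrics.

```python
from dataclasses import dataclass


@dataclass
class GoldenItem:
    """One golden-set entry (hypothetical shape)."""
    question: str
    relevant_ids: set[str]  # ground-truth passage IDs for this question


def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of ground-truth passages that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(golden: list[GoldenItem],
                     results: dict[str, list[str]],
                     k: int) -> float:
    """Average recall@k over the golden set for one pipeline stage."""
    if not golden:
        return 0.0
    scores = [
        recall_at_k(results.get(item.question, []), item.relevant_ids, k)
        for item in golden
    ]
    return sum(scores) / len(scores)


# Illustrative data only: two questions, ranked passage IDs per stage.
golden = [
    GoldenItem("What is the wire transfer limit?", {"doc-12"}),
    GoldenItem("When are overdraft fees charged?", {"doc-7", "doc-9"}),
]
retriever_output = {
    "What is the wire transfer limit?": ["doc-3", "doc-12", "doc-5"],
    "When are overdraft fees charged?": ["doc-7", "doc-1", "doc-9"],
}
reranker_output = {
    "What is the wire transfer limit?": ["doc-12", "doc-3", "doc-5"],
    "When are overdraft fees charged?": ["doc-7", "doc-9", "doc-1"],
}

for stage, output in [("retriever", retriever_output), ("reranker", reranker_output)]:
    print(stage, mean_recall_at_k(golden, output, k=1))
```

Scoring each stage on the same golden set makes it possible to tell whether a wrong answer comes from a passage that was never retrieved, one that was retrieved but ranked too low, or one that was supplied to the generator and ignored.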

Framework used

RAGAS + custom faithfulness judge + Promptfoo CI gating
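
As a rough illustration of CI gating, the sketch below shows only the pass/fail logic in plain Python. It does not use the RAGAS or Promptfoo APIs, and the metric names and threshold values are hypothetical; the case study does not state the gate values the team used.

```python
# Hypothetical thresholds for illustration only.
# "max" means the metric must not exceed the limit; "min" means it must not fall below it.
THRESHOLDS = {
    "hallucination_rate": ("max", 0.05),
    "citation_accuracy": ("min", 0.95),
}


def check_gate(metrics: dict[str, float],
               thresholds: dict[str, tuple[str, float]] = THRESHOLDS) -> list[str]:
    """Return a list of human-readable gate failures (empty if the gate passes)."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing from eval output")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value:.3f} exceeds maximum {limit:.3f}")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value:.3f} is below minimum {limit:.3f}")
    return failures


def gate_exit_code(metrics: dict[str, float]) -> int:
    """Print any failures and return a process exit code (0 = pass, 1 = fail)."""
    failures = check_gate(metrics)
    for failure in failures:
        print("GATE FAILED:", failure)
    return 1 if failures else 0


if __name__ == "__main__":
    # Illustrative metrics; in CI these would come from the eval run's output.
    raise SystemExit(gate_exit_code({"hallucination_rate": 0.03, "citation_accuracy": 0.96}))
```

A non-zero exit code is what lets a CI system block a merge or deployment when an eval metric regresses.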

Results

  • Hallucination rate 23% → 3%
  • Citation accuracy 71% → 96%
  • Launch unblocked in 6 weeks
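
The 87% in the headline is a relative reduction, not a percentage-point change. A short check of the arithmetic, using only the figures reported above (the function names are illustrative):

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


def point_change(before: float, after: float) -> float:
    """Absolute change in percentage points."""
    return (after - before) * 100


# Hallucination rate: 23% -> 3%
print(f"{relative_reduction(0.23, 0.03):.0%}")  # prints 87%

# Citation accuracy: 71% -> 96%
print(f"{point_change(0.71, 0.96):+.0f} points")  # prints +25 points
```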

Lessons learned

  • Layer the eval before tuning the model
  • Reranking pays for itself
  • Faithfulness needs a domain-tuned judge

Facing a similar problem?

Book an assessment with our team.
