RAG Failures

Retrieval and Context Issues

When AI systems retrieve the wrong information or miss critical context, leading to dangerous but seemingly correct answers.

The Coffee Machine Reimbursement Trap

When Partial Context Creates Dangerous Answers

RAG Failure · High Risk · 3 days to resolution

The Problem

A user asked, 'Can I expense a new coffee machine for my home office?' The retriever fetched a permissive policy document but completely missed the exclusions list that explicitly forbids kitchen appliances. The LLM answered 'Yes,' faithfully reflecting the incomplete context it received.

Impact:

Partially correct but subtly wrong answers are more dangerous than obviously wrong ones. This could have led to policy violations and financial disputes.

Diagnosis

Langfuse Trace

Input: Can I get reimbursed for a coffee machine for my home office?

Retrieved: General-Reimbursement-Policy-v4.pdf

Missing: Reimbursement-Exclusion-List-v2.pdf

Key Metrics

faithfulness: 95%
contextualRecall: 45%
contextualPrecision: 60%
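Faithfulness can sit at 95% while recall is 45% because the two metrics measure different things. The sketch below is a simplified, ID-based proxy for the two retrieval metrics. It assumes each test case lists the source documents a correct answer needs, and it counts a retrieved chunk as relevant if it comes from one of those documents. Many evaluation libraries score these metrics with an LLM judge over individual statements instead, so their numbers will not match this proxy exactly. `Travel-Policy-v1.pdf` is an invented filler document.

```python
from typing import Sequence


def contextual_recall(retrieved: Sequence[str], required: Sequence[str]) -> float:
    """Fraction of the required source documents that appear in the retrieved context."""
    required_set = set(required)
    if not required_set:
        return 1.0
    return len(required_set & set(retrieved)) / len(required_set)


def contextual_precision(retrieved: Sequence[str], required: Sequence[str]) -> float:
    """Fraction of retrieved chunks that come from a required source document."""
    if not retrieved:
        return 0.0
    required_set = set(required)
    return sum(doc in required_set for doc in retrieved) / len(retrieved)


# Hypothetical replay of the coffee-machine query: two chunks from the general policy,
# one irrelevant chunk, and nothing from the exclusion list.
retrieved = [
    "General-Reimbursement-Policy-v4.pdf",
    "General-Reimbursement-Policy-v4.pdf",
    "Travel-Policy-v1.pdf",
]
required = ["General-Reimbursement-Policy-v4.pdf", "Reimbursement-Exclusion-List-v2.pdf"]

print(contextual_recall(retrieved, required))     # 0.5
print(contextual_precision(retrieved, required))  # 0.6666666666666666
```

Neither function looks at the generated answer. Faithfulness compares the answer only against whatever context was retrieved, so it stays high even when the context is missing the document that matters.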

Solution: Three-layered defense using LlamaIndex + Langfuse

Metadata Enrichment

Enhanced documents with policy type, section, and relationship metadata

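# LlamaIndex parsing with enhanced metadata
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser

docs = SimpleDirectoryReader("./policies").load_data()  # illustrative path to the policy PDFs

# Splits on embedding-similarity breakpoints; uses Settings.embed_model unless embed_model= is passed
parser = SemanticSplitterNodeParser.from_defaults(buffer_size=3)
nodes = parser.get_nodes_from_documents(docs)

for node in nodes:
    node.metadata.update({
        "doc_id": node.metadata.get("file_name"),  # stable source ID used by the evaluation gate
        "policy_type": "reimbursement",  # applies to the whole corpus, exclusion list included
        "section": extract_section(node.text),  # hypothetical helper
        "related_docs": find_related_policies(node.text)  # hypothetical helper
    })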
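The enrichment loop relies on two helpers, `extract_section` and `find_related_policies`, whose implementations are not shown. One minimal, rule-based way to write them is sketched below. The numbered-heading regex, the `RELATED_POLICY_KEYWORDS` map, and its trigger words are all assumptions for illustration, not the original implementation.

```python
import re

# Hypothetical map from a linked document to the keywords that should pull it in.
# In practice this might be maintained by the policy owners.
RELATED_POLICY_KEYWORDS = {
    "Reimbursement-Exclusion-List-v2.pdf": ["equipment", "appliance", "home office", "furniture"],
}

# Assumes sections begin with a numbered heading such as "3.2 Home Office Equipment"
SECTION_HEADING = re.compile(r"^\s*(\d+(?:\.\d+)*)\s+([A-Z][^\n]*)$", re.MULTILINE)


def extract_section(text: str) -> str:
    """Return the first numbered heading in the chunk, or 'unknown' if there is none."""
    match = SECTION_HEADING.search(text)
    return f"{match.group(1)} {match.group(2).strip()}" if match else "unknown"


def find_related_policies(text: str) -> list[str]:
    """List documents whose trigger keywords appear in the chunk (case-insensitive)."""
    lowered = text.lower()
    return [
        doc_id
        for doc_id, keywords in RELATED_POLICY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


chunk = "3.2 Home Office Equipment\nEmployees may expense equipment needed for remote work."
print(extract_section(chunk))        # 3.2 Home Office Equipment
print(find_related_policies(chunk))  # ['Reimbursement-Exclusion-List-v2.pdf']
```

Some vector stores accept only scalar metadata values (strings, numbers, booleans). With those stores, the `related_docs` list may need to be joined into a single string before indexing.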

Improved Retrieval

Implemented hybrid retrieval with reranking to surface related exclusion documents

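# Metadata-filtered vector retrieval followed by cross-encoder reranking
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

query = "Can I get reimbursed for a coffee machine for my home office?"
index = VectorStoreIndex(nodes)  # nodes from the enrichment step
retriever = index.as_retriever(
    similarity_top_k=10,  # over-fetch so the reranker has candidates to reorder
    filters=MetadataFilters(filters=[MetadataFilter(key="policy_type", value="reimbursement")]),
)
retrieved_nodes = retriever.retrieve(query)
# Rerank the candidates against the query and keep the five most relevant
reranker = SentenceTransformerRerank(model="cross-encoder/ms-marco-MiniLM-L-6-v2", top_n=5)
reranked_nodes = reranker.postprocess_nodes(retrieved_nodes, query_str=query)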
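A vector retriever on its own ranks documents only by embedding similarity. A hybrid setup also runs a keyword ranker (for example BM25) and merges the two ranked lists. A common merge rule is reciprocal rank fusion (RRF): each list adds 1 / (k + rank) to a document's score.

The plain-Python sketch below fuses two precomputed rankings with RRF. It then appends any document named in a hit's `related_docs` metadata, which is one way to make sure a linked exclusion list reaches the context. The rankings, the invented `Travel-Policy-v1.pdf`, and the expansion step are illustrative assumptions, not the pipeline from this incident.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of doc IDs; a doc at position r in a list earns 1 / (k + r)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=scores.__getitem__, reverse=True)


def expand_with_related(doc_ids: list[str], related: dict[str, list[str]], top_k: int) -> list[str]:
    """Keep the top_k fused hits, then append any linked docs that are not already present."""
    selected = doc_ids[:top_k]
    for doc_id in list(selected):  # iterate over a copy because we append to `selected`
        for linked in related.get(doc_id, []):
            if linked not in selected:
                selected.append(linked)
    return selected


# Hypothetical rankings for the coffee-machine query
vector_ranking = [
    "General-Reimbursement-Policy-v4.pdf",
    "Travel-Policy-v1.pdf",
    "Reimbursement-Exclusion-List-v2.pdf",
]
# Assumes the exclusion list literally mentions coffee machines, so keyword search ranks it first
keyword_ranking = [
    "Reimbursement-Exclusion-List-v2.pdf",
    "General-Reimbursement-Policy-v4.pdf",
]
# related_docs metadata, as the enrichment step might produce it
related = {"General-Reimbursement-Policy-v4.pdf": ["Reimbursement-Exclusion-List-v2.pdf"]}

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
# Vector-only top-2 would be the general policy and the travel policy
print(fused[:2])  # ['General-Reimbursement-Policy-v4.pdf', 'Reimbursement-Exclusion-List-v2.pdf']
# Even with top_k=1, the linked exclusion list is pulled in via related_docs
print(expand_with_related(fused, related, top_k=1))
```

k = 60 is a widely used default. Larger values flatten the difference between top and lower ranks, so agreement across rankers matters more than a single first place.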

Evaluation Gates

CI/CD gate that fails when required documents are missing from context

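# Automated gate catches missing context
def contextual_recall_gate(test_case, retrieved_nodes, min_recall=1.0):
    """Fail the CI run when required source documents are missing from the retrieved context."""
    required_docs = set(test_case["expected_context_sources"])
    if not required_docs:
        return 1.0
    # Several chunks can come from one document, so compare unique doc IDs
    found_docs = {n.metadata.get("doc_id") for n in retrieved_nodes}

    missing = sorted(required_docs - found_docs)
    recall = 1 - len(missing) / len(required_docs)  # share of required docs that were retrieved
    if recall < min_recall:
        raise AssertionError(
            f"Contextual recall {recall:.0%} below {min_recall:.0%}; missing required docs: {missing}"
        )
    return recall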
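In CI, a gate like this is usually run over a small golden set of queries, with failures collected so that one failing query does not hide the others. The sketch below assumes a hypothetical `GOLDEN_CASES` list and a `retrieve` callable that returns objects with a `metadata` dict, as LlamaIndex nodes do. It repeats the recall calculation inline so it runs on its own. It returns a process exit code, and most CI systems treat a non-zero code as a failed step.

```python
from types import SimpleNamespace
from typing import Callable, Iterable

# Hypothetical golden set: each query lists the documents a correct answer depends on
GOLDEN_CASES = [
    {
        "query": "Can I get reimbursed for a coffee machine for my home office?",
        "expected_context_sources": [
            "General-Reimbursement-Policy-v4.pdf",
            "Reimbursement-Exclusion-List-v2.pdf",
        ],
    },
]


def run_recall_suite(cases: Iterable[dict], retrieve: Callable[[str], list], min_recall: float = 1.0) -> int:
    """Check every case, print one line per failure, and return 1 if any case failed, else 0."""
    failures = 0
    for case in cases:
        required = set(case["expected_context_sources"])
        found = {node.metadata.get("doc_id") for node in retrieve(case["query"])}
        missing = sorted(required - found)
        recall = 1 - len(missing) / len(required) if required else 1.0
        if recall < min_recall:
            failures += 1
            print(f"FAIL {case['query']!r}: recall {recall:.0%}, missing {missing}")
    return 1 if failures else 0


def stub_retrieve(query: str) -> list:
    """Stand-in retriever that reproduces the incident by returning only the general policy."""
    return [SimpleNamespace(metadata={"doc_id": "General-Reimbursement-Policy-v4.pdf"})]


exit_code = run_recall_suite(GOLDEN_CASES, stub_retrieve)
# In a real CI script: sys.exit(exit_code)
```

With the stub, the suite prints one failure line (recall 50%, missing the exclusion list) and returns 1.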

Results

Metric                 Before   After
contextualRecall       45%      85%
contextualPrecision    60%      90%
policyViolations       12%      0%

The CI/CD gate now fails with an actionable error: 'Retriever not finding exclusion list for reimbursement queries.'

Key Lessons

Faithfulness alone is insufficient: an answer that is highly faithful to the wrong context is still dangerous
Contextual Recall is critical for policy and compliance use cases
Metadata enrichment substantially improved retrieval precision (contextual precision rose from 60% to 90% here)
Automated gates catch edge cases that manual testing misses