Temporal & Data Issues

When Time Becomes Your Enemy

Outdated information and version control problems that lead to incorrect answers based on stale data.

The Outdated PTO Policy Nightmare

When Time Becomes Your Enemy

Temporal CorrectnessBusiness Critical1 week to full resolution

The Problem

Users consistently received answers based on outdated PTO policies. The current policy existed in the knowledge base but was buried in a different section with poor metadata. 40% of policy queries returned outdated information, creating confusion and potential compliance issues.

Impact:

Employee confusion, HR disputes, and potential legal compliance issues. Trust in the AI system eroded rapidly.

Diagnosis

Langfuse Trace
Input: What's our current PTO policy?
Retrieved: hr_policy_v2.pdf (outdated)
Missing: hr_policy_v3.pdf (current)
Key Metrics
contextualPrecision:30%
contextualRecall:40%
temporalAccuracy:25%

Solution: Multi-layered temporal awareness system

Temporal Metadata

Enriched all documents with version, effective dates, and supersession relationships

# Temporal metadata enrichment
docs = [
    Document(
        text=policy_text,
        metadata={
            "doc_id": "hr_policy_v3",
            "version": "3.2.1",
            "effective_date": "2024-01-01",
            "expiry_date": "2024-12-31",
            "supersedes": ["hr_policy_v2", "hr_policy_v1"],
            "section": "pto_policy"
        }
    )
]
Temporal Filtering

Hard filters ensure only current policies are retrieved

# Filter by effective date
current_date = datetime.now().isoformat()
retriever = index.as_retriever(
    filters={
        "effective_date": {"$lte": current_date},
        "expiry_date": {"$gte": current_date}
    },
    similarity_top_k=5
)
Temporal Reranking

Boost recent documents in ranking to prefer current information

class TemporalReranker:
    def rerank(self, nodes, query):
        for node in nodes:
            days_old = (datetime.now() - node.metadata["last_updated"]).days
            temporal_boost = max(0.1, 1.0 - (days_old / 365))
            node.score *= temporal_boost
        return sorted(nodes, key=lambda x: x.score, reverse=True)

Results

Before
temporalAccuracy:25%
contextualPrecision:30%
userComplaints:23%
After
temporalAccuracy:95%
contextualPrecision:85%
userComplaints:1%

Real-time alerts when outdated content is retrieved, with automatic escalation to content team.

Key Lessons

  • Temporal metadata is essential for any time-sensitive information
  • Hard filters prevent outdated content from reaching users
  • Automated monitoring catches temporal regressions before users do
  • Version control for knowledge bases is as important as code version control