ADR-26021: Content Lifecycle Policy for RAG-Consumed Repositories

Date

2026-02-05

Status

Proposed

Context

This repository is consumed by RAG pipelines for weighted decision-making. It holds two categories of documentation artifacts: content articles (tutorials, guides, reference material) and ADRs (architectural decision records). The two have fundamentally different risk profiles when AI assistants retrieve them:

  1. Stale content articles present outdated recommendations as current truth. A superseded article stating “use GPT-4 as your frontier model” or “use 3B–8B for chats” pollutes retrieval with harmful misinformation. Unlike ADRs, content articles carry no metadata signaling their obsolescence.

  2. Stale ADRs provide valuable negative knowledge. A superseded ADR retrieved by RAG says “this approach was tried and replaced by ADR-26XXX because [reason]”, which prevents teams from re-exploring dead ends.

The existing ADR infrastructure (ADR-26016) already handles the ADR lifecycle through a YAML status field and a superseded_by pointer. No equivalent policy exists for content articles, however. As a result, outdated material accumulates and degrades RAG retrieval quality.

Triggering incident: Two early articles were found to contain outdated model references and oversimplified taxonomies. Both had been superseded by newer content (the Jan 2026 model zoo and the aidx framework):

- llm_usage_patterns.md v0.1.5, created 2025-10-19
- choosing_model_size.md v0.2.3, created 2025-10-19

Neither article had any mechanism to signal its obsolescence to RAG consumers.

Decision

We adopt an asymmetric lifecycle policy for content articles vs. ADRs:

Content Articles

  1. Delete superseded content articles entirely. Git history serves as the archive. Do not retain stale articles in the working tree under any deprecation marker — their presence in the file system means RAG will retrieve and present them as current guidance.

  2. Before deletion, verify that no unique technical value remains. If the article contains concepts not covered elsewhere, extract them and integrate them into the successor article first.

  3. After deletion, sweep all cross-references. Run check_broken_links.py to catch dangling references to deleted files.
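The cross-reference sweep in step 3 can be approximated as follows. This is a minimal sketch, not the repository's check_broken_links.py, whose interface and behavior this ADR does not describe.

The sketch makes three assumptions:

- Articles are Markdown files.
- They link to each other with relative inline links of the form `[text](path.md)`.
- The helper name `find_broken_links` is hypothetical.

```python
import re
from pathlib import Path

# Matches inline Markdown links: [text](target).
# Assumption: link targets contain no spaces or nested parentheses.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(root: Path) -> list[tuple[Path, str]]:
    """Return (source_file, target) pairs whose relative .md target no longer exists.

    Hypothetical helper illustrating the post-deletion sweep. External URLs,
    mailto links, and in-page anchors are skipped because a file deletion
    cannot break them.
    """
    broken = []
    for source in sorted(root.rglob("*.md")):
        text = source.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            path_part = target.split("#", 1)[0]  # drop a "#section" fragment
            if not path_part.endswith(".md"):
                continue  # only links to other articles are checked here
            # Resolve the link relative to the file that contains it.
            if not (source.parent / path_part).exists():
                broken.append((source, target))
    return broken
```

Calling `find_broken_links(Path("docs"))` (the directory name is hypothetical) right after a deletion lists every file that still points at the removed article. An empty list means the sweep is clean.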

ADRs (No Change)

ADRs continue to follow the immutable lifecycle defined in ADR-26016:

RAG Risk Matrix

| Artifact Type | What RAG Retrieves | Risk if Stale | Policy |
|---|---|---|---|
| Content article | Recommendations, how-to guidance | Misleading: presents outdated advice as current truth | Delete. Git history is the archive. |
| ADR | Decision rationale (“we chose X because Y”) | Valuable: prevents re-exploring dead ends | Never delete. Use status metadata for RAG filtering. |
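The ADR row's "use status metadata for RAG filtering" could look like the sketch below at indexing time. It rests on three assumptions:

- ADR frontmatter is a flat block of `key: value` lines between `---` delimiters.
- That block carries the status and superseded_by fields described in ADR-26016.
- A superseded ADR has `status: superseded`. Matching ignores case, since this ADR itself writes "Proposed".

The function names and the `[HISTORICAL ...]` notice are hypothetical. Superseded ADRs are not dropped. Instead, each one is prefixed with a notice, so the retrieved chunk keeps its negative knowledge without being read as current guidance.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split flat `key: value` frontmatter (between '---' lines) from the body.

    Assumption: scalar fields only; nested YAML would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}, text  # unterminated frontmatter: treat the whole file as body


def prepare_adr_chunk(text: str) -> dict[str, object]:
    """Build a hypothetical index record for an ADR, annotating superseded ones."""
    meta, body = parse_frontmatter(text)
    status = meta.get("status", "").lower()
    successor = meta.get("superseded_by")
    if status == "superseded":
        # Keep the rationale retrievable, but label it as historical.
        body = (f"[HISTORICAL: superseded by {successor or 'an unspecified ADR'}; "
                f"retained as negative knowledge]\n{body}")
    return {"text": body, "status": status, "superseded_by": successor}


def filter_for_current_guidance(records: list[dict[str, object]]) -> list[dict[str, object]]:
    """Optional query-time filter: keep only ADRs that have not been superseded."""
    return [r for r in records if r["status"] != "superseded"]
```

Annotating at index time and filtering at query time are independent choices. A pipeline answering "what do we do now?" can apply the filter. A pipeline answering "was this tried before?" can leave the annotated superseded records in.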

Consequences

Positive

Negative / Risks

Alternatives

  1. Deprecation Markers in Content Articles: Add status: deprecated YAML frontmatter to stale articles. Rejected: RAG pipelines would need custom filtering logic for content articles (unlike ADRs, which already have it). More importantly, deprecated articles remain in the file system, so keyword-based search still retrieves them regardless of their metadata.

  2. Archive Directory: Move stale articles to an archive/ directory excluded from RAG indexing. Rejected: Breaks cross-references (ADR-26016 explicitly rejects this pattern for ADRs; the same reasoning applies to content).

  3. No Policy (Status Quo): Let stale articles accumulate. Rejected: Directly degrades RAG retrieval quality, the primary consumer of this repository.
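The objection to deprecation markers in alternative 1 can be made concrete with a toy keyword retriever. This sketch is illustrative only:

- The scoring counts the distinct query terms present in the raw file text. It stands in for whatever keyword search a real pipeline uses.
- The corpus strings are invented.

Because the retriever scores raw text, a `status: deprecated` line does nothing to suppress the stale article unless filtering logic is written for it.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens: a crude stand-in for a real keyword analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, corpus: dict[str, str]) -> list[str]:
    """Rank documents by the number of distinct query terms they contain (ties by name)."""
    terms = tokenize(query)
    scored = [(len(terms & tokenize(body)), name) for name, body in corpus.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [name for score, name in ranked if score > 0]


# Hypothetical corpus: the stale article is still in the working tree, marked deprecated.
corpus = {
    "stale_article.md": "---\nstatus: deprecated\n---\nUse GPT-4 as your frontier model.",
    "current_article.md": "Frontier model choices are covered in the model zoo.",
}

print(keyword_search("which frontier model should I use", corpus))
# → ['stale_article.md', 'current_article.md']
```

The deprecated article ranks first because it shares more words with the query. Deleting the file, as this ADR requires, removes it from the candidate set entirely. A metadata marker only helps if every retriever is taught to read it.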

References

Participants

  1. Vadim Rudakov

  2. Claude (AI Engineering Advisor)