Title¶
Standardizing Autonomous Knowledge Retrieval via an Agentic RAG “Pre-Flight” Workflow for aider.
Status¶
Proposed
Date¶
2026-01-13
Context¶
Our repository is a large knowledge base (KB) exceeding 1 million tokens, containing critical workflows and software stack instructions. Applying standard RAG (Retrieval-Augmented Generation) to it in aider or Open WebUI runs into several practical constraints:
Context Overload: A million tokens exceed the functional context window of most models (especially local models such as qwen2.5-coder), causing noise and hallucination.
Human Error: Manually identifying and adding relevant documentation files to aider sessions is unreliable and inconsistent.
Workflow Silos: Project-specific code and global organizational standards (workflow/stack) are disconnected, leading to architectural drift.
Local Stack Constraints: The solution must remain within the local perimeter (Fedora/Debian) and avoid heavy VRAM/orchestration overhead.
Decision¶
We will implement an Agentic RAG “Pre-Flight” Wrapper (the aidx pattern). This transforms the current passive retrieval into an active, autonomous research loop.
Architecture: A two-stage “Research-Apply” pipeline.
The Researcher (Stage 1): Before launching aider, a lightweight agent (e.g., ministral) queries a dedicated local vector database (Qdrant or PostgreSQL with pgvector).
The Code Agent (Stage 2): aider is launched automatically with the retrieved snippets or file paths injected into its initial context via the --message-file or --read parameters.
Namespace Partitioning: The RAG will use separate namespaces (collections) for “Global_Workflows” and “Project_Specific” data to ensure high-precision retrieval.
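The two stages and the namespace split can be sketched roughly as follows. This is a minimal illustration, not the aidx implementation: the names Hit, research, build_aider_command, and the preflight_message.md file are hypothetical, and a simple word-overlap score stands in for the real vector search against Qdrant or pgvector so the sketch runs with the standard library alone. Only the collection names and the --message-file and --read flags come from the decision above.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path

# Collection names come from the ADR; everything else is a hypothetical sketch.
COLLECTIONS = ("Global_Workflows", "Project_Specific")


@dataclass(frozen=True)
class Hit:
    collection: str  # namespace the chunk was found in
    path: str        # file the chunk came from
    text: str        # retrieved snippet
    score: float     # relevance in [0, 1]


def research(query: str, index: dict[str, list[tuple[str, str]]], top_k: int = 2) -> list[Hit]:
    """Stage 1 stand-in: word-overlap scoring instead of a real vector search."""
    terms = set(query.lower().split())
    hits: list[Hit] = []
    for collection in COLLECTIONS:  # each namespace is queried separately
        scored = []
        for path, text in index.get(collection, []):
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                scored.append(Hit(collection, path, text, overlap / len(terms)))
        scored.sort(key=lambda h: h.score, reverse=True)
        hits.extend(scored[:top_k])
    return hits


def build_aider_command(task: str, hits: list[Hit], workdir: Path) -> list[str]:
    """Stage 2: write a briefing file and assemble the aider argument list."""
    briefing = workdir / "preflight_message.md"  # hypothetical file name
    lines = [task, "", "Relevant standards found by the pre-flight research step:"]
    for hit in hits:
        lines.append(f"- [{hit.collection}] {hit.path}: {hit.text}")
    briefing.write_text("\n".join(lines), encoding="utf-8")

    cmd = ["aider", "--message-file", str(briefing)]
    for path in dict.fromkeys(h.path for h in hits):  # de-duplicate, keep order
        cmd += ["--read", path]
    return cmd


if __name__ == "__main__":
    demo_index = {
        "Global_Workflows": [("docs/hooks.md", "git hook scripts use python and oop")],
        "Project_Specific": [("README.md", "this project ships a cli tool")],
    }
    with tempfile.TemporaryDirectory() as tmp:
        found = research("add a python git hook", demo_index)
        print(build_aider_command("Add a pre-commit hook", found, Path(tmp)))
```

A real wrapper would pass the returned list to subprocess.run. As far as the aider documentation describes it, --message-file processes the message and then exits, so a wrapper aimed at interactive sessions may prefer to pass only the --read paths.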
Consequences¶
Positive¶
Zero-Manual-Setup: The agent automatically consults the global KB without human intervention, ensuring the developer never “forgets” a standard.
Token Efficiency: Only high-relevance chunks are sent to the context window, preserving the 32B model’s “attention” for actual code logic.
Scalability: This pattern comfortably handles millions of tokens by offloading the heavy lifting to specialized vector database indices.
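The Token Efficiency point can be illustrated with a small budget-packing step between retrieval and launch. The sketch below is an assumption about how such a step might look, not part of the decision itself: estimate_tokens and pack_chunks are hypothetical names, and the four-characters-per-token estimate is a rough heuristic that a real tokenizer would replace.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); swap in a real tokenizer."""
    return max(1, len(text) // 4)


def pack_chunks(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that fit within `budget` tokens."""
    selected: list[str] = []
    used = 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # this chunk would overflow; a smaller one may still fit
        selected.append(text)
        used += cost
    return selected
```

Called with (score, text) pairs from the research stage, pack_chunks returns the snippets in descending score order, dropping any that would push the total past the budget.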
Negative¶
Execution Latency: Adds 2–5 seconds of startup time per session to conduct the research phase.
Orchestration Debt: Requires maintaining a Python-based wrapper and a running vector service (Qdrant/PostgreSQL). Mitigation: Adhere to ADR 26001 (OOP Python standards) to ensure the wrapper is tested and maintainable.
Alternatives¶
Passive RAG (Open WebUI): Rejected. Lacks CLI automation and cannot be easily triggered by aider during code tasks.
Manual File Addition: Rejected. High risk of human error and “Knowledge Debt” where developers miss critical workflow updates.
Context Window Stuffing: Rejected. Local models suffer from the “Lost in the Middle” phenomenon when processing 1M+ tokens simultaneously.
References¶
ADR 26001: Use of Python and OOP for Git Hook Scripts
ISO 29148: Systems and Software Engineering — Life Cycle Processes
SWEBOK Guide V4.0 - Software Engineering Body of Knowledge
Participants¶
Vadim Rudakov
Senior AI Architect (Consultant)