📘 Requirements Engineering in the AI Era: The Gated Velocity Handbook


Owner: Vadim Rudakov, lefthand67@gmail.com
Version: 0.2.1
Birth: 2025-12-07
Last Modified: 2025-12-16


INFO: The handbook is optimized for environments supporting Mermaid.js diagrams. For static export, rasterized versions are available in Appendix A.

Maximize LLM-driven velocity while enforcing Non-Functional Requirement (NFR) compliance via structured, low-friction human Micro-Gating.

I. Foundational Philosophy: Micro-Gating vs. The Architectural Debt Trap

The legacy approach to Requirements Engineering (RE) is broken. Time lost to lengthy document reviews, political negotiation, and ambiguous acceptance criteria stalls the development cycle. Our methodology is designed to eliminate this inertia by delegating synthesis and generation to Large Language Models (LLMs) and delegating verification and auditing to Small Language Models (SLMs).

The Critical Failure of POC-First Review

A Proof of Concept (POC) is an evaluation of functional feasibility only. It is not a reliable criterion for production. By delaying human review until the POC is built, we risk committing to an architecture that is unviable for scaling, insecure, or too expensive.

The Core Solution: The Gated Velocity Pipeline

We enforce governance through Gating Functions—mandatory, auditable checkpoints where human experts review SLM-generated audit reports, not voluminous documents. This allows for high-speed LLM generation between gates while retaining executive control over architectural risk.

The Engineer’s Mindset: The WRC Formula

Every engineer must evaluate decisions based on the trade-off between speed and viability. The Weighted Response Confidence (WRC) is a conceptual framework guiding all architectural and design decisions in the AI-Augmented Gated Velocity Pipeline. It ensures that velocity is balanced against viability and future cost.

The WRC formalizes the architectural trade-off: we aim to maximize Effectiveness (E, functional achievement) and Accountability (A, compliance) while minimizing the Complexity/Cost Penalty (C). A simplified, conceptual calculation is:

$$
WRC = \frac{k_E E + k_A A}{k_C C}
$$

Where the $k$ values are project-specific weighting constants (e.g., $k_A$ is high for financial systems).
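
As a concrete illustration, the sketch below computes the WRC for a hypothetical feature. The scores, weights, and function name are illustrative only, not part of the methodology's mandated tooling:

```python
def wrc(e: float, a: float, c: float,
        k_e: float = 1.0, k_a: float = 1.0, k_c: float = 1.0) -> float:
    """Weighted Response Confidence: rewards Effectiveness (E) and
    Accountability (A); penalizes the Complexity/Cost term (C)."""
    return (k_e * e + k_a * a) / (k_c * c)

# Illustrative scores and weights only. k_a is boosted because
# Accountability dominates in, e.g., financial systems.
score = wrc(e=0.85, a=0.95, c=3.0, k_a=2.0)
print(f"WRC = {score:.2f}")  # 0.92 -- clears the 0.89 production threshold
```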

II. The Model Allocation Strategy: LLM vs. SLM

We strategically allocate model resources based on their inherent strengths and organizational costs. This ensures Efficiency—we reserve high-cost, high-latency tools for high-value synthesis and use local, fast tools for deterministic checks.

1. Division of Labor: LLM Synthesizers vs. SLM Verifiers

| Dimension | Large Language Models (LLMs) | Small Language Models (SLMs) |
|---|---|---|
| Primary Role | Synthesis, generation, ambiguity resolution | Verification, validation, deterministic transformation |
| Typical Tasks | Drafting user stories from BRs, refining specs | Schema validation, rule-based consistency checks |
| Execution Environment | Cloud-hosted / API (e.g., OpenRouter, Anthropic) | Local (CPU/GPU), containerized (Podman/Docker) |
| Key Constraints | Token cost, API latency, rate limits | RAM/VRAM budget, inference speed, reproducibility |
| Change Frequency | Low (used for initial high-entropy output) | High (integrated into CI/validation loops) |
| Human Interaction Point | Input formulation, output triage | Exception handling, audit escalation |
| Example Models | Claude 3.5, GPT-4o, Llama 3.1 70B | Deepseek-R1, Qwen2.5-Coder-7B, Gemma3N |
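
The allocation can be made mechanical. The sketch below shows one hypothetical way to route tasks to the appropriate tier; the task categories and endpoint labels are assumptions for illustration only:

```python
# Hypothetical task router: synthesis goes to a cloud LLM, deterministic
# verification stays on a local SLM. Endpoint names are placeholders.
SYNTHESIS_TASKS = {"draft_user_story", "refine_spec", "resolve_ambiguity"}
VERIFICATION_TASKS = {"schema_validation", "consistency_check", "nfr_audit"}

def route(task: str) -> str:
    if task in SYNTHESIS_TASKS:
        return "cloud-llm"   # high cost, high latency, high-entropy output
    if task in VERIFICATION_TASKS:
        return "local-slm"   # cheap, fast, reproducible; safe in CI loops
    raise ValueError(f"Unknown task {task!r}: requires human triage")

assert route("nfr_audit") == "local-slm"
```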

2. Human–AI Collaboration (HAIC)

The diagram below illustrates how the LLM and SLM roles are integrated into the lifecycle, with humans acting as Gating Functions at high-risk junctures. The goal is to maximize the time spent in the large automated blocks while minimizing the time spent in the human review gates.

Figure 1: The Gated Velocity Pipeline. LLMs drive synthesis; SLMs enforce deterministic audits; humans gate only at high-risk decision points (G1–G3). Note the critical feedback loop (“G1: Architectural Viability Gate”) ensuring the LLM corrects its own blueprint.

This design aligns with the HAIC pattern validated in arXiv:2511.01324v3, where 54.4% of practitioners use AI as a collaborative partner rather than an autonomous agent.

III. The Three Micro-Gating Functions (G1, G2, G3)

Human expertise is reserved for these three gates. In all cases, the human reviewer is presented with a concise, auditable SLM-generated summary instead of raw documentation or code.

G1: Architectural Viability Gate (Architect Sign-off)

This gate occurs immediately after the LLM generates the Technical Specification. It prevents the team from implementing a flawed, unviable blueprint.

G2: Code Integrity Gate (Security/Engineer Sign-off)

This gate occurs before the generated code is merged into the main branch, integrated directly into the Pull Request (PR) process. It combats Generative Debt (structural and security flaws).

G3: Strategic Acceptance Gate (Business Sign-off)

This gate concludes the cycle, verifying that the built system meets the high-level business need.
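
Each gate consumes the structured output of an SLM auditor rather than raw artifacts. The sketch below assumes a hypothetical shape for the SLM Audit Findings JSON (the handbook mandates structured findings, not this exact schema) and shows the pass/escalate decision every gate reduces to:

```python
import json

# Hypothetical findings payload; fields are assumptions for illustration.
findings = json.loads("""
{
  "gate": "G2",
  "violations": [
    {"rule": "max_cyclomatic_complexity", "line": 42,
     "justification": "Function exceeds NFR limit of 10 (measured: 17)"}
  ]
}
""")

def gate_passes(findings: dict) -> bool:
    """A gate clears only when the SLM auditor reports zero violations;
    anything else escalates to the human reviewer."""
    return len(findings["violations"]) == 0

print("cleared" if gate_passes(findings) else "escalate to human reviewer")
```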

IV. The AI-Driven Artifact Chain and Governance

1. The NFR Manifest (JSON Contract)

This contract is the central mechanism for architectural governance. It is provided by the human before the LLM starts generation and is the primary input for the G1 Validator. The structured JSON format makes NFRs machine-readable and auditable, preventing them from being lost in prose.
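
The handbook fixes the medium (versioned JSON in the repository) rather than an exact schema. Below is a minimal sketch of what such a manifest might contain, with field names and thresholds assumed for illustration:

```python
import json

# A minimal, hypothetical NFR Manifest. Field names and thresholds are
# illustrative; the hard requirement is versioned, machine-readable JSON.
manifest = {
    "version": "1.0.0",
    "approved_by": "architect@example.com",
    "nfrs": [
        {"id": "NFR-001", "category": "performance",
         "constraint": "p99_latency_ms", "threshold": 200},
        {"id": "NFR-002", "category": "security",
         "constraint": "pii_in_logs", "threshold": False},
    ],
}

with open("nfr_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)  # versioned alongside the code in Git
```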

2. The Traceability Mandate: Accountability in AI Engineering

To ensure Accountability (A), every artifact must be traceable back to its origin and the human who approved it.
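
A minimal sketch of what one traceability record might look like; the fields below are assumptions chosen to cover origin, generator, and approver, not a mandated schema:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TraceRecord:
    """Links one artifact to its origin and its human approver."""
    artifact_id: str    # e.g., a spec section or code diff hash
    derived_from: str   # upstream artifact (BR, user story, blueprint)
    generated_by: str   # model identifier that synthesized the artifact
    approved_by: str    # human who signed off at the relevant gate
    gate: str           # G1, G2, or G3

record = TraceRecord(
    artifact_id="diff-0042",
    derived_from="US-0007",
    generated_by="cloud-llm-2025-06",
    approved_by="engineer@example.com",
    gate="G2",
)
print(asdict(record))
```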

V. The Artifact Flow and Version Control: Stacked Diffs

The efficiency gained by offloading synthesis to LLMs must be preserved during code review and integration. The LLM-generated artifacts are naturally granular, small, and testable—making them ideal for the Stacked Diffs version control strategy. This strategy enforces small, atomic changes, accelerating human review and maintaining a clean, linear history.

See Stacked Diffs (and why you should know about them)

1. The Atomic Change Mandate

Every User Story (“Elicitation”) that proceeds to the Code Generation phase (“Code & Test Generation”) must be treated as a candidate for a single, independent, verifiable atomic commit.

2. The Stacked Submission Workflow

The developer’s primary responsibility shifts from writing boilerplate code to organizing, reviewing, and submitting the LLM’s output as a clean stack.

| Step | Action | Dependency | Artifact Created |
|---|---|---|---|
| Initial Commit | Base feature setup (e.g., project scaffolding). | None | Base Diff 0 (Submitted/Merged) |
| Generate Diff 1 | LLM (G) generates code for User Story 1. | Requires Diff 0 | Diff 1 (Stacked on Diff 0) |
| Verify Diff 1 | Code passes G2 (Code Integrity Gate) and the SLM Audit (I). | - | Diff 1 is Cleared |
| Generate Diff 2 | LLM (G) generates code for User Story 2. | Requires Diff 1 | Diff 2 (Stacked on Diff 1) |
| Verify Diff 2 | Code passes G2 (Code Integrity Gate) and the SLM Audit (I). | - | Diff 2 is Cleared |
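
In plain git terms, the stack is simply a chain of branches, each based on the one before it. The sketch below (hypothetical branch and commit names, empty commits standing in for generated code) illustrates the mechanics; it assumes it runs inside an initialized repository:

```python
import subprocess

def git(*args: str) -> None:
    # Thin wrapper; assumes the working directory is a git repository.
    subprocess.run(["git", *args], check=True)

# Hypothetical stack: each user-story branch builds on the previous one,
# keeping every diff small and independently reviewable at G2.
git("checkout", "-b", "feature/base")             # Diff 0: scaffolding
git("commit", "--allow-empty", "-m", "Diff 0: scaffold")
git("checkout", "-b", "feature/us-1")             # Diff 1, stacked on Diff 0
git("commit", "--allow-empty", "-m", "Diff 1: US-1 generated code")
git("checkout", "-b", "feature/us-2")             # Diff 2, stacked on Diff 1
git("commit", "--allow-empty", "-m", "Diff 2: US-2 generated code")
```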

3. Review and Integration Benefits

The stacked methodology directly enhances the efficiency of the Gating Functions: each diff is small enough for a fast, focused G2 review, and the linear stack keeps integration history clean and auditable.

This combination ensures that the velocity gained by AI generation is not lost in chaotic integration or slow, monolithic review processes.

VI. Strategic Takeaways for the Engineer

1. Avoid the Architectural Complexity Penalty

The goal of the G1 gate is to enforce the Simplest Viable Architecture (SVA). Engineers must actively mitigate complexity, as it directly inflates the Cost Penalty (C) in the WRC.

2. Final Traceability: Gherkin to Executable Code

The final link in the chain is automation. The SLM translates the human-approved BDD (G3) into executable tests, completing the chain and creating a living requirement.
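
As a sketch of what that final step could produce, here are step definitions binding a hypothetical Gherkin scenario to executable code using the behave framework; the handbook does not prescribe a specific BDD runner, and the scenario text is invented for illustration:

```python
# features/steps/checkout_steps.py -- hypothetical step definitions an SLM
# could emit from the human-approved Gherkin Feature File (G3).
from behave import given, when, then

@given("a cart containing 2 items")
def step_cart(context):
    context.cart = ["item-1", "item-2"]

@when("the user checks out")
def step_checkout(context):
    context.order = {"items": context.cart, "status": "confirmed"}

@then("an order is confirmed")
def step_confirmed(context):
    assert context.order["status"] == "confirmed"
```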

This process ensures that your requirements are not passive documents but active, verifiable code assets. The entire lifecycle is structured for maximum Accountability and Velocity.

VII. Appendices (Mandatory Companion Documents)

The following structured templates are mandatory operational tools for this handbook. They provide the necessary constraints to ensure LLM output is auditable by SLMs.

See Appendices.

Further Reading

  1. Towards Human-AI Synergy in Requirements Engineering: A Framework and Preliminary Study

  2. AI for Requirements Engineering: Industry Adoption and Practitioner Perspectives

Appendix A. Figure 1: The Gated Velocity Pipeline

Appendix B. Glossary of Terms

This glossary standardizes the terminology necessary for operating the Gated Velocity Pipeline and understanding its core architectural principles.

I. Methodology & Architecture

| Term | Definition | Context/Rationale |
|---|---|---|
| Gated Velocity Pipeline | The end-to-end framework, comprising three human-in-the-loop (HITL) Micro-Gates (G1, G2, G3) that enforce NFR compliance at critical decision points to mitigate Generative Debt. | The name highlights the core goal: balancing speed (velocity) with quality control (gating). |
| SVA (Simplest Viable Architecture) | The principle that the production system and the SLM validation stack must favor local, CLI-driven, GitOps-native solutions (e.g., local SLMs via Ollama, structured JSON artifacts). | Minimizes architectural complexity and cost (zero C1-C4 penalties) by focusing external, high-cost LLMs only on initial synthesis, keeping the deployment stack simple. |
| Generative Debt | Architectural or technical debt created by LLM synthesis when the generated code or specification violates NFRs (e.g., high memory consumption, security flaws, high complexity). | The primary risk the entire Gated Velocity framework is designed to prevent. |
| WRC (Weighted Response Confidence) | The primary viability metric for the methodology, calculated from three factors: Effectiveness (E), Accountability (A), and the Complexity/Cost Penalty (C). Must be $\ge 0.89$ for Production-Ready. | The quantitative measure used for architectural decision-making, trading off speed for viability. |
| Micro-Gating | The term for the three low-friction, high-frequency human review steps (G1, G2, G3). | Emphasizes that reviews are small, fast, and highly targeted, contrasting with monolithic human review. |
| BDD-First Hybrid Model | The optimal elicitation approach where the human defines the Gherkin Sketch and the LLM augments it into the complete Gherkin Feature File. | Maximizes both human quality assurance and LLM synthesis velocity. |

II. Artifacts & Requirements

| Term | Definition | Context/Rationale |
|---|---|---|
| NFR Manifest | The definitive, machine-readable contract defining all Non-Functional Requirements. Always stored as a versioned JSON file in the repository. | The single source of truth for all automated NFR audits (G1 and G2). |
| Technical Specification Blueprint | The detailed, LLM-generated design document (typically Markdown) that outlines the code structure, data flow, and architecture before coding begins. | The artifact validated at the G1 Gate. |
| Gherkin Sketch | The initial, essential subset of Gherkin steps (e.g., 3-5 critical Given/When/Then scenarios) authored by the human to define the core functional intent. | The human’s high-value input into the Hybrid BDD Model. |
| Gherkin Feature File | The final, complete, LLM-augmented file containing all BDD scenarios used for acceptance testing (G3). | The artifact representing the fully defined functional requirement. |
| SLM Audit Findings JSON | The structured output from the SLM Auditor (G1 or G2) listing specific NFR violations, line numbers, and justifications based on the NFR Manifest. | The actionable input used by the human reviewer during Micro-Gating. |

III. Roles & Processes

| Term | Definition | Context/Rationale |
|---|---|---|
| G1: Architectural Viability Gate | The point where the Architect reviews the Technical Specification Blueprint for NFR compliance, validated by the SLM NFR Contract Validator. | Ensures systemic viability before implementation. |
| G2: Code Integrity Gate | The point where the Engineer reviews the generated Code Diff for Generative Debt (complexity, security, dependency violations), validated by the SLM Code Auditor. | Prevents high-debt code from entering the main branch. |
| G3: Strategic Acceptance Gate | The final sign-off point where the Product Owner/Analyst approves the functional behavior and NFR compliance (Dual-Audit). | The official acceptance criteria for the entire feature. |
| Dual-Audit G3 Gate | The procedure at G3 requiring two separate confirmations: 1) BDD scenarios pass, and 2) a final SLM check confirms NFRs (e.g., latency, PII safety) are met on the staging environment. | The refinement necessary to push the WRC above 0.89. |
| HITL (Human-in-the-Loop) | The acknowledgment that human expertise (the Architect, Engineer, Analyst) is mandatory for strategic oversight and audit of AI-generated artifacts. | A core design philosophy of the methodology. |