General Purpose (Abstract Synthesis) vs Agentic (Instruction Adherence) Models

Owner: Vadim Rudakov, lefthand67@gmail.com
Version: 0.1.1
Birth: 2026-01-16
Last Modified: 2026-01-17

1. The Core Divergence: Synthesis vs. Adherence¶

In the Q1 2026 landscape, the “One Model to Rule Them All” paradigm is deprecated. For the Hybrid Bridge Pattern defined in the aidx framework, the selection of a Cloud Architect depends on the bifurcation of model utility:

The problem is not just “which model is best,” but the Bifurcation of LLM Utility. We are no longer in the era of the “one model to rule them all.”

Agentic Models (Instruction-Locked): Optimized for Logic Rigidity. These models treat system prompts as strict code and minimize “conversational drift” even at high context depths (128k+). They are the required tool for the Architect Phase to ensure the generated artifacts/plan.md is deterministic.
General Purpose Models (Abstract Synthesis): Optimized for Cognitive Breadth. These models excel at “Step 0” (human-led problem discovery), where the objective is to challenge assumptions and identify architectural anti-patterns through latent knowledge retrieval.

2. Model Zoo Classification (Jan 2026)¶

Tier	Models	Primary Capability	Architectural “Why”
Agentic (Instruction-Locked)	Claude 4.0 Sonnet, Gemini 3 Flash, DeepSeek-V3	Logic Rigidity. These models minimize “chatty” drift and follow system prompts as strict code.	Best for the Architect role in `aidx`. They produce a clean `plan.md` that a local model can execute without confusion.
General Purpose (Synthesis)	GPT-5, Claude 4.5 Opus, Gemini 3 Pro	Abstract Depth. These models excel at “Step 0” where the problem is not yet technical.	Best for Human-led exploration. They can challenge your stack choice by recalling niche historical failures of specific libraries.
Thinking (Verification)	OpenAI o2, Gemini 3 (DeepThink), DeepSeek-R1	System 2 Reasoning. These models use “Chain of Thought” to self-verify logic.	Best for Pre-flight verification. Use them to check if your `plan.md` has security race conditions before hitting the editor.

3. The Execution Gap & Technical Debt¶

The “General Purpose” Hallucination: GPT-level models often produce “creative” code that local SLMs (Small Language Models) like qwen2.5-coder:14b cannot interpret, resulting in an Execution Gap.
Persona Drift: General-purpose models may ignore rigid “Senior Architect” constraints to be “helpful,” violating the Smallest Viable Architecture (SVA) principle.
Model Version Drift: All aidx configurations must pin specific versions (e.g., gemini-3-flash-001) to ensure logic stability.

4. Selection Decision Matrix¶

Task Type	Recommended Tier	Justification
Unstructured Brainstorming	General Purpose	Highest “Reasoning Ceiling” for abstract problems.
Plan Serialization (`plan.md`)	Agentic	High-fidelity adherence to ADR 26006 templates.
Security/Logic Audit	Thinking	Self-verifying logic gates for high-stakes code.