General Purpose (Abstract Synthesis) vs Agentic (Instruction Adherence) Models


Owner: Vadim Rudakov, lefthand67@gmail.com
Version: 0.1.1
Birth: 2026-01-16
Last Modified: 2026-01-17


1. The Core Divergence: Synthesis vs. Adherence

In the Q1 2026 landscape, the “One Model to Rule Them All” paradigm is deprecated. The question is no longer just “which model is best,” but which kind of utility a given task requires. For the Hybrid Bridge Pattern defined in the aidx framework, the selection of a Cloud Architect depends on this bifurcation of LLM utility:

  • Agentic Models (Instruction-Locked): Optimized for Logic Rigidity. These models treat system prompts as strict code and minimize “conversational drift” even at high context depths (128k+). They are the required tool for the Architect Phase to ensure the generated artifacts/plan.md is deterministic.

  • General Purpose Models (Abstract Synthesis): Optimized for Cognitive Breadth. These models excel at “Step 0” (human-led problem discovery), where the objective is to challenge assumptions and identify architectural anti-patterns through latent knowledge retrieval.

2. Model Zoo Classification (Jan 2026)

| Tier | Models | Primary Capability | Architectural “Why” |
|---|---|---|---|
| Agentic (Instruction-Locked) | Claude 4.0 Sonnet, Gemini 3 Flash, DeepSeek-V3 | Logic Rigidity. These models minimize “chatty” drift and follow system prompts as strict code. | Best for the Architect role in aidx. They produce a clean plan.md that a local model can execute without confusion. |
| General Purpose (Synthesis) | GPT-5, Claude 4.5 Opus, Gemini 3 Pro | Abstract Depth. These models excel at “Step 0” where the problem is not yet technical. | Best for Human-led exploration. They can challenge your stack choice by recalling niche historical failures of specific libraries. |
| Thinking (Verification) | OpenAI o2, Gemini 3 (DeepThink), DeepSeek-R1 | System 2 Reasoning. These models use “Chain of Thought” to self-verify logic. | Best for Pre-flight verification. Use them to check if your plan.md has security race conditions before hitting the editor. |

3. The Execution Gap & Technical Debt

  • The “General Purpose” Hallucination: GPT-level models often produce “creative” code that local SLMs (Small Language Models) like qwen2.5-coder:14b cannot interpret, resulting in an Execution Gap.

  • Persona Drift: General-purpose models may ignore rigid “Senior Architect” constraints to be “helpful,” violating the Smallest Viable Architecture (SVA) principle.

  • Model Version Drift: All aidx configurations must pin specific versions (e.g., gemini-3-flash-001) to ensure logic stability.
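The version-pinning rule above can be checked mechanically. The following is a minimal sketch, not part of aidx itself: it assumes a pinned model ID ends in a three-digit revision suffix, as in the gemini-3-flash-001 example, and that the configuration is a simple role-to-model mapping. The names `is_pinned`, `unpinned_models`, and the role keys are hypothetical.

```python
import re

# Assumption: a pinned model ID ends in an explicit numeric revision,
# as in the "gemini-3-flash-001" example. Providers also use other
# schemes (dates, hashes), so adjust the pattern to match your IDs.
PINNED_SUFFIX = re.compile(r"-\d{3}$")


def is_pinned(model_id: str) -> bool:
    """Return True if the model ID carries an explicit revision suffix."""
    return bool(PINNED_SUFFIX.search(model_id))


def unpinned_models(config: dict[str, str]) -> list[str]:
    """List the roles whose model ID looks like a floating alias."""
    return [role for role, model_id in config.items() if not is_pinned(model_id)]


# Hypothetical role-to-model mapping; the keys are illustrative only.
example_config = {
    "architect": "gemini-3-flash-001",  # pinned revision
    "explorer": "gemini-3-pro",         # floating alias, gets flagged
}
```

Running `unpinned_models(example_config)` flags the `explorer` role, so a check like this could fail a build before a floating alias silently changes model behavior.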

4. Selection Decision Matrix

| Task Type | Recommended Tier | Justification |
|---|---|---|
| Unstructured Brainstorming | General Purpose | Highest “Reasoning Ceiling” for abstract problems. |
| Plan Serialization (plan.md) | Agentic | High-fidelity adherence to ADR 26006 templates. |
| Security/Logic Audit | Thinking | Self-verifying logic gates for high-stakes code. |
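The matrix above can be encoded as a small lookup so that tooling routes each task to a tier explicitly rather than by convention. This is an illustrative sketch only: the `Tier` enum, the task keys, and `recommend_tier` are hypothetical names, not part of the aidx framework.

```python
from enum import Enum


class Tier(Enum):
    GENERAL_PURPOSE = "General Purpose"
    AGENTIC = "Agentic"
    THINKING = "Thinking"


# Hypothetical task keys, one per row of the decision matrix.
DECISION_MATRIX = {
    "brainstorming": Tier.GENERAL_PURPOSE,   # unstructured exploration
    "plan_serialization": Tier.AGENTIC,      # writing plan.md
    "security_audit": Tier.THINKING,         # security/logic audit
}


def recommend_tier(task_type: str) -> Tier:
    """Return the recommended tier, failing loudly on unknown task types."""
    try:
        return DECISION_MATRIX[task_type]
    except KeyError:
        raise ValueError(f"No tier defined for task type: {task_type!r}") from None
```

Raising on an unknown task type, instead of falling back to a default tier, keeps the routing decision explicit: a new kind of task has to be classified before any model is called.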