Development Plan: docs-validation-engine - AI Engineering Book from SovIAR

Owner: Vadim Rudakov, rudakow.wadim@gmail.com
Version: 0.1.0
Birth: 2026-01-27
Last Modified: 2026-01-27

Overview

This document outlines the development plan for extracting the validation scripts from tools/scripts/ into a standalone, reusable Python package. See ADR 26012 for the architectural decision and rationale.

Motivation: Why Extract?

The Problem

As the number of documentation repositories grows, maintaining identical validation logic across them becomes unsustainable:

Copy-Paste Drift: Scripts copied between repos diverge over time as fixes are applied inconsistently.
Duplicated Testing: Each repo maintains its own test suite for identical logic.
Inconsistent Behavior: Different repos may have subtly different validation rules, causing confusion.
Bootstrap Overhead: New repos require significant setup effort to replicate the validation infrastructure.

The Solution

A standalone, pip-installable Python package that:

Provides a single source of truth for validation logic
Allows semantic versioning for controlled updates
Integrates with pre-commit as a remote hook repository
Supports per-repo configuration without code changes

Target Consumers

ai_engineering_book (this repository) - Primary development testbed
Future documentation repositories - Planned projects using MyST/Jupytext workflow
Community - Potential open-source release for MyST/Jupyter ecosystem

Architecture

Package Structure

docs-validation-engine/
├── pyproject.toml              # Package metadata, dependencies, entry points
├── README.md                   # Usage documentation
├── LICENSE                     # MIT or similar
├── src/
│   └── docs_validator/
│       ├── __init__.py         # Version, public API
│       ├── cli.py              # Click-based CLI dispatcher
│       ├── config.py           # Configuration loader (pyproject.toml)
│       ├── core/
│       │   ├── __init__.py
│       │   ├── file_finder.py  # Shared file discovery logic
│       │   ├── link_extractor.py
│       │   └── reporter.py     # Shared reporting/output logic
│       ├── validators/
│       │   ├── __init__.py
│       │   ├── broken_links.py
│       │   ├── link_format.py
│       │   ├── api_keys.py
│       │   ├── json_files.py
│       │   └── script_suite.py
│       └── sync/
│           ├── __init__.py
│           ├── jupytext_sync.py
│           └── jupytext_verify.py
└── tests/
    ├── conftest.py             # Shared fixtures
    ├── test_config.py
    ├── test_broken_links.py
    ├── test_link_format.py
    └── ...

Configuration Schema

Consuming repositories configure the engine via pyproject.toml:

[tool.docs-validator]
# Directory exclusions (applied to all validators)
exclude_dirs = [
    "drafts",
    ".venv",
    "node_modules",
    ".git",
]

# File exclusions
exclude_files = [
    ".aider.chat.history.md",
    "CHANGELOG.md",
]

# Link strings to ignore (for broken links checker)
exclude_link_strings = [
    "example.com",
    "placeholder",
    "your-domain.com",
]

# Script suite paths (for script_suite validator)
[tool.docs-validator.script_suite]
scripts_dir = "tools/scripts"
tests_dir = "tools/tests"
docs_dir = "tools/docs/scripts_instructions"
excluded_scripts = ["paths.py", "__init__.py"]

CLI Interface

# Individual validators
docs-validator check-broken-links [--paths PATH...] [--verbose]
docs-validator check-link-format [--paths PATH...] [--fix | --fix-all]
docs-validator check-api-keys [--paths PATH...]
docs-validator check-json-files [--paths PATH...]
docs-validator check-script-suite [--verbose]

# Jupytext operations
docs-validator jupytext-sync [--paths PATH...]
docs-validator jupytext-verify [--paths PATH...]

# Run all validators
docs-validator check-all [--paths PATH...]
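
The command list above maps onto a subcommand dispatcher. The sketch below illustrates that shape with the standard library's `argparse` instead of Click, purely so that it runs without installing anything; the planned cli.py is Click-based. The `COMMANDS` registry and the stub validators are hypothetical, and only two of the listed commands are wired up.

```python
from __future__ import annotations

import argparse
from typing import Callable, Sequence


# Hypothetical stand-ins for real validators. Each takes a list of paths
# and returns an exit code: 0 for success, 1 if problems were found.
def check_json_files(paths: list[str]) -> int:
    return 0


def check_api_keys(paths: list[str]) -> int:
    return 0


COMMANDS: dict[str, Callable[[list[str]], int]] = {
    "check-json-files": check_json_files,
    "check-api-keys": check_api_keys,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="docs-validator")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in [*COMMANDS, "check-all"]:
        sub = subparsers.add_parser(name)
        sub.add_argument("--paths", nargs="*", default=[])
    return parser


def main(argv: Sequence[str] | None = None) -> int:
    args = build_parser().parse_args(argv)
    paths = list(args.paths)
    if args.command == "check-all":
        # Run every registered validator; fail if any one of them fails.
        return max(func(paths) for func in COMMANDS.values())
    return COMMANDS[args.command](paths)
```

Keeping validators as plain functions that return an exit code means the same callables can be reused by whichever CLI layer is chosen.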

Pre-commit Integration

# .pre-commit-config.yaml in consuming repos
repos:
  - repo: https://github.com/username/docs-validation-engine
    rev: v0.1.0
    hooks:
      - id: check-broken-links
        types: [markdown]
      - id: check-link-format
        types: [markdown]
      - id: check-api-keys
        types_or: [python, markdown, json]
      - id: check-json-files
        types: [json]
      - id: jupytext-sync
        types_or: [markdown, jupyter]
      - id: jupytext-verify
        types_or: [markdown, jupyter]
      - id: check-script-suite
        types: [python]
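
For context on what each hook has to do: pre-commit appends the matching staged file names to the hook's command line and treats a non-zero exit status as a failure. A hook body for `check-json-files` could therefore look like the sketch below. The function name and the error-message format are assumptions, not the existing script's behavior.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path
from typing import Sequence


def check_json_files(paths: Sequence[str]) -> int:
    """Return 0 if every given file parses as JSON, 1 otherwise."""
    failures = 0
    for name in paths:
        try:
            json.loads(Path(name).read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
            # Report every bad file instead of stopping at the first one.
            print(f"{name}: {exc}", file=sys.stderr)
            failures += 1
    return 1 if failures else 0
```

A console entry point would pass the file names it receives to this function and exit with the returned code, for example via `raise SystemExit(check_json_files(sys.argv[1:]))`.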

Development Phases

Phase 1: Foundation (Weeks 1-2)

Goal: Create package skeleton with configuration system.

Tasks:

Initialize new repository with uv init
Set up package structure (src/docs_validator/)
Implement config.py - load from pyproject.toml
Create CLI skeleton with Click
Add CI/CD pipeline (GitHub Actions)
Write configuration tests

Deliverable: Skeleton package (no validators yet) that reads configuration and provides CLI help.

Phase 2: Core Extraction (Weeks 3-4)

Goal: Extract and refactor existing validators.

Tasks:

Extract shared utilities (file_finder.py, reporter.py, link_extractor.py)
Port check_broken_links.py with configuration support
Port check_link_format.py with configuration support
Port remaining validators (api_keys, json_files, script_suite)
Migrate existing tests, adapt for new structure
Ensure test coverage meets the target in Success Criteria (>= 90% line coverage)

Deliverable: All validators working via CLI with configuration.
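
To make "with configuration support" concrete, here is a rough sketch of how a ported broken-links check might combine shared file discovery, link extraction, and the `exclude_dirs` / `exclude_link_strings` settings. The function names and the regular expression are illustrative assumptions; the existing check_broken_links.py may work differently, and the pattern deliberately ignores reference-style links and links inside code blocks.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable, Iterator

# Inline Markdown links such as [text](target). A simplification:
# targets followed by a quoted title are not matched.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_markdown_files(root: Path, exclude_dirs: Iterable[str]) -> Iterator[Path]:
    """Yield *.md files under root, skipping excluded directory names."""
    excluded = set(exclude_dirs)
    for path in sorted(root.rglob("*.md")):
        if excluded.isdisjoint(path.relative_to(root).parts[:-1]):
            yield path


def find_broken_links(
    root: Path,
    exclude_dirs: Iterable[str] = (),
    exclude_link_strings: Iterable[str] = (),
) -> list[tuple[Path, str]]:
    """Return (file, target) pairs whose relative target does not exist."""
    ignored = tuple(exclude_link_strings)
    broken: list[tuple[Path, str]] = []
    for md_file in find_markdown_files(root, exclude_dirs):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if any(fragment in target for fragment in ignored):
                continue
            if "://" in target or target.startswith(("#", "mailto:")):
                continue  # external URLs and same-page anchors are skipped
            file_part = target.split("#", 1)[0]
            if not (md_file.parent / file_part).exists():
                broken.append((md_file, target))
    return broken
```

Because the exclusions arrive as plain arguments, the same function serves any repository; only the values read from pyproject.toml differ.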

Phase 3: Jupytext Integration (Week 5)

Goal: Extract notebook synchronization tools.

Tasks:

Port jupytext_sync.py
Port jupytext_verify_pair.py
Add Jupytext as optional dependency
Write integration tests with real notebook fixtures

Deliverable: Full Jupytext support in package.

Phase 4: Pre-commit Hooks (Week 6)

Goal: Enable pre-commit integration.

Tasks:

Create .pre-commit-hooks.yaml in package repo
Define hook entry points for each validator
Test integration with this repository as consumer
Document hook configuration options

Deliverable: Package usable as pre-commit repo source.

Phase 5: Migration & Documentation (Weeks 7-8)

Goal: Migrate this repository to use the package.

Tasks:

Add docs-validation-engine as dev dependency
Update .pre-commit-config.yaml to use remote hooks
Remove local scripts from the active tree (archive a copy for reference)
Update all documentation to reflect new workflow
Write comprehensive README for the package
Create migration guide for future repos

Deliverable: This repo fully migrated; package ready for v1.0.0.

Design Decisions

Why Click for CLI?

Industry standard for Python CLIs
Built-in help generation
Easy subcommand composition
Decorator-based, clean syntax

Why pyproject.toml for Configuration?

Already present in Python projects
Standard location for tool configuration ([tool.X])
No additional config files needed
Supports complex nested structures

Why Zero External Dependencies (Core)?

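Follows SVA (Smallest Viable Architecture) principle
Scope: the validators and shared core are intended to import only the standard library; Click is needed only by cli.py
Faster installation
Fewer opportunities for dependency conflicts
Jupytext is optional (only needed for sync commands)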
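
One common way to keep a dependency optional is to import it lazily and fail with an actionable message when it is absent. The sketch below shows that pattern under stated assumptions: `require_optional`, the exception class, and the extra name in the install hint are hypothetical, and the distribution name in the message is still an open question in this plan.

```python
from __future__ import annotations

import importlib
import importlib.util
from types import ModuleType


class MissingOptionalDependency(RuntimeError):
    """Raised when a command needs an extra that is not installed."""


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import a top-level module on demand, or explain how to install it."""
    if importlib.util.find_spec(module_name) is None:
        raise MissingOptionalDependency(
            f"'{module_name}' is required for this command; install it with: "
            f"pip install 'docs-validation-engine[{extra}]'"  # assumed extra name
        )
    return importlib.import_module(module_name)
```

The sync commands would call `require_optional("jupytext", "jupytext")` at the start of their own code path, so the validators never trigger the import.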

Why Semantic Versioning?

Clear compatibility guarantees
Consumers can pin to major versions
Breaking changes are explicit
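
As an illustration of what pinning to a major version buys a consumer, the check below decides whether a newer tag can replace a pinned one without crossing a breaking-change boundary. The helper names are hypothetical, pre-release suffixes are not handled, and the 0.y.z rule (treat a minor bump as breaking) is a conservative assumption, since SemVer gives no stability guarantee before 1.0.0.

```python
from __future__ import annotations


def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse 'v1.4.2' or '1.4.2' into (1, 4, 2)."""
    major, minor, patch = tag.removeprefix("v").split(".")
    return int(major), int(minor), int(patch)


def is_compatible(pinned: str, candidate: str) -> bool:
    """True if candidate can replace pinned without a breaking change."""
    p, c = parse_version(pinned), parse_version(candidate)
    if p[0] != c[0]:
        return False  # a major bump signals breaking changes
    if p[0] == 0 and p[1] != c[1]:
        return False  # assumed rule for 0.y.z: a minor bump may break
    return c >= p  # never "upgrade" to an older release
```

Under these assumptions a consumer pinned at v0.1.0 would accept v0.1.4 automatically but review v0.2.0 by hand.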

Risk Mitigation

Risk	Likelihood	Impact	Mitigation
Configuration complexity	Medium	Medium	Sensible defaults; minimal required config
Breaking changes affecting consumers	Medium	High	Strict semver; deprecation warnings before removal
Test coverage gaps during extraction	Low	High	Migrate tests alongside code; CI enforces coverage
Performance regression	Low	Medium	Benchmark before/after; optimize hot paths
Scope creep	Medium	Medium	Stick to existing functionality; new features after v1.0

Success Criteria

Functional Parity: All existing validation behavior preserved
Test Coverage: >= 90% line coverage
Documentation: README, migration guide, API docs
Performance: No measurable slowdown vs current scripts
Adoption: This repository successfully migrated
Usability: New repo setup < 15 minutes with package

Open Questions

Package Name: docs-validation-engine, myst-docs-validator, jupytext-validator?
Hosting: GitHub (public) or private registry?
License: MIT? Apache 2.0?
Minimum Python Version: 3.11? 3.12? 3.13?

References

ADR 26012: Extraction of Documentation Validation Engine
ADR 26001: Use of Python and OOP for Git Hook Scripts
ADR 26002: Adoption of the Pre-commit Framework
ADR 26011: Formalization of the Mandatory Script Suite Workflow
Pre-commit: Creating new hooks
Click Documentation