Owner: Vadim Rudakov, rudakow
1. Architectural Overview: The SVA Principle
This script validates the format of links in Markdown files. Whenever a Jupytext pair exists, it ensures that links target the .ipynb file.
When a .md file has a paired .ipynb file, links should point to the .ipynb version, because the site configured in `myst.yml` renders only the .ipynb files. A link to the .md file triggers a download instead of opening the rendered web page.
The script adheres to the Smallest Viable Architecture (SVA) principle.
2. Key Capabilities & Logic
Core Validation Logic
The script checks every link in .md files and flags an error when both of the following hold:
- The link points to a .md file (e.g., `[Guide](path/to/file.md)`).
- A paired .ipynb file exists (e.g., `path/to/file.ipynb`).
Example:
```
# In source.md
[Guide](path/to/file.md)     # ERROR if path/to/file.ipynb exists
[Guide](path/to/file.ipynb)  # OK - correct format
[Readme](valid.md)           # OK if valid.ipynb does NOT exist
```
Link Types Handled
A. Markdown Links
Standard syntax: `[text](link)` or `![alt](link)`.
Regex:
`r"\[[^\]]*\]\(([^)]+)\)"`
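To see what the Markdown pattern captures, the sketch below runs it through Python's `re` module on a made-up sample line. It illustrates the regex only and is not the script's `LinkExtractor`. Because the pattern neither requires nor excludes a leading `!`, image links are captured as well.

```python
import re

# The Markdown link pattern documented above; group 1 is the link target.
MD_LINK_RE = re.compile(r"\[[^\]]*\]\(([^)]+)\)")

sample = "See [Guide](path/to/file.md), ![diagram](img/arch.png) and [site](https://example.com)."

# findall returns only the captured group, i.e. the raw link targets.
targets = MD_LINK_RE.findall(sample)
print(targets)  # ['path/to/file.md', 'img/arch.png', 'https://example.com']
```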
B. MyST Include Directives
Used for file transclusion.
Syntax:
````
```{include} path/to/file.md
```
````
Regex:
`r"```\{include\}([^\n]+)"`
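You can check the include pattern the same way. In this sketch, the captured group still carries the space that follows `{include}`, so the code strips it before use. Whether the real script strips it the same way is an assumption.

```python
import re

# The MyST include pattern documented above; group 1 is everything after "{include}".
INCLUDE_RE = re.compile(r"```\{include\}([^\n]+)")

text = "Intro\n```{include} path/to/file.md\n```\n"

# The captured group keeps the leading space, so strip it before treating it as a path.
targets = [raw.strip() for raw in INCLUDE_RE.findall(text)]
print(targets)  # ['path/to/file.md']
```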
What Gets Skipped
- External URLs: `https://...`, `http://...`
- Internal fragments: `#section`
- Non-.md links: `image.png`, `data.json`, etc.
- Excluded link strings: configured in `paths.py`
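The skip rules above can be expressed as a single predicate. This is a hypothetical sketch. `should_skip` and `EXCLUDED_LINK_STRINGS` are illustrative names, and the `drafts/` value is a placeholder for whatever `paths.py` actually configures. Both the substring matching and the stripping of `#anchor` suffixes are assumptions about how the script behaves.

```python
from typing import Tuple

# Hypothetical stand-in for the excluded link strings configured in tools/scripts/paths.py.
EXCLUDED_LINK_STRINGS: Tuple[str, ...] = ("drafts/",)


def should_skip(link: str, excluded: Tuple[str, ...] = EXCLUDED_LINK_STRINGS) -> bool:
    """Return True if the format check should ignore this link target."""
    target = link.strip()
    if target.startswith(("http://", "https://")):  # external URL
        return True
    if target.startswith("#"):  # fragment inside the current page
        return True
    path = target.split("#", 1)[0]  # assumed: ignore a trailing #anchor when testing the suffix
    if not path.endswith(".md"):  # image.png, data.json, notebooks, ...
        return True
    return any(s in target for s in excluded)  # assumed: plain substring match
```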
Path Resolution
- Git root awareness: uses `git rev-parse --show-toplevel` to resolve root-absolute links (e.g., `/docs/guide.md`).
- Relative paths: resolved relative to the source file.
- Root-relative paths: resolved from the Git root directory.
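Here is a minimal sketch of the resolution rules, using a hypothetical helper named `paired_ipynb` that is not taken from the script. Root-absolute links are joined to the Git root, and everything else is joined to the source file's directory. The sketch flags a link only if the sibling `.ipynb` exists. The Git root lookup uses the same `git rev-parse --show-toplevel` command the script relies on. It is kept in a separate function so the path logic can be exercised without a repository.

```python
import subprocess
from pathlib import Path
from typing import Optional


def git_root() -> Path:
    """Return the repository root reported by `git rev-parse --show-toplevel`."""
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        capture_output=True, text=True, check=True,
    )
    return Path(result.stdout.strip())


def paired_ipynb(source_file: Path, link: str, root: Path) -> Optional[Path]:
    """Return the notebook an .md link should point to, or None if no pair exists."""
    target = link.split("#", 1)[0]  # drop any #fragment before resolving
    if target.startswith("/"):
        md_path = root / target.lstrip("/")  # root-absolute: resolve from the Git root
    else:
        md_path = source_file.parent / target  # relative: resolve from the source file's directory
    notebook = md_path.with_suffix(".ipynb")
    return notebook if notebook.exists() else None
```

For example, `paired_ipynb(Path("docs/guide.md"), "/docs/intro.md", git_root())` returns the path of `docs/intro.ipynb` under the repository root when that notebook exists, and `None` otherwise.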
3. Technical Architecture
The script is organized into specialized classes:
- `LinkFormatCLI`: Main orchestrator. Handles argument parsing and execution flow.
- `LinkExtractor`: Scans file content line by line, using regexes to capture Markdown and MyST links.
- `LinkFormatValidator`: The core engine. Checks whether .md links have paired .ipynb files.
- `FileFinder`: Handles recursive file traversal with exclusion logic.
- `Reporter`: Collects errors and handles exit codes.
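The following hypothetical sketch makes the `Reporter`'s role concrete. It shows the collect-then-exit pattern that pre-commit and CI rely on, where any non-zero exit code fails the hook or job. The class shape, the use of stderr, and the exit code `1` are assumptions, not taken from the script.

```python
import sys
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reporter:
    """Hypothetical Reporter: gathers error messages and turns them into an exit code."""

    errors: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.errors.append(message)

    def exit_code(self) -> int:
        # Any non-zero status makes the pre-commit hook or CI step fail.
        return 1 if self.errors else 0

    def finish(self) -> None:
        for message in self.errors:
            print(message, file=sys.stderr)
        sys.exit(self.exit_code())
```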
4. Operational Guide
Configuration Reference
- Primary script: `tools/scripts/check_link_format.py`
- Exclusion logic: managed via `tools/scripts/paths.py` (reuses the `BROKEN_LINKS_EXCLUDE_*` constants)
- Pre-commit config: `.pre-commit-config.yaml`
- CI config: `.github/workflows/quality.yml`
Command Line Interface
```
check_link_format.py [--paths PATH] [--pattern PATTERN] [--fix | --fix-all] [options]
```
| Argument | Description | Default |
|---|---|---|
| `--paths` | One or more directories or specific file paths to scan. | `.` (current directory) |
| `--pattern` | Glob pattern for files to scan. | `*.md` |
| `--exclude-dirs` | List of directory names to ignore. | `in_progress`, `pr`, `.venv` |
| `--exclude-files` | List of specific filenames to ignore. | `.aider.chat.history.md` |
| `--verbose` | Shows detailed logs of skipped URLs and valid links. | `False` |
| `--fix` | Interactive fix mode: asks for confirmation before fixing each file. | `False` |
| `--fix-all` | Automatic fix mode: fixes all errors without prompts. | `False` |
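The options in the table map naturally onto Python's standard `argparse` module. The sketch below shows one way such a parser could be declared, under the assumption that the real script works similarly; it is not the script's actual code. It uses a mutually exclusive group so that `--fix` and `--fix-all` cannot be passed together, matching the `[--fix | --fix-all]` usage line. The `nargs` choices for the list options are also assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options; defaults copied from the table above."""
    parser = argparse.ArgumentParser(prog="check_link_format.py")
    parser.add_argument("--paths", nargs="+", default=["."], help="Directories or files to scan.")
    parser.add_argument("--pattern", default="*.md", help="Glob pattern for files to scan.")
    parser.add_argument("--exclude-dirs", nargs="*", default=["in_progress", "pr", ".venv"])
    parser.add_argument("--exclude-files", nargs="*", default=[".aider.chat.history.md"])
    parser.add_argument("--verbose", action="store_true", help="Log skipped URLs and valid links.")
    # --fix and --fix-all are alternatives, so argparse rejects passing both.
    fix_mode = parser.add_mutually_exclusive_group()
    fix_mode.add_argument("--fix", action="store_true", help="Ask before fixing each file.")
    fix_mode.add_argument("--fix-all", action="store_true", help="Fix everything without prompts.")
    return parser
```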
Manual Execution Commands
Run these from the repository root using uv for consistent environment resolution:
| Task | Command |
|---|---|
| Full Repo Audit | uv run tools/scripts/check_link_format.py |
| Scan Specific Directory | uv run tools/scripts/check_link_format.py --paths ai_system/ |
| Verbose Mode | uv run tools/scripts/check_link_format.py --verbose |
| Interactive Fix | uv run tools/scripts/check_link_format.py --fix |
| Automatic Fix All | uv run tools/scripts/check_link_format.py --fix-all |
Examples
```
cd ../../../
```
Check all `*.md` files in the current directory and subdirectories:
```
env -u VIRTUAL_ENV uv run tools/scripts/check_link_format.py 2>&1 | tail -15
```
```
Using Git root as project root: ai_engineering_book
Found 81 files in: ai_engineering_book/
✅ All link formats are correct! (Note: format only - use check_broken_links.py to verify targets exist)
```
Check a specific directory with verbose output:
```
env -u VIRTUAL_ENV uv run tools/scripts/check_link_format.py --paths tools/docs --verbose 2>&1 | head -20
```
Check a specific file:
```
env -u VIRTUAL_ENV uv run tools/scripts/check_link_format.py --paths README.md
```
```
Using Git root as project root: ai_engineering_book
Found 1 file in: README.md
✅ All link formats are correct! (Note: format only - use check_broken_links.py to verify targets exist)
```
5. Validation Layers
Layer 1: Local Pre-commit Hook
The first line of defense runs automatically during `git commit`.
- Scope: all .md files are validated to keep link format consistent across the repository.
- Efficiency: fast execution adds no significant delay to the developer’s workflow.
- Logic tests: includes a meta-check (`test-check-link-format`) that triggers whenever the script itself or its tests change.
Layer 2: GitHub Action (Continuous Integration)
The `link-format` job in `quality.yml` validates all .md files whenever any documentation changes.
- Full repository scan: when any .md file changes, the workflow scans all .md files, not only the changed ones.
- Trigger optimization: uses `tj-actions/changed-files` to detect whether docs or the checker's own files changed, so the relevant steps run only when needed.
- Environment parity: uses `uv`, as in local runs, for fast dependency management.
- Failure isolation: runs logic tests and format validation as separate steps.
quality.yml Implementation
```yaml
link-format:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install uv
      uses: astral-sh/setup-uv@v3
      with:
        enable-cache: true
    - name: Get changed files
      id: changed
      uses: tj-actions/changed-files@v45
      with:
        files_yaml: |
          logic:
            - tools/scripts/check_link_format.py
            - tools/tests/test_check_link_format.py
            - tools/scripts/paths.py
          docs:
            - "**/*.md"
        safe_output: true
    - name: Run Logic Tests
      if: steps.changed.outputs.logic_any_changed == 'true'
      run: uv run pytest tools/tests/test_check_link_format.py
    - name: Run Link Format Check on All Files
      if: steps.changed.outputs.docs_any_changed == 'true'
      run: uv run tools/scripts/check_link_format.py --verbose
```
Layer 3: Manual Checks
Manual runs serve deep repository audits and post-refactoring cleanup.
- Full scan: can be executed manually to scan the entire repository.
- Custom patterns: supports custom file patterns and exclusion lists.
6. Error Output Format
When errors are found, the script outputs:
```
LINK FORMAT ERROR: File 'docs/guide.md:15' links to 'intro.md' but paired .ipynb exists.
Suggested fix: Change to 'intro.ipynb'
```
The error message includes:
- Source file and line number: where the problematic link is located.
- Current link: the .md link that should be changed.
- Suggested fix: the correct .ipynb link to use.
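The message layout is simple enough to reproduce in a couple of functions. The sketch below rebuilds the example message exactly. The helper names are hypothetical, and keeping a `#fragment` when swapping `.md` for `.ipynb` is an assumption about how the suggestion is formed.

```python
def suggest_ipynb(link: str) -> str:
    """Swap a trailing .md for .ipynb, keeping any #fragment (assumed behavior)."""
    path, sep, fragment = link.partition("#")
    if path.endswith(".md"):
        path = path[: -len(".md")] + ".ipynb"
    return path + sep + fragment


def format_error(source: str, line_no: int, link: str) -> str:
    """Build the two-line message in the format shown above."""
    return (
        f"LINK FORMAT ERROR: File '{source}:{line_no}' links to '{link}' but paired .ipynb exists.\n"
        f"Suggested fix: Change to '{suggest_ipynb(link)}'"
    )
```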
7. Auto-Fix Functionality
The script can automatically fix detected link format issues in either of two modes: interactive (`--fix`) or automatic (`--fix-all`).
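Both modes ultimately rewrite link targets in place. The following `fix_line` sketch is hypothetical and is not the script's `LinkFixer`. It reuses the Markdown link pattern with extra capture groups. It swaps `.md` for `.ipynb` only when a caller-supplied `has_pair` check succeeds, and it counts the fixes so that a summary can be printed afterwards. It does not handle MyST include lines.

```python
import re
from typing import Callable, Tuple

# Same Markdown link pattern as above, split into three groups so the target can be swapped in place.
LINK_RE = re.compile(r"(\[[^\]]*\]\()([^)]+)(\))")


def fix_line(line: str, has_pair: Callable[[str], bool]) -> Tuple[str, int]:
    """Rewrite paired .md targets on one line to .ipynb; return (new_line, fixes)."""
    fixes = 0

    def replace(match: "re.Match[str]") -> str:
        nonlocal fixes
        prefix, target, suffix = match.groups()
        path, sep, fragment = target.partition("#")
        if path.endswith(".md") and has_pair(path):
            fixes += 1
            return prefix + path[: -len(".md")] + ".ipynb" + sep + fragment + suffix
        return match.group(0)  # leave every other link untouched

    return LINK_RE.sub(replace, line), fixes
```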
Interactive Mode (--fix)
Prompts for confirmation before fixing each file:
```
File: ai_system/2_model/selection/choosing_model_size.md
Line 33: /ai_system/4_orchestration/patterns/llm_usage_patterns.md → .ipynb
Fix this file? [y/n/q] (q=quit):
```
- y: fix all issues in this file
- n: skip this file
- q: quit and stop processing remaining files
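The y/n/q prompt boils down to a small loop. In this sketch, the `ask` callable defaults to the built-in `input`, but you can replace it with scripted answers in tests. Re-prompting on unrecognized input and counting the files left after `q` as skipped are assumptions, not documented behavior.

```python
from typing import Callable, Iterable, List, Tuple


def confirm_fixes(
    files: Iterable[str],
    ask: Callable[[str], str] = input,
) -> Tuple[List[str], List[str]]:
    """Apply the y/n/q prompt to each file; return (files_to_fix, files_skipped)."""
    pending = list(files)
    to_fix: List[str] = []
    skipped: List[str] = []
    for index, path in enumerate(pending):
        answer = ""
        while answer not in ("y", "n", "q"):  # assumed: re-prompt on anything else
            answer = ask(f"File: {path}\nFix this file? [y/n/q] (q=quit): ").strip().lower()
        if answer == "y":
            to_fix.append(path)
        elif answer == "n":
            skipped.append(path)
        else:  # "q": stop here; the current and remaining files stay untouched
            skipped.extend(pending[index:])
            break
    return to_fix, skipped
```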
Automatic Mode (--fix-all)
Fixes all errors without prompts, which is useful for CI pipelines or batch processing:
```
uv run tools/scripts/check_link_format.py --fix-all
```
Fix Output
After fixing, the script reports:
- Total fixes applied
- Files modified
- Any skipped files (in interactive mode)
```
✅ Fixed all 5 link format errors.
```
or, in interactive mode:
```
✓ Fixed 3/5 link format errors.
Skipped: 2
```
8. Test Suite
The script is accompanied by a comprehensive test suite (`test_check_link_format.py`) with 42 tests covering:
- Link extraction: verifies that Markdown and MyST links are correctly identified.
- Format validation: tests the core logic for detecting .md links with .ipynb pairs.
- File discovery: tests recursive search and exclusion logic.
- CLI integration: end-to-end tests for command-line behavior.
- Fix functionality: tests for the `LinkFixer` class and the fix modes (`--fix`, `--fix-all`).
- Edge cases: external URLs, fragments, excluded paths.
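A test in this style typically relies on pytest's built-in `tmp_path` fixture to create a throwaway Jupytext pair. The example below is hypothetical. `has_paired_ipynb` is a stand-in for the validator's pair check, and the test is not copied from `test_check_link_format.py`.

```python
from pathlib import Path


def has_paired_ipynb(md_file: Path) -> bool:
    """Hypothetical helper under test: does a notebook sit next to this .md file?"""
    return md_file.with_suffix(".ipynb").exists()


def test_md_link_flagged_only_when_notebook_exists(tmp_path: Path) -> None:
    # Arrange: one Jupytext pair and one standalone Markdown file.
    (tmp_path / "intro.md").write_text("# Intro\n")
    (tmp_path / "intro.ipynb").write_text("{}")
    (tmp_path / "readme.md").write_text("# Readme\n")

    # Assert: only the paired file should trigger a format error.
    assert has_paired_ipynb(tmp_path / "intro.md")
    assert not has_paired_ipynb(tmp_path / "readme.md")
```

Saved in a file named `test_*.py`, it runs under `uv run pytest` like the commands below.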
Running the Tests
```
# Run all tests
uv run pytest tools/tests/test_check_link_format.py
# Run with coverage
uv run pytest tools/tests/test_check_link_format.py --cov=tools.scripts.check_link_format --cov-report=term-missing
```
```
env -u VIRTUAL_ENV uv run pytest tools/tests/test_check_link_format.py -q
```
```
.......................................... [100%]
42 passed in 0.09s
```
```
env -u VIRTUAL_ENV uv run pytest tools/tests/test_check_link_format.py --cov=. --cov-report=term-missing -q
```
```
.......................................... [100%]
================================ tests coverage ================================
_______________ coverage: platform linux, python 3.13.11-final-0 _______________
Name                                    Stmts   Miss  Cover   Missing
---------------------------------------------------------------------
tools/scripts/check_link_format.py        302     63    79%   139-140, 147, 150, 166, 170-172, 214-247, 292, 328-329, 351-353, 358, 431, 438-440, 462-464, 467-469, 473-475, 492-493, 498, 515-517
tools/scripts/paths.py                      8      1    88%   51
tools/tests/test_check_link_format.py     316      0   100%
---------------------------------------------------------------------
TOTAL                                     626     64    90%
42 passed in 0.16s
```