
Instructions for the check_link_format.py Script


Owner: Vadim Rudakov, rudakow.wadim@gmail.com
Version: 0.1.0
Birth: 2026-01-24
Last Modified: 2026-01-26


1. Architectural Overview: The SVA Principle

This script validates the link format in Markdown files, ensuring that links target the .ipynb file whenever a Jupytext pair exists.

When a .md file has a paired .ipynb file, links should point to the .ipynb version because myst.yml renders only the .ipynb files. A link to the .md file triggers a download instead of opening the rendered web page.

It adheres to the Smallest Viable Architecture (SVA) principle.

2. Key Capabilities & Logic

Core Validation Logic

The script checks every link in .md files and flags an error when both of the following are true:

  1. The link points to a .md file (e.g., [Guide](path/to/file.md))

  2. A paired .ipynb file exists (e.g., path/to/file.ipynb)

Example:

# In source.md

[Guide](path/to/file.md)     # ERROR if path/to/file.ipynb exists
[Guide](path/to/file.ipynb)  # OK - correct format
[Readme](valid.md)           # OK if valid.ipynb does NOT exist
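The two conditions can be expressed as a small predicate. This is a minimal sketch rather than the script's own code: the name is_format_error is hypothetical, and it assumes the link target has already been resolved to a path on disk.

```python
from pathlib import Path


def is_format_error(target: Path) -> bool:
    """Hypothetical helper: True when target is a .md file with a paired .ipynb.

    Assumes target is the link destination already resolved to a filesystem path.
    """
    # Condition 1: the link points to a .md file.
    if target.suffix != ".md":
        return False
    # Condition 2: a paired .ipynb with the same stem sits next to it.
    return target.with_suffix(".ipynb").exists()
```

With path/to/file.md and path/to/file.ipynb both on disk, the predicate is True for the .md link, while valid.md without a notebook twin yields False, matching the example above.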

A. Markdown Links

Standard syntax: [text](link) or ![alt](image).

  • Regex: r"\[[^\]]*\]\(([^)]+)\)"

B. MyST Include Directives

Used for file transclusion:

  • Syntax: ```{include} path/to/file.md (the opening line of the directive block)

  • Regex: r"```\{include\}([^\n]+)"
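Applied line by line, the two patterns could look like the sketch below, in the spirit of LinkExtractor. The function name extract_links and the (line number, target) return shape are assumptions. Note that the Markdown pattern also captures image targets, because ![alt](image) contains [alt](image); such targets then fall under the non-.md skip rule.

```python
import re

# The two patterns documented above.
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)]+)\)")
MYST_INCLUDE = re.compile(r"```\{include\}([^\n]+)")


def extract_links(text: str) -> list[tuple[int, str]]:
    """Hypothetical sketch: return (line_number, target) pairs, 1-based."""
    links: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARKDOWN_LINK.finditer(line):
            links.append((lineno, match.group(1).strip()))
        for match in MYST_INCLUDE.finditer(line):
            # The captured argument starts with the space after {include}.
            links.append((lineno, match.group(1).strip()))
    return links
```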

What Gets Skipped

  • External URLs: https://..., http://...

  • Internal Fragments: #section

  • Non-.md Links: image.png, data.json, etc.

  • Excluded Link Strings: Configured in paths.py
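In code, these skip rules form a chain of early returns. The sketch is hypothetical: EXCLUDED_LINK_STRINGS stands in for the values configured in paths.py, and both the substring match against exclusions and ignoring a trailing #fragment before the .md test are assumptions about the real script's behavior.

```python
# Hypothetical stand-in for the exclusion strings configured in paths.py.
EXCLUDED_LINK_STRINGS: tuple[str, ...] = ()


def should_skip(target: str, excluded: tuple[str, ...] = EXCLUDED_LINK_STRINGS) -> bool:
    """Hypothetical sketch: True for links the format check ignores."""
    # External URLs.
    if target.startswith(("http://", "https://")):
        return True
    # Internal fragments such as "#section".
    if target.startswith("#"):
        return True
    # Non-.md links; a trailing "#fragment" is ignored for this test (assumption).
    if not target.split("#", 1)[0].endswith(".md"):
        return True
    # Configured exclusions, matched as substrings (assumption).
    return any(s in target for s in excluded)
```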

Path Resolution

  • Git Root Awareness: Uses git rev-parse --show-toplevel to locate the repository root

  • Relative Paths: Resolved relative to the directory of the source file

  • Root-Relative Paths: Links beginning with / (e.g., /docs/guide.md) are resolved from the Git root directory

3. Technical Architecture

The script is organized into specialized classes:

  • LinkFormatCLI: Main orchestrator. Handles argument parsing and execution flow.

  • LinkExtractor: Scans file content line-by-line using regex to capture Markdown and MyST links.

  • LinkFormatValidator: The core engine. Checks if .md links have paired .ipynb files.

  • FileFinder: Handles recursive file traversal with exclusion logic.

  • Reporter: Collects errors and handles exit codes.

  • LinkFixer: Applies the .md → .ipynb rewrites in the --fix and --fix-all modes.
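Reporter carries the contract that the pre-commit hook and CI rely on: a non-zero exit status when errors are found. A minimal sketch, assuming exit code 1 for failures (the exact value is not stated here); the class shape and method names are hypothetical.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class Reporter:
    """Hypothetical sketch: collect error messages, then map them to an exit code."""

    errors: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.errors.append(message)

    def exit_code(self) -> int:
        # Assumed convention: 0 when clean, 1 when any error was found.
        return 1 if self.errors else 0

    def summarize(self) -> int:
        # Print every collected error, then return the status for sys.exit().
        for message in self.errors:
            print(message, file=sys.stderr)
        return self.exit_code()
```

A CLI entry point would typically end with sys.exit(reporter.summarize()), so a failing check blocks the commit or fails the CI job.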

4. Operational Guide

Configuration Reference

  • Primary Script: tools/scripts/check_link_format.py

  • Exclusion Logic: Managed via tools/scripts/paths.py (reuses BROKEN_LINKS_EXCLUDE_* constants)

  • Pre-commit Config: .pre-commit-config.yaml

  • CI Config: .github/workflows/quality.yml

Command Line Interface

check_link_format.py [--paths PATH] [--pattern PATTERN] [--fix | --fix-all] [options]
| Argument | Description | Default |
|---|---|---|
| --paths | One or more directories or specific file paths to scan. | . (current directory) |
| --pattern | Glob pattern for files to scan. | *.md |
| --exclude-dirs | List of directory names to ignore. | in_progress, pr, .venv |
| --exclude-files | List of specific filenames to ignore. | .aider.chat.history.md |
| --verbose | Shows detailed logs of skipped URLs and valid links. | False |
| --fix | Interactive fix mode: asks for confirmation before fixing each file. | False |
| --fix-all | Automatic fix mode: fixes all errors without prompts. | False |
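The documented options map naturally onto argparse. This sketch mirrors the table's names and defaults; whether the real script uses argparse, and how it accepts list values (nargs="+" here), are assumptions. The mutually exclusive group reflects the [--fix | --fix-all] usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="check_link_format.py")
    parser.add_argument("--paths", nargs="+", default=["."],
                        help="Directories or files to scan.")
    parser.add_argument("--pattern", default="*.md", help="Glob pattern for files.")
    parser.add_argument("--exclude-dirs", nargs="+",
                        default=["in_progress", "pr", ".venv"])
    parser.add_argument("--exclude-files", nargs="+",
                        default=[".aider.chat.history.md"])
    parser.add_argument("--verbose", action="store_true")
    # --fix and --fix-all cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--fix", action="store_true", help="Confirm each file.")
    mode.add_argument("--fix-all", action="store_true", help="Fix without prompts.")
    return parser
```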

Manual Execution Commands

Run these from the repository root using uv for consistent environment resolution:

| Task | Command |
|---|---|
| Full Repo Audit | uv run tools/scripts/check_link_format.py |
| Scan Specific Directory | uv run tools/scripts/check_link_format.py --paths ai_system/ |
| Verbose Mode | uv run tools/scripts/check_link_format.py --verbose |
| Interactive Fix | uv run tools/scripts/check_link_format.py --fix |
| Automatic Fix All | uv run tools/scripts/check_link_format.py --fix-all |

Examples

cd ../../../
  1. Check all *.md files in the current directory and subdirectories:

env -u VIRTUAL_ENV uv run tools/scripts/check_link_format.py 2>&1 | tail -15
Using Git root as project root: ai_engineering_book
Found 81 files in: ai_engineering_book/

✅ All link formats are correct! (Note: format only - use check_broken_links.py to verify targets exist)
  2. Check a specific directory with verbose output:

env -u VIRTUAL_ENV uv run tools/scripts/check_link_format.py --paths tools/docs --verbose 2>&1 | head -20
  3. Check a specific file:

env -u VIRTUAL_ENV uv run tools/scripts/check_link_format.py --paths README.md
Using Git root as project root: ai_engineering_book
Found 1 file in: README.md

✅ All link formats are correct! (Note: format only - use check_broken_links.py to verify targets exist)

5. Validation Layers

Layer 1: Local Pre-commit Hook

The first line of defense runs automatically during the git commit process.

  • Scope: All .md files are validated to ensure consistent link format across the repository.

  • Efficiency: Fast execution ensures no significant delay in the developer’s workflow.

  • Logic Tests: Includes a meta-check (test-check-link-format) that triggers whenever the script itself or its tests change.

Layer 2: GitHub Action (Continuous Integration)

The CI pipeline in quality.yml re-runs the check whenever documentation changes.

  • Full Repository Scan: When any .md file changes, the workflow scans ALL .md files.

  • Trigger Optimization: Uses tj-actions/changed-files to detect when docs change.

  • Environment Parity: Uses uv, as local runs do, for fast dependency management.

  • Failure Isolation: Separates logic tests from format validation.

Layer 3: Manual Checks

Used for deep repository audits or post-refactoring cleanup.

  • Full Scan: Can be executed manually to scan the entire repository.

  • Custom Patterns: Supports custom file patterns and exclusion lists.

6. Error Output Format

When errors are found, the script outputs:

LINK FORMAT ERROR: File 'docs/guide.md:15' links to 'intro.md' but paired .ipynb exists.
  Suggested fix: Change to 'intro.ipynb'

The error message includes:

  • Source file and line number: Where the problematic link is located

  • Current link: The .md link that should be changed

  • Suggested fix: The correct .ipynb link to use
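The message layout above can be reproduced with a small formatter. A hypothetical sketch: the function name is invented, and it assumes the link ends in .md so that only the extension needs swapping.

```python
def format_error(source: str, lineno: int, link: str) -> str:
    """Hypothetical helper: build the two-line message shown above."""
    # Assumes link ends in ".md"; only the extension is swapped.
    suggestion = link.removesuffix(".md") + ".ipynb"
    return (
        f"LINK FORMAT ERROR: File '{source}:{lineno}' links to '{link}' "
        "but paired .ipynb exists.\n"
        f"  Suggested fix: Change to '{suggestion}'"
    )
```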

7. Auto-Fix Functionality

The script can automatically fix detected link format issues using two modes:

Interactive Mode (--fix)

Prompts for confirmation before fixing each file:

File: ai_system/2_model/selection/choosing_model_size.md
  Line 33: /ai_system/4_orchestration/patterns/llm_usage_patterns.md → .ipynb

Fix this file? [y/n/q] (q=quit):

Valid responses:

  • y: Fix all issues in this file

  • n: Skip this file

  • q: Quit and stop processing remaining files
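The y/n/q loop can be sketched with the prompt function injected, which keeps it testable without a terminal. The name confirm_files is hypothetical, and re-asking on an unrecognized answer is an assumption; only the three valid responses are documented.

```python
from collections.abc import Callable


def confirm_files(files: list[str], ask: Callable[[str], str] = input) -> list[str]:
    """Hypothetical sketch: return the files the user approves for fixing."""
    approved: list[str] = []
    for path in files:
        while True:
            answer = ask(f"File: {path}\nFix this file? [y/n/q] (q=quit): ")
            answer = answer.strip().lower()
            if answer in ("y", "n", "q"):
                break
            # Any other input: ask again (assumed behavior).
        if answer == "q":
            break  # stop processing the remaining files
        if answer == "y":
            approved.append(path)
    return approved
```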

Automatic Mode (--fix-all)

Fixes all errors without prompting, which is useful for CI pipelines or batch processing:

uv run tools/scripts/check_link_format.py --fix-all
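In both modes, the rewrite itself can be a targeted string substitution. A hypothetical sketch, assuming the flagged link strings for a line are already known from validation; it only handles Markdown-style (target) links, not MyST include lines, and matching on the surrounding parentheses leaves prose mentions of the same filename untouched.

```python
def fix_line(line: str, flagged: set[str]) -> str:
    """Hypothetical sketch: rewrite flagged .md link targets to .ipynb."""
    for link in flagged:
        fixed = link.removesuffix(".md") + ".ipynb"
        # Replace only the parenthesized link target.
        line = line.replace(f"({link})", f"({fixed})")
    return line
```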

Fix Output

After fixing, the script reports:

  • Total fixes applied

  • Files modified

  • Any skipped files (in interactive mode)

✅ Fixed all 5 link format errors.

or in interactive mode:

✓ Fixed 3/5 link format errors.
  Skipped: 2

8. Test Suite

The script is accompanied by a comprehensive test suite (test_check_link_format.py) with 42 tests covering:

  • Link Extraction: Verifies Markdown and MyST links are correctly identified

  • Format Validation: Tests the core logic for detecting .md links with .ipynb pairs

  • File Discovery: Tests recursive search and exclusion logic

  • CLI Integration: End-to-end tests for command-line behavior

  • Fix Functionality: Tests for LinkFixer class and fix modes (--fix, --fix-all)

  • Edge Cases: External URLs, fragments, excluded paths

Running the Tests

# Run all tests
uv run pytest tools/tests/test_check_link_format.py

# Run with coverage
uv run pytest tools/tests/test_check_link_format.py --cov=tools.scripts.check_link_format --cov-report=term-missing
env -u VIRTUAL_ENV uv run pytest tools/tests/test_check_link_format.py -q
..........................................                               [100%]
42 passed in 0.09s
env -u VIRTUAL_ENV uv run pytest tools/tests/test_check_link_format.py --cov=. --cov-report=term-missing -q
..........................................                               [100%]
================================ tests coverage ================================
_______________ coverage: platform linux, python 3.13.11-final-0 _______________

Name                                    Stmts   Miss  Cover   Missing
---------------------------------------------------------------------
tools/scripts/check_link_format.py        302     63    79%   139-140, 147, 150, 166, 170-172, 214-247, 292, 328-329, 351-353, 358, 431, 438-440, 462-464, 467-469, 473-475, 492-493, 498, 515-517
tools/scripts/paths.py                      8      1    88%   51
tools/tests/test_check_link_format.py     316      0   100%
---------------------------------------------------------------------
TOTAL                                     626     64    90%
42 passed in 0.16s