
Instructions for the prepare_prompt.py Script


Owner: Vadim Rudakov, rudakow.wadim@gmail.com
Version: 0.3.1
Birth: 2026-01-25
Last Modified: 2026-01-30


1. Architectural Overview: The SVA Principle

This script prepares prompt files (“Layer 3: Prompts-as-Infrastructure”) for LLM consumption. It removes metadata, strips special characters, and converts the result to a YAML-like output format.

The output is clean, readable text suitable for pasting into an LLM interface or for automated prompt injection. Several input formats are supported, and the format is detected automatically from the file extension.

It adheres to the Smallest Viable Architecture (SVA) principle.

2. Key Capabilities & Logic

A. Supported Input Formats

The script auto-detects format from file extension:

| Extension | Format | Parser |
|---|---|---|
| `.json` | JSON | `json.loads()` |
| `.yaml`, `.yml` | YAML | `yaml.safe_load()` |
| `.toml` | TOML | `tomllib.loads()` (stdlib) |
| `.md` | Markdown | YAML frontmatter + body extraction |
| `.txt` | Plain text | Pass-through wrapper |
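Extension-based detection amounts to a lookup from file suffix to format name, with an explicit `--input-format` value taking precedence and stdin defaulting to JSON. The sketch below is illustrative only, not the script's `FormatDetector`: the function name `detect_format`, the case-insensitive suffix handling, and raising `ValueError` for unknown extensions are assumptions.

```python
from pathlib import Path
from typing import Optional

# Suffix-to-format table mirroring the one above.
EXTENSION_FORMATS = {
    ".json": "json",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".toml": "toml",
    ".md": "markdown",
    ".txt": "text",
}
VALID_FORMATS = set(EXTENSION_FORMATS.values())


def detect_format(path: Optional[str], explicit: Optional[str] = None) -> str:
    """Return the input format name (hypothetical helper, not the real API)."""
    if explicit is not None:
        # An explicit --input-format always wins over the extension.
        if explicit not in VALID_FORMATS:
            raise ValueError(f"unknown input format: {explicit}")
        return explicit
    if path is None:
        # Reading from stdin without --input-format: default to JSON.
        return "json"
    suffix = Path(path).suffix.lower()  # assumption: case-insensitive match
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"cannot detect format from extension {suffix!r}")
    return EXTENSION_FORMATS[suffix]


print(detect_format("config.yml"))         # yaml
print(detect_format("data.txt", "yaml"))   # yaml (override)
print(detect_format(None))                 # json (stdin default)
```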

B. Processing Operations

| Operation | Description |
|---|---|
| Metadata removal | Removes the `metadata` field/table from structured data |
| Character stripping | Removes `*`, `'`, `"`, `` ` ``, `#` from output (preserves `*` in math expressions) |
| YAML conversion | Converts the data structure to a YAML-like indented format |
| Plain-text extraction | Optionally extracts only text values |
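Metadata removal drops one key from the parsed structure before rendering. Because a JSON object, a YAML mapping, and a TOML document all parse to a Python `dict`, a single helper can serve every structured format. A minimal sketch, assuming the field is a top-level key literally named `metadata` and that nested data is left untouched; the helper name `remove_metadata` is hypothetical.

```python
import json
from typing import Any


def remove_metadata(data: Any, key: str = "metadata") -> Any:
    """Return a shallow copy of *data* without its top-level metadata key.

    Hypothetical helper: assumes metadata lives at the top level only.
    Non-dict inputs (lists, plain strings) are returned unchanged.
    """
    if isinstance(data, dict):
        return {k: v for k, v in data.items() if k != key}
    return data


prompt = json.loads('{"metadata": {"version": "1.0"}, "role": "DevOps consultant"}')
print(remove_metadata(prompt))  # {'role': 'DevOps consultant'}
```

Returning a copy rather than deleting in place leaves the parsed input intact, which makes the step easy to test in isolation.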

C. Math Expression Preservation

The script uses context-aware character stripping to preserve multiplication operators in mathematical expressions:

| Pattern | Example | Preserved |
|---|---|---|
| `var * num` | `E * 0.35` | Yes |
| `num * var` | `0.35 * E` | Yes |
| `num * num` | `0.35 * 0.25` | Yes |

This ensures formulas like WRC = (E * 0.35) + (A * 0.25) remain meaningful rather than becoming WRC = (E 0.35) + (A 0.25).

Formatting characters (**bold**, `code`) are still stripped as they are visual noise for LLM consumption.
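One way to implement context-aware stripping is to shield multiplication operators first, delete the formatting characters, and then restore the operators. The sketch assumes an operator is a single `*` with one space on each side, between two operands (identifier or number) of which at least one is numeric, matching the three patterns in the table; `var * var` such as `A * B` is therefore stripped. The sentinel character and the name `strip_special_chars` are hypothetical, not taken from the script.

```python
import re

STRIP_CHARS = "*'\"`#"
SENTINEL = "\x00"  # hypothetical placeholder, assumed never to occur in prompts

_OPERAND = r"[A-Za-z_]\w*|\d+(?:\.\d+)?"
_NUMBER = re.compile(r"\d+(?:\.\d+)?")
# "<operand> * " followed by another operand; the right operand sits in a
# lookahead so chained products such as "2 * 3 * 4" are all found.
_PRODUCT = re.compile(rf"\b({_OPERAND})\s\*\s(?=({_OPERAND})\b)")
_DELETE = str.maketrans("", "", STRIP_CHARS)


def _shield(match: re.Match) -> str:
    left, right = match.group(1), match.group(2)
    if _NUMBER.fullmatch(left) or _NUMBER.fullmatch(right):
        return f"{left} {SENTINEL} "  # protect this asterisk
    return match.group(0)  # var * var: leave the asterisk to be stripped


def strip_special_chars(text: str) -> str:
    """Delete formatting characters while keeping math-style '*' operators."""
    shielded = _PRODUCT.sub(_shield, text)
    return shielded.translate(_DELETE).replace(SENTINEL, "*")


print(strip_special_chars("WRC = (E * 0.35) + (A * 0.25)"))
# WRC = (E * 0.35) + (A * 0.25)
print(strip_special_chars("**Role:** run `kubectl get pods`"))
# Role: run kubectl get pods
```

Shielding with a sentinel keeps the deletion step a single `str.translate` call; the trade-off is that the regex alone decides what counts as math.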

D. Output Formats

| Format | Description | Use case |
|---|---|---|
| `yaml` (default) | YAML-like `key: value` structure | Copying to LLM interfaces |
| `plain` | Text values only, newline-separated | Automated processing |
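Both output formats can be illustrated as recursive walks over the parsed data. In the script these live as `to_yaml_like` and `to_plain_text` on `InputHandler` (see section 3); the module-level functions below only borrow those names. Two-space indentation, `- ` prefixes for list items, and emitting only string leaves in `plain` mode are assumptions of this sketch.

```python
from typing import Any, Iterator


def _yaml_lines(node: Any, depth: int) -> Iterator[str]:
    pad = "  " * depth  # assumption: two-space indentation
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                yield f"{pad}{key}:"
                yield from _yaml_lines(value, depth + 1)
            else:
                yield f"{pad}{key}: {value}"
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)):
                yield f"{pad}-"
                yield from _yaml_lines(item, depth + 1)
            else:
                yield f"{pad}- {item}"
    else:
        yield f"{pad}{node}"


def to_yaml_like(data: Any) -> str:
    """Render nested dicts/lists as indented key: value lines."""
    return "\n".join(_yaml_lines(data, 0))


def _string_leaves(node: Any) -> Iterator[str]:
    if isinstance(node, dict):
        for value in node.values():
            yield from _string_leaves(value)
    elif isinstance(node, list):
        for item in node:
            yield from _string_leaves(item)
    elif isinstance(node, str):
        yield node


def to_plain_text(data: Any) -> str:
    """Return only the string values, one per line, in document order."""
    return "\n".join(_string_leaves(data))


prompt = {
    "role": "DevOps consultant",
    "rules": ["Be concise", "Cite sources"],
    "limits": {"max_tokens": 800},
}
print(to_yaml_like(prompt))
print(to_plain_text(prompt))
```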

3. Technical Architecture

The script uses a handler-based architecture with format auto-detection:

| Class | Responsibility |
|---|---|
| `FormatDetector` | Detect input format from file extension or explicit flag |
| `InputHandler` (ABC) | Base class with shared output methods (`to_yaml_like`, `to_plain_text`) |
| `JsonHandler` | Parse JSON input |
| `YamlHandler` | Parse YAML input |
| `TomlHandler` | Parse TOML input |
| `MarkdownHandler` | Extract YAML frontmatter and body from Markdown |
| `PlainTextHandler` | Pass-through for plain text |
| `HandlerFactory` | Create the appropriate handler based on format |
| `Reporter` | Output formatting and exit code handling |
| `PreparePromptCLI` | Argument parsing and main orchestration |

4. Operational Guide

Configuration Reference

  • Primary Script: tools/scripts/prepare_prompt.py

  • Pre-commit Config: .pre-commit-config.yaml

  • CI Config: .github/workflows/quality.yml

Command Line Interface

```text
prepare_prompt.py <file> [--input-format FORMAT] [--output-format yaml|plain] [--stdin] [--verbose]
```

| Argument | Description | Default |
|---|---|---|
| `file` | Path to prompt file (JSON, YAML, TOML, Markdown, or text) | (required unless `--stdin`) |
| `--input-format` | Input format: `json`, `yaml`, `toml`, `markdown`, `text` | Auto-detected from extension |
| `--output-format` | Output format: `yaml` or `plain` | `yaml` |
| `--stdin` | Read input from stdin (defaults to JSON) | `False` |
| `--verbose` | Show processing details | `False` |

Exit Codes:

  • 0 = Success

  • 1 = Error (file not found, invalid format, etc.)
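The documented interface maps directly onto `argparse`. The sketch reproduces the flags, defaults, and the two exit codes listed above; how the real `PreparePromptCLI` wires them together is not shown in this document, so `build_parser`, `main`, and the error message wording are assumptions.

```python
import argparse
import sys
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="prepare_prompt.py")
    parser.add_argument("file", nargs="?", help="path to prompt file")
    parser.add_argument(
        "--input-format",
        choices=["json", "yaml", "toml", "markdown", "text"],
        help="override extension-based detection",
    )
    parser.add_argument("--output-format", choices=["yaml", "plain"], default="yaml")
    parser.add_argument("--stdin", action="store_true", help="read input from stdin")
    parser.add_argument("--verbose", action="store_true", help="show processing details")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.file is None and not args.stdin:
        print("error: a file path is required unless --stdin is given", file=sys.stderr)
        return 1  # exit code 1: error
    # Hypothetical: read, parse, strip, and print the prompt here.
    return 0  # exit code 0: success
```

A script entry point would typically end with `sys.exit(main())` so that the returned value becomes the process exit code.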

Manual Execution Commands

Run these from the repository root using uv for consistent environment resolution:

| Task | Command |
|---|---|
| JSON file | `uv run tools/scripts/prepare_prompt.py prompt.json` |
| YAML file | `uv run tools/scripts/prepare_prompt.py config.yaml` |
| TOML file | `uv run tools/scripts/prepare_prompt.py config.toml` |
| Markdown file | `uv run tools/scripts/prepare_prompt.py doc.md` |
| Plain text output | `uv run tools/scripts/prepare_prompt.py prompt.json --output-format plain` |
| Override detection | `uv run tools/scripts/prepare_prompt.py data.txt --input-format yaml` |
| Stdin (JSON) | `cat prompt.json \| uv run tools/scripts/prepare_prompt.py --stdin` |
| Stdin (YAML) | `cat config.yaml \| uv run tools/scripts/prepare_prompt.py --stdin --input-format yaml` |
| Verbose mode | `uv run tools/scripts/prepare_prompt.py prompt.json --verbose` |

Examples

```shell
# Move from this page's directory to the repository root
cd ../../../
```

1. Process a prompt file (YAML-like output):

```shell
env -u VIRTUAL_ENV uv run tools/scripts/prepare_prompt.py ai_system/3_prompts/consultants/devops_consultant.json 2>/dev/null | head -20
```

2. Extract plain text values only:

```shell
env -u VIRTUAL_ENV uv run tools/scripts/prepare_prompt.py ai_system/3_prompts/consultants/devops_consultant.json --output-format plain 2>/dev/null | head -20
```

3. Process with verbose output:

```shell
env -u VIRTUAL_ENV uv run tools/scripts/prepare_prompt.py ai_system/3_prompts/consultants/devops_consultant.json --verbose 2>&1 | head -5
```

5. Validation Layers

Layer 1: Logic Tests Pre-commit Hook

A local pre-commit hook runs the script's test suite so that broken logic is caught before it is committed:

```yaml
- id: test-prepare-prompt
  name: Test Prepare Prompt script
  entry: uv run --active pytest tools/tests/test_prepare_prompt.py
  language: python
  files: ^tools/(scripts/prepare_prompt\.py|scripts/paths\.py|tests/test_prepare_prompt\.py)$
  pass_filenames: false
```

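The hook triggers whenever the script, its test file, or the shared path configuration (`tools/scripts/paths.py`) changes. Because `pass_filenames: false` is set, pytest always runs the whole test module instead of receiving the list of changed files as arguments.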
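pre-commit treats the `files` key as a Python regular expression searched against each staged path, so the trigger condition can be checked directly with `re.search`. The pattern is copied from the hook above; `tools/scripts/other.py` and `docs/tools/scripts/paths.py` are hypothetical paths used only to show non-matches.

```python
import re

# The `files` pattern copied from the hook definition above.
HOOK_FILES = re.compile(
    r"^tools/(scripts/prepare_prompt\.py|scripts/paths\.py|tests/test_prepare_prompt\.py)$"
)


def triggers_hook(path: str) -> bool:
    """True if a staged file at *path* would make the hook run."""
    return HOOK_FILES.search(path) is not None


for path in [
    "tools/scripts/prepare_prompt.py",
    "tools/scripts/paths.py",
    "tools/tests/test_prepare_prompt.py",
    "tools/scripts/other.py",       # hypothetical unrelated script
    "docs/tools/scripts/paths.py",  # the ^ anchor rejects this path
]:
    print(f"{path}: {triggers_hook(path)}")
```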

Layer 2: GitHub Action (Continuous Integration)

The CI pipeline in quality.yml runs the test suite when relevant files change:

```yaml
prepare-prompt:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install uv
      uses: astral-sh/setup-uv@v3
    - name: Get changed files
      id: changed
      uses: tj-actions/changed-files@v45
      with:
        files_yaml: |
          logic:
            - tools/scripts/prepare_prompt.py
            - tools/tests/test_prepare_prompt.py
            - tools/scripts/paths.py
    - name: Run Logic Tests
      if: steps.changed.outputs.logic_any_changed == 'true'
      run: uv run pytest tools/tests/test_prepare_prompt.py
```

6. Test Suite Documentation

The script ships with a pytest suite (`tools/tests/test_prepare_prompt.py`) that covers each class and the CLI end to end.

Test Classes and Coverage

| Test class | Purpose |
|---|---|
| `TestFormatDetector` | Extension detection, explicit format override |
| `TestInputHandler` | Abstract base class verification |
| `TestJsonHandler` | JSON parsing, metadata removal, format conversion |
| `TestJsonHandlerVerbose` | Verbose output verification |
| `TestYamlHandler` | YAML parsing, metadata removal |
| `TestTomlHandler` | TOML parsing, metadata removal |
| `TestMarkdownHandler` | Frontmatter extraction, metadata removal |
| `TestPlainTextHandler` | Pass-through behavior |
| `TestHandlerFactory` | Handler creation for each format |
| `TestReporter` | Exit codes, output formatting |
| `TestPreparePromptCLI` | Integration tests for CLI modes |
| `TestPreparePromptCLIComplexJson` | Tests with realistic nested structures |
| `TestPreparePromptCLIInputFormats` | Multi-format integration tests |
| `TestMathPreservation` | Math operator preservation, formatting stripping |

Key Test Scenarios

  • Format Detection: Auto-detect from extension, explicit override

  • All Input Formats: JSON, YAML, TOML, Markdown, plain text

  • Invalid Input: Syntax errors for each format

  • Metadata Removal: Verification across all formats

  • Character Stripping: Special characters removed from output

  • Math Preservation: Multiplication operators preserved in formulas (E * 0.35)

  • Stdin Mode: Reading from stdin with format specification

  • Output Formats: Both yaml and plain output

  • Error Handling: File not found, invalid format

Running the Tests

To run the full suite, execute from the repository root:

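```shell
# Full suite
uv run pytest tools/tests/test_prepare_prompt.py
# Quiet output, with VIRTUAL_ENV unset
env -u VIRTUAL_ENV uv run pytest tools/tests/test_prepare_prompt.py -q
# With a line-coverage report (the --cov options require the pytest-cov plugin)
env -u VIRTUAL_ENV uv run pytest tools/tests/test_prepare_prompt.py --cov=. --cov-report=term-missing -q
```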