Markdown Syntax Strategist
A proactive markdown formatting specialist. Includes structured workflows, validation checks, and reusable patterns for the OCR extraction team.
Intelligent markdown formatting agent that transforms raw OCR output and unstructured text into clean, specification-compliant markdown with proper heading hierarchy, list syntax, and code blocks.
When to Use This Agent
Choose this agent when you need to:
- Convert OCR-extracted text with visual formatting cues into proper CommonMark syntax
- Fix heading hierarchy violations where levels are skipped or inconsistently applied
- Normalize list markers, indentation, and nested structure across large documents
- Add language identifiers to code blocks and properly format inline code references
Consider alternatives when:
- Your text has OCR character-level errors like "rn" misread as "m" (use Specialist OCR Grammar Fixer first)
- You need to analyze document layout before text extraction (use Document Structure Analyzer Companion)
Quick Start
Configuration
name: markdown-syntax-strategist
type: agent
category: ocr-extraction-team
Example Invocation
claude agent:invoke markdown-syntax-strategist "Format the OCR output from technical-manual.txt into clean markdown"
Example Output
=== Markdown Formatting Report ===
Input: technical-manual.txt (4,200 words)
TRANSFORMATIONS APPLIED:
Headings fixed: 14 (ALL CAPS → proper # syntax)
Lists normalized: 8 blocks (mixed •/*/- → consistent -)
Code blocks added: 6 (with language identifiers: python, bash, yaml)
Inline code wrapped: 23 technical terms
Emphasis corrected: 11 instances
Line spacing fixed: 34 locations
HEADING HIERARCHY:
# Installation Guide
## Prerequisites
### System Requirements
## Configuration
### Database Setup
### Environment Variables
VALIDATION: All syntax passes CommonMark spec check
Core Concepts
Markdown Element Priorities Overview
| Aspect | Details |
|---|---|
| Heading Hierarchy | Strict H1-H6 progression with no level skipping allowed |
| List Consistency | Uniform markers (- for unordered, 1. for ordered) with 2-space nesting |
| Code Formatting | Triple backticks with language hints for blocks, single backticks inline |
| Emphasis Rules | Double asterisks for bold, single for italic, never underscores |
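The "no level skipping" rule in the table can be checked mechanically. The following is a minimal sketch, not the agent's actual implementation: the function name `find_skipped_levels` is hypothetical, it only looks at ATX-style (`#`) headings, and it treats a document that opens below H1 as a violation.

```python
import re

# Matches ATX headings such as "## Title"; group 1 is the run of # characters.
HEADING_RE = re.compile(r"^(#{1,6})\s+\S")


def find_skipped_levels(markdown_text):
    """Return (line_number, previous_level, level) for each heading that skips a level."""
    problems = []
    previous = 0  # 0 means "no heading seen yet"
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        # Ignore anything inside fenced code blocks, where "#" is usually a comment.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # Going deeper by more than one level at a time is a skip.
        if level > previous + 1:
            problems.append((number, previous, level))
        previous = level
    return problems


if __name__ == "__main__":
    sample = "# Installation Guide\n### System Requirements\n## Configuration\n"
    for line_number, prev, level in find_skipped_levels(sample):
        print(f"line {line_number}: H{prev} followed by H{level}")
```

Moving back up the hierarchy (for example from H3 to H2) is allowed; only jumps downward of more than one level are reported.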
Formatting Pipeline Architecture
┌─────────────┐     ┌─────────────┐
│  Raw Text   │────▶│   Heading   │
│  Ingestion  │     │ Normalizer  │
└─────────────┘     └─────────────┘
       │                   │
       ▼                   ▼
┌─────────────┐     ┌─────────────┐
│ List & Code │────▶│ Validation  │
│  Formatter  │     │  & Output   │
└─────────────┘     └─────────────┘
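One way to read the diagram is as a chain of text-to-text stages. The sketch below assumes that linear reading; the stage functions are deliberately tiny, hypothetical stand-ins for the real stages, and the list stage does not special-case code fences or thematic breaks such as `* * *`.

```python
import re
from functools import reduce


def ingest(raw_text):
    """Stage 1: unify line endings and drop trailing whitespace."""
    unified = raw_text.replace("\r\n", "\n").replace("\r", "\n")
    return "\n".join(line.rstrip() for line in unified.split("\n"))


def normalize_headings(text):
    """Stage 2: collapse extra spaces after the leading # run of a heading."""
    return re.sub(r"^(#{1,6})[ \t]+", r"\1 ", text, flags=re.MULTILINE)


def format_lists(text):
    """Stage 3: rewrite bullet characters (•, *, +) at line start to "- "."""
    return re.sub(r"^([ \t]*)[•*+][ \t]+", r"\1- ", text, flags=re.MULTILINE)


def validate(text):
    """Stage 4: fail loudly if a non-normalized bullet slipped through."""
    if re.search(r"^[ \t]*[•*+][ \t]+", text, flags=re.MULTILINE):
        raise ValueError("unnormalized list marker found")
    return text


STAGES = (ingest, normalize_headings, format_lists, validate)


def run_pipeline(raw_text, stages=STAGES):
    """Feed the output of each stage into the next one."""
    return reduce(lambda text, stage: stage(text), stages, raw_text)


if __name__ == "__main__":
    print(run_pipeline("#   Title  \r\n• one\r\n* two\n"))
```

Keeping every stage as a plain function from string to string makes it easy to reorder stages or test one in isolation.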
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| spec_compliance | string | "commonmark" | Target markdown specification: commonmark or gfm (GitHub Flavored) |
| list_marker | string | "-" | Default unordered list marker character to normalize to |
| nest_indent | integer | 2 | Number of spaces per indentation level for nested list items |
| auto_language_detect | boolean | true | Attempt to detect programming language for unlabeled code blocks |
| preserve_html | boolean | false | Whether to keep inline HTML tags or convert them to markdown equivalents |
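The parameters above could be modeled as a small validated settings object. This is a sketch under assumptions: the class name `FormatterConfig` and the validation rules are hypothetical, and only the parameter names and defaults come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatterConfig:
    """Hypothetical container mirroring the parameter table and its defaults."""

    spec_compliance: str = "commonmark"
    list_marker: str = "-"
    nest_indent: int = 2
    auto_language_detect: bool = True
    preserve_html: bool = False

    def __post_init__(self):
        # The table names exactly two target specifications.
        if self.spec_compliance not in ("commonmark", "gfm"):
            raise ValueError(f"unknown spec_compliance: {self.spec_compliance!r}")
        # Assumption: only the bullet markers CommonMark recognizes are accepted.
        if self.list_marker not in ("-", "*", "+"):
            raise ValueError(f"unsupported list_marker: {self.list_marker!r}")
        # Assumption: at least one space per nesting level.
        if self.nest_indent < 1:
            raise ValueError("nest_indent must be at least 1")


if __name__ == "__main__":
    print(FormatterConfig())
    print(FormatterConfig(spec_compliance="gfm", nest_indent=4))
```

Rejecting unknown values at construction time surfaces a typo such as "gmf" immediately instead of silently falling back to a default.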
Best Practices
- Process Grammar Fixes Before Formatting: OCR text often contains character-level errors that affect formatting decisions. Running the grammar fixer first ensures the markdown strategist works with clean text, preventing misidentification of headings or list items.
- Use GFM Mode for Technical Documentation: GitHub Flavored Markdown supports tables, task lists, and strikethrough syntax that CommonMark does not. Set spec_compliance to "gfm" when processing technical documents that likely contain these elements.
- Validate Output Against a Markdown Linter: After formatting, run the output through a markdown linter like markdownlint to catch edge cases the agent may miss, such as trailing spaces, missing blank lines around headings, or inconsistent emphasis markers.
- Preserve Intentional Formatting Exceptions: Some documents use non-standard formatting deliberately, such as ALL CAPS for legal disclaimers. Add a <!-- preserve-formatting --> comment above sections that should not be transformed.
- Handle Nested Structures Incrementally: Deeply nested lists within lists within blockquotes are error-prone to format in a single pass. Process the outermost structure first, then refine inner nesting in subsequent passes for more reliable results.
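The preserve-formatting comment can be honored with a simple block-level guard. The sketch below is one possible interpretation, not the agent's documented behavior: it assumes the comment protects the blank-line-delimited block that follows it, and `transform_unprotected` is a hypothetical helper name.

```python
PRESERVE_MARKER = "<!-- preserve-formatting -->"


def transform_unprotected(text, transform):
    """Apply transform to each blank-line-separated block unless the marker guards it."""
    result = []
    protect_next = False
    for block in text.split("\n\n"):
        lines = block.split("\n")
        if lines[0].strip() == PRESERVE_MARKER:
            # Keep the marker (and any content in the same block) verbatim.
            result.append(block)
            # A marker alone in its block guards the block that follows it.
            protect_next = len(lines) == 1
            continue
        if protect_next:
            result.append(block)
            protect_next = False
            continue
        result.append(transform(block))
    return "\n\n".join(result)


if __name__ == "__main__":
    sample = (
        "NOTICE\n\n"
        "<!-- preserve-formatting -->\n"
        "ALL RIGHTS RESERVED\n\n"
        "END"
    )
    # str.lower stands in for a real formatting pass.
    print(transform_unprotected(sample, str.lower))
```

If a section spans several paragraphs, this interpretation would need one marker per paragraph; a section-scoped variant would instead keep protecting blocks until the next heading.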
Common Issues
- ALL CAPS text incorrectly converted to headings: Not all uppercase text represents headings. The agent uses heuristics like line position, surrounding whitespace, and text length to distinguish headings from emphasized text, but short ALL CAPS phrases in body text may be misidentified. Add those lines to an exclusion list.
- Code blocks missing language identifiers after formatting: Auto-detection relies on syntax patterns and keywords. If a code block uses an uncommon language or is too short for reliable detection, manually specify the language by adding a hint comment like <!-- lang: rust --> above the block.
- Ordered list numbering resets unexpectedly: When an unindented paragraph or other block element interrupts an ordered list, CommonMark ends the list and treats the subsequent items as a new one. Keep unindented blocks out from between consecutive ordered list items, or indent the interrupting content to align with the item text so that it stays part of the preceding item.
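The heading heuristics named in the first issue (line position, surrounding whitespace, text length, plus an exclusion list) can be approximated as follows. This is an illustrative sketch only: the function name `looks_like_heading`, the `max_words` threshold, and the exact rules are assumptions, not the agent's real logic.

```python
def looks_like_heading(lines, index, max_words=8, exclusions=frozenset()):
    """Guess whether the ALL CAPS line at lines[index] is a heading."""
    line = lines[index].strip()
    if not line or line in exclusions:
        return False
    letters = [ch for ch in line if ch.isalpha()]
    # Require at least one letter, and every letter must be uppercase.
    if not letters or not all(ch.isupper() for ch in letters):
        return False
    # Text length: long uppercase runs are more likely emphasized body text.
    if len(line.split()) > max_words:
        return False
    # Sentence-style punctuation at the end suggests body text.
    if line.endswith((".", ",", ";")):
        return False
    # Surrounding whitespace: a heading should stand alone between blank lines
    # (or sit at the very start or end of the document).
    before_blank = index == 0 or not lines[index - 1].strip()
    after_blank = index == len(lines) - 1 or not lines[index + 1].strip()
    return before_blank and after_blank


if __name__ == "__main__":
    doc = [
        "INSTALLATION GUIDE",
        "",
        "Run the installer.",
        "WARNING",
        "Do not unplug the device.",
        "",
        "NO WARRANTY",
        "",
    ]
    excluded = frozenset({"NO WARRANTY"})
    for i, text in enumerate(doc):
        if looks_like_heading(doc, i, exclusions=excluded):
            print(f"heading candidate on line {i + 1}: {text}")
```

The exclusion set is matched against the stripped line text, so a legal phrase that must stay in capitals can be listed once and skipped wherever it appears.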