Markdown Syntax Strategist

A proactive markdown formatting specialist for the OCR extraction team. Includes structured workflows, validation checks, and reusable patterns.

Agent | Cliptics | ocr extraction team | v1.0.0 | MIT

Intelligent markdown formatting agent that transforms raw OCR output and unstructured text into clean, specification-compliant markdown with proper heading hierarchy, list syntax, and code blocks.

When to Use This Agent

Choose this agent when you need to:

  • Convert OCR-extracted text with visual formatting cues into proper CommonMark syntax
  • Fix heading hierarchy violations where levels are skipped or inconsistently applied
  • Normalize list markers, indentation, and nested structure across large documents
  • Add language identifiers to code blocks and properly format inline code references

Consider alternatives when:

  • Your text has OCR character-level errors like "rn" misread as "m" (use Specialist OCR Grammar Fixer first)
  • You need to analyze document layout before text extraction (use Document Structure Analyzer Companion)

Quick Start

Configuration

name: markdown-syntax-strategist
type: agent
category: ocr-extraction-team

Example Invocation

claude agent:invoke markdown-syntax-strategist "Format the OCR output from technical-manual.txt into clean markdown"

Example Output

=== Markdown Formatting Report ===
Input: technical-manual.txt (4,200 words)

TRANSFORMATIONS APPLIED:
  Headings fixed:       14 (ALL CAPS → proper # syntax)
  Lists normalized:     8 blocks (mixed •/*/- → consistent -)
  Code blocks added:    6 (with language identifiers: python, bash, yaml)
  Inline code wrapped:  23 technical terms
  Emphasis corrected:   11 instances
  Line spacing fixed:   34 locations

HEADING HIERARCHY:
  # Installation Guide
  ## Prerequisites
  ### System Requirements
  ## Configuration
  ### Database Setup
  ### Environment Variables

VALIDATION: All syntax passes CommonMark spec check

Core Concepts

Markdown Element Priorities Overview

| Aspect | Details |
| --- | --- |
| Heading Hierarchy | Strict H1-H6 progression with no level skipping allowed |
| List Consistency | Uniform markers (- for unordered, 1. for ordered) with 2-space nesting |
| Code Formatting | Triple backticks with language hints for blocks, single backticks inline |
| Emphasis Rules | Double asterisks for bold, single for italic, never underscores |
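
The heading hierarchy rule above can be checked mechanically. The following is a minimal sketch, not the agent's actual validator: the function name `find_skipped_levels` is hypothetical, and it assumes ATX-style headings (`#` prefixes) and backtick code fences only.

```python
import re

# Matches ATX headings such as "## Prerequisites"; group 1 is the run of '#'.
ATX_HEADING = re.compile(r"^(#{1,6})\s+\S")
FENCE = "`" * 3  # fence marker, built this way to keep the example easy to embed


def find_skipped_levels(markdown_text):
    """Return (line_number, previous_level, level) for every heading that
    is more than one level deeper than the heading before it."""
    problems = []
    previous = 0  # 0 means "no heading seen yet", so the first must be H1
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # skip '#' comments inside code blocks
            continue
        if in_fence:
            continue
        match = ATX_HEADING.match(line)
        if match is None:
            continue
        level = len(match.group(1))
        if level > previous + 1:
            problems.append((number, previous, level))
        previous = level
    return problems


sample = "# Installation Guide\n### System Requirements\n## Configuration\n"
print(find_skipped_levels(sample))  # → [(2, 1, 3)]
```

Moving back up the hierarchy (H3 to H2) is allowed; only jumps that skip a level on the way down are reported.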

Formatting Pipeline Architecture

┌──────────────┐     ┌─────────────┐
│  Raw Text    │────▶│  Heading    │
│  Ingestion   │     │  Normalizer │
└──────────────┘     └─────────────┘
        │                   │
        ▼                   ▼
┌──────────────┐     ┌─────────────┐
│  List & Code │────▶│  Validation │
│  Formatter   │     │  & Output   │
└──────────────┘     └─────────────┘
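
One way to read the diagram is as four stages applied in sequence. The sketch below illustrates that reading only; the stage functions and their deliberately simple rules are assumptions, not the agent's implementation.

```python
import re


def ingest(raw_text):
    # Normalize Windows line endings and strip trailing whitespace.
    lines = raw_text.replace("\r\n", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines)


def normalize_headings(text):
    # Toy rule (assumption): a short ALL CAPS line becomes a level-2 heading.
    out = []
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped and stripped.isupper() and len(stripped.split()) <= 6:
            out.append("## " + stripped.title())
        else:
            out.append(line)
    return "\n".join(out)


def format_lists(text):
    # Rewrite "•" and "*" bullets to "-", keeping the existing indentation.
    return re.sub(r"^([ \t]*)[•*][ \t]+", r"\1- ", text, flags=re.MULTILINE)


def validate(text):
    # Stand-in check: fail if any non-"-" bullet survived the earlier stages.
    if re.search(r"^[ \t]*[•*][ \t]+", text, flags=re.MULTILINE):
        raise ValueError("inconsistent list markers remain")
    return text


PIPELINE = [ingest, normalize_headings, format_lists, validate]


def run_pipeline(raw_text):
    text = raw_text
    for stage in PIPELINE:
        text = stage(text)
    return text


print(run_pipeline("INSTALLATION GUIDE\r\n• first item  \r\n* second item"))
```

Keeping each stage a plain text-to-text function makes it easy to reorder stages or run one in isolation while debugging.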

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| spec_compliance | string | "commonmark" | Target markdown specification: commonmark or gfm (GitHub Flavored) |
| list_marker | string | "-" | Default unordered list marker character to normalize to |
| nest_indent | integer | 2 | Number of spaces per indentation level for nested list items |
| auto_language_detect | boolean | true | Attempt to detect programming language for unlabeled code blocks |
| preserve_html | boolean | false | Whether to keep inline HTML tags or convert them to markdown equivalents |
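
The parameters and defaults in the table can be modeled as a small typed object. This is a sketch under assumptions: the class name `FormatterConfig` and the validation rules are hypothetical, and the source does not say how the agent actually loads its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatterConfig:
    """Hypothetical container mirroring the parameter table's defaults."""

    spec_compliance: str = "commonmark"
    list_marker: str = "-"
    nest_indent: int = 2
    auto_language_detect: bool = True
    preserve_html: bool = False

    def __post_init__(self):
        # The table names exactly two target specifications.
        if self.spec_compliance not in ("commonmark", "gfm"):
            raise ValueError(f"unknown spec_compliance: {self.spec_compliance!r}")
        # CommonMark accepts "-", "+", and "*" as bullet list markers.
        if self.list_marker not in ("-", "+", "*"):
            raise ValueError(f"invalid list_marker: {self.list_marker!r}")
        if self.nest_indent < 1:
            raise ValueError("nest_indent must be at least 1")


defaults = FormatterConfig()
technical_docs = FormatterConfig(spec_compliance="gfm")
print(defaults.list_marker, technical_docs.spec_compliance)  # → - gfm
```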

Best Practices

  1. Process Grammar Fixes Before Formatting: OCR text often contains character-level errors that affect formatting decisions. Running the grammar fixer first ensures the markdown strategist works with clean text, preventing misidentification of headings or list items.

  2. Use GFM Mode for Technical Documentation: GitHub Flavored Markdown supports tables, task lists, and strikethrough syntax that CommonMark does not. Set spec_compliance to "gfm" when processing technical documents that are likely to contain these elements.

  3. Validate Output Against a Markdown Linter: After formatting, run the output through a markdown linter such as markdownlint to catch edge cases the agent may miss, such as trailing spaces, missing blank lines around headings, or inconsistent emphasis markers.

  4. Preserve Intentional Formatting Exceptions: Some documents use non-standard formatting deliberately, such as ALL CAPS for legal disclaimers. Add a <!-- preserve-formatting --> comment above sections that should not be transformed.

  5. Handle Nested Structures Incrementally: Deeply nested lists within lists within blockquotes are error-prone to format in a single pass. Process the outermost structure first, then refine inner nesting in subsequent passes for more reliable results.
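
The preserve-formatting comment from practice 4 could be honored along the following lines. This is a sketch only: it assumes a "section" means the blank-line-delimited block that starts with the comment, which the source does not specify, and `transform_unless_preserved` is a hypothetical helper name.

```python
PRESERVE_MARKER = "<!-- preserve-formatting -->"


def transform_unless_preserved(text, transform):
    """Apply `transform` to each blank-line-separated block, skipping
    blocks that begin with the preserve-formatting comment."""
    result = []
    for block in text.split("\n\n"):
        if block.lstrip().startswith(PRESERVE_MARKER):
            result.append(block)  # left untouched, marker included
        else:
            result.append(transform(block))
    return "\n\n".join(result)


document = (
    "SOME SHOUTED INTRO\n\n"
    "<!-- preserve-formatting -->\n"
    "NO WARRANTY IS PROVIDED.\n\n"
    "ANOTHER SHOUTED LINE"
)
# str.lower stands in for whatever real transformation would be applied.
print(transform_unless_preserved(document, str.lower))
```

Only the first and last blocks are lowercased; the disclaimer under the marker comes through unchanged.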

Common Issues

  1. ALL CAPS text incorrectly converted to headings: Not all uppercase text represents headings. The agent uses heuristics like line position, surrounding whitespace, and text length to distinguish headings from emphasized text, but short ALL CAPS phrases in body text may be misidentified. Add those lines to an exclusion list.

  2. Code blocks missing language identifiers after formatting: Auto-detection relies on syntax patterns and keywords. If a code block uses an uncommon language or is too short for reliable detection, manually specify the language by adding a hint comment like <!-- lang: rust --> above the block.

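  3. Ordered list numbering resets unexpectedly: When a paragraph or other block element at the outer indentation level interrupts an ordered list, CommonMark ends the list and treats the subsequent items as a new list. Indent the interrupting content to the item's content column (three spaces for a "1." marker followed by one space) so that it stays inside the list item. A text line that directly follows the item's paragraph, with no blank line in between, also stays in the item as a lazy continuation line.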
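
The heading heuristics named in issue 1 (text length, surrounding whitespace, and an exclusion list) might be combined as in the sketch below. The thresholds and the function name `looks_like_heading` are assumptions for illustration, not the agent's actual rules.

```python
def looks_like_heading(lines, index, max_words=8, exclusions=frozenset()):
    """Guess whether lines[index] is an ALL CAPS heading rather than
    emphasized body text."""
    text = lines[index].strip()
    if not text or not text.isupper():
        return False
    if text in exclusions:
        return False  # explicitly protected, e.g. a legal disclaimer
    if len(text.split()) > max_words:
        return False  # long uppercase runs are more likely body text
    if text.endswith((".", ",", ";")):
        return False  # sentence punctuation suggests prose
    # Headings usually stand alone, with blank lines (or the document edge)
    # on both sides.
    blank_before = index == 0 or not lines[index - 1].strip()
    blank_after = index == len(lines) - 1 or not lines[index + 1].strip()
    return blank_before and blank_after


lines = [
    "INSTALLATION GUIDE",
    "",
    "Run the installer.",
    "",
    "NO WARRANTY",
    "",
    "WARNING",
    "do not unplug the device",
]
print(looks_like_heading(lines, 0))                              # → True
print(looks_like_heading(lines, 4, exclusions={"NO WARRANTY"}))  # → False
print(looks_like_heading(lines, 6))                              # → False
```

The last case fails the whitespace test because body text follows immediately, which is the kind of short ALL CAPS phrase the issue describes.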
