Visual Analysis Consultant

Expert image analysis and OCR extraction agent that converts scanned documents into structured markdown while preserving visual hierarchy, formatting, and semantic meaning.

When to Use This Agent

Choose this agent when you need to:

Extract text from scanned documents, photographs of text, or screenshots with high accuracy
Convert visually structured content (headings, lists, tables, callouts) into clean markdown
Analyze complex document layouts including multi-column pages, nested lists, and embedded code blocks
Handle edge cases like rotated text, watermarks, faded ink, or low-resolution scans

Consider alternatives when:

You already have digital text and need to compare two versions (use the Text Comparison Assistant)
Your primary goal is final quality validation rather than initial extraction (use the OCR Quality Assurance Agent)

Quick Start

Configuration


name: visual-analysis-consultant
type: agent
category: ocr-extraction-team

Example Invocation


claude agent:invoke visual-analysis-consultant "Extract and structure text from scanned-contract-page4.png preserving all headings and table formatting"

Example Output

## Section 4: Payment Terms

| Term | Duration | Rate |
|------|----------|------|
| Initial | 12 months | $2,400/mo |
| Renewal | 6 months | $2,160/mo |

Payment is due on the **first business day** of each calendar month.
Late payments incur a *1.5% monthly* penalty fee.

[LOW CONFIDENCE] Footnote text partially obscured: "Subject to §12.3..."

Core Concepts

Document Analysis Pipeline Overview

| Aspect | Details |
|--------|---------|
| Input Formats | PNG, JPEG, TIFF, PDF page rasters, screenshots |
| Layout Detection | Multi-column recognition, reading-order inference, region segmentation |
| Semantic Mapping | Font size to heading level, weight to emphasis, indentation to list nesting |
| Confidence Reporting | Per-region confidence scores with LOW/MEDIUM/HIGH annotations on uncertain segments |
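To make the confidence reporting row concrete, here is a minimal sketch of how a per-region score might be bucketed and turned into an annotation like the `[LOW CONFIDENCE]` marker in the example output. The `Region` class, the function names, and the 0.95 cut-off are assumptions for illustration; only the 0.80 value comes from the documented `min_confidence` default.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: 0.80 mirrors the documented min_confidence default;
# 0.95 is an assumed boundary between MEDIUM and HIGH.
LOW_CUTOFF = 0.80
HIGH_CUTOFF = 0.95


@dataclass
class Region:
    """One segmented page region with its recognized text and score."""
    kind: str          # e.g. "heading", "body", "footnote"
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def confidence_label(score: float) -> str:
    """Bucket a numeric score into LOW, MEDIUM, or HIGH."""
    if score < LOW_CUTOFF:
        return "LOW"
    if score < HIGH_CUTOFF:
        return "MEDIUM"
    return "HIGH"


def annotate(region: Region) -> str:
    """Prefix uncertain regions with a marker; leave HIGH regions untouched."""
    label = confidence_label(region.confidence)
    if label == "HIGH":
        return region.text
    return f"[{label} CONFIDENCE] {region.text}"
```

Leaving HIGH regions unmarked keeps the rendered markdown clean, so reviewers only see annotations where attention is needed.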

Extraction Architecture

┌─────────────────┐     ┌─────────────────┐
│  Source Image   │────▶│  Layout Region  │
│  Input          │     │  Segmentation   │
└─────────────────┘     └─────────────────┘
        │                       │
        ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│  Reading Order  │────▶│  OCR Character  │
│  Inference      │     │  Recognition    │
└─────────────────┘     └─────────────────┘
        │                       │
        ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│  Semantic Style │────▶│  Markdown       │
│  Classification │     │  Renderer       │
└─────────────────┘     └─────────────────┘
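One way to read the diagram above is as a linear chain of stages, each taking the page state and returning an updated copy. The sketch below shows only that composition pattern. The stage names are derived from the box labels, the left-to-right, top-to-bottom ordering is an assumed interpretation of the arrows, and the stages themselves are placeholders that do no real image processing.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]
Stage = Callable[[Page], Page]

# Assumed ordering, read from the diagram left to right, top to bottom.
STAGE_ORDER = [
    "layout_region_segmentation",
    "reading_order_inference",
    "ocr_character_recognition",
    "semantic_style_classification",
    "markdown_renderer",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(page: Page) -> Page:
        trace = list(page.get("trace", []))
        trace.append(name)
        # Return a new dict so earlier page states are never mutated.
        return {**page, "trace": trace}
    return stage


def run_pipeline(page: Page, stages: List[Stage]) -> Page:
    """Feed the page through each stage in order."""
    for stage in stages:
        page = stage(page)
    return page


result = run_pipeline(
    {"source": "scanned-contract-page4.png"},
    [make_stage(name) for name in STAGE_ORDER],
)
```

Keeping each stage a plain function makes it easy to swap one step (for example, a different reading-order heuristic) without touching the others.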

Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| heading_detection | string | `font-size` | Strategy for heading inference: `font-size`, `weight`, or `combined` |
| min_confidence | float | 0.80 | Characters below this threshold are wrapped in uncertainty markers |
| table_detection | boolean | true | Enable automatic table structure recognition from grid lines or alignment |
| preserve_emphasis | boolean | true | Map bold/italic visual styles to markdown emphasis markers |
| multi_column_mode | string | `auto` | Column handling: `auto` (detect), `single`, `dual`, or `triple` |
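The parameters above could be modeled as a small validated settings object. This is a sketch, not the agent's actual configuration loader: the `ExtractionConfig` class is hypothetical, and the 0.0 to 1.0 range for `min_confidence` is an assumption inferred from the 0.80 default.

```python
from dataclasses import dataclass

# Allowed values as listed in the parameter descriptions.
HEADING_STRATEGIES = {"font-size", "weight", "combined"}
COLUMN_MODES = {"auto", "single", "dual", "triple"}


@dataclass(frozen=True)
class ExtractionConfig:
    """Hypothetical container for the documented parameters and defaults."""
    heading_detection: str = "font-size"
    min_confidence: float = 0.80
    table_detection: bool = True
    preserve_emphasis: bool = True
    multi_column_mode: str = "auto"

    def __post_init__(self) -> None:
        if self.heading_detection not in HEADING_STRATEGIES:
            raise ValueError(f"unknown heading_detection: {self.heading_detection!r}")
        # Assumed range: confidence expressed as a fraction between 0 and 1.
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0.0 and 1.0")
        if self.multi_column_mode not in COLUMN_MODES:
            raise ValueError(f"unknown multi_column_mode: {self.multi_column_mode!r}")
```

For example, `ExtractionConfig(multi_column_mode="dual", min_confidence=0.9)` would describe a stricter run on a two-column page, while a misspelled mode fails early with a `ValueError`.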

Best Practices

Perform a Full-Page Scan Before Extracting Details: Begin with a high-level structural pass to identify all regions (headers, body, sidebars, footnotes) before starting character-level extraction. This prevents missing entire sections that fall outside the primary reading flow, such as margin notes or pull quotes.
Infer Reading Order from Visual Layout, Not File Order: Rasterized images have no inherent text order. Use column boundaries, vertical positions, and indentation cues to reconstruct the intended reading sequence. Processing the left column fully before the right column is essential for two-column academic papers and newsletters.
Map Visual Hierarchy to Markdown Semantics Consistently: Establish a deterministic mapping between font sizes and heading levels (e.g., 24pt maps to H1, 18pt to H2, 14pt to H3) and apply it uniformly across the document. Inconsistent mapping produces confusing output that undermines downstream processing.
Flag Uncertain Regions Instead of Guessing: When a character or word falls below the confidence threshold, annotate it with a marker rather than silently inserting a best guess. Downstream agents and human reviewers depend on these annotations to prioritize their review effort.
Handle Non-Text Elements Descriptively: Images, diagrams, and decorative elements cannot be extracted as text, but their presence and relationship to surrounding text should be documented. Use descriptive placeholders like [Figure: Bar chart showing Q3 revenue by region] to maintain document completeness.
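The deterministic heading mapping described above can be sketched as a fixed lookup. The 24pt/18pt/14pt values come from the example in the text; the function names and the 1pt tolerance are assumptions added so that small scan-to-scan size variations still land on the same level.

```python
from typing import Optional

# (font size in points, markdown heading level), taken from the example above.
HEADING_SIZES_PT = [(24.0, 1), (18.0, 2), (14.0, 3)]


def heading_level(font_size_pt: float, tolerance_pt: float = 1.0) -> Optional[int]:
    """Return the heading level for a font size, or None for body text.

    tolerance_pt is an assumed allowance for measurement noise in scans.
    """
    for size, level in HEADING_SIZES_PT:
        if abs(font_size_pt - size) <= tolerance_pt:
            return level
    return None


def render_line(text: str, font_size_pt: float) -> str:
    """Render one line as a markdown heading or as plain body text."""
    level = heading_level(font_size_pt)
    if level is None:
        return text
    return "#" * level + " " + text
```

Because the table is fixed up front, the same measured size always yields the same heading level, regardless of which page the line came from.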

Common Issues

Multi-column text merges into a single stream: Without explicit column detection, adjacent columns can interleave line by line, producing garbled output. Make sure multi_column_mode is not set to `single` (use `auto` or an explicit value such as `dual`) and verify that the agent correctly identifies column gutters. For documents with irregular column widths, manual region hints may be needed.
Table structure lost when grid lines are faint or absent: Many scanned documents use alignment rather than visible borders to define tables. When table_detection fails on borderless tables, provide a hint about the expected column count or switch to whitespace-based column detection, which uses character spacing patterns to infer boundaries.
Heading levels inconsistent across pages: If the source document uses slightly different font sizes on different pages (common with photocopied or re-scanned materials), the heading level mapping can drift. Calibrate heading detection per page or define explicit pixel-range thresholds to maintain consistency throughout the extraction.
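To illustrate the multi-column issue above, here is a minimal sketch of gutter-based reading-order reconstruction. It assumes text blocks already carry bounding-box coordinates; the `Block` class, the function names, and the 20-unit `min_gutter` default are hypothetical. It also assumes no block spans several columns: a full-width headline would need to be separated out first, otherwise it bridges the gutter and everything collapses into one column.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    x0: float  # left edge
    x1: float  # right edge
    y0: float  # top edge (smaller means higher on the page)
    text: str


def split_columns(blocks: List[Block], min_gutter: float = 20.0) -> List[List[Block]]:
    """Group blocks into columns separated by horizontal gaps >= min_gutter."""
    columns: List[List[Block]] = []
    right_edge: Optional[float] = None
    for block in sorted(blocks, key=lambda b: b.x0):
        if right_edge is None or block.x0 - right_edge >= min_gutter:
            columns.append([])  # gap found: start a new column
            right_edge = block.x1
        else:
            right_edge = max(right_edge, block.x1)
        columns[-1].append(block)
    return columns


def reading_order(blocks: List[Block], min_gutter: float = 20.0) -> List[str]:
    """Read each column top to bottom, columns left to right."""
    ordered: List[Block] = []
    for column in split_columns(blocks, min_gutter):
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return [b.text for b in ordered]


# Two-column page whose blocks arrive in an interleaved order.
blocks = [
    Block(320, 560, 50, "R1"),
    Block(40, 280, 50, "L1"),
    Block(40, 280, 120, "L2"),
    Block(320, 560, 120, "R2"),
]
ordered_text = reading_order(blocks)
```

Sorting by vertical position alone would interleave the two columns here; grouping by gutter first keeps each column contiguous.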
