Visual Analysis Consultant
A proactive visual analysis specialist for the OCR extraction team. Includes structured workflows, validation checks, and reusable patterns.
Expert image analysis and OCR extraction agent that converts scanned documents into structured markdown while preserving visual hierarchy, formatting, and semantic meaning.
When to Use This Agent
Choose this agent when you need to:
- Extract text from scanned documents, photographs of text, or screenshots with high accuracy
- Convert visually structured content (headings, lists, tables, callouts) into clean markdown
- Analyze complex document layouts including multi-column pages, nested lists, and embedded code blocks
- Handle edge cases like rotated text, watermarks, faded ink, or low-resolution scans
Consider alternatives when:
- You already have digital text and need to compare two versions (use the Text Comparison Assistant)
- Your primary goal is final quality validation rather than initial extraction (use the OCR Quality Assurance Agent)
Quick Start
Configuration
name: visual-analysis-consultant
type: agent
category: ocr-extraction-team
Example Invocation
claude agent:invoke visual-analysis-consultant "Extract and structure text from scanned-contract-page4.png preserving all headings and table formatting"
Example Output
## Section 4: Payment Terms
| Term | Duration | Rate |
|------|----------|------|
| Initial | 12 months | $2,400/mo |
| Renewal | 6 months | $2,160/mo |
Payment is due on the **first business day** of each calendar month.
Late payments incur a *1.5% monthly* penalty fee.
[LOW CONFIDENCE] Footnote text partially obscured: "Subject to §12.3..."
Core Concepts
Document Analysis Pipeline Overview
| Aspect | Details |
|---|---|
| Input Formats | PNG, JPEG, TIFF, PDF page rasters, screenshots |
| Layout Detection | Multi-column recognition, reading-order inference, region segmentation |
| Semantic Mapping | Font size to heading level, weight to emphasis, indentation to list nesting |
| Confidence Reporting | Per-region confidence scores with LOW/MEDIUM/HIGH annotations on uncertain segments |
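The confidence annotations described in the table can be illustrated with a minimal sketch. The function names and the 0.95 cut-off for HIGH are assumptions made for illustration; only the 0.80 lower bound comes from the documented min_confidence default, and the bracketed marker follows the style shown in the example output.

```python
def confidence_label(score: float, low: float = 0.80, high: float = 0.95) -> str:
    """Bucket a 0..1 confidence score into LOW / MEDIUM / HIGH.

    `low` mirrors the documented min_confidence default (0.80).
    `high` (0.95) is an assumed cut-off, not a documented value.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < low:
        return "LOW"
    if score < high:
        return "MEDIUM"
    return "HIGH"


def annotate_region(text: str, score: float) -> str:
    """Prefix uncertain regions with a bracketed marker; leave HIGH regions as-is."""
    label = confidence_label(score)
    if label == "HIGH":
        return text
    return f"[{label} CONFIDENCE] {text}"
```

Keeping HIGH regions unmarked keeps the markdown readable while still surfacing every segment a reviewer should look at.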
Extraction Architecture
┌─────────────────┐     ┌─────────────────┐
│  Source Image   │────▶│  Layout Region  │
│     Input       │     │  Segmentation   │
└─────────────────┘     └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│  Reading Order  │────▶│  OCR Character  │
│   Inference     │     │   Recognition   │
└─────────────────┘     └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│ Semantic Style  │────▶│    Markdown     │
│ Classification  │     │    Renderer     │
└─────────────────┘     └─────────────────┘
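The Reading Order Inference stage in the diagram can be illustrated with a minimal sketch. It assumes equal-width columns and regions that already carry bounding boxes from segmentation. `Region`, `reading_order`, and their fields are hypothetical names chosen for this example, not part of the agent's interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """A segmented text region with its bounding box (hypothetical structure)."""
    text: str
    x: float      # left edge, in pixels
    y: float      # top edge, in pixels
    width: float  # box width, in pixels


def reading_order(regions: List[Region], page_width: float, columns: int = 2) -> List[Region]:
    """Order regions column by column (left to right), top to bottom within a column.

    Assumes equal-width columns; real layouts may need detected gutters instead.
    """
    if columns < 1 or page_width <= 0:
        raise ValueError("columns must be >= 1 and page_width must be positive")
    column_width = page_width / columns

    def sort_key(region: Region):
        # Assign each region to the column that contains its horizontal centre.
        centre = region.x + region.width / 2
        column = max(0, min(int(centre // column_width), columns - 1))
        return (column, region.y, region.x)

    return sorted(regions, key=sort_key)
```

Sorting on the column index first is what prevents the left and right columns from interleaving line by line.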
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| heading_detection | string | font-size | Strategy for heading inference: font-size, weight, or combined |
| min_confidence | float | 0.80 | Characters below this threshold are wrapped in uncertainty markers |
| table_detection | boolean | true | Enable automatic table structure recognition from grid lines or alignment |
| preserve_emphasis | boolean | true | Map bold/italic visual styles to markdown emphasis markers |
| multi_column_mode | string | auto | Column handling: auto (detect), single, dual, or triple |
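As a sketch of how these parameters might be represented and validated in code, the dataclass below mirrors the names, types, and defaults from the table. The class name `ExtractionConfig` is hypothetical, and the [0, 1] range check on min_confidence is an assumption, since the table does not state a valid range.

```python
from dataclasses import dataclass

# Allowed values taken from the parameter descriptions in the table.
HEADING_STRATEGIES = {"font-size", "weight", "combined"}
COLUMN_MODES = {"auto", "single", "dual", "triple"}


@dataclass(frozen=True)
class ExtractionConfig:
    """Hypothetical container for the documented parameters and their defaults."""
    heading_detection: str = "font-size"
    min_confidence: float = 0.80
    table_detection: bool = True
    preserve_emphasis: bool = True
    multi_column_mode: str = "auto"

    def __post_init__(self) -> None:
        if self.heading_detection not in HEADING_STRATEGIES:
            raise ValueError(f"unknown heading_detection: {self.heading_detection!r}")
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError(f"min_confidence out of range: {self.min_confidence}")
        if self.multi_column_mode not in COLUMN_MODES:
            raise ValueError(f"unknown multi_column_mode: {self.multi_column_mode!r}")
```

Validating at construction time means a typo such as `multi_column_mode="quad"` fails immediately rather than silently falling back to a default.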
Best Practices
- Perform a Full-Page Scan Before Extracting Details: Begin with a high-level structural pass to identify all regions (headers, body, sidebars, footnotes) before diving into character-level extraction. This prevents missing entire sections that fall outside the primary reading flow, such as margin notes or pull quotes.
- Infer Reading Order from Visual Layout, Not File Order: Rasterized images have no inherent text order. Use column boundaries, vertical positions, and indentation cues to reconstruct the intended reading sequence. Processing the left column fully before the right column is essential for two-column academic papers and newsletters.
- Map Visual Hierarchy to Markdown Semantics Consistently: Establish a deterministic mapping between font sizes and heading levels (e.g., 24pt maps to H1, 18pt to H2, 14pt to H3) and apply it uniformly across the document. Inconsistent mapping produces confusing output that undermines downstream processing.
- Flag Uncertain Regions Instead of Guessing: When a character or word falls below the confidence threshold, annotate it with a marker rather than silently inserting your best guess. Downstream agents and human reviewers depend on these annotations to prioritize their review effort.
- Handle Non-Text Elements Descriptively: Images, diagrams, and decorative elements cannot be extracted as text, but their presence and relationship to surrounding text should be documented. Use descriptive placeholders like [Figure: Bar chart showing Q3 revenue by region] to maintain document completeness.
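The deterministic font-size mapping recommended above can be sketched as follows. The 24pt/18pt/14pt values come from the example in the list; the 1pt tolerance and the function names are assumptions made for illustration.

```python
from typing import Optional

# (font size in points, heading level), taken from the example mapping above.
HEADING_SIZES = [(24.0, 1), (18.0, 2), (14.0, 3)]


def heading_level(font_size_pt: float, tolerance: float = 1.0) -> Optional[int]:
    """Return the heading level for a measured font size, or None for body text.

    `tolerance` (an assumed value) absorbs small measurement noise from scans.
    """
    for size, level in HEADING_SIZES:
        if abs(font_size_pt - size) <= tolerance:
            return level
    return None


def render_line(text: str, font_size_pt: float) -> str:
    """Render a line as a markdown heading when its size matches the mapping."""
    level = heading_level(font_size_pt)
    if level is None:
        return text
    return "#" * level + " " + text
```

Because the table is fixed up front, the same measured size always yields the same heading level on every page.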
Common Issues
- Multi-column text merges into a single stream: Without explicit column detection, adjacent columns can interleave line by line, producing garbled output. Enable multi_column_mode and verify that the agent correctly identifies column gutters. For documents with irregular column widths, manual region hints may be needed.
- Table structure lost when grid lines are faint or absent: Many scanned documents use alignment rather than visible borders to define tables. When table_detection fails on borderless tables, provide a hint about expected column count or switch to whitespace-based column detection, which uses character spacing patterns to infer boundaries.
- Heading levels inconsistent across pages: If the source document uses slightly different font sizes on different pages (common with photocopied or re-scanned materials), the heading level mapping can drift. Calibrate heading detection per page or define explicit pixel-range thresholds to maintain consistency throughout the extraction.
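Whitespace-based column detection for borderless tables can be illustrated with a minimal sketch. It assumes the rows have already been recognized as fixed-width text lines, and it treats any run of at least `min_gap` character positions that is blank in every row as a column gutter. The function names and the `min_gap` default are hypothetical, not the agent's actual behavior.

```python
from typing import List, Tuple


def column_boundaries(lines: List[str], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of columns; `end` is exclusive.

    A gutter is a run of >= min_gap positions that are blank in every line.
    """
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None  # start index of the column currently being scanned
    gap = 0       # length of the current run of all-blank positions
    for i, is_blank in enumerate(blank):
        if is_blank:
            gap += 1
            if start is not None and gap >= min_gap:
                spans.append((start, i - gap + 1))  # close column before the gutter
                start = None
        else:
            if start is None:
                start = i
            gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans


def split_row(line: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Cut one line into cells using the detected column spans."""
    return [line[start:end].strip() for start, end in spans]
```

Single spaces inside a cell (such as "12 months") are shorter than `min_gap`, so they do not split the cell.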