Document Structure Analyzer Companion
A document structure analysis specialist for the OCR extraction team. Includes structured workflows, validation checks, and reusable patterns.
Layout analysis and semantic mapping agent that deconstructs document structures into labeled regions, reading orders, and hierarchical content schemas for OCR preprocessing.
When to Use This Agent
Choose this agent when you need to:
- Analyze complex multi-column layouts before running OCR to improve extraction accuracy
- Map document hierarchies (headers, subheaders, body text, captions) for structured output
- Identify and classify visual elements like tables, forms, figures, and sidebars
- Determine correct reading order for documents with non-linear content flow
Consider alternatives when:
- Your documents are simple single-column text with no complex layout elements
- You need post-OCR grammar correction rather than pre-OCR structural analysis (use Specialist OCR Grammar Fixer)
Quick Start
Configuration
name: document-structure-analyzer-companion
type: agent
category: ocr-extraction-team
Example Invocation
claude agent:invoke document-structure-analyzer-companion "Analyze the structure of invoice-batch-042.pdf"
Example Output
=== Document Structure Analysis ===
File: invoice-batch-042.pdf (3 pages)
PAGE 1 REGIONS:
[HEADER] Logo + Company Name (confidence: 0.97)
[TABLE] Line items table, 5 columns x 12 rows (confidence: 0.94)
[SIDEBAR] Payment terms block, right margin (confidence: 0.89)
[FOOTER] Page number + legal disclaimer (confidence: 0.96)
READING ORDER: Header → Table → Sidebar → Footer
HIERARCHY: H1(Invoice #) → H2(Bill To, Ship To) → Body(line items)
TEMPLATE MATCH: Standard commercial invoice (92% match)
OCR RECOMMENDATIONS:
- Process table region with grid-aware extraction
- Treat sidebar as independent text block
- Apply deskew correction (2.1° detected)
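If the text report shown above needs to be consumed by a script, the region lines can be pulled out with a small parser. This is only a sketch: it assumes region lines always follow the `[LABEL] description (confidence: N)` shape seen in the example, and the names `Region` and `parse_regions` are hypothetical helpers, not part of the agent.

```python
import re
from typing import NamedTuple

# Matches lines such as: [TABLE] Line items table (confidence: 0.94)
# The line shape is inferred from the example output, not from a format spec.
REGION_LINE = re.compile(
    r"^\[(?P<label>[A-Z_]+)\]\s+(?P<description>.+?)\s+"
    r"\(confidence:\s*(?P<confidence>[01](?:\.\d+)?)\)\s*$"
)


class Region(NamedTuple):
    label: str
    description: str
    confidence: float


def parse_regions(report: str) -> list[Region]:
    """Collect every region line from a text report; other lines are skipped."""
    regions = []
    for line in report.splitlines():
        match = REGION_LINE.match(line.strip())
        if match:
            regions.append(
                Region(match["label"], match["description"], float(match["confidence"]))
            )
    return regions


sample = """\
PAGE 1 REGIONS:
[HEADER] Logo + Company Name (confidence: 0.97)
[TABLE] Line items table, 5 columns x 12 rows (confidence: 0.94)
[SIDEBAR] Payment terms block, right margin (confidence: 0.89)
[FOOTER] Page number + legal disclaimer (confidence: 0.96)
"""

regions = parse_regions(sample)
for region in regions:
    print(region.label, region.confidence)
```

When `output_format` is set to `json`, parsing the structured output directly is likely the simpler route; the regex approach only matters for the human-readable report.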
Core Concepts
Document Region Types Overview
| Region Type | Description |
|---|---|
| Content Blocks | Continuous text regions like paragraphs, headings, and captions |
| Tabular Regions | Structured grid areas including tables, forms, and ledgers |
| Visual Elements | Non-text regions such as images, charts, logos, and diagrams |
| Navigation Markers | Page numbers, headers, footers, and section dividers |
Structure Analysis Pipeline Architecture
┌──────────────┐     ┌──────────────┐
│ Page Image   │────▶│ Region       │
│ Ingestion    │     │ Segmenter    │
└──────────────┘     └──────────────┘
        │                    │
        ▼                    ▼
┌──────────────┐     ┌──────────────┐
│ Reading      │────▶│ Hierarchy    │
│ Order Engine │     │ Mapper       │
└──────────────┘     └──────────────┘
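The four stages in the diagram can be pictured as functions applied one after another to a shared page record. The sketch below assumes a strictly linear order (ingestion, segmentation, reading order, hierarchy), which the diagram suggests but does not state outright. Every name in it (`PageState`, `run_pipeline`, the stage functions) is hypothetical, and the stage bodies are stubs rather than the agent's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class PageState:
    """Hypothetical record handed from stage to stage."""
    source: str
    regions: list = field(default_factory=list)        # (label, top_y) pairs
    reading_order: list = field(default_factory=list)  # labels, first-read first
    hierarchy: dict = field(default_factory=dict)      # heading label -> child labels


def ingest(state):
    # Placeholder: a real stage would load and normalise the page image.
    return state


def segment(state):
    # Stub output standing in for a real region detector.
    state.regions = [("CAPTION", 620), ("HEADING", 40), ("BODY", 300)]
    return state


def order_regions(state):
    # Simplest possible rule: read top to bottom by each region's top edge.
    ordered = sorted(state.regions, key=lambda region: region[1])
    state.reading_order = [label for label, _ in ordered]
    return state


def map_hierarchy(state):
    # Attach each non-heading region to the most recent heading before it.
    current = None
    for label in state.reading_order:
        if label == "HEADING":
            current = label
            state.hierarchy[current] = []
        elif current is not None:
            state.hierarchy[current].append(label)
    return state


STAGES = [ingest, segment, order_regions, map_hierarchy]


def run_pipeline(source):
    state = PageState(source)
    for stage in STAGES:
        state = stage(state)
    return state


result = run_pipeline("invoice-batch-042.pdf")
print(result.reading_order)
print(result.hierarchy)
```

Keeping each stage as a plain function over one record makes it easy to swap in a different ordering rule or segmenter without touching the others.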
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| min_confidence | float | 0.80 | Minimum confidence score for a region classification to be included |
| deskew_correction | boolean | true | Automatically detect and correct page rotation before analysis |
| table_detection_mode | string | "grid-aware" | Table detection strategy: grid-aware, line-based, or whitespace |
| max_pages | integer | 50 | Maximum number of pages to analyze in a single invocation |
| output_format | string | "json" | Output format for structure maps: json, yaml, or markdown |
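The parameters in the table can be mirrored in a small validated settings object, for instance when building configurations programmatically. This is a sketch, not the agent's own loader: `AnalyzerConfig` is a hypothetical class, the defaults and allowed values are copied from the table, and the range checks on `min_confidence` and `max_pages` are assumptions.

```python
from dataclasses import dataclass

# Allowed values are taken from the parameter table.
TABLE_MODES = {"grid-aware", "line-based", "whitespace"}
OUTPUT_FORMATS = {"json", "yaml", "markdown"}


@dataclass(frozen=True)
class AnalyzerConfig:
    """Hypothetical container mirroring the documented parameters."""
    min_confidence: float = 0.80
    deskew_correction: bool = True
    table_detection_mode: str = "grid-aware"
    max_pages: int = 50
    output_format: str = "json"

    def __post_init__(self):
        # Assumed constraint: confidence scores are fractions between 0 and 1.
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0 and 1")
        if self.table_detection_mode not in TABLE_MODES:
            raise ValueError(
                f"unknown table_detection_mode: {self.table_detection_mode!r}"
            )
        # Assumed constraint: at least one page must be analysed.
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")


default_config = AnalyzerConfig()
borderless_config = AnalyzerConfig(table_detection_mode="whitespace", output_format="yaml")
print(default_config)
print(borderless_config)
```

Rejecting unknown values at construction time surfaces typos such as `"grid_aware"` before a long batch run starts.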
Best Practices
- Run Structure Analysis Before OCR Extraction: Feeding region boundaries and reading order to the OCR engine dramatically improves extraction accuracy. Without structure analysis, OCR processes text linearly and mangles multi-column layouts, tables, and sidebars.
- Calibrate Confidence Thresholds Per Document Type: Scanned handwritten documents produce lower confidence scores than clean digital PDFs. Lower min_confidence to 0.65 for handwritten or degraded inputs to avoid discarding valid but uncertain region detections.
- Use Template Matching for Recurring Document Types: If you process the same form or invoice layout repeatedly, save the structure analysis as a template. Future documents matching that template skip the full analysis pipeline and process significantly faster.
- Verify Reading Order on Complex Layouts: Multi-column academic papers, magazine spreads, and brochures have non-obvious reading orders. Always review the suggested reading order for complex layouts, as incorrect ordering produces incoherent OCR output.
- Separate Table Regions for Dedicated Processing: Tables require grid-aware extraction that differs fundamentally from paragraph text OCR. The structure analyzer marks table boundaries so downstream processors can apply specialized table extraction algorithms.
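The threshold calibration advice above can be sketched as a lookup keyed by document type. The 0.80 default and the 0.65 value for handwritten or degraded input come from this document; the document-type keys, the region dictionaries, and the `keep_confident` helper are hypothetical names used only for illustration.

```python
# 0.80 is the documented default for min_confidence; 0.65 is the value
# suggested for handwritten or degraded input. The keys are hypothetical.
THRESHOLDS = {
    "digital_pdf": 0.80,
    "handwritten": 0.65,
    "degraded_scan": 0.65,
}
DEFAULT_THRESHOLD = 0.80


def keep_confident(regions, document_type):
    """Return the regions whose confidence meets the threshold for this type."""
    threshold = THRESHOLDS.get(document_type, DEFAULT_THRESHOLD)
    return [region for region in regions if region["confidence"] >= threshold]


detections = [
    {"label": "HEADER", "confidence": 0.91},
    {"label": "TABLE", "confidence": 0.72},
    {"label": "SIDEBAR", "confidence": 0.58},
]

strict = keep_confident(detections, "digital_pdf")
lenient = keep_confident(detections, "handwritten")
print([region["label"] for region in strict])
print([region["label"] for region in lenient])
```

With these sample detections, the table region at 0.72 is dropped under the default threshold but kept under the handwritten one, which is the trade-off the practice describes.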
Common Issues
- Sidebar text merged with main body content in reading order: Sidebars positioned close to the main text column may be incorrectly merged into it. Lower the column_gap_threshold parameter so that a narrower whitespace gap is enough to treat adjacent regions as separate columns.
- Table detection fails on borderless tables: Tables without visible gridlines require whitespace-based detection. Switch table_detection_mode from "grid-aware" to "whitespace" for documents that use spacing rather than borders to delineate table cells.
- Rotated or skewed pages produce misaligned region boundaries: Even with deskew_correction enabled, pages rotated more than 5 degrees may not correct fully. Pre-process heavily skewed documents with a dedicated image rotation tool before submitting them for structure analysis.
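To make the sidebar-merging issue concrete, here is a minimal sketch of gap-based column grouping. It assumes two regions share a column whenever the horizontal whitespace between them is smaller than `column_gap_threshold`, measured in pixels; the agent's actual rule and units are not documented here. The region dictionaries and the `group_into_columns` helper are hypothetical.

```python
def group_into_columns(regions, column_gap_threshold):
    """Group regions into columns by horizontal gap, left to right.

    Each region is a dict with "x0"/"x1" (left/right edges) and "y0" (top edge).
    Assumed rule: a gap narrower than the threshold keeps regions in one column.
    """
    columns = []
    for region in sorted(regions, key=lambda r: r["x0"]):
        if columns and region["x0"] - columns[-1]["right"] < column_gap_threshold:
            # Gap too narrow (or regions overlap horizontally): same column.
            columns[-1]["regions"].append(region)
            columns[-1]["right"] = max(columns[-1]["right"], region["x1"])
        else:
            columns.append({"right": region["x1"], "regions": [region]})
    # Within each column, read top to bottom.
    return [sorted(column["regions"], key=lambda r: r["y0"]) for column in columns]


def reading_order(regions, column_gap_threshold):
    """Flatten the columns into a single list of region names."""
    columns = group_into_columns(regions, column_gap_threshold)
    return [region["name"] for column in columns for region in column]


page = [
    {"name": "body_a", "x0": 50, "x1": 400, "y0": 100},
    {"name": "body_b", "x0": 50, "x1": 400, "y0": 500},
    {"name": "sidebar", "x0": 430, "x1": 560, "y0": 120},  # 30 px from the body
]

merged = reading_order(page, column_gap_threshold=40)
separated = reading_order(page, column_gap_threshold=20)
print(merged)
print(separated)
```

Under this assumed rule, the 30-pixel gap falls below a threshold of 40, so the sidebar is interleaved between the two body paragraphs; with a threshold of 20 it becomes its own column and is read after the body.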