DOCX Official Dynamic
Comprehensive skill for creating, editing, and analyzing DOCX files. Includes structured workflows, validation checks, and reusable patterns for document processing.
A production-grade skill for DOCX creation, editing, and analysis following OOXML standards. Covers the complete document lifecycle from creation through formatting, conversion, and automation with emphasis on standards compliance and cross-platform compatibility.
When to Use This Skill
Choose this skill when:
- Building document automation systems that must work across platforms
- Creating DOCX files that must comply with OOXML standards
- Implementing document workflows with programmatic creation and editing
- Converting documents between DOCX and other formats with high fidelity
- Processing uploaded DOCX files for content extraction and analysis
Consider alternatives when:
- Simple one-off document creation → use a DOCX toolkit skill
- Working exclusively with PDFs → use a PDF skill
- Need a web-based editor → use a rich text editor
- Creating presentations → use a PPTX skill
Quick Start
    # Convert between formats with pandoc
    pandoc input.md -o output.docx --reference-doc=template.docx
    pandoc input.docx -t markdown -o output.md
    pandoc input.docx -o output.pdf --pdf-engine=xelatex

    # Analyze DOCX structure
    unzip -l document.docx   # List contents
    unzip -p document.docx word/document.xml | xmllint --format -
    # Comprehensive DOCX creation
    from docx import Document
    from docx.enum.style import WD_STYLE_TYPE
    from docx.enum.table import WD_TABLE_ALIGNMENT
    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn
    from docx.shared import Pt, RGBColor

    doc = Document()

    # Custom style
    style = doc.styles.add_style('CustomHeading', WD_STYLE_TYPE.PARAGRAPH)
    style.font.size = Pt(16)
    style.font.bold = True
    style.font.color.rgb = RGBColor(0x1a, 0x56, 0xdb)
    style.paragraph_format.space_after = Pt(12)

    # Apply custom style
    doc.add_paragraph('Custom Styled Heading', style='CustomHeading')

    # Multi-column table with alternating row colors
    table = doc.add_table(rows=5, cols=4, style='Table Grid')
    table.alignment = WD_TABLE_ALIGNMENT.CENTER
    for i, row in enumerate(table.rows):
        if i % 2 == 1:
            for cell in row.cells:
                # Shade every second row light gray with a raw w:shd element
                shading = OxmlElement('w:shd')
                shading.set(qn('w:val'), 'clear')
                shading.set(qn('w:fill'), 'F2F2F2')
                cell._tc.get_or_add_tcPr().append(shading)
Core Concepts
DOCX Processing Approaches
| Approach | Tool | Best For |
|---|---|---|
| High-level API | python-docx | Creating/editing with paragraph-level control |
| Format conversion | pandoc | Converting between formats (MD↔DOCX, DOCX→PDF) |
| Raw XML manipulation | lxml + zipfile | Advanced features not in python-docx |
| CLI processing | libreoffice --convert-to | Batch PDF conversion |
| Node.js | docx npm package | Server-side generation in JS apps |
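As a rough illustration of the raw XML row, the sketch below reads word/document.xml straight out of the ZIP container and joins the text runs of each paragraph. It uses the standard library's zipfile and xml.etree.ElementTree in place of lxml to stay dependency-free, and the helper name extract_paragraphs is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_path):
    """Return the text of each non-empty paragraph in the main document part.

    Hypothetical helper: it only reads w:t text runs, so tabs, breaks,
    fields, headers, footers, and footnotes are ignored, and text inside
    text boxes may appear twice because those paragraphs are nested.
    """
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Concatenate every text run that belongs to this paragraph
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text:
            paragraphs.append(text)
    return paragraphs
```

Calling extract_paragraphs("document.docx") on a file from the Quick Start would return a list of strings, one per paragraph that contains text.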
Cross-Platform Compatibility
    # Ensure documents render correctly across platforms
    from docx import Document
    from docx.shared import Pt

    def create_compatible_document():
        doc = Document()

        # Use widely available fonts: Calibri, Arial, Times New Roman.
        # python-docx only records the font name; it does not embed font files.
        # Set explicit styles instead of relying on defaults
        for style_name in ['Normal', 'Heading 1', 'Heading 2']:
            style = doc.styles[style_name]
            style.font.name = 'Calibri'
            if style_name == 'Normal':
                style.font.size = Pt(11)

        # Use points for sizes, not relative units
        # Use RGB colors, not theme colors
        # Specify exact column widths for tables
        return doc
Batch Document Processing
    import subprocess
    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    def batch_convert(input_dir: str, output_format: str = 'pdf'):
        """Convert all DOCX files in directory to specified format."""
        docx_files = sorted(Path(input_dir).glob('*.docx'))

        def convert_one(input_path: Path) -> str:
            # Pass arguments as a list so file names are never parsed by a shell;
            # check=True raises CalledProcessError if the converter exits non-zero
            subprocess.run(
                ['libreoffice', '--headless', '--convert-to', output_format,
                 '--outdir', str(input_dir), str(input_path)],
                check=True,
            )
            # Swap only the final extension, not every '.docx' in the name
            return str(input_path.with_suffix(f'.{output_format}'))

        # Concurrent LibreOffice instances may contend for one user profile;
        # lower max_workers if conversions fail intermittently
        with ThreadPoolExecutor(max_workers=4) as executor:
            results = list(executor.map(convert_one, docx_files))
        return results
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| conversionTool | string | 'pandoc' | Converter: pandoc, libreoffice, or unoconv |
| defaultTemplate | string | '' | Reference DOCX template for styling |
| fontEmbedding | boolean | false | Embed fonts in generated documents |
| xmlValidation | boolean | true | Validate OOXML on save |
| concurrency | number | 4 | Parallel workers for batch processing |
| preserveComments | boolean | true | Preserve comments during conversion |
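The source does not say how these parameters are supplied, so the following is only a sketch of one way to hold them in Python with the defaults from the table above. The class name DocxSkillConfig, the loader load_config, and the validation rules are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields

# Converter names listed in the table's description column
ALLOWED_TOOLS = ("pandoc", "libreoffice", "unoconv")


@dataclass
class DocxSkillConfig:
    """Hypothetical container mirroring the parameter table."""
    conversionTool: str = "pandoc"
    defaultTemplate: str = ""
    fontEmbedding: bool = False
    xmlValidation: bool = True
    concurrency: int = 4
    preserveComments: bool = True

    def __post_init__(self):
        # Assumed rules: only the listed converters, at least one worker
        if self.conversionTool not in ALLOWED_TOOLS:
            raise ValueError(f"unknown conversionTool: {self.conversionTool!r}")
        if self.concurrency < 1:
            raise ValueError("concurrency must be at least 1")


def load_config(overrides):
    """Build a config from a dict, rejecting keys that are not in the table."""
    known = {f.name for f in fields(DocxSkillConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return DocxSkillConfig(**overrides)
```

For example, load_config({"conversionTool": "libreoffice", "concurrency": 2}) keeps every other parameter at its documented default.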
Best Practices
- Use a reference document for consistent styling: pass --reference-doc=template.docx to pandoc to inherit styles, headers, footers, and page layout from an existing professionally formatted template.
- Validate generated XML: invalid XML can cause documents to fail to open. python-docx does not validate against the OOXML schema, so after any raw XML modification check the affected parts with xmllint and confirm that the file reopens cleanly.
- Use libreoffice headless for reliable PDF conversion: pandoc can convert to PDF, but it re-typesets the content through a separate PDF engine, whereas libreoffice renders the DOCX layout itself and usually reproduces it more faithfully. Run it on servers with --headless --convert-to pdf.
- Process documents in parallel for batch operations: files are independent of each other, so batches parallelize well. Use thread pools for I/O-heavy work (file reading, waiting on an external converter) and process pools for CPU-heavy work done in Python (parsing, rendering).
- Handle character encoding explicitly: DOCX parts are XML, normally stored as UTF-8, but content from databases or CSV files may arrive in other encodings. Decode input bytes using their actual source encoding before inserting text into documents to prevent garbled characters.
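The encoding advice above can be sketched as follows: bytes from an external source are decoded with their actual encoding into a Python str, and control characters that XML 1.0 cannot carry are dropped before the text goes into a document. The helper name to_document_text and the choice to drop rather than replace those characters are assumptions.

```python
import re

# C0 control characters that XML 1.0 does not allow (tab, LF, CR are permitted)
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def to_document_text(raw, source_encoding):
    """Decode bytes from a known encoding and strip XML-illegal control characters.

    Hypothetical helper: decoding is strict, so byte sequences that are
    invalid in source_encoding raise UnicodeDecodeError. It cannot detect
    every mismatch, since single-byte encodings such as latin-1 accept any byte.
    """
    text = raw.decode(source_encoding)  # errors="strict" is the default
    return _XML_ILLEGAL.sub("", text)
```

The returned string can then be passed to whatever inserts text into the document, for instance a python-docx add_paragraph call.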
Common Issues
pandoc conversion loses complex formatting: Pandoc converts through its own internal document model, which can't represent all DOCX features (text boxes, complex headers, page breaks). For high-fidelity conversion, use libreoffice or direct OOXML manipulation.
Batch processing fails on corrupted files: Wrap individual file processing in try/except to handle corrupted DOCX files without stopping the entire batch. Log failures and continue processing remaining files.
Generated documents show "Repair" dialog on open: This indicates invalid XML or missing required elements. Common causes: improperly escaped characters, missing content type definitions, or broken image references. Validate the package (ZIP integrity, required parts, well-formed XML) before distribution.
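The last two issues can be handled together. The sketch below, using only the standard library, checks ZIP integrity, the presence of a few commonly required parts, and XML well-formedness, then applies that check across a batch while logging failures instead of aborting. The names check_docx and check_batch and the REQUIRED_PARTS list are assumptions, and this is not a schema validation.

```python
import logging
import zipfile
import zlib
import xml.etree.ElementTree as ET

logger = logging.getLogger(__name__)

# Parts a typical Word package contains; strictly, the main part's location
# is defined by _rels/.rels, so word/document.xml is an assumption here
REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels", "word/document.xml")


def check_docx(path):
    """Return a list of problems found in a DOCX package (empty if none)."""
    try:
        with zipfile.ZipFile(path) as package:
            bad_member = package.testzip()  # first member with a bad CRC, or None
            if bad_member is not None:
                return [f"corrupt ZIP member: {bad_member}"]
            names = set(package.namelist())
            problems = [f"missing part: {p}" for p in REQUIRED_PARTS if p not in names]
            for name in sorted(names):
                if name.endswith((".xml", ".rels")):
                    try:
                        ET.fromstring(package.read(name))
                    except ET.ParseError as exc:
                        problems.append(f"malformed XML in {name}: {exc}")
            return problems
    except (zipfile.BadZipFile, zlib.error, OSError) as exc:
        return [f"not a readable ZIP package: {exc}"]


def check_batch(paths):
    """Check every file, logging failures instead of stopping the batch."""
    valid, failed = [], {}
    for path in paths:
        problems = check_docx(path)
        if problems:
            logger.warning("skipping %s: %s", path, "; ".join(problems))
            failed[path] = problems
        else:
            valid.append(path)
    return valid, failed
```

Only the paths returned in valid would then be handed to the conversion or extraction step, while failed keeps the reasons for later review.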