Docx Toolkit
Streamline your workflow with this comprehensive skill for document creation and editing. Includes structured workflows, validation checks, and reusable patterns for document processing.
An advanced skill for DOCX file manipulation including creation, editing, analysis, and format conversion. Covers the OOXML structure, advanced formatting, mail merge, document comparison, and batch processing of Word documents.
When to Use This Skill
Choose this skill when:
- Creating complex DOCX files with advanced formatting and layouts
- Analyzing DOCX file structure for debugging or conversion
- Implementing mail merge or batch document generation
- Comparing document versions and tracking changes
- Converting between DOCX and other formats with high fidelity
Consider alternatives when:
- Simple document creation: use a Doc Kit skill
- PDF generation: use a PDF processing skill
- Spreadsheet manipulation: use an XLSX skill
- Presentation creation: use a PPTX skill
Quick Start
```python
# Advanced DOCX creation with styles, headers, footers
from docx import Document
from docx.enum.section import WD_ORIENT
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.shared import Inches, Pt, Cm, RGBColor

doc = Document()

# Configure page layout
section = doc.sections[0]
section.page_width = Cm(21)     # A4 width
section.page_height = Cm(29.7)  # A4 height
section.top_margin = Cm(2.5)
section.bottom_margin = Cm(2.5)

# Add header
header = section.header
header_para = header.paragraphs[0]
header_para.text = 'CONFIDENTIAL - Acme Corp'
header_para.style.font.size = Pt(8)

# Add footer with page numbers
footer = section.footer
footer_para = footer.paragraphs[0]
fld = OxmlElement('w:fldSimple')
fld.set(qn('w:instr'), 'PAGE')
footer_para._p.append(fld)
```
Core Concepts
OOXML Document Structure
| Path in ZIP | Component | Purpose |
|---|---|---|
| `word/document.xml` | Main body | Paragraphs, tables, images |
| `word/styles.xml` | Style definitions | Fonts, sizes, colors, spacing |
| `word/header1.xml` | Headers | Page header content |
| `word/footer1.xml` | Footers | Page footer content |
| `word/numbering.xml` | Lists | Bullet and numbered list definitions |
| `word/media/` | Media files | Embedded images and objects |
| `[Content_Types].xml` | Content types | File type declarations |
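Because a DOCX file is an ordinary ZIP archive, the parts listed above can be inspected with the standard library alone. The following is a minimal sketch: the helper name `list_parts` is hypothetical, and the tiny in-memory archive is only a stand-in so the example runs anywhere (it is not a valid Word document).

```python
import io
import zipfile


def list_parts(docx_file) -> list[str]:
    """Return the sorted part names stored in a DOCX (ZIP) package."""
    with zipfile.ZipFile(docx_file) as package:
        return sorted(package.namelist())


# Build a tiny stand-in package in memory so the example runs without Word.
# A real DOCX contains more parts (relationships, styles, and so on).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<w:document/>")

buffer.seek(0)
parts = list_parts(buffer)
print(parts)
```

With a real file, `list_parts("report.docx")` would work the same way, since `zipfile.ZipFile` accepts a path as well as a file-like object.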
Mail Merge Pattern
```python
import csv
from pathlib import Path

from docx import Document


def replace_placeholders(paragraphs, record: dict) -> None:
    """Replace <<key>> placeholders run by run so inline formatting survives."""
    for para in paragraphs:
        for key, value in record.items():
            placeholder = f'<<{key}>>'
            if placeholder not in para.text:
                continue
            # Only placeholders that sit entirely inside one run are replaced;
            # a placeholder split across several runs is left untouched.
            for run in para.runs:
                if placeholder in run.text:
                    run.text = run.text.replace(placeholder, value or '')


def mail_merge(template_path: str, data_path: str, output_dir: str):
    """Generate individual documents from template + CSV data."""
    # newline='' is what the csv module expects for files it reads.
    with open(data_path, newline='') as f:
        records = list(csv.DictReader(f))

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    for i, record in enumerate(records):
        doc = Document(template_path)

        # Body paragraphs, then every paragraph inside every table cell
        replace_placeholders(doc.paragraphs, record)
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    replace_placeholders(cell.paragraphs, record)

        # Fall back to the row index when the CSV has no 'name' column
        output_name = f"{record.get('name', i)}.docx".replace(' ', '_')
        doc.save(str(out_dir / output_name))
        print(f"Generated: {output_name}")
```
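The mail merge code above only replaces a placeholder when it sits entirely inside one run. Word often splits text such as `<<name>>` across several runs, in which case nothing is replaced. One possible workaround is sketched below; `replace_across_runs` is a hypothetical helper, not part of python-docx. It works on any objects with a writable `.text` attribute, so the example uses `SimpleNamespace` stand-ins instead of real runs.

```python
from types import SimpleNamespace


def replace_across_runs(runs, placeholder: str, value: str) -> bool:
    """Replace the first occurrence of placeholder, even if it spans runs.

    The value is written into the first run the placeholder touches, so it
    inherits that run's formatting. Returns True if a replacement was made.
    """
    full_text = "".join(run.text for run in runs)
    start = full_text.find(placeholder)
    if start == -1:
        return False
    end = start + len(placeholder)

    offset = 0
    inserted = False
    for run in runs:
        run_start, run_end = offset, offset + len(run.text)
        offset = run_end
        if run_end <= start or run_start >= end:
            continue  # this run lies entirely outside the placeholder
        # Keep whatever this run holds before and after the placeholder.
        before = run.text[: max(start - run_start, 0)]
        after = run.text[end - run_start:]
        if not inserted:
            run.text = before + value + after
            inserted = True
        else:
            run.text = before + after
    return True


# Stand-ins for paragraph.runs: the placeholder is split over three runs.
runs = [
    SimpleNamespace(text="Dear <<"),
    SimpleNamespace(text="name"),
    SimpleNamespace(text=">>,"),
]
replaced = replace_across_runs(runs, "<<name>>", "Ada")
print([run.text for run in runs])
```

With python-docx, the same function could be called as `replace_across_runs(para.runs, placeholder, value)`; to replace repeated placeholders, call it in a loop until it returns `False`.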
Document Comparison
```bash
# Compare two DOCX files using pandoc + diff
pandoc old_version.docx -t markdown -o /tmp/old.md
pandoc new_version.docx -t markdown -o /tmp/new.md
diff --unified /tmp/old.md /tmp/new.md > changes.diff

# Or use python-docx for structured comparison
python3 -c "
from docx import Document
old = Document('v1.docx')
new = Document('v2.docx')
for i, (op, np) in enumerate(zip(old.paragraphs, new.paragraphs)):
    if op.text != np.text:
        print(f'Changed at para {i}: {op.text[:50]} -> {np.text[:50]}')
"
```
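The `zip`-based comparison pairs paragraphs by position, so one inserted or deleted paragraph makes every later pair look changed, and trailing paragraphs in the longer document are ignored. A sequence alignment avoids this. The sketch below uses `difflib.SequenceMatcher` from the standard library on plain lists of paragraph text; `diff_paragraphs` and the sample paragraphs are illustrative only.

```python
import difflib


def diff_paragraphs(old: list[str], new: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """Return (operation, old_paragraphs, new_paragraphs) per changed region."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


old_paras = ["Introduction", "Scope of work", "Payment terms"]
new_paras = ["Introduction", "Timeline", "Scope of work", "Payment terms: net 30"]

changes = diff_paragraphs(old_paras, new_paras)
for change in changes:
    print(change)
```

For real documents, the input lists could be built with `[p.text for p in Document('v1.docx').paragraphs]`.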
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pageSize` | string | `'A4'` | Page size: A4, Letter, Legal |
| `orientation` | string | `'portrait'` | Orientation: portrait or landscape |
| `defaultFont` | string | `'Calibri'` | Default document font |
| `defaultSize` | number | `11` | Default font size in points |
| `marginsCm` | object | `{top: 2.5, bottom: 2.5}` | Page margins in centimeters |
| `templateDir` | string | `'./templates'` | Template file directory |
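The source does not say how these parameters are consumed, so the following is only a sketch of one way `pageSize` and `orientation` might be resolved into page dimensions. The function name `page_dimensions_cm` and the lookup table are assumptions; the dimensions themselves are the standard A4, US Letter, and US Legal sizes.

```python
# Portrait (width, height) in centimeters for each supported page size.
PAGE_SIZES_CM = {
    "A4": (21.0, 29.7),
    "Letter": (21.59, 27.94),  # 8.5 x 11 in
    "Legal": (21.59, 35.56),   # 8.5 x 14 in
}


def page_dimensions_cm(page_size: str = "A4", orientation: str = "portrait") -> tuple[float, float]:
    """Return (width, height) in cm, swapping the sides for landscape."""
    if page_size not in PAGE_SIZES_CM:
        raise ValueError(f"Unknown page size: {page_size!r}")
    if orientation not in ("portrait", "landscape"):
        raise ValueError(f"Unknown orientation: {orientation!r}")
    width, height = PAGE_SIZES_CM[page_size]
    if orientation == "landscape":
        width, height = height, width
    return width, height


print(page_dimensions_cm())
print(page_dimensions_cm("Letter", "landscape"))
```

The resulting values can be wrapped in `Cm()` and assigned to `section.page_width` and `section.page_height` as in the Quick Start; for landscape pages in python-docx, `section.orientation` should also be set to `WD_ORIENT.LANDSCAPE`.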
Best Practices
- Manipulate runs, not paragraphs, to preserve formatting: Replacing `paragraph.text` destroys all inline formatting. Work with individual runs within paragraphs to maintain bold, italic, font changes, and colors.
- Use python-docx for creation and pandoc for conversion: python-docx excels at creating and modifying DOCX files with fine-grained control. Pandoc is superior for converting between formats (DOCX to Markdown, DOCX to PDF).
- Handle the OOXML ZIP structure for operations python-docx doesn't support: For advanced features like tracked changes, comments, or custom XML, unzip the DOCX, modify the XML directly, and re-zip. This gives access to the full OOXML specification.
- Use styles instead of direct formatting: Define heading, body, and table styles in the template, then apply style names in code. This ensures formatting consistency and allows global style changes without modifying content code.
- Test generated documents in both Word and LibreOffice: Formatting that looks correct in Word may render differently in LibreOffice and vice versa. Test with both applications to catch compatibility issues.
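The unzip, modify, re-zip workflow can be done in a single pass with the standard library. This is a minimal sketch under stated assumptions: `rewrite_part` is a hypothetical helper, the in-memory archive is a stand-in for a real DOCX, and the byte-level replacement is only for illustration (real edits would normally parse the XML).

```python
import io
import zipfile


def rewrite_part(source, target, part_name: str, transform) -> None:
    """Copy a DOCX package to target, passing one part's bytes through transform."""
    with zipfile.ZipFile(source) as src, zipfile.ZipFile(
        target, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == part_name:
                data = transform(data)
            # Every other part is copied through unchanged, in the same order.
            dst.writestr(item.filename, data)


# Stand-in package so the example runs without a real document.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<w:t>Draft</w:t>")
original.seek(0)

modified = io.BytesIO()
rewrite_part(
    original,
    modified,
    "word/document.xml",
    lambda data: data.replace(b"Draft", b"Final"),
)

modified.seek(0)
with zipfile.ZipFile(modified) as package:
    new_xml = package.read("word/document.xml")
    untouched = package.read("[Content_Types].xml")
print(new_xml)
```

Writing to a new archive rather than editing in place keeps the original file intact if the transform fails partway through.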
Common Issues
Merged table cells cause IndexError: python-docx doesn't handle merged cells intuitively. Access cells by their grid coordinates with `table.cell(row_idx, col_idx)`, and don't assume each grid position holds an independent cell: python-docx returns the same merged cell for every position it spans. Note that `cell.merge()` creates a merge; it does not test for one.
Images appear at wrong size: Default image insertion uses the image's native resolution. Always specify dimensions with `doc.add_picture(path, width=Inches(4))` to ensure consistent sizing regardless of source image DPI.
Generated documents crash on open: Invalid XML in the DOCX ZIP causes Word to refuse to open the file. Validate XML content, escape special characters (`&`, `<`, `>`), and use proper namespace prefixes when manipulating raw XML.
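For the escaping step, the standard library's `xml.sax.saxutils.escape` covers the three characters mentioned above. The sketch below builds a run fragment from untrusted text; `text_run_xml` is a hypothetical helper, and the fragment assumes the `w` prefix is already declared on the enclosing document element.

```python
from xml.sax.saxutils import escape


def text_run_xml(text: str) -> str:
    """Wrap text in a w:r/w:t fragment, escaping &, < and >."""
    return f'<w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r>'


fragment = text_run_xml("R&D <draft> costs")
print(fragment)
```

Building elements with an XML library instead of string formatting avoids the problem entirely, since the library escapes text when it serializes.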