DOCX Toolkit

An advanced skill for DOCX file manipulation including creation, editing, analysis, and format conversion. Covers the OOXML structure, advanced formatting, mail merge, document comparison, and batch processing of Word documents.

When to Use This Skill

Choose this skill when:

Creating complex DOCX files with advanced formatting and layouts
Analyzing DOCX file structure for debugging or conversion
Implementing mail merge or batch document generation
Comparing document versions and tracking changes
Converting between DOCX and other formats with high fidelity

Consider alternatives when:

Simple document creation → use a Doc Kit skill
PDF generation → use a PDF processing skill
Spreadsheet manipulation → use an XLSX skill
Presentation creation → use a PPTX skill

Quick Start


# Advanced DOCX creation with page layout, header, and footer
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.shared import Cm, Pt

doc = Document()

# Configure page layout (A4 portrait)
section = doc.sections[0]
section.page_width = Cm(21)     # A4 width
section.page_height = Cm(29.7)  # A4 height
section.top_margin = Cm(2.5)
section.bottom_margin = Cm(2.5)

# Add header text; size the run itself so the shared Header style is left untouched
header_para = section.header.paragraphs[0]
header_para.text = 'CONFIDENTIAL - Acme Corp'
header_para.runs[0].font.size = Pt(8)

# Add footer with a PAGE field; the w:fldSimple element is built by hand
# because it is inserted through the low-level XML layer
footer_para = section.footer.paragraphs[0]
fld = OxmlElement('w:fldSimple')
fld.set(qn('w:instr'), 'PAGE')
footer_para._p.append(fld)

doc.save('quickstart.docx')  # output filename is illustrative

Core Concepts

OOXML Document Structure

Path in ZIP	Component	Purpose
`word/document.xml`	Main body	Paragraphs, tables, images
`word/styles.xml`	Style definitions	Fonts, sizes, colors, spacing
`word/header1.xml`	Headers	Page header content
`word/footer1.xml`	Footers	Page footer content
`word/numbering.xml`	Lists	Bullet and numbered list definitions
`word/media/`	Media files	Embedded images and objects
`[Content_Types].xml`	Content types	File type declarations
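
Because a DOCX file is an ordinary ZIP archive, the parts in the table can be inspected with Python's standard zipfile module. The sketch below is only illustrative: the helper names list_parts and make_demo_archive are assumptions made for this example, and the in-memory archive is a stand-in, not a valid Word document (a real file contains more parts).

```python
import io
import zipfile


def list_parts(docx_file):
    """Return the part names inside a DOCX (ZIP) archive, sorted.

    docx_file may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return sorted(archive.namelist())


def make_demo_archive():
    """Build a tiny stand-in archive in memory (not a valid Word document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/styles.xml", "<styles/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_parts(make_demo_archive()):
        print(name)
```

Passing the path of a real .docx file to list_parts should show the same layout, plus parts such as headers, footers, and media when the document uses them.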

Mail Merge Pattern


import csv
from pathlib import Path

from docx import Document


def _fill_paragraph(para, record):
    """Replace <<key>> placeholders run by run so inline formatting survives."""
    for key, value in record.items():
        placeholder = f'<<{key}>>'
        # Only placeholders that sit entirely inside one run are replaced.
        for run in para.runs:
            if placeholder in run.text:
                run.text = run.text.replace(placeholder, value or '')


def mail_merge(template_path: str, data_path: str, output_dir: str):
    """Generate one document per CSV row from a template with <<column>> placeholders."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # newline='' is what the csv module expects; utf-8-sig also strips an Excel BOM
    with open(data_path, newline='', encoding='utf-8-sig') as f:
        records = list(csv.DictReader(f))

    for i, record in enumerate(records):
        doc = Document(template_path)

        # Body paragraphs
        for para in doc.paragraphs:
            _fill_paragraph(para, record)

        # Paragraphs inside top-level tables (nested tables are not visited)
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    for para in cell.paragraphs:
                        _fill_paragraph(para, record)

        # Fall back to the row index when the CSV has no usable 'name' value
        output_name = f"{record.get('name') or i}.docx".replace(' ', '_')
        doc.save(str(out_dir / output_name))
        print(f"Generated: {output_name}")
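
The loop above only replaces a placeholder that sits entirely inside one run. Word often splits text such as <<name>> across several runs (after spell checking or partial formatting, for example), and those placeholders are silently skipped. One possible workaround, sketched here on plain strings, is to locate the placeholder in the joined text and then rewrite only the runs it overlaps. The function name replace_across_runs is hypothetical and not part of python-docx.

```python
def replace_across_runs(run_texts, placeholder, value):
    """Replace placeholder in a sequence of run texts, even when it spans runs.

    The replacement lands in the first run the placeholder touches; the
    placeholder's remaining pieces are cut out of the following runs.
    The number of runs never changes, so formatting can be kept per run.
    """
    texts = list(run_texts)
    if not placeholder:
        return texts

    search_from = 0
    while True:
        start = "".join(texts).find(placeholder, search_from)
        if start == -1:
            return texts
        end = start + len(placeholder)

        offset = 0
        for i, text in enumerate(texts):
            run_start, run_end = offset, offset + len(text)
            offset = run_end
            if run_end <= start or run_start >= end:
                continue  # this run does not overlap the placeholder
            tail = text[max(end - run_start, 0):]
            if run_start <= start:
                # First overlapping run: keep the text before the placeholder.
                texts[i] = text[: start - run_start] + value + tail
            else:
                # Later overlapping run: keep only what follows the placeholder.
                texts[i] = tail

        # Continue after the inserted value so it is never re-scanned.
        search_from = start + len(value)


if __name__ == "__main__":
    print(replace_across_runs(["Dear <<na", "me", ">>,"], "<<name>>", "Ada"))
```

With python-docx, the input would be [run.text for run in para.runs], and each returned string would be assigned back to the matching run.text, which leaves every run's formatting in place.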

Document Comparison


# Compare two DOCX files using pandoc + diff
pandoc old_version.docx -t markdown -o /tmp/old.md
pandoc new_version.docx -t markdown -o /tmp/new.md
diff --unified /tmp/old.md /tmp/new.md > changes.diff

# Or use python-docx for a paragraph-by-paragraph comparison.
# zip() pairs paragraphs by position and stops at the shorter document,
# so an inserted or deleted paragraph shifts every later pair.
python3 -c "
from docx import Document
old = Document('old_version.docx')
new = Document('new_version.docx')
for i, (old_para, new_para) in enumerate(zip(old.paragraphs, new.paragraphs)):
    if old_para.text != new_para.text:
        print(f'Changed at para {i}: {old_para.text[:50]} → {new_para.text[:50]}')
"
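
Pairing paragraphs by position reports every paragraph after an insertion or deletion as changed. A sequence alignment avoids that. The following is a minimal sketch using the standard library's difflib on lists of paragraph texts; the function name diff_paragraphs is an assumption for this example.

```python
import difflib


def diff_paragraphs(old_paras, new_paras):
    """Align two lists of paragraph texts and return only the differences.

    Each result is (tag, old_slice, new_slice), where tag is one of
    'replace', 'delete', or 'insert' as reported by SequenceMatcher.
    """
    matcher = difflib.SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_paras[i1:i2], new_paras[j1:j2]))
    return changes


if __name__ == "__main__":
    old = ["Intro", "Terms", "Signature"]
    new = ["Intro", "Scope", "Terms", "Signature"]
    for change in diff_paragraphs(old, new):
        print(change)
```

With python-docx, the inputs would be [p.text for p in doc.paragraphs] for each version. Only the inserted paragraph is reported here, whereas positional pairing would flag two paragraphs as changed and miss the last one.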

Configuration

Parameter	Type	Default	Description
`pageSize`	string	`'A4'`	Page size: A4, Letter, Legal
`orientation`	string	`'portrait'`	Orientation: portrait or landscape
`defaultFont`	string	`'Calibri'`	Default document font
`defaultSize`	number	`11`	Default font size in points
`marginsCm`	object	`{top: 2.5, bottom: 2.5}`	Page margins in centimeters
`templateDir`	string	`'./templates'`	Template file directory
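
The table does not say how these parameters are consumed. One possible reading, shown as a sketch, is a small configuration object that resolves pageSize and orientation into page dimensions. The class name DocxConfig, the PAGE_SIZES_CM lookup, and the page_dimensions_cm method are assumptions for illustration; only the field names and defaults come from the table.

```python
from dataclasses import dataclass, field

# Standard sheet sizes in centimetres as (width, height) in portrait orientation.
PAGE_SIZES_CM = {
    "A4": (21.0, 29.7),
    "Letter": (21.59, 27.94),  # 8.5 x 11 in
    "Legal": (21.59, 35.56),   # 8.5 x 14 in
}


@dataclass
class DocxConfig:
    """Hypothetical holder for the parameters listed in the table."""

    pageSize: str = "A4"
    orientation: str = "portrait"
    defaultFont: str = "Calibri"
    defaultSize: int = 11
    marginsCm: dict = field(default_factory=lambda: {"top": 2.5, "bottom": 2.5})
    templateDir: str = "./templates"

    def page_dimensions_cm(self):
        """Return (width, height) in cm, swapped for landscape."""
        if self.pageSize not in PAGE_SIZES_CM:
            raise ValueError(f"Unknown pageSize: {self.pageSize!r}")
        if self.orientation not in ("portrait", "landscape"):
            raise ValueError(f"Unknown orientation: {self.orientation!r}")
        width, height = PAGE_SIZES_CM[self.pageSize]
        if self.orientation == "landscape":
            return (height, width)
        return (width, height)


if __name__ == "__main__":
    print(DocxConfig().page_dimensions_cm())
    print(DocxConfig(pageSize="Letter", orientation="landscape").page_dimensions_cm())
```

The resulting values can be wrapped in Cm() from docx.shared and assigned to section.page_width and section.page_height; for landscape output, section.orientation should also be set to WD_ORIENT.LANDSCAPE.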

Best Practices

Manipulate runs, not paragraphs, to preserve formatting: Assigning to paragraph.text replaces every run in the paragraph with a single new one, which discards all inline formatting. Edit individual runs to keep bold, italic, font changes, and colors.
Use python-docx for creation and pandoc for conversion: python-docx gives fine-grained control when creating and modifying DOCX files. Pandoc is better suited to converting between formats (DOCX↔Markdown, DOCX→PDF); its PDF output requires a separate PDF engine such as a LaTeX distribution.
Handle the OOXML ZIP structure for operations python-docx doesn't support: For features such as tracked changes, custom XML, or comments (in python-docx versions without comment support), unzip the DOCX, modify the XML directly, and re-zip. This gives access to the full OOXML specification.
Use styles instead of direct formatting: Define heading, body, and table styles in the template, then apply them by name in code. This keeps formatting consistent and allows global style changes without touching the code that generates content.
Test generated documents in both Word and LibreOffice: Formatting that looks correct in Word may render differently in LibreOffice and vice versa. Open the output in both applications to catch compatibility issues.

Common Issues

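Merged table cells cause IndexError: python-docx doesn't handle merged cells intuitively. A merged region is exposed as the same cell repeated at every grid position it covers, so iterating row.cells can visit one cell several times. Address cells by grid coordinates with table.cell(row_idx, col_idx) and skip repeats (for example by comparing the underlying cell._tc elements) instead of assuming each position is an independent cell. Note that cell.merge(other_cell) creates a merge; it does not report whether a cell is already merged.

Images appear at wrong size: By default, an inserted image takes its native size, which is derived from its pixel dimensions and DPI metadata. Always specify dimensions with doc.add_picture(path, width=Inches(4)) to get consistent sizing regardless of source image DPI.

Generated documents fail to open: Invalid XML inside the DOCX ZIP causes Word to refuse to open the file. Validate XML content, escape special characters (&, <, >), and use proper namespace prefixes when manipulating raw XML.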
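
Escaping can be handled with the standard library when building raw XML by hand. The sketch below builds a single run element as a string and then checks that it parses; the helper name make_text_run is an assumption for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace, bound to the conventional "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_text_run(text):
    """Return a w:r element (as a string) whose text is XML-escaped."""
    return (
        f'<w:r xmlns:w="{W_NS}">'
        f'<w:t xml:space="preserve">{escape(text)}</w:t>'
        "</w:r>"
    )


if __name__ == "__main__":
    xml = make_text_run("R&D <draft> v2")
    print(xml)
    # Parsing raises ParseError if the fragment is not well-formed.
    parsed = ET.fromstring(xml)
    print(parsed[0].text)
```

Parsing the fragment before writing it into the archive is a cheap way to catch malformed XML before Word does.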
