
Docx Toolkit

Streamline your workflow with this comprehensive toolkit for DOCX document creation and editing. Includes structured workflows, validation checks, and reusable patterns for document processing.

Skill | Cliptics | document processing | v1.0.0 | MIT

DOCX Toolkit

An advanced skill for DOCX file manipulation including creation, editing, analysis, and format conversion. Covers the OOXML structure, advanced formatting, mail merge, document comparison, and batch processing of Word documents.

When to Use This Skill

Choose this skill when:

  • Creating complex DOCX files with advanced formatting and layouts
  • Analyzing DOCX file structure for debugging or conversion
  • Implementing mail merge or batch document generation
  • Comparing document versions and tracking changes
  • Converting between DOCX and other formats with high fidelity

Consider alternatives when:

  • Simple document creation → use a Doc Kit skill
  • PDF generation → use a PDF processing skill
  • Spreadsheet manipulation → use an XLSX skill
  • Presentation creation → use a PPTX skill

Quick Start

```python
# Advanced DOCX creation with styles, headers, footers
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.shared import Cm, Pt

doc = Document()

# Configure page layout
section = doc.sections[0]
section.page_width = Cm(21)     # A4 width
section.page_height = Cm(29.7)  # A4 height
section.top_margin = Cm(2.5)
section.bottom_margin = Cm(2.5)

# Add header: size the run itself instead of editing the shared 'Header' style
header_para = section.header.paragraphs[0]
header_run = header_para.add_run('CONFIDENTIAL - Acme Corp')
header_run.font.size = Pt(8)

# Add footer with a PAGE field; the word processor fills in the number when rendering
footer_para = section.footer.paragraphs[0]
fld = OxmlElement('w:fldSimple')
fld.set(qn('w:instr'), 'PAGE')
footer_para._p.append(fld)

doc.save('output.docx')
```

Core Concepts

OOXML Document Structure

| Path in ZIP | Component | Purpose |
|---|---|---|
| word/document.xml | Main body | Paragraphs, tables, images |
| word/styles.xml | Style definitions | Fonts, sizes, colors, spacing |
| word/header1.xml | Headers | Page header content |
| word/footer1.xml | Footers | Page footer content |
| word/numbering.xml | Lists | Bullet and numbered list definitions |
| word/media/ | Media files | Embedded images and objects |
| [Content_Types].xml | Content types | File type declarations |
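To see these parts in a real file, a DOCX can be opened with Python's standard zipfile module. The sketch below lists every entry with its uncompressed size and reads one part back as text. The helper names `list_docx_parts` and `read_part` are hypothetical, and the sketch assumes the XML parts are UTF-8 encoded, which is the usual case for Word output.

```python
import zipfile


def list_docx_parts(path: str) -> dict[str, int]:
    """Map each entry name in the DOCX ZIP to its uncompressed size in bytes."""
    with zipfile.ZipFile(path) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def read_part(path: str, part_name: str) -> str:
    """Return one part (e.g. 'word/document.xml') decoded as UTF-8 text."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(part_name).decode('utf-8')
```

For example, `read_part('report.docx', 'word/document.xml')` (with a hypothetical file name) returns the raw body XML, which is often the quickest way to debug formatting that python-docx does not expose.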

Mail Merge Pattern

```python
import csv
import os

from docx import Document


def _replace_in_paragraph(para, record: dict) -> None:
    """Replace <<key>> placeholders run by run so inline formatting survives.

    A placeholder that Word has split across several runs is not matched here.
    """
    for key, value in record.items():
        placeholder = f'<<{key}>>'
        if placeholder not in para.text:
            continue
        for run in para.runs:
            if placeholder in run.text:
                # csv.DictReader yields None for missing trailing fields
                run.text = run.text.replace(placeholder, value or '')


def mail_merge(template_path: str, data_path: str, output_dir: str) -> None:
    """Generate individual documents from template + CSV data."""
    with open(data_path, newline='', encoding='utf-8') as f:
        records = list(csv.DictReader(f))
    os.makedirs(output_dir, exist_ok=True)
    for i, record in enumerate(records):
        doc = Document(template_path)  # reload the template for every record
        for para in doc.paragraphs:
            _replace_in_paragraph(para, record)
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    for para in cell.paragraphs:
                        _replace_in_paragraph(para, record)
        output_name = f"{record.get('name') or i}.docx".replace(' ', '_')
        doc.save(os.path.join(output_dir, output_name))
        print(f"Generated: {output_name}")
```

Document Comparison

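```bash
# Compare two DOCX files using pandoc + diff
# (--wrap=none keeps line re-wrapping from showing up as spurious changes)
pandoc old_version.docx -t markdown --wrap=none -o /tmp/old.md
pandoc new_version.docx -t markdown --wrap=none -o /tmp/new.md
diff --unified /tmp/old.md /tmp/new.md > changes.diff

# Or use python-docx for a quick paragraph-by-paragraph comparison.
# zip() pairs paragraphs by position and stops at the shorter document,
# so one inserted or deleted paragraph shifts every later comparison.
python3 -c "
from docx import Document
old = Document('v1.docx')
new = Document('v2.docx')
for i, (op, np) in enumerate(zip(old.paragraphs, new.paragraphs)):
    if op.text != np.text:
        print(f'Changed at para {i}: {op.text[:50]} -> {np.text[:50]}')
if len(old.paragraphs) != len(new.paragraphs):
    print(f'Paragraph count differs: {len(old.paragraphs)} vs {len(new.paragraphs)}')
"
```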

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| pageSize | string | 'A4' | Page size: A4, Letter, Legal |
| orientation | string | 'portrait' | Orientation: portrait or landscape |
| defaultFont | string | 'Calibri' | Default document font |
| defaultSize | number | 11 | Default font size in points |
| marginsCm | object | {top: 2.5, bottom: 2.5} | Page margins in centimeters |
| templateDir | string | './templates' | Template file directory |

Best Practices

  1. Manipulate runs, not paragraphs, to preserve formatting. Replacing paragraph.text destroys all inline formatting. Work with individual runs within paragraphs to maintain bold, italic, font changes, and colors.

  2. Use python-docx for creation and pandoc for conversion. python-docx excels at creating and modifying DOCX files with fine-grained control. Pandoc is better suited to converting between formats (DOCX↔Markdown, or DOCX→PDF when a PDF engine such as LaTeX is installed).

  3. Handle the OOXML ZIP structure directly for operations python-docx doesn't support. For advanced features like tracked changes, comments, or custom XML, unzip the DOCX, modify the XML directly, and re-zip it. This gives access to the full OOXML specification.

  4. Use styles instead of direct formatting. Define heading, body, and table styles in the template, then apply style names in code. This keeps formatting consistent and allows global style changes without modifying content code.

  5. Test generated documents in both Word and LibreOffice. Formatting that looks correct in Word may render differently in LibreOffice, and vice versa, so open the output in both applications to catch compatibility issues.
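The unzip, modify, re-zip workflow from practice 3 can be done with the standard zipfile module alone, without extracting to disk. In the sketch below, `patch_docx_part` is a hypothetical helper: it copies every entry unchanged except one part, which passes through a caller-supplied function, and it always writes a new file rather than modifying the source in place. The transform is plain string editing for brevity; a real edit would more likely parse the XML.

```python
import zipfile
from typing import Callable


def patch_docx_part(src: str, dst: str, part_name: str,
                    transform: Callable[[str], str]) -> None:
    """Copy DOCX `src` to `dst`, rewriting only `part_name` via `transform`.

    Entry names are copied verbatim, so [Content_Types].xml stays at the
    archive root; zipping an unpacked folder from its parent directory
    would nest it one level down and produce an invalid package.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, 'w', zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == part_name:
                data = transform(data.decode('utf-8')).encode('utf-8')
            zout.writestr(item, data)  # reuse the original entry's metadata
```

For example, `patch_docx_part('in.docx', 'out.docx', 'word/document.xml', edit_fn)` with a hypothetical `edit_fn` leaves styles, media, and relationships untouched.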

Common Issues

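Merged table cells cause IndexError: python-docx doesn't handle merged cells intuitively. Access cells by their grid coordinates with table.cell(row, col), and keep in mind that a merged cell is returned at every grid position it spans, so check whether two positions refer to the same cell before assuming each one is independent.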

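Images appear at wrong size: by default python-docx sizes an inserted picture from its pixel dimensions and the DPI recorded in the file (72 DPI when none is recorded), so images from different sources come out at inconsistent sizes. Always specify dimensions, e.g. doc.add_picture(path, width=Inches(4)); when only the width is given, the height is scaled to keep the aspect ratio.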

Generated documents won't open: invalid XML in the DOCX ZIP causes Word to report the file as unreadable and refuse to open it. Check that every XML part is well-formed, escape special characters (&, <, >) in inserted text, and use proper namespace prefixes when manipulating raw XML.
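Malformed parts can be caught before a file ever reaches Word. The sketch below parses every .xml and .rels entry with the standard library and reports the ones that fail, and wraps xml.sax.saxutils.escape for text destined for raw XML. The helper names are hypothetical, and the check covers well-formedness only (including unbound namespace prefixes): a well-formed part can still be rejected if it violates the OOXML schema.

```python
import xml.etree.ElementTree as ET
import zipfile
from xml.sax.saxutils import escape


def find_malformed_parts(path: str) -> list[tuple[str, str]]:
    """Return (part name, parse error) for each .xml/.rels part that is not well-formed."""
    problems = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.endswith(('.xml', '.rels')):
                continue  # skip media and other binary parts
            try:
                ET.fromstring(zf.read(name))
            except ET.ParseError as exc:
                problems.append((name, str(exc)))
    return problems


def xml_text(value: str) -> str:
    """Escape &, < and > before inserting user text into raw WordprocessingML."""
    return escape(value)
```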
