Word Document Processing

A document automation skill for programmatically creating, reading, modifying, and converting Microsoft Word (.docx) documents using libraries such as python-docx (Python), docx (Node.js), and mammoth (available for both).

When to Use

Choose Word Document Processing when:

Generating reports, invoices, or contracts programmatically from templates
Extracting text and metadata from Word documents for processing
Converting Word documents to PDF, HTML, or plain text
Automating mail merge and document assembly workflows

Consider alternatives when:

Creating simple text documents: use plain text or Markdown
Building PDF-first documents: use a PDF generation library directly
Editing documents interactively: use the Google Docs API or the Microsoft Graph API for Office 365 documents

Quick Start


# Python
pip install python-docx mammoth

# Node.js
npm install docx mammoth


from docx import Document
from docx.shared import Inches, Pt, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH

def create_report(data):
    doc = Document()

    # Title
    title = doc.add_heading('Monthly Report', level=0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER

    # Subtitle with styling
    subtitle = doc.add_paragraph()
    run = subtitle.add_run(f"Period: {data['month']} {data['year']}")
    run.font.size = Pt(14)
    run.font.color.rgb = RGBColor(100, 100, 100)
    subtitle.alignment = WD_ALIGN_PARAGRAPH.CENTER

    doc.add_paragraph()  # Spacer

    # Summary section
    doc.add_heading('Executive Summary', level=1)
    doc.add_paragraph(data['summary'])

    # Metrics table
    doc.add_heading('Key Metrics', level=1)
    table = doc.add_table(rows=1, cols=3)
    table.style = 'Light Grid Accent 1'

    header_cells = table.rows[0].cells
    header_cells[0].text = 'Metric'
    header_cells[1].text = 'Value'
    header_cells[2].text = 'Change'

    for metric in data['metrics']:
        row = table.add_row().cells
        row[0].text = metric['name']
        row[1].text = str(metric['value'])
        row[2].text = f"{metric['change']:+.1f}%"

    # Chart placeholder image
    if data.get('chart_path'):
        doc.add_heading('Trend Analysis', level=1)
        doc.add_picture(data['chart_path'], width=Inches(6))

    # Save
    filename = f"report_{data['month']}_{data['year']}.docx"
    doc.save(filename)
    return filename
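
Quick Start covers creation only. For the reading direction mentioned under When to Use, a minimal sketch might look like the following. The helper name `document_to_text` is hypothetical; it assumes an object shaped like a python-docx `Document` (with `.paragraphs` and `.tables`) and returns body paragraphs first, then table rows, so the original interleaving of text and tables is not preserved.

```python
def document_to_text(doc):
    """Collect body paragraphs, then table rows, as plain-text lines.

    `doc` is assumed to behave like a python-docx Document: it exposes
    `.paragraphs` and `.tables`, and each table exposes `.rows` with `.cells`.
    """
    # Skip paragraphs that are empty or whitespace-only (spacers)
    lines = [p.text for p in doc.paragraphs if p.text.strip()]
    for table in doc.tables:
        for row in table.rows:
            # Join the cells of one row with tabs so columns stay separable
            lines.append("\t".join(cell.text.strip() for cell in row.cells))
    return "\n".join(lines)


# Typical use, assuming python-docx is installed and report.docx exists:
#   from docx import Document
#   print(document_to_text(Document("report.docx")))
```

Metadata such as the title and author is exposed separately by python-docx on `doc.core_properties`.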

Core Concepts

Document Structure

| Element | Method | Usage |
| --- | --- | --- |
| Heading | `add_heading(text, level)` | Section headers (levels 0-9; level 0 is the document title) |
| Paragraph | `add_paragraph(text)` | Body text |
| Table | `add_table(rows, cols)` | Tabular data |
| Image | `add_picture(path, width)` | Embedded images |
| Page Break | `add_page_break()` | Force new page |
| List | `add_paragraph(text, style='List Bullet')` or `style='List Number'` | Bulleted/numbered lists |
| Header/Footer | `section.header` / `section.footer` | Page headers/footers |
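
As a rough illustration of how the methods in the table combine, the sketch below appends a heading, a list, and an optional page break. The helper name `add_bulleted_section` is hypothetical, and `doc` is assumed to be a python-docx `Document` (or anything with the same three methods). The style names `List Bullet` and `List Number` are the ones shipped in python-docx's default template; a custom template may name them differently.

```python
def add_bulleted_section(doc, title, items, numbered=False, page_break=False):
    """Append a level-1 heading followed by a bulleted or numbered list."""
    # Style names from the default python-docx template (assumption for
    # custom templates, which may define their own list styles)
    style = "List Number" if numbered else "List Bullet"
    doc.add_heading(title, level=1)
    for item in items:
        doc.add_paragraph(item, style=style)
    if page_break:
        doc.add_page_break()
    return doc


# Typical use, assuming python-docx is installed:
#   from docx import Document
#   doc = Document()
#   add_bulleted_section(doc, "Risks", ["Budget", "Schedule"], page_break=True)
#   doc.sections[0].header.paragraphs[0].text = "Confidential"
#   doc.save("outline.docx")
```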

Template-Based Generation


from docx import Document

class DocTemplate:
    def __init__(self, template_path):
        self.doc = Document(template_path)

    def fill(self, data):
        """Replace {{placeholders}} with data values"""
        for paragraph in self.doc.paragraphs:
            self._replace_in_paragraph(paragraph, data)

        for table in self.doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    for paragraph in cell.paragraphs:
                        self._replace_in_paragraph(paragraph, data)

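    def _replace_in_paragraph(self, paragraph, data):
        # Replace run by run so each run keeps its formatting. A placeholder
        # that Word has split across several runs is not matched here
        # (see Common Issues).
        full_text = paragraph.text
        for key, value in data.items():
            placeholder = f'{{{{{key}}}}}'  # renders as {{key}}
            if placeholder in full_text:
                for run in paragraph.runs:
                    if placeholder in run.text:
                        run.text = run.text.replace(placeholder, str(value))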

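    def add_dynamic_table(self, bookmark, headers, rows):
        """Append a table to the end of the document.

        Note: `bookmark` is currently unused. python-docx does not expose a
        high-level bookmark API, so placing the table at a bookmark would
        require moving the table's XML element manually.
        """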
        table = self.doc.add_table(rows=1, cols=len(headers))
        table.style = 'Light Grid Accent 1'
        for i, header in enumerate(headers):
            table.rows[0].cells[i].text = header
        for row_data in rows:
            row = table.add_row().cells
            for i, value in enumerate(row_data):
                row[i].text = str(value)

    def save(self, output_path):
        self.doc.save(output_path)
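
Because `fill` silently skips placeholders that have no matching key, it can help to check a template's text against the data before filling. The following is a minimal sketch under that assumption; `find_placeholders` and `missing_keys` are hypothetical helper names, and the pattern only matches the exact `{{name}}` form (no inner spaces) that `fill` looks for.

```python
import re

# Matches {{name}} where name is letters, digits, or underscores
PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")


def find_placeholders(text):
    """Return the set of placeholder names found in text."""
    return set(PLACEHOLDER_RE.findall(text))


def missing_keys(text, data):
    """Return placeholder names in text that have no entry in data, sorted."""
    return sorted(find_placeholders(text) - set(data))


# Typical use with the DocTemplate class above:
#   template = DocTemplate("contract_template.docx")
#   text = "\n".join(p.text for p in template.doc.paragraphs)
#   problems = missing_keys(text, data)
```

Joining `paragraph.text` values works even when Word has split a placeholder across runs, since `paragraph.text` already concatenates the run text.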

Configuration

| Option | Description | Default |
| --- | --- | --- |
| `template_path` | Path to Word template file | `None` |
| `output_format` | Output format: docx, pdf, html | `"docx"` |
| `page_size` | Page dimensions: letter, A4 | `"letter"` |
| `margins` | Page margins in inches | `1.0` (all sides) |
| `default_font` | Default font family | `"Calibri"` |
| `font_size` | Default font size in points | `11` |
| `line_spacing` | Paragraph line spacing | `1.15` |
| `include_toc` | Generate table of contents | `false` |
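
One way to carry these options through code is a small dataclass with the same names and defaults. This is only a sketch: the class name `DocConfig`, the validation rules, and the `PAGE_SIZES_IN` lookup are assumptions made for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Page sizes as (width, height) in inches; A4 is 210 x 297 mm, rounded
PAGE_SIZES_IN = {"letter": (8.5, 11.0), "A4": (8.27, 11.69)}


@dataclass
class DocConfig:
    """Hypothetical container mirroring the options table above."""
    template_path: Optional[str] = None
    output_format: str = "docx"
    page_size: str = "letter"
    margins: float = 1.0          # inches, applied to all four sides
    default_font: str = "Calibri"
    font_size: int = 11           # points
    line_spacing: float = 1.15
    include_toc: bool = False

    def __post_init__(self):
        # Fail early on values the table does not list
        if self.output_format not in ("docx", "pdf", "html"):
            raise ValueError(f"unsupported output_format: {self.output_format!r}")
        if self.page_size not in PAGE_SIZES_IN:
            raise ValueError(f"unsupported page_size: {self.page_size!r}")

    def page_dimensions(self) -> Tuple[float, float]:
        """Return (width, height) in inches for the configured page size."""
        return PAGE_SIZES_IN[self.page_size]


# Applying it with python-docx (assuming the library is installed):
#   from docx.shared import Inches
#   section = doc.sections[0]
#   width, height = config.page_dimensions()
#   section.page_width, section.page_height = Inches(width), Inches(height)
#   section.left_margin = section.right_margin = Inches(config.margins)
```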

Best Practices

Use template documents with placeholder markers instead of building documents from scratch — templates preserve formatting, headers, footers, and styles that are tedious to recreate programmatically
Preserve run-level formatting when replacing text by replacing within individual runs rather than reconstructing paragraphs, so bold, italic, and color formatting from the template is maintained
Use consistent placeholder syntax like {{variable_name}} that is unlikely to appear in normal document text and is easy to find with regex
Handle images with proper aspect ratios by always specifying either width or height (not both) when inserting images to prevent distortion
Convert to PDF for distribution when the recipient should not modify the content: run LibreOffice in headless mode (`libreoffice --headless --convert-to pdf`) instead of sending editable .docx files
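
The LibreOffice conversion in the last practice can be wrapped in a small helper. The sketch below assumes a `libreoffice` binary is on the PATH (some installations name it `soffice`); the function names and the 120-second timeout are illustrative choices.

```python
import subprocess
from pathlib import Path


def build_pdf_command(docx_path, out_dir, binary="libreoffice"):
    """Build the headless LibreOffice command line for a DOCX to PDF conversion."""
    return [
        binary, "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_pdf(docx_path, out_dir, binary="libreoffice", timeout=120):
    """Run the conversion and return the expected path of the PDF."""
    # check=True raises CalledProcessError if LibreOffice exits with an error
    subprocess.run(build_pdf_command(docx_path, out_dir, binary),
                   check=True, timeout=timeout)
    # LibreOffice keeps the input file's stem and swaps the extension
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

Keeping command construction separate from execution makes the command easy to log or inspect without LibreOffice installed.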

Common Issues

Placeholder text split across multiple runs: Word splits text into separate XML runs for formatting reasons, so {{name}} might become {{, na, me}} in three runs. Join all run text in a paragraph, replace placeholders in the joined string, then redistribute text back into runs preserving formatting.
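
The join-and-redistribute approach described above might be sketched as follows. The function name `replace_across_runs` is hypothetical, and `runs` is assumed to be a list of objects with a writable `.text` attribute, such as `paragraph.runs` in python-docx. In this variant the replacement text is placed in the run where the placeholder starts, so it takes on that run's formatting, and the leftover placeholder fragments are removed from the following runs.

```python
def replace_across_runs(runs, placeholder, value):
    """Replace every occurrence of placeholder, even when split across runs.

    Returns the number of replacements made.
    """
    if not placeholder:
        raise ValueError("placeholder must be a non-empty string")
    value = str(value)
    replaced = 0
    search_from = 0
    while True:
        texts = [run.text for run in runs]
        start = "".join(texts).find(placeholder, search_from)
        if start == -1:
            return replaced
        end = start + len(placeholder)
        offset = 0
        for run, text in zip(runs, texts):
            run_start, run_end = offset, offset + len(text)
            offset = run_end
            if run_end <= start or run_start >= end:
                continue  # run lies entirely outside the placeholder
            # Keep the text before and after the placeholder in this run
            before = text[:max(start - run_start, 0)]
            after = text[end - run_start:]
            # Only the run containing the placeholder's first character
            # receives the replacement value
            middle = value if run_start <= start else ""
            run.text = before + middle + after
        replaced += 1
        # Resume after the inserted value so a value that itself contains
        # the placeholder cannot cause an endless loop
        search_from = start + len(value)


# Typical use inside a fill loop, assuming python-docx paragraphs:
#   for key, val in data.items():
#       replace_across_runs(paragraph.runs, "{{" + key + "}}", val)
```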

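Table formatting lost after adding rows: Rows added with `add_row()` start as fresh cells, so direct formatting applied to the template rows (shading, fonts, alignment) is not carried over; only the table-level style still applies. Apply formatting explicitly to each cell in new rows, or copy formatting from the template row before adding data.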

Complex layouts breaking with content length changes: Long replacement text can overflow table cells or push content to unexpected pages. Set table column widths explicitly, enable cell auto-fit, and test templates with both minimum and maximum expected content lengths.
