Doc Kit
All-in-one skill covering reading and creating documents. Includes structured workflows, validation checks, and reusable patterns for document processing.
A practical skill for creating, reading, and editing DOCX documents programmatically. Covers document creation with professional formatting, content extraction, template-based generation, and converting between formats.
When to Use This Skill
Choose this skill when:
- Creating DOCX files with tables, headers, images, and formatting
- Extracting text content from DOCX files for analysis
- Generating documents from templates with dynamic data
- Converting DOCX to/from other formats (PDF, HTML, Markdown)
- Building automated report generation pipelines
Consider alternatives when:
- Working with PDFs → use a PDF processing skill
- Creating presentations → use a PPTX skill
- Working with spreadsheets → use an XLSX/spreadsheet skill
- Need rich text editing in a web app → use a WYSIWYG editor
Quick Start
    # Python DOCX tools
    pip install python-docx
    # pandoc is a standalone command-line tool; install it separately
    # (for example with your OS package manager)

    # Node.js DOCX tools
    npm install docx
    # Create a professional DOCX document
    from docx import Document
    from docx.shared import Pt
    from docx.enum.text import WD_ALIGN_PARAGRAPH

    doc = Document()

    # Title
    title = doc.add_heading('Quarterly Report', level=0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER

    # Subtitle: size the run, not the style, so other paragraphs keep their size
    subtitle = doc.add_paragraph('Q1 2024 Performance Summary')
    subtitle.alignment = WD_ALIGN_PARAGRAPH.CENTER
    subtitle.runs[0].font.size = Pt(14)

    # Add table: one header row plus three data rows
    table = doc.add_table(rows=4, cols=3, style='Light Grid Accent 1')
    headers = ['Metric', 'Target', 'Actual']
    for i, header in enumerate(headers):
        table.rows[0].cells[i].text = header

    data = [
        ['Revenue', '$1.2M', '$1.35M'],
        ['Users', '10,000', '12,500'],
        ['Churn', '< 5%', '3.2%'],
    ]
    for row_idx, row_data in enumerate(data, 1):
        for col_idx, value in enumerate(row_data):
            table.rows[row_idx].cells[col_idx].text = value

    doc.save('quarterly_report.docx')
Core Concepts
DOCX Operations Matrix
| Operation | Python (python-docx) | CLI (pandoc) |
|---|---|---|
| Create document | Document() | pandoc input.md -o file.docx |
| Add text | doc.add_paragraph() | Markdown input |
| Add table | doc.add_table() | Pipe-delimited in Markdown |
| Add image | doc.add_picture() | ![alt](image.png) in Markdown |
| Add heading | doc.add_heading() | # Heading in Markdown |
| Convert to PDF | Not supported; use libreoffice --headless --convert-to pdf file.docx | pandoc file.docx -o file.pdf (needs a PDF engine such as LaTeX) |
| Extract text | Read paragraphs/tables | pandoc file.docx -t plain |
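The pandoc column of the matrix can also be driven from Python. The following is a minimal sketch, assuming the pandoc executable is on PATH; `build_pandoc_command` and `convert` are hypothetical helper names, not part of pandoc or python-docx. Only the `-o` and `-t` options shown in the matrix are used.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pandoc_command(src: str, dst: str, to_format: Optional[str] = None) -> List[str]:
    """Build the argument list for one pandoc conversion (hypothetical helper)."""
    cmd = ["pandoc", src, "-o", dst]
    if to_format is not None:
        # -t selects the output format explicitly, e.g. "plain" for text extraction
        cmd += ["-t", to_format]
    return cmd


def convert(src: str, dst: str, to_format: Optional[str] = None) -> None:
    """Run pandoc, failing early with a clear message if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    subprocess.run(build_pandoc_command(src, dst, to_format), check=True)
```

Keeping command construction separate from execution makes the arguments easy to log or test without pandoc installed.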
Template-Based Generation
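    # Fill {{key}} placeholders in a template
    from docx import Document

    def _replace_in_paragraph(paragraph, replacements: dict):
        for placeholder, value in replacements.items():
            if placeholder not in paragraph.text:
                continue
            # Preferred path: the placeholder sits inside one run, so formatting survives
            for run in paragraph.runs:
                if placeholder in run.text:
                    run.text = run.text.replace(placeholder, value)
            # Fallback: Word split the placeholder across runs; merge the text
            # into the first run (inline formatting of later runs is lost)
            if placeholder in paragraph.text and paragraph.runs:
                merged = ''.join(run.text for run in paragraph.runs)
                paragraph.runs[0].text = merged.replace(placeholder, value)
                for run in paragraph.runs[1:]:
                    run.text = ''

    def fill_template(template_path: str, data: dict, output_path: str):
        doc = Document(template_path)
        replacements = {f'{{{{{key}}}}}': str(value) for key, value in data.items()}  # {{key}}

        for paragraph in doc.paragraphs:
            _replace_in_paragraph(paragraph, replacements)

        # Also replace in tables, paragraph by paragraph, to keep cell formatting
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    for paragraph in cell.paragraphs:
                        _replace_in_paragraph(paragraph, replacements)

        doc.save(output_path)

    # Usage
    fill_template('template.docx', {
        'company_name': 'Acme Corp',
        'date': '2024-03-15',
        'total': '$1,350,000',
    }, 'output.docx')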
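Before filling a template, it can help to compare the placeholders it contains against the keys in the data. The sketch below works on plain strings and assumes the `{{key}}` syntax used above, with keys made only of letters, digits, and underscores (an assumption; the source does not define the key grammar). `find_placeholders` and `missing_keys` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, Set

# Assumption: keys are word characters only, e.g. {{company_name}}
PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")


def find_placeholders(texts: Iterable[str]) -> Set[str]:
    """Collect every placeholder key that appears in the given strings."""
    found: Set[str] = set()
    for text in texts:
        found.update(PLACEHOLDER_RE.findall(text))
    return found


def missing_keys(texts: Iterable[str], data: Dict[str, object]) -> Set[str]:
    """Return placeholder keys that the data dictionary does not provide."""
    return find_placeholders(texts) - set(data)
```

With python-docx, `texts` could be `(p.text for p in doc.paragraphs)`; a non-empty result from `missing_keys` means the output would still contain unfilled placeholders.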
Content Extraction
    from docx import Document

    def extract_docx_content(path: str) -> dict:
        """Collect headings, paragraphs, and table rows from a DOCX file."""
        doc = Document(path)
        content = {
            'paragraphs': [],
            'tables': [],
            'headings': [],
        }

        for para in doc.paragraphs:
            style_name = para.style.name
            if style_name.startswith('Heading'):
                # 'Heading 2' -> level 2; default to 1 when the name has no number
                last_word = style_name.split()[-1]
                level = int(last_word) if last_word.isdigit() else 1
                content['headings'].append({'level': level, 'text': para.text})
            content['paragraphs'].append({
                'text': para.text,
                'style': style_name,
                # True only when a run sets bold directly; bold inherited
                # from a style is not detected
                'bold': any(run.bold for run in para.runs),
            })

        for table in doc.tables:
            rows = []
            for row in table.rows:
                rows.append([cell.text for cell in row.cells])
            content['tables'].append(rows)

        return content
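The dictionary returned by `extract_docx_content` is plain data, so later steps need no DOCX library. As an illustration, this sketch renders the `headings` list as an indented outline; `render_outline` is a hypothetical helper, and the two-space indent is an arbitrary choice.

```python
from typing import Dict, List


def render_outline(headings: List[Dict[str, object]], indent: str = "  ") -> str:
    """Render extracted headings as an indented text outline."""
    lines = []
    for heading in headings:
        # Treat any level below 1 as a top-level entry
        level = max(int(heading["level"]), 1)
        lines.append(f"{indent * (level - 1)}{heading['text']}")
    return "\n".join(lines)
```

For example, `render_outline(extract_docx_content('report.docx')['headings'])` would give a quick overview of a document's sections.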
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| library | string | 'python-docx' | Library: python-docx, docx (Node), or pandoc |
| defaultFont | string | 'Calibri' | Default document font |
| defaultFontSize | number | 11 | Default font size (points) |
| pageMargins | object | {top: 1, right: 1} | Page margins (inches) |
| tableStyle | string | 'Light Grid' | Default table style |
| templateDir | string | './templates' | Directory for document templates |
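The source does not say how these parameters are loaded. One possible approach, sketched here, is a dataclass whose defaults mirror the table and a loader that rejects unknown keys. `DocKitConfig` and `load_config` are hypothetical names, and the `pageMargins` default copies the table exactly, which lists only `top` and `right`.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict

ALLOWED_LIBRARIES = ("python-docx", "docx", "pandoc")


@dataclass
class DocKitConfig:
    library: str = "python-docx"
    defaultFont: str = "Calibri"
    defaultFontSize: float = 11
    # The table lists only top and right; other sides are left unspecified here
    pageMargins: Dict[str, float] = field(default_factory=lambda: {"top": 1, "right": 1})
    tableStyle: str = "Light Grid"
    templateDir: str = "./templates"


def load_config(overrides: Dict[str, Any]) -> DocKitConfig:
    """Apply overrides on top of the defaults and validate the result."""
    known = {f.name for f in fields(DocKitConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    config = DocKitConfig(**overrides)
    if config.library not in ALLOWED_LIBRARIES:
        raise ValueError(f"Unsupported library: {config.library!r}")
    if config.defaultFontSize <= 0:
        raise ValueError("defaultFontSize must be positive")
    return config
```

Rejecting unknown keys catches typos such as `defaultfont` early instead of silently falling back to the default.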
Best Practices
- Use templates instead of building documents from scratch: Create a properly formatted template in Word with placeholder text, then fill it programmatically. This preserves complex formatting that's difficult to reproduce in code.
- Work with runs, not paragraphs, for inline formatting: A paragraph can contain multiple runs with different formatting (bold, italic, different fonts). Replace text at the run level to preserve inline formatting differences.
- Use pandoc for format conversion rather than building converters: Pandoc handles Markdown → DOCX, DOCX → HTML, and dozens of other format conversions. PDF output additionally requires a PDF engine such as LaTeX. Don't reinvent conversion logic.
- Validate document structure before processing: Check that required sections, tables, and headings exist before attempting to extract or modify content. Missing structure should produce clear error messages.
- Test with documents from different Word versions: DOCX files from Word 2016, 2019, 365, and LibreOffice have subtle format differences. Test your code with documents from all common sources your users will provide.
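The structure validation practice could be sketched as follows. The function works on the dictionary shape returned by `extract_docx_content` and returns readable error messages instead of raising on the first problem. `validate_structure` and its parameters are hypothetical, and matching headings case-insensitively is an assumption.

```python
from typing import Dict, List


def validate_structure(content: Dict[str, list],
                       required_headings: List[str],
                       min_tables: int = 0) -> List[str]:
    """Return a list of problems; an empty list means the structure is acceptable."""
    errors = []
    # Compare headings case-insensitively and ignore surrounding whitespace
    present = {h["text"].strip().lower() for h in content.get("headings", [])}
    for name in required_headings:
        if name.strip().lower() not in present:
            errors.append(f"Missing required heading: {name!r}")
    table_count = len(content.get("tables", []))
    if table_count < min_tables:
        errors.append(f"Expected at least {min_tables} table(s), found {table_count}")
    return errors
```

Collecting every problem in one pass lets a pipeline report all missing sections at once.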
Common Issues
Formatting lost when replacing text: Assigning to paragraph.text replaces all of the paragraph's runs with a single run, so run-level formatting is lost. Instead, iterate over paragraph.runs and replace text within individual runs to preserve bold, italic, font, and color formatting.
Images not appearing in generated documents: Image paths must be valid at generation time. Use absolute paths or ensure relative paths resolve correctly. Check that the image width doesn't exceed the page width minus the left and right margins, and pass an explicit width to doc.add_picture() when it does.
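The width check is simple arithmetic. A minimal sketch, with all values in inches; `fit_image_width` is a hypothetical helper, not a python-docx function.

```python
def fit_image_width(image_width_in: float, page_width_in: float,
                    left_margin_in: float, right_margin_in: float) -> float:
    """Return the image width, capped at the usable width between the margins."""
    usable = page_width_in - left_margin_in - right_margin_in
    if usable <= 0:
        raise ValueError("Margins leave no usable width")
    return min(image_width_in, usable)
```

With python-docx, the page width and margins could be read from `doc.sections[0]` (`page_width`, `left_margin`, `right_margin`, each with an `.inches` property), and the result passed as `doc.add_picture(path, width=Inches(result))`. Setting only the width lets the height scale proportionally.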
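Merged cells break table extraction: python-docx has limited support for merged cells. A merged cell is returned once for every grid position it spans, so its text repeats in row.cells. Note that cell.merge() is a method for creating merges, not a way to detect them. Skip repeats by comparing each cell's underlying XML element (the private cell._tc attribute), or use pandoc for table extraction.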
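One way to skip the repeated references is to compare the XML element behind each cell. The sketch below relies on `_tc`, a private python-docx attribute that may change between versions, so treat it as an assumption and not as supported API. `unique_cell_texts` is a hypothetical helper.

```python
from typing import Iterable, List


def unique_cell_texts(cells: Iterable) -> List[str]:
    """Return cell texts, skipping cells that share an underlying element."""
    seen = []   # keep references alive so identity comparison stays valid
    texts = []
    for cell in cells:
        tc = cell._tc  # private attribute: the cell's underlying XML element
        if any(tc is earlier for earlier in seen):
            continue
        seen.append(tc)
        texts.append(cell.text)
    return texts
```

Called as `unique_cell_texts(row.cells)` it collapses horizontally merged cells within one row; vertical merges repeat across rows, so deduplicating those would need the cells of the whole table in one call.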