Dynamic PDF Toolkit
A comprehensive skill for extracting text and tables from PDF files and for merging and annotating them. Built for Claude Code with best practices and real-world patterns.
PDF Toolkit
A document processing skill for generating, manipulating, merging, and extracting data from PDF files using libraries like pdf-lib, PDFKit, PyPDF2, and ReportLab.
When to Use
Choose PDF Toolkit when:
- Generating PDF reports, invoices, and certificates programmatically
- Merging, splitting, and manipulating existing PDF documents
- Extracting text, tables, and metadata from PDF files
- Adding watermarks, headers, footers, and page numbers to PDFs
Consider alternatives when:
- Creating editable documents — use Word or Google Docs format
- Simple text output — use HTML or Markdown
- Interactive forms — use web forms instead of PDF forms
Quick Start
    # Python
    pip install reportlab PyPDF2 pdfplumber

    # Node.js
    npm install pdf-lib pdfkit
    import { PDFDocument, rgb, StandardFonts } from 'pdf-lib';

    // Shape of the data the function reads; inferred from the fields used below
    interface InvoiceData {
      invoiceNumber: string;
      date: string;
      items: { name: string; quantity: number; price: number }[];
      total: number;
    }

    async function createInvoice(data: InvoiceData): Promise<Uint8Array> {
      const pdfDoc = await PDFDocument.create();
      const page = pdfDoc.addPage([595, 842]); // A4 size in points
      const font = await pdfDoc.embedFont(StandardFonts.Helvetica);
      const boldFont = await pdfDoc.embedFont(StandardFonts.HelveticaBold);
      const { height } = page.getSize();

      // Header
      page.drawText('INVOICE', {
        x: 50, y: height - 50, size: 28, font: boldFont, color: rgb(0.2, 0.2, 0.8)
      });
      page.drawText(`Invoice #: ${data.invoiceNumber}`, { x: 50, y: height - 90, size: 12, font });
      page.drawText(`Date: ${data.date}`, { x: 50, y: height - 110, size: 12, font });

      // Line items table
      let yPos = height - 180;
      const headers = ['Item', 'Qty', 'Price', 'Total'];
      const colX = [50, 300, 400, 480];
      headers.forEach((header, i) => {
        page.drawText(header, { x: colX[i], y: yPos, size: 11, font: boldFont });
      });
      yPos -= 5;
      page.drawLine({ start: { x: 50, y: yPos }, end: { x: 545, y: yPos }, thickness: 1 });
      yPos -= 20;

      for (const item of data.items) {
        page.drawText(item.name, { x: colX[0], y: yPos, size: 10, font });
        page.drawText(String(item.quantity), { x: colX[1], y: yPos, size: 10, font });
        page.drawText(`$${item.price.toFixed(2)}`, { x: colX[2], y: yPos, size: 10, font });
        page.drawText(`$${(item.quantity * item.price).toFixed(2)}`, { x: colX[3], y: yPos, size: 10, font });
        yPos -= 20;
      }

      // Total
      yPos -= 10;
      page.drawLine({ start: { x: 400, y: yPos }, end: { x: 545, y: yPos }, thickness: 1 });
      yPos -= 20;
      page.drawText(`Total: $${data.total.toFixed(2)}`, { x: 400, y: yPos, size: 14, font: boldFont });

      // Serialize the document to bytes (Uint8Array)
      return await pdfDoc.save();
    }
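The install step above also pulls in ReportLab, so a Python counterpart of the same invoice may be useful. The following is a minimal sketch, not the skill's own implementation: `invoice_rows` and `create_invoice` are hypothetical names, items are assumed to be dicts with `name`, `quantity`, and `price` keys, and the grand total is summed from the items rather than read from a `total` field.

```python
from typing import Dict, List, Tuple


def invoice_rows(items: List[Dict]) -> Tuple[List[Tuple[str, str, str, str]], float]:
    """Format line items as (name, qty, price, total) strings plus the grand total."""
    rows = []
    grand_total = 0.0
    for item in items:
        line_total = item["quantity"] * item["price"]
        grand_total += line_total
        rows.append((item["name"], str(item["quantity"]),
                     f"${item['price']:.2f}", f"${line_total:.2f}"))
    return rows, grand_total


def create_invoice(path: str, invoice_number: str, date: str, items: List[Dict]) -> None:
    """Draw a one-page invoice with the same layout as the pdf-lib example."""
    # Imported here so invoice_rows() can be used without ReportLab installed.
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfgen import canvas

    width, height = A4
    c = canvas.Canvas(path, pagesize=A4)

    # Header
    c.setFont("Helvetica-Bold", 28)
    c.drawString(50, height - 50, "INVOICE")
    c.setFont("Helvetica", 12)
    c.drawString(50, height - 90, f"Invoice #: {invoice_number}")
    c.drawString(50, height - 110, f"Date: {date}")

    # Column headers and the rule beneath them
    col_x = [50, 300, 400, 480]
    y = height - 180
    c.setFont("Helvetica-Bold", 11)
    for x, header in zip(col_x, ["Item", "Qty", "Price", "Total"]):
        c.drawString(x, y, header)
    c.line(50, y - 5, 545, y - 5)

    # Line items, one row every 20 points
    rows, grand_total = invoice_rows(items)
    y -= 25
    c.setFont("Helvetica", 10)
    for row in rows:
        for x, cell in zip(col_x, row):
            c.drawString(x, y, cell)
        y -= 20

    # Grand total
    c.setFont("Helvetica-Bold", 14)
    c.drawString(400, y - 30, f"Total: ${grand_total:.2f}")
    c.save()
```

Keeping the formatting in `invoice_rows` separate from the drawing code makes the arithmetic easy to check without rendering a PDF. A call such as `create_invoice("invoice.pdf", "INV-001", "2024-01-31", items)` would write the file.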
Core Concepts
PDF Library Comparison
| Library | Language | Best For | Features |
|---|---|---|---|
| pdf-lib | JS/TS | Creating and modifying PDFs | Create, modify, merge, form fill |
| PDFKit | Node.js | Generating complex PDFs | Drawing API, vector graphics |
| ReportLab | Python | Enterprise PDF generation | Complex layouts, charts |
| PyPDF2 | Python | Manipulating existing PDFs | Merge, split, rotate, encrypt |
| pdfplumber | Python | Text and table extraction | Table detection, text position |
| Puppeteer | Node.js | HTML to PDF conversion | Browser rendering |
PDF Manipulation Operations
    import os

    import pdfplumber
    from PyPDF2 import PdfReader, PdfWriter, PdfMerger


    class PDFProcessor:
        def merge_pdfs(self, input_paths, output_path):
            """Concatenate the input files, in order, into one PDF."""
            merger = PdfMerger()
            for path in input_paths:
                merger.append(path)
            merger.write(output_path)
            merger.close()

        def split_pdf(self, input_path, output_dir):
            """Write each page to its own file: page_1.pdf, page_2.pdf, ..."""
            os.makedirs(output_dir, exist_ok=True)
            reader = PdfReader(input_path)
            for i, page in enumerate(reader.pages):
                writer = PdfWriter()
                writer.add_page(page)
                with open(os.path.join(output_dir, f"page_{i+1}.pdf"), 'wb') as f:
                    writer.write(f)

        def add_watermark(self, input_path, watermark_path, output_path):
            """Overlay the first page of watermark_path onto every page."""
            reader = PdfReader(input_path)
            watermark = PdfReader(watermark_path).pages[0]
            writer = PdfWriter()
            for page in reader.pages:
                page.merge_page(watermark)
                writer.add_page(page)
            with open(output_path, 'wb') as f:
                writer.write(f)

        def extract_text(self, input_path):
            """Return each page's text followed by its table rows, pipe-separated."""
            with pdfplumber.open(input_path) as pdf:
                text = ''
                for page in pdf.pages:
                    # extract_text() may return None when a page has no text
                    text += (page.extract_text() or '') + '\n'
                    tables = page.extract_tables()
                    for table in tables:
                        for row in table:
                            # Empty cells come back as None; render them as blanks
                            text += ' | '.join('' if cell is None else str(cell) for cell in row) + '\n'
            return text
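Splitting often means pulling out a chosen set of pages rather than every page. The sketch below shows one way to do that with the same PyPDF2 reader and writer classes. The `parse_page_ranges` and `extract_pages` helpers and the `"1-3,5"` spec syntax are assumptions made for illustration, not part of the toolkit.

```python
from typing import List


def parse_page_ranges(spec: str, total_pages: int) -> List[int]:
    """Turn a 1-based spec such as "1-3,5" into sorted 0-based page indexes."""
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        first = int(start)
        last = int(end) if end else first
        if first < 1 or last > total_pages or first > last:
            raise ValueError(f"page range {part!r} is outside 1-{total_pages}")
        indexes.update(range(first - 1, last))
    return sorted(indexes)


def extract_pages(input_path: str, output_path: str, spec: str) -> None:
    """Copy the pages named by spec into a new PDF."""
    # Imported here so parse_page_ranges() works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(output_path, "wb") as f:
        writer.write(f)
```

Validating the spec against the page count before touching the writer means a bad range fails fast instead of producing a partial file.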
Configuration
| Option | Description | Default |
|---|---|---|
| page_size | Page dimensions: A4, letter, legal | "A4" |
| margins | Page margins in points | { top: 72, right: 72, bottom: 72, left: 72 } |
| default_font | Default font family | "Helvetica" |
| font_size | Default font size in points | 12 |
| compression | Enable PDF compression | true |
| metadata | PDF metadata (author, title, subject) | {} |
| encryption | PDF password protection | null |
| dpi | Image resolution for embedded images | 150 |
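To make the defaults concrete, here is a small sketch of how the options in the table might be modeled in Python. `PDFConfig`, `PAGE_SIZES`, and `content_box` are hypothetical names introduced for this example. The page sizes are the usual point dimensions rounded to whole points, and `encryption` is assumed to hold a password string.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Page sizes in points (1 pt = 1/72 inch), rounded to whole points.
PAGE_SIZES: Dict[str, Tuple[int, int]] = {
    "A4": (595, 842),
    "letter": (612, 792),
    "legal": (612, 1008),
}


@dataclass
class PDFConfig:
    """Hypothetical config object mirroring the options table and its defaults."""
    page_size: str = "A4"
    margins: Dict[str, int] = field(
        default_factory=lambda: {"top": 72, "right": 72, "bottom": 72, "left": 72}
    )
    default_font: str = "Helvetica"
    font_size: int = 12
    compression: bool = True
    metadata: Dict[str, str] = field(default_factory=dict)
    encryption: Optional[str] = None  # assumed: a password, or None for no protection
    dpi: int = 150

    def content_box(self) -> Tuple[int, int]:
        """Return the (width, height) left for content after margins."""
        width, height = PAGE_SIZES[self.page_size]
        m = self.margins
        return width - m["left"] - m["right"], height - m["top"] - m["bottom"]


config = PDFConfig(page_size="letter", metadata={"title": "Quarterly report"})
print(config.content_box())
```

With the default 72-point (one inch) margins, the usable area is the page size minus 144 points in each dimension.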
Best Practices
- Use HTML-to-PDF conversion for complex layouts with CSS styling rather than positioning elements manually with coordinates. Tools such as Puppeteer or wkhtmltopdf render HTML/CSS directly and are usually much faster to develop with
- Embed fonts if using non-standard fonts to ensure the PDF renders correctly on any system; relying on system fonts causes display issues when recipients do not have the same fonts installed
- Set proper metadata (title, author, subject, keywords) in every generated PDF to improve accessibility, searchability, and organization
- Optimize file size by compressing images before embedding, using vector graphics where possible, and removing unused embedded resources from manipulated PDFs
- Test PDF output with multiple viewers (Adobe Reader, Chrome, Preview) because rendering differences between viewers can cause layout issues that only appear on specific platforms
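The metadata recommendation above can be sketched with PyPDF2 as follows. `build_metadata` and `copy_with_metadata` are hypothetical helpers. The `/Title`, `/Author`, `/Subject`, and `/Keywords` keys are the standard PDF document-information entries, and `add_metadata` is the PyPDF2 writer method that attaches them.

```python
from typing import Dict, Iterable


def build_metadata(title: str, author: str, subject: str = "",
                   keywords: Iterable[str] = ()) -> Dict[str, str]:
    """Build a PDF document-information dictionary, skipping empty fields."""
    fields = {
        "/Title": title,
        "/Author": author,
        "/Subject": subject,
        "/Keywords": ", ".join(keywords),
    }
    return {key: value for key, value in fields.items() if value}


def copy_with_metadata(input_path: str, output_path: str,
                       metadata: Dict[str, str]) -> None:
    """Copy every page to a new file and attach the metadata."""
    # Imported here so build_metadata() works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.add_metadata(metadata)
    with open(output_path, "wb") as f:
        writer.write(f)
```

A call might look like `copy_with_metadata("report.pdf", "report_tagged.pdf", build_metadata("Q3 Report", "Finance", keywords=["q3", "revenue"]))`.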
Common Issues
Text extraction returning garbled output: Some PDFs use non-standard font encodings or are scanned images without OCR text layers. Check if the PDF has a text layer with pdfplumber, and if not, run OCR with pytesseract on page images extracted from the PDF.
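One possible shape for that check is sketched below. It treats a page as lacking a text layer when pdfplumber returns almost no characters, then renders the page and passes it to pytesseract. The helper names and the 20-character threshold are assumptions for illustration. The check only detects missing text; it cannot tell whether extracted text is garbled by a non-standard font encoding.

```python
from typing import List, Optional


def has_text_layer(page_texts: List[Optional[str]], min_chars: int = 20) -> bool:
    """Return True if the extracted text looks like a real text layer.

    min_chars is an arbitrary threshold chosen for this sketch; scanned
    pages typically yield None or a few stray characters.
    """
    total = sum(len((text or "").strip()) for text in page_texts)
    return total >= min_chars


def extract_text_with_ocr_fallback(path: str, resolution: int = 300) -> str:
    """Extract text with pdfplumber, falling back to OCR for image-only pages."""
    # Imported here so has_text_layer() stays usable without these packages.
    import pdfplumber
    import pytesseract  # also requires the Tesseract binary on the system

    chunks = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            text = page.extract_text()
            if not has_text_layer([text]):
                # Render the page to a PIL image and run OCR on it.
                image = page.to_image(resolution=resolution).original
                text = pytesseract.image_to_string(image)
            chunks.append(text or "")
    return "\n".join(chunks)
```

Because the check runs per page, a document that mixes digital and scanned pages only pays the OCR cost for the scanned ones. Pages with very little genuine text will also be sent to OCR under this threshold.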
Coordinate system confusion: PDF coordinates start from the bottom-left corner with Y increasing upward, which is opposite to most screen coordinate systems. Use helper functions that convert from top-left coordinates to PDF coordinates: pdfY = pageHeight - screenY.
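The conversion can be wrapped in a small helper. This is a minimal sketch with a hypothetical function name. The extra `box_height` term is there because drawing calls for images and rectangles usually position the bottom edge of the box, not the top.

```python
def to_pdf_y(page_height: float, top: float, box_height: float = 0.0) -> float:
    """Convert a distance from the top edge into a PDF y coordinate.

    PDF user space has its origin at the bottom-left corner, so a point
    `top` units below the top edge sits at `page_height - top`. For a box
    such as an image, subtract its height to get the bottom edge.
    """
    return page_height - top - box_height


A4_HEIGHT = 842  # points, matching the [595, 842] page used in Quick Start

# Text baseline 50 pt below the top edge of the page
title_y = to_pdf_y(A4_HEIGHT, 50)

# Bottom edge of a 100 pt tall image whose top is 50 pt below the page top
image_y = to_pdf_y(A4_HEIGHT, 50, box_height=100)

print(title_y, image_y)
```

The Quick Start code applies the same arithmetic inline when it writes `height - 50` for the invoice title.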
Large PDFs causing memory issues: Processing PDFs with hundreds of pages or high-resolution images consumes significant memory. Process pages one at a time instead of loading the entire document, stream output to disk rather than holding it in memory, and reduce image DPI for embedded images.
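A sketch of the page-at-a-time approach for text extraction follows, assuming pdfplumber. `iter_page_text` and `write_chunks_streaming` are hypothetical names. `flush_cache()` is used here to release pdfplumber's cached per-page objects, and how much memory that frees will depend on the document and library version.

```python
from typing import Iterable, Iterator


def iter_page_text(path: str) -> Iterator[str]:
    """Yield the text of one page at a time instead of building one big string."""
    # Imported here so write_chunks_streaming() works without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            yield page.extract_text() or ""
            # Drop pdfplumber's cached objects for this page so they can be
            # garbage collected before the next page is parsed.
            page.flush_cache()


def write_chunks_streaming(chunks: Iterable[str], output_path: str) -> int:
    """Write chunks to disk as they arrive; return the number of chunks written."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for chunk in chunks:
            out.write(chunk + "\n")
            count += 1
    return count
```

Chaining the two, as in `write_chunks_streaming(iter_page_text("large.pdf"), "large.txt")`, keeps at most one page of text in memory at a time because the generator is consumed lazily.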