PDF Skill
Extract text, form fields, and metadata from PDFs. Merge, split, annotate, and otherwise manipulate PDF documents programmatically. Useful for document processing workflows, data extraction pipelines, and automated report generation.
Description
This skill covers text extraction, form field reading, page merging and splitting, annotation, and metadata operations. It uses pdf-parse for reading and pdf-lib for creating and modifying PDFs.
Instructions
When the user asks you to work with PDF files, follow these patterns:
Reading & Extraction
    const fs = require('fs');
    const pdfParse = require('pdf-parse');

    // Extract text, page count, and metadata from a PDF
    async function extractText(filePath) {
      const dataBuffer = fs.readFileSync(filePath);
      const data = await pdfParse(dataBuffer);
      console.log(data.text);     // Full text content
      console.log(data.numpages); // Page count
      console.log(data.info);     // Metadata (title, author, etc.)
      return data;
    }

    extractText('document.pdf').catch(console.error);
Creating & Modifying PDFs
    const fs = require('fs');
    const { PDFDocument, rgb, StandardFonts } = require('pdf-lib');

    // Create a new single-page PDF and write it to disk
    async function createPdf() {
      const pdfDoc = await PDFDocument.create();
      const page = pdfDoc.addPage([612, 792]); // Letter size, in points
      const font = await pdfDoc.embedFont(StandardFonts.Helvetica);
      page.drawText('Hello World', {
        x: 50,
        y: 700,
        size: 24,
        font,
        color: rgb(0, 0, 0),
      });
      const pdfBytes = await pdfDoc.save();
      fs.writeFileSync('output.pdf', pdfBytes);
    }

    createPdf().catch(console.error);
Merging PDFs
    const fs = require('fs');
    const { PDFDocument } = require('pdf-lib');

    // Copy every page of file1.pdf and file2.pdf, in order, into a new document
    async function mergePdfs() {
      const pdfDoc = await PDFDocument.create();
      const pdf1 = await PDFDocument.load(fs.readFileSync('file1.pdf'));
      const pdf2 = await PDFDocument.load(fs.readFileSync('file2.pdf'));
      const pages1 = await pdfDoc.copyPages(pdf1, pdf1.getPageIndices());
      const pages2 = await pdfDoc.copyPages(pdf2, pdf2.getPageIndices());
      pages1.forEach(p => pdfDoc.addPage(p));
      pages2.forEach(p => pdfDoc.addPage(p));
      fs.writeFileSync('merged.pdf', await pdfDoc.save());
    }

    mergePdfs().catch(console.error);
Extracting Form Fields
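    const fs = require('fs');
    const { PDFDocument } = require('pdf-lib');

    // Print the name and type of every form field in a PDF
    async function listFormFields(pdfBuffer) {
      const pdfDoc = await PDFDocument.load(pdfBuffer);
      const form = pdfDoc.getForm();
      const fields = form.getFields();
      fields.forEach(field => {
        const name = field.getName();
        const type = field.constructor.name; // e.g. PDFTextField, PDFCheckBox
        console.log(`${name} (${type})`);
      });
    }

    // 'form.pdf' is a placeholder path
    listFormFields(fs.readFileSync('form.pdf')).catch(console.error);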
Rules
- Always install dependencies first: npm install pdf-lib pdf-parse
- Use pdf-parse for text extraction (read-only operations)
- Use pdf-lib for creation, modification, and form operations
- Handle encrypted PDFs gracefully; inform the user if a password is needed
- For large PDFs (>50 pages), process page ranges to manage memory
- Always validate the PDF buffer before processing: check for the %PDF header
- Write output files to the project directory; never overwrite originals unless asked
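Three of the rules above (header validation, page ranges, and not overwriting originals) can be sketched with the Node.js standard library alone. This is an illustrative sketch, not a definitive implementation: the helper names `hasPdfHeader`, `pageRanges`, and `safeOutputPath` are hypothetical, the 1-based inclusive range format is an assumption, and the numeric-suffix naming scheme is just one possible way to avoid clobbering an existing file.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: true when the buffer starts with the "%PDF" signature.
function hasPdfHeader(buffer) {
  return (
    Buffer.isBuffer(buffer) &&
    buffer.length >= 4 &&
    buffer.toString('latin1', 0, 4) === '%PDF'
  );
}

// Hypothetical helper: split a page count into 1-based, inclusive
// [start, end] ranges of at most chunkSize pages each.
// pageRanges(120, 50) returns [[1, 50], [51, 100], [101, 120]]
function pageRanges(totalPages, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize - 1, totalPages)]);
  }
  return ranges;
}

// Hypothetical helper: return a path in dir that does not exist yet,
// appending -1, -2, ... to the base name when fileName is already taken.
function safeOutputPath(dir, fileName) {
  const { name, ext } = path.parse(fileName);
  let candidate = path.join(dir, fileName);
  for (let n = 1; fs.existsSync(candidate); n++) {
    candidate = path.join(dir, `${name}-${n}${ext}`);
  }
  return candidate;
}
```

A caller might check `hasPdfHeader(buffer)` before handing the buffer to pdf-parse or pdf-lib, iterate over `pageRanges(pageCount, 50)` when a document is large, and write results to `safeOutputPath(process.cwd(), 'output.pdf')` unless the user has asked to overwrite.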
Examples
User: Extract all text from invoice.pdf
Action: Use pdf-parse to read and return the full text content

User: Merge these 3 PDFs into one
Action: Use pdf-lib to copy pages from each source into a new document

User: Fill out this PDF form with my data
Action: Load with pdf-lib, get form fields, set values, save