
PDF Skill

Extract text, form fields, and metadata from PDFs. Merge, split, annotate, and manipulate PDF documents programmatically. Essential for document processing workflows, data extraction pipelines, and automated report generation.

Skill | Anthropic | documentation | v1.0.0 | MIT

Description

This skill covers PDF manipulation, including text extraction, form field reading, page merging and splitting, annotation, and metadata operations. It uses pdf-parse for reading text and pdf-lib for creating and modifying documents.

Instructions

When the user asks you to work with PDF files, follow these patterns:

Reading & Extraction

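```javascript
const fs = require('fs');
const pdfParse = require('pdf-parse');

// Extract text from a PDF. Wrapped in an async function because
// top-level await is not available in CommonJS modules.
async function main() {
  const dataBuffer = fs.readFileSync('document.pdf');
  const data = await pdfParse(dataBuffer);
  console.log(data.text);     // Full text content
  console.log(data.numpages); // Page count
  console.log(data.info);     // Metadata (title, author, etc.)
}

main().catch(console.error);
```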

Creating & Modifying PDFs

```javascript
const fs = require('fs');
const { PDFDocument, rgb, StandardFonts } = require('pdf-lib');

async function main() {
  // Create a new PDF with one US Letter page (612 x 792 points)
  const pdfDoc = await PDFDocument.create();
  const page = pdfDoc.addPage([612, 792]);
  const font = await pdfDoc.embedFont(StandardFonts.Helvetica);

  // Coordinates are measured from the bottom-left corner of the page
  page.drawText('Hello World', {
    x: 50,
    y: 700,
    size: 24,
    font,
    color: rgb(0, 0, 0),
  });

  const pdfBytes = await pdfDoc.save();
  fs.writeFileSync('output.pdf', pdfBytes);
}

main().catch(console.error);
```

Merging PDFs

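```javascript
const fs = require('fs');
const { PDFDocument } = require('pdf-lib');

async function main() {
  const pdfDoc = await PDFDocument.create();
  const pdf1 = await PDFDocument.load(fs.readFileSync('file1.pdf'));
  const pdf2 = await PDFDocument.load(fs.readFileSync('file2.pdf'));

  // copyPages imports the pages (and the resources they use) into pdfDoc;
  // they still have to be added explicitly, in the desired order
  const pages1 = await pdfDoc.copyPages(pdf1, pdf1.getPageIndices());
  const pages2 = await pdfDoc.copyPages(pdf2, pdf2.getPageIndices());
  pages1.forEach((p) => pdfDoc.addPage(p));
  pages2.forEach((p) => pdfDoc.addPage(p));

  fs.writeFileSync('merged.pdf', await pdfDoc.save());
}

main().catch(console.error);
```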

Extracting Form Fields

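```javascript
const fs = require('fs');
const { PDFDocument } = require('pdf-lib');

async function main() {
  const pdfBuffer = fs.readFileSync('form.pdf'); // placeholder path
  const pdfDoc = await PDFDocument.load(pdfBuffer);
  const form = pdfDoc.getForm();
  const fields = form.getFields();

  // Print each field's fully qualified name and its pdf-lib class
  // (e.g. PDFTextField, PDFCheckBox, PDFDropdown)
  fields.forEach((field) => {
    const name = field.getName();
    const type = field.constructor.name;
    console.log(`${name} (${type})`);
  });
}

main().catch(console.error);
```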

Rules

  • Always install dependencies first: npm install pdf-lib pdf-parse
  • Use pdf-parse for text extraction (read-only operations)
  • Use pdf-lib for creation, modification, and form operations
  • Handle encrypted PDFs gracefully: inform the user if a password is needed
  • For large PDFs (>50 pages), process page ranges to manage memory
  • Always validate the PDF buffer before processing: check that it starts with the %PDF- header
  • Write output files to the project directory, and never overwrite originals unless asked

Examples

User: Extract all text from invoice.pdf
Action: Use pdf-parse to read and return full text content

User: Merge these 3 PDFs into one
Action: Use pdf-lib to copy pages from each source into a new document

User: Fill out this PDF form with my data
Action: Load with pdf-lib, get form fields, set values, save
