Ultimate Metadata Extraction Engine
Professional-grade skill designed to extract and analyze file metadata for forensic purposes. Built for Claude Code with best practices and real-world patterns.
Metadata Extraction Engine
Comprehensive metadata extraction and analysis toolkit that reads, parses, and reports on metadata from files, images, documents, audio, video, and web resources for forensics, cataloging, and compliance.
When to Use This Skill
Choose Metadata Extraction when:
- Auditing files for sensitive information leakage (author names, GPS coordinates)
- Cataloging media libraries with consistent metadata
- Verifying document authenticity and modification history
- Extracting EXIF data from images for geolocation or timestamp analysis
- Cleaning metadata before publishing or sharing files
Consider alternatives when:
- You need to modify file content, not just metadata
- Working with database records rather than files
- You need real-time metadata streaming from live sources
Quick Start
```bash
# Activate the skill
claude skill activate ultimate-metadata-extraction-engine

# Analyze image EXIF data
claude "Extract all metadata from photo.jpg including GPS coordinates"

# Audit documents for sensitive metadata
claude "Scan all PDFs in /docs/ for author information and revision history"
```
Example Extraction
```bash
# ExifTool: comprehensive metadata extraction
exiftool -all photo.jpg

# Extract GPS coordinates from images
exiftool -gpslatitude -gpslongitude -gpsaltitude photo.jpg

# Extract metadata from a PDF
exiftool -Author -Creator -Producer -CreateDate -ModifyDate document.pdf

# Bulk extract from a directory
exiftool -csv -r /photos/ > metadata_report.csv

# Strip all metadata from files before sharing
# Note: for PDFs, ExifTool writes an incremental update, so the original
# metadata remains recoverable until the file is rewritten (e.g. with qpdf)
exiftool -all= -overwrite_original sensitive_document.pdf
```
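The bulk `-csv` report can be post-processed to flag which files carry location data. A minimal standard-library sketch, assuming the report was produced without `-G`, so GPS columns are named `GPSLatitude`, `GPSLongitude`, and `GPSPosition`; `find_files_with_gps` is a hypothetical helper name.

```python
import csv
from typing import Iterable, List

# Column names as written by `exiftool -csv` without group prefixes (assumption)
GPS_COLUMNS = ("GPSLatitude", "GPSLongitude", "GPSPosition")


def find_files_with_gps(report: Iterable[str]) -> List[str]:
    """Return SourceFile paths whose GPS columns are non-empty.

    `report` can be an open text file or any iterable of CSV lines.
    """
    flagged = []
    for row in csv.DictReader(report):
        # Empty cells mean the file had no such tag; missing columns yield None
        if any((row.get(col) or "").strip() for col in GPS_COLUMNS):
            flagged.append(row["SourceFile"])
    return flagged
```

Usage: `with open("metadata_report.csv", newline="", encoding="utf-8") as fh: print(find_files_with_gps(fh))`.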
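```python
# Python metadata extraction with Pillow
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

EXIF_IFD = 0x8769  # pointer to the Exif sub-IFD (DateTimeOriginal, FNumber, ISO, ...)
GPS_IFD = 0x8825   # pointer to the GPS sub-IFD


def extract_image_metadata(image_path: str) -> dict:
    # Use the public getexif() API rather than the private _getexif(),
    # and close the file handle when done
    with Image.open(image_path) as img:
        exif = img.getexif()
        if not exif:
            return {"error": "No EXIF data found"}

        metadata = {}
        # IFD0 tags (Make, Model, Software, ...); skip the raw sub-IFD pointers
        for tag_id, value in exif.items():
            if tag_id not in (EXIF_IFD, GPS_IFD):
                metadata[TAGS.get(tag_id, tag_id)] = str(value)

        # Exif sub-IFD tags are not returned at the top level of getexif()
        for tag_id, value in exif.get_ifd(EXIF_IFD).items():
            metadata[TAGS.get(tag_id, tag_id)] = str(value)

        # GPS values stay raw (degree/minute/second rationals) for later conversion
        gps = {GPSTAGS.get(tag_id, tag_id): value
               for tag_id, value in exif.get_ifd(GPS_IFD).items()}
        if gps:
            metadata["GPSInfo"] = gps

    return metadata
```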
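EXIF stores latitude and longitude as degree/minute/second values plus an N/S or E/W reference, so the raw `GPSInfo` dictionary still needs converting for the `decimal` GPS format. A minimal sketch, assuming the Pillow GPSTAGS key names `GPSLatitude`, `GPSLatitudeRef`, `GPSLongitude`, and `GPSLongitudeRef`, with the reference stored as a string such as `"N"`; the helper names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus a hemisphere ref to signed decimal degrees."""
    # float() also accepts Pillow's IFDRational values
    degrees, minutes, seconds = (float(part) for part in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative in decimal notation
    return -value if ref.strip().upper() in ("S", "W") else value


def gps_to_decimal(gps: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a GPSInfo dict, or None if either axis is missing."""
    try:
        lat = dms_to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
        lon = dms_to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    except KeyError:
        return None
    return lat, lon
```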
Core Concepts
Supported Metadata Types
| File Type | Metadata Standard | Key Fields |
|---|---|---|
| Images (JPEG, PNG, TIFF) | EXIF, IPTC, XMP | Camera model, GPS, date, aperture, ISO |
| PDF Documents | PDF Info, XMP | Author, creator app, creation/mod dates, revision count |
| Office Documents | OOXML, OLE | Author, company, last editor, revision, template |
| Audio (MP3, FLAC, WAV) | ID3, Vorbis comments, RIFF INFO | Artist, album, track, year, genre, duration |
| Video (MP4, MKV, AVI) | QuickTime, Matroska, RIFF | Resolution, codec, duration, creation date, GPS |
| HTML/Web | Meta tags, OpenGraph | Title, description, keywords, OG image, canonical |
Metadata Analysis Use Cases
| Use Case | What to Extract | Why |
|---|---|---|
| Privacy Audit | GPS, author, device IDs, edit history | Prevent PII leakage in published files |
| Authentication | Creation dates, software versions, modification history | Verify document provenance |
| Cataloging | Title, description, tags, dates, dimensions | Build searchable media libraries |
| Forensics | All available metadata including hidden/deleted | Incident investigation evidence |
| SEO | Title, description, Open Graph, structured data | Optimize web page metadata |
| Compliance | Author, dates, access controls, classification | Meet regulatory requirements |
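For the privacy-audit case, a flat metadata dictionary (such as the one `extract_image_metadata` returns) can be screened against a list of identifying field names. A minimal sketch; `SENSITIVE_FIELDS` is an illustrative, non-exhaustive assumption drawn from common EXIF and document-property names, not a complete PII list.

```python
from typing import Dict, List

# Illustrative, non-exhaustive field names that often identify people, devices, or places
SENSITIVE_FIELDS = {
    "GPSInfo", "GPSLatitude", "GPSLongitude", "GPSPosition",
    "Artist", "Author", "Creator", "LastModifiedBy", "Company",
    "CameraOwnerName", "BodySerialNumber", "SerialNumber", "HostComputer",
}


def find_sensitive_fields(metadata: Dict[str, object]) -> List[str]:
    """Return sensitive field names present with a non-empty value, sorted for stable reports."""
    return sorted(
        name for name, value in metadata.items()
        if name in SENSITIVE_FIELDS and value not in (None, "", {})
    )
```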
Configuration
| Parameter | Description | Default |
|---|---|---|
| recursive | Scan directories recursively | true |
| file_types | File extensions to process | ["*"] (all) |
| output_format | Output format: json, csv, markdown, xml | json |
| include_binary | Include binary/hex metadata fields | false |
| gps_format | GPS coordinate format: dms, decimal | decimal |
| strip_mode | What to strip: none, all, gps, author, custom | none |
| hash_files | Compute file hashes during extraction | true |
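These parameters map naturally onto a typed settings object. A sketch, assuming the skill's options can be represented this way; `ExtractionConfig` is a hypothetical name, and the validation rules only restate the allowed values from the table.

```python
from dataclasses import dataclass, field
from typing import List

OUTPUT_FORMATS = {"json", "csv", "markdown", "xml"}
GPS_FORMATS = {"dms", "decimal"}
STRIP_MODES = {"none", "all", "gps", "author", "custom"}


@dataclass
class ExtractionConfig:
    recursive: bool = True
    file_types: List[str] = field(default_factory=lambda: ["*"])  # "*" means all extensions
    output_format: str = "json"
    include_binary: bool = False
    gps_format: str = "decimal"
    strip_mode: str = "none"  # extraction only; nothing is removed
    hash_files: bool = True

    def __post_init__(self) -> None:
        # Reject values outside the documented option sets early
        for name, allowed in (("output_format", OUTPUT_FORMATS),
                              ("gps_format", GPS_FORMATS),
                              ("strip_mode", STRIP_MODES)):
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r}; expected one of {sorted(allowed)}")
```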
Best Practices
- Always strip metadata before publishing files externally. Smartphone photos can embed GPS coordinates, device serial numbers, and owner names; documents can carry author names, revision history, and printer details. Remove all non-essential metadata before sharing.
- Use ExifTool for cross-format consistency. ExifTool reads and writes metadata across a wide range of file formats through a single interface. Format-specific tools create inconsistent workflows and can miss metadata that crosses format boundaries (for example, XMP embedded in both images and PDFs).
- Preserve original files when stripping metadata. Work on copies: by default ExifTool keeps a backup named `<file>_original`, so use the `-overwrite_original` flag only after you've verified that the stripped version is correct. Some workflows depend on metadata whose role is not obvious.
- Validate GPS coordinates for plausibility. Extracted GPS data can be inaccurate because of poor signal or cached locations, and timestamps can be shifted by timezone mismatches (EXIF date fields carry no timezone unless offset tags such as `OffsetTimeOriginal` are present). For forensic work, cross-reference with other evidence such as photo content, timestamps, and cell tower data.
- Check for hidden metadata streams. Files can contain multiple metadata blocks (EXIF, IPTC, XMP) with conflicting information, and some metadata survives standard stripping. Use `exiftool -a -G1 -all:all` to list every tag with its group (`-a` keeps duplicate tags that would otherwise be suppressed) and `-all:all=` to remove all writable metadata.
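To see which metadata blocks a JPEG actually carries (for example, EXIF and XMP side by side), its APPn segments can be walked directly. A minimal standard-library sketch, assuming a well-formed JPEG; it recognizes only the EXIF, XMP, ICC, and Photoshop/IPTC signatures, labels anything else by its APPn number, and stops at the start-of-scan marker.

```python
import struct
from typing import List

# Signatures that identify common metadata payloads inside APPn segments
SIGNATURES = {
    (0xE1, b"Exif\x00\x00"): "EXIF",
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00"): "XMP",
    (0xE2, b"ICC_PROFILE\x00"): "ICC",
    (0xED, b"Photoshop 3.0\x00"): "IPTC/Photoshop",
}


def list_jpeg_metadata_blocks(data: bytes) -> List[str]:
    """Return a label for each APPn segment that precedes the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    blocks, pos = [], 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: entropy-coded image data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # length includes its own 2 bytes
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            label = next((name for (m, sig), name in SIGNATURES.items()
                          if m == marker and payload.startswith(sig)),
                         f"APP{marker - 0xE0}")
            blocks.append(label)
        pos += 2 + length
    return blocks
```

Usage: `with open("photo.jpg", "rb") as fh: print(list_jpeg_metadata_blocks(fh.read()))`. Running it before and after stripping shows which blocks were actually removed.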
Common Issues
Metadata extraction returns empty results despite file containing data. The file may use a non-standard metadata location or encoding. Try ExifTool with -u (unknown tags) and -G (group names) flags. Some applications embed metadata in proprietary formats that require specific parsers. Also check if the file has been pre-stripped by a CDN or upload service.
GPS coordinates appear incorrect or offset by hundreds of meters. Check the GPS datum and coordinate reference fields. Older devices may use different datums (NAD27 vs WGS84), and some cameras record the location where the device last had GPS lock rather than the actual photo location. Indoor photos often have cached outdoor coordinates.
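A quick plausibility pass can catch some of these problems before coordinates are used as evidence: out-of-range values, the (0, 0) pair that often indicates a missing fix rather than a real location, and large distances from an independently known location. A minimal sketch using the haversine great-circle formula on a spherical Earth (mean radius 6,371 km). That is accurate enough to flag offsets of hundreds of meters, but it is not a datum conversion. The function names and the 500 m threshold are illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def gps_warnings(point: Tuple[float, float],
                 reference: Optional[Tuple[float, float]] = None,
                 max_offset_m: float = 500.0) -> List[str]:
    """Return warnings for a (lat, lon) pair; an empty list means no red flags were found."""
    lat, lon = point
    warnings = []
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        warnings.append("coordinates out of range")
    if lat == 0 and lon == 0:
        warnings.append("(0, 0) often indicates a missing GPS fix")
    # Only compare distances when the point itself looks valid
    if reference is not None and not warnings:
        distance = haversine_m(point, reference)
        if distance > max_offset_m:
            warnings.append(f"{distance:.0f} m from reference location")
    return warnings
```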
Stripping metadata corrupts the file or changes its appearance. Some file formats interleave metadata with rendering data. JPEG thumbnails stored in EXIF can be lost during stripping, and ICC color profiles (technically metadata) affect how images display. Strip selectively — remove identifying information while preserving color profiles, orientation tags, and format-essential metadata.
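ExifTool can express this selective approach directly. It deletes everything, then copies back only the tags to keep from the original (`-tagsfromfile @`), and `-o` writes the result to a new file so the original stays untouched. A sketch that builds and runs such a command, assuming `exiftool` is on `PATH`. The helper names and the default keep-list (ICC profile and orientation) are illustrative choices, not a complete list of format-essential tags.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_selective_strip_command(src: str, dst: str,
                                  keep: Sequence[str] = ("ICC_Profile", "Orientation")) -> List[str]:
    """Build an exiftool command that removes all metadata except `keep`, writing to `dst`."""
    cmd = ["exiftool", "-all=", "-tagsfromfile", "@"]
    cmd += [f"-{tag}" for tag in keep]  # tags copied back from the source after deletion
    cmd += ["-o", dst, src]             # -o writes a new file instead of modifying src
    return cmd


def selective_strip(src: str, dst: str) -> None:
    # exiftool -o also refuses to overwrite an existing file; fail early with a clear error
    if Path(dst).exists():
        raise FileExistsError(dst)
    subprocess.run(build_selective_strip_command(src, dst), check=True)
```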
Similar Templates
Full-Stack Code Reviewer
Comprehensive code review skill that checks for security vulnerabilities, performance issues, accessibility, and best practices across frontend and backend code.
Test Suite Generator
Generates comprehensive test suites with unit tests, integration tests, and edge cases. Supports Jest, Vitest, Pytest, and Go testing.
Pro Architecture Workspace
Battle-tested skill for architectural decision-making frameworks. Includes structured workflows, validation checks, and reusable patterns for development.