
Ultimate Metadata Extraction Engine

Professional-grade skill designed to extract and analyze file metadata for forensic purposes. Built for Claude Code with best practices and real-world patterns.

Skill | Community | security | v1.0.0 | MIT

Metadata Extraction Engine

Comprehensive metadata extraction and analysis toolkit that reads, parses, and reports on metadata from files, images, documents, audio, video, and web resources for forensics, cataloging, and compliance.

When to Use This Skill

Choose Metadata Extraction when:

  • Auditing files for sensitive information leakage (author names, GPS coordinates)
  • Cataloging media libraries with consistent metadata
  • Verifying document authenticity and modification history
  • Extracting EXIF data from images for geolocation or timestamp analysis
  • Cleaning metadata before publishing or sharing files

Consider alternatives when:

  • You need to modify file content, not just metadata
  • You are working with database records rather than files
  • You need real-time metadata streaming from live sources

Quick Start

# Extract image metadata
claude skill activate ultimate-metadata-extraction-engine
# Analyze image EXIF data
claude "Extract all metadata from photo.jpg including GPS coordinates"
# Audit documents for sensitive metadata
claude "Scan all PDFs in /docs/ for author information and revision history"

Example Extraction

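# ExifTool - comprehensive metadata extraction
exiftool -all photo.jpg
# Extract GPS coordinates from images
exiftool -gpslatitude -gpslongitude -gpsaltitude photo.jpg
# Extract metadata from PDF
exiftool -Author -Creator -Producer -CreateDate -ModifyDate document.pdf
# Bulk extract from directory
exiftool -csv -r /photos/ > metadata_report.csv
# Strip all metadata from files before sharing
# (ExifTool edits PDFs by incremental update, so stripped PDF metadata
# stays recoverable from the file; verify before treating it as removed)
exiftool -all= -overwrite_original sensitive_document.pdf

# Python metadata extraction (requires Pillow)
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

EXIF_IFD = 0x8769  # pointer to the Exif sub-IFD (aperture, ISO, ...)
GPS_IFD = 0x8825   # pointer to the GPS sub-IFD

def extract_image_metadata(image_path: str) -> dict:
    # Use the public getexif() API; the private _getexif() is not
    # available for every image format.
    with Image.open(image_path) as img:
        exif_data = img.getexif()
        if not exif_data:
            return {"error": "No EXIF data found"}
        # Merge top-level tags with the Exif sub-IFD, read GPS separately
        tags = {**dict(exif_data), **exif_data.get_ifd(EXIF_IFD)}
        gps_ifd = exif_data.get_ifd(GPS_IFD)
    metadata = {}
    for tag_id, value in tags.items():
        if tag_id in (EXIF_IFD, GPS_IFD):
            continue  # skip the raw sub-IFD pointers
        metadata[TAGS.get(tag_id, tag_id)] = str(value)
    if gps_ifd:
        # Map numeric GPS tag ids to names such as GPSLatitude
        metadata["GPSInfo"] = {GPSTAGS.get(k, k): v for k, v in gps_ifd.items()}
    return metadata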
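The GPSInfo dictionary returned above stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference, while the skill's default gps_format is decimal. The following is a minimal sketch of that conversion, not the skill's own implementation; the helper names dms_to_decimal and gps_to_decimal are hypothetical, and the input is assumed to be keyed by EXIF GPS tag names.

```python
def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) and a hemisphere ref to signed decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in signed decimal notation
    return -decimal if ref in ("S", "W") else decimal


def gps_to_decimal(gps):
    """Return (lat, lon) from a GPSInfo dict keyed by tag name, or None if incomplete."""
    try:
        lat = dms_to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
        lon = dms_to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    except (KeyError, TypeError, ValueError):
        return None
    return lat, lon


print(dms_to_decimal((10, 30, 0), "S"))  # -10.5
```

Returning None for incomplete GPS blocks keeps the caller from treating a missing reference tag as a valid northern or eastern coordinate.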

Core Concepts

Supported Metadata Types

| File Type | Metadata Standard | Key Fields |
|---|---|---|
| Images (JPEG, PNG, TIFF) | EXIF, IPTC, XMP | Camera model, GPS, date, aperture, ISO |
| PDF Documents | PDF Info, XMP | Author, creator app, creation/mod dates, revision count |
| Office Documents | OOXML, OLE | Author, company, last editor, revision, template |
| Audio (MP3, FLAC, WAV) | ID3, Vorbis | Artist, album, track, year, genre, duration |
| Video (MP4, MKV, AVI) | QuickTime, Matroska | Resolution, codec, duration, creation date, GPS |
| HTML/Web | Meta tags, OpenGraph | Title, description, keywords, OG image, canonical |
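For the HTML/Web row, the listed fields can be read with the standard library alone. This is an illustrative sketch, assuming the page source is already available as a string; the class MetaTagParser and the function extract_html_metadata are hypothetical names, not part of the skill.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <title>, <meta name/property> and <link rel="canonical"> values."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:...", classic tags use name="..."
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.meta["canonical"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data.strip()


def extract_html_metadata(html: str) -> dict:
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta
```

Keys are lowercased so that "Description" and "description" land in the same field; if a page repeats a tag, the last occurrence wins.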

Metadata Analysis Use Cases

| Use Case | What to Extract | Why |
|---|---|---|
| Privacy Audit | GPS, author, device IDs, edit history | Prevent PII leakage in published files |
| Authentication | Creation dates, software versions, modification history | Verify document provenance |
| Cataloging | Title, description, tags, dates, dimensions | Build searchable media libraries |
| Forensics | All available metadata including hidden/deleted | Incident investigation evidence |
| SEO | Title, description, Open Graph, structured data | Optimize web page metadata |
| Compliance | Author, dates, access controls, classification | Meet regulatory requirements |
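As a rough illustration of the Privacy Audit row, the sketch below flags fields in an already extracted metadata dictionary that commonly carry personal information. The key list in SENSITIVE_KEYS and the function name find_sensitive_fields are assumptions chosen for the example; a real audit would tune the list to the file types involved.

```python
# Illustrative, non-exhaustive list of field names that often hold PII.
# Names are compared after removing spaces/underscores and lowercasing.
SENSITIVE_KEYS = {
    "gpsinfo", "author", "artist", "creator", "ownername",
    "cameraownername", "serialnumber", "bodyserialnumber",
    "lastmodifiedby", "company",
}


def find_sensitive_fields(metadata: dict) -> list[str]:
    """Return the sorted names of non-empty fields that match SENSITIVE_KEYS."""
    hits = []
    for key, value in metadata.items():
        normalized = str(key).replace(" ", "").replace("_", "").lower()
        if normalized in SENSITIVE_KEYS and value not in (None, "", {}):
            hits.append(str(key))
    return sorted(hits)


report = find_sensitive_fields(
    {"Model": "X100", "Artist": "Jane", "GPSInfo": {"GPSLatitudeRef": "N"}, "Author": ""}
)
print(report)  # ['Artist', 'GPSInfo']
```

Empty values are skipped so that a tag that exists but holds nothing is not reported as a leak.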

Configuration

| Parameter | Description | Default |
|---|---|---|
| recursive | Scan directories recursively | true |
| file_types | File extensions to process | ["*"] (all) |
| output_format | Output format: json, csv, markdown, xml | json |
| include_binary | Include binary/hex metadata fields | false |
| gps_format | GPS coordinate format: dms, decimal | decimal |
| strip_mode | What to strip: all, gps, author, custom | none |
| hash_files | Compute file hashes during extraction | true |
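To make the recursive, file_types, and hash_files parameters concrete, here is a small sketch of how a scanner might select files and hash them. It is an assumption about plausible behavior, not the skill's actual code: the function names iter_files and sha256_of are hypothetical, and SHA-256 is chosen only because the table does not name a hash algorithm.

```python
import hashlib
from pathlib import Path


def iter_files(root, recursive=True, file_types=("*",)):
    """Yield files under root, optionally recursing and filtering by extension."""
    root = Path(root)
    walker = root.rglob("*") if recursive else root.glob("*")
    # Accept "pdf" and ".pdf" alike; "*" means every extension
    wanted = {ext.lower().lstrip(".") for ext in file_types}
    for path in sorted(walker):
        if not path.is_file():
            continue
        if "*" in wanted or path.suffix.lower().lstrip(".") in wanted:
            yield path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large media files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing before any stripping step records the state of the original file, which matters when the extraction output is later used as evidence.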

Best Practices

  1. Always strip metadata before publishing files externally. Images from smartphones embed GPS coordinates, device serial numbers, and owner names. Documents contain author names, revision history, and printer details. Remove all non-essential metadata before sharing.

  2. Use ExifTool for cross-format consistency. ExifTool handles hundreds of file types with a unified interface. Using format-specific tools creates inconsistent workflows and misses metadata that crosses format boundaries (for example, XMP embedded in both images and PDFs).

  3. Preserve original files when stripping metadata. Create copies before removing metadata. By default ExifTool keeps the original as a backup with an _original suffix; the -overwrite_original flag disables that backup, so use it only when you've verified the stripped version is correct. Some workflows depend on metadata that may not be obvious.

  4. Validate GPS coordinates for plausibility. Extracted GPS data can be inaccurate due to poor signal or cached locations, and the accompanying timestamps can be shifted by timezone mismatches. Cross-reference with other evidence (photo content, timestamps, cell tower data) for forensic work.

  5. Check for hidden metadata streams. Files can contain multiple metadata blocks (EXIF, IPTC, XMP) with conflicting information, and some metadata survives standard stripping. Use exiftool -a -u -G1 to list every tag with its group, including duplicates and unknown tags, then run it again after stripping (exiftool -all= or -all:all=) to confirm what remains.
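The copy-first advice in practice 3 can be sketched as a thin wrapper around ExifTool. This is a hedged example rather than the skill's implementation: build_strip_command and strip_copy are hypothetical names, exiftool is assumed to be on PATH, and the -tagsfromfile @ pattern for restoring selected tags (such as ICC_Profile or Orientation) after -all= should be checked against the ExifTool documentation for your version.

```python
import shutil
import subprocess
from pathlib import Path


def build_strip_command(target, keep_tags=()):
    """Build an exiftool command that strips all tags, optionally restoring some."""
    cmd = ["exiftool", "-overwrite_original", "-all="]
    if keep_tags:
        # Copy the listed tags back from the same file ("@") after deletion
        cmd += ["-tagsfromfile", "@"] + [f"-{tag}" for tag in keep_tags]
    cmd.append(str(target))
    return cmd


def strip_copy(source, suffix="_stripped", keep_tags=()):
    """Copy source next to itself and strip metadata from the copy only."""
    source = Path(source)
    target = source.with_name(f"{source.stem}{suffix}{source.suffix}")
    shutil.copy2(source, target)
    # -overwrite_original is safe here because target is already a copy
    subprocess.run(build_strip_command(target, keep_tags), check=True)
    return target
```

A call such as strip_copy("photo.jpg", keep_tags=("ICC_Profile", "Orientation")) would leave photo.jpg untouched and write photo_stripped.jpg, so the stripped result can be compared with the original before anything is published.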

Common Issues

Metadata extraction returns empty results despite file containing data. The file may use a non-standard metadata location or encoding. Try ExifTool with -u (unknown tags) and -G (group names) flags. Some applications embed metadata in proprietary formats that require specific parsers. Also check if the file has been pre-stripped by a CDN or upload service.

GPS coordinates appear incorrect or offset by hundreds of meters. Check the GPS datum and coordinate reference fields. Older devices may use different datums (NAD27 vs WGS84), and some cameras record the location where the device last had GPS lock rather than the actual photo location. Indoor photos often have cached outdoor coordinates.
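A basic plausibility check can catch the most obvious GPS problems before deeper analysis. The sketch below assumes coordinates are already in signed decimal degrees; the function names, the (0, 0) heuristic, and the use of a spherical Earth radius of 6,371 km in the haversine formula are illustrative choices, and the distance helper is only meant for comparing an extracted position with an independently known one.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def check_gps_plausibility(lat, lon):
    """Return a list of human-readable problems; an empty list means no obvious issue."""
    problems = []
    if not -90.0 <= lat <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude out of range")
    if lat == 0 and lon == 0:
        problems.append("coordinates are (0, 0), likely an unset or default value")
    return problems


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

If haversine_m between the extracted position and a corroborated location is several hundred meters or more, that is a cue to inspect the datum field and consider a cached GPS fix, as described above.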

Stripping metadata corrupts the file or changes its appearance. Some file formats interleave metadata with rendering data. JPEG thumbnails stored in EXIF can be lost during stripping, and ICC color profiles (technically metadata) affect how images display. Strip selectively — remove identifying information while preserving color profiles, orientation tags, and format-essential metadata.
