Comprehensive File Uploads
A comprehensive skill for expert handling of file uploads. Includes structured workflows, validation checks, and reusable patterns for security.
Implement secure, performant file upload systems with validation, storage management, and processing pipelines. This skill covers multipart upload handling, file type validation, virus scanning integration, cloud storage backends, image/document processing, and resumable upload protocols.
When to Use This Skill
Choose Comprehensive File Uploads when you need to:
- Build file upload endpoints with proper security validation
- Handle large file uploads with chunking and resumable protocols
- Integrate upload processing with cloud storage (S3, GCS, Azure Blob)
- Implement file type restrictions, size limits, and content scanning
Consider alternatives when:
- You need general file I/O and format conversion (use Ultimate File Framework)
- You need image editing after upload (use image processing libraries)
- You need document parsing from uploaded files (use specialized parsers)
Quick Start
```python
# Express.js-style file upload (Python Flask equivalent)
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
import os
import hashlib
import magic

app = Flask(__name__)
UPLOAD_DIR = '/tmp/uploads'
MAX_SIZE = 10 * 1024 * 1024  # 10MB
ALLOWED_TYPES = {'image/jpeg', 'image/png', 'image/webp', 'application/pdf'}

os.makedirs(UPLOAD_DIR, exist_ok=True)

def validate_file(file_storage):
    """Validate uploaded file for security."""
    # Check size
    file_storage.seek(0, 2)
    size = file_storage.tell()
    file_storage.seek(0)
    if size > MAX_SIZE:
        return False, f"File too large: {size} bytes (max {MAX_SIZE})"

    # Check MIME type from content (not extension)
    header = file_storage.read(2048)
    file_storage.seek(0)
    mime = magic.from_buffer(header, mime=True)
    if mime not in ALLOWED_TYPES:
        return False, f"File type not allowed: {mime}"

    # Check for embedded scripts in images
    if mime.startswith('image/'):
        content = file_storage.read()
        file_storage.seek(0)
        if b'<script' in content.lower() or b'javascript:' in content.lower():
            return False, "File contains embedded scripts"

    return True, "Valid"

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No filename'}), 400

    valid, message = validate_file(file)
    if not valid:
        return jsonify({'error': message}), 400

    # Generate safe filename with hash
    content = file.read()
    file_hash = hashlib.sha256(content).hexdigest()[:16]
    safe_name = secure_filename(file.filename)
    ext = os.path.splitext(safe_name)[1]
    stored_name = f"{file_hash}{ext}"

    filepath = os.path.join(UPLOAD_DIR, stored_name)
    with open(filepath, 'wb') as f:
        f.write(content)

    return jsonify({
        'filename': stored_name,
        'size': len(content),
        'hash': file_hash,
    }), 201
```
Core Concepts
Upload Security Checklist
| Check | Purpose | Implementation |
|---|---|---|
| MIME validation | Prevent disguised executables | python-magic on content bytes |
| Extension whitelist | First-layer file type filter | Check against allowed list |
| Size limits | Prevent storage exhaustion | Check Content-Length and actual size |
| Filename sanitization | Prevent path traversal | secure_filename() or hash-based names |
| Content scanning | Detect embedded malware/scripts | ClamAV or cloud scanning API |
| Rate limiting | Prevent upload flooding | Per-user upload quotas |
| Deduplication | Avoid storing duplicate files | Content-addressable storage (SHA-256) |
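The deduplication row above can be sketched with content-addressable storage: files are named by their SHA-256 digest, so identical content always maps to the same path and is stored once. A minimal stdlib-only sketch (the `CAS_DIR` and `store_file` names are illustrative, not part of this skill's API):

```python
import hashlib
import os
import tempfile

# Illustrative storage root for the sketch
CAS_DIR = os.path.join(tempfile.gettempdir(), "cas-demo")

def store_file(data: bytes) -> str:
    """Store bytes under their SHA-256 digest; duplicate content is stored once."""
    digest = hashlib.sha256(data).hexdigest()
    # Shard into subdirectories by digest prefix to keep directories small
    path = os.path.join(CAS_DIR, digest[:2], digest)
    if not os.path.exists(path):  # duplicate uploads short-circuit here
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return digest
```

Uploading the same bytes twice returns the same digest and writes a single file, which also gives you a natural integrity check (the stored name is the content hash).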
Chunked Upload Handler
```python
import hashlib
import json
import os
import shutil
from pathlib import Path
from typing import Optional

class ChunkedUploadHandler:
    """Handle large file uploads with chunking and resumability."""

    def __init__(self, upload_dir: str, chunk_size: int = 5 * 1024 * 1024):
        self.upload_dir = Path(upload_dir)
        self.chunk_dir = self.upload_dir / '.chunks'
        self.chunk_dir.mkdir(parents=True, exist_ok=True)
        self.chunk_size = chunk_size

    def init_upload(self, filename: str, total_size: int) -> str:
        """Initialize a new chunked upload session."""
        upload_id = hashlib.sha256(
            f"{filename}:{total_size}:{os.urandom(16).hex()}".encode()
        ).hexdigest()[:32]
        session_dir = self.chunk_dir / upload_id
        session_dir.mkdir(exist_ok=True)
        meta = {
            'filename': filename,
            'total_size': total_size,
            'total_chunks': (total_size + self.chunk_size - 1) // self.chunk_size,
            'received_chunks': [],
        }
        (session_dir / 'meta.json').write_text(json.dumps(meta))
        return upload_id

    def upload_chunk(self, upload_id: str, chunk_index: int, chunk_data: bytes) -> dict:
        """Upload a single chunk."""
        session_dir = self.chunk_dir / upload_id
        meta_path = session_dir / 'meta.json'
        if not meta_path.exists():
            raise ValueError("Invalid upload session")
        meta = json.loads(meta_path.read_text())

        chunk_path = session_dir / f"chunk_{chunk_index:06d}"
        chunk_path.write_bytes(chunk_data)

        if chunk_index not in meta['received_chunks']:
            meta['received_chunks'].append(chunk_index)
            meta_path.write_text(json.dumps(meta))

        return {
            'received': len(meta['received_chunks']),
            'total': meta['total_chunks'],
            'complete': len(meta['received_chunks']) == meta['total_chunks'],
        }

    def finalize(self, upload_id: str) -> Optional[str]:
        """Assemble chunks into the final file."""
        session_dir = self.chunk_dir / upload_id
        meta = json.loads((session_dir / 'meta.json').read_text())
        if len(meta['received_chunks']) != meta['total_chunks']:
            return None

        output_path = self.upload_dir / meta['filename']
        with open(output_path, 'wb') as out:
            for i in range(meta['total_chunks']):
                chunk_path = session_dir / f"chunk_{i:06d}"
                out.write(chunk_path.read_bytes())

        # Cleanup chunks
        shutil.rmtree(session_dir)
        return str(output_path)

# handler = ChunkedUploadHandler("/tmp/uploads")
```
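The client side of this protocol can be sketched as a generator that splits a file into fixed-size chunks and, on resume, replays only the chunks the server has not acknowledged. This is a minimal illustration; `iter_chunks` and `resume_from` are hypothetical names, and the chunk size must match the server's:

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # must match the server handler's chunk_size

def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs for each fixed-size chunk of the file."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield index, data
            index += 1

def resume_from(path: str, received, chunk_size: int = CHUNK_SIZE):
    """Yield only the chunks whose indices the server has not acknowledged."""
    received = set(received)
    for index, data in iter_chunks(path, chunk_size):
        if index not in received:
            yield index, data
```

After an interruption, the client would fetch the server's `received_chunks` list and feed it to `resume_from` instead of re-uploading the whole file.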
Configuration
| Parameter | Description | Default |
|---|---|---|
| max_file_size | Maximum upload size | 10MB |
| chunk_size | Size of each upload chunk | 5MB |
| allowed_mime_types | Whitelist of allowed MIME types | Application-specific |
| upload_dir | File storage directory | "./uploads" |
| filename_strategy | Naming strategy (original, hash, uuid) | "hash" |
| scan_uploads | Enable malware scanning | true |
| deduplicate | Content-addressable storage | false |
| max_concurrent | Max simultaneous uploads per user | 3 |
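These parameters could be grouped into one settings object. A minimal sketch using a dataclass — the `UploadConfig` name is illustrative, with field names and defaults mirroring the table above:

```python
from dataclasses import dataclass, field

@dataclass
class UploadConfig:
    """Upload settings; defaults follow the configuration table."""
    max_file_size: int = 10 * 1024 * 1024   # 10MB
    chunk_size: int = 5 * 1024 * 1024       # 5MB
    allowed_mime_types: set = field(
        default_factory=lambda: {"image/jpeg", "image/png"}  # application-specific
    )
    upload_dir: str = "./uploads"
    filename_strategy: str = "hash"         # "original", "hash", or "uuid"
    scan_uploads: bool = True
    deduplicate: bool = False
    max_concurrent: int = 3

    def __post_init__(self):
        # Catch configuration typos early rather than at upload time
        if self.filename_strategy not in {"original", "hash", "uuid"}:
            raise ValueError(f"Unknown filename_strategy: {self.filename_strategy}")
```

Validating in `__post_init__` turns a silent misconfiguration into an immediate startup error.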
Best Practices
- Never trust file extensions — validate MIME type from content. Use `python-magic` to read the file's magic bytes and determine the actual type. A file named `photo.jpg` could be a PHP script. Validate content type independently of the provided filename and Content-Type header.
- Store files with generated names, not user-provided names. User-provided filenames can contain path traversal sequences (`../../etc/passwd`), null bytes, or special characters that cause issues on different filesystems. Use content hashes or UUIDs as storage names and store the original name in metadata.
- Set file size limits at multiple layers. Enforce size limits in the reverse proxy (nginx: `client_max_body_size`), the application server, and application code. Relying on a single layer means a misconfiguration exposes you to storage exhaustion attacks.
- Process uploads asynchronously for large files. Don't block the HTTP request while processing (resizing images, generating thumbnails, scanning for malware). Accept the upload, return a job ID, and process in a background worker. Notify the client via webhook or polling endpoint.
- Implement resumable uploads for files over 5MB. Network interruptions are common for large uploads. Use chunked upload protocols (tus.io, S3 multipart) that allow clients to resume from the last successful chunk rather than re-uploading the entire file.
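The "accept, return a job ID, process in the background" pattern above can be sketched in-process with the standard library. In production this would be a real job queue (Celery, RQ, SQS); the `submit_upload`/`jobs` names here are illustrative:

```python
import queue
import threading
import uuid

jobs: dict = {}                 # job_id -> status, for the client's polling endpoint
work_queue: queue.Queue = queue.Queue()

def submit_upload(filepath: str) -> str:
    """Accept the upload immediately, enqueue processing, and return a job ID."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = "pending"
    work_queue.put((job_id, filepath))
    return job_id

def worker():
    """Background worker: drains the queue until it sees the (None, None) sentinel."""
    while True:
        job_id, filepath = work_queue.get()
        if job_id is None:
            break
        # Placeholder for slow work: thumbnailing, malware scan, transcoding...
        jobs[job_id] = "done"
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The HTTP handler returns the job ID with a 202 status; the client polls a status endpoint (or receives a webhook) until the job reports "done".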
Common Issues
Uploads fail silently for large files — The web server (nginx, Apache) may have a lower size limit than your application. Check `client_max_body_size` in nginx or `LimitRequestBody` in Apache. Also verify that the application framework's request body limit matches.
MIME type detection disagrees with file extension — Some files have ambiguous magic bytes (e.g., XLSX files are ZIP archives). Use a two-step validation: first check the magic bytes for the broad category, then validate the extension against expected extensions for that MIME type.
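The two-step validation described above can be sketched with stdlib-only magic-byte sniffing. In practice `python-magic` would replace the hand-rolled signature table; the `SIGNATURES` and `EXPECTED_EXTENSIONS` mappings below are illustrative and deliberately incomplete:

```python
import os

# Broad category by leading magic bytes (illustrative, not exhaustive)
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",  # also XLSX, DOCX, JAR: ZIP is ambiguous
}

# Step 2: which extensions are acceptable for each sniffed category
EXPECTED_EXTENSIONS = {
    "image/jpeg": {".jpg", ".jpeg"},
    "image/png": {".png"},
    "application/pdf": {".pdf"},
    "application/zip": {".zip", ".xlsx", ".docx"},
}

def sniff_mime(header: bytes) -> str:
    """Step 1: determine the broad category from the file's leading bytes."""
    for sig, mime in SIGNATURES.items():
        if header.startswith(sig):
            return mime
    return "application/octet-stream"

def validate_type(filename: str, header: bytes) -> bool:
    """Accept only when the extension matches the sniffed category."""
    mime = sniff_mime(header)
    ext = os.path.splitext(filename)[1].lower()
    return ext in EXPECTED_EXTENSIONS.get(mime, set())
```

Under this scheme `report.xlsx` with a ZIP header passes, while `photo.jpg` with a ZIP header is rejected even though its magic bytes are valid for some file type.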
Uploaded images contain EXIF GPS data — Photos from phones include GPS coordinates, camera model, and timestamps in EXIF metadata. Strip EXIF data before storing user uploads (e.g., with Pillow: `from PIL import Image; img = Image.open(path); img.save(path, exif=b'')`) to prevent location data leakage.