Comprehensive File Uploads
A comprehensive skill for expert handling of file uploads. Includes structured workflows, validation checks, and reusable patterns for security.
Implement secure, performant file upload systems with validation, storage management, and processing pipelines. This skill covers multipart upload handling, file type validation, virus scanning integration, cloud storage backends, image/document processing, and resumable upload protocols.
When to Use This Skill
Choose Comprehensive File Uploads when you need to:
- Build file upload endpoints with proper security validation
- Handle large file uploads with chunking and resumable protocols
- Integrate upload processing with cloud storage (S3, GCS, Azure Blob)
- Implement file type restrictions, size limits, and content scanning
Consider alternatives when:
- You need general file I/O and format conversion (use Ultimate File Framework)
- You need image editing after upload (use image processing libraries)
- You need document parsing from uploaded files (use specialized parsers)
Quick Start
```python
# Express.js-style file upload (Python Flask equivalent)
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
import os
import hashlib
import magic

app = Flask(__name__)
UPLOAD_DIR = '/tmp/uploads'
MAX_SIZE = 10 * 1024 * 1024  # 10MB
ALLOWED_TYPES = {'image/jpeg', 'image/png', 'image/webp', 'application/pdf'}

os.makedirs(UPLOAD_DIR, exist_ok=True)

def validate_file(file_storage):
    """Validate uploaded file for security."""
    # Check size
    file_storage.seek(0, 2)
    size = file_storage.tell()
    file_storage.seek(0)
    if size > MAX_SIZE:
        return False, f"File too large: {size} bytes (max {MAX_SIZE})"

    # Check MIME type from content (not extension)
    header = file_storage.read(2048)
    file_storage.seek(0)
    mime = magic.from_buffer(header, mime=True)
    if mime not in ALLOWED_TYPES:
        return False, f"File type not allowed: {mime}"

    # Check for embedded scripts in images
    if mime.startswith('image/'):
        content = file_storage.read()
        file_storage.seek(0)
        if b'<script' in content.lower() or b'javascript:' in content.lower():
            return False, "File contains embedded scripts"

    return True, "Valid"

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No filename'}), 400

    valid, message = validate_file(file)
    if not valid:
        return jsonify({'error': message}), 400

    # Generate safe filename with hash
    content = file.read()
    file_hash = hashlib.sha256(content).hexdigest()[:16]
    safe_name = secure_filename(file.filename)
    ext = os.path.splitext(safe_name)[1]
    stored_name = f"{file_hash}{ext}"

    filepath = os.path.join(UPLOAD_DIR, stored_name)
    with open(filepath, 'wb') as f:
        f.write(content)

    return jsonify({
        'filename': stored_name,
        'size': len(content),
        'hash': file_hash,
    }), 201
```
Core Concepts
Upload Security Checklist
| Check | Purpose | Implementation |
|---|---|---|
| MIME validation | Prevent disguised executables | python-magic on content bytes |
| Extension whitelist | First-layer file type filter | Check against allowed list |
| Size limits | Prevent storage exhaustion | Check Content-Length and actual size |
| Filename sanitization | Prevent path traversal | secure_filename() or hash-based names |
| Content scanning | Detect embedded malware/scripts | ClamAV or cloud scanning API |
| Rate limiting | Prevent upload flooding | Per-user upload quotas |
| Deduplication | Avoid storing duplicate files | Content-addressable storage (SHA-256) |
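The deduplication row above can be sketched with content-addressable storage: files are named by their SHA-256 digest, so identical content always maps to the same path and is stored once. A minimal stdlib-only sketch (the `CAS_DIR` and `store_file` names are illustrative, not part of this skill's API):

```python
import hashlib
import os
import tempfile

# Illustrative storage root for the sketch
CAS_DIR = os.path.join(tempfile.gettempdir(), "cas-demo")

def store_file(data: bytes) -> str:
    """Store bytes under their SHA-256 digest; duplicate content is stored once."""
    digest = hashlib.sha256(data).hexdigest()
    # Shard into subdirectories by digest prefix to keep directories small
    path = os.path.join(CAS_DIR, digest[:2], digest)
    if not os.path.exists(path):  # duplicate uploads short-circuit here
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return digest
```

Uploading the same bytes twice returns the same digest and writes a single file, which also gives you a natural integrity check (the stored name is the content hash).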
Chunked Upload Handler
```python
import hashlib
import json
import os
import shutil
from pathlib import Path
from typing import Optional

class ChunkedUploadHandler:
    """Handle large file uploads with chunking and resumability."""

    def __init__(self, upload_dir: str, chunk_size: int = 5 * 1024 * 1024):
        self.upload_dir = Path(upload_dir)
        self.chunk_dir = self.upload_dir / '.chunks'
        self.chunk_dir.mkdir(parents=True, exist_ok=True)
        self.chunk_size = chunk_size

    def init_upload(self, filename: str, total_size: int) -> str:
        """Initialize a new chunked upload session."""
        upload_id = hashlib.sha256(
            f"{filename}:{total_size}:{os.urandom(16).hex()}".encode()
        ).hexdigest()[:32]
        session_dir = self.chunk_dir / upload_id
        session_dir.mkdir(exist_ok=True)
        meta = {
            'filename': filename,
            'total_size': total_size,
            'total_chunks': (total_size + self.chunk_size - 1) // self.chunk_size,
            'received_chunks': [],
        }
        (session_dir / 'meta.json').write_text(json.dumps(meta))
        return upload_id

    def upload_chunk(self, upload_id: str, chunk_index: int, chunk_data: bytes) -> dict:
        """Upload a single chunk."""
        session_dir = self.chunk_dir / upload_id
        meta_path = session_dir / 'meta.json'
        if not meta_path.exists():
            raise ValueError("Invalid upload session")
        meta = json.loads(meta_path.read_text())

        chunk_path = session_dir / f"chunk_{chunk_index:06d}"
        chunk_path.write_bytes(chunk_data)

        if chunk_index not in meta['received_chunks']:
            meta['received_chunks'].append(chunk_index)
            meta_path.write_text(json.dumps(meta))

        return {
            'received': len(meta['received_chunks']),
            'total': meta['total_chunks'],
            'complete': len(meta['received_chunks']) == meta['total_chunks'],
        }

    def finalize(self, upload_id: str) -> Optional[str]:
        """Assemble chunks into the final file."""
        session_dir = self.chunk_dir / upload_id
        meta = json.loads((session_dir / 'meta.json').read_text())
        if len(meta['received_chunks']) != meta['total_chunks']:
            return None

        output_path = self.upload_dir / meta['filename']
        with open(output_path, 'wb') as out:
            for i in range(meta['total_chunks']):
                chunk_path = session_dir / f"chunk_{i:06d}"
                out.write(chunk_path.read_bytes())

        # Cleanup chunks
        shutil.rmtree(session_dir)
        return str(output_path)

# handler = ChunkedUploadHandler("/tmp/uploads")
```
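The client side of this protocol can be sketched as a generator that splits a file into fixed-size chunks and, on resume, replays only the chunks the server has not acknowledged. This is a minimal illustration; `iter_chunks` and `resume_from` are hypothetical names, and the chunk size must match the server's:

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # must match the server handler's chunk_size

def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs for each fixed-size chunk of the file."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield index, data
            index += 1

def resume_from(path: str, received, chunk_size: int = CHUNK_SIZE):
    """Yield only the chunks whose indices the server has not acknowledged."""
    received = set(received)
    for index, data in iter_chunks(path, chunk_size):
        if index not in received:
            yield index, data
```

After an interruption, the client would fetch the server's `received_chunks` list and feed it to `resume_from` instead of re-uploading the whole file.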
Configuration
| Parameter | Description | Default |
|---|---|---|
| max_file_size | Maximum upload size | 10MB |
| chunk_size | Size of each upload chunk | 5MB |
| allowed_mime_types | Whitelist of allowed MIME types | Application-specific |
| upload_dir | File storage directory | "./uploads" |
| filename_strategy | Naming strategy (original, hash, uuid) | "hash" |
| scan_uploads | Enable malware scanning | true |
| deduplicate | Content-addressable storage | false |
| max_concurrent | Max simultaneous uploads per user | 3 |
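These parameters could be grouped into one settings object. A minimal sketch using a dataclass — the `UploadConfig` name is illustrative, with field names and defaults mirroring the table above:

```python
from dataclasses import dataclass, field

@dataclass
class UploadConfig:
    """Upload settings; defaults follow the configuration table."""
    max_file_size: int = 10 * 1024 * 1024   # 10MB
    chunk_size: int = 5 * 1024 * 1024       # 5MB
    allowed_mime_types: set = field(
        default_factory=lambda: {"image/jpeg", "image/png"}  # application-specific
    )
    upload_dir: str = "./uploads"
    filename_strategy: str = "hash"         # "original", "hash", or "uuid"
    scan_uploads: bool = True
    deduplicate: bool = False
    max_concurrent: int = 3

    def __post_init__(self):
        # Catch configuration typos early rather than at upload time
        if self.filename_strategy not in {"original", "hash", "uuid"}:
            raise ValueError(f"Unknown filename_strategy: {self.filename_strategy}")
```

Validating in `__post_init__` turns a silent misconfiguration into an immediate startup error.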
Best Practices
- Never trust file extensions — validate MIME type from content. Use `python-magic` to read the file's magic bytes and determine the actual type. A file named `photo.jpg` could be a PHP script. Validate content type independently of the provided filename and Content-Type header.
- Store files with generated names, not user-provided names. User-provided filenames can contain path traversal sequences (`../../etc/passwd`), null bytes, or special characters that cause issues on different filesystems. Use content hashes or UUIDs as storage names and store the original name in metadata.
- Set file size limits at multiple layers. Enforce size limits in the reverse proxy (nginx: `client_max_body_size`), the application server, and application code. Relying on a single layer means a misconfiguration exposes you to storage exhaustion attacks.
- Process uploads asynchronously for large files. Don't block the HTTP request while processing (resizing images, generating thumbnails, scanning for malware). Accept the upload, return a job ID, and process in a background worker. Notify the client via webhook or polling endpoint.
- Implement resumable uploads for files over 5MB. Network interruptions are common for large uploads. Use chunked upload protocols (tus.io, S3 multipart) that allow clients to resume from the last successful chunk rather than re-uploading the entire file.
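The "accept, return a job ID, process in the background" pattern above can be sketched in-process with the standard library. In production this would be a real job queue (Celery, RQ, SQS); the `submit_upload`/`jobs` names here are illustrative:

```python
import queue
import threading
import uuid

jobs: dict = {}                 # job_id -> status, for the client's polling endpoint
work_queue: queue.Queue = queue.Queue()

def submit_upload(filepath: str) -> str:
    """Accept the upload immediately, enqueue processing, and return a job ID."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = "pending"
    work_queue.put((job_id, filepath))
    return job_id

def worker():
    """Background worker: drains the queue until it sees the (None, None) sentinel."""
    while True:
        job_id, filepath = work_queue.get()
        if job_id is None:
            break
        # Placeholder for slow work: thumbnailing, malware scan, transcoding...
        jobs[job_id] = "done"
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The HTTP handler returns the job ID with a 202 status; the client polls a status endpoint (or receives a webhook) until the job reports "done".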
Common Issues
Uploads fail silently for large files — The web server (nginx, Apache) may have a lower size limit than your application. Check `client_max_body_size` in nginx or `LimitRequestBody` in Apache. Also verify that the application framework's request body limit matches.
MIME type detection disagrees with file extension — Some files have ambiguous magic bytes (e.g., XLSX files are ZIP archives). Use a two-step validation: first check the magic bytes for the broad category, then validate the extension against expected extensions for that MIME type.
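The two-step validation described above can be sketched with stdlib-only magic-byte sniffing. In practice `python-magic` would replace the hand-rolled signature table; the `SIGNATURES` and `EXPECTED_EXTENSIONS` mappings below are illustrative and deliberately incomplete:

```python
import os

# Broad category by leading magic bytes (illustrative, not exhaustive)
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",  # also XLSX, DOCX, JAR: ZIP is ambiguous
}

# Step 2: which extensions are acceptable for each sniffed category
EXPECTED_EXTENSIONS = {
    "image/jpeg": {".jpg", ".jpeg"},
    "image/png": {".png"},
    "application/pdf": {".pdf"},
    "application/zip": {".zip", ".xlsx", ".docx"},
}

def sniff_mime(header: bytes) -> str:
    """Step 1: determine the broad category from the file's leading bytes."""
    for sig, mime in SIGNATURES.items():
        if header.startswith(sig):
            return mime
    return "application/octet-stream"

def validate_type(filename: str, header: bytes) -> bool:
    """Accept only when the extension matches the sniffed category."""
    mime = sniff_mime(header)
    ext = os.path.splitext(filename)[1].lower()
    return ext in EXPECTED_EXTENSIONS.get(mime, set())
```

Under this scheme `report.xlsx` with a ZIP header passes, while `photo.jpg` with a ZIP header is rejected even though its magic bytes are valid for some file type.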
Uploaded images contain EXIF GPS data — Photos from phones include GPS coordinates, camera model, and timestamps in EXIF metadata. Strip EXIF data before storing user uploads (e.g., with Pillow: `from PIL import Image; img = Image.open(path); img.save(path, exif=b'')`) to prevent location data leakage.