Ultimate Video Framework

A comprehensive skill for programmatic video creation and processing — covering video generation from scripts, clip assembly with FFmpeg, subtitle overlay, transition effects, thumbnail generation, and automated video pipeline workflows.

When to Use This Skill

Choose Ultimate Video Framework when you need to:

Generate videos programmatically from scripts and assets
Assemble clips with transitions, overlays, and subtitles
Convert, compress, or reformat video files
Create video thumbnails and preview GIFs
Build automated video processing pipelines

Consider alternatives when:

You need AI-generated video content (use an AI video generation tool)
You need real-time video streaming (use a streaming platform skill)
You need video editing with a GUI (use a desktop video editor)

Quick Start


# Basic video operations with FFmpeg

# Convert video format
ffmpeg -i input.mov -c:v libx264 -crf 23 -c:a aac output.mp4

# Trim video (start at 30s, duration 60s)
ffmpeg -i input.mp4 -ss 00:00:30 -t 00:01:00 -c copy trimmed.mp4

# Extract audio from video
ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 2 audio.mp3

# Generate thumbnail at 5 seconds
ffmpeg -i input.mp4 -ss 00:00:05 -vframes 1 thumbnail.jpg


# Video pipeline with MoviePy 1.x
# (MoviePy 2.x removed the moviepy.editor module and renamed several clip methods)
from moviepy.editor import VideoFileClip, AudioFileClip, concatenate_videoclips

def create_video_from_clips(clips_config, output_path):
    """Assemble a video from multiple clips with transitions."""
    clips = []

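    for config in clips_config:
        clip = VideoFileClip(config["path"])
        # Trim when either bound is given; a missing bound keeps that end of the clip
        if "start" in config or "end" in config:
            clip = clip.subclip(config.get("start", 0), config.get("end"))
        if "resize" in config:
            clip = clip.resize(config["resize"])
        clips.append(clip)

    # Concatenate back to back (no crossfade); "compose" tolerates mixed clip sizes
    final = concatenate_videoclips(clips, method="compose")

    # Add background audio if the first clip's config provides it
    if "audio" in clips_config[0]:
        audio = AudioFileClip(clips_config[0]["audio"])
        # Trim the track to the video length, or keep it whole if it is shorter
        final = final.set_audio(audio.subclip(0, min(audio.duration, final.duration)))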

    final.write_videofile(output_path, codec="libx264", audio_codec="aac")

clips = [
    {"path": "intro.mp4", "end": 5},
    {"path": "demo.mp4", "start": 10, "end": 60},
    {"path": "outro.mp4"},
]
create_video_from_clips(clips, "final_video.mp4")

Core Concepts

FFmpeg Common Operations

Operation	Command	Purpose
Convert	`ffmpeg -i in.mov -c:v libx264 out.mp4`	Format conversion
Trim	`ffmpeg -i in.mp4 -ss 0:30 -t 1:00 out.mp4`	Cut segment
Resize	`ffmpeg -i in.mp4 -vf scale=1280:720 out.mp4`	Resolution change
Compress	`ffmpeg -i in.mp4 -crf 28 out.mp4`	Reduce file size
Audio extract	`ffmpeg -i in.mp4 -vn audio.mp3`	Extract audio track
Thumbnail	`ffmpeg -i in.mp4 -ss 5 -vframes 1 thumb.jpg`	Generate thumbnail
GIF	`ffmpeg -i in.mp4 -vf "fps=10,scale=480:-1" out.gif`	Preview GIF
Subtitle burn	`ffmpeg -i in.mp4 -vf subtitles=sub.srt out.mp4`	Burn subtitles
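
When these commands are driven from Python, building them as argument lists avoids shell quoting problems. The following is a minimal sketch that mirrors three rows of the table; the function names `trim_cmd`, `thumbnail_cmd`, and `gif_cmd` are illustrative and not part of any library.

```python
def trim_cmd(src, dst, start, duration):
    """Cut `duration` seconds starting at `start` using stream copy."""
    return ["ffmpeg", "-y", "-ss", str(start), "-i", src,
            "-t", str(duration), "-c", "copy", dst]

def thumbnail_cmd(src, dst, at=5):
    """Grab a single frame at `at` seconds."""
    return ["ffmpeg", "-y", "-ss", str(at), "-i", src, "-frames:v", "1", dst]

def gif_cmd(src, dst, fps=10, width=480):
    """Make a preview GIF; scale=WIDTH:-1 keeps the aspect ratio."""
    return ["ffmpeg", "-y", "-i", src, "-vf", f"fps={fps},scale={width}:-1", dst]

cmd = trim_cmd("in.mp4", "out.mp4", start=30, duration=60)
# Pass the list to subprocess.run(cmd, check=True) to execute it.
```

Each list can be handed directly to `subprocess.run`, so file names containing spaces need no extra escaping.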

Subtitle Overlay


# Burn subtitles into video
# (the filter string stays on one line: a backslash-newline inside double
# quotes would leave the next line's indentation inside the style string)
ffmpeg -i video.mp4 \
  -vf "subtitles=captions.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2,Shadow=1'" \
  -c:a copy output_with_subs.mp4

# Add text overlay (watermark/title)
ffmpeg -i video.mp4 \
  -vf "drawtext=text='Demo Video':fontsize=36:fontcolor=white:x=(w-text_w)/2:y=50:fontfile=/path/to/font.ttf:borderw=2:bordercolor=black" \
  -c:a copy output_with_title.mp4
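
The style overrides above are easier to maintain when the filter string is generated rather than typed by hand. Below is a small sketch; `subtitles_filter` is a hypothetical helper, and it assumes the subtitle path contains no colons, quotes, or backslashes, which would need escaping for the filter parser.

```python
def subtitles_filter(srt_path, **style):
    """Return a -vf value that burns `srt_path` with ASS style overrides."""
    overrides = ",".join(f"{key}={value}" for key, value in style.items())
    return f"subtitles={srt_path}:force_style='{overrides}'"

vf = subtitles_filter("captions.srt", FontName="Arial", FontSize=24, Outline=2)
cmd = ["ffmpeg", "-i", "video.mp4", "-vf", vf, "-c:a", "copy", "output_with_subs.mp4"]
```

The single quotes around the overrides are still required, because the commas would otherwise be read as filter separators.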

Video Assembly Pipeline


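# Automated video assembly from a script
import json
import subprocess
import tempfile
from pathlib import Path

def run_ffmpeg(args):
    """Run ffmpeg; raise CalledProcessError instead of failing silently."""
    subprocess.run(["ffmpeg", "-y", *args], check=True, capture_output=True)

def assemble_video(script_path, output_path):
    """Build a video from a JSON script definition."""
    with open(script_path) as f:
        script = json.load(f)

    width, height = script["width"], script["height"]
    # Every segment gets the same frame rate, codecs, and audio layout so the
    # final stream-copy concatenation is valid.
    encode = ["-r", str(script.get("fps", 30)),
              "-c:v", "libx264", "-crf", "23", "-pix_fmt", "yuv420p",
              "-c:a", "aac", "-ar", "44100", "-ac", "2"]

    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        temp_files = []

        for i, segment in enumerate(script["segments"]):
            temp_out = tmp / f"segment_{i:03d}.mp4"

            if segment["type"] == "clip":
                # Trim and resize clip (the source is assumed to have an audio track)
                run_ffmpeg([
                    "-ss", str(segment.get("start", 0)),
                    "-i", segment["source"],
                    "-t", str(segment.get("duration", 5)),
                    "-vf", f"scale={width}:{height}",
                    *encode, str(temp_out),
                ])
            elif segment["type"] == "title":
                # Generate title card with a silent audio track. The text goes
                # through a file so quotes and colons in it need no escaping
                # (a POSIX-style temp path is assumed).
                text_file = tmp / f"title_{i:03d}.txt"
                text_file.write_text(segment["text"], encoding="utf-8")
                run_ffmpeg([
                    "-f", "lavfi", "-i",
                    f"color=c={segment.get('bg', 'black')}:s={width}x{height}",
                    "-f", "lavfi", "-i", "anullsrc=r=44100:cl=stereo",
                    "-t", str(segment.get("duration", 3)),
                    "-vf", f"drawtext=textfile={text_file}:fontsize=48:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2",
                    *encode, str(temp_out),
                ])
            else:
                raise ValueError(f"Unknown segment type: {segment['type']!r}")

            temp_files.append(temp_out)

        # Create concat list
        concat_file = tmp / "concat_list.txt"
        concat_file.write_text("".join(f"file '{tf}'\n" for tf in temp_files))

        # Concatenate all segments
        run_ffmpeg(["-f", "concat", "-safe", "0",
                    "-i", str(concat_file), "-c", "copy", output_path])

    print(f"Assembled: {output_path}")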
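
The JSON script format is only implied by the keys that `assemble_video` reads. As an illustration, a script definition might look like the sketch below; the file names and values are placeholders, and only keys that the function actually reads are used.

```python
import json

script = {
    "width": 1280,
    "height": 720,
    "segments": [
        {"type": "title", "text": "Demo Video", "duration": 3, "bg": "black"},
        {"type": "clip", "source": "demo.mp4", "start": 10, "duration": 20},
        {"type": "clip", "source": "outro.mp4"},  # falls back to start=0, duration=5
    ],
}

script_json = json.dumps(script, indent=2)

# To run the pipeline, save the text and pass the path:
#   Path("script.json").write_text(script_json)
#   assemble_video("script.json", "final_video.mp4")
```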

Configuration

Parameter	Description	Example
`width`	Output video width	`1920`
`height`	Output video height	`1080`
`fps`	Frames per second	`30`
`codec`	Video codec	`"libx264"` / `"libx265"`
`crf`	Quality (lower = better, 18-28)	`23`
`audio_codec`	Audio codec	`"aac"` / `"libmp3lame"`
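
One way to carry these parameters through a pipeline is a small settings object that translates itself into ffmpeg output options. This is a sketch only; the `OutputConfig` class and its `ffmpeg_args` method are hypothetical names, with defaults taken from the example column above.

```python
from dataclasses import dataclass

@dataclass
class OutputConfig:
    width: int = 1920
    height: int = 1080
    fps: int = 30
    codec: str = "libx264"
    crf: int = 23
    audio_codec: str = "aac"

    def ffmpeg_args(self):
        """Translate the settings into ffmpeg output options."""
        return [
            "-vf", f"scale={self.width}:{self.height}",
            "-r", str(self.fps),
            "-c:v", self.codec, "-crf", str(self.crf),
            "-c:a", self.audio_codec,
        ]

cfg = OutputConfig(width=1280, height=720)
cmd = ["ffmpeg", "-i", "input.mov", *cfg.ffmpeg_args(), "output.mp4"]
```

Keeping the options in one place makes it harder for two segments of the same video to end up with different encoding settings.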

Best Practices

Use CRF for quality control, not bitrate: CRF (Constant Rate Factor) targets a roughly constant perceptual quality and lets the bitrate vary with scene complexity. With libx264, CRF 18 is close to visually lossless, 23 (the default) is good quality, and 28 is acceptable for previews. A fixed bitrate target spends too many bits on simple scenes and too few on complex ones. libx265 uses a different scale, so values do not carry over directly.
Process video segments in parallel: When assembling from multiple clips, encode each segment independently, then concatenate. Running several encodes at once can shorten wall-clock time on multi-core machines, but libx264 is already multi-threaded, so keep the number of simultaneous jobs small.
Generate preview thumbnails before processing full videos: Extract a frame at the midpoint of each clip and review the previews before running a full assembly. This catches wrong clips, bad frames, and sizing issues without waiting for full encoding.
Use -c copy when no re-encoding is needed: When trimming, concatenating, or remuxing without changing quality, codec, or resolution, use stream copy mode (-c copy). Nothing is decoded or encoded, so the job is limited mainly by disk speed. Cuts made this way snap to keyframes, so re-encode when frame-accurate trim points matter. Re-encoding a 2-hour video just to trim 30 seconds wastes significant time.
Always set explicit output resolution: When combining clips from different sources, force all clips to the same resolution with scale=W:H. Mismatched resolutions cause FFmpeg errors or distorted output when the clips are concatenated.
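
The parallel-encoding practice above can be sketched with a thread pool. Threads are enough here because the heavy work happens in the ffmpeg child processes. `encode_all` and its `runner` parameter are hypothetical names introduced for illustration; the input file names are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_checked(cmd):
    """Run one command and raise CalledProcessError if it fails."""
    return subprocess.run(cmd, check=True, capture_output=True)

def encode_all(commands, max_workers=4, runner=run_checked):
    """Run each command (an argument list) concurrently.

    Results are returned in the same order as `commands`, and the first
    failure is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))

commands = [
    ["ffmpeg", "-y", "-i", f"clip_{i}.mp4", "-vf", "scale=1280:720",
     "-c:v", "libx264", "-crf", "23", f"segment_{i:03d}.mp4"]
    for i in range(3)
]
# encode_all(commands, max_workers=2)
```

Because each libx264 process already uses several threads, a small `max_workers` value is usually a safer starting point than one job per core.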

Common Issues

Concatenating clips produces audio sync drift: When clips have different audio sample rates or codec parameters, concatenation can cause gradual audio drift. Re-encode all clips to the same audio parameters before concatenation, for example -c:a aac -ar 44100 -ac 2.

Video plays in some players but not others: A common cause is H.264 video in a pixel format other than yuv420p, which many hardware and browser players do not decode; re-encode with -pix_fmt yuv420p. For web delivery, also set the faststart flag: ffmpeg -i in.mp4 -c copy -movflags +faststart out.mp4. This moves the metadata to the beginning of the file so streaming players can start playback without downloading the entire file.

Subtitle text is too small or unreadable: When burning subtitles, how large a given FontSize appears depends on the video resolution and on the subtitle's own script resolution, so a value tuned on one source can look wrong on another. Set the font size explicitly and check it on a test frame. FontSize=max(24, video_height/30) is a starting formula; force_style takes a literal number, so compute the value in the calling script.
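
The starting formula above has to be evaluated before the filter string is built. A minimal sketch follows; `subtitle_font_size` is a hypothetical helper, and integer division is used so the result can be dropped straight into force_style.

```python
def subtitle_font_size(video_height, minimum=24, divisor=30):
    """Starting-point font size: max(minimum, video_height / divisor)."""
    return max(minimum, video_height // divisor)

size = subtitle_font_size(1080)  # 36
style = f"FontName=Arial,FontSize={size}"
```

Treat the result as a first guess and adjust the divisor after looking at a rendered frame.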
