Pro YouTube Transcript Fetcher Workshop
Enterprise-ready skill that automates fetching and processing transcripts from video platforms. Built for Claude Code with best practices and real-world patterns.
YouTube Transcript Fetcher Workshop
Automated YouTube video transcript extraction toolkit that fetches captions, subtitles, and auto-generated transcripts from YouTube videos for analysis, summarization, and content repurposing.
When to Use This Skill
Choose YouTube Transcript Fetcher when:
- Extracting transcripts from YouTube videos for text analysis
- Creating written content from video transcripts (blog posts, articles)
- Building searchable indexes of video content
- Summarizing long-form video content
- Translating video content through transcript processing
Consider alternatives when:
- Need real-time speech-to-text — use Whisper or cloud STT services
- Processing non-YouTube video files — use local transcription tools
- Need video editing — use video editing software
Quick Start
```bash
# Activate transcript fetcher
claude skill activate pro-youtube-transcript-fetcher-workshop

# Fetch a single video transcript
claude "Get the transcript from https://youtube.com/watch?v=dQw4w9WgXcQ"

# Batch fetch and summarize
claude "Fetch transcripts from this YouTube playlist and create summaries"
```
Example Transcript Extraction
```typescript
// Using the youtube-transcript package (Node.js)
import { YoutubeTranscript } from 'youtube-transcript';

// Normalized transcript item with times in seconds
interface TranscriptItem {
  text: string;
  start: number;
  duration: number;
}

async function getTranscript(videoId: string) {
  const items = await YoutubeTranscript.fetchTranscript(videoId);

  // Raw transcript with timestamps.
  // Assumes offset and duration are reported in milliseconds, as this example
  // does throughout; verify against the installed version of the library.
  const timestamped: TranscriptItem[] = items.map(item => ({
    text: item.text,
    start: item.offset / 1000, // seconds
    duration: item.duration / 1000,
  }));

  // Plain text version
  const plainText = items.map(item => item.text).join(' ');

  // Chunked by time segments (5-minute chunks)
  const chunks = chunkByTime(timestamped, 300);

  return { timestamped, plainText, chunks };
}

function chunkByTime(items: TranscriptItem[], chunkSeconds: number) {
  const chunks: { startTime: number; text: string }[] = [];
  // Start the first window at the first item so a late-starting transcript
  // does not produce an empty leading chunk.
  let currentChunk = { startTime: items[0]?.start ?? 0, texts: [] as string[] };

  for (const item of items) {
    // Close the current chunk once an item starts past its time window
    if (item.start - currentChunk.startTime > chunkSeconds) {
      chunks.push({
        startTime: currentChunk.startTime,
        text: currentChunk.texts.join(' '),
      });
      currentChunk = { startTime: item.start, texts: [] };
    }
    currentChunk.texts.push(item.text);
  }

  // Flush the final, partially filled chunk
  if (currentChunk.texts.length > 0) {
    chunks.push({
      startTime: currentChunk.startTime,
      text: currentChunk.texts.join(' '),
    });
  }

  return chunks;
}
```
Core Concepts
Transcript Types
| Type | Description | Quality |
|---|---|---|
| Manual Captions | Human-created captions uploaded by creator | Highest |
| Auto-generated | YouTube's speech recognition | Good (varies by accent/topic) |
| Translated | Auto-translated from original language | Medium |
| Community | Contributed by community members (YouTube retired this feature in 2020, so expect it only on older videos) | High |
| ASR (fallback) | Automatic speech recognition, less refined | Fair |
Processing Pipeline
| Stage | Action | Output |
|---|---|---|
| Fetch | Retrieve transcript via YouTube API or scraping | Raw transcript items |
| Clean | Remove filler words, fix formatting, merge fragments | Clean text |
| Timestamp | Align text with video timestamps | Timestamped segments |
| Chunk | Split into logical sections (by time, topic, or chapter) | Content chunks |
| Analyze | Extract key topics, speakers, and themes | Analysis report |
| Export | Format for target use case | Markdown, SRT, JSON |
```python
# Python transcript extraction
# Written against the pre-1.0 interface of youtube-transcript-api (static
# list_transcripts, dict-style items). Newer releases may differ, so check
# the version you have installed.
from youtube_transcript_api import NoTranscriptFound, YouTubeTranscriptApi


def fetch_transcript(video_id: str, language: str = 'en') -> dict:
    try:
        # Try manual captions first, then auto-generated
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        try:
            transcript = transcript_list.find_manually_created_transcript([language])
        except NoTranscriptFound:
            transcript = transcript_list.find_generated_transcript([language])

        items = transcript.fetch()
        return {
            'video_id': video_id,
            'language': language,
            'is_generated': transcript.is_generated,
            'segments': [
                {
                    'text': item['text'],
                    'start': item['start'],
                    'duration': item['duration'],
                }
                for item in items
            ],
            'full_text': ' '.join(item['text'] for item in items),
        }
    except Exception as e:
        # Covers disabled captions, unavailable videos, and network failures
        return {'error': str(e), 'video_id': video_id}
```
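The Export stage in the pipeline table can be illustrated with a small SRT writer. This is a minimal sketch, assuming segments shaped like the `segments` list returned above (dicts with `text`, `start`, and `duration` in seconds). The helper names `format_srt_time` and `to_srt` are hypothetical and not part of any library.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["start"] + seg["duration"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)
```

The sketch writes each cue's end time as `start + duration` without adjusting it. If adjacent cues overlap in your data, you may want to clamp each end time to the next cue's start.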
Configuration
| Parameter | Description | Default |
|---|---|---|
| language | Preferred transcript language | en |
| fallback_languages | Languages to try if preferred unavailable | ["en"] |
| prefer_manual | Prefer manual over auto-generated captions | true |
| include_timestamps | Include timestamps in output | true |
| chunk_duration | Segment duration for chunking (seconds) | 300 |
| output_format | Output: text, srt, vtt, json, markdown | markdown |
| clean_text | Remove filler words and formatting artifacts | true |
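One way to carry these parameters through your own scripts is a small settings object. The source does not show how the skill itself receives its configuration, so the `TranscriptConfig` class below is purely illustrative. Only the field names and defaults come from the table; the validation and the `language_order` helper are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptConfig:
    """Hypothetical settings container; defaults copied from the table above."""

    language: str = "en"
    fallback_languages: list[str] = field(default_factory=lambda: ["en"])
    prefer_manual: bool = True
    include_timestamps: bool = True
    chunk_duration: int = 300  # seconds
    output_format: str = "markdown"
    clean_text: bool = True

    def __post_init__(self) -> None:
        # Reject values outside the formats listed in the table.
        allowed = {"text", "srt", "vtt", "json", "markdown"}
        if self.output_format not in allowed:
            raise ValueError(f"unsupported output_format: {self.output_format!r}")
        if self.chunk_duration <= 0:
            raise ValueError("chunk_duration must be positive")

    def language_order(self) -> list[str]:
        """Preferred language first, then fallbacks, with duplicates removed."""
        ordered = [self.language, *self.fallback_languages]
        return list(dict.fromkeys(ordered))
```

For example, `TranscriptConfig(language="de").language_order()` yields the list of language codes to try in order, which can be passed to whichever fetch function you use.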
Best Practices
- Always prefer manual captions over auto-generated ones. Manual captions are more accurate, especially for technical content, proper nouns, and domain-specific terminology. Check whether manual captions exist before falling back to auto-generated ones.
- Clean auto-generated transcript artifacts. YouTube's auto-captions include repeated words, non-speech annotations ("[Music]", "[Laughter]"), and missing punctuation. Post-process with text cleaning that merges fragmented sentences and removes markup artifacts.
- Chunk transcripts by topic, not just time. Fixed-time chunking splits ideas mid-thought. Use YouTube chapter markers when available, or detect topic boundaries using text similarity between adjacent segments.
- Cache transcripts to avoid repeated API calls. YouTube transcripts rarely change after upload. Cache the raw transcript by video ID with an indefinite TTL, and re-fetch only when the fetch parameters (such as language) change.
- Respect YouTube's Terms of Service. Use official APIs where possible, implement reasonable rate limiting, and don't use transcripts for competitive content creation without adding substantial original value.
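The artifact cleanup practice above can be sketched with a few regular expressions. This is a minimal example, not a complete cleaner. The annotation list and the function name `clean_caption_text` are assumptions, and the repeated-word rule will also collapse legitimate repetitions such as "had had".

```python
import re

# Bracketed or parenthesized non-speech cues such as [Music] or (Laughter).
# The cue list is an assumption; extend it with whatever appears in your data.
ANNOTATION = re.compile(
    r"[\[(]\s*(?:music|laughter|applause)\s*[\])]", re.IGNORECASE
)
# A word immediately repeated one or more times, e.g. "the the".
REPEATED_WORD = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)


def clean_caption_text(text: str) -> str:
    """Strip non-speech cues, collapse stuttered words, normalize whitespace."""
    text = ANNOTATION.sub(" ", text)
    text = REPEATED_WORD.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()
```

Restoring punctuation and merging sentence fragments need more than pattern matching and are left out of this sketch.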
Common Issues
Video has no available transcript or captions. Not all videos have captions — creators can disable them, and some video types (music, ambient) don't generate useful auto-captions. Check transcript availability before processing and handle the "no transcript" case gracefully in batch operations.
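For batch jobs, one option is to record failures per video and keep going. The sketch below assumes a fetch function that behaves like `fetch_transcript` above: it returns a dict and reports failure through an `error` key. The name `fetch_batch` is hypothetical.

```python
from typing import Callable


def fetch_batch(
    video_ids: list[str],
    fetch: Callable[[str], dict],
) -> tuple[dict[str, dict], dict[str, str]]:
    """Fetch each video, collecting failures instead of aborting the batch."""
    results: dict[str, dict] = {}
    skipped: dict[str, str] = {}
    for video_id in video_ids:
        try:
            outcome = fetch(video_id)
        except Exception as exc:
            # A fetcher that raises is handled the same way as one that returns an error.
            skipped[video_id] = f"{type(exc).__name__}: {exc}"
            continue
        if "error" in outcome:
            skipped[video_id] = outcome["error"]
        else:
            results[video_id] = outcome
    return results, skipped
```

Returning the skipped IDs with their reasons lets the caller report which videos lacked captions without stopping the whole run.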
Auto-generated captions have poor accuracy for technical content. Technical terms, acronyms, and product names are frequently mangled by auto-speech-recognition. Build a correction dictionary for domain-specific terms and run a post-processing pass to fix common misrecognitions.
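A correction dictionary can be applied in a single regex pass. The entries below are invented examples of misrecognitions, not data from any real transcript, and `build_corrector` is a hypothetical helper. Treat this as a sketch under those assumptions.

```python
import re
from typing import Callable

# Hypothetical entries; build the real list from errors seen in your own transcripts.
CORRECTIONS = {
    "cooper netties": "Kubernetes",
    "pie torch": "PyTorch",
    "post gress": "Postgres",
}


def build_corrector(corrections: dict[str, str]) -> Callable[[str], str]:
    """Return a function that replaces known misrecognitions, ignoring case."""
    if not corrections:
        return lambda text: text
    # Longest phrases first so multi-word entries win over shorter overlaps.
    phrases = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {wrong.lower(): right for wrong, right in corrections.items()}

    def correct(text: str) -> str:
        return pattern.sub(lambda m: lookup[m.group(1).lower()], text)

    return correct
```

Whole-word matching keeps the pass from rewriting substrings of longer words. Entries that are also ordinary words or names still need care, since every occurrence is replaced.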
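Transcript extraction is blocked by anti-bot measures. YouTube periodically blocks automated transcript fetching. The official YouTube Data API is the supported route, but downloading a caption track through it requires OAuth authorization and sufficient permissions on the video, so it is reliable mainly for videos you manage. For other videos, implement exponential backoff on rate-limit errors and avoid bursty request patterns. Library versions matter: keep transcript extraction dependencies updated.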
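Exponential backoff can be sketched as a small retry wrapper. This example uses full jitter (a random delay below a doubling ceiling), which is one common variant rather than anything the source prescribes. The name `with_backoff` and its parameters are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the ceiling on each retry, then wait a random time below it.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

In practice, narrow `retry_on` to the rate-limit or network exceptions your transcript library raises, so that permanent failures such as disabled captions are not retried.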