
Pro YouTube Transcript Fetcher Workshop

Enterprise-ready skill that automates fetching and processing transcripts from video platforms. Built for Claude Code with best practices and real-world patterns.



Automated YouTube video transcript extraction toolkit that fetches captions, subtitles, and auto-generated transcripts from YouTube videos for analysis, summarization, and content repurposing.

When to Use This Skill

Choose YouTube Transcript Fetcher when:

  • Extracting transcripts from YouTube videos for text analysis
  • Creating written content from video transcripts (blog posts, articles)
  • Building searchable indexes of video content
  • Summarizing long-form video content
  • Translating video content through transcript processing

Consider alternatives when:

  • Need real-time speech-to-text — use Whisper or cloud STT services
  • Processing non-YouTube video files — use local transcription tools
  • Need video editing — use video editing software

Quick Start

```bash
# Activate transcript fetcher
claude skill activate pro-youtube-transcript-fetcher-workshop

# Fetch a single video transcript
claude "Get the transcript from https://youtube.com/watch?v=dQw4w9WgXcQ"

# Batch fetch and summarize
claude "Fetch transcripts from this YouTube playlist and create summaries"
```

Example Transcript Extraction

```typescript
// Using the youtube-transcript package (Node.js)
import { YoutubeTranscript } from 'youtube-transcript';

interface TranscriptItem {
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

async function getTranscript(videoId: string) {
  const items = await YoutubeTranscript.fetchTranscript(videoId);

  // Raw transcript with timestamps
  const timestamped: TranscriptItem[] = items.map(item => ({
    text: item.text,
    start: item.offset / 1000, // ms -> seconds
    duration: item.duration / 1000,
  }));

  // Plain text version
  const plainText = items.map(item => item.text).join(' ');

  // Chunked by time segments (5-minute chunks)
  const chunks = chunkByTime(timestamped, 300);

  return { timestamped, plainText, chunks };
}

function chunkByTime(items: TranscriptItem[], chunkSeconds: number) {
  const chunks: { startTime: number; text: string }[] = [];
  let currentChunk = { startTime: 0, texts: [] as string[] };

  for (const item of items) {
    if (item.start - currentChunk.startTime > chunkSeconds) {
      chunks.push({
        startTime: currentChunk.startTime,
        text: currentChunk.texts.join(' '),
      });
      currentChunk = { startTime: item.start, texts: [] };
    }
    currentChunk.texts.push(item.text);
  }

  if (currentChunk.texts.length > 0) {
    chunks.push({
      startTime: currentChunk.startTime,
      text: currentChunk.texts.join(' '),
    });
  }

  return chunks;
}
```

Core Concepts

Transcript Types

| Type | Description | Quality |
| --- | --- | --- |
| Manual Captions | Human-created captions uploaded by creator | Highest |
| Auto-generated | YouTube's speech recognition | Good (varies by accent/topic) |
| Translated | Auto-translated from original language | Medium |
| Community | Contributed by community members | High |
| ASR (fallback) | Automatic speech recognition, less refined | Fair |

Processing Pipeline

| Stage | Action | Output |
| --- | --- | --- |
| Fetch | Retrieve transcript via YouTube API or scraping | Raw transcript items |
| Clean | Remove filler words, fix formatting, merge fragments | Clean text |
| Timestamp | Align text with video timestamps | Timestamped segments |
| Chunk | Split into logical sections (by time, topic, or chapter) | Content chunks |
| Analyze | Extract key topics, speakers, and themes | Analysis report |
| Export | Format for target use case | Markdown, SRT, JSON |
```python
# Python transcript extraction (youtube-transcript-api < 1.0 interface)
from youtube_transcript_api import YouTubeTranscriptApi, NoTranscriptFound

def fetch_transcript(video_id: str, language: str = 'en') -> dict:
    try:
        # Try manual captions first, then auto-generated
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        try:
            transcript = transcript_list.find_manually_created_transcript([language])
        except NoTranscriptFound:
            transcript = transcript_list.find_generated_transcript([language])

        items = transcript.fetch()
        return {
            'video_id': video_id,
            'language': language,
            'is_generated': transcript.is_generated,
            'segments': [
                {
                    'text': item['text'],
                    'start': item['start'],
                    'duration': item['duration'],
                }
                for item in items
            ],
            'full_text': ' '.join(item['text'] for item in items),
        }
    except Exception as e:
        return {'error': str(e), 'video_id': video_id}
```
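The Clean stage of the pipeline can be sketched as a small pure function. This is one plausible implementation, not a library API: it strips bracketed caption markup like `[Music]`, collapses immediate word repeats, and normalizes whitespace:

```python
import re

# Bracketed markup produced by auto-captions, e.g. [Music], [Laughter]
MARKUP = re.compile(r"\[[^\]]+\]")

def clean_segment_text(text: str) -> str:
    """Strip caption markup, collapse immediate word repeats,
    and normalize whitespace (a sketch of the Clean stage)."""
    text = MARKUP.sub(" ", text)
    words = text.split()
    deduped = [
        w for i, w in enumerate(words)
        if i == 0 or w.lower() != words[i - 1].lower()
    ]
    return " ".join(deduped)
```

Sentence merging and punctuation restoration would layer on top of this, typically with a model or rule set tuned to the domain.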

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `language` | Preferred transcript language | `en` |
| `fallback_languages` | Languages to try if preferred unavailable | `["en"]` |
| `prefer_manual` | Prefer manual over auto-generated captions | `true` |
| `include_timestamps` | Include timestamps in output | `true` |
| `chunk_duration` | Segment duration for chunking (seconds) | `300` |
| `output_format` | Output format: `text`, `srt`, `vtt`, `json`, `markdown` | `markdown` |
| `clean_text` | Remove filler words and formatting artifacts | `true` |
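In code, this configuration could be carried as a dataclass whose defaults mirror the table (the `FetcherConfig` name and field types are illustrative, not a published API):

```python
from dataclasses import dataclass, field

@dataclass
class FetcherConfig:
    """Defaults mirror the configuration table above."""
    language: str = "en"
    fallback_languages: list[str] = field(default_factory=lambda: ["en"])
    prefer_manual: bool = True
    include_timestamps: bool = True
    chunk_duration: int = 300  # seconds
    output_format: str = "markdown"  # text | srt | vtt | json | markdown
    clean_text: bool = True
```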

Best Practices

  1. Always prefer manual captions over auto-generated — Manual captions are more accurate, especially for technical content, proper nouns, and domain-specific terminology. Check caption availability before falling back to auto-generated.

  2. Clean auto-generated transcript artifacts — YouTube's auto-captions include repeated words, filler sounds ("[Music]", "[Laughter]"), and missing punctuation. Post-process with text cleaning that merges fragmented sentences and removes markup artifacts.

  3. Chunk transcripts by topic, not just time — Fixed-time chunking splits ideas mid-thought. Use YouTube chapter markers when available, or detect topic boundaries using text similarity between adjacent segments.

  4. Cache transcripts to avoid repeated API calls — YouTube transcripts rarely change after upload. Cache by video ID with indefinite TTL and only re-fetch if processing parameters change.

  5. Respect YouTube's Terms of Service — Use official APIs where possible, implement reasonable rate limiting, and don't use transcripts for competitive content creation without adding substantial original value.
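Practice 4 amounts to a thin caching layer keyed by video ID. A minimal sketch, with the fetch callable injected so the cache stays testable offline (the `cached_fetch` helper and the one-JSON-file-per-video layout are assumptions, not part of any library):

```python
import json
from pathlib import Path

def cached_fetch(video_id: str, fetch, cache_dir: str = ".transcript_cache") -> dict:
    """Cache transcript dicts by video ID with no TTL -- transcripts
    rarely change after upload. `fetch` is any callable taking a
    video ID and returning a JSON-serializable dict."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    path = cache / f"{video_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = fetch(video_id)
    path.write_text(json.dumps(result))
    return result
```

If processing parameters (language, chunking, cleaning) affect what you store, include them in the cache key rather than the raw video ID alone.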

Common Issues

**Video has no available transcript or captions.** Not all videos have captions — creators can disable them, and some video types (music, ambient) don't generate useful auto-captions. Check transcript availability before processing and handle the "no transcript" case gracefully in batch operations.

**Auto-generated captions have poor accuracy for technical content.** Technical terms, acronyms, and product names are frequently mangled by auto-speech-recognition. Build a correction dictionary for domain-specific terms and run a post-processing pass to fix common misrecognitions.
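The correction-dictionary pass can be as simple as case-insensitive phrase replacement. A sketch, with made-up example entries (any real dictionary would be built from your own domain's misrecognitions):

```python
import re

# Domain-specific corrections; these entries are examples only
CORRECTIONS = {
    "cooper netties": "Kubernetes",
    "pie torch": "PyTorch",
    "get hub": "GitHub",
}

def apply_corrections(text: str, corrections: dict[str, str] = CORRECTIONS) -> str:
    """Fix common ASR misrecognitions via whole-phrase,
    case-insensitive replacement."""
    for wrong, right in corrections.items():
        text = re.sub(re.escape(wrong), right, text, flags=re.IGNORECASE)
    return text
```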

**Transcript extraction is blocked by anti-bot measures.** YouTube periodically blocks automated transcript fetching. Use the official YouTube Data API for reliable access, implement exponential backoff on rate limit errors, and rotate request patterns. Library versions matter — keep transcript extraction dependencies updated.
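The backoff part can be sketched generically. This retries any injected fetch callable with exponential delays plus jitter; which exception type signals a rate limit depends on the library you use, so a broad `except` stands in here:

```python
import random
import time

def fetch_with_backoff(fetch, video_id: str,
                       max_attempts: int = 5, base_delay: float = 1.0):
    """Retry `fetch(video_id)` with exponential backoff and jitter.
    Re-raises the last error once max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch(video_id)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to 0.5s of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```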
