Podcast & Video Note Taker

Summarize podcasts and videos into structured notes with timestamps, key quotes, and action items, working from transcripts. This skill turns long-form audio and video content into scannable, reference-friendly notes that capture the essential insights so you do not have to re-listen. Provide a transcript file or paste the transcript text, and it produces organized notes with timestamped sections, attributed quotes, concept summaries, and actionable takeaways.

Supported Platforms & Integrations

Platform	Integration Type	Features
YouTube	Transcript extraction	Process auto-generated or manual captions into structured notes with video timestamps
Spotify (via transcript)	Text transcript parsing	Parse Spotify podcast transcripts when available through the app's transcript feature
Apple Podcasts	Transcript file import	Import transcript files from Apple Podcasts' built-in transcript feature
Obsidian	Markdown with YAML	Save notes as Obsidian-compatible pages with frontmatter, tags, and backlinks
Notion	Markdown database	Export notes as Notion database entries with properties for search and filtering
Otter.ai	Transcript export	Import AI-generated transcripts from Otter.ai recordings with speaker labels

When to Use

Podcast backlog processing -- have a queue of saved podcast episodes? Process transcripts into 5-minute-read summaries to decide which deserve full listening
YouTube educational content -- extract structured notes from tutorial videos, conference talks, and lecture recordings with timestamps for key moments
Meeting recording summarization -- turn recorded meetings (via Otter.ai or similar) into action-item-focused notes with speaker attribution
Interview content extraction -- pull key quotes, claims, and insights from interview-format podcasts for research or content creation
Course lecture notes -- process recorded university lectures or online course videos into study-ready notes with concept maps
Content research -- when researching a topic across multiple podcasts/videos, create standardized notes for cross-source comparison

Alternatives to consider:

For real-time transcription during live meetings, Otter.ai or Fireflies.ai provide live capture with speaker diarization
For YouTube-specific AI summaries, tools like Summarize.tech or Eightify generate instant video summaries
For podcast-specific notes with audio highlights, Snipd captures and organizes audio clips alongside transcripts

Quick Start


# podcast-video-note-taker.yaml
skill:
  name: podcast-video-note-taker
  description: Summarize podcasts and videos into structured notes
  config:
    default_source: youtube       # youtube, podcast, transcript_file, otter
    summary_depth: detailed       # brief, detailed, comprehensive
    include_timestamps: true
    include_quotes: true
    max_quotes: 8
    include_action_items: true
    speaker_attribution: true
    output_format: obsidian       # obsidian, notion, markdown, json
    storage_path: ~/.claude/media-notes/

Example interaction:

User: Summarize this podcast transcript about productivity systems.
      [pastes transcript of a 45-minute episode]

Assistant: Podcast Notes: "The Productivity Equation" -- EP 147
  Host: Cal Newport | Guest: James Clear | Duration: 45 min

  Key Topics:
  [03:20] Why most productivity systems fail
  [12:45] The relationship between habits and systems
  [24:10] Environment design for deep work
  [35:30] Handling digital distractions
  [41:00] Actionable takeaways and book recommendations

  Core Insights:
  1. Systems beat goals because they provide daily direction
     without the binary pass/fail of goal-based thinking.
  2. Environment design is the highest-leverage productivity
     intervention -- change your space, change your behavior.
  3. Digital minimalism is not about quitting technology but
     about being intentional with tool selection.

  Key Quotes:
  - James Clear [14:22]: "You don't rise to the level of your
    goals, you fall to the level of your systems."
  - Cal Newport [26:15]: "Deep work is not about working more
    hours. It's about working fewer hours with greater intensity."

  Action Items:
  [ ] Audit current digital tools -- keep only those serving
      clear purposes
  [ ] Designate a physical deep work space separate from
      casual computing
  [ ] Implement time blocking for the next 2 weeks as an experiment

  Saved to: ~/.claude/media-notes/productivity-equation-ep147.md

Advanced Configuration

Platform-specific setup for YouTube transcript processing:


youtube:
  transcript_method: auto         # auto, manual, paste
  language: en                    # transcript language preference
  include_chapter_markers: true   # use YouTube chapter markers if available
  timestamp_format: "MM:SS"       # MM:SS or HH:MM:SS
  link_timestamps: true           # create clickable YouTube timestamp links
  thumbnail_download: false       # download video thumbnail for notes

Full parameter reference:

Parameter	Type	Default	Description
`default_source`	string	youtube	Default source type: youtube, podcast, transcript_file, otter
`summary_depth`	string	detailed	Output depth: brief (300 words), detailed (800 words), comprehensive (1500 words)
`include_timestamps`	boolean	true	Add timestamps to section headers and key moments
`include_quotes`	boolean	true	Extract and attribute notable quotes
`max_quotes`	number	8	Maximum quotes to include per summary
`include_action_items`	boolean	true	Extract actionable takeaways
`speaker_attribution`	boolean	true	Attribute quotes and points to specific speakers
`speaker_names`	object	{}	Map speaker labels to real names
`output_format`	string	markdown	Format: obsidian, notion, markdown, json
`include_topic_tags`	boolean	true	Auto-generate topic tags for the content
`link_related_notes`	boolean	true	Cross-reference with previously processed content
`storage_path`	string	~/.claude/media-notes/	Local directory for all notes
`content_type_hint`	string	auto	Content hint: interview, lecture, tutorial, discussion, monologue
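
The defaults and allowed values in the table can be checked before a run. The following is a minimal sketch, assuming the configuration has already been loaded into a plain dictionary; `resolve_config` is a hypothetical helper name, not part of the skill.

```python
# Defaults copied from the parameter reference table.
DEFAULTS = {
    "default_source": "youtube",
    "summary_depth": "detailed",
    "include_timestamps": True,
    "include_quotes": True,
    "max_quotes": 8,
    "include_action_items": True,
    "speaker_attribution": True,
    "speaker_names": {},
    "output_format": "markdown",
    "include_topic_tags": True,
    "link_related_notes": True,
    "storage_path": "~/.claude/media-notes/",
    "content_type_hint": "auto",
}

# Enumerated values copied from the same table.
ALLOWED = {
    "default_source": {"youtube", "podcast", "transcript_file", "otter"},
    "summary_depth": {"brief", "detailed", "comprehensive"},
    "output_format": {"obsidian", "notion", "markdown", "json"},
    "content_type_hint": {
        "auto", "interview", "lecture", "tutorial", "discussion", "monologue",
    },
}


def resolve_config(overrides: dict) -> dict:
    """Merge user overrides onto the defaults and reject invalid values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return config


config = resolve_config({"summary_depth": "brief", "output_format": "obsidian"})
print(config["summary_depth"], config["output_format"], config["max_quotes"])
# brief obsidian 8
```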

Core Concepts

Concept	Description
Timestamped Sectioning	Dividing content into logical sections with start timestamps so you can jump to specific parts of the original recording
Speaker Diarization	Identifying and labeling which speaker said what throughout the transcript for accurate attribution
Insight Density Score	Rating of how information-rich each section is, helping you identify which parts are worth re-listening to vs skipping
Progressive Summarization	Layered note structure: full transcript reference, section summaries, top insights, and single-line takeaway
Cross-Source Linking	Connecting insights across multiple episodes, videos, or talks on the same topic to build comprehensive understanding

  Media Note Processing Pipeline
  ================================

  [Transcript Input] --> [Speaker Identification]
          |                        |
          v                        v
  [Section Segmentation] --> [Timestamp Mapping]
          |                        |
          v                        v
  [Key Insight Extraction] --> [Quote Selection]
          |                        |
          v                        v
  [Action Item Generation] --> [Topic Tagging]
          |                        |
          v                        v
  [Cross-Source Linking] ----> [Formatted Export]
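
The first stages of the pipeline (transcript input, speaker identification, timestamp mapping) could look roughly like the sketch below. It assumes each utterance starts with a line of the form `[MM:SS] Speaker: text`, which is only one of many transcript layouts; the `Utterance` type and `parse_transcript` function are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed line shape: "[MM:SS] Speaker: text" or "[HH:MM:SS] Speaker: text".
LINE_PATTERN = re.compile(
    r"^\[(?P<stamp>\d{1,2}:\d{2}(?::\d{2})?)\]\s*(?P<speaker>[^:]+):\s*(?P<text>.+)$"
)


@dataclass
class Utterance:
    timestamp: str
    speaker: str
    text: str


def parse_transcript(raw: str) -> list[Utterance]:
    """Split a raw transcript into timestamped, speaker-labeled utterances."""
    utterances: list[Utterance] = []
    for line in raw.splitlines():
        line = line.strip()
        match = LINE_PATTERN.match(line)
        if match:
            utterances.append(
                Utterance(match["stamp"], match["speaker"].strip(), match["text"].strip())
            )
        elif utterances and line:
            # A line without a timestamp continues the previous utterance.
            utterances[-1].text += " " + line
    return utterances


# Made-up sample text for illustration only.
sample = """[03:20] Speaker 1: Most productivity systems fail.
They ask for too much upkeep.
[12:45] Speaker 2: Habits are the unit of a system."""

for u in parse_transcript(sample):
    print(f"[{u.timestamp}] {u.speaker}: {u.text}")
```

Later stages (section segmentation, insight extraction, tagging) would operate on the resulting list rather than on raw text.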

Workflow Examples

Scenario 1: YouTube conference talk notes

Input:  "Take notes on this React Conf 2025 talk transcript.
         Speaker: Dan Abramov. Topic: Server Components."
         [pastes 30-minute talk transcript]
Output: Conference Talk Notes: React Server Components Deep Dive
        Speaker: Dan Abramov | Event: React Conf 2025 | 30 min

        Sections:
        [00:00-05:30] Problem statement: client-server waterfall
        [05:30-14:20] Server Components architecture
        [14:20-22:45] Migration strategy from client components
        [22:45-28:00] Performance benchmarks and case studies
        [28:00-30:00] Q&A highlights

        Key Technical Points:
        1. Server Components eliminate client-server waterfalls by
           moving data fetching to the component tree server-side
        2. Bundle size reduction: 30-40% typical for content-heavy pages
        3. Migration path: start with leaf components, work upward

        Code examples preserved at: [14:22], [18:45], [23:10]
        Saved to: ~/.claude/media-notes/react-conf-server-components.md

Scenario 2: Interview podcast with multiple speakers

Input:  "Notes from Tim Ferriss interview with Naval Ravikant.
         Focus on actionable advice." [pastes transcript]
Output: Interview Notes: Tim Ferriss Show #571 -- Naval Ravikant
        Focus: Actionable advice and mental models

        Key exchanges:
        [08:15] Naval on decision-making:
        "If you can't decide, the answer is no. If two paths are
        equally good, take the one that's harder in the short term."

        [22:40] On building wealth vs status:
        Naval: "Seek wealth, not money or status. Wealth is assets
        that earn while you sleep."
        Tim: "How do you distinguish between the two in practice?"
        Naval: "Ask: does this scale without my time?"

        Action Items (filtered for actionable):
        [ ] Apply the "if you can't decide, the answer is no" rule this week
        [ ] Audit income streams: which require your time vs which don't
        [ ] Read "Almanack of Naval Ravikant" for deeper framework

Scenario 3: Batch processing a podcast series

Input:  "I have transcripts from episodes 1-5 of the Huberman Lab
         series on sleep. Process all and create a synthesis."
Output: Batch Processing: Huberman Lab -- Sleep Series (5 episodes)

        Individual notes saved:
        - huberman-sleep-ep1-circadian-rhythm.md
        - huberman-sleep-ep2-light-exposure.md
        - huberman-sleep-ep3-supplements.md
        - huberman-sleep-ep4-temperature.md
        - huberman-sleep-ep5-protocols.md

        Cross-Episode Synthesis:
        Core protocols mentioned across multiple episodes:
        1. Morning sunlight exposure within 30 min of waking (eps 1,2,5)
        2. Temperature drop of 1-3 degrees for sleep onset (eps 1,4,5)
        3. No caffeine within 8 hours of bedtime (eps 3,5)
        4. Consistent wake time > consistent bedtime (eps 1,5)

        Contradictions/nuances flagged:
        - Melatonin: recommended in ep3 at low dose, cautioned in ep5
        Master protocol checklist generated from all 5 episodes.
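
The cross-episode synthesis in Scenario 3 amounts to finding items that recur across per-episode notes. Here is a minimal sketch of that step, assuming each episode's protocols have already been extracted into a set of labels; the `recurring_items` function and the sample labels are illustrative, not the skill's actual data model.

```python
from collections import defaultdict


def recurring_items(items_by_episode, min_episodes=2):
    """Map each item to the sorted episodes that mention it, keeping
    only items found in at least `min_episodes` episodes."""
    episodes_by_item = defaultdict(list)
    for episode in sorted(items_by_episode):
        for item in items_by_episode[episode]:
            episodes_by_item[item].append(episode)
    return {
        item: episodes
        for item, episodes in episodes_by_item.items()
        if len(episodes) >= min_episodes
    }


# Hand-written sample data mirroring Scenario 3; real sets would come
# from the per-episode notes.
protocols = {
    1: {"morning sunlight", "temperature drop", "consistent wake time"},
    2: {"morning sunlight"},
    3: {"caffeine cutoff"},
    4: {"temperature drop", "one-off tip (sample)"},
    5: {"morning sunlight", "temperature drop", "caffeine cutoff",
        "consistent wake time"},
}

for item, episodes in sorted(recurring_items(protocols).items()):
    print(f"{item} (eps {','.join(map(str, episodes))})")
```

Items that appear in only one episode, such as the sample one-off tip, are filtered out of the synthesis.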

Best Practices

Provide speaker names upfront -- transcripts often label speakers as "Speaker 1" or "Host." Providing actual names before processing produces much more useful notes with proper attribution.
Specify your interest focus -- a 2-hour podcast contains many tangents. Telling the skill what you care about ("focus on the marketing strategies" or "skip the personal anecdotes") produces more targeted, useful notes.
Process content the same day you consume it -- if you listened to a podcast and want notes, process the transcript while your memory of the discussion is fresh. This lets you verify the notes capture what you found valuable.
Use cross-source linking for research topics -- when learning about a topic from multiple podcasts or videos, process all transcripts and request a synthesis. The cross-referencing reveals consensus, contradictions, and comprehensive coverage.
Archive with consistent naming conventions -- use a naming pattern like source-topic-date.md to keep your media notes directory searchable. The skill auto-generates filenames but you can override them for consistency with your system.
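
A `source-topic-date.md` pattern is easy to generate consistently. The sketch below shows one possible way to do it; the `slugify` and `note_filename` helpers and the example date are assumptions for illustration, and the skill's own auto-generated filenames may differ.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def note_filename(source: str, topic: str, processed_on: date) -> str:
    """Build a filename following the source-topic-date.md pattern."""
    return f"{slugify(source)}-{slugify(topic)}-{processed_on.isoformat()}.md"


# The date here is an arbitrary example value.
print(note_filename("Huberman Lab", "Sleep: Light Exposure", date(2025, 1, 15)))
# huberman-lab-sleep-light-exposure-2025-01-15.md
```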

Common Issues

Issue: Auto-generated transcripts have many errors
Solution: Focus on the structural summary rather than exact quotes when working with auto-generated transcripts. Flag quotes you plan to use and verify them against the original audio. For critical content, consider using Otter.ai or a professional transcription service for higher accuracy.

Issue: Very long transcripts (2+ hours) produce overwhelming notes
Solution: Use `summary_depth: brief` for initial processing to get a high-level overview. Then request detailed notes for only the sections that interest you. This two-pass approach is more efficient than generating comprehensive notes for content you may only partially care about.

Issue: Speaker identification is inconsistent in the transcript
Solution: Provide a `speaker_names` mapping in your configuration that maps transcript labels to real names. If the transcript lacks speaker labels entirely, describe the speakers in terms that show up in the text ("the host asks the questions, the guest answers at length") and the skill will attempt consistent attribution based on context clues in the text.
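
To show what a `speaker_names` mapping does to a transcript, here is a small sketch that rewrites leading speaker labels and leaves the spoken text untouched. It assumes labels appear at the start of a line, optionally after a bracketed timestamp, and end with a colon; `apply_speaker_names` is a hypothetical helper, not the skill's implementation.

```python
import re


def apply_speaker_names(transcript: str, speaker_names: dict) -> str:
    """Replace line-leading speaker labels with the mapped names."""
    if not speaker_names:
        return transcript
    labels = "|".join(re.escape(label) for label in speaker_names)
    # Group 1: optional "[timestamp] " prefix. Group 2: the label to replace.
    pattern = re.compile(r"^(\[[^\]]*\]\s*)?(" + labels + r"):", re.MULTILINE)
    return pattern.sub(
        lambda m: f"{m.group(1) or ''}{speaker_names[m.group(2)]}:", transcript
    )


# Made-up transcript lines and names for illustration only.
raw = (
    "[00:05] Speaker 1: Welcome back.\n"
    "[00:09] Speaker 2: Thanks, Speaker 1 mentioned this earlier."
)
print(apply_speaker_names(raw, {"Speaker 1": "Host", "Speaker 2": "Guest"}))
# [00:05] Host: Welcome back.
# [00:09] Guest: Thanks, Speaker 1 mentioned this earlier.
```

Only labels at the start of a line are rewritten, so a label quoted inside someone's speech stays as transcribed.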

Privacy & Data Handling

All transcripts, generated notes, quotes, and summaries are stored locally in your storage_path directory (default: ~/.claude/media-notes/). Transcripts you paste into the conversation are processed in the session context and saved only to your local machine. No content is uploaded to YouTube, podcast platforms, or transcription services. YouTube timestamps and links are generated from information you provide and do not involve API calls to YouTube. Otter.ai transcripts are imported from your locally exported files. The skill does not record, download, or stream any audio or video content -- it works exclusively with text transcripts that you provide.
