
Podcast & Video Note Taker

Summarizes podcasts and videos into structured notes with timestamps, key quotes, and action items from transcripts

Skill | Cliptics | learning, education | v1.0.0 | MIT


This skill transforms long-form audio and video content into scannable, reference-friendly notes that capture the essential insights, so you do not have to re-listen. Provide or paste a transcript, and it produces organized notes with timestamped sections, attributed quotes, concept summaries, and actionable takeaways.

Supported Platforms & Integrations

| Platform | Integration Type | Features |
|---|---|---|
| YouTube | Transcript extraction | Process auto-generated or manual captions into structured notes with video timestamps |
| Spotify (via transcript) | Text transcript parsing | Parse Spotify podcast transcripts when available through the app's transcript feature |
| Apple Podcasts | Transcript file import | Import transcript files from Apple Podcasts' built-in transcript feature |
| Obsidian | Markdown with YAML | Save notes as Obsidian-compatible pages with frontmatter, tags, and backlinks |
| Notion | Markdown database | Export notes as Notion database entries with properties for search and filtering |
| Otter.ai | Transcript export | Import AI-generated transcripts from Otter.ai recordings with speaker labels |
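
As a rough illustration of the Obsidian row above, a note with YAML frontmatter might be assembled as follows. This is a minimal sketch: the helper `render_obsidian_note` and the field names (`title`, `source`, `tags`) are hypothetical and are not the skill's documented output, and titles containing double quotes would need escaping.

```python
def render_obsidian_note(title: str, source: str, tags: list[str], body: str) -> str:
    """Build a Markdown note with a YAML frontmatter block (hypothetical field names)."""
    frontmatter = [
        "---",
        f'title: "{title}"',
        f"source: {source}",
        "tags:",
        *[f"  - {tag}" for tag in tags],
        "---",
    ]
    # Frontmatter must be the very first thing in the file for Obsidian to read it.
    return "\n".join(frontmatter) + "\n\n" + body.strip() + "\n"


note = render_obsidian_note(
    title="The Productivity Equation",
    source="podcast",
    tags=["productivity", "habits"],
    body="## Key Topics\n- [03:20] Why most productivity systems fail",
)
print(note)
```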

When to Use

  • Podcast backlog processing -- have a queue of saved podcast episodes? Process transcripts into 5-minute-read summaries to decide which deserve full listening
  • YouTube educational content -- extract structured notes from tutorial videos, conference talks, and lecture recordings with timestamps for key moments
  • Meeting recording summarization -- turn recorded meetings (via Otter.ai or similar) into action-item-focused notes with speaker attribution
  • Interview content extraction -- pull key quotes, claims, and insights from interview-format podcasts for research or content creation
  • Course lecture notes -- process recorded university lectures or online course videos into study-ready notes with concept maps
  • Content research -- when researching a topic across multiple podcasts/videos, create standardized notes for cross-source comparison

Alternatives to consider:

  • For real-time transcription during live meetings, Otter.ai or Fireflies.ai provide live capture with speaker diarization
  • For YouTube-specific AI summaries, tools like Summarize.tech or Eightify generate instant video summaries
  • For podcast-specific notes with audio highlights, Snipd captures and organizes audio clips alongside transcripts

Quick Start

# podcast-video-note-taker.yaml
skill:
  name: podcast-video-note-taker
  description: Summarize podcasts and videos into structured notes
  config:
    default_source: youtube        # youtube, podcast, transcript_file
    summary_depth: detailed        # brief, detailed, comprehensive
    include_timestamps: true
    include_quotes: true
    max_quotes: 8
    include_action_items: true
    speaker_attribution: true
    output_format: obsidian        # obsidian, notion, markdown, json
    storage_path: ~/.claude/media-notes/

Example interaction:

User: Summarize this podcast transcript about productivity systems.
      [pastes transcript of a 45-minute episode]

Assistant: Podcast Notes: "The Productivity Equation" -- EP 147
  Host: Cal Newport | Guest: James Clear | Duration: 45 min

  Key Topics:
  [03:20] Why most productivity systems fail
  [12:45] The relationship between habits and systems
  [24:10] Environment design for deep work
  [35:30] Handling digital distractions
  [41:00] Actionable takeaways and book recommendations

  Core Insights:
  1. Systems beat goals because they provide daily direction
     without the binary pass/fail of goal-based thinking.
  2. Environment design is the highest-leverage productivity
     intervention -- change your space, change your behavior.
  3. Digital minimalism is not about quitting technology but
     about being intentional with tool selection.

  Key Quotes:
  - James Clear [14:22]: "You don't rise to the level of your
    goals, you fall to the level of your systems."
  - Cal Newport [26:15]: "Deep work is not about working more
    hours. It's about working fewer hours with greater intensity."

  Action Items:
  [ ] Audit current digital tools -- keep only those serving
      clear purposes
  [ ] Designate a physical deep work space separate from
      casual computing
  [ ] Implement time blocking for the next 2 weeks as an experiment

  Saved to: ~/.claude/media-notes/productivity-equation-ep147.md

Advanced Configuration

Platform-specific setup for YouTube transcript processing:

youtube:
  transcript_method: auto          # auto, manual, paste
  language: en                     # transcript language preference
  include_chapter_markers: true    # use YouTube chapter markers if available
  timestamp_format: "MM:SS"        # MM:SS or HH:MM:SS
  link_timestamps: true            # create clickable YouTube timestamp links
  thumbnail_download: false        # download video thumbnail for notes
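
To make the `link_timestamps` option concrete, here is a minimal sketch of how a timestamp in either supported format could become a clickable link. The function names are hypothetical and `VIDEO_ID` is a placeholder; the only outside fact relied on is that YouTube watch URLs accept a `t` parameter in seconds.

```python
def parse_timestamp(ts: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" into total seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def youtube_timestamp_link(video_id: str, ts: str) -> str:
    """Return a Markdown link that opens the video at the given timestamp."""
    return f"[{ts}](https://www.youtube.com/watch?v={video_id}&t={parse_timestamp(ts)}s)"


print(youtube_timestamp_link("VIDEO_ID", "14:22"))
# [14:22](https://www.youtube.com/watch?v=VIDEO_ID&t=862s)
```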

Full parameter reference:

| Parameter | Type | Default | Description |
|---|---|---|---|
| default_source | string | youtube | Default source type: youtube, podcast, transcript_file, otter |
| summary_depth | string | detailed | Output depth: brief (300w), detailed (800w), comprehensive (1500w) |
| include_timestamps | boolean | true | Add timestamps to section headers and key moments |
| include_quotes | boolean | true | Extract and attribute notable quotes |
| max_quotes | number | 8 | Maximum quotes to include per summary |
| include_action_items | boolean | true | Extract actionable takeaways |
| speaker_attribution | boolean | true | Attribute quotes and points to specific speakers |
| speaker_names | object | {} | Map speaker labels to real names |
| output_format | string | markdown | Format: obsidian, notion, markdown, json |
| include_topic_tags | boolean | true | Auto-generate topic tags for the content |
| link_related_notes | boolean | true | Cross-reference with previously processed content |
| storage_path | string | ~/.claude/media-notes/ | Local directory for all notes |
| content_type_hint | string | auto | Content hint: interview, lecture, tutorial, discussion, monologue |
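
The defaults and allowed values in the reference above can be mirrored in a small validation sketch. The class `NoteConfig` and its `word_budget` property are hypothetical, cover only a subset of the parameters, and do not describe how the skill itself loads configuration.

```python
from dataclasses import dataclass, field

# Approximate word budgets taken from the summary_depth row above.
DEPTH_WORDS = {"brief": 300, "detailed": 800, "comprehensive": 1500}
OUTPUT_FORMATS = {"obsidian", "notion", "markdown", "json"}


@dataclass
class NoteConfig:
    """Subset of the documented parameters with their listed defaults."""
    default_source: str = "youtube"
    summary_depth: str = "detailed"
    include_timestamps: bool = True
    include_quotes: bool = True
    max_quotes: int = 8
    output_format: str = "markdown"
    speaker_names: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values outside the documented choices early.
        if self.summary_depth not in DEPTH_WORDS:
            raise ValueError(f"unknown summary_depth: {self.summary_depth!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")

    @property
    def word_budget(self) -> int:
        """Target summary length in words for the chosen depth."""
        return DEPTH_WORDS[self.summary_depth]


config = NoteConfig(summary_depth="brief", output_format="obsidian")
print(config.word_budget)  # 300
```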

Core Concepts

| Concept | Description |
|---|---|
| Timestamped Sectioning | Dividing content into logical sections with start timestamps so you can jump to specific parts of the original recording |
| Speaker Diarization | Identifying and labeling which speaker said what throughout the transcript for accurate attribution |
| Insight Density Score | Rating of how information-rich each section is, helping you identify which parts are worth re-listening to vs skipping |
| Progressive Summarization | Layered note structure: full transcript reference, section summaries, top insights, and single-line takeaway |
| Cross-Source Linking | Connecting insights across multiple episodes, videos, or talks on the same topic to build comprehensive understanding |

  Media Note Processing Pipeline
  ================================

  [Transcript Input] --> [Speaker Identification]
          |                        |
          v                        v
  [Section Segmentation] --> [Timestamp Mapping]
          |                        |
          v                        v
  [Key Insight Extraction] --> [Quote Selection]
          |                        |
          v                        v
  [Action Item Generation] --> [Topic Tagging]
          |                        |
          v                        v
  [Cross-Source Linking] ----> [Formatted Export]
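
The four layers of Progressive Summarization described above could be held in a structure like the one below. This is only a sketch under assumptions: the class and field names are hypothetical, and the `depth` argument is an illustrative way to choose how many layers to show, not a documented option.

```python
from dataclasses import dataclass


@dataclass
class Section:
    start: str    # section start timestamp, e.g. "03:20"
    title: str
    summary: str  # short summary of this section


@dataclass
class ProgressiveNote:
    transcript_ref: str       # pointer to the full transcript
    sections: list[Section]   # per-section summaries
    top_insights: list[str]   # the few most important points
    takeaway: str             # single-line takeaway

    def render(self, depth: int = 2) -> str:
        """Render from the shortest layer outward; higher depth adds more detail."""
        lines = [f"Takeaway: {self.takeaway}"]
        if depth >= 2:
            lines += [f"- {insight}" for insight in self.top_insights]
        if depth >= 3:
            lines += [f"[{s.start}] {s.title}: {s.summary}" for s in self.sections]
        if depth >= 4:
            lines.append(f"Transcript: {self.transcript_ref}")
        return "\n".join(lines)


note = ProgressiveNote(
    transcript_ref="productivity-equation-ep147.txt",
    sections=[Section("03:20", "Why systems fail", "Most systems ignore environment.")],
    top_insights=["Systems beat goals", "Design your environment"],
    takeaway="Build systems, not goals.",
)
print(note.render(depth=3))
```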

Workflow Examples

Scenario 1: YouTube conference talk notes

Input:  "Take notes on this React Conf 2025 talk transcript.
         Speaker: Dan Abramov. Topic: Server Components."
         [pastes 30-minute talk transcript]
Output: Conference Talk Notes: React Server Components Deep Dive
        Speaker: Dan Abramov | Event: React Conf 2025 | 30 min

        Sections:
        [00:00-05:30] Problem statement: client-server waterfall
        [05:30-14:20] Server Components architecture
        [14:20-22:45] Migration strategy from client components
        [22:45-28:00] Performance benchmarks and case studies
        [28:00-30:00] Q&A highlights

        Key Technical Points:
        1. Server Components eliminate client-server waterfalls by
           moving data fetching to the component tree server-side
        2. Bundle size reduction: 30-40% typical for content-heavy pages
        3. Migration path: start with leaf components, work upward

        Code examples preserved at: [14:22], [18:45], [23:10]
        Saved to: ~/.claude/media-notes/react-conf-server-components.md

Scenario 2: Interview podcast with multiple speakers

Input:  "Notes from Tim Ferriss interview with Naval Ravikant.
         Focus on actionable advice." [pastes transcript]
Output: Interview Notes: Tim Ferriss Show #571 -- Naval Ravikant
        Focus: Actionable advice and mental models

        Key exchanges:
        [08:15] Naval on decision-making:
        "If you can't decide, the answer is no. If two paths are
        equally good, take the one that's harder in the short term."

        [22:40] On building wealth vs status:
        Naval: "Seek wealth, not money or status. Wealth is assets
        that earn while you sleep."
        Tim: "How do you distinguish between the two in practice?"
        Naval: "Ask: does this scale without my time?"

        Action Items (filtered for actionable advice):
        [ ] Apply the "if you can't decide, the answer is no" rule to one pending decision this week
        [ ] Audit income streams: which require your time vs which don't
        [ ] Read "The Almanack of Naval Ravikant" for the deeper framework

Scenario 3: Batch processing a podcast series

Input:  "I have transcripts from episodes 1-5 of the Huberman Lab
         series on sleep. Process all and create a synthesis."
Output: Batch Processing: Huberman Lab -- Sleep Series (5 episodes)

        Individual notes saved:
        - huberman-sleep-ep1-circadian-rhythm.md
        - huberman-sleep-ep2-light-exposure.md
        - huberman-sleep-ep3-supplements.md
        - huberman-sleep-ep4-temperature.md
        - huberman-sleep-ep5-protocols.md

        Cross-Episode Synthesis:
        Core protocols mentioned across multiple episodes:
        1. Morning sunlight exposure within 30 min of waking (eps 1,2,5)
        2. Temperature drop of 1-3 degrees for sleep onset (eps 1,4,5)
        3. No caffeine within 8 hours of bedtime (eps 3,5)
        4. Consistent wake time matters more than consistent bedtime (eps 1,5)

        Contradictions/nuances flagged:
        - Melatonin: recommended in ep3 at low dose, cautioned in ep5
        Master protocol checklist generated from all 5 episodes.
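
The cross-episode synthesis in this scenario comes down to finding points that recur across episodes. A minimal sketch follows, assuming each episode's notes have already been reduced to short, consistently worded labels; the function `synthesize` and the sample labels are hypothetical, and real notes would need fuzzier matching than exact string comparison.

```python
from collections import defaultdict


def synthesize(episode_points: dict[int, list[str]], min_episodes: int = 2) -> list[tuple[str, list[int]]]:
    """Return points mentioned in at least `min_episodes` episodes, most widely shared first."""
    seen: dict[str, set[int]] = defaultdict(set)
    for episode, points in episode_points.items():
        for point in points:
            seen[point.strip().lower()].add(episode)
    shared = [(point, sorted(eps)) for point, eps in seen.items() if len(eps) >= min_episodes]
    # Sort by number of episodes (descending), then alphabetically for stable output.
    return sorted(shared, key=lambda item: (-len(item[1]), item[0]))


points = {
    1: ["morning sunlight", "consistent wake time"],
    2: ["morning sunlight"],
    3: ["caffeine cutoff"],
    5: ["morning sunlight", "caffeine cutoff", "consistent wake time"],
}
for point, episodes in synthesize(points):
    print(f"{point} (eps {','.join(map(str, episodes))})")
# morning sunlight (eps 1,2,5)
# caffeine cutoff (eps 3,5)
# consistent wake time (eps 1,5)
```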

Best Practices

  1. Provide speaker names upfront -- transcripts often label speakers as "Speaker 1" or "Host." Providing actual names before processing produces much more useful notes with proper attribution.

  2. Specify your interest focus -- a 2-hour podcast contains many tangents. Telling the skill what you care about ("focus on the marketing strategies" or "skip the personal anecdotes") produces more targeted, useful notes.

  3. Process content the same day you consume it -- if you listened to a podcast and want notes, process the transcript while your memory of the discussion is fresh. This lets you verify the notes capture what you found valuable.

  4. Use cross-source linking for research topics -- when learning about a topic from multiple podcasts or videos, process all transcripts and request a synthesis. The cross-referencing reveals consensus, contradictions, and comprehensive coverage.

  5. Archive with consistent naming conventions -- use a naming pattern like source-topic-date.md to keep your media notes directory searchable. The skill auto-generates filenames but you can override them for consistency with your system.
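
The `source-topic-date.md` pattern from the last practice can be generated mechanically. The helpers below are a hypothetical sketch and do not claim to match the filenames the skill generates on its own.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def note_filename(source: str, topic: str, day: date) -> str:
    """Build a `source-topic-date.md` filename with an ISO date for sortable listings."""
    return f"{slugify(source)}-{slugify(topic)}-{day.isoformat()}.md"


print(note_filename("Huberman Lab", "Sleep: Light Exposure", date(2025, 1, 15)))
# huberman-lab-sleep-light-exposure-2025-01-15.md
```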

Common Issues

Issue: Auto-generated transcripts have many errors
Solution: Focus on the structural summary rather than exact quotes when working with auto-generated transcripts. Flag quotes you plan to use and verify them against the original audio. For critical content, consider using Otter.ai or a professional transcription service for higher accuracy.

Issue: Very long transcripts (2+ hours) produce overwhelming notes
Solution: Use summary_depth: brief for initial processing to get a high-level overview. Then request detailed notes for only the sections that interest you. This two-pass approach is much more efficient than comprehensive notes for content you may only partially care about.

Issue: Speaker identification is inconsistent in the transcript
Solution: Provide a speaker_names mapping in your configuration that maps transcript labels to real names. If the transcript lacks speaker labels entirely, describe the speakers in terms that show up in the text ("the host asks the questions, the guest is the author being interviewed") and the skill will attempt consistent attribution based on context clues. Because the skill sees only text, descriptions of how a voice sounds will not help.
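
A `speaker_names` mapping can be thought of as a label substitution over the transcript. The sketch below assumes a hypothetical transcript layout in which each line starts with `Label:`; the function `apply_speaker_names` is illustrative and is not the skill's own implementation.

```python
import re


def apply_speaker_names(transcript: str, speaker_names: dict[str, str]) -> str:
    """Replace generic labels at the start of each line, e.g. "Speaker 1:" -> "Cal Newport:"."""
    if not speaker_names:
        return transcript
    # Longest labels first so "Speaker 10" is tried before "Speaker 1".
    labels = "|".join(re.escape(label) for label in sorted(speaker_names, key=len, reverse=True))
    pattern = re.compile(rf"^({labels})(?=:)", flags=re.MULTILINE)
    return pattern.sub(lambda match: speaker_names[match.group(1)], transcript)


transcript = "Speaker 1: Welcome back.\nSpeaker 2: Thanks for having me.\nSpeaker 1: Let's dive in."
names = {"Speaker 1": "Cal Newport", "Speaker 2": "James Clear"}
print(apply_speaker_names(transcript, names))
```

Anchoring the match to the start of a line avoids rewriting a label that merely appears inside someone's sentence.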

Privacy & Data Handling

All transcripts, generated notes, quotes, and summaries are stored locally in your storage_path directory (default: ~/.claude/media-notes/). Transcripts you paste into the conversation are processed in the session context and saved only to your local machine. No content is uploaded to YouTube, podcast platforms, or transcription services. YouTube timestamps and links are generated from information you provide and do not involve API calls to YouTube. Otter.ai transcripts are imported from your locally exported files. The skill does not record, download, or stream any audio or video content -- it works exclusively with text transcripts that you provide.
