Speech Complete
Streamline text-to-speech work with structured workflows, validation checks, and reusable patterns for media.
A comprehensive skill for generating spoken audio from text — covering text-to-speech synthesis, voice selection, audio file management, narration workflows, and integration with AI voice generation APIs for product demos, accessibility, and voiceover content.
When to Use This Skill
Choose Speech Complete when you need to:
- Convert text to natural-sounding speech audio
- Generate voiceovers for product demos or tutorials
- Create accessible audio versions of written content
- Build IVR prompts or automated phone system messages
- Produce narration for presentations or videos
Consider alternatives when:
- You need speech-to-text transcription (use a transcription skill)
- You need real-time voice chat (use a voice communication skill)
- You need music or sound effects (use an audio generation skill)
Quick Start
```bash
# Generate speech with OpenAI TTS
pip install openai
```
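```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment


def generate_speech(text, output_path, voice="alloy", model="tts-1"):
    """Generate speech audio from text."""
    # Make sure the target directory (e.g. output/) exists
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    # Streaming response writes audio to disk as it arrives
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
    ) as response:
        response.stream_to_file(output_path)
    print(f"Generated: {output_path}")


# Simple narration
generate_speech(
    "Welcome to CodeFlow. Let me show you how AI-powered "
    "code review can catch bugs before your teammates do.",
    "output/intro.mp3",
    voice="nova",
)
```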
```bash
# Using CLI tools
# macOS built-in TTS
say -o output.aiff "Hello, welcome to the demo"
# Convert to MP3
ffmpeg -i output.aiff -codec:a libmp3lame -qscale:a 2 output.mp3

# espeak (Linux)
espeak -w output.wav "Hello, welcome to the demo"
```
Core Concepts
Voice Options
| Voice | Characteristics | Best For |
|---|---|---|
| alloy | Neutral, balanced | General narration |
| echo | Warm, conversational | Storytelling, podcasts |
| fable | British, authoritative | Documentation, tutorials |
| nova | Friendly, energetic | Product demos, marketing |
| onyx | Deep, professional | Corporate, announcements |
| shimmer | Soft, approachable | Meditation, gentle content |
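The voice descriptions above are a starting point; the reliable test is hearing each voice read your own text. A minimal sketch, assuming a hypothetical `sample_voices` helper that renders the same passage once per voice into `sample_<voice>.mp3` files. The `render` callable is injected so the loop can be exercised without API calls; `openai_render` is one possible implementation using the OpenAI streaming speech call.

```python
from pathlib import Path
from typing import Callable, Iterable

# The six voices from the table above
VOICES = ["alloy", "echo", "fable", "nova", "onyx", "shimmer"]


def sample_voices(
    text: str,
    out_dir: str,
    render: Callable[[str, str, Path], None],
    voices: Iterable[str] = VOICES,
) -> list[Path]:
    """Render the same text once per voice so samples can be compared side by side."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for voice in voices:
        path = out / f"sample_{voice}.mp3"
        render(text, voice, path)
        paths.append(path)
    return paths


def openai_render(text: str, voice: str, path: Path) -> None:
    """Render one sample with the OpenAI speech endpoint (needs OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so sample_voices has no hard dependency

    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(
        model="tts-1", voice=voice, input=text
    ) as response:
        response.stream_to_file(path)
```

For example, `sample_voices(intro_paragraph, "voice_samples", openai_render)`, where `intro_paragraph` is the first paragraph of your real script. Using the cheaper `tts-1` model for samples keeps auditioning inexpensive.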
Audio Generation Pipeline
```python
import os
from pathlib import Path

from openai import OpenAI

client = OpenAI()


def generate_narration(script, output_dir, voice="nova"):
    """Generate multi-segment narration from a script."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    segments = []
    for i, section in enumerate(script):
        # Zero-padded index keeps files in playback order, e.g. 01_intro.mp3
        filename = f"{i+1:02d}_{section['id']}.mp3"
        output_path = os.path.join(output_dir, filename)
        with client.audio.speech.with_streaming_response.create(
            model="tts-1-hd",  # Higher-quality model
            voice=voice,
            input=section["text"],
            speed=section.get("speed", 1.0),  # 0.25 to 4.0
        ) as response:
            response.stream_to_file(output_path)
        segments.append(output_path)
        print(f"  Generated: {filename}")
    return segments


# Multi-section script
script = [
    {"id": "intro", "text": "Welcome to our product demo.", "speed": 0.95},
    {"id": "feature-1", "text": "First, let me show you the dashboard."},
    {"id": "feature-2", "text": "Next, the real-time analytics view."},
    {"id": "outro", "text": "Thank you for watching. Try it free today.", "speed": 0.9},
]

segments = generate_narration(script, "./narration", voice="nova")
```
Audio Concatenation
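```bash
# Concatenate segments with ffmpeg
# Create a file list
cat > segments.txt << EOF
file 'narration/01_intro.mp3'
file 'narration/02_feature-1.mp3'
file 'narration/03_feature-2.mp3'
file 'narration/04_outro.mp3'
EOF

# Concatenate without re-encoding
mkdir -p output
ffmpeg -f concat -safe 0 -i segments.txt -c copy output/full_narration.mp3

# Add silence between segments (1-second gap). Applying apad to the
# concatenated stream would only pad the very end, so pad each segment first.
mkdir -p padded
for f in narration/*.mp3; do
  ffmpeg -i "$f" -af "apad=pad_dur=1" -c:a libmp3lame "padded/$(basename "$f")"
done
ls padded/*.mp3 | sed "s/.*/file '&'/" > padded.txt
ffmpeg -f concat -safe 0 -i padded.txt -c copy output/narration_with_gaps.mp3
```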
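Hand-writing the list file gets error-prone once segments are added or renamed. A minimal sketch, assuming hypothetical helpers `ffmpeg_concat_line` and `write_concat_list` that build the list from the paths returned by `generate_narration`. Each path is single-quoted and any literal `'` is escaped as `'\''`, following ffmpeg's quoting rules. ffmpeg resolves relative entries against the list file's own directory, so write the list in the directory the paths are relative to.

```python
from pathlib import Path


def ffmpeg_concat_line(path: str) -> str:
    """Format one entry for ffmpeg's concat demuxer list file."""
    # Paths are single-quoted; a literal ' becomes '\'' (close, escaped quote, reopen)
    escaped = path.replace("'", "'\\''")
    return f"file '{escaped}'"


def write_concat_list(segments, list_path="segments.txt") -> str:
    """Write a concat list for the given segment paths and return its contents."""
    text = "\n".join(ffmpeg_concat_line(str(p)) for p in segments) + "\n"
    Path(list_path).write_text(text)
    return text
```

With the pipeline above, `write_concat_list(segments)` produces the same `segments.txt` as the heredoc, without retyping filenames.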
Configuration
| Parameter | Description | Example |
|---|---|---|
| model | TTS model to use | "tts-1" / "tts-1-hd" |
| voice | Voice selection | "nova" / "alloy" |
| speed | Speech speed multiplier (0.25 to 4.0) | 1.0 |
| output_format | Audio output format (sent to the OpenAI API as `response_format`) | "mp3" / "wav" |
| output_dir | Directory for generated files | "./audio" |
| sample_rate | Audio sample rate in Hz | 24000 |
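Validating these parameters before calling the API catches typos early. A minimal sketch, assuming a hypothetical `TTSConfig` dataclass whose fields mirror the table. The allowed models and voices are limited to the values this page mentions, so extend the sets if you use newer options; `output_dir` and `sample_rate` are local settings and are not sent to the API.

```python
from dataclasses import dataclass

# Allowed values are assumptions based on this page; extend them as needed
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "nova", "onyx", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


@dataclass(frozen=True)
class TTSConfig:
    model: str = "tts-1"
    voice: str = "alloy"
    speed: float = 1.0
    output_format: str = "mp3"
    output_dir: str = "./audio"
    sample_rate: int = 24000

    def __post_init__(self) -> None:
        # Fail fast on typos instead of waiting for an API error
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if self.output_format not in FORMATS:
            raise ValueError(f"unsupported format: {self.output_format!r}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed must be between 0.25 and 4.0, got {self.speed}")

    def request_kwargs(self) -> dict:
        """Keyword arguments for the speech request; local-only fields are left out."""
        return {
            "model": self.model,
            "voice": self.voice,
            "speed": self.speed,
            "response_format": self.output_format,
        }
```

Usage: `client.audio.speech.create(input=text, **TTSConfig(voice="nova").request_kwargs())`.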
Best Practices
- Use `tts-1-hd` for production audio and `tts-1` for drafts. The HD model produces noticeably better quality but is slower and costs more. Use the standard model while iterating on the script and switch to HD for final production renders.
- Add natural pauses with punctuation. TTS models interpret punctuation as pause cues: periods for full stops, commas for brief pauses, and em dashes for dramatic pauses. "Welcome. Let me show you — the future of code review." sounds more natural than a run-on sentence.
- Break long scripts into segments. Generate each section as a separate file, then concatenate with ffmpeg. You can then re-render individual sections without regenerating the entire narration, which saves time and API costs.
- Test voice selection with your actual content. Different voices suit different content: a voice that sounds great reading a casual blog post may sound wrong for a formal compliance document. Before committing, generate a short sample of your real script with each voice.
- Normalize audio levels across segments. Segments can come out at different loudness levels. Use ffmpeg's `loudnorm` filter to bring every segment to a consistent level before concatenation: `ffmpeg -i input.mp3 -af loudnorm -ar 24000 output.mp3` (`loudnorm` upsamples internally, so set the output sample rate explicitly).
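Segment-per-file rendering pays off most when unchanged sections are skipped automatically. A minimal sketch, assuming a hypothetical `render_changed` helper and a `manifest.json` file (both illustrative, not part of any library) that stores a SHA-256 fingerprint of each segment's text, voice, model, and speed. Only segments whose fingerprint changed, or whose file is missing, are passed to the injected `render` callable.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def segment_key(text: str, voice: str, model: str, speed: float) -> str:
    """Fingerprint everything that affects the rendered audio."""
    payload = json.dumps(
        {"text": text, "voice": voice, "model": model, "speed": speed}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def render_changed(script, output_dir, render: Callable, voice="nova", model="tts-1-hd"):
    """Re-render only segments whose text or settings changed; return their filenames."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest_path = out / "manifest.json"  # maps filename -> fingerprint of last render
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    rendered = []
    for i, section in enumerate(script):
        speed = section.get("speed", 1.0)
        path = out / f"{i + 1:02d}_{section['id']}.mp3"
        key = segment_key(section["text"], voice, model, speed)
        if path.exists() and manifest.get(path.name) == key:
            continue  # unchanged since the last run, keep the existing file
        render(section["text"], path, voice=voice, model=model, speed=speed)
        manifest[path.name] = key
        rendered.append(path.name)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return rendered
```

The `render` callable receives `(text, path, voice=..., model=..., speed=...)`, so it can wrap the OpenAI streaming call from the pipeline above. Editing one line of the script then costs a single API request instead of a full re-render.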
Common Issues
Generated speech sounds abrupt or robotic at the start or end of a clip. TTS models sometimes lose naturalness at clip boundaries. Add a buffer sentence that you trim in post-production, or end the text with "..." to soften abrupt cutoffs.
Long passages produce audio with degraded quality. TTS APIs cap input length (OpenAI's speech endpoint accepts at most 4,096 characters per request), and some have an optimal input length well below that cap. Break text into chunks of at most 2-3 paragraphs, generate each chunk separately, and concatenate the results for consistent quality throughout.
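The chunking rule above can be sketched as follows, assuming a hypothetical `chunk_text` helper: paragraphs (separated by blank lines) are grouped up to `max_paragraphs` per chunk without exceeding `max_chars`. A single paragraph that is too long on its own is split at sentence ends, and a sentence longer than the limit is hard-cut as a last resort. The 4,096-character default matches OpenAI's per-request input limit; adjust it for other providers.

```python
import re


def split_long(paragraph: str, max_chars: int) -> list[str]:
    """Split one paragraph at sentence ends so each piece fits max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph)
    pieces, current = [], ""
    for sentence in sentences:
        # A sentence longer than max_chars is hard-cut as a last resort
        while len(sentence) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) > max_chars:
            pieces.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = 4096, max_paragraphs: int = 3) -> list[str]:
    """Group paragraphs into TTS-sized chunks of at most max_paragraphs each."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, group = [], []
    for para in paragraphs:
        if len(para) > max_chars:
            if group:
                chunks.append("\n\n".join(group))
                group = []
            chunks.extend(split_long(para, max_chars))
            continue
        candidate = "\n\n".join(group + [para])
        if group and (len(group) == max_paragraphs or len(candidate) > max_chars):
            chunks.append("\n\n".join(group))
            group = []
        group.append(para)
    if group:
        chunks.append("\n\n".join(group))
    return chunks
```

Each returned chunk can be passed to `generate_speech` and the resulting files joined with the ffmpeg concat demuxer.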
Audio files are too large for web delivery. Uncompressed WAV files are large. For speech, convert to MP3 at 128 kbps (voice rarely benefits from higher bitrates): `ffmpeg -i input.wav -codec:a libmp3lame -b:a 128k output.mp3`. For even smaller files, use Opus: `ffmpeg -i input.wav -c:a libopus -b:a 64k output.opus`.
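The size difference is simple arithmetic: uncompressed PCM costs sample rate × bytes per sample × channels per second, while a constant-bitrate stream costs bitrate ÷ 8 bytes per second. A minimal sketch, assuming 24 kHz, 16-bit mono audio (matching the `sample_rate` in the configuration table) and ignoring container overhead; the helper names are illustrative.

```python
def wav_bytes(seconds: float, sample_rate: int = 24000, bit_depth: int = 16, channels: int = 1) -> int:
    """Size of uncompressed PCM audio data (the small WAV header is ignored)."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


def compressed_bytes(seconds: float, kbps: int) -> int:
    """Approximate size of a constant-bitrate stream: bits per second times duration / 8."""
    return int(seconds * kbps * 1000 / 8)


for label, size in [
    ("WAV, 24 kHz 16-bit mono", wav_bytes(60)),
    ("MP3 at 128 kbps", compressed_bytes(60, 128)),
    ("Opus at 64 kbps", compressed_bytes(60, 64)),
]:
    print(f"{label}: {size / 1_000_000:.2f} MB per minute")
```

For one minute of narration this prints 2.88 MB for WAV, 0.96 MB for MP3, and 0.48 MB for Opus.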