Speech Complete

A comprehensive skill for generating spoken audio from text — covering text-to-speech synthesis, voice selection, audio file management, narration workflows, and integration with AI voice generation APIs for product demos, accessibility, and voiceover content.

When to Use This Skill

Choose Speech Complete when you need to:

Convert text to natural-sounding speech audio
Generate voiceovers for product demos or tutorials
Create accessible audio versions of written content
Build IVR prompts or automated phone system messages
Produce narration for presentations or videos

Consider alternatives when:

You need speech-to-text transcription (use a transcription skill)
You need real-time voice chat (use a voice communication skill)
You need music or sound effects (use an audio generation skill)

Quick Start


# Generate speech with OpenAI TTS
pip install openai


from openai import OpenAI
from pathlib import Path

client = OpenAI()

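def generate_speech(text, output_path, voice="alloy", model="tts-1"):
    """Generate speech audio from text and write it to output_path."""
    output_path = Path(output_path)
    # Create the target directory (e.g. output/) if it does not exist yet.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the audio to disk; calling stream_to_file on a non-streaming
    # response is deprecated in recent openai-python releases.
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
    ) as response:
        response.stream_to_file(output_path)
    print(f"Generated: {output_path}")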

# Simple narration
generate_speech(
    "Welcome to CodeFlow. Let me show you how AI-powered "
    "code review can catch bugs before your teammates do.",
    "output/intro.mp3",
    voice="nova"
)


# Using CLI tools
# macOS built-in TTS
say -o output.aiff "Hello, welcome to the demo"

# Convert to MP3
ffmpeg -i output.aiff -codec:a libmp3lame -qscale:a 2 output.mp3

# espeak (Linux)
espeak "Hello, welcome to the demo" -w output.wav

Core Concepts

Voice Options

Voice	Characteristics	Best For
alloy	Neutral, balanced	General narration
echo	Warm, conversational	Storytelling, podcasts
fable	British, authoritative	Documentation, tutorials
nova	Friendly, energetic	Product demos, marketing
onyx	Deep, professional	Corporate, announcements
shimmer	Soft, approachable	Meditation, gentle content
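
To compare the voices in the table against real content, one option is to render the same short sample once per voice. The sketch below is an illustration, not part of the skill: `render_voice_samples` and its `synthesize` parameter are hypothetical names, and `synthesize` is assumed to be any callable that accepts `(text, output_path, voice)`, such as the `generate_speech` function from Quick Start.

```python
from pathlib import Path

# Voice names taken from the table above.
VOICES = ["alloy", "echo", "fable", "nova", "onyx", "shimmer"]


def render_voice_samples(text, synthesize, output_dir="samples", voices=VOICES):
    """Render the same text once per voice and return the output paths.

    `synthesize` is a hypothetical callable taking (text, output_path, voice);
    injecting it keeps this helper independent of any particular TTS client.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for voice in voices:
        path = out / f"sample_{voice}.mp3"
        synthesize(text, str(path), voice)
        paths.append(path)
    return paths
```

Because `generate_speech` takes `text`, `output_path`, and `voice` in that order, it can be passed directly: `render_voice_samples("First, let me show you the dashboard.", generate_speech)`.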

Audio Generation Pipeline


import os
from openai import OpenAI
from pathlib import Path

client = OpenAI()

def generate_narration(script, output_dir, voice="nova"):
    """Generate multi-segment narration from a script."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    segments = []

    for i, section in enumerate(script):
        filename = f"{i+1:02d}_{section['id']}.mp3"
        output_path = os.path.join(output_dir, filename)

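        # Stream to disk; stream_to_file on a non-streaming response is deprecated.
        with client.audio.speech.with_streaming_response.create(
            model="tts-1-hd",  # Higher quality model
            voice=voice,
            input=section["text"],
            speed=section.get("speed", 1.0),
        ) as response:
            response.stream_to_file(output_path)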
        segments.append(output_path)
        print(f"  Generated: {filename}")

    return segments

# Multi-section script
script = [
    {"id": "intro", "text": "Welcome to our product demo.", "speed": 0.95},
    {"id": "feature-1", "text": "First, let me show you the dashboard."},
    {"id": "feature-2", "text": "Next, the real-time analytics view."},
    {"id": "outro", "text": "Thank you for watching. Try it free today.", "speed": 0.9},
]

segments = generate_narration(script, "./narration", voice="nova")

Audio Concatenation


# Concatenate segments with ffmpeg
# Create a file list
cat > segments.txt << EOF
file 'narration/01_intro.mp3'
file 'narration/02_feature-1.mp3'
file 'narration/03_feature-2.mp3'
file 'narration/04_outro.mp3'
EOF

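# Concatenate (stream copy, no re-encode)
mkdir -p output
ffmpeg -f concat -safe 0 -i segments.txt -c copy output/full_narration.mp3

# Add a 1-second gap after each segment. Applying apad to the concatenated
# stream only pads the very end, so pad each file first, then concatenate.
mkdir -p padded
for f in narration/*.mp3; do
  ffmpeg -i "$f" -af "apad=pad_dur=1" -c:a libmp3lame "padded/$(basename "$f")"
  echo "file 'padded/$(basename "$f")'"
done > padded.txt
ffmpeg -f concat -safe 0 -i padded.txt -c copy output/narration_with_gaps.mp3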
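
The file list can also be generated from the `segments` list that `generate_narration` returns, instead of being typed by hand. This is a minimal sketch: `write_concat_list` is a hypothetical helper, and it writes absolute paths, which assumes ffmpeg is invoked with `-safe 0` as in the commands here.

```python
from pathlib import Path


def write_concat_list(segments, list_path="segments.txt"):
    """Write an ffmpeg concat-demuxer list file for the given audio segments.

    Paths are made absolute so the list works regardless of where it is saved
    (ffmpeg resolves relative entries against the list file's directory).
    """
    lines = []
    for segment in segments:
        path = str(Path(segment).resolve())
        # Inside single quotes, a literal quote is written as '\'' for ffmpeg.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

After `write_concat_list(segments)`, the same `ffmpeg -f concat -safe 0 -i segments.txt ...` invocation applies.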

Configuration

Parameter	Description	Example
`model`	TTS model to use	`"tts-1"` / `"tts-1-hd"`
`voice`	Voice selection	`"nova"` / `"alloy"`
`speed`	Speech speed multiplier	`1.0` (0.25 to 4.0)
`output_format`	Audio output format (the OpenAI API names this `response_format`)	`"mp3"` / `"wav"`
`output_dir`	Directory for generated files	`"./audio"`
`sample_rate`	Audio sample rate	`24000`
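
One way to keep these parameters together is a small validated settings object. The sketch below is illustrative only: `SpeechConfig` and `request_kwargs` are hypothetical names, the defaults are taken from the examples in the table, and the only validation shown is the speed range the table states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechConfig:
    """Hypothetical container for the parameters in the table above."""

    model: str = "tts-1"
    voice: str = "nova"
    speed: float = 1.0
    output_format: str = "mp3"
    output_dir: str = "./audio"
    sample_rate: int = 24000

    def __post_init__(self):
        # The table gives 0.25 to 4.0 as the valid speed range.
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed must be between 0.25 and 4.0, got {self.speed}")

    def request_kwargs(self):
        """Return the subset of settings that map onto the speech API call."""
        return {
            "model": self.model,
            "voice": self.voice,
            "speed": self.speed,
            # The OpenAI speech endpoint names the format `response_format`.
            "response_format": self.output_format,
        }
```

The returned dictionary could then be unpacked into the request, for example `client.audio.speech.with_streaming_response.create(input=text, **cfg.request_kwargs())`.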

Best Practices

Use tts-1-hd for production audio, tts-1 for drafts — The HD model produces noticeably better quality but is slower and costs more. Use the standard model during script iteration and switch to HD for final production renders.
Add natural pauses with punctuation — TTS models interpret punctuation as pause cues. Use periods for full stops, commas for brief pauses, and em dashes for dramatic pauses. "Welcome. Let me show you — the future of code review." sounds more natural than a run-on sentence.
Break long scripts into segments — Generate each section as a separate file, then concatenate with ffmpeg. This lets you re-render individual sections without regenerating the entire narration, saving time and API costs.
Test voice selection with your actual content — Different voices suit different content types. A voice that sounds great reading a casual blog post may sound wrong for a formal compliance document. Generate a short sample with each voice using your real script before committing.
Normalize audio levels across segments — Different text generates audio at different volumes. Use ffmpeg's loudnorm filter to normalize all segments to a consistent loudness level before concatenation: ffmpeg -i input.mp3 -af loudnorm output.mp3.
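
The normalization step can be applied to every segment in a loop before concatenation. This is a minimal sketch, assuming `ffmpeg` is on the PATH; `loudnorm_command` and `normalize_segments` are hypothetical helper names, and the filter is used with its default settings, as in the command above.

```python
import subprocess
from pathlib import Path


def loudnorm_command(src, dst):
    """Build the ffmpeg command that applies the loudnorm filter to one file."""
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", str(src), "-af", "loudnorm", str(dst)]


def normalize_segments(segments, output_dir="normalized", run=subprocess.run):
    """Normalize each segment into output_dir and return the new paths.

    `run` defaults to subprocess.run; pass a stub to preview the commands
    without invoking ffmpeg.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    normalized = []
    for segment in segments:
        dst = out / Path(segment).name
        run(loudnorm_command(segment, dst), check=True)
        normalized.append(dst)
    return normalized
```

Writing to a separate directory keeps the original renders intact, so a single section can be re-rendered and re-normalized on its own.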

Common Issues

Generated speech sounds robotic at sentence boundaries — TTS models sometimes lose naturalness at the start or end of clips. Add a buffer sentence that you trim in post-production, or append "..." at the end of text to prevent abrupt cutoffs.

Long passages produce audio with degraded quality — Some TTS APIs have optimal input length ranges. Break text into chunks of 2-3 paragraphs maximum. Generate each chunk separately and concatenate the results for consistent quality throughout.
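
Chunking by paragraph can be sketched as follows. `chunk_paragraphs` is a hypothetical helper, paragraphs are assumed to be separated by blank lines, and the `max_chars=4000` default is an assumed safety margin below the 4096-character input limit that the OpenAI speech endpoint documents at the time of writing.

```python
def chunk_paragraphs(text, max_paragraphs=3, max_chars=4000):
    """Split text into chunks of whole paragraphs.

    A chunk is closed when adding the next paragraph would exceed either
    max_paragraphs or max_chars. A single paragraph longer than max_chars
    is kept whole, so callers may still need to split it by sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for paragraph in paragraphs:
        candidate = "\n\n".join(current + [paragraph])
        if current and (len(current) >= max_paragraphs or len(candidate) > max_chars):
            chunks.append("\n\n".join(current))
            current = []
        current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned chunk can be passed to the speech call as one request, and the resulting files concatenated in order.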

Audio files are too large for web delivery — Uncompressed WAV files are huge. Convert to MP3 at 128kbps for speech (voice doesn't benefit from higher bitrates) or use Opus format for even smaller files: ffmpeg -i input.wav -c:a libopus -b:a 64k output.opus.
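
The size difference is simple arithmetic. The sketch below assumes constant-bitrate encoding and, for WAV, 16-bit mono PCM at the 24000 Hz sample rate listed under Configuration; the function names are hypothetical and the WAV header is ignored.

```python
def compressed_size_bytes(duration_s, bitrate_kbps):
    """Size of a constant-bitrate stream: bitrate (bits/s) * duration / 8."""
    return duration_s * bitrate_kbps * 1000 / 8


def wav_size_bytes(duration_s, sample_rate=24000, bit_depth=16, channels=1):
    """Size of uncompressed PCM audio, ignoring the small WAV header."""
    return duration_s * sample_rate * (bit_depth // 8) * channels


minute = 60
print(wav_size_bytes(minute))              # 2880000 bytes, about 2.9 MB
print(compressed_size_bytes(minute, 128))  # 960000.0 bytes, about 0.96 MB
print(compressed_size_bytes(minute, 64))   # 480000.0 bytes, about 0.48 MB
```

Under these assumptions, one minute of narration shrinks to roughly a third of its WAV size at 128 kbps and to about a sixth at 64 kbps.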
