
Speech Complete

Streamline your text-to-speech workflow. Includes structured workflows, validation checks, and reusable patterns for media.

Skill | Cliptics | media | v1.0.0 | MIT


A comprehensive skill for generating spoken audio from text — covering text-to-speech synthesis, voice selection, audio file management, narration workflows, and integration with AI voice generation APIs for product demos, accessibility, and voiceover content.

When to Use This Skill

Choose Speech Complete when you need to:

  • Convert text to natural-sounding speech audio
  • Generate voiceovers for product demos or tutorials
  • Create accessible audio versions of written content
  • Build IVR prompts or automated phone system messages
  • Produce narration for presentations or videos

Consider alternatives when:

  • You need speech-to-text transcription (use a transcription skill)
  • You need real-time voice chat (use a voice communication skill)
  • You need music or sound effects (use an audio generation skill)

Quick Start

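    # Install the OpenAI SDK (shell)
    pip install openai

    # Generate speech with OpenAI TTS (Python)
    from pathlib import Path

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def generate_speech(text, output_path, voice="alloy", model="tts-1"):
        """Generate speech audio from text."""
        output_path = Path(output_path)
        # Create the target directory if it does not exist yet
        output_path.parent.mkdir(parents=True, exist_ok=True)
        # Stream the audio to disk as it arrives
        with client.audio.speech.with_streaming_response.create(
            model=model,
            voice=voice,
            input=text,
        ) as response:
            response.stream_to_file(output_path)
        print(f"Generated: {output_path}")


    # Simple narration
    generate_speech(
        "Welcome to CodeFlow. Let me show you how AI-powered "
        "code review can catch bugs before your teammates do.",
        "output/intro.mp3",
        voice="nova",
    )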
    # Using CLI tools

    # macOS built-in TTS
    say -o output.aiff "Hello, welcome to the demo"

    # Convert to MP3
    ffmpeg -i output.aiff -codec:a libmp3lame -qscale:a 2 output.mp3

    # espeak (Linux)
    espeak "Hello, welcome to the demo" -w output.wav

Core Concepts

Voice Options

| Voice | Characteristics | Best For |
|---|---|---|
| alloy | Neutral, balanced | General narration |
| echo | Warm, conversational | Storytelling, podcasts |
| fable | British, authoritative | Documentation, tutorials |
| nova | Friendly, energetic | Product demos, marketing |
| onyx | Deep, professional | Corporate, announcements |
| shimmer | Soft, approachable | Meditation, gentle content |
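One way to compare the voices in the table is to render the same short excerpt with each of them. The sketch below only plans the sample jobs and makes no API calls; the helper name `plan_voice_samples`, the `samples` directory, and the 300-character cap are illustrative assumptions, not part of the skill.

```python
from pathlib import Path

# Voice names taken from the table above.
VOICES = ["alloy", "echo", "fable", "nova", "onyx", "shimmer"]


def plan_voice_samples(excerpt, output_dir="samples", voices=VOICES, max_chars=300):
    """Return one (text, output_path, voice) job per voice.

    The excerpt is cut at max_chars so the comparison stays cheap.
    """
    text = excerpt.strip()[:max_chars]
    if not text:
        raise ValueError("excerpt must contain some text")
    return [
        (text, str(Path(output_dir) / f"sample_{voice}.mp3"), voice)
        for voice in voices
    ]


if __name__ == "__main__":
    for text, path, voice in plan_voice_samples("Welcome to our product demo."):
        # Swap print for generate_speech(text, path, voice=voice) to render audio.
        print(voice, "->", path)
```

Each job tuple matches the argument order of the `generate_speech` function from Quick Start, so the loop body can call it directly once an API key is configured.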

Audio Generation Pipeline

    from pathlib import Path

    from openai import OpenAI

    client = OpenAI()


    def generate_narration(script, output_dir, voice="nova"):
        """Generate multi-segment narration from a script."""
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        segments = []
        for i, section in enumerate(script, start=1):
            filename = f"{i:02d}_{section['id']}.mp3"
            output_path = output_dir / filename
            # Stream each section to its own file so it can be re-rendered alone
            with client.audio.speech.with_streaming_response.create(
                model="tts-1-hd",  # Higher quality model
                voice=voice,
                input=section["text"],
                speed=section.get("speed", 1.0),
            ) as response:
                response.stream_to_file(output_path)
            segments.append(str(output_path))
            print(f"  Generated: {filename}")
        return segments


    # Multi-section script
    script = [
        {"id": "intro", "text": "Welcome to our product demo.", "speed": 0.95},
        {"id": "feature-1", "text": "First, let me show you the dashboard."},
        {"id": "feature-2", "text": "Next, the real-time analytics view."},
        {"id": "outro", "text": "Thank you for watching. Try it free today.", "speed": 0.9},
    ]

    segments = generate_narration(script, "./narration", voice="nova")

Audio Concatenation

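    # Concatenate segments with ffmpeg
    # Create a file list
    cat > segments.txt << EOF
    file 'narration/01_intro.mp3'
    file 'narration/02_feature-1.mp3'
    file 'narration/03_feature-2.mp3'
    file 'narration/04_outro.mp3'
    EOF

    # Concatenate without re-encoding
    mkdir -p output
    ffmpeg -f concat -safe 0 -i segments.txt -c copy output/full_narration.mp3

    # Pad the end of the narration with 1 second of silence.
    # apad acts on the combined stream, so this adds trailing silence only,
    # not a gap between segments.
    ffmpeg -f concat -safe 0 -i segments.txt \
      -af "apad=pad_dur=1" -c:a libmp3lame output/narration_padded.mp3

    # For gaps between segments, generate a 1-second silent clip and list it
    # between the entries in segments.txt (24 kHz mono is assumed here;
    # check your segments with ffprobe and match their format).
    ffmpeg -f lavfi -i anullsrc=r=24000:cl=mono -t 1 -c:a libmp3lame silence.mp3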
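Writing the file list by hand gets tedious once the script changes often. As a rough sketch, the list can be produced from the paths that `generate_narration` returns; the helper name `write_concat_list` is hypothetical, and the quoting follows the concat demuxer convention of writing a literal single quote as `'\''`.

```python
from pathlib import Path


def write_concat_list(segment_paths, list_path="segments.txt"):
    """Write an ffmpeg concat-demuxer list file and return its text."""
    lines = []
    for segment in segment_paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = Path(segment).as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    text = "\n".join(lines) + "\n"
    Path(list_path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    print(write_concat_list(["narration/01_intro.mp3", "narration/04_outro.mp3"]))
```

Paths in the list are resolved relative to the list file, so keep `segments.txt` in the directory the segment paths are relative to.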

Configuration

| Parameter | Description | Example |
|---|---|---|
| model | TTS model to use | "tts-1" / "tts-1-hd" |
| voice | Voice selection | "nova" / "alloy" |
| speed | Speech speed multiplier | 1.0 (0.25 to 4.0) |
| output_format | Audio output format | "mp3" / "wav" |
| output_dir | Directory for generated files | "./audio" |
| sample_rate | Audio sample rate (Hz) | 24000 |
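These parameters can be gathered into one validated object so that a bad value fails before any audio is requested. The following is a minimal sketch: the class name `SpeechConfig` and the defaults are assumptions, and restricting `output_format` to the two formats shown in the table is an illustrative choice rather than a limit stated by the skill.

```python
from dataclasses import dataclass

ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
ALLOWED_VOICES = {"alloy", "echo", "fable", "nova", "onyx", "shimmer"}
ALLOWED_FORMATS = {"mp3", "wav"}  # assumption: only the formats listed in the table


@dataclass(frozen=True)
class SpeechConfig:
    model: str = "tts-1"
    voice: str = "alloy"
    speed: float = 1.0
    output_format: str = "mp3"
    output_dir: str = "./audio"
    sample_rate: int = 24000

    def __post_init__(self):
        # Reject values outside the ranges given in the configuration table.
        if self.model not in ALLOWED_MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.voice not in ALLOWED_VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed must be between 0.25 and 4.0, got {self.speed}")
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")


if __name__ == "__main__":
    print(SpeechConfig(model="tts-1-hd", voice="nova", speed=0.95))
```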

Best Practices

  1. Use tts-1-hd for production audio, tts-1 for drafts — The HD model produces noticeably better quality but is slower and costs more. Use the standard model during script iteration and switch to HD for final production renders.

  2. Add natural pauses with punctuation — TTS models interpret punctuation as pause cues. Use periods for full stops, commas for brief pauses, and em dashes for dramatic pauses. "Welcome. Let me show you — the future of code review." sounds more natural than a run-on sentence.

  3. Break long scripts into segments — Generate each section as a separate file, then concatenate with ffmpeg. This lets you re-render individual sections without regenerating the entire narration, saving time and API costs.

  4. Test voice selection with your actual content — Different voices suit different content types. A voice that sounds great reading a casual blog post may sound wrong for a formal compliance document. Generate a short sample with each voice using your real script before committing.

  5. Normalize audio levels across segments — Different text generates audio at different volumes. Use ffmpeg's loudnorm filter to normalize all segments to a consistent loudness level before concatenation: ffmpeg -i input.mp3 -af loudnorm output.mp3.
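Practice 3 pays off most when unchanged sections are skipped automatically. One possible approach, sketched here under assumptions, is to store a fingerprint of each section's inputs and re-render only the sections whose fingerprint changed. The function names and the manifest shape (a dict mapping section `id` to hash) are hypothetical.

```python
import hashlib
import json


def segment_fingerprint(section, voice, model):
    """Hash everything that affects the rendered audio for one section."""
    payload = json.dumps(
        {
            "text": section["text"],
            "speed": section.get("speed", 1.0),
            "voice": voice,
            "model": model,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_manifest(script, voice="nova", model="tts-1-hd"):
    """Map each section id to the fingerprint of its current inputs."""
    return {s["id"]: segment_fingerprint(s, voice, model) for s in script}


def sections_to_render(script, manifest, voice="nova", model="tts-1-hd"):
    """Return the sections whose inputs differ from the stored manifest."""
    return [
        s for s in script
        if manifest.get(s["id"]) != segment_fingerprint(s, voice, model)
    ]


if __name__ == "__main__":
    script = [{"id": "intro", "text": "Welcome."}, {"id": "outro", "text": "Thanks."}]
    manifest = build_manifest(script)
    script[1]["text"] = "Thank you for watching."
    print([s["id"] for s in sections_to_render(script, manifest)])
```

Saving the manifest as JSON next to the audio files would let later runs pick up where the last one left off; changing the voice or model invalidates every section, which is the intended behavior.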

Common Issues

Generated speech sounds robotic at sentence boundaries — TTS models sometimes lose naturalness at the start or end of clips. Add a buffer sentence that you trim in post-production, or append "..." at the end of text to prevent abrupt cutoffs.

Long passages produce audio with degraded quality — Some TTS APIs have optimal input length ranges. Break text into chunks of 2-3 paragraphs maximum. Generate each chunk separately and concatenate the results for consistent quality throughout.
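The chunking described above can be sketched as a small helper that groups whole paragraphs until a character budget is reached. The name `chunk_text` is hypothetical, and the default of 4000 characters is an assumption chosen to stay under the 4096-character input limit that OpenAI documents for its speech endpoint; other providers may differ.

```python
def chunk_text(text, max_chars=4000):
    """Group whole paragraphs into chunks of at most max_chars where possible."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is kept whole here; split it by
        # sentence first if your provider enforces a hard limit.
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for i, chunk in enumerate(chunk_text(sample, max_chars=40), start=1):
        print(i, repr(chunk))
```

Each chunk can then be rendered as its own segment and joined with the ffmpeg concat step.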

Audio files are too large for web delivery — Uncompressed WAV files are huge. Convert to MP3 at 128kbps for speech (voice doesn't benefit from higher bitrates) or use Opus format for even smaller files: ffmpeg -i input.wav -c:a libopus -b:a 64k output.opus.
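To get a feel for the savings, file size can be estimated from bitrate and duration. This is back-of-the-envelope arithmetic that ignores container headers and metadata; the function names are hypothetical, and the WAV defaults (24 kHz, 16-bit, mono) are assumptions about the source audio.

```python
def compressed_size_bytes(duration_s, bitrate_kbps):
    """Approximate size of a constant-bitrate file (MP3, Opus, ...)."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


def wav_size_bytes(duration_s, sample_rate=24000, bit_depth=16, channels=1):
    """Approximate size of uncompressed PCM audio, excluding the header."""
    return int(duration_s * sample_rate * (bit_depth // 8) * channels)


if __name__ == "__main__":
    minute = 60
    print("WAV :", wav_size_bytes(minute))             # 2880000
    print("MP3 :", compressed_size_bytes(minute, 128))  # 960000
    print("Opus:", compressed_size_bytes(minute, 64))   # 480000
```

Under these assumptions, one minute of narration drops from roughly 2.9 MB as WAV to about 1 MB as 128 kbps MP3 and about 0.5 MB as 64 kbps Opus.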
