Advanced Podcast Generation Agent
An AI agent for creating multi-speaker audio content using the Gemini TTS MCP server.
Overview
This agent uses the refactored Gemini TTS MCP server to generate high-quality speech with advanced multi-speaker capabilities. It supports 30 prebuilt voices, natural language tone control, and can generate entire conversations with multiple speakers in a single request. The server now returns audio content that can be played directly in web interfaces.
Key Features
🎤 Native Multi-Speaker Support
- Generate conversations with multiple speakers in one request
- No need for separate audio files or post-processing
- Natural conversation flow with different voices per speaker
🎵 30 Prebuilt Voices
- Zephyr - Bright and energetic
- Puck - Upbeat and cheerful
- Charon - Informative and clear
- Kore - Firm and authoritative
- Fenrir - Excitable and dynamic
- Leda - Youthful and fresh
- Orus - Firm and confident
- Aoede - Breezy and light
- Callirrhoe - Easy-going and relaxed
- Autonoe - Bright and optimistic
- Enceladus - Breathy and intimate
- Iapetus - Clear and articulate
- Umbriel - Easy-going and friendly
- Algieba - Smooth and polished
- Despina - Smooth and elegant
- Erinome - Clear and precise
- Algenib - Gravelly and distinctive
- Rasalgethi - Informative and knowledgeable
- Laomedeia - Upbeat and lively
- Achernar - Soft and gentle
- Alnilam - Firm and steady
- Schedar - Even and balanced
- Gacrux - Mature and experienced
- Pulcherrima - Forward and engaging
- Achird - Friendly and warm
- Zubenelgenubi - Casual and approachable
- Vindemiatrix - Gentle and soothing
- Sadachbia - Lively and animated
- Sadaltager - Knowledgeable and wise
- Sulafat - Warm and inviting
🌐 WebUI Compatible
- Returns audio content that can be played directly in web interfaces
- Base64-encoded WAV audio data
- Structured content with both text summaries and audio data
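As a rough illustration of consuming the Base64-encoded WAV data mentioned above, the sketch below decodes a payload and reads basic properties with Python's standard library. How the Base64 string is extracted from the server response is not documented here, so the function simply takes the string as input; the name `decode_wav_payload` is hypothetical.

```python
import base64
import io
import wave


def decode_wav_payload(b64_audio: str) -> tuple[bytes, dict]:
    """Decode Base64 WAV data and report basic audio properties.

    Assumption: the caller has already pulled the Base64 string out of the
    server response; the response layout itself is not modelled here.
    """
    raw = base64.b64decode(b64_audio)
    with wave.open(io.BytesIO(raw), "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "seconds": wav.getnframes() / wav.getframerate(),
        }
    return raw, info
```

The returned bytes can be written to a `.wav` file or handed to an audio player unchanged.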
🎭 Natural Language Tone Control
- "Say cheerfully: Welcome to our show!"
- "Speak in a formal tone: Welcome to our meeting"
- "Use an excited voice: This is amazing news!"
- "Speak slowly and clearly: This is important information"
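The tone prompts above all follow an "instruction: text" pattern. A minimal helper for building such prompts might look like the following; `with_style` is a hypothetical name, not part of the agent or the MCP server.

```python
def with_style(instruction: str, text: str) -> str:
    """Prefix text with a natural-language style instruction.

    Follows the "instruction: text" pattern of the examples above.
    An empty instruction returns the text unchanged.
    """
    instruction = instruction.strip().rstrip(":")
    text = text.strip()
    if not instruction:
        return text
    return f"{instruction}: {text}"
```

For example, `with_style("Say cheerfully", "Welcome to our show!")` produces the first prompt in the list above.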
Setup
- Get API Keys:
  export GEMINI_API_KEY="your-gemini-api-key"
  export OPENAI_API_KEY="your-openai-api-key"
- Run the Agent:
  dexto -a agents/podcast-agent/podcast-agent.yml
The agent will automatically install the Gemini TTS MCP server from npm when needed.
Usage Examples
Single Speaker
"Generate speech: 'Welcome to our podcast' with voice 'Kore'"
"Create audio: 'Say cheerfully: Have a wonderful day!' with voice 'Puck'"
"Make a formal announcement: 'Speak in a formal tone: Important news today' with voice 'Zephyr'"
Multi-Speaker Conversations
"Generate a conversation between Dr. Anya (voice: Kore) and Liam (voice: Puck) about AI"
"Create an interview with host (voice: Zephyr) and guest (voice: Orus) discussing climate change"
"Make a story with narrator (voice: Schedar) and character (voice: Laomedeia)"
"Generate a podcast with three speakers: host (Zephyr), expert (Kore), and interviewer (Puck)"
Podcast Types
"Create an educational podcast about AI with clear, professional voices"
"Generate a storytelling podcast with expressive character voices"
"Make a news podcast with authoritative, formal delivery"
"Create an interview with host and guest using different voices"
Available Tools
Gemini TTS Tools
- generate_speech - Single-speaker audio generation
- generate_conversation - Multi-speaker conversations
- list_voices - Browse available voices with characteristics
File Management
- list_files - Browse audio files
- read_file - Access file information
- write_file - Save generated content
- delete_file - Clean up files
Voice Selection Guide
Professional Voices
- Kore - Firm, authoritative (great for hosts, experts)
- Orus - Firm, professional (business content)
- Zephyr - Bright, engaging (news, announcements)
- Schedar - Even, balanced (narrators, guides)
Expressive Voices
- Puck - Upbeat, enthusiastic (entertainment, stories)
- Laomedeia - Upbeat, energetic (dynamic content)
- Fenrir - Excitable, passionate (exciting topics)
- Achird - Friendly, warm (casual conversations)
Character Voices
- Umbriel - Easy-going, relaxed (casual hosts)
- Erinome - Clear, articulate (educational content)
- Autonoe - Bright, optimistic (positive content)
- Leda - Youthful, fresh (younger audiences)
Multi-Speaker Configuration
Example Speaker Setup
{
  "speakers": [
    {
      "name": "Dr. Anya",
      "voice": "Kore",
      "characteristics": "Firm, professional"
    },
    {
      "name": "Liam",
      "voice": "Puck",
      "characteristics": "Upbeat, enthusiastic"
    }
  ]
}
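Before sending a speaker setup like this one, it can help to check it locally. The sketch below validates the structure shown above against the 30 voice names listed in this document; `validate_speakers` is a hypothetical helper, and the server may apply different or additional rules.

```python
# Voice names as listed in the "30 Prebuilt Voices" section.
KNOWN_VOICES = {
    "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
    "Callirrhoe", "Autonoe", "Enceladus", "Iapetus", "Umbriel", "Algieba",
    "Despina", "Erinome", "Algenib", "Rasalgethi", "Laomedeia", "Achernar",
    "Alnilam", "Schedar", "Gacrux", "Pulcherrima", "Achird", "Zubenelgenubi",
    "Vindemiatrix", "Sadachbia", "Sadaltager", "Sulafat",
}


def validate_speakers(config: dict) -> list[str]:
    """Return a list of problems found in a speaker setup (empty if none)."""
    speakers = config.get("speakers", [])
    problems = []
    if not speakers:
        problems.append("no speakers defined")
    seen = set()
    for i, speaker in enumerate(speakers):
        name = speaker.get("name", "").strip()
        voice = speaker.get("voice", "")
        if not name:
            problems.append(f"speaker {i}: missing name")
        elif name in seen:
            problems.append(f"speaker {i}: duplicate name {name!r}")
        seen.add(name)
        if voice not in KNOWN_VOICES:
            problems.append(f"speaker {i}: unknown voice {voice!r}")
    return problems
```

An empty list means the setup passed these local checks. The uniqueness rule for names is an assumption made so that each transcript line maps to exactly one voice.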
Conversation Format
Dr. Anya: Welcome to our science podcast!
Liam: Thanks for having me, Dr. Anya!
Dr. Anya: Today we're discussing artificial intelligence.
Liam: It's such an exciting field!
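A script in this "Speaker: text" format is easy to split into turns before it is passed on. The parser below is a minimal sketch; `Turn` and `parse_transcript` are hypothetical names, and it assumes one turn per line with the first colon separating speaker from text.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    text: str


def parse_transcript(script: str) -> list[Turn]:
    """Split a 'Speaker: text' script into turns, skipping blank lines."""
    turns = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so colons inside the text survive.
        speaker, sep, text = line.partition(":")
        if not sep or not speaker.strip() or not text.strip():
            raise ValueError(f"expected 'Speaker: text', got {line!r}")
        turns.append(Turn(speaker.strip(), text.strip()))
    return turns
```

The set of speaker names found this way can be compared with the names in the speaker setup to catch typos before generating audio.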
Advanced Features
- Rate Limit Handling: Graceful fallbacks with dummy audio when API limits are hit
- Controllable Style: Accent, pace, and tone control
- High-Quality Audio: Studio-grade WAV output
- Efficient Processing: Single request for complex conversations
- Structured Responses: Both text summaries and audio data in responses
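The rate-limit fallback can be pictured roughly as follows. This is not the server's implementation: `RateLimitError`, `silent_wav`, and `generate_with_fallback` are hypothetical names, and the one-second, 24 kHz mono placeholder is an arbitrary choice for the sketch.

```python
import io
import wave
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical error raised when the API quota is exhausted."""


def silent_wav(seconds: float = 1.0, sample_rate: int = 24000) -> bytes:
    """Build a silent mono 16-bit WAV to use as placeholder audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


def generate_with_fallback(
    generate: Callable[[str], bytes], text: str
) -> tuple[bytes, bool]:
    """Return (audio, is_placeholder); fall back to silence on rate limits."""
    try:
        return generate(text), False
    except RateLimitError:
        return silent_wav(), True
```

Returning a flag alongside the audio lets the caller tell the user that the clip is a placeholder rather than real speech.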
Simple, powerful, and focused on creating engaging multi-speaker audio content!