# Advanced Podcast Generation Agent

An AI agent for creating multi-speaker audio content using the Gemini TTS MCP server.

## Overview

This agent uses the Gemini TTS MCP server to generate high-quality speech with native multi-speaker support. It offers 30 prebuilt voices and natural-language tone control, and it can generate an entire conversation between multiple speakers in a single request. The server returns audio content that web interfaces can play directly.

## Key Features

### 🎤 **Native Multi-Speaker Support**

- Generate conversations with multiple speakers in one request
- No need for separate audio files or post-processing
- Natural conversation flow with different voices per speaker

### 🎵 **30 Prebuilt Voices**

- **Zephyr** - Bright and energetic
- **Puck** - Upbeat and cheerful
- **Charon** - Informative and clear
- **Kore** - Firm and authoritative
- **Fenrir** - Excitable and dynamic
- **Leda** - Youthful and fresh
- **Orus** - Firm and confident
- **Aoede** - Breezy and light
- **Callirrhoe** - Easy-going and relaxed
- **Autonoe** - Bright and optimistic
- **Enceladus** - Breathy and intimate
- **Iapetus** - Clear and articulate
- **Umbriel** - Easy-going and friendly
- **Algieba** - Smooth and polished
- **Despina** - Smooth and elegant
- **Erinome** - Clear and precise
- **Algenib** - Gravelly and distinctive
- **Rasalgethi** - Informative and knowledgeable
- **Laomedeia** - Upbeat and lively
- **Achernar** - Soft and gentle
- **Alnilam** - Firm and steady
- **Schedar** - Even and balanced
- **Gacrux** - Mature and experienced
- **Pulcherrima** - Forward and engaging
- **Achird** - Friendly and warm
- **Zubenelgenubi** - Casual and approachable
- **Vindemiatrix** - Gentle and soothing
- **Sadachbia** - Lively and animated
- **Sadaltager** - Knowledgeable and wise
- **Sulafat** - Warm and inviting
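
Voice names are easy to mistype in prompts and speaker configs. The sketch below turns the list above into a lookup so names can be checked before a request is made. `PREBUILT_VOICES` and `check_voices` are hypothetical names. The exact, case-sensitive comparison is also an assumption; the server may match voice names differently.

```bash
#!/usr/bin/env bash
# The 30 prebuilt voice names from the list above.
PREBUILT_VOICES="Zephyr Puck Charon Kore Fenrir Leda Orus Aoede Callirrhoe Autonoe
Enceladus Iapetus Umbriel Algieba Despina Erinome Algenib Rasalgethi Laomedeia Achernar
Alnilam Schedar Gacrux Pulcherrima Achird Zubenelgenubi Vindemiatrix Sadachbia Sadaltager Sulafat"

# Hypothetical helper: succeeds only if every argument is one of the names above.
# Matching is exact and case-sensitive (an assumption, not documented server behavior).
check_voices() {
  local voice name known
  for voice in "$@"; do
    known=0
    # Unquoted expansion splits the list on spaces and newlines.
    for name in $PREBUILT_VOICES; do
      if [ "$voice" = "$name" ]; then
        known=1
        break
      fi
    done
    if [ "$known" -eq 0 ]; then
      echo "unknown voice: $voice" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_voices Kore Puck` succeeds, while `check_voices Kore Korey` prints `unknown voice: Korey` and fails.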

### 🌐 **WebUI Compatible**

- Returns audio content that can be played directly in web interfaces
- Base64-encoded WAV audio data
- Structured content with both text summaries and audio data
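
Because the audio arrives as base64-encoded WAV data, it can also be saved outside a web interface by decoding it. The sketch below assumes the base64 string has already been extracted from the tool response. This README does not describe the response layout. `decode_wav` is a hypothetical helper. `base64 -d` is the GNU coreutils spelling, and some older macOS releases expect `-D`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a base64 WAV payload into a file, then check
# that the bytes start with a RIFF/WAVE header before the file is kept or played.
# Usage: decode_wav "$BASE64_STRING" output.wav
decode_wav() {
  local b64="$1" out="$2"
  printf '%s' "$b64" | base64 -d > "$out" || return 1
  # A WAV file begins with "RIFF" (bytes 0-3), then a 4-byte size field,
  # then "WAVE" (bytes 8-11).
  [ "$(head -c 4 "$out")" = "RIFF" ] || return 1
  [ "$(dd if="$out" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}
```

The header check catches payloads that decode cleanly but are not WAV audio, such as an error message that was encoded by mistake.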

### 🎭 **Natural Language Tone Control**

- "Say cheerfully: Welcome to our show!"
- "Speak in a formal tone: Welcome to our meeting"
- "Use an excited voice: This is amazing news!"
- "Speak slowly and clearly: This is important information"

## Setup

1. **Set API Keys**:

   ```bash
   export GEMINI_API_KEY="your-gemini-api-key"
   export OPENAI_API_KEY="your-openai-api-key"
   ```

2. **Run the Agent**:

   ```bash
   dexto -a agents/podcast-agent/podcast-agent.yml
   ```

The agent will automatically install the Gemini TTS MCP server from npm when needed.
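
The two keys exported in step 1 must be present in the environment when the agent starts. Below is a minimal preflight check, assuming a bash shell. `check_env` and `start_agent` are hypothetical helpers and are not part of Dexto.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: report any API key from step 1 that is unset or empty.
check_env() {
  local var missing=0
  for var in GEMINI_API_KEY OPENAI_API_KEY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Start the agent only when the check passes.
start_agent() {
  check_env && dexto -a agents/podcast-agent/podcast-agent.yml
}
```

The check reports every missing key in one pass, so you don't have to fix them one at a time.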

## Usage Examples

### Single Speaker

```
"Generate speech: 'Welcome to our podcast' with voice 'Kore'"
"Create audio: 'Say cheerfully: Have a wonderful day!' with voice 'Puck'"
"Make a formal announcement: 'Speak in a formal tone: Important news today' with voice 'Zephyr'"
```

### Multi-Speaker Conversations

```
"Generate a conversation between Dr. Anya (voice: Kore) and Liam (voice: Puck) about AI"
"Create an interview with host (voice: Zephyr) and guest (voice: Orus) discussing climate change"
"Make a story with narrator (voice: Schedar) and character (voice: Laomedeia)"
"Generate a podcast with three speakers: host (Zephyr), expert (Kore), and interviewer (Puck)"
```

### Podcast Types

```
"Create an educational podcast about AI with clear, professional voices"
"Generate a storytelling podcast with expressive character voices"
"Make a news podcast with authoritative, formal delivery"
"Create an interview with host and guest using different voices"
```

## Available Tools

### **Gemini TTS Tools**

- `generate_speech` - Single-speaker audio generation
- `generate_conversation` - Multi-speaker conversations
- `list_voices` - Browse available voices with characteristics

### **File Management**

- `list_files` - Browse audio files
- `read_file` - Access file information
- `write_file` - Save generated content
- `delete_file` - Clean up files

## Voice Selection Guide

### **Professional Voices**

- **Kore** - Firm, authoritative (great for hosts, experts)
- **Orus** - Firm, professional (business content)
- **Zephyr** - Bright, engaging (news, announcements)
- **Schedar** - Even, balanced (narrators, guides)

### **Expressive Voices**

- **Puck** - Upbeat, enthusiastic (entertainment, stories)
- **Laomedeia** - Upbeat, energetic (dynamic content)
- **Fenrir** - Excitable, passionate (exciting topics)
- **Achird** - Friendly, warm (casual conversations)

### **Character Voices**

- **Umbriel** - Easy-going, relaxed (casual hosts)
- **Erinome** - Clear, articulate (educational content)
- **Autonoe** - Bright, optimistic (positive content)
- **Leda** - Youthful, fresh (younger audiences)

## Multi-Speaker Configuration

### **Example Speaker Setup**

```json
{
  "speakers": [
    {
      "name": "Dr. Anya",
      "voice": "Kore",
      "characteristics": "Firm, professional"
    },
    {
      "name": "Liam",
      "voice": "Puck",
      "characteristics": "Upbeat, enthusiastic"
    }
  ]
}
```

### **Conversation Format**

```
Dr. Anya: Welcome to our science podcast!
Liam: Thanks for having me, Dr. Anya!
Dr. Anya: Today we're discussing artificial intelligence.
Liam: It's such an exciting field!
```
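
In the format above, each line starts with a speaker name from the speaker setup, followed by `: `. The sketch below checks this before a transcript is passed to `generate_conversation`. `validate_transcript` is a hypothetical helper. It assumes one `Name: text` turn per line and treats the first `: ` on each line as the end of the speaker name, so it would mis-handle names that themselves contain `: `.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that every non-empty transcript line has the form
# "Speaker: text" and that Speaker is one of the names passed as arguments.
# Usage: validate_transcript FILE "Dr. Anya" "Liam"
validate_transcript() {
  local file="$1"
  shift
  local IFS='|'          # join speaker names with "|" so awk can split them
  local speakers="$*"
  awk -v list="$speakers" '
    BEGIN { n = split(list, names, "|"); for (i = 1; i <= n; i++) ok[names[i]] = 1 }
    NF == 0 { next }                      # skip blank lines
    {
      pos = index($0, ": ")               # first ": " ends the speaker name
      if (pos == 0) { print "line " NR ": missing \"Speaker: \" prefix"; bad = 1; next }
      name = substr($0, 1, pos - 1)
      if (!(name in ok)) { print "line " NR ": unknown speaker " name; bad = 1 }
    }
    END { exit bad }
  ' "$file"
}
```

The function prints one message per problem line and exits non-zero if any line failed. This makes it usable as a guard, for example `validate_transcript script.txt "Dr. Anya" "Liam" && echo ready`.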

## Advanced Features

- **Rate Limit Handling**: Graceful fallback to dummy audio when API rate limits are hit
- **Controllable Style**: Control accent, pace, and tone through natural-language instructions
- **High-Quality Audio**: WAV output
- **Efficient Processing**: A single request for complex conversations
- **Structured Responses**: Both text summaries and audio data in each response

Simple, powerful, and focused on creating engaging multi-speaker audio content!