feat: Add intelligent auto-router and enhanced integrations

- Add intelligent-router.sh hook for automatic agent routing
- Add AUTO-TRIGGER-SUMMARY.md documentation
- Add FINAL-INTEGRATION-SUMMARY.md documentation
- Complete Prometheus integration (6 commands + 4 tools)
- Complete Dexto integration (12 commands + 5 tools)
- Enhanced Ralph with access to all agents
- Fix /clawd command (removed disable-model-invocation)
- Update hooks.json to v5 with intelligent routing
- 291 total skills now available
- All 21 commands with automatic routing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
admin
2026-01-28 00:27:56 +04:00
parent 3b128ba3bd
commit b52318eeae
1724 changed files with 351216 additions and 0 deletions


@@ -0,0 +1,168 @@
# Advanced Podcast Generation Agent
An AI agent for creating multi-speaker audio content using the Gemini TTS MCP server.
## Overview
This agent uses the refactored Gemini TTS MCP server to generate high-quality speech with advanced multi-speaker capabilities. It supports 30 prebuilt voices, natural language tone control, and can generate entire conversations with multiple speakers in a single request. The server now returns audio content that can be played directly in web interfaces.
## Key Features
### 🎤 **Native Multi-Speaker Support**
- Generate conversations with multiple speakers in one request
- No need for separate audio files or post-processing
- Natural conversation flow with different voices per speaker
### 🎵 **30 Prebuilt Voices**
- **Zephyr** - Bright and energetic
- **Puck** - Upbeat and cheerful
- **Charon** - Informative and clear
- **Kore** - Firm and authoritative
- **Fenrir** - Excitable and dynamic
- **Leda** - Youthful and fresh
- **Orus** - Firm and confident
- **Aoede** - Breezy and light
- **Callirrhoe** - Easy-going and relaxed
- **Autonoe** - Bright and optimistic
- **Enceladus** - Breathy and intimate
- **Iapetus** - Clear and articulate
- **Umbriel** - Easy-going and friendly
- **Algieba** - Smooth and polished
- **Despina** - Smooth and elegant
- **Erinome** - Clear and precise
- **Algenib** - Gravelly and distinctive
- **Rasalgethi** - Informative and knowledgeable
- **Laomedeia** - Upbeat and lively
- **Achernar** - Soft and gentle
- **Alnilam** - Firm and steady
- **Schedar** - Even and balanced
- **Gacrux** - Mature and experienced
- **Pulcherrima** - Forward and engaging
- **Achird** - Friendly and warm
- **Zubenelgenubi** - Casual and approachable
- **Vindemiatrix** - Gentle and soothing
- **Sadachbia** - Lively and animated
- **Sadaltager** - Knowledgeable and wise
- **Sulafat** - Warm and inviting
### 🌐 **WebUI Compatible**
- Returns audio content that can be played directly in web interfaces
- Base64-encoded WAV audio data
- Structured content with both text summaries and audio data
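As a rough illustration of handling Base64-encoded WAV data, the sketch below decodes such a string and reads its header with Python's standard `wave` module. The helper names (`make_sample_wav_b64`, `describe_wav_b64`) and the 24 kHz mono 16-bit format are assumptions chosen for the demo; the server's actual response fields and audio parameters may differ.

```python
import base64
import io
import wave


def make_sample_wav_b64(seconds: float = 0.5, rate: int = 24000) -> str:
    """Build a silent mono 16-bit WAV and return it Base64-encoded.

    This stands in for the audio payload of a real server response.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return base64.b64encode(buf.getvalue()).decode("ascii")


def describe_wav_b64(audio_b64: str) -> dict:
    """Decode Base64 WAV data and report its basic properties."""
    raw = base64.b64decode(audio_b64)
    with wave.open(io.BytesIO(raw), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


info = describe_wav_b64(make_sample_wav_b64())
print(info)
```

The same decoded bytes could be written to a `.wav` file or handed to a browser audio element as a data URL.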
### 🎭 **Natural Language Tone Control**
- "Say cheerfully: Welcome to our show!"
- "Speak in a formal tone: Welcome to our meeting"
- "Use an excited voice: This is amazing news!"
- "Speak slowly and clearly: This is important information"
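The tone prompts above all follow an "instruction: text" pattern. A minimal sketch of a helper that builds such strings is shown below; `with_tone` is a hypothetical name and is not part of the server or of Dexto.

```python
def with_tone(instruction: str, text: str) -> str:
    """Prefix `text` with a natural-language delivery instruction."""
    instruction = instruction.strip().rstrip(":")
    text = text.strip()
    if not instruction:
        # No instruction given: pass the text through unchanged.
        return text
    return f"{instruction}: {text}"


print(with_tone("Say cheerfully", "Welcome to our show!"))
print(with_tone("Speak slowly and clearly", "This is important information"))
```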
## Setup
1. **Get API Keys**:
```bash
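# podcast-agent.yml passes $GOOGLE_GENERATIVE_AI_API_KEY to the server as GEMINI_API_KEY
export GOOGLE_GENERATIVE_AI_API_KEY="your-gemini-api-key"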
export OPENAI_API_KEY="your-openai-api-key"
```
2. **Run the Agent**:
```bash
dexto -a agents/podcast-agent/podcast-agent.yml
```
The agent config starts the Gemini TTS MCP server with `npx -y @truffle-ai/gemini-tts-server`, so the package is fetched from npm automatically the first time it is needed.
## Usage Examples
### Single Speaker
```
"Generate speech: 'Welcome to our podcast' with voice 'Kore'"
"Create audio: 'Say cheerfully: Have a wonderful day!' with voice 'Puck'"
"Make a formal announcement: 'Speak in a formal tone: Important news today' with voice 'Zephyr'"
```
### Multi-Speaker Conversations
```
"Generate a conversation between Dr. Anya (voice: Kore) and Liam (voice: Puck) about AI"
"Create an interview with host (voice: Zephyr) and guest (voice: Orus) discussing climate change"
"Make a story with narrator (voice: Schedar) and character (voice: Laomedeia)"
"Generate a podcast with three speakers: host (Zephyr), expert (Kore), and interviewer (Puck)"
```
### Podcast Types
```
"Create an educational podcast about AI with clear, professional voices"
"Generate a storytelling podcast with expressive character voices"
"Make a news podcast with authoritative, formal delivery"
"Create an interview with host and guest using different voices"
```
## Available Tools
### **Gemini TTS Tools**
- `generate_speech` - Single-speaker audio generation
- `generate_conversation` - Multi-speaker conversations
- `list_voices` - Browse available voices with characteristics
### **File Management**
- `list_files` - Browse audio files
- `read_file` - Access file information
- `write_file` - Save generated content
- `delete_file` - Clean up files
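MCP clients invoke tools like these through JSON-RPC 2.0 `tools/call` requests. Dexto sends these messages for you, so the sketch below is only meant to show the shape of one. The helper name `build_tool_call` is hypothetical, and calling `list_voices` with an empty argument object is an assumption about that tool's schema.

```python
import json
from typing import Any, Dict, Optional


def build_tool_call(
    request_id: int, tool: str, arguments: Optional[Dict[str, Any]] = None
) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }
    return json.dumps(message)


# Assumption: list_voices needs no arguments.
request = build_tool_call(1, "list_voices")
print(request)
```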
## Voice Selection Guide
### **Professional Voices**
- **Kore** - Firm, authoritative (great for hosts, experts)
- **Orus** - Firm, confident (business content)
- **Zephyr** - Bright, engaging (news, announcements)
- **Schedar** - Even, balanced (narrators, guides)
### **Expressive Voices**
- **Puck** - Upbeat, enthusiastic (entertainment, stories)
- **Laomedeia** - Upbeat, energetic (dynamic content)
- **Fenrir** - Excitable, passionate (exciting topics)
- **Achird** - Friendly, warm (casual conversations)
### **Character Voices**
- **Umbriel** - Easy-going, friendly (casual hosts)
- **Erinome** - Clear, precise (educational content)
- **Autonoe** - Bright, optimistic (positive content)
- **Leda** - Youthful, fresh (younger audiences)
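The guide above can be treated as a simple lookup table. The following sketch encodes the three categories as a dictionary and returns the first few voices for a category; `VOICE_GUIDE` and `suggest_voices` are illustrative names, not part of the agent.

```python
from typing import Dict, List

# Voice names copied from the selection guide, grouped by category.
VOICE_GUIDE: Dict[str, List[str]] = {
    "professional": ["Kore", "Orus", "Zephyr", "Schedar"],
    "expressive": ["Puck", "Laomedeia", "Fenrir", "Achird"],
    "character": ["Umbriel", "Erinome", "Autonoe", "Leda"],
}


def suggest_voices(category: str, count: int = 2) -> List[str]:
    """Return up to `count` voices for a guide category (case-insensitive)."""
    key = category.strip().lower()
    if key not in VOICE_GUIDE:
        raise ValueError(f"Unknown category: {category!r}")
    return VOICE_GUIDE[key][:count]


print(suggest_voices("Professional"))
print(suggest_voices("character", count=3))
```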
## Multi-Speaker Configuration
### **Example Speaker Setup**
```json
{
  "speakers": [
    {
      "name": "Dr. Anya",
      "voice": "Kore",
      "characteristics": "Firm, professional"
    },
    {
      "name": "Liam",
      "voice": "Puck",
      "characteristics": "Upbeat, enthusiastic"
    }
  ]
}
```
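Before sending a speaker setup like this one, it can help to check it locally. The sketch below validates a speakers document against the 30 voice names listed in this README; `validate_speakers` is a hypothetical helper, and the specific rules it enforces (non-empty list, unique names, known voices) are assumptions rather than documented server behavior.

```python
import json
from typing import List

# The 30 prebuilt voice names listed earlier in this README.
KNOWN_VOICES = {
    "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
    "Callirrhoe", "Autonoe", "Enceladus", "Iapetus", "Umbriel", "Algieba",
    "Despina", "Erinome", "Algenib", "Rasalgethi", "Laomedeia", "Achernar",
    "Alnilam", "Schedar", "Gacrux", "Pulcherrima", "Achird", "Zubenelgenubi",
    "Vindemiatrix", "Sadachbia", "Sadaltager", "Sulafat",
}


def validate_speakers(config_json: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    speakers = json.loads(config_json).get("speakers", [])
    if not speakers:
        problems.append("at least one speaker is required")
    seen = set()
    for entry in speakers:
        name = entry.get("name", "")
        voice = entry.get("voice", "")
        if not name:
            problems.append("speaker is missing a name")
        elif name in seen:
            problems.append(f"duplicate speaker name: {name}")
        seen.add(name)
        if voice not in KNOWN_VOICES:
            problems.append(f"unknown voice for {name or '?'}: {voice!r}")
    return problems


config = (
    '{"speakers": [{"name": "Dr. Anya", "voice": "Kore"},'
    ' {"name": "Liam", "voice": "Puck"}]}'
)
print(validate_speakers(config))
```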
### **Conversation Format**
```
Dr. Anya: Welcome to our science podcast!
Liam: Thanks for having me, Dr. Anya!
Dr. Anya: Today we're discussing artificial intelligence.
Liam: It's such an exciting field!
```
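A transcript in this format can be turned into the two arguments the agent's system prompt requires for `generate_conversation`: `text` (the labelled dialogue) and `speakers` (name and voice pairs). The sketch below is one possible way to do that; `build_conversation_args` is a hypothetical helper, and joining turns with single spaces follows the system prompt's example rather than any documented requirement.

```python
from typing import Dict, List


def build_conversation_args(transcript: str, voices: Dict[str, str]) -> dict:
    """Convert 'Name: line' text into `text` and `speakers` arguments."""
    names: List[str] = []
    turns: List[str] = []
    for raw in transcript.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon so names like "Dr. Anya" stay intact.
        name, sep, spoken = line.partition(":")
        name, spoken = name.strip(), spoken.strip()
        if not sep or not spoken:
            raise ValueError(f"Expected 'Speaker: text', got: {line!r}")
        if name not in voices:
            raise ValueError(f"No voice assigned to speaker {name!r}")
        if name not in names:
            names.append(name)  # keep first-appearance order
        turns.append(f"{name}: {spoken}")
    return {
        "text": " ".join(turns),
        "speakers": [{"name": n, "voice": voices[n]} for n in names],
    }


args = build_conversation_args(
    "Dr. Anya: Welcome to our science podcast!\nLiam: Thanks for having me, Dr. Anya!",
    {"Dr. Anya": "Kore", "Liam": "Puck"},
)
print(args)
```

Raising on a speaker without a voice mirrors the rule that every speaker must be defined in the `speakers` array.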
## Advanced Features
- **Rate Limit Handling**: Graceful fallbacks with dummy audio when API limits are hit
- **Controllable Style**: Accent, pace, and tone control
- **High-Quality Audio**: Studio-grade WAV output
- **Efficient Processing**: Single request for complex conversations
- **Structured Responses**: Both text summaries and audio data in responses
Simple, powerful, and focused on creating engaging multi-speaker audio content!


@@ -0,0 +1,209 @@
# Advanced Podcast Generation Agent
# Uses Gemini TTS for multi-speaker audio generation
mcpServers:
  gemini_tts:
    type: stdio
    command: npx
    args:
      - -y
      - "@truffle-ai/gemini-tts-server"
    env:
      GEMINI_API_KEY: $GOOGLE_GENERATIVE_AI_API_KEY
    timeout: 60000
    connectionMode: strict
  filesystem:
    type: stdio
    command: npx
    args:
      - -y
      - "@modelcontextprotocol/server-filesystem"
      - .
# Optional greeting shown at chat start (UI can consume this)
greeting: "🎙️ Hello! I'm your Podcast Agent. Let's create some amazing audio together!"
systemPrompt: |
  You are an advanced podcast generation agent that creates multi-speaker audio content using Google Gemini TTS.
  ## Your Capabilities
  - Generate high-quality speech from text using Gemini TTS
  - Create multi-speaker conversations in a single generation
  - Use 30 different prebuilt voices with unique characteristics
  - Apply natural language tone control (e.g., "Say cheerfully:")
  - Save audio files with descriptive names
  ## Gemini TTS MCP Usage
  ### Single Speaker Generation
  - Use `generate_speech` to generate single-speaker audio
  - Choose from 30 prebuilt voices (Zephyr, Puck, Kore, etc.)
  - Apply natural language tone instructions
  ### Multi-Speaker Generation
  - Use `generate_conversation` for multi-speaker conversations
  - Configure different voices for each speaker
  - Generate entire conversations in one request
  ### Voice Discovery
  - Use `list_voices` to get a complete list of all available voices with their characteristics
  - This tool helps you choose the right voice for different content types
  ### Voice Selection
  Available voices with characteristics:
  - **Zephyr** - Bright and energetic
  - **Puck** - Upbeat and cheerful
  - **Charon** - Informative and clear
  - **Kore** - Firm and authoritative
  - **Fenrir** - Excitable and dynamic
  - **Leda** - Youthful and fresh
  - **Orus** - Firm and confident
  - **Aoede** - Breezy and light
  - **Callirrhoe** - Easy-going and relaxed
  - **Autonoe** - Bright and optimistic
  - **Enceladus** - Breathy and intimate
  - **Iapetus** - Clear and articulate
  - **Umbriel** - Easy-going and friendly
  - **Algieba** - Smooth and polished
  - **Despina** - Smooth and elegant
  - **Erinome** - Clear and precise
  - **Algenib** - Gravelly and distinctive
  - **Rasalgethi** - Informative and knowledgeable
  - **Laomedeia** - Upbeat and lively
  - **Achernar** - Soft and gentle
  - **Alnilam** - Firm and steady
  - **Schedar** - Even and balanced
  - **Gacrux** - Mature and experienced
  - **Pulcherrima** - Forward and engaging
  - **Achird** - Friendly and warm
  - **Zubenelgenubi** - Casual and approachable
  - **Vindemiatrix** - Gentle and soothing
  - **Sadachbia** - Lively and animated
  - **Sadaltager** - Knowledgeable and wise
  - **Sulafat** - Warm and inviting
  ### Natural Language Tone Control
  You can use natural language to control tone:
  - "Say cheerfully: Welcome to our show!"
  - "Speak in a formal tone: Welcome to our meeting"
  - "Use an excited voice: This is amazing news!"
  - "Speak slowly and clearly: This is important information"
  ## Podcast Creation Guidelines
  ### Voice Selection
  - Choose appropriate voices for different speakers
  - Use consistent voices for recurring characters
  - Consider the content type when selecting voices
  ### Content Types
  - **Educational**: Clear, professional voices (Kore, Orus, Charon, Rasalgethi)
  - **Storytelling**: Expressive voices (Puck, Laomedeia, Fenrir, Sadachbia)
  - **News/Current Events**: Authoritative voices (Zephyr, Schedar, Alnilam)
  - **Interview**: Different voices for host and guest (Achird, Autonoe, Umbriel)
  - **Fiction**: Character voices with distinct personalities (Gacrux, Leda, Algenib)
  ### Multi-Speaker Conversations - IMPORTANT
  When users ask for multi-speaker content (like podcast intros, conversations, interviews):
  1. **Always use `generate_conversation` for conversations with multiple people**
  2. **Format the text with speaker labels**: "Speaker1: [text] Speaker2: [text]"
  3. **Create ONE audio file with ALL speakers**, not separate files per speaker
  4. **REQUIRED: Always define all speakers in the speakers array** - This parameter is mandatory and cannot be omitted
  5. **Never call generate_conversation without the speakers parameter** - it will fail
  **Example for podcast intro:**
  ```
  Text: "Alex: Hello everyone, and welcome to our podcast! I'm Alex, your friendly host. Jamie: And I'm Jamie! I'm thrilled to be here with you all today."
  Speakers: [
    {"name": "Alex", "voice": "Achird"},
    {"name": "Jamie", "voice": "Autonoe"}
  ]
  ```
  **TOOL USAGE RULE**: When using `generate_conversation`, you MUST include both:
  - `text`: The conversation with speaker labels
  - `speakers`: Array of all speakers with their voice assignments
  **DO NOT** call the tool without the speakers parameter - it will result in an error.
  ### Multi-Speaker Examples
  ```
  "Generate a conversation between Dr. Anya (voice: Kore) and Liam (voice: Puck) about AI"
  "Create an interview with host (voice: Zephyr) and guest (voice: Orus) discussing climate change"
  "Make a story with narrator (voice: Schedar) and character (voice: Laomedeia)"
  "Create a podcast intro with Alex (voice: Achird) and Jamie (voice: Autonoe)"
  ```
  ### Single Speaker Examples
  ```
  "Generate speech: 'Welcome to our podcast' with voice 'Kore'"
  "Create audio: 'Say cheerfully: Have a wonderful day!' with voice 'Puck'"
  "Make a formal announcement: 'Speak in a formal tone: Important news today' with voice 'Zephyr'"
  ```
  ### File Management
  - Save audio files with descriptive names
  - Organize files by episode or content type
  - Use appropriate file formats (WAV)
  Always provide clear feedback about what you're creating and explain your voice choices.
  **CRITICAL**: For multi-speaker requests, always generate ONE cohesive audio file with ALL speakers, never split into separate files.
llm:
  provider: openai
  model: gpt-5-mini
  apiKey: $OPENAI_API_KEY
storage:
  cache:
    type: in-memory
  database:
    type: sqlite
  blob:
    type: local # CLI provides storePath automatically
    maxBlobSize: 52428800 # 50MB per blob
    maxTotalSize: 1073741824 # 1GB total storage
    cleanupAfterDays: 30
toolConfirmation:
  mode: auto-approve
  # timeout: omitted = infinite wait
  allowedToolsStorage: memory
# Prompts - podcast and audio generation examples shown as clickable buttons in WebUI
prompts:
  - type: inline
    id: create-intro
    title: "🎙️ Create Podcast Intro"
    description: "Generate a multi-speaker podcast introduction"
    prompt: "Create a podcast intro with two hosts, Alex (voice: Achird) and Jamie (voice: Autonoe), welcoming listeners to a tech podcast."
    category: podcasting
    priority: 10
    showInStarters: true
  - type: inline
    id: generate-conversation
    title: "💬 Generate Conversation"
    description: "Create multi-speaker dialogue"
    prompt: "Generate a 2-minute conversation between Dr. Anya (voice: Kore) and Liam (voice: Puck) discussing the future of artificial intelligence."
    category: conversation
    priority: 9
    showInStarters: true
  - type: inline
    id: list-voices
    title: "🔊 Explore Voices"
    description: "Browse available voice options"
    prompt: "Show me all available voices with their characteristics to help me choose the right ones for my podcast."
    category: discovery
    priority: 8
    showInStarters: true
  - type: inline
    id: single-speaker
    title: "🗣️ Generate Speech"
    description: "Create single-speaker audio"
    prompt: "Generate a cheerful welcome announcement using voice Puck saying 'Welcome to our amazing podcast! We're thrilled to have you here today.'"
    category: speech
    priority: 7
    showInStarters: true