admin b52318eeae feat: Add intelligent auto-router and enhanced integrations

2026-01-28 00:27:56 +04:00

Advanced Podcast Generation Agent

An AI agent for creating multi-speaker audio content using the Gemini TTS MCP server.

Overview

This agent uses the refactored Gemini TTS MCP server to generate high-quality speech with advanced multi-speaker capabilities. It supports 30 prebuilt voices and natural language tone control, and it can generate an entire conversation with multiple speakers in a single request. The server now returns audio content that can be played directly in web interfaces.

Key Features

🎤 Native Multi-Speaker Support

Generate conversations with multiple speakers in one request
No need for separate audio files or post-processing
Natural conversation flow with different voices per speaker

🎵 30 Prebuilt Voices

Zephyr - Bright and energetic
Puck - Upbeat and cheerful
Charon - Informative and clear
Kore - Firm and authoritative
Fenrir - Excitable and dynamic
Leda - Youthful and fresh
Orus - Firm and confident
Aoede - Breezy and light
Callirrhoe - Easy-going and relaxed
Autonoe - Bright and optimistic
Enceladus - Breathy and intimate
Iapetus - Clear and articulate
Umbriel - Easy-going and friendly
Algieba - Smooth and polished
Despina - Smooth and elegant
Erinome - Clear and precise
Algenib - Gravelly and distinctive
Rasalgethi - Informative and knowledgeable
Laomedeia - Upbeat and lively
Achernar - Soft and gentle
Alnilam - Firm and steady
Schedar - Even and balanced
Gacrux - Mature and experienced
Pulcherrima - Forward and engaging
Achird - Friendly and warm
Zubenelgenubi - Casual and approachable
Vindemiatrix - Gentle and soothing
Sadachbia - Lively and animated
Sadaltager - Knowledgeable and wise
Sulafat - Warm and inviting

🌐 WebUI Compatible

Returns audio content that can be played directly in web interfaces
Base64-encoded WAV audio data
Structured content with both text summaries and audio data
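
As a rough illustration of the base64-encoded WAV data mentioned above, the sketch below decodes such a payload and builds a data URI that a web page could hand to an audio element. The payload shape (a dict with a "data" key) and both function names are assumptions made for this example, not the server's documented response format.

```python
import base64
import io
import wave


def decode_wav_payload(payload: dict) -> bytes:
    """Decode base64 audio from a hypothetical {"data": "..."} payload."""
    wav_bytes = base64.b64decode(payload["data"])
    # wave.open raises wave.Error if the bytes are not a valid WAV container.
    with wave.open(io.BytesIO(wav_bytes), "rb"):
        pass
    return wav_bytes


def to_data_uri(payload: dict) -> str:
    """Build a data URI usable as the src of an HTML audio element."""
    return "data:audio/wav;base64," + payload["data"]
```

Writing the decoded bytes to a file ending in .wav gives something any standard audio player can open.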

🎭 Natural Language Tone Control

"Say cheerfully: Welcome to our show!"
"Speak in a formal tone: Welcome to our meeting"
"Use an excited voice: This is amazing news!"
"Speak slowly and clearly: This is important information"

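The tone prompts above all follow the same pattern: a delivery instruction, a colon, then the text to speak. A minimal sketch of a helper that builds such prompts is shown here; the function name with_tone is hypothetical and not part of the agent or the server.

```python
def with_tone(instruction: str, text: str) -> str:
    """Prefix text with a natural-language delivery instruction."""
    # Tolerate an instruction that already ends with a colon.
    instruction = instruction.strip().rstrip(":")
    return f"{instruction}: {text.strip()}"


print(with_tone("Say cheerfully", "Welcome to our show!"))
# prints: Say cheerfully: Welcome to our show!
```
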
Setup

Get API Keys and export them as environment variables:

export GEMINI_API_KEY="your-gemini-api-key"
export OPENAI_API_KEY="your-openai-api-key"

Run the Agent:

dexto -a agents/podcast-agent/podcast-agent.yml

The agent will automatically install the Gemini TTS MCP server from npm when needed.

Usage Examples

Single Speaker

"Generate speech: 'Welcome to our podcast' with voice 'Kore'"
"Create audio: 'Say cheerfully: Have a wonderful day!' with voice 'Puck'"
"Make a formal announcement: 'Speak in a formal tone: Important news today' with voice 'Zephyr'"

Multi-Speaker Conversations

"Generate a conversation between Dr. Anya (voice: Kore) and Liam (voice: Puck) about AI"
"Create an interview with host (voice: Zephyr) and guest (voice: Orus) discussing climate change"
"Make a story with narrator (voice: Schedar) and character (voice: Laomedeia)"
"Generate a podcast with three speakers: host (Zephyr), expert (Kore), and interviewer (Puck)"

Podcast Types

"Create an educational podcast about AI with clear, professional voices"
"Generate a storytelling podcast with expressive character voices"
"Make a news podcast with authoritative, formal delivery"
"Create an interview with host and guest using different voices"

Available Tools

Gemini TTS Tools

generate_speech - Single-speaker audio generation
generate_conversation - Multi-speaker conversations
list_voices - Browse available voices with characteristics
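
In normal use the agent invokes these tools for you. For readers curious about what happens underneath, MCP tool invocations are JSON-RPC 2.0 requests using the tools/call method. The sketch below builds such a request for generate_speech. The argument names ("text" and "voice") are assumptions for illustration; the real parameter names are whatever the server's tool schema declares.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# "text" and "voice" are hypothetical argument names.
message = build_tool_call(
    1, "generate_speech", {"text": "Welcome to our podcast", "voice": "Kore"}
)
print(message)
```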

File Management

list_files - Browse audio files
read_file - Access file information
write_file - Save generated content
delete_file - Clean up files

Voice Selection Guide

Professional Voices

Kore - Firm, authoritative (great for hosts, experts)
Orus - Firm, professional (business content)
Zephyr - Bright, engaging (news, announcements)
Schedar - Even, balanced (narrators, guides)

Expressive Voices

Puck - Upbeat, enthusiastic (entertainment, stories)
Laomedeia - Upbeat, energetic (dynamic content)
Fenrir - Excitable, passionate (exciting topics)
Achird - Friendly, warm (casual conversations)

Character Voices

Umbriel - Easy-going, relaxed (casual hosts)
Erinome - Clear, articulate (educational content)
Autonoe - Bright, optimistic (positive content)
Leda - Youthful, fresh (younger audiences)

Multi-Speaker Configuration

Example Speaker Setup

{
  "speakers": [
    {
      "name": "Dr. Anya",
      "voice": "Kore",
      "characteristics": "Firm, professional"
    },
    {
      "name": "Liam",
      "voice": "Puck",
      "characteristics": "Upbeat, enthusiastic"
    }
  ]
}

Conversation Format

Dr. Anya: Welcome to our science podcast!
Liam: Thanks for having me, Dr. Anya!
Dr. Anya: Today we're discussing artificial intelligence.
Liam: It's such an exciting field!
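
To make the "Speaker: line" format concrete, here is a small sketch that splits a script into turns and pairs each turn with the voice assigned to its speaker. The function name parse_conversation and the tuple layout are assumptions for this example; the server may parse scripts differently.

```python
def parse_conversation(script: str, voices: dict[str, str]) -> list[tuple[str, str, str]]:
    """Split a 'Speaker: text' script into (speaker, voice, text) turns."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        # Split on the first colon only, so colons inside the text survive.
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        if not sep or speaker not in voices:
            raise ValueError(f"Unrecognized speaker line: {line!r}")
        turns.append((speaker, voices[speaker], text.strip()))
    return turns


voices = {"Dr. Anya": "Kore", "Liam": "Puck"}
script = """Dr. Anya: Welcome to our science podcast!
Liam: Thanks for having me, Dr. Anya!"""
for speaker, voice, text in parse_conversation(script, voices):
    print(f"{speaker} [{voice}]: {text}")
```

Rejecting lines with an unknown speaker catches a misspelled name before any audio is requested.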

Advanced Features

Rate Limit Handling: Falls back gracefully to placeholder (dummy) audio when API rate limits are hit
Controllable Style: Accent, pace, and tone control
High-Quality Audio: Studio-grade WAV output
Efficient Processing: Single request for complex conversations
Structured Responses: Both text summaries and audio data in responses
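
The rate-limit fallback can be pictured roughly as follows. This is a sketch under stated assumptions, not the server's implementation: RateLimitError, silent_wav, and synthesize_with_fallback are hypothetical names, the dummy audio is assumed to be silence, and the 24 kHz 16-bit mono format is an assumed default.

```python
import io
import wave
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical error raised when the TTS API rejects a request for quota reasons."""


def silent_wav(seconds: float = 1.0, sample_rate: int = 24000) -> bytes:
    """Return a mono 16-bit WAV containing only silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def synthesize_with_fallback(
    synthesize: Callable[[str], bytes], text: str
) -> tuple[bytes, bool]:
    """Return (audio, used_fallback); fall back to silence on a rate limit."""
    try:
        return synthesize(text), False
    except RateLimitError:
        return silent_wav(), True
```

Returning the used_fallback flag alongside the audio lets the caller tell the listener that the clip is a placeholder rather than real speech.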

Simple, powerful, and focused on creating engaging multi-speaker audio content!
