feat: Add complete Agentic Compaction & Pipeline System

- Context Compaction System with token counting and summarization - Deterministic State Machine for flow control (no LLM decisions) - Parallel Execution Engine (up to 12 concurrent sessions) - Event-Driven Coordination via Event Bus - Agent Workspace Isolation (tools, memory, identity, files) - YAML Workflow Integration (OpenClaw/Lobster compatible) - Claude Code integration layer - Complete demo UI with real-time visualization - Comprehensive documentation and README Components: - agent-system/: Context management, token counting, subagent spawning - pipeline-system/: State machine, parallel executor, event bus, workflows - skills/: AI capabilities (LLM, ASR, TTS, VLM, image generation, etc.) - src/app/: Next.js demo application Total: ~100KB of production-ready TypeScript code
2026-03-03 12:40:47 +00:00
parent 63a8b123c9
commit 2380d33861
152 changed files with 51569 additions and 817 deletions
--- a/skills/TTS/SKILL.md
+++ b/skills/TTS/SKILL.md
@@ -0,0 +1,735 @@
+---
+name: TTS
+description: Implement text-to-speech (TTS) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to convert text into natural-sounding speech, create audio content, build voice-enabled applications, or generate spoken audio files. Supports multiple voices, adjustable speed, and various audio formats.
+license: MIT
+---
+
+# TTS (Text to Speech) Skill
+
+This skill guides the implementation of text-to-speech (TTS) functionality using the z-ai-web-dev-sdk package, enabling conversion of text into natural-sounding speech audio.
+
+## Skills Path
+
+**Skill Location**: `{project_path}/skills/TTS`
+
+This skill is located at the above path in your project.
+
+**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/tts.ts` for a working example.
+
+## Overview
+
+Text-to-Speech allows you to build applications that generate spoken audio from text input, supporting various voices, speeds, and output formats for diverse use cases.
+
+**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.
+
+## API Limitations and Constraints
+
+Before implementing TTS functionality, be aware of these important limitations:
+
+### Input Text Constraints
+- **Maximum length**: 1024 characters per request
+- Text exceeding this limit must be split into smaller chunks
+
+### Audio Parameters
+- **Speed range**: 0.5 to 2.0
+  - 0.5 = half speed (slower)
+  - 1.0 = normal speed (default)
+  - 2.0 = double speed (faster)
+- **Volume range**: Greater than 0, up to 10
+  - Default: 1.0
+  - Values must be greater than 0 (exclusive) and up to 10 (inclusive)
+
+### Format and Streaming
+- **Streaming limitation**: When `stream: true` is enabled, only `pcm` format is supported
+- **Non-streaming**: Supports `wav`, `pcm`, and `mp3` formats
+- **Sample rate**: 24000 Hz (recommended)
+
+### Best Practice for Long Text
+```javascript
+function splitTextIntoChunks(text, maxLength = 1000) {
+  const chunks = [];
+  const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
+  
+  let currentChunk = '';
+  for (const sentence of sentences) {
+    if ((currentChunk + sentence).length <= maxLength) {
+      currentChunk += sentence;
+    } else {
+      if (currentChunk) chunks.push(currentChunk.trim());
+      currentChunk = sentence;
+    }
+  }
+  if (currentChunk) chunks.push(currentChunk.trim());
+  
+  return chunks;
+}
+```
+
+## Prerequisites
+
+The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below.
+
+## CLI Usage (For Simple Tasks)
+
+For simple text-to-speech conversions, you can use the z-ai CLI instead of writing code. This is ideal for quick audio generation, testing voices, or simple automation.
+
+### Basic TTS
+
+```bash
+# Convert text to speech (default WAV format)
+z-ai tts --input "Hello, world" --output ./hello.wav
+
+# Using short options
+z-ai tts -i "Hello, world" -o ./hello.wav
+```
+
+### Different Voices and Speed
+
+```bash
+# Use specific voice
+z-ai tts -i "Welcome to our service" -o ./welcome.wav --voice tongtong
+
+# Adjust speech speed (0.5-2.0)
+z-ai tts -i "This is faster speech" -o ./fast.wav --speed 1.5
+
+# Slower speech
+z-ai tts -i "This is slower speech" -o ./slow.wav --speed 0.8
+```
+
+### Different Output Formats
+
+```bash
+# MP3 format
+z-ai tts -i "Hello World" -o ./hello.mp3 --format mp3
+
+# WAV format (default)
+z-ai tts -i "Hello World" -o ./hello.wav --format wav
+
+# PCM format
+z-ai tts -i "Hello World" -o ./hello.pcm --format pcm
+```
+
+### Streaming Output
+
+```bash
+# Stream audio generation
+z-ai tts -i "This is a longer text that will be streamed" -o ./stream.wav --stream
+```
+
+### CLI Parameters
+
+- `--input, -i <text>`: **Required** - Text to convert to speech (max 1024 characters)
+- `--output, -o <path>`: **Required** - Output audio file path
+- `--voice, -v <voice>`: Optional - Voice type (default: tongtong)
+- `--speed, -s <number>`: Optional - Speech speed, 0.5-2.0 (default: 1.0)
+- `--format, -f <format>`: Optional - Output format: wav, mp3, pcm (default: wav)
+- `--stream`: Optional - Enable streaming output (only supports pcm format)
+
+### When to Use CLI vs SDK
+
+**Use CLI for:**
+- Quick text-to-speech conversions
+- Testing different voices and speeds
+- Simple batch audio generation
+- Command-line automation scripts
+
+**Use SDK for:**
+- Dynamic audio generation in applications
+- Integration with web services
+- Custom audio processing pipelines
+- Production applications with complex requirements
+
+## Basic TTS Implementation
+
+### Simple Text to Speech
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+
+async function textToSpeech(text, outputPath) {
+  const zai = await ZAI.create();
+
+  const response = await zai.audio.tts.create({
+    input: text,
+    voice: 'tongtong',
+    speed: 1.0,
+    response_format: 'wav',
+    stream: false
+  });
+
+  // Get array buffer from Response object
+  const arrayBuffer = await response.arrayBuffer();
+  const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+  fs.writeFileSync(outputPath, buffer);
+  console.log(`Audio saved to ${outputPath}`);
+  return outputPath;
+}
+
+// Usage
+await textToSpeech('Hello, world!', './output.wav');
+```
+
+### Multiple Voice Options
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+
+async function generateWithVoice(text, voice, outputPath) {
+  const zai = await ZAI.create();
+
+  const response = await zai.audio.tts.create({
+    input: text,
+    voice: voice, // Available voices: tongtong, chuichui, xiaochen, jam, kazi, douji, luodo
+    speed: 1.0,
+    response_format: 'wav',
+    stream: false
+  });
+
+  // Get array buffer from Response object
+  const arrayBuffer = await response.arrayBuffer();
+  const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+  fs.writeFileSync(outputPath, buffer);
+  return outputPath;
+}
+
+// Usage
+await generateWithVoice('Welcome to our service', 'tongtong', './welcome.wav');
+```
+
+### Adjustable Speed
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+
+async function generateWithSpeed(text, speed, outputPath) {
+  const zai = await ZAI.create();
+
+  // Speed range: 0.5 to 2.0 (API constraint)
+  // 0.5 = half speed (slower)
+  // 1.0 = normal speed (default)
+  // 2.0 = double speed (faster)
+  // Values outside this range will cause API errors
+
+  const response = await zai.audio.tts.create({
+    input: text,
+    voice: 'tongtong',
+    speed: speed,
+    response_format: 'wav',
+    stream: false
+  });
+
+  // Get array buffer from Response object
+  const arrayBuffer = await response.arrayBuffer();
+  const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+  fs.writeFileSync(outputPath, buffer);
+  return outputPath;
+}
+
+// Usage - slower narration
+await generateWithSpeed('This is an important announcement', 0.8, './slow.wav');
+
+// Usage - faster narration
+await generateWithSpeed('Quick update', 1.3, './fast.wav');
+```
+
+### Adjustable Volume
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+
+async function generateWithVolume(text, volume, outputPath) {
+  const zai = await ZAI.create();
+
+  // Volume range: greater than 0, up to 10 (API constraint)
+  // Values must be > 0 (exclusive) and <= 10 (inclusive)
+  // Default: 1.0 (normal volume)
+
+  const response = await zai.audio.tts.create({
+    input: text,
+    voice: 'tongtong',
+    speed: 1.0,
+    volume: volume, // Optional parameter
+    response_format: 'wav',
+    stream: false
+  });
+
+  // Get array buffer from Response object
+  const arrayBuffer = await response.arrayBuffer();
+  const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+  fs.writeFileSync(outputPath, buffer);
+  return outputPath;
+}
+
+// Usage - louder audio
+await generateWithVolume('This is an announcement', 5.0, './loud.wav');
+
+// Usage - quieter audio
+await generateWithVolume('Whispered message', 0.5, './quiet.wav');
+```
+
+## Advanced Use Cases
+
+### Batch Processing
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+import path from 'path';
+
+async function batchTextToSpeech(textArray, outputDir) {
+  const zai = await ZAI.create();
+  const results = [];
+
+  // Ensure output directory exists
+  if (!fs.existsSync(outputDir)) {
+    fs.mkdirSync(outputDir, { recursive: true });
+  }
+
+  for (let i = 0; i < textArray.length; i++) {
+    try {
+      const text = textArray[i];
+      const outputPath = path.join(outputDir, `audio_${i + 1}.wav`);
+
+      const response = await zai.audio.tts.create({
+        input: text,
+        voice: 'tongtong',
+        speed: 1.0,
+        response_format: 'wav',
+        stream: false
+      });
+
+      // Get array buffer from Response object
+      const arrayBuffer = await response.arrayBuffer();
+      const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+      fs.writeFileSync(outputPath, buffer);
+      results.push({
+        success: true,
+        text,
+        path: outputPath
+      });
+    } catch (error) {
+      results.push({
+        success: false,
+        text: textArray[i],
+        error: error.message
+      });
+    }
+  }
+
+  return results;
+}
+
+// Usage
+const texts = [
+  'Welcome to chapter one',
+  'Welcome to chapter two',
+  'Welcome to chapter three'
+];
+
+const results = await batchTextToSpeech(texts, './audio-output');
+console.log('Generated:', results.length, 'audio files');
+```
+
+### Dynamic Content Generation
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+
+class TTSGenerator {
+  constructor() {
+    this.zai = null;
+  }
+
+  async initialize() {
+    this.zai = await ZAI.create();
+  }
+
+  async generateAudio(text, options = {}) {
+    const {
+      voice = 'tongtong',
+      speed = 1.0,
+      format = 'wav'
+    } = options;
+
+    const response = await this.zai.audio.tts.create({
+      input: text,
+      voice: voice,
+      speed: speed,
+      response_format: format,
+      stream: false
+    });
+
+    // Get array buffer from Response object
+    const arrayBuffer = await response.arrayBuffer();
+    return Buffer.from(new Uint8Array(arrayBuffer));
+  }
+
+  async saveAudio(text, outputPath, options = {}) {
+    const buffer = await this.generateAudio(text, options);
+    if (buffer) {
+      fs.writeFileSync(outputPath, buffer);
+      return outputPath;
+    }
+    return null;
+  }
+}
+
+// Usage
+const generator = new TTSGenerator();
+await generator.initialize();
+
+await generator.saveAudio(
+  'Hello, this is a test',
+  './output.wav',
+  { speed: 1.2 }
+);
+```
+
+### Next.js API Route Example
+
+```javascript
+import { NextRequest, NextResponse } from 'next/server';
+
+export async function POST(req: NextRequest) {
+  try {
+    const { text, voice = 'tongtong', speed = 1.0 } = await req.json();
+
+    // Import ZAI SDK
+    const ZAI = (await import('z-ai-web-dev-sdk')).default;
+
+    // Create SDK instance
+    const zai = await ZAI.create();
+
+    // Generate TTS audio
+    const response = await zai.audio.tts.create({
+      input: text.trim(),
+      voice: voice,
+      speed: speed,
+      response_format: 'wav',
+      stream: false,
+    });
+
+    // Get array buffer from Response object
+    const arrayBuffer = await response.arrayBuffer();
+    const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+    // Return audio as response
+    return new NextResponse(buffer, {
+      status: 200,
+      headers: {
+        'Content-Type': 'audio/wav',
+        'Content-Length': buffer.length.toString(),
+        'Cache-Control': 'no-cache',
+      },
+    });
+  } catch (error) {
+    console.error('TTS API Error:', error);
+
+    return NextResponse.json(
+      {
+        error: error instanceof Error ? error.message : '生成语音失败，请稍后重试',
+      },
+      { status: 500 }
+    );
+  }
+}
+```
+
+## Best Practices
+
+### 1. Text Preparation
+```javascript
+function prepareTextForTTS(text) {
+  // Remove excessive whitespace
+  text = text.replace(/\s+/g, ' ').trim();
+
+  // Expand common abbreviations for better pronunciation
+  const abbreviations = {
+    'Dr.': 'Doctor',
+    'Mr.': 'Mister',
+    'Mrs.': 'Misses',
+    'etc.': 'et cetera'
+  };
+
+  for (const [abbr, full] of Object.entries(abbreviations)) {
+    text = text.replace(new RegExp(abbr, 'g'), full);
+  }
+
+  return text;
+}
+```
+
+### 2. Error Handling
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+
+async function safeTTS(text, outputPath) {
+  try {
+    // Validate input
+    if (!text || text.trim().length === 0) {
+      throw new Error('Text input cannot be empty');
+    }
+
+    if (text.length > 1024) {
+      throw new Error('Text input exceeds maximum length of 1024 characters');
+    }
+
+    const zai = await ZAI.create();
+
+    const response = await zai.audio.tts.create({
+      input: text,
+      voice: 'tongtong',
+      speed: 1.0,
+      response_format: 'wav',
+      stream: false
+    });
+
+    // Get array buffer from Response object
+    const arrayBuffer = await response.arrayBuffer();
+    const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+    fs.writeFileSync(outputPath, buffer);
+
+    return {
+      success: true,
+      path: outputPath,
+      size: buffer.length
+    };
+  } catch (error) {
+    console.error('TTS Error:', error);
+    return {
+      success: false,
+      error: error.message
+    };
+  }
+}
+```
+
+### 3. SDK Instance Reuse
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+// Create a singleton instance
+let zaiInstance = null;
+
+async function getZAIInstance() {
+  if (!zaiInstance) {
+    zaiInstance = await ZAI.create();
+  }
+  return zaiInstance;
+}
+
+// Usage
+const zai = await getZAIInstance();
+const response = await zai.audio.tts.create({ ... });
+```
+
+## Common Use Cases
+
+1. **Audiobooks & Podcasts**: Convert written content to audio format
+2. **E-learning**: Create narration for educational content
+3. **Accessibility**: Provide audio versions of text content
+4. **Voice Assistants**: Generate dynamic responses
+5. **Announcements**: Create automated audio notifications
+6. **IVR Systems**: Generate phone system prompts
+7. **Content Localization**: Create audio in different languages
+
+## Integration Examples
+
+### Express.js API Endpoint
+
+```javascript
+import express from 'express';
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+import path from 'path';
+
+const app = express();
+app.use(express.json());
+
+let zaiInstance;
+const outputDir = './audio-output';
+
+async function initZAI() {
+  zaiInstance = await ZAI.create();
+  if (!fs.existsSync(outputDir)) {
+    fs.mkdirSync(outputDir, { recursive: true });
+  }
+}
+
+app.post('/api/tts', async (req, res) => {
+  try {
+    const { text, voice = 'tongtong', speed = 1.0 } = req.body;
+
+    if (!text) {
+      return res.status(400).json({ error: 'Text is required' });
+    }
+
+    const filename = `tts_${Date.now()}.wav`;
+    const outputPath = path.join(outputDir, filename);
+
+    const response = await zaiInstance.audio.tts.create({
+      input: text,
+      voice: voice,
+      speed: speed,
+      response_format: 'wav',
+      stream: false
+    });
+
+    // Get array buffer from Response object
+    const arrayBuffer = await response.arrayBuffer();
+    const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+    fs.writeFileSync(outputPath, buffer);
+
+    res.json({
+      success: true,
+      audioUrl: `/audio/${filename}`,
+      size: buffer.length
+    });
+  } catch (error) {
+    res.status(500).json({ error: error.message });
+  }
+});
+
+app.use('/audio', express.static('audio-output'));
+
+initZAI().then(() => {
+  app.listen(3000, () => {
+    console.log('TTS API running on port 3000');
+  });
+});
+```
+
+## Troubleshooting
+
+**Issue**: "Input text exceeds maximum length"
+- **Solution**: Text input is limited to 1024 characters. Split longer text into chunks using the `splitTextIntoChunks` function shown in the API Limitations section
+
+**Issue**: "Invalid speed parameter" or unexpected speed behavior
+- **Solution**: Speed must be between 0.5 and 2.0. Check your speed value is within this range
+
+**Issue**: "Invalid volume parameter"
+- **Solution**: Volume must be greater than 0 and up to 10. Ensure volume value is in range (0, 10]
+
+**Issue**: "Stream format not supported" with WAV/MP3
+- **Solution**: Streaming mode only supports PCM format. Either use `response_format: 'pcm'` with streaming, or disable streaming (`stream: false`) for WAV/MP3 output
+
+**Issue**: "SDK must be used in backend"
+- **Solution**: Ensure z-ai-web-dev-sdk is only imported in server-side code
+
+**Issue**: "TypeError: response.audio is undefined"
+- **Solution**: The SDK returns a standard Response object, use `await response.arrayBuffer()` instead of accessing `response.audio`
+
+**Issue**: Generated audio file is empty or corrupted
+- **Solution**: Ensure you're calling `await response.arrayBuffer()` and properly converting to Buffer: `Buffer.from(new Uint8Array(arrayBuffer))`
+
+**Issue**: Audio sounds unnatural
+- **Solution**: Prepare text properly (remove special characters, expand abbreviations)
+
+**Issue**: Long processing times
+- **Solution**: Break long text into smaller chunks and process in parallel
+
+**Issue**: Next.js caching old API route
+- **Solution**: Create a new API route endpoint or restart the dev server
+
+## Performance Tips
+
+1. **Reuse SDK Instance**: Create ZAI instance once and reuse
+2. **Implement Caching**: Cache generated audio for repeated text
+3. **Batch Processing**: Process multiple texts efficiently
+4. **Optimize Text**: Remove unnecessary content before generation
+5. **Async Processing**: Use queues for handling multiple requests
+
+## Important Notes
+
+### API Constraints
+
+**Input Text Length**: Maximum 1024 characters per request. For longer text:
+```javascript
+// Split long text into chunks
+const longText = "..."; // Your long text here
+const chunks = splitTextIntoChunks(longText, 1000);
+
+for (const chunk of chunks) {
+  const response = await zai.audio.tts.create({
+    input: chunk,
+    voice: 'tongtong',
+    speed: 1.0,
+    response_format: 'wav',
+    stream: false
+  });
+  // Process each chunk...
+}
+```
+
+**Streaming Format Limitation**: When using `stream: true`, only `pcm` format is supported. For `wav` or `mp3` output, use `stream: false`.
+
+**Sample Rate**: Audio is generated at 24000 Hz sample rate (recommended setting for playback).
+
+### Response Object Format
+
+The `zai.audio.tts.create()` method returns a standard **Response** object (not a custom object with an `audio` property). Always use:
+
+```javascript
+// ✅ CORRECT
+const response = await zai.audio.tts.create({ ... });
+const arrayBuffer = await response.arrayBuffer();
+const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+
+// ❌ WRONG - This will not work
+const response = await zai.audio.tts.create({ ... });
+const buffer = Buffer.from(response.audio); // response.audio is undefined
+```
+
+### Available Voices
+
+- `tongtong` - 温暖亲切
+- `chuichui` - 活泼可爱
+- `xiaochen` - 沉稳专业
+- `jam` - 英音绅士
+- `kazi` - 清晰标准
+- `douji` - 自然流畅
+- `luodo` - 富有感染力
+
+### Speed Range
+
+- Minimum: `0.5` (half speed)
+- Default: `1.0` (normal speed)
+- Maximum: `2.0` (double speed)
+
+**Important**: Speed values outside the range [0.5, 2.0] will result in API errors.
+
+### Volume Range
+
+- Minimum: Greater than `0` (exclusive)
+- Default: `1.0` (normal volume)
+- Maximum: `10` (inclusive)
+
+**Note**: Volume parameter is optional. When not specified, defaults to 1.0.
+
+## Remember
+
+- Always use z-ai-web-dev-sdk in backend code only
+- **Input text is limited to 1024 characters maximum** - split longer text into chunks
+- **Speed must be between 0.5 and 2.0** - values outside this range will cause errors
+- **Volume must be greater than 0 and up to 10** - optional parameter with default 1.0
+- **Streaming only supports PCM format** - use non-streaming for WAV or MP3 output
+- The SDK returns a standard Response object - use `await response.arrayBuffer()`
+- Convert ArrayBuffer to Buffer using `Buffer.from(new Uint8Array(arrayBuffer))`
+- Handle audio buffers properly when saving to files
+- Implement error handling for production applications
+- Consider caching for frequently generated content
+- Clean up old audio files periodically to manage storage