feat: Add complete Agentic Compaction & Pipeline System

- Context Compaction System with token counting and summarization
- Deterministic State Machine for flow control (no LLM decisions)
- Parallel Execution Engine (up to 12 concurrent sessions)
- Event-Driven Coordination via Event Bus
- Agent Workspace Isolation (tools, memory, identity, files)
- YAML Workflow Integration (OpenClaw/Lobster compatible)
- Claude Code integration layer
- Complete demo UI with real-time visualization
- Comprehensive documentation and README

Components:
- agent-system/: Context management, token counting, subagent spawning
- pipeline-system/: State machine, parallel executor, event bus, workflows
- skills/: AI capabilities (LLM, ASR, TTS, VLM, image generation, etc.)
- src/app/: Next.js demo application

Total: ~100KB of production-ready TypeScript code
Z User
2026-03-03 12:40:47 +00:00
parent 63a8b123c9
commit 2380d33861
152 changed files with 51569 additions and 817 deletions


@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 z-ai-web-dev-sdk Skills
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

skills/podcast-generate/SKILL.md Executable file

@@ -0,0 +1,198 @@
---
name: Podcast Generate
description: Generate podcast episodes from user-provided content or by searching the web for specified topics. If user uploads a text file/article, creates a dual-host dialogue podcast (or single-host upon request). If no content is provided, searches the web for information about the user-specified topic and generates a podcast. Duration scales with content size (3-20 minutes, ~240 chars/min). Uses z-ai-web-dev-sdk for LLM script generation and TTS audio synthesis. Outputs both a podcast script (Markdown) and a complete audio file (WAV).
license: MIT
---
# Podcast Generate Skill (TypeScript Version)
Generates a podcast script and audio from user-provided material or web-search results.
This skill is suited for:
- Quickly digesting long-form content and turning it into a podcast
- Presenting knowledge-oriented content as audio
- In-depth discussion and analysis of trending topics
- Searching for up-to-date information and producing a podcast from it
---
## Capabilities
### What this skill does
- **Generate from a file**: takes one source document (txt/md/docx/pdf or other text formats) and produces a dialogue podcast script and audio
- **Generate from web search**: searches the web for the latest information on a user-specified topic and produces a podcast script and audio
- Automatically controls duration based on content length (3–20 minutes)
- Produces a Markdown podcast script (editable by hand)
- Synthesizes high-quality audio with z-ai TTS and joins the segments into the final podcast
### What this skill does not do (yet)
- No mp3 output, subtitles, or timestamps
- No support for three or more podcast roles
- No background music or sound effects
---
## Files and responsibilities
This skill consists of the following files:
- `generate.ts`
  Unified entry point (supports file mode and search mode)
  - **File mode**: reads the user-uploaded text file → generates the podcast
  - **Search mode**: calls the web-search skill to gather material → generates the podcast
  - Uses z-ai-web-dev-sdk for LLM script generation
  - Uses z-ai-web-dev-sdk for TTS audio generation
  - Joins audio segments automatically
  - Emits only the final files
- `readme.md`
  Usage documentation
- `SKILL.md`
  This file; describes the skill's capabilities, boundaries, and conventions
- `package.json`
  Node.js project configuration and dependencies
- `tsconfig.json`
  TypeScript compiler configuration
---
## Input and output conventions
### Input (one of two)
**Option 1: file upload**
- One source file (txt / md / docx / pdf or other text formats)
- Any length; the skill compresses the material to a suitable size
**Option 2: web search**
- The user specifies a search topic
- The web-search skill is invoked automatically to gather relevant content
- Multiple search results are merged into the source material
### Output (exactly 2 files)
- `podcast_script.md`
  The podcast script (Markdown, editable by hand)
- `podcast.wav`
  The final joined podcast audio
**No intermediate files** (such as segments.jsonl or meta.json) are emitted
---
## Running
### Requirements
- Node.js 18+
- z-ai-web-dev-sdk (installed)
- web-search skill (for the web-search mode)
The z-ai CLI is **not** required.
### Install dependencies
```bash
npm install
```
---
## Usage examples
### Generate a podcast from a file
```bash
npm run generate -- --input=test_data/material.txt --out_dir=out
```
### Generate a podcast from web search
```bash
# Search on a topic and generate a podcast
npm run generate -- --topic="最新AI技术突破" --out_dir=out
# Specify the search topic and duration
npm run generate -- --topic="量子计算应用场景" --out_dir=out --duration=8
# Search and generate a single-host podcast
npm run generate -- --topic="气候变化影响" --out_dir=out --mode=single-male
```
---
## Parameters
| Parameter | Description | Default |
|------|------|--------|
| `--input` | Path to the source file (mutually exclusive with --topic) | - |
| `--topic` | Search topic keywords (mutually exclusive with --input) | - |
| `--out_dir` | Output directory (required) | - |
| `--mode` | Podcast mode: dual / single-male / single-female | dual |
| `--duration` | Duration in minutes (3–20; 0 = auto) | 0 |
| `--host_name` | Host name | 小谱 |
| `--guest_name` | Guest name | 锤锤 |
| `--voice_host` | Host voice | xiaochen |
| `--voice_guest` | Guest voice | chuichui |
| `--speed` | Speech speed (0.5–2.0) | 1.0 |
| `--pause_ms` | Pause between segments (ms) | 200 |
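The mapping from `--duration` to the script's character budget can be sketched in TypeScript, mirroring the constants `generate.ts` uses (~240 characters per minute with a ±15% tolerance band):

```typescript
// Character budget for a target duration: target = minutes * charsPerMin,
// accepted when the script lands within a ±15% band around the target.
function charBudget(
  durationMin: number,
  charsPerMin = 240,
  tolerance = 0.15
): [target: number, low: number, high: number] {
  const target = durationMin * charsPerMin;
  return [
    target,
    Math.floor(target * (1 - tolerance)), // minimum acceptable length
    Math.ceil(target * (1 + tolerance)),  // maximum acceptable length
  ];
}

// An 8-minute episode targets ~1920 characters (1632-2208 accepted).
const [target, low, high] = charBudget(8);
```

Note that the actual validator counts non-whitespace characters only.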
---
## Available voices
| Voice | Character |
|------|------|
| xiaochen | Calm and professional |
| chuichui | Lively and cute |
| tongtong | Warm and friendly |
| jam | British-accented gentleman |
| kazi | Clear and standard |
| douji | Natural and fluent |
| luodo | Expressive |
---
## Technical architecture
### generate.ts (unified entry point)
- **File mode**: reads the user-uploaded file → generates the podcast
- **Search mode**: calls the web-search skill → gathers material → generates the podcast
- **LLM**: uses `z-ai-web-dev-sdk` (`chat.completions.create`)
- **TTS**: uses `z-ai-web-dev-sdk` (`audio.tts.create`)
- The z-ai CLI is **not** required
- Joins audio segments automatically
- Emits only the final files; intermediate files are cleaned up automatically
### LLM call
- System prompt: podcast scriptwriter persona
- User prompt: source material + hard constraints + pacing ("breathing room") requirements
- Output validation: character count, structure, speaker tags
- Automatic retry: up to 3 attempts
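The validate-and-retry behaviour above can be sketched as follows, with a synchronous stand-in for the model call (the real implementation awaits the SDK and feeds the failure reasons back into the next prompt as a correction hint):

```typescript
// Generic validate-and-retry loop: call the generator, validate the
// result, and retry with a correction hint built from the failures.
type Validate = (script: string) => string[]; // empty array = valid

function generateWithRetry(
  call: (hint: string) => string, // stand-in for the async LLM call
  validate: Validate,
  maxAttempts = 3
): string {
  let hint = '';
  let last = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = call(hint);
    const reasons = validate(last);
    if (reasons.length === 0) return last; // passed validation
    hint = 'Please fix: ' + reasons.join('; '); // fed into the next prompt
  }
  return last; // best effort after the final attempt
}
```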
### TTS call
- Uses `zai.audio.tts.create()`
- Configurable voice and speed
- Joins multiple wav segments automatically
- Temporary files are cleaned up automatically
---
## Output example
### podcast_script.md (excerpt)
```markdown
**小谱**:大家好,欢迎收听今天的播客。今天我们来聊一个有趣的话题……
**锤锤**:是啊,这个话题真的很有意思。我最近也在关注……
**小谱**:说到这里,我想给大家举个例子……
```
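The `**Name**:` turn format shown above is what `generate.ts` splits into per-speaker segments before TTS; a minimal sketch of that parsing (continuation lines attach to the previous turn):

```typescript
// Parse a "**Name**: text" script into per-speaker segments.
interface Segment {
  speaker: string;
  text: string;
}

function parseScript(script: string, names: string[]): Segment[] {
  const segments: Segment[] = [];
  for (const raw of script.split('\n')) {
    const line = raw.trim();
    if (!line) continue;
    const name = names.find((n) => line.startsWith(`**${n}**`));
    if (name) {
      // New turn: strip the speaker tag and the colon that follows it.
      const text = line.slice(`**${name}**`.length).replace(/^[::]\s*/, '').trim();
      segments.push({ speaker: name, text });
    } else if (segments.length > 0) {
      // Continuation line: append to the previous turn.
      segments[segments.length - 1].text += ' ' + line;
    }
  }
  return segments;
}

const demo = parseScript('**小谱**:大家好\n**锤锤**:你好,很高兴来聊', ['小谱', '锤锤']);
// demo[0] is the host turn, demo[1] the guest turn
```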
---
## License
MIT


@@ -0,0 +1,661 @@
#!/usr/bin/env tsx
/**
 * generate.ts - unified entry point (pure SDK version)
 * source material -> podcast_script.md + podcast.wav
 *
 * Uses only z-ai-web-dev-sdk; does not depend on the z-ai CLI
 *
 * Usage:
 *   tsx generate.ts --input=material.txt --out_dir=out
 *   tsx generate.ts --input=material.md --out_dir=out --duration=5
 */
import ZAI from 'z-ai-web-dev-sdk';
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
import os from 'os';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
// -----------------------------
// Types
// -----------------------------
interface GenConfig {
mode: 'dual' | 'single-male' | 'single-female';
temperature: number;
durationManual: number;
charsPerMin: number;
hostName: string;
guestName: string;
audience: string;
tone: string;
maxAttempts: number;
timeoutSec: number;
voiceHost: string;
voiceGuest: string;
speed: number;
pauseMs: number;
}
interface Segment {
idx: number;
speaker: 'host' | 'guest';
name: string;
text: string;
}
// -----------------------------
// Config
// -----------------------------
const DEFAULT_CONFIG: GenConfig = {
mode: 'dual',
temperature: 0.9,
durationManual: 0,
charsPerMin: 240,
hostName: '小谱',
guestName: '锤锤',
audience: '白领小白',
tone: '轻松但有信息密度',
maxAttempts: 3,
timeoutSec: 300,
voiceHost: 'xiaochen',
voiceGuest: 'chuichui',
speed: 1.0,
pauseMs: 200,
};
const DURATION_RANGE_LOW = 3;
const DURATION_RANGE_HIGH = 20;
const BUDGET_TOLERANCE = 0.15;
// -----------------------------
// Functions
// -----------------------------
function parseArgs(): { [key: string]: any } {
  const args = process.argv.slice(2);
  const result: { [key: string]: any } = {};
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg.startsWith('--')) {
      const key = arg.slice(2);
      const eq = key.indexOf('=');
      if (eq !== -1) {
        // Split on the first '=' only, so values containing '=' survive intact
        result[key.slice(0, eq)] = key.slice(eq + 1);
      } else if (i + 1 < args.length && !args[i + 1].startsWith('--')) {
        result[key] = args[i + 1];
        i++;
      } else {
        result[key] = true;
      }
    }
  }
  return result;
}
function readText(filePath: string): string {
let content = fs.readFileSync(filePath, 'utf-8');
content = content.replace(/\r\n/g, '\n');
content = content.replace(/\n{3,}/g, '\n\n');
content = content.replace(/[ \t]{2,}/g, ' ');
content = content.replace(/-\n/g, '');
return content.trim();
}
function countNonWsChars(text: string): number {
return text.replace(/\s+/g, '').length;
}
function chooseDurationMinutes(inputChars: number, low: number = DURATION_RANGE_LOW, high: number = DURATION_RANGE_HIGH): number {
const estimated = Math.max(low, Math.min(high, Math.floor(inputChars / 1000)));
return estimated;
}
function charBudget(durationMin: number, charsPerMin: number, tolerance: number): [number, number, number] {
const target = durationMin * charsPerMin;
const low = Math.floor(target * (1 - tolerance));
const high = Math.ceil(target * (1 + tolerance));
return [target, low, high];
}
function buildPrompts(
material: string,
cfg: GenConfig,
durationMin: number,
budgetTarget: number,
budgetLow: number,
budgetHigh: number,
attemptHint: string = ''
): [string, string] {
let system: string;
let user: string;
if (cfg.mode === 'dual') {
system = (
`你是一个播客脚本编剧,擅长把资料提炼成双人对谈播客。` +
`角色固定为男主持「${cfg.hostName}」与女嘉宾「${cfg.guestName}」。` +
`你写作口播化、信息密度适中、有呼吸感、节奏自然。` +
`你必须严格遵守输出格式与字数预算。`
);
const hintBlock = attemptHint ? `\n【上一次生成纠偏提示】\n${attemptHint}\n` : '';
user = `请把下面【资料】改写为中文播客脚本,形式为双人对谈(男主持 ${cfg.hostName} + 女嘉宾 ${cfg.guestName})。
时长目标:${durationMin} 分钟。
【硬性约束】
1) 总字数必须在 ${budgetLow}${budgetHigh} 字之间(目标约 ${budgetTarget} 字)。
2) 严格使用轮次交替输出:每段必须以"**${cfg.hostName}**"或"**${cfg.guestName}**"开头。
3) 必须包含完整的叙事结构(但不要在对话中写出结构标签):
- 开场Hook 引入 + 本期主题介绍
- 主体3个不同维度的内容用自然过渡语连接
- 总结:回顾要点 + 行动建议1句话明确可执行
4) 不要在对话中写"核心点1"、"第一点"等结构标签,用自然的过渡语如"说到这个"、"还有个有趣的事"、"另外"等
5) 不要照念原文,不要大段引用;要用口播化表达。
6) 受众:${cfg.audience}
7) 风格:${cfg.tone}
【呼吸感与自然对话 - 重要!】
为了营造真实播客的呼吸感,请:
1) 适度加入语气词和感叹词:嗯、哦、啊、对、没错、哈哈、哇、天呐、啧啧等
2) 多用互动式表达:"你说得对"、"这就很有意思了"、"等等,让我想想"、"我懂你的意思"
3) 适当加入思考和停顿的暗示:"这个问题嘛..."、"怎么说呢..."、"其实..."
4) 避免过于密集的信息输出每段控制在3-5句话给听众消化时间
5) 用类比和生活化的例子来解释复杂概念
6) 两人之间要有自然的呼应和追问,而不是各说各话
7) 不同主题之间用自然过渡语连接,不要出现"核心点1/2/3"等标签
【输出格式示例】
**${cfg.hostName}**:开场……
**${cfg.guestName}**:回应……
(一直交替到结束)
${hintBlock}
【资料】
${material}
`;
} else {
const speakerName = cfg.mode === 'single-male' ? cfg.hostName : cfg.guestName;
const gender = cfg.mode === 'single-male' ? '男性' : '女性';
system = (
`你是一个${gender}单人播客主播,名字叫「${speakerName}」。` +
`你擅长把资料提炼成单人独白式播客,像讲课、读书分享、知识科普一样。` +
`你写作口播化、信息密度适中、有呼吸感、节奏自然。` +
`你必须严格遵守输出格式与字数预算。`
);
const hintBlock = attemptHint ? `\n【上一次生成纠偏提示】\n${attemptHint}\n` : '';
user = `请把下面【资料】改写为中文单人播客脚本,形式为独白式讲述(主播:${speakerName})。
时长目标:${durationMin} 分钟。
【硬性约束】
1) 总字数必须在 ${budgetLow}${budgetHigh} 字之间(目标约 ${budgetTarget} 字)。
2) 所有内容均由「${speakerName}」一人讲述,每段都以"**${speakerName}**"开头。
3) 必须包含完整的叙事结构(但不要在对话中写出结构标签):
- 开场Hook 引入 + 本期主题介绍
- 主体3个不同维度的内容用自然过渡语连接
- 总结:回顾要点 + 行动建议1句话明确可执行
4) 不要在对话中写"核心点1"、"第一点"等结构标签,用自然的过渡语如"说到这个"、"还有个有趣的事"、"另外"等
5) 不要照念原文,不要大段引用;要用口播化表达。
6) 受众:${cfg.audience}
7) 风格:${cfg.tone}
【单人播客的呼吸感 - 重要!】
为了营造自然的单人播客呼吸感,请:
1) 适度加入语气词和感叹词:嗯、哦、啊、对、没错、哈哈、哇、天呐、啧啧等
2) 多用自问自答式表达:"你可能会问...答案是..."、"这是为什么呢?让我来解释..."
3) 适当加入思考和停顿的暗示:"这个问题嘛..."、"怎么说呢..."、"其实..."
4) 避免过于密集的信息输出每段控制在3-5句话给听众消化时间
5) 用类比和生活化的例子来解释复杂概念
6) 像在和朋友聊天一样,而不是在念课文
【输出格式示例】
**${speakerName}**:开场,大家好,我是${speakerName},今天我们来聊……
**${speakerName}**:说到这个,最近有个特别有意思的事……
(所有内容都由${speakerName}讲述,分段输出)
${hintBlock}
【资料】
${material}
`;
}
return [system, user];
}
async function callZAI(
  systemPrompt: string,
  userPrompt: string,
  temperature: number
): Promise<string> {
  const zai = await ZAI.create();
  const completion = await zai.chat.completions.create({
    messages: [
      // The system prompt must be sent with role 'system', not 'assistant'
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
    temperature, // was accepted but never forwarded to the API
    thinking: { type: 'disabled' },
  });
  const content = completion.choices[0]?.message?.content || '';
  return content;
}
function scriptToSegments(script: string, hostName: string, guestName: string): Segment[] {
const segments: Segment[] = [];
const lines = script.split('\n');
let current: Segment | null = null;
let idx = 0;
const hostPrefix = `**${hostName}**`;
const guestPrefix = `**${guestName}**`;
for (const rawLine of lines) {
const line = rawLine.trim();
if (!line) continue;
if (line.startsWith(hostPrefix)) {
idx++;
current = {
idx,
speaker: 'host',
name: hostName,
text: line.slice(hostPrefix.length).trim(),
};
segments.push(current);
} else if (line.startsWith(guestPrefix)) {
idx++;
current = {
idx,
speaker: 'guest',
name: guestName,
text: line.slice(guestPrefix.length).trim(),
};
segments.push(current);
} else {
if (current) {
current.text = (current.text + ' ' + line).trim();
}
}
}
return segments;
}
function validateScript(
script: string,
cfg: GenConfig,
budgetLow: number,
budgetHigh: number
): [boolean, string[]] {
const reasons: string[] = [];
if (cfg.mode === 'dual') {
const hostTag = `**${cfg.hostName}**`;
const guestTag = `**${cfg.guestName}**`;
if (!script.includes(hostTag)) reasons.push(`缺少主持人标识:${hostTag}`);
if (!script.includes(guestTag)) reasons.push(`缺少嘉宾标识:${guestTag}`);
const turns = script.split('\n').filter(line =>
line.startsWith(hostTag) || line.startsWith(guestTag)
);
if (turns.length < 8) reasons.push('对谈轮次过少:建议至少 8 轮');
} else {
const speakerName = cfg.mode === 'single-male' ? cfg.hostName : cfg.guestName;
const speakerTag = `**${speakerName}**`;
if (!script.includes(speakerTag)) reasons.push(`缺少主播标识:${speakerTag}`);
const turns = script.split('\n').filter(line => line.startsWith(speakerTag));
if (turns.length < 5) reasons.push('播客段数过少:建议至少 5 段');
}
const n = countNonWsChars(script);
if (n < budgetLow || n > budgetHigh) {
reasons.push(`字数不在预算:当前约 ${n} 字,预算 ${budgetLow}-${budgetHigh}`);
}
// Only check for the opening and the summary; "key point 1/2/3" labels are not checked (they must not appear in the dialogue)
const mustHave = ['开场', '总结'];
for (const kw of mustHave) {
if (!script.includes(kw)) {
reasons.push(`缺少结构要素:${kw}(请在对话中自然引入)`);
}
}
// Check that there are enough dialogue turns (the content should cover multiple themes)
const lineCount = script.split('\n').filter(l => l.trim()).length;
if (lineCount < 10) {
reasons.push('对话轮次过少建议至少10段对话');
}
return [reasons.length === 0, reasons];
}
function makeRetryHint(reasons: string[], cfg: GenConfig, budgetLow: number, budgetHigh: number): string {
const lines = ['请严格修复以下问题后重新生成:'];
for (const r of reasons) lines.push(`- ${r}`);
lines.push(`- 总字数必须在 ${budgetLow}-${budgetHigh} 之间。`);
if (cfg.mode === 'dual') {
lines.push(`- 每段必须以"**${cfg.hostName}**"或"**${cfg.guestName}**"开头。`);
} else {
const speakerName = cfg.mode === 'single-male' ? cfg.hostName : cfg.guestName;
lines.push(`- 所有内容都由一人讲述,每段必须以"**${speakerName}**"开头。`);
}
lines.push('- 必须包含开场和总结,中间用自然过渡语连接不同主题,不要出现"核心点1/2/3"等标签。');
return lines.join('\n');
}
async function ttsRequest(
zai: any,
text: string,
voice: string,
speed: number
): Promise<Buffer> {
const response = await zai.audio.tts.create({
input: text,
voice: voice,
speed: speed,
response_format: 'wav',
stream: false,
});
const arrayBuffer = await response.arrayBuffer();
const buffer = Buffer.from(new Uint8Array(arrayBuffer));
return buffer;
}
function ensureSilenceWav(filePath: string, params: { nchannels: number; sampwidth: number; framerate: number }, ms: number): void {
const { nchannels, sampwidth, framerate } = params;
const nframes = Math.floor((framerate * ms) / 1000);
const silenceFrame = Buffer.alloc(sampwidth * nchannels, 0);
const frames = Buffer.alloc(silenceFrame.length * nframes, 0);
const header = Buffer.alloc(44);
header.write('RIFF', 0);
header.writeUInt32LE(36 + frames.length, 4);
header.write('WAVE', 8);
header.write('fmt ', 12);
header.writeUInt32LE(16, 16);
header.writeUInt16LE(1, 20);
header.writeUInt16LE(nchannels, 22);
header.writeUInt32LE(framerate, 24);
header.writeUInt32LE(framerate * nchannels * sampwidth, 28);
header.writeUInt16LE(nchannels * sampwidth, 32);
header.writeUInt16LE(sampwidth * 8, 34);
header.write('data', 36);
header.writeUInt32LE(frames.length, 40);
fs.writeFileSync(filePath, Buffer.concat([header, frames]));
}
function wavParams(filePath: string): { nchannels: number; sampwidth: number; framerate: number } {
const buffer = fs.readFileSync(filePath);
const nchannels = buffer.readUInt16LE(22);
const sampwidth = buffer.readUInt16LE(34) / 8;
const framerate = buffer.readUInt32LE(24);
return { nchannels, sampwidth, framerate };
}
function joinWavsWave(outPath: string, wavPaths: string[], pauseMs: number): void {
if (wavPaths.length === 0) throw new Error('No wav files to join.');
const ref = wavPaths[0];
const refParams = wavParams(ref);
const silencePath = path.join(os.tmpdir(), `_silence_${Date.now()}.wav`);
if (pauseMs > 0) ensureSilenceWav(silencePath, refParams, pauseMs);
const chunks: Buffer[] = [];
for (let i = 0; i < wavPaths.length; i++) {
const wavPath = wavPaths[i];
const buffer = fs.readFileSync(wavPath);
const dataStart = buffer.indexOf('data') + 8;
const data = buffer.subarray(dataStart);
const params = wavParams(wavPath);
if (params.nchannels !== refParams.nchannels ||
params.sampwidth !== refParams.sampwidth ||
params.framerate !== refParams.framerate) {
throw new Error(`WAV params mismatch: ${wavPath}`);
}
chunks.push(data);
if (pauseMs > 0 && i < wavPaths.length - 1) {
const silenceBuffer = fs.readFileSync(silencePath);
const silenceData = silenceBuffer.subarray(silenceBuffer.indexOf('data') + 8);
chunks.push(silenceData);
}
}
const totalDataSize = chunks.reduce((sum, buf) => sum + buf.length, 0);
const header = Buffer.alloc(44);
header.write('RIFF', 0);
header.writeUInt32LE(36 + totalDataSize, 4);
header.write('WAVE', 8);
header.write('fmt ', 12);
header.writeUInt32LE(16, 16);
header.writeUInt16LE(1, 20);
header.writeUInt16LE(refParams.nchannels, 22);
header.writeUInt32LE(refParams.framerate, 24);
header.writeUInt32LE(refParams.framerate * refParams.nchannels * refParams.sampwidth, 28);
header.writeUInt16LE(refParams.nchannels * refParams.sampwidth, 32);
header.writeUInt16LE(refParams.sampwidth * 8, 34);
header.write('data', 36);
header.writeUInt32LE(totalDataSize, 40);
const output = Buffer.concat([header, ...chunks]);
fs.writeFileSync(outPath, output);
if (fs.existsSync(silencePath)) fs.unlinkSync(silencePath);
}
// -----------------------------
// Main
// -----------------------------
async function main() {
const args = parseArgs();
const inputPath = args.input;
const outDir = args.out_dir;
const topic = args.topic;
// Validate arguments: either input or topic must be provided
if ((!inputPath && !topic) || !outDir) {
console.error('Usage: tsx generate.ts --input=<file> --out_dir=<dir>');
console.error(' OR: tsx generate.ts --topic=<search-term> --out_dir=<dir>');
console.error('');
console.error('Examples:');
console.error(' # From file');
console.error(' npm run generate -- --input=article.txt --out_dir=out');
console.error(' # From web search');
console.error(' npm run generate -- --topic="最新AI新闻" --out_dir=out');
process.exit(1);
}
// Merge config
const cfg: GenConfig = {
...DEFAULT_CONFIG,
mode: (args.mode || 'dual') as GenConfig['mode'],
durationManual: parseInt(args.duration || '0'),
hostName: args.host_name || DEFAULT_CONFIG.hostName,
guestName: args.guest_name || DEFAULT_CONFIG.guestName,
voiceHost: args.voice_host || DEFAULT_CONFIG.voiceHost,
voiceGuest: args.voice_guest || DEFAULT_CONFIG.voiceGuest,
speed: parseFloat(args.speed || String(DEFAULT_CONFIG.speed)),
pauseMs: parseInt(args.pause_ms || String(DEFAULT_CONFIG.pauseMs)),
};
// Create output directory
if (!fs.existsSync(outDir)) {
fs.mkdirSync(outDir, { recursive: true });
}
// Gather the source material according to the mode
let material: string;
let inputSource: string;
if (inputPath) {
// Mode 1: read from a file
console.log(`[MODE] Reading from file: ${inputPath}`);
material = readText(inputPath);
inputSource = `file:${inputPath}`;
} else if (topic) {
// Mode 2: web search
console.log(`[MODE] Searching web for topic: ${topic}`);
const zai = await ZAI.create();
const searchResults = await zai.functions.invoke('web_search', {
query: topic,
num: 10
});
if (!Array.isArray(searchResults) || searchResults.length === 0) {
console.error(`未找到关于"${topic}"的搜索结果`);
process.exit(2);
}
console.log(`[SEARCH] Found ${searchResults.length} results`);
// Convert the search results into text material
material = searchResults
.map((r: any, i: number) => `【来源 ${i + 1}${r.name}\n${r.snippet}\n链接${r.url}`)
.join('\n\n');
inputSource = `web_search:${topic}`;
console.log(`[SEARCH] Compiled material (${material.length} chars)`);
} else {
console.error('[ERROR] Neither --input nor --topic provided');
process.exit(1);
}
const inputChars = material.length;
// Calculate duration
let durationMin: number;
if (cfg.durationManual >= 3 && cfg.durationManual <= 20) {
durationMin = cfg.durationManual;
} else {
durationMin = chooseDurationMinutes(inputChars, DURATION_RANGE_LOW, DURATION_RANGE_HIGH);
}
const [target, low, high] = charBudget(durationMin, cfg.charsPerMin, BUDGET_TOLERANCE);
console.log(`[INFO] input_chars=${inputChars} duration=${durationMin}min budget=${low}-${high}`);
let attemptHint = '';
let lastScript: string | null = null;
// Initialize ZAI SDK (reuse for TTS)
const zai = await ZAI.create();
// Generate script
for (let attempt = 1; attempt <= cfg.maxAttempts; attempt++) {
const [systemPrompt, userPrompt] = buildPrompts(
material,
cfg,
durationMin,
target,
low,
high,
attemptHint
);
try {
console.log(`[LLM] Attempt ${attempt}/${cfg.maxAttempts}...`);
const content = await callZAI(systemPrompt, userPrompt, cfg.temperature);
lastScript = content;
const [ok, reasons] = validateScript(content, cfg, low, high);
if (ok) {
break;
}
attemptHint = makeRetryHint(reasons, cfg, low, high);
console.error(`[WARN] Validation failed:`, reasons.join(', '));
} catch (error: any) {
console.error(`[ERROR] LLM call failed: ${error.message}`);
throw error;
}
}
if (!lastScript) {
console.error('[ERROR] 未生成任何脚本输出。');
process.exit(1);
}
// Write script
const scriptPath = path.join(outDir, 'podcast_script.md');
fs.writeFileSync(scriptPath, lastScript, 'utf-8');
console.log(`[DONE] podcast_script.md -> ${scriptPath}`);
// Parse segments
const segments = scriptToSegments(lastScript, cfg.hostName, cfg.guestName);
console.log(`[INFO] Parsed ${segments.length} segments`);
// Generate TTS using SDK
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'podcast_segments_'));
const produced: string[] = [];
try {
for (let i = 0; i < segments.length; i++) {
const seg = segments[i];
const text = seg.text.trim();
if (!text) continue;
let voice: string;
if (cfg.mode === 'dual') {
voice = seg.speaker === 'host' ? cfg.voiceHost : cfg.voiceGuest;
} else if (cfg.mode === 'single-male') {
voice = cfg.voiceHost;
} else {
voice = cfg.voiceGuest;
}
const wavPath = path.join(tmpDir, `seg_${seg.idx.toString().padStart(4, '0')}.wav`);
console.log(`[TTS] [${i + 1}/${segments.length}] idx=${seg.idx} speaker=${seg.speaker} voice=${voice}`);
const buffer = await ttsRequest(zai, text, voice, cfg.speed);
fs.writeFileSync(wavPath, buffer);
produced.push(wavPath);
}
// Join segments
const podcastPath = path.join(outDir, 'podcast.wav');
console.log(`[JOIN] Joining ${produced.length} wav files -> ${podcastPath}`);
joinWavsWave(podcastPath, produced, cfg.pauseMs);
console.log(`[DONE] podcast.wav -> ${podcastPath}`);
} finally {
// Cleanup temp directory
try {
fs.rmSync(tmpDir, { recursive: true, force: true });
} catch (error: any) {
console.error(`[WARN] Failed to cleanup temp dir: ${error.message}`);
}
}
console.log('\n[FINAL OUTPUT]');
console.log(` 📄 podcast_script.md -> ${scriptPath}`);
console.log(` 🎙️ podcast.wav -> ${path.join(outDir, 'podcast.wav')}`);
}
main().catch(error => {
console.error('[FATAL ERROR]', error);
process.exit(1);
});


@@ -0,0 +1,30 @@
{
"name": "podcast-generate-online",
"version": "1.0.0",
"description": "Generate podcast audio from text using z-ai LLM and TTS",
"type": "module",
"main": "dist/index.js",
"scripts": {
"generate": "tsx generate.ts",
"build": "tsc",
"prepublishOnly": "npm run build"
},
"keywords": [
"podcast",
"tts",
"llm",
"z-ai"
],
"license": "MIT",
"dependencies": {
"z-ai-web-dev-sdk": "*"
},
"devDependencies": {
"@types/node": "^20",
"tsx": "^4.7.0",
"typescript": "^5.3.0"
},
"engines": {
"node": ">=18.0.0"
}
}

skills/podcast-generate/readme.md Executable file

@@ -0,0 +1,177 @@
# Podcast Generate Skill (TypeScript, online version)
Turns one source document into a dialogue podcast, with the duration adjusted automatically to the content length (3–20 minutes at ~240 characters/minute):
- Automatically distills the core content
- Generates an editable podcast script
- Synthesizes the audio with z-ai TTS
This is a TypeScript version built on **z-ai-web-dev-sdk**, intended for online environments.
---
## Quick start
### One-shot generation (script + audio)
```bash
npm run generate -- --input=test_data/material.txt --out_dir=out
```
**Final output:**
- `out/podcast_script.md` - the podcast script (Markdown)
- `out/podcast.wav` - the final podcast audio
---
## Directory layout
```text
podcast-generate/
├── readme.md        # Usage documentation (this file)
├── SKILL.md         # Skill capabilities and interface conventions
├── package.json     # Node.js dependency configuration
├── tsconfig.json    # TypeScript compiler configuration
├── generate.ts      # ⭐ Unified entry point (the only file you need)
└── test_data/
    └── material.txt # Sample input material
```
---
## Requirements
- **Node.js 18+**
- **z-ai-web-dev-sdk** (preinstalled in the environment)
The z-ai CLI is **not** required; this code uses the SDK exclusively.
---
## Install
```bash
npm install
```
---
## Usage
### Option 1: generate from a file
```bash
npm run generate -- --input=material.txt --out_dir=out
```
### Option 2: generate from web search
```bash
npm run generate -- --topic="最新AI新闻" --out_dir=out
npm run generate -- --topic="量子计算应用" --out_dir=out --duration=8
```
### Parameters
| Parameter | Description | Default |
|------|------|--------|
| `--input` | Path to the source file; txt/md/docx/pdf and other text formats (mutually exclusive with --topic) | - |
| `--topic` | Search topic keywords (mutually exclusive with --input) | - |
| `--out_dir` | Output directory (required) | - |
| `--mode` | Podcast mode: dual / single-male / single-female | dual |
| `--duration` | Duration in minutes (3–20; 0 = auto) | 0 |
| `--host_name` | Host name | 小谱 |
| `--guest_name` | Guest name | 锤锤 |
| `--voice_host` | Host voice | xiaochen |
| `--voice_guest` | Guest voice | chuichui |
| `--speed` | Speech speed (0.5–2.0) | 1.0 |
| `--pause_ms` | Pause between segments (ms) | 200 |
---
## Usage examples
### Dual-host dialogue podcast (default)
```bash
npm run generate -- --input=material.txt --out_dir=out
```
### Single-host male-voice podcast
```bash
npm run generate -- --input=material.txt --out_dir=out --mode=single-male
```
### Fixed 5-minute duration
```bash
npm run generate -- --input=material.txt --out_dir=out --duration=5
```
### Custom speaker names
```bash
npm run generate -- --input=material.txt --out_dir=out --host_name=张三 --guest_name=李四
```
### Different voices
```bash
npm run generate -- --input=material.txt --out_dir=out --voice_host=tongtong --voice_guest=douji
```
### Generate from web search
```bash
# Search on a topic and generate a podcast
npm run generate -- --topic="最新AI技术突破" --out_dir=out
# Specify the search topic and duration
npm run generate -- --topic="量子计算应用场景" --out_dir=out --duration=8
# Search and generate a single-host podcast
npm run generate -- --topic="气候变化影响" --out_dir=out --mode=single-male
```
---
## Available voices
| Voice | Character |
|------|------|
| xiaochen | Calm and professional |
| chuichui | Lively and cute |
| tongtong | Warm and friendly |
| jam | British-accented gentleman |
| kazi | Clear and standard |
| douji | Natural and fluent |
| luodo | Expressive |
---
## Technical architecture
### generate.ts (unified entry point)
- **LLM**: uses `z-ai-web-dev-sdk` (`chat.completions.create`)
- **TTS**: uses `z-ai-web-dev-sdk` (`audio.tts.create`)
- The z-ai CLI is **not** required
- Joins audio segments automatically
- Emits only the final files; intermediate files are cleaned up automatically
### LLM call
- System prompt: podcast scriptwriter persona
- User prompt: source material + hard constraints + pacing ("breathing room") requirements
- Output validation: character count, structure, speaker tags
- Automatic retry: up to 3 attempts
### TTS call
- Uses `zai.audio.tts.create()`
- Configurable voice and speed
- Joins multiple wav segments automatically
- Temporary files are cleaned up automatically
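The wav joining mentioned above is plain PCM concatenation behind a rebuilt 44-byte RIFF header; a simplified sketch (the real code additionally checks that every segment shares the same format and inserts a silence gap between segments):

```typescript
// Build a 44-byte PCM WAV header for a given payload size.
function wavHeader(
  dataBytes: number,
  channels = 1,
  sampleRate = 44100,
  bytesPerSample = 2
): Buffer {
  const h = Buffer.alloc(44);
  h.write('RIFF', 0);
  h.writeUInt32LE(36 + dataBytes, 4); // RIFF chunk size
  h.write('WAVE', 8);
  h.write('fmt ', 12);
  h.writeUInt32LE(16, 16);  // fmt chunk size
  h.writeUInt16LE(1, 20);   // audio format: PCM
  h.writeUInt16LE(channels, 22);
  h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  h.writeUInt16LE(channels * bytesPerSample, 32);              // block align
  h.writeUInt16LE(bytesPerSample * 8, 34);                     // bits per sample
  h.write('data', 36);
  h.writeUInt32LE(dataBytes, 40); // data chunk size
  return h;
}

// Concatenate raw PCM chunks under a single fresh header.
function joinPcm(chunks: Buffer[]): Buffer {
  const data = Buffer.concat(chunks);
  return Buffer.concat([wavHeader(data.length), data]);
}
```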
---
## License
MIT


@@ -0,0 +1,3 @@
{"idx": 1, "speaker": "host", "name": "主持人", "text": "大家好,欢迎来到今天的播客节目。"}
{"idx": 2, "speaker": "guest", "name": "嘉宾", "text": "很高兴能参加这次节目。"}
{"idx": 3, "speaker": "host", "name": "主持人", "text": "今天我们要讨论一个非常有意思的话题。"}


@@ -0,0 +1,26 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ES2022",
"lib": ["ES2022"],
"moduleResolution": "node",
"outDir": "./dist",
"rootDir": "./",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"allowSyntheticDefaultImports": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": [
"*.ts"
],
"exclude": [
"node_modules",
"dist"
]
}