| name | description | priority | autoTrigger | triggers |
|---|---|---|---|---|
| glm-skills | Reference for Super Z/GLM platform skills and SDK. AUTO-TRIGGERS when: speech-to-text, ASR, TTS, text-to-speech, image generation, video generation, VLM, vision model, PDF processing, DOCX, XLSX, PPTX, web search, web scraping, podcast generation, multimodal AI, z-ai-web-dev-sdk. | 100 | true | |
Super Z / GLM Skills & Agents Reference
Complete reference for the Super Z (z.ai) platform's skills system, agents, and development patterns.
SDK: z-ai-web-dev-sdk
All skills use the z-ai-web-dev-sdk JavaScript/TypeScript SDK.
Installation
npm install z-ai-web-dev-sdk
# or
bun add z-ai-web-dev-sdk
Initialization
import ZAI from 'z-ai-web-dev-sdk';
const zai = await ZAI.create();
Multimodal AI Skills
ASR (Automatic Speech Recognition)
Command: ASR
Speech-to-text using z-ai-web-dev-sdk.
// Supports base64 encoded audio
const transcription = await zai.asr.transcribe({
audio: audioBase64
});
Use Cases: Transcription, voice input, audio processing
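The ASR call above takes `audioBase64` as given. A minimal sketch of producing that string from a local file in Node or Bun could look like the following. The helper names `bytesToBase64` and `audioFileToBase64` and the file path are hypothetical and not part of the SDK.

```typescript
import { readFileSync } from "node:fs";
import { Buffer } from "node:buffer";

// Hypothetical helper: encode raw bytes as a base64 string.
function bytesToBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Hypothetical helper: read a local audio file and return its content as base64.
function audioFileToBase64(filePath: string): string {
  return bytesToBase64(readFileSync(filePath));
}

// Usage (the path is illustrative):
// const audioBase64 = audioFileToBase64("./recording.wav");
```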
TTS (Text-to-Speech)
Command: TTS
Convert text to natural-sounding speech.
const audio = await zai.tts.synthesize({
text: "Hello world",
voice: "default",
speed: 1.0
});
Features: Multiple voices, adjustable speed, various audio formats
LLM (Large Language Model)
Command: LLM
Chat completions with context management.
const completion = await zai.chat.completions.create({
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7
});
Features: Multi-turn conversations, system prompts, context management
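How context is managed across turns is not specified here. One common approach is to keep the system prompt and only the most recent messages before each call. The sketch below assumes that approach; `trimHistory` is a hypothetical helper, and the `assistant` role is an assumption (the example above shows only `system` and `user`).

```typescript
// Roles: "system" and "user" appear in the example above; "assistant" is assumed.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Keep every system message plus only the most recent `maxRecent` non-system messages.
function trimHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // slice(-0) would return everything, so handle non-positive limits explicitly.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```

The trimmed array can then be passed as `messages` to the completion call shown above.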
VLM (Vision Language Model)
Command: VLM
Image understanding with conversational AI.
const response = await zai.vlm.analyze({
image: imageUrlOrBase64,
prompt: "Describe this image"
});
Supports: Image URLs, base64 encoded images, multimodal interactions
Image Generation
Command: image-generation
AI image creation from text.
const response = await zai.images.generations.create({
prompt: 'A cute cat playing in the garden',
size: '1024x1024'
});
const imageBase64 = response.data[0].base64;
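Since the response carries the image as a base64 string, writing it to disk takes one decode step. The following is a minimal sketch for Node or Bun; `decodeBase64` and `saveBase64Image` are hypothetical helper names, and the output path is illustrative.

```typescript
import { writeFileSync } from "node:fs";
import { Buffer } from "node:buffer";

// Decode a base64 string into raw bytes.
function decodeBase64(base64: string): Buffer {
  return Buffer.from(base64, "base64");
}

// Hypothetical helper: write a base64-encoded image to disk and return the byte count.
function saveBase64Image(base64: string, outputPath: string): number {
  const bytes = decodeBase64(base64);
  writeFileSync(outputPath, bytes);
  return bytes.length;
}

// Usage (the path is illustrative):
// saveBase64Image(imageBase64, "./cat.png");
```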
CLI Tool:
z-ai-generate --prompt "A beautiful landscape" --output "./image.png"
z-ai-generate -p "A cute cat" -o "./cat.png" -s 1024x1024
Supported Sizes: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440
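Because only the sizes listed above are supported, it can help to validate a size before sending a request. The sketch below is one way to do that; `isSupportedSize` and `closestSize` are hypothetical helpers, and choosing the nearest size by aspect ratio is an assumption rather than documented platform behavior.

```typescript
// The sizes listed in this reference.
const SUPPORTED_SIZES = [
  "1024x1024", "768x1344", "864x1152", "1344x768",
  "1152x864", "1440x720", "720x1440",
] as const;

type ImageSize = (typeof SUPPORTED_SIZES)[number];

// Type guard: true only for sizes in the list above.
function isSupportedSize(size: string): size is ImageSize {
  return (SUPPORTED_SIZES as readonly string[]).includes(size);
}

// Pick the listed size whose aspect ratio is closest (on a log scale) to width/height.
function closestSize(width: number, height: number): ImageSize {
  const target = Math.log(width / height);
  let best: ImageSize = SUPPORTED_SIZES[0];
  let bestDiff = Infinity;
  for (const size of SUPPORTED_SIZES) {
    const [w, h] = size.split("x").map(Number);
    const diff = Math.abs(Math.log(w / h) - target);
    if (diff < bestDiff) {
      best = size;
      bestDiff = diff;
    }
  }
  return best;
}
```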
Image Edit
Command: image-edit
Modify existing images with AI.
const edited = await zai.images.edits.create({
image: originalImageBase64,
prompt: "Add a sunset background"
});
Use Cases: Variations, redesign, text-based transformation
Image Understand
Command: image-understand
Analyze and understand images.
const analysis = await zai.image.understand({
image: imagePath,
task: "extract_text" // or "detect_objects", "classify"
});
Supports: PNG, JPEG, GIF, WebP, BMP
Video Generation
Command: video-generation
AI-powered video creation.
const task = await zai.videos.generations.create({
prompt: "A dog running in a park",
// or from image
image: imageBase64
});
// Async status polling
const status = await zai.videos.generations.status(task.id);
const result = await zai.videos.generations.retrieve(task.id);
Features: Async task management, status polling, result retrieval
Video Understand
Command: video-understand
Analyze video content.
const analysis = await zai.video.analyze({
video: videoPath,
prompt: "Describe what happens in this video"
});
Supports: MP4, AVI, MOV
Document Processing Skills
PDF
Command: pdf
Comprehensive PDF toolkit.
Capabilities:
- Extract text and tables
- Create new PDFs
- Merge/split documents
- Handle forms
DOCX
Command: docx
Word document processing.
Capabilities:
- Create and edit documents
- Tracked changes
- Comments
- Formatting preservation
- Text extraction
XLSX
Command: xlsx
Spreadsheet processing.
Capabilities:
- Create with formulas and formatting
- Read and analyze data
- Modify while preserving formulas
- Data visualization
- Formula recalculation
Supports: .xlsx, .xlsm, .csv, .tsv
PPTX
Command: pptx
Presentation processing.
Capabilities:
- Edit existing presentations
- Add slides
- Create new presentations
- Work with layouts
- Add comments/speaker notes
Web & Data Skills
Web Search
Command: web-search
Search for real-time information.
const results = await zai.functions.invoke("web_search", {
query: "What is the capital of France?",
num: 10
});
// Result type
interface SearchFunctionResultItem {
url: string;
name: string;
snippet: string;
host_name: string;
rank: number;
date: string;
favicon: string;
}
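With the result type above, post-processing search results is plain array work. The sketch below keeps one result per host and orders them by rank. `topResultPerHost` is a hypothetical helper, and it assumes a lower `rank` means a better result, which this reference does not state.

```typescript
interface SearchFunctionResultItem {
  url: string;
  name: string;
  snippet: string;
  host_name: string;
  rank: number;
  date: string;
  favicon: string;
}

// Keep the best-ranked result per host (assumption: lower rank is better),
// then return the survivors in ascending rank order.
function topResultPerHost(items: SearchFunctionResultItem[]): SearchFunctionResultItem[] {
  const byHost = new Map<string, SearchFunctionResultItem>();
  for (const item of items) {
    const current = byHost.get(item.host_name);
    if (current === undefined || item.rank < current.rank) {
      byHost.set(item.host_name, item);
    }
  }
  return Array.from(byHost.values()).sort((a, b) => a.rank - b.rank);
}
```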
Web Reader
Command: web-reader
Extract web page content.
const content = await zai.web.read({
url: "https://example.com"
});
Features: Automatic content extraction, title, HTML, publication time
CSV Data Summarizer
Command: csv-data-summarizer
Automatic CSV analysis.
Features:
- Detects data types (sales, customer, financial, operational, survey)
- Generates correlation heatmaps
- Time-series plots
- Distribution charts
- Missing data analysis
- Automatic date detection
Built with: pandas, matplotlib, seaborn
Deep Research
Command: deep-research
Enterprise-grade research.
Triggers: "deep research", "comprehensive analysis", "research report", "compare X vs Y"
Features:
- Multi-source synthesis
- Citation tracking
- Verification
- 10+ sources
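How the platform matches the trigger phrases listed above is not documented here. As a rough illustration only, a naive matcher could combine a substring check with a pattern for the "compare X vs Y" form. `matchesDeepResearch` and its matching rules are assumptions, not the platform's logic.

```typescript
// Literal trigger phrases from the list above.
const TRIGGER_PHRASES = ["deep research", "comprehensive analysis", "research report"];

// Assumed pattern for the "compare X vs Y" trigger.
const COMPARE_PATTERN = /\bcompare\s+.+?\s+vs\.?\s+.+/i;

// Naive check: case-insensitive phrase match, or a "compare X vs Y" shape.
function matchesDeepResearch(message: string): boolean {
  const text = message.toLowerCase();
  return TRIGGER_PHRASES.some((phrase) => text.includes(phrase)) || COMPARE_PATTERN.test(message);
}
```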
Specialized Skills
Podcast Generate
Command: podcast-generate
Create podcast episodes.
Modes:
- From uploaded text/article → dual-host dialogue
- From topic → web search + generation
Features:
- Duration scales with content (3-20 min, ~240 chars/min)
- Outputs: Markdown script + WAV audio
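The duration figures above imply simple arithmetic: characters divided by roughly 240 per minute, kept within 3 to 20 minutes. The sketch below works that out. `estimatePodcastMinutes` is a hypothetical helper, and clamping to the range is an assumption about how the bounds are applied.

```typescript
const CHARS_PER_MINUTE = 240; // approximate rate stated above
const MIN_MINUTES = 3;
const MAX_MINUTES = 20;

// Estimate episode length in minutes from the script's character count,
// clamped to the stated 3-20 minute range (clamping is an assumption).
function estimatePodcastMinutes(charCount: number): number {
  const raw = charCount / CHARS_PER_MINUTE;
  return Math.min(MAX_MINUTES, Math.max(MIN_MINUTES, raw));
}

// Example: a 2,400-character script gives 2400 / 240 = 10 minutes.
```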
Story Video Generation
Command: story-video-generation
Convert sentences to story videos.
Triggers (Chinese): "生成故事", "故事视频", "把一句话变成视频"
Process: Sentence → Story → Scene Images → Video
Frontend Design
Command: frontend-design
Transform UI requirements to production code.
Features:
- Design tokens
- Accessibility compliance
- Creative execution
Use Cases: Websites, web apps, React/Vue components, dashboards, landing pages
Finance
Command: finance
Finance API integration.
Capabilities:
- Stock price queries
- Market data analysis
- Company financials
- Portfolio tracking
- Market news
- Stock screening
- Technical analysis
Gift Evaluator
Command: gift-evaluator
Spring Festival gift analysis.
Features:
- Visual perception
- Market valuation
- HTML card generation
Use Cases: Gift photos, value inquiry, authenticity, social responses
Subagents (Task Tool)
Available Agent Types
| Agent | Description | Tools |
|---|---|---|
| general-purpose | Complex research, multi-step tasks | All |
| statusline-setup | Status line configuration | Read, Edit |
| Explore | Codebase exploration | All |
| Plan | Implementation planning | All |
| frontend-styling-expert | CSS, styling, responsive design | All |
| full-stack-developer | Next.js 15 + React + Prisma | All |
Explore Agent Thoroughness
- quick: Basic searches
- medium: Moderate exploration
- very thorough: Comprehensive analysis
Project Environment
Standard Stack
- Next.js 15 with App Router
- Port: 3000
- Package manager: Bun
- Database: Prisma
- UI: shadcn/ui components
Commands
bun run dev # Start dev server (auto-runs)
bun run lint # Check code quality
File Output
All generated files go to:
/home/z/my-project/download/
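To make sure generated files land in that directory, output paths can be built in one place. The sketch below shows one way to do so; `outputPath` is a hypothetical helper, and reducing the name to its base name is a defensive choice rather than a documented requirement.

```typescript
import { posix } from "node:path";

const DOWNLOAD_DIR = "/home/z/my-project/download/";

// Build a path inside the download directory. Only the base name of the
// requested file is used, so "../" segments cannot escape the directory.
function outputPath(fileName: string): string {
  const base = posix.basename(fileName);
  if (base === "" || base === "." || base === "..") {
    throw new Error(`invalid file name: "${fileName}"`);
  }
  return posix.join(DOWNLOAD_DIR, base);
}
```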
Backend-Only Rule
z-ai-web-dev-sdk MUST be used in backend only (API routes, server components).
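In practice this means the browser calls a server-side route and only that route touches the SDK. The sketch below builds such a handler using the standard `Request` and `Response` types. `makeChatHandler`, the `ChatClient` interface, the request body shape, and the route path are all assumptions for illustration; only `ZAI.create()` and `chat.completions.create` come from this reference.

```typescript
// Minimal shape of the one SDK call used here; it mirrors the LLM example in this reference.
interface ChatClient {
  chat: {
    completions: {
      create(body: { messages: { role: string; content: string }[] }): Promise<unknown>;
    };
  };
}

function json(data: unknown, status: number): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Build a POST handler for a server-side route. The client factory is injected,
// so the SDK is created only on the server and never shipped to the browser.
function makeChatHandler(getClient: () => Promise<ChatClient>) {
  return async function POST(req: Request): Promise<Response> {
    const body = (await req.json()) as { message?: unknown };
    if (typeof body.message !== "string" || body.message.length === 0) {
      return json({ error: "message is required" }, 400);
    }
    const zai = await getClient();
    const completion = await zai.chat.completions.create({
      messages: [{ role: "user", content: body.message }],
    });
    return json(completion, 200);
  };
}

// In a Next.js App Router file such as app/api/chat/route.ts (path is illustrative):
// import ZAI from 'z-ai-web-dev-sdk';
// export const POST = makeChatHandler(() => ZAI.create());
```

Client components would then `fetch` this route instead of importing the SDK.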
Design Patterns
Async Task Pattern (Video Generation)
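// 1. Create the task
const task = await zai.videos.generations.create({ prompt });
// 2. Poll the task status (repeat until it reports completion)
const status = await zai.videos.generations.status(task.id);
// 3. Retrieve the result once the task is complete
const result = await zai.videos.generations.retrieve(task.id);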
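Step 2 is usually a loop with a delay and an upper bound rather than a single call. The sketch below shows one such loop. `pollUntilDone`, the `TaskState` values, and the options are hypothetical; the actual status payload returned by the SDK is not documented here, so the caller maps it to a state inside `checkStatus`.

```typescript
// Hypothetical task states; map the real SDK status payload onto these.
type TaskState = "pending" | "done" | "failed";

interface PollOptions {
  intervalMs: number;  // delay between status checks
  maxAttempts: number; // give up after this many checks
}

// Call checkStatus repeatedly until it reports "done", throwing on failure or timeout.
async function pollUntilDone(
  checkStatus: () => Promise<TaskState>,
  { intervalMs, maxAttempts }: PollOptions,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const state = await checkStatus();
    if (state === "done") return;
    if (state === "failed") throw new Error("task failed");
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`task not finished after ${maxAttempts} attempts`);
}
```

Once `pollUntilDone` resolves, the result can be fetched with the retrieve call from step 3.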
Multi-Modal Input Pattern
// Flexible input handling: use one of these shapes per request
// Text only
const textInput = { text: "description" };
// Image only
const imageInput = { image: base64OrUrl };
// Mixed
const mixedInput = { text: "modify this", image: base64OrUrl };
Function Invocation Pattern
const result = await zai.functions.invoke("function_name", {
param1: "value",
param2: 123
});
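Since the return type of `invoke` is not documented here, callers may want to check the result's shape at runtime before using it. The sketch below wraps the call with a caller-supplied type guard. `invokeChecked` and the `FunctionInvoker` interface are hypothetical; the interface only mirrors the `functions.invoke(name, args)` call shown above.

```typescript
// Minimal shape of the invoke call shown above (an assumption about its signature).
interface FunctionInvoker {
  functions: {
    invoke(name: string, args: Record<string, unknown>): Promise<unknown>;
  };
}

// Invoke a named function and validate the result with a caller-supplied type guard.
async function invokeChecked<T>(
  client: FunctionInvoker,
  name: string,
  args: Record<string, unknown>,
  isValid: (value: unknown) => value is T,
): Promise<T> {
  const result = await client.functions.invoke(name, args);
  if (!isValid(result)) {
    throw new Error(`unexpected result shape from "${name}"`);
  }
  return result;
}

// Usage (guard is illustrative):
// const items = await invokeChecked(zai, "web_search", { query: "q", num: 10 }, Array.isArray);
```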
When to Use This Reference
- Building AI-powered applications: SDK patterns and examples
- Document processing: PDF/DOCX/XLSX/PPTX capabilities
- Multimodal features: Image, video, audio processing
- Web integration: Search and scraping patterns
- Agent design: Subagent patterns and capabilities
Source
- Platform: Super Z (z.ai)
- SDK: z-ai-web-dev-sdk
- Framework: Next.js 15 + React + Prisma
- UI: shadcn/ui