---
name: glm-skills
description: "Reference for Super Z/GLM platform skills and SDK. AUTO-TRIGGERS when: speech-to-text, ASR, TTS, text-to-speech, image generation, video generation, VLM, vision model, PDF processing, DOCX, XLSX, PPTX, web search, web scraping, podcast generation, multimodal AI, z-ai-web-dev-sdk."
priority: 100
autoTrigger: true
triggers:
  - "speech to text"
  - "ASR"
  - "transcribe"
  - "text to speech"
  - "TTS"
  - "voice"
  - "image generation"
  - "generate image"
  - "video generation"
  - "generate video"
  - "VLM"
  - "vision language"
  - "analyze image"
  - "PDF"
  - "DOCX"
  - "XLSX"
  - "PPTX"
  - "spreadsheet"
  - "presentation"
  - "web search"
  - "web scraping"
  - "podcast"
  - "multimodal"
  - "z-ai-web-dev-sdk"
  - "Super Z"
  - "GLM"
---

# Super Z / GLM Skills & Agents Reference

Reference for the Super Z (z.ai) platform's skills system, subagents, and development patterns.

---

## SDK: z-ai-web-dev-sdk

All skills use the `z-ai-web-dev-sdk` JavaScript/TypeScript SDK.

### Installation
```bash
npm install z-ai-web-dev-sdk
# or
bun add z-ai-web-dev-sdk
```

### Initialization
```javascript
import ZAI from 'z-ai-web-dev-sdk';

const zai = await ZAI.create();
```
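
Because `ZAI.create()` is asynchronous, server code may want to create the client once and reuse it. The sketch below caches the pending promise so the factory runs at most once. It assumes a client instance can be shared across requests, which the SDK notes above do not confirm. `memoizeAsync` and `fakeCreate` are hypothetical names; in real code the factory would be `() => ZAI.create()`.

```javascript
// Cache the promise returned by an async factory so it runs at most once.
// `memoizeAsync` is an illustrative helper, not part of z-ai-web-dev-sdk.
function memoizeAsync(factory) {
  let cached = null;
  return function getInstance() {
    if (cached === null) {
      // The async wrapper turns a synchronous throw into a rejected promise.
      cached = (async () => factory())().catch((err) => {
        cached = null; // allow a retry after a failed initialization
        throw err;
      });
    }
    return cached;
  };
}

// Stand-in for `() => ZAI.create()` so the sketch runs without the SDK.
let createCalls = 0;
const fakeCreate = async () => {
  createCalls += 1;
  return { name: 'fake-client' };
};

const getZai = memoizeAsync(fakeCreate);
```

Callers then write `const zai = await getZai();` wherever a client is needed.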

---

## Multimodal AI Skills

### ASR (Automatic Speech Recognition)
**Command**: `ASR`

Speech-to-text using z-ai-web-dev-sdk.

```javascript
// `audioBase64` holds the audio content as a base64-encoded string
const transcription = await zai.asr.transcribe({
  audio: audioBase64
});
```

**Use Cases**: Transcription, voice input, audio processing
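
The call above expects audio as base64. A minimal sketch of preparing that string in Node.js follows; `audioToBase64` and `stripDataUrlPrefix` are hypothetical helper names, and whether the SDK accepts a `data:` URL prefix is not stated here, so the second helper simply removes one if present.

```javascript
// Convert raw audio bytes (Buffer or Uint8Array) to a base64 string.
function audioToBase64(bytes) {
  return Buffer.from(bytes).toString('base64');
}

// Remove a leading data-URL header such as "data:audio/wav;base64,"
// and return the bare base64 payload. Other strings pass through unchanged.
function stripDataUrlPrefix(value) {
  return value.replace(/^data:[^,]*;base64,/, '');
}

// Two bytes spelling "hi" stand in for real audio data.
const audioBase64 = audioToBase64(Uint8Array.of(104, 105));
```

For a file on disk, the bytes would typically come from `readFile` in `node:fs/promises` before being passed to `audioToBase64`.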

---

### TTS (Text-to-Speech)
**Command**: `TTS`

Convert text to natural-sounding speech.

```javascript
const audio = await zai.tts.synthesize({
  text: "Hello world",
  voice: "default",
  speed: 1.0
});
```

**Features**: Multiple voices, adjustable speed, various audio formats

---

### LLM (Large Language Model)
**Command**: `LLM`

Chat completions with context management.

```javascript
const completion = await zai.chat.completions.create({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ],
  temperature: 0.7
});
```

**Features**: Multi-turn conversations, system prompts, context management
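
Multi-turn context management usually means keeping a running `messages` array and trimming it before each call. The sketch below shows one way to do that: keep every system message plus the most recent turns. The helper names and the cutoff of two messages are illustrative assumptions, not SDK features.

```javascript
// Append a message without mutating the original history array.
function appendMessage(messages, role, content) {
  return [...messages, { role, content }];
}

// Keep all system messages plus the last `maxMessages` non-system messages.
function trimHistory(messages, maxMessages) {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  const recent = maxMessages > 0 ? rest.slice(-maxMessages) : [];
  return [...system, ...recent];
}

let history = [{ role: 'system', content: 'You are a helpful assistant.' }];
history = appendMessage(history, 'user', 'Hello!');
history = appendMessage(history, 'assistant', 'Hi! How can I help?');
history = appendMessage(history, 'user', 'What is GLM?');

// System prompt + the two most recent messages.
const trimmed = trimHistory(history, 2);
```

`trimmed` can then be passed as `messages` to `zai.chat.completions.create`.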

---

### VLM (Vision Language Model)
**Command**: `VLM`

Image understanding with conversational AI.

```javascript
const response = await zai.vlm.analyze({
  image: imageUrlOrBase64,
  prompt: "Describe this image"
});
```

**Supports**: Image URLs, base64 encoded images, multimodal interactions
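
Since `image` may be either a URL or base64, it can help to check which one a caller supplied before sending it. The following is a rough heuristic sketch; `classifyImageInput` is a hypothetical helper, and the SDK may apply its own, different validation.

```javascript
// Classify an image input as a remote URL, a data URL, or raw base64.
function classifyImageInput(value) {
  if (/^https?:\/\//i.test(value)) return 'url';
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(value)) return 'data-url';
  const looksLikeBase64 =
    value.length > 0 &&
    value.length % 4 === 0 &&
    /^[A-Za-z0-9+/]+={0,2}$/.test(value);
  return looksLikeBase64 ? 'base64' : 'unknown';
}
```

Rejecting `'unknown'` inputs early gives callers a clearer error than a failed model request.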

---

### Image Generation
**Command**: `image-generation`

AI image creation from text.

```javascript
const response = await zai.images.generations.create({
  prompt: 'A cute cat playing in the garden',
  size: '1024x1024'
});

const imageBase64 = response.data[0].base64;
```

**CLI Tool**:
```bash
z-ai-generate --prompt "A beautiful landscape" --output "./image.png"
z-ai-generate -p "A cute cat" -o "./cat.png" -s 1024x1024
```

**Supported Sizes**: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440
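
Because only the sizes listed above are accepted, a request for an arbitrary resolution needs to be mapped onto one of them. A small sketch of that mapping, choosing the supported size with the closest aspect ratio; the helper names are hypothetical.

```javascript
const SUPPORTED_SIZES = [
  '1024x1024', '768x1344', '864x1152', '1344x768',
  '1152x864', '1440x720', '720x1440',
];

function isSupportedSize(size) {
  return SUPPORTED_SIZES.includes(size);
}

// Pick the supported size whose aspect ratio is closest to width/height.
// Ratios are compared on a log scale so 2:1 and 1:2 are treated symmetrically.
function closestSupportedSize(width, height) {
  const target = Math.log(width / height);
  let best = SUPPORTED_SIZES[0];
  let bestDiff = Infinity;
  for (const size of SUPPORTED_SIZES) {
    const [w, h] = size.split('x').map(Number);
    const diff = Math.abs(Math.log(w / h) - target);
    if (diff < bestDiff) {
      best = size;
      bestDiff = diff;
    }
  }
  return best;
}
```

For example, a 1920x1080 request maps to `1344x768`, which can then be passed as `size`.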

---

### Image Edit
**Command**: `image-edit`

Modify existing images with AI.

```javascript
const edited = await zai.images.edits.create({
  image: originalImageBase64,
  prompt: "Add a sunset background"
});
```

**Use Cases**: Variations, redesign, text-based transformation

---

### Image Understand
**Command**: `image-understand`

Analyze and understand images.

```javascript
const analysis = await zai.image.understand({
  image: imagePath,
  task: "extract_text" // or "detect_objects", "classify"
});
```

**Supports**: PNG, JPEG, GIF, WebP, BMP

---

### Video Generation
**Command**: `video-generation`

AI-powered video creation.

```javascript
const task = await zai.videos.generations.create({
  prompt: "A dog running in a park",
  // or from image
  image: imageBase64
});

// Generation is asynchronous: poll the status until the task completes,
// then retrieve the result
const status = await zai.videos.generations.status(task.id);
const result = await zai.videos.generations.retrieve(task.id);
```

**Features**: Async task management, status polling, result retrieval

---

### Video Understand
**Command**: `video-understand`

Analyze video content.

```javascript
const analysis = await zai.video.analyze({
  video: videoPath,
  prompt: "Describe what happens in this video"
});
```

**Supports**: MP4, AVI, MOV

---

## Document Processing Skills

### PDF
**Command**: `pdf`

Comprehensive PDF toolkit.

**Capabilities**:
- Extract text and tables
- Create new PDFs
- Merge/split documents
- Handle forms

---

### DOCX
**Command**: `docx`

Word document processing.

**Capabilities**:
- Create and edit documents
- Tracked changes
- Comments
- Formatting preservation
- Text extraction

---

### XLSX
**Command**: `xlsx`

Spreadsheet processing.

**Capabilities**:
- Create with formulas and formatting
- Read and analyze data
- Modify while preserving formulas
- Data visualization
- Formula recalculation

**Supports**: .xlsx, .xlsm, .csv, .tsv

---

### PPTX
**Command**: `pptx`

Presentation processing.

**Capabilities**:
- Edit existing presentations
- Add slides
- Create new presentations
- Work with layouts
- Add comments/speaker notes

---

## Web & Data Skills

### Web Search
**Command**: `web-search`

Search for real-time information.

```typescript
const results = await zai.functions.invoke("web_search", {
  query: "What is the capital of France?",
  num: 10
});

// Result type
interface SearchFunctionResultItem {
  url: string;
  name: string;
  snippet: string;
  host_name: string;
  rank: number;
  date: string;
  favicon: string;
}
```
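
Search results often need light post-processing before they are shown or fed to a model. The sketch below deduplicates by `host_name` and renders a Markdown list, using only the fields of `SearchFunctionResultItem`. It assumes the call returns an array of such items and that a lower `rank` means a better placement; neither is stated above. The sample data is made up for illustration.

```javascript
// Keep the best-ranked result per host, ordered by rank.
// Assumption: a lower `rank` value means a better placement.
function topResultPerHost(items) {
  const byHost = new Map();
  for (const item of items) {
    const current = byHost.get(item.host_name);
    if (!current || item.rank < current.rank) {
      byHost.set(item.host_name, item);
    }
  }
  return [...byHost.values()].sort((a, b) => a.rank - b.rank);
}

// Render results as a Markdown bullet list.
function toMarkdownList(items) {
  return items.map((i) => `- [${i.name}](${i.url}): ${i.snippet}`).join('\n');
}

// Made-up sample items (only the fields used here are filled in).
const sampleResults = [
  { url: 'https://a.example/1', name: 'A1', snippet: 'first', host_name: 'a.example', rank: 0 },
  { url: 'https://a.example/2', name: 'A2', snippet: 'second', host_name: 'a.example', rank: 2 },
  { url: 'https://b.example/1', name: 'B1', snippet: 'third', host_name: 'b.example', rank: 1 },
];

const markdown = toMarkdownList(topResultPerHost(sampleResults));
```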

---

### Web Reader
**Command**: `web-reader`

Extract web page content.

```javascript
const content = await zai.web.read({
  url: "https://example.com"
});
```

**Features**: Automatic content extraction, title, HTML, publication time

---

### CSV Data Summarizer
**Command**: `csv-data-summarizer`

Automatic CSV analysis.

**Features**:
- Detects data types (sales, customer, financial, operational, survey)
- Generates correlation heatmaps
- Time-series plots
- Distribution charts
- Missing data analysis
- Automatic date detection

**Built with**: pandas, matplotlib, seaborn

---

### Deep Research
**Command**: `deep-research`

Enterprise-grade research.

**Triggers**: "deep research", "comprehensive analysis", "research report", "compare X vs Y"

**Features**:
- Multi-source synthesis
- Citation tracking
- Verification
- Draws on 10 or more sources

---

## Specialized Skills

### Podcast Generate
**Command**: `podcast-generate`

Create podcast episodes.

**Modes**:
1. From uploaded text/article → dual-host dialogue
2. From topic → web search + generation

**Features**:
- Duration scales with content (3-20 min, ~240 chars/min)
- Outputs: Markdown script + WAV audio
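
The duration rule can be read as simple arithmetic: divide the character count by roughly 240 and keep the result within 3 to 20 minutes. A sketch under that reading follows. Treating the range as a hard clamp is an assumption, as is counting the characters of the script rather than of the source text; `estimatePodcastMinutes` is a hypothetical helper.

```javascript
const CHARS_PER_MINUTE = 240;
const MIN_MINUTES = 3;
const MAX_MINUTES = 20;

// Estimate episode length in minutes from a character count,
// clamped to the 3-20 minute range.
function estimatePodcastMinutes(charCount) {
  const raw = charCount / CHARS_PER_MINUTE;
  return Math.min(MAX_MINUTES, Math.max(MIN_MINUTES, raw));
}
```

Under these assumptions, 2,400 characters gives about 10 minutes, while anything below 720 characters still yields the 3-minute minimum.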

---

### Story Video Generation
**Command**: `story-video-generation`

Convert sentences to story videos.

**Triggers** (Chinese): "生成故事" (generate a story), "故事视频" (story video), "把一句话变成视频" (turn a sentence into a video)

**Process**: Sentence → Story → Scene Images → Video

---

### Frontend Design
**Command**: `frontend-design`

Transform UI requirements to production code.

**Features**:
- Design tokens
- Accessibility compliance
- Creative execution

**Use Cases**: Websites, web apps, React/Vue components, dashboards, landing pages

---

### Finance
**Command**: `finance`

Finance API integration.

**Capabilities**:
- Stock price queries
- Market data analysis
- Company financials
- Portfolio tracking
- Market news
- Stock screening
- Technical analysis

---

### Gift Evaluator
**Command**: `gift-evaluator`

Spring Festival gift analysis.

**Features**:
- Visual perception
- Market valuation
- HTML card generation

**Use Cases**: Gift photos, value inquiry, authenticity, social responses

---

## Subagents (Task Tool)

### Available Agent Types

| Agent | Description | Tools |
|-------|-------------|-------|
| `general-purpose` | Complex research, multi-step tasks | All |
| `statusline-setup` | Status line configuration | Read, Edit |
| `Explore` | Codebase exploration | All |
| `Plan` | Implementation planning | All |
| `frontend-styling-expert` | CSS, styling, responsive design | All |
| `full-stack-developer` | Next.js 15 + React + Prisma | All |

### Explore Agent Thoroughness
- `quick`: Basic searches
- `medium`: Moderate exploration
- `very thorough`: Comprehensive analysis

---

## Project Environment

### Standard Stack
- Next.js 15 with App Router
- Port: 3000
- Package manager: Bun
- Database: Prisma
- UI: shadcn/ui components

### Commands
```bash
bun run dev    # Start dev server (auto-runs)
bun run lint   # Check code quality
```

### File Output
All generated files go to:
```
/home/z/my-project/download/
```
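
A small helper can keep every generated file inside that directory. The sketch below builds an output path from a bare file name and rejects names that would escape the folder; `outputPathFor` is a hypothetical helper, not something the platform provides.

```javascript
const DOWNLOAD_DIR = '/home/z/my-project/download/';

// Build an output path inside the download directory from a bare file name.
// Rejects empty names, "." / "..", and names containing path separators.
function outputPathFor(fileName) {
  const invalid =
    !fileName ||
    fileName === '.' ||
    fileName === '..' ||
    /[\\\/]/.test(fileName);
  if (invalid) {
    throw new Error(`Invalid output file name: ${fileName}`);
  }
  return DOWNLOAD_DIR + fileName;
}
```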

### Backend-Only Rule
`z-ai-web-dev-sdk` MUST be used only in backend code (API routes, server components), never in client-side code.
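
One way to follow this rule is to put every SDK call behind an API route and have the browser call that route with `fetch`. The sketch below builds a `POST` handler in the Web `Request`/`Response` style used by Next.js App Router route handlers. The `message` body field, the `/api/chat` path, and the `makeChatHandler` name are illustrative assumptions; the client is injected so the example runs without the SDK installed.

```javascript
// Small helper for JSON responses.
function json(data, status = 200) {
  return new Response(JSON.stringify(data), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}

// Build a POST handler for a route file such as app/api/chat/route.js.
// `getClient` must resolve to an SDK client; it only ever runs on the server.
function makeChatHandler(getClient) {
  return async function POST(request) {
    let body;
    try {
      body = await request.json();
    } catch {
      return json({ error: 'Body must be valid JSON' }, 400);
    }
    const message = body?.message;
    if (typeof message !== 'string' || message.trim() === '') {
      return json({ error: '`message` is required' }, 400);
    }
    const zai = await getClient();
    const completion = await zai.chat.completions.create({
      messages: [{ role: 'user', content: message }],
    });
    return json({ completion });
  };
}
```

In a route file this might be wired up as `export const POST = makeChatHandler(() => ZAI.create());`, with the SDK import living only in that server-side file.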

---

## Design Patterns

### Async Task Pattern (Video Generation)
```javascript
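// 1. Create the task
const task = await zai.videos.generations.create({ prompt });

// 2. Poll the status (repeat until the task reports completion)
const taskStatus = await zai.videos.generations.status(task.id);

// 3. Retrieve the result once the task is complete
const result = await zai.videos.generations.retrieve(task.id);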
```
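
Step 2 is normally a loop rather than a single call. The sketch below is a generic polling helper with a fixed interval and an attempt limit. The state strings (`'SUCCESS'`, `'FAIL'`) and the defaults are placeholders, since the status values the SDK returns are not documented here; `pollUntilDone` is a hypothetical helper.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `checkStatus` until it reports a terminal state or attempts run out.
// `checkStatus` is an async function returning the current state string.
// The default state names are placeholders; use whatever the SDK reports.
async function pollUntilDone(checkStatus, options = {}) {
  const {
    intervalMs = 5000,
    maxAttempts = 60,
    doneStates = ['SUCCESS'],
    failedStates = ['FAIL'],
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const state = await checkStatus();
    if (doneStates.includes(state)) return state;
    if (failedStates.includes(state)) {
      throw new Error(`Task failed with state ${state}`);
    }
    // Wait before the next check, but not after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Task did not finish after ${maxAttempts} attempts`);
}
```

The `checkStatus` callback would wrap the status call from step 2 and return whichever field carries the task state; the result is retrieved only after `pollUntilDone` resolves.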

### Multi-Modal Input Pattern
```javascript
// Flexible input handling: supply text, an image, or both

// Text only
const textInput = { text: "description" };

// Image only
const imageInput = { image: base64OrUrl };

// Mixed
const mixedInput = { text: "modify this", image: base64OrUrl };
```

### Function Invocation Pattern
```javascript
const result = await zai.functions.invoke("function_name", {
  param1: "value",
  param2: 123
});
```

---

## When to Use This Reference

1. **Building AI-powered applications**: SDK patterns and examples
2. **Document processing**: PDF/DOCX/XLSX/PPTX capabilities
3. **Multimodal features**: Image, video, audio processing
4. **Web integration**: Search and scraping patterns
5. **Agent design**: Subagent patterns and capabilities

## Source
- Platform: Super Z (z.ai)
- SDK: z-ai-web-dev-sdk
- Framework: Next.js 15 + React + Prisma
- UI: shadcn/ui