feat: Add GLM Tools, Skills & Agents collection

admin
2026-02-13 12:52:16 +04:00
Unverified
commit 15f60f6c86
83 changed files with 12638 additions and 0 deletions

skills/glm-skills/SKILL.md Normal file

@@ -0,0 +1,527 @@
---
name: glm-skills
description: "Reference for Super Z/GLM platform skills and SDK. AUTO-TRIGGERS when: speech-to-text, ASR, TTS, text-to-speech, image generation, video generation, VLM, vision model, PDF processing, DOCX, XLSX, PPTX, web search, web scraping, podcast generation, multimodal AI, z-ai-web-dev-sdk."
priority: 100
autoTrigger: true
triggers:
- "speech to text"
- "ASR"
- "transcribe"
- "text to speech"
- "TTS"
- "voice"
- "image generation"
- "generate image"
- "video generation"
- "generate video"
- "VLM"
- "vision language"
- "analyze image"
- "PDF"
- "DOCX"
- "XLSX"
- "PPTX"
- "spreadsheet"
- "presentation"
- "web search"
- "web scraping"
- "podcast"
- "multimodal"
- "z-ai-web-dev-sdk"
- "Super Z"
- "GLM"
---
# Super Z / GLM Skills & Agents Reference
Complete reference for the Super Z (z.ai) platform's skills system, agents, and development patterns.
---
## SDK: z-ai-web-dev-sdk
All skills use the `z-ai-web-dev-sdk` JavaScript/TypeScript SDK.
### Installation
```bash
npm install z-ai-web-dev-sdk
# or
bun add z-ai-web-dev-sdk
```
### Initialization
```javascript
import ZAI from 'z-ai-web-dev-sdk';
const zai = await ZAI.create();
```
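
Because `ZAI.create()` is asynchronous, a server process will usually want to create the client once and reuse it. The following is a minimal sketch, assuming a single client instance can be shared across requests (this reference does not confirm that). `createClientCache` is a hypothetical helper, and the factory is passed in so the helper makes no assumptions about SDK internals.

```javascript
// Hypothetical helper: memoize an async factory so the client is created once.
// Assumption: one client instance can safely be shared across requests.
function createClientCache(factory) {
  let pending = null; // the in-flight or already-resolved promise
  return function getClient() {
    if (pending === null) {
      pending = Promise.resolve()
        .then(() => factory())
        .catch((err) => {
          pending = null; // allow a retry after a failed creation
          throw err;
        });
    }
    return pending;
  };
}

// Usage (server-side only):
//   import ZAI from 'z-ai-web-dev-sdk';
//   const getZai = createClientCache(() => ZAI.create());
//   const zai = await getZai();
```

Caching the promise rather than the resolved client means concurrent callers during startup share one `create()` call instead of racing.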
---
## Multimodal AI Skills
### ASR (Automatic Speech Recognition)
**Command**: `ASR`
Speech-to-text using z-ai-web-dev-sdk.
```javascript
// Supports base64 encoded audio
const transcription = await zai.asr.transcribe({
audio: audioBase64
});
```
**Use Cases**: Transcription, voice input, audio processing
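
Since the example above expects base64 encoded audio, raw bytes need encoding first. This sketch uses Node's global `Buffer`, so it applies to server-side code only; `audioToBase64` is a hypothetical helper name, not part of the SDK.

```javascript
// Hypothetical helper: turn raw audio bytes into a base64 string.
// Relies on Node's global Buffer (server-side only).
function audioToBase64(bytes) {
  if (bytes instanceof ArrayBuffer) {
    return Buffer.from(bytes).toString('base64');
  }
  if (ArrayBuffer.isView(bytes)) {
    // Respect byteOffset/byteLength so views over a larger buffer encode correctly
    return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength).toString('base64');
  }
  throw new TypeError('Expected an ArrayBuffer, typed array, or Buffer');
}
```

In a Node backend, the bytes could come from `await fs.promises.readFile(path)`, with the result passed as the `audio` field shown above.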
---
### TTS (Text-to-Speech)
**Command**: `TTS`
Convert text to natural-sounding speech.
```javascript
const audio = await zai.tts.synthesize({
text: "Hello world",
voice: "default",
speed: 1.0
});
```
**Features**: Multiple voices, adjustable speed, various audio formats
---
### LLM (Large Language Model)
**Command**: `LLM`
Chat completions with context management.
```javascript
const completion = await zai.chat.completions.create({
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7
});
```
**Features**: Multi-turn conversations, system prompts, context management
---
### VLM (Vision Language Model)
**Command**: `VLM`
Image understanding with conversational AI.
```javascript
const response = await zai.vlm.analyze({
image: imageUrlOrBase64,
prompt: "Describe this image"
});
```
**Supports**: Image URLs, base64 encoded images, multimodal interactions
---
### Image Generation
**Command**: `image-generation`
AI image creation from text.
```javascript
const response = await zai.images.generations.create({
prompt: 'A cute cat playing in the garden',
size: '1024x1024'
});
const imageBase64 = response.data[0].base64;
```
**CLI Tool**:
```bash
z-ai-generate --prompt "A beautiful landscape" --output "./image.png"
z-ai-generate -p "A cute cat" -o "./cat.png" -s 1024x1024
```
**Supported Sizes**: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440
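
Callers often start from arbitrary dimensions, so it can help to map a request onto the supported list before calling the API. The sketch below picks the supported size with the closest aspect ratio; `closestSupportedSize` is a hypothetical helper, and how the API reacts to an unsupported size is not documented here.

```javascript
// Sizes copied from the "Supported Sizes" list in this reference.
const SUPPORTED_SIZES = [
  '1024x1024', '768x1344', '864x1152', '1344x768', '1152x864', '1440x720', '720x1440',
];

// Hypothetical helper: choose the supported size whose aspect ratio is
// closest to the requested one (compared on a log scale so that 2:1 and
// 1:2 are treated symmetrically).
function closestSupportedSize(width, height) {
  const valid = (n) => Number.isFinite(n) && n > 0;
  if (!valid(width) || !valid(height)) {
    throw new RangeError('width and height must be positive, finite numbers');
  }
  const target = Math.log(width / height);
  let best = SUPPORTED_SIZES[0];
  let bestDistance = Infinity;
  for (const size of SUPPORTED_SIZES) {
    const [w, h] = size.split('x').map(Number);
    const distance = Math.abs(Math.log(w / h) - target);
    if (distance < bestDistance) {
      best = size;
      bestDistance = distance;
    }
  }
  return best;
}
```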
---
### Image Edit
**Command**: `image-edit`
Modify existing images with AI.
```javascript
const edited = await zai.images.edits.create({
image: originalImageBase64,
prompt: "Add a sunset background"
});
```
**Use Cases**: Variations, redesign, text-based transformation
---
### Image Understand
**Command**: `image-understand`
Analyze and understand images.
```javascript
const analysis = await zai.image.understand({
image: imagePath,
task: "extract_text" // or "detect_objects", "classify"
});
```
**Supports**: PNG, JPEG, GIF, WebP, BMP
---
### Video Generation
**Command**: `video-generation`
AI-powered video creation.
```javascript
const task = await zai.videos.generations.create({
prompt: "A dog running in a park",
// or from image
image: imageBase64
});
// Async status polling
const status = await zai.videos.generations.status(task.id);
const result = await zai.videos.generations.retrieve(task.id);
```
**Features**: Async task management, status polling, result retrieval
---
### Video Understand
**Command**: `video-understand`
Analyze video content.
```javascript
const analysis = await zai.video.analyze({
video: videoPath,
prompt: "Describe what happens in this video"
});
```
**Supports**: MP4, AVI, MOV
---
## Document Processing Skills
### PDF
**Command**: `pdf`
Comprehensive PDF toolkit.
**Capabilities**:
- Extract text and tables
- Create new PDFs
- Merge/split documents
- Handle forms
---
### DOCX
**Command**: `docx`
Word document processing.
**Capabilities**:
- Create and edit documents
- Tracked changes
- Comments
- Formatting preservation
- Text extraction
---
### XLSX
**Command**: `xlsx`
Spreadsheet processing.
**Capabilities**:
- Create with formulas and formatting
- Read and analyze data
- Modify while preserving formulas
- Data visualization
- Formula recalculation
**Supports**: .xlsx, .xlsm, .csv, .tsv
---
### PPTX
**Command**: `pptx`
Presentation processing.
**Capabilities**:
- Edit existing presentations
- Add slides
- Create new presentations
- Work with layouts
- Add comments/speaker notes
---
## Web & Data Skills
### Web Search
**Command**: `web-search`
Search for real-time information.
```typescript
const results = await zai.functions.invoke("web_search", {
query: "What is the capital of France?",
num: 10
});
// Result type
interface SearchFunctionResultItem {
url: string;
name: string;
snippet: string;
host_name: string;
rank: number;
date: string;
favicon: string;
}
```
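
Search results often contain several hits from the same site. As a rough sketch of post-processing, the helper below keeps one result per `host_name`. It assumes the call returns an array of `SearchFunctionResultItem` objects and that a lower `rank` means a better result; neither is stated in this reference. `topResultPerHost` is a hypothetical name.

```javascript
// Hypothetical helper: keep the best-ranked result per host, ordered by rank.
// Assumption: lower `rank` values are better.
function topResultPerHost(items) {
  const bestByHost = new Map();
  for (const item of items) {
    const current = bestByHost.get(item.host_name);
    if (current === undefined || item.rank < current.rank) {
      bestByHost.set(item.host_name, item);
    }
  }
  return [...bestByHost.values()].sort((a, b) => a.rank - b.rank);
}
```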
---
### Web Reader
**Command**: `web-reader`
Extract web page content.
```javascript
const content = await zai.web.read({
url: "https://example.com"
});
```
**Features**: Automatic content extraction, title, HTML, publication time
---
### CSV Data Summarizer
**Command**: `csv-data-summarizer`
Automatic CSV analysis.
**Features**:
- Detects data types (sales, customer, financial, operational, survey)
- Generates correlation heatmaps
- Time-series plots
- Distribution charts
- Missing data analysis
- Automatic date detection
**Built with**: pandas, matplotlib, seaborn
---
### Deep Research
**Command**: `deep-research`
Enterprise-grade research.
**Triggers**: "deep research", "comprehensive analysis", "research report", "compare X vs Y"
**Features**:
- Multi-source synthesis
- Citation tracking
- Verification
- 10+ sources
---
## Specialized Skills
### Podcast Generate
**Command**: `podcast-generate`
Create podcast episodes.
**Modes**:
1. From uploaded text/article → dual-host dialogue
2. From topic → web search + generation
**Features**:
- Duration scales with content (3-20 min, ~240 chars/min)
- Outputs: Markdown script + WAV audio
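
The duration rule above is simple arithmetic: characters divided by roughly 240 per minute, kept within 3 to 20 minutes. The sketch below assumes the bounds are applied by clamping, which this reference does not spell out; `estimatePodcastMinutes` is a hypothetical helper.

```javascript
const CHARS_PER_MINUTE = 240; // "~240 chars/min" from the feature list
const MIN_MINUTES = 3;
const MAX_MINUTES = 20;

// Hypothetical helper: estimate episode length from script length.
// Assumption: out-of-range values are clamped to the 3-20 minute window.
function estimatePodcastMinutes(scriptText) {
  // Spread into code points so CJK characters and emoji count once each
  const chars = [...scriptText].length;
  const raw = chars / CHARS_PER_MINUTE;
  return Math.min(MAX_MINUTES, Math.max(MIN_MINUTES, raw));
}
```

For example, a 2,400-character script works out to 2400 / 240 = 10 minutes, while a 100-character script is raised to the 3-minute floor.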
---
### Story Video Generation
**Command**: `story-video-generation`
Convert sentences to story videos.
**Triggers** (Chinese): "生成故事" (generate a story), "故事视频" (story video), "把一句话变成视频" (turn a sentence into a video)
**Process**: Sentence → Story → Scene Images → Video
---
### Frontend Design
**Command**: `frontend-design`
Transform UI requirements to production code.
**Features**:
- Design tokens
- Accessibility compliance
- Creative execution
**Use Cases**: Websites, web apps, React/Vue components, dashboards, landing pages
---
### Finance
**Command**: `finance`
Finance API integration.
**Capabilities**:
- Stock price queries
- Market data analysis
- Company financials
- Portfolio tracking
- Market news
- Stock screening
- Technical analysis
---
### Gift Evaluator
**Command**: `gift-evaluator`
Spring Festival gift analysis.
**Features**:
- Visual perception
- Market valuation
- HTML card generation
**Use Cases**: Gift photos, value inquiry, authenticity, social responses
---
## Subagents (Task Tool)
### Available Agent Types
| Agent | Description | Tools |
|-------|-------------|-------|
| `general-purpose` | Complex research, multi-step tasks | All |
| `statusline-setup` | Status line configuration | Read, Edit |
| `Explore` | Codebase exploration | All |
| `Plan` | Implementation planning | All |
| `frontend-styling-expert` | CSS, styling, responsive design | All |
| `full-stack-developer` | Next.js 15 + React + Prisma | All |
### Explore Agent Thoroughness
- `quick`: Basic searches
- `medium`: Moderate exploration
- `very thorough`: Comprehensive analysis
---
## Project Environment
### Standard Stack
- Next.js 15 with App Router
- Port: 3000
- Package manager: Bun
- Database access: Prisma (ORM)
- UI: shadcn/ui components
### Commands
```bash
bun run dev # Start dev server (auto-runs)
bun run lint # Check code quality
```
### File Output
All generated files go to:
```
/home/z/my-project/download/
```
### Backend-Only Rule
`z-ai-web-dev-sdk` MUST only be used in backend code (API routes, server components), never in client-side code.
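
One way to follow this rule is to keep every SDK call behind a server-side handler that the browser reaches over HTTP. The sketch below is an illustration, not the platform's prescribed layout: `makeChatHandler`, `jsonResponse`, and the `message` request field are hypothetical, and the completion is returned as-is because its response shape is not documented here. It relies on the standard `Response` global (Node 18+).

```javascript
// Hypothetical handler factory. `getClient` is any function returning a
// promise for an initialized client, e.g. () => ZAI.create().
function makeChatHandler(getClient) {
  return async function handle(request) {
    let body;
    try {
      body = await request.json();
    } catch {
      return jsonResponse({ error: 'Invalid JSON body' }, 400);
    }
    if (!body || typeof body.message !== 'string' || body.message.length === 0) {
      return jsonResponse({ error: '`message` must be a non-empty string' }, 400);
    }
    const zai = await getClient();
    const completion = await zai.chat.completions.create({
      messages: [{ role: 'user', content: body.message }],
    });
    // Returned unchanged: the completion's shape is not specified in this reference.
    return jsonResponse(completion, 200);
  };
}

// Small helper to build a JSON response with the right content type.
function jsonResponse(payload, status) {
  return new Response(JSON.stringify(payload), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}
```

In a Next.js App Router project, a route file such as `app/api/chat/route.js` (a hypothetical path) could expose this with `export const POST = makeChatHandler(() => ZAI.create());`, so the SDK never ships to the browser.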
---
## Design Patterns
### Async Task Pattern (Video Generation)
```javascript
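// 1. Create task
const task = await zai.videos.generations.create({ prompt });
// 2. Poll status (repeat until the task reports completion)
const taskStatus = await zai.videos.generations.status(task.id);
// 3. Retrieve result when complete
const result = await zai.videos.generations.retrieve(task.id);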
```
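
Step 2 normally runs in a loop with a delay and an upper bound on attempts. The following is a generic sketch of that loop; `pollUntilDone` and its options are hypothetical, and the caller supplies the completion checks because the values a status call returns are not documented in this reference.

```javascript
// Hypothetical polling helper. `checkStatus` is any async function returning
// the current status; `isDone` and `isFailed` decide when to stop.
async function pollUntilDone(
  checkStatus,
  { isDone, isFailed, intervalMs = 2000, maxAttempts = 60 }
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const status = await checkStatus();
    if (isFailed(status)) {
      throw new Error('Task reported failure');
    }
    if (isDone(status)) {
      return status;
    }
    if (attempt < maxAttempts) {
      // Wait before the next check so the API is not hammered
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Task did not finish after ${maxAttempts} attempts`);
}
```

With the video API shown earlier, `checkStatus` would be `() => zai.videos.generations.status(task.id)`, followed by `retrieve(task.id)` once the helper resolves.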
### Multi-Modal Input Pattern
```javascript
// Flexible input handling: build one of these shapes per request
// Text only
const textInput = { text: "description" };
// Image only
const imageInput = { image: base64OrUrl };
// Mixed
const mixedInput = { text: "modify this", image: base64OrUrl };
```
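
A JavaScript object literal keeps only one value per key, so text-only, image-only, and mixed inputs are three distinct objects. A small builder can produce whichever shape the caller's arguments imply. `buildInput` is a hypothetical helper; the `text` and `image` field names follow the pattern above and may differ per skill.

```javascript
// Hypothetical helper: build an input object containing only the fields
// that were actually provided, and reject an empty request.
function buildInput({ text, image } = {}) {
  const input = {};
  if (typeof text === 'string' && text.trim() !== '') {
    input.text = text;
  }
  if (typeof image === 'string' && image !== '') {
    input.image = image; // base64 string or URL
  }
  if (Object.keys(input).length === 0) {
    throw new TypeError('Provide `text`, `image`, or both');
  }
  return input;
}
```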
### Function Invocation Pattern
```javascript
const result = await zai.functions.invoke("function_name", {
param1: "value",
param2: 123
});
```
---
## When to Use This Reference
1. **Building AI-powered applications**: SDK patterns and examples
2. **Document processing**: PDF/DOCX/XLSX/PPTX capabilities
3. **Multimodal features**: Image, video, audio processing
4. **Web integration**: Search and scraping patterns
5. **Agent design**: Subagent patterns and capabilities
## Source
- Platform: Super Z (z.ai)
- SDK: z-ai-web-dev-sdk
- Framework: Next.js 15 + React + Prisma
- UI: shadcn/ui