feat: Add GLM Tools, Skills & Agents collection

2026-02-13 12:52:16 +04:00
commit 15f60f6c86
83 changed files with 12638 additions and 0 deletions
--- a/skills/glm-skills/SKILL.md
+++ b/skills/glm-skills/SKILL.md
@@ -0,0 +1,527 @@
+---
+name: glm-skills
+description: "Reference for Super Z/GLM platform skills and SDK. AUTO-TRIGGERS when: speech-to-text, ASR, TTS, text-to-speech, image generation, video generation, VLM, vision model, PDF processing, DOCX, XLSX, PPTX, web search, web scraping, podcast generation, multimodal AI, z-ai-web-dev-sdk."
+priority: 100
+autoTrigger: true
+triggers:
+  - "speech to text"
+  - "ASR"
+  - "transcribe"
+  - "text to speech"
+  - "TTS"
+  - "voice"
+  - "image generation"
+  - "generate image"
+  - "video generation"
+  - "generate video"
+  - "VLM"
+  - "vision language"
+  - "analyze image"
+  - "PDF"
+  - "DOCX"
+  - "XLSX"
+  - "PPTX"
+  - "spreadsheet"
+  - "presentation"
+  - "web search"
+  - "web scraping"
+  - "podcast"
+  - "multimodal"
+  - "z-ai-web-dev-sdk"
+  - "Super Z"
+  - "GLM"
+---
+
+# Super Z / GLM Skills & Agents Reference
+
+Complete reference for the Super Z (z.ai) platform's skills system, agents, and development patterns.
+
+---
+
+## SDK: z-ai-web-dev-sdk
+
+All skills use the `z-ai-web-dev-sdk` JavaScript/TypeScript SDK.
+
+### Installation
+```bash
+npm install z-ai-web-dev-sdk
+# or
+bun add z-ai-web-dev-sdk
+```
+
+### Initialization
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+const zai = await ZAI.create();
+```
+
+---
+
+## Multimodal AI Skills
+
+### ASR (Automatic Speech Recognition)
+**Command**: `ASR`
+
+Speech-to-text using z-ai-web-dev-sdk.
+
+```javascript
+// Supports base64 encoded audio
+const transcription = await zai.asr.transcribe({
+  audio: audioBase64
+});
+```
+
+**Use Cases**: Transcription, voice input, audio processing
+
+---
+
+### TTS (Text-to-Speech)
+**Command**: `TTS`
+
+Convert text to natural-sounding speech.
+
+```javascript
+const audio = await zai.tts.synthesize({
+  text: "Hello world",
+  voice: "default",
+  speed: 1.0
+});
+```
+
+**Features**: Multiple voices, adjustable speed, various audio formats
+
+---
+
+### LLM (Large Language Model)
+**Command**: `LLM`
+
+Chat completions with context management.
+
+```javascript
+const completion = await zai.chat.completions.create({
+  messages: [
+    { role: 'system', content: 'You are a helpful assistant.' },
+    { role: 'user', content: 'Hello!' }
+  ],
+  temperature: 0.7
+});
+```
+
+**Features**: Multi-turn conversations, system prompts, context management
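+
+Multi-turn use means resending the conversation on every call. A minimal sketch of one way to manage that context, assuming the caller keeps a plain `messages` array in the shape shown above; `appendTurn` and `maxTurns` are hypothetical names, not part of the SDK:
+
+```javascript
+// Hypothetical helper: append a message and cap the history length.
+// System messages are always kept; only the most recent `maxTurns`
+// non-system messages survive.
+function appendTurn(messages, role, content, maxTurns = 20) {
+  const system = messages.filter((m) => m.role === 'system');
+  const rest = messages.filter((m) => m.role !== 'system');
+  rest.push({ role, content });
+  return [...system, ...rest.slice(-maxTurns)];
+}
+
+let history = [{ role: 'system', content: 'You are a helpful assistant.' }];
+history = appendTurn(history, 'user', 'Hello!');
+history = appendTurn(history, 'assistant', 'Hi, how can I help?');
+// `history` can now be passed as `messages` to the call shown above.
+```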
+
+---
+
+### VLM (Vision Language Model)
+**Command**: `VLM`
+
+Image understanding with conversational AI.
+
+```javascript
+const response = await zai.vlm.analyze({
+  image: imageUrlOrBase64,
+  prompt: "Describe this image"
+});
+```
+
+**Supports**: Image URLs, base64 encoded images, multimodal interactions
+
+---
+
+### Image Generation
+**Command**: `image-generation`
+
+AI image creation from text.
+
+```javascript
+const response = await zai.images.generations.create({
+  prompt: 'A cute cat playing in the garden',
+  size: '1024x1024'
+});
+
+const imageBase64 = response.data[0].base64;
+```
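+
+The base64 payload usually needs decoding before it can be written to disk. A small sketch using Node's built-in `Buffer`; `base64ToBytes` is a hypothetical helper, and the data-URL handling is an assumption in case a `data:image/...;base64,` prefix is present:
+
+```javascript
+// Hypothetical helper: turn a base64 string (optionally a data URL) into bytes.
+function base64ToBytes(value) {
+  const comma = value.indexOf(',');
+  // Strip a "data:<mime>;base64," prefix if there is one.
+  const payload = value.startsWith('data:') && comma !== -1
+    ? value.slice(comma + 1)
+    : value;
+  return Buffer.from(payload, 'base64');
+}
+
+const bytes = base64ToBytes('data:image/png;base64,aGVsbG8=');
+console.log(bytes.length); // 5
+```
+
+The resulting buffer can then be written with `fs.writeFile` to wherever generated files are stored.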
+
+**CLI Tool**:
+```bash
+z-ai-generate --prompt "A beautiful landscape" --output "./image.png"
+z-ai-generate -p "A cute cat" -o "./cat.png" -s 1024x1024
+```
+
+**Supported Sizes**: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440
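+
+Because only these sizes are listed, a requested aspect ratio has to be mapped onto one of them. A sketch of one way to do that; `SUPPORTED_SIZES` copies the list above and `pickSize` is a hypothetical helper, not an SDK function:
+
+```javascript
+const SUPPORTED_SIZES = [
+  '1024x1024', '768x1344', '864x1152', '1344x768',
+  '1152x864', '1440x720', '720x1440'
+];
+
+// Return the supported size whose aspect ratio is closest to width:height.
+// Ratios are compared on a log scale so 2:1 and 1:2 are treated symmetrically.
+function pickSize(width, height) {
+  const target = Math.log(width / height);
+  let best = SUPPORTED_SIZES[0];
+  let bestDistance = Infinity;
+  for (const size of SUPPORTED_SIZES) {
+    const [w, h] = size.split('x').map(Number);
+    const distance = Math.abs(Math.log(w / h) - target);
+    if (distance < bestDistance) {
+      best = size;
+      bestDistance = distance;
+    }
+  }
+  return best;
+}
+
+console.log(pickSize(1920, 1080)); // 1344x768
+console.log(pickSize(1080, 1920)); // 768x1344
+```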
+
+---
+
+### Image Edit
+**Command**: `image-edit`
+
+Modify existing images with AI.
+
+```javascript
+const edited = await zai.images.edits.create({
+  image: originalImageBase64,
+  prompt: "Add a sunset background"
+});
+```
+
+**Use Cases**: Variations, redesign, text-based transformation
+
+---
+
+### Image Understand
+**Command**: `image-understand`
+
+Analyze and understand images.
+
+```javascript
+const analysis = await zai.image.understand({
+  image: imagePath,
+  task: "extract_text" // or "detect_objects", "classify"
+});
+```
+
+**Supports**: PNG, JPEG, GIF, WebP, BMP
+
+---
+
+### Video Generation
+**Command**: `video-generation`
+
+AI-powered video creation.
+
+```javascript
+const task = await zai.videos.generations.create({
+  prompt: "A dog running in a park",
+  // or from image
+  image: imageBase64
+});
+
+// The task runs asynchronously: poll the status until it finishes, then retrieve the result
+const status = await zai.videos.generations.status(task.id);
+const result = await zai.videos.generations.retrieve(task.id);
+```
+
+**Features**: Async task management, status polling, result retrieval
+
+---
+
+### Video Understand
+**Command**: `video-understand`
+
+Analyze video content.
+
+```javascript
+const analysis = await zai.video.analyze({
+  video: videoPath,
+  prompt: "Describe what happens in this video"
+});
+```
+
+**Supports**: MP4, AVI, MOV
+
+---
+
+## Document Processing Skills
+
+### PDF
+**Command**: `pdf`
+
+Comprehensive PDF toolkit.
+
+**Capabilities**:
+- Extract text and tables
+- Create new PDFs
+- Merge/split documents
+- Handle forms
+
+---
+
+### DOCX
+**Command**: `docx`
+
+Word document processing.
+
+**Capabilities**:
+- Create and edit documents
+- Tracked changes
+- Comments
+- Formatting preservation
+- Text extraction
+
+---
+
+### XLSX
+**Command**: `xlsx`
+
+Spreadsheet processing.
+
+**Capabilities**:
+- Create with formulas and formatting
+- Read and analyze data
+- Modify while preserving formulas
+- Data visualization
+- Formula recalculation
+
+**Supports**: .xlsx, .xlsm, .csv, .tsv
+
+---
+
+### PPTX
+**Command**: `pptx`
+
+Presentation processing.
+
+**Capabilities**:
+- Edit existing presentations
+- Add slides
+- Create new presentations
+- Work with layouts
+- Add comments/speaker notes
+
+---
+
+## Web & Data Skills
+
+### Web Search
+**Command**: `web-search`
+
+Search for real-time information.
+
+```typescript
+const results = await zai.functions.invoke("web_search", {
+  query: "What is the capital of France?",
+  num: 10
+});
+
+// Result type
+interface SearchFunctionResultItem {
+  url: string;
+  name: string;
+  snippet: string;
+  host_name: string;
+  rank: number;
+  date: string;
+  favicon: string;
+}
+```
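+
+The result items can be post-processed with ordinary array code. A minimal sketch, assuming `results` is an array of the items typed above and that a lower `rank` means a better result; `topResultPerHost` is a hypothetical helper:
+
+```javascript
+// Keep the best-ranked result for each host, ordered by rank.
+// Assumption: a lower `rank` value means a better result.
+function topResultPerHost(results) {
+  const byHost = new Map();
+  for (const item of results) {
+    const current = byHost.get(item.host_name);
+    if (!current || item.rank < current.rank) {
+      byHost.set(item.host_name, item);
+    }
+  }
+  return [...byHost.values()].sort((a, b) => a.rank - b.rank);
+}
+```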
+
+---
+
+### Web Reader
+**Command**: `web-reader`
+
+Extract web page content.
+
+```javascript
+const content = await zai.web.read({
+  url: "https://example.com"
+});
+```
+
+**Features**: Automatic content extraction, title, HTML, publication time
+
+---
+
+### CSV Data Summarizer
+**Command**: `csv-data-summarizer`
+
+Automatic CSV analysis.
+
+**Features**:
+- Detects data types (sales, customer, financial, operational, survey)
+- Generates correlation heatmaps
+- Time-series plots
+- Distribution charts
+- Missing data analysis
+- Automatic date detection
+
+**Built with**: pandas, matplotlib, seaborn
+
+---
+
+### Deep Research
+**Command**: `deep-research`
+
+Enterprise-grade research.
+
+**Triggers**: "deep research", "comprehensive analysis", "research report", "compare X vs Y"
+
+**Features**:
+- Multi-source synthesis
+- Citation tracking
+- Verification
+- 10+ sources
+
+---
+
+## Specialized Skills
+
+### Podcast Generate
+**Command**: `podcast-generate`
+
+Create podcast episodes.
+
+**Modes**:
+1. From uploaded text/article → dual-host dialogue
+2. From topic → web search + generation
+
+**Features**:
+- Duration scales with content (3-20 min, ~240 chars/min)
+- Outputs: Markdown script + WAV audio
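+
+The duration rule above amounts to simple arithmetic. A sketch, assuming the 3 to 20 minute range acts as a clamp on script length divided by 240; `estimatePodcastMinutes` is a hypothetical name:
+
+```javascript
+const CHARS_PER_MINUTE = 240; // approximate rate quoted above
+const MIN_MINUTES = 3;
+const MAX_MINUTES = 20;
+
+// Estimate episode length in minutes for a script of `charCount` characters.
+function estimatePodcastMinutes(charCount) {
+  const raw = charCount / CHARS_PER_MINUTE;
+  return Math.min(MAX_MINUTES, Math.max(MIN_MINUTES, raw));
+}
+
+console.log(estimatePodcastMinutes(2400)); // 10
+console.log(estimatePodcastMinutes(300));  // 3
+console.log(estimatePodcastMinutes(9600)); // 20
+```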
+
+---
+
+### Story Video Generation
+**Command**: `story-video-generation`
+
+Turn a single sentence into a story video.
+
+**Triggers** (Chinese): "生成故事" ("generate a story"), "故事视频" ("story video"), "把一句话变成视频" ("turn a sentence into a video")
+
+**Process**: Sentence → Story → Scene Images → Video
+
+---
+
+### Frontend Design
+**Command**: `frontend-design`
+
+Transform UI requirements into production code.
+
+**Features**:
+- Design tokens
+- Accessibility compliance
+- Creative execution
+
+**Use Cases**: Websites, web apps, React/Vue components, dashboards, landing pages
+
+---
+
+### Finance
+**Command**: `finance`
+
+Finance API integration.
+
+**Capabilities**:
+- Stock price queries
+- Market data analysis
+- Company financials
+- Portfolio tracking
+- Market news
+- Stock screening
+- Technical analysis
+
+---
+
+### Gift Evaluator
+**Command**: `gift-evaluator`
+
+Spring Festival gift analysis.
+
+**Features**:
+- Visual perception
+- Market valuation
+- HTML card generation
+
+**Use Cases**: Assessing gift photos, estimating value, checking authenticity, drafting social replies
+
+---
+
+## Subagents (Task Tool)
+
+### Available Agent Types
+
+| Agent | Description | Tools |
+|-------|-------------|-------|
+| `general-purpose` | Complex research, multi-step tasks | All |
+| `statusline-setup` | Status line configuration | Read, Edit |
+| `Explore` | Codebase exploration | All |
+| `Plan` | Implementation planning | All |
+| `frontend-styling-expert` | CSS, styling, responsive design | All |
+| `full-stack-developer` | Next.js 15 + React + Prisma | All |
+
+### Explore Agent Thoroughness
+- `quick`: Basic searches
+- `medium`: Moderate exploration
+- `very thorough`: Comprehensive analysis
+
+---
+
+## Project Environment
+
+### Standard Stack
+- Next.js 15 with App Router
+- Port: 3000
+- Package manager: Bun
+- Database: Prisma
+- UI: shadcn/ui components
+
+### Commands
+```bash
+bun run dev    # Start dev server (runs automatically)
+bun run lint   # Check code quality
+```
+
+### File Output
+All generated files go to:
+```
+/home/z/my-project/download/
+```
+
+### Backend-Only Rule
+`z-ai-web-dev-sdk` MUST only be used in backend code (API routes, server components), never in client-side code.
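+
+A minimal sketch of what keeping the SDK on the server could look like in a Next.js App Router route handler. It uses only the standard `Request` and `Response` objects (Node 18+); `makeChatHandler` and `getClient` are hypothetical names, and the client call mirrors the LLM example earlier in this reference:
+
+```javascript
+// Build a POST handler around an injected client factory, so the SDK
+// is only ever touched in server-side code.
+function makeChatHandler(getClient) {
+  return async function POST(request) {
+    const { message } = await request.json();
+    if (typeof message !== 'string' || message.length === 0) {
+      return new Response(JSON.stringify({ error: 'message is required' }), {
+        status: 400,
+        headers: { 'Content-Type': 'application/json' }
+      });
+    }
+    const zai = await getClient();
+    const completion = await zai.chat.completions.create({
+      messages: [{ role: 'user', content: message }]
+    });
+    return new Response(JSON.stringify(completion), {
+      status: 200,
+      headers: { 'Content-Type': 'application/json' }
+    });
+  };
+}
+```
+
+In a route file such as `app/api/chat/route.js` (the path is illustrative), the handler would be exported as `POST`, with `getClient` wrapping `ZAI.create()`.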
+
+---
+
+## Design Patterns
+
+### Async Task Pattern (Video Generation)
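+```javascript
+// 1. Create task
+const task = await zai.videos.generations.create({ prompt });
+
+// 2. Poll status (repeat until the task reports completion)
+const taskStatus = await zai.videos.generations.status(task.id);
+
+// 3. Retrieve result when complete
+const result = await zai.videos.generations.retrieve(task.id);
+```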
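+
+Step 2 normally has to repeat until the task finishes. A generic sketch of that loop; `pollUntil`, `isDone`, `intervalMs`, and `timeoutMs` are hypothetical names, and the shape of the status object is left to the caller because it is not specified here:
+
+```javascript
+const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
+
+// Call `getStatus()` repeatedly until `isDone(status)` is true,
+// waiting `intervalMs` between attempts and giving up after `timeoutMs`.
+async function pollUntil(getStatus, isDone, { intervalMs = 2000, timeoutMs = 300000 } = {}) {
+  const deadline = Date.now() + timeoutMs;
+  for (;;) {
+    const status = await getStatus();
+    if (isDone(status)) return status;
+    if (Date.now() + intervalMs > deadline) {
+      throw new Error('Timed out waiting for task to complete');
+    }
+    await sleep(intervalMs);
+  }
+}
+```
+
+With the video example earlier in this reference, `getStatus` would wrap `zai.videos.generations.status(task.id)`, and `isDone` would check whichever field the service uses to signal completion.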
+
+### Multi-Modal Input Pattern
+```javascript
+// Flexible input handling: pass text, an image, or both
+
+// Text only
+const textInput = { text: "description" };
+
+// Image only
+const imageInput = { image: base64OrUrl };
+
+// Mixed
+const mixedInput = {
+  text: "modify this",
+  image: base64OrUrl
+};
+```
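+
+A sketch of a helper that builds such an input from whichever parts are present; `buildInput` is a hypothetical name, and the `text`/`image` field names follow the pattern above:
+
+```javascript
+// Build an input object containing only the fields that were provided.
+// Throws if neither text nor image is given.
+function buildInput({ text, image } = {}) {
+  if (text == null && image == null) {
+    throw new Error('Provide text, image, or both');
+  }
+  const input = {};
+  if (text != null) input.text = text;
+  if (image != null) input.image = image;
+  return input;
+}
+
+console.log(buildInput({ text: 'description' })); // { text: 'description' }
+console.log(buildInput({ text: 'modify this', image: 'https://example.com/cat.png' }));
+```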
+
+### Function Invocation Pattern
+```javascript
+const result = await zai.functions.invoke("function_name", {
+  param1: "value",
+  param2: 123
+});
+```
+
+---
+
+## When to Use This Reference
+
+1. **Building AI-powered applications**: SDK patterns and examples
+2. **Document processing**: PDF/DOCX/XLSX/PPTX capabilities
+3. **Multimodal features**: Image, video, audio processing
+4. **Web integration**: Search and scraping patterns
+5. **Agent design**: Subagent patterns and capabilities
+
+## Source
+- Platform: Super Z (z.ai)
+- SDK: z-ai-web-dev-sdk
+- Framework: Next.js 15 + React + Prisma
+- UI: shadcn/ui