---
name: glm-skills
description: "Reference for Super Z/GLM platform skills and SDK. AUTO-TRIGGERS when: speech-to-text, ASR, TTS, text-to-speech, image generation, video generation, VLM, vision model, PDF processing, DOCX, XLSX, PPTX, web search, web scraping, podcast generation, multimodal AI, z-ai-web-dev-sdk."
priority: 100
autoTrigger: true
triggers:
  - "speech to text"
  - "ASR"
  - "transcribe"
  - "text to speech"
  - "TTS"
  - "voice"
  - "image generation"
  - "generate image"
  - "video generation"
  - "generate video"
  - "VLM"
  - "vision language"
  - "analyze image"
  - "PDF"
  - "DOCX"
  - "XLSX"
  - "PPTX"
  - "spreadsheet"
  - "presentation"
  - "web search"
  - "web scraping"
  - "podcast"
  - "multimodal"
  - "z-ai-web-dev-sdk"
  - "Super Z"
  - "GLM"
---

# Super Z / GLM Skills & Agents Reference

Complete reference for the Super Z (z.ai) platform's skills system, agents, and development patterns.

---

## SDK: z-ai-web-dev-sdk

All skills use the `z-ai-web-dev-sdk` JavaScript/TypeScript SDK.

### Installation

```bash
npm install z-ai-web-dev-sdk
# or
bun add z-ai-web-dev-sdk
```

### Initialization

```javascript
import ZAI from 'z-ai-web-dev-sdk';

const zai = await ZAI.create();
```

---

## Multimodal AI Skills

### ASR (Automatic Speech Recognition)

**Command**: `ASR`

Speech-to-text using z-ai-web-dev-sdk.

```javascript
// Supports base64 encoded audio
const transcription = await zai.asr.transcribe({
  audio: audioBase64
});
```

**Use Cases**: Transcription, voice input, audio processing

---

### TTS (Text-to-Speech)

**Command**: `TTS`

Convert text to natural-sounding speech.

```javascript
const audio = await zai.tts.synthesize({
  text: "Hello world",
  voice: "default",
  speed: 1.0
});
```

**Features**: Multiple voices, adjustable speed, various audio formats

---

### LLM (Large Language Model)

**Command**: `LLM`

Chat completions with context management.

```javascript
const completion = await zai.chat.completions.create({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ],
  temperature: 0.7
});
```

**Features**: Multi-turn conversations, system prompts, context management

---

### VLM (Vision Language Model)

**Command**: `VLM`

Image understanding with conversational AI.

```javascript
const response = await zai.vlm.analyze({
  image: imageUrlOrBase64,
  prompt: "Describe this image"
});
```

**Supports**: Image URLs, base64 encoded images, multimodal interactions

---

### Image Generation

**Command**: `image-generation`

AI image creation from text.

```javascript
const response = await zai.images.generations.create({
  prompt: 'A cute cat playing in the garden',
  size: '1024x1024'
});

const imageBase64 = response.data[0].base64;
```

**CLI Tool**:

```bash
z-ai-generate --prompt "A beautiful landscape" --output "./image.png"
z-ai-generate -p "A cute cat" -o "./cat.png" -s 1024x1024
```

**Supported Sizes**: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440

---

### Image Edit

**Command**: `image-edit`

Modify existing images with AI.

```javascript
const edited = await zai.images.edits.create({
  image: originalImageBase64,
  prompt: "Add a sunset background"
});
```

**Use Cases**: Variations, redesign, text-based transformation

---

### Image Understand

**Command**: `image-understand`

Analyze and understand images.

```javascript
const analysis = await zai.image.understand({
  image: imagePath,
  task: "extract_text" // or "detect_objects", "classify"
});
```

**Supports**: PNG, JPEG, GIF, WebP, BMP

---

### Video Generation

**Command**: `video-generation`

AI-powered video creation.

```javascript
const task = await zai.videos.generations.create({
  prompt: "A dog running in a park",
  // or from image
  image: imageBase64
});

// Async status polling
const status = await zai.videos.generations.status(task.id);
const result = await zai.videos.generations.retrieve(task.id);
```

**Features**: Async task management, status polling, result retrieval

---

### Video Understand

**Command**: `video-understand`

Analyze video content.

```javascript
const analysis = await zai.video.analyze({
  video: videoPath,
  prompt: "Describe what happens in this video"
});
```

**Supports**: MP4, AVI, MOV

---

## Document Processing Skills

### PDF

**Command**: `pdf`

Comprehensive PDF toolkit.

**Capabilities**:
- Extract text and tables
- Create new PDFs
- Merge/split documents
- Handle forms

---

### DOCX

**Command**: `docx`

Word document processing.

**Capabilities**:
- Create and edit documents
- Tracked changes
- Comments
- Formatting preservation
- Text extraction

---

### XLSX

**Command**: `xlsx`

Spreadsheet processing.

**Capabilities**:
- Create with formulas and formatting
- Read and analyze data
- Modify while preserving formulas
- Data visualization
- Formula recalculation

**Supports**: .xlsx, .xlsm, .csv, .tsv

---

### PPTX

**Command**: `pptx`

Presentation processing.

**Capabilities**:
- Edit existing presentations
- Add slides
- Create new presentations
- Work with layouts
- Add comments/speaker notes

---

## Web & Data Skills

### Web Search

**Command**: `web-search`

Search for real-time information.

```typescript
const results = await zai.functions.invoke("web_search", {
  query: "What is the capital of France?",
  num: 10
});

// Result type (TypeScript shape of each returned item)
interface SearchFunctionResultItem {
  url: string;
  name: string;
  snippet: string;
  host_name: string;
  rank: number;
  date: string;
  favicon: string;
}
```

---

### Web Reader

**Command**: `web-reader`

Extract web page content.

```javascript
const content = await zai.web.read({
  url: "https://example.com"
});
```

**Features**: Automatic content extraction, title, HTML, publication time

---

### CSV Data Summarizer

**Command**: `csv-data-summarizer`

Automatic CSV analysis.

**Features**:
- Detects data types (sales, customer, financial, operational, survey)
- Generates correlation heatmaps
- Time-series plots
- Distribution charts
- Missing data analysis
- Automatic date detection

**Built with**: pandas, matplotlib, seaborn

---

### Deep Research

**Command**: `deep-research`

Enterprise-grade research.

**Triggers**: "deep research", "comprehensive analysis", "research report", "compare X vs Y"

**Features**:
- Multi-source synthesis
- Citation tracking
- Verification
- 10+ sources

---

## Specialized Skills

### Podcast Generate

**Command**: `podcast-generate`

Create podcast episodes.

**Modes**:
1. From uploaded text/article → dual-host dialogue
2. From topic → web search + generation

**Features**:
- Duration scales with content (3-20 min, ~240 chars/min)
- Outputs: Markdown script + WAV audio

---

### Story Video Generation

**Command**: `story-video-generation`

Convert sentences to story videos.

**Triggers** (Chinese): "生成故事" (generate a story), "故事视频" (story video), "把一句话变成视频" (turn a sentence into a video)

**Process**: Sentence → Story → Scene Images → Video

---

### Frontend Design

**Command**: `frontend-design`

Transform UI requirements into production code.

**Features**:
- Design tokens
- Accessibility compliance
- Creative execution

**Use Cases**: Websites, web apps, React/Vue components, dashboards, landing pages

---

### Finance

**Command**: `finance`

Finance API integration.

**Capabilities**:
- Stock price queries
- Market data analysis
- Company financials
- Portfolio tracking
- Market news
- Stock screening
- Technical analysis

---

### Gift Evaluator

**Command**: `gift-evaluator`

Spring Festival gift analysis.


**Features**:
- Visual perception
- Market valuation
- HTML card generation

**Use Cases**: Gift photos, value inquiry, authenticity, social responses

---

## Subagents (Task Tool)

### Available Agent Types

| Agent | Description | Tools |
|-------|-------------|-------|
| `general-purpose` | Complex research, multi-step tasks | All |
| `statusline-setup` | Status line configuration | Read, Edit |
| `Explore` | Codebase exploration | All |
| `Plan` | Implementation planning | All |
| `frontend-styling-expert` | CSS, styling, responsive design | All |
| `full-stack-developer` | Next.js 15 + React + Prisma | All |

### Explore Agent Thoroughness

- `quick`: Basic searches
- `medium`: Moderate exploration
- `very thorough`: Comprehensive analysis

---

## Project Environment

### Standard Stack

- Next.js 15 with App Router
- Port: 3000
- Package manager: Bun
- Database: Prisma
- UI: shadcn/ui components

### Commands

```bash
bun run dev   # Start dev server (auto-runs)
bun run lint  # Check code quality
```

### File Output

All generated files go to:

```
/home/z/my-project/download/
```

### Backend-Only Rule

`z-ai-web-dev-sdk` MUST be used in backend only (API routes, server components).

---

## Design Patterns

### Async Task Pattern (Video Generation)

```javascript
// 1. Create task
const task = await zai.videos.generations.create({ prompt });

// 2. Poll status (repeat until the task reports completion)
const status = await zai.videos.generations.status(task.id);

// 3. Retrieve result when complete
const result = await zai.videos.generations.retrieve(task.id);
```

### Multi-Modal Input Pattern

```javascript
// Flexible input handling: one object per input shape
const textOnly = { text: "description" };
const imageOnly = { image: base64OrUrl };
const mixed = { text: "modify this", image: base64OrUrl };
```

### Function Invocation Pattern

```javascript
const result = await zai.functions.invoke("function_name", {
  param1: "value",
  param2: 123
});
```

---

## When to Use This Reference

1. **Building AI-powered applications**: SDK patterns and examples
2. **Document processing**: PDF/DOCX/XLSX/PPTX capabilities
3. **Multimodal features**: Image, video, audio processing
4. **Web integration**: Search and scraping patterns
5. **Agent design**: Subagent patterns and capabilities

## Source

- Platform: Super Z (z.ai)
- SDK: z-ai-web-dev-sdk
- Framework: Next.js 15 + React + Prisma
- UI: shadcn/ui
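
---

The Async Task Pattern above lists a single status check, while a real task usually needs repeated checks until it finishes. The following is a minimal sketch of such a polling loop, not the SDK's own mechanism. The helper name `waitForTask`, the `status` field, the `'completed'` and `'failed'` values, and the default interval are all assumptions made for illustration; check the actual response shape before relying on them.

```javascript
// Hypothetical helper: resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls fetchStatus() repeatedly until the task is
// done, has failed, or maxAttempts is exhausted.
async function waitForTask(fetchStatus, options = {}) {
  const {
    intervalMs = 5000,
    maxAttempts = 60,
    // Assumed response shape; replace with the fields the API really returns.
    isDone = (s) => s.status === 'completed',
    isFailed = (s) => s.status === 'failed',
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isDone(status)) return status;
    if (isFailed(status)) {
      throw new Error(`Task failed after ${attempt} attempt(s)`);
    }
    // Wait before the next check, but not after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Task did not finish within ${maxAttempts} attempts`);
}

// Possible usage with the calls shown in the Video Generation section:
//   const task = await zai.videos.generations.create({ prompt });
//   await waitForTask(() => zai.videos.generations.status(task.id));
//   const result = await zai.videos.generations.retrieve(task.id);
```

Passing the status check in as a function keeps the loop independent of any particular endpoint, so the same helper could wrap other long-running tasks if their status responses differ.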