---
name: glm-skills
description: "Reference for Super Z/GLM platform skills and SDK. AUTO-TRIGGERS when: speech-to-text, ASR, TTS, text-to-speech, image generation, video generation, VLM, vision model, PDF processing, DOCX, XLSX, PPTX, web search, web scraping, podcast generation, multimodal AI, z-ai-web-dev-sdk."
priority: 100
autoTrigger: true
triggers:
  - speech to text
  - ASR
  - transcribe
  - text to speech
  - TTS
  - voice
  - image generation
  - generate image
  - video generation
  - generate video
  - VLM
  - vision language
  - analyze image
  - PDF
  - DOCX
  - XLSX
  - PPTX
  - spreadsheet
  - presentation
  - web search
  - web scraping
  - podcast
  - multimodal
  - z-ai-web-dev-sdk
  - Super Z
  - GLM
---

Super Z / GLM Skills & Agents Reference

Complete reference for the Super Z (z.ai) platform's skills system, agents, and development patterns.


SDK: z-ai-web-dev-sdk

All skills use the z-ai-web-dev-sdk JavaScript/TypeScript SDK.

Installation

npm install z-ai-web-dev-sdk
# or
bun add z-ai-web-dev-sdk

Initialization

import ZAI from 'z-ai-web-dev-sdk';

const zai = await ZAI.create();
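`ZAI.create()` is asynchronous. A server that handles many requests may therefore want to create the client once and reuse it. The sketch below is a generic memoizer and is not part of the SDK. `lazyAsync` is a hypothetical name. Whether one `zai` instance can safely be shared across requests is an assumption you should verify.

```typescript
// Generic helper (not part of z-ai-web-dev-sdk): run an async factory at most once
// and hand every caller the same pending or resolved promise.
function lazyAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory().catch((err: unknown) => {
        // Forget a failed attempt so the next call can retry.
        cached = undefined;
        throw err;
      });
    }
    return cached;
  };
}

// Assumed usage: share one client across API route invocations.
// import ZAI from 'z-ai-web-dev-sdk';
// export const getZai = lazyAsync(() => ZAI.create());
```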

Multimodal AI Skills

ASR (Automatic Speech Recognition)

Command: ASR

Speech-to-text using z-ai-web-dev-sdk.

// Supports base64 encoded audio
const transcription = await zai.asr.transcribe({
  audio: audioBase64
});

Use Cases: Transcription, voice input, audio processing
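The ASR call takes base64-encoded audio. The Node.js sketch below shows one way to produce that string. The helper names are hypothetical. The source does not say whether the SDK accepts a `data:` URL, so the sketch assumes it wants bare base64 and strips any such prefix.

```typescript
import { readFileSync } from 'node:fs';

// Encode raw bytes as standard (RFC 4648) base64.
function bytesToBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString('base64');
}

// Read an audio file from disk and return its contents as base64.
function audioFileToBase64(path: string): string {
  return bytesToBase64(readFileSync(path));
}

// Remove a "data:<mime>;base64," prefix (e.g. from a browser upload);
// strings without the prefix pass through unchanged.
function stripDataUrlPrefix(value: string): string {
  const match = /^data:[^,]*;base64,/.exec(value);
  return match ? value.slice(match[0].length) : value;
}

// Assumed usage inside a backend route:
// const audioBase64 = audioFileToBase64('./recording.wav');
// const transcription = await zai.asr.transcribe({ audio: audioBase64 });
```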


TTS (Text-to-Speech)

Command: TTS

Convert text to natural-sounding speech.

const audio = await zai.tts.synthesize({
  text: "Hello world",
  voice: "default",
  speed: 1.0
});

Features: Multiple voices, adjustable speed, various audio formats


LLM (Large Language Model)

Command: LLM

Chat completions with context management.

const completion = await zai.chat.completions.create({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ],
  temperature: 0.7
});

Features: Multi-turn conversations, system prompts, context management
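This reference does not spell out how "context management" works. One common approach is to keep the system prompt plus only the most recent messages, then pass the result as `messages` to `zai.chat.completions.create`. The sketch below takes that approach. It is an assumption, not the SDK's documented behaviour. `Conversation` and `maxMessages` are hypothetical names.

```typescript
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Keeps a fixed system prompt plus a sliding window of recent messages.
class Conversation {
  private history: ChatMessage[] = [];
  private readonly systemPrompt: string;
  private readonly maxMessages: number;

  constructor(systemPrompt: string, maxMessages = 20) {
    this.systemPrompt = systemPrompt;
    this.maxMessages = maxMessages; // non-system messages to keep
  }

  add(role: Exclude<Role, 'system'>, content: string): void {
    this.history.push({ role, content });
    // Drop the oldest messages once the window is full.
    if (this.history.length > this.maxMessages) {
      this.history = this.history.slice(-this.maxMessages);
    }
  }

  // Messages in the shape used by zai.chat.completions.create above.
  messages(): ChatMessage[] {
    return [{ role: 'system', content: this.systemPrompt }, ...this.history];
  }
}
```

The window counts messages, not tokens. A token budget would track model limits more closely, but message counting keeps the sketch simple.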


VLM (Vision Language Model)

Command: VLM

Image understanding with conversational AI.

const response = await zai.vlm.analyze({
  image: imageUrlOrBase64,
  prompt: "Describe this image"
});

Supports: Image URLs, base64 encoded images, multimodal interactions


Image Generation

Command: image-generation

AI image creation from text.

const response = await zai.images.generations.create({
  prompt: 'A cute cat playing in the garden',
  size: '1024x1024'
});

const imageBase64 = response.data[0].base64;
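The generation call returns the image as a base64 string (`response.data[0].base64`). Writing that string to disk is plain Node.js, as sketched below. `saveBase64Image` is a hypothetical helper. The output path in the usage comment follows the File Output convention in this reference.

```typescript
import { mkdirSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

// Decode a base64 image payload and write it to `outputPath`,
// creating parent directories as needed. Returns the number of bytes written.
function saveBase64Image(imageBase64: string, outputPath: string): number {
  const bytes = Buffer.from(imageBase64, 'base64');
  mkdirSync(dirname(outputPath), { recursive: true });
  writeFileSync(outputPath, bytes);
  return bytes.length;
}

// Assumed usage, continuing the example above:
// saveBase64Image(imageBase64, '/home/z/my-project/download/cat.png');
```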

CLI Tool:

z-ai-generate --prompt "A beautiful landscape" --output "./image.png"
z-ai-generate -p "A cute cat" -o "./cat.png" -s 1024x1024

Supported Sizes: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440
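Only the sizes listed above are accepted. A caller may still ask for an arbitrary width and height. One option is to pick the supported size whose aspect ratio is closest, as in the hypothetical helper below (this is not an SDK feature).

```typescript
// Sizes listed in this reference for image generation.
const SUPPORTED_SIZES = [
  '1024x1024', '768x1344', '864x1152', '1344x768',
  '1152x864', '1440x720', '720x1440',
] as const;

type SupportedSize = (typeof SUPPORTED_SIZES)[number];

function isSupportedSize(size: string): size is SupportedSize {
  return (SUPPORTED_SIZES as readonly string[]).includes(size);
}

// Pick the supported size whose width/height ratio is nearest the requested one.
function closestSupportedSize(width: number, height: number): SupportedSize {
  if (width <= 0 || height <= 0) throw new Error('width and height must be positive');
  const target = width / height;
  let best: SupportedSize = SUPPORTED_SIZES[0];
  let bestDiff = Infinity;
  for (const size of SUPPORTED_SIZES) {
    const [w, h] = size.split('x').map(Number);
    const diff = Math.abs(w / h - target);
    if (diff < bestDiff) {
      best = size;
      bestDiff = diff;
    }
  }
  return best;
}
```

For example, a 1920x1080 request maps to `1344x768`, which can then be passed as `size` or to the CLI's `-s` flag.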


Image Edit

Command: image-edit

Modify existing images with AI.

const edited = await zai.images.edits.create({
  image: originalImageBase64,
  prompt: "Add a sunset background"
});

Use Cases: Variations, redesign, text-based transformation


Image Understand

Command: image-understand

Analyze and understand images.

const analysis = await zai.image.understand({
  image: imagePath,
  task: "extract_text" // or "detect_objects", "classify"
});

Supports: PNG, JPEG, GIF, WebP, BMP
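Before you send a file for image understanding, you can confirm it is one of the listed formats. The sketch below checks the well-known file signatures ("magic bytes") instead of trusting the extension. `detectImageFormat` is a hypothetical helper.

```typescript
type ImageFormat = 'png' | 'jpeg' | 'gif' | 'webp' | 'bmp';

// True if `bytes` contains `signature` starting at `offset`.
function startsWith(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((b, i) => bytes[offset + i] === b);
}

const ascii = (s: string): number[] => Array.from(s, (c) => c.charCodeAt(0));

// Identify PNG, JPEG, GIF, WebP or BMP from the file header; null if none match.
function detectImageFormat(bytes: Uint8Array): ImageFormat | null {
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png';
  if (startsWith(bytes, [0xff, 0xd8, 0xff])) return 'jpeg';
  if (startsWith(bytes, ascii('GIF87a')) || startsWith(bytes, ascii('GIF89a'))) return 'gif';
  // WebP is a RIFF container: "RIFF", 4-byte size, then "WEBP".
  if (startsWith(bytes, ascii('RIFF')) && startsWith(bytes, ascii('WEBP'), 8)) return 'webp';
  if (startsWith(bytes, ascii('BM'))) return 'bmp';
  return null;
}
```

A `Buffer` returned by `readFileSync` is a `Uint8Array`, so you can pass it directly.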


Video Generation

Command: video-generation

AI-powered video creation.

const task = await zai.videos.generations.create({
  prompt: "A dog running in a park",
  // or from image
  image: imageBase64
});

// Async status polling
const status = await zai.videos.generations.status(task.id);
const result = await zai.videos.generations.retrieve(task.id);

Features: Async task management, status polling, result retrieval


Video Understand

Command: video-understand

Analyze video content.

const analysis = await zai.video.analyze({
  video: videoPath,
  prompt: "Describe what happens in this video"
});

Supports: MP4, AVI, MOV


Document Processing Skills

PDF

Command: pdf

Comprehensive PDF toolkit.

Capabilities:

  • Extract text and tables
  • Create new PDFs
  • Merge/split documents
  • Handle forms

DOCX

Command: docx

Word document processing.

Capabilities:

  • Create and edit documents
  • Tracked changes
  • Comments
  • Formatting preservation
  • Text extraction

XLSX

Command: xlsx

Spreadsheet processing.

Capabilities:

  • Create with formulas and formatting
  • Read and analyze data
  • Modify while preserving formulas
  • Data visualization
  • Formula recalculation

Supports: .xlsx, .xlsm, .csv, .tsv


PPTX

Command: pptx

Presentation processing.

Capabilities:

  • Edit existing presentations
  • Add slides
  • Create new presentations
  • Work with layouts
  • Add comments/speaker notes
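The four document skills divide the work by file type. The dispatcher below maps a file name to one of the skill commands above, following the formats named in each section. `skillForFile` is a hypothetical name. CSV files could also go to csv-data-summarizer when automated analysis is the goal.

```typescript
type DocumentSkill = 'pdf' | 'docx' | 'xlsx' | 'pptx';

// Extension -> skill command, per the document-processing skills above.
const SKILL_BY_EXTENSION: Record<string, DocumentSkill> = {
  '.pdf': 'pdf',
  '.docx': 'docx',
  '.xlsx': 'xlsx',
  '.xlsm': 'xlsx',
  '.csv': 'xlsx',
  '.tsv': 'xlsx',
  '.pptx': 'pptx',
};

// Return the skill that handles `fileName`, or null if no document skill applies.
function skillForFile(fileName: string): DocumentSkill | null {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return null;
  const ext = fileName.slice(dot).toLowerCase();
  return SKILL_BY_EXTENSION[ext] ?? null;
}
```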

Web & Data Skills

Web Search

Command: web-search

Search for real-time information.

const results = await zai.functions.invoke("web_search", {
  query: "What is the capital of France?",
  num: 10
});

// Result type
interface SearchFunctionResultItem {
  url: string;
  name: string;
  snippet: string;
  host_name: string;
  rank: number;
  date: string;
  favicon: string;
}
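A common next step is to turn search hits into citations for an LLM prompt. The sketch below assumes two things the source does not show: the call returns an array of `SearchFunctionResultItem`, and a lower `rank` means a better result. It sorts by rank, keeps one hit per host, and numbers the lines. `toCitations` and `maxResults` are hypothetical.

```typescript
interface SearchFunctionResultItem {
  url: string;
  name: string;
  snippet: string;
  host_name: string;
  rank: number;
  date: string;
  favicon: string;
}

// Sort by rank, keep the best-ranked result per host, and format the first
// `maxResults` as numbered citation lines.
function toCitations(results: SearchFunctionResultItem[], maxResults = 5): string[] {
  const seenHosts = new Set<string>();
  const lines: string[] = [];
  for (const item of [...results].sort((a, b) => a.rank - b.rank)) {
    if (seenHosts.has(item.host_name)) continue;
    seenHosts.add(item.host_name);
    lines.push(`[${lines.length + 1}] ${item.name} (${item.url}): ${item.snippet}`);
    if (lines.length >= maxResults) break;
  }
  return lines;
}
```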

Web Reader

Command: web-reader

Extract web page content.

const content = await zai.web.read({
  url: "https://example.com"
});

Features: Automatic content extraction; returns the page title, HTML, and publication time


CSV Data Summarizer

Command: csv-data-summarizer

Automatic CSV analysis.

Features:

  • Detects the dataset type (sales, customer, financial, operational, survey)
  • Generates correlation heatmaps
  • Time-series plots
  • Distribution charts
  • Missing data analysis
  • Automatic date detection

Built with: pandas, matplotlib, seaborn


Deep Research

Command: deep-research

Enterprise-grade research.

Triggers: "deep research", "comprehensive analysis", "research report", "compare X vs Y"

Features:

  • Multi-source synthesis
  • Citation tracking
  • Verification
  • 10+ sources

Specialized Skills

Podcast Generate

Command: podcast-generate

Create podcast episodes.

Modes:

  1. From uploaded text/article → dual-host dialogue
  2. From topic → web search + generation

Features:

  • Duration scales with content (3-20 min, ~240 chars/min)
  • Outputs: Markdown script + WAV audio
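Read literally, a rate of about 240 characters per minute means a script of N characters runs roughly N / 240 minutes, within the 3-20 minute range. The helpers below turn that into numbers. Clamping to the range and the rounding are assumptions about how the skill applies it.

```typescript
const CHARS_PER_MINUTE = 240; // approximate speaking rate stated above
const MIN_MINUTES = 3;
const MAX_MINUTES = 20;

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

// Estimated episode length in minutes for a script of `scriptChars` characters.
function estimateMinutes(scriptChars: number): number {
  return clamp(scriptChars / CHARS_PER_MINUTE, MIN_MINUTES, MAX_MINUTES);
}

// Approximate script length (characters) needed for a target duration.
function targetScriptChars(minutes: number): number {
  return Math.round(clamp(minutes, MIN_MINUTES, MAX_MINUTES) * CHARS_PER_MINUTE);
}
```

At this rate a 10-minute episode needs about 2,400 characters of script, and the 20-minute cap corresponds to about 4,800.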

Story Video Generation

Command: story-video-generation

Convert sentences to story videos.

Triggers (Chinese): "生成故事" (generate a story), "故事视频" (story video), "把一句话变成视频" (turn a sentence into a video)

Process: Sentence → Story → Scene Images → Video


Frontend Design

Command: frontend-design

Transform UI requirements into production code.

Features:

  • Design tokens
  • Accessibility compliance
  • Creative execution

Use Cases: Websites, web apps, React/Vue components, dashboards, landing pages


Finance

Command: finance

Finance API integration.

Capabilities:

  • Stock price queries
  • Market data analysis
  • Company financials
  • Portfolio tracking
  • Market news
  • Stock screening
  • Technical analysis

Gift Evaluator

Command: gift-evaluator

Spring Festival (Chinese New Year) gift analysis.

Features:

  • Visual perception
  • Market valuation
  • HTML card generation

Use Cases: Gift photos, value inquiry, authenticity, social responses


Subagents (Task Tool)

Available Agent Types

| Agent | Description | Tools |
|---|---|---|
| general-purpose | Complex research, multi-step tasks | All |
| statusline-setup | Status line configuration | Read, Edit |
| Explore | Codebase exploration | All |
| Plan | Implementation planning | All |
| frontend-styling-expert | CSS, styling, responsive design | All |
| full-stack-developer | Next.js 15 + React + Prisma | All |

Explore Agent Thoroughness

  • quick: Basic searches
  • medium: Moderate exploration
  • very thorough: Comprehensive analysis

Project Environment

Standard Stack

  • Next.js 15 with App Router
  • Port: 3000
  • Package manager: Bun
  • Database ORM: Prisma
  • UI: shadcn/ui components

Commands

bun run dev    # Start dev server (the environment already runs it automatically)
bun run lint   # Check code quality

File Output

All generated files go to:

/home/z/my-project/download/

Backend-Only Rule

z-ai-web-dev-sdk MUST be used only in backend code (API routes, server components), never in client-side code.
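In the Next.js 15 App Router, a route handler keeps the SDK on the server. The sketch below separates the handler logic from the SDK so it can be tested with a stub. Several parts are assumptions: the `ChatClient` interface, the `{ message }` request body, and reading an OpenAI-style `choices[0].message.content` from the response. The commented wiring also assumes the real `zai` object matches that interface.

```typescript
// Minimal slice of the client this handler needs (assumed response shape).
interface ChatClient {
  chat: {
    completions: {
      create(params: {
        messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
      }): Promise<{ choices: { message: { content: string } }[] }>;
    };
  };
}

const json = (data: unknown, status = 200): Response =>
  new Response(JSON.stringify(data), { status, headers: { 'content-type': 'application/json' } });

// Server-side handler: parse the request, call the model, return JSON.
// Only the reply text reaches the browser; the SDK never does.
async function handleChat(request: Request, client: ChatClient): Promise<Response> {
  const body = (await request.json()) as { message?: unknown };
  if (typeof body.message !== 'string' || body.message.length === 0) {
    return json({ error: 'message is required' }, 400);
  }
  const completion = await client.chat.completions.create({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: body.message },
    ],
  });
  return json({ reply: completion.choices[0]?.message.content ?? '' });
}

// Assumed wiring in app/api/chat/route.ts:
// import ZAI from 'z-ai-web-dev-sdk';
// export async function POST(request: Request) {
//   const zai = await ZAI.create();
//   return handleChat(request, zai);
// }
```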


Design Patterns

Async Task Pattern (Video Generation)

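// 1. Create task
const task = await zai.videos.generations.create({ prompt });

// 2. Poll status (repeat until the task finishes)
const taskStatus = await zai.videos.generations.status(task.id);

// 3. Retrieve result when complete
const result = await zai.videos.generations.retrieve(task.id);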
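The steps above leave the polling loop implicit. The minimal sketch below polls at a fixed interval with an overall timeout. The `state` values are hypothetical because this reference does not document the fields that `status()` returns. Map them onto the real response before use. `pollUntilFinished` is likewise a hypothetical name.

```typescript
// Hypothetical status shape; adapt to what the SDK's status() call actually returns.
interface TaskStatus {
  state: 'pending' | 'processing' | 'succeeded' | 'failed';
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `getStatus` until the task reaches a terminal state or `timeoutMs` elapses.
async function pollUntilFinished(
  getStatus: () => Promise<TaskStatus>,
  intervalMs = 2000,
  timeoutMs = 5 * 60_000,
): Promise<TaskStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await getStatus();
    if (current.state === 'succeeded' || current.state === 'failed') return current;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Task still ${current.state} after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Assumed usage with the video API shown earlier:
// const final = await pollUntilFinished(() => zai.videos.generations.status(task.id));
// if (final.state === 'succeeded') {
//   const result = await zai.videos.generations.retrieve(task.id);
// }
```

A fixed interval keeps the sketch short. Exponential backoff would reduce load for long-running tasks.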

Multi-Modal Input Pattern

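// Flexible input handling: text only, image only, or both
// (separate objects, since one object literal cannot repeat a key)
const textOnly = { text: "description" };

const imageOnly = { image: base64OrUrl };

const mixed = {
  text: "modify this",
  image: base64OrUrl
};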
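An input may carry text, an image, or both, so consuming code usually needs to know which case it has. The classifier below is a sketch. `MultiModalInput`, `inputKind`, and the routing idea in the comment (text-only to the LLM skill, anything with an image to VLM) are illustrative assumptions, not SDK behaviour.

```typescript
// Hypothetical input shape: either field may be present, but not neither.
interface MultiModalInput {
  text?: string;
  image?: string; // base64 data or an image URL
}

type InputKind = 'text' | 'image' | 'mixed';

// Classify an input so callers can route it (e.g. text-only to LLM, with image to VLM).
function inputKind(input: MultiModalInput): InputKind {
  const hasText = typeof input.text === 'string' && input.text.trim() !== '';
  const hasImage = typeof input.image === 'string' && input.image !== '';
  if (hasText && hasImage) return 'mixed';
  if (hasText) return 'text';
  if (hasImage) return 'image';
  throw new Error('Input needs text, an image, or both');
}
```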

Function Invocation Pattern

const result = await zai.functions.invoke("function_name", {
  param1: "value",
  param2: 123
});

When to Use This Reference

  1. Building AI-powered applications: SDK patterns and examples
  2. Document processing: PDF/DOCX/XLSX/PPTX capabilities
  3. Multimodal features: Image, video, audio processing
  4. Web integration: Search and scraping patterns
  5. Agent design: Subagent patterns and capabilities

Source

  • Platform: Super Z (z.ai)
  • SDK: z-ai-web-dev-sdk
  • Framework: Next.js 15 + React + Prisma
  • UI: shadcn/ui