feat: Add complete Agentic Compaction & Pipeline System

- Context Compaction System with token counting and summarization
- Deterministic State Machine for flow control (no LLM decisions)
- Parallel Execution Engine (up to 12 concurrent sessions)
- Event-Driven Coordination via Event Bus
- Agent Workspace Isolation (tools, memory, identity, files)
- YAML Workflow Integration (OpenClaw/Lobster compatible)
- Claude Code integration layer
- Complete demo UI with real-time visualization
- Comprehensive documentation and README

Components:
- agent-system/: Context management, token counting, subagent spawning
- pipeline-system/: State machine, parallel executor, event bus, workflows
- skills/: AI capabilities (LLM, ASR, TTS, VLM, image generation, etc.)
- src/app/: Next.js demo application

Total: ~100KB of production-ready TypeScript code
2026-03-03 12:40:47 +00:00
parent 63a8b123c9
commit 2380d33861
152 changed files with 51569 additions and 817 deletions
--- a/skills/video-understand/LICENSE.txt
+++ b/skills/video-understand/LICENSE.txt
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 z-ai-web-dev-sdk Skills
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/skills/video-understand/SKILL.md
+++ b/skills/video-understand/SKILL.md
@@ -0,0 +1,916 @@
+---
+name: video-understand
+description: Implement specialized video understanding capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to analyze video content, understand motion and temporal sequences, extract information from video frames, describe video scenes, or perform video-based AI analysis. Optimized for MP4, AVI, MOV, and other common video formats.
+license: MIT
+---
+
+# Video Understanding Skill
+
+This skill provides specialized video understanding functionality using the z-ai-web-dev-sdk package, enabling AI models to analyze, describe, and extract information from video content including motion, temporal sequences, and scene changes.
+
+## Skills Path
+
+**Skill Location**: `{project_path}/skills/video-understand`
+
+This skill is located at the above path in your project.
+
+**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/video-understand.ts` for a working example.
+
+## Overview
+
+Video Understanding focuses specifically on video content analysis, providing capabilities for:
+- Video scene understanding and description
+- Action and motion detection
+- Temporal sequence analysis
+- Event detection in videos
+- Video content summarization
+- Scene change detection
+- People and object tracking across frames
+- Audio-visual content analysis (when applicable)
+
+**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.
+
+## Prerequisites
+
+The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below.
+
+## CLI Usage (For Simple Tasks)
+
+For quick video analysis tasks, you can use the z-ai CLI instead of writing code. This is ideal for simple video descriptions, testing, or automation.
+
+### Basic Video Analysis
+
+```bash
+# Analyze a video from URL
+z-ai vision --prompt "Summarize what happens in this video" --image "https://example.com/video.mp4"
+
+# Note: Use --image flag for video URLs as well
+z-ai vision -p "Describe the key events" -i "https://example.com/presentation.mp4"
+```
+
+### Analyze Local Videos
+
+```bash
+# Analyze a local video file
+z-ai vision -p "What activities are shown in this video?" -i "./recording.mp4"
+
+# Save response to file
+z-ai vision -p "Provide a detailed summary" -i "./meeting.mp4" -o summary.json
+```
+
+### Advanced Video Analysis
+
+```bash
+# Complex scene understanding with thinking
+z-ai vision \
+  -p "Analyze this video and identify: 1) Main events, 2) People and their actions, 3) Timeline of key moments" \
+  -i "./event.mp4" \
+  --thinking \
+  -o analysis.json
+
+# Action detection
+z-ai vision \
+  -p "Identify all actions performed by people in this video" \
+  -i "./sports.mp4" \
+  --thinking
+```
+
+### Streaming Output
+
+```bash
+# Stream the video analysis
+z-ai vision -p "Describe this video content" -i "./video.mp4" --stream
+```
+
+### CLI Parameters
+
+- `--prompt, -p <text>`: **Required** - Question or instruction about the video
+- `--image, -i <URL or path>`: Optional - Video URL or local file path (despite the name, it works for videos too)
+- `--thinking, -t`: Optional - Enable chain-of-thought reasoning for complex analysis (default: disabled)
+- `--output, -o <path>`: Optional - Output file path (JSON format)
+- `--stream`: Optional - Stream the response in real-time
+
+### Supported Video Formats
+
+- MP4 (.mp4) - Most widely supported format
+- AVI (.avi) - Audio Video Interleave
+- MOV (.mov) - QuickTime format
+- WebM (.webm) - Web-optimized format
+- MKV (.mkv) - Matroska format
+- FLV (.flv) - Flash Video format
+
+### When to Use CLI vs SDK
+
+**Use CLI for:**
+- Quick video summaries
+- One-off video analysis
+- Testing video understanding capabilities
+- Simple automation scripts
+- Generating video descriptions
+
+**Use SDK for:**
+- Multi-turn conversations about videos
+- Complex video processing pipelines
+- Production applications with error handling
+- Custom integration with video processing logic
+- Batch video processing with custom workflows
+
+## Recommended Approach
+
+For better performance and reliability with local videos, consider:
+1. Uploading videos to a CDN and passing URLs instead of local paths
+2. For shorter videos, converting key frames to images for faster analysis
+3. For long videos, chunking or sampling at intervals
+
+## Basic Video Understanding Implementation
+
+### Single Video Analysis
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function analyzeVideo(videoUrl, prompt) {
+  const zai = await ZAI.create();
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          {
+            type: 'text',
+            text: prompt
+          },
+          {
+            type: 'video_url',
+            video_url: {
+              url: videoUrl
+            }
+          }
+        ]
+      }
+    ],
+    thinking: { type: 'disabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+
+// Usage examples
+const summary = await analyzeVideo(
+  'https://example.com/presentation.mp4',
+  'Summarize the key points presented in this video'
+);
+
+const actionDetection = await analyzeVideo(
+  'https://example.com/sports.mp4',
+  'Identify and describe all athletic actions performed in this video'
+);
+```
+
+### Video Scene Understanding
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function understandVideoScenes(videoUrl) {
+  const zai = await ZAI.create();
+
+  const prompt = `Analyze this video and provide:
+1. Overall summary of the video content
+2. Main scenes or segments (with approximate timestamps if possible)
+3. Key people or characters and their roles
+4. Important actions or events in chronological order
+5. Setting and environment description
+6. Overall mood or tone`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'enabled' } // Enable for detailed analysis
+  });
+
+  return response.choices[0]?.message?.content;
+}
+
+// Usage
+const sceneAnalysis = await understandVideoScenes(
+  'https://example.com/documentary.mp4'
+);
+```
+
+### Motion and Action Detection
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function detectActions(videoUrl, specificAction = null) {
+  const zai = await ZAI.create();
+
+  const prompt = specificAction
+    ? `Identify all instances of "${specificAction}" in this video. For each instance, describe when it occurs and provide details about how it's performed.`
+    : 'Identify and describe all significant actions and movements in this video. Include who is performing them and when they occur.';
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'enabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+
+// Usage
+const runningActions = await detectActions(
+  'https://example.com/sports.mp4',
+  'running'
+);
+
+const allActions = await detectActions(
+  'https://example.com/activity.mp4'
+);
+```
+
+### Event Timeline Extraction
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function extractTimeline(videoUrl) {
+  const zai = await ZAI.create();
+
+  const prompt = `Create a detailed timeline of events in this video:
+- Identify key moments and transitions
+- Note approximate timing (beginning, middle, end or specific timestamps if visible)
+- Describe what happens at each key point
+- Identify any cause-and-effect relationships between events
+
+Format as a chronological list.`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'enabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+```
+
+### Video Content Classification
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function classifyVideo(videoUrl) {
+  const zai = await ZAI.create();
+
+  const prompt = `Classify this video content:
+1. Primary category (e.g., educational, entertainment, sports, news, tutorial)
+2. Sub-category or genre
+3. Target audience
+4. Content style (professional, casual, documentary, etc.)
+5. Key themes or topics
+6. Suggested tags (10-15 keywords)
+
+Format your response as structured JSON.`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'disabled' }
+  });
+
+  const content = response.choices[0]?.message?.content;
+  
+  try {
+    return JSON.parse(content);
+  } catch (e) {
+    return { rawResponse: content };
+  }
+}
+```
+
+## Advanced Use Cases
+
+### Multi-turn Video Conversation
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+class VideoConversation {
+  constructor() {
+    this.messages = [];
+  }
+
+  async initialize() {
+    this.zai = await ZAI.create();
+  }
+
+  async loadVideo(videoUrl, initialQuestion) {
+    this.messages.push({
+      role: 'user',
+      content: [
+        { type: 'text', text: initialQuestion },
+        { type: 'video_url', video_url: { url: videoUrl } }
+      ]
+    });
+
+    return this.getResponse();
+  }
+
+  async askFollowUp(question) {
+    this.messages.push({
+      role: 'user',
+      content: [
+        { type: 'text', text: question }
+      ]
+    });
+
+    return this.getResponse();
+  }
+
+  async getResponse() {
+    const response = await this.zai.chat.completions.createVision({
+      messages: this.messages,
+      thinking: { type: 'disabled' }
+    });
+
+    const assistantMessage = response.choices[0]?.message?.content;
+    
+    this.messages.push({
+      role: 'assistant',
+      content: assistantMessage
+    });
+
+    return assistantMessage;
+  }
+}
+
+// Usage
+const conversation = new VideoConversation();
+await conversation.initialize();
+
+const initial = await conversation.loadVideo(
+  'https://example.com/lecture.mp4',
+  'What is the main topic of this lecture?'
+);
+
+const followup1 = await conversation.askFollowUp(
+  'Can you explain the key concepts mentioned?'
+);
+
+const followup2 = await conversation.askFollowUp(
+  'What examples were used to illustrate these concepts?'
+);
+```
+
+### Video Quality Assessment
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function assessVideoQuality(videoUrl) {
+  const zai = await ZAI.create();
+
+  const prompt = `Assess the quality of this video:
+1. Visual quality (resolution, clarity, lighting) - Rate 1-10
+2. Audio quality (if audio is present) - Rate 1-10
+3. Camera work (stability, framing, composition) - Rate 1-10
+4. Production value (editing, transitions, effects) - Rate 1-10
+5. Content clarity (is the message clear?) - Rate 1-10
+6. Pacing (too fast, too slow, just right)
+7. Technical issues (artifacts, blur, audio sync, etc.)
+8. Overall rating - 1-10
+9. Specific recommendations for improvement
+
+Provide detailed feedback for each criterion.`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'enabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+```
+
+### Video Content Moderation
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function moderateVideo(videoUrl) {
+  const zai = await ZAI.create();
+
+  const prompt = `Review this video for content moderation:
+1. Check for any inappropriate or sensitive content
+2. Identify any potential safety concerns
+3. Note any content that might violate common community guidelines
+4. Assess age-appropriateness
+5. Identify any copyrighted material visible (logos, brands, music)
+6. Overall safety rating: Safe / Caution / Review Required
+
+Provide specific examples for any concerns identified.`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'enabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+```
+
+### Video Transcript Generation (Visual Description)
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function generateVisualTranscript(videoUrl) {
+  const zai = await ZAI.create();
+
+  const prompt = `Generate a detailed visual transcript of this video:
+- Describe what's happening in each scene
+- Note any text that appears on screen
+- Describe important visual elements
+- Mention any scene changes or transitions
+- Include descriptions of people's actions and expressions
+
+Format as a time-based narrative (e.g., "At the beginning...", "Then...", "Finally...").`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'disabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+```
+
+### Sports Video Analysis
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function analyzeSportsVideo(videoUrl, sport = null) {
+  const zai = await ZAI.create();
+
+  const prompt = sport
+    ? `Analyze this ${sport} video in detail:
+1. Identify players and their positions
+2. Describe key plays and strategies
+3. Note scoring events or important moments
+4. Assess player performance
+5. Identify any rule violations or fouls
+6. Describe the pace and flow of the game`
+    : `Analyze this sports video:
+1. Identify the sport being played
+2. Describe the key actions and plays
+3. Note any scoring or significant events
+4. Describe player movements and strategies
+5. Overall assessment of the game or match`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'enabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+```
+
+### Educational Video Summarization
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function summarizeEducationalVideo(videoUrl) {
+  const zai = await ZAI.create();
+
+  const prompt = `Summarize this educational video for students:
+1. Main topic or learning objective
+2. Key concepts explained (in order)
+3. Important definitions or terminology
+4. Examples used to illustrate concepts
+5. Visual aids or demonstrations shown
+6. Key takeaways or conclusions
+7. Suggested review points
+
+Format as a study guide.`;
+
+  const response = await zai.chat.completions.createVision({
+    messages: [
+      {
+        role: 'user',
+        content: [
+          { type: 'text', text: prompt },
+          { type: 'video_url', video_url: { url: videoUrl } }
+        ]
+      }
+    ],
+    thinking: { type: 'enabled' }
+  });
+
+  return response.choices[0]?.message?.content;
+}
+```
+
+## Batch Video Processing
+
+### Process Multiple Videos
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+class VideoBatchProcessor {
+  constructor() {
+    this.zai = null;
+  }
+
+  async initialize() {
+    this.zai = await ZAI.create();
+  }
+
+  async processVideo(videoUrl, prompt) {
+    const response = await this.zai.chat.completions.createVision({
+      messages: [
+        {
+          role: 'user',
+          content: [
+            { type: 'text', text: prompt },
+            { type: 'video_url', video_url: { url: videoUrl } }
+          ]
+        }
+      ],
+      thinking: { type: 'disabled' }
+    });
+
+    return response.choices[0]?.message?.content;
+  }
+
+  async processBatch(videoUrls, prompt) {
+    const results = [];
+    
+    for (const videoUrl of videoUrls) {
+      try {
+        console.log(`Processing: ${videoUrl}`);
+        const result = await this.processVideo(videoUrl, prompt);
+        results.push({ videoUrl, success: true, result });
+      } catch (error) {
+        results.push({
+          videoUrl,
+          success: false,
+          error: error.message
+        });
+      }
+
+      // Delay after every attempt (success or failure) to avoid rate limiting
+      await new Promise(resolve => setTimeout(resolve, 1000));
+    }
+
+    return results;
+  }
+}
+
+// Usage
+const processor = new VideoBatchProcessor();
+await processor.initialize();
+
+const videos = [
+  'https://example.com/video1.mp4',
+  'https://example.com/video2.mp4',
+  'https://example.com/video3.mp4'
+];
+
+const results = await processor.processBatch(
+  videos,
+  'Provide a brief summary of this video suitable for a content catalog'
+);
+```
+
+## Best Practices
+
+### 1. Video Preparation
+- Use standard video formats (MP4, MOV, AVI)
+- Ensure videos are accessible via public URLs or properly encoded
+- For long videos, consider creating shorter clips for specific analysis
+- Optimize video size for faster processing
+- Ensure good lighting and audio quality in source videos
+
+### 2. Prompt Engineering for Videos
+- Be specific about temporal aspects ("beginning", "throughout", "at the end")
+- Mention what type of analysis you need (actions, events, scenes, etc.)
+- For long videos, ask for summaries or key moments
+- Use thinking mode for complex temporal reasoning
+- Specify if you need chronological or thematic organization
+
+### 3. Error Handling
+
+```javascript
+import ZAI from 'z-ai-web-dev-sdk';
+
+async function safeVideoAnalysis(videoUrl, prompt) {
+  try {
+    const zai = await ZAI.create();
+    
+    const response = await zai.chat.completions.createVision({
+      messages: [
+        {
+          role: 'user',
+          content: [
+            { type: 'text', text: prompt },
+            { type: 'video_url', video_url: { url: videoUrl } }
+          ]
+        }
+      ],
+      thinking: { type: 'disabled' }
+    });
+
+    return {
+      success: true,
+      content: response.choices[0]?.message?.content
+    };
+  } catch (error) {
+    console.error('Video analysis error:', error);
+    return {
+      success: false,
+      error: error.message
+    };
+  }
+}
+```
+
+### 4. Performance Optimization
+- Cache SDK instance for batch processing
+- Implement request throttling (add delays between requests)
+- Process videos asynchronously when possible
+- For very long videos, consider analyzing at specific intervals
+- Use appropriate thinking mode (disabled for simple descriptions, enabled for complex analysis)
+
+### 5. Security Considerations
+- Validate video URLs before processing
+- Implement rate limiting for public APIs
+- Sanitize user-provided video URLs
+- Never expose SDK credentials in client-side code
+- Implement content moderation for user-uploaded videos
+- Consider video file size limits
+
+## Common Use Cases
+
+1. **Content Moderation**: Automatically review video uploads for policy compliance
+2. **Video Cataloging**: Generate descriptions and tags for video libraries
+3. **Sports Analysis**: Analyze games, identify plays, assess performance
+4. **Educational Content**: Summarize lectures, create study guides
+5. **Security & Surveillance**: Detect events, track activities (with appropriate authorization)
+6. **Quality Control**: Assess video production quality
+7. **Social Media**: Generate video captions and descriptions
+8. **Training & Documentation**: Analyze training videos, create documentation
+9. **Event Recording**: Summarize meetings, conferences, presentations
+10. **Entertainment**: Analyze films, shows for content, themes, scenes
+
+## Integration Examples
+
+### Express.js API Endpoint
+
+```javascript
+import express from 'express';
+import ZAI from 'z-ai-web-dev-sdk';
+
+const app = express();
+app.use(express.json());
+
+let zaiInstance;
+
+async function initZAI() {
+  zaiInstance = await ZAI.create();
+}
+
+// Analyze video from URL
+app.post('/api/analyze-video', async (req, res) => {
+  try {
+    const { videoUrl, prompt } = req.body;
+
+    if (!videoUrl || !prompt) {
+      return res.status(400).json({ 
+        error: 'videoUrl and prompt are required' 
+      });
+    }
+
+    const response = await zaiInstance.chat.completions.createVision({
+      messages: [
+        {
+          role: 'user',
+          content: [
+            { type: 'text', text: prompt },
+            { type: 'video_url', video_url: { url: videoUrl } }
+          ]
+        }
+      ],
+      thinking: { type: 'disabled' }
+    });
+
+    res.json({
+      success: true,
+      analysis: response.choices[0]?.message?.content
+    });
+  } catch (error) {
+    res.status(500).json({
+      success: false,
+      error: error.message
+    });
+  }
+});
+
+// Get video summary
+app.post('/api/video-summary', async (req, res) => {
+  try {
+    const { videoUrl } = req.body;
+
+    if (!videoUrl) {
+      return res.status(400).json({ error: 'videoUrl is required' });
+    }
+
+    const prompt = 'Provide a comprehensive summary of this video including: 1) Main content/topic, 2) Key events in chronological order, 3) Important people or subjects, 4) Overall takeaway.';
+
+    const response = await zaiInstance.chat.completions.createVision({
+      messages: [
+        {
+          role: 'user',
+          content: [
+            { type: 'text', text: prompt },
+            { type: 'video_url', video_url: { url: videoUrl } }
+          ]
+        }
+      ],
+      thinking: { type: 'enabled' }
+    });
+
+    res.json({
+      success: true,
+      summary: response.choices[0]?.message?.content
+    });
+  } catch (error) {
+    res.status(500).json({
+      success: false,
+      error: error.message
+    });
+  }
+});
+
+initZAI().then(() => {
+  app.listen(3000, () => {
+    console.log('Video understanding API running on port 3000');
+  });
+});
+```
+
+### Next.js API Route
+
+```javascript
+// pages/api/video-understand.js
+import ZAI from 'z-ai-web-dev-sdk';
+
+let zaiInstance = null;
+
+async function getZAI() {
+  if (!zaiInstance) {
+    zaiInstance = await ZAI.create();
+  }
+  return zaiInstance;
+}
+
+export default async function handler(req, res) {
+  if (req.method !== 'POST') {
+    return res.status(405).json({ error: 'Method not allowed' });
+  }
+
+  try {
+    const { videoUrl, prompt, enableThinking = false } = req.body;
+
+    if (!videoUrl || !prompt) {
+      return res.status(400).json({ 
+        error: 'videoUrl and prompt are required' 
+      });
+    }
+
+    const zai = await getZAI();
+
+    const response = await zai.chat.completions.createVision({
+      messages: [
+        {
+          role: 'user',
+          content: [
+            { type: 'text', text: prompt },
+            { type: 'video_url', video_url: { url: videoUrl } }
+          ]
+        }
+      ],
+      thinking: { type: enableThinking ? 'enabled' : 'disabled' }
+    });
+
+    res.status(200).json({
+      success: true,
+      analysis: response.choices[0]?.message?.content
+    });
+  } catch (error) {
+    console.error('Error:', error);
+    res.status(500).json({
+      success: false,
+      error: error.message
+    });
+  }
+}
+```
+
+## Troubleshooting
+
+**Issue**: "SDK must be used in backend"
+- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code, never in client/browser code
+
+**Issue**: Video not loading or being analyzed
+- **Solution**: Verify that the video URL is accessible, returns the correct MIME type, and points to a supported format
+
+**Issue**: Inaccurate temporal analysis
+- **Solution**: Enable thinking mode for complex temporal reasoning, and make prompts more specific about time and sequence
+
+**Issue**: Slow response times for videos
+- **Solution**: Videos take longer to process than images; consider shorter clips or sampling for long videos
+
+**Issue**: Missing details from video
+- **Solution**: Be more specific in your prompt; ask about particular time segments or aspects
+
+**Issue**: Video format not supported
+- **Solution**: Convert the video to MP4 (most widely supported) and check that the URL returns a proper video MIME type
+
+## Remember
+
+- Always use z-ai-web-dev-sdk in backend code only
+- The SDK is already installed - import as shown in examples
+- Use `video_url` content type for video files
+- Video analysis takes longer than image analysis - be patient
+- Enable thinking mode for complex temporal reasoning and event detection
+- Structure prompts to include temporal information (beginning, middle, end)
+- Handle errors gracefully in production
+- Implement rate limiting and delays for batch processing
+- Validate and sanitize user inputs
+- Consider privacy and security when processing user videos
+- For very long videos, consider analyzing specific segments or key frames
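A possible follow-up to the "Validate and sanitize user inputs" item in SKILL.md: the sketch below checks a user-supplied video URL before it is passed to `createVision`. The helper name `validateVideoUrl` and the policy it applies (HTTP/HTTPS only, file extension must match the list under "Supported Video Formats") are assumptions for illustration. Neither is part of z-ai-web-dev-sdk, and a real deployment may need stricter rules such as a host allowlist.

```javascript
// Hypothetical helper, not part of z-ai-web-dev-sdk.
// Extensions are taken from the "Supported Video Formats" list in SKILL.md.
const SUPPORTED_EXTENSIONS = ['.mp4', '.avi', '.mov', '.webm', '.mkv', '.flv'];

function validateVideoUrl(input) {
  let parsed;
  try {
    // The URL constructor throws on relative paths and malformed input.
    parsed = new URL(input);
  } catch {
    return { ok: false, reason: 'not a valid absolute URL' };
  }

  // Assumed policy: only remote HTTP(S) URLs are accepted.
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    return { ok: false, reason: `unsupported protocol ${parsed.protocol}` };
  }

  // Compare the extension of the path only, ignoring query string and fragment.
  const path = parsed.pathname.toLowerCase();
  const dot = path.lastIndexOf('.');
  const ext = dot === -1 ? '' : path.slice(dot);
  if (!SUPPORTED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `unsupported extension "${ext}"` };
  }

  // Return the normalized URL string so callers pass a parsed value onward.
  return { ok: true, url: parsed.href };
}
```

In the Express and Next.js handlers from SKILL.md, a failed check could map onto the existing 400 response, with `reason` as the error message.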
--- a/skills/video-understand/scripts/video-understand.ts
+++ b/skills/video-understand/scripts/video-understand.ts
@@ -0,0 +1,41 @@
+import ZAI, { VisionMessage } from 'z-ai-web-dev-sdk';
+
+async function main(videoUrl: string, prompt: string) {
+	try {
+		const zai = await ZAI.create();
+
+		const messages: VisionMessage[] = [
+			{
+				role: 'assistant',
+				content: [
+					{ type: 'text', text: 'Output only text, no markdown.' }
+				]
+			},
+			{
+				role: 'user',
+				content: [
+					{ type: 'text', text: prompt },
+					{ type: 'video_url', video_url: { url: videoUrl } }
+				]
+			}
+		];
+
+		const response = await zai.chat.completions.createVision({
+			model: 'glm-4.6v',
+			messages,
+			thinking: { type: 'disabled' }
+		});
+
+		const reply = response.choices?.[0]?.message?.content;
+		console.log('Video Understanding Result:');
+		console.log(reply ?? JSON.stringify(response, null, 2));
+	} catch (err: any) {
+		console.error('Video understanding failed:', err?.message || err);
+	}
+}
+
+// Example usage - analyze a video
+main(
+	"https://example.com/sample-video.mp4",
+	"Please analyze this video and describe the main events, actions, and key moments in chronological order."
+);
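The message array in `video-understand.ts` is built inline. If the script grows, the same shape could come from a small helper. The sketch below is plain JavaScript so that it runs without a compile step. `buildVideoMessages` is a hypothetical name, and sending the formatting instruction with role `'assistant'` simply mirrors what the script does rather than documenting SDK behavior.

```javascript
// Hypothetical helper; mirrors the message shape used in video-understand.ts.
function buildVideoMessages(prompt, videoUrl, outputInstruction) {
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new TypeError('prompt must be a non-empty string');
  }
  if (typeof videoUrl !== 'string' || videoUrl.trim() === '') {
    throw new TypeError('videoUrl must be a non-empty string');
  }

  const messages = [];

  // Optional formatting instruction, sent with the same role the script uses.
  if (outputInstruction) {
    messages.push({
      role: 'assistant',
      content: [{ type: 'text', text: outputInstruction }]
    });
  }

  // The user turn carries the prompt text followed by the video reference.
  messages.push({
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'video_url', video_url: { url: videoUrl } }
    ]
  });

  return messages;
}
```

The returned array could be passed as `messages` to `zai.chat.completions.createVision(...)` in place of the inline literal.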