feat: Add complete Agentic Compaction & Pipeline System

- Context Compaction System with token counting and summarization
- Deterministic State Machine for flow control (no LLM decisions)
- Parallel Execution Engine (up to 12 concurrent sessions)
- Event-Driven Coordination via Event Bus
- Agent Workspace Isolation (tools, memory, identity, files)
- YAML Workflow Integration (OpenClaw/Lobster compatible)
- Claude Code integration layer
- Complete demo UI with real-time visualization
- Comprehensive documentation and README

Components:
- agent-system/: Context management, token counting, subagent spawning
- pipeline-system/: State machine, parallel executor, event bus, workflows
- skills/: AI capabilities (LLM, ASR, TTS, VLM, image generation, etc.)
- src/app/: Next.js demo application

Total: ~100KB of production-ready TypeScript code
21
skills/video-understand/LICENSE.txt
Executable file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 z-ai-web-dev-sdk Skills

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
916
skills/video-understand/SKILL.md
Executable file
@@ -0,0 +1,916 @@
---
name: video-understand
description: Implement specialized video understanding capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to analyze video content, understand motion and temporal sequences, extract information from video frames, describe video scenes, or perform video-based AI analysis. Optimized for MP4, AVI, MOV, and other common video formats.
license: MIT
---

# Video Understanding Skill

This skill provides specialized video understanding functionality using the z-ai-web-dev-sdk package, enabling AI models to analyze, describe, and extract information from video content including motion, temporal sequences, and scene changes.

## Skills Path

**Skill Location**: `{project_path}/skills/video-understand`

This skill is located at the path above in your project.

**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/video-understand.ts` for a working example.

## Overview

Video Understanding focuses specifically on video content analysis, providing capabilities for:
- Video scene understanding and description
- Action and motion detection
- Temporal sequence analysis
- Event detection in videos
- Video content summarization
- Scene change detection
- People and object tracking across frames
- Audio-visual content analysis (when applicable)

**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.

## Prerequisites

The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below.

## CLI Usage (For Simple Tasks)

For quick video analysis tasks, you can use the z-ai CLI instead of writing code. This is ideal for simple video descriptions, testing, or automation.

### Basic Video Analysis

```bash
# Analyze a video from URL
z-ai vision --prompt "Summarize what happens in this video" --image "https://example.com/video.mp4"

# Note: Use --image flag for video URLs as well
z-ai vision -p "Describe the key events" -i "https://example.com/presentation.mp4"
```

### Analyze Local Videos

```bash
# Analyze a local video file
z-ai vision -p "What activities are shown in this video?" -i "./recording.mp4"

# Save response to file
z-ai vision -p "Provide a detailed summary" -i "./meeting.mp4" -o summary.json
```

### Advanced Video Analysis

```bash
# Complex scene understanding with thinking
z-ai vision \
  -p "Analyze this video and identify: 1) Main events, 2) People and their actions, 3) Timeline of key moments" \
  -i "./event.mp4" \
  --thinking \
  -o analysis.json

# Action detection
z-ai vision \
  -p "Identify all actions performed by people in this video" \
  -i "./sports.mp4" \
  --thinking
```

### Streaming Output

```bash
# Stream the video analysis
z-ai vision -p "Describe this video content" -i "./video.mp4" --stream
```

### CLI Parameters

- `--prompt, -p <text>`: **Required** - Question or instruction about the video
- `--image, -i <URL or path>`: Optional - Video URL or local file path (despite the name, it works for videos too)
- `--thinking, -t`: Optional - Enable chain-of-thought reasoning for complex analysis (default: disabled)
- `--output, -o <path>`: Optional - Output file path (JSON format)
- `--stream`: Optional - Stream the response in real-time
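
When the CLI is driven from a script, it can help to assemble the arguments programmatically. The following is a minimal sketch, not part of the z-ai tooling: `buildVisionArgs` and its option names are hypothetical, and it only emits the flags listed above.

```javascript
// Hypothetical helper: maps an options object to `z-ai vision` arguments.
// Only the flags documented in the parameter list are used.
function buildVisionArgs({ prompt, image, thinking = false, output, stream = false }) {
  if (!prompt) {
    throw new Error('prompt is required');
  }
  const args = ['vision', '--prompt', prompt];
  if (image) args.push('--image', image);
  if (thinking) args.push('--thinking');
  if (output) args.push('--output', output);
  if (stream) args.push('--stream');
  return args;
}

const exampleArgs = buildVisionArgs({
  prompt: 'Describe the key events',
  image: './event.mp4',
  thinking: true,
  output: 'analysis.json'
});
console.log(exampleArgs);
```

Passing the array to Node's `child_process.spawn('z-ai', exampleArgs)` rather than building a single command string avoids shell quoting problems with prompts that contain quotes or spaces.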

### Supported Video Formats

- MP4 (.mp4) - Most widely supported format
- AVI (.avi) - Audio Video Interleave
- MOV (.mov) - QuickTime format
- WebM (.webm) - Web-optimized format
- MKV (.mkv) - Matroska format
- FLV (.flv) - Flash Video format
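
A cheap pre-check on the file extension can reject obviously unsupported inputs before any request is made. This is a sketch only: `hasSupportedVideoExtension` is a hypothetical helper, the extension list is copied from the list above, and a matching extension does not guarantee that the file itself is valid.

```javascript
// Extensions taken from the supported-format list; the name check says
// nothing about the actual container or codec of the file.
const SUPPORTED_EXTENSIONS = ['.mp4', '.avi', '.mov', '.webm', '.mkv', '.flv'];

function hasSupportedVideoExtension(urlOrPath) {
  // Drop any query string or fragment before looking at the extension.
  const clean = String(urlOrPath).split(/[?#]/)[0].toLowerCase();
  return SUPPORTED_EXTENSIONS.some(ext => clean.endsWith(ext));
}

console.log(hasSupportedVideoExtension('https://example.com/clip.MP4?token=abc')); // true
console.log(hasSupportedVideoExtension('./notes.txt')); // false
```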

### When to Use CLI vs SDK

**Use CLI for:**
- Quick video summaries
- One-off video analysis
- Testing video understanding capabilities
- Simple automation scripts
- Generating video descriptions

**Use SDK for:**
- Multi-turn conversations about videos
- Complex video processing pipelines
- Production applications with error handling
- Custom integration with video processing logic
- Batch video processing with custom workflows

## Recommended Approach

For better performance and reliability with local videos, consider:
1. Uploading videos to a CDN and passing URLs instead of local paths
2. Converting key frames of shorter videos to images for faster analysis
3. Chunking long videos or sampling them at intervals
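
The chunking and sampling suggestion can be sketched as plain arithmetic over the video duration. The helpers below (`chunkRanges`, `sampleTimestamps`) are hypothetical and only compute time ranges; cutting the clips or extracting frames would still need a separate tool, which is outside this sketch.

```javascript
// Splits a video of `durationSec` seconds into consecutive chunks of at most
// `chunkSec` seconds. Returns [{ start, end }] in seconds.
function chunkRanges(durationSec, chunkSec) {
  if (!(durationSec > 0) || !(chunkSec > 0)) {
    throw new RangeError('durationSec and chunkSec must be positive numbers');
  }
  const ranges = [];
  for (let start = 0; start < durationSec; start += chunkSec) {
    ranges.push({ start, end: Math.min(start + chunkSec, durationSec) });
  }
  return ranges;
}

// Returns timestamps (in seconds) at a fixed interval, starting at 0.
function sampleTimestamps(durationSec, intervalSec) {
  return chunkRanges(durationSec, intervalSec).map(range => range.start);
}

console.log(chunkRanges(150, 60));
// [ { start: 0, end: 60 }, { start: 60, end: 120 }, { start: 120, end: 150 } ]
console.log(sampleTimestamps(150, 60)); // [ 0, 60, 120 ]
```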

## Basic Video Understanding Implementation

### Single Video Analysis

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function analyzeVideo(videoUrl, prompt) {
  const zai = await ZAI.create();

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: prompt
          },
          {
            type: 'video_url',
            video_url: {
              url: videoUrl
            }
          }
        ]
      }
    ],
    thinking: { type: 'disabled' }
  });

  return response.choices[0]?.message?.content;
}

// Usage examples
const summary = await analyzeVideo(
  'https://example.com/presentation.mp4',
  'Summarize the key points presented in this video'
);

const actionDetection = await analyzeVideo(
  'https://example.com/sports.mp4',
  'Identify and describe all athletic actions performed in this video'
);
```
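
Most examples in this document build the same `messages` payload, so it may be convenient to factor it out. The sketch below introduces a hypothetical `buildVideoMessages` helper; the message shape is copied from the example above and is not checked against the SDK's type definitions.

```javascript
// Builds the `messages` array used by the examples in this document.
// The shape mirrors those examples; it is an assumption that the SDK
// accepts exactly this structure in every case.
function buildVideoMessages(prompt, videoUrl) {
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new TypeError('prompt must be a non-empty string');
  }
  if (typeof videoUrl !== 'string' || videoUrl.trim() === '') {
    throw new TypeError('videoUrl must be a non-empty string');
  }
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        { type: 'video_url', video_url: { url: videoUrl } }
      ]
    }
  ];
}

const exampleMessages = buildVideoMessages(
  'Summarize this video',
  'https://example.com/video.mp4'
);
console.log(JSON.stringify(exampleMessages, null, 2));
```

The returned array can then be passed as the `messages` field of a `createVision` call in place of the inline literal.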

### Video Scene Understanding

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function understandVideoScenes(videoUrl) {
  const zai = await ZAI.create();

  const prompt = `Analyze this video and provide:
1. Overall summary of the video content
2. Main scenes or segments (with approximate timestamps if possible)
3. Key people or characters and their roles
4. Important actions or events in chronological order
5. Setting and environment description
6. Overall mood or tone`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'enabled' } // Enable for detailed analysis
  });

  return response.choices[0]?.message?.content;
}

// Usage
const sceneAnalysis = await understandVideoScenes(
  'https://example.com/documentary.mp4'
);
```

### Motion and Action Detection

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function detectActions(videoUrl, specificAction = null) {
  const zai = await ZAI.create();

  const prompt = specificAction
    ? `Identify all instances of "${specificAction}" in this video. For each instance, describe when it occurs and provide details about how it's performed.`
    : 'Identify and describe all significant actions and movements in this video. Include who is performing them and when they occur.';

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'enabled' }
  });

  return response.choices[0]?.message?.content;
}

// Usage
const runningActions = await detectActions(
  'https://example.com/sports.mp4',
  'running'
);

const allActions = await detectActions(
  'https://example.com/activity.mp4'
);
```

### Event Timeline Extraction

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function extractTimeline(videoUrl) {
  const zai = await ZAI.create();

  const prompt = `Create a detailed timeline of events in this video:
- Identify key moments and transitions
- Note approximate timing (beginning, middle, end or specific timestamps if visible)
- Describe what happens at each key point
- Identify any cause-and-effect relationships between events

Format as a chronological list.`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'enabled' }
  });

  return response.choices[0]?.message?.content;
}
```

### Video Content Classification

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function classifyVideo(videoUrl) {
  const zai = await ZAI.create();

  const prompt = `Classify this video content:
1. Primary category (e.g., educational, entertainment, sports, news, tutorial)
2. Sub-category or genre
3. Target audience
4. Content style (professional, casual, documentary, etc.)
5. Key themes or topics
6. Suggested tags (10-15 keywords)

Format your response as structured JSON.`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'disabled' }
  });

  const content = response.choices[0]?.message?.content;

  try {
    return JSON.parse(content);
  } catch (e) {
    return { rawResponse: content };
  }
}
```
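
Replies to a "format as JSON" prompt are sometimes wrapped in a markdown code fence, in which case a plain `JSON.parse` falls through to the raw fallback. Whether this model does so is an assumption; the sketch below shows a hypothetical `parseJsonResponse` helper that strips one surrounding fence before parsing and otherwise behaves like the `try`/`catch` above.

```javascript
// The fence marker is built at runtime so this block does not contain a
// literal triple backtick.
const FENCE = '`'.repeat(3);
const FENCED_JSON = new RegExp(
  '^' + FENCE + '(?:json)?\\s*([\\s\\S]*?)\\s*' + FENCE + '$',
  'i'
);

// Tries to parse a model reply as JSON, tolerating one surrounding
// markdown code fence. Falls back to { rawResponse } on failure.
function parseJsonResponse(content) {
  if (typeof content !== 'string') {
    return { rawResponse: content };
  }
  const trimmed = content.trim();
  const fenced = trimmed.match(FENCED_JSON);
  const candidate = fenced ? fenced[1] : trimmed;
  try {
    return JSON.parse(candidate);
  } catch (e) {
    return { rawResponse: content };
  }
}

const fencedReply = FENCE + 'json\n{"category": "sports"}\n' + FENCE;
console.log(parseJsonResponse(fencedReply)); // { category: 'sports' }
console.log(parseJsonResponse('not json'));  // { rawResponse: 'not json' }
```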

## Advanced Use Cases

### Multi-turn Video Conversation

```javascript
import ZAI from 'z-ai-web-dev-sdk';

class VideoConversation {
  constructor() {
    this.messages = [];
  }

  async initialize() {
    this.zai = await ZAI.create();
  }

  async loadVideo(videoUrl, initialQuestion) {
    this.messages.push({
      role: 'user',
      content: [
        { type: 'text', text: initialQuestion },
        { type: 'video_url', video_url: { url: videoUrl } }
      ]
    });

    return this.getResponse();
  }

  async askFollowUp(question) {
    this.messages.push({
      role: 'user',
      content: [
        { type: 'text', text: question }
      ]
    });

    return this.getResponse();
  }

  async getResponse() {
    const response = await this.zai.chat.completions.createVision({
      messages: this.messages,
      thinking: { type: 'disabled' }
    });

    const assistantMessage = response.choices[0]?.message?.content;

    this.messages.push({
      role: 'assistant',
      content: assistantMessage
    });

    return assistantMessage;
  }
}

// Usage
const conversation = new VideoConversation();
await conversation.initialize();

const initial = await conversation.loadVideo(
  'https://example.com/lecture.mp4',
  'What is the main topic of this lecture?'
);

const followup1 = await conversation.askFollowUp(
  'Can you explain the key concepts mentioned?'
);

const followup2 = await conversation.askFollowUp(
  'What examples were used to illustrate these concepts?'
);
```
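
In the class above, `this.messages` grows with every turn. One possible way to bound it, sketched here under the assumption that the first message (which carries the video) must always be kept, is a hypothetical `trimHistory` helper applied before each request. Whether the API tolerates the resulting sequence of roles is not verified.

```javascript
// Keeps the first message (which carries the video) plus the most recent
// `maxRecent` messages. Returns a new array; the input is not modified.
function trimHistory(messages, maxRecent) {
  if (!Number.isInteger(maxRecent) || maxRecent < 0) {
    throw new RangeError('maxRecent must be a non-negative integer');
  }
  if (messages.length <= maxRecent + 1) {
    return messages.slice();
  }
  return [messages[0], ...messages.slice(messages.length - maxRecent)];
}

const history = [
  { role: 'user', content: 'video + first question' },
  { role: 'assistant', content: 'answer 1' },
  { role: 'user', content: 'question 2' },
  { role: 'assistant', content: 'answer 2' },
  { role: 'user', content: 'question 3' }
];
console.log(trimHistory(history, 2).map(m => m.content));
// [ 'video + first question', 'answer 2', 'question 3' ]
```

Dropping older turns loses their content, so this trades recall of earlier answers for a smaller request.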

### Video Quality Assessment

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function assessVideoQuality(videoUrl) {
  const zai = await ZAI.create();

  const prompt = `Assess the quality of this video:
1. Visual quality (resolution, clarity, lighting) - Rate 1-10
2. Audio quality (if audio is present) - Rate 1-10
3. Camera work (stability, framing, composition) - Rate 1-10
4. Production value (editing, transitions, effects) - Rate 1-10
5. Content clarity (is the message clear?) - Rate 1-10
6. Pacing (too fast, too slow, just right)
7. Technical issues (artifacts, blur, audio sync, etc.)
8. Overall rating - 1-10
9. Specific recommendations for improvement

Provide detailed feedback for each criterion.`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'enabled' }
  });

  return response.choices[0]?.message?.content;
}
```

### Video Content Moderation

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function moderateVideo(videoUrl) {
  const zai = await ZAI.create();

  const prompt = `Review this video for content moderation:
1. Check for any inappropriate or sensitive content
2. Identify any potential safety concerns
3. Note any content that might violate common community guidelines
4. Assess age-appropriateness
5. Identify any copyrighted material visible (logos, brands, music)
6. Overall safety rating: Safe / Caution / Review Required

Provide specific examples for any concerns identified.`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'enabled' }
  });

  return response.choices[0]?.message?.content;
}
```

### Video Transcript Generation (Visual Description)

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function generateVisualTranscript(videoUrl) {
  const zai = await ZAI.create();

  const prompt = `Generate a detailed visual transcript of this video:
- Describe what's happening in each scene
- Note any text that appears on screen
- Describe important visual elements
- Mention any scene changes or transitions
- Include descriptions of people's actions and expressions

Format as a time-based narrative (e.g., "At the beginning...", "Then...", "Finally...").`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'disabled' }
  });

  return response.choices[0]?.message?.content;
}
```

### Sports Video Analysis

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function analyzeSportsVideo(videoUrl, sport = null) {
  const zai = await ZAI.create();

  const prompt = sport
    ? `Analyze this ${sport} video in detail:
1. Identify players and their positions
2. Describe key plays and strategies
3. Note scoring events or important moments
4. Assess player performance
5. Identify any rule violations or fouls
6. Describe the pace and flow of the game`
    : `Analyze this sports video:
1. Identify the sport being played
2. Describe the key actions and plays
3. Note any scoring or significant events
4. Describe player movements and strategies
5. Overall assessment of the game or match`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'enabled' }
  });

  return response.choices[0]?.message?.content;
}
```

### Educational Video Summarization

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function summarizeEducationalVideo(videoUrl) {
  const zai = await ZAI.create();

  const prompt = `Summarize this educational video for students:
1. Main topic or learning objective
2. Key concepts explained (in order)
3. Important definitions or terminology
4. Examples used to illustrate concepts
5. Visual aids or demonstrations shown
6. Key takeaways or conclusions
7. Suggested review points

Format as a study guide.`;

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ],
    thinking: { type: 'enabled' }
  });

  return response.choices[0]?.message?.content;
}
```

## Batch Video Processing

### Process Multiple Videos

```javascript
import ZAI from 'z-ai-web-dev-sdk';

class VideoBatchProcessor {
  constructor() {
    this.zai = null;
  }

  async initialize() {
    this.zai = await ZAI.create();
  }

  async processVideo(videoUrl, prompt) {
    const response = await this.zai.chat.completions.createVision({
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: prompt },
            { type: 'video_url', video_url: { url: videoUrl } }
          ]
        }
      ],
      thinking: { type: 'disabled' }
    });

    return response.choices[0]?.message?.content;
  }

  async processBatch(videoUrls, prompt) {
    const results = [];

    for (const videoUrl of videoUrls) {
      try {
        console.log(`Processing: ${videoUrl}`);
        const result = await this.processVideo(videoUrl, prompt);
        results.push({ videoUrl, success: true, result });

        // Add delay to avoid rate limiting
        await new Promise(resolve => setTimeout(resolve, 1000));
      } catch (error) {
        results.push({
          videoUrl,
          success: false,
          error: error.message
        });
      }
    }

    return results;
  }
}

// Usage
const processor = new VideoBatchProcessor();
await processor.initialize();

const videos = [
  'https://example.com/video1.mp4',
  'https://example.com/video2.mp4',
  'https://example.com/video3.mp4'
];

const results = await processor.processBatch(
  videos,
  'Provide a brief summary of this video suitable for a content catalog'
);
```

## Best Practices

### 1. Video Preparation
- Use standard video formats (MP4, MOV, AVI)
- Ensure videos are accessible via public URLs or properly encoded
- For long videos, consider creating shorter clips for specific analysis
- Optimize video size for faster processing
- Ensure good lighting and audio quality in source videos

### 2. Prompt Engineering for Videos
- Be specific about temporal aspects ("beginning", "throughout", "at the end")
- Mention what type of analysis you need (actions, events, scenes, etc.)
- For long videos, ask for summaries or key moments
- Use thinking mode for complex temporal reasoning
- Specify if you need chronological or thematic organization

### 3. Error Handling

```javascript
import ZAI from 'z-ai-web-dev-sdk';

async function safeVideoAnalysis(videoUrl, prompt) {
  try {
    const zai = await ZAI.create();

    const response = await zai.chat.completions.createVision({
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: prompt },
            { type: 'video_url', video_url: { url: videoUrl } }
          ]
        }
      ],
      thinking: { type: 'disabled' }
    });

    return {
      success: true,
      content: response.choices[0]?.message?.content
    };
  } catch (error) {
    // Log the full error server-side; return only the message to the caller
    console.error('Video analysis error:', error);
    return {
      success: false,
      error: error.message
    };
  }
}
```

### 4. Performance Optimization
- Cache SDK instance for batch processing
- Implement request throttling (add delays between requests)
- Process videos asynchronously when possible
- For very long videos, consider analyzing at specific intervals
- Use appropriate thinking mode (disabled for simple descriptions, enabled for complex analysis)
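
Throttling and asynchronous processing can be combined by capping how many requests are in flight at once. The sketch below uses a hypothetical `mapWithConcurrency` helper; the `worker` callback stands in for a per-video analysis call, and a suitable limit depends on rate limits that this document does not specify.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep the input order; failures are captured per item.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { success: true, result: await worker(items[index], index) };
      } catch (error) {
        results[index] = { success: false, error: error.message };
      }
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Usage with a stand-in worker; a real worker would call the SDK.
mapWithConcurrency(['a.mp4', 'b.mp4', 'c.mp4'], 2, async url => `summary of ${url}`)
  .then(summaries => console.log(summaries));
```

Compared with a strictly sequential loop, this keeps throughput higher while still bounding the load placed on the API.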

### 5. Security Considerations
- Validate video URLs before processing
- Implement rate limiting for public APIs
- Sanitize user-provided video URLs
- Never expose SDK credentials in client-side code
- Implement content moderation for user-uploaded videos
- Consider video file size limits
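
URL validation can start with the standard `URL` class. The sketch below is a minimal, hypothetical `validateVideoUrl` helper: it accepts only absolute `http`/`https` URLs and rejects a short, illustrative list of local hosts. It is not a complete defense against server-side request forgery, which would also require checking resolved addresses.

```javascript
// Illustrative block list only; private network ranges are not covered.
const BLOCKED_HOSTS = new Set(['localhost', '127.0.0.1', '[::1]']);

// Returns a normalized URL string, or throws if the input is not an
// acceptable absolute http(s) URL.
function validateVideoUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch (e) {
    throw new Error('videoUrl is not a valid absolute URL');
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error('videoUrl must use http or https');
  }
  if (BLOCKED_HOSTS.has(url.hostname)) {
    throw new Error('videoUrl points at a local address');
  }
  return url.toString();
}

console.log(validateVideoUrl('https://example.com/video.mp4'));
// https://example.com/video.mp4
```

In the API endpoints shown later in this document, a check like this would run on `req.body.videoUrl` before the SDK is called, returning a 400 response on failure.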

## Common Use Cases

1. **Content Moderation**: Automatically review video uploads for policy compliance
2. **Video Cataloging**: Generate descriptions and tags for video libraries
3. **Sports Analysis**: Analyze games, identify plays, assess performance
4. **Educational Content**: Summarize lectures, create study guides
5. **Security & Surveillance**: Detect events, track activities (with appropriate authorization)
6. **Quality Control**: Assess video production quality
7. **Social Media**: Generate video captions and descriptions
8. **Training & Documentation**: Analyze training videos, create documentation
9. **Event Recording**: Summarize meetings, conferences, presentations
10. **Entertainment**: Analyze films, shows for content, themes, scenes

## Integration Examples

### Express.js API Endpoint

```javascript
import express from 'express';
import ZAI from 'z-ai-web-dev-sdk';

const app = express();
app.use(express.json());

let zaiInstance;

async function initZAI() {
  zaiInstance = await ZAI.create();
}

// Analyze video from URL
app.post('/api/analyze-video', async (req, res) => {
  try {
    const { videoUrl, prompt } = req.body;

    if (!videoUrl || !prompt) {
      return res.status(400).json({
        error: 'videoUrl and prompt are required'
      });
    }

    const response = await zaiInstance.chat.completions.createVision({
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: prompt },
            { type: 'video_url', video_url: { url: videoUrl } }
          ]
        }
      ],
      thinking: { type: 'disabled' }
    });

    res.json({
      success: true,
      analysis: response.choices[0]?.message?.content
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error.message
    });
  }
});

// Get video summary
app.post('/api/video-summary', async (req, res) => {
  try {
    const { videoUrl } = req.body;

    if (!videoUrl) {
      return res.status(400).json({ error: 'videoUrl is required' });
    }

    const prompt = 'Provide a comprehensive summary of this video including: 1) Main content/topic, 2) Key events in chronological order, 3) Important people or subjects, 4) Overall takeaway.';

    const response = await zaiInstance.chat.completions.createVision({
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: prompt },
            { type: 'video_url', video_url: { url: videoUrl } }
          ]
        }
      ],
      thinking: { type: 'enabled' }
    });

    res.json({
      success: true,
      summary: response.choices[0]?.message?.content
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error.message
    });
  }
});

initZAI().then(() => {
  app.listen(3000, () => {
    console.log('Video understanding API running on port 3000');
  });
});
```

### Next.js API Route

```javascript
// pages/api/video-understand.js
import ZAI from 'z-ai-web-dev-sdk';

let zaiInstance = null;

async function getZAI() {
  if (!zaiInstance) {
    zaiInstance = await ZAI.create();
  }
  return zaiInstance;
}

export default async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }

  try {
    const { videoUrl, prompt, enableThinking = false } = req.body;

    if (!videoUrl || !prompt) {
      return res.status(400).json({
        error: 'videoUrl and prompt are required'
      });
    }

    const zai = await getZAI();

    const response = await zai.chat.completions.createVision({
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: prompt },
            { type: 'video_url', video_url: { url: videoUrl } }
          ]
        }
      ],
      thinking: { type: enableThinking ? 'enabled' : 'disabled' }
    });

    res.status(200).json({
      success: true,
      analysis: response.choices[0]?.message?.content
    });
  } catch (error) {
    console.error('Error:', error);
    res.status(500).json({
      success: false,
      error: error.message
    });
  }
}
```

## Troubleshooting

**Issue**: "SDK must be used in backend"
- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code, never in client/browser code

**Issue**: Video not loading or being analyzed
- **Solution**: Verify that the video URL is accessible, returns the correct MIME type, and is in a supported format

**Issue**: Inaccurate temporal analysis
- **Solution**: Enable thinking mode for complex temporal reasoning, and make prompts more specific about time and sequence

**Issue**: Slow response times for videos
- **Solution**: Videos take longer to process than images; consider shorter clips or sampling for long videos

**Issue**: Missing details from video
- **Solution**: Be more specific in your prompt; ask about particular time segments or aspects

**Issue**: Video format not supported
- **Solution**: Convert the video to MP4 (the most widely supported format) and check that the URL returns a proper video MIME type

## Remember

- Always use z-ai-web-dev-sdk in backend code only
- The SDK is already installed - import as shown in examples
- Use `video_url` content type for video files
- Video analysis takes longer than image analysis - be patient
- Enable thinking mode for complex temporal reasoning and event detection
- Structure prompts to include temporal information (beginning, middle, end)
- Handle errors gracefully in production
- Implement rate limiting and delays for batch processing
- Validate and sanitize user inputs
- Consider privacy and security when processing user videos
- For very long videos, consider analyzing specific segments or key frames
41
skills/video-understand/scripts/video-understand.ts
Executable file
@@ -0,0 +1,41 @@
import ZAI, { VisionMessage } from 'z-ai-web-dev-sdk';

async function main(videoUrl: string, prompt: string) {
  try {
    const zai = await ZAI.create();

    const messages: VisionMessage[] = [
      {
        role: 'assistant',
        content: [
          { type: 'text', text: 'Output only text, no markdown.' }
        ]
      },
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'video_url', video_url: { url: videoUrl } }
        ]
      }
    ];

    const response = await zai.chat.completions.createVision({
      model: 'glm-4.6v',
      messages,
      thinking: { type: 'disabled' }
    });

    const reply = response.choices?.[0]?.message?.content;
    console.log('Video Understanding Result:');
    console.log(reply ?? JSON.stringify(response, null, 2));
  } catch (err: any) {
    console.error('Video understanding failed:', err?.message || err);
  }
}

// Example usage - analyze a video
main(
  "https://example.com/sample-video.mp4",
  "Please analyze this video and describe the main events, actions, and key moments in chronological order."
);