- Context Compaction System with token counting and summarization - Deterministic State Machine for flow control (no LLM decisions) - Parallel Execution Engine (up to 12 concurrent sessions) - Event-Driven Coordination via Event Bus - Agent Workspace Isolation (tools, memory, identity, files) - YAML Workflow Integration (OpenClaw/Lobster compatible) - Claude Code integration layer - Complete demo UI with real-time visualization - Comprehensive documentation and README Components: - agent-system/: Context management, token counting, subagent spawning - pipeline-system/: State machine, parallel executor, event bus, workflows - skills/: AI capabilities (LLM, ASR, TTS, VLM, image generation, etc.) - src/app/: Next.js demo application Total: ~100KB of production-ready TypeScript code
917 lines
24 KiB
Markdown
Executable File
917 lines
24 KiB
Markdown
Executable File
---
|
|
name: video-understand
|
|
description: Implement specialized video understanding capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to analyze video content, understand motion and temporal sequences, extract information from video frames, describe video scenes, or perform video-based AI analysis. Optimized for MP4, AVI, MOV, and other common video formats.
|
|
license: MIT
|
|
---
|
|
|
|
# Video Understanding Skill
|
|
|
|
This skill provides specialized video understanding functionality using the z-ai-web-dev-sdk package, enabling AI models to analyze, describe, and extract information from video content including motion, temporal sequences, and scene changes.
|
|
|
|
## Skills Path
|
|
|
|
**Skill Location**: `{project_path}/skills/video-understand`
|
|
|
|
this skill is located at above path in your project.
|
|
|
|
**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/video-understand.ts` for a working example.
|
|
|
|
## Overview
|
|
|
|
Video Understanding focuses specifically on video content analysis, providing capabilities for:
|
|
- Video scene understanding and description
|
|
- Action and motion detection
|
|
- Temporal sequence analysis
|
|
- Event detection in videos
|
|
- Video content summarization
|
|
- Scene change detection
|
|
- People and object tracking across frames
|
|
- Audio-visual content analysis (when applicable)
|
|
|
|
**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.
|
|
|
|
## Prerequisites
|
|
|
|
The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below.
|
|
|
|
## CLI Usage (For Simple Tasks)
|
|
|
|
For quick video analysis tasks, you can use the z-ai CLI instead of writing code. This is ideal for simple video descriptions, testing, or automation.
|
|
|
|
### Basic Video Analysis
|
|
|
|
```bash
|
|
# Analyze a video from URL
|
|
z-ai vision --prompt "Summarize what happens in this video" --image "https://example.com/video.mp4"
|
|
|
|
# Note: Use --image flag for video URLs as well
|
|
z-ai vision -p "Describe the key events" -i "https://example.com/presentation.mp4"
|
|
```
|
|
|
|
### Analyze Local Videos
|
|
|
|
```bash
|
|
# Analyze a local video file
|
|
z-ai vision -p "What activities are shown in this video?" -i "./recording.mp4"
|
|
|
|
# Save response to file
|
|
z-ai vision -p "Provide a detailed summary" -i "./meeting.mp4" -o summary.json
|
|
```
|
|
|
|
### Advanced Video Analysis
|
|
|
|
```bash
|
|
# Complex scene understanding with thinking
|
|
z-ai vision \
|
|
-p "Analyze this video and identify: 1) Main events, 2) People and their actions, 3) Timeline of key moments" \
|
|
-i "./event.mp4" \
|
|
--thinking \
|
|
-o analysis.json
|
|
|
|
# Action detection
|
|
z-ai vision \
|
|
-p "Identify all actions performed by people in this video" \
|
|
-i "./sports.mp4" \
|
|
--thinking
|
|
```
|
|
|
|
### Streaming Output
|
|
|
|
```bash
|
|
# Stream the video analysis
|
|
z-ai vision -p "Describe this video content" -i "./video.mp4" --stream
|
|
```
|
|
|
|
### CLI Parameters
|
|
|
|
- `--prompt, -p <text>`: **Required** - Question or instruction about the video
|
|
- `--image, -i <URL or path>`: Optional - Video URL or local file path (despite the name, it works for videos too)
|
|
- `--thinking, -t`: Optional - Enable chain-of-thought reasoning for complex analysis (default: disabled)
|
|
- `--output, -o <path>`: Optional - Output file path (JSON format)
|
|
- `--stream`: Optional - Stream the response in real-time
|
|
|
|
### Supported Video Formats
|
|
|
|
- MP4 (.mp4) - Most widely supported format
|
|
- AVI (.avi) - Audio Video Interleave
|
|
- MOV (.mov) - QuickTime format
|
|
- WebM (.webm) - Web-optimized format
|
|
- MKV (.mkv) - Matroska format
|
|
- FLV (.flv) - Flash Video format
|
|
|
|
### When to Use CLI vs SDK
|
|
|
|
**Use CLI for:**
|
|
- Quick video summaries
|
|
- One-off video analysis
|
|
- Testing video understanding capabilities
|
|
- Simple automation scripts
|
|
- Generating video descriptions
|
|
|
|
**Use SDK for:**
|
|
- Multi-turn conversations about videos
|
|
- Complex video processing pipelines
|
|
- Production applications with error handling
|
|
- Custom integration with video processing logic
|
|
- Batch video processing with custom workflows
|
|
|
|
## Recommended Approach
|
|
|
|
For better performance and reliability with local videos, consider:
|
|
1. Uploading videos to a CDN and using URLs
|
|
2. For shorter videos, convert key frames to images for faster analysis
|
|
3. For long videos, consider chunking or sampling at intervals
|
|
|
|
## Basic Video Understanding Implementation
|
|
|
|
### Single Video Analysis
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function analyzeVideo(videoUrl, prompt) {
|
|
const zai = await ZAI.create();
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{
|
|
type: 'text',
|
|
text: prompt
|
|
},
|
|
{
|
|
type: 'video_url',
|
|
video_url: {
|
|
url: videoUrl
|
|
}
|
|
}
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'disabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
|
|
// Usage examples
|
|
const summary = await analyzeVideo(
|
|
'https://example.com/presentation.mp4',
|
|
'Summarize the key points presented in this video'
|
|
);
|
|
|
|
const actionDetection = await analyzeVideo(
|
|
'https://example.com/sports.mp4',
|
|
'Identify and describe all athletic actions performed in this video'
|
|
);
|
|
```
|
|
|
|
### Video Scene Understanding
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function understandVideoScenes(videoUrl) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = `Analyze this video and provide:
|
|
1. Overall summary of the video content
|
|
2. Main scenes or segments (with approximate timestamps if possible)
|
|
3. Key people or characters and their roles
|
|
4. Important actions or events in chronological order
|
|
5. Setting and environment description
|
|
6. Overall mood or tone`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' } // Enable for detailed analysis
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
|
|
// Usage
|
|
const sceneAnalysis = await understandVideoScenes(
|
|
'https://example.com/documentary.mp4'
|
|
);
|
|
```
|
|
|
|
### Motion and Action Detection
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function detectActions(videoUrl, specificAction = null) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = specificAction
|
|
? `Identify all instances of "${specificAction}" in this video. For each instance, describe when it occurs and provide details about how it's performed.`
|
|
: 'Identify and describe all significant actions and movements in this video. Include who is performing them and when they occur.';
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
|
|
// Usage
|
|
const runningActions = await detectActions(
|
|
'https://example.com/sports.mp4',
|
|
'running'
|
|
);
|
|
|
|
const allActions = await detectActions(
|
|
'https://example.com/activity.mp4'
|
|
);
|
|
```
|
|
|
|
### Event Timeline Extraction
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function extractTimeline(videoUrl) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = `Create a detailed timeline of events in this video:
|
|
- Identify key moments and transitions
|
|
- Note approximate timing (beginning, middle, end or specific timestamps if visible)
|
|
- Describe what happens at each key point
|
|
- Identify any cause-and-effect relationships between events
|
|
|
|
Format as a chronological list.`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
```
|
|
|
|
### Video Content Classification
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function classifyVideo(videoUrl) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = `Classify this video content:
|
|
1. Primary category (e.g., educational, entertainment, sports, news, tutorial)
|
|
2. Sub-category or genre
|
|
3. Target audience
|
|
4. Content style (professional, casual, documentary, etc.)
|
|
5. Key themes or topics
|
|
6. Suggested tags (10-15 keywords)
|
|
|
|
Format your response as structured JSON.`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'disabled' }
|
|
});
|
|
|
|
const content = response.choices[0]?.message?.content;
|
|
|
|
try {
|
|
return JSON.parse(content);
|
|
} catch (e) {
|
|
return { rawResponse: content };
|
|
}
|
|
}
|
|
```
|
|
|
|
## Advanced Use Cases
|
|
|
|
### Multi-turn Video Conversation
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
class VideoConversation {
|
|
constructor() {
|
|
this.messages = [];
|
|
}
|
|
|
|
async initialize() {
|
|
this.zai = await ZAI.create();
|
|
}
|
|
|
|
async loadVideo(videoUrl, initialQuestion) {
|
|
this.messages.push({
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: initialQuestion },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
});
|
|
|
|
return this.getResponse();
|
|
}
|
|
|
|
async askFollowUp(question) {
|
|
this.messages.push({
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: question }
|
|
]
|
|
});
|
|
|
|
return this.getResponse();
|
|
}
|
|
|
|
async getResponse() {
|
|
const response = await this.zai.chat.completions.createVision({
|
|
messages: this.messages,
|
|
thinking: { type: 'disabled' }
|
|
});
|
|
|
|
const assistantMessage = response.choices[0]?.message?.content;
|
|
|
|
this.messages.push({
|
|
role: 'assistant',
|
|
content: assistantMessage
|
|
});
|
|
|
|
return assistantMessage;
|
|
}
|
|
}
|
|
|
|
// Usage
|
|
const conversation = new VideoConversation();
|
|
await conversation.initialize();
|
|
|
|
const initial = await conversation.loadVideo(
|
|
'https://example.com/lecture.mp4',
|
|
'What is the main topic of this lecture?'
|
|
);
|
|
|
|
const followup1 = await conversation.askFollowUp(
|
|
'Can you explain the key concepts mentioned?'
|
|
);
|
|
|
|
const followup2 = await conversation.askFollowUp(
|
|
'What examples were used to illustrate these concepts?'
|
|
);
|
|
```
|
|
|
|
### Video Quality Assessment
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function assessVideoQuality(videoUrl) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = `Assess the quality of this video:
|
|
1. Visual quality (resolution, clarity, lighting) - Rate 1-10
|
|
2. Audio quality (if audio is present) - Rate 1-10
|
|
3. Camera work (stability, framing, composition) - Rate 1-10
|
|
4. Production value (editing, transitions, effects) - Rate 1-10
|
|
5. Content clarity (is the message clear?) - Rate 1-10
|
|
6. Pacing (too fast, too slow, just right)
|
|
7. Technical issues (artifacts, blur, audio sync, etc.)
|
|
8. Overall rating - 1-10
|
|
9. Specific recommendations for improvement
|
|
|
|
Provide detailed feedback for each criterion.`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
```
|
|
|
|
### Video Content Moderation
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function moderateVideo(videoUrl) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = `Review this video for content moderation:
|
|
1. Check for any inappropriate or sensitive content
|
|
2. Identify any potential safety concerns
|
|
3. Note any content that might violate common community guidelines
|
|
4. Assess age-appropriateness
|
|
5. Identify any copyrighted material visible (logos, brands, music)
|
|
6. Overall safety rating: Safe / Caution / Review Required
|
|
|
|
Provide specific examples for any concerns identified.`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
```
|
|
|
|
### Video Transcript Generation (Visual Description)
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function generateVisualTranscript(videoUrl) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = `Generate a detailed visual transcript of this video:
|
|
- Describe what's happening in each scene
|
|
- Note any text that appears on screen
|
|
- Describe important visual elements
|
|
- Mention any scene changes or transitions
|
|
- Include descriptions of people's actions and expressions
|
|
|
|
Format as a time-based narrative (e.g., "At the beginning...", "Then...", "Finally...").`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'disabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
```
|
|
|
|
### Sports Video Analysis
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function analyzeSportsVideo(videoUrl, sport = null) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = sport
|
|
? `Analyze this ${sport} video in detail:
|
|
1. Identify players and their positions
|
|
2. Describe key plays and strategies
|
|
3. Note scoring events or important moments
|
|
4. Assess player performance
|
|
5. Identify any rule violations or fouls
|
|
6. Describe the pace and flow of the game`
|
|
: `Analyze this sports video:
|
|
1. Identify the sport being played
|
|
2. Describe the key actions and plays
|
|
3. Note any scoring or significant events
|
|
4. Describe player movements and strategies
|
|
5. Overall assessment of the game or match`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
```
|
|
|
|
### Educational Video Summarization
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
async function summarizeEducationalVideo(videoUrl) {
|
|
const zai = await ZAI.create();
|
|
|
|
const prompt = `Summarize this educational video for students:
|
|
1. Main topic or learning objective
|
|
2. Key concepts explained (in order)
|
|
3. Important definitions or terminology
|
|
4. Examples used to illustrate concepts
|
|
5. Visual aids or demonstrations shown
|
|
6. Key takeaways or conclusions
|
|
7. Suggested review points
|
|
|
|
Format as a study guide.`;
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
```
|
|
|
|
## Batch Video Processing
|
|
|
|
### Process Multiple Videos
|
|
|
|
```javascript
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
class VideoBatchProcessor {
|
|
constructor() {
|
|
this.zai = null;
|
|
}
|
|
|
|
async initialize() {
|
|
this.zai = await ZAI.create();
|
|
}
|
|
|
|
async processVideo(videoUrl, prompt) {
|
|
const response = await this.zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'disabled' }
|
|
});
|
|
|
|
return response.choices[0]?.message?.content;
|
|
}
|
|
|
|
async processBatch(videoUrls, prompt) {
|
|
const results = [];
|
|
|
|
for (const videoUrl of videoUrls) {
|
|
try {
|
|
console.log(`Processing: ${videoUrl}`);
|
|
const result = await this.processVideo(videoUrl, prompt);
|
|
results.push({ videoUrl, success: true, result });
|
|
|
|
// Add delay to avoid rate limiting
|
|
await new Promise(resolve => setTimeout(resolve, 1000));
|
|
} catch (error) {
|
|
results.push({
|
|
videoUrl,
|
|
success: false,
|
|
error: error.message
|
|
});
|
|
}
|
|
}
|
|
|
|
return results;
|
|
}
|
|
}
|
|
|
|
// Usage
|
|
const processor = new VideoBatchProcessor();
|
|
await processor.initialize();
|
|
|
|
const videos = [
|
|
'https://example.com/video1.mp4',
|
|
'https://example.com/video2.mp4',
|
|
'https://example.com/video3.mp4'
|
|
];
|
|
|
|
const results = await processor.processBatch(
|
|
videos,
|
|
'Provide a brief summary of this video suitable for a content catalog'
|
|
);
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
### 1. Video Preparation
|
|
- Use standard video formats (MP4, MOV, AVI)
|
|
- Ensure videos are accessible via public URLs or properly encoded
|
|
- For long videos, consider creating shorter clips for specific analysis
|
|
- Optimize video size for faster processing
|
|
- Ensure good lighting and audio quality in source videos
|
|
|
|
### 2. Prompt Engineering for Videos
|
|
- Be specific about temporal aspects ("beginning", "throughout", "at the end")
|
|
- Mention what type of analysis you need (actions, events, scenes, etc.)
|
|
- For long videos, ask for summaries or key moments
|
|
- Use thinking mode for complex temporal reasoning
|
|
- Specify if you need chronological or thematic organization
|
|
|
|
### 3. Error Handling
|
|
|
|
```javascript
|
|
async function safeVideoAnalysis(videoUrl, prompt) {
|
|
try {
|
|
const zai = await ZAI.create();
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'disabled' }
|
|
});
|
|
|
|
return {
|
|
success: true,
|
|
content: response.choices[0]?.message?.content
|
|
};
|
|
} catch (error) {
|
|
console.error('Video analysis error:', error);
|
|
return {
|
|
success: false,
|
|
error: error.message
|
|
};
|
|
}
|
|
}
|
|
```
|
|
|
|
### 4. Performance Optimization
|
|
- Cache SDK instance for batch processing
|
|
- Implement request throttling (add delays between requests)
|
|
- Process videos asynchronously when possible
|
|
- For very long videos, consider analyzing at specific intervals
|
|
- Use appropriate thinking mode (disabled for simple descriptions, enabled for complex analysis)
|
|
|
|
### 5. Security Considerations
|
|
- Validate video URLs before processing
|
|
- Implement rate limiting for public APIs
|
|
- Sanitize user-provided video URLs
|
|
- Never expose SDK credentials in client-side code
|
|
- Implement content moderation for user-uploaded videos
|
|
- Consider video file size limits
|
|
|
|
## Common Use Cases
|
|
|
|
1. **Content Moderation**: Automatically review video uploads for policy compliance
|
|
2. **Video Cataloging**: Generate descriptions and tags for video libraries
|
|
3. **Sports Analysis**: Analyze games, identify plays, assess performance
|
|
4. **Educational Content**: Summarize lectures, create study guides
|
|
5. **Security & Surveillance**: Detect events, track activities (with appropriate authorization)
|
|
6. **Quality Control**: Assess video production quality
|
|
7. **Social Media**: Generate video captions and descriptions
|
|
8. **Training & Documentation**: Analyze training videos, create documentation
|
|
9. **Event Recording**: Summarize meetings, conferences, presentations
|
|
10. **Entertainment**: Analyze films, shows for content, themes, scenes
|
|
|
|
## Integration Examples
|
|
|
|
### Express.js API Endpoint
|
|
|
|
```javascript
|
|
import express from 'express';
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
const app = express();
|
|
app.use(express.json());
|
|
|
|
let zaiInstance;
|
|
|
|
async function initZAI() {
|
|
zaiInstance = await ZAI.create();
|
|
}
|
|
|
|
// Analyze video from URL
|
|
app.post('/api/analyze-video', async (req, res) => {
|
|
try {
|
|
const { videoUrl, prompt } = req.body;
|
|
|
|
if (!videoUrl || !prompt) {
|
|
return res.status(400).json({
|
|
error: 'videoUrl and prompt are required'
|
|
});
|
|
}
|
|
|
|
const response = await zaiInstance.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'disabled' }
|
|
});
|
|
|
|
res.json({
|
|
success: true,
|
|
analysis: response.choices[0]?.message?.content
|
|
});
|
|
} catch (error) {
|
|
res.status(500).json({
|
|
success: false,
|
|
error: error.message
|
|
});
|
|
}
|
|
});
|
|
|
|
// Get video summary
|
|
app.post('/api/video-summary', async (req, res) => {
|
|
try {
|
|
const { videoUrl } = req.body;
|
|
|
|
if (!videoUrl) {
|
|
return res.status(400).json({ error: 'videoUrl is required' });
|
|
}
|
|
|
|
const prompt = 'Provide a comprehensive summary of this video including: 1) Main content/topic, 2) Key events in chronological order, 3) Important people or subjects, 4) Overall takeaway.';
|
|
|
|
const response = await zaiInstance.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: 'enabled' }
|
|
});
|
|
|
|
res.json({
|
|
success: true,
|
|
summary: response.choices[0]?.message?.content
|
|
});
|
|
} catch (error) {
|
|
res.status(500).json({
|
|
success: false,
|
|
error: error.message
|
|
});
|
|
}
|
|
});
|
|
|
|
initZAI().then(() => {
|
|
app.listen(3000, () => {
|
|
console.log('Video understanding API running on port 3000');
|
|
});
|
|
});
|
|
```
|
|
|
|
### Next.js API Route
|
|
|
|
```javascript
|
|
// pages/api/video-understand.js
|
|
import ZAI from 'z-ai-web-dev-sdk';
|
|
|
|
let zaiInstance = null;
|
|
|
|
async function getZAI() {
|
|
if (!zaiInstance) {
|
|
zaiInstance = await ZAI.create();
|
|
}
|
|
return zaiInstance;
|
|
}
|
|
|
|
export default async function handler(req, res) {
|
|
if (req.method !== 'POST') {
|
|
return res.status(405).json({ error: 'Method not allowed' });
|
|
}
|
|
|
|
try {
|
|
const { videoUrl, prompt, enableThinking = false } = req.body;
|
|
|
|
if (!videoUrl || !prompt) {
|
|
return res.status(400).json({
|
|
error: 'videoUrl and prompt are required'
|
|
});
|
|
}
|
|
|
|
const zai = await getZAI();
|
|
|
|
const response = await zai.chat.completions.createVision({
|
|
messages: [
|
|
{
|
|
role: 'user',
|
|
content: [
|
|
{ type: 'text', text: prompt },
|
|
{ type: 'video_url', video_url: { url: videoUrl } }
|
|
]
|
|
}
|
|
],
|
|
thinking: { type: enableThinking ? 'enabled' : 'disabled' }
|
|
});
|
|
|
|
res.status(200).json({
|
|
success: true,
|
|
analysis: response.choices[0]?.message?.content
|
|
});
|
|
} catch (error) {
|
|
console.error('Error:', error);
|
|
res.status(500).json({
|
|
success: false,
|
|
error: error.message
|
|
});
|
|
}
|
|
}
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
**Issue**: "SDK must be used in backend"
|
|
- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code, never in client/browser code
|
|
|
|
**Issue**: Video not loading or being analyzed
|
|
- **Solution**: Verify the video URL is accessible, returns correct MIME type, and is in a supported format
|
|
|
|
**Issue**: Inaccurate temporal analysis
|
|
- **Solution**: Enable thinking mode for complex temporal reasoning, provide more specific prompts about time/sequence
|
|
|
|
**Issue**: Slow response times for videos
|
|
- **Solution**: Videos take longer to process than images; consider shorter clips or sampling for long videos
|
|
|
|
**Issue**: Missing details from video
|
|
- **Solution**: Be more specific in your prompt, ask about particular time segments or aspects
|
|
|
|
**Issue**: Video format not supported
|
|
- **Solution**: Convert video to MP4 (most widely supported), check that URL returns proper video MIME type
|
|
|
|
## Remember
|
|
|
|
- Always use z-ai-web-dev-sdk in backend code only
|
|
- The SDK is already installed - import as shown in examples
|
|
- Use `video_url` content type for video files
|
|
- Video analysis takes longer than image analysis - be patient
|
|
- Enable thinking mode for complex temporal reasoning and event detection
|
|
- Structure prompts to include temporal information (beginning, middle, end)
|
|
- Handle errors gracefully in production
|
|
- Implement rate limiting and delays for batch processing
|
|
- Validate and sanitize user inputs
|
|
- Consider privacy and security when processing user videos
|
|
- For very long videos, consider analyzing specific segments or key frames
|