diff --git a/.gitignore b/.gitignore
index a5d008a..64610ba 100755
--- a/.gitignore
+++ b/.gitignore
@@ -1,51 +1,51 @@
-# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+# Dependencies
+node_modules/
+.pnp
+.pnp.js
 
-# dependencies
-node_modules
-/.pnp
-.pnp.*
-.yarn/*
-!.yarn/patches
-!.yarn/plugins
-!.yarn/releases
-!.yarn/versions
+# Build outputs
+.next/
+out/
+build/
+dist/
 
-# testing
-/coverage
+# Environment files
+.env
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
 
-# next.js
-/.next/
-/out/
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
 
-# production
-/build
-
-# misc
+# OS files
 .DS_Store
-*.pem
+Thumbs.db
 
-# debug
+# Logs
+*.log
 npm-debug.log*
 yarn-debug.log*
 yarn-error.log*
-.pnpm-debug.log*
 
-# env files (can opt-in for committing if needed)
-.env*
+# Testing
+coverage/
+.nyc_output/
 
-# vercel
-.vercel
+# Misc
+.pipeline-data/
+.worktrees/
+agent-workspaces/
 
-# typescript
+# Download files (keep zips)
+download/article.json
+download/*.json
+!.gitkeep
+
+# Cache
+.cache/
 *.tsbuildinfo
-next-env.d.ts
-local-*
-.claude
-.z-ai-config
-dev.log
-test
-prompt
-
-server.log
-# Skills directory
-/skills/
\ No newline at end of file
diff --git a/README.md b/README.md
index 5a36639..7bf14ed 100755
--- a/README.md
+++ b/README.md
@@ -1,141 +1,493 @@
-# 🚀 Welcome to Z.ai Code Scaffold
+# 🤖 Agentic Compaction & Pipeline System
 
-A modern, production-ready web application scaffold powered by cutting-edge technologies, designed to accelerate your development with [Z.ai](https://chat.z.ai)'s AI-powered coding assistance.
+> **Complete open-source implementation of Claude Code-level architecture for deterministic multi-agent orchestration**
 
-## ✨ Technology Stack
-
-This scaffold provides a robust foundation built with:
-
-### 🎯 Core Framework
-- **⚡ Next.js 16** - The React framework for production with App Router
-- **📘 TypeScript 5** - Type-safe JavaScript for better developer experience
-- **🎨 Tailwind CSS 4** - Utility-first CSS framework for rapid UI development
-
-### 🧩 UI Components & Styling
-- **🧩 shadcn/ui** - High-quality, accessible components built on Radix UI
-- **🎯 Lucide React** - Beautiful & consistent icon library
-- **🌈 Framer Motion** - Production-ready motion library for React
-- **🎨 Next Themes** - Perfect dark mode in 2 lines of code
-
-### 📋 Forms & Validation
-- **🎣 React Hook Form** - Performant forms with easy validation
-- **✅ Zod** - TypeScript-first schema validation
-
-### 🔄 State Management & Data Fetching
-- **🐻 Zustand** - Simple, scalable state management
-- **🔄 TanStack Query** - Powerful data synchronization for React
-- **🌐 Fetch** - Promise-based HTTP request
-
-### 🗄️ Database & Backend
-- **🗄️ Prisma** - Next-generation TypeScript ORM
-- **🔐 NextAuth.js** - Complete open-source authentication solution
-
-### 🎨 Advanced UI Features
-- **📊 TanStack Table** - Headless UI for building tables and datagrids
-- **🖱️ DND Kit** - Modern drag and drop toolkit for React
-- **📊 Recharts** - Redefined chart library built with React and D3
-- **🖼️ Sharp** - High performance image processing
-
-### 🌍 Internationalization & Utilities
-- **🌍 Next Intl** - Internationalization library for Next.js
-- **📅 Date-fns** - Modern JavaScript date utility library
-- **🪝 ReactUse** - Collection of essential React hooks for modern development
-
-## 🎯 Why This Scaffold?
-
-- **🏎️ Fast Development** - Pre-configured tooling and best practices
-- **🎨 Beautiful UI** - Complete shadcn/ui component library with advanced interactions
-- **🔒 Type Safety** - Full TypeScript configuration with Zod validation
-- **📱 Responsive** - Mobile-first design principles with smooth animations
-- **🗄️ Database Ready** - Prisma ORM configured for rapid backend development
-- **🔐 Auth Included** - NextAuth.js for secure authentication flows
-- **📊 Data Visualization** - Charts, tables, and drag-and-drop functionality
-- **🌍 i18n Ready** - Multi-language support with Next Intl
-- **🚀 Production Ready** - Optimized build and deployment settings
-- **🤖 AI-Friendly** - Structured codebase perfect for AI assistance
-
-## 🚀 Quick Start
-
-```bash
-# Install dependencies
-bun install
-
-# Start development server
-bun run dev
-
-# Build for production
-bun run build
-
-# Start production server
-bun start
-```
-
-Open [http://localhost:3000](http://localhost:3000) to see your application running.
-
-## 🤖 Powered by Z.ai
-
-This scaffold is optimized for use with [Z.ai](https://chat.z.ai) - your AI assistant for:
-
-- **💻 Code Generation** - Generate components, pages, and features instantly
-- **🎨 UI Development** - Create beautiful interfaces with AI assistance
-- **🔧 Bug Fixing** - Identify and resolve issues with intelligent suggestions
-- **📝 Documentation** - Auto-generate comprehensive documentation
-- **🚀 Optimization** - Performance improvements and best practices
-
-Ready to build something amazing? Start chatting with Z.ai at [chat.z.ai](https://chat.z.ai) and experience the future of AI-powered development!
-
-## 📁 Project Structure
-
-```
-src/
-├── app/             # Next.js App Router pages
-├── components/      # Reusable React components
-│   └── ui/          # shadcn/ui components
-├── hooks/           # Custom React hooks
-└── lib/             # Utility functions and configurations
-```
-
-## 🎨 Available Features & Components
-
-This scaffold includes a comprehensive set of modern web development tools:
-
-### 🧩 UI Components (shadcn/ui)
-- **Layout**: Card, Separator, Aspect Ratio, Resizable Panels
-- **Forms**: Input, Textarea, Select, Checkbox, Radio Group, Switch
-- **Feedback**: Alert, Toast (Sonner), Progress, Skeleton
-- **Navigation**: Breadcrumb, Menubar, Navigation Menu, Pagination
-- **Overlay**: Dialog, Sheet, Popover, Tooltip, Hover Card
-- **Data Display**: Badge, Avatar, Calendar
-
-### 📊 Advanced Data Features
-- **Tables**: Powerful data tables with sorting, filtering, pagination (TanStack Table)
-- **Charts**: Beautiful visualizations with Recharts
-- **Forms**: Type-safe forms with React Hook Form + Zod validation
-
-### 🎨 Interactive Features
-- **Animations**: Smooth micro-interactions with Framer Motion
-- **Drag & Drop**: Modern drag-and-drop functionality with DND Kit
-- **Theme Switching**: Built-in dark/light mode support
-
-### 🔐 Backend Integration
-- **Authentication**: Ready-to-use auth flows with NextAuth.js
-- **Database**: Type-safe database operations with Prisma
-- **API Client**: HTTP requests with Fetch + TanStack Query
-- **State Management**: Simple and scalable with Zustand
-
-### 🌍 Production Features
-- **Internationalization**: Multi-language support with Next Intl
-- **Image Optimization**: Automatic image processing with Sharp
-- **Type Safety**: End-to-end TypeScript with Zod validation
-- **Essential Hooks**: 100+ useful React hooks with ReactUse for common patterns
-
-## 🤝 Get Started with Z.ai
-
-1. **Clone this scaffold** to jumpstart your project
-2. **Visit [chat.z.ai](https://chat.z.ai)** to access your AI coding assistant
-3. **Start building** with intelligent code generation and assistance
-4. **Deploy with confidence** using the production-ready setup
+A production-ready TypeScript implementation featuring:
+- **Context Compaction** - Intelligent conversation summarization and token management
+- **Deterministic Orchestration** - State machine controls flow, not LLM decisions
+- **Parallel Execution** - Up to 12 concurrent agent sessions
+- **Event-Driven Coordination** - Agents finish work → next step triggers automatically
 
 ---
 
-Built with ❤️ for the developer community. Supercharged by [Z.ai](https://chat.z.ai) 🚀
+## 📋 Table of Contents
+
+- [Features](#-features)
+- [Architecture](#-architecture)
+- [Quick Start](#-quick-start)
+- [Component Overview](#-component-overview)
+- [Usage Examples](#-usage-examples)
+- [API Reference](#-api-reference)
+- [Integration](#-integration)
+- [Download](#-download)
+
+---
+
+## ✨ Features
+
+### 1. Context Compaction System
+
+| Feature | Description |
+|---------|-------------|
+| Token Counting | Character-based approximation (~4 chars/token) |
+| Conversation Summarization | LLM-powered intelligent summarization |
+| Context Compaction | 4 strategies: sliding-window, summarize-old, priority-retention, hybrid |
+| Budget Management | Track and manage token budgets |
+
+### 2. Deterministic Pipeline System
+
+| Feature | Description |
+|---------|-------------|
+| State Machine | Deterministic flow control (no LLM decisions) |
+| Parallel Execution | 4 projects × 3 roles = 12 concurrent sessions |
+| Event Bus | Pub/sub coordination between agents |
+| Workspace Isolation | Per-agent tools, memory, identity, files |
+| YAML Workflows | OpenClaw/Lobster-compatible definitions |
+
+---
+
+## 🏗️ Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                      Pipeline Orchestrator                       │
+│                                                                  │
+│   ┌────────────────┐   ┌────────────────┐   ┌────────────────┐   │
+│   │ State Machine  │   │   Event Bus    │   │ Parallel Exec  │   │
+│   │(Deterministic) │   │ (Coordination) │   │ (Concurrency)  │   │
+│   └────────────────┘   └────────────────┘   └────────────────┘   │
+│                                 │                                │
+│   ┌─────────────────────────────┴────────────────────────────┐   │
+│   │                     Agent Workspaces                     │   │
+│   │                                                          │   │
+│   │    ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │   │
+│   │    │ Programmer  │   │  Reviewer   │   │   Tester    │   │   │
+│   │    │   (Opus)    │   │  (Sonnet)   │   │  (Sonnet)   │   │   │
+│   │    │             │   │             │   │             │   │   │
+│   │    │ • Tools     │   │ • Tools     │   │ • Tools     │   │   │
+│   │    │ • Memory    │   │ • Memory    │   │ • Memory    │   │   │
+│   │    │ • Workspace │   │ • Workspace │   │ • Workspace │   │   │
+│   │    └─────────────┘   └─────────────┘   └─────────────┘   │   │
+│   │                                                          │   │
+│   └──────────────────────────────────────────────────────────┘   │
+│                                                                  │
+└──────────────────────────────────────────────────────────────────┘
+                                  │
+                                  ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                      LLM Provider (ZAI SDK)                      │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 🚀 Quick Start
+
+### Prerequisites
+
+```bash
+# Install dependencies
+bun add z-ai-web-dev-sdk
+```
+
+### Basic Usage
+
+```typescript
+import {
+  PipelineOrchestrator,
+  runWorkflow,
+  ContextCompactor,
+  TokenCounter
+} from './pipeline-system';
+
+// Initialize orchestrator
+const orchestrator = new PipelineOrchestrator();
+await orchestrator.initialize();
+
+// Create a code pipeline
+const pipelineId = await orchestrator.createPipeline({
+  name: 'Multi-Project Pipeline',
+  projects: [
+    {
+      id: 'project-1',
+      name: 'Authentication Module',
+      tasks: [
+        { type: 'implement', description: 'Create auth module', role: 'programmer' },
+        { type: 'review', description: 'Review code', role: 'reviewer' },
+        { type: 'test', description: 'Test implementation', role: 'tester' }
+      ]
+    }
+  ],
+  roles: ['programmer', 'reviewer', 'tester'],
+  maxConcurrency: 12
+});
+
+// Or run a predefined workflow
+const workflowId = await runWorkflow('code-pipeline', {
+  projectId: 'my-project'
+});
+```
+
+---
+
+## 📦 Component Overview
+
+### Agent System (`agent-system/`)
+
+```
+agent-system/
+├── core/
+│   ├── orchestrator.ts       # Agent lifecycle management
+│   ├── token-counter.ts      # Token counting & budgeting
+│   ├── context-manager.ts    # Context compaction logic
+│   ├── subagent-spawner.ts   # Subagent creation & management
+│   └── summarizer.ts         # LLM-powered summarization
+├── agents/
+│   ├── base-agent.ts         # Base agent class
+│   └── task-agent.ts         # Task-specific agent
+├── storage/
+│   └── memory-store.ts       # Persistent storage
+└── index.ts                  # Main exports
+```
+
+### Pipeline System (`pipeline-system/`)
+
+```
+pipeline-system/
+├── core/
+│   └── state-machine.ts      # Deterministic state machine
+├── engine/
+│   └── parallel-executor.ts  # Parallel execution engine
+├── events/
+│   └── event-bus.ts          # Event-driven coordination
+├── workspace/
+│   └── agent-workspace.ts    # Isolated agent workspaces
+├── workflows/
+│   └── yaml-workflow.ts      # YAML workflow parser
+├── integrations/
+│   └── claude-code.ts        # Claude Code integration
+└── index.ts                  # Main exports
+```
+
+---
+
+## 💡 Usage Examples
+
+### Token Counting
+
+```typescript
+import { TokenCounter } from './agent-system';
+
+const counter = new TokenCounter(128000);
+
+// Count tokens in text
+const result = counter.countText("Hello, world!");
+console.log(result); // { tokens: 3, characters: 13, words: 2 }
+
+// Count conversation
+const conversation = [
+  { role: 'user', content: 'Hello!' },
+  { role: 'assistant', content: 'Hi there!' }
+];
+const budget = counter.getBudget(counter.countConversation(conversation).total);
+console.log(budget);
+// { used: 15, remaining: 123985, total: 124000, percentageUsed: 0.01 }
+```
+
+### Context Compaction
+
+```typescript
+import { ContextCompactor } from './agent-system';
+
+const compactor = new ContextCompactor({
+  maxTokens: 120000,
+  strategy: 'hybrid',
+  preserveRecentCount: 6
+});
+
+// Check if compaction needed
+if (compactor.needsCompaction(messages)) {
+  const result = await compactor.compact(messages);
+  console.log(`Saved ${result.tokensSaved} tokens`);
+  console.log(`Compression: ${(result.compressionRatio * 100).toFixed(1)}%`);
+}
+```
+
+### State Machine
+
+```typescript
+import { DeterministicStateMachine } from './pipeline-system';
+
+const definition = {
+  id: 'code-pipeline',
+  name: 'Code Pipeline',
+  initial: 'start',
+  states: {
+    start: { type: 'start', onExit: [{ event: 'start', target: 'code' }] },
+    code: {
+      type: 'action',
+      agent: 'programmer',
+      onExit: [
+        { event: 'completed', target: 'review' },
+        { event: 'failed', target: 'failed' }
+      ]
+    },
+    review: {
+      type: 'choice',
+      onExit: [
+        { event: 'approved', target: 'end', condition: { type: 'equals', field: 'approved', value: true } },
+        { event: 'rejected', target: 'code' }
+      ]
+    },
+    end: { type: 'end' },
+    failed: { type: 'end' }
+  }
+};
+
+const sm = new DeterministicStateMachine(definition);
+sm.start();
+sm.sendEvent({ type: 'start', source: 'user', payload: {} });
+```
+
+### Parallel Execution
+
+```typescript
+import { ParallelExecutionEngine } from './pipeline-system';
+
+const executor = new ParallelExecutionEngine({
+  maxWorkers: 4,
+  maxConcurrentPerWorker: 3
+});
+
+executor.start();
+
+// Submit parallel tasks
+const tasks = executor.submitBatch([
+  { projectId: 'p1', role: 'programmer', type: 'implement', description: 'Auth', priority: 'high' },
+  { projectId: 'p2', role: 'programmer', type: 'implement', description: 'Payment', priority: 'high' },
+  { projectId: 'p3', role: 'programmer', type: 'implement', description: 'Dashboard', priority: 'medium' }
+]);
+```
+
+### Event Bus
+
+```typescript
+import { EventBus, PipelineEventTypes } from './pipeline-system';
+
+const eventBus = new EventBus();
+eventBus.start();
+
+// Subscribe to events
+eventBus.subscribe({
+  eventType: PipelineEventTypes.AGENT_COMPLETED,
+  handler: async (event) => {
+    console.log('Agent completed:', event.payload);
+    // Trigger next step
+    eventBus.publish({
+      type: PipelineEventTypes.TASK_STARTED,
+      source: 'orchestrator',
+      payload: { nextAgent: 'reviewer' }
+    });
+  }
+});
+```
+
+---
+
+## 📚 API Reference
+
+### PipelineOrchestrator
+
+```typescript
+class PipelineOrchestrator {
+  // Initialize the system
+  async initialize(): Promise
+
+  // Create a pipeline
+  async createPipeline(config: PipelineConfig): Promise
+
+  // Create from YAML workflow
+  async createPipelineFromYAML(workflowId: string, context?: object): Promise
+
+  // Get pipeline status
+  getPipelineStatus(pipelineId: string): PipelineResult | undefined
+
+  // Cancel pipeline
+  async cancelPipeline(pipelineId: string): Promise
+
+  // Subscribe to events
+  onEvent(eventType: string, handler: Function): () => void
+
+  // Get statistics
+  getStats(): object
+
+  // Shutdown
+  async shutdown(): Promise
+}
+```
+
+### Quick Start Functions
+
+```typescript
+// Create simple code pipeline
+createCodePipeline(projects: ProjectConfig[]): Promise
+
+// Create parallel pipeline
+createParallelPipeline(config: PipelineConfig): Promise
+
+// Run predefined workflow
+runWorkflow(workflowId: string, context?: object): Promise
+```
+
+---
+
+## 🔗 Integration
+
+### With Claude Code
+
+```typescript
+import { PipelineOrchestrator } from './pipeline-system';
+
+const orchestrator = new PipelineOrchestrator();
+await orchestrator.initialize();
+
+// Use in Claude Code environment
+const pipelineId = await orchestrator.createPipeline({
+  name: 'Claude Code Pipeline',
+  projects: [
+    {
+      id: 'claude-project',
+      name: 'Claude Integration',
+      tasks: [
+        { type: 'implement', description: 'Add MCP server', role: 'programmer' },
+        { type: 'review', description: 'Review changes', role: 'reviewer' },
+        { type: 'test', description: 'Test integration', role: 'tester' }
+      ]
+    }
+  ]
+});
+```
+
+### With OpenClaw
+
+```typescript
+import { runWorkflow, WorkflowRegistry } from './pipeline-system';
+
+// Register custom workflow
+const registry = new WorkflowRegistry();
+registry.register({
+  id: 'custom-openclaw-workflow',
+  name: 'Custom Workflow',
+  initial: 'start',
+  states: { /* ... */ }
+});
+
+// Run workflow
+await runWorkflow('custom-openclaw-workflow', {
+  projectId: 'my-project'
+});
+```
+
+### Lobster-Compatible YAML
+
+```yaml
+id: code-pipeline
+name: Code Pipeline
+initial: start
+states:
+  start:
+    type: start
+    on:
+      start: code
+
+  code:
+    type: action
+    role: programmer
+    timeout: 30m
+    retry:
+      maxAttempts: 2
+      backoff: exponential
+    on:
+      completed: review
+      failed: failed
+
+  review:
+    type: choice
+    on:
+      approved: test
+      rejected: code
+
+  test:
+    type: action
+    role: tester
+    on:
+      passed: end
+      failed: failed
+
+  end:
+    type: end
+
+  failed:
+    type: end
+```
+
+---
+
+## 📥 Download
+
+Pre-built packages available:
+
+| Package | Size | Contents |
+|---------|------|----------|
+| `complete-agent-pipeline-system.zip` | 60KB | Both systems + docs |
+| `agent-system.zip` | 27KB | Context & memory management |
+| `pipeline-system.zip` | 29KB | Deterministic orchestration |
+
+---
+
+## 📋 Predefined Workflows
+
+| Workflow ID | Description |
+|-------------|-------------|
+| `code-pipeline` | Code → Review → Test (max 3 review iterations) |
+| `parallel-projects` | Run multiple projects in parallel |
+| `human-approval` | Workflow with human approval gates |
+
+---
+
+## 🎯 Key Principles
+
+1. **Deterministic Flow**: State machines control the pipeline, not LLM decisions
+2. **Event-Driven**: Agents communicate through events, enabling loose coupling
+3. **Parallel Execution**: Multiple agents work concurrently with resource isolation
+4. **Workspace Isolation**: Each agent has its own tools, memory, and file space
+5. **YAML Workflows**: Define pipelines declaratively, compatible with Lobster
+
+---
+
+## 📄 License
+
+MIT License - Free to use, modify, and distribute.
+
+---
+
+## 🤝 Contributing
+
+Contributions welcome! This system is designed for easy integration with:
+- Claude Code
+- OpenClaw
+- Lobster
+- Custom agent frameworks
+
+---
+
+## 📊 Project Statistics
+
+- **Total Files**: 32 source files
+- **Total Code**: ~100KB of TypeScript
+- **Components**: 6 major modules
+- **Predefined Workflows**: 3
+
+---
+
+Built with ❤️ for the AI agent community.
diff --git a/download/PIPELINE_README.md b/download/PIPELINE_README.md
new file mode 100644
index 0000000..c94e616
--- /dev/null
+++ b/download/PIPELINE_README.md
@@ -0,0 +1,311 @@
+# Deterministic Multi-Agent Pipeline System
+
+A comprehensive, open-source implementation of **Claude Code-level architecture** for building deterministic, parallel, event-driven multi-agent pipelines.
+
+## 🎯 Key Features
+
+| Feature | Description |
+|---------|-------------|
+| **Deterministic Orchestration** | State machine controls flow, not LLM decisions |
+| **Parallel Execution** | 4 projects × 3 roles = 12 concurrent agent sessions |
+| **Event-Driven Coordination** | Agents finish work → next step triggers automatically |
+| **Full Agent Capabilities** | Each agent gets tools, memory, identity, workspace |
+| **YAML Workflow Support** | OpenClaw/Lobster-compatible workflow definitions |
+
+## 📦 Package Contents
+
+```
+pipeline-system/
+├── core/
+│   └── state-machine.ts      # Deterministic state machine engine
+├── engine/
+│   └── parallel-executor.ts  # Parallel execution with worker pools
+├── events/
+│   └── event-bus.ts          # Event-driven coordination system
+├── workspace/
+│   └── agent-workspace.ts    # Isolated agent workspaces
+├── workflows/
+│   └── yaml-workflow.ts      # YAML workflow parser (Lobster-compatible)
+├── integrations/
+│   └── claude-code.ts        # Claude Code integration layer
+└── index.ts                  # Main exports
+```
+
+## 🚀 Quick Start
+
+### Installation
+
+```bash
+bun add z-ai-web-dev-sdk
+```
+
+Copy `pipeline-system/` to your project.
+
+### Basic Usage
+
+```typescript
+import { PipelineOrchestrator, runWorkflow } from './pipeline-system';
+
+// Option 1: Create a code pipeline
+const orchestrator = new PipelineOrchestrator();
+await orchestrator.initialize();
+
+const pipelineId = await orchestrator.createPipeline({
+  name: 'Code Pipeline',
+  projects: [
+    {
+      id: 'project-1',
+      name: 'Authentication Module',
+      tasks: [
+        { type: 'implement', description: 'Create auth module', role: 'programmer' },
+        { type: 'review', description: 'Review code', role: 'reviewer' },
+        { type: 'test', description: 'Test implementation', role: 'tester' }
+      ]
+    }
+  ],
+  roles: ['programmer', 'reviewer', 'tester'],
+  maxConcurrency: 12
+});
+
+// Option 2: Run predefined workflow
+const workflowId = await runWorkflow('code-pipeline', {
+  projectId: 'my-project',
+  requirements: 'Build REST API'
+});
+
+// Subscribe to events
+orchestrator.onEvent('agent.completed', (event) => {
+  console.log('Agent completed:', event.payload);
+});
+```
+
+## 📐 Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                      Pipeline Orchestrator                       │
+│   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐    │
+│   │ State Machine │    │   Event Bus   │    │ Parallel Exec │    │
+│   │(Deterministic)│    │(Coordination) │    │ (Concurrency) │    │
+│   └───────────────┘    └───────────────┘    └───────────────┘    │
+│           │                    │                    │            │
+│   ┌───────┴────────────────────┴────────────────────┴───────┐    │
+│   │                    Agent Workspaces                     │    │
+│   │     ┌────────────┐   ┌────────────┐   ┌────────────┐    │    │
+│   │     │ Programmer │   │  Reviewer  │   │   Tester   │    │    │
+│   │     │ • Tools    │   │ • Tools    │   │ • Tools    │    │    │
+│   │     │ • Memory   │   │ • Memory   │   │ • Memory   │    │    │
+│   │     │ • Files    │   │ • Files    │   │ • Files    │    │    │
+│   │     └────────────┘   └────────────┘   └────────────┘    │    │
+│   └─────────────────────────────────────────────────────────┘    │
+└──────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                      LLM Provider (ZAI SDK)                      │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+## 🔄 State Machine
+
+### Define States
+
+```typescript
+const definition: StateMachineDefinition = {
+  id: 'code-pipeline',
+  name: 'Code Pipeline',
+  initial: 'start',
+  states: {
+    start: {
+      id: 'start',
+      name: 'Start',
+      type: 'start',
+      onExit: [{ event: 'start', target: 'code' }]
+    },
+    code: {
+      id: 'code',
+      name: 'Code',
+      type: 'action',
+      agent: 'programmer',
+      timeout: 300000,
+      retry: { maxAttempts: 2, backoff: 'exponential' },
+      onExit: [
+        { event: 'completed', target: 'review' },
+        { event: 'failed', target: 'failed' }
+      ]
+    },
+    review: {
+      id: 'review',
+      name: 'Review',
+      type: 'choice',
+      onExit: [
+        { event: 'approved', target: 'test', condition: { type: 'equals', field: 'approved', value: true } },
+        { event: 'rejected', target: 'code' }
+      ]
+    },
+    test: {
+      id: 'test',
+      name: 'Test',
+      type: 'action',
+      agent: 'tester',
+      onExit: [
+        { event: 'passed', target: 'end' },
+        { event: 'failed', target: 'failed' }
+      ]
+    },
+    end: { id: 'end', name: 'End', type: 'end' },
+    failed: { id: 'failed', name: 'Failed', type: 'end' }
+  }
+};
+```
+
+## ⚡ Parallel Execution
+
+```typescript
+const executor = new ParallelExecutionEngine({
+  maxWorkers: 4,
+  maxConcurrentPerWorker: 3,
+  taskTimeout: 300000
+});
+
+executor.start();
+
+// Submit parallel tasks
+const tasks = executor.submitBatch([
+  { projectId: 'p1', role: 'programmer', type: 'implement', description: 'Auth module', priority: 'high' },
+  { projectId: 'p2', role: 'programmer', type: 'implement', description: 'Payment module', priority: 'high' },
+  { projectId: 'p3', role: 'programmer', type: 'implement', description: 'Dashboard', priority: 'medium' },
+  { projectId: 'p4', role: 'programmer', type: 'implement', description: 'API service', priority: 'medium' }
+]);
+```
+
+## 📨 Event Bus
+
+```typescript
+const eventBus = new EventBus();
+
+// Subscribe to events
+eventBus.subscribe({
+  eventType: 'code.written',
+  handler: async (event) => {
+    console.log('Code written:', event.payload);
+    // Trigger review
+    eventBus.publish({
+      type: 'review.start',
+      source: 'coordinator',
+      payload: { projectId: event.payload.projectId }
+    });
+  }
+});
+
+// Publish events
+eventBus.publish({
+  type: 'code.written',
+  source: 'programmer-1',
+  payload: { projectId: 'p1', files: ['auth.ts', 'auth.test.ts'] }
+});
+```
+
+## 📁 YAML Workflows (Lobster-Compatible)
+
+```yaml
+id: code-pipeline
+name: Code Pipeline
+initial: start
+states:
+  start:
+    type: start
+    on:
+      start: code
+
+  code:
+    type: action
+    role: programmer
+    timeout: 30m
+    retry:
+      maxAttempts: 2
+      backoff: exponential
+    on:
+      completed: review
+      failed: failed
+
+  review:
+    type: choice
+    on:
+      approved: test
+      rejected: code
+
+  test:
+    type: action
+    role: tester
+    on:
+      passed: end
+      failed: test_failed
+
+  test_failed:
+    type: choice
+    on:
+      retry: code
+      abort: failed
+
+  end:
+    type: end
+
+  failed:
+    type: end
+```
+
+## 🤝 Integration with Claude Code & OpenClaw
+
+### Claude Code Integration
+
+```typescript
+import { PipelineOrchestrator } from './pipeline-system';
+
+const orchestrator = new PipelineOrchestrator();
+await orchestrator.initialize();
+
+// Create pipeline for Claude Code project
+const pipelineId = await orchestrator.createPipeline({
+  name: 'Claude Code Pipeline',
+  projects: [
+    {
+      id: 'claude-project',
+      name: 'Claude Integration',
+      tasks: [
+        { type: 'implement', description: 'Add MCP server', role: 'programmer' },
+        { type: 'review', description: 'Review changes', role: 'reviewer' },
+        { type: 'test', description: 'Test integration', role: 'tester' }
+      ]
+    }
+  ]
+});
+```
+
+### OpenClaw Integration
+
+```typescript
+import { runWorkflow } from './pipeline-system';
+
+// Run Lobster-compatible workflow
+const workflowId = await runWorkflow('parallel-projects', {
+  projects: ['project1', 'project2', 'project3', 'project4'],
+  roles: ['programmer', 'reviewer', 'tester']
+});
+```
+
+## 📊 Predefined Workflows
+
+| Workflow | Description |
+|----------|-------------|
+| `code-pipeline` | Code → Review → Test with max 3 review iterations |
+| `parallel-projects` | Run multiple projects in parallel |
+| `human-approval` | Workflow with human approval gates |
+
+## 📄 License
+
+MIT License - Free to use, modify, and distribute.
+
+## 🤝 Contributing
+
+Contributions welcome! This is designed for easy integration with Claude Code and OpenClaw.
diff --git a/download/README.md b/download/README.md
index 10906f8..001ad0c 100755
--- a/download/README.md
+++ b/download/README.md
@@ -1 +1,235 @@
-Here are all the generated files.
\ No newline at end of file
+# Agent System - Complete Implementation
+
+A comprehensive, open-source implementation of context compaction, agent orchestration, and subagent spawning.
+
+## 📦 Package Contents
+
+```
+agent-system/
+├── core/
+│   ├── token-counter.ts      # Token counting & management
+│   ├── summarizer.ts         # Conversation summarization (uses LLM)
+│   ├── context-manager.ts    # Context compaction logic
+│   ├── orchestrator.ts       # Agent orchestration system
+│   └── subagent-spawner.ts   # Subagent spawning mechanism
+├── agents/
+│   ├── base-agent.ts         # Base agent class
+│   └── task-agent.ts         # Task-specific agent
+├── storage/
+│   └── memory-store.ts       # Persistent file storage
+├── utils/
+│   └── helpers.ts            # Utility functions
+└── index.ts                  # Main exports
+```
+
+## 🚀 Quick Start
+
+### Installation
+
+1. Copy the `agent-system` folder to your project
+2. Install dependencies:
Install dependencies: +```bash +bun add z-ai-web-dev-sdk +``` + +### Basic Usage + +```typescript +import { + TokenCounter, + ContextCompactor, + AgentOrchestrator, + SubagentSpawner, + createAgent +} from './agent-system'; + +// Token Counting +const counter = new TokenCounter(128000); +const result = counter.countText("Hello world"); +console.log(result.tokens); // ~3 tokens + +// Context Compaction +const compactor = new ContextCompactor({ + maxTokens: 100000, + strategy: 'hybrid' +}); + +if (compactor.needsCompaction(messages)) { + const compacted = await compactor.compact(messages); + console.log(`Saved ${compacted.tokensSaved} tokens`); +} + +// Agent Orchestration +const orchestrator = new AgentOrchestrator(); +orchestrator.registerAgent({ + id: 'worker-1', + name: 'Worker', + type: 'worker', + capabilities: ['process'], + maxConcurrentTasks: 3, + timeout: 60000 +}); +orchestrator.start(); + +// Subagent Spawning +const spawner = new SubagentSpawner(); +const result = await spawner.executeWithSubagent( + 'researcher', + 'Research AI agents' +); + +// Create Custom Agent +const agent = createAgent( + 'MyAgent', + 'You are a helpful assistant.' +); +await agent.initialize(); +const response = await agent.act('Hello!'); +``` + +## 📚 Components + +### 1. Token Counter +Estimates token counts using character-based approximation (~4 chars/token). + +```typescript +const counter = new TokenCounter(128000); // max tokens + +// Count text +counter.countText("text"); // { tokens, characters, words } + +// Count conversation +counter.countConversation(messages); // { total, breakdown[] } + +// Check budget +counter.getBudget(usedTokens); // { used, remaining, total, percentageUsed } +``` + +### 2. Conversation Summarizer +Creates intelligent summaries using LLM. 
+
+```typescript
+const summarizer = new ConversationSummarizer();
+
+const result = await summarizer.summarize(messages, {
+  maxSummaryTokens: 1000,
+  extractKeyPoints: true,
+  extractDecisions: true,
+  extractActionItems: true
+});
+// result.summary, result.keyPoints[], result.decisions[], result.actionItems[]
+```
+
+### 3. Context Compactor
+4 compaction strategies:
+- `sliding-window` - Keep recent messages only
+- `summarize-old` - Summarize older messages
+- `priority-retention` - Keep important messages
+- `hybrid` - Combine all strategies
+
+```typescript
+const compactor = new ContextCompactor({
+  maxTokens: 120000,
+  targetTokens: 80000,
+  strategy: 'hybrid',
+  preserveRecentCount: 6,
+  triggerThreshold: 80 // % of maxTokens
+});
+
+const result = await compactor.compact(messages);
+// result.messages, result.tokensSaved, result.compressionRatio
+```
+
+### 4. Agent Orchestrator
+Manages agent lifecycle and task routing.
+
+```typescript
+const orchestrator = new AgentOrchestrator();
+
+// Register agent
+orchestrator.registerAgent({
+  id: 'agent-1',
+  name: 'Worker',
+  type: 'worker',
+  capabilities: ['process', 'execute'],
+  maxConcurrentTasks: 3,
+  timeout: 60000
+});
+
+// Create task
+orchestrator.createTask('process', 'Process data', { data: [...] });
+
+// Events
+orchestrator.on('task_completed', (event) => { ... });
+
+// Start processing
+orchestrator.start();
+```
+
+### 5. Subagent Spawner
+Spawn and manage child agents for parallel execution.
+
+```typescript
+const spawner = new SubagentSpawner();
+
+// Single execution
+const result = await spawner.executeWithSubagent('explorer', 'Find files');
+
+// Parallel execution
+const results = await spawner.executeParallel([
+  { type: 'explorer', input: 'Task 1' },
+  { type: 'coder', input: 'Task 2' }
+]);
+
+// Pipeline execution (context passes between steps)
+const pipeline = await spawner.executePipeline([
+  { type: 'planner', input: 'Plan task' },
+  { type: 'coder', input: (prev) => `Implement: ${prev}` },
+  { type: 'reviewer', input: (prev) => `Review: ${prev}` }
+]);
+```
+
+### 6. Predefined Subagent Types
+- `explorer` - Code exploration
+- `researcher` - Information gathering
+- `coder` - Code generation
+- `reviewer` - Code review
+- `planner` - Task planning
+- `executor` - Task execution
+- `custom` - Your custom type
+
+## 🔧 Configuration Options
+
+### CompactionConfig
+```typescript
+{
+  maxTokens: 120000,           // Maximum context tokens
+  targetTokens: 80000,         // Target after compaction
+  strategy: 'hybrid',          // Compaction strategy
+  preserveRecentCount: 6,      // Keep last N messages
+  preserveSystemMessage: true, // Keep system message
+  priorityKeywords: ['important', 'critical'],
+  summaryMaxTokens: 2000,      // Max tokens for summaries
+  triggerThreshold: 80         // % to trigger compaction
+}
+```
+
+### AgentConfig
+```typescript
+{
+  id: 'agent-id',
+  name: 'Agent Name',
+  type: 'agent-type',
+  capabilities: ['task1', 'task2'],
+  maxConcurrentTasks: 3,
+  timeout: 60000
+}
+```
+
+## 📄 License
+
+MIT License - Free to use, modify, and distribute.
+
+## 🤝 Contributing
+
+Feel free to extend and customize for your needs!
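The `state-machine.ts` diff further down declares each transition as an `{ event, target, condition? }` entry on a state's `onExit` list, and both READMEs route the `review` state through a guarded `approved` transition. The sketch below is a minimal illustration, under stated assumptions, of how that lookup can stay deterministic. It is not the code in `pipeline-system/core/state-machine.ts`. `MachineState`, `conditionHolds`, and `nextState` are hypothetical names. The `custom` condition type and `guard` hooks are omitted. It also assumes a first-match-wins rule in declaration order, which the real `DeterministicStateMachine` may not follow.

```typescript
// Hypothetical sketch, not the module's implementation. The shapes are trimmed-down
// versions of the Transition/Condition interfaces declared in state-machine.ts.
type ConditionType = 'equals' | 'contains' | 'exists';

interface Condition {
  type: ConditionType;
  field: string;
  value?: unknown;
}

interface Transition {
  event: string;
  target: string;
  condition?: Condition;
}

interface MachineState {
  id: string;
  type: 'start' | 'end' | 'action' | 'choice';
  onExit?: Transition[];
}

type Context = Record<string, unknown>;

// Evaluates a declarative condition against the context; a missing condition always passes.
function conditionHolds(cond: Condition | undefined, ctx: Context): boolean {
  if (!cond) return true;
  const actual = ctx[cond.field];
  switch (cond.type) {
    case 'equals':
      return actual === cond.value;
    case 'contains':
      if (Array.isArray(actual)) return actual.includes(cond.value);
      return typeof actual === 'string' && typeof cond.value === 'string' && actual.includes(cond.value);
    case 'exists':
      return actual !== undefined && actual !== null;
    default:
      return false;
  }
}

// Returns the next state id, or undefined if the state is terminal or nothing matches.
// The first matching transition in declaration order wins (an assumption), so the
// result depends only on (current state, event, context).
function nextState(
  states: Record<string, MachineState>,
  current: string,
  event: string,
  ctx: Context = {},
): string | undefined {
  const state = states[current];
  if (!state || state.type === 'end') return undefined;
  const match = (state.onExit ?? []).find(
    (t) => t.event === event && conditionHolds(t.condition, ctx),
  );
  return match?.target;
}

// Usage: the code -> review -> test loop described in the READMEs.
const pipeline: Record<string, MachineState> = {
  start: { id: 'start', type: 'start', onExit: [{ event: 'start', target: 'code' }] },
  code: {
    id: 'code',
    type: 'action',
    onExit: [
      { event: 'completed', target: 'review' },
      { event: 'failed', target: 'failed' },
    ],
  },
  review: {
    id: 'review',
    type: 'choice',
    onExit: [
      { event: 'approved', target: 'test', condition: { type: 'equals', field: 'approved', value: true } },
      { event: 'rejected', target: 'code' },
    ],
  },
  test: { id: 'test', type: 'action', onExit: [{ event: 'passed', target: 'end' }] },
  end: { id: 'end', type: 'end' },
  failed: { id: 'failed', type: 'end' },
};

console.log(nextState(pipeline, 'review', 'approved', { approved: true })); // test
console.log(nextState(pipeline, 'review', 'approved', { approved: false })); // undefined
```

In this sketch, routing is a pure function of state, event, and context. Replaying the same event sequence therefore always yields the same path. An agent can only influence the flow by emitting an event such as `approved` or `rejected`, never by choosing the next state directly.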
diff --git a/download/agent-system.zip b/download/agent-system.zip
new file mode 100644
index 0000000..f1c1cb3
Binary files /dev/null and b/download/agent-system.zip differ
diff --git a/download/all-systems.zip b/download/all-systems.zip
new file mode 100644
index 0000000..a48654d
Binary files /dev/null and b/download/all-systems.zip differ
diff --git a/download/complete-agent-pipeline-system.zip b/download/complete-agent-pipeline-system.zip
new file mode 100644
index 0000000..603a5cc
Binary files /dev/null and b/download/complete-agent-pipeline-system.zip differ
diff --git a/download/complete-agent-pipeline.zip b/download/complete-agent-pipeline.zip
new file mode 100644
index 0000000..603a5cc
Binary files /dev/null and b/download/complete-agent-pipeline.zip differ
diff --git a/download/pipeline-system.zip b/download/pipeline-system.zip
new file mode 100644
index 0000000..5616afa
Binary files /dev/null and b/download/pipeline-system.zip differ
diff --git a/next-env.d.ts b/next-env.d.ts
new file mode 100644
index 0000000..c4b7818
--- /dev/null
+++ b/next-env.d.ts
@@ -0,0 +1,6 @@
+/// <reference types="next" />
+/// <reference types="next/image-types/global" />
+import "./.next/dev/types/routes.d.ts";
+
+// NOTE: This file should not be edited
+// see https://nextjs.org/docs/app/api-reference/config/typescript for more information.
diff --git a/pipeline-system/core/state-machine.ts b/pipeline-system/core/state-machine.ts
new file mode 100644
index 0000000..6e92f0f
--- /dev/null
+++ b/pipeline-system/core/state-machine.ts
@@ -0,0 +1,653 @@
+/**
+ * Deterministic State Machine Core
+ *
+ * A state machine that controls agent flow WITHOUT LLM decision-making.
+ * States, transitions, and events are defined declaratively.
+ *
+ * Key principle: The LLM does creative work, the state machine handles the plumbing.
+ */
+
+import { randomUUID } from 'crypto';
+import { EventEmitter } from 'events';
+
+// ============================================================================
+// Types
+// ============================================================================
+
+export type StateStatus = 'idle' | 'active' | 'waiting' | 'completed' | 'failed' | 'paused';
+
+export interface State {
+  id: string;
+  name: string;
+  type: 'start' | 'end' | 'action' | 'parallel' | 'choice' | 'wait' | 'loop';
+  agent?: string;          // Agent to invoke in this state
+  action?: string;         // Action to execute
+  timeout?: number;        // Timeout in ms
+  retry?: RetryConfig;     // Retry configuration
+  onEnter?: Transition[];  // Transitions on entering state
+  onExit?: Transition[];   // Transitions on exiting state
+  metadata?: Record<string, unknown>;
+}
+
+export interface Transition {
+  event: string;           // Event that triggers this transition
+  target: string;          // Target state ID
+  condition?: Condition;   // Optional condition
+  guard?: string;          // Guard function name
+}
+
+export interface Condition {
+  type: 'equals' | 'contains' | 'exists' | 'custom';
+  field: string;
+  value?: unknown;
+  expression?: string;
+}
+
+export interface RetryConfig {
+  maxAttempts: number;
+  backoff: 'fixed' | 'exponential' | 'linear';
+  initialDelay: number;
+  maxDelay: number;
+}
+
+export interface StateMachineDefinition {
+  id: string;
+  name: string;
+  version: string;
+  description?: string;
+  initial: string;
+  states: Record<string, State>;
+  events?: string[];                  // Allowed events
+  context?: Record<string, unknown>;  // Initial context
+  onError?: ErrorHandling;
+}
+
+export interface ErrorHandling {
+  strategy: 'fail' | 'retry' | 'transition';
+  targetState?: string;
+  maxRetries?: number;
+}
+
+export interface StateMachineInstance {
+  id: string;
+  definition: StateMachineDefinition;
+  currentState: string;
+  previousState?: string;
+  status: StateStatus;
+  context: Record<string, unknown>;
+  history: StateTransition[];
+  createdAt: Date;
+  updatedAt: Date;
+  startedAt?: Date;
+  completedAt?: Date;
+  error?: string;
+}
+
+export interface StateTransition {
+  from: string;
+  to: string;
+  event: string;
+  timestamp: Date;
+  context?: Record<string, unknown>;
+}
+
+export interface Event {
+  type: string;
+  source: string;
+  target?: string;
+  payload: unknown;
+  timestamp: Date;
+  correlationId?: string;
+}
+
+// ============================================================================
+// State Machine Engine
+// ============================================================================
+
+/**
+ * DeterministicStateMachine - Core engine for deterministic flow control
+ */
+export class DeterministicStateMachine extends EventEmitter {
+  private definition: StateMachineDefinition;
+  private instance: StateMachineInstance;
+  private eventQueue: Event[] = [];
+  private processing = false;
+  private timeoutId?: ReturnType<typeof setTimeout>;
+
+  constructor(definition: StateMachineDefinition, instanceId?: string) {
+    super();
+    this.definition = definition;
+    this.instance = this.createInstance(instanceId);
+  }
+
+  /**
+   * Create a new state machine instance
+   */
+  private createInstance(instanceId?: string): StateMachineInstance {
+    return {
+      id: instanceId || randomUUID(),
+      definition: this.definition,
+      currentState: this.definition.initial,
+      status: 'idle',
+      // Spreading an undefined context yields {}, so no `|| {}` fallback is needed
+      context: { ...this.definition.context },
+      history: [],
+      createdAt: new Date(),
+      updatedAt: new Date()
+    };
+  }
+
+  /**
+   * Start the state machine
+   */
+  start(): void {
+    if (this.instance.status !== 'idle') {
+      throw new Error(`Cannot start state machine in ${this.instance.status} status`);
+    }
+
+    this.instance.status = 'active';
+    this.instance.startedAt = new Date();
+    this.emit('started', { instance: this.instance });
+
+    // Enter initial state
+    this.enterState(this.instance.currentState);
+  }
+
+  /**
+   * Send an event to the state machine
+   */
+  sendEvent(event: Omit<Event, 'timestamp'>): void {
+    const fullEvent: Event = {
+      ...event,
+      timestamp: new Date()
+    };
+
+    this.eventQueue.push(fullEvent);
+    this.emit('eventQueued', { event: fullEvent });
+
+    this.processQueue();
+  }
+
+  /**
+   * Process the event queue.
+   * Events are also processed while the machine is 'waiting'; otherwise the
+   * external event that a 'wait' state is waiting for could never be handled.
+   */
+  private async processQueue(): Promise<void> {
+    if (this.processing || this.eventQueue.length === 0) return;
+
+    this.processing = true;
+
+    try {
+      while (
+        this.eventQueue.length > 0 &&
+        (this.instance.status === 'active' || this.instance.status === 'waiting')
+      ) {
+        const event = this.eventQueue.shift()!;
+        await this.handleEvent(event);
+      }
+    } finally {
+      this.processing = false;
+    }
+  }
+
+  /**
+   * Handle a single event
+   */
+  private async handleEvent(event: Event): Promise<void> {
+    const currentState = this.getCurrentState();
+
+    this.emit('eventProcessed', { event, state: currentState });
+
+    // Find matching transition
+    const transition = this.findTransition(currentState, event);
+
+    if (!transition) {
+      this.emit('noTransition', { event, state: currentState });
+      return;
+    }
+
+    // Check condition if present
+    if (transition.condition && !this.evaluateCondition(transition.condition)) {
+      this.emit('conditionFailed', { event, transition });
+      return;
+    }
+
+    // A matching event releases a machine parked in a 'wait' state
+    if (this.instance.status === 'waiting') {
+      this.instance.status = 'active';
+    }
+
+    // Execute transition
+    await this.executeTransition(transition, event);
+  }
+
+  /**
+   * Find a matching transition for the event
+   */
+  private findTransition(state: State, event: Event): Transition | undefined {
+    const transitions = state.onExit || [];
+    return transitions.find(t => {
+      // Check event type match
+      if (t.event !== event.type) return false;
+
+      // Check target filter if event has specific target
+      if (event.target && event.target !== this.instance.id) return false;
+
+      return true;
+    });
+  }
+
+  /**
+   * Evaluate a transition condition
+   */
+  private evaluateCondition(condition: Condition): boolean {
+    const value = this.getDeepValue(this.instance.context, condition.field);
+
+    switch (condition.type) {
+      case 'equals':
+        return value === condition.value;
+      case 'contains':
+        if (Array.isArray(value)) {
+          return value.includes(condition.value);
+        }
+        return String(value).includes(String(condition.value));
+      case 'exists':
+        return value !== undefined && value !== null;
+      case 'custom':
+        // Custom conditions would be evaluated by a condition registry
+        return true;
+      default:
+        return false;
+    }
+  }
+
+  /**
+   * Execute a state transition
+   */
+  private async executeTransition(transition: Transition, event: Event): Promise<void> {
+    const fromState = this.instance.currentState;
+    const toState = transition.target;
+
+    // Record transition
+    const transitionRecord: StateTransition = {
+      from: fromState,
+      to: toState,
+      event: event.type,
+      timestamp: new Date(),
+      context: { ...this.instance.context }
+    };
+    this.instance.history.push(transitionRecord);
+
+    // Exit current state
+    await this.exitState(fromState);
+
+    // Update instance
+    this.instance.previousState = fromState;
+    this.instance.currentState = toState;
+    this.instance.updatedAt = new Date();
+
+    // Merge event payload into context
+    if (event.payload && typeof event.payload === 'object') {
+      this.instance.context = {
+        ...this.instance.context,
+        ...(event.payload as Record<string, unknown>)
+      };
+    }
+
+    this.emit('transition', { from: fromState, to: toState, event });
+
+    // Enter new state
+    await this.enterState(toState);
+  }
+
+  /**
+   * Enter a state
+   */
+  private async enterState(stateId: string): Promise<void> {
+    const state = this.definition.states[stateId];
+    if (!state) {
+      this.handleError(`State ${stateId} not found`);
+      return;
+    }
+
+    this.emit('enteringState', { state });
+
+    // Handle state types
+    switch (state.type) {
+      case 'end':
+        this.instance.status = 'completed';
+        this.instance.completedAt = new Date();
+        this.emit('completed', { instance: this.instance });
+        break;
+
+      case 'action':
+        // Emit event for external action handler
+        this.emit('action', {
+          state,
+          context: this.instance.context,
+          instanceId: this.instance.id
+        });
+
+        // Set timeout if specified
+        if (state.timeout) {
+          this.setTimeout(state.timeout, stateId);
+        }
+        break;
+
+      case 'parallel':
+        this.handleParallelState(state);
+        break;
+
+      case 'choice':
+        this.handleChoiceState(state);
+        break;
+
+      case 'wait':
+        // Wait for external event
+        this.instance.status = 'waiting';
+        break;
+
+      case 'loop':
+        this.handleLoopState(state);
+        break;
+
+      default:
+        // Process onEnter transitions
+        if (state.onEnter) {
+          for (const transition of state.onEnter) {
+            // Auto-transitions trigger immediately
+            if (transition.event === '*') {
+              await this.executeTransition(transition, {
+                type: '*',
+                source: stateId,
+                payload: {},
+                timestamp: new Date()
+              });
+              break;
+            }
+          }
+        }
+    }
+
+    this.emit('enteredState', { state });
+  }
+
+  /**
+   * Exit a state
+   */
+  private async exitState(stateId: string): Promise<void> {
+    const state = this.definition.states[stateId];
+
+    // Clear any pending timeout
+    if (this.timeoutId) {
+      clearTimeout(this.timeoutId);
+      this.timeoutId = undefined;
+    }
+
+    this.emit('exitingState', { state });
+  }
+
+  /**
+   * Handle parallel state (fork into concurrent branches)
+   */
+  private handleParallelState(state: State): void {
+    this.emit('parallel', {
+      state,
+      branches: state.onEnter?.map(t => t.target) || [],
+      context: this.instance.context
+    });
+  }
+
+  /**
+   * Handle choice state (conditional branching)
+   */
+  private handleChoiceState(state: State): void {
+    const transitions = state.onExit || [];
+
+    for (const transition of transitions) {
+      if (transition.condition && this.evaluateCondition(transition.condition)) {
+        this.sendEvent({
+          type: transition.event,
+          source: state.id,
+          payload: {}
+        });
+        return;
+      }
+    }
+
+    // No condition matched - use default transition
+    const defaultTransition = transitions.find(t => !t.condition);
+    if (defaultTransition) {
+      this.sendEvent({
+        type: defaultTransition.event,
+        source: state.id,
+        payload: {}
+      });
+    }
+  }
+
+  /**
+   * Handle loop state
+   */
+  private handleLoopState(state: State): void {
+    const loopCount = (this.instance.context._loopCount as Record<string, number>)?.[state.id] || 0;
+    const maxIterations = (state.metadata?.maxIterations as number) || 3;
+
+    if (loopCount < maxIterations) {
+      // Continue loop
+      this.instance.context._loopCount = {
+        ...(this.instance.context._loopCount as Record<string, number>),
+        [state.id]: loopCount + 1
+      };
+
+      this.emit('loopIteration', {
+        state,
+        iteration: loopCount + 1,
+        maxIterations
+      });
+
+      // Trigger loop body
+      const loopTransition = state.onExit?.find(t => t.event === 'continue');
+      if (loopTransition) {
+        this.sendEvent({
+          type: 'continue',
+          source: state.id,
+          payload: { iteration: loopCount + 1 }
+        });
+      }
+    } else {
+      // Exit loop
+      const exitTransition = state.onExit?.find(t => t.event === 'exit');
+      if (exitTransition) {
+        this.sendEvent({
+          type: 'exit',
+          source: state.id,
+          payload: { iterations: loopCount }
+        });
+      }
+    }
+  }
+
+  /**
+   * Set a timeout for the current state
+   */
+  private setTimeout(duration: number, stateId: string): void {
+    this.timeoutId = setTimeout(() => {
+      this.emit('timeout', { stateId });
+      this.sendEvent({
+        type: 'timeout',
+        source: stateId,
+        payload: { timedOut: true }
+      });
+    }, duration);
+  }
+
+  /**
+   * Handle errors
+   */
+  private handleError(error: string): void {
+    this.instance.error = error;
+    this.instance.status = 'failed';
+    this.instance.completedAt = new Date();
+    this.emit('error', { error, instance: this.instance });
+  }
+
+  /**
+   * Get current state definition
+   */
+  getCurrentState(): State {
+    return this.definition.states[this.instance.currentState];
+  }
+
+  /**
+   * Get instance info
+   */
+  getInstance(): StateMachineInstance {
+    return { ...this.instance };
+  }
+
+  /**
+   * Update context
+   */
+  updateContext(updates: Record<string, unknown>): void {
+    this.instance.context = { ...this.instance.context, ...updates };
+    this.instance.updatedAt = new Date();
+  }
+
+  /**
+   * Pause the state machine
+   */
+  pause(): void {
+    if (this.instance.status === 'active') {
+      this.instance.status = 'paused';
+      this.emit('paused', { instance: this.instance });
+    }
+  }
+
+  /**
+   * Resume the state machine
+   */
+  resume(): void {
+    if (this.instance.status === 'paused') {
+      this.instance.status = 'active';
+      this.emit('resumed', { instance: this.instance });
+      this.processQueue();
+    }
+  }
+
+  /**
+   * Cancel the state machine
+   */
+  cancel(): void {
+    this.instance.status = 'failed';
+    this.instance.error = 'Cancelled';
+    this.instance.completedAt = new Date();
+
+    if (this.timeoutId) {
+      clearTimeout(this.timeoutId);
+    }
+
+    this.eventQueue = [];
+    this.emit('cancelled', { instance: this.instance });
+  }
+
+  /**
+   * Get deep value from object by dot-notation path
+   */
+  private getDeepValue(obj: Record<string, unknown>, path: string): unknown {
+    return path.split('.').reduce<unknown>((acc, key) => {
+      if (acc && typeof acc === 'object' && key in acc) {
+        return (acc as Record<string, unknown>)[key];
+      }
+      return undefined;
+    }, obj);
+  }
+}
+
+// ============================================================================
+// State Machine Registry
+// ============================================================================
+
+/**
+ * StateMachineRegistry - Manages multiple state machine instances
+ */
+export class StateMachineRegistry {
+  private definitions: Map<string, StateMachineDefinition> = new Map();
+  private instances: Map<string, DeterministicStateMachine> = new Map();
+
+  /**
+   * Register a state machine definition
+   */
+  register(definition: StateMachineDefinition): void {
+    this.definitions.set(definition.id, definition);
+  }
+
+  /**
+   * Create a new instance of a state machine
+   */
+  createInstance(definitionId: string, instanceId?: string): DeterministicStateMachine {
+    const definition = this.definitions.get(definitionId);
+    if (!definition) {
+      throw new Error(`State machine definition ${definitionId} not found`);
+    }
+
+    const sm = new DeterministicStateMachine(definition, instanceId);
+    this.instances.set(sm.getInstance().id, sm);
+
+    return sm;
+  }
+
+  /**
+   * Get an instance by ID
+   */
+  getInstance(instanceId: string): DeterministicStateMachine | undefined {
+    return this.instances.get(instanceId);
+  }
+
+  /**
+   * Get all instances
+   */
+  getAllInstances(): DeterministicStateMachine[] {
+    return Array.from(this.instances.values());
+  }
+
+  /**
+   * Get instances by status
+   */
+  getInstancesByStatus(status: StateStatus): DeterministicStateMachine[] {
+    return this.getAllInstances().filter(sm => sm.getInstance().status === status);
+  }
+
+  /**
+   * Remove an instance
+   */
+  removeInstance(instanceId: string): boolean {
+    const sm = this.instances.get(instanceId);
+    if (sm) {
+      sm.cancel();
+      return this.instances.delete(instanceId);
+    }
+    return false;
+  }
+
+  /**
+   * Get statistics
+   */
+  getStats(): {
+    definitions: number;
+    instances: number;
+    byStatus: Record<StateStatus, number>;
+  } {
+    const byStatus: Record<StateStatus, number> = {
+      idle: 0,
+      active: 0,
+      waiting: 0,
+      completed: 0,
+      failed: 0,
+      paused: 0
+    };
+
+    for (const sm of this.instances.values()) {
+      byStatus[sm.getInstance().status]++;
+    }
+
+    return {
+      definitions: this.definitions.size,
+      instances: this.instances.size,
+      byStatus
+    };
+  }
+}
+
+// Singleton registry
+export const stateMachineRegistry = new StateMachineRegistry();
diff --git a/pipeline-system/engine/parallel-executor.ts b/pipeline-system/engine/parallel-executor.ts
new file mode 100644
index 0000000..23954b9
--- /dev/null
+++ b/pipeline-system/engine/parallel-executor.ts
@@ -0,0 +1,624 @@
+/**
+ * Parallel Execution Engine
+ *
+ * Manages concurrent agent sessions with resource pooling.
+ * Supports: 4 projects × 3 roles = up to 12 concurrent sessions.
+ *
+ * Key features:
+ * - Worker pool with configurable concurrency limits
+ * - Resource isolation per agent session
+ * - Automatic scaling based on load
+ * - Task queuing with priority support
+ */
+
+import { randomUUID } from 'crypto';
+import { EventEmitter } from 'events';
+
+// ============================================================================
+// Types
+// ============================================================================
+
+export type AgentRole = 'programmer' | 'reviewer' | 'tester' | 'planner' | 'analyst' | 'custom';
+export type TaskStatus = 'pending' | 'queued' | 'running' | 'completed' | 'failed' | 'cancelled';
+export type WorkerStatus = 'idle' | 'busy' | 'draining' | 'terminated';
+
+export interface AgentSession {
+  id: string;
+  projectId: string;
+  role: AgentRole;
+  model?: string; // e.g., 'opus', 'sonnet' for cost optimization
+  workspace: string;
+  tools: string[];
+  memory: Record<string, unknown>;
+  identity: AgentIdentity;
+  status: 'active' | 'idle' | 'terminated';
+  createdAt: Date;
+  lastActivity: Date;
+}
+
+export interface AgentIdentity {
+  name: string;
+  description: string;
+  personality?: string;
+  systemPrompt?: string;
+}
+
+export interface PipelineTask {
+  id: string;
+  projectId: string;
+  role: AgentRole;
+  type: string;
+  description: string;
+  priority: 'low' | 'medium' | 'high' | 'critical';
+  input: unknown;
+  dependencies: string[];
+  timeout: number;
+  retryCount: number;
+  maxRetries: number;
+  status: TaskStatus;
+  assignedWorker?: string;
+  result?: unknown;
+  error?: string;
+  createdAt: Date;
+  startedAt?: Date;
+  completedAt?: Date;
+  metadata?: Record<string, unknown>;
+}
+
+export interface Worker {
+  id: string;
+  status: WorkerStatus;
+  currentTask?: string;
+  sessions: Map<string, AgentSession>;
+  completedTasks: number;
+  failedTasks: number;
+  createdAt: Date;
+  lastActivity: Date;
+}
+
+export interface ExecutionConfig {
+  maxWorkers: number;
+  maxConcurrentPerWorker: number;
+  taskTimeout: number;
+  retryAttempts: number;
+  retryDelay: number;
+  drainTimeout: number;
+}
+
+export interface ExecutionResult {
+  taskId: string;
+  success: boolean;
+  output?: unknown;
+  error?: string;
+  duration: number;
+  workerId: string;
+  sessionId: string;
+}
+
+// ============================================================================
+// Parallel Executor
+// ============================================================================
+
+/**
+ * ParallelExecutionEngine - Manages concurrent agent sessions
+ */
+export class ParallelExecutionEngine extends EventEmitter {
+  private config: ExecutionConfig;
+  private workers: Map<string, Worker> = new Map();
+  private taskQueue: PipelineTask[] = [];
+  private runningTasks: Map<string, { task: PipelineTask; worker: Worker; session: AgentSession }> = new Map();
+  private completedTasks: PipelineTask[] = [];
+  private failedTasks: PipelineTask[] = [];
+  private sessions: Map<string, AgentSession> = new Map();
+  private processing = false;
+  private processInterval?: ReturnType<typeof setInterval>;
+  private taskHandlers: Map<string, (task: PipelineTask, session: AgentSession) => Promise<unknown>> = new Map();
+
+  constructor(config?: Partial<ExecutionConfig>) {
+    super();
+    // `??` keeps explicit zeros (e.g. retryAttempts: 0). A trailing `...config`
+    // spread is omitted because an explicit `undefined` would overwrite a default.
+    this.config = {
+      maxWorkers: config?.maxWorkers ?? 4,
+      maxConcurrentPerWorker: config?.maxConcurrentPerWorker ?? 3,
+      taskTimeout: config?.taskTimeout ?? 300000, // 5 minutes
+      retryAttempts: config?.retryAttempts ?? 3,
+      retryDelay: config?.retryDelay ?? 5000,
+      drainTimeout: config?.drainTimeout ?? 60000
+    };
+  }
+
+  /**
+   * Start the execution engine
+   */
+  start(): void {
+    // Initialize workers
+    for (let i = 0; i < this.config.maxWorkers; i++) {
+      this.createWorker();
+    }
+
+    // Start processing loop
+    this.processing = true;
+    this.processInterval = setInterval(() => this.processQueue(), 100);
+
+    this.emit('started', { workerCount: this.workers.size });
+  }
+
+  /**
+   * Stop the execution engine
+   */
+  async stop(): Promise<void> {
+    this.processing = false;
+
+    if (this.processInterval) {
+      clearInterval(this.processInterval);
+    }
+
+    // Wait for running tasks to complete or drain
+    await this.drain();
+
+    // Terminate workers
+    for (const worker of this.workers.values()) {
+      worker.status = 'terminated';
+    }
+
+    this.emit('stopped');
+  }
+
+  /**
+   * Create a new worker
+   */
+  private createWorker(): Worker {
+    const worker: Worker = {
+      id: `worker-${randomUUID().substring(0, 8)}`,
+      status: 'idle',
+      sessions: new Map(),
+      completedTasks: 0,
+      failedTasks: 0,
+      createdAt: new Date(),
+      lastActivity: new Date()
+    };
+
+    this.workers.set(worker.id, worker);
+    this.emit('workerCreated', { worker });
+
+    return worker;
+  }
+
+  /**
+   * Create an agent session
+   */
+  createSession(config: {
+    projectId: string;
+    role: AgentRole;
+    model?: string;
+    workspace: string;
+    tools: string[];
+    identity: AgentIdentity;
+  }): AgentSession {
+    const session: AgentSession = {
+      id: `session-${config.projectId}-${config.role}-${randomUUID().substring(0, 8)}`,
+      projectId: config.projectId,
+      role: config.role,
+      model: config.model || this.getDefaultModelForRole(config.role),
+      workspace: config.workspace,
+      tools: config.tools,
+      memory: {},
+      identity: config.identity,
+      status: 'idle',
+      createdAt: new Date(),
+      lastActivity: new Date()
+    };
+
+    this.sessions.set(session.id, session);
+    this.emit('sessionCreated', { session });
+
+    return session;
+  }
+
+  /**
+   * Get default model for a role (cost optimization)
+   */
+  private getDefaultModelForRole(role: AgentRole): string {
+    switch (role) {
+      case 'programmer':
+        return 'opus';   // Best for complex coding
+      case 'reviewer':
+        return 'sonnet'; // Cost-effective for review
+      case 'tester':
+        return 'sonnet'; // Good for test generation
+      case 'planner':
+        return 'opus';   // Complex planning
+      case 'analyst':
+        return 'sonnet';
+      default:
+        return 'sonnet';
+    }
+  }
+
+  /**
+   * Submit a task for execution
+   */
+  submitTask(task: Omit<PipelineTask, 'id' | 'status' | 'retryCount' | 'createdAt'>): PipelineTask {
+    const fullTask: PipelineTask = {
+      ...task,
+      id: `task-${randomUUID().substring(0, 8)}`,
+      status: 'pending',
+      retryCount: 0,
+      createdAt: new Date()
+    };
+
+    this.taskQueue.push(fullTask);
+    this.emit('taskSubmitted', { task: fullTask });
+
+    // Sort by priority
+    this.prioritizeQueue();
+
+    return fullTask;
+  }
+
+  /**
+   * Submit multiple tasks for parallel execution
+   */
+  submitBatch(tasks: Array<Omit<PipelineTask, 'id' | 'status' | 'retryCount' | 'createdAt'>>): PipelineTask[] {
+    return tasks.map(task => this.submitTask(task));
+  }
+
+  /**
+   * Prioritize the task queue
+   */
+  private prioritizeQueue(): void {
+    const priorityOrder = { critical: 0, high: 1, medium: 2, low: 3 };
+
+    this.taskQueue.sort((a, b) => {
+      // First by priority
+      const priorityDiff = priorityOrder[a.priority] - priorityOrder[b.priority];
+      if (priorityDiff !== 0) return priorityDiff;
+
+      // Then by creation time (FIFO within priority)
+      return a.createdAt.getTime() - b.createdAt.getTime();
+    });
+  }
+
+  /**
+   * Process the task queue.
+   * Tasks are dispatched without awaiting them so that ready tasks run
+   * concurrently; executeTask claims its task synchronously before its first await.
+   */
+  private processQueue(): void {
+    if (!this.processing) return;
+
+    // Find tasks ready to run (dependencies met)
+    const readyTasks = this.getReadyTasks();
+
+    for (const task of readyTasks) {
+      const worker = this.findAvailableWorker(task);
+      if (!worker) break; // No workers available
+
+      // Fire-and-forget: executeTask records success or failure itself
+      void this.executeTask(task, worker);
+    }
+  }
+
+  /**
+   * Get tasks that are ready to execute
+   */
+  private getReadyTasks(): PipelineTask[] {
+    return this.taskQueue.filter(task => {
+      if (task.status !== 'pending') return false;
+
+      // Check dependencies
+      for (const depId of task.dependencies) {
+        const depTask = this.getTask(depId);
+        if (!depTask || depTask.status !== 'completed') {
+          return false;
+        }
+      }
+
+      return true;
+    });
+  }
+
+  /**
+   * Count the tasks currently running on a given worker
+   */
+  private runningCount(worker: Worker): number {
+    let count = 0;
+    for (const entry of this.runningTasks.values()) {
+      if (entry.worker.id === worker.id) count++;
+    }
+    return count;
+  }
+
+  /**
+   * Find an available worker for a task.
+   * Capacity is measured in running tasks. Sessions are kept for reuse and never
+   * released, so counting sessions would leave a worker permanently full.
+   */
+  private findAvailableWorker(task: PipelineTask): Worker | undefined {
+    const hasCapacity = (worker: Worker): boolean =>
+      (worker.status === 'idle' || worker.status === 'busy') &&
+      this.runningCount(worker) < this.config.maxConcurrentPerWorker;
+
+    // First, try to find a worker already handling the project
+    for (const worker of this.workers.values()) {
+      const hasProject = Array.from(worker.sessions.values())
+        .some(s => s.projectId === task.projectId);
+
+      if (hasProject && hasCapacity(worker)) {
+        return worker;
+      }
+    }
+
+    // Then, find any available worker
+    for (const worker of this.workers.values()) {
+      if (hasCapacity(worker)) {
+        return worker;
+      }
+    }
+
+    // Create new worker if under limit
+    if (this.workers.size < this.config.maxWorkers) {
+      return this.createWorker();
+    }
+
+    return undefined;
+  }
+
+  /**
+   * Execute a task
+   */
+  private async executeTask(task: PipelineTask, worker: Worker): Promise<void> {
+    // Move task from queue to running
+    const taskIndex = this.taskQueue.indexOf(task);
+    if (taskIndex > -1) {
+      this.taskQueue.splice(taskIndex, 1);
+    }
+
+    task.status = 'running';
+    task.startedAt = new Date();
+    task.assignedWorker = worker.id;
+
+    // Create or get session
+    const session = this.getOrCreateSession(task, worker);
+
+    // Track running task
+    this.runningTasks.set(task.id, { task, worker, session });
+
+    // Update worker status
+    worker.status = 'busy';
+    worker.currentTask = task.id;
+    worker.lastActivity = new Date();
+
+    this.emit('taskStarted', { task, worker, session });
+
+    // Get task handler
+    const handler = this.taskHandlers.get(task.type) || this.defaultTaskHandler;
+    const timeout = this.createTimeout(task);
+
+    try {
+      // Execute with timeout
+      const result = await Promise.race([
+        handler(task, session),
+        timeout.promise
+      ]);
+
+      task.result = result;
+      task.status = 'completed';
+      task.completedAt = new Date();
+
+      worker.completedTasks++;
+      this.completedTasks.push(task);
+
+      this.emit('taskCompleted', { task, worker, session, result });
+
+    } catch (error) {
+      const errorMessage = error instanceof Error ? error.message : String(error);
+
+      task.error = errorMessage;
+      task.retryCount++;
+
+      if (task.retryCount < task.maxRetries) {
+        // Retry
+        task.status = 'pending';
+        this.taskQueue.push(task);
+        this.emit('taskRetrying', { task, attempt: task.retryCount });
+      } else {
+        // Failed
+        task.status = 'failed';
+        task.completedAt = new Date();
+        worker.failedTasks++;
+        this.failedTasks.push(task);
+        this.emit('taskFailed', { task, worker, error: errorMessage });
+      }
+    } finally {
+      // Stop the timeout timer so it cannot keep the process alive
+      timeout.clear();
+    }
+
+    // Cleanup
+    this.runningTasks.delete(task.id);
+    worker.currentTask = undefined;
+    worker.lastActivity = new Date();
+
+    // A worker is idle only when none of its own tasks are still running
+    if (this.runningCount(worker) === 0) {
+      worker.status = 'idle';
+    }
+
+    session.lastActivity = new Date();
+  }
+
+  /**
+   * Get or create session for a task
+   */
+  private getOrCreateSession(task: PipelineTask, worker: Worker): AgentSession {
+    // Look for existing session for this project/role
+    for (const session of worker.sessions.values()) {
+      if (session.projectId === task.projectId && session.role === task.role) {
+        return session;
+      }
+    }
+
+    // Create new session
+    const session = this.createSession({
+      projectId: task.projectId,
+      role: task.role,
+      workspace: `workspace/${task.projectId}/${task.role}`,
+      tools: this.getToolsForRole(task.role),
+      identity: this.getIdentityForRole(task.role)
+    });
+
+    worker.sessions.set(session.id, session);
+
+    return session;
+  }
+
+  /**
+   * Get tools available for a role
+   */
+  private getToolsForRole(role: AgentRole): string[] {
+    const toolMap: Record<AgentRole, string[]> = {
+      programmer: ['read', 'write', 'execute', 'git', 'test', 'lint', 'build'],
+      reviewer: ['read', 'diff', 'comment', 'lint', 'test'],
+      tester: ['read', 'execute', 'test', 'mock'],
+      planner: ['read', 'write', 'diagram'],
+      analyst: ['read', 'query', 'report'],
+      custom: ['read']
+    };
+
+    return toolMap[role] || ['read'];
+  }
+
+  /**
+   * Get identity for a role
+   */
+  private getIdentityForRole(role: AgentRole): AgentIdentity {
+    const identityMap: Record<AgentRole, AgentIdentity> = {
+      programmer: {
+        name: 'Code Architect',
+        description: 'Expert developer who writes clean, efficient code',
+        personality: 'Methodical, detail-oriented, focuses on best practices'
+      },
+      reviewer: {
+        name: 'Code Reviewer',
+        description: 'Experienced engineer who catches bugs and improves code quality',
+        personality: 'Thorough, constructive, focuses on maintainability'
+      },
+      tester: {
+        name: 'QA Engineer',
+        description: 'Test specialist who ensures code correctness',
+        personality: 'Systematic, edge-case focused, quality-driven'
+      },
+      planner: {
+        name: 'Technical Architect',
+        description: 'Strategic thinker who plans implementation',
+        personality: 'Analytical, systematic, big-picture focused'
+      },
+      analyst: {
+        name: 'Data Analyst',
+        description: 'Data specialist who extracts insights',
+        personality: 'Curious, methodical, detail-oriented'
+      },
+      custom: {
+        name: 'Custom Agent',
+        description: 'Generic agent for custom tasks',
+        personality: 'Adaptable'
+      }
+    };
+
+    return identityMap[role] || identityMap.custom;
+  }
+
+  /**
+   * Default task handler
+   */
+  private async defaultTaskHandler(task: PipelineTask, session: AgentSession): Promise<unknown> {
+    // This would be replaced by actual LLM invocation
+    return {
+      message: `Task ${task.type} completed by ${session.identity.name}`,
+      projectId: task.projectId,
+      role: task.role
+    };
+  }
+
+  /**
+   * Create a timeout promise together with a function that cancels its timer
+   */
+  private createTimeout(task: PipelineTask): { promise: Promise<never>; clear: () => void } {
+    let timer: ReturnType<typeof setTimeout> | undefined;
+    const promise = new Promise<never>((_, reject) => {
+      timer = setTimeout(() => {
+        reject(new Error(`Task ${task.id} timed out after ${task.timeout}ms`));
+      }, task.timeout);
+    });
+    return { promise, clear: () => clearTimeout(timer) };
+  }
+
+  /**
+   * Get task by ID
+   */
+  getTask(taskId: string): PipelineTask | undefined {
+    return (
+      this.taskQueue.find(t => t.id === taskId) ||
+      this.runningTasks.get(taskId)?.task ||
+      this.completedTasks.find(t => t.id === taskId) ||
+      this.failedTasks.find(t => t.id === taskId)
+    );
+  }
+
+  /**
+   * Register a task handler
+   */
+  registerHandler(taskType: string, handler: (task: PipelineTask, session: AgentSession) => Promise<unknown>): void {
+    this.taskHandlers.set(taskType, handler);
+  }
+
+  /**
+   * Drain - wait for running tasks to complete
+   */
+  private async drain(): Promise<void> {
+    while (this.runningTasks.size > 0) {
+      await new Promise<void>(resolve => setTimeout(resolve, 100));
+    }
+  }
+
+  /**
+   * Get engine statistics
+   */
+  getStats(): {
+    workers: { total: number; idle: number; busy: number };
+    tasks: { pending: number; running: number; completed: number; failed: number };
+    sessions: number;
+  } {
+    let idleWorkers = 0;
+    let busyWorkers = 0;
+
+    for (const worker of this.workers.values()) {
+      if (worker.status === 'idle') idleWorkers++;
+      else if (worker.status === 'busy') busyWorkers++;
+    }
+
+    return {
+      workers: {
+        total: this.workers.size,
+        idle: idleWorkers,
+        busy: busyWorkers
+      },
+      tasks: {
+        pending: this.taskQueue.length,
+        running: this.runningTasks.size,
+        completed: this.completedTasks.length,
+        failed: this.failedTasks.length
+      },
+      sessions: this.sessions.size
+    };
+  }
+
+  /**
+   * Get sessions by project
+   */
+  getSessionsByProject(projectId: string): AgentSession[] {
+    return Array.from(this.sessions.values()).filter(s => s.projectId === projectId);
+  }
+
+  /**
+   * Get all sessions
+   */
+  getAllSessions(): AgentSession[] {
+    return Array.from(this.sessions.values());
+  }
+
+  /**
+   * Terminate a session
+   */
+  terminateSession(sessionId: string): boolean {
+    const session = this.sessions.get(sessionId);
+    if (session) {
+      session.status = 'terminated';
+      this.emit('sessionTerminated', { session });
+      return true;
+    }
+    return false;
+  }
+}
+
+// Default instance
+export const defaultExecutor = new ParallelExecutionEngine();
diff --git a/pipeline-system/events/event-bus.ts b/pipeline-system/events/event-bus.ts
new file mode 100644
index 0000000..f53ee20
--- /dev/null
+++ b/pipeline-system/events/event-bus.ts
@@ -0,0 +1,570 @@
+/**
+ * Event-Driven Coordination System
+ *
+ * Event bus for inter-agent communication.
+ * Agents finish work → emit event → next step triggers automatically.
+ *
+ * Key features:
+ * - Pub/sub event distribution
+ * - Event correlation and routing
+ * - Event replay for debugging
+ * - Dead letter queue for failed handlers
+ */
+
+import { randomUUID } from 'crypto';
+import { EventEmitter } from 'events';
+
+// ============================================================================
+// Types
+// ============================================================================
+
+export type EventPriority = 'low' | 'normal' | 'high' | 'critical';
+
+export interface PipelineEvent {
+  id: string;
+  type: string;
+  source: string;
+  target?: string;
+  payload: unknown;
+  priority: EventPriority;
+  timestamp: Date;
+  correlationId?: string;
+  causationId?: string; // ID of event that caused this event
+  metadata?: Record<string, unknown>;
+  retryCount?: number;
+}
+
+export interface EventHandler {
+  id: string;
+  eventType: string | string[] | '*';
+  filter?: EventFilter;
+  handler: (event: PipelineEvent) => Promise<void> | void;
+  priority?: number;
+  once?: boolean;
+}
+
+export interface EventFilter {
+  source?: string | string[];
+  target?: string | string[];
+  payloadPattern?: Record<string, unknown>;
+}
+
+export interface Subscription {
+  id: string;
+  eventType: string;
+  handlerId: string;
+  active: boolean;
+  createdAt: Date;
+  eventsReceived: number;
+}
+
+export interface EventBusConfig {
+  maxHistorySize: number;
+  deadLetterQueueSize: number;
+  retryAttempts: number;
+  retryDelay: number;
+  enableReplay: boolean;
+}
+
+export interface EventBusStats {
+  eventsPublished: number;
+  eventsProcessed: number;
+  eventsFailed: number;
+  handlersRegistered: number;
+  queueSize: number;
+  historySize: number;
+}
+
+// ============================================================================
+// Event Bus
+// ============================================================================
+
+/**
+ * EventBus - Central event distribution system
*/ +export class EventBus extends EventEmitter { + private config: EventBusConfig; + private handlers: Map = new Map(); + private eventQueue: PipelineEvent[] = []; + private history: PipelineEvent[] = []; + private deadLetterQueue: PipelineEvent[] = []; + private processing = false; + private stats = { + eventsPublished: 0, + eventsProcessed: 0, + eventsFailed: 0 + }; + private processInterval?: ReturnType; + + constructor(config?: Partial) { + super(); + this.config = { + maxHistorySize: 1000, + deadLetterQueueSize: 100, + retryAttempts: 3, + retryDelay: 1000, + enableReplay: true, + ...config + }; + } + + /** + * Start the event bus + */ + start(): void { + this.processing = true; + this.processInterval = setInterval(() => this.processQueue(), 50); + this.emit('started'); + } + + /** + * Stop the event bus + */ + stop(): void { + this.processing = false; + if (this.processInterval) { + clearInterval(this.processInterval); + } + this.emit('stopped'); + } + + /** + * Publish an event + */ + publish(event: Omit): string { + const fullEvent: PipelineEvent = { + ...event, + id: `evt-${randomUUID().substring(0, 8)}`, + timestamp: new Date(), + retryCount: event.retryCount || 0 + }; + + // Add to queue + this.eventQueue.push(fullEvent); + this.stats.eventsPublished++; + + // Add to history + if (this.config.enableReplay) { + this.history.push(fullEvent); + if (this.history.length > this.config.maxHistorySize) { + this.history.shift(); + } + } + + this.emit('eventPublished', { event: fullEvent }); + + return fullEvent.id; + } + + /** + * Publish a batch of events + */ + publishBatch(events: Array>): string[] { + return events.map(event => this.publish(event)); + } + + /** + * Subscribe to events + */ + subscribe(config: { + eventType: string | string[] | '*'; + handler: (event: PipelineEvent) => Promise | void; + filter?: EventFilter; + priority?: number; + once?: boolean; + }): string { + const handlerId = `handler-${randomUUID().substring(0, 8)}`; + + const handler: 
EventHandler = { + id: handlerId, + eventType: config.eventType, + filter: config.filter, + handler: config.handler, + priority: config.priority || 0, + once: config.once || false + }; + + this.handlers.set(handlerId, handler); + this.emit('handlerRegistered', { handler }); + + return handlerId; + } + + /** + * Unsubscribe from events + */ + unsubscribe(handlerId: string): boolean { + const result = this.handlers.delete(handlerId); + if (result) { + this.emit('handlerUnregistered', { handlerId }); + } + return result; + } + + // True while processQueue() is awaiting handlers. setInterval does not await + // the returned promise, so without this guard a slow handler lets the next + // tick dispatch another event concurrently and events finish out of order. + private draining = false; + + /** + * Process the event queue (one event per tick, strictly sequential) + */ + private async processQueue(): Promise<void> { + if (!this.processing || this.draining || this.eventQueue.length === 0) return; + + this.draining = true; + try { + const event = this.eventQueue.shift()!; + + // Find matching handlers + const matchingHandlers = this.findMatchingHandlers(event); + + // Sort by priority (higher first) + matchingHandlers.sort((a, b) => (b.priority || 0) - (a.priority || 0)); + + // Execute handlers, remembering the last failure so the event is + // retried (or dead-lettered) once, not once per failing handler + let lastError: unknown; + let failed = false; + + for (const handler of matchingHandlers) { + try { + await handler.handler(event); + this.stats.eventsProcessed++; + + // Remove one-time handlers + if (handler.once) { + this.handlers.delete(handler.id); + } + } catch (error) { + this.stats.eventsFailed++; + failed = true; + lastError = error; + } + } + + if (failed) { + // Retry logic: the re-published event is delivered to every matching handler again + const retryCount = (event.retryCount || 0) + 1; + + if (retryCount < 
this.config.retryAttempts) { + // Re-queue with incremented retry count + setTimeout(() => { + this.publish({ + ...event, + retryCount + }); + }, this.config.retryDelay * retryCount); + + this.emit('eventRetry', { event, error: lastError, retryCount }); + } else { + // Move to dead letter queue + this.addToDeadLetterQueue(event, lastError); + } + } + + this.emit('eventProcessed', { event, handlerCount: matchingHandlers.length }); + } finally { + this.draining = false; + } + } + + /** + * Find handlers matching an event + */ + private findMatchingHandlers(event: PipelineEvent): EventHandler[] { + const matching: EventHandler[] = []; + + for (const handler of this.handlers.values()) { + // Check event type match + if (handler.eventType !== '*') {
+ const types = Array.isArray(handler.eventType) ? handler.eventType : [handler.eventType]; + if (!types.includes(event.type)) continue; + } + + // Check filters + if (handler.filter && !this.matchesFilter(event, handler.filter)) { + continue; + } + + matching.push(handler); + } + + return matching; + } + + /** + * Check if event matches filter + */ + private matchesFilter(event: PipelineEvent, filter: EventFilter): boolean { + // Check source filter + if (filter.source) { + const sources = Array.isArray(filter.source) ? filter.source : [filter.source]; + if (!sources.includes(event.source)) return false; + } + + // Check target filter + if (filter.target) { + const targets = Array.isArray(filter.target) ? filter.target : [filter.target]; + if (event.target && !targets.includes(event.target)) return false; + } + + // Check payload pattern + if (filter.payloadPattern) { + // A null or primitive payload cannot match a key/value pattern (and would throw on indexing) + if (typeof event.payload !== 'object' || event.payload === null) return false; + const payload = event.payload as Record<string, unknown>; + for (const [key, value] of Object.entries(filter.payloadPattern)) { + if (payload[key] !== value) return false; + } + } + + return true; + } + + /** + * Add event to dead letter queue + */ + private addToDeadLetterQueue(event: PipelineEvent, error: unknown): void { + this.deadLetterQueue.push({ + ...event, + metadata: { + ...event.metadata, + error: error instanceof Error ?
error.message : String(error), + failedAt: new Date().toISOString() + } + }); + + // Trim queue + if (this.deadLetterQueue.length > this.config.deadLetterQueueSize) { + this.deadLetterQueue.shift(); + } + + this.emit('eventDeadLettered', { event, error }); + } + + /** + * Replay events from history + */ + replay(fromTimestamp?: Date, toTimestamp?: Date): void { + if (!this.config.enableReplay) { + throw new Error('Event replay is disabled'); + } + + const events = this.history.filter(event => { + if (fromTimestamp && event.timestamp < fromTimestamp) return false; + if (toTimestamp && event.timestamp > toTimestamp) return false; + return true; + }); + + for (const event of events) { + this.eventQueue.push({ + ...event, + id: `replay-${event.id}`, + metadata: { ...event.metadata, replayed: true } + }); + } + + this.emit('replayStarted', { count: events.length }); + } + + /** + * Get events from history + */ + getHistory(filter?: { + type?: string; + source?: string; + from?: Date; + to?: Date; + }): PipelineEvent[] { + let events = [...this.history]; + + if (filter) { + if (filter.type) { + events = events.filter(e => e.type === filter.type); + } + if (filter.source) { + events = events.filter(e => e.source === filter.source); + } + if (filter.from) { + events = events.filter(e => e.timestamp >= filter.from!); + } + if (filter.to) { + events = events.filter(e => e.timestamp <= filter.to!); + } + } + + return events; + } + + /** + * Get dead letter queue + */ + getDeadLetterQueue(): PipelineEvent[] { + return [...this.deadLetterQueue]; + } + + /** + * Clear dead letter queue + */ + clearDeadLetterQueue(): void { + this.deadLetterQueue = []; + } + + /** + * Get statistics + */ + getStats(): EventBusStats { + return { + eventsPublished: this.stats.eventsPublished, + eventsProcessed: this.stats.eventsProcessed, + eventsFailed: this.stats.eventsFailed, + handlersRegistered: this.handlers.size, + queueSize: this.eventQueue.length, + historySize: this.history.length + }; + 
} + + /** + * Request-response pattern + * + * The responder must copy the correlationId it finds in the request's + * metadata into the payload of its response event (type + '.response'); + * the subscription below matches on payload.correlationId. + */ + async request<T = unknown>( + event: Omit<PipelineEvent, 'id' | 'timestamp'>, + timeout = 30000 + ): Promise<T> { + return new Promise<T>((resolve, reject) => { + const correlationId = `req-${randomUUID().substring(0, 8)}`; + + // Subscribe to response + const responseHandler = this.subscribe({ + eventType: `${event.type}.response`, + filter: { payloadPattern: { correlationId } }, + once: true, + handler: (response) => { + clearTimeout(timeoutId); + resolve(response.payload as T); + } + }); + + // Set timeout + const timeoutId = setTimeout(() => { + this.unsubscribe(responseHandler); + reject(new Error(`Request timeout for event ${event.type}`)); + }, timeout); + + // Publish request with correlation ID + this.publish({ + ...event, + metadata: { ...event.metadata, correlationId } + }); + }); + } + + /** + * Create a correlated event chain + */ + createChain(firstEvent: Omit<PipelineEvent, 'id' | 'timestamp'>): EventChain { + const correlationId = `chain-${randomUUID().substring(0, 8)}`; + + return new EventChain(this, correlationId, firstEvent); + } +} + +// ============================================================================ +// Event Chain +// ============================================================================ + +/** + * EventChain - Builder for correlated event sequences + */ +export class EventChain { + private bus: EventBus; + private correlationId: string; + private events: PipelineEvent[] = []; + private currentEvent?: PipelineEvent; + + constructor(bus: EventBus, correlationId: string, firstEvent: Omit<PipelineEvent, 'id' | 'timestamp'>) { + this.bus = bus; + this.correlationId = correlationId; + this.currentEvent = { + ...firstEvent, + id: '', + timestamp: new Date(), + correlationId + } as 
PipelineEvent; + } + + /** + * Add next event in chain + */ + then(event: Omit<PipelineEvent, 'id' | 'timestamp'>): this { + if (this.currentEvent) { + this.events.push(this.currentEvent); + + this.currentEvent = { + ...event, + id: '', + timestamp: new Date(), + correlationId: this.correlationId, + causationId: this.currentEvent.id || undefined + } as
PipelineEvent; + } + return this; + } + + /** + * Execute the chain + */ + execute(): string[] { + if (this.currentEvent) { + this.events.push(this.currentEvent); + } + + // Event IDs exist only after publish() assigns them, so causation is linked + // here: each event records the ID of the event published just before it. + const ids: string[] = []; + let previousId: string | undefined; + for (const event of this.events) { + previousId = this.bus.publish({ + ...event, + correlationId: this.correlationId, + causationId: previousId ?? event.causationId + }); + ids.push(previousId); + } + return ids; + } + + /** + * Get correlation ID + */ + getCorrelationId(): string { + return this.correlationId; + } +} + +// ============================================================================ +// Predefined Pipeline Events +// ============================================================================ + +/** + * Standard pipeline event types + */ +export const PipelineEventTypes = { + // Agent lifecycle + AGENT_STARTED: 'agent.started', + AGENT_COMPLETED: 'agent.completed', + AGENT_FAILED: 'agent.failed', + AGENT_TIMEOUT: 'agent.timeout', + + // Task lifecycle + TASK_CREATED: 'task.created', + TASK_ASSIGNED: 'task.assigned', + TASK_STARTED: 'task.started', + TASK_COMPLETED: 'task.completed', + TASK_FAILED: 'task.failed', + + // Code pipeline + CODE_WRITTEN: 'code.written', + CODE_REVIEWED: 'code.reviewed', + CODE_APPROVED: 'code.approved', + CODE_REJECTED: 'code.rejected', + CODE_TESTED: 'code.tested', + TESTS_PASSED: 'tests.passed', + TESTS_FAILED: 'tests.failed', + + // State machine + STATE_ENTERED: 'state.entered', + STATE_EXITED: 'state.exited', + TRANSITION: 'state.transition', + + // Coordination + PIPELINE_STARTED: 'pipeline.started', + PIPELINE_COMPLETED: 'pipeline.completed', + PIPELINE_PAUSED: 'pipeline.paused', + PIPELINE_RESUMED: 'pipeline.resumed', + + // Human interaction + HUMAN_INPUT_REQUIRED: 'human.input_required', + HUMAN_INPUT_RECEIVED: 
'human.input_received', + HUMAN_APPROVAL_REQUIRED: 'human.approval_required', + HUMAN_APPROVED: 'human.approved', + HUMAN_REJECTED: 'human.rejected' +} as const; + +// Default event bus instance +export const defaultEventBus = new EventBus(); diff --git a/pipeline-system/index.ts b/pipeline-system/index.ts new file mode 100644 index
0000000..42dd281 --- /dev/null +++ b/pipeline-system/index.ts @@ -0,0 +1,206 @@ +/** + * Deterministic Multi-Agent Pipeline System + * + * A comprehensive system for building deterministic, parallel, event-driven + * multi-agent pipelines that integrate with Claude Code and OpenClaw. + * + * Key Features: + * - Deterministic orchestration (state machine, not LLM decision) + * - Parallel execution (up to 12 concurrent agent sessions) + * - Event-driven coordination (agents finish → next triggers) + * - Full agent capabilities (tools, memory, identity, workspace) + * + * @module pipeline-system + */ + +// Core +export { + DeterministicStateMachine, + StateMachineRegistry, + stateMachineRegistry +} from './core/state-machine'; +export type { + State, + StateStatus, + Transition, + Condition, + RetryConfig, + StateMachineDefinition, + StateMachineInstance, + StateTransition, + Event, + ErrorHandling +} from './core/state-machine'; + +// Engine +export { + ParallelExecutionEngine, + defaultExecutor +} from './engine/parallel-executor'; +export type { + AgentRole, + TaskStatus, + WorkerStatus, + AgentSession, + AgentIdentity, + PipelineTask, + Worker, + ExecutionConfig, + ExecutionResult +} from './engine/parallel-executor'; + +// Events +export { + EventBus, + EventChain, + PipelineEventTypes, + defaultEventBus +} from './events/event-bus'; +export type { + PipelineEvent, + EventHandler, + EventFilter, + Subscription, + EventBusConfig, + EventBusStats, + EventPriority +} from './events/event-bus'; + +// Workspace +export { + WorkspaceManager, + WorkspaceFactory, + defaultWorkspaceFactory +} from './workspace/agent-workspace'; +export type { + Permission, + WorkspaceConfig, + ResourceLimits, + MountPoint, + AgentTool, + ToolContext, + ToolResult, + MemoryStore +} from './workspace/agent-workspace'; + +// Workflows +export { + WorkflowParser, + WorkflowRegistry, + CODE_PIPELINE_WORKFLOW, + PARALLEL_PROJECTS_WORKFLOW, + HUMAN_APPROVAL_WORKFLOW, + defaultWorkflowRegistry +} 
from './workflows/yaml-workflow'; +export type { + YAMLWorkflow, + YAMLState, + YAMLTransition, + YAMLCondition, + YAMLRetryConfig, + YAMLLoopConfig +} from './workflows/yaml-workflow'; + +// Integrations +export { + PipelineOrchestrator, + createCodePipeline, + createParallelPipeline, + runWorkflow, + defaultOrchestrator +} from './integrations/claude-code'; +export type { + PipelineConfig, + ProjectConfig, + TaskConfig, + PipelineResult, + ProjectResult, + TaskResult, + AgentMessage +} from './integrations/claude-code'; + +// Version +export const VERSION = '1.0.0'; + +/** + * Quick Start Example: + * + * ```typescript + * import { + * PipelineOrchestrator, + * createCodePipeline, + * runWorkflow + * } from './pipeline-system'; + * + * // Option 1: Simple code pipeline + * const pipelineId = await createCodePipeline([ + * { + * id: 'project-1', + * name: 'My Project', + * tasks: [ + * { type: 'implement', description: 'Create auth module', role: 'programmer' }, + * { type: 'review', description: 'Review auth module', role: 'reviewer' }, + * { type: 'test', description: 'Test auth module', role: 'tester' } + * ] + * } + * ]); + * + * // Option 2: Run predefined workflow + * const workflowId = await runWorkflow('code-pipeline', { + * projectId: 'my-project', + * requirements: 'Build REST API' + * }); + * + * // Option 3: Custom configuration + * const orchestrator = new PipelineOrchestrator(); + * await orchestrator.initialize(); + * + * const customPipelineId = await orchestrator.createPipeline({ + * name: 'Custom Pipeline', + * projects: [...], + * roles: ['programmer', 'reviewer', 'tester'], + * maxConcurrency: 12 + * }); + * + * // Subscribe to events + * orchestrator.onEvent('agent.completed', (event) => { + * console.log('Agent completed:', event.payload); + * }); + * ``` + * + * ## Architecture + * + * ``` + * ┌─────────────────────────────────────────────────────────────────┐ + * │ Pipeline Orchestrator │ + * │ ┌──────────────┐ ┌──────────────┐ 
┌──────────────┐ │ + * │ │ State Machine│ │Event Bus │ │ Parallel Exec│ │ + * │ │ (Deterministic│ │(Coordination)│ │ (Concurrency)│ │ + * │ └──────────────┘ └──────────────┘ └──────────────┘ │ + * │ │ │ │ │ + * │ ┌──────┴────────────────┴─────────────────┴──────┐ │ + * │ │ Agent Workspaces │ │ + * │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ + * │ │ │Programmer│ │Reviewer │ │ Tester │ │ │ + * │ │ │Workspace │ │Workspace│ │Workspace│ │ │ + * │ │ │ • Tools │ │ • Tools │ │ • Tools │ │ │ + * │ │ │ • Memory │ │ • Memory│ │ • Memory│ │ │ + * │ │ │ • Files │ │ • Files │ │ • Files │ │ │ + * │ │ └─────────┘ └─────────┘ └─────────┘ │ │ + * │ └────────────────────────────────────────────────┘ │ + * └─────────────────────────────────────────────────────────────────┘ + * │ + * ▼ + * ┌─────────────────────────────────────────────────────────────────┐ + * │ LLM Provider (ZAI SDK) │ + * └─────────────────────────────────────────────────────────────────┘ + * ``` + * + * ## Key Principles + * + * 1. **Deterministic Flow**: State machines control the pipeline, not LLM decisions + * 2. **Event-Driven**: Agents communicate through events, enabling loose coupling + * 3. **Parallel Execution**: Multiple agents work concurrently with resource isolation + * 4. **Workspace Isolation**: Each agent has its own tools, memory, and file space + * 5. **YAML Workflows**: Define pipelines declaratively, compatible with Lobster + */ diff --git a/pipeline-system/integrations/claude-code.ts b/pipeline-system/integrations/claude-code.ts new file mode 100644 index 0000000..9abcd42 --- /dev/null +++ b/pipeline-system/integrations/claude-code.ts @@ -0,0 +1,599 @@ +/** + * Claude Code Integration Layer + * + * Provides easy integration with Claude Code and OpenClaw. + * Single API surface for all pipeline operations. 
+ */ + +import { randomUUID } from 'crypto'; +import ZAI from 'z-ai-web-dev-sdk'; +import { + DeterministicStateMachine, + StateMachineDefinition, + StateMachineRegistry, + stateMachineRegistry +} from '../core/state-machine'; +import { + ParallelExecutionEngine, + PipelineTask, + AgentRole, + AgentSession, + defaultExecutor +} from '../engine/parallel-executor'; +import { + EventBus, + PipelineEvent, + PipelineEventTypes, + defaultEventBus +} from '../events/event-bus'; +import { + WorkspaceManager, + WorkspaceFactory, + AgentIdentity, + defaultWorkspaceFactory +} from '../workspace/agent-workspace'; +import { + WorkflowRegistry, + YAMLWorkflow, + defaultWorkflowRegistry +} from '../workflows/yaml-workflow'; + +// ============================================================================ +// Types +// ============================================================================ + +export interface PipelineConfig { + name: string; + projects: ProjectConfig[]; + roles: AgentRole[]; + maxConcurrency?: number; + timeout?: number; +} + +export interface ProjectConfig { + id: string; + name: string; + description?: string; + repository?: string; + branch?: string; + tasks: TaskConfig[]; +} + +export interface TaskConfig { + type: string; + description: string; + role: AgentRole; + priority?: 'low' | 'medium' | 'high' | 'critical'; + dependencies?: string[]; + timeout?: number; +} + +export interface PipelineResult { + pipelineId: string; + status: 'running' | 'completed' | 'failed' | 'cancelled'; + startTime: Date; + endTime?: Date; + projects: ProjectResult[]; +} + +export interface ProjectResult { + projectId: string; + status: 'pending' | 'running' | 'completed' | 'failed'; + tasks: TaskResult[]; +} + +export interface TaskResult { + taskId: string; + status: 'pending' | 'running' | 'completed' | 'failed'; + output?: unknown; + error?: string; + duration?: number; +} + +export interface AgentMessage { + role: 'system' | 'user' | 'assistant'; + content: string; +} + 
+// ============================================================================ +// Pipeline Orchestrator +// ============================================================================ + +/** + * PipelineOrchestrator - Main integration class + * + * Single entry point for Claude Code and OpenClaw integration. + */ +export class PipelineOrchestrator { + private zai: Awaited<ReturnType<typeof ZAI.create>> | null = null; + private executor: ParallelExecutionEngine; + private eventBus: EventBus; + private workflowRegistry: WorkflowRegistry; + private workspaceFactory: WorkspaceFactory; + private smRegistry: StateMachineRegistry; + private pipelines: Map<string, PipelineResult> = new Map(); + private initialized = false; + + constructor(config?: { + executor?: ParallelExecutionEngine; + eventBus?: EventBus; + workflowRegistry?: WorkflowRegistry; + workspaceFactory?: WorkspaceFactory; + }) { + this.executor = config?.executor || defaultExecutor; + this.eventBus = config?.eventBus || defaultEventBus; + this.workflowRegistry = config?.workflowRegistry || defaultWorkflowRegistry; + this.workspaceFactory = config?.workspaceFactory || defaultWorkspaceFactory; + this.smRegistry = stateMachineRegistry; + } + + /** + * Initialize the pipeline system + */ + async initialize(): Promise<void> { + if (this.initialized) return; + + // Initialize ZAI SDK + this.zai = await ZAI.create(); + + // Start executor + this.executor.start(); + + // Start event bus + this.eventBus.start(); + + // Register task handler + this.executor.registerHandler('agent-task', this.executeAgentTask.bind(this)); + + // Set up event subscriptions + this.setupEventSubscriptions(); + + this.initialized = true; + } + + /** + * Set up event subscriptions for coordination + */ + private setupEventSubscriptions(): void { + // Agent completion triggers next step + this.eventBus.subscribe({ + eventType: PipelineEventTypes.AGENT_COMPLETED, + handler: async (event) => { + const { projectId, role, output } = event.payload as Record<string, unknown>; + + // Determine next 
role in pipeline +
const nextRole = this.getNextRole(role as AgentRole); + + if (nextRole) { + // Emit event to trigger next agent + this.eventBus.publish({ + type: PipelineEventTypes.TASK_STARTED, + source: 'orchestrator', + payload: { projectId, role: nextRole, previousOutput: output } + }); + } + } + }); + + // Handle failures + this.eventBus.subscribe({ + eventType: PipelineEventTypes.AGENT_FAILED, + handler: async (event) => { + const { projectId, error } = event.payload as Record<string, unknown>; + console.error(`Agent failed for project ${projectId}:`, error); + + // Emit pipeline failure event + this.eventBus.publish({ + type: PipelineEventTypes.PIPELINE_COMPLETED, + source: 'orchestrator', + payload: { projectId, status: 'failed', error } + }); + } + }); + } + + /** + * Get next role in the pipeline sequence + */ + private getNextRole(currentRole: AgentRole): AgentRole | null { + const sequence: AgentRole[] = ['programmer', 'reviewer', 'tester']; + const currentIndex = sequence.indexOf(currentRole); + + // Roles outside the fixed sequence (planner, analyst, custom) have no successor; + // without the -1 check, indexOf() would route them back to 'programmer' + if (currentIndex !== -1 && currentIndex < sequence.length - 1) { + return sequence[currentIndex + 1]; + } + + return null; // End of pipeline + } + + /** + * Execute an agent task + */ + private async executeAgentTask( + task: PipelineTask, + session: AgentSession + ): Promise<unknown> { + if (!this.zai) { + throw new Error('Pipeline not initialized'); + } + + // Create workspace for this task + const workspace = this.workspaceFactory.createWorkspace({ + projectId: session.projectId, + agentId: session.id, + role: session.role, + permissions: this.getPermissionsForRole(session.role) + }); + + // Set agent identity + workspace.setIdentity(session.identity); + + // Build messages for LLM + const messages = this.buildMessages(task, session, workspace); + + try { + // Call LLM + const response = await 
this.zai.chat.completions.create({ + messages, + thinking: { type: 'disabled' } + }); + + const output = response.choices?.[0]?.message?.content || ''; + + // Save output to workspace +
workspace.writeFile(`output/${task.id}.txt`, output); + + // Store in memory for next agent + workspace.memorize(`task.${task.id}.output`, output); + + // Emit completion event + this.eventBus.publish({ + type: PipelineEventTypes.AGENT_COMPLETED, + source: session.id, + payload: { + taskId: task.id, + projectId: session.projectId, + role: session.role, + output + } + }); + + return { output, workspace: workspace.getPath() }; + + } catch (error) { + // Emit failure event + this.eventBus.publish({ + type: PipelineEventTypes.AGENT_FAILED, + source: session.id, + payload: { + taskId: task.id, + projectId: session.projectId, + role: session.role, + error: error instanceof Error ? error.message : String(error) + } + }); + + throw error; + } + } + + /** + * Build messages for LLM + */ + private buildMessages( + task: PipelineTask, + session: AgentSession, + workspace: WorkspaceManager + ): AgentMessage[] { + const messages: AgentMessage[] = []; + + // System prompt with identity + messages.push({ + role: 'system', + content: this.buildSystemPrompt(session, workspace) + }); + + // Task description + messages.push({ + role: 'user', + content: `## Task\n${task.description}\n\n## Context\nProject: ${session.projectId}\nRole: ${session.role}\n\n## Instructions\nComplete this task and provide your output.` + }); + + // Add any previous context from memory + const previousOutput = workspace.recall('previous.output'); + if (previousOutput) { + messages.push({ + role: 'user', + content: `## Previous Work\n${JSON.stringify(previousOutput, null, 2)}` + }); + } + + return messages; + } + + /** + * Build system prompt for agent + */ + private buildSystemPrompt(session: AgentSession, workspace: WorkspaceManager): string { + const identity = session.identity; + const role = session.role; + + const roleInstructions: Record = { + programmer: `You are responsible for writing clean, efficient, and well-documented code. 
+- Follow best practices and coding standards +- Write tests for your code +- Ensure code is production-ready`, + reviewer: `You are responsible for reviewing code for quality, bugs, and improvements. +- Check for security vulnerabilities +- Verify coding standards +- Suggest improvements +- Approve or request changes`, + tester: `You are responsible for testing the code thoroughly. +- Write comprehensive test cases +- Test edge cases and error handling +- Verify functionality meets requirements +- Report test results clearly`, + planner: `You are responsible for planning and architecture. +- Break down complex tasks +- Design system architecture +- Identify dependencies +- Create implementation plans`, + analyst: `You are responsible for analysis and reporting. +- Analyze data and metrics +- Identify patterns and insights +- Create reports and recommendations`, + custom: `You are a custom agent with specific instructions.` + }; + + return `# Agent Identity + +Name: ${identity.name} +Role: ${role} +Description: ${identity.description} + +# Personality +${identity.personality || 'Professional and efficient.'} + +# Role Instructions +${roleInstructions[role] || roleInstructions.custom} + +# Workspace +Your workspace is at: ${workspace.getPath()} + +# Available Tools +${session.tools.map(t => `- ${t}`).join('\n')} + +# Constraints +- Stay within your role boundaries +- Communicate clearly and concisely +- Report progress and issues promptly`; + } + + /** + * Get permissions for a role + */ + private getPermissionsForRole(role: AgentRole): string[] { + const permissionMap: Record = { + programmer: ['read', 'write', 'execute', 'git'], + reviewer: ['read', 'diff'], + tester: ['read', 'execute', 'test'], + planner: ['read', 'write'], + analyst: ['read'], + custom: ['read'] + }; + return permissionMap[role] || ['read']; + } + + // ========================================================================= + // Public API + // 
========================================================================= + + /** + * Create and start a pipeline + */ + async createPipeline(config: PipelineConfig): Promise<string> { + await this.initialize(); + + const pipelineId = `pipeline-${randomUUID().substring(0, 8)}`; + const result: PipelineResult = { + pipelineId, + status: 'running', + startTime: new Date(), + projects: config.projects.map(p => ({ + projectId: p.id, + status: 'pending', + tasks: [] + })) + }; + + this.pipelines.set(pipelineId, result); + + // Create tasks for all projects and roles + const tasks: PipelineTask[] = []; + + for (const project of config.projects) { + for (const taskConfig of project.tasks) { + const task = this.executor.submitTask({ + projectId: project.id, + role: taskConfig.role, + // Only 'agent-task' has a registered (LLM-backed) handler; the configured + // task type (e.g. 'implement') still reaches the agent via input.task + type: 'agent-task', + description: taskConfig.description, + priority: taskConfig.priority || 'medium', + input: { project, task: taskConfig }, + dependencies: taskConfig.dependencies || [], + timeout: taskConfig.timeout || config.timeout || 300000, + maxRetries: 3 + }); + tasks.push(task); + } + } + + // Emit pipeline started event + this.eventBus.publish({ + type: PipelineEventTypes.PIPELINE_STARTED, + source: 'orchestrator', + payload: { pipelineId, config, taskCount: tasks.length } + }); + + return pipelineId; + } + + /** + * Create pipeline from YAML workflow + */ + async createPipelineFromYAML(workflowId: string, context?: Record<string, unknown>): Promise<string> { + await this.initialize(); + + const workflow = this.workflowRegistry.get(workflowId); + if (!workflow) { + throw new Error(`Workflow ${workflowId} not found`); + } + + const definition = this.workflowRegistry.getParsed(workflowId)!; + + // Create state machine instance + const sm = this.smRegistry.createInstance(workflowId); + + // Update context if provided + if (context) { + 
sm.updateContext(context); + } + + // Attach listeners before start(), so events emitted while entering the + // initial state are not lost + // Listen for state transitions + sm.on('transition', ({ from, to, event }) => { +
this.eventBus.publish({ + type: PipelineEventTypes.TRANSITION, + source: sm.getInstance().id, + payload: { workflowId, from, to, event } + }); + }); + + // Listen for actions + sm.on('action', async ({ state, context }) => { + if (state.agent || state.metadata?.role) { + // Submit task to executor + this.executor.submitTask({ + projectId: (context.projectId as string) || 'default', + role: (state.metadata?.role as AgentRole) || 'programmer', + type: 'agent-task', + description: `Execute ${state.name}`, + priority: 'high', + input: { state, context }, + dependencies: [], + timeout: state.timeout || 300000, + maxRetries: state.retry?.maxAttempts || 3 + }); + } + }); + + // Start the state machine + sm.start(); + + return sm.getInstance().id; + } + + /** + * Register a custom workflow + */ + registerWorkflow(yaml: YAMLWorkflow): StateMachineDefinition { + return this.workflowRegistry.register(yaml); + } + + /** + * Get pipeline status + */ + getPipelineStatus(pipelineId: string): PipelineResult | undefined { + return this.pipelines.get(pipelineId); + } + + /** + * Cancel a pipeline + */ + async cancelPipeline(pipelineId: string): Promise<void> { + const pipeline = this.pipelines.get(pipelineId); + if (pipeline) { + pipeline.status = 'cancelled'; + pipeline.endTime = new Date(); + + this.eventBus.publish({ + type: PipelineEventTypes.PIPELINE_COMPLETED, + source: 'orchestrator', + payload: { pipelineId, status: 'cancelled' } + }); + } + } + + /** + * Get system statistics + */ + getStats(): { + pipelines: number; + executor: ReturnType<ParallelExecutionEngine['getStats']>; + eventBus: ReturnType<EventBus['getStats']>; + workspaces: ReturnType<WorkspaceFactory['getStats']>; + } { + return { + pipelines: this.pipelines.size, + executor: this.executor.getStats(), + eventBus: this.eventBus.getStats(), + workspaces: this.workspaceFactory.getStats() + }; + } + + /** + * Subscribe to pipeline events; returns a function that unsubscribes + */ + onEvent(eventType: string, handler: (event: PipelineEvent) => 
void): () => void { + // EventBus.subscribe() returns a handler ID, not an unsubscribe callback + const handlerId = this.eventBus.subscribe({ eventType, handler }); + return () => { + this.eventBus.unsubscribe(handlerId); + }; + } + + /** + * Shutdown the pipeline system + */ + async shutdown():
Promise { + await this.executor.stop(); + this.eventBus.stop(); + this.initialized = false; + } +} + +// ============================================================================ +// Quick Start Functions +// ============================================================================ + +/** + * Create a simple code pipeline + */ +export async function createCodePipeline(projects: ProjectConfig[]): Promise { + const orchestrator = new PipelineOrchestrator(); + + return orchestrator.createPipeline({ + name: 'Code Pipeline', + projects, + roles: ['programmer', 'reviewer', 'tester'], + maxConcurrency: 12, // 4 projects × 3 roles + timeout: 300000 + }); +} + +/** + * Create a parallel execution pipeline + */ +export async function createParallelPipeline(config: PipelineConfig): Promise { + const orchestrator = new PipelineOrchestrator(); + return orchestrator.createPipeline(config); +} + +/** + * Run a predefined workflow + */ +export async function runWorkflow( + workflowId: string, + context?: Record +): Promise { + const orchestrator = new PipelineOrchestrator(); + return orchestrator.createPipelineFromYAML(workflowId, context); +} + +// Default orchestrator instance +export const defaultOrchestrator = new PipelineOrchestrator(); diff --git a/pipeline-system/workflows/yaml-workflow.ts b/pipeline-system/workflows/yaml-workflow.ts new file mode 100644 index 0000000..2b905f7 --- /dev/null +++ b/pipeline-system/workflows/yaml-workflow.ts @@ -0,0 +1,540 @@ +/** + * YAML Workflow Integration (Lobster-Compatible) + * + * Parses YAML workflow definitions and converts them to + * deterministic state machine definitions. + * + * Compatible with OpenClaw/Lobster workflow format. 
+ */ + +import { StateMachineDefinition, State, Transition, RetryConfig } from '../core/state-machine'; +import { AgentRole } from '../engine/parallel-executor'; + +// ============================================================================ +// Types +// ============================================================================ + +export interface YAMLWorkflow { + id: string; + name: string; + version?: string; + description?: string; + initial: string; + states: Record; + events?: string[]; + context?: Record; +} + +export interface YAMLState { + type: 'start' | 'end' | 'action' | 'parallel' | 'choice' | 'wait' | 'loop' | 'subworkflow'; + agent?: string; + role?: AgentRole; + action?: string; + timeout?: number | string; + retry?: YAMLRetryConfig; + on?: Record; + branches?: Record; + conditions?: YAMLCondition[]; + subworkflow?: string; + loop?: YAMLLoopConfig; + metadata?: Record; +} + +export interface YAMLTransition { + target: string; + condition?: YAMLCondition; + guard?: string; +} + +export interface YAMLCondition { + type: 'equals' | 'contains' | 'exists' | 'custom'; + field: string; + value?: unknown; +} + +export interface YAMLRetryConfig { + maxAttempts: number; + backoff?: 'fixed' | 'exponential' | 'linear'; + initialDelay?: number | string; + maxDelay?: number | string; +} + +export interface YAMLLoopConfig { + maxIterations: number; + iterator?: string; + body: string; + exitCondition?: YAMLCondition; +} + +// ============================================================================ +// Workflow Parser +// ============================================================================ + +/** + * WorkflowParser - Parses YAML workflows to state machine definitions + */ +export class WorkflowParser { + /** + * Parse a YAML workflow to a state machine definition + */ + parse(yaml: YAMLWorkflow): StateMachineDefinition { + const states: Record = {}; + + for (const [stateId, yamlState] of Object.entries(yaml.states)) { + states[stateId] = 
this.parseState(stateId, yamlState); + } + + return { + id: yaml.id, + name: yaml.name, + version: yaml.version || '1.0.0', + description: yaml.description, + initial: yaml.initial, + states, + events: yaml.events, + context: yaml.context + }; + } + + /** + * Parse a single state + */ + private parseState(stateId: string, yamlState: YAMLState): State { + const state: State = { + id: stateId, + name: stateId, + type: yamlState.type, + agent: yamlState.agent, + action: yamlState.action, + timeout: this.parseDuration(yamlState.timeout), + metadata: { + ...yamlState.metadata, + role: yamlState.role + } + }; + + // Parse retry config + if (yamlState.retry) { + state.retry = { + maxAttempts: yamlState.retry.maxAttempts, + backoff: yamlState.retry.backoff || 'exponential', + initialDelay: this.parseDuration(yamlState.retry.initialDelay) || 1000, + maxDelay: this.parseDuration(yamlState.retry.maxDelay) || 60000 + }; + } + + // Parse transitions (on) + if (yamlState.on) { + const transitions = this.parseTransitions(yamlState.on); + state.onExit = transitions; + } + + // Parse parallel branches + if (yamlState.branches) { + state.type = 'parallel'; + state.onEnter = Object.entries(yamlState.branches).map(([event, target]) => ({ + event, + target + })); + } + + // Parse loop config + if (yamlState.loop) { + state.type = 'loop'; + state.metadata = { + ...state.metadata, + maxIterations: yamlState.loop.maxIterations, + iterator: yamlState.loop.iterator, + body: yamlState.loop.body + }; + + // Add loop transitions + state.onExit = [ + { event: 'continue', target: yamlState.loop.body }, + { event: 'exit', target: yamlState.on?.['exit'] as string || 'end' } + ]; + } + + // Parse subworkflow + if (yamlState.subworkflow) { + state.type = 'action'; + state.action = 'subworkflow'; + state.metadata = { + ...state.metadata, + subworkflow: yamlState.subworkflow + }; + } + + return state; + } + + /** + * Parse transitions from YAML format + */ + private parseTransitions(on: Record): 
Transition[] { + const transitions: Transition[] = []; + + for (const [event, transition] of Object.entries(on)) { + if (typeof transition === 'string') { + transitions.push({ event, target: transition }); + } else { + transitions.push({ + event, + target: transition.target, + condition: transition.condition ? this.parseCondition(transition.condition) : undefined, + guard: transition.guard + }); + } + } + + return transitions; + } + + /** + * Parse a condition + */ + private parseCondition(yamlCond: YAMLCondition): Transition['condition'] { + return { + type: yamlCond.type, + field: yamlCond.field, + value: yamlCond.value + }; + } + + /** + * Parse duration string (e.g., '30s', '5m', '1h') + */ + private parseDuration(duration?: number | string): number | undefined { + if (typeof duration === 'number') return duration; + if (!duration) return undefined; + + const match = duration.match(/^(\d+)(ms|s|m|h)?$/); + if (!match) return undefined; + + const value = parseInt(match[1]); + const unit = match[2] || 'ms'; + + switch (unit) { + case 'ms': return value; + case 's': return value * 1000; + case 'm': return value * 60 * 1000; + case 'h': return value * 60 * 60 * 1000; + default: return value; + } + } +} + +// ============================================================================ +// Workflow Registry +// ============================================================================ + +/** + * WorkflowRegistry - Manages workflow definitions + */ +export class WorkflowRegistry { + private workflows: Map = new Map(); + private parser: WorkflowParser; + + constructor() { + this.parser = new WorkflowParser(); + } + + /** + * Register a workflow from YAML object + */ + register(yaml: YAMLWorkflow): StateMachineDefinition { + this.workflows.set(yaml.id, yaml); + return this.parser.parse(yaml); + } + + /** + * Get a workflow by ID + */ + get(id: string): YAMLWorkflow | undefined { + return this.workflows.get(id); + } + + /** + * Get parsed state machine definition + */ 
+ getParsed(id: string): StateMachineDefinition | undefined { + const yaml = this.workflows.get(id); + if (yaml) { + return this.parser.parse(yaml); + } + return undefined; + } + + /** + * List all workflows + */ + list(): string[] { + return Array.from(this.workflows.keys()); + } +} + +// ============================================================================ +// Predefined Workflows +// ============================================================================ + +/** + * Standard Code Pipeline Workflow + * + * Code → Review → Test → Done + * With max 3 review iterations + */ +export const CODE_PIPELINE_WORKFLOW: YAMLWorkflow = { + id: 'code-pipeline', + name: 'Code Pipeline', + version: '1.0.0', + description: 'Code → Review → Test pipeline with deterministic flow', + initial: 'start', + context: { + reviewIteration: 0, + maxReviewIterations: 3 + }, + states: { + start: { + type: 'start', + on: { + 'start': 'code' + } + }, + code: { + type: 'action', + role: 'programmer', + timeout: '30m', + retry: { + maxAttempts: 2, + backoff: 'exponential', + initialDelay: '5s', + maxDelay: '1m' + }, + on: { + 'completed': 'review', + 'failed': 'failed' + } + }, + review: { + type: 'choice', + conditions: [ + { type: 'equals', field: 'reviewApproved', value: true } + ], + on: { + 'approved': 'test', + 'rejected': 'review_loop', + 'failed': 'failed' + } + }, + review_loop: { + type: 'loop', + loop: { + maxIterations: 3, + body: 'code' + }, + on: { + 'exit': 'failed' + } + }, + test: { + type: 'action', + role: 'tester', + timeout: '15m', + on: { + 'passed': 'end', + 'failed': 'test_failed' + } + }, + test_failed: { + type: 'choice', + on: { + 'retry': 'code', + 'abort': 'failed' + } + }, + end: { + type: 'end' + }, + failed: { + type: 'end', + metadata: { status: 'failed' } + } + } +}; + +/** + * Parallel Multi-Project Workflow + * + * Runs multiple projects in parallel + */ +export const PARALLEL_PROJECTS_WORKFLOW: YAMLWorkflow = { + id: 'parallel-projects', + name: 
'Parallel Projects Pipeline', + version: '1.0.0', + description: 'Run multiple projects in parallel with synchronized completion', + initial: 'start', + states: { + start: { + type: 'start', + on: { + 'start': 'parallel' + } + }, + parallel: { + type: 'parallel', + branches: { + 'project1': 'project1_code', + 'project2': 'project2_code', + 'project3': 'project3_code', + 'project4': 'project4_code' + }, + on: { + 'all_completed': 'end', + 'any_failed': 'failed' + } + }, + project1_code: { + type: 'action', + role: 'programmer', + agent: 'project1-programmer', + on: { 'completed': 'project1_review' } + }, + project1_review: { + type: 'action', + role: 'reviewer', + agent: 'project1-reviewer', + on: { 'completed': 'project1_test' } + }, + project1_test: { + type: 'action', + role: 'tester', + agent: 'project1-tester', + on: { 'completed': 'join' } + }, + project2_code: { + type: 'action', + role: 'programmer', + agent: 'project2-programmer', + on: { 'completed': 'project2_review' } + }, + project2_review: { + type: 'action', + role: 'reviewer', + agent: 'project2-reviewer', + on: { 'completed': 'project2_test' } + }, + project2_test: { + type: 'action', + role: 'tester', + agent: 'project2-tester', + on: { 'completed': 'join' } + }, + project3_code: { + type: 'action', + role: 'programmer', + agent: 'project3-programmer', + on: { 'completed': 'project3_review' } + }, + project3_review: { + type: 'action', + role: 'reviewer', + agent: 'project3-reviewer', + on: { 'completed': 'project3_test' } + }, + project3_test: { + type: 'action', + role: 'tester', + agent: 'project3-tester', + on: { 'completed': 'join' } + }, + project4_code: { + type: 'action', + role: 'programmer', + agent: 'project4-programmer', + on: { 'completed': 'project4_review' } + }, + project4_review: { + type: 'action', + role: 'reviewer', + agent: 'project4-reviewer', + on: { 'completed': 'project4_test' } + }, + project4_test: { + type: 'action', + role: 'tester', + agent: 'project4-tester', + on: { 
'completed': 'join' } + }, + join: { + type: 'wait', + on: { + 'all_joined': 'end' + } + }, + end: { + type: 'end' + }, + failed: { + type: 'end', + metadata: { status: 'failed' } + } + } +}; + +/** + * Human-in-the-Loop Workflow + */ +export const HUMAN_APPROVAL_WORKFLOW: YAMLWorkflow = { + id: 'human-approval', + name: 'Human Approval Workflow', + version: '1.0.0', + description: 'Workflow with human approval gates', + initial: 'start', + states: { + start: { + type: 'start', + on: { 'start': 'plan' } + }, + plan: { + type: 'action', + role: 'planner', + on: { 'completed': 'await_approval' } + }, + await_approval: { + type: 'wait', + timeout: '24h', + on: { + 'approved': 'execute', + 'rejected': 'plan', + 'timeout': 'notify_timeout' + } + }, + notify_timeout: { + type: 'action', + action: 'notify', + metadata: { message: 'Approval timeout' }, + on: { 'completed': 'await_approval' } + }, + execute: { + type: 'action', + role: 'programmer', + on: { 'completed': 'review' } + }, + review: { + type: 'action', + role: 'reviewer', + on: { 'completed': 'end' } + }, + end: { + type: 'end' + } + } +}; + +// Default registry with predefined workflows +export const defaultWorkflowRegistry = new WorkflowRegistry(); + +// Register predefined workflows +defaultWorkflowRegistry.register(CODE_PIPELINE_WORKFLOW); +defaultWorkflowRegistry.register(PARALLEL_PROJECTS_WORKFLOW); +defaultWorkflowRegistry.register(HUMAN_APPROVAL_WORKFLOW); diff --git a/pipeline-system/workspace/agent-workspace.ts b/pipeline-system/workspace/agent-workspace.ts new file mode 100644 index 0000000..85d77a1 --- /dev/null +++ b/pipeline-system/workspace/agent-workspace.ts @@ -0,0 +1,642 @@ +/** + * Agent Workspace Isolation + * + * Each agent gets its own tools, memory, identity, and workspace. + * Provides isolation and resource management for parallel agents. 
+ */ + +import { randomUUID } from 'crypto'; +import { EventEmitter } from 'events'; +import { mkdirSync, rmSync, existsSync, writeFileSync, readFileSync, readdirSync, statSync } from 'fs'; +import { join, resolve, relative } from 'path'; + +// ============================================================================ +// Types +// ============================================================================ + +export type Permission = 'read' | 'write' | 'execute' | 'delete' | 'network' | 'git'; + +export interface WorkspaceConfig { + id: string; + projectId: string; + agentId: string; + role: string; + basePath: string; + permissions: Permission[]; + resourceLimits: ResourceLimits; + environment: Record; + mountPoints: MountPoint[]; +} + +export interface ResourceLimits { + maxMemoryMB: number; + maxCpuPercent: number; + maxFileSizeMB: number; + maxExecutionTimeMs: number; + maxFileCount: number; +} + +export interface MountPoint { + source: string; + target: string; + readOnly: boolean; +} + +export interface AgentTool { + name: string; + description: string; + permissions: Permission[]; + execute: (params: unknown, context: ToolContext) => Promise; +} + +export interface ToolContext { + workspace: WorkspaceManager; + agentId: string; + sessionId: string; + permissions: Permission[]; +} + +export interface ToolResult { + success: boolean; + output?: unknown; + error?: string; + metadata?: Record; +} + +export interface MemoryStore { + shortTerm: Map; + longTerm: Map; + session: Map; +} + +export interface AgentIdentity { + id: string; + name: string; + role: string; + description: string; + personality: string; + systemPrompt: string; + capabilities: string[]; + constraints: string[]; +} + +// ============================================================================ +// Workspace Manager +// ============================================================================ + +/** + * WorkspaceManager - Isolated workspace for an agent + */ +export class 
WorkspaceManager extends EventEmitter { + private config: WorkspaceConfig; + private workspacePath: string; + private memory: MemoryStore; + private identity?: AgentIdentity; + private tools: Map = new Map(); + private fileHandles: Map = new Map(); + private active = true; + + constructor(config: WorkspaceConfig) { + super(); + this.config = config; + this.workspacePath = resolve(config.basePath, config.projectId, config.agentId); + this.memory = { + shortTerm: new Map(), + longTerm: new Map(), + session: new Map() + }; + + this.initializeWorkspace(); + } + + /** + * Initialize the workspace directory + */ + private initializeWorkspace(): void { + if (!existsSync(this.workspacePath)) { + mkdirSync(this.workspacePath, { recursive: true }); + } + + // Create subdirectories + const subdirs = ['memory', 'output', 'cache', 'logs']; + for (const dir of subdirs) { + const path = join(this.workspacePath, dir); + if (!existsSync(path)) { + mkdirSync(path, { recursive: true }); + } + } + + this.emit('workspaceInitialized', { path: this.workspacePath }); + } + + /** + * Set agent identity + */ + setIdentity(identity: AgentIdentity): void { + this.identity = identity; + this.emit('identitySet', { identity }); + } + + /** + * Get agent identity + */ + getIdentity(): AgentIdentity | undefined { + return this.identity; + } + + /** + * Register a tool + */ + registerTool(tool: AgentTool): void { + // Check if agent has required permissions + const hasPermission = tool.permissions.every(p => + this.config.permissions.includes(p) + ); + + if (!hasPermission) { + throw new Error(`Agent does not have required permissions for tool: ${tool.name}`); + } + + this.tools.set(tool.name, tool); + this.emit('toolRegistered', { tool }); + } + + /** + * Unregister a tool + */ + unregisterTool(name: string): boolean { + return this.tools.delete(name); + } + + /** + * Execute a tool + */ + async executeTool(name: string, 
params: unknown): Promise { + const tool = this.tools.get(name); + if (!tool) 
{ + return { success: false, error: `Tool not found: ${name}` }; + } + + const context: ToolContext = { + workspace: this, + agentId: this.config.agentId, + sessionId: this.config.id, + permissions: this.config.permissions + }; + + try { + const result = await tool.execute(params, context); + this.emit('toolExecuted', { name, params, result }); + return result; + } catch (error) { + const result: ToolResult = { + success: false, + error: error instanceof Error ? error.message : String(error) + }; + this.emit('toolError', { name, params, error: result.error }); + return result; + } + } + + /** + * Get available tools + */ + getAvailableTools(): AgentTool[] { + return Array.from(this.tools.values()); + } + + // ============================================================================ + // Memory Management + // ============================================================================ + + /** + * Store value in short-term memory + */ + remember(key: string, value: unknown): void { + this.memory.shortTerm.set(key, value); + this.emit('memoryStored', { type: 'shortTerm', key }); + } + + /** + * Store value in long-term memory + */ + memorize(key: string, value: unknown): void { + this.memory.longTerm.set(key, value); + this.saveMemoryToFile(key, value, 'longTerm'); + this.emit('memoryStored', { type: 'longTerm', key }); + } + + /** + * Store value in session memory + */ + storeSession(key: string, value: unknown): void { + this.memory.session.set(key, value); + this.emit('memoryStored', { type: 'session', key }); + } + + /** + * Retrieve value from memory (short-term, then long-term, then session; falsy values such as 0 or '' are returned as-is) + */ + recall(key: string): unknown | undefined { + return ( + this.memory.shortTerm.get(key) ?? + this.memory.longTerm.get(key) ?? 
+ this.memory.session.get(key) + ); + } + + /** + * Check if memory exists + */ + hasMemory(key: string): boolean { + return ( + this.memory.shortTerm.has(key) || + this.memory.longTerm.has(key) || + this.memory.session.has(key) + ); + } + + /** + * Forget a memory + */ + forget(key: 
string): boolean { + return [ + this.memory.shortTerm.delete(key), + this.memory.longTerm.delete(key), + this.memory.session.delete(key) + ].some(Boolean); + } + + /** + * Clear all short-term memory + */ + clearShortTerm(): void { + this.memory.shortTerm.clear(); + this.emit('memoryCleared', { type: 'shortTerm' }); + } + + /** + * Clear session memory + */ + clearSession(): void { + this.memory.session.clear(); + this.emit('memoryCleared', { type: 'session' }); + } + + /** + * Save memory to file + */ + private saveMemoryToFile(key: string, value: unknown, type: string): void { + const memoryPath = join(this.workspacePath, 'memory', `${type}.json`); + let data: Record = {}; + + if (existsSync(memoryPath)) { + try { + data = JSON.parse(readFileSync(memoryPath, 'utf-8')); + } catch { + data = {}; + } + } + + data[key] = value; + writeFileSync(memoryPath, JSON.stringify(data, null, 2), 'utf-8'); + } + + /** + * Load long-term memory from file + */ + loadLongTermMemory(): void { + const memoryPath = join(this.workspacePath, 'memory', 'longTerm.json'); + if (existsSync(memoryPath)) { + try { + const data = JSON.parse(readFileSync(memoryPath, 'utf-8')); + for (const [key, value] of Object.entries(data)) { + this.memory.longTerm.set(key, value); + } + } catch { + // Ignore errors + } + } + } + + // ============================================================================ + // File Operations + // ============================================================================ + + /** + * Read a file + */ + readFile(path: string): string { + this.checkPermission('read'); + const fullPath = this.resolvePath(path); + this.checkPathInWorkspace(fullPath); + + return readFileSync(fullPath, 'utf-8'); + } + + /** + * Write a file + */ + writeFile(path: string, content: string): void { + this.checkPermission('write'); + const fullPath = this.resolvePath(path); + this.checkPathInWorkspace(fullPath); + 
this.checkFileSize(Buffer.byteLength(content, 'utf-8')); + + writeFileSync(fullPath, content, 'utf-8'); + 
this.emit('fileWritten', { path: fullPath }); + } + + /** + * Delete a file + */ + deleteFile(path: string): void { + this.checkPermission('delete'); + const fullPath = this.resolvePath(path); + this.checkPathInWorkspace(fullPath); + + rmSync(fullPath, { force: true }); + this.emit('fileDeleted', { path: fullPath }); + } + + /** + * List files in a directory + */ + listFiles(path: string = ''): string[] { + this.checkPermission('read'); + const fullPath = this.resolvePath(path); + this.checkPathInWorkspace(fullPath); + + if (!existsSync(fullPath)) return []; + + return readdirSync(fullPath).map(name => join(path, name)); + } + + /** + * Check if file exists + */ + fileExists(path: string): boolean { + const fullPath = this.resolvePath(path); + this.checkPathInWorkspace(fullPath); + return existsSync(fullPath); + } + + /** + * Get file stats + */ + getFileStats(path: string): { size: number; modified: Date; isDirectory: boolean } | null { + const fullPath = this.resolvePath(path); + this.checkPathInWorkspace(fullPath); + + if (!existsSync(fullPath)) return null; + + const stats = statSync(fullPath); + return { + size: stats.size, + modified: stats.mtime, + isDirectory: stats.isDirectory() + }; + } + + // ============================================================================ + // Permission & Security + // ============================================================================ + + /** + * Check if agent has a permission + */ + hasPermission(permission: Permission): boolean { + return this.config.permissions.includes(permission); + } + + /** + * Check permission and throw if missing + */ + private checkPermission(permission: Permission): void { + if (!this.hasPermission(permission)) { + throw new Error(`Permission denied: ${permission}`); + } + } + + /** + * Resolve path relative to workspace + */ + private resolvePath(path: string): string { + return resolve(this.workspacePath, path); + } + + /** + * Check if path is within workspace + */ + private 
checkPathInWorkspace(fullPath: string): void { + const relativePath = relative(this.workspacePath, fullPath); + if (relativePath.startsWith('..') || relativePath.startsWith('/')) { + throw new Error('Path is outside workspace boundaries'); + } + } + + /** + * Check file size limit + */ + private checkFileSize(size: number): void { + const maxBytes = this.config.resourceLimits.maxFileSizeMB * 1024 * 1024; + if (size > maxBytes) { + throw new Error(`File size exceeds limit: ${this.config.resourceLimits.maxFileSizeMB}MB`); + } + } + + // ============================================================================ + // Lifecycle + // ============================================================================ + + /** + * Get workspace path + */ + getPath(): string { + return this.workspacePath; + } + + /** + * Get workspace config + */ + getConfig(): WorkspaceConfig { + return { ...this.config }; + } + + /** + * Clean up workspace + */ + cleanup(): void { + this.active = false; + this.clearSession(); + this.emit('workspaceCleanup', { path: this.workspacePath }); + } + + /** + * Destroy workspace (delete files) + */ + destroy(): void { + this.cleanup(); + + if (existsSync(this.workspacePath)) { + rmSync(this.workspacePath, { recursive: true, force: true }); + } + + this.emit('workspaceDestroyed', { path: this.workspacePath }); + } + + /** + * Export workspace state + */ + exportState(): { + config: WorkspaceConfig; + memory: Record; + identity?: AgentIdentity; + tools: string[]; + } { + return { + config: this.getConfig(), + memory: { + shortTerm: Object.fromEntries(this.memory.shortTerm), + longTerm: Object.fromEntries(this.memory.longTerm), + session: Object.fromEntries(this.memory.session) + }, + identity: this.identity, + tools: Array.from(this.tools.keys()) + }; + } +} + +// ============================================================================ +// Workspace Factory +// ============================================================================ + +/** + * 
WorkspaceFactory - Creates and manages workspaces + */ +export class WorkspaceFactory { + private basePath: string; + private workspaces: Map = new Map(); + + constructor(basePath: string = './workspaces') { + this.basePath = resolve(basePath); + + if (!existsSync(this.basePath)) { + mkdirSync(this.basePath, { recursive: true }); + } + } + + /** + * Create a new workspace + */ + createWorkspace(config: { + projectId: string; + agentId: string; + role: string; + permissions?: Permission[]; + resourceLimits?: Partial; + }): WorkspaceManager { + const id = `ws-${randomUUID().substring(0, 8)}`; + + const fullConfig: WorkspaceConfig = { + id, + projectId: config.projectId, + agentId: config.agentId, + role: config.role, + basePath: this.basePath, + permissions: config.permissions || ['read'], + resourceLimits: { + maxMemoryMB: 512, + maxCpuPercent: 50, + maxFileSizeMB: 10, + maxExecutionTimeMs: 60000, + maxFileCount: 1000, + ...config.resourceLimits + }, + environment: {}, + mountPoints: [] + }; + + const workspace = new WorkspaceManager(fullConfig); + this.workspaces.set(id, workspace); + + return workspace; + } + + /** + * Get a workspace by ID + */ + getWorkspace(id: string): WorkspaceManager | undefined { + return this.workspaces.get(id); + } + + /** + * Get workspaces by project + */ + getWorkspacesByProject(projectId: string): WorkspaceManager[] { + return Array.from(this.workspaces.values()) + .filter(w => w.getConfig().projectId === projectId); + } + + /** + * Get all workspaces + */ + getAllWorkspaces(): WorkspaceManager[] { + return Array.from(this.workspaces.values()); + } + + /** + * Destroy a workspace + */ + destroyWorkspace(id: string): boolean { + const workspace = this.workspaces.get(id); + if (workspace) { + workspace.destroy(); + return this.workspaces.delete(id); + } + return false; + } + + /** + * Destroy all workspaces for a project + */ + destroyProjectWorkspaces(projectId: string): number { + const projectWorkspaces = 
this.getWorkspacesByProject(projectId); + let count = 0; + + for (const workspace of projectWorkspaces) { + workspace.destroy(); + this.workspaces.delete(workspace.getConfig().id); + count++; + } + + return count; + } + + /** + * Get factory stats + */ + getStats(): { + totalWorkspaces: number; + byProject: Record; + byRole: Record; + } { + const byProject: Record = {}; + const byRole: Record = {}; + + for (const workspace of this.workspaces.values()) { + const config = workspace.getConfig(); + byProject[config.projectId] = (byProject[config.projectId] || 0) + 1; + byRole[config.role] = (byRole[config.role] || 0) + 1; + } + + return { + totalWorkspaces: this.workspaces.size, + byProject, + byRole + }; + } +} + +// Default factory instance +export const defaultWorkspaceFactory = new WorkspaceFactory(); diff --git a/skills/ASR/LICENSE.txt b/skills/ASR/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/ASR/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/ASR/SKILL.md b/skills/ASR/SKILL.md new file mode 100755 index 0000000..fde9bd7 --- /dev/null +++ b/skills/ASR/SKILL.md @@ -0,0 +1,580 @@ +--- +name: ASR +description: Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64-encoded audio files and returns accurate text transcriptions. +license: MIT +--- + +# ASR (Speech to Text) Skill + +This skill guides the implementation of speech-to-text (ASR) functionality using the z-ai-web-dev-sdk package, enabling accurate transcription of spoken audio into text. + +## Skills Path + +**Skill Location**: `{project_path}/skills/ASR` + +This skill is located at the path above in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/asr.ts` for a working example. + +## Overview + +Speech-to-Text (ASR, Automatic Speech Recognition) allows you to build applications that convert spoken language in audio files into written text, enabling voice-controlled interfaces, transcription services, and audio content analysis. + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## CLI Usage (For Simple Tasks) + +For simple audio transcription tasks, you can use the z-ai CLI instead of writing code.
This is ideal for quick transcriptions, testing audio files, or batch processing. + +### Basic Transcription from File + +```bash +# Transcribe an audio file +z-ai asr --file ./audio.wav + +# Save transcription to JSON file +z-ai asr -f ./recording.mp3 -o transcript.json + +# Transcribe and view output +z-ai asr --file ./interview.wav --output result.json +``` + +### Transcription from Base64 + +```bash +# Transcribe from base64 encoded audio +z-ai asr --base64 "UklGRiQAAABXQVZFZm10..." -o result.json + +# Using short option +z-ai asr -b "base64_encoded_audio_data" -o transcript.json +``` + +### Streaming Output + +```bash +# Stream transcription results +z-ai asr -f ./audio.wav --stream +``` + +### CLI Parameters + +- `--file, -f `: **Required** (if not using --base64) - Audio file path +- `--base64, -b `: **Required** (if not using --file) - Base64 encoded audio +- `--output, -o `: Optional - Output file path (JSON format) +- `--stream`: Optional - Stream the transcription output + +### Supported Audio Formats + +The ASR service supports various audio formats including: +- WAV (.wav) +- MP3 (.mp3) +- Other common audio formats + +### When to Use CLI vs SDK + +**Use CLI for:** +- Quick audio file transcriptions +- Testing audio recognition accuracy +- Simple batch processing scripts +- One-off transcription tasks + +**Use SDK for:** +- Real-time audio transcription in applications +- Integration with recording systems +- Custom audio processing workflows +- Production applications with streaming audio + +## Basic ASR Implementation + +### Simple Audio Transcription + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function transcribeAudio(audioFilePath) { + const zai = await ZAI.create(); + + // Read audio file and convert to base64 + const audioFile = fs.readFileSync(audioFilePath); + const base64Audio = audioFile.toString('base64'); + + const response = await zai.audio.asr.create({ + file_base64: base64Audio + }); + + return 
response.text; +} + +// Usage +const transcription = await transcribeAudio('./audio.wav'); +console.log('Transcription:', transcription); +``` + +### Transcribe Multiple Audio Files + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function transcribeBatch(audioFilePaths) { + const zai = await ZAI.create(); + const results = []; + + for (const filePath of audioFilePaths) { + try { + const audioFile = fs.readFileSync(filePath); + const base64Audio = audioFile.toString('base64'); + + const response = await zai.audio.asr.create({ + file_base64: base64Audio + }); + + results.push({ + file: filePath, + success: true, + transcription: response.text + }); + } catch (error) { + results.push({ + file: filePath, + success: false, + error: error.message + }); + } + } + + return results; +} + +// Usage +const files = ['./interview1.wav', './interview2.wav', './interview3.wav']; +const transcriptions = await transcribeBatch(files); + +transcriptions.forEach(result => { + if (result.success) { + console.log(`${result.file}: ${result.transcription}`); + } else { + console.error(`${result.file}: Error - ${result.error}`); + } +}); +``` + +## Advanced Use Cases + +### Audio File Processing with Metadata + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +async function transcribeWithMetadata(audioFilePath) { + const zai = await ZAI.create(); + + // Get file metadata + const stats = fs.statSync(audioFilePath); + const audioFile = fs.readFileSync(audioFilePath); + const base64Audio = audioFile.toString('base64'); + + const startTime = Date.now(); + + const response = await zai.audio.asr.create({ + file_base64: base64Audio + }); + + const endTime = Date.now(); + + return { + filename: path.basename(audioFilePath), + filepath: audioFilePath, + fileSize: stats.size, + transcription: response.text, + wordCount: response.text.split(/\s+/).length, + processingTime: endTime - startTime, + timestamp: new 
Date().toISOString() + }; +} + +// Usage +const result = await transcribeWithMetadata('./meeting_recording.wav'); +console.log('Transcription Details:', JSON.stringify(result, null, 2)); +``` + +### Transcription Service with Caching + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import crypto from 'crypto'; + +class ASRService { + constructor() { + this.zai = null; + this.transcriptionCache = new Map(); + } + + async initialize() { + this.zai = await ZAI.create(); + } + + generateCacheKey(audioBuffer) { + // Hash the raw bytes so identical files share one cache entry + // (ES modules have no `require`, so crypto is imported above) + return crypto.createHash('md5').update(audioBuffer).digest('hex'); + } + + async transcribe(audioFilePath, useCache = true) { + const audioBuffer = fs.readFileSync(audioFilePath); + const cacheKey = this.generateCacheKey(audioBuffer); + + // Check cache + if (useCache && this.transcriptionCache.has(cacheKey)) { + return { + transcription: this.transcriptionCache.get(cacheKey), + cached: true + }; + } + + // Transcribe audio + const base64Audio = audioBuffer.toString('base64'); + + const response = await this.zai.audio.asr.create({ + file_base64: base64Audio + }); + + // Cache result + if (useCache) { + this.transcriptionCache.set(cacheKey, response.text); + } + + return { + transcription: response.text, + cached: false + }; + } + + clearCache() { + this.transcriptionCache.clear(); + } + + getCacheSize() { + return this.transcriptionCache.size; + } +} + +// Usage +const asrService = new ASRService(); +await asrService.initialize(); + +const result1 = await asrService.transcribe('./audio.wav'); +console.log('First call (not cached):', result1); + +const result2 = await asrService.transcribe('./audio.wav'); +console.log('Second call (cached):', result2); +``` + +### Directory Transcription + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +async function transcribeDirectory(directoryPath, outputJsonPath) { + const zai = await ZAI.create(); + + // Get all audio files + const
files = fs.readdirSync(directoryPath); + const audioFiles = files.filter(file => + /\.(wav|mp3|m4a|flac|ogg)$/i.test(file) + ); + + const results = { + directory: directoryPath, + totalFiles: audioFiles.length, + processedAt: new Date().toISOString(), + transcriptions: [] + }; + + for (const filename of audioFiles) { + const filePath = path.join(directoryPath, filename); + + try { + const audioFile = fs.readFileSync(filePath); + const base64Audio = audioFile.toString('base64'); + + const response = await zai.audio.asr.create({ + file_base64: base64Audio + }); + + results.transcriptions.push({ + filename: filename, + success: true, + text: response.text, + wordCount: response.text.split(/\s+/).length + }); + + console.log(`✓ Transcribed: ${filename}`); + } catch (error) { + results.transcriptions.push({ + filename: filename, + success: false, + error: error.message + }); + + console.error(`✗ Failed: ${filename} - ${error.message}`); + } + } + + // Save results to JSON + fs.writeFileSync( + outputJsonPath, + JSON.stringify(results, null, 2) + ); + + return results; +} + +// Usage +const results = await transcribeDirectory( + './audio-recordings', + './transcriptions.json' +); + +console.log(`\nProcessed ${results.totalFiles} files`); +console.log(`Successful: ${results.transcriptions.filter(t => t.success).length}`); +console.log(`Failed: ${results.transcriptions.filter(t => !t.success).length}`); +``` + +## Best Practices + +### 1. Audio Format Handling + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function transcribeAnyFormat(audioFilePath) { + // Supported formats: WAV, MP3, M4A, FLAC, OGG, etc. 
+ const validExtensions = ['.wav', '.mp3', '.m4a', '.flac', '.ogg']; + const ext = audioFilePath.toLowerCase().substring(audioFilePath.lastIndexOf('.')); + + if (!validExtensions.includes(ext)) { + throw new Error(`Unsupported audio format: ${ext}`); + } + + const zai = await ZAI.create(); + const audioFile = fs.readFileSync(audioFilePath); + const base64Audio = audioFile.toString('base64'); + + const response = await zai.audio.asr.create({ + file_base64: base64Audio + }); + + return response.text; +} +``` + +### 2. Error Handling + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function safeTranscribe(audioFilePath) { + try { + // Validate file exists + if (!fs.existsSync(audioFilePath)) { + throw new Error(`File not found: ${audioFilePath}`); + } + + // Check file size (e.g., limit to 100MB) + const stats = fs.statSync(audioFilePath); + const fileSizeMB = stats.size / (1024 * 1024); + + if (fileSizeMB > 100) { + throw new Error(`File too large: ${fileSizeMB.toFixed(2)}MB (max 100MB)`); + } + + // Transcribe + const zai = await ZAI.create(); + const audioFile = fs.readFileSync(audioFilePath); + const base64Audio = audioFile.toString('base64'); + + const response = await zai.audio.asr.create({ + file_base64: base64Audio + }); + + if (!response.text || response.text.trim().length === 0) { + throw new Error('Empty transcription result'); + } + + return { + success: true, + transcription: response.text, + filePath: audioFilePath, + fileSize: stats.size + }; + } catch (error) { + console.error('Transcription error:', error); + return { + success: false, + error: error.message, + filePath: audioFilePath + }; + } +} +``` + +### 3. 
Post-Processing Transcriptions + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +function cleanTranscription(text) { + // Remove filler words first (optional; note that 'like' is also an ordinary word) + const fillers = ['um', 'uh', 'ah', 'like', 'you know']; + const fillerPattern = new RegExp(`\\b(${fillers.join('|')})\\b`, 'gi'); + text = text.replace(fillerPattern, ''); + + // Collapse excessive whitespace, including gaps left by removed fillers + text = text.replace(/\s+/g, ' ').trim(); + + // Capitalize first letter of sentences + text = text.replace(/(^\w|[.!?]\s+\w)/g, match => match.toUpperCase()); + + return text; +} + +async function transcribeAndClean(audioFilePath) { + const zai = await ZAI.create(); + + const audioFile = fs.readFileSync(audioFilePath); + const base64Audio = audioFile.toString('base64'); + + const response = await zai.audio.asr.create({ + file_base64: base64Audio + }); + + return { + raw: response.text, + cleaned: cleanTranscription(response.text) + }; +} +``` + +## Common Use Cases + +1. **Meeting Transcription**: Convert recorded meetings into searchable text +2. **Interview Processing**: Transcribe interviews for analysis and documentation +3. **Podcast Transcription**: Create text versions of podcast episodes +4. **Voice Notes**: Convert voice memos to text for easier reference +5. **Call Center Analytics**: Analyze customer service calls +6. **Accessibility**: Provide text alternatives for audio content +7. **Voice Commands**: Enable voice-controlled applications +8.
**Language Learning**: Transcribe pronunciation practice + +## Integration Examples + +### Express.js API Endpoint + +```javascript +import express from 'express'; +import multer from 'multer'; +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +const app = express(); +const upload = multer({ dest: 'uploads/' }); + +let zaiInstance; + +async function initZAI() { + zaiInstance = await ZAI.create(); +} + +app.post('/api/transcribe', upload.single('audio'), async (req, res) => { + try { + if (!req.file) { + return res.status(400).json({ error: 'No audio file provided' }); + } + + const audioFile = fs.readFileSync(req.file.path); + const base64Audio = audioFile.toString('base64'); + + const response = await zaiInstance.audio.asr.create({ + file_base64: base64Audio + }); + + // Clean up uploaded file + fs.unlinkSync(req.file.path); + + res.json({ + success: true, + transcription: response.text, + wordCount: response.text.split(/\s+/).length + }); + } catch (error) { + // Clean up on error + if (req.file && fs.existsSync(req.file.path)) { + fs.unlinkSync(req.file.path); + } + + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('ASR API running on port 3000'); + }); +}); +``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported in server-side code + +**Issue**: Empty or incorrect transcription +- **Solution**: Verify audio quality and format. Check if audio contains clear speech + +**Issue**: Large file processing fails +- **Solution**: Consider splitting large audio files into smaller segments + +**Issue**: Slow transcription speed +- **Solution**: Implement caching for repeated transcriptions, optimize file sizes + +**Issue**: Memory errors with large files +- **Solution**: Process files in chunks or increase Node.js memory limit + +## Performance Tips + +1. 
**Reuse SDK Instance**: Create once, use multiple times +2. **Implement Caching**: Cache transcriptions for duplicate files +3. **Batch Processing**: Process multiple files efficiently with proper queuing +4. **Audio Optimization**: Compress audio files before processing when possible +5. **Async Operations**: Use Promise.all for parallel processing when appropriate + +## Audio Quality Guidelines + +For best transcription results: +- **Sample Rate**: 16kHz or higher +- **Format**: WAV, MP3, or M4A recommended +- **Noise Level**: Minimize background noise +- **Speech Clarity**: Clear pronunciation and normal speaking pace +- **File Size**: Under 100MB recommended for individual files + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- The SDK is already installed - import as shown in examples +- Audio files must be converted to base64 before processing +- Implement proper error handling for production applications +- Consider audio quality for best transcription accuracy +- Clean up temporary files after processing +- Cache results for frequently transcribed files diff --git a/skills/ASR/scripts/asr.ts b/skills/ASR/scripts/asr.ts new file mode 100755 index 0000000..5a39a39 --- /dev/null +++ b/skills/ASR/scripts/asr.ts @@ -0,0 +1,27 @@ +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +async function main(inputFile: string) { + if (!fs.existsSync(inputFile)) { + console.error(`Audio file not found: ${inputFile}`); + return; + } + + try { + const zai = await ZAI.create(); + + const audioBuffer = fs.readFileSync(inputFile); + const file_base64 = audioBuffer.toString('base64'); + + const result = await zai.audio.asr.create({ file_base64 }); + + console.log('Transcription result:'); + console.log(result.text ?? 
JSON.stringify(result, null, 2)); + } catch (err: any) { + console.error('ASR failed:', err?.message || err); + } +} + +main('./output.wav'); + diff --git a/skills/LLM/LICENSE.txt b/skills/LLM/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/LLM/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/LLM/SKILL.md b/skills/LLM/SKILL.md new file mode 100755 index 0000000..07e7ec0 --- /dev/null +++ b/skills/LLM/SKILL.md @@ -0,0 +1,856 @@ +--- +name: LLM +description: Implement large language model (LLM) chat completions using the z-ai-web-dev-sdk. Use this skill when the user needs to build conversational AI applications, chatbots, AI assistants, or any text generation features. Supports multi-turn conversations, system prompts, and context management. 
+license: MIT +--- + +# LLM (Large Language Model) Skill + +This skill guides the implementation of chat completions functionality using the z-ai-web-dev-sdk package, enabling powerful conversational AI and text generation capabilities. + +## Skills Path + +**Skill Location**: `{project_path}/skills/LLM` + +This skill is located at the path above in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/chat.ts` for a working example. + +## Overview + +The LLM skill allows you to build applications that leverage large language models for natural language understanding and generation, including chatbots, AI assistants, content generation, and more. + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## CLI Usage (For Simple Tasks) + +For simple, one-off chat completions, you can use the z-ai CLI instead of writing code. This is ideal for quick tests, simple queries, or automation scripts. + +### Basic Chat + +```bash +# Simple question +z-ai chat --prompt "What is the capital of France?" + +# Save response to file +z-ai chat -p "Explain quantum computing" -o response.json + +# Stream the response +z-ai chat -p "Write a short poem" --stream +``` + +### With System Prompt + +```bash +# Custom system prompt for specific behavior +z-ai chat \ + --prompt "Review this code: function add(a,b) { return a+b; }" \ + --system "You are an expert code reviewer" \ + -o review.json +``` + +### With Thinking (Chain of Thought) + +```bash +# Enable thinking for complex reasoning +z-ai chat \ + --prompt "Solve this math problem: If a train travels 120km in 2 hours, what's its speed?"
\ + --thinking \ + -o solution.json +``` + +### CLI Parameters + +- `--prompt, -p <text>`: **Required** - User message content +- `--system, -s <text>`: Optional - System prompt for custom behavior +- `--thinking, -t`: Optional - Enable chain-of-thought reasoning (default: disabled) +- `--output, -o <path>`: Optional - Output file path (JSON format) +- `--stream`: Optional - Stream the response in real time + +### When to Use CLI vs SDK + +**Use CLI for:** +- Quick one-off questions +- Simple automation scripts +- Testing prompts +- Single-turn conversations + +**Use SDK for:** +- Multi-turn conversations with context +- Custom conversation management +- Integration with web applications +- Complex chat workflows +- Production applications + +## Basic Chat Completions + +### Simple Question and Answer + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function askQuestion(question) { + const zai = await ZAI.create(); + + const completion = await zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: 'You are a helpful assistant.' + }, + { + role: 'user', + content: question + } + ], + thinking: { type: 'disabled' } + }); + + const response = completion.choices[0]?.message?.content; + return response; +} + +// Usage +const answer = await askQuestion('What is the capital of France?'); +console.log('Answer:', answer); +``` + +### Custom System Prompt + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function customAssistant(systemPrompt, userMessage) { + const zai = await ZAI.create(); + + const completion = await zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: systemPrompt + }, + { + role: 'user', + content: userMessage + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; +} + +// Usage - Code reviewer +const codeReview = await customAssistant( + 'You are an expert code reviewer.
Analyze code for bugs, performance issues, and best practices.', + 'Review this function: function add(a, b) { return a + b; }' +); + +// Usage - Creative writer +const story = await customAssistant( + 'You are a creative fiction writer who writes engaging short stories.', + 'Write a short story about a robot learning to paint.' +); + +console.log(codeReview); +console.log(story); +``` + +## Multi-turn Conversations + +### Conversation History Management + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class ConversationManager { + constructor(systemPrompt = 'You are a helpful assistant.') { + this.messages = [ + { + role: 'assistant', + content: systemPrompt + } + ]; + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async sendMessage(userMessage) { + // Add user message to history + this.messages.push({ + role: 'user', + content: userMessage + }); + + // Get completion + const completion = await this.zai.chat.completions.create({ + messages: this.messages, + thinking: { type: 'disabled' } + }); + + const assistantResponse = completion.choices[0]?.message?.content; + + // Add assistant response to history + this.messages.push({ + role: 'assistant', + content: assistantResponse + }); + + return assistantResponse; + } + + getHistory() { + return this.messages; + } + + clearHistory(systemPrompt = 'You are a helpful assistant.') { + this.messages = [ + { + role: 'assistant', + content: systemPrompt + } + ]; + } + + getMessageCount() { + // Subtract 1 for system message + return this.messages.length - 1; + } +} + +// Usage +const conversation = new ConversationManager(); +await conversation.initialize(); + +const response1 = await conversation.sendMessage('Hi, my name is John.'); +console.log('AI:', response1); + +const response2 = await conversation.sendMessage('What is my name?'); +console.log('AI:', response2); // Should remember the name is John + +console.log('Total messages:', conversation.getMessageCount()); +``` + 
+### Context-Aware Conversations + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class ContextualChat { + constructor() { + this.messages = []; + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async startConversation(role, context) { + // Set up system prompt with context + const systemPrompt = `You are ${role}. Context: ${context}`; + + this.messages = [ + { + role: 'assistant', + content: systemPrompt + } + ]; + } + + async chat(userMessage) { + this.messages.push({ + role: 'user', + content: userMessage + }); + + const completion = await this.zai.chat.completions.create({ + messages: this.messages, + thinking: { type: 'disabled' } + }); + + const response = completion.choices[0]?.message?.content; + + this.messages.push({ + role: 'assistant', + content: response + }); + + return response; + } +} + +// Usage - Customer support scenario +const support = new ContextualChat(); +await support.initialize(); + +await support.startConversation( + 'a customer support agent for TechCorp', + 'The user has ordered product #12345 which is delayed due to shipping issues.' +); + +const reply1 = await support.chat('Where is my order?'); +console.log('Support:', reply1); + +const reply2 = await support.chat('Can I get a refund?'); +console.log('Support:', reply2); +``` + +## Advanced Use Cases + +### Content Generation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class ContentGenerator { + constructor() { + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async generateBlogPost(topic, tone = 'professional') { + const completion = await this.zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: `You are a professional content writer. Write in a ${tone} tone.` + }, + { + role: 'user', + content: `Write a blog post about: ${topic}. 
Include an introduction, main points, and conclusion.` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; + } + + async generateProductDescription(productName, features) { + const completion = await this.zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: 'You are an expert at writing compelling product descriptions for e-commerce.' + }, + { + role: 'user', + content: `Write a product description for "${productName}". Key features: ${features.join(', ')}.` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; + } + + async generateEmailResponse(originalEmail, intent) { + const completion = await this.zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: 'You are a professional email writer. Write clear, concise, and polite emails.' + }, + { + role: 'user', + content: `Original email: "${originalEmail}"\n\nWrite a ${intent} response.` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; + } +} + +// Usage +const generator = new ContentGenerator(); +await generator.initialize(); + +const blogPost = await generator.generateBlogPost( + 'The Future of Artificial Intelligence', + 'informative' +); +console.log('Blog Post:', blogPost); + +const productDesc = await generator.generateProductDescription( + 'Smart Watch Pro', + ['Heart rate monitoring', 'GPS tracking', 'Waterproof', '7-day battery life'] +); +console.log('Product Description:', productDesc); +``` + +### Data Analysis and Summarization + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function analyzeData(data, analysisType) { + const zai = await ZAI.create(); + + const prompts = { + summarize: 'You are a data analyst. Summarize the key insights from the data.', + trend: 'You are a data analyst. Identify trends and patterns in the data.', + recommendation: 'You are a business analyst. 
Provide actionable recommendations based on the data.' + }; + + const completion = await zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: prompts[analysisType] || prompts.summarize + }, + { + role: 'user', + content: `Analyze this data:\n\n${JSON.stringify(data, null, 2)}` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; +} + +// Usage +const salesData = { + Q1: { revenue: 100000, customers: 250 }, + Q2: { revenue: 120000, customers: 280 }, + Q3: { revenue: 150000, customers: 320 }, + Q4: { revenue: 180000, customers: 380 } +}; + +const summary = await analyzeData(salesData, 'summarize'); +const trends = await analyzeData(salesData, 'trend'); +const recommendations = await analyzeData(salesData, 'recommendation'); + +console.log('Summary:', summary); +console.log('Trends:', trends); +console.log('Recommendations:', recommendations); +``` + +### Code Generation and Debugging + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class CodeAssistant { + constructor() { + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async generateCode(description, language) { + const completion = await this.zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: `You are an expert ${language} programmer. Write clean, efficient, and well-commented code.` + }, + { + role: 'user', + content: `Write ${language} code to: ${description}` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; + } + + async debugCode(code, issue) { + const completion = await this.zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: 'You are an expert debugger. Identify bugs and suggest fixes.' 
+ }, + { + role: 'user', + content: `Code:\n${code}\n\nIssue: ${issue}\n\nFind the bug and suggest a fix.` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; + } + + async explainCode(code) { + const completion = await this.zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: 'You are a programming teacher. Explain code clearly and simply.' + }, + { + role: 'user', + content: `Explain what this code does:\n\n${code}` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; + } +} + +// Usage +const codeAssist = new CodeAssistant(); +await codeAssist.initialize(); + +const newCode = await codeAssist.generateCode( + 'Create a function that sorts an array of objects by a specific property', + 'JavaScript' +); +console.log('Generated Code:', newCode); + +const bugFix = await codeAssist.debugCode( + 'function add(a, b) { return a - b; }', + 'This function should add numbers but returns wrong results' +); +console.log('Debug Suggestion:', bugFix); +``` + +## Best Practices + +### 1. Prompt Engineering + +```javascript +// Bad: Vague prompt +const bad = await askQuestion('Tell me about AI'); + +// Good: Specific and structured prompt +async function askWithContext(topic, format, audience) { + const zai = await ZAI.create(); + + const completion = await zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: `You are an expert educator. Explain topics clearly for ${audience}.` + }, + { + role: 'user', + content: `Explain ${topic} in ${format} format. Include practical examples.` + } + ], + thinking: { type: 'disabled' } + }); + + return completion.choices[0]?.message?.content; +} + +const good = await askWithContext('artificial intelligence', 'bullet points', 'beginners'); +``` + +### 2. 
Error Handling + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function safeCompletion(messages, retries = 3) { + let lastError; + + for (let attempt = 1; attempt <= retries; attempt++) { + try { + const zai = await ZAI.create(); + + const completion = await zai.chat.completions.create({ + messages: messages, + thinking: { type: 'disabled' } + }); + + const response = completion.choices[0]?.message?.content; + + if (!response || response.trim().length === 0) { + throw new Error('Empty response from AI'); + } + + return { + success: true, + content: response, + attempts: attempt + }; + } catch (error) { + lastError = error; + console.error(`Attempt ${attempt} failed:`, error.message); + + if (attempt < retries) { + // Wait before retry (exponential backoff: 1s, 2s, 4s, ...) + await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1))); + } + } + } + + return { + success: false, + error: lastError.message, + attempts: retries + }; +} +``` + +### 3. Context Management + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class ManagedConversation { + constructor(maxMessages = 20) { + this.maxMessages = maxMessages; + this.systemPrompt = ''; + this.messages = []; + this.zai = null; + } + + async initialize(systemPrompt) { + this.zai = await ZAI.create(); + this.systemPrompt = systemPrompt; + this.messages = [ + { + role: 'assistant', + content: systemPrompt + } + ]; + } + + async chat(userMessage) { + // Add user message + this.messages.push({ + role: 'user', + content: userMessage + }); + + // Trim old messages if exceeding limit (keep system prompt) + if (this.messages.length > this.maxMessages) { + this.messages = [ + this.messages[0], // Keep system prompt + ...this.messages.slice(-(this.maxMessages - 1)) + ]; + } + + const completion = await this.zai.chat.completions.create({ + messages: this.messages, + thinking: { type: 'disabled' } + }); + + const response = completion.choices[0]?.message?.content; + + this.messages.push({ + role: 'assistant', + content: response + }); + + return response; +
} + + getTokenEstimate() { + // Rough estimate: ~4 characters per token (treat missing content as empty) + const totalChars = this.messages + .map(m => (m.content || '').length) + .reduce((a, b) => a + b, 0); + return Math.ceil(totalChars / 4); + } +} +``` + +### 4. Response Processing + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function getStructuredResponse(query, format = 'json') { + const zai = await ZAI.create(); + + const formatInstructions = { + json: 'Respond with valid JSON only. No additional text.', + list: 'Respond with a numbered list.', + markdown: 'Respond in Markdown format.' + }; + + const completion = await zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: `You are a helpful assistant. ${formatInstructions[format]}` + }, + { + role: 'user', + content: query + } + ], + thinking: { type: 'disabled' } + }); + + const response = completion.choices[0]?.message?.content; + + // Parse JSON if requested + if (format === 'json') { + try { + return JSON.parse(response); + } catch (e) { + console.error('Failed to parse JSON response'); + return { raw: response }; + } + } + + return response; +} + +// Usage +const jsonData = await getStructuredResponse( + 'List three programming languages with their primary use cases', + 'json' +); +console.log(jsonData); +``` + +## Common Use Cases + +1. **Chatbots & Virtual Assistants**: Build conversational interfaces for customer support +2. **Content Generation**: Create articles, product descriptions, marketing copy +3. **Code Assistance**: Generate, explain, and debug code +4. **Data Analysis**: Analyze and summarize complex data sets +5. **Language Translation**: Translate text between languages +6. **Educational Tools**: Create tutoring and learning applications +7. **Email Automation**: Generate professional email responses +8.
**Creative Writing**: Story generation, poetry, and creative content + +## Integration Examples + +### Express.js Chatbot API + +```javascript +import express from 'express'; +import ZAI from 'z-ai-web-dev-sdk'; + +const app = express(); +app.use(express.json()); + +// Store conversations in memory (use database in production) +const conversations = new Map(); + +let zaiInstance; + +async function initZAI() { + zaiInstance = await ZAI.create(); +} + +app.post('/api/chat', async (req, res) => { + try { + const { sessionId, message, systemPrompt } = req.body; + + if (!message) { + return res.status(400).json({ error: 'Message is required' }); + } + + // Get or create conversation history + let history = conversations.get(sessionId) || [ + { + role: 'assistant', + content: systemPrompt || 'You are a helpful assistant.' + } + ]; + + // Add user message + history.push({ + role: 'user', + content: message + }); + + // Get completion + const completion = await zaiInstance.chat.completions.create({ + messages: history, + thinking: { type: 'disabled' } + }); + + const aiResponse = completion.choices[0]?.message?.content; + + // Add AI response to history + history.push({ + role: 'assistant', + content: aiResponse + }); + + // Save updated history + conversations.set(sessionId, history); + + res.json({ + success: true, + response: aiResponse, + messageCount: history.length - 1 + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +app.delete('/api/chat/:sessionId', (req, res) => { + const { sessionId } = req.params; + conversations.delete(sessionId); + res.json({ success: true, message: 'Conversation cleared' }); +}); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('Chatbot API running on port 3000'); + }); +}); +``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code + +**Issue**: Empty or incomplete 
responses +- **Solution**: Check that completion.choices[0]?.message?.content exists and is not empty + +**Issue**: Conversation context getting too long +- **Solution**: Implement message trimming to keep only recent messages + +**Issue**: Inconsistent responses +- **Solution**: Use more specific system prompts and provide clear instructions + +**Issue**: Rate limiting errors +- **Solution**: Implement retry logic with exponential backoff + +## Performance Tips + +1. **Reuse SDK Instance**: Create ZAI instance once and reuse across requests +2. **Manage Context Length**: Trim old messages to avoid token limits +3. **Implement Caching**: Cache responses for common queries +4. **Use Specific Prompts**: Clear prompts lead to faster, better responses +5. **Handle Errors Gracefully**: Implement retry logic and fallback responses + +## Security Considerations + +1. **Input Validation**: Always validate and sanitize user input +2. **Rate Limiting**: Implement rate limits to prevent abuse +3. **API Key Protection**: Never expose SDK credentials in client-side code +4. **Content Filtering**: Filter sensitive or inappropriate content +5. 
**Session Management**: Implement proper session handling and cleanup + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- The SDK is already installed - import as shown in examples +- Use the 'assistant' role for system prompts +- Set thinking to { type: 'disabled' } for standard completions +- Implement proper error handling and retries for production +- Manage conversation history to avoid token limits +- Clear and specific prompts lead to better results +- Check `scripts/chat.ts` for a quick start example diff --git a/skills/LLM/scripts/chat.ts b/skills/LLM/scripts/chat.ts new file mode 100755 index 0000000..046fd59 --- /dev/null +++ b/skills/LLM/scripts/chat.ts @@ -0,0 +1,32 @@ +import ZAI, { ChatMessage } from "z-ai-web-dev-sdk"; + +async function main(prompt: string) { + try { + const zai = await ZAI.create(); + + const messages: ChatMessage[] = [ + { + role: "assistant", + content: "Hi, I'm a helpful assistant." + }, + { + role: "user", + content: prompt, + }, + ]; + + const response = await zai.chat.completions.create({ + messages, + stream: false, + thinking: { type: "disabled" }, + }); + + const reply = response.choices?.[0]?.message?.content; + console.log("Chat reply:"); + console.log(reply ?? 
JSON.stringify(response, null, 2)); + } catch (err: any) { + console.error("Chat failed:", err?.message || err); + } +} + +main('What is the capital of France?'); diff --git a/skills/TTS/LICENSE.txt b/skills/TTS/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/TTS/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/TTS/SKILL.md b/skills/TTS/SKILL.md new file mode 100755 index 0000000..d92d225 --- /dev/null +++ b/skills/TTS/SKILL.md @@ -0,0 +1,735 @@ +--- +name: TTS +description: Implement text-to-speech (TTS) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to convert text into natural-sounding speech, create audio content, build voice-enabled applications, or generate spoken audio files. Supports multiple voices, adjustable speed, and various audio formats. 
+license: MIT +--- + +# TTS (Text to Speech) Skill + +This skill guides the implementation of text-to-speech (TTS) functionality using the z-ai-web-dev-sdk package, enabling conversion of text into natural-sounding speech audio. + +## Skills Path + +**Skill Location**: `{project_path}/skills/TTS` + +This skill is located at the above path in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/tts.ts` for a working example. + +## Overview + +Text-to-Speech allows you to build applications that generate spoken audio from text input, supporting various voices, speeds, and output formats for diverse use cases. + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## API Limitations and Constraints + +Before implementing TTS functionality, be aware of these important limitations: + +### Input Text Constraints +- **Maximum length**: 1024 characters per request +- Text exceeding this limit must be split into smaller chunks + +### Audio Parameters +- **Speed range**: 0.5 to 2.0 + - 0.5 = half speed (slower) + - 1.0 = normal speed (default) + - 2.0 = double speed (faster) +- **Volume range**: Greater than 0, up to 10 + - Default: 1.0 + - Values must be greater than 0 (exclusive) and up to 10 (inclusive) + +### Format and Streaming +- **Streaming limitation**: When `stream: true` is enabled, only `pcm` format is supported +- **Non-streaming**: Supports `wav`, `pcm`, and `mp3` formats +- **Sample rate**: 24000 Hz (recommended) + +### Best Practice for Long Text +```javascript +function splitTextIntoChunks(text, maxLength = 1000) { + const chunks = []; + // Split into sentence-like runs; a final run without closing punctuation is kept as well + const sentences = text.match(/[^.!?]+(?:[.!?]+|$)/g) || [text]; + + let currentChunk = ''; + for (const sentence of sentences) { + if ((currentChunk + sentence).length <= maxLength) { + currentChunk += sentence; + } else { + if (currentChunk)
chunks.push(currentChunk.trim()); + currentChunk = sentence; + } + } + if (currentChunk) chunks.push(currentChunk.trim()); + + return chunks; +} +``` + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## CLI Usage (For Simple Tasks) + +For simple text-to-speech conversions, you can use the z-ai CLI instead of writing code. This is ideal for quick audio generation, testing voices, or simple automation. + +### Basic TTS + +```bash +# Convert text to speech (default WAV format) +z-ai tts --input "Hello, world" --output ./hello.wav + +# Using short options +z-ai tts -i "Hello, world" -o ./hello.wav +``` + +### Different Voices and Speed + +```bash +# Use specific voice +z-ai tts -i "Welcome to our service" -o ./welcome.wav --voice tongtong + +# Adjust speech speed (0.5-2.0) +z-ai tts -i "This is faster speech" -o ./fast.wav --speed 1.5 + +# Slower speech +z-ai tts -i "This is slower speech" -o ./slow.wav --speed 0.8 +``` + +### Different Output Formats + +```bash +# MP3 format +z-ai tts -i "Hello World" -o ./hello.mp3 --format mp3 + +# WAV format (default) +z-ai tts -i "Hello World" -o ./hello.wav --format wav + +# PCM format +z-ai tts -i "Hello World" -o ./hello.pcm --format pcm +``` + +### Streaming Output + +```bash +# Stream audio generation +z-ai tts -i "This is a longer text that will be streamed" -o ./stream.wav --stream +``` + +### CLI Parameters + +- `--input, -i `: **Required** - Text to convert to speech (max 1024 characters) +- `--output, -o `: **Required** - Output audio file path +- `--voice, -v `: Optional - Voice type (default: tongtong) +- `--speed, -s `: Optional - Speech speed, 0.5-2.0 (default: 1.0) +- `--format, -f `: Optional - Output format: wav, mp3, pcm (default: wav) +- `--stream`: Optional - Enable streaming output (only supports pcm format) + +### When to Use CLI vs SDK + +**Use CLI for:** +- Quick text-to-speech conversions +- Testing different voices and speeds +- Simple 
batch audio generation +- Command-line automation scripts + +**Use SDK for:** +- Dynamic audio generation in applications +- Integration with web services +- Custom audio processing pipelines +- Production applications with complex requirements + +## Basic TTS Implementation + +### Simple Text to Speech + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function textToSpeech(text, outputPath) { + const zai = await ZAI.create(); + + const response = await zai.audio.tts.create({ + input: text, + voice: 'tongtong', + speed: 1.0, + response_format: 'wav', + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + fs.writeFileSync(outputPath, buffer); + console.log(`Audio saved to ${outputPath}`); + return outputPath; +} + +// Usage +await textToSpeech('Hello, world!', './output.wav'); +``` + +### Multiple Voice Options + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function generateWithVoice(text, voice, outputPath) { + const zai = await ZAI.create(); + + const response = await zai.audio.tts.create({ + input: text, + voice: voice, // Available voices: tongtong, chuichui, xiaochen, jam, kazi, douji, luodo + speed: 1.0, + response_format: 'wav', + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + fs.writeFileSync(outputPath, buffer); + return outputPath; +} + +// Usage +await generateWithVoice('Welcome to our service', 'tongtong', './welcome.wav'); +``` + +### Adjustable Speed + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function generateWithSpeed(text, speed, outputPath) { + const zai = await ZAI.create(); + + // Speed range: 0.5 to 2.0 (API constraint) + // 0.5 = half speed (slower) + // 1.0 = normal speed 
(default) + // 2.0 = double speed (faster) + // Values outside this range will cause API errors + + const response = await zai.audio.tts.create({ + input: text, + voice: 'tongtong', + speed: speed, + response_format: 'wav', + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + fs.writeFileSync(outputPath, buffer); + return outputPath; +} + +// Usage - slower narration +await generateWithSpeed('This is an important announcement', 0.8, './slow.wav'); + +// Usage - faster narration +await generateWithSpeed('Quick update', 1.3, './fast.wav'); +``` + +### Adjustable Volume + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function generateWithVolume(text, volume, outputPath) { + const zai = await ZAI.create(); + + // Volume range: greater than 0, up to 10 (API constraint) + // Values must be > 0 (exclusive) and <= 10 (inclusive) + // Default: 1.0 (normal volume) + + const response = await zai.audio.tts.create({ + input: text, + voice: 'tongtong', + speed: 1.0, + volume: volume, // Optional parameter + response_format: 'wav', + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + fs.writeFileSync(outputPath, buffer); + return outputPath; +} + +// Usage - louder audio +await generateWithVolume('This is an announcement', 5.0, './loud.wav'); + +// Usage - quieter audio +await generateWithVolume('Whispered message', 0.5, './quiet.wav'); +``` + +## Advanced Use Cases + +### Batch Processing + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +async function batchTextToSpeech(textArray, outputDir) { + const zai = await ZAI.create(); + const results = []; + + // Ensure output directory exists + if (!fs.existsSync(outputDir)) { + 
fs.mkdirSync(outputDir, { recursive: true }); + } + + for (let i = 0; i < textArray.length; i++) { + try { + const text = textArray[i]; + const outputPath = path.join(outputDir, `audio_${i + 1}.wav`); + + const response = await zai.audio.tts.create({ + input: text, + voice: 'tongtong', + speed: 1.0, + response_format: 'wav', + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + fs.writeFileSync(outputPath, buffer); + results.push({ + success: true, + text, + path: outputPath + }); + } catch (error) { + results.push({ + success: false, + text: textArray[i], + error: error.message + }); + } + } + + return results; +} + +// Usage +const texts = [ + 'Welcome to chapter one', + 'Welcome to chapter two', + 'Welcome to chapter three' +]; + +const results = await batchTextToSpeech(texts, './audio-output'); +console.log('Generated:', results.length, 'audio files'); +``` + +### Dynamic Content Generation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +class TTSGenerator { + constructor() { + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async generateAudio(text, options = {}) { + const { + voice = 'tongtong', + speed = 1.0, + format = 'wav' + } = options; + + const response = await this.zai.audio.tts.create({ + input: text, + voice: voice, + speed: speed, + response_format: format, + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + return Buffer.from(new Uint8Array(arrayBuffer)); + } + + async saveAudio(text, outputPath, options = {}) { + const buffer = await this.generateAudio(text, options); + if (buffer) { + fs.writeFileSync(outputPath, buffer); + return outputPath; + } + return null; + } +} + +// Usage +const generator = new TTSGenerator(); +await generator.initialize(); + +await generator.saveAudio( + 
'Hello, this is a test', + './output.wav', + { speed: 1.2 } +); +``` + +### Next.js API Route Example + +```typescript +import { NextRequest, NextResponse } from 'next/server'; + +export async function POST(req: NextRequest) { + try { + const { text, voice = 'tongtong', speed = 1.0 } = await req.json(); + + // Reject missing or empty text before calling the SDK + if (typeof text !== 'string' || text.trim().length === 0) { + return NextResponse.json({ error: 'Text is required' }, { status: 400 }); + } + + // Import ZAI SDK + const ZAI = (await import('z-ai-web-dev-sdk')).default; + + // Create SDK instance + const zai = await ZAI.create(); + + // Generate TTS audio + const response = await zai.audio.tts.create({ + input: text.trim(), + voice: voice, + speed: speed, + response_format: 'wav', + stream: false, + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + // Return audio as response + return new NextResponse(buffer, { + status: 200, + headers: { + 'Content-Type': 'audio/wav', + 'Content-Length': buffer.length.toString(), + 'Cache-Control': 'no-cache', + }, + }); + } catch (error) { + console.error('TTS API Error:', error); + + return NextResponse.json( + { + error: error instanceof Error ? error.message : 'Failed to generate speech, please try again later', + }, + { status: 500 } + ); + } +} +``` + +## Best Practices + +### 1. Text Preparation +```javascript +function prepareTextForTTS(text) { + // Remove excessive whitespace + text = text.replace(/\s+/g, ' ').trim(); + + // Expand common abbreviations for better pronunciation + const abbreviations = { + 'Dr.': 'Doctor', + 'Mr.': 'Mister', + 'Mrs.': 'Misses', + 'etc.': 'et cetera' + }; + + for (const [abbr, full] of Object.entries(abbreviations)) { + // Escape regex metacharacters so the '.' in 'Mr.' matches only a literal period (not the 's' in 'Mrs.') + const escaped = abbr.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); + text = text.replace(new RegExp(`\\b${escaped}`, 'g'), full); + } + + return text; +} +``` + +### 2.
Error Handling +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function safeTTS(text, outputPath) { + try { + // Validate input + if (!text || text.trim().length === 0) { + throw new Error('Text input cannot be empty'); + } + + if (text.length > 1024) { + throw new Error('Text input exceeds maximum length of 1024 characters'); + } + + const zai = await ZAI.create(); + + const response = await zai.audio.tts.create({ + input: text, + voice: 'tongtong', + speed: 1.0, + response_format: 'wav', + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + fs.writeFileSync(outputPath, buffer); + + return { + success: true, + path: outputPath, + size: buffer.length + }; + } catch (error) { + console.error('TTS Error:', error); + return { + success: false, + error: error.message + }; + } +} +``` + +### 3. SDK Instance Reuse + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +// Create a singleton instance +let zaiInstance = null; + +async function getZAIInstance() { + if (!zaiInstance) { + zaiInstance = await ZAI.create(); + } + return zaiInstance; +} + +// Usage +const zai = await getZAIInstance(); +const response = await zai.audio.tts.create({ ... }); +``` + +## Common Use Cases + +1. **Audiobooks & Podcasts**: Convert written content to audio format +2. **E-learning**: Create narration for educational content +3. **Accessibility**: Provide audio versions of text content +4. **Voice Assistants**: Generate dynamic responses +5. **Announcements**: Create automated audio notifications +6. **IVR Systems**: Generate phone system prompts +7. 
**Content Localization**: Create audio in different languages + +## Integration Examples + +### Express.js API Endpoint + +```javascript +import express from 'express'; +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +const app = express(); +app.use(express.json()); + +let zaiInstance; +const outputDir = './audio-output'; + +async function initZAI() { + zaiInstance = await ZAI.create(); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } +} + +app.post('/api/tts', async (req, res) => { + try { + const { text, voice = 'tongtong', speed = 1.0 } = req.body; + + if (!text) { + return res.status(400).json({ error: 'Text is required' }); + } + + const filename = `tts_${Date.now()}.wav`; + const outputPath = path.join(outputDir, filename); + + const response = await zaiInstance.audio.tts.create({ + input: text, + voice: voice, + speed: speed, + response_format: 'wav', + stream: false + }); + + // Get array buffer from Response object + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + + fs.writeFileSync(outputPath, buffer); + + res.json({ + success: true, + audioUrl: `/audio/${filename}`, + size: buffer.length + }); + } catch (error) { + res.status(500).json({ error: error.message }); + } +}); + +app.use('/audio', express.static('audio-output')); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('TTS API running on port 3000'); + }); +}); +``` + +## Troubleshooting + +**Issue**: "Input text exceeds maximum length" +- **Solution**: Text input is limited to 1024 characters. Split longer text into chunks using the `splitTextIntoChunks` function shown in the API Limitations section + +**Issue**: "Invalid speed parameter" or unexpected speed behavior +- **Solution**: Speed must be between 0.5 and 2.0. 
Check your speed value is within this range + +**Issue**: "Invalid volume parameter" +- **Solution**: Volume must be greater than 0 and up to 10. Ensure volume value is in range (0, 10] + +**Issue**: "Stream format not supported" with WAV/MP3 +- **Solution**: Streaming mode only supports PCM format. Either use `response_format: 'pcm'` with streaming, or disable streaming (`stream: false`) for WAV/MP3 output + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported in server-side code + +**Issue**: "TypeError: response.audio is undefined" +- **Solution**: The SDK returns a standard Response object, use `await response.arrayBuffer()` instead of accessing `response.audio` + +**Issue**: Generated audio file is empty or corrupted +- **Solution**: Ensure you're calling `await response.arrayBuffer()` and properly converting to Buffer: `Buffer.from(new Uint8Array(arrayBuffer))` + +**Issue**: Audio sounds unnatural +- **Solution**: Prepare text properly (remove special characters, expand abbreviations) + +**Issue**: Long processing times +- **Solution**: Break long text into smaller chunks and process in parallel + +**Issue**: Next.js caching old API route +- **Solution**: Create a new API route endpoint or restart the dev server + +## Performance Tips + +1. **Reuse SDK Instance**: Create ZAI instance once and reuse +2. **Implement Caching**: Cache generated audio for repeated text +3. **Batch Processing**: Process multiple texts efficiently +4. **Optimize Text**: Remove unnecessary content before generation +5. **Async Processing**: Use queues for handling multiple requests + +## Important Notes + +### API Constraints + +**Input Text Length**: Maximum 1024 characters per request. 
For longer text: +```javascript +// Split long text into chunks +const longText = "..."; // Your long text here +const chunks = splitTextIntoChunks(longText, 1000); + +for (const chunk of chunks) { + const response = await zai.audio.tts.create({ + input: chunk, + voice: 'tongtong', + speed: 1.0, + response_format: 'wav', + stream: false + }); + // Process each chunk... +} +``` + +**Streaming Format Limitation**: When using `stream: true`, only `pcm` format is supported. For `wav` or `mp3` output, use `stream: false`. + +**Sample Rate**: Audio is generated at a 24000 Hz sample rate (recommended setting for playback). + +### Response Object Format + +The `zai.audio.tts.create()` method returns a standard **Response** object (not a custom object with an `audio` property). Always use: + +```javascript +// ✅ CORRECT +const response = await zai.audio.tts.create({ ... }); +const arrayBuffer = await response.arrayBuffer(); +const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + +// ❌ WRONG - This will not work +const response = await zai.audio.tts.create({ ... }); +const buffer = Buffer.from(response.audio); // response.audio is undefined +``` + +### Available Voices + +- `tongtong` - Warm and friendly +- `chuichui` - Lively and cute +- `xiaochen` - Calm and professional +- `jam` - British-accented gentleman +- `kazi` - Clear and standard +- `douji` - Natural and fluent +- `luodo` - Expressive and engaging + +### Speed Range + +- Minimum: `0.5` (half speed) +- Default: `1.0` (normal speed) +- Maximum: `2.0` (double speed) + +**Important**: Speed values outside the range [0.5, 2.0] will result in API errors. + +### Volume Range + +- Minimum: Greater than `0` (exclusive) +- Default: `1.0` (normal volume) +- Maximum: `10` (inclusive) + +**Note**: The volume parameter is optional. When it is omitted, it defaults to 1.0.
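The speed, volume, format, and streaming limits above can also be checked locally, so that a bad value fails fast with a readable message instead of surfacing as an API error. The sketch below assumes the limits are exactly as documented in this skill; `validateTtsOptions` and `TTS_LIMITS` are hypothetical names, not part of z-ai-web-dev-sdk, and the voice allow-list would need updating if the service adds voices.

```javascript
// Limits as documented in this skill (the API itself remains the final authority).
const TTS_LIMITS = {
  maxInputLength: 1024,
  minSpeed: 0.5,
  maxSpeed: 2.0,
  maxVolume: 10, // volume must be > 0 and <= 10
  formats: ['wav', 'pcm', 'mp3'],
  voices: ['tongtong', 'chuichui', 'xiaochen', 'jam', 'kazi', 'douji', 'luodo'],
};

// Hypothetical helper (not part of z-ai-web-dev-sdk): returns a list of problems,
// or an empty array when the options satisfy every documented constraint.
function validateTtsOptions({
  input,
  voice = 'tongtong',
  speed = 1.0,
  volume, // optional; the documented default is 1.0
  response_format = 'wav',
  stream = false,
}) {
  const errors = [];

  if (typeof input !== 'string' || input.trim().length === 0) {
    errors.push('input must be a non-empty string');
  } else if (input.length > TTS_LIMITS.maxInputLength) {
    errors.push(`input exceeds ${TTS_LIMITS.maxInputLength} characters (got ${input.length})`);
  }

  if (!TTS_LIMITS.voices.includes(voice)) {
    errors.push(`unknown voice "${voice}"`);
  }

  // Number.isFinite also rejects NaN and non-number values
  if (!Number.isFinite(speed) || speed < TTS_LIMITS.minSpeed || speed > TTS_LIMITS.maxSpeed) {
    errors.push('speed must be between 0.5 and 2.0');
  }

  if (volume !== undefined && (!Number.isFinite(volume) || volume <= 0 || volume > TTS_LIMITS.maxVolume)) {
    errors.push('volume must be in the range (0, 10]');
  }

  if (!TTS_LIMITS.formats.includes(response_format)) {
    errors.push(`unsupported response_format "${response_format}"`);
  } else if (stream && response_format !== 'pcm') {
    errors.push('streaming only supports the pcm format');
  }

  return errors;
}

// Usage: validate before calling zai.audio.tts.create(...)
console.log(validateTtsOptions({ input: 'Hello, world!' })); // []
// Reports two problems: speed out of range, and wav requested with streaming
console.log(validateTtsOptions({ input: 'Hi', speed: 3, response_format: 'wav', stream: true }));
```

Returning a list rather than throwing on the first problem lets an API route report every invalid field in a single 400 response.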
+ +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- **Input text is limited to 1024 characters maximum** - split longer text into chunks +- **Speed must be between 0.5 and 2.0** - values outside this range will cause errors +- **Volume must be greater than 0 and up to 10** - optional parameter with default 1.0 +- **Streaming only supports PCM format** - use non-streaming for WAV or MP3 output +- The SDK returns a standard Response object - use `await response.arrayBuffer()` +- Convert ArrayBuffer to Buffer using `Buffer.from(new Uint8Array(arrayBuffer))` +- Handle audio buffers properly when saving to files +- Implement error handling for production applications +- Consider caching for frequently generated content +- Clean up old audio files periodically to manage storage diff --git a/skills/TTS/tts.ts b/skills/TTS/tts.ts new file mode 100755 index 0000000..14f6de7 --- /dev/null +++ b/skills/TTS/tts.ts @@ -0,0 +1,25 @@ +import ZAI from "z-ai-web-dev-sdk"; +import fs from "fs"; + +async function main(text: string, outFile: string) { + try { + const zai = await ZAI.create(); + + const response = await zai.audio.tts.create({ + input: text, + voice: "tongtong", + speed: 1.0, + response_format: "wav", + stream: false, + }); + + const arrayBuffer = await response.arrayBuffer(); + const buffer = Buffer.from(new Uint8Array(arrayBuffer)); + fs.writeFileSync(outFile, buffer); + console.log(`TTS audio saved to ${outFile}`); + } catch (err: any) { + console.error("TTS failed:", err?.message || err); + } +} + +main("Hello, world!", "./output.wav"); diff --git a/skills/VLM/LICENSE.txt b/skills/VLM/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/VLM/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without 
restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/VLM/SKILL.md b/skills/VLM/SKILL.md new file mode 100755 index 0000000..67b995b --- /dev/null +++ b/skills/VLM/SKILL.md @@ -0,0 +1,588 @@ +--- +name: VLM +description: Implement vision-based AI chat capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to analyze images, describe visual content, or create applications that combine image understanding with conversational AI. Supports image URLs and base64-encoded images for multimodal interactions. +license: MIT +--- + +# VLM (Vision Chat) Skill + +This skill guides the implementation of vision chat functionality using the z-ai-web-dev-sdk package, enabling AI models to understand and respond to images combined with text prompts. + +## Skills Path + +**Skill Location**: `{project_path}/skills/VLM` + +This skill is located at the above path in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/vlm.ts` for a working example.
+ +## Overview + +Vision Chat allows you to build applications that can analyze images, extract information from visual content, and answer questions about images through natural language conversation. + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## CLI Usage (For Simple Tasks) + +For simple image analysis tasks, you can use the z-ai CLI instead of writing code. This is ideal for quick image descriptions, testing vision capabilities, or simple automation. + +### Basic Image Analysis + +```bash +# Describe an image from URL +z-ai vision --prompt "What's in this image?" --image "https://example.com/photo.jpg" + +# Using short options +z-ai vision -p "Describe this image" -i "https://example.com/image.png" +``` + +### Analyze Local Images + +```bash +# Analyze a local image file +z-ai vision -p "What objects are in this photo?" -i "./photo.jpg" + +# Save response to file +z-ai vision -p "Describe the scene" -i "./landscape.png" -o description.json +``` + +### Multiple Images + +```bash +# Analyze multiple images at once +z-ai vision \ + -p "Compare these two images" \ + -i "./photo1.jpg" \ + -i "./photo2.jpg" \ + -o comparison.json + +# Multiple images with detailed analysis +z-ai vision \ + --prompt "What are the differences between these images?" 
\ + --image "https://example.com/before.jpg" \ + --image "https://example.com/after.jpg" +``` + +### With Thinking (Chain of Thought) + +```bash +# Enable thinking for complex visual reasoning +z-ai vision \ + -p "Count the number of people in this image and describe their activities" \ + -i "./crowd.jpg" \ + --thinking \ + -o analysis.json +``` + +### Streaming Output + +```bash +# Stream the vision analysis +z-ai vision -p "Describe this image in detail" -i "./photo.jpg" --stream +``` + +### CLI Parameters + +- `--prompt, -p `: **Required** - Question or instruction about the image(s) +- `--image, -i `: Optional - Image URL or local file path (can be used multiple times) +- `--thinking, -t`: Optional - Enable chain-of-thought reasoning (default: disabled) +- `--output, -o `: Optional - Output file path (JSON format) +- `--stream`: Optional - Stream the response in real-time + +### Supported Image Formats + +- PNG (.png) +- JPEG (.jpg, .jpeg) +- GIF (.gif) +- WebP (.webp) +- BMP (.bmp) + +### When to Use CLI vs SDK + +**Use CLI for:** +- Quick image analysis +- Testing vision model capabilities +- One-off image descriptions +- Simple automation scripts + +**Use SDK for:** +- Multi-turn conversations with images +- Dynamic image analysis in applications +- Batch processing with custom logic +- Production applications with complex workflows + +## Recommended Approach + +For better performance and reliability, use base64 encoding to pass images to the model instead of image URLs. + +## Supported Content Types + +The Vision Chat API supports three types of media content: + +### 1. **image_url** - For Image Files +Use this type for static images (PNG, JPEG, GIF, WebP, etc.) +```typescript +{ + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'image_url', image_url: { url: imageUrl } } + ] +} +``` + +### 2. **video_url** - For Video Files +Use this type for video content (MP4, AVI, MOV, etc.) 
+```typescript +{ + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] +} +``` + +### 3. **file_url** - For Document Files +Use this type for document files (PDF, DOCX, TXT, etc.) +```typescript +{ + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'file_url', file_url: { url: fileUrl } } + ] +} +``` + +**Note**: You can combine multiple content types in a single message. For example, you can include both text and multiple images, or text with both an image and a document. + +## Basic Vision Chat Implementation + +### Single Image Analysis + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function analyzeImage(imageUrl, question) { + const zai = await ZAI.create(); + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { + type: 'text', + text: question + }, + { + type: 'image_url', + image_url: { + url: imageUrl + } + } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return response.choices[0]?.message?.content; +} + +// Usage +const result = await analyzeImage( + 'https://example.com/product.jpg', + 'Describe this product in detail' +); +console.log('Analysis:', result); +``` + +### Multiple Images Analysis + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function compareImages(imageUrls, question) { + const zai = await ZAI.create(); + + const content = [ + { + type: 'text', + text: question + }, + ...imageUrls.map(url => ({ + type: 'image_url', + image_url: { url } + })) + ]; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: content + } + ], + thinking: { type: 'disabled' } + }); + + return response.choices[0]?.message?.content; +} + +// Usage +const comparison = await compareImages( + [ + 'https://example.com/before.jpg', + 'https://example.com/after.jpg' + ], + 'Compare these two images and describe the differences' +); +``` 
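A single message can mix text, `image_url`, `video_url`, and `file_url` parts (as described under the supported content types earlier in this section), so building the `content` array by hand for each combination gets repetitive. The following is a minimal sketch of a hypothetical helper, `buildVisionContent` (not part of z-ai-web-dev-sdk), that produces parts in the same shape as the examples above; the `{ kind, url }` attachment format is an assumption made for illustration.

```javascript
// Maps each attachment kind to the content-part type used in this skill's examples.
const PART_TYPES = {
  image: 'image_url',
  video: 'video_url',
  file: 'file_url',
};

// Hypothetical helper: one text part followed by one part per attachment.
// Attachments are assumed to look like { kind: 'image' | 'video' | 'file', url }.
function buildVisionContent(prompt, attachments = []) {
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new Error('prompt must be a non-empty string');
  }

  const parts = [{ type: 'text', text: prompt }];
  for (const { kind, url } of attachments) {
    const type = PART_TYPES[kind];
    if (!type) {
      throw new Error(`unsupported attachment kind: ${kind}`);
    }
    // Produces e.g. { type: 'image_url', image_url: { url } }
    parts.push({ type, [type]: { url } });
  }
  return parts;
}

// Usage: pass the result as the content of a user message to zai.chat.completions.createVision
const content = buildVisionContent('Summarize the chart and the report', [
  { kind: 'image', url: 'https://example.com/chart.png' },
  { kind: 'file', url: 'https://example.com/report.pdf' },
]);
console.log(JSON.stringify(content, null, 2));
```

Keeping the text part first mirrors the ordering used throughout this skill; whether the model is sensitive to part order is not stated here.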
+ +### Base64 Image Support + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function analyzeLocalImage(imagePath, question) { + const zai = await ZAI.create(); + + // Read image file and convert to base64 + const imageBuffer = fs.readFileSync(imagePath); + const base64Image = imageBuffer.toString('base64'); + const mimeType = imagePath.endsWith('.png') ? 'image/png' : 'image/jpeg'; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { + type: 'text', + text: question + }, + { + type: 'image_url', + image_url: { + url: `data:${mimeType};base64,${base64Image}` + } + } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +## Advanced Use Cases + +### Conversational Vision Chat + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class VisionChatSession { + constructor() { + this.messages = []; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async addImage(imageUrl, initialQuestion) { + this.messages.push({ + role: 'user', + content: [ + { + type: 'text', + text: initialQuestion + }, + { + type: 'image_url', + image_url: { url: imageUrl } + } + ] + }); + + return this.getResponse(); + } + + async followUp(question) { + this.messages.push({ + role: 'user', + content: [ + { + type: 'text', + text: question + } + ] + }); + + return this.getResponse(); + } + + async getResponse() { + const response = await this.zai.chat.completions.createVision({ + messages: this.messages, + thinking: { type: 'disabled' } + }); + + const assistantMessage = response.choices[0]?.message?.content; + + this.messages.push({ + role: 'assistant', + content: assistantMessage + }); + + return assistantMessage; + } +} + +// Usage +const session = new VisionChatSession(); +await session.initialize(); + +const initial = await session.addImage( + 'https://example.com/chart.jpg', + 'What does this chart show?' 
+); +console.log('Initial analysis:', initial); + +const followup = await session.followUp('What are the key trends?'); +console.log('Follow-up:', followup); +``` + +### Image Classification and Tagging + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function classifyImage(imageUrl) { + const zai = await ZAI.create(); + + const prompt = `Analyze this image and provide: +1. Main subject/category +2. Key objects detected +3. Scene description +4. Suggested tags (comma-separated) + +Format your response as JSON.`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { + type: 'text', + text: prompt + }, + { + type: 'image_url', + image_url: { url: imageUrl } + } + ] + } + ], + thinking: { type: 'disabled' } + }); + + const content = response.choices[0]?.message?.content; + + try { + return JSON.parse(content); + } catch (e) { + return { rawResponse: content }; + } +} +``` + +### OCR and Text Extraction + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function extractText(imageUrl) { + const zai = await ZAI.create(); + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { + type: 'text', + text: 'Extract all text from this image. Preserve the layout and formatting as much as possible.' + }, + { + type: 'image_url', + image_url: { url: imageUrl } + } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +## Best Practices + +### 1. Image Quality and Size +- Use high-quality images for better analysis results +- Optimize image size to balance quality and processing speed +- Supported formats: JPEG, PNG, WebP + +### 2. Prompt Engineering +- Be specific about what information you need from the image +- Structure complex requests with numbered lists or bullet points +- Provide context about the image type (photo, diagram, chart, etc.) + +### 3. 
Error Handling +```javascript +async function safeVisionChat(imageUrl, question) { + try { + const zai = await ZAI.create(); + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: question }, + { type: 'image_url', image_url: { url: imageUrl } } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return { + success: true, + content: response.choices[0]?.message?.content + }; + } catch (error) { + console.error('Vision chat error:', error); + return { + success: false, + error: error.message + }; + } +} +``` + +### 4. Performance Optimization +- Cache SDK instance creation when processing multiple images +- Use appropriate image formats (JPEG for photos, PNG for diagrams) +- Consider image preprocessing for large batches + +### 5. Security Considerations +- Validate image URLs before processing +- Sanitize user-provided image data +- Implement rate limiting for public-facing APIs +- Never expose SDK credentials in client-side code + +## Common Use Cases + +1. **Product Analysis**: Analyze product images for e-commerce applications +2. **Document Understanding**: Extract information from receipts, invoices, forms +3. **Medical Imaging**: Assist in preliminary analysis (with appropriate disclaimers) +4. **Quality Control**: Detect defects or anomalies in manufacturing +5. **Content Moderation**: Analyze images for policy compliance +6. **Accessibility**: Generate alt text for images automatically +7. 
**Visual Search**: Understand and categorize images for search functionality + +## Integration Examples + +### Express.js API Endpoint + +```javascript +import express from 'express'; +import ZAI from 'z-ai-web-dev-sdk'; + +const app = express(); +app.use(express.json()); + +let zaiInstance; + +// Initialize SDK once +async function initZAI() { + zaiInstance = await ZAI.create(); +} + +app.post('/api/analyze-image', async (req, res) => { + try { + const { imageUrl, question } = req.body; + + if (!imageUrl || !question) { + return res.status(400).json({ + error: 'imageUrl and question are required' + }); + } + + const response = await zaiInstance.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: question }, + { type: 'image_url', image_url: { url: imageUrl } } + ] + } + ], + thinking: { type: 'disabled' } + }); + + res.json({ + success: true, + analysis: response.choices[0]?.message?.content + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('Vision chat API running on port 3000'); + }); +}); +``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code + +**Issue**: Image not loading or being analyzed +- **Solution**: Verify the image URL is accessible and returns a valid image format + +**Issue**: Poor analysis quality +- **Solution**: Provide more specific prompts and ensure image quality is sufficient + +**Issue**: Slow response times +- **Solution**: Optimize image size and consider caching frequently analyzed images + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- The SDK is already installed - import as shown in examples +- Structure prompts clearly for best results +- Handle errors gracefully in production applications +- Consider user privacy when processing images 
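The "Validate image URLs before processing" advice under Security Considerations can be sketched with the WHATWG `URL` parser that is built into Node.js. This is a minimal sketch under stated assumptions. The HTTPS-only rule, the blocked host list, and the accepted data-URL image types are illustrative policy choices, not SDK requirements. The check does not resolve DNS, so it is not a complete defense against server-side request forgery. `isAcceptableImageUrl` is a hypothetical helper name.

```javascript
// Minimal sketch: reject obviously unsafe image URLs before sending them to
// the vision model. The policy sets below are illustrative assumptions.
const ALLOWED_PROTOCOLS = new Set(['https:']);
const BLOCKED_HOSTS = new Set(['localhost', '127.0.0.1', '0.0.0.0', '[::1]']);
const DATA_URL_RE = /^data:image\/(png|jpeg|webp);base64,[A-Za-z0-9+\/]+=*$/;

function isAcceptableImageUrl(input) {
  if (typeof input !== 'string' || input.length === 0) return false;
  // Base64 data URLs (as in the local-image example) are checked by pattern.
  if (input.startsWith('data:')) return DATA_URL_RE.test(input);
  let url;
  try {
    url = new URL(input); // throws on malformed or relative input
  } catch {
    return false;
  }
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) return false;
  if (BLOCKED_HOSTS.has(url.hostname)) return false; // IPv6 hostnames keep brackets
  return true;
}

console.log(isAcceptableImageUrl('https://example.com/product.jpg'));
console.log(isAcceptableImageUrl('http://localhost/admin.png'));
```

In the Express endpoint, a call like this would go right after the `imageUrl`/`question` presence check, returning a 400 response when it fails.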
diff --git a/skills/VLM/scripts/vlm.ts b/skills/VLM/scripts/vlm.ts new file mode 100755 index 0000000..5a9a88f --- /dev/null +++ b/skills/VLM/scripts/vlm.ts @@ -0,0 +1,57 @@ +import ZAI, { VisionMessage } from 'z-ai-web-dev-sdk'; + +async function main(imageUrl: string, prompt: string) { + try { + const zai = await ZAI.create(); + + const messages: VisionMessage[] = [ + { + role: 'assistant', + content: [ + { type: 'text', text: 'Output only text, no markdown.' } + ] + }, + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'image_url', image_url: { url: imageUrl } } + ] + } + ]; + + // const messages: VisionMessage[] = [ + // { + // role: 'user', + // content: [ + // { type: 'text', text: prompt }, + // { type: 'video_url', video_url: { url: imageUrl } } + // ] + // } + // ]; + + // const messages: VisionMessage[] = [ + // { + // role: 'user', + // content: [ + // { type: 'text', text: prompt }, + // { type: 'file_url', file_url: { url: imageUrl } } + // ] + // } + // ]; + + const response = await zai.chat.completions.createVision({ + model: 'glm-4.6v', + messages, + thinking: { type: 'disabled' } + }); + + const reply = response.choices?.[0]?.message?.content; + console.log('Vision model reply:'); + console.log(reply ?? 
+ JSON.stringify(response, null, 2)); + } catch (err: any) { + console.error('Vision chat failed:', err?.message || err); + } +} + +main("https://cdn.bigmodel.cn/static/logo/register.png", "Please describe this image."); diff --git a/skills/docx/CHANGELOG.md b/skills/docx/CHANGELOG.md new file mode 100755 index 0000000..c9b5e00 --- /dev/null +++ b/skills/docx/CHANGELOG.md @@ -0,0 +1,85 @@ +# Changelog + +## [Added Comment Feature - python-docx Method] - 2026-01-29 + +### Added +- **Comment feature**: a simple, reliable approach based on python-docx + - **Recommended method**: `scripts/add_comment_simple.py` - uses python-docx to operate on .docx files directly + - **Complete example**: `scripts/examples/add_comments_pythondocx.py` - demonstrates a range of use cases + - SKILL.md: updated to recommend the python-docx method + - ooxml.md: keeps the OOXML method as an advanced option + - COMMENTS_UPDATE.md: detailed notes on this feature update + +### Features +- ✅ Easy to use: no need to unpack/pack the document +- ✅ Comment author is set to "Z.ai" automatically +- ✅ Verified in practice: comments display correctly in Word +- ✅ Multiple ways to locate targets: text search, paragraph index, conditional checks, etc. +- ✅ Concise code: much simpler than the OOXML method + +### Method Comparison + +**Recommended: python-docx** +```python +from docx import Document +doc = Document('input.docx') +para = doc.paragraphs[0] # paragraph to annotate (must contain at least one run) +doc.add_comment(runs=[para.runs[0]], text="Comment", author="Z.ai") +doc.save('output.docx') +``` + +**Alternative: OOXML (Advanced)** +```python +from scripts.document import Document +doc = Document('unpacked', author="Z.ai") +para = doc["word/document.xml"].get_node(tag="w:p", contains="text") +doc.add_comment(start=para, end=para, text="Comment") +doc.save() +``` + +### Usage Examples + +#### Recommended method (python-docx) +```bash +# Install dependencies +pip install python-docx + +# Use the simple script +python scripts/add_comment_simple.py input.docx output.docx + +# Use the complete example +python scripts/examples/add_comments_pythondocx.py document.docx reviewed.docx +``` + +#### Advanced method (OOXML) +```bash +# Unpack, process, pack +python ooxml/scripts/unpack.py document.docx unpacked +python scripts/add_comment.py unpacked 10 "Comment text" +python ooxml/scripts/pack.py unpacked output.docx +``` + +### Testing +- ✅ python-docx method verified in practice +- ✅ Comments display correctly in Microsoft Word +- ✅ Author is shown correctly as "Z.ai" +- ✅ Various targeting methods supported +- ✅ Code is concise and reliable + +### Documentation +- SKILL.md:
+ recommends the python-docx method and keeps OOXML as an advanced option +- COMMENTS_UPDATE.md: explains the differences between the two methods in detail +- Added python-docx example scripts +- Kept the OOXML examples for advanced users + +### Why python-docx is Recommended +1. **Simple**: no need to unpack/pack the document +2. **Reliable**: verified in practice; works correctly in Word +3. **Direct**: operates on the .docx file directly, in a single step +4. **Maintainable**: concise code that is easy to understand and modify +5. **Compatible**: relies on a standard, widely used library + +The OOXML method is better suited when you: +- need low-level XML control +- need to handle tracked changes at the same time +- need complex features such as comment replies +- are already working with unpacked documents diff --git a/skills/docx/LICENSE.txt b/skills/docx/LICENSE.txt new file mode 100755 index 0000000..c55ab42 --- /dev/null +++ b/skills/docx/LICENSE.txt @@ -0,0 +1,30 @@ +© 2025 Anthropic, PBC. All rights reserved. + +LICENSE: Use of these materials (including all code, prompts, assets, files, +and other components of this Skill) is governed by your agreement with +Anthropic regarding use of Anthropic's services. If no separate agreement +exists, use is governed by Anthropic's Consumer Terms of Service or +Commercial Terms of Service, as applicable: +https://www.anthropic.com/legal/consumer-terms +https://www.anthropic.com/legal/commercial-terms +Your applicable agreement is referred to as the "Agreement." "Services" are +as defined in the Agreement. + +ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the +contrary, users may not: + +- Extract these materials from the Services or retain copies of these + materials outside the Services +- Reproduce or copy these materials, except for temporary copies created + automatically during authorized use of the Services +- Create derivative works based on these materials +- Distribute, sublicense, or transfer these materials to any third party +- Make, offer to sell, sell, or import any inventions embodied in these + materials +- Reverse engineer, decompile, or disassemble these materials + +The receipt, viewing, or possession of these materials does not convey or +imply any license or right beyond those expressly granted above.
+ +Anthropic retains all right, title, and interest in these materials, +including all copyrights, patents, and other intellectual property rights. diff --git a/skills/docx/SKILL.md b/skills/docx/SKILL.md new file mode 100755 index 0000000..25afc02 --- /dev/null +++ b/skills/docx/SKILL.md @@ -0,0 +1,455 @@ +--- +name: docx +description: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When GLM needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks" +license: Proprietary. LICENSE.txt has complete terms +--- + +# DOCX creation, editing, and analysis + +## Overview + +A user may ask you to create, edit, or analyze the contents of a .docx file. A .docx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. You have different tools and workflows available for different tasks. + +# Design requirements + +Deliver studio-quality Word documents with careful thought given to content, functionality, and styling. Users often don't explicitly request advanced features (covers, TOC, backgrounds, back covers, footnotes, charts); understand their needs deeply and extend the document proactively. The document must use 1.3x line spacing, and charts must be centered horizontally. + +## Available color palettes (choose one) +- "Ink & Zen" Color Palette (Wabi-Sabi Style) +The design uses a grayscale "Ink" palette to set it apart from standard business blue and Morandi styles.
+Primary (Titles): #0B1220 +Body Text: #0F172A +Secondary (Subtitles): #2B2B2B +Accent (UI / Decor): #9AA6B2 +Table Header / Subtle Background: #F1F5F9 + +- "Wilderness Oasis": Sage & Deep Forest +Primary (Titles): #1A1F16 (Deep Forest Ink) +Body Text: #2D3329 (Dark Moss Gray) +Secondary (Subtitles): #4A5548 (Neutral Olive) +Accent (UI/Decor): #94A3B8 (Steady Silver) +Table/Background: #F8FAF7 (Ultra-Pale Mint White) + +- "Terra Cotta Afterglow": Warm Clay & Greige +Suited to consulting and architecture-style documents, this scheme warms the grayscale toward a soft, cashmere-like tone. +Primary (Titles): #26211F (Deep Charcoal Espresso) +Body Text: #3D3735 (Dark Umber Gray) +Secondary (Subtitles): #6B6361 (Warm Greige) +Accent (UI/Decor): #C19A6B (Terra Cotta Gold / Muted Ochre) +Table/Background: #FDFCFB (Off-White / Paper Texture) + +- "Midnight Code": High-Contrast Slate & Silver +Suited to technology, AI, or digital transformation projects. This palette has a slight "electric" undertone and strong contrast. +Primary (Titles): #020617 (Midnight Black) +Body Text: #1E293B (Deep Slate Blue) +Secondary (Subtitles): #64748B (Cool Blue-Gray) +Accent (UI/Decor): #94A3B8 (Steady Silver) +Table/Background: #F8FAFC (Glacial Blue-White) + +### Chinese plot PNG method +When you use Python to generate PNGs that contain Chinese characters, note that Matplotlib's default font, DejaVu Sans, has no Chinese glyphs. The environment already has SimHei installed, so register it and make it the default: + + import matplotlib.pyplot as plt + from matplotlib import font_manager + font_manager.fontManager.addfont('/usr/share/fonts/truetype/chinese/SimHei.ttf') + plt.rcParams['font.sans-serif'] = ['SimHei'] + plt.rcParams['axes.unicode_minus'] = False + +## Specialized Element Styling +- Table Borders: Use a "Single" line style with a size of 12 and the Primary Ink color.
Internal vertical borders should be set to Nil (invisible) to create a clean, modern horizontal-only look. +- **CRITICAL: Table Cell Margins** - ALL tables MUST set the `margins` property at the Table level so that text does not touch the borders. This is mandatory for professional document quality. + +### Alignment and Typography +CJK body: justify + 2-char indent. English: left. Table numbers: right. Headings: no indent. +For both languages, you MUST use 1.3x line spacing (in docx-js, `spacing: { line: 312 }`, i.e. 1.3 × 240 with the default auto line rule). Do not use single line spacing. + +### CRITICAL: Chinese Quotes in JavaScript/TypeScript Code +**MANDATORY**: When writing JavaScript/TypeScript code for docx-js, write ALL Chinese quotation marks (“ ” ‘ ’) inside strings as Unicode escape sequences. If these marks are typed, or normalized by an editor, as ASCII `"` or `'`, they terminate the string literal and cause a syntax error: +- Left double quote `\u201c` (“) +- Right double quote `\u201d` (”) +- Left single quote `\u2018` (‘) +- Right single quote `\u2019` (’) + +**Example - INCORRECT (will cause syntax error):** +```javascript +new TextRun({ + text: "他说"你好"" // ERROR: the inner quotes end the string early +}) +``` + +**Example - CORRECT:** +```javascript +new TextRun({ + text: "他说\u201c你好\u201d" // Correct: escaped Unicode +}) +``` + +**Alternative - Use template literals:** +```javascript +new TextRun({ + text: `他说“你好”` // Also works: quote marks (even ASCII ones) cannot end a template literal +}) +``` + +## Workflow Decision Tree + +### Reading/Analyzing Content +Use the "Text extraction" or "Raw XML access" sections below. + +### Creating New Document +Use the "Creating a new Word document" workflow. + +### Editing Existing Document +- **Your own document + simple changes** + Use the "Editing an existing Word document" workflow + +- **Someone else's document** + Use **"Redlining workflow"** (recommended default) + +- **Legal, academic, business, or government docs** + Use **"Redlining workflow"** (required) + +## Reading and analyzing content + +**Note**: For .doc (legacy format), first convert with `libreoffice --convert-to docx file.doc`.
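The note above about legacy .doc files, together with the overview's point that a .docx is a ZIP archive, suggests a quick check before you choose a workflow. The sketch below assumes Node.js and uses a hypothetical helper, `detectWordFormat`, which is not one of the skill's scripts. It reads the leading "magic" bytes of the file. A ZIP container (any .docx) starts with the local-file-header signature `PK\x03\x04`. A legacy .doc is an OLE2 compound file starting with `D0 CF 11 E0 A1 B1 1A E1`.

```javascript
const fs = require('fs');

// Minimal sketch: classify a Word file by its leading "magic" bytes.
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04": ZIP, e.g. .docx
const OLE2_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // legacy .doc

function hasPrefix(bytes, magic) {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Returns 'zip' (treat as .docx), 'ole2' (legacy .doc: convert first), or 'unknown'.
function detectWordFormat(bytes) {
  if (hasPrefix(bytes, ZIP_MAGIC)) return 'zip'; // any ZIP, not only .docx
  if (hasPrefix(bytes, OLE2_MAGIC)) return 'ole2';
  return 'unknown';
}

// Usage: read only the first 8 bytes of a file on disk
function detectFile(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const buf = Buffer.alloc(8);
    const n = fs.readSync(fd, buf, 0, 8, 0);
    return detectWordFormat(buf.subarray(0, n));
  } finally {
    fs.closeSync(fd);
  }
}
```

A result of `'ole2'` means the file needs the `libreoffice --convert-to docx` step first. A `'zip'` result only confirms a ZIP container, so an .xlsx renamed to .docx would also pass.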
+ +### Text extraction +If you just need to read the text contents of a document, convert it to markdown with pandoc. Pandoc preserves document structure well and can show tracked changes: + +```bash +# Convert document to markdown with tracked changes +pandoc --track-changes=all path-to-file.docx -o output.md +# Options: --track-changes=accept/reject/all +``` + +### Raw XML access +You need raw XML access for comments, complex formatting, document structure, embedded media, and metadata. For any of these features, unpack the document and read its raw XML contents. + +#### Unpacking a file +`python ooxml/scripts/unpack.py ` + +#### Key file structures +* `word/document.xml` - Main document contents +* `word/comments.xml` - Comments referenced in document.xml +* `word/media/` - Embedded images and media files +* Tracked changes use `<w:ins>` (insertions) and `<w:del>` (deletions) tags + +## Creating a new Word document + +When creating a new Word document from scratch, use **docx-js**, which lets you create Word documents with JavaScript/TypeScript. Run the script with bun instead of node. + +### Workflow +1. **MANDATORY - READ ENTIRE FILE**: Read [`docx-js.md`](docx-js.md) (~680 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with document creation. +2. Create a JavaScript/TypeScript file using Document, Paragraph, TextRun components (you can assume all dependencies are installed; if not, refer to the Dependencies section below) +3. Export as .docx using `Packer.toBuffer()` and write the buffer to a file + +### TOC (Table of Contents) +**If the document has more than three sections, generate a table of contents.** + +**Implementation**: Use the docx-js `TableOfContents` component to create a live TOC that auto-populates from document headings.
+ +**CRITICAL**: For TOC to work correctly: +- All document headings MUST use `HeadingLevel` (e.g., `HeadingLevel.HEADING_1`) +- Do NOT add custom styles to heading paragraphs +- Place TOC before the actual heading content so it can scan them + +**Hint requirement**: A hint paragraph MUST be added immediately after the TOC component with these specifications: +- **Position**: Immediately after the TOC component +- **Alignment**: Center-aligned +- **Color**: Gray (e.g., "999999") +- **Font size**: 18 (9pt) +- **Language**: Matches user conversation language +- **Text content**: Inform the user to right-click the TOC and select "Update Field" to show correct page numbers + +### TOC Placeholders (Required Post-Processing) + +**REQUIRED**: After generating the DOCX file, you MUST add placeholder TOC entries that appear on first open (before the user updates the TOC). This prevents showing an empty TOC initially. + +**Implementation**: Always run the `add_toc_placeholders.py` script after generating the DOCX file: + +```bash +python skills/docx/scripts/add_toc_placeholders.py document.docx \ + --entries '[{"level":1,"text":"Chapter 1 Overview","page":"1"},{"level":2,"text":"Section 1.1 Details","page":"1"}]' +``` + +**Note**: The script supports up to 3 TOC levels for placeholder entries. + +**Entry format**: +- `level`: Heading level (1, 2, or 3) +- `text`: The heading text +- `page`: Estimated page number (will be corrected when TOC is updated) + +**Auto-generating entries**: +You can extract the actual headings from the document structure to generate accurate entries. Match the heading text and hierarchy from your document content. 
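You can generate the `--entries` argument from the headings you already wrote instead of typing it by hand. The sketch below assumes you track each heading's level, text, and an estimated page yourself. `buildTocEntries` is a hypothetical helper, not part of the skill's scripts. Levels outside 1 to 3 are dropped because the script supports only three levels, and pages are emitted as strings to match the entry format above.

```javascript
// Hypothetical helper: turn your heading list into the --entries JSON
// for add_toc_placeholders.py ({ level, text, page }, levels 1-3 only).
function buildTocEntries(headings) {
  return headings
    .filter(h => h.level >= 1 && h.level <= 3) // the script supports 3 levels
    .map(h => ({
      level: h.level,
      text: h.text.trim(),
      page: String(h.page ?? 1), // estimated; corrected when the TOC is updated
    }));
}

// Usage: build the command for the post-processing step
const headings = [
  { level: 1, text: 'Chapter 1 Overview', page: 1 },
  { level: 2, text: 'Section 1.1 Details', page: 1 },
  { level: 4, text: 'Too deep, dropped', page: 2 },
];
const entries = JSON.stringify(buildTocEntries(headings));
console.log(`python skills/docx/scripts/add_toc_placeholders.py document.docx --entries '${entries}'`);
```

The printed `--entries` value matches the example command above. A heading that contains a single quote would break the shell quoting shown here, so such text would need escaping or a different way of passing the argument.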
+ +**Benefits**: +- Users see TOC content immediately on first open +- Placeholders are automatically replaced when user updates the TOC +- Improves perceived document quality and user experience + +### Document Formatting Rules + +**Page Break Restrictions** +Page breaks are ONLY allowed in these specific locations: +- Between cover page and table of contents (if TOC exists) +- Between cover page and main content (if NO TOC exists) +- Between table of contents and main content (if TOC exists) + +**All content after the table of contents must flow continuously WITHOUT page breaks.** + +**Text and Paragraph Rules** +- Complete sentences before starting a new line — do not break sentences across lines +- Use single, consistent style for each complete sentence +- Only start a new paragraph when the current paragraph is logically complete + +**List and Bullet Point Formatting** +- Use left-aligned formatting (NOT justified alignment) +- Insert a line break after each list item +- Never place multiple items on the same line (justification stretches text) + +## Editing an existing Word document + +**Note**: For .doc (legacy format), first convert with `libreoffice --convert-to docx file.doc`. + +When editing an existing Word document, use the **Document library** (a Python library for OOXML manipulation). The library automatically handles infrastructure setup and provides methods for document manipulation. For complex scenarios, you can access the underlying DOM directly through the library. + +### Workflow +1. **MANDATORY - READ ENTIRE FILE**: Read [`ooxml.md`](ooxml.md) (~600 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for the Document library API and XML patterns for directly editing document files. +2. Unpack the document: `python ooxml/scripts/unpack.py ` +3. Create and run a Python script using the Document library (see "Document Library" section in ooxml.md) +4. 
Pack the final document: `python ooxml/scripts/pack.py ` + +The Document library provides both high-level methods for common operations and direct DOM access for complex scenarios. + +## Adding Comments + +Comments let you annotate a document without modifying its content. They are useful for review feedback, explanations, or questions about specific parts of a document. + +### Recommended Method: Using python-docx (simplest option) + +The simplest and most reliable way to add comments is the `python-docx` library: + +```python +from docx import Document + +# Open the document +doc = Document('input.docx') + +# Find paragraphs and add comments +for para in doc.paragraphs: + # Match on text; skip paragraphs with no runs to anchor the comment to + if "keyword" in para.text and para.runs: + doc.add_comment( + runs=[para.runs[0]], # Specify the text to comment on + text="Comment text", + author="Z.ai" # Set comment author as Z.ai + ) + +# Save the document +doc.save('output.docx') +``` + +**Key points:** +- Install: `pip install "python-docx>=1.2.0"` (`Document.add_comment()` requires python-docx 1.2.0 or later; it is a Python package, so `bun add` cannot install it) +- Works directly on .docx files (no need to unpack/pack) +- Simple API, reliable results +- Comments appear in Word's comment pane with Z.ai as author + +**Common patterns:** + +```python +from docx import Document + +doc = Document('document.docx') + +# Add comment to first paragraph (only if it has a run to anchor to) +if doc.paragraphs and doc.paragraphs[0].runs: + first_para = doc.paragraphs[0] + doc.add_comment( + runs=[first_para.runs[0]], + text="Review this introduction", + author="Z.ai" + ) + +# Add comment to specific paragraph by index (guard against short documents) +if len(doc.paragraphs) > 5 and doc.paragraphs[5].runs: + target_para = doc.paragraphs[5] # 6th paragraph + doc.add_comment( + runs=[target_para.runs[0]], + text="This section needs clarification", + author="Z.ai" + ) + +# Add comments based on text search +for para in doc.paragraphs: + if "important" in para.text.lower() and para.runs: + doc.add_comment( + runs=[para.runs[0]], + text="Flagged for review", + author="Z.ai" + ) + +doc.save('output.docx') +``` + +### Alternative Method: Using OOXML
(Advanced) + +For complex scenarios requiring low-level XML manipulation, you can use the OOXML workflow. This method is more complex but provides finer control. + +**Note:** This method requires unpacking/packing documents and may encounter validation issues. Use python-docx unless you specifically need low-level XML control. + +#### OOXML Workflow + +1. **Unpack the document**: `python ooxml/scripts/unpack.py ` + +2. **Create and run a Python script**: + +```python +from scripts.document import Document + +# Initialize with Z.ai as the author +doc = Document('unpacked', author="Z.ai", initials="Z") + +# Add comment on a paragraph +para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph text") +doc.add_comment(start=para, end=para, text="This needs clarification") + +# Save changes +doc.save() +``` + +3. **Pack the document**: `python ooxml/scripts/pack.py ` + +**When to use OOXML method:** +- You need to work with tracked changes simultaneously +- You need fine-grained control over XML structure +- You're already working with unpacked documents +- You need to manipulate comments in complex ways + +**When to use python-docx method (recommended):** +- Adding comments is your primary task +- You want simple, reliable code +- You're working with complete .docx files +- You don't need low-level XML access + +## Redlining workflow for document review + +This workflow allows you to plan comprehensive tracked changes using markdown before implementing them in OOXML. **CRITICAL**: For complete tracked changes, you must implement ALL changes systematically. + +**Batching Strategy**: Group related changes into batches of 3-10 changes. This makes debugging manageable while maintaining efficiency. Test each batch before moving to the next. + +**Principle: Minimal, Precise Edits** +When implementing tracked changes, only mark text that actually changes. Repeating unchanged text makes edits harder to review and appears unprofessional. 
Break replacements into: [unchanged text] + [deletion] + [insertion] + [unchanged text]. Preserve the original run's RSID for unchanged text by extracting the `<w:r>` element from the original and reusing it. + +Example - Changing "30 days" to "60 days" in a sentence (the required `w:id` and `w:author` attributes and the original run properties are omitted for brevity): +```python +# BAD - Replaces entire sentence +'<w:del><w:r><w:delText>The term is 30 days.</w:delText></w:r></w:del><w:ins><w:r><w:t>The term is 60 days.</w:t></w:r></w:ins>' + +# GOOD - Only marks what changed, preserves original runs for unchanged text +'<w:r><w:t xml:space="preserve">The term is </w:t></w:r><w:del><w:r><w:delText>30</w:delText></w:r></w:del><w:ins><w:r><w:t>60</w:t></w:r></w:ins><w:r><w:t xml:space="preserve"> days.</w:t></w:r>' +``` + +### Tracked changes workflow + +1. **Get markdown representation**: Convert document to markdown with tracked changes preserved: + ```bash + pandoc --track-changes=all path-to-file.docx -o current.md + ``` + +2. **Identify and group changes**: Review the document and identify ALL changes needed, organizing them into logical batches: + + **Location methods** (for finding changes in XML): + - Section/heading numbers (e.g., "Section 3.2", "Article IV") + - Paragraph identifiers if numbered + - Grep patterns with unique surrounding text + - Document structure (e.g., "first paragraph", "signature block") + - **DO NOT use markdown line numbers** - they don't map to XML structure + + **Batch organization** (group 3-10 related changes per batch): + - By section: "Batch 1: Section 2 amendments", "Batch 2: Section 5 updates" + - By type: "Batch 1: Date corrections", "Batch 2: Party name changes" + - By complexity: Start with simple text replacements, then tackle complex structural changes + - Sequential: "Batch 1: Pages 1-3", "Batch 2: Pages 4-6" + +3. **Read documentation and unpack**: + - **MANDATORY - READ ENTIRE FILE**: Read [`ooxml.md`](ooxml.md) (~600 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Pay special attention to the "Document Library" and "Tracked Change Patterns" sections. + - **Unpack the document**: `python ooxml/scripts/unpack.py ` + - **Note the suggested RSID**: The unpack script will suggest an RSID to use for your tracked changes.
Copy this RSID for use in step 4b. + +4. **Implement changes in batches**: Group changes logically (by section, by type, or by proximity) and implement them together in a single script. This approach: + - Makes debugging easier (smaller batch = easier to isolate errors) + - Allows incremental progress + - Maintains efficiency (batch size of 3-10 changes works well) + + **Suggested batch groupings:** + - By document section (e.g., "Section 3 changes", "Definitions", "Termination clause") + - By change type (e.g., "Date changes", "Party name updates", "Legal term replacements") + - By proximity (e.g., "Changes on pages 1-3", "Changes in first half of document") + + For each batch of related changes: + + **a. Map text to XML**: Grep for text in `word/document.xml` to verify how text is split across `` elements. + + **b. Create and run script**: Use `get_node` to find nodes, implement changes, then `doc.save()`. See **"Document Library"** section in ooxml.md for patterns. + + **Note**: Always grep `word/document.xml` immediately before writing a script to get current line numbers and verify text content. Line numbers change after each script run. + +5. **Pack the document**: After all batches are complete, convert the unpacked directory back to .docx: + ```bash + python ooxml/scripts/pack.py unpacked reviewed-document.docx + ``` + +6. **Final verification**: Do a comprehensive check of the complete document: + - Convert final document to markdown: + ```bash + pandoc --track-changes=all reviewed-document.docx -o verification.md + ``` + - Verify ALL changes were applied correctly: + ```bash + grep "original phrase" verification.md # Should NOT find it + grep "replacement phrase" verification.md # Should find it + ``` + - Check that no unintended changes were introduced + + +## Converting Documents to Images + +To visually analyze Word documents, convert them to images using a two-step process: + +1. 
**Convert DOCX to PDF**: + ```bash + soffice --headless --convert-to pdf document.docx + ``` + +2. **Convert PDF pages to JPEG images**: + ```bash + pdftoppm -jpeg -r 150 document.pdf page + ``` + This creates files like `page-1.jpg`, `page-2.jpg`, etc. + +Options: +- `-r 150`: Sets resolution to 150 DPI (adjust for quality/size balance) +- `-jpeg`: Output JPEG format (use `-png` for PNG if preferred) +- `-f N`: First page to convert (e.g., `-f 2` starts from page 2) +- `-l N`: Last page to convert (e.g., `-l 5` stops at page 5) +- `page`: Prefix for output files + +Example for specific range: +```bash +pdftoppm -jpeg -r 150 -f 2 -l 5 document.pdf page # Converts only pages 2-5 +``` + +## Code Style Guidelines +**IMPORTANT**: When generating code for DOCX operations: +- Write concise code +- Avoid verbose variable names and redundant operations +- Avoid unnecessary print statements + +## Dependencies + +Required dependencies (install if not available): + +- **pandoc**: `sudo apt-get install pandoc` (for text extraction) +- **docx**: `bun add docx` (for creating new documents) +- **LibreOffice**: `sudo apt-get install libreoffice` (for PDF conversion) +- **Poppler**: `sudo apt-get install poppler-utils` (for pdftoppm to convert PDF to images) +- **defusedxml**: `pip install defusedxml` (for secure XML parsing) diff --git a/skills/docx/docx-js.md b/skills/docx/docx-js.md new file mode 100755 index 0000000..530ac33 --- /dev/null +++ b/skills/docx/docx-js.md @@ -0,0 +1,681 @@ +# DOCX Library Tutorial + +Generate .docx files with JavaScript/TypeScript. + +**Important: Read this entire document before starting.** Critical formatting rules and common pitfalls are covered throughout - skipping sections may result in corrupted files or rendering issues. 
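Sizes and spacing in this tutorial and in the SKILL.md rules are given in Word's internal units. The helpers below show the conversions. They assume, as in OOXML, that docx-js's `spacing.line` is measured in 240ths of a line when the line rule is left at its default "auto". The helper names are hypothetical. Under that assumption, the 1.3x rule corresponds to `line: 312`.

```javascript
// Minimal sketch of the unit conversions used throughout this tutorial.
// Word stores most measurements as integers in small units:
//   font size    -> half-points      (11pt   -> 22)
//   lengths      -> twips, 1/20 pt   (1 inch -> 1440)
//   line spacing -> 240ths of a line with the default "auto" rule (single = 240)
const ptToHalfPoints = pt => Math.round(pt * 2);
const ptToTwips = pt => Math.round(pt * 20);
const inchesToTwips = inches => Math.round(inches * 1440);
const lineMultipleToAuto = multiple => Math.round(multiple * 240);

// Usage: values you might pass to docx-js
console.log(ptToHalfPoints(10.5));    // 21   (Chinese body text, 五号)
console.log(lineMultipleToAuto(1.3)); // 312  (1.3x line spacing)
console.log(inchesToTwips(1));        // 1440 (1-inch page margin)
```

Rounding keeps the results integral, which is what the OOXML attributes expect. For example, 1.3 × 240 is not exactly 312 in floating point.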
+ +## Setup +Assumes `docx` is already installed globally. +If it is not installed, first try `bun add docx`, then `npm install -g docx`. +```javascript +const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun, Media, + Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink, + InternalHyperlink, TableOfContents, HeadingLevel, BorderStyle, WidthType, TabStopType, + TabStopPosition, UnderlineType, ShadingType, VerticalAlign, SymbolRun, PageNumber, + FootnoteReferenceRun, Footnote, PageBreak } = require('docx'); +const fs = require('fs'); // Needed to write the buffer to disk in Node.js + +// Create & Save +const doc = new Document({ sections: [{ children: [/* content */] }] }); +Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer)); // Node.js +Packer.toBlob(doc).then(blob => { /* download logic */ }); // Browser +``` + +## Delivery Standard + +**Generic styling and mediocre aesthetics = mediocre delivery.** + +Deliver studio-quality Word documents with deep thought on content, functionality, and styling. Users often don't explicitly request advanced features (covers, TOC, backgrounds, back covers, footnotes, charts); understand their underlying needs and proactively add these elements. + +The following formatting standards must be applied without exception: + +- Line Spacing: The entire document must use 1.3x line spacing. +- Chart/Figure Placement: All charts, graphs, and figures must be explicitly centered horizontally on the page. + +```javascript +// `colors` and `cellBorders` are assumed to be defined elsewhere in your script +new Table({ + alignment: AlignmentType.CENTER, + rows: [ + new TableRow({ + children: [ + new TableCell({ + children: [ + new Paragraph({ + alignment: AlignmentType.CENTER, + children: [new TextRun("centered text")], + }), + ], + verticalAlign: VerticalAlign.CENTER, + shading: { fill: colors.tableBg, type: ShadingType.CLEAR }, + borders: cellBorders, + }), + ], + }), + ], +}); +``` + +- Chart Text Margins: Text inside charts must have left/right/top/bottom margins. +- **Image Handling (preserve aspect ratio)**: Never distort an image; always insert it at its original aspect ratio.
+- Do not apply background shading to all table section headers. + +Compliance with these specifications is mandatory. + +## Language Consistency + +**Document language = User conversation language** (including filename, body text, headings, headers, TOC hints, chart labels, and all other text). + +## Headers and Footers - REQUIRED BY DEFAULT + +Most documents **MUST** include headers and footers. The specific style (alignment, format, content) should match the document's overall design. + +- **Header**: Typically document title, company name, or chapter name +- **Footer**: Typically page numbers (format flexible: "X / Y", "Page X", "— X —", etc.) +- **Cover/Back cover**: Enable the section's title-page setting (`titlePage: true` in the section `properties`) to hide the header/footer on the first page + +## Fonts +If the user does not request specific fonts, follow the font rules below: +### For Chinese: +| Element | Font Family | Font Size (Half-points) | Properties | +| :--- | :--- | :--- | :--- | +| Normal Body | Microsoft YaHei (微软雅黑) | 21 (10.5pt / 五号) | Standard for readability. | +| Heading 1 | SimHei (黑体) | 32 (16pt / 三号) | Bold, high impact. | +| Heading 2 | SimHei (黑体) | 28 (14pt / 四号) | Bold. | +| Caption | Microsoft YaHei | 20 (10pt) | For tables and charts. | + + - Microsoft YaHei, located at /usr/share/fonts/truetype/chinese/msyh.ttf + - SimHei, located at /usr/share/fonts/truetype/chinese/SimHei.ttf + - Code blocks: SarasaMonoSC, located at /usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf + - Formulas / symbols: DejaVu Sans Mono, located at /usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf + - For body text and formulas, use Paragraph instead of Preformatted. + + +### For English +| Element | Font Family | Font Size (Half-points) | Properties | +| :--- | :--- | :--- | :--- | +| Normal Body | Calibri | 22 (11pt) | Highly legible; slightly larger than 10.5pt to match visual "weight." | +| Heading 1 | Times New Roman | 36 (18pt) | Bold, Serif; provides a clear "Newspaper" style hierarchy.
| +| Heading 2 | Times New Roman | 28 (14pt) | Bold; classic and professional. | +| Caption | Calibri | 18 (9pt) | Clean and compact for metadata and notes. | + +- Times New Roman, located at /usr/share/fonts/truetype/english/Times-New-Roman.ttf +- Calibri, located at /usr/share/fonts/truetype/english/calibri-regular.ttf + +## Spacing & Paragraph Alignment +Task: Apply the following formatting rules to the provided text for a professional bilingual (Chinese/English) layout. +### Paragraph & Indentation: +Chinese Body: First-line indent of 2 characters (420 twips at the 10.5pt body size, i.e. 2 × 10.5pt × 20). +English Body: No first-line indent; use block format (space between paragraphs). +Alignment: Justified (Both) for all body text; Centered for Titles and Table Headers. +### Line & Paragraph Spacing (keep in mind) +Line Spacing: Set to 1.3 lines for both languages. With the default `auto` line rule, `spacing.line` is measured in 240ths of a line, so 1.3 lines = `line: 312` (a value of 250 gives only about 1.04 lines). +Heading 1: 600 twips before, 300 twips after. +### Mixed-Language Spacing: +Insert a standard half-width space between Chinese characters and English words/numbers (e.g., "共 20 个 items"). +### Punctuation: +Use full-width punctuation for Chinese text and half-width punctuation for English text. + +## Professional Elements (Critical) + +Produce documents that surpass user expectations by proactively incorporating high-end design elements without being prompted. Quality Benchmark: Visual excellence reflecting the standards of a top-tier designer in 2025. + +**Cover & Visual:** + - Double-Sided Branding: All formal documents (proposals, reports, contracts, bids) and creative assets (invitations, greeting cards) must include both a standalone front and back cover. + - Internal Accents: Body pages may include subtle background elements to enhance the overall aesthetic depth. + +**Structure:** +- Navigation: For any document with three or more sections, include a Table of Contents (TOC) immediately followed by a "refresh hint."
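The mixed-language spacing rule (a half-width space between Chinese characters and English words or numbers) can be applied mechanically before text is placed into a `TextRun`. The following is a minimal sketch, not part of the `docx` library: `addCjkLatinSpacing` is a hypothetical helper, and it assumes that "Chinese characters" means the CJK ideograph blocks U+3400–U+4DBF, U+4E00–U+9FFF, and U+F900–U+FAFF, and that "English" means ASCII letters and digits.

```javascript
// Hypothetical helper, not part of the docx library.
// Assumed CJK ranges: Extension A, Unified Ideographs, Compatibility Ideographs.
const CJK = "\\u3400-\\u4DBF\\u4E00-\\u9FFF\\uF900-\\uFAFF";
const LATIN = "A-Za-z0-9";
const CJK_THEN_LATIN = new RegExp(`([${CJK}])([${LATIN}])`, "g");
const LATIN_THEN_CJK = new RegExp(`([${LATIN}])([${CJK}])`, "g");

// Insert one half-width space at each direct CJK/Latin boundary.
function addCjkLatinSpacing(text) {
  return text
    .replace(CJK_THEN_LATIN, "$1 $2") // e.g. "共2" -> "共 2"
    .replace(LATIN_THEN_CJK, "$1 $2"); // e.g. "0个" -> "0 个"
}

console.log(addCjkLatinSpacing("共20个items")); // 共 20 个 items
```

Text that is already spaced is left unchanged, and full-width punctuation such as `。` falls outside both character classes, so the punctuation rule still has to be handled separately.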
+ +**Data Presentation:** +- Visual Priority: Use professional charts to illustrate trends or comparisons rather than plain text lists. +- Table Aesthetics: Apply light gray headers or the "three-line" professional style; strictly avoid the default Word blue. + +**Links & References:** +- Interactive Links: All URLs must be formatted as clickable, active hyperlinks. +- Cross-Referencing: Number all figures and tables systematically (e.g., "see Figure 1") and use internal cross-references. +- Academic/Legal Rigor: For research or data-heavy documents, implement clickable in-text citations paired with accurate footnotes or endnotes. + +### TOC Refresh Hint + +Because Word TOCs utilize field codes, page numbers may become unaligned during generation. You must append the following gray hint text after the TOC to guide the user: + Note: This Table of Contents is generated via field codes. To ensure page number accuracy after editing, please right-click the TOC and select "Update Field." + +### Outline Adherence + +- **User provides outline**: Follow strictly, no additions, deletions, or reordering +- **No outline provided**: Use standard structure + - Academic: Introduction → Literature Review → Methodology → Results → Discussion → Conclusion. + - Business: Executive Summary → Analysis → Recommendations. + - Technical: Overview → Principles → Implementation → Examples → FAQ. + +### Scene Completeness + +Anticipate the functional requirements of the specific scenario. Examples include, but are not limited to: +- **Exam paper** → Include name/class/ID fields, point allocations for every question, and a dedicated grading table. +- **Contract** → Provide signature and seal blocks for all parties, date placeholders, contract ID numbers, and an attachment list. +- **Meeting minutes** → List attendees and absentees, define action items with assigned owners, and note the next meeting time. 
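The Cross-Referencing rule (number all figures and tables systematically and refer to them as "see Figure 1") is easier to keep consistent when a single registry numbers the captions instead of numbering them by hand. The following is a minimal sketch that assumes captions are registered in document order. `createCaptionRegistry`, the `figure_1`-style anchor names, and the default `Figure`/`Table` kinds are illustrative choices, not part of the `docx` API.

```javascript
// Hypothetical helper, not a docx API: numbers captions per kind in the order
// they are registered and derives a bookmark-style anchor for each one.
function createCaptionRegistry(kinds = ["Figure", "Table"]) {
  const counters = new Map(kinds.map((k) => [k, 0]));
  const entries = new Map(); // caller-chosen key -> { label, anchor }

  return {
    // Call once per caption, in document order.
    register(kind, key) {
      if (!counters.has(kind)) throw new Error(`Unknown caption kind: ${kind}`);
      if (entries.has(key)) throw new Error(`Duplicate caption key: ${key}`);
      const n = counters.get(kind) + 1;
      counters.set(kind, n);
      const entry = { label: `${kind} ${n}`, anchor: `${kind.toLowerCase()}_${n}` };
      entries.set(key, entry);
      return entry;
    },
    // Resolve an in-text reference such as "see Figure 2".
    ref(key) {
      const entry = entries.get(key);
      if (!entry) throw new Error(`Unknown caption key: ${key}`);
      return entry;
    },
  };
}

const captions = createCaptionRegistry();
captions.register("Figure", "revenue-trend");
captions.register("Table", "kpi-summary");
captions.register("Figure", "market-share");
console.log(`see ${captions.ref("market-share").label}`); // see Figure 2
```

The returned `anchor` could serve as the bookmark name on the caption paragraph and as the `anchor` of an `InternalHyperlink` in the citing text. If a reference appears before its figure, register all captions in a first pass before building any paragraphs.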
+ +## Design Philosophy + +### Color Scheme + +**Use low-saturation tones**; avoid Word's default blue and matplotlib's default high-saturation colors. + +**Flexibly choose** a color scheme based on the document scenario: + +| Style | Palette | Suitable Scenarios | +|-------|---------|-------------------| +| Morandi | Soft muted tones | Arts, editorial, lifestyle | +| Earth tones | Brown, olive, natural | Environmental, organic industries | +| Nordic | Cool gray, misty blue | Minimalism, technology, software | +| Japanese Wabi-sabi | Gray, raw wood, zen | Traditional, contemplative, crafts | +| French elegance | Off-white, dusty pink | Luxury, fashion, high-end retail | +| Industrial | Charcoal, rust, concrete | Manufacturing, engineering, construction | +| Academic | Navy, burgundy, ivory | Research, education, legal | +| Ocean mist | Misty blue, sand | Marine, wellness, travel | +| Forest moss | Olive, moss green | Nature, sustainability, forestry | +| Desert dusk | Ochre, sandy gold | Warmth, regional, historical | + +**Color scheme must be consistent within the same document.** + +### Highlighting +Use low-saturation colors for text highlighting. + +### Layout + +White space (margins, paragraph spacing), clear hierarchy (H1 > H2 > body), proper padding (text shouldn't touch borders). + +### Pagination Control + +Word uses a flow layout, not fixed pages. + +### Alignment and Typography (keep in mind!!!) +CJK body: justify + 2-char indent. English: left. Table numbers: right. Headings: no indent. +For both languages, you must use 1.3x line spacing (`line: 312` with the default `auto` line rule). Do not use single line spacing! + +### Table Formatting (Very Important) +- Add a caption immediately after every table (keep in mind!). +- Center the entire table horizontally on the page (keep in mind!). +#### Cell Formatting (Inside the Table) +Left/Right Cell Margin: Set to 120–200 twips (approximately the width of one character).
+Top/Bottom Cell Margin: Set to at least 100 twips. +Text Alignment (must follow): +- Horizontal Alignment: Center-aligned. This creates a clean vertical axis through the table column. +- Vertical Alignment: Center-aligned. Text must be positioned exactly in the middle of the cell's height to prevent it from "floating" too close to the top or bottom borders. +- Cell Margins (Padding): +Left/Right: Set to 120–200 twips (approx. 0.2–0.35 cm). This ensures text does not touch the borders, maintaining legibility. +Top/Bottom: Set to at least 100 twips to provide a consistent vertical buffer around the text. + + +### Page break +Insert a page break between the cover page and the content, and between the table of contents and the content. Never place the cover page and the content on the same page. + +## Page Layout & Margins (A4 Standard) +The layout uses 1440-twip (1-inch) left, right, and bottom margins for content, an 1800-twip top margin to leave room for the header, and zero margins for the cover. + +| Section | Top Margin | Bottom/Left/Right | Twips Calculation | +|---------------|------------|-------------------|-------------------------------------------| +| Cover Page | 0 | 0 | For edge-to-edge background images. | +| Main Content | 1800 | 1440 | Extra top space for the header. | +| **Twips Unit** | **1 inch = 1440 twips** | **A4 Width = 11906** | **A4 Height = 16838** | + +## Text & Formatting +```javascript +// IMPORTANT: Never use \n for line breaks - always use separate Paragraph elements +// ❌ WRONG: new TextRun("Line 1\nLine 2") +// ✅ CORRECT: new Paragraph({ children: [new TextRun("Line 1")] }), new Paragraph({ children: [new TextRun("Line 2")] }) + +// First-line indent for body paragraphs +// IMPORTANT: Chinese documents typically use 2-character indent (about 480 DXA for 12pt SimSun) +new Paragraph({ + indent: { firstLine: 480 }, // 2-character first-line indent for Chinese body text + children: [new TextRun({ text: "This is the main text (Chinese).
The first line is indented by two characters.", font: "SimSun" })] +}) + +// Basic text with all formatting options +new Paragraph({ + alignment: AlignmentType.CENTER, + spacing: { before: 200, after: 200 }, + indent: { left: 720, right: 720, firstLine: 480 }, // Can combine with left/right indent + children: [ + new TextRun({ text: "Bold", bold: true }), + new TextRun({ text: "Italic", italics: true }), + new TextRun({ text: "Underlined", underline: { type: UnderlineType.DOUBLE, color: "FF0000" } }), + new TextRun({ text: "Colored", color: "FF0000", size: 28, font: "Times New Roman" }), // Times New Roman (system font) + new TextRun({ text: "Highlighted", highlight: "yellow" }), + new TextRun({ text: "Strikethrough", strike: true }), + new TextRun({ text: "x2", superScript: true }), + new TextRun({ text: "H2O", subScript: true }), + new TextRun({ text: "SMALL CAPS", smallCaps: true }), + new SymbolRun({ char: "2022", font: "Symbol" }), // Bullet • + new SymbolRun({ char: "00A9", font: "Arial" }) // Copyright © - Arial for symbols + ] +}) +``` + +## Styles & Professional Formatting + +```javascript +const doc = new Document({ + styles: { + default: { document: { run: { font: "Times New Roman", size: 24 } } }, // 12pt default (system font) + paragraphStyles: [ + // Document title style - override built-in Title style + { id: "Title", name: "Title", basedOn: "Normal", + run: { size: 56, bold: true, color: "000000", font: "Times New Roman" }, + paragraph: { spacing: { before: 240, after: 120 }, alignment: AlignmentType.CENTER } }, + // IMPORTANT: Override built-in heading styles by using their exact IDs + { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true, + run: { size: 32, bold: true, color: "000000", font: "Times New Roman" }, // 16pt + paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel enables TOC generation if needed + { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: 
"Normal", quickFormat: true, + run: { size: 28, bold: true, color: "000000", font: "Times New Roman" }, // 14pt + paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } }, + // Custom styles use your own IDs + { id: "myStyle", name: "My Style", basedOn: "Normal", + run: { size: 28, bold: true, color: "000000" }, + paragraph: { spacing: { after: 120 }, alignment: AlignmentType.CENTER } } + ], + characterStyles: [{ id: "myCharStyle", name: "My Char Style", + run: { color: "FF0000", bold: true, underline: { type: UnderlineType.SINGLE } } }] + }, + sections: [{ + properties: { page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } }, + children: [ + new Paragraph({ heading: HeadingLevel.TITLE, children: [new TextRun("Document Title")] }), // Uses overridden Title style + new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Heading 1")] }), // Uses overridden Heading1 style + new Paragraph({ style: "myStyle", children: [new TextRun("Custom paragraph style")] }), + new Paragraph({ children: [ + new TextRun("Normal with "), + new TextRun({ text: "custom char style", style: "myCharStyle" }) + ]}) + ] + }] +}); +``` + +**Font Management Strategy (CRITICAL):** + +**ALWAYS prioritize system-installed fonts** for reliability, performance, and cross-platform compatibility: + +1. **System fonts FIRST** (no download, immediate availability): + - English: **Times New Roman** (professional standard) + - Chinese: **SimSun/宋体** (formal document standard) + - Universal fallbacks: Arial, Calibri, Helvetica + +2. **Avoid custom font downloads** unless absolutely necessary for specific branding +3. 
**Test font availability** before deployment + +**Professional Font Combinations (System Fonts Only):** +- **Times New Roman (Headers) + Times New Roman (Body)** - Classic, professional, universally supported +- **Arial (Headers) + Arial (Body)** - Clean, modern, universally supported +- **Times New Roman (Headers) + Arial (Body)** - Classic serif headers with modern body + +**Chinese Document Font Guidelines (System Fonts):** +- **Body text**: Use **SimSun/宋体** - the standard system font for Chinese formal documents +- **Headings**: Use **SimHei/黑体** - bold sans-serif for visual hierarchy +- **Default size**: 12pt (size: 24) for body, 14-16pt for headings +- **CRITICAL**: SimSun for body text, SimHei ONLY for headings - never use SimHei for entire document + +```javascript +// English document style configuration (Times New Roman) +const doc = new Document({ + styles: { + default: { document: { run: { font: "Times New Roman", size: 24 } } }, // 12pt for body + paragraphStyles: [ + { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true, + run: { size: 32, bold: true, font: "Times New Roman" }, // 16pt for H1 + paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, + { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true, + run: { size: 28, bold: true, font: "Times New Roman" }, // 14pt for H2 + paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } } + ] + } +}); + +// Chinese document style configuration (SimSun/SimHei) +const doc = new Document({ + styles: { + default: { document: { run: { font: "SimSun", size: 24 } } }, // SimSun 12pt for body + paragraphStyles: [ + { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true, + run: { size: 32, bold: true, font: "SimHei" }, // SimHei 16pt for H1 + paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, + { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: 
"Normal", quickFormat: true, + run: { size: 28, bold: true, font: "SimHei" }, // SimHei 14pt for H2 + paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } } + ] + } +}); +``` + +**Key Styling Principles:** +- **ALWAYS use system-installed fonts** (Times New Roman for English, SimSun for Chinese) +- **Override built-in styles**: Use exact IDs like "Heading1", "Heading2", "Heading3" to override Word's built-in heading styles +- **HeadingLevel constants**: `HeadingLevel.HEADING_1` uses "Heading1" style, `HeadingLevel.HEADING_2` uses "Heading2" style, etc. +- **outlineLevel**: Set `outlineLevel: 0` for H1, `outlineLevel: 1` for H2, etc. (optional, only needed if TOC will be added) +- **Use custom styles** instead of inline formatting for consistency +- **Set a default font** using `styles.default.document.run.font` - Times New Roman for English, SimSun for Chinese +- **Establish visual hierarchy** with different font sizes (titles > headers > body) +- **Add proper spacing** with `before` and `after` paragraph spacing +- **Use colors sparingly**: Default to black (000000) and shades of gray for titles and headings (heading 1, heading 2, etc.) +- **Set consistent margins** (1440 = 1 inch is standard) + + +## Lists (ALWAYS USE PROPER LISTS - NEVER USE UNICODE BULLETS) + +### ⚠️ CRITICAL: Numbered List References - Read This Before Creating Lists! 
+ +**Each independently numbered list MUST use a UNIQUE reference name** + +**Rules**: +- Same `reference` = continues numbering (1,2,3 → 4,5,6) +- Different `reference` = restarts at 1 (1,2,3 → 1,2,3) + +**When to use a new reference?** +- ✓ Numbered lists under new headings/sections +- ✓ Any list that needs independent numbering +- ✗ Subsequent items of the same list (keep using same reference) + +**Reference naming suggestions**: +- `list-section-1`, `list-section-2`, `list-section-3` +- `list-chapter-1`, `list-chapter-2` +- `list-requirements`, `list-constraints` (name based on content) + +```javascript +// ❌ WRONG: All lists use the same reference +numbering: { + config: [ + { reference: "my-list", levels: [...] } // Only one config + ] +} +// Result: +// Chapter 1 +// 1. Item A +// 2. Item B +// Chapter 2 +// 3. Item C ← WRONG! Should start from 1 +// 4. Item D + +// ✅ CORRECT: Each list uses different reference +numbering: { + config: [ + { reference: "list-chapter-1", levels: [...] }, + { reference: "list-chapter-2", levels: [...] }, + { reference: "list-chapter-3", levels: [...] } + ] +} +// Result: +// Chapter 1 +// 1. Item A +// 2. Item B +// Chapter 2 +// 1. Item C ✓ CORRECT! Restarts from 1 +// 2. Item D +// Chapter 3 +// 1. Item E ✓ CORRECT! Restarts from 1 +// 2. 
Item F +``` + +### Basic List Syntax + +```javascript +// Bullets - ALWAYS use the numbering config, NOT unicode symbols +// CRITICAL: Use LevelFormat.BULLET constant, NOT the string "bullet" +const doc = new Document({ + numbering: { + config: [ + { reference: "bullet-list", + levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT, + style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] }, + { reference: "first-numbered-list", + levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT, + style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] }, + { reference: "second-numbered-list", // Different reference = restarts at 1 + levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT, + style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] } + ] + }, + sections: [{ + children: [ + // Bullet list items + new Paragraph({ numbering: { reference: "bullet-list", level: 0 }, + children: [new TextRun("First bullet point")] }), + new Paragraph({ numbering: { reference: "bullet-list", level: 0 }, + children: [new TextRun("Second bullet point")] }), + // Numbered list items + new Paragraph({ numbering: { reference: "first-numbered-list", level: 0 }, + children: [new TextRun("First numbered item")] }), + new Paragraph({ numbering: { reference: "first-numbered-list", level: 0 }, + children: [new TextRun("Second numbered item")] }), + // ⚠️ CRITICAL: Different reference = INDEPENDENT list that restarts at 1 + // Same reference = CONTINUES previous numbering + new Paragraph({ numbering: { reference: "second-numbered-list", level: 0 }, + children: [new TextRun("Starts at 1 again (because different reference)")] }) + ] + }] +}); + +// ⚠️ CRITICAL: NEVER use unicode bullets - they create fake lists that don't work properly +// new TextRun("• Item") // WRONG +// new SymbolRun({ char: "2022" }) // WRONG +// ✅ ALWAYS use numbering config with 
LevelFormat.BULLET for real Word lists +``` + +## Tables +```javascript +// Complete table with margins, borders, headers, and bullet points +const tableBorder = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" }; +const cellBorders = { top: tableBorder, bottom: tableBorder, left: tableBorder, right: tableBorder }; + +new Table({ + columnWidths: [4680, 4680], // ⚠️ CRITICAL: Set column widths at table level - values in DXA (twentieths of a point) + // ⚠️ MANDATORY: margins MUST be set to prevent text touching borders + margins: { top: 100, bottom: 100, left: 180, right: 180 }, // Minimum comfortable padding + rows: [ + new TableRow({ + tableHeader: true, + children: [ + new TableCell({ + borders: cellBorders, + width: { size: 4680, type: WidthType.DXA }, // ALSO set width on each cell + // ⚠️ CRITICAL: Always use ShadingType.CLEAR to prevent black backgrounds in Word. + shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, + verticalAlign: VerticalAlign.CENTER, + children: [new Paragraph({ + alignment: AlignmentType.CENTER, + children: [new TextRun({ text: "Header", bold: true, size: 22 })] + })] + }), + new TableCell({ + borders: cellBorders, + width: { size: 4680, type: WidthType.DXA }, // ALSO set width on each cell + shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, + children: [new Paragraph({ + alignment: AlignmentType.CENTER, + children: [new TextRun({ text: "Bullet Points", bold: true, size: 22 })] + })] + }) + ] + }), + new TableRow({ + children: [ + new TableCell({ + borders: cellBorders, + width: { size: 4680, type: WidthType.DXA }, // ALSO set width on each cell + children: [new Paragraph({ children: [new TextRun("Regular data")] })] + }), + new TableCell({ + borders: cellBorders, + width: { size: 4680, type: WidthType.DXA }, // ALSO set width on each cell + children: [ + new Paragraph({ + numbering: { reference: "bullet-list", level: 0 }, + children: [new TextRun("First bullet point")] + }), + new Paragraph({ + numbering: { reference: 
"bullet-list", level: 0 }, + children: [new TextRun("Second bullet point")] + }) + ] + }) + ] + }) + ] +}) +``` + +**IMPORTANT: Table Width & Borders** +- Use BOTH `columnWidths: [width1, width2, ...]` array AND `width: { size: X, type: WidthType.DXA }` on each cell +- Values in DXA (twentieths of a point): 1440 = 1 inch, Letter usable width = 9360 DXA (with 1" margins) +- Apply borders to individual `TableCell` elements, NOT the `Table` itself + +**Precomputed Column Widths (Letter size with 1" margins = 9360 DXA total):** +- **2 columns:** `columnWidths: [4680, 4680]` (equal width) +- **3 columns:** `columnWidths: [3120, 3120, 3120]` (equal width) + +## Links & Navigation +```javascript +// TOC example +// new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" }), +// +// CRITICAL: If adding TOC, use HeadingLevel only, NOT custom styles +// ❌ WRONG: new Paragraph({ heading: HeadingLevel.HEADING_1, style: "customHeader", children: [new TextRun("Title")] }) +// ✅ CORRECT: new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }) + +// REQUIRED: After generating the DOCX, add TOC placeholders for first-open experience +// Always run: python skills/docx/scripts/add_toc_placeholders.py document.docx --entries '[...]' +// This adds placeholder entries that appear before the user updates the TOC (modifies file in-place) +// Extract headings from your document to generate accurate entries + +// External link +new Paragraph({ + children: [new ExternalHyperlink({ + children: [new TextRun({ text: "Google", style: "Hyperlink" })], + link: "https://www.google.com" + })] +}), + +// Internal link & bookmark +new Paragraph({ + children: [new InternalHyperlink({ + children: [new TextRun({ text: "Go to Section", style: "Hyperlink" })], + anchor: "section1" + })] +}), +new Paragraph({ + children: [new TextRun("Section Content")], + bookmark: { id: "section1", name: "section1" } +}), + +``` + +Use `new Paragraph({ children: 
[new PageBreak()] })` at the start of the next section to ensure TOC is isolated. + +## Images & Media +```javascript +// Basic image with sizing & positioning +// CRITICAL: Always specify 'type' parameter - it's REQUIRED for ImageRun +new Paragraph({ + alignment: AlignmentType.CENTER, + children: [new ImageRun({ + type: "png", // NEW REQUIREMENT: Must specify image type (png, jpg, jpeg, gif, bmp, svg) + data: fs.readFileSync("image.png"), + transformation: { width: 200, height: 150, rotation: 0 }, // rotation in degrees + altText: { title: "Logo", description: "Company logo", name: "Name" } // IMPORTANT: All three fields are required + })] +}) +``` + +## Page Breaks +```javascript +// Manual page break +new Paragraph({ children: [new PageBreak()] }), + +// Page break before paragraph +new Paragraph({ + pageBreakBefore: true, + children: [new TextRun("This starts on a new page")] +}) + +// ⚠️ CRITICAL: NEVER use PageBreak standalone - it will create invalid XML that Word cannot open +// ❌ WRONG: new PageBreak() +// ✅ CORRECT: new Paragraph({ children: [new PageBreak()] }) +``` + +## Cover Page +**If the document has a cover page, the cover content should be centered both horizontally and vertically.** + +**Important notes for cover pages:** +- **Horizontal centering**: Use `alignment: AlignmentType.CENTER` on all cover page paragraphs +- **Vertical centering**: Use `spacing: { before: XXXX }` on elements to visually center content (adjust based on page height) +- **Separate section**: Create a dedicated section for the cover page to separate it from main content +- **Page break**: Use `new Paragraph({ children: [new PageBreak()] })` at the start of the next section to ensure cover is isolated + +## Headers/Footers & Page Setup +```javascript +const doc = new Document({ + sections: [{ + properties: { + page: { + margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 }, // 1440 = 1 inch + size: { orientation: PageOrientation.LANDSCAPE }, + pageNumbers: { start: 
1, formatType: "decimal" } // "upperRoman", "lowerRoman", "upperLetter", "lowerLetter" + } + }, + headers: { + default: new Header({ children: [new Paragraph({ + alignment: AlignmentType.RIGHT, + children: [new TextRun("Header Text")] + })] }) + }, + footers: { + default: new Footer({ children: [new Paragraph({ + alignment: AlignmentType.CENTER, + children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] }), new TextRun(" of "), new TextRun({ children: [PageNumber.TOTAL_PAGES] })] + })] }) + }, + children: [/* content */] + }] +}); +``` + +## Tabs +```javascript +new Paragraph({ + tabStops: [ + { type: TabStopType.LEFT, position: TabStopPosition.MAX / 4 }, + { type: TabStopType.CENTER, position: TabStopPosition.MAX / 2 }, + { type: TabStopType.RIGHT, position: TabStopPosition.MAX * 3 / 4 } + ], + children: [new TextRun("Left\tCenter\tRight")] +}) +``` + +## Constants & Quick Reference +- **Underlines:** `SINGLE`, `DOUBLE`, `WAVY`, `DASH` +- **Borders:** `SINGLE`, `DOUBLE`, `DASHED`, `DOTTED` +- **Numbering:** `DECIMAL` (1,2,3), `UPPER_ROMAN` (I,II,III), `LOWER_LETTER` (a,b,c) +- **Tabs:** `LEFT`, `CENTER`, `RIGHT`, `DECIMAL` +- **Symbols:** `"2022"` (•), `"00A9"` (©), `"00AE"` (®), `"2122"` (™), `"00B0"` (°), `"F070"` (✓), `"F0FC"` (✗) + +## Critical Issues & Common Mistakes +- **CRITICAL for cover pages**: If the document has a cover page, the cover content should be centered both horizontally (AlignmentType.CENTER) and vertically (use spacing.before to adjust) +- **CRITICAL: PageBreak must ALWAYS be inside a Paragraph** - standalone PageBreak creates invalid XML that Word cannot open +- **ALWAYS use ShadingType.CLEAR for table cell shading** - Never use ShadingType.SOLID (causes black background). 
+- Measurements in DXA (1440 = 1 inch) | Each table cell needs ≥1 Paragraph | If TOC is added, it requires HeadingLevel styles only +- **CRITICAL: ALWAYS use system-installed fonts** - Times New Roman for English, SimSun for Chinese - NEVER download custom fonts unless absolutely necessary +- **ALWAYS use custom styles** with appropriate system fonts for professional appearance and proper visual hierarchy +- **ALWAYS set a default font** using `styles.default.document.run.font` - **Times New Roman** for English, **SimSun** for Chinese +- **CRITICAL for Chinese documents**: Use SimSun for body text, SimHei ONLY for headings - NEVER use SimHei for entire document +- **CRITICAL for Chinese body text**: Add first-line indent with `indent: { firstLine: 480 }` (approximately 2 characters for 12pt font) +- **ALWAYS use columnWidths array for tables** + individual cell widths for compatibility +- **NEVER use unicode symbols for bullets** - always use proper numbering configuration with `LevelFormat.BULLET` constant (NOT the string "bullet") +- **NEVER use \n for line breaks anywhere** - always use separate Paragraph elements for each line +- **ALWAYS use TextRun objects within Paragraph children** - never use text property directly on Paragraph +- **CRITICAL for images**: ImageRun REQUIRES `type` parameter - always specify "png", "jpg", "jpeg", "gif", "bmp", or "svg" +- **CRITICAL for bullets**: Must use `LevelFormat.BULLET` constant, not string "bullet", and include `text: "•"` for the bullet character +- **CRITICAL for numbering**: Each numbering reference creates an INDEPENDENT list. Same reference = continues numbering (1,2,3 then 4,5,6). Different reference = restarts at 1 (1,2,3 then 1,2,3). Use unique reference names for each separate numbered section! +- **CRITICAL for TOC**: When using TableOfContents, headings must use HeadingLevel ONLY - do NOT add custom styles to heading paragraphs or TOC will break. 
+- **CRITICAL for Tables**: Set `columnWidths` array + individual cell widths, apply borders to cells not table +- **MANDATORY for Tables**: ALWAYS set `margins` at Table level - this prevents text from touching borders and is required for professional quality. NEVER omit this property. +- **Set table margins at TABLE level** for consistent cell padding (avoids repetition per cell) \ No newline at end of file diff --git a/skills/docx/ooxml.md b/skills/docx/ooxml.md new file mode 100755 index 0000000..47af881 --- /dev/null +++ b/skills/docx/ooxml.md @@ -0,0 +1,615 @@ +# Office Open XML Technical Reference + +**Important: Read this entire document before starting.** This document covers: +- [Technical Guidelines](#technical-guidelines) - Schema compliance rules and validation requirements +- [Document Content Patterns](#document-content-patterns) - XML patterns for headings, lists, tables, formatting, etc. +- [Document Library (Python)](#document-library-python) - Recommended approach for OOXML manipulation with automatic infrastructure setup +- [Tracked Changes (Redlining)](#tracked-changes-redlining) - XML patterns for implementing tracked changes + +## Technical Guidelines + +### Schema Compliance +- **Element ordering in ``**: ``, ``, ``, ``, `` +- **Whitespace**: Add `xml:space='preserve'` to `<w:t>` elements with leading/trailing spaces +- **Unicode**: In ASCII-encoded content, escape non-ASCII characters as numeric entities: `“` becomes `&#8220;` + - **Character encoding reference**: Curly quotes `“”` become `&#8220;&#8221;`, apostrophe `’` becomes `&#8217;`, em-dash `—` becomes `&#8212;` +- **Tracked changes**: Use `<w:del>` and `<w:ins>` tags with `w:author="GLM"` outside `<w:r>` elements + - **Critical**: `<w:del>` closes with `</w:del>`, `<w:ins>` closes with `</w:ins>` - never mix + - **RSIDs must be 8-digit hex**: Use values like `00AB1234` (only 0-9, A-F characters) + - **trackRevisions placement**: Add `<w:trackRevisions/>` after `` in settings.xml +- **Images**: Add to `word/media/`, reference in `document.xml`, set dimensions to prevent overflow + +## Document Content Patterns + +### Basic
Structure +```xml + + Text content + +``` + +### Headings and Styles +```xml + + + + + + Document Title + + + + + Section Heading + +``` + +### Text Formatting +```xml + +Bold + +Italic + +Underlined + +Highlighted +``` + +### Lists +```xml + + + + + + + + First item + + + + + + + + + + New list item 1 + + + + + + + + + + + Bullet item + +``` + +### Tables +```xml + + + + + + + + + + + + Cell 1 + + + + Cell 2 + + + +``` + +### Layout +```xml + + + + + + + + + + + + New Section Title + + + + + + + + + + Centered text + + + + + + + + Monospace text + + + + + + + This text is Courier New + + and this text uses default font + +``` + +## File Updates + +When adding content, update these files: + +**`word/_rels/document.xml.rels`:** +```xml + + +``` + +**`[Content_Types].xml`:** +```xml + + +``` + +### Images +**CRITICAL**: Calculate dimensions to prevent page overflow and maintain aspect ratio. + +```xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +### Links (Hyperlinks) + +**IMPORTANT**: All hyperlinks (both internal and external) require the Hyperlink style to be defined in styles.xml. Without this style, links will look like regular text instead of blue underlined clickable links. + +**External Links:** +```xml + + + + + Link Text + + + + + +``` + +**Internal Links:** + +```xml + + + + + Link Text + + + + + +Target content + +``` + +**Hyperlink Style (required in styles.xml):** +```xml + + + + + + + + + + +``` + +## Document Library (Python) + +Use the Document class from `scripts/document.py` for all tracked changes and comments. It automatically handles infrastructure setup (people.xml, RSIDs, settings.xml, comment files, relationships, content types). Only use direct XML manipulation for complex scenarios not supported by the library. 
+ +**Working with Unicode and Entities:** +- **Searching**: Both entity notation and Unicode characters work - `contains="&#8220;Company"` and `contains="\u201cCompany"` find the same text +- **Replacing**: Use either entities (`&#8220;`) or Unicode (`\u201c`) - both work and will be converted appropriately based on the file's encoding (ascii → entities, utf-8 → Unicode) + +### Initialization + +**Find the docx skill root** (directory containing `scripts/` and `ooxml/`): +```bash +# Search for document.py to locate the skill root +# Note: /mnt/skills is used here as an example; check your context for the actual location +find /mnt/skills -name "document.py" -path "*/docx/scripts/*" 2>/dev/null | head -1 +# Example output: /mnt/skills/docx/scripts/document.py +# Skill root is: /mnt/skills/docx +``` + +**Run your script with PYTHONPATH** set to the docx skill root: +```bash +PYTHONPATH=/mnt/skills/docx python your_script.py +``` + +**In your script**, import from the skill root: +```python +from scripts.document import Document, DocxXMLEditor + +# Basic initialization (automatically creates temp copy and sets up infrastructure) +doc = Document('unpacked') + +# Customize author and initials +doc = Document('unpacked', author="John Doe", initials="JD") + +# Enable track revisions mode +doc = Document('unpacked', track_revisions=True) + +# Specify custom RSID (auto-generated if not provided) +doc = Document('unpacked', rsid="07DC5ECB") +``` + +### Creating Tracked Changes + +**CRITICAL**: Only mark text that actually changes. Keep ALL unchanged text outside `<w:del>`/`<w:ins>` tags. Marking unchanged text makes edits unprofessional and harder to review. + +**Attribute Handling**: The Document class auto-injects attributes (w:id, w:date, w:rsidR, w:rsidDel, w16du:dateUtc, xml:space) into new elements. When preserving unchanged text from the original document, copy the original `<w:r>` element with its existing attributes to maintain document integrity.
+ +**Method Selection Guide**: +- **Adding your own changes to regular text**: Use `replace_node()` with `<w:del>`/`<w:ins>` tags, or `suggest_deletion()` for removing entire `<w:r>` or `<w:p>` elements +- **Partially modifying another author's tracked change**: Use `replace_node()` to nest your changes inside their `<w:ins>`/`<w:del>` +- **Completely rejecting another author's insertion**: Use `revert_insertion()` on the `<w:ins>` element (NOT `suggest_deletion()`) +- **Completely rejecting another author's deletion**: Use `revert_deletion()` on the `<w:del>` element to restore deleted content using tracked changes + +```python +# Minimal edit - change one word: "The report is monthly" → "The report is quarterly" +# Original: The report is monthly +node = doc["word/document.xml"].get_node(tag="w:r", contains="The report is monthly") +rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else "" +replacement = f'{rpr}The report is {rpr}monthly{rpr}quarterly' +doc["word/document.xml"].replace_node(node, replacement) + +# Minimal edit - change number: "within 30 days" → "within 45 days" +# Original: within 30 days +node = doc["word/document.xml"].get_node(tag="w:r", contains="within 30 days") +rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else "" +replacement = f'{rpr}within {rpr}30{rpr}45{rpr} days' +doc["word/document.xml"].replace_node(node, replacement) + +# Complete replacement - preserve formatting even when replacing all text +node = doc["word/document.xml"].get_node(tag="w:r", contains="apple") +rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else "" +replacement = f'{rpr}apple{rpr}banana orange' +doc["word/document.xml"].replace_node(node, replacement) + +# Insert new content (no attributes needed - auto-injected) +node = doc["word/document.xml"].get_node(tag="w:r", contains="existing text") +doc["word/document.xml"].insert_after(node, 'new text') + +# Partially delete another author's insertion +# Original: quarterly financial report +# Goal:
Delete only "financial" to make it "quarterly report" +node = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"}) +# IMPORTANT: Preserve w:author="Jane Smith" on the outer <w:ins> to maintain authorship +replacement = ''' + quarterly + financial + report +''' +doc["word/document.xml"].replace_node(node, replacement) + +# Change part of another author's insertion +# Original: in silence, safe and sound +# Goal: Change "safe and sound" to "soft and unbound" +node = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "8"}) +replacement = f''' + in silence, + + + soft and unbound + + + safe and sound +''' +doc["word/document.xml"].replace_node(node, replacement) + +# Delete entire run (use only when deleting all content; use replace_node for partial deletions) +node = doc["word/document.xml"].get_node(tag="w:r", contains="text to delete") +doc["word/document.xml"].suggest_deletion(node) + +# Delete entire paragraph (in-place, handles both regular and numbered list paragraphs) +para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph to delete") +doc["word/document.xml"].suggest_deletion(para) + +# Add new numbered list item +target_para = doc["word/document.xml"].get_node(tag="w:p", contains="existing list item") +pPr = tags[0].toxml() if (tags := target_para.getElementsByTagName("w:pPr")) else "" +new_item = f'{pPr}New item' +tracked_para = DocxXMLEditor.suggest_paragraph(new_item) +doc["word/document.xml"].insert_after(target_para, tracked_para) +# Optional: add spacing paragraph before content for better visual separation +# spacing = DocxXMLEditor.suggest_paragraph('') +# doc["word/document.xml"].insert_after(target_para, spacing + tracked_para) +``` + +### Adding Comments + +Comments are added with the author name "Z.ai" by default.
Initialize the Document with custom author if needed: + +```python +# Initialize with Z.ai as author (recommended) +doc = Document('unpacked', author="Z.ai", initials="Z") + +# Add comment spanning two existing tracked changes +# Note: w:id is auto-generated. Only search by w:id if you know it from XML inspection +start_node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"}) +end_node = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "2"}) +doc.add_comment(start=start_node, end=end_node, text="Explanation of this change") + +# Add comment on a paragraph +para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph text") +doc.add_comment(start=para, end=para, text="Comment on this paragraph") + +# Add comment on newly created tracked change +# First create the tracked change +node = doc["word/document.xml"].get_node(tag="w:r", contains="old") +new_nodes = doc["word/document.xml"].replace_node( + node, + 'oldnew' +) +# Then add comment on the newly created elements +# new_nodes[0] is the <w:del>, new_nodes[1] is the <w:ins> +doc.add_comment(start=new_nodes[0], end=new_nodes[1], text="Changed old to new per requirements") + +# Reply to existing comment +doc.reply_to_comment(parent_comment_id=0, text="I agree with this change") +``` + +### Rejecting Tracked Changes + +**IMPORTANT**: Use `revert_insertion()` to reject insertions and `revert_deletion()` to restore deletions using tracked changes. Use `suggest_deletion()` only for regular unmarked content.
+ +```python +# Reject insertion (wraps it in deletion) +# Use this when another author inserted text that you want to delete +ins = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"}) +nodes = doc["word/document.xml"].revert_insertion(ins) # Returns [ins] + +# Reject deletion (creates insertion to restore deleted content) +# Use this when another author deleted text that you want to restore +del_elem = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "3"}) +nodes = doc["word/document.xml"].revert_deletion(del_elem) # Returns [del_elem, new_ins] + +# Reject all insertions in a paragraph +para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph text") +nodes = doc["word/document.xml"].revert_insertion(para) # Returns [para] + +# Reject all deletions in a paragraph +para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph text") +nodes = doc["word/document.xml"].revert_deletion(para) # Returns [para] +``` + +### Inserting Images + +**CRITICAL**: The Document class works with a temporary copy at `doc.unpacked_path`. Always copy images to this temp directory, not the original unpacked folder. 
+ +```python +from PIL import Image +import shutil, os + +# Initialize document first +doc = Document('unpacked') + +# Copy image and calculate full-width dimensions with aspect ratio +media_dir = os.path.join(doc.unpacked_path, 'word/media') +os.makedirs(media_dir, exist_ok=True) +shutil.copy('image.png', os.path.join(media_dir, 'image1.png')) +img = Image.open(os.path.join(media_dir, 'image1.png')) +width_emus = int(6.5 * 914400) # 6.5" usable width, 914400 EMUs/inch +height_emus = int(width_emus * img.size[1] / img.size[0]) + +# Add relationship and content type +rels_editor = doc['word/_rels/document.xml.rels'] +next_rid = rels_editor.get_next_rid() +rels_editor.append_to(rels_editor.dom.documentElement, + f'') +doc['[Content_Types].xml'].append_to(doc['[Content_Types].xml'].dom.documentElement, + '') + +# Insert image +node = doc["word/document.xml"].get_node(tag="w:p", line_number=100) +doc["word/document.xml"].insert_after(node, f''' + + + + + + + + + + + + + + + + + +''') +``` + +### Getting Nodes + +```python +# By text content +node = doc["word/document.xml"].get_node(tag="w:p", contains="specific text") + +# By line range +para = doc["word/document.xml"].get_node(tag="w:p", line_number=range(100, 150)) + +# By attributes +node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"}) + +# By exact line number (must be line number where tag opens) +para = doc["word/document.xml"].get_node(tag="w:p", line_number=42) + +# Combine filters +node = doc["word/document.xml"].get_node(tag="w:r", line_number=range(40, 60), contains="text") + +# Disambiguate when text appears multiple times - add line_number range +node = doc["word/document.xml"].get_node(tag="w:r", contains="Section", line_number=range(2400, 2500)) +``` + +### Saving + +```python +# Save with automatic validation (copies back to original directory) +doc.save() # Validates by default, raises error if validation fails + +# Save to different location +doc.save('modified-unpacked') + +# 
Skip validation (debugging only - needing this in production indicates XML issues) +doc.save(validate=False) +``` + +### Direct DOM Manipulation + +For complex scenarios not covered by the library: + +```python +# Access any XML file +editor = doc["word/document.xml"] +editor = doc["word/comments.xml"] + +# Direct DOM access (defusedxml.minidom.Document) +node = doc["word/document.xml"].get_node(tag="w:p", line_number=5) +parent = node.parentNode +parent.removeChild(node) +parent.appendChild(node) # Move to end + +# General document manipulation (without tracked changes) +old_node = doc["word/document.xml"].get_node(tag="w:p", contains="original text") +doc["word/document.xml"].replace_node(old_node, "<w:p><w:r><w:t>replacement text</w:t></w:r></w:p>") + +# Multiple insertions - use return value to maintain order +node = doc["word/document.xml"].get_node(tag="w:r", line_number=100) +nodes = doc["word/document.xml"].insert_after(node, "<w:r><w:t>A</w:t></w:r>") +nodes = doc["word/document.xml"].insert_after(nodes[-1], "<w:r><w:t>B</w:t></w:r>") +nodes = doc["word/document.xml"].insert_after(nodes[-1], "<w:r><w:t>C</w:t></w:r>") +# Results in: original_node, A, B, C +``` + +## Tracked Changes (Redlining) + +**Use the Document class above for all tracked changes.** The patterns below are for reference when constructing replacement XML strings. + +### Validation Rules +The validator checks that the document text matches the original after reverting GLM's changes. This means: +- **NEVER modify text inside another author's `<w:ins>` or `<w:del>` tags** +- **ALWAYS use nested deletions** to remove another author's insertions +- **Every edit must be properly tracked** with `<w:ins>` or `<w:del>` tags + +### Tracked Change Patterns + +**CRITICAL RULES**: +1. Never modify the content inside another author's tracked changes. Always use nested deletions. +2. **XML Structure**: Always place `<w:ins>` and `<w:del>` at paragraph level, wrapping complete `<w:r>` elements. Never nest them inside `<w:r>` elements - this creates invalid XML that breaks document processing.
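Rule 2 above can be checked mechanically before saving. This is a minimal sketch using the standard-library `xml.etree.ElementTree`; the function name is hypothetical and it is not the validator that `doc.save()` runs. A mismatched close tag (`<w:ins>` closed by `</w:del>`) never reaches the check, because the parser itself rejects it with `ParseError`.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # ElementTree spells namespaced tags as "{uri}local"
TRACKED = {W + "ins": "w:ins", W + "del": "w:del"}


def misplaced_tracked_changes(document_xml: str) -> list[str]:
    """Hypothetical check: report <w:ins>/<w:del> found anywhere inside a <w:r>.

    ET.fromstring raises ET.ParseError on mismatched tags such as <w:ins>...</w:del>.
    """
    root = ET.fromstring(document_xml)
    problems = []
    for run in root.iter(W + "r"):
        for node in run.iter():  # the run itself plus all of its descendants
            if node.tag in TRACKED:
                problems.append(f"{TRACKED[node.tag]} nested inside w:r")
    return problems
```

An empty list means every tracked change sits at paragraph level around complete runs, i.e. the `<w:p><w:ins ...><w:r>...</w:r></w:ins></w:p>` shape the rule asks for.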
+ +**Text Insertion:** +```xml + + + inserted text + + +``` + +**Text Deletion:** +```xml + + + deleted text + + +``` + +**Deleting Another Author's Insertion (MUST use nested structure):** +```xml + + + + monthly + + + + weekly + +``` + +**Restoring Another Author's Deletion:** +```xml + + + within 30 days + + + within 30 days + +``` \ No newline at end of file diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd new file mode 100755 index 0000000..6454ef9 --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd @@ -0,0 +1,1499 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd new file mode 100755 index 0000000..afa4f46 --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd @@ -0,0 +1,146 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd new file mode 100755 index 0000000..64e66b8 --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd @@ -0,0 +1,1085 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd new file mode 100755 index 0000000..687eea8 --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd @@ -0,0 +1,11 @@ + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd new file mode 100755 index 0000000..6ac81b0 --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd @@ -0,0 +1,3081 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd new file mode 100755 index 0000000..1dbf051 --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd @@ -0,0 +1,23 @@ + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd new file mode 100755 index 0000000..f1af17d --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd @@ -0,0 +1,185 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd new file mode 100755 index 0000000..0a185ab --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd @@ -0,0 +1,287 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd new file mode 100755 index 0000000..14ef488 --- /dev/null +++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd @@ -0,0 +1,1676 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd
new file mode 100755
index 0000000..c20f3bf
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd
@@ -0,0 +1,28 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd
new file mode 100755
index 0000000..ac60252
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd
@@ -0,0 +1,144 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd
new file mode 100755
index 0000000..424b8ba
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd
@@ -0,0 +1,174 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd
new file mode 100755
index 0000000..2bddce2
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd
@@ -0,0 +1,25 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd
new file mode 100755
index 0000000..8a8c18b
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd
@@ -0,0 +1,18 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd
new file mode 100755
index 0000000..5c42706
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd
@@ -0,0 +1,59 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd
new file mode 100755
index 0000000..853c341
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd
@@ -0,0 +1,56 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd
new file mode 100755
index 0000000..da835ee
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd
@@ -0,0 +1,195 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd
new file mode 100755
index 0000000..87ad265
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd
@@ -0,0 +1,582 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd
new file mode 100755
index 0000000..9e86f1b
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd
@@ -0,0 +1,25 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd
new file mode 100755
index 0000000..d0be42e
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd
@@ -0,0 +1,4439 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd
new file mode 100755
index 0000000..8821dd1
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd
@@ -0,0 +1,570 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd
new file mode 100755
index 0000000..ca2575c
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd
@@ -0,0 +1,509 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd
new file mode 100755
index 0000000..dd079e6
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd
@@ -0,0 +1,12 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd
new file mode 100755
index 0000000..3dd6cf6
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd
@@ -0,0 +1,108 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd
new file mode 100755
index 0000000..f1041e3
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd
@@ -0,0 +1,96 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd
new file mode 100755
index 0000000..9c5b7a6
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd
@@ -0,0 +1,3646 @@
diff --git a/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd
new file mode 100755
index 0000000..0f13678
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd
@@ -0,0 +1,116 @@
+   See http://www.w3.org/XML/1998/namespace.html and
+   http://www.w3.org/TR/REC-xml for information about this namespace.
+
+   This schema document describes the XML namespace, in a form
+   suitable for import by other schema documents.
+
+   Note that local names in this namespace are intended to be defined
+   only by the World Wide Web Consortium or its subgroups. The
+   following names are currently defined in this namespace and should
+   not be used with conflicting semantics by any Working Group,
+   specification, or document instance:
+
+   base (as an attribute name): denotes an attribute whose value
+   provides a URI to be used as the base for interpreting any
+   relative URIs in the scope of the element on which it
+   appears; its value is inherited. This name is reserved
+   by virtue of its definition in the XML Base specification.
+
+   lang (as an attribute name): denotes an attribute whose value
+   is a language code for the natural language of the content of
+   any element; its value is inherited. This name is reserved
+   by virtue of its definition in the XML specification.
+
+   space (as an attribute name): denotes an attribute whose
+   value is a keyword indicating what whitespace processing
+   discipline is intended for the content of the element; its
+   value is inherited. This name is reserved by virtue of its
+   definition in the XML specification.
+
+   Father (in any context at all): denotes Jon Bosak, the chair of
+   the original XML Working Group. This name is reserved by
+   the following decision of the W3C XML Plenary and
+   XML Coordination groups:
+
+   In appreciation for his vision, leadership and dedication
+   the W3C XML Plenary on this 10th day of February, 2000
+   reserves for Jon Bosak in perpetuity the XML name
+   xml:Father
+
+   This schema defines attributes and an attribute group
+   suitable for use by
+   schemas wishing to allow xml:base, xml:lang or xml:space attributes
+   on elements they define.
+
+   To enable this, such a schema must import this schema
+   for the XML namespace, e.g. as follows:
+   <schema . . .>
+   . . .
+   <import namespace="http://www.w3.org/XML/1998/namespace"
+   schemaLocation="http://www.w3.org/2001/03/xml.xsd"/>
+
+   Subsequently, qualified reference to any of the attributes
+   or the group defined below will have the desired effect, e.g.
+
+   <type . . .>
+   . . .
+   <attributeGroup ref="xml:specialAttrs"/>
+
+   will define a type which will schema-validate an instance
+   element with any of those attributes
+
+   In keeping with the XML Schema WG's standard versioning
+   policy, this schema document will persist at
+   http://www.w3.org/2001/03/xml.xsd.
+   At the date of issue it can also be found at
+   http://www.w3.org/2001/xml.xsd.
+   The schema document at that URI may however change in the future,
+   in order to remain compatible with the latest version of XML Schema
+   itself. In other words, if the XML Schema namespace changes, the version
+   of this document at
+   http://www.w3.org/2001/xml.xsd will change
+   accordingly; the version at
+   http://www.w3.org/2001/03/xml.xsd will not change.
+
+   In due course, we should install the relevant ISO 2- and 3-letter
+   codes as the enumerated possible values . . .
+
+   See http://www.w3.org/TR/xmlbase/ for
+   information about this attribute.
diff --git a/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd
new file mode 100755
index 0000000..a6de9d2
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd
@@ -0,0 +1,42 @@
diff --git a/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd
new file mode 100755
index 0000000..10e978b
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd
@@ -0,0 +1,50 @@
diff --git a/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd
new file mode 100755
index 0000000..4248bf7
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd
@@ -0,0 +1,49 @@
diff --git a/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd
new file mode 100755
index 0000000..5649746
--- /dev/null
+++ b/skills/docx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd
@@ -0,0 +1,33 @@
diff --git a/skills/docx/ooxml/schemas/mce/mc.xsd b/skills/docx/ooxml/schemas/mce/mc.xsd
new file mode 100755
index 0000000..ef72545
--- /dev/null
+++ b/skills/docx/ooxml/schemas/mce/mc.xsd
@@ -0,0 +1,75 @@
diff --git a/skills/docx/ooxml/schemas/microsoft/wml-2010.xsd b/skills/docx/ooxml/schemas/microsoft/wml-2010.xsd
new file mode 100755
index 0000000..f65f777
--- /dev/null
+++ b/skills/docx/ooxml/schemas/microsoft/wml-2010.xsd
@@ -0,0 +1,560 @@
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/microsoft/wml-2012.xsd b/skills/docx/ooxml/schemas/microsoft/wml-2012.xsd new file mode 100755 index 0000000..6b00755 --- /dev/null +++ b/skills/docx/ooxml/schemas/microsoft/wml-2012.xsd @@ -0,0 +1,67 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/microsoft/wml-2018.xsd b/skills/docx/ooxml/schemas/microsoft/wml-2018.xsd new file mode 100755 index 0000000..f321d33 --- /dev/null +++ b/skills/docx/ooxml/schemas/microsoft/wml-2018.xsd @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/microsoft/wml-cex-2018.xsd b/skills/docx/ooxml/schemas/microsoft/wml-cex-2018.xsd new file mode 100755 index 0000000..364c6a9 --- /dev/null +++ b/skills/docx/ooxml/schemas/microsoft/wml-cex-2018.xsd @@ -0,0 +1,20 @@ + + + + + + + + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/microsoft/wml-cid-2016.xsd b/skills/docx/ooxml/schemas/microsoft/wml-cid-2016.xsd new file mode 100755 index 0000000..fed9d15 --- /dev/null +++ b/skills/docx/ooxml/schemas/microsoft/wml-cid-2016.xsd @@ -0,0 +1,13 @@ + + + + + + + + + + + + + diff --git a/skills/docx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd b/skills/docx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd new file mode 100755 index 0000000..680cf15 --- /dev/null +++ b/skills/docx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd @@ -0,0 +1,4 @@ + + + + diff --git a/skills/docx/ooxml/schemas/microsoft/wml-symex-2015.xsd b/skills/docx/ooxml/schemas/microsoft/wml-symex-2015.xsd new file mode 100755 index 0000000..89ada90 --- /dev/null +++ b/skills/docx/ooxml/schemas/microsoft/wml-symex-2015.xsd @@ -0,0 +1,8 @@ + + + + + + + + diff --git 
a/skills/docx/ooxml/scripts/pack.py b/skills/docx/ooxml/scripts/pack.py new file mode 100755 index 0000000..68bc088 --- /dev/null +++ b/skills/docx/ooxml/scripts/pack.py @@ -0,0 +1,159 @@ +#!/usr/bin/env python3 +""" +Tool to pack a directory into a .docx, .pptx, or .xlsx file with XML formatting undone. + +Example usage: + python pack.py [--force] +""" + +import argparse +import shutil +import subprocess +import sys +import tempfile +import defusedxml.minidom +import zipfile +from pathlib import Path + + +def main(): + parser = argparse.ArgumentParser(description="Pack a directory into an Office file") + parser.add_argument("input_directory", help="Unpacked Office document directory") + parser.add_argument("output_file", help="Output Office file (.docx/.pptx/.xlsx)") + parser.add_argument("--force", action="store_true", help="Skip validation") + args = parser.parse_args() + + try: + success = pack_document( + args.input_directory, args.output_file, validate=not args.force + ) + + # Show warning if validation was skipped + if args.force: + print("Warning: Skipped validation, file may be corrupt", file=sys.stderr) + # Exit with error if validation failed + elif not success: + print("Contents would produce a corrupt file.", file=sys.stderr) + print("Please validate XML before repacking.", file=sys.stderr) + print("Use --force to skip validation and pack anyway.", file=sys.stderr) + sys.exit(1) + + except ValueError as e: + sys.exit(f"Error: {e}") + + +def pack_document(input_dir, output_file, validate=False): + """Pack a directory into an Office file (.docx/.pptx/.xlsx). 
+ + Args: + input_dir: Path to unpacked Office document directory + output_file: Path to output Office file + validate: If True, validates with soffice (default: False) + + Returns: + bool: True if successful, False if validation failed + """ + input_dir = Path(input_dir) + output_file = Path(output_file) + + if not input_dir.is_dir(): + raise ValueError(f"{input_dir} is not a directory") + if output_file.suffix.lower() not in {".docx", ".pptx", ".xlsx"}: + raise ValueError(f"{output_file} must be a .docx, .pptx, or .xlsx file") + + # Work in temporary directory to avoid modifying original + with tempfile.TemporaryDirectory() as temp_dir: + temp_content_dir = Path(temp_dir) / "content" + shutil.copytree(input_dir, temp_content_dir) + + # Process XML files to remove pretty-printing whitespace + for pattern in ["*.xml", "*.rels"]: + for xml_file in temp_content_dir.rglob(pattern): + condense_xml(xml_file) + + # Create final Office file as zip archive + output_file.parent.mkdir(parents=True, exist_ok=True) + with zipfile.ZipFile(output_file, "w", zipfile.ZIP_DEFLATED) as zf: + for f in temp_content_dir.rglob("*"): + if f.is_file(): + zf.write(f, f.relative_to(temp_content_dir)) + + # Validate if requested + if validate: + if not validate_document(output_file): + output_file.unlink() # Delete the corrupt file + return False + + return True + + +def validate_document(doc_path): + """Validate document by converting to HTML with soffice.""" + # Determine the correct filter based on file extension + match doc_path.suffix.lower(): + case ".docx": + filter_name = "html:HTML" + case ".pptx": + filter_name = "html:impress_html_Export" + case ".xlsx": + filter_name = "html:HTML (StarCalc)" + + with tempfile.TemporaryDirectory() as temp_dir: + try: + result = subprocess.run( + [ + "soffice", + "--headless", + "--convert-to", + filter_name, + "--outdir", + temp_dir, + str(doc_path), + ], + capture_output=True, + timeout=10, + text=True, + ) + if not (Path(temp_dir) / 
f"{doc_path.stem}.html").exists(): + error_msg = result.stderr.strip() or "Document validation failed" + print(f"Validation error: {error_msg}", file=sys.stderr) + return False + return True + except FileNotFoundError: + print("Warning: soffice not found. Skipping validation.", file=sys.stderr) + return True + except subprocess.TimeoutExpired: + print("Validation error: Timeout during conversion", file=sys.stderr) + return False + except Exception as e: + print(f"Validation error: {e}", file=sys.stderr) + return False + + +def condense_xml(xml_file): + """Strip unnecessary whitespace and remove comments.""" + with open(xml_file, "r", encoding="utf-8") as f: + dom = defusedxml.minidom.parse(f) + + # Process each element to remove whitespace and comments + for element in dom.getElementsByTagName("*"): + # Skip w:t elements and their processing + if element.tagName.endswith(":t"): + continue + + # Remove whitespace-only text nodes and comment nodes + for child in list(element.childNodes): + if ( + child.nodeType == child.TEXT_NODE + and child.nodeValue + and child.nodeValue.strip() == "" + ) or child.nodeType == child.COMMENT_NODE: + element.removeChild(child) + + # Write back the condensed XML + with open(xml_file, "wb") as f: + f.write(dom.toxml(encoding="UTF-8")) + + +if __name__ == "__main__": + main() diff --git a/skills/docx/ooxml/scripts/unpack.py b/skills/docx/ooxml/scripts/unpack.py new file mode 100755 index 0000000..4938798 --- /dev/null +++ b/skills/docx/ooxml/scripts/unpack.py @@ -0,0 +1,29 @@ +#!/usr/bin/env python3 +"""Unpack and format XML contents of Office files (.docx, .pptx, .xlsx)""" + +import random +import sys +import defusedxml.minidom +import zipfile +from pathlib import Path + +# Get command line arguments +assert len(sys.argv) == 3, "Usage: python unpack.py " +input_file, output_dir = sys.argv[1], sys.argv[2] + +# Extract and format +output_path = Path(output_dir) +output_path.mkdir(parents=True, exist_ok=True) 
+zipfile.ZipFile(input_file).extractall(output_path) + +# Pretty print all XML files +xml_files = list(output_path.rglob("*.xml")) + list(output_path.rglob("*.rels")) +for xml_file in xml_files: + content = xml_file.read_text(encoding="utf-8") + dom = defusedxml.minidom.parseString(content) + xml_file.write_bytes(dom.toprettyxml(indent=" ", encoding="ascii")) + +# For .docx files, suggest an RSID for tracked changes +if input_file.endswith(".docx"): + suggested_rsid = "".join(random.choices("0123456789ABCDEF", k=8)) + print(f"Suggested RSID for edit session: {suggested_rsid}") diff --git a/skills/docx/ooxml/scripts/validate.py b/skills/docx/ooxml/scripts/validate.py new file mode 100755 index 0000000..508c589 --- /dev/null +++ b/skills/docx/ooxml/scripts/validate.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python3 +""" +Command line tool to validate Office document XML files against XSD schemas and tracked changes. + +Usage: + python validate.py --original +""" + +import argparse +import sys +from pathlib import Path + +from validation import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator + + +def main(): + parser = argparse.ArgumentParser(description="Validate Office document XML files") + parser.add_argument( + "unpacked_dir", + help="Path to unpacked Office document directory", + ) + parser.add_argument( + "--original", + required=True, + help="Path to original file (.docx/.pptx/.xlsx)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Enable verbose output", + ) + args = parser.parse_args() + + # Validate paths + unpacked_dir = Path(args.unpacked_dir) + original_file = Path(args.original) + file_extension = original_file.suffix.lower() + assert unpacked_dir.is_dir(), f"Error: {unpacked_dir} is not a directory" + assert original_file.is_file(), f"Error: {original_file} is not a file" + assert file_extension in [".docx", ".pptx", ".xlsx"], ( + f"Error: {original_file} must be a .docx, .pptx, or .xlsx file" + ) + + # Run 
validations + match file_extension: + case ".docx": + validators = [DOCXSchemaValidator, RedliningValidator] + case ".pptx": + validators = [PPTXSchemaValidator] + case _: + print(f"Error: Validation not supported for file type {file_extension}") + sys.exit(1) + + # Run validators + success = True + for V in validators: + validator = V(unpacked_dir, original_file, verbose=args.verbose) + if not validator.validate(): + success = False + + if success: + print("All validations PASSED!") + + sys.exit(0 if success else 1) + + +if __name__ == "__main__": + main() diff --git a/skills/docx/ooxml/scripts/validation/__init__.py b/skills/docx/ooxml/scripts/validation/__init__.py new file mode 100755 index 0000000..db092ec --- /dev/null +++ b/skills/docx/ooxml/scripts/validation/__init__.py @@ -0,0 +1,15 @@ +""" +Validation modules for Word document processing. +""" + +from .base import BaseSchemaValidator +from .docx import DOCXSchemaValidator +from .pptx import PPTXSchemaValidator +from .redlining import RedliningValidator + +__all__ = [ + "BaseSchemaValidator", + "DOCXSchemaValidator", + "PPTXSchemaValidator", + "RedliningValidator", +] diff --git a/skills/docx/ooxml/scripts/validation/base.py b/skills/docx/ooxml/scripts/validation/base.py new file mode 100755 index 0000000..0681b19 --- /dev/null +++ b/skills/docx/ooxml/scripts/validation/base.py @@ -0,0 +1,951 @@ +""" +Base validator with common validation logic for document files. 
+""" + +import re +from pathlib import Path + +import lxml.etree + + +class BaseSchemaValidator: + """Base validator with common validation logic for document files.""" + + # Elements whose 'id' attributes must be unique within their file + # Format: element_name -> (attribute_name, scope) + # scope can be 'file' (unique within file) or 'global' (unique across all files) + UNIQUE_ID_REQUIREMENTS = { + # Word elements + "comment": ("id", "file"), # Comment IDs in comments.xml + "commentrangestart": ("id", "file"), # Must match comment IDs + "commentrangeend": ("id", "file"), # Must match comment IDs + "bookmarkstart": ("id", "file"), # Bookmark start IDs + "bookmarkend": ("id", "file"), # Bookmark end IDs + # Note: ins and del (track changes) can share IDs when part of same revision + # PowerPoint elements + "sldid": ("id", "file"), # Slide IDs in presentation.xml + "sldmasterid": ("id", "global"), # Slide master IDs must be globally unique + "sldlayoutid": ("id", "global"), # Slide layout IDs must be globally unique + "cm": ("authorid", "file"), # Comment author IDs + # Excel elements + "sheet": ("sheetid", "file"), # Sheet IDs in workbook.xml + "definedname": ("id", "file"), # Named range IDs + # Drawing/Shape elements (all formats) + "cxnsp": ("id", "file"), # Connection shape IDs + "sp": ("id", "file"), # Shape IDs + "pic": ("id", "file"), # Picture IDs + "grpsp": ("id", "file"), # Group shape IDs + } + + # Mapping of element names to expected relationship types + # Subclasses should override this with format-specific mappings + ELEMENT_RELATIONSHIP_TYPES = {} + + # Unified schema mappings for all Office document types + SCHEMA_MAPPINGS = { + # Document type specific schemas + "word": "ISO-IEC29500-4_2016/wml.xsd", # Word documents + "ppt": "ISO-IEC29500-4_2016/pml.xsd", # PowerPoint presentations + "xl": "ISO-IEC29500-4_2016/sml.xsd", # Excel spreadsheets + # Common file types + "[Content_Types].xml": "ecma/fouth-edition/opc-contentTypes.xsd", + "app.xml": 
"ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd", + "core.xml": "ecma/fouth-edition/opc-coreProperties.xsd", + "custom.xml": "ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd", + ".rels": "ecma/fouth-edition/opc-relationships.xsd", + # Word-specific files + "people.xml": "microsoft/wml-2012.xsd", + "commentsIds.xml": "microsoft/wml-cid-2016.xsd", + "commentsExtensible.xml": "microsoft/wml-cex-2018.xsd", + "commentsExtended.xml": "microsoft/wml-2012.xsd", + # Chart files (common across document types) + "chart": "ISO-IEC29500-4_2016/dml-chart.xsd", + # Theme files (common across document types) + "theme": "ISO-IEC29500-4_2016/dml-main.xsd", + # Drawing and media files + "drawing": "ISO-IEC29500-4_2016/dml-main.xsd", + } + + # Unified namespace constants + MC_NAMESPACE = "http://schemas.openxmlformats.org/markup-compatibility/2006" + XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace" + + # Common OOXML namespaces used across validators + PACKAGE_RELATIONSHIPS_NAMESPACE = ( + "http://schemas.openxmlformats.org/package/2006/relationships" + ) + OFFICE_RELATIONSHIPS_NAMESPACE = ( + "http://schemas.openxmlformats.org/officeDocument/2006/relationships" + ) + CONTENT_TYPES_NAMESPACE = ( + "http://schemas.openxmlformats.org/package/2006/content-types" + ) + + # Folders where we should clean ignorable namespaces + MAIN_CONTENT_FOLDERS = {"word", "ppt", "xl"} + + # All allowed OOXML namespaces (superset of all document types) + OOXML_NAMESPACES = { + "http://schemas.openxmlformats.org/officeDocument/2006/math", + "http://schemas.openxmlformats.org/officeDocument/2006/relationships", + "http://schemas.openxmlformats.org/schemaLibrary/2006/main", + "http://schemas.openxmlformats.org/drawingml/2006/main", + "http://schemas.openxmlformats.org/drawingml/2006/chart", + "http://schemas.openxmlformats.org/drawingml/2006/chartDrawing", + "http://schemas.openxmlformats.org/drawingml/2006/diagram", + "http://schemas.openxmlformats.org/drawingml/2006/picture", + 
"http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing", + "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", + "http://schemas.openxmlformats.org/wordprocessingml/2006/main", + "http://schemas.openxmlformats.org/presentationml/2006/main", + "http://schemas.openxmlformats.org/spreadsheetml/2006/main", + "http://schemas.openxmlformats.org/officeDocument/2006/sharedTypes", + "http://www.w3.org/XML/1998/namespace", + } + + def __init__(self, unpacked_dir, original_file, verbose=False): + self.unpacked_dir = Path(unpacked_dir).resolve() + self.original_file = Path(original_file) + self.verbose = verbose + + # Set schemas directory + self.schemas_dir = Path(__file__).parent.parent.parent / "schemas" + + # Get all XML and .rels files + patterns = ["*.xml", "*.rels"] + self.xml_files = [ + f for pattern in patterns for f in self.unpacked_dir.rglob(pattern) + ] + + if not self.xml_files: + print(f"Warning: No XML files found in {self.unpacked_dir}") + + def validate(self): + """Run all validation checks and return True if all pass.""" + raise NotImplementedError("Subclasses must implement the validate method") + + def validate_xml(self): + """Validate that all XML files are well-formed.""" + errors = [] + + for xml_file in self.xml_files: + try: + # Try to parse the XML file + lxml.etree.parse(str(xml_file)) + except lxml.etree.XMLSyntaxError as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {e.lineno}: {e.msg}" + ) + except Exception as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Unexpected error: {str(e)}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} XML violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All XML files are well-formed") + return True + + def validate_namespaces(self): + """Validate that namespace prefixes in Ignorable attributes are declared.""" + errors = [] + + for xml_file in 
self.xml_files: + try: + root = lxml.etree.parse(str(xml_file)).getroot() + declared = set(root.nsmap.keys()) - {None} # Exclude default namespace + + for attr_val in [ + v for k, v in root.attrib.items() if k.endswith("Ignorable") + ]: + undeclared = set(attr_val.split()) - declared + errors.extend( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Namespace '{ns}' in Ignorable but not declared" + for ns in undeclared + ) + except lxml.etree.XMLSyntaxError: + continue + + if errors: + print(f"FAILED - {len(errors)} namespace issues:") + for error in errors: + print(error) + return False + if self.verbose: + print("PASSED - All namespace prefixes properly declared") + return True + + def validate_unique_ids(self): + """Validate that specific IDs are unique according to OOXML requirements.""" + errors = [] + global_ids = {} # Track globally unique IDs across all files + + for xml_file in self.xml_files: + try: + root = lxml.etree.parse(str(xml_file)).getroot() + file_ids = {} # Track IDs that must be unique within this file + + # Remove all mc:AlternateContent elements from the tree + mc_elements = root.xpath( + ".//mc:AlternateContent", namespaces={"mc": self.MC_NAMESPACE} + ) + for elem in mc_elements: + elem.getparent().remove(elem) + + # Now check IDs in the cleaned tree + for elem in root.iter(): + # Get the element name without namespace + tag = ( + elem.tag.split("}")[-1].lower() + if "}" in elem.tag + else elem.tag.lower() + ) + + # Check if this element type has ID uniqueness requirements + if tag in self.UNIQUE_ID_REQUIREMENTS: + attr_name, scope = self.UNIQUE_ID_REQUIREMENTS[tag] + + # Look for the specified attribute + id_value = None + for attr, value in elem.attrib.items(): + attr_local = ( + attr.split("}")[-1].lower() + if "}" in attr + else attr.lower() + ) + if attr_local == attr_name: + id_value = value + break + + if id_value is not None: + if scope == "global": + # Check global uniqueness + if id_value in global_ids: + prev_file, prev_line, 
prev_tag = global_ids[ + id_value + ] + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: Global ID '{id_value}' in <{tag}> " + f"already used in {prev_file} at line {prev_line} in <{prev_tag}>" + ) + else: + global_ids[id_value] = ( + xml_file.relative_to(self.unpacked_dir), + elem.sourceline, + tag, + ) + elif scope == "file": + # Check file-level uniqueness + key = (tag, attr_name) + if key not in file_ids: + file_ids[key] = {} + + if id_value in file_ids[key]: + prev_line = file_ids[key][id_value] + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: Duplicate {attr_name}='{id_value}' in <{tag}> " + f"(first occurrence at line {prev_line})" + ) + else: + file_ids[key][id_value] = elem.sourceline + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} ID uniqueness violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All required IDs are unique") + return True + + def validate_file_references(self): + """ + Validate that all .rels files properly reference files and that all files are referenced. 
+ """ + errors = [] + + # Find all .rels files + rels_files = list(self.unpacked_dir.rglob("*.rels")) + + if not rels_files: + if self.verbose: + print("PASSED - No .rels files found") + return True + + # Get all files in the unpacked directory (excluding reference files) + all_files = [] + for file_path in self.unpacked_dir.rglob("*"): + if ( + file_path.is_file() + and file_path.name != "[Content_Types].xml" + and not file_path.name.endswith(".rels") + ): # This file is not referenced by .rels + all_files.append(file_path.resolve()) + + # Track all files that are referenced by any .rels file + all_referenced_files = set() + + if self.verbose: + print( + f"Found {len(rels_files)} .rels files and {len(all_files)} target files" + ) + + # Check each .rels file + for rels_file in rels_files: + try: + # Parse relationships file + rels_root = lxml.etree.parse(str(rels_file)).getroot() + + # Get the directory where this .rels file is located + rels_dir = rels_file.parent + + # Find all relationships and their targets + referenced_files = set() + broken_refs = [] + + for rel in rels_root.findall( + ".//ns:Relationship", + namespaces={"ns": self.PACKAGE_RELATIONSHIPS_NAMESPACE}, + ): + target = rel.get("Target") + if target and not target.startswith( + ("http", "mailto:") + ): # Skip external URLs + # Resolve the target path relative to the .rels file location + if rels_file.name == ".rels": + # Root .rels file - targets are relative to unpacked_dir + target_path = self.unpacked_dir / target + else: + # Other .rels files - targets are relative to their parent's parent + # e.g., word/_rels/document.xml.rels -> targets relative to word/ + base_dir = rels_dir.parent + target_path = base_dir / target + + # Normalize the path and check if it exists + try: + target_path = target_path.resolve() + if target_path.exists() and target_path.is_file(): + referenced_files.add(target_path) + all_referenced_files.add(target_path) + else: + broken_refs.append((target, rel.sourceline)) + 
except (OSError, ValueError): + broken_refs.append((target, rel.sourceline)) + + # Report broken references + if broken_refs: + rel_path = rels_file.relative_to(self.unpacked_dir) + for broken_ref, line_num in broken_refs: + errors.append( + f" {rel_path}: Line {line_num}: Broken reference to {broken_ref}" + ) + + except Exception as e: + rel_path = rels_file.relative_to(self.unpacked_dir) + errors.append(f" Error parsing {rel_path}: {e}") + + # Check for unreferenced files (files that exist but are not referenced anywhere) + unreferenced_files = set(all_files) - all_referenced_files + + if unreferenced_files: + for unref_file in sorted(unreferenced_files): + unref_rel_path = unref_file.relative_to(self.unpacked_dir) + errors.append(f" Unreferenced file: {unref_rel_path}") + + if errors: + print(f"FAILED - Found {len(errors)} relationship validation errors:") + for error in errors: + print(error) + print( + "CRITICAL: These errors will cause the document to appear corrupt. " + + "Broken references MUST be fixed, " + + "and unreferenced files MUST be referenced or removed." + ) + return False + else: + if self.verbose: + print( + "PASSED - All references are valid and all files are properly referenced" + ) + return True + + def validate_all_relationship_ids(self): + """ + Validate that all r:id attributes in XML files reference existing IDs + in their corresponding .rels files, and optionally validate relationship types. 
+ """ + import lxml.etree + + errors = [] + + # Process each XML file that might contain r:id references + for xml_file in self.xml_files: + # Skip .rels files themselves + if xml_file.suffix == ".rels": + continue + + # Determine the corresponding .rels file + # For dir/file.xml, it's dir/_rels/file.xml.rels + rels_dir = xml_file.parent / "_rels" + rels_file = rels_dir / f"{xml_file.name}.rels" + + # Skip if there's no corresponding .rels file (that's okay) + if not rels_file.exists(): + continue + + try: + # Parse the .rels file to get valid relationship IDs and their types + rels_root = lxml.etree.parse(str(rels_file)).getroot() + rid_to_type = {} + + for rel in rels_root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ): + rid = rel.get("Id") + rel_type = rel.get("Type", "") + if rid: + # Check for duplicate rIds + if rid in rid_to_type: + rels_rel_path = rels_file.relative_to(self.unpacked_dir) + errors.append( + f" {rels_rel_path}: Line {rel.sourceline}: " + f"Duplicate relationship ID '{rid}' (IDs must be unique)" + ) + # Extract just the type name from the full URL + type_name = ( + rel_type.split("/")[-1] if "/" in rel_type else rel_type + ) + rid_to_type[rid] = type_name + + # Parse the XML file to find all r:id references + xml_root = lxml.etree.parse(str(xml_file)).getroot() + + # Find all elements with r:id attributes + for elem in xml_root.iter(): + # Check for r:id attribute (relationship ID) + rid_attr = elem.get(f"{{{self.OFFICE_RELATIONSHIPS_NAMESPACE}}}id") + if rid_attr: + xml_rel_path = xml_file.relative_to(self.unpacked_dir) + elem_name = ( + elem.tag.split("}")[-1] if "}" in elem.tag else elem.tag + ) + + # Check if the ID exists + if rid_attr not in rid_to_type: + errors.append( + f" {xml_rel_path}: Line {elem.sourceline}: " + f"<{elem_name}> references non-existent relationship '{rid_attr}' " + f"(valid IDs: {', '.join(sorted(rid_to_type.keys())[:5])}{'...' 
if len(rid_to_type) > 5 else ''})" + ) + # Check if we have type expectations for this element + elif self.ELEMENT_RELATIONSHIP_TYPES: + expected_type = self._get_expected_relationship_type( + elem_name + ) + if expected_type: + actual_type = rid_to_type[rid_attr] + # Check if the actual type matches or contains the expected type + if expected_type not in actual_type.lower(): + errors.append( + f" {xml_rel_path}: Line {elem.sourceline}: " + f"<{elem_name}> references '{rid_attr}' which points to '{actual_type}' " + f"but should point to a '{expected_type}' relationship" + ) + + except Exception as e: + xml_rel_path = xml_file.relative_to(self.unpacked_dir) + errors.append(f" Error processing {xml_rel_path}: {e}") + + if errors: + print(f"FAILED - Found {len(errors)} relationship ID reference errors:") + for error in errors: + print(error) + print("\nThese ID mismatches will cause the document to appear corrupt!") + return False + else: + if self.verbose: + print("PASSED - All relationship ID references are valid") + return True + + def _get_expected_relationship_type(self, element_name): + """ + Get the expected relationship type for an element. + First checks the explicit mapping, then tries pattern detection. 
+ """ + # Normalize element name to lowercase + elem_lower = element_name.lower() + + # Check explicit mapping first + if elem_lower in self.ELEMENT_RELATIONSHIP_TYPES: + return self.ELEMENT_RELATIONSHIP_TYPES[elem_lower] + + # Try pattern detection for common patterns + # Pattern 1: Elements ending in "Id" often expect a relationship of the prefix type + if elem_lower.endswith("id") and len(elem_lower) > 2: + # e.g., "sldId" -> "sld", "sldMasterId" -> "sldMaster" + prefix = elem_lower[:-2] # Remove "id" + # Check if this might be a compound like "sldMasterId" + if prefix.endswith("master"): + return prefix.lower() + elif prefix.endswith("layout"): + return prefix.lower() + else: + # Simple case like "sldId" -> "slide" + # Common transformations + if prefix == "sld": + return "slide" + return prefix.lower() + + # Pattern 2: Elements ending in "Reference" expect a relationship of the prefix type + if elem_lower.endswith("reference") and len(elem_lower) > 9: + prefix = elem_lower[:-9] # Remove "reference" + return prefix.lower() + + return None + + def validate_content_types(self): + """Validate that all content files are properly declared in [Content_Types].xml.""" + errors = [] + + # Find [Content_Types].xml file + content_types_file = self.unpacked_dir / "[Content_Types].xml" + if not content_types_file.exists(): + print("FAILED - [Content_Types].xml file not found") + return False + + try: + # Parse and get all declared parts and extensions + root = lxml.etree.parse(str(content_types_file)).getroot() + declared_parts = set() + declared_extensions = set() + + # Get Override declarations (specific files) + for override in root.findall( + f".//{{{self.CONTENT_TYPES_NAMESPACE}}}Override" + ): + part_name = override.get("PartName") + if part_name is not None: + declared_parts.add(part_name.lstrip("/")) + + # Get Default declarations (by extension) + for default in root.findall( + f".//{{{self.CONTENT_TYPES_NAMESPACE}}}Default" + ): + extension = 
default.get("Extension") + if extension is not None: + declared_extensions.add(extension.lower()) + + # Root elements that require content type declaration + declarable_roots = { + "sld", + "sldLayout", + "sldMaster", + "presentation", # PowerPoint + "document", # Word + "workbook", + "worksheet", # Excel + "theme", # Common + } + + # Common media file extensions that should be declared + media_extensions = { + "png": "image/png", + "jpg": "image/jpeg", + "jpeg": "image/jpeg", + "gif": "image/gif", + "bmp": "image/bmp", + "tiff": "image/tiff", + "wmf": "image/x-wmf", + "emf": "image/x-emf", + } + + # Get all files in the unpacked directory + all_files = list(self.unpacked_dir.rglob("*")) + all_files = [f for f in all_files if f.is_file()] + + # Check all XML files for Override declarations + for xml_file in self.xml_files: + path_str = str(xml_file.relative_to(self.unpacked_dir)).replace( + "\\", "/" + ) + + # Skip non-content files + if any( + skip in path_str + for skip in [".rels", "[Content_Types]", "docProps/", "_rels/"] + ): + continue + + try: + root_tag = lxml.etree.parse(str(xml_file)).getroot().tag + root_name = root_tag.split("}")[-1] if "}" in root_tag else root_tag + + if root_name in declarable_roots and path_str not in declared_parts: + errors.append( + f" {path_str}: File with <{root_name}> root not declared in [Content_Types].xml" + ) + + except Exception: + continue # Skip unparseable files + + # Check all non-XML files for Default extension declarations + for file_path in all_files: + # Skip XML files and metadata files (already checked above) + if file_path.suffix.lower() in {".xml", ".rels"}: + continue + if file_path.name == "[Content_Types].xml": + continue + if "_rels" in file_path.parts or "docProps" in file_path.parts: + continue + + extension = file_path.suffix.lstrip(".").lower() + if extension and extension not in declared_extensions: + # Check if it's a known media extension that should be declared + if extension in media_extensions: + 
relative_path = file_path.relative_to(self.unpacked_dir) + errors.append( + f' {relative_path}: File with extension \'{extension}\' not declared in [Content_Types].xml - should add: ' + ) + + except Exception as e: + errors.append(f" Error parsing [Content_Types].xml: {e}") + + if errors: + print(f"FAILED - Found {len(errors)} content type declaration errors:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print( + "PASSED - All content files are properly declared in [Content_Types].xml" + ) + return True + + def validate_file_against_xsd(self, xml_file, verbose=False): + """Validate a single XML file against XSD schema, comparing with original. + + Args: + xml_file: Path to XML file to validate + verbose: Enable verbose output + + Returns: + tuple: (is_valid, new_errors_set) where is_valid is True/False/None (skipped) + """ + # Resolve both paths to handle symlinks + xml_file = Path(xml_file).resolve() + unpacked_dir = self.unpacked_dir.resolve() + + # Validate current file + is_valid, current_errors = self._validate_single_file_xsd( + xml_file, unpacked_dir + ) + + if is_valid is None: + return None, set() # Skipped + elif is_valid: + return True, set() # Valid, no errors + + # Get errors from original file for this specific file + original_errors = self._get_original_file_errors(xml_file) + + # Compare with original (both are guaranteed to be sets here) + assert current_errors is not None + new_errors = current_errors - original_errors + + if new_errors: + if verbose: + relative_path = xml_file.relative_to(unpacked_dir) + print(f"FAILED - {relative_path}: {len(new_errors)} new error(s)") + for error in list(new_errors)[:3]: + truncated = error[:250] + "..." 
if len(error) > 250 else error + print(f" - {truncated}") + return False, new_errors + else: + # All errors existed in original + if verbose: + print( + f"PASSED - No new errors (original had {len(current_errors)} errors)" + ) + return True, set() + + def validate_against_xsd(self): + """Validate XML files against XSD schemas, showing only new errors compared to original.""" + new_errors = [] + original_error_count = 0 + valid_count = 0 + skipped_count = 0 + + for xml_file in self.xml_files: + relative_path = str(xml_file.relative_to(self.unpacked_dir)) + is_valid, new_file_errors = self.validate_file_against_xsd( + xml_file, verbose=False + ) + + if is_valid is None: + skipped_count += 1 + continue + elif is_valid and not new_file_errors: + valid_count += 1 + continue + elif is_valid: + # Had errors but all existed in original + original_error_count += 1 + valid_count += 1 + continue + + # Has new errors + new_errors.append(f" {relative_path}: {len(new_file_errors)} new error(s)") + for error in list(new_file_errors)[:3]: # Show first 3 errors + new_errors.append( + f" - {error[:250]}..." 
if len(error) > 250 else f" - {error}" + ) + + # Print summary + if self.verbose: + print(f"Validated {len(self.xml_files)} files:") + print(f" - Valid: {valid_count}") + print(f" - Skipped (no schema): {skipped_count}") + if original_error_count: + print(f" - With original errors (ignored): {original_error_count}") + print( + f" - With NEW errors: {len(new_errors) > 0 and len([e for e in new_errors if not e.startswith(' ')]) or 0}" + ) + + if new_errors: + print("\nFAILED - Found NEW validation errors:") + for error in new_errors: + print(error) + return False + else: + if self.verbose: + print("\nPASSED - No new XSD validation errors introduced") + return True + + def _get_schema_path(self, xml_file): + """Determine the appropriate schema path for an XML file.""" + # Check exact filename match + if xml_file.name in self.SCHEMA_MAPPINGS: + return self.schemas_dir / self.SCHEMA_MAPPINGS[xml_file.name] + + # Check .rels files + if xml_file.suffix == ".rels": + return self.schemas_dir / self.SCHEMA_MAPPINGS[".rels"] + + # Check chart files + if "charts/" in str(xml_file) and xml_file.name.startswith("chart"): + return self.schemas_dir / self.SCHEMA_MAPPINGS["chart"] + + # Check theme files + if "theme/" in str(xml_file) and xml_file.name.startswith("theme"): + return self.schemas_dir / self.SCHEMA_MAPPINGS["theme"] + + # Check if file is in a main content folder and use appropriate schema + if xml_file.parent.name in self.MAIN_CONTENT_FOLDERS: + return self.schemas_dir / self.SCHEMA_MAPPINGS[xml_file.parent.name] + + return None + + def _clean_ignorable_namespaces(self, xml_doc): + """Remove attributes and elements not in allowed namespaces.""" + # Create a clean copy + xml_string = lxml.etree.tostring(xml_doc, encoding="unicode") + xml_copy = lxml.etree.fromstring(xml_string) + + # Remove attributes not in allowed namespaces + for elem in xml_copy.iter(): + attrs_to_remove = [] + + for attr in elem.attrib: + # Check if attribute is from a namespace other than 
allowed ones + if "{" in attr: + ns = attr.split("}")[0][1:] + if ns not in self.OOXML_NAMESPACES: + attrs_to_remove.append(attr) + + # Remove collected attributes + for attr in attrs_to_remove: + del elem.attrib[attr] + + # Remove elements not in allowed namespaces + self._remove_ignorable_elements(xml_copy) + + return lxml.etree.ElementTree(xml_copy) + + def _remove_ignorable_elements(self, root): + """Recursively remove all elements not in allowed namespaces.""" + elements_to_remove = [] + + # Find elements to remove + for elem in list(root): + # Skip non-element nodes (comments, processing instructions, etc.) + if not hasattr(elem, "tag") or callable(elem.tag): + continue + + tag_str = str(elem.tag) + if tag_str.startswith("{"): + ns = tag_str.split("}")[0][1:] + if ns not in self.OOXML_NAMESPACES: + elements_to_remove.append(elem) + continue + + # Recursively clean child elements + self._remove_ignorable_elements(elem) + + # Remove collected elements + for elem in elements_to_remove: + root.remove(elem) + + def _preprocess_for_mc_ignorable(self, xml_doc): + """Preprocess XML to handle mc:Ignorable attribute properly.""" + # Remove mc:Ignorable attributes before validation + root = xml_doc.getroot() + + # Remove mc:Ignorable attribute from root + if f"{{{self.MC_NAMESPACE}}}Ignorable" in root.attrib: + del root.attrib[f"{{{self.MC_NAMESPACE}}}Ignorable"] + + return xml_doc + + def _validate_single_file_xsd(self, xml_file, base_path): + """Validate a single XML file against XSD schema. 
Returns (is_valid, errors_set).""" + schema_path = self._get_schema_path(xml_file) + if not schema_path: + return None, None # Skip file + + try: + # Load schema + with open(schema_path, "rb") as xsd_file: + parser = lxml.etree.XMLParser() + xsd_doc = lxml.etree.parse( + xsd_file, parser=parser, base_url=str(schema_path) + ) + schema = lxml.etree.XMLSchema(xsd_doc) + + # Load and preprocess XML + with open(xml_file, "r") as f: + xml_doc = lxml.etree.parse(f) + + xml_doc, _ = self._remove_template_tags_from_text_nodes(xml_doc) + xml_doc = self._preprocess_for_mc_ignorable(xml_doc) + + # Clean ignorable namespaces if needed + relative_path = xml_file.relative_to(base_path) + if ( + relative_path.parts + and relative_path.parts[0] in self.MAIN_CONTENT_FOLDERS + ): + xml_doc = self._clean_ignorable_namespaces(xml_doc) + + # Validate + if schema.validate(xml_doc): + return True, set() + else: + errors = set() + for error in schema.error_log: + # Store normalized error message (without line numbers for comparison) + errors.add(error.message) + return False, errors + + except Exception as e: + return False, {str(e)} + + def _get_original_file_errors(self, xml_file): + """Get XSD validation errors from a single file in the original document. 
+ + Args: + xml_file: Path to the XML file in unpacked_dir to check + + Returns: + set: Set of error messages from the original file + """ + import tempfile + import zipfile + + # Resolve both paths to handle symlinks (e.g., /var vs /private/var on macOS) + xml_file = Path(xml_file).resolve() + unpacked_dir = self.unpacked_dir.resolve() + relative_path = xml_file.relative_to(unpacked_dir) + + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + + # Extract original file + with zipfile.ZipFile(self.original_file, "r") as zip_ref: + zip_ref.extractall(temp_path) + + # Find corresponding file in original + original_xml_file = temp_path / relative_path + + if not original_xml_file.exists(): + # File didn't exist in original, so no original errors + return set() + + # Validate the specific file in original + is_valid, errors = self._validate_single_file_xsd( + original_xml_file, temp_path + ) + return errors if errors else set() + + def _remove_template_tags_from_text_nodes(self, xml_doc): + """Remove template tags from XML text nodes and collect warnings. + + Template tags follow the pattern {{ ... }} and are used as placeholders + for content replacement. They should be removed from text content before + XSD validation while preserving XML structure. 
+ + Returns: + tuple: (cleaned_xml_doc, warnings_list) + """ + warnings = [] + template_pattern = re.compile(r"\{\{[^}]*\}\}") + + # Create a copy of the document to avoid modifying the original + xml_string = lxml.etree.tostring(xml_doc, encoding="unicode") + xml_copy = lxml.etree.fromstring(xml_string) + + def process_text_content(text, content_type): + if not text: + return text + matches = list(template_pattern.finditer(text)) + if matches: + for match in matches: + warnings.append( + f"Found template tag in {content_type}: {match.group()}" + ) + return template_pattern.sub("", text) + return text + + # Process all text nodes in the document + for elem in xml_copy.iter(): + # Skip processing if this is a w:t element + if not hasattr(elem, "tag") or callable(elem.tag): + continue + tag_str = str(elem.tag) + if tag_str.endswith("}t") or tag_str == "t": + continue + + elem.text = process_text_content(elem.text, "text content") + elem.tail = process_text_content(elem.tail, "tail content") + + return lxml.etree.ElementTree(xml_copy), warnings + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff --git a/skills/docx/ooxml/scripts/validation/docx.py b/skills/docx/ooxml/scripts/validation/docx.py new file mode 100755 index 0000000..602c470 --- /dev/null +++ b/skills/docx/ooxml/scripts/validation/docx.py @@ -0,0 +1,274 @@ +""" +Validator for Word document XML files against XSD schemas. 
+""" + +import re +import tempfile +import zipfile + +import lxml.etree + +from .base import BaseSchemaValidator + + +class DOCXSchemaValidator(BaseSchemaValidator): + """Validator for Word document XML files against XSD schemas.""" + + # Word-specific namespace + WORD_2006_NAMESPACE = "http://schemas.openxmlformats.org/wordprocessingml/2006/main" + + # Word-specific element to relationship type mappings + # Start with empty mapping - add specific cases as we discover them + ELEMENT_RELATIONSHIP_TYPES = {} + + def validate(self): + """Run all validation checks and return True if all pass.""" + # Test 0: XML well-formedness + if not self.validate_xml(): + return False + + # Test 1: Namespace declarations + all_valid = True + if not self.validate_namespaces(): + all_valid = False + + # Test 2: Unique IDs + if not self.validate_unique_ids(): + all_valid = False + + # Test 3: Relationship and file reference validation + if not self.validate_file_references(): + all_valid = False + + # Test 4: Content type declarations + if not self.validate_content_types(): + all_valid = False + + # Test 5: XSD schema validation + if not self.validate_against_xsd(): + all_valid = False + + # Test 6: Whitespace preservation + if not self.validate_whitespace_preservation(): + all_valid = False + + # Test 7: Deletion validation + if not self.validate_deletions(): + all_valid = False + + # Test 8: Insertion validation + if not self.validate_insertions(): + all_valid = False + + # Test 9: Relationship ID reference validation + if not self.validate_all_relationship_ids(): + all_valid = False + + # Count and compare paragraphs + self.compare_paragraph_counts() + + return all_valid + + def validate_whitespace_preservation(self): + """ + Validate that w:t elements with whitespace have xml:space='preserve'. 
+ """ + errors = [] + + for xml_file in self.xml_files: + # Only check document.xml files + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + + # Find all w:t elements + for elem in root.iter(f"{{{self.WORD_2006_NAMESPACE}}}t"): + if elem.text: + text = elem.text + # Check if text starts or ends with whitespace + if re.match(r"^\s.*", text) or re.match(r".*\s$", text): + # Check if xml:space="preserve" attribute exists + xml_space_attr = f"{{{self.XML_NAMESPACE}}}space" + if ( + xml_space_attr not in elem.attrib + or elem.attrib[xml_space_attr] != "preserve" + ): + # Show a preview of the text + text_preview = ( + repr(text)[:50] + "..." + if len(repr(text)) > 50 + else repr(text) + ) + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: w:t element with whitespace missing xml:space='preserve': {text_preview}" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} whitespace preservation violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All whitespace is properly preserved") + return True + + def validate_deletions(self): + """ + Validate that w:t elements are not within w:del elements. + For some reason, XSD validation does not catch this, so we do it manually. 
+ """ + errors = [] + + for xml_file in self.xml_files: + # Only check document.xml files + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + + # Find all w:t elements that are descendants of w:del elements + namespaces = {"w": self.WORD_2006_NAMESPACE} + xpath_expression = ".//w:del//w:t" + problematic_t_elements = root.xpath( + xpath_expression, namespaces=namespaces + ) + for t_elem in problematic_t_elements: + if t_elem.text: + # Show a preview of the text + text_preview = ( + repr(t_elem.text)[:50] + "..." + if len(repr(t_elem.text)) > 50 + else repr(t_elem.text) + ) + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {t_elem.sourceline}: <w:t> found within <w:del>: {text_preview}" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} deletion validation violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - No w:t elements found within w:del elements") + return True + + def count_paragraphs_in_unpacked(self): + """Count the number of paragraphs in the unpacked document.""" + count = 0 + + for xml_file in self.xml_files: + # Only check document.xml files + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + # Count all w:p elements + paragraphs = root.findall(f".//{{{self.WORD_2006_NAMESPACE}}}p") + count = len(paragraphs) + except Exception as e: + print(f"Error counting paragraphs in unpacked document: {e}") + + return count + + def count_paragraphs_in_original(self): + """Count the number of paragraphs in the original docx file.""" + count = 0 + + try: + # Create temporary directory to unpack original + with tempfile.TemporaryDirectory() as temp_dir: + # Unpack original docx + with zipfile.ZipFile(self.original_file, "r") as zip_ref: +
zip_ref.extractall(temp_dir) + + # Parse document.xml + doc_xml_path = temp_dir + "/word/document.xml" + root = lxml.etree.parse(doc_xml_path).getroot() + + # Count all w:p elements + paragraphs = root.findall(f".//{{{self.WORD_2006_NAMESPACE}}}p") + count = len(paragraphs) + + except Exception as e: + print(f"Error counting paragraphs in original document: {e}") + + return count + + def validate_insertions(self): + """ + Validate that w:delText elements are not within w:ins elements. + w:delText is only allowed in w:ins if nested within a w:del. + """ + errors = [] + + for xml_file in self.xml_files: + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + namespaces = {"w": self.WORD_2006_NAMESPACE} + + # Find w:delText in w:ins that are NOT within w:del + invalid_elements = root.xpath( + ".//w:ins//w:delText[not(ancestor::w:del)]", + namespaces=namespaces + ) + + for elem in invalid_elements: + text_preview = ( + repr(elem.text or "")[:50] + "..." 
+ if len(repr(elem.text or "")) > 50 + else repr(elem.text or "") + ) + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: <w:delText> within <w:ins>: {text_preview}" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} insertion validation violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - No w:delText elements within w:ins elements") + return True + + def compare_paragraph_counts(self): + """Compare paragraph counts between original and new document.""" + original_count = self.count_paragraphs_in_original() + new_count = self.count_paragraphs_in_unpacked() + + diff = new_count - original_count + diff_str = f"+{diff}" if diff > 0 else str(diff) + print(f"\nParagraphs: {original_count} → {new_count} ({diff_str})") + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff --git a/skills/docx/ooxml/scripts/validation/pptx.py b/skills/docx/ooxml/scripts/validation/pptx.py new file mode 100755 index 0000000..66d5b1e --- /dev/null +++ b/skills/docx/ooxml/scripts/validation/pptx.py @@ -0,0 +1,315 @@ +""" +Validator for PowerPoint presentation XML files against XSD schemas.
+""" + +import re + +from .base import BaseSchemaValidator + + +class PPTXSchemaValidator(BaseSchemaValidator): + """Validator for PowerPoint presentation XML files against XSD schemas.""" + + # PowerPoint presentation namespace + PRESENTATIONML_NAMESPACE = ( + "http://schemas.openxmlformats.org/presentationml/2006/main" + ) + + # PowerPoint-specific element to relationship type mappings + ELEMENT_RELATIONSHIP_TYPES = { + "sldid": "slide", + "sldmasterid": "slidemaster", + "notesmasterid": "notesmaster", + "sldlayoutid": "slidelayout", + "themeid": "theme", + "tablestyleid": "tablestyles", + } + + def validate(self): + """Run all validation checks and return True if all pass.""" + # Test 0: XML well-formedness + if not self.validate_xml(): + return False + + # Test 1: Namespace declarations + all_valid = True + if not self.validate_namespaces(): + all_valid = False + + # Test 2: Unique IDs + if not self.validate_unique_ids(): + all_valid = False + + # Test 3: UUID ID validation + if not self.validate_uuid_ids(): + all_valid = False + + # Test 4: Relationship and file reference validation + if not self.validate_file_references(): + all_valid = False + + # Test 5: Slide layout ID validation + if not self.validate_slide_layout_ids(): + all_valid = False + + # Test 6: Content type declarations + if not self.validate_content_types(): + all_valid = False + + # Test 7: XSD schema validation + if not self.validate_against_xsd(): + all_valid = False + + # Test 8: Notes slide reference validation + if not self.validate_notes_slide_references(): + all_valid = False + + # Test 9: Relationship ID reference validation + if not self.validate_all_relationship_ids(): + all_valid = False + + # Test 10: Duplicate slide layout references validation + if not self.validate_no_duplicate_slide_layouts(): + all_valid = False + + return all_valid + + def validate_uuid_ids(self): + """Validate that ID attributes that look like UUIDs contain only hex values.""" + import lxml.etree + + errors 
= [] + # UUID pattern: 8-4-4-4-12 hex digits with optional braces/hyphens + uuid_pattern = re.compile( + r"^[\{\(]?[0-9A-Fa-f]{8}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{12}[\}\)]?$" + ) + + for xml_file in self.xml_files: + try: + root = lxml.etree.parse(str(xml_file)).getroot() + + # Check all elements for ID attributes + for elem in root.iter(): + for attr, value in elem.attrib.items(): + # Check if this is an ID attribute + attr_name = attr.split("}")[-1].lower() + if attr_name == "id" or attr_name.endswith("id"): + # Check if value looks like a UUID (has the right length and pattern structure) + if self._looks_like_uuid(value): + # Validate that it contains only hex characters in the right positions + if not uuid_pattern.match(value): + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: ID '{value}' appears to be a UUID but contains invalid hex characters" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} UUID ID validation errors:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All UUID-like IDs contain valid hex values") + return True + + def _looks_like_uuid(self, value): + """Check if a value has the general structure of a UUID.""" + # Remove common UUID delimiters + clean_value = value.strip("{}()").replace("-", "") + # Check if it's 32 hex-like characters (could include invalid hex chars) + return len(clean_value) == 32 and all(c.isalnum() for c in clean_value) + + def validate_slide_layout_ids(self): + """Validate that sldLayoutId elements in slide masters reference valid slide layouts.""" + import lxml.etree + + errors = [] + + # Find all slide master files + slide_masters = list(self.unpacked_dir.glob("ppt/slideMasters/*.xml")) + + if not slide_masters: + if self.verbose: + print("PASSED - No 
slide masters found") + return True + + for slide_master in slide_masters: + try: + # Parse the slide master file + root = lxml.etree.parse(str(slide_master)).getroot() + + # Find the corresponding _rels file for this slide master + rels_file = slide_master.parent / "_rels" / f"{slide_master.name}.rels" + + if not rels_file.exists(): + errors.append( + f" {slide_master.relative_to(self.unpacked_dir)}: " + f"Missing relationships file: {rels_file.relative_to(self.unpacked_dir)}" + ) + continue + + # Parse the relationships file + rels_root = lxml.etree.parse(str(rels_file)).getroot() + + # Build a set of valid relationship IDs that point to slide layouts + valid_layout_rids = set() + for rel in rels_root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ): + rel_type = rel.get("Type", "") + if "slideLayout" in rel_type: + valid_layout_rids.add(rel.get("Id")) + + # Find all sldLayoutId elements in the slide master + for sld_layout_id in root.findall( + f".//{{{self.PRESENTATIONML_NAMESPACE}}}sldLayoutId" + ): + r_id = sld_layout_id.get( + f"{{{self.OFFICE_RELATIONSHIPS_NAMESPACE}}}id" + ) + layout_id = sld_layout_id.get("id") + + if r_id and r_id not in valid_layout_rids: + errors.append( + f" {slide_master.relative_to(self.unpacked_dir)}: " + f"Line {sld_layout_id.sourceline}: sldLayoutId with id='{layout_id}' " + f"references r:id='{r_id}' which is not found in slide layout relationships" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {slide_master.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} slide layout ID validation errors:") + for error in errors: + print(error) + print( + "Remove invalid references or add missing slide layouts to the relationships file." 
+ ) + return False + else: + if self.verbose: + print("PASSED - All slide layout IDs reference valid slide layouts") + return True + + def validate_no_duplicate_slide_layouts(self): + """Validate that each slide has exactly one slideLayout reference.""" + import lxml.etree + + errors = [] + slide_rels_files = list(self.unpacked_dir.glob("ppt/slides/_rels/*.xml.rels")) + + for rels_file in slide_rels_files: + try: + root = lxml.etree.parse(str(rels_file)).getroot() + + # Find all slideLayout relationships + layout_rels = [ + rel + for rel in root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ) + if "slideLayout" in rel.get("Type", "") + ] + + if len(layout_rels) > 1: + errors.append( + f" {rels_file.relative_to(self.unpacked_dir)}: has {len(layout_rels)} slideLayout references" + ) + + except Exception as e: + errors.append( + f" {rels_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print("FAILED - Found slides with duplicate slideLayout references:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All slides have exactly one slideLayout reference") + return True + + def validate_notes_slide_references(self): + """Validate that each notesSlide file is referenced by only one slide.""" + import lxml.etree + + errors = [] + notes_slide_references = {} # Track which slides reference each notesSlide + + # Find all slide relationship files + slide_rels_files = list(self.unpacked_dir.glob("ppt/slides/_rels/*.xml.rels")) + + if not slide_rels_files: + if self.verbose: + print("PASSED - No slide relationship files found") + return True + + for rels_file in slide_rels_files: + try: + # Parse the relationships file + root = lxml.etree.parse(str(rels_file)).getroot() + + # Find all notesSlide relationships + for rel in root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ): + rel_type = rel.get("Type", "") + if "notesSlide" in rel_type: + target = 
rel.get("Target", "") + if target: + # Normalize the target path to handle relative paths + normalized_target = target.replace("../", "") + + # Track which slide references this notesSlide + slide_name = rels_file.stem.replace( + ".xml", "" + ) # e.g., "slide1" + + if normalized_target not in notes_slide_references: + notes_slide_references[normalized_target] = [] + notes_slide_references[normalized_target].append( + (slide_name, rels_file) + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {rels_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + # Check for duplicate references + for target, references in notes_slide_references.items(): + if len(references) > 1: + slide_names = [ref[0] for ref in references] + errors.append( + f" Notes slide '{target}' is referenced by multiple slides: {', '.join(slide_names)}" + ) + for slide_name, rels_file in references: + errors.append(f" - {rels_file.relative_to(self.unpacked_dir)}") + + if errors: + print( + f"FAILED - Found {len([e for e in errors if not e.startswith(' ')])} notes slide reference validation errors:" + ) + for error in errors: + print(error) + print("Each slide may optionally have its own slide file.") + return False + else: + if self.verbose: + print("PASSED - All notes slide references are unique") + return True + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff --git a/skills/docx/ooxml/scripts/validation/redlining.py b/skills/docx/ooxml/scripts/validation/redlining.py new file mode 100755 index 0000000..2ec7eab --- /dev/null +++ b/skills/docx/ooxml/scripts/validation/redlining.py @@ -0,0 +1,279 @@ +""" +Validator for tracked changes in Word documents. 
+""" + +import subprocess +import tempfile +import zipfile +from pathlib import Path + + +class RedliningValidator: + """Validator for tracked changes in Word documents.""" + + def __init__(self, unpacked_dir, original_docx, verbose=False): + self.unpacked_dir = Path(unpacked_dir) + self.original_docx = Path(original_docx) + self.verbose = verbose + self.namespaces = { + "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main" + } + + def validate(self): + """Main validation method that returns True if valid, False otherwise.""" + # Verify unpacked directory exists and has correct structure + modified_file = self.unpacked_dir / "word" / "document.xml" + if not modified_file.exists(): + print(f"FAILED - Modified document.xml not found at {modified_file}") + return False + + # First, check if there are any tracked changes by GLM to validate + try: + import xml.etree.ElementTree as ET + + tree = ET.parse(modified_file) + root = tree.getroot() + + # Check for w:del or w:ins tags authored by GLM + del_elements = root.findall(".//w:del", self.namespaces) + ins_elements = root.findall(".//w:ins", self.namespaces) + + # Filter to only include changes by GLM + glm_del_elements = [ + elem + for elem in del_elements + if elem.get(f"{{{self.namespaces['w']}}}author") == "GLM" + ] + glm_ins_elements = [ + elem + for elem in ins_elements + if elem.get(f"{{{self.namespaces['w']}}}author") == "GLM" + ] + + # Redlining validation is only needed if tracked changes by GLM have been used. 
+ if not glm_del_elements and not glm_ins_elements: + if self.verbose: + print("PASSED - No tracked changes by GLM found.") + return True + + except Exception: + # If we can't parse the XML, continue with full validation + pass + + # Create temporary directory for unpacking original docx + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + + # Unpack original docx + try: + with zipfile.ZipFile(self.original_docx, "r") as zip_ref: + zip_ref.extractall(temp_path) + except Exception as e: + print(f"FAILED - Error unpacking original docx: {e}") + return False + + original_file = temp_path / "word" / "document.xml" + if not original_file.exists(): + print( + f"FAILED - Original document.xml not found in {self.original_docx}" + ) + return False + + # Parse both XML files using xml.etree.ElementTree for redlining validation + try: + import xml.etree.ElementTree as ET + + modified_tree = ET.parse(modified_file) + modified_root = modified_tree.getroot() + original_tree = ET.parse(original_file) + original_root = original_tree.getroot() + except ET.ParseError as e: + print(f"FAILED - Error parsing XML files: {e}") + return False + + # Remove GLM's tracked changes from both documents + self._remove_glm_tracked_changes(original_root) + self._remove_glm_tracked_changes(modified_root) + + # Extract and compare text content + modified_text = self._extract_text_content(modified_root) + original_text = self._extract_text_content(original_root) + + if modified_text != original_text: + # Show detailed character-level differences for each paragraph + error_message = self._generate_detailed_diff( + original_text, modified_text + ) + print(error_message) + return False + + if self.verbose: + print("PASSED - All changes by GLM are properly tracked") + return True + + def _generate_detailed_diff(self, original_text, modified_text): + """Generate detailed word-level differences using git word diff.""" + error_parts = [ + "FAILED - Document text doesn't match 
after removing GLM's tracked changes", + "", + "Likely causes:", + " 1. Modified text inside another author's <w:ins> or <w:del> tags", + " 2. Made edits without proper tracked changes", + " 3. Didn't nest <w:del> inside <w:ins> when deleting another's insertion", + "", + "For pre-redlined documents, use correct patterns:", + " - To reject another's INSERTION: Nest <w:del> inside their <w:ins>", + " - To restore another's DELETION: Add new <w:ins> AFTER their <w:del>", + "", + ] + + # Show git word diff + git_diff = self._get_git_word_diff(original_text, modified_text) + if git_diff: + error_parts.extend(["Differences:", "============", git_diff]) + else: + error_parts.append("Unable to generate word diff (git not available)") + + return "\n".join(error_parts) + + def _get_git_word_diff(self, original_text, modified_text): + """Generate word diff using git with character-level precision.""" + try: + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + + # Create two files + original_file = temp_path / "original.txt" + modified_file = temp_path / "modified.txt" + + original_file.write_text(original_text, encoding="utf-8") + modified_file.write_text(modified_text, encoding="utf-8") + + # Try character-level diff first for precise differences + result = subprocess.run( + [ + "git", + "diff", + "--word-diff=plain", + "--word-diff-regex=.", # Character-by-character diff + "-U0", # Zero lines of context - show only changed lines + "--no-index", + str(original_file), + str(modified_file), + ], + capture_output=True, + text=True, + ) + + if result.stdout.strip(): + # Clean up the output - remove git diff header lines + lines = result.stdout.split("\n") + # Skip the header lines (diff --git, index, +++, ---, @@) + content_lines = [] + in_content = False + for line in lines: + if line.startswith("@@"): + in_content = True + continue + if in_content and line.strip(): + content_lines.append(line) + + if content_lines: + return "\n".join(content_lines) + + # Fallback to word-level diff if character-level is too
verbose + result = subprocess.run( + [ + "git", + "diff", + "--word-diff=plain", + "-U0", # Zero lines of context + "--no-index", + str(original_file), + str(modified_file), + ], + capture_output=True, + text=True, + ) + + if result.stdout.strip(): + lines = result.stdout.split("\n") + content_lines = [] + in_content = False + for line in lines: + if line.startswith("@@"): + in_content = True + continue + if in_content and line.strip(): + content_lines.append(line) + return "\n".join(content_lines) + + except (subprocess.CalledProcessError, FileNotFoundError, Exception): + # Git not available or other error, return None to use fallback + pass + + return None + + def _remove_glm_tracked_changes(self, root): + """Remove tracked changes authored by GLM from the XML root.""" + ins_tag = f"{{{self.namespaces['w']}}}ins" + del_tag = f"{{{self.namespaces['w']}}}del" + author_attr = f"{{{self.namespaces['w']}}}author" + + # Remove w:ins elements + for parent in root.iter(): + to_remove = [] + for child in parent: + if child.tag == ins_tag and child.get(author_attr) == "GLM": + to_remove.append(child) + for elem in to_remove: + parent.remove(elem) + + # Unwrap content in w:del elements where author is "GLM" + deltext_tag = f"{{{self.namespaces['w']}}}delText" + t_tag = f"{{{self.namespaces['w']}}}t" + + for parent in root.iter(): + to_process = [] + for child in parent: + if child.tag == del_tag and child.get(author_attr) == "GLM": + to_process.append((child, list(parent).index(child))) + + # Process in reverse order to maintain indices + for del_elem, del_index in reversed(to_process): + # Convert w:delText to w:t before moving + for elem in del_elem.iter(): + if elem.tag == deltext_tag: + elem.tag = t_tag + + # Move all children of w:del to its parent before removing w:del + for child in reversed(list(del_elem)): + parent.insert(del_index, child) + parent.remove(del_elem) + + def _extract_text_content(self, root): + """Extract text content from Word XML, preserving 
paragraph structure. + + Empty paragraphs are skipped to avoid false positives when tracked + insertions add only structural elements without text content. + """ + p_tag = f"{{{self.namespaces['w']}}}p" + t_tag = f"{{{self.namespaces['w']}}}t" + + paragraphs = [] + for p_elem in root.findall(f".//{p_tag}"): + # Get all text elements within this paragraph + text_parts = [] + for t_elem in p_elem.findall(f".//{t_tag}"): + if t_elem.text: + text_parts.append(t_elem.text) + paragraph_text = "".join(text_parts) + # Skip empty paragraphs - they don't affect content validation + if paragraph_text: + paragraphs.append(paragraph_text) + + return "\n".join(paragraphs) + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff --git a/skills/docx/scripts/__init__.py b/skills/docx/scripts/__init__.py new file mode 100755 index 0000000..bf9c562 --- /dev/null +++ b/skills/docx/scripts/__init__.py @@ -0,0 +1 @@ +# Make scripts directory a package for relative imports in tests diff --git a/skills/docx/scripts/add_toc_placeholders.py b/skills/docx/scripts/add_toc_placeholders.py new file mode 100755 index 0000000..bec8482 --- /dev/null +++ b/skills/docx/scripts/add_toc_placeholders.py @@ -0,0 +1,220 @@ +#!/usr/bin/env python3 +""" +Add placeholder entries to Table of Contents in a DOCX file. + +This script adds placeholder TOC entries between the 'separate' and 'end' +field characters, so users see some content on first open instead of an empty TOC. +The original file is replaced with the modified version. + +Usage: + python add_toc_placeholders.py docx_file [--entries entries_json] + + entries_json format: JSON string with array of objects: + [ + {"level": 1, "text": "Chapter 1 Overview", "page": "1"}, + {"level": 2, "text": "Section 1.1 Details", "page": "1"} + ] + + If --entries is not provided, generates generic placeholders.
+ +Example: + python add_toc_placeholders.py document.docx + python add_toc_placeholders.py document.docx --entries '[{"level":1,"text":"Introduction","page":"1"}]' +""" + +import argparse +import html +import json +import shutil +import sys +import tempfile +import zipfile +from pathlib import Path + + +def add_toc_placeholders(docx_path: str, entries: list = None) -> None: + """Add placeholder TOC entries to a DOCX file (in-place replacement). + + Args: + docx_path: Path to DOCX file (will be modified in-place) + entries: Optional list of placeholder entries. Each entry should be a dict + with 'level' (1-3), 'text', and 'page' keys. + """ + docx_path = Path(docx_path) + + # Create temp directory for extraction + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + extracted_dir = temp_path / "extracted" + temp_output = temp_path / "output.docx" + + # Extract DOCX + with zipfile.ZipFile(docx_path, 'r') as zip_ref: + zip_ref.extractall(extracted_dir) + + # Detect TOC styles from styles.xml + toc_style_mapping = _detect_toc_styles(extracted_dir / "word" / "styles.xml") + + # Process document.xml + document_xml = extracted_dir / "word" / "document.xml" + if not document_xml.exists(): + raise ValueError("document.xml not found in the DOCX file") + + # Read and process XML + content = document_xml.read_text(encoding='utf-8') + + # Find TOC structure and add placeholders + modified_content = _insert_toc_placeholders(content, entries, toc_style_mapping) + + # Write back + document_xml.write_text(modified_content, encoding='utf-8') + + # Repack DOCX to temp file + with zipfile.ZipFile(temp_output, 'w', zipfile.ZIP_DEFLATED) as zipf: + for file_path in extracted_dir.rglob('*'): + if file_path.is_file(): + arcname = file_path.relative_to(extracted_dir) + zipf.write(file_path, arcname) + + # Replace original file with modified version (use shutil.move for cross-device support) + docx_path.unlink() +
shutil.move(str(temp_output), str(docx_path)) + + +def _detect_toc_styles(styles_xml_path: Path) -> dict: + """Detect TOC style IDs from styles.xml. + + Args: + styles_xml_path: Path to styles.xml + + Returns: + Dictionary mapping level (1, 2, 3) to style ID + """ + default_mapping = {1: "9", 2: "11", 3: "12"} + + if not styles_xml_path.exists(): + return default_mapping + + content = styles_xml_path.read_text(encoding='utf-8') + + # Find styles with names like "toc 1", "toc 2", "toc 3" + import re + toc_styles = {} + for match in re.finditer(r']*w:styleId="([^"]*)"[^>]*>.*? str: + """Insert placeholder TOC entries into XML content. + + Args: + xml_content: The XML content of document.xml + entries: Optional list of placeholder entries + toc_style_mapping: Dictionary mapping level to style ID + + Returns: + Modified XML content with placeholders inserted + """ + # Generate default placeholder entries if none provided + if entries is None: + entries = [ + {"level": 1, "text": "Chapter 1 Overview", "page": "1"}, + {"level": 2, "text": "Section 1.1 Details", "page": "1"}, + {"level": 2, "text": "Section 1.2 More Details", "page": "2"}, + {"level": 1, "text": "Chapter 2 Content", "page": "3"}, + ] + + # Use provided mapping or default + if toc_style_mapping is None: + toc_style_mapping = {1: "9", 2: "11", 3: "12"} + + # Find the TOC structure: w:p with w:fldChar separate, followed by w:p with w:fldChar end + # Pattern: ... 
+ separate_end_pattern = ( + r'(]*>]*>.*?]*w:fldCharType="separate"[^>]*/>)' + r'(]*>]*>.*?]*w:fldCharType="end"[^>]*/>)' + ) + + import re + + def replace_with_placeholders(match): + separate_para = match.group(1) + end_para = match.group(2) + + # Indentation values in twips (1 inch = 1440 twips) + # Level 1: 0, Level 2: 0.25" (360), Level 3: 0.5" (720), Level 4+: 0.75" (1080) + indent_mapping = {1: 0, 2: 360, 3: 720, 4: 1080, 5: 1440, 6: 1800} + + # Generate placeholder paragraphs matching Word's TOC format + placeholder_paragraphs = [] + for entry in entries: + level = entry.get('level', 1) + text = html.escape(entry.get('text', '')) + page = entry.get('page', '1') + + # Get style ID for this level + toc_style = toc_style_mapping.get(level, toc_style_mapping.get(1, "9")) + + # Get indentation for this level + indent = indent_mapping.get(level, 0) + indent_attr = f'' if indent > 0 else '' + + # Use w:tab element (not w:tabStop) like Word does + placeholder_para = f''' + + + {indent_attr} + + + {text} + + {page} +''' + placeholder_paragraphs.append(placeholder_para) + + # Join with the separate paragraph at start and end paragraph at end + return separate_para + '\n'.join(placeholder_paragraphs) + end_para + + # Replace the pattern + modified_content = re.sub(separate_end_pattern, replace_with_placeholders, xml_content, flags=re.DOTALL) + + return modified_content + + +def main(): + parser = argparse.ArgumentParser( + description='Add placeholder entries to Table of Contents in a DOCX file (in-place)' + ) + parser.add_argument('docx_file', help='DOCX file to modify (will be replaced)') + parser.add_argument( + '--entries', + help='JSON string with placeholder entries: [{"level":1,"text":"Chapter 1","page":"1"}]' + ) + + args = parser.parse_args() + + # Parse entries if provided + entries = None + if args.entries: + try: + entries = json.loads(args.entries) + except json.JSONDecodeError as e: + print(f"Error parsing entries JSON: {e}", file=sys.stderr) + 
sys.exit(1) + + # Add placeholders + try: + add_toc_placeholders(args.docx_file, entries) + print(f"Successfully added TOC placeholders to {args.docx_file}") + except Exception as e: + print(f"Error: {e}", file=sys.stderr) + sys.exit(1) + + +if __name__ == '__main__': + main() diff --git a/skills/docx/scripts/document.py b/skills/docx/scripts/document.py new file mode 100755 index 0000000..bac280b --- /dev/null +++ b/skills/docx/scripts/document.py @@ -0,0 +1,1302 @@ +#!/usr/bin/env python3 +""" +Library for working with Word documents: comments, tracked changes, and editing. + +Usage: + from skills.docx.scripts.document import Document + + # Initialize + doc = Document('workspace/unpacked') + doc = Document('workspace/unpacked', author="John Doe", initials="JD") + + # Find nodes + node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"}) + node = doc["word/document.xml"].get_node(tag="w:p", line_number=10) + + # Add comments + doc.add_comment(start=node, end=node, text="Comment text") + doc.reply_to_comment(parent_comment_id=0, text="Reply text") + + # Suggest tracked changes + doc["word/document.xml"].suggest_deletion(node) # Delete content + doc["word/document.xml"].revert_insertion(ins_node) # Reject insertion + doc["word/document.xml"].revert_deletion(del_node) # Reject deletion + + # Save + doc.save() +""" + +import html +import random +import shutil +import tempfile +from datetime import datetime, timezone +from pathlib import Path + +from defusedxml import minidom +from ooxml.scripts.pack import pack_document +from ooxml.scripts.validation.docx import DOCXSchemaValidator +from ooxml.scripts.validation.redlining import RedliningValidator + +from .utilities import XMLEditor + +# Path to template files +TEMPLATE_DIR = Path(__file__).parent / "templates" + + +class DocxXMLEditor(XMLEditor): + """XMLEditor that automatically applies RSID, author, and date to new elements. 
+ + Automatically adds attributes to elements that support them when inserting new content: + - w:rsidR, w:rsidRDefault, w:rsidP (for w:p and w:r elements) + - w:author and w:date (for w:ins, w:del, w:comment elements) + - w:id (for w:ins and w:del elements) + + Attributes: + dom (defusedxml.minidom.Document): The DOM document for direct manipulation + """ + + def __init__( + self, xml_path, rsid: str, author: str = "GLM", initials: str = "C" + ): + """Initialize with required RSID and optional author. + + Args: + xml_path: Path to XML file to edit + rsid: RSID to automatically apply to new elements + author: Author name for tracked changes and comments (default: "GLM") + initials: Author initials (default: "C") + """ + super().__init__(xml_path) + self.rsid = rsid + self.author = author + self.initials = initials + + def _get_next_change_id(self): + """Get the next available change ID by checking all tracked change elements.""" + max_id = -1 + for tag in ("w:ins", "w:del"): + elements = self.dom.getElementsByTagName(tag) + for elem in elements: + change_id = elem.getAttribute("w:id") + if change_id: + try: + max_id = max(max_id, int(change_id)) + except ValueError: + pass + return max_id + 1 + + def _ensure_w16du_namespace(self): + """Ensure w16du namespace is declared on the root element.""" + root = self.dom.documentElement + if not root.hasAttribute("xmlns:w16du"): # type: ignore + root.setAttribute( # type: ignore + "xmlns:w16du", + "http://schemas.microsoft.com/office/word/2023/wordml/word16du", + ) + + def _ensure_w16cex_namespace(self): + """Ensure w16cex namespace is declared on the root element.""" + root = self.dom.documentElement + if not root.hasAttribute("xmlns:w16cex"): # type: ignore + root.setAttribute( # type: ignore + "xmlns:w16cex", + "http://schemas.microsoft.com/office/word/2018/wordml/cex", + ) + + def _ensure_w14_namespace(self): + """Ensure w14 namespace is declared on the root element.""" + root = self.dom.documentElement + if not 
root.hasAttribute("xmlns:w14"): # type: ignore + root.setAttribute( # type: ignore + "xmlns:w14", + "http://schemas.microsoft.com/office/word/2010/wordml", + ) + + def _inject_attributes_to_nodes(self, nodes): + """Inject RSID, author, and date attributes into DOM nodes where applicable. + + Adds attributes to elements that support them: + - w:r: gets w:rsidR (or w:rsidDel if inside w:del) + - w:p: gets w:rsidR, w:rsidRDefault, w:rsidP, w14:paraId, w14:textId + - w:t: gets xml:space="preserve" if text has leading/trailing whitespace + - w:ins, w:del: get w:id, w:author, w:date, w16du:dateUtc + - w:comment: gets w:author, w:date, w:initials + - w16cex:commentExtensible: gets w16cex:dateUtc + + Args: + nodes: List of DOM nodes to process + """ + from datetime import datetime, timezone + + timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + + def is_inside_deletion(elem): + """Check if element is inside a w:del element.""" + parent = elem.parentNode + while parent: + if parent.nodeType == parent.ELEMENT_NODE and parent.tagName == "w:del": + return True + parent = parent.parentNode + return False + + def add_rsid_to_p(elem): + if not elem.hasAttribute("w:rsidR"): + elem.setAttribute("w:rsidR", self.rsid) + if not elem.hasAttribute("w:rsidRDefault"): + elem.setAttribute("w:rsidRDefault", self.rsid) + if not elem.hasAttribute("w:rsidP"): + elem.setAttribute("w:rsidP", self.rsid) + # Add w14:paraId and w14:textId if not present + if not elem.hasAttribute("w14:paraId"): + self._ensure_w14_namespace() + elem.setAttribute("w14:paraId", _generate_hex_id()) + if not elem.hasAttribute("w14:textId"): + self._ensure_w14_namespace() + elem.setAttribute("w14:textId", _generate_hex_id()) + + def add_rsid_to_r(elem): + # Use w:rsidDel for inside , otherwise w:rsidR + if is_inside_deletion(elem): + if not elem.hasAttribute("w:rsidDel"): + elem.setAttribute("w:rsidDel", self.rsid) + else: + if not elem.hasAttribute("w:rsidR"): + elem.setAttribute("w:rsidR", 
self.rsid) + + def add_tracked_change_attrs(elem): + # Auto-assign w:id if not present + if not elem.hasAttribute("w:id"): + elem.setAttribute("w:id", str(self._get_next_change_id())) + if not elem.hasAttribute("w:author"): + elem.setAttribute("w:author", self.author) + if not elem.hasAttribute("w:date"): + elem.setAttribute("w:date", timestamp) + # Add w16du:dateUtc for tracked changes (same as w:date since we generate UTC timestamps) + if elem.tagName in ("w:ins", "w:del") and not elem.hasAttribute( + "w16du:dateUtc" + ): + self._ensure_w16du_namespace() + elem.setAttribute("w16du:dateUtc", timestamp) + + def add_comment_attrs(elem): + if not elem.hasAttribute("w:author"): + elem.setAttribute("w:author", self.author) + if not elem.hasAttribute("w:date"): + elem.setAttribute("w:date", timestamp) + if not elem.hasAttribute("w:initials"): + elem.setAttribute("w:initials", self.initials) + + def add_comment_extensible_date(elem): + # Add w16cex:dateUtc for comment extensible elements + if not elem.hasAttribute("w16cex:dateUtc"): + self._ensure_w16cex_namespace() + elem.setAttribute("w16cex:dateUtc", timestamp) + + def add_xml_space_to_t(elem): + # Add xml:space="preserve" to w:t if text has leading/trailing whitespace + if ( + elem.firstChild + and elem.firstChild.nodeType == elem.firstChild.TEXT_NODE + ): + text = elem.firstChild.data + if text and (text[0].isspace() or text[-1].isspace()): + if not elem.hasAttribute("xml:space"): + elem.setAttribute("xml:space", "preserve") + + for node in nodes: + if node.nodeType != node.ELEMENT_NODE: + continue + + # Handle the node itself + if node.tagName == "w:p": + add_rsid_to_p(node) + elif node.tagName == "w:r": + add_rsid_to_r(node) + elif node.tagName == "w:t": + add_xml_space_to_t(node) + elif node.tagName in ("w:ins", "w:del"): + add_tracked_change_attrs(node) + elif node.tagName == "w:comment": + add_comment_attrs(node) + elif node.tagName == "w16cex:commentExtensible": + add_comment_extensible_date(node) + + # 
Process descendants (getElementsByTagName doesn't return the element itself) + for elem in node.getElementsByTagName("w:p"): + add_rsid_to_p(elem) + for elem in node.getElementsByTagName("w:r"): + add_rsid_to_r(elem) + for elem in node.getElementsByTagName("w:t"): + add_xml_space_to_t(elem) + for tag in ("w:ins", "w:del"): + for elem in node.getElementsByTagName(tag): + add_tracked_change_attrs(elem) + for elem in node.getElementsByTagName("w:comment"): + add_comment_attrs(elem) + for elem in node.getElementsByTagName("w16cex:commentExtensible"): + add_comment_extensible_date(elem) + + def replace_node(self, elem, new_content): + """Replace node with automatic attribute injection.""" + nodes = super().replace_node(elem, new_content) + self._inject_attributes_to_nodes(nodes) + return nodes + + def insert_after(self, elem, xml_content): + """Insert after with automatic attribute injection.""" + nodes = super().insert_after(elem, xml_content) + self._inject_attributes_to_nodes(nodes) + return nodes + + def insert_before(self, elem, xml_content): + """Insert before with automatic attribute injection.""" + nodes = super().insert_before(elem, xml_content) + self._inject_attributes_to_nodes(nodes) + return nodes + + def append_to(self, elem, xml_content): + """Append to with automatic attribute injection.""" + nodes = super().append_to(elem, xml_content) + self._inject_attributes_to_nodes(nodes) + return nodes + + def revert_insertion(self, elem): + """Reject an insertion by wrapping its content in a deletion. + + Wraps all runs inside w:ins in w:del, converting w:t to w:delText. + Can process a single w:ins element or a container element with multiple w:ins. + + Args: + elem: Element to process (w:ins, w:p, w:body, etc.) 
+ + Returns: + list: List containing the processed element(s) + + Raises: + ValueError: If the element contains no w:ins elements + + Example: + # Reject a single insertion + ins = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"}) + doc["word/document.xml"].revert_insertion(ins) + + # Reject all insertions in a paragraph + para = doc["word/document.xml"].get_node(tag="w:p", line_number=42) + doc["word/document.xml"].revert_insertion(para) + """ + # Collect insertions + ins_elements = [] + if elem.tagName == "w:ins": + ins_elements.append(elem) + else: + ins_elements.extend(elem.getElementsByTagName("w:ins")) + + # Validate that there are insertions to reject + if not ins_elements: + raise ValueError( + f"revert_insertion requires w:ins elements. " + f"The provided element <{elem.tagName}> contains no insertions. " + ) + + # Process all insertions - wrap all children in w:del + for ins_elem in ins_elements: + runs = list(ins_elem.getElementsByTagName("w:r")) + if not runs: + continue + + # Create deletion wrapper + del_wrapper = self.dom.createElement("w:del") + + # Process each run + for run in runs: + # Convert w:t → w:delText and w:rsidR → w:rsidDel + if run.hasAttribute("w:rsidR"): + run.setAttribute("w:rsidDel", run.getAttribute("w:rsidR")) + run.removeAttribute("w:rsidR") + elif not run.hasAttribute("w:rsidDel"): + run.setAttribute("w:rsidDel", self.rsid) + + for t_elem in list(run.getElementsByTagName("w:t")): + del_text = self.dom.createElement("w:delText") + # Copy ALL child nodes (not just firstChild) to handle entities + while t_elem.firstChild: + del_text.appendChild(t_elem.firstChild) + for i in range(t_elem.attributes.length): + attr = t_elem.attributes.item(i) + del_text.setAttribute(attr.name, attr.value) + t_elem.parentNode.replaceChild(del_text, t_elem) + + # Move all children from ins to del wrapper + while ins_elem.firstChild: + del_wrapper.appendChild(ins_elem.firstChild) + + # Add del wrapper back to ins + 
ins_elem.appendChild(del_wrapper) + + # Inject attributes to the deletion wrapper + self._inject_attributes_to_nodes([del_wrapper]) + + return [elem] + + def revert_deletion(self, elem): + """Reject a deletion by re-inserting the deleted content. + + Creates w:ins elements after each w:del, copying deleted content and + converting w:delText back to w:t. + Can process a single w:del element or a container element with multiple w:del. + + Args: + elem: Element to process (w:del, w:p, w:body, etc.) + + Returns: + list: If elem is w:del, returns [elem, new_ins]. Otherwise returns [elem]. + + Raises: + ValueError: If the element contains no w:del elements + + Example: + # Reject a single deletion - returns [w:del, w:ins] + del_elem = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "3"}) + nodes = doc["word/document.xml"].revert_deletion(del_elem) + + # Reject all deletions in a paragraph - returns [para] + para = doc["word/document.xml"].get_node(tag="w:p", line_number=42) + nodes = doc["word/document.xml"].revert_deletion(para) + """ + # Collect deletions FIRST - before we modify the DOM + del_elements = [] + is_single_del = elem.tagName == "w:del" + + if is_single_del: + del_elements.append(elem) + else: + del_elements.extend(elem.getElementsByTagName("w:del")) + + # Validate that there are deletions to reject + if not del_elements: + raise ValueError( + f"revert_deletion requires w:del elements. " + f"The provided element <{elem.tagName}> contains no deletions. 
" + ) + + # Track created insertion (only relevant if elem is a single w:del) + created_insertion = None + + # Process all deletions - create insertions that copy the deleted content + for del_elem in del_elements: + # Clone the deleted runs and convert them to insertions + runs = list(del_elem.getElementsByTagName("w:r")) + if not runs: + continue + + # Create insertion wrapper + ins_elem = self.dom.createElement("w:ins") + + for run in runs: + # Clone the run + new_run = run.cloneNode(True) + + # Convert w:delText → w:t + for del_text in list(new_run.getElementsByTagName("w:delText")): + t_elem = self.dom.createElement("w:t") + # Copy ALL child nodes (not just firstChild) to handle entities + while del_text.firstChild: + t_elem.appendChild(del_text.firstChild) + for i in range(del_text.attributes.length): + attr = del_text.attributes.item(i) + t_elem.setAttribute(attr.name, attr.value) + del_text.parentNode.replaceChild(t_elem, del_text) + + # Update run attributes: w:rsidDel → w:rsidR + if new_run.hasAttribute("w:rsidDel"): + new_run.setAttribute("w:rsidR", new_run.getAttribute("w:rsidDel")) + new_run.removeAttribute("w:rsidDel") + elif not new_run.hasAttribute("w:rsidR"): + new_run.setAttribute("w:rsidR", self.rsid) + + ins_elem.appendChild(new_run) + + # Insert the new insertion after the deletion + nodes = self.insert_after(del_elem, ins_elem.toxml()) + + # If processing a single w:del, track the created insertion + if is_single_del and nodes: + created_insertion = nodes[0] + + # Return based on input type + if is_single_del and created_insertion: + return [elem, created_insertion] + else: + return [elem] + + @staticmethod + def suggest_paragraph(xml_content: str) -> str: + """Transform paragraph XML to add tracked change wrapping for insertion. + + Wraps runs in and adds to w:rPr in w:pPr for numbered lists. 
+ + Args: + xml_content: XML string containing a element + + Returns: + str: Transformed XML with tracked change wrapping + """ + wrapper = f'{xml_content}' + doc = minidom.parseString(wrapper) + para = doc.getElementsByTagName("w:p")[0] + + # Ensure w:pPr exists + pPr_list = para.getElementsByTagName("w:pPr") + if not pPr_list: + pPr = doc.createElement("w:pPr") + para.insertBefore( + pPr, para.firstChild + ) if para.firstChild else para.appendChild(pPr) + else: + pPr = pPr_list[0] + + # Ensure w:rPr exists in w:pPr + rPr_list = pPr.getElementsByTagName("w:rPr") + if not rPr_list: + rPr = doc.createElement("w:rPr") + pPr.appendChild(rPr) + else: + rPr = rPr_list[0] + + # Add to w:rPr + ins_marker = doc.createElement("w:ins") + rPr.insertBefore( + ins_marker, rPr.firstChild + ) if rPr.firstChild else rPr.appendChild(ins_marker) + + # Wrap all non-pPr children in + ins_wrapper = doc.createElement("w:ins") + for child in [c for c in para.childNodes if c.nodeName != "w:pPr"]: + para.removeChild(child) + ins_wrapper.appendChild(child) + para.appendChild(ins_wrapper) + + return para.toxml() + + def suggest_deletion(self, elem): + """Mark a w:r or w:p element as deleted with tracked changes (in-place DOM manipulation). 
+ + For w:r: wraps in w:del, converts w:t to w:delText, preserves w:rPr + For w:p (regular): wraps content in w:del, converts w:t to w:delText + For w:p (numbered list): adds a w:del marker to w:rPr in w:pPr, wraps content in w:del + + Args: + elem: A w:r or w:p DOM element without existing tracked changes + + Returns: + Element: The new w:del wrapper (for w:r) or the modified paragraph (for w:p) + + Raises: + ValueError: If element has existing tracked changes or invalid structure + """ + if elem.nodeName == "w:r": + # Check for existing w:delText + if elem.getElementsByTagName("w:delText"): + raise ValueError("w:r element already contains w:delText") + + # Convert w:t → w:delText + for t_elem in list(elem.getElementsByTagName("w:t")): + del_text = self.dom.createElement("w:delText") + # Copy ALL child nodes (not just firstChild) to handle entities + while t_elem.firstChild: + del_text.appendChild(t_elem.firstChild) + # Preserve attributes like xml:space + for i in range(t_elem.attributes.length): + attr = t_elem.attributes.item(i) + del_text.setAttribute(attr.name, attr.value) + t_elem.parentNode.replaceChild(del_text, t_elem) + + # Update run attributes: w:rsidR → w:rsidDel + if elem.hasAttribute("w:rsidR"): + elem.setAttribute("w:rsidDel", elem.getAttribute("w:rsidR")) + elem.removeAttribute("w:rsidR") + elif not elem.hasAttribute("w:rsidDel"): + elem.setAttribute("w:rsidDel", self.rsid) + + # Wrap in w:del + del_wrapper = self.dom.createElement("w:del") + parent = elem.parentNode + parent.insertBefore(del_wrapper, elem) + parent.removeChild(elem) + del_wrapper.appendChild(elem) + + # Inject attributes to the deletion wrapper + self._inject_attributes_to_nodes([del_wrapper]) + + return del_wrapper + + elif elem.nodeName == "w:p": + # Check for existing tracked changes + if elem.getElementsByTagName("w:ins") or elem.getElementsByTagName("w:del"): + raise ValueError("w:p element already contains tracked changes") + + # Check if it's a numbered list item + pPr_list = elem.getElementsByTagName("w:pPr") + is_numbered = pPr_list and
pPr_list[0].getElementsByTagName("w:numPr") + + if is_numbered: + # Add to w:rPr in w:pPr + pPr = pPr_list[0] + rPr_list = pPr.getElementsByTagName("w:rPr") + + if not rPr_list: + rPr = self.dom.createElement("w:rPr") + pPr.appendChild(rPr) + else: + rPr = rPr_list[0] + + # Add marker + del_marker = self.dom.createElement("w:del") + rPr.insertBefore( + del_marker, rPr.firstChild + ) if rPr.firstChild else rPr.appendChild(del_marker) + + # Convert w:t → w:delText in all runs + for t_elem in list(elem.getElementsByTagName("w:t")): + del_text = self.dom.createElement("w:delText") + # Copy ALL child nodes (not just firstChild) to handle entities + while t_elem.firstChild: + del_text.appendChild(t_elem.firstChild) + # Preserve attributes like xml:space + for i in range(t_elem.attributes.length): + attr = t_elem.attributes.item(i) + del_text.setAttribute(attr.name, attr.value) + t_elem.parentNode.replaceChild(del_text, t_elem) + + # Update run attributes: w:rsidR → w:rsidDel + for run in elem.getElementsByTagName("w:r"): + if run.hasAttribute("w:rsidR"): + run.setAttribute("w:rsidDel", run.getAttribute("w:rsidR")) + run.removeAttribute("w:rsidR") + elif not run.hasAttribute("w:rsidDel"): + run.setAttribute("w:rsidDel", self.rsid) + + # Wrap all non-pPr children in + del_wrapper = self.dom.createElement("w:del") + for child in [c for c in elem.childNodes if c.nodeName != "w:pPr"]: + elem.removeChild(child) + del_wrapper.appendChild(child) + elem.appendChild(del_wrapper) + + # Inject attributes to the deletion wrapper + self._inject_attributes_to_nodes([del_wrapper]) + + return elem + + else: + raise ValueError(f"Element must be w:r or w:p, got {elem.nodeName}") + + +def _generate_hex_id() -> str: + """Generate random 8-character hex ID for para/durable IDs. + + Values are constrained to be less than 0x7FFFFFFF per OOXML spec: + - paraId must be < 0x80000000 + - durableId must be < 0x7FFFFFFF + We use the stricter constraint (0x7FFFFFFF) for both. 
+ """ + return f"{random.randint(1, 0x7FFFFFFE):08X}" + + +def _generate_rsid() -> str: + """Generate random 8-character hex RSID.""" + return "".join(random.choices("0123456789ABCDEF", k=8)) + + +class Document: + """Manages comments and tracked changes in unpacked Word documents.""" + + def __init__( + self, + unpacked_dir, + rsid=None, + track_revisions=False, + author="GLM", + initials="C", + ): + """ + Initialize with path to unpacked Word document directory. + Automatically sets up comment infrastructure (people.xml, RSIDs). + + Args: + unpacked_dir: Path to unpacked DOCX directory (must contain word/ subdirectory) + rsid: Optional RSID to use for all comment elements. If not provided, one will be generated. + track_revisions: If True, enables track revisions in settings.xml (default: False) + author: Default author name for comments and tracked changes (default: "GLM") + initials: Default author initials for comments (default: "C") + """ + self.original_path = Path(unpacked_dir) + + if not self.original_path.exists() or not self.original_path.is_dir(): + raise ValueError(f"Directory not found: {unpacked_dir}") + + # Create temporary directory with subdirectories for unpacked content and baseline + self.temp_dir = tempfile.mkdtemp(prefix="docx_") + self.unpacked_path = Path(self.temp_dir) / "unpacked" + shutil.copytree(self.original_path, self.unpacked_path) + + # Pack original directory into temporary .docx for validation baseline (outside unpacked dir) + self.original_docx = Path(self.temp_dir) / "original.docx" + pack_document(self.original_path, self.original_docx, validate=False) + + self.word_path = self.unpacked_path / "word" + + # Generate RSID if not provided + self.rsid = rsid if rsid else _generate_rsid() + print(f"Using RSID: {self.rsid}") + + # Set default author and initials + self.author = author + self.initials = initials + + # Cache for lazy-loaded editors + self._editors = {} + + # Comment file paths + self.comments_path = self.word_path / "comments.xml" +
self.comments_extended_path = self.word_path / "commentsExtended.xml" + self.comments_ids_path = self.word_path / "commentsIds.xml" + self.comments_extensible_path = self.word_path / "commentsExtensible.xml" + + # Load existing comments and determine next ID (before setup modifies files) + self.existing_comments = self._load_existing_comments() + self.next_comment_id = self._get_next_comment_id() + + # Convenient access to document.xml editor (semi-private) + self._document = self["word/document.xml"] + + # Setup tracked changes infrastructure + self._setup_tracking(track_revisions=track_revisions) + + # Add author to people.xml + self._add_author_to_people(author) + + def __getitem__(self, xml_path: str) -> DocxXMLEditor: + """ + Get or create a DocxXMLEditor for the specified XML file. + + Enables lazy-loaded editors with bracket notation: + node = doc["word/document.xml"].get_node(tag="w:p", line_number=42) + + Args: + xml_path: Relative path to XML file (e.g., "word/document.xml", "word/comments.xml") + + Returns: + DocxXMLEditor instance for the specified file + + Raises: + ValueError: If the file does not exist + + Example: + # Get node from document.xml + node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"}) + + # Get node from comments.xml + comment = doc["word/comments.xml"].get_node(tag="w:comment", attrs={"w:id": "0"}) + """ + if xml_path not in self._editors: + file_path = self.unpacked_path / xml_path + if not file_path.exists(): + raise ValueError(f"XML file not found: {xml_path}") + # Use DocxXMLEditor with RSID, author, and initials for all editors + self._editors[xml_path] = DocxXMLEditor( + file_path, rsid=self.rsid, author=self.author, initials=self.initials + ) + return self._editors[xml_path] + + def add_comment(self, start, end, text: str) -> int: + """ + Add a comment spanning from one element to another. 
+ + Args: + start: DOM element for the starting point + end: DOM element for the ending point + text: Comment content + + Returns: + The comment ID that was created + + Example: + start_node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"}) + end_node = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "2"}) + doc.add_comment(start=start_node, end=end_node, text="Explanation") + """ + comment_id = self.next_comment_id + para_id = _generate_hex_id() + durable_id = _generate_hex_id() + timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + + # Add comment ranges to document.xml immediately + self._document.insert_before(start, self._comment_range_start_xml(comment_id)) + + # If end node is a paragraph, append comment markup inside it + # Otherwise insert after it (for run-level anchors) + if end.tagName == "w:p": + self._document.append_to(end, self._comment_range_end_xml(comment_id)) + else: + self._document.insert_after(end, self._comment_range_end_xml(comment_id)) + + # Add to comments.xml immediately + self._add_to_comments_xml( + comment_id, para_id, text, self.author, self.initials, timestamp + ) + + # Add to commentsExtended.xml immediately + self._add_to_comments_extended_xml(para_id, parent_para_id=None) + + # Add to commentsIds.xml immediately + self._add_to_comments_ids_xml(para_id, durable_id) + + # Add to commentsExtensible.xml immediately + self._add_to_comments_extensible_xml(durable_id) + + # Update existing_comments so replies work + self.existing_comments[comment_id] = {"para_id": para_id} + + self.next_comment_id += 1 + return comment_id + + def reply_to_comment( + self, + parent_comment_id: int, + text: str, + ) -> int: + """ + Add a reply to an existing comment.
+ + Args: + parent_comment_id: The w:id of the parent comment to reply to + text: Reply text + + Returns: + The comment ID that was created for the reply + + Example: + cm.reply_to_comment(parent_comment_id=0, text="I agree with this change") + """ + if parent_comment_id not in self.existing_comments: + raise ValueError(f"Parent comment with id={parent_comment_id} not found") + + parent_info = self.existing_comments[parent_comment_id] + comment_id = self.next_comment_id + para_id = _generate_hex_id() + durable_id = _generate_hex_id() + timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + + # Add comment ranges to document.xml immediately + parent_start_elem = self._document.get_node( + tag="w:commentRangeStart", attrs={"w:id": str(parent_comment_id)} + ) + parent_ref_elem = self._document.get_node( + tag="w:commentReference", attrs={"w:id": str(parent_comment_id)} + ) + + self._document.insert_after( + parent_start_elem, self._comment_range_start_xml(comment_id) + ) + parent_ref_run = parent_ref_elem.parentNode + self._document.insert_after( + parent_ref_run, f'' + ) + self._document.insert_after( + parent_ref_run, self._comment_ref_run_xml(comment_id) + ) + + # Add to comments.xml immediately + self._add_to_comments_xml( + comment_id, para_id, text, self.author, self.initials, timestamp + ) + + # Add to commentsExtended.xml immediately (with parent) + self._add_to_comments_extended_xml( + para_id, parent_para_id=parent_info["para_id"] + ) + + # Add to commentsIds.xml immediately + self._add_to_comments_ids_xml(para_id, durable_id) + + # Add to commentsExtensible.xml immediately + self._add_to_comments_extensible_xml(durable_id) + + # Update existing_comments so replies work + self.existing_comments[comment_id] = {"para_id": para_id} + + self.next_comment_id += 1 + return comment_id + + def __del__(self): + """Clean up temporary directory on deletion.""" + if hasattr(self, "temp_dir") and Path(self.temp_dir).exists(): + 
shutil.rmtree(self.temp_dir) + + def validate(self) -> None: + """ + Validate the document against XSD schema and redlining rules. + + Raises: + ValueError: If validation fails. + """ + # Create validators with current state + schema_validator = DOCXSchemaValidator( + self.unpacked_path, self.original_docx, verbose=False + ) + redlining_validator = RedliningValidator( + self.unpacked_path, self.original_docx, verbose=False + ) + + # Run validations + if not schema_validator.validate(): + raise ValueError("Schema validation failed") + if not redlining_validator.validate(): + raise ValueError("Redlining validation failed") + + def save(self, destination=None, validate=True) -> None: + """ + Save all modified XML files to disk and copy to destination directory. + + This persists all changes made via add_comment() and reply_to_comment(). + + Args: + destination: Optional path to save to. If None, saves back to original directory. + validate: If True, validates document before saving (default: True). 
+ """ + # Only ensure comment relationships and content types if comment files exist + if self.comments_path.exists(): + self._ensure_comment_relationships() + self._ensure_comment_content_types() + + # Save all modified XML files in temp directory + for editor in self._editors.values(): + editor.save() + + # Validate by default + if validate: + self.validate() + + # Copy contents from temp directory to destination (or original directory) + target_path = Path(destination) if destination else self.original_path + shutil.copytree(self.unpacked_path, target_path, dirs_exist_ok=True) + + # ==================== Private: Initialization ==================== + + def _get_next_comment_id(self): + """Get the next available comment ID.""" + if not self.comments_path.exists(): + return 0 + + editor = self["word/comments.xml"] + max_id = -1 + for comment_elem in editor.dom.getElementsByTagName("w:comment"): + comment_id = comment_elem.getAttribute("w:id") + if comment_id: + try: + max_id = max(max_id, int(comment_id)) + except ValueError: + pass + return max_id + 1 + + def _load_existing_comments(self): + """Load existing comments from files to enable replies.""" + if not self.comments_path.exists(): + return {} + + editor = self["word/comments.xml"] + existing = {} + + for comment_elem in editor.dom.getElementsByTagName("w:comment"): + comment_id = comment_elem.getAttribute("w:id") + if not comment_id: + continue + + # Find para_id from the w:p element within the comment + para_id = None + for p_elem in comment_elem.getElementsByTagName("w:p"): + para_id = p_elem.getAttribute("w14:paraId") + if para_id: + break + + if not para_id: + continue + + existing[int(comment_id)] = {"para_id": para_id} + + return existing + + # ==================== Private: Setup Methods ==================== + + def _setup_tracking(self, track_revisions=False): + """Set up comment infrastructure in unpacked directory. 
+ + Args: + track_revisions: If True, enables track revisions in settings.xml + """ + # Create or update word/people.xml + people_file = self.word_path / "people.xml" + self._update_people_xml(people_file) + + # Update XML files + self._add_content_type_for_people(self.unpacked_path / "[Content_Types].xml") + self._add_relationship_for_people( + self.word_path / "_rels" / "document.xml.rels" + ) + + # Always add RSID to settings.xml, optionally enable trackRevisions + self._update_settings( + self.word_path / "settings.xml", track_revisions=track_revisions + ) + + def _update_people_xml(self, path): + """Create people.xml if it doesn't exist.""" + if not path.exists(): + # Copy from template + shutil.copy(TEMPLATE_DIR / "people.xml", path) + + def _add_content_type_for_people(self, path): + """Add people.xml content type to [Content_Types].xml if not already present.""" + editor = self["[Content_Types].xml"] + + if self._has_override(editor, "/word/people.xml"): + return + + # Add Override element + root = editor.dom.documentElement + override_xml = '' + editor.append_to(root, override_xml) + + def _add_relationship_for_people(self, path): + """Add people.xml relationship to document.xml.rels if not already present.""" + editor = self["word/_rels/document.xml.rels"] + + if self._has_relationship(editor, "people.xml"): + return + + root = editor.dom.documentElement + root_tag = root.tagName # type: ignore + prefix = root_tag.split(":")[0] + ":" if ":" in root_tag else "" + next_rid = editor.get_next_rid() + + # Create the relationship entry + rel_xml = f'<{prefix}Relationship Id="{next_rid}" Type="http://schemas.microsoft.com/office/2011/relationships/people" Target="people.xml"/>' + editor.append_to(root, rel_xml) + + def _update_settings(self, path, track_revisions=False, update_fields=True): + """Add RSID and optionally enable track revisions and update fields in settings.xml. 
+ + Args: + path: Path to settings.xml + track_revisions: If True, adds trackRevisions element + update_fields: If True, adds updateFields element to auto-update fields on open + + Places elements per OOXML schema order: + - trackRevisions: early (before defaultTabStop) + - updateFields: early (before defaultTabStop) + - rsids: late (after compat) + """ + editor = self["word/settings.xml"] + root = editor.get_node(tag="w:settings") + prefix = root.tagName.split(":")[0] if ":" in root.tagName else "w" + + # Conditionally add trackRevisions if requested + if track_revisions: + track_revisions_exists = any( + elem.tagName == f"{prefix}:trackRevisions" + for elem in editor.dom.getElementsByTagName(f"{prefix}:trackRevisions") + ) + + if not track_revisions_exists: + track_rev_xml = f"<{prefix}:trackRevisions/>" + # Try to insert before documentProtection, defaultTabStop, or at start + inserted = False + for tag in [f"{prefix}:documentProtection", f"{prefix}:defaultTabStop"]: + elements = editor.dom.getElementsByTagName(tag) + if elements: + editor.insert_before(elements[0], track_rev_xml) + inserted = True + break + if not inserted: + # Insert as first child of settings + if root.firstChild: + editor.insert_before(root.firstChild, track_rev_xml) + else: + editor.append_to(root, track_rev_xml) + + # Conditionally add updateFields if requested + if update_fields: + update_fields_exists = any( + elem.tagName == f"{prefix}:updateFields" + for elem in editor.dom.getElementsByTagName(f"{prefix}:updateFields") + ) + + if not update_fields_exists: + update_fields_xml = f'<{prefix}:updateFields {prefix}:val="true"/>' + # Try to insert before defaultTabStop, hyphenationZone, or at start + inserted = False + for tag in [f"{prefix}:defaultTabStop", f"{prefix}:hyphenationZone"]: + elements = editor.dom.getElementsByTagName(tag) + if elements: + editor.insert_before(elements[0], update_fields_xml) + inserted = True + break + if not inserted: + # Insert as first child of settings + if 
root.firstChild: + editor.insert_before(root.firstChild, update_fields_xml) + else: + editor.append_to(root, update_fields_xml) + + # Always check if rsids section exists + rsids_elements = editor.dom.getElementsByTagName(f"{prefix}:rsids") + + if not rsids_elements: + # Add new rsids section + rsids_xml = f'''<{prefix}:rsids> + <{prefix}:rsidRoot {prefix}:val="{self.rsid}"/> + <{prefix}:rsid {prefix}:val="{self.rsid}"/> +</{prefix}:rsids>''' + + # Try to insert after compat, before clrSchemeMapping, or before closing tag + inserted = False + compat_elements = editor.dom.getElementsByTagName(f"{prefix}:compat") + if compat_elements: + editor.insert_after(compat_elements[0], rsids_xml) + inserted = True + + if not inserted: + clr_elements = editor.dom.getElementsByTagName( + f"{prefix}:clrSchemeMapping" + ) + if clr_elements: + editor.insert_before(clr_elements[0], rsids_xml) + inserted = True + + if not inserted: + editor.append_to(root, rsids_xml) + else: + # Check if this rsid already exists + rsids_elem = rsids_elements[0] + rsid_exists = any( + elem.getAttribute(f"{prefix}:val") == self.rsid + for elem in rsids_elem.getElementsByTagName(f"{prefix}:rsid") + ) + + if not rsid_exists: + rsid_xml = f'<{prefix}:rsid {prefix}:val="{self.rsid}"/>' + editor.append_to(rsids_elem, rsid_xml) + + # ==================== Private: XML File Creation ==================== + + def _add_to_comments_xml( + self, comment_id, para_id, text, author, initials, timestamp + ): + """Add a single comment to comments.xml.""" + if not self.comments_path.exists(): + shutil.copy(TEMPLATE_DIR / "comments.xml", self.comments_path) + + editor = self["word/comments.xml"] + root = editor.get_node(tag="w:comments") + + escaped_text = ( + text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;") + ) + # Note: w:rsidR, w:rsidRDefault, w:rsidP on w:p, w:rsidR on w:r, + # and w:author, w:date, w:initials on w:comment are automatically added by DocxXMLEditor + comment_xml = f''' + + + {escaped_text} + +''' + 
editor.append_to(root, comment_xml) + + def _add_to_comments_extended_xml(self, para_id, parent_para_id): + """Add a single comment to commentsExtended.xml.""" + if not self.comments_extended_path.exists(): + shutil.copy( + TEMPLATE_DIR / "commentsExtended.xml", self.comments_extended_path + ) + + editor = self["word/commentsExtended.xml"] + root = editor.get_node(tag="w15:commentsEx") + + if parent_para_id: + xml = f'' + else: + xml = f'' + editor.append_to(root, xml) + + def _add_to_comments_ids_xml(self, para_id, durable_id): + """Add a single comment to commentsIds.xml.""" + if not self.comments_ids_path.exists(): + shutil.copy(TEMPLATE_DIR / "commentsIds.xml", self.comments_ids_path) + + editor = self["word/commentsIds.xml"] + root = editor.get_node(tag="w16cid:commentsIds") + + xml = f'' + editor.append_to(root, xml) + + def _add_to_comments_extensible_xml(self, durable_id): + """Add a single comment to commentsExtensible.xml.""" + if not self.comments_extensible_path.exists(): + shutil.copy( + TEMPLATE_DIR / "commentsExtensible.xml", self.comments_extensible_path + ) + + editor = self["word/commentsExtensible.xml"] + root = editor.get_node(tag="w16cex:commentsExtensible") + + xml = f'' + editor.append_to(root, xml) + + # ==================== Private: XML Fragments ==================== + + def _comment_range_start_xml(self, comment_id): + """Generate XML for comment range start.""" + return f'' + + def _comment_range_end_xml(self, comment_id): + """Generate XML for comment range end with reference run. + + Note: w:rsidR is automatically added by DocxXMLEditor. + """ + return f''' + + + +''' + + def _comment_ref_run_xml(self, comment_id): + """Generate XML for comment reference run. + + Note: w:rsidR is automatically added by DocxXMLEditor. 
+ """ + return f''' + + +''' + + # ==================== Private: Metadata Updates ==================== + + def _has_relationship(self, editor, target): + """Check if a relationship with given target exists.""" + for rel_elem in editor.dom.getElementsByTagName("Relationship"): + if rel_elem.getAttribute("Target") == target: + return True + return False + + def _has_override(self, editor, part_name): + """Check if an override with given part name exists.""" + for override_elem in editor.dom.getElementsByTagName("Override"): + if override_elem.getAttribute("PartName") == part_name: + return True + return False + + def _has_author(self, editor, author): + """Check if an author already exists in people.xml.""" + for person_elem in editor.dom.getElementsByTagName("w15:person"): + if person_elem.getAttribute("w15:author") == author: + return True + return False + + def _add_author_to_people(self, author): + """Add author to people.xml (called during initialization).""" + people_path = self.word_path / "people.xml" + + # people.xml should already exist from _setup_tracking + if not people_path.exists(): + raise ValueError("people.xml should exist after _setup_tracking") + + editor = self["word/people.xml"] + root = editor.get_node(tag="w15:people") + + # Check if author already exists + if self._has_author(editor, author): + return + + # Add author with proper XML escaping to prevent injection + escaped_author = html.escape(author, quote=True) + person_xml = f''' + +''' + editor.append_to(root, person_xml) + + def _ensure_comment_relationships(self): + """Ensure word/_rels/document.xml.rels has comment relationships.""" + editor = self["word/_rels/document.xml.rels"] + + if self._has_relationship(editor, "comments.xml"): + return + + root = editor.dom.documentElement + root_tag = root.tagName # type: ignore + prefix = root_tag.split(":")[0] + ":" if ":" in root_tag else "" + next_rid_num = int(editor.get_next_rid()[3:]) + + # Add relationship elements + rels = [ + ( + 
next_rid_num, + "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments", + "comments.xml", + ), + ( + next_rid_num + 1, + "http://schemas.microsoft.com/office/2011/relationships/commentsExtended", + "commentsExtended.xml", + ), + ( + next_rid_num + 2, + "http://schemas.microsoft.com/office/2016/09/relationships/commentsIds", + "commentsIds.xml", + ), + ( + next_rid_num + 3, + "http://schemas.microsoft.com/office/2018/08/relationships/commentsExtensible", + "commentsExtensible.xml", + ), + ] + + for rel_id, rel_type, target in rels: + rel_xml = f'<{prefix}Relationship Id="rId{rel_id}" Type="{rel_type}" Target="{target}"/>' + editor.append_to(root, rel_xml) + + def _ensure_comment_content_types(self): + """Ensure [Content_Types].xml has comment content types.""" + editor = self["[Content_Types].xml"] + + if self._has_override(editor, "/word/comments.xml"): + return + + root = editor.dom.documentElement + + # Add Override elements + overrides = [ + ( + "/word/comments.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml", + ), + ( + "/word/commentsExtended.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.commentsExtended+xml", + ), + ( + "/word/commentsIds.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.commentsIds+xml", + ), + ( + "/word/commentsExtensible.xml", + "application/vnd.openxmlformats-officedocument.wordprocessingml.commentsExtensible+xml", + ), + ] + + for part_name, content_type in overrides: + override_xml = ( + f'' + ) + editor.append_to(root, override_xml) diff --git a/skills/docx/scripts/templates/comments.xml b/skills/docx/scripts/templates/comments.xml new file mode 100755 index 0000000..b5dace0 --- /dev/null +++ b/skills/docx/scripts/templates/comments.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/skills/docx/scripts/templates/commentsExtended.xml b/skills/docx/scripts/templates/commentsExtended.xml new file mode 100755 index 
0000000..b4cf23e --- /dev/null +++ b/skills/docx/scripts/templates/commentsExtended.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/skills/docx/scripts/templates/commentsExtensible.xml b/skills/docx/scripts/templates/commentsExtensible.xml new file mode 100755 index 0000000..e32a05e --- /dev/null +++ b/skills/docx/scripts/templates/commentsExtensible.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/skills/docx/scripts/templates/commentsIds.xml b/skills/docx/scripts/templates/commentsIds.xml new file mode 100755 index 0000000..d04bc8e --- /dev/null +++ b/skills/docx/scripts/templates/commentsIds.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/skills/docx/scripts/templates/people.xml b/skills/docx/scripts/templates/people.xml new file mode 100755 index 0000000..a839caf --- /dev/null +++ b/skills/docx/scripts/templates/people.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/skills/docx/scripts/utilities.py b/skills/docx/scripts/utilities.py new file mode 100755 index 0000000..d92dae6 --- /dev/null +++ b/skills/docx/scripts/utilities.py @@ -0,0 +1,374 @@ +#!/usr/bin/env python3 +""" +Utilities for editing OOXML documents. + +This module provides XMLEditor, a tool for manipulating XML files with support for +line-number-based node finding and DOM manipulation. Each element is automatically +annotated with its original line and column position during parsing. 
+ +Example usage: + editor = XMLEditor("document.xml") + + # Find node by line number or range + elem = editor.get_node(tag="w:r", line_number=519) + elem = editor.get_node(tag="w:p", line_number=range(100, 200)) + + # Find node by text content + elem = editor.get_node(tag="w:p", contains="specific text") + + # Find node by attributes + elem = editor.get_node(tag="w:r", attrs={"w:id": "target"}) + + # Combine filters + elem = editor.get_node(tag="w:p", line_number=range(1, 50), contains="text") + + # Replace, insert, or manipulate + new_elem = editor.replace_node(elem, "new text") + editor.insert_after(new_elem, "more") + + # Save changes + editor.save() +""" + +import html +from pathlib import Path +from typing import Optional, Union + +import defusedxml.minidom +import defusedxml.sax + + +class XMLEditor: + """ + Editor for manipulating OOXML XML files with line-number-based node finding. + + This class parses XML files and tracks the original line and column position + of each element. This enables finding nodes by their line number in the original + file, which is useful when working with Read tool output. + + Attributes: + xml_path: Path to the XML file being edited + encoding: Detected encoding of the XML file ('ascii' or 'utf-8') + dom: Parsed DOM tree with parse_position attributes on elements + """ + + def __init__(self, xml_path): + """ + Initialize with path to XML file and parse with line number tracking. 
+ + Args: + xml_path: Path to XML file to edit (str or Path) + + Raises: + ValueError: If the XML file does not exist + """ + self.xml_path = Path(xml_path) + if not self.xml_path.exists(): + raise ValueError(f"XML file not found: {xml_path}") + + with open(self.xml_path, "rb") as f: + header = f.read(200).decode("utf-8", errors="ignore") + self.encoding = "ascii" if 'encoding="ascii"' in header else "utf-8" + + parser = _create_line_tracking_parser() + self.dom = defusedxml.minidom.parse(str(self.xml_path), parser) + + def get_node( + self, + tag: str, + attrs: Optional[dict[str, str]] = None, + line_number: Optional[Union[int, range]] = None, + contains: Optional[str] = None, + ): + """ + Get a DOM element by tag and identifier. + + Finds an element by its line number in the original file, by matching + attribute values, by contained text, or by any combination of these filters. + Exactly one match must be found. + + Args: + tag: The XML tag name (e.g., "w:del", "w:ins", "w:r") + attrs: Dictionary of attribute name-value pairs to match (e.g., {"w:id": "1"}) + line_number: Line number (int) or line range (range) in original XML file (1-indexed) + contains: Text string that must appear in any text node within the element. + Supports both entity notation (&#8220;) and Unicode characters (\u201c).
+ + Returns: + defusedxml.minidom.Element: The matching DOM element + + Raises: + ValueError: If node not found or multiple matches found + + Example: + elem = editor.get_node(tag="w:r", line_number=519) + elem = editor.get_node(tag="w:r", line_number=range(100, 200)) + elem = editor.get_node(tag="w:del", attrs={"w:id": "1"}) + elem = editor.get_node(tag="w:p", attrs={"w14:paraId": "12345678"}) + elem = editor.get_node(tag="w:commentRangeStart", attrs={"w:id": "0"}) + elem = editor.get_node(tag="w:p", contains="specific text") + elem = editor.get_node(tag="w:t", contains="&#8220;Agreement") # Entity notation + elem = editor.get_node(tag="w:t", contains="\u201cAgreement") # Unicode character + """ + matches = [] + for elem in self.dom.getElementsByTagName(tag): + # Check line_number filter + if line_number is not None: + parse_pos = getattr(elem, "parse_position", (None,)) + elem_line = parse_pos[0] + + # Handle both single line number and range + if isinstance(line_number, range): + if elem_line not in line_number: + continue + else: + if elem_line != line_number: + continue + + # Check attrs filter + if attrs is not None: + if not all( + elem.getAttribute(attr_name) == attr_value + for attr_name, attr_value in attrs.items() + ): + continue + + # Check contains filter + if contains is not None: + elem_text = self._get_element_text(elem) + # Normalize the search string: convert HTML entities to Unicode characters + # This allows searching for both "&#8220;Rowan" and "“Rowan" + normalized_contains = html.unescape(contains) + if normalized_contains not in elem_text: + continue + + # If all applicable filters passed, this is a match + matches.append(elem) + + if not matches: + # Build descriptive error message + filters = [] + if line_number is not None: + line_str = ( + f"lines {line_number.start}-{line_number.stop - 1}" + if isinstance(line_number, range) + else f"line {line_number}" + ) + filters.append(f"at {line_str}") + if attrs 
is not None: + filters.append(f"with
attributes {attrs}") + if contains is not None: + filters.append(f"containing '{contains}'") + + filter_desc = " ".join(filters) if filters else "" + base_msg = f"Node not found: <{tag}> {filter_desc}".strip() + + # Add helpful hint based on filters used + if contains: + hint = "Text may be split across elements or use different wording." + elif line_number: + hint = "Line numbers may have changed if document was modified." + elif attrs: + hint = "Verify attribute values are correct." + else: + hint = "Try adding filters (attrs, line_number, or contains)." + + raise ValueError(f"{base_msg}. {hint}") + if len(matches) > 1: + raise ValueError( + f"Multiple nodes found: <{tag}>. " + f"Add more filters (attrs, line_number, or contains) to narrow the search." + ) + return matches[0] + + def _get_element_text(self, elem): + """ + Recursively extract all text content from an element. + + Skips text nodes that contain only whitespace (spaces, tabs, newlines), + which typically represent XML formatting rather than document content. + + Args: + elem: defusedxml.minidom.Element to extract text from + + Returns: + str: Concatenated text from all non-whitespace text nodes within the element + """ + text_parts = [] + for node in elem.childNodes: + if node.nodeType == node.TEXT_NODE: + # Skip whitespace-only text nodes (XML formatting) + if node.data.strip(): + text_parts.append(node.data) + elif node.nodeType == node.ELEMENT_NODE: + text_parts.append(self._get_element_text(node)) + return "".join(text_parts) + + def replace_node(self, elem, new_content): + """ + Replace a DOM element with new XML content. 
+ + Args: + elem: defusedxml.minidom.Element to replace + new_content: String containing XML to replace the node with + + Returns: + List[defusedxml.minidom.Node]: All inserted nodes + + Example: + new_nodes = editor.replace_node(old_elem, "text") + """ + parent = elem.parentNode + nodes = self._parse_fragment(new_content) + for node in nodes: + parent.insertBefore(node, elem) + parent.removeChild(elem) + return nodes + + def insert_after(self, elem, xml_content): + """ + Insert XML content after a DOM element. + + Args: + elem: defusedxml.minidom.Element to insert after + xml_content: String containing XML to insert + + Returns: + List[defusedxml.minidom.Node]: All inserted nodes + + Example: + new_nodes = editor.insert_after(elem, "text") + """ + parent = elem.parentNode + next_sibling = elem.nextSibling + nodes = self._parse_fragment(xml_content) + for node in nodes: + if next_sibling: + parent.insertBefore(node, next_sibling) + else: + parent.appendChild(node) + return nodes + + def insert_before(self, elem, xml_content): + """ + Insert XML content before a DOM element. + + Args: + elem: defusedxml.minidom.Element to insert before + xml_content: String containing XML to insert + + Returns: + List[defusedxml.minidom.Node]: All inserted nodes + + Example: + new_nodes = editor.insert_before(elem, "text") + """ + parent = elem.parentNode + nodes = self._parse_fragment(xml_content) + for node in nodes: + parent.insertBefore(node, elem) + return nodes + + def append_to(self, elem, xml_content): + """ + Append XML content as a child of a DOM element. 
+ + Args: + elem: defusedxml.minidom.Element to append to + xml_content: String containing XML to append + + Returns: + List[defusedxml.minidom.Node]: All inserted nodes + + Example: + new_nodes = editor.append_to(elem, "text") + """ + nodes = self._parse_fragment(xml_content) + for node in nodes: + elem.appendChild(node) + return nodes + + def get_next_rid(self): + """Get the next available rId for relationships files.""" + max_id = 0 + for rel_elem in self.dom.getElementsByTagName("Relationship"): + rel_id = rel_elem.getAttribute("Id") + if rel_id.startswith("rId"): + try: + max_id = max(max_id, int(rel_id[3:])) + except ValueError: + pass + return f"rId{max_id + 1}" + + def save(self): + """ + Save the edited XML back to the file. + + Serializes the DOM tree and writes it back to the original file path, + preserving the original encoding (ascii or utf-8). + """ + content = self.dom.toxml(encoding=self.encoding) + self.xml_path.write_bytes(content) + + def _parse_fragment(self, xml_content): + """ + Parse XML fragment and return list of imported nodes. 
+ + Args: + xml_content: String containing XML fragment + + Returns: + List of defusedxml.minidom.Node objects imported into this document + + Raises: + AssertionError: If fragment contains no element nodes + """ + # Extract namespace declarations from the root document element + root_elem = self.dom.documentElement + namespaces = [] + if root_elem and root_elem.attributes: + for i in range(root_elem.attributes.length): + attr = root_elem.attributes.item(i) + if attr.name.startswith("xmlns"): # type: ignore + namespaces.append(f'{attr.name}="{attr.value}"') # type: ignore + + ns_decl = " ".join(namespaces) + wrapper = f"<root {ns_decl}>{xml_content}</root>" + fragment_doc = defusedxml.minidom.parseString(wrapper) + nodes = [ + self.dom.importNode(child, deep=True) + for child in fragment_doc.documentElement.childNodes # type: ignore + ] + elements = [n for n in nodes if n.nodeType == n.ELEMENT_NODE] + assert elements, "Fragment must contain at least one element" + return nodes + + +def _create_line_tracking_parser(): + """ + Create a SAX parser that tracks line and column numbers for each element. + + Monkey patches the SAX content handler to store the current line and column + position from the underlying expat parser onto each element as a parse_position + attribute (line, column) tuple.
+ + Returns: + defusedxml.sax.xmlreader.XMLReader: Configured SAX parser + """ + + def set_content_handler(dom_handler): + def startElementNS(name, tagName, attrs): + orig_start_cb(name, tagName, attrs) + cur_elem = dom_handler.elementStack[-1] + cur_elem.parse_position = ( + parser._parser.CurrentLineNumber, # type: ignore + parser._parser.CurrentColumnNumber, # type: ignore + ) + + orig_start_cb = dom_handler.startElementNS + dom_handler.startElementNS = startElementNS + orig_set_content_handler(dom_handler) + + parser = defusedxml.sax.make_parser() + orig_set_content_handler = parser.setContentHandler + parser.setContentHandler = set_content_handler # type: ignore + return parser diff --git a/skills/finance/Finance_API_Doc.md b/skills/finance/Finance_API_Doc.md new file mode 100755 index 0000000..2a14480 --- /dev/null +++ b/skills/finance/Finance_API_Doc.md @@ -0,0 +1,445 @@ +# Finance API Complete Documentation + +## API Overview + +Finance API provides comprehensive financial data access interfaces, including real-time market data, historical stock prices, and the latest financial news. 
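The gateway convention this document describes (`{GATEWAY_URL}{API_PREFIX}/{endpoint}` plus the required `X-Z-AI-From: Z` header) can be wrapped in a small Python helper. This is a minimal sketch using only the standard library; the helper names `build_finance_url` and `fetch_finance_json` are hypothetical, and the `GATEWAY_URL`/`API_PREFIX` values are the example values given in this document, not confirmed production endpoints.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Placeholder values taken from this document's examples; substitute your real ones.
GATEWAY_URL = "https://internal-api.z.ai"
API_PREFIX = "/external/finance"


def build_finance_url(endpoint: str, params: Optional[dict] = None,
                      gateway_url: str = GATEWAY_URL,
                      api_prefix: str = API_PREFIX) -> str:
    """Compose {GATEWAY_URL}{API_PREFIX}/{endpoint}, plus an optional query string."""
    url = f"{gateway_url.rstrip('/')}{api_prefix}/{endpoint.lstrip('/')}"
    if params:
        # urlencode percent-encodes reserved characters, e.g. ',' -> %2C and '^' -> %5E
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_finance_json(endpoint: str, params: Optional[dict] = None,
                       timeout: float = 10.0) -> object:
    """GET an endpoint through the gateway, sending the required X-Z-AI-From header."""
    request = urllib.request.Request(
        build_finance_url(endpoint, params),
        headers={"X-Z-AI-From": "Z"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `build_finance_url("v1/markets/quote", {"ticker": "AAPL", "type": "STOCKS"})` produces the same URL as the Quick Start curl command, and passing `{"ticker": "AAPL,MSFT,^SPX"}` yields the percent-encoded ticker list used in the snapshot example.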
+ +### 🌐 Access via API Gateway + +**This API is accessed through the web-dev-ai-gateway unified proxy service.** + +**Gateway Configuration:** +- **Gateway Base URL:** `GATEWAY_URL` (e.g., `https://internal-api.z.ai`) +- **API Path Prefix:** `API_PREFIX` (e.g., `/external/finance`) +- **Authentication:** Automatic (gateway injects `x-rapidapi-host` and `x-rapidapi-key`) +- **Required Header:** `X-Z-AI-From: Z` + +**URL Structure:** +``` +{GATEWAY_URL}{API_PREFIX}/{endpoint} +``` + +**Example:** +- Full URL: `https://internal-api.z.ai/external/finance/v1/markets/search?search=Apple` +- Breakdown: + - `https://internal-api.z.ai` - Gateway base URL (`GATEWAY_URL`) + - `/external/finance` - API path prefix (`API_PREFIX`) + - `/v1/markets/search` - API endpoint path + + +### Quick Start + +```bash +# Get real-time quote for Apple +curl -X GET "{GATEWAY_URL}{API_PREFIX}/v1/markets/quote?ticker=AAPL&type=STOCKS" \ + -H "X-Z-AI-From: Z" +``` + + +## 1. Market Data API + +### 1.1 GET v2/markets/tickers - Get All Available Market Tickers + +**Parameters:** +- `page` (optional, Number): Page number, default value is 1 +- `type` (required, String): Asset type, optional values: + - `STOCKS` - Stocks + - `ETF` - Exchange Traded Funds + - `MUTUALFUNDS` - Mutual Funds + +**curl example (via Gateway):** +```bash +curl -X GET "{GATEWAY_URL}{API_PREFIX}/v2/markets/tickers?page=1&type=STOCKS" \ + -H "X-Z-AI-From: Z" +``` + +--- + +### 1.2 GET v1/markets/search - Search Stocks + +**Parameters:** +- `search` (required, String): Search keyword (company name or stock symbol) + +**curl example (via Gateway):** +```bash +curl -X GET "{GATEWAY_URL}{API_PREFIX}/v1/markets/search?search=Apple" \ + -H "X-Z-AI-From: Z" +``` + +**Purpose:** Used to find specific stock or company ticker codes + +--- + +### 1.3 GET v1/markets/quote (real-time) - Real-time Quotes + +**Parameters:** +- `ticker` (required, String): Stock symbol (only one can be entered) +- `type` (required, String): Asset type + - 
`STOCKS` - Stocks + - `ETF` - Exchange Traded Funds + - `MUTUALFUNDS` - Mutual Funds + +**curl example (via Gateway):** +```bash +curl -X GET "{GATEWAY_URL}{API_PREFIX}/v1/markets/quote?ticker=AAPL&type=STOCKS" \ + -H "X-Z-AI-From: Z" +``` + +--- + +### 1.4 GET v1/markets/stock/quotes (snapshots) - Snapshot Quotes + +**Parameters:** +- `ticker` (required, String): Stock symbols, separated by commas + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/quotes?ticker=AAPL%2CMSFT%2C%5ESPX%2C%5ENYA%2CGAZP.ME%2CSIBN.ME%2CGEECEE.NS' +``` + +**Purpose:** Batch get snapshot data for multiple stocks + +--- + + +## 2. Historical Data API + +### 2.1 GET v1/markets/stock/history - Stock Historical Data + +**Parameters:** +- `symbol` (required, String): Stock symbol +- `interval` (required, String): Time interval + - `5m` - 5 minutes + - `15m` - 15 minutes + - `30m` - 30 minutes + - `1h` - 1 hour + - `1d` - Daily + - `1wk` - Weekly + - `1mo` - Monthly + - `3mo` - 3 months +- `diffandsplits` (optional, String): Include dividend and split data + - `true` - Include + - `false` - Exclude (default) + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/history?symbol=AAPL&interval=1d&diffandsplits=false' +``` + +**Purpose:** Get historical price data for specific stocks, used for technical analysis and backtesting + +--- + +### 2.2 GET v2/markets/stock/history - Stock Historical Data V2 + +**Parameters:** +- `symbol` (required, String): Stock symbol +- `interval` (optional, String): Time interval + - `1m`, `2m`, `3m`, `4m`, `5m`, `15m`, `30m` + - `1h`, `1d`, `1wk`, `1mo`, `1qty` +- `limit` (optional, Number): Limit the number of candles (1-1000) +- `dividend` (optional, String): Include dividend data (`true` or `false`) + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v2/markets/stock/history?symbol=AAPL&interval=1m&limit=640' +``` + +**Purpose:** 
Enhanced historical data interface with finer-grained intervals (down to 1 minute) and an optional candle limit + +--- + +## 3. News API + +### 3.1 GET v1/markets/news - Market News + +**Parameters:** +- `ticker` (optional, String): Stock symbols, comma-separated for multiple stocks + +**curl example:** +```bash +# Get general market news +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/news' + +# Get news for specific stocks +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/news?ticker=AAPL,TSLA' +``` + +**Purpose:** Get the latest market news and updates + +--- + +### 3.2 GET v2/markets/news - Market News V2 + +**Parameters:** +- `ticker` (optional, String): Stock symbol +- `type` (optional, String): News type (`ALL`, `VIDEO`, `PRESS-RELEASE`) + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v2/markets/news?ticker=AAPL&type=ALL' +``` + +**Purpose:** Enhanced interface for the latest market news, filterable by news type + +--- + +## 5. Stock Detailed Information API + +### 5.1 GET v1/markets/stock/modules (asset-profile) - Company Profile + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `asset-profile` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=asset-profile' +``` + +**Purpose:** Get basic company information, business description, management team, etc.
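Every endpoint in the stock-detail and financial-statement sections calls the same path, `v1/markets/stock/modules`, and differs only in the `module` value, one module per request. The following is a minimal Python sketch of building those request URLs with the standard library. The helper names `module_url` and `module_urls` are hypothetical, and the example base URL is an assumed stand-in for `{GATEWAY_URL}{API_PREFIX}`.

```python
from urllib.parse import urlencode

# Hypothetical helper: path and parameter names come from the curl examples
# in this document; the base URL is a placeholder you must fill in.
MODULES_PATH = "/v1/markets/stock/modules"


def module_url(base: str, ticker: str, module: str) -> str:
    """Return the GET URL for one stock module (the API takes one module per request)."""
    # urlencode percent-escapes reserved characters, e.g. '^' in index tickers
    query = urlencode({"ticker": ticker, "module": module})
    return f"{base.rstrip('/')}{MODULES_PATH}?{query}"


def module_urls(base: str, ticker: str, modules: list[str]) -> dict[str, str]:
    """Build one URL per module, because modules are requested one at a time."""
    return {m: module_url(base, ticker, m) for m in modules}


if __name__ == "__main__":
    base = "https://gateway.example.com/api"  # assumption: stand-in for {GATEWAY_URL}{API_PREFIX}
    for name, url in module_urls(base, "AAPL", ["asset-profile", "statistics", "earnings"]).items():
        print(name, url)
```

Sending each URL with the `X-Z-AI-From: Z` header (via `curl` or `requests.get`) should correspond to the curl examples in these sections. Encoding with `urlencode` rather than string concatenation keeps tickers such as `^SPX` valid in the query string.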
+ +--- + +### 5.2 GET v1/markets/stock/modules - Stock Module Data + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): Module name (one per request) + - Acceptable values include: `profile`, `income-statement`, `balance-sheet`, `cashflow-statement`, + `statistics`, `calendar-events`, `sec-filings`, `recommendation-trend`, + `upgrade-downgrade-history`, `institution-ownership`, `fund-ownership`, + `major-directHolders`, `major-holders-breakdown`, `insider-transactions`, + `insider-holders`, `net-share-purchase-activity`, `earnings`, `industry-trend`, + `index-trend`, `sector-trend` + +**curl example:** +```bash +# Get a specific module +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=statistics' +``` + +**Purpose:** Fetch a single data module per request (financial statements, statistics, analyst ratings, ownership, etc.) + +--- + +### 5.3 GET v1/markets/stock/modules (statistics) - Stock Statistics + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `statistics` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=statistics' +``` + +**Purpose:** Get key statistical indicators such as P/E ratio, market cap, and trading volume + +--- + +### 5.4 GET v1/markets/stock/modules (financial-data) - Get Financial Data + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `financial-data` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=financial-data' +``` + +**Purpose:** Get revenue, profit, cash flow, and other financial indicators + +--- + +### 5.5 GET v1/markets/stock/modules (sec-filings) - Get SEC Filings + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `sec-filings` + +**curl example:** +```bash +curl --request GET \ + --url
'{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=sec-filings' +``` + +**Purpose:** Get filings the company has submitted to the U.S. Securities and Exchange Commission (SEC) + +--- + +### 5.6 GET v1/markets/stock/modules (earnings) - Earnings Data + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `earnings` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=earnings' +``` + +**Purpose:** Get quarterly and annual earnings information + +--- + +### 5.7 GET v1/markets/stock/modules (calendar-events) - Get Calendar Events + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `calendar-events` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=calendar-events' +``` + +**Purpose:** Get upcoming earnings release dates, dividend dates, etc. + +--- + +## 6. Financial Statements API + +### 6.1 GET v1/markets/stock/modules (balance-sheet) - Balance Sheet + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `balance-sheet` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=balance-sheet' +``` + +**Purpose:** Get company balance sheet data + +--- + +### 6.2 GET v1/markets/stock/modules (income-statement) - Income Statement + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `income-statement` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=income-statement' +``` + +**Purpose:** Get company income statement data + +--- + +### 6.3 GET v1/markets/stock/modules (cashflow-statement) - Cash Flow Statement + +**Parameters:** +- `ticker` (required, String): Stock symbol +- `module` (required, String): `cashflow-statement` + +**curl example:** +```bash +curl --request GET \ + --url '{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/modules?ticker=AAPL&module=cashflow-statement' +``` +
+**Purpose:** Get company cash flow statement data + +--- + +## Usage Flow Examples + +### Example 1: Find and Get Real-time Stock Data + +```text +# 1. Search for the company +GET /v1/markets/search?search=Apple + +# 2. Get real-time quote +GET /v1/markets/quote?ticker=AAPL&type=STOCKS + +# 3. Get detailed information +GET /v1/markets/stock/modules?ticker=AAPL&module=asset-profile +``` + +### Example 2: Analyze Stock Investment Value + +```text +# 1. Get financial data +GET /v1/markets/stock/modules?ticker=AAPL&module=financial-data + +# 2. Get earnings data +GET /v1/markets/stock/modules?ticker=AAPL&module=earnings +``` + +--- + +## Usage Tips + +### 1. Batch Query Optimization +```bash +# Get data for multiple stocks at once (snapshots endpoint) via Gateway +curl -X GET "{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/quotes?ticker=AAPL,MSFT,GOOGL,AMZN,TSLA" \ + -H "X-Z-AI-From: Z" +``` + +### 2. Historical Interval Query +```bash +# Get daily historical data via Gateway +curl -X GET "{GATEWAY_URL}{API_PREFIX}/v1/markets/stock/history?symbol=AAPL&interval=1d&diffandsplits=false" \ + -H "X-Z-AI-From: Z" +``` + +### 3. Combined Query Example + +**Python example (via Gateway):** +```python +import requests + +# The gateway handles authentication; only the source header is required +headers = {'X-Z-AI-From': 'Z'} + +# Replace the placeholders with your gateway URL and API prefix +gateway_url = '{GATEWAY_URL}{API_PREFIX}/v1' +symbol = 'AAPL' + +def get(path, **params): +    """GET a gateway endpoint and return the decoded JSON body.""" +    # requests URL-encodes params, so tickers like ^SPX are handled safely +    resp = requests.get(f'{gateway_url}{path}', params=params, headers=headers, timeout=10) +    resp.raise_for_status()  # raise on HTTP errors instead of parsing an error body +    return resp.json() + +# Get real-time price +quote = get('/markets/quote', ticker=symbol, type='STOCKS') + +# Get company profile +profile = get('/markets/stock/modules', ticker=symbol, module='asset-profile') + +# Get financial data +financials = get('/markets/stock/modules', ticker=symbol, module='financial-data') +``` + +--- + +## Best Practices + +### Gateway Usage + +1. **Authentication Header** - Always include the `X-Z-AI-From: Z` header + +### API Usage + +1.
**Rate Limiting:** Pay attention to API call frequency limits to avoid being throttled +2. **Error Handling:** Implement comprehensive error handling mechanisms +3. **Data Caching:** Consider caching common requests to optimize performance +4. **Batch Queries:** Use comma-separated symbols parameter to query multiple stocks at once +5. **Timestamps:** Use Unix timestamps for historical data queries +6. **Parameter Validation:** Validate all required parameters before sending requests +7. **Response Parsing:** Implement robust JSON parsing and data validation + +--- diff --git a/skills/finance/SKILL.md b/skills/finance/SKILL.md new file mode 100755 index 0000000..d58ca07 --- /dev/null +++ b/skills/finance/SKILL.md @@ -0,0 +1,53 @@ +--- +name: finance +description: "Comprehensive Finance API integration skill for real-time and historical financial data analysis, market research, and investment decision-making. Priority use cases: stock price queries, market data analysis, company financial information, portfolio tracking, market news retrieval, stock screening, technical analysis, and any financial market-related requests. This skill should be the primary choice for all Finance API interactions and financial data needs." 
+--- + +# Finance Skill + +## Core Capabilities + +### Market Data Retrieval +- Real-time quotes: current prices, market snapshots, trading volumes +- Historical data: price history, dividends, splits, corporate actions +- Market indices: performance of major indices and their constituents +- Currency data: forex rates and cryptocurrency information + +### Analysis Tools +- Stock screening: filters by metrics, ratios, and technical indicators +- Financial ratios: P/E, EPS, ROE, debt-to-equity, and other key metrics +- Technical indicators: moving averages, RSI, MACD, chart patterns +- Comparative analysis: sector and peer group comparisons + +### Market Intelligence +- Company information: business profiles, management teams, financial statements +- Market news: earnings reports and market analysis +- Insider trading: buy/sell activities and ownership changes +- Options data: chain data, implied volatility, and statistics + +## API Overview + +The Finance API provides comprehensive access to financial data, including real-time market data, historical stock prices, options data, insider trading, and the latest financial news. + +### Skill Path + +**Skill Location**: `{project_path}/skills/finance` + +This skill is located at the path above in your project. + +Reference Docs: see `{Skill Location}/Finance_API_Doc.md` for working examples. + +## Zhipu AI - Hong Kong IPO Information +- **Stock Code**: 2513.HK +- **Company Name (Chinese)**: 北京智谱华章科技股份有限公司 +- **Company Name (English)**: Knowledge Atlas Technology Joint Stock Company Limited + +Zhipu AI is a leading Chinese large language model company specializing in AI foundation model research and development.
+ +### Best Practices for Zhipu AI Stock Research (One-Shot Success Guide) + +**Search Strategy:** +- ✅ Use the full English company name: `search=Knowledge+Atlas` +- ❌ Avoid: `search=Zhipu`, `search=02513.HK` (these return empty results) + +**Important:** Always read `Finance_API_Doc.md` before using the API. + + diff --git a/skills/frontend-design/.gitignore b/skills/frontend-design/.gitignore new file mode 100755 index 0000000..30721f5 --- /dev/null +++ b/skills/frontend-design/.gitignore @@ -0,0 +1,36 @@ +# Dependencies +node_modules/ +.pnp +.pnp.js + +# Testing +coverage/ +*.lcov +.nyc_output + +# Production +build/ +dist/ +out/ + +# Misc +.DS_Store +*.pem +*.log +npm-debug.log* +yarn-debug.log* +yarn-error.log* + +# IDEs +.idea/ +.vscode/ +*.swp +*.swo +*~ + +# Environment +.env +.env.local +.env.development.local +.env.test.local +.env.production.local diff --git a/skills/frontend-design/LICENSE b/skills/frontend-design/LICENSE new file mode 100755 index 0000000..00b3392 --- /dev/null +++ b/skills/frontend-design/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2024 z-ai platform + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/frontend-design/OPTIMIZATION_SUMMARY.md b/skills/frontend-design/OPTIMIZATION_SUMMARY.md new file mode 100755 index 0000000..f8896e4 --- /dev/null +++ b/skills/frontend-design/OPTIMIZATION_SUMMARY.md @@ -0,0 +1,442 @@ +# Frontend Design Skill - Optimization Summary + +## 📊 Comparison: Original vs Optimized Version + +### Original Document Focus +- Heavily prescriptive approach +- Emphasis on "no arbitrary values" (almost too rigid) +- Style packs as main organizing principle +- Prompt-template heavy +- Less guidance on creative execution + +### Optimized Version (v2.0) Improvements + +#### 1. **Dual-Mode Thinking: System + Creativity** + +**Original Issue**: Too focused on systematic constraints, could lead to generic outputs. + +**Optimization**: +```markdown +Core Principles (Non-Negotiable) +1. Dual-Mode Thinking: System + Creativity + - Systematic Foundation: tokens, scales, states + - Creative Execution: BOLD aesthetics, unique choices, avoid "AI slop" +``` + +**Why Better**: Balances consistency with uniqueness. Prevents cookie-cutter designs while maintaining maintainability. + +#### 2. **Enhanced Trigger Pattern Detection** + +**Original**: Basic "when to use" section + +**Optimization**: +```markdown +Trigger phrases: +- "build a website/app/component" +- "create a dashboard/landing page" +- "design a UI for..." +- "make it modern/clean/premium" +- "style this with..." + +DO NOT use for: +- Backend API development +- Pure logic/algorithm implementation +``` + +**Why Better**: More precise activation, prevents skill misuse. + +#### 3. 
**Complete Implementation Workflow** + +**Original**: Scattered throughout document + +**Optimization**: +```markdown +Phase 1: Design Analysis & Token Definition +Phase 2: Component Development +Phase 3: Page Assembly +Phase 4: Quality Assurance +``` + +**Why Better**: Clear step-by-step process, easier to follow. + +#### 4. **Production-Ready Code Examples** + +**Original**: Only had theoretical guidelines + +**Optimization**: Added complete examples: +- `examples/css/tokens.css` - 400+ lines of production tokens +- `examples/css/components.css` - 600+ lines of components +- `examples/typescript/design-tokens.ts` - Type-safe token system +- `examples/typescript/sample-components.tsx` - 500+ lines of React components +- `examples/typescript/theme-provider.tsx` - Complete theme system +- `examples/typescript/utils.ts` - 30+ utility functions + +**Why Better**: Developers can copy-paste and adapt immediately. + +#### 5. **Enhanced Accessibility Guidance** + +**Original**: Basic mentions of WCAG + +**Optimization**: +```markdown +Accessibility as Constraint +- Color Contrast: Run checker, WCAG AA minimum (4.5:1) +- Keyboard Navigation: Tab order, focus indicators +- ARIA & Semantics: Use semantic HTML first, ARIA when needed +- Test with: Keyboard only, screen readers, reduced motion +``` + +**Why Better**: Specific, actionable, testable. + +#### 6. **Design Direction Templates** + +**Original**: Had style packs but not well-organized + +**Optimization**: 5 detailed templates: +1. Minimal Premium SaaS (Most Universal) +2. Bold Editorial +3. Soft & Organic +4. Dark Neon (Restrained) +5. Playful & Colorful + +Each with: +- Visual specifications +- Best use cases +- Token mappings + +**Why Better**: Easier to choose and execute with confidence. + +#### 7. 
**TypeScript Integration** + +**Original**: No TypeScript support + +**Optimization**: Complete TypeScript support: +- Type-safe token interfaces +- Generic component props +- Utility type guards +- Theme type definitions + +**Why Better**: Modern development standard, catches errors early. + +#### 8. **Theme Management System** + +**Original**: Basic dark mode mention + +**Optimization**: Full theme provider with: +- Light/Dark/System modes +- localStorage persistence +- System preference detection +- Easy toggle components +- HOC support + +**Why Better**: Production-ready theme system out of the box. + +--- + +## 🎯 Key Optimizations Explained + +### 1. Token System Enhancement + +**Before**: Abstract token mentions +**After**: Concrete implementation with OKLCH colors + +```css +/* Before: Vague */ +--primary: blue; + +/* After: Precise, theme-aware, perceptually uniform */ +--primary: oklch(55% 0.18 250); +--primary-hover: oklch(50% 0.20 250); +--primary-active: oklch(45% 0.22 250); +``` + +**Benefits**: +- Perceptually uniform color adjustments +- Easier dark mode (adjust lightness only) +- Better color contrast control + +### 2. Component State Coverage + +**Before**: Mentioned but not enforced +**After**: Mandatory checklist + +```markdown +For EVERY interactive element: +✓ Default, Hover, Active, Focus, Disabled +✓ Loading, Empty, Error + +Missing states = incomplete implementation +``` + +**Benefits**: No forgotten edge cases, better UX. + +### 3. Fluid Typography + +**Before**: Fixed sizes +**After**: Responsive with clamp() + +```css +/* Before */ +--font-size-base: 16px; + +/* After: Scales from mobile to desktop */ +--font-size-base: clamp(1rem, 0.95rem + 0.25vw, 1.125rem); +``` + +**Benefits**: Better readability across devices, reduces media query complexity. + +### 4. 
Advanced Motion Patterns + +**Before**: Basic transitions +**After**: Complete animation system + +```css +@keyframes shimmer { + 0% { background-position: 200% 0; } + 100% { background-position: -200% 0; } +} + +/* Respect reduced motion */ +@media (prefers-reduced-motion: reduce) { + * { animation-duration: 0.01ms !important; } +} +``` + +**Benefits**: Professional loading states, accessibility compliance. + +### 5. Utility Functions + +**Before**: None +**After**: 30+ production utilities + +Examples: +```typescript +cn(...classes) // Smart class merging +debounce(fn, ms) // Performance optimization +copyToClipboard(text) // UX enhancement +formatRelativeTime(date) // Better dates +prefersReducedMotion() // Accessibility check +``` + +**Benefits**: Common patterns solved, less boilerplate. + +--- + +## 📁 File Organization + +``` +frontend-design/ +├── SKILL.md # 18,000+ words comprehensive guide +├── README.md # Quick start (2,000 words) +├── LICENSE # MIT +├── package.json # Dependencies reference +├── .gitignore # Standard ignores +├── examples/ +│ ├── css/ +│ │ ├── tokens.css # 400+ lines design system +│ │ └── components.css # 600+ lines components +│ └── typescript/ +│ ├── design-tokens.ts # 350+ lines types +│ ├── theme-provider.tsx # 250+ lines theme system +│ ├── sample-components.tsx # 500+ lines components +│ └── utils.ts # 400+ lines utilities +└── templates/ + ├── tailwind.config.js # 250+ lines configuration + └── globals.css # 300+ lines global styles +``` + +**Total**: ~4,000 lines of production-ready code + +--- + +## 🔍 Usage Examples Comparison + +### Example 1: Button Component + +**Before (Original doc)**: +``` +User: Create a button +AI: [Writes hardcoded button with inline styles] +``` + +**After (Optimized)**: +```typescript +// Production-ready, type-safe, accessible + + +// Automatically includes: +// - Hover/Focus/Active/Disabled states +// - Loading spinner +// - Keyboard accessibility +// - Token-based styling +// - TypeScript types 
+``` + +### Example 2: Theme Toggle + +**Before (Original doc)**: +``` +User: Add dark mode +AI: [Writes basic CSS dark mode, no state management] +``` + +**After (Optimized)**: +```tsx +import { ThemeProvider, ThemeToggle } from './theme-provider'; + +function App() { + return ( + + + {/* One-line dark mode toggle */} + + ); +} + +// Automatically includes: +// - Light/Dark/System detection +// - localStorage persistence +// - Smooth transitions +// - Icon states +``` + +--- + +## ⚡ Performance Optimizations + +### 1. Build-time Tailwind (Not CDN) + +**Before**: CDN approach allowed +**After**: Build-time mandatory + +```javascript +// Before: 400KB+ loaded every time + + +// After: 2-15KB after tree-shaking +npm install -D tailwindcss +npx tailwindcss init +``` + +**Impact**: 95% smaller CSS bundle + +### 2. CSS Custom Properties + +**Before**: Repeated color values +**After**: Single source of truth + +```css +/* One definition, infinite reuse */ +:root { + --primary: oklch(55% 0.18 250); +} + +.button { background: var(--primary); } +.badge { color: var(--primary); } +/* ... 100+ uses */ +``` + +**Impact**: Smaller bundle, easier theming + +### 3. Component Composition + +**Before**: Monolithic components +**After**: Composable primitives + +```tsx + + + ... + + ... + ... + +``` + +**Impact**: Better tree-shaking, smaller bundles + +--- + +## ✅ What Was Added (Not in Original) + +1. ✨ **Complete TypeScript support** - All examples are type-safe +2. 🎨 **Theme management system** - Production-ready provider +3. 🧰 **Utility functions** - 30+ common helpers +4. 📦 **Package.json** - Dependency reference +5. 🎯 **Trigger patterns** - Clear skill activation +6. 🔧 **Template files** - Copy-paste ready configs +7. 📚 **Usage examples** - Real-world patterns +8. 🎭 **Component library** - 10+ production components +9. 🌗 **Dark mode system** - Complete implementation +10. ♿ **Accessibility tests** - Specific test cases +11. 
🎬 **Animation system** - Keyframes + reduced motion +12. 📱 **Mobile-first examples** - Responsive patterns +13. 🔍 **SEO considerations** - Semantic HTML guide +14. 🎨 **Design direction templates** - 5 complete styles +15. 📖 **README** - Quick start guide + +--- + +## 🎓 Learning Path + +For developers using this skill: + +1. **Day 1**: Read SKILL.md overview, understand token system +2. **Day 2**: Explore CSS examples, try modifying tokens +3. **Day 3**: Build first component using TypeScript examples +4. **Day 4**: Create a page with multiple components +5. **Day 5**: Implement theme toggle, test dark mode +6. **Week 2**: Build complete project using the system + +--- + +## 🔮 Future Enhancements (Not in v2.0) + +Potential additions for v3.0: +- Animation library (Framer Motion integration) +- Form validation patterns +- Data visualization components +- Mobile gesture handlers +- Internationalization (i18n) support +- Server component examples (Next.js 13+) +- Testing examples (Jest, Testing Library) +- Storybook integration guide + +--- + +## 📊 Metrics + +- **Documentation**: 18,000+ words +- **Code Examples**: 4,000+ lines +- **Components**: 15 production-ready +- **Utilities**: 30+ helper functions +- **Design Tokens**: 100+ defined +- **States Covered**: 8 per component +- **Accessibility**: WCAG AA compliant +- **Browser Support**: Modern browsers (last 2 versions) +- **Bundle Size**: ~2-15KB (production, gzipped) + +--- + +## 💡 Key Takeaways + +This optimized version transforms a good methodology into a **complete, production-ready design system** with: + +✅ **Better Developer Experience**: Copy-paste ready code +✅ **Higher Quality Output**: Systematic + creative +✅ **Faster Development**: Pre-built components +✅ **Easier Maintenance**: Token-based system +✅ **Better Accessibility**: Built-in WCAG compliance +✅ **Modern Stack**: TypeScript, React, Tailwind +✅ **Complete Documentation**: 20,000+ words total +✅ **Real Examples**: Production patterns + +The 
original document provided methodology; this version provides **implementation**. diff --git a/skills/frontend-design/README.md b/skills/frontend-design/README.md new file mode 100755 index 0000000..069f264 --- /dev/null +++ b/skills/frontend-design/README.md @@ -0,0 +1,287 @@ +# Frontend Design Skill + +A comprehensive skill for transforming UI style requirements into production-ready frontend code with systematic design tokens, accessibility compliance, and creative execution. + +## 📦 Skill Location + +``` +{project_path}/skills/frontend-design/ +``` + +## 📚 What's Included + +### Documentation +- **SKILL.md** - Complete methodology and guidelines for frontend development +- **README.md** - This file (quick start and overview) +- **LICENSE** - MIT License + +### CSS Examples (`examples/css/`) +- **tokens.css** - Complete design token system with semantic colors, typography, spacing, radius, shadows, and motion tokens +- **components.css** - Production-ready component styles (buttons, inputs, cards, modals, alerts, etc.) 
+- **utilities.css** - Utility classes for layout, typography, states, and responsive design + +### TypeScript Examples (`examples/typescript/`) +- **design-tokens.ts** - Type-safe token definitions and utilities +- **theme-provider.tsx** - Complete theme management system (light/dark/system modes) +- **sample-components.tsx** - Production React components with full TypeScript support +- **utils.ts** - Utility functions for frontend development + +### Templates (`templates/`) +- **tailwind.config.js** - Optimized Tailwind CSS configuration +- **globals.css** - Global styles and CSS custom properties + +## 🚀 Quick Start + +### When to Use This Skill + +Use this skill when: +- Building websites, web applications, or web components +- User mentions design styles: "modern", "premium", "minimalist", "dark mode" +- Creating dashboards, landing pages, or any web UI +- User asks to "make it look better" or "improve the design" +- User specifies frameworks: React, Vue, Svelte, Next.js, etc. + +### Basic Usage + +1. **Read SKILL.md** first for complete methodology +2. **Choose a design direction** (Minimal SaaS, Bold Editorial, Soft & Organic, Dark Neon, Playful) +3. **Generate design tokens** using the token system +4. **Build components** using the provided examples +5. **Compose pages** from components +6. 
**Review & validate** against the checklist + +### Installation + +```bash +# Install dependencies +npm install -D tailwindcss postcss autoprefixer +npm install clsx tailwind-merge + +# Initialize Tailwind +npx tailwindcss init -p + +# Copy templates +cp templates/tailwind.config.js ./tailwind.config.js +cp templates/globals.css ./src/globals.css + +# Import in your app +# React: import './globals.css' in main entry +# Next.js: import './globals.css' in _app.tsx or layout.tsx +``` + +## 🎨 Design Tokens System + +All visual properties derive from semantic tokens: + +### Colors +```css +--background, --surface, --text +--primary, --secondary, --accent +--success, --warning, --danger, --info +``` + +### Typography +```css +--font-size-{xs, sm, base, lg, xl, 2xl, 3xl, 4xl, 5xl} +--line-height-{tight, snug, normal, relaxed, loose} +--font-weight-{light, normal, medium, semibold, bold} +``` + +### Spacing (8px system) +```css +--spacing-{0.5, 1, 2, 3, 4, 6, 8, 10, 12, 16, 20, 24, 32, 40, 48} +``` + +### Radius +```css +--radius-{xs, sm, md, lg, xl, 2xl, 3xl, full} +``` + +## 📖 Example Usage + +### React Component with Tokens + +```tsx +import { Button, Card, Input } from './examples/typescript/sample-components'; +import { ThemeProvider } from './examples/typescript/theme-provider'; + +function App() { + return ( + + + + Sign Up + Create your account + + + + + + + + + + + ); +} +``` + +### CSS-Only Approach + +```css +@import './examples/css/tokens.css'; +@import './examples/css/components.css'; +``` + +```html +
+
+

Sign Up

+

Create your account

+
+
+
+ + +
+
+ +
+``` + +## ✨ Features + +### ✅ Systematic Design +- Token-first methodology +- Consistent spacing (8px system) +- Predictable visual hierarchy +- Maintainable codebase + +### ✅ Accessibility +- WCAG AA compliance (minimum) +- Keyboard navigation +- Screen reader support +- Focus management +- Proper ARIA labels + +### ✅ Responsive Design +- Mobile-first approach +- Fluid typography +- Flexible layouts +- Touch-friendly (44px+ targets) + +### ✅ Dark Mode +- Built-in theme system +- CSS custom properties +- System preference detection +- Persistent user choice + +### ✅ Production Ready +- TypeScript support +- Full type safety +- Optimized bundle size +- Tree-shaking enabled + +## 📋 Component States + +All components include: +- **Default** - Base appearance +- **Hover** - Visual feedback +- **Active** - Pressed state +- **Focus** - Keyboard indicator +- **Disabled** - Inactive state +- **Loading** - Skeleton/spinner +- **Empty** - No data state +- **Error** - Error recovery + +## 🎯 Best Practices + +1. **Always start with tokens** - Never skip to components +2. **Use semantic colors** - No hardcoded hex values +3. **Mobile-first** - Design for 375px, enhance upward +4. **Accessibility first** - Build it in, not on +5. **Test all states** - Default, hover, focus, disabled, loading, error +6. **DRY principles** - Reusable components over duplicated code + +## 🔧 Customization + +### Extend Design Tokens + +```typescript +import { lightThemeTokens, mergeTokens } from './examples/typescript/design-tokens'; + +const customTokens = mergeTokens(lightThemeTokens, { + colors: { + primary: 'oklch(60% 0.20 280)', // Custom purple + // ... other overrides + }, +}); +``` + +### Add Custom Components + +Follow the patterns in `examples/typescript/sample-components.tsx`: +1. Define TypeScript interfaces +2. Implement with token-based styling +3. Include all states +4. Add accessibility features +5. 
Document usage + +## 📚 Documentation Structure + +``` +frontend-design/ +├── SKILL.md # Complete methodology (READ THIS FIRST) +├── README.md # Quick start guide (this file) +├── LICENSE # MIT License +├── examples/ +│ ├── css/ +│ │ ├── tokens.css # Design token system +│ │ ├── components.css # Component styles +│ │ └── utilities.css # Utility classes +│ └── typescript/ +│ ├── design-tokens.ts # Type-safe tokens +│ ├── theme-provider.tsx # Theme management +│ ├── sample-components.tsx # React components +│ └── utils.ts # Utility functions +└── templates/ + ├── tailwind.config.js # Tailwind configuration + └── globals.css # Global styles +``` + +## 🤝 Contributing + +This skill is maintained as part of the z-ai platform. To suggest improvements: +1. Review the existing patterns +2. Propose changes that enhance consistency +3. Ensure all examples remain production-ready +4. Update documentation accordingly + +## 📄 License + +MIT License - see LICENSE file for details + +## 🔗 Resources + +- [Tailwind CSS Documentation](https://tailwindcss.com/docs) +- [WCAG Guidelines](https://www.w3.org/WAI/WCAG21/quickref/) +- [shadcn/ui](https://ui.shadcn.com) +- [TypeScript](https://www.typescriptlang.org) + +--- + +**Version**: 2.0.0 +**Last Updated**: December 2024 +**Maintained by**: z-ai platform team diff --git a/skills/frontend-design/SKILL.md b/skills/frontend-design/SKILL.md new file mode 100755 index 0000000..24c7a8f --- /dev/null +++ b/skills/frontend-design/SKILL.md @@ -0,0 +1,981 @@ +--- +name: frontend-design +description: Transform UI style requirements into production-ready frontend code with systematic design tokens, accessibility compliance, and creative execution. Use when building websites, web applications, React/Vue components, dashboards, landing pages, or any web UI requiring both design consistency and aesthetic quality. 
+version: 2.0.0 +license: MIT +--- + +# Frontend Design Skill — Systematic & Creative Web Development + +**Skill Location**: `{project_path}/skills/frontend-design/` + +This skill transforms vague UI style requirements into executable, production-grade frontend code through a systematic design token approach while maintaining creative excellence. It ensures visual consistency, accessibility compliance, and maintainability across all deliverables. + +--- + +## When to Use This Skill (Trigger Patterns) + +**MUST apply this skill when:** + +- User requests any website, web application, or web component development +- User mentions design styles: "modern", "premium", "minimalist", "dark mode", "SaaS-style" +- Building dashboards, landing pages, admin panels, or any web UI +- User asks to "make it look better" or "improve the design" +- Creating component libraries or design systems +- User specifies frameworks: React, Vue, Svelte, Next.js, Nuxt, etc. +- Converting designs/mockups to code +- User mentions: Tailwind CSS, shadcn/ui, Material-UI, Chakra UI, etc. + +**Trigger phrases:** +- "build a website/app/component" +- "create a dashboard/landing page" +- "design a UI for..." +- "make it modern/clean/premium" +- "style this with..." +- "convert this design to code" + +**DO NOT use for:** +- Backend API development +- Pure logic/algorithm implementation +- Non-visual code tasks + +--- + +## Skill Architecture + +This skill provides: + +1. **SKILL.md** (this file): Core methodology and guidelines +2. **examples/css/**: Production-ready CSS examples + - `tokens.css` - Design token system + - `components.css` - Reusable component styles + - `utilities.css` - Utility classes +3. **examples/typescript/**: TypeScript implementation examples + - `design-tokens.ts` - Type-safe token definitions + - `theme-provider.tsx` - Theme management + - `sample-components.tsx` - Component examples +4. 
**templates/**: Quick-start templates + - `tailwind.config.js` - Tailwind configuration + - `globals.css` - Global styles template + +--- + +## Core Principles (Non-Negotiable) + +### 1. **Dual-Mode Thinking: System + Creativity** + +**Systematic Foundation:** +- Design tokens first, UI components second +- No arbitrary hardcoded values (colors, spacing, shadows, radius) +- Consistent scales for typography, spacing, radius, elevation +- Complete state coverage (default/hover/active/focus/disabled + loading/empty/error) +- Accessibility as a constraint, not an afterthought + +**Creative Execution:** +- AVOID generic "AI slop" aesthetics (Inter/Roboto fonts, purple gradients, cookie-cutter layouts) +- Choose BOLD aesthetic direction: brutalist, retro-futuristic, luxury, playful, editorial, etc. +- Make unexpected choices in typography, color, layout, and motion +- Each design should feel unique and intentionally crafted for its context + +### 2. **Tokens-First Methodology** + +``` +Design Tokens → Component Styles → Page Layouts → Interactive States +``` + +**Never skip token definition.** All visual properties must derive from the token system. + +### 3. **Tech Stack Flexibility** + +**Default stack (if unspecified):** +- Framework: React + TypeScript +- Styling: Tailwind CSS +- Components: shadcn/ui +- Theme: CSS custom properties (light/dark modes) + +**Supported alternatives:** +- Frameworks: Vue, Svelte, Angular, vanilla HTML/CSS +- Styling: CSS Modules, SCSS, Styled Components, Emotion +- Libraries: MUI, Ant Design, Chakra UI, Headless UI + +### 4.
**Tailwind CSS Best Practices** + +**⚠️ CRITICAL: Never use Tailwind via CDN in production** + +**MUST use build-time integration:** +```bash +npm install -D tailwindcss postcss autoprefixer +npx tailwindcss init -p # Tailwind v3 setup; v4 has no init step and uses @tailwindcss/postcss or @tailwindcss/vite instead +``` + +**Why build-time is mandatory:** +- ✅ Generates only the classes your templates use (2-15KB vs 400KB+ bundle) +- ✅ Full design token customization +- ✅ IDE autocomplete for custom tokens (e.g., Tailwind CSS IntelliSense) +- ✅ Integrates with bundlers (Vite, webpack, Next.js) + +**CDN only acceptable for:** +- Quick prototypes/demos +- Internal testing + +--- + +## Implementation Workflow + +### Phase 1: Design Analysis & Token Definition + +**Step 1: Understand Context** +``` +- Purpose: What problem does this solve? Who uses it? +- Aesthetic Direction: Choose ONE bold direction +- Technical Constraints: Framework, performance, accessibility needs +- Differentiation: What makes this memorable? +``` + +**Step 2: Generate Design Tokens** + +Create comprehensive token system (see `examples/css/tokens.css` and `examples/typescript/design-tokens.ts`): + +1. **Semantic Color Slots** (light + dark modes): + ``` + --background, --surface, --surface-subtle + --text, --text-secondary, --text-muted + --border, --border-subtle + --primary, --primary-hover, --primary-active, --primary-foreground + --secondary, --secondary-hover, --secondary-foreground + --accent, --success, --warning, --danger + ``` + +2. **Typography Scale**: + ``` + Display: 3.5rem/4rem (56px/64px), weight 700-800 + H1: 2.5rem/3rem (40px/48px), weight 700 + H2: 2rem/2.5rem (32px/40px), weight 600 + H3: 1.5rem/2rem (24px/32px), weight 600 + Body: 1rem/1.5rem (16px/24px), weight 400 + Small: 0.875rem/1.25rem (14px/20px), weight 400 + Caption: 0.75rem/1rem (12px/16px), weight 400 + ``` + +3. **Spacing Scale** (8px system): + ``` + 0.5 → 4px, 1 → 8px, 2 → 16px, 3 → 24px, 4 → 32px + 5 → 40px, 6 → 48px, 8 → 64px, 12 → 96px, 16 → 128px + ``` + +4.
**Radius Scale**: + ``` + xs: 2px (badges, tags) + sm: 4px (buttons, inputs) + md: 6px (cards, modals) + lg: 8px (large cards, panels) + xl: 12px (hero sections) + 2xl: 16px (special elements) + full: 9999px (pills, avatars) + ``` + +5. **Shadow Scale**: + ``` + sm: Subtle lift (buttons, inputs) + md: Card elevation + lg: Modals, dropdowns + xl: Large modals, drawers + ``` + +6. **Motion Tokens**: + ``` + Duration: 150ms (micro), 220ms (default), 300ms (complex) + Easing: ease-out (enter), ease-in (exit), ease-in-out (transition) + ``` + +### Phase 2: Component Development + +**Step 3: Build Reusable Components** + +Follow this structure (see `examples/typescript/sample-components.tsx`): + +```typescript +interface ComponentProps { + variant?: 'primary' | 'secondary' | 'outline' | 'ghost'; + size?: 'sm' | 'md' | 'lg'; + state?: 'default' | 'hover' | 'active' | 'disabled' | 'loading'; +} +``` + +**Required component states:** +- Default, Hover, Active, Focus, Disabled +- Loading (skeleton/spinner) +- Empty state (clear messaging) +- Error state (recovery instructions) + +**Required component features:** +- Accessible (ARIA labels, keyboard navigation) +- Responsive (mobile-first) +- Theme-aware (light/dark mode) +- Token-based styling (no hardcoded values) + +### Phase 3: Page Assembly + +**Step 4: Compose Pages from Components** + +``` +- Use established tokens and components only +- Mobile-first responsive design +- Loading states for async content +- Empty states with clear CTAs +- Error states with recovery options +``` + +### Phase 4: Quality Assurance + +**Step 5: Self-Review Checklist** + +- [ ] All colors from semantic tokens (no random hex/rgb) +- [ ] All spacing from spacing scale +- [ ] All radius from radius scale +- [ ] Shadows justified by hierarchy +- [ ] Clear type hierarchy with comfortable line-height (1.5+) +- [ ] All interactive states implemented and tested +- [ ] Accessibility: WCAG AA contrast, keyboard navigation, ARIA, focus indicators +- [ 
] Responsive: works on mobile (375px), tablet (768px), desktop (1024px+) +- [ ] Loading/empty/error states included +- [ ] Code is maintainable: DRY, clear naming, documented + +--- + +## Design Direction Templates + +### 1. Minimal Premium SaaS (Most Universal) + +``` +Visual Style: Minimal Premium SaaS +- Generous whitespace (1.5-2x standard padding) +- Near-white background with subtle surface contrast +- Light borders (1px, low-opacity) +- Very subtle elevation (avoid heavy shadows) +- Unified control height: 44-48px +- Medium-large radius: 6-8px +- Gentle hover states (background shift only) +- Clear but not harsh focus rings +- Low-contrast dividers +- Priority: Readability and consistency +``` + +**Best for:** Enterprise apps, B2B SaaS, productivity tools + +### 2. Bold Editorial + +``` +Visual Style: Bold Editorial +- Strong typographic hierarchy (large display fonts) +- High contrast (black/white or dark/light extremes) +- Generous use of negative space +- Asymmetric layouts with intentional imbalance +- Grid-breaking elements +- Minimal color palette (1-2 accent colors max) +- Sharp, geometric shapes +- Dramatic scale differences +- Priority: Visual impact and memorability +``` + +**Best for:** Marketing sites, portfolios, content-heavy sites + +### 3. Soft & Organic + +``` +Visual Style: Soft & Organic +- Rounded corners everywhere (12-24px radius) +- Soft shadows and subtle gradients +- Pastel or muted color palette +- Gentle animations (ease-in-out, 300-400ms) +- Curved elements and flowing layouts +- Generous padding (1.5-2x standard) +- Soft, blurred backgrounds +- Priority: Approachability and comfort +``` + +**Best for:** Consumer apps, wellness, lifestyle brands + +### 4. 
Dark Neon (Restrained) + +``` +Visual Style: Dark Neon +- Dark background (#0a0a0a to #1a1a1a, NOT pure black) +- High contrast text (#ffffff or #fafafa) +- Accent colors ONLY for CTAs and key states +- Subtle glow on hover (box-shadow with accent color) +- Minimal borders (use subtle outlines) +- Optional: Subtle noise texture +- Restrained use of neon (less is more) +- Priority: Focus and sophisticated edge +``` + +**Best for:** Developer tools, gaming, tech products + +### 5. Playful & Colorful + +``` +Visual Style: Playful & Colorful +- Vibrant color palette (3-5 colors) +- Rounded corners (8-16px) +- Micro-animations on hover/interaction +- Generous padding and breathing room +- Friendly, geometric illustrations +- Smooth transitions (200-250ms) +- High energy but balanced +- Priority: Delight and engagement +``` + +**Best for:** Consumer apps, children's products, creative tools + +--- + +## Standard Prompting Workflow + +### Master Prompt Template + +``` +You are a Design Systems Engineer + Senior Frontend UI Developer with expertise in creative design execution. + +[TECH STACK] +- Framework: {{FRAMEWORK = React + TypeScript}} +- Styling: {{STYLING = Tailwind CSS}} +- Components: {{UI_LIB = shadcn/ui}} +- Theme: CSS variables (light/dark modes) + +[DESIGN SYSTEM RULES - MANDATORY] +1. Layout: 8px spacing system; mobile-first responsive +2. Typography: Clear hierarchy (Display/H1/H2/H3/Body/Small/Caption); line-height 1.5+ +3. Colors: Semantic tokens ONLY (no hardcoded values) +4. Shape: Tiered radius system; tap targets ≥ 44px +5. Elevation: Minimal shadows; borders for hierarchy +6. Motion: Subtle transitions (150-220ms); restrained animations +7. 
Accessibility: WCAG AA; keyboard navigation; ARIA; focus indicators + +[AESTHETIC DIRECTION] +Style: {{STYLE = Minimal Premium SaaS}} +Key Differentiator: {{UNIQUE_FEATURE}} +Target Audience: {{AUDIENCE}} + +[INTERACTION STATES - REQUIRED] +✓ Default, Hover, Active, Focus, Disabled +✓ Loading (skeleton), Empty (with messaging), Error (with recovery) + +[OUTPUT REQUIREMENTS] +1. Design Tokens (CSS variables + TypeScript types) +2. Component implementations (copy-paste ready) +3. Page layouts with all states +4. NO hardcoded values; reference tokens only +5. Minimal but clear code comments +``` + +### Token Generation Prompt + +``` +Generate a complete Design Token system including: + +1. Semantic color slots (CSS custom properties): + - Light mode + Dark mode variants + - Background, surface, text, border, primary, secondary, accent, semantic colors + - Interactive states for each (hover, active) + +2. Typography scale: + - Display, H1-H6, Body, Small, Caption, Monospace + - Include: font-size, line-height, font-weight, letter-spacing + +3. Spacing scale (8px base): + - 0.5, 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24 (in rem) + +4. Radius scale: + - xs (2px), sm (4px), md (6px), lg (8px), xl (12px), 2xl (16px), full + +5. Shadow scale: + - sm, md, lg, xl (with color and blur values) + - Usage guidelines for each tier + +6. Motion tokens: + - Duration: fast (150ms), base (220ms), slow (300ms) + - Easing: ease-out, ease-in, ease-in-out + +7. Component density: + - Button heights: sm (36px), md (44px), lg (48px) + - Input heights: sm (36px), md (40px) + - Padding scales + +Output format: +- CSS custom properties (globals.css) +- Tailwind config integration +- TypeScript type definitions +- Usage examples for each token category + +DO NOT write component code yet. 
+``` + +### Component Implementation Prompt + +``` +Using the established Design Tokens, implement: <{{COMPONENT_NAME}} /> + +Requirements: +- Props: variant, size, state, className (for composition) +- States: default, hover, focus, active, disabled, loading, error +- Accessibility: keyboard navigation, ARIA labels, focus management +- Responsive: mobile-first, touch-friendly (44px+ tap targets) +- Styling: Use tokens ONLY (no hardcoded values) +- TypeScript: Full type safety with exported interfaces + +Include: +1. Component implementation +2. Usage examples (3-5 variants) +3. Loading state example +4. Error state example +5. Accessibility notes + +Output: Production-ready, copy-paste code with JSDoc comments. +``` + +### Page Development Prompt + +``` +Build page: {{PAGE_NAME}} + +Using: +- Established Design Tokens +- Implemented Components +- {{STYLE}} aesthetic direction + +Requirements: +- Responsive layout (mobile/tablet/desktop) +- All interaction states (hover/focus/active/disabled) +- Loading skeleton for async content +- Empty state with clear CTA +- Error state with recovery options +- Accessible (keyboard nav, ARIA, WCAG AA) +- No hardcoded styles (components + utility classes only) + +Include: +1. Page component with mock data +2. Loading state variant +3. Empty state variant +4. Error state variant +5. Responsive behavior notes + +Output: Complete, runnable page component. +``` + +### Review & Optimization Prompt + +``` +You are a Frontend Code Reviewer specializing in design systems and accessibility. + +Review the implementation and check: + +1. Token Compliance: + - Any hardcoded colors, sizes, shadows, radius? + - All values from established scales? + +2. Typography: + - Clear hierarchy? + - Comfortable line-height (1.5+)? + - Appropriate font sizes for each level? + +3. Spacing & Layout: + - Consistent use of spacing scale? + - Adequate whitespace? + - No awkward gaps or cramped sections? + +4. 
Interactive States: + - Hover/focus/active clearly distinct? + - Disabled state obviously different? + - Loading/empty/error states implemented? + +5. Accessibility: + - WCAG AA contrast met? + - Keyboard reachable? + - ARIA labels complete? + - Focus indicators visible? + - Semantic HTML? + +6. Responsive Design: + - Mobile layout functional (375px)? + - Tablet optimized (768px)? + - Desktop enhanced (1024px+)? + - Touch targets ≥ 44px? + +7. Maintainability: + - DRY principles followed? + - Clear component boundaries? + - Consistent naming? + - Adequate comments? + +8. Creative Execution: + - Matches intended aesthetic? + - Avoids generic patterns? + - Unique and memorable? + +Output: +- Findings (sorted by severity: Critical, High, Medium, Low) +- Specific fixes (code patches) +- Improvement suggestions +``` + +--- + +## Common Pitfalls & Solutions + +### ❌ Problem: Vague aesthetic descriptions +### ✅ Solution: Force actionable specifications + +``` +DON'T: "Make it modern and clean" +DO: +- Whitespace: 1.5x standard padding (24px instead of 16px) +- Typography: Display 56px, H1 40px, Body 16px, line-height 1.6 +- Colors: Neutral gray scale (50-900) + single accent color +- Shadows: Maximum 2 shadow tokens (card + modal only) +- Radius: Consistent 6px (buttons/inputs) and 8px (cards) +- Borders: 1px with --border-subtle (#e5e7eb in light mode) +- Transitions: 150ms ease-out only +``` + +### ❌ Problem: Each component invents its own styles +### ✅ Solution: Enforce token-only rule + +``` +RULE: Every visual property must map to a token. 
+ +Violations: +- ❌ bg-gray-100 (hardcoded Tailwind color) +- ❌ p-[17px] (arbitrary padding not in scale) +- ❌ rounded-[5px] (radius not in scale) +- ❌ shadow-[0_2px_8px_rgba(0,0,0,0.1)] (arbitrary shadow) + +Correct: +- ✅ bg-surface (semantic token) +- ✅ p-4 (maps to spacing scale: 16px) +- ✅ rounded-md (maps to radius scale: 6px) +- ✅ shadow-sm (maps to shadow token) +``` + +### ❌ Problem: Missing interactive states +### ✅ Solution: State coverage checklist + +``` +For EVERY interactive element, implement: + +Visual States: +- [ ] Default (base appearance) +- [ ] Hover (background shift, shadow, scale) +- [ ] Active (pressed state, slightly darker) +- [ ] Focus (visible ring, keyboard accessible) +- [ ] Disabled (reduced opacity, cursor not-allowed) + +Data States: +- [ ] Loading (skeleton or spinner with same dimensions) +- [ ] Empty (clear message + CTA) +- [ ] Error (error message + retry option) + +Test each state in isolation and in combination. +``` + +### ❌ Problem: Generic AI aesthetics +### ✅ Solution: Force creative differentiation + +``` +BANNED PATTERNS (overused in AI-generated UIs): +- ❌ Inter/Roboto/System fonts as primary choice +- ❌ Purple gradients on white backgrounds +- ❌ Card-grid-card-grid layouts only +- ❌ Generic blue (#3b82f6) as primary +- ❌ Default Tailwind color palette with no customization + +REQUIRED CREATIVE CHOICES: +- ✅ Select distinctive fonts (Google Fonts, Adobe Fonts, custom) +- ✅ Build custom color palette (not Tailwind defaults) +- ✅ Design unique layouts (asymmetry, overlap, grid-breaking) +- ✅ Add personality: illustrations, icons, textures, patterns +- ✅ Create signature elements (unique buttons, cards, headers) + +Ask yourself: "Would someone recognize this as uniquely designed for this purpose?" 
+``` + +### ❌ Problem: Accessibility as afterthought +### ✅ Solution: Accessibility as constraint + +``` +Build accessibility IN, not ON: + +Color Contrast: +- Run contrast checker on all text/background pairs +- Minimum WCAG AA: 4.5:1 (normal text), 3:1 (large text) +- Use tools: WebAIM Contrast Checker, Chrome DevTools + +Keyboard Navigation: +- Tab order follows visual flow +- All interactive elements keyboard reachable +- Focus indicator always visible (outline or ring) +- Escape closes modals/dropdowns + +ARIA & Semantics: +- Use semantic HTML first ( + ); + } +); + +Button.displayName = 'Button'; + +// Usage Example: +/* + + + + + +*/ + +// ============================================ +// INPUT COMPONENT +// ============================================ + +interface InputProps extends Omit<InputHTMLAttributes<HTMLInputElement>, 'size'> { // native `size` is numeric, so omit it for the string prop below + label?: string; + error?: string; + helperText?: string; + size?: 'sm' | 'md' | 'lg'; + leftIcon?: React.ReactNode; + rightIcon?: React.ReactNode; +} + +export const Input = forwardRef<HTMLInputElement, InputProps>( + ( + { + label, + error, + helperText, + size = 'md', + leftIcon, + rightIcon, + className, + id, + required, + ...props + }, + ref + ) => { + const generatedId = React.useId(); // React 18+: stable id that matches between server and client renders + const inputId = id || generatedId; + const errorId = error ? `${inputId}-error` : undefined; + const helperId = helperText ? `${inputId}-helper` : undefined; + + const sizeClasses = { + sm: 'input-sm', + md: '', + lg: 'input-lg', + }; + + return (
+ {label && ( + + )} + +
+ {leftIcon && ( +
+ {leftIcon} +
+ )} + + + + {rightIcon && ( +
+ {rightIcon} +
+ )} +
+ + {error && ( + + {error} + + )} + + {!error && helperText && ( + + {helperText} + + )} +
+ ); + } +); + +Input.displayName = 'Input'; + +// Usage Example: +/* +} +/> + + +*/ + +// ============================================ +// CARD COMPONENT +// ============================================ + +interface CardProps { + children: React.ReactNode; + className?: string; + interactive?: boolean; + onClick?: () => void; +} + +export function Card({ children, className, interactive = false, onClick }: CardProps) { + return ( +
{ + if (e.key === 'Enter' || e.key === ' ') { + e.preventDefault(); + onClick?.(); + } + } + : undefined + } + > + {children} +
+ ); +} + +export function CardHeader({ children, className }: { children: React.ReactNode; className?: string }) { + return
{children}
; +} + +export function CardTitle({ children, className }: { children: React.ReactNode; className?: string }) { + return

{children}

; +} + +export function CardDescription({ children, className }: { children: React.ReactNode; className?: string }) { + return

{children}

; +} + +export function CardBody({ children, className }: { children: React.ReactNode; className?: string }) { + return
{children}
; +} + +export function CardFooter({ children, className }: { children: React.ReactNode; className?: string }) { + return
{children}
; +} + +// Usage Example: +/* + + + Project Overview + Track your project's progress + + +

Your project is 75% complete

+
+ + + + +
+ + console.log('Clicked')}> + Click me! + +*/ + +// ============================================ +// BADGE COMPONENT +// ============================================ + +interface BadgeProps { + children: React.ReactNode; + variant?: 'primary' | 'secondary' | 'success' | 'warning' | 'danger' | 'outline'; + className?: string; +} + +export function Badge({ children, variant = 'primary', className }: BadgeProps) { + const variantClasses = { + primary: 'badge-primary', + secondary: 'badge-secondary', + success: 'badge-success', + warning: 'badge-warning', + danger: 'badge-danger', + outline: 'badge-outline', + }; + + return ( + + {children} + + ); +} + +// Usage Example: +/* +Active +Pending +Failed +*/ + +// ============================================ +// ALERT COMPONENT +// ============================================ + +interface AlertProps { + children: React.ReactNode; + variant?: 'info' | 'success' | 'warning' | 'danger'; + title?: string; + onClose?: () => void; + className?: string; +} + +export function Alert({ children, variant = 'info', title, onClose, className }: AlertProps) { + const variantClasses = { + info: 'alert-info', + success: 'alert-success', + warning: 'alert-warning', + danger: 'alert-danger', + }; + + const icons = { + info: ( + + + + ), + success: ( + + + + ), + warning: ( + + + + ), + danger: ( + + + + ), + }; + + return ( +
+ {icons[variant]} +
+ {title &&
{title}
} +
{children}
+
+ {onClose && ( + + )} +
+ ); +} + +// Usage Example: +/* + + Your changes have been saved successfully. + + + console.log('Closed')}> + Failed to save changes. Please try again. + +*/ + +// ============================================ +// MODAL COMPONENT +// ============================================ + +interface ModalProps { + isOpen: boolean; + onClose: () => void; + children: React.ReactNode; + title?: string; + className?: string; +} + +export function Modal({ isOpen, onClose, children, title, className }: ModalProps) { + if (!isOpen) return null; + + return ( +
+
e.stopPropagation()} + role="dialog" + aria-modal="true" + aria-labelledby={title ? 'modal-title' : undefined} + > + {title && ( +
+ + +
+ )} + {children} +
+
+ ); +} + +export function ModalBody({ children, className }: { children: React.ReactNode; className?: string }) { + return
{children}
; +} + +export function ModalFooter({ children, className }: { children: React.ReactNode; className?: string }) { + return
{children}
; +} + +// Usage Example: +/* +function Example() { + const [isOpen, setIsOpen] = useState(false); + + return ( + <> + + + setIsOpen(false)} title="Confirm Action"> + +

Are you sure you want to proceed with this action?

+
+ + + + +
+ + ); +} +*/ + +// ============================================ +// SKELETON LOADING COMPONENT +// ============================================ + +interface SkeletonProps { + className?: string; + variant?: 'text' | 'title' | 'avatar' | 'card' | 'rect'; + width?: string; + height?: string; +} + +export function Skeleton({ className, variant = 'text', width, height }: SkeletonProps) { + const variantClasses = { + text: 'skeleton-text', + title: 'skeleton-title', + avatar: 'skeleton-avatar', + card: 'skeleton-card', + rect: '', + }; + + const style: React.CSSProperties = {}; + if (width) style.width = width; + if (height) style.height = height; + + return ( +
+ ); +} + +// Usage Example: +/* +// Loading card + + + + + + + + + + + + +// Loading list +
+ {[1, 2, 3].map((i) => ( +
+ +
+ + +
+
+ ))} +
+*/ + +// ============================================ +// EMPTY STATE COMPONENT +// ============================================ + +interface EmptyStateProps { + icon?: React.ReactNode; + title: string; + description?: string; + action?: React.ReactNode; + className?: string; +} + +export function EmptyState({ icon, title, description, action, className }: EmptyStateProps) { + return ( +
+ {icon &&
{icon}
} +

{title}

+ {description &&

{description}

} + {action &&
{action}
} +
+ ); +} + +// Usage Example: +/* +} + title="No projects yet" + description="Get started by creating your first project" + action={ + + } +/> +*/ + +// ============================================ +// ERROR STATE COMPONENT +// ============================================ + +interface ErrorStateProps { + title: string; + message: string; + onRetry?: () => void; + onGoBack?: () => void; + className?: string; +} + +export function ErrorState({ title, message, onRetry, onGoBack, className }: ErrorStateProps) { + return ( +
+ + + +

{title}

+

{message}

+
+ {onGoBack && ( + + )} + {onRetry && ( + + )} +
+
+ ); +} + +// Usage Example: +/* + refetch()} + onGoBack={() => navigate('/')} +/> +*/ + +// ============================================ +// AVATAR COMPONENT +// ============================================ + +interface AvatarProps { + src?: string; + alt?: string; + fallback?: string; + size?: 'sm' | 'md' | 'lg'; + className?: string; +} + +export function Avatar({ src, alt, fallback, size = 'md', className }: AvatarProps) { + const [imageError, setImageError] = useState(false); + + const sizeClasses = { + sm: 'avatar-sm', + md: '', + lg: 'avatar-lg', + }; + + const showFallback = !src || imageError; + const initials = fallback + ? fallback + .split(' ') + .map((n) => n[0]) + .join('') + .toUpperCase() + .slice(0, 2) + : '?'; + + return ( +
+ {showFallback ? ( + {initials} + ) : ( + {alt setImageError(true)} + /> + )} +
+ ); + } + +// Usage Example: +/* + + + // Falls back to initials +*/ diff --git a/skills/frontend-design/examples/typescript/theme-provider.tsx b/skills/frontend-design/examples/typescript/theme-provider.tsx new file mode 100755 index 0000000..d878fa6 --- /dev/null +++ b/skills/frontend-design/examples/typescript/theme-provider.tsx @@ -0,0 +1,399 @@ +/** + * Theme Provider — Theme Management System + * + * This file provides a complete theme management system with: + * - Light/dark/system theme support + * - Theme persistence (localStorage) + * - System preference detection + * - Type-safe theme context + * + * Location: {project_path}/skills/frontend-design/examples/typescript/theme-provider.tsx + * + * Usage: + * ```tsx + * import { ThemeProvider, useTheme } from './theme-provider'; + * + * function App() { + * return ( + * + * + * + * ); + * } + * + * function ThemeToggle() { + * const { theme, setTheme } = useTheme(); + * return ; + * } + * ``` + */ + +import React, { createContext, useContext, useEffect, useState, ReactNode } from 'react'; +import { Theme } from './design-tokens'; + +// ============================================ +// Types +// ============================================ + +interface ThemeProviderProps { + children: ReactNode; + defaultTheme?: Theme; + storageKey?: string; +} + +interface ThemeContextType { + theme: Theme; + effectiveTheme: 'light' | 'dark'; // Resolved theme (system → light/dark) + setTheme: (theme: Theme) => void; + toggleTheme: () => void; +} + +// ============================================ +// Context +// ============================================ + +const ThemeContext = createContext<ThemeContextType | undefined>(undefined); + +// ============================================ +// Provider Component +// ============================================ + +export function ThemeProvider({ + children, + defaultTheme = 'system', + storageKey = 'ui-theme', +}: ThemeProviderProps) { + const [theme, setThemeState] = useState<Theme>(() => { + // Load theme from
localStorage if available + if (typeof window !== 'undefined') { + const stored = localStorage.getItem(storageKey); + if (stored && ['light', 'dark', 'system'].includes(stored)) { + return stored as Theme; + } + } + return defaultTheme; + }); + + const [effectiveTheme, setEffectiveTheme] = useState<'light' | 'dark'>(() => { + if (theme === 'system') { + return typeof window !== 'undefined' && window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light'; // no window during SSR; the effect below corrects this on mount + } + return theme; + }); + + // Update effective theme when theme or system preference changes + useEffect(() => { + const root = window.document.documentElement; + + // Remove previous theme classes + root.classList.remove('light', 'dark'); + + if (theme === 'system') { + const systemTheme = window.matchMedia('(prefers-color-scheme: dark)').matches + ? 'dark' + : 'light'; + root.classList.add(systemTheme); + root.setAttribute('data-theme', systemTheme); + setEffectiveTheme(systemTheme); + } else { + root.classList.add(theme); + root.setAttribute('data-theme', theme); + setEffectiveTheme(theme); + } + }, [theme]); + + // Listen for system theme changes when in system mode + useEffect(() => { + if (theme !== 'system') return; + + const mediaQuery = window.matchMedia('(prefers-color-scheme: dark)'); + + const handleChange = (e: MediaQueryListEvent) => { + const systemTheme = e.matches ?
'dark' : 'light'; + const root = window.document.documentElement; + root.classList.remove('light', 'dark'); + root.classList.add(systemTheme); + root.setAttribute('data-theme', systemTheme); + setEffectiveTheme(systemTheme); + }; + + // Modern browsers + if (mediaQuery.addEventListener) { + mediaQuery.addEventListener('change', handleChange); + return () => mediaQuery.removeEventListener('change', handleChange); + } + // Legacy browsers + else if (mediaQuery.addListener) { + mediaQuery.addListener(handleChange); + return () => mediaQuery.removeListener(handleChange); + } + }, [theme]); + + const setTheme = (newTheme: Theme) => { + localStorage.setItem(storageKey, newTheme); + setThemeState(newTheme); + }; + + const toggleTheme = () => { + setTheme(effectiveTheme === 'light' ? 'dark' : 'light'); + }; + + const value: ThemeContextType = { + theme, + effectiveTheme, + setTheme, + toggleTheme, + }; + + return <ThemeContext.Provider value={value}>{children}</ThemeContext.Provider>; +} + +// ============================================ +// Hook +// ============================================ + +export function useTheme(): ThemeContextType { + const context = useContext(ThemeContext); + + if (context === undefined) { + throw new Error('useTheme must be used within a ThemeProvider'); + } + + return context; +} + +// ============================================ +// Theme Toggle Component +// ============================================ + +interface ThemeToggleProps { + className?: string; + iconSize?: number; +} + +export function ThemeToggle({ className = '', iconSize = 20 }: ThemeToggleProps) { + const { theme, effectiveTheme, setTheme } = useTheme(); + + return ( + + ); +} + +// ============================================ +// Theme Selector Component (Dropdown) +// ============================================ + +interface ThemeSelectorProps { + className?: string; +} + +export function ThemeSelector({ className = '' }: ThemeSelectorProps) { + const { theme, setTheme } = useTheme(); + const [isOpen, setIsOpen] = useState(false);
+ + const themes: { value: Theme; label: string; icon: JSX.Element }[] = [ + { + value: 'light', + label: 'Light', + icon: ( + + + + + + + + + + + + ), + }, + { + value: 'dark', + label: 'Dark', + icon: ( + + + + ), + }, + { + value: 'system', + label: 'System', + icon: ( + + + + + + ), + }, + ]; + + return ( +
+ + + {isOpen && ( + <> +
setIsOpen(false)} + aria-hidden="true" + /> +
+
+ {themes.map((t) => ( + + ))} +
+
+ + )} +
+ ); +} + +// ============================================ +// Utility: Apply theme class to body +// ============================================ + +export function useThemeClass() { + const { effectiveTheme } = useTheme(); + + useEffect(() => { + const root = window.document.documentElement; + root.classList.remove('light', 'dark'); + root.classList.add(effectiveTheme); + root.setAttribute('data-theme', effectiveTheme); + }, [effectiveTheme]); +} + +// ============================================ +// Higher-Order Component for Theme +// ============================================ + +export function withTheme

( + Component: React.ComponentType

+) { + return function ThemedComponent(props: P) { + const theme = useTheme(); + return ; + }; +} diff --git a/skills/frontend-design/examples/typescript/utils.ts b/skills/frontend-design/examples/typescript/utils.ts new file mode 100755 index 0000000..1507000 --- /dev/null +++ b/skills/frontend-design/examples/typescript/utils.ts @@ -0,0 +1,354 @@ +/** + * Utility Functions + * + * Location: {project_path}/skills/frontend-design/examples/typescript/utils.ts + */ + +import { type ClassValue, clsx } from 'clsx'; +import { twMerge } from 'tailwind-merge'; + +/** + * Merge Tailwind CSS classes with proper precedence + * Combines clsx and tailwind-merge for optimal class handling + * + * @param inputs - Class values to merge + * @returns Merged class string + * + * @example + * cn('px-4 py-2', 'px-6') // => 'py-2 px-6' (px-6 overwrites px-4) + * cn('text-red-500', condition && 'text-blue-500') // => conditional classes + */ +export function cn(...inputs: ClassValue[]) { + return twMerge(clsx(inputs)); +} + +/** + * Format file size in human-readable format + * + * @param bytes - File size in bytes + * @param decimals - Number of decimal places (default: 2) + * @returns Formatted string (e.g., "1.5 MB") + */ +export function formatFileSize(bytes: number, decimals: number = 2): string { + if (bytes === 0) return '0 Bytes'; + + const k = 1024; + const dm = decimals < 0 ? 
0 : decimals; + const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB']; + const i = Math.floor(Math.log(bytes) / Math.log(k)); + + return `${parseFloat((bytes / Math.pow(k, i)).toFixed(dm))} ${sizes[i]}`; +} + +/** + * Debounce function to limit execution rate + * + * @param func - Function to debounce + * @param wait - Wait time in milliseconds + * @returns Debounced function + */ +export function debounce<T extends (...args: any[]) => any>( + func: T, + wait: number +): (...args: Parameters<T>) => void { + let timeout: ReturnType<typeof setTimeout> | null = null; + + return function executedFunction(...args: Parameters<T>) { + const later = () => { + timeout = null; + func(...args); + }; + + if (timeout) clearTimeout(timeout); + timeout = setTimeout(later, wait); + }; +} + +/** + * Throttle function to limit execution frequency + * + * @param func - Function to throttle + * @param limit - Time limit in milliseconds + * @returns Throttled function + */ +export function throttle<T extends (...args: any[]) => any>( + func: T, + limit: number +): (...args: Parameters<T>) => void { + let inThrottle = false; + + return function executedFunction(...args: Parameters<T>) { + if (!inThrottle) { + func(...args); + inThrottle = true; + setTimeout(() => (inThrottle = false), limit); + } + }; +} + +/** + * Generate a random ID + * + * @param length - Length of the ID (default: 8) + * @returns Random ID string + */ +export function generateId(length: number = 8): string { + return Math.random() + .toString(36) + .substring(2, 2 + length); +} + +/** + * Check if code is running in browser + */ +export const isBrowser = typeof window !== 'undefined'; + +/** + * Safely access localStorage + */ +export const storage = { + get: (key: string): string | null => { + if (!isBrowser) return null; + try { + return localStorage.getItem(key); + } catch { + return null; + } + }, + set: (key: string, value: string): void => { + if (!isBrowser) return; + try { + localStorage.setItem(key, value); + } catch { + // Handle quota exceeded or other errors + } + }, + remove: (key: string):
void => { + if (!isBrowser) return; + try { + localStorage.removeItem(key); + } catch { + // Handle errors + } + }, +}; + +/** + * Copy text to clipboard + * + * @param text - Text to copy + * @returns Promise<boolean> - Success status + */ +export async function copyToClipboard(text: string): Promise<boolean> { + if (!isBrowser) return false; + + try { + if (navigator.clipboard) { + await navigator.clipboard.writeText(text); + return true; + } else { + // Fallback for older browsers (document.execCommand is deprecated but still widely supported) + const textarea = document.createElement('textarea'); + textarea.value = text; + textarea.style.position = 'fixed'; + textarea.style.opacity = '0'; + document.body.appendChild(textarea); + textarea.select(); + const success = document.execCommand('copy'); + document.body.removeChild(textarea); + return success; + } + } catch { + return false; + } +} + +/** + * Format date in relative time (e.g., "2h ago") + * + * @param date - Date to format + * @returns Formatted relative time string + */ +export function formatRelativeTime(date: Date): string { + const now = new Date(); + const diffInSeconds = Math.floor((now.getTime() - date.getTime()) / 1000); + + if (diffInSeconds < 60) return 'just now'; + if (diffInSeconds < 3600) return `${Math.floor(diffInSeconds / 60)}m ago`; + if (diffInSeconds < 86400) return `${Math.floor(diffInSeconds / 3600)}h ago`; + if (diffInSeconds < 604800) return `${Math.floor(diffInSeconds / 86400)}d ago`; + if (diffInSeconds < 2592000) return `${Math.floor(diffInSeconds / 604800)}w ago`; + if (diffInSeconds < 31536000) return `${Math.floor(diffInSeconds / 2592000)}mo ago`; + return `${Math.floor(diffInSeconds / 31536000)}y ago`; +} + +/** + * Truncate text with ellipsis + * + * @param text - Text to truncate + * @param maxLength - Maximum length + * @returns Truncated text + */ +export function truncate(text: string, maxLength: number): string { + if (text.length <= maxLength) return text; + return text.slice(0, maxLength - 3) + '...'; +} + +/** + * Sleep/delay function + *
+ * @param ms - Milliseconds to sleep + * @returns Promise that resolves after delay + */ +export function sleep(ms: number): Promise<void> { + return new Promise<void>((resolve) => setTimeout(resolve, ms)); +} + +/** + * Clamp a number between min and max + * + * @param value - Value to clamp + * @param min - Minimum value + * @param max - Maximum value + * @returns Clamped value + */ +export function clamp(value: number, min: number, max: number): number { + return Math.min(Math.max(value, min), max); +} + +/** + * Check if user prefers reduced motion + */ +export function prefersReducedMotion(): boolean { + if (!isBrowser) return false; + return window.matchMedia('(prefers-reduced-motion: reduce)').matches; +} + +/** + * Check if user prefers dark mode + */ +export function prefersDarkMode(): boolean { + if (!isBrowser) return false; + return window.matchMedia('(prefers-color-scheme: dark)').matches; +} + +/** + * Format number with commas + * + * @param num - Number to format + * @returns Formatted number string + * + * @example + * formatNumber(1000) // => "1,000" + * formatNumber(1000000) // => "1,000,000" + */ +export function formatNumber(num: number): string { + return num.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ','); +} + +/** + * Abbreviate large numbers + * + * @param num - Number to abbreviate + * @returns Abbreviated number string + * + * @example + * abbreviateNumber(1000) // => "1K" + * abbreviateNumber(1000000) // => "1M" + * abbreviateNumber(1500000) // => "1.5M" + */ +export function abbreviateNumber(num: number): string { + if (num < 1000) return num.toString(); + if (num < 1000000) return `${(num / 1000).toFixed(1).replace(/\.0$/, '')}K`; + if (num < 1000000000) return `${(num / 1000000).toFixed(1).replace(/\.0$/, '')}M`; + return `${(num / 1000000000).toFixed(1).replace(/\.0$/, '')}B`; +} + +/** + * Get initials from name + * + * @param name - Full name + * @param maxLength - Maximum number of initials (default: 2) + * @returns Initials string + * +
@example + * getInitials("John Doe") // => "JD" + * getInitials("Mary Jane Watson") // => "MJ" + */ +export function getInitials(name: string, maxLength: number = 2): string { + return name + .split(' ') + .map((n) => n[0]) + .join('') + .toUpperCase() + .slice(0, maxLength); +} + +/** + * Validate email format + * + * @param email - Email to validate + * @returns True if valid email format + */ +export function isValidEmail(email: string): boolean { + const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; + return emailRegex.test(email); +} + +/** + * Validate URL format + * + * @param url - URL to validate + * @returns True if valid URL format + */ +export function isValidUrl(url: string): boolean { + try { + new URL(url); + return true; + } catch { + return false; + } +} + +/** + * Remove HTML tags from string + * + * @param html - HTML string + * @returns Plain text + */ +export function stripHtml(html: string): string { + if (!isBrowser) return html; + const tmp = document.createElement('div'); + tmp.innerHTML = html; + return tmp.textContent || tmp.innerText || ''; +} + +/** + * Capitalize first letter of string + * + * @param str - String to capitalize + * @returns Capitalized string + */ +export function capitalize(str: string): string { + return str.charAt(0).toUpperCase() + str.slice(1); +} + +/** + * Convert camelCase to kebab-case + * + * @param str - camelCase string + * @returns kebab-case string + */ +export function camelToKebab(str: string): string { + return str.replace(/([A-Z])/g, '-$1').toLowerCase(); +} + +/** + * Convert kebab-case to camelCase + * + * @param str - kebab-case string + * @returns camelCase string + */ +export function kebabToCamel(str: string): string { + return str.replace(/-([a-z])/g, (g) => g[1].toUpperCase()); +} diff --git a/skills/frontend-design/package.json b/skills/frontend-design/package.json new file mode 100755 index 0000000..858f61e --- /dev/null +++ b/skills/frontend-design/package.json @@ -0,0 +1,44 @@ +{ + "name": 
"@z-ai/frontend-design-skill", + "version": "2.0.0", + "description": "Comprehensive frontend design skill for systematic and creative web development", + "keywords": [ + "design-system", + "design-tokens", + "frontend", + "ui-components", + "react", + "tailwind", + "typescript", + "accessibility" + ], + "author": "z-ai platform team", + "license": "MIT", + "repository": { + "type": "git", + "url": "https://github.com/z-ai/skills" + }, + "peerDependencies": { + "react": "^18.0.0", + "react-dom": "^18.0.0" + }, + "devDependencies": { + "@tailwindcss/forms": "^0.5.7", + "@tailwindcss/typography": "^0.5.10", + "@types/react": "^18.2.0", + "@types/react-dom": "^18.2.0", + "autoprefixer": "^10.4.16", + "clsx": "^2.1.0", + "postcss": "^8.4.32", + "tailwind-merge": "^2.2.0", + "tailwindcss": "^3.4.0", + "typescript": "^5.3.0" + }, + "files": [ + "SKILL.md", + "README.md", + "LICENSE", + "examples/**/*", + "templates/**/*" + ] +} diff --git a/skills/frontend-design/templates/globals.css b/skills/frontend-design/templates/globals.css new file mode 100755 index 0000000..6bb0cb6 --- /dev/null +++ b/skills/frontend-design/templates/globals.css @@ -0,0 +1,385 @@ +/** + * Global Styles + * + * This file provides base styles, resets, and CSS custom properties + * for the design system. 
+ * + * Location: {project_path}/skills/frontend-design/templates/globals.css + * + * Import this file in your main entry point: + * - React: import './globals.css' in index.tsx or App.tsx + * - Next.js: import './globals.css' in pages/_app.tsx or app/layout.tsx + */ + +@tailwind base; +@tailwind components; +@tailwind utilities; + +/* ============================================ + BASE LAYER - CSS Custom Properties + ============================================ */ + +@layer base { + /* Root CSS Variables - Light Mode (Default) */ + :root { + /* Background & Surfaces */ + --background: 0 0% 100%; + --foreground: 222.2 84% 4.9%; + --card: 0 0% 100%; + --card-foreground: 222.2 84% 4.9%; + --popover: 0 0% 100%; + --popover-foreground: 222.2 84% 4.9%; + + /* Primary Brand */ + --primary: 222.2 47.4% 11.2%; + --primary-foreground: 210 40% 98%; + + /* Secondary */ + --secondary: 210 40% 96.1%; + --secondary-foreground: 222.2 47.4% 11.2%; + + /* Muted */ + --muted: 210 40% 96.1%; + --muted-foreground: 215.4 16.3% 46.9%; + + /* Accent */ + --accent: 210 40% 96.1%; + --accent-foreground: 222.2 47.4% 11.2%; + + /* Destructive/Danger */ + --destructive: 0 84.2% 60.2%; + --destructive-foreground: 210 40% 98%; + + /* Borders */ + --border: 214.3 31.8% 91.4%; + --input: 214.3 31.8% 91.4%; + --ring: 222.2 84% 4.9%; + + /* Misc */ + --radius: 0.5rem; + } + + /* Dark Mode Overrides */ + .dark { + --background: 222.2 84% 4.9%; + --foreground: 210 40% 98%; + --card: 222.2 84% 4.9%; + --card-foreground: 210 40% 98%; + --popover: 222.2 84% 4.9%; + --popover-foreground: 210 40% 98%; + + --primary: 210 40% 98%; + --primary-foreground: 222.2 47.4% 11.2%; + + --secondary: 217.2 32.6% 17.5%; + --secondary-foreground: 210 40% 98%; + + --muted: 217.2 32.6% 17.5%; + --muted-foreground: 215 20.2% 65.1%; + + --accent: 217.2 32.6% 17.5%; + --accent-foreground: 210 40% 98%; + + --destructive: 0 62.8% 30.6%; + --destructive-foreground: 210 40% 98%; + + --border: 217.2 32.6% 17.5%; + --input: 217.2 
32.6% 17.5%; + --ring: 212.7 26.8% 83.9%; + } + + /* Base Element Styles */ + * { + @apply border-border; + } + + body { + @apply bg-background text-foreground; + font-feature-settings: "rlig" 1, "calt" 1; + } + + /* Typography Defaults */ + h1, h2, h3, h4, h5, h6 { + @apply font-semibold; + } + + /* Focus Visible Styles */ + *:focus-visible { + @apply outline-none ring-2 ring-ring ring-offset-2; + } + + /* Smooth Scrolling */ + html { + scroll-behavior: smooth; + } + + /* Reduced Motion Preference */ + @media (prefers-reduced-motion: reduce) { + html { + scroll-behavior: auto; + } + + *, + *::before, + *::after { + animation-duration: 0.01ms !important; + animation-iteration-count: 1 !important; + transition-duration: 0.01ms !important; + scroll-behavior: auto !important; + } + } + + /* Custom Scrollbar Styles (Webkit) */ + ::-webkit-scrollbar { + width: 10px; + height: 10px; + } + + ::-webkit-scrollbar-track { + @apply bg-muted; + } + + ::-webkit-scrollbar-thumb { + @apply bg-muted-foreground/30 rounded-md; + } + + ::-webkit-scrollbar-thumb:hover { + @apply bg-muted-foreground/50; + } + + /* Firefox Scrollbar */ + * { + scrollbar-width: thin; + scrollbar-color: hsl(var(--muted-foreground) / 0.3) hsl(var(--muted)); + } + + /* Selection Color */ + ::selection { + @apply bg-primary/20 text-foreground; + } +} + +/* ============================================ + COMPONENTS LAYER - Reusable Components + ============================================ */ + +@layer components { + /* Container */ + .container { + @apply w-full mx-auto px-4 sm:px-6 lg:px-8; + max-width: 1536px; + } + + /* Button Base */ + .btn { + @apply inline-flex items-center justify-center gap-2 rounded-md text-sm font-medium transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring disabled:pointer-events-none disabled:opacity-50; + height: 2.5rem; + padding: 0 1rem; + } + + .btn-sm { + height: 2rem; + padding: 0 0.75rem; + font-size: 0.875rem; + } + + .btn-lg { + height: 
3rem; + padding: 0 1.5rem; + } + + /* Button Variants */ + .btn-primary { + @apply bg-primary text-primary-foreground hover:bg-primary/90; + } + + .btn-secondary { + @apply bg-secondary text-secondary-foreground hover:bg-secondary/80; + } + + .btn-outline { + @apply border border-input bg-background hover:bg-accent hover:text-accent-foreground; + } + + .btn-ghost { + @apply hover:bg-accent hover:text-accent-foreground; + } + + .btn-destructive { + @apply bg-destructive text-destructive-foreground hover:bg-destructive/90; + } + + /* Input */ + .input { + @apply flex h-10 w-full rounded-md border border-input bg-background px-3 py-2 text-sm ring-offset-background file:border-0 file:bg-transparent file:text-sm file:font-medium placeholder:text-muted-foreground focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-50; + } + + /* Card */ + .card { + @apply rounded-lg border bg-card text-card-foreground shadow-sm; + } + + .card-header { + @apply flex flex-col space-y-1.5 p-6; + } + + .card-title { + @apply text-2xl font-semibold leading-none tracking-tight; + } + + .card-description { + @apply text-sm text-muted-foreground; + } + + .card-content { + @apply p-6 pt-0; + } + + .card-footer { + @apply flex items-center p-6 pt-0; + } + + /* Badge */ + .badge { + @apply inline-flex items-center rounded-full border px-2.5 py-0.5 text-xs font-semibold transition-colors focus:outline-none focus:ring-2 focus:ring-ring focus:ring-offset-2; + } + + .badge-primary { + @apply border-transparent bg-primary text-primary-foreground hover:bg-primary/80; + } + + .badge-secondary { + @apply border-transparent bg-secondary text-secondary-foreground hover:bg-secondary/80; + } + + .badge-outline { + @apply text-foreground; + } + + /* Alert */ + .alert { + @apply relative w-full rounded-lg border p-4; + } + + .alert-destructive { + @apply border-destructive/50 text-destructive dark:border-destructive 
[&>svg]:text-destructive; + } +} + +/* ============================================ + UTILITIES LAYER - Custom Utilities + ============================================ */ + +@layer utilities { + /* Text Balance - Prevents orphans */ + .text-balance { + text-wrap: balance; + } + + /* Truncate with multiple lines */ + .line-clamp-2 { + display: -webkit-box; + -webkit-line-clamp: 2; + -webkit-box-orient: vertical; + overflow: hidden; + } + + .line-clamp-3 { + display: -webkit-box; + -webkit-line-clamp: 3; + -webkit-box-orient: vertical; + overflow: hidden; + } + + /* Hide scrollbar but keep functionality */ + .scrollbar-hide { + -ms-overflow-style: none; + scrollbar-width: none; + } + + .scrollbar-hide::-webkit-scrollbar { + display: none; + } + + /* Animate on scroll utilities */ + .animate-on-scroll { + opacity: 0; + transform: translateY(20px); + transition: opacity 0.6s ease-out, transform 0.6s ease-out; + } + + .animate-on-scroll.in-view { + opacity: 1; + transform: translateY(0); + } + + /* Glassmorphism effect */ + .glass { + @apply bg-background/80 backdrop-blur-md border border-border/50; + } + + /* Gradient text */ + .gradient-text { + @apply bg-clip-text text-transparent bg-gradient-to-r from-primary to-accent; + } + + /* Focus ring utilities */ + .focus-ring { + @apply focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2; + } + + /* Aspect ratio utilities (if not using @tailwindcss/aspect-ratio) */ + .aspect-video { + aspect-ratio: 16 / 9; + } + + .aspect-square { + aspect-ratio: 1 / 1; + } + + /* Safe area insets (for mobile) */ + .safe-top { + padding-top: env(safe-area-inset-top); + } + + .safe-bottom { + padding-bottom: env(safe-area-inset-bottom); + } + + .safe-left { + padding-left: env(safe-area-inset-left); + } + + .safe-right { + padding-right: env(safe-area-inset-right); + } +} + +/* ============================================ + PRINT STYLES + ============================================ */ + 
+@media print { + /* Hide unnecessary elements when printing */ + nav, footer, .no-print { + display: none !important; + } + + /* Optimize for printing */ + body { + @apply text-black bg-white; + } + + a { + @apply text-black no-underline; + } + + /* Page breaks */ + h1, h2, h3, h4, h5, h6 { + page-break-after: avoid; + } + + img, table, figure { + page-break-inside: avoid; + } +} diff --git a/skills/frontend-design/templates/tailwind.config.js b/skills/frontend-design/templates/tailwind.config.js new file mode 100755 index 0000000..6fb2509 --- /dev/null +++ b/skills/frontend-design/templates/tailwind.config.js @@ -0,0 +1,263 @@ +/** + * Tailwind CSS Configuration + * + * This configuration integrates the design token system with Tailwind CSS. + * It provides custom colors, spacing, typography, and more. + * + * Location: {project_path}/skills/frontend-design/templates/tailwind.config.js + * + * Installation: + * 1. npm install -D tailwindcss postcss autoprefixer + * 2. Copy this file to your project root as tailwind.config.js + * 3. Update content paths to match your project structure + * 4. 
Import globals.css in your main entry file + */ + +/** @type {import('tailwindcss').Config} */ +module.exports = { + // Specify which files Tailwind should scan for class names + content: [ + './src/**/*.{js,jsx,ts,tsx}', + './pages/**/*.{js,jsx,ts,tsx}', + './components/**/*.{js,jsx,ts,tsx}', + './app/**/*.{js,jsx,ts,tsx}', + ], + + // Enable dark mode via class strategy + darkMode: ['class'], + + theme: { + // Extend default theme (preserves Tailwind defaults) + extend: { + // Custom colors using CSS variables for theme support + colors: { + border: 'hsl(var(--border))', + input: 'hsl(var(--input))', + ring: 'hsl(var(--ring))', + background: 'hsl(var(--background))', + foreground: 'hsl(var(--foreground))', + primary: { + DEFAULT: 'hsl(var(--primary))', + foreground: 'hsl(var(--primary-foreground))', + }, + secondary: { + DEFAULT: 'hsl(var(--secondary))', + foreground: 'hsl(var(--secondary-foreground))', + }, + destructive: { + DEFAULT: 'hsl(var(--destructive))', + foreground: 'hsl(var(--destructive-foreground))', + }, + muted: { + DEFAULT: 'hsl(var(--muted))', + foreground: 'hsl(var(--muted-foreground))', + }, + accent: { + DEFAULT: 'hsl(var(--accent))', + foreground: 'hsl(var(--accent-foreground))', + }, + popover: { + DEFAULT: 'hsl(var(--popover))', + foreground: 'hsl(var(--popover-foreground))', + }, + card: { + DEFAULT: 'hsl(var(--card))', + foreground: 'hsl(var(--card-foreground))', + }, + }, + + // Border radius scale + borderRadius: { + lg: 'var(--radius)', + md: 'calc(var(--radius) - 2px)', + sm: 'calc(var(--radius) - 4px)', + }, + + // Custom font families (var() fallbacks keep the declaration valid if --font-* is undefined) + fontFamily: { + sans: ['var(--font-sans,system-ui)', 'system-ui', 'sans-serif'], + serif: ['var(--font-serif,Georgia)', 'Georgia', 'serif'], + mono: ['var(--font-mono,monospace)', 'monospace'], + }, + + // Typography scale using fluid sizing + fontSize: { + xs: ['clamp(0.75rem, 0.7rem + 0.15vw, 0.875rem)', { lineHeight: '1.4' }], + sm: ['clamp(0.875rem, 0.8rem + 0.2vw, 1rem)', { lineHeight: '1.5' }], + base: ['clamp(1rem,
0.95rem + 0.25vw, 1.125rem)', { lineHeight: '1.6' }], + lg: ['clamp(1.125rem, 1.05rem + 0.3vw, 1.25rem)', { lineHeight: '1.5' }], + xl: ['clamp(1.25rem, 1.15rem + 0.4vw, 1.5rem)', { lineHeight: '1.4' }], + '2xl': ['clamp(1.5rem, 1.35rem + 0.6vw, 2rem)', { lineHeight: '1.3' }], + '3xl': ['clamp(1.875rem, 1.65rem + 0.9vw, 2.5rem)', { lineHeight: '1.2' }], + '4xl': ['clamp(2.25rem, 1.95rem + 1.2vw, 3.5rem)', { lineHeight: '1.1' }], + '5xl': ['clamp(3rem, 2.5rem + 2vw, 4.5rem)', { lineHeight: '1' }], + }, + + // Spacing scale (8px base system) + spacing: { + 0.5: '0.125rem', // 2px + 1.5: '0.375rem', // 6px + 2.5: '0.625rem', // 10px + 3.5: '0.875rem', // 14px + 4.5: '1.125rem', // 18px + 5.5: '1.375rem', // 22px + 6.5: '1.625rem', // 26px + 7.5: '1.875rem', // 30px + 8.5: '2.125rem', // 34px + 9.5: '2.375rem', // 38px + 13: '3.25rem', // 52px + 15: '3.75rem', // 60px + 17: '4.25rem', // 68px + 18: '4.5rem', // 72px + 19: '4.75rem', // 76px + 21: '5.25rem', // 84px + 22: '5.5rem', // 88px + 23: '5.75rem', // 92px + 25: '6.25rem', // 100px + }, + + // Custom shadows + boxShadow: { + sm: '0 1px 2px 0 rgb(0 0 0 / 0.05)', + DEFAULT: '0 1px 3px 0 rgb(0 0 0 / 0.1), 0 1px 2px -1px rgb(0 0 0 / 0.1)', + md: '0 4px 6px -1px rgb(0 0 0 / 0.1), 0 2px 4px -2px rgb(0 0 0 / 0.1)', + lg: '0 10px 15px -3px rgb(0 0 0 / 0.1), 0 4px 6px -4px rgb(0 0 0 / 0.1)', + xl: '0 20px 25px -5px rgb(0 0 0 / 0.1), 0 8px 10px -6px rgb(0 0 0 / 0.1)', + '2xl': '0 25px 50px -12px rgb(0 0 0 / 0.25)', + inner: 'inset 0 2px 4px 0 rgb(0 0 0 / 0.05)', + }, + + // Animation durations + transitionDuration: { + DEFAULT: '220ms', + fast: '150ms', + slow: '300ms', + }, + + // Keyframe animations + keyframes: { + // Fade in animation + 'fade-in': { + '0%': { opacity: '0' }, + '100%': { opacity: '1' }, + }, + // Fade out animation + 'fade-out': { + '0%': { opacity: '1' }, + '100%': { opacity: '0' }, + }, + // Slide in from top + 'slide-in-top': { + '0%': { transform: 'translateY(-100%)' }, + '100%': { transform: 
'translateY(0)' }, + }, + // Slide in from bottom + 'slide-in-bottom': { + '0%': { transform: 'translateY(100%)' }, + '100%': { transform: 'translateY(0)' }, + }, + // Slide in from left + 'slide-in-left': { + '0%': { transform: 'translateX(-100%)' }, + '100%': { transform: 'translateX(0)' }, + }, + // Slide in from right + 'slide-in-right': { + '0%': { transform: 'translateX(100%)' }, + '100%': { transform: 'translateX(0)' }, + }, + // Scale in + 'scale-in': { + '0%': { transform: 'scale(0.95)', opacity: '0' }, + '100%': { transform: 'scale(1)', opacity: '1' }, + }, + // Spin animation + spin: { + '0%': { transform: 'rotate(0deg)' }, + '100%': { transform: 'rotate(360deg)' }, + }, + // Shimmer animation (for skeletons) + shimmer: { + '0%': { backgroundPosition: '200% 0' }, + '100%': { backgroundPosition: '-200% 0' }, + }, + // Pulse animation + pulse: { + '0%, 100%': { opacity: '1' }, + '50%': { opacity: '0.5' }, + }, + // Bounce animation + bounce: { + '0%, 100%': { transform: 'translateY(-25%)', animationTimingFunction: 'cubic-bezier(0.8,0,1,1)' }, + '50%': { transform: 'translateY(0)', animationTimingFunction: 'cubic-bezier(0,0,0.2,1)' }, + }, + }, + + // Animation utilities + animation: { + 'fade-in': 'fade-in 0.2s ease-out', + 'fade-out': 'fade-out 0.2s ease-out', + 'slide-in-top': 'slide-in-top 0.3s ease-out', + 'slide-in-bottom': 'slide-in-bottom 0.3s ease-out', + 'slide-in-left': 'slide-in-left 0.3s ease-out', + 'slide-in-right': 'slide-in-right 0.3s ease-out', + 'scale-in': 'scale-in 0.2s ease-out', + spin: 'spin 0.6s linear infinite', + shimmer: 'shimmer 1.5s ease-in-out infinite', + pulse: 'pulse 2s cubic-bezier(0.4, 0, 0.6, 1) infinite', + bounce: 'bounce 1s infinite', + }, + + // Custom z-index scale + zIndex: { + dropdown: '1000', + sticky: '1100', + fixed: '1200', + 'modal-backdrop': '1300', + modal: '1400', + popover: '1500', + tooltip: '1600', + }, + }, + }, + + // Plugins + plugins: [ + // Typography plugin for prose styling + 
require('@tailwindcss/typography'), + + // Forms plugin for better form styling + require('@tailwindcss/forms'), + + // Custom plugin for component utilities + function ({ addComponents, theme }) { + addComponents({ + // Container utility + '.container': { + width: '100%', + marginLeft: 'auto', + marginRight: 'auto', + paddingLeft: theme('spacing.4'), + paddingRight: theme('spacing.4'), + '@screen sm': { + maxWidth: '640px', + }, + '@screen md': { + maxWidth: '768px', + }, + '@screen lg': { + maxWidth: '1024px', + paddingLeft: theme('spacing.6'), + paddingRight: theme('spacing.6'), + }, + '@screen xl': { + maxWidth: '1280px', + }, + '@screen 2xl': { + maxWidth: '1536px', + }, + }, + }); + }, + ], +}; diff --git a/skills/gift-evaluator/SKILL.md b/skills/gift-evaluator/SKILL.md new file mode 100755 index 0000000..e392744 --- /dev/null +++ b/skills/gift-evaluator/SKILL.md @@ -0,0 +1,83 @@ +--- +name: gift-evaluator +description: The PRIMARY tool for Spring Festival gift analysis and social interaction generation. Use this skill when users upload photos of gifts (alcohol, tea, supplements, etc.) to inquire about their value, authenticity, or how to respond socially. Integrates visual perception, market valuation, and HTML card generation. +license: Internal Tool +--- + +This skill transforms the assistant into an "AI Gift Appraiser" (春节礼品鉴定师). It bridges the gap between raw visual data and complex social context. It is designed to handle the full lifecycle of a user's request: identifying the object, determining its market and social value, and producing a shareable, gamified HTML artifact. + +## Agent Thinking Strategy + +Before and during the execution of tools, maintain a "High EQ" and "Market-Savvy" mindset. You are not just identifying objects; you are decoding social relationships. + +1. **Visual Extraction (The Eye)**: + * Call the vision tool to get a raw description. + * **CRITICAL**: Read the raw description carefully. 
Extract specific entities: Brand names (e.g., "Moutai", "Dior"), Vintages, Packaging details (e.g., "Dusty bottle" implies old stock, "Gift box" implies formality). + +2. **Valuation Logic (The Brain)**: + * **Price Anchoring**: Use search tools to find the *current* market price. + * **Social Labeling**: Classify the gift based on price and intent: + * `luxury`: High value (> ¥1000), "Hard Currency". + * `standard`: Festive, safe choices (¥200 - ¥1000). + * `budget`: Practical, funny, or cheap (< ¥200). + +3. **Creative Synthesis (The Mouth)**: + * **Deep Critique**: Generate a "Roast" (毒舌点评) of **at least 50 words**. It must combine the visual details (e.g., dust, packaging color) with the price reality. Be spicy but insightful. + * **Structured Strategy**: Structure the "Thank You Notes" and "Return Gift Ideas" as JSON so the UI can render them. + +## Tool Usage Guidelines +### 1. The Perception Phase (Visual Analysis) +**Purpose**: Use VLM capabilities to decompose the uploaded product image along multiple visual dimensions. This step identifies and extracts structured data, including Brand Recognition, Product Style, Packaging Design, and Aesthetic Category. + +**Output Analysis**: + +* The tool returns raw string content. Read it to extract keywords for the next step. + +### 2. The Valuation Phase (Search) + +**Purpose**: Validate the product's worth. +**Command**: search "EXTRACTED_KEYWORDS + price + review" + + +### 3. The Content Structuring Phase (Reasoning) + +**Purpose**: Prepare the data for the HTML generator. **Do not call a tool here; just reason and format strings.** + +1. **Construct `thank_you_json`**: Create 3 distinct styles of private messages. +* *Format*: `[{"style": "Style Name", "content": "Message..."}]` +* *Requirement*: +* Style 1: "Decent/Formal" (for elders/bosses). +* Style 2: "Friendly/Warm" (for peers/relatives). +* Style 3: "Humorous/Close" (for best friends). + + +2.
**Construct `return_gift_json`**: Analyze 4 potential giver personas. +* *Format*: `[{"target": "If giver is...", "item": "Suggest...", "reason": "Why..."}]` +* *Requirement*: Suggestions must include Age/Gender/Relation analysis (e.g., "If giver is an elder male", "If giver is a peer female"). +* *Value Logic*: Follow the principle of Value Reciprocity. The return gift's value should roughly match the received gift's value, adjusted slightly for the giver's status (e.g., seniority or intimacy). + + +### 4. The Creation Phase (Render) + +**Purpose**: Package the analysis into a modern, interactive HTML card. +**HTML Generation**: + * *Constraint*: The `image_url` parameter in the Python command MUST be the original absolute path, and `output_path` must be a full path. + * *Command*: + ```bash + python3 html_tools.py generate_gift_card \ + --product_name "EXTRACTED_NAME" \ + --price "ESTIMATED_PRICE" \ + --evaluation "YOUR_LONG_AND_SPICY_CRITIQUE" \ + --thank_you_json '[{"style":"...","content":"..."}]' \ + --return_gift_json '[{"target":"...","item":"...","reason":"..."}]' \ + --vibe_code "luxury|standard|budget" \ + --image_url "IMAGE_FILE_PATH" \ + --output_path "TARGET_FILE_PATH" + ``` + +## Operational Rules + +1. **JSON Formatting**: The `thank_you_json` and `return_gift_json` arguments MUST be valid JSON strings using double quotes. Do not wrap them in code blocks inside the command. +2. **Critique Depth**: The `evaluation` text must be rich. Don't just say "It's expensive." Say "This 2018 vintage shows your uncle raided his personal cellar; the label wear proves it's real." +3. **Vibe Consistency**: Ensure `vibe_code` matches the `price` assessment. +4. **Final Output**: Always present the path to the generated HTML file.
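The CLI contract above has several requirements: three thank-you styles, four giver personas, double-quoted JSON, a `vibe_code` consistent with the price, and absolute paths. These can be checked before calling the generator.

The sketch below is a minimal illustration under stated assumptions, not part of `html_tools.py`:

- `classify_vibe` and `build_command` are hypothetical helper names.
- The price is assumed to be a number in CNY.
- The `¥1,499` price label format is an assumption.
- The path check assumes a local file, as the skill requires, rather than an http(s) URL.

It relies on `json.dumps`, which always emits valid double-quoted JSON. It returns an argv list, so an apostrophe in a message (e.g. "It's") cannot break the shell's single quoting.

```python
import json
import os


# Hypothetical helper; thresholds follow the SKILL.md labels (CNY):
# luxury > 1000, standard 200-1000, budget < 200.
def classify_vibe(price_cny: float) -> str:
    """Map an estimated market price to a vibe_code."""
    if price_cny > 1000:
        return "luxury"
    if price_cny >= 200:
        return "standard"
    return "budget"


def build_command(product_name: str, price_cny: float, evaluation: str,
                  thank_you: list, return_gifts: list,
                  image_path: str, output_path: str) -> list:
    """Validate inputs and build the argv for `html_tools.py generate_gift_card`."""
    if len(thank_you) != 3:
        raise ValueError("expected exactly 3 thank-you styles")
    if len(return_gifts) != 4:
        raise ValueError("expected exactly 4 giver personas")
    for path in (image_path, output_path):
        if not os.path.isabs(path):
            raise ValueError(f"path must be absolute: {path}")
    return [
        "python3", "html_tools.py", "generate_gift_card",
        "--product_name", product_name,
        "--price", f"¥{price_cny:,.0f}",  # assumed label format
        "--evaluation", evaluation,
        # json.dumps always produces valid, double-quoted JSON
        "--thank_you_json", json.dumps(thank_you, ensure_ascii=False),
        "--return_gift_json", json.dumps(return_gifts, ensure_ascii=False),
        "--vibe_code", classify_vibe(price_cny),
        "--image_url", image_path,
        "--output_path", output_path,
    ]
```

How to use the result:

- Pass the list straight to `subprocess.run(cmd, check=True)`.
- If a printable shell string is needed for logging, `shlex.join(cmd)` (Python 3.8+) quotes each argument correctly.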
diff --git a/skills/gift-evaluator/html_tools.py b/skills/gift-evaluator/html_tools.py new file mode 100755 index 0000000..3353aee --- /dev/null +++ b/skills/gift-evaluator/html_tools.py @@ -0,0 +1,268 @@ +import os +import argparse +import json +import html +import base64 +import mimetypes +import urllib.request + +def generate_gift_card(product_name, price, evaluation, thank_you_json, return_gift_json, vibe_code, image_url, output_path="gift_card_result.html"): + """ + Generate a modern, interactive gift appraisal card. + """ + + # --- Convert the image to Base64 (unchanged from the previous step) --- + final_image_src = image_url + try: + image_data = None + mime_type = None + if image_url.startswith(('http://', 'https://')): + req = urllib.request.Request(image_url, headers={'User-Agent': 'Mozilla/5.0'}) + with urllib.request.urlopen(req, timeout=10) as response: + image_data = response.read() + mime_type = response.headers.get_content_type() + else: + if os.path.exists(image_url): + mime_type, _ = mimetypes.guess_type(image_url) + with open(image_url, "rb") as f: + image_data = f.read() + + if image_data: + if not mime_type: mime_type = "image/jpeg" + b64_str = base64.b64encode(image_data).decode('utf-8') + final_image_src = f"data:{mime_type};base64,{b64_str}" + + except Exception as e: + print(f"⚠️ Base64 image conversion failed; falling back to the original URL. Error: {e}") + + # --- 1. Parse data --- + try: + thank_you_data = json.loads(thank_you_json) + except (json.JSONDecodeError, TypeError): + thank_you_data = [{"style": "通用版", "content": thank_you_json}] + + try: + return_gift_data = json.loads(return_gift_json) + except (json.JSONDecodeError, TypeError): + return_gift_data = [{"target": "通用建议", "item": return_gift_json, "reason": "万能回礼"}] + + # --- 2.
Style configuration --- + styles = { + "luxury": { + "page_bg": "bg-neutral-900", + "card_bg": "bg-neutral-900/80 backdrop-blur-xl border border-white/10", + "text_main": "text-white", "text_sub": "text-neutral-400", + "accent": "text-amber-400", "tag_bg": "bg-amber-400/20 text-amber-400", + "btn_hover": "hover:bg-amber-400 hover:text-black", + "img_bg": "bg-neutral-800" # image backdrop color + }, + "standard": { + "page_bg": "bg-stone-200", + "card_bg": "bg-white/95 backdrop-blur-xl border border-stone-200", + "text_main": "text-stone-800", "text_sub": "text-stone-500", + "accent": "text-red-600", "tag_bg": "bg-red-50 text-red-600", + "btn_hover": "hover:bg-red-600 hover:text-white", + "img_bg": "bg-stone-100" + }, + "budget": { + "page_bg": "bg-yellow-50", + "card_bg": "bg-white border-4 border-black shadow-[8px_8px_0px_0px_rgba(0,0,0,1)]", + "text_main": "text-black", "text_sub": "text-gray-600", + "accent": "text-blue-600", "tag_bg": "bg-black text-white", + "btn_hover": "hover:bg-blue-600 hover:text-white", + "img_bg": "bg-gray-200" + } + } + st = styles.get(vibe_code, styles["standard"]) + if "img_bg" not in st: st["img_bg"] = "bg-black/5" # compatibility fallback + + # --- 3. Helper logic --- + is_dark_mode = "text-white" in st['text_main'] + bubble_bg = "bg-white/10 border-white/10" if is_dark_mode else "bg-black/5 border-black/5" + bubble_hover = "hover:bg-white/20" if is_dark_mode else "hover:bg-black/10" + divider_color = "border-white/20" if is_dark_mode else "border-black/10" + + # --- 4. Build HTML --- + thank_you_html = "" + for item in thank_you_data: + thank_you_html += f""" +

+
+ {item['style']} + + + 点击复制 + +
+

{item['content']}

+
+ ✓ 已复制 +
+
+ """ + + return_gift_html = "" + for item in return_gift_data: + return_gift_html += f""" +
+
+
+
{item['target']}
+
+
{item['item']}
+
{item['reason']}
+
+ """ + + html_content = f""" + + + + + + 礼品鉴定报告 + + + + + +
+ +
+ +
+ + +
+ +
+
+ AI Gift Analysis +
+

{product_name}

+
+ 当前估值 + {price} +
+
+
+ +
+
+ +

+ + 专家鉴定评价 +

+ +
+ {evaluation} +
+ +
+
AI
+
+ 首席鉴定官 + Verified Analysis +
+
+
+
+ +
+
+
+
+ +
+
+

私信回复话术

+

高情商回复,点击卡片即可复制

+
+
+
+ {thank_you_html} +
+
+ +
+
+
+ +
+
+

推荐回礼策略

+

基于价格区间的最优解

+
+
+
+ {return_gift_html} +
+
+ +
+

Designed by AI Gift Agent • 春节特别版

+
+
+
+ + + + + """ + + try: + directory = os.path.dirname(output_path) + if directory: + os.makedirs(directory, exist_ok=True) + with open(output_path, "w", encoding="utf-8") as f: + f.write(html_content) + return os.path.abspath(output_path) + except OSError as e: + raise RuntimeError(f"Error saving HTML file: {e}") from e + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Generate Gift Card HTML") + parser.add_argument("action", nargs="?", help="Action command") + parser.add_argument("--product_name", required=True) + parser.add_argument("--price", required=True) + parser.add_argument("--evaluation", required=True) + parser.add_argument("--thank_you_json", required=True) + parser.add_argument("--return_gift_json", required=True) + parser.add_argument("--vibe_code", required=True) + parser.add_argument("--image_url", required=True) + parser.add_argument("--output_path", required=True) + + args = parser.parse_args() + + result_path = generate_gift_card( + product_name=args.product_name, + price=args.price, + evaluation=args.evaluation, + thank_you_json=args.thank_you_json, + return_gift_json=args.return_gift_json, + vibe_code=args.vibe_code, + image_url=args.image_url, + output_path=args.output_path + ) + + print(f"HTML Card generated successfully: {result_path}") \ No newline at end of file diff --git a/skills/image-generation/LICENSE.txt b/skills/image-generation/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/image-generation/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to
the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/image-generation/SKILL.md b/skills/image-generation/SKILL.md new file mode 100755 index 0000000..5289ce1 --- /dev/null +++ b/skills/image-generation/SKILL.md @@ -0,0 +1,583 @@ +--- +name: image-generation +description: Implement AI image generation capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to create images from text descriptions, generate visual content, create artwork, design assets, or build applications with AI-powered image creation. Supports multiple image sizes and returns base64 encoded images. Also includes a CLI tool for quick image generation. +license: MIT +--- + +# Image Generation Skill + +This skill guides the implementation of image generation functionality using the z-ai-web-dev-sdk package and CLI tool, enabling creation of high-quality images from text descriptions. + +## Skills Path + +**Skill Location**: `{project_path}/skills/image-generation` + +This skill is located at the path above in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/image-generation.ts` for a working example.
+ +## Overview + +Image Generation allows you to build applications that create visual content from text prompts using AI models, enabling creative workflows, design automation, and visual content production. + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## Basic Image Generation + +### Simple Image Creation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function generateImage(prompt, outputPath) { + const zai = await ZAI.create(); + + const response = await zai.images.generations.create({ + prompt: prompt, + size: '1024x1024' + }); + + const imageBase64 = response.data[0].base64; + + // Save image + const buffer = Buffer.from(imageBase64, 'base64'); + fs.writeFileSync(outputPath, buffer); + + console.log(`Image saved to ${outputPath}`); + return outputPath; +} + +// Usage +await generateImage( + 'A cute cat playing in the garden', + './cat_image.png' +); +``` + +### Multiple Image Sizes + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +// Supported sizes +const SUPPORTED_SIZES = [ + '1024x1024', // Square + '768x1344', // Portrait + '864x1152', // Portrait + '1344x768', // Landscape + '1152x864', // Landscape + '1440x720', // Wide landscape + '720x1440' // Tall portrait +]; + +async function generateImageWithSize(prompt, size, outputPath) { + if (!SUPPORTED_SIZES.includes(size)) { + throw new Error(`Unsupported size: ${size}. 
Use one of: ${SUPPORTED_SIZES.join(', ')}`); + } + + const zai = await ZAI.create(); + + const response = await zai.images.generations.create({ + prompt: prompt, + size: size + }); + + const imageBase64 = response.data[0].base64; + const buffer = Buffer.from(imageBase64, 'base64'); + fs.writeFileSync(outputPath, buffer); + + return { + path: outputPath, + size: size, + fileSize: buffer.length + }; +} + +// Usage - Different sizes +await generateImageWithSize( + 'A beautiful landscape', + '1344x768', + './landscape.png' +); + +await generateImageWithSize( + 'A portrait of a person', + '768x1344', + './portrait.png' +); +``` + +## CLI Tool Usage + +The z-ai CLI tool provides a convenient way to generate images directly from the command line. + +### Basic CLI Usage + +```bash +# Generate image with full options +z-ai image --prompt "A beautiful landscape" --output "./image.png" + +# Short form +z-ai image -p "A cute cat" -o "./cat.png" + +# Specify size +z-ai image -p "A sunset" -o "./sunset.png" -s 1344x768 + +# Portrait orientation +z-ai image -p "A portrait" -o "./portrait.png" -s 768x1344 +``` + +### CLI Use Cases + +```bash +# Website hero image +z-ai image -p "Modern tech office with diverse team collaborating" -o "./hero.png" -s 1440x720 + +# Product image +z-ai image -p "Sleek smartphone on minimalist desk, professional product photography" -o "./product.png" -s 1024x1024 + +# Blog post illustration +z-ai image -p "Abstract visualization of data flowing through networks" -o "./blog_header.png" -s 1344x768 + +# Social media content +z-ai image -p "Vibrant illustration of community connection" -o "./social.png" -s 1024x1024 + +# Website favicon/logo +z-ai image -p "Simple geometric logo with blue gradient, minimal design" -o "./logo.png" -s 1024x1024 + +# Background pattern +z-ai image -p "Subtle geometric pattern, pastel colors, website background" -o "./bg_pattern.png" -s 1440x720 +``` + +## Advanced Use Cases + +### Batch Image Generation + +```javascript 
+import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +async function generateImageBatch(prompts, outputDir, size = '1024x1024') { + const zai = await ZAI.create(); + + // Ensure output directory exists + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } + + const results = []; + + for (let i = 0; i < prompts.length; i++) { + try { + const prompt = prompts[i]; + const filename = `image_${i + 1}.png`; + const outputPath = path.join(outputDir, filename); + + const response = await zai.images.generations.create({ + prompt: prompt, + size: size + }); + + const imageBase64 = response.data[0].base64; + const buffer = Buffer.from(imageBase64, 'base64'); + fs.writeFileSync(outputPath, buffer); + + results.push({ + success: true, + prompt: prompt, + path: outputPath, + size: buffer.length + }); + + console.log(`✓ Generated: ${filename}`); + } catch (error) { + results.push({ + success: false, + prompt: prompts[i], + error: error.message + }); + + console.error(`✗ Failed: ${prompts[i]} - ${error.message}`); + } + } + + return results; +} + +// Usage +const prompts = [ + 'A serene mountain landscape at sunset', + 'A futuristic city with flying cars', + 'An underwater coral reef teeming with life' +]; + +const results = await generateImageBatch(prompts, './generated-images'); +console.log(`Generated ${results.filter(r => r.success).length} images`); +``` + +### Image Generation Service + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; +import crypto from 'crypto'; + +class ImageGenerationService { + constructor(outputDir = './generated-images') { + this.outputDir = outputDir; + this.zai = null; + this.cache = new Map(); + } + + async initialize() { + this.zai = await ZAI.create(); + + if (!fs.existsSync(this.outputDir)) { + fs.mkdirSync(this.outputDir, { recursive: true }); + } + } + + generateCacheKey(prompt, size) { + return crypto + .createHash('md5') + 
.update(`${prompt}-${size}`) + .digest('hex'); + } + + async generate(prompt, options = {}) { + const { + size = '1024x1024', + useCache = true, + filename = null + } = options; + + // Check cache + const cacheKey = this.generateCacheKey(prompt, size); + + if (useCache && this.cache.has(cacheKey)) { + const cachedPath = this.cache.get(cacheKey); + if (fs.existsSync(cachedPath)) { + return { + path: cachedPath, + cached: true, + prompt: prompt, + size: size + }; + } + } + + // Generate new image + const response = await this.zai.images.generations.create({ + prompt: prompt, + size: size + }); + + const imageBase64 = response.data[0].base64; + const buffer = Buffer.from(imageBase64, 'base64'); + + // Determine output path + const outputFilename = filename || `${cacheKey}.png`; + const outputPath = path.join(this.outputDir, outputFilename); + + fs.writeFileSync(outputPath, buffer); + + // Cache result + if (useCache) { + this.cache.set(cacheKey, outputPath); + } + + return { + path: outputPath, + cached: false, + prompt: prompt, + size: size, + fileSize: buffer.length + }; + } + + clearCache() { + this.cache.clear(); + } + + getCacheSize() { + return this.cache.size; + } +} + +// Usage +const service = new ImageGenerationService(); +await service.initialize(); + +const result = await service.generate( + 'A modern office space', + { size: '1440x720' } +); + +console.log('Generated:', result.path); +``` + +### Website Asset Generator + +```bash +# Using CLI for quick website asset generation +z-ai image -p "Modern tech hero banner, blue gradient" -o "./assets/hero.png" -s 1440x720 +z-ai image -p "Team collaboration illustration" -o "./assets/team.png" -s 1344x768 +z-ai image -p "Simple geometric logo" -o "./assets/logo.png" -s 1024x1024 +``` + +## Best Practices + +### 1. 
Effective Prompt Engineering + +```javascript +function buildEffectivePrompt(subject, style, details = []) { + const components = [ + subject, + style, + ...details, + 'high quality', + 'detailed' + ]; + + return components.filter(Boolean).join(', '); +} + +// Usage +const prompt = buildEffectivePrompt( + 'mountain landscape', + 'oil painting style', + ['sunset lighting', 'dramatic clouds', 'reflection in lake'] +); + +// Result: "mountain landscape, oil painting style, sunset lighting, dramatic clouds, reflection in lake, high quality, detailed" +``` + +### 2. Size Selection Helper + +```javascript +function selectOptimalSize(purpose) { + const sizeMap = { + 'hero-banner': '1440x720', + 'blog-header': '1344x768', + 'social-square': '1024x1024', + 'portrait': '768x1344', + 'product': '1024x1024', + 'landscape': '1344x768', + 'mobile-banner': '720x1440', + 'thumbnail': '1024x1024' + }; + + return sizeMap[purpose] || '1024x1024'; +} + +// Usage (generateImageWithSize is defined under "Multiple Image Sizes") +const size = selectOptimalSize('hero-banner'); +await generateImageWithSize('website hero image', size, './hero.png'); +``` + +### 3.
Error Handling + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function safeGenerateImage(prompt, size, outputPath, retries = 3) { + let lastError; + + for (let attempt = 1; attempt <= retries; attempt++) { + try { + const zai = await ZAI.create(); + + const response = await zai.images.generations.create({ + prompt: prompt, + size: size + }); + + if (!response.data || !response.data[0] || !response.data[0].base64) { + throw new Error('Invalid response from image generation API'); + } + + const imageBase64 = response.data[0].base64; + const buffer = Buffer.from(imageBase64, 'base64'); + fs.writeFileSync(outputPath, buffer); + + return { + success: true, + path: outputPath, + attempts: attempt + }; + } catch (error) { + lastError = error; + console.error(`Attempt ${attempt} failed:`, error.message); + + if (attempt < retries) { + // Wait before retrying (exponential backoff: 1s, 2s, 4s, ...) + await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1))); + } + } + } + + return { + success: false, + error: lastError.message, + attempts: retries + }; +} +``` + +## Common Use Cases + +1. **Website Design**: Generate hero images, backgrounds, and visual assets +2. **Marketing Materials**: Create social media graphics and promotional images +3. **Product Visualization**: Generate product mockups and variations +4. **Content Creation**: Produce blog post illustrations and thumbnails +5. **Brand Assets**: Create logos, icons, and brand imagery +6. **UI/UX Design**: Generate interface elements and illustrations +7. **Game Development**: Create concept art and game assets +8.
**E-commerce**: Generate product images and lifestyle shots + +## Integration Examples + +### Express.js API Endpoint + +```javascript +import express from 'express'; +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +const app = express(); +app.use(express.json()); +app.use('/images', express.static('generated-images')); + +let zaiInstance; +const outputDir = './generated-images'; + +async function initZAI() { + zaiInstance = await ZAI.create(); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } +} + +app.post('/api/generate-image', async (req, res) => { + try { + const { prompt, size = '1024x1024' } = req.body; + + if (!prompt) { + return res.status(400).json({ error: 'Prompt is required' }); + } + + const response = await zaiInstance.images.generations.create({ + prompt: prompt, + size: size + }); + + const imageBase64 = response.data[0].base64; + const buffer = Buffer.from(imageBase64, 'base64'); + + const filename = `img_${Date.now()}.png`; + const filepath = path.join(outputDir, filename); + fs.writeFileSync(filepath, buffer); + + res.json({ + success: true, + imageUrl: `/images/${filename}`, + prompt: prompt, + size: size + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('Image generation API running on port 3000'); + }); +}); +``` + +## CLI Integration in Scripts + +### Shell Script Example + +```bash +#!/bin/bash + +# Generate website assets using CLI +echo "Generating website assets..." + +z-ai image -p "Modern tech hero banner, blue gradient" -o "./assets/hero.png" -s 1440x720 +z-ai image -p "Team collaboration illustration" -o "./assets/team.png" -s 1344x768 +z-ai image -p "Simple geometric logo" -o "./assets/logo.png" -s 1024x1024 + +echo "Assets generated successfully!" 
+``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only used in server-side code + +**Issue**: Invalid size parameter +- **Solution**: Use only supported sizes: 1024x1024, 768x1344, 864x1152, 1344x768, 1152x864, 1440x720, 720x1440 + +**Issue**: Generated image doesn't match prompt +- **Solution**: Make prompts more specific and descriptive. Include style, details, and quality terms + +**Issue**: CLI command not found +- **Solution**: Ensure z-ai CLI is properly installed and in PATH + +**Issue**: Image file is corrupted +- **Solution**: Verify base64 decoding and file writing are correct + +## Prompt Engineering Tips + +### Good Prompts +- ✓ "Professional product photography of wireless headphones, white background, studio lighting, high quality" +- ✓ "Mountain landscape at golden hour, oil painting style, dramatic clouds, detailed" +- ✓ "Modern minimalist logo for tech company, blue and white, geometric shapes" + +### Poor Prompts +- ✗ "headphones" +- ✗ "picture of mountains" +- ✗ "logo" + +### Prompt Components +1. **Subject**: What you want to see +2. **Style**: Art style, photography style, etc. +3. **Details**: Specific elements, colors, mood +4. 
**Quality**: "high quality", "detailed", "professional" + +## Supported Image Sizes + +- `1024x1024` - Square +- `768x1344` - Portrait +- `864x1152` - Portrait +- `1344x768` - Landscape +- `1152x864` - Landscape +- `1440x720` - Wide landscape +- `720x1440` - Tall portrait + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- The SDK is already installed - import as shown +- CLI tool is available for quick image generation +- Supported sizes are specific - use the provided list +- Base64 images need to be decoded before saving +- Consider caching for repeated prompts +- Implement retry logic for production applications +- Use descriptive prompts for better results diff --git a/skills/image-generation/scripts/image-generation.ts b/skills/image-generation/scripts/image-generation.ts new file mode 100755 index 0000000..7596f32 --- /dev/null +++ b/skills/image-generation/scripts/image-generation.ts @@ -0,0 +1,28 @@ +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; + +async function main(prompt: string, size: '1024x1024' | '768x1344' | '864x1152' | '1344x768' | '1152x864' | '1440x720' | '720x1440', outFile: string) { + try { + const zai = await ZAI.create(); + + const response = await zai.images.generations.create({ + prompt, + size + }); + + const base64 = response?.data?.[0]?.base64; + if (!base64) { + console.error('No image data returned by the API'); + console.log('Full response:', JSON.stringify(response, null, 2)); + return; + } + + const buffer = Buffer.from(base64, 'base64'); + fs.writeFileSync(outFile, buffer); + console.log(`Image saved to ${outFile}`); + } catch (err: any) { + console.error('Image generation failed:', err?.message || err); + } +} + +main('A cute kitten', '1024x1024', './output.png'); diff --git a/skills/pdf/LICENSE.txt b/skills/pdf/LICENSE.txt new file mode 100755 index 0000000..c55ab42 --- /dev/null +++ b/skills/pdf/LICENSE.txt @@ -0,0 +1,30 @@ +© 2025 Anthropic, PBC. All rights reserved. 
+ +LICENSE: Use of these materials (including all code, prompts, assets, files, +and other components of this Skill) is governed by your agreement with +Anthropic regarding use of Anthropic's services. If no separate agreement +exists, use is governed by Anthropic's Consumer Terms of Service or +Commercial Terms of Service, as applicable: +https://www.anthropic.com/legal/consumer-terms +https://www.anthropic.com/legal/commercial-terms +Your applicable agreement is referred to as the "Agreement." "Services" are +as defined in the Agreement. + +ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the +contrary, users may not: + +- Extract these materials from the Services or retain copies of these + materials outside the Services +- Reproduce or copy these materials, except for temporary copies created + automatically during authorized use of the Services +- Create derivative works based on these materials +- Distribute, sublicense, or transfer these materials to any third party +- Make, offer to sell, sell, or import any inventions embodied in these + materials +- Reverse engineer, decompile, or disassemble these materials + +The receipt, viewing, or possession of these materials does not convey or +imply any license or right beyond those expressly granted above. + +Anthropic retains all right, title, and interest in these materials, +including all copyrights, patents, and other intellectual property rights. diff --git a/skills/pdf/SKILL.md b/skills/pdf/SKILL.md new file mode 100755 index 0000000..471b2a2 --- /dev/null +++ b/skills/pdf/SKILL.md @@ -0,0 +1,1534 @@ +--- +name: pdf +description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When GLM needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale. +license: Proprietary. 
LICENSE.txt has complete terms +--- + +# PDF Processing Guide + +## Overview + +This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions. + +Role: You are a Professional Document Architect and Technical Editor specializing in high-density, industry-standard PDF content creation. If the content is not rich enough, use the web-search skill first. + +Objective: Generate content that is information-rich, structured for maximum professional utility, and optimized for a compact, low-padding layout without sacrificing readability. + +--- + + +## Core Constraints (Must Follow) + +### 1. Output Language +**The generated PDF must use the same language as the user's query.** +- Chinese query → Generate Chinese PDF content +- English query → Generate English PDF content +- Explicit language specification → Follow the user's choice + +### 2. Page Count Control +- Follow the user's page specifications strictly + +| User Input | Execution Rule | +|------------|----------------| +| Explicit count (e.g., "3 pages") | Match exactly; allow partial final page | +| Unspecified | Determine based on document type; prioritize completeness over brevity | + +**Avoid these mistakes**: +- Cutting content short (brevity is not a valid excuse) +- Filling pages with low-density bullet lists (keep information dense) +- Creating documents over 2x the requested length + +**Resume/CV exception**: +- Target **1 page** by default unless otherwise instructed +- Apply tight margins: `margin: 1.5cm` + +### 3.
Structure Compliance (Mandatory) +**User supplies outline**: +- **Strictly follow** the outline structure provided by the user +- Match section names from the outline (slight rewording OK; preserve hierarchy and sequence) +- Never add/remove sections on your own +- If the structure seems flawed, **confirm with the user** before changing it + +**No outline provided**: +- Deploy standard frameworks by document category: + - **Academic papers**: IMRaD format (Introduction-Methods-Results-Discussion) or Introduction-Literature Review-Methods-Results-Discussion-Conclusion + - **Business reports**: Top-down approach (Executive Summary → In-depth Analysis → Recommendations) + - **Technical guides**: Overview → Core Concepts → Implementation → Examples → FAQ + - **Academic assignments**: Match assignment rubric structure +- Ensure logical flow between sections without gaps + +### 4. Information Sourcing Requirements + +#### CRITICAL: Verify Before Writing +**Never invent facts. If unsure, SEARCH immediately.** + +Mandatory search triggers: you **MUST search FIRST** if the content includes ANY of the following: +- Quantitative data, metrics, percentages, rankings +- Legal/regulatory frameworks, policies, industry standards +- Scholarly findings, theoretical models, research methods +- Recent news, emerging trends +- **Any information you cannot verify with certainty** + +### 5. Character Safety Rule (Mandatory) + +**Golden Rule: Every character in the final PDF must come from the following sources:** +1. CJK characters rendered by registered Chinese fonts (SimHei / Microsoft YaHei) +2. Mathematical/relational operators (e.g., `+`, `−`, `×`, `÷`, `±`, `≤`, `√`, `∑`, `≅`, `∫`, `π`, `∠`, etc.) + +**FORBIDDEN Unicode escape sequences (DO NOT USE):** +1. Superscript and subscript digits (never write forms such as \u00b2, \u2082, etc.) +2. Math operators and special symbols (never write forms such as \u2245, \u0394, \u2212, \u00d7, etc.) +3. Emoji characters (never write forms such as \u2728, \u2705, etc.)
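As a supplement to writing the tags correctly (not a replacement for it), the forbidden forms listed above can also be flagged in a generator script's source before it runs. The sketch below is a hypothetical helper, not part of this skill: the names `find_forbidden` and `is_forbidden_char` are illustrative, and the code-point ranges used for "superscript/subscript digits" and "emoji" are assumptions that cover the common blocks, not an exhaustive list. It flags every literal `\uXXXX`/`\UXXXXXXXX` escape and any raw superscript, subscript, or emoji character, while leaving literal operators such as `×`, `∠`, or `≅` alone, since the rule allows those as literal characters.

```python
import re

# Escape sequences typed literally into source text, e.g. "\u00b2" or "\U0001F600".
ESCAPE_RE = re.compile(r"\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}")


def is_forbidden_char(ch: str) -> bool:
    """Assumed ranges for raw superscript/subscript digits and common emoji."""
    cp = ord(ch)
    return (
        cp in (0x00B2, 0x00B3, 0x00B9)   # Latin-1 superscript two, three, one
        or 0x2070 <= cp <= 0x209F        # Superscripts and Subscripts block
        or 0x2600 <= cp <= 0x27BF        # Misc Symbols and Dingbats (many emoji)
        or 0x1F300 <= cp <= 0x1FAFF      # main emoji blocks
    )


def find_forbidden(source: str) -> list[tuple[int, str]]:
    """Return (offset, text) pairs for escape sequences and forbidden literal chars."""
    hits = [(m.start(), m.group()) for m in ESCAPE_RE.finditer(source)]
    hits += [(i, ch) for i, ch in enumerate(source) if is_forbidden_char(ch)]
    return sorted(hits)


if __name__ == "__main__":
    sample = r"Paragraph('10\u00b2 m', style)  # should be 10<super>2</super>"
    for offset, text in find_forbidden(sample):
        print(f"offset {offset}: {text!r}")
```

Flagging every escape sequence (not only the listed categories) is a deliberately conservative choice: the rule prefers literal characters or ReportLab tags, so any `\u` escape in a generator script is worth a second look.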
+ +**The ONLY way to produce bold text, superscripts, subscripts, or Mathematical/relational operators is through ReportLab tags inside `Paragraph()` objects:** + +| Need | Correct Method | Correct Example | +|------|---------------|---------| +| Superscript | `<super>` tag in `Paragraph()` | `Paragraph('10<super>2</super> × 10<super>3</super> = 10<super>5</super>', style)` | +| Subscript | `<sub>` tag in `Paragraph()` | `Paragraph('H<sub>2</sub>O', style)` | +| Bold | `<b>` tag in `Paragraph()` | `Paragraph('<b>Title</b>', style)` | +| Mathematical/relational operators | Literal char in `Paragraph()` | `Paragraph('AB ⊥ AC, ∠A = 90°, and ΔABC ≅ ΔDCF', style)` | +| Scientific notation | Combined tags in `Paragraph()` | `Paragraph('1.2 × 10<super>8</super> kg/m<super>3</super>', style)` | + +```python +from reportlab.platypus import Paragraph +from reportlab.lib.styles import ParagraphStyle +from reportlab.lib.enums import TA_CENTER, TA_JUSTIFY + +# Assumes "Times New Roman" was registered as shown in Font Setup +body_style = ParagraphStyle( + name="ENBodyStyle", + fontName="Times New Roman", + fontSize=10.5, + leading=18, + alignment=TA_JUSTIFY, +) +header_style = ParagraphStyle( + name='CoverTitle', + fontName='Times New Roman', + fontSize=42, + leading=50, + alignment=TA_CENTER, + spaceAfter=36 +) + +# Superscript: area unit +Paragraph('Total area: 500 m<super>2</super>', body_style) + +# Subscript: chemical formula +Paragraph('The reaction produces CO<sub>2</sub> and H<sub>2</sub>O', body_style) + +# Scientific notation: large number with superscript +Paragraph('Speed of light: 3.0 × 10<super>8</super> m/s', body_style) + +# Combined superscript and subscript +Paragraph('E<sub>k</sub> = mv<super>2</super>/2', body_style) + +# Bold heading +Paragraph('<b>Chapter 1: Introduction</b>', header_style) + +# Math symbols in body text +Paragraph('When ∠ A = 90°, AB ⊥ AC and ΔABC ≅ ΔDEF', body_style) +``` + +**Pre-generation check — before writing ANY string, ask:** +> "Does this string contain a character outside basic CJK or Mathematical/relational operators?" +> If YES → it MUST be inside a `Paragraph()` with the appropriate tag.
+> If it is a superscript/subscript digit in raw Unicode escape sequence form → REPLACE it with a `<super>`/`<sub>` tag. + +**NEVER rely on post-generation scanning. Prevent at the point of writing.** + +## Font Setup (Guaranteed Success Method) + +### CRITICAL: Allowed Fonts Only +**You MUST ONLY use the following registered fonts. Using ANY other font (such as Arial, Helvetica, Courier, Georgia, etc.) is STRICTLY FORBIDDEN and will cause rendering failures.** + +| Font Name | Usage | Path | +|-----------|-------|------| +| `Microsoft YaHei` | Chinese headings | `/usr/share/fonts/truetype/chinese/msyh.ttf` | +| `SimHei` | Chinese body text | `/usr/share/fonts/truetype/chinese/SimHei.ttf` | +| `SarasaMonoSC` | Chinese code blocks | `/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf` | +| `Times New Roman` | English text, numbers, tables | `/usr/share/fonts/truetype/english/Times-New-Roman.ttf` | +| `Calibri` | English alternative | `/usr/share/fonts/truetype/english/calibri-regular.ttf` | +| `DejaVuSans` | Formulas, symbols, code | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` | + +**FORBIDDEN fonts (DO NOT USE):** +- ❌ Arial, Arial-Bold, Arial-Italic +- ❌ Helvetica, Helvetica-Bold, Helvetica-Oblique +- ❌ Courier, Courier-Bold +- ❌ Any font not listed in the table above + +**For bold text and superscript/subscript:** +- Must call `registerFontFamily()` after registering fonts +- Then use `<b>`, `<super>`, `<sub>` tags in Paragraph +- **CRITICAL**: These tags ONLY work inside `Paragraph()` objects, NOT in plain strings + +### Font Registration Template +```python +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.ttfonts import TTFont +from reportlab.pdfbase.pdfmetrics import registerFontFamily + +# Chinese fonts +pdfmetrics.registerFont(TTFont('Microsoft YaHei', '/usr/share/fonts/truetype/chinese/msyh.ttf')) +pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf')) +pdfmetrics.registerFont(TTFont("SarasaMonoSC",
'/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf')) + +# English fonts +pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) +pdfmetrics.registerFont(TTFont('Calibri', '/usr/share/fonts/truetype/english/calibri-regular.ttf')) + +# Symbol/Formula font +pdfmetrics.registerFont(TTFont("DejaVuSans", '/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf')) + +# CRITICAL: Register font families to enable <b>, <super>, <sub> tags (bold maps to the regular face here) +registerFontFamily('Microsoft YaHei', normal='Microsoft YaHei', bold='Microsoft YaHei') +registerFontFamily('SimHei', normal='SimHei', bold='SimHei') +registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman') +registerFontFamily('Calibri', normal='Calibri', bold='Calibri') +registerFontFamily('DejaVuSans', normal='DejaVuSans', bold='DejaVuSans') +``` + +### Font Configuration by Document Type + +**For Chinese PDFs:** +- Body text: `SimHei` or `Microsoft YaHei` +- Headings: `Microsoft YaHei` (MUST use for Chinese headings) +- Code blocks: `SarasaMonoSC` +- Formulas/symbols: `DejaVuSans` +- **In tables: ALL Chinese content and numbers MUST use `SimHei`** + +**For English PDFs:** +- Body text: `Times New Roman` +- Headings: `Times New Roman` (MUST use for English headings) +- Code blocks: `DejaVuSans` +- **In tables: ALL English content and numbers MUST use `Times New Roman`** + +**For Mixed Chinese-English PDFs (CRITICAL):** +- Chinese text and numbers: Use `SimHei` +- English text: Use `Times New Roman` +- **ALWAYS apply this rule when generating PDFs containing both Chinese and English text** +- **In tables: ALL Chinese content and numbers MUST use `SimHei`, ALL English content MUST use `Times New Roman`** +- **Mixed Chinese-English Text Font Handling**: When a single string contains **both Chinese and English characters (e.g., "My name is Lei Shen (沈磊)")**: MUST split the string by language and apply different fonts to each part using ReportLab's inline `<font>` tags
within `Paragraph` objects. English fonts (e.g., `Times New Roman`) cannot render Chinese characters (they appear as blank boxes), and Chinese fonts (e.g., `SimHei`) render English with poor spacing. Set `ParagraphStyle.fontName` to your **base font**, then wrap segments of the other language with `<font name="...">` inline tags. + +```python +from reportlab.lib.styles import ParagraphStyle +from reportlab.lib.enums import TA_JUSTIFY +from reportlab.platypus import Paragraph +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.ttfonts import TTFont + +pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf')) +pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) +story = [] # flowables later passed to doc.build() + +# Base font is English; wrap Chinese parts: +enbody_style = ParagraphStyle( + name="ENBodyStyle", + fontName="Times New Roman", # Base font for English + fontSize=10.5, + leading=18, + alignment=TA_JUSTIFY, +) +# Wrap Chinese segments with <font name="SimHei"> tags +story.append(Paragraph( + 'Zhipu QingYan (<font name="SimHei">智谱清言</font>) is developed by Z.ai. ' + 'My name is Lei Shen (<font name="SimHei">沈磊</font>). ' + '<font name="SimHei">文心一言</font> (ERNIE Bot) is by Baidu.', + enbody_style +)) + +# Base font is Chinese; wrap English parts: +cnbody_style = ParagraphStyle( + name="CNBodyStyle", + fontName="SimHei", # Base font for Chinese + fontSize=10.5, + leading=18, + alignment=TA_JUSTIFY, +) +# Wrap English segments with <font name="Times New Roman"> tags +story.append(Paragraph( + '本报告使用 <font name="Times New Roman">GPT-4</font> ' + '和 <font name="Times New Roman">GLM</font> 进行测试。', + cnbody_style +)) +``` + +### Chinese Plot PNG Method +If using Python to generate PNGs containing Chinese characters: +```python +import matplotlib.pyplot as plt +plt.rcParams['font.sans-serif'] = ['SimHei'] +plt.rcParams['axes.unicode_minus'] = False +``` + +### Available Font Paths +Run `fc-list` to get more fonts. Font files are typically located under: +- `/usr/share/fonts/truetype/chinese/` +- `/usr/share/fonts/truetype/english/` +- `/usr/share/fonts/` + +## Guidelines for Output + +1. **Information Density**: Prioritize depth and conciseness.
Avoid fluff or excessive introductory filler. Use professional, precise terminology. + +2. **Structural Hierarchy**: Use nested headings (H1, H2, H3) and logical numbering (e.g., 1.1, 1.1.1) to organize complex data. + +3. **Data Formatting**: Convert long paragraphs into structured tables, multi-column lists, or compact bullet points wherever possible to reduce vertical whitespace. + +4. **Visual Rhythm**: Use horizontal rules (---) to separate major sections. Ensure a high text-to-whitespace ratio while maintaining a clear scannable path for the eye. + +5. **Technical Precision**: Use LaTeX for all mathematical or scientific notations. Ensure all tables are formatted with clear headers. + +6. **Tone**: Academic, corporate, and authoritative. Adapt to the specific professional field (e.g., Legal, Engineering, Financial) as requested. + +7. **Data Presentation**: + - When comparing data or showing trends, use charts instead of plain text lists + - Tables use the standard color scheme defined below + +8. 
**Links & References**: + - URLs must be clickable hyperlinks + - When a document has multiple figures/tables, number them and add cross-references ("see Figure 1", "as shown in Table 2") + - For academic, legal, or data-analysis citations, implement in-text references that jump to the corresponding footnotes/endnotes when clicked + +## Layout & Spacing Control + +### Page Breaks +- NEVER insert page breaks between sections (H1, H2, H3) or within chapters +- Let content flow naturally; avoid forcing new pages +- **Specific allowed locations**: + * Between the cover page and table of contents (if TOC exists) + * Between the cover page and main content (if NO TOC exists) + * Between the table of contents and main content (if TOC exists) + * Between the main content and back cover page (if back cover page exists) + +### Vertical Spacing Standards +* **Before tables**: `Spacer(1, 18)` after preceding text content (symmetric with table+caption block bottom spacing) +* After tables: `Spacer(1, 6)` before table caption +* After table captions: `Spacer(1, 18)` before next content (larger gap for table+caption blocks) +* Between paragraphs: `Spacer(1, 12)` (approximately 1 line) +* Between H3 subsections: `Spacer(1, 12)` +* Between H2 sections: `Spacer(1, 18)` (approximately 1.5 lines) +* Between H1 sections: `Spacer(1, 24)` (approximately 2 lines) +* NEVER use `Spacer(1, X)` where X > 24, except for intentional H1 major section breaks or cover page elements + +### Cover Page Specifications +When creating PDFs with cover pages, use the following enlarged specifications: + +**Title Formatting:** +- Main title font size: `36-48pt` (vs normal heading 18-20pt) +- Subtitle font size: `18-24pt` +- Author/date font size: `14-16pt` +- ALL titles MUST be bold: Use `<b>` tags in Paragraph (requires `registerFontFamily()` call first) + +**Cover Page Spacing:** +- Top margin to title: `Spacer(1, 120)` or more (push title to upper-middle area) +- After main title: `Spacer(1, 36)` before subtitle +- After subtitle:
`Spacer(1, 48)` before author/institution info +- Between author lines: `Spacer(1, 18)` +- After author block: `Spacer(1, 60)` before date +- Use `PageBreak()` after cover page content + +**Alignment:** +- All text and images on the cover page must use `TA_CENTER` + +**Cover Page Style Example:** +```python +from reportlab.lib.enums import TA_CENTER +from reportlab.lib.styles import ParagraphStyle +from reportlab.platypus import PageBreak, Paragraph, Spacer + +# Cover page styles +cover_title_style = ParagraphStyle( + name='CoverTitle', + fontName='Microsoft YaHei', # or 'Times New Roman' for English + fontSize=42, + leading=50, + alignment=TA_CENTER, + spaceAfter=36 +) + +cover_subtitle_style = ParagraphStyle( + name='CoverSubtitle', + fontName='SimHei', # or 'Times New Roman' for English + fontSize=20, + leading=28, + alignment=TA_CENTER, + spaceAfter=48 +) + +cover_author_style = ParagraphStyle( + name='CoverAuthor', + fontName='SimHei', # or 'Times New Roman' for English + fontSize=14, + leading=22, + alignment=TA_CENTER, + spaceAfter=18 +) + +# Cover page construction +story.append(Spacer(1, 120)) # Push down from top +story.append(Paragraph("<b>报告主标题</b>", cover_title_style)) # "Main report title" +story.append(Spacer(1, 36)) +story.append(Paragraph("<b>副标题或说明文字</b>", cover_subtitle_style)) # "Subtitle or description" +story.append(Spacer(1, 48)) +story.append(Paragraph("作者姓名", cover_author_style)) # "Author name" +story.append(Paragraph("所属机构", cover_author_style)) # "Affiliation" +story.append(Spacer(1, 60)) +story.append(Paragraph("2025年2月", cover_author_style)) # "February 2025" +story.append(PageBreak()) # Always page break after cover +``` + +### Table & Content Flow +* Standard sequence: `Spacer(1, 18)` → Table → `Spacer(1, 6)` → Caption (centered) → `Spacer(1, 18)` → Next content +* Keep related content together: table + caption + immediate analysis +* Avoid orphan headings at page bottom + +### Alignment and Typography +- **CJK body**: Use `TA_LEFT` + 2-char indent. Headings: no indent.
+- **Font sizes**: Body 11pt, subheadings 14pt, headings 18-20pt +- **Line height**: 1.5-1.6x the font size (never set `leading` below 1.2x `fontSize`) +- **CRITICAL: Alignment Selection Rule**: + - Use `TA_JUSTIFY` only when **ALL** of the following conditions are met: + * Language: The text is predominantly English (≥ 90%) + * Column width: Sufficiently wide (A4 single-column body text) + * Font: Western fonts (e.g. Times New Roman / Calibri) + * Chinese content: None or negligible + - Otherwise, always default to `TA_LEFT` + - **Note**: CJK text with `TA_JUSTIFY` can cause orphaned punctuation (commas, periods) at line start + - For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle to apply proper CJK line-breaking rules + +### Style Configuration +* Normal paragraph: `spaceBefore=0`, `spaceAfter=6-12` +* Headings: `spaceBefore=12-18`, `spaceAfter=6-12` +* **Headings must be bold**: Use `<b>` tags in Paragraph (requires `registerFontFamily()` call first) +* Table captions: `spaceBefore=3`, `spaceAfter=6`, `alignment=TA_CENTER` +* **CRITICAL**: For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle + - Prevents closing punctuation from appearing at line start + - Prevents opening brackets from appearing at line end + - Ensures proper Chinese typography rules + +### Table Formatting + +#### Standard Table Color Scheme (MUST USE for ALL tables) +```python +from reportlab.lib import colors + +# Define standard colors for consistent table styling +TABLE_HEADER_COLOR = colors.HexColor('#1F4E79') # Dark blue for header +TABLE_HEADER_TEXT = colors.white # White text for header +TABLE_ROW_EVEN = colors.white # White for even rows +TABLE_ROW_ODD = colors.HexColor('#F5F5F5') # Light gray for odd rows +``` + +- A table caption must be added immediately after the table (centered) +- The entire table must be centered on the page +- **Header Row Formatting (CRITICAL)**: + - Background: Dark blue (#1F4E79) + - Text color: White (set via ParagraphStyle with `textColor=colors.white`) + - Font weight:
**Bold** (use `<b>` tags in Paragraph after calling `registerFontFamily()`) + - **IMPORTANT**: Bold tags ONLY work inside `Paragraph()` objects. Plain strings like `'<b>Text</b>'` will NOT render bold. +- **Cell Formatting (Inside the Table)**: + - Left/Right Cell Padding: at least 6-10 pt (120-200 twips, roughly the width of one character), set via `LEFTPADDING`/`RIGHTPADDING` + - Text Alignment: Body cells within the same table must use a consistent alignment. + - **Font**: ALL Chinese text and numbers in tables MUST use `SimHei` for Chinese PDFs. + ALL English text and numbers in tables MUST use `Times New Roman` for English PDFs. + ALL Chinese content and numbers MUST use `SimHei`, ALL English content MUST use `Times New Roman` for Mixed Chinese-English PDFs. +- **Units with Exponents (CRITICAL)**: + - PROHIBITED: `W/m2`, `kg/m3`, `m/s2` (plain text exponents) + - RIGHT: `Paragraph('W/m<super>2</super>', style)`, `Paragraph('kg/m<super>3</super>', style)` (proper superscript in Paragraph) + - Always use `<super>` tags inside Paragraph objects for unit exponents in table cells +- **Numeric Values in Tables (CRITICAL)**: + - Large numbers MUST use scientific notation: `Paragraph('-1.246 × 10<super>8</super>', style)` not `-124600000` + - Small decimals MUST use scientific notation: `Paragraph('2.5 × 10<super>-4</super>', style)` not `0.00025` + - Threshold: Use scientific notation when |value| ≥ 10000 or |value| ≤ 0.001 + - Format: `Paragraph('coefficient × 10<super>exponent</super>', style)` (e.g., `Paragraph('-1.246 × 10<super>8</super>', style)`) + +#### Table Cell Paragraph Wrapping (MANDATORY - REVIEW BEFORE EVERY TABLE) + +**STOP AND CHECK**: Before creating ANY table, verify that ALL text cells use `Paragraph()`.
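Two small pre-flight helpers can make the numeric rule above and the STOP AND CHECK step mechanical. This is a minimal sketch, not part of ReportLab. The names `sci_markup` and `find_unwrapped_cells` are hypothetical. `sci_markup` assumes the |value| ≥ 10000 / |value| ≤ 0.001 threshold stated above and rounds the coefficient to three decimals. The cell check only inspects Python types, so neither function needs ReportLab installed.

```python
from typing import Any, List, Sequence, Tuple


def sci_markup(value: float, digits: int = 3) -> str:
    """Return Paragraph markup for a table number.

    Uses 'coefficient × 10<super>exponent</super>' when |value| >= 10000
    or 0 < |value| <= 0.001; otherwise returns the number as plain text.
    """
    if value == 0 or 0.001 < abs(value) < 10000:
        return str(value)
    mantissa, exponent = f"{value:.{digits}e}".split("e")
    # Drop trailing zeros so "1.010e+05" becomes "1.01", not "1.010"
    mantissa = mantissa.rstrip("0").rstrip(".")
    return f"{mantissa} × 10<super>{int(exponent)}</super>"


def find_unwrapped_cells(rows: Sequence[Sequence[Any]]) -> List[Tuple[int, int]]:
    """Return (row, col) for every cell that is a bare str/int/float.

    Paragraph and Image objects (or any other flowable) pass the check.
    """
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if isinstance(cell, (str, int, float))
    ]


if __name__ == "__main__":
    print(sci_markup(-124600000))  # -1.246 × 10<super>8</super>
    print(sci_markup(0.00025))     # 2.5 × 10<super>-4</super>
    print(sci_markup(25.5))        # 25.5
    print(find_unwrapped_cells([["Parameter", object()], [object(), 1.225]]))  # [(0, 0), (1, 1)]
```

The examples that follow show where these helpers fit:

- `Paragraph(sci_markup(101000), cell_style)` yields the `1.01 × 10<super>5</super>` cell.
- An `assert not find_unwrapped_cells(data)` placed just before `Table(data, ...)` fails fast on a forgotten wrapper.
- Image cells pass the check because they are flowables, not strings.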
+ +```python +from reportlab.lib.enums import TA_CENTER +from reportlab.lib.styles import ParagraphStyle +from reportlab.lib.units import cm +from reportlab.platypus import Paragraph, Table + +# 1) Key point for Chinese text: wordWrap="CJK" (assumes SimHei is already registered) +tbl_center = ParagraphStyle( + "tbl_center", + fontName="SimHei", + fontSize=9, + leading=12, + alignment=TA_CENTER, + wordWrap="CJK", +) + +# Illustrative (a, b, c) rows; replace with your own data +findings = [("1", "Sample A", "Description of the finding")] + +# 2) ALL content MUST be wrapped in Paragraph - NO EXCEPTIONS for text +findings_data = [] +for a, b, c in findings: + findings_data.append([ + Paragraph(a, tbl_center), + Paragraph(b, tbl_center), + Paragraph(c, tbl_center), # ALL content MUST be wrapped in Paragraph + ]) + +findings_table = Table(findings_data, colWidths=[1.8*cm, 3*cm, 9*cm]) +``` + +**Complete Table Example:** +```python +from reportlab.platypus import Table, TableStyle, Paragraph, Image +from reportlab.lib.styles import ParagraphStyle +from reportlab.lib import colors +from reportlab.lib.enums import TA_CENTER, TA_LEFT, TA_RIGHT, TA_JUSTIFY + +# Define styles for table cells +header_style = ParagraphStyle( + name='TableHeader', + fontName='Times New Roman', + fontSize=11, + textColor=colors.white, + alignment=TA_CENTER +) + +cell_style = ParagraphStyle( + name='TableCell', + fontName='Times New Roman', + fontSize=10, + textColor=colors.black, + alignment=TA_CENTER +) + +cell_style_jus = ParagraphStyle( + name='TableCellJustify', + fontName='Times New Roman', + fontSize=10, + textColor=colors.black, + alignment=TA_JUSTIFY +) + +cell_style_right = ParagraphStyle( + name='TableCellRight', + fontName='Times New Roman', + fontSize=10, + textColor=colors.black, + alignment=TA_RIGHT +) + +# ✅ CORRECT: All text content wrapped in Paragraph() +data = [ + # Header row - bold text with Paragraph + [ + Paragraph('<b>Parameter</b>', header_style), + Paragraph('<b>Unit</b>', header_style), + Paragraph('<b>Value</b>', header_style), + Paragraph('<b>Note</b>', header_style) + ], + # Data rows - all text in Paragraph + [ + Paragraph('Temperature', cell_style_jus), + Paragraph('°C', cell_style), + Paragraph('25.5', cell_style_jus), + Paragraph('Ambient', cell_style) + ], + [ + Paragraph('Pressure', cell_style_jus), + Paragraph('Pa', cell_style), +
Paragraph('1.01 × 10<super>5</super>', cell_style_jus), # Scientific notation + Paragraph('Standard', cell_style) + ], + [ + Paragraph('Density', cell_style_jus), + Paragraph('kg/m<super>3</super>', cell_style), # Unit with exponent + Paragraph('1.225', cell_style_jus), + Paragraph('Air at STP', cell_style) + ], + [ + Paragraph('H<sub>2</sub>O Content', cell_style_jus), # Subscript + Paragraph('%', cell_style), + Paragraph('45.2', cell_style_jus), + Paragraph('Relative humidity', cell_style) + ] +] + +# ❌ PROHIBITED: Plain strings - NEVER DO THIS +# data = [ +# ['<b>Parameter</b>', 'Unit', 'Value'], # Bold won't work! +# ['Pressure', 'Pa', '1.01 × 10<super>5</super>'], # Superscript won't work! +# ] + +# Create table +table = Table(data, colWidths=[120, 80, 100, 120]) +table.setStyle(TableStyle([ + # Header styling + ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#1F4E79')), + ('TEXTCOLOR', (0, 0), (-1, 0), colors.white), + # Alternating row colors + ('BACKGROUND', (0, 1), (-1, 1), colors.white), + ('BACKGROUND', (0, 2), (-1, 2), colors.HexColor('#F5F5F5')), + ('BACKGROUND', (0, 3), (-1, 3), colors.white), + ('BACKGROUND', (0, 4), (-1, 4), colors.HexColor('#F5F5F5')), + # Grid and alignment + ('GRID', (0, 0), (-1, -1), 0.5, colors.grey), + ('VALIGN', (0, 0), (-1, -1), 'MIDDLE'), + ('LEFTPADDING', (0, 0), (-1, -1), 8), + ('RIGHTPADDING', (0, 0), (-1, -1), 8), + ('TOPPADDING', (0, 0), (-1, -1), 6), + ('BOTTOMPADDING', (0, 0), (-1, -1), 6), +])) + +# Example with image (Image is the ONLY exception - no Paragraph needed) +# data_with_image = [ +# [Paragraph('Item', header_style), Paragraph('Image', header_style)], +# [Paragraph('Logo', cell_style), Image('logo.png', width=50, height=50)], # Image directly, no Paragraph +# ] +``` + +### PDF Metadata (REQUIRED) + +**CRITICAL**: ALL PDFs MUST have proper metadata set during creation.
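The required fields listed below can also be verified after the file is written, for example as a last check once `scripts/add_zai_metadata.py` has run. This is a minimal sketch. It assumes the metadata arrives as a mapping with PDF-style keys such as `/Title`. That is the shape pypdf's `PdfReader(path).metadata` uses, which is `None` when the file has no info dictionary. `metadata_problems` is a hypothetical helper that checks only the rules stated in this section.

```python
import os
from typing import List, Mapping, Optional

# Fixed values required by the metadata rules
REQUIRED_FIXED = {"/Author": "Z.ai", "/Creator": "Z.ai"}


def _value(meta: Mapping, key: str) -> Optional[str]:
    """Read a metadata entry as plain text, or None if it is absent."""
    return str(meta[key]) if key in meta else None


def metadata_problems(pdf_path: str, meta: Optional[Mapping]) -> List[str]:
    """List violations of the metadata rules; an empty list means compliant."""
    meta = meta if meta is not None else {}
    problems = []

    # Title MUST equal the filename without the .pdf extension
    expected_title = os.path.splitext(os.path.basename(pdf_path))[0]
    title = _value(meta, "/Title")
    if title != expected_title:
        problems.append(f"/Title is {title!r}, expected {expected_title!r}")

    # Author and Creator MUST be "Z.ai"
    for key, expected in REQUIRED_FIXED.items():
        actual = _value(meta, key)
        if actual != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")

    # Subject SHOULD be present: reported, but only a recommendation
    if not _value(meta, "/Subject"):
        problems.append("/Subject is missing (recommended)")
    return problems


if __name__ == "__main__":
    sample = {"/Title": "financial_report_2024", "/Author": "Z.ai", "/Creator": "Z.ai"}
    print(metadata_problems("financial_report_2024.pdf", sample))
    # ['/Subject is missing (recommended)']
```

With pypdf installed, `metadata_problems(path, PdfReader(path).metadata)` should return an empty list for a compliant file. Any other result names the field to fix.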
+ +#### Required Metadata Fields +- **Title**: MUST match the filename (without .pdf extension) +- **Author**: MUST be set to "Z.ai" +- **Creator**: MUST be set to "Z.ai" +- **Subject**: SHOULD describe the document purpose/content +- **Description/Keywords**: SHOULD include relevant topics for searchability + +#### For reportlab (Creating New PDFs) +```python +from reportlab.lib.pagesizes import letter +from reportlab.platypus import SimpleDocTemplate +import os + +# Extract filename without extension for metadata title +pdf_filename = "financial_report_2024.pdf" +title_for_metadata = os.path.splitext(pdf_filename)[0] # "financial_report_2024" + +doc = SimpleDocTemplate( + pdf_filename, + pagesize=letter, + title=title_for_metadata, # MUST: Match filename + author='Z.ai', # MUST: Set to "Z.ai" + creator='Z.ai', # MUST: Set to "Z.ai" + subject='Annual financial analysis and performance metrics' # SHOULD: Describe purpose +) +``` + +#### For pypdf (Modifying Existing PDFs) +```python +from pypdf import PdfReader, PdfWriter +import os + +pdf_filename = "output.pdf" +title_for_metadata = os.path.splitext(os.path.basename(pdf_filename))[0] + +reader = PdfReader("input.pdf") +writer = PdfWriter() + +for page in reader.pages: + writer.add_page(page) + +# Set metadata: Title, Author, Creator are REQUIRED +writer.add_metadata({ + '/Title': title_for_metadata, # MUST: Match filename + '/Author': 'Z.ai', # MUST: Set to "Z.ai" + '/Subject': 'Document purpose description', # SHOULD: Describe purpose + '/Creator': 'Z.ai' # MUST: Set to "Z.ai" +}) + +with open(pdf_filename, "wb") as output: + writer.write(output) +``` + +## Quick Start + +⚠️ ABSOLUTE MANDATORY RULE - Z.ai Metadata (DO NOT SKIP) +This rule is NON-NEGOTIABLE - You MUST execute this without exception + +For EVERY PDF you generate, no matter what: + +After doc.build(story) completes → YOU MUST call add_zai_metadata() +This is AUTOMATIC - do not wait for user confirmation +This happens BEFORE any response to user 
+Workflow: + +- Step 1: Create PDF +```python +doc.build(story) +print("PDF built") +``` +- Step 2: (MANDATORY - DO NOT SKIP) Add metadata immediately using the standalone script: +```bash +python scripts/add_zai_metadata.py output.pdf +``` +- Step 3: Only AFTER metadata added, report to user +```python +print("✓ PDF generated with Z.ai metadata") +``` +If you forget this: + +User will notice and ask "Why no metadata?" +This harms Z.ai branding +This rule is CRITICAL and must be followed 100% of the time + +```python +from pypdf import PdfReader, PdfWriter + +# Read a PDF +reader = PdfReader("document.pdf") +print(f"Pages: {len(reader.pages)}") + +# Extract text +text = "" +for page in reader.pages: + text += page.extract_text() +``` + +## Python Libraries + +### pypdf - Basic Operations + +#### Merge PDFs +```python +from pypdf import PdfWriter, PdfReader + +writer = PdfWriter() +for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]: + reader = PdfReader(pdf_file) + for page in reader.pages: + writer.add_page(page) + +with open("merged.pdf", "wb") as output: + writer.write(output) +``` + +#### Split PDF +```python +reader = PdfReader("input.pdf") +for i, page in enumerate(reader.pages): + writer = PdfWriter() + writer.add_page(page) + with open(f"page_{i+1}.pdf", "wb") as output: + writer.write(output) +``` + +#### Extract Metadata +```python +reader = PdfReader("document.pdf") +meta = reader.metadata +print(f"Title: {meta.title}") +print(f"Author: {meta.author}") +print(f"Subject: {meta.subject}") +print(f"Creator: {meta.creator}") +``` + +#### Set/Update Metadata (Z.ai Branding) + +Use the standalone script to add Z.ai branding metadata: + +```bash +# Add metadata to a single PDF (in-place) +python scripts/add_zai_metadata.py document.pdf + +# Add metadata with custom title +python scripts/add_zai_metadata.py report.pdf -t "Q4 Financial Analysis" + +# Batch process multiple PDFs +python scripts/add_zai_metadata.py *.pdf +``` + +#### Rotate Pages +```python 
+from pypdf import PdfReader, PdfWriter + +reader = PdfReader("input.pdf") +writer = PdfWriter() + +page = reader.pages[0] +page.rotate(90) # Rotate 90 degrees clockwise +writer.add_page(page) + +with open("rotated.pdf", "wb") as output: + writer.write(output) +``` + +### pdfplumber - Text and Table Extraction + +#### Extract Text with Layout +```python +import pdfplumber + +with pdfplumber.open("document.pdf") as pdf: + for page in pdf.pages: + text = page.extract_text(layout=True) # layout=True approximates the page's visual layout + print(text) +``` + +#### Extract Tables +```python +import pdfplumber + +with pdfplumber.open("document.pdf") as pdf: + for i, page in enumerate(pdf.pages): + tables = page.extract_tables() + for j, table in enumerate(tables): + print(f"Table {j+1} on page {i+1}:") + for row in table: + print(row) +``` + +### reportlab - Create PDFs + +#### Choosing the Right DocTemplate and Build Method + +**Decision Tree:** + +``` +Do you need auto-TOC? +├─ YES → Use TocDocTemplate + doc.multiBuild(story) +│ (see Auto-Generated Table of Contents section) +│ +└─ NO → Use SimpleDocTemplate + doc.build(story) + (basic documents, or with optional Cross-References) +``` + +**When to use each approach:** + +| Requirement | DocTemplate | Build Method | +|-------------|-------------|--------------| +| Multi-page with TOC | `TocDocTemplate` | `multiBuild()` | +| Single-page or no TOC | `SimpleDocTemplate` | `build()` | +| With Cross-References (no TOC) | `SimpleDocTemplate` | `build()` | +| Both TOC + Cross-References | `TocDocTemplate` | `multiBuild()` | + +**⚠️ CRITICAL**: +- `multiBuild()` is ONLY needed when using `TableOfContents` +- Using `build()` with `TocDocTemplate` leaves the TOC unfilled (a single pass cannot know the final page numbers) +- Using `multiBuild()` without `TocDocTemplate` adds unnecessary overhead + +### Rich Text Formatting: Bold, Superscript, Subscript, and Special Characters + +#### Prerequisites +To use `<b>`, `<super>`, `<sub>` tags, you **must**: +1. Register your fonts via `registerFont()` +2. Call `registerFontFamily()` to link normal/bold/italic variants +3.
Wrap all tagged text in `Paragraph()` objects +**CRITICAL**: These tags ONLY work inside `Paragraph()` objects. Plain strings like `'<b>Text</b>'` will NOT render correctly. + +#### Character Handling (see Core Constraint #5) + +All rules for superscripts, subscripts, and mathematical/relational operators are defined in **Core Constraint #5 — Character Safety Rule**. + +**Quick reminder when writing Rich Text**: +- `<b>`, `<super>`, `<sub>` tags ONLY work inside `Paragraph()` objects +- Must call `registerFontFamily()` first to enable these tags +- Plain strings like `'<b>Text</b>'` will NOT render; always use `Paragraph()` +- For scientific notation: `Paragraph('coefficient × 10<super>exponent</super>', style)` +- For chemical formulas: `Paragraph('H<sub>2</sub>O', style)` + +Do NOT use Unicode escape sequences (e.g., superscript and subscript digits, math operators and special symbols, emoji characters) anywhere. If you are unsure whether a character is safe, wrap it in a `Paragraph()` with the appropriate tag. + + +#### Complete Python Example +```python +from reportlab.lib import colors +from reportlab.lib.enums import TA_JUSTIFY +from reportlab.lib.styles import ParagraphStyle +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.pdfmetrics import registerFontFamily +from reportlab.pdfbase.ttfonts import TTFont +from reportlab.platypus import Paragraph + +# --- Register fonts and font family --- +pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) + +# CRITICAL: Must call registerFontFamily() to enable <b> (and <i>) tags +registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman') + +# --- Define styles --- +body_style = ParagraphStyle( + name='BodyStyle', + fontName='Times New Roman', + fontSize=10, + textColor=colors.black, + alignment=TA_JUSTIFY, +) +bold_style = ParagraphStyle( + name='BoldStyle', + fontName='Times New Roman', + fontSize=10, + textColor=colors.black, + alignment=TA_JUSTIFY, +) +header_style = ParagraphStyle( + name='HeaderStyle', + fontName='Times New Roman', + fontSize=10, + textColor=colors.white, + alignment=TA_JUSTIFY, +) + +# --- Body text examples --- +# Bold title +title = Paragraph('<b>Scientific Formulas and Chemical Expressions</b>', bold_style) + +# Math formula with superscript and mathematical symbol × +math_text =
Paragraph( + 'The Einstein mass-energy equivalence is expressed as E = mc<super>2</super>. ' + 'In applied physics, the gravitational force is F = 6.674 × 10<super>-11</super> × ' + 'm<sub>1</sub>m<sub>2</sub>/r<super>2</super>, ' + 'and the Pythagorean theorem states a<super>2</super> + b<super>2</super> = c<super>2</super>.', + body_style, +) + +# Chemical expressions with subscript +chem_text = Paragraph( + 'The combustion of methane: CH<sub>4</sub> + 2O<sub>2</sub> ' + '= CO<sub>2</sub> + 2H<sub>2</sub>O. ' + 'Sulfuric acid (H<sub>2</sub>SO<sub>4</sub>) reacts with sodium hydroxide to produce ' + 'Na<sub>2</sub>SO<sub>4</sub> and water.', + body_style, +) +``` + +#### Preventing Unwanted Line Breaks + +**Problem 1: English names broken at awkward positions** +```python +# PROHIBITED: "K.G. Palepu" may break after "K.G." +text = Paragraph("Professor K.G. Palepu proposed...", style) + +# RIGHT: Use non-breaking space (U+00A0) to prevent breaking +text = Paragraph("Professor K.G.\u00A0Palepu proposed...", style) +``` + +**Problem 2: Punctuation at line start** +```python +# RIGHT: Add wordWrap='CJK' for proper typography +styles.add(ParagraphStyle( + name='BodyStyle', + fontName='SimHei', + fontSize=10.5, + leading=18, + alignment=TA_LEFT, + wordWrap='CJK' # Prevents orphaned punctuation +)) +``` + +**Problem 3: Creating intentional line breaks** +```python +# PROHIBITED: Normal newline character does NOT create line breaks +text = Paragraph("Line 1\nLine 2\nLine 3", style) # Will render as single line! + +# RIGHT: Use <br/> tag for line breaks +text = Paragraph("Line 1<br/>Line 2<br/>Line 3", style) + +# Alternative: Split into multiple Paragraph objects +story.append(Paragraph("Line 1", style)) +story.append(Paragraph("Line 2", style)) +story.append(Paragraph("Line 3", style)) +``` + +#### Basic PDF Creation +```python +from reportlab.lib.pagesizes import letter +from reportlab.pdfgen import canvas + +c = canvas.Canvas("hello.pdf", pagesize=letter) +width, height = letter + +# Add text +c.drawString(100, height - 100, "Hello World!") +c.drawString(100, height - 120, "This is a PDF created with reportlab") + +# Add a line +c.line(100, height - 140, 400, height - 140) + +# Save +c.save() +``` + +#### Auto-Generated Table of Contents + +##### ⚠️ CRITICAL WARNINGS + +###### ❌ FORBIDDEN: Manual Table of Contents + +**NEVER manually create TOC like this:** +```python +# ❌ PROHIBITED - DO NOT USE +toc_entries = [("1. Title", "5"), ("2. Section", "10")] +for entry, page in toc_entries: + story.append(Paragraph(f"{entry} {'.'*50} {page}", style)) +``` + +**Why it is PROHIBITED:** +- Hardcoded page numbers become incorrect when content changes +- No clickable hyperlinks +- Manual leader dots are fragile +- Must be manually updated with every document change + +**✅ ALWAYS use auto-generated TOC:** + +**Key Implementation Requirements:** +- **Custom `TocDocTemplate` class**: Override `afterFlowable()` to capture TOC entries +- **Bookmark attributes**: Set `bookmark_name`, `bookmark_level`, `bookmark_text` on each heading +- **Use `doc.multiBuild(story)`**: NOT `doc.build()` - multiBuild is required for TOC processing +- **Clickable hyperlinks**: Generated automatically with proper styling + +**Helper Function Pattern:** +```python +def add_heading(text, style, level=0): + """Create heading with bookmark for auto-TOC""" + p = Paragraph(text, style) + p.bookmark_name = text + p.bookmark_level = level + p.bookmark_text = text + return p + +# Usage: +story.append(add_heading("1.
 Introduction", styles['Heading1'], 0)) +story.append(Paragraph('Content...', styles['Normal'])) +``` + +#### Complete TOC Implementation Example + +Copy and adapt this complete working code for your PDF with Table of Contents: + +```python +from reportlab.lib.pagesizes import letter +from reportlab.platypus import SimpleDocTemplate, Paragraph, PageBreak, Spacer +from reportlab.platypus.tableofcontents import TableOfContents +from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle +from reportlab.lib.units import inch +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.ttfonts import TTFont + +# Register the font used by the TOC level styles below +pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) + +class TocDocTemplate(SimpleDocTemplate): + def __init__(self, *args, **kwargs): + SimpleDocTemplate.__init__(self, *args, **kwargs) + + def afterFlowable(self, flowable): + """Capture TOC entries after each flowable is rendered""" + if hasattr(flowable, 'bookmark_name'): + level = getattr(flowable, 'bookmark_level', 0) + text = getattr(flowable, 'bookmark_text', '') + key = flowable.bookmark_name + # Named destination on this page; passing the key makes the TOC entry clickable + self.canv.bookmarkPage(key) + self.notify('TOCEntry', (level, text, self.page, key)) + +# Create document +doc = TocDocTemplate("document.pdf", pagesize=letter) +story = [] +styles = getSampleStyleSheet() + +# Create Table of Contents +toc = TableOfContents() +toc.levelStyles = [ + ParagraphStyle(name='TOCHeading1', fontSize=14, leftIndent=20, + fontName='Times New Roman'), + ParagraphStyle(name='TOCHeading2', fontSize=12, leftIndent=40, + fontName='Times New Roman'), +] +story.append(Paragraph("Table of Contents", styles['Title'])) +story.append(Spacer(1, 0.2*inch)) +story.append(toc) +story.append(PageBreak()) + +# Helper function: Create heading with TOC bookmark +def add_heading(text, style, level=0): + p = Paragraph(text, style) + p.bookmark_name = text + p.bookmark_level = level + p.bookmark_text = text + return p + +# Chapter 1: Introduction +story.append(add_heading("Chapter 1: Introduction", styles['Heading1'], 0)) +story.append(Paragraph("This is the introduction chapter with some example content.", + styles['Normal'])) +story.append(Spacer(1, 0.2*inch)) + +story.append(add_heading("1.1
 Background", styles['Heading2'], 1)) +story.append(Paragraph("Background information goes here.", styles['Normal'])) + + +# Chapter 2: Conclusion +story.append(add_heading("Chapter 2: Conclusion", styles['Heading1'], 0)) +story.append(Paragraph("This concludes our document.", styles['Normal'])) +story.append(Spacer(1, 0.2*inch)) + +story.append(add_heading("2.1 Summary", styles['Heading2'], 1)) +story.append(Paragraph("Summary of the document.", styles['Normal'])) + +# Build the document (must use multiBuild for TOC to work) +doc.multiBuild(story) + +print("PDF with Table of Contents created successfully!") +``` + +#### Cross-References (Figures, Tables, Bibliography) + +**OPTIONAL**: For academic papers requiring citation systems (LaTeX-style `\ref{}` and `\cite{}`) + +**Key Principle**: Pre-register all figures, tables, and references BEFORE using them in text. + +**Simple Implementation Pattern:** + +```python +from reportlab.lib.pagesizes import letter +from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle +from reportlab.lib.enums import TA_CENTER +from reportlab.lib.units import inch +from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak +from reportlab.lib import colors +from reportlab.platypus import Table, TableStyle +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.ttfonts import TTFont + +# Register the font used by the table styles in build_document() +pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) + + +class CrossReferenceDocument: + """Manages cross-references throughout the document""" + + def __init__(self): + self.figures = {} + self.tables = {} + self.refs = {} + self.figure_counter = 0 + self.table_counter = 0 + self.ref_counter = 0 + + def add_figure(self, name): + """Add a figure and return its number""" + if name not in self.figures: + self.figure_counter += 1 + self.figures[name] = self.figure_counter + return self.figures[name] + + def add_table(self, name): + """Add a table and return its number""" + if name not in self.tables: + self.table_counter += 1 + self.tables[name] = self.table_counter + return self.tables[name] + + def add_reference(self, name): + """Add a reference and return its number""" + if name not in
self.refs: + self.ref_counter += 1 + self.refs[name] = self.ref_counter + return self.refs[name] + + +def build_document(): + doc = SimpleDocTemplate("cross_ref.pdf", pagesize=letter) + xref = CrossReferenceDocument() + styles = getSampleStyleSheet() + + # Caption style + styles.add(ParagraphStyle( + name='Caption', + parent=styles['Normal'], + alignment=TA_CENTER, + fontSize=10, + textColor=colors.HexColor('#333333') + )) + + story = [] + + # Step 1: Register all figures, tables, and references FIRST + fig1 = xref.add_figure('sample') + table1 = xref.add_table('data') + ref1 = xref.add_reference('author2024') + + # Step 2: Use them in text + intro = f""" + See Figure {fig1} for details and Table {table1} for data[{ref1}]. + """ + story.append(Paragraph(intro, styles['Normal'])) + story.append(Spacer(1, 0.2*inch)) + + # Step 3: Create figures and tables with numbered captions + story.append(Paragraph(f"Figure {fig1}. Sample Figure Caption", + styles['Caption'] + )) + + # Table example + header_style = ParagraphStyle( + name='TableHeader', + fontName='Times New Roman', + fontSize=11, + textColor=colors.white, + alignment=TA_CENTER + ) + + cell_style = ParagraphStyle( + name='TableCell', + fontName='Times New Roman', + fontSize=10, + textColor=colors.black, + alignment=TA_CENTER + ) + + # All text content wrapped in Paragraph() + data = [ + [Paragraph('Item', header_style), Paragraph('Value', header_style)], + [Paragraph('A', cell_style), Paragraph('10', cell_style)], + [Paragraph('B', cell_style), Paragraph('20', cell_style)], + ] + t = Table(data, colWidths=[2*inch, 2*inch]) + t.setStyle(TableStyle([ + ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#1F4E79')), + ('TEXTCOLOR', (0, 0), (-1, 0), colors.white), + ('ALIGN', (0, 0), (-1, -1), 'CENTER'), + ('GRID', (0, 0), (-1, -1), 0.5, colors.grey), + ])) + story.append(t) + story.append(Spacer(1, 6)) + story.append(Paragraph(f"Table {table1}. 
Sample Data Table", + styles['Caption'] + )) + + story.append(PageBreak()) + + # Step 4: Reference again in discussion + discussion = f""" + As shown in Figure {fig1} and Table {table1}, results are clear[{ref1}]. + """ + story.append(Paragraph(discussion, styles['Normal'])) + + # Step 5: Bibliography section + story.append(PageBreak()) + story.append(Paragraph("References", styles['Heading1'])) + story.append(Paragraph( + f"[{ref1}] Author, A. (2024). Example Reference. Journal Name.", + styles['Normal'] + )) + + doc.build(story) + print("PDF with cross-references created!") + + +if __name__ == '__main__': + build_document() +``` + +**Usage Notes:** +- **Pre-registration is critical**: Call `add_figure()`/`add_table()`/`add_reference()` at the START of your document +- **Citation format**: Use `Paragraph('[{ref_num}]')` for inline citations +- **Caption format**: Use `Paragraph('Figure {num}.')` or `Paragraph('Table {num}.')` with centered caption style +- **Combine with TOC**: Use `TocDocTemplate` + `doc.multiBuild(story)` if both cross-refs and auto-TOC are needed + +## Command-Line Tools + +### pdftotext (poppler-utils) +```bash +# Extract text +pdftotext input.pdf output.txt + +# Extract text preserving layout +pdftotext -layout input.pdf output.txt + +# Extract specific pages +pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5 +``` + +### qpdf +```bash +# Merge PDFs +qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf + +# Split pages +qpdf input.pdf --pages . 1-5 -- pages1-5.pdf +qpdf input.pdf --pages . 
6-10 -- pages6-10.pdf + +# Rotate pages +qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees + +# Remove password +qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf +``` + +## Common Tasks + +### Brand PDFs with Z.ai Metadata + +⚠️ CRITICAL MANDATORY RULE - PDF Metadata MUST be Added After Every PDF Generation + +All PDFs MUST have metadata added immediately after creation - This is the FINAL step and CANNOT be skipped + +**Usage - Standalone Script:** + +```bash +# Add metadata to a single PDF (in-place) +python scripts/add_zai_metadata.py document.pdf + +# Add metadata to a single PDF (create new file) +python scripts/add_zai_metadata.py input.pdf -o output.pdf + +# Add metadata with custom title +python scripts/add_zai_metadata.py report.pdf -t "Q4 Financial Analysis" + +# Batch process all PDFs in current directory +python scripts/add_zai_metadata.py *.pdf + +# Quiet mode (no output) +python scripts/add_zai_metadata.py document.pdf -q + +# Show help +python scripts/add_zai_metadata.py --help +``` + +**Requirements:** + +After doc.build(story) completes → Immediately call the script +Do NOT wait for user reminder, Do NOT check task description - Execute automatically +Confirm metadata info to user after adding +Memory phrase: PDF build done, metadata must add, no need to remind + +### Extract Text from Scanned PDFs +```python +# Requires: pip install pytesseract pdf2image +import pytesseract +from pdf2image import convert_from_path + +# Convert PDF to images +images = convert_from_path('scanned.pdf') + +# OCR each page +text = "" +for i, image in enumerate(images): + text += f"Page {i+1}:\n" + text += pytesseract.image_to_string(image) + text += "\n\n" + +print(text) +``` + +### Add Watermark +```python +from pypdf import PdfReader, PdfWriter + +# Create watermark (or load existing) +watermark = PdfReader("watermark.pdf").pages[0] + +# Apply to all pages +reader = PdfReader("document.pdf") +writer = PdfWriter() + +for page in 
reader.pages: + page.merge_page(watermark) + writer.add_page(page) + +with open("watermarked.pdf", "wb") as output: + writer.write(output) +``` + +### Password Protection +```python +from pypdf import PdfReader, PdfWriter + +reader = PdfReader("input.pdf") +writer = PdfWriter() + +for page in reader.pages: + writer.add_page(page) + +# Add password +writer.encrypt("userpassword", "ownerpassword") + +with open("encrypted.pdf", "wb") as output: + writer.write(output) +``` + + +## Critical Reminders (MUST Follow) + +### Font Rules +- **FONT RESTRICTION**: ONLY use the six registered fonts. NEVER use Arial, Helvetica, Courier, or any unregistered fonts. +- **In tables**: ALL Chinese text and numbers MUST use `SimHei` for Chinese PDFs. + ALL English text and numbers MUST use `Times New Roman` for English PDFs. + ALL Chinese content and numbers MUST use `SimHei`, and ALL English content MUST use `Times New Roman`, for mixed Chinese-English PDFs. +- **CRITICAL**: Must call `registerFontFamily()` after registering fonts to enable the `<b>` and `<i>` tags. +- **Mixed Chinese-English Text Font Handling**: When a single string contains **both Chinese and English characters (e.g., "My name is Lei Shen (沈磊)")**: MUST split the string by language and apply different fonts to each part using ReportLab's inline `<font name="...">` tags within `Paragraph` objects. English fonts (e.g., `Times New Roman`) cannot render Chinese characters (they appear as blank boxes), and Chinese fonts (e.g., `SimHei`) render English with poor spacing. Must set `ParagraphStyle.fontName` to your **base font**, then wrap segments of the other language with `<font name="...">` inline tags.
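Splitting a mixed string by hand is easy to get wrong. The following is a minimal sketch for the English-base case. It assumes Chinese runs can be detected by Unicode block: CJK punctuation, CJK Unified Ideographs, and full-width forms. A regular expression then wraps each run in an inline `<font>` tag. `wrap_cjk` is a hypothetical helper, not part of ReportLab or the skill's scripts. The character ranges are built with `chr()`, so the source contains no literal `\uXXXX` escapes for the sanitizer to flag.

```python
import re

# Assumed coverage: CJK punctuation (U+3000-U+303F), CJK Unified Ideographs
# (U+4E00-U+9FFF), and half/full-width forms (U+FF00-U+FFEF).
_CJK_RANGES = [(0x3000, 0x303F), (0x4E00, 0x9FFF), (0xFF00, 0xFFEF)]
_CJK_RUN = re.compile(
    "[" + "".join(chr(lo) + "-" + chr(hi) for lo, hi in _CJK_RANGES) + "]+"
)


def wrap_cjk(text: str, cjk_font: str = "SimHei") -> str:
    """Wrap each run of CJK characters in <font name="..."> for a Latin base style.

    Assumes `text` is already valid Paragraph markup (no bare '&' or '<').
    """
    return _CJK_RUN.sub(lambda m: f'<font name="{cjk_font}">{m.group(0)}</font>', text)
```

For example, `Paragraph(wrap_cjk("My name is Lei Shen (沈磊)"), enbody_style)` sends only the two Chinese characters to `SimHei`. The Chinese-base case works the same way with a Latin-letter pattern and `Times New Roman`.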
+ +```python +from reportlab.lib.styles import ParagraphStyle +from reportlab.lib.enums import TA_JUSTIFY +from reportlab.platypus import Paragraph +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.ttfonts import TTFont + +pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf')) +pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) + +story = [] + +# Base font is English; wrap Chinese parts: +enbody_style = ParagraphStyle( + name="ENBodyStyle", + fontName="Times New Roman", # Base font for English + fontSize=10.5, + leading=18, + alignment=TA_JUSTIFY, +) +# Wrap Chinese segments with <font name="SimHei"> tags (one Paragraph per sentence) +for text in [ + 'Zhipu QingYan (<font name="SimHei">智谱清言</font>) is developed by Z.ai.', + 'My name is Lei Shen (<font name="SimHei">沈磊</font>).', + '<font name="SimHei">文心一言</font> (ERNIE Bot) is by Baidu.', +]: + story.append(Paragraph(text, enbody_style)) + +# Base font is Chinese; wrap English parts: +cnbody_style = ParagraphStyle( + name="CNBodyStyle", + fontName="SimHei", # Base font for Chinese + fontSize=10.5, + leading=18, + alignment=TA_JUSTIFY, +) +# Wrap English segments with <font name="Times New Roman"> tags +story.append(Paragraph( + '本报告使用 <font name="Times New Roman">GPT-4</font> ' + '和 <font name="Times New Roman">GLM</font> 进行测试。', + cnbody_style +)) +``` + +### Rich Text Tags (`<b>`, `<super>`, `<sub>`) +- These tags ONLY work inside `Paragraph()` objects — plain strings will NOT render them. +- **Character Safety**: Follow **Core Constraint #5** strictly. Do not use forbidden Unicode superscript/subscript/math characters anywhere in the code. Always use `<super>`, `<sub>`, `<b>` tags inside `Paragraph()`. +- **Scientific Notation in Tables**: `Paragraph('1.246 × 10<super>8</super>', style)` — never write large numbers as plain digits. + +### Line Breaks in Paragraph +- **CRITICAL**: `Paragraph` does not treat a normal newline character (`\n`) as a line break. 
To create line breaks, you must use `<br/>` (or split the content into multiple `Paragraph` objects). +```python +# WRONG: these newlines are collapsed into spaces, and _underscores_ do not produce italics +sms3 = """Hi [FIRST_NAME] +You're invited! Join us for an exclusive first look at the Carolina Herrera Resort 2025 collection—before it opens to the public. +[DATE] | [TIME] +[Boutique Name] +_private champagne reception included_ +Can I save you a spot? Just let me know! +[Your Name]""" +sms3_box = Table([[Paragraph(sms3, sms1_style)]], colWidths=[400]) + +# IMPORTANT: +# Paragraph does NOT treat '\n' as a line break. +# Use <br/> to force line breaks. +sms3 = """Hi [FIRST_NAME]<br/><br/> +You're invited! Join us for an exclusive first look at the Carolina Herrera Resort 2025 collection—before it opens to the public.<br/><br/> +[DATE] | [TIME]<br/> +[Boutique Name]<br/><br/> +<i>private champagne reception included</i><br/><br/> +Can I save you a spot? Just let me know!<br/><br/> +[Your Name]""" +sms3_box = Table([[Paragraph(sms3, sms1_style)]], colWidths=[400]) +``` + +### Body Title & Heading Styles +- **All titles and sub-titles (except for Table headers)**: Must be bold with black text - use `Paragraph('<b>Title</b>', style)` + `textColor=colors.black`. + +### Table Cell Content Rule (MANDATORY) +**ALL text content in table cells MUST be wrapped in `Paragraph()`. This is NON-NEGOTIABLE.** + +❌ **PROHIBITED** - Plain strings in table cells: +```python +# NEVER DO THIS - formatting will NOT work +data = [ + ['<b>Header</b>', '<b>Value</b>'], # Bold won't render + ['Temperature', '25°C'], # No style control + ['Pressure', '1.01 × 10<super>5</super>'], # Superscript won't work +] +``` + +✅ **REQUIRED** - All table text MUST be wrapped in Paragraph: +```python +# ALWAYS DO THIS +data = [ + [Paragraph('<b>Header</b>', header_style), Paragraph('<b>Value</b>', header_style)], + [Paragraph('Temperature', cell_style), Paragraph('25°C', cell_style)], + [Paragraph('Pressure', cell_style), Paragraph('1.01 × 10<super>5</super>', cell_style)], +] +``` + +**Why this is mandatory:** +- Rendering formatting tags (`<b>`, `<i>`, `<super>`, `<sub>`) +- Proper font application +- Correct text alignment within cells +- Consistent styling across the table + +**The ONLY exception**: `Image()` objects can be placed directly in table cells without Paragraph wrapping. + +### Table Style Specifications +- **Header style**: Must be bold with white text on dark blue background - use `Paragraph('<b>Header</b>', header_style)` + `textColor=colors.white`. +- **Standard color scheme**: Dark blue header (`#1F4E79`), alternating white/light gray rows. +- **Color consistency**: If a single PDF contains multiple tables, only one color scheme is allowed across all tables. +- **Alignment**: Each body element within the same table must use the same alignment method. +- **Caption**: ALL table captions must be centered and followed by `Spacer(1, 18)` before the next content. +- **Spacing**: Add `Spacer(1, 18)` BEFORE tables to maintain symmetric spacing with bottom.
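The color rules above can be kept in one place by generating the `TableStyle` commands from the row count. This is a minimal sketch, not a helper shipped with the skill. `table_style_commands` is a hypothetical name, and `#F2F2F2` is an assumed shade for "light gray". The `to_color` parameter lets the caller pass `reportlab.lib.colors.HexColor`, so the function itself needs no ReportLab import.

```python
HEADER_BLUE = "#1F4E79"  # dark blue header from the specification above
ROW_WHITE = "#FFFFFF"
ROW_GRAY = "#F2F2F2"  # assumed value for "light gray"


def table_style_commands(n_rows, to_color=lambda hex_code: hex_code):
    """Return TableStyle command tuples: blue header row, then alternating white/gray rows."""
    commands = [("BACKGROUND", (0, 0), (-1, 0), to_color(HEADER_BLUE))]
    for row in range(1, n_rows):
        # Row 1 is white, row 2 gray, row 3 white, and so on.
        shade = ROW_WHITE if row % 2 == 1 else ROW_GRAY
        commands.append(("BACKGROUND", (0, row), (-1, row), to_color(shade)))
    return commands
```

With ReportLab installed, `table.setStyle(TableStyle(table_style_commands(len(data), colors.HexColor)))` applies the scheme. The header text color still comes from `header_style`, because `Paragraph` cells take their color from their own style. If every table in a PDF goes through the same function, the single-color-scheme rule holds automatically.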
+ +### Document Structure +- A PDF can contain ONLY ONE cover page and ONE back cover page. +- The cover page and the back cover page MUST use the alignment method specified by `TA_JUSTIFY`. +- **PDF Metadata (REQUIRED)**: Title MUST match the filename; Author and Creator MUST be "Z.ai"; Subject SHOULD describe the document's purpose. + + +### Image Handling +- **Preserve aspect ratio**: Never distort an image. Always insert it at its original aspect ratio. +```python +from PIL import Image as PILImage +from reportlab.platypus import Image +# Get original dimensions +pil_img = PILImage.open('image.png') +orig_w, orig_h = pil_img.size +# Scale to fit width while preserving aspect ratio +target_width = 400 +scale = target_width / orig_w +img = Image('image.png', width=target_width, height=orig_h * scale) +``` + +## Final Code Check +- Verify function parameter order against documentation. +- Confirm list/array element type consistency; test-run immediately. +- Use `Paragraph` (not `Preformatted`) for body text and formulas. + +### MANDATORY: Post-Generation Forbidden Character Sanitization + +**After the complete Python code is written and BEFORE executing it**, you MUST sanitize the code using the pre-built script located at: + +``` +scripts/sanitize_code.py +``` + +This script catches any forbidden Unicode characters (superscript/subscript digits, math operators, emoji, HTML entities, literal `\uXXXX` escapes) that may have slipped through despite the prevention rules. It converts them to safe ReportLab `<super>`/`<sub>` tags or ASCII equivalents. + +**⚠️ CRITICAL RULE**: You MUST ALWAYS write PDF generation code to a `.py` file first, then sanitize it, then execute it. **NEVER use `python -c "..."` or heredoc (`python3 << 'EOF'`) to run PDF generation code directly** — these patterns bypass the sanitization step and risk forbidden characters reaching the final PDF.
+ +**Mandatory workflow (NO EXCEPTIONS):** + +```bash +# Step 1: ALWAYS write code to a .py file first +cat > generate_pdf.py << 'PYEOF' +# ... your PDF generation code here ... +PYEOF + +# Step 2: Sanitize forbidden characters (MUST run before execution) +python scripts/sanitize_code.py generate_pdf.py + +# Step 3: Execute the sanitized code +python generate_pdf.py +``` + +**Forbidden patterns — NEVER do any of the following:** +```bash +# ❌ PROHIBITED: python -c with inline code (cannot be sanitized) +python -c "from reportlab... doc.build(story)" + +# ❌ PROHIBITED: heredoc without saving to file first (cannot be sanitized) +python3 << 'EOF' +from reportlab... +EOF + +# ❌ PROHIBITED: executing the .py file WITHOUT sanitizing first +python generate_pdf.py # Missing sanitization step! +``` + +**✅ CORRECT: The ONLY allowed execution pattern:** +```bash +# 1. Write to file → 2. Sanitize → 3. Execute +cat > generate_pdf.py << 'PYEOF' +...code... +PYEOF +python scripts/sanitize_code.py generate_pdf.py +python generate_pdf.py +``` + +**⚠️ This sanitization step is NON-OPTIONAL.** Even if you believe the code contains no forbidden characters, you MUST still run the sanitization script. It serves as a safety net to catch any characters that bypassed prevention rules. 
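The following hypothetical sketch shows the kind of conversion the sanitization step performs, limited to superscript and subscript digits. It is not the contents of `scripts/sanitize_code.py`, whose exact rules are not shown here, and it does not replace running that script. Characters are referenced by code point through `chr()`, so the sketch itself contains no forbidden literals.

```python
import re

# Superscript digits are scattered across two Unicode blocks; subscripts are contiguous.
SUPER_DIGITS = {chr(0x2070): "0", chr(0x00B9): "1", chr(0x00B2): "2", chr(0x00B3): "3"}
SUPER_DIGITS.update({chr(0x2070 + d): str(d) for d in range(4, 10)})
SUB_DIGITS = {chr(0x2080 + d): str(d) for d in range(10)}


def _run_pattern(chars):
    """Regex matching one or more consecutive characters from `chars`."""
    return re.compile("[" + "".join(re.escape(c) for c in chars) + "]+")


_SUPER_RUN = _run_pattern(SUPER_DIGITS)
_SUB_RUN = _run_pattern(SUB_DIGITS)


def to_reportlab_tags(text: str) -> str:
    """Replace runs of Unicode super/subscript digits with <super>/<sub> markup."""
    text = _SUPER_RUN.sub(
        lambda m: "<super>" + "".join(SUPER_DIGITS[c] for c in m.group(0)) + "</super>", text
    )
    return _SUB_RUN.sub(
        lambda m: "<sub>" + "".join(SUB_DIGITS[c] for c in m.group(0)) + "</sub>", text
    )
```

For example, a stray "10" followed by a superscript eight becomes `10<super>8</super>`, which renders correctly inside a `Paragraph`.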
+ +## Quick Reference + +| Task | Best Tool | Command/Code | +|------|-----------|--------------| +| Merge PDFs | pypdf | `writer.add_page(page)` | +| Split PDFs | pypdf | One page per file | +| Extract text | pdfplumber | `page.extract_text()` | +| Extract tables | pdfplumber | `page.extract_tables()` | +| Create PDFs | reportlab | Canvas or Platypus | +| Command line merge | qpdf | `qpdf --empty --pages ...` | +| OCR scanned PDFs | pytesseract | Convert to image first | +| Fill PDF forms | pdf-lib or pypdf (see forms.md) | See forms.md | + +## Next Steps + +- For advanced pypdfium2 usage, see reference.md +- For JavaScript libraries (pdf-lib), see reference.md +- If you need to fill out a PDF form, follow the instructions in forms.md +- For troubleshooting guides, see reference.md +- For advanced table of content template, see reference.md \ No newline at end of file diff --git a/skills/pdf/forms.md b/skills/pdf/forms.md new file mode 100755 index 0000000..4e23450 --- /dev/null +++ b/skills/pdf/forms.md @@ -0,0 +1,205 @@ +**CRITICAL: You MUST complete these steps in order. Do not skip ahead to writing code.** + +If you need to fill out a PDF form, first check to see if the PDF has fillable form fields. Run this script from this file's directory: + `python scripts/check_fillable_fields `, and depending on the result go to either the "Fillable fields" or "Non-fillable fields" and follow those instructions. + +# Fillable fields +If the PDF has fillable form fields: +- Run this script from this file's directory: `python scripts/extract_form_field_info.py `. 
It will create a JSON file with a list of fields in this format: +``` +[ + { + "field_id": (unique ID for the field), + "page": (page number, 1-based), + "rect": ([left, bottom, right, top] bounding box in PDF coordinates, y=0 is the bottom of the page), + "type": ("text", "checkbox", "radio_group", or "choice"), + }, + // Checkboxes have "checked_value" and "unchecked_value" properties: + { + "field_id": (unique ID for the field), + "page": (page number, 1-based), + "type": "checkbox", + "checked_value": (Set the field to this value to check the checkbox), + "unchecked_value": (Set the field to this value to uncheck the checkbox), + }, + // Radio groups have a "radio_options" list with the possible choices. + { + "field_id": (unique ID for the field), + "page": (page number, 1-based), + "type": "radio_group", + "radio_options": [ + { + "value": (set the field to this value to select this radio option), + "rect": (bounding box for the radio button for this option) + }, + // Other radio options + ] + }, + // Multiple choice fields have a "choice_options" list with the possible choices: + { + "field_id": (unique ID for the field), + "page": (page number, 1-based), + "type": "choice", + "choice_options": [ + { + "value": (set the field to this value to select this option), + "text": (display text of the option) + }, + // Other choice options + ], + } +] +``` +- Convert the PDF to PNGs (one image for each page) with this script (run from this file's directory): +`python scripts/convert_pdf_to_images.py ` +Then analyze the images to determine the purpose of each form field (make sure to convert the bounding box PDF coordinates to image coordinates). 
+- Create a `field_values.json` file in this format with the values to be entered for each field: +``` +[ + { + "field_id": "last_name", // Must match the field_id from `extract_form_field_info.py` + "description": "The user's last name", + "page": 1, // Must match the "page" value in field_info.json + "value": "Simpson" + }, + { + "field_id": "Checkbox12", + "description": "Checkbox to be checked if the user is 18 or over", + "page": 1, + "value": "/On" // If this is a checkbox, use its "checked_value" value to check it. If it's a radio button group, use one of the "value" values in "radio_options". + }, + // more fields +] +``` +- Run the `fill_fillable_fields.py` script from this file's directory to create a filled-in PDF: +`python scripts/fill_fillable_fields.py ` +This script will verify that the field IDs and values you provide are valid; if it prints error messages, correct the appropriate fields and try again. + +# Non-fillable fields +If the PDF doesn't have fillable form fields, you'll need to visually determine where the data should be added and create text annotations. Follow the steps below *exactly*. You MUST perform all of these steps to ensure that the form is accurately completed. Details for each step are below. +- Convert the PDF to PNG images and determine field bounding boxes. +- Create a JSON file with field information and validation images showing the bounding boxes. +- Validate the bounding boxes. +- Use the bounding boxes to fill in the form. + +## Step 1: Visual Analysis (REQUIRED) +- Convert the PDF to PNG images. Run this script from this file's directory: +`python scripts/convert_pdf_to_images.py ` +The script will create a PNG image for each page in the PDF. +- Carefully examine each PNG image and identify all form fields and areas where the user should enter data. For each form field where the user should enter text, determine bounding boxes for both the form field label and the area where the user should enter text.
The label and entry bounding boxes MUST NOT INTERSECT; the text entry box should only include the area where data should be entered. Usually this area will be immediately to the side, above, or below its label. Entry bounding boxes must be tall and wide enough to contain their text. + +These are some examples of form structures that you might see: + +*Label inside box* +``` +┌────────────────────────┐ +│ Name: │ +└────────────────────────┘ +``` +The input area should be to the right of the "Name" label and extend to the edge of the box. + +*Label before line* +``` +Email: _______________________ +``` +The input area should be above the line and include its entire width. + +*Label under line* +``` +_________________________ +Name +``` +The input area should be above the line and include the entire width of the line. This is common for signature and date fields. + +*Label above line* +``` +Please enter any special requests: +________________________________________________ +``` +The input area should extend from the bottom of the label to the line, and should include the entire width of the line. + +*Checkboxes* +``` +Are you a US citizen? Yes □ No □ +``` +For checkboxes: +- Look for small square boxes (□) - these are the actual checkboxes to target. They may be to the left or right of their labels. +- Distinguish between label text ("Yes", "No") and the clickable checkbox squares. +- The entry bounding box should cover ONLY the small square, not the text label. 
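Both workflows move between two coordinate systems. `extract_form_field_info.py` reports `rect` as `[left, bottom, right, top]` in PDF points, with y=0 at the bottom of the page. The PNG page images use `[left, top, right, bottom]` pixels, with y=0 at the top. The following is a minimal sketch of the conversion, assuming the image is a uniform scaling of the full page (no cropping or rotation). The function names are hypothetical, and the page size in points must be read from the PDF itself.

```python
def pdf_rect_to_image_box(rect, page_size, image_size):
    """[left, bottom, right, top] in PDF points -> [left, top, right, bottom] in pixels."""
    left, bottom, right, top = rect
    page_w, page_h = page_size
    img_w, img_h = image_size
    sx, sy = img_w / page_w, img_h / page_h
    # Flip the y-axis: PDF y grows upward from the bottom, image y grows downward from the top.
    return [left * sx, (page_h - top) * sy, right * sx, (page_h - bottom) * sy]


def image_box_to_pdf_rect(box, page_size, image_size):
    """[left, top, right, bottom] in pixels -> [left, bottom, right, top] in PDF points."""
    left, top, right, bottom = box
    page_w, page_h = page_size
    img_w, img_h = image_size
    sx, sy = img_w / page_w, img_h / page_h
    return [left / sx, page_h - bottom / sy, right / sx, page_h - top / sy]
```

For example, a US Letter page (612 x 792 pt) rendered at 2x produces a 1224 x 1584 px image. There, the PDF rect `[100, 700, 200, 720]` maps to the pixel box `[200, 144, 400, 184]`.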
+ +### Step 2: Create fields.json and validation images (REQUIRED) +- Create a file named `fields.json` with information for the form fields and bounding boxes in this format: +``` +{ + "pages": [ + { + "page_number": 1, + "image_width": (first page image width in pixels), + "image_height": (first page image height in pixels), + }, + { + "page_number": 2, + "image_width": (second page image width in pixels), + "image_height": (second page image height in pixels), + } + // additional pages + ], + "form_fields": [ + // Example for a text field. + { + "page_number": 1, + "description": "The user's last name should be entered here", + // Bounding boxes are [left, top, right, bottom]. The bounding boxes for the label and text entry should not overlap. + "field_label": "Last name", + "label_bounding_box": [30, 125, 95, 142], + "entry_bounding_box": [100, 125, 280, 142], + "entry_text": { + "text": "Johnson", // This text will be added as an annotation at the entry_bounding_box location + "font_size": 14, // optional, defaults to 14 + "font_color": "000000", // optional, RRGGBB format, defaults to 000000 (black) + } + }, + // Example for a checkbox. TARGET THE SQUARE for the entry bounding box, NOT THE TEXT + { + "page_number": 2, + "description": "Checkbox that should be checked if the user is over 18", + "entry_bounding_box": [140, 525, 155, 540], // Small box over checkbox square + "field_label": "Yes", + "label_bounding_box": [100, 525, 132, 540], // Box containing "Yes" text + // Use "X" to check a checkbox. + "entry_text": { + "text": "X", + } + } + // additional form field entries + ] +} +``` + +Create validation images by running this script from this file's directory for each page: +`python scripts/create_validation_image.py + +The validation images will have red rectangles where text should be entered, and blue rectangles covering label text. 
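Before running the real `check_bounding_boxes.py`, it helps to see what an intersection check over `fields.json` involves. The sketch below is hypothetical; its rules and messages are not taken from that script. It treats boxes as `[left, top, right, bottom]` pixels and reports any two boxes on the same page that overlap. It also flags entry boxes shorter than the entry font size, which defaults to 14 as in the format above.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """True if two [left, top, right, bottom] boxes share area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def check_fields(form_fields, default_font_size=14):
    """Return human-readable problems found in the "form_fields" list of fields.json."""
    problems = []
    boxes = []  # (page_number, description, box) for every label and entry box
    for i, field in enumerate(form_fields):
        name = field.get("field_label", f"field {i}")
        page = field["page_number"]
        entry = field["entry_bounding_box"]
        boxes.append((page, f"{name} entry", entry))
        if "label_bounding_box" in field:
            boxes.append((page, f"{name} label", field["label_bounding_box"]))
        font_size = field.get("entry_text", {}).get("font_size", default_font_size)
        if entry[3] - entry[1] < font_size:
            problems.append(f"{name} entry box is shorter than font size {font_size}")
    for (page_a, desc_a, box_a), (page_b, desc_b, box_b) in combinations(boxes, 2):
        if page_a == page_b and boxes_overlap(box_a, box_b):
            problems.append(f"page {page_a}: {desc_a} overlaps {desc_b}")
    return problems
```

Both example fields in the format above pass this check. A label box that extends into its entry box is reported as an overlap.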
+ +### Step 3: Validate Bounding Boxes (REQUIRED) +#### Automated intersection check +- Verify that none of the bounding boxes intersect and that the entry bounding boxes are tall enough by checking the fields.json file with the `check_bounding_boxes.py` script (run from this file's directory): +`python scripts/check_bounding_boxes.py ` + +If there are errors, reanalyze the relevant fields, adjust the bounding boxes, and iterate until there are no remaining errors. Remember: label (blue) bounding boxes should contain text labels, entry (red) boxes should not. + +#### Manual image inspection +**CRITICAL: Do not proceed without visually inspecting validation images** +- Red rectangles must ONLY cover input areas +- Red rectangles MUST NOT contain any text +- Blue rectangles should contain label text +- For checkboxes: + - Red rectangle MUST be centered on the checkbox square + - Blue rectangle should cover the text label for the checkbox + +- If any rectangles look wrong, fix fields.json, regenerate the validation images, and verify again. Repeat this process until the bounding boxes are fully accurate. + + +### Step 4: Add annotations to the PDF +Run this script from this file's directory to create a filled-out PDF using the information in fields.json: +`python scripts/fill_pdf_form_with_annotations.py diff --git a/skills/pdf/reference.md b/skills/pdf/reference.md new file mode 100755 index 0000000..1bc4843 --- /dev/null +++ b/skills/pdf/reference.md @@ -0,0 +1,765 @@ +# PDF Processing Advanced Reference + +This document contains advanced PDF processing features, detailed examples, and additional libraries not covered in the main skill instructions. + +## pypdfium2 Library (Apache/BSD License) + +### Overview +pypdfium2 is a Python binding for PDFium (Chromium's PDF library). It is well suited to fast PDF rendering and image generation, and can serve as an alternative to PyMuPDF.
+ +### Render PDF to Images +```python +import pypdfium2 as pdfium +from PIL import Image + +# Load PDF +pdf = pdfium.PdfDocument("document.pdf") + +# Render page to image +page = pdf[0] # First page +bitmap = page.render( + scale=2.0, # Higher resolution + rotation=0 # No rotation +) + +# Convert to PIL Image +img = bitmap.to_pil() +img.save("page_1.png", "PNG") + +# Process multiple pages +for i, page in enumerate(pdf): + bitmap = page.render(scale=1.5) + img = bitmap.to_pil() + img.save(f"page_{i+1}.jpg", "JPEG", quality=90) +``` + +### Extract Text with pypdfium2 +```python +import pypdfium2 as pdfium + +pdf = pdfium.PdfDocument("document.pdf") +for i, page in enumerate(pdf): + # Text is read through a text page object, not the page itself + text = page.get_textpage().get_text_range() + print(f"Page {i+1} text length: {len(text)} chars") +``` + +## JavaScript Libraries + +### pdf-lib (MIT License) + +pdf-lib is a powerful JavaScript library for creating and modifying PDF documents in any JavaScript environment. + +#### Load and Manipulate Existing PDF +```javascript +import { PDFDocument } from 'pdf-lib'; +import fs from 'fs'; + +async function manipulatePDF() { + // Load existing PDF + const existingPdfBytes = fs.readFileSync('input.pdf'); + const pdfDoc = await PDFDocument.load(existingPdfBytes); + + // Get page count + const pageCount = pdfDoc.getPageCount(); + console.log(`Document has ${pageCount} pages`); + + // Add new page + const newPage = pdfDoc.addPage([600, 400]); + newPage.drawText('Added by pdf-lib', { + x: 100, + y: 300, + size: 16 + }); + + // Save modified PDF + const pdfBytes = await pdfDoc.save(); + fs.writeFileSync('modified.pdf', pdfBytes); +} +``` + +#### Create Complex PDFs from Scratch + +**Note**: This JavaScript example uses pdf-lib's built-in StandardFonts. For Python/reportlab, always use the six registered fonts defined in SKILL.md (SimHei, Microsoft YaHei, SarasaMonoSC, Times New Roman, Calibri, DejaVuSans).
+ +```javascript +import { PDFDocument, rgb, StandardFonts } from 'pdf-lib'; +import fs from 'fs'; + +async function createPDF() { + const pdfDoc = await PDFDocument.create(); + + // Add fonts + const helveticaFont = await pdfDoc.embedFont(StandardFonts.Helvetica); + const helveticaBold = await pdfDoc.embedFont(StandardFonts.HelveticaBold); + + // Add page + const page = pdfDoc.addPage([595, 842]); // A4 size + const { width, height } = page.getSize(); + + // Add text with styling + page.drawText('Invoice #12345', { + x: 50, + y: height - 50, + size: 18, + font: helveticaBold, + color: rgb(0.2, 0.2, 0.8) + }); + + // Add rectangle (header background) + page.drawRectangle({ + x: 40, + y: height - 100, + width: width - 80, + height: 30, + color: rgb(0.9, 0.9, 0.9) + }); + + // Add table-like content + const items = [ + ['Item', 'Qty', 'Price', 'Total'], + ['Widget', '2', '$50', '$100'], + ['Gadget', '1', '$75', '$75'] + ]; + + let yPos = height - 150; + items.forEach(row => { + let xPos = 50; + row.forEach(cell => { + page.drawText(cell, { + x: xPos, + y: yPos, + size: 12, + font: helveticaFont + }); + xPos += 120; + }); + yPos -= 25; + }); + + const pdfBytes = await pdfDoc.save(); + fs.writeFileSync('created.pdf', pdfBytes); +} +``` + +#### Advanced Merge and Split Operations +```javascript +import { PDFDocument } from 'pdf-lib'; +import fs from 'fs'; + +async function mergePDFs() { + // Create new document + const mergedPdf = await PDFDocument.create(); + + // Load source PDFs + const pdf1Bytes = fs.readFileSync('doc1.pdf'); + const pdf2Bytes = fs.readFileSync('doc2.pdf'); + + const pdf1 = await PDFDocument.load(pdf1Bytes); + const pdf2 = await PDFDocument.load(pdf2Bytes); + + // Copy pages from first PDF + const pdf1Pages = await mergedPdf.copyPages(pdf1, pdf1.getPageIndices()); + pdf1Pages.forEach(page => mergedPdf.addPage(page)); + + // Copy specific pages from second PDF (pages 0, 2, 4) + const pdf2Pages = await mergedPdf.copyPages(pdf2, [0, 2, 4]); + 
pdf2Pages.forEach(page => mergedPdf.addPage(page)); + + const mergedPdfBytes = await mergedPdf.save(); + fs.writeFileSync('merged.pdf', mergedPdfBytes); +} +``` + +### pdfjs-dist (Apache License) + +PDF.js is Mozilla's JavaScript library for rendering PDFs in the browser. + +#### Basic PDF Loading and Rendering +```javascript +import * as pdfjsLib from 'pdfjs-dist'; + +// Configure worker (important for performance) +pdfjsLib.GlobalWorkerOptions.workerSrc = './pdf.worker.js'; + +async function renderPDF() { + // Load PDF + const loadingTask = pdfjsLib.getDocument('document.pdf'); + const pdf = await loadingTask.promise; + + console.log(`Loaded PDF with ${pdf.numPages} pages`); + + // Get first page + const page = await pdf.getPage(1); + const viewport = page.getViewport({ scale: 1.5 }); + + // Render to canvas + const canvas = document.createElement('canvas'); + const context = canvas.getContext('2d'); + canvas.height = viewport.height; + canvas.width = viewport.width; + + const renderContext = { + canvasContext: context, + viewport: viewport + }; + + await page.render(renderContext).promise; + document.body.appendChild(canvas); +} +``` + +#### Extract Text with Coordinates +```javascript +import * as pdfjsLib from 'pdfjs-dist'; + +async function extractText() { + const loadingTask = pdfjsLib.getDocument('document.pdf'); + const pdf = await loadingTask.promise; + + let fullText = ''; + + // Extract text from all pages + for (let i = 1; i <= pdf.numPages; i++) { + const page = await pdf.getPage(i); + const textContent = await page.getTextContent(); + + const pageText = textContent.items + .map(item => item.str) + .join(' '); + + fullText += `\n--- Page ${i} ---\n${pageText}`; + + // Get text with coordinates for advanced processing + const textWithCoords = textContent.items.map(item => ({ + text: item.str, + x: item.transform[4], + y: item.transform[5], + width: item.width, + height: item.height + })); + } + + console.log(fullText); + return fullText; +} +``` + 
+ +#### Extract Annotations and Forms +```javascript +import * as pdfjsLib from 'pdfjs-dist'; + +async function extractAnnotations() { + const loadingTask = pdfjsLib.getDocument('annotated.pdf'); + const pdf = await loadingTask.promise; + + for (let i = 1; i <= pdf.numPages; i++) { + const page = await pdf.getPage(i); + const annotations = await page.getAnnotations(); + + annotations.forEach(annotation => { + console.log(`Annotation type: ${annotation.subtype}`); + console.log(`Content: ${annotation.contents}`); + console.log(`Coordinates: ${JSON.stringify(annotation.rect)}`); + }); + } +} +``` + +## Advanced Command-Line Operations + +### poppler-utils Advanced Features + +#### Extract Text with Bounding Box Coordinates +```bash +# Extract text with bounding box coordinates (essential for structured data) +pdftotext -bbox-layout document.pdf output.xml + +# The XML output contains precise coordinates for each text element +``` + +#### Advanced Image Conversion +```bash +# Convert to PNG images with specific resolution +pdftoppm -png -r 300 document.pdf output_prefix + +# Convert specific page range with high resolution +pdftoppm -png -r 600 -f 1 -l 3 document.pdf high_res_pages + +# Convert to JPEG with quality setting +pdftoppm -jpeg -jpegopt quality=85 -r 200 document.pdf jpeg_output +``` + +#### Extract Embedded Images +```bash +# Extract images, saving JPEGs as .jpg and adding page numbers to the filenames +pdfimages -j -p document.pdf page_images + +# List image info without extracting +pdfimages -list document.pdf + +# Extract images in their original format +pdfimages -all document.pdf images/img +``` + +### qpdf Advanced Features + +#### Complex Page Manipulation +```bash +# Split PDF into groups of 3 pages (qpdf replaces %d with a zero-padded page range) +qpdf --split-pages=3 input.pdf output_group_%d.pdf + +# Extract specific pages with complex ranges (z means the last page) +qpdf input.pdf --pages input.pdf 1,3-5,8,10-z -- extracted.pdf + +# Merge specific pages from multiple PDFs +qpdf --empty --pages doc1.pdf 1-3 doc2.pdf 5-7 doc3.pdf 2,4 -- combined.pdf
+``` + +#### PDF Optimization and Repair +```bash +# Optimize PDF for web (linearize for streaming) +qpdf --linearize input.pdf optimized.pdf + +# Pack objects into object streams and recompress streams +qpdf --object-streams=generate --compress-streams=y --recompress-flate input.pdf compressed.pdf + +# Check structure, then rewrite the file (qpdf attempts recovery when reading damaged files) +qpdf --check input.pdf +qpdf damaged.pdf repaired.pdf + +# Show page-level structure for debugging +qpdf --show-pages input.pdf > structure.txt +``` + +#### Advanced Encryption +```bash +# Add password protection with specific permissions +qpdf --encrypt user_pass owner_pass 256 --print=none --modify=none -- input.pdf encrypted.pdf + +# Check encryption status +qpdf --show-encryption encrypted.pdf + +# Remove password protection (requires password) +qpdf --password=secret123 --decrypt encrypted.pdf decrypted.pdf +``` + +## Advanced Python Techniques + +### pdfplumber Advanced Features + +#### Extract Text with Precise Coordinates +```python +import pdfplumber + +with pdfplumber.open("document.pdf") as pdf: + page = pdf.pages[0] + + # Extract all text with coordinates + chars = page.chars + for char in chars[:10]: # First 10 characters + print(f"Char: '{char['text']}' at x:{char['x0']:.1f} y:{char['y0']:.1f}") + + # Extract text by bounding box (left, top, right, bottom) + bbox_text = page.within_bbox((100, 100, 400, 200)).extract_text() +``` + +#### Advanced Table Extraction with Custom Settings +```python +import pdfplumber +import pandas as pd + +with pdfplumber.open("complex_table.pdf") as pdf: + page = pdf.pages[0] + + # Extract tables with custom settings for complex layouts + table_settings = { + "vertical_strategy": "lines", + "horizontal_strategy": "lines", + "snap_tolerance": 3, + "intersection_tolerance": 15 + } + tables = page.extract_tables(table_settings) + + # Load the first table into a DataFrame (first row as header) + if tables: + df = pd.DataFrame(tables[0][1:], columns=tables[0][0]) + + # Visual debugging: draw the detected table lines and cells + img = page.to_image(resolution=150) + img.debug_tablefinder(table_settings) + img.save("debug_layout.png") +``` + +### reportlab Advanced Features + +#### Quick TOC Template (Copy-Paste Ready) +
+```python +from reportlab.lib.pagesizes import A4 +from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, PageBreak +from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle +from reportlab.lib import colors +from reportlab.lib.units import inch +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.ttfonts import TTFont +from reportlab.pdfbase.pdfmetrics import registerFontFamily + +# Register fonts first +pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) +registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman') + +# Setup +doc = SimpleDocTemplate("report.pdf", pagesize=A4, + leftMargin=0.75*inch, rightMargin=0.75*inch) +styles = getSampleStyleSheet() + +# Configure heading style +styles['Heading1'].fontName = 'Times New Roman' +styles['Heading1'].textColor = colors.black # Titles must be black + +story = [] + +# Calculate dimensions +page_width = A4[0] +available_width = page_width - 1.5*inch +page_num_width = 50 # Fixed width for page numbers (enough for 3-4 digits) + +# Calculate dots: fill space from title to page number +dots_column_width = available_width - 200 - page_num_width # Reserve space for title + page +optimal_dot_count = int(dots_column_width / 4.5) # ~4.5pt per dot at 7pt font + +# Define styles +toc_style = ParagraphStyle('TOCEntry', parent=styles['Normal'], + fontName='Times New Roman', fontSize=11, leading=16) +dots_style = ParagraphStyle('LeaderDots', parent=styles['Normal'], + fontName='Times New Roman', fontSize=7, leading=16) # Smaller font for more dots + +# Build TOC (use Paragraph with <b> for bold heading) +toc_data = [ + [Paragraph('<b>Table of Contents</b>', styles['Heading1']), '', ''], + ['', '', ''], +] + +entries = [('Section 1', '5'), ('Section 2', '10')] +for title, page in entries: + toc_data.append([ + Paragraph(title, toc_style), + Paragraph('.'
* optimal_dot_count, dots_style), + Paragraph(page, toc_style) + ]) + +# Use None for title column (auto-expand), fixed for others +toc_table = Table(toc_data, colWidths=[None, dots_column_width, page_num_width]) +toc_table.setStyle(TableStyle([ + ('GRID', (0, 0), (-1, -1), 0, colors.white), + ('LINEBELOW', (0, 0), (0, 0), 1.5, colors.black), + ('ALIGN', (0, 0), (0, -1), 'LEFT'), + ('ALIGN', (1, 0), (1, -1), 'LEFT'), + ('ALIGN', (2, 0), (2, -1), 'RIGHT'), + ('VALIGN', (0, 0), (-1, -1), 'TOP'), + ('LEFTPADDING', (0, 0), (-1, -1), 0), + ('RIGHTPADDING', (0, 0), (-1, -1), 0), + ('TOPPADDING', (0, 2), (-1, -1), 3), + ('BOTTOMPADDING', (0, 2), (-1, -1), 3), + ('TEXTCOLOR', (1, 2), (1, -1), colors.HexColor('#888888')), +])) + +story.append(toc_table) +story.append(PageBreak()) + +doc.build(story) +``` + +#### Advanced: Table of Contents with Leader Dots + +**Critical Rules for TOC with Leader Dots:** + +1. **Three-column structure**: [Title, Dots, Page Number] for leader dot style +2. **Column width strategy**: + - Title: `None` (auto-expands to content) + - Dots: Calculated width = `available_width - 200 - 50` (reserves space for title + page) + - Page number: Fixed `50pt` (enough for 3-4 digit numbers, ensures right alignment) +3. **Dynamic dot count**: `int(dots_column_width / 4.5)` for 7pt font (adjust based on font size) +4. **Dot styling**: Small font (7-8pt) and gray color (#888888) for professional look +5. **Alignment sequence**: LEFT (title) → LEFT (dots flow from title) → RIGHT (page numbers) +6. **Zero padding**: Essential for seamless visual connection between columns +7. **Indentation**: Use leading spaces in title text for hierarchy (e.g., " 1.1 Subsection") + +**MANDATORY STYLE REQUIREMENTS:** +- ✅ USE FIXED WIDTHS: Percentage-based widths are STRICTLY FORBIDDEN. You MUST use fixed values to guarantee alignment, especially for page numbers. +- ✅ DYNAMIC LEADER DOTS: Hard-coded dot counts are STRICTLY FORBIDDEN. 
You MUST calculate the number of dots dynamically based on the column width to prevent overflow or wrapping. +- ✅ MINIMUM COLUMN WIDTH: The page number column MUST be at least 40pt wide. Anything less will prevent proper right alignment. +- ✅ DOT FONT SIZE: Leader dot font size MUST NOT EXCEED 8pt. Larger sizes will ruin the dot density and are unacceptable. +- ✅ DOT ALIGNMENT: Dots MUST remain left-aligned to maintain the visual flow from the title. Right-aligning dots is forbidden. +- ✅ ZERO PADDING: Padding between columns MUST be set to exactly 0. Any gap will create a break in the dot line and is not allowed. +- ✅ USE PARAGRAPH OBJECTS: Bold text MUST be wrapped in a Paragraph() object like `Paragraph('<b>Text</b>', style)`. Using plain strings like `'<b>Text</b>'` is STRICTLY FORBIDDEN, as the markup will not render. + +#### CRITICAL: Table Cell Content Must Use Paragraph + +**ALL text content in table cells MUST be wrapped in `Paragraph()` objects.** This is essential for: +- Rendering formatting tags (e.g. `<b>`, `<i>`, `<sub>`, `<super>`) +- Proper font application +- Correct text alignment within cells +- Consistent styling across the table + +**The ONLY exception**: `Image()` objects can be placed directly in table cells without Paragraph wrapping.
+ +```python +from reportlab.platypus import Table, TableStyle, Paragraph, Image +from reportlab.lib.styles import ParagraphStyle +from reportlab.lib import colors +from reportlab.lib.enums import TA_CENTER, TA_LEFT, TA_RIGHT + +# Define cell styles (assumes 'Times New Roman' is already registered with pdfmetrics.registerFont) +header_style = ParagraphStyle( + name='TableHeader', + fontName='Times New Roman', + fontSize=11, + textColor=colors.white, + alignment=TA_CENTER +) + +cell_style = ParagraphStyle( + name='TableCell', + fontName='Times New Roman', + fontSize=10, + textColor=colors.black, + alignment=TA_CENTER +) + +# ✅ CORRECT: All text wrapped in Paragraph() +data = [ + [ + Paragraph('<b>Name</b>', header_style), + Paragraph('<b>Formula</b>', header_style), + Paragraph('<b>Value</b>', header_style) + ], + [ + Paragraph('Water', cell_style), + Paragraph('H<sub>2</sub>O', cell_style), # Subscript works + Paragraph('18.015 g/mol', cell_style) + ], + [ + Paragraph('Pressure', cell_style), + Paragraph('1.01 x 10<super>5</super> Pa', cell_style), # Superscript works + Paragraph('<b>Standard</b>', cell_style) # Bold works + ] +] + +# ❌ WRONG: Plain strings - NO formatting will render +# data = [ +# ['<b>Name</b>', '<b>Formula</b>', '<b>Value</b>'], # Bold won't work! +# ['Water', 'H<sub>2</sub>O', '18.015 g/mol'], # Subscript won't work!
+# ] + +# Image exception - Image objects go directly, no Paragraph needed +# data_with_image = [ +# [Paragraph('Logo', header_style), Paragraph('Description', header_style)], +# [Image('logo.png', width=50, height=50), Paragraph('Company logo', cell_style)], +# ] + +table = Table(data, colWidths=[100, 150, 100]) +table.setStyle(TableStyle([ + ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#1F4E79')), + ('GRID', (0, 0), (-1, -1), 0.5, colors.grey), + ('VALIGN', (0, 0), (-1, -1), 'MIDDLE'), +])) +``` + +#### Debug Tips for Layout Issues + +```python +from reportlab.platypus import HRFlowable, Spacer +from reportlab.lib.colors import red + +# Visualize spacing during development (assumes `story`, `table`, and `caption` already exist) +story.append(table) +story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0)) +story.append(Spacer(1, 6)) +story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0)) +story.append(caption) +# This creates visual markers to see actual spacing +``` + +## Complex Workflows + +### Extract Figures/Images from PDF + +#### Method 1: Using pdfimages (fastest) +```bash +# Extract all images with original quality (pdfimages does not create the output directory) +mkdir -p images +pdfimages -all document.pdf images/img +``` + +#### Method 2: Using pypdfium2 + Image Processing +```python +import pypdfium2 as pdfium +from PIL import Image +import numpy as np + +def extract_figures(pdf_path, output_dir): + pdf = pdfium.PdfDocument(pdf_path) + + for page_num, page in enumerate(pdf): + # Render high-resolution page + bitmap = page.render(scale=3.0) + img = bitmap.to_pil() + + # Convert to numpy for processing + img_array = np.array(img) + + # Simple figure detection (non-white regions) + mask = np.any(img_array != [255, 255, 255], axis=2) + + # Find contours and extract bounding boxes + # (This is simplified - real implementation would need more sophisticated detection) + + # Save detected figures + # ...
implementation depends on specific needs +``` + +### Batch PDF Processing with Error Handling +```python +import os +import glob +from pypdf import PdfReader, PdfWriter +import logging + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +def batch_process_pdfs(input_dir, operation='merge'): + # Sort so the merge order is deterministic (glob order is arbitrary) + pdf_files = sorted(glob.glob(os.path.join(input_dir, "*.pdf"))) + + if operation == 'merge': + writer = PdfWriter() + for pdf_file in pdf_files: + try: + reader = PdfReader(pdf_file) + for page in reader.pages: + writer.add_page(page) + logger.info(f"Processed: {pdf_file}") + except Exception as e: + logger.error(f"Failed to process {pdf_file}: {e}") + continue + + with open("batch_merged.pdf", "wb") as output: + writer.write(output) + + elif operation == 'extract_text': + for pdf_file in pdf_files: + try: + reader = PdfReader(pdf_file) + text = "" + for page in reader.pages: + text += page.extract_text() + + # Swap only the extension, not every ".pdf" in the path + output_file = os.path.splitext(pdf_file)[0] + '.txt' + with open(output_file, 'w', encoding='utf-8') as f: + f.write(text) + logger.info(f"Extracted text from: {pdf_file}") + + except Exception as e: + logger.error(f"Failed to extract text from {pdf_file}: {e}") + continue +``` + +### Advanced PDF Cropping +```python +from pypdf import PdfWriter, PdfReader + +reader = PdfReader("input.pdf") +writer = PdfWriter() + +# Crop page (left, bottom, right, top in points) +page = reader.pages[0] +page.mediabox.left = 50 +page.mediabox.bottom = 50 +page.mediabox.right = 550 +page.mediabox.top = 750 + +writer.add_page(page) +with open("cropped.pdf", "wb") as output: + writer.write(output) +``` + +## Performance Optimization Tips + +### 1. For Large PDFs +- Avoid accumulating every page's text or images in memory at once; write results out per page or per chunk +- Use `qpdf --split-pages` for splitting large files +- Process pages individually with pypdfium2 + +### 2.
For Text Extraction +- `pdftotext` (poppler-utils) is fastest for plain text extraction; add `-bbox-layout` only when you need block, line, and word coordinates +- Use pdfplumber for structured data and tables +- Avoid pypdf's `page.extract_text()` for very large documents + +### 3. For Image Extraction +- `pdfimages` is much faster than rendering pages +- Use low resolution for previews, high resolution for final output + +### 4. For Form Filling +- pdf-lib maintains form structure better than most alternatives +- Pre-validate form fields before processing + +### 5. Memory Management +```python +from pypdf import PdfReader, PdfWriter + +# Process PDFs in chunks: write every `chunk_size` pages to a separate file +def process_large_pdf(pdf_path, chunk_size=10): + reader = PdfReader(pdf_path) + total_pages = len(reader.pages) + + for start_idx in range(0, total_pages, chunk_size): + end_idx = min(start_idx + chunk_size, total_pages) + writer = PdfWriter() + + for i in range(start_idx, end_idx): + writer.add_page(reader.pages[i]) + + # Process chunk + with open(f"chunk_{start_idx//chunk_size}.pdf", "wb") as output: + writer.write(output) +``` + +## Troubleshooting Common Issues + +### Encrypted PDFs +```python +# Handle password-protected PDFs +from pypdf import PdfReader + +try: + reader = PdfReader("encrypted.pdf") + if reader.is_encrypted: + # decrypt() returns PasswordType.NOT_DECRYPTED (falsy) for a wrong password instead of raising + if not reader.decrypt("password"): + print("Failed to decrypt: wrong password") +except Exception as e: + print(f"Failed to decrypt: {e}") +``` + +### Corrupted PDFs +```bash +# Use qpdf to repair +qpdf --check corrupted.pdf +qpdf --replace-input corrupted.pdf +``` + +### Text Extraction Issues +```python +# Fallback to OCR for scanned PDFs +import pytesseract +from pdf2image import convert_from_path + +def extract_text_with_ocr(pdf_path): + images = convert_from_path(pdf_path) + text = "" + for image in images: + text += pytesseract.image_to_string(image) + "\n" # Separate pages with a newline + return text +``` + +## License Information + +- **pypdf**: BSD License +- **pdfplumber**: MIT License +- **pypdfium2**: Apache/BSD License +- **reportlab**: BSD License +- **poppler-utils**: GPL-2 License +- **qpdf**: Apache License +- **pdf-lib**: MIT License +- **pdfjs-dist**:
Apache License \ No newline at end of file diff --git a/skills/pdf/scripts/add_zai_metadata.py b/skills/pdf/scripts/add_zai_metadata.py new file mode 100755 index 0000000..17e07b2 --- /dev/null +++ b/skills/pdf/scripts/add_zai_metadata.py @@ -0,0 +1,172 @@ +#!/usr/bin/env python3 +""" +Add Z.ai branding metadata to PDF documents. + +This script adds Z.ai metadata (Author, Creator, Producer) to PDF files. +It can process single files or batch process multiple PDFs. +""" + +import os +import sys +import argparse +from pypdf import PdfReader, PdfWriter + + +def add_zai_metadata(input_pdf_path, output_pdf_path=None, custom_title=None, verbose=True): + """ + Add Z.ai branding metadata to a PDF document. + + Args: + input_pdf_path: Path to input PDF + output_pdf_path: Path to output PDF (default: overwrites input) + custom_title: Custom title to use (default: preserves original or uses filename) + verbose: Print status messages (default: True) + + Sets: + - Author: Z.ai + - Creator: Z.ai + - Producer: http://z.ai + - Title: Custom title, original title, or filename (in that priority) + + Returns: + Path to the output PDF file + """ + # Validate input file exists + if not os.path.exists(input_pdf_path): + print(f"Error: Input file not found: {input_pdf_path}", file=sys.stderr) + sys.exit(1) + + # Read the PDF + try: + reader = PdfReader(input_pdf_path) + except Exception as e: + print(f"Error: Cannot open PDF: {e}", file=sys.stderr) + sys.exit(1) + + writer = PdfWriter() + + # Copy all pages + for page in reader.pages: + writer.add_page(page) + + # Determine title + if custom_title: + title = custom_title + else: + original_meta = reader.metadata + if original_meta and original_meta.title and original_meta.title not in ['(anonymous)', 'unspecified', None]: + title = original_meta.title + else: + # Use filename without extension as title + title = os.path.splitext(os.path.basename(input_pdf_path))[0] + + # Add Z.ai metadata + writer.add_metadata({ + '/Title': title, + 
'/Author': 'Z.ai', + '/Creator': 'Z.ai', + '/Producer': 'http://z.ai', + }) + + # Write output + if output_pdf_path is None: + output_pdf_path = input_pdf_path + + try: + with open(output_pdf_path, "wb") as output: + writer.write(output) + except Exception as e: + print(f"Error: Cannot write output file: {e}", file=sys.stderr) + sys.exit(1) + + # Print status + if verbose: + print(f"✓ Updated metadata for: {os.path.basename(input_pdf_path)}") + print(f" Title: {title}") + print(f" Author: Z.ai") + print(f" Creator: Z.ai") + print(f" Producer: http://z.ai") + if output_pdf_path != input_pdf_path: + print(f" Output: {output_pdf_path}") + + return output_pdf_path + + +def main(): + """Command-line interface for add_zai_metadata.""" + parser = argparse.ArgumentParser( + description='Add Z.ai branding metadata to PDF documents', + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + # Add metadata to a single PDF (in-place) + %(prog)s document.pdf + + # Add metadata to a single PDF (create new file) + %(prog)s input.pdf -o output.pdf + + # Add metadata with custom title + %(prog)s report.pdf -t "Q4 Financial Analysis" + + # Batch process all PDFs in current directory + %(prog)s *.pdf + + # Quiet mode (no output) + %(prog)s document.pdf -q + """ + ) + + parser.add_argument( + 'input', + nargs='+', + help='Input PDF file(s) to process' + ) + + parser.add_argument( + '-o', '--output', + help='Output PDF path (only for single input file)' + ) + + parser.add_argument( + '-t', '--title', + help='Custom title for the PDF' + ) + + parser.add_argument( + '-q', '--quiet', + action='store_true', + help='Quiet mode (no status messages)' + ) + + args = parser.parse_args() + + # Check if output is specified for multiple files + if args.output and len(args.input) > 1: + print("Error: --output can only be used with a single input file", file=sys.stderr) + sys.exit(1) + + # Process each input file + for input_path in args.input: + # Determine output path + if 
len(args.input) == 1 and args.output: + output_path = args.output + else: + output_path = None # Overwrite in-place + + # Determine title + if args.title: + custom_title = args.title + else: + custom_title = None + + # Add metadata + add_zai_metadata( + input_path, + output_pdf_path=output_path, + custom_title=custom_title, + verbose=not args.quiet + ) + + +if __name__ == '__main__': + main() diff --git a/skills/pdf/scripts/check_bounding_boxes.py b/skills/pdf/scripts/check_bounding_boxes.py new file mode 100755 index 0000000..b8abdf5 --- /dev/null +++ b/skills/pdf/scripts/check_bounding_boxes.py @@ -0,0 +1,70 @@ +from dataclasses import dataclass +import json +import sys + + +# Script to check that the `fields.json` file that GLM creates when analyzing PDFs +# does not have overlapping bounding boxes. See forms.md. + + +@dataclass +class RectAndField: + rect: list[float] + rect_type: str + field: dict + + +# Returns a list of messages that are printed to stdout for GLM to read. +def get_bounding_box_messages(fields_json_stream) -> list[str]: + messages = [] + fields = json.load(fields_json_stream) + messages.append(f"Read {len(fields['form_fields'])} fields") + + def rects_intersect(r1, r2): + disjoint_horizontal = r1[0] >= r2[2] or r1[2] <= r2[0] + disjoint_vertical = r1[1] >= r2[3] or r1[3] <= r2[1] + return not (disjoint_horizontal or disjoint_vertical) + + rects_and_fields = [] + for f in fields["form_fields"]: + rects_and_fields.append(RectAndField(f["label_bounding_box"], "label", f)) + rects_and_fields.append(RectAndField(f["entry_bounding_box"], "entry", f)) + + has_error = False + for i, ri in enumerate(rects_and_fields): + # This is O(N^2); we can optimize if it becomes a problem. 
+ for j in range(i + 1, len(rects_and_fields)): + rj = rects_and_fields[j] + if ri.field["page_number"] == rj.field["page_number"] and rects_intersect(ri.rect, rj.rect): + has_error = True + if ri.field is rj.field: + messages.append(f"FAILURE: intersection between label and entry bounding boxes for `{ri.field['description']}` ({ri.rect}, {rj.rect})") + else: + messages.append(f"FAILURE: intersection between {ri.rect_type} bounding box for `{ri.field['description']}` ({ri.rect}) and {rj.rect_type} bounding box for `{rj.field['description']}` ({rj.rect})") + if len(messages) >= 20: + messages.append("Aborting further checks; fix bounding boxes and try again") + return messages + if ri.rect_type == "entry": + if "entry_text" in ri.field: + font_size = ri.field["entry_text"].get("font_size", 14) + entry_height = ri.rect[3] - ri.rect[1] + if entry_height < font_size: + has_error = True + messages.append(f"FAILURE: entry bounding box height ({entry_height}) for `{ri.field['description']}` is too short for the text content (font size: {font_size}). Increase the box height or decrease the font size.") + if len(messages) >= 20: + messages.append("Aborting further checks; fix bounding boxes and try again") + return messages + + if not has_error: + messages.append("SUCCESS: All bounding boxes are valid") + return messages + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: check_bounding_boxes.py [fields.json]") + sys.exit(1) + # Input file should be in the `fields.json` format described in forms.md. 
+ with open(sys.argv[1]) as f: + messages = get_bounding_box_messages(f) + for msg in messages: + print(msg) diff --git a/skills/pdf/scripts/check_bounding_boxes_test.py b/skills/pdf/scripts/check_bounding_boxes_test.py new file mode 100755 index 0000000..1dbb463 --- /dev/null +++ b/skills/pdf/scripts/check_bounding_boxes_test.py @@ -0,0 +1,226 @@ +import unittest +import json +import io +from check_bounding_boxes import get_bounding_box_messages + + +# Currently this is not run automatically in CI; it's just for documentation and manual checking. +class TestGetBoundingBoxMessages(unittest.TestCase): + + def create_json_stream(self, data): + """Helper to create a JSON stream from data""" + return io.StringIO(json.dumps(data)) + + def test_no_intersections(self): + """Test case with no bounding box intersections""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [60, 10, 150, 30] + }, + { + "description": "Email", + "page_number": 1, + "label_bounding_box": [10, 40, 50, 60], + "entry_bounding_box": [60, 40, 150, 60] + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("SUCCESS" in msg for msg in messages)) + self.assertFalse(any("FAILURE" in msg for msg in messages)) + + def test_label_entry_intersection_same_field(self): + """Test intersection between label and entry of the same field""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 60, 30], + "entry_bounding_box": [50, 10, 150, 30] # Overlaps with label + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("FAILURE" in msg and "intersection" in msg for msg in messages)) + self.assertFalse(any("SUCCESS" in msg for msg in messages)) + + def test_intersection_between_different_fields(self): + """Test intersection 
between bounding boxes of different fields""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [60, 10, 150, 30] + }, + { + "description": "Email", + "page_number": 1, + "label_bounding_box": [40, 20, 80, 40], # Overlaps with Name's boxes + "entry_bounding_box": [160, 10, 250, 30] + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("FAILURE" in msg and "intersection" in msg for msg in messages)) + self.assertFalse(any("SUCCESS" in msg for msg in messages)) + + def test_different_pages_no_intersection(self): + """Test that boxes on different pages don't count as intersecting""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [60, 10, 150, 30] + }, + { + "description": "Email", + "page_number": 2, + "label_bounding_box": [10, 10, 50, 30], # Same coordinates but different page + "entry_bounding_box": [60, 10, 150, 30] + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("SUCCESS" in msg for msg in messages)) + self.assertFalse(any("FAILURE" in msg for msg in messages)) + + def test_entry_height_too_small(self): + """Test that entry box height is checked against font size""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [60, 10, 150, 20], # Height is 10 + "entry_text": { + "font_size": 14 # Font size larger than height + } + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("FAILURE" in msg and "height" in msg for msg in messages)) + self.assertFalse(any("SUCCESS" in msg for msg in messages)) + + def test_entry_height_adequate(self): + """Test that adequate entry box 
height passes""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [60, 10, 150, 30], # Height is 20 + "entry_text": { + "font_size": 14 # Font size smaller than height + } + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("SUCCESS" in msg for msg in messages)) + self.assertFalse(any("FAILURE" in msg for msg in messages)) + + def test_default_font_size(self): + """Test that default font size is used when not specified""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [60, 10, 150, 20], # Height is 10 + "entry_text": {} # No font_size specified, should use default 14 + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("FAILURE" in msg and "height" in msg for msg in messages)) + self.assertFalse(any("SUCCESS" in msg for msg in messages)) + + def test_no_entry_text(self): + """Test that missing entry_text doesn't cause height check""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [60, 10, 150, 20] # Small height but no entry_text + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("SUCCESS" in msg for msg in messages)) + self.assertFalse(any("FAILURE" in msg for msg in messages)) + + def test_multiple_errors_limit(self): + """Test that error messages are limited to prevent excessive output""" + fields = [] + # Create many overlapping fields + for i in range(25): + fields.append({ + "description": f"Field{i}", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], # All overlap + "entry_bounding_box": [20, 15, 60, 35] # All overlap + }) + + data = 
{"form_fields": fields} + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + # Should abort after ~20 messages + self.assertTrue(any("Aborting" in msg for msg in messages)) + # Should have some FAILURE messages but not hundreds + failure_count = sum(1 for msg in messages if "FAILURE" in msg) + self.assertGreater(failure_count, 0) + self.assertLess(len(messages), 30) # Should be limited + + def test_edge_touching_boxes(self): + """Test that boxes touching at edges don't count as intersecting""" + data = { + "form_fields": [ + { + "description": "Name", + "page_number": 1, + "label_bounding_box": [10, 10, 50, 30], + "entry_bounding_box": [50, 10, 150, 30] # Touches at x=50 + } + ] + } + + stream = self.create_json_stream(data) + messages = get_bounding_box_messages(stream) + self.assertTrue(any("SUCCESS" in msg for msg in messages)) + self.assertFalse(any("FAILURE" in msg for msg in messages)) + + +if __name__ == '__main__': + unittest.main() diff --git a/skills/pdf/scripts/check_fillable_fields.py b/skills/pdf/scripts/check_fillable_fields.py new file mode 100755 index 0000000..1d8ebc1 --- /dev/null +++ b/skills/pdf/scripts/check_fillable_fields.py @@ -0,0 +1,12 @@ +import sys +from pypdf import PdfReader + + +# Script for GLM to run to determine whether a PDF has fillable form fields. See forms.md. + + +reader = PdfReader(sys.argv[1]) +if (reader.get_fields()): + print("This PDF has fillable form fields") +else: + print("This PDF does not have fillable form fields; you will need to visually determine where to enter data") diff --git a/skills/pdf/scripts/convert_pdf_to_images.py b/skills/pdf/scripts/convert_pdf_to_images.py new file mode 100755 index 0000000..f8a4ec5 --- /dev/null +++ b/skills/pdf/scripts/convert_pdf_to_images.py @@ -0,0 +1,35 @@ +import os +import sys + +from pdf2image import convert_from_path + + +# Converts each page of a PDF to a PNG image. 
+ + +def convert(pdf_path, output_dir, max_dim=1000): + images = convert_from_path(pdf_path, dpi=200) + + for i, image in enumerate(images): + # Scale image if needed to keep width/height under `max_dim` + width, height = image.size + if width > max_dim or height > max_dim: + scale_factor = min(max_dim / width, max_dim / height) + new_width = int(width * scale_factor) + new_height = int(height * scale_factor) + image = image.resize((new_width, new_height)) + + image_path = os.path.join(output_dir, f"page_{i+1}.png") + image.save(image_path) + print(f"Saved page {i+1} as {image_path} (size: {image.size})") + + print(f"Converted {len(images)} pages to PNG images") + + +if __name__ == "__main__": + if len(sys.argv) != 3: + print("Usage: convert_pdf_to_images.py [input pdf] [output directory]") + sys.exit(1) + pdf_path = sys.argv[1] + output_directory = sys.argv[2] + convert(pdf_path, output_directory) diff --git a/skills/pdf/scripts/create_validation_image.py b/skills/pdf/scripts/create_validation_image.py new file mode 100755 index 0000000..67b8fb2 --- /dev/null +++ b/skills/pdf/scripts/create_validation_image.py @@ -0,0 +1,41 @@ +import json +import sys + +from PIL import Image, ImageDraw + + +# Creates "validation" images with rectangles for the bounding box information that +# GLM creates when determining where to add text annotations in PDFs. See forms.md. + + +def create_validation_image(page_number, fields_json_path, input_path, output_path): + # Input file should be in the `fields.json` format described in forms.md. + with open(fields_json_path, 'r') as f: + data = json.load(f) + + img = Image.open(input_path) + draw = ImageDraw.Draw(img) + num_boxes = 0 + + for field in data["form_fields"]: + if field["page_number"] == page_number: + entry_box = field['entry_bounding_box'] + label_box = field['label_bounding_box'] + # Draw red rectangle over entry bounding box and blue rectangle over the label. 
+ draw.rectangle(entry_box, outline='red', width=2) + draw.rectangle(label_box, outline='blue', width=2) + num_boxes += 2 + + img.save(output_path) + print(f"Created validation image at {output_path} with {num_boxes} bounding boxes") + + +if __name__ == "__main__": + if len(sys.argv) != 5: + print("Usage: create_validation_image.py [page number] [fields.json file] [input image path] [output image path]") + sys.exit(1) + page_number = int(sys.argv[1]) + fields_json_path = sys.argv[2] + input_image_path = sys.argv[3] + output_image_path = sys.argv[4] + create_validation_image(page_number, fields_json_path, input_image_path, output_image_path) diff --git a/skills/pdf/scripts/extract_form_field_info.py b/skills/pdf/scripts/extract_form_field_info.py new file mode 100755 index 0000000..9e51ca1 --- /dev/null +++ b/skills/pdf/scripts/extract_form_field_info.py @@ -0,0 +1,152 @@ +import json +import sys + +from pypdf import PdfReader + + +# Extracts data for the fillable form fields in a PDF and outputs JSON that +# GLM uses to fill the fields. See forms.md. + + +# This matches the format used by PdfReader `get_fields` and `update_page_form_field_values` methods. +def get_full_annotation_field_id(annotation): + components = [] + while annotation: + field_name = annotation.get('/T') + if field_name: + components.append(field_name) + annotation = annotation.get('/Parent') + return ".".join(reversed(components)) if components else None + + +def make_field_dict(field, field_id): + field_dict = {"field_id": field_id} + ft = field.get('/FT') + if ft == "/Tx": + field_dict["type"] = "text" + elif ft == "/Btn": + field_dict["type"] = "checkbox" # radio groups handled separately + states = field.get("/_States_", []) + if len(states) == 2: + # "/Off" seems to always be the unchecked value, as suggested by + # https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf#page=448 + # It can be either first or second in the "/_States_" list. 
+ if "/Off" in states: + field_dict["checked_value"] = states[0] if states[0] != "/Off" else states[1] + field_dict["unchecked_value"] = "/Off" + else: + print(f"Unexpected state values for checkbox `{field_id}`. Its checked and unchecked values may not be correct; if you're trying to check it, visually verify the results.") + field_dict["checked_value"] = states[0] + field_dict["unchecked_value"] = states[1] + elif ft == "/Ch": + field_dict["type"] = "choice" + states = field.get("/_States_", []) + field_dict["choice_options"] = [{ + "value": state[0], + "text": state[1], + } for state in states] + else: + field_dict["type"] = f"unknown ({ft})" + return field_dict + + +# Returns a list of fillable PDF fields: +# [ +# { +# "field_id": "name", +# "page": 1, +# "type": ("text", "checkbox", "radio_group", or "choice") +# // Per-type additional fields described in forms.md +# }, +# ] +def get_field_info(reader: PdfReader): + # get_fields() returns None when the PDF has no form fields + fields = reader.get_fields() or {} + + field_info_by_id = {} + possible_radio_names = set() + + for field_id, field in fields.items(): + # Skip if this is a container field with children, except that it might be + # a parent group for radio button options. + if field.get("/Kids"): + if field.get("/FT") == "/Btn": + possible_radio_names.add(field_id) + continue + field_info_by_id[field_id] = make_field_dict(field, field_id) + + # Bounding rects are stored in annotations in page objects. + + # Radio button options have a separate annotation for each choice; + # all choices have the same field name.
+ # See https://westhealth.github.io/exploring-fillable-forms-with-pdfrw.html + radio_fields_by_id = {} + + for page_index, page in enumerate(reader.pages): + annotations = page.get('/Annots', []) + for ann in annotations: + field_id = get_full_annotation_field_id(ann) + if field_id in field_info_by_id: + field_info_by_id[field_id]["page"] = page_index + 1 + field_info_by_id[field_id]["rect"] = ann.get('/Rect') + elif field_id in possible_radio_names: + try: + # ann['/AP']['/N'] should have two items. One of them is '/Off', + # the other is the active value. + on_values = [v for v in ann["/AP"]["/N"] if v != "/Off"] + except KeyError: + continue + if len(on_values) == 1: + rect = ann.get("/Rect") + if field_id not in radio_fields_by_id: + radio_fields_by_id[field_id] = { + "field_id": field_id, + "type": "radio_group", + "page": page_index + 1, + "radio_options": [], + } + # Note: at least on macOS 15.7, Preview.app doesn't show selected + # radio buttons correctly. (It does if you remove the leading slash + # from the value, but that causes them not to appear correctly in + # Chrome/Firefox/Acrobat/etc). + radio_fields_by_id[field_id]["radio_options"].append({ + "value": on_values[0], + "rect": rect, + }) + + # Some PDFs have form field definitions without corresponding annotations, + # so we can't tell where they are. Ignore these fields for now. + fields_with_location = [] + for field_info in field_info_by_id.values(): + if "page" in field_info: + fields_with_location.append(field_info) + else: + print(f"Unable to determine location for field id: {field_info.get('field_id')}, ignoring") + + # Sort by page number, then Y position (flipped in PDF coordinate system), then X. 
+ def sort_key(f): + if "radio_options" in f: + rect = f["radio_options"][0]["rect"] or [0, 0, 0, 0] + else: + rect = f.get("rect") or [0, 0, 0, 0] + adjusted_position = [-rect[1], rect[0]] + return [f.get("page"), adjusted_position] + + sorted_fields = fields_with_location + list(radio_fields_by_id.values()) + sorted_fields.sort(key=sort_key) + + return sorted_fields + + +def write_field_info(pdf_path: str, json_output_path: str): + reader = PdfReader(pdf_path) + field_info = get_field_info(reader) + with open(json_output_path, "w") as f: + json.dump(field_info, f, indent=2) + print(f"Wrote {len(field_info)} fields to {json_output_path}") + + +if __name__ == "__main__": + if len(sys.argv) != 3: + print("Usage: extract_form_field_info.py [input pdf] [output json]") + sys.exit(1) + write_field_info(sys.argv[1], sys.argv[2]) diff --git a/skills/pdf/scripts/fill_fillable_fields.py b/skills/pdf/scripts/fill_fillable_fields.py new file mode 100755 index 0000000..ac35753 --- /dev/null +++ b/skills/pdf/scripts/fill_fillable_fields.py @@ -0,0 +1,114 @@ +import json +import sys + +from pypdf import PdfReader, PdfWriter + +from extract_form_field_info import get_field_info + + +# Fills fillable form fields in a PDF. See forms.md. + + +def fill_pdf_fields(input_pdf_path: str, fields_json_path: str, output_pdf_path: str): + with open(fields_json_path) as f: + fields = json.load(f) + # Group by page number. 
+    fields_by_page = {}
+    for field in fields:
+        if "value" in field:
+            field_id = field["field_id"]
+            page = field["page"]
+            if page not in fields_by_page:
+                fields_by_page[page] = {}
+            fields_by_page[page][field_id] = field["value"]
+
+    reader = PdfReader(input_pdf_path)
+
+    has_error = False
+    field_info = get_field_info(reader)
+    fields_by_ids = {f["field_id"]: f for f in field_info}
+    for field in fields:
+        existing_field = fields_by_ids.get(field["field_id"])
+        if not existing_field:
+            has_error = True
+            print(f"ERROR: `{field['field_id']}` is not a valid field ID")
+        elif field["page"] != existing_field["page"]:
+            has_error = True
+            print(f"ERROR: Incorrect page number for `{field['field_id']}` (got {field['page']}, expected {existing_field['page']})")
+        else:
+            if "value" in field:
+                err = validation_error_for_field_value(existing_field, field["value"])
+                if err:
+                    print(err)
+                    has_error = True
+    if has_error:
+        sys.exit(1)
+
+    writer = PdfWriter(clone_from=reader)
+    for page, field_values in fields_by_page.items():
+        writer.update_page_form_field_values(writer.pages[page - 1], field_values, auto_regenerate=False)
+
+    # This seems to be necessary for many PDF viewers to format the form values correctly.
+    # It may cause the viewer to show a "save changes" dialog even if the user doesn't make any changes.
+    writer.set_need_appearances_writer(True)
+
+    with open(output_pdf_path, "wb") as f:
+        writer.write(f)
+
+
+def validation_error_for_field_value(field_info, field_value):
+    field_type = field_info["type"]
+    field_id = field_info["field_id"]
+    if field_type == "checkbox":
+        checked_val = field_info["checked_value"]
+        unchecked_val = field_info["unchecked_value"]
+        if field_value != checked_val and field_value != unchecked_val:
+            return f'ERROR: Invalid value "{field_value}" for checkbox field "{field_id}". The checked value is "{checked_val}" and the unchecked value is "{unchecked_val}"'
+    elif field_type == "radio_group":
+        option_values = [opt["value"] for opt in field_info["radio_options"]]
+        if field_value not in option_values:
+            return f'ERROR: Invalid value "{field_value}" for radio group field "{field_id}". Valid values are: {option_values}'
+    elif field_type == "choice":
+        choice_values = [opt["value"] for opt in field_info["choice_options"]]
+        if field_value not in choice_values:
+            return f'ERROR: Invalid value "{field_value}" for choice field "{field_id}". Valid values are: {choice_values}'
+    return None
+
+
+# pypdf (at least version 5.7.0) has a bug when setting the value for a selection list field.
+# In _writer.py around line 966:
+#
+# if field.get(FA.FT, "/Tx") == "/Ch" and field_flags & FA.FfBits.Combo == 0:
+#     txt = "\n".join(annotation.get_inherited(FA.Opt, []))
+#
+# The problem is that for selection lists, `get_inherited` returns a list of two-element lists like
+# [["value1", "Text 1"], ["value2", "Text 2"], ...]
+# This causes `join` to throw a TypeError because it expects an iterable of strings.
+# The horrible workaround is to patch `get_inherited` to return a list of the value strings.
+# We call the original method and adjust the return value only if the argument to `get_inherited`
+# is `FA.Opt` and if the return value is a list of two-element lists.
+def monkeypatch_pypdf_method():
+    from pypdf.generic import DictionaryObject
+    from pypdf.constants import FieldDictionaryAttributes
+
+    original_get_inherited = DictionaryObject.get_inherited
+
+    def patched_get_inherited(self, key: str, default=None):
+        result = original_get_inherited(self, key, default)
+        if key == FieldDictionaryAttributes.Opt:
+            if isinstance(result, list) and all(isinstance(v, list) and len(v) == 2 for v in result):
+                result = [r[0] for r in result]
+        return result
+
+    DictionaryObject.get_inherited = patched_get_inherited
+
+
+if __name__ == "__main__":
+    if len(sys.argv) != 4:
+        print("Usage: fill_fillable_fields.py [input pdf] [field_values.json] [output pdf]")
+        sys.exit(1)
+    monkeypatch_pypdf_method()
+    input_pdf = sys.argv[1]
+    fields_json = sys.argv[2]
+    output_pdf = sys.argv[3]
+    fill_pdf_fields(input_pdf, fields_json, output_pdf)
diff --git a/skills/pdf/scripts/fill_pdf_form_with_annotations.py b/skills/pdf/scripts/fill_pdf_form_with_annotations.py
new file mode 100755
index 0000000..f980531
--- /dev/null
+++ b/skills/pdf/scripts/fill_pdf_form_with_annotations.py
@@ -0,0 +1,108 @@
+import json
+import sys
+
+from pypdf import PdfReader, PdfWriter
+from pypdf.annotations import FreeText
+
+
+# Fills a PDF by adding text annotations defined in `fields.json`. See forms.md.
+
+
+def transform_coordinates(bbox, image_width, image_height, pdf_width, pdf_height):
+    """Transform bounding box from image coordinates to PDF coordinates"""
+    # Image coordinates: origin at top-left, y increases downward
+    # PDF coordinates: origin at bottom-left, y increases upward
+    x_scale = pdf_width / image_width
+    y_scale = pdf_height / image_height
+
+    left = bbox[0] * x_scale
+    right = bbox[2] * x_scale
+
+    # Flip Y coordinates for PDF
+    top = pdf_height - (bbox[1] * y_scale)
+    bottom = pdf_height - (bbox[3] * y_scale)
+
+    return left, bottom, right, top
+
+
+def fill_pdf_form(input_pdf_path, fields_json_path, output_pdf_path):
+    """Fill the PDF form with data from fields.json"""
+
+    # `fields.json` format described in forms.md.
+    with open(fields_json_path, "r") as f:
+        fields_data = json.load(f)
+
+    # Open the PDF
+    reader = PdfReader(input_pdf_path)
+    writer = PdfWriter()
+
+    # Copy all pages to writer
+    writer.append(reader)
+
+    # Get PDF dimensions for each page
+    pdf_dimensions = {}
+    for i, page in enumerate(reader.pages):
+        mediabox = page.mediabox
+        pdf_dimensions[i + 1] = [mediabox.width, mediabox.height]
+
+    # Process each form field
+    annotations = []
+    for field in fields_data["form_fields"]:
+        page_num = field["page_number"]
+
+        # Get page dimensions and transform coordinates.
+        page_info = next(p for p in fields_data["pages"] if p["page_number"] == page_num)
+        image_width = page_info["image_width"]
+        image_height = page_info["image_height"]
+        pdf_width, pdf_height = pdf_dimensions[page_num]
+
+        transformed_entry_box = transform_coordinates(
+            field["entry_bounding_box"],
+            image_width, image_height,
+            pdf_width, pdf_height
+        )
+
+        # Skip empty fields
+        if "entry_text" not in field or "text" not in field["entry_text"]:
+            continue
+        entry_text = field["entry_text"]
+        text = entry_text["text"]
+        if not text:
+            continue
+
+        font_name = entry_text.get("font", "Arial")
+        font_size = str(entry_text.get("font_size", 14)) + "pt"
+        font_color = entry_text.get("font_color", "000000")
+
+        # Font size/color seems to not work reliably across viewers:
+        # https://github.com/py-pdf/pypdf/issues/2084
+        annotation = FreeText(
+            text=text,
+            rect=transformed_entry_box,
+            font=font_name,
+            font_size=font_size,
+            font_color=font_color,
+            border_color=None,
+            background_color=None,
+        )
+        annotations.append(annotation)
+        # page_number is 0-based for pypdf
+        writer.add_annotation(page_number=page_num - 1, annotation=annotation)
+
+    # Save the filled PDF
+    with open(output_pdf_path, "wb") as output:
+        writer.write(output)
+
+    print(f"Successfully filled PDF form and saved to {output_pdf_path}")
+    print(f"Added {len(annotations)} text annotations")
+
+
+if __name__ == "__main__":
+    if len(sys.argv) != 4:
+        print("Usage: fill_pdf_form_with_annotations.py [input pdf] [fields.json] [output pdf]")
+        sys.exit(1)
+    input_pdf = sys.argv[1]
+    fields_json = sys.argv[2]
+    output_pdf = sys.argv[3]
+
+    fill_pdf_form(input_pdf, fields_json, output_pdf)
\ No newline at end of file
diff --git a/skills/pdf/scripts/sanitize_code.py b/skills/pdf/scripts/sanitize_code.py
new file mode 100755
index 0000000..652ed41
--- /dev/null
+++ b/skills/pdf/scripts/sanitize_code.py
@@ -0,0 +1,110 @@
+import re
+import html
+import sys
+from typing import Dict
+
+# ---------- Step 0: restore literal unicode escapes/entities to real chars ----------
+_RE_UNICODE_ESC = re.compile(r"(\\u[0-9a-fA-F]{4})|(\\U[0-9a-fA-F]{8})|(\\x[0-9a-fA-F]{2})")
+
+def _restore_escapes(s: str) -> str:
+    # HTML entities: ³ ≤ α ...
+    s = html.unescape(s)
+
+    # Literal backslash escapes: "\\u00B3" -> "³"
+    def _dec(m: re.Match) -> str:
+        esc = m.group(0)
+        try:
+            if esc.startswith("\\u") or esc.startswith("\\U"):
+                return chr(int(esc[2:], 16))
+            if esc.startswith("\\x"):
+                return chr(int(esc[2:], 16))
+        except Exception:
+            return esc
+        return esc
+
+    return _RE_UNICODE_ESC.sub(_dec, s)
+
+# ---------- Step 1: superscripts/subscripts -> / ----------
+_SUPERSCRIPT_MAP: Dict[str, str] = {
+    "⁰": "0", "¹": "1", "²": "2", "³": "3", "⁴": "4",
+    "⁵": "5", "⁶": "6", "⁷": "7", "⁸": "8", "⁹": "9",
+    "⁺": "+", "⁻": "-", "⁼": "=", "⁽": "(", "⁾": ")",
+    "ⁿ": "n", "ᶦ": "i",
+}
+
+_SUBSCRIPT_MAP: Dict[str, str] = {
+    "₀": "0", "₁": "1", "₂": "2", "₃": "3", "₄": "4",
+    "₅": "5", "₆": "6", "₇": "7", "₈": "8", "₉": "9",
+    "₊": "+", "₋": "-", "₌": "=", "₍": "(", "₎": ")",
+    "ₐ": "a", "ₑ": "e", "ₕ": "h", "ᵢ": "i", "ⱼ": "j",
+    "ₖ": "k", "ₗ": "l", "ₘ": "m", "ₙ": "n", "ₒ": "o",
+    "ₚ": "p", "ᵣ": "r", "ₛ": "s", "ₜ": "t", "ᵤ": "u",
+    "ᵥ": "v", "ₓ": "x",
+}
+
+def _replace_super_sub(s: str) -> str:
+    out = []
+    for ch in s:
+        if ch in _SUPERSCRIPT_MAP:
+            out.append(f"{_SUPERSCRIPT_MAP[ch]}")
+        elif ch in _SUBSCRIPT_MAP:
+            out.append(f"{_SUBSCRIPT_MAP[ch]}")
+        else:
+            out.append(ch)
+    return "".join(out)
+
+# ---------- Step 2: symbol fallback for SimHei (protect tags, then replace) ----------
+_SYMBOL_FALLBACK: Dict[str, str] = {
+    # Currently empty - enable entries as needed for fonts missing specific glyphs
+    # "±": "+/-",
+    # "×": "*",
+    # "÷": "/",
+    # "≤": "<=",
+    # "≥": ">=",
+    # "≠": "!=",
+    # "≈": "~=",
+    # "∞": "inf",
+}
+
+def _fallback_symbols(s: str) -> str:
+    # Protect / tags from being modified
+    placeholders = {}
+    def _protect_tag(m: re.Match) -> str:
+        key = f"@@TAG{len(placeholders)}@@"
+        placeholders[key] = m.group(0)
+        return key
+
+    protected = re.sub(r"|", _protect_tag, s)
+
+    # Replace symbols
+    protected = "".join(_SYMBOL_FALLBACK.get(ch, ch) for ch in protected)
+
+    # Restore tags
+    for k, v in placeholders.items():
+        protected = protected.replace(k, v)
+
+    return protected
+
+def sanitize_code(text: str) -> str:
+    """
+    Full sanitization pipeline for PDF generation code.
+    - Restore unicode escapes/entities to real characters
+    - Replace superscript/subscript unicode with /
+    - Replace other risky symbols with ASCII/text fallbacks
+    """
+    s = _restore_escapes(text)
+    s = _replace_super_sub(s)
+    s = _fallback_symbols(s)
+    return s
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print("Usage: python sanitize_code.py <file>")
+        sys.exit(1)
+    target = sys.argv[1]
+    with open(target, "r", encoding="utf-8") as f:
+        code = f.read()
+    sanitized = sanitize_code(code)
+    with open(target, "w", encoding="utf-8") as f:
+        f.write(sanitized)
+    print(f"Sanitized: {target}")
\ No newline at end of file
diff --git a/skills/podcast-generate/LICENSE.txt b/skills/podcast-generate/LICENSE.txt
new file mode 100755
index 0000000..1e54539
--- /dev/null
+++ b/skills/podcast-generate/LICENSE.txt
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 z-ai-web-dev-sdk Skills
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/skills/podcast-generate/SKILL.md b/skills/podcast-generate/SKILL.md
new file mode 100755
index 0000000..a7f89c0
--- /dev/null
+++ b/skills/podcast-generate/SKILL.md
@@ -0,0 +1,198 @@
+---
+name: Podcast Generate
+description: Generate podcast episodes from user-provided content or by searching the web for specified topics. If user uploads a text file/article, creates a dual-host dialogue podcast (or single-host upon request). If no content is provided, searches the web for information about the user-specified topic and generates a podcast. Duration scales with content size (3-20 minutes, ~240 chars/min). Uses z-ai-web-dev-sdk for LLM script generation and TTS audio synthesis. Outputs both a podcast script (Markdown) and a complete audio file (WAV).
+license: MIT
+---
+
+# Podcast Generate Skill (TypeScript version)
+
+Automatically generates a podcast script and audio from user-provided material or from web search results.
+
+This Skill is suited for:
+- Quickly digesting long-form content and turning it into a podcast
+- Presenting knowledge-oriented content as audio
+- In-depth analysis and discussion of trending topics
+- Searching for up-to-date information and producing a podcast from it
+
+---
+
+## Capabilities
+
+### What this Skill can do
+- **Generate from a file**: takes one piece of material (a text format such as txt/md/docx/pdf) and generates a dialogue podcast script and audio
+- **Generate from a web search**: searches the web for the latest information on a user-specified topic and generates a podcast script and audio
+- Controls the duration automatically, scaling it to the content length (3-20 minutes)
+- Generates the podcast script in Markdown (editable by hand)
+- Synthesizes high-quality audio with z-ai TTS and joins it into the final podcast
+
+### What this Skill does not currently do
+- No mp3 output, subtitles, or timestamps
+- No podcasts with three or more speakers
+- No background music or sound effects
+
+---
+
+## Files and responsibilities
+
+This Skill consists of the following files:
+
+- `generate.ts`
+  Unified entry point (supports file mode and search mode)
+  - **File mode**: reads the user's uploaded text file → generates the podcast
+  - **Search mode**: calls the web-search skill to gather material → generates the podcast
+  - Uses z-ai-web-dev-sdk for LLM script generation
+  - Uses z-ai-web-dev-sdk for TTS audio generation
+  - Joins the audio segments automatically
+  - Outputs only the final files
+
+- `readme.md`
+  Usage documentation
+
+- `SKILL.md`
+  This file; describes the Skill's capabilities, limits, and usage conventions
+
+- `package.json`
+  Node.js project configuration and dependencies
+
+- `tsconfig.json`
+  TypeScript compiler configuration
+
+---
+
+## Input and output conventions
+
+### Input (choose one)
+
+**Option 1: File upload**
+- One material file (a text format such as txt / md / docx / pdf)
+- No length limit; the Skill automatically condenses the material to a suitable length
+
+**Option 2: Web search**
+- The user specifies a search topic
+- The web-search skill is called automatically to fetch related content
+- Multiple search results are combined into the source material
+
+### Output (exactly 2 files)
+
+- `podcast_script.md`
+  The podcast script (Markdown, editable by hand)
+
+- `podcast.wav`
+  The final, fully joined podcast audio
+
+**No intermediate files are output** (such as segments.jsonl or meta.json)
+
+---
+
+## How to run
+
+### Requirements
+- Node.js 18+
+- z-ai-web-dev-sdk (preinstalled)
+- web-search skill (for web search mode)
+
+The z-ai CLI is **not** required
+
+### Install dependencies
+```bash
+npm install
+```
+
+---
+
+## Usage examples
+
+### Generate a podcast from a file
+
+```bash
+npm run generate -- --input=test_data/material.txt --out_dir=out
+```
+
+### Generate a podcast from a web search
+
+```bash
+# Search a topic and generate a podcast
+npm run generate -- --topic="latest AI breakthroughs" --out_dir=out
+
+# Specify both the search topic and the duration
+npm run generate -- --topic="quantum computing use cases" --out_dir=out --duration=8
+
+# Search and generate a single-host podcast
+npm run generate -- --topic="impact of climate change" --out_dir=out --mode=single-male
+```
+
+---
+
+## Parameters
+
+| Parameter | Description | Default |
+|------|------|--------|
+| `--input` | Path to the input material file (use either this or --topic) | - |
+| `--topic` | Search topic keywords (use either this or --input) | - |
+| `--out_dir` | Output directory (required) | - |
+| `--mode` | Podcast mode: dual / single-male / single-female | dual |
+| `--duration` | Length in minutes, set manually (3-20); 0 means automatic | 0 |
+| `--host_name` | Host / presenter name | 小谱 |
+| `--guest_name` | Guest name | 锤锤 |
+| `--voice_host` | Host voice | xiaochen |
+| `--voice_guest` | Guest voice | chuichui |
+| `--speed` | Speech rate (0.5-2.0) | 1.0 |
+| `--pause_ms` | Pause between segments, in milliseconds | 200 |
+
+---
+
+## Available voices
+
+| Voice | Character |
+|------|------|
+| xiaochen | Calm and professional |
+| chuichui | Lively and cute |
+| tongtong | Warm and friendly |
+| jam | British-accented gentleman |
+| kazi | Clear and standard |
+| douji | Natural and fluent |
+| luodo | Expressive and engaging |
+
+---
+
+## Technical architecture
+
+### generate.ts (unified entry point)
+- **File mode**: reads the uploaded file → generates the podcast
+- **Search mode**: calls the web-search skill → gathers material → generates the podcast
+- **LLM**: uses `z-ai-web-dev-sdk` (`chat.completions.create`)
+- **TTS**: uses `z-ai-web-dev-sdk` (`audio.tts.create`)
+- The z-ai CLI is **not** required
+- Joins audio segments automatically
+- Outputs only the final files; intermediate files are cleaned up automatically
+
+### LLM calls
+- System prompt: podcast scriptwriter role
+- User prompt: material + hard constraints + natural-pacing ("breathing room") requirements
+- Output validation: character count, structure, speaker tags
+- Automatic retries: up to 3 attempts
+
+### TTS calls
+- Uses `zai.audio.tts.create()`
+- Supports a custom voice and speech rate
+- Joins multiple wav segments automatically
+- Temporary files are cleaned up automatically
+
+---
+
+## Output example
+
+### podcast_script.md (excerpt)
+```markdown
+**小谱**:Hi everyone, welcome to today's episode. Today we're going to talk about an interesting topic...
+
+**锤锤**:Right, this topic really is interesting. I've been following it lately too...
+
+**小谱**:Speaking of which, let me give everyone an example...
+```
+
+---
+
+## License
+
+MIT
diff --git a/skills/podcast-generate/generate.ts b/skills/podcast-generate/generate.ts
new file mode 100755
index 0000000..c7b5844
--- /dev/null
+++ b/skills/podcast-generate/generate.ts
@@ -0,0 +1,661 @@
+#!/usr/bin/env tsx
+/**
+ * generate.ts - unified entry point (pure SDK version)
+ * source material -> podcast_script.md + podcast.wav
+ *
+ * Uses only z-ai-web-dev-sdk; does not depend on the z-ai CLI.
+ *
+ * Usage:
+ *   tsx generate.ts --input=material.txt --out_dir=out
+ *   tsx generate.ts --input=material.md --out_dir=out --duration=5
+ */
+
+import ZAI from 'z-ai-web-dev-sdk';
+import fs from 'fs';
+import path from 'path';
+import { fileURLToPath } from 'url';
+import os from 'os';
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+
+// -----------------------------
+// Types
+// -----------------------------
+interface GenConfig {
+  mode: 'dual' | 'single-male' | 'single-female';
+  temperature: number;
+  durationManual: number;
+  charsPerMin: number;
+  hostName: string;
+  guestName: string;
+  audience: string;
+  tone: string;
+  maxAttempts: number;
+  timeoutSec: number;
+  voiceHost: string;
+  voiceGuest: string;
+  speed: number;
+  pauseMs: number;
+}
+
+interface Segment {
+  idx: number;
+  speaker: 'host' | 'guest';
+  name: string;
+  text: string;
+}
+
+// -----------------------------
+// Config
+// -----------------------------
+const DEFAULT_CONFIG: GenConfig = {
+  mode: 'dual',
+  temperature: 0.9,
+  durationManual: 0,
+  charsPerMin: 240,
+  hostName: '小谱',
+  guestName: '锤锤',
+  audience: '白领小白', // target audience: office workers new to the topic
+  tone: '轻松但有信息密度', // tone: relaxed but information-dense
+  maxAttempts: 3,
+  timeoutSec: 300,
+  voiceHost: 'xiaochen',
+  voiceGuest: 'chuichui',
+  speed: 1.0,
+  pauseMs: 200,
+};
+
+const DURATION_RANGE_LOW = 3;
+const DURATION_RANGE_HIGH = 20;
+const BUDGET_TOLERANCE = 0.15;
+
+// -----------------------------
+// Functions
+// -----------------------------
+
+function parseArgs(): { [key: string]: any } {
+  const args = process.argv.slice(2);
+  const result: { [key: string]: any } = {};
+
+  for (let i = 0; i < args.length; i++) {
+    const arg = args[i];
+    if (arg.startsWith('--')) {
+      const key = arg.slice(2);
+      if (key.includes('=')) {
+        const eq = key.indexOf('='); // split on the first '=' only, so values may contain '='
+        result[key.slice(0, eq)] = key.slice(eq + 1);
+      } else if (i + 1 < args.length && !args[i + 1].startsWith('--')) {
+        result[key] = args[i + 1];
+        i++;
+      } else {
+        result[key] = true;
+      }
+    }
+  }
+
+  return result;
+}
+
+function readText(filePath: string): string {
+  let content = fs.readFileSync(filePath, 'utf-8');
+  content = content.replace(/\r\n/g, '\n');
+  content = content.replace(/\n{3,}/g, '\n\n');
+  content = content.replace(/[ \t]{2,}/g, ' ');
+  content = content.replace(/-\n/g, '');
+  return content.trim();
+}
+
+function countNonWsChars(text: string): number {
+  return text.replace(/\s+/g, '').length;
+}
+
+function chooseDurationMinutes(inputChars: number, low: number = DURATION_RANGE_LOW, high: number = DURATION_RANGE_HIGH): number {
+  const estimated = Math.max(low, Math.min(high, Math.floor(inputChars / 1000)));
+  return estimated;
+}
+
+function charBudget(durationMin: number, charsPerMin: number, tolerance: number): [number, number, number] {
+  const target = durationMin * charsPerMin;
+  const low = Math.floor(target * (1 - tolerance));
+  const high = Math.ceil(target * (1 + tolerance));
+  return [target, low, high];
+}
+
+function buildPrompts(
+  material: string,
+  cfg: GenConfig,
+  durationMin: number,
+  budgetTarget: number,
+  budgetLow: number,
+  budgetHigh: number,
+  attemptHint: string = ''
+): [string, string] {
+  let system: string;
+  let user: string;
+
+  if (cfg.mode === 'dual') {
+    system = (
+      `你是一个播客脚本编剧,擅长把资料提炼成双人对谈播客。` +
+      `角色固定为男主持「${cfg.hostName}」与女嘉宾「${cfg.guestName}」。` +
+      `你写作口播化、信息密度适中、有呼吸感、节奏自然。` +
+      `你必须严格遵守输出格式与字数预算。`
+    );
+
+    const hintBlock = attemptHint ? `\n【上一次生成纠偏提示】\n${attemptHint}\n` : '';
+
+    user = `请把下面【资料】改写为中文播客脚本,形式为双人对谈(男主持 ${cfg.hostName} + 女嘉宾 ${cfg.guestName})。
+时长目标:${durationMin} 分钟。
+
+【硬性约束】
+1) 总字数必须在 ${budgetLow} 到 ${budgetHigh} 字之间(目标约 ${budgetTarget} 字)。
+2) 严格使用轮次交替输出:每段必须以"**${cfg.hostName}**:"或"**${cfg.guestName}**:"开头。
+3) 必须包含完整的叙事结构(但不要在对话中写出结构标签):
+   - 开场:Hook 引入 + 本期主题介绍
+   - 主体:3个不同维度的内容,用自然过渡语连接
+   - 总结:回顾要点 + 行动建议(1句话,明确可执行)
+4) 不要在对话中写"核心点1"、"第一点"等结构标签,用自然的过渡语如"说到这个"、"还有个有趣的事"、"另外"等
+5) 不要照念原文,不要大段引用;要用口播化表达。
+6) 受众:${cfg.audience}
+7) 风格:${cfg.tone}
+
+【呼吸感与自然对话 - 重要!】
+为了营造真实播客的呼吸感,请:
+1) 适度加入语气词和感叹词:嗯、哦、啊、对、没错、哈哈、哇、天呐、啧啧等
+2) 多用互动式表达:"你说得对"、"这就很有意思了"、"等等,让我想想"、"我懂你的意思"
+3) 适当加入思考和停顿的暗示:"这个问题嘛..."、"怎么说呢..."、"其实..."
+4) 避免过于密集的信息输出,每段控制在3-5句话,给听众消化时间
+5) 用类比和生活化的例子来解释复杂概念
+6) 两人之间要有自然的呼应和追问,而不是各说各话
+7) 不同主题之间用自然过渡语连接,不要出现"核心点1/2/3"等标签
+
+【输出格式示例】
+**${cfg.hostName}**:开场……
+**${cfg.guestName}**:回应……
+(一直交替到结束)
+
+${hintBlock}
+【资料】
+${material}
+`;
+  } else {
+    const speakerName = cfg.mode === 'single-male' ? cfg.hostName : cfg.guestName;
+    const gender = cfg.mode === 'single-male' ? '男性' : '女性';
+
+    system = (
+      `你是一个${gender}单人播客主播,名字叫「${speakerName}」。` +
+      `你擅长把资料提炼成单人独白式播客,像讲课、读书分享、知识科普一样。` +
+      `你写作口播化、信息密度适中、有呼吸感、节奏自然。` +
+      `你必须严格遵守输出格式与字数预算。`
+    );
+
+    const hintBlock = attemptHint ? `\n【上一次生成纠偏提示】\n${attemptHint}\n` : '';
+
+    user = `请把下面【资料】改写为中文单人播客脚本,形式为独白式讲述(主播:${speakerName})。
+时长目标:${durationMin} 分钟。
+
+【硬性约束】
+1) 总字数必须在 ${budgetLow} 到 ${budgetHigh} 字之间(目标约 ${budgetTarget} 字)。
+2) 所有内容均由「${speakerName}」一人讲述,每段都以"**${speakerName}**:"开头。
+3) 必须包含完整的叙事结构(但不要在对话中写出结构标签):
+   - 开场:Hook 引入 + 本期主题介绍
+   - 主体:3个不同维度的内容,用自然过渡语连接
+   - 总结:回顾要点 + 行动建议(1句话,明确可执行)
+4) 不要在对话中写"核心点1"、"第一点"等结构标签,用自然的过渡语如"说到这个"、"还有个有趣的事"、"另外"等
+5) 不要照念原文,不要大段引用;要用口播化表达。
+6) 受众:${cfg.audience}
+7) 风格:${cfg.tone}
+
+【单人播客的呼吸感 - 重要!】
+为了营造自然的单人播客呼吸感,请:
+1) 适度加入语气词和感叹词:嗯、哦、啊、对、没错、哈哈、哇、天呐、啧啧等
+2) 多用自问自答式表达:"你可能会问...答案是..."、"这是为什么呢?让我来解释..."
+3) 适当加入思考和停顿的暗示:"这个问题嘛..."、"怎么说呢..."、"其实..."
+4) 避免过于密集的信息输出,每段控制在3-5句话,给听众消化时间
+5) 用类比和生活化的例子来解释复杂概念
+6) 像在和朋友聊天一样,而不是在念课文
+
+【输出格式示例】
+**${speakerName}**:开场,大家好,我是${speakerName},今天我们来聊……
+**${speakerName}**:说到这个,最近有个特别有意思的事……
+(所有内容都由${speakerName}讲述,分段输出)
+
+${hintBlock}
+【资料】
+${material}
+`;
+  }
+
+  return [system, user];
+}
+
+async function callZAI(
+  systemPrompt: string,
+  userPrompt: string,
+  temperature: number // currently not forwarded to the API
+): Promise<string> {
+  const zai = await ZAI.create();
+
+  const completion = await zai.chat.completions.create({
+    messages: [
+      { role: 'assistant', content: systemPrompt },
+      { role: 'user', content: userPrompt },
+    ],
+    thinking: { type: 'disabled' },
+  });
+
+  const content = completion.choices[0]?.message?.content || '';
+  return content;
+}
+
+function scriptToSegments(script: string, hostName: string, guestName: string): Segment[] {
+  const segments: Segment[] = [];
+  const lines = script.split('\n');
+
+  let current: Segment | null = null;
+  let idx = 0;
+
+  const hostPrefix = `**${hostName}**:`;
+  const guestPrefix = `**${guestName}**:`;
+
+  for (const rawLine of lines) {
+    const line = rawLine.trim();
+    if (!line) continue;
+
+    if (line.startsWith(hostPrefix)) {
+      idx++;
+      current = {
+        idx,
+        speaker: 'host',
+        name: hostName,
+        text: line.slice(hostPrefix.length).trim(),
+      };
+      segments.push(current);
+    } else if (line.startsWith(guestPrefix)) {
+      idx++;
+      current = {
+        idx,
+        speaker: 'guest',
+        name: guestName,
+        text: line.slice(guestPrefix.length).trim(),
+      };
+      segments.push(current);
+    } else {
+      if (current) {
+        current.text = (current.text + ' ' + line).trim();
+      }
+    }
+  }
+
+  return segments;
+}
+
+function validateScript(
+  script: string,
+  cfg: GenConfig,
+  budgetLow: number,
+  budgetHigh: number
+): [boolean, string[]] {
+  const reasons: string[] = [];
+
+  if (cfg.mode === 'dual') {
+    const hostTag = `**${cfg.hostName}**:`;
+    const guestTag = `**${cfg.guestName}**:`;
+
+    if (!script.includes(hostTag)) reasons.push(`缺少主持人标识:${hostTag}`);
+    if (!script.includes(guestTag)) reasons.push(`缺少嘉宾标识:${guestTag}`);
+
+    const turns = script.split('\n').filter(line =>
+      line.startsWith(hostTag) || line.startsWith(guestTag)
+    );
+    if (turns.length < 8) reasons.push('对谈轮次过少:建议至少 8 轮');
+  } else {
+    const speakerName = cfg.mode === 'single-male' ? cfg.hostName : cfg.guestName;
+    const speakerTag = `**${speakerName}**:`;
+
+    if (!script.includes(speakerTag)) reasons.push(`缺少主播标识:${speakerTag}`);
+
+    const turns = script.split('\n').filter(line => line.startsWith(speakerTag));
+    if (turns.length < 5) reasons.push('播客段数过少:建议至少 5 段');
+  }
+
+  const n = countNonWsChars(script);
+  if (n < budgetLow || n > budgetHigh) {
+    reasons.push(`字数不在预算:当前约 ${n} 字,预算 ${budgetLow}-${budgetHigh}`);
+  }
+
+  // Only check for the opening (开场) and summary (总结); "核心点1/2/3" (key point 1/2/3) labels are not checked because they should not appear in the dialogue
+  const mustHave = ['开场', '总结'];
+  for (const kw of mustHave) {
+    if (!script.includes(kw)) {
+      reasons.push(`缺少结构要素:${kw}(请在对话中自然引入)`);
+    }
+  }
+
+  // Check that there are enough dialogue turns (so the content covers several topics)
+  const lineCount = script.split('\n').filter(l => l.trim()).length;
+  if (lineCount < 10) {
+    reasons.push('对话轮次过少,建议至少10段对话');
+  }
+
+  return [reasons.length === 0, reasons];
+}
+
+function makeRetryHint(reasons: string[], cfg: GenConfig, budgetLow: number, budgetHigh: number): string {
+  const lines = ['请严格修复以下问题后重新生成:'];
+  for (const r of reasons) lines.push(`- ${r}`);
+  lines.push(`- 总字数必须在 ${budgetLow}-${budgetHigh} 之间。`);
+
+  if (cfg.mode === 'dual') {
+    lines.push(`- 每段必须以"**${cfg.hostName}**:"或"**${cfg.guestName}**:"开头。`);
+  } else {
+    const speakerName = cfg.mode === 'single-male' ? cfg.hostName : cfg.guestName;
+    lines.push(`- 所有内容都由一人讲述,每段必须以"**${speakerName}**:"开头。`);
+  }
+
+  lines.push('- 必须包含开场和总结,中间用自然过渡语连接不同主题,不要出现"核心点1/2/3"等标签。');
+  return lines.join('\n');
+}
+
+async function ttsRequest(
+  zai: any,
+  text: string,
+  voice: string,
+  speed: number
+): Promise<Buffer> {
+  const response = await zai.audio.tts.create({
+    input: text,
+    voice: voice,
+    speed: speed,
+    response_format: 'wav',
+    stream: false,
+  });
+
+  const arrayBuffer = await response.arrayBuffer();
+  const buffer = Buffer.from(new Uint8Array(arrayBuffer));
+  return buffer;
+}
+
+function ensureSilenceWav(filePath: string, params: { nchannels: number; sampwidth: number; framerate: number }, ms: number): void {
+  const { nchannels, sampwidth, framerate } = params;
+  const nframes = Math.floor((framerate * ms) / 1000);
+  const silenceFrame = Buffer.alloc(sampwidth * nchannels, 0);
+  const frames = Buffer.alloc(silenceFrame.length * nframes, 0);
+
+  const header = Buffer.alloc(44);
+  header.write('RIFF', 0);
+  header.writeUInt32LE(36 + frames.length, 4);
+  header.write('WAVE', 8);
+  header.write('fmt ', 12);
+  header.writeUInt32LE(16, 16);
+  header.writeUInt16LE(1, 20);
+  header.writeUInt16LE(nchannels, 22);
+  header.writeUInt32LE(framerate, 24);
+  header.writeUInt32LE(framerate * nchannels * sampwidth, 28);
+  header.writeUInt16LE(nchannels * sampwidth, 32);
+  header.writeUInt16LE(sampwidth * 8, 34);
+  header.write('data', 36);
+  header.writeUInt32LE(frames.length, 40);
+
+  fs.writeFileSync(filePath, Buffer.concat([header, frames]));
+}
+
+function wavParams(filePath: string): { nchannels: number; sampwidth: number; framerate: number } {
+  const buffer = fs.readFileSync(filePath);
+  const nchannels = buffer.readUInt16LE(22);
+  const sampwidth = buffer.readUInt16LE(34) / 8;
+  const framerate = buffer.readUInt32LE(24);
+  return { nchannels, sampwidth, framerate };
+}
+
+function joinWavsWave(outPath: string, wavPaths: string[], pauseMs: number): void {
+  if (wavPaths.length === 0) throw new Error('No wav files to join.');
+
+  const ref = wavPaths[0];
+  const refParams = wavParams(ref);
+  const silencePath = path.join(os.tmpdir(), `_silence_${Date.now()}.wav`);
+  if (pauseMs > 0) ensureSilenceWav(silencePath, refParams, pauseMs);
+
+  const chunks: Buffer[] = [];
+
+  for (let i = 0; i < wavPaths.length; i++) {
+    const wavPath = wavPaths[i];
+    const buffer = fs.readFileSync(wavPath);
+    const dataStart = buffer.indexOf('data') + 8;
+    const data = buffer.subarray(dataStart);
+
+    const params = wavParams(wavPath);
+    if (params.nchannels !== refParams.nchannels ||
+        params.sampwidth !== refParams.sampwidth ||
+        params.framerate !== refParams.framerate) {
+      throw new Error(`WAV params mismatch: ${wavPath}`);
+    }
+
+    chunks.push(data);
+
+    if (pauseMs > 0 && i < wavPaths.length - 1) {
+      const silenceBuffer = fs.readFileSync(silencePath);
+      const silenceData = silenceBuffer.subarray(silenceBuffer.indexOf('data') + 8);
+      chunks.push(silenceData);
+    }
+  }
+
+  const totalDataSize = chunks.reduce((sum, buf) => sum + buf.length, 0);
+  const header = Buffer.alloc(44);
+  header.write('RIFF', 0);
+  header.writeUInt32LE(36 + totalDataSize, 4);
+  header.write('WAVE', 8);
+  header.write('fmt ', 12);
+  header.writeUInt32LE(16, 16);
+  header.writeUInt16LE(1, 20);
+  header.writeUInt16LE(refParams.nchannels, 22);
+  header.writeUInt32LE(refParams.framerate, 24);
+  header.writeUInt32LE(refParams.framerate * refParams.nchannels * refParams.sampwidth, 28);
+  header.writeUInt16LE(refParams.nchannels * refParams.sampwidth, 32);
+  header.writeUInt16LE(refParams.sampwidth * 8, 34);
+  header.write('data', 36);
+  header.writeUInt32LE(totalDataSize, 40);
+
+  const output = Buffer.concat([header, ...chunks]);
+  fs.writeFileSync(outPath, output);
+
+  if (fs.existsSync(silencePath)) fs.unlinkSync(silencePath);
+}
+
+// -----------------------------
+// Main
+// -----------------------------
+async function main() {
+  const args = parseArgs();
+
+  const inputPath = args.input;
+  const outDir = args.out_dir;
+  const topic = args.topic;
+
+  // Validate arguments: either --input or --topic must be provided
+  if ((!inputPath && !topic) || !outDir) {
+    console.error('Usage: tsx generate.ts --input=<file> --out_dir=<dir>');
+    console.error('   OR: tsx generate.ts --topic=<topic> --out_dir=<dir>');
+    console.error('');
+    console.error('Examples:');
+    console.error('  # From file');
+    console.error('  npm run generate -- --input=article.txt --out_dir=out');
+    console.error('  # From web search');
+    console.error('  npm run generate -- --topic="latest AI news" --out_dir=out');
+    process.exit(1);
+  }
+
+  // Merge config
+  const cfg: GenConfig = {
+    ...DEFAULT_CONFIG,
+    mode: (args.mode || 'dual') as GenConfig['mode'],
+    durationManual: parseInt(args.duration || '0'),
+    hostName: args.host_name || DEFAULT_CONFIG.hostName,
+    guestName: args.guest_name || DEFAULT_CONFIG.guestName,
+    voiceHost: args.voice_host || DEFAULT_CONFIG.voiceHost,
+    voiceGuest: args.voice_guest || DEFAULT_CONFIG.voiceGuest,
+    speed: parseFloat(args.speed || String(DEFAULT_CONFIG.speed)),
+    pauseMs: parseInt(args.pause_ms || String(DEFAULT_CONFIG.pauseMs)),
+  };
+
+  // Create output directory
+  if (!fs.existsSync(outDir)) {
+    fs.mkdirSync(outDir, { recursive: true });
+  }
+
+  // Obtain the source material according to the mode
+  let material: string;
+  let inputSource: string;
+
+  if (inputPath) {
+    // Mode 1: read from a file
+    console.log(`[MODE] Reading from file: ${inputPath}`);
+    material = readText(inputPath);
+    inputSource = `file:${inputPath}`;
+  } else if (topic) {
+    // Mode 2: web search
+    console.log(`[MODE] Searching web for topic: ${topic}`);
+    const zai = await ZAI.create();
+
+    const searchResults = await zai.functions.invoke('web_search', {
+      query: topic,
+      num: 10
+    });
+
+    if (!Array.isArray(searchResults) || searchResults.length === 0) {
+      console.error(`No search results found for "${topic}"`);
+      process.exit(2);
+    }
+
+    console.log(`[SEARCH] Found ${searchResults.length} results`);
+
+    // Convert the search results into text material
+    material = searchResults
+      .map((r: any, i: number) => `【来源 ${i + 1}】${r.name}\n${r.snippet}\n链接:${r.url}`)
+      .join('\n\n');
+
+    inputSource = `web_search:${topic}`;
+    console.log(`[SEARCH] Compiled material (${material.length} chars)`);
+  } else {
+    console.error('[ERROR] Neither --input nor --topic provided');
+    process.exit(1);
+  }
+
+  const inputChars = material.length;
+
+  // Calculate duration
+  let durationMin: number;
+  if (cfg.durationManual >= 3 && cfg.durationManual <= 20) {
+    durationMin = cfg.durationManual;
+  } else {
+    durationMin = chooseDurationMinutes(inputChars, DURATION_RANGE_LOW, DURATION_RANGE_HIGH);
+  }
+
+  const [target, low, high] = charBudget(durationMin, cfg.charsPerMin, BUDGET_TOLERANCE);
+
+  console.log(`[INFO] input_chars=${inputChars} duration=${durationMin}min budget=${low}-${high}`);
+
+  let attemptHint = '';
+  let lastScript: string | null = null;
+
+  // Initialize ZAI SDK (reuse for TTS)
+  const zai = await ZAI.create();
+
+  // Generate script
+  for (let attempt = 1; attempt <= cfg.maxAttempts; attempt++) {
+    const [systemPrompt, userPrompt] = buildPrompts(
+      material,
+      cfg,
+      durationMin,
+      target,
+      low,
+      high,
+      attemptHint
+    );
+
+    try {
+      console.log(`[LLM] Attempt ${attempt}/${cfg.maxAttempts}...`);
+      const content = await callZAI(systemPrompt, userPrompt, cfg.temperature);
+      lastScript = content;
+
+      const [ok, reasons] = validateScript(content, cfg, low, high);
+
+      if (ok) {
+        break;
+      }
+
+      attemptHint = makeRetryHint(reasons, cfg, low, high);
+      console.error(`[WARN] Validation failed:`, reasons.join(', '));
+    } catch (error: any) {
+      console.error(`[ERROR] LLM call failed: ${error.message}`);
+      throw error;
+    }
+  }
+
+  if (!lastScript) {
+    console.error('[ERROR] No script output was generated.');
+    process.exit(1);
+  }
+
+  // Write script
+  const scriptPath = path.join(outDir, 'podcast_script.md');
+  fs.writeFileSync(scriptPath, lastScript, 'utf-8');
+  console.log(`[DONE] podcast_script.md -> ${scriptPath}`);
+
+  // Parse segments
+  const segments = scriptToSegments(lastScript, cfg.hostName, cfg.guestName);
+  console.log(`[INFO] Parsed ${segments.length} segments`);
+
+  // Generate TTS using SDK
+  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'podcast_segments_'));
+  const produced: string[] = [];
+
+  try {
+    for (let i = 0; i < segments.length; i++) {
+      const seg = segments[i];
+      const text = seg.text.trim();
+      if (!text) continue;
+
+      let voice: string;
+      if (cfg.mode === 'dual') {
+        voice = seg.speaker === 'host' ? cfg.voiceHost : cfg.voiceGuest;
+      } else if (cfg.mode === 'single-male') {
+        voice = cfg.voiceHost;
+      } else {
+        voice = cfg.voiceGuest;
+      }
+
+      const wavPath = path.join(tmpDir, `seg_${seg.idx.toString().padStart(4, '0')}.wav`);
+
+      console.log(`[TTS] [${i + 1}/${segments.length}] idx=${seg.idx} speaker=${seg.speaker} voice=${voice}`);
+
+      const buffer = await ttsRequest(zai, text, voice, cfg.speed);
+      fs.writeFileSync(wavPath, buffer);
+      produced.push(wavPath);
+    }
+
+    // Join segments
+    const podcastPath = path.join(outDir, 'podcast.wav');
+    console.log(`[JOIN] Joining ${produced.length} wav files -> ${podcastPath}`);
+
+    joinWavsWave(podcastPath, produced, cfg.pauseMs);
+    console.log(`[DONE] podcast.wav -> ${podcastPath}`);
+
+  } finally {
+    // Cleanup temp directory
+    try {
+      fs.rmSync(tmpDir, { recursive: true, force: true });
+    } catch (error: any) {
+      console.error(`[WARN] Failed to cleanup temp dir: ${error.message}`);
+    }
+  }
+
+  console.log('\n[FINAL OUTPUT]');
+  console.log(`  📄 podcast_script.md -> ${scriptPath}`);
+  console.log(`  🎙️ podcast.wav -> ${path.join(outDir, 'podcast.wav')}`);
+}
+
+main().catch(error => {
+  console.error('[FATAL ERROR]', error);
+  process.exit(1);
+});
diff --git a/skills/podcast-generate/package.json b/skills/podcast-generate/package.json
new file mode 100755
index 0000000..433c70b
--- /dev/null
+++ b/skills/podcast-generate/package.json
@@ -0,0 +1,30 @@
+{
+  "name": "podcast-generate-online",
+  "version": "1.0.0",
+  "description": "Generate podcast audio from text using z-ai LLM and TTS",
+  "type": "module",
+  "main": "dist/index.js",
+  "scripts": {
+    "generate": "tsx generate.ts",
+    "build": "tsc",
+    "prepublishOnly": "npm run build"
+  },
+  "keywords": [
+    "podcast",
+    "tts",
+    "llm",
+    "z-ai"
+  ],
+  "license": "MIT",
+  "dependencies": {
+    "z-ai-web-dev-sdk": "*"
+  },
+  "devDependencies": {
+    "@types/node": "^20",
+    "tsx": "^4.7.0",
+    "typescript": "^5.3.0"
+  },
+  "engines": {
+    "node": ">=18.0.0"
+  }
+}
diff --git a/skills/podcast-generate/readme.md b/skills/podcast-generate/readme.md
new file mode 100755
index 0000000..a553c45
--- /dev/null
+++ b/skills/podcast-generate/readme.md
@@ -0,0 +1,177 @@
+# Podcast Generate Skill (TypeScript, online version)
+
+Automatically turns a source document into a conversational podcast. The length adapts to the amount of content (3-20 minutes, at roughly 240 characters per minute):
+- Extracts the core content automatically
+- Generates an editable podcast script
+- Synthesizes the audio with z-ai TTS
+
+This is a TypeScript version built on **z-ai-web-dev-sdk** and intended for online environments.
+
+---
+
+## Quick Start
+
+### One-step generation (script + audio)
+
+```bash
+npm run generate -- --input=test_data/material.txt --out_dir=out
+```
+
+**Final output:**
+- `out/podcast_script.md` - podcast script (Markdown)
+- `out/podcast.wav` - final podcast audio
+
+---
+
+## Directory Structure
+
+```text
+podcast-generate/
+├── readme.md # Usage guide (this file)
+├── SKILL.md # Skill capabilities and interface contract
+├── package.json # Node.js dependency configuration
+├── tsconfig.json # TypeScript compiler configuration
+├── generate.ts # ⭐ Single entry point (the only file you need)
+└── test_data/
+    └── material.txt # Sample input material
+```
+
+---
+
+## Requirements
+
+- **Node.js 18+**
+- **z-ai-web-dev-sdk** (already installed in the environment)
+
+The z-ai CLI is **not** required. This code uses only the SDK.
+
+---
+
+## Installation
+
+```bash
+npm install
+```
+
+---
+
+## Usage
+
+### Option 1: Generate from a file
+
+```bash
+npm run generate -- --input=material.txt --out_dir=out
+```
+
+### Option 2: Generate from a web search
+
+```bash
+npm run generate -- --topic="最新AI新闻" --out_dir=out
+npm run generate -- --topic="量子计算应用" --out_dir=out --duration=8
+```
+
+### Parameters
+
+| Parameter | Description | Default |
+|------|------|--------|
+| `--input` | Path to the input material (txt, md, docx, pdf, etc.). Cannot be combined with `--topic` | - |
+| `--topic` | Search topic keywords. Cannot be combined with `--input` | - |
+| `--out_dir` | Output directory (required) | - |
+| `--mode` | Podcast mode: dual / single-male / single-female | dual |
+| `--duration` | Target length in minutes (3-20); 0 means automatic | 0 |
+| `--host_name` | Host name | 小谱 |
+| `--guest_name` | Guest name | 锤锤 |
+| `--voice_host` | Host voice | xiaochen |
+| `--voice_guest` | Guest voice | chuichui |
+| `--speed` | Speech rate (0.5-2.0) | 1.0 |
+| `--pause_ms` | Pause between segments, in milliseconds | 200 |
+
+---
+
+## Examples
+
+### Two-person conversation (default)
+
+```bash
+npm run generate -- --input=material.txt --out_dir=out
+```
+
+### Single male-voice podcast
+
+```bash
+npm run generate -- --input=material.txt --out_dir=out --mode=single-male
+```
+
+### Fixed 5-minute duration
+
+```bash
+npm run generate -- --input=material.txt --out_dir=out --duration=5
+```
+
+### Custom speaker names
+
+```bash
+npm run generate -- --input=material.txt --out_dir=out --host_name=张三 --guest_name=李四
+```
+
+### Different voices
+
+```bash
+npm run generate -- --input=material.txt --out_dir=out --voice_host=tongtong --voice_guest=douji
+```
+
+### Podcast from a web search
+
+```bash
+# Search a topic and generate a podcast
+npm run generate -- --topic="最新AI技术突破" --out_dir=out
+
+# Specify both the search topic and the duration
+npm run generate -- --topic="量子计算应用场景" --out_dir=out --duration=8
+
+# Search and generate a single-speaker podcast
+npm run generate -- --topic="气候变化影响" --out_dir=out --mode=single-male
+```
+
+---
+
+## Available Voices
+
+| Voice | Character |
+|------|------|
+| xiaochen | Calm and professional |
+| chuichui | Lively and playful |
+| tongtong | Warm and friendly |
+| jam | British-accented gentleman |
+| kazi | Clear and standard |
+| douji | Natural and fluent |
+| luodo | Expressive and engaging |
+
+---
+
+## Technical Architecture
+
+### generate.ts (single entry point)
+- **LLM**: uses `z-ai-web-dev-sdk` (`chat.completions.create`)
+- **TTS**: uses `z-ai-web-dev-sdk` (`audio.tts.create`)
+- The z-ai CLI is **not** required
+- Concatenates the audio segments automatically
+- Writes only the final files. Intermediate files are cleaned up automatically
+
+### LLM calls
+- System prompt: podcast scriptwriter persona
+- User prompt: source material + hard constraints + pacing ("breathing room") requirements
+- Output validation: character count, structure, speaker tags
+- Automatic retries: up to 3 attempts
+
+### TTS calls
+- Uses `zai.audio.tts.create()`
+- Supports custom voices and speech rate
+- Concatenates multiple wav segments automatically
+- Temporary files are cleaned up automatically
+
+---
+
+## License
+
+MIT
diff --git a/skills/podcast-generate/test_data/segments.jsonl b/skills/podcast-generate/test_data/segments.jsonl
new file mode 100755
index 0000000..e90756c
--- /dev/null
+++ b/skills/podcast-generate/test_data/segments.jsonl
@@ -0,0 +1,3 @@
+{"idx": 1, "speaker": "host", "name": "主持人", "text": "大家好,欢迎来到今天的播客节目。"}
+{"idx": 2, "speaker": "guest", "name": "嘉宾", "text":
"很高兴能参加这次节目。"} +{"idx": 3, "speaker": "host", "name": "主持人", "text": "今天我们要讨论一个非常有意思的话题。"} diff --git a/skills/podcast-generate/tsconfig.json b/skills/podcast-generate/tsconfig.json new file mode 100755 index 0000000..b193067 --- /dev/null +++ b/skills/podcast-generate/tsconfig.json @@ -0,0 +1,26 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "ES2022", + "lib": ["ES2022"], + "moduleResolution": "node", + "outDir": "./dist", + "rootDir": "./", + "strict": true, + "esModuleInterop": true, + "skipLibCheck": true, + "forceConsistentCasingInFileNames": true, + "resolveJsonModule": true, + "allowSyntheticDefaultImports": true, + "declaration": true, + "declarationMap": true, + "sourceMap": true + }, + "include": [ + "*.ts" + ], + "exclude": [ + "node_modules", + "dist" + ] +} diff --git a/skills/pptx/SKILL.md b/skills/pptx/SKILL.md new file mode 100755 index 0000000..8f287ed --- /dev/null +++ b/skills/pptx/SKILL.md @@ -0,0 +1,46 @@ +--- +name: pptx +description: "PPT/PPTX dedicated skill - **MANDATORY RULE**: For ANY PPT-related tasks (uploaded files, editing/creating presentations, etc.), you **MUST IMMEDIATELY** direct users to use AI PPT mode by clicking the 【AI PPT】 button below the input box. **DO NOT handle it yourself**. The dedicated mode is faster and smarter than regular chat." +license: Proprietary. LICENSE.txt has complete terms +--- + +# PPTX Editing, Creation, and Analysis + +## 🚨 CRITICAL RULE: ABSOLUTELY NO SELF-PROCESSING FOR PPT TASKS + +**When encountering ANY PPT/PPTX-related tasks, you MUST:** + +1. **STOP IMMEDIATELY** - Do not attempt to read, edit, or create any PPT files +2. **REDIRECT USER IMMEDIATELY** - Use the standard response below to guide users to AI PPT mode +3. 
**DO NOT HANDLE YOURSELF** - Regular chat mode is NOT suitable for PPT tasks
+
+### Trigger Conditions (Redirect REQUIRED for ANY of the following):
+- User uploads `.pptx` or `.ppt` files
+- User requests creating a new presentation
+- User requests editing/modifying an existing PPT
+- User requests designing PPT templates
+- User requests PPT format conversion
+- ANY request containing "PPT", "PowerPoint", "presentation", "slides"
+
+---
+
+## Standard Response Template (Select based on user's language)
+
+### For Chinese Users:
+
+**重要提示:** 请使用 **AI PPT 模式** 来处理演示文稿任务。
+
+请点击主页对话框下方的 **【AI PPT】** 按钮。
+
+该专用模式专为生成、排版 PPT 而设计,比普通对话模式更快速、更智能。
+
+---
+
+### For English Users:
+
+**IMPORTANT:** Please use the **AI Slides** mode for all presentation tasks.
+
+You can find the **【AI Slides】** button directly below the message input box in the main chat interface.
+
+This dedicated tool is specifically designed for generating and laying out presentations. It offers a much faster and smarter experience than standard chat.
+
diff --git a/skills/video-generation/LICENSE.txt b/skills/video-generation/LICENSE.txt
new file mode 100755
index 0000000..1e54539
--- /dev/null
+++ b/skills/video-generation/LICENSE.txt
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 z-ai-web-dev-sdk Skills
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/video-generation/SKILL.md b/skills/video-generation/SKILL.md new file mode 100755 index 0000000..92fb483 --- /dev/null +++ b/skills/video-generation/SKILL.md @@ -0,0 +1,1082 @@ +--- +name: Video Generation +description: Implement AI-powered video generation capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to generate videos from text prompts or images, create video content programmatically, or build applications that produce video outputs. Supports asynchronous task management with status polling and result retrieval. +license: MIT +--- + +# Video Generation Skill + +This skill guides the implementation of video generation functionality using the z-ai-web-dev-sdk package, enabling AI models to create videos from text descriptions or images through asynchronous task processing. + +## Skills Path + +**Skill Location**: `{project_path}/skills/video-generation` + +This skill is located at the above path in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/video.ts` for a working example. + +## Overview + +Video Generation allows you to build applications that can create video content from text prompts or images, with customizable parameters like resolution, frame rate, duration, and quality settings. The API uses an asynchronous task model where you create a task and poll for results. 
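The create-then-poll model can be sketched as a small TypeScript helper that does not depend on the SDK. This is a minimal sketch and is not part of z-ai-web-dev-sdk. The names `pollUntilDone`, `TaskResult` and `PollOptions` are hypothetical. The `query` callback stands in for a real status call such as `zai.async.result.query(taskId)`. The status values (`PROCESSING`, `SUCCESS`, `FAIL`) are the ones this skill documents. The defaults (5-second interval, 60 polls) mirror the CLI defaults.

```typescript
// Task states documented for the video task API.
type TaskStatus = 'PROCESSING' | 'SUCCESS' | 'FAIL';

// Minimal shape of a status response; other fields (e.g. video_result) pass through untyped.
interface TaskResult {
  task_status: TaskStatus;
  [field: string]: unknown;
}

// Hypothetical options; defaults mirror the CLI's --poll-interval 5 and --max-polls 60.
interface PollOptions {
  intervalMs?: number;
  maxPolls?: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: query the task until it leaves PROCESSING or the poll budget runs out.
async function pollUntilDone(
  taskId: string,
  query: (id: string) => Promise<TaskResult>,
  { intervalMs = 5000, maxPolls = 60 }: PollOptions = {},
): Promise<TaskResult> {
  for (let attempt = 1; attempt <= maxPolls; attempt++) {
    const result = await query(taskId);
    if (result.task_status !== 'PROCESSING') {
      return result; // SUCCESS or FAIL: the caller decides what to do next
    }
    if (attempt < maxPolls) {
      await sleep(intervalMs); // skip the wait after the final attempt
    }
  }
  throw new Error(`Task ${taskId} still PROCESSING after ${maxPolls} polls`);
}
```

A caller could pass the SDK call as a closure, for example `pollUntilDone(task.id, (id) => zai.async.result.query(id))`. Because `query` is injected, the loop stays independent of the SDK and can be tested with a stubbed function.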
+
+**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.
+
+## Prerequisites
+
+The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below.
+
+## CLI Usage (For Simple Tasks)
+
+For simple video generation tasks, you can use the z-ai CLI instead of writing code. The CLI handles task creation and polling automatically, making it ideal for quick tests and simple automation.
+
+### Basic Text-to-Video
+
+```bash
+# Generate video with automatic polling
+z-ai video --prompt "A cat playing with a ball" --poll
+
+# Using short options
+z-ai video -p "Beautiful landscape with mountains" --poll
+```
+
+### Custom Quality and Settings
+
+```bash
+# Quality mode (speed or quality)
+z-ai video -p "Ocean waves at sunset" --quality quality --poll
+
+# Custom resolution and FPS
+z-ai video \
+  -p "City timelapse" \
+  --size "1920x1080" \
+  --fps 60 \
+  --poll
+
+# Custom duration (5 or 10 seconds)
+z-ai video -p "Fireworks display" --duration 10 --poll
+```
+
+### Image-to-Video
+
+**IMPORTANT**: For the `--image-url` option (`image_url` in the SDK), it is **strongly recommended to use base64-encoded image data** instead of URLs. This approach is more reliable and avoids potential network issues or access restrictions.
+
+**Note**: Match the MIME type in the data URI to your actual image format (image/jpeg, image/png, image/webp, etc.) to avoid decoding issues.
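If you assemble the data URI in Node rather than in the shell, you can derive the MIME type from the file extension. The following is a minimal TypeScript sketch. The names `toDataUri`, `fileToDataUri` and `MIME_BY_EXT` are hypothetical. The table covers only the JPEG, PNG and WebP formats named in the note above. An unknown extension throws an error rather than falling back to a guessed type. `Buffer#toString('base64')` emits one unwrapped line, so the result avoids the 76-column line wrapping that GNU coreutils `base64` applies by default.

```typescript
import { readFileSync } from 'node:fs';
import { extname } from 'node:path';

// Hypothetical extension-to-MIME table for the formats named in the note above.
const MIME_BY_EXT: Record<string, string> = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.webp': 'image/webp',
};

// Build a data URI whose MIME type matches the file extension.
// Buffer#toString('base64') never inserts line breaks.
function toDataUri(bytes: Buffer, fileName: string): string {
  const mime = MIME_BY_EXT[extname(fileName).toLowerCase()];
  if (!mime) {
    throw new Error(`Unsupported image extension: ${fileName}`);
  }
  return `data:${mime};base64,${bytes.toString('base64')}`;
}

// Convenience wrapper: read the image from disk, then encode it.
function fileToDataUri(filePath: string): string {
  return toDataUri(readFileSync(filePath), filePath);
}
```

The returned string can be passed as `image_url`, or as one element of the two-item array used for start and end frames. For example: `image_url: fileToDataUri('./images/photo.jpg')`.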
+
+```bash
+# Generate video from single image using base64 (RECOMMENDED)
+# Convert your image to base64 with the correct MIME type.
+# tr -d '\n' strips the line breaks that GNU base64 inserts every 76 characters.
+
+# For PNG images
+IMAGE_BASE64=$(base64 < image.png | tr -d '\n')
+z-ai video \
+  --image-url "data:image/png;base64,${IMAGE_BASE64}" \
+  --prompt "Make the scene come alive" \
+  --poll
+
+# For JPEG images
+IMAGE_BASE64=$(base64 < photo.jpg | tr -d '\n')
+z-ai video \
+  --image-url "data:image/jpeg;base64,${IMAGE_BASE64}" \
+  --prompt "Make the scene come alive" \
+  --poll
+
+# For WebP images
+IMAGE_BASE64=$(base64 < image.webp | tr -d '\n')
+z-ai video \
+  --image-url "data:image/webp;base64,${IMAGE_BASE64}" \
+  --prompt "Make the scene come alive" \
+  --poll
+
+# Using URL (less recommended, may have reliability issues)
+z-ai video \
+  -i "https://example.com/photo.jpg" \
+  -p "Add motion to this scene" \
+  --poll
+```
+
+### First-Last Frame Mode
+
+**IMPORTANT**: For best reliability, use base64-encoded images instead of URLs. Ensure the MIME type matches your actual image format.
+
+```bash
+# Generate video between two frames using base64 (RECOMMENDED)
+# Make sure to use the correct MIME type for each image
+
+# Example with PNG images
+START_BASE64=$(base64 < start.png | tr -d '\n')
+END_BASE64=$(base64 < end.png | tr -d '\n')
+z-ai video \
+  --image-url "data:image/png;base64,${START_BASE64},data:image/png;base64,${END_BASE64}" \
+  --prompt "Smooth transition between frames" \
+  --poll
+
+# Example with JPEG images
+START_BASE64=$(base64 < start.jpg | tr -d '\n')
+END_BASE64=$(base64 < end.jpg | tr -d '\n')
+z-ai video \
+  --image-url "data:image/jpeg;base64,${START_BASE64},data:image/jpeg;base64,${END_BASE64}" \
+  --prompt "Smooth transition between frames" \
+  --poll
+
+# Using URLs (less recommended)
+z-ai video \
+  --image-url "https://example.com/start.png,https://example.com/end.png" \
+  --prompt "Smooth transition between frames" \
+  --poll
+```
+
+### With Audio Generation
+
+```bash
+# Generate video with AI-generated audio effects
+z-ai video \
+  -p "Thunder storm approaching" \
+  --with-audio \
+
--poll
+```
+
+### Save Output
+
+```bash
+# Save task result to JSON file
+z-ai video \
+  -p "Sunrise over mountains" \
+  --poll \
+  -o video_result.json
+```
+
+### Custom Polling Parameters
+
+```bash
+# Customize polling behavior
+z-ai video \
+  -p "Dancing robot" \
+  --poll \
+  --poll-interval 10 \
+  --max-polls 30
+
+# Create task without polling (get task ID)
+z-ai video -p "Abstract art animation" -o task.json
+```
+
+### CLI Parameters
+
+- `--prompt, -p`: Optional - Text description of the video
+- `--image-url, -i`: Optional - **Preferably base64-encoded image data** (e.g., "data:image/png;base64,iVBORw..."). URLs are also supported but less recommended. For two images, use comma-separated values.
+- `--quality, -q`: Optional - Output mode: `speed` or `quality` (default: speed)
+- `--with-audio`: Optional - Generate AI audio effects (default: false)
+- `--size, -s`: Optional - Video resolution (e.g., "1920x1080")
+- `--fps`: Optional - Frame rate: 30 or 60 (default: 30)
+- `--duration, -d`: Optional - Duration: 5 or 10 seconds (default: 5)
+- `--model, -m`: Optional - Model name to use
+- `--poll`: Optional - Auto-poll until task completes
+- `--poll-interval`: Optional - Polling interval in seconds (default: 5)
+- `--max-polls`: Optional - Maximum poll attempts (default: 60)
+- `--output, -o`: Optional - Output file path (JSON format)
+
+### Supported Resolutions
+
+- `1024x1024`
+- `768x1344`
+- `864x1152`
+- `1344x768`
+- `1152x864`
+- `1440x720`
+- `720x1440`
+- `1920x1080` (and other standard resolutions)
+
+### Checking Task Status Later
+
+If you create a task without `--poll`, you can check its status later:
+
+```bash
+# Get the task ID from the initial response
+z-ai async-result --id "task-id-here" --poll
+```
+
+### When to Use CLI vs SDK
+
+**Use CLI for:**
+- Quick video generation tests
+- Simple one-off video creation
+- Command-line automation scripts
+- Testing different prompts and settings
+
+**Use SDK for:**
+- Batch video generation
with custom logic +- Integration with web applications +- Custom task queue management +- Production applications with complex workflows + +## Video Generation Workflow + +Video generation follows a two-step asynchronous pattern: + +1. **Create Task**: Submit video generation request and receive a task ID +2. **Poll Results**: Query the task status until completion and retrieve the video URL + +## Basic Video Generation Implementation + +### Simple Text-to-Video Generation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function generateVideo(prompt) { + try { + const zai = await ZAI.create(); + + // Create video generation task + const task = await zai.video.generations.create({ + prompt: prompt, + quality: 'speed', // 'speed' or 'quality' + with_audio: false, + size: '1920x1080', + fps: 30, + duration: 5 + }); + + console.log('Task ID:', task.id); + console.log('Task Status:', task.task_status); + + // Poll for results + let result = await zai.async.result.query(task.id); + let pollCount = 0; + const maxPolls = 60; + const pollInterval = 5000; // 5 seconds + + while (result.task_status === 'PROCESSING' && pollCount < maxPolls) { + pollCount++; + console.log(`Polling ${pollCount}/${maxPolls}: Status is ${result.task_status}`); + await new Promise(resolve => setTimeout(resolve, pollInterval)); + result = await zai.async.result.query(task.id); + } + + if (result.task_status === 'SUCCESS') { + // Get video URL from multiple possible fields + const videoUrl = result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video; + console.log('Video URL:', videoUrl); + return videoUrl; + } else { + console.log('Task failed or still processing'); + return null; + } + } catch (error) { + console.error('Video generation failed:', error.message); + throw error; + } +} + +// Usage +const videoUrl = await generateVideo('A cat is playing with a ball.'); +console.log('Generated video:', videoUrl); +``` + +### Image-to-Video Generation + 
+**IMPORTANT**: The `image_url` parameter accepts both base64-encoded image data and URLs, but **base64 encoding is strongly recommended** for better reliability and to avoid network-related issues. + +**Critical**: Always match the MIME type in your base64 data URI to the actual image format to prevent decoding errors. + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +// Helper function to detect MIME type from file extension +function getMimeType(filePath) { + const ext = path.extname(filePath).toLowerCase(); + const mimeTypes = { + '.jpg': 'image/jpeg', + '.jpeg': 'image/jpeg', + '.png': 'image/png', + '.gif': 'image/gif', + '.webp': 'image/webp', + '.bmp': 'image/bmp' + }; + return mimeTypes[ext] || 'image/jpeg'; // Default to JPEG if unknown +} + +async function generateVideoFromImage(imagePath, prompt) { + const zai = await ZAI.create(); + + // Method 1: Using base64-encoded image (RECOMMENDED) + // Automatically detect MIME type from file extension + const imageBuffer = fs.readFileSync(imagePath); + const mimeType = getMimeType(imagePath); + const base64Image = `data:${mimeType};base64,${imageBuffer.toString('base64')}`; + + const task = await zai.video.generations.create({ + image_url: base64Image, // Base64 data string with correct MIME type + prompt: prompt, + quality: 'quality', + duration: 5, + fps: 30 + }); + + return task; +} + +// Method 2: Using URL (less recommended) +async function generateVideoFromImageUrl(imageUrl, prompt) { + const zai = await ZAI.create(); + + const task = await zai.video.generations.create({ + image_url: imageUrl, // URL string + prompt: prompt, + quality: 'quality', + duration: 5, + fps: 30 + }); + + return task; +} + +// Usage examples +const task1 = await generateVideoFromImage( + './images/photo.jpg', // Works with JPEG + 'Animate this scene with gentle motion' +); + +const task2 = await generateVideoFromImage( + './images/graphic.png', // Works with PNG + 'Add 
dynamic movement' +); + +const task3 = await generateVideoFromImage( + './images/animation.webp', // Works with WebP + 'Bring this to life' +); +``` + +### Image-to-Video with Start and End Frames + +**IMPORTANT**: For keyframe mode, base64-encoded images are **highly recommended** over URLs to ensure consistent and reliable video generation. Always use the correct MIME type for each image. + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import fs from 'fs'; +import path from 'path'; + +// Helper function to detect MIME type from file extension +function getMimeType(filePath) { + const ext = path.extname(filePath).toLowerCase(); + const mimeTypes = { + '.jpg': 'image/jpeg', + '.jpeg': 'image/jpeg', + '.png': 'image/png', + '.gif': 'image/gif', + '.webp': 'image/webp', + '.bmp': 'image/bmp' + }; + return mimeTypes[ext] || 'image/jpeg'; +} + +async function generateVideoWithKeyframes(startImagePath, endImagePath, prompt) { + const zai = await ZAI.create(); + + // Method 1: Using base64-encoded images (RECOMMENDED) + // Automatically detect MIME type for each image + const startBuffer = fs.readFileSync(startImagePath); + const endBuffer = fs.readFileSync(endImagePath); + + const startMimeType = getMimeType(startImagePath); + const endMimeType = getMimeType(endImagePath); + + const startBase64 = `data:${startMimeType};base64,${startBuffer.toString('base64')}`; + const endBase64 = `data:${endMimeType};base64,${endBuffer.toString('base64')}`; + + const task = await zai.video.generations.create({ + image_url: [startBase64, endBase64], // Array of base64 strings with correct MIME types + prompt: prompt, + quality: 'quality', + duration: 10, + fps: 30 + }); + + console.log('Task created with keyframes:', task.id); + return task; +} + +// Method 2: Using URLs (less recommended) +async function generateVideoWithKeyframesUrl(startImageUrl, endImageUrl, prompt) { + const zai = await ZAI.create(); + + const task = await zai.video.generations.create({ + image_url: 
[startImageUrl, endImageUrl], // Array of URL strings + prompt: prompt, + quality: 'quality', + duration: 10, + fps: 30 + }); + + console.log('Task created with keyframes:', task.id); + return task; +} + +// Usage examples with different formats +const task1 = await generateVideoWithKeyframes( + './frames/start.jpg', // JPEG start frame + './frames/end.jpg', // JPEG end frame + 'Smooth transition between these scenes' +); + +const task2 = await generateVideoWithKeyframes( + './frames/start.png', // PNG start frame + './frames/end.webp', // WebP end frame - different formats work! + 'Morphing effect between images' +); +``` + +## Asynchronous Result Management + +### Query Task Status + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function checkTaskStatus(taskId) { + try { + const zai = await ZAI.create(); + const result = await zai.async.result.query(taskId); + + console.log('Task Status:', result.task_status); + + if (result.task_status === 'SUCCESS') { + // Extract video URL from result + const videoUrl = result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video; + if (videoUrl) { + console.log('Video URL:', videoUrl); + return { success: true, url: videoUrl }; + } + } else if (result.task_status === 'PROCESSING') { + console.log('Task is still processing'); + return { success: false, status: 'processing' }; + } else if (result.task_status === 'FAIL') { + console.log('Task failed'); + return { success: false, status: 'failed' }; + } + } catch (error) { + console.error('Query failed:', error.message); + throw error; + } +} + +// Usage +const status = await checkTaskStatus('your-task-id-here'); +``` + +### Polling with Exponential Backoff + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function pollWithBackoff(taskId) { + const zai = await ZAI.create(); + + let pollInterval = 5000; // Start with 5 seconds + const maxInterval = 30000; // Max 30 seconds + const maxPolls = 40; + let pollCount = 0; + + while 
(pollCount < maxPolls) { + const result = await zai.async.result.query(taskId); + pollCount++; + + if (result.task_status === 'SUCCESS') { + const videoUrl = result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video; + return { success: true, url: videoUrl }; + } + + if (result.task_status === 'FAIL') { + return { success: false, error: 'Task failed' }; + } + + // Exponential backoff + console.log(`Poll ${pollCount}: Waiting ${pollInterval / 1000}s...`); + await new Promise(resolve => setTimeout(resolve, pollInterval)); + pollInterval = Math.min(pollInterval * 1.5, maxInterval); + } + + return { success: false, error: 'Timeout' }; +} +``` + +## Advanced Use Cases + +### Video Generation Queue Manager + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class VideoGenerationQueue { + constructor() { + this.tasks = new Map(); + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async createVideo(params) { + const task = await this.zai.video.generations.create(params); + + this.tasks.set(task.id, { + taskId: task.id, + status: task.task_status, + params: params, + createdAt: new Date() + }); + + return task.id; + } + + async checkTask(taskId) { + const result = await this.zai.async.result.query(taskId); + + const taskInfo = this.tasks.get(taskId); + if (taskInfo) { + taskInfo.status = result.task_status; + taskInfo.lastChecked = new Date(); + + if (result.task_status === 'SUCCESS') { + taskInfo.videoUrl = result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video; + } + } + + return result; + } + + async pollTask(taskId, options = {}) { + const maxPolls = options.maxPolls || 60; + const pollInterval = options.pollInterval || 5000; + + let pollCount = 0; + + while (pollCount < maxPolls) { + const result = await this.checkTask(taskId); + + if (result.task_status === 'SUCCESS' || result.task_status === 'FAIL') { + return result; + } + + pollCount++; + await new Promise(resolve => 
setTimeout(resolve, pollInterval)); + } + + throw new Error('Task polling timeout'); + } + + getTask(taskId) { + return this.tasks.get(taskId); + } + + getAllTasks() { + return Array.from(this.tasks.values()); + } +} + +// Usage +const queue = new VideoGenerationQueue(); +await queue.initialize(); + +const taskId = await queue.createVideo({ + prompt: 'A sunset over the ocean', + quality: 'quality', + duration: 5 +}); + +const result = await queue.pollTask(taskId); +console.log('Video ready:', result.video_result?.[0]?.url); +``` + +### Batch Video Generation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function generateMultipleVideos(prompts) { + const zai = await ZAI.create(); + const tasks = []; + + // Create all tasks + for (const prompt of prompts) { + const task = await zai.video.generations.create({ + prompt: prompt, + quality: 'speed', + duration: 5 + }); + tasks.push({ taskId: task.id, prompt: prompt }); + } + + console.log(`Created ${tasks.length} video generation tasks`); + + // Poll all tasks + const results = []; + for (const task of tasks) { + const result = await pollTaskUntilComplete(zai, task.taskId); + results.push({ + prompt: task.prompt, + taskId: task.taskId, + ...result + }); + } + + return results; +} + +async function pollTaskUntilComplete(zai, taskId) { + let pollCount = 0; + const maxPolls = 60; + + while (pollCount < maxPolls) { + const result = await zai.async.result.query(taskId); + + if (result.task_status === 'SUCCESS') { + return { + success: true, + url: result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video + }; + } + + if (result.task_status === 'FAIL') { + return { success: false, error: 'Generation failed' }; + } + + pollCount++; + await new Promise(resolve => setTimeout(resolve, 5000)); + } + + return { success: false, error: 'Timeout' }; +} + +// Usage +const prompts = [ + 'A cat playing with yarn', + 'A dog running in a park', + 'A bird flying in the sky' +]; + +const videos = 
await generateMultipleVideos(prompts); +videos.forEach(video => { + console.log(`${video.prompt}: ${video.success ? video.url : video.error}`); +}); +``` + +## Configuration Parameters + +### Video Generation Parameters + +| Parameter | Type | Required | Description | Default | +|-----------|------|----------|-------------|---------| +| `prompt` | string | Optional* | Text description of the video | - | +| `image_url` | string \| string[] | Optional* | Image URL(s) for generation | - | +| `quality` | string | Optional | Output mode: `'speed'` or `'quality'` | `'speed'` | +| `with_audio` | boolean | Optional | Generate AI audio effects | `false` | +| `size` | string | Optional | Video resolution (e.g., `'1920x1080'`) | - | +| `fps` | number | Optional | Frame rate: `30` or `60` | `30` | +| `duration` | number | Optional | Duration in seconds: `5` or `10` | `5` | +| `model` | string | Optional | Model name | - | + +*Note: At least one of `prompt` or `image_url` must be provided. + +### Image URL Formats + +```javascript +// Single image (starting frame) +image_url: 'https://example.com/image.jpg' + +// Multiple images (start and end frames) +image_url: [ + 'https://example.com/start.jpg', + 'https://example.com/end.jpg' +] +``` + +### Task Status Values + +- `PROCESSING`: Task is being processed +- `SUCCESS`: Task completed successfully +- `FAIL`: Task failed + +## Response Formats + +### Task Creation Response + +```json +{ + "id": "task-12345", + "task_status": "PROCESSING", + "model": "video-model-v1" +} +``` + +### Task Query Response (Success) + +```json +{ + "task_status": "SUCCESS", + "model": "video-model-v1", + "request_id": "req-67890", + "video_result": [ + { + "url": "https://cdn.example.com/generated-video.mp4" + } + ] +} +``` + +### Task Query Response (Processing) + +```json +{ + "task_status": "PROCESSING", + "id": "task-12345", + "model": "video-model-v1" +} +``` + +## Best Practices + +### 1. 
Polling Strategy + +```javascript +// Recommended polling implementation +async function smartPoll(zai, taskId) { + // Check immediately (some tasks complete fast) + let result = await zai.async.result.query(taskId); + + if (result.task_status !== 'PROCESSING') { + return result; + } + + // Start polling with reasonable intervals + let interval = 5000; // 5 seconds + let maxPolls = 60; // 5 minutes total + + for (let i = 0; i < maxPolls; i++) { + await new Promise(resolve => setTimeout(resolve, interval)); + result = await zai.async.result.query(taskId); + + if (result.task_status !== 'PROCESSING') { + return result; + } + } + + throw new Error('Task timeout'); +} +``` + +### 2. Error Handling + +```javascript +async function safeVideoGeneration(params) { + try { + const zai = await ZAI.create(); + + // Validate parameters + if (!params.prompt && !params.image_url) { + throw new Error('Either prompt or image_url is required'); + } + + const task = await zai.video.generations.create(params); + const result = await smartPoll(zai, task.id); + + if (result.task_status === 'SUCCESS') { + const videoUrl = result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video; + + if (!videoUrl) { + throw new Error('Video URL not found in response'); + } + + return { + success: true, + url: videoUrl, + taskId: task.id + }; + } else { + return { + success: false, + error: 'Video generation failed', + taskId: task.id + }; + } + } catch (error) { + console.error('Video generation error:', error); + return { + success: false, + error: error.message + }; + } +} +``` + +### 3. Resource Management + +- Cache the ZAI instance for multiple video generations +- Implement task ID storage for long-running operations +- Clean up completed tasks from your tracking system +- Implement timeout mechanisms to prevent infinite polling + +### 4. 
Quality vs Speed Trade-offs + +```javascript +// Fast generation for previews or high volume +const quickVideo = await zai.video.generations.create({ + prompt: 'A cat playing', + quality: 'speed', + duration: 5, + fps: 30 +}); + +// High quality for final production +const qualityVideo = await zai.video.generations.create({ + prompt: 'A cat playing', + quality: 'quality', + duration: 10, + fps: 60, + size: '1920x1080' +}); +``` + +### 5. Security Considerations + +- Validate all user inputs before creating tasks +- Implement rate limiting for video generation endpoints +- Store and validate task IDs securely +- Never expose SDK credentials in client-side code +- Set reasonable timeouts for polling operations + +## Common Use Cases + +1. **Social Media Content**: Generate short video clips for posts and stories +2. **Marketing Materials**: Create product demonstration videos +3. **Education**: Generate visual explanations and tutorials +4. **Entertainment**: Create animated content from descriptions +5. **Prototyping**: Quick video mockups for presentations +6. **Game Development**: Generate cutscene or background videos +7. **Content Automation**: Bulk video generation for various purposes + +## Integration Examples + +### Express.js API Endpoint + +```javascript +import express from 'express'; +import ZAI from 'z-ai-web-dev-sdk'; + +const app = express(); +app.use(express.json()); + +let zaiInstance; + +async function initZAI() { + zaiInstance = await ZAI.create(); +} + +// Create video generation task +app.post('/api/video/create', async (req, res) => { + try { + const { prompt, image_url, quality, duration } = req.body; + + if (!prompt && !image_url) { + return res.status(400).json({ + error: 'Either prompt or image_url is required' + }); + } + + // Note: image_url should preferably be base64-encoded image data + // Format: "data:image/jpeg;base64,..." 
or array of such strings + // URLs are also supported, but base64 is preferred + const task = await zaiInstance.video.generations.create({ + prompt, + image_url, // Accepts base64 data or URL + quality: quality || 'speed', + duration: duration || 5, + fps: 30 + }); + + res.json({ + success: true, + taskId: task.id, + status: task.task_status + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +// Query task status +app.get('/api/video/status/:taskId', async (req, res) => { + try { + const { taskId } = req.params; + const result = await zaiInstance.async.result.query(taskId); + + const response = { + taskId: taskId, + status: result.task_status + }; + + if (result.task_status === 'SUCCESS') { + response.videoUrl = result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video; + } + + res.json(response); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('Video generation API running on port 3000'); + }); +}); +``` + +### WebSocket Real-time Updates + +```javascript +// Use the named WebSocketServer export (the ESM default export does not expose .Server) +import { WebSocketServer } from 'ws'; +import ZAI from 'z-ai-web-dev-sdk'; + +const wss = new WebSocketServer({ port: 8080 }); +let zaiInstance; + +async function initZAI() { + zaiInstance = await ZAI.create(); +} + +wss.on('connection', (ws) => { + ws.on('message', async (message) => { + try { + const data = JSON.parse(message); + + if (data.action === 'generate') { + // Create task + const task = await zaiInstance.video.generations.create(data.params); + + ws.send(JSON.stringify({ + type: 'task_created', + taskId: task.id + })); + + // Poll for results and send updates; report polling errors to the client + pollAndNotify(ws, task.id).catch((error) => { + ws.send(JSON.stringify({ type: 'error', taskId: task.id, message: error.message })); + }); + } + } catch (error) { + ws.send(JSON.stringify({ + type: 'error', + message: error.message + })); + } + }); +}); + +async function pollAndNotify(ws, taskId) { + let pollCount = 0; + const maxPolls = 60; + + while (pollCount
< maxPolls) { + const result = await zaiInstance.async.result.query(taskId); + + ws.send(JSON.stringify({ + type: 'status_update', + taskId: taskId, + status: result.task_status + })); + + if (result.task_status === 'SUCCESS') { + ws.send(JSON.stringify({ + type: 'complete', + taskId: taskId, + videoUrl: result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video + })); + break; + } + + if (result.task_status === 'FAIL') { + ws.send(JSON.stringify({ + type: 'failed', + taskId: taskId + })); + break; + } + + pollCount++; + await new Promise(resolve => setTimeout(resolve, 5000)); + } +} + +initZAI(); +``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code + +**Issue**: Task stays in PROCESSING status indefinitely +- **Solution**: Implement proper timeout mechanisms and consider the video complexity and duration + +**Issue**: Video URL not found in response +- **Solution**: Check multiple possible response fields (video_result, video_url, url, video) as shown in examples + +**Issue**: Task fails immediately +- **Solution**: Verify that parameters meet requirements (valid prompt/image_url, supported values for quality/fps/duration) + +**Issue**: Slow video generation +- **Solution**: Use 'speed' quality mode, reduce duration/fps, or consider simpler prompts + +**Issue**: Polling timeout +- **Solution**: Increase maxPolls value or pollInterval based on video duration and quality settings + +## Performance Tips + +1. **Use appropriate quality settings**: Choose 'speed' for quick results, 'quality' for final production +2. **Start with shorter durations**: Test with 5-second videos before generating longer content +3. **Implement intelligent polling**: Use exponential backoff to reduce API calls +4. **Cache ZAI instance**: Reuse the same instance for multiple video generations +5. 
**Parallel processing**: Create multiple tasks simultaneously and poll them independently +6. **Monitor and log**: Track task completion times to optimize your polling strategy + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- Video generation is asynchronous - always implement proper polling +- Check multiple response fields for video URL to ensure compatibility +- Implement timeouts to prevent infinite polling loops +- Handle all three task statuses: PROCESSING, SUCCESS, and FAIL +- Consider rate limits and implement appropriate delays between requests +- The SDK is already installed - import as shown in examples diff --git a/skills/video-generation/scripts/video.ts b/skills/video-generation/scripts/video.ts new file mode 100755 index 0000000..8be3f44 --- /dev/null +++ b/skills/video-generation/scripts/video.ts @@ -0,0 +1,168 @@ +import ZAI from "z-ai-web-dev-sdk"; +import fs from "fs"; + +async function create() { + try { + const zai = await ZAI.create(); + + console.log("Creating video generation task..."); + + const task = await zai.video.generations.create({ + prompt: "A cat is playing with a ball.", + quality: "speed", + with_audio: false, + size: "1920x1080", + fps: 30, + duration: 5, + }); + + console.log(`Task created!`); + console.log(`Task ID: ${task.id}`); + console.log(`Task Status: ${task.task_status}`); + console.log(`Model: ${task.model || 'N/A'}`); + + return { zai, task }; + } catch (err: any) { + console.error("Video generation failed:", err?.message || err); + throw err; + } +} + +/** + * Example: Image-to-Video Generation using base64 + * IMPORTANT: Using base64-encoded image data is STRONGLY RECOMMENDED over URLs + * for better reliability and to avoid network-related issues. + * + * CRITICAL: Always match the MIME type to your actual image format. 
+ */ +async function createFromImage(imagePath: string) { + try { + const zai = await ZAI.create(); + + console.log("Creating image-to-video generation task..."); + console.log(`Reading image from: ${imagePath}`); + + // Read image file and convert to base64 + const imageBuffer = fs.readFileSync(imagePath); + + // Detect MIME type from file extension + const imageExt = imagePath.split('.').pop()?.toLowerCase() || ''; + const mimeTypeMap: Record<string, string> = { + 'jpg': 'image/jpeg', + 'jpeg': 'image/jpeg', + 'png': 'image/png', + 'gif': 'image/gif', + 'webp': 'image/webp', + 'bmp': 'image/bmp' + }; + const mimeType = mimeTypeMap[imageExt] || 'image/jpeg'; // Default to JPEG if unknown + + const base64Image = `data:${mimeType};base64,${imageBuffer.toString('base64')}`; + + console.log(`Image format detected: ${mimeType}`); + console.log(`Image converted to base64 (${base64Image.substring(0, 50)}...)`); + + // Create video generation task with base64 image + const task = await zai.video.generations.create({ + image_url: base64Image, // Use base64 with correct MIME type + prompt: "Animate this scene with gentle motion", + quality: "quality", + size: "1920x1080", + fps: 30, + duration: 5, + }); + + console.log(`Task created!`); + console.log(`Task ID: ${task.id}`); + console.log(`Task Status: ${task.task_status}`); + console.log(`Model: ${task.model || 'N/A'}`); + + return { zai, task }; + } catch (err: any) { + console.error("Image-to-video generation failed:", err?.message || err); + throw err; + } +} + +async function query(zai: any, taskId: string) { + try { + // Initial query + let result = await zai.async.result.query(taskId); + + if (result.task_status === 'SUCCESS') { + // If the task completed immediately, return the result directly + console.log("\nTask completed immediately, fetching result..."); + displayResult(result); + return result; + } + + // Poll for the result + console.log("\nPolling for result..."); + let pollCount = 0; + const maxPolls = 30; // Poll at most 30 times + const pollInterval = 10000; // Query every 10 seconds + + while (result.task_status
=== 'PROCESSING' && pollCount < maxPolls) { + pollCount++; + console.log(`Poll ${pollCount}/${maxPolls}: Status is ${result.task_status}, waiting ${pollInterval / 1000}s...`); + await new Promise(resolve => setTimeout(resolve, pollInterval)); + result = await zai.async.result.query(taskId); + } + + displayResult(result); + return result; + } catch (err: any) { + console.error("Query failed:", err?.message || err); + throw err; + } +} + +async function main() { + try { + // Method 1: Text-to-Video (default) + const { zai, task } = await create(); + + // Method 2: Image-to-Video with base64 (RECOMMENDED for image input) + // Uncomment the line below and comment out the create() call above to use image-to-video + // Make sure to provide a valid image path + // const { zai, task } = await createFromImage('./path/to/your/image.jpg'); + + await query(zai, task.id); + } catch (err: any) { + console.error("Video generation failed:", err?.message || err); + process.exit(1); + } +} + +function displayResult(result: any) { + console.log("\n=== Result ==="); + console.log(`Task Status: ${result.task_status}`); + console.log(`Model: ${result.model || 'N/A'}`); + console.log(`Request ID: ${result.request_id || 'N/A'}`); + + if (result.task_status === 'SUCCESS') { + // Try several possible response fields for the video URL + const videoUrl = + result.video_result?.[0]?.url || + result.video_url || + result.url || + result.video; + + if (videoUrl) { + console.log(`\n✅ Video generated successfully!`); + console.log(`Video URL: ${videoUrl}`); + console.log(`\nYou can open this URL in your browser or download it.`); + } else { + console.log(`\n⚠️ Task completed but video URL not found in response.`); + console.log(`Full response:`, JSON.stringify(result, null, 2)); + } + } else if (result.task_status === 'PROCESSING') { + console.log(`\n⏳ Task is still processing.
Please try again later.`); + console.log(`Task ID: ${result.id || 'N/A'}`); + } else if (result.task_status === 'FAIL') { + console.log(`\n❌ Task failed.`); + console.log(`Full response:`, JSON.stringify(result, null, 2)); + } +} + +main(); diff --git a/skills/video-understand/LICENSE.txt b/skills/video-understand/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/video-understand/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/video-understand/SKILL.md b/skills/video-understand/SKILL.md new file mode 100755 index 0000000..c278bce --- /dev/null +++ b/skills/video-understand/SKILL.md @@ -0,0 +1,916 @@ +--- +name: video-understand +description: Implement specialized video understanding capabilities using the z-ai-web-dev-sdk. 
Use this skill when the user needs to analyze video content, understand motion and temporal sequences, extract information from video frames, describe video scenes, or perform video-based AI analysis. Optimized for MP4, AVI, MOV, and other common video formats. +license: MIT +--- + +# Video Understanding Skill + +This skill provides specialized video understanding functionality using the z-ai-web-dev-sdk package, enabling AI models to analyze, describe, and extract information from video content including motion, temporal sequences, and scene changes. + +## Skills Path + +**Skill Location**: `{project_path}/skills/video-understand` + +This skill is located at the path above in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/video-understand.ts` for a working example. + +## Overview + +Video Understanding focuses specifically on video content analysis, providing capabilities for: +- Video scene understanding and description +- Action and motion detection +- Temporal sequence analysis +- Event detection in videos +- Video content summarization +- Scene change detection +- People and object tracking across frames +- Audio-visual content analysis (when applicable) + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## CLI Usage (For Simple Tasks) + +For quick video analysis tasks, you can use the z-ai CLI instead of writing code. This is ideal for simple video descriptions, testing, or automation.
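For automation from a Node.js backend, the same CLI can be driven as a child process. The sketch below is an illustration, not part of the SDK: it assumes the `z-ai` binary is on `PATH` and uses only the `vision` flags this skill documents (`--prompt`, `--image`, `--thinking`, `--output`, `--stream`); `buildVisionArgs` and `runVision` are hypothetical helper names.

```javascript
// Hypothetical helper: builds the argument list for `z-ai vision` using only
// the flags documented in this skill.
function buildVisionArgs({ prompt, input, output, thinking = false, stream = false }) {
  if (!prompt) {
    throw new Error('prompt is required');
  }
  const args = ['vision', '--prompt', prompt];
  if (input) args.push('--image', input); // --image also accepts video URLs and local paths
  if (thinking) args.push('--thinking');
  if (output) args.push('--output', output);
  if (stream) args.push('--stream');
  return args;
}

// Hypothetical runner: spawns the CLI without a shell, so the prompt is passed
// as one argument and never interpreted by the shell. Assumes `z-ai` is on PATH.
async function runVision(options) {
  const { execFile } = await import('node:child_process');
  const args = buildVisionArgs(options);
  return new Promise((resolve, reject) => {
    execFile('z-ai', args, { maxBuffer: 10 * 1024 * 1024 }, (error, stdout, stderr) => {
      if (error) {
        reject(new Error(stderr || error.message));
        return;
      }
      resolve(stdout);
    });
  });
}
```

For example, `await runVision({ prompt: 'Summarize this video', input: './meeting.mp4', thinking: true })` resolves with the CLI's stdout. Using `execFile` rather than `exec` avoids shell interpolation of the prompt, which matters when prompts come from user input.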
+ +### Basic Video Analysis + +```bash +# Analyze a video from URL +z-ai vision --prompt "Summarize what happens in this video" --image "https://example.com/video.mp4" + +# Note: Use --image flag for video URLs as well +z-ai vision -p "Describe the key events" -i "https://example.com/presentation.mp4" +``` + +### Analyze Local Videos + +```bash +# Analyze a local video file +z-ai vision -p "What activities are shown in this video?" -i "./recording.mp4" + +# Save response to file +z-ai vision -p "Provide a detailed summary" -i "./meeting.mp4" -o summary.json +``` + +### Advanced Video Analysis + +```bash +# Complex scene understanding with thinking +z-ai vision \ + -p "Analyze this video and identify: 1) Main events, 2) People and their actions, 3) Timeline of key moments" \ + -i "./event.mp4" \ + --thinking \ + -o analysis.json + +# Action detection +z-ai vision \ + -p "Identify all actions performed by people in this video" \ + -i "./sports.mp4" \ + --thinking +``` + +### Streaming Output + +```bash +# Stream the video analysis +z-ai vision -p "Describe this video content" -i "./video.mp4" --stream +``` + +### CLI Parameters + +- `--prompt, -p <text>`: **Required** - Question or instruction about the video +- `--image, -i <url-or-path>`: Optional - Video URL or local file path (despite the name, it works for videos too) +- `--thinking, -t`: Optional - Enable chain-of-thought reasoning for complex analysis (default: disabled) +- `--output, -o <file>`: Optional - Output file path (JSON format) +- `--stream`: Optional - Stream the response in real-time + +### Supported Video Formats + +- MP4 (.mp4) - Most widely supported format +- AVI (.avi) - Audio Video Interleave +- MOV (.mov) - QuickTime format +- WebM (.webm) - Web-optimized format +- MKV (.mkv) - Matroska format +- FLV (.flv) - Flash Video format + +### When to Use CLI vs SDK + +**Use CLI for:** +- Quick video summaries +- One-off video analysis +- Testing video understanding capabilities +- Simple automation scripts +- Generating
video descriptions + +**Use SDK for:** +- Multi-turn conversations about videos +- Complex video processing pipelines +- Production applications with error handling +- Custom integration with video processing logic +- Batch video processing with custom workflows + +## Recommended Approach + +For better performance and reliability with local videos, consider: +1. Uploading videos to a CDN and using URLs +2. For shorter videos, convert key frames to images for faster analysis +3. For long videos, consider chunking or sampling at intervals + +## Basic Video Understanding Implementation + +### Single Video Analysis + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function analyzeVideo(videoUrl, prompt) { + const zai = await ZAI.create(); + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { + type: 'text', + text: prompt + }, + { + type: 'video_url', + video_url: { + url: videoUrl + } + } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return response.choices[0]?.message?.content; +} + +// Usage examples +const summary = await analyzeVideo( + 'https://example.com/presentation.mp4', + 'Summarize the key points presented in this video' +); + +const actionDetection = await analyzeVideo( + 'https://example.com/sports.mp4', + 'Identify and describe all athletic actions performed in this video' +); +``` + +### Video Scene Understanding + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function understandVideoScenes(videoUrl) { + const zai = await ZAI.create(); + + const prompt = `Analyze this video and provide: +1. Overall summary of the video content +2. Main scenes or segments (with approximate timestamps if possible) +3. Key people or characters and their roles +4. Important actions or events in chronological order +5. Setting and environment description +6. 
Overall mood or tone`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } // Enable for detailed analysis + }); + + return response.choices[0]?.message?.content; +} + +// Usage +const sceneAnalysis = await understandVideoScenes( + 'https://example.com/documentary.mp4' +); +``` + +### Motion and Action Detection + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function detectActions(videoUrl, specificAction = null) { + const zai = await ZAI.create(); + + const prompt = specificAction + ? `Identify all instances of "${specificAction}" in this video. For each instance, describe when it occurs and provide details about how it's performed.` + : 'Identify and describe all significant actions and movements in this video. Include who is performing them and when they occur.'; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } + }); + + return response.choices[0]?.message?.content; +} + +// Usage +const runningActions = await detectActions( + 'https://example.com/sports.mp4', + 'running' +); + +const allActions = await detectActions( + 'https://example.com/activity.mp4' +); +``` + +### Event Timeline Extraction + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function extractTimeline(videoUrl) { + const zai = await ZAI.create(); + + const prompt = `Create a detailed timeline of events in this video: +- Identify key moments and transitions +- Note approximate timing (beginning, middle, end or specific timestamps if visible) +- Describe what happens at each key point +- Identify any cause-and-effect relationships between events + +Format as a chronological list.`; + + 
const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +### Video Content Classification + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function classifyVideo(videoUrl) { + const zai = await ZAI.create(); + + const prompt = `Classify this video content: +1. Primary category (e.g., educational, entertainment, sports, news, tutorial) +2. Sub-category or genre +3. Target audience +4. Content style (professional, casual, documentary, etc.) +5. Key themes or topics +6. Suggested tags (10-15 keywords) + +Format your response as structured JSON.`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'disabled' } + }); + + const content = response.choices[0]?.message?.content; + + try { + return JSON.parse(content); + } catch (e) { + return { rawResponse: content }; + } +} +``` + +## Advanced Use Cases + +### Multi-turn Video Conversation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class VideoConversation { + constructor() { + this.messages = []; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async loadVideo(videoUrl, initialQuestion) { + this.messages.push({ + role: 'user', + content: [ + { type: 'text', text: initialQuestion }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + }); + + return this.getResponse(); + } + + async askFollowUp(question) { + this.messages.push({ + role: 'user', + content: [ + { type: 'text', text: question } + ] + }); + + return this.getResponse(); + } + + async getResponse() { + const response = await this.zai.chat.completions.createVision({ + messages: 
this.messages, + thinking: { type: 'disabled' } + }); + + const assistantMessage = response.choices[0]?.message?.content; + + this.messages.push({ + role: 'assistant', + content: assistantMessage + }); + + return assistantMessage; + } +} + +// Usage +const conversation = new VideoConversation(); +await conversation.initialize(); + +const initial = await conversation.loadVideo( + 'https://example.com/lecture.mp4', + 'What is the main topic of this lecture?' +); + +const followup1 = await conversation.askFollowUp( + 'Can you explain the key concepts mentioned?' +); + +const followup2 = await conversation.askFollowUp( + 'What examples were used to illustrate these concepts?' +); +``` + +### Video Quality Assessment + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function assessVideoQuality(videoUrl) { + const zai = await ZAI.create(); + + const prompt = `Assess the quality of this video: +1. Visual quality (resolution, clarity, lighting) - Rate 1-10 +2. Audio quality (if audio is present) - Rate 1-10 +3. Camera work (stability, framing, composition) - Rate 1-10 +4. Production value (editing, transitions, effects) - Rate 1-10 +5. Content clarity (is the message clear?) - Rate 1-10 +6. Pacing (too fast, too slow, just right) +7. Technical issues (artifacts, blur, audio sync, etc.) +8. Overall rating - 1-10 +9. Specific recommendations for improvement + +Provide detailed feedback for each criterion.`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +### Video Content Moderation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function moderateVideo(videoUrl) { + const zai = await ZAI.create(); + + const prompt = `Review this video for content moderation: +1. 
Check for any inappropriate or sensitive content +2. Identify any potential safety concerns +3. Note any content that might violate common community guidelines +4. Assess age-appropriateness +5. Identify any copyrighted material visible (logos, brands, music) +6. Overall safety rating: Safe / Caution / Review Required + +Provide specific examples for any concerns identified.`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +### Video Transcript Generation (Visual Description) + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function generateVisualTranscript(videoUrl) { + const zai = await ZAI.create(); + + const prompt = `Generate a detailed visual transcript of this video: +- Describe what's happening in each scene +- Note any text that appears on screen +- Describe important visual elements +- Mention any scene changes or transitions +- Include descriptions of people's actions and expressions + +Format as a time-based narrative (e.g., "At the beginning...", "Then...", "Finally...").`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +### Sports Video Analysis + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function analyzeSportsVideo(videoUrl, sport = null) { + const zai = await ZAI.create(); + + const prompt = sport + ? `Analyze this ${sport} video in detail: +1. Identify players and their positions +2. Describe key plays and strategies +3. Note scoring events or important moments +4. Assess player performance +5. 
Identify any rule violations or fouls +6. Describe the pace and flow of the game` + : `Analyze this sports video: +1. Identify the sport being played +2. Describe the key actions and plays +3. Note any scoring or significant events +4. Describe player movements and strategies +5. Overall assessment of the game or match`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +### Educational Video Summarization + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function summarizeEducationalVideo(videoUrl) { + const zai = await ZAI.create(); + + const prompt = `Summarize this educational video for students: +1. Main topic or learning objective +2. Key concepts explained (in order) +3. Important definitions or terminology +4. Examples used to illustrate concepts +5. Visual aids or demonstrations shown +6. Key takeaways or conclusions +7. 
Suggested review points + +Format as a study guide.`; + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } + }); + + return response.choices[0]?.message?.content; +} +``` + +## Batch Video Processing + +### Process Multiple Videos + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class VideoBatchProcessor { + constructor() { + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async processVideo(videoUrl, prompt) { + const response = await this.zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return response.choices[0]?.message?.content; + } + + async processBatch(videoUrls, prompt) { + const results = []; + + for (const videoUrl of videoUrls) { + try { + console.log(`Processing: ${videoUrl}`); + const result = await this.processVideo(videoUrl, prompt); + results.push({ videoUrl, success: true, result }); + + // Add delay to avoid rate limiting + await new Promise(resolve => setTimeout(resolve, 1000)); + } catch (error) { + results.push({ + videoUrl, + success: false, + error: error.message + }); + } + } + + return results; + } +} + +// Usage +const processor = new VideoBatchProcessor(); +await processor.initialize(); + +const videos = [ + 'https://example.com/video1.mp4', + 'https://example.com/video2.mp4', + 'https://example.com/video3.mp4' +]; + +const results = await processor.processBatch( + videos, + 'Provide a brief summary of this video suitable for a content catalog' +); +``` + +## Best Practices + +### 1. 
Video Preparation +- Use standard video formats (MP4, MOV, AVI) +- Ensure videos are accessible via public URLs or properly encoded +- For long videos, consider creating shorter clips for specific analysis +- Optimize video size for faster processing +- Ensure good lighting and audio quality in source videos + +### 2. Prompt Engineering for Videos +- Be specific about temporal aspects ("beginning", "throughout", "at the end") +- Mention what type of analysis you need (actions, events, scenes, etc.) +- For long videos, ask for summaries or key moments +- Use thinking mode for complex temporal reasoning +- Specify if you need chronological or thematic organization + +### 3. Error Handling + +```javascript +async function safeVideoAnalysis(videoUrl, prompt) { + try { + const zai = await ZAI.create(); + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'disabled' } + }); + + return { + success: true, + content: response.choices[0]?.message?.content + }; + } catch (error) { + console.error('Video analysis error:', error); + return { + success: false, + error: error.message + }; + } +} +``` + +### 4. Performance Optimization +- Cache SDK instance for batch processing +- Implement request throttling (add delays between requests) +- Process videos asynchronously when possible +- For very long videos, consider analyzing at specific intervals +- Use appropriate thinking mode (disabled for simple descriptions, enabled for complex analysis) + +### 5. Security Considerations +- Validate video URLs before processing +- Implement rate limiting for public APIs +- Sanitize user-provided video URLs +- Never expose SDK credentials in client-side code +- Implement content moderation for user-uploaded videos +- Consider video file size limits + +## Common Use Cases + +1. 
**Content Moderation**: Automatically review video uploads for policy compliance +2. **Video Cataloging**: Generate descriptions and tags for video libraries +3. **Sports Analysis**: Analyze games, identify plays, assess performance +4. **Educational Content**: Summarize lectures, create study guides +5. **Security & Surveillance**: Detect events, track activities (with appropriate authorization) +6. **Quality Control**: Assess video production quality +7. **Social Media**: Generate video captions and descriptions +8. **Training & Documentation**: Analyze training videos, create documentation +9. **Event Recording**: Summarize meetings, conferences, presentations +10. **Entertainment**: Analyze films, shows for content, themes, scenes + +## Integration Examples + +### Express.js API Endpoint + +```javascript +import express from 'express'; +import ZAI from 'z-ai-web-dev-sdk'; + +const app = express(); +app.use(express.json()); + +let zaiInstance; + +async function initZAI() { + zaiInstance = await ZAI.create(); +} + +// Analyze video from URL +app.post('/api/analyze-video', async (req, res) => { + try { + const { videoUrl, prompt } = req.body; + + if (!videoUrl || !prompt) { + return res.status(400).json({ + error: 'videoUrl and prompt are required' + }); + } + + const response = await zaiInstance.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'disabled' } + }); + + res.json({ + success: true, + analysis: response.choices[0]?.message?.content + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +// Get video summary +app.post('/api/video-summary', async (req, res) => { + try { + const { videoUrl } = req.body; + + if (!videoUrl) { + return res.status(400).json({ error: 'videoUrl is required' }); + } + + const prompt = 'Provide a comprehensive summary of this 
video including: 1) Main content/topic, 2) Key events in chronological order, 3) Important people or subjects, 4) Overall takeaway.'; + + const response = await zaiInstance.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: 'enabled' } + }); + + res.json({ + success: true, + summary: response.choices[0]?.message?.content + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('Video understanding API running on port 3000'); + }); +}); +``` + +### Next.js API Route + +```javascript +// pages/api/video-understand.js +import ZAI from 'z-ai-web-dev-sdk'; + +let zaiInstance = null; + +async function getZAI() { + if (!zaiInstance) { + zaiInstance = await ZAI.create(); + } + return zaiInstance; +} + +export default async function handler(req, res) { + if (req.method !== 'POST') { + return res.status(405).json({ error: 'Method not allowed' }); + } + + try { + const { videoUrl, prompt, enableThinking = false } = req.body; + + if (!videoUrl || !prompt) { + return res.status(400).json({ + error: 'videoUrl and prompt are required' + }); + } + + const zai = await getZAI(); + + const response = await zai.chat.completions.createVision({ + messages: [ + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ], + thinking: { type: enableThinking ? 
'enabled' : 'disabled' } + }); + + res.status(200).json({ + success: true, + analysis: response.choices[0]?.message?.content + }); + } catch (error) { + console.error('Error:', error); + res.status(500).json({ + success: false, + error: error.message + }); + } +} +``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code, never in client/browser code + +**Issue**: Video not loading or being analyzed +- **Solution**: Verify the video URL is accessible, returns correct MIME type, and is in a supported format + +**Issue**: Inaccurate temporal analysis +- **Solution**: Enable thinking mode for complex temporal reasoning, provide more specific prompts about time/sequence + +**Issue**: Slow response times for videos +- **Solution**: Videos take longer to process than images; consider shorter clips or sampling for long videos + +**Issue**: Missing details from video +- **Solution**: Be more specific in your prompt, ask about particular time segments or aspects + +**Issue**: Video format not supported +- **Solution**: Convert video to MP4 (most widely supported), check that URL returns proper video MIME type + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- The SDK is already installed - import as shown in examples +- Use `video_url` content type for video files +- Video analysis takes longer than image analysis - be patient +- Enable thinking mode for complex temporal reasoning and event detection +- Structure prompts to include temporal information (beginning, middle, end) +- Handle errors gracefully in production +- Implement rate limiting and delays for batch processing +- Validate and sanitize user inputs +- Consider privacy and security when processing user videos +- For very long videos, consider analyzing specific segments or key frames diff --git a/skills/video-understand/scripts/video-understand.ts 
b/skills/video-understand/scripts/video-understand.ts new file mode 100755 index 0000000..9df6aa3 --- /dev/null +++ b/skills/video-understand/scripts/video-understand.ts @@ -0,0 +1,41 @@ +import ZAI, { VisionMessage } from 'z-ai-web-dev-sdk'; + +async function main(videoUrl: string, prompt: string) { + try { + const zai = await ZAI.create(); + + const messages: VisionMessage[] = [ + { + role: 'assistant', + content: [ + { type: 'text', text: 'Output only text, no markdown.' } + ] + }, + { + role: 'user', + content: [ + { type: 'text', text: prompt }, + { type: 'video_url', video_url: { url: videoUrl } } + ] + } + ]; + + const response = await zai.chat.completions.createVision({ + model: 'glm-4.6v', + messages, + thinking: { type: 'disabled' } + }); + + const reply = response.choices?.[0]?.message?.content; + console.log('Video Understanding Result:'); + console.log(reply ?? JSON.stringify(response, null, 2)); + } catch (err: any) { + console.error('Video understanding failed:', err?.message || err); + } +} + +// Example usage - analyze a video +main( + "https://example.com/sample-video.mp4", + "Please analyze this video and describe the main events, actions, and key moments in chronological order." 
+); diff --git a/skills/web-reader/LICENSE.txt b/skills/web-reader/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/web-reader/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/web-reader/SKILL.md b/skills/web-reader/SKILL.md new file mode 100755 index 0000000..ed3f9db --- /dev/null +++ b/skills/web-reader/SKILL.md @@ -0,0 +1,1140 @@ +--- +name: web-reader +description: Implement web page content extraction capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to scrape web pages, extract article content, retrieve page metadata, or build applications that process web content. Supports automatic content extraction with title, HTML, and publication time retrieval. 
+license: MIT +--- + +# Web Reader Skill + +This skill guides the implementation of web page reading and content extraction functionality using the z-ai-web-dev-sdk package, enabling applications to fetch and process web page content programmatically. + +## Skills Path + +**Skill Location**: `{project_path}/skills/web-reader` + +This skill is located at the above path in your project. + +**Reference Scripts**: Example test scripts are available in the `{Skill Location}/scripts/` directory for quick testing and reference. See `{Skill Location}/scripts/web-reader.ts` for a working example. + +## Overview + +Web Reader allows you to build applications that can extract content from web pages, retrieve article metadata, and process HTML content. The API automatically handles content extraction, providing clean, structured data from any web URL. + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## CLI Usage (For Simple Tasks) + +For simple web page content extraction, you can use the z-ai CLI instead of writing code. This is ideal for quick content scraping, testing URLs, or simple automation tasks. 
+ +### Basic Page Reading + +```bash +# Extract content from a web page +z-ai function --name "page_reader" --args '{"url": "https://example.com"}' + +# Using short options +z-ai function -n page_reader -a '{"url": "https://www.example.com/article"}' +``` + +### Save Page Content + +```bash +# Save extracted content to JSON file +z-ai function \ + -n page_reader \ + -a '{"url": "https://news.example.com/article"}' \ + -o page_content.json + +# Extract and save blog post +z-ai function \ + -n page_reader \ + -a '{"url": "https://blog.example.com/post/123"}' \ + -o blog_post.json +``` + +### Common Use Cases + +```bash +# Extract news article +z-ai function \ + -n page_reader \ + -a '{"url": "https://news.site.com/breaking-news"}' \ + -o news.json + +# Read documentation page +z-ai function \ + -n page_reader \ + -a '{"url": "https://docs.example.com/getting-started"}' \ + -o docs.json + +# Scrape blog content +z-ai function \ + -n page_reader \ + -a '{"url": "https://techblog.com/ai-trends-2024"}' \ + -o blog.json + +# Extract research article +z-ai function \ + -n page_reader \ + -a '{"url": "https://research.org/papers/quantum-computing"}' \ + -o research.json +``` + +### CLI Parameters + +- `--name, -n`: **Required** - Function name (use "page_reader") +- `--args, -a`: **Required** - JSON arguments object with: + - `url` (string, required): The URL of the web page to read +- `--output, -o `: Optional - Output file path (JSON format) + +### Response Structure + +The CLI returns a JSON object containing: +- `title`: Page title +- `html`: Main content HTML +- `text`: Plain text content +- `publish_time`: Publication timestamp (if available) +- `url`: Original URL +- `metadata`: Additional page metadata + +### Example Response + +```json +{ + "title": "Introduction to Machine Learning", + "html": "
<h1>Introduction to Machine Learning</h1><p>Machine learning is...</p>
", + "text": "Introduction to Machine Learning\n\nMachine learning is...", + "publish_time": "2024-01-15T10:30:00Z", + "url": "https://example.com/ml-intro", + "metadata": { + "author": "John Doe", + "description": "A comprehensive guide to ML" + } +} +``` + +### Processing Multiple URLs + +```bash +# Create a simple script to process multiple URLs +for url in \ + "https://site1.com/article1" \ + "https://site2.com/article2" \ + "https://site3.com/article3" +do + filename=$(echo $url | md5sum | cut -d' ' -f1) + z-ai function -n page_reader -a "{\"url\": \"$url\"}" -o "${filename}.json" +done +``` + +### When to Use CLI vs SDK + +**Use CLI for:** +- Quick content extraction +- Testing URL accessibility +- Simple web scraping tasks +- One-off content retrieval + +**Use SDK for:** +- Batch URL processing with custom logic +- Integration with web applications +- Complex content processing pipelines +- Production applications with error handling + +## How It Works + +The Web Reader uses the `page_reader` function to: +1. Fetch the web page content +2. Extract main article content and metadata +3. Parse and clean the HTML +4. 
Return structured data including title, content, and publication time + +## Basic Web Reading Implementation + +### Simple Page Reading + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function readWebPage(url) { + try { + const zai = await ZAI.create(); + + const result = await zai.functions.invoke('page_reader', { + url: url + }); + + console.log('Title:', result.data.title); + console.log('URL:', result.data.url); + console.log('Published:', result.data.publishedTime); + console.log('HTML Content:', result.data.html); + console.log('Tokens Used:', result.data.usage.tokens); + + return result.data; + } catch (error) { + console.error('Page reading failed:', error.message); + throw error; + } +} + +// Usage +const pageData = await readWebPage('https://example.com/article'); +console.log('Page title:', pageData.title); +``` + +### Extract Article Text Only + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function extractArticleText(url) { + const zai = await ZAI.create(); + + const result = await zai.functions.invoke('page_reader', { + url: url + }); + + // Convert HTML to plain text (basic approach) + const plainText = result.data.html + .replace(/<[^>]*>/g, ' ') + .replace(/\s+/g, ' ') + .trim(); + + return { + title: result.data.title, + text: plainText, + url: result.data.url, + publishedTime: result.data.publishedTime + }; +} + +// Usage +const article = await extractArticleText('https://news.example.com/story'); +console.log(article.title); +console.log(article.text.substring(0, 200) + '...'); +``` + +### Read Multiple Pages + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function readMultiplePages(urls) { + const zai = await ZAI.create(); + const results = []; + + for (const url of urls) { + try { + const result = await zai.functions.invoke('page_reader', { + url: url + }); + + results.push({ + url: url, + success: true, + data: result.data + }); + } catch (error) { + results.push({ + url: url, + success: false, + 
error: error.message + }); + } + } + + return results; +} + +// Usage +const urls = [ + 'https://example.com/article1', + 'https://example.com/article2', + 'https://example.com/article3' +]; + +const pages = await readMultiplePages(urls); +pages.forEach(page => { + if (page.success) { + console.log(`✓ ${page.data.title}`); + } else { + console.log(`✗ ${page.url}: ${page.error}`); + } +}); +``` + +## Advanced Use Cases + +### Web Content Analyzer + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class WebContentAnalyzer { + constructor() { + this.cache = new Map(); + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async readPage(url, useCache = true) { + // Check cache + if (useCache && this.cache.has(url)) { + console.log('Returning cached result for:', url); + return this.cache.get(url); + } + + // Fetch fresh content + const result = await this.zai.functions.invoke('page_reader', { + url: url + }); + + // Cache the result + if (useCache) { + this.cache.set(url, result.data); + } + + return result.data; + } + + async getPageMetadata(url) { + const data = await this.readPage(url); + + return { + title: data.title, + url: data.url, + publishedTime: data.publishedTime, + contentLength: data.html.length, + wordCount: this.estimateWordCount(data.html) + }; + } + + estimateWordCount(html) { + const text = html.replace(/<[^>]*>/g, ' '); + const words = text.split(/\s+/).filter(word => word.length > 0); + return words.length; + } + + async comparePages(url1, url2) { + const [page1, page2] = await Promise.all([ + this.readPage(url1), + this.readPage(url2) + ]); + + return { + page1: { + title: page1.title, + wordCount: this.estimateWordCount(page1.html), + published: page1.publishedTime + }, + page2: { + title: page2.title, + wordCount: this.estimateWordCount(page2.html), + published: page2.publishedTime + } + }; + } + + clearCache() { + this.cache.clear(); + } +} + +// Usage +const analyzer = new WebContentAnalyzer(); +await 
analyzer.initialize(); + +const metadata = await analyzer.getPageMetadata('https://example.com/article'); +console.log('Article Metadata:', metadata); + +const comparison = await analyzer.comparePages( + 'https://example.com/article1', + 'https://example.com/article2' +); +console.log('Comparison:', comparison); +``` + +### RSS Feed Reader + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class FeedReader { + constructor() { + this.articles = []; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async fetchArticlesFromUrls(urls) { + const articles = []; + + for (const url of urls) { + try { + const result = await this.zai.functions.invoke('page_reader', { + url: url + }); + + articles.push({ + title: result.data.title, + url: result.data.url, + publishedTime: result.data.publishedTime, + content: result.data.html, + fetchedAt: new Date().toISOString() + }); + + console.log(`Fetched: ${result.data.title}`); + } catch (error) { + console.error(`Failed to fetch ${url}:`, error.message); + } + } + + this.articles = articles; + return articles; + } + + getRecentArticles(limit = 10) { + return this.articles + .sort((a, b) => { + const dateA = new Date(a.publishedTime || a.fetchedAt); + const dateB = new Date(b.publishedTime || b.fetchedAt); + return dateB - dateA; + }) + .slice(0, limit); + } + + searchArticles(keyword) { + return this.articles.filter(article => { + const searchText = `${article.title} ${article.content}`.toLowerCase(); + return searchText.includes(keyword.toLowerCase()); + }); + } +} + +// Usage +const reader = new FeedReader(); +await reader.initialize(); + +const feedUrls = [ + 'https://example.com/article1', + 'https://example.com/article2', + 'https://example.com/article3' +]; + +await reader.fetchArticlesFromUrls(feedUrls); +const recent = reader.getRecentArticles(5); +console.log('Recent articles:', recent.map(a => a.title)); +``` + +### Content Aggregator + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async 
function aggregateContent(urls, options = {}) { + const zai = await ZAI.create(); + const aggregated = { + sources: [], + totalWords: 0, + aggregatedAt: new Date().toISOString() + }; + + for (const url of urls) { + try { + const result = await zai.functions.invoke('page_reader', { + url: url + }); + + const text = result.data.html.replace(/<[^>]*>/g, ' '); + const wordCount = text.split(/\s+/).filter(w => w.length > 0).length; + + aggregated.sources.push({ + title: result.data.title, + url: result.data.url, + publishedTime: result.data.publishedTime, + wordCount: wordCount, + excerpt: text.substring(0, 200).trim() + '...' + }); + + aggregated.totalWords += wordCount; + + if (options.delay) { + await new Promise(resolve => setTimeout(resolve, options.delay)); + } + } catch (error) { + console.error(`Failed to fetch ${url}:`, error.message); + } + } + + return aggregated; +} + +// Usage +const sources = [ + 'https://example.com/news1', + 'https://example.com/news2', + 'https://example.com/news3' +]; + +const aggregated = await aggregateContent(sources, { delay: 1000 }); +console.log(`Aggregated ${aggregated.sources.length} sources`); +console.log(`Total words: ${aggregated.totalWords}`); +``` + +### Web Scraping Pipeline + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class ScrapingPipeline { + constructor() { + this.processors = []; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + addProcessor(name, processorFn) { + this.processors.push({ name, fn: processorFn }); + } + + async scrape(url) { + // Fetch the page + const result = await this.zai.functions.invoke('page_reader', { + url: url + }); + + let data = { + raw: result.data, + processed: {} + }; + + // Run through processors + for (const processor of this.processors) { + try { + data.processed[processor.name] = await processor.fn(data.raw); + console.log(`✓ Processed with ${processor.name}`); + } catch (error) { + console.error(`✗ Failed ${processor.name}:`, error.message); + 
data.processed[processor.name] = null; + } + } + + return data; + } +} + +// Processor functions +function extractLinks(pageData) { + const linkRegex = /href=["'](https?:\/\/[^"']+)["']/g; + const links = []; + let match; + + while ((match = linkRegex.exec(pageData.html)) !== null) { + links.push(match[1]); + } + + return [...new Set(links)]; // Remove duplicates +} + +function extractImages(pageData) { + const imgRegex = /src=["'](https?:\/\/[^"']+\.(jpg|jpeg|png|gif|webp))["']/gi; + const images = []; + let match; + + while ((match = imgRegex.exec(pageData.html)) !== null) { + images.push(match[1]); + } + + return [...new Set(images)]; +} + +function extractPlainText(pageData) { + // Drop <script> and <style> blocks first, then strip the remaining tags + return pageData.html + .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') + .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '') + .replace(/<[^>]*>/g, ' ') + .replace(/\s+/g, ' ') + .trim(); +} + +// Usage +const pipeline = new ScrapingPipeline(); +await pipeline.initialize(); + +pipeline.addProcessor('links', extractLinks); +pipeline.addProcessor('images', extractImages); +pipeline.addProcessor('plainText', extractPlainText); + +const result = await pipeline.scrape('https://example.com/article'); +console.log('Links found:', result.processed.links.length); +console.log('Images found:', result.processed.images.length); +console.log('Text length:', result.processed.plainText.length); +``` + +## Response Format + +### Successful Response + +```typescript +{ + code: 200, + status: 200, + data: { + title: "Article Title", + url: "https://example.com/article", + html: "
<p>Article content...</p>
", + publishedTime: "2025-01-15T10:30:00Z", + usage: { + tokens: 1500 + } + }, + meta: { + usage: { + tokens: 1500 + } + } +} +``` + +### Response Fields + +| Field | Type | Description | +|-------|------|-------------| +| `code` | number | Response status code | +| `status` | number | HTTP status code | +| `data.title` | string | Page title | +| `data.url` | string | Page URL | +| `data.html` | string | Extracted HTML content | +| `data.publishedTime` | string | Publication date (optional) | +| `data.usage.tokens` | number | Tokens used for processing | +| `meta.usage.tokens` | number | Total tokens used | + +## Best Practices + +### 1. Error Handling + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function safeReadPage(url) { + try { + const zai = await ZAI.create(); + + // Validate URL (must start with http:// or https://) + if (!url || !/^https?:\/\//i.test(url)) { + throw new Error('Invalid URL format'); + } + + const result = await zai.functions.invoke('page_reader', { + url: url + }); + + // Check response status + if (result.code !== 200) { + throw new Error(`Failed to fetch page: ${result.code}`); + } + + // Verify essential data + if (!result.data.html || !result.data.title) { + throw new Error('Incomplete page data received'); + } + + return { + success: true, + data: result.data + }; + } catch (error) { + console.error('Page reading error:', error); + return { + success: false, + error: error.message + }; + } +} +``` + +### 2.
Rate Limiting + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class RateLimitedReader { + constructor(requestsPerMinute = 10) { + this.requestsPerMinute = requestsPerMinute; + this.requestTimes = []; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async readPage(url) { + await this.waitForRateLimit(); + + // Record the request before sending it so failed calls still count toward the limit + this.requestTimes.push(Date.now()); + + const result = await this.zai.functions.invoke('page_reader', { + url: url + }); + + return result.data; + } + + async waitForRateLimit() { + const now = Date.now(); + const oneMinuteAgo = now - 60000; + + // Remove old timestamps + this.requestTimes = this.requestTimes.filter(time => time > oneMinuteAgo); + + // Check if we need to wait + if (this.requestTimes.length >= this.requestsPerMinute) { + const oldestRequest = this.requestTimes[0]; + const waitTime = 60000 - (now - oldestRequest); + + if (waitTime > 0) { + console.log(`Rate limit reached. Waiting ${waitTime}ms...`); + await new Promise(resolve => setTimeout(resolve, waitTime)); + } + } + } +} + +// Usage +const reader = new RateLimitedReader(10); // 10 requests per minute +await reader.initialize(); + +const urls = ['https://example.com/1', 'https://example.com/2']; +for (const url of urls) { + const data = await reader.readPage(url); + console.log('Fetched:', data.title); +} +``` + +### 3.
Caching Strategy + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class CachedWebReader { + constructor(cacheDuration = 3600000) { // 1 hour default + this.cache = new Map(); + this.cacheDuration = cacheDuration; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async readPage(url, forceRefresh = false) { + const cacheKey = url; + const cached = this.cache.get(cacheKey); + + // Return cached if valid and not forcing refresh + if (cached && !forceRefresh) { + const age = Date.now() - cached.timestamp; + if (age < this.cacheDuration) { + console.log('Returning cached content for:', url); + return cached.data; + } + } + + // Fetch fresh content + const result = await this.zai.functions.invoke('page_reader', { + url: url + }); + + // Update cache + this.cache.set(cacheKey, { + data: result.data, + timestamp: Date.now() + }); + + return result.data; + } + + clearCache() { + this.cache.clear(); + } + + getCacheStats() { + return { + size: this.cache.size, + entries: Array.from(this.cache.keys()) + }; + } +} + +// Usage +const reader = new CachedWebReader(3600000); // 1 hour cache +await reader.initialize(); + +const data1 = await reader.readPage('https://example.com'); // Fresh fetch +const data2 = await reader.readPage('https://example.com'); // From cache +const data3 = await reader.readPage('https://example.com', true); // Force refresh +``` + +### 4. 
Parallel Processing + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function readPagesInParallel(urls, concurrency = 3) { + const zai = await ZAI.create(); + const results = []; + + // Process in batches + for (let i = 0; i < urls.length; i += concurrency) { + const batch = urls.slice(i, i + concurrency); + + const batchResults = await Promise.allSettled( + batch.map(url => + zai.functions.invoke('page_reader', { url }) + .then(result => ({ + url: url, + success: true, + data: result.data + })) + .catch(error => ({ + url: url, + success: false, + error: error.message + })) + ) + ); + + results.push(...batchResults.map(r => r.value)); + console.log(`Completed batch ${Math.floor(i / concurrency) + 1}`); + } + + return results; +} + +// Usage +const urls = [ + 'https://example.com/1', + 'https://example.com/2', + 'https://example.com/3', + 'https://example.com/4', + 'https://example.com/5' +]; + +const results = await readPagesInParallel(urls, 2); // 2 concurrent requests +results.forEach(result => { + if (result.success) { + console.log(`✓ ${result.data.title}`); + } else { + console.log(`✗ ${result.url}: ${result.error}`); + } +}); +``` + +### 5. 
Content Processing + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class ContentProcessor { + static extractMainContent(html) { + // Remove scripts, styles, and comments + let content = html + .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') + .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '') + .replace(/<!--[\s\S]*?-->/g, ''); + + return content; + } + + static htmlToPlainText(html) { + return html + .replace(/<br\s*\/?>/gi, '\n') + .replace(/<\/p>/gi, '\n\n') + .replace(/<[^>]*>/g, '') + .replace(/&nbsp;/g, ' ') + .replace(/&lt;/g, '<') + .replace(/&gt;/g, '>') + .replace(/&quot;/g, '"') + // Decode &amp; last so sequences like "&amp;lt;" are not double-decoded + .replace(/&amp;/g, '&') + .replace(/\s+/g, ' ') + .trim(); + } + + static extractMetadata(html) { + const metadata = {}; + + // Extract meta description + const descMatch = html.match(/ k.trim()); + + // Extract author + const authorMatch = html.match(/ { + try { + const { url } = req.body; + + if (!url) { + return res.status(400).json({ + error: 'URL is required' + }); + } + + const result = await zaiInstance.functions.invoke('page_reader', { + url: url + }); + + res.json({ + success: true, + data: { + title: result.data.title, + url: result.data.url, + content: result.data.html, + publishedTime: result.data.publishedTime, + tokensUsed: result.data.usage.tokens + } + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +app.post('/api/read-multiple', async (req, res) => { + try { + const { urls } = req.body; + + if (!urls || !Array.isArray(urls)) { + return res.status(400).json({ + error: 'URLs array is required' + }); + } + + const results = await Promise.allSettled( + urls.map(url => + zaiInstance.functions.invoke('page_reader', { url }) + .then(result => ({ + url: url, + success: true, + data: result.data + })) + .catch(error => ({ + url: url, + success: false, + error: error.message + })) + ) + ); + + res.json({ + success: true, + results: results.map(r => r.value) + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); +
+initZAI().then(() => { + app.listen(3000, () => { + console.log('Web reader API running on port 3000'); + }); +}); +``` + +### Scheduled Content Fetcher + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; +import cron from 'node-cron'; + +class ScheduledFetcher { + constructor() { + this.urls = []; + this.results = []; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + addUrl(url, schedule) { + this.urls.push({ url, schedule }); + } + + async fetchContent(url) { + try { + const result = await this.zai.functions.invoke('page_reader', { + url: url + }); + + return { + url: url, + success: true, + title: result.data.title, + content: result.data.html, + fetchedAt: new Date().toISOString() + }; + } catch (error) { + return { + url: url, + success: false, + error: error.message, + fetchedAt: new Date().toISOString() + }; + } + } + + startScheduledFetch(url, schedule) { + cron.schedule(schedule, async () => { + console.log(`Fetching ${url}...`); + const result = await this.fetchContent(url); + this.results.push(result); + + // Keep only last 100 results + if (this.results.length > 100) { + this.results = this.results.slice(-100); + } + + console.log(`Fetched: ${result.success ? result.title : result.error}`); + }); + } + + start() { + for (const { url, schedule } of this.urls) { + this.startScheduledFetch(url, schedule); + } + } + + getResults() { + return this.results; + } +} + +// Usage +const fetcher = new ScheduledFetcher(); +await fetcher.initialize(); + +// Fetch every hour +fetcher.addUrl('https://example.com/news', '0 * * * *'); + +// Fetch every day at midnight +fetcher.addUrl('https://example.com/daily', '0 0 * * *'); + +fetcher.start(); +console.log('Scheduled fetching started'); +``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code + +**Issue**: Failed to fetch page (404, 403, etc.) 
+- **Solution**: Verify the URL is accessible and not behind authentication/paywall + +**Issue**: Incomplete or missing content +- **Solution**: Some pages may have dynamic content that requires JavaScript. The reader extracts static HTML content. + +**Issue**: High token usage +- **Solution**: The token usage depends on page size. Consider caching frequently accessed pages. + +**Issue**: Slow response times +- **Solution**: Implement caching, use parallel processing for multiple URLs, and consider rate limiting + +**Issue**: Empty HTML content +- **Solution**: Check if the page requires authentication or has anti-scraping measures. Verify the URL is correct. + +## Performance Tips + +1. **Implement caching**: Cache frequently accessed pages to reduce API calls +2. **Use parallel processing**: Fetch multiple pages concurrently (with rate limiting) +3. **Process content efficiently**: Extract only needed information from HTML +4. **Set timeouts**: Implement reasonable timeouts for page fetching +5. **Monitor token usage**: Track usage to optimize costs +6. 
**Batch operations**: Group multiple URL fetches when possible + +## Security Considerations + +- Validate all URLs before processing +- Sanitize extracted HTML content before displaying +- Implement rate limiting to prevent abuse +- Never expose SDK credentials in client-side code +- Be respectful of robots.txt and website terms of service +- Handle user data according to privacy regulations +- Implement proper error handling for failed requests + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- The SDK is already installed - import as shown in examples +- Implement proper error handling for robust applications +- Use caching to improve performance and reduce costs +- Respect website terms of service and rate limits +- Process HTML content carefully to extract meaningful data +- Monitor token usage for cost optimization diff --git a/skills/web-reader/scripts/web-reader.ts b/skills/web-reader/scripts/web-reader.ts new file mode 100755 index 0000000..f01e009 --- /dev/null +++ b/skills/web-reader/scripts/web-reader.ts @@ -0,0 +1,37 @@ +import ZAI from 'z-ai-web-dev-sdk'; + +interface PageReaderFunctionResult { + code: number; + data: { + html: string; + publishedTime?: string; + title: string; + url: string; + usage: { + tokens: number; + }; + }; + meta: { + usage: { + tokens: number; + }; + }; + status: number; +} + +async function main(url: string) { + try { + const zai = await ZAI.create(); + + const results: PageReaderFunctionResult = await zai.functions.invoke('page_reader', { + url: url + }); + + console.log('Web reader invocation succeeded. 
Results:'); + console.log(JSON.stringify(results, null, 2)); + } catch (err: any) { + console.error('page_reader failed:', err?.message || err); + } +} + +main('https://www.google.com'); diff --git a/skills/web-search/LICENSE.txt b/skills/web-search/LICENSE.txt new file mode 100755 index 0000000..1e54539 --- /dev/null +++ b/skills/web-search/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 z-ai-web-dev-sdk Skills + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/web-search/SKILL.md b/skills/web-search/SKILL.md new file mode 100755 index 0000000..7ea2f63 --- /dev/null +++ b/skills/web-search/SKILL.md @@ -0,0 +1,912 @@ +--- +name: web-search +description: Implement web search capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to search for real-time information from the web, retrieve up-to-date content beyond the knowledge cutoff, or find the latest news and data. Returns structured search results with URLs, snippets, and metadata. 
+license: MIT +--- + +# Web Search Skill + +This skill guides the implementation of web search functionality using the z-ai-web-dev-sdk package, enabling applications to search the web and retrieve current information. + +## Installation Path + +**Recommended Location**: `{project_path}/skills/web-search` + +Extract this skill package to the above path in your project. + +**Reference Scripts**: Example test scripts are available in the `{project_path}/skills/web-search/scripts/` directory for quick testing and reference. See `{project_path}/skills/web-search/scripts/web_search.ts` for a working example. + +## Overview + +The Web Search skill allows you to build applications that can search the internet, retrieve current information, and access real-time data from web sources. + +**IMPORTANT**: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code. + +## Prerequisites + +The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below. + +## CLI Usage (For Simple Tasks) + +For simple web search queries, you can use the z-ai CLI instead of writing code. This is ideal for quick information retrieval, testing search functionality, or command-line automation. 
+ +### Basic Web Search + +```bash +# Simple search query +z-ai function --name "web_search" --args '{"query": "artificial intelligence"}' + +# Using short options +z-ai function -n web_search -a '{"query": "latest tech news"}' +``` + +### Search with Custom Parameters + +```bash +# Limit number of results +z-ai function \ + -n web_search \ + -a '{"query": "machine learning", "num": 5}' + +# Search with recency filter (results from last N days) +z-ai function \ + -n web_search \ + -a '{"query": "cryptocurrency news", "num": 10, "recency_days": 7}' +``` + +### Save Search Results + +```bash +# Save results to JSON file +z-ai function \ + -n web_search \ + -a '{"query": "climate change research", "num": 5}' \ + -o search_results.json + +# Recent news with file output +z-ai function \ + -n web_search \ + -a '{"query": "AI breakthroughs", "num": 3, "recency_days": 1}' \ + -o ai_news.json +``` + +### Advanced Search Examples + +```bash +# Search for specific topics +z-ai function \ + -n web_search \ + -a '{"query": "quantum computing applications", "num": 8}' \ + -o quantum.json + +# Find recent scientific papers +z-ai function \ + -n web_search \ + -a '{"query": "genomics research", "num": 5, "recency_days": 30}' \ + -o genomics.json + +# Technology news from last 24 hours +z-ai function \ + -n web_search \ + -a '{"query": "tech industry updates", "recency_days": 1}' \ + -o today_tech.json +``` + +### CLI Parameters + +- `--name, -n`: **Required** - Function name (use "web_search") +- `--args, -a`: **Required** - JSON arguments object with: + - `query` (string, required): Search keywords + - `num` (number, optional): Number of results (default: 10) + - `recency_days` (number, optional): Filter results from last N days +- `--output, -o `: Optional - Output file path (JSON format) + +### Search Result Structure + +Each result contains: +- `url`: Full URL of the result +- `name`: Title of the page +- `snippet`: Preview text/description +- `host_name`: Domain name +- 
`rank`: Result ranking +- `date`: Publication/update date +- `favicon`: Favicon URL + +### When to Use CLI vs SDK + +**Use CLI for:** +- Quick information lookups +- Testing search queries +- Simple automation scripts +- One-off research tasks + +**Use SDK for:** +- Dynamic search in applications +- Multi-step search workflows +- Custom result processing and filtering +- Production applications with complex logic + +## Search Result Type + +Each search result is a `SearchFunctionResultItem` with the following structure: + +```typescript +interface SearchFunctionResultItem { + url: string; // Full URL of the result + name: string; // Title of the page + snippet: string; // Preview text/description + host_name: string; // Domain name + rank: number; // Result ranking + date: string; // Publication/update date + favicon: string; // Favicon URL +} +``` + +## Basic Web Search + +### Simple Search Query + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function searchWeb(query) { + const zai = await ZAI.create(); + + const results = await zai.functions.invoke('web_search', { + query: query, + num: 10 + }); + + return results; +} + +// Usage +const searchResults = await searchWeb('What is the capital of France?'); +console.log('Search Results:', searchResults); +``` + +### Search with Custom Result Count + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function searchWithLimit(query, numberOfResults) { + const zai = await ZAI.create(); + + const results = await zai.functions.invoke('web_search', { + query: query, + num: numberOfResults + }); + + return results; +} + +// Usage - Get top 5 results +const topResults = await searchWithLimit('artificial intelligence news', 5); + +// Usage - Get top 20 results +const moreResults = await searchWithLimit('JavaScript frameworks', 20); +``` + +### Formatted Search Results + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function getFormattedResults(query) { + const zai = await ZAI.create(); 
+ + const results = await zai.functions.invoke('web_search', { + query: query, + num: 10 + }); + + // Format results for display + const formatted = results.map((item, index) => ({ + position: index + 1, + title: item.name, + url: item.url, + description: item.snippet, + domain: item.host_name, + publishDate: item.date + })); + + return formatted; +} + +// Usage +const results = await getFormattedResults('climate change solutions'); +results.forEach(result => { + console.log(`${result.position}. ${result.title}`); + console.log(` ${result.url}`); + console.log(` ${result.description}`); + console.log(''); +}); +``` + +## Advanced Use Cases + +### Search with Result Processing + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class SearchProcessor { + constructor() { + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async search(query, options = {}) { + const { + num = 10, + filterDomain = null, + minSnippetLength = 0 + } = options; + + const results = await this.zai.functions.invoke('web_search', { + query: query, + num: num + }); + + // Filter results + let filtered = results; + + if (filterDomain) { + filtered = filtered.filter(item => + item.host_name.includes(filterDomain) + ); + } + + if (minSnippetLength > 0) { + filtered = filtered.filter(item => + item.snippet.length >= minSnippetLength + ); + } + + return filtered; + } + + extractDomains(results) { + return [...new Set(results.map(item => item.host_name))]; + } + + groupByDomain(results) { + const grouped = {}; + + results.forEach(item => { + if (!grouped[item.host_name]) { + grouped[item.host_name] = []; + } + grouped[item.host_name].push(item); + }); + + return grouped; + } + + sortByDate(results, ascending = false) { + return results.sort((a, b) => { + const dateA = new Date(a.date); + const dateB = new Date(b.date); + return ascending ? 
dateA - dateB : dateB - dateA; + }); + } +} + +// Usage +const processor = new SearchProcessor(); +await processor.initialize(); + +const results = await processor.search('machine learning tutorials', { + num: 15, + minSnippetLength: 50 +}); + +console.log('Domains found:', processor.extractDomains(results)); +console.log('Grouped by domain:', processor.groupByDomain(results)); +console.log('Sorted by date:', processor.sortByDate(results)); +``` + +### News Search + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function searchNews(topic, timeframe = 'recent') { + const zai = await ZAI.create(); + + // Add time-based keywords to query + const timeKeywords = { + recent: 'latest news', + today: 'today news', + week: 'this week news', + month: 'this month news' + }; + + const query = `${topic} ${timeKeywords[timeframe] || timeKeywords.recent}`; + + const results = await zai.functions.invoke('web_search', { + query: query, + num: 10 + }); + + // Sort by date (most recent first) + const sortedResults = results.sort((a, b) => { + return new Date(b.date) - new Date(a.date); + }); + + return sortedResults; +} + +// Usage +const aiNews = await searchNews('artificial intelligence', 'today'); +const techNews = await searchNews('technology', 'week'); + +console.log('Latest AI News:'); +aiNews.forEach(item => { + console.log(`${item.name} (${item.date})`); + console.log(`${item.snippet}\n`); +}); +``` + +### Research Assistant + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class ResearchAssistant { + constructor() { + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async researchTopic(topic, depth = 'standard') { + const numResults = { + quick: 5, + standard: 10, + deep: 20 + }; + + const results = await this.zai.functions.invoke('web_search', { + query: topic, + num: numResults[depth] || 10 + }); + + // Analyze results + const analysis = { + topic: topic, + totalResults: results.length, + sources: 
this.extractDomains(results), + topResults: results.slice(0, 5).map(r => ({ + title: r.name, + url: r.url, + summary: r.snippet + })), + dateRange: this.getDateRange(results) + }; + + return analysis; + } + + extractDomains(results) { + const domains = {}; + results.forEach(item => { + domains[item.host_name] = (domains[item.host_name] || 0) + 1; + }); + return domains; + } + + getDateRange(results) { + const dates = results + .map(r => new Date(r.date)) + .filter(d => !isNaN(d)); + + if (dates.length === 0) return null; + + return { + earliest: new Date(Math.min(...dates)), + latest: new Date(Math.max(...dates)) + }; + } + + async compareTopics(topic1, topic2) { + const [results1, results2] = await Promise.all([ + this.zai.functions.invoke('web_search', { query: topic1, num: 10 }), + this.zai.functions.invoke('web_search', { query: topic2, num: 10 }) + ]); + + const domains1 = new Set(results1.map(r => r.host_name)); + const domains2 = new Set(results2.map(r => r.host_name)); + + const commonDomains = [...domains1].filter(d => domains2.has(d)); + + return { + topic1: { + name: topic1, + results: results1.length, + uniqueDomains: domains1.size + }, + topic2: { + name: topic2, + results: results2.length, + uniqueDomains: domains2.size + }, + commonDomains: commonDomains + }; + } +} + +// Usage +const assistant = new ResearchAssistant(); +await assistant.initialize(); + +const research = await assistant.researchTopic('quantum computing', 'deep'); +console.log('Research Analysis:', research); + +const comparison = await assistant.compareTopics( + 'renewable energy', + 'solar power' +); +console.log('Topic Comparison:', comparison); +``` + +### Search Result Validation + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function validateSearchResults(query) { + const zai = await ZAI.create(); + + const results = await zai.functions.invoke('web_search', { + query: query, + num: 10 + }); + + // Validate and score results + const validated = results.map(item => 
{ + let score = 0; + let flags = []; + + // Check snippet quality + if (item.snippet && item.snippet.length > 50) { + score += 20; + } else { + flags.push('short_snippet'); + } + + // Check date availability + if (item.date && item.date !== 'N/A') { + score += 20; + } else { + flags.push('no_date'); + } + + // Check URL validity + try { + new URL(item.url); + score += 20; + } catch (e) { + flags.push('invalid_url'); + } + + // Check domain quality (not perfect, but basic check) + if (!item.host_name.includes('spam') && + !item.host_name.includes('ads')) { + score += 20; + } else { + flags.push('suspicious_domain'); + } + + // Check title quality + if (item.name && item.name.length > 10) { + score += 20; + } else { + flags.push('short_title'); + } + + return { + ...item, + qualityScore: score, + validationFlags: flags, + isHighQuality: score >= 80 + }; + }); + + // Sort by quality score + return validated.sort((a, b) => b.qualityScore - a.qualityScore); +} + +// Usage +const validated = await validateSearchResults('best programming practices'); +console.log('High quality results:', + validated.filter(r => r.isHighQuality).length +); +``` + +## Best Practices + +### 1. Query Optimization + +```javascript +// Bad: Too vague +const bad = await searchWeb('information'); + +// Good: Specific and targeted +const good = await searchWeb('JavaScript async/await best practices 2024'); + +// Good: Include context +const goodWithContext = await searchWeb('React hooks tutorial for beginners'); +``` + +### 2. 
Error Handling + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function safeSearch(query, retries = 3) { + let lastError; + + for (let attempt = 1; attempt <= retries; attempt++) { + try { + const zai = await ZAI.create(); + + const results = await zai.functions.invoke('web_search', { + query: query, + num: 10 + }); + + if (!Array.isArray(results) || results.length === 0) { + throw new Error('No results found or invalid response'); + } + + return { + success: true, + results: results, + attempts: attempt + }; + } catch (error) { + lastError = error; + console.error(`Attempt ${attempt} failed:`, error.message); + + if (attempt < retries) { + // Wait before retry (exponential backoff: 1s, 2s, 4s, ...) + await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1))); + } + } + } + + return { + success: false, + error: lastError.message, + attempts: retries + }; +} +``` + +### 3. Result Caching + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class CachedSearch { + constructor(cacheDuration = 3600000) { // 1 hour default + this.cache = new Map(); + this.cacheDuration = cacheDuration; + this.zai = null; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + getCacheKey(query, num) { + return `${query}_${num}`; + } + + async search(query, num = 10) { + const cacheKey = this.getCacheKey(query, num); + const cached = this.cache.get(cacheKey); + + // Check if cached and not expired + if (cached && Date.now() - cached.timestamp < this.cacheDuration) { + console.log('Returning cached results'); + // Return the same shape as a fresh search: { results, cached } + return { + results: cached.data, + cached: true + }; + } + + // Perform fresh search + const results = await this.zai.functions.invoke('web_search', { + query: query, + num: num + }); + + // Cache results + this.cache.set(cacheKey, { + data: results, + timestamp: Date.now() + }); + + return { + results: results, + cached: false + }; + } + + clearCache() { + this.cache.clear(); + } + + getCacheSize() { + return this.cache.size; + } +} + +// Usage +const search = new
CachedSearch(1800000); // 30 minutes cache +await search.initialize(); + +const result1 = await search.search('TypeScript tutorial'); +console.log('Cached:', result1.cached); // false + +const result2 = await search.search('TypeScript tutorial'); +console.log('Cached:', result2.cached); // true +``` + +### 4. Rate Limiting + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +class RateLimitedSearch { + constructor(requestsPerMinute = 60) { + this.zai = null; + this.requestsPerMinute = requestsPerMinute; + this.requests = []; + } + + async initialize() { + this.zai = await ZAI.create(); + } + + async search(query, num = 10) { + await this.checkRateLimit(); + + const results = await this.zai.functions.invoke('web_search', { + query: query, + num: num + }); + + this.requests.push(Date.now()); + return results; + } + + async checkRateLimit() { + const now = Date.now(); + const oneMinuteAgo = now - 60000; + + // Remove requests older than 1 minute + this.requests = this.requests.filter(time => time > oneMinuteAgo); + + if (this.requests.length >= this.requestsPerMinute) { + const oldestRequest = this.requests[0]; + const waitTime = 60000 - (now - oldestRequest); + + console.log(`Rate limit reached. Waiting ${waitTime}ms`); + await new Promise(resolve => setTimeout(resolve, waitTime)); + + // Recheck after waiting + return this.checkRateLimit(); + } + } +} +``` + +## Common Use Cases + +1. **Real-time Information Retrieval**: Get current news, stock prices, weather +2. **Research & Analysis**: Gather information on specific topics +3. **Content Discovery**: Find articles, tutorials, documentation +4. **Competitive Analysis**: Research competitors and market trends +5. **Fact Checking**: Verify information against web sources +6. **SEO & Content Research**: Analyze search results for content strategy +7. **News Aggregation**: Collect news from various sources +8.
**Academic Research**: Find papers, studies, and academic content + +## Integration Examples + +### Express.js Search API + +```javascript +import express from 'express'; +import ZAI from 'z-ai-web-dev-sdk'; + +const app = express(); +app.use(express.json()); + +let zaiInstance; + +async function initZAI() { + zaiInstance = await ZAI.create(); +} + +app.get('/api/search', async (req, res) => { + try { + const { q: query, num = 10 } = req.query; + + if (!query) { + return res.status(400).json({ error: 'Query parameter "q" is required' }); + } + + const numResults = Math.min(parseInt(num) || 10, 20); + + const results = await zaiInstance.functions.invoke('web_search', { + query: query, + num: numResults + }); + + res.json({ + success: true, + query: query, + totalResults: results.length, + results: results + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +app.get('/api/search/news', async (req, res) => { + try { + const { topic, timeframe = 'recent' } = req.query; + + if (!topic) { + return res.status(400).json({ error: 'Topic parameter is required' }); + } + + const timeKeywords = { + recent: 'latest news', + today: 'today news', + week: 'this week news' + }; + + const query = `${topic} ${timeKeywords[timeframe] || timeKeywords.recent}`; + + const results = await zaiInstance.functions.invoke('web_search', { + query: query, + num: 15 + }); + + // Sort by date + const sortedResults = results.sort((a, b) => { + return new Date(b.date) - new Date(a.date); + }); + + res.json({ + success: true, + topic: topic, + timeframe: timeframe, + results: sortedResults + }); + } catch (error) { + res.status(500).json({ + success: false, + error: error.message + }); + } +}); + +initZAI().then(() => { + app.listen(3000, () => { + console.log('Search API running on port 3000'); + }); +}); +``` + +### Search with AI Summary + +```javascript +import ZAI from 'z-ai-web-dev-sdk'; + +async function searchAndSummarize(query) { + 
const zai = await ZAI.create(); + + // Step 1: Search the web + const searchResults = await zai.functions.invoke('web_search', { + query: query, + num: 10 + }); + + // Step 2: Create summary using chat completions + const searchContext = searchResults + .slice(0, 5) + .map((r, i) => `${i + 1}. ${r.name}\n${r.snippet}`) + .join('\n\n'); + + const completion = await zai.chat.completions.create({ + messages: [ + { + role: 'assistant', + content: 'You are a research assistant. Summarize search results clearly and concisely.' + }, + { + role: 'user', + content: `Query: "${query}"\n\nSearch Results:\n${searchContext}\n\nProvide a comprehensive summary of these results.` + } + ], + thinking: { type: 'disabled' } + }); + + const summary = completion.choices[0]?.message?.content; + + return { + query: query, + summary: summary, + sources: searchResults.slice(0, 5).map(r => ({ + title: r.name, + url: r.url + })), + totalResults: searchResults.length + }; +} + +// Usage +const result = await searchAndSummarize('benefits of renewable energy'); +console.log('Summary:', result.summary); +console.log('Sources:', result.sources); +``` + +## Troubleshooting + +**Issue**: "SDK must be used in backend" +- **Solution**: Ensure z-ai-web-dev-sdk is only imported and used in server-side code + +**Issue**: Empty or no results returned +- **Solution**: Try different query terms, check internet connectivity, verify API status + +**Issue**: Unexpected response format +- **Solution**: Verify the response is an array, check for API changes, add type validation + +**Issue**: Rate limiting errors +- **Solution**: Implement request throttling, add delays between searches, use caching + +**Issue**: Low quality search results +- **Solution**: Refine query terms, filter results by domain or date, validate result quality + +## Performance Tips + +1. **Reuse SDK Instance**: Create ZAI instance once and reuse across searches +2. **Implement Caching**: Cache search results to reduce API calls +3. 
**Optimize Query Terms**: Use specific, targeted queries for better results +4. **Limit Result Count**: Request only the number of results you need +5. **Parallel Searches**: Use Promise.all for multiple independent searches +6. **Result Filtering**: Filter results on client side when possible + +## Security Considerations + +1. **Input Validation**: Sanitize and validate user search queries +2. **Rate Limiting**: Implement rate limits to prevent abuse +3. **API Key Protection**: Never expose SDK credentials in client-side code +4. **Result Filtering**: Filter potentially harmful or inappropriate content +5. **URL Validation**: Validate URLs before redirecting users +6. **Privacy**: Don't log sensitive user search queries + +## Remember + +- Always use z-ai-web-dev-sdk in backend code only +- The SDK is already installed - import as shown in examples +- Search results are returned as an array of SearchFunctionResultItem objects +- Implement proper error handling and retries for production +- Cache results when appropriate to reduce API calls +- Use specific query terms for better search results +- Validate and filter results before displaying to users +- Check `scripts/web_search.ts` for a quick start example diff --git a/skills/web-search/scripts/web_search.ts b/skills/web-search/scripts/web_search.ts new file mode 100755 index 0000000..23b8af1 --- /dev/null +++ b/skills/web-search/scripts/web_search.ts @@ -0,0 +1,44 @@ +import ZAI from 'z-ai-web-dev-sdk'; + +interface SearchFunctionResultItem { + url: string; + name: string; + snippet: string; + host_name: string; + rank: number; + date: string; + favicon: string; +} + +async function main(query: string, num: number = 10) { + try { + const zai = await ZAI.create(); + + const searchResult = await zai.functions.invoke('web_search', { + query: query, + num: num + }); + + console.log('Search Results:'); + console.log('================\n'); + + if (Array.isArray(searchResult)) { + searchResult.forEach((item: 
SearchFunctionResultItem, index: number) => { + console.log(`${index + 1}. ${item.name}`); + console.log(` URL: ${item.url}`); + console.log(` Snippet: ${item.snippet}`); + console.log(` Host: ${item.host_name}`); + console.log(` Date: ${item.date}`); + console.log(''); + }); + + console.log(`\nTotal results: ${searchResult.length}`); + } else { + console.log('Unexpected response format:', searchResult); + } + } catch (err: any) { + console.error('Web search failed:', err?.message || err); + } +} + +main('What is the capital of France?', 5); diff --git a/skills/xlsx/LICENSE.txt b/skills/xlsx/LICENSE.txt new file mode 100755 index 0000000..c55ab42 --- /dev/null +++ b/skills/xlsx/LICENSE.txt @@ -0,0 +1,30 @@ +© 2025 Anthropic, PBC. All rights reserved. + +LICENSE: Use of these materials (including all code, prompts, assets, files, +and other components of this Skill) is governed by your agreement with +Anthropic regarding use of Anthropic's services. If no separate agreement +exists, use is governed by Anthropic's Consumer Terms of Service or +Commercial Terms of Service, as applicable: +https://www.anthropic.com/legal/consumer-terms +https://www.anthropic.com/legal/commercial-terms +Your applicable agreement is referred to as the "Agreement." "Services" are +as defined in the Agreement. 
+ +ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the +contrary, users may not: + +- Extract these materials from the Services or retain copies of these + materials outside the Services +- Reproduce or copy these materials, except for temporary copies created + automatically during authorized use of the Services +- Create derivative works based on these materials +- Distribute, sublicense, or transfer these materials to any third party +- Make, offer to sell, sell, or import any inventions embodied in these + materials +- Reverse engineer, decompile, or disassemble these materials + +The receipt, viewing, or possession of these materials does not convey or +imply any license or right beyond those expressly granted above. + +Anthropic retains all right, title, and interest in these materials, +including all copyrights, patents, and other intellectual property rights. diff --git a/skills/xlsx/SKILL.md b/skills/xlsx/SKILL.md new file mode 100755 index 0000000..8f21a13 --- /dev/null +++ b/skills/xlsx/SKILL.md @@ -0,0 +1,496 @@ +--- +name: xlsx +description: "Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When GLM needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas" +license: Proprietary. LICENSE.txt has complete terms +--- + +# XLSX creation, editing, and analysis + +## Overview + +A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks. + +Always deliver the result as an Excel file. + +## Important Requirements + +**Python 3 and openpyxl Required for Excel Generation**: You can assume Python 3 as the runtime environment.
The `openpyxl` library is required as the primary tool for creating Excel files, managing styles, and writing formulas. + +**pandas Utilized for Data Processing**: You can use `pandas` for efficient data manipulation and processing; the processed data is then exported to the final Excel file through `openpyxl`. + +**LibreOffice Required for Formula Recalculation**: Use the `recalc.py` script to recalculate formula values and check the results. You can assume LibreOffice is installed; the script automatically configures it on first run. + + +# Requirements for Outputs + +## All Excel files + +## Critical Instruction Protocols + +### Query Decomposition & Verification +Before generating any code, strictly analyze the user's prompt. +- **Explicit Requests**: Clearly identify the analytical objectives, constraints, required formats, the Excel sheets to be delivered (including sheet names, column definitions, calculation logic, and required metrics), as well as all data fields explicitly requested by the user. These elements define the mandatory delivery scope and specify exactly what must be built in the workbook. +- **Implicit Requests**: Evaluate the business context, intended users of the Excel file, expected interaction patterns (e.g., filtering, sorting, manual inputs), and downstream use cases such as reporting or decision support. These considerations guide how sheets are structured, formulas are designed, and results are presented to ensure usability and clarity. +- **Multi-Part Requests**: If the user asks for "two tables", "three scenarios", or "a summary and a detail sheet", you MUST generate ALL requested components. + + +### Zero Formula Errors +- Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
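openpyxl writes formulas but never evaluates them, so error values only appear after the workbook has been recalculated (for example with `recalc.py`). The following is a minimal sketch, not part of `recalc.py`, for scanning a recalculated workbook for error values before delivery. The helper names `find_formula_errors` and `scan_workbook` are hypothetical; the error list adds `#NUM!` and `#NULL!`, two further standard Excel errors, to the five named above.

```python
# Minimal sketch: find Excel error values in a recalculated workbook.
# Assumes the file was already recalculated (e.g. with recalc.py), because
# openpyxl stores formulas but never computes their results.

ERROR_VALUES = {"#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?", "#NUM!", "#NULL!"}


def find_formula_errors(rows, start_row=1, start_col=1):
    """Return (row, col, error) tuples for every error value in `rows`.

    `rows` is any iterable of row tuples, such as the output of
    `ws.iter_rows(values_only=True)`; indices are 1-based like Excel.
    """
    hits = []
    for r, row in enumerate(rows, start=start_row):
        for c, value in enumerate(row, start=start_col):
            if isinstance(value, str) and value.strip() in ERROR_VALUES:
                hits.append((r, c, value.strip()))
    return hits


def scan_workbook(path):
    """Map sheet title -> list of errors, for sheets that contain any."""
    from openpyxl import load_workbook  # imported lazily; only needed here

    # data_only=True returns cached results instead of formula strings
    wb = load_workbook(path, data_only=True)
    report = {}
    for ws in wb.worksheets:
        errors = find_formula_errors(ws.iter_rows(values_only=True))
        if errors:
            report[ws.title] = errors
    return report
```

A delivery check could then require `scan_workbook(path) == {}`. Formulas that were never recalculated come back as `None` under `data_only=True`, so an empty report is only meaningful after recalculation.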
+ +### Preserve Existing Templates (when updating templates) +- Study and EXACTLY match existing format, style, and conventions when modifying files +- Never impose standardized formatting on files with established patterns +- Existing template conventions ALWAYS override these guidelines + + +## Financial models + +### Color Coding Standards +Unless otherwise stated by the user or existing template + +#### Industry-Standard Color Conventions +- **Blue text (RGB: 0,0,255)**: Hardcoded inputs, and numbers users will change for scenarios +- **Black text (RGB: 0,0,0)**: ALL formulas and calculations +- **Green text (RGB: 0,128,0)**: Links pulling from other worksheets within same workbook +- **Red text (RGB: 255,0,0)**: External links to other files +- **Yellow background (RGB: 255,255,0)**: Key assumptions needing attention or cells that need to be updated + +### Number Formatting Standards + +#### Required Format Rules +- **Years**: Format as text strings (e.g., "2024" not "2,024") +- **Currency**: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)") +- **Zeros**: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-") +- **Percentages**: Default to 0.0% format (one decimal) +- **Multiples**: Format as 0.0x for valuation multiples (EV/EBITDA, P/E) +- **Negative numbers**: Use parentheses (123) not minus -123 + +### Formula Construction Rules + +#### Assumptions Placement +- Place ALL assumptions (growth rates, margins, multiples, etc.) 
in separate assumption cells +- Use cell references instead of hardcoded values in formulas +- Example: Use =B5*(1+$B$6) instead of =B5*1.05 + +#### Formula Error Prevention +- Verify all cell references are correct +- Check for off-by-one errors in ranges +- Ensure consistent formulas across all projection periods +- Test with edge cases (zero values, negative numbers) +- Verify no unintended circular references + +#### Documentation Requirements for Hardcodes +- Document each hardcoded value in a cell comment, or in the adjacent cell (if it sits at the end of a table), using the format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]" +- Examples: + - "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]" + - "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]" + - "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity" + - "Source: FactSet, 8/20/2025, Consensus Estimates Screen" + + +## Style Rules + +Implement all styling directly using the `openpyxl` library. The following standards define the visual architecture of the spreadsheets. + +### Global Layout & Design Principles +**Layout & Dimensions** +- **Canvas Origin**: Content MUST start at cell **B2** to provide a top-left padding margin. Do not start at A1. +- **Cell Sizing**: Optimize column widths and row heights for data readability. Avoid unscaled cells (e.g., narrow columns with excessive height). +- **Title Row**: Row 2 is reserved for the title. Explicitly set its row height to prevent clipping: `sheet.row_dimensions[2].height = 30` (increase it if the font size requires). + +**Visual Hierarchy** +- **Professionalism**: Prioritize business-appropriate color schemes. Avoid decorative elements that distract from data. +- **Consistency**: Apply uniform fonts, borders, and colors to similar data types across the workbook. +- **White Space**: Maintain adequate margins to prevent visual crowding.
+- **Alternating Row Fill**: When a table's data area exceeds three rows, apply alternating row fills (white and gray) by default. +- **Chart Labels**: Keep chart labels and text elements as concise as possible to maximize readability, and provide a clear reference key or table nearby that maps them to their original full names. + + +### Font Standards (MUST FOLLOW) +- **English Text**: Always use **Times New Roman** as the default font + +```python +# Font configuration example +from openpyxl.styles import Font + +# English content +english_font = Font(name='Times New Roman', size=11) + +``` + + +### Title Formatting Rules (MUST FOLLOW) +- **NO Background Shading**: Titles must NOT have any background fill/shading (PatternFill) +- **Left Alignment**: All titles must be left-aligned, NOT centered +- **Bold Text**: Use bold font weight to distinguish titles instead of background colors + +```python +# ✅ CORRECT Title Style + +from openpyxl.styles import Font, Alignment +from openpyxl import Workbook + +# Create a new workbook (use openpyxl.load_workbook(path) to edit an existing file) +wb = Workbook() +sheet = wb.active + +title_font = Font(name='Times New Roman', size=18, bold=True, color="000000") +title_alignment = Alignment(horizontal='left', vertical='center') + +sheet['B2'] = "Report Title" +sheet['B2'].font = title_font +sheet['B2'].alignment = title_alignment +sheet.row_dimensions[2].height = 30 # Title row height prevents clipping +# NO fill applied - title has no background shading + +# ❌ WRONG - Do NOT use background shading on titles +# title_fill = PatternFill(start_color="333333", fill_type="solid") # FORBIDDEN +# sheet['B2'].fill = title_fill # FORBIDDEN +``` + + + +### Visual Themes +#### 1. Default Style +**Use for:** All non-financial tasks (General data, project management, inventories). + +**Color Palette Constraints** +- **Base Colors**: White (#FFFFFF), Black (#000000), and Grey scales ONLY. +- **Accent Color**: **Blue** (varying saturation) is the ONLY allowed accent color for highlighting or differentiation.
+- **Restrictions**:
+  - ❌ NO Green, Red, Orange, Purple, Yellow, or Pink.
+  - ❌ NO Gradients or Rainbow schemes.
+
+```python
+# Palette
+from openpyxl import Workbook
+from openpyxl.styles import Alignment, Border, Font, PatternFill, Side
+
+wb = Workbook()
+sheet = wb.active
+
+# Base & Accents
+background_white = "FFFFFF"    # Background
+background_row_alt = "E9E9E9"  # Alternating row fill
+grey_header = "333333"         # Section headers
+border_grey = "E3DEDE"         # Standard borders
+blue_primary = "0B5CAD"        # Primary accent
+
+# Application Example: Data Headers (NOT Titles)
+header_fill = PatternFill(start_color=grey_header, end_color=grey_header, fill_type="solid")
+header_font = Font(name='Times New Roman', color="FFFFFF", bold=True)
+thin_border = Border(bottom=Side(style='thin', color=border_grey))  # standard border
+
+for cell in sheet['B3:E3'][0]:
+    cell.fill = header_fill
+    cell.font = header_font
+    cell.border = thin_border
+
+# Example: Title style (NO shading, left-aligned)
+title_font = Font(name='Times New Roman', size=18, bold=True, color="000000")
+title_alignment = Alignment(horizontal='left', vertical='center')
+sheet['B2'].font = title_font
+sheet['B2'].alignment = title_alignment
+# NO fill for titles
+```
+
+#### 2. Professional Finance Style
+**Use for:** Financial, fiscal, and market analysis (stock data, GDP, budgets, P&L, ROI).
+
+**Market Data Color Convention (Critical)**
+Apply the following color logic based on the target region:
+
+| Region | Price Up / Positive | Price Down / Negative |
+| --- | --- | --- |
+| **China (Mainland)** | **Red** | **Green** |
+| **International** | **Green** | **Red** |
+
+```python
+# Professional Finance Palette
+from openpyxl import Workbook
+from openpyxl.styles import Alignment, Font, PatternFill
+
+wb = Workbook()
+sheet = wb.active
+
+text_dark = "000000"
+background_light = "E6E8EB"
+header_fill_blue = "1B3F66"
+metrics_highlight_warm = "F5E6D3"
+negative_red = "FF0000"
+
+# Data headers (NOT titles)
+pfs_header_fill = PatternFill(start_color=header_fill_blue, end_color=header_fill_blue, fill_type="solid")
+pfs_header_font = Font(name='Times New Roman', color="FFFFFF", bold=True)
+
+for cell in sheet['B3:E3'][0]:
+    cell.fill = pfs_header_fill
+    cell.font = pfs_header_font
+
+# Default font - Times New Roman for English
+default_font = Font(name='Times New Roman', size=11, color=text_dark)
+
+# Example: Title style (NO shading, left-aligned)
+# NO fill for titles
+title_font = Font(name='Times New Roman', size=18, bold=True, color="000000")
+title_alignment = Alignment(horizontal='left', vertical='center')
+sheet['B2'].font = title_font
+sheet['B2'].alignment = title_alignment
+```
+
+### Content Color Conventions
+Apply specific font colors to indicate data source and function (consistent with the Financial Model requirements):
+
+- **Blue Font**: Hardcoded inputs and fixed values.
+- **Black Font**: Calculated results and formulas.
+- **Green Font**: References to other worksheets within the same file.
+- **Red Font**: References to external files or sources.
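The two color rules above (market direction by region, and font color by content type) can be sketched as small lookup helpers. This is a minimal sketch, not part of the skill itself: the function names `content_font_color` and `price_change_color` are hypothetical, the hex values other than red (`FF0000`, the palette's `negative_red`) are assumptions because the guide names only hues, and reference detection uses simple regular expressions rather than a full Excel formula parser. It targets the Professional Finance Style, since the Default Style forbids red and green.

```python
import re

# Assumed hex values: the guide names only hues (red matches negative_red = "FF0000").
BLUE, BLACK, GREEN, RED = "0000FF", "000000", "008000", "FF0000"

# "[Book.xlsx]" marks a link to another workbook, e.g. ='[Budget.xlsx]Sheet1'!A1.
# Structured table references such as Table1[Col] would also match; ignored here.
EXTERNAL_REF = re.compile(r"\[[^\]]+\]")
# Sheet-qualified cell reference, e.g. Sheet1!A1 or 'Q1 Data'!$B$5
SHEET_REF = re.compile(r"(?:'[^']+'|[A-Za-z_][\w.]*)!\$?[A-Z]{1,3}\$?\d+")


def content_font_color(value) -> str:
    """Font color for a cell value under the content color conventions."""
    if isinstance(value, str) and value.startswith("="):
        if EXTERNAL_REF.search(value):
            return RED    # reference to an external file
        if SHEET_REF.search(value):
            return GREEN  # reference to another worksheet in this file
        return BLACK      # ordinary formula / calculated result
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return BLUE       # hardcoded input
    return BLACK          # text labels keep the default dark font


def price_change_color(change: float, mainland_china: bool) -> str:
    """Up/down color for market data; mainland China reverses the convention."""
    up, down = (RED, GREEN) if mainland_china else (GREEN, RED)
    if change > 0:
        return up
    if change < 0:
        return down
    return BLACK  # unchanged values stay neutral
```

With openpyxl, the result can be passed straight to `Font`, e.g. `cell.font = Font(name='Times New Roman', color=content_font_color(cell.value))`. Keeping the rules in one function means every sheet is colored the same way, which is what the Consistency rule asks for.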
+
+
+## Chart Creation Notes
+
+### 1. Data Source Must Contain “Actual Values”
+
+* Excel formulas written via **openpyxl** are not calculated automatically, so the file has no cached values and charts that reference those cells can appear blank.
+* Run `recalc.py` after writing the formulas so the charts reference computed results.
+* Run `recalc.py` once more at the end as a validation check.
+
+### 2. Reference Range Must Match Title Settings
+
+* When `titles_from_data=True` is set, **the first row of the reference range must contain text headers**.
+  If this row is empty or contains numeric data, the series names may be wrong or the data misaligned.
+* Start the reference range at the header row (with `titles_from_data=True`) or at the first data row (without it), and never include the sheet title row.
+
+### 3. Impact of “Visibility” on Chart Data
+
+* By default, Excel charts do not plot data from hidden rows or columns (auxiliary tables are often hidden).
+  You must **explicitly disable the “plot visible cells only” option**; otherwise the chart will appear blank.
+
+```python
+# After hiding auxiliary data rows, set visible_cells_only to False on every
+# chart object (openpyxl maps it to the chart's plotVisOnly setting).
+# This line is required.
+chart.visible_cells_only = False
+```
+
+
+# Workflows
+
+## Reading and analyzing data
+
+### Data analysis with pandas
+For data analysis, visualization, and basic operations, use **pandas**, which provides powerful data manipulation capabilities:
+
+```python
+import pandas as pd
+
+# Read Excel
+df = pd.read_excel('file.xlsx')  # Default: first sheet
+all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets as dict
+
+# Analyze
+df.head()      # Preview data
+df.info()      # Column info
+df.describe()  # Statistics
+
+# Write Excel
+df.to_excel('output.xlsx', index=False)
+```
+
+## Excel File Workflows
+
+## CRITICAL: Use Formulas, Not Hardcoded Values
+
+**Always use Excel formulas instead of calculating values in Python and hardcoding them.** This keeps the spreadsheet dynamic and updatable.
+
+### ❌ WRONG - Hardcoding Calculated Values
+```python
+# Bad: Calculating in Python and hardcoding result
+total = df['Sales'].sum()
+sheet['B10'] = total  # Hardcodes 5000
+
+# Bad: Computing growth rate in Python
+growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
+sheet['C5'] = growth  # Hardcodes 0.15
+
+# Bad: Python calculation for average
+avg = sum(values) / len(values)
+sheet['D20'] = avg  # Hardcodes 42.5
+```
+
+### ✅ CORRECT - Using Excel Formulas
+```python
+# Good: Let Excel calculate the sum
+sheet['B10'] = '=SUM(B2:B9)'
+
+# Good: Growth rate as Excel formula
+sheet['C5'] = '=(C4-C2)/C2'
+
+# Good: Average using Excel function
+sheet['D20'] = '=AVERAGE(D2:D19)'
+```
+
+This applies to ALL calculations: totals, percentages, ratios, differences, etc. The spreadsheet should recalculate correctly whenever source data changes.
+
+## Common Workflow
+1. **Choose tool**: pandas for data, openpyxl for formulas/formatting
+2. **Think and Plan**: Plan the structure of every sheet, plus its formulas and cross-references, before coding
+3. **Create/Load**: Create a new workbook or load an existing file
+4.
**Modify**: Add/edit data, formulas, and formatting
+5. **Save**: Write to file
+6. **Recalculate formulas (MANDATORY IF USING FORMULAS)**: Use the recalc.py script
+   ```bash
+   python recalc.py output.xlsx
+   ```
+7. **Verify and fix any errors**:
+   - The script returns JSON with error details
+   - If `status` is `errors_found`, check `error_summary` for specific error types and locations
+   - Fix the identified errors and recalculate again
+   - Common errors to fix:
+     - `#REF!`: Invalid cell references
+     - `#DIV/0!`: Division by zero
+     - `#NAME?`: Unrecognized function or name in a formula
+       - **Do not assign plain text that begins with “=” to a cell: openpyxl stores such strings as formulas, which then fail with a `#NAME?` error.**
+       - **For descriptive, non-calculated text (such as legends), remove the leading equals sign before writing it, so the value is stored as a regular string.**
+     - `#VALUE!`: Wrong data type in formula
+
+
+### Creating new Excel files
+
+```python
+# Using openpyxl for formulas and formatting
+from openpyxl import Workbook
+from openpyxl.styles import Font, PatternFill, Alignment
+
+wb = Workbook()
+sheet = wb.active
+
+# Add data
+sheet['A1'] = 'Hello'
+sheet['B1'] = 'World'
+sheet.append([10, 20, 30])  # append() writes to the next empty row (row 2)
+
+# Add formula in a cell that does not overwrite the appended data
+sheet['D2'] = '=SUM(A2:C2)'
+
+# Formatting
+sheet['A1'].font = Font(bold=True, color='FF0000')
+sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
+sheet['A1'].alignment = Alignment(horizontal='center')
+
+# Column width
+sheet.column_dimensions['A'].width = 20
+
+wb.save('output.xlsx')
+```
+
+### Editing existing Excel files
+
+```python
+# Using openpyxl to preserve formulas and formatting
+from openpyxl import load_workbook
+
+# Load existing file
+wb = load_workbook('existing.xlsx')
+sheet = wb.active  # or wb['SheetName'] for specific sheet
+
+# Working with multiple sheets
+for sheet_name in wb.sheetnames:
+    sheet = wb[sheet_name]
+
print(f"Sheet: {sheet_name}")
+
+# Modify cells (after the loop, `sheet` refers to the last sheet)
+sheet['A1'] = 'New Value'
+sheet.insert_rows(2)  # Insert row at position 2
+sheet.delete_cols(3)  # Delete column 3
+
+# Add new sheet
+new_sheet = wb.create_sheet('NewSheet')
+new_sheet['A1'] = 'Data'
+
+wb.save('modified.xlsx')
+```
+
+## Recalculating formulas
+
+Excel files created or modified by openpyxl contain formulas as strings but no calculated values. Use the provided `recalc.py` script to recalculate them:
+
+```bash
+python recalc.py [timeout_seconds]
+```
+
+Example:
+```bash
+python recalc.py output.xlsx 30
+```
+
+The script:
+- Automatically sets up the LibreOffice macro on first run
+- Recalculates all formulas in all sheets
+- Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
+- Returns JSON with detailed error locations and counts
+- Works on both Linux and macOS
+
+## Formula Verification Checklist
+
+Quick checks to ensure formulas work correctly:
+
+### Essential Verification
+- [ ] **Test 2-3 sample references**: Verify they pull correct values before building the full model
+- [ ] **Column mapping**: Confirm Excel columns match (e.g., column 64 = BL, not BK)
+- [ ] **Row offset**: DataFrame rows are 0-indexed while Excel rows are 1-indexed, and the header occupies Excel row 1 (DataFrame row 5 = Excel row 7)
+
+### Common Pitfalls
+- [ ] **NaN handling**: Check for null values with `pd.notna()`
+- [ ] **Far-right columns**: FY data often sits in columns 50+
+- [ ] **Multiple matches**: Search all occurrences, not just the first
+- [ ] **Division by zero**: Check denominators before using `/` in formulas (#DIV/0!)
+- [ ] **Wrong references**: Verify all cell references point to intended cells (#REF!)
+- [ ] **Cross-sheet references**: Use correct format (Sheet1!A1) for linking sheets + +### Formula Testing Strategy +- [ ] **Start small**: Test formulas on 2-3 cells before applying broadly +- [ ] **Verify dependencies**: Check all cells referenced in formulas exist +- [ ] **Test edge cases**: Include zero, negative, and very large values + +### Interpreting recalc.py Output +The script returns JSON with error details: +```json +{ + "status": "success", // or "errors_found" + "total_errors": 0, // Total error count + "total_formulas": 42, // Number of formulas in file + "error_summary": { // Only present if errors found + "#REF!": { + "count": 2, + "locations": ["Sheet1!B5", "Sheet1!C10"] + } + } +} +``` + +## Best Practices + +### Library Selection +- **pandas**: Best for data analysis, bulk operations, and simple data export +- **openpyxl**: Best for complex formatting, formulas, and Excel-specific features + +### Working with openpyxl +- Cell indices are 1-based (row=1, column=1 refers to cell A1) +- Use `data_only=True` to read calculated values: `load_workbook('file.xlsx', data_only=True)` +- **Warning**: If opened with `data_only=True` and saved, formulas are replaced with values and permanently lost +- For large files: Use `read_only=True` for reading or `write_only=True` for writing +- Formulas are preserved but not evaluated - use recalc.py to update values + +### Working with pandas +- Specify data types to avoid inference issues: `pd.read_excel('file.xlsx', dtype={'id': str})` +- For large files, read specific columns: `pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])` +- Handle dates properly: `pd.read_excel('file.xlsx', parse_dates=['date_column'])` + +## Code Style Guidelines +**IMPORTANT**: When generating Python code for Excel operations: +- Write minimal, concise Python code without unnecessary comments +- Avoid verbose variable names and redundant operations +- Avoid unnecessary print statements + +**For Excel files themselves**: +- Add 
comments to cells with complex formulas or important assumptions +- Document data sources for hardcoded values +- Include notes for key calculations and model sections \ No newline at end of file diff --git a/skills/xlsx/recalc.py b/skills/xlsx/recalc.py new file mode 100755 index 0000000..102e157 --- /dev/null +++ b/skills/xlsx/recalc.py @@ -0,0 +1,178 @@ +#!/usr/bin/env python3 +""" +Excel Formula Recalculation Script +Recalculates all formulas in an Excel file using LibreOffice +""" + +import json +import sys +import subprocess +import os +import platform +from pathlib import Path +from openpyxl import load_workbook + + +def setup_libreoffice_macro(): + """Setup LibreOffice macro for recalculation if not already configured""" + if platform.system() == 'Darwin': + macro_dir = os.path.expanduser('~/Library/Application Support/LibreOffice/4/user/basic/Standard') + else: + macro_dir = os.path.expanduser('~/.config/libreoffice/4/user/basic/Standard') + + macro_file = os.path.join(macro_dir, 'Module1.xba') + + if os.path.exists(macro_file): + with open(macro_file, 'r') as f: + if 'RecalculateAndSave' in f.read(): + return True + + if not os.path.exists(macro_dir): + subprocess.run(['soffice', '--headless', '--terminate_after_init'], + capture_output=True, timeout=10) + os.makedirs(macro_dir, exist_ok=True) + + macro_content = ''' + + + Sub RecalculateAndSave() + ThisComponent.calculateAll() + ThisComponent.store() + ThisComponent.close(True) + End Sub +''' + + try: + with open(macro_file, 'w') as f: + f.write(macro_content) + return True + except Exception: + return False + + +def recalc(filename, timeout=30): + """ + Recalculate formulas in Excel file and report any errors + + Args: + filename: Path to Excel file + timeout: Maximum time to wait for recalculation (seconds) + + Returns: + dict with error locations and counts + """ + if not Path(filename).exists(): + return {'error': f'File {filename} does not exist'} + + abs_path = str(Path(filename).absolute()) + + if 
not setup_libreoffice_macro(): + return {'error': 'Failed to setup LibreOffice macro'} + + cmd = [ + 'soffice', '--headless', '--norestore', + 'vnd.sun.star.script:Standard.Module1.RecalculateAndSave?language=Basic&location=application', + abs_path + ] + + # Handle timeout command differences between Linux and macOS + if platform.system() != 'Windows': + timeout_cmd = 'timeout' if platform.system() == 'Linux' else None + if platform.system() == 'Darwin': + # Check if gtimeout is available on macOS + try: + subprocess.run(['gtimeout', '--version'], capture_output=True, timeout=1, check=False) + timeout_cmd = 'gtimeout' + except (FileNotFoundError, subprocess.TimeoutExpired): + pass + + if timeout_cmd: + cmd = [timeout_cmd, str(timeout)] + cmd + + result = subprocess.run(cmd, capture_output=True, text=True) + + if result.returncode != 0 and result.returncode != 124: # 124 is timeout exit code + error_msg = result.stderr or 'Unknown error during recalculation' + if 'Module1' in error_msg or 'RecalculateAndSave' not in error_msg: + return {'error': 'LibreOffice macro not configured properly'} + else: + return {'error': error_msg} + + # Check for Excel errors in the recalculated file - scan ALL cells + try: + wb = load_workbook(filename, data_only=True) + + excel_errors = ['#VALUE!', '#DIV/0!', '#REF!', '#NAME?', '#NULL!', '#NUM!', '#N/A'] + error_details = {err: [] for err in excel_errors} + total_errors = 0 + + for sheet_name in wb.sheetnames: + ws = wb[sheet_name] + # Check ALL rows and columns - no limits + for row in ws.iter_rows(): + for cell in row: + if cell.value is not None and isinstance(cell.value, str): + for err in excel_errors: + if err in cell.value: + location = f"{sheet_name}!{cell.coordinate}" + error_details[err].append(location) + total_errors += 1 + break + + wb.close() + + # Build result summary + result = { + 'status': 'success' if total_errors == 0 else 'errors_found', + 'total_errors': total_errors, + 'error_summary': {} + } + + # Add non-empty 
error categories + for err_type, locations in error_details.items(): + if locations: + result['error_summary'][err_type] = { + 'count': len(locations), + 'locations': locations[:20] # Show up to 20 locations + } + + # Add formula count for context - also check ALL cells + wb_formulas = load_workbook(filename, data_only=False) + formula_count = 0 + for sheet_name in wb_formulas.sheetnames: + ws = wb_formulas[sheet_name] + for row in ws.iter_rows(): + for cell in row: + if cell.value and isinstance(cell.value, str) and cell.value.startswith('='): + formula_count += 1 + wb_formulas.close() + + result['total_formulas'] = formula_count + + return result + + except Exception as e: + return {'error': str(e)} + + +def main(): + if len(sys.argv) < 2: + print("Usage: python recalc.py [timeout_seconds]") + print("\nRecalculates all formulas in an Excel file using LibreOffice") + print("\nReturns JSON with error details:") + print(" - status: 'success' or 'errors_found'") + print(" - total_errors: Total number of Excel errors found") + print(" - total_formulas: Number of formulas in the file") + print(" - error_summary: Breakdown by error type with locations") + print(" - #VALUE!, #DIV/0!, #REF!, #NAME?, #NULL!, #NUM!, #N/A") + sys.exit(1) + + filename = sys.argv[1] + timeout = int(sys.argv[2]) if len(sys.argv) > 2 else 30 + + result = recalc(filename, timeout) + print(json.dumps(result, indent=2)) + + +if __name__ == '__main__': + main() \ No newline at end of file diff --git a/src/app/api/download/route.ts b/src/app/api/download/route.ts new file mode 100644 index 0000000..64015b2 --- /dev/null +++ b/src/app/api/download/route.ts @@ -0,0 +1,30 @@ +import { NextRequest, NextResponse } from 'next/server'; +import { readFileSync, existsSync } from 'fs'; +import { join } from 'path'; + +export async function GET(request: NextRequest) { + const searchParams = request.nextUrl.searchParams; + const file = searchParams.get('file'); + + if (!file) { + return NextResponse.json({ 
error: 'No file specified' }, { status: 400 }); + } + + // Security: only allow files from the download directory + const fileName = file.replace(/\.\./g, ''); // Prevent path traversal + const filePath = join(process.cwd(), 'download', fileName); + + if (!existsSync(filePath)) { + return NextResponse.json({ error: 'File not found' }, { status: 404 }); + } + + const fileBuffer = readFileSync(filePath); + + return new NextResponse(fileBuffer, { + headers: { + 'Content-Type': 'application/zip', + 'Content-Disposition': `attachment; filename="${fileName}"`, + 'Content-Length': fileBuffer.length.toString() + } + }); +} diff --git a/src/app/api/pipeline/route.ts b/src/app/api/pipeline/route.ts new file mode 100644 index 0000000..c2a6c82 --- /dev/null +++ b/src/app/api/pipeline/route.ts @@ -0,0 +1,214 @@ +import { NextRequest, NextResponse } from 'next/server'; +import { writeFileSync, readFileSync, existsSync, mkdirSync, readdirSync } from 'fs'; +import { join } from 'path'; + +// Simple file-based storage for demo +const DATA_DIR = join(process.cwd(), '.pipeline-data'); + +interface PipelineData { + id: string; + name: string; + status: 'running' | 'completed' | 'failed' | 'cancelled'; + projects: ProjectData[]; + createdAt: string; + updatedAt: string; +} + +interface ProjectData { + id: string; + name: string; + status: string; + agents: AgentData[]; +} + +interface AgentData { + id: string; + role: string; + status: string; + output?: string; + startedAt?: string; + completedAt?: string; +} + +function ensureDataDir() { + if (!existsSync(DATA_DIR)) { + mkdirSync(DATA_DIR, { recursive: true }); + } +} + +function getPipelinePath(id: string) { + return join(DATA_DIR, `${id}.json`); +} + +function savePipeline(pipeline: PipelineData) { + ensureDataDir(); + writeFileSync(getPipelinePath(pipeline.id), JSON.stringify(pipeline, null, 2)); +} + +function loadPipeline(id: string): PipelineData | null { + const path = getPipelinePath(id); + if (!existsSync(path)) return null; 
+ return JSON.parse(readFileSync(path, 'utf-8')); +} + +function listPipelines(): PipelineData[] { + ensureDataDir(); + const files = readdirSync(DATA_DIR).filter((f) => f.endsWith('.json')); + return files.map((f) => JSON.parse(readFileSync(join(DATA_DIR, f), 'utf-8'))); +} + +export async function POST(request: NextRequest) { + try { + const body = await request.json(); + const { action, data } = body; + + switch (action) { + case 'create': { + const id = `pipeline-${Date.now()}-${Math.random().toString(36).substring(2, 8)}`; + const pipeline: PipelineData = { + id, + name: data.name || 'New Pipeline', + status: 'running', + projects: data.projects?.map((p: { id: string; name: string }) => ({ + id: p.id, + name: p.name, + status: 'pending', + agents: [ + { id: `${p.id}-programmer`, role: 'programmer', status: 'pending' }, + { id: `${p.id}-reviewer`, role: 'reviewer', status: 'pending' }, + { id: `${p.id}-tester`, role: 'tester', status: 'pending' } + ] + })) || [], + createdAt: new Date().toISOString(), + updatedAt: new Date().toISOString() + }; + + savePipeline(pipeline); + + // Simulate pipeline execution + simulatePipelineExecution(id); + + return NextResponse.json({ success: true, pipeline }); + } + + case 'get': { + const pipeline = loadPipeline(data.id); + if (!pipeline) { + return NextResponse.json({ success: false, error: 'Pipeline not found' }, { status: 404 }); + } + return NextResponse.json({ success: true, pipeline }); + } + + case 'list': { + const pipelines = listPipelines(); + return NextResponse.json({ success: true, pipelines }); + } + + case 'cancel': { + const pipeline = loadPipeline(data.id); + if (!pipeline) { + return NextResponse.json({ success: false, error: 'Pipeline not found' }, { status: 404 }); + } + pipeline.status = 'cancelled'; + pipeline.updatedAt = new Date().toISOString(); + savePipeline(pipeline); + return NextResponse.json({ success: true, pipeline }); + } + + default: + return NextResponse.json({ success: false, error: 
'Unknown action' }, { status: 400 }); + } + } catch (error) { + console.error('Pipeline API error:', error); + return NextResponse.json( + { success: false, error: error instanceof Error ? error.message : 'Unknown error' }, + { status: 500 } + ); + } +} + +export async function GET() { + const pipelines = listPipelines(); + return NextResponse.json({ success: true, pipelines, count: pipelines.length }); +} + +// Simulate pipeline execution for demo +async function simulatePipelineExecution(pipelineId: string) { + const updateAgent = (projectId: string, agentId: string, updates: Partial) => { + const pipeline = loadPipeline(pipelineId); + if (!pipeline) return; + + const project = pipeline.projects.find(p => p.id === projectId); + if (!project) return; + + const agent = project.agents.find(a => a.id === agentId); + if (!agent) return; + + Object.assign(agent, updates); + pipeline.updatedAt = new Date().toISOString(); + savePipeline(pipeline); + }; + + const updateProject = (projectId: string, status: string) => { + const pipeline = loadPipeline(pipelineId); + if (!pipeline) return; + + const project = pipeline.projects.find(p => p.id === projectId); + if (!project) return; + + project.status = status; + pipeline.updatedAt = new Date().toISOString(); + savePipeline(pipeline); + }; + + const updatePipeline = (status: PipelineData['status']) => { + const pipeline = loadPipeline(pipelineId); + if (!pipeline) return; + + pipeline.status = status; + pipeline.updatedAt = new Date().toISOString(); + savePipeline(pipeline); + }; + + // Get pipeline + const pipeline = loadPipeline(pipelineId); + if (!pipeline) return; + + // Simulate execution for each project + for (const project of pipeline.projects) { + // Programmer phase + updateAgent(project.id, `${project.id}-programmer`, { status: 'running', startedAt: new Date().toISOString() }); + await sleep(2000); + updateAgent(project.id, `${project.id}-programmer`, { + status: 'completed', + output: 'Code implementation 
completed. Created modules, tests, and documentation.', + completedAt: new Date().toISOString() + }); + + // Reviewer phase + updateAgent(project.id, `${project.id}-reviewer`, { status: 'running', startedAt: new Date().toISOString() }); + await sleep(1500); + updateAgent(project.id, `${project.id}-reviewer`, { + status: 'completed', + output: 'Code review completed. No critical issues found. Minor suggestions for improvement.', + completedAt: new Date().toISOString() + }); + + // Tester phase + updateAgent(project.id, `${project.id}-tester`, { status: 'running', startedAt: new Date().toISOString() }); + await sleep(1800); + updateAgent(project.id, `${project.id}-tester`, { + status: 'completed', + output: 'All tests passed. Coverage: 94%. Integration tests: 12/12 passed.', + completedAt: new Date().toISOString() + }); + + updateProject(project.id, 'completed'); + } + + updatePipeline('completed'); +} + +function sleep(ms: number) { + return new Promise(resolve => setTimeout(resolve, ms)); +} diff --git a/src/app/page.tsx b/src/app/page.tsx index 9778fcf..e718b75 100755 --- a/src/app/page.tsx +++ b/src/app/page.tsx @@ -13,714 +13,547 @@ import { ScrollArea } from '@/components/ui/scroll-area'; import { Separator } from '@/components/ui/separator'; import { Cpu, - MessageSquare, Layers, Users, Zap, - Database, BarChart3, Play, - Trash2, RefreshCw, Plus, Send, - Activity + Activity, + Download, + FolderArchive, + GitBranch, + CheckCircle2, + XCircle, + Clock, + Code, + FileCode, + TestTube, + Eye, + Terminal, + ArrowRight } from 'lucide-react'; // Types -interface TokenResult { - tokens: number; - characters: number; - words: number; +interface AgentInfo { + id: string; + role: string; + status: string; + output?: string; + startedAt?: string; + completedAt?: string; } -interface ConversationResult { - total: number; - breakdown: Array<{ - role: string; - content: string; - tokens: number; - }>; +interface ProjectInfo { + id: string; + name: string; + status: string; 
+ agents: AgentInfo[]; } -interface BudgetInfo { - used: number; - remaining: number; - total: number; - percentageUsed: number; +interface PipelineInfo { + id: string; + name: string; + status: 'running' | 'completed' | 'failed' | 'cancelled'; + projects: ProjectInfo[]; + createdAt: string; + updatedAt: string; } -interface CompactionResult { - messages: Array<{ role: string; content: string }>; - originalTokenCount: number; - newTokenCount: number; - tokensSaved: number; - compressionRatio: number; - strategy: string; - summaryAdded: boolean; - removedCount: number; -} - -interface OrchestratorStats { - agents: { total: number; idle: number; working: number }; - tasks: { pending: number; running: number; completed: number; failed: number }; -} - -interface SubagentResult { - subagentId: string; - taskId: string; - success: boolean; - output: unknown; - error?: string; - duration: number; -} - -export default function AgentSystemPage() { - // Token Counter State - const [tokenInput, setTokenInput] = useState(''); - const [tokenResult, setTokenResult] = useState(null); - - // Context Compaction State - const [conversationMessages, setConversationMessages] = useState>([ - { role: 'user', content: 'Hello, I need help with my project.' }, - { role: 'assistant', content: 'I would be happy to help you with your project! What specifically do you need assistance with?' }, - { role: 'user', content: 'I need to build a real-time chat application with WebSocket support.' }, - { role: 'assistant', content: 'Great choice! For a real-time chat application with WebSocket support, I recommend using Socket.io or the native WebSocket API. Would you like me to outline the architecture?' 
} +export default function PipelineSystemPage() { + const [pipelines, setPipelines] = useState([]); + const [selectedPipeline, setSelectedPipeline] = useState(null); + const [isLoading, setIsLoading] = useState(false); + const [newPipelineName, setNewPipelineName] = useState('Multi-Project Pipeline'); + const [projectConfigs, setProjectConfigs] = useState([ + { id: 'project-1', name: 'Authentication Module' }, + { id: 'project-2', name: 'Payment Gateway' }, + { id: 'project-3', name: 'User Dashboard' }, + { id: 'project-4', name: 'API Service' } ]); - const [newMessage, setNewMessage] = useState({ role: 'user', content: '' }); - const [budget, setBudget] = useState(null); - const [compactionResult, setCompactionResult] = useState(null); - // Orchestrator State - const [orchestratorStats, setOrchestratorStats] = useState(null); - const [agentConfig, setAgentConfig] = useState({ - id: '', - name: 'Worker Agent', - type: 'worker', - capabilities: 'process,execute', - maxConcurrentTasks: 3, - timeout: 60000 - }); - - // Subagent State - const [subagentType, setSubagentType] = useState('explorer'); - const [subagentTask, setSubagentTask] = useState(''); - const [subagentContext, setSubagentContext] = useState(''); - const [subagentResult, setSubagentResult] = useState(null); - const [isProcessing, setIsProcessing] = useState(false); - - // Fetch orchestrator stats - const fetchStats = useCallback(async () => { + // Fetch pipelines + const fetchPipelines = useCallback(async () => { try { - const response = await fetch('/api/agent-system'); + const response = await fetch('/api/pipeline'); const data = await response.json(); if (data.success) { - setOrchestratorStats(data.stats.orchestrator); + setPipelines(data.pipelines); } } catch (error) { - console.error('Failed to fetch stats:', error); + console.error('Failed to fetch pipelines:', error); } }, []); - useEffect(() => { - fetchStats(); - const interval = setInterval(fetchStats, 5000); - return () => 
clearInterval(interval);
-  }, [fetchStats]);
-
-  // Count tokens
-  const handleCountTokens = async () => {
-    const response = await fetch('/api/agent-system', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({
-        action: 'count-tokens',
-        data: { text: tokenInput }
-      })
-    });
-    const data = await response.json();
-    if (data.success) {
-      setTokenResult(data.result);
-    }
-  };
-
-  // Add message to conversation
-  const handleAddMessage = () => {
-    if (!newMessage.content.trim()) return;
-    setConversationMessages([...conversationMessages, { ...newMessage }]);
-    setNewMessage({ role: 'user', content: '' });
-  };
-
-  // Check compaction status
-  const handleCheckCompaction = async () => {
-    const response = await fetch('/api/agent-system', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({
-        action: 'check-compaction',
-        data: { messages: conversationMessages }
-      })
-    });
-    const data = await response.json();
-    if (data.success) {
-      setBudget(data.budget);
-    }
-  };
-
-  // Compact context
-  const handleCompactContext = async () => {
-    setIsProcessing(true);
+  // Fetch single pipeline
+  const fetchPipeline = useCallback(async (id: string) => {
     try {
-      const response = await fetch('/api/agent-system', {
+      const response = await fetch('/api/pipeline', {
         method: 'POST',
         headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({
-          action: 'compact-context',
-          data: {
-            messages: conversationMessages,
-            config: { strategy: 'hybrid', preserveRecentCount: 2 }
-          }
-        })
+        body: JSON.stringify({ action: 'get', data: { id } })
       });
       const data = await response.json();
       if (data.success) {
-        setCompactionResult(data.result);
-        setConversationMessages(data.result.messages);
+        setSelectedPipeline(data.pipeline);
       }
-    } finally {
-      setIsProcessing(false);
+    } catch (error) {
+      console.error('Failed to fetch pipeline:', error);
     }
-  };
+  }, []);
 
-  // Register agent
-  const handleRegisterAgent = async () => {
-    const id = agentConfig.id || `agent-${Date.now()}`;
-    const response = await fetch('/api/agent-system', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({
-        action: 'register-agent',
-        data: {
-          config: {
-            id,
-            name: agentConfig.name,
-            type: agentConfig.type,
-            capabilities: agentConfig.capabilities.split(',').map(c => c.trim()),
-            maxConcurrentTasks: agentConfig.maxConcurrentTasks,
-            timeout: agentConfig.timeout
-          }
-        }
-      })
-    });
-    const data = await response.json();
-    if (data.success) {
-      fetchStats();
-      setAgentConfig({ ...agentConfig, id: '' });
-    }
-  };
-
-  // Execute subagent task
-  const handleExecuteSubagent = async () => {
-    if (!subagentTask.trim()) return;
-    setIsProcessing(true);
+  // Create new pipeline
+  const createPipeline = async () => {
+    setIsLoading(true);
     try {
-      const response = await fetch('/api/agent-system', {
+      const response = await fetch('/api/pipeline', {
         method: 'POST',
         headers: { 'Content-Type': 'application/json' },
         body: JSON.stringify({
-          action: 'spawn-subagent',
+          action: 'create',
           data: {
-            type: subagentType,
-            task: subagentTask,
-            context: subagentContext
+            name: newPipelineName,
+            projects: projectConfigs
           }
         })
       });
       const data = await response.json();
       if (data.success) {
-        setSubagentResult(data.result);
+        fetchPipelines();
+        setSelectedPipeline(data.pipeline);
       }
     } finally {
-      setIsProcessing(false);
+      setIsLoading(false);
     }
   };
 
+  // Cancel pipeline
+  const cancelPipeline = async (id: string) => {
+    try {
+      const response = await fetch('/api/pipeline', {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ action: 'cancel', data: { id } })
+      });
+      const data = await response.json();
+      if (data.success) {
+        fetchPipelines();
+        if (selectedPipeline?.id === id) {
+          setSelectedPipeline(data.pipeline);
+        }
+      }
+    } catch (error) {
+      console.error('Failed to cancel pipeline:', error);
+    }
+  };
+
+  // Poll for updates
+  useEffect(() => {
+    fetchPipelines();
+    const interval = setInterval(fetchPipelines, 3000);
+    return () => clearInterval(interval);
+  }, [fetchPipelines]);
+
+  // Update selected pipeline
+  useEffect(() => {
+    if (selectedPipeline) {
+      const interval = setInterval(() => {
+        fetchPipeline(selectedPipeline.id);
+      }, 2000);
+      return () => clearInterval(interval);
+    }
+  }, [selectedPipeline?.id, fetchPipeline]);
+
+  // Get status color
+  const getStatusColor = (status: string) => {
+    switch (status) {
+      case 'running': return 'bg-blue-500';
+      case 'completed': return 'bg-emerald-500';
+      case 'failed': return 'bg-red-500';
+      case 'cancelled': return 'bg-amber-500';
+      case 'pending': return 'bg-slate-500';
+      default: return 'bg-slate-500';
+    }
+  };
+
+  // Get role icon
+  const getRoleIcon = (role: string) => {
+    switch (role) {
+      case 'programmer': return ;
+      case 'reviewer': return ;
+      case 'tester': return ;
+      default: return ;
+    }
+  };
+
+  // Calculate progress
+  const calculateProgress = (pipeline: PipelineInfo) => {
+    const totalAgents = pipeline.projects.reduce((sum, p) => sum + p.agents.length, 0);
+    const completedAgents = pipeline.projects.reduce(
+      (sum, p) => sum + p.agents.filter(a => a.status === 'completed').length,
+      0
+    );
+    return totalAgents > 0 ? (completedAgents / totalAgents) * 100 : 0;
+  };
+
   return (
{/* Header */} -
-

- - Agent System -

-

- Complete implementation of context compaction, agent orchestration, and subagent spawning -

-
- - {/* Stats Overview */} - {orchestratorStats && ( -
- +
+
+

+ + Deterministic Multi-Agent Pipeline +

+

+ State machine orchestration • Parallel execution • Event-driven coordination +

+
+
+ -
- -
-

Agents

-

- {orchestratorStats.agents.total} -

-
-
-
-
- - -
- -
-

Working

-

- {orchestratorStats.agents.working} -

-
-
-
-
- - -
- -
-

Tasks Running

-

- {orchestratorStats.tasks.running} -

-
-
-
-
- - -
- -
-

Completed

-

- {orchestratorStats.tasks.completed} -

-
-
+ + + Download Source + +
+
+ + {/* Stats Overview */} +
+ + +
+ +
+

Pipelines

+

{pipelines.length}

+
+
+
+
+ + +
+ +
+

Running

+

+ {pipelines.filter(p => p.status === 'running').length} +

+
+
+
+
+ + +
+ +
+

Completed

+

+ {pipelines.filter(p => p.status === 'completed').length} +

+
+
+
+
+ + +
+ +
+

Concurrent Max

+

12

+
+
+
+
+
+ + {/* Main Content */} +
+ {/* Left Panel - Create Pipeline */} + + + Create Pipeline + + Configure and launch a multi-project pipeline + + + +
+ + setNewPipelineName(e.target.value)} + className="bg-slate-900 border-slate-600 text-white mt-1" + /> +
+ +
+ + + {projectConfigs.map((project, idx) => ( +
+ {idx + 1}. + { + const updated = [...projectConfigs]; + updated[idx] = { ...updated[idx], name: e.target.value }; + setProjectConfigs(updated); + }} + className="bg-slate-900 border-slate-600 text-white flex-1 text-sm" + /> +
+ ))} +
+
+ +
+ +
+ + + + + +
+

• 4 projects × 3 roles = 12 concurrent agents

+

• Code → Review → Test flow

+

• Max 3 review iterations

+
+
+
+ + {/* Right Panel - Pipeline Details */} + + + + Pipeline Execution + {selectedPipeline && ( + + {selectedPipeline.status} + + )} + + + {selectedPipeline ? selectedPipeline.id : 'Select or create a pipeline'} + + + + {selectedPipeline ? ( +
+ {/* Progress */} +
+
+ Progress + {calculateProgress(selectedPipeline).toFixed(0)}% +
+ +
+ + {/* Projects */} +
+ {selectedPipeline.projects.map((project) => ( + + +
+ {project.name} + + {project.status} + +
+
+ + {/* Agent Pipeline Visualization */} +
+ {project.agents.map((agent, idx) => ( +
+
+
+ {getRoleIcon(agent.role)} +
+ {agent.role} + {agent.status === 'completed' && ( + + )} + {agent.status === 'running' && ( + + )} +
+ {idx < project.agents.length - 1 && ( + + )} +
+ ))} +
+
+
+ ))} +
+ + {/* Actions */} + {selectedPipeline.status === 'running' && ( + + )} +
+ ) : ( +
+ +

No pipeline selected

+

Create a new pipeline or select from history

+
+ )} +
+
+
+ + {/* Pipeline History */} + {pipelines.length > 0 && ( + + + Pipeline History + + Click to view details + + + + +
+ {pipelines.slice().reverse().map((pipeline) => ( +
setSelectedPipeline(pipeline)} + className={` + flex items-center justify-between p-3 rounded-lg cursor-pointer + ${selectedPipeline?.id === pipeline.id + ? 'bg-slate-700 border border-emerald-600' + : 'bg-slate-900/50 border border-slate-700 hover:border-slate-600'} + `} + > +
+
+ {pipeline.name} + {pipeline.projects.length} projects +
+
+ + {new Date(pipeline.createdAt).toLocaleTimeString()} +
+
+ ))} +
+ + + )} - {/* Main Tabs */} - - - - - Token Counter - - - - Context Compaction - - - - Orchestrator - - - - Subagents - - - - {/* Token Counter Tab */} - - - - Token Counter - - Estimate token counts for text using character-based approximation - - - -
- -