admin/OpenQode

Gemini AI 142aaeee1e Release v1.01 Enhanced: Vi Control, TUI Gen5, Core Stability

2025-12-20 01:12:45 +04:00

35 KiB

🦆 GOOSE SUPER - CORE SUBSYSTEMS DETAILED SPECIFICATIONS

1. Playwright Integration
2. Browser Use System
3. Computer Use System
4. Full Vision Capabilities
5. Vibe Server Management

1. Playwright Integration

Purpose

Automate web browsers with precision - navigate, click, fill forms, extract data, take screenshots.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    PLAYWRIGHT SUBSYSTEM                        │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  User: "Search for hotels in Dubai on Booking.com"           │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ IQ Exchange Translation Layer                             │ │
│  │   → BROWSER action detected                               │ │
│  │   → Route to Playwright Bridge                            │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ Playwright Bridge (bin/playwright-bridge.js)              │ │
│  │                                                           │ │
│  │  Commands:                                                │ │
│  │   navigate <url>          → Go to URL                     │ │
│  │   click <selector|text>   → Click element                 │ │
│  │   fill <selector> <text>  → Type in input                 │ │
│  │   type <text>             → Type at cursor                │ │
│  │   press <key>             → Press key (Enter, Tab, etc)   │ │
│  │   wait <selector> <ms>    → Wait for element              │ │
│  │   screenshot <file>       → Capture page                  │ │
│  │   content                 → Get page text                 │ │
│  │   elements                → List all elements             │ │
│  │   close                   → Close browser session         │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ Chromium Browser (Headless or Visible)                    │ │
│  │   • Persistent session across commands                    │ │
│  │   • Cookies & auth remembered                             │ │
│  │   • DevTools access for debugging                         │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘
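The command list in the diagram implies a small argument-validation step before anything reaches the browser. A minimal sketch of that step follows. The function name `parseBridgeCommand` and the returned object shape are assumptions for illustration, not the contents of bin/playwright-bridge.js.

```javascript
// Hypothetical sketch: validate bridge CLI arguments before touching the browser.
// Command names and argument counts follow the command list in the diagram.
const COMMANDS = new Map([
  ['navigate', ['url']],
  ['click', ['target']],
  ['fill', ['selector', 'text']],
  ['type', ['text']],
  ['press', ['key']],
  ['wait', ['selector', 'ms']],
  ['screenshot', ['file']],
  ['content', []],
  ['elements', []],
  ['close', []],
]);

function parseBridgeCommand(argv) {
  const [name, ...values] = argv;
  const params = COMMANDS.get(name);
  if (!params) throw new Error(`Unknown command: ${name}`);
  if (values.length !== params.length) {
    throw new Error(`${name} expects ${params.length} argument(s): ${params.join(' ')}`);
  }
  // Map positional values onto named fields, e.g. { command: 'fill', selector, text }
  const action = { command: name };
  params.forEach((param, i) => { action[param] = values[i]; });
  if (name === 'wait') {
    action.ms = Number(action.ms);
    if (!Number.isInteger(action.ms) || action.ms < 0) {
      throw new Error('wait expects a non-negative integer timeout in ms');
    }
  }
  return action;
}
```

A bridge entry point could call `parseBridgeCommand(process.argv.slice(2))` and dispatch on `action.command`.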

Current State vs. Enhancement

Feature	Current	Enhanced
Session persistence	✅ Basic	✅ Full cookie/auth persistence
Element selection	Text/CSS only	Smart AI-assisted selector
Form filling	Manual fields	Auto-detect all form fields
Multi-tab	❌ Single tab	✅ Multiple tabs
Screenshots	✅ Basic	✅ With element annotations
DOM extraction	❌	✅ Structured for AI analysis

Enhanced Playwright Bridge Commands

# NEW: Smart element finding (AI-assisted)
node playwright-bridge.js smart-click "the search button"
node playwright-bridge.js smart-fill "email" "user@example.com"

# NEW: Element discovery for AI
node playwright-bridge.js list-interactive   # Lists all clickable/fillable elements
node playwright-bridge.js describe-page      # AI-friendly page description

# NEW: Multi-tab
node playwright-bridge.js new-tab
node playwright-bridge.js switch-tab 2
node playwright-bridge.js list-tabs

# NEW: Authentication helpers
node playwright-bridge.js save-cookies "site-name"
node playwright-bridge.js load-cookies "site-name"

# NEW: Form automation
node playwright-bridge.js auto-fill-form '{"email":"x","password":"y"}'
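For `smart-click` and `smart-fill`, a description such as "the search button" has to be matched against the page's interactive elements. The sketch below uses plain token overlap as a stand-in for the AI-assisted matching. It assumes elements shaped like the output of the `browser:get-dom` handler (tag, text, id, name, type, placeholder). `rankElements` is a hypothetical name.

```javascript
// Hypothetical sketch: rank elements by how many description words they share.
// This is a simple heuristic, not the AI-assisted selector itself.
function tokenize(value) {
  return String(value || '').toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function rankElements(description, elements) {
  const wanted = new Set(tokenize(description));
  return elements
    .map((element) => {
      const fields = [element.tag, element.text, element.id,
        element.name, element.type, element.placeholder];
      const present = new Set(tokenize(fields.join(' ')));
      let score = 0;
      for (const token of wanted) {
        if (present.has(token)) score += 1;
      }
      return { element, score };
    })
    .filter((candidate) => candidate.score > 0)
    .sort((a, b) => b.score - a.score); // best match first
}
```

An empty result is a signal to fall back to a model call or to ask the user which element was meant.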

Integration with IQ Exchange

// In lib/iq-exchange.mjs - TASK_PATTERNS
browser: /\b(website|browser|search|navigate|booking\.com|amazon|google|fill.*form|click.*button|login|signup)\b/i

// Translation example:
// User: "Book a hotel in Dubai on Booking.com"
// IQ Exchange generates:
[
  'node playwright-bridge.js navigate "https://booking.com"',
  'node playwright-bridge.js wait "[name=ss]" 5000',
  'node playwright-bridge.js fill "[name=ss]" "Dubai hotels"', 
  'node playwright-bridge.js click "[type=submit]"',
  'node playwright-bridge.js screenshot "search-results.png"'
]
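Task detection with these patterns can be sketched as a first-match lookup. Only the `browser` pattern comes from this document. The `desktop` and `server` patterns and the `detectTaskType` name are assumptions added for illustration.

```javascript
// Hypothetical sketch of pattern-based task routing.
const TASK_PATTERNS = {
  // Copied from the TASK_PATTERNS entry shown in this section
  browser: /\b(website|browser|search|navigate|booking\.com|amazon|google|fill.*form|click.*button|login|signup)\b/i,
  // Assumed patterns, for illustration only
  desktop: /\b(open|launch|window|desktop)\b/i,
  server: /\b(server|nginx|deploy|ssl|ssh)\b/i,
};

function detectTaskType(request) {
  // First matching pattern wins, in insertion order
  for (const [type, pattern] of Object.entries(TASK_PATTERNS)) {
    if (pattern.test(request)) return type;
  }
  return 'unknown';
}
```

With first-match routing, "My website is down, fix it" is classified as `browser` because the browser pattern contains `website`. A real router would likely need priorities or scoring to send that request to the server module.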

2. Browser Use System

Purpose

Built-in browser panel inside Goose for:

- App Preview - See your code rendered live
- Web Browsing - AI can browse the web inside Goose
- Automation Control - Visual feedback for Playwright actions

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    BROWSER USE SYSTEM                          │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ BUILT-IN BROWSER (Electron BrowserView)                   │ │
│  │                                                           │ │
│  │  ┌───────────────────────────────────────────────────┐   │ │
│  │  │ 🌐 http://localhost:5173     🔄 ◀ ▶ 🔍 🛠️ 📸      │   │ │
│  │  ├───────────────────────────────────────────────────┤   │ │
│  │  │                                                   │   │ │
│  │  │             YOUR APP RENDERS HERE                 │   │ │
│  │  │                                                   │   │ │
│  │  │     [Live preview of HTML/CSS/JS you write]      │   │ │
│  │  │                                                   │   │ │
│  │  │                                                   │   │ │
│  │  └───────────────────────────────────────────────────┘   │ │
│  │                                                           │ │
│  │  Toolbar:                                                 │ │
│  │   🔄 Refresh  ◀ Back  ▶ Forward  🔍 URL Bar              │ │
│  │   🛠️ DevTools  📸 Screenshot  📍 Element Inspect         │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
│  Modes:                                                        │
│   [Preview] - Shows your local app (default)                  │
│   [Browse]  - Navigate to any URL                             │
│   [Automate]- Watch Playwright actions in real-time           │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Features

Preview Mode (Vibe Coding)

User: "Make me a landing page"
  ↓
Goose writes HTML/CSS/JS
  ↓
Auto-starts Vite dev server (port 5173)
  ↓
Built-in browser loads http://localhost:5173
  ↓
User: "Make the header blue"
  ↓
Goose edits CSS → Hot reload → Preview updates instantly

Browse Mode

User: "Go to GitHub and star the browser-use repo"
  ↓
Built-in browser navigates to github.com
  ↓
AI uses Playwright to interact with the page
  ↓
User can SEE every action happening

Automate Mode (Visible Automation)

When Playwright runs:
  1. Built-in browser shows the exact page
  2. Red highlight appears on clicked elements
  3. Green highlight on filled inputs
  4. Status bar shows current action
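One way to produce the red and green highlights is to inject a short script into the page before each action. The sketch below only builds that script as a string. `buildHighlightScript`, the outline style, and the 800 ms default are assumptions.

```javascript
// Hypothetical sketch: build a script that outlines the target element.
// Colors follow the list above: red for clicks, green for fills.
const HIGHLIGHT_COLORS = new Map([
  ['click', 'red'],
  ['fill', 'green'],
]);

function buildHighlightScript(selector, action, durationMs = 800) {
  const color = HIGHLIGHT_COLORS.get(action);
  if (!color) throw new Error(`Unsupported action: ${action}`);
  const duration = Number(durationMs);
  // JSON.stringify yields a safely quoted JavaScript string literal for the selector
  return `(() => {
  const el = document.querySelector(${JSON.stringify(selector)});
  if (!el) return false;
  const previous = el.style.outline;
  el.style.outline = '3px solid ${color}';
  setTimeout(() => { el.style.outline = previous; }, ${duration});
  return true;
})()`;
}
```

The returned string can be passed to the `browser:execute-js` handler, which evaluates it with `webContents.executeJavaScript`.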

IPC Handlers (main.cjs)

// Browser panel control
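ipcMain.handle('browser:create-view', async (event, options = {}) => {
    // Merge caller preferences, then enforce isolation for untrusted web content
    const webPreferences = { ...options.webPreferences, nodeIntegration: false, contextIsolation: true };
    browserView = new BrowserView({ webPreferences });
    mainWindow.addBrowserView(browserView);
    // A BrowserView is not visible until it has bounds
    if (options.bounds) browserView.setBounds(options.bounds);
    return { success: true };
});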

ipcMain.handle('browser:navigate', async (event, url) => {
    await browserView.webContents.loadURL(url);
    return { success: true, url };
});

ipcMain.handle('browser:screenshot', async () => {
    const image = await browserView.webContents.capturePage();
    return { success: true, base64: image.toDataURL() };
});

ipcMain.handle('browser:execute-js', async (event, script) => {
    const result = await browserView.webContents.executeJavaScript(script);
    return { success: true, result };
});

ipcMain.handle('browser:get-dom', async () => {
    const dom = await browserView.webContents.executeJavaScript(`
        Array.from(document.querySelectorAll('a, button, input, select, textarea'))
            .map(el => ({
                tag: el.tagName,
                text: el.textContent?.trim().slice(0, 50),
                id: el.id,
                name: el.name,
                type: el.type,
                placeholder: el.placeholder
            }))
    `);
    return { success: true, elements: dom };
});
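Because `browser:navigate` receives URLs from the renderer and from AI-generated steps, it may be worth validating them before calling `loadURL`. A minimal sketch follows. The http/https allow-list and the name `normalizeNavigationUrl` are assumptions, not part of main.cjs.

```javascript
// Hypothetical sketch: accept only absolute http(s) URLs for navigation.
function normalizeNavigationUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on relative or malformed input
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Blocked protocol: ${url.protocol}`);
  }
  return url.href;
}
```

The handler could call `loadURL(normalizeNavigationUrl(url))` and return `{ success: false, error }` when the check throws.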

3. Computer Use System

Purpose

Control the Windows desktop - open apps, click UI elements, type text, take screenshots.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                   COMPUTER USE SYSTEM                          │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  User: "Open Spotify and play some jazz"                      │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ IQ Exchange Translation Layer                             │ │
│  │   → DESKTOP action detected                               │ │
│  │   → Route to Computer Use module                          │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌────────────────────────────────────┬─────────────────────┐ │
│  │ input.ps1 (PowerShell)             │ computer-use.cjs    │ │
│  │ (Current - 66KB!)                  │ (Node.js native)    │ │
│  │                                    │                     │ │
│  │ • UIAutomation via .NET            │ • robotjs for speed │ │
│  │ • OCR via Windows API              │ • screenshot-desktop│ │
│  │ • Window management                │ • Cross-platform    │ │
│  └────────────────────────────────────┴─────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ Windows Desktop                                           │ │
│  │   • Apps, dialogs, menus                                  │ │
│  │   • Task bar, system tray                                 │ │
│  │   • Any visible UI element                                │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Available Commands (input.ps1)

Vision Commands (The Eyes)

# See what's on screen
input.ps1 screenshot "screen.png"      # Capture full screen
input.ps1 ocr "full"                   # Read ALL text on screen
input.ps1 ocr "region" 100 100 500 400 # Read text in region
input.ps1 app_state "Notepad"          # Get UI tree of app (buttons, menus, etc.)

Navigation Commands

# Open and focus apps
input.ps1 open "Spotify"               # Launch or focus app
input.ps1 startmenu                    # Open Windows Start menu
input.ps1 focus "Save As"              # Focus specific dialog/element
input.ps1 waitfor "Ready" 10           # Wait for text to appear (max 10s)

Interaction Commands (Smart - via UIAutomation)

# Click by element name (RELIABLE!)
input.ps1 uiclick "Play"               # Click button named "Play"
input.ps1 uiclick "File"               # Click menu item
input.ps1 uipress "Remember me"        # Toggle checkbox
input.ps1 uipress "Jazz"               # Select list item

# Type text
input.ps1 type "Hello World"           # Type into focused element
input.ps1 keyboard "search query"      # Alternative typing method
input.ps1 key "Enter"                  # Press single key
input.ps1 hotkey "Ctrl+S"              # Key combination

Fallback Commands (Coordinate-based)

# When UIAutomation fails, use coordinates
input.ps1 mouse 500 300                # Move mouse to x,y
input.ps1 click                        # Click at current position
input.ps1 drag 100 100 500 500         # Drag from point to point
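The fallback rule (try UIAutomation first, then coordinates) can be sketched as follows. `run` is a hypothetical async function that executes one input.ps1 command and resolves to `true` on success. How it invokes PowerShell is left open.

```javascript
// Hypothetical sketch: click by name, falling back to a known screen position.
// `run(args)` is an assumed helper that executes `input.ps1 <args...>`.
async function clickWithFallback(run, elementName, fallbackPoint) {
  if (await run(['uiclick', elementName])) return 'uiautomation';
  if (!fallbackPoint) return 'failed';
  const { x, y } = fallbackPoint;
  // Coordinate path: move the pointer, then click at the current position
  const moved = await run(['mouse', String(x), String(y)]);
  const clicked = moved && (await run(['click']));
  return clicked ? 'coordinates' : 'failed';
}
```

Returning which path succeeded lets the caller log how often the coordinate fallback is needed.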

Example Flow

User: "Open Paint and draw a red circle"

IQ Exchange generates:
1. input.ps1 open "Paint"
2. input.ps1 waitfor "Untitled" 5
3. input.ps1 uiclick "Brushes"
4. input.ps1 uiclick "Red"
5. input.ps1 mouse 400 300
6. input.ps1 drag 400 300 500 400

After each step:
  → Screenshot
  → OCR/app_state to verify
  → If failed → Self-heal and retry
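The execute, verify, retry loop above can be sketched with injected callbacks. This version only repeats a failed step. Generating a corrected step ("self-heal") would happen inside `execute`. The names `runWithSelfHeal`, `execute`, `verify`, and `maxAttempts` are assumptions.

```javascript
// Hypothetical sketch: run steps in order, verifying each one before moving on.
async function runWithSelfHeal(steps, { execute, verify, maxAttempts = 3 }) {
  const log = [];
  for (const step of steps) {
    let ok = false;
    for (let attempt = 1; attempt <= maxAttempts && !ok; attempt += 1) {
      await execute(step, attempt); // attempt number lets execute adjust its approach
      ok = await verify(step);      // e.g. screenshot + OCR/app_state check
      log.push({ step, attempt, ok });
    }
    if (!ok) return { success: false, failedStep: step, log };
  }
  return { success: true, log };
}
```

Stopping at the first step that exhausts its attempts avoids running later steps against an unexpected screen state.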

Enhancements Needed

Current	Enhanced
PowerShell scripts (slow startup)	Node.js native with robotjs
Basic element finding	Smart element grounding with labels
Manual verification	Auto-screenshot after each action
English-only OCR	Multi-language OCR support

4. Full Vision Capabilities

Purpose

Let the AI "see" the screen like a human - understand what's visible, where elements are, and verify actions worked.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    VISION SYSTEM                               │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ SCREENSHOT LAYER                                          │ │
│  │   • Full screen capture                                   │ │
│  │   • Window-specific capture                               │ │
│  │   • Region capture                                        │ │
│  │   • Before/after comparison                               │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ ANALYSIS LAYER                                            │ │
│  │                                                           │ │
│  │  ┌────────────────┬────────────────┬──────────────────┐  │ │
│  │  │ OCR Engine     │ UI Automation  │ AI Vision (LLM)  │  │ │
│  │  │                │                │                  │  │ │
│  │  │ • Read text    │ • Element tree │ • Understand     │  │ │
│  │  │ • Find words   │ • Button names │   context        │  │ │
│  │  │ • Coordinates  │ • Input fields │ • Verify success │  │ │
│  │  │   of text      │ • Menus        │ • Describe scene │  │ │
│  │  └────────────────┴────────────────┴──────────────────┘  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ GROUNDING LAYER (Element Labeling)                        │ │
│  │                                                           │ │
│  │  Screenshot with overlays:                                │ │
│  │  ┌─────────────────────────────────────────┐             │ │
│  │  │   [1] File   [2] Edit   [3] View        │             │ │
│  │  │   ┌─────────────────────────────┐       │             │ │
│  │  │   │ [4] Search...               │       │             │ │
│  │  │   └─────────────────────────────┘       │             │ │
│  │  │                                         │             │ │
│  │  │   [5] Save   [6] Cancel                 │             │ │
│  │  └─────────────────────────────────────────┘             │ │
│  │                                                           │ │
│  │  AI can say: "Click element [5]" → We click Save button  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Vision Flow

Before Action

1. Screenshot current state
2. OCR to find text locations
3. UIAutomation to get element names
4. Create "grounding map" of clickable elements
5. Send to AI: "Here's what I see on screen..."

After Action

1. Screenshot new state
2. Compare with previous
3. OCR to verify expected text appeared
4. Send to AI: "Did the action succeed?"
5. If failed → Generate fix → Retry

Grounding Map Example

{
  "screenshot": "base64://...",
  "elements": [
    { "id": 1, "type": "button", "text": "File", "bounds": [10, 5, 50, 25] },
    { "id": 2, "type": "button", "text": "Edit", "bounds": [60, 5, 100, 25] },
    { "id": 3, "type": "input", "placeholder": "Search...", "bounds": [150, 50, 400, 80] },
    { "id": 4, "type": "button", "text": "Save", "bounds": [300, 500, 380, 530] },
    { "id": 5, "type": "button", "text": "Cancel", "bounds": [400, 500, 480, 530] }
  ],
  "ocr_text": [
    { "text": "Untitled - Notepad", "bounds": [100, 0, 300, 20] },
    { "text": "Hello World", "bounds": [50, 100, 200, 120] }
  ]
}
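Turning an element id from the grounding map into a click position could look like the sketch below. It assumes `bounds` is `[left, top, right, bottom]`, which matches the sample values but is not stated explicitly. `clickPointFor` is a hypothetical name.

```javascript
// Hypothetical sketch: resolve a grounding-map element id to its center point.
// Assumes bounds = [left, top, right, bottom] in screen pixels.
function clickPointFor(groundingMap, id) {
  const element = groundingMap.elements.find((el) => el.id === id);
  if (!element) throw new Error(`No element with id ${id}`);
  const [left, top, right, bottom] = element.bounds;
  return {
    x: Math.round((left + right) / 2),
    y: Math.round((top + bottom) / 2),
  };
}
```

For the Save button in the example (`[300, 500, 380, 530]`) this yields the point (340, 515), which can be fed to `input.ps1 mouse` followed by `input.ps1 click`.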

AI Prompt with Vision

You are controlling a Windows computer. Here's what you see:

[SCREENSHOT: base64 image]

INTERACTIVE ELEMENTS:
[1] Button: "File" at (10, 5)
[2] Button: "Edit" at (60, 5)
[3] Input: "Search..." at (150, 50)
[4] Button: "Save" at (300, 500)
[5] Button: "Cancel" at (400, 500)

VISIBLE TEXT:
- "Untitled - Notepad" (title bar)
- "Hello World" (document content)

USER REQUEST: "Save this file as hello.txt"

YOUR ACTION:
- To click an element, say: CLICK [4]
- To type, say: TYPE "hello.txt"
- To press key, say: KEY Enter
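The model's reply then has to be parsed back into a structured action. A minimal sketch for the three reply forms listed in the prompt follows. It assumes one action per reply, and `parseAction` is a hypothetical name.

```javascript
// Hypothetical sketch: parse CLICK [n], TYPE "text", and KEY name replies.
function parseAction(reply) {
  const line = reply.trim();
  let match = /^CLICK \[(\d+)\]$/.exec(line);
  if (match) return { type: 'click', id: Number(match[1]) };
  match = /^TYPE "(.*)"$/.exec(line);
  if (match) return { type: 'type', text: match[1] };
  match = /^KEY (\S+)$/.exec(line);
  if (match) return { type: 'key', key: match[1] };
  // Anything else is rejected rather than guessed at
  throw new Error(`Unrecognized action: ${line}`);
}
```

Rejecting unrecognized replies keeps free-form model output from being executed as an input command.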

5. Vibe Server Management

Purpose

"Vibe Server Management" = Server tasks for people who don't know servers.

No SSH commands to memorize. Just tell Goose what you want in plain English.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                 VIBE SERVER MANAGEMENT                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  User: "My website is down, fix it"                           │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ IQ Exchange Translation Layer                             │ │
│  │   → SERVER action detected                                │ │
│  │   → Route to Server Management module                     │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ SERVER DIAGNOSIS FLOW                                     │ │
│  │                                                           │ │
│  │  1. Check if we have SSH credentials                      │ │
│  │     → If not: "I need SSH access. Enter credentials?"     │ │
│  │                                                           │ │
│  │  2. Connect to server                                     │ │
│  │     → ssh root@86.105.224.125                            │ │
│  │                                                           │ │
│  │  3. Diagnose the issue                                    │ │
│  │     → systemctl status nginx                              │ │
│  │     → pm2 status                                          │ │
│  │     → tail -n 50 /var/log/nginx/error.log                │ │
│  │                                                           │ │
│  │  4. Fix the issue                                         │ │
│  │     → systemctl restart nginx                             │ │
│  │     → pm2 restart all                                     │ │
│  │                                                           │ │
│  │  5. Verify fix worked                                     │ │
│  │     → curl http://localhost                               │ │
│  │     → Report back to user                                 │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Noob-Friendly Commands (What User Says → What Goose Does)

User Says	Goose Does
"Is my server running?"	`ssh` → `systemctl status nginx` → parse output
"Restart my website"	`pm2 restart all` or `systemctl restart nginx`
"Check the logs"	`tail -n 100 /var/log/nginx/error.log`
"Deploy my latest code"	`git pull` → `npm install` → `npm run build` → `pm2 restart`
"My server is slow"	Check CPU/memory, identify resource hogs
"Set up SSL"	`certbot --nginx -d domain.com`
"Add a new website"	Create nginx config, set up PM2 process
"Backup my database"	`pg_dump` or `mysqldump`
"My disk is full"	`df -h`, find large files, clean up safely

Server Panel UI

┌────────────────────────────────────────────────────────────────┐
│ 🔗 SERVER MANAGEMENT                                     [X]   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ CONNECTIONS                                    [+ Add New] ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │ ● Production Server (86.105.224.125)           [Connect]  ││
│  │ ○ Staging Server (192.168.1.100)               [Connect]  ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ QUICK ACTIONS                                              ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │  [🔄 Restart Website]  [📊 Check Status]  [📋 View Logs]  ││
│  │  [🚀 Deploy Latest]    [💾 Backup DB]     [🔒 SSL Setup]  ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ LIVE TERMINAL                                              ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │ root@server:~# systemctl status nginx                     ││
│  │ ● nginx.service - A high performance web server           ││
│  │    Loaded: loaded (/lib/systemd/system/nginx.service)     ││
│  │    Active: active (running) since Mon 2025-12-16 10:00    ││
│  │                                                           ││
│  │ $> _                                                      ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ 💬 ASK GOOSE (natural language)                           ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │ "Why is my website slow?"                    [Ask Goose]  ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
└────────────────────────────────────────────────────────────────┘

SSH Service Implementation

// services/SSHService.js
const { Client } = require('ssh2');

class SSHService {
    constructor() {
        this.connections = new Map();
    }

    async connect(name, config) {
        const conn = new Client();
        return new Promise((resolve, reject) => {
            conn.on('ready', () => {
                this.connections.set(name, conn);
                resolve({ success: true });
            });
            conn.on('error', reject);
            conn.connect(config);
        });
    }

    async exec(name, command) {
        const conn = this.connections.get(name);
        if (!conn) throw new Error('Not connected');
        
        return new Promise((resolve, reject) => {
            conn.exec(command, (err, stream) => {
                // Stop here on failure; `stream` is undefined when `err` is set
                if (err) return reject(err);
                let output = '';
                stream.on('data', (data) => output += data);
                stream.on('close', () => resolve(output));
            });
        });
    }

    // High-level helper methods
    async checkWebsiteStatus(name) {
        const nginx = await this.exec(name, 'systemctl is-active nginx');
        const pm2 = await this.exec(name, 'pm2 jlist');
        return { nginx: nginx.trim(), pm2: JSON.parse(pm2) };
    }

    async restartWebsite(name) {
        await this.exec(name, 'pm2 restart all');
        await this.exec(name, 'systemctl reload nginx');
        return { success: true };
    }

    async getLogs(name, lines = 50) {
        return this.exec(name, `tail -n ${lines} /var/log/nginx/error.log`);
    }

    async deploy(name, path = '/var/www/app') {
        await this.exec(name, `cd ${path} && git pull`);
        await this.exec(name, `cd ${path} && npm install`);
        await this.exec(name, `cd ${path} && npm run build`);
        await this.exec(name, 'pm2 restart all');
        return { success: true };
    }
}
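`getLogs` and `deploy` interpolate `lines` and `path` directly into shell commands, so a value containing spaces or shell metacharacters would change the command that runs on the server. A minimal sketch of quoting and validation helpers follows. `shellQuote` and `tailCommand` are hypothetical names, and the quoting targets POSIX shells.

```javascript
// Hypothetical sketch: make interpolated values safe for a POSIX shell.
function shellQuote(value) {
  // Wrap in single quotes; an embedded ' becomes '\'' (close, escaped quote, reopen)
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

function tailCommand(lines, logPath = '/var/log/nginx/error.log') {
  const count = Number(lines);
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error('lines must be a positive integer');
  }
  return `tail -n ${count} ${shellQuote(logPath)}`;
}
```

In `deploy`, the same helper would apply as `cd ${shellQuote(path)} && git pull`.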

Saved Server Configurations

// .opencode/servers.json
{
  "production": {
    "name": "Production Server",
    "host": "86.105.224.125",
    "port": 22,
    "username": "root",
    "authType": "password",
    "webRoot": "/var/www/myapp",
    "pm2App": "myapp"
  },
  "staging": {
    "name": "Staging Server",
    "host": "192.168.1.100",
    "port": 22,
    "username": "deploy",
    "authType": "key",
    "keyPath": "~/.ssh/id_rsa",
    "webRoot": "/var/www/staging"
  }
}
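A saved entry has to be converted into the options object that `ssh2`'s `Client.connect()` accepts (`host`, `port`, `username`, and either `password` or `privateKey`). The sketch below assumes the password is supplied at connect time rather than stored in the file, and that `keyPath` may start with `~/`. `toConnectConfig` is a hypothetical name.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sketch: turn a servers.json entry into ssh2 connect options.
function toConnectConfig(server, password) {
  const base = { host: server.host, port: server.port || 22, username: server.username };
  if (server.authType === 'key') {
    // Expand a leading ~/ because Node does not do shell-style tilde expansion
    const keyPath = server.keyPath.startsWith('~/')
      ? path.join(os.homedir(), server.keyPath.slice(2))
      : server.keyPath;
    return { ...base, privateKey: fs.readFileSync(keyPath) };
  }
  if (server.authType === 'password') {
    if (!password) throw new Error('Password required for password auth');
    return { ...base, password };
  }
  throw new Error(`Unsupported authType: ${server.authType}`);
}
```

The result can be passed to `SSHService.connect(name, config)`, for example `connect('staging', toConnectConfig(servers.staging))`.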

Summary: All Five Subsystems

Subsystem	Purpose	Key Tech
1. Playwright	Browser automation	Playwright + persistent sessions
2. Browser Use	Built-in browser panel	Electron BrowserView
3. Computer Use	Desktop automation	input.ps1 + UIAutomation
4. Full Vision	See and verify	OCR + Screenshot + Grounding
5. Vibe Server	Noob-friendly server ops	SSH2 + high-level helpers

All subsystems connect through IQ Exchange, which:

1. Detects task type from natural language
2. Routes to appropriate subsystem
3. Executes with self-healing loop
4. Verifies success using vision
5. Retries until success or max attempts

This document provides implementation-ready specifications for each core subsystem.
