🦆 GOOSE SUPER - CORE SUBSYSTEMS DETAILED SPECIFICATIONS

Table of Contents

  1. Playwright Integration
  2. Browser Use System
  3. Computer Use System
  4. Full Vision Capabilities
  5. Vibe Server Management

1. Playwright Integration

Purpose

Automate web browsers with precision - navigate, click, fill forms, extract data, take screenshots.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    PLAYWRIGHT SUBSYSTEM                        │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  User: "Search for hotels in Dubai on Booking.com"           │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ IQ Exchange Translation Layer                             │ │
│  │   → BROWSER action detected                               │ │
│  │   → Route to Playwright Bridge                            │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ Playwright Bridge (bin/playwright-bridge.js)              │ │
│  │                                                           │ │
│  │  Commands:                                                │ │
│  │   navigate <url>          → Go to URL                     │ │
│  │   click <selector|text>   → Click element                 │ │
│  │   fill <selector> <text>  → Type in input                 │ │
│  │   type <text>             → Type at cursor                │ │
│  │   press <key>             → Press key (Enter, Tab, etc)   │ │
│  │   wait <selector> <ms>    → Wait for element              │ │
│  │   screenshot <file>       → Capture page                  │ │
│  │   content                 → Get page text                 │ │
│  │   elements                → List all elements             │ │
│  │   close                   → Close browser session         │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ Chromium Browser (Headless or Visible)                    │ │
│  │   • Persistent session across commands                    │ │
│  │   • Cookies & auth remembered                             │ │
│  │   • DevTools access for debugging                         │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Current State vs. Enhancement

| Feature | Current | Enhanced |
|---|---|---|
| Session persistence | Basic | Full cookie/auth persistence |
| Element selection | Text/CSS only | Smart AI-assisted selector |
| Form filling | Manual fields | Auto-detect all form fields |
| Multi-tab | Single tab | Multiple tabs |
| Screenshots | Basic | With element annotations |
| DOM extraction | | Structured for AI analysis |

Enhanced Playwright Bridge Commands

// NEW: Smart element finding (AI-assisted)
node playwright-bridge.js smart-click "the search button"
node playwright-bridge.js smart-fill "email" "user@example.com"

// NEW: Element discovery for AI
node playwright-bridge.js list-interactive   // Lists all clickable/fillable elements
node playwright-bridge.js describe-page      // AI-friendly page description

// NEW: Multi-tab
node playwright-bridge.js new-tab
node playwright-bridge.js switch-tab 2
node playwright-bridge.js list-tabs

// NEW: Authentication helpers
node playwright-bridge.js save-cookies "site-name"
node playwright-bridge.js load-cookies "site-name"

// NEW: Form automation
node playwright-bridge.js auto-fill-form '{"email":"x","password":"y"}'
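The `smart-click` and `smart-fill` commands need some way to map a description such as "the search button" to a concrete element. The following is a minimal sketch of that lookup, not the bridge's actual implementation: `findBestElement` and `tokenize` are hypothetical names, the element shape (`tag`, `text`, `id`, `name`, `placeholder`) is assumed to match what a `list-interactive`-style command returns, and plain word overlap stands in for the AI-assisted matching.

```javascript
// Hypothetical helper: rank candidate elements against a plain-English description.
// Word overlap stands in for the AI-assisted matching the spec calls for.
const STOP_WORDS = new Set(['the', 'a', 'an', 'to', 'of', 'on', 'in']);

function tokenize(text) {
    return String(text || '')
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter((word) => word && !STOP_WORDS.has(word));
}

function findBestElement(description, elements) {
    const wanted = tokenize(description);
    let best = null;
    let bestScore = 0;
    for (const el of elements) {
        // Everything a user might refer to: tag, visible text, id, name, placeholder
        const haystack = new Set(
            tokenize([el.tag, el.text, el.id, el.name, el.placeholder].join(' '))
        );
        const score = wanted.filter((word) => haystack.has(word)).length;
        if (score > bestScore) {
            best = el;
            bestScore = score;
        }
    }
    return best; // null when no word of the description matches any element
}
```

A real implementation would probably hand ambiguous cases (two elements with the same score) to the model instead of picking the first one.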

Integration with IQ Exchange

// In lib/iq-exchange.mjs - TASK_PATTERNS
browser: /\b(website|browser|search|navigate|booking\.com|amazon|google|fill.*form|click.*button|login|signup)\b/i

// Translation example:
// User: "Book a hotel in Dubai on Booking.com"
// IQ Exchange generates:
[
  'node playwright-bridge.js navigate "https://booking.com"',
  'node playwright-bridge.js wait "[name=ss]" 5000',
  'node playwright-bridge.js fill "[name=ss]" "Dubai hotels"', 
  'node playwright-bridge.js click "[type=submit]"',
  'node playwright-bridge.js screenshot "search-results.png"'
]
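The detection step can be sketched as a loop over the pattern table. Only the `browser` pattern below comes from this document; the `detectTaskType` name and the `'unknown'` fallback are assumptions made for illustration.

```javascript
// Only the `browser` pattern comes from the spec; the function name and the
// 'unknown' fallback are assumptions made for this sketch.
const TASK_PATTERNS = {
    browser: /\b(website|browser|search|navigate|booking\.com|amazon|google|fill.*form|click.*button|login|signup)\b/i
};

function detectTaskType(request) {
    for (const [type, pattern] of Object.entries(TASK_PATTERNS)) {
        if (pattern.test(request)) return type;
    }
    return 'unknown';
}

console.log(detectTaskType('Book a hotel in Dubai on Booking.com')); // browser
```

Because the first matching pattern wins, the order of entries matters once more task types are added; broad keywords such as `search` may need to move to a later, lower-priority pattern.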

2. Browser Use System

Purpose

Built-in browser panel inside Goose for:

  1. App Preview - See your code rendered live
  2. Web Browsing - AI can browse the web inside Goose
  3. Automation Control - Visual feedback for Playwright actions

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    BROWSER USE SYSTEM                          │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ BUILT-IN BROWSER (Electron BrowserView)                   │ │
│  │                                                           │ │
│  │  ┌───────────────────────────────────────────────────┐   │ │
│  │  │ 🌐 http://localhost:5173     🔄 ◀ ▶ 🔍 🛠️ 📸      │   │ │
│  │  ├───────────────────────────────────────────────────┤   │ │
│  │  │                                                   │   │ │
│  │  │             YOUR APP RENDERS HERE                 │   │ │
│  │  │                                                   │   │ │
│  │  │     [Live preview of HTML/CSS/JS you write]      │   │ │
│  │  │                                                   │   │ │
│  │  │                                                   │   │ │
│  │  └───────────────────────────────────────────────────┘   │ │
│  │                                                           │ │
│  │  Toolbar:                                                 │ │
│  │   🔄 Refresh  ◀ Back  ▶ Forward  🔍 URL Bar              │ │
│  │   🛠️ DevTools  📸 Screenshot  📍 Element Inspect         │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
│  Modes:                                                        │
│   [Preview] - Shows your local app (default)                  │
│   [Browse]  - Navigate to any URL                             │
│   [Automate]- Watch Playwright actions in real-time           │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Features

Preview Mode (Vibe Coding)

User: "Make me a landing page"
  ↓
Goose writes HTML/CSS/JS
  ↓
Auto-starts Vite dev server (port 5173)
  ↓
Built-in browser loads http://localhost:5173
  ↓
User: "Make the header blue"
  ↓
Goose edits CSS → Hot reload → Preview updates instantly

Browse Mode

User: "Go to GitHub and star the browser-use repo"
  ↓
Built-in browser navigates to github.com
  ↓
AI uses Playwright to interact with the page
  ↓
User can SEE every action happening

Automate Mode (Visible Automation)

When Playwright runs:
  1. Built-in browser shows the exact page
  2. Red highlight appears on clicked elements
  3. Green highlight on filled inputs
  4. Status bar shows current action

IPC Handlers (main.cjs)

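// Browser panel control (main process).
// Assumes `mainWindow` is the app's BrowserWindow, created elsewhere in main.cjs.
const { ipcMain, BrowserView } = require('electron');

let browserView = null;

ipcMain.handle('browser:create-view', async (event, options = {}) => {
    browserView = new BrowserView({
        // Keep page content isolated from Node.js and from the app's own context
        webPreferences: { contextIsolation: true, nodeIntegration: false }
    });
    mainWindow.addBrowserView(browserView);
    // options.bounds is an optional { x, y, width, height } rectangle for the panel
    if (options.bounds) browserView.setBounds(options.bounds);
    return { success: true };
});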

ipcMain.handle('browser:navigate', async (event, url) => {
    await browserView.webContents.loadURL(url);
    return { success: true, url };
});

ipcMain.handle('browser:screenshot', async () => {
    const image = await browserView.webContents.capturePage();
    return { success: true, base64: image.toDataURL() };
});

ipcMain.handle('browser:execute-js', async (event, script) => {
    const result = await browserView.webContents.executeJavaScript(script);
    return { success: true, result };
});

ipcMain.handle('browser:get-dom', async () => {
    const dom = await browserView.webContents.executeJavaScript(`
        Array.from(document.querySelectorAll('a, button, input, select, textarea'))
            .map(el => ({
                tag: el.tagName,
                text: el.textContent?.trim().slice(0, 50),
                id: el.id,
                name: el.name,
                type: el.type,
                placeholder: el.placeholder
            }))
    `);
    return { success: true, elements: dom };
});

3. Computer Use System

Purpose

Control the Windows desktop - open apps, click UI elements, type text, take screenshots.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                   COMPUTER USE SYSTEM                          │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  User: "Open Spotify and play some jazz"                      │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ IQ Exchange Translation Layer                             │ │
│  │   → DESKTOP action detected                               │ │
│  │   → Route to Computer Use module                          │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌────────────────────────────────────┬─────────────────────┐ │
│  │ input.ps1 (PowerShell)             │ computer-use.cjs    │ │
│  │ (Current - 66KB!)                  │ (Node.js native)    │ │
│  │                                    │                     │ │
│  │ • UIAutomation via .NET            │ • robotjs for speed │ │
│  │ • OCR via Windows API              │ • screenshot-desktop│ │
│  │ • Window management                │ • Cross-platform    │ │
│  └────────────────────────────────────┴─────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ Windows Desktop                                           │ │
│  │   • Apps, dialogs, menus                                  │ │
│  │   • Task bar, system tray                                 │ │
│  │   • Any visible UI element                                │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Available Commands (input.ps1)

Vision Commands (The Eyes)

# See what's on screen
input.ps1 screenshot "screen.png"      # Capture full screen
input.ps1 ocr "full"                   # Read ALL text on screen
input.ps1 ocr "region" 100 100 500 400 # Read text in region
input.ps1 app_state "Notepad"          # Get UI tree of app (buttons, menus, etc.)

Navigation Commands

# Open and focus apps
input.ps1 open "Spotify"               # Launch or focus app
input.ps1 startmenu                    # Open Windows Start menu
input.ps1 focus "Save As"              # Focus specific dialog/element
input.ps1 waitfor "Ready" 10           # Wait for text to appear (max 10s)

Interaction Commands (Smart - via UIAutomation)

# Click by element name (RELIABLE!)
input.ps1 uiclick "Play"               # Click button named "Play"
input.ps1 uiclick "File"               # Click menu item
input.ps1 uipress "Remember me"        # Toggle checkbox
input.ps1 uipress "Jazz"               # Select list item

# Type text
input.ps1 type "Hello World"           # Type into focused element
input.ps1 keyboard "search query"      # Alternative typing method
input.ps1 key "Enter"                  # Press single key
input.ps1 hotkey "Ctrl+S"              # Key combination

Fallback Commands (Coordinate-based)

# When UIAutomation fails, use coordinates
input.ps1 mouse 500 300                # Move mouse to x,y
input.ps1 click                        # Click at current position
input.ps1 drag 100 100 500 500         # Drag from point to point
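The "UIAutomation first, coordinates second" rule can be sketched as a small wrapper. Everything here is an assumption for illustration: `clickWithFallback` is a hypothetical name, and `run(command, args)` stands for whatever async function actually invokes `input.ps1` and rejects when the command fails.

```javascript
// Hypothetical sketch: prefer UIAutomation, fall back to coordinates.
// `run(command, args)` is an assumed async wrapper around input.ps1 that
// rejects when the command fails.
async function clickWithFallback(run, elementName, fallbackPoint) {
    try {
        await run('uiclick', [elementName]);
        return { method: 'uiclick' };
    } catch (err) {
        if (!fallbackPoint) throw err; // nothing to fall back to
        await run('mouse', [fallbackPoint.x, fallbackPoint.y]);
        await run('click', []);
        return { method: 'coordinates' };
    }
}
```

The fallback point would typically come from OCR or a grounding map, since hard-coded coordinates break as soon as the window moves.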

Example Flow

User: "Open Paint and draw a red circle"

IQ Exchange generates:
1. input.ps1 open "Paint"
2. input.ps1 waitfor "Untitled" 5
3. input.ps1 uiclick "Brushes"
4. input.ps1 uiclick "Red"
5. input.ps1 mouse 400 300
6. input.ps1 drag 400 300 500 400

After each step:
  → Screenshot
  → OCR/app_state to verify
  → If failed → Self-heal and retry
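The per-step loop above (execute, verify, self-heal, retry) can be sketched as follows. The names `runStepWithRetry`, `execute`, `verify`, and `heal`, and the default of three attempts, are assumptions for this sketch; the spec does not define them.

```javascript
// Hypothetical sketch of the per-step loop: execute, verify, retry.
// `execute`, `verify`, and `heal` are assumed async callbacks supplied by the caller.
async function runStepWithRetry(step, { execute, verify, heal, maxAttempts = 3 }) {
    let current = step;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        await execute(current);
        if (await verify(current)) {
            return { success: true, attempts: attempt, step: current };
        }
        // Let the caller rewrite the step before the next attempt (self-heal)
        if (heal && attempt < maxAttempts) current = await heal(current, attempt);
    }
    return { success: false, attempts: maxAttempts, step: current };
}
```

In this arrangement `verify` would wrap the screenshot plus OCR/`app_state` check, and `heal` would ask the model for a corrected command.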

Enhancements Needed

| Current | Enhanced |
|---|---|
| PowerShell scripts (slow startup) | Node.js native with robotjs |
| Basic element finding | Smart element grounding with labels |
| Manual verification | Auto-screenshot after each action |
| English-only OCR | Multi-language OCR support |

4. Full Vision Capabilities

Purpose

Let the AI "see" the screen like a human - understand what's visible, where elements are, and verify actions worked.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    VISION SYSTEM                               │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ SCREENSHOT LAYER                                          │ │
│  │   • Full screen capture                                   │ │
│  │   • Window-specific capture                               │ │
│  │   • Region capture                                        │ │
│  │   • Before/after comparison                               │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ ANALYSIS LAYER                                            │ │
│  │                                                           │ │
│  │  ┌────────────────┬────────────────┬──────────────────┐  │ │
│  │  │ OCR Engine     │ UI Automation  │ AI Vision (LLM)  │  │ │
│  │  │                │                │                  │  │ │
│  │  │ • Read text    │ • Element tree │ • Understand     │  │ │
│  │  │ • Find words   │ • Button names │   context        │  │ │
│  │  │ • Coordinates  │ • Input fields │ • Verify success │  │ │
│  │  │   of text      │ • Menus        │ • Describe scene │  │ │
│  │  └────────────────┴────────────────┴──────────────────┘  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ GROUNDING LAYER (Element Labeling)                        │ │
│  │                                                           │ │
│  │  Screenshot with overlays:                                │ │
│  │  ┌─────────────────────────────────────────┐             │ │
│  │  │   [1] File   [2] Edit   [3] View        │             │ │
│  │  │   ┌─────────────────────────────┐       │             │ │
│  │  │   │ [4] Search...               │       │             │ │
│  │  │   └─────────────────────────────┘       │             │ │
│  │  │                                         │             │ │
│  │  │   [5] Save   [6] Cancel                 │             │ │
│  │  └─────────────────────────────────────────┘             │ │
│  │                                                           │ │
│  │  AI can say: "Click element [5]" → We click Save button  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Vision Flow

Before Action

1. Screenshot current state
2. OCR to find text locations
3. UIAutomation to get element names
4. Create "grounding map" of clickable elements
5. Send to AI: "Here's what I see on screen..."

After Action

1. Screenshot new state
2. Compare with previous
3. OCR to verify expected text appeared
4. Send to AI: "Did the action succeed?"
5. If failed → Generate fix → Retry

Grounding Map Example

{
  "screenshot": "base64://...",
  "elements": [
    { "id": 1, "type": "button", "text": "File", "bounds": [10, 5, 50, 25] },
    { "id": 2, "type": "button", "text": "Edit", "bounds": [60, 5, 100, 25] },
    { "id": 3, "type": "input", "placeholder": "Search...", "bounds": [150, 50, 400, 80] },
    { "id": 4, "type": "button", "text": "Save", "bounds": [300, 500, 380, 530] },
    { "id": 5, "type": "button", "text": "Cancel", "bounds": [400, 500, 480, 530] }
  ],
  "ocr_text": [
    { "text": "Untitled - Notepad", "bounds": [100, 0, 300, 20] },
    { "text": "Hello World", "bounds": [50, 100, 200, 120] }
  ]
}
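To act on a grounding map, an element id has to become a screen coordinate. The sketch below assumes `bounds` means `[left, top, right, bottom]` in screen pixels, which is how the example values read but is not stated explicitly; `clickPointFor` is a hypothetical name.

```javascript
// Assumes bounds are [left, top, right, bottom] in screen pixels, which is how
// the example above reads; the spec does not state the format explicitly.
function clickPointFor(groundingMap, elementId) {
    const element = groundingMap.elements.find((el) => el.id === elementId);
    if (!element) throw new Error(`No element with id ${elementId}`);
    const [left, top, right, bottom] = element.bounds;
    // Click the center so small errors in the bounds still land on the element
    return {
        x: Math.round((left + right) / 2),
        y: Math.round((top + bottom) / 2)
    };
}

const exampleMap = {
    elements: [
        { id: 4, type: 'button', text: 'Save', bounds: [300, 500, 380, 530] }
    ]
};
console.log(clickPointFor(exampleMap, 4)); // { x: 340, y: 515 }
```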

AI Prompt with Vision

You are controlling a Windows computer. Here's what you see:

[SCREENSHOT: base64 image]

INTERACTIVE ELEMENTS:
[1] Button: "File" at (10, 5)
[2] Button: "Edit" at (60, 5)
[3] Input: "Search..." at (150, 50)
[4] Button: "Save" at (300, 500)
[5] Button: "Cancel" at (400, 500)

VISIBLE TEXT:
- "Untitled - Notepad" (title bar)
- "Hello World" (document content)

USER REQUEST: "Save this file as hello.txt"

YOUR ACTION:
- To click an element, say: CLICK [4]
- To type, say: TYPE "hello.txt"
- To press key, say: KEY Enter
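The model's reply then has to be turned back into a structured action. A minimal sketch, assuming the model answers with exactly one of the three formats above on a single line; `parseAction` and the returned object shape are hypothetical.

```javascript
// Hypothetical parser for the three reply formats listed in the prompt.
function parseAction(reply) {
    const text = reply.trim();
    let match;
    if ((match = /^CLICK \[(\d+)\]$/.exec(text))) {
        return { action: 'click', elementId: Number(match[1]) };
    }
    if ((match = /^TYPE "(.*)"$/.exec(text))) {
        return { action: 'type', text: match[1] };
    }
    if ((match = /^KEY (\S+)$/.exec(text))) {
        return { action: 'key', key: match[1] };
    }
    return null; // unrecognized reply: the caller should re-prompt rather than guess
}
```

Returning `null` for anything unexpected keeps free-form model output from being executed as a desktop action.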

5. Vibe Server Management

Purpose

"Vibe Server Management" = Server tasks for people who don't know servers.

No SSH commands to memorize. Just tell Goose what you want in plain English.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                 VIBE SERVER MANAGEMENT                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  User: "My website is down, fix it"                           │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ IQ Exchange Translation Layer                             │ │
│  │   → SERVER action detected                                │ │
│  │   → Route to Server Management module                     │ │
│  └──────────────────────────────────────────────────────────┘ │
│                         ↓                                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │ SERVER DIAGNOSIS FLOW                                     │ │
│  │                                                           │ │
│  │  1. Check if we have SSH credentials                      │ │
│  │     → If not: "I need SSH access. Enter credentials?"     │ │
│  │                                                           │ │
│  │  2. Connect to server                                     │ │
│  │     → ssh root@86.105.224.125                            │ │
│  │                                                           │ │
│  │  3. Diagnose the issue                                    │ │
│  │     → systemctl status nginx                              │ │
│  │     → pm2 status                                          │ │
│  │     → tail -n 50 /var/log/nginx/error.log                │ │
│  │                                                           │ │
│  │  4. Fix the issue                                         │ │
│  │     → systemctl restart nginx                             │ │
│  │     → pm2 restart all                                     │ │
│  │                                                           │ │
│  │  5. Verify fix worked                                     │ │
│  │     → curl http://localhost                               │ │
│  │     → Report back to user                                 │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘
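Steps 3 to 5 of this flow can be sketched as one function. This is an illustration under assumptions, not the module's implementation: `diagnoseAndFix` is a hypothetical name, `exec(command)` stands for any async function that runs a command on the connected server and resolves with its stdout, and only the nginx branch is shown.

```javascript
// Hypothetical sketch of steps 3-5. `exec(command)` is an assumed async function
// that runs one command on the server and resolves with its stdout.
async function diagnoseAndFix(exec) {
    const report = { actions: [] };

    // Step 3: diagnose
    report.nginx = (await exec('systemctl is-active nginx')).trim();
    if (report.nginx !== 'active') {
        report.errorLog = await exec('tail -n 50 /var/log/nginx/error.log');
        // Step 4: fix
        await exec('systemctl restart nginx');
        report.actions.push('restarted nginx');
    }

    // Step 5: verify by asking curl for the HTTP status code only
    const status = (await exec('curl -s -o /dev/null -w "%{http_code}" http://localhost')).trim();
    report.httpStatus = Number(status);
    report.healthy = report.httpStatus >= 200 && report.httpStatus < 400;
    return report;
}
```

The returned report gives the "Report back to user" step something concrete to summarize: what was wrong, what was restarted, and whether the site answers again.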

Noob-Friendly Commands (What User Says → What Goose Does)

| User Says | Goose Does |
|---|---|
| "Is my server running?" | ssh → systemctl status nginx → parse output |
| "Restart my website" | pm2 restart all or systemctl restart nginx |
| "Check the logs" | tail -n 100 /var/log/nginx/error.log |
| "Deploy my latest code" | git pull → npm install → npm run build → pm2 restart |
| "My server is slow" | Check CPU/memory, identify resource hogs |
| "Set up SSL" | certbot --nginx -d domain.com |
| "Add a new website" | Create nginx config, set up PM2 process |
| "Backup my database" | pg_dump or mysqldump |
| "My disk is full" | df -h, find large files, clean up safely |

Server Panel UI

┌────────────────────────────────────────────────────────────────┐
│ 🔗 SERVER MANAGEMENT                                     [X]   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ CONNECTIONS                                    [+ Add New] ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │ ● Production Server (86.105.224.125)           [Connect]  ││
│  │ ○ Staging Server (192.168.1.100)               [Connect]  ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ QUICK ACTIONS                                              ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │  [🔄 Restart Website]  [📊 Check Status]  [📋 View Logs]  ││
│  │  [🚀 Deploy Latest]    [💾 Backup DB]     [🔒 SSL Setup]  ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ LIVE TERMINAL                                              ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │ root@server:~# systemctl status nginx                     ││
│  │ ● nginx.service - A high performance web server           ││
│  │    Loaded: loaded (/lib/systemd/system/nginx.service)     ││
│  │    Active: active (running) since Mon 2025-12-16 10:00    ││
│  │                                                           ││
│  │ $> _                                                      ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
│  ┌───────────────────────────────────────────────────────────┐│
│  │ 💬 ASK GOOSE (natural language)                           ││
│  │ ─────────────────────────────────────────────────────────  ││
│  │ "Why is my website slow?"                    [Ask Goose]  ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                │
└────────────────────────────────────────────────────────────────┘

SSH Service Implementation

// services/SSHService.js
const { Client } = require('ssh2');

class SSHService {
    constructor() {
        this.connections = new Map();
    }

    async connect(name, config) {
        const conn = new Client();
        return new Promise((resolve, reject) => {
            conn.on('ready', () => {
                this.connections.set(name, conn);
                resolve({ success: true });
            });
            conn.on('error', reject);
            conn.connect(config);
        });
    }

    async exec(name, command) {
        const conn = this.connections.get(name);
        if (!conn) throw new Error('Not connected');
        
        return new Promise((resolve, reject) => {
            conn.exec(command, (err, stream) => {
                // Return early: `stream` is undefined when exec itself fails
                if (err) return reject(err);
                let output = '';
                stream.on('data', (data) => output += data);
                stream.on('close', () => resolve(output));
            });
        });
    }

    // High-level helper methods
    async checkWebsiteStatus(name) {
        const nginx = await this.exec(name, 'systemctl is-active nginx');
        const pm2 = await this.exec(name, 'pm2 jlist');
        return { nginx: nginx.trim(), pm2: JSON.parse(pm2) };
    }

    async restartWebsite(name) {
        await this.exec(name, 'pm2 restart all');
        await this.exec(name, 'systemctl reload nginx');
        return { success: true };
    }

    async getLogs(name, lines = 50) {
        return this.exec(name, `tail -n ${lines} /var/log/nginx/error.log`);
    }

    async deploy(name, path = '/var/www/app') {
        await this.exec(name, `cd ${path} && git pull`);
        await this.exec(name, `cd ${path} && npm install`);
        await this.exec(name, `cd ${path} && npm run build`);
        await this.exec(name, 'pm2 restart all');
        return { success: true };
    }
}
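The `deploy` and `getLogs` helpers interpolate values directly into shell commands, so a path containing spaces or shell metacharacters would change what runs on the server. One common mitigation is to quote every interpolated value. The `shellQuote` helper below is a hypothetical addition, and it assumes the remote shell is POSIX-compatible (sh, bash).

```javascript
// Hypothetical helper: quote a value for a POSIX shell (sh, bash).
// Wraps the value in single quotes and rewrites each embedded ' as '\''
function shellQuote(value) {
    return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

const webRoot = '/var/www/my app';
console.log(`cd ${shellQuote(webRoot)} && git pull`);
// cd '/var/www/my app' && git pull
```

With this in place, a call such as `exec(name, 'cd ' + shellQuote(path) + ' && git pull')` treats the whole path as a single argument, whatever characters it contains.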

Saved Server Configurations

// .opencode/servers.json
{
  "production": {
    "name": "Production Server",
    "host": "86.105.224.125",
    "port": 22,
    "username": "root",
    "authType": "password",
    "webRoot": "/var/www/myapp",
    "pm2App": "myapp"
  },
  "staging": {
    "name": "Staging Server",
    "host": "192.168.1.100",
    "port": 22,
    "username": "deploy",
    "authType": "key",
    "keyPath": "~/.ssh/id_rsa",
    "webRoot": "/var/www/staging"
  }
}
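A saved entry still has to be translated into the options object that ssh2's `connect()` expects (`host`, `port`, `username`, and either `password` or `privateKey`). The adapter below is a sketch under assumptions: `toSshConfig` is a hypothetical name, and the password is passed in at call time because the saved configuration shown above stores no secrets.

```javascript
const fs = require('node:fs');
const os = require('node:os');

// Hypothetical adapter from a servers.json entry to ssh2 connect() options.
// `password` is supplied by the caller because the saved config holds no secrets.
function toSshConfig(server, { password, readFile = fs.readFileSync, homeDir = os.homedir() } = {}) {
    const config = {
        host: server.host,
        port: server.port || 22,
        username: server.username
    };
    if (server.authType === 'key') {
        // Expand a leading ~ because Node does not do shell-style expansion
        const keyPath = server.keyPath.replace(/^~(?=$|[\\/])/, () => homeDir);
        config.privateKey = readFile(keyPath);
    } else if (server.authType === 'password') {
        if (!password) throw new Error('Password required for password auth');
        config.password = password;
    } else {
        throw new Error(`Unsupported authType: ${server.authType}`);
    }
    return config;
}
```

The result can be passed as the `config` argument of `SSHService.connect(name, config)`. The `readFile` and `homeDir` parameters exist only so the function can be exercised without touching the real filesystem.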

Summary: All Five Subsystems

| Subsystem | Purpose | Key Tech |
|---|---|---|
| 1. Playwright | Browser automation | Playwright + persistent sessions |
| 2. Browser Use | Built-in browser panel | Electron BrowserView |
| 3. Computer Use | Desktop automation | input.ps1 + UIAutomation |
| 4. Full Vision | See and verify | OCR + Screenshot + Grounding |
| 5. Vibe Server | Noob-friendly server ops | SSH2 + high-level helpers |

All subsystems connect through IQ Exchange which:

  • Detects task type from natural language
  • Routes to appropriate subsystem
  • Executes with self-healing loop
  • Verifies success using vision
  • Retries until success or max attempts

This document provides implementation-ready specifications for each core subsystem.