📦 FEATURE EXTRACTION MAP - Reference Projects

What specific code/patterns to take from each project for Goose Super


1. Windows-Use (CursorTouch)

Repo: https://github.com/CursorTouch/Windows-Use

What They Have

windows_use/
├── agent/          # Agent orchestration loop
├── llms/           # LLM provider abstraction (Ollama, Google, OpenAI)
├── messages/       # Conversation/message handling
├── tool/           # Tool definitions for automation
└── telemetry/      # Usage analytics

What We Take

| Feature | Their File | Our Implementation |
|---|---|---|
| UIAutomation element finder | tool/uia.py | Port to computer-use.cjs |
| Element grounding (labeling) | tool/grounding.py | New services/GroundingService.js |
| Smart click by name | tool/actions.py | Enhance input.ps1 uiclick |
| LLM abstraction layer | llms/*.py | New services/LLMService.js |
| Agent loop pattern | agent/agent.py | Enhance lib/iq-exchange.mjs |

Specific Code to Port

# FROM: windows_use/tool/uia.py - UIAutomation element discovery
def find_element_by_name(name: str, control_type: str = None):
    """Find UI element using Windows Accessibility API"""
    automation = UIAutomation.GetRootElement()
    condition = automation.CreatePropertyCondition(
        UIA.NamePropertyId, name
    )
    if control_type:
        condition = automation.CreateAndCondition(
            condition,
            automation.CreatePropertyCondition(
                UIA.ControlTypePropertyId, control_type
            )
        )
    return automation.FindFirst(TreeScope.Descendants, condition)

Port to Node.js/PowerShell:

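# Enhanced input.ps1 - find_element function
function Find-ElementByName {
    param([string]$Name, [string]$ControlType = $null)
    
    # UIAutomationTypes provides ControlType and TreeScope
    Add-Type -AssemblyName UIAutomationClient, UIAutomationTypes
    $root = [System.Windows.Automation.AutomationElement]::RootElement
    $condition = [System.Windows.Automation.PropertyCondition]::new(
        [System.Windows.Automation.AutomationElement]::NameProperty, 
        $Name
    )
    if ($ControlType) {
        # Map a name such as "Button" to the matching static ControlType field
        $type = [System.Windows.Automation.ControlType]::$ControlType
        $typeCondition = [System.Windows.Automation.PropertyCondition]::new(
            [System.Windows.Automation.AutomationElement]::ControlTypeProperty,
            $type
        )
        $condition = [System.Windows.Automation.AndCondition]::new($condition, $typeCondition)
    }
    return $root.FindFirst([System.Windows.Automation.TreeScope]::Descendants, $condition)
}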

Grounding System (Label Elements on Screenshot)

# FROM: windows_use/tool/grounding.py
def create_grounding_map(screenshot_path):
    """Overlay element labels on screenshot for AI understanding"""
    elements = get_all_interactive_elements()
    labeled_image = draw_labels_on_screenshot(screenshot_path, elements)
    return {
        "image": labeled_image,
        "elements": [
            {"id": i, "name": el.name, "type": el.type, "bounds": el.rect}
            for i, el in enumerate(elements)
        ]
    }
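
A possible JavaScript counterpart for services/GroundingService.js, sketched under assumptions: the element list has already been collected (for example by the UIAutomation finder), and each element carries a name, a type, and bounds as { x, y, w, h }. The names buildGroundingMap and resolveLabel and the field layout are hypothetical. Drawing the labels onto the screenshot is left out; the sketch only assigns label ids and computes where a label and a click would land.

```javascript
// Hypothetical sketch for services/GroundingService.js.
// Assumes each element is { name, type, bounds: { x, y, w, h } } in screen pixels.
function buildGroundingMap(elements) {
    return elements
        // Skip zero-sized elements: they cannot be clicked or labeled.
        .filter((el) => el.bounds && el.bounds.w > 0 && el.bounds.h > 0)
        .map((el, id) => ({
            id,
            name: el.name,
            type: el.type,
            bounds: el.bounds,
            // Label anchor: top-left corner of the element.
            label: { text: String(id), x: el.bounds.x, y: el.bounds.y },
            // Click target: center of the bounding box.
            center: {
                x: Math.round(el.bounds.x + el.bounds.w / 2),
                y: Math.round(el.bounds.y + el.bounds.h / 2)
            }
        }));
}

// Resolve a label id chosen by the model back to a click point.
function resolveLabel(groundingMap, id) {
    const entry = groundingMap.find((e) => e.id === id);
    if (!entry) throw new Error(`Unknown element label: ${id}`);
    return entry.center;
}

const map = buildGroundingMap([
    { name: 'OK', type: 'Button', bounds: { x: 100, y: 200, w: 80, h: 30 } },
    { name: 'hidden', type: 'Pane', bounds: { x: 0, y: 0, w: 0, h: 0 } },
    { name: 'Search', type: 'Edit', bounds: { x: 10, y: 10, w: 300, h: 24 } }
]);
console.log(resolveLabel(map, 1)); // { x: 160, y: 22 }
```

Ids are assigned after filtering, so the numbers shown to the model stay contiguous and every id maps to a clickable element.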

2. Open-Interface (AmberSahdev)

Repo: https://github.com/AmberSahdev/Open-Interface

What They Have

app/
├── core.py         # Main loop: screenshot → LLM → execute → verify
├── interpreter.py  # Parse LLM response into actions
├── llm.py          # LLM communication
├── ui.py           # Tkinter UI (18KB)
└── utils/          # Helpers

What We Take

| Feature | Their File | Our Implementation |
|---|---|---|
| Screenshot → Execute → Verify loop | core.py | Enhance IQ Exchange loop |
| Action interpreter/parser | interpreter.py | Improve extractExecutables() |
| Corner detection (interrupt) | ui.py | Add to Electron app |
| Course-correction logic | core.py | Add to self-heal flow |

Specific Code to Port

# FROM: app/core.py - Main automation loop
class AutomationCore:
    def run_task(self, user_request):
        while not self.task_complete and self.attempts < self.max_attempts:
            # 1. Capture current state
            screenshot = self.capture_screen()
            
            # 2. Send to LLM with context
            response = self.llm.analyze(
                screenshot=screenshot,
                request=user_request,
                previous_actions=self.action_history
            )
            
            # 3. Parse and execute actions
            actions = self.interpreter.parse(response)
            for action in actions:
                result = self.executor.run(action)
                self.action_history.append(result)
            
            # 4. Verify progress
            new_screenshot = self.capture_screen()
            progress = self.llm.verify_progress(screenshot, new_screenshot, user_request)
            
            if progress.is_complete:
                self.task_complete = True
            elif progress.is_stuck:
                self.attempts += 1
                # Self-correction: ask LLM for alternative approach

Port to our IQ Exchange:

// Enhanced lib/iq-exchange.mjs
async process(userRequest) {
    while (!this.taskComplete && this.attempts < this.maxRetries) {
        // 1. Screenshot current state
        const before = await this.captureScreen();
        
        // 2. Get AI instructions
        const response = await this.sendToAI(
            this.buildPromptWithVision(userRequest, before, this.history)
        );
        
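        // 3. Execute and record each result so the next prompt sees what was tried
        const actions = this.extractExecutables(response);
        for (const action of actions) {
            const result = await this.executeAny(action.content);
            this.history.push({ action, result });
        }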
        
        // 4. Verify
        const after = await this.captureScreen();
        const verified = await this.verifyProgress(before, after, userRequest);
        
        if (verified.complete) {
            this.taskComplete = true;
        } else if (verified.stuck) {
            this.attempts++;
            // Request alternative approach
        }
    }
}
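
The loop above depends on extractExecutables() turning the model's reply into actions. A minimal sketch of a stricter parser follows, assuming the model wraps each action in a fenced block tagged with its type. That response format, the ALLOWED_TYPES list, and the { type, content } shape are assumptions for illustration and may not match the existing implementation.

```javascript
// Hypothetical sketch of a stricter extractExecutables().
// Assumes each action arrives in a fenced block tagged with its type
// (for example powershell or bash); the real response format may differ.
const FENCE = '`'.repeat(3); // built at runtime so this example nests safely in docs
const ALLOWED_TYPES = new Set(['powershell', 'bash', 'javascript']);

function extractExecutables(response) {
    const pattern = new RegExp(FENCE + '([\\w-]*)[ \\t]*\\r?\\n([\\s\\S]*?)' + FENCE, 'g');
    const actions = [];
    for (const match of response.matchAll(pattern)) {
        const type = match[1].toLowerCase();
        const content = match[2].trim();
        // Ignore untagged, unknown, or empty blocks instead of executing them.
        if (ALLOWED_TYPES.has(type) && content) {
            actions.push({ type, content });
        }
    }
    return actions;
}

const reply = [
    'Opening Notepad first.',
    FENCE + 'powershell',
    'Start-Process notepad',
    FENCE,
    FENCE + 'text',
    'just an explanation',
    FENCE
].join('\n');
console.log(extractExecutables(reply));
// [ { type: 'powershell', content: 'Start-Process notepad' } ]
```

Using an allow-list means explanatory blocks in the reply are skipped and never reach executeAny().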

Mouse Corner Interrupt

# FROM: app/ui.py - Corner detection to stop automation
def check_corner_interrupt(self):
    """Stop if user drags mouse to corner (safety interrupt)"""
    x, y = pyautogui.position()
    screen_w, screen_h = pyautogui.size()
    
    corners = [
        (0, 0), (screen_w, 0),
        (0, screen_h), (screen_w, screen_h)
    ]
    
    for cx, cy in corners:
        if abs(x - cx) < 50 and abs(y - cy) < 50:
            self.stop_automation()
            return True
    return False
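
A possible port of the corner check for the Electron app, as a sketch. The function names isCornerInterrupt and watchCorners are hypothetical. The cursor position and screen size are passed in so the logic can be tested without Electron; in the main process they could come from Electron's screen.getCursorScreenPoint() and screen.getPrimaryDisplay().size.

```javascript
// Hypothetical port of the corner check for the Electron app.
// cursor is { x, y }; screenSize is { width, height }; both in screen pixels.
function isCornerInterrupt(cursor, screenSize, threshold = 50) {
    const corners = [
        { x: 0, y: 0 },
        { x: screenSize.width, y: 0 },
        { x: 0, y: screenSize.height },
        { x: screenSize.width, y: screenSize.height }
    ];
    return corners.some(
        (c) => Math.abs(cursor.x - c.x) < threshold && Math.abs(cursor.y - c.y) < threshold
    );
}

// Poll on a timer and call onInterrupt once when the cursor enters a corner.
// getCursor and getScreenSize are injected so the watcher does not depend on Electron.
function watchCorners(getCursor, getScreenSize, onInterrupt, intervalMs = 100) {
    const timer = setInterval(() => {
        if (isCornerInterrupt(getCursor(), getScreenSize())) {
            clearInterval(timer);
            onInterrupt();
        }
    }, intervalMs);
    return () => clearInterval(timer); // call the returned function to stop watching
}

const screenSize = { width: 1920, height: 1080 };
console.log(isCornerInterrupt({ x: 10, y: 10 }, screenSize));   // true
console.log(isCornerInterrupt({ x: 960, y: 540 }, screenSize)); // false
```

Polling keeps the check independent of window focus, which matters because the automation itself is moving the mouse in other applications.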

3. Browser-Use

Repo: https://github.com/browser-use/browser-use

What They Have (Most Comprehensive!)

browser_use/
├── agent/          # Agent service and logic
├── browser/        # Playwright wrapper with smart features
├── controller/     # Action controller
├── dom/            # DOM analysis and manipulation
├── code_use/       # Code execution sandbox
├── filesystem/     # File operations
├── tools/          # Custom tool definitions
├── skills/         # Reusable action patterns
├── sandbox/        # Safe execution environment
├── llm/            # LLM integrations
└── mcp/            # Model Context Protocol

What We Take

| Feature | Their File | Our Implementation |
|---|---|---|
| Smart DOM extraction | dom/ | New services/DOMService.js |
| Element selectors by text | browser/element.py | Enhance playwright-bridge.js |
| Multi-tab management | browser/session.py | Add to browser panel |
| Tools decorator pattern | tools/registry.py | New tool registration system |
| Sandbox execution | sandbox/ | Safe code runner |
| Form auto-detection | dom/forms.py | Auto-fill capability |
| Smart waiting | browser/waits.py | Better wait logic |

Specific Code to Port

# FROM: browser_use/dom/extractor.py - Smart DOM extraction
class DOMExtractor:
    def extract_interactive_elements(self, page):
        """Extract all clickable/fillable elements with smart selectors"""
        return page.evaluate('''() => {
            const elements = [];
            const interactive = document.querySelectorAll(
                'a, button, input, select, textarea, [role="button"], [onclick]'
            );
            
            for (let i = 0; i < interactive.length; i++) {
                const el = interactive[i];
                const rect = el.getBoundingClientRect();
                if (rect.width > 0 && rect.height > 0) {
                    elements.push({
                        index: i,
                        tag: el.tagName.toLowerCase(),
                        text: el.textContent?.trim().slice(0, 50) || '',
                        placeholder: el.placeholder || '',
                        id: el.id,
                        name: el.name,
                        type: el.type,
                        selector: generateUniqueSelector(el),
                        rect: { x: rect.x, y: rect.y, w: rect.width, h: rect.height }
                    });
                }
            }
            return elements;
        }''')

Port to our Playwright bridge:

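// Enhanced bin/playwright-bridge.js
async function extractInteractiveElements(page) {
    return await page.evaluate(() => {
        // Helpers must be defined inside evaluate(): the callback runs in the page,
        // where functions from the Node.js scope are not available.
        const generateSelector = (el) => {
            const parts = [];
            for (let node = el; node.parentElement; node = node.parentElement) {
                const index = Array.from(node.parentElement.children).indexOf(node) + 1;
                parts.unshift(`${node.tagName.toLowerCase()}:nth-child(${index})`);
            }
            return ['html', ...parts].join(' > ');
        };

        const elements = [];
        const selectors = 'a, button, input, select, textarea, [role="button"], [onclick]';
        document.querySelectorAll(selectors).forEach((el, i) => {
            const rect = el.getBoundingClientRect();
            if (rect.width > 0 && rect.height > 0) {
                elements.push({
                    index: i,
                    tag: el.tagName.toLowerCase(),
                    text: (el.textContent || '').trim().slice(0, 50),
                    placeholder: el.placeholder || '',
                    ariaLabel: el.getAttribute('aria-label') || '',
                    // Escape ids so characters such as ":" or "." still form a valid selector
                    selector: el.id ? `#${CSS.escape(el.id)}` : generateSelector(el),
                    rect: { x: rect.x, y: rect.y, w: rect.width, h: rect.height }
                });
            }
        });
        return elements;
    });
}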

Custom Tools Pattern

# FROM: browser_use/tools/registry.py
from browser_use import Tools

tools = Tools()

@tools.action(description='Search for something on Google')
async def google_search(query: str):
    await browser.navigate('https://google.com')
    await browser.fill('input[name="q"]', query)
    await browser.press('Enter')
    return await browser.get_content()

Port to our system:

// New: lib/tools-registry.js
class ToolsRegistry {
    constructor() {
        this.tools = new Map();
    }
    
    register(name, description, handler) {
        this.tools.set(name, { description, handler });
    }
    
    async execute(name, params) {
        const tool = this.tools.get(name);
        if (!tool) throw new Error(`Unknown tool: ${name}`);
        return await tool.handler(params);
    }
    
    getSchema() {
        return Array.from(this.tools.entries()).map(([name, tool]) => ({
            name,
            description: tool.description
        }));
    }
}

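// Register tools
// `playwright` is assumed to be the project's Playwright bridge, already in scope
const tools = new ToolsRegistry();

tools.register('google_search', 'Search Google', async ({ query }) => {
    await playwright.navigate('https://google.com');
    await playwright.fill('input[name="q"]', query);
    await playwright.press('Enter');
});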

4. VS Code / Monaco Editor

Repo: https://github.com/microsoft/vscode

What They Have

VS Code is massive! We only need:
├── monaco-editor            # Core editor component (npm package)
├── Language services        # IntelliSense for JS/TS/CSS/HTML
└── TextMate grammars        # Syntax highlighting definitions

What We Take

| Feature | Their Package | Our Implementation |
|---|---|---|
| Monaco Editor | monaco-editor npm | Embed in Electron |
| Language services | Built into Monaco | Enable for JS/TS/CSS/HTML |
| File tree UI | Custom | Build our own simple tree |
| Multi-tab | Custom | Build tabbed interface |

Implementation

// Install (shell): npm install monaco-editor

// Embed in Electron (renderer)
import * as monaco from 'monaco-editor';

// Configure for web languages
monaco.languages.typescript.javascriptDefaults.setDiagnosticsOptions({
    noSemanticValidation: false,
    noSyntaxValidation: false
});

// Create editor
const editor = monaco.editor.create(document.getElementById('editor-container'), {
    value: '// Start coding...',
    language: 'javascript',
    theme: 'vs-dark',
    automaticLayout: true,
    minimap: { enabled: false },
    fontSize: 14,
    lineNumbers: 'on',
    wordWrap: 'on'
});

// Listen for changes
editor.onDidChangeModelContent(() => {
    const code = editor.getValue();
    // Auto-save or trigger preview refresh
});

5. Goose (Block)

Repo: https://github.com/block/goose

What They Have

crates/
├── goose-cli/      # CLI interface
├── goose-server/   # Web server
├── goose/          # Core agent logic
└── mcp-*/          # Model Context Protocol extensions

ui/                 # Web UI (React)
documentation/      # Recipes and guides

What We Take

| Feature | Their Location | Our Implementation |
|---|---|---|
| Recipe system | documentation/recipes/ | Project templates |
| Session management | crates/goose/session.rs | Session persistence |
| Multi-agent support | crates/goose/agent.rs | Future: agent switching |
| MCP protocol | crates/mcp-*/ | Tool discovery |
| Web UI patterns | ui/ | UI component ideas |

Session Pattern

// FROM: crates/goose/session.rs (concept)
struct Session {
    id: String,
    messages: Vec<Message>,
    tools: Vec<Tool>,
    context: Context,
}

impl Session {
    fn save(&self, path: &Path) { /* serialize to disk */ }
    fn load(path: &Path) -> Self { /* deserialize from disk */ }
}

Port to our system:

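// New: services/SessionService.js
const fs = require('fs');

class SessionService {
    constructor(dataDir = '.opencode/sessions') {
        this.dataDir = dataDir;
        // Create the directory up front so save() and list() do not fail on first run
        fs.mkdirSync(this.dataDir, { recursive: true });
    }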
    
    save(session) {
        const path = `${this.dataDir}/${session.id}.json`;
        fs.writeFileSync(path, JSON.stringify({
            id: session.id,
            created: session.created,
            messages: session.messages,
            files: session.files,
            projectPath: session.projectPath
        }, null, 2));
    }
    
    load(sessionId) {
        const path = `${this.dataDir}/${sessionId}.json`;
        return JSON.parse(fs.readFileSync(path, 'utf8'));
    }
    
    list() {
        return fs.readdirSync(this.dataDir)
            .filter(f => f.endsWith('.json'))
            .map(f => this.load(f.replace('.json', '')));
    }
}

6. OpenCode TUI (SST)

Repo: https://github.com/sst/opencode

What They Have

packages/
├── core/           # Core agent logic
├── tui/            # Terminal UI (Ink-based)
├── lsp/            # Language Server Protocol
└── providers/      # LLM providers

What We Take

| Feature | Their Location | Our Implementation |
|---|---|---|
| LSP integration | packages/lsp/ | Code intelligence |
| Beautiful TUI | packages/tui/ | Reference for Ink components |
| Provider abstraction | packages/providers/ | Multi-LLM support |
| Status indicators | packages/tui/components/ | Progress UI |

7. Mini-Agent (MiniMax)

Repo: https://github.com/MiniMax-AI/Mini-Agent

What They Have

mini_agent/
├── agent.py        # Lightweight agent
├── tools.py        # Tool definitions
└── memory.py       # Context/memory management

What We Take

| Feature | Their File | Our Implementation |
|---|---|---|
| Memory management | memory.py | Context window optimization |
| Simple agent loop | agent.py | Reference for clarity |

Memory Pattern

# FROM: mini_agent/memory.py
class Memory:
    def __init__(self, max_tokens=8000):
        self.messages = []
        self.max_tokens = max_tokens
    
    def add(self, message):
        self.messages.append(message)
        self._prune_if_needed()
    
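    def _count_tokens(self):
        # Rough estimate (assumption): about 4 characters per token
        return sum(len(str(m)) for m in self.messages) // 4

    def _prune_if_needed(self):
        """Remove old messages to stay under token limit"""
        while self._count_tokens() > self.max_tokens:
            # Keep system message, remove oldest user/assistant
            if len(self.messages) > 2:
                self.messages.pop(1)
            else:
                break  # nothing left to prune; avoid looping forever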
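
Since the target codebase is JavaScript, the same pruning idea might look like the sketch below. The class name ContextMemory and the { role, content } message shape are hypothetical, and token counts are estimated at roughly 4 characters per token, which is an assumption and not a real tokenizer.

```javascript
// Hypothetical JavaScript counterpart of the Memory class above.
// Token counts are a rough estimate (about 4 characters per token);
// swap in the provider's tokenizer if one is available.
class ContextMemory {
    constructor(maxTokens = 8000) {
        this.maxTokens = maxTokens;
        this.messages = [];
    }

    static estimateTokens(message) {
        return Math.ceil(String(message.content).length / 4);
    }

    countTokens() {
        return this.messages.reduce((sum, m) => sum + ContextMemory.estimateTokens(m), 0);
    }

    add(message) {
        this.messages.push(message);
        this.prune();
    }

    prune() {
        // Keep index 0 (assumed to be the system message) and the newest message;
        // drop the oldest messages in between until the estimate fits.
        while (this.countTokens() > this.maxTokens && this.messages.length > 2) {
            this.messages.splice(1, 1);
        }
    }
}

const memory = new ContextMemory(10);
memory.add({ role: 'system', content: 'You are helpful.' });  // about 4 tokens
memory.add({ role: 'user', content: 'first question?' });     // about 4 tokens
memory.add({ role: 'assistant', content: 'first answer' });   // about 3 tokens
memory.add({ role: 'user', content: 'second question?' });    // about 4 tokens
console.log(memory.messages.map((m) => m.content));
// [ 'You are helpful.', 'second question?' ]
```

The length check sits in the loop condition, so pruning stops once only the system message and the latest message remain, even if those two alone exceed the limit.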

Summary: Code Extraction Checklist

Immediate Priority

  • UIAutomation from Windows-Use → computer-use.cjs
  • DOM extraction from browser-use → playwright-bridge.js
  • Verify loop from Open-Interface → lib/iq-exchange.mjs
  • Monaco Editor from VS Code → New editor panel

Medium Priority

  • Tools registry from browser-use → lib/tools-registry.js
  • Session management from Goose → services/SessionService.js
  • Grounding system from Windows-Use → services/GroundingService.js

Nice to Have

  • LSP from OpenCode → Code intelligence
  • Memory optimization from Mini-Agent → Context management
  • MCP protocol from Goose → Tool discovery

This document maps exactly what code to extract from each project.