Release v1.01 Enhanced: Vi Control, TUI Gen5, Core Stability

2025-12-20 01:12:45 +04:00
parent 2407c42eb9
commit 142aaeee1e
254 changed files with 44888 additions and 31025 deletions
--- a/docs/GOOSE_SUPER_FEATURE_EXTRACTION.md
+++ b/docs/GOOSE_SUPER_FEATURE_EXTRACTION.md
@@ -0,0 +1,559 @@
+# 📦 FEATURE EXTRACTION MAP - Reference Projects
+
+> What specific code/patterns to take from each project for Goose Super
+
+---
+
+## 1. Windows-Use (CursorTouch)
+**Repo:** https://github.com/CursorTouch/Windows-Use
+
+### What They Have
+```
+windows_use/
+├── agent/          # Agent orchestration loop
+├── llms/           # LLM provider abstraction (Ollama, Google, OpenAI)
+├── messages/       # Conversation/message handling
+├── tool/           # Tool definitions for automation
+└── telemetry/      # Usage analytics
+```
+
+### What We Take
+
+| Feature | Their File | Our Implementation |
+|---------|------------|-------------------|
+| **UIAutomation element finder** | `tool/uia.py` | Port to `computer-use.cjs` |
+| **Element grounding (labeling)** | `tool/grounding.py` | New `services/GroundingService.js` |
+| **Smart click by name** | `tool/actions.py` | Enhance `input.ps1 uiclick` |
+| **LLM abstraction layer** | `llms/*.py` | New `services/LLMService.js` |
+| **Agent loop pattern** | `agent/agent.py` | Enhance `lib/iq-exchange.mjs` |
+
+### Specific Code to Port
+
+```python
+# FROM: windows_use/tool/uia.py - UIAutomation element discovery
+def find_element_by_name(name: str, control_type: str = None):
+    """Find UI element using Windows Accessibility API"""
+    automation = UIAutomation.GetRootElement()
+    condition = automation.CreatePropertyCondition(
+        UIA.NamePropertyId, name
+    )
+    if control_type:
+        condition = automation.CreateAndCondition(
+            condition,
+            automation.CreatePropertyCondition(
+                UIA.ControlTypePropertyId, control_type
+            )
+        )
+    return automation.FindFirst(TreeScope.Descendants, condition)
+```
+
+**Port to Node.js/PowerShell:**
+```powershell
+# Enhanced input.ps1 - find_element function
+function Find-ElementByName {
+    param([string]$Name, [string]$ControlType = $null)
+    
+    Add-Type -AssemblyName UIAutomationClient
+    $root = [System.Windows.Automation.AutomationElement]::RootElement
+    $condition = [System.Windows.Automation.PropertyCondition]::new(
+        [System.Windows.Automation.AutomationElement]::NameProperty, 
+        $Name
+    )
+    return $root.FindFirst([System.Windows.Automation.TreeScope]::Descendants, $condition)
+}
+```
+
+### Grounding System (Label Elements on Screenshot)
+
+```python
+# FROM: windows_use/tool/grounding.py
+def create_grounding_map(screenshot_path):
+    """Overlay element labels on screenshot for AI understanding"""
+    elements = get_all_interactive_elements()
+    labeled_image = draw_labels_on_screenshot(screenshot_path, elements)
+    return {
+        "image": labeled_image,
+        "elements": [
+            {"id": i, "name": el.name, "type": el.type, "bounds": el.rect}
+            for i, el in enumerate(elements)
+        ]
+    }
+```
+
+---
+
+## 2. Open-Interface (AmberSahdev)
+**Repo:** https://github.com/AmberSahdev/Open-Interface
+
+### What They Have
+```
+app/
+├── core.py         # Main loop: screenshot → LLM → execute → verify
+├── interpreter.py  # Parse LLM response into actions
+├── llm.py          # LLM communication
+├── ui.py           # Tkinter UI (18KB)
+└── utils/          # Helpers
+```
+
+### What We Take
+
+| Feature | Their File | Our Implementation |
+|---------|------------|-------------------|
+| **Screenshot → Execute → Verify loop** | `core.py` | Enhance IQ Exchange loop |
+| **Action interpreter/parser** | `interpreter.py` | Improve `extractExecutables()` |
+| **Corner detection (interrupt)** | `ui.py` | Add to Electron app |
+| **Course-correction logic** | `core.py` | Add to self-heal flow |
+
+### Specific Code to Port
+
+```python
+# FROM: app/core.py - Main automation loop
+class AutomationCore:
+    def run_task(self, user_request):
+        while not self.task_complete and self.attempts < self.max_attempts:
+            # 1. Capture current state
+            screenshot = self.capture_screen()
+            
+            # 2. Send to LLM with context
+            response = self.llm.analyze(
+                screenshot=screenshot,
+                request=user_request,
+                previous_actions=self.action_history
+            )
+            
+            # 3. Parse and execute actions
+            actions = self.interpreter.parse(response)
+            for action in actions:
+                result = self.executor.run(action)
+                self.action_history.append(result)
+            
+            # 4. Verify progress
+            new_screenshot = self.capture_screen()
+            progress = self.llm.verify_progress(screenshot, new_screenshot, user_request)
+            
+            if progress.is_complete:
+                self.task_complete = True
+            elif progress.is_stuck:
+                self.attempts += 1
+                # Self-correction: ask LLM for alternative approach
+```
+
+**Port to our IQ Exchange:**
+```javascript
+// Enhanced lib/iq-exchange.mjs
+async process(userRequest) {
+    while (!this.taskComplete && this.attempts < this.maxRetries) {
+        // 1. Screenshot current state
+        const before = await this.captureScreen();
+        
+        // 2. Get AI instructions
+        const response = await this.sendToAI(
+            this.buildPromptWithVision(userRequest, before, this.history)
+        );
+        
+        // 3. Execute
+        const actions = this.extractExecutables(response);
+        for (const action of actions) {
+            await this.executeAny(action.content);
+        }
+        
+        // 4. Verify
+        const after = await this.captureScreen();
+        const verified = await this.verifyProgress(before, after, userRequest);
+        
+        if (verified.complete) {
+            this.taskComplete = true;
+        } else if (verified.stuck) {
+            this.attempts++;
+            // Request alternative approach
+        }
+    }
+}
+```
+
+### Mouse Corner Interrupt
+
+```python
+# FROM: app/ui.py - Corner detection to stop automation
+def check_corner_interrupt(self):
+    """Stop if user drags mouse to corner (safety interrupt)"""
+    x, y = pyautogui.position()
+    screen_w, screen_h = pyautogui.size()
+    
+    corners = [
+        (0, 0), (screen_w, 0),
+        (0, screen_h), (screen_w, screen_h)
+    ]
+    
+    for cx, cy in corners:
+        if abs(x - cx) < 50 and abs(y - cy) < 50:
+            self.stop_automation()
+            return True
+    return False
+```
+
+---
+
+## 3. Browser-Use
+**Repo:** https://github.com/browser-use/browser-use
+
+### What They Have (Most Comprehensive!)
+```
+browser_use/
+├── agent/          # Agent service and logic
+├── browser/        # Playwright wrapper with smart features
+├── controller/     # Action controller
+├── dom/            # DOM analysis and manipulation
+├── code_use/       # Code execution sandbox
+├── filesystem/     # File operations
+├── tools/          # Custom tool definitions
+├── skills/         # Reusable action patterns
+├── sandbox/        # Safe execution environment
+├── llm/            # LLM integrations
+└── mcp/            # Model Context Protocol
+```
+
+### What We Take
+
+| Feature | Their File | Our Implementation |
+|---------|------------|-------------------|
+| **Smart DOM extraction** | `dom/` | New `services/DOMService.js` |
+| **Element selectors by text** | `browser/element.py` | Enhance `playwright-bridge.js` |
+| **Multi-tab management** | `browser/session.py` | Add to browser panel |
+| **Tools decorator pattern** | `tools/registry.py` | New tool registration system |
+| **Sandbox execution** | `sandbox/` | Safe code runner |
+| **Form auto-detection** | `dom/forms.py` | Auto-fill capability |
+| **Smart waiting** | `browser/waits.py` | Better wait logic |
+
+### Specific Code to Port
+
+```python
+# FROM: browser_use/dom/extractor.py - Smart DOM extraction
+class DOMExtractor:
+    def extract_interactive_elements(self, page):
+        """Extract all clickable/fillable elements with smart selectors"""
+        return page.evaluate('''() => {
+            const elements = [];
+            const interactive = document.querySelectorAll(
+                'a, button, input, select, textarea, [role="button"], [onclick]'
+            );
+            
+            for (let i = 0; i < interactive.length; i++) {
+                const el = interactive[i];
+                const rect = el.getBoundingClientRect();
+                if (rect.width > 0 && rect.height > 0) {
+                    elements.push({
+                        index: i,
+                        tag: el.tagName.toLowerCase(),
+                        text: el.textContent?.trim().slice(0, 50) || '',
+                        placeholder: el.placeholder || '',
+                        id: el.id,
+                        name: el.name,
+                        type: el.type,
+                        selector: generateUniqueSelector(el),
+                        rect: { x: rect.x, y: rect.y, w: rect.width, h: rect.height }
+                    });
+                }
+            }
+            return elements;
+        }''')
+```
+
+**Port to our Playwright bridge:**
+```javascript
+// Enhanced bin/playwright-bridge.js
+async function extractInteractiveElements(page) {
+    return await page.evaluate(() => {
+        const elements = [];
+        const selectors = 'a, button, input, select, textarea, [role="button"], [onclick]';
+        document.querySelectorAll(selectors).forEach((el, i) => {
+            const rect = el.getBoundingClientRect();
+            if (rect.width > 0 && rect.height > 0) {
+                elements.push({
+                    index: i,
+                    tag: el.tagName.toLowerCase(),
+                    text: (el.textContent || '').trim().slice(0, 50),
+                    placeholder: el.placeholder || '',
+                    ariaLabel: el.getAttribute('aria-label') || '',
+                    selector: el.id ? `#${el.id}` : generateSelector(el),
+                    rect: { x: rect.x, y: rect.y, w: rect.width, h: rect.height }
+                });
+            }
+        });
+        return elements;
+    });
+}
+```
+
+### Custom Tools Pattern
+
+```python
+# FROM: browser_use/tools/registry.py
+from browser_use import Tools
+
+tools = Tools()
+
+@tools.action(description='Search for something on Google')
+async def google_search(query: str):
+    await browser.navigate('https://google.com')
+    await browser.fill('input[name="q"]', query)
+    await browser.press('Enter')
+    return await browser.get_content()
+```
+
+**Port to our system:**
+```javascript
+// New: lib/tools-registry.js
+class ToolsRegistry {
+    constructor() {
+        this.tools = new Map();
+    }
+    
+    register(name, description, handler) {
+        this.tools.set(name, { description, handler });
+    }
+    
+    async execute(name, params) {
+        const tool = this.tools.get(name);
+        if (!tool) throw new Error(`Unknown tool: ${name}`);
+        return await tool.handler(params);
+    }
+    
+    getSchema() {
+        return Array.from(this.tools.entries()).map(([name, tool]) => ({
+            name,
+            description: tool.description
+        }));
+    }
+}
+
+// Register tools
+tools.register('google_search', 'Search Google', async ({ query }) => {
+    await playwright.navigate('https://google.com');
+    await playwright.fill('input[name="q"]', query);
+    await playwright.press('Enter');
+});
+```
+
+---
+
+## 4. VS Code / Monaco Editor
+**Repo:** https://github.com/microsoft/vscode
+
+### What They Have
+```
+VS Code is massive! We only need:
+├── monaco-editor            # Core editor component (npm package)
+├── Language services        # IntelliSense for JS/TS/CSS/HTML
+└── TextMate grammars        # Syntax highlighting definitions
+```
+
+### What We Take
+
+| Feature | Their Package | Our Implementation |
+|---------|---------------|-------------------|
+| **Monaco Editor** | `monaco-editor` npm | Embed in Electron |
+| **Language services** | Built into Monaco | Enable for JS/TS/CSS/HTML |
+| **File tree UI** | Custom | Build our own simple tree |
+| **Multi-tab** | Custom | Build tabbed interface |
+
+### Implementation
+
+```javascript
+// Install
+npm install monaco-editor
+
+// Embed in Electron (renderer)
+import * as monaco from 'monaco-editor';
+
+// Configure for web languages
+monaco.languages.typescript.javascriptDefaults.setDiagnosticsOptions({
+    noSemanticValidation: false,
+    noSyntaxValidation: false
+});
+
+// Create editor
+const editor = monaco.editor.create(document.getElementById('editor-container'), {
+    value: '// Start coding...',
+    language: 'javascript',
+    theme: 'vs-dark',
+    automaticLayout: true,
+    minimap: { enabled: false },
+    fontSize: 14,
+    lineNumbers: 'on',
+    wordWrap: 'on'
+});
+
+// Listen for changes
+editor.onDidChangeModelContent(() => {
+    const code = editor.getValue();
+    // Auto-save or trigger preview refresh
+});
+```
+
+---
+
+## 5. Goose (Block)
+**Repo:** https://github.com/block/goose
+
+### What They Have
+```
+crates/
+├── goose-cli/      # CLI interface
+├── goose-server/   # Web server
+├── goose/          # Core agent logic
+└── mcp-*/          # Model Context Protocol extensions
+
+ui/                 # Web UI (React)
+documentation/      # Recipes and guides
+```
+
+### What We Take
+
+| Feature | Their Location | Our Implementation |
+|---------|----------------|-------------------|
+| **Recipe system** | `documentation/recipes/` | Project templates |
+| **Session management** | `crates/goose/session.rs` | Session persistence |
+| **Multi-agent support** | `crates/goose/agent.rs` | Future: agent switching |
+| **MCP protocol** | `crates/mcp-*/` | Tool discovery |
+| **Web UI patterns** | `ui/` | UI component ideas |
+
+### Session Pattern
+
+```rust
+// FROM: crates/goose/session.rs (concept)
+struct Session {
+    id: String,
+    messages: Vec<Message>,
+    tools: Vec<Tool>,
+    context: Context,
+}
+
+impl Session {
+    fn save(&self, path: &Path) { /* serialize to disk */ }
+    fn load(path: &Path) -> Self { /* deserialize from disk */ }
+}
+```
+
+**Port to our system:**
+```javascript
+// New: services/SessionService.js
+class SessionService {
+    constructor(dataDir = '.opencode/sessions') {
+        this.dataDir = dataDir;
+    }
+    
+    save(session) {
+        const path = `${this.dataDir}/${session.id}.json`;
+        fs.writeFileSync(path, JSON.stringify({
+            id: session.id,
+            created: session.created,
+            messages: session.messages,
+            files: session.files,
+            projectPath: session.projectPath
+        }, null, 2));
+    }
+    
+    load(sessionId) {
+        const path = `${this.dataDir}/${sessionId}.json`;
+        return JSON.parse(fs.readFileSync(path, 'utf8'));
+    }
+    
+    list() {
+        return fs.readdirSync(this.dataDir)
+            .filter(f => f.endsWith('.json'))
+            .map(f => this.load(f.replace('.json', '')));
+    }
+}
+```
+
+---
+
+## 6. OpenCode TUI (SST)
+**Repo:** https://github.com/sst/opencode
+
+### What They Have
+```
+packages/
+├── core/           # Core agent logic
+├── tui/            # Terminal UI (Ink-based)
+├── lsp/            # Language Server Protocol
+└── providers/      # LLM providers
+```
+
+### What We Take
+
+| Feature | Their Location | Our Implementation |
+|---------|----------------|-------------------|
+| **LSP integration** | `packages/lsp/` | Code intelligence |
+| **Beautiful TUI** | `packages/tui/` | Reference for Ink components |
+| **Provider abstraction** | `packages/providers/` | Multi-LLM support |
+| **Status indicators** | `packages/tui/components/` | Progress UI |
+
+---
+
+## 7. Mini-Agent (MiniMax)
+**Repo:** https://github.com/MiniMax-AI/Mini-Agent
+
+### What They Have
+```
+mini_agent/
+├── agent.py        # Lightweight agent
+├── tools.py        # Tool definitions
+└── memory.py       # Context/memory management
+```
+
+### What We Take
+
+| Feature | Their File | Our Implementation |
+|---------|------------|-------------------|
+| **Memory management** | `memory.py` | Context window optimization |
+| **Simple agent loop** | `agent.py` | Reference for clarity |
+
+### Memory Pattern
+
+```python
+# FROM: mini_agent/memory.py
+class Memory:
+    def __init__(self, max_tokens=8000):
+        self.messages = []
+        self.max_tokens = max_tokens
+    
+    def add(self, message):
+        self.messages.append(message)
+        self._prune_if_needed()
+    
+    def _prune_if_needed(self):
+        """Remove old messages to stay under token limit"""
+        while self._count_tokens() > self.max_tokens:
+            # Keep system message, remove oldest user/assistant
+            if len(self.messages) > 2:
+                self.messages.pop(1)
+```
+
+---
+
+## Summary: Code Extraction Checklist
+
+### Immediate Priority
+
+- [ ] **UIAutomation from Windows-Use** → `computer-use.cjs`
+- [ ] **DOM extraction from browser-use** → `playwright-bridge.js`
+- [ ] **Verify loop from Open-Interface** → `lib/iq-exchange.mjs`
+- [ ] **Monaco Editor from VS Code** → New editor panel
+
+### Medium Priority
+
+- [ ] **Tools registry from browser-use** → `lib/tools-registry.js`
+- [ ] **Session management from Goose** → `services/SessionService.js`
+- [ ] **Grounding system from Windows-Use** → `services/GroundingService.js`
+
+### Nice to Have
+
+- [ ] **LSP from OpenCode** → Code intelligence
+- [ ] **Memory optimization from Mini-Agent** → Context management
+- [ ] **MCP protocol from Goose** → Tool discovery
+
+---
+
+*This document maps exactly what code to extract from each project.*