17 KiB
17 KiB
📦 FEATURE EXTRACTION MAP - Reference Projects
What specific code/patterns to take from each project for Goose Super
1. Windows-Use (CursorTouch)
Repo: https://github.com/CursorTouch/Windows-Use
What They Have
windows_use/
├── agent/ # Agent orchestration loop
├── llms/ # LLM provider abstraction (Ollama, Google, OpenAI)
├── messages/ # Conversation/message handling
├── tool/ # Tool definitions for automation
└── telemetry/ # Usage analytics
What We Take
| Feature | Their File | Our Implementation |
|---|---|---|
| UIAutomation element finder | tool/uia.py |
Port to computer-use.cjs |
| Element grounding (labeling) | tool/grounding.py |
New services/GroundingService.js |
| Smart click by name | tool/actions.py |
Enhance input.ps1 uiclick |
| LLM abstraction layer | llms/*.py |
New services/LLMService.js |
| Agent loop pattern | agent/agent.py |
Enhance lib/iq-exchange.mjs |
Specific Code to Port
# FROM: windows_use/tool/uia.py - UIAutomation element discovery
def find_element_by_name(name: str, control_type: str = None):
"""Find UI element using Windows Accessibility API"""
automation = UIAutomation.GetRootElement()
condition = automation.CreatePropertyCondition(
UIA.NamePropertyId, name
)
if control_type:
condition = automation.CreateAndCondition(
condition,
automation.CreatePropertyCondition(
UIA.ControlTypePropertyId, control_type
)
)
return automation.FindFirst(TreeScope.Descendants, condition)
Port to Node.js/PowerShell:
# Enhanced input.ps1 - find_element function
function Find-ElementByName {
param([string]$Name, [string]$ControlType = $null)
Add-Type -AssemblyName UIAutomationClient
$root = [System.Windows.Automation.AutomationElement]::RootElement
$condition = [System.Windows.Automation.PropertyCondition]::new(
[System.Windows.Automation.AutomationElement]::NameProperty,
$Name
)
return $root.FindFirst([System.Windows.Automation.TreeScope]::Descendants, $condition)
}
Grounding System (Label Elements on Screenshot)
# FROM: windows_use/tool/grounding.py
def create_grounding_map(screenshot_path):
"""Overlay element labels on screenshot for AI understanding"""
elements = get_all_interactive_elements()
labeled_image = draw_labels_on_screenshot(screenshot_path, elements)
return {
"image": labeled_image,
"elements": [
{"id": i, "name": el.name, "type": el.type, "bounds": el.rect}
for i, el in enumerate(elements)
]
}
2. Open-Interface (AmberSahdev)
Repo: https://github.com/AmberSahdev/Open-Interface
What They Have
app/
├── core.py # Main loop: screenshot → LLM → execute → verify
├── interpreter.py # Parse LLM response into actions
├── llm.py # LLM communication
├── ui.py # Tkinter UI (18KB)
└── utils/ # Helpers
What We Take
| Feature | Their File | Our Implementation |
|---|---|---|
| Screenshot → Execute → Verify loop | core.py |
Enhance IQ Exchange loop |
| Action interpreter/parser | interpreter.py |
Improve extractExecutables() |
| Corner detection (interrupt) | ui.py |
Add to Electron app |
| Course-correction logic | core.py |
Add to self-heal flow |
Specific Code to Port
# FROM: app/core.py - Main automation loop
class AutomationCore:
def run_task(self, user_request):
while not self.task_complete and self.attempts < self.max_attempts:
# 1. Capture current state
screenshot = self.capture_screen()
# 2. Send to LLM with context
response = self.llm.analyze(
screenshot=screenshot,
request=user_request,
previous_actions=self.action_history
)
# 3. Parse and execute actions
actions = self.interpreter.parse(response)
for action in actions:
result = self.executor.run(action)
self.action_history.append(result)
# 4. Verify progress
new_screenshot = self.capture_screen()
progress = self.llm.verify_progress(screenshot, new_screenshot, user_request)
if progress.is_complete:
self.task_complete = True
elif progress.is_stuck:
self.attempts += 1
# Self-correction: ask LLM for alternative approach
Port to our IQ Exchange:
// Enhanced lib/iq-exchange.mjs
async process(userRequest) {
while (!this.taskComplete && this.attempts < this.maxRetries) {
// 1. Screenshot current state
const before = await this.captureScreen();
// 2. Get AI instructions
const response = await this.sendToAI(
this.buildPromptWithVision(userRequest, before, this.history)
);
// 3. Execute
const actions = this.extractExecutables(response);
for (const action of actions) {
await this.executeAny(action.content);
}
// 4. Verify
const after = await this.captureScreen();
const verified = await this.verifyProgress(before, after, userRequest);
if (verified.complete) {
this.taskComplete = true;
} else if (verified.stuck) {
this.attempts++;
// Request alternative approach
}
}
}
Mouse Corner Interrupt
# FROM: app/ui.py - Corner detection to stop automation
def check_corner_interrupt(self):
"""Stop if user drags mouse to corner (safety interrupt)"""
x, y = pyautogui.position()
screen_w, screen_h = pyautogui.size()
corners = [
(0, 0), (screen_w, 0),
(0, screen_h), (screen_w, screen_h)
]
for cx, cy in corners:
if abs(x - cx) < 50 and abs(y - cy) < 50:
self.stop_automation()
return True
return False
3. Browser-Use
Repo: https://github.com/browser-use/browser-use
What They Have (Most Comprehensive!)
browser_use/
├── agent/ # Agent service and logic
├── browser/ # Playwright wrapper with smart features
├── controller/ # Action controller
├── dom/ # DOM analysis and manipulation
├── code_use/ # Code execution sandbox
├── filesystem/ # File operations
├── tools/ # Custom tool definitions
├── skills/ # Reusable action patterns
├── sandbox/ # Safe execution environment
├── llm/ # LLM integrations
└── mcp/ # Model Context Protocol
What We Take
| Feature | Their File | Our Implementation |
|---|---|---|
| Smart DOM extraction | dom/ |
New services/DOMService.js |
| Element selectors by text | browser/element.py |
Enhance playwright-bridge.js |
| Multi-tab management | browser/session.py |
Add to browser panel |
| Tools decorator pattern | tools/registry.py |
New tool registration system |
| Sandbox execution | sandbox/ |
Safe code runner |
| Form auto-detection | dom/forms.py |
Auto-fill capability |
| Smart waiting | browser/waits.py |
Better wait logic |
Specific Code to Port
# FROM: browser_use/dom/extractor.py - Smart DOM extraction
class DOMExtractor:
def extract_interactive_elements(self, page):
"""Extract all clickable/fillable elements with smart selectors"""
return page.evaluate('''() => {
const elements = [];
const interactive = document.querySelectorAll(
'a, button, input, select, textarea, [role="button"], [onclick]'
);
for (let i = 0; i < interactive.length; i++) {
const el = interactive[i];
const rect = el.getBoundingClientRect();
if (rect.width > 0 && rect.height > 0) {
elements.push({
index: i,
tag: el.tagName.toLowerCase(),
text: el.textContent?.trim().slice(0, 50) || '',
placeholder: el.placeholder || '',
id: el.id,
name: el.name,
type: el.type,
selector: generateUniqueSelector(el),
rect: { x: rect.x, y: rect.y, w: rect.width, h: rect.height }
});
}
}
return elements;
}''')
Port to our Playwright bridge:
// Enhanced bin/playwright-bridge.js
async function extractInteractiveElements(page) {
return await page.evaluate(() => {
const elements = [];
const selectors = 'a, button, input, select, textarea, [role="button"], [onclick]';
document.querySelectorAll(selectors).forEach((el, i) => {
const rect = el.getBoundingClientRect();
if (rect.width > 0 && rect.height > 0) {
elements.push({
index: i,
tag: el.tagName.toLowerCase(),
text: (el.textContent || '').trim().slice(0, 50),
placeholder: el.placeholder || '',
ariaLabel: el.getAttribute('aria-label') || '',
selector: el.id ? `#${el.id}` : generateSelector(el),
rect: { x: rect.x, y: rect.y, w: rect.width, h: rect.height }
});
}
});
return elements;
});
}
Custom Tools Pattern
# FROM: browser_use/tools/registry.py
from browser_use import Tools
tools = Tools()
@tools.action(description='Search for something on Google')
async def google_search(query: str):
await browser.navigate('https://google.com')
await browser.fill('input[name="q"]', query)
await browser.press('Enter')
return await browser.get_content()
Port to our system:
// New: lib/tools-registry.js
class ToolsRegistry {
constructor() {
this.tools = new Map();
}
register(name, description, handler) {
this.tools.set(name, { description, handler });
}
async execute(name, params) {
const tool = this.tools.get(name);
if (!tool) throw new Error(`Unknown tool: ${name}`);
return await tool.handler(params);
}
getSchema() {
return Array.from(this.tools.entries()).map(([name, tool]) => ({
name,
description: tool.description
}));
}
}
// Register tools
tools.register('google_search', 'Search Google', async ({ query }) => {
await playwright.navigate('https://google.com');
await playwright.fill('input[name="q"]', query);
await playwright.press('Enter');
});
4. VS Code / Monaco Editor
Repo: https://github.com/microsoft/vscode
What They Have
VS Code is massive! We only need:
├── monaco-editor # Core editor component (npm package)
├── Language services # IntelliSense for JS/TS/CSS/HTML
└── TextMate grammars # Syntax highlighting definitions
What We Take
| Feature | Their Package | Our Implementation |
|---|---|---|
| Monaco Editor | monaco-editor npm |
Embed in Electron |
| Language services | Built into Monaco | Enable for JS/TS/CSS/HTML |
| File tree UI | Custom | Build our own simple tree |
| Multi-tab | Custom | Build tabbed interface |
Implementation
// Install
npm install monaco-editor
// Embed in Electron (renderer)
import * as monaco from 'monaco-editor';
// Configure for web languages
monaco.languages.typescript.javascriptDefaults.setDiagnosticsOptions({
noSemanticValidation: false,
noSyntaxValidation: false
});
// Create editor
const editor = monaco.editor.create(document.getElementById('editor-container'), {
value: '// Start coding...',
language: 'javascript',
theme: 'vs-dark',
automaticLayout: true,
minimap: { enabled: false },
fontSize: 14,
lineNumbers: 'on',
wordWrap: 'on'
});
// Listen for changes
editor.onDidChangeModelContent(() => {
const code = editor.getValue();
// Auto-save or trigger preview refresh
});
5. Goose (Block)
Repo: https://github.com/block/goose
What They Have
crates/
├── goose-cli/ # CLI interface
├── goose-server/ # Web server
├── goose/ # Core agent logic
└── mcp-*/ # Model Context Protocol extensions
ui/ # Web UI (React)
documentation/ # Recipes and guides
What We Take
| Feature | Their Location | Our Implementation |
|---|---|---|
| Recipe system | documentation/recipes/ |
Project templates |
| Session management | crates/goose/session.rs |
Session persistence |
| Multi-agent support | crates/goose/agent.rs |
Future: agent switching |
| MCP protocol | crates/mcp-*/ |
Tool discovery |
| Web UI patterns | ui/ |
UI component ideas |
Session Pattern
// FROM: crates/goose/session.rs (concept)
struct Session {
id: String,
messages: Vec<Message>,
tools: Vec<Tool>,
context: Context,
}
impl Session {
fn save(&self, path: &Path) { /* serialize to disk */ }
fn load(path: &Path) -> Self { /* deserialize from disk */ }
}
Port to our system:
// New: services/SessionService.js
class SessionService {
constructor(dataDir = '.opencode/sessions') {
this.dataDir = dataDir;
}
save(session) {
const path = `${this.dataDir}/${session.id}.json`;
fs.writeFileSync(path, JSON.stringify({
id: session.id,
created: session.created,
messages: session.messages,
files: session.files,
projectPath: session.projectPath
}, null, 2));
}
load(sessionId) {
const path = `${this.dataDir}/${sessionId}.json`;
return JSON.parse(fs.readFileSync(path, 'utf8'));
}
list() {
return fs.readdirSync(this.dataDir)
.filter(f => f.endsWith('.json'))
.map(f => this.load(f.replace('.json', '')));
}
}
6. OpenCode TUI (SST)
Repo: https://github.com/sst/opencode
What They Have
packages/
├── core/ # Core agent logic
├── tui/ # Terminal UI (Ink-based)
├── lsp/ # Language Server Protocol
└── providers/ # LLM providers
What We Take
| Feature | Their Location | Our Implementation |
|---|---|---|
| LSP integration | packages/lsp/ |
Code intelligence |
| Beautiful TUI | packages/tui/ |
Reference for Ink components |
| Provider abstraction | packages/providers/ |
Multi-LLM support |
| Status indicators | packages/tui/components/ |
Progress UI |
7. Mini-Agent (MiniMax)
Repo: https://github.com/MiniMax-AI/Mini-Agent
What They Have
mini_agent/
├── agent.py # Lightweight agent
├── tools.py # Tool definitions
└── memory.py # Context/memory management
What We Take
| Feature | Their File | Our Implementation |
|---|---|---|
| Memory management | memory.py |
Context window optimization |
| Simple agent loop | agent.py |
Reference for clarity |
Memory Pattern
# FROM: mini_agent/memory.py
class Memory:
def __init__(self, max_tokens=8000):
self.messages = []
self.max_tokens = max_tokens
def add(self, message):
self.messages.append(message)
self._prune_if_needed()
def _prune_if_needed(self):
"""Remove old messages to stay under token limit"""
while self._count_tokens() > self.max_tokens:
# Keep system message, remove oldest user/assistant
if len(self.messages) > 2:
self.messages.pop(1)
Summary: Code Extraction Checklist
Immediate Priority
- UIAutomation from Windows-Use →
computer-use.cjs - DOM extraction from browser-use →
playwright-bridge.js - Verify loop from Open-Interface →
lib/iq-exchange.mjs - Monaco Editor from VS Code → New editor panel
Medium Priority
- Tools registry from browser-use →
lib/tools-registry.js - Session management from Goose →
services/SessionService.js - Grounding system from Windows-Use →
services/GroundingService.js
Nice to Have
- LSP from OpenCode → Code intelligence
- Memory optimization from Mini-Agent → Context management
- MCP protocol from Goose → Tool discovery
This document maps exactly what code to extract from each project.