# IQ Exchange Integration Implementation Plan
## Goal
Fully integrate the "Translation Layer" into IQ Exchange and upgrade the underlying tooling to use robust Windows UI Automation (UIA) hooks. This replaces blind coordinate-based actions with reliable element-based interactions.
## User Review Required
> [!IMPORTANT]
> This integration involves modifying the core `input.ps1` script to use .NET UIA assemblies. This is a significant upgrade that requires PowerShell 5.1+ (standard on Windows 10/11).
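>
> A quick prerequisite check, runnable from any console:
>
> ```powershell
> $PSVersionTable.PSVersion   # Major should be 5 (i.e. 5.1) or higher
> ```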
## Proposed Changes
### Phase 1: Enhanced Tooling (UIA Support)
Upgrade the low-level execution tools to support robust automation.
#### [MODIFY] [bin/input.ps1](<file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/bin/input.ps1>)
- **Add:** .NET System.Windows.Automation assembly loading.
- **Add:** `Find-Element` helper function using `AutomationElement.RootElement.FindFirst`.
- **Add:** `Invoke-Element` for UIA InvokePattern (reliable clicking).
- **Add:** `Get-AppState` to dump window structure for context.
- **Implement:** `uiclick`, `waitfor`, `find`, `app_state` commands; a helper sketch follows this list.
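A minimal sketch of the two core helpers, assuming the `UIAutomationClient`/`UIAutomationTypes` assemblies and name-based lookup (parameter names and the polling interval are placeholders, not a final design):
```powershell
# Load the .NET UIA assemblies (available on PowerShell 5.1+).
Add-Type -AssemblyName UIAutomationClient
Add-Type -AssemblyName UIAutomationTypes

function Find-Element {
    param([string]$Name, [int]$TimeoutSec = 10)
    # Poll the UIA tree until an element with a matching Name appears or the timeout elapses.
    $condition = [Windows.Automation.PropertyCondition]::new(
        [Windows.Automation.AutomationElement]::NameProperty, $Name)
    $deadline = (Get-Date).AddSeconds($TimeoutSec)
    do {
        $element = [Windows.Automation.AutomationElement]::RootElement.FindFirst(
            [Windows.Automation.TreeScope]::Descendants, $condition)
        if ($element) { return $element }
        Start-Sleep -Milliseconds 250
    } while ((Get-Date) -lt $deadline)
    return $null
}

function Invoke-Element {
    param([Windows.Automation.AutomationElement]$Element)
    # InvokePattern clicks the element by identity, not by screen coordinates.
    $pattern = $Element.GetCurrentPattern([Windows.Automation.InvokePattern]::Pattern)
    $pattern.Invoke()
}
```
Presumably `uiclick` then composes `Find-Element` + `Invoke-Element`, and `waitfor` is `Find-Element` driven purely by the timeout.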
#### [MODIFY] [lib/computer-use.mjs](<file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/lib/computer-use.mjs>)
- **Expose:** New UIA commands in the `desktop` object.
- **Add:** `getAppState(app_name)` function (see the invocation sketch below).
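These wrappers would presumably shell out to the upgraded script. An assumed invocation shape, for reference only (argument order is illustrative, not a settled contract):
```powershell
powershell -NoProfile -File bin/input.ps1 uiclick "Save"
powershell -NoProfile -File bin/input.ps1 waitfor "Untitled - Notepad" 10
powershell -NoProfile -File bin/input.ps1 find "Close"
powershell -NoProfile -File bin/input.ps1 app_state "Notepad"
```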
### Phase 2: Translation Layer
Implement the "Brain" that converts natural language to these new robust commands.
#### [MODIFY] [lib/iq-exchange.mjs](<file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/lib/iq-exchange.mjs>)
- **New Method:** `translateRequest(userPrompt, context)`
- **System Prompt:** Specialized prompt that knows the *exact* API of `input.ps1` and Playwright.
- **Output:** Returns a structured list of commands (JSON or a fenced code block); see the sample below.
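For illustration, the JSON variant for the integration-test prompt "Open Notepad and type Hello" might look like this (field names are a sketch, not a committed schema, and `type` assumes the script's existing keystroke command carries over):
```json
[
  { "cmd": "open",    "args": ["notepad"] },
  { "cmd": "waitfor", "args": ["Untitled - Notepad", "10"] },
  { "cmd": "type",    "args": ["Hello"] }
]
```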
### Phase 3: Main Loop Integration
Hook the translation layer into the TUI.
#### [MODIFY] [bin/opencode-ink.mjs](<file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/bin/opencode-ink.mjs>)
- **Update:** `handleExecuteCommands` or the stream handler.
- **Logic:**
  1. Detect "computer use" intent.
  2. Call `iqExchange.translateRequest()`.
  3. Auto-execute the returned robust commands (example sequence below).
  4. Use existing `auto-heal` if they fail.
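As a concrete illustration of steps 3 and 4, the Manual QA scenario below might translate to a sequence like this (element names are guesses until a real `app_state` dump is available):
```
open mspaint
waitfor "Paint" 10
uiclick "Rectangle"
```
If `waitfor` times out, step 4 hands the failing command and its output to the existing auto-heal loop.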
### Phase 3.5: Vision Integration
Ensure the AI "Brain" knows it has eyes.
#### [MODIFY] [lib/iq-exchange.mjs](<file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/lib/iq-exchange.mjs>)
- **Update:** `translateRequest` System Prompt to include:
  - `ocr "region"` -> Read text from screen (Textual Vision).
  - `screenshot "file"` -> Capture visual state.
  - `app_state "App"` -> Dump window structure (Structural Vision).
- **Update:** `buildHealingPrompt` to remind AI of these tools during retries.
## Verification Plan
### Automated Tests
- [x] Verified `ocr` command works (internal logic check)
- [x] Verified `waitfor` command signature matches translation prompt
- [x] Verified the `open` command's error handling covers `stderr`
- [ ] **Integration Test:** Verify `translateRequest` returns valid commands for "Open Notepad and type Hello".
### Manual Verification
- [x] "Open Paint and draw a rectangle" -> Confirmed robust translation plan generation.
- [x] "Check text on screen" -> Confirmed `ocr` command availability.
- [x] "Button list" -> Confirmed `app_state` command availability.
### Manual QA
- **User Scenario:** "Open Paint and draw a rectangle."
- **Success Criteria:**
  - Agent converts intent to `open mspaint`, `waitfor`, `uiclick`.
  - Execution works without "blind" clicking.
  - If Paint fails to open, auto-heal detects the failure and recovers.