# IQ Exchange Integration Implementation Plan

## Goal
Fully integrate the "Translation Layer" into IQ Exchange and upgrade the underlying tooling to use robust Windows UI Automation (UIA) hooks. This replaces blind coordinate-based actions with reliable element-based interactions.
## User Review Required

**Important:** This integration involves modifying the core `input.ps1` script to use .NET UIA assemblies. This is a significant upgrade that requires PowerShell 5.1+ (standard on Windows 10/11).
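As a quick sanity check before review, a minimal sketch of verifying the environment; this check is illustrative only and is not itself part of the planned `input.ps1` changes:

```powershell
# Illustrative environment check; not part of the planned input.ps1 changes.
if ($PSVersionTable.PSVersion -lt [version]'5.1') {
    throw "PowerShell 5.1+ is required for the UIA integration."
}
# Confirms the .NET UIA assemblies are available on this machine.
Add-Type -AssemblyName UIAutomationClient, UIAutomationTypes
```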
## Proposed Changes

### Phase 1: Enhanced Tooling (UIA Support)
Upgrade the low-level execution tools to support robust automation.
[MODIFY] [bin/input.ps1](file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/bin/input.ps1)
- Add: .NET `System.Windows.Automation` assembly loading.
- Add: `Find-Element` helper function using `AutomationElement.RootElement.FindFirst`.
- Add: `Invoke-Element` for UIA InvokePattern (reliable clicking).
- Add: `Get-AppState` to dump window structure for context.
- Implement: `uiclick`, `waitfor`, `find`, `app_state` commands.
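A minimal sketch of what these helpers could look like, assuming the standard `System.Windows.Automation` API. `Find-Element` and `Invoke-Element` follow the names in the plan above; `Wait-ForElement`, the parameters, and the command wiring are assumptions:

```powershell
# Sketch of the proposed UIA helpers; parameters and wiring are assumptions.
Add-Type -AssemblyName UIAutomationClient, UIAutomationTypes

# Find the first element whose Name matches, searching the whole UIA tree.
function Find-Element {
    param(
        [string]$Name,
        [System.Windows.Automation.AutomationElement]$Root = [System.Windows.Automation.AutomationElement]::RootElement
    )
    $nameProp  = [System.Windows.Automation.AutomationElement]::NameProperty
    $condition = New-Object System.Windows.Automation.PropertyCondition($nameProp, $Name)
    return $Root.FindFirst([System.Windows.Automation.TreeScope]::Descendants, $condition)
}

# Click an element via InvokePattern instead of synthesizing mouse input.
function Invoke-Element {
    param([System.Windows.Automation.AutomationElement]$Element)
    $pattern = $Element.GetCurrentPattern([System.Windows.Automation.InvokePattern]::Pattern)
    $pattern.Invoke()
}

# A 'waitfor' command could poll Find-Element until a timeout expires.
function Wait-ForElement {
    param([string]$Name, [int]$TimeoutSec = 10)
    $deadline = (Get-Date).AddSeconds($TimeoutSec)
    while ((Get-Date) -lt $deadline) {
        $element = Find-Element -Name $Name
        if ($null -ne $element) { return $element }
        Start-Sleep -Milliseconds 250
    }
    throw "Timed out waiting for element '$Name'"
}
```

With helpers like these, `uiclick` reduces to `Invoke-Element (Find-Element -Name $target)` and `waitfor` maps onto the polling loop; the actual argument parsing in `input.ps1` is left to the implementation.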
[MODIFY] [lib/computer-use.mjs](file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/lib/computer-use.mjs)
- Expose: New UIA commands in the `desktop` object.
- Add: `getAppState(app_name)` function.
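`getAppState(app_name)` would presumably surface the output of the `Get-AppState` helper from Phase 1. A rough sketch of that structural dump, assuming a `TreeWalker`-based traversal; the depth limit and output format are illustrative only:

```powershell
# Illustrative sketch of the Get-AppState tree dump exposed via getAppState().
Add-Type -AssemblyName UIAutomationClient, UIAutomationTypes

function Get-AppState {
    param(
        [System.Windows.Automation.AutomationElement]$Root,
        [int]$Depth = 0,
        [int]$MaxDepth = 3
    )
    if ($null -eq $Root -or $Depth -gt $MaxDepth) { return }
    $info = $Root.Current
    # One line per element: control type, name, automation id.
    "{0}{1} '{2}' (id={3})" -f ('  ' * $Depth), $info.ControlType.ProgrammaticName, $info.Name, $info.AutomationId
    $walker = [System.Windows.Automation.TreeWalker]::ControlViewWalker
    $child  = $walker.GetFirstChild($Root)
    while ($null -ne $child) {
        Get-AppState -Root $child -Depth ($Depth + 1) -MaxDepth $MaxDepth
        $child = $walker.GetNextSibling($child)
    }
}
# $Root would typically be the top-level window of the target application.
```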
### Phase 2: Translation Layer
Implement the "Brain" that converts natural language to these new robust commands.
[MODIFY] [lib/iq-exchange.mjs](file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/lib/iq-exchange.mjs)
- New Method: `translateRequest(userPrompt, context)`
- System Prompt: Specialized prompt that knows the exact API of `input.ps1` and Playwright.
- Output: Returns a structured list of commands (JSON or code block).
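For illustration only, the structured output for a request like "Open Notepad and click File" might look like the following; the field names and shape are assumptions, not a fixed schema, though the command names match Phase 1:

```json
[
  { "tool": "desktop", "command": "open",    "args": ["notepad"] },
  { "tool": "desktop", "command": "waitfor", "args": ["Untitled - Notepad"] },
  { "tool": "desktop", "command": "uiclick", "args": ["File"] }
]
```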
### Phase 3: Main Loop Integration
Hook the translation layer into the TUI.
[MODIFY] [bin/opencode-ink.mjs](file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/bin/opencode-ink.mjs)
- Update: `handleExecuteCommands` or the stream handler.
- Logic:
  - Detect "computer use" intent.
  - Call `iqExchange.translateRequest()`.
  - Auto-execute the returned robust commands.
  - Use existing `auto-heal` if they fail.
### Phase 3.5: Vision Integration
Ensure the AI "Brain" knows it has eyes.
[MODIFY] [lib/iq-exchange.mjs](file:///e:/TRAE Playground/Test Ideas/OpenQode-v1.01-Preview/lib/iq-exchange.mjs)
- Update: `translateRequest` System Prompt to include:
  - `ocr "region"` -> Read text from screen (Textual Vision).
  - `screenshot "file"` -> Capture visual state.
  - `app_state "App"` -> Structural Vision (Tree Dump).
- Update: `buildHealingPrompt` to remind the AI of these tools during retries.
## Verification Plan

### Automated Tests
- Verified `ocr` command works (internal logic check).
- Verified `waitfor` command signature matches the translation prompt.
- Verified `open` command error handling handles `stderr`.
- Integration Test: Verify `translateRequest` returns valid commands for "Open Notepad and type Hello".
### Manual Verification
- "Open Paint and draw a rectangle" -> Confirmed robust translation plan generation.
- "Check text on screen" -> Confirmed
ocrcommand availability. - "Button list" -> Confirmed
app_statecommand availability.
### Manual QA
- User Scenario: "Open Paint and draw a rectangle."
- Success Criteria:
  - Agent converts intent to `open mspaint`, `waitfor`, `uiclick`.
  - Execution works without "blind" clicking.
  - If Paint fails to open, auto-heal detects and fixes it.