Release v1.01 Enhanced: Vi Control, TUI Gen5, Core Stability

2025-12-20 01:12:45 +04:00
parent 2407c42eb9
commit 142aaeee1e
254 changed files with 44888 additions and 31025 deletions
--- a/Documentation/iq_exchange_improvement_proposal.md
+++ b/Documentation/iq_exchange_improvement_proposal.md
@@ -0,0 +1,101 @@
+# IQ Exchange & Computer Use: Research & Improvement Proposal
+
+## Executive Summary
+The current IQ Exchange implementation in `opencode-ink.mjs` provides a basic retry loop but lacks a robust "Translation Layer" for converting natural language into precise computer actions. Instead, it relies on placeholder logic and simple string matching.
+
+Research into state-of-the-art agents (Windows-Use, browser-use, Open-Interface) reveals that reliable agents use **structured translation layers** that map natural language to specific, hook-based APIs (Playwright, UIA) rather than fragile shell commands or pure vision.
+
+This proposal outlines a plan to upgrade the IQ Exchange with a proper **AI Translation Layer** and a **Robust Execution Loop** inspired by these findings.
+
+---
+
+## 1. Analysis of Current Implementation
+
+### Strengths
+- **Retry Loop:** The `IQExchange` class has a solid retry mechanism bounded by `maxRetries`.
+- **Feedback Loop:** Captures stdout/stderr and feeds it back to the AI for self-healing.
+- **Task Detection:** Simple regex-based detection for browser vs. desktop tasks.
+
+### Weaknesses
+- **Missing Translation Layer:** The `opencode-ink.mjs` file has a placeholder comment `// NEW: Computer Use Translation Layer` but no actual AI call that converts "Open Spotify and play jazz" into specific PowerShell/Playwright commands. It depends on the *main* chat response happening to contain the commands, which is unreliable.
+- **Fragile Command Parsing:** `extractCommands` uses a regex to find \`\`\` code blocks, which is hit-or-miss when the AI wraps its commands in extra prose.
+- **No Structural Enforcement:** Nothing constrains the output to a known set of tools, so the AI is free to hallucinate commands or arguments.
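+
+As a rough illustration of the parsing weakness, the sketch below uses a hypothetical `extractFencedCommands` helper (an assumption for illustration, not the actual `extractCommands` in `opencode-ink.mjs`). It returns commands only when the reply happens to wrap them in a fenced block, and silently returns nothing when the model describes the command inline.
+
+```javascript
+// Hypothetical helper for illustration; NOT the project's real extractCommands.
+const FENCE = "`".repeat(3); // built at runtime so this listing contains no literal fence
+
+function extractFencedCommands(reply) {
+  // Match an opening fence with an optional language tag, then capture up to the closing fence.
+  const pattern = new RegExp(FENCE + "[\\w-]*\\r?\\n([\\s\\S]*?)" + FENCE, "g");
+  const commands = [];
+  for (const match of reply.matchAll(pattern)) {
+    for (const line of match[1].split(/\r?\n/)) {
+      const trimmed = line.trim();
+      if (trimmed) commands.push(trimmed);
+    }
+  }
+  return commands;
+}
+
+// A tidy reply that uses a fenced block.
+const tidy =
+  "Here you go:\n" + FENCE + "powershell\n" +
+  'powershell bin/input.ps1 open "Spotify"\n' + FENCE;
+
+// A chatty reply that mentions the same command inline, without a fence.
+const chatty = 'Just run powershell bin/input.ps1 open "Spotify" and tell me what you see.';
+
+console.log(extractFencedCommands(tidy));   // one command found
+console.log(extractFencedCommands(chatty)); // nothing found, so nothing would run
+```
+
+The second call shows the failure mode: the command is present in the reply, but a fence-only parser never sees it.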
+
+---
+
+## 2. Research Findings & Inspiration
+
+### A. Windows-Use (CursorTouch)
+- **Key Insight:** Uses **native UI Automation (UIA)** hooks instead of just vision.
+- **Relevance:** We should prefer `input.ps1` using UIA (via PowerShell .NET access) over blind mouse coordinates.
+- **Takeaway:** The Translation Layer should map "Click X" to `uiclick "X"` (UIA) rather than `mouse x y`.
+
+### B. browser-use
+- **Key Insight:** **Separation of Concerns**.
+    1. **Perception:** Get DOM/State.
+    2. **Cognition (Planner):** Decide *next action* based on state.
+    3. **Action:** Execute.
+- **Relevance:** Our loop tries to handle perception, planning, and execution in a single prompt.
+- **Takeaway:** We should split translation out into its own stage:
+    1. User Request -> Translator AI (Specialized System Prompt) -> Standardized JSON/Script
+    2. Execution Engine -> Runs Script
+    3. Result -> Feedback
+
+### C. Open-Interface
+- **Key Insight:** **Continuous Course Correction**. Takes screenshots *during* execution to verify state.
+- **Relevance:** Our current loop judges success only by return code (exit code 0/1), not by the resulting UI state.
+- **Takeaway:** We need "Verification Steps" in our commands (e.g., `waitfor "WindowName"`).
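+
+A verification step of this kind is essentially a polling loop with a timeout. The following is a minimal sketch on the Node side; `waitFor`, its option names, and the stubbed `windowIsVisible` check are all hypothetical and stand in for whatever real window lookup `input.ps1` would provide.
+
+```javascript
+// Hypothetical polling helper illustrating a "verification step".
+// `check` is any async function that resolves to true once the expected UI state is present.
+async function waitFor(check, { timeoutMs = 5000, intervalMs = 250 } = {}) {
+  const deadline = Date.now() + timeoutMs;
+  while (true) {
+    if (await check()) return true;           // state reached
+    if (Date.now() >= deadline) return false; // give up; the caller decides how to recover
+    await new Promise((resolve) => setTimeout(resolve, intervalMs));
+  }
+}
+
+// Usage with a stub that reports success on the third poll.
+let polls = 0;
+const windowIsVisible = async () => ++polls >= 3;
+
+waitFor(windowIsVisible, { timeoutMs: 1000, intervalMs: 10 })
+  .then((ready) => console.log(ready ? "window ready" : "timed out"));
+```
+
+Returning `false` rather than throwing keeps the timeout a normal outcome that the feedback loop can report back to the AI.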
+
+---
+
+## 3. Proposed Improvements
+
+### Phase 1: The "Translation Layer" (Immediate Fix)
+Instead of relying on the main chat model to implicitly generate commands, we introduce a **dedicated translation step**.
+
+**Workflow:**
+1. **Detection:** Main Chat detects intent (e.g., "Computer Use").
+2. **Translation:** The system calls a fast, specialized model (or the same model with a focused prompt), passing the *specific schema* of available tools.
+   - **Input:** "Open Spotify and search for Jazz"
+   - **System Prompt:** "You are a Command Translator. Available tools: `open(app)`, `waitfor(text)`, `uiclick(text)`, `type(text)`. Output ONLY the plan."
+   - **Output:**
+     ```powershell
+     powershell bin/input.ps1 open "Spotify"
+     powershell bin/input.ps1 waitfor "Search" 5
+     powershell bin/input.ps1 uiclick "Search"
+     powershell bin/input.ps1 type "Jazz"
+     ```
+3. **Execution:** The existing `IQExchange` loop runs the translated script.
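+
+Because the translator's output format is fixed, it can be checked before anything runs. The sketch below validates each plan line against a small tool table; the `TOOLS` table, its argument counts, and the function names are assumptions inferred from the example output above, not the real `bin/input.ps1` interface.
+
+```javascript
+// Assumed tool table derived from the example output above; the argument
+// counts are guesses and would need to match the real bin/input.ps1.
+const TOOLS = {
+  open: { minArgs: 1, maxArgs: 1 },
+  waitfor: { minArgs: 1, maxArgs: 2 },
+  uiclick: { minArgs: 1, maxArgs: 1 },
+  type: { minArgs: 1, maxArgs: 1 },
+};
+const PREFIX = "powershell bin/input.ps1 ";
+
+// Parse one plan line into { tool, args }, throwing on anything unexpected.
+function parsePlanLine(line) {
+  if (!line.startsWith(PREFIX)) throw new Error("Unexpected command: " + line);
+  // Tokens are either double-quoted strings or runs of non-whitespace.
+  const tokens = line.slice(PREFIX.length).match(/"[^"]*"|\S+/g) || [];
+  const [tool, ...rawArgs] = tokens;
+  if (!Object.prototype.hasOwnProperty.call(TOOLS, tool)) {
+    throw new Error("Unknown tool: " + tool);
+  }
+  const { minArgs, maxArgs } = TOOLS[tool];
+  if (rawArgs.length < minArgs || rawArgs.length > maxArgs) {
+    throw new Error("Wrong argument count for " + tool);
+  }
+  // Strip the surrounding quotes from quoted arguments.
+  return { tool, args: rawArgs.map((arg) => arg.replace(/^"|"$/g, "")) };
+}
+
+// Validate a whole plan, one command per non-empty line.
+function validatePlan(text) {
+  return text.split(/\r?\n/).map((l) => l.trim()).filter(Boolean).map(parsePlanLine);
+}
+
+const plan = validatePlan([
+  'powershell bin/input.ps1 open "Spotify"',
+  'powershell bin/input.ps1 waitfor "Search" 5',
+].join("\n"));
+console.log(plan);
+```
+
+A plan that contains an unknown tool or a command outside `bin/input.ps1` is rejected as a whole, which gives the retry loop a precise error to feed back.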
+
+### Phase 2: Enhanced Tooling (Library Update)
+Update `lib/computer-use.mjs` and `bin/input.ps1` to support **UIA-based robust actions**:
+- `uiclick "Text"`: Finds element by text name via UIA (more robust than coordinates).
+- `waitfor "Text"`: Polling loop to wait for UI state changes.
+- `app_state "App"`: Returns detailed window state/focus.
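+
+On the Node side, these commands could be passed to PowerShell as an argument array rather than a hand-built command string. The sketch below only builds that array; `toInputArgs` and `UIA_COMMANDS` are hypothetical names, and the assumption that `input.ps1` takes the command name followed by positional arguments comes from the Phase 1 example.
+
+```javascript
+// Command names come from the list above; everything else here is an assumption.
+const UIA_COMMANDS = new Set(["uiclick", "waitfor", "app_state"]);
+
+// Build the argument vector for invoking bin/input.ps1 through powershell.
+function toInputArgs(command, ...args) {
+  if (!UIA_COMMANDS.has(command)) throw new Error("Unsupported command: " + command);
+  // -NoProfile and -File are standard powershell.exe switches.
+  return ["-NoProfile", "-File", "bin/input.ps1", command, ...args.map(String)];
+}
+
+console.log(toInputArgs("uiclick", "Search"));
+console.log(toInputArgs("waitfor", "Now Playing", 5));
+```
+
+An array like this can be handed to `child_process.spawn("powershell", argv)`, so text containing spaces, such as "Now Playing", does not need to be quoted by hand.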
+
+### Phase 3: The "Cognitive Loop" (Architecture Shift)
+Move from **"Plan -> Execute All"** to **"Observe -> Plan -> Act -> Observe"**.
+- Instead of generating a full script at start, the agent generates *one step*, executes it, observes the result (screenshot/output), then generates the next step.
+- This handles dynamic popups and loading times much better.
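+
+The shape of such a loop can be sketched as follows. The `observe`, `planNextStep`, and `act` callbacks are hypothetical injection points (screenshot or UIA capture, a model call, and command execution respectively); none of them exist in the current code base.
+
+```javascript
+// Minimal "Observe -> Plan -> Act -> Observe" loop with injected, hypothetical callbacks.
+async function cognitiveLoop({ observe, planNextStep, act, maxSteps = 10 }) {
+  const history = [];
+  for (let step = 0; step < maxSteps; step++) {
+    const observation = await observe();                        // current UI state
+    const action = await planNextStep(observation, history);    // exactly one next action
+    if (action === null) return { done: true, history };        // planner says the goal is reached
+    const result = await act(action);
+    history.push({ action, result });
+  }
+  return { done: false, history }; // step budget exhausted
+}
+
+// Usage with stubs: the "screen" advances each time an action is taken.
+const screens = ["login popup", "home", "search results"];
+let current = 0;
+
+cognitiveLoop({
+  observe: async () => screens[Math.min(current, screens.length - 1)],
+  planNextStep: async (seen) =>
+    seen === "search results" ? null : { tool: "uiclick", target: seen },
+  act: async () => { current++; return "ok"; },
+}).then((outcome) => console.log(outcome.done, outcome.history.length));
+```
+
+The `maxSteps` budget is what bounds the extra token cost of this approach: every iteration implies at least one additional model call.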
+
+---
+
+## 4. Implementation Plan (for Phase 1 & 2)
+
+### Step 1: Implement Dedicated Translation Function
+In `lib/iq-exchange.mjs` or `bin/opencode-ink.mjs`, create `translateToCommands(userRequest, context)`:
+- Uses a strict system prompt defining the *exact* API.
+- Enforces the output format (e.g., JSON or a strict code block).
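+
+A minimal sketch of this function, assuming the JSON output option, might look like the following. The third `callModel` parameter, the prompt wording, and the `{ tool, args }` step shape are assumptions added for illustration; the real function would use whatever model client the TUI already has.
+
+```javascript
+// Assumed tool names, taken from the Phase 1 example; the step shape is hypothetical.
+const ALLOWED_TOOLS = ["open", "waitfor", "uiclick", "type"];
+const SYSTEM_PROMPT =
+  "You are a Command Translator. Available tools: " + ALLOWED_TOOLS.join(", ") +
+  '. Reply with ONLY a JSON array of {"tool": string, "args": string[]} objects.';
+
+// `callModel` is an injected async function that returns the model's raw text reply.
+async function translateToCommands(userRequest, context, callModel) {
+  const raw = await callModel({ system: SYSTEM_PROMPT, user: userRequest, context });
+  let plan;
+  try {
+    plan = JSON.parse(raw);
+  } catch (err) {
+    throw new Error("Translator reply was not valid JSON");
+  }
+  if (!Array.isArray(plan)) throw new Error("Translator reply must be a JSON array");
+  for (const step of plan) {
+    const ok = step && ALLOWED_TOOLS.includes(step.tool) && Array.isArray(step.args);
+    if (!ok) throw new Error("Invalid step: " + JSON.stringify(step));
+  }
+  return plan;
+}
+
+// Usage with a stubbed model reply.
+const fakeModel = async () =>
+  '[{"tool":"open","args":["Spotify"]},{"tool":"uiclick","args":["Search"]}]';
+
+translateToCommands("Open Spotify and search for Jazz", {}, fakeModel)
+  .then((plan) => console.log(plan));
+```
+
+Rejecting the whole reply on the first invalid step keeps hallucinated tools from ever reaching the execution engine.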
+
+### Step 2: Integrate into `handleExecuteCommands`
+- Detect whether the request is "Computer Use".
+- If so, *pause* main chat generation.
+- Call `translateToCommands`.
+- Feed the result into the `auto-heal` loop.
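+
+Put together, the integration could be sketched like this. `runWithAutoHeal`, the detection regex, and the `{ exitCode, stderr }` result shape are assumptions for illustration and do not reflect the actual `handleExecuteCommands` signature.
+
+```javascript
+// Assumed detection pattern; the real task detection may use different rules.
+const COMPUTER_USE_HINT = /\b(open|click|type|launch)\b/i;
+
+// `translate` and `execute` are injected so the loop itself stays testable.
+async function runWithAutoHeal(request, { translate, execute, maxRetries = 3 }) {
+  if (!COMPUTER_USE_HINT.test(request)) return { handled: false };
+  let feedback = "";
+  for (let attempt = 1; attempt <= maxRetries; attempt++) {
+    // On retries, the previous error output is passed back to the translator.
+    const commands = await translate(request, { feedback });
+    const { exitCode, stderr } = await execute(commands);
+    if (exitCode === 0) return { handled: true, ok: true, attempts: attempt };
+    feedback = stderr;
+  }
+  return { handled: true, ok: false, attempts: maxRetries, feedback };
+}
+
+// Usage with stubs: the first run fails, the second succeeds.
+let runs = 0;
+runWithAutoHeal("Open Spotify and search for Jazz", {
+  translate: async (req, ctx) => [ctx.feedback ? "retry plan" : "first plan"],
+  execute: async () =>
+    ++runs === 1 ? { exitCode: 1, stderr: "window not found" } : { exitCode: 0, stderr: "" },
+}).then((outcome) => console.log(outcome));
+```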
+
+### Step 3: Upgrade `input.ps1`
+- Ensure it supports the robust UIA methods discovered in Windows-Use (using .NET `System.Windows.Automation`).
+
+## 5. User Review Required
+- **Decision:** Do we want the full "Cognitive Loop" (slower, more tokens, highly reliable) or the "Batch Script" approach (faster, cheaper, less robust)?
+- **Recommendation:** Start with **Batch Script + Translation Layer** (Phase 1). It fits the current TUI architecture best without a total rewrite.