feat: Integrated Vision & Robust Translation Layer, Secured Repo (removed keys)

This commit is contained in:
Gemini AI
2025-12-15 04:53:51 +04:00
Unverified
parent a8436c91a3
commit 2407c42eb9
38 changed files with 7786 additions and 3776 deletions

View File

@@ -1,19 +1,56 @@
# Final Feature Implementation - Verification
# Walkthrough - IQ Exchange Integration & Fixes
## 1. In-Chat Agent Visuals
- **What**: Distinct visual badges for Agent switches (e.g., `🤖 Security`, `🤖 Planner`) in the chat stream.
- **How**:
- Updated `flattenMessagesToBlocks` to parse `[AGENT: Name]` tags.
- Updated `ViewportMessage` to render a `Box` with `borderStyle: 'round'` and `magenta` color for these tags.
- **Verify**: Run a multi-agent flow (e.g., "Analyze this security...") and observe the chat. You should see purple badges between text blocks.
We have successfully integrated the **IQ Exchange Translation Layer** and **Vision Capabilities** into the OpenQode TUI and resolved critical execution fragility.
## 2. Global Responsive Hardening
- **What**: Prevents text overlap and horizontal scrolling when the terminal is resized.
- **How**:
- Enforced strict `width` propagation from `App` -> `ScrollableChat` -> `ViewportMessage`.
- Applied `width - 12` constraint to all `Markdown` and `CodeCard` components to account for gutters and borders.
- **Verify**: Resize your terminal window while chat is visible. Text should wrap dynamically without breaking the layout.
## 🚀 Key Features Implemented
## 3. Previous Wins (Retained)
- **Fluid Sidebar**: Rolling counters and CPS speedometer.
- **Clean UI**: Minimalist Code Cards.
### 1. The Translation Layer (`lib/iq-exchange.mjs`)
- **New Brain:** `translateRequest(userPrompt)` method acting as a cognitive bridge.
- **Robust Protocol:** Converts natural language (e.g., "Open Spotify") into precise PowerShell/Playwright commands.
- **System Commands:**
- `uiclick`: Reliable UIA-based clicking (no more blind coordinates).
- `waitfor`: Synchronization primitive to prevent racing the UI.
- `app_state`: Structural vision to "see" window contents.
### 2. Vision Integration (User Request)
The AI now has full vision capabilities exposed in its system prompt:
- **`ocr "region"`**: Reads text from the screen using Windows OCR (Windows 10/11 native).
- **`app_state "App"`**: Dumps the UI hierarchy to understand button names and inputs.
- **`screenshot "file"`**: Captures visual state.
### 3. Execution Robustness (Fixes)
- **Resolved "Unknown Error":** Fixed quoting logic in `executeAny` regex to handle arguments with spaces properly (`"mspaint.exe"` was breaking).
- **Better Error Reporting:** `input.ps1` now explicitly writes to Stderr when `Start-Process` fails, giving the AI actionable feedback.
## 🧪 Verification Results
### Static Analysis
| Component | Status | Notes |
|-----------|--------|-------|
| `input.ps1` | ✅ Verified | `ocr` implemented, `open` uses explicit error handling. |
| `iq-exchange.mjs` | ✅ Verified | Translation prompts include vision; regex fixed for quoted args. |
| `opencode-ink.mjs` | ✅ Verified | `handleSubmit` handles translation and errors. |
### Manual Verification Steps
To verify this in the live TUI:
1. **Launch OpenQode:** `npm run tui`
2. **Textual Vision Test:**
- Prompt: "Check the text on my active window using OCR."
- Expected: Agent runs `powershell bin/input.ps1 ocr "full"` and reports the text.
3. **Robust Action Test (Fixed):**
- Prompt: "Open Notepad and type 'Hello World'."
- Expected:
```powershell
powershell bin/input.ps1 open "Notepad"
powershell bin/input.ps1 waitfor "Untitled" 5
powershell bin/input.ps1 type "Hello World"
```
- **Fix Verification:** Should no longer show "Unknown error" or exit code 1.
4. **Structural Vision Test:**
- Prompt: "What buttons are available in the Calculator app?"
- Expected: Agent runs `powershell bin/input.ps1 app_state "Calculator"` and lists the button hierarchy.
## ⚠️ Notes
- **OCR Requirement:** Requires Windows 10 1809+ with a language pack installed (standard on most systems).
- **Permissions:** PowerShell scripts run with `-ExecutionPolicy Bypass`.