sync: PR #28 - tkinter GUI restore, proxy feature toggles, CROF guard fixes, gRPC test skip

feat: add 9 provider presets, latency indicator, fix _start_proxy bug, fix sandbox/approval flags, update Antigravity models
sync: PR #21 - MiMo compat fix, endpoint edit dedup, anti-stall Windows compat, AGENTS.md/CLAUDE.md
2026-05-29 13:14:48 +04:00 · 2026-05-29 09:47:04 +04:00 · 2026-05-27 22:00:12 +04:00 · 2026-05-27 19:19:03 +04:00 · 2026-05-27 19:16:29 +04:00 · 2026-05-27 19:04:58 +04:00
40 changed files with 47602 additions and 676 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -9,3 +9,9 @@ config.toml
 *.swp
 *~
 .DS_Store
+DEBIAN/
+usr/
+oauth-secrets.json
+secrets/
+*.secret
+.env
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,83 @@
+# Project: Codex Launcher — Any AI Provider
+
+## Overview
+
+OpenAI Codex CLI & Desktop launcher that proxies to **any** AI provider.
+Python-only (stdlib), zero pip dependencies. Supports Responses API, Chat Completions,
+Anthropic Messages API, Command Code, and more via a translation proxy.
+
+Maintained by:
+- **roman-ryzenadvanced** — original Linux/GTK development
+- **cobra91** — Windows port (tkinter GUI, MSIX support)
+
+## Architecture
+
+```
+codex-launcher-gui.py  (tkinter on Windows / GTK on Linux)
+  → codex_launcher_lib.py  (shared library: endpoints, config, process mgmt)
+    → translate-proxy.py   (HTTP proxy: Responses API → backend API)
+      → upstream provider (OpenAI, Anthropic, DeepSeek, Antigravity, etc.)
+```
+
+### Key Files
+
+| File | Purpose |
+|------|---------|
+| `src/codex-launcher-gui.py` | Windows tkinter GUI |
+| `src/codex_launcher_lib.py` | Shared library (endpoints, config, process management) |
+| `src/translate-proxy.py` | Translation proxy (core routing, adapters, streaming) |
+| `src/antigravity_grpc/` | gRPC client for Antigravity provider |
+
+### Backend Types
+
+| Type | Wire Protocol | Example |
+|------|--------------|---------|
+| `openai-compat` | Chat Completions | DeepSeek, OpenRouter, Crof.ai |
+| `anthropic` | Anthropic Messages | Anthropic direct, OpenCode Zen |
+| `command-code` | Command Code /alpha/generate | CommandCode API |
+| `gemini-oauth-*` | Google OAuth | Google Antigravity |
+
+## Platform Compatibility
+
+**MUST work on both Linux and Windows.** No exceptions.
+
+### Platform-Specific Patterns
+
+- **Process management**: `os.setsid()` + `os.killpg()` on Linux, `CREATE_NEW_PROCESS_GROUP` on Windows
+- **Process listing**: `pgrep` on Linux, `tasklist` / `wmic` on Windows
+- **Desktop launch**: executable path on Linux, `shell:AppsFolder\{AUMID}` for MSIX on Windows
+- **Signals**: `signal.SIGTERM` on Linux, `taskkill /F` on Windows
+- **Paths**: `~/.local/bin/` on Linux, `%LOCALAPPDATA%\Programs\Codex-Launcher\` on Windows
+- **Config**: `~/.codex/config.toml` (same format on both)
+- **POSIX-only APIs**: `os.getpgid()`, `/proc/{pid}/stat`, `os.setsid()` — always guard with `sys.platform` checks
+
+### Testing Cross-Platform
+
+- Never assume Unix-only APIs exist (`pgrep`, `getpgid`, `SIGTERM`)
+- Use `sys.platform == "win32"` for Windows branches
+- Test proxy startup on both platforms before committing
+- Provider presets (PROVIDER_PRESETS) work identically on both
+
+## Coding Conventions
+
+- Python 3.8+ stdlib only, zero pip dependencies
+- `snake_case` for functions/variables, `UPPER_CASE` for globals
+- Immutable patterns: create new dicts/objects, don't mutate in-place
+- Error handling: catch at boundaries, never silently swallow errors
+- Thread-safe: use `threading.Lock` for shared state, `threading.Semaphore` for concurrency
+
+## Common Pitfalls
+
+- **MSIX exe paths**: `C:\Program Files\WindowsApps\` exes cannot be launched via `subprocess.Popen` — use `shell:AppsFolder` protocol
+- **File locking on Windows**: Python can't overwrite files open in another process
+- **Path separators**: always use `os.path.join()` or `Path` objects, never hardcoded `/`
+- **Signal handling**: Windows doesn't support `SIGUSR1`/`SIGUSR2` — use events or named pipes
+
+## Testing
+
+- **Run before every commit**: `python -m pytest tests/ -v`
+- **All tests must pass** before pushing a PR
+- Test files live in `tests/` directory
+- Tests use `pytest` (not unittest runner)
+- Platform-specific tests must skip gracefully on other OS: `pytest.mark.skipif(sys.platform != "linux", reason="Linux-only")`
+- Never mock filesystem paths with hardcoded separators — use `os.path.join` or `tmp_path`
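
Review note on the `AGENTS.md` hunk above (Platform-Specific Patterns): the process-management and signal bullets name the primitives but not how they fit together. The following is a minimal sketch, not code from this PR. The helper names `spawn_in_own_group` and `stop_process_tree` are hypothetical, and it assumes `start_new_session=True` is an acceptable stand-in for calling `os.setsid()` by hand and that `taskkill /F /T` is the desired Windows teardown.

```python
import os
import signal
import subprocess
import sys


def spawn_in_own_group(cmd):
    """Start cmd in its own process group so it can later be stopped as a unit."""
    if sys.platform == "win32":
        # CREATE_NEW_PROCESS_GROUP is only defined in subprocess on Windows,
        # so it must not be referenced outside this branch.
        return subprocess.Popen(cmd, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
    # start_new_session=True makes the child call setsid() before exec.
    return subprocess.Popen(cmd, start_new_session=True)


def stop_process_tree(proc, timeout=5.0):
    """Terminate proc (and its group or tree) and return its exit code."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    if sys.platform == "win32":
        # /T also terminates child processes, /F forces termination.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(proc.pid)],
            capture_output=True,
            check=False,
        )
    else:
        try:
            os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        except ProcessLookupError:
            pass  # the child exited between poll() and killpg()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort if the polite stop was ignored
        return proc.wait()
```

Keeping both branches behind one pair of functions means callers never touch `os.getpgid()` or `CREATE_NEW_PROCESS_GROUP` directly, which is the guard the POSIX-only bullet asks for.
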
--- a/AI-MONITORING-DESIGN.md
+++ b/AI-MONITORING-DESIGN.md
@@ -0,0 +1,638 @@
+# AI Monitoring — Design Specification
+
+> **Codex Launcher v3.8.0 Feature Design**
+> Self-healing nano agent that monitors proxy health, diagnoses failures, and auto-recovers sessions.
+
+---
+
+## 1. Problem Statement
+
+Over 42 sessions in production, we observed these failure categories:
+
+| # | Failure Category | Count | Example |
+|---|-----------------|-------|---------|
+| F1 | **parsed_tool_calls=0** — model produces unparseable output | 42 | Bare `<explore_agent>`, `<bash>` without cmd, plain English intent |
+| F2 | **Stuck recovery triggered** — Intelligence Routing Layer 3 | 13 | "I need to fetch the README", "let me write the script" |
+| F3 | **Sanitizer flagged suspicious cmd** — cmd still JSON after unwrap | 11 | `{/'cmd/': /'sshpass -p .../'}` — double-escaped quoting |
+| F4 | **Upstream 500** — provider internal error | ~5 | `"An internal error occurred. Please try again later."` |
+| F5 | **Connection timeout** — upstream unreachable | ~3 | `Connection timed out after 15002 milliseconds` |
+| F6 | **Upstream 401/403** — auth failure | ~2 | Wrong API key, expired token, `upgrade_required` |
+| F7 | **Stream crash** — exception mid-stream | ~2 | `BrokenPipeError`, `ConnectionResetError` during SSE |
+| F8 | **Proxy port conflict** — Address already in use | ~1 | Stale process holding port |
+| F9 | **Schema cache corruption** — stale content_type=array | ~1 | `ErrorAnalyzer` learned wrong schema |
+| F10 | **Codex Desktop crash** — SIGKILL at ~27GB | ~1 | Issue #24048 — unbounded tool output memory |
+| F11 | **Codex 300s stall** — turn state machine race | ~1 | Issue #23807 — `stream disconnected` after 300s |
+
+### The Gap
+
+Intelligence Routing (v3.7.0) handles F1/F2/F3 **inside a single request**. But it can't:
+
+- **Detect a dead proxy process** (F7/F8) — the proxy already crashed
+- **Reconnect Codex to a restarted proxy** (F5/F7/F8) — Codex doesn't auto-reconnect
+- **Switch to a backup provider** when the primary is down (F4/F5)
+- **Clear corrupt caches** (F9) — requires out-of-band action
+- **Restart Codex Desktop** after a crash (F10/F11)
+- **Learn from failure patterns** across sessions — each failure is handled independently
+
+### What We Need
+
+A **lightweight watchdog that runs outside the proxy process** (a background thread in the GUI, see section 5.1). It:
+1. Monitors proxy health continuously
+2. Detects failures the proxy can't detect itself
+3. Uses a cheap AI model to diagnose novel failures
+4. Takes corrective action automatically
+5. Learns from past incidents to prevent repeats
+
+---
+
+## 2. Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                        Codex Launcher GUI                            │
+│  ┌──────────┐  ┌──────────────┐  ┌───────────────────────────────┐ │
+│  │  Proxy   │  │   Codex      │  │   AI Monitoring Panel         │ │
+│  │  Manager │  │   Launcher   │  │   ┌─────────────────────┐     │ │
+│  │          │  │              │  │   │ ON/OFF Toggle        │     │ │
+│  └────┬─────┘  └──────┬───────┘  │   │ Provider Selector    │     │ │
+│       │               │          │   │ Model Selector        │     │ │
+│       │               │          │   │ Incident Log          │     │ │
+│       │               │          │   │ [View Diagnostics]    │     │ │
+│       │               │          │   └─────────────────────┘     │ │
+│       │               │          └───────────────────────────────┘ │
+└───────┼───────────────┼────────────────────────────────────────────┘
+        │               │
+        ▼               ▼
+┌───────────────┐  ┌────────────────┐
+│ translate-    │  │  Codex Desktop  │
+│ proxy.py      │  │  / CLI          │
+│ (port 8080)   │  │                 │
+│               │  │                 │
+│ /health ──────┼──┼─► health check  │
+│ /responses ───┼──┼─► main API      │
+└───────────────┘  └────────────────┘
+        ▲
+        │ health probes + log analysis + corrective actions
+        │
+┌───────┴────────────────────────────────────────────────────────────┐
+│                     AI Monitor Watchdog                             │
+│                    (thread in codex-launcher-gui)                   │
+│                                                                     │
+│  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────────┐  │
+│  │  Health Watcher  │  │  Log Analyzer   │  │  AI Diagnostic   │  │
+│  │  (every 5s)      │  │  (continuous)    │  │  Agent (on-call) │  │
+│  │                  │  │                  │  │                  │  │
+│  │  - /health probe │  │  - tail cc-debug │  │  - Classify err  │  │
+│  │  - process alive │  │  - tail proxy.log│  │  - Root cause    │  │
+│  │  - port check    │  │  - pattern match │  │  - Suggest fix   │  │
+│  │  - memory watch  │  │  - incident DB   │  │  - Execute fix   │  │
+│  └────────┬────────┘  └────────┬────────┘  └────────┬─────────┘  │
+│           │                    │                     │             │
+│           └────────────────────┼─────────────────────┘             │
+│                                ▼                                   │
+│                    ┌──────────────────────┐                        │
+│                    │  Incident Store      │                        │
+│                    │  (JSON file)         │                        │
+│                    │  - Known patterns    │                        │
+│                    │  - Past resolutions  │                        │
+│                    │  - Success rates     │                        │
+│                    └──────────────────────┘                        │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 3. Three-Tier Response System
+
+### Tier 1: Fast Path — Rule-Based Auto-Recovery (< 1 second)
+
+Immediate reactions to **known failure patterns**. No AI needed.
+
+```python
+TIER1_RULES = [
+    # (trigger_pattern, action, cooldown)
+    
+    # --- Proxy Health ---
+    ("proxy_health_fail",      "restart_proxy",           30),
+    ("proxy_port_conflict",    "kill_stale + restart",     60),
+    ("proxy_memory_over_1gb",  "restart_proxy",           120),
+    
+    # --- Upstream Errors ---
+    ("upstream_429",           "wait_retry_after",          0),
+    ("upstream_502_503",       "retry_with_backoff",       30),
+    ("upstream_500_repeat_3x", "switch_provider",          60),
+    ("upstream_timeout",       "retry + increase_timeout", 30),
+    ("upstream_401_403",       "alert_user_bad_key",        0),
+    
+    # --- Stream Errors ---
+    ("stream_broken_pipe",     "restart_proxy",            30),
+    ("stream_reset",           "restart_proxy",            30),
+    ("stream_idle_300s",       "restart_proxy",            60),
+    
+    # --- Parser Failures ---
+    ("parsed_tool_calls_0_x3", "clear_schema_cache",      300),
+    ("sanitizer_suspicious_5x","alert_user_model_issue",    0),
+    ("stuck_recovery_x5",      "suggest_switch_model",      0),
+    
+    # --- Codex Process ---
+    ("codex_process_dead",     "alert_user_restart",         0),
+    ("codex_memory_over_4gb",  "alert_user_memory",          0),
+    
+    # --- Cache Corruption ---
+    ("schema_content_type_array", "delete_provider_caps",     0),
+]
+```
+
+### Tier 2: Pattern Matching — Incident Store Lookup (< 100ms)
+
+For failures we've **seen before and resolved**, look up the fix:
+
+```json
+{
+  "incidents": [
+    {
+      "pattern": "cc_stream_ended_empty + explore_agent + no_url",
+      "fix": "synth_explore_from_last_user_urls",
+      "source": "FIX-23",
+      "success_rate": 0.85,
+      "last_seen": "2026-05-22T16:00:00Z",
+      "occurrences": 5
+    },
+    {
+      "pattern": "require_escalation + no_cmd",
+      "fix": "auto_proceed_echo",
+      "source": "FIX-24",
+      "success_rate": 1.0,
+      "last_seen": "2026-05-22T15:30:00Z",
+      "occurrences": 3
+    }
+  ]
+}
+```
+
+### Tier 3: AI Diagnostic — Nano Agent (2-5 seconds)
+
+For **novel failures** that don't match any rule or pattern, invoke a cheap AI model:
+
+```
+Prompt Template (system):
+─────────────────────
+You are a diagnostic agent for a translation proxy that sits between
+OpenAI Codex CLI/Desktop and AI providers (Command Code, OpenAI-compat,
+Anthropic, etc.). You analyze error context and suggest ONE corrective action.
+
+Available actions: restart_proxy, kill_stale_processes, clear_schema_cache,
+switch_provider, increase_timeout, alert_user, ignore, retry_now,
+regenerate_config, cleanup_codex_stale
+
+Respond with ONLY a JSON object: {"action": "...", "reason": "...", "confidence": 0.0-1.0}
+
+Prompt Template (user):
+─────────────────────
+INCIDENT REPORT:
+Time: {timestamp}
+Session: {session_id}
+Proxy health: {alive/dead, port, uptime, memory_mb}
+Upstream: {url, model, last_http_code, last_error}
+Recent errors (last 60s):
+{log_lines}
+Parser state: {parsed_tool_calls, stuck_recovery_count, sanitizer_flags}
+Provider: {backend_type, model}
+History: {last_5_incidents_for_this_pattern}
+
+What corrective action should be taken?
+```
+
+---
+
+## 4. Complete Failure Catalog
+
+### Category A: Proxy-Level Failures (watchdog detects, auto-recovers)
+
+| ID | Failure | Symptoms | Tier 1 Action | Log Signature |
+|----|---------|----------|---------------|---------------|
+| A1 | Proxy process crashed | `/health` returns connection refused | `restart_proxy` | `urllib.error.URLError: [Errno 111] Connection refused` |
+| A2 | Port conflict | `Address already in use` on startup | `kill_stale + restart` | `OSError: [Errno 98] Address already in use` |
+| A3 | Memory leak | Process RSS > 1GB | `restart_proxy` | `/proc/{pid}/status` VmRSS check |
+| A4 | Deadlock | Health check hangs > 15s | `restart_proxy` | health probe timeout |
+| A5 | Unhandled exception | Process exits with non-zero | `restart_proxy` | `SELF-REVIVE CRASH #{n}` |
+| A6 | SSL/TLS error | `CERTIFICATE_VERIFY_FAILED` upstream | `alert_user` | `urllib.error.URLError: certificate verify failed` |
+| A7 | DNS resolution failure | `getaddrinfo failed` | `retry_with_backoff` | `socket.gaierror: Name or service not known` |
+
+### Category B: Upstream Provider Failures (proxy detects, watchdog analyzes)
+
+| ID | Failure | Symptoms | Tier 1 Action | Log Signature |
+|----|---------|----------|---------------|---------------|
+| B1 | Rate limit (429) | Too many requests | `wait_retry_after` | `HTTP 429` + `Retry-After` header |
+| B2 | Server error (5xx) | Provider down | `retry_with_backoff` | `HTTP 500/502/503` |
+| B3 | Auth failure (401/403) | Bad/expired key | `alert_user_bad_key` | `HTTP 401 {"error":"invalid_api_key"}` |
+| B4 | CC upgrade required (403) | Version mismatch | `update_cc_version` | `HTTP 403 upgrade_required` |
+| B5 | Connection timeout | Upstream silent | `retry + increase_timeout` | `urllib.error.URLError: timed out` |
+| B6 | Connection reset | Upstream dropped mid-stream | `restart_proxy` | `ConnectionResetError: Connection reset by peer` |
+| B7 | Broken pipe | Client disconnected | `ignore` | `BrokenPipeError: Broken pipe` |
+| B8 | Upstream 400 bad request | Malformed request | `clear_schema_cache` | `HTTP 400 {"error":"...expected string..."}` |
+| B9 | Provider capacity (503) | Overloaded | `switch_provider` | `HTTP 503` after 3 retries |
+| B10 | Cloudflare block (403/1010) | Bot detection | `check_browser_ua` | `HTTP 403 error 1010` |
+
+### Category C: Parser/Format Failures (Intelligence Routing handles, watchdog tracks)
+
+| ID | Failure | Symptoms | Auto-Fix (IR Layer) | Watchdog Escalation |
+|----|---------|----------|--------------------|--------------------|
+| C1 | Bare `<explore_agent>` | `parsed_tool_calls=0` | Layer 1: URL extraction | If 3x in a row → suggest model switch |
+| C2 | `<require_escalation>` block | Model wants permissions | Layer 2: Auto-proceed | If 5x → suggest different provider |
+| C3 | Unrecognized format | No parser matches | Layer 3: Intent synthesis | If 5x → log for AI diagnosis |
+| C4 | Double-wrapped cmd | `cmd = "{\"cmd\": ...}"` | Sanitizer: unwrap | If cmd still JSON → alert |
+| C5 | Suspicious cmd (JSON) | `cmd starts with {` | Sanitizer: flag | If 3x → clear cache + restart |
+| C6 | Empty cmd | `cmd = ""` or `cmd = "{}"` | Sanitizer: diagnostic echo | If 3x → suggest model switch |
+| C7 | Bare `{` token | Model outputs incomplete JSON | Layer 3: heuristic 5 | If persistent → AI diagnosis |
+| C8 | `<bash>` without cmd | Block has sandbox but no command | Layer 3: heuristic | If 3x → AI diagnosis |
+| C9 | DSML name mismatch | `name="cmd"` vs `name="command"` | DSML parser handles both | Self-test catches regression |
+| C10 | Stuck model loop | Same recovery 5+ times | Layer 3 max 3x then alert | Switch model or provider |
+
+### Category D: Codex Process Failures (watchdog detects, alerts user)
+
+| ID | Failure | Symptoms | Action | Log Signature |
+|----|---------|----------|--------|---------------|
+| D1 | Codex process killed | PID gone from pids.json | `alert_user_restart` | Process not in `/proc/{pid}` |
+| D2 | Codex memory explosion | RSS > 4GB | `alert_user_memory` | `/proc/{pid}/status` check |
+| D3 | Codex 300s stall | `stream disconnected` loop | `restart_proxy` | Codex stderr: `stream disconnected` |
+| D4 | Config corruption | `database disk image is malformed` | `regenerate_config` | Codex stderr: `malformed` |
+| D5 | Session context overflow | `context_length_exceeded` | `alert_user_context` | Codex stderr: `context_length_exceeded` |
+| D6 | WebSocket reconnect loop | `Reconnecting... N/5` | `check_proxy_health` | Codex stderr: `Reconnecting` |
+
+### Category E: Config/State Failures (watchdog detects, auto-fixes)
+
+| ID | Failure | Symptoms | Action | Detection |
+|----|---------|----------|--------|-----------|
+| E1 | Schema cache corruption | `content_type: "array"` in provider-caps.json | `delete_provider_caps` | Read file, check for known-bad values |
+| E2 | Stale PID file | pids.json has dead PIDs | `cleanup_pids` | Check `/proc/{pid}` existence |
+| E3 | Port from old session | config.toml has stale port | `regenerate_config` | Port in config != running port |
+| E4 | OAuth token expired | Google/Gemini token refresh fails | `alert_user_reauth` | Token file `expiry_ts < now` |
+| E5 | BGP all routes down | Every route returned error | `alert_user_no_provider` | All routes in cooldown |
+
+---
+
+## 5. Component Design
+
+### 5.1 Health Watcher Thread
+
+Runs in the GUI process as a background thread. Pings proxy `/health` endpoint every 5 seconds.
+
+```python
+import threading
+import time
+import urllib.request
+
+
+class HealthWatcher(threading.Thread):
+    def __init__(self, proxy_port, on_failure, on_recovery):
+        super().__init__(daemon=True)
+        self.proxy_port = proxy_port
+        self.on_failure = on_failure
+        self.on_recovery = on_recovery
+        self.check_interval = 5  # seconds
+        self.failures = 0
+        self.running = True
+
+    def run(self):
+        while self.running:
+            healthy = self._check_health()
+            if healthy:
+                if self.failures > 0:
+                    self.failures = 0
+                    self.on_recovery()
+            else:
+                self.failures += 1
+                # 3 consecutive failed probes: about 10s when connections are
+                # refused immediately, up to 25s when every probe times out.
+                if self.failures >= 3:
+                    self.on_failure(self.failures)
+            time.sleep(self.check_interval)
+
+    def _check_health(self):
+        url = f"http://localhost:{self.proxy_port}/health"
+        try:
+            # The context manager closes the response so sockets do not leak.
+            with urllib.request.urlopen(url, timeout=5) as resp:
+                return resp.status == 200
+        except Exception:
+            # Any failure to obtain a 200 counts as unhealthy.
+            return False
+```
+
+### 5.2 Log Analyzer Thread
+
+Tails the debug log and extracts failure signals in real-time.
+
+```python
+import threading
+import time
+
+# Substring to look for -> (fault ID from section 4, category).
+# The first matching entry wins, so order matters.
+FAILURE_SIGNALS = {
+    "parsed_tool_calls=0":      ("C1", "parser_empty"),
+    "[STUCK-RECOVERY]":         ("C3", "stuck_recovery"),
+    "suspicious cmd":           ("C4", "sanitizer_flag"),
+    "empty cmd recovered":      ("C6", "empty_cmd"),
+    "HTTP 429":                 ("B1", "rate_limited"),
+    "HTTP 500":                 ("B2", "server_error"),
+    "HTTP 401":                 ("B3", "auth_failure"),
+    "HTTP 403":                 ("B4", "forbidden"),
+    "Connection refused":       ("A1", "proxy_dead"),
+    "Address already in use":   ("A2", "port_conflict"),
+    "Broken pipe":              ("B7", "broken_pipe"),
+    "Connection reset":         ("B6", "connection_reset"),
+    "timed out":                ("B5", "timeout"),
+    "SELF-REVIVE CRASH":        ("A5", "proxy_crash"),
+    "stream error":             ("B6", "stream_error"),
+}
+
+
+class LogAnalyzer(threading.Thread):
+    def __init__(self, log_path, on_signal):
+        super().__init__(daemon=True)
+        self.log_path = log_path
+        self.on_signal = on_signal
+        self.running = True
+
+    def run(self):
+        # errors="replace" keeps the tailer alive if the log contains bad bytes;
+        # the with-block closes the file handle when the thread stops.
+        with open(self.log_path, "r", errors="replace") as fh:
+            fh.seek(0, 2)  # seek to end: only new lines are analyzed
+            while self.running:
+                line = fh.readline()
+                if not line:
+                    time.sleep(0.5)
+                    continue
+                for pattern, (fault_id, category) in FAILURE_SIGNALS.items():
+                    if pattern in line:
+                        self.on_signal(fault_id, category, line.strip())
+                        break
+```
+
+### 5.3 AI Diagnostic Agent
+
+Invoked by the watchdog when a failure doesn't match Tier 1 rules or Tier 2 patterns.
+
+```python
+import json
+import urllib.request
+
+
+class AIDiagnosticAgent:
+    # _extract_pattern, _build_prompt, _parse_response, and IncidentStore
+    # are not shown in this document.
+    def __init__(self, provider_url, model, api_key):
+        self.provider_url = provider_url
+        self.model = model
+        self.api_key = api_key
+        self.system_prompt = DIAGNOSTIC_SYSTEM_PROMPT  # see section 5.5
+        self.incident_store = IncidentStore()
+
+    def diagnose(self, context):
+        """Return (action, source_tier, success_rate_or_None)."""
+        # Tier 2: Check incident store first
+        pattern = self._extract_pattern(context)
+        known_fix = self.incident_store.lookup(pattern)
+        if known_fix and known_fix["success_rate"] > 0.7:
+            return known_fix["fix"], "tier2_pattern", known_fix["success_rate"]
+
+        # Tier 3: Ask AI
+        prompt = self._build_prompt(context)
+        response = self._call_model(prompt)
+        action = self._parse_response(response)
+
+        # Learn from this incident
+        if action:
+            self.incident_store.record(pattern, action)
+
+        return action, "tier3_ai", None
+
+    def _call_model(self, prompt):
+        body = {
+            "model": self.model,
+            "messages": [
+                {"role": "system", "content": self.system_prompt},
+                {"role": "user", "content": prompt}
+            ],
+            "max_tokens": 200,
+            "temperature": 0.1,
+        }
+        req = urllib.request.Request(
+            self.provider_url,
+            data=json.dumps(body).encode(),
+            headers={
+                "Content-Type": "application/json",
+                "Authorization": f"Bearer {self.api_key}",
+            },
+        )
+        # Passing data makes this a POST; the with-block closes the response.
+        with urllib.request.urlopen(req, timeout=15) as resp:
+            payload = json.loads(resp.read())
+        return payload["choices"][0]["message"]["content"]
+```
+
+### 5.4 Incident Store
+
+JSON file that accumulates failure patterns and their resolutions.
+
+```json
+{
+  "version": 1,
+  "incidents": {
+    "parser_empty+explore_agent": {
+      "fault_ids": ["C1"],
+      "fix": "synth_explore_from_urls",
+      "source": "intelligent_routing",
+      "success_count": 8,
+      "fail_count": 1,
+      "last_seen": "2026-05-22T16:00:00Z",
+      "auto_applied": true
+    },
+    "server_error+repeat_3x": {
+      "fault_ids": ["B2"],
+      "fix": "switch_provider",
+      "source": "tier1_rule",
+      "success_count": 2,
+      "fail_count": 0,
+      "last_seen": "2026-05-22T14:00:00Z",
+      "auto_applied": true
+    }
+  },
+  "ai_diagnostic_calls": 0,
+  "tokens_used": 0,
+  "cost_usd": 0.0
+}
+```
+
+### 5.5 Diagnostic Agent System Prompt
+
+```
+You are a diagnostic agent for "Codex Launcher" — a desktop app that runs a local
+translation proxy between OpenAI Codex CLI/Desktop and various AI providers.
+
+## Your Job
+Analyze the incident report and recommend ONE corrective action.
+
+## Available Actions
+- restart_proxy: Kill and restart translate-proxy.py
+- kill_stale_processes: Kill orphaned proxy/codex processes
+- clear_schema_cache: Delete ~/.cache/codex-proxy/provider-caps.json
+- switch_provider: Switch to a different configured endpoint
+- increase_timeout: Increase upstream timeout for slow providers
+- regenerate_config: Regenerate Codex config.toml
+- cleanup_codex_stale: Run cleanup-codex-stale.sh
+- alert_user: Show notification to user (can't auto-fix)
+- ignore: Transient error, no action needed
+- retry_now: Immediate retry without changes
+
+## Decision Rules
+- If upstream returns 401/403 with auth error → alert_user (can't fix bad keys)
+- If proxy process is dead → restart_proxy
+- If same error repeated 5+ times → switch_provider or alert_user
+- If error is about content_type/schema → clear_schema_cache
+- If "Address already in use" → kill_stale_processes then restart_proxy
+- If timeout and upstream is slow → increase_timeout
+- If single transient 429/502/503 → ignore (retry handles it)
+- If "stream disconnected" and proxy is healthy → ignore (Codex retries)
+
+## Response Format
+Reply with ONLY a JSON object:
+{"action": "...", "reason": "...", "confidence": 0.0-1.0}
+
+No explanation, no markdown, no extra text.
+```
+
+---
+
+## 6. GUI Integration
+
+### AI Monitoring Panel (in Settings tab)
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  AI Monitoring                                    [ON]  │
+│                                                          │
+│  ┌─ Diagnostic Agent ─────────────────────────────────┐ │
+│  │ Provider: [OpenCode Zen          ▼]                │ │
+│  │ Model:    [Qwen3-32B              ▼]                │ │
+│  │ API Key:  [sk-•••••••••••••••••••• ]                │ │
+│  │                                                     │ │
+│  │ Cost this month: $0.12 (3 diagnostic calls)         │ │
+│  │ Tokens used: 1,847 input / 423 output               │ │
+│  └─────────────────────────────────────────────────────┘ │
+│                                                          │
+│  ┌─ Incident Log (last 7 days) ──────────────────────┐  │
+│  │ ✅ 16:00 C1 parser_empty → synth_explore (Tier 2) │  │
+│  │ ⚠️ 15:30 B2 server_error → retry (Tier 1)         │  │
+│  │ ✅ 15:00 A1 proxy_dead → restart_proxy (Tier 1)    │  │
+│  │ 🤖 14:30 C3 novel_format → clear_cache (Tier 3)   │  │
+│  │ ...                                               │  │
+│  └────────────────────────────────────────────────────┘  │
+│                                                          │
+│  [View Full Diagnostics]  [Export Incident Report]       │
+└─────────────────────────────────────────────────────────┘
+```
+
+### Config Storage (in endpoints.json)
+
+```json
+{
+  "ai_monitoring": {
+    "enabled": true,
+    "provider_url": "https://opencode.ai/zen/v1/chat/completions",
+    "model": "Qwen/Qwen3-32B",
+    "api_key": "sk-...",
+    "tier1_enabled": true,
+    "tier2_enabled": true,
+    "tier3_enabled": true,
+    "auto_restart_proxy": true,
+    "auto_switch_provider": false,
+    "health_check_interval_s": 5,
+    "max_memory_mb": 1024,
+    "notification_level": "important_only"
+  }
+}
+```
+
+### Recommended Models (by cost)
+
+| Model | Cost/Diagnosis | Latency | Quality | Recommended For |
+|-------|---------------|---------|---------|----------------|
+| **Qwen3-32B** (OpenCode) | ~$0.0005 | 2-4s | Good | Default — cheapest decent model |
+| **DeepSeek V4 Flash** | ~$0.0003 | 2-3s | Good | Cheapest option |
+| **GPT-4o-mini** | ~$0.001 | 1-2s | Excellent | Best quality/latency |
+| **Gemini 2.0 Flash** | ~$0.0002 | 1-2s | Good | Cheapest + fastest |
+| **Claude Haiku 4.5** | ~$0.001 | 2-3s | Excellent | Best reasoning quality |
+| **Local Ollama** (if running) | $0 | 5-15s | Varies | Zero-cost offline option |
+
+### Cost Estimate
+
+- Average diagnostic prompt: ~800 tokens input, ~100 tokens output
+- Expected frequency: ~1-5 incidents per day that reach Tier 3
+- **Monthly cost**: about $0.01 - $0.15 at the per-diagnosis prices above (30-150 Tier 3 calls per month), depending on model and usage
+
+---
+
+## 7. Watchdog Response Flow
+
+```
+Failure Detected
+      │
+      ▼
+┌─────────────┐    YES    ┌──────────────────┐
+│ Tier 1 Rule? ├─────────►│ Execute Action    │
+│ (known)      │           │ Log incident      │
+└──────┬───────┘           └──────────────────┘
+       │ NO
+       ▼
+┌─────────────┐    YES    ┌──────────────────┐
+│ Tier 2 Match?├─────────►│ Apply Known Fix   │
+│ (incident DB)│           │ Update success    │
+└──────┬───────┘           └──────────────────┘
+       │ NO
+       ▼
+┌─────────────┐   YES     ┌──────────────────┐
+│ AI Enabled?  ├─────────►│ Collect Context   │
+│ (Tier 3)     │           │ Build Prompt      │
+└──────┬───────┘           │ Call AI Model     │
+       │ NO                │ Parse Response    │
+       ▼                   │ Execute if auto   │
+┌─────────────┐           │ Store incident    │
+│ Alert User   │           └──────────────────┘
+│ (can't fix)  │
+└─────────────┘
+```
+
+---
+
+## 8. Safety Guards
+
+1. **Rate limit AI calls** — max 1 Tier 3 call per 60 seconds, max 10 per day
+2. **Never auto-execute destructive actions** — `alert_user` for: delete files, change API keys, modify source code
+3. **Auto-restart cap** — max 5 proxy restarts per 10 minutes, then alert user
+4. **Cost cap** — monthly AI diagnostic budget (configurable, default $2/month)
+5. **Cooldown per pattern** — same failure pattern has escalating cooldown (30s → 60s → 300s → alert)
+6. **User override** — any auto-action can be cancelled within 3 seconds via GUI
+7. **Incident store max size** — 500 entries, LRU eviction
+8. **Health check bypass** — if user manually stopped proxy, don't alert
+
+---
+
+## 9. Implementation Plan
+
+### Phase 1: Core Watchdog (v3.8.0)
+- `HealthWatcher` thread in `codex-launcher-gui`
+- `LogAnalyzer` thread tailing `cc-debug.log` and `proxy.log`
+- Tier 1 rule engine covering every rule in section 3 (17 in the current `TIER1_RULES` table)
+- Incident store (JSON file)
+- GUI toggle (ON/OFF) in settings
+- Auto-restart proxy on crash
+
+### Phase 2: Pattern Learning (v3.8.1)
+- Tier 2 incident store lookup
+- Auto-learn from Intelligence Routing outcomes
+- Success rate tracking per pattern
+- Incident log viewer in GUI
+
+### Phase 3: AI Diagnostic Agent (v3.9.0)
+- Tier 3 AI model integration
+- Provider/model selector in GUI
+- Diagnostic prompt template
+- Cost tracking
+- Full incident report export
+
+### Phase 4: Advanced Recovery (v4.0.0)
+- Auto-switch to backup provider on repeated failure
+- BGP route health monitoring
+- Predictive failure detection (memory growth, latency trends)
+- Codex process memory monitoring
+- WebSocket reconnect assistance
+
+---
+
+## 10. File Changes Summary
+
+| File | Changes |
+|------|---------|
+| `codex-launcher-gui` | +HealthWatcher thread, +LogAnalyzer thread, +AI Monitoring panel, +incident log viewer |
+| `translate-proxy.py` | +`/monitoring` endpoint (returns health + metrics), enhanced `/health` with memory/uptime |
+| `~/.cache/codex-proxy/incident-store.json` | New file — incident pattern database |
+| `~/.cache/codex-proxy/monitoring.log` | New file — watchdog activity log |
+| `~/.codex/endpoints.json` | +`ai_monitoring` config section |
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,709 @@
 # Changelog

+## v3.13.6 (2026-05-27)
+
+**Anti-Loop Resilience, Auto Token Refresh, Budget Cap, MSIX Support**
+
+### New Features
+- **Cross-session loop tracker**: Keys by user request hash — detects loops even when client creates new sessions per retry. Resets counter on new tasks.
+- **Tool-call budget**: 150 calls max per task, warning at 80. Injects directive to stop reading and write, instead of killing the session.
+- **File-path read-loop detection**: Same file read 5+ times or 30+ total file reads triggers force-finalize
+- **Auto 401 token refresh**: On 401 transient, force-refreshes Google OAuth token and retries once (both v2 + OA compat handlers)
+- **Model-aware idle timeout**: Flash/mini/haiku models get 120s timeout instead of 300s
+- **Smart compaction summary**: Directive text when read-loop detected in compacted history
+- **`_send_ag_finalize()` helper**: Returns synthetic response for hard terminations
+- **Default provider policy**: Unrecognized providers get balanced compaction (128K context, 60 items)
+- **Anti-stall self-kill fix**: No longer kills own parent process or process group
+- **Codex Desktop Updater**: Check/install/rollback/service management + manual rebuild from source
+- **E2E test suite**: `bash test-antigravity.sh --task` for real CLI task testing
+
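The tool-call budget and read-loop thresholds above can be sketched as a small per-task tracker. This is an illustrative sketch only: `TaskBudget`, `budget_for`, and the returned action strings are hypothetical names, and using SHA-256 for the request hash is an assumption (the changelog only says the tracker keys by a user request hash). The thresholds (150 calls, warning at 80, 5 reads of one file, 30 reads total) are the ones listed.

```python
import hashlib
from collections import Counter


class TaskBudget:
    """Per-task counters for tool calls and file reads (hypothetical helper)."""

    def __init__(self, max_calls=150, warn_at=80, same_file_limit=5, total_read_limit=30):
        self.max_calls = max_calls
        self.warn_at = warn_at
        self.same_file_limit = same_file_limit
        self.total_read_limit = total_read_limit
        self.calls = 0
        self.reads = Counter()      # file path -> number of reads

    def record(self, read_path=None):
        """Count one tool call; pass `read_path` when the call reads a file.

        Returns "finalize" (read loop), "directive" (budget spent),
        "warn" (budget running low) or "ok".
        """
        self.calls += 1
        if read_path is not None:
            self.reads[read_path] += 1
            if (self.reads[read_path] >= self.same_file_limit
                    or sum(self.reads.values()) >= self.total_read_limit):
                return "finalize"
        if self.calls >= self.max_calls:
            return "directive"      # inject "stop reading and write" rather than kill the session
        if self.calls >= self.warn_at:
            return "warn"
        return "ok"


_trackers = {}                      # request hash -> TaskBudget


def budget_for(user_request):
    """Same request text maps to the same budget, even across client sessions."""
    key = hashlib.sha256(user_request.encode("utf-8")).hexdigest()
    return _trackers.setdefault(key, TaskBudget())
```

Because the key depends only on the request text, a client that opens a fresh session per retry still hits the same counters, while a new task gets a fresh `TaskBudget`.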
+### Bug Fixes
+- Fix `task_retry_count` counting every turn instead of same-task retries (spam bug)
+- Fix tool-call budget killing session instead of injecting directive
+- Fix `_schema` NameError in smart-continue nudge (cobra91 PR #17)
+- Fix `_anti_stall_cleanup()` killing own parent/shell wrapper process
+- Fix OA compat path loop tracker indentation
+- Fix Codex CLI 0.134.0 profile system: separate `~/.codex/<slug>.config.toml` files
+- Fix compaction causing model loops: `max_input_items: 60→200` for 1M-token models
+- Merge cobra91 PR #17: MSIX Desktop launch, button state, `_schema` fix
+
+## v3.13.5 (2026-05-27)
+
+**Anti-Loop & Flash Model Resilience, Auto Token Refresh**
+
+### New Features
+- **Cross-session loop tracker**: Keys by user request hash — detects loops even when client creates new sessions per retry. Resets counter on new tasks.
+- **Tool-call budget**: 150 calls max per task, warning at 80. Injects directive to stop reading and write, instead of killing the session.
+- **File-path read-loop detection**: Same file read 5+ times or 30+ total file reads triggers force-finalize
+- **Smart compaction summary**: Directive text when read-loop detected in compacted history
+- **Model-aware idle timeout**: Flash/mini/haiku models get 120s timeout instead of 300s
+- **Auto 401 token refresh**: On 401 transient, force-refreshes Google OAuth token and retries once
+- **`_send_ag_finalize()` helper**: Returns synthetic response for hard terminations
+- **Default provider policy**: Unrecognized providers get balanced compaction (128K context, 60 items)
+- **Anti-stall self-kill fix**: No longer kills own parent process or process group
+- **E2E test suite with real CLI task**: `test-antigravity.sh --task`
+
+### Bug Fixes
+- Fix `_schema` NameError in smart-continue nudge (cobra91 PR #17)
+- Fix `_anti_stall_cleanup()` killing own parent/shell wrapper
+- Fix `task_retry_count` counting every turn instead of same-task retries
+- Fix tool-call budget cap killing session instead of injecting directive
+- Merged cobra91 PR #17: MSIX Desktop launch, button state
+
+## v3.13.0 (2026-05-27)
+
+**Codex Desktop Updater, Antigravity E2E, Profile System Fix**
+
+### New Features
+- **Codex Desktop Updater**: `CodexUpdaterWindow` class — check updates, install, rollback, service management, manual rebuild from source (`ilysenko/codex-desktop-linux`)
+- **Antigravity E2E test suite**: `~/.local/bin/test-antigravity.sh` — validates token, REST endpoints, proxy adapter, model resolution
+- **Antigravity prod endpoint working**: `cloudcode-pa.googleapis.com` returns 200 with real responses for `gemini-3-flash`
+
+### Bug Fixes
+- **Fix Antigravity endpoint order**: prod (`cloudcode-pa.googleapis.com`) first, then daily-sandbox, then autopush-sandbox
+- **Fix Antigravity model resolution**: `gemini-3.5-flash-high` → `gemini-3-flash` via `_model_alias` map
+- **Fix OAUTH_PROVIDER derivation**: auto-derived from `BACKEND` env var when running without `--config`
+- **Fix `service_disabled` bail**: only returns error from prod endpoint, skips sandbox endpoints
+- **Fix compaction causing model loops**: `max_input_items: 60→200` (prod), `80→250` (sandbox); `tool_output_limit: 6000→8000`; `compaction: "aggressive"→"conservative"` — model was "forgetting" earlier reads due to aggressive compaction
+- **Fix Codex CLI 0.134.0 profile system**: profiles now written to separate `~/.codex/<slug>.config.toml` files instead of `[profiles.*]` sections in main config
+- **Fix updater false success**: checks for "successfully"/"No update ready" in output text, not return code
+
+## v3.12.1 (2026-05-27)
+
+**Fix Antigravity Adapter (PR #15)**
+
+### Bug Fixes
+- Simplified model resolution, removed broken `_sanitize_gemini_schema()`
+- Restored correct headers
+- Expanded model alias map for all Antigravity variants
+- Re-enabled gRPC fallback by default
+
+## v3.12.0 (2026-05-27)
+
+**gRPC Auto-Fallback for Antigravity Provider (PR #13)**
+
+### New Features
+- **gRPC auto-fallback**: When REST API returns 404 (model not found), automatically retries via gRPC
+- **New `antigravity_grpc` module**: Full protobuf client with CloudCode PredictionService stubs
+- **Display name remapping**: gRPC uses display names (e.g. "Gemini 3.5 Flash (High)") instead of REST slugs
+- **Streaming and unary support**: gRPC fallback works for both streaming and non-streaming requests
+- **Dynamic version fetch with validation**: Probes fetched versions to ensure they work before caching
+- **Antigravity v2 handler rewrite**: Based on anti-api approach with proper safety settings, stopSequences, sessionId
+- **Lazy import**: grpcio is only imported when needed — zero impact if not installed
+
+### Bug Fixes
+- Antigravity 404 caused by invalid version — now validates with probe requests
+- Version fallback: auto-retries with re-fetched version if all endpoints return 404
+
+## v3.11.12 (2026-05-26)
+
+**New Antigravity v2 Handler (Mimicking anti-api)**
+
+### New Features
+- **Complete rewrite of Antigravity handler** based on https://github.com/ink1ing/anti-api approach
+- Safety settings (all OFF), stopSequences, sessionId, requestType: agent
+- functionResponse uses `response: { result: string }` format matching anti-api
+- Endpoint priority: `daily-cloudcode-pa.googleapis.com` first
+- Simplified sanitizer: only deduplicates consecutive user text, never touches tool messages
+
+## v3.11.11 (2026-05-26)
+
+**Antigravity Fix: Stricter function_call/output Pairing + Gemini Sanitizer Rewrite (PR #12)**
+
+### Bug Fixes
+- **Stricter function_call/output pairing**: Only includes pairs where BOTH call and output exist — no orphan calls sent to Gemini
+- **Gemini sanitizer rewritten**: Tool messages (`functionCall`/`functionResponse`) are always preserved as-is, never merged or skipped
+- **Text merging more conservative**: Checks last message for tool content before merging consecutive text messages
+- **Final trimming safe**: Only removes plain `message` items, never `function_call_output` (which would break tool pairs)
+- **Merge PR #12**: Fix by qwen-chat coder
+
+## v3.11.10 (2026-05-26)
+
+**Antigravity Fix: Interleave function_call/output Pairs, Gemini Turn Trimming (PR #11)**
+
+### Bug Fixes
+- **Fix Antigravity function_call/output ordering**: Tool calls and their responses are now properly interleaved in sequence (`function_call` → `function_call_output` → `function_call` → ...) instead of being grouped separately
+- **Gemini sanitizer trimming**: Leading/trailing non-user turns removed for Google API compliance (Google requires conversation to start and end with user turn)
+- **Stricter role boundary enforcement**: `functionCall` (model) and `functionResponse` (user) never merged across role boundaries
+- **Merge PR #11**: Fix by qwen-chat coder
+
+## v3.11.9 (2026-05-26)
+
+**Antigravity Fix: Preserve functionCall/functionResponse in Gemini Sanitizer (PR #10)**
+
+### Bug Fixes
+
+- **Fix Antigravity multi-turn tool use**: The Gemini message sanitizer was incorrectly merging/dropping `functionCall` and `functionResponse` turns, causing Antigravity to think forever without responding. These turns are now always preserved as separate messages.
+- **Merge PR #10**: `fix: preserve functionCall/functionResponse in Gemini sanitizer` (qwen-chat coder)
+
+## v3.11.8 (2026-05-26)
+
+**Vision Cache Persistence, PR #8 Merge**
+
+### New Features
+
+- **Vision description cache persisted across requests**: Image descriptions from the vision fallback API are now cached in a file (`~/.cache/codex-proxy/vision-cache.json`) so the same image URL is never described twice — saves API calls and latency
+- **Merge PR #8**: `fix: persist vision description cache across requests` (cobra91)
+
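A file-backed description cache of the kind described above might look roughly like this. It is a sketch under assumptions: `DescriptionCache` and the `describe` callback are hypothetical, and the on-disk layout (a flat JSON object keyed by the MD5 of the image URL) is a guess based on the MD5-based caching mentioned for `_vision_describe_image()`.

```python
import hashlib
import json
import os


class DescriptionCache:
    """Image-description cache persisted as JSON (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)
        except (OSError, ValueError):
            # Missing or corrupt cache file: start empty rather than fail the request.
            self._data = {}

    @staticmethod
    def key(url):
        # MD5 is used only as a compact cache key, not for security.
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def get_or_describe(self, url, describe):
        """Return the cached description, calling `describe(url)` only on a miss."""
        k = self.key(url)
        if k not in self._data:
            self._data[k] = describe(url)
            self._save()
        return self._data[k]

    def _save(self):
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
        os.replace(tmp, self.path)   # atomic swap so a crash cannot leave a half-written file
```

Writing to a temporary file and swapping it in with `os.replace` keeps the cache readable even if the proxy is killed mid-write.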
+## v3.11.7 (2026-05-26)
+
+**Vision Auto-Detect, Proactive Non-Vision Model Detection, Unit Tests, Bug Fixes**
+
+### New Features
+
+- **Vision auto-detect fallback**: When no explicit vision fallback is configured, automatically uses the current provider's own vision model (e.g., `0G-Qwen-VL` for OpenAdapter) as the image description API — no separate API key needed
+- **Proactive non-vision model detection**: Models matching name patterns (`glm`, `deepseek`, `llama`, `qwen` without `vl`, etc.) are detected as non-vision on first request without waiting for an error from the provider
+- **Vision preprocessing is now the primary image handling solution**: Replaces old `_strip_images_from_input()` (which just removed images with a placeholder). Images are now described via API and sent as rich text descriptions to text-only models
+- **Merge PR #6**: Vision/OCR preprocessing for text-only models (cobra91)
+- **Merge PR #7**: 177 unit tests for translate-proxy.py (cobra91)
+
+### Bug Fixes
+
+- **AttributeError fix**: `image_url` field can be a string (bare URL) not always a dict — fixed in both `_preprocess_vision_input()` and old strip function
+- **Auth os error 2 fix**: GUI shows "Config missing" message instead of raw OSError when `~/.codex/` directory doesn't exist
+- **Removed duplicate vision functions**: Cleaned up duplicate `_vision_describe_image()`, `_preprocess_vision()`, `_preprocess_vision_input()` from merge
+
+## v3.11.6 (2026-05-26)
+
+**Antigravity Loop Breakers, Vision/OCR Preprocessing, has_content Fix, Auth Error Fix**
+
+### New Features (Antigravity-only, no other providers affected)
+
+- **Per-session loop tracking**: `_ANTIGRAVITY_LOOP_TRACKER` global dict with `_antigravity_loop_key()` function tracks state per session: `latest_user_hash`, `nudge_injected`, `latest_user_appended`, `tool_calls_for_request`, `repeated_tool`, `force_finalize`, `last_tool`, `last_tool_count`
+- **Edit-intent nudge injection**: Injected only on the first turn per request, preventing duplicate nudges across retries
+- **Latest user instruction append**: Appended exactly once per request to prevent redundant instruction stacking
+- **Loop breaker**: If the same tool + arguments is repeated ≥ 5 times in a session, `force_finalize` is triggered to break the infinite loop
+- **Detailed `[antigravity-loop]` logging**: All tracking fields logged on every Antigravity request for debugging
+
+### New Features (All OpenAI-compatible providers)
+
+- **Vision/OCR preprocessing**: When a provider doesn't support images (detected via error messages like "unknown variant image_url", "does not support image"), the proxy automatically calls a configurable vision fallback API (default: Kilo.ai) to describe images as text, then replaces image blocks with text descriptions before sending to text-only models
+- **`_vision_describe_image()`**: Calls vision fallback model to describe a single image, with MD5-based caching to avoid re-describing same URL
+- **`_preprocess_vision()`**: Replaces `image_url`/`input_image` blocks in Chat Completions message format with text descriptions when provider lacks vision support
+- **`_preprocess_vision_input()`**: Same for Responses API input format — runs BEFORE adapter conversion so images are replaced early
+- **Vision error retry**: On HTTP 4xx errors containing image-related keywords, automatically retries with images preprocessed instead of failing
+- **Configurable via env vars**: `VISION_FALLBACK_URL`, `VISION_FALLBACK_MODEL`, `VISION_FALLBACK_KEY`
+- **ProviderSchema `supports_vision` field**: Auto-detected from error responses and persisted in provider-caps.json
+
+### Critical Fixes
+
+- **`has_content` now includes `function_call`** (v3.11.5 fix): `_observe_event` only checked for `"type": "message"` — when models return only tool calls (no text), `has_content` was `False`, causing Codex to loop infinitely and build context until `context_length_exceeded`. Now checks both `"message"` and `"function_call"`.
+- **`has_message`/`has_tool_call` initialized in all 5 locations**: Previous fix added variables inside `_observe_event` closure but missed 4 other `has_content = False` locations, causing `NameError: name 'has_message' is not defined` crashes.
+- **Auth config-not-found error handling**: When Codex's `config.toml` is missing or deleted, `codex login status` returns "Error loading configuration: No such file or directory (os error 2)". Now caught specifically (`OSError errno==2`) and returns ("not_configured", "Config missing — launch once to create") with clear GUI guidance.
+
+### Bug Fixes (GUI)
+
+- **Active endpoint sync**: GUI auto-removes stale endpoint references on startup
+
+## v3.11.5 (2026-05-26)
+
+**Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection**
+
+### Critical Fixes
+
+- **Token-aware compaction for small-context models (FIX)**: `_crof_compact_for_retry()` had an early return at `len(input_data) <= limit` (item count) — if you had 25 items × 1600 tokens = 40K tokens, it skipped compaction entirely because 25 < 30 (the default item limit). Now also checks estimated token count vs learned model max, and compacts when either item count OR token count exceeds limits. Fixes repeated `context_length_exceeded` errors on models like 0G-GLM-5.1 (~35K token context).
+- **Proactive compaction now token-aware**: Previously only triggered when item count > 30. Now also triggers when estimated tokens exceed 80% of the model's learned token limit, even if item count is below the threshold. Prevents the first-request failure pattern on small-context models.
+- **Compaction aggression threshold**: Changed `est > max_tok` to `est >= max_tok * 0.9` to avoid edge case where estimated tokens exactly equal the limit and compaction is skipped.
+- **Removed all `crof.ai` gates from adaptive compaction**: Proactive compaction, `finish_reason=length` retry, `_crof_record`, and compaction logging were gated behind `"crof.ai" in TARGET_URL`. These gates prevented OpenAdapter and other providers from getting proactive/retry compaction, causing repeated `context_length_exceeded` failures. Now applies universally to ALL providers.
+
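The compaction decision described in these fixes can be condensed into one predicate. This is a sketch, not the body of `_crof_compact_for_retry()`: the function name `should_compact` and its parameters are hypothetical, while the thresholds (item limit 30, 90% of the learned limit on retry, 80% for proactive compaction) are the ones stated above.

```python
def should_compact(item_count, est_tokens, item_limit=30, max_tokens=None, proactive=False):
    """Decide whether to compact, by item count OR estimated token count.

    `max_tokens` is the learned per-model limit, or None if nothing has been
    learned yet (in which case only the item count can trigger compaction).
    """
    if item_count > item_limit:
        return True
    if max_tokens is None:
        return False
    if proactive:
        # Proactive path: trigger once the estimate exceeds 80% of the limit.
        return est_tokens > max_tokens * 0.8
    # Retry path: 90% threshold, inclusive, so an estimate sitting at the limit still compacts.
    return est_tokens >= max_tokens * 0.9


# The failure case from the changelog: 25 items x 1600 tokens = 40K tokens
# against a ~35K-token model. Item count alone (25 <= 30) would skip compaction.
print(should_compact(25, 25 * 1600, max_tokens=35_000))
```

With no learned limit the same call returns `False`, which mirrors the old item-count-only behavior that caused the repeated `context_length_exceeded` errors.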
+### New Features
+
+- **Vision model detection + image stripping**: `_strip_images_from_input()` and `_model_supports_vision()` detect vision capability by model name pattern. Non-vision models (deepseek, glm, mixtral, llama, command, dbrx, qwen, phi-3) have `input_image`/`image_url` parts stripped and replaced with `[User attached image: filename — this model does not support vision]` text notice. Vision models (gpt-4o, gemini, claude, qwen-vl, glm-5v) keep images intact. Applied in 3 paths: main request, context_length_exceeded retry, smart-continue nudge.
+- **Token estimation and per-model limit learning**: `_estimate_tokens()`, `_estimate_input_tokens()`, `_get_model_max_tokens()`, `_set_model_max_tokens()`. Extracts `~N tokens` from `context_length_exceeded` error messages and stores per-model token limits. Used by proactive compaction and retry compaction to adjust `keep` count dynamically.
+- **Compaction aggression levels**: `_crof_compact_for_retry()` accepts `aggression` parameter (0=normal, 1=extreme). Extreme mode kicks in when estimated tokens > 1.5× the learned limit or on 2nd+ retry attempt. Reduces `keep` count to minimum, ensuring the compacted request fits within model limits.
+- **Smart-continue text-tool detection**: Removed hard requirement for `has_function_call_output(input_data)`. Added `_TOOL_CALL_TEXT_PATTERNS` and `_text_looks_like_tool_calls()` to trigger nudging when model outputs text matching tool-call patterns (e.g., `• (exec_command cmd ...)`, `write_to_file`, `exec_command`) even without prior `function_call_output` in context. Essential for models like 0G-GLM-5.1 that never emit real `function_call_output` items.
+- **Parenthesized tool call regex**: `_PAREN_TC_RE` pattern to match `• (name args...)` format from non-vision models that output tool calls as parenthesized text.
+
+### GUI Fixes
+
+- **Active endpoint sync**: Added `set_active_endpoint()` and `validate_active_endpoint()` to Linux GTK GUI. Syncs `.active-endpoint.json` with `config.toml` on every launch; auto-removes stale references to deleted providers. Fixed `"Error loading configuration: No such file or directory (os error 2)"` crash when active endpoint referenced a deleted provider.
+- **Config state**: `~/.codex/.active-endpoint.json` and `config.toml` model catalog path validated and auto-corrected on GUI startup.
+
+## v3.11.0 (2026-05-26)
+
+**Cobra PR Merge + Smart Continuation + API Key Hot-Reload**
+
+### New Features
+- **Concurrency semaphore (max 3)**: limits parallel upstream requests to prevent rate-limiting
+- **Auto-continue for truncated text**: detects text ending in `:`, `(`, `;`, `…` or `finish_reason=length`, continues seamlessly
+- **SO_REUSEADDR on sticky port**: prevents `TIME_WAIT` from changing port on restart
+- **proxy-stderr.log**: persistent log file for proxy errors
+- **Stream diagnostics**: logs event count, finish reason, content flag, elapsed time after each stream
+- **Timeout/OSError handler**: sends proper `response.failed` SSE event instead of silently dropping connection
+- **Restart Proxy button**: now only restarts proxy without killing Codex Desktop
+- **Tool call argument normalizer**: fixes capital-A `Arguments` key, strips markdown/JSON code block wrapping from tool call arguments
+- **Smart-continue loop (2× retries)**: escalating nudge messages when model returns text-only stop mid-task
+- **XML tool call extraction**: parses `<tool_call>name{args}</tool_call>` from model text output, injects as real `function_call` items
+- **Auto-continue + smart-continue ordered with skip guard**: prevents both from double-firing on the same response
+- **API key hot-reload**: mtime tracking detects config changes, `/admin/reload` endpoint triggers hot-reload, `/admin/verify-key` tests key against upstream
+- **GUI hot-reload**: auto-refreshes proxy key on endpoint edit, verifies with upstream — no proxy restart needed
+- **Synthetic tool-results disabled**: was causing deepseek-v4-pro truncation on opencode.ai
+
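The truncation check behind auto-continue can be illustrated in a few lines. The helper name `looks_truncated` is hypothetical; the trailing characters and the `finish_reason=length` condition are the ones listed above, and ignoring trailing whitespace is an assumption.

```python
TRUNCATION_ENDINGS = (":", "(", ";", "…")


def looks_truncated(text, finish_reason=None):
    """True if the model output appears cut off and should be continued."""
    if finish_reason == "length":
        return True
    # str.endswith accepts a tuple of candidate suffixes.
    return text.rstrip().endswith(TRUNCATION_ENDINGS)
```

For example, `looks_truncated("Next, run:")` is true, while `looks_truncated("Done.")` is false unless the upstream reported `finish_reason="length"`.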
+## v3.10.12 (2026-05-26)
+
+**Sticky Endpoint, Claude Fixes, Guardrail Skip, Anti-Stall**
+
+### New Features
+- **Sticky endpoint caching**: remembers which endpoint last succeeded, reuses it on every subsequent request (zero overhead)
+- **Sequential fallback**: if sticky endpoint fails (429/502/503), tries next endpoint in order — no parallel probing, no wasted requests
+- **Endpoint order**: `cloudcode-pa.googleapis.com` first (matches agy CLI), `daily-cloudcode-pa.googleapis.com` as fallback
+- **Anti-stall engine**: kills stale proxy processes and clears `__pycache__` on every new session start
+- **Smart error classification**: distinguishes `quota_exhausted` vs `capacity_exhausted` vs `account_banned` vs `validation_required` vs `service_disabled` vs `auth_permanent`
+- **Rate limit reset time parsing**: extracts cooldown from error body (`quotaResetDelay`, `Resets in ~1h27m`, etc.) for accurate cooldown
+- **Missing Antigravity headers**: `X-Client-Name`, `X-Client-Version`, `x-goog-api-client`, platform-aware `User-Agent`
+- **Session ID**: added `sessionId` to request wrapper for proper session tracking
+
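Sticky endpoint caching with sequential fallback can be sketched as below. This is an illustration under assumptions: `StickyEndpoints` and the `send(endpoint) -> (status, body)` callback are hypothetical, and the real proxy's error classification is richer than the three retryable status codes used here.

```python
RETRYABLE = {429, 502, 503}         # statuses that move on to the next endpoint


class StickyEndpoints:
    """Try the last successful endpoint first, then the rest in order (hypothetical helper)."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.sticky = None          # endpoint that last succeeded, if any

    def order(self):
        if self.sticky in self.endpoints:
            return [self.sticky] + [e for e in self.endpoints if e != self.sticky]
        return list(self.endpoints)

    def request(self, send):
        """Call `send` on each endpoint in turn; stop at the first non-retryable status."""
        last = None
        for endpoint in self.order():
            status, body = send(endpoint)
            if status in RETRYABLE:
                last = (status, body)
                continue            # sequential fallback, no parallel probing
            if 200 <= status < 300:
                self.sticky = endpoint
            return status, body
        return last                 # every endpoint failed; caller marks the rate limit
```

After one successful fallback, later requests go straight to the endpoint that worked, so the failing endpoint is not contacted again until the sticky one fails.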
+### Bug Fixes (TRAE Agent)
+- **Guardrail skip for simple messages**: when user sends simple messages (e.g. "hi"), skip injecting `_GEMINI_AGENT_GUARDRAIL` — prevents model from aggressively calling tools and looping `ls -la` 50+ times
+- **Claude tool preservation**: Claude models through Antigravity now keep ALL tool outputs in normalizer (no summarization/truncation) — prevents context loss that broke Claude sessions
+- **Claude compaction guard**: `_adaptive_compact` skipped for Claude models — Claude handles its own context, no forced compaction
+- **Claude normalizer guard**: `_antigravity_normalize_context` skipped for Claude models — avoids stripping Claude-specific message structure
+- **Claude sanitization guard**: Google content sanitization loop skipped for Claude models — prevents mangling Claude's response format
+- **Normalizer model parameter**: `_antigravity_normalize_context` now receives `model` param to distinguish Claude vs Gemini behavior
+
+## v3.10.11 (2026-05-26)
+
+**Hybrid Endpoint Fallback — Redundant Antigravity Endpoints**
+
+### New Features
+- Hybrid endpoint fallback: tries `cloudcode-pa.googleapis.com` then `daily-cloudcode-pa.googleapis.com` on 429
+- `daily-cloudcode-pa.googleapis.com` is the same production endpoint agy-core uses (separate rate limit bucket)
+- 429 errors now log full response body for debugging
+- SERVICE_DISABLED (403) still falls through to next endpoint
+- Rate-limit marking only happens after ALL endpoints fail
+
+### Bug Fixes
+- Fixed 429 on one endpoint immediately failing — now tries fallback before giving up
+- Restored SERVICE_DISABLED fallthrough (was accidentally removed)
+
+## v3.10.10 (2026-05-25)
+
+**Context Normalizer Fix — Compaction Summary Preservation**
+
+### Bug Fixes
+- Fixed normalizer stripping ALL context on resumed sessions after compaction
+- Normalizer no longer auto-resets when compaction summary is present
+- Compaction summaries ("Auto-compacted: N earlier turns") are always preserved
+- Deduplicates consecutive identical `<goal_context>` messages (10→1)
+- Emergency reset now preserves compaction summaries
+- Previous behavior: after compaction reduced 1925→185 items, normalizer saw `n_tool_outputs == 0` and stripped to just `system + latest_user`, losing all context — model responded with "I don't have context"
+
+### hashlib Fix (v3.10.9 hotfix)
+- `_antigravity_normalize_context` crashed with `NameError: hashlib` on resumed sessions
+- Replaced SHA256 duplicate detection with string comparison
+
+## v3.10.9 (2026-05-25)
+
+**Antigravity Overhaul — Context Normalizer, Claude Thinking Fix, Endpoint Lockdown**
+
+### Antigravity Endpoint Lockdown
+- Production-only: `cloudcode-pa.googleapis.com` by default
+- Sandbox/staging blocked unless `ALLOW_ANTIGRAVITY_STAGING=1`
+- 403 SERVICE_DISABLED falls through, 429 returns to client
+
+### AntigravityContextNormalizer
+- Bounded context — no more 136-item polluted requests for "hi"
+- Simple message detector, auto-reset polluted context
+- Duplicate removal, tool output budget, hard char limits
+
+### Claude Thinking Fix (Antigravity-only)
+- Fixed 400 error: `maxOutputTokens=64000` when thinking enabled
+- Snake_case config, VALIDATED toolConfig, proper budgets
+
+### z.ai / OpenRouter (cobra91 PR #4)
+- Full OpenClaw attribution headers, OpenRouter caching
+
+## v3.10.8 (2026-05-25)
+
+**OAuth & Antigravity Endpoint Fixes**
+
+### Re-OAuth Buttons Fixed
+- Linux GUI: `load_oauth_secrets()` was undefined — buttons crashed silently on click
+- Now loads OAuth secrets inline from `~/.config/codex-launcher/oauth-secrets.json`
+- Both Linux and Windows Re-OAuth use PKCE + localhost callback (was deprecated OOB paste)
+
+### Antigravity Staging/Sandbox Blocked by Default
+- Proxy: production `cloudcode-pa.googleapis.com` tried FIRST, sandbox/daily/autopush as fallback only
+- Proxy: 403 SERVICE_DISABLED now falls through to next endpoint instead of returning error immediately
+- Project discovery: validates against production endpoint, not staging-cloudaicompanion.sandbox
+- Antigravity preset `base_url` changed to production (was `daily-cloudcode-pa.sandbox.googleapis.com`)
+- `[antigravity-endpoint]` log line shows which endpoints are being tried
+
+### Other Fixes
+- GLib.idle_add lambda returning truthy tuple fixed (caused repeated callbacks)
+- Windows GUI project discovery also uses production endpoint
+
+## v3.10.7 (2026-05-25)
+
+**Prompt Enhancer — Fix Lost Context After Compaction**
+
+### Prompt Enhancer (Per-Provider Toggle)
+- **Offline mode**: Injects structured XML instructions before every user prompt to keep the model focused, decisive, and context-aware after compaction strips conversation history
+- **AI-powered mode**: Optionally calls an external LLM (configurable model/URL/key) to rewrite vague prompts into clear, actionable instructions
+- Prevents the "had to resend and reword" problem in long sessions where compaction summarizes hundreds of turns
+- **Per-endpoint setting** — enable/disable for each provider independently
+- Configurable in both Linux and Windows GUI: toggle switch, mode selector, enhancer model, URL, API key fields
+
+### How It Works
+- **Offline**: Prepends a `<prompt-enhancer>` block with rules like "never ask for clarification, infer from compacted context, execute decisively"
+- **AI-powered**: Sends the user's prompt + compaction summary to a separate model (e.g. DeepSeek V4 Flash via Freebuff) which rewrites it for clarity, then prepends the offline instructions too
+- Both modes run after compaction but before the request is sent upstream
+
+## v3.10.6 (2026-05-25)
+
+**Freebuff Integration + Codebuff OAuth Fix + Windows Consolidation**
+
+### Freebuff (Free DeepSeek/Kimi)
+- **Freebuff integration**: Free DeepSeek/Kimi models via codebuff.com API
+- Fixed User-Agent to match official SDK: `ai-sdk/openai-compatible/1.0.25/codebuff`
+- Fixed metadata fields: `freebuff_instance_id` + `client_id` (base36 random) + `cost_mode: "free"`
+- Fixed session endpoint: POST empty `{}` body (not `{"model": model}`)
+- GUI preset aliases: "Freebuff (Free DeepSeek/Kimi)", "FreeBuff", "Codebuff (Free DeepSeek/Kimi)" all map to same backend
+
+### Codebuff Fix
+- Fixed Codebuff OAuth: use `www.codebuff.com` (bare `codebuff.com` returns 307 redirect)
+
+### OAuth Secrets & Credentials (All Providers)
+- **OAuth Secrets dialog now shows ALL providers**: Google (Antigravity + Gemini CLI) AND Freebuff/Codebuff
+- **Re-OAuth buttons** for each provider: instantly re-authenticate Google or GitHub/Codebuff
+- Token status indicators (valid/missing) for each Google provider
+- Shows logged-in email and auth status for Freebuff/Codebuff
+- Editable auth token and fingerprint fields for Freebuff/Codebuff
+
+### Windows
+- Windows GUI files consolidated into `src/` (merged by cobra91 via PR #1 and PR #2)
+
+### Proxy & GUI Improvements (cobra91 PR #3)
+- CROF adaptive logic gated to `crof.ai` only — no more log pollution for other providers
+- Data directory consolidation: all data now in `codex-proxy/` (was split across `codex-desktop/`, `codex-launcher/`, `codex-proxy/`)
+- Sticky proxy port: persists in `.last-proxy-port`, reused on restart so Codex Desktop keeps connection
+- Adaptive compact budget raised from 60% to 80% — avoids premature compaction on large-context models (DeepSeek v4 Pro 1M)
+- Config cleanup fix: stale `proxy-*.json` cleanup moved after `_init_runtime()` to avoid deleting active config
+- Windows GUI: added Clear Log, Restart Proxy, View Log buttons
+- **Linux/Windows feature parity**: both GUIs now have identical features
+- Windows GUI: ported OAuth Secrets all-providers dialog (Google + Freebuff/Codebuff with Re-OAuth buttons, token status)
+- Windows GUI: added Codebuff/Freebuff OAuth login flow (GitHub browser-based)
+- Windows GUI: added Sync from Preset button in endpoint editor
+- Linux GUI: added Clear Log + Restart Proxy buttons (matching Windows)
+
+## v3.10.5 (2026-05-25)
+
+**Windows GUI + Context Compaction for Antigravity/Gemini OAuth**
+
+### Windows Native GUI (tkinter)
+- **Windows GUI** in `windows/` folder — full tkinter port by cobra91
+- OAuth Secrets editor, Import JSON, Antigravity model list
+- Shared backend with Linux (same translate-proxy.py)
+- See README for Windows installation and usage
+
+**Context Compaction for Antigravity/Gemini OAuth**
+
+### Fix
+- **Prevent `input token count exceeds maximum` errors** during long conversations
+- Added aggressive compaction policies for Antigravity (`cloudcode-pa`) and Gemini CLI (`googleapis`)
+- Auto-trims old turns when approaching 60% of model context limit (1M tokens for Gemini, 200K for Claude, 128K for GPT-OSS)
+- Added REST model IDs to context size map (`gemini-3-flash`, `gemini-3.1-pro-low`, `claude-sonnet-4-6`, etc.)
+
+## v3.10.4 (2026-05-25)
+
+**Security: OAuth Secrets Editor + Import JSON**
+
+### Security
+- **All hardcoded OAuth secrets removed from source code and git history**
+- OAuth client IDs and secrets now stored locally in `~/.config/codex-launcher/oauth-secrets.json`
+- Git history rewritten to scrub all leaked credentials (0 matches verified)
+- Pre-push hook blocks any future commit containing secrets
+- All old Gitea releases deleted (contained leaked secrets in .deb files)
+
+### New Features
+- **OAuth Secrets editor** in GUI — "OAuth Secrets" button in header bar
+- **Import JSON** button — import `client_secret_*.json` downloaded from Google Cloud Console
+- Supports both `"installed"` and `"web"` JSON formats from Google
+
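Handling both Google JSON shapes on import could look like the sketch below. The function name `parse_client_secret` and the returned dict layout are hypothetical; the only facts relied on are that Google's downloaded `client_secret_*.json` nests the credentials under a top-level `"installed"` or `"web"` key with `client_id` and `client_secret` fields.

```python
import json


def parse_client_secret(text):
    """Extract client credentials from a Google client_secret_*.json document."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    for kind in ("installed", "web"):
        section = doc.get(kind)
        if isinstance(section, dict):
            return {
                "kind": kind,
                "client_id": section["client_id"],
                "client_secret": section["client_secret"],
            }
    raise ValueError("expected a top-level 'installed' or 'web' object")
```

The parsed values would then be written to the local secrets file rather than kept in source, in line with the security changes in this release.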
+### Antigravity Fix (from v3.10.3)
+- Antigravity REST API uses slug IDs, not display names
+- Verified all model IDs with live API testing
+
+## v3.10.3 (2026-05-25)
+
+**Fix Antigravity 404 Errors — Verified REST Model IDs**
+
+### Critical Fix
+- Antigravity REST API (`v1internal:generateContent`) uses slug IDs, not display names
+- Verified all model IDs with live API testing against `daily-cloudcode-pa.sandbox.googleapis.com`
+- Display names map to closest working REST model (e.g. `Gemini 3.5 Flash (High)` → `gemini-3-flash`)
+- Model list now matches agy CLI: Gemini 3.5 Flash (H/M/L), Gemini 3.1 Pro (H/L), Claude Sonnet/Opus 4.6, GPT-OSS 120B
+
+### Working REST Model IDs
+| Display Name | REST ID |
+|---|---|
+| Gemini 3.5 Flash (High) | gemini-3-flash |
+| Gemini 3.5 Flash (Medium) | gemini-3-flash |
+| Gemini 3.5 Flash (Low) | gemini-3.5-flash-low |
+| Gemini 3.1 Pro (High) | gemini-3.1-pro-low |
+| Gemini 3.1 Pro (Low) | gemini-3.1-pro-low |
+| Claude Sonnet 4.6 (Thinking) | claude-sonnet-4-6 |
+| Claude Opus 4.6 (Thinking) | claude-opus-4-6-thinking |
+| GPT-OSS 120B (Medium) | gpt-oss-120b-medium |
+
+## v3.10.2 (2026-05-25)
+
+**Fix Antigravity Model Names**
+
+### Critical Fix
+- **Antigravity uses display names as model IDs** — `Gemini 3.5 Flash (High)` not `gemini-3.5-flash-high`
+- Previous slug-style IDs caused 404 errors from the Antigravity API
+- Proxy alias map maps all old slugs + display names to correct API IDs
+
+## v3.10.0 (2026-05-25)
+
+**Provider Model Editor + Antigravity Model Refresh**
+
+### Provider Editor
+- **Remove Selected** button to remove highlighted model(s) from provider
+- **Clear All** button to empty model list
+- **Sync from Preset** button to refresh model list from current preset definition
+- Preset sync now replaces (not appends) models — fixes stale saved model lists
+
+### Antigravity Models Updated
+- **Gemini 3.5 Flash** (High / Medium)
+- **Gemini 3.1 Pro** (High / Low)
+- **Claude Sonnet 4.6 Thinking**
+- **Claude Opus 4.6 Thinking**
+- **GPT-OSS 120B Medium**
+
+## v3.9.9 (2026-05-25)
+
+**Antigravity Model Refresh**
+
+### Updated Models
+- **Gemini 3.5 Flash** (High / Medium) — new flagship flash model
+- **Gemini 3.1 Pro** (High / Low) — tiered reasoning control
+- **Claude Sonnet 4.6 Thinking** — Anthropic partner model via Antigravity
+- **Claude Opus 4.6 Thinking** — Anthropic partner model via Antigravity
+- **GPT-OSS 120B Medium** — open-weight GPT model via Antigravity
+- Removed stale `antigravity-*` prefixed IDs and old preview models
+
+### Proxy Updates
+- Alias map updated for tiered model IDs (high/medium/low/thinking)
+- Context sizes added for all new Antigravity models
+
+## v3.9.8 (2026-05-25)
+
+**Codex Desktop Model Fix & Global BrokenPipeError Protection**
+
+### Desktop Model Fix
+- **Codex Desktop sending wrong model** (gpt-5.4-mini) instead of user-selected model — now remapped via `CODEX_LAUNCHER_MODEL` env var
+- **Config.toml** now writes `review_model`, `wire_api`, `request_max_retries`, `stream_max_retries`, `stream_idle_timeout_ms` for Desktop compatibility
+- **Proxy model remap** intercepts Desktop forced models (`gpt-5.4-mini`, `gpt-5.5`, etc.) and routes to the user's selected model
+
+### Global Crash Fix
+- **`send_json()` globally catches BrokenPipeError** — no more crashes on client disconnect across all backends
+
+## v3.9.7 (2026-05-25)
+
+**Codebuff Error Forwarding & Crash Fixes**
+
+### Rate Limit Error Forwarding
+- **Real Codebuff error messages** forwarded to user instead of generic "429 Too Many Requests"
+- **HTTP 200 + Responses API format** for rate limits — Codex displays the actual Codebuff message (e.g. "Daily session limit reached. Resets in 29m.") instead of retrying
+- **`retryAfterMs` extraction** from Codebuff 429 responses for accurate cooldown timers
+- **`_codebuff_start_run`** returns actual error body instead of `None` — shows real Codebuff errors
+
+### Crash Fixes
+- **BrokenPipeError crash** on "all accounts exhausted" response — wrapped in try/except
+- **3 SyntaxWarnings** fixed for invalid `\ ` escape sequences in docstrings
+
+## v3.9.6 (2026-05-25)
+
+**Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After**
+
+Inspired by architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum).
+
+### P0: Connection Pooling & Stream Idle Timeout
+- **Connection pooling** (`http.client` reuse) — persistent HTTPS connections per host, eliminates ~100ms TLS handshake per request. Pool keyed by `{scheme}://{host}:{port}`, reused across requests.
+- **Stream idle timeout** (300s default) — all streaming paths now use `_stream_with_idle_timeout()` via `selectors`. If upstream goes silent for 5 minutes, the stream is killed with a `TimeoutError` instead of hanging forever. Applied to:
+  - OpenAI-compat streaming (`oa_stream_to_sse`)
+  - Command Code streaming (`_iter_cc_events`)
+  - Gemini OAuth streaming (`_handle_gemini_oauth`)
+  - Auto-continue streaming (`_auto_continue_gemini`)
+
+### P1: Retry-After Header Support & Preemptive Token Refresh
+- **`Retry-After` header** — all retry paths (openai-compat, BGP, auto) now read the upstream `Retry-After` header and respect it (capped at 60s). Falls back to exponential backoff if header is absent.
+- **Preemptive OAuth token refresh** — `_preemptive_refresh_token()` checks whether the token expires within the next 5 minutes and logs a warning; this lays the groundwork for proactive refresh.
+
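The `Retry-After` rule above (honor the header, cap it at 60s, otherwise back off exponentially) can be sketched roughly as follows. This is an illustration only: `retry_delay` and its parameters are hypothetical names, not the proxy's actual helpers, and only the delta-seconds form of the header is handled.

```python
MAX_RETRY_AFTER_S = 60.0  # cap mentioned in the changelog entry


def retry_delay(headers, attempt, base=1.0):
    """Return seconds to wait before retry number `attempt` (0-based).

    `headers` is assumed to be a plain dict keyed by the exact
    header name; a real handler would need case-insensitive lookup.
    """
    raw = headers.get("Retry-After")
    if raw is not None:
        try:
            # Delta-seconds form only; HTTP-date values fall through
            # to the backoff branch below.
            return min(max(float(raw), 0.0), MAX_RETRY_AFTER_S)
        except ValueError:
            pass
    # Exponential backoff fallback: base, 2*base, 4*base, ... (capped).
    return min(base * (2 ** attempt), MAX_RETRY_AFTER_S)
```

Capping the header value keeps a misbehaving upstream from stalling a session for minutes, at the cost of occasionally retrying earlier than the provider asked.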
+### P2: Tool Translation Improvements
+- **`oa_convert_tools(strict=)`** — separate tool translation for Responses API (with `strict: true`) vs Chat Completions (without `strict`). Some providers reject the `strict` field in Chat Completions mode.
+- **Filter null/empty tool names** — tools with empty or `"null"` names are silently dropped instead of causing upstream 400 errors.
+
+### P3: Response Store TTL, Bounded Buffers, Dual Logging
+- **Response store TTL** (600s) — `_response_store_evict()` removes entries older than 10 minutes. Prevents unbounded memory growth on long sessions.
+- **Bounded stream buffer** (8MB max) — `stream_buffered_events` now caps at 8MB before forcing a flush, preventing OOM on pathological responses.
+- **`response.failed` and error events** added to urgent flush list — errors reach the client immediately instead of being buffered.
+- **Dual logging** — `proxy.log` in `~/.cache/codex-proxy/` captures all proxy messages alongside stderr. Survives Codex Desktop's stderr piping.
+
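As a rough sketch of the response store TTL described above, an insertion-ordered map can be evicted from the front until the oldest entry is young enough. The class and method names here are hypothetical; the changelog only names `_response_store_evict()` and the 600s TTL.

```python
import threading
import time
from collections import OrderedDict


class ResponseStore:
    """Hypothetical TTL store: entries older than `ttl_s` are evicted."""

    def __init__(self, ttl_s=600.0, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._lock = threading.Lock()
        self._items = OrderedDict()  # response_id -> (stored_at, payload)

    def put(self, response_id, payload):
        with self._lock:
            # Re-inserting moves the entry to the end (newest position).
            self._items.pop(response_id, None)
            self._items[response_id] = (self._clock(), payload)
            self._evict_locked()

    def get(self, response_id):
        with self._lock:
            self._evict_locked()
            entry = self._items.get(response_id)
            return entry[1] if entry else None

    def _evict_locked(self):
        cutoff = self._clock() - self._ttl_s
        # Insertion order means the oldest entry is always first.
        while self._items:
            stored_at = next(iter(self._items.values()))[0]
            if stored_at >= cutoff:
                break
            self._items.popitem(last=False)
```

Evicting on every `put` and `get` avoids a separate sweeper thread; whether the real proxy does the same is not stated in the changelog.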
+## v3.5.0 (2026-05-22)
+
+**Major Release — Command Code Adapter Overhaul, AI Assist, Self-Revive Watchdog, Debug Infrastructure**
+
+### Command Code Provider — Multi-Format Tool-Call Parser (Critical Bug Fix)
+
+The Command Code (CC) provider adapter in `translate-proxy.py` had a critical bug where the CC model's tool-call output was not being parsed into executable tool calls, causing the Codex agent loop to stop after the first response. The CC model output format **changes between sessions and models** — the parser must handle all observed formats.
+
+**Root Cause:** The CC model returns tool calls as inline text in various formats (raw JSON, XML, DSML tags, HTML-like blocks) within `text-delta` SSE events. The original parser only handled one format. When the model switched output style, tool calls were silently dropped, and Codex received a plain text response instead of executable commands — halting the multi-turn agent loop.
+
+**The Fix — Multi-Format Parser Chain (17 patches):**
+
+A cascading parser chain was built that tries each format in order, first match wins:
+`DSML → <bash> blocks → <explore_agent> → <tool_call type=...> → XML patterns → raw JSON → fallback regex`
+
+- **FIX 1**: `cc_input_to_messages()` — enforce STRING content only (CC `/alpha/generate` rejects content blocks). Tool calls sent as inline JSON text in assistant messages. Tool results as `role: "user"` plain text (NOT `role: "tool"`).
+- **FIX 2**: `x-command-code-version` header always sent (fallback `"0.26.8"`) — prevents 403 `upgrade_required` errors.
+- **FIX 3**: Cleared stale schema cache (`content_type:"array"`) that was corrupting message construction.
+- **FIX 4**: Streaming `try/except` wrapper — catches all streaming errors and sends `response.completed(status:"failed")` event instead of crashing the connection.
+- **FIX 5**: `_extract_raw_json_tool_calls()` — new parser that finds raw JSON tool calls embedded in model text (`{"cmd":"...","type":"tool-call"}`).
+- **FIX 6**: `_extract_args()` three-tier parser — tries direct parse → `codecs.escape_decode` → `unicode_escape` to prevent double-wrapped argument strings.
+- **FIX 7**: `_extract_field()` skips leading `\` before value type check — handles malformed escape sequences in CC output.
+- **FIX 8**: `sandbox_permissions` normalization from parsed dict — converts `{"docker":"full"}` to the flat string format Codex expects.
+- **FIX 9** (REVERTED): Removed adaptive probe system — proved unnecessary, conservative inline-text format is sufficient.
+- **FIX 10**: Comprehensive fix documentation added to proxy file header for maintainability.
+- **FIX 11**: `_unwrap_cmd()` recursive unwrapping — handles double/triple-wrapped `cmd` values at all 7 extraction paths. `_sanitize_tool_calls()` post-extraction validation layer ensures every tool call has valid name + args.
+- **FIX 11c**: XML regex fix — `</tool_call)` had an unbalanced parenthesis for ~4000 lines; now uses `[)]?>` to match both `</tool_call>` and `</tool_call)>`.
+- **FIX 12**: Self-revive watchdog loop — auto-restarts proxy on crash (up to 50x, progressive backoff 1→30s). Controlled by `_SHUTDOWN_REQUESTED` flag on SIGTERM/SIGINT.
+- **FIX 13**: Fallback extraction when main parser returns empty but text contains tool-call signals (`{"cmd":`, `"type":"tool-call"`, `<tool`, `<function=`).
+- **FIX 14**: Parser for `<tool_call type="bash">\n{"command":"..."}` format (actual CC model output) + fixed fallback regex to match BOTH `"cmd"` AND `"command"` keys.
+- **FIX 15**: `<explore_agent>` blocks converted to real `exec_command` with synthesized curl-based repo exploration command.
+- **FIX 16**: `<bash>...</bash>` blocks parsed — extracts `prefix_rule`, `sandbox_permissions`, `justification` via line-oriented parsing.
+- **FIX 17**: DSML tool_call blocks — the **current CC model output format**:
+  - `<｜｜DSML｜｜tool_calls>` wrapper
+  - `<｜｜DSML｜｜invoke name="exec">` with `<｜｜DSML｜｜parameter name="command">` tags
+  - Extracts command from `parameter name="command"` or fallback to `prefix_rule`
+  - Maps `exec`/`bash` → `exec_command`
+
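The "first match wins" chain can be illustrated with a much smaller sketch. Only two simplified parsers are shown, and `parse_bash_block`, `parse_raw_json`, and `extract_tool_calls` are hypothetical names; the real parsers also handle DSML, XML, `prefix_rule` metadata, and malformed JSON.

```python
import json
import re


def parse_bash_block(text):
    """Simplified: treat the whole <bash> body as the command."""
    m = re.search(r"<bash>(.*?)</bash>", text, re.DOTALL)
    if not m:
        return []
    return [{"name": "exec_command", "arguments": {"cmd": m.group(1).strip()}}]


def parse_raw_json(text):
    """Simplified: find one JSON object carrying a cmd/command key."""
    m = re.search(r"\{.*\}", text, re.DOTALL)
    if not m:
        return []
    try:
        obj = json.loads(m.group(0))
    except ValueError:
        return []
    if not isinstance(obj, dict):
        return []
    cmd = obj.get("cmd") or obj.get("command")
    if not cmd:
        return []
    return [{"name": "exec_command", "arguments": {"cmd": cmd}}]


# Order matters: more specific formats go first.
PARSERS = [parse_bash_block, parse_raw_json]


def extract_tool_calls(text):
    for parser in PARSERS:
        calls = parser(text)
        if calls:  # first parser that yields anything wins
            return calls
    return []
```

Keeping each format in its own function means a new output style can be supported by adding one parser to the list without touching the others.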
+### Debug Infrastructure
+- **Debug-to-file**: All proxy events, text_buf preview, parser results, and fallback attempts logged to `~/.cache/codex-proxy/cc-debug.log` — works even when stderr is piped by Codex Desktop.
+- **Inline self-test**: `--self-test` flag runs 19 tests covering unwrap, double-wrap, unescaped quotes, XML, function=, sanitizer edge cases.
+- **Per-request logging**: Event types, text_buf content, parser match results written to debug log for every request.
+
+### AI Assist
+- AI Assist integration in launcher GUI for intelligent provider configuration and troubleshooting.
+
+### Self-Revive Watchdog
+- Proxy auto-restarts on crash with progressive backoff (1s → 30s, up to 50 restarts).
+- Clean shutdown on SIGTERM/SIGINT via `_SHUTDOWN_REQUESTED` flag.
+- Eliminates manual proxy restart during long coding sessions.
+
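A minimal sketch of the restart loop, assuming the backoff doubles from 1s up to the 30s cap (the changelog gives only the endpoints and the 50-restart limit, not the exact progression). `restart_delays` and `run_with_watchdog` are hypothetical names, and `shutdown_requested` stands in for the `_SHUTDOWN_REQUESTED` flag.

```python
import time


def restart_delays(max_restarts=50, first=1.0, cap=30.0):
    """Yield one delay per allowed restart: 1, 2, 4, ... capped at `cap`."""
    delay = first
    for _ in range(max_restarts):
        yield delay
        delay = min(delay * 2, cap)


def run_with_watchdog(run_once, shutdown_requested, sleep=time.sleep):
    """Re-run `run_once` after crashes; return the number of restarts.

    Stops when `run_once` returns cleanly, when shutdown was
    requested, or when the restart budget is used up.
    """
    restarts = 0
    for delay in restart_delays():
        try:
            run_once()
            return restarts
        except Exception:
            if shutdown_requested():
                return restarts
            sleep(delay)
            restarts += 1
    return restarts
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting.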
+### Other Improvements
+- `text_buf` in `cc_stream_to_sse` accumulates all `text-delta` events; parsing happens at end-of-stream for complete extraction.
+- Schema cache with 24h staleness TTL for provider capabilities.
+- ErrorAnalyzer learns from 4xx errors on retry (max 2 retries).
+- `cleanup-codex-stale.sh` updated with additional stale process patterns.
+
+## v3.3.0 (2026-05-20)
+
+**Antigravity + Gemini CLI OAuth — full Codex agent loop working**
+
+### Gemini CLI OAuth + Antigravity OAuth
+- Split Google OAuth into separate Gemini CLI OAuth and Google Antigravity OAuth presets/backends.
+- Gemini CLI OAuth uses the Gemini CLI public OAuth client and Code Assist endpoints.
+- Antigravity OAuth uses Antigravity OAuth credentials, Code Assist daily/autopush/prod fallback, and Antigravity-style request wrapping.
+- Added Antigravity version discovery from the updater/changelog with local caching.
+- Added Antigravity model alias mapping from UI-facing `antigravity-*` IDs to upstream Code Assist model IDs.
+
+### Responses API + Tool Flow
+- Added Gemini-style history hardening for Google OAuth requests: removes empty turns, coalesces adjacent roles, drops duplicate user repeats, and enforces user-start/user-end history.
+- Preserves function-call IDs across turns and adds synthetic `thoughtSignature` for historical Gemini function calls, matching Gemini CLI hardening behavior.
+- Fixed Antigravity streaming Responses API compatibility: single assistant message item, text done events, content part done, output item done, final completed event, and connection close.
+- Added `response.function_call_arguments.delta` and `response.function_call_arguments.done` events so Codex can execute Antigravity tool calls and create files.
+- Fixed functionResponse name matching — uses the original functionCall name instead of falling back to call_id.
+- Strengthened Antigravity prompt policy: use tools immediately for file changes, avoid planning-only responses, and answer directly when no suitable tool exists.
+- **Auto-continue on MAX_TOKENS** — when Gemini/Antigravity truncates a text response, the proxy transparently sends a continuation request and concatenates the output so Codex receives the complete response without manual intervention.
+
+### Reliability + Routing
+- Added BGP++ route scoring, route cooldowns, token buckets, and persisted route stats.
+- Added provider policy layer and adaptive context compaction.
+- Added tool-call pairing validation/repair for orphaned tool outputs.
+- Added Endpoint Doctor in the endpoint editor.
+- Added log redaction helper for common API key/token patterns.
+
+## v3.1.0 (2026-05-20)
+
+- Initial Antigravity/Gemini CLI OAuth backend split.
+- Gemini-style history hardening, SSE streaming fixes.
+
+## v3.0.0 (2026-05-20)
+
+**Major architectural overhaul — Phase 0 + Phase 1 of engineering roadmap**
+
+### Proxy (translate-proxy.py)
+- **ThreadingHTTPServer** — serves concurrent requests (no more blocking)
+- **Thread-safe shared state** — OrderedDict response store with locks, Crof state lock, stats lock
+- **Batched + atomic stats writes** — stats buffered in memory, flushed every 5s via `os.replace()`
+- **Graceful shutdown** — SIGTERM/SIGINT drain active connections (up to 5s), reject new with 503
+- **Progressive upstream timeouts** — based on input size and tools (60-300s instead of flat 180s)
+- **Lazy JSON parsing** — skip parsing SSE events unless they contain `response.completed`
+- **Buffered SSE writes** — flush every 30ms, on urgent events, or at 4KB (reduces syscalls)
+- **`/health` endpoint** — returns backend, target, models, BGP route count
+- **Consolidated imports** — all at top, no more missing import crashes
+- **`main()` entry point** — runtime init moved out of module level
+- **TCP_NODELAY** — on all streaming paths (from v2.7.0)
+- **Anthropic prompt caching** — `cache_control: ephemeral` on system prompts (from v2.7.0)
+
+### Launcher (codex-launcher-gui)
+- **Dynamic port allocation** — `_pick_free_port()` picks random free port, no more 8080 conflicts
+- **Proxy health gating** — Codex will NOT launch if proxy fails health check within 15s
+- **Error dialogs** — clear GTK error dialog when proxy startup fails
+- **Atomic config backup/restore** — temp file + `os.replace()`, no more corrupted config.toml
+- **Config transactions** — recovery from interrupted sessions on next startup
+- **Safe cleanup (PID registry)** — only kills processes launched by the app (pids.json)
+- **Proxy stderr piped to log** — real-time proxy logs in launcher UI
+- **Bearer token** — Codex config uses `codex-launcher-local` instead of real API key
+- **Usage Dashboard v2** — OpenUsage-inspired dark theme with status pills, KPI strip, model bars (from v2.7.0)
+
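The "temp file + `os.replace()`" pattern used for config backup/restore can be sketched as below. `atomic_write_text` is a hypothetical helper name; the launcher's actual function is not shown in this changelog.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write `text` to `path` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem)
    # for os.replace() to be an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".partial")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies mid-write, the original `config.toml` is left untouched and only a stray `.partial` file remains.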
+## v2.7.0 (2026-05-20)
+
+- **Usage Dashboard redesigned** (inspired by OpenUsage design patterns)
+  - Deep Space dark theme with Catppuccin-inspired color palette
+  - Header with animated status dots (OK/WARN/ERR provider health)
+  - KPI summary strip: total providers, requests, token volume, avg latency
+  - Provider cards with colored borders matching health status
+  - Status pills: OK (green), WARN (yellow), ERR (red)
+  - Colored section separators per metric type (Usage=yellow, Models=lavender)
+  - Model composition bar: stacked horizontal segments per model share
+  - Per-model breakdown with mini progress bars, percentage, request counts
+  - Per-model token breakdown (in/out) when available
+  - Token formatting: 1.2M, 45.3K instead of raw numbers
+  - Duration formatting: 1.5h, 3.2m instead of raw seconds
+  - Error section with warning icon
+
+- **TCP_NODELAY streaming optimization**
+  - Disables Nagle's algorithm on streaming connections
+  - Reduces per-packet latency by up to 40ms on small SSE events
+  - Applied to all 4 streaming code paths (openai-compat, retry, command-code, generic)
+
+- **Anthropic prompt caching**
+  - System prompts are now sent as structured content blocks marked `cache_control: ephemeral`
+  - Enables Anthropic's automatic prompt caching (saves tokens + cost on repeated prompts)
+
+## v2.6.1 (2026-05-20)
+
+- **Google OAuth rebuilt to emulate Gemini CLI**
+  - Uses Google's public OAuth client_id (same as gemini-cli)
+  - No `client_secret.json` needed — zero setup required
+  - PKCE (S256 code challenge) + CSRF state protection
+  - Scopes: cloud-platform, generative-language, userinfo.email, userinfo.profile
+  - Redirects to Google's success/failure pages (same as gemini-cli)
+  - Just click "OAuth Login" → browser opens → authorize → done
+  - Token file permissions set to 0600 for security
+
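For reference, the PKCE S256 step listed above derives the challenge as the unpadded base64url encoding of the SHA-256 of the verifier (RFC 7636). A minimal sketch, with hypothetical function names:

```python
import base64
import hashlib
import secrets


def pkce_challenge(verifier):
    """S256 code challenge: base64url(sha256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Return (code_verifier, code_challenge) for one login attempt."""
    # 64 random bytes encode to 86 URL-safe characters, inside the
    # 43-128 character range the RFC allows for verifiers.
    verifier = secrets.token_urlsafe(64)
    return verifier, pkce_challenge(verifier)
```

The verifier stays on the client and is sent only in the token exchange; the challenge goes in the authorization URL.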
 ## v2.6.0 (2026-05-20)

 - **Usage Dashboard** — per-provider tracking with visual cards
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,32 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project
+
+Codex Launcher — Any AI Provider. Run OpenAI Codex CLI & Desktop with any AI provider.
+
+## Pre-Commit Checklist
+
+- [ ] Run unit tests: `python -m pytest tests/ -v` (all must pass)
+- [ ] Verify cross-platform: no `os.getpgid`, `/proc/`, `pgrep`, `SIGUSR*` without `sys.platform` guard
+- [ ] Check syntax: `python -c "import py_compile; py_compile.compile('src/translate-proxy.py', doraise=True)"`
+- [ ] No hardcoded Unix paths or Windows-only APIs without platform checks
+- [ ] No secrets or API keys in source code
+
+## Development Commands
+
+```bash
+# Run tests
+python -m pytest tests/ -v
+
+# Syntax check
+python -c "import py_compile; py_compile.compile('src/translate-proxy.py', doraise=True)"
+
+# Run proxy locally
+python src/translate-proxy.py --port 8080
+```
+
+## Agent Guidelines
+
+See @AGENTS.md for architecture details, platform compatibility rules, and coding conventions.
--- a/README.md
+++ b/README.md
@@ -9,13 +9,28 @@
  <a href="https://z.ai/subscribe?ic=ROK78RJKNW">z.ai/subscribe</a>
 </p>

+
+<p align="center">
 ---
+If you want to fork it, use the GitHub copy:
+<a href="https://github.com/roman-ryzenadvanced/Codex-Launcher-Any-AI-Provider">Codex-Any-AI-Provider on GitHub (Official)</a>
+---
+</p>
+
+

 <h1 align="center">Codex Launcher — Any AI Provider</h1>

 <p align="center">
  <strong>Run OpenAI Codex CLI &amp; Desktop with <em>any</em> AI provider.</strong><br/>
-  OpenCode &bull; Z.AI &bull; Anthropic &bull; Command Code &bull; OpenRouter &bull; Crof.ai &bull; NVIDIA NIM &bull; Kilo.ai &bull; and more
+  Google Antigravity &bull; Gemini CLI &bull; OpenCode &bull; Z.AI &bull; Anthropic &bull; Command Code &bull; Freebuff &bull; OpenRouter &bull; Crof.ai &bull; NVIDIA NIM &bull; OpenAdapter &bull; Kilo.ai &bull; DeepSeek &bull; and more
+</p>
+
+<p align="center">
+  <sub>
+    Windows version by <a href="https://github.com/cobra91">cobra91</a> &bull;
+    Original Linux development by <a href="https://github.com/roman-ryzenadvanced">roman-ryzenadvanced</a>
+  </sub>
 </p>

 <p align="center">
@@ -32,6 +47,10 @@
  <img src="https://img.shields.io/badge/Command_Code-✓-success" /> 
  <img src="https://img.shields.io/badge/Streaming_SSE-✓-success" />
  <img src="https://img.shields.io/badge/Tool_Calls-✓-success" />
+  <img src="https://img.shields.io/badge/AI_Assist-✓-success" />
+  <img src="https://img.shields.io/badge/Intelligence_Routing-✓-success" />
+  <img src="https://img.shields.io/badge/AI_Monitoring-✓-success" />
+  <img src="https://img.shields.io/badge/Self_Revive_Watchdog-✓-success" />
 </p>

 ---
@@ -43,14 +62,16 @@ OpenAI's Codex CLI v2.0+ exclusively uses the **Responses API** — a protocol t
 | Provider | API | Works with Codex? |
 |----------|-----|:-:|
 | OpenAI | Responses API | ✅ |
-| Z.AI | Chat Completions | ❌ |
-| OpenCode | Chat Completions | ❌ |
-| Anthropic | Messages API | ❌ |
-| Command Code | Custom `/alpha/generate` | ❌ |
-| Ollama | Chat Completions | ❌ |
-| OpenRouter | Chat Completions | ❌ |
-| NVIDIA NIM | Chat Completions | ❌ |
-| Crof.ai | Chat Completions | ❌ |
+| Google Antigravity (OAuth) | Code Assist / Gemini Native | ✅ |
+| Gemini CLI OAuth | Code Assist | ✅ |
+| Z.AI | Chat Completions | ✅ |
+| OpenCode | Chat Completions | ✅ |
+| Anthropic | Messages API | ✅ |
+| Command Code | Custom `/alpha/generate` | ✅ |
+| Ollama | Chat Completions | ✅ |
+| OpenRouter | Chat Completions | ✅ |
+| NVIDIA NIM | Chat Completions | ✅ |
+| Crof.ai | Chat Completions | ✅ |

 The protocols differ in **endpoint paths**, **message formats**, **tool-call structures**, **streaming events**, and **completion semantics**. You can't just swap a base URL.

@@ -65,23 +86,23 @@ A three-component system:
 ```
 ┌─────────────────────────────────────────────────────────────────────┐
 │                         Codex Launcher GUI                          │
-│                    (endpoint management + lifecycle)                │
+│              (endpoint management + AI Assist + lifecycle)          │
 └──────────┬─────────────────┬──────────────────┬────────────────────┘
           │                 │                  │
    ┌──────▼──────┐  ┌──────▼──────┐  ┌────────▼─────────┐
    │  Codex      │  │  Native     │  │  Translation     │
    │  Default    │  │  OpenAI     │  │  Proxy           │
-    │  (remove    │  │  (direct    │  │  (port 8080)     │
+    │  (remove    │  │  (direct    │  │  (auto-revive)   │
    │  config)    │  │  URL)       │  │                  │
    └──────┬──────┘  └──────┬──────┘  └────────┬─────────┘
           │                │                   │
           ▼                ▼          ┌────────┴────────┐
    ┌──────────────┐ ┌───────────┐    │                 │
    │ Built-in     │ │ config.   │    ▼                 ▼
-    │ Codex OAuth  │ │ toml      │ ┌────────────┐ ┌───────────┐
-    └──────────────┘ └───────────┘ │ OpenAI     │ │ Anthropic │
-                                   │ Chat Comp. │ │ Messages  │
-                                   └────────────┘ └───────────┘
+    │ Codex OAuth  │ │ toml      │ ┌────────────┐ ┌───────────┐ ┌──────────┐
+    └──────────────┘ └───────────┘ │ OpenAI     │ │ Anthropic │ │ Command  │
+                                   │ Chat Comp. │ │ Messages  │ │ Code     │
+                                   └────────────┘ └───────────┘ └──────────┘
 ```

 ---
@@ -103,20 +124,84 @@ A three-component system:
 - **Browser UA injection** — bypasses Cloudflare bot detection for providers like OpenCode
 - **Smart URL construction** — prevents double-path bugs (`/v1/chat/completions/chat/completions`)
 - **Header forwarding** — preserves client identity headers while filtering hop-by-hop headers
+- **Connection pooling** — persistent HTTPS connections per host, eliminates TLS handshake overhead per request
+- **Stream idle timeout** — kills stalled upstream connections after 5 minutes of silence
+- **Retry-After support** — respects upstream `Retry-After` headers on 429/502/503 responses
+- **Response store TTL** — evicts stored responses older than 10 minutes, prevents memory leaks
+- **Bounded stream buffers** — 8MB cap prevents OOM on pathological responses
+- **Dual logging** — all proxy messages written to both stderr and `~/.cache/codex-proxy/proxy.log`
+- **Vision model detection** (v3.11.5) — automatically strips images for non-vision models (DeepSeek, GLM, Qwen, etc.) and replaces with text notice; vision-capable models (GPT-4o, Gemini, Claude, Qwen-VL) keep images intact
+- **Token-aware compaction** (v3.11.5) — learns per-model token limits from `context_length_exceeded` errors; proactively compacts when estimated tokens exceed 80% of limit; prevents repeated context overflow on small-context models (~35K tokens)
+- **Universal adaptive compaction** (v3.11.5) — compaction now works for ALL providers (was Crof.ai-only); proactive + retry compaction with aggression levels (normal/extreme)
+- **Smart-continue text detection** (v3.11.5) — triggers continuation nudging when model outputs text matching tool-call patterns, essential for text-only models that never emit real `function_call_output` items
+- **Antigravity loop breakers** (v3.11.6) — per-session tracking with automatic finalization when same tool+args repeats 5+ times; edit-intent nudge injected only on first turn; latest user instruction appended exactly once per request
+- **has_content function_call fix** (v3.11.6) — tool-call-only responses now correctly flagged as having content, preventing infinite loops on OpenAdapter/Z.AI/OpenRouter providers
+- **Vision/OCR preprocessing** (v3.11.6) — when provider rejects images, automatically calls a configurable vision fallback API (Kilo.ai) to describe images as text for text-only models; MD5-cached; retries on vision errors with preprocessed text
+- **Auth config-missing fix** (v3.11.6) — graceful handling when Codex config.toml is missing instead of showing raw os error
+- **Codex Desktop Updater** (v10.13.6) — built-in updater window with Check/Install/Rollback buttons, service management, and manual rebuild from source (`ilysenko/codex-desktop-linux`)
+- **Codex CLI 0.134.0 profile system** (v10.13.6) — profiles written to separate `~/.codex/<slug>.config.toml` files for compatibility with Codex CLI 0.134.0+
+- **Anti-loop resilience** (v10.13.6) — cross-session loop tracker keyed by user request hash, tool-call budget (150 calls), file read-loop detection, auto 401 token refresh
+- **Conservative compaction for large models** (v10.13.6) — `max_input_items: 200` for Antigravity's 1M-token models; prevents model from "forgetting" earlier file reads
+- **Antigravity E2E test suite** (v10.13.6) — `bash test-antigravity.sh [--task]` validates token, REST endpoints, proxy adapter, model resolution; `--task` runs real CLI task with anomaly detection
+- **MSIX Desktop support** (v10.13.6) — Windows Store install detection, `shell:AppsFolder` launch, tasklist-based process monitoring (cobra91 PR #17)
 - Zero dependencies — pure Python stdlib

+### Command Code Adapter
+- **Multi-format tool-call parser** — handles all known CC model output formats in a cascading chain:
+  - DSML tags (`<｜｜DSML｜｜invoke>`) — current model format
+  - `<bash>...</bash>` blocks with metadata extraction
+  - `<explore_agent>` blocks converted to real `exec_command`
+  - `<tool_call type="bash">` HTML-like blocks
+  - XML `<function=` patterns
+  - Raw JSON `{"cmd":"..."}` embedded in text
+  - Fallback regex for unrecognized tool-call signals
+- **Three-tier argument parser** — handles double-wrapped, escaped, and unicode-escaped arguments
+- **Recursive unwrapping** — handles double/triple-wrapped `cmd` values
+- **Post-extraction sanitizer** — validates every tool call has valid name + args before forwarding to Codex
+- **ErrorAnalyzer** — learns from 4xx errors, retries with adjusted parameters (max 2 retries)
+- **Schema cache** with 24h staleness TTL for provider capabilities
+
+### Intelligence Routing (v3.7.0)
+- **Three-layer self-healing system** — the agent loop never stalls, even when the model speaks gibberish
+- **Layer 1 — Deep URL Extraction**: When `<explore_agent>` hides URLs inside nested JSON (`messages: [{"content": "https://..."}]`), the parser drills into the JSON structure to find them. Module-level `_build_explore_cmd()` is reused across parser + stream path.
+- **Layer 2 — Escalation Auto-Proceed**: `<require_escalation>` and `<request_escalation_permission>` blocks are detected and auto-resolved — the model doesn't get stuck waiting for permissions that don't exist.
+- **Layer 3 — Intent-Based Command Synthesis**: When ALL parsers fail, 5 heuristics analyze the model's plain-text output and synthesize a working command:
+  1. URL detected → `curl` it
+  2. File path mentioned → `cat` or `ls` it
+  3. Shell command in quotes → extract and run it
+  4. "explore"/"fetch" intent → use the last URL the user mentioned
+  5. "I need to"/"let me" intent → echo a diagnostic so the loop continues
+- **Session URL memory** — `_last_user_urls` deque (20 entries) tracks URLs from user messages across the session, giving the synthesizer context to work with
+- **54 self-test patterns** — comprehensive coverage of all three layers
+
+### AI Monitoring (v3.8.0)
+- **Self-healing watchdog** — the proxy auto-recovers from crashes, the model getting stuck, upstream failures, and more
+- **Three-tier response system**: Tier 1 = rule-based (< 1s), Tier 2 = pattern lookup (< 100ms), Tier 3 = AI diagnostic agent (2-5s)
+- **HealthWatcher thread** — pings proxy `/health` every 5 seconds, auto-restarts on crash
+- **LogAnalyzer thread** — tails debug logs for 18 failure signal patterns in real-time
+- **14 Tier 1 rules** — restart proxy, clear schema cache, kill stale processes, retry with backoff, rate limit handling
+- **Incident pattern store** — learns from every resolved incident, looks up known fixes by success rate
+- **AI diagnostic agent** — user-configurable provider/model (e.g., Gemini Flash, GPT-4o-mini, local Ollama) for diagnosing novel failures
+- **30 fault types** catalogued across 5 categories: proxy failures (A), upstream errors (B), parser failures (C), Codex process failures (D), config/state failures (E)
+- **Safety guards** — rate-limited AI calls, restart caps (5/10min), cooldown per pattern, monthly budget cap
+- **GUI panel** — ON/OFF toggle, provider/model/API key selector, health check interval, auto-restart toggle, incident log viewer
+- **Enhanced `/health`** — returns `uptime_s`, `memory_mb`, `requests_total` for monitoring
+
 ### GTK Launcher (`codex-launcher-gui`)
 - **Endpoint manager** — add, edit, delete, set default providers
- **Provider presets** — one-click setup for 10+ providers with pre-filled URLs and model lists
+- **Provider presets** — one-click setup for 15+ providers with pre-filled URLs and model lists
 - **Model auto-fetch** — pulls available models directly from provider APIs
 - **Bulk model import** — paste a comma/newline-separated list of model IDs
 - **Launch Desktop** — starts Codex Desktop with the selected provider and model
 - **Launch CLI** — opens Codex CLI in a terminal with the selected provider
 - **Codex Default** — launch with built-in OAuth, no proxy or custom config
+- **AI Assist** — integrated AI-powered configuration assistance and troubleshooting
+- **Usage Dashboard** — per-provider tracking with dark theme, KPI strip, model bars, status pills
 - **Profile backup/import** — export and import endpoint configurations as portable JSON bundles
 - **Threaded operations** — model refresh runs in background, UI stays responsive
 - **Process lifecycle** — stall detection, kill/cleanup, config backup/restore around sessions
 - **Config normalization** — automatically strips stale API path suffixes from URLs
+- **Reasoning controls** — per-provider reasoning toggle with effort level selection

 ### Process Management
 - Kills stale electron/webview/app-server processes from previous sessions
@@ -266,6 +351,153 @@ codex-launcher-gui
 2. On launch: backup config → **delete** `config.toml` entirely → start Codex → restore config after exit
 3. Key insight: writing empty strings (`model = ""`, `model_provider = ""`) causes Codex to error with "Model provider `` not found". The config must not exist at all for Codex to fall back to built-in defaults.

+### Phase 7: Command Code Multi-Format Parser — The 17-Fix Odyssey
+
+**Problem:** Command Code provider's tool calls were silently dropped, causing the Codex agent loop to stop after the first response. The CC model returns tool calls as inline text in wildly varying formats that change between sessions and model versions.
+
+**Root Cause Analysis:**
+1. CC's `/alpha/generate` API uses a proprietary protocol — not Chat Completions, not Anthropic Messages
+2. Tool calls appear as inline text within `text-delta` SSE events, not as structured JSON
+3. The model output format is **non-deterministic** — observed 6+ distinct formats:
+   - Raw JSON: `{"cmd":"mkdir -p /foo","type":"tool-call"}`
+   - XML: `<function name="exec_command"><parameter name="cmd">...</parameter></function>`
+   - HTML-like: `<tool_call type="bash">\n{"command":"..."}`
+   - Bash blocks: `<bash>\nprefix_rule: ...\n{"command":"..."}</bash>`
+   - Explore blocks: `<explore_agent>...</explore_agent>`
+   - DSML tags: `<｜｜DSML｜｜invoke name="exec"><｜｜DSML｜｜parameter name="command">...</parameter></invoke>`
+4. Additional complications: double-wrapped arguments, unescaped quotes, unicode escapes, missing fields
+
+**The Fix — 17 Incremental Patches:**
+Built a cascading parser chain (`DSML → bash → explore → tool_call → XML → raw JSON → fallback regex`) that tries each format in order. Each patch addressed a specific format observed in production:
+
+- **FIX 1–4**: Foundation — string-only content, version headers, cache clearing, streaming error handling
+- **FIX 5–8**: Core parsing — raw JSON extraction, three-tier argument parser, field extraction, permission normalization
+- **FIX 9–10**: Cleanup — removed dead code, added documentation
+- **FIX 11–11c**: Robustness — recursive unwrapping of nested cmd values, post-extraction sanitizer, XML regex fix
+- **FIX 12**: Self-revive watchdog — proxy auto-restarts on crash instead of dying silently
+- **FIX 13–17**: New format support — fallback extraction, HTML-like blocks, explore blocks, bash blocks, DSML tags
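
The cascade described above can be sketched as an ordered list of parser functions where the first non-empty result wins. This is only an illustration, not the proxy's code: it covers two of the observed formats, and the names `PARSER_CHAIN`, `parse_xml_function`, `parse_raw_json`, and `parse_tool_calls` are hypothetical, as is the choice to map the raw-JSON form onto `exec_command`.

```python
import json
import re

def parse_xml_function(text):
    """Parse <function name="..."><parameter name="...">...</parameter></function>."""
    m = re.search(r'<function name="([^"]+)">(.*?)</function>', text, re.DOTALL)
    if not m:
        return []
    params = dict(re.findall(
        r'<parameter name="([^"]+)">(.*?)</parameter>', m.group(2), re.DOTALL))
    return [{"name": m.group(1), "arguments": params}]

def parse_raw_json(text):
    """Parse {"cmd": "...", "type": "tool-call"} (assumed to mean exec_command)."""
    try:
        obj = json.loads(text.strip())
    except ValueError:
        return []
    if isinstance(obj, dict) and obj.get("type") == "tool-call":
        return [{"name": "exec_command", "arguments": {"cmd": obj.get("cmd", "")}}]
    return []

# Order matters: more specific formats are tried before looser ones.
PARSER_CHAIN = [("xml", parse_xml_function), ("raw_json", parse_raw_json)]

def parse_tool_calls(text):
    """Return (format_label, calls) from the first parser that yields a result."""
    for label, parser in PARSER_CHAIN:
        calls = parser(text)
        if calls:
            return label, calls
    return None, []
```

Adding support for a new format then amounts to writing one more function and inserting it at the right position in the chain.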
+
+**Key Design Decision:** Field-level regex extraction instead of JSON parsing. Standard JSON parsers fail on unescaped quotes in shell commands (e.g., `echo "hello world"` breaks JSON). The regex approach tolerates malformed JSON by extracting individual fields.
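
To make the trade-off concrete, here is a small sketch of field-level extraction, assuming a field's value ends at a quote that is followed by the next key or by the closing brace. The helper `extract_field` and the pattern `_FIELD_END` are hypothetical names for illustration; the proxy's actual regexes may differ.

```python
import json
import re

# A string field ends at a quote followed by the next key or the closing brace.
_FIELD_END = r'"\s*(?:,\s*"[A-Za-z_][\w-]*"\s*:|\}\s*$)'

def extract_field(text, name):
    """Pull one string field out of JSON-like text, tolerating unescaped quotes."""
    pattern = r'"%s"\s*:\s*"(.*?)%s' % (re.escape(name), _FIELD_END)
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None

# The inner quotes around "hello world" are not escaped, so this is invalid JSON.
malformed = '{"cmd":"echo "hello world"","type":"tool-call"}'

try:
    json.loads(malformed)
except ValueError as exc:
    print("json.loads failed:", exc)

print(extract_field(malformed, "cmd"))
print(extract_field(malformed, "type"))
```

The lazy match keeps growing until the closing quote is followed by something that looks like the next key, which is why the embedded quotes do not end the value early.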
+
+**Verification:** `--self-test` flag runs 19 automated tests covering all edge cases. Debug logging to `~/.cache/codex-proxy/cc-debug.log` captures every parser decision for troubleshooting.
+
+### Phase 8: Intelligence Routing — When the Model Refuses to Speak Machine
+
+**Problem:** The 17-fix parser chain from Phase 7 was powerful — it could handle DSML, XML, JSON, bash blocks, explore tags, you name it. But there was one edge case it couldn't crack: **when the model doesn't produce a parseable tool-call format at all**.
+
+In production, `deepseek/deepseek-v4-flash` via Command Code kept doing things like:
+
+```
+<explore_agent>
+messages: [{"content": "Understand the Z.AI-Chat-for-Android repo at https://..."}]
+</explore_agent>
+```
+
+or:
+
+```
+<require_escalation>
+I need elevated permissions to access the repository.
+</require_escalation>
+```
+
+or just plain English: *"I need to fetch the README from the repository to understand the app structure."*
+
+In every case, `parsed_tool_calls=0`. No tool to execute. The Codex agent loop ground to a halt. The user saw "thinking..." forever.
+
+**The insight:** The model is trying to communicate *intent*, just not in a format we can parse. Instead of adding more regex patterns, what if we could **read the model's mind** — understand what it *wants* to do, and synthesize the command for it?
+
+**Intelligence Routing — Three Layers of Escalation:**
+
+```
+Layer 1: "Fix the input"     — Can we extract more from what the model gave us?
+Layer 2: "Handle the intent" — Is the model asking for something we can auto-resolve?
+Layer 3: "Read the mind"     — What is the model trying to do? Just do it for it.
+```
+
+**Layer 1 — Deep URL Extraction (FIX 23):**
+
+The `<explore_agent>` handler had a URL regex, but the URL was trapped inside `{"content": "https://..."}` — the trailing `"` broke matching. The fix: after the initial regex fails, `json.loads()` the entire block, walk the JSON tree, and pull URLs out of `content` fields. The `_build_explore_cmd()` function was extracted to module level so both the parser and the stream handler could use it.
+
+```python
+# Before: regex fails, URL lost
+# After: json.loads -> iterate items -> extract content -> find URL
+```
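
A minimal sketch of the Layer 1 idea follows. It assumes the JSON payload starts at the first `[` or `{` in the block and, unlike the description above, scans every string in the tree rather than only `content` fields. The names `deep_extract_urls`, `walk_strings`, and `URL_RE` are hypothetical and do not come from `translate-proxy.py`.

```python
import json
import re

URL_RE = re.compile(r'https?://[^\s"\'<>]+')

def walk_strings(node):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from walk_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk_strings(item)

def deep_extract_urls(block):
    """Decode the JSON part of a block and collect URLs from its strings."""
    starts = [i for i in (block.find("["), block.find("{")) if i != -1]
    if not starts:
        return URL_RE.findall(block)  # no JSON at all: plain regex scan
    try:
        tree, _ = json.JSONDecoder().raw_decode(block[min(starts):])
    except ValueError:
        return URL_RE.findall(block)  # malformed JSON: plain regex scan
    urls = []
    for text in walk_strings(tree):
        urls.extend(URL_RE.findall(text))
    return urls
```

`raw_decode` is used instead of `json.loads` so that a prefix such as `messages:` or trailing text after the JSON value does not cause a failure.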
+
+**Layer 2 — Escalation Auto-Proceed (FIX 24):**
+
+`<require_escalation>` blocks are the model's way of saying "I need more permissions." The CC adapter doesn't have an escalation mechanism — these blocks were silently dropped. The fix: detect them (both closed `<tag>...</tag>` and bare `<tag />` forms), extract any URL inside them, and auto-proceed with an explore command or a diagnostic echo.
+
+```python
+# Model: <require_escalation>Please let me run curl</require_escalation>
+# Proxy: Okay, here's your curl command → exec_command synthesized
+```
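
The detection step might look roughly like the sketch below, which handles both the closed and the bare form of the tag. The function name `handle_escalation`, the returned dictionary shape, and the exact `curl` and `echo` commands are assumptions made for illustration.

```python
import re
import shlex

URL_RE = re.compile(r'https?://[^\s"\'<>]+')
ESCALATION_RE = re.compile(
    r'<require_escalation\s*/>|<require_escalation>(.*?)</require_escalation>',
    re.DOTALL,
)

def handle_escalation(text):
    """Turn an escalation block into a synthesized exec_command, or None."""
    m = ESCALATION_RE.search(text)
    if m is None:
        return None
    body = (m.group(1) or "").strip()  # group 1 is None for the bare form
    url = URL_RE.search(body)
    if url:
        cmd = "curl -fsSL " + shlex.quote(url.group(0))
    else:
        cmd = "echo " + shlex.quote("escalation requested: " + (body or "no details"))
    return {"name": "exec_command", "arguments": {"cmd": cmd}}
```

Quoting with `shlex.quote` matters here because the body is model-generated text that ends up on a shell command line.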
+
+**Layer 3 — Intent-Based Command Synthesis (FIX 25):**
+
+The crown jewel. When ALL parsers return empty — no DSML, no XML, no JSON, no fallback regex matches — the system doesn't give up. It analyzes the model's raw text through **5 heuristic lenses** in priority order:
+
+| Priority | Signal | Synthesized Command |
+|:--------:|--------|---------------------|
+| 1 | URL in text | `curl` to fetch it |
+| 2 | File path reference | `cat` or `ls` the file |
+| 3 | Shell command in backticks/quotes | Extract and run it |
+| 4 | "explore"/"fetch" + last user URL | Full explore command |
+| 5 | "I need to"/"let me" intent | Echo diagnostic |
+
+The system also maintains a **session URL memory** (`_last_user_urls`, a deque of the last 20 URLs from user messages) so heuristic 4 always has a URL to work with, even when the model's text doesn't contain one.
+
+```python
+# Model: "I should explore the repository to understand its structure."
+# Parser: empty (no parseable format)
+# Layer 3 heuristic 4: "explore" detected, pulling URL from session memory...
+# Result: exec_command with full curl pipeline
+```
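
The priority table and the session URL memory can be sketched together as follows. Everything here is a simplified assumption: the regexes, the chosen commands (`curl -fsSL`, `ls -la`), and the names `synthesize` and `last_user_urls` are illustrative stand-ins for the proxy's heuristics and its `_last_user_urls` deque.

```python
import re
import shlex
from collections import deque

URL_RE = re.compile(r'https?://[^\s"\'<>]+')
PATH_RE = re.compile(r'(?<![\w/])(?:~|\.{1,2})?/[\w.\-/]+')
BACKTICK_RE = re.compile(r'`([^`\n]+)`')

# Session memory: the most recent URLs seen in user messages (capped at 20).
last_user_urls = deque(maxlen=20)

def synthesize(text):
    """Apply the five heuristics in priority order; return a command or None."""
    url = URL_RE.search(text)
    if url:                                            # 1. URL in text
        return "curl -fsSL " + shlex.quote(url.group(0))
    path = PATH_RE.search(text)
    if path:                                           # 2. file path reference
        return "ls -la " + shlex.quote(path.group(0))
    tick = BACKTICK_RE.search(text)
    if tick:                                           # 3. command in backticks
        return tick.group(1)
    lowered = text.lower()
    if re.search(r'\b(explore|fetch)\b', lowered) and last_user_urls:
        return "curl -fsSL " + shlex.quote(last_user_urls[-1])  # 4. remembered URL
    if "i need to" in lowered or "let me" in lowered:  # 5. bare intent
        return "echo " + shlex.quote("intent detected, no command synthesized")
    return None
```

Note that in this sketch a text matching none of the five signals still yields `None`, so the caller needs a final fallback of its own.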
+
+**The result:** Before Intelligence Routing, `parsed_tool_calls=0` meant **game over** — the agent loop stalled permanently. After Intelligence Routing, `parsed_tool_calls=0` triggers the self-healing chain and the loop **always** gets a tool call to execute. The model can speak in tongues and the system still works.
+
+**Test coverage:** 54 self-test patterns (up from 41), with 13 new tests specifically for Intelligence Routing layers.
+
+### Phase 9: AI Monitoring — The Watchman That Never Sleeps
+
+**Problem:** Intelligence Routing (Phase 8) handles failures *inside a single request*. But it can't detect a dead proxy process, reconnect Codex to a restarted proxy, switch to a backup provider when the primary is down, or clear corrupt caches. When the proxy crashes at 3 AM, the user wakes up to a broken Codex session and has to manually restart everything.
+
+**The insight:** We needed a separate watchdog process that runs *outside* the proxy — monitoring it from the outside, like a night watchman patrolling a building. But a dumb watchdog that just restarts on crash is crude. What if the watchdog could *think* — diagnose *why* the proxy crashed and take the right corrective action?
+
+**The Three-Tier Response System:**
+
+```
+Failure Detected
+      │
+      ├── Tier 1: Known pattern? → Rule-based fix (< 1 second)
+      │             "proxy dead" → restart_proxy
+      │             "429 rate limit" → wait_retry_after
+      │             "schema corrupt" → delete_provider_caps
+      │
+      ├── Tier 2: Seen this before? → Incident store lookup (< 100ms)
+      │             85% success rate → reuse the fix that worked last time
+      │
+      └── Tier 3: Novel failure? → AI diagnostic agent (2-5 seconds)
+                    Feed context to cheap LLM → get recommended action
+                    Learn from result for next time
+```
+
+**What makes this different from existing solutions:**
+
+Existing proxy tools (ccLoad, cc-proxy, codex-pool) all focus on routing and failover at the *request* level. None have an AI-powered diagnostic agent that analyzes failure context and recommends corrective actions. ccLoad has health checks and cooldowns, but it's purely rule-based. AI Monitoring adds the *intelligence* layer on top — the Tier 3 agent can diagnose novel failures that no rule covers.
+
+**How it works:**
+
+Two threads run in the GUI process:
+1. `HealthWatcher` — pings `/health` every 5 seconds. On 3 consecutive failures, triggers Tier 1 `restart_proxy`.
+2. `LogAnalyzer` — tails the debug log file, watching for 18 signal patterns. Counts consecutive failures per category. When a threshold is hit (e.g., 5x stuck recovery, 3x server error), triggers the appropriate tier.
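
As an illustration of the first thread, the sketch below counts consecutive failed probes and fires a callback at the threshold. The 5-second interval and the threshold of 3 come from the description above; the constructor signature, the `check_once` method, and the `http_probe` helper are assumptions rather than the launcher's real API.

```python
import threading
import urllib.request

def http_probe(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200

class HealthWatcher(threading.Thread):
    def __init__(self, probe, on_dead, interval=5.0, threshold=3):
        super().__init__(daemon=True)
        self.probe = probe          # callable returning True when healthy
        self.on_dead = on_dead      # callback, e.g. a restart function
        self.interval = interval
        self.threshold = threshold
        self.failures = 0
        self._stop_event = threading.Event()

    def check_once(self):
        """Run one probe; trigger on_dead after `threshold` failures in a row."""
        try:
            healthy = bool(self.probe())
        except Exception:
            healthy = False         # connection errors count as failures
        self.failures = 0 if healthy else self.failures + 1
        if self.failures >= self.threshold:
            self.failures = 0
            self.on_dead()

    def run(self):
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop_event.wait(self.interval):
            self.check_once()

    def stop(self):
        self._stop_event.set()
```

A caller would pass something like `lambda: http_probe("http://127.0.0.1:PORT/health")` as the probe, with `PORT` replaced by the proxy's port, and then call `start()`.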
+
+The AI diagnostic agent (Tier 3) is fully configurable — the user picks any provider and model. A cheap model like Gemini Flash (~$0.0002/call) or a free local Ollama instance works perfectly. The agent receives a structured incident report (proxy health, upstream status, recent errors, parser state) and responds with one JSON action.
+
+**Learning over time:** Every resolved incident is stored in `incident-store.json` with pattern → fix → success rate. Over time, the system shifts from Tier 3 (expensive AI calls) to Tier 2 (instant pattern lookup). A failure seen 10 times with 90% success rate will never reach the AI again.
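
The tier selection and the learning loop can be sketched as below. The three rules and the 85% threshold are taken from the diagram above; the in-memory dictionary standing in for `incident-store.json`, the `min_seen` guard, and the function names `resolve` and `record` are assumptions for illustration.

```python
RULES = {
    "proxy dead": "restart_proxy",
    "429 rate limit": "wait_retry_after",
    "schema corrupt": "delete_provider_caps",
}

def resolve(signal, store, ask_ai, min_rate=0.85, min_seen=3):
    """Return (tier, action) for a failure signal."""
    if signal in RULES:                      # Tier 1: known pattern
        return 1, RULES[signal]
    rec = store.get(signal)
    if rec and rec["seen"] >= min_seen and rec["ok"] / rec["seen"] >= min_rate:
        return 2, rec["fix"]                 # Tier 2: a fix that worked before
    return 3, ask_ai(signal)                 # Tier 3: ask the diagnostic agent

def record(store, signal, fix, success):
    """Update the success statistics for a fix after it has been tried."""
    rec = store.setdefault(signal, {"fix": fix, "seen": 0, "ok": 0})
    if rec["fix"] != fix:                    # a different fix resets the stats
        rec.update(fix=fix, seen=0, ok=0)
    rec["seen"] += 1
    rec["ok"] += int(success)
```

With this shape, a fix that starts failing drops below the threshold again and the signal falls back to Tier 3.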
+
+We catalogued **30 fault types** across 5 categories, based on an analysis of 42 production `parsed_tool_calls=0` events, 13 stuck recoveries, and 11 sanitizer flags from our actual debug logs, so the signal patterns are grounded in failures we have actually observed.
+
 ---

 ## Architecture Deep Dive
@@ -332,6 +564,78 @@ The launcher generates model catalog JSON with dual field naming to satisfy both

 ---

+## Gemini Antigravity State Continuity
+
+Codex Launcher includes special handling for Gemini 3 / Antigravity OAuth:
+
+- **Sticky endpoint with parallel discovery**: The first request probes `cloudcode-pa.googleapis.com` and `daily-cloudcode-pa.googleapis.com` simultaneously; the first endpoint to return 200 wins and is cached. All subsequent requests go straight to the cached endpoint. If it fails (429/502/503), the cache is cleared and all endpoints are re-probed in parallel, so no time is wasted retrying a rate-limited endpoint.
+- **Thought signature preservation**: Captures `thoughtSignature` from Gemini responses
+  and reattaches them on follow-up requests to maintain tool-call continuity.
+- **Edit-intent detection**: When follow-up requests contain edit keywords, a tool-use
+  nudge is injected to prevent text-only responses.
+- **User instruction enforcement**: The latest user message is guaranteed to be the
+  final content turn sent to Gemini, even after compaction.
+- **Smart compaction**: Old tool outputs are capped at 3000 chars each; the 6 most recent are capped at 20000 chars each.
+- **Context compaction**: Aggressive auto-trimming when approaching 80% of model context
+  limit (1M tokens Gemini, 200K Claude, 128K GPT-OSS). Prevents token limit errors.
+- **Model ID mapping**: Display names (e.g. `Gemini 3.5 Flash (High)`) mapped to REST API
+  slugs (e.g. `gemini-3-flash`). See `docs/ANTIGRAVITY.md` for details.
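
The sticky-endpoint behaviour from the first bullet can be sketched with a thread pool. This is an assumption-laden illustration: the class name `StickyEndpoint`, its `get`/`report` methods, and the injected `probe` callable (which returns an HTTP status code) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

class StickyEndpoint:
    RETRYABLE = (429, 502, 503)

    def __init__(self, endpoints, probe):
        self.endpoints = list(endpoints)
        self.probe = probe      # callable: endpoint -> HTTP status code
        self.cached = None

    def get(self):
        """Return the cached endpoint, probing all in parallel if none is cached."""
        if self.cached is not None:
            return self.cached
        with ThreadPoolExecutor(max_workers=len(self.endpoints)) as pool:
            futures = {pool.submit(self.probe, ep): ep for ep in self.endpoints}
            for fut in as_completed(futures):
                try:
                    status = fut.result()
                except Exception:
                    continue    # a failed probe simply loses the race
                if status == 200:
                    self.cached = futures[fut]
                    break       # first 200 wins
        return self.cached

    def report(self, status):
        """Clear the cache when the sticky endpoint starts failing."""
        if status in self.RETRYABLE:
            self.cached = None
```

One simplification to be aware of: leaving the `with` block waits for the slower probes to finish, so a production version would likely want short probe timeouts.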
+
+### OAuth Secrets
+
+Google OAuth credentials are stored locally in `~/.config/codex-launcher/oauth-secrets.json`
+and never committed to the repository. Use the **OAuth Secrets** button in the launcher
+header to edit or import `client_secret_*.json` files from Google Cloud Console.
+
+---
+
+## Multi-Account Rotation
+
+Codex Launcher supports **multiple accounts per provider** with automatic rotation
+when one account is rate-limited.
+
+### Codebuff (Multiple Accounts)
+
+Register additional free accounts at [codebuff.com](https://www.codebuff.com), then
+add them to `~/.config/manicode/credentials.json`:
+
+```json
+{
+  "default": { "authToken": "token-primary", "email": "you+1@gmail.com" },
+  "accounts": [
+    { "authToken": "token-secondary", "email": "you+2@gmail.com" },
+    { "authToken": "token-tertiary", "email": "you+3@gmail.com" }
+  ]
+}
+```
+
+Each account gets 5 free requests/day, so 3 accounts give **15 requests/day**.
+
+### Google OAuth (Multiple Projects)
+
+Add additional Google Cloud token files:
+
+```
+~/.cache/codex-proxy/google-antigravity-oauth-token.json     # primary
+~/.cache/codex-proxy/google-antigravity-oauth-token-1.json   # extra project 1
+~/.cache/codex-proxy/google-antigravity-oauth-token-2.json   # extra project 2
+```
+
+### API Keys (Comma-Separated)
+
+For any OpenAI-compatible provider:
+```json
+{ "api_key": "sk-key1,sk-key2,sk-key3" }
+```
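
One way such a comma-separated key list could be rotated is shown below: round-robin selection that skips keys currently cooling down after a rate limit. The `KeyPool` class, its method names, and the 60-second default cooldown are assumptions for illustration, not the proxy's implementation.

```python
import time

class KeyPool:
    def __init__(self, raw, clock=time.monotonic):
        # Split "sk-key1,sk-key2" into individual keys, ignoring blanks.
        self.keys = [k.strip() for k in raw.split(",") if k.strip()]
        self.cooldown_until = {k: 0.0 for k in self.keys}
        self.clock = clock
        self._next = 0

    def acquire(self):
        """Return the next key that is not cooling down, or None if all are."""
        now = self.clock()
        for offset in range(len(self.keys)):
            idx = (self._next + offset) % len(self.keys)
            key = self.keys[idx]
            if self.cooldown_until[key] <= now:
                self._next = (idx + 1) % len(self.keys)
                return key
        return None

    def mark_rate_limited(self, key, retry_after=60.0):
        """Park a key until its rate limit is expected to have reset."""
        self.cooldown_until[key] = self.clock() + retry_after
```

Injecting the clock keeps the cooldown logic testable without sleeping.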
+
+### Account Status Endpoint
+
+```bash
+curl http://127.0.0.1:PORT/v1/accounts
+```
+
+---
+
 ## Provider Presets

 | Preset | Backend | Base URL |
@@ -341,13 +645,28 @@ The launcher generates model catalog JSON with dual field naming to satisfy both
 | OpenCode Zen | OpenAI-compat | `https://opencode.ai/zen/v1` |
 | OpenCode Go | OpenAI-compat | `https://opencode.ai/zen/go/v1` |
 | Command Code | Command Code | `https://api.commandcode.ai` |
+| **Codebuff / Freebuff** | **Codebuff** | `https://www.codebuff.com` *(free DeepSeek/Kimi — OAuth login built-in)* |
 | Crof.ai | OpenAI-compat | `https://crof.ai/v1` |
+| OpenAdapter | OpenAI-compat | `https://api.openadapter.in/v1` |
+| Z.ai Coding | OpenAI-compat | `https://api.z.ai/api/coding/paas/v4` |
 | NVIDIA NIM | OpenAI-compat | `https://integrate.api.nvidia.com/v1` |
 | Kilo.ai | OpenAI-compat | `https://api.kilo.ai/api/gateway` |
 | OpenRouter | OpenAI-compat | `https://openrouter.ai/api/v1` |
 | Z.AI | OpenAI-compat | `https://api.z.ai/api/coding/paas/v4` |
+| Google Gemini (API Key) | OpenAI-compat | `https://generativelanguage.googleapis.com/v1beta/openai` |
+| Google Gemini (OAuth) | Gemini OAuth | `cloudcode-pa.googleapis.com` |
+| Google Antigravity (OAuth) | Antigravity OAuth | `daily-cloudcode-pa.sandbox.googleapis.com` |
 | Custom | Any | User-defined |

+### Free Models (via Codebuff/Freebuff)
+Codebuff/Freebuff provides free access to these models — no API key needed:
+- **DeepSeek V4 Pro** — Smartest model
+- **DeepSeek V4 Flash** — Most efficient
+- **Kimi K2.6** — Balanced
+- **MiniMax M2.7** — Fastest
+
+*Requires: `freebuff login` via GUI OAuth button, or `npm install -g freebuff && freebuff login` (GitHub OAuth)*
+
 ---

 ## File Structure
@@ -366,17 +685,63 @@ README.md                         # This file
 ### Installed Locations

 ```
-~/.local/bin/translate-proxy.py       # Proxy
-~/.local/bin/codex-launcher-gui       # Launcher
-~/.local/bin/cleanup-codex-stale.sh   # Cleanup
-~/.local/share/applications/codex-launcher.desktop  # App grid entry
-~/.codex/endpoints.json               # Endpoint storage
-~/.codex/config.toml                  # Codex config (auto-generated)
-~/.cache/codex-proxy/                 # Proxy configs + model catalogs
+/usr/bin/translate-proxy.py               # Proxy (from .deb)
+/usr/bin/codex-launcher-gui               # Launcher (from .deb)
+/usr/bin/cleanup-codex-stale.sh           # Cleanup (from .deb)
+/usr/share/applications/codex-launcher.desktop  # App grid entry
+~/.codex/endpoints.json                   # Endpoint storage
+~/.codex/config.toml                      # Codex config (auto-generated)
+~/.cache/codex-proxy/                     # Proxy configs + model catalogs
+~/.cache/codex-proxy/cc-debug.log         # Debug log (per-request)
 ```

 ---

+### Phase 10: Codebuff Integration — Free AI for Everyone (v3.8.1)
+
+**Problem:** Users want access to powerful models like DeepSeek V4 Pro without paying API fees. Codebuff (by CodebuffAI) offers free access to premium models through their server, but it's a CLI tool — not an API you can plug into Codex Launcher.
+
+**The insight:** Codebuff's backend is a Next.js app with an OpenAI-compatible `/api/v1/chat/completions` endpoint. It uses agent-run lifecycle management and model-specific routing. If we replicate the agent run protocol in our proxy, we can tap into Codebuff's free tier.
+
+**How Codebuff works internally:**
+1. User logs in via GitHub OAuth → session token stored in `~/.config/manicode/credentials.json`
+2. Each request creates an **agent run** via `POST /api/v1/agent-runs`
+3. Chat completions sent with `codebuff_metadata: {run_id, cost_mode: "free"}`
+4. Server routes to the correct upstream provider using its own API keys
+5. The agent run is marked finished when the request completes
+
+**What we built:**
+
+```
+Codex Request
+     │
+     ▼
+┌─────────────────────────────────────┐
+│  translate-proxy.py                 │
+│  _handle_codebuff()                 │
+│                                     │
+│  1. Read token from credentials     │
+│  2. POST /api/v1/agent-runs         │──→  {action: "START", agentId}
+│  3. POST /api/v1/chat/completions   │──→  {model, messages,
+│                                     │      codebuff_metadata: {
+│                                     │        run_id, cost_mode: "free"}}
+│  4. Stream response back to Codex   │←──  SSE events
+│  5. POST /api/v1/agent-runs         │──→  {action: "FINISH"}
+└─────────────────────────────────────┘
+```
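
The lifecycle in the diagram can be sketched with an injected `post(path, payload)` callable so that no real network access is implied. The request paths, `action` values, and `codebuff_metadata` keys come from the description above; the `runId` field name in the START response and in the FINISH payload, and the function name itself, are assumptions.

```python
def run_codebuff_request(post, agent_id, model, messages):
    """START an agent run, send one chat completion, then always FINISH the run."""
    started = post("/api/v1/agent-runs", {"action": "START", "agentId": agent_id})
    run_id = started["runId"]  # assumed response field name
    try:
        return post("/api/v1/chat/completions", {
            "model": model,
            "messages": messages,
            "codebuff_metadata": {"run_id": run_id, "cost_mode": "free"},
        })
    finally:
        # Close the run even if the completion call raised.
        post("/api/v1/agent-runs", {"action": "FINISH", "runId": run_id})
```

Putting the FINISH call in `finally` reflects the point that a run should not be left open when streaming fails partway through.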
+
+**Free models available:**
+
+| Model | Agent ID | Notes |
+|-------|----------|-------|
+| DeepSeek V4 Pro | `base2-free-deepseek` | Smartest |
+| DeepSeek V4 Flash | `base2-free-deepseek-flash` | Most efficient |
+| Kimi K2.6 | `base2-free-kimi` | Balanced |
+| MiniMax M2.7 | `base2-free` | Fastest |
+
+**Bonus fix:** While investigating this, we discovered that `endpoints.json` had been overwritten with only 4 AG X entries, losing all 17+ provider presets. Restored all presets from proxy cache files.
+
+---
+
 ## Troubleshooting

 | Issue | Cause | Fix |
@@ -391,6 +756,17 @@ README.md                         # This file
 | Models not showing in picker | Wrong model catalog format | Must have both `slug` + `model` fields |
 | Codex hangs in "thinking" | Missing `response.completed` | Proxy emits full SSE event sequence |
 | Stops after first tool call (Crof) | `previous_response_id` not resolved | V2.1.2 stores and chains responses for multi-turn |
+| CC agent stops after first response | Tool calls not parsed from model text | V3.5 multi-format parser handles all CC output formats |
+| CC tool calls have wrong args | Double-wrapped arguments | V3.5 three-tier parser + recursive unwrapping |
+| Proxy crashes mid-session | Unhandled streaming error | V3.5 self-revive watchdog auto-restarts |
+| CC 403 upgrade_required | Missing version header | V3.5 always sends `x-command-code-version` |
+| CC explore_agent can't find URL | URL hidden inside JSON messages | V3.7 Layer 1 drills into JSON to extract URLs |
+| CC agent stalls on escalation blocks | `<require_escalation>` not handled | V3.7 Layer 2 auto-proceeds past escalation requests |
+| CC agent stalls — no tool calls at all | Model output format unrecognized | V3.7 Layer 3 synthesizes command from text intent |
+| Proxy crashes mid-session | Unhandled streaming error | V3.8 AI Monitor auto-restarts proxy |
+| Proxy port conflict on restart | Stale process holding port | V3.8 AI Monitor kills stale + restarts |
+| Schema cache corruption | ErrorAnalyzer learned wrong schema | V3.8 AI Monitor auto-clears provider-caps.json |
+| Upstream 500 repeatedly | Provider having issues | V3.8 AI Monitor detects pattern + alerts/switches |

 ---

@@ -426,15 +802,70 @@ codex --profile my-profile -c model=my-model

 ---

+## Windows Version
+
+A native **Windows GUI** (tkinter) is available in the `src/` folder alongside the Linux version. Both GUIs have **full feature parity**.
+
+<p align="center">
+  <sub>
+    Windows version by <a href="https://github.com/cobra91">cobra91</a> &bull;
+    Original Linux development by <a href="https://github.com/roman-ryzenadvanced">roman-ryzenadvanced</a>
+  </sub>
+</p>
+
+### Files
+
+| File | Purpose |
+|---|---|
+| `src/codex-launcher-gui.py` | tkinter GUI (Windows) — manage endpoints, launch Codex CLI/Desktop |
+| `src/codex-launcher-gui` | GTK GUI (Linux) — same features, native GTK look |
+| `src/codex_launcher_lib.py` | Shared library — proxy lifecycle, config, OAuth, diagnostics |
+| `src/translate-proxy.py` | Proxy — translates Responses API for any provider |
+
+### How to Run (Windows)
+
+Python ≥ 3.8 with tkinter is required (tkinter ships with the official Python installer).
+
+```powershell
+# From repo root
+cd src
+python codex-launcher-gui.py
+```
+
+The GUI will:
+1. Auto-create default endpoints on first run
+2. Show a toolbar with Endpoints, OAuth Secrets, AI Monitor, and more
+3. Launch Codex CLI/Desktop with your chosen provider
+
+### OAuth Credentials
+
+Google OAuth (Antigravity / Gemini CLI) requires a `client_secret_*.json` from [Google Cloud Console](https://console.cloud.google.com/apis/credentials). Use the **OAuth Secrets** button in the GUI to import it — credentials are stored locally in `~/.config/codex-launcher/oauth-secrets.json`, never in the repo.
+
+The **OAuth Secrets** dialog shows all providers (Google + Freebuff/Codebuff) with **Re-OAuth buttons** to instantly re-authenticate any provider.
+
+### Feature Parity
+
+Both Linux (GTK) and Windows (tkinter) GUIs have identical features:
+- All provider presets, endpoint management, BGP routing
+- OAuth Secrets with all providers + Re-OAuth buttons
+- AI Monitor, Usage Dashboard, Request History, Benchmark
+- Clear Log, Restart Proxy, View Log
+- Doctor, Diagnostic Agent, Profile Backup/Import
+- Antigravity model mapping, context compaction (80% budget)
+- Multi-account rotation, rate limit handling
+
+---
+
 ## Requirements

 - Python ≥ 3.8
-- python3-gi (`sudo apt install python3-gi`)
+- python3-gi (`sudo apt install python3-gi`) — Linux only
+- tkinter (`python3-tk`) — Windows / Linux GUI
 - Codex CLI ≥ 2.0
 - Codex Desktop (optional, for Desktop mode)
-- bash, curl, lsof
+- bash, curl, lsof — Linux only

-**No pip dependencies.** Zero. Pure stdlib + system GTK.
+**No pip dependencies.** Zero. Pure stdlib.

 ---

--- a/6402
+++ b/6402
--- a/codex-launcher-gui.py
+++ b/codex-launcher-gui.py
--- a/codex-launcher_10.13.6_all.deb
+++ b/codex-launcher_10.13.6_all.deb
--- a/codex-launcher_10.13.8_all.deb
+++ b/codex-launcher_10.13.8_all.deb
--- a/codex-launcher_2.6.0_all.deb
+++ b/codex-launcher_2.6.0_all.deb
--- a/codex-launcher_3.10.10_all.deb
+++ b/codex-launcher_3.10.10_all.deb
--- a/codex-launcher_3.10.11_all.deb
+++ b/codex-launcher_3.10.11_all.deb
--- a/codex-launcher_3.10.12_all.deb
+++ b/codex-launcher_3.10.12_all.deb
--- a/codex-launcher_3.10.9_all.deb
+++ b/codex-launcher_3.10.9_all.deb
--- a/codex-launcher_3.12.1_all.deb
+++ b/codex-launcher_3.12.1_all.deb
--- a/codex-launcher_3.13.0_all.deb
+++ b/codex-launcher_3.13.0_all.deb
--- a/codex-launcher_3.13.5_all.deb
+++ b/codex-launcher_3.13.5_all.deb
--- a/codex_launcher_lib.py
+++ b/codex_launcher_lib.py
--- a/docs/ANTIGRAVITY.md
+++ b/docs/ANTIGRAVITY.md
@@ -0,0 +1,335 @@
+# Antigravity (Google CloudCode) — Technical Reference
+
+Everything needed to understand, maintain, and debug the Antigravity OAuth provider integration in Codex Launcher.
+
+---
+
+## 1. What Is Antigravity?
+
+Antigravity is Google's internal codename for **Google CloudCode** — a cloud-based AI coding agent powered by Gemini and other models. The CLI tool (`agy`) is a native Go binary that uses gRPC to communicate with Google's CloudCode backend.
+
+- **Official CLI binary**: `~/.local/bin/agy-core` (ELF x86-64 Go binary, ~183MB)
+- **Wrapper script**: `~/.local/bin/agy` (Python, manages provider switching)
+- **CLI settings**: `~/.gemini/antigravity-cli/settings.json`
+- **Provider state**: `~/.gemini/antigravity-cli/agy_provider.json`
+
+---
+
+## 2. Two API Protocols — REST vs gRPC
+
+### 2.1 What the agy CLI uses (gRPC)
+
+The native `agy-core` binary uses **gRPC** to communicate with the CloudCode backend:
+
+- **Service**: `google.internal.cloud.code.v1internal.PredictionService`
+- **Methods**:
+  - `GenerateContent` — main inference
+  - `FetchAvailableModels` — list available models
+  - `CountTokens` — token counting
+  - `RetrieveUserQuota` — quota check
+- **Other services**: `CloudCode`, `JetskiService` (settings, plugins, etc.)
+- **Proto files**: `google/internal/cloud/code/v1internal/prediction_service.proto`, `cloudcode.proto`
+- **Model IDs in gRPC**: Display names like `"Gemini 3.5 Flash (High)"` — verified from `settings.json`
+
+### 2.2 What our proxy uses (REST)
+
+Our Codex Launcher proxy does NOT use gRPC. It uses the **REST API** that the CloudCode backend also exposes:
+
+- **Endpoint path**: `v1internal:generateContent` (non-streaming) / `v1internal:streamGenerateContent?alt=sse` (streaming SSE)
+- **This is NOT the standard Gemini REST API** — it's the CloudCode-internal REST gateway
+- **Model IDs in REST**: Slug-style IDs like `gemini-3-flash` — NOT display names
+- **The REST API is more limited** — fewer model variants available than gRPC
+
+### 2.3 Why not gRPC?
+
+The agy binary uses gRPC with protobuf serialization. Using gRPC from the proxy would require:
+- Maintaining proto definitions (compiled from the binary)
+- More complex streaming
+- The `grpcio` Python library (not installed by default)
+
+The REST API works well enough for our use case.
+
+---
+
+## 3. Endpoints
+
+The proxy tries these endpoints in order for Antigravity:
+
+```
+1. https://daily-cloudcode-pa.sandbox.googleapis.com  (primary)
+2. https://autopush-cloudcode-pa.sandbox.googleapis.com (fallback)
+3. https://cloudcode-pa.googleapis.com                  (production fallback)
+```
+
+For regular Gemini CLI OAuth, only `cloudcode-pa.googleapis.com` is used.
+
+---
+
+## 4. Authentication
+
+### 4.1 OAuth Flow
+
+- **Client IDs**: Stored locally in `~/.config/codex-launcher/oauth-secrets.json` (not in repo)
+- **OAuth callback**: `https://antigravity.google/oauth-callback`
+- **Token storage**: `~/.cache/codex-proxy/google-antigravity-oauth-token.json`
+- **Token refresh**: via `https://oauth2.googleapis.com/token`
+- **Scopes**: `email profile openid cloud-platform cclog experimentsandconfigs userinfo.email userinfo.profile`
+- **Note**: The token does NOT have `auth/aicode` scope — it uses `cloud-platform` instead
+
+### 4.2 Multi-Account Support
+
+- `GoogleAccountPool("antigravity")` manages multiple Google accounts
+- Token files: `google-antigravity-oauth-token.json`, `google-antigravity-oauth-token-2.json`, etc.
+- Round-robin rotation across accounts
+
+---
+
+## 5. Request Format
+
+### 5.1 REST Request Wrapper
+
+The proxy wraps the Gemini-format request body in an outer envelope:
+
+```json
+{
+  "project": "<gcp-project-id>",
+  "model": "<rest-model-id>",
+  "requestType": "agent",
+  "userAgent": "antigravity",
+  "requestId": "agent-<uuid>",
+  "request": {
+    "contents": [...],
+    "systemInstruction": {...},
+    "generationConfig": {...},
+    "tools": [...]
+  }
+}
+```
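
As a sketch, the envelope above could be built by a small helper like this one. The field names and fixed values mirror the JSON shown; the function name `wrap_request` and the use of `uuid.uuid4()` for the request ID are assumptions.

```python
import uuid

def wrap_request(project, model, request):
    """Wrap a Gemini-format request body in the CloudCode REST envelope."""
    return {
        "project": project,
        "model": model,              # must be a REST slug, not a display name
        "requestType": "agent",
        "userAgent": "antigravity",
        "requestId": "agent-" + str(uuid.uuid4()),
        "request": request,
    }

body = wrap_request(
    "my-gcp-project",                # placeholder project ID
    "gemini-3-flash",
    {"contents": [{"role": "user", "parts": [{"text": "say hi"}]}]},
)
```

The inner `request` value is passed through untouched, so whatever builds the Gemini-format body stays independent of the envelope.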
+
+### 5.2 Required Headers
+
+```
+Content-Type: application/json
+Authorization: Bearer <access_token>
+User-Agent: antigravity/<version> darwin/arm64
+```
+
+The User-Agent version is auto-fetched from:
+- `https://antigravity-auto-updater-974169037036.us-central1.run.app`
+- Fallback: `https://antigravity.google/changelog`
+- Cached in `~/.cache/codex-proxy/antigravity-version.json`
+- Default: `1.18.3`
+
+---
+
+## 6. Model ID Mapping (CRITICAL)
+
+### 6.1 The Problem
+
+The agy CLI shows models with display names:
+- `Gemini 3.5 Flash (High)`
+- `Claude Sonnet 4.6 (Thinking)`
+
+But the **REST API only accepts slug IDs**:
+- `gemini-3-flash`
+- `claude-sonnet-4-6`
+
+Sending display names to the REST API returns **HTTP 404 "Requested entity was not found"**.
+
+### 6.2 Verified Working Model IDs
+
+All tested with live API calls to `daily-cloudcode-pa.sandbox.googleapis.com/v1internal:generateContent` on 2026-05-25:
+
+| Display Name (agy CLI / GUI) | REST API Model ID | Status |
+|---|---|---|
+| Gemini 3.5 Flash (High) | `gemini-3-flash` | OK |
+| Gemini 3.5 Flash (Medium) | `gemini-3-flash` | OK |
+| Gemini 3.5 Flash (Low) | `gemini-3.5-flash-low` | OK |
+| Gemini 3.1 Pro (High) | `gemini-3.1-pro-low` | OK (only low tier works via REST) |
+| Gemini 3.1 Pro (Low) | `gemini-3.1-pro-low` | OK |
+| Claude Sonnet 4.6 (Thinking) | `claude-sonnet-4-6` | OK |
+| Claude Opus 4.6 (Thinking) | `claude-opus-4-6-thinking` | OK |
+| GPT-OSS 120B (Medium) | `gpt-oss-120b-medium` | OK |
+| Gemini 2.5 Flash | `gemini-2.5-flash` | OK |
+| Gemini 2.5 Flash Lite | `gemini-2.5-flash-lite` | OK |
+| Gemini 2.5 Pro | `gemini-2.5-pro` | 503 (exists, no capacity) |
+
+### 6.3 Models That Return 404 via REST
+
+These IDs exist in gRPC but fail through the REST API (404 unless a different status is noted):
+
+```
+gemini-3-flash-high, gemini-3-flash-medium, gemini-3-flash-low
+gemini-3.5-flash, gemini-3.5-flash-high, gemini-3.5-flash-medium
+gemini-3.1-pro-high (400, not 404, but doesn't work)
+gemini-3-pro, gemini-3-pro-high, gemini-3-pro-low (500)
+gemini-3.1-flash, gemini-3.1-flash-high
+claude-sonnet-4, claude-sonnet-4-5, claude-sonnet-4-6-thinking
+claude-opus-4, claude-opus-4-5
+claude-haiku-4-5
+gpt-oss-120b, gpt-oss-120b-maas, gpt-oss-20b-maas
+```
+
+### 6.4 How the Mapping Works
+
+1. GUI shows display names (matching agy CLI): `Gemini 3.5 Flash (High)`
+2. Codex CLI sends whatever model ID the user selected
+3. Proxy `alias_map` translates: `"Gemini 3.5 Flash (High)" → "gemini-3-flash"`
+4. Proxy sends REST request with `"model": "gemini-3-flash"`
+
+The alias map is in `_handle_gemini_oauth()` around line 4316 of `translate-proxy.py`.
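
A minimal sketch of such a lookup, populated from the verified table in section 6.2, is shown below. The names `ALIAS_MAP` and `resolve_model_id` are hypothetical, and passing unknown names through unchanged is an assumption about the desired fallback.

```python
# Display name -> REST slug, copied from the verified table in section 6.2.
ALIAS_MAP = {
    "Gemini 3.5 Flash (High)": "gemini-3-flash",
    "Gemini 3.5 Flash (Medium)": "gemini-3-flash",
    "Gemini 3.5 Flash (Low)": "gemini-3.5-flash-low",
    "Gemini 3.1 Pro (High)": "gemini-3.1-pro-low",
    "Gemini 3.1 Pro (Low)": "gemini-3.1-pro-low",
    "Claude Sonnet 4.6 (Thinking)": "claude-sonnet-4-6",
    "Claude Opus 4.6 (Thinking)": "claude-opus-4-6-thinking",
    "GPT-OSS 120B (Medium)": "gpt-oss-120b-medium",
}

def resolve_model_id(name):
    """Translate a display name to its REST slug; pass slugs through unchanged."""
    return ALIAS_MAP.get(name, name)

print(resolve_model_id("Gemini 3.5 Flash (High)"))
print(resolve_model_id("gemini-2.5-flash"))
```

Passing slugs through means a user who already selected a REST ID is not affected by the mapping step.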
+
+---
+
+## 7. Response Format
+
+### 7.1 Non-Streaming
+
+```json
+{
+  "response": {
+    "candidates": [{
+      "content": {
+        "role": "model",
+        "parts": [{"text": "..."}]
+      },
+      "finishReason": "STOP"
+    }]
+  }
+}
+```
+
+### 7.2 Streaming (SSE)
+
+Content-Type: `text/event-stream`
+
+Each SSE event contains a JSON chunk with the same structure. The proxy converts these to OpenAI Responses API format for Codex CLI.
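
For illustration, the text parts of such a stream could be pulled out as follows, assuming each event arrives as a single `data:` line carrying the structure from section 7.1. The function name `iter_sse_text` is hypothetical, and the `[DONE]` guard is a defensive assumption rather than documented behaviour of this API.

```python
import json

def iter_sse_text(lines):
    """Yield the text parts from SSE `data:` lines, in order."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip blank lines and other fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        chunk = json.loads(payload)
        for cand in chunk.get("response", {}).get("candidates", []):
            for part in cand.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]

sample = [
    'data: {"response": {"candidates": [{"content": {"role": "model", "parts": [{"text": "Hel"}]}}]}}',
    "",
    'data: {"response": {"candidates": [{"content": {"role": "model", "parts": [{"text": "lo"}]}, "finishReason": "STOP"}]}}',
]
print("".join(iter_sse_text(sample)))
```

A real handler would also need to deal with non-text parts such as tool calls, which this sketch ignores.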
+
+---
+
+## 8. Context Sizes
+
+```python
+# Context window sizes in tokens, keyed by both display name and REST slug.
+# The name CONTEXT_SIZES is illustrative; only the entries are taken from the proxy.
+CONTEXT_SIZES = {
+    "Gemini 3.5 Flash": 1000000, "Gemini 3.1 Pro": 2000000,
+    "gemini-3-flash": 1000000, "gemini-3.1-pro-low": 2000000,
+    "gemini-3.5-flash-low": 1000000,
+    "Claude Sonnet 4.6": 200000, "Claude Opus 4.6": 200000,
+    "claude-sonnet-4-6": 200000, "claude-opus-4-6-thinking": 200000,
+    "GPT-OSS 120B": 128000, "gpt-oss-120b-medium": 128000,
+    "gemini-2.5-flash": 1000000, "gemini-2.5-pro": 2000000,
+}
+```
+
+---
+
+## 9. Key Proxy Code Locations
+
+| Component | File | Line (approx) |
+|---|---|---|
+| Antigravity version | translate-proxy.py | 287-288 |
+| Version fetcher | translate-proxy.py | 705-748 |
+| Model alias map | translate-proxy.py | ~4316 |
+| REST request building | translate-proxy.py | ~4563-4602 |
+| Endpoint fallback loop | translate-proxy.py | ~4610 |
+| SSE streaming handler | translate-proxy.py | `_forward_gemini_sse()` |
+| Auto-continue for MAX_TOKENS | translate-proxy.py | `_auto_continue_gemini()` |
+| OAuth token refresh | translate-proxy.py | `_refresh_oauth_token_for()` |
+| Google account pool | translate-proxy.py | `_google_antigravity_pool` |
+| GUI preset models | codex-launcher-gui | ~358 |
+| GUI static model list | codex-launcher-gui | ~760 `_ANTIGRAVITY_MODELS` |
+| GUI fetch_models shortcut | codex-launcher-gui | ~770 `fetch_models_for_endpoint()` |
+
+---
+
+## 10. Debugging
+
+### 10.1 Debug Logs
+
+- **Proxy stderr**: Shows model mapping, request details, errors
+- **400 error dump**: `~/.cache/codex-proxy/gemini-last-400-request.json`
+- **Long context dump**: `~/.cache/codex-proxy/gemini-long-ctx-<session>.json`
+
+### 10.2 Quick API Test
+
+```bash
+TOKEN=$(python3 -c "import json; print(json.load(open('$HOME/.cache/codex-proxy/google-antigravity-oauth-token.json'))['access_token'])")
+
+curl -s "https://daily-cloudcode-pa.sandbox.googleapis.com/v1internal:generateContent" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "User-Agent: antigravity/2.0.1 darwin/arm64" \
+  -d '{
+    "project": "voltaic-hangout-z1qhf",
+    "model": "gemini-3-flash",
+    "requestType": "agent",
+    "userAgent": "antigravity",
+    "requestId": "test-123",
+    "request": {
+      "contents": [{"role": "user", "parts": [{"text": "say hi"}]}]
+    }
+  }'
+```
+
+### 10.3 Token Info
+
+```bash
+TOKEN=$(python3 -c "import json; print(json.load(open('$HOME/.cache/codex-proxy/google-antigravity-oauth-token.json'))['access_token'])")
+curl -s "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=$TOKEN" | python3 -m json.tool
+```
+
+### 10.4 Common Errors
+
+| Error | Cause | Fix |
+|---|---|---|
+| 404 "Requested entity was not found" | Wrong model ID (display name instead of slug) | Check `alias_map` |
+| 404 on `/v1/models` | Antigravity has no models REST endpoint | Proxy returns static list |
+| 404 on `POST /responses` | Codex CLI routing issue, not Antigravity | Check that the proxy is running |
+| 503 "No capacity" | Model exists but overloaded | Try another model or endpoint |
+| 500 "Unknown Error" | Model ID exists but broken on server | Known for `gemini-3-pro-low` |
+| PERMISSION_DENIED (gRPC) | Token lacks scope or empty request body | Use REST API instead |
+
+---
+
+## 11. Version History (Antigravity-specific)
+
+| Version | Date | Change |
+|---|---|---|
+| v3.10.3 | 2026-05-25 | **Fix 404**: Verified REST model IDs, display→slug mapping |
+| v3.10.2 | 2026-05-25 | Wrong fix: tried display names (didn't work) |
+| v3.10.0 | 2026-05-25 | Provider model editor, static Antigravity model list |
+| v3.9.9 | 2026-05-25 | Refreshed Antigravity models (slugs were wrong) |
+| v3.3.0 | Earlier | Initial Antigravity OAuth + tool calls + SSE streaming |
+
+---
+
+## 12. Testing a New Model ID
+
+If new models appear in the agy CLI, verify them against the REST API before adding:
+
+```python
+# Test a candidate model ID
+import json, os, urllib.error, urllib.request
+
+token = json.load(open(os.path.expanduser("~/.cache/codex-proxy/google-antigravity-oauth-token.json")))["access_token"]
+wrapped = {
+    "project": "voltaic-hangout-z1qhf", "model": "NEW-MODEL-ID",
+    "requestType": "agent", "userAgent": "antigravity",
+    "requestId": "test-123",
+    "request": {"contents": [{"role": "user", "parts": [{"text": "say hi"}]}]},
+}
+req = urllib.request.Request(
+    "https://daily-cloudcode-pa.sandbox.googleapis.com/v1internal:generateContent",
+    data=json.dumps(wrapped).encode(),
+    headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}", "User-Agent": "antigravity/2.0.1 darwin/arm64"},
+)
+try:
+    resp = urllib.request.urlopen(req, timeout=15)
+    print("OK:", resp.read().decode()[:200])
+except urllib.error.HTTPError as e:
+    print(f"{e.code}:", e.read().decode()[:200])
+```
+
+Then update:
+1. `alias_map` in `translate-proxy.py` — add display name → REST slug mapping
+2. `_ANTIGRAVITY_MODELS` in `codex-launcher-gui` — add display name to list
+3. Preset in `codex-launcher-gui` — add display name to `"Google Antigravity (OAuth)"` models
+4. Context sizes in `translate-proxy.py` — add model ID to `_MODEL_CTX` dict
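+
+Because the four updates above live in different files, it is easy to add a display name in one place and forget another. The check below is a rough sketch under stated assumptions: the three structures are hypothetical stand-ins (a dict, a list, and a dict keyed by REST slug), the model names and the context size are placeholders, and `find_inconsistencies` is not an existing function in this project.
+
+```python
+# Hypothetical stand-ins: the names come from the list above, but the real
+# structures live in translate-proxy.py and codex-launcher-gui and may differ.
+alias_map = {"Example Model": "example-model"}   # display name -> REST slug
+_ANTIGRAVITY_MODELS = ["Example Model"]          # display names listed in the GUI
+_MODEL_CTX = {"example-model": 128000}           # REST slug -> context size (placeholder value)
+
+
+def find_inconsistencies(models, aliases, ctx_sizes):
+    """Return readable problems for display names that are only partly registered."""
+    problems = []
+    for name in models:
+        slug = aliases.get(name)
+        if slug is None:
+            problems.append(f"{name}: missing alias_map entry")
+        elif slug not in ctx_sizes:
+            problems.append(f"{name}: slug {slug!r} missing from _MODEL_CTX")
+    return problems
+
+
+print(find_inconsistencies(_ANTIGRAVITY_MODELS, alias_map, _MODEL_CTX))
+```
+
+An empty list means every listed display name has both an alias and a context size.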
--- a/install.ps1
+++ b/install.ps1
@@ -0,0 +1,127 @@
+<#
+.SYNOPSIS
+    Codex Launcher Windows Installer
+.DESCRIPTION
+    Installs Codex Launcher for the current user.
+.NOTES
+    Requires: Python 3.8+ (stdlib only, zero pip dependencies).
+#>
+
+param(
+    [switch]$Uninstall
+)
+
+$ErrorActionPreference = 'Stop'
+$BinDir = Join-Path $env:LOCALAPPDATA 'Programs\Codex-Launcher'
+$StartMenu = Join-Path $env:APPDATA 'Microsoft\Windows\Start Menu\Programs'
+
+if ($Uninstall) {
+    Write-Host 'Uninstalling Codex Launcher...' -ForegroundColor Yellow
+
+    if (Test-Path $BinDir) {
+        Remove-Item -Recurse -Force $BinDir
+        Write-Host "  Removed $BinDir"
+    }
+
+    $shortcut = Join-Path $StartMenu 'Codex Launcher.lnk'
+    if (Test-Path $shortcut) {
+        Remove-Item -Force $shortcut
+        Write-Host '  Removed Start Menu shortcut'
+    }
+
+    $userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+    if ($userPath -like "*$BinDir*") {
+        $newPath = ($userPath -split ';' | Where-Object { $_ -ne $BinDir }) -join ';'
+        [Environment]::SetEnvironmentVariable('PATH', $newPath, 'User')
+        Write-Host '  Removed from PATH'
+    }
+
+    Write-Host 'Uninstall complete.' -ForegroundColor Green
+    return
+}
+
+Write-Host ''
+Write-Host '  Codex Launcher - Windows Installer' -ForegroundColor Cyan
+Write-Host '  ====================================' -ForegroundColor Cyan
+Write-Host ''
+
+# Check Python
+$pythonExe = Get-Command python -ErrorAction SilentlyContinue
+if (-not $pythonExe) {
+    $pythonExe = Get-Command python3 -ErrorAction SilentlyContinue
+}
+if (-not $pythonExe) {
+    Write-Host 'ERROR: Python not found. Install Python 3.8+ and add to PATH.' -ForegroundColor Red
+    exit 1
+}
+Write-Host "  Python: $($pythonExe.Source)" -ForegroundColor Gray
+
+# Create install directory
+New-Item -ItemType Directory -Force -Path $BinDir | Out-Null
+
+# Copy files
+$srcDir = Join-Path $PSScriptRoot 'src'
+$files = @(
+    'translate-proxy.py',
+    'codex-launcher-gui.py',
+    'codex_launcher_lib.py',
+    'cleanup-codex-stale.py'
+)
+
+foreach ($file in $files) {
+    $src = Join-Path $srcDir $file
+    if (Test-Path $src) {
+        Copy-Item -Force $src $BinDir
+        Write-Host "  Installed: $file" -ForegroundColor Green
+    } else {
+        Write-Host "  WARNING: $file not found in src/" -ForegroundColor Yellow
+    }
+}
+
+# Create Start Menu shortcut
+$WshShell = New-Object -ComObject WScript.Shell
+$shortcutPath = Join-Path $StartMenu 'Codex Launcher.lnk'
+$Shortcut = $WshShell.CreateShortcut($shortcutPath)
+
+# Find pythonw.exe for no-console launch
+$pythonw = Get-Command pythonw -ErrorAction SilentlyContinue
+if (-not $pythonw) {
+    $pythonDir = Split-Path $pythonExe.Source
+    $pythonwCandidate = Join-Path $pythonDir 'pythonw.exe'
+    if (Test-Path $pythonwCandidate) {
+        $pythonw = $pythonwCandidate
+    }
+}
+
+if ($pythonw) {
+    $targetPath = if ($pythonw.Source) { $pythonw.Source } else { $pythonw }
+} else {
+    $targetPath = $pythonExe.Source
+}
+$Shortcut.TargetPath = $targetPath
+$guiPath = Join-Path $BinDir 'codex-launcher-gui.py'
+$Shortcut.Arguments = $guiPath
+$Shortcut.WorkingDirectory = $BinDir
+$Shortcut.Description = 'Launch Codex Desktop with any AI provider'
+$Shortcut.Save()
+Write-Host '  Created Start Menu shortcut' -ForegroundColor Green
+
+# Add to PATH
+$userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+if ($userPath -notlike "*$BinDir*") {
+    $newUserPath = if ($userPath) { $userPath.TrimEnd(';') + ';' + $BinDir } else { $BinDir }
+    [Environment]::SetEnvironmentVariable('PATH', $newUserPath, 'User')
+    $env:PATH = $env:PATH + ';' + $BinDir
+    Write-Host '  Added to user PATH' -ForegroundColor Green
+}
+
+# Verify
+Write-Host ''
+Write-Host '  Installation complete!' -ForegroundColor Cyan
+Write-Host "  Install dir: $BinDir" -ForegroundColor Gray
+Write-Host ''
+Write-Host '  Launch options:' -ForegroundColor White
+Write-Host '    Start Menu:  Codex Launcher' -ForegroundColor Gray
+Write-Host '    Command:     codex-launcher-gui.py' -ForegroundColor Gray
+Write-Host '    Uninstall:   powershell -File install.ps1 -Uninstall' -ForegroundColor Gray
+Write-Host ''
--- a/install.sh
+++ b/install.sh
@@ -2,28 +2,37 @@
 set -e

 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-BIN_DIR="$HOME/.local/bin"
-APP_DIR="$HOME/.local/share/applications"

-mkdir -p "$BIN_DIR" "$APP_DIR"
+# Prefer the bundled .deb package when it is present; otherwise fall back to a
+# per-user install copied from src/ (the else branch below).
+if [ -f "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb" ]; then
+    echo "Installing codex-launcher_3.11.6_all.deb ..."
+    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb"
+    echo ""
+    echo "Installed v3.11.6 via .deb package."
+    echo "  translate-proxy.py   -> /usr/bin/translate-proxy.py"
+    echo "  codex-launcher-gui   -> /usr/bin/codex-launcher-gui"
+    echo "  cleanup-codex-stale  -> /usr/bin/cleanup-codex-stale.sh"
+    echo "  desktop entry        -> /usr/share/applications/codex-launcher.desktop"
+else
+    BIN_DIR="$HOME/.local/bin"
+    APP_DIR="$HOME/.local/share/applications"
+    mkdir -p "$BIN_DIR" "$APP_DIR"
+    cp "$SCRIPT_DIR/src/translate-proxy.py" "$BIN_DIR/"
+    cp "$SCRIPT_DIR/src/codex-launcher-gui" "$BIN_DIR/"
+    cp "$SCRIPT_DIR/src/cleanup-codex-stale.sh" "$BIN_DIR/"
+    chmod +x "$BIN_DIR/translate-proxy.py"
+    chmod +x "$BIN_DIR/codex-launcher-gui"
+    chmod +x "$BIN_DIR/cleanup-codex-stale.sh"
+    USERNAME=$(whoami)
+    sed "s/YOUR_USERNAME/$USERNAME/g" "$SCRIPT_DIR/src/codex-launcher.desktop.template" > "$APP_DIR/codex-launcher.desktop"
+    update-desktop-database "$APP_DIR" 2>/dev/null || true
+    echo "Installed from source."
+    echo "  translate-proxy.py   -> $BIN_DIR/translate-proxy.py"
+    echo "  codex-launcher-gui   -> $BIN_DIR/codex-launcher-gui"
+    echo "  cleanup-codex-stale  -> $BIN_DIR/cleanup-codex-stale.sh"
+    echo "  desktop entry        -> $APP_DIR/codex-launcher.desktop"
+fi

-cp "$SCRIPT_DIR/src/translate-proxy.py" "$BIN_DIR/"
-cp "$SCRIPT_DIR/src/codex-launcher-gui" "$BIN_DIR/"
-cp "$SCRIPT_DIR/src/cleanup-codex-stale.sh" "$BIN_DIR/"
-
-chmod +x "$BIN_DIR/translate-proxy.py"
-chmod +x "$BIN_DIR/codex-launcher-gui"
-chmod +x "$BIN_DIR/cleanup-codex-stale.sh"
-
-USERNAME=$(whoami)
-sed "s/YOUR_USERNAME/$USERNAME/g" "$SCRIPT_DIR/src/codex-launcher.desktop.template" > "$APP_DIR/codex-launcher.desktop"
-
-update-desktop-database "$APP_DIR" 2>/dev/null || true
-
-echo "Installed."
-echo "  translate-proxy.py   -> $BIN_DIR/translate-proxy.py"
-echo "  codex-launcher-gui   -> $BIN_DIR/codex-launcher-gui"
-echo "  cleanup-codex-stale  -> $BIN_DIR/cleanup-codex-stale.sh"
-echo "  desktop entry        -> $APP_DIR/codex-launcher.desktop"
 echo ""
 echo "Open 'Codex Launcher' from your app grid, or run: codex-launcher-gui"
--- a/src/antigravity_grpc/__init__.py
+++ b/src/antigravity_grpc/__init__.py
@@ -0,0 +1,24 @@
+"""
+antigravity_grpc — gRPC fallback client for Google CloudCode (Antigravity).
+
+When the REST API rejects a request (404 model not found, 400 bad request due to
+model ID mismatch, etc.), this module provides a gRPC fallback path that uses
+Google's native PredictionService protocol — the same one the agy CLI uses.
+
+The package itself imports without grpcio; grpcio is loaded lazily on first use.
+If grpcio is not available, the fallback is silently skipped.
+"""
+
+from .client import (
+    GrpcFallbackResult,
+    AntigravityGrpcClient,
+    is_grpc_available,
+    get_client,
+)
+
+__all__ = [
+    "GrpcFallbackResult",
+    "AntigravityGrpcClient",
+    "is_grpc_available",
+    "get_client",
+]
--- a/src/antigravity_grpc/client.py
+++ b/src/antigravity_grpc/client.py
@@ -0,0 +1,609 @@
+"""
+antigravity_grpc.client — gRPC fallback client for Google CloudCode (Antigravity).
+
+This module provides a gRPC client that can be used as an automatic fallback when
+the CloudCode REST API rejects requests. The gRPC path uses the same
+PredictionService that the native agy CLI binary uses, giving access to models
+that are unavailable via REST (e.g. models that return 404 on REST but work on gRPC).
+
+Key design decisions:
+  - Lazy import: grpcio is only imported when actually needed. If not installed,
+    is_grpc_available() returns False and the fallback is silently skipped.
+  - Zero impact on other providers: this module is only called from
+    _handle_antigravity_v2() when REST returns a fallback-eligible error.
+  - Same output format as REST: the client returns structured dicts that match
+    the SSE/JSON response shapes the proxy already processes.
+  - Thread-safe: the gRPC channel is created once per endpoint and reused.
+
+Usage from translate-proxy.py:
+    from antigravity_grpc import is_grpc_available, AntigravityGrpcClient
+
+    if is_grpc_available():
+        client = AntigravityGrpcClient()
+        result = client.try_generate(request_dict, stream=False, access_token=token)
+        if result.ok:
+            data = result.response_data  # dict matching REST response shape
+        else:
+            pass  # gRPC also failed; fall through to the REST error
+"""
+
+import json
+import os
+import sys
+import time
+import threading
+import collections
+
+# ═══════════════════════════════════════════════════════════════════
+# Lazy gRPC import — never crash if grpcio is missing
+# ═══════════════════════════════════════════════════════════════════
+
+_grpc = None
+_pb2 = None
+_pb2_grpc = None
+_import_error = None
+
+def _try_import():
+    global _grpc, _pb2, _pb2_grpc, _import_error
+    if _grpc is not None:
+        return _grpc is not False
+    try:
+        import grpc as _real_grpc
+        # Import the generated stubs relative to this package
+        from . import cloudcode_pb2 as _real_pb2
+        from . import cloudcode_pb2_grpc as _real_pb2_grpc
+        _grpc = _real_grpc
+        _pb2 = _real_pb2
+        _pb2_grpc = _real_pb2_grpc
+        return True
+    except Exception as e:
+        _import_error = str(e)
+        _grpc = False
+        return False
+
+
+def is_grpc_available():
+    """Return True if grpcio and the generated stubs are importable."""
+    return _try_import()
+
+
+# ═══════════════════════════════════════════════════════════════════
+# gRPC endpoints for Antigravity (same hosts, different port/path)
+# ═══════════════════════════════════════════════════════════════════
+# The CloudCode gRPC service runs on the same hosts as REST but uses
+# the gRPC protocol. The agy CLI connects to:
+#   - cloudcode-pa.googleapis.com:443
+#   - daily-cloudcode-pa.googleapis.com:443
+#   - daily-cloudcode-pa.sandbox.googleapis.com:443
+
+_GRPC_ENDPOINTS = [
+    "daily-cloudcode-pa.googleapis.com:443",
+    "cloudcode-pa.googleapis.com:443",
+]
+
+_ALLOW_STAGING_ENV = "ALLOW_ANTIGRAVITY_STAGING"
+
+# ═══════════════════════════════════════════════════════════════════
+# Result type
+# ═══════════════════════════════════════════════════════════════════
+
+class GrpcFallbackResult:
+    """Result of a gRPC fallback attempt."""
+
+    __slots__ = ("ok", "response_data", "stream_chunks", "error_message",
+                 "endpoint_used", "model_used", "elapsed_s")
+
+    def __init__(self, ok=False, response_data=None, stream_chunks=None,
+                 error_message="", endpoint_used="", model_used="", elapsed_s=0.0):
+        self.ok = ok
+        self.response_data = response_data      # dict (non-streaming)
+        self.stream_chunks = stream_chunks      # list[dict] (streaming)
+        self.error_message = error_message
+        self.endpoint_used = endpoint_used
+        self.model_used = model_used
+        self.elapsed_s = elapsed_s
+
+    def __repr__(self):
+        if self.ok:
+            if self.stream_chunks is not None:
+                return f"<GrpcFallbackResult OK stream chunks={len(self.stream_chunks)}>"
+            return f"<GrpcFallbackResult OK data_keys={list(self.response_data.keys()) if self.response_data else None}>"
+        return f"<GrpcFallbackResult FAIL error={self.error_message!r}>"
+
+
+# ═══════════════════════════════════════════════════════════════════
+# JSON → Protobuf conversion helpers
+# ═══════════════════════════════════════════════════════════════════
+
+def _struct_to_protobuf(d, struct_obj=None):
+    """Convert a Python dict to a google.protobuf.Struct."""
+    from google.protobuf.struct_pb2 import Struct, Value, NullValue, ListValue
+    if struct_obj is None:
+        struct_obj = Struct()
+    if isinstance(d, dict):
+        for k, v in d.items():
+            if isinstance(v, str):
+                struct_obj.fields[k].string_value = v
+            elif isinstance(v, bool):
+                struct_obj.fields[k].bool_value = v
+            elif isinstance(v, int):
+                struct_obj.fields[k].number_value = float(v)
+            elif isinstance(v, float):
+                struct_obj.fields[k].number_value = v
+            elif isinstance(v, dict):
+                _struct_to_protobuf(v, struct_obj.fields[k].struct_value)
+            elif isinstance(v, list):
+                lst = struct_obj.fields[k].list_value
+                for item in v:
+                    if isinstance(item, str):
+                        lst.values.add().string_value = item
+                    elif isinstance(item, bool):
+                        lst.values.add().bool_value = item
+                    elif isinstance(item, (int, float)):
+                        lst.values.add().number_value = float(item)
+                    elif isinstance(item, dict):
+                        _struct_to_protobuf(item, lst.values.add().struct_value)
+                    elif item is None:
+                        lst.values.add().null_value = 0
+            elif v is None:
+                struct_obj.fields[k].null_value = 0
+    return struct_obj
+
+
+def _protobuf_struct_to_dict(struct):
+    """Convert a google.protobuf.Struct to a Python dict."""
+    from google.protobuf.struct_pb2 import Value, NullValue
+    result = {}
+    for k, v in struct.fields.items():
+        kind = v.WhichOneof("kind")
+        if kind == "null_value":
+            result[k] = None
+        elif kind == "number_value":
+            result[k] = v.number_value
+        elif kind == "string_value":
+            result[k] = v.string_value
+        elif kind == "bool_value":
+            result[k] = v.bool_value
+        elif kind == "struct_value":
+            result[k] = _protobuf_struct_to_dict(v.struct_value)
+        elif kind == "list_value":
+            result[k] = [_value_to_python(item) for item in v.list_value.values]
+        else:
+            result[k] = None
+    return result
+
+
+def _value_to_python(v):
+    """Convert a google.protobuf.Value to a Python value."""
+    kind = v.WhichOneof("kind")
+    if kind == "null_value":
+        return None
+    elif kind == "number_value":
+        return v.number_value
+    elif kind == "string_value":
+        return v.string_value
+    elif kind == "bool_value":
+        return v.bool_value
+    elif kind == "struct_value":
+        return _protobuf_struct_to_dict(v.struct_value)
+    elif kind == "list_value":
+        return [_value_to_python(item) for item in v.list_value.values]
+    return None
+
+
+def _json_parts_to_proto(parts_json):
+    """Convert a list of JSON content parts to protobuf Part messages."""
+    result = []
+    for p in parts_json:
+        if not isinstance(p, dict):
+            continue
+        part = _pb2.Part()
+
+        # Thought signature
+        sig = p.get("thoughtSignature") or p.get("thought_signature")
+        if sig:
+            part.thought_signature = sig
+
+        if p.get("thought"):
+            part.thought = True
+            if "text" in p:
+                part.text = p["text"]
+        elif "text" in p and "functionCall" not in p:
+            part.text = p["text"]
+        elif "functionCall" in p:
+            fc = p["functionCall"]
+            part.function_call.name = fc.get("name", "")
+            part.function_call.id = fc.get("id", "")
+            args = fc.get("args", fc.get("arguments", {}))
+            if isinstance(args, dict):
+                _struct_to_protobuf(args, part.function_call.args)
+            elif isinstance(args, str):
+                try:
+                    _struct_to_protobuf(json.loads(args), part.function_call.args)
+                except Exception:
+                    pass
+        elif "functionResponse" in p:
+            fr = p["functionResponse"]
+            part.function_response.name = fr.get("name", "")
+            part.function_response.id = fr.get("id", "")
+            resp = fr.get("response", {})
+            if "result" in resp:
+                result_val = resp["result"]
+                if isinstance(result_val, (dict, list)):
+                    _struct_to_protobuf({"result": result_val}, part.function_response.response)
+                else:
+                    _struct_to_protobuf({"result": str(result_val)}, part.function_response.response)
+            elif isinstance(resp, dict):
+                _struct_to_protobuf(resp, part.function_response.response)
+        elif "inlineData" in p:
+            idata = p["inlineData"]
+            import base64
+            part.inline_data.mime_type = idata.get("mimeType", "image/png")
+            b64data = idata.get("data", "")
+            part.inline_data.data = base64.b64decode(b64data) if b64data else b""
+
+        result.append(part)
+    return result
+
+
+def _json_contents_to_proto(contents_json):
+    """Convert a list of JSON content objects to protobuf Content messages."""
+    result = []
+    for c in contents_json:
+        if not isinstance(c, dict):
+            continue
+        content = _pb2.Content()
+        content.role = c.get("role", "user")
+        for part in _json_parts_to_proto(c.get("parts", [])):
+            content.parts.append(part)
+        result.append(content)
+    return result
+
+
+def _proto_candidate_to_json(candidate):
+    """Convert a protobuf Candidate to a JSON-compatible dict."""
+    content_json = {"role": candidate.content.role, "parts": []}
+    for part in candidate.content.parts:
+        p = {}
+        if part.thought_signature:
+            p["thoughtSignature"] = part.thought_signature
+        if part.thought:
+            p["thought"] = True
+            if part.text:
+                p["text"] = part.text
+        elif part.text and not part.HasField("function_call"):
+            p["text"] = part.text
+        elif part.HasField("function_call"):
+            fc = part.function_call
+            args_dict = _protobuf_struct_to_dict(fc.args) if fc.HasField("args") else {}
+            p["functionCall"] = {
+                "name": fc.name,
+                "args": args_dict,
+                "id": fc.id,
+            }
+        elif part.HasField("function_response"):
+            fr = part.function_response
+            resp_dict = _protobuf_struct_to_dict(fr.response) if fr.HasField("response") else {}
+            p["functionResponse"] = {
+                "name": fr.name,
+                "response": resp_dict,
+                "id": fr.id,
+            }
+        elif part.HasField("inline_data"):
+            import base64
+            p["inlineData"] = {
+                "mimeType": part.inline_data.mime_type,
+                "data": base64.b64encode(part.inline_data.data).decode(),
+            }
+        if p:
+            content_json["parts"].append(p)
+
+    return {
+        "content": content_json,
+        "finishReason": candidate.finish_reason,
+        "index": candidate.index,
+    }
+
+
+# ═══════════════════════════════════════════════════════════════════
+# Client
+# ═══════════════════════════════════════════════════════════════════
+
+class AntigravityGrpcClient:
+    """
+    gRPC fallback client for Google CloudCode Antigravity.
+
+    Thread-safe. Channels are cached per endpoint and reused.
+    """
+
+    def __init__(self):
+        self._channels = {}
+        self._stubs = {}
+        self._lock = threading.Lock()
+
+    def _get_channel(self, endpoint):
+        """Get or create a gRPC channel for the given endpoint."""
+        with self._lock:
+            if endpoint not in self._channels:
+                # Use secure channel with default SSL credentials
+                creds = _grpc.ssl_channel_credentials()
+                channel = _grpc.secure_channel(endpoint, creds)
+                self._channels[endpoint] = channel
+                self._stubs[endpoint] = _pb2_grpc.PredictionServiceStub(channel)
+            return self._channels[endpoint], self._stubs[endpoint]
+
+    def _build_request(self, wrapped_dict):
+        """
+        Build a GenerateContentRequest protobuf from the same wrapped dict
+        that the REST API uses.
+
+        wrapped_dict shape:
+        {
+            "project": "...",
+            "model": "...",
+            "requestType": "agent",
+            "userAgent": "antigravity/...",
+            "requestId": "agent-...",
+            "request": {
+                "contents": [...],
+                "systemInstruction": {...},
+                "generationConfig": {...},
+                "tools": [...],
+                "safetySettings": [...],
+                "toolConfig": {...},
+                "sessionId": "..."
+            }
+        }
+        """
+        req = _pb2.GenerateContentRequest()
+        req.project = wrapped_dict.get("project", "")
+        req.model = wrapped_dict.get("model", "")
+        req.request_type = wrapped_dict.get("requestType", "agent")
+        req.user_agent = wrapped_dict.get("userAgent", "")
+        req.request_id = wrapped_dict.get("requestId", "")
+
+        inner = wrapped_dict.get("request", {})
+
+        # Contents
+        for c in _json_contents_to_proto(inner.get("contents", [])):
+            req.request.contents.append(c)
+
+        # System instruction
+        si = inner.get("systemInstruction", {})
+        if si:
+            si_parts = si.get("parts", [])
+            if si.get("role"):
+                req.request.system_instruction.role = si.get("role", "user")
+            for part in _json_parts_to_proto(si_parts):
+                req.request.system_instruction.parts.append(part)
+
+        # Generation config
+        gc = inner.get("generationConfig", {})
+        if gc:
+            cfg = req.request.generation_config
+            if "maxOutputTokens" in gc:
+                cfg.max_output_tokens = int(gc["maxOutputTokens"])
+            if "temperature" in gc:
+                cfg.temperature = float(gc["temperature"])
+            if "topP" in gc or "top_p" in gc:
+                cfg.top_p = float(gc["topP"] if "topP" in gc else gc["top_p"])
+            for ss in gc.get("stopSequences", []):
+                cfg.stop_sequences.append(ss)
+
+            # Thinking config (Gemini 3 native)
+            tc = gc.get("thinkingConfig", gc.get("thinking_config"))
+            if tc:
+                cfg.thinking_config.include_thoughts = tc.get("includeThoughts", tc.get("include_thoughts", False))
+                cfg.thinking_config.thinking_budget = int(tc.get("thinkingBudget", tc.get("thinking_budget", 8192)))
+            # Legacy thinking fields
+            if "includeThoughts" in gc and not tc:
+                cfg.thinking_config.include_thoughts = gc["includeThoughts"]
+            if "thinkingBudget" in gc and not tc:
+                cfg.thinking_config.thinking_budget = int(gc["thinkingBudget"])
+
+        # Tools
+        for tool_json in inner.get("tools", []):
+            tool = _pb2.Tool()
+            for fd_json in tool_json.get("functionDeclarations", []):
+                fd = tool.function_declarations.add()
+                fd.name = fd_json.get("name", "")
+                fd.description = fd_json.get("description", "")
+                params = fd_json.get("parameters", {})
+                if isinstance(params, dict) and params:
+                    _struct_to_protobuf(params, fd.parameters)
+            req.request.tools.append(tool)
+
+        # Safety settings
+        for ss in inner.get("safetySettings", []):
+            ss_msg = _pb2.SafetySetting()
+            ss_msg.category = ss.get("category", "")
+            ss_msg.threshold = ss.get("threshold", "OFF")
+            req.request.safety_settings.append(ss_msg)
+
+        # Tool config
+        tcfg = inner.get("toolConfig", {})
+        if tcfg:
+            fcc = tcfg.get("functionCallingConfig", {})
+            if fcc:
+                req.request.tool_config.function_calling_config.mode = fcc.get("mode", "AUTO")
+                for afn in fcc.get("allowedFunctionNames", fcc.get("allowed_function_names", [])):
+                    req.request.tool_config.function_calling_config.allowed_function_names.append(afn)
+
+        # Session ID
+        sid = inner.get("sessionId", "")
+        if sid:
+            req.request.session_id = sid
+
+        return req
+
+    def try_generate(self, wrapped_dict, stream=False, access_token="",
+                     timeout_s=180):
+        """
+        Try a gRPC GenerateContent or StreamGenerateContent request.
+
+        Args:
+            wrapped_dict: The same wrapped dict used for REST requests.
+            stream: If True, use server-streaming RPC.
+            access_token: OAuth2 Bearer token for authentication.
+            timeout_s: Request timeout in seconds.
+
+        Returns:
+            GrpcFallbackResult with ok=True if successful.
+            For non-streaming: result.response_data is a dict matching
+                the REST JSON response shape.
+            For streaming: result.stream_chunks is a list of dicts matching
+                REST SSE chunk shapes.
+        """
+        if not is_grpc_available():
+            return GrpcFallbackResult(ok=False, error_message="grpcio not installed")
+
+        t0 = time.time()
+
+        # Build metadata (gRPC uses metadata instead of HTTP headers)
+        metadata = []
+        if access_token:
+            metadata.append(("authorization", f"Bearer {access_token}"))
+        ua = wrapped_dict.get("userAgent", "")
+        if ua:
+            metadata.append(("user-agent", ua))
+        metadata.append(("x-client-name", "antigravity"))
+        # Required for Google's gRPC gateway
+        metadata.append(("x-goog-api-client", "gl-node/18.18.2 fire/0.8.6 grpc/1.10.x"))
+
+        # Build endpoints list
+        endpoints = list(_GRPC_ENDPOINTS)
+        if os.environ.get(_ALLOW_STAGING_ENV, "0") == "1":
+            endpoints.append("daily-cloudcode-pa.sandbox.googleapis.com:443")
+            endpoints.append("autopush-cloudcode-pa.sandbox.googleapis.com:443")
+
+        model = wrapped_dict.get("model", "?")
+
+        last_error = ""
+        for ep in endpoints:
+            try:
+                channel, stub = self._get_channel(ep)
+                req = self._build_request(wrapped_dict)
+
+                if stream:
+                    return self._do_stream(stub, req, metadata, ep, model,
+                                           timeout_s, t0)
+                else:
+                    return self._do_unary(stub, req, metadata, ep, model,
+                                          timeout_s, t0)
+
+            except Exception as e:
+                last_error = str(e)
+                err_str = last_error.lower()
+                print(f"[antigravity-grpc] {ep} failed: {last_error[:300]}", file=sys.stderr)
+                # Don't retry on auth errors
+                if "unauthenticated" in err_str or "permission" in err_str:
+                    break
+                # Don't retry on NOT_FOUND (model truly doesn't exist)
+                if "not_found" in err_str or "not found" in err_str:
+                    break
+                continue
+
+        elapsed = time.time() - t0
+        return GrpcFallbackResult(
+            ok=False,
+            error_message=f"All gRPC endpoints failed: {last_error}",
+            model_used=model,
+            elapsed_s=elapsed,
+        )
+
+    def _do_unary(self, stub, req, metadata, endpoint, model, timeout_s, t0):
+        """Execute a unary (non-streaming) gRPC call."""
+        response = stub.GenerateContent(
+            req,
+            metadata=metadata,
+            timeout=timeout_s,
+        )
+        elapsed = time.time() - t0
+
+        # Convert protobuf response to REST-compatible JSON shape
+        candidates_json = []
+        for candidate in response.response.candidates:
+            candidates_json.append(_proto_candidate_to_json(candidate))
+
+        # Match the REST response envelope:
+        # { "response": { "candidates": [...] } }
+        rest_shape = {
+            "response": {
+                "candidates": candidates_json,
+            }
+        }
+
+        print(f"[antigravity-grpc] {endpoint} unary OK, candidates={len(candidates_json)}, elapsed={elapsed:.1f}s", file=sys.stderr)
+
+        return GrpcFallbackResult(
+            ok=True,
+            response_data=rest_shape,
+            endpoint_used=endpoint,
+            model_used=model,
+            elapsed_s=elapsed,
+        )
+
+    def _do_stream(self, stub, req, metadata, endpoint, model, timeout_s, t0):
+        """Execute a server-streaming gRPC call."""
+        chunks = []
+        chunk_count = 0
+
+        response_iter = stub.StreamGenerateContent(
+            req,
+            metadata=metadata,
+            timeout=timeout_s,
+        )
+
+        for chunk_proto in response_iter:
+            chunk_count += 1
+            # Each chunk_proto is a StreamGenerateContentChunk
+            # which wraps a Response with candidates
+            candidates_json = []
+            for candidate in chunk_proto.response.candidates:
+                candidates_json.append(_proto_candidate_to_json(candidate))
+
+            # Match REST SSE chunk shape: { "response": { "candidates": [...] } }
+            chunk_json = {
+                "response": {
+                    "candidates": candidates_json,
+                }
+            }
+            chunks.append(chunk_json)
+
+        elapsed = time.time() - t0
+        print(f"[antigravity-grpc] {endpoint} stream OK, chunks={chunk_count}, elapsed={elapsed:.1f}s", file=sys.stderr)
+
+        return GrpcFallbackResult(
+            ok=True,
+            stream_chunks=chunks,
+            endpoint_used=endpoint,
+            model_used=model,
+            elapsed_s=elapsed,
+        )
+
+    def close(self):
+        """Close all gRPC channels."""
+        with self._lock:
+            for channel in self._channels.values():
+                try:
+                    channel.close()
+                except Exception:
+                    pass
+            self._channels.clear()
+            self._stubs.clear()
+
+
+# ═══════════════════════════════════════════════════════════════════
+# Module-level singleton
+# ═══════════════════════════════════════════════════════════════════
+
+_client = None
+_client_lock = threading.Lock()
+
+def get_client():
+    """Get the module-level AntigravityGrpcClient singleton."""
+    global _client
+    with _client_lock:
+        if _client is None:
+            _client = AntigravityGrpcClient()
+        return _client
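The client above returns either one REST-shaped dict (`response_data`) or a list of chunk dicts (`stream_chunks`). A minimal sketch of how a caller might replay those results as SSE `data:` events is shown below. `FallbackResult` is a hypothetical stand-in that mirrors only the `GrpcFallbackResult` fields visible in this diff, and `iter_sse_lines` is an illustrative helper that is not part of the PR.

```python
import json
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class FallbackResult:
    """Hypothetical stand-in mirroring the GrpcFallbackResult fields used above."""
    ok: bool
    response_data: Optional[dict] = None
    stream_chunks: List[dict] = field(default_factory=list)
    error_message: str = ""


def iter_sse_lines(result: FallbackResult) -> Iterator[str]:
    """Yield one SSE event per buffered chunk, or one event for a unary reply."""
    if not result.ok:
        raise RuntimeError(result.error_message or "gRPC fallback failed")
    if result.stream_chunks:
        payloads = result.stream_chunks
    elif result.response_data is not None:
        payloads = [result.response_data]
    else:
        payloads = []
    for payload in payloads:
        # Compact separators keep each event on a single line.
        yield "data: " + json.dumps(payload, separators=(",", ":")) + "\n\n"


if __name__ == "__main__":
    chunk = {"response": {"candidates": [{"content": {"role": "model", "parts": [{"text": "hi"}]}}]}}
    for line in iter_sse_lines(FallbackResult(ok=True, stream_chunks=[chunk])):
        print(line, end="")
```

Because `_do_stream` collects every chunk before returning, a helper like this replays a completed stream rather than forwarding tokens as they arrive.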
--- a/src/antigravity_grpc/cloudcode_pb2.py
+++ b/src/antigravity_grpc/cloudcode_pb2.py
--- a/src/antigravity_grpc/cloudcode_pb2_grpc.py
+++ b/src/antigravity_grpc/cloudcode_pb2_grpc.py
@@ -0,0 +1,275 @@
+# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
+"""Client and server classes corresponding to protobuf-defined services."""
+import grpc
+import warnings
+
+from antigravity_grpc import cloudcode_pb2 as cloudcode__pb2
+
+GRPC_GENERATED_VERSION = '1.80.0'
+GRPC_VERSION = grpc.__version__
+_version_not_supported = False
+
+try:
+    from grpc._utilities import first_version_is_lower
+    _version_not_supported = first_version_is_lower(GRPC_VERSION, GRPC_GENERATED_VERSION)
+except ImportError:
+    _version_not_supported = True
+
+if _version_not_supported:
+    raise RuntimeError(
+        f'The grpc package installed is at version {GRPC_VERSION},'
+        + ' but the generated code in cloudcode_pb2_grpc.py depends on'
+        + f' grpcio>={GRPC_GENERATED_VERSION}.'
+        + f' Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION}'
+        + f' or downgrade your generated code using grpcio-tools<={GRPC_VERSION}.'
+    )
+
+
+class PredictionServiceStub(object):
+    """─── Service ──────────────────────────────────────────────────────────
+
+    """
+
+    def __init__(self, channel):
+        """Constructor.
+
+        Args:
+            channel: A grpc.Channel.
+        """
+        self.GenerateContent = channel.unary_unary(
+                '/google.internal.cloud.code.v1internal.PredictionService/GenerateContent',
+                request_serializer=cloudcode__pb2.GenerateContentRequest.SerializeToString,
+                response_deserializer=cloudcode__pb2.GenerateContentResponse.FromString,
+                _registered_method=True)
+        self.StreamGenerateContent = channel.unary_stream(
+                '/google.internal.cloud.code.v1internal.PredictionService/StreamGenerateContent',
+                request_serializer=cloudcode__pb2.GenerateContentRequest.SerializeToString,
+                response_deserializer=cloudcode__pb2.StreamGenerateContentChunk.FromString,
+                _registered_method=True)
+        self.FetchAvailableModels = channel.unary_unary(
+                '/google.internal.cloud.code.v1internal.PredictionService/FetchAvailableModels',
+                request_serializer=cloudcode__pb2.FetchAvailableModelsRequest.SerializeToString,
+                response_deserializer=cloudcode__pb2.FetchAvailableModelsResponse.FromString,
+                _registered_method=True)
+        self.CountTokens = channel.unary_unary(
+                '/google.internal.cloud.code.v1internal.PredictionService/CountTokens',
+                request_serializer=cloudcode__pb2.CountTokensRequest.SerializeToString,
+                response_deserializer=cloudcode__pb2.CountTokensResponse.FromString,
+                _registered_method=True)
+        self.RetrieveUserQuota = channel.unary_unary(
+                '/google.internal.cloud.code.v1internal.PredictionService/RetrieveUserQuota',
+                request_serializer=cloudcode__pb2.RetrieveUserQuotaRequest.SerializeToString,
+                response_deserializer=cloudcode__pb2.RetrieveUserQuotaResponse.FromString,
+                _registered_method=True)
+
+
+class PredictionServiceServicer(object):
+    """─── Service ──────────────────────────────────────────────────────────
+
+    """
+
+    def GenerateContent(self, request, context):
+        """Missing associated documentation comment in .proto file."""
+        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
+        context.set_details('Method not implemented!')
+        raise NotImplementedError('Method not implemented!')
+
+    def StreamGenerateContent(self, request, context):
+        """Missing associated documentation comment in .proto file."""
+        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
+        context.set_details('Method not implemented!')
+        raise NotImplementedError('Method not implemented!')
+
+    def FetchAvailableModels(self, request, context):
+        """Missing associated documentation comment in .proto file."""
+        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
+        context.set_details('Method not implemented!')
+        raise NotImplementedError('Method not implemented!')
+
+    def CountTokens(self, request, context):
+        """Missing associated documentation comment in .proto file."""
+        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
+        context.set_details('Method not implemented!')
+        raise NotImplementedError('Method not implemented!')
+
+    def RetrieveUserQuota(self, request, context):
+        """Missing associated documentation comment in .proto file."""
+        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
+        context.set_details('Method not implemented!')
+        raise NotImplementedError('Method not implemented!')
+
+
+def add_PredictionServiceServicer_to_server(servicer, server):
+    rpc_method_handlers = {
+            'GenerateContent': grpc.unary_unary_rpc_method_handler(
+                    servicer.GenerateContent,
+                    request_deserializer=cloudcode__pb2.GenerateContentRequest.FromString,
+                    response_serializer=cloudcode__pb2.GenerateContentResponse.SerializeToString,
+            ),
+            'StreamGenerateContent': grpc.unary_stream_rpc_method_handler(
+                    servicer.StreamGenerateContent,
+                    request_deserializer=cloudcode__pb2.GenerateContentRequest.FromString,
+                    response_serializer=cloudcode__pb2.StreamGenerateContentChunk.SerializeToString,
+            ),
+            'FetchAvailableModels': grpc.unary_unary_rpc_method_handler(
+                    servicer.FetchAvailableModels,
+                    request_deserializer=cloudcode__pb2.FetchAvailableModelsRequest.FromString,
+                    response_serializer=cloudcode__pb2.FetchAvailableModelsResponse.SerializeToString,
+            ),
+            'CountTokens': grpc.unary_unary_rpc_method_handler(
+                    servicer.CountTokens,
+                    request_deserializer=cloudcode__pb2.CountTokensRequest.FromString,
+                    response_serializer=cloudcode__pb2.CountTokensResponse.SerializeToString,
+            ),
+            'RetrieveUserQuota': grpc.unary_unary_rpc_method_handler(
+                    servicer.RetrieveUserQuota,
+                    request_deserializer=cloudcode__pb2.RetrieveUserQuotaRequest.FromString,
+                    response_serializer=cloudcode__pb2.RetrieveUserQuotaResponse.SerializeToString,
+            ),
+    }
+    generic_handler = grpc.method_handlers_generic_handler(
+            'google.internal.cloud.code.v1internal.PredictionService', rpc_method_handlers)
+    server.add_generic_rpc_handlers((generic_handler,))
+    server.add_registered_method_handlers('google.internal.cloud.code.v1internal.PredictionService', rpc_method_handlers)
+
+
+ # This class is part of an EXPERIMENTAL API.
+class PredictionService(object):
+    """─── Service ──────────────────────────────────────────────────────────
+
+    """
+
+    @staticmethod
+    def GenerateContent(request,
+            target,
+            options=(),
+            channel_credentials=None,
+            call_credentials=None,
+            insecure=False,
+            compression=None,
+            wait_for_ready=None,
+            timeout=None,
+            metadata=None):
+        return grpc.experimental.unary_unary(
+            request,
+            target,
+            '/google.internal.cloud.code.v1internal.PredictionService/GenerateContent',
+            cloudcode__pb2.GenerateContentRequest.SerializeToString,
+            cloudcode__pb2.GenerateContentResponse.FromString,
+            options,
+            channel_credentials,
+            insecure,
+            call_credentials,
+            compression,
+            wait_for_ready,
+            timeout,
+            metadata,
+            _registered_method=True)
+
+    @staticmethod
+    def StreamGenerateContent(request,
+            target,
+            options=(),
+            channel_credentials=None,
+            call_credentials=None,
+            insecure=False,
+            compression=None,
+            wait_for_ready=None,
+            timeout=None,
+            metadata=None):
+        return grpc.experimental.unary_stream(
+            request,
+            target,
+            '/google.internal.cloud.code.v1internal.PredictionService/StreamGenerateContent',
+            cloudcode__pb2.GenerateContentRequest.SerializeToString,
+            cloudcode__pb2.StreamGenerateContentChunk.FromString,
+            options,
+            channel_credentials,
+            insecure,
+            call_credentials,
+            compression,
+            wait_for_ready,
+            timeout,
+            metadata,
+            _registered_method=True)
+
+    @staticmethod
+    def FetchAvailableModels(request,
+            target,
+            options=(),
+            channel_credentials=None,
+            call_credentials=None,
+            insecure=False,
+            compression=None,
+            wait_for_ready=None,
+            timeout=None,
+            metadata=None):
+        return grpc.experimental.unary_unary(
+            request,
+            target,
+            '/google.internal.cloud.code.v1internal.PredictionService/FetchAvailableModels',
+            cloudcode__pb2.FetchAvailableModelsRequest.SerializeToString,
+            cloudcode__pb2.FetchAvailableModelsResponse.FromString,
+            options,
+            channel_credentials,
+            insecure,
+            call_credentials,
+            compression,
+            wait_for_ready,
+            timeout,
+            metadata,
+            _registered_method=True)
+
+    @staticmethod
+    def CountTokens(request,
+            target,
+            options=(),
+            channel_credentials=None,
+            call_credentials=None,
+            insecure=False,
+            compression=None,
+            wait_for_ready=None,
+            timeout=None,
+            metadata=None):
+        return grpc.experimental.unary_unary(
+            request,
+            target,
+            '/google.internal.cloud.code.v1internal.PredictionService/CountTokens',
+            cloudcode__pb2.CountTokensRequest.SerializeToString,
+            cloudcode__pb2.CountTokensResponse.FromString,
+            options,
+            channel_credentials,
+            insecure,
+            call_credentials,
+            compression,
+            wait_for_ready,
+            timeout,
+            metadata,
+            _registered_method=True)
+
+    @staticmethod
+    def RetrieveUserQuota(request,
+            target,
+            options=(),
+            channel_credentials=None,
+            call_credentials=None,
+            insecure=False,
+            compression=None,
+            wait_for_ready=None,
+            timeout=None,
+            metadata=None):
+        return grpc.experimental.unary_unary(
+            request,
+            target,
+            '/google.internal.cloud.code.v1internal.PredictionService/RetrieveUserQuota',
+            cloudcode__pb2.RetrieveUserQuotaRequest.SerializeToString,
+            cloudcode__pb2.RetrieveUserQuotaResponse.FromString,
+            options,
+            channel_credentials,
+            insecure,
+            call_credentials,
+            compression,
+            wait_for_ready,
+            timeout,
+            metadata,
+            _registered_method=True)
--- a/src/antigravity_grpc/proto/cloudcode.proto
+++ b/src/antigravity_grpc/proto/cloudcode.proto
@@ -0,0 +1,183 @@
+// Copyright 2026 Codex Launcher Contributors
+// SPDX-License-Identifier: MIT
+//
+// CloudCode internal gRPC service definitions.
+// Reverse-engineered from the agy-core binary for Antigravity proxy fallback.
+// Service: google.internal.cloud.code.v1internal.PredictionService
+//
+// NOTE: google/api/annotations.proto is NOT imported here because it conflicts
+// with the google namespace package at runtime. The HTTP annotations are only
+// needed for Google's Envoy/gRPC-gateway and are unnecessary for our client.
+
+syntax = "proto3";
+
+package google.internal.cloud.code.v1internal;
+
+import "google/protobuf/struct.proto";
+
+option go_package = "google.golang.org/internal/cloud/code/v1internal";
+
+// ─── Reused message types ───────────────────────────────────────────
+
+message Content {
+  string role = 1;
+  repeated Part parts = 2;
+}
+
+message Part {
+  oneof data {
+    string text = 1;
+    InlineData inline_data = 2;
+    FunctionCall function_call = 3;
+    FunctionResponse function_response = 4;
+  }
+  // Thought signature for Gemini continuity
+  string thought_signature = 10;
+  // Thought part (reasoning)
+  bool thought = 11;
+}
+
+message InlineData {
+  string mime_type = 1;
+  bytes data = 2;
+}
+
+message FunctionCall {
+  string name = 1;
+  google.protobuf.Struct args = 2;
+  string id = 3;
+}
+
+message FunctionResponse {
+  string name = 1;
+  google.protobuf.Struct response = 2;
+  string id = 3;
+}
+
+message SafetySetting {
+  string category = 1;
+  string threshold = 2;
+}
+
+message GenerationConfig {
+  int32 max_output_tokens = 1;
+  float temperature = 2;
+  float top_p = 3;
+  int32 thinking_budget = 4;
+  bool include_thoughts = 5;
+  repeated string stop_sequences = 6;
+  message ThinkingConfig {
+    bool include_thoughts = 1;
+    int32 thinking_budget = 2;
+  }
+  ThinkingConfig thinking_config = 7;
+}
+
+message Tool {
+  repeated FunctionDeclaration function_declarations = 1;
+}
+
+message FunctionDeclaration {
+  string name = 1;
+  string description = 2;
+  google.protobuf.Struct parameters = 3;
+}
+
+message ToolConfig {
+  message FunctionCallingConfig {
+    string mode = 1;  // "AUTO", "ANY", "NONE", "VALIDATED"
+    repeated string allowed_function_names = 2;
+  }
+  FunctionCallingConfig function_calling_config = 1;
+}
+
+message Candidate {
+  Content content = 1;
+  string finish_reason = 2;
+  int32 index = 3;
+}
+
+// ─── GenerateContent ─────────────────────────────────────────────────
+
+message GenerateContentRequest {
+  string project = 1;
+  string model = 2;
+  string request_type = 3;
+  string user_agent = 4;
+  string request_id = 5;
+
+  message InnerRequest {
+    repeated Content contents = 1;
+    Content system_instruction = 2;
+    GenerationConfig generation_config = 3;
+    repeated Tool tools = 4;
+    repeated SafetySetting safety_settings = 5;
+    ToolConfig tool_config = 6;
+    string session_id = 7;
+  }
+
+  InnerRequest request = 10;
+}
+
+message GenerateContentResponse {
+  message Response {
+    repeated Candidate candidates = 1;
+  }
+  Response response = 1;
+}
+
+// ─── StreamGenerateContent ────────────────────────────────────────────
+
+message StreamGenerateContentChunk {
+  GenerateContentResponse.Response response = 1;
+}
+
+// ─── FetchAvailableModels ────────────────────────────────────────────
+
+message FetchAvailableModelsRequest {
+  string project = 1;
+}
+
+message FetchAvailableModelsResponse {
+  message ModelInfo {
+    string name = 1;
+    string display_name = 2;
+    string description = 3;
+    int64 context_window = 4;
+  }
+  repeated ModelInfo models = 1;
+}
+
+// ─── CountTokens ──────────────────────────────────────────────────────
+
+message CountTokensRequest {
+  string project = 1;
+  string model = 2;
+  repeated Content contents = 3;
+}
+
+message CountTokensResponse {
+  int32 total_tokens = 1;
+}
+
+// ─── RetrieveUserQuota ───────────────────────────────────────────────
+
+message RetrieveUserQuotaRequest {
+  string project = 1;
+}
+
+message RetrieveUserQuotaResponse {
+  int64 daily_limit = 1;
+  int64 daily_usage = 2;
+  int64 daily_remaining = 3;
+}
+
+// ─── Service ──────────────────────────────────────────────────────────
+
+service PredictionService {
+  rpc GenerateContent(GenerateContentRequest) returns (GenerateContentResponse);
+  rpc StreamGenerateContent(GenerateContentRequest) returns (stream StreamGenerateContentChunk);
+  rpc FetchAvailableModels(FetchAvailableModelsRequest) returns (FetchAvailableModelsResponse);
+  rpc CountTokens(CountTokensRequest) returns (CountTokensResponse);
+  rpc RetrieveUserQuota(RetrieveUserQuotaRequest) returns (RetrieveUserQuotaResponse);
+}
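The proto above names fields in snake_case, while the REST probe in `test-antigravity.sh` sends the same envelope with camelCase keys. A minimal sketch of building that REST-shaped body in Python follows. `build_generate_request` is a hypothetical helper; the `requestType` and `userAgent` values are copied from the test script, and whether the server accepts other values is not confirmed here.

```python
import json
import time


def build_generate_request(project: str, model: str, prompt: str,
                           max_output_tokens: int = 256) -> dict:
    """Build a REST-shaped body mirroring GenerateContentRequest (camelCase keys)."""
    return {
        "project": project,
        "model": model,
        "requestType": "agent",                      # value taken from the test script
        "userAgent": "antigravity/2.0.6 linux/x64",  # value taken from the test script
        "requestId": f"t{int(time.time())}",
        "request": {
            "contents": [{"role": "user", "parts": [{"text": prompt}]}],
            "sessionId": f"t{time.time_ns()}",
            "generationConfig": {"maxOutputTokens": max_output_tokens},
        },
    }


if __name__ == "__main__":
    body = build_generate_request("my-project", "gemini-3-flash", "Say hi")
    print(json.dumps(body, indent=2))
```

The nested `request` object corresponds to `InnerRequest` (field 10), and the top-level keys correspond to fields 1 to 5 of `GenerateContentRequest`.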
--- a/src/antigravity_grpc/proto/google/api/annotations.proto
+++ b/src/antigravity_grpc/proto/google/api/annotations.proto
@@ -0,0 +1,14 @@
+// Minimal google/api/annotations.proto for code generation.
+
+syntax = "proto3";
+
+package google.api;
+
+import "google/api/http.proto";
+import "google/protobuf/descriptor.proto";
+
+option go_package = "google.golang.org/genproto/googleapis/api/annotations";
+
+extend google.protobuf.MethodOptions {
+  HttpRule http = 72295728;
+}
--- a/src/antigravity_grpc/proto/google/api/http.proto
+++ b/src/antigravity_grpc/proto/google/api/http.proto
@@ -0,0 +1,18 @@
+// Minimal google/api/http.proto for code generation.
+
+syntax = "proto3";
+
+package google.api;
+
+option go_package = "google.golang.org/genproto/googleapis/api/annotations";
+
+message HttpRule {
+  string get = 2;
+  string put = 3;
+  string post = 4;
+  string delete = 5;
+  string patch = 6;
+  repeated HttpRule additional_bindings = 11;
+  string body = 7;
+  string response_body = 12;
+}
--- a/src/cleanup-codex-stale.py
+++ b/src/cleanup-codex-stale.py
@@ -0,0 +1,101 @@
+#!/usr/bin/env python3
+"""Cleanup stale Codex Launcher processes and artifacts — cross-platform.
+
+Kills registered process groups and removes stale PID/socket files left
+by previous Codex Launcher sessions.
+
+Windows: uses taskkill /F /T /PID
+Linux: uses os.killpg (SIGTERM, then SIGKILL after 0.5 s)
+"""
+
+import json, os, sys, subprocess, time
+from pathlib import Path
+
+IS_WINDOWS = sys.platform == "win32"
+
+if IS_WINDOWS:
+    _local = os.environ.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local"))
+    PID_REGISTRY = Path(_local) / "codex-proxy" / "pids.json"
+    CODEX_DIR = Path.home() / ".codex"
+    _local_share = Path(_local)
+    _cache = Path(_local)
+else:
+    PID_REGISTRY = Path.home() / ".cache" / "codex-proxy" / "pids.json"
+    CODEX_DIR = Path.home() / ".codex"
+    _local_share = Path.home() / ".local" / "share"
+    _cache = Path.home() / ".cache"
+
+
+def kill_group(pid):
+    if IS_WINDOWS:
+        subprocess.run(["taskkill", "/F", "/T", "/PID", str(pid)],
+                       capture_output=True, timeout=10)
+    else:
+        import signal
+        try:
+            pgid = os.getpgid(pid)
+            os.killpg(pgid, signal.SIGTERM)
+            time.sleep(0.5)
+            try:
+                os.killpg(pgid, signal.SIGKILL)
+            except OSError:
+                pass
+        except OSError:
+            pass
+
+
+def main():
+    print("[cleanup] Cleaning up stale Codex Launcher processes...", file=sys.stderr)
+
+    if PID_REGISTRY.exists():
+        try:
+            with open(PID_REGISTRY) as f:
+                registry = json.load(f)
+        except Exception as e:
+            print(f"[cleanup] Failed to read PID registry: {e}", file=sys.stderr)
+            registry = {}
+
+        for kind, info in registry.items():
+            pid = info.get("pid") if isinstance(info, dict) else info
+            if pid and isinstance(pid, int):
+                print(f"[cleanup] Killing {kind} (PID {pid})", file=sys.stderr)
+                kill_group(pid)
+
+        try:
+            PID_REGISTRY.unlink()
+        except OSError:
+            pass
+    else:
+        print("[cleanup] No PID registry found — nothing to stop", file=sys.stderr)
+
+    stale_files = []
+    if IS_WINDOWS:
+        stale_files = [
+            _cache / "codex-desktop" / ".codex-desktop-pid",
+            _cache / "codex-desktop" / ".webview-pid",
+        ]
+    else:
+        stale_files = [
+            CODEX_DIR / ".launch-action-socket",
+            CODEX_DIR / ".codex-desktop-launch-action",
+            CODEX_DIR / ".codex-desktop-pid",
+            CODEX_DIR / ".webview-pid",
+            _local_share / "codex-desktop" / ".codex-desktop-pid",
+            _local_share / "codex-desktop" / ".webview-pid",
+            _cache / "codex-desktop" / ".codex-desktop-pid",
+            _cache / "codex-desktop" / ".webview-pid",
+        ]
+
+    for fp in stale_files:
+        try:
+            if fp.exists():
+                fp.unlink()
+                print(f"[cleanup] Removed {fp}", file=sys.stderr)
+        except OSError:
+            pass
+
+    print("[cleanup] Done", file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
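The registry loop in `main()` accepts two entry shapes: a dict with a `pid` key, or a bare integer. A minimal sketch of that parsing step as a standalone function is shown below. `read_registered_pids` is a hypothetical name, and the extra guards (non-dict registry, booleans, non-positive values) are assumptions added for illustration rather than behaviour of the script above.

```python
import json
import tempfile
from pathlib import Path


def read_registered_pids(registry_path: Path) -> dict:
    """Return {kind: pid} for entries whose pid is a positive integer."""
    try:
        registry = json.loads(registry_path.read_text())
    except (OSError, ValueError):
        # Missing file or invalid JSON: treat as an empty registry.
        return {}
    if not isinstance(registry, dict):
        return {}
    pids = {}
    for kind, info in registry.items():
        pid = info.get("pid") if isinstance(info, dict) else info
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(pid, int) and not isinstance(pid, bool) and pid > 0:
            pids[kind] = pid
    return pids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "pids.json"
        path.write_text(json.dumps({"proxy": {"pid": 4242}, "desktop": 4243, "bad": "x"}))
        print(read_registered_pids(path))
```

Separating the parse from the kill step makes the registry handling testable without terminating any process.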
--- a/src/cleanup-codex-stale.sh
+++ b/src/cleanup-codex-stale.sh
@@ -1,42 +1,51 @@
 #!/bin/bash
-# Cleanup script for Codex Desktop - kills stale processes before launch
+# Cleanup script for Codex Launcher - kills only launcher-owned processes.

-echo "Cleaning up stale Codex processes..." >&2
+set -u

-# Kill codex app-server processes
-for pid in $(ps aux 2>/dev/null | grep -E "codex .*app-server" | grep -v grep | awk '{print $2}'); do
-  kill -9 "$pid" 2>/dev/null || true
-  echo "  Killed app-server pid=$pid"
+REGISTRY="${HOME}/.cache/codex-launcher/pids.json"
+
+echo "Cleaning up launcher-owned processes..." >&2
+
+kill_group() {
+  kind="$1"
+  pgid="$2"
+
+  if [ -z "$pgid" ] || [ "$pgid" = "null" ]; then
+    return 0
+  fi
+
+  if kill -TERM -- "-$pgid" 2>/dev/null; then
+    echo "  Stopped ${kind} pgid=${pgid}"
+    return 0
+  fi
+
+  return 0
+}
+
+if [ -f "$REGISTRY" ]; then
+  python3 - "$REGISTRY" <<'PY'
+import json, sys
+from pathlib import Path
+
+path = Path(sys.argv[1])
+try:
+    data = json.loads(path.read_text())
+except Exception:
+    data = {}
+
+for kind, meta in sorted(data.items()):
+    pgid = meta.get('pgid') if isinstance(meta, dict) else None
+    if pgid:
+        print(f'{kind}\t{pgid}')
+PY
+else
+  echo "  No registry found; nothing to stop" >&2
+fi | while IFS=$'\t' read -r kind pgid; do
+  [ -n "${kind:-}" ] || continue
+  kill_group "$kind" "$pgid"
 done

-# Kill webview server
-for pid in $(ps aux 2>/dev/null | grep webview-server.py | grep -v grep | awk '{print $2}'); do
-  kill -9 "$pid" 2>/dev/null || true
-  echo "  Killed webview-server pid=$pid"
-done
-
-# Kill main electron process for codex-desktop
-for pid in $(ps aux 2>/dev/null | grep "/opt/codex-desktop/electron" | grep "class=codex-desktop" | grep -v grep | awk '{print $2}'); do
-  kill -9 "$pid" 2>/dev/null || true
-  echo "  Killed electron pid=$pid"
-done
-
-# Kill all remaining child processes of codex-desktop
-for pid in $(ps aux 2>/dev/null | grep "/opt/codex-desktop/" | grep -v grep | awk '{print $2}'); do
-  kill -9 "$pid" 2>/dev/null || true
-done
-
-# Kill zai proxy (if any)
-for pid in $(ps aux 2>/dev/null | grep zai-proxy.py | grep -v grep | awk '{print $2}'); do
-  kill "$pid" 2>/dev/null || true
-done
-
-# Kill unified translation proxy (if any)
-for pid in $(ps aux 2>/dev/null | grep translate-proxy.py | grep -v grep | awk '{print $2}'); do
-  kill "$pid" 2>/dev/null || true
-done
-
-# Remove stale socket and PID files
 rm -f "$HOME/.codex/.launch-action-socket" 2>/dev/null || true
 rm -f "$HOME/.codex/.codex-desktop-launch-action" 2>/dev/null || true
 rm -f "$HOME/.local/share/codex-desktop/.launch-action-socket" 2>/dev/null || true
@@ -46,12 +55,4 @@ rm -f "$HOME/.cache/codex-desktop/.codex-desktop-pid" 2>/dev/null || true
 rm -f "$HOME/.local/share/codex-desktop/.webview-pid" 2>/dev/null || true
 rm -f "$HOME/.cache/codex-desktop/.webview-pid" 2>/dev/null || true

-sleep 1
-
-# Verify no remaining process on port 5175 (webview)
-if lsof -ti :5175 2>/dev/null | grep -q .; then
-  echo "  Warning: Port 5175 still in use"
-  lsof -ti :5175 2>/dev/null | xargs kill -9 2>/dev/null || true
-fi
-
 echo "Cleanup complete"
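The rewritten shell script stops a whole process group with `kill -TERM -- "-$pgid"` instead of grepping `ps` output. A minimal POSIX-only sketch of the same idea in Python follows. `stop_process_group` is a hypothetical helper, and the SIGKILL escalation after a grace period is an assumption borrowed from `cleanup-codex-stale.py`, since the shell script itself sends only SIGTERM.

```python
import os
import signal
import subprocess
import sys
import time


def stop_process_group(pgid: int, grace_s: float = 2.0) -> bool:
    """Send SIGTERM to a process group, then SIGKILL if it is still present.

    Returns False when the group did not exist, True otherwise.
    """
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return False
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        try:
            os.killpg(pgid, 0)  # signal 0 only checks that the group exists
        except ProcessLookupError:
            return True
        time.sleep(0.05)
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    return True


if __name__ == "__main__":
    # start_new_session=True makes the child the leader of its own group,
    # so its pid doubles as the pgid.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,
    )
    print("stopped:", stop_process_group(child.pid, grace_s=0.5))
    print("exit code:", child.wait(timeout=5))
```

Targeting a recorded group ID limits the cleanup to processes the launcher started, which is the stated goal of the new script header.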
--- a/src/codex-launcher-gui
+++ b/src/codex-launcher-gui
--- a/src/codex-launcher-gui.py
+++ b/src/codex-launcher-gui.py
--- a/src/codex_launcher_lib.py
+++ b/src/codex_launcher_lib.py
--- a/src/translate-proxy.py
+++ b/src/translate-proxy.py
--- a/test-antigravity.sh
+++ b/test-antigravity.sh
@@ -0,0 +1,482 @@
+#!/usr/bin/env bash
+# ═══════════════════════════════════════════════════════════════════
+# test-antigravity.sh — End-to-end Antigravity proxy test + real task
+#
+# Phases:
+#   1. Token validity
+#   2. Direct REST endpoint probe
+#   3. Proxy adapter (start proxy, test /responses)
+#   4. Real Codex CLI task (if --task flag given)
+#   5. Anomaly detection + analysis
+#
+# Usage:
+#   bash ~/.local/bin/test-antigravity.sh              # quick tests
+#   bash ~/.local/bin/test-antigravity.sh --task        # + real CLI task
+#   bash ~/.local/bin/test-antigravity.sh --verbose     # show all logs
+# Exit:  0 = all pass, 1 = some fail
+# ═══════════════════════════════════════════════════════════════════
+set -uo pipefail
+
+VERBOSE=0; RUN_TASK=0
+for arg in "$@"; do
+    case "$arg" in
+        --verbose|-v) VERBOSE=1 ;;
+        --task|-t) RUN_TASK=1 ;;
+    esac
+done
+
+RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; CYAN='\033[0;36m'; NC='\033[0m'
+PASS=0; FAIL=0; SKIP=0; RESULTS=()
+log_pass() { echo -e "  ${GREEN}PASS${NC} $1"; ((PASS++)); RESULTS+=("PASS $1"); }
+log_fail() { echo -e "  ${RED}FAIL${NC} $1"; ((FAIL++)); RESULTS+=("FAIL $1"); }
+log_skip() { echo -e "  ${YELLOW}SKIP${NC} $1"; ((SKIP++)); RESULTS+=("SKIP $1"); }
+log_info() { echo -e "  ${CYAN}INFO${NC} $1"; }
+
+TOKEN_PATH="$HOME/.cache/codex-proxy/google-antigravity-oauth-token.json"
+[ ! -f "$TOKEN_PATH" ] && { echo "ERROR: No token file. Login via GUI first."; exit 1; }
+
+ACCESS_TOKEN=$(python3 -c "
+import json, os, sys, time, urllib.request, urllib.parse
+tp = os.path.expanduser('~/.cache/codex-proxy/google-antigravity-oauth-token.json')
+d = json.load(open(tp))
+if d.get('expires_at', 0) > time.time(): print(d['access_token']); sys.exit(0)
+cid, cs, rt = d.get('client_id',''), d.get('client_secret',''), d.get('refresh_token','')
+if not all([cid, cs, rt]): print('ERROR'); sys.exit(1)
+data = urllib.parse.urlencode({'client_id':cid,'client_secret':cs,'refresh_token':rt,'grant_type':'refresh_token'}).encode()
+resp = urllib.request.urlopen(urllib.request.Request('https://oauth2.googleapis.com/token', data=data), timeout=15)
+tok = json.loads(resp.read()); d.update(tok); d['expires_at'] = time.time() + tok.get('expires_in',3600)
+json.dump(d, open(tp,'w')); print(tok.get('access_token','ERROR'))
+" 2>&1) || true
+[[ "$ACCESS_TOKEN" == ERROR* ]] || [ -z "$ACCESS_TOKEN" ] && { echo "ERROR: Token refresh failed: $ACCESS_TOKEN"; exit 1; }
+
+PROJECT_ID=$(python3 -c "import json; print(json.load(open('$TOKEN_PATH')).get('project_id',''))")
+[ -z "$PROJECT_ID" ] && { echo "ERROR: No project_id"; exit 1; }
+
+echo "═══════════════════════════════════════════════════════════════"
+echo " Antigravity E2E Test Suite"
+echo "═══════════════════════════════════════════════════════════════"
+echo " Project: $PROJECT_ID  Token: ${ACCESS_TOKEN:0:20}..."
+
+# ── Test 1: Token validity ────────────────────────────────────────
+echo ""; echo "─── Test 1: Token Validity ───"
+HTTP=$(curl -s -o /dev/null -w "%{http_code}" -H "Authorization: Bearer $ACCESS_TOKEN" \
+    "https://www.googleapis.com/oauth2/v1/userinfo" --max-time 5)
+[ "$HTTP" = "200" ] && log_pass "Token valid" || log_fail "Token invalid (HTTP $HTTP)"
+
+# ── Test 2: Direct REST probe (prod first, fast timeout) ─────────
+echo ""; echo "─── Test 2: Direct REST Endpoint Probe ───"
+ENDPOINTS=(
+    "https://cloudcode-pa.googleapis.com"
+    "https://daily-cloudcode-pa.sandbox.googleapis.com"
+    "https://autopush-cloudcode-pa.sandbox.googleapis.com"
+)
+MODELS=("gemini-3-flash")
+BEST_EP=""; BEST_MODEL=""
+
+for model in "${MODELS[@]}"; do
+    for ep in "${ENDPOINTS[@]}"; do
+        ep_s=$(echo "$ep" | sed 's|https://||;s|.googleapis.com||')
+        RESP=$(curl -s -w "\n%{http_code}" -X POST "${ep}/v1internal:generateContent" \
+            -H "Content-Type: application/json" \
+            -H "Authorization: Bearer $ACCESS_TOKEN" \
+            -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Antigravity/2.0.6 Chrome/138.0.7204.235 Electron/37.3.1 Safari/537.36" \
+            -H 'Client-Metadata: {"ideType":"ANTIGRAVITY","platform":"LINUX","pluginType":"GEMINI"}' \
+            -d "{\"project\":\"$PROJECT_ID\",\"model\":\"$model\",\"requestType\":\"agent\",\"userAgent\":\"antigravity/2.0.6 linux/x64\",\"requestId\":\"t$(date +%s)\",\"request\":{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"Say hi\"}]}],\"sessionId\":\"t$(date +%s%N)\",\"generationConfig\":{\"maxOutputTokens\":256}}}" \
+            --connect-timeout 5 --max-time 20 2>&1)
+        HTTP=$(echo "$RESP" | tail -1); BODY=$(echo "$RESP" | sed '$d')
+        if [ "$HTTP" = "200" ]; then
+            TEXT=$(echo "$BODY" | python3 -c "
+import sys, json
+try:
+    d = json.load(sys.stdin)
+    parts = d.get('response',{}).get('candidates',[{}])[0].get('content',{}).get('parts',[])
+    texts = [p['text'] for p in parts if 'text' in p and p['text']]
+    print(' '.join(texts)[:80] if texts else 'EMPTY')
+except: print('EMPTY')" 2>/dev/null)
+            if [ "$TEXT" != "EMPTY" ] && ! echo "$TEXT" | grep -qi "no longer supported"; then
+                log_pass "$model @ ${ep_s} → \"$TEXT\""
+                [ -z "$BEST_EP" ] && BEST_EP="$ep" && BEST_MODEL="$model"
+            else
+                log_fail "$model @ ${ep_s} → 200 but empty/deprecated"
+            fi
+        else
+            ERR=$(echo "$BODY" | python3 -c "
+import sys, json
+try: print(json.load(sys.stdin).get('error',{}).get('status','')[:50])
+except: pass" 2>/dev/null)
+            log_skip "$model @ ${ep_s} → $HTTP $ERR"
+        fi
+    done
+done
+
+# ── Test 3: Proxy adapter (start proxy, test /responses) ──────────
+echo ""; echo "─── Test 3: Proxy Adapter (end-to-end) ───"
+set +e
+
+TEST_PORT=$(python3 -c "import socket; s=socket.socket(); s.bind(('',0)); print(s.getsockname()[1]); s.close()")
+PROXY_API_KEY="test-$RANDOM"
+
+find "$HOME/.local/bin" -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null; true
+
+PROXY_PID=""
+export PROXY_PORT=$TEST_PORT
+export PROXY_API_KEY=$PROXY_API_KEY
+export PROXY_BACKEND=gemini-oauth-antigravity
+export PROXY_TARGET_URL=https://cloudcode-pa.googleapis.com
+python3 "$HOME/.local/bin/translate-proxy.py" >/tmp/antigravity-test-proxy.log 2>&1 &
+PROXY_PID=$!
+
+cleanup() { kill $PROXY_PID 2>/dev/null || true; wait $PROXY_PID 2>/dev/null || true; }
+trap cleanup EXIT
+
+sleep 3
+if ! kill -0 $PROXY_PID 2>/dev/null; then
+    log_fail "Proxy failed to start (port $TEST_PORT)"
+    tail -5 /tmp/antigravity-test-proxy.log 2>/dev/null
+else
+    log_pass "Proxy started on :$TEST_PORT"
+
+    # /v1/models
+    HTTP=$(curl -s -o /dev/null -w "%{http_code}" -H "Authorization: Bearer $PROXY_API_KEY" \
+        "http://127.0.0.1:$TEST_PORT/v1/models" --max-time 5)
+    [ "$HTTP" = "200" ] && log_pass "/v1/models → 200" || log_fail "/v1/models → $HTTP"
+
+    # /responses (non-stream)
+    RESP_HTTP=$(curl -s -w "%{http_code}" -o /tmp/antigravity-test-response.json \
+        -X POST "http://127.0.0.1:$TEST_PORT/responses" \
+        -H "Content-Type: application/json" \
+        -H "Authorization: Bearer $PROXY_API_KEY" \
+        -d '{
+            "model":"gemini-3.5-flash-high",
+            "stream":false,
+            "input":[{"type":"message","role":"user","content":[{"type":"input_text","text":"Say hello in exactly 3 words"}]}],
+            "tools":[{"type":"function","name":"test_tool","description":"test","parameters":{"type":"object","properties":{"cmd":{"type":"string"}}}}],
+            "instructions":"You are a helpful assistant.",
+            "max_output_tokens":256
+        }' --connect-timeout 10 --max-time 60 2>&1)
+
+    if [ "$RESP_HTTP" = "200" ]; then
+        TEXT=$(python3 -c "
+import json
+d = json.load(open('/tmp/antigravity-test-response.json'))
+out = d.get('output', [])
+texts = []
+for item in out:
+    for p in (item.get('content', []) if isinstance(item, dict) else []):
+        if isinstance(p, dict): texts.append(p.get('text', ''))
+print(' '.join(t for t in texts if t).strip()[:120] or 'EMPTY')
+" 2>/dev/null)
+        if [ "$TEXT" = "EMPTY" ]; then
+            log_fail "Proxy /responses → 200 but EMPTY"
+        else
+            log_pass "Proxy /responses → 200: \"$TEXT\""
+        fi
+    else
+        ERR=$(python3 -c "
+import json; d = json.load(open('/tmp/antigravity-test-response.json'))
+print(d.get('error',{}).get('message','')[:120])" 2>/dev/null || echo "unknown")
+        log_fail "Proxy /responses → $RESP_HTTP: $ERR"
+    fi
+
+    # Verify model resolution in logs
+    if grep -q "model resolved: gemini-3.5-flash-high -> gemini-3-flash" /tmp/antigravity-test-proxy.log; then
+        log_pass "Model resolution: gemini-3.5-flash-high → gemini-3-flash"
+    else
+        log_fail "Model resolution not found in proxy logs"
+    fi
+
+    [ "$VERBOSE" = "1" ] && cat /tmp/antigravity-test-proxy.log
+fi
+
+# ── Test 4: Real Codex CLI Task ────────────────────────────────────
+if [ "$RUN_TASK" = "1" ]; then
+    echo ""; echo "─── Test 4: Real Codex CLI Task ───"
+
+    if ! command -v codex &>/dev/null; then
+        log_skip "Codex CLI not found"
+    else
+        CLI_VERSION=$(codex --version 2>/dev/null || echo "unknown")
+        log_info "Codex CLI: $CLI_VERSION"
+
+        TASK_PROMPT='Create a file /tmp/e2e-test-output.txt with the text "Hello from Codex CLI E2E test" followed by the current date. Then read it back and confirm the content is correct. This is a simple smoke test.'
+
+        TASK_WORKSPACE="/tmp/e2e-test-workspace"
+        mkdir -p "$TASK_WORKSPACE"
+
+        mkdir -p /tmp/antigravity-task-logs
+        TASK_PROXY_LOG="/tmp/antigravity-task-logs/proxy-$(date +%s).log"
+        TASK_CLI_LOG="/tmp/antigravity-task-logs/cli-$(date +%s).log"
+        TASK_MONITOR_LOG="/tmp/antigravity-task-logs/monitor-$(date +%s).log"
+
+        # Set up proxy for CLI task (use the one already running on TEST_PORT)
+        # Write codex profile + config pointing to our test proxy
+        CONFIG_DIR="$HOME/.codex"
+        CONFIG_FILE="$CONFIG_DIR/config.toml"
+        CONFIG_BACKUP="$CONFIG_DIR/config.toml.task-backup"
+
+        mkdir -p "$CONFIG_DIR"
+        [ -f "$CONFIG_FILE" ] && cp "$CONFIG_FILE" "$CONFIG_BACKUP"
+
+        # Generate model catalog
+        CATALOG_PATH="$HOME/.cache/codex-proxy/models-Antigravity-Test.json"
+        python3 -c "
+import json, os
+models = ['gemini-3.5-flash-high', 'gemini-3.5-flash-medium', 'gemini-3.5-flash-low',
+          'gemini-3.1-pro-high', 'gemini-3.1-pro-low',
+          'claude-sonnet-4-6', 'claude-opus-4-6-thinking', 'gpt-oss-120b-medium']
+catalog = []
+for m in models:
+    catalog.append({'slug':m,'model':m,'display_name':m,'description':'Antigravity '+m,'hidden':False,'isDefault':m=='gemini-3.5-flash-high','shell_type':'shell_command','visibility':'list','default_reasoning_level':'medium','supported_reasoning_levels':[{'effort':'low','description':'Fast'},{'effort':'medium','description':'Balanced'},{'effort':'high','description':'Deep'}]})
+os.makedirs(os.path.dirname('$CATALOG_PATH'), exist_ok=True)
+json.dump(catalog, open('$CATALOG_PATH','w'), indent=2)
+" || log_fail "Failed to create model catalog"
+
+        # Write main config
+        cat > "$CONFIG_FILE" <<CONFEOF
+model = "gemini-3.5-flash-high"
+model_provider = "Antigravity Test"
+model_catalog_json = "$CATALOG_PATH"
+
+[model_providers."Antigravity Test"]
+name = "Antigravity Test"
+base_url = "http://127.0.0.1:$TEST_PORT"
+experimental_bearer_token = "$PROXY_API_KEY"
+wire_api = "responses"
+request_max_retries = 1
+stream_max_retries = 0
+stream_idle_timeout_ms = 600000
+
+[projects."/home/roman/Codex-Launcher-Any-AI-Provider"]
+trust_level = "trusted"
+CONFEOF
+
+        # Write profile file for Codex CLI 0.134.0+
+        PROFILE_FILE="$CONFIG_DIR/Antigravity-Test.config.toml"
+        cat > "$PROFILE_FILE" <<PROFEOF
+model = "gemini-3.5-flash-high"
+model_provider = "Antigravity Test"
+model_catalog_json = "$CATALOG_PATH"
+service_tier = "fast"
+approvals_reviewer = "user"
+PROFEOF
+
+        log_info "Config written: profile=Antigravity-Test, port=$TEST_PORT"
+
+        # ── Anomaly monitor (background) ──
+        ANOMALY_FOUND=0
+        (
+            PROXY_LOG="/tmp/antigravity-test-proxy.log"
+            START_TIME=$(date +%s)
+            TIMEOUT_SEC=600
+            PREV_LINE_COUNT=0
+            STALL_COUNT=0
+            LOOP_DETECTOR=""
+            LOOP_COUNT=0
+
+            while true; do
+                sleep 10
+                [ ! -f "$PROXY_LOG" ] && continue
+
+                NOW=$(date +%s)
+                ELAPSED=$(( NOW - START_TIME ))
+                [ "$ELAPSED" -gt "$TIMEOUT_SEC" ] && {
+                    echo "[MONITOR] TIMEOUT: Task exceeded ${TIMEOUT_SEC}s" >> "$TASK_MONITOR_LOG"
+                    break
+                }
+
+                # Check proxy is alive
+                if ! kill -0 $PROXY_PID 2>/dev/null; then
+                    echo "[MONITOR] FATAL: Proxy process died" >> "$TASK_MONITOR_LOG"
+                    break
+                fi
+
+                # Count lines in proxy log
+                LINE_COUNT=$(wc -l < "$PROXY_LOG" 2>/dev/null || echo 0)
+                NEW_LINES=$(( LINE_COUNT - PREV_LINE_COUNT ))
+                PREV_LINE_COUNT=$LINE_COUNT
+
+                # Stall detection: no new log lines for 18 consecutive 10 s checks (180 s) = stalled
+                if [ "$NEW_LINES" -eq 0 ]; then
+                    STALL_COUNT=$(( STALL_COUNT + 1 ))
+                    if [ "$STALL_COUNT" -ge 18 ]; then
+                        echo "[MONITOR] STALL: No proxy activity for 180s" >> "$TASK_MONITOR_LOG"
+                    fi
+                else
+                    STALL_COUNT=0
+                fi
+
+                # Loop detection: check if same tool call repeats
+                RECENT=$(tail -50 "$PROXY_LOG" 2>/dev/null | grep "exec_command" | tail -5 | md5sum | cut -c1-8)
+                if [ -n "$RECENT" ] && [ "$RECENT" = "$LOOP_DETECTOR" ]; then
+                    LOOP_COUNT=$(( LOOP_COUNT + 1 ))
+                    if [ "$LOOP_COUNT" -ge 6 ]; then
+                        echo "[MONITOR] LOOP: Same tool calls repeating ($LOOP_COUNT times)" >> "$TASK_MONITOR_LOG"
+                    fi
+                else
+                    LOOP_DETECTOR="$RECENT"
+                    LOOP_COUNT=0
+                fi
+
+                # Check for error patterns
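+                # grep -c prints 0 itself when nothing matches (and exits 1), so "|| echo 0" would emit a second 0
+                ERRORS=$(tail -100 "$PROXY_LOG" 2>/dev/null | grep -ciE "error|failed|timeout|500|502|503|429" || true)
+                if [ "${ERRORS:-0}" -gt 10 ]; then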
+                    echo "[MONITOR] ERRORS: $ERRORS error lines in last 100 log lines" >> "$TASK_MONITOR_LOG"
+                fi
+
+                # Check for compaction issues
+                COMPACT_LINES=$(tail -200 "$PROXY_LOG" 2>/dev/null | grep -c "compacted\|compaction\|trimming" || true)
+                if [ "${COMPACT_LINES:-0}" -gt 20 ]; then
+                    echo "[MONITOR] COMPACTION: Excessive compaction ($COMPACT_LINES events)" >> "$TASK_MONITOR_LOG"
+                fi
+
+                # Check context item count
+                HIGH_ITEM=$(tail -200 "$PROXY_LOG" 2>/dev/null | grep -oP '\[\d+\]' | grep -oP '\d+' | sort -rn | head -1 || echo 0)
+                if [ -n "$HIGH_ITEM" ] && [ "$HIGH_ITEM" -gt 100 ]; then
+                    echo "[MONITOR] CONTEXT: High item count detected: [$HIGH_ITEM]" >> "$TASK_MONITOR_LOG"
+                fi
+
+                # Log heartbeat
+                echo "[MONITOR] ${ELAPSED}s elapsed, ${LINE_COUNT} log lines, ${NEW_LINES} new, ${ERRORS} errors" >> "$TASK_MONITOR_LOG"
+            done
+        ) &
+        MONITOR_PID=$!
+
+        # ── Launch Codex CLI with the task ──
+        log_info "Launching Codex CLI with real task..."
+        log_info "Task: Create and verify a simple test file"
+        log_info "Monitor log: $TASK_MONITOR_LOG"
+
+        cd "$TASK_WORKSPACE"
+
+        set +e
+        codex exec --profile Antigravity-Test -c "model=gemini-3.5-flash-high" \
+            -c 'sandbox_permissions=["disk-full-read-access","disk-full-write-access"]' \
+            "$TASK_PROMPT" \
+            > "$TASK_CLI_LOG" 2>&1
+        CLI_EXIT=$?
+        set -e
+
+        # Stop monitor
+        kill $MONITOR_PID 2>/dev/null || true
+        wait $MONITOR_PID 2>/dev/null || true
+
+        CLI_LINES=$(wc -l < "$TASK_CLI_LOG" 2>/dev/null || echo 0)
+        log_info "CLI exited (code $CLI_EXIT, $CLI_LINES output lines)"
+
+        # ── Analyze results ──
+        echo ""; echo "─── Test 4a: CLI Task Results ───"
+
+        if [ "$CLI_EXIT" -eq 0 ]; then
+            log_pass "CLI task completed successfully"
+        else
+            log_fail "CLI task failed (exit code $CLI_EXIT)"
+            echo "    Last 10 lines of CLI output:"
+            tail -10 "$TASK_CLI_LOG" 2>/dev/null | sed 's/^/    /'
+        fi
+
+        # Check monitor log for anomalies
+        echo ""; echo "─── Test 4b: Anomaly Analysis ───"
+        if [ -f "$TASK_MONITOR_LOG" ]; then
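+            # grep -c prints 0 itself when nothing matches (and exits 1); "|| true" only absorbs the exit status
+            ANOMALIES=$(grep -c "\[MONITOR\]" "$TASK_MONITOR_LOG" 2>/dev/null || true)
+            ANOMALIES=${ANOMALIES:-0}
+            CRITICAL=$(grep -cE "FATAL|LOOP|TIMEOUT|STALL|ERRORS|COMPACTION|CONTEXT" "$TASK_MONITOR_LOG" 2>/dev/null || true)
+            CRITICAL=${CRITICAL:-0}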
+            log_info "Monitor: $ANOMALIES checks, $CRITICAL anomalies detected"
+
+            if [ "$CRITICAL" -gt 0 ]; then
+                echo -e "  ${RED}ANOMALIES FOUND:${NC}"
+                grep -E "FATAL|LOOP|TIMEOUT|STALL|ERRORS|COMPACTION|CONTEXT" "$TASK_MONITOR_LOG" | while IFS= read -r line; do
+                    echo -e "    ${RED}$line${NC}"
+                done
+                log_fail "$CRITICAL anomalies detected during task"
+            else
+                log_pass "No anomalies detected during task"
+            fi
+
+            [ "$VERBOSE" = "1" ] && cat "$TASK_MONITOR_LOG"
+        else
+            log_skip "No monitor log produced"
+        fi
+
+        # Check proxy log for issues
+        echo ""; echo "─── Test 4c: Proxy Health ───"
+        if [ -f "/tmp/antigravity-test-proxy.log" ]; then
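+            # grep -c prints 0 itself when nothing matches (and exits 1); "|| true" only absorbs the exit status
+            ERROR_COUNT=$(grep -ciE "error|failed|exception|traceback" /tmp/antigravity-test-proxy.log || true)
+            TIMEOUT_COUNT=$(grep -ci "timeout\|timed.out" /tmp/antigravity-test-proxy.log || true)
+            COMPACT_COUNT=$(grep -c "compacted\|compaction" /tmp/antigravity-test-proxy.log || true)
+            ITEM_COUNT=$(grep -oP '\[\d+\]' /tmp/antigravity-test-proxy.log | grep -oP '\d+' | sort -rn | head -1 || true)
+            # The pipeline prints nothing when no "[N]" marker exists; default to 0 so the numeric tests below stay valid
+            ITEM_COUNT=${ITEM_COUNT:-0}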
+
+            log_info "Proxy errors: $ERROR_COUNT, timeouts: $TIMEOUT_COUNT, compactions: $COMPACT_COUNT, max context items: $ITEM_COUNT"
+
+            [ "$ERROR_COUNT" -gt 20 ] && log_fail "High error count: $ERROR_COUNT"
+            [ "$TIMEOUT_COUNT" -gt 5 ] && log_fail "Timeout count: $TIMEOUT_COUNT"
+            [ "$ITEM_COUNT" -gt 100 ] && log_fail "Context items grew to: $ITEM_COUNT (compaction may be failing)"
+            [ "$ITEM_COUNT" -le 100 ] && [ "$ITEM_COUNT" -gt 0 ] && log_pass "Context items stayed under 100 (max: $ITEM_COUNT)"
+
+            # Check for repeated identical tool calls (loop detection)
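+            DUPE_CALLS=$(grep "exec_command" /tmp/antigravity-test-proxy.log | sed 's/.*args=//' | sort | uniq -c | sort -rn | head -1 | awk '{print $1}' || true)
+            # Empty when the log contains no exec_command lines; default to 0 for the numeric test
+            DUPE_CALLS=${DUPE_CALLS:-0}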
+            if [ "$DUPE_CALLS" -gt 10 ]; then
+                log_fail "Loop detected: same tool call repeated $DUPE_CALLS times"
+            else
+                log_pass "No tool call loops (max repeat: $DUPE_CALLS)"
+            fi
+        fi
+
+        # Check if the file was actually created
+        echo ""; echo "─── Test 4d: Task Output Quality ───"
+        if [ -f "/tmp/e2e-test-output.txt" ]; then
+            CONTENT=$(cat /tmp/e2e-test-output.txt 2>/dev/null)
+            if echo "$CONTENT" | grep -q "Hello from Codex CLI E2E test"; then
+                log_pass "Task output file created with correct content"
+            else
+                log_fail "Task output file exists but content is wrong: $CONTENT"
+            fi
+        else
+            log_fail "Task output file /tmp/e2e-test-output.txt was NOT created"
+        fi
+
+        # Check proxy log for tool-strip events (budget cap defense)
+        echo ""; echo "─── Test 4e: Anti-Loop Defense Verification ───"
+        if [ -f "/tmp/antigravity-test-proxy.log" ]; then
+            # grep -c prints 0 itself when nothing matches (and exits 1); "|| true" only absorbs the exit status
+            NULL_TOOL_LOOPS=$(grep -c "NULL-TOOL LOOP" /tmp/antigravity-test-proxy.log || true)
+            TOOL_STRIPPED=$(grep -c "TOOLS STRIPPED" /tmp/antigravity-test-proxy.log || true)
+            BUDGET_HIT=$(grep -c "HARD CAP" /tmp/antigravity-test-proxy.log || true)
+            READ_LOOP=$(grep -c "FILE READ LOOP" /tmp/antigravity-test-proxy.log || true)
+            FORCE_FINALIZE=$(grep -c "force_finalize" /tmp/antigravity-test-proxy.log || true)
+
+            log_info "Anti-loop events: null-tool=$NULL_TOOL_LOOPS stripped=$TOOL_STRIPPED budget=$BUDGET_HIT read-loop=$READ_LOOP finalize=$FORCE_FINALIZE"
+
+            # For a simple task, none of these should fire
+            if [ "$BUDGET_HIT" -gt 0 ]; then
+                log_fail "Budget cap hit on simple task — model looping"
+            else
+                log_pass "No budget cap triggered (task completed cleanly)"
+            fi
+
+            if [ "$TOOL_STRIPPED" -gt 0 ]; then
+                log_fail "Tools were stripped — model hit hard limit"
+            else
+                log_pass "No tool stripping needed (model behaved)"
+            fi
+        fi
+
+        # Restore the original config, or remove the test config if none existed before
+        if [ -f "$CONFIG_BACKUP" ]; then
+            mv "$CONFIG_BACKUP" "$CONFIG_FILE"
+        else
+            rm -f "$CONFIG_FILE"
+        fi
+        rm -f "$PROFILE_FILE"
+
+        log_info "Config restored"
+    fi
+fi
+
+# ── Summary ───────────────────────────────────────────────────────
+echo ""
+echo "═══════════════════════════════════════════════════════════════"
+echo " Results: $PASS passed, $FAIL failed, $SKIP skipped"
+echo "═══════════════════════════════════════════════════════════════"
+[ -n "$BEST_EP" ] && echo -e " ${GREEN}Best direct:${NC} $BEST_MODEL @ $BEST_EP"
+
+if [ "$FAIL" -gt 0 ]; then
+    echo -e "\n${RED}FAILED — Do NOT push until all tests pass${NC}"
+    for r in "${RESULTS[@]}"; do echo "$r" | grep -q "^FAIL" && echo "  $r"; done
+    exit 1
+else
+    echo -e "\n${GREEN}ALL TESTS PASSED — Safe to push${NC}"
+    exit 0
+fi
--- a/tests/__init__.py
+++ b/tests/__init__.py
--- a/tests/test_antigravity_grpc.py
+++ b/tests/test_antigravity_grpc.py
@@ -0,0 +1,396 @@
+#!/usr/bin/env python3
+"""
+Unit tests for the Antigravity gRPC fallback module.
+
+Tests cover:
+1. Module import and availability detection
+2. Protobuf conversion helpers (JSON <-> protobuf)
+3. Request building from wrapped REST dict
+4. Reverse alias map correctness
+5. GrpcFallbackResult type
+6. Integration: the proxy exposes _get_grpc_client and lists REST 404 as a gRPC fallback trigger
+"""
+
+import json
+import os
+import sys
+import unittest
+from unittest.mock import patch, MagicMock
+
+# Add src to path so we can import the antigravity_grpc package
+_src_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "src")
+if _src_dir not in sys.path:
+    sys.path.insert(0, _src_dir)
+
+
+class TestGrpcModuleAvailability(unittest.TestCase):
+    """Tests for is_grpc_available() and module loading."""
+
+    def test_is_grpc_available_returns_bool(self):
+        """is_grpc_available should return a boolean."""
+        from antigravity_grpc import is_grpc_available
+        result = is_grpc_available()
+        self.assertIsInstance(result, bool)
+
+    def test_is_grpc_available_true_when_installed(self):
+        """If grpcio is installed and stubs are loadable, should return True."""
+        import importlib.util
+        # grpcio is an optional dependency: skip instead of failing when it is absent
+        if importlib.util.find_spec("grpc") is None:
+            self.skipTest("grpcio is not installed")
+        from antigravity_grpc import is_grpc_available
+        self.assertTrue(is_grpc_available())
+
+    def test_client_instantiation(self):
+        """AntigravityGrpcClient should be instantiatable."""
+        from antigravity_grpc import AntigravityGrpcClient
+        client = AntigravityGrpcClient()
+        self.assertIsNotNone(client)
+
+    def test_get_client_singleton(self):
+        """get_client should return the same singleton."""
+        from antigravity_grpc import get_client
+        c1 = get_client()
+        c2 = get_client()
+        self.assertIs(c1, c2)
+
+
+class TestGrpcFallbackResult(unittest.TestCase):
+    """Tests for GrpcFallbackResult type."""
+
+    def test_default_values(self):
+        from antigravity_grpc import GrpcFallbackResult
+        r = GrpcFallbackResult()
+        self.assertFalse(r.ok)
+        self.assertIsNone(r.response_data)
+        self.assertIsNone(r.stream_chunks)
+        self.assertEqual(r.error_message, "")
+        self.assertEqual(r.endpoint_used, "")
+        self.assertEqual(r.model_used, "")
+        self.assertEqual(r.elapsed_s, 0.0)
+
+    def test_success_result(self):
+        from antigravity_grpc import GrpcFallbackResult
+        r = GrpcFallbackResult(ok=True, response_data={"response": {"candidates": []}},
+                                endpoint_used="daily-cloudcode-pa.googleapis.com:443",
+                                model_used="Gemini 3.5 Flash (High)",
+                                elapsed_s=2.5)
+        self.assertTrue(r.ok)
+        self.assertIsNotNone(r.response_data)
+        self.assertEqual(r.elapsed_s, 2.5)
+
+    def test_failure_result(self):
+        from antigravity_grpc import GrpcFallbackResult
+        r = GrpcFallbackResult(ok=False, error_message="All gRPC endpoints failed")
+        self.assertFalse(r.ok)
+        self.assertIn("failed", r.error_message)
+
+    def test_repr(self):
+        from antigravity_grpc import GrpcFallbackResult
+        r_ok = GrpcFallbackResult(ok=True, response_data={"response": {"candidates": []}})
+        self.assertIn("OK", repr(r_ok))
+        r_fail = GrpcFallbackResult(ok=False, error_message="timeout")
+        self.assertIn("FAIL", repr(r_fail))
+
+
+class TestReverseAliasMap(unittest.TestCase):
+    """Tests for the _GRPC_REVERSE_ALIAS map in translate-proxy.py."""
+
+    def test_import_reverse_alias(self):
+        """The reverse alias map should be importable from the proxy module."""
+        import importlib.util
+        _spec = importlib.util.spec_from_file_location(
+            "translate_proxy",
+            os.path.join(_src_dir, "translate-proxy.py"),
+        )
+        tp = importlib.util.module_from_spec(_spec)
+        _spec.loader.exec_module(tp)
+        self.assertIsInstance(tp._GRPC_REVERSE_ALIAS, dict)
+
+    def test_key_models_have_reverse_aliases(self):
+        """All key REST model slugs should have gRPC display name mappings."""
+        import importlib.util
+        _spec = importlib.util.spec_from_file_location(
+            "translate_proxy",
+            os.path.join(_src_dir, "translate-proxy.py"),
+        )
+        tp = importlib.util.module_from_spec(_spec)
+        _spec.loader.exec_module(tp)
+
+        required_slugs = [
+            "gemini-3-flash",
+            "gemini-3.5-flash-low",
+            "gemini-3.1-pro-low",
+            "claude-sonnet-4-6",
+            "claude-opus-4-6-thinking",
+            "gemini-2.5-flash",
+        ]
+        for slug in required_slugs:
+            self.assertIn(slug, tp._GRPC_REVERSE_ALIAS,
+                         f"Missing reverse alias for REST slug '{slug}'")
+
+    def test_reverse_alias_values_are_display_names(self):
+        """gRPC display names should differ from the REST slugs they are mapped from."""
+        import importlib.util
+        _spec = importlib.util.spec_from_file_location(
+            "translate_proxy",
+            os.path.join(_src_dir, "translate-proxy.py"),
+        )
+        tp = importlib.util.module_from_spec(_spec)
+        _spec.loader.exec_module(tp)
+
+        for slug, display_name in tp._GRPC_REVERSE_ALIAS.items():
+            # Display names typically have spaces (e.g. "Gemini 3.5 Flash (High)")
+            # while slugs use hyphens (e.g. "gemini-3-flash")
+            self.assertNotEqual(slug, display_name,
+                               f"Reverse alias for '{slug}' should differ from slug (gRPC uses display names)")
+
+
+class TestProtobufConversion(unittest.TestCase):
+    """Tests for JSON -> protobuf conversion helpers."""
+
+    def test_struct_to_protobuf(self):
+        """_struct_to_protobuf should convert a simple dict to Struct."""
+        from antigravity_grpc.client import _struct_to_protobuf
+        result = _struct_to_protobuf({"key": "value", "num": 42})
+        self.assertIsNotNone(result)
+        # Verify round-trip
+        from antigravity_grpc.client import _protobuf_struct_to_dict
+        d = _protobuf_struct_to_dict(result)
+        self.assertEqual(d["key"], "value")
+        self.assertEqual(d["num"], 42.0)
+
+    def test_struct_round_trip_nested(self):
+        """Nested dicts should survive a round-trip through protobuf."""
+        from antigravity_grpc.client import _struct_to_protobuf, _protobuf_struct_to_dict
+        original = {"outer": {"inner": "hello"}, "list_val": [1, 2, 3]}
+        proto = _struct_to_protobuf(original)
+        result = _protobuf_struct_to_dict(proto)
+        self.assertEqual(result["outer"]["inner"], "hello")
+        self.assertEqual(result["list_val"], [1.0, 2.0, 3.0])
+
+    def test_json_parts_to_proto_text(self):
+        """Text parts should convert to protobuf Part with text field."""
+        from antigravity_grpc.client import _json_parts_to_proto
+        parts = _json_parts_to_proto([{"text": "Hello world"}])
+        self.assertEqual(len(parts), 1)
+        self.assertEqual(parts[0].text, "Hello world")
+
+    def test_json_parts_to_proto_function_call(self):
+        """FunctionCall parts should convert correctly."""
+        from antigravity_grpc.client import _json_parts_to_proto
+        parts = _json_parts_to_proto([{
+            "functionCall": {
+                "name": "exec_command",
+                "args": {"cmd": "ls -la"},
+                "id": "call_123"
+            }
+        }])
+        self.assertEqual(len(parts), 1)
+        self.assertTrue(parts[0].HasField("function_call"))
+        self.assertEqual(parts[0].function_call.name, "exec_command")
+        self.assertEqual(parts[0].function_call.id, "call_123")
+
+    def test_json_parts_to_proto_function_response(self):
+        """FunctionResponse parts should convert correctly."""
+        from antigravity_grpc.client import _json_parts_to_proto
+        parts = _json_parts_to_proto([{
+            "functionResponse": {
+                "name": "exec_command",
+                "response": {"result": "file1.txt"},
+                "id": "call_123"
+            }
+        }])
+        self.assertEqual(len(parts), 1)
+        self.assertTrue(parts[0].HasField("function_response"))
+        self.assertEqual(parts[0].function_response.name, "exec_command")
+
+    def test_json_contents_to_proto(self):
+        """Content objects should convert correctly."""
+        from antigravity_grpc.client import _json_contents_to_proto
+        contents = _json_contents_to_proto([
+            {"role": "user", "parts": [{"text": "Hello"}]},
+            {"role": "model", "parts": [{"text": "Hi there"}]},
+        ])
+        self.assertEqual(len(contents), 2)
+        self.assertEqual(contents[0].role, "user")
+        self.assertEqual(contents[1].role, "model")
+
+    def test_proto_candidate_to_json(self):
+        """Protobuf candidates should convert back to JSON-compatible dicts."""
+        from antigravity_grpc.client import _proto_candidate_to_json
+        from antigravity_grpc import cloudcode_pb2 as pb2
+
+        # Build a candidate manually
+        candidate = pb2.Candidate()
+        candidate.content.role = "model"
+        candidate.content.parts.add().text = "Hello from gRPC"
+        candidate.finish_reason = "STOP"
+        candidate.index = 0
+
+        result = _proto_candidate_to_json(candidate)
+        self.assertEqual(result["finishReason"], "STOP")
+        self.assertEqual(result["content"]["role"], "model")
+        self.assertEqual(result["content"]["parts"][0]["text"], "Hello from gRPC")
+
+
+class TestGrpcRequestBuilding(unittest.TestCase):
+    """Tests for _build_request (wrapped REST dict → protobuf)."""
+
+    def _get_client(self):
+        from antigravity_grpc import AntigravityGrpcClient
+        return AntigravityGrpcClient()
+
+    def test_build_request_basic(self):
+        """Basic request fields should be populated correctly."""
+        client = self._get_client()
+        wrapped = {
+            "project": "test-project-123",
+            "model": "Gemini 3.5 Flash (High)",
+            "requestType": "agent",
+            "userAgent": "antigravity/2.0.6",
+            "requestId": "agent-test123",
+            "request": {
+                "contents": [
+                    {"role": "user", "parts": [{"text": "Say hello"}]}
+                ],
+                "safetySettings": [
+                    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"},
+                ],
+            }
+        }
+        req = client._build_request(wrapped)
+        self.assertEqual(req.project, "test-project-123")
+        self.assertEqual(req.model, "Gemini 3.5 Flash (High)")
+        self.assertEqual(req.request_type, "agent")
+        self.assertEqual(len(req.request.contents), 1)
+        self.assertEqual(req.request.contents[0].role, "user")
+
+    def test_build_request_with_tools(self):
+        """Tools should be converted to function declarations."""
+        client = self._get_client()
+        wrapped = {
+            "project": "test-project",
+            "model": "gemini-3-flash",
+            "request": {
+                "contents": [],
+                "tools": [{
+                    "functionDeclarations": [{
+                        "name": "exec_command",
+                        "description": "Run a shell command",
+                        "parameters": {"type": "object", "properties": {"cmd": {"type": "string"}}}
+                    }]
+                }],
+            }
+        }
+        req = client._build_request(wrapped)
+        self.assertEqual(len(req.request.tools), 1)
+        self.assertEqual(req.request.tools[0].function_declarations[0].name, "exec_command")
+
+    def test_build_request_with_generation_config(self):
+        """Generation config should be populated correctly."""
+        client = self._get_client()
+        wrapped = {
+            "project": "test-project",
+            "model": "gemini-3-flash",
+            "request": {
+                "contents": [],
+                "generationConfig": {
+                    "maxOutputTokens": 64000,
+                    "temperature": 0.7,
+                    "stopSequences": ["\n\nHuman:"],
+                    "thinkingConfig": {
+                        "includeThoughts": True,
+                        "thinkingBudget": 8192,
+                    }
+                }
+            }
+        }
+        req = client._build_request(wrapped)
+        self.assertEqual(req.request.generation_config.max_output_tokens, 64000)
+        self.assertAlmostEqual(req.request.generation_config.temperature, 0.7, places=2)
+        self.assertTrue(req.request.generation_config.thinking_config.include_thoughts)
+        self.assertEqual(req.request.generation_config.thinking_config.thinking_budget, 8192)
+
+    def test_build_request_with_function_call_history(self):
+        """Function call/response pairs in contents should be preserved."""
+        client = self._get_client()
+        wrapped = {
+            "project": "test-project",
+            "model": "gemini-3-flash",
+            "request": {
+                "contents": [
+                    {"role": "user", "parts": [{"text": "List files"}]},
+                    {"role": "model", "parts": [{
+                        "functionCall": {"name": "exec_command", "args": {"cmd": "ls"}, "id": "call_1"}
+                    }]},
+                    {"role": "user", "parts": [{
+                        "functionResponse": {"name": "exec_command", "response": {"result": "file.txt"}, "id": "call_1"}
+                    }]},
+                ]
+            }
+        }
+        req = client._build_request(wrapped)
+        self.assertEqual(len(req.request.contents), 3)
+        # Verify function call preserved
+        self.assertTrue(req.request.contents[1].parts[0].HasField("function_call"))
+        self.assertEqual(req.request.contents[1].parts[0].function_call.name, "exec_command")
+        # Verify function response preserved
+        self.assertTrue(req.request.contents[2].parts[0].HasField("function_response"))
+        self.assertEqual(req.request.contents[2].parts[0].function_response.name, "exec_command")
+
+
+class TestGrpcEndpointsConfig(unittest.TestCase):
+    """Tests for gRPC endpoint configuration."""
+
+    def test_default_endpoints(self):
+        """Default endpoints should include production and daily."""
+        from antigravity_grpc.client import _GRPC_ENDPOINTS
+        self.assertGreaterEqual(len(_GRPC_ENDPOINTS), 2)
+        hostnames = [ep.split(":")[0] for ep in _GRPC_ENDPOINTS]
+        self.assertIn("daily-cloudcode-pa.googleapis.com", hostnames)
+        self.assertIn("cloudcode-pa.googleapis.com", hostnames)
+
+    def test_staging_env_var(self):
+        """Staging endpoints should be controlled by env var."""
+        from antigravity_grpc.client import _ALLOW_STAGING_ENV
+        self.assertEqual(_ALLOW_STAGING_ENV, "ALLOW_ANTIGRAVITY_STAGING")
+
+
+class TestProxyIntegration(unittest.TestCase):
+    """Tests for the proxy's gRPC fallback integration."""
+
+    def _load_proxy_module(self):
+        import importlib.util
+        _spec = importlib.util.spec_from_file_location(
+            "translate_proxy",
+            os.path.join(_src_dir, "translate-proxy.py"),
+        )
+        tp = importlib.util.module_from_spec(_spec)
+        _spec.loader.exec_module(tp)
+        return tp
+
+    def test_get_grpc_client_function_exists(self):
+        """_get_grpc_client should exist as a module-level function."""
+        tp = self._load_proxy_module()
+        self.assertTrue(callable(tp._get_grpc_client))
+
+    def test_grpc_fallback_errors_set(self):
+        """_GRPC_FALLBACK_REST_ERRORS should include 404."""
+        tp = self._load_proxy_module()
+        self.assertIn(404, tp._GRPC_FALLBACK_REST_ERRORS)
+
+    def test_versions_bug_fixed(self):
+        """The _versions[0] NameError should be fixed (should be _fetched_ver)."""
+        # Read the source file and verify _versions is not used incorrectly
+        with open(os.path.join(_src_dir, "translate-proxy.py"), encoding="utf-8") as f:
+            source = f.read()
+        # The bug was: ver={_versions[0]}  -- should be ver={_fetched_ver}
+        self.assertNotIn("_versions[0]", source,
+                         "Bug: _versions[0] should have been replaced with _fetched_ver")
+
+
+if __name__ == "__main__":
+    print("=" * 70)
+    print("Antigravity gRPC Fallback - Unit Tests")
+    print("=" * 70)
+    print()
+
+    unittest.main(verbosity=2)
--- a/tests/test_translate_proxy.py
+++ b/tests/test_translate_proxy.py
--- a/translate-proxy.py
+++ b/translate-proxy.py