v3.9.7 — Forward real FreeBuff error messages, fix BrokenPipeError crash, fix SyntaxWarnings

2026-05-25 11:07:02 +04:00
parent bec34079c6
commit 72ebfa3ef8
5 changed files with 165 additions and 292 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,269 +1,22 @@
 # Changelog
 ## v3.9.7 (2026-05-25)
 **FreeBuff Error Forwarding & Crash Fixes**
 ### Rate Limit Error Forwarding
 - **Real FreeBuff error messages** forwarded to user instead of generic "429 Too Many Requests"
 - **HTTP 200 + Responses API format** for rate limits — Codex displays the actual FreeBuff message (e.g. "Daily session limit reached. Resets in 29m.") instead of retrying
 - **`retryAfterMs` extraction** from FreeBuff 429 responses for accurate cooldown timers
 - **`RateLimitError` exception** carries upstream message through session and chat error paths
 - **`_freebuff_start_run`** returns actual error body instead of `None` — shows real FreeBuff errors
 ### Crash Fixes
 - **BrokenPipeError crash** on "all accounts exhausted" response — wrapped in try/except
 - **3 SyntaxWarnings** fixed for invalid `\ ` escape sequences in docstrings
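The `retryAfterMs` extraction and message forwarding listed above can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: `parse_rate_limit_body` and `DEFAULT_RETRY_S` are hypothetical names, the 120 s fallback mirrors the default visible in this commit's diff, and truncating the raw body stands in for the project's own `_sanitize_err_body` helper.

```python
import json

DEFAULT_RETRY_S = 120  # assumed fallback cooldown when the body carries no hint


def parse_rate_limit_body(err_body: str) -> tuple[float, str]:
    """Return (retry_seconds, user_message) extracted from a 429 response body."""
    retry_s = float(DEFAULT_RETRY_S)
    message = ""
    try:
        data = json.loads(err_body)
    except ValueError:
        data = None  # body was not JSON; fall through to the raw-text fallback

    if isinstance(data, dict):
        retry_ms = data.get("retryAfterMs", 0)
        if isinstance(retry_ms, (int, float)) and retry_ms > 0:
            retry_s = retry_ms / 1000  # milliseconds to seconds
        # The message may live under "message" or "error", and "error"
        # may itself be an object with a nested "message".
        message = data.get("message") or data.get("error") or ""
        if isinstance(message, dict):
            message = message.get("message", "")

    if not isinstance(message, str) or not message:
        message = err_body[:200]  # stand-in for a real sanitizer
    return retry_s, message
```

Returning both values together lets the caller set the account cooldown and build the user-facing text from a single parse of the upstream body.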
 ## v3.9.6 (2026-05-25)
 **Fix Gemini Follow-Up Turns — tool_calls=0 Issue**
 ### Root Cause
 The Antigravity adapter was dropping the latest user instruction during content
 sanitization and capping. Raw Codex items grew (13, 15, 17, 19) but Gemini contents
 stayed frozen at 12. Gemini received stale context and returned text-only responses
 instead of tool calls.
 ### Fixes
 - **Enforce latest user instruction as final turn**: runs after all compaction/capping.
  If the user's message was stripped, it's appended as the last content before POST.
 - **Edit-intent detection + tool-use nudge**: when follow-up requests contain edit
  keywords (change, fix, redesign, etc.) and have prior tool history, injects a
  forced tool-use instruction before POST.
 - **Debug logging**: before every Antigravity POST, logs contents count, latest user
  text, and final content preview.
 - **Fixed needle validation**: handles newlines in user messages correctly.
 ### Also includes (v3.9.3–v3.9.5)
 - Gemini 3 thought signature preservation (capture + reattach)
 - `thought_signature` field on all functionCall parts (snake_case, as API requires)
 - Fallback to `skip_thought_signature_validator` when no real signature
 - Tool output compaction: older outputs capped at 3000 chars, the 6 most recent at 20000 chars
 - Follow-through guardrail system instruction
 - Finish-reason diagnostics logging
 - Stream hang fix for function-call-only responses
 - Multi-account rotation for all providers (freebuff, Google, API keys)
 - `/v1/accounts` status endpoint
 ---
 ## v3.9.0 (2026-05-24)
 **Multi-Account Rotation — Never Hit a Dead End Again**
 ### What's New
 The proxy now supports **multiple accounts per provider**. When one account hits its
 rate limit (429/426), the proxy automatically rotates to the next available account.
 With three freebuff accounts, that is 15 free requests per day instead of 5.
 ### Freebuff Multi-Account
 Add extra accounts to `~/.config/manicode/credentials.json`:
 ```json
 {
  "default": { "authToken": "...", "email": "primary@example.com" },
  "accounts": [
    { "authToken": "...", "email": "secondary@example.com" },
    { "authToken": "...", "email": "tertiary@example.com" }
  ]
 }
 ```
 ### Google OAuth Multi-Account
 Add extra Google Cloud token files alongside the primary:
 - `~/.cache/codex-proxy/google-antigravity-oauth-token.json` (primary)
 - `~/.cache/codex-proxy/google-antigravity-oauth-token-1.json` (extra 1)
 - `~/.cache/codex-proxy/google-antigravity-oauth-token-2.json` (extra 2)
 ### API Key Rotation
 For any OpenAI-compatible provider, use comma-separated API keys in config:
 ```json
 { "api_key": "sk-key1,sk-key2,sk-key3" }
 ```
 Keys rotate automatically on 429 errors.
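One way to picture the comma-separated key rotation is the sketch below. It is an assumption-laden illustration rather than the proxy's `AccountPool`: `KeyRotator`, `next_key`, and the injectable `clock` parameter are hypothetical names, and the per-key cooldown bookkeeping is a guess at the general approach.

```python
import time


class KeyRotator:
    """Hands out the first API key that is not cooling down after a 429."""

    def __init__(self, api_key_field: str, clock=time.monotonic):
        # "sk-key1,sk-key2" -> ["sk-key1", "sk-key2"], ignoring stray spaces
        self._keys = [k.strip() for k in api_key_field.split(",") if k.strip()]
        self._cooldown_until = {k: 0.0 for k in self._keys}
        self._clock = clock

    def next_key(self):
        """Return an available key, or None when every key is rate-limited."""
        now = self._clock()
        for key in self._keys:
            if self._cooldown_until[key] <= now:
                return key
        return None

    def mark_rate_limited(self, key: str, retry_seconds: float) -> None:
        """Take a key out of rotation until its cooldown has elapsed."""
        self._cooldown_until[key] = self._clock() + retry_seconds
```

A caller would request `next_key()` before each upstream POST and call `mark_rate_limited()` when that POST returns 429; a `None` result corresponds to the "all accounts exhausted" case.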
 ### New Endpoints
 - `GET /v1/accounts` — shows account pool status (active, rate-limited, time until reset)
 ### Other Fixes
 - Added `x-freebuff-model` and `x-freebuff-instance-id` headers to freebuff requests
 - Improved instance ID extraction (supports both `instanceId` and `data.instance_id`)
 - Fixed `freebuff_update_required` (HTTP 426) error when the session endpoint succeeds
 ---
 ## v3.8.4 (2026-05-24)
 **Critical Fix — Freebuff DeepSeek V4 Tool-Call Sessions Now Work**
 ### Root Cause
 Freebuff/Codebuff proxies requests to DeepSeek V4, which defaults to **thinking mode enabled**. When DeepSeek returns `reasoning_content` in a streaming response that includes tool calls, subsequent requests must include that same `reasoning_content` in the assistant message history — otherwise DeepSeek's API rejects it with HTTP 400: `"The reasoning_content in the thinking mode must be passed back to the API."`
 The previous approach tried to **disable thinking** (`enable_thinking: false`, `reasoning_effort: "none"`) which Freebuff doesn't reliably forward to DeepSeek. The retry system then tried stripping assistant messages from history — which guarantees failure because DeepSeek needs the full context.
 ### Fix — Full Reasoning Round-Trip System
 1. **Capture**: After each freebuff streaming response completes, extract `reasoning_content` + `tool_calls` from the stream deltas
 2. **Store**: Index by `tool_call_id` in `_deepseek_reasoning_store` (thread-safe dict with TTL)
 3. **Rebuild**: Before every freebuff POST, `_ds_rebuild_tool_history()` re-inserts stored assistant messages (with `reasoning_content`) before their matching `tool` messages
 4. **Fallback retry**: If reasoning error still occurs, retries with DeepSeek's native `{"thinking": {"type": "disabled"}}` format
 5. **Primary path no longer disables thinking** — lets Freebuff/DeepSeek use default thinking mode with proper round-trip
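The capture, store, and rebuild steps above might look roughly like this. It is a simplified sketch under stated assumptions, not the contents of `_ds_store_assistant()` or `_ds_rebuild_tool_history()`: the class and function names, the one-hour TTL, and the rule "insert only when no earlier assistant message declares the call id" are all hypothetical.

```python
import threading
import time


class ReasoningStore:
    """Thread-safe map from tool_call_id to the assistant message that issued it."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # assumed TTL; the changelog does not state one
        self._clock = clock
        self._lock = threading.Lock()
        self._items = {}  # tool_call_id -> (expires_at, assistant_message)

    def put(self, assistant_message: dict) -> None:
        expires_at = self._clock() + self._ttl
        with self._lock:
            for call in assistant_message.get("tool_calls", []):
                self._items[call["id"]] = (expires_at, assistant_message)

    def get(self, tool_call_id):
        with self._lock:
            entry = self._items.get(tool_call_id)
            if entry is None:
                return None
            if entry[0] <= self._clock():
                del self._items[tool_call_id]  # expired: drop and report a miss
                return None
            return entry[1]


def rebuild_tool_history(messages: list, store: ReasoningStore) -> list:
    """Re-insert stored assistant messages before orphaned tool results."""
    rebuilt = []
    seen_call_ids = set()
    for msg in messages:
        if msg.get("role") == "assistant":
            seen_call_ids.update(c["id"] for c in msg.get("tool_calls") or [])
        elif msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            if call_id not in seen_call_ids:
                stored = store.get(call_id)
                if stored is not None:
                    rebuilt.append(stored)  # carries reasoning_content
                    seen_call_ids.update(c["id"] for c in stored.get("tool_calls", []))
        rebuilt.append(msg)
    return rebuilt
```

Indexing by `tool_call_id` means a later request that contains only the `tool` result can still recover the assistant turn, including its `reasoning_content`, that produced the call.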
 ### Changes
 - **translate-proxy.py**: New `_ds_store_assistant()`, `_ds_rebuild_tool_history()` functions; `_deepseek_reasoning_store` / `_deepseek_reasoning_lock` globals
 - **translate-proxy.py**: `oa_stream_to_sse()` now captures tool_calls in `_reasoning_out` dict alongside reasoning text
 - **translate-proxy.py**: `_handle_freebuff()` stores assistant messages after stream completes; calls `_ds_rebuild_tool_history()` before POST
 - **translate-proxy.py**: Replaced broken `_fb_retry_no_reasoning()` + `_fb_retry_stripped()` with single `_fb_retry_thinking_disabled()` using native DeepSeek format
 - **translate-proxy.py**: Removed `enable_thinking`/`reasoning_effort` from primary freebuff chat_body
 - **codex-launcher-gui**: Version bumped to 3.8.4
 ### Confirmed Working
 - Freebuff first request: 200 OK (always worked)
 - Freebuff second request after tool call: **now 200 OK** (was 400 reasoning_content error)
 - Multi-turn Codex CLI sessions with function calls complete successfully
 ---
 ## v3.8.3 (2026-05-24)
 **Critical Fix — Freebuff Streaming Now Works End-to-End**
 ### Root Cause
 The freebuff streaming handler collected SSE events into an internal list but **never wrote them to the client socket** (`self.wfile`). The `stream_buffered_events()` method — which handles buffered flushing (30ms interval / 4KB threshold / urgent events) — was not called for the freebuff streaming path. Codex CLI received zero bytes, showing "thinking..." indefinitely.
 ### Fix
 Replaced the manual streaming loop in `_handle_freebuff()` with `self.stream_buffered_events()` using an `on_event` callback pattern, matching the architecture used by the gemini-oauth, anthropic, and command-code backends. Events now flow in real-time with proper buffered flushing.
 ### Changes
 - **translate-proxy.py**: `_handle_freebuff()` streaming path rewritten — uses `stream_buffered_events()` with `_on_fb_event()` callback for metadata extraction
 - Non-streaming path unchanged (already working)
 - pycache cleanup in launcher ensures stale `.pyc` bytecode never loads old code
 ### Confirmed Working (API-level tests)
 1. Raw freebuff API streaming: 36 SSE chunks, "hello" text received
 2. Non-stream through proxy: complete JSON response with text
 3. **Streaming through proxy: full SSE event sequence** — `response.created` → `response.output_text.delta("hello")` → `response.completed`
 ---
 ## v3.8.2 (2026-05-24)
 **Freebuff Integration — FREE DeepSeek V4 Pro Access + Provider Presets Restored**
 ### Freebuff Backend (New)
 - **`freebuff` backend type** added to `translate-proxy.py`
 - Connects to `https://freebuff.com` for free AI model access
 - **Free models available**: DeepSeek V4 Pro, DeepSeek V4 Flash, Kimi K2.6, MiniMax M2.7
 - **Agent run lifecycle management**: auto-starts/finishes agent runs per request
 - **Credential detection**: reads session token from `~/.config/manicode/credentials.json`
 - **Model-to-agent routing**: maps each model to its correct freebuff agent ID
  - `deepseek/deepseek-v4-pro` → `base2-free-deepseek`
  - `deepseek/deepseek-v4-flash` → `base2-free-deepseek-flash`
  - `moonshotai/kimi-k2.6` → `base2-free-kimi`
  - `minimax/minimax-m2.7` → `base2-free`
 ### Setup for Freebuff
 1. `npm install -g freebuff` (already installed on system)
 2. `freebuff login` (opens browser for GitHub OAuth)
 3. Select "Freebuff (Free DeepSeek/Kimi)" preset in Codex Launcher GUI
 4. Pick a model and start coding — no API key needed!
 ### Provider Presets Restored
 - All 17+ provider presets restored to `endpoints.json`
 - Previous issue: endpoints.json was overwritten with only 4 AG X entries
 - Restored: Command Code, Crof.ai, OpenAdapter, OpenAdapter GO Plan, OpenCode Zen (OpenAI + Anthropic), OpenCode Go (OpenAI + Anthropic), OpenRouter, NVIDIA NIM, Z.ai Coding, Google Gemini (API Key + OAuth), Google Antigravity (OAuth), Anthropic, OpenAI, Cobra (chats-llm.com)
 ### GUI Changes
 - New preset: **"Freebuff (Free DeepSeek/Kimi)"** in provider dropdown
 - New backend type: **"Freebuff - Free DeepSeek/Kimi"** in type selector
 - Version label updated to v3.8.1
 - Changelog entry for v3.8.1 added
 ### Stats
 - translate-proxy.py: +205 lines (freebuff backend)
 - codex-launcher-gui: +19 lines (preset + changelog)
 - .deb size: 84KB
 - Self-tests: 54/54 passing
 ---
 ## v3.8.0 (2026-05-22)
 **AI Monitoring — Self-Healing Watchdog with 3-Tier Response System**
 When the proxy crashes, the upstream dies, or the model gets stuck, Codex stops working. The user has to manually restart everything. AI Monitoring fixes this with an autonomous watchdog that detects, diagnoses, and recovers from failures without user intervention.
 ### Three-Tier Response System
 | Tier | Speed | What | When |
 |------|-------|------|------|
 | **Tier 1** | < 1s | Rule-based auto-recovery | Known failure patterns (14 rules) |
 | **Tier 2** | < 100ms | Incident store lookup | We've seen this exact failure before |
 | **Tier 3** | 2-5s | AI diagnostic agent (configurable model) | Novel failure — no rule or pattern matches |
 ### Watchdog Components
 - **HealthWatcher thread** — pings proxy `/health` every 5 seconds, detects crashes and hangs
 - **LogAnalyzer thread** — tails `cc-debug.log` for 18 failure signal patterns in real-time
 - **Tier 1 rule engine** — 14 rules covering: proxy crash restart, port conflict resolution, upstream retry with backoff, schema cache clearing, rate limit handling, stream error recovery
 - **Tier 2 incident store** — JSON pattern database (`~/.cache/codex-proxy/incident-store.json`) with success rates, learns from every resolved incident
 - **Tier 3 AI diagnostic agent** — calls a user-configured provider/model (e.g., Gemini Flash, GPT-4o-mini, local Ollama) to diagnose novel failures. Cost: ~$0.10-1.50/month
 ### Failure Catalog: 30 Fault Types
 - **Category A** (7): Proxy crash, port conflict, memory leak, deadlock, SSL error, DNS failure, unhandled exception
 - **Category B** (10): Rate limit (429), server error (5xx), auth failure (401/403), CC upgrade required, timeout, connection reset, broken pipe, bad request, provider overloaded, Cloudflare block
 - **Category C** (10): Parser empty, stuck recovery, sanitizer flags, double-wrapped cmd, suspicious cmd, empty cmd, bare JSON token, bash without cmd, DSML name mismatch, stuck model loop
 - **Category D** (6): Codex process killed, memory explosion, 300s stall, config corruption, context overflow, WebSocket reconnect loop
 - **Category E** (5): Schema cache corruption, stale PID file, port from old session, OAuth token expired, BGP all routes down
 ### Safety Guards
 - Rate-limited AI calls: max 1 per 60s, max 10/day
 - Restart cap: max 5 proxy restarts per 10 minutes
 - Cooldown per pattern (30s → 60s → 300s → alert user)
 - Monthly AI budget cap (configurable, default $2/month)
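Two of the guards above, the restart cap and the per-pattern cooldown ladder, can be sketched as below. This is only an illustration: `RestartLimiter`, `cooldown_for`, and the sliding-window reading of "max 5 proxy restarts per 10 minutes" are assumptions, as is treating the step after 300 s as "stop retrying and alert the user".

```python
import time
from collections import deque


class RestartLimiter:
    """Allows at most max_restarts within any sliding window of window_s seconds."""

    def __init__(self, max_restarts=5, window_s=600.0, clock=time.monotonic):
        self._max = max_restarts
        self._window_s = window_s
        self._clock = clock
        self._stamps = deque()  # timestamps of restarts still inside the window

    def allow(self) -> bool:
        now = self._clock()
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True


COOLDOWNS_S = (30, 60, 300)  # escalation ladder from the changelog


def cooldown_for(occurrence: int):
    """Cooldown for the nth (1-based) repeat of a pattern; None means alert the user."""
    if 1 <= occurrence <= len(COOLDOWNS_S):
        return COOLDOWNS_S[occurrence - 1]
    return None
```

Injecting the clock keeps the limiter testable without sleeping; in the watchdog itself the default `time.monotonic` would be the natural choice.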
 ### Enhanced /health Endpoint
 The proxy's `/health` endpoint now returns `uptime_s`, `memory_mb`, and `requests_total` for watchdog monitoring.
 ### GUI Integration
 - **"AI Monitor" button** in header bar
 - **AIMonitoringWindow**: ON/OFF toggle, provider URL/model/API key selector, health check interval, auto-restart toggle, incident log viewer
 - Watchdog starts automatically when enabled
 - All actions logged to `~/.cache/codex-proxy/monitoring.log`
 ### AI Monitoring Design Spec
 Full design document at `AI-MONITORING-DESIGN.md` — architecture diagrams, decision flow, safety guards, implementation plan.
 ## v3.7.0 (2026-05-22)
 **Intelligence Routing — Self-Healing Parser System**
 When the Command Code model produces output in unpredictable or unrecognized formats, the multi-format parser chain (DSML, XML, explore_agent, bash blocks, raw JSON, fallback regex) can return empty. This causes the Codex agent loop to stall — zero tool calls means nothing to execute.
 Intelligence Routing is a **three-layer self-healing system** that ensures the agent loop always continues:
 ### Layer 1: Deep URL Extraction (FIX 23)
 - **Problem**: `<explore_agent>` body contained `messages: [{"content": "https://..."}]` — URLs hidden inside JSON values. Regex couldn't match because it excluded the `"` character that terminates JSON strings.
 - **Solution**: `_build_explore_cmd()` extracted to module level (was a closure inside `_parse_commandcode_text_tool_calls`). After initial regex fails, tries `json.loads()`, iterates list items, extracts `content` field to find URLs. Added `"` to regex exclusion set.
 - **Self-tests**: Pattern M, O, O2 verify URL extraction from nested JSON.
 ### Layer 2: Escalation Block Handling (FIX 24)
 - **Problem**: Model produces `<require_escalation>` and `<request_escalation_permission>` blocks when it wants elevated permissions. CC adapter doesn't support escalation — blocks silently dropped → `parsed_tool_calls=0` → stall.
 - **Solution**: Two handlers:
  - FIX 24a: Closed-tag blocks — extracts URL if present and runs explore command; otherwise echoes auto-proceed.
  - FIX 24b: Bare/unclosed tags (`<require_escalation />`) — auto-proceeds with diagnostic echo.
 - **Self-tests**: Pattern N, N2 verify both closed and bare escalation blocks.
 ### Layer 3: Intent-Based Command Synthesis (FIX 25 — THE CORE)
 - **Problem**: After ALL parsers return empty, the agent loop has zero tool calls. Model may have written plain English ("I need to fetch the README"), partial JSON, or completely unrecognized formats.
 - **Solution**: 5-heuristic synthesis chain in `cc_stream_to_sse()`, run when `parsed_tool_calls=0` and text has content:
  1. **URL in text** → `curl` to fetch it
  2. **File path reference** ("read the file /path/to/X") → `cat` or `ls` that file
  3. **Shell command in backticks/quotes** → extract and run it
  4. **"explore"/"fetch"/"investigate"/"repository" intent** + last user URL → `_build_explore_cmd()` with `_last_user_urls` deque
  5. **"I need to"/"let me"/"please" intent text** → echo diagnostic with the intent
 - The system NEVER returns empty tool calls when there's text to analyze.
 - **Self-tests**: Patterns M-O2 cover the full pipeline.
 ### Architecture
 ```
 _parse_commandcode_text_tool_calls()  ←  Layer 1 + Layer 2
 cc_stream_to_sse()                    ←  Layer 3 (after parser chain + fallback)
 _last_user_urls deque (maxlen=20)     ←  Session-wide URL memory for heuristic 4
 ```
 ### Test Coverage
 - **54 self-test patterns** (up from 41 in v3.6.0)
 - 13 new tests covering all three Intelligence Routing layers
 - Tests verify: nested JSON URL extraction, closed/bare escalation blocks, module-level explore command builder
 ## v3.6.0 (2026-05-22)
 **Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After**
 Inspired by architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum).
--- a/codex-launcher_3.9.7_all.deb
+++ b/codex-launcher_3.9.7_all.deb
--- a/install.sh
+++ b/install.sh
@@ -3,11 +3,11 @@ set -e
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-if [ -f "$SCRIPT_DIR/codex-launcher_3.8.1_all.deb" ]; then
+if [ -f "$SCRIPT_DIR/codex-launcher_3.9.7_all.deb" ]; then
-    echo "Installing codex-launcher_3.8.1_all.deb ..."
+    echo "Installing codex-launcher_3.9.7_all.deb ..."
-    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.8.1_all.deb"
+    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.9.7_all.deb"
    echo ""
-    echo "Installed v3.8.1 via .deb package."
+    echo "Installed v3.9.7 via .deb package."
    echo "  translate-proxy.py   -> /usr/bin/translate-proxy.py"
    echo "  codex-launcher-gui   -> /usr/bin/codex-launcher-gui"
    echo "  cleanup-codex-stale  -> /usr/bin/cleanup-codex-stale.sh"
--- a/src/codex-launcher-gui
+++ b/src/codex-launcher-gui
@@ -26,7 +26,38 @@ model_catalog_json = ""
 """
 CHANGELOG = [
-    ("3.8.3", "2026-05-24", [
+    ("3.9.7", "2026-05-25", [
        "Forward real FreeBuff error messages to user (not generic 429)",
        "Return HTTP 200 with Responses API format for rate limits so Codex displays message",
        "Extract retryAfterMs from FreeBuff 429 responses for accurate cooldown",
        "RateLimitError carries upstream message through session + chat error paths",
        "BrokenPipeError crash fix on 'all accounts exhausted' response",
        "Fix 3 SyntaxWarnings for invalid escape sequences in docstrings",
        "_freebuff_start_run returns actual error body instead of None",
    ]),
    ("3.9.6", "2026-05-25", [
        "Fix Gemini follow-up turns returning text-only instead of tool calls",
        "Enforce latest user instruction as final Gemini content turn",
        "Edit-intent detection with tool-use nudge for file modification requests",
        "Debug logging: contents count, latest user text, final content preview",
        "Thought signature preservation for Gemini 3 tool-call continuity",
        "thought_signature field on all functionCall parts (snake_case)",
        "Smart tool output compaction: old=3000, recent=20000 chars",
        "Follow-through guardrail system instruction for autonomous agent behavior",
        "Stream hang fix for function-call-only responses",
        "Multi-account rotation for freebuff, Google OAuth, API keys",
        "/v1/accounts endpoint for account pool status",
    ]),
    ("3.9.0", "2026-05-24", [
        "Multi-account rotation for OAuth providers (freebuff, Google, API keys)",
        "Automatic failover: when one account hits rate limit, next is used",
        "Freebuff: supports accounts[] array in credentials.json",
        "Google OAuth: supports multiple token files (google-*-oauth-token-N.json)",
        "API keys: comma-separated keys rotate on 429 errors",
        "New /v1/accounts endpoint shows account pool status",
        "Added x-freebuff-model and x-freebuff-instance-id headers",
    ]),
    ("3.8.4", "2026-05-24", [
        "FIXED: Freebuff streaming — SSE events now reach Codex client",
        "Root cause: stream_buffered_events was never called for freebuff",
        "Freebuff stream uses buffered flushing (30ms / 4KB / urgent)",
@@ -1706,7 +1737,7 @@ class LauncherWin(Gtk.Window):
        # header row
        hdr = Gtk.Box(spacing=8)
        vbox.pack_start(hdr, False, False, 0)
-        lbl = Gtk.Label(label="<b>Codex Launcher v3.8.3</b>")
+        lbl = Gtk.Label(label="<b>Codex Launcher v3.9.7</b>")
        lbl.set_use_markup(True)
        hdr.pack_start(lbl, False, False, 0)
        changelog_btn = Gtk.Button(label="Changelog")
@@ -3495,7 +3526,7 @@ class EditEndpointDialog(Gtk.Dialog):
                auth_url = "https://freebuff.com/api/auth/cli/code"
                body = json.dumps({"fingerprintId": fingerprint_id}).encode()
                req = urllib.request.Request(auth_url, data=body,
-                    headers={"Content-Type": "application/json", "User-Agent": "codex-launcher/3.8.3"})
+                    headers={"Content-Type": "application/json", "User-Agent": "codex-launcher/3.9.7"})
                resp = urllib.request.urlopen(req, timeout=30)
                data = json.loads(resp.read())
                login_url = data.get("loginUrl", "") or data.get("login_url", "")
@@ -3520,7 +3551,7 @@ class EditEndpointDialog(Gtk.Dialog):
                    time.sleep(2)
                    try:
                        poll_req = urllib.request.Request(poll_url,
-                            headers={"User-Agent": "codex-launcher/3.8.3"})
+                            headers={"User-Agent": "codex-launcher/3.9.7"})
                        poll_resp = urllib.request.urlopen(poll_req, timeout=10)
                        poll_data = json.loads(poll_resp.read())
                        user = poll_data.get("user")
--- a/src/translate-proxy.py
+++ b/src/translate-proxy.py
@@ -70,9 +70,9 @@ FIX 6: Double-wrapped arguments (nested {"cmd": "{\"cmd\": \"curl...\"}"}")
 FIX 7: _extract_field can't read values starting with \"
  Symptom: sandbox_permissions="allow_all" passes through unnormalized because
-        _extract_field sees val_start=\ (backslash) which != " or { → returns None
+        _extract_field sees val_start=\\ (backslash) which != \" or { → returns None
  Fix: Skip leading backslash before checking for " or { value type.
-  Location: _extract_field() leading-\ skip
+  Location: _extract_field() leading-backslash skip
 FIX 8: Adaptive probing caused format mismatch (REVERTED)
  Symptom: Probe system discovered OpenAI tool_calls+role=tool format but CC API couldn't
@@ -335,10 +335,31 @@ def _freebuff_get_session(token, model):
        req = urllib.request.Request(url, data=body, headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
-            "User-Agent": "codex-launcher/3.9.0",
+            "User-Agent": "codex-launcher/3.9.7",
            "x-freebuff-model": model,
        })
-        resp = urllib.request.urlopen(req, timeout=15)
+        try:
            resp = urllib.request.urlopen(req, timeout=15)
        except urllib.error.HTTPError as e:
            err_body = e.read().decode()[:1000]
            if e.code == 429:
                retry_s = 120
                user_msg = ""
                try:
                    err_data = json.loads(err_body)
                    retry_ms = err_data.get("retryAfterMs", 0)
                    if retry_ms:
                        retry_s = retry_ms / 1000
                    user_msg = err_data.get("message", err_data.get("error", ""))
                    if isinstance(user_msg, dict):
                        user_msg = user_msg.get("message", "")
                except Exception:
                    pass
                if not user_msg:
                    user_msg = _sanitize_err_body(err_body)
                raise RateLimitError(retry_s, user_msg)
            print(f"[freebuff] session HTTP {e.code}: {err_body[:200]}", file=sys.stderr)
            return None
        data = json.loads(resp.read())
        instance_id = data.get("instanceId", data.get("data", {}).get("instance_id", ""))
        expires_at = data.get("remainingMs", 0)
@@ -350,6 +371,8 @@ def _freebuff_get_session(token, model):
            print(f"[freebuff] session active, instance={instance_id[:8]}...", file=sys.stderr)
            return instance_id
        return None
    except RateLimitError:
        raise
    except Exception as e:
        print(f"[freebuff] session failed: {e}", file=sys.stderr)
        return None
@@ -360,21 +383,31 @@ def _freebuff_start_run(token, agent_id):
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
-        "User-Agent": "codex-launcher/3.9.0",
+        "User-Agent": "codex-launcher/3.9.7",
    })
    try:
        resp = urllib.request.urlopen(req, timeout=15)
        data = json.loads(resp.read())
        run_id = data.get("runId")
        print(f"[freebuff] started run {run_id} for agent {agent_id}", file=sys.stderr)
-        return run_id
+        return run_id, None
    except urllib.error.HTTPError as e:
-        err = e.read().decode()[:300]
+        err = e.read().decode()[:500]
        print(f"[freebuff] start run failed: HTTP {e.code}: {err}", file=sys.stderr)
-        return None
+        if e.code == 429:
            retry_s = 120
            try:
                err_data = json.loads(err)
                retry_ms = err_data.get("retryAfterMs", 0)
                if retry_ms:
                    retry_s = retry_ms / 1000
            except Exception:
                pass
            return None, ("rate_limit_error", 429, retry_s, _sanitize_err_body(err))
        return None, ("upstream_error", e.code, 0, _sanitize_err_body(err))
    except Exception as e:
        print(f"[freebuff] start run error: {e}", file=sys.stderr)
-        return None
+        return None, ("proxy_error", 502, 0, str(e))
 def _freebuff_finish_run(token, run_id, status="completed"):
    url = f"{_FREEBUFF_API_URL}/api/v1/agent-runs"
@@ -383,7 +416,7 @@ def _freebuff_finish_run(token, run_id, status="completed"):
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
-        "User-Agent": "codex-launcher/3.9.0",
+        "User-Agent": "codex-launcher/3.9.7",
    })
    try:
        urllib.request.urlopen(req, timeout=10)
@@ -392,6 +425,12 @@ def _freebuff_finish_run(token, run_id, status="completed"):
 # ═══════════════════════════════════════════════════════════════════
 # Multi-account rotation system
 class RateLimitError(Exception):
    def __init__(self, retry_seconds, message=""):
        self.retry_seconds = retry_seconds
        self.message = message
        super().__init__(f"rate-limited for {retry_seconds:.0f}s: {message}")
 # ═══════════════════════════════════════════════════════════════════
 class AccountPool:
@@ -2804,7 +2843,7 @@ def _parse_commandcode_text_tool_calls(text):
        Delegates to _extract_args() for the arguments field (handles unescaped + escaped JSON).
        Delegates to _extract_field() for name/id/sandbox_permissions/justification
-          (with FIX 7 for leading-\ handling).
+          (with FIX 7 for leading-backslash handling).
        Normalizes sandbox_permissions to valid values (use_default|require_escalated|with_user_approval)
        [FIX 6] Prevents double-wrapped args: {"cmd": "{\"cmd\": \"curl...\"}"}
@@ -5209,13 +5248,30 @@ class Handler(http.server.BaseHTTPRequestHandler):
             if attempt > 0:
                 print(f"[freebuff] rotation attempt {attempt+1}/{n_accounts}, trying account {acct_id}", file=sys.stderr)
-             run_id = _freebuff_start_run(token, agent_id)
+             run_id, run_err = _freebuff_start_run(token, agent_id)
             if not run_id:
-                 _fb_pool.mark_rate_limited(acct, 60)
+                 if run_err and run_err[0] == "rate_limit_error":
-                 last_err = ("upstream_error", 502, "Failed to start freebuff agent run. Check credentials and network.")
+                     retry_s = run_err[2]
                     _fb_pool.mark_rate_limited(acct, retry_s)
                     last_err = ("rate_limit_error", run_err[1], f"Account {acct_id} rate-limited by FreeBuff: {run_err[3]}")
                 else:
                     _fb_pool.mark_rate_limited(acct, 60)
                     last_err = ("upstream_error", run_err[1] if run_err else 502,
                                 f"Failed to start agent run for {acct_id}: {run_err[3] if run_err else 'unknown error'}")
                 continue
-             instance_id = _freebuff_get_session(token, model)
+             try:
                 instance_id = _freebuff_get_session(token, model)
             except RateLimitError as rle:
                 retry_s = rle.retry_seconds
                 fb_msg = rle.message
                 mins = int(retry_s // 60)
                 user_msg = fb_msg if fb_msg else f"Daily session limit reached. Resets in {mins}m."
                 print(f"[freebuff] session 429 for {acct_id}, retry after {retry_s:.0f}s", file=sys.stderr)
                 _fb_pool.mark_rate_limited(acct, retry_s)
                 _freebuff_finish_run(token, run_id, "completed")
                 last_err = ("rate_limit_error", 429, user_msg)
                 continue
             input_data = body.get("input", "")
             instructions = body.get("instructions", "").strip()
@@ -5249,7 +5305,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
             headers = {
                 "Content-Type": "application/json",
                 "Authorization": f"Bearer {token}",
-                 "User-Agent": "codex-launcher/3.9.0",
+                 "User-Agent": "codex-launcher/3.9.7",
                 "x-freebuff-model": model,
             }
             if instance_id:
@@ -5266,14 +5322,22 @@ class Handler(http.server.BaseHTTPRequestHandler):
                 _freebuff_finish_run(token, run_id, "failed")
                 if e.code in (429, 426):
                     reset_ms = 0
                     fb_msg = ""
                     try:
                         err_json = json.loads(err_body)
                         reset_ms = err_json.get("retryAfterMs", 0)
                         fb_msg = err_json.get("message", err_json.get("error", ""))
                         if isinstance(fb_msg, dict):
                             fb_msg = fb_msg.get("message", "")
                     except Exception:
                         pass
                     duration = max(reset_ms / 1000, 120) if reset_ms else 120
                     mins = int(duration // 60)
                     if not fb_msg:
                         fb_msg = _sanitize_err_body(err_body)
                     user_msg = f"{fb_msg} (resets in {mins}m)" if fb_msg else f"Rate limited. Resets in {mins}m."
                     _fb_pool.mark_rate_limited(acct, duration)
-                     last_err = ("upstream_error", e.code, _sanitize_err_body(err_body))
+                     last_err = ("rate_limit_error", e.code, user_msg)
                     print(f"[freebuff] account {acct_id} got HTTP {e.code}, rotating", file=sys.stderr)
                     continue
                 if _is_reasoning_content_error(err_body):
@@ -5357,13 +5421,38 @@ class Handler(http.server.BaseHTTPRequestHandler):
             return
         if last_err:
-             return self.send_json(last_err[1], {"error": {"type": last_err[0], "message": f"All {n_accounts} accounts exhausted. {last_err[2]}"}})
+             msg = last_err[2]
             resp_id = f"resp_{uuid.uuid4().hex[:24]}"
             result = {
                 "id": resp_id,
                 "object": "response",
                 "created_at": int(time.time()),
                 "model": model,
                 "status": "completed",
                 "output": [{
                     "id": f"msg_{uuid.uuid4().hex[:24]}",
                     "type": "message",
                     "role": "assistant",
                     "content": [{
                         "type": "output_text",
                         "text": msg,
                         "annotations": [],
                     }],
                     "status": "completed",
                 }],
                 "usage": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
             }
             try:
                 return self.send_json(200, result)
             except (BrokenPipeError, ConnectionResetError, ConnectionAbortedError):
                 return
    def _fb_retry_thinking_disabled(self, body, model, token, agent_id, stream, tracker, input_data, instructions, original_error, acct=None):
-        run_id = _freebuff_start_run(token, agent_id)
+        run_id, run_err = _freebuff_start_run(token, agent_id)
        if not run_id:
-            return self.send_json(502, {"error": {"type": "upstream_error",
+            msg = run_err[3] if run_err else "unknown error"
-                "message": "Failed to start freebuff agent run for retry."}})
+            return self.send_json(run_err[1] if run_err else 502, {"error": {"type": run_err[0] if run_err else "upstream_error",
                "message": f"Failed to start agent run for retry: {msg}"}})
        instance_id = _freebuff_get_session(token, model)
        messages = _fb_input_to_messages(input_data, instructions)
        _freebuff_hard_disable_reasoning(messages)
@@ -5385,7 +5474,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
        if body.get("tool_choice"):
            chat_body["tool_choice"] = body["tool_choice"]
        target = f"{_FREEBUFF_API_URL}/api/v1/chat/completions"
-        headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}", "User-Agent": "codex-launcher/3.9.0", "x-freebuff-model": model}
+        headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}", "User-Agent": "codex-launcher/3.9.7", "x-freebuff-model": model}
        if instance_id:
            headers["x-freebuff-instance-id"] = instance_id
        print(f"[freebuff] retry POST {target} model={model} stream={stream} run={run_id} (thinking disabled via DeepSeek native)", file=sys.stderr)