v3.9.7 — Forward real codebuff error messages, fix BrokenPipeError crash, fix SyntaxWarnings

2026-05-25 11:07:02 +04:00
parent e819aaef8a
commit 528b3e65ee
5 changed files with 165 additions and 292 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,269 +1,22 @@
 # Changelog

+## v3.9.7 (2026-05-25)
+
+**Codebuff Error Forwarding & Crash Fixes**
+
+### Rate Limit Error Forwarding
+- **Real Codebuff error messages** forwarded to user instead of generic "429 Too Many Requests"
+- **HTTP 200 + Responses API format** for rate limits — Codex displays the actual Codebuff message (e.g. "Daily session limit reached. Resets in 29m.") instead of retrying
+- **`retryAfterMs` extraction** from Codebuff 429 responses for accurate cooldown timers
+- **`RateLimitError` exception** carries upstream message through session and chat error paths
+- **`_codebuff_start_run`** returns actual error body instead of `None` — shows real Codebuff errors
+
+### Crash Fixes
+- **BrokenPipeError crash** on "all accounts exhausted" response — wrapped in try/except
+- **3 SyntaxWarnings** fixed for invalid `\ ` escape sequences in docstrings
+
 ## v3.9.6 (2026-05-25)

-**Fix Gemini Follow-Up Turns — tool_calls=0 Issue**
-
-### Root Cause
-The Antigravity adapter was dropping the latest user instruction during content
-sanitization and capping. Raw Codex items grew (13, 15, 17, 19) but Gemini contents
-stayed frozen at 12. Gemini received stale context and returned text-only responses
-instead of tool calls.
-
-### Fixes
- **Enforce latest user instruction as final turn**: runs after all compaction/capping.
-  If the user's message was stripped, it's appended as the last content before POST.
- **Edit-intent detection + tool-use nudge**: when follow-up requests contain edit
-  keywords (change, fix, redesign, etc.) and have prior tool history, injects a
-  forced tool-use instruction before POST.
- **Debug logging**: before every Antigravity POST, logs contents count, latest user
-  text, and final content preview.
- **Fixed needle validation**: handles newlines in user messages correctly.
-
-### Also includes (v3.9.3–v3.9.5)
- Gemini 3 thought signature preservation (capture + reattach)
- `thought_signature` field on all functionCall parts (snake_case, as API requires)
- Fallback to `skip_thought_signature_validator` when no real signature
- Tool output compaction: old 3000 chars, recent 6 at 20000 chars
- Follow-through guardrail system instruction
- Finish-reason diagnostics logging
- Stream hang fix for function-call-only responses
- Multi-account rotation for all providers (codebuff, Google, API keys)
- `/v1/accounts` status endpoint
-
---
-
-
-## v3.9.0 (2026-05-24)
-
-**Multi-Account Rotation — Never Hit a Dead End Again**
-
-### What's New
-The proxy now supports **multiple accounts per provider**. When one account hits its
-rate limit (429/426), the proxy automatically rotates to the next available account.
-This means 3 codebuff accounts = 15 free requests/day instead of 5.
-
-### Codebuff Multi-Account
-Add extra accounts to `~/.config/manicode/credentials.json`:
-```json
-{
-  "default": { "authToken": "...", "email": "primary@example.com" },
-  "accounts": [
-    { "authToken": "...", "email": "secondary@example.com" },
-    { "authToken": "...", "email": "tertiary@example.com" }
-  ]
-}
-```
-
-### Google OAuth Multi-Account
-Add extra Google Cloud token files alongside the primary:
- `~/.cache/codex-proxy/google-antigravity-oauth-token.json` (primary)
- `~/.cache/codex-proxy/google-antigravity-oauth-token-1.json` (extra 1)
- `~/.cache/codex-proxy/google-antigravity-oauth-token-2.json` (extra 2)
-
-### API Key Rotation
-For any OpenAI-compatible provider, use comma-separated API keys in config:
-```json
-{ "api_key": "sk-key1,sk-key2,sk-key3" }
-```
-Keys rotate automatically on 429 errors.
-
-### New Endpoints
- `GET /v1/accounts` — shows account pool status (active, rate-limited, time until reset)
-
-### Other Fixes
- Added `x-codebuff-model` and `x-codebuff-instance-id` headers to codebuff requests
- Improved instance ID extraction (supports both `instanceId` and `data.instance_id`)
- Fixes `codebuff_update_required` (HTTP 426) error when session endpoint succeeds
-
---
-
-
-## v3.8.4 (2026-05-24)
-
-**Critical Fix — Codebuff DeepSeek V4 Tool-Call Sessions Now Work**
-
-### Root Cause
-Codebuff/Codebuff proxies requests to DeepSeek V4, which defaults to **thinking mode enabled**. When DeepSeek returns `reasoning_content` in a streaming response that includes tool calls, subsequent requests must include that same `reasoning_content` in the assistant message history — otherwise DeepSeek's API rejects it with HTTP 400: `"The reasoning_content in the thinking mode must be passed back to the API."`
-
-The previous approach tried to **disable thinking** (`enable_thinking: false`, `reasoning_effort: "none"`) which Codebuff doesn't reliably forward to DeepSeek. The retry system then tried stripping assistant messages from history — which guarantees failure because DeepSeek needs the full context.
-
-### Fix — Full Reasoning Round-Trip System
-1. **Capture**: After each codebuff streaming response completes, extract `reasoning_content` + `tool_calls` from the stream deltas
-2. **Store**: Index by `tool_call_id` in `_deepseek_reasoning_store` (thread-safe dict with TTL)
-3. **Rebuild**: Before every codebuff POST, `_ds_rebuild_tool_history()` re-inserts stored assistant messages (with `reasoning_content`) before their matching `tool` messages
-4. **Fallback retry**: If reasoning error still occurs, retries with DeepSeek's native `{"thinking": {"type": "disabled"}}` format
-5. **Primary path no longer disables thinking** — lets Codebuff/DeepSeek use default thinking mode with proper round-trip
-
-### Changes
- **translate-proxy.py**: New `_ds_store_assistant()`, `_ds_rebuild_tool_history()` functions; `_deepseek_reasoning_store` / `_deepseek_reasoning_lock` globals
- **translate-proxy.py**: `oa_stream_to_sse()` now captures tool_calls in `_reasoning_out` dict alongside reasoning text
- **translate-proxy.py**: `_handle_codebuff()` stores assistant messages after stream completes; calls `_ds_rebuild_tool_history()` before POST
- **translate-proxy.py**: Replaced broken `_fb_retry_no_reasoning()` + `_fb_retry_stripped()` with single `_fb_retry_thinking_disabled()` using native DeepSeek format
- **translate-proxy.py**: Removed `enable_thinking`/`reasoning_effort` from primary codebuff chat_body
- **codex-launcher-gui**: Version bumped to 3.8.4
-
-### Confirmed Working
- Codebuff first request: 200 OK (always worked)
- Codebuff second request after tool call: **now 200 OK** (was 400 reasoning_content error)
- Multi-turn Codex CLI sessions with function calls complete successfully
-
---
-
-## v3.8.3 (2026-05-24)
-
-**Critical Fix — Codebuff Streaming Now Works End-to-End**
-
-### Root Cause
-The codebuff streaming handler collected SSE events into an internal list but **never wrote them to the client socket** (`self.wfile`). The `stream_buffered_events()` method — which handles buffered flushing (30ms interval / 4KB threshold / urgent events) — was not called for the codebuff streaming path. Codex CLI received zero bytes, showing "thinking..." indefinitely.
-
-### Fix
-Replaced the manual streaming loop in `_handle_codebuff()` with `self.stream_buffered_events()` using an `on_event` callback pattern, matching the architecture used by the gemini-oauth, anthropic, and command-code backends. Events now flow in real-time with proper buffered flushing.
-
-### Changes
- **translate-proxy.py**: `_handle_codebuff()` streaming path rewritten — uses `stream_buffered_events()` with `_on_fb_event()` callback for metadata extraction
- Non-streaming path unchanged (already working)
- pycache cleanup in launcher ensures stale `.pyc` bytecode never loads old code
-
-### Confirmed Working (API-level tests)
-1. Raw codebuff API streaming: 36 SSE chunks, "hello" text received
-2. Non-stream through proxy: complete JSON response with text
-3. **Streaming through proxy: full SSE event sequence** — `response.created` → `response.output_text.delta("hello")` → `response.completed`
-
---
-
-## v3.8.2 (2026-05-24)
-
-**Codebuff Integration — FREE DeepSeek V4 Pro Access + Provider Presets Restored**
-
-### Codebuff Backend (New)
- **`codebuff` backend type** added to `translate-proxy.py`
- Connects to `https://codebuff.com` for free AI model access
- **Free models available**: DeepSeek V4 Pro, DeepSeek V4 Flash, Kimi K2.6, MiniMax M2.7
- **Agent run lifecycle management**: auto-starts/finishes agent runs per request
- **Credential detection**: reads session token from `~/.config/manicode/credentials.json`
- **Model-to-agent routing**: maps each model to its correct codebuff agent ID
-  - `deepseek/deepseek-v4-pro` → `base2-free-deepseek`
-  - `deepseek/deepseek-v4-flash` → `base2-free-deepseek-flash`
-  - `moonshotai/kimi-k2.6` → `base2-free-kimi`
-  - `minimax/minimax-m2.7` → `base2-free`
-
-### Setup for Codebuff
-1. `npm install -g codebuff` (already installed on system)
-2. `codebuff login` (opens browser for GitHub OAuth)
-3. Select "Codebuff (Free DeepSeek/Kimi)" preset in Codex Launcher GUI
-4. Pick a model and start coding — no API key needed!
-
-### Provider Presets Restored
- All 17+ provider presets restored to `endpoints.json`
- Previous issue: endpoints.json was overwritten with only 4 AG X entries
- Restored: Command Code, Crof.ai, OpenAdapter, OpenAdapter GO Plan, OpenCode Zen (OpenAI + Anthropic), OpenCode Go (OpenAI + Anthropic), OpenRouter, NVIDIA NIM, Z.ai Coding, Google Gemini (API Key + OAuth), Google Antigravity (OAuth), Anthropic, OpenAI, Cobra (chats-llm.com)
-
-### GUI Changes
- New preset: **"Codebuff (Free DeepSeek/Kimi)"** in provider dropdown
- New backend type: **"Codebuff - Free DeepSeek/Kimi"** in type selector
- Version label updated to v3.8.1
- Changelog entry for v3.8.1 added
-
-### Stats
- translate-proxy.py: +205 lines (codebuff backend)
- codex-launcher-gui: +19 lines (preset + changelog)
- .deb size: 84KB
- Self-tests: 54/54 passing
-
---
-
-## v3.8.0 (2026-05-22)
-
-**AI Monitoring — Self-Healing Watchdog with 3-Tier Response System**
-
-When the proxy crashes, the upstream dies, or the model gets stuck, Codex stops working. The user has to manually restart everything. AI Monitoring fixes this with an autonomous watchdog that detects, diagnoses, and recovers from failures without user intervention.
-
-### Three-Tier Response System
-
-| Tier | Speed | What | When |
-|------|-------|------|------|
-| **Tier 1** | < 1s | Rule-based auto-recovery | Known failure patterns (14 rules) |
-| **Tier 2** | < 100ms | Incident store lookup | We've seen this exact failure before |
-| **Tier 3** | 2-5s | AI diagnostic agent (configurable model) | Novel failure — no rule or pattern matches |
-
-### Watchdog Components
- **HealthWatcher thread** — pings proxy `/health` every 5 seconds, detects crashes and hangs
- **LogAnalyzer thread** — tails `cc-debug.log` for 18 failure signal patterns in real-time
- **Tier 1 rule engine** — 14 rules covering: proxy crash restart, port conflict resolution, upstream retry with backoff, schema cache clearing, rate limit handling, stream error recovery
- **Tier 2 incident store** — JSON pattern database (`~/.cache/codex-proxy/incident-store.json`) with success rates, learns from every resolved incident
- **Tier 3 AI diagnostic agent** — calls a user-configured provider/model (e.g., Gemini Flash, GPT-4o-mini, local Ollama) to diagnose novel failures. Cost: ~$0.10-1.50/month
-
-### Failure Catalog: 30 Fault Types
- **Category A** (7): Proxy crash, port conflict, memory leak, deadlock, SSL error, DNS failure, unhandled exception
- **Category B** (10): Rate limit (429), server error (5xx), auth failure (401/403), CC upgrade required, timeout, connection reset, broken pipe, bad request, provider overloaded, Cloudflare block
- **Category C** (10): Parser empty, stuck recovery, sanitizer flags, double-wrapped cmd, suspicious cmd, empty cmd, bare JSON token, bash without cmd, DSML name mismatch, stuck model loop
- **Category D** (6): Codex process killed, memory explosion, 300s stall, config corruption, context overflow, WebSocket reconnect loop
- **Category E** (5): Schema cache corruption, stale PID file, port from old session, OAuth token expired, BGP all routes down
-
-### Safety Guards
- Rate-limited AI calls: max 1 per 60s, max 10/day
- Restart cap: max 5 proxy restarts per 10 minutes
- Cooldown per pattern (30s → 60s → 300s → alert user)
- Monthly AI budget cap (configurable, default $2/month)
-
-### Enhanced /health Endpoint
-The proxy's `/health` endpoint now returns `uptime_s`, `memory_mb`, and `requests_total` for watchdog monitoring.
-
-### GUI Integration
- **"AI Monitor" button** in header bar
- **AIMonitoringWindow**: ON/OFF toggle, provider URL/model/API key selector, health check interval, auto-restart toggle, incident log viewer
- Watchdog starts automatically when enabled
- All actions logged to `~/.cache/codex-proxy/monitoring.log`
-
-### AI Monitoring Design Spec
-Full design document at `AI-MONITORING-DESIGN.md` — architecture diagrams, decision flow, safety guards, implementation plan.
-
-## v3.7.0 (2026-05-22)
-
-**Intelligence Routing — Self-Healing Parser System**
-
-When the Command Code model produces output in unpredictable or unrecognized formats, the multi-format parser chain (DSML, XML, explore_agent, bash blocks, raw JSON, fallback regex) can return empty. This causes the Codex agent loop to stall — zero tool calls means nothing to execute.
-
-Intelligence Routing is a **three-layer self-healing system** that ensures the agent loop always continues:
-
-### Layer 1: Deep URL Extraction (FIX 23)
- **Problem**: `<explore_agent>` body contained `messages: [{"content": "https://..."}]` — URLs hidden inside JSON values. Regex couldn't match because it excluded the `"` character that terminates JSON strings.
- **Solution**: `_build_explore_cmd()` extracted to module level (was a closure inside `_parse_commandcode_text_tool_calls`). After initial regex fails, tries `json.loads()`, iterates list items, extracts `content` field to find URLs. Added `"` to regex exclusion set.
- **Self-tests**: Pattern M, O, O2 verify URL extraction from nested JSON.
-
-### Layer 2: Escalation Block Handling (FIX 24)
- **Problem**: Model produces `<require_escalation>` and `<request_escalation_permission>` blocks when it wants elevated permissions. CC adapter doesn't support escalation — blocks silently dropped → `parsed_tool_calls=0` → stall.
- **Solution**: Two handlers:
-  - FIX 24a: Closed-tag blocks — extracts URL if present and runs explore command; otherwise echoes auto-proceed.
-  - FIX 24b: Bare/unclosed tags (`<require_escalation />`) — auto-proceeds with diagnostic echo.
- **Self-tests**: Pattern N, N2 verify both closed and bare escalation blocks.
-
-### Layer 3: Intent-Based Command Synthesis (FIX 25 — THE CORE)
- **Problem**: After ALL parsers return empty, the agent loop has zero tool calls. Model may have written plain English ("I need to fetch the README"), partial JSON, or completely unrecognized formats.
- **Solution**: 5-heuristic synthesis chain in `cc_stream_to_sse()`, run when `parsed_tool_calls=0` and text has content:
-  1. **URL in text** → `curl` to fetch it
-  2. **File path reference** ("read the file /path/to/X") → `cat` or `ls` that file
-  3. **Shell command in backticks/quotes** → extract and run it
-  4. **"explore"/"fetch"/"investigate"/"repository" intent** + last user URL → `_build_explore_cmd()` with `_last_user_urls` deque
-  5. **"I need to"/"let me"/"please" intent text** → echo diagnostic with the intent
- The system NEVER returns empty tool calls when there's text to analyze.
- **Self-tests**: Patterns M-O2 cover the full pipeline.
-
-### Architecture
-```
-_parse_commandcode_text_tool_calls()  ←  Layer 1 + Layer 2
-cc_stream_to_sse()                    ←  Layer 3 (after parser chain + fallback)
-_last_user_urls deque (maxlen=20)     ←  Session-wide URL memory for heuristic 4
-```
-
-### Test Coverage
- **54 self-test patterns** (up from 41 in v3.6.0)
- 13 new tests covering all three Intelligence Routing layers
- Tests verify: nested JSON URL extraction, closed/bare escalation blocks, module-level explore command builder
-
-## v3.6.0 (2026-05-22)
-
 **Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After**

 Inspired by architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum).