# Changelog

## v3.11.8 (2026-05-26)

**Vision Cache Persistence, PR #8 Merge**

### New Features

- **Vision description cache persisted across requests**: Image descriptions from the vision fallback API are now cached in a file (`~/.cache/codex-proxy/vision-cache.json`) so the same image URL is never described twice — saves API calls and latency
- **Merge PR #8**: `fix: persist vision description cache across requests` (cobra91)

## v3.11.7 (2026-05-26)

**Vision Auto-Detect, Proactive Non-Vision Model Detection, Unit Tests, Bug Fixes**

### New Features

- **Vision auto-detect fallback**: When no explicit vision fallback is configured, automatically uses the current provider's own vision model (e.g., `0G-Qwen-VL` for OpenAdapter) as the image description API — no separate API key needed
- **Proactive non-vision model detection**: Models matching name patterns (`glm`, `deepseek`, `llama`, `qwen` without `vl`, etc.) are detected as non-vision on first request without waiting for an error from the provider
- **Vision preprocessing is now the primary image handling solution**: Replaces old `_strip_images_from_input()` (which just removed images with a placeholder). Images are now described via API and sent as rich text descriptions to text-only models
- **Merge PR #6**: Vision/OCR preprocessing for text-only models (cobra91)
- **Merge PR #7**: 177 unit tests for translate-proxy.py (cobra91)

### Bug Fixes

- **AttributeError fix**: `image_url` field can be a string (bare URL) not always a dict — fixed in both `_preprocess_vision_input()` and old strip function
- **Auth os error 2 fix**: GUI shows "Config missing" message instead of raw OSError when `~/.codex/` directory doesn't exist
- **Removed duplicate vision functions**: Cleaned up duplicate `_vision_describe_image()`, `_preprocess_vision()`, `_preprocess_vision_input()` from merge

## v3.11.6 (2026-05-26)

**Antigravity Loop Breakers, Vision/OCR Preprocessing, has_content Fix, Auth Error Fix**

### New Features (Antigravity-only, no other providers affected)

- **Per-session loop tracking**: `_ANTIGRAVITY_LOOP_TRACKER` global dict with `_antigravity_loop_key()` function tracks state per session: `latest_user_hash`, `nudge_injected`, `latest_user_appended`, `tool_calls_for_request`, `repeated_tool`, `force_finalize`, `last_tool`, `last_tool_count`
- **Edit-intent nudge injection**: Injected only on the first turn per request, preventing duplicate nudges across retries
- **Latest user instruction append**: Appended exactly once per request to prevent redundant instruction stacking
- **Loop breaker**: If the same tool + arguments is repeated ≥ 5 times in a session, `force_finalize` is triggered to break the infinite loop
- **Detailed `[antigravity-loop]` logging**: All tracking fields logged on every Antigravity request for debugging

### New Features (All OpenAI-compatible providers)

- **Vision/OCR preprocessing**: When a provider doesn't support images (detected via error messages like "unknown variant image_url", "does not support image"), the proxy automatically calls a configurable vision fallback API (default: Kilo.ai) to describe images as text, then replaces image blocks with text descriptions before sending to text-only models
- **`_vision_describe_image()`**: Calls vision fallback model to describe a single image, with MD5-based caching to avoid re-describing same URL
- **`_preprocess_vision()`**: Replaces `image_url`/`input_image` blocks in Chat Completions message format with text descriptions when provider lacks vision support
- **`_preprocess_vision_input()`**: Same for Responses API input format — runs BEFORE adapter conversion so images are replaced early
- **Vision error retry**: On HTTP 4xx errors containing image-related keywords, automatically retries with images preprocessed instead of failing
- **Configurable via env vars**: `VISION_FALLBACK_URL`, `VISION_FALLBACK_MODEL`, `VISION_FALLBACK_KEY`
- **ProviderSchema `supports_vision` field**: Auto-detected from error responses and persisted in provider-caps.json

### Critical Fixes

- **`has_content` now includes `function_call`** (v3.11.5 fix): `_observe_event` only checked for `"type": "message"` — when models return only tool calls (no text), `has_content` was `False`, causing Codex to loop infinitely and build context until `context_length_exceeded`. Now checks both `"message"` and `"function_call"`.
- **`has_message`/`has_tool_call` initialized in all 5 locations**: Previous fix added variables inside `_observe_event` closure but missed 4 other `has_content = False` locations, causing `NameError: name 'has_message' is not defined` crashes.
- **Auth config-not-found error handling**: When Codex's `config.toml` is missing or deleted, `codex login status` returns "Error loading configuration: No such file or directory (os error 2)". Now caught specifically (`OSError errno==2`) and returns ("not_configured", "Config missing — launch once to create") with clear GUI guidance.

### Bug Fixes (GUI)

- **Active endpoint sync**: GUI auto-removes stale endpoint references on startup

## v3.11.5 (2026-05-26)

**Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection**

### Critical Fixes

- **Token-aware compaction for small-context models (FIX)**: `_crof_compact_for_retry()` had an early return at `len(input_data) <= limit` (item count) — if you had 25 items × 1600 tokens = 40K tokens, it skipped compaction entirely because 25 < 30 (the default item limit). Now also checks estimated token count vs learned model max, and compacts when either item count OR token count exceeds limits. Fixes repeated `context_length_exceeded` errors on models like 0G-GLM-5.1 (~35K token context).
- **Proactive compaction now token-aware**: Previously only triggered when item count > 30. Now also triggers when estimated tokens exceed 80% of the model's learned token limit, even if item count is below the threshold. Prevents the first-request failure pattern on small-context models.
- **Compaction aggression threshold**: Changed `est > max_tok` to `est >= max_tok * 0.9` to avoid edge case where estimated tokens exactly equal the limit and compaction is skipped.
- **Removed all `crof.ai` gates from adaptive compaction**: Proactive compaction, `finish_reason=length` retry, `_crof_record`, and compaction logging were gated behind `"crof.ai" in TARGET_URL`. These gates prevented OpenAdapter and other providers from getting proactive/retry compaction, causing repeated `context_length_exceeded` failures. Now applies universally to ALL providers.

### New Features

- **Vision model detection + image stripping**: `_strip_images_from_input()` and `_model_supports_vision()` detect vision capability by model name pattern. Non-vision models (deepseek, glm, mixtral, llama, command, dbrx, qwen, phi-3) have `input_image`/`image_url` parts stripped and replaced with `[User attached image: filename — this model does not support vision]` text notice. Vision models (gpt-4o, gemini, claude, qwen-vl, glm-5v) keep images intact. Applied in 3 paths: main request, context_length_exceeded retry, smart-continue nudge.
- **Token estimation and per-model limit learning**: `_estimate_tokens()`, `_estimate_input_tokens()`, `_get_model_max_tokens()`, `_set_model_max_tokens()`. Extracts `~N tokens` from `context_length_exceeded` error messages and stores per-model token limits. Used by proactive compaction and retry compaction to adjust `keep` count dynamically.
- **Compaction aggression levels**: `_crof_compact_for_retry()` accepts `aggression` parameter (0=normal, 1=extreme). Extreme mode kicks in when estimated tokens > 1.5× the learned limit or on 2nd+ retry attempt. Reduces `keep` count to minimum, ensuring the compacted request fits within model limits.
- **Smart-continue text-tool detection**: Removed hard requirement for `has_function_call_output(input_data)`. Added `_TOOL_CALL_TEXT_PATTERNS` and `_text_looks_like_tool_calls()` to trigger nudging when model outputs text matching tool-call patterns (e.g., `• (exec_command cmd ...)`, `write_to_file`, `exec_command`) even without prior `function_call_output` in context. Essential for models like 0G-GLM-5.1 that never emit real `function_call_output` items.
- **Parenthesized tool call regex**: `_PAREN_TC_RE` pattern to match `• (name args...)` format from non-vision models that output tool calls as parenthesized text.

### GUI Fixes

- **Active endpoint sync**: Added `set_active_endpoint()` and `validate_active_endpoint()` to Linux GTK GUI. Syncs `.active-endpoint.json` with `config.toml` on every launch; auto-removes stale references to deleted providers. Fixed `"Error loading configuration: No such file or directory (os error 2)"` crash when active endpoint referenced a deleted provider.
- **Config state**: `~/.codex/.active-endpoint.json` and `config.toml` model catalog path validated and auto-corrected on GUI startup.

## v3.11.0 (2026-05-26)

**Cobra PR Merge + Smart Continuation + API Key Hot-Reload**

### New Features
- **Concurrency semaphore (max 3)**: limits parallel upstream requests to prevent rate-limiting
- **Auto-continue for truncated text**: detects text ending in `:`, `(`, `;`, `…` or `finish_reason=length`, continues seamlessly
- **SO_REUSEADDR on sticky port**: prevents `TIME_WAIT` from changing port on restart
- **proxy-stderr.log**: persistent log file for proxy errors
- **Stream diagnostics**: logs event count, finish reason, content flag, elapsed time after each stream
- **Timeout/OSError handler**: sends proper `response.failed` SSE event instead of silently dropping connection
- **Restart Proxy button**: now only restarts proxy without killing Codex Desktop
- **Tool call argument normalizer**: fixes capital-A `Arguments` key, strips markdown/JSON code block wrapping from tool call arguments
- **Smart-continue loop (2× retries)**: escalating nudge messages when model returns text-only stop mid-task
- **XML tool call extraction**: parses `<tool_call>name{args}</tool_call>` from model text output, injects as real `function_call` items
- **Auto-continue + smart-continue ordered with skip guard**: prevents both from double-firing on the same response
- **API key hot-reload**: mtime tracking detects config changes, `/admin/reload` endpoint triggers hot-reload, `/admin/verify-key` tests key against upstream
- **GUI hot-reload**: auto-refreshes proxy key on endpoint edit, verifies with upstream — no proxy restart needed
- **Synthetic tool-results disabled**: was causing deepseek-v4-pro truncation on opencode.ai

## v3.10.12 (2026-05-26)

**Sticky Endpoint, Claude Fixes, Guardrail Skip, Anti-Stall**

### New Features
- **Sticky endpoint caching**: remembers which endpoint last succeeded, reuses it on every subsequent request (zero overhead)
- **Sequential fallback**: if sticky endpoint fails (429/502/503), tries next endpoint in order — no parallel probing, no wasted requests
- **Endpoint order**: `cloudcode-pa.googleapis.com` first (matches agy CLI), `daily-cloudcode-pa.googleapis.com` as fallback
- **Anti-stall engine**: kills stale proxy processes and clears `__pycache__` on every new session start
- **Smart error classification**: distinguishes `quota_exhausted` vs `capacity_exhausted` vs `account_banned` vs `validation_required` vs `service_disabled` vs `auth_permanent`
- **Rate limit reset time parsing**: extracts cooldown from error body (`quotaResetDelay`, `Resets in ~1h27m`, etc.) for accurate cooldown
- **Missing Antigravity headers**: `X-Client-Name`, `X-Client-Version`, `x-goog-api-client`, platform-aware `User-Agent`
- **Session ID**: added `sessionId` to request wrapper for proper session tracking

### Bug Fixes (TRAE Agent)
- **Guardrail skip for simple messages**: when user sends simple messages (e.g. "hi"), skip injecting `_GEMINI_AGENT_GUARDRAIL` — prevents model from aggressively calling tools and looping `ls -la` 50+ times
- **Claude tool preservation**: Claude models through Antigravity now keep ALL tool outputs in normalizer (no summarization/truncation) — prevents context loss that broke Claude sessions
- **Claude compaction guard**: `_adaptive_compact` skipped for Claude models — Claude handles its own context, no forced compaction
- **Claude normalizer guard**: `_antigravity_normalize_context` skipped for Claude models — avoids stripping Claude-specific message structure
- **Claude sanitization guard**: Google content sanitization loop skipped for Claude models — prevents mangling Claude's response format
- **Normalizer model parameter**: `_antigravity_normalize_context` now receives `model` param to distinguish Claude vs Gemini behavior

## v3.10.11 (2026-05-26)

**Hybrid Endpoint Fallback — Redundant Antigravity Endpoints**

### New Features
- Hybrid endpoint fallback: tries `cloudcode-pa.googleapis.com` then `daily-cloudcode-pa.googleapis.com` on 429
- `daily-cloudcode-pa.googleapis.com` is the same production endpoint agy-core uses (separate rate limit bucket)
- 429 errors now log full response body for debugging
- SERVICE_DISABLED (403) still falls through to next endpoint
- Rate-limit marking only happens after ALL endpoints fail

### Bug Fixes
- Fixed 429 on one endpoint immediately failing — now tries fallback before giving up
- Restored SERVICE_DISABLED fallthrough (was accidentally removed)

## v3.10.10 (2026-05-25)

**Context Normalizer Fix — Compaction Summary Preservation**

### Bug Fixes
- Fixed normalizer stripping ALL context on resumed sessions after compaction
- Normalizer no longer auto-resets when compaction summary is present
- Compaction summaries ("Auto-compacted: N earlier turns") are always preserved
- Deduplicates consecutive identical `<goal_context>` messages (10→1)
- Emergency reset now preserves compaction summaries
- Previous behavior: after compaction reduced 1925→185 items, normalizer saw `n_tool_outputs == 0` and stripped to just `system + latest_user`, losing all context — model responded with "I don't have context"

### hashlib Fix (v3.10.9 hotfix)
- `_antigravity_normalize_context` crashed with `NameError: hashlib` on resumed sessions
- Replaced SHA256 duplicate detection with string comparison

## v3.10.9 (2026-05-25)

**Antigravity Overhaul — Context Normalizer, Claude Thinking Fix, Endpoint Lockdown**

### Antigravity Endpoint Lockdown
- Production-only: `cloudcode-pa.googleapis.com` by default
- Sandbox/staging blocked unless `ALLOW_ANTIGRAVITY_STAGING=1`
- 403 SERVICE_DISABLED falls through, 429 returns to client

### AntigravityContextNormalizer
- Bounded context — no more 136-item polluted requests for "hi"
- Simple message detector, auto-reset polluted context
- Duplicate removal, tool output budget, hard char limits

### Claude Thinking Fix (Antigravity-only)
- Fixed 400 error: `maxOutputTokens=64000` when thinking enabled
- Snake_case config, VALIDATED toolConfig, proper budgets

### z.ai / OpenRouter (cobra91 PR #4)
- Full OpenClaw attribution headers, OpenRouter caching

## v3.10.8 (2026-05-25)

**OAuth & Antigravity Endpoint Fixes**

### Re-OAuth Buttons Fixed
- Linux GUI: `load_oauth_secrets()` was undefined — buttons crashed silently on click
- Now loads OAuth secrets inline from `~/.config/codex-launcher/oauth-secrets.json`
- Both Linux and Windows Re-OAuth use PKCE + localhost callback (was deprecated OOB paste)

### Antigravity Staging/Sandbox Blocked by Default
- Proxy: production `cloudcode-pa.googleapis.com` tried FIRST, sandbox/daily/autopush as fallback only
- Proxy: 403 SERVICE_DISABLED now falls through to next endpoint instead of returning error immediately
- Project discovery: validates against production endpoint, not staging-cloudaicompanion.sandbox
- Antigravity preset `base_url` changed to production (was `daily-cloudcode-pa.sandbox.googleapis.com`)
- `[antigravity-endpoint]` log line shows which endpoints are being tried

### Other Fixes
- GLib.idle_add lambda returning truthy tuple fixed (caused repeated callbacks)
- Windows GUI project discovery also uses production endpoint

## v3.10.7 (2026-05-25)

**Prompt Enhancer — Fix Lost Context After Compaction**

### Prompt Enhancer (Per-Provider Toggle)
- **Offline mode**: Injects structured XML instructions before every user prompt to keep the model focused, decisive, and context-aware after compaction strips conversation history
- **AI-powered mode**: Optionally calls an external LLM (configurable model/URL/key) to rewrite vague prompts into clear, actionable instructions
- Prevents the "had to resend and reword" problem in long sessions where compaction summarizes hundreds of turns
- **Per-endpoint setting** — enable/disable for each provider independently
- Configurable in both Linux and Windows GUI: toggle switch, mode selector, enhancer model, URL, API key fields

### How It Works
- **Offline**: Prepends a `<prompt-enhancer>` block with rules like "never ask for clarification, infer from compacted context, execute decisively"
- **AI-powered**: Sends the user's prompt + compaction summary to a separate model (e.g. DeepSeek V4 Flash via Freebuff) which rewrites it for clarity, then prepends the offline instructions too
- Both modes run after compaction but before the request is sent upstream

## v3.10.6 (2026-05-25)

**Freebuff Integration + Codebuff OAuth Fix + Windows Consolidation**

### Freebuff (Free DeepSeek/Kimi)
- **Freebuff integration**: Free DeepSeek/Kimi models via codebuff.com API
- Fixed User-Agent to match official SDK: `ai-sdk/openai-compatible/1.0.25/codebuff`
- Fixed metadata fields: `freebuff_instance_id` + `client_id` (base36 random) + `cost_mode: "free"`
- Fixed session endpoint: POST empty `{}` body (not `{"model": model}`)
- GUI preset aliases: "Freebuff (Free DeepSeek/Kimi)", "FreeBuff", "Codebuff (Free DeepSeek/Kimi)" all map to same backend

### Codebuff Fix
- Fixed Codebuff OAuth: use `www.codebuff.com` (bare `codebuff.com` returns 307 redirect)

### OAuth Secrets & Credentials (All Providers)
- **OAuth Secrets dialog now shows ALL providers**: Google (Antigravity + Gemini CLI) AND Freebuff/Codebuff
- **Re-OAuth buttons** for each provider: instantly re-authenticate Google or GitHub/Codebuff
- Token status indicators (valid/missing) for each Google provider
- Shows logged-in email and auth status for Freebuff/Codebuff
- Editable auth token and fingerprint fields for Freebuff/Codebuff

### Windows
- Windows GUI files consolidated into `src/` (merged by cobra91 via PR #1 and PR #2)

### Proxy & GUI Improvements (cobra91 PR #3)
- CROF adaptive logic gated to `crof.ai` only — no more log pollution for other providers
- Data directory consolidation: all data now in `codex-proxy/` (was split across `codex-desktop/`, `codex-launcher/`, `codex-proxy/`)
- Sticky proxy port: persists in `.last-proxy-port`, reused on restart so Codex Desktop keeps connection
- Adaptive compact budget raised from 60% to 80% — avoids premature compaction on large-context models (DeepSeek v4 Pro 1M)
- Config cleanup fix: stale `proxy-*.json` cleanup moved after `_init_runtime()` to avoid deleting active config
- Windows GUI: added Clear Log, Restart Proxy, View Log buttons
- **Linux/Windows feature parity**: both GUIs now have identical features
- Windows GUI: ported OAuth Secrets all-providers dialog (Google + Freebuff/Codebuff with Re-OAuth buttons, token status)
- Windows GUI: added Codebuff/Freebuff OAuth login flow (GitHub browser-based)
- Windows GUI: added Sync from Preset button in endpoint editor
- Linux GUI: added Clear Log + Restart Proxy buttons (matching Windows)

## v3.10.5 (2026-05-25)

**Windows GUI + Context Compaction for Antigravity/Gemini OAuth**

### Windows Native GUI (tkinter)
- **Windows GUI** in `windows/` folder — full tkinter port by cobra91
- OAuth Secrets editor, Import JSON, Antigravity model list
- Shared backend with Linux (same translate-proxy.py)
- See README for Windows installation and usage

**Context Compaction for Antigravity/Gemini OAuth**

### Fix
- **Prevent `input token count exceeds maximum` errors** during long conversations
- Added aggressive compaction policies for Antigravity (`cloudcode-pa`) and Gemini CLI (`googleapis`)
- Auto-trims old turns when approaching 60% of model context limit (1M tokens for Gemini, 200K for Claude, 128K for GPT-OSS)
- Added REST model IDs to context size map (`gemini-3-flash`, `gemini-3.1-pro-low`, `claude-sonnet-4-6`, etc.)

## v3.10.4 (2026-05-25)

**Security: OAuth Secrets Editor + Import JSON**

### Security
- **All hardcoded OAuth secrets removed from source code and git history**
- OAuth client IDs and secrets now stored locally in `~/.config/codex-launcher/oauth-secrets.json`
- Git history rewritten to scrub all leaked credentials (0 matches verified)
- Pre-push hook blocks any future commit containing secrets
- All old Gitea releases deleted (contained leaked secrets in .deb files)

### New Features
- **OAuth Secrets editor** in GUI — "OAuth Secrets" button in header bar
- **Import JSON** button — import `client_secret_*.json` downloaded from Google Cloud Console
- Supports both `"installed"` and `"web"` JSON formats from Google

### Antigravity Fix (from v3.10.3)
- Antigravity REST API uses slug IDs, not display names
- Verified all model IDs with live API testing

## v3.10.3 (2026-05-25)

**Fix Antigravity 404 Errors — Verified REST Model IDs**

### Critical Fix
- Antigravity REST API (`v1internal:generateContent`) uses slug IDs, not display names
- Verified all model IDs with live API testing against `daily-cloudcode-pa.sandbox.googleapis.com`
- Display names map to closest working REST model (e.g. `Gemini 3.5 Flash (High)` → `gemini-3-flash`)
- Model list now matches agy CLI: Gemini 3.5 Flash (H/M/L), Gemini 3.1 Pro (H/L), Claude Sonnet/Opus 4.6, GPT-OSS 120B

### Working REST Model IDs
| Display Name | REST ID |
|---|---|
| Gemini 3.5 Flash (High) | gemini-3-flash |
| Gemini 3.5 Flash (Medium) | gemini-3-flash |
| Gemini 3.5 Flash (Low) | gemini-3.5-flash-low |
| Gemini 3.1 Pro (High) | gemini-3.1-pro-low |
| Gemini 3.1 Pro (Low) | gemini-3.1-pro-low |
| Claude Sonnet 4.6 (Thinking) | claude-sonnet-4-6 |
| Claude Opus 4.6 (Thinking) | claude-opus-4-6-thinking |
| GPT-OSS 120B (Medium) | gpt-oss-120b-medium |

## v3.10.2 (2026-05-25)

**Fix Antigravity Model Names**

### Critical Fix
- **Antigravity uses display names as model IDs** — `Gemini 3.5 Flash (High)` not `gemini-3.5-flash-high`
- Previous slug-style IDs caused 404 errors from the Antigravity API
- Proxy alias map maps all old slugs + display names to correct API IDs

## v3.10.0 (2026-05-25)

**Provider Model Editor + Antigravity Model Refresh**

### Provider Editor
- **Remove Selected** button to remove highlighted model(s) from provider
- **Clear All** button to empty model list
- **Sync from Preset** button to refresh model list from current preset definition
- Preset sync now replaces (not appends) models — fixes stale saved model lists

### Antigravity Models Updated
- **Gemini 3.5 Flash** (High / Medium)
- **Gemini 3.1 Pro** (High / Low)
- **Claude Sonnet 4.6 Thinking**
- **Claude Opus 4.6 Thinking**
- **GPT-OSS 120B Medium**

## v3.9.9 (2026-05-25)

**Antigravity Model Refresh**

### Updated Models
- **Gemini 3.5 Flash** (High / Medium) — new flagship flash model
- **Gemini 3.1 Pro** (High / Low) — tiered reasoning control
- **Claude Sonnet 4.6 Thinking** — Anthropic partner model via Antigravity
- **Claude Opus 4.6 Thinking** — Anthropic partner model via Antigravity
- **GPT-OSS 120B Medium** — open-weight GPT model via Antigravity
- Removed stale `antigravity-*` prefixed IDs and old preview models

### Proxy Updates
- Alias map updated for tiered model IDs (high/medium/low/thinking)
- Context sizes added for all new Antigravity models

## v3.9.8 (2026-05-25)

**Codex Desktop Model Fix & Global BrokenPipeError Protection**

### Desktop Model Fix
- **Codex Desktop sending wrong model** (gpt-5.4-mini) instead of user-selected model — now remapped via `CODEX_LAUNCHER_MODEL` env var
- **Config.toml** now writes `review_model`, `wire_api`, `request_max_retries`, `stream_max_retries`, `stream_idle_timeout_ms` for Desktop compatibility
- **Proxy model remap** intercepts Desktop forced models (`gpt-5.4-mini`, `gpt-5.5`, etc.) and routes to the user's selected model

### Global Crash Fix
- **`send_json()` globally catches BrokenPipeError** — no more crashes on client disconnect across all backends

## v3.9.7 (2026-05-25)

**Codebuff Error Forwarding & Crash Fixes**

### Rate Limit Error Forwarding
- **Real Codebuff error messages** forwarded to user instead of generic "429 Too Many Requests"
- **HTTP 200 + Responses API format** for rate limits — Codex displays the actual Codebuff message (e.g. "Daily session limit reached. Resets in 29m.") instead of retrying
- **`retryAfterMs` extraction** from Codebuff 429 responses for accurate cooldown timers
- **`_codebuff_start_run`** returns actual error body instead of `None` — shows real Codebuff errors

### Crash Fixes
- **BrokenPipeError crash** on "all accounts exhausted" response — wrapped in try/except
- **3 SyntaxWarnings** fixed for invalid `\ ` escape sequences in docstrings

## v3.9.6 (2026-05-25)

**Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After**

Inspired by architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum).

### P0: Connection Pooling & Stream Idle Timeout
- **Connection pooling** (`http.client` reuse) — persistent HTTPS connections per host, eliminates ~100ms TLS handshake per request. Pool keyed by `{scheme}://{host}:{port}`, reused across requests.
- **Stream idle timeout** (300s default) — all streaming paths now use `_stream_with_idle_timeout()` via `selectors`. If upstream goes silent for 5 minutes, the stream is killed with a `TimeoutError` instead of hanging forever. Applied to:
  - OpenAI-compat streaming (`oa_stream_to_sse`)
  - Command Code streaming (`_iter_cc_events`)
  - Gemini OAuth streaming (`_handle_gemini_oauth`)
  - Auto-continue streaming (`_auto_continue_gemini`)

### P1: Retry-After Header Support & Preemptive Token Refresh
- **`Retry-After` header** — all retry paths (openai-compat, BGP, auto) now read the upstream `Retry-After` header and respect it (capped at 60s). Falls back to exponential backoff if header is absent.
- **Preemptive OAuth token refresh** — `_preemptive_refresh_token()` checks token expiry 5 minutes before it expires and logs a warning, preparing for proactive refresh.

### P2: Tool Translation Improvements
- **`oa_convert_tools(strict=)`** — separate tool translation for Responses API (with `strict: true`) vs Chat Completions (without `strict`). Some providers reject the `strict` field in Chat Completions mode.
- **Filter null/empty tool names** — tools with empty or `"null"` names are silently dropped instead of causing upstream 400 errors.

### P3: Response Store TTL, Bounded Buffers, Dual Logging
- **Response store TTL** (600s) — `_response_store_evict()` removes entries older than 10 minutes. Prevents unbounded memory growth on long sessions.
- **Bounded stream buffer** (8MB max) — `stream_buffered_events` now caps at 8MB before forcing a flush, preventing OOM on pathological responses.
- **`response.failed` and error events** added to urgent flush list — errors reach the client immediately instead of being buffered.
- **Dual logging** — `proxy.log` in `~/.cache/codex-proxy/` captures all proxy messages alongside stderr. Survives Codex Desktop's stderr piping.

## v3.5.0 (2026-05-22)

**Major Release — Command Code Adapter Overhaul, AI Assist, Self-Revive Watchdog, Debug Infrastructure**

### Command Code Provider — Multi-Format Tool-Call Parser (Critical Bug Fix)

The Command Code (CC) provider adapter in `translate-proxy.py` had a critical bug where the CC model's tool-call output was not being parsed into executable tool calls, causing the Codex agent loop to stop after the first response. The CC model output format **changes between sessions and models** — the parser must handle all observed formats.

**Root Cause:** The CC model returns tool calls as inline text in various formats (raw JSON, XML, DSML tags, HTML-like blocks) within `text-delta` SSE events. The original parser only handled one format. When the model switched output style, tool calls were silently dropped, and Codex received a plain text response instead of executable commands — halting the multi-turn agent loop.

**The Fix — Multi-Format Parser Chain (17 patches):**

A cascading parser chain was built that tries each format in order, first match wins:
`DSML → <bash> blocks → <explore_agent> → <tool_call type=...> → XML patterns → raw JSON → fallback regex`

- **FIX 1**: `cc_input_to_messages()` — enforce STRING content only (CC `/alpha/generate` rejects content blocks). Tool calls sent as inline JSON text in assistant messages. Tool results as `role: "user"` plain text (NOT `role: "tool"`).
- **FIX 2**: `x-command-code-version` header always sent (fallback `"0.26.8"`) — prevents 403 `upgrade_required` errors.
- **FIX 3**: Cleared stale schema cache (`content_type:"array"`) that was corrupting message construction.
- **FIX 4**: Streaming `try/except` wrapper — catches all streaming errors and sends `response.completed(status:"failed")` event instead of crashing the connection.
- **FIX 5**: `_extract_raw_json_tool_calls()` — new parser that finds raw JSON tool calls embedded in model text (`{"cmd":"...","type":"tool-call"}`).
- **FIX 6**: `_extract_args()` three-tier parser — tries direct parse → `codecs.escape_decode` → `unicode_escape` to prevent double-wrapped argument strings.
- **FIX 7**: `_extract_field()` skips leading `\` before value type check — handles malformed escape sequences in CC output.
- **FIX 8**: `sandbox_permissions` normalization from parsed dict — converts `{"docker":"full"}` to the flat string format Codex expects.
- **FIX 9** (REVERTED): Removed adaptive probe system — proved unnecessary, conservative inline-text format is sufficient.
- **FIX 10**: Comprehensive fix documentation added to proxy file header for maintainability.
- **FIX 11**: `_unwrap_cmd()` recursive unwrapping — handles double/triple-wrapped `cmd` values at all 7 extraction paths. `_sanitize_tool_calls()` post-extraction validation layer ensures every tool call has valid name + args.
- **FIX 11c**: XML regex fix — `</tool_call)` had unbalanced parenthesis for ~4000 lines; now uses `[)]?>` to match both `</tool_call)>` and `</tool_call)>`.
- **FIX 12**: Self-revive watchdog loop — auto-restarts proxy on crash (up to 50x, progressive backoff 1→30s). Controlled by `_SHUTDOWN_REQUESTED` flag on SIGTERM/SIGINT.
- **FIX 13**: Fallback extraction when main parser returns empty but text contains tool-call signals (`{"cmd":`, `"type":"tool-call"`, `<tool`, `<function=`).
- **FIX 14**: Parser for `<tool_call type="bash">\n{"command":"..."}` format (actual CC model output) + fixed fallback regex to match BOTH `"cmd"` AND `"command"` keys.
- **FIX 15**: `<explore_agent>` blocks converted to real `exec_command` with synthesized curl-based repo exploration command.
- **FIX 16**: `<bash>...</bash>` blocks parsed — extracts `prefix_rule`, `sandbox_permissions`, `justification` via line-oriented parsing.
- **FIX 17**: DSML tool_call blocks — the **current CC model output format**:
  - `<｜｜DSML｜｜tool_calls>` wrapper
  - `<｜｜DSML｜｜invoke name="exec">` with `<｜｜DSML｜｜parameter name="command">` tags
  - Extracts command from `parameter name="command"` or fallback to `prefix_rule`
  - Maps `exec`/`bash` → `exec_command`

### Debug Infrastructure
- **Debug-to-file**: All proxy events, text_buf preview, parser results, and fallback attempts logged to `~/.cache/codex-proxy/cc-debug.log` — works even when stderr is piped by Codex Desktop.
- **Inline self-test**: `--self-test` flag runs 19 tests covering unwrap, double-wrap, unescaped quotes, XML, function=, sanitizer edge cases.
- **Per-request logging**: Event types, text_buf content, parser match results written to debug log for every request.

### AI Assist
- AI Assist integration in launcher GUI for intelligent provider configuration and troubleshooting.

### Self-Revive Watchdog
- Proxy auto-restarts on crash with progressive backoff (1s → 30s, up to 50 restarts).
- Clean shutdown on SIGTERM/SIGINT via `_SHUTDOWN_REQUESTED` flag.
- Eliminates manual proxy restart during long coding sessions.

### Other Improvements
- `text_buf` in `cc_stream_to_sse` accumulates all `text-delta` events; parsing happens at end-of-stream for complete extraction.
- Schema cache with 24h staleness TTL for provider capabilities.
- ErrorAnalyzer learns from 4xx errors on retry (max 2 retries).
- `cleanup-codex-stale.sh` updated with additional stale process patterns.

## v3.3.0 (2026-05-20)

**Antigravity + Gemini CLI OAuth — full Codex agent loop working**

### Gemini CLI OAuth + Antigravity OAuth
- Split Google OAuth into separate Gemini CLI OAuth and Google Antigravity OAuth presets/backends.
- Gemini CLI OAuth uses the Gemini CLI public OAuth client and Code Assist endpoints.
- Antigravity OAuth uses Antigravity OAuth credentials, Code Assist daily/autopush/prod fallback, and Antigravity-style request wrapping.
- Added Antigravity version discovery from the updater/changelog with local caching.
- Added Antigravity model alias mapping from UI-facing `antigravity-*` IDs to upstream Code Assist model IDs.

### Responses API + Tool Flow
- Added Gemini-style history hardening for Google OAuth requests: removes empty turns, coalesces adjacent roles, drops duplicate user repeats, and enforces user-start/user-end history.
- Preserves function-call IDs across turns and adds synthetic `thoughtSignature` for historical Gemini function calls, matching Gemini CLI hardening behavior.
- Fixed Antigravity streaming Responses API compatibility: single assistant message item, text done events, content part done, output item done, final completed event, and connection close.
- Added `response.function_call_arguments.delta` and `response.function_call_arguments.done` events so Codex can execute Antigravity tool calls and create files.
- Fixed functionResponse name matching — uses the original functionCall name instead of falling back to call_id.
- Strengthened Antigravity prompt policy: use tools immediately for file changes, avoid planning-only responses, and answer directly when no suitable tool exists.
- **Auto-continue on MAX_TOKENS** — when Gemini/Antigravity truncates a text response, the proxy transparently sends a continuation request and concatenates the output so Codex receives the complete response without manual intervention.

### Reliability + Routing
- Added BGP++ route scoring, route cooldowns, token buckets, and persisted route stats.
- Added provider policy layer and adaptive context compaction.
- Added tool-call pairing validation/repair for orphaned tool outputs.
- Added Endpoint Doctor in the endpoint editor.
- Added log redaction helper for common API key/token patterns.

## v3.1.0 (2026-05-20)

- Initial Antigravity/Gemini CLI OAuth backend split.
- Gemini-style history hardening, SSE streaming fixes.

## v3.0.0 (2026-05-20)

**Major architectural overhaul — Phase 0 + Phase 1 of engineering roadmap**

### Proxy (translate-proxy.py)
- **ThreadingHTTPServer** — serves concurrent requests (no more blocking)
- **Thread-safe shared state** — OrderedDict response store with locks, Crof state lock, stats lock
- **Batched + atomic stats writes** — stats buffered in memory, flushed every 5s via `os.replace()`
- **Graceful shutdown** — SIGTERM/SIGINT drain active connections (up to 5s), reject new with 503
- **Progressive upstream timeouts** — based on input size and tools (60-300s instead of flat 180s)
- **Lazy JSON parsing** — skip parsing SSE events unless they contain `response.completed`
- **Buffered SSE writes** — flush every 30ms, on urgent events, or at 4KB (reduces syscalls)
- **`/health` endpoint** — returns backend, target, models, BGP route count
- **Consolidated imports** — all at top, no more missing import crashes
- **`main()` entry point** — runtime init moved out of module level
- **TCP_NODELAY** — on all streaming paths (from v2.7.0)
- **Anthropic prompt caching** — `cache_control: ephemeral` on system prompts (from v2.7.0)

### Launcher (codex-launcher-gui)
- **Dynamic port allocation** — `_pick_free_port()` picks random free port, no more 8080 conflicts
- **Proxy health gating** — Codex will NOT launch if proxy fails health check within 15s
- **Error dialogs** — clear GTK error dialog when proxy startup fails
- **Atomic config backup/restore** — temp file + `os.replace()`, no more corrupted config.toml
- **Config transactions** — recovery from interrupted sessions on next startup
- **Safe cleanup (PID registry)** — only kills processes launched by the app (pids.json)
- **Proxy stderr piped to log** — real-time proxy logs in launcher UI
- **Bearer token** — Codex config uses `codex-launcher-local` instead of real API key
- **Usage Dashboard v2** — OpenUsage-inspired dark theme with status pills, KPI strip, model bars (from v2.7.0)

## v2.7.0 (2026-05-20)

- **Usage Dashboard redesigned** (inspired by OpenUsage design patterns)
  - Deep Space dark theme with Catppuccin-inspired color palette
  - Header with animated status dots (OK/WARN/ERR provider health)
  - KPI summary strip: total providers, requests, token volume, avg latency
  - Provider cards with colored borders matching health status
  - Status pills: OK (green), WARN (yellow), ERR (red)
  - Colored section separators per metric type (Usage=yellow, Models=lavender)
  - Model composition bar: stacked horizontal segments per model share
  - Per-model breakdown with mini progress bars, percentage, request counts
  - Per-model token breakdown (in/out) when available
  - Token formatting: 1.2M, 45.3K instead of raw numbers
  - Duration formatting: 1.5h, 3.2m instead of raw seconds
  - Error section with warning icon

- **TCP_NODELAY streaming optimization**
  - Disables Nagle's algorithm on streaming connections
  - Reduces per-packet latency by up to 40ms on small SSE events
  - Applied to all 4 streaming code paths (openai-compat, retry, command-code, generic)

- **Anthropic prompt caching**
  - System prompts now sent as `cache_control: ephemeral` structured format
  - Enables Anthropic's automatic prompt caching (saves tokens + cost on repeated prompts)

## v2.6.1 (2026-05-20)

- **Google OAuth rebuilt to emulate Gemini CLI**
  - Uses Google's public OAuth client_id (same as gemini-cli)
  - No `client_secret.json` needed — zero setup required
  - PKCE (S256 code challenge) + CSRF state protection
  - Scopes: cloud-platform, generative-language, userinfo.email, userinfo.profile
  - Redirects to Google's success/failure pages (same as gemini-cli)
  - Just click "OAuth Login" → browser opens → authorize → done
  - Token file permissions set to 0600 for security

## v2.6.0 (2026-05-20)

- **Usage Dashboard** — per-provider tracking with visual cards
  - Request counts, success/failure rates, token usage, latency stats
  - Color-coded success rate bars (green/yellow/red)
  - Per-model breakdown showing request counts
  - Last error and last used timestamp
  - Sorted by most-used provider
  - Refresh button for live updates
- **Proxy usage tracking** — records every request to `usage-stats.json`
- **Google OAuth**: browse for `client_secret.json` with file picker dialog
  - No longer requires copying to a specific path manually
  - Auto-copies selected file to `~/.cache/codex-proxy/`

## v2.5.1 (2026-05-20)

- **Adaptive retry for transient errors** (429/502/503)
  - Exponential backoff: 2s → 4s → 8s, up to 3 retries
  - Works for both single-provider and BGP mode
  - BGP routes retry before failing over to next route
  - Connection errors (reset/broken pipe) also retried
- **Proxy socket reuse** — no more `Address already in use` crashes on restart
- **BGP startup log** shows route count and names

## v2.5.0 (2026-05-20)

- **AI BGP — Multi-provider routing with automatic failover**
  - New "AI BGP" button in main window → pool manager
  - Create BGP pools with ordered routes from any configured endpoint
  - Each route has its own endpoint URL, API key, model, and priority
  - **Failover strategy**: tries primary route, automatically falls back to next on error/timeout
  - BGP pools appear in endpoint dropdown with 🔀 icon
  - Pool editor: add/remove/reorder routes, pick endpoint + model per route
  - Up/down buttons for priority reordering
  - Proxy logs `[bgp] trying route 'Name'` and `[bgp] route 'Name' FAILED` on fallback
  - If all routes fail: returns 502 with detailed error per route
- Fixed TOML config breakage from multi-line paste in API key field (`_toml_safe()`)

## v2.4.0 (2026-05-20)

- **Added OpenAdapter provider preset**
  - Base URL: `https://api.openadapter.in/v1` — one API key, 40+ models
  - Pre-loaded models: glm-4.7, DeepSeek-V3, kimi-k2.6, qwen3.6-plus, claude-sonnet-4-6, gpt-5.4, gemini-2.5-flash, and more
  - Works with existing openai-compat proxy backend — no special handling needed
- Fixed Add/Edit dialog crash (missing `_on_reasoning_toggled` method)
- Redesigned Google OAuth flow with live status dialog and clickable auth URL

## v2.3.2 (2026-05-20)

- **Added Google Gemini provider with OAuth support**
  - Two presets: "Google Gemini (API Key)" and "Google Gemini (OAuth)"
  - OAuth Login button in endpoint editor — full Google OAuth2 flow
  - Starts local HTTP server (port 8085), opens browser for Google consent
  - Captures auth code, exchanges for access + refresh tokens
  - Stores tokens in `~/.cache/codex-proxy/google-oauth-token.json`
  - Auto-refreshes access tokens when expired (no manual re-login)
  - Uses Gemini's OpenAI-compatible endpoint: `generativelanguage.googleapis.com/v1beta/openai`
  - Models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-2.0-flash-lite, and more
  - Setup instructions shown if `client_secret.json` not found

## v2.3.0 (2026-05-20)

- **Adaptive Crof self-healing system**
  - Tracks per-model success/failure history with item counts
  - Dynamically learns max item limit per model (starts at 30, adjusts down on failures)
  - Proactively compacts input when above learned limit before sending to upstream
  - Auto-retry on `finish_reason=length` with aggressive re-compaction and resend
  - Prevents `stream disconnected` and `incomplete` errors on long conversations
  - All tracking logged to stderr: `[crof-adaptive] model=X items=N OK/FAIL -> limit=N`
- Fixed `NameError: _ts` crash in debug logging
- Fixed `ConnectionResetError` crash on client disconnect during streaming
- Added 180s upstream timeout to prevent hanging connections
- Compaction now preserves function_call/function_call_output pairs (no orphaned tool outputs)
- Fixed reasoning control: `reasoning_effort=none` always sends both params

## v2.2.1 (2026-05-20)

- **Fixed compaction orphaning function_call_output items** — root cause of Crof `incomplete` responses
  - Compaction cut between function_call and its function_call_output, creating dangling tool results
  - Crof model received orphaned `tool` messages with empty `tool_call_id`, causing confusion and token exhaustion
  - Compaction now expands tail boundary to include matching function_call/function_call_output pairs
- **Fixed reasoning control**: `reasoning_effort=none` now always sends both `enable_thinking=false` AND `reasoning_effort=none`
  - Crof API testing confirmed `reasoning_effort=none` is what actually suppresses reasoning, not `enable_thinking=false`
- Added upstream debug logging to `~/.cache/codex-proxy/crof-upstream.jsonl`

## v2.2.0 (2026-05-20)

- **Added per-provider Reasoning controls in endpoint editor**
  - Reasoning On/Off toggle — disable reasoning for models that exhaust output tokens (e.g., Crof mimo-v2.5-pro)
  - Reasoning Effort selector: None, Minimal, Low, Medium, High, Max
  - When reasoning is OFF: sends `enable_thinking=false` + `reasoning_effort=none` to upstream API
  - When reasoning is ON: sends user-selected effort level (default: Medium)
  - Settings stored per-endpoint, passed through proxy config to upstream requests
- Strip `reasoning_content` from proxy output — Codex doesn't use it, avoids token waste
- Force `max_tokens=64000` minimum for openai-compat providers — room for both reasoning and content
- Inspired by unsloth's reasoning control patterns for Qwen/GPT-OSS models
- Styled reasoning switch: green = ON, orange = OFF, gentle rounded pill shape
- Added error handling to endpoint manager Add/Edit/Manage dialogs (prevents silent failures)

## v2.1.3 (2026-05-19)

- **Fixed Crof mimo-v2.5-pro stopping mid-response (finish_reason=length)**
  - Root cause: model emits 600+ `reasoning_content` SSE chunks that exhaust `max_tokens` before any actual content is generated
  - Strip `reasoning_content` from proxy output — Codex doesn't use reasoning, avoids wasting output tokens on invisible text
  - Force `max_tokens` minimum of 64000 for openai-compat providers — gives models room for both reasoning and content
  - Works for all openai-compat providers (Crof, Z.AI, DeepSeek, OpenRouter, etc.)

## v2.1.2 (2026-05-19)

- **Fixed Crof.ai and providers stopping after first tool call (root cause: None tool IDs)**
- Codex sends `function_call` items with `id=None` — proxy now matches tool results to calls by call_id + positional fallback
- Fixed orphan message output item when response is only tool calls (no text content)
- **Auto-trims long conversations (>30 items)** to prevent context overflow on providers like Crof
  - Keeps system/developer messages, original user query, and most recent 10 items
  - **Auto-compacts old items into a summary** instead of just dropping them
  - Summary includes: user requests, assistant responses, tool calls made, files touched
  - Preserves enough context for the model to continue long tasks intelligently
- **Truncates large tool outputs (>8000 chars)** to prevent model output token exhaustion
  - Crof's models return `incomplete` when tool results contain too much text (e.g., full HTML pages)
  - Truncated outputs include `[truncated N chars]` suffix so the model knows data was cut
- Added request/response logging to `~/.cache/codex-proxy/requests.log` for debugging
- Proxy stderr no longer discarded by launcher (visible in terminal for debugging)

## v2.1.1 (2026-05-19)

- Added Command Code backend to translation proxy (proprietary `/alpha/generate` API)
- Added Command Code provider preset with 20 models (DeepSeek, Claude, GPT, Kimi, GLM, Qwen, etc.)
- Added `cc_version` field in endpoint editor for Command Code version (default: 0.26.8)
- Proxy sends `x-command-code-version` header to CC API (fixes 403 "upgrade_required")
- CC message conversion: `system` role → `user`, string content → array, tools stripped, real UUID for threadId
- Fixed proxy: map `developer` role to `system` for Chat Completions providers (DeepSeek, Qwen, etc.)
- Fixed proxy: map `developer` role to `user` for Anthropic providers
- Forward `instructions` field from Responses API as system message/param

## v2.1.0 (2026-05-19)

- Added Codex auth status detection (reads `codex login status`)
- Auth status bar shows logged-in provider or warning if auth missing/expired
- "Re-login" button opens `codex login` in a terminal for re-authentication
- Auto re-checks auth 30s after re-login flow starts
- Pre-launch auth check warns before launching Codex Default mode if auth is invalid
- Auth status checked asynchronously at startup (non-blocking)

## v2.0.1 (2026-05-19)

- Added Codex CLI/Desktop installation verifier to main page
- Green check (✔) when detected, yellow cross (✘) when missing
- "Install" button next to missing tools opens install guide dialog
- Desktop/CLI launch buttons disabled with tooltip when tool is missing
- Dependency status logged on startup
- Buttons respect missing-state after busy/unbusy cycles

## v2.0.0 (2026-05-19)

- Initial release: multi-provider Codex Launcher
- Translation proxy: Responses API to Chat Completions + Anthropic Messages
- GTK endpoint manager with 10+ provider presets
- Codex Default mode (built-in OAuth, zero config)
- Browser UA injection for Cloudflare-protected providers (OpenCode)
- Streaming SSE, tool calls, reasoning content support
- Profile backup/import, model auto-fetch, bulk import
- Refresh Models in background thread
- URL normalization to prevent double-path bugs
- Config backup/restore around sessions
- .deb installer package