# Changelog ## v3.11.8 (2026-05-26) **Vision Cache Persistence, PR #8 Merge** ### New Features - **Vision description cache persisted across requests**: Image descriptions from the vision fallback API are now cached in a file (`~/.cache/codex-proxy/vision-cache.json`) so the same image URL is never described twice — saves API calls and latency - **Merge PR #8**: `fix: persist vision description cache across requests` (cobra91) ## v3.11.7 (2026-05-26) **Vision Auto-Detect, Proactive Non-Vision Model Detection, Unit Tests, Bug Fixes** ### New Features - **Vision auto-detect fallback**: When no explicit vision fallback is configured, automatically uses the current provider's own vision model (e.g., `0G-Qwen-VL` for OpenAdapter) as the image description API — no separate API key needed - **Proactive non-vision model detection**: Models matching name patterns (`glm`, `deepseek`, `llama`, `qwen` without `vl`, etc.) are detected as non-vision on first request without waiting for an error from the provider - **Vision preprocessing is now the primary image handling solution**: Replaces old `_strip_images_from_input()` (which just removed images with a placeholder). Images are now described via API and sent as rich text descriptions to text-only models - **Merge PR #6**: Vision/OCR preprocessing for text-only models (cobra91) - **Merge PR #7**: 177 unit tests for translate-proxy.py (cobra91) ### Bug Fixes - **AttributeError fix**: `image_url` field can be a string (bare URL) not always a dict — fixed in both `_preprocess_vision_input()` and old strip function - **Auth os error 2 fix**: GUI shows "Config missing" message instead of raw OSError when `~/.codex/` directory doesn't exist - **Removed duplicate vision functions**: Cleaned up duplicate `_vision_describe_image()`, `_preprocess_vision()`, `_preprocess_vision_input()` from merge ## v3.11.6 (2026-05-26) **Antigravity Loop Breakers, Vision/OCR Preprocessing, has_content Fix, Auth Error Fix** ### New Features (Antigravity-only, no other providers affected) - **Per-session loop tracking**: `_ANTIGRAVITY_LOOP_TRACKER` global dict with `_antigravity_loop_key()` function tracks state per session: `latest_user_hash`, `nudge_injected`, `latest_user_appended`, `tool_calls_for_request`, `repeated_tool`, `force_finalize`, `last_tool`, `last_tool_count` - **Edit-intent nudge injection**: Injected only on the first turn per request, preventing duplicate nudges across retries - **Latest user instruction append**: Appended exactly once per request to prevent redundant instruction stacking - **Loop breaker**: If the same tool + arguments is repeated ≥ 5 times in a session, `force_finalize` is triggered to break the infinite loop - **Detailed `[antigravity-loop]` logging**: All tracking fields logged on every Antigravity request for debugging ### New Features (All OpenAI-compatible providers) - **Vision/OCR preprocessing**: When a provider doesn't support images (detected via error messages like "unknown variant image_url", "does not support image"), the proxy automatically calls a configurable vision fallback API (default: Kilo.ai) to describe images as text, then replaces image blocks with text descriptions before sending to text-only models - **`_vision_describe_image()`**: Calls vision fallback model to describe a single image, with MD5-based caching to avoid re-describing same URL - **`_preprocess_vision()`**: Replaces `image_url`/`input_image` blocks in Chat Completions message format with text descriptions when provider lacks vision support - **`_preprocess_vision_input()`**: Same for Responses API input format — runs BEFORE adapter conversion so images are replaced early - **Vision error retry**: On HTTP 4xx errors containing image-related keywords, automatically retries with images preprocessed instead of failing - **Configurable via env vars**: `VISION_FALLBACK_URL`, `VISION_FALLBACK_MODEL`, `VISION_FALLBACK_KEY` - **ProviderSchema `supports_vision` field**: Auto-detected from error responses and persisted in provider-caps.json ### Critical Fixes - **`has_content` now includes `function_call`** (v3.11.5 fix): `_observe_event` only checked for `"type": "message"` — when models return only tool calls (no text), `has_content` was `False`, causing Codex to loop infinitely and build context until `context_length_exceeded`. Now checks both `"message"` and `"function_call"`. - **`has_message`/`has_tool_call` initialized in all 5 locations**: Previous fix added variables inside `_observe_event` closure but missed 4 other `has_content = False` locations, causing `NameError: name 'has_message' is not defined` crashes. - **Auth config-not-found error handling**: When Codex's `config.toml` is missing or deleted, `codex login status` returns "Error loading configuration: No such file or directory (os error 2)". Now caught specifically (`OSError errno==2`) and returns ("not_configured", "Config missing — launch once to create") with clear GUI guidance. ### Bug Fixes (GUI) - **Active endpoint sync**: GUI auto-removes stale endpoint references on startup ## v3.11.5 (2026-05-26) **Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection** ### Critical Fixes - **Token-aware compaction for small-context models (FIX)**: `_crof_compact_for_retry()` had an early return at `len(input_data) <= limit` (item count) — if you had 25 items × 1600 tokens = 40K tokens, it skipped compaction entirely because 25 < 30 (the default item limit). Now also checks estimated token count vs learned model max, and compacts when either item count OR token count exceeds limits. Fixes repeated `context_length_exceeded` errors on models like 0G-GLM-5.1 (~35K token context). - **Proactive compaction now token-aware**: Previously only triggered when item count > 30. Now also triggers when estimated tokens exceed 80% of the model's learned token limit, even if item count is below the threshold. Prevents the first-request failure pattern on small-context models. - **Compaction aggression threshold**: Changed `est > max_tok` to `est >= max_tok * 0.9` to avoid edge case where estimated tokens exactly equal the limit and compaction is skipped. - **Removed all `crof.ai` gates from adaptive compaction**: Proactive compaction, `finish_reason=length` retry, `_crof_record`, and compaction logging were gated behind `"crof.ai" in TARGET_URL`. These gates prevented OpenAdapter and other providers from getting proactive/retry compaction, causing repeated `context_length_exceeded` failures. Now applies universally to ALL providers. ### New Features - **Vision model detection + image stripping**: `_strip_images_from_input()` and `_model_supports_vision()` detect vision capability by model name pattern. Non-vision models (deepseek, glm, mixtral, llama, command, dbrx, qwen, phi-3) have `input_image`/`image_url` parts stripped and replaced with `[User attached image: filename — this model does not support vision]` text notice. Vision models (gpt-4o, gemini, claude, qwen-vl, glm-5v) keep images intact. Applied in 3 paths: main request, context_length_exceeded retry, smart-continue nudge. - **Token estimation and per-model limit learning**: `_estimate_tokens()`, `_estimate_input_tokens()`, `_get_model_max_tokens()`, `_set_model_max_tokens()`. Extracts `~N tokens` from `context_length_exceeded` error messages and stores per-model token limits. Used by proactive compaction and retry compaction to adjust `keep` count dynamically. - **Compaction aggression levels**: `_crof_compact_for_retry()` accepts `aggression` parameter (0=normal, 1=extreme). Extreme mode kicks in when estimated tokens > 1.5× the learned limit or on 2nd+ retry attempt. Reduces `keep` count to minimum, ensuring the compacted request fits within model limits. - **Smart-continue text-tool detection**: Removed hard requirement for `has_function_call_output(input_data)`. Added `_TOOL_CALL_TEXT_PATTERNS` and `_text_looks_like_tool_calls()` to trigger nudging when model outputs text matching tool-call patterns (e.g., `• (exec_command cmd ...)`, `write_to_file`, `exec_command`) even without prior `function_call_output` in context. Essential for models like 0G-GLM-5.1 that never emit real `function_call_output` items. - **Parenthesized tool call regex**: `_PAREN_TC_RE` pattern to match `• (name args...)` format from non-vision models that output tool calls as parenthesized text. ### GUI Fixes - **Active endpoint sync**: Added `set_active_endpoint()` and `validate_active_endpoint()` to Linux GTK GUI. Syncs `.active-endpoint.json` with `config.toml` on every launch; auto-removes stale references to deleted providers. Fixed `"Error loading configuration: No such file or directory (os error 2)"` crash when active endpoint referenced a deleted provider. - **Config state**: `~/.codex/.active-endpoint.json` and `config.toml` model catalog path validated and auto-corrected on GUI startup. ## v3.11.0 (2026-05-26) **Cobra PR Merge + Smart Continuation + API Key Hot-Reload** ### New Features - **Concurrency semaphore (max 3)**: limits parallel upstream requests to prevent rate-limiting - **Auto-continue for truncated text**: detects text ending in `:`, `(`, `;`, `…` or `finish_reason=length`, continues seamlessly - **SO_REUSEADDR on sticky port**: prevents `TIME_WAIT` from changing port on restart - **proxy-stderr.log**: persistent log file for proxy errors - **Stream diagnostics**: logs event count, finish reason, content flag, elapsed time after each stream - **Timeout/OSError handler**: sends proper `response.failed` SSE event instead of silently dropping connection - **Restart Proxy button**: now only restarts proxy without killing Codex Desktop - **Tool call argument normalizer**: fixes capital-A `Arguments` key, strips markdown/JSON code block wrapping from tool call arguments - **Smart-continue loop (2× retries)**: escalating nudge messages when model returns text-only stop mid-task - **XML tool call extraction**: parses `name{args}` from model text output, injects as real `function_call` items - **Auto-continue + smart-continue ordered with skip guard**: prevents both from double-firing on the same response - **API key hot-reload**: mtime tracking detects config changes, `/admin/reload` endpoint triggers hot-reload, `/admin/verify-key` tests key against upstream - **GUI hot-reload**: auto-refreshes proxy key on endpoint edit, verifies with upstream — no proxy restart needed - **Synthetic tool-results disabled**: was causing deepseek-v4-pro truncation on opencode.ai ## v3.10.12 (2026-05-26) **Sticky Endpoint, Claude Fixes, Guardrail Skip, Anti-Stall** ### New Features - **Sticky endpoint caching**: remembers which endpoint last succeeded, reuses it on every subsequent request (zero overhead) - **Sequential fallback**: if sticky endpoint fails (429/502/503), tries next endpoint in order — no parallel probing, no wasted requests - **Endpoint order**: `cloudcode-pa.googleapis.com` first (matches agy CLI), `daily-cloudcode-pa.googleapis.com` as fallback - **Anti-stall engine**: kills stale proxy processes and clears `__pycache__` on every new session start - **Smart error classification**: distinguishes `quota_exhausted` vs `capacity_exhausted` vs `account_banned` vs `validation_required` vs `service_disabled` vs `auth_permanent` - **Rate limit reset time parsing**: extracts cooldown from error body (`quotaResetDelay`, `Resets in ~1h27m`, etc.) for accurate cooldown - **Missing Antigravity headers**: `X-Client-Name`, `X-Client-Version`, `x-goog-api-client`, platform-aware `User-Agent` - **Session ID**: added `sessionId` to request wrapper for proper session tracking ### Bug Fixes (TRAE Agent) - **Guardrail skip for simple messages**: when user sends simple messages (e.g. "hi"), skip injecting `_GEMINI_AGENT_GUARDRAIL` — prevents model from aggressively calling tools and looping `ls -la` 50+ times - **Claude tool preservation**: Claude models through Antigravity now keep ALL tool outputs in normalizer (no summarization/truncation) — prevents context loss that broke Claude sessions - **Claude compaction guard**: `_adaptive_compact` skipped for Claude models — Claude handles its own context, no forced compaction - **Claude normalizer guard**: `_antigravity_normalize_context` skipped for Claude models — avoids stripping Claude-specific message structure - **Claude sanitization guard**: Google content sanitization loop skipped for Claude models — prevents mangling Claude's response format - **Normalizer model parameter**: `_antigravity_normalize_context` now receives `model` param to distinguish Claude vs Gemini behavior ## v3.10.11 (2026-05-26) **Hybrid Endpoint Fallback — Redundant Antigravity Endpoints** ### New Features - Hybrid endpoint fallback: tries `cloudcode-pa.googleapis.com` then `daily-cloudcode-pa.googleapis.com` on 429 - `daily-cloudcode-pa.googleapis.com` is the same production endpoint agy-core uses (separate rate limit bucket) - 429 errors now log full response body for debugging - SERVICE_DISABLED (403) still falls through to next endpoint - Rate-limit marking only happens after ALL endpoints fail ### Bug Fixes - Fixed 429 on one endpoint immediately failing — now tries fallback before giving up - Restored SERVICE_DISABLED fallthrough (was accidentally removed) ## v3.10.10 (2026-05-25) **Context Normalizer Fix — Compaction Summary Preservation** ### Bug Fixes - Fixed normalizer stripping ALL context on resumed sessions after compaction - Normalizer no longer auto-resets when compaction summary is present - Compaction summaries ("Auto-compacted: N earlier turns") are always preserved - Deduplicates consecutive identical `` messages (10→1) - Emergency reset now preserves compaction summaries - Previous behavior: after compaction reduced 1925→185 items, normalizer saw `n_tool_outputs == 0` and stripped to just `system + latest_user`, losing all context — model responded with "I don't have context" ### hashlib Fix (v3.10.9 hotfix) - `_antigravity_normalize_context` crashed with `NameError: hashlib` on resumed sessions - Replaced SHA256 duplicate detection with string comparison ## v3.10.9 (2026-05-25) **Antigravity Overhaul — Context Normalizer, Claude Thinking Fix, Endpoint Lockdown** ### Antigravity Endpoint Lockdown - Production-only: `cloudcode-pa.googleapis.com` by default - Sandbox/staging blocked unless `ALLOW_ANTIGRAVITY_STAGING=1` - 403 SERVICE_DISABLED falls through, 429 returns to client ### AntigravityContextNormalizer - Bounded context — no more 136-item polluted requests for "hi" - Simple message detector, auto-reset polluted context - Duplicate removal, tool output budget, hard char limits ### Claude Thinking Fix (Antigravity-only) - Fixed 400 error: `maxOutputTokens=64000` when thinking enabled - Snake_case config, VALIDATED toolConfig, proper budgets ### z.ai / OpenRouter (cobra91 PR #4) - Full OpenClaw attribution headers, OpenRouter caching ## v3.10.8 (2026-05-25) **OAuth & Antigravity Endpoint Fixes** ### Re-OAuth Buttons Fixed - Linux GUI: `load_oauth_secrets()` was undefined — buttons crashed silently on click - Now loads OAuth secrets inline from `~/.config/codex-launcher/oauth-secrets.json` - Both Linux and Windows Re-OAuth use PKCE + localhost callback (was deprecated OOB paste) ### Antigravity Staging/Sandbox Blocked by Default - Proxy: production `cloudcode-pa.googleapis.com` tried FIRST, sandbox/daily/autopush as fallback only - Proxy: 403 SERVICE_DISABLED now falls through to next endpoint instead of returning error immediately - Project discovery: validates against production endpoint, not staging-cloudaicompanion.sandbox - Antigravity preset `base_url` changed to production (was `daily-cloudcode-pa.sandbox.googleapis.com`) - `[antigravity-endpoint]` log line shows which endpoints are being tried ### Other Fixes - GLib.idle_add lambda returning truthy tuple fixed (caused repeated callbacks) - Windows GUI project discovery also uses production endpoint ## v3.10.7 (2026-05-25) **Prompt Enhancer — Fix Lost Context After Compaction** ### Prompt Enhancer (Per-Provider Toggle) - **Offline mode**: Injects structured XML instructions before every user prompt to keep the model focused, decisive, and context-aware after compaction strips conversation history - **AI-powered mode**: Optionally calls an external LLM (configurable model/URL/key) to rewrite vague prompts into clear, actionable instructions - Prevents the "had to resend and reword" problem in long sessions where compaction summarizes hundreds of turns - **Per-endpoint setting** — enable/disable for each provider independently - Configurable in both Linux and Windows GUI: toggle switch, mode selector, enhancer model, URL, API key fields ### How It Works - **Offline**: Prepends a `` block with rules like "never ask for clarification, infer from compacted context, execute decisively" - **AI-powered**: Sends the user's prompt + compaction summary to a separate model (e.g. DeepSeek V4 Flash via Freebuff) which rewrites it for clarity, then prepends the offline instructions too - Both modes run after compaction but before the request is sent upstream ## v3.10.6 (2026-05-25) **Freebuff Integration + Codebuff OAuth Fix + Windows Consolidation** ### Freebuff (Free DeepSeek/Kimi) - **Freebuff integration**: Free DeepSeek/Kimi models via codebuff.com API - Fixed User-Agent to match official SDK: `ai-sdk/openai-compatible/1.0.25/codebuff` - Fixed metadata fields: `freebuff_instance_id` + `client_id` (base36 random) + `cost_mode: "free"` - Fixed session endpoint: POST empty `{}` body (not `{"model": model}`) - GUI preset aliases: "Freebuff (Free DeepSeek/Kimi)", "FreeBuff", "Codebuff (Free DeepSeek/Kimi)" all map to same backend ### Codebuff Fix - Fixed Codebuff OAuth: use `www.codebuff.com` (bare `codebuff.com` returns 307 redirect) ### OAuth Secrets & Credentials (All Providers) - **OAuth Secrets dialog now shows ALL providers**: Google (Antigravity + Gemini CLI) AND Freebuff/Codebuff - **Re-OAuth buttons** for each provider: instantly re-authenticate Google or GitHub/Codebuff - Token status indicators (valid/missing) for each Google provider - Shows logged-in email and auth status for Freebuff/Codebuff - Editable auth token and fingerprint fields for Freebuff/Codebuff ### Windows - Windows GUI files consolidated into `src/` (merged by cobra91 via PR #1 and PR #2) ### Proxy & GUI Improvements (cobra91 PR #3) - CROF adaptive logic gated to `crof.ai` only — no more log pollution for other providers - Data directory consolidation: all data now in `codex-proxy/` (was split across `codex-desktop/`, `codex-launcher/`, `codex-proxy/`) - Sticky proxy port: persists in `.last-proxy-port`, reused on restart so Codex Desktop keeps connection - Adaptive compact budget raised from 60% to 80% — avoids premature compaction on large-context models (DeepSeek v4 Pro 1M) - Config cleanup fix: stale `proxy-*.json` cleanup moved after `_init_runtime()` to avoid deleting active config - Windows GUI: added Clear Log, Restart Proxy, View Log buttons - **Linux/Windows feature parity**: both GUIs now have identical features - Windows GUI: ported OAuth Secrets all-providers dialog (Google + Freebuff/Codebuff with Re-OAuth buttons, token status) - Windows GUI: added Codebuff/Freebuff OAuth login flow (GitHub browser-based) - Windows GUI: added Sync from Preset button in endpoint editor - Linux GUI: added Clear Log + Restart Proxy buttons (matching Windows) ## v3.10.5 (2026-05-25) **Windows GUI + Context Compaction for Antigravity/Gemini OAuth** ### Windows Native GUI (tkinter) - **Windows GUI** in `windows/` folder — full tkinter port by cobra91 - OAuth Secrets editor, Import JSON, Antigravity model list - Shared backend with Linux (same translate-proxy.py) - See README for Windows installation and usage **Context Compaction for Antigravity/Gemini OAuth** ### Fix - **Prevent `input token count exceeds maximum` errors** during long conversations - Added aggressive compaction policies for Antigravity (`cloudcode-pa`) and Gemini CLI (`googleapis`) - Auto-trims old turns when approaching 60% of model context limit (1M tokens for Gemini, 200K for Claude, 128K for GPT-OSS) - Added REST model IDs to context size map (`gemini-3-flash`, `gemini-3.1-pro-low`, `claude-sonnet-4-6`, etc.) ## v3.10.4 (2026-05-25) **Security: OAuth Secrets Editor + Import JSON** ### Security - **All hardcoded OAuth secrets removed from source code and git history** - OAuth client IDs and secrets now stored locally in `~/.config/codex-launcher/oauth-secrets.json` - Git history rewritten to scrub all leaked credentials (0 matches verified) - Pre-push hook blocks any future commit containing secrets - All old Gitea releases deleted (contained leaked secrets in .deb files) ### New Features - **OAuth Secrets editor** in GUI — "OAuth Secrets" button in header bar - **Import JSON** button — import `client_secret_*.json` downloaded from Google Cloud Console - Supports both `"installed"` and `"web"` JSON formats from Google ### Antigravity Fix (from v3.10.3) - Antigravity REST API uses slug IDs, not display names - Verified all model IDs with live API testing ## v3.10.3 (2026-05-25) **Fix Antigravity 404 Errors — Verified REST Model IDs** ### Critical Fix - Antigravity REST API (`v1internal:generateContent`) uses slug IDs, not display names - Verified all model IDs with live API testing against `daily-cloudcode-pa.sandbox.googleapis.com` - Display names map to closest working REST model (e.g. `Gemini 3.5 Flash (High)` → `gemini-3-flash`) - Model list now matches agy CLI: Gemini 3.5 Flash (H/M/L), Gemini 3.1 Pro (H/L), Claude Sonnet/Opus 4.6, GPT-OSS 120B ### Working REST Model IDs | Display Name | REST ID | |---|---| | Gemini 3.5 Flash (High) | gemini-3-flash | | Gemini 3.5 Flash (Medium) | gemini-3-flash | | Gemini 3.5 Flash (Low) | gemini-3.5-flash-low | | Gemini 3.1 Pro (High) | gemini-3.1-pro-low | | Gemini 3.1 Pro (Low) | gemini-3.1-pro-low | | Claude Sonnet 4.6 (Thinking) | claude-sonnet-4-6 | | Claude Opus 4.6 (Thinking) | claude-opus-4-6-thinking | | GPT-OSS 120B (Medium) | gpt-oss-120b-medium | ## v3.10.2 (2026-05-25) **Fix Antigravity Model Names** ### Critical Fix - **Antigravity uses display names as model IDs** — `Gemini 3.5 Flash (High)` not `gemini-3.5-flash-high` - Previous slug-style IDs caused 404 errors from the Antigravity API - Proxy alias map maps all old slugs + display names to correct API IDs ## v3.10.0 (2026-05-25) **Provider Model Editor + Antigravity Model Refresh** ### Provider Editor - **Remove Selected** button to remove highlighted model(s) from provider - **Clear All** button to empty model list - **Sync from Preset** button to refresh model list from current preset definition - Preset sync now replaces (not appends) models — fixes stale saved model lists ### Antigravity Models Updated - **Gemini 3.5 Flash** (High / Medium) - **Gemini 3.1 Pro** (High / Low) - **Claude Sonnet 4.6 Thinking** - **Claude Opus 4.6 Thinking** - **GPT-OSS 120B Medium** ## v3.9.9 (2026-05-25) **Antigravity Model Refresh** ### Updated Models - **Gemini 3.5 Flash** (High / Medium) — new flagship flash model - **Gemini 3.1 Pro** (High / Low) — tiered reasoning control - **Claude Sonnet 4.6 Thinking** — Anthropic partner model via Antigravity - **Claude Opus 4.6 Thinking** — Anthropic partner model via Antigravity - **GPT-OSS 120B Medium** — open-weight GPT model via Antigravity - Removed stale `antigravity-*` prefixed IDs and old preview models ### Proxy Updates - Alias map updated for tiered model IDs (high/medium/low/thinking) - Context sizes added for all new Antigravity models ## v3.9.8 (2026-05-25) **Codex Desktop Model Fix & Global BrokenPipeError Protection** ### Desktop Model Fix - **Codex Desktop sending wrong model** (gpt-5.4-mini) instead of user-selected model — now remapped via `CODEX_LAUNCHER_MODEL` env var - **Config.toml** now writes `review_model`, `wire_api`, `request_max_retries`, `stream_max_retries`, `stream_idle_timeout_ms` for Desktop compatibility - **Proxy model remap** intercepts Desktop forced models (`gpt-5.4-mini`, `gpt-5.5`, etc.) and routes to the user's selected model ### Global Crash Fix - **`send_json()` globally catches BrokenPipeError** — no more crashes on client disconnect across all backends ## v3.9.7 (2026-05-25) **Codebuff Error Forwarding & Crash Fixes** ### Rate Limit Error Forwarding - **Real Codebuff error messages** forwarded to user instead of generic "429 Too Many Requests" - **HTTP 200 + Responses API format** for rate limits — Codex displays the actual Codebuff message (e.g. "Daily session limit reached. Resets in 29m.") instead of retrying - **`retryAfterMs` extraction** from Codebuff 429 responses for accurate cooldown timers - **`_codebuff_start_run`** returns actual error body instead of `None` — shows real Codebuff errors ### Crash Fixes - **BrokenPipeError crash** on "all accounts exhausted" response — wrapped in try/except - **3 SyntaxWarnings** fixed for invalid `\ ` escape sequences in docstrings ## v3.9.6 (2026-05-25) **Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After** Inspired by architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum). ### P0: Connection Pooling & Stream Idle Timeout - **Connection pooling** (`http.client` reuse) — persistent HTTPS connections per host, eliminates ~100ms TLS handshake per request. Pool keyed by `{scheme}://{host}:{port}`, reused across requests. - **Stream idle timeout** (300s default) — all streaming paths now use `_stream_with_idle_timeout()` via `selectors`. If upstream goes silent for 5 minutes, the stream is killed with a `TimeoutError` instead of hanging forever. Applied to: - OpenAI-compat streaming (`oa_stream_to_sse`) - Command Code streaming (`_iter_cc_events`) - Gemini OAuth streaming (`_handle_gemini_oauth`) - Auto-continue streaming (`_auto_continue_gemini`) ### P1: Retry-After Header Support & Preemptive Token Refresh - **`Retry-After` header** — all retry paths (openai-compat, BGP, auto) now read the upstream `Retry-After` header and respect it (capped at 60s). Falls back to exponential backoff if header is absent. - **Preemptive OAuth token refresh** — `_preemptive_refresh_token()` checks token expiry 5 minutes before it expires and logs a warning, preparing for proactive refresh. ### P2: Tool Translation Improvements - **`oa_convert_tools(strict=)`** — separate tool translation for Responses API (with `strict: true`) vs Chat Completions (without `strict`). Some providers reject the `strict` field in Chat Completions mode. - **Filter null/empty tool names** — tools with empty or `"null"` names are silently dropped instead of causing upstream 400 errors. ### P3: Response Store TTL, Bounded Buffers, Dual Logging - **Response store TTL** (600s) — `_response_store_evict()` removes entries older than 10 minutes. Prevents unbounded memory growth on long sessions. - **Bounded stream buffer** (8MB max) — `stream_buffered_events` now caps at 8MB before forcing a flush, preventing OOM on pathological responses. - **`response.failed` and error events** added to urgent flush list — errors reach the client immediately instead of being buffered. - **Dual logging** — `proxy.log` in `~/.cache/codex-proxy/` captures all proxy messages alongside stderr. Survives Codex Desktop's stderr piping. ## v3.5.0 (2026-05-22) **Major Release — Command Code Adapter Overhaul, AI Assist, Self-Revive Watchdog, Debug Infrastructure** ### Command Code Provider — Multi-Format Tool-Call Parser (Critical Bug Fix) The Command Code (CC) provider adapter in `translate-proxy.py` had a critical bug where the CC model's tool-call output was not being parsed into executable tool calls, causing the Codex agent loop to stop after the first response. The CC model output format **changes between sessions and models** — the parser must handle all observed formats. **Root Cause:** The CC model returns tool calls as inline text in various formats (raw JSON, XML, DSML tags, HTML-like blocks) within `text-delta` SSE events. The original parser only handled one format. When the model switched output style, tool calls were silently dropped, and Codex received a plain text response instead of executable commands — halting the multi-turn agent loop. **The Fix — Multi-Format Parser Chain (17 patches):** A cascading parser chain was built that tries each format in order, first match wins: `DSML → blocks → → XML patterns → raw JSON → fallback regex` - **FIX 1**: `cc_input_to_messages()` — enforce STRING content only (CC `/alpha/generate` rejects content blocks). Tool calls sent as inline JSON text in assistant messages. Tool results as `role: "user"` plain text (NOT `role: "tool"`). - **FIX 2**: `x-command-code-version` header always sent (fallback `"0.26.8"`) — prevents 403 `upgrade_required` errors. - **FIX 3**: Cleared stale schema cache (`content_type:"array"`) that was corrupting message construction. - **FIX 4**: Streaming `try/except` wrapper — catches all streaming errors and sends `response.completed(status:"failed")` event instead of crashing the connection. - **FIX 5**: `_extract_raw_json_tool_calls()` — new parser that finds raw JSON tool calls embedded in model text (`{"cmd":"...","type":"tool-call"}`). - **FIX 6**: `_extract_args()` three-tier parser — tries direct parse → `codecs.escape_decode` → `unicode_escape` to prevent double-wrapped argument strings. - **FIX 7**: `_extract_field()` skips leading `\` before value type check — handles malformed escape sequences in CC output. - **FIX 8**: `sandbox_permissions` normalization from parsed dict — converts `{"docker":"full"}` to the flat string format Codex expects. - **FIX 9** (REVERTED): Removed adaptive probe system — proved unnecessary, conservative inline-text format is sufficient. - **FIX 10**: Comprehensive fix documentation added to proxy file header for maintainability. - **FIX 11**: `_unwrap_cmd()` recursive unwrapping — handles double/triple-wrapped `cmd` values at all 7 extraction paths. `_sanitize_tool_calls()` post-extraction validation layer ensures every tool call has valid name + args. - **FIX 11c**: XML regex fix — `` to match both `` and ``. - **FIX 12**: Self-revive watchdog loop — auto-restarts proxy on crash (up to 50x, progressive backoff 1→30s). Controlled by `_SHUTDOWN_REQUESTED` flag on SIGTERM/SIGINT. - **FIX 13**: Fallback extraction when main parser returns empty but text contains tool-call signals (`{"cmd":`, `"type":"tool-call"`, `\n{"command":"..."}` format (actual CC model output) + fixed fallback regex to match BOTH `"cmd"` AND `"command"` keys. - **FIX 15**: `` blocks converted to real `exec_command` with synthesized curl-based repo exploration command. - **FIX 16**: `...` blocks parsed — extracts `prefix_rule`, `sandbox_permissions`, `justification` via line-oriented parsing. - **FIX 17**: DSML tool_call blocks — the **current CC model output format**: - `<||DSML||tool_calls>` wrapper - `<||DSML||invoke name="exec">` with `<||DSML||parameter name="command">` tags - Extracts command from `parameter name="command"` or fallback to `prefix_rule` - Maps `exec`/`bash` → `exec_command` ### Debug Infrastructure - **Debug-to-file**: All proxy events, text_buf preview, parser results, and fallback attempts logged to `~/.cache/codex-proxy/cc-debug.log` — works even when stderr is piped by Codex Desktop. - **Inline self-test**: `--self-test` flag runs 19 tests covering unwrap, double-wrap, unescaped quotes, XML, function=, sanitizer edge cases. - **Per-request logging**: Event types, text_buf content, parser match results written to debug log for every request. ### AI Assist - AI Assist integration in launcher GUI for intelligent provider configuration and troubleshooting. ### Self-Revive Watchdog - Proxy auto-restarts on crash with progressive backoff (1s → 30s, up to 50 restarts). - Clean shutdown on SIGTERM/SIGINT via `_SHUTDOWN_REQUESTED` flag. - Eliminates manual proxy restart during long coding sessions. ### Other Improvements - `text_buf` in `cc_stream_to_sse` accumulates all `text-delta` events; parsing happens at end-of-stream for complete extraction. - Schema cache with 24h staleness TTL for provider capabilities. - ErrorAnalyzer learns from 4xx errors on retry (max 2 retries). - `cleanup-codex-stale.sh` updated with additional stale process patterns. ## v3.3.0 (2026-05-20) **Antigravity + Gemini CLI OAuth — full Codex agent loop working** ### Gemini CLI OAuth + Antigravity OAuth - Split Google OAuth into separate Gemini CLI OAuth and Google Antigravity OAuth presets/backends. - Gemini CLI OAuth uses the Gemini CLI public OAuth client and Code Assist endpoints. - Antigravity OAuth uses Antigravity OAuth credentials, Code Assist daily/autopush/prod fallback, and Antigravity-style request wrapping. - Added Antigravity version discovery from the updater/changelog with local caching. - Added Antigravity model alias mapping from UI-facing `antigravity-*` IDs to upstream Code Assist model IDs. ### Responses API + Tool Flow - Added Gemini-style history hardening for Google OAuth requests: removes empty turns, coalesces adjacent roles, drops duplicate user repeats, and enforces user-start/user-end history. - Preserves function-call IDs across turns and adds synthetic `thoughtSignature` for historical Gemini function calls, matching Gemini CLI hardening behavior. - Fixed Antigravity streaming Responses API compatibility: single assistant message item, text done events, content part done, output item done, final completed event, and connection close. - Added `response.function_call_arguments.delta` and `response.function_call_arguments.done` events so Codex can execute Antigravity tool calls and create files. - Fixed functionResponse name matching — uses the original functionCall name instead of falling back to call_id. - Strengthened Antigravity prompt policy: use tools immediately for file changes, avoid planning-only responses, and answer directly when no suitable tool exists. - **Auto-continue on MAX_TOKENS** — when Gemini/Antigravity truncates a text response, the proxy transparently sends a continuation request and concatenates the output so Codex receives the complete response without manual intervention. ### Reliability + Routing - Added BGP++ route scoring, route cooldowns, token buckets, and persisted route stats. - Added provider policy layer and adaptive context compaction. - Added tool-call pairing validation/repair for orphaned tool outputs. - Added Endpoint Doctor in the endpoint editor. - Added log redaction helper for common API key/token patterns. ## v3.1.0 (2026-05-20) - Initial Antigravity/Gemini CLI OAuth backend split. - Gemini-style history hardening, SSE streaming fixes. ## v3.0.0 (2026-05-20) **Major architectural overhaul — Phase 0 + Phase 1 of engineering roadmap** ### Proxy (translate-proxy.py) - **ThreadingHTTPServer** — serves concurrent requests (no more blocking) - **Thread-safe shared state** — OrderedDict response store with locks, Crof state lock, stats lock - **Batched + atomic stats writes** — stats buffered in memory, flushed every 5s via `os.replace()` - **Graceful shutdown** — SIGTERM/SIGINT drain active connections (up to 5s), reject new with 503 - **Progressive upstream timeouts** — based on input size and tools (60-300s instead of flat 180s) - **Lazy JSON parsing** — skip parsing SSE events unless they contain `response.completed` - **Buffered SSE writes** — flush every 30ms, on urgent events, or at 4KB (reduces syscalls) - **`/health` endpoint** — returns backend, target, models, BGP route count - **Consolidated imports** — all at top, no more missing import crashes - **`main()` entry point** — runtime init moved out of module level - **TCP_NODELAY** — on all streaming paths (from v2.7.0) - **Anthropic prompt caching** — `cache_control: ephemeral` on system prompts (from v2.7.0) ### Launcher (codex-launcher-gui) - **Dynamic port allocation** — `_pick_free_port()` picks random free port, no more 8080 conflicts - **Proxy health gating** — Codex will NOT launch if proxy fails health check within 15s - **Error dialogs** — clear GTK error dialog when proxy startup fails - **Atomic config backup/restore** — temp file + `os.replace()`, no more corrupted config.toml - **Config transactions** — recovery from interrupted sessions on next startup - **Safe cleanup (PID registry)** — only kills processes launched by the app (pids.json) - **Proxy stderr piped to log** — real-time proxy logs in launcher UI - **Bearer token** — Codex config uses `codex-launcher-local` instead of real API key - **Usage Dashboard v2** — OpenUsage-inspired dark theme with status pills, KPI strip, model bars (from v2.7.0) ## v2.7.0 (2026-05-20) - **Usage Dashboard redesigned** (inspired by OpenUsage design patterns) - Deep Space dark theme with Catppuccin-inspired color palette - Header with animated status dots (OK/WARN/ERR provider health) - KPI summary strip: total providers, requests, token volume, avg latency - Provider cards with colored borders matching health status - Status pills: OK (green), WARN (yellow), ERR (red) - Colored section separators per metric type (Usage=yellow, Models=lavender) - Model composition bar: stacked horizontal segments per model share - Per-model breakdown with mini progress bars, percentage, request counts - Per-model token breakdown (in/out) when available - Token formatting: 1.2M, 45.3K instead of raw numbers - Duration formatting: 1.5h, 3.2m instead of raw seconds - Error section with warning icon - **TCP_NODELAY streaming optimization** - Disables Nagle's algorithm on streaming connections - Reduces per-packet latency by up to 40ms on small SSE events - Applied to all 4 streaming code paths (openai-compat, retry, command-code, generic) - **Anthropic prompt caching** - System prompts now sent as `cache_control: ephemeral` structured format - Enables Anthropic's automatic prompt caching (saves tokens + cost on repeated prompts) ## v2.6.1 (2026-05-20) - **Google OAuth rebuilt to emulate Gemini CLI** - Uses Google's public OAuth client_id (same as gemini-cli) - No `client_secret.json` needed — zero setup required - PKCE (S256 code challenge) + CSRF state protection - Scopes: cloud-platform, generative-language, userinfo.email, userinfo.profile - Redirects to Google's success/failure pages (same as gemini-cli) - Just click "OAuth Login" → browser opens → authorize → done - Token file permissions set to 0600 for security ## v2.6.0 (2026-05-20) - **Usage Dashboard** — per-provider tracking with visual cards - Request counts, success/failure rates, token usage, latency stats - Color-coded success rate bars (green/yellow/red) - Per-model breakdown showing request counts - Last error and last used timestamp - Sorted by most-used provider - Refresh button for live updates - **Proxy usage tracking** — records every request to `usage-stats.json` - **Google OAuth**: browse for `client_secret.json` with file picker dialog - No longer requires copying to a specific path manually - Auto-copies selected file to `~/.cache/codex-proxy/` ## v2.5.1 (2026-05-20) - **Adaptive retry for transient errors** (429/502/503) - Exponential backoff: 2s → 4s → 8s, up to 3 retries - Works for both single-provider and BGP mode - BGP routes retry before failing over to next route - Connection errors (reset/broken pipe) also retried - **Proxy socket reuse** — no more `Address already in use` crashes on restart - **BGP startup log** shows route count and names ## v2.5.0 (2026-05-20) - **AI BGP — Multi-provider routing with automatic failover** - New "AI BGP" button in main window → pool manager - Create BGP pools with ordered routes from any configured endpoint - Each route has its own endpoint URL, API key, model, and priority - **Failover strategy**: tries primary route, automatically falls back to next on error/timeout - BGP pools appear in endpoint dropdown with 🔀 icon - Pool editor: add/remove/reorder routes, pick endpoint + model per route - Up/down buttons for priority reordering - Proxy logs `[bgp] trying route 'Name'` and `[bgp] route 'Name' FAILED` on fallback - If all routes fail: returns 502 with detailed error per route - Fixed TOML config breakage from multi-line paste in API key field (`_toml_safe()`) ## v2.4.0 (2026-05-20) - **Added OpenAdapter provider preset** - Base URL: `https://api.openadapter.in/v1` — one API key, 40+ models - Pre-loaded models: glm-4.7, DeepSeek-V3, kimi-k2.6, qwen3.6-plus, claude-sonnet-4-6, gpt-5.4, gemini-2.5-flash, and more - Works with existing openai-compat proxy backend — no special handling needed - Fixed Add/Edit dialog crash (missing `_on_reasoning_toggled` method) - Redesigned Google OAuth flow with live status dialog and clickable auth URL ## v2.3.2 (2026-05-20) - **Added Google Gemini provider with OAuth support** - Two presets: "Google Gemini (API Key)" and "Google Gemini (OAuth)" - OAuth Login button in endpoint editor — full Google OAuth2 flow - Starts local HTTP server (port 8085), opens browser for Google consent - Captures auth code, exchanges for access + refresh tokens - Stores tokens in `~/.cache/codex-proxy/google-oauth-token.json` - Auto-refreshes access tokens when expired (no manual re-login) - Uses Gemini's OpenAI-compatible endpoint: `generativelanguage.googleapis.com/v1beta/openai` - Models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-2.0-flash-lite, and more - Setup instructions shown if `client_secret.json` not found ## v2.3.0 (2026-05-20) - **Adaptive Crof self-healing system** - Tracks per-model success/failure history with item counts - Dynamically learns max item limit per model (starts at 30, adjusts down on failures) - Proactively compacts input when above learned limit before sending to upstream - Auto-retry on `finish_reason=length` with aggressive re-compaction and resend - Prevents `stream disconnected` and `incomplete` errors on long conversations - All tracking logged to stderr: `[crof-adaptive] model=X items=N OK/FAIL -> limit=N` - Fixed `NameError: _ts` crash in debug logging - Fixed `ConnectionResetError` crash on client disconnect during streaming - Added 180s upstream timeout to prevent hanging connections - Compaction now preserves function_call/function_call_output pairs (no orphaned tool outputs) - Fixed reasoning control: `reasoning_effort=none` always sends both params ## v2.2.1 (2026-05-20) - **Fixed compaction orphaning function_call_output items** — root cause of Crof `incomplete` responses - Compaction cut between function_call and its function_call_output, creating dangling tool results - Crof model received orphaned `tool` messages with empty `tool_call_id`, causing confusion and token exhaustion - Compaction now expands tail boundary to include matching function_call/function_call_output pairs - **Fixed reasoning control**: `reasoning_effort=none` now always sends both `enable_thinking=false` AND `reasoning_effort=none` - Crof API testing confirmed `reasoning_effort=none` is what actually suppresses reasoning, not `enable_thinking=false` - Added upstream debug logging to `~/.cache/codex-proxy/crof-upstream.jsonl` ## v2.2.0 (2026-05-20) - **Added per-provider Reasoning controls in endpoint editor** - Reasoning On/Off toggle — disable reasoning for models that exhaust output tokens (e.g., Crof mimo-v2.5-pro) - Reasoning Effort selector: None, Minimal, Low, Medium, High, Max - When reasoning is OFF: sends `enable_thinking=false` + `reasoning_effort=none` to upstream API - When reasoning is ON: sends user-selected effort level (default: Medium) - Settings stored per-endpoint, passed through proxy config to upstream requests - Strip `reasoning_content` from proxy output — Codex doesn't use it, avoids token waste - Force `max_tokens=64000` minimum for openai-compat providers — room for both reasoning and content - Inspired by unsloth's reasoning control patterns for Qwen/GPT-OSS models - Styled reasoning switch: green = ON, orange = OFF, gentle rounded pill shape - Added error handling to endpoint manager Add/Edit/Manage dialogs (prevents silent failures) ## v2.1.3 (2026-05-19) - **Fixed Crof mimo-v2.5-pro stopping mid-response (finish_reason=length)** - Root cause: model emits 600+ `reasoning_content` SSE chunks that exhaust `max_tokens` before any actual content is generated - Strip `reasoning_content` from proxy output — Codex doesn't use reasoning, avoids wasting output tokens on invisible text - Force `max_tokens` minimum of 64000 for openai-compat providers — gives models room for both reasoning and content - Works for all openai-compat providers (Crof, Z.AI, DeepSeek, OpenRouter, etc.) ## v2.1.2 (2026-05-19) - **Fixed Crof.ai and providers stopping after first tool call (root cause: None tool IDs)** - Codex sends `function_call` items with `id=None` — proxy now matches tool results to calls by call_id + positional fallback - Fixed orphan message output item when response is only tool calls (no text content) - **Auto-trims long conversations (>30 items)** to prevent context overflow on providers like Crof - Keeps system/developer messages, original user query, and most recent 10 items - **Auto-compacts old items into a summary** instead of just dropping them - Summary includes: user requests, assistant responses, tool calls made, files touched - Preserves enough context for the model to continue long tasks intelligently - **Truncates large tool outputs (>8000 chars)** to prevent model output token exhaustion - Crof's models return `incomplete` when tool results contain too much text (e.g., full HTML pages) - Truncated outputs include `[truncated N chars]` suffix so the model knows data was cut - Added request/response logging to `~/.cache/codex-proxy/requests.log` for debugging - Proxy stderr no longer discarded by launcher (visible in terminal for debugging) ## v2.1.1 (2026-05-19) - Added Command Code backend to translation proxy (proprietary `/alpha/generate` API) - Added Command Code provider preset with 20 models (DeepSeek, Claude, GPT, Kimi, GLM, Qwen, etc.) - Added `cc_version` field in endpoint editor for Command Code version (default: 0.26.8) - Proxy sends `x-command-code-version` header to CC API (fixes 403 "upgrade_required") - CC message conversion: `system` role → `user`, string content → array, tools stripped, real UUID for threadId - Fixed proxy: map `developer` role to `system` for Chat Completions providers (DeepSeek, Qwen, etc.) - Fixed proxy: map `developer` role to `user` for Anthropic providers - Forward `instructions` field from Responses API as system message/param ## v2.1.0 (2026-05-19) - Added Codex auth status detection (reads `codex login status`) - Auth status bar shows logged-in provider or warning if auth missing/expired - "Re-login" button opens `codex login` in a terminal for re-authentication - Auto re-checks auth 30s after re-login flow starts - Pre-launch auth check warns before launching Codex Default mode if auth is invalid - Auth status checked asynchronously at startup (non-blocking) ## v2.0.1 (2026-05-19) - Added Codex CLI/Desktop installation verifier to main page - Green check (✔) when detected, yellow cross (✘) when missing - "Install" button next to missing tools opens install guide dialog - Desktop/CLI launch buttons disabled with tooltip when tool is missing - Dependency status logged on startup - Buttons respect missing-state after busy/unbusy cycles ## v2.0.0 (2026-05-19) - Initial release: multi-provider Codex Launcher - Translation proxy: Responses API to Chat Completions + Anthropic Messages - GTK endpoint manager with 10+ provider presets - Codex Default mode (built-in OAuth, zero config) - Browser UA injection for Cloudflare-protected providers (OpenCode) - Streaming SSE, tool calls, reasoning content support - Profile backup/import, model auto-fetch, bulk import - Refresh Models in background thread - URL normalization to prevent double-path bugs - Config backup/restore around sessions - .deb installer package