
admin beea20686b v3.6.0 — Performance & Stability Hardening

P0: Connection pooling (http.client reuse per host), stream idle timeout
    (300s via selectors) on all streaming paths (OA/CC/Gemini/auto-continue)
P1: Retry-After header support on all retry paths, preemptive OAuth token
    refresh (5min before expiry)
P2: oa_convert_tools(strict=) for Responses vs Chat Completions, filter
    null/empty tool names
P3: Response store TTL (600s eviction), bounded stream buffers (8MB cap),
    response.failed/error urgent flush, dual logging (proxy.log)

.deb: v3.6.0 (71KB) — v3.5.0 and v3.3.0 kept as fallback

2026-05-22 13:14:51 +04:00


Changelog

v3.6.0 (2026-05-22)

Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After

Inspired by architectural study of Codex-Proxy-Server (Rust/Axum).

P0: Connection Pooling & Stream Idle Timeout

Connection pooling (http.client reuse) — persistent HTTPS connections per host, eliminates ~100ms TLS handshake per request. Pool keyed by {scheme}://{host}:{port}, reused across requests.
Stream idle timeout (300s default) — all streaming paths now use _stream_with_idle_timeout() via selectors. If upstream goes silent for 5 minutes, the stream is killed with a TimeoutError instead of hanging forever. Applied to:
- OpenAI-compat streaming (oa_stream_to_sse)
- Command Code streaming (_iter_cc_events)
- Gemini OAuth streaming (_handle_gemini_oauth)
- Auto-continue streaming (_auto_continue_gemini)
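
The idle-timeout behaviour described above can be sketched with the standard `selectors` module. This is a minimal illustration, not the proxy's actual `_stream_with_idle_timeout()`: the function name, signature, and chunk size below are assumptions. Because each `select()` call gets a fresh timeout, the clock effectively resets whenever a chunk arrives.

```python
import selectors
import socket


def stream_with_idle_timeout(sock, idle_timeout=300.0, chunk_size=4096):
    """Yield chunks from sock; raise TimeoutError after idle_timeout silent seconds."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        while True:
            # select() returns an empty list when nothing became readable in time.
            if not sel.select(timeout=idle_timeout):
                raise TimeoutError(f"upstream idle for {idle_timeout}s")
            chunk = sock.recv(chunk_size)
            if not chunk:  # peer closed the connection cleanly
                return
            yield chunk
    finally:
        sel.close()
```

One caveat for TLS sockets: decrypted bytes can sit in the SSL layer's buffer without the underlying socket being readable, so a real implementation would probably also need to drain `SSLSocket.pending()` before waiting.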

P1: Retry-After Header Support & Preemptive Token Refresh

Retry-After header — all retry paths (openai-compat, BGP, auto) now read the upstream Retry-After header and respect it (capped at 60s). Falls back to exponential backoff if header is absent.
Preemptive OAuth token refresh — _preemptive_refresh_token() checks whether the OAuth token is within 5 minutes of expiry and, if so, logs a warning. This lays the groundwork for proactive refresh.
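
As a rough sketch of the Retry-After logic, the helper below honours the header (in both delta-seconds and HTTP-date form), caps it at 60s, and falls back to the 2s, 4s, 8s schedule from v2.5.1 when the header is missing or unparseable. The name `retry_delay` and its parameters are hypothetical, and `headers` is assumed to be a plain dict-like mapping.

```python
import email.utils
import time

MAX_RETRY_AFTER = 60.0  # cap stated in the changelog entry


def retry_delay(headers, attempt, base=2.0, now=None):
    """Seconds to wait before retry number `attempt` (0-based)."""
    value = headers.get("Retry-After")
    if value:
        value = value.strip()
        if value.isdigit():  # delta-seconds form, e.g. "5"
            return min(float(value), MAX_RETRY_AFTER)
        try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = email.utils.parsedate_to_datetime(value).timestamp()
        except (TypeError, ValueError):
            when = None
        if when is not None:
            now = time.time() if now is None else now
            return min(max(when - now, 0.0), MAX_RETRY_AFTER)
    # Header absent or unparseable: exponential backoff 2s, 4s, 8s, ...
    return base * (2 ** attempt)
```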

P2: Tool Translation Improvements

oa_convert_tools(strict=) — separate tool translation for Responses API (with strict: true) vs Chat Completions (without strict). Some providers reject the strict field in Chat Completions mode.
Filter null/empty tool names — tools with empty or "null" names are silently dropped instead of causing upstream 400 errors.
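
A hedged sketch of what a `strict=`-aware converter could look like. The function name `convert_tools`, the accepted input shapes, and the default `parameters` schema are assumptions for illustration; only the strict/non-strict split and the name filter come from the notes above.

```python
def convert_tools(tools, strict):
    """Translate function tools, dropping entries without a usable name."""
    converted = []
    for tool in tools or []:
        fn = tool.get("function") or tool  # accept nested or flat input
        name = fn.get("name")
        if not isinstance(name, str) or name.strip() in ("", "null"):
            continue  # an unnamed tool would trigger an upstream 400
        spec = {
            "name": name,
            "description": fn.get("description", ""),
            "parameters": fn.get("parameters") or {"type": "object", "properties": {}},
        }
        if strict:
            # Responses-style: flat tool object carrying strict: true
            converted.append({"type": "function", **spec, "strict": True})
        else:
            # Chat Completions-style: nested under "function", no strict field
            converted.append({"type": "function", "function": spec})
    return converted
```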

P3: Response Store TTL, Bounded Buffers, Dual Logging

Response store TTL (600s) — _response_store_evict() removes entries older than 10 minutes. Prevents unbounded memory growth on long sessions.
Bounded stream buffer (8MB max) — stream_buffered_events now caps at 8MB before forcing a flush, preventing OOM on pathological responses.
response.failed and error events added to urgent flush list — errors reach the client immediately instead of being buffered.
Dual logging — proxy.log in ~/.cache/codex-proxy/ captures all proxy messages alongside stderr. Survives Codex Desktop's stderr piping.
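
The TTL eviction can be illustrated with an insertion-ordered store: since the oldest entries sit at the front, eviction can stop at the first entry that is still fresh. The class below is a hypothetical stand-in for the proxy's response store and `_response_store_evict()`, with an injectable clock so the behaviour is easy to test.

```python
import threading
import time
from collections import OrderedDict


class ResponseStore:
    """Insertion-ordered store whose entries expire after ttl seconds."""

    def __init__(self, ttl=600.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._items = OrderedDict()  # response_id -> (stored_at, value)

    def put(self, response_id, value):
        with self._lock:
            self._items.pop(response_id, None)  # re-insert at the newest end
            self._items[response_id] = (self._clock(), value)
            self._evict()

    def get(self, response_id):
        with self._lock:
            self._evict()
            entry = self._items.get(response_id)
            return entry[1] if entry else None

    def _evict(self):
        # Oldest entries are at the front, so stop at the first fresh one.
        cutoff = self._clock() - self._ttl
        while self._items:
            _, (stored_at, _value) = next(iter(self._items.items()))
            if stored_at > cutoff:
                break
            self._items.popitem(last=False)
```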

v3.5.0 (2026-05-22)

Major Release — Command Code Adapter Overhaul, AI Assist, Self-Revive Watchdog, Debug Infrastructure

Command Code Provider — Multi-Format Tool-Call Parser (Critical Bug Fix)

The Command Code (CC) provider adapter in translate-proxy.py had a critical bug where the CC model's tool-call output was not being parsed into executable tool calls, causing the Codex agent loop to stop after the first response. The CC model output format changes between sessions and models — the parser must handle all observed formats.

Root Cause: The CC model returns tool calls as inline text in various formats (raw JSON, XML, DSML tags, HTML-like blocks) within text-delta SSE events. The original parser only handled one format. When the model switched output style, tool calls were silently dropped, and Codex received a plain text response instead of executable commands — halting the multi-turn agent loop.

The Fix — Multi-Format Parser Chain (17 patches):

A cascading parser chain was built that tries each format in order, first match wins: DSML → <bash> blocks → <explore_agent> → <tool_call type=...> → XML patterns → raw JSON → fallback regex
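
The "first match wins" idea can be sketched with just two of the formats named above. The parser functions, the chain contents, and the `{"name": ..., "arguments": {"cmd": ...}}` output shape are illustrative assumptions; the real chain has more stages and richer extraction.

```python
import json
import re


def parse_bash_block(text):
    """<bash>...</bash> block -> one exec_command call."""
    m = re.search(r"<bash>(.*?)</bash>", text, re.DOTALL)
    if not m:
        return []
    return [{"name": "exec_command", "arguments": {"cmd": m.group(1).strip()}}]


def parse_raw_json(text):
    """Flat JSON objects such as {"cmd": "...", "type": "tool-call"}."""
    calls = []
    for m in re.finditer(r"\{[^{}]*\}", text):
        try:
            obj = json.loads(m.group(0))
        except json.JSONDecodeError:
            continue
        if obj.get("type") == "tool-call" and "cmd" in obj:
            calls.append({"name": "exec_command", "arguments": {"cmd": obj["cmd"]}})
    return calls


# Order matters: the first parser that returns anything wins.
PARSER_CHAIN = [parse_bash_block, parse_raw_json]


def extract_tool_calls(text):
    """Run each parser in order and return the first non-empty result."""
    for parser in PARSER_CHAIN:
        calls = parser(text)
        if calls:
            return calls
    return []
```

Keeping each format in its own function means a newly observed output style can be handled by adding one parser to the list without touching the others.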

FIX 1: cc_input_to_messages() — enforce STRING content only (CC /alpha/generate rejects content blocks). Tool calls sent as inline JSON text in assistant messages. Tool results as role: "user" plain text (NOT role: "tool").
FIX 2: x-command-code-version header always sent (fallback "0.26.8") — prevents 403 upgrade_required errors.
FIX 3: Cleared stale schema cache (content_type:"array") that was corrupting message construction.
FIX 4: Streaming try/except wrapper — catches all streaming errors and sends response.completed(status:"failed") event instead of crashing the connection.
FIX 5: _extract_raw_json_tool_calls() — new parser that finds raw JSON tool calls embedded in model text ({"cmd":"...","type":"tool-call"}).
FIX 6: _extract_args() three-tier parser — tries direct parse → codecs.escape_decode → unicode_escape to prevent double-wrapped argument strings.
FIX 7: _extract_field() skips leading \ before value type check — handles malformed escape sequences in CC output.
FIX 8: sandbox_permissions normalization from parsed dict — converts {"docker":"full"} to the flat string format Codex expects.
FIX 9 (REVERTED): Removed adaptive probe system — proved unnecessary, conservative inline-text format is sufficient.
FIX 10: Comprehensive fix documentation added to proxy file header for maintainability.
FIX 11: _unwrap_cmd() recursive unwrapping — handles double/triple-wrapped cmd values at all 7 extraction paths. _sanitize_tool_calls() post-extraction validation layer ensures every tool call has valid name + args.
FIX 11c: XML regex fix — </tool_call) had unbalanced parenthesis for ~4000 lines; now uses [)]?> to match both </tool_call> and </tool_call)>.
FIX 12: Self-revive watchdog loop — auto-restarts proxy on crash (up to 50x, progressive backoff 1→30s). Controlled by _SHUTDOWN_REQUESTED flag on SIGTERM/SIGINT.
FIX 13: Fallback extraction when main parser returns empty but text contains tool-call signals ({"cmd":, "type":"tool-call", <tool, <function=).
FIX 14: Parser for <tool_call type="bash">\n{"command":"..."} format (actual CC model output) + fixed fallback regex to match BOTH "cmd" AND "command" keys.
FIX 15: <explore_agent> blocks converted to real exec_command with synthesized curl-based repo exploration command.
FIX 16: <bash>...</bash> blocks parsed — extracts prefix_rule, sandbox_permissions, justification via line-oriented parsing.
FIX 17: DSML tool_call blocks — the current CC model output format:
- <｜｜DSML｜｜tool_calls> wrapper
- <｜｜DSML｜｜invoke name="exec"> with <｜｜DSML｜｜parameter name="command"> tags
- Extracts command from parameter name="command" or fallback to prefix_rule
- Maps exec/bash → exec_command
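
To make FIX 11 concrete, here is one way recursive unwrapping could work, assuming "double-wrapped" means a `cmd` value that is itself a JSON object (or a JSON string of one) containing another `cmd`. The function name and the depth limit are hypothetical; the actual `_unwrap_cmd()` may differ.

```python
import json


def unwrap_cmd(value, max_depth=5):
    """Peel nested {"cmd": ...} wrappers until a plain command string remains."""
    for _ in range(max_depth):
        if isinstance(value, dict) and "cmd" in value:
            value = value["cmd"]
            continue
        if isinstance(value, str) and value.lstrip().startswith("{"):
            try:
                parsed = json.loads(value)
            except json.JSONDecodeError:
                break  # looks like JSON but is not; keep it verbatim
            if isinstance(parsed, dict) and "cmd" in parsed:
                value = parsed["cmd"]
                continue
        break
    return value
```

The depth limit keeps a pathological input from looping indefinitely, and shell snippets that merely start with a brace, such as `{ echo hi; }`, pass through untouched because they fail JSON parsing.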

Debug Infrastructure

Debug-to-file: All proxy events, text_buf preview, parser results, and fallback attempts logged to ~/.cache/codex-proxy/cc-debug.log — works even when stderr is piped by Codex Desktop.
Inline self-test: --self-test flag runs 19 tests covering unwrap, double-wrap, unescaped quotes, XML, function=, sanitizer edge cases.
Per-request logging: Event types, text_buf content, parser match results written to debug log for every request.

AI Assist

AI Assist integration in launcher GUI for intelligent provider configuration and troubleshooting.

Self-Revive Watchdog

Proxy auto-restarts on crash with progressive backoff (1s → 30s, up to 50 restarts).
Clean shutdown on SIGTERM/SIGINT via _SHUTDOWN_REQUESTED flag.
Eliminates manual proxy restart during long coding sessions.
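
A minimal sketch of such a watchdog loop follows. The doubling schedule is an assumption (the notes only say the backoff progresses from 1s to 30s), and `run_with_watchdog`, `next_backoff`, and the injectable `sleep` are hypothetical names chosen for illustration.

```python
import time

MAX_RESTARTS = 50
MAX_BACKOFF = 30.0
_SHUTDOWN_REQUESTED = False


def _request_shutdown(signum, frame):
    """Signal handler; register with signal.signal(signal.SIGTERM, _request_shutdown)."""
    global _SHUTDOWN_REQUESTED
    _SHUTDOWN_REQUESTED = True


def next_backoff(delay):
    """Double the delay up to MAX_BACKOFF (1, 2, 4, 8, 16, 30, 30, ...)."""
    return min(delay * 2, MAX_BACKOFF)


def run_with_watchdog(serve, sleep=time.sleep):
    """Call serve() until it exits cleanly, shutdown is requested, or restarts run out."""
    delay, restarts = 1.0, 0
    while not _SHUTDOWN_REQUESTED:
        try:
            serve()
            return restarts  # clean exit, nothing to revive
        except Exception as exc:
            if restarts >= MAX_RESTARTS:
                raise
            restarts += 1
            print(f"[watchdog] crash #{restarts}: {exc!r}; restarting in {delay:.0f}s")
            sleep(delay)
            delay = next_backoff(delay)
    return restarts
```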

Other Improvements

text_buf in cc_stream_to_sse accumulates all text-delta events; parsing happens at end-of-stream for complete extraction.
Schema cache with 24h staleness TTL for provider capabilities.
ErrorAnalyzer learns from 4xx errors on retry (max 2 retries).
cleanup-codex-stale.sh updated with additional stale process patterns.

v3.3.0 (2026-05-20)

Antigravity + Gemini CLI OAuth — full Codex agent loop working

Gemini CLI OAuth + Antigravity OAuth

Split Google OAuth into separate Gemini CLI OAuth and Google Antigravity OAuth presets/backends.
Gemini CLI OAuth uses the Gemini CLI public OAuth client and Code Assist endpoints.
Antigravity OAuth uses Antigravity OAuth credentials, Code Assist daily/autopush/prod fallback, and Antigravity-style request wrapping.
Added Antigravity version discovery from the updater/changelog with local caching.
Added Antigravity model alias mapping from UI-facing antigravity-* IDs to upstream Code Assist model IDs.

Responses API + Tool Flow

Added Gemini-style history hardening for Google OAuth requests: removes empty turns, coalesces adjacent roles, drops duplicate user repeats, and enforces user-start/user-end history.
Preserves function-call IDs across turns and adds synthetic thoughtSignature for historical Gemini function calls, matching Gemini CLI hardening behavior.
Fixed Antigravity streaming Responses API compatibility: single assistant message item, text done events, content part done, output item done, final completed event, and connection close.
Added response.function_call_arguments.delta and response.function_call_arguments.done events so Codex can execute Antigravity tool calls and create files.
Fixed functionResponse name matching — uses the original functionCall name instead of falling back to call_id.
Strengthened Antigravity prompt policy: use tools immediately for file changes, avoid planning-only responses, and answer directly when no suitable tool exists.
Auto-continue on MAX_TOKENS — when Gemini/Antigravity truncates a text response, the proxy transparently sends a continuation request and concatenates the output so Codex receives the complete response without manual intervention.

Reliability + Routing

Added BGP++ route scoring, route cooldowns, token buckets, and persisted route stats.
Added provider policy layer and adaptive context compaction.
Added tool-call pairing validation/repair for orphaned tool outputs.
Added Endpoint Doctor in the endpoint editor.
Added log redaction helper for common API key/token patterns.

v3.1.0 (2026-05-20)

Initial Antigravity/Gemini CLI OAuth backend split.
Gemini-style history hardening, SSE streaming fixes.

v3.0.0 (2026-05-20)

Major architectural overhaul — Phase 0 + Phase 1 of engineering roadmap

Proxy (translate-proxy.py)

ThreadingHTTPServer — serves concurrent requests (no more blocking)
Thread-safe shared state — OrderedDict response store with locks, Crof state lock, stats lock
Batched + atomic stats writes — stats buffered in memory, flushed every 5s via os.replace()
Graceful shutdown — SIGTERM/SIGINT drain active connections (up to 5s), reject new with 503
Progressive upstream timeouts — based on input size and tools (60-300s instead of flat 180s)
Lazy JSON parsing — skip parsing SSE events unless they contain response.completed
Buffered SSE writes — flush every 30ms, on urgent events, or at 4KB (reduces syscalls)
/health endpoint — returns backend, target, models, BGP route count
Consolidated imports — all at top, no more missing import crashes
main() entry point — runtime init moved out of module level
TCP_NODELAY — on all streaming paths (from v2.7.0)
Anthropic prompt caching — cache_control: ephemeral on system prompts (from v2.7.0)
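
The buffered SSE write policy (flush after 30ms, on urgent events, or at 4KB) might be sketched as below. The class name and the exact urgent-event set are assumptions; in particular, treating `response.completed` as urgent is a guess, while `response.failed` and `error` are the ones named in v3.6.0.

```python
import time

URGENT_EVENTS = {"response.completed", "response.failed", "error"}


class SSEWriteBuffer:
    """Coalesce small SSE frames into fewer write() calls."""

    def __init__(self, write, max_bytes=4096, max_delay=0.030, clock=time.monotonic):
        self._write = write
        self._max_bytes = max_bytes
        self._max_delay = max_delay
        self._clock = clock
        self._chunks = []
        self._size = 0
        self._first_at = None  # time the oldest unflushed frame was queued

    def add(self, event_type, frame):
        if self._first_at is None:
            self._first_at = self._clock()
        self._chunks.append(frame)
        self._size += len(frame)
        if (
            event_type in URGENT_EVENTS
            or self._size >= self._max_bytes
            or self._clock() - self._first_at >= self._max_delay
        ):
            self.flush()

    def flush(self):
        if self._chunks:
            self._write(b"".join(self._chunks))
        self._chunks, self._size, self._first_at = [], 0, None
```

Note that this sketch only checks the 30ms deadline when a new frame arrives; a production version would also need a timer or a final `flush()` so a lone trailing frame is not held back.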

Launcher (codex-launcher-gui)

Dynamic port allocation — _pick_free_port() picks random free port, no more 8080 conflicts
Proxy health gating — Codex will NOT launch if proxy fails health check within 15s
Error dialogs — clear GTK error dialog when proxy startup fails
Atomic config backup/restore — temp file + os.replace(), no more corrupted config.toml
Config transactions — recovery from interrupted sessions on next startup
Safe cleanup (PID registry) — only kills processes launched by the app (pids.json)
Proxy stderr piped to log — real-time proxy logs in launcher UI
Bearer token — Codex config uses codex-launcher-local instead of real API key
Usage Dashboard v2 — OpenUsage-inspired dark theme with status pills, KPI strip, model bars (from v2.7.0)
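
The temp file plus `os.replace()` pattern used for config backup/restore (and for the batched stats writes in the proxy) can be illustrated as follows. The helper name `atomic_write_text` is hypothetical; the key assumption is that the temp file is created in the same directory as the target so the rename stays on one filesystem.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text to path so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace() is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        try:
            os.unlink(tmp_path)  # do not leave stray temp files behind
        except FileNotFoundError:
            pass
        raise
```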

v2.7.0 (2026-05-20)

Usage Dashboard redesigned (inspired by OpenUsage design patterns)
- Deep Space dark theme with Catppuccin-inspired color palette
- Header with animated status dots (OK/WARN/ERR provider health)
- KPI summary strip: total providers, requests, token volume, avg latency
- Provider cards with colored borders matching health status
- Status pills: OK (green), WARN (yellow), ERR (red)
- Colored section separators per metric type (Usage=yellow, Models=lavender)
- Model composition bar: stacked horizontal segments per model share
- Per-model breakdown with mini progress bars, percentage, request counts
- Per-model token breakdown (in/out) when available
- Token formatting: 1.2M, 45.3K instead of raw numbers
- Duration formatting: 1.5h, 3.2m instead of raw seconds
- Error section with warning icon
TCP_NODELAY streaming optimization
- Disables Nagle's algorithm on streaming connections
- Reduces per-packet latency by up to 40ms on small SSE events
- Applied to all 4 streaming code paths (openai-compat, retry, command-code, generic)
Anthropic prompt caching
- System prompts now sent as cache_control: ephemeral structured format
- Enables Anthropic's automatic prompt caching (saves tokens + cost on repeated prompts)
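
For illustration, the structured system-prompt format looks roughly like this when building an Anthropic Messages request. The helper name `build_system_param` is hypothetical; the block shape follows Anthropic's documented `cache_control` usage on text blocks.

```python
def build_system_param(system_prompt):
    """Wrap a plain system prompt as a cacheable Anthropic text block."""
    if not system_prompt:
        return []
    return [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ]
```

The returned list would be passed as the request's `system` field in place of a bare string.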

v2.6.1 (2026-05-20)

Google OAuth rebuilt to emulate Gemini CLI
- Uses Google's public OAuth client_id (same as gemini-cli)
- No client_secret.json needed — zero setup required
- PKCE (S256 code challenge) + CSRF state protection
- Scopes: cloud-platform, generative-language, userinfo.email, userinfo.profile
- Redirects to Google's success/failure pages (same as gemini-cli)
- Just click "OAuth Login" → browser opens → authorize → done
- Token file permissions set to 0600 for security
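
The PKCE S256 step works as follows: the client generates a random verifier, sends the base64url-encoded SHA-256 of it (without padding) as the challenge, and later proves possession by sending the verifier itself. The sketch below uses only the standard library; the function names are illustrative and not taken from the launcher.

```python
import base64
import hashlib
import secrets


def challenge_for(verifier):
    """S256 code challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Return (code_verifier, code_challenge) for the authorization request."""
    verifier = secrets.token_urlsafe(64)  # 86 chars, inside the 43-128 range
    return verifier, challenge_for(verifier)


def make_state():
    """Random CSRF token to compare against the value echoed by the redirect."""
    return secrets.token_urlsafe(32)
```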

v2.6.0 (2026-05-20)

Usage Dashboard — per-provider tracking with visual cards
- Request counts, success/failure rates, token usage, latency stats
- Color-coded success rate bars (green/yellow/red)
- Per-model breakdown showing request counts
- Last error and last used timestamp
- Sorted by most-used provider
- Refresh button for live updates
Proxy usage tracking — records every request to usage-stats.json
Google OAuth: browse for client_secret.json with file picker dialog
- No longer requires copying to a specific path manually
- Auto-copies selected file to ~/.cache/codex-proxy/

v2.5.1 (2026-05-20)

Adaptive retry for transient errors (429/502/503)
- Exponential backoff: 2s → 4s → 8s, up to 3 retries
- Works for both single-provider and BGP mode
- BGP routes retry before failing over to next route
- Connection errors (reset/broken pipe) also retried
Proxy socket reuse — no more Address already in use crashes on restart
BGP startup log shows route count and names

v2.5.0 (2026-05-20)

AI BGP — Multi-provider routing with automatic failover
- New "AI BGP" button in main window → pool manager
- Create BGP pools with ordered routes from any configured endpoint
- Each route has its own endpoint URL, API key, model, and priority
- Failover strategy: tries primary route, automatically falls back to next on error/timeout
- BGP pools appear in endpoint dropdown with 🔀 icon
- Pool editor: add/remove/reorder routes, pick endpoint + model per route
- Up/down buttons for priority reordering
- Proxy logs [bgp] trying route 'Name' and [bgp] route 'Name' FAILED on fallback
- If all routes fail: returns 502 with detailed error per route
Fixed TOML config breakage from multi-line paste in API key field (_toml_safe())
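
The failover strategy described above can be sketched as a loop over routes in priority order that collects one error per failed route. The route dictionary shape, the `send` callback, and the `AllRoutesFailed` exception are assumptions made for this example; only the log line formats and the 502 outcome come from the notes.

```python
class AllRoutesFailed(Exception):
    """Raised when every route in the pool has failed."""

    def __init__(self, errors):
        super().__init__("all routes failed")
        self.status = 502
        self.errors = errors  # list of (route_name, error_text)


def send_with_failover(routes, send):
    """Try routes in ascending priority order; return the first success."""
    errors = []
    for route in sorted(routes, key=lambda r: r["priority"]):
        print(f"[bgp] trying route '{route['name']}'")
        try:
            return send(route)
        except Exception as exc:  # error or timeout: fall through to the next route
            print(f"[bgp] route '{route['name']}' FAILED")
            errors.append((route["name"], str(exc)))
    raise AllRoutesFailed(errors)
```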

v2.4.0 (2026-05-20)

Added OpenAdapter provider preset
- Base URL: https://api.openadapter.in/v1 — one API key, 40+ models
- Pre-loaded models: glm-4.7, DeepSeek-V3, kimi-k2.6, qwen3.6-plus, claude-sonnet-4-6, gpt-5.4, gemini-2.5-flash, and more
- Works with existing openai-compat proxy backend — no special handling needed
Fixed Add/Edit dialog crash (missing _on_reasoning_toggled method)
Redesigned Google OAuth flow with live status dialog and clickable auth URL

v2.3.2 (2026-05-20)

Added Google Gemini provider with OAuth support
- Two presets: "Google Gemini (API Key)" and "Google Gemini (OAuth)"
- OAuth Login button in endpoint editor — full Google OAuth2 flow
- Starts local HTTP server (port 8085), opens browser for Google consent
- Captures auth code, exchanges for access + refresh tokens
- Stores tokens in ~/.cache/codex-proxy/google-oauth-token.json
- Auto-refreshes access tokens when expired (no manual re-login)
- Uses Gemini's OpenAI-compatible endpoint: generativelanguage.googleapis.com/v1beta/openai
- Models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-2.0-flash-lite, and more
- Setup instructions shown if client_secret.json not found

v2.3.0 (2026-05-20)

Adaptive Crof self-healing system
- Tracks per-model success/failure history with item counts
- Dynamically learns max item limit per model (starts at 30, adjusts down on failures)
- Proactively compacts input when above learned limit before sending to upstream
- Auto-retry on finish_reason=length with aggressive re-compaction and resend
- Prevents stream disconnected and incomplete errors on long conversations
- All tracking logged to stderr: [crof-adaptive] model=X items=N OK/FAIL -> limit=N
Fixed NameError: _ts crash in debug logging
Fixed ConnectionResetError crash on client disconnect during streaming
Added 180s upstream timeout to prevent hanging connections
Compaction now preserves function_call/function_call_output pairs (no orphaned tool outputs)
Fixed reasoning control: reasoning_effort=none always sends both params

v2.2.1 (2026-05-20)

Fixed compaction orphaning function_call_output items — root cause of Crof incomplete responses
- Compaction cut between function_call and its function_call_output, creating dangling tool results
- Crof model received orphaned tool messages with empty tool_call_id, causing confusion and token exhaustion
- Compaction now expands tail boundary to include matching function_call/function_call_output pairs
Fixed reasoning control: reasoning_effort=none now always sends both enable_thinking=false AND reasoning_effort=none
- Crof API testing confirmed reasoning_effort=none is what actually suppresses reasoning, not enable_thinking=false
Added upstream debug logging to ~/.cache/codex-proxy/crof-upstream.jsonl
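
The tail-boundary expansion in the compaction fix can be illustrated like this: starting from the proposed cut point, the boundary moves left until every `function_call_output` in the kept tail has its matching `function_call`. The function name and the item shape (dicts with `type` and `call_id`) are assumptions based on the Responses API item names used above.

```python
def expand_tail_start(items, tail_start):
    """Move tail_start left until every tool output in the tail has its call."""
    while tail_start > 0:
        tail = items[tail_start:]
        call_ids = {
            item.get("call_id") for item in tail if item.get("type") == "function_call"
        }
        orphaned = any(
            item.get("type") == "function_call_output"
            and item.get("call_id") not in call_ids
            for item in tail
        )
        if not orphaned:
            break
        tail_start -= 1  # pull one more older item into the kept tail
    return tail_start
```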

v2.2.0 (2026-05-20)

Added per-provider Reasoning controls in endpoint editor
- Reasoning On/Off toggle — disable reasoning for models that exhaust output tokens (e.g., Crof mimo-v2.5-pro)
- Reasoning Effort selector: None, Minimal, Low, Medium, High, Max
- When reasoning is OFF: sends enable_thinking=false + reasoning_effort=none to upstream API
- When reasoning is ON: sends user-selected effort level (default: Medium)
- Settings stored per-endpoint, passed through proxy config to upstream requests
Strip reasoning_content from proxy output — Codex doesn't use it, avoids token waste
Force max_tokens=64000 minimum for openai-compat providers — room for both reasoning and content
Inspired by unsloth's reasoning control patterns for Qwen/GPT-OSS models
Styled reasoning switch: green = ON, orange = OFF, gentle rounded pill shape
Added error handling to endpoint manager Add/Edit/Manage dialogs (prevents silent failures)
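
As a small sketch of how the reasoning toggle and the token floor could be applied to an outgoing Chat Completions payload: the function name is hypothetical, and the lowercase effort strings for the ON case are an assumption (only `none` is confirmed by the notes).

```python
MIN_MAX_TOKENS = 64000


def apply_reasoning(payload, enabled, effort="medium"):
    """Return a copy of payload with reasoning controls and the token floor applied."""
    out = dict(payload)
    if enabled:
        out["reasoning_effort"] = effort
    else:
        # Reasoning OFF sends both parameters.
        out["enable_thinking"] = False
        out["reasoning_effort"] = "none"
    # Leave room for both reasoning and content.
    out["max_tokens"] = max(out.get("max_tokens") or 0, MIN_MAX_TOKENS)
    return out
```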

v2.1.3 (2026-05-19)

Fixed Crof mimo-v2.5-pro stopping mid-response (finish_reason=length)
- Root cause: model emits 600+ reasoning_content SSE chunks that exhaust max_tokens before any actual content is generated
- Strip reasoning_content from proxy output — Codex doesn't use reasoning, avoids wasting output tokens on invisible text
- Force max_tokens minimum of 64000 for openai-compat providers — gives models room for both reasoning and content
- Works for all openai-compat providers (Crof, Z.AI, DeepSeek, OpenRouter, etc.)

v2.1.2 (2026-05-19)

Fixed Crof.ai and other providers stopping after the first tool call (root cause: None tool IDs)
Codex sends function_call items with id=None — proxy now matches tool results to calls by call_id + positional fallback
Fixed orphan message output item when response is only tool calls (no text content)
Auto-trims long conversations (>30 items) to prevent context overflow on providers like Crof
- Keeps system/developer messages, original user query, and most recent 10 items
- Auto-compacts old items into a summary instead of just dropping them
- Summary includes: user requests, assistant responses, tool calls made, files touched
- Preserves enough context for the model to continue long tasks intelligently
Truncates large tool outputs (>8000 chars) to prevent model output token exhaustion
- Crof's models return incomplete when tool results contain too much text (e.g., full HTML pages)
- Truncated outputs include [truncated N chars] suffix so the model knows data was cut
Added request/response logging to ~/.cache/codex-proxy/requests.log for debugging
Proxy stderr no longer discarded by launcher (visible in terminal for debugging)
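
The tool-output truncation rule is simple enough to show directly. The helper name is hypothetical; the 8000-character limit and the `[truncated N chars]` suffix are taken from the notes above.

```python
MAX_TOOL_OUTPUT_CHARS = 8000


def truncate_tool_output(text, limit=MAX_TOOL_OUTPUT_CHARS):
    """Cut oversized tool output and tell the model how much was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return f"{text[:limit]}[truncated {dropped} chars]"
```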

v2.1.1 (2026-05-19)

Added Command Code backend to translation proxy (proprietary /alpha/generate API)
Added Command Code provider preset with 20 models (DeepSeek, Claude, GPT, Kimi, GLM, Qwen, etc.)
Added cc_version field in endpoint editor for Command Code version (default: 0.26.8)
Proxy sends x-command-code-version header to CC API (fixes 403 "upgrade_required")
CC message conversion: system role → user, string content → array, tools stripped, real UUID for threadId
Fixed proxy: map developer role to system for Chat Completions providers (DeepSeek, Qwen, etc.)
Fixed proxy: map developer role to user for Anthropic providers
Forward instructions field from Responses API as system message/param

v2.1.0 (2026-05-19)

Added Codex auth status detection (reads codex login status)
Auth status bar shows logged-in provider or warning if auth missing/expired
"Re-login" button opens codex login in a terminal for re-authentication
Auto re-checks auth 30s after re-login flow starts
Pre-launch auth check warns before launching Codex Default mode if auth is invalid
Auth status checked asynchronously at startup (non-blocking)

v2.0.1 (2026-05-19)

Added Codex CLI/Desktop installation verifier to main page
Green check (✔) when detected, yellow cross (✘) when missing
"Install" button next to missing tools opens install guide dialog
Desktop/CLI launch buttons disabled with tooltip when tool is missing
Dependency status logged on startup
Buttons respect missing-state after busy/unbusy cycles

v2.0.0 (2026-05-19)

Initial release: multi-provider Codex Launcher
Translation proxy: Responses API to Chat Completions + Anthropic Messages
GTK endpoint manager with 10+ provider presets
Codex Default mode (built-in OAuth, zero config)
Browser UA injection for Cloudflare-protected providers (OpenCode)
Streaming SSE, tool calls, reasoning content support
Profile backup/import, model auto-fetch, bulk import
Refresh Models in background thread
URL normalization to prevent double-path bugs
Config backup/restore around sessions
.deb installer package
