204 lines
11 KiB
Markdown
204 lines
11 KiB
Markdown
# Changelog
|
|
|
|
## v2.7.0 (2026-05-20)
|
|
|
|
- **Usage Dashboard redesigned** (inspired by OpenUsage design patterns)
|
|
- Deep Space dark theme with Catppuccin-inspired color palette
|
|
- Header with animated status dots (OK/WARN/ERR provider health)
|
|
- KPI summary strip: total providers, requests, token volume, avg latency
|
|
- Provider cards with colored borders matching health status
|
|
- Status pills: OK (green), WARN (yellow), ERR (red)
|
|
- Colored section separators per metric type (Usage=yellow, Models=lavender)
|
|
- Model composition bar: stacked horizontal segments per model share
|
|
- Per-model breakdown with mini progress bars, percentage, request counts
|
|
- Per-model token breakdown (in/out) when available
|
|
- Token formatting: 1.2M, 45.3K instead of raw numbers
|
|
- Duration formatting: 1.5h, 3.2m instead of raw seconds
|
|
- Error section with warning icon
|
|
|
|
- **TCP_NODELAY streaming optimization**
|
|
- Disables Nagle's algorithm on streaming connections
|
|
- Reduces per-packet latency by up to 40ms on small SSE events
|
|
- Applied to all 4 streaming code paths (openai-compat, retry, command-code, generic)
|
|
|
|
- **Anthropic prompt caching**
|
|
- System prompts now sent as `cache_control: ephemeral` structured format
|
|
- Enables Anthropic's automatic prompt caching (saves tokens + cost on repeated prompts)
|
|
|
|
## v2.6.1 (2026-05-20)
|
|
|
|
- **Google OAuth rebuilt to emulate Gemini CLI**
|
|
- Uses Google's public OAuth client_id (same as gemini-cli)
|
|
- No `client_secret.json` needed — zero setup required
|
|
- PKCE (S256 code challenge) + CSRF state protection
|
|
- Scopes: cloud-platform, generative-language, userinfo.email, userinfo.profile
|
|
- Redirects to Google's success/failure pages (same as gemini-cli)
|
|
- Just click "OAuth Login" → browser opens → authorize → done
|
|
- Token file permissions set to 0600 for security
|
|
|
|
## v2.6.0 (2026-05-20)
|
|
|
|
- **Usage Dashboard** — per-provider tracking with visual cards
|
|
- Request counts, success/failure rates, token usage, latency stats
|
|
- Color-coded success rate bars (green/yellow/red)
|
|
- Per-model breakdown showing request counts
|
|
- Last error and last used timestamp
|
|
- Sorted by most-used provider
|
|
- Refresh button for live updates
|
|
- **Proxy usage tracking** — records every request to `usage-stats.json`
|
|
- **Google OAuth**: browse for `client_secret.json` with file picker dialog
|
|
- No longer requires copying to a specific path manually
|
|
- Auto-copies selected file to `~/.cache/codex-proxy/`
|
|
|
|
## v2.5.1 (2026-05-20)
|
|
|
|
- **Adaptive retry for transient errors** (429/502/503)
|
|
- Exponential backoff: 2s → 4s → 8s, up to 3 retries
|
|
- Works for both single-provider and BGP mode
|
|
- BGP routes retry before failing over to next route
|
|
- Connection errors (reset/broken pipe) also retried
|
|
- **Proxy socket reuse** — no more `Address already in use` crashes on restart
|
|
- **BGP startup log** shows route count and names
|
|
|
|
## v2.5.0 (2026-05-20)
|
|
|
|
- **AI BGP — Multi-provider routing with automatic failover**
|
|
- New "AI BGP" button in main window → pool manager
|
|
- Create BGP pools with ordered routes from any configured endpoint
|
|
- Each route has its own endpoint URL, API key, model, and priority
|
|
- **Failover strategy**: tries primary route, automatically falls back to next on error/timeout
|
|
- BGP pools appear in endpoint dropdown with 🔀 icon
|
|
- Pool editor: add/remove/reorder routes, pick endpoint + model per route
|
|
- Up/down buttons for priority reordering
|
|
- Proxy logs `[bgp] trying route 'Name'` and `[bgp] route 'Name' FAILED` on fallback
|
|
- If all routes fail: returns 502 with detailed error per route
|
|
- Fixed TOML config breakage from multi-line paste in API key field (`_toml_safe()`)
|
|
|
|
## v2.4.0 (2026-05-20)
|
|
|
|
- **Added OpenAdapter provider preset**
|
|
- Base URL: `https://api.openadapter.in/v1` — one API key, 40+ models
|
|
- Pre-loaded models: glm-4.7, DeepSeek-V3, kimi-k2.6, qwen3.6-plus, claude-sonnet-4-6, gpt-5.4, gemini-2.5-flash, and more
|
|
- Works with existing openai-compat proxy backend — no special handling needed
|
|
- Fixed Add/Edit dialog crash (missing `_on_reasoning_toggled` method)
|
|
- Redesigned Google OAuth flow with live status dialog and clickable auth URL
|
|
|
|
## v2.3.2 (2026-05-20)
|
|
|
|
- **Added Google Gemini provider with OAuth support**
|
|
- Two presets: "Google Gemini (API Key)" and "Google Gemini (OAuth)"
|
|
- OAuth Login button in endpoint editor — full Google OAuth2 flow
|
|
- Starts local HTTP server (port 8085), opens browser for Google consent
|
|
- Captures auth code, exchanges for access + refresh tokens
|
|
- Stores tokens in `~/.cache/codex-proxy/google-oauth-token.json`
|
|
- Auto-refreshes access tokens when expired (no manual re-login)
|
|
- Uses Gemini's OpenAI-compatible endpoint: `generativelanguage.googleapis.com/v1beta/openai`
|
|
- Models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-2.0-flash-lite, and more
|
|
- Setup instructions shown if `client_secret.json` not found
|
|
|
|
## v2.3.0 (2026-05-20)
|
|
|
|
- **Adaptive Crof self-healing system**
|
|
- Tracks per-model success/failure history with item counts
|
|
- Dynamically learns max item limit per model (starts at 30, adjusts down on failures)
|
|
- Proactively compacts input when above learned limit before sending to upstream
|
|
- Auto-retry on `finish_reason=length` with aggressive re-compaction and resend
|
|
- Prevents `stream disconnected` and `incomplete` errors on long conversations
|
|
- All tracking logged to stderr: `[crof-adaptive] model=X items=N OK/FAIL -> limit=N`
|
|
- Fixed `NameError: _ts` crash in debug logging
|
|
- Fixed `ConnectionResetError` crash on client disconnect during streaming
|
|
- Added 180s upstream timeout to prevent hanging connections
|
|
- Compaction now preserves function_call/function_call_output pairs (no orphaned tool outputs)
|
|
- Fixed reasoning control: `reasoning_effort=none` always sends both params
|
|
|
|
## v2.2.1 (2026-05-20)
|
|
|
|
- **Fixed compaction orphaning function_call_output items** — root cause of Crof `incomplete` responses
|
|
- Compaction cut between function_call and its function_call_output, creating dangling tool results
|
|
- Crof model received orphaned `tool` messages with empty `tool_call_id`, causing confusion and token exhaustion
|
|
- Compaction now expands tail boundary to include matching function_call/function_call_output pairs
|
|
- **Fixed reasoning control**: `reasoning_effort=none` now always sends both `enable_thinking=false` AND `reasoning_effort=none`
|
|
- Crof API testing confirmed `reasoning_effort=none` is what actually suppresses reasoning, not `enable_thinking=false`
|
|
- Added upstream debug logging to `~/.cache/codex-proxy/crof-upstream.jsonl`
|
|
|
|
## v2.2.0 (2026-05-20)
|
|
|
|
- **Added per-provider Reasoning controls in endpoint editor**
|
|
- Reasoning On/Off toggle — disable reasoning for models that exhaust output tokens (e.g., Crof mimo-v2.5-pro)
|
|
- Reasoning Effort selector: None, Minimal, Low, Medium, High, Max
|
|
- When reasoning is OFF: sends `enable_thinking=false` + `reasoning_effort=none` to upstream API
|
|
- When reasoning is ON: sends user-selected effort level (default: Medium)
|
|
- Settings stored per-endpoint, passed through proxy config to upstream requests
|
|
- Strip `reasoning_content` from proxy output — Codex doesn't use it, avoids token waste
|
|
- Force `max_tokens=64000` minimum for openai-compat providers — room for both reasoning and content
|
|
- Inspired by unsloth's reasoning control patterns for Qwen/GPT-OSS models
|
|
- Styled reasoning switch: green = ON, orange = OFF, gentle rounded pill shape
|
|
- Added error handling to endpoint manager Add/Edit/Manage dialogs (prevents silent failures)
|
|
|
|
## v2.1.3 (2026-05-19)
|
|
|
|
- **Fixed Crof mimo-v2.5-pro stopping mid-response (finish_reason=length)**
|
|
- Root cause: model emits 600+ `reasoning_content` SSE chunks that exhaust `max_tokens` before any actual content is generated
|
|
- Strip `reasoning_content` from proxy output — Codex doesn't use reasoning, avoids wasting output tokens on invisible text
|
|
- Force `max_tokens` minimum of 64000 for openai-compat providers — gives models room for both reasoning and content
|
|
- Works for all openai-compat providers (Crof, Z.AI, DeepSeek, OpenRouter, etc.)
|
|
|
|
## v2.1.2 (2026-05-19)
|
|
|
|
- **Fixed Crof.ai and providers stopping after first tool call (root cause: None tool IDs)**
|
|
- Codex sends `function_call` items with `id=None` — proxy now matches tool results to calls by call_id + positional fallback
|
|
- Fixed orphan message output item when response is only tool calls (no text content)
|
|
- **Auto-trims long conversations (>30 items)** to prevent context overflow on providers like Crof
|
|
- Keeps system/developer messages, original user query, and most recent 10 items
|
|
- **Auto-compacts old items into a summary** instead of just dropping them
|
|
- Summary includes: user requests, assistant responses, tool calls made, files touched
|
|
- Preserves enough context for the model to continue long tasks intelligently
|
|
- **Truncates large tool outputs (>8000 chars)** to prevent model output token exhaustion
|
|
- Crof's models return `incomplete` when tool results contain too much text (e.g., full HTML pages)
|
|
- Truncated outputs include `[truncated N chars]` suffix so the model knows data was cut
|
|
- Added request/response logging to `~/.cache/codex-proxy/requests.log` for debugging
|
|
- Proxy stderr no longer discarded by launcher (visible in terminal for debugging)
|
|
|
|
## v2.1.1 (2026-05-19)
|
|
|
|
- Added Command Code backend to translation proxy (proprietary `/alpha/generate` API)
|
|
- Added Command Code provider preset with 20 models (DeepSeek, Claude, GPT, Kimi, GLM, Qwen, etc.)
|
|
- Added `cc_version` field in endpoint editor for Command Code version (default: 0.26.8)
|
|
- Proxy sends `x-command-code-version` header to CC API (fixes 403 "upgrade_required")
|
|
- CC message conversion: `system` role → `user`, string content → array, tools stripped, real UUID for threadId
|
|
- Fixed proxy: map `developer` role to `system` for Chat Completions providers (DeepSeek, Qwen, etc.)
|
|
- Fixed proxy: map `developer` role to `user` for Anthropic providers
|
|
- Forward `instructions` field from Responses API as system message/param
|
|
|
|
## v2.1.0 (2026-05-19)
|
|
|
|
- Added Codex auth status detection (reads `codex login status`)
|
|
- Auth status bar shows logged-in provider or warning if auth missing/expired
|
|
- "Re-login" button opens `codex login` in a terminal for re-authentication
|
|
- Auto re-checks auth 30s after re-login flow starts
|
|
- Pre-launch auth check warns before launching Codex Default mode if auth is invalid
|
|
- Auth status checked asynchronously at startup (non-blocking)
|
|
|
|
## v2.0.1 (2026-05-19)
|
|
|
|
- Added Codex CLI/Desktop installation verifier to main page
|
|
- Green check (✔) when detected, yellow cross (✘) when missing
|
|
- "Install" button next to missing tools opens install guide dialog
|
|
- Desktop/CLI launch buttons disabled with tooltip when tool is missing
|
|
- Dependency status logged on startup
|
|
- Buttons respect missing-state after busy/unbusy cycles
|
|
|
|
## v2.0.0 (2026-05-19)
|
|
|
|
- Initial release: multi-provider Codex Launcher
|
|
- Translation proxy: Responses API to Chat Completions + Anthropic Messages
|
|
- GTK endpoint manager with 10+ provider presets
|
|
- Codex Default mode (built-in OAuth, zero config)
|
|
- Browser UA injection for Cloudflare-protected providers (OpenCode)
|
|
- Streaming SSE, tool calls, reasoning content support
|
|
- Profile backup/import, model auto-fetch, bulk import
|
|
- Refresh Models in background thread
|
|
- URL normalization to prevent double-path bugs
|
|
- Config backup/restore around sessions
|
|
- .deb installer package
|