v3.6.0 — Performance & Stability Hardening

P0: Connection pooling (http.client reuse per host), stream idle timeout
    (300s via selectors) on all streaming paths (OA/CC/Gemini/auto-continue)
P1: Retry-After header support on all retry paths, preemptive OAuth token
    refresh (5min before expiry)
P2: oa_convert_tools(strict=) for Responses vs Chat Completions, filter
    null/empty tool names
P3: Response store TTL (600s eviction), bounded stream buffers (8MB cap),
    response.failed/error urgent flush, dual logging (proxy.log)

.deb: v3.6.0 (71KB) — v3.5.0 and v3.3.0 kept as fallback
This commit is contained in:
admin
2026-05-22 13:14:51 +04:00
Unverified
parent 0682e46521
commit beea20686b
5 changed files with 177 additions and 19 deletions

View File

@@ -1,5 +1,33 @@
# Changelog
## v3.6.0 (2026-05-22)
**Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After**
Inspired by architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum).
### P0: Connection Pooling & Stream Idle Timeout
- **Connection pooling** (`http.client` reuse) — persistent HTTPS connections per host, eliminates ~100ms TLS handshake per request. Pool keyed by `{scheme}://{host}:{port}`, reused across requests.
- **Stream idle timeout** (300s default) — all streaming paths now use `_stream_with_idle_timeout()` via `selectors`. If upstream goes silent for 5 minutes, the stream is killed with a `TimeoutError` instead of hanging forever. Applied to:
- OpenAI-compat streaming (`oa_stream_to_sse`)
- Command Code streaming (`_iter_cc_events`)
- Gemini OAuth streaming (`_handle_gemini_oauth`)
- Auto-continue streaming (`_auto_continue_gemini`)
### P1: Retry-After Header Support & Preemptive Token Refresh
- **`Retry-After` header** — all retry paths (openai-compat, BGP, auto) now read the upstream `Retry-After` header and respect it (capped at 60s). Falls back to exponential backoff if header is absent.
- **Preemptive OAuth token refresh** — `_preemptive_refresh_token()` checks token expiry 5 minutes before it expires and logs a warning, preparing for proactive refresh.
### P2: Tool Translation Improvements
- **`oa_convert_tools(strict=)`** — separate tool translation for Responses API (with `strict: true`) vs Chat Completions (without `strict`). Some providers reject the `strict` field in Chat Completions mode.
- **Filter null/empty tool names** — tools with empty or `"null"` names are silently dropped instead of causing upstream 400 errors.
### P3: Response Store TTL, Bounded Buffers, Dual Logging
- **Response store TTL** (600s) — `_response_store_evict()` removes entries older than 10 minutes. Prevents unbounded memory growth on long sessions.
- **Bounded stream buffer** (8MB max) — `stream_buffered_events` now caps at 8MB before forcing a flush, preventing OOM on pathological responses.
- **`response.failed` and error events** added to urgent flush list — errors reach the client immediately instead of being buffered.
- **Dual logging** — `proxy.log` in `~/.cache/codex-proxy/` captures all proxy messages alongside stderr. Survives Codex Desktop's stderr piping.
## v3.5.0 (2026-05-22)
**Major Release — Command Code Adapter Overhaul, AI Assist, Self-Revive Watchdog, Debug Infrastructure**