v3.6.0 — Performance & Stability Hardening

P0: Connection pooling (http.client reuse per host), stream idle timeout (300s via selectors) on all streaming paths (OA/CC/Gemini/auto-continue)
P1: Retry-After header support on all retry paths, preemptive OAuth token refresh (5min before expiry)
P2: oa_convert_tools(strict=) for Responses vs Chat Completions, filter null/empty tool names
P3: Response store TTL (600s eviction), bounded stream buffers (8MB cap), response.failed/error urgent flush, dual logging (proxy.log)
.deb: v3.6.0 (71KB); v3.5.0 and v3.3.0 kept as fallback
2026-05-22 13:14:51 +04:00
parent 0682e46521
commit beea20686b
5 changed files with 177 additions and 19 deletions
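
The Retry-After handling summarized in the commit message above might look roughly like the following. This is a minimal sketch, not the proxy's actual code: the helper names `parse_retry_after` and `retry_delay`, the 1-second backoff base, and the choice to leave the fallback backoff uncapped are assumptions made for illustration. Only the 60s cap on the header value and the fallback to exponential backoff come from the changelog.

```python
import email.utils
import time
from typing import Optional

RETRY_AFTER_CAP_S = 60.0  # cap on the header value, as stated in the changelog


def parse_retry_after(value: Optional[str], now: Optional[float] = None) -> Optional[float]:
    """Return the delay in seconds requested by a Retry-After header.

    Handles both forms the header may take: delta-seconds ("120") and an
    HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns None when the
    header is absent or cannot be parsed.
    """
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    now = time.time() if now is None else now
    # A date in the past means "retry now", so never return a negative delay.
    return max(0.0, when.timestamp() - now)


def retry_delay(header_value: Optional[str], attempt: int, base: float = 1.0) -> float:
    """Pick the sleep time before retry number `attempt` (0-based).

    Hypothetical helper: honors Retry-After (capped) when present,
    otherwise falls back to exponential backoff base * 2**attempt.
    """
    requested = parse_retry_after(header_value)
    if requested is not None:
        return min(requested, RETRY_AFTER_CAP_S)
    return base * (2 ** attempt)
```

A retry loop would call `retry_delay(resp.getheader("Retry-After"), attempt)` on an `http.client.HTTPResponse` and sleep for the returned number of seconds before the next attempt.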
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,33 @@
 # Changelog

+## v3.6.0 (2026-05-22)
+
+**Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After**
+
+Inspired by architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum).
+
+### P0: Connection Pooling & Stream Idle Timeout
+- **Connection pooling** (`http.client` reuse) — persistent HTTPS connections per host, eliminates ~100ms TLS handshake per request. Pool keyed by `{scheme}://{host}:{port}`, reused across requests.
+- **Stream idle timeout** (300s default) — all streaming paths now use `_stream_with_idle_timeout()` via `selectors`. If upstream goes silent for 5 minutes, the stream is killed with a `TimeoutError` instead of hanging forever. Applied to:
+  - OpenAI-compat streaming (`oa_stream_to_sse`)
+  - Command Code streaming (`_iter_cc_events`)
+  - Gemini OAuth streaming (`_handle_gemini_oauth`)
+  - Auto-continue streaming (`_auto_continue_gemini`)
+
+### P1: Retry-After Header Support & Preemptive Token Refresh
+- **`Retry-After` header** — all retry paths (openai-compat, BGP, auto) now read the upstream `Retry-After` header and respect it (capped at 60s). Falls back to exponential backoff if header is absent.
+- **Preemptive OAuth token refresh**: `_preemptive_refresh_token()` logs a warning when the OAuth token is within 5 minutes of expiry, laying the groundwork for proactive refresh.
+
+### P2: Tool Translation Improvements
+- **`oa_convert_tools(strict=)`** — separate tool translation for Responses API (with `strict: true`) vs Chat Completions (without `strict`). Some providers reject the `strict` field in Chat Completions mode.
+- **Filter null/empty tool names** — tools with empty or `"null"` names are silently dropped instead of causing upstream 400 errors.
+
+### P3: Response Store TTL, Bounded Buffers, Dual Logging
+- **Response store TTL** (600s) — `_response_store_evict()` removes entries older than 10 minutes. Prevents unbounded memory growth on long sessions.
+- **Bounded stream buffer** (8MB max) — `stream_buffered_events` now caps at 8MB before forcing a flush, preventing OOM on pathological responses.
+- **`response.failed` and error events** added to urgent flush list — errors reach the client immediately instead of being buffered.
+- **Dual logging** — `proxy.log` in `~/.cache/codex-proxy/` captures all proxy messages alongside stderr. Survives Codex Desktop's stderr piping.
+
 ## v3.5.0 (2026-05-22)

 **Major Release — Command Code Adapter Overhaul, AI Assist, Self-Revive Watchdog, Debug Infrastructure**