v3.11.5: token-aware compaction, vision filter, universal adaptive compaction, smart-continue text detection
This commit is contained in:
@@ -130,6 +130,10 @@ A three-component system:
|
||||
- **Response store TTL** — evicts stored responses older than 10 minutes, prevents memory leaks
|
||||
- **Bounded stream buffers** — 8MB cap prevents OOM on pathological responses
|
||||
- **Dual logging** — all proxy messages written to both stderr and `~/.cache/codex-proxy/proxy.log`
|
||||
- **Vision model detection** (v3.11.5) — automatically strips images for non-vision models (DeepSeek, GLM, Qwen, etc.) and replaces with text notice; vision-capable models (GPT-4o, Gemini, Claude, Qwen-VL) keep images intact
|
||||
- **Token-aware compaction** (v3.11.5) — learns per-model token limits from `context_length_exceeded` errors; proactively compacts when estimated tokens exceed 80% of limit; prevents repeated context overflow on small-context models (~35K tokens)
|
||||
- **Universal adaptive compaction** (v3.11.5) — compaction now works for ALL providers (was Crof.ai-only); proactive + retry compaction with aggression levels (normal/extreme)
|
||||
- **Smart-continue text detection** (v3.11.5) — triggers continuation nudging when model outputs text matching tool-call patterns, essential for text-only models that never emit real `function_call_output` items
|
||||
- Zero dependencies — pure Python stdlib
|
||||
|
||||
### Command Code Adapter
|
||||
|
||||
Reference in New Issue
Block a user