feat: auto-compaction for long conversations (like Claude Code/Codex /compact)

Instead of just truncating old items, the proxy now auto-compacts them into a structured summary preserving key context: - User requests, assistant responses, tool calls made, files touched - Keeps original query + system messages + last 10 recent items - 38 items -> 14 items in testing, with summary of dropped turns - Similar to Claude Code's auto-compact and Codex CLI's /compact - No extra API calls needed, instant, zero cost
2026-05-19 21:49:55 +04:00
parent c90912ed07
commit 662d8e961e
3 changed files with 117 additions and 25 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,9 +6,10 @@
 - Codex sends `function_call` items with `id=None` — proxy now matches tool results to calls by call_id + positional fallback
 - Fixed orphan message output item when response is only tool calls (no text content)
 - **Auto-trims long conversations (>30 items)** to prevent context overflow on providers like Crof
-  - Keeps system/developer messages, original user query, and most recent items
-  - Drops oldest tool call/outputs from the middle when conversation grows too long
-  - Prevents `status=incomplete` errors on providers with smaller context windows
+  - Keeps system/developer messages, original user query, and most recent 10 items
+  - **Auto-compacts old items into a summary** instead of just dropping them
+  - Summary includes: user requests, assistant responses, tool calls made, files touched
+  - Preserves enough context for the model to continue long tasks intelligently
 - **Truncates large tool outputs (>8000 chars)** to prevent model output token exhaustion
  - Crof's models return `incomplete` when tool results contain too much text (e.g., full HTML pages)
  - Truncated outputs include `[truncated N chars]` suffix so the model knows data was cut