feat: auto-compaction for long conversations (like Claude Code/Codex /compact)

Instead of just truncating old items, the proxy now auto-compacts
them into a structured summary preserving key context:
- User requests, assistant responses, tool calls made, files touched
- Keeps original query + system messages + last 10 recent items
- 38 items -> 14 items in testing, with summary of dropped turns
- Similar to Claude Code's auto-compact and Codex CLI's /compact
- No extra API calls needed, instant, zero cost
This commit is contained in:
admin
2026-05-19 21:49:55 +04:00
Unverified
parent c90912ed07
commit 662d8e961e
3 changed files with 117 additions and 25 deletions

View File

@@ -6,9 +6,10 @@
- Codex sends `function_call` items with `id=None` — proxy now matches tool results to calls by call_id + positional fallback
- Fixed orphan message output item when response is only tool calls (no text content)
- **Auto-trims long conversations (>30 items)** to prevent context overflow on providers like Crof
- Keeps system/developer messages, original user query, and most recent items
- Drops oldest tool call/outputs from the middle when conversation grows too long
- Prevents `status=incomplete` errors on providers with smaller context windows
- Keeps system/developer messages, original user query, and most recent 10 items
- **Auto-compacts old items into a summary** instead of just dropping them
- Summary includes: user requests, assistant responses, tool calls made, files touched
- Preserves enough context for the model to continue long tasks intelligently
- **Truncates large tool outputs (>8000 chars)** to prevent model output token exhaustion
- Crof's models return `incomplete` when tool results contain too much text (e.g., full HTML pages)
- Truncated outputs include `[truncated N chars]` suffix so the model knows data was cut