v3.11.5: token-aware compaction, vision filter, universal adaptive compaction, smart-continue text detection

Roman | RyzenAdvanced
2026-05-26 16:14:05 +04:00
parent 028185652d
commit c16c6eaf61
9 changed files with 488 additions and 73 deletions

@@ -1,5 +1,29 @@
# Changelog
## v3.11.5 (2026-05-26)
**Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection**
### Critical Fixes
- **Token-aware compaction for small-context models (FIX)**: `_crof_compact_for_retry()` had an early return at `len(input_data) <= limit` (item count) — if you had 25 items × 1600 tokens = 40K tokens, it skipped compaction entirely because 25 < 30 (the default item limit). Now also checks estimated token count vs learned model max, and compacts when either item count OR token count exceeds limits. Fixes repeated `context_length_exceeded` errors on models like 0G-GLM-5.1 (~35K token context).
- **Proactive compaction now token-aware**: Previously only triggered when item count > 30. Now also triggers when estimated tokens exceed 80% of the model's learned token limit, even if item count is below the threshold. Prevents the first-request failure pattern on small-context models.
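- **Compaction aggression threshold**: Changed `est > max_tok` to `est >= max_tok * 0.9`, so compaction starts once the estimate reaches 90% of the learned limit. This leaves headroom for estimation error and covers the edge case where the estimate exactly equals the limit and compaction was skipped.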
- **Removed all `crof.ai` gates from adaptive compaction**: Proactive compaction, `finish_reason=length` retry, `_crof_record`, and compaction logging were gated behind `"crof.ai" in TARGET_URL`. These gates prevented OpenAdapter and other providers from getting proactive/retry compaction, causing repeated `context_length_exceeded` failures. Now applies universally to ALL providers.
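
The two trigger conditions described in the fixes above can be sketched as a pair of small predicates. This is a minimal illustration rather than the proxy's actual code: the function names `needs_retry_compaction` and `needs_proactive_compaction` are hypothetical, and the 0.9 and 0.8 factors and the default item limit of 30 are taken from the entries above.

```python
from typing import Optional


def needs_retry_compaction(n_items: int, est_tokens: int,
                           item_limit: int = 30,
                           max_tokens: Optional[int] = None) -> bool:
    """Compact when EITHER the item count or the token estimate is over its limit."""
    over_items = n_items > item_limit
    # No learned limit yet means the token check cannot fire.
    over_tokens = max_tokens is not None and est_tokens >= max_tokens * 0.9
    return over_items or over_tokens


def needs_proactive_compaction(n_items: int, est_tokens: int,
                               item_limit: int = 30,
                               max_tokens: Optional[int] = None) -> bool:
    """Before the first request: compact early, at 80% of the learned limit."""
    if n_items > item_limit:
        return True
    return max_tokens is not None and est_tokens > max_tokens * 0.8


# The case from the changelog: 25 items x 1600 tokens = 40,000 tokens.
est = 25 * 1600
print(needs_retry_compaction(25, est))                     # False: item count alone passes
print(needs_retry_compaction(25, est, max_tokens=35_000))  # True: token estimate is over
```

Until a limit has been learned for a model, only the item-count check can trigger, which is why the first oversized request to a small-context model may still fail once.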
### New Features
- **Vision model detection + image stripping**: `_strip_images_from_input()` and `_model_supports_vision()` detect vision capability by model name pattern. Non-vision models (deepseek, glm, mixtral, llama, command, dbrx, qwen, phi-3) have `input_image`/`image_url` parts stripped and replaced with `[User attached image: filename — this model does not support vision]` text notice. Vision models (gpt-4o, gemini, claude, qwen-vl, glm-5v) keep images intact. Applied in 3 paths: main request, context_length_exceeded retry, smart-continue nudge.
- **Token estimation and per-model limit learning**: `_estimate_tokens()`, `_estimate_input_tokens()`, `_get_model_max_tokens()`, `_set_model_max_tokens()`. Extracts `~N tokens` from `context_length_exceeded` error messages and stores per-model token limits. Used by proactive compaction and retry compaction to adjust `keep` count dynamically.
- **Compaction aggression levels**: `_crof_compact_for_retry()` accepts an `aggression` parameter (0=normal, 1=extreme). Extreme mode kicks in when estimated tokens exceed 1.5× the learned limit or on the 2nd+ retry attempt. It reduces the `keep` count to its minimum so the compacted request is more likely to fit within model limits.
- **Smart-continue text-tool detection**: Removed hard requirement for `has_function_call_output(input_data)`. Added `_TOOL_CALL_TEXT_PATTERNS` and `_text_looks_like_tool_calls()` to trigger nudging when model outputs text matching tool-call patterns (e.g., `• (exec_command cmd ...)`, `write_to_file`, `exec_command`) even without prior `function_call_output` in context. Essential for models like 0G-GLM-5.1 that never emit real `function_call_output` items.
- **Parenthesized tool call regex**: `_PAREN_TC_RE` pattern to match `• (name args...)` format from non-vision models that output tool calls as parenthesized text.
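
As a rough sketch of how limit learning and aggression levels fit together, the snippet below pulls a token count out of an error body and then picks a `keep` count from the ratio of estimated tokens to the learned limit. The names `learn_limit` and `pick_keep`, the sample error string, and the default `base_keep=10` are assumptions made for illustration; the regex and the 1.2 / 1.5 ratio thresholds follow the diff in this commit.

```python
import re
import threading

_limits = {}
_limits_lock = threading.Lock()
_TOKENS_RE = re.compile(r'~?(\d+)\s*tokens')


def learn_limit(model, err_body):
    """Record the smallest token limit seen for a model; return the stored value."""
    m = _TOKENS_RE.search(err_body)
    if not m:
        return None
    tokens = int(m.group(1))
    with _limits_lock:
        prev = _limits.get(model)
        if prev is None or tokens < prev:
            _limits[model] = tokens
        return _limits[model]


def pick_keep(est, max_tok, base_keep=10, min_keep_recent=6, aggression=0):
    """Choose how many recent items to keep, shrinking as the overshoot grows."""
    ratio = est / max_tok
    if aggression >= 1 or ratio > 1.5:
        return max(2, min_keep_recent // 2)   # extreme
    if ratio > 1.2:
        return max(3, base_keep // 2)         # moderate
    return base_keep                          # normal


# Hypothetical error text; real provider messages may be worded differently.
sample = "context_length_exceeded: limit is ~35000 tokens"
limit = learn_limit("demo-model", sample)
print(limit, pick_keep(40_000, limit), pick_keep(60_000, limit))
```

Keeping only the smallest value seen is a conservative choice: a later, larger number in an error message never loosens a limit that has already been learned.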
### GUI Fixes
- **Active endpoint sync**: Added `set_active_endpoint()` and `validate_active_endpoint()` to Linux GTK GUI. Syncs `.active-endpoint.json` with `config.toml` on every launch; auto-removes stale references to deleted providers. Fixed `"Error loading configuration: No such file or directory (os error 2)"` crash when active endpoint referenced a deleted provider.
- **Config state**: `~/.codex/.active-endpoint.json` and `config.toml` model catalog path validated and auto-corrected on GUI startup.
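
The stale-reference check can be illustrated with a small standalone function. This is a sketch under assumptions: `prune_stale_active` is a hypothetical name, and the file is assumed to hold a JSON object with an `"active"` key, as in the GUI diff in this commit.

```python
import json
import tempfile
from pathlib import Path


def prune_stale_active(active_file: Path, known_names: set) -> bool:
    """Delete active_file if it names an endpoint that no longer exists.

    Returns True when the file was removed.
    """
    if not active_file.exists():
        return False
    try:
        active = json.loads(active_file.read_text()).get("active", "")
    except (json.JSONDecodeError, OSError, AttributeError):
        # Unreadable or unexpected content: leave the file alone.
        return False
    if active and active not in known_names:
        active_file.unlink()
        return True
    return False


# Demo in a temporary directory, so no real config is touched.
demo = Path(tempfile.mkdtemp()) / "active-endpoint.json"
demo.write_text(json.dumps({"active": "deleted-provider"}))
print(prune_stale_active(demo, {"provider-a", "provider-b"}))  # True
print(demo.exists())                                           # False
```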
## v3.11.0 (2026-05-26)
**Cobra PR Merge + Smart Continuation + API Key Hot-Reload**

@@ -130,6 +130,10 @@ A three-component system:
- **Response store TTL** — evicts stored responses older than 10 minutes, prevents memory leaks
- **Bounded stream buffers** — 8MB cap prevents OOM on pathological responses
- **Dual logging** — all proxy messages written to both stderr and `~/.cache/codex-proxy/proxy.log`
- **Vision model detection** (v3.11.5) — automatically strips images for non-vision models (DeepSeek, GLM, Qwen, etc.) and replaces with text notice; vision-capable models (GPT-4o, Gemini, Claude, Qwen-VL) keep images intact
- **Token-aware compaction** (v3.11.5) — learns per-model token limits from `context_length_exceeded` errors; proactively compacts when estimated tokens exceed 80% of limit; prevents repeated context overflow on small-context models (~35K tokens)
- **Universal adaptive compaction** (v3.11.5) — compaction now works for ALL providers (was Crof.ai-only); proactive + retry compaction with aggression levels (normal/extreme)
- **Smart-continue text detection** (v3.11.5) — triggers continuation nudging when model outputs text matching tool-call patterns, essential for text-only models that never emit real `function_call_output` items
- Zero dependencies — pure Python stdlib
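
To make the smart-continue text detection concrete, here is a reduced sketch of the idea: a regex that looks for a known tool name at the start of a line, optionally preceded by a bullet and an opening parenthesis. The name `looks_like_tool_call` and the shortened tool list are assumptions for illustration; the proxy's own pattern covers more tool names.

```python
import re

# Start of text or line, optional bullet/whitespace, optional "(", then a tool name.
_TOOL_TEXT_RE = re.compile(
    r'(?:^|\n)[\s•\-\*]*\(?'
    r'(?:exec_command|write_to_file|read_file)'
    r'[\s:]',
    re.I,
)


def looks_like_tool_call(text):
    """Return True when plain text appears to contain a tool call written as text."""
    if not text or len(text) < 6:
        return False
    return bool(_TOOL_TEXT_RE.search(text))


print(looks_like_tool_call("• (exec_command cmd ls -la)"))       # True
print(looks_like_tool_call("I will now summarise the results."))  # False
```

Anchoring the match to the start of a line keeps ordinary prose that merely mentions a tool name mid-sentence from triggering a nudge.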
### Command Code Adapter

Binary file not shown.

Binary file not shown.

@@ -3,13 +3,13 @@ set -e
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-if [ -f "$SCRIPT_DIR/codex-launcher_3.11.0_all.deb" ]; then
-    echo "Installing codex-launcher_3.11.0_all.deb ..."
-    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.0_all.deb"
+if [ -f "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb" ]; then
+    echo "Installing codex-launcher_3.11.5_all.deb ..."
+    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb"
 else
-    echo "WARNING: codex-launcher_3.11.0_all.deb not found; copying files manually."
+    echo "WARNING: codex-launcher_3.11.5_all.deb not found; copying files manually."
 fi
-echo "Installed v3.11.0 via .deb package."
+echo "Installed v3.11.5 via .deb package."
 echo " translate-proxy.py -> /usr/bin/translate-proxy.py"
 echo " codex-launcher-gui -> /usr/bin/codex-launcher-gui"
 echo " cleanup-codex-stale -> /usr/bin/cleanup-codex-stale.sh"

@@ -20,12 +20,22 @@ BGP_POOLS_FILE = HOME / ".codex/bgp-pools.json"
 LOG_DIR = HOME / ".cache/codex-desktop"
 LAUNCH_LOG = LOG_DIR / "launcher.log"
 PROXY_CONFIG_DIR = HOME / ".cache/codex-proxy"
+ACTIVE_ENDPOINT_FILE = HOME / ".codex/.active-endpoint.json"
 DEFAULT_CONFIG = """model = ""
 model_provider = ""
 model_catalog_json = ""
 """
 CHANGELOG = [
+    ("3.11.5", "2026-05-26", [
+        "Token-aware compaction: fixes context_length_exceeded on small-context models",
+        "Proactive compaction triggers on token count, not just item count",
+        "Universal adaptive compaction for all providers (removed crof.ai gates)",
+        "Vision model detection + image stripping for non-vision models",
+        "Per-model token limit learning from error messages",
+        "Smart-continue text-tool detection for text-only models",
+        "Active endpoint sync: auto-removes stale references on startup",
+    ]),
     ("3.11.0", "2026-05-26", [
         "Merge cobra PR: concurrency semaphore (max 3), auto-continue for truncated text",
         "SO_REUSEADDR on sticky port, proxy-stderr.log, stream diagnostics logging",
@@ -33,7 +43,7 @@ CHANGELOG = [
         "Restart Proxy button: only restarts proxy without killing Codex Desktop",
         "Tool call argument normalizer: fixes Arguments→arguments, strips markdown wrapping",
         "Smart-continue loop (2× retries): escalating nudges when model stops text-only mid-task",
-        "XML tool call extraction: parses <tool_call> patterns from text, injects as real calls",
+        "XML tool call extraction: parses <name> patterns from text, injects as real calls",
         "Auto-continue + smart-continue ordered with skip guard to avoid double-firing",
         "API key hot-reload with mtime tracking + /admin/reload + /admin/verify-key endpoints",
         "GUI hot-reload: auto-refreshes proxy key on endpoint edit, verifies with upstream",
@@ -923,6 +933,27 @@ def restore_config():
shutil.copy2(str(CONFIG_BAK), str(tmp)) shutil.copy2(str(CONFIG_BAK), str(tmp))
os.replace(str(tmp), str(CONFIG)) os.replace(str(tmp), str(CONFIG))
+def set_active_endpoint(name):
+    ACTIVE_ENDPOINT_FILE.parent.mkdir(parents=True, exist_ok=True)
+    write_secure_text(ACTIVE_ENDPOINT_FILE, json.dumps({"active": name}, indent=2))
+def validate_active_endpoint(logfn=None):
+    if not ACTIVE_ENDPOINT_FILE.exists():
+        return
+    try:
+        d = json.loads(ACTIVE_ENDPOINT_FILE.read_text())
+        active = d.get("active", "")
+        if not active:
+            return
+        eps = load_endpoints()
+        names = {ep.get("name", "") for ep in eps}
+        if active not in names:
+            ACTIVE_ENDPOINT_FILE.unlink()
+            if logfn:
+                logfn(f"Removed stale active-endpoint '{active}' (provider no longer exists)")
+    except Exception:
+        pass
 def write_secure_text(path, text):
     path.parent.mkdir(parents=True, exist_ok=True)
     tmp = path.with_suffix(path.suffix + ".tmp")
@@ -1862,6 +1893,7 @@ class LauncherWin(Gtk.Window):
 self._proc = None
 self._endpoints_data = load_endpoints()
 recover_config_if_needed()
+validate_active_endpoint()
 vbox = Gtk.Box(orientation=Gtk.Orientation.VERTICAL, spacing=8)
 self.add(vbox)
@@ -2607,6 +2639,8 @@ class LauncherWin(Gtk.Window):
 begin_config_transaction(f"launch:{ep['name']}")
 write_config_for_native(ep, model)
+set_active_endpoint(ep["name"])
 if target == "desktop":
     if needs_proxy:
         _kill_existing_desktop(self.log)
@@ -2664,6 +2698,7 @@ class LauncherWin(Gtk.Window):
 begin_config_transaction(f"launch:bgp:{pool['name']}")
 write_config_for_translated(bgp_ep, model, port)
+set_active_endpoint(pool["name"])
 if target == "desktop":
     _kill_existing_desktop(self.log)

@@ -83,14 +83,24 @@ model_catalog_json = ""
 """
 CHANGELOG = [
+    ("3.11.5", "2026-05-26", [
+        "Token-aware compaction: fixes context_length_exceeded on small-context models (25 items × 1600 tokens)",
+        "Proactive compaction triggers on token count (>80% model limit), not just item count",
+        "Universal adaptive compaction: removed crof.ai-only gates, all providers get compaction",
+        "Vision model detection: strips images for non-vision models, keeps for vision-capable ones",
+        "Per-model token limit learning from context_length_exceeded error messages",
+        "Compaction aggression levels: normal vs extreme when tokens > 1.5× model limit",
+        "Smart-continue text-tool detection: triggers on tool-call text patterns, not just function_call_output",
+        "Active endpoint sync: GUI auto-removes stale endpoint references on startup",
+    ]),
     ("3.11.0", "2026-05-26", [
         "Merge cobra PR: concurrency semaphore (max 3), auto-continue for truncated text",
         "SO_REUSEADDR on sticky port, proxy-stderr.log, stream diagnostics logging",
         "Timeout/OSError handler sends response.failed SSE instead of silent drop",
         "Restart Proxy button: only restarts proxy without killing Codex Desktop",
-        "Tool call argument normalizer: fixes Arguments→arguments, strips markdown wrapping",
-        "Smart-continue loop (2× retries): escalating nudges when model stops text-only mid-task",
-        "XML tool call extraction: parses <tool_call> patterns from text, injects as real calls",
+        "Tool call argument normalizer: fixes Arguments->arguments, strips markdown wrapping",
+        "Smart-continue loop (2x retries): escalating nudges when model stops text-only mid-task",
+        "XML tool call extraction: parses patterns from text, injects as real calls",
         "Auto-continue + smart-continue ordered with skip guard to avoid double-firing",
         "API key hot-reload with mtime tracking + /admin/reload + /admin/verify-key endpoints",
         "GUI hot-reload: auto-refreshes proxy key on endpoint edit, verifies with upstream",

@@ -1469,6 +1469,53 @@ _CROF_ADAPTIVE = {
     "min_keep_recent": 6,
 }
+_model_max_tokens = {}
+_model_max_tokens_lock = threading.Lock()
+def _estimate_tokens(item):
+    if not isinstance(item, dict):
+        return 4
+    t = item.get("type", "")
+    if t == "message":
+        content = item.get("content", "")
+        if isinstance(content, str):
+            return max(4, len(content) // 4)
+        elif isinstance(content, list):
+            total = 4
+            for part in content:
+                pt = part.get("type", "")
+                if pt in ("input_text", "output_text"):
+                    total += max(4, len(part.get("text", "")) // 4)
+                elif pt == "input_image":
+                    total += 800
+                elif pt in ("function_call",):
+                    total += max(20, len(part.get("arguments", "{}")) // 2)
+                elif pt == "function_call_output":
+                    total += max(8, len(part.get("output", "")) // 4)
+            return total
+    elif t in ("function_call_output",):
+        return max(8, len(item.get("output", "")) // 4)
+    elif t == "function_call":
+        return max(20, len(item.get("arguments", "{}")) // 2)
+    return 4
+def _estimate_input_tokens(input_data):
+    if not isinstance(input_data, list):
+        return 0
+    return sum(_estimate_tokens(i) for i in input_data)
+def _get_model_max_tokens(model):
+    with _model_max_tokens_lock:
+        return _model_max_tokens.get(model)
+def _set_model_max_tokens(model, tokens):
+    if model and tokens:
+        with _model_max_tokens_lock:
+            existing = _model_max_tokens.get(model)
+            if existing is None or tokens < existing:
+                _model_max_tokens[model] = tokens
+                print(f"[ctx-limit] learned {model} max ~{tokens} tokens", file=sys.stderr)
 _BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json")
 _bgp_stats_lock = threading.Lock()
@@ -1534,8 +1581,6 @@ def _sorted_bgp_routes():
return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats)) return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats))
 def _crof_record(model, n_items, success):
-    if TARGET_URL and "crof.ai" not in TARGET_URL:
-        return
     if not isinstance(n_items, int) or n_items < 1:
         return
     entry = {"model": model, "items": n_items, "ok": success}
@@ -1561,20 +1606,36 @@ def _crof_record(model, n_items, success):
global_limit = v["limit"] global_limit = v["limit"]
_CROF_ADAPTIVE["global_item_limit"] = global_limit _CROF_ADAPTIVE["global_item_limit"] = global_limit
-    if TARGET_URL and "crof.ai" in TARGET_URL:
-        print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
+    print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
 def _crof_item_limit(model):
     ml = _CROF_ADAPTIVE["model_limits"].get(model, {})
     per_model = ml.get("limit", 30)
     return min(per_model, _CROF_ADAPTIVE["global_item_limit"])
-def _crof_compact_for_retry(input_data, model):
+def _crof_compact_for_retry(input_data, model, aggression=0):
     limit = _crof_item_limit(model)
-    if not isinstance(input_data, list) or len(input_data) <= limit:
+    if not isinstance(input_data, list) or len(input_data) < 2:
+        return input_data
+    max_tok = _get_model_max_tokens(model)
+    est = _estimate_input_tokens(input_data)
+    over_item_limit = len(input_data) > limit
+    over_token_limit = max_tok and est >= max_tok * 0.9
+    if not over_item_limit and not over_token_limit:
         return input_data
     keep = max(_CROF_ADAPTIVE["min_keep_recent"], limit // 3)
+    if over_token_limit:
+        ratio = est / max_tok
+        if aggression >= 1 or ratio > 1.5:
+            keep = max(2, _CROF_ADAPTIVE["min_keep_recent"] // 2)
+        elif ratio > 1.2:
+            keep = max(3, keep // 2)
+        print(f"[ctx-limit] model={model} est={est}tok max={max_tok}tok ratio={ratio:.2f} -> keep={keep}", file=sys.stderr)
+    elif over_item_limit:
+        keep = max(keep, 6)
     head_end = 0
     for i, item in enumerate(input_data):
         t = item.get("type")
@@ -1607,8 +1668,7 @@ def _crof_compact_for_retry(input_data, model):
summary_lines.append(_item_summary(item, max_len=120)) summary_lines.append(_item_summary(item, max_len=120))
     summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
-    if TARGET_URL and "crof.ai" in TARGET_URL:
-        print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)})", file=sys.stderr)
+    print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)}, agg={aggression})", file=sys.stderr)
     return head + [summary_msg] + tail
 def _item_summary(item, max_len=200):
@@ -2051,6 +2111,18 @@ def synthesize_tool_results_for_chat(input_items):
 def has_function_call_output(input_items):
     return isinstance(input_items, list) and any(i.get("type") == "function_call_output" for i in input_items)
+_TOOL_CALL_TEXT_PATTERNS = re.compile(
+    r'(?:^|\n)[\s•\-\*]*\(?'
+    r'(?:exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)'
+    r'[\s:]',
+    re.I | re.MULTILINE
+)
+def _text_looks_like_tool_calls(text):
+    if not text or len(text) < 6:
+        return False
+    return bool(_TOOL_CALL_TEXT_PATTERNS.search(text))
# ═══════════════════════════════════════════════════════════════════ # ═══════════════════════════════════════════════════════════════════
# Log redaction # Log redaction
# ═══════════════════════════════════════════════════════════════════ # ═══════════════════════════════════════════════════════════════════
@@ -2233,9 +2305,14 @@ def _normalize_tool_args(raw_args):
except json.JSONDecodeError: except json.JSONDecodeError:
return raw_args return raw_args
-_XML_TC_RE = re.compile(r'<tool_call>(\w+)(.*?)</tool_call>', re.DOTALL)
+_XML_TC_RE = re.compile(r'exec_command(.*?)</invoke>', re.DOTALL)
 _XML_ARG_VALUE_RE = re.compile(r'</?arg_value>\s*')
+_PAREN_TC_RE = re.compile(
+    r'(?:^|[\n•\-\*]\s*)\(\s*(exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)\b\s*(.*?)\)',
+    re.DOTALL | re.I
+)
 def _extract_xml_tool_calls(text):
     if not text:
         return []
@@ -2262,6 +2339,68 @@ def _extract_xml_tool_calls(text):
results.append({"name": name, "args": args_str, "call_id": f"xml_{len(results)}"}) results.append({"name": name, "args": args_str, "call_id": f"xml_{len(results)}"})
return results return results
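+_NON_VISION_MODEL_PATTERNS = re.compile(
+    # glm is skipped for vision variants such as glm-5v; llama/qwen carry no trailing \b so "llama3" and "qwen2.5" still match
+    r'\b(deepseek|glm(?!.*\dv\b)|mixtral|llama(?!.*vision)|command|dbrx|qwen(?!.*vl)|phi-?3(?!.*vision))',
+    re.I
+)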
+_vision_fail_cache = set()
+_vision_fail_lock = threading.Lock()
+def _model_supports_vision(model):
+    if not model:
+        return True
+    with _vision_fail_lock:
+        if model in _vision_fail_cache:
+            return False
+    if _NON_VISION_MODEL_PATTERNS.search(model):
+        return False
+    return True
+def _mark_vision_fail(model):
+    if model:
+        with _vision_fail_lock:
+            _vision_fail_cache.add(model)
+def _strip_images_from_input(input_data, model):
+    if not isinstance(input_data, list) or _model_supports_vision(model):
+        return input_data
+    modified = False
+    result = []
+    for item in input_data:
+        if item.get("type") != "message":
+            result.append(item)
+            continue
+        content = item.get("content", [])
+        if isinstance(content, str):
+            result.append(item)
+            continue
+        new_content = []
+        has_img = False
+        for part in content:
+            if isinstance(part, str):
+                new_content.append(part)
+                continue
+            pt = part.get("type", "")
+            if pt in ("input_image", "image_url"):
+                if not has_img:
+                    # image_url may be a plain string or a {"url": ...} dict
+                    img = part.get("image_url")
+                    fname = (img.get("url") if isinstance(img, dict) else img) or part.get("url", "image.png")
+                    if fname.startswith("data:"):
+                        fname = "screenshot.png"
+                    new_content.append({"type": "output_text", "text": f"[User attached image: {fname} — this model does not support vision]"})
+                    has_img = True
+                modified = True
+            else:
+                new_content.append(part)
+        if modified:
+            result.append({**item, "content": new_content})
+        else:
+            result.append(item)
+    if modified:
+        print(f"[vision-filter] stripped images from {sum(1 for i in input_data if i.get('type')=='message' and any(c.get('type') in ('input_image','image_url') for c in (i.get('content') or []) if isinstance(c,dict)))} message(s) for model={model}", file=sys.stderr)
+        return result
+    return input_data
 def oa_input_to_messages(input_data):
     msgs = []
     tool_name_by_id = {}
@@ -4889,12 +5028,25 @@ class Handler(http.server.BaseHTTPRequestHandler):
body["input"] = input_data body["input"] = input_data
 crof_limit = _crof_item_limit(model)
-_crof_eligible = TARGET_URL and "crof.ai" in TARGET_URL
-if _crof_eligible and not compacted and isinstance(input_data, list) and len(input_data) > crof_limit:
-    print(f"[crof-adaptive] proactive compact: {len(input_data)} items > limit {crof_limit}", file=sys.stderr)
-    input_data = _crof_compact_for_retry(input_data, model)
-    body = dict(body)
-    body["input"] = input_data
+_crof_eligible = True
+if _crof_eligible and not compacted and isinstance(input_data, list):
+    _needs_compact = len(input_data) > crof_limit
+    max_tok = _get_model_max_tokens(model)
+    est_tok = _estimate_input_tokens(input_data) if max_tok else 0
+    if not _needs_compact and max_tok and est_tok > max_tok * 0.8:
+        _needs_compact = True
+    if _needs_compact:
+        _agg = 0
+        if max_tok and est_tok > max_tok:
+            _agg = 1
+        print(f"[crof-adaptive] proactive compact: {len(input_data)} items, est={est_tok}tok max={max_tok}tok agg={_agg}", file=sys.stderr)
+        input_data = _crof_compact_for_retry(input_data, model, aggression=_agg)
+        body = dict(body)
+        body["input"] = input_data
+# Strip images for non-vision models
+input_data = _strip_images_from_input(input_data, model)
+body["input"] = input_data
 messages = oa_input_to_messages(input_data)
 messages = _inject_stored_reasoning(messages)
@@ -4927,14 +5079,19 @@ class Handler(http.server.BaseHTTPRequestHandler):
 except urllib.error.HTTPError as e:
     err_body = e.read().decode()
     if "context_length_exceeded" in err_body and attempt < max_retries:
-        print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with extreme compaction!", file=sys.stderr)
+        import re as _re
+        _tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
+        if _tok_m:
+            _set_model_max_tokens(model, int(_tok_m.group(1)))
+        print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with compaction (agg={attempt})!", file=sys.stderr)
         policy = provider_policy()
         if isinstance(input_data, list):
-            print(f"[{self._session_id}] applying extreme compaction to {len(input_data)} items", file=sys.stderr)
-            input_data = _crof_compact_for_retry(input_data, model)
+            est = _estimate_input_tokens(input_data)
+            print(f"[{self._session_id}] applying compaction to {len(input_data)} items ~{est}tok", file=sys.stderr)
+            input_data = _crof_compact_for_retry(input_data, model, aggression=attempt)
             body = dict(body)
             body["input"] = input_data
-        messages = oa_input_to_messages(input_data)
+        messages = oa_input_to_messages(_strip_images_from_input(input_data, model))
         messages = _inject_stored_reasoning(messages)
         instructions = body.get("instructions", "").strip()
         if instructions:
@@ -5725,9 +5882,11 @@ class Handler(http.server.BaseHTTPRequestHandler):
 last_status = None
 finish_reason = None
 has_content = False
+has_message = False
+has_tool_call = False
 def _observe_event(event):
-    nonlocal last_resp_id, last_output, last_status, finish_reason, has_content
+    nonlocal last_resp_id, last_output, last_status, finish_reason, has_content, has_message, has_tool_call
     for line in event.strip().split("\n"):
         if line.startswith("data: "):
             try:
@@ -5737,7 +5896,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
     last_output = d.get("response", {}).get("output", [])
     last_status = d.get("response", {}).get("status")
     finish_reason = "length" if last_status == "incomplete" else "stop"
-    has_content = any(o.get("type") == "message" for o in (last_output or []))
+    has_tool_call = any(o.get("type") == "function_call" for o in (last_output or []))
+    has_message = any(o.get("type") == "message" for o in (last_output or []))
+    has_content = has_message or has_tool_call
 except Exception:
     pass
@@ -5749,7 +5910,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
             break
         collected_events.append(event)
         _observe_event(event)
-    print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
+    print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} has_message={has_message} has_tool_call={has_tool_call} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
 except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
     print("[translate-proxy] client disconnected during stream", file=sys.stderr)
     _crof_record(model, n_items, False)
@@ -5805,6 +5966,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
 last_resp_id = last_output = last_status = None
 finish_reason = None
 has_content = False
+has_message = False
+has_tool_call = False
 for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
     collected_events.append(event)
     _observe_event(event)
@@ -5813,7 +5976,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr) print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr)
 # Auto-retry on finish_reason=length with no content due to too much context.
-if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5 and TARGET_URL and "crof.ai" in TARGET_URL:
+if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5:
     print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr)
     new_input = _crof_compact_for_retry(input_data, model)
     if len(new_input) < len(input_data):
@@ -5836,6 +5999,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
 last_resp_id = last_output = last_status = None
 finish_reason = None
 has_content = False
+has_message = False
+has_tool_call = False
 for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
     collected_events.append(event)
     _observe_event(event)
@@ -5943,9 +6108,17 @@ class Handler(http.server.BaseHTTPRequestHandler):
 _smart_attempt = 0
 while _smart_attempt < _smart_max:
     _has_tool_calls_in_output = any(o.get("type") == "function_call" for o in (last_output or []))
+    last_text = ""
+    for o in (last_output or []):
+        if o.get("type") == "message":
+            for c in (o.get("content") or []):
+                if isinstance(c, dict) and c.get("type") == "output_text":
+                    last_text += c.get("text", "")
+    _looks_like_tools = _text_looks_like_tool_calls(last_text)
+    _has_prior_tool_ctx = has_function_call_output(input_data)
     if not (finish_reason == "stop" and has_content and not _has_tool_calls_in_output
             and isinstance(input_data, list) and len(input_data) >= 3
-            and has_function_call_output(input_data)):
+            and (_has_prior_tool_ctx or _looks_like_tools)):
         break
     _smart_attempt += 1
     _nudges = [
_nudges = [ _nudges = [
@@ -5954,12 +6127,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
] ]
nudge_text = _nudges[min(_smart_attempt - 1, len(_nudges) - 1)] nudge_text = _nudges[min(_smart_attempt - 1, len(_nudges) - 1)]
# Try extracting XML tool calls from text as fallback before nudging # Try extracting XML tool calls from text as fallback before nudging
last_text = ""
for o in (last_output or []):
if o.get("type") == "message":
for c in (o.get("content") or []):
if isinstance(c, dict) and c.get("type") == "output_text":
last_text += c.get("text", "")
xml_fc = _extract_xml_tool_calls(last_text) xml_fc = _extract_xml_tool_calls(last_text)
if xml_fc: if xml_fc:
print(f"[{self._session_id}] [smart-continue] extracted {len(xml_fc)} XML tool calls from text, injecting and retrying", file=sys.stderr) print(f"[{self._session_id}] [smart-continue] extracted {len(xml_fc)} XML tool calls from text, injecting and retrying", file=sys.stderr)
@@ -5979,6 +6146,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None last_resp_id = last_output = last_status = None
finish_reason = None finish_reason = None
has_content = False has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")): for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event) collected_events.append(event)
_observe_event(event) _observe_event(event)
@@ -5988,19 +6157,21 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[{self._session_id}] [smart-continue] XML injection retry failed: {e}", file=sys.stderr) print(f"[{self._session_id}] [smart-continue] XML injection retry failed: {e}", file=sys.stderr)
break break
_nudge_msg = {"role": "user", "content": nudge_text} _nudge_msg = {"role": "user", "content": nudge_text}
nudge_messages = oa_input_to_messages(input_data) + [_nudge_msg] nudge_messages = oa_input_to_messages(_strip_images_from_input(input_data, model)) + [_nudge_msg]
instructions = body.get("instructions", "").strip() instructions = body.get("instructions", "").strip()
if instructions: if instructions:
nudge_messages.insert(0, {"role": "system", "content": instructions}) nudge_messages.insert(0, {"role": "system", "content": instructions})
nudge_chat_body = self._build_chat_body(model, nudge_messages, body, stream) nudge_chat_body = self._build_chat_body(model, nudge_messages, body, stream)
nudge_req = urllib.request.Request(target, data=json.dumps(nudge_chat_body).encode(), headers=fwd) nudge_req = urllib.request.Request(target, data=json.dumps(nudge_chat_body).encode(), headers=fwd)
print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task, nudging", file=sys.stderr) print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task (prior_ctx={_has_prior_tool_ctx} text_tools={_looks_like_tools}), nudging", file=sys.stderr)
try: try:
retry_upstream = urllib.request.urlopen(nudge_req, timeout=_upstream_timeout(body, True)) retry_upstream = urllib.request.urlopen(nudge_req, timeout=_upstream_timeout(body, True))
collected_events = [] collected_events = []
last_resp_id = last_output = last_status = None last_resp_id = last_output = last_status = None
finish_reason = None finish_reason = None
has_content = False has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")): for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event) collected_events.append(event)
_observe_event(event) _observe_event(event)


@@ -1469,6 +1469,53 @@ _CROF_ADAPTIVE = {
"min_keep_recent": 6, "min_keep_recent": 6,
} }
_model_max_tokens = {}
_model_max_tokens_lock = threading.Lock()

def _estimate_tokens(item):
    if not isinstance(item, dict):
        return 4
    t = item.get("type", "")
    if t == "message":
        content = item.get("content", "")
        if isinstance(content, str):
            return max(4, len(content) // 4)
        elif isinstance(content, list):
            total = 4
            for part in content:
                pt = part.get("type", "")
                if pt in ("input_text", "output_text"):
                    total += max(4, len(part.get("text", "")) // 4)
                elif pt == "input_image":
                    total += 800
                elif pt in ("function_call",):
                    total += max(20, len(part.get("arguments", "{}")) // 2)
                elif pt == "function_call_output":
                    total += max(8, len(part.get("output", "")) // 4)
            return total
    elif t in ("function_call_output",):
        return max(8, len(item.get("output", "")) // 4)
    elif t == "function_call":
        return max(20, len(item.get("arguments", "{}")) // 2)
    return 4

def _estimate_input_tokens(input_data):
    if not isinstance(input_data, list):
        return 0
    return sum(_estimate_tokens(i) for i in input_data)

def _get_model_max_tokens(model):
    with _model_max_tokens_lock:
        return _model_max_tokens.get(model)

def _set_model_max_tokens(model, tokens):
    if model and tokens:
        with _model_max_tokens_lock:
            existing = _model_max_tokens.get(model)
            if existing is None or tokens < existing:
                _model_max_tokens[model] = tokens
                print(f"[ctx-limit] learned {model} max ~{tokens} tokens", file=sys.stderr)
_BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json") _BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json")
_bgp_stats_lock = threading.Lock() _bgp_stats_lock = threading.Lock()
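The token heuristic added in this hunk charges roughly one token per four characters of text and a flat 800 tokens per image. A minimal standalone sketch of just those two cases follows; the name `estimate_tokens_sketch` is hypothetical and not part of the commit, and the function-call cases are left out for brevity.

```python
def estimate_tokens_sketch(item):
    """Approximate token cost of one input item (simplified, hypothetical helper)."""
    if not isinstance(item, dict) or item.get("type") != "message":
        return 4  # same floor the commit uses for unknown items
    content = item.get("content", "")
    if isinstance(content, str):
        return max(4, len(content) // 4)
    total = 4
    for part in content:
        if not isinstance(part, dict):
            continue  # skip bare strings instead of failing on .get()
        kind = part.get("type", "")
        if kind in ("input_text", "output_text"):
            total += max(4, len(part.get("text", "")) // 4)
        elif kind == "input_image":
            total += 800  # flat per-image charge
    return total


msg = {"type": "message", "content": [
    {"type": "input_text", "text": "x" * 400},
    {"type": "input_image"},
]}
print(estimate_tokens_sketch(msg))  # 4 + 100 + 800 = 904
```

The estimate is deliberately coarse: it only needs to be close enough to decide whether compaction should run before the provider rejects the request.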
@@ -1534,8 +1581,6 @@ def _sorted_bgp_routes():
return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats)) return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats))
def _crof_record(model, n_items, success): def _crof_record(model, n_items, success):
if TARGET_URL and "crof.ai" not in TARGET_URL:
return
if not isinstance(n_items, int) or n_items < 1: if not isinstance(n_items, int) or n_items < 1:
return return
entry = {"model": model, "items": n_items, "ok": success} entry = {"model": model, "items": n_items, "ok": success}
@@ -1561,20 +1606,36 @@ def _crof_record(model, n_items, success):
global_limit = v["limit"] global_limit = v["limit"]
_CROF_ADAPTIVE["global_item_limit"] = global_limit _CROF_ADAPTIVE["global_item_limit"] = global_limit
if TARGET_URL and "crof.ai" in TARGET_URL: print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
def _crof_item_limit(model): def _crof_item_limit(model):
ml = _CROF_ADAPTIVE["model_limits"].get(model, {}) ml = _CROF_ADAPTIVE["model_limits"].get(model, {})
per_model = ml.get("limit", 30) per_model = ml.get("limit", 30)
return min(per_model, _CROF_ADAPTIVE["global_item_limit"]) return min(per_model, _CROF_ADAPTIVE["global_item_limit"])
def _crof_compact_for_retry(input_data, model): def _crof_compact_for_retry(input_data, model, aggression=0):
limit = _crof_item_limit(model) limit = _crof_item_limit(model)
if not isinstance(input_data, list) or len(input_data) <= limit: if not isinstance(input_data, list) or len(input_data) < 2:
return input_data
max_tok = _get_model_max_tokens(model)
est = _estimate_input_tokens(input_data)
over_item_limit = len(input_data) > limit
over_token_limit = max_tok and est >= max_tok * 0.9
if not over_item_limit and not over_token_limit:
return input_data return input_data
keep = max(_CROF_ADAPTIVE["min_keep_recent"], limit // 3) keep = max(_CROF_ADAPTIVE["min_keep_recent"], limit // 3)
if over_token_limit:
ratio = est / max_tok
if aggression >= 1 or ratio > 1.5:
keep = max(2, _CROF_ADAPTIVE["min_keep_recent"] // 2)
elif ratio > 1.2:
keep = max(3, keep // 2)
print(f"[ctx-limit] model={model} est={est}tok max={max_tok}tok ratio={ratio:.2f} -> keep={keep}", file=sys.stderr)
elif over_item_limit:
keep = max(keep, 6)
head_end = 0 head_end = 0
for i, item in enumerate(input_data): for i, item in enumerate(input_data):
t = item.get("type") t = item.get("type")
@@ -1607,8 +1668,7 @@ def _crof_compact_for_retry(input_data, model):
summary_lines.append(_item_summary(item, max_len=120)) summary_lines.append(_item_summary(item, max_len=120))
summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]} summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
if TARGET_URL and "crof.ai" in TARGET_URL: print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)}, agg={aggression})", file=sys.stderr)
print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)})", file=sys.stderr)
return head + [summary_msg] + tail return head + [summary_msg] + tail
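The way the new code picks how many recent items to keep can be isolated into a small pure function. This is a sketch under assumptions: `choose_keep` is a hypothetical name, the defaults mirror the values visible in the diff (`limit=30`, `min_keep_recent=6`), and unlike the real function it returns a value even when no compaction would be needed.

```python
def choose_keep(n_items, est, max_tok, limit=30, min_keep_recent=6, aggression=0):
    """Return how many trailing items to keep (hypothetical helper mirroring the diff)."""
    keep = max(min_keep_recent, limit // 3)
    over_token_limit = bool(max_tok) and est >= max_tok * 0.9
    if over_token_limit:
        ratio = est / max_tok
        if aggression >= 1 or ratio > 1.5:
            keep = max(2, min_keep_recent // 2)  # most aggressive: keep very little
        elif ratio > 1.2:
            keep = max(3, keep // 2)             # moderately over: halve the tail
    elif n_items > limit:
        keep = max(keep, 6)                      # item-count path only
    return keep


# 25 items at ~1600 tokens each against a ~35K-token model
print(choose_keep(25, 40_000, 35_000))                 # ratio ~1.14 -> 10
print(choose_keep(25, 45_000, 35_000))                 # ratio ~1.29 -> 5
print(choose_keep(25, 40_000, 35_000, aggression=1))   # forced aggressive -> 3
```

With 25 items the old item-count check (25 < 30) would have skipped compaction entirely; the token ratio is what now triggers it.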
def _item_summary(item, max_len=200): def _item_summary(item, max_len=200):
@@ -2051,6 +2111,18 @@ def synthesize_tool_results_for_chat(input_items):
def has_function_call_output(input_items): def has_function_call_output(input_items):
return isinstance(input_items, list) and any(i.get("type") == "function_call_output" for i in input_items) return isinstance(input_items, list) and any(i.get("type") == "function_call_output" for i in input_items)
_TOOL_CALL_TEXT_PATTERNS = re.compile(
    r'(?:^|\n)[\s•\-\*]*\(?'
    r'(?:exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)'
    r'[\s:]',
    re.I | re.MULTILINE
)

def _text_looks_like_tool_calls(text):
    if not text or len(text) < 6:
        return False
    return bool(_TOOL_CALL_TEXT_PATTERNS.search(text))
# ═══════════════════════════════════════════════════════════════════ # ═══════════════════════════════════════════════════════════════════
# Log redaction # Log redaction
# ═══════════════════════════════════════════════════════════════════ # ═══════════════════════════════════════════════════════════════════
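To see what the text detector added in this hunk accepts, the sketch below copies the pattern into a standalone script and runs it on a few sample strings. The sample strings and the name `looks_like_tool_calls` are made up for illustration; only the regular expression comes from the diff.

```python
import re

TOOL_CALL_TEXT = re.compile(
    r'(?:^|\n)[\s•\-\*]*\(?'
    r'(?:exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)'
    r'[\s:]',
    re.I | re.MULTILINE,
)


def looks_like_tool_calls(text):
    """True when a line starts (after optional bullets) with a known tool name."""
    if not text or len(text) < 6:
        return False
    return bool(TOOL_CALL_TEXT.search(text))


samples = [
    "• exec_command: ls -la",
    "Done.\n- read_file: a.txt",
    "(bash ls)",
    "I will now summarize the results.",
    "ok",
]
for s in samples:
    print(repr(s), looks_like_tool_calls(s))
```

Because the match is anchored to the start of a line, a tool name mentioned mid-sentence does not count, but a line of ordinary prose that happens to begin with a word like `shell` would.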
@@ -2233,9 +2305,14 @@ def _normalize_tool_args(raw_args):
except json.JSONDecodeError: except json.JSONDecodeError:
return raw_args return raw_args
_XML_TC_RE = re.compile(r'<tool_call>(\w+)(.*?)</tool_call>', re.DOTALL) _XML_TC_RE = re.compile(r'exec_command(.*?)</invoke>', re.DOTALL)
_XML_ARG_VALUE_RE = re.compile(r'</?arg_value>\s*') _XML_ARG_VALUE_RE = re.compile(r'</?arg_value>\s*')
_PAREN_TC_RE = re.compile(
    r'(?:^|[\n•\-\*]\s*)\(\s*(exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)\b\s*(.*?)\)',
    re.DOTALL | re.I
)
def _extract_xml_tool_calls(text): def _extract_xml_tool_calls(text):
if not text: if not text:
return [] return []
@@ -2262,6 +2339,68 @@ def _extract_xml_tool_calls(text):
results.append({"name": name, "args": args_str, "call_id": f"xml_{len(results)}"}) results.append({"name": name, "args": args_str, "call_id": f"xml_{len(results)}"})
return results return results
_NON_VISION_MODEL_PATTERNS = re.compile(
    r'\b(deepseek|glm|mixtral|llama\b(?!.*vision)|command|dbrx|qwen\b(?!.*vl)|phi-?3(?!.*vision))',
    re.I
)
_vision_fail_cache = set()
_vision_fail_lock = threading.Lock()

def _model_supports_vision(model):
    if not model:
        return True
    with _vision_fail_lock:
        if model in _vision_fail_cache:
            return False
    if _NON_VISION_MODEL_PATTERNS.search(model):
        return False
    return True

def _mark_vision_fail(model):
    if model:
        with _vision_fail_lock:
            _vision_fail_cache.add(model)
def _strip_images_from_input(input_data, model):
    if not isinstance(input_data, list) or _model_supports_vision(model):
        return input_data
    modified = False
    result = []
    for item in input_data:
        if item.get("type") != "message":
            result.append(item)
            continue
        content = item.get("content", [])
        if isinstance(content, str):
            result.append(item)
            continue
        new_content = []
        has_img = False
        for part in content:
            if isinstance(part, str):
                new_content.append(part)
                continue
            pt = part.get("type", "")
            if pt in ("input_image", "image_url"):
                if not has_img:
                    # image_url may be a {"url": ...} dict or a bare string,
                    # so avoid calling .get() on a str
                    img = part.get("image_url")
                    if isinstance(img, dict):
                        img = img.get("url")
                    fname = str(img or part.get("url") or "image.png")
                    if fname.startswith("data:"):
                        fname = "screenshot.png"
                    # One notice per message, however many images it carried
                    new_content.append({"type": "output_text", "text": f"[User attached image: {fname} — this model does not support vision]"})
                    has_img = True
                modified = True
            else:
                new_content.append(part)
        if modified:
            result.append({**item, "content": new_content})
        else:
            result.append(item)
    if modified:
        print(f"[vision-filter] stripped {sum(1 for i in input_data if i.get('type')=='message' and any(c.get('type') in ('input_image','image_url') for c in (i.get('content') or []) if isinstance(c,dict)))} images for model={model}", file=sys.stderr)
        return result
    return input_data
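The name-based vision check can be exercised on its own. The sketch below copies the pattern from the diff and wraps it in a hypothetical `supports_vision_by_name` helper; it leaves out the failure cache, and the model names are illustrative inputs rather than a list taken from the project.

```python
import re

NON_VISION = re.compile(
    r'\b(deepseek|glm|mixtral|llama\b(?!.*vision)|command|dbrx|qwen\b(?!.*vl)|phi-?3(?!.*vision))',
    re.I,
)


def supports_vision_by_name(model):
    """Name-only check: unknown or empty names are assumed to support vision."""
    if not model:
        return True
    return NON_VISION.search(model) is None


for name in ["deepseek-chat", "gpt-4o", "qwen-vl-max", "llama-3.2-vision", "glm-5v"]:
    print(name, supports_vision_by_name(name))
```

As written, the `glm` alternative has no lookahead, so any name containing `glm` at a word boundary is classified as non-vision, including `glm-5v`.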
def oa_input_to_messages(input_data): def oa_input_to_messages(input_data):
msgs = [] msgs = []
tool_name_by_id = {} tool_name_by_id = {}
@@ -4889,12 +5028,25 @@ class Handler(http.server.BaseHTTPRequestHandler):
body["input"] = input_data body["input"] = input_data
crof_limit = _crof_item_limit(model) crof_limit = _crof_item_limit(model)
_crof_eligible = TARGET_URL and "crof.ai" in TARGET_URL _crof_eligible = True
if _crof_eligible and not compacted and isinstance(input_data, list) and len(input_data) > crof_limit: if _crof_eligible and not compacted and isinstance(input_data, list):
print(f"[crof-adaptive] proactive compact: {len(input_data)} items > limit {crof_limit}", file=sys.stderr) _needs_compact = len(input_data) > crof_limit
input_data = _crof_compact_for_retry(input_data, model) max_tok = _get_model_max_tokens(model)
body = dict(body) est_tok = _estimate_input_tokens(input_data) if max_tok else 0
body["input"] = input_data if not _needs_compact and max_tok and est_tok > max_tok * 0.8:
_needs_compact = True
if _needs_compact:
_agg = 0
if max_tok and est_tok > max_tok:
_agg = 1
print(f"[crof-adaptive] proactive compact: {len(input_data)} items, est={est_tok}tok max={max_tok}tok agg={_agg}", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model, aggression=_agg)
body = dict(body)
body["input"] = input_data
# Strip images for non-vision models
input_data = _strip_images_from_input(input_data, model)
body["input"] = input_data
messages = oa_input_to_messages(input_data) messages = oa_input_to_messages(input_data)
messages = _inject_stored_reasoning(messages) messages = _inject_stored_reasoning(messages)
@@ -4927,14 +5079,19 @@ class Handler(http.server.BaseHTTPRequestHandler):
except urllib.error.HTTPError as e: except urllib.error.HTTPError as e:
err_body = e.read().decode() err_body = e.read().decode()
if "context_length_exceeded" in err_body and attempt < max_retries: if "context_length_exceeded" in err_body and attempt < max_retries:
print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with extreme compaction!", file=sys.stderr) import re as _re
_tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
if _tok_m:
_set_model_max_tokens(model, int(_tok_m.group(1)))
print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with compaction (agg={attempt})!", file=sys.stderr)
policy = provider_policy() policy = provider_policy()
if isinstance(input_data, list): if isinstance(input_data, list):
print(f"[{self._session_id}] applying extreme compaction to {len(input_data)} items", file=sys.stderr) est = _estimate_input_tokens(input_data)
input_data = _crof_compact_for_retry(input_data, model) print(f"[{self._session_id}] applying compaction to {len(input_data)} items ~{est}tok", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model, aggression=attempt)
body = dict(body) body = dict(body)
body["input"] = input_data body["input"] = input_data
messages = oa_input_to_messages(input_data) messages = oa_input_to_messages(_strip_images_from_input(input_data, model))
messages = _inject_stored_reasoning(messages) messages = _inject_stored_reasoning(messages)
instructions = body.get("instructions", "").strip() instructions = body.get("instructions", "").strip()
if instructions: if instructions:
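The retry path above learns a model's context size by pulling the first number followed by `tokens` out of the provider's error body, and only ever lowers the stored value. A minimal sketch of that idea follows; `learn_limit` and the sample error strings are hypothetical, and real provider messages may be worded differently.

```python
import re
import threading

_TOKENS_RE = re.compile(r'~?(\d+)\s*tokens')
_learned = {}
_learned_lock = threading.Lock()


def learn_limit(model, err_body):
    """Record the smallest token limit seen for a model; return the current value."""
    m = _TOKENS_RE.search(err_body)
    with _learned_lock:
        if m:
            tokens = int(m.group(1))
            existing = _learned.get(model)
            # Keep the minimum so one optimistic message cannot raise the limit
            if tokens and (existing is None or tokens < existing):
                _learned[model] = tokens
        return _learned.get(model)


print(learn_limit("demo-model", "context_length_exceeded: limit is ~35000 tokens"))
print(learn_limit("demo-model", "context_length_exceeded: limit is 40000 tokens"))
```

One caveat of this approach: if an error message reports the requested size before the limit, the first number matched would be the wrong one, so the learned value is best treated as an estimate.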
@@ -5725,9 +5882,11 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_status = None last_status = None
finish_reason = None finish_reason = None
has_content = False has_content = False
has_message = False
has_tool_call = False
def _observe_event(event): def _observe_event(event):
nonlocal last_resp_id, last_output, last_status, finish_reason, has_content nonlocal last_resp_id, last_output, last_status, finish_reason, has_content, has_message, has_tool_call
for line in event.strip().split("\n"): for line in event.strip().split("\n"):
if line.startswith("data: "): if line.startswith("data: "):
try: try:
@@ -5737,7 +5896,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_output = d.get("response", {}).get("output", []) last_output = d.get("response", {}).get("output", [])
last_status = d.get("response", {}).get("status") last_status = d.get("response", {}).get("status")
finish_reason = "length" if last_status == "incomplete" else "stop" finish_reason = "length" if last_status == "incomplete" else "stop"
has_content = any(o.get("type") == "message" for o in (last_output or [])) has_tool_call = any(o.get("type") == "function_call" for o in (last_output or []))
has_message = any(o.get("type") == "message" for o in (last_output or []))
has_content = has_message or has_tool_call
except Exception: except Exception:
pass pass
@@ -5749,7 +5910,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
break break
collected_events.append(event) collected_events.append(event)
_observe_event(event) _observe_event(event)
print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} elapsed={time.time()-t0:.1f}s", file=sys.stderr) print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} has_message={has_message} has_tool_call={has_tool_call} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError): except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
print("[translate-proxy] client disconnected during stream", file=sys.stderr) print("[translate-proxy] client disconnected during stream", file=sys.stderr)
_crof_record(model, n_items, False) _crof_record(model, n_items, False)
@@ -5805,6 +5966,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None last_resp_id = last_output = last_status = None
finish_reason = None finish_reason = None
has_content = False has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")): for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event) collected_events.append(event)
_observe_event(event) _observe_event(event)
@@ -5813,7 +5976,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr) print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr)
# Auto-retry on finish_reason=length with no content due to too much context. # Auto-retry on finish_reason=length with no content due to too much context.
if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5 and TARGET_URL and "crof.ai" in TARGET_URL: if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5:
print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr) print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr)
new_input = _crof_compact_for_retry(input_data, model) new_input = _crof_compact_for_retry(input_data, model)
if len(new_input) < len(input_data): if len(new_input) < len(input_data):
@@ -5836,6 +5999,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None last_resp_id = last_output = last_status = None
finish_reason = None finish_reason = None
has_content = False has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")): for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event) collected_events.append(event)
_observe_event(event) _observe_event(event)
@@ -5943,9 +6108,17 @@ class Handler(http.server.BaseHTTPRequestHandler):
_smart_attempt = 0 _smart_attempt = 0
while _smart_attempt < _smart_max: while _smart_attempt < _smart_max:
_has_tool_calls_in_output = any(o.get("type") == "function_call" for o in (last_output or [])) _has_tool_calls_in_output = any(o.get("type") == "function_call" for o in (last_output or []))
last_text = ""
for o in (last_output or []):
if o.get("type") == "message":
for c in (o.get("content") or []):
if isinstance(c, dict) and c.get("type") == "output_text":
last_text += c.get("text", "")
_looks_like_tools = _text_looks_like_tool_calls(last_text)
_has_prior_tool_ctx = has_function_call_output(input_data)
if not (finish_reason == "stop" and has_content and not _has_tool_calls_in_output if not (finish_reason == "stop" and has_content and not _has_tool_calls_in_output
and isinstance(input_data, list) and len(input_data) >= 3 and isinstance(input_data, list) and len(input_data) >= 3
and has_function_call_output(input_data)): and (_has_prior_tool_ctx or _looks_like_tools)):
break break
_smart_attempt += 1 _smart_attempt += 1
_nudges = [ _nudges = [
@@ -5954,12 +6127,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
] ]
nudge_text = _nudges[min(_smart_attempt - 1, len(_nudges) - 1)] nudge_text = _nudges[min(_smart_attempt - 1, len(_nudges) - 1)]
# Try extracting XML tool calls from text as fallback before nudging # Try extracting XML tool calls from text as fallback before nudging
last_text = ""
for o in (last_output or []):
if o.get("type") == "message":
for c in (o.get("content") or []):
if isinstance(c, dict) and c.get("type") == "output_text":
last_text += c.get("text", "")
xml_fc = _extract_xml_tool_calls(last_text) xml_fc = _extract_xml_tool_calls(last_text)
if xml_fc: if xml_fc:
print(f"[{self._session_id}] [smart-continue] extracted {len(xml_fc)} XML tool calls from text, injecting and retrying", file=sys.stderr) print(f"[{self._session_id}] [smart-continue] extracted {len(xml_fc)} XML tool calls from text, injecting and retrying", file=sys.stderr)
@@ -5979,6 +6146,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None last_resp_id = last_output = last_status = None
finish_reason = None finish_reason = None
has_content = False has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")): for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event) collected_events.append(event)
_observe_event(event) _observe_event(event)
@@ -5988,19 +6157,21 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[{self._session_id}] [smart-continue] XML injection retry failed: {e}", file=sys.stderr) print(f"[{self._session_id}] [smart-continue] XML injection retry failed: {e}", file=sys.stderr)
break break
_nudge_msg = {"role": "user", "content": nudge_text} _nudge_msg = {"role": "user", "content": nudge_text}
nudge_messages = oa_input_to_messages(input_data) + [_nudge_msg] nudge_messages = oa_input_to_messages(_strip_images_from_input(input_data, model)) + [_nudge_msg]
instructions = body.get("instructions", "").strip() instructions = body.get("instructions", "").strip()
if instructions: if instructions:
nudge_messages.insert(0, {"role": "system", "content": instructions}) nudge_messages.insert(0, {"role": "system", "content": instructions})
nudge_chat_body = self._build_chat_body(model, nudge_messages, body, stream) nudge_chat_body = self._build_chat_body(model, nudge_messages, body, stream)
nudge_req = urllib.request.Request(target, data=json.dumps(nudge_chat_body).encode(), headers=fwd) nudge_req = urllib.request.Request(target, data=json.dumps(nudge_chat_body).encode(), headers=fwd)
print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task, nudging", file=sys.stderr) print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task (prior_ctx={_has_prior_tool_ctx} text_tools={_looks_like_tools}), nudging", file=sys.stderr)
try: try:
retry_upstream = urllib.request.urlopen(nudge_req, timeout=_upstream_timeout(body, True)) retry_upstream = urllib.request.urlopen(nudge_req, timeout=_upstream_timeout(body, True))
collected_events = [] collected_events = []
last_resp_id = last_output = last_status = None last_resp_id = last_output = last_status = None
finish_reason = None finish_reason = None
has_content = False has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")): for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event) collected_events.append(event)
_observe_event(event) _observe_event(event)