v3.11.5: token-aware compaction, vision filter, universal adaptive compaction, smart-continue text detection

This commit is contained in:
Roman | RyzenAdvanced
2026-05-26 16:14:05 +04:00
Unverified
parent 028185652d
commit b029e7cb5e
9 changed files with 684 additions and 127 deletions

View File

@@ -1,5 +1,29 @@
# Changelog
## v3.11.5 (2026-05-26)
**Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection**
### Critical Fixes
- **Token-aware compaction for small-context models (FIX)**: `_crof_compact_for_retry()` had an early return at `len(input_data) <= limit` (item count) — if you had 25 items × 1600 tokens = 40K tokens, it skipped compaction entirely because 25 < 30 (the default item limit). Now also checks estimated token count vs learned model max, and compacts when either item count OR token count exceeds limits. Fixes repeated `context_length_exceeded` errors on models like 0G-GLM-5.1 (~35K token context).
- **Proactive compaction now token-aware**: Previously only triggered when item count > 30. Now also triggers when estimated tokens exceed 80% of the model's learned token limit, even if item count is below the threshold. Prevents the first-request failure pattern on small-context models.
- **Compaction aggression threshold**: Changed `est > max_tok` to `est >= max_tok * 0.9` to avoid edge case where estimated tokens exactly equal the limit and compaction is skipped.
- **Removed all `crof.ai` gates from adaptive compaction**: Proactive compaction, `finish_reason=length` retry, `_crof_record`, and compaction logging were gated behind `"crof.ai" in TARGET_URL`. These gates prevented OpenAdapter and other providers from getting proactive/retry compaction, causing repeated `context_length_exceeded` failures. Now applies universally to ALL providers.
### New Features
- **Vision model detection + image stripping**: `_strip_images_from_input()` and `_model_supports_vision()` detect vision capability by model name pattern. Non-vision models (deepseek, glm, mixtral, llama, command, dbrx, qwen, phi-3) have `input_image`/`image_url` parts stripped and replaced with `[User attached image: filename — this model does not support vision]` text notice. Vision models (gpt-4o, gemini, claude, qwen-vl, glm-5v) keep images intact. Applied in 3 paths: main request, context_length_exceeded retry, smart-continue nudge.
- **Token estimation and per-model limit learning**: `_estimate_tokens()`, `_estimate_input_tokens()`, `_get_model_max_tokens()`, `_set_model_max_tokens()`. Extracts `~N tokens` from `context_length_exceeded` error messages and stores per-model token limits. Used by proactive compaction and retry compaction to adjust `keep` count dynamically.
- **Compaction aggression levels**: `_crof_compact_for_retry()` accepts `aggression` parameter (0=normal, 1=extreme). Extreme mode kicks in when estimated tokens > 1.5× the learned limit or on 2nd+ retry attempt. Reduces `keep` count to minimum, ensuring the compacted request fits within model limits.
- **Smart-continue text-tool detection**: Removed hard requirement for `has_function_call_output(input_data)`. Added `_TOOL_CALL_TEXT_PATTERNS` and `_text_looks_like_tool_calls()` to trigger nudging when model outputs text matching tool-call patterns (e.g., `• (exec_command cmd ...)`, `write_to_file`, `exec_command`) even without prior `function_call_output` in context. Essential for models like 0G-GLM-5.1 that never emit real `function_call_output` items.
- **Parenthesized tool call regex**: `_PAREN_TC_RE` pattern to match `• (name args...)` format from non-vision models that output tool calls as parenthesized text.
### GUI Fixes
- **Active endpoint sync**: Added `set_active_endpoint()` and `validate_active_endpoint()` to Linux GTK GUI. Syncs `.active-endpoint.json` with `config.toml` on every launch; auto-removes stale references to deleted providers. Fixed `"Error loading configuration: No such file or directory (os error 2)"` crash when active endpoint referenced a deleted provider.
- **Config state**: `~/.codex/.active-endpoint.json` and `config.toml` model catalog path validated and auto-corrected on GUI startup.
## v3.11.0 (2026-05-26)
**Cobra PR Merge + Smart Continuation + API Key Hot-Reload**

View File

@@ -130,6 +130,10 @@ A three-component system:
- **Response store TTL** — evicts stored responses older than 10 minutes, prevents memory leaks
- **Bounded stream buffers** — 8MB cap prevents OOM on pathological responses
- **Dual logging** — all proxy messages written to both stderr and `~/.cache/codex-proxy/proxy.log`
- **Vision model detection** (v3.11.5) — automatically strips images for non-vision models (DeepSeek, GLM, Qwen, etc.) and replaces with text notice; vision-capable models (GPT-4o, Gemini, Claude, Qwen-VL) keep images intact
- **Token-aware compaction** (v3.11.5) — learns per-model token limits from `context_length_exceeded` errors; proactively compacts when estimated tokens exceed 80% of limit; prevents repeated context overflow on small-context models (~35K tokens)
- **Universal adaptive compaction** (v3.11.5) — compaction now works for ALL providers (was Crof.ai-only); proactive + retry compaction with aggression levels (normal/extreme)
- **Smart-continue text detection** (v3.11.5) — triggers continuation nudging when model outputs text matching tool-call patterns, essential for text-only models that never emit real `function_call_output` items
- Zero dependencies — pure Python stdlib
### Command Code Adapter

Binary file not shown.

Binary file not shown.

View File

@@ -3,13 +3,13 @@ set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
if [ -f "$SCRIPT_DIR/codex-launcher_3.11.0_all.deb" ]; then
echo "Installing codex-launcher_3.11.0_all.deb ..."
sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.0_all.deb"
if [ -f "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb" ]; then
echo "Installing codex-launcher_3.11.5_all.deb ..."
sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb"
else
echo "WARNING: codex-launcher_3.11.0_all.deb not found; copying files manually."
echo "WARNING: codex-launcher_3.11.5_all.deb not found; copying files manually."
fi
echo "Installed v3.11.0 via .deb package."
echo "Installed v3.11.5 via .deb package."
echo " translate-proxy.py -> /usr/bin/translate-proxy.py"
echo " codex-launcher-gui -> /usr/bin/codex-launcher-gui"
echo " cleanup-codex-stale -> /usr/bin/cleanup-codex-stale.sh"

View File

@@ -20,12 +20,22 @@ BGP_POOLS_FILE = HOME / ".codex/bgp-pools.json"
LOG_DIR = HOME / ".cache/codex-desktop"
LAUNCH_LOG = LOG_DIR / "launcher.log"
PROXY_CONFIG_DIR = HOME / ".cache/codex-proxy"
ACTIVE_ENDPOINT_FILE = HOME / ".codex/.active-endpoint.json"
DEFAULT_CONFIG = """model = ""
model_provider = ""
model_catalog_json = ""
"""
CHANGELOG = [
("3.11.5", "2026-05-26", [
"Token-aware compaction: fixes context_length_exceeded on small-context models",
"Proactive compaction triggers on token count, not just item count",
"Universal adaptive compaction for all providers (removed crof.ai gates)",
"Vision model detection + image stripping for non-vision models",
"Per-model token limit learning from error messages",
"Smart-continue text-tool detection for text-only models",
"Active endpoint sync: auto-removes stale references on startup",
]),
("3.11.0", "2026-05-26", [
"Merge cobra PR: concurrency semaphore (max 3), auto-continue for truncated text",
"SO_REUSEADDR on sticky port, proxy-stderr.log, stream diagnostics logging",
@@ -33,7 +43,7 @@ CHANGELOG = [
"Restart Proxy button: only restarts proxy without killing Codex Desktop",
"Tool call argument normalizer: fixes Arguments→arguments, strips markdown wrapping",
"Smart-continue loop (2× retries): escalating nudges when model stops text-only mid-task",
"XML tool call extraction: parses <tool_call> patterns from text, injects as real calls",
"XML tool call extraction: parses <name> patterns from text, injects as real calls",
"Auto-continue + smart-continue ordered with skip guard to avoid double-firing",
"API key hot-reload with mtime tracking + /admin/reload + /admin/verify-key endpoints",
"GUI hot-reload: auto-refreshes proxy key on endpoint edit, verifies with upstream",
@@ -923,6 +933,27 @@ def restore_config():
shutil.copy2(str(CONFIG_BAK), str(tmp))
os.replace(str(tmp), str(CONFIG))
def set_active_endpoint(name):
ACTIVE_ENDPOINT_FILE.parent.mkdir(parents=True, exist_ok=True)
write_secure_text(ACTIVE_ENDPOINT_FILE, json.dumps({"active": name}, indent=2))
def validate_active_endpoint(logfn=None):
if not ACTIVE_ENDPOINT_FILE.exists():
return
try:
d = json.loads(ACTIVE_ENDPOINT_FILE.read_text())
active = d.get("active", "")
if not active:
return
eps = load_endpoints()
names = {ep.get("name", "") for ep in eps}
if active not in names:
ACTIVE_ENDPOINT_FILE.unlink()
if logfn:
logfn(f"Removed stale active-endpoint '{active}' (provider no longer exists)")
except Exception:
pass
def write_secure_text(path, text):
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
@@ -1862,6 +1893,7 @@ class LauncherWin(Gtk.Window):
self._proc = None
self._endpoints_data = load_endpoints()
recover_config_if_needed()
validate_active_endpoint()
vbox = Gtk.Box(orientation=Gtk.Orientation.VERTICAL, spacing=8)
self.add(vbox)
@@ -2607,6 +2639,8 @@ class LauncherWin(Gtk.Window):
begin_config_transaction(f"launch:{ep['name']}")
write_config_for_native(ep, model)
set_active_endpoint(ep["name"])
if target == "desktop":
if needs_proxy:
_kill_existing_desktop(self.log)
@@ -2664,6 +2698,7 @@ class LauncherWin(Gtk.Window):
begin_config_transaction(f"launch:bgp:{pool['name']}")
write_config_for_translated(bgp_ep, model, port)
set_active_endpoint(pool["name"])
if target == "desktop":
_kill_existing_desktop(self.log)

View File

@@ -83,14 +83,24 @@ model_catalog_json = ""
"""
CHANGELOG = [
("3.11.5", "2026-05-26", [
"Token-aware compaction: fixes context_length_exceeded on small-context models (25 items × 1600 tokens)",
"Proactive compaction triggers on token count (>80% model limit), not just item count",
"Universal adaptive compaction: removed crof.ai-only gates, all providers get compaction",
"Vision model detection: strips images for non-vision models, keeps for vision-capable ones",
"Per-model token limit learning from context_length_exceeded error messages",
"Compaction aggression levels: normal vs extreme when tokens > 1.5× model limit",
"Smart-continue text-tool detection: triggers on tool-call text patterns, not just function_call_output",
"Active endpoint sync: GUI auto-removes stale endpoint references on startup",
]),
("3.11.0", "2026-05-26", [
"Merge cobra PR: concurrency semaphore (max 3), auto-continue for truncated text",
"SO_REUSEADDR on sticky port, proxy-stderr.log, stream diagnostics logging",
"Timeout/OSError handler sends response.failed SSE instead of silent drop",
"Restart Proxy button: only restarts proxy without killing Codex Desktop",
"Tool call argument normalizer: fixes Argumentsarguments, strips markdown wrapping",
"Smart-continue loop (2× retries): escalating nudges when model stops text-only mid-task",
"XML tool call extraction: parses <tool_call> patterns from text, injects as real calls",
"Tool call argument normalizer: fixes Arguments->arguments, strips markdown wrapping",
"Smart-continue loop (2x retries): escalating nudges when model stops text-only mid-task",
"XML tool call extraction: parses patterns from text, injects as real calls",
"Auto-continue + smart-continue ordered with skip guard to avoid double-firing",
"API key hot-reload with mtime tracking + /admin/reload + /admin/verify-key endpoints",
"GUI hot-reload: auto-refreshes proxy key on endpoint edit, verifies with upstream",

View File

@@ -787,6 +787,10 @@ _GEMINI_AGENT_GUARDRAIL = (
)
_LOG_FILE_LOCK = threading.Lock()
_ANTIGRAVITY_LOOP_TRACKER = {}
_ANTIGRAVITY_LOOP_TRACKER_LOCK = threading.Lock()
def _antigravity_loop_key(session_id):
return f"ag:{session_id}"
def _fetch_antigravity_version():
cache_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", "antigravity-version.json")
@@ -1469,6 +1473,53 @@ _CROF_ADAPTIVE = {
"min_keep_recent": 6,
}
_model_max_tokens = {}
_model_max_tokens_lock = threading.Lock()
def _estimate_tokens(item):
if not isinstance(item, dict):
return 4
t = item.get("type", "")
if t == "message":
content = item.get("content", "")
if isinstance(content, str):
return max(4, len(content) // 4)
elif isinstance(content, list):
total = 4
for part in content:
pt = part.get("type", "")
if pt in ("input_text", "output_text"):
total += max(4, len(part.get("text", "")) // 4)
elif pt == "input_image":
total += 800
elif pt in ("function_call",):
total += max(20, len(part.get("arguments", "{}")) // 2)
elif pt == "function_call_output":
total += max(8, len(part.get("output", "")) // 4)
return total
elif t in ("function_call_output",):
return max(8, len(item.get("output", "")) // 4)
elif t == "function_call":
return max(20, len(item.get("arguments", "{}")) // 2)
return 4
def _estimate_input_tokens(input_data):
if not isinstance(input_data, list):
return 0
return sum(_estimate_tokens(i) for i in input_data)
def _get_model_max_tokens(model):
with _model_max_tokens_lock:
return _model_max_tokens.get(model)
def _set_model_max_tokens(model, tokens):
if model and tokens:
with _model_max_tokens_lock:
existing = _model_max_tokens.get(model)
if existing is None or tokens < existing:
_model_max_tokens[model] = tokens
print(f"[ctx-limit] learned {model} max ~{tokens} tokens", file=sys.stderr)
_BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json")
_bgp_stats_lock = threading.Lock()
@@ -1534,8 +1585,6 @@ def _sorted_bgp_routes():
return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats))
def _crof_record(model, n_items, success):
if TARGET_URL and "crof.ai" not in TARGET_URL:
return
if not isinstance(n_items, int) or n_items < 1:
return
entry = {"model": model, "items": n_items, "ok": success}
@@ -1561,7 +1610,6 @@ def _crof_record(model, n_items, success):
global_limit = v["limit"]
_CROF_ADAPTIVE["global_item_limit"] = global_limit
if TARGET_URL and "crof.ai" in TARGET_URL:
print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
def _crof_item_limit(model):
@@ -1569,12 +1617,29 @@ def _crof_item_limit(model):
per_model = ml.get("limit", 30)
return min(per_model, _CROF_ADAPTIVE["global_item_limit"])
def _crof_compact_for_retry(input_data, model):
def _crof_compact_for_retry(input_data, model, aggression=0):
limit = _crof_item_limit(model)
if not isinstance(input_data, list) or len(input_data) <= limit:
if not isinstance(input_data, list) or len(input_data) < 2:
return input_data
max_tok = _get_model_max_tokens(model)
est = _estimate_input_tokens(input_data)
over_item_limit = len(input_data) > limit
over_token_limit = max_tok and est >= max_tok * 0.9
if not over_item_limit and not over_token_limit:
return input_data
keep = max(_CROF_ADAPTIVE["min_keep_recent"], limit // 3)
if over_token_limit:
ratio = est / max_tok
if aggression >= 1 or ratio > 1.5:
keep = max(2, _CROF_ADAPTIVE["min_keep_recent"] // 2)
elif ratio > 1.2:
keep = max(3, keep // 2)
print(f"[ctx-limit] model={model} est={est}tok max={max_tok}tok ratio={ratio:.2f} -> keep={keep}", file=sys.stderr)
elif over_item_limit:
keep = max(keep, 6)
head_end = 0
for i, item in enumerate(input_data):
t = item.get("type")
@@ -1607,8 +1672,7 @@ def _crof_compact_for_retry(input_data, model):
summary_lines.append(_item_summary(item, max_len=120))
summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
if TARGET_URL and "crof.ai" in TARGET_URL:
print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)})", file=sys.stderr)
print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)}, agg={aggression})", file=sys.stderr)
return head + [summary_msg] + tail
def _item_summary(item, max_len=200):
@@ -2051,6 +2115,18 @@ def synthesize_tool_results_for_chat(input_items):
def has_function_call_output(input_items):
return isinstance(input_items, list) and any(i.get("type") == "function_call_output" for i in input_items)
_TOOL_CALL_TEXT_PATTERNS = re.compile(
r'(?:^|\n)[\s•\-\*]*\(?'
r'(?:exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)'
r'[\s:]',
re.I | re.MULTILINE
)
def _text_looks_like_tool_calls(text):
if not text or len(text) < 6:
return False
return bool(_TOOL_CALL_TEXT_PATTERNS.search(text))
# ═══════════════════════════════════════════════════════════════════
# Log redaction
# ═══════════════════════════════════════════════════════════════════
@@ -2233,9 +2309,14 @@ def _normalize_tool_args(raw_args):
except json.JSONDecodeError:
return raw_args
_XML_TC_RE = re.compile(r'<tool_call>(\w+)(.*?)</tool_call>', re.DOTALL)
_XML_TC_RE = re.compile(r'exec_command(.*?)</invoke>', re.DOTALL)
_XML_ARG_VALUE_RE = re.compile(r'</?arg_value>\s*')
_PAREN_TC_RE = re.compile(
r'(?:^|[\n•\-\*]\s*)\(\s*(exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)\b\s*(.*?)\)',
re.DOTALL | re.I
)
def _extract_xml_tool_calls(text):
if not text:
return []
@@ -2262,6 +2343,68 @@ def _extract_xml_tool_calls(text):
results.append({"name": name, "args": args_str, "call_id": f"xml_{len(results)}"})
return results
_NON_VISION_MODEL_PATTERNS = re.compile(
r'\b(deepseek|glm|mixtral|llama\b(?!.*vision)|command|dbrx|qwen\b(?!.*vl)|phi-?3(?!.*vision))',
re.I
)
_vision_fail_cache = set()
_vision_fail_lock = threading.Lock()
def _model_supports_vision(model):
if not model:
return True
with _vision_fail_lock:
if model in _vision_fail_cache:
return False
if _NON_VISION_MODEL_PATTERNS.search(model):
return False
return True
def _mark_vision_fail(model):
if model:
with _vision_fail_lock:
_vision_fail_cache.add(model)
def _strip_images_from_input(input_data, model):
if not isinstance(input_data, list) or _model_supports_vision(model):
return input_data
modified = False
result = []
for item in input_data:
if item.get("type") != "message":
result.append(item)
continue
content = item.get("content", [])
if isinstance(content, str):
result.append(item)
continue
new_content = []
has_img = False
for part in content:
if isinstance(part, str):
new_content.append(part)
continue
pt = part.get("type", "")
if pt in ("input_image", "image_url"):
if not has_img:
fname = part.get("image_url", {}).get("url", part.get("url", "image.png"))
if fname.startswith("data:"):
fname = "screenshot.png"
new_content.append({"type": "output_text", "text": f"[User attached image: {fname} — this model does not support vision]"})
has_img = True
modified = True
else:
new_content.append(part)
if modified:
result.append({**item, "content": new_content})
else:
result.append(item)
if modified:
print(f"[vision-filter] stripped {sum(1 for i in input_data if i.get('type')=='message' and any(c.get('type') in ('input_image','image_url') for c in (i.get('content') or []) if isinstance(c,dict)))} images for model={model}", file=sys.stderr)
return result
return input_data
def oa_input_to_messages(input_data):
msgs = []
tool_name_by_id = {}
@@ -4889,13 +5032,26 @@ class Handler(http.server.BaseHTTPRequestHandler):
body["input"] = input_data
crof_limit = _crof_item_limit(model)
_crof_eligible = TARGET_URL and "crof.ai" in TARGET_URL
if _crof_eligible and not compacted and isinstance(input_data, list) and len(input_data) > crof_limit:
print(f"[crof-adaptive] proactive compact: {len(input_data)} items > limit {crof_limit}", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model)
_crof_eligible = True
if _crof_eligible and not compacted and isinstance(input_data, list):
_needs_compact = len(input_data) > crof_limit
max_tok = _get_model_max_tokens(model)
est_tok = _estimate_input_tokens(input_data) if max_tok else 0
if not _needs_compact and max_tok and est_tok > max_tok * 0.8:
_needs_compact = True
if _needs_compact:
_agg = 0
if max_tok and est_tok > max_tok:
_agg = 1
print(f"[crof-adaptive] proactive compact: {len(input_data)} items, est={est_tok}tok max={max_tok}tok agg={_agg}", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model, aggression=_agg)
body = dict(body)
body["input"] = input_data
# Strip images for non-vision models
input_data = _strip_images_from_input(input_data, model)
body["input"] = input_data
messages = oa_input_to_messages(input_data)
messages = _inject_stored_reasoning(messages)
instructions = body.get("instructions", "").strip()
@@ -4927,14 +5083,19 @@ class Handler(http.server.BaseHTTPRequestHandler):
except urllib.error.HTTPError as e:
err_body = e.read().decode()
if "context_length_exceeded" in err_body and attempt < max_retries:
print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with extreme compaction!", file=sys.stderr)
import re as _re
_tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
if _tok_m:
_set_model_max_tokens(model, int(_tok_m.group(1)))
print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with compaction (agg={attempt})!", file=sys.stderr)
policy = provider_policy()
if isinstance(input_data, list):
print(f"[{self._session_id}] applying extreme compaction to {len(input_data)} items", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model)
est = _estimate_input_tokens(input_data)
print(f"[{self._session_id}] applying compaction to {len(input_data)} items ~{est}tok", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model, aggression=attempt)
body = dict(body)
body["input"] = input_data
messages = oa_input_to_messages(input_data)
messages = oa_input_to_messages(_strip_images_from_input(input_data, model))
messages = _inject_stored_reasoning(messages)
instructions = body.get("instructions", "").strip()
if instructions:
@@ -5267,22 +5428,26 @@ class Handler(http.server.BaseHTTPRequestHandler):
if not is_latest_simple:
contents.insert(0, {"role": "user", "parts": [{"text": _GEMINI_AGENT_GUARDRAIL}]})
if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
_EDIT_WORDS = ("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert")
latest_lower = ""
for item in reversed(input_data):
if item.get("type") == "message" and item.get("role") == "user":
c = item.get("content", "")
if isinstance(c, str): latest_lower = c.lower()
elif isinstance(c, list): latest_lower = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)).lower()
break
if latest_lower and any(w in latest_lower for w in _EDIT_WORDS):
n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
contents.append({"role": "user", "parts": [{"text": "!!! ABSOLUTELY NO PLANNING - EMIT THE TOOL CALL NOW !!! IMPORTANT: The user is requesting a modification to existing files. You MUST use tools (exec_command, read_files, write, etc.) to make the changes RIGHT NOW. Do NOT just describe what to do — actually CALL THE TOOLS IN THIS RESPONSE. IMMEDIATELY INSPECT THE FILE OR LIST FILES USING exec_command TOOL CALL."}]})
print(f"[antigravity] edit-intent detected; injected tool-use nudge", file=sys.stderr)
if OAUTH_PROVIDER == "google-antigravity":
import hashlib
ag_key = _antigravity_loop_key(self._session_id)
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
if ag_key not in _ANTIGRAVITY_LOOP_TRACKER:
_ANTIGRAVITY_LOOP_TRACKER[ag_key] = {
"latest_user_hash": None,
"nudge_injected": False,
"latest_user_appended": False,
"tool_calls_for_request": 0,
"repeated_tool": False,
"force_finalize": False,
"last_tool": None,
"last_tool_count": 0,
}
ag_state = _ANTIGRAVITY_LOOP_TRACKER[ag_key]
if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
latest_user = ""
latest_user_hash = None
if isinstance(input_data, list):
for item in reversed(input_data):
if item.get("type") == "message" and item.get("role") == "user":
c = item.get("content", "")
@@ -5292,6 +5457,59 @@ class Handler(http.server.BaseHTTPRequestHandler):
latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
break
if latest_user:
latest_norm = " ".join(latest_user.strip().split())[:200]
latest_user_hash = hashlib.sha256(latest_norm.encode()).hexdigest()[:16]
if latest_user_hash != ag_state["latest_user_hash"]:
ag_state["latest_user_hash"] = latest_user_hash
ag_state["nudge_injected"] = False
ag_state["latest_user_appended"] = False
ag_state["tool_calls_for_request"] = 0
ag_state["repeated_tool"] = False
ag_state["force_finalize"] = False
ag_state["last_tool"] = None
ag_state["last_tool_count"] = 0
if isinstance(input_data, list):
n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
ag_state["tool_calls_for_request"] = n_tool_calls
last_tool_key = None
for item in reversed(input_data):
if isinstance(item, dict) and item.get("type") == "function_call":
fname = item.get("name", "")
args_str = json.dumps(item.get("arguments", {}), sort_keys=True)[:100]
last_tool_key = f"{fname}:{args_str}"
break
if last_tool_key:
if last_tool_key == ag_state["last_tool"]:
ag_state["last_tool_count"] += 1
if ag_state["last_tool_count"] >= 5:
ag_state["repeated_tool"] = True
ag_state["force_finalize"] = True
else:
ag_state["last_tool"] = last_tool_key
ag_state["last_tool_count"] = 1
_EDIT_WORDS = ("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert")
latest_lower = ""
if isinstance(input_data, list):
for item in reversed(input_data):
if item.get("type") == "message" and item.get("role") == "user":
c = item.get("content", "")
if isinstance(c, str): latest_lower = c.lower()
elif isinstance(c, list): latest_lower = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)).lower()
break
if ag_state["force_finalize"]:
contents.append({"role": "user", "parts": [{"text": "STOP CALLING TOOLS. APPLY THE FINAL EDIT OR SUMMARIZE WHAT BLOCKED YOU. DO NOT CALL ANY MORE TOOLS. DO NOT PRODUCE ANY MORE PLANNING TEXT. DO NOT PRODUCE ANY MORE EXPLORATORY TOOL CALLS. PRODUCE A FINAL ANSWER OR A CLEAR STATEMENT OF WHAT IS PREVENTING YOU FROM COMPLETING THE TASK."}]})
elif latest_lower and any(w in latest_lower for w in _EDIT_WORDS) and not ag_state["nudge_injected"] and not ag_state["force_finalize"]:
contents.append({"role": "user", "parts": [{"text": "!!! ABSOLUTELY NO PLANNING - EMIT THE TOOL CALL NOW !!! IMPORTANT: The user is requesting a modification to existing files. You MUST use tools (exec_command, read_files, write, etc.) to make the changes RIGHT NOW. Do NOT just describe what to do — actually CALL THE TOOLS IN THIS RESPONSE. IMMEDIATELY INSPECT THE FILE OR LIST FILES USING exec_command TOOL CALL."}]})
ag_state["nudge_injected"] = True
print(f"[antigravity] edit-intent detected; injected tool-use nudge (first time for this request)", file=sys.stderr)
else:
if ag_state["nudge_injected"]:
print(f"[antigravity] edit-intent nudge already injected, skipping", file=sys.stderr)
if latest_user and not ag_state["latest_user_appended"] and not ag_state["force_finalize"]:
latest_norm = " ".join(latest_user.strip().split())[:160]
final_text = ""
if contents:
@@ -5299,10 +5517,20 @@ class Handler(http.server.BaseHTTPRequestHandler):
if last.get("role") == "user":
final_text = " ".join(json.dumps(last.get("parts", []), ensure_ascii=False).split())
if latest_norm[:120] not in final_text:
print(f"[antigravity] latest user instruction was not final turn; appending", file=sys.stderr)
print(f"[antigravity] latest user instruction was not final turn; appending (first time for this request)", file=sys.stderr)
contents.append({"role": "user", "parts": [{"text": latest_user}]})
ag_state["latest_user_appended"] = True
else:
print(f"[antigravity] latest user instruction is final turn", file=sys.stderr)
else:
if ag_state["latest_user_appended"]:
print(f"[antigravity] latest user instruction already appended, skipping", file=sys.stderr)
print(f"[antigravity-loop] latest_user_hash={latest_user_hash}", file=sys.stderr)
print(f"[antigravity-loop] tool_calls_for_request={ag_state['tool_calls_for_request']}", file=sys.stderr)
print(f"[antigravity-loop] repeated_tool={ag_state['repeated_tool']}", file=sys.stderr)
print(f"[antigravity-loop] nudge_injected={ag_state['nudge_injected']}", file=sys.stderr)
print(f"[antigravity-loop] force_finalize={ag_state['force_finalize']}", file=sys.stderr)
print(f"[{self._session_id}] [antigravity-debug] input_items={len(input_data) if isinstance(input_data, list) else 1} contents={len(contents)} latest={latest_user[:80]!r}", file=sys.stderr)
if contents:
last_c = contents[-1]
@@ -5725,9 +5953,11 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
def _observe_event(event):
nonlocal last_resp_id, last_output, last_status, finish_reason, has_content
nonlocal last_resp_id, last_output, last_status, finish_reason, has_content, has_message, has_tool_call
for line in event.strip().split("\n"):
if line.startswith("data: "):
try:
@@ -5737,7 +5967,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_output = d.get("response", {}).get("output", [])
last_status = d.get("response", {}).get("status")
finish_reason = "length" if last_status == "incomplete" else "stop"
has_content = any(o.get("type") == "message" for o in (last_output or []))
has_tool_call = any(o.get("type") == "function_call" for o in (last_output or []))
has_message = any(o.get("type") == "message" for o in (last_output or []))
has_content = has_message or has_tool_call
except Exception:
pass
@@ -5749,7 +5981,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
break
collected_events.append(event)
_observe_event(event)
print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} has_message={has_message} has_tool_call={has_tool_call} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
print("[translate-proxy] client disconnected during stream", file=sys.stderr)
_crof_record(model, n_items, False)
@@ -5805,6 +6037,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)
@@ -5813,7 +6047,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr)
# Auto-retry on finish_reason=length with no content due to too much context.
if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5 and TARGET_URL and "crof.ai" in TARGET_URL:
if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5:
print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr)
new_input = _crof_compact_for_retry(input_data, model)
if len(new_input) < len(input_data):
@@ -5836,6 +6070,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)
@@ -5943,9 +6179,17 @@ class Handler(http.server.BaseHTTPRequestHandler):
_smart_attempt = 0
while _smart_attempt < _smart_max:
_has_tool_calls_in_output = any(o.get("type") == "function_call" for o in (last_output or []))
last_text = ""
for o in (last_output or []):
if o.get("type") == "message":
for c in (o.get("content") or []):
if isinstance(c, dict) and c.get("type") == "output_text":
last_text += c.get("text", "")
_looks_like_tools = _text_looks_like_tool_calls(last_text)
_has_prior_tool_ctx = has_function_call_output(input_data)
if not (finish_reason == "stop" and has_content and not _has_tool_calls_in_output
and isinstance(input_data, list) and len(input_data) >= 3
and has_function_call_output(input_data)):
and (_has_prior_tool_ctx or _looks_like_tools)):
break
_smart_attempt += 1
_nudges = [
@@ -5954,12 +6198,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
]
nudge_text = _nudges[min(_smart_attempt - 1, len(_nudges) - 1)]
# Try extracting XML tool calls from text as fallback before nudging
last_text = ""
for o in (last_output or []):
if o.get("type") == "message":
for c in (o.get("content") or []):
if isinstance(c, dict) and c.get("type") == "output_text":
last_text += c.get("text", "")
xml_fc = _extract_xml_tool_calls(last_text)
if xml_fc:
print(f"[{self._session_id}] [smart-continue] extracted {len(xml_fc)} XML tool calls from text, injecting and retrying", file=sys.stderr)
@@ -5979,6 +6217,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)
@@ -5988,19 +6228,21 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[{self._session_id}] [smart-continue] XML injection retry failed: {e}", file=sys.stderr)
break
_nudge_msg = {"role": "user", "content": nudge_text}
nudge_messages = oa_input_to_messages(input_data) + [_nudge_msg]
nudge_messages = oa_input_to_messages(_strip_images_from_input(input_data, model)) + [_nudge_msg]
instructions = body.get("instructions", "").strip()
if instructions:
nudge_messages.insert(0, {"role": "system", "content": instructions})
nudge_chat_body = self._build_chat_body(model, nudge_messages, body, stream)
nudge_req = urllib.request.Request(target, data=json.dumps(nudge_chat_body).encode(), headers=fwd)
print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task, nudging", file=sys.stderr)
print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task (prior_ctx={_has_prior_tool_ctx} text_tools={_looks_like_tools}), nudging", file=sys.stderr)
try:
retry_upstream = urllib.request.urlopen(nudge_req, timeout=_upstream_timeout(body, True))
collected_events = []
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)

View File

@@ -787,6 +787,10 @@ _GEMINI_AGENT_GUARDRAIL = (
)
_LOG_FILE_LOCK = threading.Lock()
_ANTIGRAVITY_LOOP_TRACKER = {}
_ANTIGRAVITY_LOOP_TRACKER_LOCK = threading.Lock()
def _antigravity_loop_key(session_id):
return f"ag:{session_id}"
def _fetch_antigravity_version():
cache_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", "antigravity-version.json")
@@ -1469,6 +1473,53 @@ _CROF_ADAPTIVE = {
"min_keep_recent": 6,
}
_model_max_tokens = {}
_model_max_tokens_lock = threading.Lock()
def _estimate_tokens(item):
if not isinstance(item, dict):
return 4
t = item.get("type", "")
if t == "message":
content = item.get("content", "")
if isinstance(content, str):
return max(4, len(content) // 4)
elif isinstance(content, list):
total = 4
for part in content:
pt = part.get("type", "")
if pt in ("input_text", "output_text"):
total += max(4, len(part.get("text", "")) // 4)
elif pt == "input_image":
total += 800
elif pt in ("function_call",):
total += max(20, len(part.get("arguments", "{}")) // 2)
elif pt == "function_call_output":
total += max(8, len(part.get("output", "")) // 4)
return total
elif t in ("function_call_output",):
return max(8, len(item.get("output", "")) // 4)
elif t == "function_call":
return max(20, len(item.get("arguments", "{}")) // 2)
return 4
def _estimate_input_tokens(input_data):
if not isinstance(input_data, list):
return 0
return sum(_estimate_tokens(i) for i in input_data)
def _get_model_max_tokens(model):
with _model_max_tokens_lock:
return _model_max_tokens.get(model)
def _set_model_max_tokens(model, tokens):
if model and tokens:
with _model_max_tokens_lock:
existing = _model_max_tokens.get(model)
if existing is None or tokens < existing:
_model_max_tokens[model] = tokens
print(f"[ctx-limit] learned {model} max ~{tokens} tokens", file=sys.stderr)
_BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json")
_bgp_stats_lock = threading.Lock()
@@ -1534,8 +1585,6 @@ def _sorted_bgp_routes():
return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats))
def _crof_record(model, n_items, success):
if TARGET_URL and "crof.ai" not in TARGET_URL:
return
if not isinstance(n_items, int) or n_items < 1:
return
entry = {"model": model, "items": n_items, "ok": success}
@@ -1561,7 +1610,6 @@ def _crof_record(model, n_items, success):
global_limit = v["limit"]
_CROF_ADAPTIVE["global_item_limit"] = global_limit
if TARGET_URL and "crof.ai" in TARGET_URL:
print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
def _crof_item_limit(model):
@@ -1569,12 +1617,29 @@ def _crof_item_limit(model):
per_model = ml.get("limit", 30)
return min(per_model, _CROF_ADAPTIVE["global_item_limit"])
def _crof_compact_for_retry(input_data, model):
def _crof_compact_for_retry(input_data, model, aggression=0):
limit = _crof_item_limit(model)
if not isinstance(input_data, list) or len(input_data) <= limit:
if not isinstance(input_data, list) or len(input_data) < 2:
return input_data
max_tok = _get_model_max_tokens(model)
est = _estimate_input_tokens(input_data)
over_item_limit = len(input_data) > limit
over_token_limit = max_tok and est >= max_tok * 0.9
if not over_item_limit and not over_token_limit:
return input_data
keep = max(_CROF_ADAPTIVE["min_keep_recent"], limit // 3)
if over_token_limit:
ratio = est / max_tok
if aggression >= 1 or ratio > 1.5:
keep = max(2, _CROF_ADAPTIVE["min_keep_recent"] // 2)
elif ratio > 1.2:
keep = max(3, keep // 2)
print(f"[ctx-limit] model={model} est={est}tok max={max_tok}tok ratio={ratio:.2f} -> keep={keep}", file=sys.stderr)
elif over_item_limit:
keep = max(keep, 6)
head_end = 0
for i, item in enumerate(input_data):
t = item.get("type")
@@ -1607,8 +1672,7 @@ def _crof_compact_for_retry(input_data, model):
summary_lines.append(_item_summary(item, max_len=120))
summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
if TARGET_URL and "crof.ai" in TARGET_URL:
print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)})", file=sys.stderr)
print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)}, agg={aggression})", file=sys.stderr)
return head + [summary_msg] + tail
def _item_summary(item, max_len=200):
@@ -2051,6 +2115,18 @@ def synthesize_tool_results_for_chat(input_items):
def has_function_call_output(input_items):
return isinstance(input_items, list) and any(i.get("type") == "function_call_output" for i in input_items)
_TOOL_CALL_TEXT_PATTERNS = re.compile(
r'(?:^|\n)[\s•\-\*]*\(?'
r'(?:exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)'
r'[\s:]',
re.I | re.MULTILINE
)
def _text_looks_like_tool_calls(text):
if not text or len(text) < 6:
return False
return bool(_TOOL_CALL_TEXT_PATTERNS.search(text))
# ═══════════════════════════════════════════════════════════════════
# Log redaction
# ═══════════════════════════════════════════════════════════════════
@@ -2233,9 +2309,14 @@ def _normalize_tool_args(raw_args):
except json.JSONDecodeError:
return raw_args
_XML_TC_RE = re.compile(r'<tool_call>(\w+)(.*?)</tool_call>', re.DOTALL)
_XML_TC_RE = re.compile(r'exec_command(.*?)</invoke>', re.DOTALL)
_XML_ARG_VALUE_RE = re.compile(r'</?arg_value>\s*')
_PAREN_TC_RE = re.compile(
r'(?:^|[\n•\-\*]\s*)\(\s*(exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)\b\s*(.*?)\)',
re.DOTALL | re.I
)
def _extract_xml_tool_calls(text):
if not text:
return []
@@ -2262,6 +2343,68 @@ def _extract_xml_tool_calls(text):
results.append({"name": name, "args": args_str, "call_id": f"xml_{len(results)}"})
return results
_NON_VISION_MODEL_PATTERNS = re.compile(
r'\b(deepseek|glm|mixtral|llama\b(?!.*vision)|command|dbrx|qwen\b(?!.*vl)|phi-?3(?!.*vision))',
re.I
)
_vision_fail_cache = set()
_vision_fail_lock = threading.Lock()
def _model_supports_vision(model):
if not model:
return True
with _vision_fail_lock:
if model in _vision_fail_cache:
return False
if _NON_VISION_MODEL_PATTERNS.search(model):
return False
return True
def _mark_vision_fail(model):
if model:
with _vision_fail_lock:
_vision_fail_cache.add(model)
def _strip_images_from_input(input_data, model):
if not isinstance(input_data, list) or _model_supports_vision(model):
return input_data
modified = False
result = []
for item in input_data:
if item.get("type") != "message":
result.append(item)
continue
content = item.get("content", [])
if isinstance(content, str):
result.append(item)
continue
new_content = []
has_img = False
for part in content:
if isinstance(part, str):
new_content.append(part)
continue
pt = part.get("type", "")
if pt in ("input_image", "image_url"):
if not has_img:
fname = part.get("image_url", {}).get("url", part.get("url", "image.png"))
if fname.startswith("data:"):
fname = "screenshot.png"
new_content.append({"type": "output_text", "text": f"[User attached image: {fname} — this model does not support vision]"})
has_img = True
modified = True
else:
new_content.append(part)
if modified:
result.append({**item, "content": new_content})
else:
result.append(item)
if modified:
print(f"[vision-filter] stripped {sum(1 for i in input_data if i.get('type')=='message' and any(c.get('type') in ('input_image','image_url') for c in (i.get('content') or []) if isinstance(c,dict)))} images for model={model}", file=sys.stderr)
return result
return input_data
def oa_input_to_messages(input_data):
msgs = []
tool_name_by_id = {}
@@ -4889,13 +5032,26 @@ class Handler(http.server.BaseHTTPRequestHandler):
body["input"] = input_data
crof_limit = _crof_item_limit(model)
_crof_eligible = TARGET_URL and "crof.ai" in TARGET_URL
if _crof_eligible and not compacted and isinstance(input_data, list) and len(input_data) > crof_limit:
print(f"[crof-adaptive] proactive compact: {len(input_data)} items > limit {crof_limit}", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model)
_crof_eligible = True
if _crof_eligible and not compacted and isinstance(input_data, list):
_needs_compact = len(input_data) > crof_limit
max_tok = _get_model_max_tokens(model)
est_tok = _estimate_input_tokens(input_data) if max_tok else 0
if not _needs_compact and max_tok and est_tok > max_tok * 0.8:
_needs_compact = True
if _needs_compact:
_agg = 0
if max_tok and est_tok > max_tok:
_agg = 1
print(f"[crof-adaptive] proactive compact: {len(input_data)} items, est={est_tok}tok max={max_tok}tok agg={_agg}", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model, aggression=_agg)
body = dict(body)
body["input"] = input_data
# Strip images for non-vision models
input_data = _strip_images_from_input(input_data, model)
body["input"] = input_data
messages = oa_input_to_messages(input_data)
messages = _inject_stored_reasoning(messages)
instructions = body.get("instructions", "").strip()
@@ -4927,14 +5083,19 @@ class Handler(http.server.BaseHTTPRequestHandler):
except urllib.error.HTTPError as e:
err_body = e.read().decode()
if "context_length_exceeded" in err_body and attempt < max_retries:
print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with extreme compaction!", file=sys.stderr)
import re as _re
_tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
if _tok_m:
_set_model_max_tokens(model, int(_tok_m.group(1)))
print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with compaction (agg={attempt})!", file=sys.stderr)
policy = provider_policy()
if isinstance(input_data, list):
print(f"[{self._session_id}] applying extreme compaction to {len(input_data)} items", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model)
est = _estimate_input_tokens(input_data)
print(f"[{self._session_id}] applying compaction to {len(input_data)} items ~{est}tok", file=sys.stderr)
input_data = _crof_compact_for_retry(input_data, model, aggression=attempt)
body = dict(body)
body["input"] = input_data
messages = oa_input_to_messages(input_data)
messages = oa_input_to_messages(_strip_images_from_input(input_data, model))
messages = _inject_stored_reasoning(messages)
instructions = body.get("instructions", "").strip()
if instructions:
@@ -5267,22 +5428,26 @@ class Handler(http.server.BaseHTTPRequestHandler):
if not is_latest_simple:
contents.insert(0, {"role": "user", "parts": [{"text": _GEMINI_AGENT_GUARDRAIL}]})
if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
_EDIT_WORDS = ("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert")
latest_lower = ""
for item in reversed(input_data):
if item.get("type") == "message" and item.get("role") == "user":
c = item.get("content", "")
if isinstance(c, str): latest_lower = c.lower()
elif isinstance(c, list): latest_lower = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)).lower()
break
if latest_lower and any(w in latest_lower for w in _EDIT_WORDS):
n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
contents.append({"role": "user", "parts": [{"text": "!!! ABSOLUTELY NO PLANNING - EMIT THE TOOL CALL NOW !!! IMPORTANT: The user is requesting a modification to existing files. You MUST use tools (exec_command, read_files, write, etc.) to make the changes RIGHT NOW. Do NOT just describe what to do — actually CALL THE TOOLS IN THIS RESPONSE. IMMEDIATELY INSPECT THE FILE OR LIST FILES USING exec_command TOOL CALL."}]})
print(f"[antigravity] edit-intent detected; injected tool-use nudge", file=sys.stderr)
if OAUTH_PROVIDER == "google-antigravity":
import hashlib
ag_key = _antigravity_loop_key(self._session_id)
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
if ag_key not in _ANTIGRAVITY_LOOP_TRACKER:
_ANTIGRAVITY_LOOP_TRACKER[ag_key] = {
"latest_user_hash": None,
"nudge_injected": False,
"latest_user_appended": False,
"tool_calls_for_request": 0,
"repeated_tool": False,
"force_finalize": False,
"last_tool": None,
"last_tool_count": 0,
}
ag_state = _ANTIGRAVITY_LOOP_TRACKER[ag_key]
if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
latest_user = ""
latest_user_hash = None
if isinstance(input_data, list):
for item in reversed(input_data):
if item.get("type") == "message" and item.get("role") == "user":
c = item.get("content", "")
@@ -5292,6 +5457,59 @@ class Handler(http.server.BaseHTTPRequestHandler):
latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
break
if latest_user:
latest_norm = " ".join(latest_user.strip().split())[:200]
latest_user_hash = hashlib.sha256(latest_norm.encode()).hexdigest()[:16]
if latest_user_hash != ag_state["latest_user_hash"]:
ag_state["latest_user_hash"] = latest_user_hash
ag_state["nudge_injected"] = False
ag_state["latest_user_appended"] = False
ag_state["tool_calls_for_request"] = 0
ag_state["repeated_tool"] = False
ag_state["force_finalize"] = False
ag_state["last_tool"] = None
ag_state["last_tool_count"] = 0
if isinstance(input_data, list):
n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
ag_state["tool_calls_for_request"] = n_tool_calls
last_tool_key = None
for item in reversed(input_data):
if isinstance(item, dict) and item.get("type") == "function_call":
fname = item.get("name", "")
args_str = json.dumps(item.get("arguments", {}), sort_keys=True)[:100]
last_tool_key = f"{fname}:{args_str}"
break
if last_tool_key:
if last_tool_key == ag_state["last_tool"]:
ag_state["last_tool_count"] += 1
if ag_state["last_tool_count"] >= 5:
ag_state["repeated_tool"] = True
ag_state["force_finalize"] = True
else:
ag_state["last_tool"] = last_tool_key
ag_state["last_tool_count"] = 1
_EDIT_WORDS = ("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert")
latest_lower = ""
if isinstance(input_data, list):
for item in reversed(input_data):
if item.get("type") == "message" and item.get("role") == "user":
c = item.get("content", "")
if isinstance(c, str): latest_lower = c.lower()
elif isinstance(c, list): latest_lower = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)).lower()
break
if ag_state["force_finalize"]:
contents.append({"role": "user", "parts": [{"text": "STOP CALLING TOOLS. APPLY THE FINAL EDIT OR SUMMARIZE WHAT BLOCKED YOU. DO NOT CALL ANY MORE TOOLS. DO NOT PRODUCE ANY MORE PLANNING TEXT. DO NOT PRODUCE ANY MORE EXPLORATORY TOOL CALLS. PRODUCE A FINAL ANSWER OR A CLEAR STATEMENT OF WHAT IS PREVENTING YOU FROM COMPLETING THE TASK."}]})
elif latest_lower and any(w in latest_lower for w in _EDIT_WORDS) and not ag_state["nudge_injected"] and not ag_state["force_finalize"]:
contents.append({"role": "user", "parts": [{"text": "!!! ABSOLUTELY NO PLANNING - EMIT THE TOOL CALL NOW !!! IMPORTANT: The user is requesting a modification to existing files. You MUST use tools (exec_command, read_files, write, etc.) to make the changes RIGHT NOW. Do NOT just describe what to do — actually CALL THE TOOLS IN THIS RESPONSE. IMMEDIATELY INSPECT THE FILE OR LIST FILES USING exec_command TOOL CALL."}]})
ag_state["nudge_injected"] = True
print(f"[antigravity] edit-intent detected; injected tool-use nudge (first time for this request)", file=sys.stderr)
else:
if ag_state["nudge_injected"]:
print(f"[antigravity] edit-intent nudge already injected, skipping", file=sys.stderr)
if latest_user and not ag_state["latest_user_appended"] and not ag_state["force_finalize"]:
latest_norm = " ".join(latest_user.strip().split())[:160]
final_text = ""
if contents:
@@ -5299,10 +5517,20 @@ class Handler(http.server.BaseHTTPRequestHandler):
if last.get("role") == "user":
final_text = " ".join(json.dumps(last.get("parts", []), ensure_ascii=False).split())
if latest_norm[:120] not in final_text:
print(f"[antigravity] latest user instruction was not final turn; appending", file=sys.stderr)
print(f"[antigravity] latest user instruction was not final turn; appending (first time for this request)", file=sys.stderr)
contents.append({"role": "user", "parts": [{"text": latest_user}]})
ag_state["latest_user_appended"] = True
else:
print(f"[antigravity] latest user instruction is final turn", file=sys.stderr)
else:
if ag_state["latest_user_appended"]:
print(f"[antigravity] latest user instruction already appended, skipping", file=sys.stderr)
print(f"[antigravity-loop] latest_user_hash={latest_user_hash}", file=sys.stderr)
print(f"[antigravity-loop] tool_calls_for_request={ag_state['tool_calls_for_request']}", file=sys.stderr)
print(f"[antigravity-loop] repeated_tool={ag_state['repeated_tool']}", file=sys.stderr)
print(f"[antigravity-loop] nudge_injected={ag_state['nudge_injected']}", file=sys.stderr)
print(f"[antigravity-loop] force_finalize={ag_state['force_finalize']}", file=sys.stderr)
print(f"[{self._session_id}] [antigravity-debug] input_items={len(input_data) if isinstance(input_data, list) else 1} contents={len(contents)} latest={latest_user[:80]!r}", file=sys.stderr)
if contents:
last_c = contents[-1]
@@ -5725,9 +5953,11 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
def _observe_event(event):
nonlocal last_resp_id, last_output, last_status, finish_reason, has_content
nonlocal last_resp_id, last_output, last_status, finish_reason, has_content, has_message, has_tool_call
for line in event.strip().split("\n"):
if line.startswith("data: "):
try:
@@ -5737,7 +5967,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_output = d.get("response", {}).get("output", [])
last_status = d.get("response", {}).get("status")
finish_reason = "length" if last_status == "incomplete" else "stop"
has_content = any(o.get("type") == "message" for o in (last_output or []))
has_tool_call = any(o.get("type") == "function_call" for o in (last_output or []))
has_message = any(o.get("type") == "message" for o in (last_output or []))
has_content = has_message or has_tool_call
except Exception:
pass
@@ -5749,7 +5981,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
break
collected_events.append(event)
_observe_event(event)
print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} has_message={has_message} has_tool_call={has_tool_call} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
print("[translate-proxy] client disconnected during stream", file=sys.stderr)
_crof_record(model, n_items, False)
@@ -5805,6 +6037,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)
@@ -5813,7 +6047,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr)
# Auto-retry on finish_reason=length with no content due to too much context.
if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5 and TARGET_URL and "crof.ai" in TARGET_URL:
if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5:
print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr)
new_input = _crof_compact_for_retry(input_data, model)
if len(new_input) < len(input_data):
@@ -5836,6 +6070,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)
@@ -5943,9 +6179,17 @@ class Handler(http.server.BaseHTTPRequestHandler):
_smart_attempt = 0
while _smart_attempt < _smart_max:
_has_tool_calls_in_output = any(o.get("type") == "function_call" for o in (last_output or []))
last_text = ""
for o in (last_output or []):
if o.get("type") == "message":
for c in (o.get("content") or []):
if isinstance(c, dict) and c.get("type") == "output_text":
last_text += c.get("text", "")
_looks_like_tools = _text_looks_like_tool_calls(last_text)
_has_prior_tool_ctx = has_function_call_output(input_data)
if not (finish_reason == "stop" and has_content and not _has_tool_calls_in_output
and isinstance(input_data, list) and len(input_data) >= 3
and has_function_call_output(input_data)):
and (_has_prior_tool_ctx or _looks_like_tools)):
break
_smart_attempt += 1
_nudges = [
@@ -5954,12 +6198,6 @@ class Handler(http.server.BaseHTTPRequestHandler):
]
nudge_text = _nudges[min(_smart_attempt - 1, len(_nudges) - 1)]
# Try extracting XML tool calls from text as fallback before nudging
last_text = ""
for o in (last_output or []):
if o.get("type") == "message":
for c in (o.get("content") or []):
if isinstance(c, dict) and c.get("type") == "output_text":
last_text += c.get("text", "")
xml_fc = _extract_xml_tool_calls(last_text)
if xml_fc:
print(f"[{self._session_id}] [smart-continue] extracted {len(xml_fc)} XML tool calls from text, injecting and retrying", file=sys.stderr)
@@ -5979,6 +6217,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)
@@ -5988,19 +6228,21 @@ class Handler(http.server.BaseHTTPRequestHandler):
print(f"[{self._session_id}] [smart-continue] XML injection retry failed: {e}", file=sys.stderr)
break
_nudge_msg = {"role": "user", "content": nudge_text}
nudge_messages = oa_input_to_messages(input_data) + [_nudge_msg]
nudge_messages = oa_input_to_messages(_strip_images_from_input(input_data, model)) + [_nudge_msg]
instructions = body.get("instructions", "").strip()
if instructions:
nudge_messages.insert(0, {"role": "system", "content": instructions})
nudge_chat_body = self._build_chat_body(model, nudge_messages, body, stream)
nudge_req = urllib.request.Request(target, data=json.dumps(nudge_chat_body).encode(), headers=fwd)
print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task, nudging", file=sys.stderr)
print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task (prior_ctx={_has_prior_tool_ctx} text_tools={_looks_like_tools}), nudging", file=sys.stderr)
try:
retry_upstream = urllib.request.urlopen(nudge_req, timeout=_upstream_timeout(body, True))
collected_events = []
last_resp_id = last_output = last_status = None
finish_reason = None
has_content = False
has_message = False
has_tool_call = False
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
collected_events.append(event)
_observe_event(event)