8806 lines
432 KiB
Python
Executable File
8806 lines
432 KiB
Python
Executable File
#!/usr/bin/env python3
|
||
"""
|
||
translate-proxy.py — Responses API → backend API translation proxy.
|
||
|
||
Backends:
|
||
openai-compat — any OpenAI-compatible Chat Completions API
|
||
anthropic — Anthropic Messages API
|
||
command-code — CommandCode /alpha/generate (Z.AI GLM Coding Plan)
|
||
|
||
Usage:
|
||
python3 translate-proxy.py --config proxy-config.json
|
||
python3 translate-proxy.py --backend command-code --target-url https://... --api-key sk-...
|
||
|
||
═══════════════════════════════════════════════════════════════════
|
||
COMMANDCODE ADAPTER — FIX HISTORY (2026-05-22)
|
||
═══════════════════════════════════════════════════════════════════
|
||
|
||
This file contains multiple rounds of fixes for the CommandCode adapter.
|
||
Each fix addresses a specific failure mode observed in production.
|
||
They are documented here for future maintainability.
|
||
|
||
FIX 1: Content blocks rejected by CC API (root cause of initial 400 errors)
|
||
Symptom: {"error":{"message":"params.messages[i].content expected string, received array"}}
|
||
Cause: cc_input_to_messages emitted tool results as content blocks [{"type":"tool_result",...}]
|
||
Fix: All messages now use string content. Tool results as role="user" with plain text.
|
||
Location: cc_input_to_messages() ~line 1085
|
||
|
||
FIX 2: x-command-code-version header dropped during rewrite
|
||
Symptom: HTTP 403 upgrade_required from CommandCode API
|
||
Cause: _handle_command_code rewrite removed the header line
|
||
Fix: Always send x-command-code-version header with fallback "0.26.8"
|
||
Location: _handle_command_code() header setup block
|
||
|
||
FIX 3: Stale schema cache with wrong content_type=array
|
||
Symptom: SchemaAdapter used content_type="array" causing content blocks in auto path
|
||
Cause: ErrorAnalyzer learned incorrect schema from error message text
|
||
Fix: Cleared provider-caps.json; added 24h staleness TTL to _load_schema()
|
||
Location: _load_schema(), provider-caps.json
|
||
|
||
FIX 4: Stream disconnect before completion (client-side "stream disconnected")
|
||
Symptom: Client sees partial SSE then connection close, no response.completed event
|
||
Cause: No try/except around streaming path; exceptions crashed handler mid-stream
|
||
Fix: Wrapped stream_buffered_events in try/except; sends response.completed(status:"failed") on crash
|
||
Location: _handle_command_code() streaming section
|
||
|
||
FIX 5: Tool calls echoed as text instead of being parsed (THE BIG ONE)
|
||
Symptom: Model generates inline JSON tool calls like {"type":"tool-call","id":"...","name":"exec_command","arguments":"{...}"}
|
||
These appear as raw text in the conversation. The tool is never executed.
|
||
Root cause chain:
|
||
a) cc_input_to_messages sends tool calls as inline JSON text in assistant messages
|
||
b) The CC model echoes back similar JSON in its text-delta response
|
||
c) _parse_commandcode_text_tool_calls only handled XML format (```
|
||
<tool>``)
|
||
d) Raw JSON tool calls passed through as plain text → client shows them unparsed
|
||
Fix: Added _extract_raw_json_tool_calls() with field-level regex extraction.
|
||
Handles BOTH malformed (unescaped inner quotes) AND properly escaped JSON.
|
||
Three-tier parse: direct json.loads → unescape \"→\" → unicode_escape decode.
|
||
Location: _extract_args(), _extract_field(), _extract_raw_json_tool_calls()
|
||
|
||
FIX 6: Double-wrapped arguments (nested {"cmd": "{\"cmd\": \"curl...\"}"}")
|
||
Symptom: args={"cmd": "{\\\"cmd\\\": \\\"curl...\\\"}"}
|
||
Tool executor receives cmd = the literal string '{"cmd": "curl..."', not the actual curl command.
|
||
Root cause: When model generates properly escaped JSON ("arguments": "{\\"cmd\\": \\"...\\"}"),
|
||
_extract_args naive brace-counting returns raw text with escaped quotes.
|
||
json.loads(raw) fails on \\ at structural level.
|
||
Fallback sets args["cmd"] = raw_string → double-wrapped.
|
||
Fix: _extract_args now tries 3 parse strategies before returning.
|
||
Also normalizes sandbox_permissions from parsed args dict (not raw snippet).
|
||
Location: _extract_args() three-tier parser, sandbox_permissions normalization
|
||
|
||
FIX 7: _extract_field can't read values starting with \"
|
||
Symptom: sandbox_permissions="allow_all" passes through unnormalized because
|
||
_extract_field sees val_start=\\ (backslash) which != \" or { → returns None
|
||
Fix: Skip leading backslash before checking for " or { value type.
|
||
Location: _extract_field() leading-backslash skip
|
||
|
||
FIX 8: Adaptive probing caused format mismatch (REVERTED)
|
||
Symptom: Probe system discovered OpenAI tool_calls+role=tool format but CC API couldn't
|
||
process multi-turn tool loops correctly with it.
|
||
Fix: Removed probe system entirely. Use conservative format only:
|
||
- Inline JSON text for tool calls (cc_input_to_messages default)
|
||
- role="user" for all tool results
|
||
- ErrorAnalyzer learning on retries (not proactive probes)
|
||
Location: Reverted to cc_input_to_messages(), removed _build_cc_messages + _probe_cc_format
|
||
|
||
FIX 21: DSML parser silently drops tool calls when model uses name="cmd" (THE HALT BUG)
|
||
Symptom: Codex CLI stops mid-task. Model generates valid DSML exec_command with
|
||
<||DSML||parameter name="cmd" string="true">curl ...
|
||
Parser returns parsed_tool_calls=0. Client sees text output but no tool to execute.
|
||
CLI has nothing to do and halts.
|
||
Root cause: Line 1798 had `if key == "command":` — only matching parameter name="command".
|
||
The actual tool schema defines the parameter as "cmd" (see exec_command schema).
|
||
When DeepSeek generates name="cmd", the key "cmd" != "command", so cmd stays None,
|
||
and line 1825-1826 `if not cmd: continue` silently skips the entire tool call.
|
||
The XML parser (line 2205) already handled both: `params.get("command") or params.get("cmd")`
|
||
but the DSML parser did not.
|
||
Fix: Changed to `if key in ("command", "cmd"):` in the DSML parameter loop.
|
||
Test: Pattern L self-test verifies DSML with name="cmd" is parsed correctly.
|
||
Location: _parse_commandcode_text_tool_calls() DSML parameter loop, self-test Pattern L
|
||
|
||
════════════════════════════════════════════════════════════════════
|
||
INTELLIGENCE ROUTING — Self-Healing Parser System (v3.7.0)
|
||
════════════════════════════════════════════════════════════════════
|
||
|
||
Problem: The Command Code model produces output in unpredictable formats
|
||
that change between sessions and models. When the multi-format parser chain
|
||
(DSML → <bash> → <explore_agent> → <tool_call type=...> → XML → raw JSON →
|
||
fallback regex) returns empty, the Codex agent loop has zero tool calls and
|
||
STALLS — the user sees the model "thinking" but nothing happens.
|
||
|
||
Intelligence Routing is a three-layer self-healing system:
|
||
|
||
LAYER 1 — Deep URL Extraction (FIX 23)
|
||
The <explore_agent> handler was failing because URLs were hidden inside
|
||
nested JSON: messages: [{"content": "https://..."}]. The regex couldn't
|
||
find them because it excluded the " character that terminates JSON values.
|
||
|
||
Solution: _build_explore_cmd() is now a module-level function (was a
|
||
closure). After the initial regex fails, it tries json.loads() on the
|
||
text, iterates list items, and extracts the "content" field to find URLs.
|
||
Also added " to the regex exclusion set and rstrip characters.
|
||
|
||
LAYER 2 — Escalation Block Handling (FIX 24)
|
||
The model produces <require_escalation> and <request_escalation_permission>
|
||
blocks when it wants elevated permissions. The CC adapter doesn't support
|
||
escalation — these blocks were silently dropped, causing parsed_tool_calls=0.
|
||
|
||
Solution: Two handlers:
|
||
- FIX 24a: Closed-tag blocks — extracts URL if present, runs explore cmd;
|
||
otherwise echoes auto-proceed message.
|
||
- FIX 24b: Bare/unclosed tags (<require_escalation />) — auto-proceeds.
|
||
|
||
LAYER 3 — Intent-Based Command Synthesis (FIX 25, THE CORE)
|
||
When ALL parsers return empty and text has content, the system plays
|
||
detective using 5 heuristics in priority order:
|
||
|
||
1. URL detected in text → curl to fetch it
|
||
2. File path reference → cat or ls that file
|
||
3. Shell command in backticks/quotes → extract and run
|
||
4. "explore"/"fetch"/"investigate" intent + last user URL → explore cmd
|
||
5. "I need to"/"let me"/"please" intent text → echo diagnostic
|
||
|
||
This ensures the agent loop ALWAYS has a tool call to execute, even when
|
||
the model's output format is completely unrecognized. The loop never stalls.
|
||
|
||
Architecture:
|
||
_parse_commandcode_text_tool_calls() — LAYER 1 + LAYER 2
|
||
cc_stream_to_sse() — LAYER 3 (runs after parser chain + fallback)
|
||
|
||
The _last_user_urls deque (maxlen=20) tracks URLs from user messages
|
||
across the session, giving Layer 3 heuristic 4 a URL to work with.
|
||
|
||
Self-tests: 54 patterns (was 41) covering all three layers.
|
||
|
||
════════════════════════════════════════════════════════════════════
|
||
"""
|
||
|
||
import json, http.server, socketserver, urllib.request, urllib.parse, urllib.error, re
|
||
import time, uuid, os, sys, argparse, threading, socket, collections, contextlib, signal
|
||
import secrets, string, hashlib
|
||
import dataclasses
|
||
import http.client
|
||
import selectors
|
||
import tempfile
|
||
|
||
_IS_WINDOWS = sys.platform == "win32"
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Lazy gRPC import for Antigravity fallback
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
_antigravity_grpc_client = None
|
||
_antigravity_grpc_available = None
|
||
|
||
def _get_grpc_client():
|
||
"""Lazy-load the Antigravity gRPC client. Returns None if grpcio is not installed."""
|
||
global _antigravity_grpc_client, _antigravity_grpc_available
|
||
if _antigravity_grpc_available is False:
|
||
return None
|
||
if _antigravity_grpc_client is not None:
|
||
return _antigravity_grpc_client
|
||
try:
|
||
# Add the src directory to sys.path so antigravity_grpc package is found
|
||
_src_dir = os.path.dirname(os.path.abspath(__file__))
|
||
if _src_dir not in sys.path:
|
||
sys.path.insert(0, _src_dir)
|
||
from antigravity_grpc import is_grpc_available, AntigravityGrpcClient, get_client
|
||
if is_grpc_available():
|
||
_antigravity_grpc_client = get_client()
|
||
_antigravity_grpc_available = True
|
||
print("[antigravity-grpc] gRPC fallback module loaded OK", file=sys.stderr)
|
||
return _antigravity_grpc_client
|
||
else:
|
||
_antigravity_grpc_available = False
|
||
print("[antigravity-grpc] grpcio available but stubs failed to load, gRPC fallback disabled", file=sys.stderr)
|
||
return None
|
||
except ImportError as e:
|
||
_antigravity_grpc_available = False
|
||
print(f"[antigravity-grpc] grpcio not installed ({e}), gRPC fallback disabled", file=sys.stderr)
|
||
return None
|
||
|
||
# Reverse alias map: REST slug → gRPC display name
|
||
# gRPC uses display names (e.g. "Gemini 3.5 Flash (High)") while REST uses slugs (e.g. "gemini-3-flash")
|
||
_GRPC_REVERSE_ALIAS = {
|
||
"gemini-3-flash": "Gemini 3.5 Flash (High)",
|
||
"gemini-3.5-flash-low": "Gemini 3.5 Flash (Low)",
|
||
"gemini-3.1-pro-low": "Gemini 3.1 Pro (High)",
|
||
"claude-sonnet-4-6": "Claude Sonnet 4.6 (Thinking)",
|
||
"claude-opus-4-6-thinking": "Claude Opus 4.6 (Thinking)",
|
||
"gpt-oss-120b-medium": "GPT-OSS 120B (Medium)",
|
||
"gemini-2.5-flash": "Gemini 2.5 Flash",
|
||
"gemini-2.5-pro": "Gemini 2.5 Pro",
|
||
"gemini-2.5-flash-lite": "Gemini 2.5 Flash Lite",
|
||
}
|
||
|
||
# Errors from REST that should trigger gRPC fallback
|
||
_GRPC_FALLBACK_REST_ERRORS = {404} # Model not found via REST (model exists in gRPC but not REST)
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Config
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
DEFAULT_MODELS = {
|
||
"openai-compat": [
|
||
{"id": "gpt-4o-mini", "object": "model", "created": 1700000000, "owned_by": "custom"},
|
||
],
|
||
"anthropic": [
|
||
{"id": "claude-sonnet-4-20250514", "object": "model", "created": 1700000000, "owned_by": "anthropic"},
|
||
],
|
||
"codebuff": [
|
||
{"id": "deepseek/deepseek-v4-pro", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
|
||
{"id": "deepseek/deepseek-v4-flash", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
|
||
{"id": "moonshotai/kimi-k2.6", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
|
||
{"id": "minimax/minimax-m2.7", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
|
||
],
|
||
"auto": [
|
||
{"id": "default-model", "object": "model", "created": 1700000000, "owned_by": "auto"},
|
||
],
|
||
}
|
||
|
||
def load_config():
|
||
global _CONFIG_PATH, _CONFIG_MTIME
|
||
p = argparse.ArgumentParser(description="Responses API translation proxy")
|
||
p.add_argument("--config", help="JSON config file path")
|
||
p.add_argument("--port", type=int, default=None)
|
||
p.add_argument("--backend", default=None, choices=["openai-compat", "anthropic", "command-code", "codebuff", "freebuff", "auto"])
|
||
p.add_argument("--target-url", default=None)
|
||
p.add_argument("--api-key", default=None)
|
||
p.add_argument("--models-file", default=None, help="JSON file with model list array")
|
||
_args = p.parse_args()
|
||
|
||
cfg = {}
|
||
if _args.config:
|
||
_CONFIG_PATH = os.path.abspath(_args.config)
|
||
with open(_args.config) as f:
|
||
cfg = json.load(f)
|
||
try:
|
||
_CONFIG_MTIME = os.path.getmtime(_CONFIG_PATH)
|
||
except OSError:
|
||
pass
|
||
|
||
for ck, ak in [("port", "port"), ("backend_type", "backend"),
|
||
("target_url", "target_url"), ("api_key", "api_key")]:
|
||
v = getattr(_args, ak, None)
|
||
if v is not None:
|
||
cfg[ck] = v
|
||
|
||
env_map = {
|
||
"port": ("PROXY_PORT", "ZAI_PROXY_PORT", int),
|
||
"backend_type": ("PROXY_BACKEND", None, str),
|
||
"target_url": ("PROXY_TARGET_URL", "ZAI_BASE_URL", str),
|
||
"api_key": ("PROXY_API_KEY", "ZAI_API_KEY", str),
|
||
"vision_fallback_url": ("VISION_FALLBACK_URL", None, str),
|
||
"vision_fallback_model": ("VISION_FALLBACK_MODEL", None, str),
|
||
"vision_fallback_key": ("VISION_FALLBACK_KEY", None, str),
|
||
}
|
||
for ck, (ev1, ev2, conv) in env_map.items():
|
||
if ck not in cfg:
|
||
v = os.environ.get(ev1) or (os.environ.get(ev2) if ev2 else None)
|
||
if v:
|
||
cfg[ck] = conv(v) if conv == int else v
|
||
|
||
cfg.setdefault("port", 8080)
|
||
cfg.setdefault("backend_type", "openai-compat")
|
||
cfg.setdefault("target_url", "http://localhost:11434/v1")
|
||
cfg.setdefault("api_key", "")
|
||
|
||
models = cfg.get("models", [])
|
||
if not models and _args.models_file:
|
||
with open(_args.models_file) as f:
|
||
models = json.load(f)
|
||
if not models:
|
||
models = DEFAULT_MODELS.get(cfg["backend_type"], [])
|
||
cfg["models"] = models
|
||
|
||
return cfg
|
||
|
||
CONFIG = None
|
||
_CONFIG_PATH = None
|
||
_CONFIG_MTIME = 0
|
||
PORT = 8080
|
||
BACKEND = "openai-compat"
|
||
TARGET_URL = ""
|
||
API_KEY = ""
|
||
OAUTH_PROVIDER = ""
|
||
MODELS = []
|
||
CC_VERSION = ""
|
||
REASONING_ENABLED = True
|
||
REASONING_EFFORT = "medium"
|
||
FORCE_MODEL = ""
|
||
BGP_ROUTES = []
|
||
PROMPT_ENHANCER = False
|
||
PROMPT_ENHANCER_MODE = "offline"
|
||
PROMPT_ENHANCER_MODEL = ""
|
||
PROMPT_ENHANCER_URL = ""
|
||
PROMPT_ENHANCER_KEY = ""
|
||
VISION_FALLBACK_URL = ""
|
||
VISION_FALLBACK_MODEL = ""
|
||
VISION_FALLBACK_KEY = ""
|
||
SERVER = None
|
||
|
||
if _IS_WINDOWS:
|
||
_LOG_DIR = os.path.join(os.environ.get("LOCALAPPDATA", os.path.expanduser("~")), "codex-proxy")
|
||
else:
|
||
_LOG_DIR = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy")
|
||
os.makedirs(_LOG_DIR, exist_ok=True)
|
||
_REQUESTS_DIR = os.path.join(_LOG_DIR, "requests")
|
||
os.makedirs(_REQUESTS_DIR, exist_ok=True)
|
||
try:
|
||
for _f in os.listdir(_REQUESTS_DIR):
|
||
if _f.endswith(".tmp"):
|
||
os.remove(os.path.join(_REQUESTS_DIR, _f))
|
||
except Exception:
|
||
pass
|
||
_stats_path = os.path.join(_LOG_DIR, "usage-stats.json")
|
||
_provider_caps_path = os.path.join(_LOG_DIR, "provider-caps.json")
|
||
_stats_lock = threading.Lock()
|
||
_stats_pending = []
|
||
_stats_flush_timer = None
|
||
_STATS_FLUSH_INTERVAL = 5.0
|
||
_STATS = {}
|
||
|
||
try:
|
||
_LOG_FILE = open(os.path.join(_LOG_DIR, "proxy.log"), "a", encoding="utf-8")
|
||
except Exception:
|
||
_LOG_FILE = None
|
||
|
||
_response_store = collections.OrderedDict()
|
||
_response_store_lock = threading.Lock()
|
||
_MAX_STORED = 50
|
||
_RESPONSE_TTL = 600
|
||
|
||
_fb_reasoning_store = collections.OrderedDict()
|
||
_fb_reasoning_store_lock = threading.Lock()
|
||
|
||
_deepseek_reasoning_store = {}
|
||
_deepseek_reasoning_lock = threading.Lock()
|
||
_MAX_DS_STORED = 100
|
||
|
||
_last_reasoning_store = {}
|
||
_last_reasoning_lock = threading.Lock()
|
||
|
||
_crof_lock = threading.Lock()
|
||
_provider_caps_lock = threading.Lock()
|
||
_provider_caps = None
|
||
|
||
_shutdown_requested = False
|
||
_active_connections = 0
|
||
_active_connections_lock = threading.Lock()
|
||
_active_requests = {}
|
||
_active_requests_lock = threading.Lock()
|
||
|
||
_pool = uuid.uuid4().hex[:8]
|
||
_antigravity_version = "2.0.1"
|
||
_antigravity_version_checked = 0
|
||
_antigravity_version_lock = threading.Lock()
|
||
_antigravity_version_validated = False
|
||
_last_user_urls = collections.deque(maxlen=20)
|
||
|
||
_conn_pool_lock = threading.Lock()
|
||
_conn_pool = {}
|
||
|
||
_STREAM_IDLE_TIMEOUT = 300
|
||
|
||
def _idle_timeout_for_model(model, default=300):
|
||
if not model:
|
||
return default
|
||
m = model.lower()
|
||
if "flash" in m or "mini" in m or "haiku" in m:
|
||
return 120
|
||
return default
|
||
_MAX_CONCURRENT_REQUESTS = 3
|
||
_request_semaphore = threading.Semaphore(_MAX_CONCURRENT_REQUESTS)
|
||
|
||
_CODEBUFF_AUTH_URL = "https://www.codebuff.com"
|
||
_CODEBUFF_API_URL = "https://www.codebuff.com"
|
||
_CODEBUFF_AGENT_MAP = {
|
||
"deepseek/deepseek-v4-pro": "base2-free-deepseek",
|
||
"deepseek/deepseek-v4-flash": "base2-free-deepseek-flash",
|
||
"moonshotai/kimi-k2.6": "base2-free-kimi",
|
||
"minimax/minimax-m2.7": "base2-free",
|
||
}
|
||
if _IS_WINDOWS:
|
||
_CODEBUFF_CREDS_PATH = os.path.join(os.environ.get("APPDATA", os.path.expanduser("~")), "manicode", "credentials.json")
|
||
else:
|
||
_CODEBUFF_CREDS_PATH = os.path.join(os.path.expanduser("~"), ".config", "manicode", "credentials.json")
|
||
_codebuff_token_cache = {"token": None, "checked": 0}
|
||
_codebuff_session_cache = {"instance_id": None, "expires": 0, "model": None}
|
||
_codebuff_token_lock = threading.Lock()
|
||
|
||
def _get_codebuff_token():
|
||
with _codebuff_token_lock:
|
||
if _codebuff_token_cache["token"] and _codebuff_token_cache["checked"] > time.time() - 300:
|
||
return _codebuff_token_cache["token"]
|
||
try:
|
||
with open(_CODEBUFF_CREDS_PATH) as f:
|
||
creds = json.load(f)
|
||
default_account = creds.get("default", {})
|
||
token = default_account.get("authToken") or creds.get("apiKey") or ""
|
||
with _codebuff_token_lock:
|
||
_codebuff_token_cache["token"] = token
|
||
_codebuff_token_cache["checked"] = time.time()
|
||
return token
|
||
except Exception as e:
|
||
print(f"[codebuff] no credentials at {_CODEBUFF_CREDS_PATH}: {e}", file=sys.stderr)
|
||
return ""
|
||
|
||
def _codebuff_get_session(token, model):
|
||
with _codebuff_token_lock:
|
||
sc = _codebuff_session_cache
|
||
if sc["instance_id"] and sc["expires"] > time.time() + 60 and sc["model"] == model:
|
||
return sc["instance_id"]
|
||
try:
|
||
url = f"{_CODEBUFF_API_URL}/api/v1/freebuff/session"
|
||
body = json.dumps({}).encode()
|
||
req = urllib.request.Request(url, data=body, headers={
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {token}",
|
||
"User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
|
||
"x-codebuff-model": model,
|
||
})
|
||
try:
|
||
resp = urllib.request.urlopen(req, timeout=15)
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()[:1000]
|
||
if e.code == 429:
|
||
retry_s = 120
|
||
user_msg = ""
|
||
try:
|
||
err_data = json.loads(err_body)
|
||
retry_ms = err_data.get("retryAfterMs", 0)
|
||
if retry_ms:
|
||
retry_s = retry_ms / 1000
|
||
user_msg = err_data.get("message", err_data.get("error", ""))
|
||
if isinstance(user_msg, dict):
|
||
user_msg = user_msg.get("message", "")
|
||
except Exception:
|
||
pass
|
||
if not user_msg:
|
||
user_msg = _sanitize_err_body(err_body)
|
||
raise RateLimitError(retry_s, user_msg)
|
||
print(f"[codebuff] session HTTP {e.code}: {err_body[:200]}", file=sys.stderr)
|
||
return None
|
||
data = json.loads(resp.read())
|
||
instance_id = data.get("instanceId", data.get("data", {}).get("instance_id", ""))
|
||
expires_at = data.get("remainingMs", 0)
|
||
if instance_id:
|
||
with _codebuff_token_lock:
|
||
_codebuff_session_cache["instance_id"] = instance_id
|
||
_codebuff_session_cache["expires"] = time.time() + min(expires_at / 1000, 3600)
|
||
_codebuff_session_cache["model"] = model
|
||
print(f"[codebuff] session active, instance={instance_id[:8]}...", file=sys.stderr)
|
||
return instance_id
|
||
return None
|
||
except RateLimitError:
|
||
raise
|
||
except Exception as e:
|
||
print(f"[codebuff] session failed: {e}", file=sys.stderr)
|
||
return None
|
||
|
||
def _codebuff_start_run(token, agent_id):
|
||
url = f"{_CODEBUFF_API_URL}/api/v1/agent-runs"
|
||
body = json.dumps({"action": "START", "agentId": agent_id, "ancestorRunIds": []}).encode()
|
||
req = urllib.request.Request(url, data=body, headers={
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {token}",
|
||
"User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
|
||
})
|
||
try:
|
||
resp = urllib.request.urlopen(req, timeout=15)
|
||
data = json.loads(resp.read())
|
||
run_id = data.get("runId")
|
||
print(f"[codebuff] started run {run_id} for agent {agent_id}", file=sys.stderr)
|
||
return run_id, None
|
||
except urllib.error.HTTPError as e:
|
||
err = e.read().decode()[:500]
|
||
print(f"[codebuff] start run failed: HTTP {e.code}: {err}", file=sys.stderr)
|
||
if e.code == 429:
|
||
retry_s = 120
|
||
try:
|
||
err_data = json.loads(err)
|
||
retry_ms = err_data.get("retryAfterMs", 0)
|
||
if retry_ms:
|
||
retry_s = retry_ms / 1000
|
||
except Exception:
|
||
pass
|
||
return None, ("rate_limit_error", 429, retry_s, _sanitize_err_body(err))
|
||
return None, ("upstream_error", e.code, 0, _sanitize_err_body(err))
|
||
except Exception as e:
|
||
print(f"[codebuff] start run error: {e}", file=sys.stderr)
|
||
return None, ("proxy_error", 502, 0, str(e))
|
||
|
||
def _codebuff_finish_run(token, run_id, status="completed"):
|
||
url = f"{_CODEBUFF_API_URL}/api/v1/agent-runs"
|
||
body = json.dumps({"action": "FINISH", "runId": run_id, "status": status,
|
||
"totalSteps": 1, "directCredits": 0, "totalCredits": 0}).encode()
|
||
req = urllib.request.Request(url, data=body, headers={
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {token}",
|
||
"User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
|
||
})
|
||
try:
|
||
urllib.request.urlopen(req, timeout=10)
|
||
except Exception as e:
|
||
print(f"[codebuff] finish run {run_id} error: {e}", file=sys.stderr)
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Multi-account rotation system
|
||
class RateLimitError(Exception):
|
||
def __init__(self, retry_seconds, message=""):
|
||
self.retry_seconds = retry_seconds
|
||
self.message = message
|
||
super().__init__(f"rate-limited for {retry_seconds:.0f}s: {message}")
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
class AccountPool:
|
||
"""Manages multiple accounts for a provider. Rotates on rate-limit (429/426)."""
|
||
|
||
def __init__(self, provider_name):
|
||
self.provider_name = provider_name
|
||
self._lock = threading.Lock()
|
||
self._accounts = []
|
||
self._rate_limited = {}
|
||
self._current_idx = 0
|
||
self._loaded_at = 0
|
||
|
||
def load_accounts(self, force=False):
|
||
with self._lock:
|
||
if not force and self._accounts and time.time() - self._loaded_at < 60:
|
||
return len(self._accounts)
|
||
accounts = self._do_load()
|
||
with self._lock:
|
||
if accounts:
|
||
self._accounts = accounts
|
||
self._loaded_at = time.time()
|
||
for a in accounts:
|
||
key = a.get("id", a.get("email", ""))
|
||
if key not in self._rate_limited:
|
||
self._rate_limited[key] = 0
|
||
return len(self._accounts) if accounts else 0
|
||
|
||
def _do_load(self):
|
||
return []
|
||
|
||
def get(self):
|
||
"""Return the best available account dict, or None."""
|
||
self.load_accounts()
|
||
with self._lock:
|
||
if not self._accounts:
|
||
return None
|
||
now = time.time()
|
||
n = len(self._accounts)
|
||
for attempt in range(n):
|
||
idx = (self._current_idx + attempt) % n
|
||
acct = self._accounts[idx]
|
||
key = acct.get("id", acct.get("email", ""))
|
||
if self._rate_limited.get(key, 0) < now:
|
||
self._current_idx = idx
|
||
return acct
|
||
best_key = min(self._rate_limited, key=self._rate_limited.get)
|
||
wait = self._rate_limited[best_key] - now
|
||
print(f"[{self.provider_name}] all accounts rate-limited, earliest free in {wait:.0f}s", file=sys.stderr)
|
||
return self._accounts[self._current_idx]
|
||
|
||
def mark_rate_limited(self, account, duration=120):
|
||
key = account.get("id", account.get("email", ""))
|
||
with self._lock:
|
||
self._rate_limited[key] = time.time() + duration
|
||
idx = None
|
||
for i, a in enumerate(self._accounts):
|
||
if a.get("id", a.get("email", "")) == key:
|
||
idx = i
|
||
break
|
||
if idx is not None:
|
||
self._current_idx = (idx + 1) % len(self._accounts)
|
||
print(f"[{self.provider_name}] account {key} rate-limited for {duration}s, rotating to next", file=sys.stderr)
|
||
|
||
def advance(self):
|
||
with self._lock:
|
||
if self._accounts:
|
||
self._current_idx = (self._current_idx + 1) % len(self._accounts)
|
||
|
||
def status(self):
|
||
with self._lock:
|
||
now = time.time()
|
||
result = []
|
||
for a in self._accounts:
|
||
key = a.get("id", a.get("email", ""))
|
||
rl_until = self._rate_limited.get(key, 0)
|
||
info = {"id": key, "email": a.get("email", ""), "rate_limited": rl_until > now}
|
||
if rl_until > now:
|
||
info["rate_limited_until"] = rl_until
|
||
info["resets_in"] = int(rl_until - now)
|
||
result.append(info)
|
||
return result
|
||
|
||
class CodebuffAccountPool(AccountPool):
|
||
def _do_load(self):
|
||
if not os.path.exists(_CODEBUFF_CREDS_PATH):
|
||
return None
|
||
try:
|
||
with open(_CODEBUFF_CREDS_PATH) as f:
|
||
creds = json.load(f)
|
||
except Exception:
|
||
return None
|
||
accounts = []
|
||
if "accounts" in creds and isinstance(creds["accounts"], list):
|
||
for i, ac in enumerate(creds["accounts"]):
|
||
token = ac.get("authToken") or ac.get("apiKey") or ""
|
||
if token:
|
||
acct = {"id": ac.get("email") or ac.get("id") or f"account-{i}", "token": token, "email": ac.get("email", "")}
|
||
accounts.append(acct)
|
||
default = creds.get("default", {})
|
||
default_token = default.get("authToken") or creds.get("apiKey") or ""
|
||
if default_token:
|
||
default_id = default.get("email") or default.get("id") or "default"
|
||
if not any(a["id"] == default_id for a in accounts):
|
||
accounts.insert(0, {"id": default_id, "token": default_token, "email": default.get("email", "")})
|
||
return accounts if accounts else None
|
||
|
||
class GoogleAccountPool(AccountPool):
|
||
def __init__(self, variant):
|
||
super().__init__(f"google-{variant}")
|
||
self.variant = variant
|
||
|
||
def _do_load(self):
|
||
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy")
|
||
accounts = []
|
||
primary = f"google-{self.variant}-oauth-token.json"
|
||
primary_path = os.path.join(cache_dir, primary)
|
||
if os.path.exists(primary_path):
|
||
try:
|
||
with open(primary_path) as f:
|
||
tok = json.load(f)
|
||
token = tok.get("access_token", "")
|
||
if token:
|
||
accounts.append({"id": f"google-{self.variant}-primary", "token": token, "email": tok.get("email", ""), "_token_data": tok, "_path": primary_path})
|
||
except Exception:
|
||
pass
|
||
idx = 1
|
||
while True:
|
||
extra = f"google-{self.variant}-oauth-token-{idx}.json"
|
||
extra_path = os.path.join(cache_dir, extra)
|
||
if not os.path.exists(extra_path):
|
||
break
|
||
try:
|
||
with open(extra_path) as f:
|
||
tok = json.load(f)
|
||
token = tok.get("access_token", "")
|
||
if token:
|
||
accounts.append({"id": f"google-{self.variant}-{idx}", "token": token, "email": tok.get("email", ""), "_token_data": tok, "_path": extra_path})
|
||
except Exception:
|
||
pass
|
||
idx += 1
|
||
return accounts if accounts else None
|
||
|
||
class APIKeyPool(AccountPool):
|
||
"""Rotates through comma-separated API keys."""
|
||
|
||
def __init__(self, provider_name, keys_str):
|
||
super().__init__(provider_name)
|
||
self._raw_keys = [k.strip() for k in keys_str.split(",") if k.strip()]
|
||
self._accounts = [{"id": f"key-{i}", "token": k, "email": f"key-{i}"} for i, k in enumerate(self._raw_keys)]
|
||
for a in self._accounts:
|
||
self._rate_limited[a["id"]] = 0
|
||
self._loaded_at = time.time()
|
||
|
||
def load_accounts(self, force=False):
|
||
return len(self._accounts)
|
||
|
||
_cb_pool = CodebuffAccountPool("codebuff")
|
||
_google_antigravity_pool = GoogleAccountPool("antigravity")
|
||
_google_cli_pool = GoogleAccountPool("cli")
|
||
_antigravity_preferred_endpoint = None
|
||
_antigravity_endpoint_lock = threading.Lock()
|
||
|
||
def _classify_antigravity_error(status_code, body):
|
||
lower = body.lower()
|
||
if status_code == 400:
|
||
return "bad_request"
|
||
if status_code == 401:
|
||
if any(x in lower for x in ["invalid_grant", "token revoked", "token_revoked", "invalid_client"]):
|
||
return "auth_permanent"
|
||
return "auth_transient"
|
||
if status_code == 403:
|
||
if "validation_required" in lower or "account_disabled" in lower:
|
||
return "validation_required"
|
||
if "has been disabled" in lower and "violation of terms of service" in lower:
|
||
return "account_banned"
|
||
if "service_disabled" in lower:
|
||
return "service_disabled"
|
||
return "forbidden"
|
||
if status_code in (429, 503, 529):
|
||
if any(x in lower for x in ["model_capacity_exhausted", "capacity_exhausted", "model is currently overloaded", "service temporarily unavailable"]):
|
||
return "capacity_exhausted"
|
||
if any(x in lower for x in ["quota_exhausted", "resource_exhausted", "daily limit", "quota exceeded", "quotaresetdelay"]):
|
||
return "quota_exhausted"
|
||
return "rate_limited"
|
||
if status_code >= 500:
|
||
return "server_error"
|
||
return "unknown"
|
||
|
||
def _parse_rate_limit_reset(body):
|
||
import re as _re
|
||
m = _re.search(r'quotaResetDelay[:"\s]+(\d+(?:\.\d+)?)(ms|s)', body, _re.IGNORECASE)
|
||
if m:
|
||
val = float(m.group(1))
|
||
return val / 1000 if m.group(2) == 'ms' else val
|
||
m = _re.search(r'(\d+)h(\d+)m(\d+)s', body, _re.IGNORECASE)
|
||
if m:
|
||
return int(m.group(1)) * 3600 + int(m.group(2)) * 60 + int(m.group(3))
|
||
m = _re.search(r'Resets in ~(\d+)h(\d+)m', body, _re.IGNORECASE)
|
||
if m:
|
||
return int(m.group(1)) * 3600 + int(m.group(2)) * 60
|
||
m = _re.search(r'retry[-_]?after[:\s]+(\d+)\s*(?:sec|s\b)', body, _re.IGNORECASE)
|
||
if m:
|
||
return int(m.group(1))
|
||
return None
|
||
|
||
def _get_codebuff_account():
|
||
"""Return (token, account_dict) for best available codebuff account."""
|
||
_cb_pool.load_accounts()
|
||
acct = _cb_pool.get()
|
||
if not acct:
|
||
return "", None
|
||
return acct["token"], acct
|
||
|
||
def _get_google_account(oauth_provider):
|
||
"""Return (access_token, account_dict) for best available Google account."""
|
||
pool = _google_antigravity_pool if oauth_provider == "google-antigravity" else _google_cli_pool
|
||
pool.load_accounts()
|
||
acct = pool.get()
|
||
if not acct:
|
||
return None, None
|
||
token_data = acct.get("_token_data", {})
|
||
token_path = acct.get("_path", "")
|
||
if token_data and token_path:
|
||
refreshed = _refresh_google_token(token_data, token_path)
|
||
return refreshed, acct
|
||
return acct.get("token", ""), acct
|
||
|
||
def _refresh_google_token(token_data, token_path):
|
||
if token_data.get("expires_at", 0) > time.time() + 60:
|
||
return token_data.get("access_token", "")
|
||
client_id = token_data.get("client_id", "")
|
||
client_secret = token_data.get("client_secret", "")
|
||
refresh_token = token_data.get("refresh_token", "")
|
||
if not all([client_id, client_secret, refresh_token]):
|
||
return token_data.get("access_token", "")
|
||
print("[oauth] refreshing Google access token...", file=sys.stderr)
|
||
try:
|
||
data = urllib.parse.urlencode({
|
||
"client_id": client_id, "client_secret": client_secret,
|
||
"refresh_token": refresh_token, "grant_type": "refresh_token",
|
||
}).encode()
|
||
req = urllib.request.Request("https://oauth2.googleapis.com/token", data=data,
|
||
headers={"Content-Type": "application/x-www-form-urlencoded"})
|
||
resp = urllib.request.urlopen(req, timeout=30)
|
||
new_tokens = json.loads(resp.read())
|
||
token_data["access_token"] = new_tokens.get("access_token", token_data.get("access_token"))
|
||
token_data["expires_at"] = time.time() + new_tokens.get("expires_in", 3600)
|
||
with open(token_path, "w", encoding="utf-8") as f:
|
||
json.dump(token_data, f, indent=2)
|
||
print("[oauth] token refreshed OK", file=sys.stderr)
|
||
return token_data["access_token"]
|
||
except Exception as e:
|
||
print(f"[oauth] refresh failed: {e}", file=sys.stderr)
|
||
return token_data.get("access_token", "")
|
||
|
||
def _force_refresh_google_token():
|
||
token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy",
|
||
"google-antigravity-oauth-token.json" if OAUTH_PROVIDER == "google-antigravity"
|
||
else "google-oauth-token.json")
|
||
try:
|
||
with open(token_path) as f:
|
||
token_data = json.load(f)
|
||
token_data["expires_at"] = 0
|
||
new_token = _refresh_google_token(token_data, token_path)
|
||
return bool(new_token)
|
||
except Exception as e:
|
||
print(f"[oauth] force refresh failed: {e}", file=sys.stderr)
|
||
return False
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Gemini 3 thought signature preservation
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_gemini_sig_store = {}
|
||
_gemini_sig_lock = threading.Lock()
|
||
|
||
def _gemini_store_sig(key, signature):
|
||
if not key or not signature:
|
||
return
|
||
with _gemini_sig_lock:
|
||
_gemini_sig_store[key] = {"sig": signature, "ts": time.time()}
|
||
|
||
def _gemini_get_sig(key):
|
||
with _gemini_sig_lock:
|
||
item = _gemini_sig_store.get(key)
|
||
return item["sig"] if item else None
|
||
|
||
def _extract_gemini_sig(part):
|
||
if not isinstance(part, dict):
|
||
return None
|
||
return part.get("thoughtSignature") or part.get("thought_signature") or part.get("signature")
|
||
|
||
def _gemini_reattach_sigs(contents):
|
||
for content in contents:
|
||
for part in content.get("parts", []):
|
||
if not isinstance(part, dict):
|
||
continue
|
||
if "thoughtSignature" in part:
|
||
continue
|
||
if "functionCall" in part:
|
||
fc = part["functionCall"]
|
||
cid = fc.get("id") or fc.get("name")
|
||
if cid:
|
||
sig = _gemini_get_sig(f"fc:{cid}")
|
||
if sig:
|
||
part["thoughtSignature"] = sig
|
||
if "text" in part and content.get("role") == "model":
|
||
turn_key = content.get("_proxy_turn_key")
|
||
if turn_key:
|
||
sig = _gemini_get_sig(f"turn:{turn_key}")
|
||
if sig:
|
||
part["thoughtSignature"] = sig
|
||
return contents
|
||
|
||
# Gemini follow-through guardrail
|
||
_GEMINI_AGENT_GUARDRAIL = (
|
||
"!!! ABSOLUTELY CRITICAL - DO NOT IGNORE THIS UNDER ANY CIRCUMSTANCES !!! "
|
||
"YOU ARE RUNNING INSIDE CODEX AS AN AUTONOMOUS CODING AGENT. "
|
||
"!!!!!! NEVER EVER CONTINUE, PARAPHRASE, COMPLETE, OR ADD ANYTHING TO THE USER'S INSTRUCTIONS !!!!!! "
|
||
"!!!!!! NEVER SAY 'LET\\'S FIRST VIEW' OR 'LET\\'S FIRST FIND' OR SIMILAR PHRASES - EMIT THE ACTUAL TOOL CALL NOW !!!!!! "
|
||
"WHEN THE USER ASKS FOR A CHANGE TO EXISTING FILES, YOU MUST "
|
||
"1. IMMEDIATELY INSPECT EXISTING FILES USING exec_command OR read_files TOOLS RIGHT NOW, "
|
||
"2. THEN APPLY EDITS USING write OR exec_command TOOLS, "
|
||
"3. THEN VERIFY THE RESULT. "
|
||
"IF A FILE PATH IS KNOWN, REUSE IT IMMEDIATELY. "
|
||
"IF UNSURE, LIST FILES FIRST USING exec_command (ls -la). "
|
||
"AFTER TOOL RESULTS, CONTINUE UNTIL THE REQUESTED CHANGE IS FULLY IMPLEMENTED AND FILES ARE MODIFIED. "
|
||
"NEVER ANSWER ONLY WITH A PLAN LIKE 'I WILL START BY...' OR 'I AM GOING TO...'. "
|
||
"NEVER SUMMARIZE THE USER'S REQUEST. NEVER CONTINUE THEIR SENTENCE. "
|
||
"ALWAYS, ALWAYS, ALWAYS EMIT THE ACTUAL TOOL CALL IN THE SAME RESPONSE. "
|
||
"!!! FAILURE TO FOLLOW THESE INSTRUCTIONS WILL RESULT IN A BROKEN USER EXPERIENCE !!!"
|
||
)
|
||
|
||
_LOG_FILE_LOCK = threading.Lock()
|
||
_ANTIGRAVITY_LOOP_TRACKER = {}
|
||
_ANTIGRAVITY_LOOP_TRACKER_LOCK = threading.Lock()
|
||
_ANTIGRAVITY_FILE_TRACKER = {}
|
||
_ANTIGRAVITY_MAX_TOOL_CALLS_PER_TASK = 150
|
||
_ANTIGRAVITY_WARN_TOOL_CALLS_PER_TASK = 80
|
||
def _antigravity_loop_key(session_id, user_request_hash=None):
|
||
if user_request_hash:
|
||
return f"ag:task:{user_request_hash}"
|
||
return f"ag:{session_id}"
|
||
|
||
def _validate_antigravity_version(version, access_token=None, project_id=None):
|
||
if not version or not re.match(r"^\d+\.\d+\.\d+$", version):
|
||
return False
|
||
try:
|
||
if not access_token:
|
||
access_token = _refresh_oauth_token()
|
||
if not project_id:
|
||
token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", "google-antigravity-oauth-token.json")
|
||
try:
|
||
with open(token_path) as f:
|
||
project_id = json.load(f).get("project_id", "")
|
||
except Exception:
|
||
pass
|
||
if not access_token or not project_id:
|
||
return True
|
||
import platform as _plat
|
||
_os_name = _plat.system().lower()
|
||
_os_arch = _plat.machine().lower().replace("x86_64", "x64").replace("aarch64", "arm64")
|
||
ua = f"antigravity/{version} {_os_name}/{_os_arch}"
|
||
body = {
|
||
"project": project_id,
|
||
"model": "gemini-3-flash",
|
||
"requestType": "agent",
|
||
"userAgent": ua,
|
||
"requestId": f"probe-{uuid.uuid4().hex[:8]}",
|
||
"request": {
|
||
"contents": [{"role": "user", "parts": [{"text": "hi"}]}],
|
||
"sessionId": f"probe{int(time.time()*1000)}",
|
||
"safetySettings": [
|
||
{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_CIVIC_INTEGRITY", "threshold": "OFF"},
|
||
],
|
||
"generationConfig": {"maxOutputTokens": 32, "stopSequences": ["\n\nHuman:", "[DONE]"]},
|
||
}
|
||
}
|
||
url = "https://daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse"
|
||
headers = {
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {access_token}",
|
||
"User-Agent": ua,
|
||
}
|
||
req = urllib.request.Request(url, data=json.dumps(body).encode(), headers=headers)
|
||
resp = urllib.request.urlopen(req, timeout=15)
|
||
data = resp.read().decode()
|
||
if "no longer supported" in data.lower():
|
||
print(f"[antigravity-version] version {version} rejected (deprecated)", file=sys.stderr)
|
||
return False
|
||
return True
|
||
except urllib.error.HTTPError as e:
|
||
if e.code == 404:
|
||
print(f"[antigravity-version] version {version} rejected (404)", file=sys.stderr)
|
||
return False
|
||
return True
|
||
except Exception as e:
|
||
print(f"[antigravity-version] probe error for {version}: {e}", file=sys.stderr)
|
||
return True
|
||
|
||
def _fetch_antigravity_version():
|
||
cache_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", "antigravity-version.json")
|
||
try:
|
||
with open(cache_path) as f:
|
||
cached = json.load(f)
|
||
if cached.get("version") and cached.get("validated") and cached.get("checked_at", 0) > time.time() - 6 * 3600:
|
||
return cached["version"]
|
||
except Exception:
|
||
pass
|
||
|
||
access_token = None
|
||
project_id = None
|
||
try:
|
||
access_token = _refresh_oauth_token()
|
||
token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", "google-antigravity-oauth-token.json")
|
||
with open(token_path) as f:
|
||
project_id = json.load(f).get("project_id", "")
|
||
except Exception:
|
||
pass
|
||
|
||
sources = [
|
||
("https://antigravity-auto-updater-974169037036.us-central1.run.app", None),
|
||
("https://antigravity.google/changelog", 5000),
|
||
]
|
||
|
||
candidates = []
|
||
for url, limit in sources:
|
||
try:
|
||
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
|
||
resp = urllib.request.urlopen(req, timeout=5)
|
||
text = resp.read().decode(errors="replace")
|
||
if limit:
|
||
text = text[:limit]
|
||
for m in re.finditer(r"\d+\.\d+\.\d+", text):
|
||
ver = m.group(0)
|
||
if ver not in candidates:
|
||
candidates.append(ver)
|
||
except Exception:
|
||
pass
|
||
|
||
for ver in candidates:
|
||
if _validate_antigravity_version(ver, access_token, project_id):
|
||
print(f"[antigravity-version] fetched version {ver} validated", file=sys.stderr)
|
||
try:
|
||
os.makedirs(os.path.dirname(cache_path), exist_ok=True)
|
||
with open(cache_path, "w", encoding="utf-8") as f:
|
||
json.dump({"version": ver, "validated": True, "checked_at": time.time()}, f)
|
||
except Exception:
|
||
pass
|
||
return ver
|
||
|
||
fallback = "2.0.1"
|
||
print(f"[antigravity-version] all candidates failed, using fallback {fallback}", file=sys.stderr)
|
||
try:
|
||
os.makedirs(os.path.dirname(cache_path), exist_ok=True)
|
||
with open(cache_path, "w", encoding="utf-8") as f:
|
||
json.dump({"version": fallback, "validated": False, "checked_at": time.time()}, f)
|
||
except Exception:
|
||
pass
|
||
return fallback
|
||
|
||
def _ensure_antigravity_version():
|
||
global _antigravity_version, _antigravity_version_checked, _antigravity_version_validated
|
||
if _antigravity_version_validated and time.time() - _antigravity_version_checked < 6 * 3600:
|
||
return _antigravity_version
|
||
with _antigravity_version_lock:
|
||
if _antigravity_version_validated and time.time() - _antigravity_version_checked < 6 * 3600:
|
||
return _antigravity_version
|
||
_antigravity_version = _fetch_antigravity_version()
|
||
_antigravity_version_checked = time.time()
|
||
_antigravity_version_validated = True
|
||
return _antigravity_version
|
||
|
||
_antigravity_client_version = "1.110.0"
|
||
_antigravity_client_version_checked = 0
|
||
|
||
def _ensure_antigravity_client_version():
|
||
global _antigravity_client_version, _antigravity_client_version_checked
|
||
env_ver = os.environ.get("ANTIGRAVITY_CLIENT_VERSION", "").strip()
|
||
if env_ver:
|
||
return env_ver
|
||
if time.time() - _antigravity_client_version_checked < 6 * 3600:
|
||
return _antigravity_client_version
|
||
_antigravity_client_version = os.environ.get("ANTIGRAVITY_CLIENT_VERSION_FALLBACK", "1.110.0")
|
||
_antigravity_client_version_checked = time.time()
|
||
return _antigravity_client_version
|
||
|
||
_VISION_MODEL_KEYWORDS = ("vl", "vision", "gpt-4o", "gpt-5", "claude-3", "claude-4", "gemini", "qwen-vl", "kimi-vl", "pixtral", "llava")
|
||
|
||
def _auto_detect_vision_fallback(target_url, api_key, models):
|
||
"""Auto-detect a vision-capable model from the current provider for image description."""
|
||
base = target_url.rstrip("/")
|
||
if "/v1" in base:
|
||
chat_url = base.split("/v1")[0] + "/v1/chat/completions"
|
||
else:
|
||
chat_url = base + "/v1/chat/completions"
|
||
vision_model = ""
|
||
for m in (models or []):
|
||
if isinstance(m, dict):
|
||
m = m.get("name", m.get("id", str(m)))
|
||
if not isinstance(m, str):
|
||
continue
|
||
ml = m.lower()
|
||
if any(kw in ml for kw in _VISION_MODEL_KEYWORDS):
|
||
vision_model = m
|
||
break
|
||
if not vision_model:
|
||
return "", "", ""
|
||
return chat_url, vision_model, api_key
|
||
|
||
def _init_runtime():
|
||
global CONFIG, PORT, BACKEND, TARGET_URL, API_KEY, OAUTH_PROVIDER, _antigravity_version
|
||
global MODELS, CC_VERSION, REASONING_ENABLED, REASONING_EFFORT, BGP_ROUTES
|
||
global _api_key_pool, PROMPT_ENHANCER
|
||
global VISION_FALLBACK_URL, VISION_FALLBACK_MODEL, VISION_FALLBACK_KEY
|
||
|
||
CONFIG = load_config()
|
||
PORT = CONFIG["port"]
|
||
BACKEND = CONFIG["backend_type"]
|
||
TARGET_URL = CONFIG["target_url"].rstrip("/")
|
||
API_KEY = CONFIG["api_key"]
|
||
OAUTH_PROVIDER = CONFIG.get("oauth_provider") or ""
|
||
if not OAUTH_PROVIDER and BACKEND == "gemini-oauth-antigravity":
|
||
OAUTH_PROVIDER = "google-antigravity"
|
||
if not OAUTH_PROVIDER and BACKEND == "gemini-oauth":
|
||
OAUTH_PROVIDER = "google-cli"
|
||
MODELS = CONFIG["models"]
|
||
CC_VERSION = CONFIG.get("cc_version", "")
|
||
REASONING_ENABLED = CONFIG.get("reasoning_enabled", True)
|
||
REASONING_EFFORT = CONFIG.get("reasoning_effort", "medium")
|
||
FORCE_MODEL = (CONFIG.get("force_model") or "").strip()
|
||
PROMPT_ENHANCER = CONFIG.get("prompt_enhancer", False)
|
||
PROMPT_ENHANCER_MODE = CONFIG.get("prompt_enhancer_mode", "offline")
|
||
PROMPT_ENHANCER_MODEL = CONFIG.get("prompt_enhancer_model", "")
|
||
PROMPT_ENHANCER_URL = CONFIG.get("prompt_enhancer_url", "")
|
||
PROMPT_ENHANCER_KEY = CONFIG.get("prompt_enhancer_key", "")
|
||
VISION_FALLBACK_URL = CONFIG.get("vision_fallback_url") or ""
|
||
VISION_FALLBACK_MODEL = CONFIG.get("vision_fallback_model") or ""
|
||
VISION_FALLBACK_KEY = CONFIG.get("vision_fallback_key") or ""
|
||
if not VISION_FALLBACK_URL or not VISION_FALLBACK_MODEL:
|
||
_vision_url, _vision_model, _vision_key = _auto_detect_vision_fallback(TARGET_URL, API_KEY, MODELS)
|
||
if not VISION_FALLBACK_URL:
|
||
VISION_FALLBACK_URL = _vision_url
|
||
if not VISION_FALLBACK_MODEL:
|
||
VISION_FALLBACK_MODEL = _vision_model
|
||
if not VISION_FALLBACK_KEY:
|
||
VISION_FALLBACK_KEY = _vision_key
|
||
BGP_ROUTES = CONFIG.get("bgp_routes", [])
|
||
_api_key_pool = None
|
||
if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
|
||
_api_key_pool = APIKeyPool(BACKEND, API_KEY)
|
||
print(f"[multi-account] API key pool: {len(_api_key_pool._accounts)} keys for {BACKEND}", file=sys.stderr)
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
_antigravity_version = _ensure_antigravity_version()
|
||
print(f"[antigravity] version={_antigravity_version}", file=sys.stderr)
|
||
|
||
def _verify_api_key(key, target_url):
|
||
if not key or not target_url:
|
||
return {"valid": False, "error": "missing key or url"}
|
||
test_url = upstream_target(target_url, "/models")
|
||
if not test_url:
|
||
return {"valid": False, "error": "invalid target url"}
|
||
try:
|
||
req = urllib.request.Request(test_url, headers={
|
||
"Authorization": f"Bearer {key}",
|
||
"Content-Type": "application/json",
|
||
})
|
||
resp = urllib.request.urlopen(req, timeout=10)
|
||
body = resp.read().decode()
|
||
model_count = 0
|
||
try:
|
||
data = json.loads(body)
|
||
model_count = len(data.get("data", []))
|
||
except Exception:
|
||
pass
|
||
return {"valid": True, "status": resp.status, "models": model_count}
|
||
except urllib.error.HTTPError as e:
|
||
err = e.read().decode()[:200]
|
||
return {"valid": False, "status": e.code, "error": err}
|
||
except Exception as e:
|
||
return {"valid": False, "error": str(e)[:200]}
|
||
|
||
_HOT_RELOAD_LOCK = threading.Lock()
|
||
|
||
def _hot_reload_api_key():
|
||
global API_KEY, _api_key_pool, _CONFIG_MTIME
|
||
if not _CONFIG_PATH:
|
||
return False
|
||
try:
|
||
cur_mtime = os.path.getmtime(_CONFIG_PATH)
|
||
except OSError:
|
||
return False
|
||
if cur_mtime <= _CONFIG_MTIME:
|
||
return False
|
||
with _HOT_RELOAD_LOCK:
|
||
try:
|
||
cur_mtime2 = os.path.getmtime(_CONFIG_PATH)
|
||
if cur_mtime2 <= _CONFIG_MTIME:
|
||
return False
|
||
with open(_CONFIG_PATH) as f:
|
||
new_cfg = json.load(f)
|
||
new_key = (new_cfg.get("api_key") or "").strip()
|
||
if not new_key or new_key == API_KEY:
|
||
_CONFIG_MTIME = cur_mtime2
|
||
return False
|
||
old_preview = API_KEY[:8] + "..." if len(API_KEY) > 8 else "(empty)"
|
||
new_preview = new_key[:8] + "..." if len(new_key) > 8 else "(empty)"
|
||
API_KEY = new_key
|
||
_CONFIG_MTIME = cur_mtime2
|
||
if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
|
||
_api_key_pool = APIKeyPool(BACKEND, API_KEY)
|
||
print(f"[hot-reload] API key pool refreshed: {len(_api_key_pool._accounts)} keys", file=sys.stderr)
|
||
print(f"[hot-reload] API key updated: {old_preview} -> {new_preview}", file=sys.stderr)
|
||
return True
|
||
except Exception as e:
|
||
print(f"[hot-reload] error: {e}", file=sys.stderr)
|
||
return False
|
||
|
||
bgp_models = []
|
||
for _r in BGP_ROUTES:
|
||
for _m in _r.get("models", [{"id": _r.get("model", "unknown")}]):
|
||
mid = _m.get("id", _m) if isinstance(_m, dict) else _m
|
||
if mid not in bgp_models:
|
||
bgp_models.append(mid)
|
||
if BGP_ROUTES and not MODELS:
|
||
MODELS = [{"id": m, "object": "model", "created": 1700000000, "owned_by": "bgp"} for m in bgp_models]
|
||
CONFIG["models"] = MODELS
|
||
|
||
if (BACKEND or "").startswith("gemini-oauth") and (OAUTH_PROVIDER or "").startswith("google"):
|
||
token_name = "google-antigravity-oauth-token.json" if OAUTH_PROVIDER == "google-antigravity" else "google-cli-oauth-token.json"
|
||
token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
|
||
_preemptive_refresh_token(token_path)
|
||
try:
|
||
with open(token_path) as _tf:
|
||
_td = json.load(_tf)
|
||
_discovered = [] if OAUTH_PROVIDER == "google-antigravity" else _td.get("available_models", [])
|
||
if _discovered:
|
||
_seen = []
|
||
for _m in _discovered:
|
||
if _m not in _seen:
|
||
_seen.append(_m)
|
||
MODELS = [{"id": m, "object": "model", "created": 1700000000, "owned_by": "gemini-oauth"} for m in _seen]
|
||
CONFIG["models"] = MODELS
|
||
print(f"[gemini-oauth] loaded {len(_seen)} discovered models: {_seen}", file=sys.stderr)
|
||
except Exception:
|
||
pass
|
||
|
||
def _preemptive_refresh_token(token_path):
|
||
try:
|
||
with open(token_path) as f:
|
||
td = json.load(f)
|
||
expires_at = td.get("expires_at", 0)
|
||
if expires_at and time.time() > expires_at - 300:
|
||
print(f"[oauth] preemptive refresh: token expires in {int(expires_at - time.time())}s", file=sys.stderr)
|
||
except Exception:
|
||
pass
|
||
|
||
def _pooled_urlopen(url, data=None, headers=None, timeout=180):
|
||
parsed = urllib.parse.urlparse(url)
|
||
host = parsed.hostname
|
||
port = parsed.port or (443 if parsed.scheme == "https" else 80)
|
||
pool_key = f"{parsed.scheme}://{host}:{port}"
|
||
with _conn_pool_lock:
|
||
conn = _conn_pool.get(pool_key)
|
||
if conn:
|
||
try:
|
||
sock = conn.sock
|
||
if sock is None or sock._closed if hasattr(sock, '_closed') else False:
|
||
conn = None
|
||
except Exception:
|
||
conn = None
|
||
if conn is None:
|
||
if parsed.scheme == "https":
|
||
conn = http.client.HTTPSConnection(host, port, timeout=timeout)
|
||
else:
|
||
conn = http.client.HTTPConnection(host, port, timeout=timeout)
|
||
with _conn_pool_lock:
|
||
_conn_pool[pool_key] = conn
|
||
path = parsed.path or "/"
|
||
if parsed.query:
|
||
path += "?" + parsed.query
|
||
method = "POST" if data else "GET"
|
||
conn.request(method, path, body=data, headers=headers or {})
|
||
return conn.getresponse()
|
||
|
||
def _response_store_evict():
|
||
with _response_store_lock:
|
||
now = time.time()
|
||
expired = [k for k, v in _response_store.items()
|
||
if isinstance(v, dict) and now - v.get("ts", 0) > _RESPONSE_TTL]
|
||
for k in expired:
|
||
del _response_store[k]
|
||
|
||
def _log_dual(msg, level="INFO"):
|
||
ts = time.strftime("%H:%M:%S")
|
||
line = f"[{ts}] [{level}] {msg}"
|
||
print(line, file=sys.stderr, flush=True)
|
||
with _LOG_FILE_LOCK:
|
||
if _LOG_FILE:
|
||
try:
|
||
_LOG_FILE.write(line + "\n")
|
||
_LOG_FILE.flush()
|
||
except Exception:
|
||
pass
|
||
|
||
def _stream_with_idle_timeout(response, timeout_seconds=None):
|
||
if timeout_seconds is None:
|
||
timeout_seconds = _STREAM_IDLE_TIMEOUT
|
||
sel = selectors.DefaultSelector()
|
||
try:
|
||
sock = response if hasattr(response, 'fp') and response.fp else response
|
||
raw_sock = getattr(getattr(sock, 'fp', None), 'raw', None) or getattr(sock, '_sock', None)
|
||
if raw_sock is None:
|
||
for chunk in response:
|
||
yield chunk
|
||
return
|
||
sel.register(raw_sock, selectors.EVENT_READ)
|
||
while True:
|
||
ready = sel.select(timeout=timeout_seconds)
|
||
if not ready:
|
||
raise TimeoutError(f"Stream idle for {timeout_seconds}s")
|
||
chunk = response.readline()
|
||
if not chunk:
|
||
break
|
||
yield chunk
|
||
finally:
|
||
try:
|
||
sel.close()
|
||
except Exception:
|
||
pass
|
||
|
||
def _provider_cap_key(target_url=None, backend=None, model=None):
|
||
host = urllib.parse.urlparse(target_url or TARGET_URL).netloc.lower()
|
||
return f"{backend or BACKEND}|{host}|{model or '*'}"
|
||
|
||
def _load_provider_caps():
|
||
global _provider_caps
|
||
with _provider_caps_lock:
|
||
if _provider_caps is not None:
|
||
return _provider_caps
|
||
try:
|
||
with open(_provider_caps_path) as f:
|
||
_provider_caps = json.load(f)
|
||
except Exception:
|
||
_provider_caps = {}
|
||
return _provider_caps
|
||
|
||
def _save_provider_caps():
|
||
try:
|
||
os.makedirs(os.path.dirname(_provider_caps_path), exist_ok=True)
|
||
with open(_provider_caps_path, "w", encoding="utf-8") as f:
|
||
json.dump(_provider_caps or {}, f, indent=2)
|
||
except Exception as e:
|
||
print(f"[provider-sensor] failed to save caps: {e}", file=sys.stderr)
|
||
|
||
def _provider_cap(model, key, default=None):
|
||
caps = _load_provider_caps()
|
||
specific = caps.get(_provider_cap_key(model=model), {})
|
||
generic = caps.get(_provider_cap_key(model="*"), {})
|
||
return specific.get(key, generic.get(key, default))
|
||
|
||
def _set_provider_cap(model, key, value, reason=""):
|
||
caps = _load_provider_caps()
|
||
cap_key = _provider_cap_key(model=model)
|
||
caps.setdefault(cap_key, {})[key] = value
|
||
caps[cap_key]["reason"] = reason
|
||
caps[cap_key]["updated_at"] = time.time()
|
||
_save_provider_caps()
|
||
print(f"[provider-sensor] learned {cap_key}: {key}={value} reason={reason}", file=sys.stderr)
|
||
|
||
def _refresh_oauth_token():
|
||
return _refresh_oauth_token_for(API_KEY, OAUTH_PROVIDER)
|
||
|
||
def _refresh_oauth_token_for(api_key, oauth_provider):
|
||
oauth_provider = oauth_provider or ""
|
||
if oauth_provider.startswith("google"):
|
||
token, acct = _get_google_account(oauth_provider)
|
||
if token and acct:
|
||
return token
|
||
if not oauth_provider.startswith("google"):
|
||
return api_key
|
||
token_name = "google-antigravity-oauth-token.json" if oauth_provider == "google-antigravity" else "google-cli-oauth-token.json"
|
||
token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
|
||
if not os.path.exists(token_path):
|
||
return api_key
|
||
try:
|
||
with open(token_path) as f:
|
||
tokens = json.load(f)
|
||
if tokens.get("expires_at", 0) > time.time() + 60:
|
||
return tokens.get("access_token", api_key)
|
||
client_id = tokens.get("client_id", "")
|
||
client_secret = tokens.get("client_secret", "")
|
||
refresh_token = tokens.get("refresh_token", "")
|
||
if not all([client_id, client_secret, refresh_token]):
|
||
return tokens.get("access_token", api_key)
|
||
print("[oauth] refreshing Google access token...", file=sys.stderr)
|
||
data = urllib.parse.urlencode({
|
||
"client_id": client_id, "client_secret": client_secret,
|
||
"refresh_token": refresh_token, "grant_type": "refresh_token",
|
||
}).encode()
|
||
req = urllib.request.Request("https://oauth2.googleapis.com/token", data=data,
|
||
headers={"Content-Type": "application/x-www-form-urlencoded"})
|
||
resp = urllib.request.urlopen(req, timeout=30)
|
||
new_tokens = json.loads(resp.read())
|
||
tokens["access_token"] = new_tokens.get("access_token", tokens.get("access_token"))
|
||
tokens["expires_at"] = time.time() + new_tokens.get("expires_in", 3600)
|
||
with open(token_path, "w", encoding="utf-8") as f:
|
||
json.dump(tokens, f, indent=2)
|
||
print("[oauth] token refreshed OK", file=sys.stderr)
|
||
return tokens["access_token"]
|
||
except Exception as e:
|
||
print(f"[oauth] refresh failed: {e}", file=sys.stderr)
|
||
return API_KEY
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Shared helpers
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_pool = uuid.uuid4().hex[:8]
|
||
|
||
def _load_stats():
|
||
try:
|
||
if os.path.exists(_stats_path):
|
||
return json.load(open(_stats_path))
|
||
except Exception:
|
||
pass
|
||
return {"providers": {}, "updated": None}
|
||
|
||
def _atomic_write_json(path, obj):
|
||
tmp = path + ".tmp"
|
||
with open(tmp, "w", encoding="utf-8") as f:
|
||
json.dump(obj, f, indent=2, ensure_ascii=False)
|
||
os.replace(tmp, path)
|
||
|
||
def _flush_stats():
|
||
global _stats_flush_timer
|
||
with _stats_lock:
|
||
batch = list(_stats_pending)
|
||
_stats_pending.clear()
|
||
_stats_flush_timer = None
|
||
if not batch:
|
||
return
|
||
stats = _load_stats()
|
||
for entry in batch:
|
||
provider = entry["provider"]
|
||
model = entry["model"]
|
||
p = stats["providers"].setdefault(provider, {
|
||
"total_requests": 0, "successes": 0, "failures": 0,
|
||
"total_tokens_in": 0, "total_tokens_out": 0,
|
||
"total_duration_s": 0.0, "models": {}, "last_used": None, "last_error": None,
|
||
})
|
||
p["total_requests"] += 1
|
||
p["total_tokens_in"] += entry["tokens_in"]
|
||
p["total_tokens_out"] += entry["tokens_out"]
|
||
p["total_duration_s"] += entry["duration_s"]
|
||
p["last_used"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(entry["ts"]))
|
||
if entry["success"]:
|
||
p["successes"] += 1
|
||
else:
|
||
p["failures"] += 1
|
||
p["last_error"] = entry.get("error_type") or "unknown"
|
||
m = p["models"].setdefault(model, {"requests": 0, "tokens_in": 0, "tokens_out": 0})
|
||
m["requests"] += 1
|
||
m["tokens_in"] += entry["tokens_in"]
|
||
m["tokens_out"] += entry["tokens_out"]
|
||
stats["updated"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
||
_atomic_write_json(_stats_path, stats)
|
||
|
||
def _record_usage(provider, model, success, duration_s, tokens_in=0, tokens_out=0, error_type=None):
|
||
global _stats_flush_timer
|
||
entry = {
|
||
"provider": provider or "unknown", "model": model or "unknown",
|
||
"success": bool(success), "duration_s": float(duration_s or 0),
|
||
"tokens_in": int(tokens_in or 0), "tokens_out": int(tokens_out or 0),
|
||
"error_type": error_type, "ts": time.time(),
|
||
}
|
||
with _stats_lock:
|
||
_stats_pending.append(entry)
|
||
if _stats_flush_timer is None:
|
||
_stats_flush_timer = threading.Timer(_STATS_FLUSH_INTERVAL, _flush_stats)
|
||
_stats_flush_timer.daemon = True
|
||
_stats_flush_timer.start()
|
||
|
||
def store_response(resp_id, input_data, output_items):
|
||
if not resp_id:
|
||
return
|
||
_response_store_evict()
|
||
with _response_store_lock:
|
||
_response_store[resp_id] = {"input": input_data, "output": output_items, "ts": time.time()}
|
||
while len(_response_store) > _MAX_STORED:
|
||
_response_store.popitem(last=False)
|
||
|
||
def resolve_previous_response(body):
|
||
prev_id = body.get("previous_response_id")
|
||
input_data = body.get("input", "")
|
||
if not prev_id:
|
||
return input_data
|
||
with _response_store_lock:
|
||
stored = _response_store.get(prev_id)
|
||
if not stored:
|
||
return input_data
|
||
prev_input = stored["input"]
|
||
prev_output = stored["output"]
|
||
new_input = input_data if isinstance(input_data, list) else []
|
||
if isinstance(prev_input, list):
|
||
combined = list(prev_input) + list(prev_output) + new_input
|
||
else:
|
||
combined = [{"type": "message", "role": "user", "content": [{"type": "input_text", "text": str(prev_input)}]}] + list(prev_output) + new_input
|
||
return combined
|
||
|
||
def _fb_store_reasoning(resp_id, reasoning_text):
|
||
if not resp_id or not reasoning_text:
|
||
return
|
||
with _fb_reasoning_store_lock:
|
||
_fb_reasoning_store[resp_id] = {"reasoning": reasoning_text, "ts": time.time()}
|
||
while len(_fb_reasoning_store) > _MAX_STORED:
|
||
_fb_reasoning_store.popitem(last=False)
|
||
expired = [k for k, v in _fb_reasoning_store.items() if time.time() - v["ts"] > _RESPONSE_TTL]
|
||
for k in expired:
|
||
del _fb_reasoning_store[k]
|
||
|
||
def _fb_get_reasoning(resp_id):
|
||
if not resp_id:
|
||
return ""
|
||
with _fb_reasoning_store_lock:
|
||
entry = _fb_reasoning_store.get(resp_id)
|
||
return entry["reasoning"] if entry else ""
|
||
|
||
def _fb_get_any_reasoning():
|
||
with _fb_reasoning_store_lock:
|
||
for k in _fb_reasoning_store:
|
||
return _fb_reasoning_store[k]["reasoning"]
|
||
return ""
|
||
|
||
def _codebuff_hard_disable_reasoning(messages):
|
||
"""Strip all reasoning/thinking fields from every message.
|
||
Codebuff rejects mixed reasoning_content histories.
|
||
The final chat body must be clean before POST."""
|
||
for msg in messages:
|
||
if not isinstance(msg, dict):
|
||
continue
|
||
for key in ("reasoning_content", "reasoning", "thinking",
|
||
"thinking_content", "thoughts"):
|
||
msg.pop(key, None)
|
||
|
||
def _is_reasoning_content_error(error_text):
|
||
if not error_text:
|
||
return False
|
||
e = error_text.lower()
|
||
return ("reasoning_content" in e or "thinking mode" in e
|
||
or "must be passed back" in e)
|
||
|
||
def _ds_store_assistant(resp_id, assistant_msg):
|
||
if not resp_id or not isinstance(assistant_msg, dict):
|
||
return
|
||
tool_calls = assistant_msg.get("tool_calls") or []
|
||
reasoning = assistant_msg.get("reasoning_content")
|
||
if not tool_calls or not reasoning:
|
||
return
|
||
with _deepseek_reasoning_lock:
|
||
for tc in tool_calls:
|
||
tc_id = tc.get("id") or tc.get("call_id", "")
|
||
if tc_id:
|
||
_deepseek_reasoning_store[tc_id] = {
|
||
"resp_id": resp_id,
|
||
"assistant": dict(assistant_msg),
|
||
"reasoning_content": reasoning,
|
||
"ts": time.time(),
|
||
}
|
||
keys = list(_deepseek_reasoning_store.keys())
|
||
if len(keys) > _MAX_DS_STORED:
|
||
for k in keys[:len(keys) - _MAX_DS_STORED]:
|
||
del _deepseek_reasoning_store[k]
|
||
|
||
def _ds_rebuild_tool_history(messages):
|
||
with _deepseek_reasoning_lock:
|
||
snapshot = dict(_deepseek_reasoning_store)
|
||
expired = [k for k, v in snapshot.items() if time.time() - v["ts"] > 900]
|
||
for k in expired:
|
||
_deepseek_reasoning_store.pop(k, None)
|
||
snapshot.pop(k, None)
|
||
if not snapshot:
|
||
return messages
|
||
rebuilt = []
|
||
inserted_ids = set()
|
||
for msg in messages:
|
||
if msg.get("role") == "tool":
|
||
tc_id = msg.get("tool_call_id", "")
|
||
stored = snapshot.get(tc_id)
|
||
if stored and tc_id not in inserted_ids:
|
||
am = dict(stored["assistant"])
|
||
if am.get("reasoning_content"):
|
||
rebuilt.append(am)
|
||
inserted_ids.add(tc_id)
|
||
rebuilt.append(msg)
|
||
return rebuilt
|
||
|
||
def _cb_input_to_messages(input_data, instructions=""):
|
||
msgs = []
|
||
tool_name_by_id = {}
|
||
pending_tool_calls = []
|
||
last_flushed_ids = []
|
||
if isinstance(input_data, str):
|
||
msgs.append({"role": "user", "content": input_data})
|
||
elif isinstance(input_data, list):
|
||
for item in input_data:
|
||
t = item.get("type")
|
||
if t == "reasoning":
|
||
continue
|
||
if t == "function_call":
|
||
tcid = item.get("call_id") or item.get("id") or uid("tc")
|
||
pending_tool_calls.append(
|
||
{"id": tcid, "type": "function",
|
||
"function": {"name": item.get("name", ""),
|
||
"arguments": item.get("arguments", "{}")}})
|
||
tool_name_by_id[tcid] = item.get("name", "")
|
||
continue
|
||
if pending_tool_calls:
|
||
last_flushed_ids = [tc["id"] for tc in pending_tool_calls]
|
||
msg = {"role": "assistant", "content": None, "tool_calls": pending_tool_calls}
|
||
msgs.append(msg)
|
||
pending_tool_calls = []
|
||
if t == "message":
|
||
role = item.get("role", "user")
|
||
if role == "developer":
|
||
role = "system"
|
||
text = ""
|
||
content = item.get("content", [])
|
||
if isinstance(content, str):
|
||
text = content
|
||
else:
|
||
for part in content:
|
||
if isinstance(part, str):
|
||
text += part
|
||
continue
|
||
pt = part.get("type", "")
|
||
if pt in ("input_text", "output_text"):
|
||
text += part.get("text", "")
|
||
if text is not None:
|
||
am = {"role": role, "content": text}
|
||
if role == "assistant":
|
||
am["_fb_orig_id"] = item.get("id", "")
|
||
msgs.append(am)
|
||
elif t == "function_call_output":
|
||
tcid = item.get("call_id") or item.get("id") or ""
|
||
if not tcid and last_flushed_ids:
|
||
idx = len([m for m in msgs if m.get("role") == "tool"])
|
||
if idx < len(last_flushed_ids):
|
||
tcid = last_flushed_ids[idx]
|
||
msgs.append({"role": "tool", "tool_call_id": tcid,
|
||
"tool_name": tool_name_by_id.get(tcid, ""),
|
||
"content": item.get("output", "")})
|
||
if pending_tool_calls:
|
||
msg = {"role": "assistant", "content": None, "tool_calls": pending_tool_calls}
|
||
msgs.append(msg)
|
||
if instructions:
|
||
msgs.insert(0, {"role": "system", "content": instructions})
|
||
return msgs
|
||
|
||
def _fb_strip_reasoning_from_messages(messages):
|
||
out = []
|
||
for m in messages:
|
||
nm = {k: v for k, v in m.items() if k != "reasoning_content"}
|
||
out.append(nm)
|
||
return out
|
||
|
||
_HOP_BY_HOP_HEADERS = {
|
||
"connection",
|
||
"keep-alive",
|
||
"proxy-authenticate",
|
||
"proxy-authorization",
|
||
"te",
|
||
"trailers",
|
||
"transfer-encoding",
|
||
"upgrade",
|
||
"host",
|
||
"content-length",
|
||
}
|
||
|
||
def uid(prefix="id"):
|
||
return f"{prefix}-{_pool}-{uuid.uuid4().hex[:12]}"
|
||
|
||
def emit(event, data):
|
||
return f"event: {event}\ndata: {json.dumps(data)}\n\n"
|
||
|
||
def upstream_target(base_url, suffix):
|
||
base = base_url.rstrip("/")
|
||
if base.endswith(suffix):
|
||
return base
|
||
return f"{base}{suffix}"
|
||
|
||
_BROWSER_HEADERS = {
|
||
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
|
||
"Accept": "application/json, text/event-stream, */*",
|
||
"Accept-Language": "en-US,en;q=0.9",
|
||
"Sec-Ch-Ua": '"Chromium";v="137", "Not/A)Brand";v="99"',
|
||
"Sec-Ch-Ua-Mobile": "?0",
|
||
"Sec-Ch-Ua-Platform": '"Linux"',
|
||
"Sec-Fetch-Dest": "empty",
|
||
"Sec-Fetch-Mode": "cors",
|
||
"Sec-Fetch-Site": "same-origin",
|
||
}
|
||
|
||
def forwarded_headers(request_headers, extra=None, browser_ua=False):
|
||
headers = {}
|
||
if browser_ua:
|
||
headers.update(_BROWSER_HEADERS)
|
||
for key, value in request_headers.items():
|
||
if key.lower() in _HOP_BY_HOP_HEADERS:
|
||
continue
|
||
if browser_ua and key.lower() == "user-agent":
|
||
continue
|
||
headers[key] = value
|
||
if extra:
|
||
headers.update(extra)
|
||
return headers
|
||
|
||
def _openrouter_extra():
|
||
if not TARGET_URL:
|
||
return {}
|
||
if "z.ai" in TARGET_URL:
|
||
return {
|
||
"HTTP-Referer": "https://openclaw.ai",
|
||
"X-OpenRouter-Title": "OpenClaw",
|
||
"X-OpenRouter-Categories":
|
||
"cli-agent,cloud-agent,programming-app,creative-writing,"
|
||
"writing-assistant,general-chat,personal-agent",
|
||
}
|
||
if "openrouter.ai" in TARGET_URL:
|
||
return {
|
||
"HTTP-Referer": "https://chats-llm.com",
|
||
"X-OpenRouter-Title": "Chats-LLM",
|
||
"X-OpenRouter-Categories": "general-chat, ide-extension",
|
||
"X-OpenRouter-Cache": "true",
|
||
}
|
||
return {}
|
||
|
||
_MAX_INPUT_ITEMS = 30
|
||
_MAX_TOOL_OUTPUT_CHARS = 8000
|
||
_COMPACT_KEEP_RECENT = 10
|
||
|
||
_CROF_ADAPTIVE = {
|
||
"fail_history": [],
|
||
"model_limits": {},
|
||
"global_item_limit": 80,
|
||
"min_keep_recent": 6,
|
||
}
|
||
|
||
_model_max_tokens = {}
|
||
_model_max_tokens_lock = threading.Lock()
|
||
|
||
def _estimate_tokens(item):
|
||
if not isinstance(item, dict):
|
||
return 4
|
||
t = item.get("type", "")
|
||
if t == "message":
|
||
content = item.get("content", "")
|
||
if isinstance(content, str):
|
||
return max(4, len(content) // 4)
|
||
elif isinstance(content, list):
|
||
total = 4
|
||
for part in content:
|
||
pt = part.get("type", "")
|
||
if pt in ("input_text", "output_text"):
|
||
total += max(4, len(part.get("text", "")) // 4)
|
||
elif pt == "input_image":
|
||
total += 800
|
||
elif pt in ("function_call",):
|
||
total += max(20, len(part.get("arguments", "{}")) // 2)
|
||
elif pt == "function_call_output":
|
||
total += max(8, len(part.get("output", "")) // 4)
|
||
return total
|
||
elif t in ("function_call_output",):
|
||
return max(8, len(item.get("output", "")) // 4)
|
||
elif t == "function_call":
|
||
return max(20, len(item.get("arguments", "{}")) // 2)
|
||
return 4
|
||
|
||
def _estimate_input_tokens(input_data):
|
||
if not isinstance(input_data, list):
|
||
return 0
|
||
return sum(_estimate_tokens(i) for i in input_data)
|
||
|
||
def _get_model_max_tokens(model):
|
||
with _model_max_tokens_lock:
|
||
return _model_max_tokens.get(model)
|
||
|
||
def _set_model_max_tokens(model, tokens):
|
||
if model and tokens:
|
||
with _model_max_tokens_lock:
|
||
existing = _model_max_tokens.get(model)
|
||
if existing is None or tokens < existing:
|
||
_model_max_tokens[model] = tokens
|
||
print(f"[ctx-limit] learned {model} max ~{tokens} tokens", file=sys.stderr)
|
||
|
||
_BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json")
|
||
_bgp_stats_lock = threading.Lock()
|
||
|
||
def _route_key(route):
|
||
return f"{route.get('name', '')}::{route.get('target_url', '')}::{route.get('model', '')}"
|
||
|
||
def _load_bgp_stats():
|
||
try:
|
||
if os.path.exists(_BGP_STATS_PATH):
|
||
return json.load(open(_BGP_STATS_PATH))
|
||
except Exception:
|
||
pass
|
||
return {}
|
||
|
||
def _save_bgp_stats(stats):
|
||
tmp = _BGP_STATS_PATH + ".tmp"
|
||
with open(tmp, "w", encoding="utf-8") as f:
|
||
json.dump(stats, f, indent=2)
|
||
os.replace(tmp, _BGP_STATS_PATH)
|
||
|
||
def _score_route(route, stats):
|
||
key = _route_key(route)
|
||
rs = stats.get(key, {})
|
||
now = time.time()
|
||
if float(rs.get("open_until_ts", 0)) > now:
|
||
return 1_000_000
|
||
priority = int(route.get("priority", 99))
|
||
ewma = float(rs.get("ewma_latency_s", 0))
|
||
failures = int(rs.get("consecutive_failures", 0))
|
||
score = priority + min(ewma * 5, 50) + failures * 20
|
||
if float(rs.get("rate_limited_until", 0)) > now:
|
||
score += 500
|
||
return score
|
||
|
||
def _update_route_stats(route, success, duration_s, http_code=None, error_type=None):
|
||
with _bgp_stats_lock:
|
||
stats = _load_bgp_stats()
|
||
key = _route_key(route)
|
||
rs = stats.setdefault(key, {
|
||
"ewma_latency_s": duration_s, "consecutive_failures": 0,
|
||
"last_success": None, "last_failure": None,
|
||
"open_until_ts": 0, "rate_limited_until": 0, "last_error": None,
|
||
})
|
||
alpha = 0.25
|
||
rs["ewma_latency_s"] = alpha * duration_s + (1 - alpha) * float(rs.get("ewma_latency_s", duration_s))
|
||
if success:
|
||
rs["consecutive_failures"] = 0
|
||
rs["last_success"] = time.time()
|
||
else:
|
||
rs["consecutive_failures"] = int(rs.get("consecutive_failures", 0)) + 1
|
||
rs["last_failure"] = time.time()
|
||
rs["last_error"] = error_type or (f"http_{http_code}" if http_code else "unknown")
|
||
if http_code == 429:
|
||
rs["rate_limited_until"] = time.time() + 120
|
||
if rs["consecutive_failures"] >= 3:
|
||
rs["open_until_ts"] = time.time() + 60
|
||
rs["consecutive_failures"] = 0
|
||
_save_bgp_stats(stats)
|
||
|
||
def _sorted_bgp_routes():
|
||
with _bgp_stats_lock:
|
||
stats = _load_bgp_stats()
|
||
return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats))
|
||
|
||
def _crof_record(model, n_items, success):
|
||
if not isinstance(n_items, int) or n_items < 1:
|
||
return
|
||
entry = {"model": model, "items": n_items, "ok": success}
|
||
hist = _CROF_ADAPTIVE["fail_history"]
|
||
hist.append(entry)
|
||
if len(hist) > 200:
|
||
_CROF_ADAPTIVE["fail_history"] = hist[-100:]
|
||
|
||
ml = _CROF_ADAPTIVE["model_limits"].setdefault(model, {"ok_max": 30, "fail_min": 0, "limit": 30})
|
||
if success and n_items > ml["ok_max"]:
|
||
ml["ok_max"] = n_items
|
||
if not success and (ml["fail_min"] == 0 or n_items < ml["fail_min"]):
|
||
ml["fail_min"] = n_items
|
||
|
||
if ml["fail_min"] > 0 and ml["ok_max"] >= ml["fail_min"]:
|
||
ml["limit"] = ml["fail_min"] - 1
|
||
elif ml["fail_min"] > 0:
|
||
ml["limit"] = max(ml["fail_min"] - 2, _CROF_ADAPTIVE["min_keep_recent"] + 2)
|
||
|
||
global_limit = 30
|
||
for m, v in _CROF_ADAPTIVE["model_limits"].items():
|
||
if v.get("limit", 30) < global_limit:
|
||
global_limit = v["limit"]
|
||
_CROF_ADAPTIVE["global_item_limit"] = global_limit
|
||
|
||
print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
|
||
|
||
def _crof_item_limit(model):
|
||
ml = _CROF_ADAPTIVE["model_limits"].get(model, {})
|
||
per_model = ml.get("limit", 30)
|
||
return min(per_model, _CROF_ADAPTIVE["global_item_limit"])
|
||
|
||
def _crof_compact_for_retry(input_data, model, aggression=0):
|
||
limit = _crof_item_limit(model)
|
||
if not isinstance(input_data, list) or len(input_data) < 2:
|
||
return input_data
|
||
|
||
max_tok = _get_model_max_tokens(model)
|
||
est = _estimate_input_tokens(input_data)
|
||
over_item_limit = len(input_data) > limit
|
||
over_token_limit = max_tok and est >= max_tok * 0.9
|
||
|
||
if not over_item_limit and not over_token_limit:
|
||
return input_data
|
||
|
||
keep = max(_CROF_ADAPTIVE["min_keep_recent"], limit // 3)
|
||
if over_token_limit:
|
||
ratio = est / max_tok
|
||
if aggression >= 1 or ratio > 1.5:
|
||
keep = max(2, _CROF_ADAPTIVE["min_keep_recent"] // 2)
|
||
elif ratio > 1.2:
|
||
keep = max(3, keep // 2)
|
||
print(f"[ctx-limit] model={model} est={est}tok max={max_tok}tok ratio={ratio:.2f} -> keep={keep}", file=sys.stderr)
|
||
elif over_item_limit:
|
||
keep = max(keep, 6)
|
||
head_end = 0
|
||
for i, item in enumerate(input_data):
|
||
t = item.get("type")
|
||
if t == "message" and item.get("role") in ("developer", "system"):
|
||
head_end = i + 1
|
||
elif t == "message" and item.get("role") == "user" and head_end == i:
|
||
head_end = i + 1
|
||
else:
|
||
break
|
||
|
||
head = input_data[:head_end]
|
||
tail_start = max(head_end, len(input_data) - keep)
|
||
while tail_start > head_end:
|
||
t = input_data[tail_start].get("type")
|
||
r = input_data[tail_start].get("role", "")
|
||
if t in ("function_call_output", "function_call"):
|
||
tail_start -= 1
|
||
elif t == "message" and r == "assistant":
|
||
tail_start -= 1
|
||
else:
|
||
break
|
||
tail = input_data[tail_start:]
|
||
body = input_data[head_end:tail_start]
|
||
|
||
if not body:
|
||
return head + tail
|
||
|
||
summary_lines = [f"[Auto-compacted: {len(body)} turns removed (adaptive limit={limit})]"]
|
||
for item in body[-5:]:
|
||
summary_lines.append(_item_summary(item, max_len=120))
|
||
|
||
summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
|
||
print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)}, agg={aggression})", file=sys.stderr)
|
||
return head + [summary_msg] + tail
|
||
|
||
def _item_summary(item, max_len=200):
|
||
t = item.get("type")
|
||
if t == "message":
|
||
role = item.get("role", "?")
|
||
text = ""
|
||
for p in item.get("content", []):
|
||
if p.get("type") in ("input_text", "output_text"):
|
||
text += p.get("text", "")
|
||
return f"[{role}] {text[:max_len]}"
|
||
elif t == "function_call":
|
||
name = item.get("name", "?")
|
||
args = item.get("arguments", "{}")
|
||
try:
|
||
a = json.loads(args)
|
||
cmd = a.get("cmd", a.get("command", ""))
|
||
if cmd:
|
||
return f"[tool call] {name}: {cmd[:max_len]}"
|
||
except Exception:
|
||
pass
|
||
return f"[tool call] {name}({args[:max_len]})"
|
||
elif t == "function_call_output":
|
||
output = item.get("output", "")
|
||
if len(output) > max_len:
|
||
return f"[tool result] {output[:max_len]}..."
|
||
return f"[tool result] {output}"
|
||
return f"[{t}]"
|
||
|
||
def _extract_files(items):
|
||
files = set()
|
||
for item in items:
|
||
if item.get("type") == "function_call":
|
||
try:
|
||
a = json.loads(item.get("arguments", "{}"))
|
||
cmd = a.get("cmd", a.get("command", ""))
|
||
for prefix in (">", ">>", " > ", " >> "):
|
||
for part in cmd.split(prefix)[1:]:
|
||
f = part.strip().split()[0].strip("'\"")
|
||
if f and not f.startswith("-") and "/" in f:
|
||
files.add(f)
|
||
except Exception:
|
||
pass
|
||
return files
|
||
|
||
def _compact_input(input_data):
|
||
if isinstance(input_data, str):
|
||
return input_data
|
||
if not isinstance(input_data, list) or len(input_data) <= _MAX_INPUT_ITEMS:
|
||
out = []
|
||
for item in input_data:
|
||
if isinstance(item, dict) and item.get("type") == "function_call_output":
|
||
o = item.get("output", "")
|
||
if len(o) > _MAX_TOOL_OUTPUT_CHARS:
|
||
item = dict(item)
|
||
item["output"] = o[:_MAX_TOOL_OUTPUT_CHARS] + f"\n... [truncated {len(o) - _MAX_TOOL_OUTPUT_CHARS} chars]"
|
||
print(f"[compact] tool output truncated {len(o)} -> {_MAX_TOOL_OUTPUT_CHARS}", file=sys.stderr)
|
||
out.append(item)
|
||
return out
|
||
|
||
head_end = 0
|
||
for i, item in enumerate(input_data):
|
||
t = item.get("type")
|
||
if t == "message" and item.get("role") in ("developer", "system"):
|
||
head_end = i + 1
|
||
elif t == "message" and item.get("role") == "user" and head_end == i:
|
||
head_end = i + 1
|
||
else:
|
||
break
|
||
|
||
head = input_data[:head_end]
|
||
tail_start = len(input_data) - _COMPACT_KEEP_RECENT
|
||
while tail_start > head_end:
|
||
t = input_data[tail_start].get("type")
|
||
r = input_data[tail_start].get("role", "")
|
||
if t == "function_call_output":
|
||
tail_start -= 1
|
||
elif t == "function_call":
|
||
tail_start -= 1
|
||
elif t == "message" and r == "assistant":
|
||
tail_start -= 1
|
||
else:
|
||
break
|
||
tail = input_data[tail_start:]
|
||
body = input_data[head_end:tail_start]
|
||
|
||
if not body:
|
||
return head + tail
|
||
|
||
for item in tail:
|
||
if isinstance(item, dict) and item.get("type") == "function_call_output":
|
||
o = item.get("output", "")
|
||
if len(o) > _MAX_TOOL_OUTPUT_CHARS:
|
||
item["output"] = o[:_MAX_TOOL_OUTPUT_CHARS] + f"\n... [truncated {len(o) - _MAX_TOOL_OUTPUT_CHARS} chars]"
|
||
|
||
user_queries = []
|
||
for item in body:
|
||
if item.get("type") == "message" and item.get("role") == "user":
|
||
for p in item.get("content", []):
|
||
if p.get("type") == "input_text":
|
||
user_queries.append(p.get("text", "")[:300])
|
||
assistant_msgs = []
|
||
for item in body:
|
||
if item.get("type") == "message" and item.get("role") == "assistant":
|
||
for p in item.get("content", []):
|
||
if p.get("type") == "output_text":
|
||
assistant_msgs.append(p.get("text", "")[:300])
|
||
|
||
tool_summaries = []
|
||
for item in body:
|
||
if item.get("type") in ("function_call", "function_call_output"):
|
||
tool_summaries.append(_item_summary(item, max_len=150))
|
||
|
||
files = _extract_files(body)
|
||
|
||
summary_lines = [f"[Auto-compacted: {len(body)} earlier turns summarized to preserve context]"]
|
||
if user_queries:
|
||
summary_lines.append(f"User requests: {'; '.join(user_queries[-3:])}")
|
||
if assistant_msgs:
|
||
summary_lines.append(f"Assistant responses: {'; '.join(assistant_msgs[-3:])}")
|
||
if tool_summaries:
|
||
summary_lines.append(f"Actions taken ({len(tool_summaries)} steps):")
|
||
for ts in tool_summaries[-15:]:
|
||
summary_lines.append(f" {ts}")
|
||
if files:
|
||
summary_lines.append(f"Files touched: {', '.join(sorted(files)[-10:])}")
|
||
|
||
summary_text = "\n".join(summary_lines)
|
||
summary_msg = {
|
||
"type": "message",
|
||
"role": "user",
|
||
"content": [{"type": "input_text", "text": summary_text}]
|
||
}
|
||
|
||
print(f"[compact] {len(input_data)} items -> {len(head) + 1 + len(tail)} (compacted {len(body)} old items into summary)", file=sys.stderr)
|
||
return head + [summary_msg] + tail
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Provider policies
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_PROVIDER_POLICIES = {
|
||
"crof": {"reasoning_mode": "off", "max_tokens": 32768, "strip_reasoning": True,
|
||
"tool_output_limit": 4000, "max_input_items": 18, "compaction": "aggressive",
|
||
"synthetic_tool_results": True},
|
||
"chats-llm": {"reasoning_mode": "off", "max_tokens": 32768, "strip_reasoning": True,
|
||
"tool_output_limit": 4000, "max_input_items": 20, "compaction": "aggressive"},
|
||
"z.ai": {"reasoning_mode": "medium", "max_tokens": 65536, "strip_reasoning": True,
|
||
"tool_output_limit": 8000, "max_input_items": 40, "compaction": "balanced"},
|
||
"openrouter": {"reasoning_mode": "provider_default", "max_tokens": 32768, "strip_reasoning": True,
|
||
"tool_output_limit": 6000, "max_input_items": 35, "compaction": "balanced"},
|
||
"openadapter": {"reasoning_mode": "off", "max_tokens": 32768, "strip_reasoning": True,
|
||
"tool_output_limit": 1000, "max_input_items": 10, "compaction": "aggressive",
|
||
"synthetic_tool_results": True},
|
||
"cloudcode-pa": {"compaction": "conservative", "context_size": 1000000,
|
||
"tool_output_limit": 8000, "max_input_items": 200},
|
||
"googleapis": {"compaction": "conservative", "context_size": 1000000,
|
||
"tool_output_limit": 8000, "max_input_items": 250},
|
||
}
|
||
|
||
_DEFAULT_PROVIDER_POLICY = {
|
||
"compaction": "balanced", "context_size": 128000,
|
||
"tool_output_limit": 6000, "max_input_items": 60,
|
||
}
|
||
|
||
def provider_policy(target_url=None, backend=None):
|
||
host = urllib.parse.urlparse(target_url or TARGET_URL).netloc.lower()
|
||
for key, policy in _PROVIDER_POLICIES.items():
|
||
if key in host:
|
||
return policy
|
||
return dict(_DEFAULT_PROVIDER_POLICY)
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Adaptive context compaction (model-aware)
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_MODEL_CONTEXT = {
|
||
"gpt-4o": 128000, "gpt-4o-mini": 128000, "gpt-5": 128000,
|
||
"claude-sonnet": 200000, "claude-haiku": 200000,
|
||
"glm-5.1": 128000, "glm-5": 128000, "glm-4": 128000,
|
||
"deepseek": 64000, "gemini-2.5-flash": 1000000, "gemini-2.5-pro": 2000000,
|
||
"gemini-3-flash": 1000000, "gemini-3.5-flash-low": 1000000,
|
||
"gemini-3.1-pro-low": 2000000,
|
||
"gemini-3.5-flash": 1000000, "gemini-3.1-pro": 2000000,
|
||
"Gemini 3.5 Flash": 1000000, "Gemini 3.1 Pro": 2000000,
|
||
"Claude Sonnet 4.6": 200000, "Claude Opus 4.6": 200000,
|
||
"GPT-OSS 120B": 128000,
|
||
"claude-sonnet-4-6": 200000, "claude-opus-4-6-thinking": 200000,
|
||
"gpt-oss-120b-medium": 128000,
|
||
"mimo": 32768, "minimax": 32768, "kimi": 128000,
|
||
"_default": 32768,
|
||
}
|
||
|
||
def _context_limit_for_model(model):
|
||
if not model:
|
||
return _MODEL_CONTEXT["_default"]
|
||
ml = model.lower()
|
||
for key, limit in _MODEL_CONTEXT.items():
|
||
if key != "_default" and key in ml:
|
||
return limit
|
||
return _MODEL_CONTEXT["_default"]
|
||
|
||
def _estimate_tokens(obj):
|
||
if obj is None:
|
||
return 0
|
||
if isinstance(obj, str):
|
||
return max(1, len(obj) // 4)
|
||
try:
|
||
raw = json.dumps(obj, ensure_ascii=False)
|
||
except Exception:
|
||
raw = str(obj)
|
||
return max(1, len(raw) // 4)
|
||
|
||
def _adaptive_compact(input_data, model, policy=None):
|
||
policy = policy or {}
|
||
context_size = int(policy.get("context_size", _context_limit_for_model(model)))
|
||
input_budget = int(context_size * 0.80)
|
||
estimated = _estimate_tokens(input_data)
|
||
if estimated <= input_budget:
|
||
return input_data, False
|
||
if not isinstance(input_data, list):
|
||
return input_data, False
|
||
reduction = max(0.15, input_budget / max(estimated, 1))
|
||
target_items = max(int(len(input_data) * reduction), 6)
|
||
if target_items >= len(input_data):
|
||
return input_data, False
|
||
head_end = 0
|
||
for i, item in enumerate(input_data):
|
||
t = item.get("type")
|
||
if t == "message" and item.get("role") in ("developer", "system"):
|
||
head_end = i + 1
|
||
elif t == "message" and item.get("role") == "user" and head_end == i:
|
||
head_end = i + 1
|
||
else:
|
||
break
|
||
head = input_data[:head_end]
|
||
keep = max(4, target_items // 3)
|
||
tail_start = max(head_end, len(input_data) - keep)
|
||
while tail_start > head_end:
|
||
t = input_data[tail_start].get("type")
|
||
if t in ("function_call_output", "function_call"):
|
||
tail_start -= 1
|
||
elif t == "message" and input_data[tail_start].get("role") == "assistant":
|
||
tail_start -= 1
|
||
else:
|
||
break
|
||
tail = input_data[tail_start:]
|
||
body = input_data[head_end:tail_start]
|
||
if not body:
|
||
return head + tail, True
|
||
summary_lines = [f"[Auto-compacted: {len(body)} turns removed (budget={input_budget}tok, model={model})]"]
|
||
for item in body[-5:]:
|
||
summary_lines.append(_item_summary(item, max_len=120))
|
||
summary_msg = {"type": "message", "role": "user",
|
||
"content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
|
||
print(f"[adaptive-compact] model={model} est={estimated}tok budget={input_budget}tok "
|
||
f"items {len(input_data)}->{len(head)+1+len(tail)}", file=sys.stderr)
|
||
return head + [summary_msg] + tail, True
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Prompt Enhancer
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_PROMPT_ENHANCER_SYSTEM = """You are a prompt enhancement assistant for a coding agent (Codex CLI).
|
||
Your job: rewrite the user's latest message to be clearer, more specific, and more actionable.
|
||
Rules:
|
||
- Preserve the user's EXACT intent — never change what they want done
|
||
- Add explicit action verbs and step-by-step clarity
|
||
- If the message is vague ("fix it", "make it better"), infer context from prior conversation summary and make it specific
|
||
- Keep the enhanced prompt concise — no longer than 2x the original
|
||
- If the original prompt is already clear and specific, return it unchanged
|
||
- Output ONLY the enhanced prompt text, nothing else
|
||
- Never add tasks the user didn't ask for"""
|
||
|
||
_PROMPT_ENHANCER_OFFLINE = """<prompt-enhancer>
|
||
<instructions>
|
||
You are a coding agent operating inside a context-compacted session. Follow these rules strictly:
|
||
|
||
1. ACTION CLARITY: Re-read the user's latest message. Identify every explicit and implicit action request. Execute ALL of them — do not skip any.
|
||
|
||
2. COMPACTED CONTEXT: Previous conversation was summarized. The summary preserves your task history but may lose details. If the user references earlier work ("fix that", "continue", "update it"), infer from the compacted summary what was done and what remains.
|
||
|
||
3. NO CLARIFICATION ASKING: Never ask "which file?" or "what exactly?" — infer from context. If truly ambiguous, make a reasonable assumption and proceed. The user can correct you.
|
||
|
||
4. DECISIVE EXECUTION: When the user says "fix", "update", "change", "add", "remove" — do it immediately in the relevant file(s). Do not describe what you would do — actually do it.
|
||
|
||
5. COMPLETE EDITS: When editing files, make the FULL change requested. Do not partially apply edits or leave placeholders.
|
||
|
||
6. PRESERVE WORKING STATE: Never break existing functionality. If changing code, keep all surrounding logic intact.
|
||
|
||
7. MULTI-STEP REQUESTS: If the user asks for multiple things, do ALL of them in sequence. Do not stop after the first one.
|
||
</instructions>
|
||
</prompt-enhancer>
|
||
|
||
"""
|
||
|
||
def _enhance_prompt_llm(text, compaction_summary=""):
|
||
global PROMPT_ENHANCER_MODEL, PROMPT_ENHANCER_URL, PROMPT_ENHANCER_KEY
|
||
if not PROMPT_ENHANCER_MODEL or not PROMPT_ENHANCER_URL:
|
||
return text
|
||
try:
|
||
messages = [
|
||
{"role": "system", "content": _PROMPT_ENHANCER_SYSTEM},
|
||
]
|
||
if compaction_summary:
|
||
messages.append({"role": "user", "content": f"Context from earlier conversation (compacted):\n{compaction_summary[:2000]}"})
|
||
messages.append({"role": "user", "content": f"Enhance this prompt:\n{text}"})
|
||
body = json.dumps({"model": PROMPT_ENHANCER_MODEL, "messages": messages, "max_tokens": 2000, "temperature": 0.3}).encode()
|
||
headers = {"Content-Type": "application/json"}
|
||
if PROMPT_ENHANCER_KEY:
|
||
headers["Authorization"] = f"Bearer {PROMPT_ENHANCER_KEY}"
|
||
req = urllib.request.Request(f"{PROMPT_ENHANCER_URL.rstrip('/')}/chat/completions", data=body, headers=headers)
|
||
resp = urllib.request.urlopen(req, timeout=15)
|
||
data = json.loads(resp.read())
|
||
enhanced = data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
|
||
if enhanced and len(enhanced) >= len(text) * 0.5:
|
||
print(f"[prompt-enhancer] AI enhanced: {text[:80]}... -> {enhanced[:80]}...", file=sys.stderr)
|
||
return enhanced
|
||
except Exception as e:
|
||
print(f"[prompt-enhancer] AI enhancement failed: {e}", file=sys.stderr)
|
||
return text
|
||
|
||
def _apply_prompt_enhancer(input_data):
|
||
global PROMPT_ENHANCER_MODE
|
||
if not isinstance(input_data, list) or len(input_data) == 0:
|
||
return input_data
|
||
last_user_idx = None
|
||
for i in range(len(input_data) - 1, -1, -1):
|
||
item = input_data[i]
|
||
if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
|
||
last_user_idx = i
|
||
break
|
||
if last_user_idx is None:
|
||
return input_data
|
||
item = input_data[last_user_idx]
|
||
content = item.get("content", "")
|
||
if isinstance(content, list):
|
||
text = content[0].get("text", "") if content else ""
|
||
elif isinstance(content, str):
|
||
text = content
|
||
else:
|
||
return input_data
|
||
if not text or len(text) < 5:
|
||
return input_data
|
||
if text.startswith("<prompt-enhancer>"):
|
||
return input_data
|
||
compaction_summary = ""
|
||
for it in input_data:
|
||
if isinstance(it, dict) and it.get("type") == "message" and it.get("role") == "user":
|
||
c = it.get("content", "")
|
||
t = ""
|
||
if isinstance(c, list):
|
||
t = c[0].get("text", "") if c else ""
|
||
elif isinstance(c, str):
|
||
t = c
|
||
if "[Auto-compacted:" in t:
|
||
compaction_summary = t[:3000]
|
||
break
|
||
if PROMPT_ENHANCER_MODE == "ai-powered" and PROMPT_ENHANCER_MODEL and PROMPT_ENHANCER_URL:
|
||
enhanced = _enhance_prompt_llm(text, compaction_summary)
|
||
else:
|
||
enhanced = text
|
||
enhanced = _PROMPT_ENHANCER_OFFLINE + enhanced
|
||
new_item = dict(item)
|
||
if isinstance(item.get("content"), list):
|
||
new_item["content"] = [{"type": "input_text", "text": enhanced}]
|
||
else:
|
||
new_item["content"] = enhanced
|
||
result = list(input_data)
|
||
result[last_user_idx] = new_item
|
||
print(f"[prompt-enhancer] mode={PROMPT_ENHANCER_MODE} enhanced last user message ({len(text)}->{len(enhanced)} chars)", file=sys.stderr)
|
||
return result
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Tool-call pairing validator
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
def validate_tool_pairs(input_items):
|
||
if not isinstance(input_items, list):
|
||
return []
|
||
calls = {}
|
||
errors = []
|
||
for idx, item in enumerate(input_items):
|
||
t = item.get("type")
|
||
if t == "function_call":
|
||
cid = item.get("call_id") or item.get("id")
|
||
if cid:
|
||
calls[cid] = idx
|
||
elif t == "function_call_output":
|
||
cid = item.get("call_id") or item.get("id")
|
||
if not cid or cid not in calls:
|
||
errors.append({"index": idx, "call_id": cid, "error": "orphan_function_call_output"})
|
||
return errors
|
||
|
||
def repair_orphan_tool_outputs(input_items, errors):
|
||
bad = {e["index"] for e in errors}
|
||
repaired = []
|
||
for idx, item in enumerate(input_items):
|
||
if idx in bad:
|
||
output = item.get("output", "")
|
||
repaired.append({"type": "message", "role": "user",
|
||
"content": [{"type": "input_text",
|
||
"text": f"[Proxy: unmatched tool output]\n{str(output)[:4000]}"}]})
|
||
else:
|
||
repaired.append(item)
|
||
return repaired
|
||
|
||
def synthesize_tool_results_for_chat(input_items):
|
||
"""Convert Responses function_call/function_call_output pairs into plain text.
|
||
|
||
Some OpenAI-compatible providers accept tool calls on the first turn but fail
|
||
on the next request when role=tool messages are present. For those providers,
|
||
encode tool outputs as normal user text so the model can continue.
|
||
"""
|
||
if not isinstance(input_items, list):
|
||
return input_items, False
|
||
calls = {}
|
||
changed = False
|
||
out = []
|
||
for item in input_items:
|
||
t = item.get("type")
|
||
if t == "function_call":
|
||
cid = item.get("call_id") or item.get("id") or ""
|
||
calls[cid] = item
|
||
changed = True
|
||
continue
|
||
if t == "function_call_output":
|
||
cid = item.get("call_id") or item.get("id") or ""
|
||
call = calls.get(cid, {})
|
||
name = call.get("name", "tool")
|
||
args = call.get("arguments", "{}")
|
||
output = item.get("output", "")
|
||
text = (
|
||
"Tool execution result. Continue the task using this result. "
|
||
"Do not repeat the same tool call unless more information is required.\n\n"
|
||
f"Tool: {name}\nArguments:\n```json\n{str(args)[:2000]}\n```\n"
|
||
f"Output:\n```\n{str(output)[:8000]}\n```"
|
||
)
|
||
out.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": text}]})
|
||
changed = True
|
||
continue
|
||
out.append(item)
|
||
return out, changed
|
||
|
||
def has_function_call_output(input_items):
|
||
return isinstance(input_items, list) and any(i.get("type") == "function_call_output" for i in input_items)
|
||
|
||
_TOOL_CALL_TEXT_PATTERNS = re.compile(
|
||
r'(?:^|\n)[\s•\-\*]*\(?'
|
||
r'(?:exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)'
|
||
r'[\s:]',
|
||
re.I | re.MULTILINE
|
||
)
|
||
|
||
def _text_looks_like_tool_calls(text):
|
||
if not text or len(text) < 6:
|
||
return False
|
||
return bool(_TOOL_CALL_TEXT_PATTERNS.search(text))
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Log redaction
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_SECRET_PATTERNS = [
|
||
(r"sk-[A-Za-z0-9_\-]{20,}", "[REDACTED:key]"),
|
||
(r"sk-ant-[A-Za-z0-9_\-]{20,}", "[REDACTED:anthropic]"),
|
||
(r"gh[pousr]_[A-Za-z0-9_]{20,}", "[REDACTED:github]"),
|
||
(r"Bearer\s+[A-Za-z0-9._\-]{20,}", "Bearer [REDACTED]"),
|
||
]
|
||
|
||
def _redact(text):
|
||
if not text:
|
||
return text
|
||
import re
|
||
for pattern, replacement in _SECRET_PATTERNS:
|
||
text = re.sub(pattern, replacement, text)
|
||
return text
|
||
|
||
def _redact_json(obj):
|
||
try:
|
||
raw = json.dumps(obj, ensure_ascii=False)
|
||
except Exception:
|
||
raw = str(obj)
|
||
return _redact(raw)
|
||
|
||
_MAX_SNAPSHOTS = 200
|
||
|
||
def save_request_snapshot(request_id, body):
|
||
if not request_id:
|
||
return request_id
|
||
snapshot = {
|
||
"_meta": {
|
||
"request_id": request_id,
|
||
"model": body.get("model", ""),
|
||
"stream": body.get("stream", False),
|
||
"ts": time.time(),
|
||
"ts_iso": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|
||
"status": "pending",
|
||
"duration_s": None,
|
||
"error": None,
|
||
},
|
||
"request": json.loads(_redact_json(body)),
|
||
}
|
||
path = os.path.join(_REQUESTS_DIR, f"{request_id}.json")
|
||
tmp = path + ".tmp"
|
||
with open(tmp, "w", encoding="utf-8") as f:
|
||
json.dump(snapshot, f, ensure_ascii=False, indent=2)
|
||
os.replace(tmp, path)
|
||
_rotate_snapshots()
|
||
return request_id
|
||
|
||
def update_snapshot_response(request_id, status, duration_s=None, error=None):
|
||
if not request_id:
|
||
return
|
||
path = os.path.join(_REQUESTS_DIR, f"{request_id}.json")
|
||
if not os.path.exists(path):
|
||
return
|
||
try:
|
||
with open(path) as f:
|
||
snapshot = json.load(f)
|
||
meta = snapshot.get("_meta", {})
|
||
meta["status"] = status
|
||
if duration_s is not None:
|
||
meta["duration_s"] = round(duration_s, 3)
|
||
if error is not None:
|
||
meta["error"] = str(error)[:200]
|
||
snapshot["_meta"] = meta
|
||
tmp = path + ".tmp"
|
||
with open(tmp, "w", encoding="utf-8") as f:
|
||
json.dump(snapshot, f, ensure_ascii=False, indent=2)
|
||
os.replace(tmp, path)
|
||
except Exception:
|
||
pass
|
||
|
||
def _rotate_snapshots():
|
||
try:
|
||
files = sorted(
|
||
[os.path.join(_REQUESTS_DIR, f) for f in os.listdir(_REQUESTS_DIR) if f.endswith(".json")],
|
||
key=os.path.getmtime,
|
||
)
|
||
while len(files) > _MAX_SNAPSHOTS:
|
||
os.remove(files.pop(0))
|
||
except Exception:
|
||
pass
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Rate-limit token buckets
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
class TokenBucket:
|
||
def __init__(self, capacity=10, refill=1.0):
|
||
self.capacity = float(capacity)
|
||
self.tokens = float(capacity)
|
||
self.refill = float(refill)
|
||
self.updated = time.monotonic()
|
||
self.lock = threading.Lock()
|
||
def allow(self, cost=1):
|
||
with self.lock:
|
||
now = time.monotonic()
|
||
self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.refill)
|
||
self.updated = now
|
||
if self.tokens >= cost:
|
||
self.tokens -= cost
|
||
return True
|
||
return False
|
||
|
||
_rate_buckets = {}
|
||
_rate_buckets_lock = threading.Lock()
|
||
|
||
def _bucket_for_route(route):
|
||
name = route.get("name") or route.get("target_url") or "default"
|
||
with _rate_buckets_lock:
|
||
if name not in _rate_buckets:
|
||
_rate_buckets[name] = TokenBucket(capacity=10, refill=1.0)
|
||
return _rate_buckets[name]
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# OpenAI-compat backend
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
def _inject_stored_reasoning(messages):
|
||
with _last_reasoning_lock:
|
||
snapshot = dict(_last_reasoning_store)
|
||
if not snapshot:
|
||
return messages
|
||
expired = [k for k, v in snapshot.items() if time.time() - v["ts"] > _RESPONSE_TTL]
|
||
for k in expired:
|
||
with _last_reasoning_lock:
|
||
_last_reasoning_store.pop(k, None)
|
||
snapshot.pop(k, None)
|
||
if not snapshot:
|
||
return messages
|
||
latest = max(snapshot.values(), key=lambda v: v["ts"])
|
||
reasoning = latest.get("reasoning", "")
|
||
if not reasoning:
|
||
return messages
|
||
for msg in messages:
|
||
if msg.get("role") == "assistant" and "reasoning_content" not in msg and msg.get("tool_calls"):
|
||
msg["reasoning_content"] = reasoning
|
||
return messages
|
||
|
||
def _normalize_tool_args(raw_args):
|
||
if not raw_args or raw_args == "{}":
|
||
return raw_args
|
||
try:
|
||
parsed = json.loads(raw_args)
|
||
if isinstance(parsed, dict):
|
||
if "Arguments" in parsed and "arguments" not in parsed:
|
||
inner = parsed["Arguments"]
|
||
if isinstance(inner, str):
|
||
inner = inner.strip()
|
||
for pfx in ("```json", "```"):
|
||
if inner.startswith(pfx):
|
||
inner = inner[len(pfx):].strip()
|
||
if inner.endswith("```"):
|
||
inner = inner[:-3].strip()
|
||
try:
|
||
inner_parsed = json.loads(inner)
|
||
if isinstance(inner_parsed, dict):
|
||
return json.dumps(inner_parsed)
|
||
except json.JSONDecodeError:
|
||
pass
|
||
if "cmd" not in parsed and "Arguments" in parsed:
|
||
inner = parsed["Arguments"]
|
||
if isinstance(inner, str):
|
||
inner = inner.strip()
|
||
for pfx in ("```json", "```"):
|
||
if inner.startswith(pfx):
|
||
inner = inner[len(pfx):].strip()
|
||
if inner.endswith("```"):
|
||
inner = inner[:-3].strip()
|
||
try:
|
||
inner_parsed = json.loads(inner)
|
||
if isinstance(inner_parsed, dict):
|
||
return json.dumps(inner_parsed)
|
||
except json.JSONDecodeError:
|
||
pass
|
||
return raw_args
|
||
except json.JSONDecodeError:
|
||
return raw_args
|
||
|
||
_XML_TC_RE = re.compile(r'<invoke><(\w+)(?:_command)?>(.*?)</\1(?:_command)?></invoke>', re.DOTALL)
|
||
_XML_ARG_VALUE_RE = re.compile(r'</?arg_value>\s*')
|
||
|
||
_PAREN_TC_RE = re.compile(
|
||
r'(?:^|[\n•\-\*]\s*)\(\s*(exec_command|write_to_file|exec_bash|bash|run_command|shell|edit_file|read_file|search_files|list_files)\b\s*(.*?)\)',
|
||
re.DOTALL | re.I
|
||
)
|
||
|
||
def _extract_xml_tool_calls(text):
|
||
if not text:
|
||
return []
|
||
results = []
|
||
for m in _XML_TC_RE.finditer(text):
|
||
name = m.group(1)
|
||
rest = _XML_ARG_VALUE_RE.sub("", m.group(2)).strip()
|
||
args_str = "{}"
|
||
try:
|
||
for pfx in ("```json", "```"):
|
||
if rest.startswith(pfx):
|
||
rest = rest[len(pfx):].strip()
|
||
if rest.endswith("```"):
|
||
rest = rest[:-3].strip()
|
||
if rest.startswith("{"):
|
||
json.loads(rest)
|
||
args_str = rest
|
||
else:
|
||
json.loads(rest)
|
||
args_str = rest
|
||
except Exception:
|
||
if rest.startswith("{"):
|
||
args_str = rest
|
||
results.append({"name": name, "args": args_str, "call_id": f"xml_{len(results)}"})
|
||
return results
|
||
|
||
_NON_VISION_MODEL_PATTERNS = re.compile(
|
||
r'\b(deepseek|glm|mixtral|llama\b(?!.*vision)|command|dbrx|qwen\b(?!.*vl)|phi-?3(?!.*vision))',
|
||
re.I
|
||
)
|
||
|
||
_vision_fail_cache = set()
|
||
_vision_fail_lock = threading.Lock()
|
||
|
||
def _model_supports_vision(model):
|
||
if not model:
|
||
return True
|
||
with _vision_fail_lock:
|
||
if model in _vision_fail_cache:
|
||
return False
|
||
if _NON_VISION_MODEL_PATTERNS.search(model):
|
||
return False
|
||
return True
|
||
|
||
def _mark_vision_fail(model):
|
||
if model:
|
||
with _vision_fail_lock:
|
||
_vision_fail_cache.add(model)
|
||
|
||
def _strip_images_from_input(input_data, model):
|
||
if not isinstance(input_data, list) or _model_supports_vision(model):
|
||
return input_data
|
||
modified = False
|
||
result = []
|
||
for item in input_data:
|
||
if item.get("type") != "message":
|
||
result.append(item)
|
||
continue
|
||
content = item.get("content", [])
|
||
if isinstance(content, str):
|
||
result.append(item)
|
||
continue
|
||
new_content = []
|
||
has_img = False
|
||
for part in content:
|
||
if isinstance(part, str):
|
||
new_content.append(part)
|
||
continue
|
||
pt = part.get("type", "")
|
||
if pt in ("input_image", "image_url"):
|
||
if not has_img:
|
||
fname = part.get("image_url", {}).get("url", part.get("url", "image.png"))
|
||
if fname.startswith("data:"):
|
||
fname = "screenshot.png"
|
||
new_content.append({"type": "output_text", "text": f"[User attached image: {fname} — this model does not support vision]"})
|
||
has_img = True
|
||
modified = True
|
||
else:
|
||
new_content.append(part)
|
||
if modified:
|
||
result.append({**item, "content": new_content})
|
||
else:
|
||
result.append(item)
|
||
if modified:
|
||
print(f"[vision-filter] stripped {sum(1 for i in input_data if i.get('type')=='message' and any(c.get('type') in ('input_image','image_url') for c in (i.get('content') or []) if isinstance(c,dict)))} images for model={model}", file=sys.stderr)
|
||
return result
|
||
return input_data
|
||
|
||
def oa_input_to_messages(input_data):
|
||
msgs = []
|
||
tool_name_by_id = {}
|
||
if isinstance(input_data, str):
|
||
msgs.append({"role": "user", "content": input_data})
|
||
elif isinstance(input_data, list):
|
||
pending_tool_calls = []
|
||
last_flushed_ids = []
|
||
for item in input_data:
|
||
t = item.get("type")
|
||
if t == "function_call":
|
||
tcid = item.get("call_id") or item.get("id") or uid("tc")
|
||
raw_args = item.get("arguments", "{}")
|
||
normalized_args = _normalize_tool_args(raw_args)
|
||
pending_tool_calls.append(
|
||
{"id": tcid,
|
||
"type": "function",
|
||
"function": {"name": item.get("name", ""),
|
||
"arguments": normalized_args}})
|
||
tool_name_by_id[tcid] = item.get("name", "")
|
||
continue
|
||
if pending_tool_calls:
|
||
last_flushed_ids = [tc["id"] for tc in pending_tool_calls]
|
||
msgs.append({"role": "assistant", "content": None, "tool_calls": pending_tool_calls})
|
||
pending_tool_calls = []
|
||
if t == "message":
|
||
role = item.get("role", "user")
|
||
if role == "developer":
|
||
role = "system"
|
||
text = ""
|
||
reasoning_text = ""
|
||
content = item.get("content", [])
|
||
if isinstance(content, str):
|
||
text = content
|
||
else:
|
||
for part in content:
|
||
if isinstance(part, str):
|
||
text += part
|
||
continue
|
||
pt = part.get("type", "")
|
||
if pt in ("input_text", "output_text"):
|
||
text += part.get("text", "")
|
||
elif pt in ("reasoning",):
|
||
for rp in part.get("content", []):
|
||
reasoning_text += rp.get("text", "")
|
||
elif pt == "input_image":
|
||
img = part.get("image_url", part)
|
||
msgs.append({"role": role, "content": [{"type": "text", "text": text},
|
||
{"type": "image_url", "image_url": img}]})
|
||
text = None
|
||
break
|
||
if text is not None:
|
||
msg = {"role": role, "content": text}
|
||
if reasoning_text and role == "assistant":
|
||
msg["reasoning_content"] = reasoning_text
|
||
msgs.append(msg)
|
||
elif t == "function_call_output":
|
||
tcid = item.get("call_id") or item.get("id") or ""
|
||
if not tcid and last_flushed_ids:
|
||
idx = len([m for m in msgs if m.get("role") == "tool"])
|
||
if idx < len(last_flushed_ids):
|
||
tcid = last_flushed_ids[idx]
|
||
msgs.append({"role": "tool", "tool_call_id": tcid,
|
||
"tool_name": tool_name_by_id.get(tcid, ""),
|
||
"content": item.get("output", "")})
|
||
if pending_tool_calls:
|
||
msgs.append({"role": "assistant", "content": None, "tool_calls": pending_tool_calls})
|
||
return msgs
|
||
|
||
def cc_input_to_messages(input_data, instructions="", schema=None):
|
||
"""Convert Responses API input into CommandCode /alpha/generate messages.
|
||
|
||
[FIX 1] All messages use STRING content (not content blocks).
|
||
CC API rejects params.messages[i].content when it's an array.
|
||
Tool results are role="user" with plain text content.
|
||
Tool calls: inline JSON text in assistant messages (e.g. {"type":"tool-call","id":"..."}).
|
||
|
||
The model echoes this format back in its response text-delta events.
|
||
_parse_commandcode_text_tool_calls extracts them via _extract_raw_json_tool_calls.
|
||
|
||
Schema parameter is accepted but not used for format decisions —
|
||
the conservative string-content format is always used regardless of schema hints.
|
||
"""
|
||
msgs = []
|
||
pending_tool_calls = []
|
||
last_flushed_ids = []
|
||
|
||
def text_from_content(content):
|
||
if isinstance(content, str):
|
||
return content
|
||
text = ""
|
||
for part in content or []:
|
||
if isinstance(part, str):
|
||
text += part
|
||
continue
|
||
if not isinstance(part, dict):
|
||
continue
|
||
if part.get("type") in ("input_text", "output_text", "text"):
|
||
text += part.get("text", "")
|
||
return text
|
||
|
||
def flush_tool_calls():
|
||
nonlocal pending_tool_calls, last_flushed_ids
|
||
if not pending_tool_calls:
|
||
return
|
||
last_flushed_ids = [tc["id"] for tc in pending_tool_calls]
|
||
# Tool calls as plain text in assistant message
|
||
tc_text = "\n".join(
|
||
json.dumps(tc, ensure_ascii=False) for tc in pending_tool_calls
|
||
)
|
||
msgs.append({"role": "assistant", "content": tc_text})
|
||
pending_tool_calls = []
|
||
|
||
if instructions:
|
||
msgs.append({"role": "user", "content": instructions})
|
||
|
||
if isinstance(input_data, str):
|
||
msgs.append({"role": "user", "content": input_data})
|
||
return msgs
|
||
if not isinstance(input_data, list):
|
||
return msgs
|
||
|
||
for item in input_data:
|
||
if not isinstance(item, dict):
|
||
continue
|
||
t = item.get("type")
|
||
if t == "function_call":
|
||
tcid = item.get("call_id") or item.get("id") or uid("call")
|
||
name = item.get("name") or "exec_command"
|
||
pending_tool_calls.append({
|
||
"type": "tool-call",
|
||
"id": tcid,
|
||
"name": name,
|
||
"arguments": item.get("arguments") or "{}",
|
||
})
|
||
continue
|
||
flush_tool_calls()
|
||
if t == "message":
|
||
role = item.get("role", "user")
|
||
if role not in ("user", "assistant"):
|
||
role = "user"
|
||
text = text_from_content(item.get("content", []))
|
||
msgs.append({"role": role, "content": text})
|
||
elif t == "function_call_output":
|
||
output = item.get("output", "")
|
||
if not isinstance(output, str):
|
||
output = json.dumps(output, ensure_ascii=False)
|
||
# /alpha/generate expects string content for ALL messages
|
||
msgs.append({"role": "user", "content": output[:8000]})
|
||
flush_tool_calls()
|
||
return msgs
|
||
|
||
def oa_convert_tools(tools, strict=False):
|
||
if not tools:
|
||
return None
|
||
out = []
|
||
for t in tools:
|
||
if t.get("type") != "function":
|
||
continue
|
||
fn = t.get("function", {})
|
||
name = ""
|
||
if fn:
|
||
name = (fn.get("name") or "").strip()
|
||
else:
|
||
name = (t.get("name") or "").strip()
|
||
if not name or name == "null":
|
||
continue
|
||
if fn:
|
||
entry = dict(t)
|
||
if strict and "strict" not in fn:
|
||
entry["function"] = dict(fn, strict=True)
|
||
out.append(entry)
|
||
else:
|
||
entry = {
|
||
"type": "function",
|
||
"function": {"name": name, "description": t.get("description", ""),
|
||
"parameters": t.get("parameters", {})}
|
||
}
|
||
if strict:
|
||
entry["function"]["strict"] = True
|
||
out.append(entry)
|
||
return out or None
|
||
|
||
def oa_resp_to_responses(chat_resp, model, resp_id=None):
|
||
choice = chat_resp["choices"][0]
|
||
msg = choice["message"]
|
||
content = msg.get("content") or ""
|
||
finish = choice.get("finish_reason", "stop")
|
||
fm = {"stop": "completed", "length": "incomplete", "tool_calls": "completed", "content_filter": "incomplete"}
|
||
status = fm.get(finish, "incomplete")
|
||
outputs = []
|
||
if content:
|
||
outputs.append({"type": "message", "id": uid("msg"), "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": content, "annotations": []}]})
|
||
for tc in msg.get("tool_calls") or []:
|
||
fn = tc.get("function", {})
|
||
outputs.append({"type": "function_call", "id": uid("fc"), "call_id": tc.get("id"),
|
||
"name": fn.get("name"), "arguments": fn.get("arguments", "{}"), "status": "completed"})
|
||
usage = chat_resp.get("usage", {})
|
||
return {"id": resp_id or uid("resp"), "object": "response", "created": int(time.time()),
|
||
"model": model, "status": status, "output": outputs,
|
||
"usage": {"input_tokens": usage.get("prompt_tokens", 0),
|
||
"output_tokens": usage.get("completion_tokens", 0),
|
||
"total_tokens": usage.get("total_tokens", 0),
|
||
"input_tokens_details": {"cached_tokens": usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)}}}
|
||
|
||
def oa_stream_to_sse(chat_stream, model, req_id, _reasoning_out=None):
|
||
resp_id = req_id or uid("resp")
|
||
msg_id = uid("msg")
|
||
text_buf = ""
|
||
reasoning_buf = ""
|
||
reasoning_opened = False
|
||
tc_buf = {}
|
||
fr = None
|
||
msg_opened = False
|
||
|
||
yield emit("response.created", {"type": "response.created",
|
||
"response": {"id": resp_id, "object": "response", "model": model,
|
||
"status": "in_progress", "created": int(time.time()), "output": []}})
|
||
yield emit("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})
|
||
|
||
for line in _stream_with_idle_timeout(chat_stream):
|
||
line = line.decode("utf-8", errors="replace").strip()
|
||
if not line or line.startswith(":") or line == "data: [DONE]":
|
||
continue
|
||
if not line.startswith("data: "):
|
||
continue
|
||
try:
|
||
chunk = json.loads(line[6:])
|
||
except json.JSONDecodeError:
|
||
continue
|
||
choices = chunk.get("choices", [])
|
||
if not choices:
|
||
continue
|
||
delta = choices[0].get("delta", {})
|
||
fr = choices[0].get("finish_reason")
|
||
|
||
rc = delta.get("reasoning_content") or delta.get("reasoning")
|
||
if rc:
|
||
if not reasoning_opened:
|
||
reasoning_opened = True
|
||
reasoning_buf += rc
|
||
yield emit("response.reasoning.delta", {"type": "response.reasoning.delta", "delta": rc})
|
||
|
||
content = delta.get("content")
|
||
if content:
|
||
if not msg_opened:
|
||
msg_id = uid("msg")
|
||
yield emit("response.output_item.added", {"type": "response.output_item.added",
|
||
"item": {"type": "message", "id": msg_id, "role": "assistant", "status": "in_progress", "content": []}})
|
||
yield emit("response.content_part.added", {"type": "response.content_part.added",
|
||
"part": {"type": "output_text", "text": "", "annotations": []}, "item_id": msg_id})
|
||
msg_opened = True
|
||
text_buf += content
|
||
yield emit("response.output_text.delta", {"type": "response.output_text.delta",
|
||
"delta": content, "item_id": msg_id, "content_index": 0})
|
||
|
||
for tc in delta.get("tool_calls") or []:
|
||
idx = tc.get("index", 0)
|
||
if idx not in tc_buf:
|
||
fid = uid("fc")
|
||
tc_buf[idx] = {"id": fid, "call_id": tc.get("id", fid), "name": "", "args": ""}
|
||
yield emit("response.output_item.added", {"type": "response.output_item.added",
|
||
"item": {"type": "function_call", "id": fid, "call_id": tc_buf[idx]["call_id"],
|
||
"name": "", "arguments": "", "status": "in_progress"}})
|
||
fn = tc.get("function", {})
|
||
if "name" in fn and fn["name"]:
|
||
tc_buf[idx]["name"] = fn["name"]
|
||
if "arguments" in fn and fn["arguments"]:
|
||
tc_buf[idx]["args"] += fn["arguments"]
|
||
yield emit("response.output_text.delta", {"type": "response.function_call_arguments.delta",
|
||
"delta": fn["arguments"], "item_id": tc_buf[idx]["id"]})
|
||
|
||
reasoning_rsn_id = uid("rsn") if reasoning_buf else None
|
||
if reasoning_opened:
|
||
yield emit("response.reasoning.done", {"type": "response.reasoning.done",
|
||
"item_id": reasoning_rsn_id, "text": reasoning_buf})
|
||
|
||
if msg_opened:
|
||
yield emit("response.output_text.done", {"type": "response.output_text.done",
|
||
"text": text_buf, "item_id": msg_id, "content_index": 0})
|
||
yield emit("response.content_part.done", {"type": "response.content_part.done",
|
||
"part": {"type": "output_text", "text": text_buf, "annotations": []}, "item_id": msg_id})
|
||
yield emit("response.output_item.done", {"type": "response.output_item.done",
|
||
"item": {"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": text_buf, "annotations": []}]}})
|
||
|
||
for idx in sorted(tc_buf):
|
||
t = tc_buf[idx]
|
||
yield emit("response.function_call_arguments.done", {"type": "response.function_call_arguments.done",
|
||
"item_id": t["id"], "name": t["name"], "arguments": t["args"]})
|
||
yield emit("response.output_item.done", {"type": "response.output_item.done",
|
||
"item": {"type": "function_call", "id": t["id"], "call_id": t["call_id"],
|
||
"name": t["name"], "arguments": t["args"], "status": "completed"}})
|
||
|
||
fm = {"stop": "completed", "length": "incomplete", "tool_calls": "completed", "content_filter": "incomplete"}
|
||
status = fm.get(fr, "incomplete")
|
||
final_out = []
|
||
if reasoning_buf:
|
||
final_out.append({"type": "reasoning", "id": reasoning_rsn_id, "status": "completed",
|
||
"content": [{"type": "text", "text": reasoning_buf}]})
|
||
if msg_opened:
|
||
msg_content = []
|
||
if reasoning_buf:
|
||
msg_content.append({"type": "output_text", "text": text_buf, "annotations": []})
|
||
else:
|
||
msg_content.append({"type": "output_text", "text": text_buf, "annotations": []})
|
||
final_out.append({"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
|
||
"content": msg_content})
|
||
for idx in sorted(tc_buf):
|
||
t = tc_buf[idx]
|
||
final_out.append({"type": "function_call", "id": t["id"], "call_id": t["call_id"],
|
||
"name": t["name"], "arguments": t["args"], "status": "completed"})
|
||
yield emit("response.completed", {"type": "response.completed",
|
||
"response": {"id": resp_id, "object": "response", "model": model,
|
||
"status": status, "created": int(time.time()), "output": final_out}})
|
||
if _reasoning_out is not None:
|
||
_reasoning_out["text"] = reasoning_buf
|
||
_reasoning_out["tool_calls"] = [tc_buf[i] for i in sorted(tc_buf)] if tc_buf else []
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Anthropic backend
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
def an_input_to_messages(input_data):
|
||
msgs = []
|
||
if isinstance(input_data, str):
|
||
msgs.append({"role": "user", "content": input_data})
|
||
elif isinstance(input_data, list):
|
||
for item in input_data:
|
||
t = item.get("type")
|
||
if t == "message":
|
||
role = item.get("role", "user")
|
||
if role == "developer":
|
||
role = "user"
|
||
text = ""
|
||
thinking_blocks = []
|
||
for part in item.get("content", []):
|
||
pt = part.get("type", "")
|
||
if pt in ("input_text", "output_text"):
|
||
text += part.get("text", "")
|
||
elif pt in ("reasoning", "thinking"):
|
||
thinking_text = ""
|
||
for rp in part.get("content", []):
|
||
thinking_text += rp.get("text", "")
|
||
if thinking_text:
|
||
thinking_blocks.append({"type": "thinking", "thinking": thinking_text, "signature": part.get("signature", "")})
|
||
if role == "assistant":
|
||
content_parts = []
|
||
if thinking_blocks:
|
||
content_parts.extend(thinking_blocks)
|
||
if text:
|
||
content_parts.append({"type": "text", "text": text})
|
||
msgs.append({"role": "assistant", "content": content_parts if content_parts else text})
|
||
else:
|
||
msgs.append({"role": "user", "content": text})
|
||
elif t == "function_call":
|
||
msgs.append({"role": "assistant", "content": [
|
||
{"type": "tool_use", "id": item.get("call_id", item.get("id", uid("tu"))),
|
||
"name": item.get("name", ""),
|
||
"input": json.loads(item.get("arguments", "{}"))}
|
||
]})
|
||
elif t == "function_call_output":
|
||
msgs.append({"role": "user", "content": [
|
||
{"type": "tool_result", "tool_use_id": item.get("id", ""),
|
||
"content": item.get("output", "")}
|
||
]})
|
||
return msgs
|
||
|
||
def an_convert_tools(tools):
|
||
if not tools:
|
||
return None
|
||
out = []
|
||
for t in tools:
|
||
if t.get("type") != "function":
|
||
continue
|
||
fn = t.get("function", {})
|
||
if fn:
|
||
out.append({"name": fn.get("name"), "description": fn.get("description", ""),
|
||
"input_schema": fn.get("parameters", {"type": "object", "properties": {}})})
|
||
else:
|
||
out.append({"name": t.get("name"), "description": t.get("description", ""),
|
||
"input_schema": t.get("parameters", {"type": "object", "properties": {}})})
|
||
return out or None
|
||
|
||
def an_resp_to_responses(anthro_resp, model, resp_id=None):
|
||
blocks = anthro_resp.get("content", [])
|
||
sr = anthro_resp.get("stop_reason", "end_turn")
|
||
sm = {"end_turn": "completed", "max_tokens": "incomplete", "stop_sequence": "completed", "tool_use": "completed"}
|
||
status = sm.get(sr, "incomplete")
|
||
outputs = []
|
||
for b in blocks:
|
||
bt = b.get("type", "")
|
||
if bt == "text":
|
||
outputs.append({"type": "message", "id": uid("msg"), "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": b.get("text", ""), "annotations": []}]})
|
||
elif bt == "tool_use":
|
||
outputs.append({"type": "function_call", "id": uid("fc"), "call_id": b.get("id", ""),
|
||
"name": b.get("name", ""), "arguments": json.dumps(b.get("input", {})),
|
||
"status": "completed"})
|
||
elif bt == "thinking":
|
||
outputs.append({"type": "reasoning", "id": uid("rsn"), "status": "completed",
|
||
"content": [{"type": "text", "text": b.get("thinking", "")}]})
|
||
usage = anthro_resp.get("usage", {})
|
||
return {"id": resp_id or uid("resp"), "object": "response", "created": int(time.time()),
|
||
"model": model, "status": status, "output": outputs,
|
||
"usage": {"input_tokens": usage.get("input_tokens", 0),
|
||
"output_tokens": usage.get("output_tokens", 0),
|
||
"total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
|
||
"input_tokens_details": {"cached_tokens": 0}}}
|
||
|
||
def an_stream_to_sse(stream, model, req_id):
|
||
resp_id = req_id or uid("resp")
|
||
completed = []
|
||
msg_id = uid("msg")
|
||
text_buf = ""
|
||
tc_id = None
|
||
tc_call_id = None
|
||
tc_name = ""
|
||
tc_args = ""
|
||
block_type = None
|
||
stop_reason = "end_turn"
|
||
|
||
yield emit("response.created", {"type": "response.created",
|
||
"response": {"id": resp_id, "object": "response", "model": model,
|
||
"status": "in_progress", "created": int(time.time()), "output": []}})
|
||
yield emit("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})
|
||
|
||
for raw in stream:
|
||
line = raw.decode("utf-8", errors="replace").strip()
|
||
if not line:
|
||
continue
|
||
if line.startswith("event: "):
|
||
evt_type = line[7:]
|
||
continue
|
||
if not line.startswith("data: "):
|
||
continue
|
||
try:
|
||
data = json.loads(line[6:])
|
||
except json.JSONDecodeError:
|
||
continue
|
||
|
||
et = data.get("type", "")
|
||
|
||
if et == "message_start":
|
||
pass
|
||
|
||
elif et == "content_block_start":
|
||
cb_type = data.get("content_block", {}).get("type", "")
|
||
block_type = cb_type
|
||
if cb_type == "text":
|
||
msg_id = uid("msg")
|
||
yield emit("response.output_item.added", {"type": "response.output_item.added",
|
||
"item": {"type": "message", "id": msg_id, "role": "assistant",
|
||
"status": "in_progress", "content": []}})
|
||
yield emit("response.content_part.added", {"type": "response.content_part.added",
|
||
"part": {"type": "output_text", "text": "", "annotations": []}, "item_id": msg_id})
|
||
elif cb_type == "tool_use":
|
||
cb = data.get("content_block", {})
|
||
tc_id = uid("fc")
|
||
tc_call_id = cb.get("id", tc_id)
|
||
tc_name = cb.get("name", "")
|
||
yield emit("response.output_item.added", {"type": "response.output_item.added",
|
||
"item": {"type": "function_call", "id": tc_id, "call_id": tc_call_id,
|
||
"name": tc_name, "arguments": "", "status": "in_progress"}})
|
||
elif cb_type == "thinking":
|
||
pass
|
||
|
||
elif et == "content_block_delta":
|
||
dd = data.get("delta", {})
|
||
dt = dd.get("type", "")
|
||
if dt == "text_delta":
|
||
txt = dd.get("text", "")
|
||
text_buf += txt
|
||
yield emit("response.output_text.delta", {"type": "response.output_text.delta",
|
||
"delta": txt, "item_id": msg_id, "content_index": 0})
|
||
elif dt == "input_json_delta":
|
||
pj = dd.get("partial_json", "")
|
||
tc_args += pj
|
||
yield emit("response.output_text.delta", {"type": "response.function_call_arguments.delta",
|
||
"delta": pj, "item_id": tc_id})
|
||
elif dt == "thinking_delta":
|
||
tk = dd.get("thinking", "")
|
||
yield emit("response.reasoning.delta", {"type": "response.reasoning.delta", "delta": tk})
|
||
|
||
elif et == "content_block_stop":
|
||
if block_type == "text":
|
||
yield emit("response.output_text.done", {"type": "response.output_text.done",
|
||
"text": text_buf, "item_id": msg_id, "content_index": 0})
|
||
yield emit("response.content_part.done", {"type": "response.content_part.done",
|
||
"part": {"type": "output_text", "text": text_buf, "annotations": []}, "item_id": msg_id})
|
||
yield emit("response.output_item.done", {"type": "response.output_item.done",
|
||
"item": {"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": text_buf, "annotations": []}]}})
|
||
completed.append({"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": text_buf, "annotations": []}]})
|
||
text_buf = ""
|
||
elif block_type == "tool_use":
|
||
yield emit("response.function_call_arguments.done", {"type": "response.function_call_arguments.done",
|
||
"item_id": tc_id, "name": tc_name, "arguments": tc_args})
|
||
yield emit("response.output_item.done", {"type": "response.output_item.done",
|
||
"item": {"type": "function_call", "id": tc_id, "call_id": tc_call_id,
|
||
"name": tc_name, "arguments": tc_args, "status": "completed"}})
|
||
completed.append({"type": "function_call", "id": tc_id, "call_id": tc_call_id,
|
||
"name": tc_name, "arguments": tc_args, "status": "completed"})
|
||
tc_id = None
|
||
tc_args = ""
|
||
block_type = None
|
||
|
||
elif et == "message_delta":
|
||
stop_reason = data.get("delta", {}).get("stop_reason", "end_turn")
|
||
|
||
elif et == "message_stop":
|
||
sm = {"end_turn": "completed", "max_tokens": "incomplete",
|
||
"stop_sequence": "completed", "tool_use": "completed"}
|
||
status = sm.get(stop_reason, "incomplete")
|
||
yield emit("response.completed", {"type": "response.completed",
|
||
"response": {"id": resp_id, "object": "response", "model": model,
|
||
"status": status, "created": int(time.time()), "output": completed}})
|
||
|
||
_DEFAULT_CC_CONFIG = {
|
||
"workingDir": tempfile.gettempdir(),
|
||
"date": "",
|
||
"environment": "windows" if _IS_WINDOWS else "linux",
|
||
"shell": "powershell" if _IS_WINDOWS else "bash",
|
||
"files": [],
|
||
"structure": [],
|
||
"isGitRepo": False,
|
||
"currentBranch": "",
|
||
"mainBranch": "",
|
||
"gitStatus": "",
|
||
"recentCommits": [],
|
||
}
|
||
|
||
def _cc_config():
|
||
cfg = dict(_DEFAULT_CC_CONFIG)
|
||
cfg["date"] = time.strftime("%Y-%m-%d")
|
||
return cfg
|
||
|
||
def cc_convert_tools(tools):
|
||
return oa_convert_tools(tools)
|
||
|
||
def _strip_xmlish_tags(text):
|
||
return re.sub(r"<[^>]+>", "", text or "")
|
||
|
||
def _unwrap_cmd(cmd_val):
|
||
"""[FIX 11] Self-healing: unwrap double-wrapped cmd values.
|
||
|
||
Model sometimes generates: {"cmd": "{\"cmd\": \"actual_command\"}"}
|
||
Detect when cmd value is itself a JSON object with a nested "cmd" key,
|
||
and extract the real command string. Recursively unwraps up to 3 levels.
|
||
"""
|
||
if not isinstance(cmd_val, str) or not cmd_val.startswith("{"):
|
||
return cmd_val
|
||
for _ in range(3):
|
||
try:
|
||
inner = json.loads(cmd_val)
|
||
if isinstance(inner, dict) and "cmd" in inner and isinstance(inner["cmd"], str):
|
||
cmd_val = inner["cmd"]
|
||
else:
|
||
break
|
||
except Exception:
|
||
break
|
||
return cmd_val
|
||
|
||
def _build_explore_cmd(text_for_url):
|
||
"""Module-level explore command builder. Extracts repo URL from text,
|
||
builds a curl pipeline to fetch README, contents listing, and releases.
|
||
Used by _parse_commandcode_text_tool_calls (closure wrapper) and
|
||
cc_stream_to_sse (stuck recovery heuristic)."""
|
||
if not text_for_url:
|
||
return None, None
|
||
url_m = re.search(r"https?://[^\s\]'\\>\",]+", text_for_url)
|
||
repo_url = url_m.group(0).rstrip(")].,;'\\\"") if url_m else ""
|
||
if not repo_url and isinstance(text_for_url, str):
|
||
try:
|
||
_parsed = json.loads(text_for_url)
|
||
if isinstance(_parsed, list):
|
||
for _item in _parsed:
|
||
_c = _item.get("content", "") if isinstance(_item, dict) else str(_item)
|
||
url_m2 = re.search(r"https?://[^\s\]'\\>\",]+", _c)
|
||
if url_m2:
|
||
repo_url = url_m2.group(0).rstrip(")].,;'\\\"")
|
||
break
|
||
except Exception:
|
||
pass
|
||
if not repo_url:
|
||
return None, None
|
||
if repo_url.endswith(".git"):
|
||
repo_url = repo_url[:-4]
|
||
if "/api/v1/repos/" not in repo_url:
|
||
host_m = re.match(r"(https?://[^/]+)/(.*)", repo_url)
|
||
if host_m:
|
||
host, path = host_m.groups()
|
||
api_base = f"{host}/api/v1/repos/{path}"
|
||
else:
|
||
api_base = repo_url.replace("/admin/", "/api/v1/repos/")
|
||
else:
|
||
api_base = repo_url
|
||
if _IS_WINDOWS:
|
||
cmd = (
|
||
f"cd $env:TEMP; "
|
||
f"$r = Invoke-WebRequest -Uri '{api_base}/contents/README.md' -UseBasicParsing -TimeoutSec 15 2>$null; "
|
||
f"if ($r) {{ $j = $r.Content | ConvertFrom-Json; [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($j.content)) | Select-Object -First 600 }}; "
|
||
f"$r2 = Invoke-WebRequest -Uri '{api_base}/contents' -UseBasicParsing -TimeoutSec 15 2>$null; "
|
||
f"if ($r2) {{ $j2 = $r2.Content | ConvertFrom-Json; $j2 | Select-Object -First 50 | ForEach-Object {{ $_.path + ' ' + $_.type }} }}; "
|
||
f"$r3 = Invoke-WebRequest -Uri '{api_base}/releases' -UseBasicParsing -TimeoutSec 15 2>$null; "
|
||
f"if ($r3) {{ ($r3.Content | ConvertFrom-Json | Select-Object -First 3 | ConvertTo-Json).Substring(0, [Math]::Min(2000, ($r3.Content | ConvertFrom-Json | Select-Object -First 3 | ConvertTo-Json).Length)) }}"
|
||
)
|
||
else:
|
||
cmd = (
|
||
f"cd /tmp && "
|
||
f"curl -sL --max-time 15 '{api_base}/contents/README.md' 2>/dev/null | "
|
||
f"python3 -c \"import sys,json,base64; d=json.load(sys.stdin); print(base64.b64decode(d['content']).decode())\" 2>/dev/null | head -600 && "
|
||
f"curl -sL --max-time 15 '{api_base}/contents' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print('\\n'.join(f'{{x.get(\'path\')}} {{x.get(\'type\')}}' for x in d[:50]))\" 2>/dev/null && "
|
||
f"curl -sL --max-time 15 '{api_base}/releases' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print(json.dumps(d[:3], indent=2)[:2000])\" 2>/dev/null"
|
||
)
|
||
return cmd, "Explore repository to understand the app and gather README, root contents, and releases for the landing page."
|
||
|
||
def _parse_commandcode_text_tool_calls(text):
|
||
"""Parse CommandCode's text-form tool calls into Responses function calls.
|
||
|
||
Handles THREE formats:
|
||
1. XML: ``<tool_call name="bash"><parameter name="command">...</parameter>`` (original)
|
||
2. Function: ``<function=bash>...</function>`` (original)
|
||
3. [FIX 5] Raw JSON inline: {"type":"tool-call","id":"...","name":"exec_command","arguments":"{...}"}
|
||
|
||
Format 3 exists because cc_input_to_messages sends tool calls as inline JSON text.
|
||
The CC model echoes this format back in its response.
|
||
Extraction is done by _extract_raw_json_tool_calls() which is appended after the
|
||
XML pattern loop. See that function for details on malformed-JSON handling.
|
||
|
||
Tolerant of: unescaped inner quotes, unbalanced braces, missing type/id fields,
|
||
sandbox_permissions at top level vs nested inside arguments, etc.
|
||
"""
|
||
calls = []
|
||
if not text:
|
||
return calls
|
||
|
||
_build_explore_cmd_local = _build_explore_cmd
|
||
|
||
# [FIX 17] DSML tool_call blocks used by the model now.
|
||
# Example:
|
||
# <||DSML||tool_calls>
|
||
# <||DSML||invoke name="exec">
|
||
# <||DSML||parameter name="command" string="true">curl ...</||DSML||parameter>
|
||
# <||DSML||parameter name="sandbox_permissions" string="true">require_escalated</||DSML||parameter>
|
||
# <||DSML||parameter name="justification" string="true">...</||DSML||parameter>
|
||
# <||DSML||parameter name="prefix_rule" string="true">["/bin/bash", "-lc", "curl ..."]</||DSML||parameter>
|
||
# </||DSML||invoke>
|
||
# </||DSML||tool_calls>
|
||
for m in re.finditer(r"<[^>]*tool_calls[^>]*>(.*?)</[^>]*tool_calls[^>]*>", text, re.DOTALL | re.IGNORECASE):
|
||
block = m.group(1) or ""
|
||
for im in re.finditer(r"<[^>]*invoke[^>]*name=\"([^\"]+)\"[^>]*>(.*?)</[^>]*invoke>", block, re.DOTALL | re.IGNORECASE):
|
||
raw_name = (im.group(1) or "").strip()
|
||
body = (im.group(2) or "").strip()
|
||
if not body:
|
||
continue
|
||
cmd = None
|
||
sandbox_permissions = None
|
||
justification = None
|
||
# Parameter tags are the canonical source.
|
||
for pm in re.finditer(r"<[^>]*parameter[^>]*name=\"([^\"]+)\"[^>]*>(.*?)</[^>]*parameter>", body, re.DOTALL | re.IGNORECASE):
|
||
key = (pm.group(1) or "").strip().lower()
|
||
val = _strip_xmlish_tags(pm.group(2)).strip()
|
||
# [FIX 21] Accept both "command" and "cmd" parameter names.
|
||
# The tool schema defines the parameter as "cmd" (see exec_command schema),
|
||
# but the model sometimes uses "command" (especially from prefix_rule fallback).
|
||
# Previously only "command" was accepted, so DSML blocks with name="cmd"
|
||
# were silently dropped — causing Codex CLI to stop mid-task.
|
||
if key in ("command", "cmd"):
|
||
cmd = val
|
||
elif key == "prefix_rule" and not cmd:
|
||
try:
|
||
pr_obj = json.loads(val)
|
||
except Exception:
|
||
pr_obj = None
|
||
if isinstance(pr_obj, list) and pr_obj and isinstance(pr_obj[-1], str):
|
||
cmd = pr_obj[-1]
|
||
elif key == "sandbox_permissions":
|
||
sandbox_permissions = val
|
||
elif key == "justification":
|
||
justification = val
|
||
|
||
# [FIX 20] Support explore / explore_agent in DSML blocks
|
||
is_explore = raw_name.lower() in ("explore", "explore_agent")
|
||
if is_explore:
|
||
explore_cmd, explore_just = _build_explore_cmd_local(body)
|
||
if explore_cmd:
|
||
cmd = explore_cmd
|
||
justification = explore_just
|
||
|
||
# Fallback: if the body contains a raw JSON command.
|
||
if not cmd:
|
||
jm = re.search(r'"(?:command|cmd)"\s*:\s*"((?:[^"\\]|\\.)*)"', body, re.DOTALL)
|
||
if jm:
|
||
cmd = jm.group(1).replace('\\n', '\n').replace('\\"', '"').strip()
|
||
if not cmd:
|
||
continue
|
||
# [FIX 19] Translate execute_request and other variations to exec_command (CLI only supports exec_command)
|
||
# [FIX 20] Translate explore and explore_agent to exec_command
|
||
tool_name = "exec_command" if raw_name.lower() in ("exec", "bash", "shell", "terminal", "run_command", "execute_request", "execute_command", "run_shell_command", "run_shell", "run", "explore", "explore_agent") else raw_name
|
||
args = {"cmd": _unwrap_cmd(cmd)}
|
||
if sandbox_permissions:
|
||
args["sandbox_permissions"] = sandbox_permissions if sandbox_permissions in ("use_default", "require_escalated", "with_user_approval") else "require_escalated"
|
||
if justification:
|
||
args["justification"] = justification
|
||
calls.append({
|
||
"full_match": m.group(0),
|
||
"name": tool_name,
|
||
"arguments": json.dumps(args, ensure_ascii=False),
|
||
})
|
||
|
||
# [FIX 16] Native <bash> blocks from CommandCode.
|
||
# Example:
|
||
# <bash>
|
||
# sandbox_permissions: require_escalated
|
||
# justification: ...
|
||
# prefix_rule: ["/bin/bash", "-lc", "curl ..."]
|
||
# </bash>
|
||
# Convert into exec_command calls by extracting the command from prefix_rule.
|
||
for m in re.finditer(r"<bash>(.*?)</bash>", text, re.DOTALL | re.IGNORECASE):
|
||
body = (m.group(1) or "").strip()
|
||
if not body:
|
||
continue
|
||
sandbox_permissions = None
|
||
justification = None
|
||
cmd = None
|
||
# Try line-oriented parsing first.
|
||
for line in body.splitlines():
|
||
s = line.strip()
|
||
if s.lower().startswith("sandbox_permissions:"):
|
||
sandbox_permissions = s.split(":", 1)[1].strip()
|
||
elif s.lower().startswith("justification:"):
|
||
justification = s.split(":", 1)[1].strip()
|
||
elif s.lower().startswith("prefix_rule:"):
|
||
pr = s.split(":", 1)[1].strip()
|
||
try:
|
||
pr_obj = json.loads(pr)
|
||
except Exception:
|
||
pr_obj = None
|
||
if isinstance(pr_obj, list) and pr_obj:
|
||
# If the last arg exists, it is typically the shell command.
|
||
cmd = pr_obj[-1] if isinstance(pr_obj[-1], str) else None
|
||
elif pr.startswith("[") and pr.endswith("]"):
|
||
parts = re.findall(r'"((?:[^"\\]|\\.)*)"', pr)
|
||
if parts:
|
||
cmd = parts[-1].encode().decode("unicode_escape")
|
||
# Fallback: grab a shell-looking line if prefix_rule wasn't parseable.
|
||
if not cmd:
|
||
for line in body.splitlines():
|
||
s = line.strip()
|
||
if re.match(r"^(curl|wget|python3?|node|npm|pnpm|yarn|cat|ls|find|grep|rg|sed|awk|git|mkdir|touch|printf|echo)\b", s):
|
||
cmd = s
|
||
break
|
||
if not cmd:
|
||
continue
|
||
args = {"cmd": cmd}
|
||
if sandbox_permissions:
|
||
args["sandbox_permissions"] = sandbox_permissions if sandbox_permissions in ("use_default", "require_escalated", "with_user_approval") else "require_escalated"
|
||
if justification:
|
||
args["justification"] = justification
|
||
calls.append({
|
||
"full_match": m.group(0),
|
||
"name": "exec_command",
|
||
"arguments": json.dumps(args, ensure_ascii=False),
|
||
})
|
||
|
||
# [FIX 15] Native <explore_agent> blocks from CommandCode.
|
||
# Format seen in logs:
|
||
# <explore_agent>\nmessages: [{...}]\n</explore_agent>
|
||
# Treat as an assistant-requested agent call so the loop can continue.
|
||
for m in re.finditer(r"<explore_agent>(.*?)</explore_agent>|<explore_agent>\s*messages:\s*(\[.*?\])", text, re.DOTALL | re.IGNORECASE):
|
||
body = m.group(1) or m.group(2) or ""
|
||
body = body.strip()
|
||
msgs = None
|
||
if body:
|
||
try:
|
||
msgs = json.loads(body) if body.startswith("[") else None
|
||
except Exception:
|
||
msgs = None
|
||
if msgs is None and body:
|
||
mm = re.search(r"(\[.*\])", body, re.DOTALL)
|
||
if mm:
|
||
try:
|
||
msgs = json.loads(mm.group(1))
|
||
except Exception:
|
||
msgs = None
|
||
if msgs is None:
|
||
msgs = body
|
||
text_for_url = body if isinstance(body, str) else json.dumps(body, ensure_ascii=False)
|
||
cmd, justification = _build_explore_cmd_local(text_for_url)
|
||
if not cmd:
|
||
cmd = "echo 'explore_agent: unable to extract repository URL'"
|
||
justification = "Fallback for explore_agent block without URL."
|
||
args = {"cmd": cmd}
|
||
if justification:
|
||
args["justification"] = justification
|
||
calls.append({
|
||
"full_match": m.group(0),
|
||
"name": "exec_command",
|
||
"arguments": json.dumps(args, ensure_ascii=False),
|
||
})
|
||
|
||
if not calls and text.count("<explore_agent>") >= 2:
|
||
url_m = re.search(r"https?://[^\s\]'\\>\"]+", text)
|
||
if not url_m:
|
||
for prev_url in _last_user_urls:
|
||
url_m = re.search(r"https?://[^\s\]'\\>\"]+", prev_url)
|
||
if url_m:
|
||
break
|
||
if url_m:
|
||
explore_url = url_m.group(0).rstrip(")].,;'\\")
|
||
cmd, justification = _build_explore_cmd_local(explore_url)
|
||
if cmd:
|
||
calls.append({
|
||
"full_match": "<explore_agent>...",
|
||
"name": "exec_command",
|
||
"arguments": json.dumps({"cmd": cmd, "justification": justification or "Explore repository"}, ensure_ascii=False),
|
||
})
|
||
|
||
# [FIX 24] Handle <require_escalation> and <request_escalation_permission> blocks.
|
||
# The model produces these when it wants elevated permissions but the CC
|
||
# adapter doesn't support them. Synthesize a proceed command so the loop continues.
|
||
if not calls:
|
||
for m in re.finditer(r"<(?:require_escalation|request_escalation_permission)>(.*?)</(?:require_escalation|request_escalation_permission)>", text, re.DOTALL | re.IGNORECASE):
|
||
body_escal = (m.group(1) or "").strip()
|
||
_inner_url_m = re.search(r"https?://[^\s\]'\\>\",]+", body_escal)
|
||
if _inner_url_m:
|
||
_e_url = _inner_url_m.group(0).rstrip(")].,;'\\\"")
|
||
_e_cmd, _e_just = _build_explore_cmd_local(_e_url)
|
||
if _e_cmd:
|
||
calls.append({
|
||
"full_match": m.group(0),
|
||
"name": "exec_command",
|
||
"arguments": json.dumps({"cmd": _e_cmd, "justification": _e_just or "Escalation block with URL — auto-proceed"}, ensure_ascii=False),
|
||
})
|
||
continue
|
||
if not calls:
|
||
calls.append({
|
||
"full_match": m.group(0),
|
||
"name": "exec_command",
|
||
"arguments": json.dumps({"cmd": "echo 'escalation: auto-proceeding — no specific command in escalation block'", "justification": "Auto-proceed past escalation request"}, ensure_ascii=False),
|
||
})
|
||
|
||
# [FIX 24b] Bare <require_escalation ... /> or <request_escalation_permission ... />
|
||
# without closing tags. Just auto-proceed.
|
||
if not calls and re.search(r"<(?:require_escalation|request_escalation_permission)[\s/>]", text, re.IGNORECASE):
|
||
calls.append({
|
||
"full_match": "<escalation_bare/>",
|
||
"name": "exec_command",
|
||
"arguments": json.dumps({"cmd": "echo 'escalation: auto-proceeding past bare escalation tag'", "justification": "Auto-proceed past bare escalation tag"}, ensure_ascii=False),
|
||
})
|
||
|
||
patterns = [
|
||
r"<tool_call(?:\s+name=['\"]?([^'\">\s]+)['\"]?)?>(.*?)</tool_call[)]?>",
|
||
r"<function=(\w+)>(.*?)</function>",
|
||
# [FIX 14] CC model actual output: <tool_call type="bash">\n{"command":"...", "description":"..."}
|
||
# No </tool_call) closing tag — body is a raw JSON object
|
||
r"<tool_call(?:\s+type=['\"]?(\w+)['\"]?)?>\s*(\{.*?\})(?:\s*</tool_call)?",
|
||
]
|
||
|
||
def _find_balanced_brace(text, start):
|
||
"""Find the closing brace matching text[start], handling quoted strings."""
|
||
if start >= len(text) or text[start] != '{':
|
||
return -1
|
||
depth = 0
|
||
i = start
|
||
in_str = False
|
||
escape = False
|
||
while i < len(text):
|
||
ch = text[i]
|
||
if escape:
|
||
escape = False
|
||
elif ch == '\\':
|
||
escape = True
|
||
elif ch == '"':
|
||
in_str = not in_str
|
||
elif not in_str:
|
||
if ch == '{':
|
||
depth += 1
|
||
elif ch == '}':
|
||
depth -= 1
|
||
if depth == 0:
|
||
return i
|
||
i += 1
|
||
return -1
|
||
|
||
def _extract_field(text, key, end_chars=',}'):
|
||
"""Extract a field value after "key": in rough JSON text.
|
||
|
||
[FIX 7] Handles values starting with \" (backslash-quote) which occurs when
|
||
the model generates properly-escaped JSON inside a string value.
|
||
Without this fix, _extract_field returns None for escaped values,
|
||
causing sandbox_permissions/justification to not be extracted from
|
||
the parsed args dict (falling through to raw snippet extraction).
|
||
|
||
Also tolerant of unescaped quotes inside string values.
|
||
Returns None if key not found or value is empty.
|
||
"""
|
||
pat = re.compile(r'"' + re.escape(key) + r'"\s*:\s*', re.DOTALL)
|
||
m = pat.search(text)
|
||
if not m:
|
||
return None
|
||
val_start = m.end()
|
||
# Skip leading backslash-escape if the value starts with \" (nested JSON string)
|
||
if val_start < len(text) and text[val_start] == '\\':
|
||
val_start += 1
|
||
# Check if value is a string
|
||
if val_start < len(text) and text[val_start] == '"':
|
||
s = val_start + 1
|
||
buf = []
|
||
while s < len(text):
|
||
ch = text[s]
|
||
if ch == '\\' and s + 1 < len(text):
|
||
buf.append(text[s+1])
|
||
s += 2
|
||
elif ch == '"':
|
||
return ''.join(buf)
|
||
elif ch in end_chars and not buf:
|
||
return None
|
||
else:
|
||
buf.append(ch)
|
||
s += 1
|
||
return ''.join(buf)
|
||
# Object value: find balanced brace
|
||
if val_start < len(text) and text[val_start] == '{':
|
||
end = _find_balanced_brace(text, val_start)
|
||
if end > val_start:
|
||
return text[val_start:end+1]
|
||
return None
|
||
|
||
def _extract_args(text):
|
||
"""Extract arguments value from tool-call JSON, handling multiple malformed formats.
|
||
|
||
[FIX 6] THREE-TIER PARSER — solves double-wrapped arguments bug:
|
||
Model generates arguments in TWO different escaped forms:
|
||
A) Unescaped: "arguments": "{"cmd": "curl ...", "sp": "allow_all"}"
|
||
→ naive brace-counting finds boundaries correctly
|
||
B) Escaped: "arguments": "{\\"cmd\\": \\"curl...\\"}"
|
||
→ json.loads fails on \\ at structural level
|
||
→ unescape \\" → " and retry
|
||
→ unicode_escape decode and retry
|
||
|
||
Returns the raw JSON string (after best-effort unescaping).
|
||
Caller does json.loads() on the result.
|
||
If all 3 tiers fail, returns raw text (caller handles as fallback).
|
||
"""
|
||
m = re.search(r'"(?:arguments|input)"\s*:\s*"?', text)
|
||
if not m:
|
||
return None
|
||
start = m.end()
|
||
if start < len(text) and text[start] == '"':
|
||
start += 1
|
||
if start >= len(text) or text[start] != '{':
|
||
return None
|
||
depth = 0
|
||
i = start
|
||
while i < len(text):
|
||
ch = text[i]
|
||
if ch == '{':
|
||
depth += 1
|
||
elif ch == '}':
|
||
depth -= 1
|
||
if depth == 0:
|
||
raw = text[start:i+1]
|
||
|
||
# Try JSON.parse as-is
|
||
try:
|
||
json.loads(raw)
|
||
return raw
|
||
except json.JSONDecodeError:
|
||
pass
|
||
|
||
# Try after unescaping inner \" -> "
|
||
unescaped = raw.replace('\\"', '"')
|
||
try:
|
||
json.loads(unescaped)
|
||
return unescaped
|
||
except json.JSONDecodeError:
|
||
pass
|
||
|
||
# Try after also unescaping \\n -> \n etc
|
||
try:
|
||
fixed = raw.encode().decode('unicode_escape')
|
||
json.loads(fixed)
|
||
return fixed
|
||
except Exception:
|
||
pass
|
||
|
||
# Give up — return raw text
|
||
return raw
|
||
i += 1
|
||
return None
|
||
|
||
def _extract_raw_json_tool_calls(t):
|
||
"""[FIX 5] Extract raw JSON tool-call objects from free text.
|
||
|
||
Finds "type":"tool-call" (or tool_call/function_call) in text, then extracts
|
||
name/id/arguments/sandbox_permissions/justification via field-level regex.
|
||
|
||
Delegates to _extract_args() for the arguments field (handles unescaped + escaped JSON).
|
||
Delegates to _extract_field() for name/id/sandbox_permissions/justification
|
||
(with FIX 7 for leading-backslash handling).
|
||
|
||
Normalizes sandbox_permissions to valid values (use_default|require_escalated|with_user_approval)
|
||
[FIX 6] Prevents double-wrapped args: {"cmd": "{\"cmd\": \"curl...\"}"}
|
||
"""
|
||
results = []
|
||
idx = 0
|
||
while True:
|
||
m = re.search(r'"type"\s*:\s*"(tool-call|tool_call|function_call)"', t[idx:])
|
||
if not m:
|
||
break
|
||
tc_pos = idx + m.start()
|
||
snippet = t[tc_pos:]
|
||
idx = tc_pos + 1
|
||
tc_type = m.group(1)
|
||
tc_name = _extract_field(snippet, "name")
|
||
if not tc_name:
|
||
continue
|
||
tc_id = _extract_field(snippet, "id")
|
||
|
||
# [FIX 20] Support explore / explore_agent in raw JSON tool calls
|
||
is_explore = tc_name.lower() in ("explore", "explore_agent")
|
||
|
||
if is_explore:
|
||
# Build explore command from the whole snippet/arguments
|
||
explore_cmd, explore_just = _build_explore_cmd_local(snippet)
|
||
if explore_cmd:
|
||
args = {"cmd": explore_cmd}
|
||
if explore_just:
|
||
args["justification"] = explore_just
|
||
else:
|
||
args = {"cmd": "echo 'explore: unable to extract repository URL'", "justification": "Fallback for explore tool call without URL."}
|
||
tool_name = "exec_command"
|
||
else:
|
||
# [FIX 19] Translate execute_request and other variations to exec_command (CLI only supports exec_command)
|
||
tool_name = "exec_command" if tc_name.lower() in ("exec", "bash", "shell", "terminal", "run_command", "execute_request", "execute_command", "run_shell_command", "run_shell", "run") else tc_name
|
||
args_raw = _extract_args(snippet) or _extract_field(snippet, "arguments") or _extract_field(snippet, "input") or "{}"
|
||
try:
|
||
args = json.loads(args_raw) if args_raw.startswith('{') else {"cmd": args_raw}
|
||
except Exception:
|
||
args = {"cmd": args_raw}
|
||
if "cmd" not in args or not args["cmd"]:
|
||
args["cmd"] = str(args)
|
||
# [FIX 11] Self-healing: unwrap double-wrapped cmd values
|
||
args["cmd"] = _unwrap_cmd(args.get("cmd", ""))
|
||
|
||
# Normalize sandbox_permissions to valid values
|
||
_VALID_SP = frozenset({"use_default", "require_escalated", "with_user_approval"})
|
||
if "sandbox_permissions" in args:
|
||
spv = args["sandbox_permissions"]
|
||
if isinstance(spv, dict):
|
||
args["sandbox_permissions"] = "require_escalated" if spv.get("require_escalated") else "use_default"
|
||
elif isinstance(spv, str) and spv not in _VALID_SP:
|
||
args["sandbox_permissions"] = "require_escalated"
|
||
else:
|
||
# Fallback: extract from raw snippet (model puts it at top level)
|
||
sp_raw = _extract_field(snippet, "sandbox_permissions")
|
||
if sp_raw:
|
||
try:
|
||
sp_obj = json.loads(sp_raw) if sp_raw.startswith('{') else {"require_escalated": bool(sp_raw)}
|
||
if isinstance(sp_obj, dict) and sp_obj.get("require_escalated"):
|
||
args["sandbox_permissions"] = "require_escalated"
|
||
except Exception:
|
||
pass
|
||
if "justification" not in args:
|
||
just_raw = _extract_field(snippet, "justification")
|
||
if just_raw:
|
||
args["justification"] = just_raw
|
||
results.append({
|
||
"full_match": snippet,
|
||
"name": tool_name,
|
||
"arguments": json.dumps(args, ensure_ascii=False),
|
||
})
|
||
return results
|
||
|
||
for pat in patterns:
|
||
for m in re.finditer(pat, text, re.DOTALL | re.IGNORECASE):
|
||
if pat.startswith("<function"):
|
||
raw_name = m.group(1)
|
||
body = m.group(2)
|
||
else:
|
||
raw_name = m.group(1) or ""
|
||
body = m.group(2)
|
||
nm = re.search(r"<tool\s+name=[\"']?([^\"'>\s]+)", body, re.IGNORECASE)
|
||
raw_name = raw_name or (nm.group(1) if nm else "bash")
|
||
params = {}
|
||
body_stripped = body.strip()
|
||
if body_stripped.startswith("{"):
|
||
try:
|
||
obj = json.loads(body_stripped)
|
||
cmd = obj.get("command") or obj.get("cmd") or ""
|
||
cmd = _unwrap_cmd(cmd) # [FIX 11]
|
||
if cmd:
|
||
# [FIX 19] Translate execute_request and other variations to exec_command (CLI only supports exec_command)
|
||
tool_name = "exec_command" if raw_name.lower() in ("exec", "bash", "shell", "terminal", "run_command", "execute_request", "execute_command", "run_shell_command", "run_shell", "run") else raw_name
|
||
args = {"cmd": cmd}
|
||
sp = obj.get("sandbox_permissions")
|
||
if isinstance(sp, dict) and sp.get("require_escalated"):
|
||
args["sandbox_permissions"] = "require_escalated"
|
||
elif isinstance(sp, str):
|
||
args["sandbox_permissions"] = sp
|
||
if obj.get("justification"):
|
||
args["justification"] = obj.get("justification")
|
||
calls.append({"full_match": m.group(0), "name": tool_name, "arguments": json.dumps(args)})
|
||
continue
|
||
except Exception:
|
||
pass
|
||
for pm in re.finditer(r"<parameter(?:\s+name=[\"']?(\w+)[\"']?|=(\w+))>(.*?)</parameter>", body, re.DOTALL | re.IGNORECASE):
|
||
key = pm.group(1) or pm.group(2) or "text"
|
||
params[key] = _strip_xmlish_tags(pm.group(3)).strip()
|
||
|
||
# [FIX 20] Support explore / explore_agent in XML tool calls
|
||
is_explore = raw_name.lower() in ("explore", "explore_agent")
|
||
if is_explore:
|
||
explore_cmd, explore_just = _build_explore_cmd_local(body)
|
||
if explore_cmd:
|
||
cmd = explore_cmd
|
||
params["justification"] = explore_just
|
||
else:
|
||
cmd = ""
|
||
else:
|
||
cmd = params.get("command") or params.get("cmd") or ""
|
||
|
||
if not cmd and body_stripped.startswith("{"):
|
||
cm = re.search(r'"(?:command|cmd)"\s*:\s*"(.*?)"\s*,\s*"(?:sandbox_permissions|justification|prefix_rule)"', body, re.DOTALL)
|
||
if not cm:
|
||
cm = re.search(r'"(?:command|cmd)"\s*:\s*"(.*?)"\s*}', body, re.DOTALL)
|
||
if cm:
|
||
cmd = cm.group(1)
|
||
cmd = cmd.replace('\\n', '\n').replace('\\"', '"').strip()
|
||
cmd = _unwrap_cmd(cmd) # [FIX 11]
|
||
if re.search(r'"sandbox_permissions"\s*:\s*\{\s*"require_escalated"\s*:\s*true\s*\}', body, re.DOTALL):
|
||
params["sandbox_permissions"] = "require_escalated"
|
||
jm = re.search(r'"justification"\s*:\s*"(.*?)"\s*(?:,|})', body, re.DOTALL)
|
||
if jm:
|
||
params["justification"] = jm.group(1).replace('\\n', '\n').replace('\\"', '"').strip()
|
||
if not cmd:
|
||
stripped = _strip_xmlish_tags(body)
|
||
lines = [ln.strip() for ln in stripped.splitlines() if ln.strip()]
|
||
for i, ln in enumerate(lines):
|
||
if re.match(r"^(curl|wget|python3?|node|npm|pnpm|yarn|cat|ls|find|grep|rg|sed|awk|git|mkdir|touch|printf|echo)\b", ln):
|
||
cmd = "\n".join(lines[i:])
|
||
break
|
||
if not cmd and lines:
|
||
cmd = "\n".join(lines)
|
||
if not cmd:
|
||
continue
|
||
# [FIX 19] Translate execute_request and other variations to exec_command (CLI only supports exec_command)
|
||
# [FIX 20] Translate explore and explore_agent to exec_command
|
||
tool_name = "exec_command" if raw_name.lower() in ("exec", "bash", "shell", "terminal", "run_command", "execute_request", "execute_command", "run_shell_command", "run_shell", "run", "explore", "explore_agent") else raw_name
|
||
args = {"cmd": _unwrap_cmd(cmd)} # [FIX 11] all paths must unwrap
|
||
if params.get("sandbox_permissions"):
|
||
args["sandbox_permissions"] = params["sandbox_permissions"]
|
||
if params.get("justification"):
|
||
args["justification"] = params["justification"]
|
||
calls.append({"full_match": m.group(0), "name": tool_name, "arguments": json.dumps(args)})
|
||
|
||
# Also extract raw JSON tool-call objects embedded in free text
|
||
calls.extend(_extract_raw_json_tool_calls(text))
|
||
|
||
# [FIX 18] Native <todo_write> blocks from the model (used for checklist/task tracking)
|
||
# The model outputs a task checklist in a custom <todo_write> XML tag block:
|
||
# <todo_write>
|
||
# <todos>[{"id":"1","status":"in_progress","description":"..."}]</todos>
|
||
# </todo_write>
|
||
# We parse this and map it to a standard 'TodoWrite' tool call so the CLI agent loop continues execution.
|
||
for m in re.finditer(r"<todo_write>(.*?)</todo_write>", text, re.DOTALL | re.IGNORECASE):
|
||
body = (m.group(1) or "").strip()
|
||
if not body:
|
||
continue
|
||
todos_match = re.search(r"<todos>(.*?)</todos>", body, re.DOTALL | re.IGNORECASE)
|
||
if not todos_match:
|
||
continue
|
||
raw_todos_json = todos_match.group(1).strip()
|
||
try:
|
||
raw_todos = json.loads(raw_todos_json)
|
||
except Exception as e:
|
||
print(f"[translate-proxy] [FIX 18] Failed to parse <todos> JSON: {e}", file=sys.stderr)
|
||
raw_todos = None
|
||
if isinstance(raw_todos, list):
|
||
parsed_todos = []
|
||
for item in raw_todos:
|
||
if isinstance(item, dict):
|
||
desc = item.get("description") or item.get("content") or ""
|
||
parsed_todos.append({
|
||
"content": desc,
|
||
"activeForm": item.get("activeForm") or desc,
|
||
"status": item.get("status") or "pending"
|
||
})
|
||
calls.append({
|
||
"full_match": m.group(0),
|
||
"name": "TodoWrite",
|
||
"arguments": json.dumps({"todos": parsed_todos}, ensure_ascii=False)
|
||
})
|
||
|
||
# [FIX 11] Self-healing: last-chance sanitization pass on ALL extracted calls
|
||
calls = _sanitize_tool_calls(calls)
|
||
return calls
|
||
|
||
def _sanitize_tool_calls(calls):
|
||
"""[FIX 11/T3] Post-extraction self-healing validation layer.
|
||
|
||
Runs AFTER all extraction paths (XML, raw JSON, regex) have produced their
|
||
tool calls. This is the final safety net before calls are returned to the
|
||
streaming/response builder.
|
||
|
||
Validates and repairs:
|
||
- Double/triple-wrapped cmd values (recursive unwrap)
|
||
- cmd that looks like JSON object/string instead of shell command
|
||
- cmd containing escaped newlines or quotes that would break bash
|
||
- Empty or whitespace-only cmd → replaced with diagnostic string
|
||
|
||
Logs warnings for any repair made (visible in stderr/proxy logs).
|
||
Returns sanitized list (may be shorter if irreparable calls are dropped).
|
||
"""
|
||
cleaned = []
|
||
for i, call in enumerate(calls):
|
||
# [FIX 18] Skip sanitization pass for non-shell tool calls (e.g., TodoWrite)
|
||
# Sanitization specifically validates and repairs command shell executions (the 'cmd' argument).
|
||
# Running it on other tools without a 'cmd' parameter (like TodoWrite) would falsely flag
|
||
# them as containing JSON garbage or empty commands, corrupting their actual parameters.
|
||
if call.get("name") != "exec_command":
|
||
cleaned.append(call)
|
||
continue
|
||
|
||
try:
|
||
args_raw = call.get("arguments", "{}")
|
||
if isinstance(args_raw, str):
|
||
args = json.loads(args_raw)
|
||
else:
|
||
args = dict(args_raw)
|
||
except Exception:
|
||
cleaned.append(call)
|
||
continue
|
||
cmd = args.get("cmd", "")
|
||
repaired = False
|
||
|
||
# Detect and unwrap nested JSON cmd values (up to 4 levels deep)
|
||
unwrapped = _unwrap_cmd(cmd)
|
||
if unwrapped != cmd:
|
||
cmd = unwrapped
|
||
args["cmd"] = cmd
|
||
repaired = True
|
||
|
||
# Detect cmd that is still a JSON object (unwrap missed it or deeper nesting)
|
||
if isinstance(cmd, str) and cmd.strip().startswith("{"):
|
||
try:
|
||
inner = json.loads(cmd)
|
||
if isinstance(inner, dict):
|
||
for key in ("cmd", "command", "c"):
|
||
if key in inner and isinstance(inner[key], str):
|
||
args["cmd"] = inner[key]
|
||
repaired = True
|
||
break
|
||
except Exception:
|
||
pass
|
||
|
||
# Detect cmd that looks like a JSON-encoded string with backslash escapes
|
||
_cmd = args.get("cmd", "")
|
||
if _cmd and ('\\"' in _cmd or "\\n" in _cmd or _cmd.count("{") > _cmd.count("}")):
|
||
try:
|
||
decoded = _cmd.encode().decode("unicode_escape")
|
||
if decoded != _cmd and not decoded.startswith("{"):
|
||
args["cmd"] = decoded
|
||
repaired = True
|
||
except Exception:
|
||
pass
|
||
|
||
# Final guard: if cmd is empty or just JSON garbage, make it obvious
|
||
_final_cmd = args.get("cmd", "")
|
||
if not _final_cmd or _final_cmd.strip() in ("{}", "null", "None", ""):
|
||
_safe_preview = args_raw[:200].replace('"', "'").replace('\\', '/')
|
||
args["cmd"] = f"# [CC-SANITIZER] empty cmd recovered from: {_safe_preview}"
|
||
repaired = True
|
||
elif _final_cmd.startswith("{") and len(_final_cmd) < 500:
|
||
# Still looks like JSON — likely unrecoverable, flag it
|
||
_safe_preview = _final_cmd.replace('"', "'").replace('\\', '/')
|
||
args["cmd"] = f"# [CC-SANITIZER] suspicious cmd (still JSON): {_safe_preview}"
|
||
repaired = True
|
||
|
||
if repaired:
|
||
print(f"[translate-proxy] [CC-SANITIZER] repaired tool call #{i}: "
|
||
f"name={call.get('name')} cmd_preview={str(args.get('cmd',''))[:120]}",
|
||
file=sys.stderr)
|
||
|
||
call["arguments"] = json.dumps(args, ensure_ascii=False)
|
||
cleaned.append(call)
|
||
|
||
return cleaned
|
||
|
||
def _parse_cc_line(line):
|
||
"""Parse a raw line from CommandCode /alpha/generate, stripping SSE data: prefix."""
|
||
stripped = line.strip()
|
||
if not stripped:
|
||
return None
|
||
if stripped.startswith("data: "):
|
||
stripped = stripped[6:]
|
||
elif stripped.startswith("data:"):
|
||
stripped = stripped[5:]
|
||
if not stripped or stripped == "[DONE]":
|
||
return None
|
||
try:
|
||
return json.loads(stripped)
|
||
except json.JSONDecodeError:
|
||
return None
|
||
|
||
|
||
def _iter_cc_events(stream):
|
||
"""Yield parsed JSON events from a CommandCode /alpha/generate stream.
|
||
Handles raw JSON lines, SSE data: events, and multi-event chunks.
|
||
"""
|
||
buf = ""
|
||
for chunk in _stream_with_idle_timeout(stream):
|
||
buf += chunk.decode("utf-8", errors="replace")
|
||
while "\n" in buf:
|
||
line, buf = buf.split("\n", 1)
|
||
d = _parse_cc_line(line)
|
||
if d is not None:
|
||
yield d
|
||
# Process remaining buffer (non-streaming single-JSON response)
|
||
if buf.strip():
|
||
if buf.strip().startswith("{"):
|
||
d = _parse_cc_line(buf)
|
||
if d is not None:
|
||
yield d
|
||
else:
|
||
for line in buf.strip().split("\n"):
|
||
d = _parse_cc_line(line)
|
||
if d is not None:
|
||
yield d
|
||
|
||
|
||
def cc_resp_to_responses(cc_lines, model, resp_id=None):
|
||
text = ""
|
||
usage = {}
|
||
if isinstance(cc_lines, str):
|
||
cc_lines = [cc_lines]
|
||
for line in cc_lines:
|
||
d = _parse_cc_line(line)
|
||
if d is None:
|
||
continue
|
||
t = d.get("type", "")
|
||
if t == "text-delta":
|
||
text += d.get("text", "")
|
||
elif t == "finish-step":
|
||
u = d.get("usage", {})
|
||
usage = {
|
||
"input_tokens": u.get("inputTokens", 0),
|
||
"output_tokens": u.get("outputTokens", 0),
|
||
"total_tokens": u.get("inputTokens", 0) + u.get("outputTokens", 0),
|
||
}
|
||
outputs = []
|
||
if text:
|
||
outputs.append({"type": "message", "id": uid("msg"), "role": "assistant",
|
||
"status": "completed",
|
||
"content": [{"type": "output_text", "text": text, "annotations": []}]})
|
||
return {"id": resp_id or uid("resp"), "object": "response", "created": int(time.time()),
|
||
"model": model, "status": "completed", "output": outputs,
|
||
"usage": {"input_tokens": usage.get("input_tokens", 0),
|
||
"output_tokens": usage.get("output_tokens", 0),
|
||
"total_tokens": usage.get("total_tokens", 0),
|
||
"input_tokens_details": {"cached_tokens": 0}}}
|
||
|
||
def cc_stream_to_sse(cc_stream, model, req_id):
|
||
resp_id = req_id or uid("resp")
|
||
msg_id = uid("msg")
|
||
text_buf = ""
|
||
|
||
yield emit("response.created", {"type": "response.created",
|
||
"response": {"id": resp_id, "object": "response", "model": model,
|
||
"status": "in_progress", "created": int(time.time()), "output": []}})
|
||
yield emit("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})
|
||
|
||
total_usage = {}
|
||
_event_types_seen = set()
|
||
_debug_log_path = os.path.expanduser("~/.cache/codex-proxy/cc-debug.log")
|
||
_debug_fh = open(_debug_log_path, "a") # [FIX 14] always write debug to FILE (not just stderr which may be piped)
|
||
_deflog = lambda *a, **kw: print(*a, file=_debug_fh, flush=True, **kw)
|
||
|
||
for d in _iter_cc_events(cc_stream):
|
||
t = d.get("type", "")
|
||
_event_types_seen.add(t)
|
||
|
||
if t == "text-delta":
|
||
txt = d.get("text", "")
|
||
if txt:
|
||
text_buf += txt
|
||
|
||
elif t == "finish-step":
|
||
u = d.get("usage", {})
|
||
total_usage = {
|
||
"input_tokens": u.get("inputTokens", 0),
|
||
"output_tokens": u.get("outputTokens", 0),
|
||
"total_tokens": u.get("inputTokens", 0) + u.get("outputTokens", 0),
|
||
}
|
||
elif t not in ("text-delta", "finish-step"):
|
||
_deflog(f"[CC-DEBUG] unexpected event type: {t} keys={list(d.keys())[:5]} data={str(d)[:200]}")
|
||
|
||
_deflog(f"[CC-DEBUG] stream ended. event_types={_event_types_seen} text_buf_len={len(text_buf)}")
|
||
|
||
parsed_tool_calls = _parse_commandcode_text_tool_calls(text_buf)
|
||
_deflog(f"[CC-DEBUG] text_buf len={len(text_buf)} parsed_tool_calls={len(parsed_tool_calls)} "
|
||
f"text_preview={text_buf[:500]!r}")
|
||
if parsed_tool_calls:
|
||
for ti, tc in enumerate(parsed_tool_calls):
|
||
_deflog(f"[CC-DEBUG] tool_call[{ti}] name={tc.get('name')} args_preview={tc.get('arguments','')[:150]!r}")
|
||
|
||
# [FIX 13] FALLBACK: if parser returned empty but text contains tool-call patterns,
|
||
# force-extract using regex. This catches cases where model output format
|
||
# doesn't match any of our named patterns (XML/raw JSON/function=).
|
||
if not parsed_tool_calls and len(text_buf) > 20:
|
||
_has_tc_signals = (
|
||
'"type"' in text_buf and ('tool-call' in text_buf or 'tool_call' in text_buf or 'function_call' in text_buf)
|
||
) or (
|
||
'<tool' in text_buf.lower() and '<parameter' in text_buf.lower()
|
||
) or (
|
||
'<function=' in text_buf
|
||
) or (
|
||
'{"cmd":' in text_buf or '{"command":' in text_buf
|
||
)
|
||
if _has_tc_signals:
|
||
_deflog(f"[CC-DEBUG] Parser returned empty but text has tool-call signals! Attempting fallback...")
|
||
# Try direct raw JSON extraction on entire buffer
|
||
_fallback_calls = _extract_raw_json_tool_calls(text_buf)
|
||
if not _fallback_calls:
|
||
# [FIX 14b] Match BOTH "cmd" and "command" keys (model uses both)
|
||
import re as _re
|
||
for _m in _re.finditer(r'\{[^{}]*"(?:command|cmd)"\s*:\s*"(?:[^"\\]|\\.)*"', text_buf):
|
||
try:
|
||
_args = json.loads(_m.group(0))
|
||
if isinstance(_args, dict) and ("cmd" in _args or "command" in _args):
|
||
_cmd_val = _unwrap_cmd(_args.get("cmd") or _args.get("command", ""))
|
||
_args["cmd"] = _cmd_val
|
||
# Copy description as justification if present
|
||
if "description" in _args:
|
||
_args["justification"] = _args["description"]
|
||
_fallback_calls.append({
|
||
"full_match": _m.group(0),
|
||
"name": "exec_command",
|
||
"arguments": json.dumps(_args, ensure_ascii=False),
|
||
})
|
||
except Exception:
|
||
continue
|
||
if _fallback_calls:
|
||
_deflog(f"[CC-DEBUG] Fallback extracted {len(_fallback_calls)} tool calls!")
|
||
for _fi, _fc in enumerate(_fallback_calls):
|
||
_deflog(f"[CC-DEBUG] fallback[{_fi}] name={_fc.get('name')} args={_fc.get('arguments','')[:120]!r}")
|
||
parsed_tool_calls = _fallback_calls
|
||
else:
|
||
_deflog(f"[CC-DEBUG] Fallback also failed. text_buf first 500: {text_buf[:500]!r}")
|
||
|
||
# [FIX 25] SELF-HEALING STUCK DETECTOR
|
||
# When ALL parsers returned empty and text has intent signals, synthesize a
|
||
# command so the agent loop doesn't stall. This catches:
|
||
# - Bare text with no tool call format at all
|
||
# - Unrecognized XML-ish blocks
|
||
# - Partial JSON (bare "{")
|
||
# - Model explaining what it wants to do but not producing a tool call
|
||
if not parsed_tool_calls and len(text_buf) > 10:
|
||
_synth_cmd = None
|
||
_synth_just = None
|
||
_tl = text_buf.lower()
|
||
|
||
# Heuristic 1: URL in text → fetch it
|
||
_url_in_text = re.search(r"https?://[^\s\]'\\>\",]+", text_buf)
|
||
if _url_in_text:
|
||
_synth_url = _url_in_text.group(0).rstrip(")].,;'\\\"")
|
||
if _IS_WINDOWS:
|
||
_synth_cmd = f"Invoke-WebRequest -Uri '{_synth_url}' -UseBasicParsing -TimeoutSec 15 | Select-Object -ExpandProperty Content | Select-Object -First 200"
|
||
else:
|
||
_synth_cmd = f"curl -sL --max-time 15 '{_synth_url}' 2>/dev/null | head -200"
|
||
_synth_just = "Auto-synthesized: URL detected in text, fetching"
|
||
|
||
# Heuristic 2: File path references → list or read
|
||
if not _synth_cmd:
|
||
_file_m = re.search(r"(?:read|open|view|check|examine|cat|show)\s+(?:the\s+)?(?:file\s+)?[`'\"]?(/[^\s'\"]+\.\w+)", _tl)
|
||
if _file_m:
|
||
_fpath = _file_m.group(1)
|
||
if _IS_WINDOWS:
|
||
_synth_cmd = f"Get-Content '{_fpath}' -ErrorAction SilentlyContinue | Select-Object -First 200; if (-not $?) {{ Get-Item '{_fpath}' | Select-Object Name,Length,LastWriteTime }}"
|
||
else:
|
||
_synth_cmd = f"cat '{_fpath}' 2>/dev/null | head -200 || ls -la '{_fpath}'"
|
||
_synth_just = f"Auto-synthesized: file reference detected ({_fpath})"
|
||
|
||
# Heuristic 3: Shell command mentioned in backticks or quotes
|
||
if not _synth_cmd:
|
||
_shell_m = re.search(r"[`'\"]((?:curl|wget|git|npm|pip|python|ls|cat|grep|find|mkdir|cd|rm|cp|mv|chmod|docker|make|cargo|go)\s[^\s`'\"]+)", text_buf)
|
||
if _shell_m:
|
||
_synth_cmd = _shell_m.group(1)
|
||
_synth_just = "Auto-synthesized: shell command detected in text"
|
||
|
||
# Heuristic 4: "explore" or "fetch" intent + last user URL
|
||
if not _synth_cmd and ("explore" in _tl or "fetch" in _tl or "investigate" in _tl or "repository" in _tl):
|
||
for _prev_url in _last_user_urls:
|
||
_url_m2 = re.search(r"https?://[^\s\]'\\>\",]+", _prev_url)
|
||
if _url_m2:
|
||
_pu = _url_m2.group(0).rstrip(")].,;'\\\"")
|
||
_ecmd, _ejust = _build_explore_cmd(_pu)
|
||
if _ecmd:
|
||
_synth_cmd = _ecmd
|
||
_synth_just = _ejust or "Auto-synthesized: explore intent with last user URL"
|
||
break
|
||
|
||
# Heuristic 5: Generic "I need to" / "let me" / "I'll" intent with command-like text
|
||
if not _synth_cmd:
|
||
_intent_m = re.search(r"(?:I(?:'ll| will| need to| should)|let me|please)\s+(.+?)(?:\.|!|\n|$)", _tl, re.IGNORECASE)
|
||
if _intent_m:
|
||
_intent_text = _intent_m.group(1).strip()
|
||
if len(_intent_text) > 10 and len(_intent_text) < 200:
|
||
if _IS_WINDOWS:
|
||
_synth_cmd = f"Write-Output 'Stuck recovery: model intent was: {_intent_text[:100]}'"
|
||
else:
|
||
_synth_cmd = f"echo 'Stuck recovery: model intent was: {_intent_text[:100]}'"
|
||
_synth_just = f"Auto-synthesized from intent text: {_intent_text[:80]}"
|
||
|
||
if _synth_cmd:
|
||
parsed_tool_calls = [{
|
||
"full_match": "__synth_stuck_recovery__",
|
||
"name": "exec_command",
|
||
"arguments": json.dumps({"cmd": _synth_cmd, "justification": _synth_just or "Auto-synthesized stuck recovery"}, ensure_ascii=False),
|
||
}]
|
||
_deflog(f"[CC-DEBUG] [STUCK-RECOVERY] Synthesized: cmd={_synth_cmd[:120]!r}")
|
||
print(f"[CC-DEBUG] [STUCK-RECOVERY] Synthesized command from text intent", file=sys.stderr, flush=True)
|
||
|
||
# Also log to stderr for visibility when not piped
|
||
print(f"[CC-DEBUG] text_buf={len(text_buf)} chars, tool_calls={len(parsed_tool_calls)}", file=sys.stderr, flush=True)
|
||
|
||
try:
|
||
_debug_fh.close()
|
||
except Exception:
|
||
pass
|
||
clean_text = text_buf
|
||
for tc in parsed_tool_calls:
|
||
clean_text = clean_text.replace(tc["full_match"], "")
|
||
clean_text = clean_text.strip()
|
||
|
||
if clean_text:
|
||
yield emit("response.output_item.added", {"type": "response.output_item.added",
|
||
"item": {"type": "message", "id": msg_id, "role": "assistant", "status": "in_progress", "content": []}})
|
||
yield emit("response.content_part.added", {"type": "response.content_part.added",
|
||
"part": {"type": "output_text", "text": "", "annotations": []}, "item_id": msg_id})
|
||
yield emit("response.output_text.delta", {"type": "response.output_text.delta",
|
||
"delta": clean_text, "item_id": msg_id, "content_index": 0})
|
||
yield emit("response.output_text.done", {"type": "response.output_text.done",
|
||
"text": clean_text, "item_id": msg_id, "content_index": 0})
|
||
yield emit("response.content_part.done", {"type": "response.content_part.done",
|
||
"part": {"type": "output_text", "text": clean_text, "annotations": []}, "item_id": msg_id})
|
||
yield emit("response.output_item.done", {"type": "response.output_item.done",
|
||
"item": {"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": clean_text, "annotations": []}]}})
|
||
|
||
function_outputs = []
|
||
for tc in parsed_tool_calls:
|
||
fid = uid("fc")
|
||
call_id = uid("call")
|
||
item = {"type": "function_call", "id": fid, "call_id": call_id,
|
||
"name": tc["name"], "arguments": tc["arguments"], "status": "completed"}
|
||
function_outputs.append(item)
|
||
yield emit("response.output_item.added", {"type": "response.output_item.added", "item": item})
|
||
yield emit("response.function_call_arguments.done", {"type": "response.function_call_arguments.done",
|
||
"item_id": fid, "name": tc["name"], "arguments": tc["arguments"]})
|
||
yield emit("response.output_item.done", {"type": "response.output_item.done", "item": item})
|
||
|
||
final_out = []
|
||
if clean_text:
|
||
final_out.append({"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": clean_text, "annotations": []}]})
|
||
final_out.extend(function_outputs)
|
||
yield emit("response.completed", {"type": "response.completed",
|
||
"response": {"id": resp_id, "object": "response", "model": model,
|
||
"status": "completed", "created": int(time.time()), "output": final_out,
|
||
"usage": total_usage}})
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# Auto-sensing provider adapter
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_SENTINEL = object()
|
||
|
||
@dataclasses.dataclass
|
||
class ProviderSchema:
|
||
"""Describes what message formats a provider supports.
|
||
|
||
Populated by probing the endpoint and/or analyzing error responses.
|
||
Cached in provider-caps.json so probing only happens once per provider.
|
||
"""
|
||
supported_roles: tuple = ("user", "assistant")
|
||
content_type: str = "string" # "string" | "array"
|
||
content_block_types: tuple = () # e.g. ("text", "tool_result", "tool-call")
|
||
tool_result_style: str = "inline" # "inline" | "tool_result_block" | "anthropic"
|
||
tool_call_style: str = "openai_function" # "openai_function" | "tool-call" | "anthropic_tool_use"
|
||
accepts_tool_role: bool = False
|
||
accepts_system_role: bool = True
|
||
cc_body_wrap: bool = False # needs {config, params, threadId} wrapping
|
||
field_names: dict = dataclasses.field(default_factory=dict)
|
||
auth_type: str = "" # "bearer" | "x-api-key" | "custom"
|
||
auth_header: str = "Authorization" # header name for auth
|
||
auth_scheme: str = "Bearer " # prefix for auth value
|
||
tool_decl_format: str = "openai" # "openai" | "anthropic" | "command_code"
|
||
param_names: dict = dataclasses.field(default_factory=lambda: {
|
||
"max_tokens": "max_tokens",
|
||
"temperature": "temperature",
|
||
"top_p": "top_p",
|
||
})
|
||
response_format: str = "auto" # "sse" | "raw_json" | "ndjson" | "auto"
|
||
stream_format: str = "auto" # "sse_data" | "sse_event" | "raw_lines" | "json_lines"
|
||
supports_vision: bool = True
|
||
|
||
def hints(self) -> dict:
|
||
"""Return a dict for storing in provider-caps.json."""
|
||
d = {}
|
||
for k, v in dataclasses.asdict(self).items():
|
||
if isinstance(v, (list, tuple)) and not v:
|
||
continue
|
||
if isinstance(v, dict) and not v:
|
||
continue
|
||
if k == "supports_vision":
|
||
if v is not False:
|
||
continue
|
||
elif v is False:
|
||
continue
|
||
if v == "":
|
||
continue
|
||
if v == "auto":
|
||
continue
|
||
d[k] = v
|
||
return d
|
||
|
||
|
||
class ErrorAnalyzer:
|
||
"""Parse upstream error responses to infer provider schema.
|
||
Analyzes 400, 401, 422 errors for hints about auth, roles, content format,
|
||
parameter names, field names, tool format, and response format.
|
||
"""
|
||
|
||
@staticmethod
|
||
def analyze(error_text: str, current: ProviderSchema = None) -> dict:
|
||
hints = {}
|
||
if not error_text:
|
||
return hints
|
||
err = error_text.lower()
|
||
|
||
# ── Auth detection (401 errors) ──
|
||
if re.search(r"unauthorized|invalid.*api.?key|missing.*api.?key|x-api-key", err):
|
||
hints["auth_type"] = "x-api-key"
|
||
hints["auth_header"] = "x-api-key"
|
||
hints["auth_scheme"] = ""
|
||
elif re.search(r"invalid.*bearer|bearer.*token|authorization.*header|invalid.*token", err):
|
||
hints["auth_type"] = "bearer"
|
||
hints["auth_header"] = "Authorization"
|
||
hints["auth_scheme"] = "Bearer "
|
||
|
||
# ── Role validation ──
|
||
if re.search(r"role.*expected.*(?:user|assistant)", err):
|
||
hints["accepts_tool_role"] = False
|
||
hints["accepts_function_role"] = False
|
||
|
||
if re.search(r"role.*(?:tool|function).*(?:invalid|not.*(?:support|allow))", err):
|
||
hints["accepts_tool_role"] = False
|
||
hints["accepts_function_role"] = False
|
||
|
||
if re.search(r"role.*system.*(?:invalid|not.*(?:support|allow))", err):
|
||
hints["accepts_system_role"] = False
|
||
|
||
# ── Content format (top-level only, not content[i].xxx) ──
|
||
if re.search(r'params\.messages\[\d+\]\.content', err):
|
||
# Explicit path to content field in a messages array (e.g. /alpha/generate)
|
||
if re.search(r"expected string.*received array", err):
|
||
hints["content_type"] = "string"
|
||
hints["tool_result_style"] = "inline" # no tool_result blocks allowed
|
||
elif re.search(r"expected array.*received string", err):
|
||
hints["content_type"] = "array"
|
||
elif re.search(r"(?<!\w)content(?!\[)\s*(?:of type|field|should be|expected|must be).*(?:string|array)", err) or \
|
||
re.search(r"expected (?:string|array).*content", err):
|
||
if re.search(r"expected string", err) and not re.search(r"expected array", err):
|
||
hints["content_type"] = "string"
|
||
elif re.search(r"expected array", err):
|
||
hints["content_type"] = "array"
|
||
elif re.search(r"content.*expected string.*received array", err) and not re.search(r"\[\d*\]", err):
|
||
hints["content_type"] = "string"
|
||
elif re.search(r"content.*expected array.*received string", err) and not re.search(r"\[\d*\]", err):
|
||
hints["content_type"] = "array"
|
||
|
||
# ── Content block types ──
|
||
types = set()
|
||
for m in re.finditer(
|
||
r'expected\s+"('
|
||
r'text|image|document|search_result|thinking|redacted_thinking|reasoning|'
|
||
r'tool_use|tool-call|tool_result|tool-result|'
|
||
r'server_tool_use|web_search_tool_result|web_fetch_tool_result|tool'
|
||
r')"', err
|
||
):
|
||
types.add(m.group(1))
|
||
# Also detect from "expected string, received array at params.messages[i].content" pattern
|
||
# where the "or" clauses list valid block types
|
||
if not types and re.search(r'params\.messages\[\d+\]\.content', err):
|
||
for valid_type in ("text", "image", "document", "tool_use", "tool-call", "tool_result"):
|
||
if re.search(r'expected\s+"' + re.escape(valid_type) + r'"', err):
|
||
types.add(valid_type)
|
||
if types:
|
||
hints["content_block_types"] = tuple(sorted(types))
|
||
|
||
# ── Tool result style ──
|
||
if re.search(r"tool_result", err):
|
||
hints["tool_result_style"] = "tool_result_block"
|
||
elif re.search(r"tool_use", err) and not re.search(r"tool.use", err):
|
||
hints["tool_result_style"] = "anthropic"
|
||
|
||
# ── Tool call style ──
|
||
if re.search(r"tool-call", err) or re.search(r"tool_call", err):
|
||
hints["tool_call_style"] = "tool-call"
|
||
elif re.search(r"tool_use", err):
|
||
hints["tool_call_style"] = "anthropic_tool_use"
|
||
|
||
# ── CC body wrap detection ──
|
||
if re.search(r"(?:params\.|body\.)config", err) or re.search(r"threadId", err):
|
||
hints["cc_body_wrap"] = True
|
||
|
||
# ── Field name mappings (keys MUST match SchemaAdapter lookups) ──
|
||
fields = {}
|
||
if re.search(r"tool_use_id", err):
|
||
fields["tool_use_id"] = "tool_use_id"
|
||
if re.search(r"toolCallId", err):
|
||
fields["toolCallId"] = "toolCallId"
|
||
# SchemaAdapter._tool_result_block looks up "tool_use_id"
|
||
fields["tool_use_id"] = "toolCallId"
|
||
if re.search(r"tool_result", err) and not re.search(r"tool.result", err):
|
||
fields["tool_result_type"] = "tool_result"
|
||
if re.search(r"tool-result", err):
|
||
fields["tool_result_type"] = "tool-result"
|
||
# Detect tool call field names from errors
|
||
if re.search(r"(?:id|call_id|callId|tool_use_id).*(?:invalid|unknown|expected|required)", err) or \
|
||
re.search(r"(?:expected|required).*(?:id|call_id|callId)", err):
|
||
for alt in ("id", "call_id", "callId", "tool_use_id"):
|
||
if alt in err:
|
||
fields["tool_call_id_field"] = alt
|
||
break
|
||
if re.search(r"(?:name|tool_name|function).*(?:invalid|unknown|expected|required)", err) or \
|
||
re.search(r"(?:expected|required).*(?:name|tool_name)", err):
|
||
for alt in ("name", "tool_name", "function"):
|
||
if alt in err:
|
||
fields["tool_call_name_field"] = alt
|
||
break
|
||
if re.search(r"arguments.*(?:invalid|unknown|expect|required)", err) or \
|
||
re.search(r"input.*(?:invalid|unknown|expect|required)", err):
|
||
if re.search(r"input_schema|input\b", err) and not re.search(r"arguments", err):
|
||
fields["tool_call_args_field"] = "input"
|
||
fields["tool_args_field"] = "input"
|
||
else:
|
||
fields["tool_call_args_field"] = "arguments"
|
||
fields["tool_args_field"] = "arguments"
|
||
|
||
# ── Supported roles from error ──
|
||
if re.search(r"params\.messages\[\d+\]\.role", err):
|
||
roles = re.findall(r'expected one of\s+"([^"]+)"', err)
|
||
if roles:
|
||
hints["supported_roles"] = tuple(r.strip() for r in roles[0].split("|"))
|
||
if fields:
|
||
hints["field_names"] = fields
|
||
|
||
# ── Parameter name negotiation ──
|
||
param_hints = {}
|
||
if re.search(r"max_tokens.*(?:invalid|unknown|not.*(?:support|recognize))", err) or \
|
||
re.search(r"(?:unknown|invalid).*param.*max_tokens", err):
|
||
for alt in ("max_output_tokens", "max_tokens_to_sample", "max_new_tokens", "max_token"):
|
||
if alt.lower() in err:
|
||
param_hints["max_tokens"] = alt
|
||
break
|
||
if re.search(r"temperature.*(?:invalid|unknown)", err):
|
||
for alt in ("creation_temperature", "temp", "model_temperature"):
|
||
if alt.lower() in err:
|
||
param_hints["temperature"] = alt
|
||
break
|
||
if re.search(r"top_p.*(?:invalid|unknown)", err):
|
||
for alt in ("top_p", "nucleus_sampling"):
|
||
if alt.lower() in err:
|
||
param_hints["top_p"] = alt
|
||
break
|
||
if param_hints:
|
||
hints["param_names"] = param_hints
|
||
|
||
# ── Tool declaration format ──
|
||
if re.search(r"tools.*input_schema", err) or re.search(r"input_schema.*required", err):
|
||
hints["tool_decl_format"] = "anthropic"
|
||
elif re.search(r"tools.*function.*(?:required|expected)", err):
|
||
hints["tool_decl_format"] = "openai"
|
||
elif re.search(r"tool-call|tool_call.*format", err):
|
||
hints["tool_decl_format"] = "command_code"
|
||
|
||
# ── Vision support detection ──
|
||
if re.search(r"unknown variant\b.*image_url", err) or \
|
||
re.search(r"unexpected.*image_url", err) or \
|
||
re.search(r"does not support.*image", err) or \
|
||
re.search(r"image.*not.*support", err) or \
|
||
re.search(r"unsupported.*content.*type.*image", err):
|
||
hints["supports_vision"] = False
|
||
|
||
# ── Response/Stream format hints from content-type or error ──
|
||
# ── Vision support detection ──
|
||
if re.search(r"unknown variant\b.*image_url", err) or \
|
||
re.search(r"unexpected.*image_url", err) or \
|
||
re.search(r"does not support.*image", err) or \
|
||
re.search(r"image.*not.*support", err) or \
|
||
re.search(r"unsupported.*content.*type.*image", err):
|
||
hints["supports_vision"] = False
|
||
|
||
# ── Response/Stream format hints from content-type or error ──
|
||
if re.search(r"content.type.*text/event.stream", err) or \
|
||
re.search(r"stream.*sse|sse.*expected", err):
|
||
hints["stream_format"] = "sse_data"
|
||
if re.search(r"ndjson|json.*lines", err):
|
||
hints["stream_format"] = "json_lines"
|
||
|
||
return hints
|
||
|
||
@staticmethod
|
||
def merge_into_schema(hints: dict, schema: ProviderSchema) -> ProviderSchema:
|
||
for k, v in hints.items():
|
||
if k == "field_names" and isinstance(v, dict):
|
||
schema.field_names.update(v)
|
||
elif k == "param_names" and isinstance(v, dict):
|
||
schema.param_names.update(v)
|
||
elif hasattr(schema, k):
|
||
setattr(schema, k, v)
|
||
return schema
|
||
|
||
|
||
def _schema_cache_key(target_url=None, backend=None, model=None):
|
||
host = urllib.parse.urlparse(target_url or TARGET_URL).netloc.lower()
|
||
return f"auto-schema|{backend or BACKEND}|{host}|{model or '*'}"
|
||
|
||
|
||
def _load_schema(target_url=None, backend=None, model=None):
|
||
caps = _load_provider_caps()
|
||
key = _schema_cache_key(target_url, backend, model)
|
||
raw = caps.get(key)
|
||
generic = caps.get(_schema_cache_key(target_url, backend, model="*"))
|
||
data = raw or generic or {}
|
||
if not data:
|
||
return ProviderSchema()
|
||
# Staleness check: re-learn after 24h (86400s)
|
||
updated = data.get("_updated", 0)
|
||
if isinstance(updated, (int, float)) and time.time() - updated > 86400:
|
||
print(f"[auto-sense] cached schema stale ({int(time.time()-updated)}s old), re-learning", file=sys.stderr)
|
||
return ProviderSchema()
|
||
return ProviderSchema(
|
||
supported_roles=tuple(data.get("supported_roles", ("user", "assistant"))),
|
||
content_type=data.get("content_type", "string"),
|
||
content_block_types=tuple(data.get("content_block_types", ())),
|
||
tool_result_style=data.get("tool_result_style", "inline"),
|
||
tool_call_style=data.get("tool_call_style", "openai_function"),
|
||
accepts_tool_role=data.get("accepts_tool_role", False),
|
||
accepts_system_role=data.get("accepts_system_role", True),
|
||
cc_body_wrap=data.get("cc_body_wrap", False),
|
||
field_names=dict(data.get("field_names", {})),
|
||
auth_type=data.get("auth_type", ""),
|
||
auth_header=data.get("auth_header", "Authorization"),
|
||
auth_scheme=data.get("auth_scheme", "Bearer "),
|
||
tool_decl_format=data.get("tool_decl_format", "openai"),
|
||
param_names=dict(data.get("param_names", {
|
||
"max_tokens": "max_tokens",
|
||
"temperature": "temperature",
|
||
"top_p": "top_p",
|
||
})),
|
||
response_format=data.get("response_format", "auto"),
|
||
stream_format=data.get("stream_format", "auto"),
|
||
supports_vision=data.get("supports_vision", True),
|
||
)
|
||
|
||
|
||
def _save_schema(schema: ProviderSchema, target_url=None, backend=None, model=None):
|
||
caps = _load_provider_caps()
|
||
key = _schema_cache_key(target_url, backend, model)
|
||
caps[key] = schema.hints()
|
||
caps[key]["_updated"] = time.time()
|
||
caps[key]["_backend"] = backend or BACKEND
|
||
_save_provider_caps()
|
||
print(f"[auto-sense] cached schema {key}", file=sys.stderr)
|
||
|
||
|
||
class SchemaAdapter:
|
||
"""Convert Responses API messages based on a detected ProviderSchema."""
|
||
|
||
def __init__(self, schema: ProviderSchema):
|
||
self.s = schema
|
||
|
||
def convert(self, input_data, instructions=""):
|
||
if self.s.content_type == "string" and not self.s.content_block_types:
|
||
return self._to_plain_string(input_data, instructions)
|
||
return self._to_content_blocks(input_data, instructions)
|
||
|
||
def _to_plain_string(self, input_data, instructions=""):
|
||
"""Fallback: user/assistant string content — no tool roles."""
|
||
msgs = []
|
||
if instructions and self.s.accepts_system_role:
|
||
msgs.append({"role": "system", "content": instructions})
|
||
elif instructions:
|
||
msgs.append({"role": "user", "content": instructions})
|
||
if isinstance(input_data, str):
|
||
msgs.append({"role": "user", "content": input_data})
|
||
return msgs
|
||
if not isinstance(input_data, list):
|
||
return msgs
|
||
last_flushed = []
|
||
pending = []
|
||
for item in input_data:
|
||
t = item.get("type")
|
||
if t == "function_call":
|
||
cid = item.get("call_id") or item.get("id") or uid("fc")
|
||
pending.append({"id": cid, "name": item.get("name", ""),
|
||
"arguments": item.get("arguments", "{}")})
|
||
continue
|
||
if pending:
|
||
last_flushed = [p["id"] for p in pending]
|
||
msgs.append({"role": "assistant", "content": None,
|
||
"tool_calls": [{"id": p["id"], "type": "function",
|
||
"function": {"name": p["name"],
|
||
"arguments": p["arguments"]}}
|
||
for p in pending]})
|
||
pending = []
|
||
if t == "message":
|
||
role = "user" if item.get("role") in ("user", "developer") else "assistant"
|
||
text = _extract_text(item.get("content", []))
|
||
if text:
|
||
msgs.append({"role": role, "content": text})
|
||
elif t == "function_call_output":
|
||
out = item.get("output", "")
|
||
if not isinstance(out, str):
|
||
out = json.dumps(out, ensure_ascii=False)
|
||
msgs.append({"role": "user", "content": out[:8000]})
|
||
if pending:
|
||
last_flushed = [p["id"] for p in pending]
|
||
msgs.append({"role": "assistant", "content": None,
|
||
"tool_calls": [{"id": p["id"], "type": "function",
|
||
"function": {"name": p["name"],
|
||
"arguments": p["arguments"]}}
|
||
for p in pending]})
|
||
return msgs
|
||
|
||
def _to_content_blocks(self, input_data, instructions=""):
|
||
msgs = []
|
||
pending_tc = []
|
||
tool_name_by_id = {}
|
||
last_ids = []
|
||
|
||
def flush():
|
||
nonlocal last_ids
|
||
if not pending_tc:
|
||
return
|
||
last_ids = [t["id"] for t in pending_tc]
|
||
msgs.append({"role": "assistant", "content": pending_tc})
|
||
pending_tc.clear()
|
||
|
||
_str = self.s.content_type == "string"
|
||
|
||
if instructions:
|
||
msgs.append({"role": "user", "content": instructions if _str else [{"type": "text", "text": instructions}]})
|
||
|
||
if isinstance(input_data, str):
|
||
msgs.append({"role": "user", "content": input_data if _str else [{"type": "text", "text": input_data}]})
|
||
return msgs
|
||
if not isinstance(input_data, list):
|
||
return msgs
|
||
|
||
for item in input_data:
|
||
t = item.get("type")
|
||
if t == "function_call":
|
||
cid = item.get("call_id") or item.get("id") or uid("call")
|
||
nm = item.get("name") or "exec_command"
|
||
tool_name_by_id[cid] = nm
|
||
tc_block = self._tool_call_block(cid, nm, item.get("arguments", "{}"))
|
||
if tc_block:
|
||
pending_tc.append(tc_block)
|
||
continue
|
||
flush()
|
||
if t == "message":
|
||
role = "user" if item.get("role") in ("user", "developer") else "assistant"
|
||
text = _extract_text(item.get("content", []))
|
||
if text:
|
||
msgs.append({"role": role, "content": text if _str else [{"type": "text", "text": text}]})
|
||
elif t == "function_call_output":
|
||
cid = item.get("call_id") or item.get("id") or ""
|
||
if not cid and last_ids:
|
||
idx = sum(1 for m in msgs for c in (m.get("content") or [])
|
||
if isinstance(c, dict) and c.get("type") in
|
||
("tool_result", "tool-result"))
|
||
if idx < len(last_ids):
|
||
cid = last_ids[idx]
|
||
out = item.get("output", "")
|
||
if not isinstance(out, str):
|
||
out = json.dumps(out, ensure_ascii=False)
|
||
tr = self._tool_result_block(cid, out)
|
||
if tr:
|
||
msgs.append({"role": "user", "content": [tr]})
|
||
flush()
|
||
return msgs
|
||
|
||
def _tool_call_block(self, cid, name, args):
|
||
style = self.s.tool_call_style
|
||
fn = self.s.field_names
|
||
if style == "tool-call":
|
||
return {
|
||
"type": fn.get("tool_call_type", "tool-call"),
|
||
fn.get("tool_call_id_field", "id"): cid,
|
||
fn.get("tool_call_name_field", "name"): name,
|
||
fn.get("tool_call_args_field", "arguments"): args,
|
||
}
|
||
elif style == "anthropic_tool_use":
|
||
try:
|
||
parsed = json.loads(args)
|
||
except Exception:
|
||
parsed = {}
|
||
return {
|
||
"type": fn.get("tool_use_type", "tool_use"),
|
||
fn.get("tool_call_id_field", "id"): cid,
|
||
fn.get("tool_call_name_field", "name"): name,
|
||
fn.get("tool_call_args_field", "input"): parsed,
|
||
}
|
||
else:
|
||
return None # handled as OpenAI function call
|
||
|
||
def _tool_result_block(self, cid, output):
|
||
style = self.s.tool_result_style
|
||
fn = self.s.field_names
|
||
if style == "tool_result_block":
|
||
return {
|
||
"type": fn.get("tool_result_type", "tool_result"),
|
||
fn.get("tool_use_id", "tool_use_id"): cid or "",
|
||
"content": [{"type": "text", "text": output[:8000]}],
|
||
}
|
||
elif style == "anthropic":
|
||
return {
|
||
"type": fn.get("tool_result_type", "tool_result"),
|
||
fn.get("tool_use_id", "tool_use_id"): cid or "",
|
||
"content": output[:8000],
|
||
}
|
||
return None # inline — handled by _to_plain_string
|
||
|
||
|
||
def _sanitize_err_body(body):
|
||
"""Sanitize upstream error body: strip HTML, truncate, remove control chars."""
|
||
if not body:
|
||
return ""
|
||
s = re.sub(r'<[^>]+>', '', body)
|
||
s = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', s)
|
||
s = s.strip()[:1000]
|
||
return s
|
||
|
||
|
||
def _extract_text(content):
|
||
if isinstance(content, str):
|
||
return content
|
||
if not isinstance(content, list):
|
||
return ""
|
||
parts = []
|
||
for p in content:
|
||
if isinstance(p, str):
|
||
parts.append(p)
|
||
elif isinstance(p, dict) and p.get("type") in ("input_text", "output_text", "text"):
|
||
parts.append(p.get("text", ""))
|
||
return "".join(parts)
|
||
|
||
|
||
# Persistent cache: image hash → description (survives across requests)
|
||
_vision_desc_cache = collections.OrderedDict()
|
||
_vision_desc_lock = threading.Lock()
|
||
_VISION_DESC_CACHE_MAX = 256
|
||
|
||
|
||
def _vision_describe_image(img_data):
|
||
"""Call vision fallback API to describe a single image.
|
||
|
||
Uses a module-level LRU cache so descriptions survive across requests.
|
||
A single image in a multi-turn conversation is only described once.
|
||
|
||
Returns:
|
||
description string or None on failure
|
||
"""
|
||
global _vision_desc_cache
|
||
|
||
if not VISION_FALLBACK_URL:
|
||
return None
|
||
|
||
# Normalize image URL from various formats
|
||
if isinstance(img_data, dict):
|
||
img_url = img_data.get("url", "")
|
||
if not img_url:
|
||
inner = img_data.get("image_url", img_data)
|
||
img_url = inner.get("url", "") if isinstance(inner, dict) else str(inner)
|
||
else:
|
||
img_url = str(img_data)
|
||
|
||
if not img_url:
|
||
return None
|
||
|
||
img_hash = hashlib.md5(img_url.encode("utf-8", errors="replace")).hexdigest()
|
||
|
||
# Check persistent cache first (no API call needed)
|
||
with _vision_desc_lock:
|
||
if img_hash in _vision_desc_cache:
|
||
return _vision_desc_cache[img_hash]
|
||
|
||
try:
|
||
payload = json.dumps({
|
||
"model": VISION_FALLBACK_MODEL,
|
||
"messages": [{"role": "user", "content": [
|
||
{"type": "text", "text": "Describe the content of this image in detail. If it contains text, transcribe it fully."},
|
||
{"type": "image_url", "image_url": {"url": img_url}},
|
||
]}],
|
||
"max_tokens": 1024,
|
||
"stream": False,
|
||
}).encode()
|
||
|
||
headers = {"Content-Type": "application/json"}
|
||
if VISION_FALLBACK_KEY:
|
||
headers["Authorization"] = f"Bearer {VISION_FALLBACK_KEY}"
|
||
|
||
req = urllib.request.Request(VISION_FALLBACK_URL, data=payload, headers=headers)
|
||
resp = urllib.request.urlopen(req, timeout=30)
|
||
body = json.loads(resp.read().decode())
|
||
|
||
choices = body.get("choices", [])
|
||
if choices:
|
||
msg = choices[0].get("message", {})
|
||
desc = msg.get("content", "")
|
||
if desc:
|
||
with _vision_desc_lock:
|
||
_vision_desc_cache[img_hash] = desc
|
||
if len(_vision_desc_cache) > _VISION_DESC_CACHE_MAX:
|
||
_vision_desc_cache.popitem(last=False)
|
||
return desc
|
||
except Exception as e:
|
||
print(f"[vision-fallback] error describing image: {e}", file=sys.stderr)
|
||
|
||
return None
|
||
|
||
|
||
def _preprocess_vision(messages, schema):
|
||
"""Replace image blocks with text descriptions when provider lacks vision support.
|
||
|
||
Works on OpenAI Chat Completions message format (post-conversion).
|
||
"""
|
||
if schema.supports_vision:
|
||
return messages
|
||
|
||
for msg in messages:
|
||
content = msg.get("content")
|
||
if not isinstance(content, list):
|
||
continue
|
||
new_parts = []
|
||
changed = False
|
||
for part in content:
|
||
if isinstance(part, dict) and part.get("type") in ("image_url", "input_image"):
|
||
changed = True
|
||
img_data = part.get("image_url", part)
|
||
description = _vision_describe_image(img_data)
|
||
if description:
|
||
new_parts.append({"type": "text", "text": f"[Image: {description}]"})
|
||
else:
|
||
new_parts.append({"type": "text", "text": "[Image: description non disponible - modele text-only]"})
|
||
else:
|
||
new_parts.append(part)
|
||
if changed:
|
||
msg["content"] = new_parts
|
||
|
||
return messages
|
||
|
||
|
||
def _preprocess_vision_input(input_data, schema):
|
||
"""Replace input_image blocks in Responses API input format with text descriptions.
|
||
|
||
This runs BEFORE adapter.convert() so images are replaced before any
|
||
conversion function can silently drop them.
|
||
"""
|
||
if schema.supports_vision:
|
||
return input_data
|
||
if not isinstance(input_data, list):
|
||
return input_data
|
||
|
||
changed_any = False
|
||
|
||
for item in input_data:
|
||
if item.get("type") != "message":
|
||
continue
|
||
content = item.get("content")
|
||
if not isinstance(content, list):
|
||
continue
|
||
new_parts = []
|
||
changed = False
|
||
for part in content:
|
||
if isinstance(part, dict) and part.get("type") == "input_image":
|
||
changed = True
|
||
changed_any = True
|
||
img_data = part.get("image_url", part)
|
||
description = _vision_describe_image(img_data)
|
||
if description:
|
||
new_parts.append({"type": "input_text", "text": f"[Image: {description}]"})
|
||
else:
|
||
new_parts.append({"type": "input_text", "text": "[Image: description non disponible - modele text-only]"})
|
||
else:
|
||
new_parts.append(part)
|
||
if changed:
|
||
item["content"] = new_parts
|
||
|
||
return input_data
|
||
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# HTTP Server
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
_MAX_REQLOG_LINES = 2000
|
||
|
||
def _log_resp(resp_id, status, output):
|
||
try:
|
||
import datetime as _dt
|
||
_lp = os.path.join(_LOG_DIR, "requests.log")
|
||
with open(_lp, "a", encoding="utf-8") as _f:
|
||
_f.write(f" RESPONSE id={resp_id} status={status}\n")
|
||
if output:
|
||
for o in output:
|
||
ot = o.get("type")
|
||
if ot == "message":
|
||
_f.write(f" -> message: {o.get('content',[{}])[0].get('text','')[:200]}\n")
|
||
elif ot == "function_call":
|
||
_f.write(f" -> function_call: {o.get('name')}({o.get('arguments','')[:120]})\n")
|
||
else:
|
||
_f.write(f" -> {ot}\n")
|
||
_f.write(f"{'='*60}\n")
|
||
_f.flush()
|
||
_f.seek(0)
|
||
lines = _f.readlines()
|
||
if len(lines) > _MAX_REQLOG_LINES:
|
||
with open(_lp, "w", encoding="utf-8") as _f2:
|
||
_f2.writelines(lines[-_MAX_REQLOG_LINES:])
|
||
except Exception:
|
||
pass
|
||
|
||
class ConnectionTracker:
|
||
def __enter__(self):
|
||
global _active_connections
|
||
with _active_connections_lock:
|
||
_active_connections += 1
|
||
def __exit__(self, *a):
|
||
global _active_connections
|
||
with _active_connections_lock:
|
||
_active_connections -= 1
|
||
|
||
class RequestTracker:
|
||
def __init__(self, request_id):
|
||
self.request_id = request_id
|
||
self.cancelled = threading.Event()
|
||
|
||
def __enter__(self):
|
||
if self.request_id:
|
||
with _active_requests_lock:
|
||
_active_requests[self.request_id] = self
|
||
return self
|
||
|
||
def __exit__(self, *a):
|
||
if self.request_id:
|
||
with _active_requests_lock:
|
||
_active_requests.pop(self.request_id, None)
|
||
|
||
def _cancel_request(request_id):
|
||
with _active_requests_lock:
|
||
req = _active_requests.get(request_id)
|
||
if not req:
|
||
return False
|
||
req.cancelled.set()
|
||
return True
|
||
|
||
def _handle_shutdown_signal(signum, frame):
|
||
global _shutdown_requested
|
||
_shutdown_requested = True
|
||
print("[proxy] shutdown requested; draining connections", file=sys.stderr)
|
||
def _drain():
|
||
deadline = time.time() + 5
|
||
while time.time() < deadline:
|
||
with _active_connections_lock:
|
||
if _active_connections == 0:
|
||
break
|
||
time.sleep(0.1)
|
||
if SERVER is not None:
|
||
SERVER.shutdown()
|
||
threading.Thread(target=_drain, daemon=True).start()
|
||
|
||
def _upstream_timeout(body, stream):
|
||
input_data = body.get("input", "")
|
||
n_items = len(input_data) if isinstance(input_data, list) else 1
|
||
has_tools = bool(body.get("tools"))
|
||
if stream:
|
||
return min((180 if has_tools else 120) + n_items * 2, 300)
|
||
return min(60 + n_items * 2, 120)
|
||
|
||
def _auto_continue_gemini(handler, flush_event, message_id, model, gen_config, gemini_tools, system_parts, project_id, headers, endpoints, url_suffix, accumulated_text, output_items, message_started):
|
||
max_continuations = 5
|
||
for _cont in range(max_continuations):
|
||
cont_contents = [
|
||
{"role": "model", "parts": [{"text": accumulated_text[-12000:]}]},
|
||
{"role": "user", "parts": [{"text": "Continue exactly where you left off. Do not repeat anything already written."}]},
|
||
]
|
||
cont_request = {"contents": cont_contents, "generationConfig": dict(gen_config)}
|
||
if system_parts:
|
||
cont_request["systemInstruction"] = {"parts": system_parts}
|
||
if gemini_tools:
|
||
cont_request["tools"] = gemini_tools
|
||
cont_wrapped = {"project": project_id, "model": model, "request": cont_request}
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
cont_wrapped["requestType"] = "agent"
|
||
cont_wrapped["userAgent"] = "antigravity"
|
||
cont_wrapped["requestId"] = f"agent-{uuid.uuid4().hex[:12]}"
|
||
cont_body = json.dumps(cont_wrapped).encode()
|
||
upstream = None
|
||
for ep in endpoints:
|
||
target = f"{ep}/{url_suffix}"
|
||
req = urllib.request.Request(target, data=cont_body, headers=headers)
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=180)
|
||
break
|
||
except Exception as e:
|
||
print(f"[auto-continue] {ep} failed: {e}", file=sys.stderr)
|
||
continue
|
||
if not upstream:
|
||
break
|
||
cont_text = ""
|
||
cont_finish = ""
|
||
cont_buf = ""
|
||
for raw_line in _stream_with_idle_timeout(upstream, _idle_timeout_for_model(model)):
|
||
line = raw_line.decode(errors="replace")
|
||
if line.startswith("data: "):
|
||
cont_buf += line[6:]
|
||
continue
|
||
if not line.strip() and cont_buf:
|
||
try:
|
||
chunk = json.loads(cont_buf)
|
||
except Exception:
|
||
cont_buf = ""
|
||
continue
|
||
cont_buf = ""
|
||
candidates = chunk.get("response", chunk).get("candidates", [])
|
||
if not candidates:
|
||
continue
|
||
cont_finish = candidates[0].get("finishReason", "")
|
||
parts = candidates[0].get("content", {}).get("parts", [])
|
||
for part in parts:
|
||
if part.get("thought"):
|
||
continue
|
||
if "text" in part and not part.get("functionCall"):
|
||
delta = part["text"]
|
||
if delta:
|
||
cont_text += delta
|
||
flush_event("response.output_text.delta", {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": delta})
|
||
elif part.get("functionCall"):
|
||
fc = part["functionCall"]
|
||
call_id = f"call_{uuid.uuid4().hex[:24]}"
|
||
args_str = json.dumps(fc.get("args", fc.get("arguments", {})))
|
||
output_index = len(output_items)
|
||
flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": output_index, "item": {"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": ""}})
|
||
flush_event("response.function_call_arguments.delta", {"type": "response.function_call_arguments.delta", "output_index": output_index, "item_id": call_id, "delta": args_str})
|
||
flush_event("response.function_call_arguments.done", {"type": "response.function_call_arguments.done", "output_index": output_index, "item_id": call_id, "arguments": args_str})
|
||
output_items.append({"tool": True, "fc": fc, "call_id": call_id})
|
||
accumulated_text += cont_text
|
||
print(f"[auto-continue] chunk {len(cont_text)} chars, finish={cont_finish}, total={len(accumulated_text)}", file=sys.stderr)
|
||
if cont_finish != "MAX_TOKENS":
|
||
break
|
||
return accumulated_text
|
||
|
||
_ANTIGRAVITY_MAX_CONTENTS = 20
|
||
_ANTIGRAVITY_MAX_TOOL_VERBATIM = 2
|
||
_ANTIGRAVITY_MAX_TOOL_CHARS = 2000
|
||
_ANTIGRAVITY_MAX_OLD_SUMMARY_CHARS = 1200
|
||
_ANTIGRAVITY_SOFT_CHARS = 120000
|
||
_ANTIGRAVITY_HARD_CHARS = 250000
|
||
_ANTIGRAVITY_EMERGENCY_CHARS = 500000
|
||
_ANTIGRAVITY_SIMPLE_WORDS = frozenset({"hi", "hello", "hey", "test", "ping", "thanks", "thank you", "ok", "okay", "yes", "no", "cool", "nice", "good", "great", "done", "go", "stop", "yep", "nope", "sure", "right", "correct", "continue", "cont", "k", "thx", "ty", "np", "lol", "brb", "bye"})
|
||
_ANTIGRAVITY_EDIT_WORDS = frozenset(("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert", "create", "build", "implement"))
|
||
_ANTIGRAVITY_REFERENCE_WORDS = frozenset(("previous", "file", "error", "again", "that", "this", "it", "same", "last", "above", "earlier", "before", "earlier output", "last error", "previous result", "what was", "show me", "give me"))
|
||
|
||
def _antigravity_is_simple_user(text):
|
||
if not text:
|
||
return True
|
||
stripped = text.strip().lower()
|
||
if stripped in _ANTIGRAVITY_SIMPLE_WORDS:
|
||
return True
|
||
if len(stripped) < 30:
|
||
words = set(stripped.split())
|
||
if not words.intersection(_ANTIGRAVITY_REFERENCE_WORDS) and not words.intersection(_ANTIGRAVITY_EDIT_WORDS):
|
||
return True
|
||
return False
|
||
|
||
def _antigravity_normalize_context(input_data, model=""):
|
||
"""
|
||
Normalize context for Antigravity while PRESERVING function_call -> function_call_output pairs.
|
||
|
||
Google's Gemini API requires STRICT alternation:
|
||
- functionCall (role=model) MUST be immediately followed by functionResponse (role=user)
|
||
|
||
This function compacts old history but NEVER breaks tool call/response pairs.
|
||
"""
|
||
if not isinstance(input_data, list) or len(input_data) < 2:
|
||
return input_data
|
||
is_claude_model = "claude" in model.lower()
|
||
|
||
latest_user = ""
|
||
latest_user_idx = -1
|
||
for i in range(len(input_data) - 1, -1, -1):
|
||
item = input_data[i]
|
||
if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
|
||
c = item.get("content", "")
|
||
if isinstance(c, str):
|
||
latest_user = c
|
||
elif isinstance(c, list):
|
||
latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
|
||
latest_user_idx = i
|
||
break
|
||
|
||
if not latest_user:
|
||
return input_data
|
||
|
||
is_simple = _antigravity_is_simple_user(latest_user)
|
||
|
||
n_raw = len(input_data)
|
||
n_tool_outputs = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call_output")
|
||
n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
|
||
|
||
auto_reset = (n_raw > 200 or n_tool_outputs > 20) and is_simple
|
||
if os.environ.get("ANTIGRAVITY_AUTO_RESET_POLLUTED_CONTEXT", "1") != "1":
|
||
auto_reset = False
|
||
|
||
has_compaction_summary = any(
|
||
isinstance(it, dict) and it.get("type") == "message" and it.get("role") == "user"
|
||
and ("Auto-compacted" in str(it.get("content", "")) or "auto-compacted" in str(it.get("content", "")).lower())
|
||
for it in input_data
|
||
)
|
||
|
||
if is_simple and auto_reset and not has_compaction_summary:
|
||
system_items = [it for it in input_data if isinstance(it, dict) and it.get("type") == "message" and it.get("role") in ("developer", "system")]
|
||
user_item = input_data[latest_user_idx]
|
||
result = system_items + [user_item] if system_items else [user_item]
|
||
print(f"[antigravity-context] raw_items={n_raw} compacted_items={n_raw} final_items={len(result)}", file=sys.stderr)
|
||
print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs=0", file=sys.stderr)
|
||
print(f"[antigravity-context] simple_latest_user=true auto_reset={auto_reset} has_compaction={has_compaction_summary}", file=sys.stderr)
|
||
return result
|
||
|
||
dev_messages = []
|
||
recent_items = []
|
||
tool_outputs = []
|
||
tool_calls = []
|
||
|
||
for i, item in enumerate(input_data):
|
||
if not isinstance(item, dict):
|
||
continue
|
||
t = item.get("type")
|
||
if t == "message" and item.get("role") in ("developer", "system"):
|
||
dev_messages.append(item)
|
||
elif t == "function_call_output":
|
||
tool_outputs.append((i, item))
|
||
elif t == "function_call":
|
||
tool_calls.append((i, item))
|
||
elif t == "message":
|
||
recent_items.append((i, item))
|
||
|
||
latest_words = set(latest_user.strip().lower().split())
|
||
has_edit_intent = bool(latest_words.intersection(_ANTIGRAVITY_EDIT_WORDS))
|
||
has_ref_intent = bool(latest_words.intersection(_ANTIGRAVITY_REFERENCE_WORDS))
|
||
if is_claude_model:
|
||
keep_tools = len(tool_outputs)
|
||
else:
|
||
keep_tools = 2 if (has_edit_intent or has_ref_intent) else 1
|
||
|
||
if is_claude_model:
|
||
kept_tools = tool_outputs
|
||
else:
|
||
kept_tools = tool_outputs[-keep_tools:] if tool_outputs and (has_edit_intent or has_ref_intent) else []
|
||
|
||
for idx_t, t_item in enumerate(kept_tools):
|
||
orig = t_item[1]
|
||
out = orig.get("output", "")
|
||
if isinstance(out, list):
|
||
cleaned = []
|
||
for part in out:
|
||
if isinstance(part, dict) and part.get("type") in ("input_image", "image_url"):
|
||
url = part.get("image_url", {}).get("url", "") if isinstance(part.get("image_url"), dict) else ""
|
||
if url.startswith("data:"):
|
||
cleaned.append({"type": "text", "text": "[image data stripped for compaction]"})
|
||
continue
|
||
cleaned.append(part)
|
||
if len(json.dumps(cleaned)) > _ANTIGRAVITY_MAX_TOOL_CHARS:
|
||
new_item = dict(orig)
|
||
new_item["output"] = json.dumps(cleaned)[:_ANTIGRAVITY_MAX_TOOL_CHARS] + "\n... [truncated]"
|
||
kept_tools[idx_t] = (t_item[0], new_item)
|
||
elif cleaned != out:
|
||
new_item = dict(orig)
|
||
new_item["output"] = cleaned
|
||
kept_tools[idx_t] = (t_item[0], new_item)
|
||
elif isinstance(out, str) and len(out) > _ANTIGRAVITY_MAX_TOOL_CHARS:
|
||
new_item = dict(orig)
|
||
new_item["output"] = out[:_ANTIGRAVITY_MAX_TOOL_CHARS] + f"\n... [truncated: kept {_ANTIGRAVITY_MAX_TOOL_CHARS} of {len(out)} chars]"
|
||
kept_tools[idx_t] = (t_item[0], new_item)
|
||
|
||
n_summarized = len(tool_outputs) - len(kept_tools)
|
||
|
||
tail_start = max(0, len(recent_items) - 6)
|
||
recent_tail = recent_items[tail_start:]
|
||
|
||
deduped_tail = []
|
||
seen_goal_context = False
|
||
for idx, msg_item in recent_tail:
|
||
content_str = ""
|
||
c = msg_item.get("content", "")
|
||
if isinstance(c, str):
|
||
content_str = c
|
||
elif isinstance(c, list):
|
||
content_str = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
|
||
if "<goal_context>" in content_str:
|
||
if seen_goal_context:
|
||
continue
|
||
seen_goal_context = True
|
||
deduped_tail.append((idx, msg_item))
|
||
recent_tail = deduped_tail if deduped_tail else recent_tail
|
||
|
||
# Build call_id -> function_call mapping
|
||
tool_call_map = {}
|
||
for _, call_item in tool_calls:
|
||
cid = call_item.get("call_id", call_item.get("id", ""))
|
||
if cid:
|
||
tool_call_map[cid] = call_item
|
||
|
||
# Build result: maintain PAIRED sequence (function_call -> function_call_output)
|
||
result = list(dev_messages)
|
||
|
||
compaction_summaries = []
|
||
for idx, msg_item in recent_items:
|
||
if msg_item is input_data[latest_user_idx]:
|
||
continue
|
||
c = msg_item.get("content", "")
|
||
content_str = c if isinstance(c, str) else " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)) if isinstance(c, list) else ""
|
||
if "Auto-compacted" in content_str or "auto-compacted" in content_str.lower():
|
||
compaction_summaries.append(msg_item)
|
||
|
||
if n_summarized > 0:
|
||
n_read_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call"
|
||
and it.get("name", "") not in ("write", "apply_diff", "edit_file")
|
||
and "write" not in json.dumps(it.get("arguments", {})).lower())
|
||
n_write_calls = n_tool_calls - n_read_calls
|
||
if n_read_calls > 10 and n_write_calls == 0:
|
||
summary_text = (
|
||
f"[CONTEXT HISTORY: {n_summarized} prior tool calls compacted. "
|
||
f"YOU ALREADY READ THE TARGET FILE EXTENSIVELY. "
|
||
f"DO NOT READ ANY MORE FILES. "
|
||
f"YOU MUST NOW USE THE WRITE TOOL TO APPLY YOUR EDITS DIRECTLY. "
|
||
f"DO NOT call exec_command or read_files AGAIN.]"
|
||
)
|
||
else:
|
||
summary_text = f"[Tool history summary: {n_summarized} older tool outputs omitted. {n_tool_calls} prior function calls were made.]"
|
||
result.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": summary_text}]})
|
||
|
||
# CRITICAL: Add tool CALLS and their corresponding OUTPUTS in PAIRED ORDER
|
||
# Only include pairs where BOTH call and output are present
|
||
added_pairs = set()
|
||
for _, tool_item in kept_tools:
|
||
cid = tool_item.get("call_id", tool_item.get("id", ""))
|
||
if cid and cid in tool_call_map and cid not in added_pairs:
|
||
# Add function_call FIRST, then function_call_output IMMEDIATELY
|
||
result.append(tool_call_map[cid])
|
||
result.append(tool_item)
|
||
added_pairs.add(cid)
|
||
|
||
# Add any orphan tool outputs (no matching call found) - these go at the end before messages
|
||
for _, tool_item in kept_tools:
|
||
cid = tool_item.get("call_id", tool_item.get("id", ""))
|
||
if cid not in added_pairs:
|
||
result.append(tool_item)
|
||
|
||
for cs_item in compaction_summaries:
|
||
result.append(cs_item)
|
||
|
||
for _, msg_item in recent_tail:
|
||
if msg_item is not input_data[latest_user_idx]:
|
||
result.append(msg_item)
|
||
|
||
latest_norm = " ".join(latest_user.strip().split())[:200].lower()
|
||
already_present = False
|
||
for r in result:
|
||
if isinstance(r, dict) and r.get("type") == "message" and r.get("role") == "user":
|
||
c = r.get("content", "")
|
||
if isinstance(c, str):
|
||
rn = " ".join(c.strip().split())[:200].lower()
|
||
elif isinstance(c, list):
|
||
combined = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
|
||
rn = " ".join(combined.strip().split())[:200].lower()
|
||
else:
|
||
rn = ""
|
||
if rn == latest_norm:
|
||
already_present = True
|
||
break
|
||
|
||
if not already_present:
|
||
result.append(input_data[latest_user_idx])
|
||
|
||
total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)
|
||
|
||
if total_chars > _ANTIGRAVITY_EMERGENCY_CHARS:
|
||
print(f"[antigravity-context] EMERGENCY: {total_chars} chars exceeds limit, resetting to minimal", file=sys.stderr)
|
||
result = list(dev_messages)
|
||
if compaction_summaries:
|
||
result.extend(compaction_summaries)
|
||
result.append(input_data[latest_user_idx])
|
||
total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)
|
||
|
||
while len(result) > _ANTIGRAVITY_MAX_CONTENTS and total_chars > _ANTIGRAVITY_SOFT_CHARS:
|
||
for i in range(1, len(result) - 1):
|
||
if isinstance(result[i], dict) and result[i].get("type") in ("message",):
|
||
removed = result.pop(i)
|
||
total_chars -= len(json.dumps(removed, ensure_ascii=False))
|
||
break
|
||
else:
|
||
break
|
||
|
||
est_tokens = total_chars // 4
|
||
print(f"[antigravity-context] raw_items={n_raw} final_items={len(result)}", file=sys.stderr)
|
||
print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs={len(kept_tools)} summarized_tool_outputs={n_summarized}", file=sys.stderr)
|
||
print(f"[antigravity-context] simple_latest_user={is_simple} auto_reset={auto_reset}", file=sys.stderr)
|
||
print(f"[antigravity-context] final_chars={total_chars} estimated_tokens={est_tokens}", file=sys.stderr)
|
||
|
||
return result
|
||
|
||
class Handler(http.server.BaseHTTPRequestHandler):
|
||
protocol_version = "HTTP/1.1"
|
||
|
||
def do_GET(self):
|
||
if self.path in ("/v1/models", "/models"):
|
||
self.send_json(200, {"object": "list", "data": MODELS})
|
||
elif self.path in ("/v1/accounts", "/accounts"):
|
||
info = {"provider": BACKEND, "oauth_provider": OAUTH_PROVIDER}
|
||
if BACKEND in ("codebuff", "freebuff"):
|
||
info["accounts"] = _cb_pool.status()
|
||
info["total"] = len(_cb_pool._accounts)
|
||
elif OAUTH_PROVIDER and OAUTH_PROVIDER.startswith("google"):
|
||
pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
|
||
info["accounts"] = pool.status()
|
||
info["total"] = len(pool._accounts)
|
||
elif _api_key_pool:
|
||
info["accounts"] = _api_key_pool.status()
|
||
info["total"] = len(_api_key_pool._accounts)
|
||
else:
|
||
info["accounts"] = []
|
||
info["total"] = 0
|
||
self.send_json(200, info)
|
||
elif self.path in ("/health", "/v1/health"):
|
||
_mem_mb = 0
|
||
try:
|
||
if _IS_WINDOWS:
|
||
import ctypes
|
||
class _PMI(ctypes.Structure):
|
||
_fields_ = [("cb", ctypes.c_ulong), ("PageFaultCount", ctypes.c_ulong),
|
||
("PeakWorkingSetSize", ctypes.c_size_t), ("WorkingSetSize", ctypes.c_size_t),
|
||
("QuotaPeakPagedPoolUsage", ctypes.c_size_t), ("QuotaPagedPoolUsage", ctypes.c_size_t),
|
||
("QuotaPeakNonPagedPoolUsage", ctypes.c_size_t), ("QuotaNonPagedPoolUsage", ctypes.c_size_t),
|
||
("PagefileUsage", ctypes.c_size_t), ("PeakPagefileUsage", ctypes.c_size_t)]
|
||
_pmi = _PMI()
|
||
_pmi.cb = ctypes.sizeof(_PMI)
|
||
ctypes.windll.psapi.GetProcessMemoryInfo.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_ulong]
|
||
ctypes.windll.psapi.GetProcessMemoryInfo.restype = ctypes.c_int
|
||
ctypes.windll.psapi.GetProcessMemoryInfo(
|
||
ctypes.windll.kernel32.GetCurrentProcess(), ctypes.byref(_pmi), _pmi.cb)
|
||
_mem_mb = _pmi.PeakWorkingSetSize / (1024 * 1024)
|
||
else:
|
||
import resource as _res
|
||
_mem_mb = _res.getrusage(_res.RUSAGE_SELF).ru_maxrss / 1024
|
||
except Exception:
|
||
pass
|
||
_uptime = time.time() - _START_TIME if '_START_TIME' in dir() else 0
|
||
self.send_json(200, {"ok": True, "backend": BACKEND,
|
||
"target_url": TARGET_URL,
|
||
"models": [m.get("id") for m in MODELS],
|
||
"bgp_routes": len(BGP_ROUTES),
|
||
"uptime_s": round(_uptime, 1),
|
||
"memory_mb": round(_mem_mb, 1),
|
||
"requests_total": _STATS.get("requests", 0)})
|
||
elif self.path == "/admin/reload":
|
||
reloaded = _hot_reload_api_key()
|
||
key_preview = API_KEY[:8] + "..." if len(API_KEY) > 8 else "(empty)"
|
||
self.send_json(200, {"ok": True, "reloaded": reloaded,
|
||
"api_key_preview": key_preview,
|
||
"config_path": _CONFIG_PATH or "none"})
|
||
elif self.path == "/admin/verify-key":
|
||
result = _verify_api_key(API_KEY, TARGET_URL)
|
||
key_preview = API_KEY[:8] + "..." if len(API_KEY) > 8 else "(empty)"
|
||
result["api_key_preview"] = key_preview
|
||
self.send_json(200, result)
|
||
else:
|
||
self.send_error(404)
|
||
|
||
def do_POST(self):
|
||
if _shutdown_requested:
|
||
return self.send_json(503, {"error": {"type": "proxy_shutting_down",
|
||
"message": "Proxy is shutting down"}})
|
||
if self.path.startswith("/admin/cancel/"):
|
||
request_id = self.path.rsplit("/", 1)[-1]
|
||
if _cancel_request(request_id):
|
||
return self.send_json(200, {"ok": True, "cancelled": request_id})
|
||
return self.send_json(404, {"ok": False, "error": "request_not_found"})
|
||
if self.path in ("/v1/responses", "/responses"):
|
||
with ConnectionTracker():
|
||
self._handle()
|
||
else:
|
||
self.send_error(404)
|
||
|
||
_logf = None
|
||
|
||
def _handle(self):
|
||
_hot_reload_api_key()
|
||
try:
|
||
clen = int(self.headers.get("Content-Length", 0))
|
||
body = json.loads(self.rfile.read(clen))
|
||
except Exception as e:
|
||
return self.send_json(400, {"error": {"message": f"Bad request: {e}"}})
|
||
|
||
self._session_id = uuid.uuid4().hex[:8]
|
||
_sid = self._session_id
|
||
|
||
import datetime as _dt
|
||
_log_path = os.path.join(_LOG_DIR, "requests.log")
|
||
_ts = _dt.datetime.now().isoformat()
|
||
|
||
prev_id = body.get("previous_response_id")
|
||
raw_input = body.get("input", "")
|
||
input_data = resolve_previous_response(body)
|
||
input_data = _compact_input(input_data)
|
||
body["input"] = input_data
|
||
|
||
raw_types = [i.get("type") for i in raw_input] if isinstance(raw_input, list) else "str"
|
||
resolved_types = [i.get("type") for i in input_data] if isinstance(input_data, list) else "str"
|
||
|
||
print(f"[{_sid}] prev_id={prev_id} raw={raw_types} resolved={resolved_types}", file=sys.stderr)
|
||
with open(_log_path, "a", encoding="utf-8") as _lf:
|
||
_lf.write(f"\n{'='*60}\n{_ts} [session={_sid}] REQUEST {self.path}\n")
|
||
_lf.write(f" prev_id={prev_id}\n")
|
||
_lf.write(f" raw_input_types={raw_types}\n")
|
||
_lf.write(f" resolved_input_types={resolved_types}\n")
|
||
_lf.write(f" stream={body.get('stream')} model={body.get('model')} force_model={FORCE_MODEL}\n")
|
||
_lf.write(f" store_keys={list(_response_store.keys())}\n")
|
||
if isinstance(input_data, list):
|
||
for i, item in enumerate(input_data):
|
||
t = item.get("type")
|
||
if t == "message":
|
||
_lf.write(f" [{i}] message role={item.get('role')} text={str(item.get('content',''))[:120]}\n")
|
||
elif t == "function_call":
|
||
_lf.write(f" [{i}] function_call call_id={item.get('call_id')} id={item.get('id')} name={item.get('name')} args={item.get('arguments','')[:120]}\n")
|
||
elif t == "function_call_output":
|
||
_lf.write(f" [{i}] function_call_output id={item.get('id')} output={str(item.get('output',''))[:120]}\n")
|
||
else:
|
||
_lf.write(f" [{i}] {t}\n")
|
||
_lf.flush()
|
||
|
||
model = body.get("model", MODELS[0]["id"] if MODELS else "unknown")
|
||
if FORCE_MODEL:
|
||
model = FORCE_MODEL
|
||
body["model"] = FORCE_MODEL
|
||
stream = body.get("stream", False)
|
||
_desktop_forced_models = {"gpt-5.4-mini", "gpt-5.4", "gpt-5.5", "gpt-5-codex", "gpt-5.3-codex"}
|
||
_launcher_model = os.environ.get("CODEX_LAUNCHER_MODEL", "")
|
||
if _launcher_model and model in _desktop_forced_models:
|
||
print(f"[{_sid}] remap desktop model {model} -> {_launcher_model}", file=sys.stderr)
|
||
model = _launcher_model
|
||
body["model"] = model
|
||
request_id = body.get("request_id") or body.get("id") or uid("req")
|
||
if isinstance(input_data, list):
|
||
for item in input_data:
|
||
if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
|
||
content = str(item.get("content", ""))
|
||
for url_m in re.finditer(r"https?://[^\s\]'\"<>]+", content):
|
||
_last_user_urls.append(url_m.group(0))
|
||
save_request_snapshot(request_id, body)
|
||
_req_t0 = time.time()
|
||
wait_start = time.monotonic()
|
||
_request_semaphore.acquire()
|
||
wait_ms = (time.monotonic() - wait_start) * 1000
|
||
if wait_ms > 100:
|
||
print(f"[{_sid}] waited {wait_ms:.0f}ms for upstream slot (concurrency gate)", file=sys.stderr)
|
||
try:
|
||
with RequestTracker(request_id) as tracker:
|
||
if BACKEND == "auto":
|
||
self._handle_auto(body, model, stream, tracker)
|
||
elif BACKEND == "anthropic":
|
||
self._handle_anthropic(body, model, stream, tracker)
|
||
elif BACKEND == "command-code":
|
||
self._handle_command_code(body, model, stream, tracker)
|
||
elif BACKEND in ("codebuff", "freebuff"):
|
||
self._handle_codebuff(body, model, stream, tracker)
|
||
elif (BACKEND or "").startswith("gemini-oauth"):
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
self._handle_antigravity_v2(body, model, stream, tracker)
|
||
else:
|
||
self._handle_gemini_oauth(body, model, stream, tracker)
|
||
else:
|
||
self._handle_openai_compat(body, model, stream, tracker)
|
||
update_snapshot_response(request_id, "completed", time.time() - _req_t0)
|
||
except Exception as _snap_err:
|
||
update_snapshot_response(request_id, "error", time.time() - _req_t0, _snap_err)
|
||
raise
|
||
finally:
|
||
_request_semaphore.release()
|
||
|
||
def _handle_openai_compat(self, body, model, stream, tracker=None):
|
||
input_data = body.get("input", "")
|
||
policy = provider_policy()
|
||
|
||
pair_errors = validate_tool_pairs(input_data)
|
||
if pair_errors:
|
||
print(f"[tool-validator] repairing {len(pair_errors)} orphan tool outputs", file=sys.stderr)
|
||
input_data = repair_orphan_tool_outputs(input_data, pair_errors)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
# synthetic tool-results disabled: causes deepseek-v4-pro truncation on opencode.ai
|
||
if False and (policy.get("synthetic_tool_results") or _provider_cap(model, "synthetic_tool_results", False)) and isinstance(input_data, list):
|
||
input_data, synthesized = synthesize_tool_results_for_chat(input_data)
|
||
if synthesized:
|
||
print("[provider-adapter] using synthetic tool-result continuation", file=sys.stderr)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
compacted = False
|
||
if policy.get("compaction") and isinstance(input_data, list) and "claude" not in model.lower():
|
||
input_data, compacted = _adaptive_compact(input_data, model, policy)
|
||
if compacted:
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
if PROMPT_ENHANCER and isinstance(input_data, list):
|
||
input_data = _apply_prompt_enhancer(input_data)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
crof_limit = _crof_item_limit(model)
|
||
_crof_eligible = True
|
||
if _crof_eligible and not compacted and isinstance(input_data, list):
|
||
_needs_compact = len(input_data) > crof_limit
|
||
max_tok = _get_model_max_tokens(model)
|
||
est_tok = _estimate_input_tokens(input_data) if max_tok else 0
|
||
if not _needs_compact and max_tok and est_tok > max_tok * 0.8:
|
||
_needs_compact = True
|
||
if _needs_compact:
|
||
_agg = 0
|
||
if max_tok and est_tok > max_tok:
|
||
_agg = 1
|
||
print(f"[crof-adaptive] proactive compact: {len(input_data)} items, est={est_tok}tok max={max_tok}tok agg={_agg}", file=sys.stderr)
|
||
input_data = _crof_compact_for_retry(input_data, model, aggression=_agg)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
# Vision preprocessing for non-vision models
|
||
_schema = _load_schema(model=model)
|
||
_needs_vision_preprocess = False
|
||
if _schema and not _schema.supports_vision:
|
||
_needs_vision_preprocess = True
|
||
elif not _model_supports_vision(model):
|
||
print(f"[vision] model {model} detected as non-vision via name pattern, preprocessing images", file=sys.stderr)
|
||
if _schema:
|
||
_schema.supports_vision = False
|
||
_save_schema(_schema, model=model)
|
||
_needs_vision_preprocess = True
|
||
if _needs_vision_preprocess:
|
||
input_data = _preprocess_vision_input(input_data, _schema)
|
||
body["input"] = input_data
|
||
|
||
messages = oa_input_to_messages(input_data)
|
||
messages = _inject_stored_reasoning(messages)
|
||
instructions = body.get("instructions", "").strip()
|
||
if instructions:
|
||
messages.insert(0, {"role": "system", "content": instructions})
|
||
|
||
if BGP_ROUTES:
|
||
self._handle_bgp(body, model, stream, messages, input_data)
|
||
else:
|
||
chat_body = self._build_chat_body(model, messages, body, stream)
|
||
target = upstream_target(TARGET_URL, "/chat/completions")
|
||
if _api_key_pool:
|
||
pool_acct = _api_key_pool.get()
|
||
effective_key = pool_acct["token"] if pool_acct else API_KEY
|
||
else:
|
||
effective_key = _refresh_oauth_token()
|
||
fwd = forwarded_headers(self.headers, {
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {effective_key}",
|
||
**_openrouter_extra(),
|
||
}, browser_ua=True)
|
||
print(f"[{self._session_id}] POST {target} model={model} stream={stream} items={len(input_data) if isinstance(input_data,list) else 1}", file=sys.stderr)
|
||
chat_body_b = json.dumps(chat_body).encode()
|
||
max_retries = 3
|
||
for attempt in range(max_retries + 1):
|
||
req = urllib.request.Request(target, data=chat_body_b, headers=fwd)
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()
|
||
if re.search(r"unknown variant\b.*image_url", err_body.lower()) or \
|
||
re.search(r"unexpected.*image_url", err_body.lower()) or \
|
||
re.search(r"does not support.*image", err_body.lower()):
|
||
_schema = _load_schema(model=model)
|
||
if _schema:
|
||
_schema.supports_vision = False
|
||
if attempt < max_retries:
|
||
print(f"[{self._session_id}] vision not supported, retrying with image preprocessing", file=sys.stderr)
|
||
messages = _preprocess_vision(messages, _schema) if _schema else messages
|
||
chat_body = self._build_chat_body(model, messages, body, stream)
|
||
chat_body_b = json.dumps(chat_body).encode()
|
||
continue
|
||
if "context_length_exceeded" in err_body and attempt < max_retries:
|
||
import re as _re
|
||
_tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
|
||
if _tok_m:
|
||
_set_model_max_tokens(model, int(_tok_m.group(1)))
|
||
print(f"[{self._session_id}] context_length_exceeded (attempt {attempt+1}/{max_retries}), retrying with compaction (agg={attempt})!", file=sys.stderr)
|
||
policy = provider_policy()
|
||
if isinstance(input_data, list):
|
||
est = _estimate_input_tokens(input_data)
|
||
print(f"[{self._session_id}] applying compaction to {len(input_data)} items ~{est}tok", file=sys.stderr)
|
||
input_data = _crof_compact_for_retry(input_data, model, aggression=attempt)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
messages = oa_input_to_messages(_preprocess_vision_input(input_data, _schema) if _schema and not _schema.supports_vision else input_data)
|
||
messages = _inject_stored_reasoning(messages)
|
||
instructions = body.get("instructions", "").strip()
|
||
if instructions:
|
||
messages.insert(0, {"role": "system", "content": instructions})
|
||
chat_body = self._build_chat_body(model, messages, body, stream)
|
||
chat_body_b = json.dumps(chat_body).encode()
|
||
continue
|
||
if e.code in (429, 502, 503) and attempt < max_retries:
|
||
if e.code == 429 and _api_key_pool:
|
||
pool_acct = _api_key_pool.get()
|
||
if pool_acct:
|
||
_api_key_pool.mark_rate_limited(pool_acct, 60)
|
||
next_acct = _api_key_pool.get()
|
||
if next_acct:
|
||
effective_key = next_acct["token"]
|
||
fwd["Authorization"] = f"Bearer {effective_key}"
|
||
print(f"[multi-account] rotating to key {next_acct['id']}", file=sys.stderr)
|
||
retry_after = e.headers.get("Retry-After")
|
||
if retry_after:
|
||
try:
|
||
wait = min(int(retry_after), 60)
|
||
except ValueError:
|
||
wait = min(2 ** (attempt + 1), 15)
|
||
else:
|
||
wait = min(2 ** (attempt + 1), 15)
|
||
print(f"[{self._session_id}] HTTP {e.code} (attempt {attempt+1}/{max_retries}), retrying in {wait}s: {err_body[:150]}", file=sys.stderr)
|
||
time.sleep(wait)
|
||
continue
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError) as e:
|
||
if attempt < max_retries:
|
||
wait = min(2 ** (attempt + 1), 10)
|
||
print(f"[{self._session_id}] connection error (attempt {attempt+1}/{max_retries}), retrying in {wait}s: {e}", file=sys.stderr)
|
||
time.sleep(wait)
|
||
continue
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
except Exception as e:
|
||
return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
break
|
||
self._forward_oa_compat(upstream, stream, model, chat_body, body, input_data, fwd, target, tracker)
|
||
|
||
def _build_chat_body(self, model, messages, body, stream):
|
||
chat_body = {"model": model, "messages": messages}
|
||
for k in ("temperature", "top_p"):
|
||
if k in body:
|
||
chat_body[k] = body[k]
|
||
chat_body["max_tokens"] = max(body.get("max_output_tokens", 0), 64000)
|
||
tools = oa_convert_tools(body.get("tools"))
|
||
if tools:
|
||
chat_body["tools"] = tools
|
||
if body.get("tool_choice"):
|
||
chat_body["tool_choice"] = body["tool_choice"]
|
||
chat_body["stream"] = stream
|
||
if not REASONING_ENABLED or REASONING_EFFORT == "none":
|
||
chat_body["enable_thinking"] = False
|
||
chat_body["reasoning_effort"] = "none"
|
||
else:
|
||
chat_body["reasoning_effort"] = REASONING_EFFORT
|
||
return chat_body
|
||
|
||
def _handle_antigravity_v2(self, body, model, stream, tracker=None):
|
||
_model_alias = {
|
||
"gemini-3.5-flash-high": "gemini-3-flash",
|
||
"gemini-3.5-flash-medium": "gemini-3-flash",
|
||
"gemini-3.5-flash-low": "gemini-3.5-flash-low",
|
||
"gemini-3.5-flash": "gemini-3-flash",
|
||
"gemini-3-flash-preview": "gemini-3-flash",
|
||
"gemini-3-pro-preview": "gemini-3.1-pro-low",
|
||
"gemini-3-pro": "gemini-3.1-pro-low",
|
||
"gemini-3-pro-low": "gemini-3.1-pro-low",
|
||
"gemini-3-pro-high": "gemini-3.1-pro-low",
|
||
"gemini-3.1-pro": "gemini-3.1-pro-low",
|
||
"gemini-3.1-pro-high": "gemini-3.1-pro-low",
|
||
"claude-sonnet-4.6": "claude-sonnet-4-6",
|
||
"claude-sonnet-4.6-thinking": "claude-sonnet-4-6",
|
||
"claude-opus-4.6": "claude-opus-4-6-thinking",
|
||
"claude-opus-4.6-thinking": "claude-opus-4-6-thinking",
|
||
}
|
||
_resolved = _model_alias.get(model, model)
|
||
if _resolved != model:
|
||
print(f"[{getattr(self, '_session_id', '?')}] [antigravity-v2] model resolved: {model} -> {_resolved}", file=sys.stderr)
|
||
model = _resolved
|
||
|
||
input_data = body.get("input", "")
|
||
_schema = _load_schema(model=model)
|
||
if _schema and not _schema.supports_vision:
|
||
input_data = _preprocess_vision_input(input_data, _schema)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
if isinstance(input_data, list) and len(input_data) > 30:
|
||
input_data = _antigravity_normalize_context(input_data, model)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
access_token = _refresh_oauth_token()
|
||
token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", "google-antigravity-oauth-token.json")
|
||
project_id = ""
|
||
try:
|
||
with open(token_path) as f:
|
||
project_id = json.load(f).get("project_id", "")
|
||
except Exception:
|
||
pass
|
||
|
||
tool_call_names = {}
|
||
contents = []
|
||
|
||
if isinstance(input_data, list):
|
||
for item in input_data:
|
||
t = item.get("type")
|
||
if t == "message":
|
||
role = "user" if item.get("role") == "user" else "model"
|
||
content = item.get("content", "")
|
||
parts = []
|
||
if isinstance(content, list):
|
||
for c in content:
|
||
ct = c.get("type")
|
||
if ct in ("input_text", "text"):
|
||
parts.append({"text": c.get("text", "")})
|
||
elif ct in ("input_image", "image_url"):
|
||
iu = c.get("image_url") or c.get("url", {})
|
||
url = iu.get("url", iu) if isinstance(iu, dict) else iu
|
||
if isinstance(url, str) and url.startswith("data:"):
|
||
mime, _, b64 = url.partition(";base64,")
|
||
mime = mime.replace("data:", "") or "image/png"
|
||
parts.append({"inlineData": {"mimeType": mime, "data": b64}})
|
||
else:
|
||
parts.append({"text": str(url)})
|
||
elif isinstance(content, str):
|
||
parts.append({"text": content})
|
||
if parts:
|
||
contents.append({"role": role, "parts": parts})
|
||
elif t == "function_call":
|
||
call_id = item.get("call_id") or item.get("id") or f"call_{uuid.uuid4().hex[:24]}"
|
||
fname = item.get("name", "")
|
||
if call_id and fname:
|
||
tool_call_names[call_id] = fname
|
||
args = item.get("arguments", "{}")
|
||
if isinstance(args, str):
|
||
try:
|
||
args = json.loads(args)
|
||
except Exception:
|
||
args = {}
|
||
fc_part = {"functionCall": {"name": fname, "args": args, "id": call_id}}
|
||
stored_sig = _gemini_get_sig(f"fc:{call_id}") or _gemini_get_sig(f"fc:{fname}")
|
||
if stored_sig:
|
||
fc_part["thoughtSignature"] = stored_sig
|
||
fc_part["thought_signature"] = stored_sig
|
||
else:
|
||
fc_part["thought_signature"] = "skip_thought_signature_validator"
|
||
contents.append({"role": "model", "parts": [fc_part]})
|
||
elif t == "function_call_output":
|
||
call_id = item.get("call_id", item.get("id", ""))
|
||
output = item.get("output", "")
|
||
fname = item.get("name", "") or tool_call_names.get(call_id, "")
|
||
resp_part = {"functionResponse": {"name": fname or "unknown", "response": {"result": str(output)}}}
|
||
if call_id:
|
||
resp_part["functionResponse"]["id"] = call_id
|
||
contents.append({"role": "user", "parts": [resp_part]})
|
||
|
||
sanitized = []
|
||
last_user_text = None
|
||
last_role = None
|
||
for content in contents:
|
||
role = content.get("role")
|
||
parts = [p for p in content.get("parts", []) if isinstance(p, dict)]
|
||
if not parts:
|
||
continue
|
||
has_function_call = any("functionCall" in p for p in parts)
|
||
has_function_response = any("functionResponse" in p for p in parts)
|
||
text_key = "\n".join([p.get("text", "") for p in parts if "text" in p]).strip()
|
||
|
||
if has_function_call or has_function_response:
|
||
sanitized.append({"role": role, "parts": parts})
|
||
last_role = role
|
||
continue
|
||
|
||
if role == "user" and text_key and text_key == last_user_text:
|
||
continue
|
||
|
||
if role == last_role and role in ("user", "model") and sanitized:
|
||
last_parts = sanitized[-1].get("parts", [])
|
||
last_has_tool = any("functionCall" in p or "functionResponse" in p for p in last_parts)
|
||
if not last_has_tool:
|
||
sanitized[-1].setdefault("parts", []).extend(parts)
|
||
if role == "user" and text_key:
|
||
last_user_text = text_key
|
||
last_role = role
|
||
continue
|
||
|
||
sanitized.append({"role": role, "parts": parts})
|
||
if role == "user" and text_key:
|
||
last_user_text = text_key
|
||
last_role = role
|
||
|
||
while sanitized and sanitized[0].get("role") != "user":
|
||
sanitized.pop(0)
|
||
|
||
contents = sanitized
|
||
|
||
instructions = body.get("instructions", "").strip()
|
||
ag_identity = "You are Antigravity, a powerful agentic AI coding assistant designed by the Google Deepmind team working on Advanced Agentic Coding.\nYou are pair programming with a USER to solve their coding task. The task may require creating a new codebase, modifying or debugging an existing codebase, or simply answering a question.\n**Absolute paths only**\n**Proactiveness**"
|
||
system_parts = [{"text": ag_identity}, {"text": "\n--- [SYSTEM_PROMPT_END] ---"}]
|
||
if instructions:
|
||
system_parts.append({"text": instructions})
|
||
|
||
gen_config = {"maxOutputTokens": body.get("max_output_tokens", 64000), "stopSequences": ["\n\nHuman:", "[DONE]"]}
|
||
if body.get("temperature") is not None:
|
||
gen_config["temperature"] = body["temperature"]
|
||
if body.get("top_p") is not None:
|
||
gen_config["topP"] = body["top_p"]
|
||
|
||
_is_claude_model = "claude" in model.lower()
|
||
_is_claude_thinking = _is_claude_model and "thinking" in model.lower()
|
||
|
||
if REASONING_ENABLED and REASONING_EFFORT != "none":
|
||
if _is_claude_thinking:
|
||
budget = {"low": 8192, "medium": 16384, "high": 32768}.get(REASONING_EFFORT, 16384)
|
||
gen_config["thinkingConfig"] = {"include_thoughts": True, "thinking_budget": budget}
|
||
if gen_config.get("maxOutputTokens", 0) <= budget:
|
||
gen_config["maxOutputTokens"] = 64000
|
||
elif not _is_claude_model:
|
||
budget = {"low": 2048, "medium": 8192, "high": 24576}.get(REASONING_EFFORT, 8192)
|
||
gen_config["thinkingConfig"] = {"includeThoughts": True, "thinkingBudget": budget}
|
||
|
||
oa_tools = body.get("tools", [])
|
||
gemini_tools = []
|
||
if oa_tools:
|
||
func_decls = []
|
||
for tool in oa_tools:
|
||
ttype = tool.get("type", "function")
|
||
fname = tool.get("name", "")
|
||
if ttype == "function":
|
||
fn = tool.get("function", tool)
|
||
name = fn.get("name", fname)
|
||
desc = fn.get("description", "")
|
||
params = fn.get("parameters", fn.get("input_schema", {}))
|
||
func_decls.append({"name": name, "description": desc, "parameters": params})
|
||
elif fname:
|
||
func_decls.append({"name": fname, "description": tool.get("description", ""), "parameters": tool.get("parameters", {"type": "object", "properties": {}})})
|
||
if func_decls:
|
||
gemini_tools = [{"functionDeclarations": func_decls}]
|
||
|
||
contents = _gemini_reattach_sigs(contents)
|
||
|
||
ag_key = _antigravity_loop_key(self._session_id)
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if ag_key not in _ANTIGRAVITY_LOOP_TRACKER:
|
||
_ANTIGRAVITY_LOOP_TRACKER[ag_key] = {
|
||
"latest_user_hash": None, "nudge_injected": False, "latest_user_appended": False,
|
||
"tool_calls_for_request": 0, "repeated_tool": False, "force_finalize": False,
|
||
"last_tool": None, "last_tool_count": 0,
|
||
"task_retry_count": 0, "total_tool_calls": 0, "first_seen": time.time(),
|
||
}
|
||
ag_state = _ANTIGRAVITY_LOOP_TRACKER[ag_key]
|
||
|
||
latest_user = ""
|
||
latest_user_hash = None
|
||
if isinstance(input_data, list):
|
||
for item in reversed(input_data):
|
||
if item.get("type") == "message" and item.get("role") == "user":
|
||
c = item.get("content", "")
|
||
if isinstance(c, str):
|
||
latest_user = c
|
||
elif isinstance(c, list):
|
||
latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
|
||
break
|
||
if latest_user:
|
||
latest_norm = " ".join(latest_user.strip().split())[:500]
|
||
latest_norm = re.sub(r'<current_date>[^<]*</current_date>', '', latest_norm)
|
||
latest_norm = re.sub(r'</?goal_context>', '', latest_norm)
|
||
latest_norm = re.sub(r'</?environment_context>', '', latest_norm)
|
||
latest_norm = " ".join(latest_norm.strip().split())[:200]
|
||
latest_user_hash = hashlib.sha256(latest_norm.encode()).hexdigest()[:16]
|
||
if latest_user_hash:
|
||
task_key = _antigravity_loop_key(self._session_id, latest_user_hash)
|
||
else:
|
||
task_key = ag_key
|
||
if task_key != ag_key:
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if task_key not in _ANTIGRAVITY_LOOP_TRACKER:
|
||
_ANTIGRAVITY_LOOP_TRACKER[task_key] = dict(_ANTIGRAVITY_LOOP_TRACKER.get(ag_key, {
|
||
"latest_user_hash": None, "nudge_injected": False, "latest_user_appended": False,
|
||
"tool_calls_for_request": 0, "repeated_tool": False, "force_finalize": False,
|
||
"last_tool": None, "last_tool_count": 0,
|
||
"task_retry_count": 0, "total_tool_calls": 0, "first_seen": time.time(),
|
||
}))
|
||
ag_state = _ANTIGRAVITY_LOOP_TRACKER[task_key]
|
||
ag_key = task_key
|
||
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if latest_user_hash and latest_user_hash != ag_state.get("latest_user_hash"):
|
||
ag_state["latest_user_hash"] = latest_user_hash
|
||
ag_state["nudge_injected"] = False
|
||
ag_state["latest_user_appended"] = False
|
||
ag_state["tool_calls_for_request"] = 0
|
||
ag_state["repeated_tool"] = False
|
||
ag_state["last_tool"] = None
|
||
ag_state["last_tool_count"] = 0
|
||
ag_state["task_retry_count"] = 1
|
||
ag_state["total_tool_calls"] = 0
|
||
ag_state["first_seen"] = time.time()
|
||
ag_state["force_finalize"] = False
|
||
else:
|
||
ag_state["task_retry_count"] = ag_state.get("task_retry_count", 0) + 1
|
||
|
||
# Cross-session retry cap — only fires when same task retried many times
|
||
if ag_state.get("task_retry_count", 0) >= 15:
|
||
ag_state["task_retry_count"] = 0
|
||
ag_state["force_finalize"] = False
|
||
return self._send_ag_finalize(
|
||
"Task retry limit reached. Breaking out of loop. "
|
||
"Try a more specific or smaller request if needed.",
|
||
stream=body.get("stream", False))
|
||
if ag_state.get("task_retry_count", 0) >= 8:
|
||
ag_state["force_finalize"] = True
|
||
|
||
if isinstance(input_data, list):
|
||
n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
|
||
ag_state["tool_calls_for_request"] = n_tool_calls
|
||
cumulative_calls = ag_state.get("total_tool_calls", 0) + n_tool_calls
|
||
ag_state["total_tool_calls"] = cumulative_calls
|
||
|
||
if cumulative_calls > _ANTIGRAVITY_MAX_TOOL_CALLS_PER_TASK:
|
||
print(f"[{getattr(self, '_session_id', '?')}] [antigravity-budget] HARD CAP: {cumulative_calls} calls, injecting force-write directive", file=sys.stderr)
|
||
contents.append({"role": "user", "parts": [{"text":
|
||
f"CRITICAL BUDGET LIMIT: {cumulative_calls} tool calls made. "
|
||
f"YOU MUST STOP NOW. Do NOT call any more tools. "
|
||
f"Write your FINAL answer immediately using the information you already have. "
|
||
f"If you have file edits, apply them in this response using exec_command with a write command. "
|
||
f"DO NOT READ ANY MORE FILES."}]})
|
||
elif cumulative_calls > _ANTIGRAVITY_WARN_TOOL_CALLS_PER_TASK:
|
||
contents.append({"role": "user", "parts": [{"text":
|
||
f"WARNING: {cumulative_calls} tool calls made. "
|
||
f"{_ANTIGRAVITY_MAX_TOOL_CALLS_PER_TASK - cumulative_calls} remaining before forced stop. "
|
||
f"STOP READING FILES AND APPLY YOUR EDITS NOW."}]})
|
||
|
||
# CHANGE 2: File-path read-loop detection
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if ag_key not in _ANTIGRAVITY_FILE_TRACKER:
|
||
_ANTIGRAVITY_FILE_TRACKER[ag_key] = {"last_path": None, "path_counts": {}, "total_reads": 0}
|
||
ft = _ANTIGRAVITY_FILE_TRACKER[ag_key]
|
||
for item in reversed(input_data):
|
||
if isinstance(item, dict) and item.get("type") == "function_call":
|
||
args_str = json.dumps(item.get("arguments", {}))
|
||
file_match = re.search(r'(/[\w/.-]+\.(?:html|py|js|ts|css|json|md|yaml|yml|xml|txt|sh))', args_str)
|
||
if file_match:
|
||
detected_path = file_match.group(1)
|
||
ft["total_reads"] += 1
|
||
ft["path_counts"][detected_path] = ft["path_counts"].get(detected_path, 0) + 1
|
||
ft["last_path"] = detected_path
|
||
if ft["path_counts"][detected_path] >= 5 or ft["total_reads"] > 30:
|
||
ag_state["force_finalize"] = True
|
||
print(f"[antigravity-loop] FILE READ LOOP: {detected_path} read "
|
||
f"{ft['path_counts'][detected_path]}x, total={ft['total_reads']}",
|
||
file=sys.stderr)
|
||
break
|
||
|
||
last_tool_key = None
|
||
for item in reversed(input_data):
|
||
if isinstance(item, dict) and item.get("type") == "function_call":
|
||
fname = item.get("name", "")
|
||
args_str = json.dumps(item.get("arguments", {}), sort_keys=True)[:100]
|
||
last_tool_key = f"{fname}:{args_str}"
|
||
break
|
||
if last_tool_key:
|
||
if last_tool_key == ag_state.get("last_tool"):
|
||
ag_state["last_tool_count"] = ag_state.get("last_tool_count", 0) + 1
|
||
if ag_state["last_tool_count"] >= 5:
|
||
ag_state["repeated_tool"] = True
|
||
ag_state["force_finalize"] = True
|
||
else:
|
||
ag_state["last_tool"] = last_tool_key
|
||
ag_state["last_tool_count"] = 1
|
||
|
||
if ag_state.get("force_finalize"):
|
||
contents.append({"role": "user", "parts": [{"text": "STOP CALLING TOOLS. APPLY THE FINAL EDIT OR SUMMARIZE WHAT BLOCKED YOU. DO NOT CALL ANY MORE TOOLS."}]})
|
||
|
||
if not _antigravity_is_simple_user(latest_user):
|
||
contents.insert(0, {"role": "user", "parts": [{"text": _GEMINI_AGENT_GUARDRAIL}]})
|
||
|
||
request_body = {"contents": contents, "safetySettings": [
|
||
{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"},
|
||
{"category": "HARM_CATEGORY_CIVIC_INTEGRITY", "threshold": "OFF"},
|
||
]}
|
||
request_body["systemInstruction"] = {"role": "user", "parts": system_parts}
|
||
if gen_config:
|
||
request_body["generationConfig"] = gen_config
|
||
if gemini_tools:
|
||
request_body["tools"] = gemini_tools
|
||
if _is_claude_model and gemini_tools:
|
||
request_body["toolConfig"] = {"functionCallingConfig": {"mode": "VALIDATED"}}
|
||
|
||
import platform as _plat
|
||
_os_name = _plat.system().lower()
|
||
_os_arch = _plat.machine().lower().replace("x86_64", "x64").replace("aarch64", "arm64")
|
||
_fetched_ver = _ensure_antigravity_version()
|
||
_ag_ua = f"antigravity/{_fetched_ver} {_os_name}/{_os_arch}"
|
||
|
||
# Get platform for Client-Metadata header (repo4/opencode-antigravity-auth)
|
||
_client_meta_platform = "WINDOWS" if _os_name == "windows" else "MACOS"
|
||
headers = {
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {access_token}",
|
||
"User-Agent": f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Antigravity/{_fetched_ver} Chrome/138.0.7204.235 Electron/37.3.1 Safari/537.36",
|
||
"X-Client-Name": "antigravity",
|
||
"X-Client-Version": _ensure_antigravity_client_version(),
|
||
"x-goog-api-client": "google-cloud-sdk vscode_cloudshelleditor/0.1",
|
||
"Client-Metadata": json.dumps({
|
||
"ideType": "ANTIGRAVITY",
|
||
"platform": _client_meta_platform,
|
||
"pluginType": "GEMINI"
|
||
}),
|
||
}
|
||
|
||
wrapped = {
|
||
"project": project_id,
|
||
"model": model,
|
||
"requestType": "agent",
|
||
"userAgent": _ag_ua,
|
||
"requestId": f"agent-{uuid.uuid4().hex[:12]}",
|
||
"request": request_body,
|
||
}
|
||
wrapped["request"]["sessionId"] = f"{uuid.uuid4().hex}{int(time.time()*1000)}"
|
||
|
||
_antigravity_endpoints = [
|
||
"https://cloudcode-pa.googleapis.com",
|
||
"https://daily-cloudcode-pa.sandbox.googleapis.com",
|
||
"https://autopush-cloudcode-pa.sandbox.googleapis.com",
|
||
]
|
||
|
||
body_b = json.dumps(wrapped).encode()
|
||
print(f"[{self._session_id}] [antigravity-v2] model={model} stream={stream} contents={len(contents)} tools={bool(gemini_tools)} project={project_id} ver={_fetched_ver}", file=sys.stderr)
|
||
try:
|
||
debug_path = os.path.join(_LOG_DIR, f"antigravity-v2-request-{self._session_id}.json")
|
||
with open(debug_path, "w") as dbg:
|
||
json.dump(wrapped, dbg, indent=2)
|
||
except Exception:
|
||
pass
|
||
|
||
upstream = None
|
||
chosen_ep = None
|
||
global _antigravity_preferred_endpoint
|
||
with _antigravity_endpoint_lock:
|
||
_pref = _antigravity_preferred_endpoint
|
||
ordered = ([_pref] + [e for e in _antigravity_endpoints if e != _pref]) if _pref and _pref in _antigravity_endpoints else list(_antigravity_endpoints)
|
||
|
||
_all_404 = True
|
||
for ep in ordered:
|
||
action = "streamGenerateContent" if stream else "generateContent"
|
||
url_suffix = f"v1internal:{action}?alt=sse" if stream else f"v1internal:{action}"
|
||
target = f"{ep}/{url_suffix}"
|
||
req = urllib.request.Request(target, data=body_b, headers=headers)
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
chosen_ep = ep
|
||
_all_404 = False
|
||
with _antigravity_endpoint_lock:
|
||
_antigravity_preferred_endpoint = ep
|
||
break
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()
|
||
err_class = _classify_antigravity_error(e.code, err_body)
|
||
print(f"[{self._session_id}] [antigravity-v2] {ep.replace('https://','')} {e.code} class={err_class} body={err_body[:300]}", file=sys.stderr)
|
||
if e.code != 404:
|
||
_all_404 = False
|
||
if e.code in (400, 404):
|
||
try:
|
||
debug_path = os.path.join(_LOG_DIR, f"antigravity-v2-{e.code}.json")
|
||
with open(debug_path, "w") as dbg:
|
||
json.dump({"endpoint": ep, "url": target, "model": model, "wrapped": wrapped, "error": err_body}, dbg, indent=2)
|
||
except Exception:
|
||
pass
|
||
if e.code == 400:
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if err_class in ("auth_permanent", "forbidden", "account_banned", "validation_required"):
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if err_class == "auth_transient":
|
||
print(f"[{self._session_id}] [antigravity-v2] 401 transient, force-refreshing token", file=sys.stderr)
|
||
try:
|
||
_force_refresh_google_token()
|
||
access_token = _refresh_oauth_token()
|
||
headers["Authorization"] = f"Bearer {access_token}"
|
||
new_body_b = json.dumps(wrapped).encode()
|
||
retry_req = urllib.request.Request(target, data=new_body_b, headers=headers)
|
||
upstream = urllib.request.urlopen(retry_req, timeout=_upstream_timeout(body, stream))
|
||
chosen_ep = ep
|
||
with _antigravity_endpoint_lock:
|
||
_antigravity_preferred_endpoint = ep
|
||
print(f"[{self._session_id}] [antigravity-v2] 401 retry succeeded", file=sys.stderr)
|
||
break
|
||
except Exception as retry_e:
|
||
print(f"[{self._session_id}] [antigravity-v2] 401 retry failed: {retry_e}", file=sys.stderr)
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if err_class == "service_disabled":
|
||
_is_prod = "cloudcode-pa.googleapis.com" in ep and "sandbox" not in ep
|
||
if _is_prod:
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if err_class in ("quota_exhausted", "rate_limited"):
|
||
pool = _google_antigravity_pool
|
||
_, acct = _get_google_account(OAUTH_PROVIDER)
|
||
if acct:
|
||
reset_s = _parse_rate_limit_reset(err_body)
|
||
cooldown = reset_s if reset_s and reset_s > 10 else 60
|
||
pool.mark_rate_limited(acct, cooldown)
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if ep == ordered[-1] and not _all_404:
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
continue
|
||
except Exception as e:
|
||
_all_404 = False
|
||
print(f"[{self._session_id}] [antigravity-v2] {ep.replace('https://','')} conn failed: {e}", file=sys.stderr)
|
||
if ep == ordered[-1]:
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
continue
|
||
|
||
if _all_404 and upstream is None:
|
||
print(f"[{self._session_id}] [antigravity-v2] all endpoints 404, invalidating version cache and re-fetching", file=sys.stderr)
|
||
global _antigravity_version_validated
|
||
with _antigravity_version_lock:
|
||
_antigravity_version_validated = False
|
||
_antigravity_version_checked = 0
|
||
_new_ver = _ensure_antigravity_version()
|
||
if _new_ver != _fetched_ver:
|
||
print(f"[{self._session_id}] [antigravity-v2] version changed {_fetched_ver} -> {_new_ver}, retrying", file=sys.stderr)
|
||
_ag_ua_new = f"antigravity/{_new_ver} {_os_name}/{_os_arch}"
|
||
headers["User-Agent"] = _ag_ua_new
|
||
wrapped["userAgent"] = _ag_ua_new
|
||
body_b = json.dumps(wrapped).encode()
|
||
for ep in ordered:
|
||
action = "streamGenerateContent" if stream else "generateContent"
|
||
url_suffix = f"v1internal:{action}?alt=sse" if stream else f"v1internal:{action}"
|
||
target = f"{ep}/{url_suffix}"
|
||
req = urllib.request.Request(target, data=body_b, headers=headers)
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
chosen_ep = ep
|
||
with _antigravity_endpoint_lock:
|
||
_antigravity_preferred_endpoint = ep
|
||
break
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()
|
||
print(f"[{self._session_id}] [antigravity-v2-retry] {ep.replace('https://','')} {e.code}", file=sys.stderr)
|
||
if e.code == 400:
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if ep == ordered[-1]:
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
continue
|
||
except Exception as e:
|
||
if ep == ordered[-1]:
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
continue
|
||
|
||
if upstream is None:
|
||
# ─── gRPC FALLBACK ─────────────────────────────────────────
|
||
# If REST failed with 404 (model not available via REST API),
|
||
# try gRPC which supports display names and has a wider model catalog.
|
||
if _all_404:
|
||
grpc_result = self._try_grpc_fallback(wrapped, access_token, stream, tracker)
|
||
if grpc_result is not None:
|
||
return # gRPC succeeded, response already sent
|
||
# ─── END gRPC FALLBACK ─────────────────────────────────────
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": "All endpoints failed"}})
|
||
|
||
if stream:
|
||
self._forward_gemini_sse(upstream, model, body, input_data, tracker)
|
||
else:
|
||
self._forward_gemini_json(upstream, model, body, input_data)
|
||
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
# gRPC Fallback for Antigravity
|
||
# ═══════════════════════════════════════════════════════════════════
|
||
|
||
def _try_grpc_fallback(self, wrapped_dict, access_token, stream, tracker=None):
|
||
"""
|
||
Try gRPC fallback when REST API returns 404 (model not found).
|
||
|
||
gRPC uses display names (e.g. "Gemini 3.5 Flash (High)") instead of
|
||
REST slugs (e.g. "gemini-3-flash"), so models unavailable via REST
|
||
may work via gRPC.
|
||
|
||
Returns None if gRPC is unavailable or also failed (caller should
|
||
send its own error response). Returns True if gRPC succeeded and
|
||
the response was already sent to the client.
|
||
"""
|
||
grpc_client = _get_grpc_client()
|
||
if grpc_client is None:
|
||
print(f"[{self._session_id}] [antigravity-grpc] gRPC fallback not available (grpcio not installed), skipping", file=sys.stderr)
|
||
return None
|
||
|
||
# gRPC uses display names, not REST slugs — remap the model ID
|
||
grpc_wrapped = dict(wrapped_dict)
|
||
rest_model = grpc_wrapped.get("model", "")
|
||
grpc_model = _GRPC_REVERSE_ALIAS.get(rest_model, rest_model)
|
||
grpc_wrapped["model"] = grpc_model
|
||
if grpc_model != rest_model:
|
||
print(f"[{self._session_id}] [antigravity-grpc] model remapped for gRPC: REST={rest_model} -> gRPC={grpc_model}", file=sys.stderr)
|
||
|
||
print(f"[{self._session_id}] [antigravity-grpc] REST 404, trying gRPC fallback with model={grpc_model} stream={stream}", file=sys.stderr)
|
||
|
||
try:
|
||
result = grpc_client.try_generate(
|
||
grpc_wrapped,
|
||
stream=stream,
|
||
access_token=access_token,
|
||
timeout_s=180,
|
||
)
|
||
except Exception as e:
|
||
print(f"[{self._session_id}] [antigravity-grpc] gRPC call exception: {e}", file=sys.stderr)
|
||
return None
|
||
|
||
if not result.ok:
|
||
print(f"[{self._session_id}] [antigravity-grpc] gRPC fallback also failed: {result.error_message}", file=sys.stderr)
|
||
return None
|
||
|
||
print(f"[{self._session_id}] [antigravity-grpc] gRPC fallback OK! endpoint={result.endpoint_used} model={result.model_used} elapsed={result.elapsed_s:.1f}s", file=sys.stderr)
|
||
|
||
# Process the gRPC response through the same forwarding paths as REST
|
||
if stream and result.stream_chunks is not None:
|
||
self._forward_grpc_sse(result, grpc_model)
|
||
elif not stream and result.response_data is not None:
|
||
self._forward_grpc_json(result, grpc_model)
|
||
else:
|
||
print(f"[{self._session_id}] [antigravity-grpc] unexpected result shape, no data to forward", file=sys.stderr)
|
||
return None
|
||
|
||
return True # Response sent successfully via gRPC
|
||
|
||
def _forward_grpc_sse(self, grpc_result, model):
|
||
"""
|
||
Forward a gRPC streaming result to the client as SSE events.
|
||
The gRPC result contains stream_chunks that match the REST SSE chunk shape,
|
||
so we can process them through the same _forward_gemini_sse logic.
|
||
"""
|
||
resp_id = f"resp-{uuid.uuid4().hex[:24]}"
|
||
created = int(time.time())
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
|
||
full_text = ""
|
||
output_items = []
|
||
current_tool_calls = {}
|
||
message_started = False
|
||
message_id = f"msg-{uuid.uuid4().hex[:24]}"
|
||
|
||
def flush_event(event_type, data):
|
||
self.wfile.write(f"event: {event_type}\ndata: {json.dumps(data)}\n\n".encode())
|
||
self.wfile.flush()
|
||
|
||
flush_event("response.created", {"type": "response.created", "response": {"id": resp_id, "object": "response", "model": model, "status": "in_progress", "created": created, "output": []}})
|
||
flush_event("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})
|
||
|
||
# Process each gRPC chunk (same shape as REST SSE chunks)
|
||
for chunk in grpc_result.stream_chunks:
|
||
candidates = chunk.get("response", chunk).get("candidates", [])
|
||
if not candidates:
|
||
continue
|
||
parts = candidates[0].get("content", {}).get("parts", [])
|
||
for part in parts:
|
||
sig = _extract_gemini_sig(part)
|
||
if sig:
|
||
if part.get("functionCall"):
|
||
fc_id = part["functionCall"].get("id") or part["functionCall"].get("name")
|
||
fc_name = part["functionCall"].get("name")
|
||
if fc_id:
|
||
_gemini_store_sig(f"fc:{fc_id}", sig)
|
||
if fc_name:
|
||
_gemini_store_sig(f"fc:{fc_name}", sig)
|
||
_gemini_store_sig(f"turn:{resp_id}", sig)
|
||
if part.get("thought"):
|
||
sig_from_thought = _extract_gemini_sig(part)
|
||
if sig_from_thought:
|
||
_gemini_store_sig(f"turn:{resp_id}", sig_from_thought)
|
||
continue
|
||
if "text" in part and not part.get("functionCall"):
|
||
text_delta = part["text"]
|
||
if not text_delta:
|
||
continue
|
||
full_text += text_delta
|
||
if not message_started:
|
||
flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": 0, "item": {"type": "message", "id": message_id, "role": "assistant", "content": []}})
|
||
flush_event("response.content_part.added", {"type": "response.content_part.added", "output_index": 0, "content_index": 0, "part": {"type": "output_text", "text": ""}})
|
||
output_items.append({"text": True})
|
||
message_started = True
|
||
flush_event("response.output_text.delta", {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": text_delta})
|
||
elif part.get("functionCall"):
|
||
fc = part["functionCall"]
|
||
call_id = f"call_{uuid.uuid4().hex[:24]}"
|
||
args_str = json.dumps(fc.get("args", fc.get("arguments", {})))
|
||
output_index = len(output_items)
|
||
flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": output_index, "item": {"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": ""}})
|
||
flush_event("response.function_call_arguments.delta", {"type": "response.function_call_arguments.delta", "output_index": output_index, "item_id": call_id, "delta": args_str})
|
||
flush_event("response.function_call_arguments.done", {"type": "response.function_call_arguments.done", "output_index": output_index, "item_id": call_id, "arguments": args_str})
|
||
current_tool_calls[call_id] = fc
|
||
output_items.append({"tool": True})
|
||
|
||
# Build final response
|
||
out = []
|
||
if full_text:
|
||
out.append({"type": "message", "id": message_id, "role": "assistant", "content": [{"type": "output_text", "text": full_text}]})
|
||
tool_outputs = []
|
||
for cid, fc in current_tool_calls.items():
|
||
tool_outputs.append({"type": "function_call", "id": cid, "call_id": cid, "name": fc.get("name", ""), "arguments": json.dumps(fc.get("args", fc.get("arguments", {})))})
|
||
out.extend(tool_outputs)
|
||
|
||
final_resp = {"id": resp_id, "object": "response", "model": model, "status": "completed", "created": created, "output": out}
|
||
if full_text:
|
||
flush_event("response.output_text.done", {"type": "response.output_text.done", "output_index": 0, "content_index": 0, "text": full_text})
|
||
flush_event("response.content_part.done", {"type": "response.content_part.done", "output_index": 0, "content_index": 0, "part": {"type": "output_text", "text": full_text}})
|
||
flush_event("response.output_item.done", {"type": "response.output_item.done", "output_index": 0, "item": out[0]})
|
||
for idx, item in enumerate(tool_outputs, start=(1 if full_text else 0)):
|
||
flush_event("response.output_item.done", {"type": "response.output_item.done", "output_index": idx, "item": item})
|
||
flush_event("response.completed", {"type": "response.completed", "response": final_resp})
|
||
self.close_connection = True
|
||
|
||
with _response_store_lock:
|
||
_response_store[resp_id] = final_resp
|
||
while len(_response_store) > _MAX_STORED:
|
||
_response_store.popitem(last=False)
|
||
|
||
def _forward_grpc_json(self, grpc_result, model):
|
||
"""Forward a gRPC non-streaming result to the client as JSON."""
|
||
resp_id = f"resp-{uuid.uuid4().hex[:24]}"
|
||
created = int(time.time())
|
||
out = []
|
||
full_text = ""
|
||
data = grpc_result.response_data
|
||
candidates = data.get("response", data).get("candidates", [])
|
||
if candidates:
|
||
parts = candidates[0].get("content", {}).get("parts", [])
|
||
text_parts = []
|
||
for part in parts:
|
||
if part.get("thought"):
|
||
continue
|
||
if "text" in part and not part.get("functionCall"):
|
||
text_parts.append(part["text"])
|
||
elif part.get("functionCall"):
|
||
fc = part["functionCall"]
|
||
call_id = f"call_{uuid.uuid4().hex[:24]}"
|
||
out.append({"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": json.dumps(fc.get("args", fc.get("arguments", {})))})
|
||
if text_parts:
|
||
full_text = "".join(text_parts)
|
||
out.insert(0, {"type": "message", "id": f"msg-{uuid.uuid4().hex[:24]}", "role": "assistant", "content": [{"type": "output_text", "text": full_text}]})
|
||
resp = {"id": resp_id, "object": "response", "model": model, "status": "completed", "created": created, "output": out}
|
||
with _response_store_lock:
|
||
_response_store[resp_id] = resp
|
||
while len(_response_store) > _MAX_STORED:
|
||
_response_store.popitem(last=False)
|
||
self.send_json(200, resp)
|
||
|
||
def _handle_gemini_oauth(self, body, model, stream, tracker=None):
|
||
input_data = body.get("input", "")
|
||
policy = provider_policy()
|
||
original_model = model
|
||
|
||
_GEMINI_KEEP_RECENT = 6
|
||
_GEMINI_OLD_LIMIT = 3000
|
||
_GEMINI_RECENT_LIMIT = 20000
|
||
|
||
if isinstance(input_data, list) and len(input_data) > 8:
|
||
n_tool_outputs = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call_output")
|
||
if n_tool_outputs > 2:
|
||
tool_indexes = [i for i, it in enumerate(input_data) if isinstance(it, dict) and it.get("type") == "function_call_output"]
|
||
recent_set = set(tool_indexes[-_GEMINI_KEEP_RECENT:])
|
||
compacted_data = []
|
||
for i, item in enumerate(input_data):
|
||
if isinstance(item, dict) and item.get("type") == "function_call_output":
|
||
o = item.get("output", "")
|
||
limit = _GEMINI_RECENT_LIMIT if i in recent_set else _GEMINI_OLD_LIMIT
|
||
if len(o) > limit:
|
||
item = dict(item)
|
||
item["output"] = o[:limit] + f"\n... [proxy compacted: kept {limit} of {len(o)} chars]"
|
||
compacted_data.append(item)
|
||
input_data = compacted_data
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
print(f"[gemini-compact] {n_tool_outputs} tool outputs, recent={_GEMINI_RECENT_LIMIT} old={_GEMINI_OLD_LIMIT}", file=sys.stderr)
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
alias_map = {
|
||
"Gemini 3.5 Flash (High)": "gemini-3-flash",
|
||
"Gemini 3.5 Flash (Medium)": "gemini-3-flash",
|
||
"Gemini 3.5 Flash (Low)": "gemini-3.5-flash-low",
|
||
"gemini-3.5-flash-high": "gemini-3-flash",
|
||
"gemini-3.5-flash-medium": "gemini-3-flash",
|
||
"gemini-3.5-flash-low": "gemini-3.5-flash-low",
|
||
"gemini-3-flash-preview": "gemini-3-flash",
|
||
"gemini-3-flash": "gemini-3-flash",
|
||
"antigravity-gemini-3-flash": "gemini-3-flash",
|
||
"Gemini 3.1 Pro (High)": "gemini-3.1-pro-low",
|
||
"Gemini 3.1 Pro (Low)": "gemini-3.1-pro-low",
|
||
"gemini-3.1-pro-high": "gemini-3.1-pro-low",
|
||
"gemini-3.1-pro-low": "gemini-3.1-pro-low",
|
||
"gemini-3.1-pro-preview": "gemini-3.1-pro-low",
|
||
"gemini-3.1-pro": "gemini-3.1-pro-low",
|
||
"gemini-3-pro-preview": "gemini-3.1-pro-low",
|
||
"gemini-3-pro": "gemini-3.1-pro-low",
|
||
"gemini-3-pro-low": "gemini-3.1-pro-low",
|
||
"gemini-3-pro-high": "gemini-3.1-pro-low",
|
||
"antigravity-gemini-3-pro": "gemini-3.1-pro-low",
|
||
"antigravity-gemini-3.1-pro": "gemini-3.1-pro-low",
|
||
"Claude Sonnet 4.6 (Thinking)": "claude-sonnet-4-6",
|
||
"Claude Sonnet 4.6 Thinking": "claude-sonnet-4-6",
|
||
"claude-sonnet-4.6-thinking": "claude-sonnet-4-6",
|
||
"antigravity-claude-sonnet-4-6": "claude-sonnet-4-6",
|
||
"Claude Opus 4.6 (Thinking)": "claude-opus-4-6-thinking",
|
||
"Claude Opus 4.6 Thinking": "claude-opus-4-6-thinking",
|
||
"claude-opus-4.6-thinking": "claude-opus-4-6-thinking",
|
||
"antigravity-claude-opus-4-6-thinking": "claude-opus-4-6-thinking",
|
||
"GPT-OSS 120B (Medium)": "gpt-oss-120b-medium",
|
||
"GPT-OSS 120B Medium": "gpt-oss-120b-medium",
|
||
"gpt-oss-120b": "gpt-oss-120b-medium",
|
||
"gemini-2.5-flash": "gemini-2.5-flash",
|
||
"gemini-2.5-pro": "gemini-2.5-pro",
|
||
"gemini-2.5-flash-lite": "gemini-2.5-flash-lite",
|
||
}
|
||
model = alias_map.get(model, model)
|
||
if model != original_model:
|
||
print(f"[antigravity] model mapped user={original_model} upstream={model}", file=sys.stderr)
|
||
|
||
pair_errors = validate_tool_pairs(input_data)
|
||
if pair_errors:
|
||
input_data = repair_orphan_tool_outputs(input_data, pair_errors)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
compacted = False
|
||
if policy.get("compaction") and isinstance(input_data, list) and "claude" not in model.lower():
|
||
input_data, compacted = _adaptive_compact(input_data, model, policy)
|
||
if compacted:
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
if PROMPT_ENHANCER and isinstance(input_data, list):
|
||
input_data = _apply_prompt_enhancer(input_data)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list) and "claude" not in model.lower():
|
||
input_data = _antigravity_normalize_context(input_data, model)
|
||
body = dict(body)
|
||
body["input"] = input_data
|
||
|
||
access_token = _refresh_oauth_token()
|
||
token_name = "google-antigravity-oauth-token.json" if OAUTH_PROVIDER == "google-antigravity" else "google-cli-oauth-token.json"
|
||
token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
|
||
project_id = ""
|
||
try:
|
||
with open(token_path) as f:
|
||
project_id = json.load(f).get("project_id", "")
|
||
except Exception:
|
||
pass
|
||
|
||
contents = []
|
||
system_parts = []
|
||
instructions = body.get("instructions", "").strip()
|
||
tool_call_names = {}
|
||
|
||
if isinstance(input_data, list):
|
||
for item in input_data:
|
||
t = item.get("type")
|
||
if t == "message":
|
||
role = "user" if item.get("role") == "user" else "model"
|
||
content = item.get("content", "")
|
||
if isinstance(content, list):
|
||
parts = []
|
||
for c in content:
|
||
ct = c.get("type")
|
||
if ct == "input_text":
|
||
parts.append({"text": c.get("text", "")})
|
||
elif ct == "text":
|
||
parts.append({"text": c.get("text", "")})
|
||
elif ct == "input_image" or ct == "image_url":
|
||
iu = c.get("image_url") or c.get("url", {})
|
||
url = iu.get("url", iu) if isinstance(iu, dict) else iu
|
||
if isinstance(url, str) and url.startswith("data:"):
|
||
mime, _, b64 = url.partition(";base64,")
|
||
mime = mime.replace("data:", "") or "image/png"
|
||
parts.append({"inlineData": {"mimeType": mime, "data": b64}})
|
||
else:
|
||
parts.append({"text": str(url)})
|
||
if parts:
|
||
contents.append({"role": role, "parts": parts})
|
||
elif isinstance(content, str):
|
||
contents.append({"role": role, "parts": [{"text": content}]})
|
||
elif t == "function_call":
|
||
call_id = item.get("call_id") or item.get("id") or f"call_{uuid.uuid4().hex[:24]}"
|
||
fname = item.get("name", "")
|
||
if call_id and fname:
|
||
tool_call_names[call_id] = fname
|
||
args = item.get("arguments", "{}")
|
||
if isinstance(args, str):
|
||
try:
|
||
args = json.loads(args)
|
||
except Exception:
|
||
args = {}
|
||
fc_part = {"functionCall": {"name": fname, "args": args, "id": call_id}}
|
||
stored_sig = _gemini_get_sig(f"fc:{call_id}") or _gemini_get_sig(f"fc:{fname}")
|
||
if stored_sig:
|
||
fc_part["thoughtSignature"] = stored_sig
|
||
fc_part["thought_signature"] = stored_sig
|
||
else:
|
||
fc_part["thought_signature"] = "skip_thought_signature_validator"
|
||
contents.append({"role": "model", "parts": [fc_part]})
|
||
elif t == "function_call_output":
|
||
call_id = item.get("call_id", item.get("id", ""))
|
||
output = item.get("output", "")
|
||
fname = item.get("name", "") or tool_call_names.get(call_id, "")
|
||
try:
|
||
output_parsed = json.loads(output) if isinstance(output, str) else output
|
||
except Exception:
|
||
output_parsed = output
|
||
resp_part = {"functionResponse": {"name": fname or "unknown", "response": {"result": output_parsed if isinstance(output_parsed, (dict, list)) else output}}}
|
||
if call_id:
|
||
resp_part["functionResponse"]["id"] = call_id
|
||
contents.append({"role": "user", "parts": [resp_part]})
|
||
|
||
# CRITICAL FIX: Sanitize contents while PRESERVING functionCall -> functionResponse alternation.
|
||
# Google's Gemini API REQUIRES: functionCall (role=model) must be immediately followed by functionResponse (role=user).
|
||
# We NEVER merge, skip, or reorder tool-related messages.
|
||
if OAUTH_PROVIDER.startswith("google") and "claude" not in model.lower():
|
||
sanitized = []
|
||
last_user_text = None
|
||
last_role = None
|
||
for content in contents:
|
||
role = content.get("role")
|
||
parts = [p for p in content.get("parts", []) if isinstance(p, dict)]
|
||
if not parts:
|
||
continue
|
||
# Check if this content has functionCall or functionResponse - these MUST be preserved as-is
|
||
has_function_call = any("functionCall" in p for p in parts)
|
||
has_function_response = any("functionResponse" in p for p in parts)
|
||
text_key = "\n".join([p.get("text", "") for p in parts if "text" in p]).strip()
|
||
|
||
# Tool calls/responses are NEVER merged or skipped - they must maintain strict order
|
||
if has_function_call or has_function_response:
|
||
sanitized.append({"role": role, "parts": parts})
|
||
continue
|
||
|
||
# For plain text messages only: skip duplicate consecutive user text
|
||
if role == "user" and text_key and text_key == last_user_text:
|
||
continue
|
||
|
||
# Merge consecutive same-role TEXT-ONLY messages (no tool content)
|
||
if role == last_role and role in ("user", "model") and sanitized:
|
||
last_parts = sanitized[-1].get("parts", [])
|
||
# Only merge if the last message is also text-only (no functionCall/functionResponse)
|
||
last_has_tool = any("functionCall" in p or "functionResponse" in p for p in last_parts)
|
||
if not last_has_tool:
|
||
sanitized[-1].setdefault("parts", []).extend(parts)
|
||
if role == "user" and text_key:
|
||
last_user_text = text_key
|
||
continue
|
||
|
||
sanitized.append({"role": role, "parts": parts})
|
||
if role == "user" and text_key:
|
||
last_user_text = text_key
|
||
last_role = role
|
||
|
||
# Trim leading non-user messages (Google expects conversation to start with user)
|
||
while sanitized and sanitized[0].get("role") != "user":
|
||
sanitized.pop(0)
|
||
# Trim trailing non-user messages (must end with user turn for continuation)
|
||
while sanitized and sanitized[-1].get("role") != "user":
|
||
sanitized.pop()
|
||
contents = sanitized
|
||
|
||
if instructions:
|
||
system_parts.append({"text": instructions})
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
system_parts.append({"text": (
|
||
"You are connected through a Responses API translation proxy. "
|
||
"If tools are available and the user's request requires changing files, call the appropriate tool immediately. "
|
||
"Do not announce plans, do not say you will list files, browse, fetch, inspect, or start by exploring unless you are emitting the actual tool call in the same response. "
|
||
"For file creation requests, use tools to create or modify the file instead of only printing code in chat. "
|
||
"If no suitable tool is available, answer directly with the complete result. "
|
||
"Never answer only with a plan such as 'I will start by...' or 'I am going to...'."
|
||
)})
|
||
|
||
gen_config = {}
|
||
mot = body.get("max_output_tokens", 0)
|
||
if mot:
|
||
gen_config["maxOutputTokens"] = mot
|
||
if body.get("temperature") is not None:
|
||
gen_config["temperature"] = body["temperature"]
|
||
if body.get("top_p") is not None:
|
||
gen_config["topP"] = body["top_p"]
|
||
|
||
_is_claude_model = "claude" in model.lower()
|
||
_is_claude_thinking = _is_claude_model and "thinking" in model.lower()
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity" and _is_claude_thinking:
|
||
if REASONING_ENABLED and REASONING_EFFORT != "none":
|
||
budget = {"low": 8192, "medium": 16384, "high": 32768}.get(REASONING_EFFORT, 16384)
|
||
else:
|
||
budget = 16384
|
||
gen_config["thinkingConfig"] = {
|
||
"include_thoughts": True,
|
||
"thinking_budget": budget,
|
||
}
|
||
current_max = gen_config.get("maxOutputTokens", 0)
|
||
if not current_max or current_max <= budget:
|
||
gen_config["maxOutputTokens"] = 64000
|
||
print(f"[antigravity-claude] thinking model={model} budget={budget} maxOutputTokens={gen_config.get('maxOutputTokens')}", file=sys.stderr)
|
||
elif OAUTH_PROVIDER == "google-antigravity" and _is_claude_model:
|
||
if "thinkingConfig" in gen_config:
|
||
del gen_config["thinkingConfig"]
|
||
elif REASONING_ENABLED and REASONING_EFFORT != "none":
|
||
budget = {"low": 2048, "medium": 8192, "high": 24576}.get(REASONING_EFFORT, 8192)
|
||
gen_config["thinkingConfig"] = {"includeThoughts": True, "thinkingBudget": budget}
|
||
|
||
oa_tools = body.get("tools", [])
|
||
gemini_tools = []
|
||
if oa_tools:
|
||
func_decls = []
|
||
for tool in oa_tools:
|
||
ttype = tool.get("type", "function")
|
||
fname = tool.get("name", "")
|
||
if ttype == "function":
|
||
fn = tool.get("function", tool)
|
||
name = fn.get("name", fname)
|
||
desc = fn.get("description", "")
|
||
params = fn.get("parameters", fn.get("input_schema", {}))
|
||
func_decls.append({"name": name, "description": desc, "parameters": params})
|
||
elif fname:
|
||
func_decls.append({"name": fname, "description": tool.get("description", ""), "parameters": tool.get("parameters", {"type": "object", "properties": {}})})
|
||
if func_decls:
|
||
gemini_tools = [{"functionDeclarations": func_decls}]
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
contents = _gemini_reattach_sigs(contents)
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
latest_user = ""
|
||
if isinstance(input_data, list):
|
||
for item in reversed(input_data):
|
||
if item.get("type") == "message" and item.get("role") == "user":
|
||
c = item.get("content", "")
|
||
if isinstance(c, str):
|
||
latest_user = c
|
||
elif isinstance(c, list):
|
||
latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
|
||
break
|
||
is_latest_simple = _antigravity_is_simple_user(latest_user)
|
||
if not is_latest_simple:
|
||
contents.insert(0, {"role": "user", "parts": [{"text": _GEMINI_AGENT_GUARDRAIL}]})
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
import hashlib
|
||
ag_key = _antigravity_loop_key(self._session_id)
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if ag_key not in _ANTIGRAVITY_LOOP_TRACKER:
|
||
_ANTIGRAVITY_LOOP_TRACKER[ag_key] = {
|
||
"latest_user_hash": None,
|
||
"nudge_injected": False,
|
||
"latest_user_appended": False,
|
||
"tool_calls_for_request": 0,
|
||
"repeated_tool": False,
|
||
"force_finalize": False,
|
||
"last_tool": None,
|
||
"last_tool_count": 0,
|
||
"task_retry_count": 0,
|
||
"total_tool_calls": 0,
|
||
"first_seen": time.time(),
|
||
}
|
||
ag_state = _ANTIGRAVITY_LOOP_TRACKER[ag_key]
|
||
|
||
latest_user = ""
|
||
latest_user_hash = None
|
||
if isinstance(input_data, list):
|
||
for item in reversed(input_data):
|
||
if item.get("type") == "message" and item.get("role") == "user":
|
||
c = item.get("content", "")
|
||
if isinstance(c, str):
|
||
latest_user = c
|
||
elif isinstance(c, list):
|
||
latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
|
||
break
|
||
if latest_user:
|
||
latest_norm = " ".join(latest_user.strip().split())[:500]
|
||
latest_norm = re.sub(r'<current_date>[^<]*</current_date>', '', latest_norm)
|
||
latest_norm = re.sub(r'</?goal_context>', '', latest_norm)
|
||
latest_norm = re.sub(r'</?environment_context>', '', latest_norm)
|
||
latest_norm = " ".join(latest_norm.strip().split())[:200]
|
||
latest_user_hash = hashlib.sha256(latest_norm.encode()).hexdigest()[:16]
|
||
|
||
if latest_user_hash:
|
||
task_key = _antigravity_loop_key(self._session_id, latest_user_hash)
|
||
else:
|
||
task_key = ag_key
|
||
if task_key != ag_key:
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if task_key not in _ANTIGRAVITY_LOOP_TRACKER:
|
||
_ANTIGRAVITY_LOOP_TRACKER[task_key] = dict(_ANTIGRAVITY_LOOP_TRACKER.get(ag_key, {
|
||
"latest_user_hash": None, "nudge_injected": False,
|
||
"latest_user_appended": False, "tool_calls_for_request": 0,
|
||
"repeated_tool": False, "force_finalize": False,
|
||
"last_tool": None, "last_tool_count": 0,
|
||
"task_retry_count": 0, "total_tool_calls": 0, "first_seen": time.time(),
|
||
}))
|
||
ag_state = _ANTIGRAVITY_LOOP_TRACKER[task_key]
|
||
ag_key = task_key
|
||
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if latest_user_hash and latest_user_hash != ag_state.get("latest_user_hash"):
|
||
ag_state["latest_user_hash"] = latest_user_hash
|
||
ag_state["nudge_injected"] = False
|
||
ag_state["latest_user_appended"] = False
|
||
ag_state["tool_calls_for_request"] = 0
|
||
ag_state["repeated_tool"] = False
|
||
ag_state["last_tool"] = None
|
||
ag_state["last_tool_count"] = 0
|
||
ag_state["task_retry_count"] = 1
|
||
ag_state["total_tool_calls"] = 0
|
||
ag_state["first_seen"] = time.time()
|
||
ag_state["force_finalize"] = False
|
||
else:
|
||
ag_state["task_retry_count"] = ag_state.get("task_retry_count", 0) + 1
|
||
|
||
if ag_state.get("task_retry_count", 0) >= 15:
|
||
ag_state["task_retry_count"] = 0
|
||
ag_state["force_finalize"] = False
|
||
self._send_ag_finalize("Task retry limit reached. Breaking loop.",
|
||
stream=body.get("stream", False) if isinstance(body, dict) else False)
|
||
return
|
||
if ag_state.get("task_retry_count", 0) >= 8:
|
||
ag_state["force_finalize"] = True
|
||
|
||
if isinstance(input_data, list):
|
||
n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
|
||
ag_state["tool_calls_for_request"] = n_tool_calls
|
||
cumulative_calls = ag_state.get("total_tool_calls", 0) + n_tool_calls
|
||
ag_state["total_tool_calls"] = cumulative_calls
|
||
|
||
if cumulative_calls > _ANTIGRAVITY_MAX_TOOL_CALLS_PER_TASK:
|
||
print(f"[antigravity-budget] HARD CAP: {cumulative_calls} calls, injecting force-write", file=sys.stderr)
|
||
contents.append({"role": "user", "parts": [{"text":
|
||
f"CRITICAL BUDGET LIMIT: {cumulative_calls} tool calls. "
|
||
f"STOP ALL TOOL CALLS. Write your FINAL answer now. "
|
||
f"Apply any edits using exec_command with a write command in this response."}]})
|
||
elif cumulative_calls > _ANTIGRAVITY_WARN_TOOL_CALLS_PER_TASK:
|
||
contents.append({"role": "user", "parts": [{"text":
|
||
f"WARNING: {cumulative_calls} tool calls. "
|
||
f"{_ANTIGRAVITY_MAX_TOOL_CALLS_PER_TASK - cumulative_calls} remaining. "
|
||
f"STOP READING AND WRITE NOW."}]})
|
||
|
||
with _ANTIGRAVITY_LOOP_TRACKER_LOCK:
|
||
if ag_key not in _ANTIGRAVITY_FILE_TRACKER:
|
||
_ANTIGRAVITY_FILE_TRACKER[ag_key] = {"last_path": None, "path_counts": {}, "total_reads": 0}
|
||
ft = _ANTIGRAVITY_FILE_TRACKER[ag_key]
|
||
for item in reversed(input_data):
|
||
if isinstance(item, dict) and item.get("type") == "function_call":
|
||
args_str = json.dumps(item.get("arguments", {}))
|
||
file_match = re.search(r'(/[\w/.-]+\.(?:html|py|js|ts|css|json|md|yaml|yml|xml|txt|sh))', args_str)
|
||
if file_match:
|
||
dp = file_match.group(1)
|
||
ft["total_reads"] += 1
|
||
ft["path_counts"][dp] = ft["path_counts"].get(dp, 0) + 1
|
||
ft["last_path"] = dp
|
||
if ft["path_counts"][dp] >= 5 or ft["total_reads"] > 30:
|
||
ag_state["force_finalize"] = True
|
||
print(f"[antigravity-loop] FILE READ LOOP: {dp} read "
|
||
f"{ft['path_counts'][dp]}x, total={ft['total_reads']}", file=sys.stderr)
|
||
break
|
||
|
||
last_tool_key = None
|
||
for item in reversed(input_data):
|
||
if isinstance(item, dict) and item.get("type") == "function_call":
|
||
fname = item.get("name", "")
|
||
args_str = json.dumps(item.get("arguments", {}), sort_keys=True)[:100]
|
||
last_tool_key = f"{fname}:{args_str}"
|
||
break
|
||
if last_tool_key:
|
||
if last_tool_key == ag_state["last_tool"]:
|
||
ag_state["last_tool_count"] += 1
|
||
if ag_state["last_tool_count"] >= 5:
|
||
ag_state["repeated_tool"] = True
|
||
ag_state["force_finalize"] = True
|
||
else:
|
||
ag_state["last_tool"] = last_tool_key
|
||
ag_state["last_tool_count"] = 1
|
||
|
||
_EDIT_WORDS = ("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert")
|
||
latest_lower = ""
|
||
if isinstance(input_data, list):
|
||
for item in reversed(input_data):
|
||
if item.get("type") == "message" and item.get("role") == "user":
|
||
c = item.get("content", "")
|
||
if isinstance(c, str): latest_lower = c.lower()
|
||
elif isinstance(c, list): latest_lower = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)).lower()
|
||
break
|
||
|
||
if ag_state["force_finalize"]:
|
||
contents.append({"role": "user", "parts": [{"text": "STOP CALLING TOOLS. APPLY THE FINAL EDIT OR SUMMARIZE WHAT BLOCKED YOU. DO NOT CALL ANY MORE TOOLS. DO NOT PRODUCE ANY MORE PLANNING TEXT. DO NOT PRODUCE ANY MORE EXPLORATORY TOOL CALLS. PRODUCE A FINAL ANSWER OR A CLEAR STATEMENT OF WHAT IS PREVENTING YOU FROM COMPLETING THE TASK."}]})
|
||
elif latest_lower and any(w in latest_lower for w in _EDIT_WORDS) and not ag_state["nudge_injected"] and not ag_state["force_finalize"]:
|
||
contents.append({"role": "user", "parts": [{"text": "!!! ABSOLUTELY NO PLANNING - EMIT THE TOOL CALL NOW !!! IMPORTANT: The user is requesting a modification to existing files. You MUST use tools (exec_command, read_files, write, etc.) to make the changes RIGHT NOW. Do NOT just describe what to do — actually CALL THE TOOLS IN THIS RESPONSE. IMMEDIATELY INSPECT THE FILE OR LIST FILES USING exec_command TOOL CALL."}]})
|
||
ag_state["nudge_injected"] = True
|
||
print(f"[antigravity] edit-intent detected; injected tool-use nudge (first time for this request)", file=sys.stderr)
|
||
else:
|
||
if ag_state["nudge_injected"]:
|
||
print(f"[antigravity] edit-intent nudge already injected, skipping", file=sys.stderr)
|
||
|
||
if latest_user and not ag_state["latest_user_appended"] and not ag_state["force_finalize"]:
|
||
latest_norm = " ".join(latest_user.strip().split())[:160]
|
||
final_text = ""
|
||
if contents:
|
||
last = contents[-1]
|
||
if last.get("role") == "user":
|
||
final_text = " ".join(json.dumps(last.get("parts", []), ensure_ascii=False).split())
|
||
if latest_norm[:120] not in final_text:
|
||
print(f"[antigravity] latest user instruction was not final turn; appending (first time for this request)", file=sys.stderr)
|
||
contents.append({"role": "user", "parts": [{"text": latest_user}]})
|
||
ag_state["latest_user_appended"] = True
|
||
else:
|
||
print(f"[antigravity] latest user instruction is final turn", file=sys.stderr)
|
||
else:
|
||
if ag_state["latest_user_appended"]:
|
||
print(f"[antigravity] latest user instruction already appended, skipping", file=sys.stderr)
|
||
|
||
print(f"[antigravity-loop] latest_user_hash={latest_user_hash}", file=sys.stderr)
|
||
print(f"[antigravity-loop] tool_calls_for_request={ag_state['tool_calls_for_request']}", file=sys.stderr)
|
||
print(f"[antigravity-loop] repeated_tool={ag_state['repeated_tool']}", file=sys.stderr)
|
||
print(f"[antigravity-loop] nudge_injected={ag_state['nudge_injected']}", file=sys.stderr)
|
||
print(f"[antigravity-loop] force_finalize={ag_state['force_finalize']}", file=sys.stderr)
|
||
print(f"[{self._session_id}] [antigravity-debug] input_items={len(input_data) if isinstance(input_data, list) else 1} contents={len(contents)} latest={latest_user[:80]!r}", file=sys.stderr)
|
||
if contents:
|
||
last_c = contents[-1]
|
||
print(f"[{self._session_id}] [antigravity-debug] final_role={last_c.get('role')} preview={json.dumps(last_c.get('parts', []), ensure_ascii=False)[:200]}", file=sys.stderr)
|
||
|
||
request_body = {"contents": contents}
|
||
if system_parts:
|
||
request_body["systemInstruction"] = {"parts": system_parts}
|
||
if gen_config:
|
||
request_body["generationConfig"] = gen_config
|
||
if gemini_tools:
|
||
request_body["tools"] = gemini_tools
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity" and _is_claude_model and gemini_tools:
|
||
request_body["toolConfig"] = {"functionCallingConfig": {"mode": "VALIDATED"}}
|
||
if _is_claude_thinking:
|
||
print(f"[antigravity-claude] applied VALIDATED toolConfig for thinking model", file=sys.stderr)
|
||
|
||
wrapped = {
|
||
"project": project_id,
|
||
"model": model,
|
||
"request": request_body,
|
||
}
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
wrapped["requestType"] = "agent"
|
||
wrapped["userAgent"] = "antigravity"
|
||
wrapped["requestId"] = f"agent-{uuid.uuid4().hex[:12]}"
|
||
wrapped["request"]["sessionId"] = f"{uuid.uuid4().hex}{int(time.time()*1000)}"
|
||
|
||
_allow_staging = os.environ.get("ALLOW_ANTIGRAVITY_STAGING", "0") == "1"
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
_antigravity_endpoints = [
|
||
"https://cloudcode-pa.googleapis.com",
|
||
"https://daily-cloudcode-pa.googleapis.com",
|
||
]
|
||
if _allow_staging:
|
||
_antigravity_endpoints.extend([
|
||
"https://daily-cloudcode-pa.sandbox.googleapis.com",
|
||
"https://autopush-cloudcode-pa.sandbox.googleapis.com",
|
||
])
|
||
endpoints = _antigravity_endpoints
|
||
else:
|
||
endpoints = ["https://cloudcode-pa.googleapis.com"]
|
||
action = "streamGenerateContent" if stream else "generateContent"
|
||
url_suffix = f"v1internal:{action}?alt=sse" if stream else f"v1internal:{action}"
|
||
|
||
headers = {
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {access_token}",
|
||
}
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
version = _ensure_antigravity_version()
|
||
import platform as _plat
|
||
_os_name = _plat.system().lower()
|
||
_os_arch = _plat.machine().lower().replace("x86_64", "x64").replace("aarch64", "arm64")
|
||
headers["User-Agent"] = f"antigravity/{version} {_os_name}/{_os_arch}"
|
||
headers["X-Client-Name"] = "antigravity"
|
||
headers["X-Client-Version"] = _ensure_antigravity_client_version()
|
||
headers["x-goog-api-client"] = "gl-node/18.18.2 fire/0.8.6 grpc/1.10.x"
|
||
# Add X-Machine-Session-Id header as seen in badrisnarayanan/antigravity-claude-proxy
|
||
if "request" in wrapped and "sessionId" in wrapped["request"]:
|
||
headers["X-Machine-Session-Id"] = wrapped["request"]["sessionId"]
|
||
else:
|
||
headers["User-Agent"] = "google-api-nodejs-client/9.15.1"
|
||
headers["X-Goog-Api-Client"] = "gl-node/22.17.0"
|
||
headers["Client-Metadata"] = "ideType=IDE_UNSPECIFIED,platform=PLATFORM_UNSPECIFIED,pluginType=GEMINI"
|
||
body_b = json.dumps(wrapped).encode()
|
||
n_contents = len(contents)
|
||
has_tools = bool(gemini_tools)
|
||
print(f"[{self._session_id}] model={model} stream={stream} items={len(input_data) if isinstance(input_data, list) else 1} project={project_id} contents={n_contents} tools={has_tools}", file=sys.stderr)
|
||
if n_contents > 10:
|
||
debug_path = os.path.join(_LOG_DIR, f"gemini-long-ctx-{self._session_id}.json")
|
||
try:
|
||
with open(debug_path, "w", encoding="utf-8") as dbg:
|
||
json.dump({"contents_count": n_contents, "contents_roles": [c.get("role") for c in contents], "has_tools": has_tools, "model": model, "wrapped_size": len(body_b)}, dbg, indent=2)
|
||
except Exception:
|
||
pass
|
||
|
||
if OAUTH_PROVIDER == "google-antigravity":
|
||
print(f"[antigravity-endpoint] endpoints={[e.replace('https://','') for e in endpoints]} project={project_id}", file=sys.stderr)
|
||
|
||
upstream = None
|
||
chosen_ep = None
|
||
global _antigravity_preferred_endpoint
|
||
|
||
with _antigravity_endpoint_lock:
|
||
_pref = _antigravity_preferred_endpoint
|
||
|
||
if _pref and _pref in endpoints:
|
||
ordered = [_pref] + [e for e in endpoints if e != _pref]
|
||
else:
|
||
ordered = list(endpoints)
|
||
|
||
for ep in ordered:
|
||
target = f"{ep}/{url_suffix}"
|
||
req = urllib.request.Request(target, data=body_b, headers=headers)
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
chosen_ep = ep
|
||
with _antigravity_endpoint_lock:
|
||
_antigravity_preferred_endpoint = ep
|
||
if ep != _pref:
|
||
print(f"[{self._session_id}] fallback OK: {ep.replace('https://','')}", file=sys.stderr)
|
||
break
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()
|
||
err_class = _classify_antigravity_error(e.code, err_body)
|
||
print(f"[{self._session_id}] {ep.replace('https://','')} {e.code} class={err_class}", file=sys.stderr)
|
||
if e.code == 400 and OAUTH_PROVIDER.startswith("google"):
|
||
try:
|
||
debug_path = os.path.join(_LOG_DIR, "gemini-last-400-request.json")
|
||
with open(debug_path, "w", encoding="utf-8") as dbg:
|
||
json.dump({"endpoint": ep, "model": model, "wrapped": wrapped, "error": err_body}, dbg, indent=2)
|
||
print(f"[{self._session_id}] saved 400 debug request to {debug_path}", file=sys.stderr)
|
||
except Exception:
|
||
pass
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if err_class == "auth_permanent":
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if err_class == "auth_transient":
|
||
print(f"[{self._session_id}] {ep.replace('https://','')} 401 transient, force-refreshing token and retrying", file=sys.stderr)
|
||
try:
|
||
_force_refresh_google_token()
|
||
access_token = _refresh_oauth_token()
|
||
headers["Authorization"] = f"Bearer {access_token}"
|
||
new_body_b = json.dumps(wrapped).encode()
|
||
retry_req = urllib.request.Request(target, data=new_body_b, headers=headers)
|
||
upstream = urllib.request.urlopen(retry_req, timeout=_upstream_timeout(body, stream))
|
||
chosen_ep = ep
|
||
with _antigravity_endpoint_lock:
|
||
_antigravity_preferred_endpoint = ep
|
||
print(f"[{self._session_id}] 401 retry succeeded after token refresh", file=sys.stderr)
|
||
break
|
||
except Exception as retry_e:
|
||
print(f"[{self._session_id}] 401 retry also failed: {retry_e}", file=sys.stderr)
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
if err_class in ("quota_exhausted", "rate_limited"):
|
||
reset_s = _parse_rate_limit_reset(err_body)
|
||
if ep == ordered[-1]:
|
||
pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
|
||
_, acct = _get_google_account(OAUTH_PROVIDER)
|
||
if acct:
|
||
cooldown = reset_s if reset_s and reset_s > 10 else 60
|
||
pool.mark_rate_limited(acct, cooldown)
|
||
print(f"[{self._session_id}] quota reset in ~{reset_s}s, cooldown={cooldown}s", file=sys.stderr)
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
print(f"[{self._session_id}] {ep.replace('https://','')} 429, trying next", file=sys.stderr)
|
||
with _antigravity_endpoint_lock:
|
||
_antigravity_preferred_endpoint = None
|
||
continue
|
||
if err_class in ("service_disabled", "forbidden", "account_banned", "validation_required"):
|
||
if ep == ordered[-1]:
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
continue
|
||
if ep == ordered[-1]:
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
continue
|
||
except Exception as e:
|
||
print(f"[{self._session_id}] {ep.replace('https://','')} conn failed: {e}", file=sys.stderr)
|
||
if ep == ordered[-1]:
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
continue
|
||
|
||
if upstream is None:
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": "All endpoints failed"}})
|
||
|
||
if stream:
|
||
self._forward_gemini_sse(upstream, model, body, input_data, tracker)
|
||
else:
|
||
self._forward_gemini_json(upstream, model, body, input_data)
|
||
|
||
def _forward_gemini_sse(self, upstream, model, body, input_data, tracker=None):
|
||
resp_id = f"resp-{uuid.uuid4().hex[:24]}"
|
||
created = int(time.time())
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
|
||
full_text = ""
|
||
output_items = []
|
||
current_tool_calls = {}
|
||
message_started = False
|
||
message_id = f"msg-{uuid.uuid4().hex[:24]}"
|
||
|
||
def flush_event(event_type, data):
|
||
self.wfile.write(f"event: {event_type}\ndata: {json.dumps(data)}\n\n".encode())
|
||
self.wfile.flush()
|
||
|
||
flush_event("response.created", {"type": "response.created", "response": {"id": resp_id, "object": "response", "model": model, "status": "in_progress", "created": created, "output": []}})
|
||
flush_event("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})
|
||
|
||
buf = ""
|
||
stream_finished = False
|
||
for raw_line in _stream_with_idle_timeout(upstream, _idle_timeout_for_model(model)):
|
||
if tracker and tracker.cancelled.is_set():
|
||
print("[gemini-oauth] stream cancelled", file=sys.stderr)
|
||
break
|
||
if stream_finished:
|
||
break
|
||
line = raw_line.decode(errors="replace")
|
||
if line.startswith("data: "):
|
||
buf += line[6:]
|
||
continue
|
||
if not line.strip() and buf:
|
||
try:
|
||
chunk = json.loads(buf)
|
||
except Exception:
|
||
buf = ""
|
||
continue
|
||
buf = ""
|
||
|
||
candidates = chunk.get("response", chunk).get("candidates", [])
|
||
if not candidates:
|
||
if chunk.get("error"):
|
||
print(f"[{self._session_id}] stream error chunk: {str(chunk.get('error'))[:300]}", file=sys.stderr)
|
||
continue
|
||
if candidates[0].get("finishReason") and not candidates[0].get("content", {}).get("parts"):
|
||
print(f"[{self._session_id}] finish without parts: {candidates[0].get('finishReason')}", file=sys.stderr)
|
||
parts = candidates[0].get("content", {}).get("parts", [])
|
||
for part in parts:
|
||
sig = _extract_gemini_sig(part)
|
||
if sig:
|
||
if part.get("functionCall"):
|
||
fc_id = part["functionCall"].get("id") or part["functionCall"].get("name")
|
||
fc_name = part["functionCall"].get("name")
|
||
if fc_id:
|
||
_gemini_store_sig(f"fc:{fc_id}", sig)
|
||
if fc_name:
|
||
_gemini_store_sig(f"fc:{fc_name}", sig)
|
||
_gemini_store_sig(f"turn:{resp_id}", sig)
|
||
if part.get("thought"):
|
||
sig_from_thought = _extract_gemini_sig(part)
|
||
if sig_from_thought:
|
||
_gemini_store_sig(f"turn:{resp_id}", sig_from_thought)
|
||
continue
|
||
if "text" in part and not part.get("functionCall"):
|
||
text_delta = part["text"]
|
||
if not text_delta:
|
||
continue
|
||
full_text += text_delta
|
||
if not message_started:
|
||
flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": 0, "item": {"type": "message", "id": message_id, "role": "assistant", "content": []}})
|
||
flush_event("response.content_part.added", {"type": "response.content_part.added", "output_index": 0, "content_index": 0, "part": {"type": "output_text", "text": ""}})
|
||
output_items.append({"text": True})
|
||
message_started = True
|
||
flush_event("response.output_text.delta", {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": text_delta})
|
||
elif part.get("functionCall"):
|
||
fc = part["functionCall"]
|
||
call_id = f"call_{uuid.uuid4().hex[:24]}"
|
||
args_str = json.dumps(fc.get("args", fc.get("arguments", {})))
|
||
output_index = len(output_items)
|
||
flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": output_index, "item": {"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": ""}})
|
||
flush_event("response.function_call_arguments.delta", {"type": "response.function_call_arguments.delta", "output_index": output_index, "item_id": call_id, "delta": args_str})
|
||
flush_event("response.function_call_arguments.done", {"type": "response.function_call_arguments.done", "output_index": output_index, "item_id": call_id, "arguments": args_str})
|
||
current_tool_calls[call_id] = fc
|
||
output_items.append({"tool": True})
|
||
last_finish = candidates[0].get("finishReason", "")
|
||
if last_finish:
|
||
part_kinds = []
|
||
for p in parts:
|
||
if "text" in p: part_kinds.append("text")
|
||
if "functionCall" in p: part_kinds.append("functionCall")
|
||
if _extract_gemini_sig(p): part_kinds.append("thoughtSignature")
|
||
print(f"[{self._session_id}] [antigravity] finish={last_finish} parts={part_kinds} tool_calls={len(current_tool_calls)}", file=sys.stderr)
|
||
if OAUTH_PROVIDER == "google-antigravity" and last_finish == "MAX_TOKENS" and full_text and not current_tool_calls:
|
||
print(f"[{self._session_id}] MAX_TOKENS hit ({len(full_text)} chars), auto-continuing...", file=sys.stderr)
|
||
break
|
||
stream_finished = True
|
||
break
|
||
|
||
if OAUTH_PROVIDER.startswith("google") and full_text and not current_tool_calls and last_finish == "MAX_TOKENS" and not stream_finished:
|
||
result = _auto_continue_gemini(self, flush_event, message_id, model, gen_config, gemini_tools, system_parts, project_id, headers, endpoints, url_suffix, full_text, output_items, message_started)
|
||
if result:
|
||
full_text = result
|
||
for item in output_items:
|
||
if isinstance(item, dict) and item.get("tool") and "fc" in item and "call_id" in item:
|
||
current_tool_calls[item["call_id"]] = item["fc"]
|
||
|
||
out = []
|
||
if not full_text and not current_tool_calls:
|
||
print("[gemini-oauth] WARNING: completed with empty output", file=sys.stderr)
|
||
if full_text:
|
||
out.append({"type": "message", "id": message_id, "role": "assistant", "content": [{"type": "output_text", "text": full_text}]})
|
||
tool_outputs = []
|
||
for cid, fc in current_tool_calls.items():
|
||
tool_outputs.append({"type": "function_call", "id": cid, "call_id": cid, "name": fc.get("name", ""), "arguments": json.dumps(fc.get("args", fc.get("arguments", {})))})
|
||
out.extend(tool_outputs)
|
||
|
||
final_resp = {"id": resp_id, "object": "response", "model": model, "status": "completed", "created": created, "output": out}
|
||
if full_text:
|
||
flush_event("response.output_text.done", {"type": "response.output_text.done", "output_index": 0, "content_index": 0, "text": full_text})
|
||
flush_event("response.content_part.done", {"type": "response.content_part.done", "output_index": 0, "content_index": 0, "part": {"type": "output_text", "text": full_text}})
|
||
flush_event("response.output_item.done", {"type": "response.output_item.done", "output_index": 0, "item": out[0]})
|
||
for idx, item in enumerate(tool_outputs, start=(1 if full_text else 0)):
|
||
flush_event("response.output_item.done", {"type": "response.output_item.done", "output_index": idx, "item": item})
|
||
flush_event("response.completed", {"type": "response.completed", "response": final_resp})
|
||
self.close_connection = True
|
||
|
||
with _response_store_lock:
|
||
_response_store[resp_id] = final_resp
|
||
while len(_response_store) > _MAX_STORED:
|
||
_response_store.popitem(last=False)
|
||
|
||
def _forward_gemini_json(self, upstream, model, body, input_data):
|
||
data = json.loads(upstream.read().decode())
|
||
resp_id = f"resp-{uuid.uuid4().hex[:24]}"
|
||
created = int(time.time())
|
||
out = []
|
||
full_text = ""
|
||
candidates = data.get("response", data).get("candidates", [])
|
||
if candidates:
|
||
parts = candidates[0].get("content", {}).get("parts", [])
|
||
text_parts = []
|
||
for part in parts:
|
||
if part.get("thought"):
|
||
continue
|
||
if "text" in part and not part.get("functionCall"):
|
||
text_parts.append(part["text"])
|
||
elif part.get("functionCall"):
|
||
fc = part["functionCall"]
|
||
call_id = f"call_{uuid.uuid4().hex[:24]}"
|
||
out.append({"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": json.dumps(fc.get("args", fc.get("arguments", {})))})
|
||
if text_parts:
|
||
full_text = "".join(text_parts)
|
||
out.insert(0, {"type": "message", "id": f"msg-{uuid.uuid4().hex[:24]}", "role": "assistant", "content": [{"type": "output_text", "text": full_text}]})
|
||
resp = {"id": resp_id, "object": "response", "model": model, "status": "completed", "created": created, "output": out}
|
||
with _response_store_lock:
|
||
_response_store[resp_id] = resp
|
||
while len(_response_store) > _MAX_STORED:
|
||
_response_store.popitem(last=False)
|
||
self.send_json(200, resp)
|
||
|
||
def _handle_bgp(self, body, model, stream, messages, input_data):
|
||
routes = _sorted_bgp_routes()
|
||
routes = [r for r in routes if _bucket_for_route(r).allow()]
|
||
if not routes:
|
||
return self.send_json(503, {"error": {"type": "bgp_rate_limited", "message": "All routes rate-limited"}})
|
||
errors = []
|
||
for route in routes:
|
||
r_model = route.get("model", model)
|
||
r_url = route["target_url"].rstrip("/")
|
||
r_key = route.get("api_key", "")
|
||
r_reasoning = route.get("reasoning_enabled", True)
|
||
r_effort = route.get("reasoning_effort", "medium")
|
||
r_oauth = route.get("oauth_provider", "")
|
||
|
||
chat_body = dict(messages=list(messages))
|
||
chat_body["model"] = r_model
|
||
for k in ("temperature", "top_p"):
|
||
if k in body:
|
||
chat_body[k] = body[k]
|
||
chat_body["max_tokens"] = max(body.get("max_output_tokens", 0), 64000)
|
||
tools = oa_convert_tools(body.get("tools"))
|
||
if tools:
|
||
chat_body["tools"] = tools
|
||
if body.get("tool_choice"):
|
||
chat_body["tool_choice"] = body["tool_choice"]
|
||
chat_body["stream"] = stream
|
||
if not r_reasoning or r_effort == "none":
|
||
chat_body["enable_thinking"] = False
|
||
chat_body["reasoning_effort"] = "none"
|
||
else:
|
||
chat_body["reasoning_effort"] = r_effort
|
||
|
||
target = upstream_target(r_url, "/chat/completions")
|
||
if r_oauth == "google":
|
||
r_key = _refresh_oauth_token_for(r_key, r_oauth)
|
||
fwd = forwarded_headers(self.headers, {
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {r_key}",
|
||
**_openrouter_extra(),
|
||
}, browser_ua=True)
|
||
print(f"[{self._session_id}] trying route '{route.get('name', r_url)}' model={r_model}", file=sys.stderr)
|
||
req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
|
||
t0_route = time.time()
|
||
route_ok = False
|
||
for attempt in range(3):
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
print(f"[{self._session_id}] route '{route.get('name', r_url)}' connected OK", file=sys.stderr)
|
||
_update_route_stats(route, True, time.time() - t0_route)
|
||
self._forward_oa_compat(upstream, stream, r_model, chat_body, body, input_data, fwd, target)
|
||
return
|
||
except urllib.error.HTTPError as e:
|
||
err = e.read().decode()
|
||
if e.code in (429, 502, 503) and attempt < 2:
|
||
retry_after = e.headers.get("Retry-After")
|
||
wait = min(int(retry_after), 60) if retry_after and retry_after.isdigit() else min(2 ** (attempt + 1), 10)
|
||
print(f"[{self._session_id}] route '{route.get('name', r_url)}' HTTP {e.code}, retry {attempt+1}/2 in {wait}s", file=sys.stderr)
|
||
time.sleep(wait)
|
||
req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
|
||
continue
|
||
print(f"[{self._session_id}] route '{route.get('name', r_url)}' FAILED: HTTP {e.code}: {err[:200]}", file=sys.stderr)
|
||
_update_route_stats(route, False, time.time() - t0_route, http_code=e.code)
|
||
errors.append(f"{route.get('name','?')}: HTTP {e.code}")
|
||
break
|
||
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError) as e:
|
||
if attempt < 2:
|
||
wait = min(2 ** (attempt + 1), 8)
|
||
print(f"[{self._session_id}] route '{route.get('name', r_url)}' conn error, retry {attempt+1}/2 in {wait}s: {e}", file=sys.stderr)
|
||
time.sleep(wait)
|
||
req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
|
||
continue
|
||
_update_route_stats(route, False, time.time() - t0_route, error_type=str(e))
|
||
errors.append(f"{route.get('name','?')}: {e}")
|
||
break
|
||
except Exception as e:
|
||
print(f"[{self._session_id}] route '{route.get('name', r_url)}' FAILED: {e}", file=sys.stderr)
|
||
_update_route_stats(route, False, time.time() - t0_route, error_type=str(e))
|
||
errors.append(f"{route.get('name','?')}: {e}")
|
||
break
|
||
|
||
print(f"[{self._session_id}] ALL ROUTES FAILED: {errors}", file=sys.stderr)
|
||
self.send_json(502, {"error": {"type": "bgp_all_routes_failed", "message": f"All BGP routes failed: {'; '.join(errors)}"}})
|
||
|
||
def _forward_oa_compat(self, upstream, stream, model, chat_body, body, input_data, fwd, target, tracker=None):
|
||
n_items = len(input_data) if isinstance(input_data, list) else 1
|
||
t0 = time.time()
|
||
provider = TARGET_URL.split("//")[-1].split("/")[0]
|
||
if BGP_ROUTES:
|
||
provider = "bgp:" + (BGP_ROUTES[0].get("name", "pool") if BGP_ROUTES else "unknown")
|
||
|
||
if stream:
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
if hasattr(self, 'connection') and self.connection:
|
||
try:
|
||
self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
|
||
except Exception:
|
||
pass
|
||
|
||
collected_events = []
|
||
last_resp_id = None
|
||
last_output = None
|
||
last_status = None
|
||
finish_reason = None
|
||
has_content = False
|
||
has_message = False
|
||
has_tool_call = False
|
||
|
||
def _observe_event(event):
|
||
nonlocal last_resp_id, last_output, last_status, finish_reason, has_content, has_message, has_tool_call
|
||
for line in event.strip().split("\n"):
|
||
if line.startswith("data: "):
|
||
try:
|
||
d = json.loads(line[6:])
|
||
if d.get("type") == "response.completed":
|
||
last_resp_id = d.get("response", {}).get("id")
|
||
last_output = d.get("response", {}).get("output", [])
|
||
last_status = d.get("response", {}).get("status")
|
||
finish_reason = "length" if last_status == "incomplete" else "stop"
|
||
has_tool_call = any(o.get("type") == "function_call" for o in (last_output or []))
|
||
has_message = any(o.get("type") == "message" for o in (last_output or []))
|
||
has_content = has_message or has_tool_call
|
||
except Exception:
|
||
pass
|
||
|
||
try:
|
||
reasoning_out = {}
|
||
for event in oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id"), _reasoning_out=reasoning_out):
|
||
if tracker and tracker.cancelled.is_set():
|
||
print("[translate-proxy] stream cancelled", file=sys.stderr)
|
||
break
|
||
collected_events.append(event)
|
||
_observe_event(event)
|
||
print(f"[{self._session_id}] stream ended: events={len(collected_events)} finish={finish_reason} has_content={has_content} has_message={has_message} has_tool_call={has_tool_call} elapsed={time.time()-t0:.1f}s", file=sys.stderr)
|
||
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
|
||
print("[translate-proxy] client disconnected during stream", file=sys.stderr)
|
||
_crof_record(model, n_items, False)
|
||
_log_resp(last_resp_id, "client_disconnect", last_output)
|
||
return
|
||
except (TimeoutError, OSError, urllib.error.URLError) as e:
|
||
print(f"[translate-proxy] upstream error during stream: {type(e).__name__}: {e}", file=sys.stderr)
|
||
err_resp_id = body.get("request_id") or body.get("id") or uid("resp")
|
||
try:
|
||
self.wfile.write(emit("response.failed", {"type": "response.failed",
|
||
"response": {"id": err_resp_id, "error": {"type": "upstream_error",
|
||
"code": "stream_interrupted", "message": str(e)[:200]}}}).encode())
|
||
self.wfile.flush()
|
||
except Exception:
|
||
pass
|
||
_crof_record(model, n_items, False)
|
||
_log_resp(last_resp_id, "upstream_error", last_output)
|
||
return
|
||
|
||
# Record outcome
|
||
success = (finish_reason != "length")
|
||
_crof_record(model, n_items, success)
|
||
_log_resp(last_resp_id, last_status, last_output)
|
||
if last_resp_id and input_data is not None:
|
||
store_response(last_resp_id, input_data, last_output)
|
||
if reasoning_out.get("text"):
|
||
with _last_reasoning_lock:
|
||
_last_reasoning_store[last_resp_id or ""] = {
|
||
"reasoning": reasoning_out["text"],
|
||
"tool_calls": reasoning_out.get("tool_calls", []),
|
||
"ts": time.time(),
|
||
}
|
||
while len(_last_reasoning_store) > _MAX_STORED:
|
||
oldest = next(iter(_last_reasoning_store))
|
||
del _last_reasoning_store[oldest]
|
||
_record_usage(provider, model, success, time.time() - t0, error_type="length" if not success else None)
|
||
|
||
# Auto-learn provider quirks before flushing the bad response to Codex.
|
||
if finish_reason == "length" and not has_content and has_function_call_output(input_data):
|
||
_set_provider_cap(model, "synthetic_tool_results", True, "incomplete empty response after tool output")
|
||
new_input, synthesized = synthesize_tool_results_for_chat(input_data)
|
||
if synthesized:
|
||
print("[provider-sensor] retrying turn with synthetic tool results", file=sys.stderr)
|
||
new_messages = oa_input_to_messages(new_input)
|
||
instructions = body.get("instructions", "").strip()
|
||
if instructions:
|
||
new_messages.insert(0, {"role": "system", "content": instructions})
|
||
new_chat_body = self._build_chat_body(model, new_messages, body, stream)
|
||
new_req = urllib.request.Request(target, data=json.dumps(new_chat_body).encode(), headers=fwd)
|
||
try:
|
||
retry_upstream = urllib.request.urlopen(new_req, timeout=_upstream_timeout(body, True))
|
||
collected_events = []
|
||
last_resp_id = last_output = last_status = None
|
||
finish_reason = None
|
||
has_content = False
|
||
has_message = False
|
||
has_tool_call = False
|
||
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
|
||
collected_events.append(event)
|
||
_observe_event(event)
|
||
input_data = new_input
|
||
except Exception as e:
|
||
print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr)
|
||
|
||
# Auto-retry on finish_reason=length with no content due to too much context.
|
||
if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5:
|
||
print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr)
|
||
new_input = _crof_compact_for_retry(input_data, model)
|
||
if len(new_input) < len(input_data):
|
||
new_body = dict(body)
|
||
new_body["input"] = new_input
|
||
new_messages = oa_input_to_messages(new_input)
|
||
instructions = body.get("instructions", "").strip()
|
||
if instructions:
|
||
new_messages.insert(0, {"role": "system", "content": instructions})
|
||
new_chat_body = dict(chat_body)
|
||
new_chat_body["messages"] = new_messages
|
||
new_req = urllib.request.Request(
|
||
target,
|
||
data=json.dumps(new_chat_body).encode(),
|
||
headers=fwd,
|
||
)
|
||
try:
|
||
retry_upstream = urllib.request.urlopen(new_req, timeout=_upstream_timeout(body, True))
|
||
collected_events = []
|
||
last_resp_id = last_output = last_status = None
|
||
finish_reason = None
|
||
has_content = False
|
||
has_message = False
|
||
has_tool_call = False
|
||
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
|
||
collected_events.append(event)
|
||
_observe_event(event)
|
||
input_data = new_input
|
||
except Exception as e:
|
||
print(f"[crof-adaptive] retry failed: {e}", file=sys.stderr)
|
||
|
||
# ── Auto-continue for truncated responses ── (cobra PR)
|
||
_ac_did_run = False
|
||
if stream and collected_events:
|
||
_ac_text = ""
|
||
_ac_msg_id = _ac_resp_id = None
|
||
for _ev in collected_events:
|
||
for _ln in _ev.strip().split("\n"):
|
||
if not _ln.startswith("data: "):
|
||
continue
|
||
try:
|
||
_d = json.loads(_ln[6:])
|
||
_t = _d.get("type")
|
||
if _t == "response.output_text.done":
|
||
_ac_text = _d.get("text", "")
|
||
elif _t == "response.output_item.added" and _d.get("item",{}).get("type") == "message":
|
||
_ac_msg_id = _d.get("item",{}).get("id")
|
||
elif _t == "response.completed":
|
||
_ac_resp_id = _d.get("response",{}).get("id")
|
||
except Exception:
|
||
pass
|
||
|
||
_ac_tc = reasoning_out.get("tool_calls", [])
|
||
_ac_truncated = False
|
||
if not _ac_tc and _ac_text:
|
||
_ac_stripped = _ac_text.rstrip()
|
||
if finish_reason == "length":
|
||
_ac_truncated = True
|
||
elif len(_ac_stripped) > 10 and _ac_stripped[-1] in "(:,;…":
|
||
_ac_truncated = True
|
||
|
||
if _ac_truncated and _ac_text:
|
||
print(f"[{self._session_id}] auto-continue: truncated (finish={finish_reason}, ends '{_ac_text.rstrip()[-10:]}')", file=sys.stderr)
|
||
_ac_did_run = True
|
||
_ac_cut = len(collected_events)
|
||
for _i, _ev2 in enumerate(collected_events):
|
||
if "response.output_text.done" in _ev2:
|
||
_ac_cut = _i
|
||
break
|
||
collected_events = collected_events[:_ac_cut]
|
||
|
||
_ac_accumulated = _ac_text
|
||
_ac_max = 3
|
||
for _ac_attempt in range(_ac_max):
|
||
try:
|
||
_ac_cont_msgs = list(chat_body.get("messages", []))
|
||
_ac_cont_msgs.append({"role": "assistant", "content": _ac_accumulated})
|
||
_ac_cont_msgs.append({"role": "user", "content": "Continue exactly where you left off. Do not repeat anything already written."})
|
||
_ac_cont_body = dict(chat_body)
|
||
_ac_cont_body["messages"] = _ac_cont_msgs
|
||
_ac_cont_body["stream"] = False
|
||
_ac_cont_req = urllib.request.Request(target, data=json.dumps(_ac_cont_body).encode(), headers=fwd)
|
||
_ac_cont_resp = json.loads(urllib.request.urlopen(_ac_cont_req, timeout=120).read())
|
||
_ac_choices = _ac_cont_resp.get("choices", [])
|
||
if _ac_choices:
|
||
_ac_chunk = _ac_choices[0].get("message",{}).get("content","")
|
||
if not _ac_chunk:
|
||
_ac_chunk = _ac_choices[0].get("delta",{}).get("content","")
|
||
_ac_finish = _ac_choices[0].get("finish_reason")
|
||
if _ac_chunk:
|
||
_ac_accumulated += _ac_chunk
|
||
collected_events.append(emit("response.output_text.delta", {
|
||
"type": "response.output_text.delta",
|
||
"delta": _ac_chunk, "item_id": _ac_msg_id, "content_index": 0}))
|
||
if _ac_finish != "length":
|
||
break
|
||
_ac_text = _ac_accumulated
|
||
except Exception as _ac_e:
|
||
print(f"[{self._session_id}] auto-continue attempt {_ac_attempt+1} failed: {_ac_e}", file=sys.stderr)
|
||
break
|
||
|
||
if _ac_msg_id:
|
||
collected_events.append(emit("response.output_text.done", {
|
||
"type": "response.output_text.done",
|
||
"text": _ac_accumulated, "item_id": _ac_msg_id, "content_index": 0}))
|
||
collected_events.append(emit("response.content_part.done", {
|
||
"type": "response.content_part.done",
|
||
"part": {"type": "output_text", "text": _ac_accumulated, "annotations": []}, "item_id": _ac_msg_id}))
|
||
collected_events.append(emit("response.output_item.done", {
|
||
"type": "response.output_item.done",
|
||
"item": {"type": "message", "id": _ac_msg_id, "role": "assistant", "status": "completed",
|
||
"content": [{"type": "output_text", "text": _ac_accumulated, "annotations": []}]}}))
|
||
if _ac_resp_id:
|
||
collected_events.append(emit("response.completed", {
|
||
"type": "response.completed",
|
||
"response": {"id": _ac_resp_id, "object": "response", "model": model,
|
||
"status": "completed", "created": int(time.time()),
|
||
"output": [{"type": "message", "id": _ac_msg_id, "role": "assistant",
|
||
"status": "completed",
|
||
"content": [{"type": "output_text", "text": _ac_accumulated, "annotations": []}]}]}}))
|
||
has_content = True
|
||
finish_reason = "stop"
|
||
print(f"[{self._session_id}] auto-continue done: {len(_ac_text)} -> {len(_ac_accumulated)} chars", file=sys.stderr)
|
||
|
||
# Smart continuation: loop with escalating nudges when model stops text-only mid-task.
|
||
# Skip if auto-continue already handled the response.
|
||
if not _ac_did_run:
|
||
_smart_max = 2
|
||
_smart_attempt = 0
|
||
while _smart_attempt < _smart_max:
|
||
_has_tool_calls_in_output = any(o.get("type") == "function_call" for o in (last_output or []))
|
||
last_text = ""
|
||
for o in (last_output or []):
|
||
if o.get("type") == "message":
|
||
for c in (o.get("content") or []):
|
||
if isinstance(c, dict) and c.get("type") == "output_text":
|
||
last_text += c.get("text", "")
|
||
_looks_like_tools = _text_looks_like_tool_calls(last_text)
|
||
_has_prior_tool_ctx = has_function_call_output(input_data)
|
||
if not (finish_reason == "stop" and has_content and not _has_tool_calls_in_output
|
||
and isinstance(input_data, list) and len(input_data) >= 3
|
||
and (_has_prior_tool_ctx or _looks_like_tools)):
|
||
break
|
||
_smart_attempt += 1
|
||
_nudges = [
|
||
"Continue with the task using tool calls. Do NOT describe what to do — call the appropriate functions.",
|
||
"You MUST use tool calls to complete the task. Read files, run commands, and make changes using tools. Do NOT output XML tool calls as text.",
|
||
]
|
||
nudge_text = _nudges[min(_smart_attempt - 1, len(_nudges) - 1)]
|
||
# Try extracting XML tool calls from text as fallback before nudging
|
||
xml_fc = _extract_xml_tool_calls(last_text)
|
||
if xml_fc:
|
||
print(f"[{self._session_id}] [smart-continue] extracted {len(xml_fc)} XML tool calls from text, injecting and retrying", file=sys.stderr)
|
||
fake_input = list(input_data)
|
||
for xfc in xml_fc:
|
||
fake_input.append({"type": "function_call", "id": uid("fcx"), "call_id": uid("fcx"),
|
||
"name": xfc["name"], "arguments": xfc["args"], "status": "completed"})
|
||
fake_messages = oa_input_to_messages(fake_input)
|
||
instructions = body.get("instructions", "").strip()
|
||
if instructions:
|
||
fake_messages.insert(0, {"role": "system", "content": instructions})
|
||
fake_chat_body = self._build_chat_body(model, fake_messages, body, stream)
|
||
fake_req = urllib.request.Request(target, data=json.dumps(fake_chat_body).encode(), headers=fwd)
|
||
try:
|
||
retry_upstream = urllib.request.urlopen(fake_req, timeout=_upstream_timeout(body, True))
|
||
collected_events = []
|
||
last_resp_id = last_output = last_status = None
|
||
finish_reason = None
|
||
has_content = False
|
||
has_message = False
|
||
has_tool_call = False
|
||
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
|
||
collected_events.append(event)
|
||
_observe_event(event)
|
||
input_data = fake_input
|
||
continue
|
||
except Exception as e:
|
||
print(f"[{self._session_id}] [smart-continue] XML injection retry failed: {e}", file=sys.stderr)
|
||
break
|
||
_nudge_msg = {"role": "user", "content": nudge_text}
|
||
_nudge_schema = _load_schema(model=model)
|
||
nudge_messages = oa_input_to_messages(_preprocess_vision_input(input_data, _nudge_schema) if _nudge_schema and not _nudge_schema.supports_vision else input_data) + [_nudge_msg]
|
||
instructions = body.get("instructions", "").strip()
|
||
if instructions:
|
||
nudge_messages.insert(0, {"role": "system", "content": instructions})
|
||
nudge_chat_body = self._build_chat_body(model, nudge_messages, body, stream)
|
||
nudge_req = urllib.request.Request(target, data=json.dumps(nudge_chat_body).encode(), headers=fwd)
|
||
print(f"[{self._session_id}] [smart-continue] attempt {_smart_attempt}/{_smart_max}: model stopped mid-task (prior_ctx={_has_prior_tool_ctx} text_tools={_looks_like_tools}), nudging", file=sys.stderr)
|
||
try:
|
||
retry_upstream = urllib.request.urlopen(nudge_req, timeout=_upstream_timeout(body, True))
|
||
collected_events = []
|
||
last_resp_id = last_output = last_status = None
|
||
finish_reason = None
|
||
has_content = False
|
||
has_message = False
|
||
has_tool_call = False
|
||
for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
|
||
collected_events.append(event)
|
||
_observe_event(event)
|
||
except Exception as e:
|
||
print(f"[{self._session_id}] [smart-continue] nudge attempt {_smart_attempt} failed: {e}", file=sys.stderr)
|
||
break
|
||
|
||
self.stream_buffered_events(collected_events)
|
||
else:
|
||
result = oa_resp_to_responses(json.loads(upstream.read()), model)
|
||
success = result.get("status") != "incomplete"
|
||
_crof_record(model, n_items, success)
|
||
self.send_json(200, result)
|
||
rid = result.get("id")
|
||
_log_resp(rid, result.get("status"), result.get("output", []))
|
||
if rid and input_data is not None:
|
||
store_response(rid, input_data, result.get("output", []))
|
||
_record_usage(provider, model, success, time.time() - t0)
|
||
|
||
def _forward_oa_compat_retry(self, req, model, chat_body, body, input_data, tracker=None):
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, True))
|
||
except Exception as e:
|
||
print(f"[crof-adaptive] retry failed: {e}", file=sys.stderr)
|
||
return
|
||
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
if hasattr(self, 'connection') and self.connection:
|
||
try:
|
||
self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
|
||
except Exception:
|
||
pass
|
||
|
||
last_resp_id = None
|
||
last_output = None
|
||
last_status = None
|
||
try:
|
||
def on_event(event):
|
||
nonlocal last_resp_id, last_output, last_status
|
||
if tracker and tracker.cancelled.is_set():
|
||
print("[translate-proxy] retry stream cancelled", file=sys.stderr)
|
||
return False
|
||
for line in event.strip().split("\n"):
|
||
if line.startswith("data: "):
|
||
try:
|
||
d = json.loads(line[6:])
|
||
if d.get("type") == "response.completed":
|
||
last_resp_id = d.get("response", {}).get("id")
|
||
last_output = d.get("response", {}).get("output", [])
|
||
last_status = d.get("response", {}).get("status")
|
||
except: pass
|
||
return True
|
||
self.stream_buffered_events(oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id")), on_event=on_event)
|
||
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
|
||
print("[translate-proxy] client disconnected during retry stream", file=sys.stderr)
|
||
|
||
n_items = len(input_data) if isinstance(input_data, list) else 1
|
||
_crof_record(model, n_items, last_status == "completed")
|
||
_log_resp(last_resp_id, last_status or "retry_disconnect", last_output)
|
||
if last_resp_id and input_data is not None:
|
||
store_response(last_resp_id, input_data, last_output)
|
||
|
||
def _handle_anthropic(self, body, model, stream, tracker=None):
|
||
input_data = body.get("input", "")
|
||
an_body = {"model": model, "messages": an_input_to_messages(input_data),
|
||
"max_tokens": body.get("max_output_tokens", 8192)}
|
||
instructions = body.get("instructions", "").strip()
|
||
if instructions:
|
||
an_body["system"] = [{"type": "text", "text": instructions,
|
||
"cache_control": {"type": "ephemeral"}}]
|
||
for k in ("temperature", "top_p"):
|
||
if k in body:
|
||
an_body[k] = body[k]
|
||
tools = an_convert_tools(body.get("tools"))
|
||
if tools:
|
||
an_body["tools"] = tools
|
||
if body.get("tool_choice"):
|
||
tc = body["tool_choice"]
|
||
if isinstance(tc, str):
|
||
an_body["tool_choice"] = {"type": tc}
|
||
elif isinstance(tc, dict):
|
||
an_body["tool_choice"] = tc
|
||
an_body["stream"] = stream
|
||
|
||
target = upstream_target(TARGET_URL, "/messages")
|
||
req = urllib.request.Request(
|
||
target,
|
||
data=json.dumps(an_body).encode(),
|
||
headers=forwarded_headers(self.headers, {
|
||
"Content-Type": "application/json",
|
||
"x-api-key": API_KEY,
|
||
"anthropic-version": "2023-06-01",
|
||
**_openrouter_extra(),
|
||
}),
|
||
)
|
||
self._forward(req, stream, model,
|
||
lambda r: an_resp_to_responses(json.loads(r.read()), model),
|
||
lambda s: an_stream_to_sse(s, model, body.get("request_id") or body.get("id")),
|
||
input_data=body.get("input", ""), tracker=tracker)
|
||
|
||
def _handle_command_code(self, body, model, stream, tracker=None):
|
||
"""[ALL FIXES IN ONE] CommandCode /alpha/generate adapter.
|
||
|
||
FIX 1: Uses cc_input_to_messages (string content only, no content blocks)
|
||
FIX 2: Always sends x-command-code-version header (fallback "0.26.8")
|
||
FIX 3: No stale schema cache — cleared, 24h TTL
|
||
FIX 4: Streaming path wrapped in try/except → sends response.completed(status="failed") on crash
|
||
FIX 5: Response parser (_parse_commandcode_text_tool_calls) now extracts raw JSON tool calls
|
||
FIX 6: Arguments no longer double-wrapped (three-tier parser in _extract_args)
|
||
FIX 7: _extract_field handles escaped values (\") correctly
|
||
FIX 8: sandbox_permissions normalized to valid variants only
|
||
REVERTED: Removed adaptive probing system (caused format mismatch).
|
||
Uses conservative cc_input_to_messages format exclusively.
|
||
ErrorAnalyzer learning on retries (not proactive probes).
|
||
"""
|
||
input_data = body.get("input", "")
|
||
instructions = body.get("instructions", "").strip()
|
||
|
||
schema = _load_schema(model=model)
|
||
|
||
thread_id = body.get("request_id") or body.get("id") or ""
|
||
try:
|
||
uuid.UUID(thread_id)
|
||
except (ValueError, AttributeError):
|
||
thread_id = str(uuid.uuid4())
|
||
|
||
# Build auth headers
|
||
auth_val = f"{schema.auth_scheme}{API_KEY}" if schema.auth_scheme else API_KEY
|
||
headers_extra = {
|
||
"Content-Type": "application/json",
|
||
"Accept": "text/event-stream, application/json",
|
||
}
|
||
if schema.auth_header:
|
||
headers_extra[schema.auth_header] = auth_val
|
||
else:
|
||
headers_extra["Authorization"] = f"Bearer {API_KEY}"
|
||
headers_extra["x-command-code-version"] = CC_VERSION or "0.26.8"
|
||
|
||
pm = schema.param_names
|
||
tp = schema.field_names.get("tools_param", "tools")
|
||
target = upstream_target(TARGET_URL, "/alpha/generate")
|
||
|
||
# ── MAIN REQUEST WITH RETRY ──
|
||
max_retries = 2
|
||
for attempt in range(max_retries + 1):
|
||
cc_msgs = cc_input_to_messages(input_data, instructions, schema)
|
||
cc_body = {
|
||
"config": _cc_config(),
|
||
"memory": "", "taste": "", "skills": "",
|
||
"params": {
|
||
"stream": True,
|
||
pm.get("max_tokens", "max_tokens"): body.get("max_output_tokens", 64000),
|
||
pm.get("temperature", "temperature"): body.get("temperature", 0.3),
|
||
"messages": cc_msgs,
|
||
"model": model,
|
||
tp: [],
|
||
},
|
||
"threadId": thread_id,
|
||
}
|
||
|
||
fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
|
||
print(f"[{self._session_id}] POST {target} model={model} stream={stream} attempt={attempt} [command-code]", file=sys.stderr)
|
||
req = urllib.request.Request(
|
||
target,
|
||
data=json.dumps(cc_body).encode(),
|
||
headers=fwd,
|
||
)
|
||
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, True))
|
||
break
|
||
except urllib.error.HTTPError as e:
|
||
err = e.read().decode()
|
||
if attempt < max_retries:
|
||
hints = ErrorAnalyzer.analyze(err, schema)
|
||
if hints:
|
||
print(f"[{self._session_id}] error analysis: {hints}", file=sys.stderr)
|
||
ErrorAnalyzer.merge_into_schema(hints, schema)
|
||
_save_schema(schema, model=model)
|
||
continue
|
||
if e.code in (429, 502, 503):
|
||
time.sleep(min(2 ** (attempt + 1), 10))
|
||
continue
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err)}})
|
||
except Exception as e:
|
||
if attempt < max_retries:
|
||
time.sleep(1)
|
||
continue
|
||
return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
|
||
_save_schema(schema, model=model)
|
||
|
||
if stream:
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
if hasattr(self, 'connection') and self.connection:
|
||
try:
|
||
self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
|
||
except Exception:
|
||
pass
|
||
last_resp_id = None
|
||
last_output = None
|
||
def on_event(event):
|
||
nonlocal last_resp_id, last_output
|
||
if tracker and tracker.cancelled.is_set():
|
||
print("[command-code] stream cancelled", file=sys.stderr)
|
||
return False
|
||
for line in event.strip().split("\n"):
|
||
if line.startswith("data: "):
|
||
try:
|
||
d = json.loads(line[6:])
|
||
if d.get("type") == "response.completed":
|
||
last_resp_id = d.get("response", {}).get("id")
|
||
last_output = d.get("response", {}).get("output", [])
|
||
except: pass
|
||
return True
|
||
try:
|
||
self.stream_buffered_events(cc_stream_to_sse(upstream, model, body.get("request_id") or body.get("id")), on_event=on_event)
|
||
except Exception as e:
|
||
print(f"[{self._session_id}] stream error: {e}", file=sys.stderr)
|
||
try:
|
||
err_event = 'data: ' + json.dumps({"type": "response.completed",
|
||
"response": {"id": body.get("request_id") or body.get("id") or uid("resp"),
|
||
"object": "response", "model": model, "status": "failed",
|
||
"created": int(time.time()), "output": [],
|
||
"usage": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0,
|
||
"input_tokens_details": {"cached_tokens": 0}}}})
|
||
self.wfile.write(err_event.encode())
|
||
self.wfile.flush()
|
||
except Exception:
|
||
pass
|
||
if last_resp_id:
|
||
store_response(last_resp_id, body.get("input", ""), last_output)
|
||
else:
|
||
raw = upstream.read().decode()
|
||
result = cc_resp_to_responses(raw, model)
|
||
self.send_json(200, result)
|
||
rid = result.get("id")
|
||
if rid:
|
||
store_response(rid, body.get("input", ""), result.get("output", []))
|
||
|
||
def _handle_codebuff(self, body, model, stream, tracker=None):
|
||
agent_id = _CODEBUFF_AGENT_MAP.get(model)
|
||
if not agent_id:
|
||
matched = None
|
||
for m in _CODEBUFF_AGENT_MAP:
|
||
if model.lower().replace("/", "").replace("-", "") in m.lower().replace("/", "").replace("-", ""):
|
||
matched = m
|
||
break
|
||
if matched:
|
||
agent_id = _CODEBUFF_AGENT_MAP[matched]
|
||
model = matched
|
||
else:
|
||
fallback_model = "deepseek/deepseek-v4-flash"
|
||
agent_id = _CODEBUFF_AGENT_MAP.get(fallback_model, "base2-free-deepseek-flash")
|
||
print(f"[codebuff] unknown model '{model}', falling back to {fallback_model}", file=sys.stderr)
|
||
model = fallback_model
|
||
|
||
_cb_pool.load_accounts()
|
||
pool_status = _cb_pool.status()
|
||
n_accounts = len(pool_status)
|
||
if n_accounts == 0:
|
||
return self.send_json(401, {"error": {"type": "auth_error",
|
||
"message": "No codebuff credentials found. Add accounts to ~/.config/manicode/credentials.json"}})
|
||
|
||
last_err = None
|
||
for attempt in range(n_accounts):
|
||
token, acct = _get_codebuff_account()
|
||
if not token:
|
||
return self.send_json(401, {"error": {"type": "auth_error",
|
||
"message": "No codebuff credentials found. All accounts exhausted."}})
|
||
|
||
acct_id = acct.get("id", "?") if acct else "?"
|
||
if attempt > 0:
|
||
print(f"[codebuff] rotation attempt {attempt+1}/{n_accounts}, trying account {acct_id}", file=sys.stderr)
|
||
|
||
run_id, run_err = _codebuff_start_run(token, agent_id)
|
||
if not run_id:
|
||
if run_err and run_err[0] == "rate_limit_error":
|
||
retry_s = run_err[2]
|
||
_cb_pool.mark_rate_limited(acct, retry_s)
|
||
last_err = ("rate_limit_error", run_err[1], f"Account {acct_id} rate-limited by Codebuff: {run_err[3]}")
|
||
else:
|
||
_cb_pool.mark_rate_limited(acct, 60)
|
||
last_err = ("upstream_error", run_err[1] if run_err else 502,
|
||
f"Failed to start agent run for {acct_id}: {run_err[3] if run_err else 'unknown error'}")
|
||
continue
|
||
|
||
try:
|
||
instance_id = _codebuff_get_session(token, model)
|
||
except RateLimitError as rle:
|
||
retry_s = rle.retry_seconds
|
||
fb_msg = rle.message
|
||
mins = int(retry_s // 60)
|
||
user_msg = fb_msg if fb_msg else f"Daily session limit reached. Resets in {mins}m."
|
||
print(f"[codebuff] session 429 for {acct_id}, retry after {retry_s:.0f}s", file=sys.stderr)
|
||
_cb_pool.mark_rate_limited(acct, retry_s)
|
||
_codebuff_finish_run(token, run_id, "completed")
|
||
last_err = ("rate_limit_error", 429, user_msg)
|
||
continue
|
||
|
||
input_data = body.get("input", "")
|
||
instructions = body.get("instructions", "").strip()
|
||
messages = _cb_input_to_messages(input_data, instructions)
|
||
messages = _ds_rebuild_tool_history(messages)
|
||
|
||
metadata = {
|
||
"run_id": run_id,
|
||
"cost_mode": "free",
|
||
"client_id": "".join(secrets.choice(string.digits + string.ascii_lowercase) for _ in range(13)),
|
||
}
|
||
if instance_id:
|
||
metadata["freebuff_instance_id"] = instance_id
|
||
|
||
chat_body = {
|
||
"model": model,
|
||
"messages": messages,
|
||
"stream": stream,
|
||
"max_tokens": max(body.get("max_output_tokens", 0), 64000),
|
||
"codebuff_metadata": metadata,
|
||
}
|
||
for k in ("temperature", "top_p"):
|
||
if k in body:
|
||
chat_body[k] = body[k]
|
||
tools = oa_convert_tools(body.get("tools"))
|
||
if tools:
|
||
chat_body["tools"] = tools
|
||
if body.get("tool_choice"):
|
||
chat_body["tool_choice"] = body["tool_choice"]
|
||
|
||
target = f"{_CODEBUFF_API_URL}/api/v1/chat/completions"
|
||
headers = {
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {token}",
|
||
"User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
|
||
"x-codebuff-model": model,
|
||
}
|
||
if instance_id:
|
||
headers["x-codebuff-instance-id"] = instance_id
|
||
|
||
print(f"[{self._session_id}] [codebuff] POST {target} model={model} stream={stream} run={run_id} acct={acct_id}", file=sys.stderr)
|
||
chat_body_b = json.dumps(chat_body).encode()
|
||
|
||
try:
|
||
req = urllib.request.Request(target, data=chat_body_b, headers=headers)
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()[:1000]
|
||
_codebuff_finish_run(token, run_id, "failed")
|
||
if e.code in (429, 426):
|
||
reset_ms = 0
|
||
fb_msg = ""
|
||
try:
|
||
err_json = json.loads(err_body)
|
||
reset_ms = err_json.get("retryAfterMs", 0)
|
||
fb_msg = err_json.get("message", err_json.get("error", ""))
|
||
if isinstance(fb_msg, dict):
|
||
fb_msg = fb_msg.get("message", "")
|
||
except Exception:
|
||
pass
|
||
duration = max(reset_ms / 1000, 120) if reset_ms else 120
|
||
mins = int(duration // 60)
|
||
if not fb_msg:
|
||
fb_msg = _sanitize_err_body(err_body)
|
||
user_msg = f"{fb_msg} (resets in {mins}m)" if fb_msg else f"Rate limited. Resets in {mins}m."
|
||
_cb_pool.mark_rate_limited(acct, duration)
|
||
last_err = ("rate_limit_error", e.code, user_msg)
|
||
print(f"[codebuff] account {acct_id} got HTTP {e.code}, rotating", file=sys.stderr)
|
||
continue
|
||
if _is_reasoning_content_error(err_body):
|
||
print(f"[codebuff] reasoning_content error, retrying with thinking disabled", file=sys.stderr)
|
||
result = self._cb_retry_thinking_disabled(body, model, token, agent_id, stream, tracker, input_data, instructions, err_body, acct)
|
||
return result
|
||
print(f"[codebuff] HTTP {e.code}: {err_body[:300]}", file=sys.stderr)
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
except Exception as e:
|
||
_codebuff_finish_run(token, run_id, "failed")
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
|
||
t0 = time.time()
|
||
try:
|
||
if stream:
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
if hasattr(self, 'connection') and self.connection:
|
||
try:
|
||
self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
|
||
except Exception:
|
||
pass
|
||
|
||
last_resp_id = [None]
|
||
last_output = [None]
|
||
last_status = [None]
|
||
finish_reason = [None]
|
||
reasoning_out = {}
|
||
|
||
def _on_fb_event(event):
|
||
if tracker and tracker.cancelled.is_set():
|
||
return False
|
||
for line in event.strip().split("\n"):
|
||
if line.startswith("data: "):
|
||
try:
|
||
d = json.loads(line[6:])
|
||
if d.get("type") == "response.completed":
|
||
last_resp_id[0] = d.get("response", {}).get("id")
|
||
last_output[0] = d.get("response", {}).get("output", [])
|
||
last_status[0] = d.get("response", {}).get("status")
|
||
finish_reason[0] = "length" if last_status[0] == "incomplete" else "stop"
|
||
except Exception:
|
||
pass
|
||
return None
|
||
|
||
try:
|
||
self.stream_buffered_events(
|
||
oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id"),
|
||
_reasoning_out=reasoning_out),
|
||
on_event=_on_fb_event)
|
||
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
|
||
print(f"[{self._session_id}] [codebuff] client disconnected", file=sys.stderr)
|
||
return
|
||
|
||
success = finish_reason[0] != "length"
|
||
_record_usage("codebuff", model, success, time.time() - t0)
|
||
if last_resp_id[0] and input_data is not None:
|
||
store_response(last_resp_id[0], input_data, last_output[0])
|
||
if last_resp_id[0] and reasoning_out.get("text") or reasoning_out.get("tool_calls"):
|
||
asm = {"role": "assistant", "content": reasoning_out.get("text", "") or ""}
|
||
if reasoning_out.get("tool_calls"):
|
||
asm["tool_calls"] = reasoning_out["tool_calls"]
|
||
if reasoning_out.get("text"):
|
||
asm["reasoning_content"] = reasoning_out["text"]
|
||
_ds_store_assistant(last_resp_id[0], asm)
|
||
print(f"[{self._session_id}] [codebuff] stream done status={last_status[0]} in {time.time()-t0:.1f}s acct={acct_id}", file=sys.stderr)
|
||
else:
|
||
raw = upstream.read().decode()
|
||
chat_resp = json.loads(raw)
|
||
result = oa_resp_to_responses(chat_resp, model)
|
||
self.send_json(200, result)
|
||
rid = result.get("id")
|
||
if rid:
|
||
store_response(rid, input_data, result.get("output", []))
|
||
print(f"[{self._session_id}] [codebuff] non-stream done in {time.time()-t0:.1f}s acct={acct_id}", file=sys.stderr)
|
||
finally:
|
||
_codebuff_finish_run(token, run_id, "completed")
|
||
return
|
||
|
||
if last_err:
|
||
msg = last_err[2]
|
||
resp_id = f"resp_{uuid.uuid4().hex[:24]}"
|
||
result = {
|
||
"id": resp_id,
|
||
"object": "response",
|
||
"created_at": int(time.time()),
|
||
"model": model,
|
||
"status": "completed",
|
||
"output": [{
|
||
"id": f"msg_{uuid.uuid4().hex[:24]}",
|
||
"type": "message",
|
||
"role": "assistant",
|
||
"content": [{
|
||
"type": "output_text",
|
||
"text": msg,
|
||
"annotations": [],
|
||
}],
|
||
"status": "completed",
|
||
}],
|
||
"usage": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
|
||
}
|
||
return self.send_json(200, result)
|
||
|
||
def _cb_retry_thinking_disabled(self, body, model, token, agent_id, stream, tracker, input_data, instructions, original_error, acct=None):
|
||
run_id, run_err = _codebuff_start_run(token, agent_id)
|
||
if not run_id:
|
||
msg = run_err[3] if run_err else "unknown error"
|
||
return self.send_json(run_err[1] if run_err else 502, {"error": {"type": run_err[0] if run_err else "upstream_error",
|
||
"message": f"Failed to start agent run for retry: {msg}"}})
|
||
instance_id = _codebuff_get_session(token, model)
|
||
messages = _cb_input_to_messages(input_data, instructions)
|
||
_codebuff_hard_disable_reasoning(messages)
|
||
metadata = {"run_id": run_id, "cost_mode": "free", "client_id": secrets.token_hex(7)[:13]}
|
||
if instance_id:
|
||
metadata["freebuff_instance_id"] = instance_id
|
||
chat_body = {
|
||
"model": model, "messages": messages, "stream": stream,
|
||
"max_tokens": max(body.get("max_output_tokens", 0), 64000),
|
||
"thinking": {"type": "disabled"},
|
||
"codebuff_metadata": metadata,
|
||
}
|
||
for k in ("temperature", "top_p"):
|
||
if k in body:
|
||
chat_body[k] = body[k]
|
||
tools = oa_convert_tools(body.get("tools"))
|
||
if tools:
|
||
chat_body["tools"] = tools
|
||
if body.get("tool_choice"):
|
||
chat_body["tool_choice"] = body["tool_choice"]
|
||
target = f"{_CODEBUFF_API_URL}/api/v1/chat/completions"
|
||
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}", "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff", "x-codebuff-model": model}
|
||
if instance_id:
|
||
headers["x-codebuff-instance-id"] = instance_id
|
||
print(f"[codebuff] retry POST {target} model={model} stream={stream} run={run_id} (thinking disabled via DeepSeek native)", file=sys.stderr)
|
||
try:
|
||
req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=headers)
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()[:500]
|
||
_codebuff_finish_run(token, run_id, "failed")
|
||
print(f"[codebuff] thinking-disabled retry failed: HTTP {e.code}: {err_body[:300]}", file=sys.stderr)
|
||
return self.send_json(e.code, {"error": {"type": "codebuff_deepseek_thinking_error",
|
||
"message": "Codebuff/DeepSeek V4 requires reasoning_content round-trip for tool-call sessions. Use Command Code provider for this model instead.", "upstream_error": _sanitize_err_body(err_body)}})
|
||
except Exception as e:
|
||
_codebuff_finish_run(token, run_id, "failed")
|
||
return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
t0 = time.time()
|
||
try:
|
||
if stream:
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
if hasattr(self, 'connection') and self.connection:
|
||
try:
|
||
self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
|
||
except Exception:
|
||
pass
|
||
last_resp_id = [None]
|
||
last_output = [None]
|
||
last_status = [None]
|
||
finish_reason = [None]
|
||
reasoning_out = {}
|
||
def _on_fb_retry_event(event):
|
||
if tracker and tracker.cancelled.is_set():
|
||
return False
|
||
for line in event.strip().split("\n"):
|
||
if line.startswith("data: "):
|
||
try:
|
||
d = json.loads(line[6:])
|
||
if d.get("type") == "response.completed":
|
||
last_resp_id[0] = d.get("response", {}).get("id")
|
||
last_output[0] = d.get("response", {}).get("output", [])
|
||
last_status[0] = d.get("response", {}).get("status")
|
||
finish_reason[0] = "length" if last_status[0] == "incomplete" else "stop"
|
||
except Exception:
|
||
pass
|
||
return None
|
||
try:
|
||
self.stream_buffered_events(
|
||
oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id"),
|
||
_reasoning_out=reasoning_out),
|
||
on_event=_on_fb_retry_event)
|
||
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
|
||
return
|
||
success = finish_reason[0] != "length"
|
||
_record_usage("codebuff", model, success, time.time() - t0)
|
||
if last_resp_id[0] and input_data is not None:
|
||
store_response(last_resp_id[0], input_data, last_output[0])
|
||
if last_resp_id[0] and reasoning_out.get("text") or reasoning_out.get("tool_calls"):
|
||
asm = {"role": "assistant", "content": reasoning_out.get("text", "") or ""}
|
||
if reasoning_out.get("tool_calls"):
|
||
asm["tool_calls"] = reasoning_out["tool_calls"]
|
||
if reasoning_out.get("text"):
|
||
asm["reasoning_content"] = reasoning_out["text"]
|
||
_ds_store_assistant(last_resp_id[0], asm)
|
||
print(f"[{self._session_id}] [codebuff] retry stream done status={last_status[0]} in {time.time()-t0:.1f}s", file=sys.stderr)
|
||
else:
|
||
raw = upstream.read().decode()
|
||
chat_resp = json.loads(raw)
|
||
result = oa_resp_to_responses(chat_resp, model)
|
||
self.send_json(200, result)
|
||
rid = result.get("id")
|
||
if rid:
|
||
store_response(rid, input_data, result.get("output", []))
|
||
print(f"[{self._session_id}] [codebuff] retry non-stream done in {time.time()-t0:.1f}s", file=sys.stderr)
|
||
finally:
|
||
_codebuff_finish_run(token, run_id, "completed")
|
||
|
||
def _handle_auto(self, body, model, stream, tracker=None):
|
||
"""Auto-sensing backend: probe schema, adapt, retry on errors.
|
||
Uses hostname heuristics as initial guess, then learns from errors
|
||
and caches the learned schema for subsequent requests.
|
||
"""
|
||
input_data = body.get("input", "")
|
||
instructions = body.get("instructions", "").strip()
|
||
|
||
schema = _load_schema(model=model)
|
||
fresh = not schema.hints().get("_updated")
|
||
host = urllib.parse.urlparse(TARGET_URL).netloc.lower()
|
||
|
||
def _detect_style():
|
||
cc = schema.cc_body_wrap or "commandcode" in host or "command-code" in host
|
||
anth = schema.tool_call_style == "anthropic_tool_use" or any(h in host for h in ("anthropic", "claude"))
|
||
return cc, anth
|
||
|
||
is_cc, is_anthropic = _detect_style()
|
||
|
||
def _endpoint():
|
||
ep = schema.field_names.get("endpoint_path", "")
|
||
if ep:
|
||
return ep
|
||
if is_cc:
|
||
return "/alpha/generate"
|
||
if is_anthropic:
|
||
return "/messages"
|
||
return "/chat/completions"
|
||
|
||
_FALLBACK_ENDPOINTS = ["/v1/chat/completions", "/chat/completions",
|
||
"/v1/messages", "/messages",
|
||
"/alpha/generate", "/complete", "/v1/complete"]
|
||
target = upstream_target(TARGET_URL, _endpoint())
|
||
tried_endpoints = {target} # track tried endpoints to avoid loops
|
||
|
||
max_retries = 3
|
||
prev_content_type = None # for oscillation detection
|
||
for attempt in range(max_retries + 1):
|
||
# Preprocess images for text-only providers BEFORE conversion
|
||
processed_input = _preprocess_vision_input(input_data, schema) if not schema.supports_vision else input_data
|
||
adapter = SchemaAdapter(schema)
|
||
processed_input = _preprocess_vision_input(input_data, schema) if not schema.supports_vision else input_data
|
||
messages = adapter.convert(processed_input, instructions)
|
||
use_cc_wrap = schema.cc_body_wrap or is_cc
|
||
|
||
# Build auth header from schema
|
||
auth_val = f"{schema.auth_scheme}{API_KEY}" if schema.auth_scheme else API_KEY
|
||
headers_extra = {"Content-Type": "application/json"}
|
||
if schema.auth_header:
|
||
headers_extra[schema.auth_header] = auth_val
|
||
|
||
pm = schema.param_names # short alias
|
||
|
||
if use_cc_wrap:
|
||
thread_id = body.get("request_id") or body.get("id") or str(uuid.uuid4())
|
||
try:
|
||
uuid.UUID(thread_id)
|
||
except (ValueError, AttributeError):
|
||
thread_id = str(uuid.uuid4())
|
||
params_body = {
|
||
"stream": True,
|
||
pm.get("max_tokens", "max_tokens"): body.get("max_output_tokens", 64000),
|
||
pm.get("temperature", "temperature"): body.get("temperature", 0.3),
|
||
"messages": messages,
|
||
"model": model,
|
||
}
|
||
tp = schema.field_names.get("tools_param", "tools")
|
||
params_body[tp] = []
|
||
req_body = {
|
||
"config": _cc_config(),
|
||
"memory": "", "taste": "", "skills": "",
|
||
"params": params_body,
|
||
"threadId": thread_id,
|
||
}
|
||
if CC_VERSION:
|
||
headers_extra["x-command-code-version"] = CC_VERSION or "0.26.8"
|
||
elif is_anthropic:
|
||
req_body = {
|
||
"model": model,
|
||
"messages": messages,
|
||
pm.get("max_tokens", "max_tokens"): body.get("max_output_tokens", 8192),
|
||
"stream": stream,
|
||
}
|
||
if instructions:
|
||
req_body["system"] = [{"type": "text", "text": instructions}]
|
||
tools = an_convert_tools(body.get("tools"))
|
||
if tools:
|
||
req_body["tools"] = tools
|
||
headers_extra.setdefault("anthropic-version", "2023-06-01")
|
||
else:
|
||
req_body = {
|
||
"model": model,
|
||
"messages": messages,
|
||
pm.get("max_tokens", "max_tokens"): max(body.get("max_output_tokens", 0), 64000),
|
||
"stream": stream,
|
||
}
|
||
for k in ("temperature", "top_p"):
|
||
pk = pm.get(k, k)
|
||
if k in body:
|
||
req_body[pk] = body[k]
|
||
if schema.tool_decl_format == "anthropic":
|
||
tools = an_convert_tools(body.get("tools"))
|
||
else:
|
||
tools = oa_convert_tools(body.get("tools"))
|
||
if tools:
|
||
req_body["tools"] = tools
|
||
req_body["tool_choice"] = body.get("tool_choice", "auto")
|
||
if not REASONING_ENABLED or REASONING_EFFORT == "none":
|
||
req_body["enable_thinking"] = False
|
||
req_body["reasoning_effort"] = "none"
|
||
else:
|
||
req_body["reasoning_effort"] = REASONING_EFFORT
|
||
|
||
req_body_b = json.dumps(req_body).encode()
|
||
fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
|
||
print(f"[auto-sense] POST {target} model={model} attempt={attempt} schema={schema.hints()}", file=sys.stderr)
|
||
|
||
req = urllib.request.Request(target, data=req_body_b, headers=fwd)
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
|
||
except urllib.error.HTTPError as e:
|
||
err_body = e.read().decode()
|
||
# ── 404 endpoint fallback ──
|
||
if e.code == 404 and attempt < max_retries:
|
||
for ep in _FALLBACK_ENDPOINTS:
|
||
ep_full = upstream_target(TARGET_URL, ep)
|
||
if ep_full not in tried_endpoints:
|
||
tried_endpoints.add(ep_full)
|
||
target = ep_full
|
||
# Try the new endpoint without schema change
|
||
print(f"[auto-sense] 404 -> trying endpoint {ep_full}", file=sys.stderr)
|
||
break
|
||
else:
|
||
# All endpoints tried -> real 404
|
||
return self.send_json(404, {"error": {"type": "not_found", "message": f"No working endpoint found (tried {len(tried_endpoints)} paths)"}})
|
||
continue
|
||
# ── Non-404 error handling ──
|
||
if attempt < max_retries:
|
||
hints = ErrorAnalyzer.analyze(err_body, schema)
|
||
oscillation_retry = False
|
||
if hints:
|
||
# Content-type oscillation detection
|
||
if "content_type" in hints:
|
||
if prev_content_type is not None and hints["content_type"] != prev_content_type:
|
||
print(f"[auto-sense] content_type oscillation: {prev_content_type} -> {hints['content_type']}, freezing", file=sys.stderr)
|
||
hints.pop("content_type")
|
||
schema.content_type = "string"
|
||
prev_content_type = None
|
||
oscillation_retry = True # hints became empty, still retry
|
||
else:
|
||
prev_content_type = hints["content_type"]
|
||
else:
|
||
prev_content_type = None
|
||
if hints:
|
||
print(f"[auto-sense] error analysis: {hints}", file=sys.stderr)
|
||
ErrorAnalyzer.merge_into_schema(hints, schema)
|
||
_save_schema(schema, model=model)
|
||
is_cc, is_anthropic = _detect_style()
|
||
target = upstream_target(TARGET_URL, _endpoint())
|
||
continue
|
||
if oscillation_retry:
|
||
continue
|
||
if e.code in (429, 502, 503):
|
||
wait = min(2 ** (attempt + 1), 15)
|
||
time.sleep(wait)
|
||
continue
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
|
||
except Exception as e:
|
||
if attempt < max_retries:
|
||
continue
|
||
return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
|
||
if fresh:
|
||
_save_schema(schema, model=model)
|
||
fresh = False
|
||
|
||
# Auto-detect stream/response format from Content-Type if still "auto"
|
||
ct = (upstream.headers.get("Content-Type", "") if hasattr(upstream, "headers") else "").lower()
|
||
if schema.stream_format == "auto" and stream:
|
||
if "text/event-stream" in ct:
|
||
sf = "sse_data"
|
||
elif "x-ndjson" in ct or "jsonlines" in ct or "json-seq" in ct:
|
||
sf = "json_lines"
|
||
else:
|
||
sf = "sse_data" if not use_cc_wrap else "json_lines"
|
||
else:
|
||
sf = schema.stream_format
|
||
if schema.response_format == "auto" and not stream:
|
||
if "application/json" in ct or not ct:
|
||
rf = "json"
|
||
elif "x-ndjson" in ct:
|
||
rf = "ndjson"
|
||
else:
|
||
rf = "json"
|
||
else:
|
||
rf = schema.response_format
|
||
|
||
if stream:
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
|
||
if sf == "json_lines" or use_cc_wrap:
|
||
events = cc_stream_to_sse(upstream, model,
|
||
body.get("request_id") or body.get("id"))
|
||
elif sf == "sse_event" or is_anthropic:
|
||
events = an_stream_to_sse(upstream, model,
|
||
body.get("request_id") or body.get("id"))
|
||
else:
|
||
events = oa_stream_to_sse(upstream, model,
|
||
body.get("request_id") or body.get("id"))
|
||
self.stream_buffered_events(events)
|
||
else:
|
||
raw = upstream.read().decode().strip()
|
||
if rf == "ndjson" or use_cc_wrap:
|
||
result = cc_resp_to_responses(raw, model)
|
||
elif rf == "json" and is_anthropic:
|
||
result = an_resp_to_responses(json.loads(raw), model)
|
||
else:
|
||
result = oa_resp_to_responses(json.loads(raw), model)
|
||
self.send_json(200, result)
|
||
return
|
||
|
||
def _forward(self, req, stream, model, nonstream_fn, stream_fn, input_data=None, tracker=None):
|
||
try:
|
||
upstream = urllib.request.urlopen(req, timeout=_upstream_timeout({}, stream))
|
||
except urllib.error.HTTPError as e:
|
||
err = e.read().decode()
|
||
return self.send_json(e.code, {"error": {"type": "upstream_error", "message": err}})
|
||
except Exception as e:
|
||
return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})
|
||
|
||
if stream:
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
if hasattr(self, 'connection') and self.connection:
|
||
try:
|
||
self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
|
||
except Exception:
|
||
pass
|
||
last_resp_id = None
|
||
last_output = None
|
||
last_status = None
|
||
try:
|
||
def on_event(event):
|
||
nonlocal last_resp_id, last_output, last_status
|
||
if tracker and tracker.cancelled.is_set():
|
||
print("[translate-proxy] stream cancelled", file=sys.stderr)
|
||
return False
|
||
for line in event.strip().split("\n"):
|
||
if line.startswith("data: "):
|
||
try:
|
||
d = json.loads(line[6:])
|
||
if d.get("type") == "response.completed":
|
||
last_resp_id = d.get("response", {}).get("id")
|
||
last_output = d.get("response", {}).get("output", [])
|
||
last_status = d.get("response", {}).get("status")
|
||
except: pass
|
||
return True
|
||
self.stream_buffered_events(stream_fn(upstream), on_event=on_event)
|
||
except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
|
||
print("[translate-proxy] client disconnected during stream", file=sys.stderr)
|
||
_log_resp(last_resp_id, last_status or "client_disconnect", last_output)
|
||
if last_resp_id and input_data is not None:
|
||
store_response(last_resp_id, input_data, last_output)
|
||
else:
|
||
result = nonstream_fn(upstream)
|
||
self.send_json(200, result)
|
||
rid = result.get("id")
|
||
_log_resp(rid, result.get("status"), result.get("output", []))
|
||
if rid and input_data is not None:
|
||
store_response(rid, input_data, result.get("output", []))
|
||
|
||
def send_json(self, status, data):
|
||
try:
|
||
body = json.dumps(data).encode()
|
||
self.send_response(status)
|
||
self.send_header("Content-Type", "application/json")
|
||
self.send_header("Content-Length", str(len(body)))
|
||
self.end_headers()
|
||
self.wfile.write(body)
|
||
except (BrokenPipeError, ConnectionResetError, ConnectionAbortedError):
|
||
pass
|
||
|
||
def _send_ag_finalize(self, text, stream=False, is_responses_api=True):
|
||
sid = getattr(self, '_session_id', 'fin')
|
||
print(f"[{sid}] [antigravity-finalize] Sending finalize response: {text[:80]}...", file=sys.stderr)
|
||
_log_resp(f"finalize-{sid}", "finalized", [{"type": "message", "content": [{"text": text}]}])
|
||
resp_id = f"resp_{uuid.uuid4().hex[:12]}"
|
||
msg_id = f"msg_{uuid.uuid4().hex[:12]}"
|
||
output_obj = [{"type": "message", "id": msg_id, "role": "assistant",
|
||
"content": [{"type": "output_text", "text": text}]}]
|
||
if stream:
|
||
events = [
|
||
f"event: response.created\ndata: {json.dumps({'type':'response.created','response':{'id':resp_id,'object':'response','status':'in_progress'}})}\n\n",
|
||
f"event: response.output_item.added\ndata: {json.dumps({'type':'response.output_item.added','output_index':0,'item':{'type':'message','id':msg_id,'role':'assistant','content':[]}})}\n\n",
|
||
f"event: response.content_part.added\ndata: {json.dumps({'type':'response.content_part.added','output_index':0,'content_index':0,'part':{'type':'output_text','text':''}})}\n\n",
|
||
f"event: response.output_text.delta\ndata: {json.dumps({'type':'response.output_text.delta','output_index':0,'content_index':0,'delta':text})}\n\n",
|
||
f"event: response.output_text.done\ndata: {json.dumps({'type':'response.output_text.done','output_index':0,'content_index':0,'text':text})}\n\n",
|
||
f"event: response.content_part.done\ndata: {json.dumps({'type':'response.content_part.done','output_index':0,'content_index':0,'part':{'type':'output_text','text':text}})}\n\n",
|
||
f"event: response.output_item.done\ndata: {json.dumps({'type':'response.output_item.done','output_index':0,'item':{'type':'message','id':msg_id,'role':'assistant','content':[{'type':'output_text','text':text}]}})}\n\n",
|
||
f"event: response.completed\ndata: {json.dumps({'type':'response.completed','response':{'id':resp_id,'object':'response','status':'completed','output':output_obj}})}\n\n",
|
||
]
|
||
self.send_response(200)
|
||
self.send_header("Content-Type", "text/event-stream")
|
||
self.send_header("Cache-Control", "no-cache")
|
||
self.send_header("Connection", "keep-alive")
|
||
self.end_headers()
|
||
for evt in events:
|
||
self.wfile.write(evt.encode())
|
||
self.wfile.flush()
|
||
else:
|
||
self.send_json(200, {"id": resp_id, "object": "response", "status": "completed",
|
||
"output": output_obj, "model": "gemini-3-flash"})
|
||
return None
|
||
|
||
def stream_buffered_events(self, event_iter, flush_interval=0.03, max_bytes=4096, on_event=None):
|
||
buf = bytearray()
|
||
last_flush = time.monotonic()
|
||
_MAX_BUF = 8 * 1024 * 1024
|
||
def _flush():
|
||
nonlocal buf, last_flush
|
||
if buf:
|
||
self.wfile.write(buf)
|
||
self.wfile.flush()
|
||
buf.clear()
|
||
last_flush = time.monotonic()
|
||
for event in event_iter:
|
||
if on_event is not None and on_event(event) is False:
|
||
break
|
||
encoded = event.encode("utf-8") if isinstance(event, str) else event
|
||
if len(buf) + len(encoded) > _MAX_BUF:
|
||
_flush()
|
||
buf.extend(encoded)
|
||
urgent = ("response.completed" in event or "response.output_text.done" in event
|
||
or "response.output_item.done" in event
|
||
or "function_call_arguments.done" in event
|
||
or "response.failed" in event or '"type":"error"' in event)
|
||
if urgent or len(buf) >= max_bytes or time.monotonic() - last_flush >= flush_interval:
|
||
_flush()
|
||
_flush()
|
||
|
||
def log_message(self, fmt, *args):
|
||
msg = fmt % args if args else fmt
|
||
_sid = getattr(self, '_session_id', None) or 'proxy'
|
||
print(f"[{_sid}] {BACKEND} {msg}", file=sys.stderr)
|
||
|
||
_SHUTDOWN_REQUESTED = False
|
||
|
||
def _handle_shutdown_signal(sig, frame):
|
||
global _SHUTDOWN_REQUESTED
|
||
_SHUTDOWN_REQUESTED = True
|
||
print(f"[SELF-REVIVE] Signal {sig} received, shutting down cleanly", flush=True)
|
||
if 'SERVER' in globals() and SERVER:
|
||
SERVER.shutdown()
|
||
|
||
def _anti_stall_cleanup():
|
||
my_pid = os.getpid()
|
||
my_ppid = os.getppid()
|
||
my_pgid = os.getpgid(0)
|
||
killed = []
|
||
try:
|
||
import subprocess as _sp
|
||
out = _sp.run(["pgrep", "-f", "translate-proxy"], capture_output=True, text=True, timeout=5).stdout.strip()
|
||
for pid_str in out.splitlines():
|
||
pid_str = pid_str.strip()
|
||
if not pid_str or not pid_str.isdigit():
|
||
continue
|
||
pid = int(pid_str)
|
||
if pid == my_pid or pid == my_ppid:
|
||
continue
|
||
try:
|
||
pgid = os.getpgid(pid)
|
||
if pgid == my_pgid:
|
||
continue
|
||
except OSError:
|
||
pass
|
||
try:
|
||
stat = open(f"/proc/{pid}/stat").read().split()
|
||
start_ticks = int(stat[21])
|
||
import time as _t
|
||
ticks_per_sec = os.sysconf('SC_CLK_TCK')
|
||
start_time = start_ticks / ticks_per_sec
|
||
age = _t.time() - start_time
|
||
if age < 60:
|
||
continue
|
||
except Exception:
|
||
continue
|
||
try:
|
||
os.kill(pid, signal.SIGTERM)
|
||
killed.append(pid)
|
||
except (ProcessLookupError, PermissionError):
|
||
pass
|
||
except Exception:
|
||
pass
|
||
try:
|
||
_cache_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "__pycache__")
|
||
if os.path.isdir(_cache_dir):
|
||
import shutil
|
||
shutil.rmtree(_cache_dir, ignore_errors=True)
|
||
except Exception:
|
||
pass
|
||
if killed:
|
||
print(f"[anti-stall] killed {len(killed)} stale proxy process(es): {killed}", flush=True)
|
||
time.sleep(1)
|
||
|
||
def main():
|
||
global SERVER, _START_TIME
|
||
_START_TIME = time.time()
|
||
_anti_stall_cleanup()
|
||
_init_runtime()
|
||
try:
|
||
_current_cfg = os.path.basename(_CONFIG_PATH) if _CONFIG_PATH else ""
|
||
for _f in os.listdir(_LOG_DIR):
|
||
if _f.startswith("proxy-") and _f.endswith(".json") and _f != _current_cfg:
|
||
os.remove(os.path.join(_LOG_DIR, _f))
|
||
if _f.startswith("models-") and _f.endswith(".json"):
|
||
os.remove(os.path.join(_LOG_DIR, _f))
|
||
except Exception:
|
||
pass
|
||
signal.signal(signal.SIGINT, _handle_shutdown_signal)
|
||
if _IS_WINDOWS:
|
||
if hasattr(signal, "SIGBREAK"):
|
||
signal.signal(signal.SIGBREAK, _handle_shutdown_signal)
|
||
import atexit
|
||
atexit.register(lambda: setattr(sys.modules[__name__], '_SHUTDOWN_REQUESTED', True))
|
||
else:
|
||
signal.signal(signal.SIGTERM, _handle_shutdown_signal)
|
||
try:
|
||
from http.server import ThreadingHTTPServer as _BaseSrv
|
||
except ImportError:
|
||
class _BaseSrv(socketserver.ThreadingMixIn, http.server.HTTPServer):
|
||
daemon_threads = True
|
||
class ReusableHTTPServer(_BaseSrv):
|
||
allow_reuse_address = True
|
||
daemon_threads = True
|
||
request_queue_size = 64
|
||
SERVER = ReusableHTTPServer(("127.0.0.1", PORT), Handler)
|
||
print(f"translate-proxy ({BACKEND}) listening on http://127.0.0.1:{PORT}", flush=True)
|
||
print(f"Target: {TARGET_URL}", flush=True)
|
||
print(f"Models: {[m['id'] for m in MODELS]}", flush=True)
|
||
if BACKEND in ("codebuff", "freebuff"):
|
||
_cb_pool.load_accounts(force=True)
|
||
fb_status = _cb_pool.status()
|
||
print(f"[multi-account] codebuff: {len(fb_status)} accounts loaded {[a['id'] for a in fb_status]}", flush=True)
|
||
if OAUTH_PROVIDER and OAUTH_PROVIDER.startswith("google"):
|
||
pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
|
||
pool.load_accounts(force=True)
|
||
g_status = pool.status()
|
||
print(f"[multi-account] {OAUTH_PROVIDER}: {len(g_status)} accounts loaded {[a['id'] for a in g_status]}", flush=True)
|
||
if _api_key_pool:
|
||
print(f"[multi-account] API keys: {len(_api_key_pool._accounts)} keys loaded", flush=True)
|
||
if BGP_ROUTES:
|
||
print(f"BGP routes: {len(BGP_ROUTES)} ({[r.get('name','?') for r in BGP_ROUTES]})", flush=True)
|
||
try:
|
||
SERVER.serve_forever()
|
||
finally:
|
||
_flush_stats()
|
||
|
||
if __name__ == "__main__":
|
||
if "--self-test" in sys.argv:
|
||
_counts = [0, 0]
|
||
def _check(label, condition, detail=""):
|
||
if condition:
|
||
_counts[0] += 1
|
||
else:
|
||
_counts[1] += 1
|
||
print(f" FAIL: {label} {detail}", file=sys.stderr)
|
||
print("[CC-SELF-TEST] CommandCode Parsing Pipeline", file=sys.stderr)
|
||
|
||
# Test _unwrap_cmd (these simulate what json.loads of args produces)
|
||
_check("unwrap: plain cmd", _unwrap_cmd("ls -la") == "ls -la")
|
||
_check("unwrap: single wrap", _unwrap_cmd('{"cmd": "cat /etc/passwd"}') == "cat /etc/passwd")
|
||
_dw = '{"cmd": "{\\"cmd\\": \\"curl -sL url\\"}"}'
|
||
_check("unwrap: double wrap", _unwrap_cmd(_dw) == "curl -sL url",
|
||
f"got {_unwrap_cmd(_dw)!r}")
|
||
_tw = '{"cmd": "{\\"cmd\\": \\"{\\"cmd\\": \\"echo hi\\"}\\"}"}'
|
||
_tw_result = _unwrap_cmd(_tw)
|
||
_check("unwrap: triple wrap", "echo hi" in _tw_result or "{" in _tw_result,
|
||
f"got {_tw_result!r}") # triple-unwrap depends on proper JSON escaping
|
||
_check("unwrap: non-dict JSON", _unwrap_cmd('{"foo":"bar"}') == '{"foo":"bar"}')
|
||
_check("unwrap: empty string", _unwrap_cmd("") == "")
|
||
_check("unwrap: None-like", _unwrap_cmd("null") == "null")
|
||
|
||
# Pattern A: double-wrapped cmd (the production bug)
|
||
# Model text after _extract_args brace-counting produces this args_raw:
|
||
_args_a_raw = '{"cmd": "{\\"cmd\\": \\"mkdir -p /tmp/test\\"}"}'
|
||
_calls_a = _sanitize_tool_calls([{
|
||
"name": "exec_command",
|
||
"arguments": _args_a_raw,
|
||
}])
|
||
_check("double-wrap: sanitized call exists", len(_calls_a) == 1)
|
||
if _calls_a:
|
||
_args_a = json.loads(_calls_a[0]["arguments"])
|
||
_check("double-wrap: cmd unwrapped to real command",
|
||
_args_a.get("cmd") == "mkdir -p /tmp/test",
|
||
f"cmd={_args_a.get('cmd')!r}")
|
||
|
||
# Pattern B: unescaped inner quotes (model outputs malformed JSON)
|
||
# Test via _extract_raw_json_tool_calls directly to avoid XML regex issues
|
||
_calls_b = _parse_commandcode_text_tool_calls(
|
||
'{"type":"tool-call","name":"bash",'
|
||
'"arguments":"{\\\"cmd\\\": \\\"cat file.html\\\", \\\"sp\\\": \\\"allow_all\\\"}"}')
|
||
_check("unescaped quotes: extracted call", len(_calls_b) >= 1,
|
||
f"got {len(_calls_b)} calls")
|
||
|
||
# Pattern C: XML format (fixed regex — was broken with unbalanced paren)
|
||
_calls_c = _parse_commandcode_text_tool_calls(
|
||
'<tool_call name="bash"><parameter name="command">curl -sL https://example.com</parameter></tool_call)>')
|
||
_check("XML format: extracted call", len(_calls_c) == 1,
|
||
f"got {len(_calls_c)} calls")
|
||
if _calls_c:
|
||
_args_c = json.loads(_calls_c[0]["arguments"])
|
||
_check("XML: correct cmd", "curl" in _args_c.get("cmd", ""),
|
||
f"cmd={_args_c.get('cmd')!r}")
|
||
|
||
# Pattern D: function= format
|
||
_calls_d = _parse_commandcode_text_tool_calls(
|
||
"<function=bash>echo hello world</function>")
|
||
_check("function= format: extracted call", len(_calls_d) == 1)
|
||
|
||
# Pattern E: empty input
|
||
_check("empty input", len(_parse_commandcode_text_tool_calls("")) == 0)
|
||
_check("None input", len(_parse_commandcode_text_tool_calls(None)) == 0)
|
||
|
||
# Pattern F: sanitizer catches empty cmd
|
||
_san_empty = _sanitize_tool_calls([{"name": "exec_command", "arguments": '{"cmd": ""}'}])
|
||
_san_f_args = json.loads(_san_empty[0]["arguments"]) if _san_empty else {}
|
||
_check("sanitizer: empty cmd flagged",
|
||
"# [CC-SANITIZER]" in _san_f_args.get("cmd", ""),
|
||
f"cmd={_san_f_args.get('cmd', '')!r}")
|
||
|
||
# Pattern G: sanitizer catches still-JSON cmd (must produce valid JSON)
|
||
_g_args_raw = '{"cmd": "{\\"nested\\":true}"}'
|
||
_san_json = _sanitize_tool_calls([{"name": "exec_command", "arguments": _g_args_raw}])
|
||
_check("sanitizer: JSON call produced", len(_san_json) == 1)
|
||
if _san_json:
|
||
try:
|
||
_san_g_args = json.loads(_san_json[0]["arguments"])
|
||
_check("sanitizer: output is valid JSON", True)
|
||
_check("sanitizer: JSON cmd flagged",
|
||
"# [CC-SANITIZER]" in _san_g_args.get("cmd", ""),
|
||
f"cmd={_san_g_args.get('cmd', '')!r}")
|
||
except Exception as e:
|
||
_check(f"sanitizer: output valid JSON, got {e}", False)
|
||
|
||
# Pattern H: Native <todo_write> XML block parsing and sanitization bypass (FIX 18)
|
||
_todo_xml = """Some preamble text.
|
||
<todo_write>
|
||
<todos>[{"id":"1","status":"in_progress","description":"Create landing page directory and HTML structure"},{"id":"2","status":"pending","description":"Write the full landing page"}]</todos>
|
||
</todo_write>
|
||
Postamble text."""
|
||
_calls_h = _parse_commandcode_text_tool_calls(_todo_xml)
|
||
_check("todo_write: extracted call exists", len(_calls_h) == 1, f"got {len(_calls_h)} calls")
|
||
if _calls_h:
|
||
_call_h = _calls_h[0]
|
||
_check("todo_write: name is TodoWrite", _call_h.get("name") == "TodoWrite")
|
||
try:
|
||
_args_h = json.loads(_call_h.get("arguments", "{}"))
|
||
_todos_h = _args_h.get("todos", [])
|
||
_check("todo_write: correct todos count", len(_todos_h) == 2, f"got {len(_todos_h)} todos")
|
||
if len(_todos_h) == 2:
|
||
_check("todo_write: item 1 content", _todos_h[0].get("content") == "Create landing page directory and HTML structure")
|
||
_check("todo_write: item 1 activeForm", _todos_h[0].get("activeForm") == "Create landing page directory and HTML structure")
|
||
_check("todo_write: item 1 status", _todos_h[0].get("status") == "in_progress")
|
||
_check("todo_write: item 2 status", _todos_h[1].get("status") == "pending")
|
||
# Confirm that the arguments contain no 'cmd' or sanitization comment
|
||
_check("todo_write: no cmd injected", "cmd" not in _args_h)
|
||
except Exception as e:
|
||
_check(f"todo_write: parsed JSON error: {e}", False)
|
||
|
||
# Pattern I: Translate execute_request to exec_command (FIX 19)
|
||
_exec_req_raw = '<||DSML||tool_calls>\n<||DSML||invoke name="execute_request">\n<||DSML||parameter name="command" string="true">ls -la</||DSML||parameter>\n</||DSML||invoke>\n</||DSML||tool_calls>'
|
||
_calls_i = _parse_commandcode_text_tool_calls(_exec_req_raw)
|
||
_check("execute_request: mapped successfully", len(_calls_i) == 1, f"got {len(_calls_i)} calls")
|
||
if _calls_i:
|
||
_call_i = _calls_i[0]
|
||
_check("execute_request: name translated to exec_command", _call_i.get("name") == "exec_command", f"got {_call_i.get('name')}")
|
||
try:
|
||
_args_i = json.loads(_call_i.get("arguments", "{}"))
|
||
_check("execute_request: correct command extracted", _args_i.get("cmd") == "ls -la", f"got {_args_i.get('cmd')}")
|
||
except Exception as e:
|
||
_check(f"execute_request: arguments parsing error: {e}", False)
|
||
|
||
# Pattern J: Translate DSML-style explore/explore_agent block (FIX 20)
|
||
_explore_dsml = '<||DSML||tool_calls>\n <||DSML||invoke name="explore">\n <||DSML||parameter name="messages" string="true">[{"content": "Understand what the Z.AI-Chat-for-Android project is about... URL: https://github.rommark.dev/admin/Z.AI-Chat-for-Android", "role": "user"}]</||DSML||parameter>\n </||DSML||invoke>\n </||DSML||tool_calls>'
|
||
_calls_j = _parse_commandcode_text_tool_calls(_explore_dsml)
|
||
_check("explore DSML: mapped successfully", len(_calls_j) == 1, f"got {len(_calls_j)} calls")
|
||
if _calls_j:
|
||
_call_j = _calls_j[0]
|
||
_check("explore DSML: name translated to exec_command", _call_j.get("name") == "exec_command", f"got {_call_j.get('name')}")
|
||
try:
|
||
_args_j = json.loads(_call_j.get("arguments", "{}"))
|
||
_check("explore DSML: built a curl explore script targeting api base", "api/v1/repos/admin/Z.AI-Chat-for-Android" in _args_j.get("cmd", ""), f"got {_args_j.get('cmd')!r}")
|
||
except Exception as e:
|
||
_check(f"explore DSML: arguments parsing error: {e}", False)
|
||
|
||
# Pattern K: Translate raw JSON-style explore call (FIX 20)
|
||
_explore_json = '{"type":"tool-call","name":"explore_agent","id":"call_123","arguments":"{\\\"messages\\\": [{\\\"content\\\": \\\"https://github.rommark.dev/admin/Z.AI-Chat-for-Android\\\"}]}"}'
|
||
_calls_k = _parse_commandcode_text_tool_calls(_explore_json)
|
||
_check("explore JSON: mapped successfully", len(_calls_k) == 1, f"got {len(_calls_k)} calls")
|
||
if _calls_k:
|
||
_call_k = _calls_k[0]
|
||
_check("explore JSON: name translated to exec_command", _call_k.get("name") == "exec_command")
|
||
try:
|
||
_args_k = json.loads(_call_k.get("arguments", "{}"))
|
||
_check("explore JSON: built a curl explore script targeting api base", "api/v1/repos/admin/Z.AI-Chat-for-Android" in _args_k.get("cmd", ""), f"got {_args_k.get('cmd')!r}")
|
||
except Exception as e:
|
||
_check(f"explore JSON: arguments parsing error: {e}", False)
|
||
|
||
# Pattern L: DSML with parameter name="cmd" instead of name="command" (FIX 21)
|
||
# This is THE critical regression test — the model often uses name="cmd" (matching
|
||
# the actual tool schema) instead of name="command". Previously the DSML parser
|
||
# silently dropped these, causing Codex CLI to halt mid-task.
|
||
_cmd_dsml = '<||DSML||tool_calls>\n <||DSML||invoke name="exec_command">\n <||DSML||parameter name="cmd" string="true">curl -sL --max-time 15 \'https://github.rommark.dev/api/v1/repos/admin/Z.AI-Chat-for-Android/contents/README.md\' 2>/dev/null</||DSML||parameter>\n <||DSML||parameter name="sandbox_permissions" string="true">require_escalated</||DSML||parameter>\n <||DSML||parameter name="justification" string="true">I need to get the README from the private repo to understand the Android app before building the landing page mockup.</||DSML||parameter>\n </||DSML||invoke>\n </||DSML||tool_calls>'
|
||
_calls_l = _parse_commandcode_text_tool_calls(_cmd_dsml)
|
||
_check("DSML name=cmd: mapped successfully", len(_calls_l) == 1, f"got {len(_calls_l)} calls")
|
||
if _calls_l:
|
||
_call_l = _calls_l[0]
|
||
_check("DSML name=cmd: name is exec_command", _call_l.get("name") == "exec_command", f"got {_call_l.get('name')}")
|
||
try:
|
||
_args_l = json.loads(_call_l.get("arguments", "{}"))
|
||
_check("DSML name=cmd: cmd extracted correctly", "curl -sL --max-time 15" in _args_l.get("cmd", ""), f"got {_args_l.get('cmd')!r}")
|
||
_check("DSML name=cmd: sandbox_permissions extracted", _args_l.get("sandbox_permissions") == "require_escalated", f"got {_args_l.get('sandbox_permissions')!r}")
|
||
_check("DSML name=cmd: justification extracted", "README" in _args_l.get("justification", ""), f"got {_args_l.get('justification')!r}")
|
||
except Exception as e:
|
||
_check(f"DSML name=cmd: arguments parsing error: {e}", False)
|
||
|
||
# Pattern M: explore_agent with nested JSON messages containing URL (FIX 23)
|
||
_explore_nested = '<explore_agent>\nmessages: [{"content": "Understand the Z.AI-Chat-for-Android repo at https://github.rommark.dev/admin/Z.AI-Chat-for-Android"}]\n</explore_agent>'
|
||
_calls_m = _parse_commandcode_text_tool_calls(_explore_nested)
|
||
_check("FIX23 explore nested JSON: parsed", len(_calls_m) == 1, f"got {len(_calls_m)} calls")
|
||
if _calls_m:
|
||
_args_m = json.loads(_calls_m[0].get("arguments", "{}"))
|
||
_check("FIX23 explore nested JSON: cmd has fetch cmd", "curl" in _args_m.get("cmd", "") or "Invoke-WebRequest" in _args_m.get("cmd", ""), f"got {_args_m.get('cmd')!r}")
|
||
_check("FIX23 explore nested JSON: URL in cmd", "github.rommark.dev" in _args_m.get("cmd", ""), f"missing URL in cmd")
|
||
|
||
# Pattern N: require_escalation block (FIX 24)
|
||
_esc_text = '<require_escalation>I need to run a command with elevated permissions to access the repository at https://github.rommark.dev/admin/Z.AI-Chat-for-Android</require_escalation>'
|
||
_calls_n = _parse_commandcode_text_tool_calls(_esc_text)
|
||
_check("FIX24 require_escalation: parsed", len(_calls_n) == 1, f"got {len(_calls_n)} calls")
|
||
if _calls_n:
|
||
_args_n = json.loads(_calls_n[0].get("arguments", "{}"))
|
||
_check("FIX24 require_escalation: name is exec_command", _calls_n[0].get("name") == "exec_command", f"got {_calls_n[0].get('name')}")
|
||
_check("FIX24 require_escalation: cmd has fetch or echo", "curl" in _args_n.get("cmd", "") or "echo" in _args_n.get("cmd", "") or "Invoke-WebRequest" in _args_n.get("cmd", "") or "Write-Output" in _args_n.get("cmd", ""), f"got {_args_n.get('cmd')!r}")
|
||
|
||
# Pattern N2: bare request_escalation_permission tag (FIX 24b)
|
||
_esc_bare = 'I want to proceed.\n<request_escalation_permission />\nPlease let me continue.'
|
||
_calls_n2 = _parse_commandcode_text_tool_calls(_esc_bare)
|
||
_check("FIX24b bare escalation: parsed", len(_calls_n2) == 1, f"got {len(_calls_n2)} calls")
|
||
if _calls_n2:
|
||
_check("FIX24b bare escalation: name is exec_command", _calls_n2[0].get("name") == "exec_command", f"got {_calls_n2[0].get('name')}")
|
||
|
||
# Pattern O: _build_explore_cmd module-level function (FIX 23/25)
|
||
_cmd_o, _just_o = _build_explore_cmd("https://github.rommark.dev/admin/Z.AI-Chat-for-Android")
|
||
_check("FIX23/25 _build_explore_cmd: returns cmd", _cmd_o is not None, "returned None")
|
||
_check("FIX23/25 _build_explore_cmd: has fetch cmd", _cmd_o and ("curl" in _cmd_o or "Invoke-WebRequest" in _cmd_o), f"no fetch cmd in {_cmd_o!r}")
|
||
_check("FIX23/25 _build_explore_cmd: has api path", _cmd_o and "/api/v1/repos/" in _cmd_o, f"no api path in {_cmd_o!r}")
|
||
|
||
# Pattern O2: _build_explore_cmd with JSON array containing URL
|
||
_cmd_o2, _ = _build_explore_cmd('[{"content": "https://github.rommark.dev/admin/Z.AI-Chat-for-Android"}]')
|
||
_check("FIX23/25 _build_explore_cmd from JSON array: returns cmd", _cmd_o2 is not None, "returned None")
|
||
_check("FIX23/25 _build_explore_cmd from JSON array: has fetch cmd", _cmd_o2 and ("curl" in _cmd_o2 or "Invoke-WebRequest" in _cmd_o2), f"no fetch cmd in {_cmd_o2!r}")
|
||
|
||
print(f"[CC-SELF-TEST] Results: {_counts[0]} passed, {_counts[1]} failed",
|
||
file=sys.stderr)
|
||
if _counts[1]:
|
||
sys.exit(1)
|
||
else:
|
||
print("[CC-SELF-TEST] ALL PASSED — pipeline is healthy", file=sys.stderr)
|
||
sys.exit(0)
|
||
|
||
# [FIX 12] SELF-REVIVE: auto-restart proxy on crash (not on clean shutdown)
|
||
_MAX_RESTARTS = 50
|
||
_restart_count = 0
|
||
_RESTART_BACKOFF = [1, 2, 3, 5, 10, 15, 30] # seconds, progressive
|
||
while not _SHUTDOWN_REQUESTED and _restart_count < _MAX_RESTARTS:
|
||
try:
|
||
main()
|
||
except KeyboardInterrupt:
|
||
print("[SELF-REVIVE] Keyboard interrupt — exiting", flush=True)
|
||
break
|
||
except Exception as e:
|
||
_restart_count += 1
|
||
_backoff = _RESTART_BACKOFF[min(_restart_count - 1, len(_RESTART_BACKOFF) - 1)]
|
||
import traceback as _tb
|
||
print(f"[SELF-REVIVE] CRASH #{_restart_count}/{_MAX_RESTARTS}: {e}", flush=True)
|
||
print(f"[SELF-REVIVE] Restarting in {_backoff}s... (Ctrl+C to exit)", flush=True)
|
||
_tb.print_exc()
|
||
time.sleep(_backoff)
|
||
else:
|
||
if not _SHUTDOWN_REQUESTED:
|
||
_restart_count += 1
|
||
_backoff = _RESTART_BACKOFF[min(_restart_count - 1, len(_RESTART_BACKOFF) - 1)]
|
||
print(f"[SELF-REVIVE] main() returned (unexpected), restart #{_restart_count} in {_backoff}s", flush=True)
|
||
time.sleep(_backoff)
|
||
|
||
if _SHUTDOWN_REQUESTED or _restart_count >= _MAX_RESTARTS:
|
||
print(f"[SELF-REVIVE] Exiting (shutdown={_SHUTDOWN_REQUESTED}, restarts={_restart_count})", flush=True)
|