#!/usr/bin/env python3
"""
translate-proxy.py — Responses API → backend API translation proxy.

Backends:
  openai-compat — any OpenAI-compatible Chat Completions API
  anthropic     — Anthropic Messages API
  command-code  — CommandCode /alpha/generate (Z.AI GLM Coding Plan)

Usage:
  python3 translate-proxy.py --config proxy-config.json
  python3 translate-proxy.py --backend command-code --target-url https://... --api-key sk-...

═══════════════════════════════════════════════════════════════════
COMMANDCODE ADAPTER — FIX HISTORY (2026-05-22)
═══════════════════════════════════════════════════════════════════

This file contains multiple rounds of fixes for the CommandCode adapter.
Each fix addresses a specific failure mode observed in production.
They are documented here for future maintainability.

FIX 1: Content blocks rejected by CC API (root cause of initial 400 errors)
  Symptom: {"error":{"message":"params.messages[i].content expected string, received array"}}
  Cause: cc_input_to_messages emitted tool results as content blocks [{"type":"tool_result",...}]
  Fix: All messages now use string content. Tool results are sent as role="user" messages with plain text.
  Location: cc_input_to_messages() ~line 1085
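
  Sketch of the flattening (illustrative only; tool_result_to_message and the
  bracketed text prefix are assumptions, not the actual cc_input_to_messages() code):

```python
import json


def tool_result_to_message(call_id, output):
    # Hypothetical helper: the backend rejects content arrays, so a tool
    # result becomes one plain-text message with role "user".
    if not isinstance(output, str):
        output = json.dumps(output)
    return {"role": "user", "content": "[tool result " + call_id + "] " + output}


message = tool_result_to_message("call_1", {"exit_code": 0})
```

  The point is only that "content" is always a str, never a list of blocks.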

FIX 2: x-command-code-version header dropped during rewrite
  Symptom: HTTP 403 upgrade_required from CommandCode API
  Cause: _handle_command_code rewrite removed the header line
  Fix: Always send x-command-code-version header with fallback "0.26.8"
  Location: _handle_command_code() header setup block

FIX 3: Stale schema cache with wrong content_type=array
  Symptom: SchemaAdapter used content_type="array" causing content blocks in auto path
  Cause: ErrorAnalyzer learned incorrect schema from error message text
  Fix: Cleared provider-caps.json; added 24h staleness TTL to _load_schema()
  Location: _load_schema(), provider-caps.json
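
  Sketch of a 24h staleness check (illustrative only; load_cached_schema, the
  "learned_at" field, and the cache layout are assumptions, not the real _load_schema()):

```python
import json
import os
import tempfile
import time

SCHEMA_TTL_SECONDS = 24 * 3600  # the 24h window named in FIX 3


def load_cached_schema(path, now=None):
    # Hypothetical helper: return the cached dict, or None when the file is
    # missing, unreadable, or older than the TTL.
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as f:
            cached = json.load(f)
    except (OSError, ValueError):
        return None
    if not isinstance(cached, dict):
        return None
    # "learned_at" is an assumed timestamp field, not a documented one.
    if now - cached.get("learned_at", 0) > SCHEMA_TTL_SECONDS:
        return None
    return cached


# Usage: a fresh entry is returned, a stale one is ignored.
demo_path = os.path.join(tempfile.mkdtemp(), "provider-caps.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump({"content_type": "string", "learned_at": 1000.0}, f)
fresh = load_cached_schema(demo_path, now=1000.0 + 60)
stale = load_cached_schema(demo_path, now=1000.0 + SCHEMA_TTL_SECONDS + 1)
```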

FIX 4: Stream disconnect before completion (client-side "stream disconnected")
  Symptom: Client sees partial SSE then connection close, no response.completed event
  Cause: No try/except around streaming path; exceptions crashed handler mid-stream
  Fix: Wrapped stream_buffered_events in try/except; sends response.completed(status:"failed") on crash
  Location: _handle_command_code() streaming section
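
  Sketch of the guard (illustrative only; guarded_events and the payload shape
  are assumptions, the real code wraps stream_buffered_events):

```python
def guarded_events(events, response_id):
    # Hypothetical wrapper: forward upstream events and, if the source
    # raises mid-stream, emit a terminal response.completed event with
    # status "failed" so the client does not see a bare disconnect.
    try:
        for event in events:
            yield event
    except Exception as exc:
        yield {
            "type": "response.completed",
            "response": {"id": response_id, "status": "failed",
                         "error": {"message": str(exc)}},
        }


def flaky_upstream():
    # Stand-in for a stream that dies after one delta.
    yield {"type": "response.output_text.delta", "delta": "partial"}
    raise ConnectionError("upstream closed")


events = list(guarded_events(flaky_upstream(), "resp_demo"))
```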

FIX 5: Tool calls echoed as text instead of being parsed (THE BIG ONE)
  Symptom: Model generates inline JSON tool calls like {"type":"tool-call","id":"...","name":"exec_command","arguments":"{...}"}
        These appear as raw text in the conversation. The tool is never executed.
  Root cause chain:
    a) cc_input_to_messages sends tool calls as inline JSON text in assistant messages
    b) The CC model echoes back similar JSON in its text-delta response
    c) _parse_commandcode_text_tool_calls only handled XML format (`<tool>`)
    d) Raw JSON tool calls passed through as plain text → client shows them unparsed
  Fix: Added _extract_raw_json_tool_calls() with field-level regex extraction.
      Handles BOTH malformed (unescaped inner quotes) AND properly escaped JSON.
      Three-tier parse: direct json.loads → replace backslash-escaped quotes (\") with plain quotes → unicode_escape decode.
  Location: _extract_args(), _extract_field(), _extract_raw_json_tool_calls()

FIX 6: Double-wrapped arguments (nested {"cmd": "{\"cmd\": \"curl...\"}"})
  Symptom: args={"cmd": "{\\\"cmd\\\": \\\"curl...\\\"}"}
        Tool executor receives cmd = the literal string '{"cmd": "curl..."}', not the actual curl command.
  Root cause: When model generates properly escaped JSON ("arguments": "{\\"cmd\\": \\"...\\"}"),
         _extract_args naive brace-counting returns raw text with escaped quotes.
         json.loads(raw) fails on \\ at structural level.
         Fallback sets args["cmd"] = raw_string → double-wrapped.
  Fix: _extract_args now tries 3 parse strategies before returning.
         Also normalizes sandbox_permissions from parsed args dict (not raw snippet).
  Location: _extract_args() three-tier parser, sandbox_permissions normalization
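
  Sketch of the three-tier order from FIX 5 and FIX 6 (illustrative only;
  parse_args_three_tier is a hypothetical stand-in for _extract_args, and the
  backslash is spelled chr(92) so the sketch contains no escape sequences):

```python
import json

BACKSLASH = chr(92)


def parse_args_three_tier(raw):
    # Try each transform in order and return the first dict that parses.
    strategies = (
        lambda s: s,                                    # tier 1: as-is
        lambda s: s.replace(BACKSLASH + '"', '"'),      # tier 2: unescape quotes
        lambda s: s.encode("utf-8").decode("unicode_escape"),  # tier 3
    )
    for transform in strategies:
        try:
            value = json.loads(transform(raw))
        except ValueError:  # covers JSON errors and bad escape sequences
            continue
        if isinstance(value, dict):
            return value
    return None


plain = '{"cmd": "ls"}'
escaped = plain.replace('"', BACKSLASH + '"')          # quotes backslash-escaped
unicode_form = plain.replace('"', BACKSLASH + "u0022")  # quotes as u0022 escapes
```

  Returning a parsed dict (or None) keeps the caller from storing the raw
  string under "cmd", which is the double-wrapping described in FIX 6.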

FIX 7: _extract_field can't read values starting with \"
  Symptom: sandbox_permissions="allow_all" passes through unnormalized because
        _extract_field sees val_start=\\ (backslash) which != \" or { → returns None
  Fix: Skip leading backslash before checking for " or { value type.
  Location: _extract_field() leading-backslash skip

FIX 8: Adaptive probing caused format mismatch (REVERTED)
  Symptom: Probe system discovered OpenAI tool_calls+role=tool format but CC API couldn't
        process multi-turn tool loops correctly with it.
  Fix: Removed probe system entirely. Use conservative format only:
        - Inline JSON text for tool calls (cc_input_to_messages default)
        - role="user" for all tool results
        - ErrorAnalyzer learning on retries (not proactive probes)
  Location: Reverted to cc_input_to_messages(), removed _build_cc_messages + _probe_cc_format

FIX 21: DSML parser silently drops tool calls when model uses name="cmd" (THE HALT BUG)
  Symptom: Codex CLI stops mid-task. Model generates valid DSML exec_command with
        <｜｜DSML｜｜parameter name="cmd" string="true">curl ...
        Parser returns parsed_tool_calls=0. Client sees text output but no tool to execute.
        CLI has nothing to do and halts.
  Root cause: Line 1798 had `if key == "command":` — only matching parameter name="command".
        The actual tool schema defines the parameter as "cmd" (see exec_command schema).
        When DeepSeek generates name="cmd", the key "cmd" != "command", so cmd stays None,
        and line 1825-1826 `if not cmd: continue` silently skips the entire tool call.
        The XML parser (line 2205) already handled both: `params.get("command") or params.get("cmd")`
        but the DSML parser did not.
  Fix: Changed to `if key in ("command", "cmd"):` in the DSML parameter loop.
  Test: Pattern L self-test verifies DSML with name="cmd" is parsed correctly.
  Location: _parse_commandcode_text_tool_calls() DSML parameter loop, self-test Pattern L
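
  Sketch of the key match (illustrative only; command_from_params is a
  hypothetical helper and assumes the (name, value) pairs were already
  extracted from the DSML parameter tags):

```python
COMMAND_KEYS = ("command", "cmd")  # both spellings named in FIX 21


def command_from_params(params):
    # Return the command text from either accepted parameter name, or None
    # when neither is present (the caller then skips the tool call).
    cmd = None
    for key, value in params:
        if key in COMMAND_KEYS:
            cmd = value
    return cmd


from_cmd = command_from_params([("cmd", "curl https://example.com")])
from_command = command_from_params([("command", "ls -la")])
```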

════════════════════════════════════════════════════════════════════
INTELLIGENCE ROUTING — Self-Healing Parser System (v3.7.0)
════════════════════════════════════════════════════════════════════

Problem: The CommandCode model produces output in unpredictable formats
that change between sessions and models. When the multi-format parser chain
(DSML → <bash> → <explore_agent> → <tool_call type=...> → XML → raw JSON →
fallback regex) returns empty, the Codex agent loop has zero tool calls and
STALLS — the user sees the model "thinking" but nothing happens.

Intelligence Routing is a three-layer self-healing system:

LAYER 1 — Deep URL Extraction (FIX 23)
  The <explore_agent> handler was failing because URLs were hidden inside
  nested JSON: messages: [{"content": "https://..."}]. The regex couldn't
  find them because it excluded the " character that terminates JSON values.

  Solution: _build_explore_cmd() is now a module-level function (was a
  closure). After the initial regex fails, it tries json.loads() on the
  text, iterates list items, and extracts the "content" field to find URLs.
  Also added " to the regex exclusion set and rstrip characters.
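
  Sketch of the regex-then-JSON order (illustrative only; find_urls_deep, the
  URL pattern, and the escaped-slash input are assumptions, not the actual
  _build_explore_cmd() code):

```python
import json
import re

# Printable ASCII except space, double quote, < and >, so a match stops at
# the quote that closes a JSON string value.
URL_RE = re.compile('https?://[!#-;=?-~]+')
TRAILING = '.,;:)]}"'


def find_urls_deep(text):
    # Pass 1: plain regex over the raw text.
    urls = [u.rstrip(TRAILING) for u in URL_RE.findall(text)]
    if urls:
        return urls
    # Pass 2: decode JSON and scan each list item's "content" string.
    try:
        data = json.loads(text)
    except ValueError:
        return []
    if isinstance(data, dict):
        data = data.get("messages", [])
    if not isinstance(data, list):
        return []
    for item in data:
        content = item.get("content") if isinstance(item, dict) else None
        if isinstance(content, str):
            urls.extend(u.rstrip(TRAILING) for u in URL_RE.findall(content))
    return urls


# JSON may escape slashes, which hides the URL from pass 1.
nested = json.dumps([{"content": "https://example.com/docs"}]).replace("/", chr(92) + "/")
found = find_urls_deep(nested)
```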

LAYER 2 — Escalation Block Handling (FIX 24)
  The model produces <require_escalation> and <request_escalation_permission>
  blocks when it wants elevated permissions. The CC adapter doesn't support
  escalation — these blocks were silently dropped, causing parsed_tool_calls=0.

  Solution: Two handlers:
    - FIX 24a: Closed-tag blocks — extracts URL if present, runs explore cmd;
      otherwise echoes auto-proceed message.
    - FIX 24b: Bare/unclosed tags (<require_escalation />) — auto-proceeds.

LAYER 3 — Intent-Based Command Synthesis (FIX 25, THE CORE)
  When ALL parsers return empty and text has content, the system plays
  detective using 5 heuristics in priority order:

    1. URL detected in text → curl to fetch it
    2. File path reference → cat or ls that file
    3. Shell command in backticks/quotes → extract and run
    4. "explore"/"fetch"/"investigate" intent + last user URL → explore cmd
    5. "I need to"/"let me"/"please" intent text → echo diagnostic

  This ensures the agent loop ALWAYS has a tool call to execute, even when
  the model's output format is completely unrecognized. The loop never stalls.
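
  Sketch of the priority order (illustrative only; synthesize_command, the
  patterns, and the emitted commands are simplified assumptions, not the
  logic in cc_stream_to_sse()):

```python
import re
import shlex

URL_RE = re.compile('https?://[!#-;=?-~]+')
PATH_RE = re.compile('(?<![A-Za-z0-9_.:/-])(?:~|[.]{1,2})?(?:/[A-Za-z0-9_.-]+)+')
BACKTICK_RE = re.compile('`([^`]+)`')
EXPLORE_WORDS = ("explore", "fetch", "investigate")
INTENT_PHRASES = ("i need to", "let me", "please")


def synthesize_command(text, last_user_urls=()):
    # Return a shell command for the first heuristic that matches, or None.
    url = URL_RE.search(text)
    if url:  # 1. URL in text
        return "curl -sL " + shlex.quote(url.group(0).rstrip(".,;:)"))
    path = PATH_RE.search(text)
    if path:  # 2. file path reference
        return "cat " + shlex.quote(path.group(0).rstrip("."))
    snippet = BACKTICK_RE.search(text)
    if snippet:  # 3. command in backticks
        return snippet.group(1).strip()
    lowered = text.lower()
    if last_user_urls and any(w in lowered for w in EXPLORE_WORDS):
        return "curl -sL " + shlex.quote(last_user_urls[-1])  # 4. intent + URL
    if any(p in lowered for p in INTENT_PHRASES):  # 5. diagnostic echo
        return "echo " + shlex.quote("no tool call parsed: " + text[:80])
    return None


by_url = synthesize_command("Docs are at https://example.com/api.")
by_path = synthesize_command("I should read /etc/hosts first")
by_backtick = synthesize_command("Run `git status` next")
by_intent = synthesize_command("Let me investigate that page", ["https://example.com/x"])
```

  Because the checks run in order, a path inside backticks is handled by
  heuristic 2 before heuristic 3 ever sees it.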

Architecture:
  _parse_commandcode_text_tool_calls() — LAYER 1 + LAYER 2
  cc_stream_to_sse() — LAYER 3 (runs after parser chain + fallback)

  The _last_user_urls deque (maxlen=20) tracks URLs from user messages
  across the session, giving Layer 3 heuristic 4 a URL to work with.
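
  Sketch of the bounded history (illustrative only; remember_url and the
  duplicate check are assumptions about how the deque is fed):

```python
import collections

recent_urls = collections.deque(maxlen=20)  # same bound as _last_user_urls


def remember_url(url):
    # Hypothetical helper: skip duplicates and keep the newest URL last.
    # Once 20 entries exist, appending drops the oldest automatically.
    if url not in recent_urls:
        recent_urls.append(url)


for i in range(25):
    remember_url("https://example.com/page/" + str(i))
newest = recent_urls[-1]
```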

  Self-tests: 54 patterns (was 41) covering all three layers.

════════════════════════════════════════════════════════════════════
"""

import json, http.server, socketserver, urllib.request, urllib.parse, urllib.error, re
import time, uuid, os, sys, argparse, threading, socket, collections, contextlib, signal
import secrets, string
import dataclasses
import http.client
import selectors
import tempfile

_IS_WINDOWS = sys.platform == "win32"

# ═══════════════════════════════════════════════════════════════════
# Config
# ═══════════════════════════════════════════════════════════════════

DEFAULT_MODELS = {
    "openai-compat": [
        {"id": "gpt-4o-mini", "object": "model", "created": 1700000000, "owned_by": "custom"},
    ],
    "anthropic": [
        {"id": "claude-sonnet-4-20250514", "object": "model", "created": 1700000000, "owned_by": "anthropic"},
    ],
    "codebuff": [
        {"id": "deepseek/deepseek-v4-pro", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
        {"id": "deepseek/deepseek-v4-flash", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
        {"id": "moonshotai/kimi-k2.6", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
        {"id": "minimax/minimax-m2.7", "object": "model", "created": 1700000000, "owned_by": "codebuff"},
    ],
    "auto": [
        {"id": "default-model", "object": "model", "created": 1700000000, "owned_by": "auto"},
    ],
}

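def load_config():
    """Build the runtime config dict.

    Precedence per key: CLI flag, then config file, then environment
    variable, then built-in default. The model list comes from the config,
    then --models-file, then DEFAULT_MODELS for the chosen backend.
    """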
    p = argparse.ArgumentParser(description="Responses API translation proxy")
    p.add_argument("--config", help="JSON config file path")
    p.add_argument("--port", type=int, default=None)
    p.add_argument("--backend", default=None, choices=["openai-compat", "anthropic", "command-code", "codebuff", "freebuff", "auto"])
    p.add_argument("--target-url", default=None)
    p.add_argument("--api-key", default=None)
    p.add_argument("--models-file", default=None, help="JSON file with model list array")
    args = p.parse_args()

    cfg = {}
    if args.config:
        with open(args.config) as f:
            cfg = json.load(f)

    for ck, ak in [("port", "port"), ("backend_type", "backend"),
                    ("target_url", "target_url"), ("api_key", "api_key")]:
        v = getattr(args, ak, None)
        if v is not None:
            cfg[ck] = v

    env_map = {
        "port": ("PROXY_PORT", "ZAI_PROXY_PORT", int),
        "backend_type": ("PROXY_BACKEND", None, str),
        "target_url": ("PROXY_TARGET_URL", "ZAI_BASE_URL", str),
        "api_key": ("PROXY_API_KEY", "ZAI_API_KEY", str),
    }
    for ck, (ev1, ev2, conv) in env_map.items():
        if ck not in cfg:
            v = os.environ.get(ev1) or (os.environ.get(ev2) if ev2 else None)
            if v:
                cfg[ck] = conv(v) if conv == int else v

    cfg.setdefault("port", 8080)
    cfg.setdefault("backend_type", "openai-compat")
    cfg.setdefault("target_url", "http://localhost:11434/v1")
    cfg.setdefault("api_key", "")

    models = cfg.get("models", [])
    if not models and args.models_file:
        with open(args.models_file) as f:
            models = json.load(f)
    if not models:
        models = DEFAULT_MODELS.get(cfg["backend_type"], [])
    cfg["models"] = models

    return cfg

CONFIG = None
PORT = 8080
BACKEND = "openai-compat"
TARGET_URL = ""
API_KEY = ""
OAUTH_PROVIDER = ""
MODELS = []
CC_VERSION = ""
REASONING_ENABLED = True
REASONING_EFFORT = "medium"
FORCE_MODEL = ""
BGP_ROUTES = []
PROMPT_ENHANCER = False
PROMPT_ENHANCER_MODE = "offline"
PROMPT_ENHANCER_MODEL = ""
PROMPT_ENHANCER_URL = ""
PROMPT_ENHANCER_KEY = ""
SERVER = None

if _IS_WINDOWS:
    _LOG_DIR = os.path.join(os.environ.get("LOCALAPPDATA", os.path.expanduser("~")), "codex-proxy")
else:
    _LOG_DIR = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy")
os.makedirs(_LOG_DIR, exist_ok=True)
_REQUESTS_DIR = os.path.join(_LOG_DIR, "requests")
os.makedirs(_REQUESTS_DIR, exist_ok=True)
try:
    for _f in os.listdir(_REQUESTS_DIR):
        if _f.endswith(".tmp"):
            os.remove(os.path.join(_REQUESTS_DIR, _f))
except Exception:
    pass
_stats_path = os.path.join(_LOG_DIR, "usage-stats.json")
_provider_caps_path = os.path.join(_LOG_DIR, "provider-caps.json")
_stats_lock = threading.Lock()
_stats_pending = []
_stats_flush_timer = None
_STATS_FLUSH_INTERVAL = 5.0
_STATS = {}

try:
    _LOG_FILE = open(os.path.join(_LOG_DIR, "proxy.log"), "a", encoding="utf-8")
except Exception:
    _LOG_FILE = None

_response_store = collections.OrderedDict()
_response_store_lock = threading.Lock()
_MAX_STORED = 50
_RESPONSE_TTL = 600

_fb_reasoning_store = collections.OrderedDict()
_fb_reasoning_store_lock = threading.Lock()

_deepseek_reasoning_store = {}
_deepseek_reasoning_lock = threading.Lock()
_MAX_DS_STORED = 100

_last_reasoning_store = {}
_last_reasoning_lock = threading.Lock()

_crof_lock = threading.Lock()
_provider_caps_lock = threading.Lock()
_provider_caps = None

_shutdown_requested = False
_active_connections = 0
_active_connections_lock = threading.Lock()
_active_requests = {}
_active_requests_lock = threading.Lock()

_pool = uuid.uuid4().hex[:8]
_antigravity_version = "1.18.3"
_antigravity_version_checked = 0
_antigravity_version_lock = threading.Lock()
_last_user_urls = collections.deque(maxlen=20)

_conn_pool_lock = threading.Lock()
_conn_pool = {}

_STREAM_IDLE_TIMEOUT = 300

_CODEBUFF_AUTH_URL = "https://www.codebuff.com"
_CODEBUFF_API_URL = "https://www.codebuff.com"
_CODEBUFF_AGENT_MAP = {
    "deepseek/deepseek-v4-pro": "base2-free-deepseek",
    "deepseek/deepseek-v4-flash": "base2-free-deepseek-flash",
    "moonshotai/kimi-k2.6": "base2-free-kimi",
    "minimax/minimax-m2.7": "base2-free",
}
if _IS_WINDOWS:
    _CODEBUFF_CREDS_PATH = os.path.join(os.environ.get("APPDATA", os.path.expanduser("~")), "manicode", "credentials.json")
else:
    _CODEBUFF_CREDS_PATH = os.path.join(os.path.expanduser("~"), ".config", "manicode", "credentials.json")
_codebuff_token_cache = {"token": None, "checked": 0}
_codebuff_session_cache = {"instance_id": None, "expires": 0, "model": None}
_codebuff_token_lock = threading.Lock()

def _get_codebuff_token():
    with _codebuff_token_lock:
        if _codebuff_token_cache["token"] and _codebuff_token_cache["checked"] > time.time() - 300:
            return _codebuff_token_cache["token"]
    try:
        with open(_CODEBUFF_CREDS_PATH) as f:
            creds = json.load(f)
        default_account = creds.get("default", {})
        token = default_account.get("authToken") or creds.get("apiKey") or ""
        with _codebuff_token_lock:
            _codebuff_token_cache["token"] = token
            _codebuff_token_cache["checked"] = time.time()
        return token
    except Exception as e:
        print(f"[codebuff] no credentials at {_CODEBUFF_CREDS_PATH}: {e}", file=sys.stderr)
        return ""

def _codebuff_get_session(token, model):
    with _codebuff_token_lock:
        sc = _codebuff_session_cache
        if sc["instance_id"] and sc["expires"] > time.time() + 60 and sc["model"] == model:
            return sc["instance_id"]
    try:
        url = f"{_CODEBUFF_API_URL}/api/v1/freebuff/session"
        body = json.dumps({}).encode()
        req = urllib.request.Request(url, data=body, headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
            "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
            "x-codebuff-model": model,
        })
        try:
            resp = urllib.request.urlopen(req, timeout=15)
        except urllib.error.HTTPError as e:
            err_body = e.read().decode()[:1000]
            if e.code == 429:
                retry_s = 120
                user_msg = ""
                try:
                    err_data = json.loads(err_body)
                    retry_ms = err_data.get("retryAfterMs", 0)
                    if retry_ms:
                        retry_s = retry_ms / 1000
                    user_msg = err_data.get("message", err_data.get("error", ""))
                    if isinstance(user_msg, dict):
                        user_msg = user_msg.get("message", "")
                except Exception:
                    pass
                if not user_msg:
                    user_msg = _sanitize_err_body(err_body)
                raise RateLimitError(retry_s, user_msg)
            print(f"[codebuff] session HTTP {e.code}: {err_body[:200]}", file=sys.stderr)
            return None
        data = json.loads(resp.read())
        instance_id = data.get("instanceId", data.get("data", {}).get("instance_id", ""))
        expires_at = data.get("remainingMs", 0)
        if instance_id:
            with _codebuff_token_lock:
                _codebuff_session_cache["instance_id"] = instance_id
                _codebuff_session_cache["expires"] = time.time() + min(expires_at / 1000, 3600)
                _codebuff_session_cache["model"] = model
            print(f"[codebuff] session active, instance={instance_id[:8]}...", file=sys.stderr)
            return instance_id
        return None
    except RateLimitError:
        raise
    except Exception as e:
        print(f"[codebuff] session failed: {e}", file=sys.stderr)
        return None

def _codebuff_start_run(token, agent_id):
    url = f"{_CODEBUFF_API_URL}/api/v1/agent-runs"
    body = json.dumps({"action": "START", "agentId": agent_id, "ancestorRunIds": []}).encode()
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
    })
    try:
        resp = urllib.request.urlopen(req, timeout=15)
        data = json.loads(resp.read())
        run_id = data.get("runId")
        print(f"[codebuff] started run {run_id} for agent {agent_id}", file=sys.stderr)
        return run_id, None
    except urllib.error.HTTPError as e:
        err = e.read().decode()[:500]
        print(f"[codebuff] start run failed: HTTP {e.code}: {err}", file=sys.stderr)
        if e.code == 429:
            retry_s = 120
            try:
                err_data = json.loads(err)
                retry_ms = err_data.get("retryAfterMs", 0)
                if retry_ms:
                    retry_s = retry_ms / 1000
            except Exception:
                pass
            return None, ("rate_limit_error", 429, retry_s, _sanitize_err_body(err))
        return None, ("upstream_error", e.code, 0, _sanitize_err_body(err))
    except Exception as e:
        print(f"[codebuff] start run error: {e}", file=sys.stderr)
        return None, ("proxy_error", 502, 0, str(e))

def _codebuff_finish_run(token, run_id, status="completed"):
    url = f"{_CODEBUFF_API_URL}/api/v1/agent-runs"
    body = json.dumps({"action": "FINISH", "runId": run_id, "status": status,
                       "totalSteps": 1, "directCredits": 0, "totalCredits": 0}).encode()
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
    })
    try:
        urllib.request.urlopen(req, timeout=10)
    except Exception as e:
        print(f"[codebuff] finish run {run_id} error: {e}", file=sys.stderr)

# ═══════════════════════════════════════════════════════════════════
# Multi-account rotation system
# ═══════════════════════════════════════════════════════════════════

class RateLimitError(Exception):
    def __init__(self, retry_seconds, message=""):
        self.retry_seconds = retry_seconds
        self.message = message
        super().__init__(f"rate-limited for {retry_seconds:.0f}s: {message}")

class AccountPool:
    """Manages multiple accounts for a provider. Rotates on rate-limit (429/426)."""

    def __init__(self, provider_name):
        self.provider_name = provider_name
        self._lock = threading.Lock()
        self._accounts = []
        self._rate_limited = {}
        self._current_idx = 0
        self._loaded_at = 0

    def load_accounts(self, force=False):
        with self._lock:
            if not force and self._accounts and time.time() - self._loaded_at < 60:
                return len(self._accounts)
        accounts = self._do_load()
        with self._lock:
            if accounts:
                self._accounts = accounts
                self._loaded_at = time.time()
                for a in accounts:
                    key = a.get("id", a.get("email", ""))
                    if key not in self._rate_limited:
                        self._rate_limited[key] = 0
        return len(self._accounts) if accounts else 0

    def _do_load(self):
        return []

    def get(self):
        """Return the best available account dict, or None."""
        self.load_accounts()
        with self._lock:
            if not self._accounts:
                return None
            now = time.time()
            n = len(self._accounts)
            for attempt in range(n):
                idx = (self._current_idx + attempt) % n
                acct = self._accounts[idx]
                key = acct.get("id", acct.get("email", ""))
                if self._rate_limited.get(key, 0) < now:
                    self._current_idx = idx
                    return acct
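            # Every account is rate-limited: report the earliest reset among the
            # loaded accounts, then fall back to the current one.
            wait = min(self._rate_limited.get(a.get("id", a.get("email", "")), 0)
                       for a in self._accounts) - now
            print(f"[{self.provider_name}] all accounts rate-limited, earliest free in {wait:.0f}s", file=sys.stderr)
            # Guard the index in case a reload shrank the account list.
            return self._accounts[self._current_idx % n]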

    def mark_rate_limited(self, account, duration=120):
        key = account.get("id", account.get("email", ""))
        with self._lock:
            self._rate_limited[key] = time.time() + duration
            idx = None
            for i, a in enumerate(self._accounts):
                if a.get("id", a.get("email", "")) == key:
                    idx = i
                    break
            if idx is not None:
                self._current_idx = (idx + 1) % len(self._accounts)
        print(f"[{self.provider_name}] account {key} rate-limited for {duration}s, rotating to next", file=sys.stderr)

    def advance(self):
        with self._lock:
            if self._accounts:
                self._current_idx = (self._current_idx + 1) % len(self._accounts)

    def status(self):
        with self._lock:
            now = time.time()
            result = []
            for a in self._accounts:
                key = a.get("id", a.get("email", ""))
                rl_until = self._rate_limited.get(key, 0)
                info = {"id": key, "email": a.get("email", ""), "rate_limited": rl_until > now}
                if rl_until > now:
                    info["rate_limited_until"] = rl_until
                    info["resets_in"] = int(rl_until - now)
                result.append(info)
            return result

class CodebuffAccountPool(AccountPool):
    def _do_load(self):
        if not os.path.exists(_CODEBUFF_CREDS_PATH):
            return None
        try:
            with open(_CODEBUFF_CREDS_PATH) as f:
                creds = json.load(f)
        except Exception:
            return None
        accounts = []
        if "accounts" in creds and isinstance(creds["accounts"], list):
            for i, ac in enumerate(creds["accounts"]):
                token = ac.get("authToken") or ac.get("apiKey") or ""
                if token:
                    acct = {"id": ac.get("email") or ac.get("id") or f"account-{i}", "token": token, "email": ac.get("email", "")}
                    accounts.append(acct)
        default = creds.get("default", {})
        default_token = default.get("authToken") or creds.get("apiKey") or ""
        if default_token:
            default_id = default.get("email") or default.get("id") or "default"
            if not any(a["id"] == default_id for a in accounts):
                accounts.insert(0, {"id": default_id, "token": default_token, "email": default.get("email", "")})
        return accounts if accounts else None

class GoogleAccountPool(AccountPool):
    def __init__(self, variant):
        super().__init__(f"google-{variant}")
        self.variant = variant

    def _do_load(self):
        cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy")
        accounts = []
        primary = f"google-{self.variant}-oauth-token.json"
        primary_path = os.path.join(cache_dir, primary)
        if os.path.exists(primary_path):
            try:
                with open(primary_path) as f:
                    tok = json.load(f)
                token = tok.get("access_token", "")
                if token:
                    accounts.append({"id": f"google-{self.variant}-primary", "token": token, "email": tok.get("email", ""), "_token_data": tok, "_path": primary_path})
            except Exception:
                pass
        idx = 1
        while True:
            extra = f"google-{self.variant}-oauth-token-{idx}.json"
            extra_path = os.path.join(cache_dir, extra)
            if not os.path.exists(extra_path):
                break
            try:
                with open(extra_path) as f:
                    tok = json.load(f)
                token = tok.get("access_token", "")
                if token:
                    accounts.append({"id": f"google-{self.variant}-{idx}", "token": token, "email": tok.get("email", ""), "_token_data": tok, "_path": extra_path})
            except Exception:
                pass
            idx += 1
        return accounts if accounts else None

class APIKeyPool(AccountPool):
    """Rotates through comma-separated API keys."""

    def __init__(self, provider_name, keys_str):
        super().__init__(provider_name)
        self._raw_keys = [k.strip() for k in keys_str.split(",") if k.strip()]
        self._accounts = [{"id": f"key-{i}", "token": k, "email": f"key-{i}"} for i, k in enumerate(self._raw_keys)]
        for a in self._accounts:
            self._rate_limited[a["id"]] = 0
        self._loaded_at = time.time()

    def load_accounts(self, force=False):
        return len(self._accounts)

_cb_pool = CodebuffAccountPool("codebuff")
_google_antigravity_pool = GoogleAccountPool("antigravity")
_google_cli_pool = GoogleAccountPool("cli")

def _get_codebuff_account():
    """Return (token, account_dict) for best available codebuff account."""
    _cb_pool.load_accounts()
    acct = _cb_pool.get()
    if not acct:
        return "", None
    return acct["token"], acct

def _get_google_account(oauth_provider):
    """Return (access_token, account_dict) for best available Google account."""
    pool = _google_antigravity_pool if oauth_provider == "google-antigravity" else _google_cli_pool
    pool.load_accounts()
    acct = pool.get()
    if not acct:
        return None, None
    token_data = acct.get("_token_data", {})
    token_path = acct.get("_path", "")
    if token_data and token_path:
        refreshed = _refresh_google_token(token_data, token_path)
        return refreshed, acct
    return acct.get("token", ""), acct

def _refresh_google_token(token_data, token_path):
    if token_data.get("expires_at", 0) > time.time() + 60:
        return token_data.get("access_token", "")
    client_id = token_data.get("client_id", "")
    client_secret = token_data.get("client_secret", "")
    refresh_token = token_data.get("refresh_token", "")
    if not all([client_id, client_secret, refresh_token]):
        return token_data.get("access_token", "")
    print("[oauth] refreshing Google access token...", file=sys.stderr)
    try:
        data = urllib.parse.urlencode({
            "client_id": client_id, "client_secret": client_secret,
            "refresh_token": refresh_token, "grant_type": "refresh_token",
        }).encode()
        req = urllib.request.Request("https://oauth2.googleapis.com/token", data=data,
                                      headers={"Content-Type": "application/x-www-form-urlencoded"})
        resp = urllib.request.urlopen(req, timeout=30)
        new_tokens = json.loads(resp.read())
        token_data["access_token"] = new_tokens.get("access_token", token_data.get("access_token"))
        token_data["expires_at"] = time.time() + new_tokens.get("expires_in", 3600)
        with open(token_path, "w", encoding="utf-8") as f:
            json.dump(token_data, f, indent=2)
        print("[oauth] token refreshed OK", file=sys.stderr)
        return token_data["access_token"]
    except Exception as e:
        print(f"[oauth] refresh failed: {e}", file=sys.stderr)
        return token_data.get("access_token", "")

# ═══════════════════════════════════════════════════════════════════
# Gemini 3 thought signature preservation
# ═══════════════════════════════════════════════════════════════════

_gemini_sig_store = {}
_gemini_sig_lock = threading.Lock()

def _gemini_store_sig(key, signature):
    if not key or not signature:
        return
    with _gemini_sig_lock:
        _gemini_sig_store[key] = {"sig": signature, "ts": time.time()}

def _gemini_get_sig(key):
    with _gemini_sig_lock:
        item = _gemini_sig_store.get(key)
    return item["sig"] if item else None

def _extract_gemini_sig(part):
    if not isinstance(part, dict):
        return None
    return part.get("thoughtSignature") or part.get("thought_signature") or part.get("signature")

def _gemini_reattach_sigs(contents):
    for content in contents:
        for part in content.get("parts", []):
            if not isinstance(part, dict):
                continue
            if "thoughtSignature" in part:
                continue
            if "functionCall" in part:
                fc = part["functionCall"]
                cid = fc.get("id") or fc.get("name")
                if cid:
                    sig = _gemini_get_sig(f"fc:{cid}")
                    if sig:
                        part["thoughtSignature"] = sig
            if "text" in part and content.get("role") == "model":
                turn_key = content.get("_proxy_turn_key")
                if turn_key:
                    sig = _gemini_get_sig(f"turn:{turn_key}")
                    if sig:
                        part["thoughtSignature"] = sig
    return contents

# Gemini follow-through guardrail
_GEMINI_AGENT_GUARDRAIL = (
    "You are running inside Codex as an autonomous coding agent. "
    "When the user asks for a change to existing files, do not merely describe the previous work or summarize. "
    "You must inspect the existing files, apply edits with tools, and verify the result. "
    "If a file path is known from prior context, reuse it. "
    "If unsure, list files first. "
    "After tool results, continue until the requested change is actually implemented. "
    "Never answer only with a plan such as 'I will start by...' or 'I am going to...'. "
    "Always emit the actual tool call in the same response."
)

_LOG_FILE_LOCK = threading.Lock()

def _fetch_antigravity_version():
    cache_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", "antigravity-version.json")
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if cached.get("version") and cached.get("checked_at", 0) > time.time() - 6 * 3600:
            return cached["version"]
    except Exception:
        pass
    urls = [
        ("https://antigravity-auto-updater-974169037036.us-central1.run.app", None),
        ("https://antigravity.google/changelog", 5000),
    ]
    for url, limit in urls:
        try:
            req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
            resp = urllib.request.urlopen(req, timeout=5)
            text = resp.read().decode(errors="replace")
            if limit:
                text = text[:limit]
            m = re.search(r"\d+\.\d+\.\d+", text)
            if m:
                version = m.group(0)
                try:
                    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
                    with open(cache_path, "w", encoding="utf-8") as f:
                        json.dump({"version": version, "checked_at": time.time()}, f)
                except Exception:
                    pass
                return version
        except Exception:
            pass
    return _antigravity_version

def _ensure_antigravity_version():
    global _antigravity_version, _antigravity_version_checked
    if time.time() - _antigravity_version_checked < 6 * 3600:
        return _antigravity_version
    with _antigravity_version_lock:
        if time.time() - _antigravity_version_checked < 6 * 3600:
            return _antigravity_version
        _antigravity_version = _fetch_antigravity_version()
        _antigravity_version_checked = time.time()
        return _antigravity_version

def _init_runtime():
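    global CONFIG, PORT, BACKEND, TARGET_URL, API_KEY, OAUTH_PROVIDER, _antigravity_version
    global MODELS, CC_VERSION, REASONING_ENABLED, REASONING_EFFORT, BGP_ROUTES
    global _api_key_pool, PROMPT_ENHANCER
    # These are assigned below; without a global declaration they would stay function-local.
    global FORCE_MODEL, PROMPT_ENHANCER_MODE, PROMPT_ENHANCER_MODEL
    global PROMPT_ENHANCER_URL, PROMPT_ENHANCER_KEY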

    CONFIG = load_config()
    PORT = CONFIG["port"]
    BACKEND = CONFIG["backend_type"]
    TARGET_URL = CONFIG["target_url"].rstrip("/")
    API_KEY = CONFIG["api_key"]
    OAUTH_PROVIDER = CONFIG.get("oauth_provider") or ""
    MODELS = CONFIG["models"]
    CC_VERSION = CONFIG.get("cc_version", "")
    REASONING_ENABLED = CONFIG.get("reasoning_enabled", True)
    REASONING_EFFORT = CONFIG.get("reasoning_effort", "medium")
    FORCE_MODEL = (CONFIG.get("force_model") or "").strip()
    PROMPT_ENHANCER = CONFIG.get("prompt_enhancer", False)
    PROMPT_ENHANCER_MODE = CONFIG.get("prompt_enhancer_mode", "offline")
    PROMPT_ENHANCER_MODEL = CONFIG.get("prompt_enhancer_model", "")
    PROMPT_ENHANCER_URL = CONFIG.get("prompt_enhancer_url", "")
    PROMPT_ENHANCER_KEY = CONFIG.get("prompt_enhancer_key", "")
    BGP_ROUTES = CONFIG.get("bgp_routes", [])
    _api_key_pool = None
    if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
        _api_key_pool = APIKeyPool(BACKEND, API_KEY)
        print(f"[multi-account] API key pool: {len(_api_key_pool._accounts)} keys for {BACKEND}", file=sys.stderr)
    if OAUTH_PROVIDER == "google-antigravity":
        _antigravity_version = _ensure_antigravity_version()
        print(f"[antigravity] version={_antigravity_version}", file=sys.stderr)

    bgp_models = []
    for _r in BGP_ROUTES:
        for _m in _r.get("models", [{"id": _r.get("model", "unknown")}]):
            mid = _m.get("id", _m) if isinstance(_m, dict) else _m
            if mid not in bgp_models:
                bgp_models.append(mid)
    if BGP_ROUTES and not MODELS:
        MODELS = [{"id": m, "object": "model", "created": 1700000000, "owned_by": "bgp"} for m in bgp_models]
        CONFIG["models"] = MODELS

    if (BACKEND or "").startswith("gemini-oauth") and (OAUTH_PROVIDER or "").startswith("google"):
        token_name = "google-antigravity-oauth-token.json" if OAUTH_PROVIDER == "google-antigravity" else "google-cli-oauth-token.json"
        token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
        _preemptive_refresh_token(token_path)
        try:
            with open(token_path) as _tf:
                _td = json.load(_tf)
            _discovered = [] if OAUTH_PROVIDER == "google-antigravity" else _td.get("available_models", [])
            if _discovered:
                _seen = []
                for _m in _discovered:
                    if _m not in _seen:
                        _seen.append(_m)
                MODELS = [{"id": m, "object": "model", "created": 1700000000, "owned_by": "gemini-oauth"} for m in _seen]
                CONFIG["models"] = MODELS
                print(f"[gemini-oauth] loaded {len(_seen)} discovered models: {_seen}", file=sys.stderr)
        except Exception:
            pass

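def _preemptive_refresh_token(token_path):
    """Log a warning when the cached OAuth token is within 5 minutes of expiry.

    This function only reports; it does not contact the token endpoint.
    """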
    try:
        with open(token_path) as f:
            td = json.load(f)
        expires_at = td.get("expires_at", 0)
        if expires_at and time.time() > expires_at - 300:
            print(f"[oauth] preemptive refresh: token expires in {int(expires_at - time.time())}s", file=sys.stderr)
    except Exception:
        pass

def _pooled_urlopen(url, data=None, headers=None, timeout=180):
    parsed = urllib.parse.urlparse(url)
    host = parsed.hostname
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    pool_key = f"{parsed.scheme}://{host}:{port}"
    with _conn_pool_lock:
        conn = _conn_pool.get(pool_key)
        if conn:
            try:
                sock = conn.sock
                # Drop the cached connection if its socket is missing or already closed.
                if sock is None or getattr(sock, "_closed", False):
                    conn = None
            except Exception:
                conn = None
    if conn is None:
        if parsed.scheme == "https":
            conn = http.client.HTTPSConnection(host, port, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(host, port, timeout=timeout)
        with _conn_pool_lock:
            _conn_pool[pool_key] = conn
    path = parsed.path or "/"
    if parsed.query:
        path += "?" + parsed.query
    method = "POST" if data else "GET"
    conn.request(method, path, body=data, headers=headers or {})
    return conn.getresponse()

def _response_store_evict():
    with _response_store_lock:
        now = time.time()
        expired = [k for k, v in _response_store.items()
                   if isinstance(v, dict) and now - v.get("ts", 0) > _RESPONSE_TTL]
        for k in expired:
            del _response_store[k]

def _log_dual(msg, level="INFO"):
    ts = time.strftime("%H:%M:%S")
    line = f"[{ts}] [{level}] {msg}"
    print(line, file=sys.stderr, flush=True)
    with _LOG_FILE_LOCK:
        if _LOG_FILE:
            try:
                _LOG_FILE.write(line + "\n")
                _LOG_FILE.flush()
            except Exception:
                pass

def _stream_with_idle_timeout(response, timeout_seconds=None):
    if timeout_seconds is None:
        timeout_seconds = _STREAM_IDLE_TIMEOUT
    sel = selectors.DefaultSelector()
    try:
        # HTTPResponse.fp is a BufferedReader; its .raw (SocketIO) has a fileno() that select can watch.
        raw_sock = getattr(getattr(response, 'fp', None), 'raw', None) or getattr(response, '_sock', None)
        if raw_sock is None:
            for chunk in response:
                yield chunk
            return
        sel.register(raw_sock, selectors.EVENT_READ)
        while True:
            ready = sel.select(timeout=timeout_seconds)
            if not ready:
                raise TimeoutError(f"Stream idle for {timeout_seconds}s")
            chunk = response.readline()
            if not chunk:
                break
            yield chunk
    finally:
        try:
            sel.close()
        except Exception:
            pass

def _provider_cap_key(target_url=None, backend=None, model=None):
    host = urllib.parse.urlparse(target_url or TARGET_URL).netloc.lower()
    return f"{backend or BACKEND}|{host}|{model or '*'}"

def _load_provider_caps():
    global _provider_caps
    with _provider_caps_lock:
        if _provider_caps is not None:
            return _provider_caps
        try:
            with open(_provider_caps_path) as f:
                _provider_caps = json.load(f)
        except Exception:
            _provider_caps = {}
        return _provider_caps

def _save_provider_caps():
    try:
        os.makedirs(os.path.dirname(_provider_caps_path), exist_ok=True)
        with open(_provider_caps_path, "w", encoding="utf-8") as f:
            json.dump(_provider_caps or {}, f, indent=2)
    except Exception as e:
        print(f"[provider-sensor] failed to save caps: {e}", file=sys.stderr)

def _provider_cap(model, key, default=None):
    caps = _load_provider_caps()
    specific = caps.get(_provider_cap_key(model=model), {})
    generic = caps.get(_provider_cap_key(model="*"), {})
    return specific.get(key, generic.get(key, default))

def _set_provider_cap(model, key, value, reason=""):
    caps = _load_provider_caps()
    cap_key = _provider_cap_key(model=model)
    caps.setdefault(cap_key, {})[key] = value
    caps[cap_key]["reason"] = reason
    caps[cap_key]["updated_at"] = time.time()
    _save_provider_caps()
    print(f"[provider-sensor] learned {cap_key}: {key}={value} reason={reason}", file=sys.stderr)

def _refresh_oauth_token():
    return _refresh_oauth_token_for(API_KEY, OAUTH_PROVIDER)

def _refresh_oauth_token_for(api_key, oauth_provider):
    oauth_provider = oauth_provider or ""
    if oauth_provider.startswith("google"):
        token, acct = _get_google_account(oauth_provider)
        if token and acct:
            return token
    if not oauth_provider.startswith("google"):
        return api_key
    token_name = "google-antigravity-oauth-token.json" if oauth_provider == "google-antigravity" else "google-cli-oauth-token.json"
    token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
    if not os.path.exists(token_path):
        return api_key
    try:
        with open(token_path) as f:
            tokens = json.load(f)
        if tokens.get("expires_at", 0) > time.time() + 60:
            return tokens.get("access_token", api_key)
        client_id = tokens.get("client_id", "")
        client_secret = tokens.get("client_secret", "")
        refresh_token = tokens.get("refresh_token", "")
        if not all([client_id, client_secret, refresh_token]):
            return tokens.get("access_token", api_key)
        print("[oauth] refreshing Google access token...", file=sys.stderr)
        data = urllib.parse.urlencode({
            "client_id": client_id, "client_secret": client_secret,
            "refresh_token": refresh_token, "grant_type": "refresh_token",
        }).encode()
        req = urllib.request.Request("https://oauth2.googleapis.com/token", data=data,
                                     headers={"Content-Type": "application/x-www-form-urlencoded"})
        resp = urllib.request.urlopen(req, timeout=30)
        new_tokens = json.loads(resp.read())
        tokens["access_token"] = new_tokens.get("access_token", tokens.get("access_token"))
        tokens["expires_at"] = time.time() + new_tokens.get("expires_in", 3600)
        with open(token_path, "w", encoding="utf-8") as f:
            json.dump(tokens, f, indent=2)
        print("[oauth] token refreshed OK", file=sys.stderr)
        return tokens["access_token"]
    except Exception as e:
        print(f"[oauth] refresh failed: {e}", file=sys.stderr)
        # Fall back to the key passed in, not the module-level default.
        return api_key

# ═══════════════════════════════════════════════════════════════════
# Shared helpers
# ═══════════════════════════════════════════════════════════════════

_pool = uuid.uuid4().hex[:8]

def _load_stats():
    try:
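        if os.path.exists(_stats_path):
            # Stats are written as UTF-8 by _atomic_write_json; read them back the same way.
            with open(_stats_path, encoding="utf-8") as f:
                return json.load(f)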
    except Exception:
        pass
    return {"providers": {}, "updated": None}

def _atomic_write_json(path, obj):
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, ensure_ascii=False)
    os.replace(tmp, path)

def _flush_stats():
    global _stats_flush_timer
    with _stats_lock:
        batch = list(_stats_pending)
        _stats_pending.clear()
        _stats_flush_timer = None
    if not batch:
        return
    stats = _load_stats()
    for entry in batch:
        provider = entry["provider"]
        model = entry["model"]
        p = stats["providers"].setdefault(provider, {
            "total_requests": 0, "successes": 0, "failures": 0,
            "total_tokens_in": 0, "total_tokens_out": 0,
            "total_duration_s": 0.0, "models": {}, "last_used": None, "last_error": None,
        })
        p["total_requests"] += 1
        p["total_tokens_in"] += entry["tokens_in"]
        p["total_tokens_out"] += entry["tokens_out"]
        p["total_duration_s"] += entry["duration_s"]
        p["last_used"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(entry["ts"]))
        if entry["success"]:
            p["successes"] += 1
        else:
            p["failures"] += 1
            p["last_error"] = entry.get("error_type") or "unknown"
        m = p["models"].setdefault(model, {"requests": 0, "tokens_in": 0, "tokens_out": 0})
        m["requests"] += 1
        m["tokens_in"] += entry["tokens_in"]
        m["tokens_out"] += entry["tokens_out"]
    stats["updated"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    _atomic_write_json(_stats_path, stats)

def _record_usage(provider, model, success, duration_s, tokens_in=0, tokens_out=0, error_type=None):
    global _stats_flush_timer
    entry = {
        "provider": provider or "unknown", "model": model or "unknown",
        "success": bool(success), "duration_s": float(duration_s or 0),
        "tokens_in": int(tokens_in or 0), "tokens_out": int(tokens_out or 0),
        "error_type": error_type, "ts": time.time(),
    }
    with _stats_lock:
        _stats_pending.append(entry)
        if _stats_flush_timer is None:
            _stats_flush_timer = threading.Timer(_STATS_FLUSH_INTERVAL, _flush_stats)
            _stats_flush_timer.daemon = True
            _stats_flush_timer.start()

def store_response(resp_id, input_data, output_items):
    if not resp_id:
        return
    _response_store_evict()
    with _response_store_lock:
        _response_store[resp_id] = {"input": input_data, "output": output_items, "ts": time.time()}
        while len(_response_store) > _MAX_STORED:
            _response_store.popitem(last=False)

def resolve_previous_response(body):
    prev_id = body.get("previous_response_id")
    input_data = body.get("input", "")
    if not prev_id:
        return input_data
    with _response_store_lock:
        stored = _response_store.get(prev_id)
    if not stored:
        return input_data
    prev_input = stored["input"]
    prev_output = stored["output"]
    new_input = input_data if isinstance(input_data, list) else []
    if isinstance(prev_input, list):
        combined = list(prev_input) + list(prev_output) + new_input
    else:
        combined = [{"type": "message", "role": "user", "content": [{"type": "input_text", "text": str(prev_input)}]}] + list(prev_output) + new_input
    return combined

def _fb_store_reasoning(resp_id, reasoning_text):
    if not resp_id or not reasoning_text:
        return
    with _fb_reasoning_store_lock:
        _fb_reasoning_store[resp_id] = {"reasoning": reasoning_text, "ts": time.time()}
        while len(_fb_reasoning_store) > _MAX_STORED:
            _fb_reasoning_store.popitem(last=False)
        expired = [k for k, v in _fb_reasoning_store.items() if time.time() - v["ts"] > _RESPONSE_TTL]
        for k in expired:
            del _fb_reasoning_store[k]

def _fb_get_reasoning(resp_id):
    if not resp_id:
        return ""
    with _fb_reasoning_store_lock:
        entry = _fb_reasoning_store.get(resp_id)
        return entry["reasoning"] if entry else ""

def _fb_get_any_reasoning():
    with _fb_reasoning_store_lock:
        for k in _fb_reasoning_store:
            return _fb_reasoning_store[k]["reasoning"]
        return ""

def _codebuff_hard_disable_reasoning(messages):
    """Strip all reasoning/thinking fields from every message.
    Codebuff rejects mixed reasoning_content histories.
    The final chat body must be clean before POST."""
    for msg in messages:
        if not isinstance(msg, dict):
            continue
        for key in ("reasoning_content", "reasoning", "thinking",
                     "thinking_content", "thoughts"):
            msg.pop(key, None)

def _is_reasoning_content_error(error_text):
    if not error_text:
        return False
    e = error_text.lower()
    return ("reasoning_content" in e or "thinking mode" in e
            or "must be passed back" in e)

def _ds_store_assistant(resp_id, assistant_msg):
    if not resp_id or not isinstance(assistant_msg, dict):
        return
    tool_calls = assistant_msg.get("tool_calls") or []
    reasoning = assistant_msg.get("reasoning_content")
    if not tool_calls or not reasoning:
        return
    with _deepseek_reasoning_lock:
        for tc in tool_calls:
            tc_id = tc.get("id") or tc.get("call_id", "")
            if tc_id:
                _deepseek_reasoning_store[tc_id] = {
                    "resp_id": resp_id,
                    "assistant": dict(assistant_msg),
                    "reasoning_content": reasoning,
                    "ts": time.time(),
                }
        keys = list(_deepseek_reasoning_store.keys())
        if len(keys) > _MAX_DS_STORED:
            for k in keys[:len(keys) - _MAX_DS_STORED]:
                del _deepseek_reasoning_store[k]

def _ds_rebuild_tool_history(messages):
    with _deepseek_reasoning_lock:
        snapshot = dict(_deepseek_reasoning_store)
        expired = [k for k, v in snapshot.items() if time.time() - v["ts"] > 900]
        for k in expired:
            _deepseek_reasoning_store.pop(k, None)
            snapshot.pop(k, None)
    if not snapshot:
        return messages
    rebuilt = []
    inserted_ids = set()
    for msg in messages:
        if msg.get("role") == "tool":
            tc_id = msg.get("tool_call_id", "")
            stored = snapshot.get(tc_id)
            if stored and tc_id not in inserted_ids:
                am = dict(stored["assistant"])
                if am.get("reasoning_content"):
                    rebuilt.append(am)
                    inserted_ids.add(tc_id)
        rebuilt.append(msg)
    return rebuilt

def _cb_input_to_messages(input_data, instructions=""):
    msgs = []
    tool_name_by_id = {}
    pending_tool_calls = []
    last_flushed_ids = []
    if isinstance(input_data, str):
        msgs.append({"role": "user", "content": input_data})
    elif isinstance(input_data, list):
        for item in input_data:
            t = item.get("type")
            if t == "reasoning":
                continue
            if t == "function_call":
                tcid = item.get("call_id") or item.get("id") or uid("tc")
                pending_tool_calls.append(
                    {"id": tcid, "type": "function",
                     "function": {"name": item.get("name", ""),
                                   "arguments": item.get("arguments", "{}")}})
                tool_name_by_id[tcid] = item.get("name", "")
                continue
            if pending_tool_calls:
                last_flushed_ids = [tc["id"] for tc in pending_tool_calls]
                msg = {"role": "assistant", "content": None, "tool_calls": pending_tool_calls}
                msgs.append(msg)
                pending_tool_calls = []
            if t == "message":
                role = item.get("role", "user")
                if role == "developer":
                    role = "system"
                text = ""
                content = item.get("content", [])
                if isinstance(content, str):
                    text = content
                else:
                    for part in content:
                        if isinstance(part, str):
                            text += part
                            continue
                        pt = part.get("type", "")
                        if pt in ("input_text", "output_text"):
                            text += part.get("text", "")
                if text is not None:
                    am = {"role": role, "content": text}
                    if role == "assistant":
                        am["_fb_orig_id"] = item.get("id", "")
                    msgs.append(am)
            elif t == "function_call_output":
                tcid = item.get("call_id") or item.get("id") or ""
                if not tcid and last_flushed_ids:
                    idx = len([m for m in msgs if m.get("role") == "tool"])
                    if idx < len(last_flushed_ids):
                        tcid = last_flushed_ids[idx]
                msgs.append({"role": "tool", "tool_call_id": tcid,
                             "tool_name": tool_name_by_id.get(tcid, ""),
                             "content": item.get("output", "")})
        if pending_tool_calls:
            msg = {"role": "assistant", "content": None, "tool_calls": pending_tool_calls}
            msgs.append(msg)
    if instructions:
        msgs.insert(0, {"role": "system", "content": instructions})
    return msgs

def _fb_strip_reasoning_from_messages(messages):
    out = []
    for m in messages:
        nm = {k: v for k, v in m.items() if k != "reasoning_content"}
        out.append(nm)
    return out

_HOP_BY_HOP_HEADERS = {
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailers",
    "transfer-encoding",
    "upgrade",
    "host",
    "content-length",
}

def uid(prefix="id"):
    return f"{prefix}-{_pool}-{uuid.uuid4().hex[:12]}"

def emit(event, data):
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def upstream_target(base_url, suffix):
    base = base_url.rstrip("/")
    if base.endswith(suffix):
        return base
    return f"{base}{suffix}"

_BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Accept": "application/json, text/event-stream, */*",
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-Ch-Ua": '"Chromium";v="137", "Not/A)Brand";v="99"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Linux"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
}

def forwarded_headers(request_headers, extra=None, browser_ua=False):
    headers = {}
    if browser_ua:
        headers.update(_BROWSER_HEADERS)
    for key, value in request_headers.items():
        if key.lower() in _HOP_BY_HOP_HEADERS:
            continue
        if browser_ua and key.lower() == "user-agent":
            continue
        headers[key] = value
    if extra:
        headers.update(extra)
    return headers

def _openrouter_extra():
    if not TARGET_URL:
        return {}
    if "z.ai" in TARGET_URL:
        return {
            "HTTP-Referer": "https://openclaw.ai",
            "X-OpenRouter-Title": "OpenClaw",
            "X-OpenRouter-Categories":
                "cli-agent,cloud-agent,programming-app,creative-writing,"
                "writing-assistant,general-chat,personal-agent",
        }
    if "openrouter.ai" in TARGET_URL:
        return {
            "HTTP-Referer": "https://chats-llm.com",
            "X-OpenRouter-Title": "Chats-LLM",
            "X-OpenRouter-Categories": "general-chat, ide-extension",
            "X-OpenRouter-Cache": "true",
        }
    return {}

_MAX_INPUT_ITEMS = 30
_MAX_TOOL_OUTPUT_CHARS = 8000
_COMPACT_KEEP_RECENT = 10

_CROF_ADAPTIVE = {
    "fail_history": [],
    "model_limits": {},
    "global_item_limit": 80,
    "min_keep_recent": 6,
}

_BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json")
_bgp_stats_lock = threading.Lock()

def _route_key(route):
    return f"{route.get('name', '')}::{route.get('target_url', '')}::{route.get('model', '')}"

def _load_bgp_stats():
    try:
        if os.path.exists(_BGP_STATS_PATH):
            with open(_BGP_STATS_PATH, encoding="utf-8") as f:
                return json.load(f)
    except Exception:
        pass
    return {}

def _save_bgp_stats(stats):
    tmp = _BGP_STATS_PATH + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(stats, f, indent=2)
    os.replace(tmp, _BGP_STATS_PATH)

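def _score_route(route, stats):
    """Return a sort score for a BGP route; lower is better.

    A route whose circuit breaker is open scores 1_000_000 so it sorts last.
    Otherwise the score is priority + latency penalty (capped at 50) +
    20 per consecutive failure, plus 500 while the route is rate-limited.
    """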
    key = _route_key(route)
    rs = stats.get(key, {})
    now = time.time()
    if float(rs.get("open_until_ts", 0)) > now:
        return 1_000_000
    priority = int(route.get("priority", 99))
    ewma = float(rs.get("ewma_latency_s", 0))
    failures = int(rs.get("consecutive_failures", 0))
    score = priority + min(ewma * 5, 50) + failures * 20
    if float(rs.get("rate_limited_until", 0)) > now:
        score += 500
    return score

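def _update_route_stats(route, success, duration_s, http_code=None, error_type=None):
    """Record one request outcome for a BGP route and persist the stats.

    Latency is tracked as an EWMA (alpha=0.25). An HTTP 429 marks the route
    rate-limited for 120s; three consecutive failures open its circuit breaker
    for 60s and reset the failure counter.
    """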
    with _bgp_stats_lock:
        stats = _load_bgp_stats()
        key = _route_key(route)
        rs = stats.setdefault(key, {
            "ewma_latency_s": duration_s, "consecutive_failures": 0,
            "last_success": None, "last_failure": None,
            "open_until_ts": 0, "rate_limited_until": 0, "last_error": None,
        })
        alpha = 0.25
        rs["ewma_latency_s"] = alpha * duration_s + (1 - alpha) * float(rs.get("ewma_latency_s", duration_s))
        if success:
            rs["consecutive_failures"] = 0
            rs["last_success"] = time.time()
        else:
            rs["consecutive_failures"] = int(rs.get("consecutive_failures", 0)) + 1
            rs["last_failure"] = time.time()
            rs["last_error"] = error_type or (f"http_{http_code}" if http_code else "unknown")
            if http_code == 429:
                rs["rate_limited_until"] = time.time() + 120
            if rs["consecutive_failures"] >= 3:
                rs["open_until_ts"] = time.time() + 60
                rs["consecutive_failures"] = 0
        _save_bgp_stats(stats)

def _sorted_bgp_routes():
    with _bgp_stats_lock:
        stats = _load_bgp_stats()
    return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats))

def _crof_record(model, n_items, success):
    if TARGET_URL and "crof.ai" not in TARGET_URL:
        return
    if not isinstance(n_items, int) or n_items < 1:
        return
    entry = {"model": model, "items": n_items, "ok": success}
    hist = _CROF_ADAPTIVE["fail_history"]
    hist.append(entry)
    if len(hist) > 200:
        _CROF_ADAPTIVE["fail_history"] = hist[-100:]

    ml = _CROF_ADAPTIVE["model_limits"].setdefault(model, {"ok_max": 30, "fail_min": 0, "limit": 30})
    if success and n_items > ml["ok_max"]:
        ml["ok_max"] = n_items
    if not success and (ml["fail_min"] == 0 or n_items < ml["fail_min"]):
        ml["fail_min"] = n_items

    if ml["fail_min"] > 0 and ml["ok_max"] >= ml["fail_min"]:
        ml["limit"] = ml["fail_min"] - 1
    elif ml["fail_min"] > 0:
        ml["limit"] = max(ml["fail_min"] - 2, _CROF_ADAPTIVE["min_keep_recent"] + 2)

    global_limit = 30
    for m, v in _CROF_ADAPTIVE["model_limits"].items():
        if v.get("limit", 30) < global_limit:
            global_limit = v["limit"]
    _CROF_ADAPTIVE["global_item_limit"] = global_limit

    if TARGET_URL and "crof.ai" in TARGET_URL:
        print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)

def _crof_item_limit(model):
    ml = _CROF_ADAPTIVE["model_limits"].get(model, {})
    per_model = ml.get("limit", 30)
    return min(per_model, _CROF_ADAPTIVE["global_item_limit"])

def _crof_compact_for_retry(input_data, model):
    limit = _crof_item_limit(model)
    if not isinstance(input_data, list) or len(input_data) <= limit:
        return input_data

    keep = max(_CROF_ADAPTIVE["min_keep_recent"], limit // 3)
    head_end = 0
    for i, item in enumerate(input_data):
        t = item.get("type")
        if t == "message" and item.get("role") in ("developer", "system"):
            head_end = i + 1
        elif t == "message" and item.get("role") == "user" and head_end == i:
            head_end = i + 1
        else:
            break

    head = input_data[:head_end]
    tail_start = max(head_end, len(input_data) - keep)
    while tail_start > head_end:
        t = input_data[tail_start].get("type")
        r = input_data[tail_start].get("role", "")
        if t in ("function_call_output", "function_call"):
            tail_start -= 1
        elif t == "message" and r == "assistant":
            tail_start -= 1
        else:
            break
    tail = input_data[tail_start:]
    body = input_data[head_end:tail_start]

    if not body:
        return head + tail

    summary_lines = [f"[Auto-compacted: {len(body)} turns removed (adaptive limit={limit})]"]
    for item in body[-5:]:
        summary_lines.append(_item_summary(item, max_len=120))

    summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
    if TARGET_URL and "crof.ai" in TARGET_URL:
        print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)})", file=sys.stderr)
    return head + [summary_msg] + tail

def _item_summary(item, max_len=200):
    t = item.get("type")
    if t == "message":
        role = item.get("role", "?")
        text = ""
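        content = item.get("content") or []
        if isinstance(content, str):
            # Content may arrive as a plain string instead of a list of parts.
            text = content
        else:
            for p in content:
                if isinstance(p, dict) and p.get("type") in ("input_text", "output_text"):
                    text += p.get("text", "")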
        return f"[{role}] {text[:max_len]}"
    elif t == "function_call":
        name = item.get("name", "?")
        args = item.get("arguments", "{}")
        try:
            a = json.loads(args)
            cmd = a.get("cmd", a.get("command", ""))
            if cmd:
                return f"[tool call] {name}: {cmd[:max_len]}"
        except Exception:
            pass
        return f"[tool call] {name}({args[:max_len]})"
    elif t == "function_call_output":
        output = item.get("output", "")
        if len(output) > max_len:
            return f"[tool result] {output[:max_len]}..."
        return f"[tool result] {output}"
    return f"[{t}]"

def _extract_files(items):
    files = set()
    for item in items:
        if item.get("type") == "function_call":
            try:
                a = json.loads(item.get("arguments", "{}"))
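                cmd = a.get("cmd", a.get("command", ""))
                if isinstance(cmd, list):
                    cmd = " ".join(str(c) for c in cmd)
                # Capture shell redirect targets after > or >>; the pattern skips forms like 2>&1.
                for m in re.finditer(r">{1,2}\s*([^\s;|&<>]+)", cmd):
                    f = m.group(1).strip("'\"")
                    if f and not f.startswith("-") and "/" in f:
                        files.add(f)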
            except Exception:
                pass
    return files

def _compact_input(input_data):
    if isinstance(input_data, str):
        return input_data
    if not isinstance(input_data, list):
        # Unknown shape (e.g. None): pass it through rather than try to iterate it.
        return input_data
    if len(input_data) <= _MAX_INPUT_ITEMS:
        out = []
        for item in input_data:
            if isinstance(item, dict) and item.get("type") == "function_call_output":
                o = item.get("output", "")
                if len(o) > _MAX_TOOL_OUTPUT_CHARS:
                    item = dict(item)
                    item["output"] = o[:_MAX_TOOL_OUTPUT_CHARS] + f"\n... [truncated {len(o) - _MAX_TOOL_OUTPUT_CHARS} chars]"
                    print(f"[compact] tool output truncated {len(o)} -> {_MAX_TOOL_OUTPUT_CHARS}", file=sys.stderr)
            out.append(item)
        return out

    head_end = 0
    for i, item in enumerate(input_data):
        t = item.get("type")
        if t == "message" and item.get("role") in ("developer", "system"):
            head_end = i + 1
        elif t == "message" and item.get("role") == "user" and head_end == i:
            head_end = i + 1
        else:
            break

    head = input_data[:head_end]
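    # Clamp so the tail never overlaps the head (which would duplicate items in head + tail).
    tail_start = max(head_end, len(input_data) - _COMPACT_KEEP_RECENT)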
    while tail_start > head_end:
        t = input_data[tail_start].get("type")
        r = input_data[tail_start].get("role", "")
        if t == "function_call_output":
            tail_start -= 1
        elif t == "function_call":
            tail_start -= 1
        elif t == "message" and r == "assistant":
            tail_start -= 1
        else:
            break
    tail = input_data[tail_start:]
    body = input_data[head_end:tail_start]

    if not body:
        return head + tail

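    truncated_tail = []
    for item in tail:
        if isinstance(item, dict) and item.get("type") == "function_call_output":
            o = item.get("output", "")
            if len(o) > _MAX_TOOL_OUTPUT_CHARS:
                # Copy before editing so the caller's input is not mutated.
                item = dict(item)
                item["output"] = o[:_MAX_TOOL_OUTPUT_CHARS] + f"\n... [truncated {len(o) - _MAX_TOOL_OUTPUT_CHARS} chars]"
        truncated_tail.append(item)
    tail = truncated_tail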

    def _texts(item, part_type):
        # Message content may be a plain string or a list of typed parts.
        content = item.get("content") or []
        if isinstance(content, str):
            return [content[:300]]
        return [p.get("text", "")[:300] for p in content
                if isinstance(p, dict) and p.get("type") == part_type]

    user_queries = []
    assistant_msgs = []
    for item in body:
        if item.get("type") != "message":
            continue
        if item.get("role") == "user":
            user_queries.extend(_texts(item, "input_text"))
        elif item.get("role") == "assistant":
            assistant_msgs.extend(_texts(item, "output_text"))

    tool_summaries = []
    for item in body:
        if item.get("type") in ("function_call", "function_call_output"):
            tool_summaries.append(_item_summary(item, max_len=150))

    files = _extract_files(body)

    summary_lines = [f"[Auto-compacted: {len(body)} earlier turns summarized to preserve context]"]
    if user_queries:
        summary_lines.append(f"User requests: {'; '.join(user_queries[-3:])}")
    if assistant_msgs:
        summary_lines.append(f"Assistant responses: {'; '.join(assistant_msgs[-3:])}")
    if tool_summaries:
        summary_lines.append(f"Actions taken ({len(tool_summaries)} steps):")
        for ts in tool_summaries[-15:]:
            summary_lines.append(f"  {ts}")
    if files:
        summary_lines.append(f"Files touched: {', '.join(sorted(files)[-10:])}")

    summary_text = "\n".join(summary_lines)
    summary_msg = {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": summary_text}]
    }

    print(f"[compact] {len(input_data)} items -> {len(head) + 1 + len(tail)} (compacted {len(body)} old items into summary)", file=sys.stderr)
    return head + [summary_msg] + tail

# ═══════════════════════════════════════════════════════════════════
# Provider policies
# ═══════════════════════════════════════════════════════════════════

_PROVIDER_POLICIES = {
    "crof": {"reasoning_mode": "off", "max_tokens": 32768, "strip_reasoning": True,
             "tool_output_limit": 4000, "max_input_items": 18, "compaction": "aggressive",
             "synthetic_tool_results": True},
    "chats-llm": {"reasoning_mode": "off", "max_tokens": 32768, "strip_reasoning": True,
                  "tool_output_limit": 4000, "max_input_items": 20, "compaction": "aggressive"},
    "z.ai": {"reasoning_mode": "medium", "max_tokens": 65536, "strip_reasoning": True,
             "tool_output_limit": 8000, "max_input_items": 40, "compaction": "balanced"},
    "openrouter": {"reasoning_mode": "provider_default", "max_tokens": 32768, "strip_reasoning": True,
                   "tool_output_limit": 6000, "max_input_items": 35, "compaction": "balanced"},
    "openadapter": {"reasoning_mode": "off", "max_tokens": 32768, "strip_reasoning": True,
                    "tool_output_limit": 6000, "max_input_items": 30, "compaction": "balanced"},
    "cloudcode-pa": {"compaction": "aggressive", "context_size": 1000000,
                     "tool_output_limit": 6000, "max_input_items": 60},
    "googleapis": {"compaction": "balanced", "context_size": 1000000,
                   "tool_output_limit": 6000, "max_input_items": 80},
}

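def provider_policy(target_url=None, backend=None):
    """Return the policy whose key is a substring of the target host.

    Falls back to the global TARGET_URL when target_url is not given and
    returns {} when no key matches. The backend argument is currently unused.
    """
    host = urllib.parse.urlparse(target_url or TARGET_URL).netloc.lower()
    for key, policy in _PROVIDER_POLICIES.items():
        if key in host:
            return policy
    return {}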

# ═══════════════════════════════════════════════════════════════════
# Adaptive context compaction (model-aware)
# ═══════════════════════════════════════════════════════════════════

_MODEL_CONTEXT = {
    "gpt-4o": 128000, "gpt-4o-mini": 128000, "gpt-5": 128000,
    "claude-sonnet": 200000, "claude-haiku": 200000,
    "glm-5.1": 128000, "glm-5": 128000, "glm-4": 128000,
    "deepseek": 64000, "gemini-2.5-flash": 1000000, "gemini-2.5-pro": 2000000,
    "gemini-3-flash": 1000000, "gemini-3.5-flash-low": 1000000,
    "gemini-3.1-pro-low": 2000000,
    "gemini-3.5-flash": 1000000, "gemini-3.1-pro": 2000000,
    "Gemini 3.5 Flash": 1000000, "Gemini 3.1 Pro": 2000000,
    "Claude Sonnet 4.6": 200000, "Claude Opus 4.6": 200000,
    "GPT-OSS 120B": 128000,
    "claude-sonnet-4-6": 200000, "claude-opus-4-6-thinking": 200000,
    "gpt-oss-120b-medium": 128000,
    "mimo": 32768, "minimax": 32768, "kimi": 128000,
    "_default": 32768,
}

def _context_limit_for_model(model):
    if not model:
        return _MODEL_CONTEXT["_default"]
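    ml = model.lower()
    for key, limit in _MODEL_CONTEXT.items():
        # Compare case-insensitively: some keys are display names such as
        # "Claude Opus 4.6" and would never match the lowercased model id.
        if key != "_default" and key.lower() in ml:
            return limit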
    return _MODEL_CONTEXT["_default"]

def _estimate_tokens(obj):
    if obj is None:
        return 0
    if isinstance(obj, str):
        return max(1, len(obj) // 4)
    try:
        raw = json.dumps(obj, ensure_ascii=False)
    except Exception:
        raw = str(obj)
    return max(1, len(raw) // 4)

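def _adaptive_compact(input_data, model, policy=None):
    """Compact input_data when its estimated size exceeds 80% of the context window.

    Returns (input, changed). Token counts are a rough estimate of about
    four characters per token, so the budget is approximate.
    """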
    policy = policy or {}
    context_size = int(policy.get("context_size", _context_limit_for_model(model)))
    input_budget = int(context_size * 0.80)
    estimated = _estimate_tokens(input_data)
    if estimated <= input_budget:
        return input_data, False
    if not isinstance(input_data, list):
        return input_data, False
    reduction = max(0.15, input_budget / max(estimated, 1))
    target_items = max(int(len(input_data) * reduction), 6)
    if target_items >= len(input_data):
        return input_data, False
    head_end = 0
    for i, item in enumerate(input_data):
        t = item.get("type")
        if t == "message" and item.get("role") in ("developer", "system"):
            head_end = i + 1
        elif t == "message" and item.get("role") == "user" and head_end == i:
            head_end = i + 1
        else:
            break
    head = input_data[:head_end]
    keep = max(4, target_items // 3)
    tail_start = max(head_end, len(input_data) - keep)
    while tail_start > head_end:
        t = input_data[tail_start].get("type")
        if t in ("function_call_output", "function_call"):
            tail_start -= 1
        elif t == "message" and input_data[tail_start].get("role") == "assistant":
            tail_start -= 1
        else:
            break
    tail = input_data[tail_start:]
    body = input_data[head_end:tail_start]
    if not body:
        return head + tail, True
    summary_lines = [f"[Auto-compacted: {len(body)} turns removed (budget={input_budget}tok, model={model})]"]
    for item in body[-5:]:
        summary_lines.append(_item_summary(item, max_len=120))
    summary_msg = {"type": "message", "role": "user",
                   "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
    print(f"[adaptive-compact] model={model} est={estimated}tok budget={input_budget}tok "
          f"items {len(input_data)}->{len(head)+1+len(tail)}", file=sys.stderr)
    return head + [summary_msg] + tail, True

# ═══════════════════════════════════════════════════════════════════
# Prompt Enhancer
# ═══════════════════════════════════════════════════════════════════

_PROMPT_ENHANCER_SYSTEM = """You are a prompt enhancement assistant for a coding agent (Codex CLI).
Your job: rewrite the user's latest message to be clearer, more specific, and more actionable.
Rules:
- Preserve the user's EXACT intent — never change what they want done
- Add explicit action verbs and step-by-step clarity
- If the message is vague ("fix it", "make it better"), infer context from prior conversation summary and make it specific
- Keep the enhanced prompt concise — no longer than 2x the original
- If the original prompt is already clear and specific, return it unchanged
- Output ONLY the enhanced prompt text, nothing else
- Never add tasks the user didn't ask for"""

_PROMPT_ENHANCER_OFFLINE = """<prompt-enhancer>
<instructions>
You are a coding agent operating inside a context-compacted session. Follow these rules strictly:

1. ACTION CLARITY: Re-read the user's latest message. Identify every explicit and implicit action request. Execute ALL of them — do not skip any.

2. COMPACTED CONTEXT: Previous conversation was summarized. The summary preserves your task history but may lose details. If the user references earlier work ("fix that", "continue", "update it"), infer from the compacted summary what was done and what remains.

3. NO CLARIFICATION ASKING: Never ask "which file?" or "what exactly?" — infer from context. If truly ambiguous, make a reasonable assumption and proceed. The user can correct you.

4. DECISIVE EXECUTION: When the user says "fix", "update", "change", "add", "remove" — do it immediately in the relevant file(s). Do not describe what you would do — actually do it.

5. COMPLETE EDITS: When editing files, make the FULL change requested. Do not partially apply edits or leave placeholders.

6. PRESERVE WORKING STATE: Never break existing functionality. If changing code, keep all surrounding logic intact.

7. MULTI-STEP REQUESTS: If the user asks for multiple things, do ALL of them in sequence. Do not stop after the first one.
</instructions>
</prompt-enhancer>

"""

def _enhance_prompt_llm(text, compaction_summary=""):
    global PROMPT_ENHANCER_MODEL, PROMPT_ENHANCER_URL, PROMPT_ENHANCER_KEY
    if not PROMPT_ENHANCER_MODEL or not PROMPT_ENHANCER_URL:
        return text
    try:
        messages = [
            {"role": "system", "content": _PROMPT_ENHANCER_SYSTEM},
        ]
        if compaction_summary:
            messages.append({"role": "user", "content": f"Context from earlier conversation (compacted):\n{compaction_summary[:2000]}"})
        messages.append({"role": "user", "content": f"Enhance this prompt:\n{text}"})
        body = json.dumps({"model": PROMPT_ENHANCER_MODEL, "messages": messages, "max_tokens": 2000, "temperature": 0.3}).encode()
        headers = {"Content-Type": "application/json"}
        if PROMPT_ENHANCER_KEY:
            headers["Authorization"] = f"Bearer {PROMPT_ENHANCER_KEY}"
        req = urllib.request.Request(f"{PROMPT_ENHANCER_URL.rstrip('/')}/chat/completions", data=body, headers=headers)
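        # Close the connection promptly and tolerate null fields in the reply.
        with urllib.request.urlopen(req, timeout=15) as resp:
            data = json.loads(resp.read())
        choices = data.get("choices") or [{}]
        enhanced = ((choices[0].get("message") or {}).get("content") or "").strip()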
        if enhanced and len(enhanced) >= len(text) * 0.5:
            print(f"[prompt-enhancer] AI enhanced: {text[:80]}... -> {enhanced[:80]}...", file=sys.stderr)
            return enhanced
    except Exception as e:
        print(f"[prompt-enhancer] AI enhancement failed: {e}", file=sys.stderr)
    return text

def _apply_prompt_enhancer(input_data):
    global PROMPT_ENHANCER_MODE
    if not isinstance(input_data, list) or len(input_data) == 0:
        return input_data
    last_user_idx = None
    for i in range(len(input_data) - 1, -1, -1):
        item = input_data[i]
        if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
            last_user_idx = i
            break
    if last_user_idx is None:
        return input_data
    item = input_data[last_user_idx]
    content = item.get("content", "")
    if isinstance(content, list):
        text = content[0].get("text", "") if content else ""
    elif isinstance(content, str):
        text = content
    else:
        return input_data
    if not text or len(text) < 5:
        return input_data
    if text.startswith("<prompt-enhancer>"):
        return input_data
    compaction_summary = ""
    for it in input_data:
        if isinstance(it, dict) and it.get("type") == "message" and it.get("role") == "user":
            c = it.get("content", "")
            t = ""
            if isinstance(c, list):
                t = c[0].get("text", "") if c else ""
            elif isinstance(c, str):
                t = c
            if "[Auto-compacted:" in t:
                compaction_summary = t[:3000]
                break
    if PROMPT_ENHANCER_MODE == "ai-powered" and PROMPT_ENHANCER_MODEL and PROMPT_ENHANCER_URL:
        enhanced = _enhance_prompt_llm(text, compaction_summary)
    else:
        enhanced = text
    enhanced = _PROMPT_ENHANCER_OFFLINE + enhanced
    new_item = dict(item)
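    if isinstance(content, list):
        # Replace only the leading text part; keep any remaining parts
        # (e.g. images) instead of dropping them.
        new_item["content"] = [{"type": "input_text", "text": enhanced}] + list(content[1:])
    else:
        new_item["content"] = enhanced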
    result = list(input_data)
    result[last_user_idx] = new_item
    print(f"[prompt-enhancer] mode={PROMPT_ENHANCER_MODE} enhanced last user message ({len(text)}->{len(enhanced)} chars)", file=sys.stderr)
    return result

# ═══════════════════════════════════════════════════════════════════
# Tool-call pairing validator
# ═══════════════════════════════════════════════════════════════════

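def validate_tool_pairs(input_items):
    """Return an error entry for each function_call_output with no earlier matching function_call.

    Calls that never receive an output are not reported.
    """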
    if not isinstance(input_items, list):
        return []
    calls = {}
    errors = []
    for idx, item in enumerate(input_items):
        t = item.get("type")
        if t == "function_call":
            cid = item.get("call_id") or item.get("id")
            if cid:
                calls[cid] = idx
        elif t == "function_call_output":
            cid = item.get("call_id") or item.get("id")
            if not cid or cid not in calls:
                errors.append({"index": idx, "call_id": cid, "error": "orphan_function_call_output"})
    return errors

def repair_orphan_tool_outputs(input_items, errors):
    bad = {e["index"] for e in errors}
    repaired = []
    for idx, item in enumerate(input_items):
        if idx in bad:
            output = item.get("output", "")
            repaired.append({"type": "message", "role": "user",
                             "content": [{"type": "input_text",
                                          "text": f"[Proxy: unmatched tool output]\n{str(output)[:4000]}"}]})
        else:
            repaired.append(item)
    return repaired

def synthesize_tool_results_for_chat(input_items):
    """Convert Responses function_call/function_call_output pairs into plain text.

    Some OpenAI-compatible providers accept tool calls on the first turn but fail
    on the next request when role=tool messages are present. For those providers,
    encode tool outputs as normal user text so the model can continue.
    """
    if not isinstance(input_items, list):
        return input_items, False
    calls = {}
    changed = False
    out = []
    for item in input_items:
        t = item.get("type")
        if t == "function_call":
            cid = item.get("call_id") or item.get("id") or ""
            calls[cid] = item
            changed = True
            continue
        if t == "function_call_output":
            cid = item.get("call_id") or item.get("id") or ""
            call = calls.get(cid, {})
            name = call.get("name", "tool")
            args = call.get("arguments", "{}")
            output = item.get("output", "")
            text = (
                "Tool execution result. Continue the task using this result. "
                "Do not repeat the same tool call unless more information is required.\n\n"
                f"Tool: {name}\nArguments:\n```json\n{str(args)[:2000]}\n```\n"
                f"Output:\n```\n{str(output)[:8000]}\n```"
            )
            out.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": text}]})
            changed = True
            continue
        out.append(item)
    return out, changed

def has_function_call_output(input_items):
    return isinstance(input_items, list) and any(i.get("type") == "function_call_output" for i in input_items)

# ═══════════════════════════════════════════════════════════════════
# Log redaction
# ═══════════════════════════════════════════════════════════════════

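# Order matters: the specific "sk-ant-" pattern must run before the generic
# "sk-" pattern, which would otherwise consume Anthropic keys first.
_SECRET_PATTERNS = [
    (r"sk-ant-[A-Za-z0-9_\-]{20,}", "[REDACTED:anthropic]"),
    (r"sk-[A-Za-z0-9_\-]{20,}", "[REDACTED:key]"),
    (r"gh[pousr]_[A-Za-z0-9_]{20,}", "[REDACTED:github]"),
    (r"Bearer\s+[A-Za-z0-9._\-]{20,}", "Bearer [REDACTED]"),
]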

def _redact(text):
    if not text:
        return text
    import re
    for pattern, replacement in _SECRET_PATTERNS:
        text = re.sub(pattern, replacement, text)
    return text

def _redact_json(obj):
    try:
        raw = json.dumps(obj, ensure_ascii=False)
    except Exception:
        raw = str(obj)
    return _redact(raw)

_MAX_SNAPSHOTS = 200

def save_request_snapshot(request_id, body):
    if not request_id:
        return request_id
    snapshot = {
        "_meta": {
            "request_id": request_id,
            "model": body.get("model", ""),
            "stream": body.get("stream", False),
            "ts": time.time(),
            "ts_iso": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "status": "pending",
            "duration_s": None,
            "error": None,
        },
        "request": json.loads(_redact_json(body)),
    }
    path = os.path.join(_REQUESTS_DIR, f"{request_id}.json")
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(snapshot, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)
    _rotate_snapshots()
    return request_id

def update_snapshot_response(request_id, status, duration_s=None, error=None):
    if not request_id:
        return
    path = os.path.join(_REQUESTS_DIR, f"{request_id}.json")
    if not os.path.exists(path):
        return
    try:
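        # Snapshots are written as UTF-8 with ensure_ascii=False, so read them the same way.
        with open(path, encoding="utf-8") as f: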
            snapshot = json.load(f)
        meta = snapshot.get("_meta", {})
        meta["status"] = status
        if duration_s is not None:
            meta["duration_s"] = round(duration_s, 3)
        if error is not None:
            meta["error"] = str(error)[:200]
        snapshot["_meta"] = meta
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(snapshot, f, ensure_ascii=False, indent=2)
        os.replace(tmp, path)
    except Exception:
        pass

def _rotate_snapshots():
    try:
        files = sorted(
            [os.path.join(_REQUESTS_DIR, f) for f in os.listdir(_REQUESTS_DIR) if f.endswith(".json")],
            key=os.path.getmtime,
        )
        while len(files) > _MAX_SNAPSHOTS:
            os.remove(files.pop(0))
    except Exception:
        pass

# ═══════════════════════════════════════════════════════════════════
# Rate-limit token buckets
# ═══════════════════════════════════════════════════════════════════

class TokenBucket:
    def __init__(self, capacity=10, refill=1.0):
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.refill = float(refill)
        self.updated = time.monotonic()
        self.lock = threading.Lock()
    def allow(self, cost=1):
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.refill)
            self.updated = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

_rate_buckets = {}
_rate_buckets_lock = threading.Lock()

def _bucket_for_route(route):
    name = route.get("name") or route.get("target_url") or "default"
    with _rate_buckets_lock:
        if name not in _rate_buckets:
            _rate_buckets[name] = TokenBucket(capacity=10, refill=1.0)
        return _rate_buckets[name]

# ═══════════════════════════════════════════════════════════════════
# OpenAI-compat backend
# ═══════════════════════════════════════════════════════════════════

def _inject_stored_reasoning(messages):
    with _last_reasoning_lock:
        snapshot = dict(_last_reasoning_store)
    if not snapshot:
        return messages
    expired = [k for k, v in snapshot.items() if time.time() - v["ts"] > _RESPONSE_TTL]
    for k in expired:
        with _last_reasoning_lock:
            _last_reasoning_store.pop(k, None)
        snapshot.pop(k, None)
    if not snapshot:
        return messages
    latest = max(snapshot.values(), key=lambda v: v["ts"])
    reasoning = latest.get("reasoning", "")
    if not reasoning:
        return messages
    for msg in messages:
        if msg.get("role") == "assistant" and "reasoning_content" not in msg and msg.get("tool_calls"):
            msg["reasoning_content"] = reasoning
    return messages

def oa_input_to_messages(input_data):
    msgs = []
    tool_name_by_id = {}
    if isinstance(input_data, str):
        msgs.append({"role": "user", "content": input_data})
    elif isinstance(input_data, list):
        pending_tool_calls = []
        last_flushed_ids = []
        for item in input_data:
            t = item.get("type")
            if t == "function_call":
                tcid = item.get("call_id") or item.get("id") or uid("tc")
                pending_tool_calls.append(
                    {"id": tcid,
                     "type": "function",
                     "function": {"name": item.get("name", ""),
                                   "arguments": item.get("arguments", "{}")}})
                tool_name_by_id[tcid] = item.get("name", "")
                continue
            if pending_tool_calls:
                last_flushed_ids = [tc["id"] for tc in pending_tool_calls]
                msgs.append({"role": "assistant", "content": None, "tool_calls": pending_tool_calls})
                pending_tool_calls = []
            if t == "message":
                role = item.get("role", "user")
                if role == "developer":
                    role = "system"
                text = ""
                reasoning_text = ""
                content = item.get("content", [])
                if isinstance(content, str):
                    text = content
                else:
                    for part in content:
                        if isinstance(part, str):
                            text += part
                            continue
                        pt = part.get("type", "")
                        if pt in ("input_text", "output_text"):
                            text += part.get("text", "")
                        elif pt in ("reasoning",):
                            for rp in part.get("content", []):
                                reasoning_text += rp.get("text", "")
                        elif pt == "input_image":
                            img = part.get("image_url", part)
                            msgs.append({"role": role, "content": [{"type": "text", "text": text},
                                        {"type": "image_url", "image_url": img}]})
                            text = None
                            break
                if text is not None:
                    msg = {"role": role, "content": text}
                    if reasoning_text and role == "assistant":
                        msg["reasoning_content"] = reasoning_text
                    msgs.append(msg)
            elif t == "function_call_output":
                tcid = item.get("call_id") or item.get("id") or ""
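                if not tcid and last_flushed_ids:
                    # Position within the current batch: count only the tool
                    # messages emitted since the last assistant tool_calls
                    # message, not every tool message in the conversation.
                    idx = 0
                    for m in reversed(msgs):
                        if m.get("role") != "tool":
                            break
                        idx += 1
                    if idx < len(last_flushed_ids):
                        tcid = last_flushed_ids[idx]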
                msgs.append({"role": "tool", "tool_call_id": tcid,
                             "tool_name": tool_name_by_id.get(tcid, ""),
                             "content": item.get("output", "")})
        if pending_tool_calls:
            msgs.append({"role": "assistant", "content": None, "tool_calls": pending_tool_calls})
    return msgs

def cc_input_to_messages(input_data, instructions="", schema=None):
    """Convert Responses API input into CommandCode /alpha/generate messages.

    [FIX 1] All messages use STRING content (not content blocks).
    CC API rejects params.messages[i].content when it's an array.
    Tool results are role="user" with plain text content.
    Tool calls: inline JSON text in assistant messages (e.g. {"type":"tool-call","id":"..."}).

    The model echoes this format back in its response text-delta events.
    _parse_commandcode_text_tool_calls extracts them via _extract_raw_json_tool_calls.

    Schema parameter is accepted but not used for format decisions —
    the conservative string-content format is always used regardless of schema hints.
    """
    msgs = []
    pending_tool_calls = []
    last_flushed_ids = []

    def text_from_content(content):
        if isinstance(content, str):
            return content
        text = ""
        for part in content or []:
            if isinstance(part, str):
                text += part
                continue
            if not isinstance(part, dict):
                continue
            if part.get("type") in ("input_text", "output_text", "text"):
                text += part.get("text", "")
        return text

    def flush_tool_calls():
        nonlocal pending_tool_calls, last_flushed_ids
        if not pending_tool_calls:
            return
        last_flushed_ids = [tc["id"] for tc in pending_tool_calls]
        # Tool calls as plain text in assistant message
        tc_text = "\n".join(
            json.dumps(tc, ensure_ascii=False) for tc in pending_tool_calls
        )
        msgs.append({"role": "assistant", "content": tc_text})
        pending_tool_calls = []

    if instructions:
        msgs.append({"role": "user", "content": instructions})

    if isinstance(input_data, str):
        msgs.append({"role": "user", "content": input_data})
        return msgs
    if not isinstance(input_data, list):
        return msgs

    for item in input_data:
        if not isinstance(item, dict):
            continue
        t = item.get("type")
        if t == "function_call":
            tcid = item.get("call_id") or item.get("id") or uid("call")
            name = item.get("name") or "exec_command"
            pending_tool_calls.append({
                "type": "tool-call",
                "id": tcid,
                "name": name,
                "arguments": item.get("arguments") or "{}",
            })
            continue
        flush_tool_calls()
        if t == "message":
            role = item.get("role", "user")
            if role not in ("user", "assistant"):
                role = "user"
            text = text_from_content(item.get("content", []))
            msgs.append({"role": role, "content": text})
        elif t == "function_call_output":
            output = item.get("output", "")
            if not isinstance(output, str):
                output = json.dumps(output, ensure_ascii=False)
            # /alpha/generate expects string content for ALL messages
            msgs.append({"role": "user", "content": output[:8000]})
    flush_tool_calls()
    return msgs

def oa_convert_tools(tools, strict=False):
    if not tools:
        return None
    out = []
    for t in tools:
        if t.get("type") != "function":
            continue
        fn = t.get("function", {})
        name = ""
        if fn:
            name = (fn.get("name") or "").strip()
        else:
            name = (t.get("name") or "").strip()
        if not name or name == "null":
            continue
        if fn:
            entry = dict(t)
            if strict and "strict" not in fn:
                entry["function"] = dict(fn, strict=True)
            out.append(entry)
        else:
            entry = {
                "type": "function",
                "function": {"name": name, "description": t.get("description", ""),
                             "parameters": t.get("parameters", {})}
            }
            if strict:
                entry["function"]["strict"] = True
            out.append(entry)
    return out or None

def oa_resp_to_responses(chat_resp, model, resp_id=None):
    choice = chat_resp["choices"][0]
    msg = choice["message"]
    content = msg.get("content") or ""
    # Treat a missing or null finish_reason as a normal stop.
    finish = choice.get("finish_reason") or "stop"
    fm = {"stop": "completed", "length": "incomplete", "tool_calls": "completed", "content_filter": "incomplete"}
    status = fm.get(finish, "incomplete")
    outputs = []
    if content:
        outputs.append({"type": "message", "id": uid("msg"), "role": "assistant", "status": "completed",
                        "content": [{"type": "output_text", "text": content, "annotations": []}]})
    for tc in msg.get("tool_calls") or []:
        fn = tc.get("function", {})
        outputs.append({"type": "function_call", "id": uid("fc"), "call_id": tc.get("id"),
                        "name": fn.get("name"), "arguments": fn.get("arguments", "{}"), "status": "completed"})
    # Some providers send "usage": null or omit prompt_tokens_details.
    usage = chat_resp.get("usage") or {}
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    return {"id": resp_id or uid("resp"), "object": "response", "created": int(time.time()),
            "model": model, "status": status, "output": outputs,
            "usage": {"input_tokens": usage.get("prompt_tokens", 0),
                      "output_tokens": usage.get("completion_tokens", 0),
                      "total_tokens": usage.get("total_tokens", 0),
                      "input_tokens_details": {"cached_tokens": cached}}}

def oa_stream_to_sse(chat_stream, model, req_id, _reasoning_out=None):
    resp_id = req_id or uid("resp")
    msg_id = uid("msg")
    text_buf = ""
    reasoning_buf = ""
    reasoning_opened = False
    tc_buf = {}
    fr = None
    msg_opened = False

    yield emit("response.created", {"type": "response.created",
        "response": {"id": resp_id, "object": "response", "model": model,
                     "status": "in_progress", "created": int(time.time()), "output": []}})
    yield emit("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})

    for line in _stream_with_idle_timeout(chat_stream):
        line = line.decode("utf-8", errors="replace").strip()
        if not line or line.startswith(":") or line == "data: [DONE]":
            continue
        if not line.startswith("data: "):
            continue
        try:
            chunk = json.loads(line[6:])
        except json.JSONDecodeError:
            continue
        choices = chunk.get("choices", [])
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        # Keep the last non-null finish_reason; later chunks may carry null.
        fr = choices[0].get("finish_reason") or fr

        rc = delta.get("reasoning_content") or delta.get("reasoning")
        if rc:
            if not reasoning_opened:
                reasoning_opened = True
            reasoning_buf += rc
            yield emit("response.reasoning.delta", {"type": "response.reasoning.delta", "delta": rc})

        content = delta.get("content")
        if content:
            if not msg_opened:
                msg_id = uid("msg")
                yield emit("response.output_item.added", {"type": "response.output_item.added",
                    "item": {"type": "message", "id": msg_id, "role": "assistant", "status": "in_progress", "content": []}})
                yield emit("response.content_part.added", {"type": "response.content_part.added",
                    "part": {"type": "output_text", "text": "", "annotations": []}, "item_id": msg_id})
                msg_opened = True
            text_buf += content
            yield emit("response.output_text.delta", {"type": "response.output_text.delta",
                        "delta": content, "item_id": msg_id, "content_index": 0})

        for tc in delta.get("tool_calls") or []:
            idx = tc.get("index", 0)
            fn = tc.get("function") or {}
            if idx not in tc_buf:
                fid = uid("fc")
                # A missing or null id falls back to the locally generated item id.
                tc_buf[idx] = {"id": fid, "call_id": tc.get("id") or fid, "name": fn.get("name") or "", "args": ""}
                yield emit("response.output_item.added", {"type": "response.output_item.added",
                    "item": {"type": "function_call", "id": fid, "call_id": tc_buf[idx]["call_id"],
                             "name": tc_buf[idx]["name"], "arguments": "", "status": "in_progress"}})
            if fn.get("name"):
                tc_buf[idx]["name"] = fn["name"]
            if fn.get("arguments"):
                tc_buf[idx]["args"] += fn["arguments"]
                # The SSE event name has to match the payload "type".
                yield emit("response.function_call_arguments.delta", {"type": "response.function_call_arguments.delta",
                            "delta": fn["arguments"], "item_id": tc_buf[idx]["id"]})

    reasoning_rsn_id = uid("rsn") if reasoning_buf else None
    if reasoning_opened:
        yield emit("response.reasoning.done", {"type": "response.reasoning.done",
                    "item_id": reasoning_rsn_id, "text": reasoning_buf})

    if msg_opened:
        yield emit("response.output_text.done", {"type": "response.output_text.done",
                    "text": text_buf, "item_id": msg_id, "content_index": 0})
        yield emit("response.content_part.done", {"type": "response.content_part.done",
                    "part": {"type": "output_text", "text": text_buf, "annotations": []}, "item_id": msg_id})
        yield emit("response.output_item.done", {"type": "response.output_item.done",
            "item": {"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
                     "content": [{"type": "output_text", "text": text_buf, "annotations": []}]}})

    for idx in sorted(tc_buf):
        t = tc_buf[idx]
        yield emit("response.function_call_arguments.done", {"type": "response.function_call_arguments.done",
                    "item_id": t["id"], "name": t["name"], "arguments": t["args"]})
        yield emit("response.output_item.done", {"type": "response.output_item.done",
            "item": {"type": "function_call", "id": t["id"], "call_id": t["call_id"],
                     "name": t["name"], "arguments": t["args"], "status": "completed"}})

    fm = {"stop": "completed", "length": "incomplete", "tool_calls": "completed", "content_filter": "incomplete"}
    status = fm.get(fr, "incomplete")
    final_out = []
    if reasoning_buf:
        final_out.append({"type": "reasoning", "id": reasoning_rsn_id, "status": "completed",
                          "content": [{"type": "text", "text": reasoning_buf}]})
    if msg_opened:
        msg_content = [{"type": "output_text", "text": text_buf, "annotations": []}]
        final_out.append({"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
                          "content": msg_content})
    for idx in sorted(tc_buf):
        t = tc_buf[idx]
        final_out.append({"type": "function_call", "id": t["id"], "call_id": t["call_id"],
                          "name": t["name"], "arguments": t["args"], "status": "completed"})
    yield emit("response.completed", {"type": "response.completed",
        "response": {"id": resp_id, "object": "response", "model": model,
                     "status": status, "created": int(time.time()), "output": final_out}})
    if _reasoning_out is not None:
        _reasoning_out["text"] = reasoning_buf
        _reasoning_out["tool_calls"] = [tc_buf[i] for i in sorted(tc_buf)] if tc_buf else []
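
# Illustrative sketch only, not called anywhere in the proxy. It isolates the
# line filtering that the loop in oa_stream_to_sse applies to each raw SSE line
# and returns the delta dict, or None when the line carries no usable payload.
# The name _example_chat_sse_delta is hypothetical, and the payload shape is
# assumed to follow the usual Chat Completions chunk layout.
def _example_chat_sse_delta(raw):
    import json  # local import keeps the sketch self-contained
    line = raw.decode("utf-8", errors="replace").strip()
    # Blank lines, SSE comments (":...") and the [DONE] sentinel are skipped.
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    try:
        chunk = json.loads(line[6:])
    except json.JSONDecodeError:
        return None
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta") or {}
# Usage note: a line such as b'data: {"choices":[{"delta":{"content":"hi"}}]}'
# produces the delta dict, while a keep-alive comment such as b": ping" produces None.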

# ═══════════════════════════════════════════════════════════════════
# Anthropic backend
# ═══════════════════════════════════════════════════════════════════

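def an_input_to_messages(input_data):
    """Convert Responses API input (a string or a list of items) into Anthropic messages.

    Message items become user/assistant messages, function_call items become
    tool_use blocks, and function_call_output items become tool_result blocks.
    """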
    msgs = []
    if isinstance(input_data, str):
        msgs.append({"role": "user", "content": input_data})
    elif isinstance(input_data, list):
        for item in input_data:
            t = item.get("type")
            if t == "message":
                role = item.get("role", "user")
                if role == "developer":
                    role = "user"
                text = ""
                thinking_blocks = []
                content = item.get("content", [])
                if isinstance(content, str):
                    # Message content may be a plain string instead of a list of parts.
                    text, content = content, []
                for part in content or []:
                    pt = part.get("type", "")
                    if pt in ("input_text", "output_text"):
                        text += part.get("text", "")
                    elif pt in ("reasoning", "thinking"):
                        thinking_text = ""
                        for rp in part.get("content", []):
                            thinking_text += rp.get("text", "")
                        if thinking_text:
                            thinking_blocks.append({"type": "thinking", "thinking": thinking_text, "signature": part.get("signature", "")})
                if role == "assistant":
                    content_parts = []
                    if thinking_blocks:
                        content_parts.extend(thinking_blocks)
                    if text:
                        content_parts.append({"type": "text", "text": text})
                    msgs.append({"role": "assistant", "content": content_parts if content_parts else text})
                else:
                    msgs.append({"role": "user", "content": text})
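            elif t == "function_call":
                # Malformed, empty, or non-object arguments fall back to an empty input object.
                try:
                    tool_input = json.loads(item.get("arguments") or "{}")
                except (TypeError, ValueError):
                    tool_input = {}
                if not isinstance(tool_input, dict):
                    tool_input = {}
                msgs.append({"role": "assistant", "content": [
                    {"type": "tool_use", "id": item.get("call_id", item.get("id", uid("tu"))),
                     "name": item.get("name", ""),
                     "input": tool_input}
                ]})
            elif t == "function_call_output":
                # tool_use_id must match the tool_use id above, which is taken from call_id.
                msgs.append({"role": "user", "content": [
                    {"type": "tool_result", "tool_use_id": item.get("call_id", item.get("id", "")),
                     "content": item.get("output", "")}
                ]})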
    return msgs
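
# Illustrative sketch only, not called anywhere in the proxy. The Anthropic
# Messages API expects each tool_result to carry the id of the tool_use block it
# answers, so a Responses function_call and its function_call_output have to be
# keyed by the same value. The helper name _example_pair_tool_items is
# hypothetical; it is a simplified view of the two branches above, not a
# replacement for them.
def _example_pair_tool_items(call_item, output_item):
    import json  # local import keeps the sketch self-contained
    call_id = call_item.get("call_id") or call_item.get("id", "")
    try:
        tool_input = json.loads(call_item.get("arguments") or "{}")
    except (TypeError, ValueError):
        tool_input = {}  # malformed arguments degrade to an empty input
    result_id = output_item.get("call_id") or output_item.get("id", "")
    return [
        {"role": "assistant", "content": [
            {"type": "tool_use", "id": call_id,
             "name": call_item.get("name", ""), "input": tool_input}]},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": result_id,
             "content": output_item.get("output", "")}]},
    ]
# Usage note: when both items share the same call_id, the tool_use "id" and the
# tool_result "tool_use_id" in the returned pair are identical.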

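def an_convert_tools(tools):
    """Convert function tools into Anthropic tool definitions.

    Accepts both the nested Chat Completions shape ({"function": {...}}) and the
    flat Responses shape. Non-function tools are skipped. Returns None when
    nothing is left to send.
    """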
    if not tools:
        return None
    out = []
    for t in tools:
        if t.get("type") != "function":
            continue
        fn = t.get("function", {})
        if fn:
            out.append({"name": fn.get("name"), "description": fn.get("description", ""),
                        "input_schema": fn.get("parameters", {"type": "object", "properties": {}})})
        else:
            out.append({"name": t.get("name"), "description": t.get("description", ""),
                        "input_schema": t.get("parameters", {"type": "object", "properties": {}})})
    return out or None

def an_resp_to_responses(anthro_resp, model, resp_id=None):
    blocks = anthro_resp.get("content", [])
    sr = anthro_resp.get("stop_reason", "end_turn")
    sm = {"end_turn": "completed", "max_tokens": "incomplete", "stop_sequence": "completed", "tool_use": "completed"}
    status = sm.get(sr, "incomplete")
    outputs = []
    for b in blocks:
        bt = b.get("type", "")
        if bt == "text":
            outputs.append({"type": "message", "id": uid("msg"), "role": "assistant", "status": "completed",
                            "content": [{"type": "output_text", "text": b.get("text", ""), "annotations": []}]})
        elif bt == "tool_use":
            outputs.append({"type": "function_call", "id": uid("fc"), "call_id": b.get("id", ""),
                            "name": b.get("name", ""), "arguments": json.dumps(b.get("input", {})),
                            "status": "completed"})
        elif bt == "thinking":
            outputs.append({"type": "reasoning", "id": uid("rsn"), "status": "completed",
                            "content": [{"type": "text", "text": b.get("thinking", "")}]})
    usage = anthro_resp.get("usage", {})
    return {"id": resp_id or uid("resp"), "object": "response", "created": int(time.time()),
            "model": model, "status": status, "output": outputs,
            "usage": {"input_tokens": usage.get("input_tokens", 0),
                      "output_tokens": usage.get("output_tokens", 0),
                      "total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
                      "input_tokens_details": {"cached_tokens": 0}}}

def an_stream_to_sse(stream, model, req_id):
    """Translate an Anthropic Messages SSE stream into Responses API SSE events."""
    resp_id = req_id or uid("resp")
    completed = []
    msg_id = uid("msg")
    text_buf = ""
    tc_id = None
    tc_call_id = None
    tc_name = ""
    tc_args = ""
    block_type = None
    stop_reason = "end_turn"

    yield emit("response.created", {"type": "response.created",
        "response": {"id": resp_id, "object": "response", "model": model,
                     "status": "in_progress", "created": int(time.time()), "output": []}})
    yield emit("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})

    for raw in stream:
        line = raw.decode("utf-8", errors="replace").strip()
        if not line:
            continue
        if line.startswith("event: "):
            # The event name is repeated in the data payload's "type" field, so it is not stored.
            continue
        if not line.startswith("data: "):
            continue
        try:
            data = json.loads(line[6:])
        except json.JSONDecodeError:
            continue

        et = data.get("type", "")

        if et == "message_start":
            pass

        elif et == "content_block_start":
            cb_type = data.get("content_block", {}).get("type", "")
            block_type = cb_type
            if cb_type == "text":
                msg_id = uid("msg")
                yield emit("response.output_item.added", {"type": "response.output_item.added",
                    "item": {"type": "message", "id": msg_id, "role": "assistant",
                             "status": "in_progress", "content": []}})
                yield emit("response.content_part.added", {"type": "response.content_part.added",
                    "part": {"type": "output_text", "text": "", "annotations": []}, "item_id": msg_id})
            elif cb_type == "tool_use":
                cb = data.get("content_block", {})
                tc_id = uid("fc")
                tc_call_id = cb.get("id", tc_id)
                tc_name = cb.get("name", "")
                yield emit("response.output_item.added", {"type": "response.output_item.added",
                    "item": {"type": "function_call", "id": tc_id, "call_id": tc_call_id,
                             "name": tc_name, "arguments": "", "status": "in_progress"}})
            elif cb_type == "thinking":
                pass

        elif et == "content_block_delta":
            dd = data.get("delta", {})
            dt = dd.get("type", "")
            if dt == "text_delta":
                txt = dd.get("text", "")
                text_buf += txt
                yield emit("response.output_text.delta", {"type": "response.output_text.delta",
                            "delta": txt, "item_id": msg_id, "content_index": 0})
            elif dt == "input_json_delta":
                pj = dd.get("partial_json", "")
                tc_args += pj
                # The SSE event name has to match the payload "type".
                yield emit("response.function_call_arguments.delta", {"type": "response.function_call_arguments.delta",
                            "delta": pj, "item_id": tc_id})
            elif dt == "thinking_delta":
                tk = dd.get("thinking", "")
                yield emit("response.reasoning.delta", {"type": "response.reasoning.delta", "delta": tk})

        elif et == "content_block_stop":
            if block_type == "text":
                yield emit("response.output_text.done", {"type": "response.output_text.done",
                            "text": text_buf, "item_id": msg_id, "content_index": 0})
                yield emit("response.content_part.done", {"type": "response.content_part.done",
                    "part": {"type": "output_text", "text": text_buf, "annotations": []}, "item_id": msg_id})
                yield emit("response.output_item.done", {"type": "response.output_item.done",
                    "item": {"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
                             "content": [{"type": "output_text", "text": text_buf, "annotations": []}]}})
                completed.append({"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
                                  "content": [{"type": "output_text", "text": text_buf, "annotations": []}]})
                text_buf = ""
            elif block_type == "tool_use":
                yield emit("response.function_call_arguments.done", {"type": "response.function_call_arguments.done",
                            "item_id": tc_id, "name": tc_name, "arguments": tc_args})
                yield emit("response.output_item.done", {"type": "response.output_item.done",
                    "item": {"type": "function_call", "id": tc_id, "call_id": tc_call_id,
                             "name": tc_name, "arguments": tc_args, "status": "completed"}})
                completed.append({"type": "function_call", "id": tc_id, "call_id": tc_call_id,
                                  "name": tc_name, "arguments": tc_args, "status": "completed"})
                tc_id = None
                tc_args = ""
            block_type = None

        elif et == "message_delta":
            # Ignore a null stop_reason and keep the previous value.
            stop_reason = data.get("delta", {}).get("stop_reason") or stop_reason

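        elif et == "message_stop":
            break
    else:
        # The upstream stream ended without message_stop: report the response as incomplete.
        stop_reason = None

    # Emit the terminal event after the loop so the client always receives response.completed.
    sm = {"end_turn": "completed", "max_tokens": "incomplete",
          "stop_sequence": "completed", "tool_use": "completed"}
    status = sm.get(stop_reason, "incomplete")
    yield emit("response.completed", {"type": "response.completed",
        "response": {"id": resp_id, "object": "response", "model": model,
                     "status": status, "created": int(time.time()), "output": completed}})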

_DEFAULT_CC_CONFIG = {
    "workingDir": tempfile.gettempdir(),
    "date": "",
    "environment": "windows" if _IS_WINDOWS else "linux",
    "shell": "powershell" if _IS_WINDOWS else "bash",
    "files": [],
    "structure": [],
    "isGitRepo": False,
    "currentBranch": "",
    "mainBranch": "",
    "gitStatus": "",
    "recentCommits": [],
}

def _cc_config():
    cfg = dict(_DEFAULT_CC_CONFIG)
    cfg["date"] = time.strftime("%Y-%m-%d")
    return cfg

def cc_convert_tools(tools):
    return oa_convert_tools(tools)

def _strip_xmlish_tags(text):
    return re.sub(r"<[^>]+>", "", text or "")

def _unwrap_cmd(cmd_val):
    r"""[FIX 11] Self-healing: unwrap double-wrapped cmd values.

    Model sometimes generates: {"cmd": "{\"cmd\": \"actual_command\"}"}
    Detect when cmd value is itself a JSON object with a nested "cmd" key,
    and extract the real command string. Recursively unwraps up to 3 levels.
    """
    if not isinstance(cmd_val, str) or not cmd_val.startswith("{"):
        return cmd_val
    for _ in range(3):
        try:
            inner = json.loads(cmd_val)
            if isinstance(inner, dict) and "cmd" in inner and isinstance(inner["cmd"], str):
                cmd_val = inner["cmd"]
            else:
                break
        except Exception:
            break
    return cmd_val

def _build_explore_cmd(text_for_url):
    """Module-level explore command builder. Extracts repo URL from text,
    builds a curl pipeline to fetch README, contents listing, and releases.
    Used by _parse_commandcode_text_tool_calls (closure wrapper) and
    cc_stream_to_sse (stuck recovery heuristic)."""
    if not text_for_url:
        return None, None
    url_m = re.search(r"https?://[^\s\]'\\>\",]+", text_for_url)
    repo_url = url_m.group(0).rstrip(")].,;'\\\"") if url_m else ""
    if not repo_url and isinstance(text_for_url, str):
        try:
            _parsed = json.loads(text_for_url)
            if isinstance(_parsed, list):
                for _item in _parsed:
                    _c = _item.get("content", "") if isinstance(_item, dict) else str(_item)
                    url_m2 = re.search(r"https?://[^\s\]'\\>\",]+", _c)
                    if url_m2:
                        repo_url = url_m2.group(0).rstrip(")].,;'\\\"")
                        break
        except Exception:
            pass
    if not repo_url:
        return None, None
    if repo_url.endswith(".git"):
        repo_url = repo_url[:-4]
    if "/api/v1/repos/" not in repo_url:
        host_m = re.match(r"(https?://[^/]+)/(.*)", repo_url)
        if host_m:
            host, path = host_m.groups()
            api_base = f"{host}/api/v1/repos/{path}"
        else:
            api_base = repo_url.replace("/admin/", "/api/v1/repos/")
    else:
        api_base = repo_url
    if _IS_WINDOWS:
        cmd = (
            f"cd $env:TEMP; "
            f"$r = Invoke-WebRequest -Uri '{api_base}/contents/README.md' -UseBasicParsing -TimeoutSec 15 2>$null; "
            f"if ($r) {{ $j = $r.Content | ConvertFrom-Json; [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($j.content)) | Select-Object -First 600 }}; "
            f"$r2 = Invoke-WebRequest -Uri '{api_base}/contents' -UseBasicParsing -TimeoutSec 15 2>$null; "
            f"if ($r2) {{ $j2 = $r2.Content | ConvertFrom-Json; $j2 | Select-Object -First 50 | ForEach-Object {{ $_.path + ' ' + $_.type }} }}; "
            f"$r3 = Invoke-WebRequest -Uri '{api_base}/releases' -UseBasicParsing -TimeoutSec 15 2>$null; "
            f"if ($r3) {{ ($r3.Content | ConvertFrom-Json | Select-Object -First 3 | ConvertTo-Json).Substring(0, [Math]::Min(2000, ($r3.Content | ConvertFrom-Json | Select-Object -First 3 | ConvertTo-Json).Length)) }}"
        )
    else:
        cmd = (
            f"cd /tmp && "
            f"curl -sL --max-time 15 '{api_base}/contents/README.md' 2>/dev/null | "
            f"python3 -c \"import sys,json,base64; d=json.load(sys.stdin); print(base64.b64decode(d['content']).decode())\" 2>/dev/null | head -600 && "
            # The listing avoids a nested same-quote f-string, which is a SyntaxError before Python 3.12.
            f"curl -sL --max-time 15 '{api_base}/contents' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print('\\n'.join(str(x.get('path')) + ' ' + str(x.get('type')) for x in d[:50]))\" 2>/dev/null && "
            f"curl -sL --max-time 15 '{api_base}/releases' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print(json.dumps(d[:3], indent=2)[:2000])\" 2>/dev/null"
        )
    return cmd, "Explore repository to understand the app and gather README, root contents, and releases for the landing page."
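
# Illustrative sketch only, not called anywhere in the proxy. It isolates how
# _build_explore_cmd turns a repository URL into the "/api/v1/repos/..." base
# that the curl and Invoke-WebRequest pipelines query. The helper name
# _example_repo_api_base is hypothetical, and the "/admin/" replacement fallback
# of the real function is left out for brevity (an input without a path is
# returned unchanged here).
def _example_repo_api_base(repo_url):
    import re  # local import keeps the sketch self-contained
    if repo_url.endswith(".git"):
        repo_url = repo_url[:-4]
    if "/api/v1/repos/" in repo_url:
        return repo_url  # already an API URL
    m = re.match(r"(https?://[^/]+)/(.*)", repo_url)
    if not m:
        return repo_url
    host, path = m.groups()
    return f"{host}/api/v1/repos/{path}"
# Usage note: a clone URL on a made-up host, "https://git.example.com/owner/repo.git",
# maps to "https://git.example.com/api/v1/repos/owner/repo".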

def _parse_commandcode_text_tool_calls(text):
    """Parse CommandCode's text-form tool calls into Responses function calls.

    Handles these formats:
      1. XML: ``<tool_call name="bash"><parameter name="command">...</parameter>`` (original)
      2. Function: ``<function=bash>...</function>`` (original)
      3. [FIX 5] Raw JSON inline: {"type":"tool-call","id":"...","name":"exec_command","arguments":"{...}"}
      4. [FIX 17] DSML ``tool_calls`` / ``invoke`` / ``parameter`` blocks
      5. [FIX 16] Native ``<bash>`` blocks
      6. [FIX 15] Native ``<explore_agent>`` blocks
      7. [FIX 24] ``<require_escalation>`` / ``<request_escalation_permission>`` blocks

    Format 3 exists because cc_input_to_messages sends tool calls as inline JSON text,
    and the CC model echoes this format back in its response.
    Extraction is done by _extract_raw_json_tool_calls(), which runs after the
    XML pattern loop. See that function for details on malformed-JSON handling.

    Tolerant of: unescaped inner quotes, unbalanced braces, missing type/id fields,
    sandbox_permissions at top level vs nested inside arguments, etc.
    """
    calls = []
    if not text:
        return calls

    _build_explore_cmd_local = _build_explore_cmd

    # [FIX 17] DSML tool_call blocks used by the model now.
    # Example:
    #   <｜｜DSML｜｜tool_calls>
    #   <｜｜DSML｜｜invoke name="exec">
    #   <｜｜DSML｜｜parameter name="command" string="true">curl ...</｜｜DSML｜｜parameter>
    #   <｜｜DSML｜｜parameter name="sandbox_permissions" string="true">require_escalated</｜｜DSML｜｜parameter>
    #   <｜｜DSML｜｜parameter name="justification" string="true">...</｜｜DSML｜｜parameter>
    #   <｜｜DSML｜｜parameter name="prefix_rule" string="true">["/bin/bash", "-lc", "curl ..."]</｜｜DSML｜｜parameter>
    #   </｜｜DSML｜｜invoke>
    #   </｜｜DSML｜｜tool_calls>
    for m in re.finditer(r"<[^>]*tool_calls[^>]*>(.*?)</[^>]*tool_calls[^>]*>", text, re.DOTALL | re.IGNORECASE):
        block = m.group(1) or ""
        for im in re.finditer(r"<[^>]*invoke[^>]*name=\"([^\"]+)\"[^>]*>(.*?)</[^>]*invoke>", block, re.DOTALL | re.IGNORECASE):
            raw_name = (im.group(1) or "").strip()
            body = (im.group(2) or "").strip()
            if not body:
                continue
            cmd = None
            sandbox_permissions = None
            justification = None
            # Parameter tags are the canonical source.
            for pm in re.finditer(r"<[^>]*parameter[^>]*name=\"([^\"]+)\"[^>]*>(.*?)</[^>]*parameter>", body, re.DOTALL | re.IGNORECASE):
                key = (pm.group(1) or "").strip().lower()
                val = _strip_xmlish_tags(pm.group(2)).strip()
                # [FIX 21] Accept both "command" and "cmd" parameter names.
                # The tool schema defines the parameter as "cmd" (see exec_command schema),
                # but the model sometimes uses "command" (especially from prefix_rule fallback).
                # Previously only "command" was accepted, so DSML blocks with name="cmd"
                # were silently dropped — causing Codex CLI to stop mid-task.
                if key in ("command", "cmd"):
                    cmd = val
                elif key == "prefix_rule" and not cmd:
                    try:
                        pr_obj = json.loads(val)
                    except Exception:
                        pr_obj = None
                    if isinstance(pr_obj, list) and pr_obj and isinstance(pr_obj[-1], str):
                        cmd = pr_obj[-1]
                elif key == "sandbox_permissions":
                    sandbox_permissions = val
                elif key == "justification":
                    justification = val

            # [FIX 20] Support explore / explore_agent in DSML blocks
            is_explore = raw_name.lower() in ("explore", "explore_agent")
            if is_explore:
                explore_cmd, explore_just = _build_explore_cmd_local(body)
                if explore_cmd:
                    cmd = explore_cmd
                    justification = explore_just

            # Fallback: if the body contains a raw JSON command.
            if not cmd:
                jm = re.search(r'"(?:command|cmd)"\s*:\s*"((?:[^"\\]|\\.)*)"', body, re.DOTALL)
                if jm:
                    cmd = jm.group(1).replace('\\n', '\n').replace('\\"', '"').strip()
            if not cmd:
                continue
            # [FIX 19] Map execute_request and similar aliases to exec_command (the CLI only supports exec_command).
            # [FIX 20] Map explore and explore_agent to exec_command as well.
            shell_aliases = (
                "exec", "bash", "shell", "terminal", "run_command", "execute_request",
                "execute_command", "run_shell_command", "run_shell", "run",
                "explore", "explore_agent",
            )
            tool_name = "exec_command" if raw_name.lower() in shell_aliases else raw_name
            args = {"cmd": _unwrap_cmd(cmd)}
            if sandbox_permissions:
                args["sandbox_permissions"] = sandbox_permissions if sandbox_permissions in ("use_default", "require_escalated", "with_user_approval") else "require_escalated"
            if justification:
                args["justification"] = justification
            calls.append({
                "full_match": m.group(0),
                "name": tool_name,
                "arguments": json.dumps(args, ensure_ascii=False),
            })

    # [FIX 16] Native <bash> blocks from CommandCode.
    # Example:
    #   <bash>
    #   sandbox_permissions: require_escalated
    #   justification: ...
    #   prefix_rule: ["/bin/bash", "-lc", "curl ..."]
    #   </bash>
    # Convert into exec_command calls by extracting the command from prefix_rule.
    for m in re.finditer(r"<bash>(.*?)</bash>", text, re.DOTALL | re.IGNORECASE):
        body = (m.group(1) or "").strip()
        if not body:
            continue
        sandbox_permissions = None
        justification = None
        cmd = None
        # Try line-oriented parsing first.
        for line in body.splitlines():
            s = line.strip()
            if s.lower().startswith("sandbox_permissions:"):
                sandbox_permissions = s.split(":", 1)[1].strip()
            elif s.lower().startswith("justification:"):
                justification = s.split(":", 1)[1].strip()
            elif s.lower().startswith("prefix_rule:"):
                pr = s.split(":", 1)[1].strip()
                try:
                    pr_obj = json.loads(pr)
                except Exception:
                    pr_obj = None
                if isinstance(pr_obj, list) and pr_obj:
                    # If the last arg exists, it is typically the shell command.
                    cmd = pr_obj[-1] if isinstance(pr_obj[-1], str) else None
                elif pr.startswith("[") and pr.endswith("]"):
                    parts = re.findall(r'"((?:[^"\\]|\\.)*)"', pr)
                    if parts:
                        # Decode JSON escapes; unicode_escape would corrupt non-ASCII text.
                        try:
                            cmd = json.loads('"' + parts[-1] + '"')
                        except ValueError:
                            cmd = parts[-1]
        # Fallback: grab a shell-looking line if prefix_rule wasn't parseable.
        if not cmd:
            for line in body.splitlines():
                s = line.strip()
                if re.match(r"^(curl|wget|python3?|node|npm|pnpm|yarn|cat|ls|find|grep|rg|sed|awk|git|mkdir|touch|printf|echo)\b", s):
                    cmd = s
                    break
        if not cmd:
            continue
        args = {"cmd": cmd}
        if sandbox_permissions:
            args["sandbox_permissions"] = sandbox_permissions if sandbox_permissions in ("use_default", "require_escalated", "with_user_approval") else "require_escalated"
        if justification:
            args["justification"] = justification
        calls.append({
            "full_match": m.group(0),
            "name": "exec_command",
            "arguments": json.dumps(args, ensure_ascii=False),
        })

    # [FIX 15] Native <explore_agent> blocks from CommandCode.
    # Format seen in logs:
    #   <explore_agent>\nmessages: [{...}]\n</explore_agent>
    # Treat as an assistant-requested agent call so the loop can continue.
    for m in re.finditer(r"<explore_agent>(.*?)</explore_agent>|<explore_agent>\s*messages:\s*(\[.*?\])", text, re.DOTALL | re.IGNORECASE):
        body = (m.group(1) or m.group(2) or "").strip()
        # _build_explore_cmd searches the raw text for a URL and falls back to parsing it
        # as a JSON message list, so the body is passed through without decoding it here.
        cmd, justification = _build_explore_cmd_local(body)
        if not cmd:
            cmd = "echo 'explore_agent: unable to extract repository URL'"
            justification = "Fallback for explore_agent block without URL."
        args = {"cmd": cmd}
        if justification:
            args["justification"] = justification
        calls.append({
            "full_match": m.group(0),
            "name": "exec_command",
            "arguments": json.dumps(args, ensure_ascii=False),
        })

    if not calls and text.count("<explore_agent>") >= 2:
        url_m = re.search(r"https?://[^\s\]'\\>\"]+", text)
        if not url_m:
            for prev_url in _last_user_urls:
                url_m = re.search(r"https?://[^\s\]'\\>\"]+", prev_url)
                if url_m:
                    break
        if url_m:
            explore_url = url_m.group(0).rstrip(")].,;'\\")
            cmd, justification = _build_explore_cmd_local(explore_url)
            if cmd:
                calls.append({
                    "full_match": "<explore_agent>...",
                    "name": "exec_command",
                    "arguments": json.dumps({"cmd": cmd, "justification": justification or "Explore repository"}, ensure_ascii=False),
                })

    # [FIX 24] Handle <require_escalation> and <request_escalation_permission> blocks.
    # The model produces these when it wants elevated permissions but the CC
    # adapter doesn't support them. Synthesize a proceed command so the loop continues.
    if not calls:
        for m in re.finditer(r"<(?:require_escalation|request_escalation_permission)>(.*?)</(?:require_escalation|request_escalation_permission)>", text, re.DOTALL | re.IGNORECASE):
            body_escal = (m.group(1) or "").strip()
            _inner_url_m = re.search(r"https?://[^\s\]'\\>\",]+", body_escal)
            if _inner_url_m:
                _e_url = _inner_url_m.group(0).rstrip(")].,;'\\\"")
                _e_cmd, _e_just = _build_explore_cmd_local(_e_url)
                if _e_cmd:
                    calls.append({
                        "full_match": m.group(0),
                        "name": "exec_command",
                        "arguments": json.dumps({"cmd": _e_cmd, "justification": _e_just or "Escalation block with URL — auto-proceed"}, ensure_ascii=False),
                    })
                    continue
            if not calls:
                calls.append({
                    "full_match": m.group(0),
                    "name": "exec_command",
                    "arguments": json.dumps({"cmd": "echo 'escalation: auto-proceeding — no specific command in escalation block'", "justification": "Auto-proceed past escalation request"}, ensure_ascii=False),
                })

    # [FIX 24b] Bare <require_escalation ... /> or <request_escalation_permission ... />
    # without closing tags. Just auto-proceed.
    if not calls and re.search(r"<(?:require_escalation|request_escalation_permission)[\s/>]", text, re.IGNORECASE):
        calls.append({
            "full_match": "<escalation_bare/>",
            "name": "exec_command",
            "arguments": json.dumps({"cmd": "echo 'escalation: auto-proceeding past bare escalation tag'", "justification": "Auto-proceed past bare escalation tag"}, ensure_ascii=False),
        })

    patterns = [
        r"<tool_call(?:\s+name=['\"]?([^'\">\s]+)['\"]?)?>(.*?)</tool_call[)]?>",
        r"<function=(\w+)>(.*?)</function>",
        # [FIX 14] Actual CC model output: <tool_call type="bash">\n{"command":"...", "description":"..."}
        # The closing </tool_call> tag may be missing; the body is a raw JSON object.
        r"<tool_call(?:\s+type=['\"]?(\w+)['\"]?)?>\s*(\{.*?\})(?:\s*</tool_call)?",
    ]

    def _find_balanced_brace(text, start):
        """Find the closing brace matching text[start], handling quoted strings."""
        if start >= len(text) or text[start] != '{':
            return -1
        depth = 0
        i = start
        in_str = False
        escape = False
        while i < len(text):
            ch = text[i]
            if escape:
                escape = False
            elif ch == '\\':
                escape = True
            elif ch == '"':
                in_str = not in_str
            elif not in_str:
                if ch == '{':
                    depth += 1
                elif ch == '}':
                    depth -= 1
                    if depth == 0:
                        return i
            i += 1
        return -1

    def _extract_field(text, key, end_chars=',}'):
        """Extract a field value after "key": in rough JSON text.

        [FIX 7] Handles values starting with \" (backslash-quote) which occurs when
        the model generates properly-escaped JSON inside a string value.
        Without this fix, _extract_field returns None for escaped values,
        causing sandbox_permissions/justification to not be extracted from
        the parsed args dict (falling through to raw snippet extraction).

        Also tolerant of unescaped quotes inside string values.
        Returns None if key not found or value is empty.
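
        Example (illustrative input, not captured model output):
            _extract_field('{"name": "bash", "id": "c1"}', "name")  ->  'bash'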
        """
        pat = re.compile(r'"' + re.escape(key) + r'"\s*:\s*', re.DOTALL)
        m = pat.search(text)
        if not m:
            return None
        val_start = m.end()
        # Skip leading backslash-escape if the value starts with \" (nested JSON string)
        if val_start < len(text) and text[val_start] == '\\':
            val_start += 1
        # Check if value is a string
        if val_start < len(text) and text[val_start] == '"':
            s = val_start + 1
            buf = []
            while s < len(text):
                ch = text[s]
                if ch == '\\' and s + 1 < len(text):
                    buf.append(text[s+1])
                    s += 2
                elif ch == '"':
                    return ''.join(buf)
                elif ch in end_chars and not buf:
                    return None
                else:
                    buf.append(ch)
                    s += 1
            return ''.join(buf)
        # Object value: find balanced brace
        if val_start < len(text) and text[val_start] == '{':
            end = _find_balanced_brace(text, val_start)
            if end > val_start:
                return text[val_start:end+1]
        return None

    def _extract_args(text):
        """Extract arguments value from tool-call JSON, handling multiple malformed formats.

        [FIX 6] THREE-TIER PARSER: solves the double-wrapped arguments bug.
        The model emits the arguments object in two forms:
            A) Unescaped: "arguments": "{"cmd": "curl ...", "sp": "allow_all"}"
            B) Escaped:   "arguments": "{\\"cmd\\": \\"curl...\\"}"
        The object's extent is found by naive brace counting (braces inside
        string values are counted too), then three parses are tried in order:
            1) json.loads on the raw slice (form A, when it is valid JSON)
            2) replace \\" with " and retry (form B)
            3) unicode_escape-decode the raw slice and retry

        Returns the first candidate that parses, as a JSON string; the caller
        runs json.loads() on it. If all 3 tiers fail, returns the raw slice
        (the caller handles that as a fallback).
        """
        m = re.search(r'"(?:arguments|input)"\s*:\s*"?', text)
        if not m:
            return None
        start = m.end()
        if start < len(text) and text[start] == '"':
            start += 1
        if start >= len(text) or text[start] != '{':
            return None
        depth = 0
        i = start
        while i < len(text):
            ch = text[i]
            if ch == '{':
                depth += 1
            elif ch == '}':
                depth -= 1
                if depth == 0:
                    raw = text[start:i+1]

                    # Try json.loads as-is
                    try:
                        json.loads(raw)
                        return raw
                    except json.JSONDecodeError:
                        pass

                    # Try after unescaping inner \" -> "
                    unescaped = raw.replace('\\"', '"')
                    try:
                        json.loads(unescaped)
                        return unescaped
                    except json.JSONDecodeError:
                        pass

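                    # Try after also decoding backslash escapes (\\n -> \n etc).
                    # Encoding via latin-1 + backslashreplace keeps non-ASCII text intact;
                    # a plain UTF-8 round trip through unicode_escape garbles it.
                    try:
                        fixed = raw.encode('latin-1', 'backslashreplace').decode('unicode_escape')
                        json.loads(fixed)
                        return fixed
                    except Exception:
                        pass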

                    # Give up — return raw text
                    return raw
            i += 1
        return None

    def _extract_raw_json_tool_calls(t):
        """[FIX 5] Extract raw JSON tool-call objects from free text.

        Finds "type":"tool-call" (or tool_call/function_call) in text, then extracts
        name/id/arguments/sandbox_permissions/justification via field-level regex.

        Delegates to _extract_args() for the arguments field (handles unescaped + escaped JSON).
        Delegates to _extract_field() for name/id/sandbox_permissions/justification
          (with FIX 7 for leading-backslash handling).

        Normalizes sandbox_permissions to valid values (use_default|require_escalated|with_user_approval)
        [FIX 6] Prevents double-wrapped args: {"cmd": "{\"cmd\": \"curl...\"}"}
        """
        results = []
        idx = 0
        while True:
            m = re.search(r'"type"\s*:\s*"(tool-call|tool_call|function_call)"', t[idx:])
            if not m:
                break
            tc_pos = idx + m.start()
            snippet = t[tc_pos:]
            idx = tc_pos + 1
            tc_type = m.group(1)
            tc_name = _extract_field(snippet, "name")
            if not tc_name:
                continue
            tc_id = _extract_field(snippet, "id")

            # [FIX 20] Support explore / explore_agent in raw JSON tool calls
            is_explore = tc_name.lower() in ("explore", "explore_agent")

            if is_explore:
                # Build explore command from the whole snippet/arguments
                explore_cmd, explore_just = _build_explore_cmd_local(snippet)
                if explore_cmd:
                    args = {"cmd": explore_cmd}
                    if explore_just:
                        args["justification"] = explore_just
                else:
                    args = {"cmd": "echo 'explore: unable to extract repository URL'", "justification": "Fallback for explore tool call without URL."}
                tool_name = "exec_command"
            else:
                # [FIX 19] Translate execute_request and other variations to exec_command (CLI only supports exec_command)
                tool_name = "exec_command" if tc_name.lower() in ("exec", "bash", "shell", "terminal", "run_command", "execute_request", "execute_command", "run_shell_command", "run_shell", "run") else tc_name
                args_raw = _extract_args(snippet) or _extract_field(snippet, "arguments") or _extract_field(snippet, "input") or "{}"
                try:
                    args = json.loads(args_raw) if args_raw.startswith('{') else {"cmd": args_raw}
                except Exception:
                    args = {"cmd": args_raw}
                if "cmd" not in args or not args["cmd"]:
                    args["cmd"] = str(args)
                # [FIX 11] Self-healing: unwrap double-wrapped cmd values
                args["cmd"] = _unwrap_cmd(args.get("cmd", ""))

            # Normalize sandbox_permissions to valid values
            _VALID_SP = frozenset({"use_default", "require_escalated", "with_user_approval"})
            if "sandbox_permissions" in args:
                spv = args["sandbox_permissions"]
                if isinstance(spv, dict):
                    args["sandbox_permissions"] = "require_escalated" if spv.get("require_escalated") else "use_default"
                elif isinstance(spv, str) and spv not in _VALID_SP:
                    args["sandbox_permissions"] = "require_escalated"
            else:
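                # Fallback: extract from raw snippet (model puts it at top level)
                sp_raw = _extract_field(snippet, "sandbox_permissions")
                if sp_raw:
                    try:
                        if sp_raw.startswith('{'):
                            sp_obj = json.loads(sp_raw)
                            if isinstance(sp_obj, dict) and sp_obj.get("require_escalated"):
                                args["sandbox_permissions"] = "require_escalated"
                        elif sp_raw in _VALID_SP:
                            # Keep an explicit valid value; bool() of any non-empty string is True,
                            # which used to turn "use_default" into an escalation request.
                            args["sandbox_permissions"] = sp_raw
                        else:
                            args["sandbox_permissions"] = "require_escalated"
                    except Exception:
                        pass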
            if "justification" not in args:
                just_raw = _extract_field(snippet, "justification")
                if just_raw:
                    args["justification"] = just_raw
            results.append({
                "full_match": snippet,
                "name": tool_name,
                "arguments": json.dumps(args, ensure_ascii=False),
            })
        return results

    for pat in patterns:
        for m in re.finditer(pat, text, re.DOTALL | re.IGNORECASE):
            if pat.startswith("<function"):
                raw_name = m.group(1)
                body = m.group(2)
            else:
                raw_name = m.group(1) or ""
                body = m.group(2)
                nm = re.search(r"<tool\s+name=[\"']?([^\"'>\s]+)", body, re.IGNORECASE)
                raw_name = raw_name or (nm.group(1) if nm else "bash")
            params = {}
            body_stripped = body.strip()
            if body_stripped.startswith("{"):
                try:
                    obj = json.loads(body_stripped)
                    cmd = obj.get("command") or obj.get("cmd") or ""
                    cmd = _unwrap_cmd(cmd)  # [FIX 11]
                    if cmd:
                        # [FIX 19] Translate execute_request and other variations to exec_command (CLI only supports exec_command)
                        tool_name = "exec_command" if raw_name.lower() in ("exec", "bash", "shell", "terminal", "run_command", "execute_request", "execute_command", "run_shell_command", "run_shell", "run") else raw_name
                        args = {"cmd": cmd}
                        sp = obj.get("sandbox_permissions")
                        if isinstance(sp, dict) and sp.get("require_escalated"):
                            args["sandbox_permissions"] = "require_escalated"
                        elif isinstance(sp, str):
                            args["sandbox_permissions"] = sp
                        if obj.get("justification"):
                            args["justification"] = obj.get("justification")
                        calls.append({"full_match": m.group(0), "name": tool_name, "arguments": json.dumps(args, ensure_ascii=False)})
                        continue
                except Exception:
                    pass
            for pm in re.finditer(r"<parameter(?:\s+name=[\"']?(\w+)[\"']?|=(\w+))>(.*?)</parameter>", body, re.DOTALL | re.IGNORECASE):
                key = pm.group(1) or pm.group(2) or "text"
                params[key] = _strip_xmlish_tags(pm.group(3)).strip()

            # [FIX 20] Support explore / explore_agent in XML tool calls
            is_explore = raw_name.lower() in ("explore", "explore_agent")
            if is_explore:
                explore_cmd, explore_just = _build_explore_cmd_local(body)
                if explore_cmd:
                    cmd = explore_cmd
                    params["justification"] = explore_just
                else:
                    cmd = ""
            else:
                cmd = params.get("command") or params.get("cmd") or ""

            if not cmd and body_stripped.startswith("{"):
                cm = re.search(r'"(?:command|cmd)"\s*:\s*"(.*?)"\s*,\s*"(?:sandbox_permissions|justification|prefix_rule)"', body, re.DOTALL)
                if not cm:
                    cm = re.search(r'"(?:command|cmd)"\s*:\s*"(.*?)"\s*}', body, re.DOTALL)
                if cm:
                    cmd = cm.group(1)
                    cmd = cmd.replace('\\n', '\n').replace('\\"', '"').strip()
                    cmd = _unwrap_cmd(cmd)  # [FIX 11]
                    if re.search(r'"sandbox_permissions"\s*:\s*\{\s*"require_escalated"\s*:\s*true\s*\}', body, re.DOTALL):
                        params["sandbox_permissions"] = "require_escalated"
                    jm = re.search(r'"justification"\s*:\s*"(.*?)"\s*(?:,|})', body, re.DOTALL)
                    if jm:
                        params["justification"] = jm.group(1).replace('\\n', '\n').replace('\\"', '"').strip()
            if not cmd:
                stripped = _strip_xmlish_tags(body)
                lines = [ln.strip() for ln in stripped.splitlines() if ln.strip()]
                for i, ln in enumerate(lines):
                    if re.match(r"^(curl|wget|python3?|node|npm|pnpm|yarn|cat|ls|find|grep|rg|sed|awk|git|mkdir|touch|printf|echo)\b", ln):
                        cmd = "\n".join(lines[i:])
                        break
                if not cmd and lines:
                    cmd = "\n".join(lines)
            if not cmd:
                continue
            # [FIX 19] Translate execute_request and other variations to exec_command (CLI only supports exec_command)
            # [FIX 20] Translate explore and explore_agent to exec_command
            tool_name = "exec_command" if raw_name.lower() in ("exec", "bash", "shell", "terminal", "run_command", "execute_request", "execute_command", "run_shell_command", "run_shell", "run", "explore", "explore_agent") else raw_name
            args = {"cmd": _unwrap_cmd(cmd)}  # [FIX 11] all paths must unwrap
            if params.get("sandbox_permissions"):
                args["sandbox_permissions"] = params["sandbox_permissions"]
            if params.get("justification"):
                args["justification"] = params["justification"]
            calls.append({"full_match": m.group(0), "name": tool_name, "arguments": json.dumps(args, ensure_ascii=False)})

    # Also extract raw JSON tool-call objects embedded in free text
    calls.extend(_extract_raw_json_tool_calls(text))

    # [FIX 18] Native <todo_write> blocks from the model (used for checklist/task tracking)
    # The model outputs a task checklist in a custom <todo_write> XML tag block:
    #   <todo_write>
    #     <todos>[{"id":"1","status":"in_progress","description":"..."}]</todos>
    #   </todo_write>
    # We parse this and map it to a standard 'TodoWrite' tool call so the CLI agent loop continues execution.
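    # Illustrative mapping (values are made up): the <todos> entry
    #   {"id": "1", "status": "in_progress", "description": "Fix login bug"}
    # becomes this TodoWrite item; "id" is dropped and "activeForm" defaults to the description:
    #   {"content": "Fix login bug", "activeForm": "Fix login bug", "status": "in_progress"}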
    for m in re.finditer(r"<todo_write>(.*?)</todo_write>", text, re.DOTALL | re.IGNORECASE):
        body = (m.group(1) or "").strip()
        if not body:
            continue
        todos_match = re.search(r"<todos>(.*?)</todos>", body, re.DOTALL | re.IGNORECASE)
        if not todos_match:
            continue
        raw_todos_json = todos_match.group(1).strip()
        try:
            raw_todos = json.loads(raw_todos_json)
        except Exception as e:
            print(f"[translate-proxy] [FIX 18] Failed to parse <todos> JSON: {e}", file=sys.stderr)
            raw_todos = None
        if isinstance(raw_todos, list):
            parsed_todos = []
            for item in raw_todos:
                if isinstance(item, dict):
                    desc = item.get("description") or item.get("content") or ""
                    parsed_todos.append({
                        "content": desc,
                        "activeForm": item.get("activeForm") or desc,
                        "status": item.get("status") or "pending"
                    })
            calls.append({
                "full_match": m.group(0),
                "name": "TodoWrite",
                "arguments": json.dumps({"todos": parsed_todos}, ensure_ascii=False)
            })

    # [FIX 11] Self-healing: last-chance sanitization pass on ALL extracted calls
    calls = _sanitize_tool_calls(calls)
    return calls

def _sanitize_tool_calls(calls):
    """[FIX 11/T3] Post-extraction self-healing validation layer.

    Runs AFTER all extraction paths (XML, raw JSON, regex) have produced their
    tool calls. This is the final safety net before calls are returned to the
    streaming/response builder.

    Validates and repairs:
      - Double/triple-wrapped cmd values (recursive unwrap)
      - cmd that looks like JSON object/string instead of shell command
      - cmd containing escaped newlines or quotes that would break bash
      - Empty or whitespace-only cmd → replaced with diagnostic string

    Logs warnings for any repair made (visible in stderr/proxy logs).
    Returns the sanitized list. It always has the same length as the input:
    calls are repaired or flagged in place, never dropped.
    """
    cleaned = []
    for i, call in enumerate(calls):
        # [FIX 18] Skip sanitization pass for non-shell tool calls (e.g., TodoWrite)
        # Sanitization specifically validates and repairs command shell executions (the 'cmd' argument).
        # Running it on other tools without a 'cmd' parameter (like TodoWrite) would falsely flag
        # them as containing JSON garbage or empty commands, corrupting their actual parameters.
        if call.get("name") != "exec_command":
            cleaned.append(call)
            continue

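        try:
            args_raw = call.get("arguments", "{}")
            if isinstance(args_raw, str):
                args = json.loads(args_raw)
            else:
                args = dict(args_raw)
        except Exception:
            cleaned.append(call)
            continue
        if not isinstance(args, dict):
            # Valid JSON but not an object (e.g. a list or string); nothing to repair
            cleaned.append(call)
            continue
        cmd = args.get("cmd", "")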
        repaired = False

        # Detect and unwrap nested JSON cmd values (up to 4 levels deep)
        unwrapped = _unwrap_cmd(cmd)
        if unwrapped != cmd:
            cmd = unwrapped
            args["cmd"] = cmd
            repaired = True

        # Detect cmd that is still a JSON object (unwrap missed it or deeper nesting)
        if isinstance(cmd, str) and cmd.strip().startswith("{"):
            try:
                inner = json.loads(cmd)
                if isinstance(inner, dict):
                    for key in ("cmd", "command", "c"):
                        if key in inner and isinstance(inner[key], str):
                            args["cmd"] = inner[key]
                            repaired = True
                            break
            except Exception:
                pass

        # Detect cmd that looks like a JSON-encoded string with backslash escapes
        _cmd = args.get("cmd", "")
        if isinstance(_cmd, str) and _cmd and ('\\"' in _cmd or "\\n" in _cmd or _cmd.count("{") > _cmd.count("}")):
            try:
                # Encode via latin-1 + backslashreplace so non-ASCII characters survive unicode_escape
                decoded = _cmd.encode("latin-1", "backslashreplace").decode("unicode_escape")
                if decoded != _cmd and not decoded.startswith("{"):
                    args["cmd"] = decoded
                    repaired = True
            except Exception:
                pass

        # Final guard: if cmd is empty or just JSON garbage, make it obvious
        _final_cmd = args.get("cmd", "")
        if not _final_cmd or _final_cmd.strip() in ("{}", "null", "None", ""):
            _safe_preview = str(args_raw)[:200].replace('"', "'").replace('\\', '/')
            args["cmd"] = f"# [CC-SANITIZER] empty cmd recovered from: {_safe_preview}"
            repaired = True
        elif _final_cmd.startswith("{") and len(_final_cmd) < 500:
            # Still looks like JSON — likely unrecoverable, flag it
            _safe_preview = _final_cmd.replace('"', "'").replace('\\', '/')
            args["cmd"] = f"# [CC-SANITIZER] suspicious cmd (still JSON): {_safe_preview}"
            repaired = True

        if repaired:
            print(f"[translate-proxy] [CC-SANITIZER] repaired tool call #{i}: "
                  f"name={call.get('name')} cmd_preview={str(args.get('cmd',''))[:120]}",
                  file=sys.stderr)

        call["arguments"] = json.dumps(args, ensure_ascii=False)
        cleaned.append(call)

    return cleaned

def _parse_cc_line(line):
    """Parse a raw line from CommandCode /alpha/generate, stripping SSE data: prefix."""
    stripped = line.strip()
    if not stripped:
        return None
    if stripped.startswith("data: "):
        stripped = stripped[6:]
    elif stripped.startswith("data:"):
        stripped = stripped[5:]
    if not stripped or stripped == "[DONE]":
        return None
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        return None


def _iter_cc_events(stream):
    """Yield parsed JSON events from a CommandCode /alpha/generate stream.
    Handles raw JSON lines, SSE data: events, and multi-event chunks.
    """
    buf = ""
    for chunk in _stream_with_idle_timeout(stream):
        buf += chunk.decode("utf-8", errors="replace")
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            d = _parse_cc_line(line)
            if d is not None:
                yield d
    # Process remaining buffer (non-streaming single-JSON response)
    if buf.strip():
        if buf.strip().startswith("{"):
            d = _parse_cc_line(buf)
            if d is not None:
                yield d
        else:
            for line in buf.strip().split("\n"):
                d = _parse_cc_line(line)
                if d is not None:
                    yield d


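def cc_resp_to_responses(cc_lines, model, resp_id=None):
    """Build a non-streaming Responses API object from CommandCode output lines.

    Concatenates text-delta events into one assistant message and maps the
    finish-step usage counters. Tool calls embedded in the text are not parsed
    on this path.
    """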
    text = ""
    usage = {}
    if isinstance(cc_lines, str):
        cc_lines = [cc_lines]
    for line in cc_lines:
        d = _parse_cc_line(line)
        if d is None:
            continue
        t = d.get("type", "")
        if t == "text-delta":
            text += d.get("text", "")
        elif t == "finish-step":
            u = d.get("usage", {})
            usage = {
                "input_tokens": u.get("inputTokens", 0),
                "output_tokens": u.get("outputTokens", 0),
                "total_tokens": u.get("inputTokens", 0) + u.get("outputTokens", 0),
            }
    outputs = []
    if text:
        outputs.append({"type": "message", "id": uid("msg"), "role": "assistant",
                         "status": "completed",
                         "content": [{"type": "output_text", "text": text, "annotations": []}]})
    return {"id": resp_id or uid("resp"), "object": "response", "created": int(time.time()),
            "model": model, "status": "completed", "output": outputs,
            "usage": {"input_tokens": usage.get("input_tokens", 0),
                      "output_tokens": usage.get("output_tokens", 0),
                      "total_tokens": usage.get("total_tokens", 0),
                      "input_tokens_details": {"cached_tokens": 0}}}

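def cc_stream_to_sse(cc_stream, model, req_id):
    """Translate a CommandCode event stream into Responses API SSE events.

    Buffers every text-delta until the upstream stream ends, extracts tool calls
    embedded in the text, then emits the message and function_call items
    followed by response.completed.
    """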
    resp_id = req_id or uid("resp")
    msg_id = uid("msg")
    text_buf = ""

    yield emit("response.created", {"type": "response.created",
        "response": {"id": resp_id, "object": "response", "model": model,
                     "status": "in_progress", "created": int(time.time()), "output": []}})
    yield emit("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})

    total_usage = {}
    _event_types_seen = set()
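    _debug_log_path = os.path.expanduser("~/.cache/codex-proxy/cc-debug.log")
    # [FIX 14] always write debug to FILE (not just stderr which may be piped).
    # Create the directory first and fall back to stderr so logging can never abort the stream.
    try:
        os.makedirs(os.path.dirname(_debug_log_path), exist_ok=True)
        _debug_fh = open(_debug_log_path, "a", encoding="utf-8")
    except OSError:
        _debug_fh = None

    def _deflog(*a, **kw):
        print(*a, file=_debug_fh or sys.stderr, flush=True, **kw)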

    for d in _iter_cc_events(cc_stream):
        t = d.get("type", "")
        _event_types_seen.add(t)

        if t == "text-delta":
            txt = d.get("text", "")
            if txt:
                text_buf += txt

        elif t == "finish-step":
            u = d.get("usage", {})
            total_usage = {
                "input_tokens": u.get("inputTokens", 0),
                "output_tokens": u.get("outputTokens", 0),
                "total_tokens": u.get("inputTokens", 0) + u.get("outputTokens", 0),
            }
        else:
            _deflog(f"[CC-DEBUG] unexpected event type: {t} keys={list(d.keys())[:5]} data={str(d)[:200]}")

    _deflog(f"[CC-DEBUG] stream ended. event_types={_event_types_seen} text_buf_len={len(text_buf)}")

    parsed_tool_calls = _parse_commandcode_text_tool_calls(text_buf)
    _deflog(f"[CC-DEBUG] text_buf len={len(text_buf)} parsed_tool_calls={len(parsed_tool_calls)} "
          f"text_preview={text_buf[:500]!r}")
    if parsed_tool_calls:
        for ti, tc in enumerate(parsed_tool_calls):
            _deflog(f"[CC-DEBUG]   tool_call[{ti}] name={tc.get('name')} args_preview={tc.get('arguments','')[:150]!r}")

    # [FIX 13] FALLBACK: if parser returned empty but text contains tool-call patterns,
    # force-extract using regex. This catches cases where model output format
    # doesn't match any of our named patterns (XML/raw JSON/function=).
    if not parsed_tool_calls and len(text_buf) > 20:
        _has_tc_signals = (
            '"type"' in text_buf and ('tool-call' in text_buf or 'tool_call' in text_buf or 'function_call' in text_buf)
        ) or (
            '<tool' in text_buf.lower() and '<parameter' in text_buf.lower()
        ) or (
            '<function=' in text_buf
        ) or (
            '{"cmd":' in text_buf or '{"command":' in text_buf
        )
        if _has_tc_signals:
            _deflog(f"[CC-DEBUG] Parser returned empty but text has tool-call signals! Attempting fallback...")
            # _parse_commandcode_text_tool_calls() already ran the raw JSON extractor on this
            # buffer and found nothing, so go straight to a loose regex scan.
            _fallback_calls = []
            # [FIX 14b] Match BOTH "cmd" and "command" keys (model uses both).
            # The pattern runs through the closing brace so each match is a complete JSON
            # object; without it json.loads() rejected every match.
            for _m in re.finditer(r'\{[^{}]*"(?:command|cmd)"\s*:\s*"(?:[^"\\]|\\.)*"[^{}]*\}', text_buf):
                try:
                    _args = json.loads(_m.group(0))
                    if isinstance(_args, dict) and ("cmd" in _args or "command" in _args):
                        _cmd_val = _unwrap_cmd(_args.get("cmd") or _args.get("command", ""))
                        _args["cmd"] = _cmd_val
                        # Copy description as justification if present
                        if "description" in _args:
                            _args["justification"] = _args["description"]
                        _fallback_calls.append({
                            "full_match": _m.group(0),
                            "name": "exec_command",
                            "arguments": json.dumps(_args, ensure_ascii=False),
                        })
                except Exception:
                    continue
            if _fallback_calls:
                _deflog(f"[CC-DEBUG] Fallback extracted {len(_fallback_calls)} tool calls!")
                for _fi, _fc in enumerate(_fallback_calls):
                    _deflog(f"[CC-DEBUG]   fallback[{_fi}] name={_fc.get('name')} args={_fc.get('arguments','')[:120]!r}")
                parsed_tool_calls = _fallback_calls
            else:
                _deflog(f"[CC-DEBUG] Fallback also failed. text_buf first 500: {text_buf[:500]!r}")

    # [FIX 25] SELF-HEALING STUCK DETECTOR
    # When ALL parsers returned empty and text has intent signals, synthesize a
    # command so the agent loop doesn't stall. This catches:
    #   - Bare text with no tool call format at all
    #   - Unrecognized XML-ish blocks
    #   - Partial JSON (bare "{")
    #   - Model explaining what it wants to do but not producing a tool call
    if not parsed_tool_calls and len(text_buf) > 10:
        _synth_cmd = None
        _synth_just = None
        _tl = text_buf.lower()

        # Heuristic 1: URL in text → fetch it
        _url_in_text = re.search(r"https?://[^\s\]'\\>\",]+", text_buf)
        if _url_in_text:
            _synth_url = _url_in_text.group(0).rstrip(")].,;'\\\"")
            if _IS_WINDOWS:
                _synth_cmd = f"Invoke-WebRequest -Uri '{_synth_url}' -UseBasicParsing -TimeoutSec 15 | Select-Object -ExpandProperty Content | Select-Object -First 200"
            else:
                _synth_cmd = f"curl -sL --max-time 15 '{_synth_url}' 2>/dev/null | head -200"
            _synth_just = "Auto-synthesized: URL detected in text, fetching"

        # Heuristic 2: File path references → list or read
        if not _synth_cmd:
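            # Match against the original text: the lowercased copy would corrupt case-sensitive paths
            _file_m = re.search(r"(?:read|open|view|check|examine|cat|show)\s+(?:the\s+)?(?:file\s+)?[`'\"]?(/[^\s'\"]+\.\w+)", text_buf, re.IGNORECASE)
            if _file_m:
                _fpath = _file_m.group(1)
                if _IS_WINDOWS:
                    _synth_cmd = f"Get-Content '{_fpath}' -ErrorAction SilentlyContinue | Select-Object -First 200; if (-not $?) {{ Get-Item '{_fpath}' | Select-Object Name,Length,LastWriteTime }}"
                else:
                    # head fails on a missing file, so the ls fallback can actually run
                    # (in "cat ... | head", the pipeline status is head's and is always 0)
                    _synth_cmd = f"head -n 200 '{_fpath}' 2>/dev/null || ls -la '{_fpath}'"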
                _synth_just = f"Auto-synthesized: file reference detected ({_fpath})"

        # Heuristic 3: Shell command mentioned in backticks or quotes
        if not _synth_cmd:
            # Capture up to the closing quote/backtick or end of line, not just the first argument
            _shell_m = re.search(r"[`'\"]((?:curl|wget|git|npm|pip|python|ls|cat|grep|find|mkdir|cd|rm|cp|mv|chmod|docker|make|cargo|go)\s[^`'\"\n]+)", text_buf)
            if _shell_m:
                _synth_cmd = _shell_m.group(1)
                _synth_just = "Auto-synthesized: shell command detected in text"

        # Heuristic 4: "explore" or "fetch" intent + last user URL
        if not _synth_cmd and ("explore" in _tl or "fetch" in _tl or "investigate" in _tl or "repository" in _tl):
            for _prev_url in _last_user_urls:
                _url_m2 = re.search(r"https?://[^\s\]'\\>\",]+", _prev_url)
                if _url_m2:
                    _pu = _url_m2.group(0).rstrip(")].,;'\\\"")
                    _ecmd, _ejust = _build_explore_cmd(_pu)
                    if _ecmd:
                        _synth_cmd = _ecmd
                        _synth_just = _ejust or "Auto-synthesized: explore intent with last user URL"
                    break

        # Heuristic 5: Generic "I need to" / "let me" / "I'll" intent with command-like text
        if not _synth_cmd:
            _intent_m = re.search(r"(?:I(?:'ll| will| need to| should)|let me|please)\s+(.+?)(?:\.|!|\n|$)", _tl, re.IGNORECASE)
            if _intent_m:
                _intent_text = _intent_m.group(1).strip()
                if len(_intent_text) > 10 and len(_intent_text) < 200:
                    # Strip quotes and shell metacharacters so model text cannot break out of the quoted string
                    _safe_intent = re.sub(r"[^\w\s.,:/-]", "", _intent_text[:100])
                    if _IS_WINDOWS:
                        _synth_cmd = f"Write-Output 'Stuck recovery: model intent was: {_safe_intent}'"
                    else:
                        _synth_cmd = f"echo 'Stuck recovery: model intent was: {_safe_intent}'"
                    _synth_just = f"Auto-synthesized from intent text: {_intent_text[:80]}"

        if _synth_cmd:
            parsed_tool_calls = [{
                "full_match": "__synth_stuck_recovery__",
                "name": "exec_command",
                "arguments": json.dumps({"cmd": _synth_cmd, "justification": _synth_just or "Auto-synthesized stuck recovery"}, ensure_ascii=False),
            }]
            _deflog(f"[CC-DEBUG] [STUCK-RECOVERY] Synthesized: cmd={_synth_cmd[:120]!r}")
            print(f"[CC-DEBUG] [STUCK-RECOVERY] Synthesized command from text intent", file=sys.stderr, flush=True)

    # Also log to stderr for visibility when not piped
    print(f"[CC-DEBUG] text_buf={len(text_buf)} chars, tool_calls={len(parsed_tool_calls)}", file=sys.stderr, flush=True)

    try:
        _debug_fh.close()
    except Exception:
        pass
    clean_text = text_buf
    for tc in parsed_tool_calls:
        clean_text = clean_text.replace(tc["full_match"], "")
    clean_text = clean_text.strip()

    if clean_text:
        yield emit("response.output_item.added", {"type": "response.output_item.added",
            "item": {"type": "message", "id": msg_id, "role": "assistant", "status": "in_progress", "content": []}})
        yield emit("response.content_part.added", {"type": "response.content_part.added",
            "part": {"type": "output_text", "text": "", "annotations": []}, "item_id": msg_id})
        yield emit("response.output_text.delta", {"type": "response.output_text.delta",
                    "delta": clean_text, "item_id": msg_id, "content_index": 0})
        yield emit("response.output_text.done", {"type": "response.output_text.done",
                    "text": clean_text, "item_id": msg_id, "content_index": 0})
        yield emit("response.content_part.done", {"type": "response.content_part.done",
                    "part": {"type": "output_text", "text": clean_text, "annotations": []}, "item_id": msg_id})
        yield emit("response.output_item.done", {"type": "response.output_item.done",
            "item": {"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
                     "content": [{"type": "output_text", "text": clean_text, "annotations": []}]}})

    function_outputs = []
    for tc in parsed_tool_calls:
        fid = uid("fc")
        call_id = uid("call")
        item = {"type": "function_call", "id": fid, "call_id": call_id,
                "name": tc["name"], "arguments": tc["arguments"], "status": "completed"}
        function_outputs.append(item)
        yield emit("response.output_item.added", {"type": "response.output_item.added", "item": item})
        yield emit("response.function_call_arguments.done", {"type": "response.function_call_arguments.done",
                    "item_id": fid, "name": tc["name"], "arguments": tc["arguments"]})
        yield emit("response.output_item.done", {"type": "response.output_item.done", "item": item})

    final_out = []
    if clean_text:
        final_out.append({"type": "message", "id": msg_id, "role": "assistant", "status": "completed",
                          "content": [{"type": "output_text", "text": clean_text, "annotations": []}]})
    final_out.extend(function_outputs)
    yield emit("response.completed", {"type": "response.completed",
        "response": {"id": resp_id, "object": "response", "model": model,
                     "status": "completed", "created": int(time.time()), "output": final_out,
                     "usage": total_usage}})

# ═══════════════════════════════════════════════════════════════════
# Auto-sensing provider adapter
# ═══════════════════════════════════════════════════════════════════

_SENTINEL = object()

@dataclasses.dataclass
class ProviderSchema:
    """Describes what message formats a provider supports.

    Populated by probing the endpoint and/or analyzing error responses.
    Cached in provider-caps.json so probing is not repeated on every request;
    cached entries expire after 24 hours (see _load_schema()).
    """
    supported_roles: tuple = ("user", "assistant")
    content_type: str = "string"  # "string" | "array"
    content_block_types: tuple = ()  # e.g. ("text", "tool_result", "tool-call")
    tool_result_style: str = "inline"  # "inline" | "tool_result_block" | "anthropic"
    tool_call_style: str = "openai_function"  # "openai_function" | "tool-call" | "anthropic_tool_use"
    accepts_tool_role: bool = False
    accepts_system_role: bool = True
    cc_body_wrap: bool = False  # needs {config, params, threadId} wrapping
    field_names: dict = dataclasses.field(default_factory=dict)
    auth_type: str = ""  # "bearer" | "x-api-key" | "custom"
    auth_header: str = "Authorization"  # header name for auth
    auth_scheme: str = "Bearer "  # prefix for auth value
    tool_decl_format: str = "openai"  # "openai" | "anthropic" | "command_code"
    param_names: dict = dataclasses.field(default_factory=lambda: {
        "max_tokens": "max_tokens",
        "temperature": "temperature",
        "top_p": "top_p",
    })
    response_format: str = "auto"  # "sse" | "raw_json" | "ndjson" | "auto"
    stream_format: str = "auto"  # "sse_data" | "sse_event" | "raw_lines" | "json_lines"

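    def hints(self) -> dict:
        """Return a dict for storing in provider-caps.json.

        Empty containers are omitted. False, "" and "auto" are omitted only
        when they equal the field default, because _load_schema() restores
        defaults for missing keys. A learned accepts_system_role=False or
        auth_scheme="" must therefore be written out explicitly.
        """
        defaults = dataclasses.asdict(ProviderSchema())
        d = {}
        for k, v in dataclasses.asdict(self).items():
            if isinstance(v, (list, tuple, dict)) and not v:
                continue
            if (v is False or v == "" or v == "auto") and v == defaults.get(k):
                continue
            d[k] = v
        return d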


class ErrorAnalyzer:
    """Parse upstream error responses to infer provider schema.
    Analyzes 400, 401, 422 errors for hints about auth, roles, content format,
    parameter names, field names, tool format, and response format.
    """

    @staticmethod
    def analyze(error_text: str, current: ProviderSchema = None) -> dict:
        hints = {}
        if not error_text:
            return hints
        err = error_text.lower()

        # ── Auth detection (401 errors) ──
        if re.search(r"unauthorized|invalid.*api.?key|missing.*api.?key|x-api-key", err):
            hints["auth_type"] = "x-api-key"
            hints["auth_header"] = "x-api-key"
            hints["auth_scheme"] = ""
        elif re.search(r"invalid.*bearer|bearer.*token|authorization.*header|invalid.*token", err):
            hints["auth_type"] = "bearer"
            hints["auth_header"] = "Authorization"
            hints["auth_scheme"] = "Bearer "

        # ── Role validation ──
        if re.search(r"role.*expected.*(?:user|assistant)", err):
            hints["accepts_tool_role"] = False
            hints["accepts_function_role"] = False

        if re.search(r"role.*(?:tool|function).*(?:invalid|not.*(?:support|allow))", err):
            hints["accepts_tool_role"] = False
            hints["accepts_function_role"] = False

        if re.search(r"role.*system.*(?:invalid|not.*(?:support|allow))", err):
            hints["accepts_system_role"] = False

        # ── Content format (top-level only, not content[i].xxx) ──
        if re.search(r'params\.messages\[\d+\]\.content', err):
            # Explicit path to content field in a messages array (e.g. /alpha/generate)
            if re.search(r"expected string.*received array", err):
                hints["content_type"] = "string"
                hints["tool_result_style"] = "inline"  # no tool_result blocks allowed
            elif re.search(r"expected array.*received string", err):
                hints["content_type"] = "array"
        elif re.search(r"(?<!\w)content(?!\[)\s*(?:of type|field|should be|expected|must be).*(?:string|array)", err) or \
             re.search(r"expected (?:string|array).*content", err):
            if re.search(r"expected string", err) and not re.search(r"expected array", err):
                hints["content_type"] = "string"
            elif re.search(r"expected array", err):
                hints["content_type"] = "array"
        elif re.search(r"content.*expected string.*received array", err) and not re.search(r"\[\d*\]", err):
            hints["content_type"] = "string"
        elif re.search(r"content.*expected array.*received string", err) and not re.search(r"\[\d*\]", err):
            hints["content_type"] = "array"

        # ── Content block types ──
        types = set()
        for m in re.finditer(
            r'expected\s+"('
            r'text|image|document|search_result|thinking|redacted_thinking|reasoning|'
            r'tool_use|tool-call|tool_result|tool-result|'
            r'server_tool_use|web_search_tool_result|web_fetch_tool_result|tool'
            r')"', err
        ):
            types.add(m.group(1))
        # Also detect from "expected string, received array at params.messages[i].content" pattern
        # where the "or" clauses list valid block types
        if not types and re.search(r'params\.messages\[\d+\]\.content', err):
            for valid_type in ("text", "image", "document", "tool_use", "tool-call", "tool_result"):
                if re.search(r'expected\s+"' + re.escape(valid_type) + r'"', err):
                    types.add(valid_type)
        if types:
            hints["content_block_types"] = tuple(sorted(types))

        # ── Tool result style ──
        # Keep "inline" when the provider demanded string content: result blocks
        # would be rejected there.
        if hints.get("content_type") != "string":
            if re.search(r"tool_result", err):
                hints["tool_result_style"] = "tool_result_block"
            elif re.search(r"tool_use", err):
                hints["tool_result_style"] = "anthropic"

        # ── Tool call style ──
        if re.search(r"tool-call", err) or re.search(r"tool_call", err):
            hints["tool_call_style"] = "tool-call"
        elif re.search(r"tool_use", err):
            hints["tool_call_style"] = "anthropic_tool_use"

        # ── CC body wrap detection ──
        # err is lowercased, so "threadId" must be matched as "threadid".
        if re.search(r"(?:params\.|body\.)config", err) or re.search(r"threadid", err):
            hints["cc_body_wrap"] = True

        # ── Field name mappings (keys MUST match SchemaAdapter lookups) ──
        # err is lowercased, so camelCase names are matched in lowercase.
        fields = {}
        if re.search(r"tool_use_id", err):
            fields["tool_use_id"] = "tool_use_id"
        if re.search(r"toolcallid", err):
            fields["toolCallId"] = "toolCallId"
            # SchemaAdapter._tool_result_block looks up "tool_use_id"
            fields["tool_use_id"] = "toolCallId"
        if re.search(r"tool_result", err):
            fields["tool_result_type"] = "tool_result"
        if re.search(r"tool-result", err):
            # The hyphenated spelling wins when both appear.
            fields["tool_result_type"] = "tool-result"
        # Detect tool call field names from errors. Names are matched as whole
        # identifiers, most specific first: a bare substring test would find
        # "id" inside "invalid" and "name" inside "tool_name".
        if re.search(r"\b(?:id|call_id|callid|tool_use_id)\b.*(?:invalid|unknown|expected|required)", err) or \
           re.search(r"(?:expected|required).*\b(?:id|call_id|callid)\b", err):
            for alt in ("tool_use_id", "call_id", "callId", "id"):
                if re.search(r"\b" + re.escape(alt.lower()) + r"\b", err):
                    fields["tool_call_id_field"] = alt
                    break
        if re.search(r"\b(?:name|tool_name|function)\b.*(?:invalid|unknown|expected|required)", err) or \
           re.search(r"(?:expected|required).*\b(?:name|tool_name)\b", err):
            for alt in ("tool_name", "name", "function"):
                if re.search(r"\b" + re.escape(alt) + r"\b", err):
                    fields["tool_call_name_field"] = alt
                    break
        if re.search(r"arguments.*(?:invalid|unknown|expect|required)", err) or \
           re.search(r"input.*(?:invalid|unknown|expect|required)", err):
            if re.search(r"input_schema|input\b", err) and not re.search(r"arguments", err):
                fields["tool_call_args_field"] = "input"
                fields["tool_args_field"] = "input"
            else:
                fields["tool_call_args_field"] = "arguments"
                fields["tool_args_field"] = "arguments"

        # ── Supported roles from error ──
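        if re.search(r"params\.messages\[\d+\]\.role", err):
            # Accept both the "user|assistant" and the "user"|"assistant" spelling.
            _roles_m = re.search(r'expected one of\s+((?:"[^"]+"\s*\|?\s*)+)', err)
            if _roles_m:
                roles = [r.strip() for q in re.findall(r'"([^"]+)"', _roles_m.group(1))
                         for r in q.split("|") if r.strip()]
                if roles:
                    hints["supported_roles"] = tuple(roles)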
        if fields:
            hints["field_names"] = fields

        # ── Parameter name negotiation ──
        # Candidates are matched as whole identifiers: "temp" must not match
        # inside "temperature", nor "max_token" inside "max_tokens".
        def _mentions(alt):
            return re.search(r"\b" + re.escape(alt.lower()) + r"\b", err) is not None

        param_hints = {}
        if re.search(r"max_tokens.*(?:invalid|unknown|not.*(?:support|recognize))", err) or \
           re.search(r"(?:unknown|invalid).*param.*max_tokens", err):
            for alt in ("max_output_tokens", "max_tokens_to_sample", "max_new_tokens", "max_token"):
                if _mentions(alt):
                    param_hints["max_tokens"] = alt
                    break
        if re.search(r"temperature.*(?:invalid|unknown)", err):
            for alt in ("creation_temperature", "model_temperature", "temp"):
                if _mentions(alt):
                    param_hints["temperature"] = alt
                    break
        if re.search(r"top_p.*(?:invalid|unknown)", err):
            # "top_p" itself is not a candidate: it is always present here and
            # would mask the real alternative.
            for alt in ("nucleus_sampling",):
                if _mentions(alt):
                    param_hints["top_p"] = alt
                    break
        if param_hints:
            hints["param_names"] = param_hints

        # ── Tool declaration format ──
        if re.search(r"tools.*input_schema", err) or re.search(r"input_schema.*required", err):
            hints["tool_decl_format"] = "anthropic"
        elif re.search(r"tools.*function.*(?:required|expected)", err):
            hints["tool_decl_format"] = "openai"
        elif re.search(r"tool-call|tool_call.*format", err):
            hints["tool_decl_format"] = "command_code"

        # ── Response/Stream format hints from content-type or error ──
        if re.search(r"content.type.*text/event.stream", err) or \
           re.search(r"stream.*sse|sse.*expected", err):
            hints["stream_format"] = "sse_data"
        if re.search(r"ndjson|json.*lines", err):
            hints["stream_format"] = "json_lines"

        return hints

    @staticmethod
    def merge_into_schema(hints: dict, schema: ProviderSchema) -> ProviderSchema:
        for k, v in hints.items():
            if k == "field_names" and isinstance(v, dict):
                schema.field_names.update(v)
            elif k == "param_names" and isinstance(v, dict):
                schema.param_names.update(v)
            elif hasattr(schema, k):
                setattr(schema, k, v)
        return schema


def _schema_cache_key(target_url=None, backend=None, model=None):
    host = urllib.parse.urlparse(target_url or TARGET_URL).netloc.lower()
    return f"auto-schema|{backend or BACKEND}|{host}|{model or '*'}"


def _load_schema(target_url=None, backend=None, model=None):
    caps = _load_provider_caps()
    key = _schema_cache_key(target_url, backend, model)
    raw = caps.get(key)
    generic = caps.get(_schema_cache_key(target_url, backend, model="*"))
    data = raw or generic or {}
    if not data:
        return ProviderSchema()
    # Staleness check: re-learn after 24h (86400s)
    updated = data.get("_updated", 0)
    if isinstance(updated, (int, float)) and time.time() - updated > 86400:
        print(f"[auto-sense] cached schema stale ({int(time.time()-updated)}s old), re-learning", file=sys.stderr)
        return ProviderSchema()
    return ProviderSchema(
        supported_roles=tuple(data.get("supported_roles", ("user", "assistant"))),
        content_type=data.get("content_type", "string"),
        content_block_types=tuple(data.get("content_block_types", ())),
        tool_result_style=data.get("tool_result_style", "inline"),
        tool_call_style=data.get("tool_call_style", "openai_function"),
        accepts_tool_role=data.get("accepts_tool_role", False),
        accepts_system_role=data.get("accepts_system_role", True),
        cc_body_wrap=data.get("cc_body_wrap", False),
        field_names=dict(data.get("field_names", {})),
        auth_type=data.get("auth_type", ""),
        auth_header=data.get("auth_header", "Authorization"),
        auth_scheme=data.get("auth_scheme", "Bearer "),
        tool_decl_format=data.get("tool_decl_format", "openai"),
        param_names=dict(data.get("param_names", {
            "max_tokens": "max_tokens",
            "temperature": "temperature",
            "top_p": "top_p",
        })),
        response_format=data.get("response_format", "auto"),
        stream_format=data.get("stream_format", "auto"),
    )


def _save_schema(schema: ProviderSchema, target_url=None, backend=None, model=None):
    caps = _load_provider_caps()
    key = _schema_cache_key(target_url, backend, model)
    caps[key] = schema.hints()
    caps[key]["_updated"] = time.time()
    caps[key]["_backend"] = backend or BACKEND
    _save_provider_caps()
    print(f"[auto-sense] cached schema {key}", file=sys.stderr)


class SchemaAdapter:
    """Convert Responses API messages based on a detected ProviderSchema."""

    def __init__(self, schema: ProviderSchema):
        self.s = schema

    def convert(self, input_data, instructions=""):
        if self.s.content_type == "string" and not self.s.content_block_types:
            return self._to_plain_string(input_data, instructions)
        return self._to_content_blocks(input_data, instructions)

    def _to_plain_string(self, input_data, instructions=""):
        """Fallback: string-only content, with no "tool" role messages.

        Consecutive function calls are batched into one assistant message
        carrying OpenAI-style tool_calls. Function outputs are sent back as
        plain user text, truncated to 8000 characters.
        """
        msgs = []
        if instructions and self.s.accepts_system_role:
            msgs.append({"role": "system", "content": instructions})
        elif instructions:
            msgs.append({"role": "user", "content": instructions})
        if isinstance(input_data, str):
            msgs.append({"role": "user", "content": input_data})
            return msgs
        if not isinstance(input_data, list):
            return msgs
        pending = []

        def flush():
            # Emit all queued function calls as a single assistant message.
            if not pending:
                return
            msgs.append({"role": "assistant", "content": None,
                         "tool_calls": [{"id": p["id"], "type": "function",
                                         "function": {"name": p["name"],
                                                      "arguments": p["arguments"]}}
                                        for p in pending]})
            pending.clear()

        for item in input_data:
            if not isinstance(item, dict):
                continue  # skip malformed items instead of raising on .get()
            t = item.get("type")
            if t == "function_call":
                cid = item.get("call_id") or item.get("id") or uid("fc")
                pending.append({"id": cid, "name": item.get("name", ""),
                                "arguments": item.get("arguments", "{}")})
                continue
            flush()
            if t == "message":
                role = "user" if item.get("role") in ("user", "developer") else "assistant"
                text = _extract_text(item.get("content", []))
                if text:
                    msgs.append({"role": role, "content": text})
            elif t == "function_call_output":
                out = item.get("output", "")
                if not isinstance(out, str):
                    out = json.dumps(out, ensure_ascii=False)
                msgs.append({"role": "user", "content": out[:8000]})
        flush()
        return msgs

    def _to_content_blocks(self, input_data, instructions=""):
        """Convert input items to block-style messages per the detected schema."""
        msgs = []
        pending_tc = []
        pending_ids = []
        tool_name_by_id = {}
        last_ids = []
        results_seen = 0  # tool results emitted since the last flushed batch

        def flush():
            nonlocal last_ids, results_seen
            if not pending_tc:
                return
            # IDs are tracked separately because the block's id key is configurable.
            last_ids = list(pending_ids)
            results_seen = 0
            # Copy the batch: the message must not alias the list cleared below.
            msgs.append({"role": "assistant", "content": list(pending_tc)})
            pending_tc.clear()
            pending_ids.clear()

        _str = self.s.content_type == "string"

        if instructions:
            msgs.append({"role": "user", "content": instructions if _str else [{"type": "text", "text": instructions}]})

        if isinstance(input_data, str):
            msgs.append({"role": "user", "content": input_data if _str else [{"type": "text", "text": input_data}]})
            return msgs
        if not isinstance(input_data, list):
            return msgs

        for item in input_data:
            if not isinstance(item, dict):
                continue  # skip malformed items instead of raising on .get()
            t = item.get("type")
            if t == "function_call":
                cid = item.get("call_id") or item.get("id") or uid("call")
                nm = item.get("name") or "exec_command"
                tool_name_by_id[cid] = nm
                tc_block = self._tool_call_block(cid, nm, item.get("arguments", "{}"))
                if tc_block:
                    pending_tc.append(tc_block)
                    pending_ids.append(cid)
                continue
            flush()
            if t == "message":
                role = "user" if item.get("role") in ("user", "developer") else "assistant"
                text = _extract_text(item.get("content", []))
                if text:
                    msgs.append({"role": role, "content": text if _str else [{"type": "text", "text": text}]})
            elif t == "function_call_output":
                cid = item.get("call_id") or item.get("id") or ""
                if not cid and results_seen < len(last_ids):
                    # No ID on the output: pair it positionally with the last batch.
                    cid = last_ids[results_seen]
                out = item.get("output", "")
                if not isinstance(out, str):
                    out = json.dumps(out, ensure_ascii=False)
                tr = self._tool_result_block(cid, out)
                if tr:
                    msgs.append({"role": "user", "content": [tr]})
                else:
                    # Inline style has no result block; send the output as text
                    # so it is not silently dropped.
                    msgs.append({"role": "user", "content": out[:8000] if _str else [{"type": "text", "text": out[:8000]}]})
                results_seen += 1
        flush()
        return msgs

    def _tool_call_block(self, cid, name, args):
        style = self.s.tool_call_style
        fn = self.s.field_names
        if style == "tool-call":
            return {
                "type": fn.get("tool_call_type", "tool-call"),
                fn.get("tool_call_id_field", "id"): cid,
                fn.get("tool_call_name_field", "name"): name,
                fn.get("tool_call_args_field", "arguments"): args,
            }
        elif style == "anthropic_tool_use":
            try:
                parsed = json.loads(args)
            except Exception:
                parsed = {}
            return {
                "type": fn.get("tool_use_type", "tool_use"),
                fn.get("tool_call_id_field", "id"): cid,
                fn.get("tool_call_name_field", "name"): name,
                fn.get("tool_call_args_field", "input"): parsed,
            }
        else:
            return None  # handled as OpenAI function call

    def _tool_result_block(self, cid, output):
        style = self.s.tool_result_style
        fn = self.s.field_names
        if style == "tool_result_block":
            return {
                "type": fn.get("tool_result_type", "tool_result"),
                fn.get("tool_use_id", "tool_use_id"): cid or "",
                "content": [{"type": "text", "text": output[:8000]}],
            }
        elif style == "anthropic":
            return {
                "type": fn.get("tool_result_type", "tool_result"),
                fn.get("tool_use_id", "tool_use_id"): cid or "",
                "content": output[:8000],
            }
        return None  # inline — handled by _to_plain_string


def _sanitize_err_body(body):
    """Sanitize upstream error body: strip HTML, truncate, remove control chars."""
    if not body:
        return ""
    s = re.sub(r'<[^>]+>', '', body)
    s = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', s)
    s = s.strip()[:1000]
    return s


def _extract_text(content):
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return ""
    parts = []
    for p in content:
        if isinstance(p, str):
            parts.append(p)
        elif isinstance(p, dict) and p.get("type") in ("input_text", "output_text", "text"):
            parts.append(p.get("text", ""))
    return "".join(parts)


# ═══════════════════════════════════════════════════════════════════
# HTTP Server
# ═══════════════════════════════════════════════════════════════════

_MAX_REQLOG_LINES = 2000

def _log_resp(resp_id, status, output):
    """Append a response summary to requests.log, keeping the last _MAX_REQLOG_LINES lines."""
    try:
        _lp = os.path.join(_LOG_DIR, "requests.log")
        with open(_lp, "a", encoding="utf-8") as _f:
            _f.write(f"  RESPONSE id={resp_id} status={status}\n")
            for o in output or []:
                ot = o.get("type")
                if ot == "message":
                    # content may be missing or empty; fall back to an empty part
                    _parts = o.get("content") or [{}]
                    _f.write(f"    -> message: {_parts[0].get('text', '')[:200]}\n")
                elif ot == "function_call":
                    _f.write(f"    -> function_call: {o.get('name')}({o.get('arguments','')[:120]})\n")
                else:
                    _f.write(f"    -> {ot}\n")
            _f.write(f"{'='*60}\n")
        # A handle opened with mode "a" is write-only, so reopen the file to trim it.
        with open(_lp, "r", encoding="utf-8", errors="replace") as _f:
            lines = _f.readlines()
        if len(lines) > _MAX_REQLOG_LINES:
            with open(_lp, "w", encoding="utf-8") as _f2:
                _f2.writelines(lines[-_MAX_REQLOG_LINES:])
    except Exception:
        pass

class ConnectionTracker:
    def __enter__(self):
        global _active_connections
        with _active_connections_lock:
            _active_connections += 1
    def __exit__(self, *a):
        global _active_connections
        with _active_connections_lock:
            _active_connections -= 1

class RequestTracker:
    def __init__(self, request_id):
        self.request_id = request_id
        self.cancelled = threading.Event()

    def __enter__(self):
        if self.request_id:
            with _active_requests_lock:
                _active_requests[self.request_id] = self
        return self

    def __exit__(self, *a):
        if self.request_id:
            with _active_requests_lock:
                _active_requests.pop(self.request_id, None)

def _cancel_request(request_id):
    with _active_requests_lock:
        req = _active_requests.get(request_id)
    if not req:
        return False
    req.cancelled.set()
    return True

def _handle_shutdown_signal(signum, frame):
    global _shutdown_requested
    _shutdown_requested = True
    print("[proxy] shutdown requested; draining connections", file=sys.stderr)
    def _drain():
        deadline = time.time() + 5
        while time.time() < deadline:
            with _active_connections_lock:
                if _active_connections == 0:
                    break
            time.sleep(0.1)
        if SERVER is not None:
            SERVER.shutdown()
    threading.Thread(target=_drain, daemon=True).start()

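def _upstream_timeout(body, stream):
    """Return the upstream timeout in seconds, scaled by the number of input items.

    Streaming: base 180 s with tools (120 s without) plus 2 s per item, capped at 300 s.
    Non-streaming: base 60 s plus 2 s per item, capped at 120 s.
    """
    input_data = body.get("input", "")
    n_items = len(input_data) if isinstance(input_data, list) else 1
    has_tools = bool(body.get("tools"))
    if stream:
        return min((180 if has_tools else 120) + n_items * 2, 300)
    return min(60 + n_items * 2, 120)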

def _auto_continue_gemini(handler, flush_event, message_id, model, gen_config, gemini_tools, system_parts, project_id, headers, endpoints, url_suffix, accumulated_text, output_items, message_started):
    max_continuations = 5
    for _cont in range(max_continuations):
        cont_contents = [
            {"role": "model", "parts": [{"text": accumulated_text[-12000:]}]},
            {"role": "user", "parts": [{"text": "Continue exactly where you left off. Do not repeat anything already written."}]},
        ]
        cont_request = {"contents": cont_contents, "generationConfig": dict(gen_config)}
        if system_parts:
            cont_request["systemInstruction"] = {"parts": system_parts}
        if gemini_tools:
            cont_request["tools"] = gemini_tools
        cont_wrapped = {"project": project_id, "model": model, "request": cont_request}
        if OAUTH_PROVIDER == "google-antigravity":
            cont_wrapped["requestType"] = "agent"
            cont_wrapped["userAgent"] = "antigravity"
            cont_wrapped["requestId"] = f"agent-{uuid.uuid4().hex[:12]}"
        cont_body = json.dumps(cont_wrapped).encode()
        upstream = None
        for ep in endpoints:
            target = f"{ep}/{url_suffix}"
            req = urllib.request.Request(target, data=cont_body, headers=headers)
            try:
                upstream = urllib.request.urlopen(req, timeout=180)
                break
            except Exception as e:
                print(f"[auto-continue] {ep} failed: {e}", file=sys.stderr)
                continue
        if not upstream:
            break
        cont_text = ""
        cont_finish = ""
        cont_buf = ""
        for raw_line in _stream_with_idle_timeout(upstream):
            line = raw_line.decode(errors="replace")
            if line.startswith("data: "):
                cont_buf += line[6:]
                continue
            if not line.strip() and cont_buf:
                try:
                    chunk = json.loads(cont_buf)
                except Exception:
                    cont_buf = ""
                    continue
                cont_buf = ""
                candidates = chunk.get("response", chunk).get("candidates", [])
                if not candidates:
                    continue
                cont_finish = candidates[0].get("finishReason", "")
                parts = candidates[0].get("content", {}).get("parts", [])
                for part in parts:
                    if part.get("thought"):
                        continue
                    if "text" in part and not part.get("functionCall"):
                        delta = part["text"]
                        if delta:
                            cont_text += delta
                            flush_event("response.output_text.delta", {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": delta})
                    elif part.get("functionCall"):
                        fc = part["functionCall"]
                        call_id = f"call_{uuid.uuid4().hex[:24]}"
                        args_str = json.dumps(fc.get("args", fc.get("arguments", {})))
                        output_index = len(output_items)
                        flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": output_index, "item": {"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": ""}})
                        flush_event("response.function_call_arguments.delta", {"type": "response.function_call_arguments.delta", "output_index": output_index, "item_id": call_id, "delta": args_str})
                        flush_event("response.function_call_arguments.done", {"type": "response.function_call_arguments.done", "output_index": output_index, "item_id": call_id, "arguments": args_str})
                        output_items.append({"tool": True, "fc": fc, "call_id": call_id})
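        # Release the upstream connection before the next continuation round.
        try:
            upstream.close()
        except Exception:
            pass
        accumulated_text += cont_text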
        print(f"[auto-continue] chunk {len(cont_text)} chars, finish={cont_finish}, total={len(accumulated_text)}", file=sys.stderr)
        if cont_finish != "MAX_TOKENS":
            break
    return accumulated_text

_ANTIGRAVITY_MAX_CONTENTS = 20
_ANTIGRAVITY_MAX_TOOL_VERBATIM = 2
_ANTIGRAVITY_MAX_TOOL_CHARS = 2000
_ANTIGRAVITY_MAX_OLD_SUMMARY_CHARS = 1200
_ANTIGRAVITY_SOFT_CHARS = 120000
_ANTIGRAVITY_HARD_CHARS = 250000
_ANTIGRAVITY_EMERGENCY_CHARS = 500000
_ANTIGRAVITY_SIMPLE_WORDS = frozenset({"hi", "hello", "hey", "test", "ping", "thanks", "thank you", "ok", "okay", "yes", "no", "cool", "nice", "good", "great", "done", "go", "stop", "yep", "nope", "sure", "right", "correct", "continue", "cont", "k", "thx", "ty", "np", "lol", "brb", "bye"})
_ANTIGRAVITY_EDIT_WORDS = frozenset(("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert", "create", "build", "implement"))
_ANTIGRAVITY_REFERENCE_WORDS = frozenset(("previous", "file", "error", "again", "that", "this", "it", "same", "last", "above", "earlier", "before", "earlier output", "last error", "previous result", "what was", "show me", "give me"))

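def _antigravity_is_simple_user(text):
    """Return True for short chit-chat that neither edits nor refers back to earlier context."""
    if not text:
        return True
    stripped = text.strip().lower()
    if stripped in _ANTIGRAVITY_SIMPLE_WORDS:
        return True
    if len(stripped) < 30:
        # Tokenize on letters so trailing punctuation ("again?") does not hide a keyword.
        tokens = re.findall(r"[a-z']+", stripped)
        keywords = _ANTIGRAVITY_REFERENCE_WORDS | _ANTIGRAVITY_EDIT_WORDS
        if keywords.intersection(tokens):
            return False
        # Multi-word entries ("show me", "make it") can only match as phrases.
        padded = " " + " ".join(tokens) + " "
        if any(" " in kw and f" {kw} " in padded for kw in keywords):
            return False
        return True
    return False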

def _antigravity_normalize_context(input_data):
    if not isinstance(input_data, list) or len(input_data) < 2:
        return input_data

    latest_user = ""
    latest_user_idx = -1
    for i in range(len(input_data) - 1, -1, -1):
        item = input_data[i]
        if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
            c = item.get("content", "")
            if isinstance(c, str):
                latest_user = c
            elif isinstance(c, list):
                latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
            latest_user_idx = i
            break

    if not latest_user:
        return input_data

    is_simple = _antigravity_is_simple_user(latest_user)

    n_raw = len(input_data)
    n_tool_outputs = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call_output")
    n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")

    auto_reset = (n_raw > 200 or n_tool_outputs > 20) and is_simple
    if os.environ.get("ANTIGRAVITY_AUTO_RESET_POLLUTED_CONTEXT", "1") != "1":
        auto_reset = False

    if is_simple and (auto_reset or n_tool_outputs == 0):
        system_items = [it for it in input_data if isinstance(it, dict) and it.get("type") == "message" and it.get("role") in ("developer", "system")]
        user_item = input_data[latest_user_idx]
        result = system_items + [user_item]
        print(f"[antigravity-context] raw_items={n_raw} final_items={len(result)}", file=sys.stderr)
        print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs=0", file=sys.stderr)
        print(f"[antigravity-context] simple_latest_user=true auto_reset={auto_reset}", file=sys.stderr)
        return result

    dev_messages = []
    recent_items = []
    tool_outputs = []
    other_items = []

    for i, item in enumerate(input_data):
        if not isinstance(item, dict):
            continue
        t = item.get("type")
        if t == "message" and item.get("role") in ("developer", "system"):
            dev_messages.append(item)
        elif t == "function_call_output":
            tool_outputs.append((i, item))
        elif t == "function_call":
            other_items.append((i, item))
        elif t == "message":
            recent_items.append((i, item))

    latest_words = set(latest_user.strip().lower().split())
    has_edit_intent = bool(latest_words.intersection(_ANTIGRAVITY_EDIT_WORDS))
    has_ref_intent = bool(latest_words.intersection(_ANTIGRAVITY_REFERENCE_WORDS))
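    # Keep the most recent tool outputs verbatim only when the latest message
    # asks for an edit or refers back to earlier context; otherwise keep none.
    wants_context = has_edit_intent or has_ref_intent
    kept_tools = tool_outputs[-_ANTIGRAVITY_MAX_TOOL_VERBATIM:] if wants_context else []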

    for idx_t, t_item in enumerate(kept_tools):
        orig = t_item[1]
        out = orig.get("output", "")
        if isinstance(out, str) and len(out) > _ANTIGRAVITY_MAX_TOOL_CHARS:
            new_item = dict(orig)
            new_item["output"] = out[:_ANTIGRAVITY_MAX_TOOL_CHARS] + f"\n... [truncated: kept {_ANTIGRAVITY_MAX_TOOL_CHARS} of {len(out)} chars]"
            kept_tools[idx_t] = (t_item[0], new_item)

    n_summarized = len(tool_outputs) - len(kept_tools)

    tail_start = max(0, len(recent_items) - 6)
    recent_tail = recent_items[tail_start:]

    tool_call_ids = set()
    for _, t_item in kept_tools:
        cid = t_item.get("call_id", t_item.get("id", ""))
        if cid:
            tool_call_ids.add(cid)

    paired_calls = []
    for idx, item in other_items:
        cid = item.get("call_id", item.get("id", ""))
        if cid in tool_call_ids:
            paired_calls.append((idx, item))

    result = list(dev_messages)

    if n_summarized > 0:
        summary_text = f"[Tool history summary: {n_summarized} older tool outputs omitted. {n_tool_calls} prior function calls were made for file inspection/editing.]"
        result.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": summary_text}]})

    for _, call_item in paired_calls:
        result.append(call_item)

    for _, tool_item in kept_tools:
        result.append(tool_item)

    for _, msg_item in recent_tail:
        if msg_item is not input_data[latest_user_idx]:
            result.append(msg_item)

    latest_norm = " ".join(latest_user.strip().split())[:200].lower()
    already_present = False
    for r in result:
        if isinstance(r, dict) and r.get("type") == "message" and r.get("role") == "user":
            c = r.get("content", "")
            if isinstance(c, str):
                rn = " ".join(c.strip().split())[:200].lower()
            elif isinstance(c, list):
                combined = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
                rn = " ".join(combined.strip().split())[:200].lower()
            else:
                rn = ""
            if rn == latest_norm:
                already_present = True
                break

    if not already_present:
        result.append(input_data[latest_user_idx])

    total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)

    if total_chars > _ANTIGRAVITY_EMERGENCY_CHARS:
        print(f"[antigravity-context] EMERGENCY: {total_chars} chars exceeds limit, resetting to minimal", file=sys.stderr)
        result = list(dev_messages) + [input_data[latest_user_idx]]
        total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)

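    # Drop the oldest message or tool output while BOTH the item cap and the
    # soft character budget are exceeded; the first and last items always stay.
    while len(result) > _ANTIGRAVITY_MAX_CONTENTS and total_chars > _ANTIGRAVITY_SOFT_CHARS: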
        for i in range(1, len(result) - 1):
            if isinstance(result[i], dict) and result[i].get("type") in ("message", "function_call_output"):
                removed = result.pop(i)
                total_chars -= len(json.dumps(removed, ensure_ascii=False))
                break
        else:
            break

    est_tokens = total_chars // 4
    print(f"[antigravity-context] raw_items={n_raw} final_items={len(result)}", file=sys.stderr)
    print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs={len(kept_tools)} summarized_tool_outputs={n_summarized}", file=sys.stderr)
    print(f"[antigravity-context] simple_latest_user={is_simple} auto_reset={auto_reset}", file=sys.stderr)
    print(f"[antigravity-context] final_chars={total_chars} estimated_tokens={est_tokens}", file=sys.stderr)

    return result
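
# Illustrative sketch, not called by the handlers: the retry delay computed in
# Handler._handle_openai_compat, written as a standalone function. It assumes
# the same policy as that loop (an integer Retry-After capped at 60 s, else
# exponential backoff capped at 15 s). The name _example_retry_wait and its
# keyword parameters are hypothetical.
def _example_retry_wait(attempt, retry_after=None, backoff_cap=15, retry_after_cap=60):
    """Return the seconds to sleep before retrying after 0-based `attempt`."""
    if retry_after:
        try:
            return min(int(retry_after), retry_after_cap)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to backoff
    return min(2 ** (attempt + 1), backoff_cap)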

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        if self.path in ("/v1/models", "/models"):
            self.send_json(200, {"object": "list", "data": MODELS})
        elif self.path in ("/v1/accounts", "/accounts"):
            info = {"provider": BACKEND, "oauth_provider": OAUTH_PROVIDER}
            if BACKEND in ("codebuff", "freebuff"):
                info["accounts"] = _cb_pool.status()
                info["total"] = len(_cb_pool._accounts)
            elif OAUTH_PROVIDER and OAUTH_PROVIDER.startswith("google"):
                pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
                info["accounts"] = pool.status()
                info["total"] = len(pool._accounts)
            elif _api_key_pool:
                info["accounts"] = _api_key_pool.status()
                info["total"] = len(_api_key_pool._accounts)
            else:
                info["accounts"] = []
                info["total"] = 0
            self.send_json(200, info)
        elif self.path in ("/health", "/v1/health"):
            # Peak resident memory in MB (best effort; stays 0 if unavailable).
            _mem_mb = 0
            try:
                if _IS_WINDOWS:
                    import ctypes
                    class _PMI(ctypes.Structure):
                        _fields_ = [("cb", ctypes.c_ulong), ("PageFaultCount", ctypes.c_ulong),
                                    ("PeakWorkingSetSize", ctypes.c_size_t), ("WorkingSetSize", ctypes.c_size_t),
                                    ("QuotaPeakPagedPoolUsage", ctypes.c_size_t), ("QuotaPagedPoolUsage", ctypes.c_size_t),
                                    ("QuotaPeakNonPagedPoolUsage", ctypes.c_size_t), ("QuotaNonPagedPoolUsage", ctypes.c_size_t),
                                    ("PagefileUsage", ctypes.c_size_t), ("PeakPagefileUsage", ctypes.c_size_t)]
                    _pmi = _PMI()
                    _pmi.cb = ctypes.sizeof(_PMI)
                    ctypes.windll.psapi.GetProcessMemoryInfo.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_ulong]
                    ctypes.windll.psapi.GetProcessMemoryInfo.restype = ctypes.c_int
                    ctypes.windll.psapi.GetProcessMemoryInfo(
                        ctypes.windll.kernel32.GetCurrentProcess(), ctypes.byref(_pmi), _pmi.cb)
                    _mem_mb = _pmi.PeakWorkingSetSize / (1024 * 1024)
                else:
                    import resource as _res
                    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
                    _rss_div = 1024 * 1024 if sys.platform == "darwin" else 1024
                    _mem_mb = _res.getrusage(_res.RUSAGE_SELF).ru_maxrss / _rss_div
            except Exception:
                pass
            # dir() inside a method lists only local names, so a module-level
            # _START_TIME has to be looked up through globals().
            _start = globals().get("_START_TIME")
            _uptime = time.time() - _start if _start else 0
            self.send_json(200, {"ok": True, "backend": BACKEND,
                                 "target_url": TARGET_URL,
                                 "models": [m.get("id") for m in MODELS],
                                 "bgp_routes": len(BGP_ROUTES),
                                 "uptime_s": round(_uptime, 1),
                                 "memory_mb": round(_mem_mb, 1),
                                 "requests_total": _STATS.get("requests", 0)})
        else:
            self.send_error(404)

    def do_POST(self):
        if _shutdown_requested:
            return self.send_json(503, {"error": {"type": "proxy_shutting_down",
                                                   "message": "Proxy is shutting down"}})
        if self.path.startswith("/admin/cancel/"):
            request_id = self.path.rsplit("/", 1)[-1]
            if _cancel_request(request_id):
                return self.send_json(200, {"ok": True, "cancelled": request_id})
            return self.send_json(404, {"ok": False, "error": "request_not_found"})
        if self.path in ("/v1/responses", "/responses"):
            with ConnectionTracker():
                self._handle()
        else:
            self.send_error(404)

    _logf = None

    def _handle(self):
        try:
            clen = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(clen))
        except Exception as e:
            return self.send_json(400, {"error": {"message": f"Bad request: {e}"}})

        self._session_id = uuid.uuid4().hex[:8]
        _sid = self._session_id

        import datetime as _dt
        _log_path = os.path.join(_LOG_DIR, "requests.log")
        _ts = _dt.datetime.now().isoformat()

        prev_id = body.get("previous_response_id")
        raw_input = body.get("input", "")
        input_data = resolve_previous_response(body)
        input_data = _compact_input(input_data)
        body["input"] = input_data

        raw_types = [i.get("type") for i in raw_input] if isinstance(raw_input, list) else "str"
        resolved_types = [i.get("type") for i in input_data] if isinstance(input_data, list) else "str"

        print(f"[{_sid}] prev_id={prev_id} raw={raw_types} resolved={resolved_types}", file=sys.stderr)
        with open(_log_path, "a", encoding="utf-8") as _lf:
            _lf.write(f"\n{'='*60}\n{_ts} [session={_sid}] REQUEST {self.path}\n")
            _lf.write(f"  prev_id={prev_id}\n")
            _lf.write(f"  raw_input_types={raw_types}\n")
            _lf.write(f"  resolved_input_types={resolved_types}\n")
            _lf.write(f"  stream={body.get('stream')} model={body.get('model')} force_model={FORCE_MODEL}\n")
            _lf.write(f"  store_keys={list(_response_store.keys())}\n")
            if isinstance(input_data, list):
                for i, item in enumerate(input_data):
                    t = item.get("type")
                    if t == "message":
                        _lf.write(f"  [{i}] message role={item.get('role')} text={str(item.get('content',''))[:120]}\n")
                    elif t == "function_call":
                        _lf.write(f"  [{i}] function_call call_id={item.get('call_id')} id={item.get('id')} name={item.get('name')} args={str(item.get('arguments',''))[:120]}\n")
                    elif t == "function_call_output":
                        _lf.write(f"  [{i}] function_call_output call_id={item.get('call_id')} id={item.get('id')} output={str(item.get('output',''))[:120]}\n")
                    else:
                        _lf.write(f"  [{i}] {t}\n")
            _lf.flush()

        model = body.get("model", MODELS[0]["id"] if MODELS else "unknown")
        if FORCE_MODEL:
            model = FORCE_MODEL
            body["model"] = FORCE_MODEL
        stream = body.get("stream", False)
        _desktop_forced_models = {"gpt-5.4-mini", "gpt-5.4", "gpt-5.5", "gpt-5-codex", "gpt-5.3-codex"}
        _launcher_model = os.environ.get("CODEX_LAUNCHER_MODEL", "")
        if _launcher_model and model in _desktop_forced_models:
            print(f"[{_sid}] remap desktop model {model} -> {_launcher_model}", file=sys.stderr)
            model = _launcher_model
            body["model"] = model
        request_id = body.get("request_id") or body.get("id") or uid("req")
        if isinstance(input_data, list):
            for item in input_data:
                if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
                    content = str(item.get("content", ""))
                    for url_m in re.finditer(r"https?://[^\s\]'\"<>]+", content):
                        _last_user_urls.append(url_m.group(0))
        save_request_snapshot(request_id, body)
        _req_t0 = time.time()
        try:
            with RequestTracker(request_id) as tracker:
                if BACKEND == "auto":
                    self._handle_auto(body, model, stream, tracker)
                elif BACKEND == "anthropic":
                    self._handle_anthropic(body, model, stream, tracker)
                elif BACKEND == "command-code":
                    self._handle_command_code(body, model, stream, tracker)
                elif BACKEND in ("codebuff", "freebuff"):
                    self._handle_codebuff(body, model, stream, tracker)
                elif (BACKEND or "").startswith("gemini-oauth"):
                    self._handle_gemini_oauth(body, model, stream, tracker)
                else:
                    self._handle_openai_compat(body, model, stream, tracker)
            update_snapshot_response(request_id, "completed", time.time() - _req_t0)
        except Exception as _snap_err:
            update_snapshot_response(request_id, "error", time.time() - _req_t0, _snap_err)
            raise

    def _handle_openai_compat(self, body, model, stream, tracker=None):
        input_data = body.get("input", "")
        policy = provider_policy()

        pair_errors = validate_tool_pairs(input_data)
        if pair_errors:
            print(f"[tool-validator] repairing {len(pair_errors)} orphan tool outputs", file=sys.stderr)
            input_data = repair_orphan_tool_outputs(input_data, pair_errors)
            body = dict(body)
            body["input"] = input_data

        if (policy.get("synthetic_tool_results") or _provider_cap(model, "synthetic_tool_results", False)) and isinstance(input_data, list):
            input_data, synthesized = synthesize_tool_results_for_chat(input_data)
            if synthesized:
                print("[provider-adapter] using synthetic tool-result continuation", file=sys.stderr)
                body = dict(body)
                body["input"] = input_data

        compacted = False
        if policy.get("compaction") and isinstance(input_data, list):
            input_data, compacted = _adaptive_compact(input_data, model, policy)
            if compacted:
                body = dict(body)
                body["input"] = input_data

        if PROMPT_ENHANCER and isinstance(input_data, list):
            input_data = _apply_prompt_enhancer(input_data)
            body = dict(body)
            body["input"] = input_data

        crof_limit = _crof_item_limit(model)
        _crof_eligible = TARGET_URL and "crof.ai" in TARGET_URL
        if _crof_eligible and not compacted and isinstance(input_data, list) and len(input_data) > crof_limit:
            print(f"[crof-adaptive] proactive compact: {len(input_data)} items > limit {crof_limit}", file=sys.stderr)
            input_data = _crof_compact_for_retry(input_data, model)
            body = dict(body)
            body["input"] = input_data

        messages = oa_input_to_messages(input_data)
        messages = _inject_stored_reasoning(messages)
        # "instructions" may be absent or an explicit JSON null.
        instructions = (body.get("instructions") or "").strip()
        if instructions:
            messages.insert(0, {"role": "system", "content": instructions})

        if BGP_ROUTES:
            self._handle_bgp(body, model, stream, messages, input_data)
        else:
            chat_body = self._build_chat_body(model, messages, body, stream)
            target = upstream_target(TARGET_URL, "/chat/completions")
            if _api_key_pool:
                pool_acct = _api_key_pool.get()
                effective_key = pool_acct["token"] if pool_acct else API_KEY
            else:
                effective_key = _refresh_oauth_token()
            fwd = forwarded_headers(self.headers, {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {effective_key}",
                **_openrouter_extra(),
            }, browser_ua=True)
            print(f"[{self._session_id}] POST {target} model={model} stream={stream} items={len(input_data) if isinstance(input_data,list) else 1}", file=sys.stderr)
            chat_body_b = json.dumps(chat_body).encode()
            max_retries = 3
            for attempt in range(max_retries + 1):
                req = urllib.request.Request(target, data=chat_body_b, headers=fwd)
                try:
                    upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
                except urllib.error.HTTPError as e:
                    err_body = e.read().decode()
                    if e.code in (429, 502, 503) and attempt < max_retries:
                        if e.code == 429 and _api_key_pool:
                            # Mark the account that actually received the 429, then rotate.
                            if pool_acct:
                                _api_key_pool.mark_rate_limited(pool_acct, 60)
                                next_acct = _api_key_pool.get()
                                if next_acct:
                                    pool_acct = next_acct
                                    effective_key = next_acct["token"]
                                    fwd["Authorization"] = f"Bearer {effective_key}"
                                    print(f"[multi-account] rotating to key {next_acct['id']}", file=sys.stderr)
                        retry_after = e.headers.get("Retry-After")
                        if retry_after:
                            try:
                                wait = min(int(retry_after), 60)
                            except ValueError:
                                wait = min(2 ** (attempt + 1), 15)
                        else:
                            wait = min(2 ** (attempt + 1), 15)
                        print(f"[{self._session_id}] HTTP {e.code} (attempt {attempt+1}/{max_retries}), retrying in {wait}s: {err_body[:150]}", file=sys.stderr)
                        time.sleep(wait)
                        continue
                    return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
                except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError) as e:
                    if attempt < max_retries:
                        wait = min(2 ** (attempt + 1), 10)
                        print(f"[{self._session_id}] connection error (attempt {attempt+1}/{max_retries}), retrying in {wait}s: {e}", file=sys.stderr)
                        time.sleep(wait)
                        continue
                    return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
                except Exception as e:
                    return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})
                break
            self._forward_oa_compat(upstream, stream, model, chat_body, body, input_data, fwd, target, tracker)

    def _build_chat_body(self, model, messages, body, stream):
        chat_body = {"model": model, "messages": messages}
        for k in ("temperature", "top_p"):
            if k in body:
                chat_body[k] = body[k]
        # 64000 is a floor: a smaller (or null) max_output_tokens is raised to it.
        chat_body["max_tokens"] = max(body.get("max_output_tokens") or 0, 64000)
        tools = oa_convert_tools(body.get("tools"))
        if tools:
            chat_body["tools"] = tools
        if body.get("tool_choice"):
            chat_body["tool_choice"] = body["tool_choice"]
        chat_body["stream"] = stream
        if not REASONING_ENABLED or REASONING_EFFORT == "none":
            chat_body["enable_thinking"] = False
            chat_body["reasoning_effort"] = "none"
        else:
            chat_body["reasoning_effort"] = REASONING_EFFORT
        return chat_body

    def _handle_gemini_oauth(self, body, model, stream, tracker=None):
        input_data = body.get("input", "")
        policy = provider_policy()
        original_model = model

        _GEMINI_KEEP_RECENT = 6
        _GEMINI_OLD_LIMIT = 3000
        _GEMINI_RECENT_LIMIT = 20000

        if isinstance(input_data, list) and len(input_data) > 8:
            n_tool_outputs = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call_output")
            if n_tool_outputs > 2:
                tool_indexes = [i for i, it in enumerate(input_data) if isinstance(it, dict) and it.get("type") == "function_call_output"]
                recent_set = set(tool_indexes[-_GEMINI_KEEP_RECENT:])
                compacted_data = []
                for i, item in enumerate(input_data):
                    if isinstance(item, dict) and item.get("type") == "function_call_output":
                        o = item.get("output", "")
                        limit = _GEMINI_RECENT_LIMIT if i in recent_set else _GEMINI_OLD_LIMIT
                        if isinstance(o, str) and len(o) > limit:
                            item = dict(item)
                            item["output"] = o[:limit] + f"\n... [proxy compacted: kept {limit} of {len(o)} chars]"
                    compacted_data.append(item)
                input_data = compacted_data
                body = dict(body)
                body["input"] = input_data
                print(f"[gemini-compact] {n_tool_outputs} tool outputs, recent={_GEMINI_RECENT_LIMIT} old={_GEMINI_OLD_LIMIT}", file=sys.stderr)

        if OAUTH_PROVIDER == "google-antigravity":
            alias_map = {
                "Gemini 3.5 Flash (High)": "gemini-3-flash",
                "Gemini 3.5 Flash (Medium)": "gemini-3-flash",
                "Gemini 3.5 Flash (Low)": "gemini-3.5-flash-low",
                "gemini-3.5-flash-high": "gemini-3-flash",
                "gemini-3.5-flash-medium": "gemini-3-flash",
                "gemini-3.5-flash-low": "gemini-3.5-flash-low",
                "gemini-3-flash-preview": "gemini-3-flash",
                "gemini-3-flash": "gemini-3-flash",
                "antigravity-gemini-3-flash": "gemini-3-flash",
                "Gemini 3.1 Pro (High)": "gemini-3.1-pro-low",
                "Gemini 3.1 Pro (Low)": "gemini-3.1-pro-low",
                "gemini-3.1-pro-high": "gemini-3.1-pro-low",
                "gemini-3.1-pro-low": "gemini-3.1-pro-low",
                "gemini-3.1-pro-preview": "gemini-3.1-pro-low",
                "gemini-3.1-pro": "gemini-3.1-pro-low",
                "gemini-3-pro-preview": "gemini-3.1-pro-low",
                "gemini-3-pro": "gemini-3.1-pro-low",
                "gemini-3-pro-low": "gemini-3.1-pro-low",
                "gemini-3-pro-high": "gemini-3.1-pro-low",
                "antigravity-gemini-3-pro": "gemini-3.1-pro-low",
                "antigravity-gemini-3.1-pro": "gemini-3.1-pro-low",
                "Claude Sonnet 4.6 (Thinking)": "claude-sonnet-4-6",
                "Claude Sonnet 4.6 Thinking": "claude-sonnet-4-6",
                "claude-sonnet-4.6-thinking": "claude-sonnet-4-6",
                "antigravity-claude-sonnet-4-6": "claude-sonnet-4-6",
                "Claude Opus 4.6 (Thinking)": "claude-opus-4-6-thinking",
                "Claude Opus 4.6 Thinking": "claude-opus-4-6-thinking",
                "claude-opus-4.6-thinking": "claude-opus-4-6-thinking",
                "antigravity-claude-opus-4-6-thinking": "claude-opus-4-6-thinking",
                "GPT-OSS 120B (Medium)": "gpt-oss-120b-medium",
                "GPT-OSS 120B Medium": "gpt-oss-120b-medium",
                "gpt-oss-120b": "gpt-oss-120b-medium",
                "gemini-2.5-flash": "gemini-2.5-flash",
                "gemini-2.5-pro": "gemini-2.5-pro",
                "gemini-2.5-flash-lite": "gemini-2.5-flash-lite",
            }
            model = alias_map.get(model, model)
            if model != original_model:
                print(f"[antigravity] model mapped user={original_model} upstream={model}", file=sys.stderr)

        pair_errors = validate_tool_pairs(input_data)
        if pair_errors:
            input_data = repair_orphan_tool_outputs(input_data, pair_errors)
            body = dict(body)
            body["input"] = input_data

        compacted = False
        if policy.get("compaction") and isinstance(input_data, list):
            input_data, compacted = _adaptive_compact(input_data, model, policy)
            if compacted:
                body = dict(body)
                body["input"] = input_data

        if PROMPT_ENHANCER and isinstance(input_data, list):
            input_data = _apply_prompt_enhancer(input_data)
            body = dict(body)
            body["input"] = input_data

        if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
            input_data = _antigravity_normalize_context(input_data)
            body = dict(body)
            body["input"] = input_data

        access_token = _refresh_oauth_token()
        token_name = "google-antigravity-oauth-token.json" if OAUTH_PROVIDER == "google-antigravity" else "google-cli-oauth-token.json"
        token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
        project_id = ""
        try:
            with open(token_path) as f:
                project_id = json.load(f).get("project_id", "")
        except Exception:
            pass

        contents = []
        system_parts = []
        # "instructions" may be absent or an explicit JSON null.
        instructions = (body.get("instructions") or "").strip()
        tool_call_names = {}

        if isinstance(input_data, list):
            for item in input_data:
                if not isinstance(item, dict):
                    continue
                t = item.get("type")
                if t == "message":
                    role = "user" if item.get("role") == "user" else "model"
                    content = item.get("content", "")
                    if isinstance(content, list):
                        parts = []
                        for c in content:
                            ct = c.get("type")
                            if ct == "input_text":
                                parts.append({"text": c.get("text", "")})
                            elif ct == "text":
                                parts.append({"text": c.get("text", "")})
                            elif ct == "input_image" or ct == "image_url":
                                iu = c.get("image_url") or c.get("url", {})
                                url = iu.get("url", iu) if isinstance(iu, dict) else iu
                                if isinstance(url, str) and url.startswith("data:"):
                                    mime, _, b64 = url.partition(";base64,")
                                    mime = mime.replace("data:", "") or "image/png"
                                    parts.append({"inlineData": {"mimeType": mime, "data": b64}})
                                else:
                                    parts.append({"text": str(url)})
                        if parts:
                            contents.append({"role": role, "parts": parts})
                    elif isinstance(content, str):
                        contents.append({"role": role, "parts": [{"text": content}]})
                elif t == "function_call":
                    call_id = item.get("call_id") or item.get("id") or f"call_{uuid.uuid4().hex[:24]}"
                    fname = item.get("name", "")
                    if call_id and fname:
                        tool_call_names[call_id] = fname
                    args = item.get("arguments", "{}")
                    if isinstance(args, str):
                        try:
                            args = json.loads(args)
                        except Exception:
                            args = {}
                    fc_part = {"functionCall": {"name": fname, "args": args, "id": call_id}}
                    stored_sig = _gemini_get_sig(f"fc:{call_id}") or _gemini_get_sig(f"fc:{fname}")
                    if stored_sig:
                        fc_part["thoughtSignature"] = stored_sig
                        fc_part["thought_signature"] = stored_sig
                    else:
                        fc_part["thought_signature"] = "skip_thought_signature_validator"
                    contents.append({"role": "model", "parts": [fc_part]})
                elif t == "function_call_output":
                    call_id = item.get("call_id", item.get("id", ""))
                    output = item.get("output", "")
                    fname = item.get("name", "") or tool_call_names.get(call_id, "")
                    try:
                        output_parsed = json.loads(output) if isinstance(output, str) else output
                    except Exception:
                        output_parsed = output
                    resp_part = {"functionResponse": {"name": fname or "unknown", "response": {"result": output_parsed if isinstance(output_parsed, (dict, list)) else output}}}
                    if call_id:
                        resp_part["functionResponse"]["id"] = call_id
                    contents.append({"role": "user", "parts": [resp_part]})

        if (OAUTH_PROVIDER or "").startswith("google"):
            sanitized = []
            last_user_text = None
            last_role = None
            for content in contents:
                role = content.get("role")
                parts = [p for p in content.get("parts", []) if isinstance(p, dict)]
                if not parts:
                    continue
                text_key = "\n".join([p.get("text", "") for p in parts if "text" in p]).strip()
                if role == "user" and text_key and text_key == last_user_text:
                    continue
                if role == last_role and role in ("user", "model") and sanitized:
                    sanitized[-1].setdefault("parts", []).extend(parts)
                else:
                    sanitized.append({"role": role, "parts": parts})
                if role == "user" and text_key:
                    last_user_text = text_key
                last_role = role
            while sanitized and sanitized[0].get("role") != "user":
                sanitized.pop(0)
            while sanitized and sanitized[-1].get("role") != "user":
                sanitized.pop()
            contents = sanitized

        if instructions:
            system_parts.append({"text": instructions})
        if OAUTH_PROVIDER == "google-antigravity":
            system_parts.append({"text": (
                "You are connected through a Responses API translation proxy. "
                "If tools are available and the user's request requires changing files, call the appropriate tool immediately. "
                "Do not announce plans, do not say you will list files, browse, fetch, inspect, or start by exploring unless you are emitting the actual tool call in the same response. "
                "For file creation requests, use tools to create or modify the file instead of only printing code in chat. "
                "If no suitable tool is available, answer directly with the complete result. "
                "Never answer only with a plan such as 'I will start by...' or 'I am going to...'."
            )})

        gen_config = {}
        mot = body.get("max_output_tokens", 0)
        if mot:
            gen_config["maxOutputTokens"] = mot
        if body.get("temperature") is not None:
            gen_config["temperature"] = body["temperature"]
        if body.get("top_p") is not None:
            gen_config["topP"] = body["top_p"]

        _is_claude_model = "claude" in model.lower()
        _is_claude_thinking = _is_claude_model and "thinking" in model.lower()

        if OAUTH_PROVIDER == "google-antigravity" and _is_claude_thinking:
            if REASONING_ENABLED and REASONING_EFFORT != "none":
                budget = {"low": 8192, "medium": 16384, "high": 32768}.get(REASONING_EFFORT, 16384)
            else:
                budget = 16384
            gen_config["thinkingConfig"] = {
                "include_thoughts": True,
                "thinking_budget": budget,
            }
            current_max = gen_config.get("maxOutputTokens", 0)
            if not current_max or current_max <= budget:
                gen_config["maxOutputTokens"] = 64000
            print(f"[antigravity-claude] thinking model={model} budget={budget} maxOutputTokens={gen_config.get('maxOutputTokens')}", file=sys.stderr)
        elif OAUTH_PROVIDER == "google-antigravity" and _is_claude_model:
            if "thinkingConfig" in gen_config:
                del gen_config["thinkingConfig"]
        elif REASONING_ENABLED and REASONING_EFFORT != "none":
            budget = {"low": 2048, "medium": 8192, "high": 24576}.get(REASONING_EFFORT, 8192)
            gen_config["thinkingConfig"] = {"includeThoughts": True, "thinkingBudget": budget}

        oa_tools = body.get("tools", [])
        gemini_tools = []
        if oa_tools:
            func_decls = []
            for tool in oa_tools:
                ttype = tool.get("type", "function")
                fname = tool.get("name", "")
                if ttype == "function":
                    fn = tool.get("function", tool)
                    name = fn.get("name", fname)
                    desc = fn.get("description", "")
                    params = fn.get("parameters", fn.get("input_schema", {}))
                    func_decls.append({"name": name, "description": desc, "parameters": params})
                elif fname:
                    func_decls.append({"name": fname, "description": tool.get("description", ""), "parameters": tool.get("parameters", {"type": "object", "properties": {}})})
            if func_decls:
                gemini_tools = [{"functionDeclarations": func_decls}]

        if OAUTH_PROVIDER == "google-antigravity":
            contents = _gemini_reattach_sigs(contents)

        if OAUTH_PROVIDER == "google-antigravity":
            guardrail_found = any("autonomous coding agent" in json.dumps(c.get("parts", []), ensure_ascii=False) for c in contents[:2])
            if not guardrail_found:
                contents.insert(0, {"role": "user", "parts": [{"text": _GEMINI_AGENT_GUARDRAIL}]})

        if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
            _EDIT_WORDS = ("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert")
            latest_lower = ""
            for item in reversed(input_data):
                if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
                    c = item.get("content", "")
                    if isinstance(c, str):
                        latest_lower = c.lower()
                    elif isinstance(c, list):
                        latest_lower = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)).lower()
                    break
            if latest_lower and any(w in latest_lower for w in _EDIT_WORDS) and len(input_data) > 6:
                n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
                if n_tool_calls > 0:
                    contents.append({"role": "user", "parts": [{"text": "IMPORTANT: The user is requesting a modification to existing files. You MUST use tools (exec_command, write, etc.) to make the changes. Do NOT just describe what to do — actually call the tools to modify the files now."}]})
                    print(f"[antigravity] edit-intent detected with {n_tool_calls} prior tool calls; injected tool-use nudge", file=sys.stderr)

        if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
            latest_user = ""
            for item in reversed(input_data):
                if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
                    c = item.get("content", "")
                    if isinstance(c, str):
                        latest_user = c
                    elif isinstance(c, list):
                        latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
                    break
            if latest_user:
                latest_norm = " ".join(latest_user.strip().split())[:160]
                final_text = ""
                if contents:
                    last = contents[-1]
                    if last.get("role") == "user":
                        # Compare against the raw text parts; json.dumps() would escape quotes and
                        # newlines, so multi-line instructions would never match and get appended twice.
                        final_text = " ".join(" ".join(str(p.get("text", "")) for p in last.get("parts", []) if isinstance(p, dict)).split())
                if latest_norm[:120] not in final_text:
                    print("[antigravity] latest user instruction was not final turn; appending", file=sys.stderr)
                    contents.append({"role": "user", "parts": [{"text": latest_user}]})
                else:
                    print("[antigravity] latest user instruction is final turn", file=sys.stderr)
                print(f"[{self._session_id}] [antigravity-debug] input_items={len(input_data)} contents={len(contents)} latest={latest_user[:80]!r}", file=sys.stderr)
                if contents:
                    last_c = contents[-1]
                    print(f"[{self._session_id}] [antigravity-debug] final_role={last_c.get('role')} preview={json.dumps(last_c.get('parts', []), ensure_ascii=False)[:200]}", file=sys.stderr)

        request_body = {"contents": contents}
        if system_parts:
            request_body["systemInstruction"] = {"parts": system_parts}
        if gen_config:
            request_body["generationConfig"] = gen_config
        if gemini_tools:
            request_body["tools"] = gemini_tools

        if OAUTH_PROVIDER == "google-antigravity" and _is_claude_model and gemini_tools:
            request_body["toolConfig"] = {"functionCallingConfig": {"mode": "VALIDATED"}}
            if _is_claude_thinking:
                print("[antigravity-claude] applied VALIDATED toolConfig for thinking model", file=sys.stderr)

        wrapped = {
            "project": project_id,
            "model": model,
            "request": request_body,
        }
        if OAUTH_PROVIDER == "google-antigravity":
            wrapped["requestType"] = "agent"
            wrapped["userAgent"] = "antigravity"
            wrapped["requestId"] = f"agent-{uuid.uuid4().hex[:12]}"

        _allow_staging = os.environ.get("ALLOW_ANTIGRAVITY_STAGING", "0") == "1"
        if OAUTH_PROVIDER == "google-antigravity":
            _antigravity_endpoints = ["https://cloudcode-pa.googleapis.com"]
            if _allow_staging:
                _antigravity_endpoints.extend([
                    "https://daily-cloudcode-pa.sandbox.googleapis.com",
                    "https://autopush-cloudcode-pa.sandbox.googleapis.com",
                ])
            endpoints = _antigravity_endpoints
        else:
            endpoints = ["https://cloudcode-pa.googleapis.com"]
        action = "streamGenerateContent" if stream else "generateContent"
        url_suffix = f"v1internal:{action}?alt=sse" if stream else f"v1internal:{action}"

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        }
        if OAUTH_PROVIDER == "google-antigravity":
            version = _ensure_antigravity_version()
            headers["User-Agent"] = f"antigravity/{version} darwin/arm64"
        else:
            headers["User-Agent"] = "google-api-nodejs-client/9.15.1"
            headers["X-Goog-Api-Client"] = "gl-node/22.17.0"
            headers["Client-Metadata"] = "ideType=IDE_UNSPECIFIED,platform=PLATFORM_UNSPECIFIED,pluginType=GEMINI"
        body_b = json.dumps(wrapped).encode()
        n_contents = len(contents)
        has_tools = bool(gemini_tools)
        print(f"[{self._session_id}] model={model} stream={stream} items={len(input_data) if isinstance(input_data, list) else 1} project={project_id} contents={n_contents} tools={has_tools}", file=sys.stderr)
        if n_contents > 10:
            debug_path = os.path.join(_LOG_DIR, f"gemini-long-ctx-{self._session_id}.json")
            try:
                with open(debug_path, "w", encoding="utf-8") as dbg:
                    json.dump({"contents_count": n_contents, "contents_roles": [c.get("role") for c in contents], "has_tools": has_tools, "model": model, "wrapped_size": len(body_b)}, dbg, indent=2)
            except Exception:
                pass

        if OAUTH_PROVIDER == "google-antigravity":
            print(f"[antigravity-endpoint] endpoints={[e.replace('https://','') for e in endpoints]} project={project_id}", file=sys.stderr)

        for ep in endpoints:
            target = f"{ep}/{url_suffix}"
            req = urllib.request.Request(target, data=body_b, headers=headers)
            try:
                upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
                break
            except urllib.error.HTTPError as e:
                err_body = e.read().decode(errors="replace")
                if e.code == 400 and OAUTH_PROVIDER.startswith("google"):
                    try:
                        debug_path = os.path.join(_LOG_DIR, "gemini-last-400-request.json")
                        with open(debug_path, "w", encoding="utf-8") as dbg:
                            json.dump({"endpoint": ep, "model": model, "wrapped": wrapped, "error": err_body}, dbg, indent=2)
                        print(f"[{self._session_id}] saved 400 debug request to {debug_path}", file=sys.stderr)
                    except Exception:
                        pass
                if e.code == 403 and "SERVICE_DISABLED" in err_body[:500] and ep != endpoints[-1]:
                    print(f"[{self._session_id}] {ep} SERVICE_DISABLED, trying next endpoint", file=sys.stderr)
                    continue
                if e.code == 429 and ep != endpoints[-1] and _allow_staging:
                    print(f"[{self._session_id}] {ep} HTTP 429, trying next endpoint", file=sys.stderr)
                    continue
                if e.code == 429:
                    pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
                    _, acct = _get_google_account(OAUTH_PROVIDER)
                    if acct:
                        pool.mark_rate_limited(acct, 60)
                return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
            except Exception as e:
                if ep == endpoints[-1]:
                    return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
                print(f"[{self._session_id}] {ep} failed: {e}, trying next", file=sys.stderr)
                continue

        if stream:
            self._forward_gemini_sse(upstream, model, body, input_data, tracker)
        else:
            self._forward_gemini_json(upstream, model, body, input_data)

    def _forward_gemini_sse(self, upstream, model, body, input_data, tracker=None):
        """Translate a Gemini SSE stream into Responses API SSE events and store the final response."""
        resp_id = f"resp-{uuid.uuid4().hex[:24]}"
        created = int(time.time())
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.send_header("Connection", "keep-alive")
        self.end_headers()

        full_text = ""
        output_items = []
        current_tool_calls = {}
        message_started = False
        message_id = f"msg-{uuid.uuid4().hex[:24]}"

        def flush_event(event_type, data):
            self.wfile.write(f"event: {event_type}\ndata: {json.dumps(data)}\n\n".encode())
            self.wfile.flush()

        flush_event("response.created", {"type": "response.created", "response": {"id": resp_id, "object": "response", "model": model, "status": "in_progress", "created": created, "output": []}})
        flush_event("response.in_progress", {"type": "response.in_progress", "response": {"id": resp_id}})

        buf = ""
        stream_finished = False
        last_finish = ""  # initialised here so the post-loop MAX_TOKENS check never sees an unbound name
        for raw_line in _stream_with_idle_timeout(upstream):
            if tracker and tracker.cancelled.is_set():
                print("[gemini-oauth] stream cancelled", file=sys.stderr)
                break
            if stream_finished:
                break
            line = raw_line.decode(errors="replace")
            if line.startswith("data: "):
                buf += line[6:]
                continue
            if not line.strip() and buf:
                try:
                    chunk = json.loads(buf)
                except Exception:
                    buf = ""
                    continue
                buf = ""

                candidates = chunk.get("response", chunk).get("candidates", [])
                if not candidates:
                    if chunk.get("error"):
                        print(f"[{self._session_id}] stream error chunk: {str(chunk.get('error'))[:300]}", file=sys.stderr)
                    continue
                if candidates[0].get("finishReason") and not candidates[0].get("content", {}).get("parts"):
                    print(f"[{self._session_id}] finish without parts: {candidates[0].get('finishReason')}", file=sys.stderr)
                parts = candidates[0].get("content", {}).get("parts", [])
                for part in parts:
                    sig = _extract_gemini_sig(part)
                    if sig:
                        if part.get("functionCall"):
                            fc_id = part["functionCall"].get("id") or part["functionCall"].get("name")
                            fc_name = part["functionCall"].get("name")
                            if fc_id:
                                _gemini_store_sig(f"fc:{fc_id}", sig)
                            if fc_name:
                                _gemini_store_sig(f"fc:{fc_name}", sig)
                        _gemini_store_sig(f"turn:{resp_id}", sig)
                    if part.get("thought"):
                        # Thought parts are not forwarded to the client; any signature on this
                        # part was already stored under turn:{resp_id} above.
                        continue
                    if "text" in part and not part.get("functionCall"):
                        text_delta = part["text"]
                        if not text_delta:
                            continue
                        full_text += text_delta
                        if not message_started:
                            flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": 0, "item": {"type": "message", "id": message_id, "role": "assistant", "content": []}})
                            flush_event("response.content_part.added", {"type": "response.content_part.added", "output_index": 0, "content_index": 0, "part": {"type": "output_text", "text": ""}})
                            output_items.append({"text": True})
                            message_started = True
                        flush_event("response.output_text.delta", {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": text_delta})
                    elif part.get("functionCall"):
                        fc = part["functionCall"]
                        call_id = f"call_{uuid.uuid4().hex[:24]}"
                        args_str = json.dumps(fc.get("args", fc.get("arguments", {})))
                        output_index = len(output_items)
                        flush_event("response.output_item.added", {"type": "response.output_item.added", "output_index": output_index, "item": {"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": ""}})
                        flush_event("response.function_call_arguments.delta", {"type": "response.function_call_arguments.delta", "output_index": output_index, "item_id": call_id, "delta": args_str})
                        flush_event("response.function_call_arguments.done", {"type": "response.function_call_arguments.done", "output_index": output_index, "item_id": call_id, "arguments": args_str})
                        current_tool_calls[call_id] = fc
                        output_items.append({"tool": True})
                last_finish = candidates[0].get("finishReason", "")
                if last_finish:
                    part_kinds = []
                    for p in parts:
                        if "text" in p:
                            part_kinds.append("text")
                        if "functionCall" in p:
                            part_kinds.append("functionCall")
                        if _extract_gemini_sig(p):
                            part_kinds.append("thoughtSignature")
                    print(f"[{self._session_id}] [antigravity] finish={last_finish} parts={part_kinds} tool_calls={len(current_tool_calls)}", file=sys.stderr)
                    if OAUTH_PROVIDER == "google-antigravity" and last_finish == "MAX_TOKENS" and full_text and not current_tool_calls:
                        print(f"[{self._session_id}] MAX_TOKENS hit ({len(full_text)} chars), auto-continuing...", file=sys.stderr)
                        break
                    stream_finished = True
                    break

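        # NOTE: gen_config, gemini_tools, system_parts, project_id, headers, endpoints and url_suffix
        # are not parameters or locals of this method; unless they resolve at module scope, the
        # auto-continue call below raises NameError.
        if OAUTH_PROVIDER.startswith("google") and full_text and not current_tool_calls and last_finish == "MAX_TOKENS" and not stream_finished: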
            result = _auto_continue_gemini(self, flush_event, message_id, model, gen_config, gemini_tools, system_parts, project_id, headers, endpoints, url_suffix, full_text, output_items, message_started)
            if result:
                full_text = result
                for item in output_items:
                    if isinstance(item, dict) and item.get("tool") and "fc" in item and "call_id" in item:
                        current_tool_calls[item["call_id"]] = item["fc"]

        out = []
        if not full_text and not current_tool_calls:
            print("[gemini-oauth] WARNING: completed with empty output", file=sys.stderr)
        if full_text:
            out.append({"type": "message", "id": message_id, "role": "assistant", "content": [{"type": "output_text", "text": full_text}]})
        tool_outputs = []
        for cid, fc in current_tool_calls.items():
            tool_outputs.append({"type": "function_call", "id": cid, "call_id": cid, "name": fc.get("name", ""), "arguments": json.dumps(fc.get("args", fc.get("arguments", {})))})
        out.extend(tool_outputs)

        final_resp = {"id": resp_id, "object": "response", "model": model, "status": "completed", "created": created, "output": out}
        if full_text:
            flush_event("response.output_text.done", {"type": "response.output_text.done", "output_index": 0, "content_index": 0, "text": full_text})
            flush_event("response.content_part.done", {"type": "response.content_part.done", "output_index": 0, "content_index": 0, "part": {"type": "output_text", "text": full_text}})
            flush_event("response.output_item.done", {"type": "response.output_item.done", "output_index": 0, "item": out[0]})
        for idx, item in enumerate(tool_outputs, start=(1 if full_text else 0)):
            flush_event("response.output_item.done", {"type": "response.output_item.done", "output_index": idx, "item": item})
        flush_event("response.completed", {"type": "response.completed", "response": final_resp})
        self.close_connection = True

        with _response_store_lock:
            _response_store[resp_id] = final_resp
            while len(_response_store) > _MAX_STORED:
                _response_store.popitem(last=False)

    def _forward_gemini_json(self, upstream, model, body, input_data):
        """Translate a non-streaming Gemini response into a Responses API JSON body and store it."""
        data = json.loads(upstream.read().decode())
        resp_id = f"resp-{uuid.uuid4().hex[:24]}"
        created = int(time.time())
        out = []
        full_text = ""
        candidates = data.get("response", data).get("candidates", [])
        if candidates:
            parts = candidates[0].get("content", {}).get("parts", [])
            text_parts = []
            for part in parts:
                if part.get("thought"):
                    continue
                if "text" in part and not part.get("functionCall"):
                    text_parts.append(part["text"])
                elif part.get("functionCall"):
                    fc = part["functionCall"]
                    call_id = f"call_{uuid.uuid4().hex[:24]}"
                    out.append({"type": "function_call", "id": call_id, "call_id": call_id, "name": fc.get("name", ""), "arguments": json.dumps(fc.get("args", fc.get("arguments", {})))})
            if text_parts:
                full_text = "".join(text_parts)
                out.insert(0, {"type": "message", "id": f"msg-{uuid.uuid4().hex[:24]}", "role": "assistant", "content": [{"type": "output_text", "text": full_text}]})
        resp = {"id": resp_id, "object": "response", "model": model, "status": "completed", "created": created, "output": out}
        with _response_store_lock:
            _response_store[resp_id] = resp
            while len(_response_store) > _MAX_STORED:
                _response_store.popitem(last=False)
        self.send_json(200, resp)

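    def _handle_bgp(self, body, model, stream, messages, input_data):
        """Try each non-rate-limited BGP route in order, retrying transient failures up to twice per route."""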
        routes = _sorted_bgp_routes()
        routes = [r for r in routes if _bucket_for_route(r).allow()]
        if not routes:
            return self.send_json(503, {"error": {"type": "bgp_rate_limited", "message": "All routes rate-limited"}})
        errors = []
        for route in routes:
            r_model = route.get("model", model)
            r_url = route["target_url"].rstrip("/")
            r_key = route.get("api_key", "")
            r_reasoning = route.get("reasoning_enabled", True)
            r_effort = route.get("reasoning_effort", "medium")
            r_oauth = route.get("oauth_provider", "")

            chat_body = dict(messages=list(messages))
            chat_body["model"] = r_model
            for k in ("temperature", "top_p"):
                if k in body:
                    chat_body[k] = body[k]
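            # `or 0` guards against an explicit null max_output_tokens, which would make max() raise TypeError.
            chat_body["max_tokens"] = max(body.get("max_output_tokens") or 0, 64000)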
            tools = oa_convert_tools(body.get("tools"))
            if tools:
                chat_body["tools"] = tools
            if body.get("tool_choice"):
                chat_body["tool_choice"] = body["tool_choice"]
            chat_body["stream"] = stream
            if not r_reasoning or r_effort == "none":
                chat_body["enable_thinking"] = False
                chat_body["reasoning_effort"] = "none"
            else:
                chat_body["reasoning_effort"] = r_effort

            target = upstream_target(r_url, "/chat/completions")
            if r_oauth == "google":
                r_key = _refresh_oauth_token_for(r_key, r_oauth)
            fwd = forwarded_headers(self.headers, {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {r_key}",
                **_openrouter_extra(),
            }, browser_ua=True)
            print(f"[{self._session_id}] trying route '{route.get('name', r_url)}' model={r_model}", file=sys.stderr)
            req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
            t0_route = time.time()
            route_ok = False
            for attempt in range(3):
                try:
                    upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
                    print(f"[{self._session_id}] route '{route.get('name', r_url)}' connected OK", file=sys.stderr)
                    _update_route_stats(route, True, time.time() - t0_route)
                    self._forward_oa_compat(upstream, stream, r_model, chat_body, body, input_data, fwd, target)
                    return
                except urllib.error.HTTPError as e:
                    err = e.read().decode(errors="replace")
                    if e.code in (429, 502, 503) and attempt < 2:
                        retry_after = e.headers.get("Retry-After")
                        wait = min(int(retry_after), 60) if retry_after and retry_after.isdigit() else min(2 ** (attempt + 1), 10)
                        print(f"[{self._session_id}] route '{route.get('name', r_url)}' HTTP {e.code}, retry {attempt+1}/2 in {wait}s", file=sys.stderr)
                        time.sleep(wait)
                        req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
                        continue
                    print(f"[{self._session_id}] route '{route.get('name', r_url)}' FAILED: HTTP {e.code}: {err[:200]}", file=sys.stderr)
                    _update_route_stats(route, False, time.time() - t0_route, http_code=e.code)
                    errors.append(f"{route.get('name','?')}: HTTP {e.code}")
                    break
                except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError) as e:
                    if attempt < 2:
                        wait = min(2 ** (attempt + 1), 8)
                        print(f"[{self._session_id}] route '{route.get('name', r_url)}' conn error, retry {attempt+1}/2 in {wait}s: {e}", file=sys.stderr)
                        time.sleep(wait)
                        req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
                        continue
                    _update_route_stats(route, False, time.time() - t0_route, error_type=str(e))
                    errors.append(f"{route.get('name','?')}: {e}")
                    break
                except Exception as e:
                    print(f"[{self._session_id}] route '{route.get('name', r_url)}' FAILED: {e}", file=sys.stderr)
                    _update_route_stats(route, False, time.time() - t0_route, error_type=str(e))
                    errors.append(f"{route.get('name','?')}: {e}")
                    break

        print(f"[{self._session_id}] ALL ROUTES FAILED: {errors}", file=sys.stderr)
        self.send_json(502, {"error": {"type": "bgp_all_routes_failed", "message": f"All BGP routes failed: {'; '.join(errors)}"}})

    def _forward_oa_compat(self, upstream, stream, model, chat_body, body, input_data, fwd, target, tracker=None):
        n_items = len(input_data) if isinstance(input_data, list) else 1
        t0 = time.time()
        provider = TARGET_URL.split("//")[-1].split("/")[0]
        if BGP_ROUTES:
            provider = "bgp:" + BGP_ROUTES[0].get("name", "pool")

        if stream:
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.send_header("Cache-Control", "no-cache")
            self.send_header("Connection", "keep-alive")
            self.end_headers()
            if hasattr(self, 'connection') and self.connection:
                try:
                    self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                except Exception:
                    pass

            collected_events = []
            last_resp_id = None
            last_output = None
            last_status = None
            finish_reason = None
            has_content = False

            def _observe_event(event):
                nonlocal last_resp_id, last_output, last_status, finish_reason, has_content
                for line in event.strip().split("\n"):
                    if line.startswith("data: "):
                        try:
                            d = json.loads(line[6:])
                            if d.get("type") == "response.completed":
                                last_resp_id = d.get("response", {}).get("id")
                                last_output = d.get("response", {}).get("output", [])
                                last_status = d.get("response", {}).get("status")
                                finish_reason = "length" if last_status == "incomplete" else "stop"
                                has_content = any(o.get("type") == "message" for o in (last_output or []))
                        except Exception:
                            pass

            try:
                reasoning_out = {}
                for event in oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id"), _reasoning_out=reasoning_out):
                    if tracker and tracker.cancelled.is_set():
                        print("[translate-proxy] stream cancelled", file=sys.stderr)
                        break
                    collected_events.append(event)
                    _observe_event(event)
            except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
                print("[translate-proxy] client disconnected during stream", file=sys.stderr)
                _crof_record(model, n_items, False)
                _log_resp(last_resp_id, "client_disconnect", last_output)
                return

            # Record outcome
            success = (finish_reason != "length")
            _crof_record(model, n_items, success)
            _log_resp(last_resp_id, last_status, last_output)
            if last_resp_id and input_data is not None:
                store_response(last_resp_id, input_data, last_output)
            if reasoning_out.get("text"):
                with _last_reasoning_lock:
                    _last_reasoning_store[last_resp_id or ""] = {
                        "reasoning": reasoning_out["text"],
                        "tool_calls": reasoning_out.get("tool_calls", []),
                        "ts": time.time(),
                    }
                    while len(_last_reasoning_store) > _MAX_STORED:
                        oldest = next(iter(_last_reasoning_store))
                        del _last_reasoning_store[oldest]
            _record_usage(provider, model, success, time.time() - t0, error_type="length" if not success else None)

            # Auto-learn provider quirks before flushing the bad response to Codex.
            if finish_reason == "length" and not has_content and has_function_call_output(input_data):
                _set_provider_cap(model, "synthetic_tool_results", True, "incomplete empty response after tool output")
                new_input, synthesized = synthesize_tool_results_for_chat(input_data)
                if synthesized:
                    print("[provider-sensor] retrying turn with synthetic tool results", file=sys.stderr)
                    new_messages = oa_input_to_messages(new_input)
                    instructions = body.get("instructions", "").strip()
                    if instructions:
                        new_messages.insert(0, {"role": "system", "content": instructions})
                    new_chat_body = self._build_chat_body(model, new_messages, body, stream)
                    new_req = urllib.request.Request(target, data=json.dumps(new_chat_body).encode(), headers=fwd)
                    try:
                        retry_upstream = urllib.request.urlopen(new_req, timeout=_upstream_timeout(body, True))
                        collected_events = []
                        last_resp_id = last_output = last_status = None
                        finish_reason = None
                        has_content = False
                        for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
                            collected_events.append(event)
                            _observe_event(event)
                        input_data = new_input
                    except Exception as e:
                        print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr)

            # Auto-retry when finish_reason=length arrives with no content, which usually means the context was too large.
            if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5 and TARGET_URL and "crof.ai" in TARGET_URL:
                print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr)
                new_input = _crof_compact_for_retry(input_data, model)
                if len(new_input) < len(input_data):
                    new_messages = oa_input_to_messages(new_input)
                    instructions = body.get("instructions", "").strip()
                    if instructions:
                        new_messages.insert(0, {"role": "system", "content": instructions})
                    new_chat_body = dict(chat_body)
                    new_chat_body["messages"] = new_messages
                    new_req = urllib.request.Request(
                        target,
                        data=json.dumps(new_chat_body).encode(),
                        headers=fwd,
                    )
                    try:
                        retry_upstream = urllib.request.urlopen(new_req, timeout=_upstream_timeout(body, True))
                        collected_events = []
                        last_resp_id = last_output = last_status = None
                        finish_reason = None
                        has_content = False
                        for event in oa_stream_to_sse(retry_upstream, model, body.get("request_id") or body.get("id")):
                            collected_events.append(event)
                            _observe_event(event)
                        input_data = new_input
                    except Exception as e:
                        print(f"[crof-adaptive] retry failed: {e}", file=sys.stderr)

            self.stream_buffered_events(collected_events)
        else:
            result = oa_resp_to_responses(json.loads(upstream.read()), model)
            success = result.get("status") != "incomplete"
            _crof_record(model, n_items, success)
            self.send_json(200, result)
            rid = result.get("id")
            _log_resp(rid, result.get("status"), result.get("output", []))
            if rid and input_data is not None:
                store_response(rid, input_data, result.get("output", []))
            _record_usage(provider, model, success, time.time() - t0)

    def _forward_oa_compat_retry(self, req, model, chat_body, body, input_data, tracker=None):
        try:
            upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, True))
        except Exception as e:
            print(f"[crof-adaptive] retry failed: {e}", file=sys.stderr)
            return

        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.send_header("Connection", "keep-alive")
        self.end_headers()
        if hasattr(self, 'connection') and self.connection:
            try:
                self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            except Exception:
                pass

        last_resp_id = None
        last_output = None
        last_status = None
        try:
            def on_event(event):
                nonlocal last_resp_id, last_output, last_status
                if tracker and tracker.cancelled.is_set():
                    print("[translate-proxy] retry stream cancelled", file=sys.stderr)
                    return False
                for line in event.strip().split("\n"):
                    if line.startswith("data: "):
                        try:
                            d = json.loads(line[6:])
                            if d.get("type") == "response.completed":
                                resp = d.get("response", {})
                                last_resp_id = resp.get("id")
                                last_output = resp.get("output", [])
                                last_status = resp.get("status")
                        except Exception:
                            # Ignore non-JSON payloads such as "[DONE]" without swallowing KeyboardInterrupt.
                            pass
                return True
            self.stream_buffered_events(oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id")), on_event=on_event)
        except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
            print("[translate-proxy] client disconnected during retry stream", file=sys.stderr)

        n_items = len(input_data) if isinstance(input_data, list) else 1
        _crof_record(model, n_items, last_status == "completed")
        _log_resp(last_resp_id, last_status or "retry_disconnect", last_output)
        if last_resp_id and input_data is not None:
            store_response(last_resp_id, input_data, last_output)

    def _handle_anthropic(self, body, model, stream, tracker=None):
        input_data = body.get("input", "")
        an_body = {"model": model, "messages": an_input_to_messages(input_data),
                   "max_tokens": body.get("max_output_tokens") or 8192}
        instructions = body.get("instructions", "").strip()
        if instructions:
            an_body["system"] = [{"type": "text", "text": instructions,
                                   "cache_control": {"type": "ephemeral"}}]
        for k in ("temperature", "top_p"):
            if k in body:
                an_body[k] = body[k]
        tools = an_convert_tools(body.get("tools"))
        if tools:
            an_body["tools"] = tools
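        if body.get("tool_choice"):
            tc = body["tool_choice"]
            if isinstance(tc, str):
                # Responses API "required" corresponds to Anthropic's "any".
                an_body["tool_choice"] = {"type": "any" if tc == "required" else tc}
            elif isinstance(tc, dict):
                if tc.get("type") == "function" and tc.get("name"):
                    # A named function choice becomes Anthropic's named tool choice.
                    an_body["tool_choice"] = {"type": "tool", "name": tc["name"]}
                else:
                    an_body["tool_choice"] = tc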
        an_body["stream"] = stream

        target = upstream_target(TARGET_URL, "/messages")
        req = urllib.request.Request(
            target,
            data=json.dumps(an_body).encode(),
            headers=forwarded_headers(self.headers, {
                "Content-Type": "application/json",
                "x-api-key": API_KEY,
                "anthropic-version": "2023-06-01",
                **_openrouter_extra(),
            }),
        )
        self._forward(req, stream, model,
            lambda r: an_resp_to_responses(json.loads(r.read()), model),
            lambda s: an_stream_to_sse(s, model, body.get("request_id") or body.get("id")),
            input_data=body.get("input", ""), tracker=tracker)

    def _handle_command_code(self, body, model, stream, tracker=None):
        """[ALL FIXES IN ONE] CommandCode /alpha/generate adapter.

        FIX 1: Uses cc_input_to_messages (string content only, no content blocks)
        FIX 2: Always sends x-command-code-version header (fallback "0.26.8")
        FIX 3: No stale schema cache — cleared, 24h TTL
        FIX 4: Streaming path wrapped in try/except → sends response.completed(status="failed") on crash
        FIX 5: Response parser (_parse_commandcode_text_tool_calls) now extracts raw JSON tool calls
        FIX 6: Arguments no longer double-wrapped (three-tier parser in _extract_args)
        FIX 7: _extract_field handles escaped quotes (\\") correctly
        FIX 8: sandbox_permissions normalized to valid variants only
        REVERTED: Removed adaptive probing system (caused format mismatch).
        Uses conservative cc_input_to_messages format exclusively.
        ErrorAnalyzer learning on retries (not proactive probes).
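
        Illustrative sketch only, not code the adapter calls: the retry policy
        described above (adapt the request when an error can be analyzed, back
        off on 429/502/503, otherwise give up) might be modelled as follows.
        `send` and `analyze` are hypothetical stand-ins for the upstream call
        and for ErrorAnalyzer; the name `request_with_learning` is an assumption.

```python
import time


def request_with_learning(send, analyze, max_retries=2, sleep=time.sleep):
    # send(attempt) -> (status, body); analyze(body) -> True when it changed
    # the request format. Both callables are hypothetical.
    for attempt in range(max_retries + 1):
        status, body = send(attempt)
        if status < 400:
            return status, body
        if attempt < max_retries:
            if analyze(body):
                continue  # retry immediately with the adapted format
            if status in (429, 502, 503):
                sleep(min(2 ** (attempt + 1), 10))  # exponential backoff, capped at 10s
                continue
        # Out of retries, or an error that is neither analyzable nor transient.
        return status, body
```

        With max_retries=2 and two transient failures, this sketch waits 2s and
        then 4s before the final attempt.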
        """
        input_data = body.get("input", "")
        instructions = body.get("instructions", "").strip()

        schema = _load_schema(model=model)

        thread_id = body.get("request_id") or body.get("id") or ""
        try:
            uuid.UUID(thread_id)
        except (ValueError, AttributeError):
            thread_id = str(uuid.uuid4())

        # Build auth headers
        auth_val = f"{schema.auth_scheme}{API_KEY}" if schema.auth_scheme else API_KEY
        headers_extra = {
            "Content-Type": "application/json",
            "Accept": "text/event-stream, application/json",
        }
        if schema.auth_header:
            headers_extra[schema.auth_header] = auth_val
        else:
            headers_extra["Authorization"] = f"Bearer {API_KEY}"
        headers_extra["x-command-code-version"] = CC_VERSION or "0.26.8"

        pm = schema.param_names
        tp = schema.field_names.get("tools_param", "tools")
        target = upstream_target(TARGET_URL, "/alpha/generate")

        # ── MAIN REQUEST WITH RETRY ──
        max_retries = 2
        for attempt in range(max_retries + 1):
            cc_msgs = cc_input_to_messages(input_data, instructions, schema)
            cc_body = {
                "config": _cc_config(),
                "memory": "", "taste": "", "skills": "",
                "params": {
                    "stream": True,
                    pm.get("max_tokens", "max_tokens"): body.get("max_output_tokens") or 64000,
                    pm.get("temperature", "temperature"): body.get("temperature", 0.3),
                    "messages": cc_msgs,
                    "model": model,
                    tp: [],
                },
                "threadId": thread_id,
            }

            fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
            print(f"[{self._session_id}] POST {target} model={model} stream={stream} attempt={attempt} [command-code]", file=sys.stderr)
            req = urllib.request.Request(
                target,
                data=json.dumps(cc_body).encode(),
                headers=fwd,
            )

            try:
                upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, True))
                break
            except urllib.error.HTTPError as e:
                err = e.read().decode()
                if attempt < max_retries:
                    hints = ErrorAnalyzer.analyze(err, schema)
                    if hints:
                        print(f"[{self._session_id}] error analysis: {hints}", file=sys.stderr)
                        ErrorAnalyzer.merge_into_schema(hints, schema)
                        _save_schema(schema, model=model)
                        continue
                    if e.code in (429, 502, 503):
                        time.sleep(min(2 ** (attempt + 1), 10))
                        continue
                return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err)}})
            except Exception as e:
                if attempt < max_retries:
                    time.sleep(1)
                    continue
                return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})

        _save_schema(schema, model=model)

        if stream:
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.send_header("Cache-Control", "no-cache")
            self.send_header("Connection", "keep-alive")
            self.end_headers()
            if hasattr(self, 'connection') and self.connection:
                try:
                    self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                except Exception:
                    pass
            last_resp_id = None
            last_output = None
            def on_event(event):
                nonlocal last_resp_id, last_output
                if tracker and tracker.cancelled.is_set():
                    print("[command-code] stream cancelled", file=sys.stderr)
                    return False
                for line in event.strip().split("\n"):
                    if line.startswith("data: "):
                        try:
                            d = json.loads(line[6:])
                            if d.get("type") == "response.completed":
                                last_resp_id = d.get("response", {}).get("id")
                                last_output = d.get("response", {}).get("output", [])
                        except Exception:
                            pass
                return True
            try:
                self.stream_buffered_events(cc_stream_to_sse(upstream, model, body.get("request_id") or body.get("id")), on_event=on_event)
            except Exception as e:
                print(f"[{self._session_id}] stream error: {e}", file=sys.stderr)
                try:
                    # SSE events must end with a blank line, otherwise the client never dispatches them.
                    err_event = 'data: ' + json.dumps({"type": "response.completed",
                        "response": {"id": body.get("request_id") or body.get("id") or uid("resp"),
                                     "object": "response", "model": model, "status": "failed",
                                     "created": int(time.time()), "output": [],
                                     "usage": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0,
                                               "input_tokens_details": {"cached_tokens": 0}}}}) + "\n\n"
                    self.wfile.write(err_event.encode())
                    self.wfile.flush()
                except Exception:
                    pass
            if last_resp_id:
                store_response(last_resp_id, body.get("input", ""), last_output)
        else:
            raw = upstream.read().decode()
            result = cc_resp_to_responses(raw, model)
            self.send_json(200, result)
            rid = result.get("id")
            if rid:
                store_response(rid, body.get("input", ""), result.get("output", []))

    def _handle_codebuff(self, body, model, stream, tracker=None):
         agent_id = _CODEBUFF_AGENT_MAP.get(model)
         if not agent_id:
             matched = None
             for m in _CODEBUFF_AGENT_MAP:
                 if model.lower().replace("/", "").replace("-", "") in m.lower().replace("/", "").replace("-", ""):
                     matched = m
                     break
             if matched:
                 agent_id = _CODEBUFF_AGENT_MAP[matched]
                 model = matched
             else:
                 fallback_model = "deepseek/deepseek-v4-flash"
                 agent_id = _CODEBUFF_AGENT_MAP.get(fallback_model, "base2-free-deepseek-flash")
                 print(f"[codebuff] unknown model '{model}', falling back to {fallback_model}", file=sys.stderr)
                 model = fallback_model

         _cb_pool.load_accounts()
         pool_status = _cb_pool.status()
         n_accounts = len(pool_status)
         if n_accounts == 0:
             return self.send_json(401, {"error": {"type": "auth_error",
                 "message": "No codebuff credentials found. Add accounts to ~/.config/manicode/credentials.json"}})

         last_err = None
         for attempt in range(n_accounts):
             token, acct = _get_codebuff_account()
             if not token:
                 return self.send_json(401, {"error": {"type": "auth_error",
                     "message": "All Codebuff accounts are exhausted or rate-limited."}})

             acct_id = acct.get("id", "?") if acct else "?"
             if attempt > 0:
                 print(f"[codebuff] rotation attempt {attempt+1}/{n_accounts}, trying account {acct_id}", file=sys.stderr)

             run_id, run_err = _codebuff_start_run(token, agent_id)
             if not run_id:
                 if run_err and run_err[0] == "rate_limit_error":
                     retry_s = run_err[2]
                     _cb_pool.mark_rate_limited(acct, retry_s)
                     last_err = ("rate_limit_error", run_err[1], f"Account {acct_id} rate-limited by Codebuff: {run_err[3]}")
                 else:
                     _cb_pool.mark_rate_limited(acct, 60)
                     last_err = ("upstream_error", run_err[1] if run_err else 502,
                                 f"Failed to start agent run for {acct_id}: {run_err[3] if run_err else 'unknown error'}")
                 continue

             try:
                 instance_id = _codebuff_get_session(token, model)
             except RateLimitError as rle:
                 retry_s = rle.retry_seconds
                 fb_msg = rle.message
                 mins = int(retry_s // 60)
                 user_msg = fb_msg if fb_msg else f"Daily session limit reached. Resets in {mins}m."
                 print(f"[codebuff] session 429 for {acct_id}, retry after {retry_s:.0f}s", file=sys.stderr)
                 _cb_pool.mark_rate_limited(acct, retry_s)
                 _codebuff_finish_run(token, run_id, "completed")
                 last_err = ("rate_limit_error", 429, user_msg)
                 continue

             input_data = body.get("input", "")
             instructions = body.get("instructions", "").strip()
             messages = _cb_input_to_messages(input_data, instructions)
             messages = _ds_rebuild_tool_history(messages)

             metadata = {
                 "run_id": run_id,
                 "cost_mode": "free",
                 "client_id": "".join(secrets.choice(string.digits + string.ascii_lowercase) for _ in range(13)),
             }
             if instance_id:
                 metadata["freebuff_instance_id"] = instance_id

             chat_body = {
                 "model": model,
                 "messages": messages,
                 "stream": stream,
                 "max_tokens": max(body.get("max_output_tokens") or 0, 64000),
                 "codebuff_metadata": metadata,
             }
             for k in ("temperature", "top_p"):
                 if k in body:
                     chat_body[k] = body[k]
             tools = oa_convert_tools(body.get("tools"))
             if tools:
                 chat_body["tools"] = tools
             if body.get("tool_choice"):
                 chat_body["tool_choice"] = body["tool_choice"]

             target = f"{_CODEBUFF_API_URL}/api/v1/chat/completions"
             headers = {
                 "Content-Type": "application/json",
                 "Authorization": f"Bearer {token}",
                 "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
                 "x-codebuff-model": model,
             }
             if instance_id:
                 headers["x-codebuff-instance-id"] = instance_id

             print(f"[{self._session_id}] [codebuff] POST {target} model={model} stream={stream} run={run_id} acct={acct_id}", file=sys.stderr)
             chat_body_b = json.dumps(chat_body).encode()

             try:
                 req = urllib.request.Request(target, data=chat_body_b, headers=headers)
                 upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
             except urllib.error.HTTPError as e:
                 err_body = e.read().decode()[:1000]
                 _codebuff_finish_run(token, run_id, "failed")
                 if e.code in (429, 426):
                     reset_ms = 0
                     fb_msg = ""
                     try:
                         err_json = json.loads(err_body)
                         reset_ms = err_json.get("retryAfterMs", 0)
                         fb_msg = err_json.get("message", err_json.get("error", ""))
                         if isinstance(fb_msg, dict):
                             fb_msg = fb_msg.get("message", "")
                     except Exception:
                         pass
                     duration = max(reset_ms / 1000, 120) if reset_ms else 120
                     mins = int(duration // 60)
                     if not fb_msg:
                         fb_msg = _sanitize_err_body(err_body)
                     user_msg = f"{fb_msg} (resets in {mins}m)" if fb_msg else f"Rate limited. Resets in {mins}m."
                     _cb_pool.mark_rate_limited(acct, duration)
                     last_err = ("rate_limit_error", e.code, user_msg)
                     print(f"[codebuff] account {acct_id} got HTTP {e.code}, rotating", file=sys.stderr)
                     continue
                 if _is_reasoning_content_error(err_body):
                     print(f"[codebuff] reasoning_content error, retrying with thinking disabled", file=sys.stderr)
                     result = self._cb_retry_thinking_disabled(body, model, token, agent_id, stream, tracker, input_data, instructions, err_body, acct)
                     return result
                 print(f"[codebuff] HTTP {e.code}: {err_body[:300]}", file=sys.stderr)
                 return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
             except Exception as e:
                 _codebuff_finish_run(token, run_id, "failed")
                 return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})

             t0 = time.time()
             try:
                 if stream:
                     self.send_response(200)
                     self.send_header("Content-Type", "text/event-stream")
                     self.send_header("Cache-Control", "no-cache")
                     self.send_header("Connection", "keep-alive")
                     self.end_headers()
                     if hasattr(self, 'connection') and self.connection:
                         try:
                             self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                         except Exception:
                             pass

                     last_resp_id = [None]
                     last_output = [None]
                     last_status = [None]
                     finish_reason = [None]
                     reasoning_out = {}

                     def _on_fb_event(event):
                         if tracker and tracker.cancelled.is_set():
                             return False
                         for line in event.strip().split("\n"):
                             if line.startswith("data: "):
                                 try:
                                     d = json.loads(line[6:])
                                     if d.get("type") == "response.completed":
                                         last_resp_id[0] = d.get("response", {}).get("id")
                                         last_output[0] = d.get("response", {}).get("output", [])
                                         last_status[0] = d.get("response", {}).get("status")
                                         finish_reason[0] = "length" if last_status[0] == "incomplete" else "stop"
                                 except Exception:
                                     pass
                         return None

                     try:
                         self.stream_buffered_events(
                             oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id"),
                                              _reasoning_out=reasoning_out),
                             on_event=_on_fb_event)
                     except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
                         print(f"[{self._session_id}] [codebuff] client disconnected", file=sys.stderr)
                         return

                     success = finish_reason[0] != "length"
                     _record_usage("codebuff", model, success, time.time() - t0)
                     if last_resp_id[0] and input_data is not None:
                         store_response(last_resp_id[0], input_data, last_output[0])
                     if last_resp_id[0] and (reasoning_out.get("text") or reasoning_out.get("tool_calls")):
                         asm = {"role": "assistant", "content": reasoning_out.get("text", "") or ""}
                         if reasoning_out.get("tool_calls"):
                             asm["tool_calls"] = reasoning_out["tool_calls"]
                         if reasoning_out.get("text"):
                             asm["reasoning_content"] = reasoning_out["text"]
                         _ds_store_assistant(last_resp_id[0], asm)
                     print(f"[{self._session_id}] [codebuff] stream done status={last_status[0]} in {time.time()-t0:.1f}s acct={acct_id}", file=sys.stderr)
                 else:
                     raw = upstream.read().decode()
                     chat_resp = json.loads(raw)
                     result = oa_resp_to_responses(chat_resp, model)
                     self.send_json(200, result)
                     rid = result.get("id")
                     if rid:
                         store_response(rid, input_data, result.get("output", []))
                     print(f"[{self._session_id}] [codebuff] non-stream done in {time.time()-t0:.1f}s acct={acct_id}", file=sys.stderr)
             finally:
                 _codebuff_finish_run(token, run_id, "completed")
             return

         if last_err:
             msg = last_err[2]
             resp_id = f"resp_{uuid.uuid4().hex[:24]}"
             result = {
                 "id": resp_id,
                 "object": "response",
                 "created_at": int(time.time()),
                 "model": model,
                 "status": "completed",
                 "output": [{
                     "id": f"msg_{uuid.uuid4().hex[:24]}",
                     "type": "message",
                     "role": "assistant",
                     "content": [{
                         "type": "output_text",
                         "text": msg,
                         "annotations": [],
                     }],
                     "status": "completed",
                 }],
                 "usage": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
             }
             return self.send_json(200, result)

    def _cb_retry_thinking_disabled(self, body, model, token, agent_id, stream, tracker, input_data, instructions, original_error, acct=None):
        run_id, run_err = _codebuff_start_run(token, agent_id)
        if not run_id:
            msg = run_err[3] if run_err else "unknown error"
            return self.send_json(run_err[1] if run_err else 502, {"error": {"type": run_err[0] if run_err else "upstream_error",
                "message": f"Failed to start agent run for retry: {msg}"}})
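        try:
            instance_id = _codebuff_get_session(token, model)
        except RateLimitError as rle:
            # Mirror _handle_codebuff: close the run and report the limit instead of leaking the exception.
            _codebuff_finish_run(token, run_id, "completed")
            if acct is not None:
                _cb_pool.mark_rate_limited(acct, rle.retry_seconds)
            return self.send_json(429, {"error": {"type": "rate_limit_error",
                "message": rle.message or "Codebuff session rate limit reached."}})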
        messages = _cb_input_to_messages(input_data, instructions)
        _codebuff_hard_disable_reasoning(messages)
        metadata = {"run_id": run_id, "cost_mode": "free", "client_id": secrets.token_hex(7)[:13]}
        if instance_id:
            metadata["freebuff_instance_id"] = instance_id
        chat_body = {
            "model": model, "messages": messages, "stream": stream,
            "max_tokens": max(body.get("max_output_tokens") or 0, 64000),
            "thinking": {"type": "disabled"},
            "codebuff_metadata": metadata,
        }
        for k in ("temperature", "top_p"):
            if k in body:
                chat_body[k] = body[k]
        tools = oa_convert_tools(body.get("tools"))
        if tools:
            chat_body["tools"] = tools
        if body.get("tool_choice"):
            chat_body["tool_choice"] = body["tool_choice"]
        target = f"{_CODEBUFF_API_URL}/api/v1/chat/completions"
        headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}", "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff", "x-codebuff-model": model}
        if instance_id:
            headers["x-codebuff-instance-id"] = instance_id
        print(f"[codebuff] retry POST {target} model={model} stream={stream} run={run_id} (thinking disabled via DeepSeek native)", file=sys.stderr)
        try:
            req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=headers)
            upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
        except urllib.error.HTTPError as e:
            err_body = e.read().decode()[:500]
            _codebuff_finish_run(token, run_id, "failed")
            print(f"[codebuff] thinking-disabled retry failed: HTTP {e.code}: {err_body[:300]}", file=sys.stderr)
            return self.send_json(e.code, {"error": {"type": "codebuff_deepseek_thinking_error",
                "message": "Codebuff/DeepSeek V4 requires reasoning_content round-trip for tool-call sessions. Use Command Code provider for this model instead.", "upstream_error": _sanitize_err_body(err_body)}})
        except Exception as e:
            _codebuff_finish_run(token, run_id, "failed")
            return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
        t0 = time.time()
        try:
            if stream:
                self.send_response(200)
                self.send_header("Content-Type", "text/event-stream")
                self.send_header("Cache-Control", "no-cache")
                self.send_header("Connection", "keep-alive")
                self.end_headers()
                if hasattr(self, 'connection') and self.connection:
                    try:
                        self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                    except Exception:
                        pass
                last_resp_id = [None]
                last_output = [None]
                last_status = [None]
                finish_reason = [None]
                reasoning_out = {}
                def _on_fb_retry_event(event):
                    if tracker and tracker.cancelled.is_set():
                        return False
                    for line in event.strip().split("\n"):
                        if line.startswith("data: "):
                            try:
                                d = json.loads(line[6:])
                                if d.get("type") == "response.completed":
                                    last_resp_id[0] = d.get("response", {}).get("id")
                                    last_output[0] = d.get("response", {}).get("output", [])
                                    last_status[0] = d.get("response", {}).get("status")
                                    finish_reason[0] = "length" if last_status[0] == "incomplete" else "stop"
                            except Exception:
                                pass
                    return None
                try:
                    self.stream_buffered_events(
                        oa_stream_to_sse(upstream, model, body.get("request_id") or body.get("id"),
                                         _reasoning_out=reasoning_out),
                        on_event=_on_fb_retry_event)
                except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
                    return
                success = finish_reason[0] != "length"
                _record_usage("codebuff", model, success, time.time() - t0)
                if last_resp_id[0] and input_data is not None:
                    store_response(last_resp_id[0], input_data, last_output[0])
                if last_resp_id[0] and (reasoning_out.get("text") or reasoning_out.get("tool_calls")):
                    asm = {"role": "assistant", "content": reasoning_out.get("text", "") or ""}
                    if reasoning_out.get("tool_calls"):
                        asm["tool_calls"] = reasoning_out["tool_calls"]
                    if reasoning_out.get("text"):
                        asm["reasoning_content"] = reasoning_out["text"]
                    _ds_store_assistant(last_resp_id[0], asm)
                print(f"[{self._session_id}] [codebuff] retry stream done status={last_status[0]} in {time.time()-t0:.1f}s", file=sys.stderr)
            else:
                raw = upstream.read().decode()
                chat_resp = json.loads(raw)
                result = oa_resp_to_responses(chat_resp, model)
                self.send_json(200, result)
                rid = result.get("id")
                if rid and input_data is not None:
                    store_response(rid, input_data, result.get("output", []))
                print(f"[{self._session_id}] [codebuff] retry non-stream done in {time.time()-t0:.1f}s", file=sys.stderr)
        finally:
            _codebuff_finish_run(token, run_id, "completed")

    def _handle_auto(self, body, model, stream, tracker=None):
        """Auto-sensing backend: probe schema, adapt, retry on errors.
        Uses hostname heuristics as initial guess, then learns from errors
        and caches the learned schema for subsequent requests.
        """
        input_data = body.get("input", "")
        # `instructions` may be present but null; guard before calling .strip().
        instructions = (body.get("instructions") or "").strip()

        schema = _load_schema(model=model)
        fresh = not schema.hints().get("_updated")
        host = urllib.parse.urlparse(TARGET_URL).netloc.lower()

        def _detect_style():
            cc = schema.cc_body_wrap or "commandcode" in host or "command-code" in host
            anth = schema.tool_call_style == "anthropic_tool_use" or any(h in host for h in ("anthropic", "claude"))
            return cc, anth

        is_cc, is_anthropic = _detect_style()

        def _endpoint():
            ep = schema.field_names.get("endpoint_path", "")
            if ep:
                return ep
            if is_cc:
                return "/alpha/generate"
            if is_anthropic:
                return "/messages"
            return "/chat/completions"

        _FALLBACK_ENDPOINTS = ["/v1/chat/completions", "/chat/completions",
                                "/v1/messages", "/messages",
                                "/alpha/generate", "/complete", "/v1/complete"]
        target = upstream_target(TARGET_URL, _endpoint())
        tried_endpoints = {target}  # track tried endpoints to avoid loops

        max_retries = 3
        prev_content_type = None  # for oscillation detection
        for attempt in range(max_retries + 1):
            adapter = SchemaAdapter(schema)
            messages = adapter.convert(input_data, instructions)
            use_cc_wrap = schema.cc_body_wrap or is_cc

            # Build auth header from schema
            auth_val = f"{schema.auth_scheme}{API_KEY}" if schema.auth_scheme else API_KEY
            headers_extra = {"Content-Type": "application/json"}
            if schema.auth_header:
                headers_extra[schema.auth_header] = auth_val

            pm = schema.param_names  # short alias

            if use_cc_wrap:
                thread_id = body.get("request_id") or body.get("id") or str(uuid.uuid4())
                try:
                    uuid.UUID(thread_id)
                except (ValueError, AttributeError):
                    thread_id = str(uuid.uuid4())
                params_body = {
                    "stream": True,
                    pm.get("max_tokens", "max_tokens"): body.get("max_output_tokens", 64000),
                    pm.get("temperature", "temperature"): body.get("temperature", 0.3),
                    "messages": messages,
                    "model": model,
                }
                tp = schema.field_names.get("tools_param", "tools")
                params_body[tp] = []
                req_body = {
                    "config": _cc_config(),
                    "memory": "", "taste": "", "skills": "",
                    "params": params_body,
                    "threadId": thread_id,
                }
                # FIX 2: always send the version header; fall back when CC_VERSION is unset.
                headers_extra["x-command-code-version"] = CC_VERSION or "0.26.8"
            elif is_anthropic:
                req_body = {
                    "model": model,
                    "messages": messages,
                    pm.get("max_tokens", "max_tokens"): body.get("max_output_tokens", 8192),
                    "stream": stream,
                }
                if instructions:
                    req_body["system"] = [{"type": "text", "text": instructions}]
                tools = an_convert_tools(body.get("tools"))
                if tools:
                    req_body["tools"] = tools
                headers_extra.setdefault("anthropic-version", "2023-06-01")
            else:
                req_body = {
                    "model": model,
                    "messages": messages,
                    # `or 0` guards against an explicit null, which would make max() raise TypeError.
                    pm.get("max_tokens", "max_tokens"): max(body.get("max_output_tokens") or 0, 64000),
                    "stream": stream,
                }
                for k in ("temperature", "top_p"):
                    pk = pm.get(k, k)
                    if k in body:
                        req_body[pk] = body[k]
                if schema.tool_decl_format == "anthropic":
                    tools = an_convert_tools(body.get("tools"))
                else:
                    tools = oa_convert_tools(body.get("tools"))
                if tools:
                    req_body["tools"] = tools
                    req_body["tool_choice"] = body.get("tool_choice", "auto")
                if not REASONING_ENABLED or REASONING_EFFORT == "none":
                    req_body["enable_thinking"] = False
                    req_body["reasoning_effort"] = "none"
                else:
                    req_body["reasoning_effort"] = REASONING_EFFORT

            req_body_b = json.dumps(req_body).encode()
            fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
            print(f"[auto-sense] POST {target} model={model} attempt={attempt} schema={schema.hints()}", file=sys.stderr)

            req = urllib.request.Request(target, data=req_body_b, headers=fwd)
            try:
                upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
            except urllib.error.HTTPError as e:
                err_body = e.read().decode()
                # ── 404 endpoint fallback ──
                if e.code == 404 and attempt < max_retries:
                    for ep in _FALLBACK_ENDPOINTS:
                        ep_full = upstream_target(TARGET_URL, ep)
                        if ep_full not in tried_endpoints:
                            tried_endpoints.add(ep_full)
                            target = ep_full
                            # Try the new endpoint without schema change
                            print(f"[auto-sense] 404 -> trying endpoint {ep_full}", file=sys.stderr)
                            break
                    else:
                        # All endpoints tried -> real 404
                        return self.send_json(404, {"error": {"type": "not_found", "message": f"No working endpoint found (tried {len(tried_endpoints)} paths)"}})
                    continue
                # ── Non-404 error handling ──
                if attempt < max_retries:
                    hints = ErrorAnalyzer.analyze(err_body, schema)
                    oscillation_retry = False
                    if hints:
                        # Content-type oscillation detection
                        if "content_type" in hints:
                            if prev_content_type is not None and hints["content_type"] != prev_content_type:
                                print(f"[auto-sense] content_type oscillation: {prev_content_type} -> {hints['content_type']}, freezing", file=sys.stderr)
                                hints.pop("content_type")
                                schema.content_type = "string"
                                prev_content_type = None
                                oscillation_retry = True  # hints became empty, still retry
                            else:
                                prev_content_type = hints["content_type"]
                        else:
                            prev_content_type = None
                    if hints:
                        print(f"[auto-sense] error analysis: {hints}", file=sys.stderr)
                        ErrorAnalyzer.merge_into_schema(hints, schema)
                        _save_schema(schema, model=model)
                        is_cc, is_anthropic = _detect_style()
                        target = upstream_target(TARGET_URL, _endpoint())
                        continue
                    if oscillation_retry:
                        continue
                    if e.code in (429, 502, 503):
                        wait = min(2 ** (attempt + 1), 15)
                        time.sleep(wait)
                        continue
                return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
            except Exception as e:
                if attempt < max_retries:
                    continue
                return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})

            if fresh:
                _save_schema(schema, model=model)
                fresh = False

            # Auto-detect stream/response format from Content-Type if still "auto"
            ct = (upstream.headers.get("Content-Type", "") if hasattr(upstream, "headers") else "").lower()
            if schema.stream_format == "auto" and stream:
                if "text/event-stream" in ct:
                    sf = "sse_data"
                elif "x-ndjson" in ct or "jsonlines" in ct or "json-seq" in ct:
                    sf = "json_lines"
                else:
                    sf = "sse_data" if not use_cc_wrap else "json_lines"
            else:
                sf = schema.stream_format
            if schema.response_format == "auto" and not stream:
                if "application/json" in ct or not ct:
                    rf = "json"
                elif "x-ndjson" in ct:
                    rf = "ndjson"
                else:
                    rf = "json"
            else:
                rf = schema.response_format

            if stream:
                self.send_response(200)
                self.send_header("Content-Type", "text/event-stream")
                self.send_header("Cache-Control", "no-cache")
                self.send_header("Connection", "keep-alive")
                self.end_headers()

                if sf == "json_lines" or use_cc_wrap:
                    events = cc_stream_to_sse(upstream, model,
                                              body.get("request_id") or body.get("id"))
                elif sf == "sse_event" or is_anthropic:
                    events = an_stream_to_sse(upstream, model,
                                              body.get("request_id") or body.get("id"))
                else:
                    events = oa_stream_to_sse(upstream, model,
                                              body.get("request_id") or body.get("id"))
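                try:
                    self.stream_buffered_events(events)
                except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
                    print("[auto-sense] client disconnected during stream", file=sys.stderr)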
            else:
                raw = upstream.read().decode().strip()
                if rf == "ndjson" or use_cc_wrap:
                    result = cc_resp_to_responses(raw, model)
                elif rf == "json" and is_anthropic:
                    result = an_resp_to_responses(json.loads(raw), model)
                else:
                    result = oa_resp_to_responses(json.loads(raw), model)
                self.send_json(200, result)
            return

    def _forward(self, req, stream, model, nonstream_fn, stream_fn, input_data=None, tracker=None):
        try:
            upstream = urllib.request.urlopen(req, timeout=_upstream_timeout({}, stream))
        except urllib.error.HTTPError as e:
            err = e.read().decode()
            return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err)}})
        except Exception as e:
            return self.send_json(500, {"error": {"type": "proxy_error", "message": str(e)}})

        if stream:
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.send_header("Cache-Control", "no-cache")
            self.send_header("Connection", "keep-alive")
            self.end_headers()
            if hasattr(self, 'connection') and self.connection:
                try:
                    self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                except Exception:
                    pass
            last_resp_id = None
            last_output = None
            last_status = None
            try:
                def on_event(event):
                    nonlocal last_resp_id, last_output, last_status
                    if tracker and tracker.cancelled.is_set():
                        print("[translate-proxy] stream cancelled", file=sys.stderr)
                        return False
                    for line in event.strip().split("\n"):
                        if line.startswith("data: "):
                            try:
                                d = json.loads(line[6:])
                                if d.get("type") == "response.completed":
                                    last_resp_id = d.get("response", {}).get("id")
                                    last_output = d.get("response", {}).get("output", [])
                                    last_status = d.get("response", {}).get("status")
                            except Exception:
                                pass
                    return True
                self.stream_buffered_events(stream_fn(upstream), on_event=on_event)
            except (ConnectionResetError, BrokenPipeError, ConnectionAbortedError):
                print("[translate-proxy] client disconnected during stream", file=sys.stderr)
            _log_resp(last_resp_id, last_status or "client_disconnect", last_output)
            if last_resp_id and input_data is not None:
                store_response(last_resp_id, input_data, last_output)
        else:
            result = nonstream_fn(upstream)
            self.send_json(200, result)
            rid = result.get("id")
            _log_resp(rid, result.get("status"), result.get("output", []))
            if rid and input_data is not None:
                store_response(rid, input_data, result.get("output", []))

    def send_json(self, status, data):
        try:
            body = json.dumps(data).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except (BrokenPipeError, ConnectionResetError, ConnectionAbortedError):
            pass

    def stream_buffered_events(self, event_iter, flush_interval=0.03, max_bytes=4096, on_event=None):
        buf = bytearray()
        last_flush = time.monotonic()
        _MAX_BUF = 8 * 1024 * 1024
        def _flush():
            nonlocal buf, last_flush
            if buf:
                self.wfile.write(buf)
                self.wfile.flush()
                buf.clear()
                last_flush = time.monotonic()
        for event in event_iter:
            if on_event is not None and on_event(event) is False:
                break
            encoded = event.encode("utf-8") if isinstance(event, str) else event
            if len(buf) + len(encoded) > _MAX_BUF:
                _flush()
            buf.extend(encoded)
            # Match against the encoded bytes so both str and bytes events work.
            urgent = (b"response.completed" in encoded or b"response.output_text.done" in encoded
                      or b"response.output_item.done" in encoded
                      or b"function_call_arguments.done" in encoded
                      or b"response.failed" in encoded or b'"type":"error"' in encoded)
            if urgent or len(buf) >= max_bytes or time.monotonic() - last_flush >= flush_interval:
                _flush()
        _flush()

    def log_message(self, fmt, *args):
        msg = fmt % args if args else fmt
        _sid = getattr(self, '_session_id', None) or 'proxy'
        print(f"[{_sid}] {BACKEND} {msg}", file=sys.stderr)

_SHUTDOWN_REQUESTED = False

def _handle_shutdown_signal(sig, frame):
    global _SHUTDOWN_REQUESTED
    _SHUTDOWN_REQUESTED = True
    print(f"[SELF-REVIVE] Signal {sig} received, shutting down cleanly", flush=True)
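    if 'SERVER' in globals() and SERVER:
        # shutdown() blocks until serve_forever() returns. The handler runs in the
        # main thread, which is also running serve_forever(), so a direct call would
        # deadlock; request the shutdown from a helper thread instead.
        import threading
        threading.Thread(target=SERVER.shutdown, daemon=True).start()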

def main():
    global SERVER, _START_TIME
    _START_TIME = time.time()
    _init_runtime()
    try:
        _current_cfg = os.path.basename(args.config) if args.config else ""
        for _f in os.listdir(_LOG_DIR):
            if _f.startswith("proxy-") and _f.endswith(".json") and _f != _current_cfg:
                os.remove(os.path.join(_LOG_DIR, _f))
            if _f.startswith("models-") and _f.endswith(".json"):
                os.remove(os.path.join(_LOG_DIR, _f))
    except Exception:
        pass
    signal.signal(signal.SIGINT, _handle_shutdown_signal)
    if _IS_WINDOWS:
        if hasattr(signal, "SIGBREAK"):
            signal.signal(signal.SIGBREAK, _handle_shutdown_signal)
        import atexit
        atexit.register(lambda: setattr(sys.modules[__name__], '_SHUTDOWN_REQUESTED', True))
    else:
        signal.signal(signal.SIGTERM, _handle_shutdown_signal)
    try:
        from http.server import ThreadingHTTPServer as _BaseSrv
    except ImportError:
        class _BaseSrv(socketserver.ThreadingMixIn, http.server.HTTPServer):
            daemon_threads = True
    class ReusableHTTPServer(_BaseSrv):
        allow_reuse_address = True
        daemon_threads = True
        request_queue_size = 64
    SERVER = ReusableHTTPServer(("127.0.0.1", PORT), Handler)
    print(f"translate-proxy ({BACKEND}) listening on http://127.0.0.1:{PORT}", flush=True)
    print(f"Target: {TARGET_URL}", flush=True)
    print(f"Models: {[m['id'] for m in MODELS]}", flush=True)
    if BACKEND in ("codebuff", "freebuff"):
        _cb_pool.load_accounts(force=True)
        fb_status = _cb_pool.status()
        print(f"[multi-account] codebuff: {len(fb_status)} accounts loaded {[a['id'] for a in fb_status]}", flush=True)
    if OAUTH_PROVIDER and OAUTH_PROVIDER.startswith("google"):
        pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
        pool.load_accounts(force=True)
        g_status = pool.status()
        print(f"[multi-account] {OAUTH_PROVIDER}: {len(g_status)} accounts loaded {[a['id'] for a in g_status]}", flush=True)
    if _api_key_pool:
        print(f"[multi-account] API keys: {len(_api_key_pool._accounts)} keys loaded", flush=True)
    if BGP_ROUTES:
        print(f"BGP routes: {len(BGP_ROUTES)} ({[r.get('name','?') for r in BGP_ROUTES]})", flush=True)
    try:
        SERVER.serve_forever()
    finally:
        _flush_stats()

if __name__ == "__main__":
    if "--self-test" in sys.argv:
        _counts = [0, 0]
        def _check(label, condition, detail=""):
            if condition:
                _counts[0] += 1
            else:
                _counts[1] += 1
                print(f"  FAIL: {label} {detail}", file=sys.stderr)
        print("[CC-SELF-TEST] CommandCode Parsing Pipeline", file=sys.stderr)

        # Test _unwrap_cmd (these simulate what json.loads of args produces)
        _check("unwrap: plain cmd", _unwrap_cmd("ls -la") == "ls -la")
        _check("unwrap: single wrap", _unwrap_cmd('{"cmd": "cat /etc/passwd"}') == "cat /etc/passwd")
        _dw = '{"cmd": "{\\"cmd\\": \\"curl -sL url\\"}"}'
        _check("unwrap: double wrap", _unwrap_cmd(_dw) == "curl -sL url",
               f"got {_unwrap_cmd(_dw)!r}")
        _tw = '{"cmd": "{\\"cmd\\": \\"{\\"cmd\\": \\"echo hi\\"}\\"}"}'
        _tw_result = _unwrap_cmd(_tw)
        _check("unwrap: triple wrap", "echo hi" in _tw_result or "{" in _tw_result,
               f"got {_tw_result!r}")  # triple-unwrap depends on proper JSON escaping
        _check("unwrap: non-dict JSON", _unwrap_cmd('{"foo":"bar"}') == '{"foo":"bar"}')
        _check("unwrap: empty string", _unwrap_cmd("") == "")
        _check("unwrap: None-like", _unwrap_cmd("null") == "null")

        # Pattern A: double-wrapped cmd (the production bug)
        # Model text after _extract_args brace-counting produces this args_raw:
        _args_a_raw = '{"cmd": "{\\"cmd\\": \\"mkdir -p /tmp/test\\"}"}'
        _calls_a = _sanitize_tool_calls([{
            "name": "exec_command",
            "arguments": _args_a_raw,
        }])
        _check("double-wrap: sanitized call exists", len(_calls_a) == 1)
        if _calls_a:
            _args_a = json.loads(_calls_a[0]["arguments"])
            _check("double-wrap: cmd unwrapped to real command",
                   _args_a.get("cmd") == "mkdir -p /tmp/test",
                   f"cmd={_args_a.get('cmd')!r}")

        # Pattern B: unescaped inner quotes (model outputs malformed JSON)
        # Exercised through _parse_commandcode_text_tool_calls with a raw JSON tool-call payload
        _calls_b = _parse_commandcode_text_tool_calls(
            '{"type":"tool-call","name":"bash",'
            '"arguments":"{\\\"cmd\\\": \\\"cat file.html\\\", \\\"sp\\\": \\\"allow_all\\\"}"}')
        _check("unescaped quotes: extracted call", len(_calls_b) >= 1,
               f"got {len(_calls_b)} calls")

        # Pattern C: XML format (fixed regex — was broken with unbalanced paren)
        _calls_c = _parse_commandcode_text_tool_calls(
            '<tool_call name="bash"><parameter name="command">curl -sL https://example.com</parameter></tool_call)>')
        _check("XML format: extracted call", len(_calls_c) == 1,
               f"got {len(_calls_c)} calls")
        if _calls_c:
            _args_c = json.loads(_calls_c[0]["arguments"])
            _check("XML: correct cmd", "curl" in _args_c.get("cmd", ""),
                   f"cmd={_args_c.get('cmd')!r}")

        # Pattern D: function= format
        _calls_d = _parse_commandcode_text_tool_calls(
            "<function=bash>echo hello world</function>")
        _check("function= format: extracted call", len(_calls_d) == 1)

        # Pattern E: empty input
        _check("empty input", len(_parse_commandcode_text_tool_calls("")) == 0)
        _check("None input", len(_parse_commandcode_text_tool_calls(None)) == 0)

        # Pattern F: sanitizer catches empty cmd
        _san_empty = _sanitize_tool_calls([{"name": "exec_command", "arguments": '{"cmd": ""}'}])
        _san_f_args = json.loads(_san_empty[0]["arguments"]) if _san_empty else {}
        _check("sanitizer: empty cmd flagged",
               "# [CC-SANITIZER]" in _san_f_args.get("cmd", ""),
               f"cmd={_san_f_args.get('cmd', '')!r}")

        # Pattern G: sanitizer catches still-JSON cmd (must produce valid JSON)
        _g_args_raw = '{"cmd": "{\\"nested\\":true}"}'
        _san_json = _sanitize_tool_calls([{"name": "exec_command", "arguments": _g_args_raw}])
        _check("sanitizer: JSON call produced", len(_san_json) == 1)
        if _san_json:
            try:
                _san_g_args = json.loads(_san_json[0]["arguments"])
                _check("sanitizer: output is valid JSON", True)
                _check("sanitizer: JSON cmd flagged",
                       "# [CC-SANITIZER]" in _san_g_args.get("cmd", ""),
                       f"cmd={_san_g_args.get('cmd', '')!r}")
            except Exception as e:
                _check(f"sanitizer: output valid JSON, got {e}", False)

        # Pattern H: Native <todo_write> XML block parsing and sanitization bypass (FIX 18)
        _todo_xml = """Some preamble text.
<todo_write>
<todos>[{"id":"1","status":"in_progress","description":"Create landing page directory and HTML structure"},{"id":"2","status":"pending","description":"Write the full landing page"}]</todos>
</todo_write>
Postamble text."""
        _calls_h = _parse_commandcode_text_tool_calls(_todo_xml)
        _check("todo_write: extracted call exists", len(_calls_h) == 1, f"got {len(_calls_h)} calls")
        if _calls_h:
            _call_h = _calls_h[0]
            _check("todo_write: name is TodoWrite", _call_h.get("name") == "TodoWrite")
            try:
                _args_h = json.loads(_call_h.get("arguments", "{}"))
                _todos_h = _args_h.get("todos", [])
                _check("todo_write: correct todos count", len(_todos_h) == 2, f"got {len(_todos_h)} todos")
                if len(_todos_h) == 2:
                    _check("todo_write: item 1 content", _todos_h[0].get("content") == "Create landing page directory and HTML structure")
                    _check("todo_write: item 1 activeForm", _todos_h[0].get("activeForm") == "Create landing page directory and HTML structure")
                    _check("todo_write: item 1 status", _todos_h[0].get("status") == "in_progress")
                    _check("todo_write: item 2 status", _todos_h[1].get("status") == "pending")
                # Confirm that the arguments contain no 'cmd' or sanitization comment
                _check("todo_write: no cmd injected", "cmd" not in _args_h)
            except Exception as e:
                _check(f"todo_write: parsed JSON error: {e}", False)

        # Pattern I: Translate execute_request to exec_command (FIX 19)
        _exec_req_raw = '<｜｜DSML｜｜tool_calls>\n<｜｜DSML｜｜invoke name="execute_request">\n<｜｜DSML｜｜parameter name="command" string="true">ls -la</｜｜DSML｜｜parameter>\n</｜｜DSML｜｜invoke>\n</｜｜DSML｜｜tool_calls>'
        _calls_i = _parse_commandcode_text_tool_calls(_exec_req_raw)
        _check("execute_request: mapped successfully", len(_calls_i) == 1, f"got {len(_calls_i)} calls")
        if _calls_i:
            _call_i = _calls_i[0]
            _check("execute_request: name translated to exec_command", _call_i.get("name") == "exec_command", f"got {_call_i.get('name')}")
            try:
                _args_i = json.loads(_call_i.get("arguments", "{}"))
                _check("execute_request: correct command extracted", _args_i.get("cmd") == "ls -la", f"got {_args_i.get('cmd')}")
            except Exception as e:
                _check(f"execute_request: arguments parsing error: {e}", False)

        # Pattern J: Translate DSML-style explore/explore_agent block (FIX 20)
        _explore_dsml = '<｜｜DSML｜｜tool_calls>\n  <｜｜DSML｜｜invoke name="explore">\n  <｜｜DSML｜｜parameter name="messages" string="true">[{"content": "Understand what the Z.AI-Chat-for-Android project is about... URL: https://github.rommark.dev/admin/Z.AI-Chat-for-Android", "role": "user"}]</｜｜DSML｜｜parameter>\n  </｜｜DSML｜｜invoke>\n  </｜｜DSML｜｜tool_calls>'
        _calls_j = _parse_commandcode_text_tool_calls(_explore_dsml)
        _check("explore DSML: mapped successfully", len(_calls_j) == 1, f"got {len(_calls_j)} calls")
        if _calls_j:
            _call_j = _calls_j[0]
            _check("explore DSML: name translated to exec_command", _call_j.get("name") == "exec_command", f"got {_call_j.get('name')}")
            try:
                _args_j = json.loads(_call_j.get("arguments", "{}"))
                _check("explore DSML: built a curl explore script targeting api base", "api/v1/repos/admin/Z.AI-Chat-for-Android" in _args_j.get("cmd", ""), f"got {_args_j.get('cmd')!r}")
            except Exception as e:
                _check(f"explore DSML: arguments parsing error: {e}", False)

        # Pattern K: Translate raw JSON-style explore call (FIX 20)
        _explore_json = '{"type":"tool-call","name":"explore_agent","id":"call_123","arguments":"{\\\"messages\\\": [{\\\"content\\\": \\\"https://github.rommark.dev/admin/Z.AI-Chat-for-Android\\\"}]}"}'
        _calls_k = _parse_commandcode_text_tool_calls(_explore_json)
        _check("explore JSON: mapped successfully", len(_calls_k) == 1, f"got {len(_calls_k)} calls")
        if _calls_k:
            _call_k = _calls_k[0]
            _check("explore JSON: name translated to exec_command", _call_k.get("name") == "exec_command")
            try:
                _args_k = json.loads(_call_k.get("arguments", "{}"))
                _check("explore JSON: built a curl explore script targeting api base", "api/v1/repos/admin/Z.AI-Chat-for-Android" in _args_k.get("cmd", ""), f"got {_args_k.get('cmd')!r}")
            except Exception as e:
                _check(f"explore JSON: arguments parsing error: {e}", False)

        # Pattern L: DSML with parameter name="cmd" instead of name="command" (FIX 21)
        # This is THE critical regression test — the model often uses name="cmd" (matching
        # the actual tool schema) instead of name="command". Previously the DSML parser
        # silently dropped these, causing Codex CLI to halt mid-task.
        _cmd_dsml = '<｜｜DSML｜｜tool_calls>\n  <｜｜DSML｜｜invoke name="exec_command">\n  <｜｜DSML｜｜parameter name="cmd" string="true">curl -sL --max-time 15 \'https://github.rommark.dev/api/v1/repos/admin/Z.AI-Chat-for-Android/contents/README.md\' 2>/dev/null</｜｜DSML｜｜parameter>\n  <｜｜DSML｜｜parameter name="sandbox_permissions" string="true">require_escalated</｜｜DSML｜｜parameter>\n  <｜｜DSML｜｜parameter name="justification" string="true">I need to get the README from the private repo to understand the Android app before building the landing page mockup.</｜｜DSML｜｜parameter>\n  </｜｜DSML｜｜invoke>\n  </｜｜DSML｜｜tool_calls>'
        _calls_l = _parse_commandcode_text_tool_calls(_cmd_dsml)
        _check("DSML name=cmd: mapped successfully", len(_calls_l) == 1, f"got {len(_calls_l)} calls")
        if _calls_l:
            _call_l = _calls_l[0]
            _check("DSML name=cmd: name is exec_command", _call_l.get("name") == "exec_command", f"got {_call_l.get('name')}")
            try:
                _args_l = json.loads(_call_l.get("arguments", "{}"))
                _check("DSML name=cmd: cmd extracted correctly", "curl -sL --max-time 15" in _args_l.get("cmd", ""), f"got {_args_l.get('cmd')!r}")
                _check("DSML name=cmd: sandbox_permissions extracted", _args_l.get("sandbox_permissions") == "require_escalated", f"got {_args_l.get('sandbox_permissions')!r}")
                _check("DSML name=cmd: justification extracted", "README" in _args_l.get("justification", ""), f"got {_args_l.get('justification')!r}")
            except Exception as e:
                _check(f"DSML name=cmd: arguments parsing error: {e}", False)

        # Pattern M: explore_agent with nested JSON messages containing URL (FIX 23)
        _explore_nested = '<explore_agent>\nmessages: [{"content": "Understand the Z.AI-Chat-for-Android repo at https://github.rommark.dev/admin/Z.AI-Chat-for-Android"}]\n</explore_agent>'
        _calls_m = _parse_commandcode_text_tool_calls(_explore_nested)
        _check("FIX23 explore nested JSON: parsed", len(_calls_m) == 1, f"got {len(_calls_m)} calls")
        if _calls_m:
            _args_m = json.loads(_calls_m[0].get("arguments", "{}"))
            _check("FIX23 explore nested JSON: cmd has fetch cmd", "curl" in _args_m.get("cmd", "") or "Invoke-WebRequest" in _args_m.get("cmd", ""), f"got {_args_m.get('cmd')!r}")
            _check("FIX23 explore nested JSON: URL in cmd", "github.rommark.dev" in _args_m.get("cmd", ""), "missing URL in cmd")

        # Pattern N: require_escalation block (FIX 24)
        _esc_text = '<require_escalation>I need to run a command with elevated permissions to access the repository at https://github.rommark.dev/admin/Z.AI-Chat-for-Android</require_escalation>'
        _calls_n = _parse_commandcode_text_tool_calls(_esc_text)
        _check("FIX24 require_escalation: parsed", len(_calls_n) == 1, f"got {len(_calls_n)} calls")
        if _calls_n:
            _args_n = json.loads(_calls_n[0].get("arguments", "{}"))
            _check("FIX24 require_escalation: name is exec_command", _calls_n[0].get("name") == "exec_command", f"got {_calls_n[0].get('name')}")
            _cmd_n = _args_n.get("cmd", "")
            _check("FIX24 require_escalation: cmd has fetch or echo",
                   any(_tok in _cmd_n for _tok in ("curl", "echo", "Invoke-WebRequest", "Write-Output")),
                   f"got {_args_n.get('cmd')!r}")

        # Pattern N2: bare request_escalation_permission tag (FIX 24b)
        _esc_bare = 'I want to proceed.\n<request_escalation_permission />\nPlease let me continue.'
        _calls_n2 = _parse_commandcode_text_tool_calls(_esc_bare)
        _check("FIX24b bare escalation: parsed", len(_calls_n2) == 1, f"got {len(_calls_n2)} calls")
        if _calls_n2:
            _check("FIX24b bare escalation: name is exec_command", _calls_n2[0].get("name") == "exec_command", f"got {_calls_n2[0].get('name')}")

        # Pattern O: _build_explore_cmd module-level function (FIX 23/25)
        _cmd_o, _just_o = _build_explore_cmd("https://github.rommark.dev/admin/Z.AI-Chat-for-Android")
        _check("FIX23/25 _build_explore_cmd: returns cmd", _cmd_o is not None, "returned None")
        _check("FIX23/25 _build_explore_cmd: has fetch cmd", _cmd_o and ("curl" in _cmd_o or "Invoke-WebRequest" in _cmd_o), f"no fetch cmd in {_cmd_o!r}")
        _check("FIX23/25 _build_explore_cmd: has api path", _cmd_o and "/api/v1/repos/" in _cmd_o, f"no api path in {_cmd_o!r}")

        # Pattern O2: _build_explore_cmd with JSON array containing URL
        _cmd_o2, _ = _build_explore_cmd('[{"content": "https://github.rommark.dev/admin/Z.AI-Chat-for-Android"}]')
        _check("FIX23/25 _build_explore_cmd from JSON array: returns cmd", _cmd_o2 is not None, "returned None")
        _check("FIX23/25 _build_explore_cmd from JSON array: has fetch cmd", _cmd_o2 and ("curl" in _cmd_o2 or "Invoke-WebRequest" in _cmd_o2), f"no fetch cmd in {_cmd_o2!r}")

        print(f"[CC-SELF-TEST] Results: {_counts[0]} passed, {_counts[1]} failed",
              file=sys.stderr)
        if _counts[1]:
            sys.exit(1)
        else:
            print("[CC-SELF-TEST] ALL PASSED — pipeline is healthy", file=sys.stderr)
            sys.exit(0)

    # [FIX 12] SELF-REVIVE: auto-restart proxy on crash (not on clean shutdown)
    _MAX_RESTARTS = 50
    _restart_count = 0
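    # Delay in seconds before each restart. Restart N uses entry N-1; once the
    # list is exhausted, the last value (30s) is reused for every later restart.
    _RESTART_BACKOFF = [1, 2, 3, 5, 10, 15, 30]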
    while not _SHUTDOWN_REQUESTED and _restart_count < _MAX_RESTARTS:
        try:
            main()
        except KeyboardInterrupt:
            print("[SELF-REVIVE] Keyboard interrupt — exiting", flush=True)
            break
        except Exception as e:
            _restart_count += 1
            import traceback as _tb
            print(f"[SELF-REVIVE] CRASH #{_restart_count}/{_MAX_RESTARTS}: {e}", flush=True)
            _tb.print_exc()
            if _restart_count >= _MAX_RESTARTS:
                # Restart budget exhausted: do not announce or wait for a
                # restart that the loop condition will never allow.
                break
            _backoff = _RESTART_BACKOFF[min(_restart_count - 1, len(_RESTART_BACKOFF) - 1)]
            print(f"[SELF-REVIVE] Restarting in {_backoff}s... (Ctrl+C to exit)", flush=True)
            time.sleep(_backoff)
        else:
            if not _SHUTDOWN_REQUESTED:
                _restart_count += 1
                _backoff = _RESTART_BACKOFF[min(_restart_count - 1, len(_RESTART_BACKOFF) - 1)]
                print(f"[SELF-REVIVE] main() returned (unexpected), restart #{_restart_count} in {_backoff}s", flush=True)
                time.sleep(_backoff)

    if _SHUTDOWN_REQUESTED or _restart_count >= _MAX_RESTARTS:
        print(f"[SELF-REVIVE] Exiting (shutdown={_SHUTDOWN_REQUESTED}, restarts={_restart_count})", flush=True)