docs: update CHANGELOG + lib with all TRAE fixes (Claude guards, guardrail skip)

fix: swap endpoint order back to cloudcode-pa first (matches agy CLI)
fix: Claude model tool handling, compaction guard, normalizer model param
2026-05-26 13:05:19 +04:00 · 2026-05-26 12:58:21 +04:00 · 2026-05-26 12:57:03 +04:00 · 2026-05-26 09:23:48 +04:00 · 2026-05-26 00:57:16 +04:00 · 2026-05-26 00:15:01 +04:00
16 changed files with 12605 additions and 107 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,115 @@
 # Changelog

+## v3.10.12 (2026-05-26)
+
+**Sticky Endpoint, Claude Fixes, Guardrail Skip, Anti-Stall**
+
+### New Features
+- **Sticky endpoint caching**: remembers which endpoint last succeeded and reuses it for subsequent requests, so no probing is needed while that endpoint keeps working
+- **Sequential fallback**: if sticky endpoint fails (429/502/503), tries next endpoint in order — no parallel probing, no wasted requests
+- **Endpoint order**: `cloudcode-pa.googleapis.com` first (matches agy CLI), `daily-cloudcode-pa.googleapis.com` as fallback
+- **Anti-stall engine**: kills stale proxy processes and clears `__pycache__` on every new session start
+- **Smart error classification**: distinguishes `quota_exhausted` vs `capacity_exhausted` vs `account_banned` vs `validation_required` vs `service_disabled` vs `auth_permanent`
+- **Rate limit reset time parsing**: parses the reset delay from the error body (`quotaResetDelay`, `Resets in ~1h27m`, etc.) so the cooldown matches the server's reset time
+- **Added missing Antigravity headers**: `X-Client-Name`, `X-Client-Version`, `x-goog-api-client`, platform-aware `User-Agent`
+- **Session ID**: added `sessionId` to request wrapper for proper session tracking
+
+### Bug Fixes (TRAE Agent)
+- **Guardrail skip for simple messages**: when user sends simple messages (e.g. "hi"), skip injecting `_GEMINI_AGENT_GUARDRAIL` — prevents model from aggressively calling tools and looping `ls -la` 50+ times
+- **Claude tool preservation**: Claude models through Antigravity now keep ALL tool outputs in normalizer (no summarization/truncation) — prevents context loss that broke Claude sessions
+- **Claude compaction guard**: `_adaptive_compact` skipped for Claude models — Claude handles its own context, no forced compaction
+- **Claude normalizer guard**: `_antigravity_normalize_context` skipped for Claude models — avoids stripping Claude-specific message structure
+- **Claude sanitization guard**: Google content sanitization loop skipped for Claude models — prevents mangling Claude's response format
+- **Normalizer model parameter**: `_antigravity_normalize_context` now receives `model` param to distinguish Claude vs Gemini behavior
+
+## v3.10.11 (2026-05-26)
+
+**Hybrid Endpoint Fallback — Redundant Antigravity Endpoints**
+
+### New Features
+- Hybrid endpoint fallback: tries `cloudcode-pa.googleapis.com` then `daily-cloudcode-pa.googleapis.com` on 429
+- `daily-cloudcode-pa.googleapis.com` is the same production endpoint agy-core uses (separate rate limit bucket)
+- 429 errors now log full response body for debugging
+- SERVICE_DISABLED (403) still falls through to next endpoint
+- Rate-limit marking only happens after ALL endpoints fail
+
+### Bug Fixes
+- Fixed 429 on one endpoint immediately failing — now tries fallback before giving up
+- Restored SERVICE_DISABLED fallthrough (was accidentally removed)
+
+## v3.10.10 (2026-05-25)
+
+**Context Normalizer Fix — Compaction Summary Preservation**
+
+### Bug Fixes
+- Fixed normalizer stripping ALL context on resumed sessions after compaction
+- Normalizer no longer auto-resets when compaction summary is present
+- Compaction summaries ("Auto-compacted: N earlier turns") are always preserved
+- Deduplicates consecutive identical `<goal_context>` messages (10→1)
+- Emergency reset now preserves compaction summaries
+- Previous behavior: after compaction reduced 1925→185 items, normalizer saw `n_tool_outputs == 0` and stripped to just `system + latest_user`, losing all context — model responded with "I don't have context"
+
+### hashlib Fix (v3.10.9 hotfix)
+- `_antigravity_normalize_context` crashed with `NameError: hashlib` on resumed sessions
+- Replaced SHA256 duplicate detection with string comparison
+
+## v3.10.9 (2026-05-25)
+
+**Antigravity Overhaul — Context Normalizer, Claude Thinking Fix, Endpoint Lockdown**
+
+### Antigravity Endpoint Lockdown
+- Production-only: `cloudcode-pa.googleapis.com` by default
+- Sandbox/staging blocked unless `ALLOW_ANTIGRAVITY_STAGING=1`
+- 403 SERVICE_DISABLED falls through, 429 returns to client
+
+### AntigravityContextNormalizer
+- Bounded context — no more 136-item polluted requests for "hi"
+- Simple message detector, auto-reset polluted context
+- Duplicate removal, tool output budget, hard char limits
+
+### Claude Thinking Fix (Antigravity-only)
+- Fixed 400 error: `maxOutputTokens=64000` when thinking enabled
+- Snake_case config, VALIDATED toolConfig, proper budgets
+
+### z.ai / OpenRouter (cobra91 PR #4)
+- Full OpenClaw attribution headers, OpenRouter caching
+
+## v3.10.8 (2026-05-25)
+
+**OAuth & Antigravity Endpoint Fixes**
+
+### Re-OAuth Buttons Fixed
+- Linux GUI: `load_oauth_secrets()` was undefined — buttons crashed silently on click
+- Now loads OAuth secrets inline from `~/.config/codex-launcher/oauth-secrets.json`
+- Both Linux and Windows Re-OAuth use PKCE + localhost callback (was deprecated OOB paste)
+
+### Antigravity Staging/Sandbox Blocked by Default
+- Proxy: production `cloudcode-pa.googleapis.com` tried FIRST, sandbox/daily/autopush as fallback only
+- Proxy: 403 SERVICE_DISABLED now falls through to next endpoint instead of returning error immediately
+- Project discovery: validates against production endpoint, not staging-cloudaicompanion.sandbox
+- Antigravity preset `base_url` changed to production (was `daily-cloudcode-pa.sandbox.googleapis.com`)
+- `[antigravity-endpoint]` log line shows which endpoints are being tried
+
+### Other Fixes
+- GLib.idle_add lambda returning truthy tuple fixed (caused repeated callbacks)
+- Windows GUI project discovery also uses production endpoint
+
+## v3.10.7 (2026-05-25)
+
+**Prompt Enhancer — Fix Lost Context After Compaction**
+
+### Prompt Enhancer (Per-Provider Toggle)
+- **Offline mode**: Injects structured XML instructions before every user prompt to keep the model focused, decisive, and context-aware after compaction strips conversation history
+- **AI-powered mode**: Optionally calls an external LLM (configurable model/URL/key) to rewrite vague prompts into clear, actionable instructions
+- Prevents the "had to resend and reword" problem in long sessions where compaction summarizes hundreds of turns
+- **Per-endpoint setting** — enable/disable for each provider independently
+- Configurable in both Linux and Windows GUI: toggle switch, mode selector, enhancer model, URL, API key fields
+
+### How It Works
+- **Offline**: Prepends a `<prompt-enhancer>` block with rules like "never ask for clarification, infer from compacted context, execute decisively"
+- **AI-powered**: Sends the user's prompt + compaction summary to a separate model (e.g. DeepSeek V4 Flash via Freebuff) which rewrites it for clarity, then prepends the offline instructions too
+- Both modes run after compaction but before the request is sent upstream
+
 ## v3.10.6 (2026-05-25)

 **Freebuff Integration + Codebuff OAuth Fix + Windows Consolidation**
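
The sticky-endpoint and sequential-fallback behavior described in the v3.10.12 notes can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: the `StickyEndpoints` class, its `request` method, and the `send` callback are hypothetical names, and the sketch only treats 429/502/503 as retryable, as the notes state.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Endpoint order taken from the v3.10.12 notes.
ENDPOINTS = (
    "cloudcode-pa.googleapis.com",
    "daily-cloudcode-pa.googleapis.com",
)
RETRYABLE = {429, 502, 503}


class StickyEndpoints:
    """Hypothetical helper: remember the endpoint that last succeeded."""

    def __init__(self, endpoints: Sequence[str] = ENDPOINTS) -> None:
        self.endpoints = list(endpoints)
        self.sticky: Optional[str] = None

    def order(self) -> List[str]:
        # Cached endpoint first, then the remaining ones in configured order.
        if self.sticky in self.endpoints:
            rest = [e for e in self.endpoints if e != self.sticky]
            return [self.sticky] + rest
        return list(self.endpoints)

    def request(self, send: Callable[[str], int]) -> Tuple[Optional[str], int]:
        """Call send(host) one endpoint at a time; return (host, status)."""
        last: Tuple[Optional[str], int] = (None, 0)
        for host in self.order():
            status = send(host)
            if status not in RETRYABLE:
                if 200 <= status < 300:
                    self.sticky = host  # cache the working endpoint
                return host, status
            last = (host, status)
        # Every endpoint returned a retryable status: forget the cache.
        self.sticky = None
        return last
```

Because endpoints are tried one at a time, a healthy cached endpoint costs exactly one upstream request per call, and a failing one costs at most one extra request per remaining endpoint.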
--- a/README.md
+++ b/README.md
@@ -554,6 +554,7 @@ The launcher generates model catalog JSON with dual field naming to satisfy both

 Codex Launcher includes special handling for Gemini 3 / Antigravity OAuth:

+- **Sticky endpoint with sequential fallback**: Requests go to `cloudcode-pa.googleapis.com` first, with `daily-cloudcode-pa.googleapis.com` as the fallback. The endpoint that last succeeded is cached and reused for all subsequent requests. If it fails (429/502/503), the next endpoint is tried in order, with no parallel probing and no wasted requests.
 - **Thought signature preservation**: Captures `thoughtSignature` from Gemini responses
  and reattaches them on follow-up requests to maintain tool-call continuity.
 - **Edit-intent detection**: When follow-up requests contain edit keywords, a tool-use
@@ -561,7 +562,7 @@ Codex Launcher includes special handling for Gemini 3 / Antigravity OAuth:
 - **User instruction enforcement**: The latest user message is guaranteed to be the
  final content turn sent to Gemini, even after compaction.
 - **Smart compaction**: Old tool outputs capped at 3000 chars, recent 6 at 20000 chars.
- **Context compaction**: Aggressive auto-trimming when approaching 60% of model context
+- **Context compaction**: Aggressive auto-trimming when approaching 80% of model context
  limit (1M tokens Gemini, 200K Claude, 128K GPT-OSS). Prevents token limit errors.
 - **Model ID mapping**: Display names (e.g. `Gemini 3.5 Flash (High)`) mapped to REST API
  slugs (e.g. `gemini-3-flash`). See `docs/ANTIGRAVITY.md` for details.
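
The compaction numbers in the README bullets (3000 characters for old tool outputs, 20000 for the 6 most recent, trimming near 80% of the context limit) could be applied along these lines. This is a sketch under assumptions: the function names are hypothetical, the real launcher may count tokens differently, and "1M" is taken as 1,000,000 here.

```python
from typing import List

OLD_CAP = 3000        # chars kept for older tool outputs
RECENT_CAP = 20000    # chars kept for the most recent outputs
RECENT_COUNT = 6      # how many outputs count as "recent"

# Context limits from the README text; the dict keys are illustrative labels.
CONTEXT_LIMITS = {"gemini": 1_000_000, "claude": 200_000, "gpt-oss": 128_000}


def cap_tool_outputs(outputs: List[str]) -> List[str]:
    """Return truncated copies; the last RECENT_COUNT get the larger cap."""
    first_recent = max(0, len(outputs) - RECENT_COUNT)
    capped = []
    for i, text in enumerate(outputs):
        limit = RECENT_CAP if i >= first_recent else OLD_CAP
        capped.append(text[:limit])
    return capped


def should_compact(used_tokens: int, limit: int, percent: int = 80) -> bool:
    """True once usage reaches the given percentage of the context limit."""
    # Integer arithmetic avoids float rounding at the exact boundary.
    return used_tokens * 100 >= limit * percent
```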
--- a/5726
+++ b/5726
--- a/codex-launcher-gui.py
+++ b/codex-launcher-gui.py
--- a/codex-launcher_3.10.10_all.deb
+++ b/codex-launcher_3.10.10_all.deb
--- a/codex-launcher_3.10.11_all.deb
+++ b/codex-launcher_3.10.11_all.deb
--- a/codex-launcher_3.10.12_all.deb
+++ b/codex-launcher_3.10.12_all.deb
--- a/codex-launcher_3.10.6_all.deb
+++ b/codex-launcher_3.10.6_all.deb
--- a/codex-launcher_3.10.9_all.deb
+++ b/codex-launcher_3.10.9_all.deb
--- a/codex_launcher_lib.py
+++ b/codex_launcher_lib.py
--- a/install.sh
+++ b/install.sh
@@ -3,11 +3,11 @@ set -e

 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

-if [ -f "$SCRIPT_DIR/codex-launcher_3.10.6_all.deb" ]; then
-    echo "Installing codex-launcher_3.10.6_all.deb ..."
-    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.10.6_all.deb"
+if [ -f "$SCRIPT_DIR/codex-launcher_3.10.12_all.deb" ]; then
+    echo "Installing codex-launcher_3.10.12_all.deb ..."
+    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.10.12_all.deb"
    echo ""
-    echo "Installed v3.10.6 via .deb package."
+    echo "Installed v3.10.12 via .deb package."
    echo "  translate-proxy.py   -> /usr/bin/translate-proxy.py"
    echo "  codex-launcher-gui   -> /usr/bin/codex-launcher-gui"
    echo "  cleanup-codex-stale  -> /usr/bin/cleanup-codex-stale.sh"
--- a/src/codex-launcher-gui
+++ b/src/codex-launcher-gui
@@ -1798,7 +1798,7 @@ class LauncherWin(Gtk.Window):
        # header row
        hdr = Gtk.Box(spacing=8)
        vbox.pack_start(hdr, False, False, 0)
-        lbl = Gtk.Label(label="<b>Codex Launcher v3.10.6</b>")
+        lbl = Gtk.Label(label="<b>Codex Launcher v3.10.7</b>")
        lbl.set_use_markup(True)
        hdr.pack_start(lbl, False, False, 0)
        changelog_btn = Gtk.Button(label="Changelog")
@@ -2925,7 +2925,7 @@ class LauncherWin(Gtk.Window):
                fp_id = str(uuid.uuid4())
                body = json.dumps({"fingerprintId": fp_id}).encode()
                req = urllib.request.Request("https://www.codebuff.com/api/auth/cli/code",
-                    data=body, headers={"Content-Type": "application/json", "User-Agent": "codex-launcher/3.10.6"})
+                    data=body, headers={"Content-Type": "application/json", "User-Agent": "codex-launcher/3.10.7"})
                resp = urllib.request.urlopen(req, timeout=30)
                rdata = json.loads(resp.read())
                login_url = rdata.get("loginUrl", "") or rdata.get("login_url", "")
@@ -2944,7 +2944,7 @@ class LauncherWin(Gtk.Window):
                while time.time() < deadline:
                    time.sleep(2)
                    try:
-                        pr = urllib.request.Request(poll, headers={"User-Agent": "codex-launcher/3.10.6"})
+                        pr = urllib.request.Request(poll, headers={"User-Agent": "codex-launcher/3.10.7"})
                        pd = json.loads(urllib.request.urlopen(pr, timeout=10).read())
                        if pd.get("user", {}).get("authToken"):
                            result["success"] = True
@@ -3503,6 +3503,38 @@ class EditEndpointDialog(Gtk.Dialog):
        add_row(7, "Effort:", self._combo_effort)
        self._on_reasoning_toggled()

+        enhancer_box = Gtk.Box(spacing=6)
+        self._switch_enhancer = Gtk.Switch()
+        self._switch_enhancer.set_active(self._data.get("prompt_enhancer", False))
+        enhancer_box.pack_start(self._switch_enhancer, False, False, 0)
+        self._enhancer_status_lbl = Gtk.Label()
+        enhancer_box.pack_start(self._enhancer_status_lbl, False, False, 0)
+        self._switch_enhancer.connect("notify::active", lambda *a: self._on_enhancer_toggled())
+        self._combo_enhancer_mode = Gtk.ComboBoxText()
+        for mode in ["offline", "ai-powered"]:
+            self._combo_enhancer_mode.append(mode, mode.capitalize())
+        self._combo_enhancer_mode.set_active_id(self._data.get("prompt_enhancer_mode", "offline"))
+        enhancer_box.pack_start(self._combo_enhancer_mode, False, False, 6)
+        add_row(8, "Prompt Enhancer:", enhancer_box)
+        self._on_enhancer_toggled()
+
+        self._entry_enhancer_model = Gtk.Entry()
+        self._entry_enhancer_model.set_placeholder_text("e.g. deepseek/deepseek-v4-flash (ai-powered mode only)")
+        self._entry_enhancer_model.set_text(self._data.get("prompt_enhancer_model", ""))
+        add_row(9, "Enhancer Model:", self._entry_enhancer_model)
+
+        self._entry_enhancer_url = Gtk.Entry()
+        self._entry_enhancer_url.set_placeholder_text("e.g. https://www.codebuff.com/api/v1 (ai-powered mode only)")
+        self._entry_enhancer_url.set_text(self._data.get("prompt_enhancer_url", ""))
+        add_row(10, "Enhancer URL:", self._entry_enhancer_url)
+
+        self._entry_enhancer_key = Gtk.Entry()
+        self._entry_enhancer_key.set_placeholder_text("API key for enhancer model (ai-powered mode only)")
+        self._entry_enhancer_key.set_text(self._data.get("prompt_enhancer_key", ""))
+        self._entry_enhancer_key.set_visibility(False)
+        self._entry_enhancer_key.set_invisible_char("*")
+        add_row(11, "Enhancer Key:", self._entry_enhancer_key)
+
        # Models
        mlbl = Gtk.Label(label="Models:", xalign=0)
        area.pack_start(mlbl, False, False, 4)
@@ -3642,6 +3674,13 @@ class EditEndpointDialog(Gtk.Dialog):
        else:
            self._lbl_reasoning.set_markup('<span foreground="#e67e22" weight="bold">OFF</span>')

+    def _on_enhancer_toggled(self, *_):
+        active = self._switch_enhancer.get_active()
+        if active:
+            self._enhancer_status_lbl.set_markup('<span foreground="#27ae60" weight="bold">ON</span>')
+        else:
+            self._enhancer_status_lbl.set_markup('<span foreground="#888888" weight="bold">OFF</span>')
+
    def _do_oauth_login(self):
        preset_name = self._combo_preset.get_active_text() or "Custom"
        preset = PROVIDER_PRESETS.get(preset_name, {})
@@ -3971,7 +4010,7 @@ class EditEndpointDialog(Gtk.Dialog):
                auth_url = "https://www.codebuff.com/api/auth/cli/code"
                body = json.dumps({"fingerprintId": fingerprint_id}).encode()
                req = urllib.request.Request(auth_url, data=body,
-                    headers={"Content-Type": "application/json", "User-Agent": "codex-launcher/3.10.6"})
+                    headers={"Content-Type": "application/json", "User-Agent": "codex-launcher/3.10.7"})
                resp = urllib.request.urlopen(req, timeout=30)
                data = json.loads(resp.read())
                login_url = data.get("loginUrl", "") or data.get("login_url", "")
@@ -3996,7 +4035,7 @@ class EditEndpointDialog(Gtk.Dialog):
                    time.sleep(2)
                    try:
                        poll_req = urllib.request.Request(poll_url,
-                            headers={"User-Agent": "codex-launcher/3.10.6"})
+                            headers={"User-Agent": "codex-launcher/3.10.7"})
                        poll_resp = urllib.request.urlopen(poll_req, timeout=10)
                        poll_data = json.loads(poll_resp.read())
                        user = poll_data.get("user")
@@ -4195,6 +4234,17 @@ class EditEndpointDialog(Gtk.Dialog):
            new_ep["cc_version"] = cc_ver
        new_ep["reasoning_enabled"] = self._switch_reasoning.get_active()
        new_ep["reasoning_effort"] = self._combo_effort.get_active_id() or "medium"
+        new_ep["prompt_enhancer"] = self._switch_enhancer.get_active()
+        new_ep["prompt_enhancer_mode"] = self._combo_enhancer_mode.get_active_id() or "offline"
+        enh_model = self._entry_enhancer_model.get_text().strip()
+        enh_url = self._entry_enhancer_url.get_text().strip()
+        enh_key = self._entry_enhancer_key.get_text().strip()
+        if enh_model:
+            new_ep["prompt_enhancer_model"] = enh_model
+        if enh_url:
+            new_ep["prompt_enhancer_url"] = enh_url
+        if enh_key:
+            new_ep["prompt_enhancer_key"] = enh_key
        preset_name = self._combo_preset.get_active_text() or "Custom"
        preset = PROVIDER_PRESETS.get(preset_name, {})
        if preset.get("oauth_provider"):
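
For orientation, the `prompt_enhancer` and `prompt_enhancer_mode` keys saved by this dialog might be consumed in offline mode roughly as below. This is only a sketch: `enhance_prompt` and `OFFLINE_RULES` are hypothetical names, the rule text paraphrases the changelog, and the AI-powered rewrite step is left out entirely.

```python
from typing import Any, Dict

# Rule wording paraphrased from the v3.10.7 changelog entry (illustrative only).
OFFLINE_RULES = (
    "Never ask for clarification.",
    "Infer missing details from the compacted context.",
    "Execute decisively.",
)


def enhance_prompt(endpoint: Dict[str, Any], user_prompt: str) -> str:
    """Prepend a <prompt-enhancer> block when the endpoint enables it."""
    if not endpoint.get("prompt_enhancer", False):
        return user_prompt  # enhancer is off for this provider
    rules = "\n".join(f"- {rule}" for rule in OFFLINE_RULES)
    block = f"<prompt-enhancer>\n{rules}\n</prompt-enhancer>"
    return f"{block}\n\n{user_prompt}"
```

Keeping the toggle on the endpoint dict matches the per-provider setting described in the changelog: two endpoints can share the same prompt and still be enhanced independently.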
--- a/src/codex-launcher-gui.py
+++ b/src/codex-launcher-gui.py
@@ -225,6 +225,30 @@ class EditEndpointDialog:
        add_field("Reasoning:", lambda: reason_frame)
        self._on_reasoning_toggled()

+        enhancer_frame = ttk.Frame(grid)
+        self._enhancer_var = tk.BooleanVar(value=self._data.get("prompt_enhancer", False))
+        self._enhancer_cb = ttk.Checkbutton(enhancer_frame, text="Prompt Enhancer", variable=self._enhancer_var, command=self._on_enhancer_toggled)
+        self._enhancer_cb.pack(side="left")
+        self._enhancer_status_lbl = ttk.Label(enhancer_frame, text="", foreground="gray")
+        self._enhancer_status_lbl.pack(side="left", padx=(6, 0))
+        self._enhancer_mode = ttk.Combobox(enhancer_frame, values=["offline", "ai-powered"], state="readonly", width=10)
+        self._enhancer_mode.set(self._data.get("prompt_enhancer_mode", "offline"))
+        self._enhancer_mode.pack(side="left", padx=(8, 0))
+        add_field("Prompt Enhancer:", lambda: enhancer_frame)
+        self._on_enhancer_toggled()
+
+        self._entry_enhancer_model = ttk.Entry(grid)
+        self._entry_enhancer_model.insert(0, self._data.get("prompt_enhancer_model", ""))
+        add_field("Enhancer Model:", lambda: self._entry_enhancer_model)
+
+        self._entry_enhancer_url = ttk.Entry(grid)
+        self._entry_enhancer_url.insert(0, self._data.get("prompt_enhancer_url", ""))
+        add_field("Enhancer URL:", lambda: self._entry_enhancer_url)
+
+        self._entry_enhancer_key = ttk.Entry(grid, show="*")
+        self._entry_enhancer_key.insert(0, self._data.get("prompt_enhancer_key", ""))
+        add_field("Enhancer Key:", lambda: self._entry_enhancer_key)
+
        grid.columnconfigure(1, weight=1)

        ttk.Label(main, text="Models:").pack(anchor="w", pady=(8, 2))
@@ -275,6 +299,12 @@ class EditEndpointDialog:
        state = "readonly" if self._reason_var.get() else "disabled"
        self._combo_effort.configure(state=state)

+    def _on_enhancer_toggled(self):
+        if self._enhancer_var.get():
+            self._enhancer_status_lbl.configure(text="ON", foreground="#2ea043")
+        else:
+            self._enhancer_status_lbl.configure(text="OFF", foreground="#888888")
+
    def _apply_selected_preset(self, initial=False):
        preset_name = self._combo_preset.get() or "Custom"
        preset = PROVIDER_PRESETS.get(preset_name, {})
@@ -713,10 +743,21 @@ class EditEndpointDialog:
            "provider_preset": self._combo_preset.get() or "Custom",
            "reasoning_enabled": self._reason_var.get(),
            "reasoning_effort": self._combo_effort.get() or "medium",
+            "prompt_enhancer": self._enhancer_var.get(),
+            "prompt_enhancer_mode": self._enhancer_mode.get() or "offline",
        }
        cc_ver = self._entry_cc_ver.get().strip()
        if cc_ver:
            new_ep["cc_version"] = cc_ver
+        enh_model = self._entry_enhancer_model.get().strip()
+        enh_url = self._entry_enhancer_url.get().strip()
+        enh_key = self._entry_enhancer_key.get().strip()
+        if enh_model:
+            new_ep["prompt_enhancer_model"] = enh_model
+        if enh_url:
+            new_ep["prompt_enhancer_url"] = enh_url
+        if enh_key:
+            new_ep["prompt_enhancer_key"] = enh_key
        preset_name = self._combo_preset.get() or "Custom"
        preset = PROVIDER_PRESETS.get(preset_name, {})
        if preset.get("oauth_provider"):
@@ -2036,6 +2077,61 @@ class BenchmarkWindow:
 # Main Launcher Window
 # ═══════════════════════════════════════════════════════════════════════

+def _oauth_discover_project_win(access_token, token_path, tokens):
+    project_id = ""
+    try:
+        lr = urllib.request.Request(
+            "https://cloudcode-pa.googleapis.com/v1internal:loadCodeAssist",
+            data=json.dumps({}).encode(),
+            headers={"Content-Type": "application/json",
+                     "Authorization": f"Bearer {access_token}",
+                     "User-Agent": "google-api-nodejs-client/9.15.1"})
+        lresp = urllib.request.urlopen(lr, timeout=15)
+        ldata = json.loads(lresp.read())
+        p = ldata.get("cloudaicompanionProject", "")
+        if isinstance(p, dict):
+            project_id = p.get("id", "")
+        elif isinstance(p, str):
+            project_id = p
+    except Exception:
+        pass
+    if not project_id:
+        return ""
+    try:
+        test_url = f"https://cloudcode-pa.googleapis.com/v1internal:listModels?project={project_id}"
+        test_req = urllib.request.Request(test_url,
+            headers={"Authorization": f"Bearer {access_token}",
+                     "User-Agent": "google-api-nodejs-client/9.15.1"})
+        urllib.request.urlopen(test_req, timeout=10)
+    except urllib.error.HTTPError as e:
+        if e.code == 403 and "SERVICE_DISABLED" in (e.read().decode()[:500]):
+            try:
+                list_req = urllib.request.Request(
+                    "https://cloudresourcemanager.googleapis.com/v1/projects?filter=lifecycleState:ACTIVE",
+                    headers={"Authorization": f"Bearer {access_token}"})
+                list_resp = urllib.request.urlopen(list_req, timeout=15)
+                projects = json.loads(list_resp.read()).get("projects", [])
+                for proj in projects:
+                    pid = proj.get("projectId", "")
+                    if not pid or pid == project_id:
+                        continue
+                    try:
+                        t2 = urllib.request.Request(
+                            f"https://cloudcode-pa.googleapis.com/v1internal:listModels?project={pid}",
+                            headers={"Authorization": f"Bearer {access_token}",
+                                     "User-Agent": "google-api-nodejs-client/9.15.1"})
+                        urllib.request.urlopen(t2, timeout=10)
+                        project_id = pid
+                        break
+                    except Exception:
+                        continue
+            except Exception:
+                pass
+    tokens["project_id"] = project_id
+    with open(token_path, "w") as f:
+        json.dump(tokens, f, indent=2)
+    return project_id
+
 class LauncherWin:
    def __init__(self, root):
        self._root = root
@@ -2329,49 +2425,143 @@ class LauncherWin:
            subprocess.Popen([sys.executable, assist_path], creationflags=subprocess.CREATE_NEW_PROCESS_GROUP if IS_WINDOWS else 0)

    def _google_reoauth(self, provider, parent_dlg=None):
+        import http.server
        is_antigravity = provider == "google-antigravity"
        sec_key = "antigravity" if is_antigravity else "gemini_cli"
-        secrets = load_oauth_secrets()
-        sec = secrets.get(sec_key, {})
-        client_id = sec.get("client_id", "")
-        client_secret = sec.get("client_secret", "")
-        if not client_id or not client_secret:
+        secrets_data = load_oauth_secrets()
+        sec = secrets_data.get(sec_key, {})
+        CLIENT_ID = sec.get("client_id", "")
+        CLIENT_SECRET = sec.get("client_secret", "")
+        if not CLIENT_ID or not CLIENT_SECRET:
            messagebox.showerror("Missing OAuth secrets",
                f"No client_id/client_secret for {sec_key}.\nSet them in OAuth Secrets first.")
            return
        token_file = "google-antigravity-oauth-token.json" if is_antigravity else "google-cli-oauth-token.json"
        token_path = str(PROXY_CONFIG_DIR / token_file)
-        redirect = "urn:ietf:wg:oauth:2.0:oob"
-        scope_str = "https://www.googleapis.com/auth/cloud-platform"
-        auth_url = (f"https://accounts.google.com/o/oauth2/v2/auth?client_id={client_id}"
-                    f"&redirect_uri={urllib.parse.quote(redirect)}"
-                    f"&response_type=code&scope={urllib.parse.quote(scope_str)}"
-                    f"&access_type=offline&prompt=consent")
-        open_url(auth_url)
-        code = tk.simpledialog.askstring("Re-OAuth",
-            f"Paste the authorization code for {'Antigravity' if is_antigravity else 'Gemini CLI'}:",
-            parent=parent_dlg or self._root)
-        if not code:
-            return
+        provider_kind = "antigravity" if is_antigravity else "cli"
+
+        if is_antigravity:
+            SCOPES = [
+                "https://www.googleapis.com/auth/cloud-platform",
+                "https://www.googleapis.com/auth/userinfo.email",
+                "https://www.googleapis.com/auth/userinfo.profile",
+                "https://www.googleapis.com/auth/cclog",
+                "https://www.googleapis.com/auth/experimentsandconfigs",
+            ]
+            port = 51121
+            redirect_uri = f"http://localhost:{port}/oauth-callback"
+        else:
+            SCOPES = [
+                "https://www.googleapis.com/auth/cloud-platform",
+                "https://www.googleapis.com/auth/userinfo.email",
+                "https://www.googleapis.com/auth/userinfo.profile",
+            ]
+            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+                s.bind(("127.0.0.1", 0))
+                port = s.getsockname()[1]
+            redirect_uri = f"http://127.0.0.1:{port}/oauth2callback"
+
+        state = secrets.token_hex(32)
+        verifier = secrets.token_urlsafe(64)
+        challenge = base64.urlsafe_b64encode(hashlib.sha256(verifier.encode()).digest()).rstrip(b"=").decode()
+
+        scope_str = " ".join(SCOPES)
+        auth_url = (
+            f"https://accounts.google.com/o/oauth2/v2/auth?"
+            f"client_id={CLIENT_ID}"
+            f"&redirect_uri={urllib.parse.quote(redirect_uri)}"
+            f"&response_type=code"
+            f"&scope={urllib.parse.quote(scope_str)}"
+            f"&access_type=offline"
+            f"&prompt=select_account%20consent"
+            f"&state={state}"
+            f"&code_challenge={challenge}"
+            f"&code_challenge_method=S256"
+        )
+
+        oauth_dlg = tk.Toplevel(parent_dlg or self._root)
+        oauth_dlg.title(f"Re-OAuth: {'Antigravity' if is_antigravity else 'Gemini CLI'}")
+        oauth_dlg.geometry("520x200")
+        if parent_dlg:
+            oauth_dlg.transient(parent_dlg)
+        else:
+            oauth_dlg.transient(self._root)
+        oauth_dlg.grab_set()
+        tk.Label(oauth_dlg, text=f"Re-authenticating {'Antigravity' if is_antigravity else 'Gemini CLI'}",
+                 font=("Segoe UI", 11, "bold")).pack(padx=16, pady=(12, 0), anchor="w")
+        link_lbl = tk.Label(oauth_dlg, text="Click here to open Google authorization", fg="blue", cursor="hand2")
+        link_lbl.pack(padx=16, anchor="w")
+        link_lbl.bind("<Button-1>", lambda e: open_url(auth_url))
+        status_var = tk.StringVar(value="Waiting for browser callback...")
+        tk.Label(oauth_dlg, textvariable=status_var).pack(padx=16, pady=(8, 0), anchor="w")
+
+        code_holder = [None]
+        error_holder = [None]
+
+        class OAuthHandler(http.server.BaseHTTPRequestHandler):
+            def do_GET(self2):
+                qs = urllib.parse.urlparse(self2.path).query
+                params = urllib.parse.parse_qs(qs)
+                if "code" in params:
+                    if params.get("state", [None])[0] != state:
+                        self2.send_response(400)
+                        self2.end_headers()
+                        self2.wfile.write(b"CSRF state mismatch")
+                        error_holder[0] = "CSRF state mismatch"
+                        return
+                    code_holder[0] = params["code"][0]
+                    self2.send_response(302)
+                    self2.send_header("Location", "https://developers.google.com/gemini-code-assist/auth_success_gemini")
+                    self2.end_headers()
+                else:
+                    error_holder[0] = params.get("error", ["unknown"])[0]
+                    self2.send_response(302)
+                    self2.send_header("Location", "https://developers.google.com/gemini-code-assist/auth_failure_gemini")
+                    self2.end_headers()
+            def log_message(self2, fmt, *args):
+                pass
+
        try:
-            tok_req = urllib.request.Request("https://oauth2.googleapis.com/token",
-                data=urllib.parse.urlencode({
-                    "code": code, "client_id": client_id, "client_secret": client_secret,
-                    "redirect_uri": redirect, "grant_type": "authorization_code"
-                }).encode(),
-                headers={"Content-Type": "application/x-www-form-urlencoded"})
-            tok_resp = urllib.request.urlopen(tok_req, timeout=30)
-            tok_data = json.loads(tok_resp.read())
-            tok_data["_updated"] = time.time()
-            tok_data["client_id"] = client_id
-            tok_data["client_secret"] = client_secret
-            tok_data["provider_kind"] = "antigravity" if is_antigravity else "cli"
-            os.makedirs(os.path.dirname(token_path), exist_ok=True)
-            with open(token_path, "w") as f:
-                json.dump(tok_data, f, indent=2)
-            self.log(f"[oauth] Refreshed {provider} token")
-        except Exception as e:
-            messagebox.showerror("Token exchange failed", str(e)[:300])
+            bind_host = "localhost" if is_antigravity else "127.0.0.1"
+            server = http.server.HTTPServer((bind_host, port), OAuthHandler)
+        except OSError:
+            status_var.set(f"Port {port} in use — close other apps and retry.")
+            return
+
+        def _wait():
+            deadline = time.time() + 120
+            while code_holder[0] is None and error_holder[0] is None and time.time() < deadline:
+                server.handle_request()
+            server.server_close()
+            if code_holder[0]:
+                try:
+                    tok_data = urllib.parse.urlencode({
+                        "code": code_holder[0], "client_id": CLIENT_ID, "client_secret": CLIENT_SECRET,
+                        "redirect_uri": redirect_uri, "grant_type": "authorization_code",
+                        "code_verifier": verifier,
+                    }).encode()
+                    req = urllib.request.Request("https://oauth2.googleapis.com/token", data=tok_data,
+                        headers={"Content-Type": "application/x-www-form-urlencoded"})
+                    resp = urllib.request.urlopen(req, timeout=30)
+                    tokens = json.loads(resp.read())
+                    tokens["client_id"] = CLIENT_ID
+                    tokens["client_secret"] = CLIENT_SECRET
+                    tokens["provider_kind"] = provider_kind
+                    tokens["expires_at"] = time.time() + tokens.get("expires_in", 3600)
+                    os.makedirs(os.path.dirname(token_path), exist_ok=True)
+                    with open(token_path, "w") as f:
+                        json.dump(tokens, f, indent=2)
+                    project_id = _oauth_discover_project_win(tokens["access_token"], token_path, tokens)
+                    self._root.after(0, lambda: status_var.set(f"OK! Project: {project_id or 'none'}"))
+                    self._root.after(2000, oauth_dlg.destroy)
+                except Exception as e:
+                    self._root.after(0, lambda msg=str(e)[:200]: status_var.set(f"Failed: {msg}"))
+            else:
+                self._root.after(0, lambda: status_var.set(f"Failed: {error_holder[0] or 'No code received'}"))
+
+        open_url(auth_url)
+        threading.Thread(target=_wait, daemon=True).start()
+        oauth_dlg.wait_window()

    def _codebuff_reoauth_standalone(self, parent_dlg=None):
        import uuid
--- a/src/codex_launcher_lib.py
+++ b/src/codex_launcher_lib.py
@@ -83,6 +83,57 @@ model_catalog_json = ""
 """

 CHANGELOG = [
+    ("3.10.12", "2026-05-26", [
+        "Sticky endpoint: caches last working endpoint, sequential fallback on failure",
+        "Endpoint order: cloudcode-pa first (matches agy CLI), daily-cloudcode-pa fallback",
+        "Anti-stall engine: kills stale proxy processes + clears pycache on startup",
+        "Smart error classification: quota vs capacity vs banned vs validation vs auth",
+        "Rate limit reset parsing: extracts cooldown from error body for accuracy",
+        "Missing headers: X-Client-Name, X-Client-Version, x-goog-api-client, sessionId",
+        "Guardrail skip: simple messages (hi) skip agent guardrail, no more tool-call loops",
+        "Claude fixes: preserve all tools, skip compaction/normalizer/sanitization for Claude",
+        "Normalizer model param: distinguishes Claude vs Gemini for correct behavior",
+    ]),
+    ("3.10.11", "2026-05-26", [
+        "Hybrid endpoint fallback: cloudcode-pa then daily-cloudcode-pa on 429",
+        "daily-cloudcode-pa.googleapis.com (same endpoint agy-core uses)",
+        "429 errors log full response body for debugging",
+        "Rate-limit marking only after ALL endpoints fail",
+        "Restored SERVICE_DISABLED (403) fallthrough",
+    ]),
+    ("3.10.10", "2026-05-25", [
+        "Fix normalizer stripping ALL context after compaction on resumed sessions",
+        "No auto-reset when compaction summary present (preserves 1925+ turn history)",
+        "Always preserve compaction summaries in normalizer output",
+        "Deduplicate consecutive identical goal_context messages",
+        "Emergency reset preserves compaction summaries",
+        "Fix hashlib NameError in _antigravity_normalize_context (string comparison instead)",
+    ]),
+    ("3.10.9", "2026-05-25", [
+        "Antigravity: production-only endpoints (cloudcode-pa.googleapis.com), sandbox blocked unless ALLOW_ANTIGRAVITY_STAGING=1",
+        "Antigravity: 403 SERVICE_DISABLED falls through, 429 returns to client (no sandbox fallback)",
+        "AntigravityContextNormalizer: bounded context — simple messages send minimal payload",
+        "Simple message detector: 'hi' etc sends only user message, no tool history",
+        "Auto-reset polluted context: 200+ items with simple message resets to minimal",
+        "Duplicate user message removal, tool output budget (max 2 verbatim, rest summarized)",
+        "Hard limits: 20 contents, 120K/250K/500K char budgets",
+        "Claude thinking fix: maxOutputTokens=64000, snake_case thinking config, VALIDATED toolConfig",
+        "Claude budgets: low=8192, medium=16384, high=32768",
+        "All fixes scoped to OAUTH_PROVIDER==google-antigravity only",
+        "Project discovery uses production endpoint (not staging)",
+        "z.ai: full OpenClaw attribution headers (cobra91 PR #4)",
+        "OpenRouter: X-OpenRouter-Cache header (cobra91 PR #4)",
+        "Fix Linux Re-OAuth: load_oauth_secrets() was undefined",
+        "Fix GLib.idle_add lambda returning truthy tuple",
+    ]),
+    ("3.10.7", "2026-05-25", [
+        "Prompt Enhancer: per-provider toggle to improve prompt clarity after compaction",
+        "Two modes: offline (template injection) and ai-powered (external LLM rewrites)",
+        "Offline mode: injects structured instructions to keep model focused post-compaction",
+        "AI-powered mode: uses configurable model/URL/key to rewrite prompts for clarity",
+        "Linux/Windows GUI: Prompt Enhancer switch + mode selector + model/URL/key fields",
+        "Prevents lost context issues in long sessions with aggressive compaction",
+    ]),
    ("3.10.6", "2026-05-25", [
        "Freebuff integration: free DeepSeek/Kimi via codebuff.com API",
        "Fixed Freebuff User-Agent to match official SDK (ai-sdk/openai-compatible/1.0.25/codebuff)",
@@ -431,7 +482,7 @@ PROVIDER_PRESETS = {
    },
    "Google Antigravity (OAuth)": {
        "backend_type": "gemini-oauth-antigravity",
-        "base_url": "https://daily-cloudcode-pa.sandbox.googleapis.com",
+        "base_url": "https://cloudcode-pa.googleapis.com",
        "oauth_provider": "google-antigravity",
        "models": [
            "antigravity-gemini-3-flash",
--- a/src/translate-proxy.py
+++ b/src/translate-proxy.py
@@ -247,6 +247,11 @@ REASONING_ENABLED = True
 REASONING_EFFORT = "medium"
 FORCE_MODEL = ""
 BGP_ROUTES = []
+PROMPT_ENHANCER = False
+PROMPT_ENHANCER_MODE = "offline"
+PROMPT_ENHANCER_MODEL = ""
+PROMPT_ENHANCER_URL = ""
+PROMPT_ENHANCER_KEY = ""
 SERVER = None

 if _IS_WINDOWS:
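
The changelog describes sticky endpoint caching with sequential fallback, and the next hunk adds the `_antigravity_preferred_endpoint` and `_antigravity_endpoint_lock` globals that back it. The request loop itself is not visible in this part of the diff, so the following is only a minimal sketch of the idea. It assumes a hypothetical `send(endpoint)` callable returning `(status, body)`; `ordered_endpoints`, `send_with_fallback`, and `fake_send` are illustrative names, not functions from the proxy.

```python
import threading

# Endpoint order as listed in the changelog: production first, daily as fallback.
ENDPOINTS = [
    "https://cloudcode-pa.googleapis.com",
    "https://daily-cloudcode-pa.googleapis.com",
]
# Statuses the changelog names as triggering fallback.
RETRYABLE = {429, 502, 503}

_preferred = None          # stands in for _antigravity_preferred_endpoint
_lock = threading.Lock()   # stands in for _antigravity_endpoint_lock


def ordered_endpoints():
    """Return the endpoints with the sticky one (if any) first."""
    with _lock:
        sticky = _preferred
    if sticky in ENDPOINTS:
        return [sticky] + [e for e in ENDPOINTS if e != sticky]
    return list(ENDPOINTS)


def send_with_fallback(send):
    """Try endpoints one at a time and remember the first that succeeds.

    `send` is a hypothetical callable: send(endpoint) -> (status, body).
    """
    global _preferred
    result = None
    for endpoint in ordered_endpoints():
        status, body = send(endpoint)
        result = (endpoint, status, body)
        if status in RETRYABLE:
            continue  # sequential fallback, no parallel probing
        if 200 <= status < 300:
            with _lock:
                _preferred = endpoint
        return result
    return result  # every endpoint returned a retryable status


def fake_send(endpoint):
    # Simulated backend: production is rate limited, daily answers.
    return (200, "ok") if "daily" in endpoint else (429, "busy")
```

After one call to `send_with_fallback(fake_send)`, the daily endpoint is cached and `ordered_endpoints()` lists it first, so later requests skip the failing endpoint until the sticky one itself fails.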
@@ -611,6 +616,51 @@ class APIKeyPool(AccountPool):
 _cb_pool = CodebuffAccountPool("codebuff")
 _google_antigravity_pool = GoogleAccountPool("antigravity")
 _google_cli_pool = GoogleAccountPool("cli")
+_antigravity_preferred_endpoint = None
+_antigravity_endpoint_lock = threading.Lock()
+
+def _classify_antigravity_error(status_code, body):
+    lower = body.lower()
+    if status_code == 400:
+        return "bad_request"
+    if status_code == 401:
+        if any(x in lower for x in ["invalid_grant", "token revoked", "token_revoked", "invalid_client"]):
+            return "auth_permanent"
+        return "auth_transient"
+    if status_code == 403:
+        if "validation_required" in lower or "account_disabled" in lower:
+            return "validation_required"
+        if "has been disabled" in lower and "violation of terms of service" in lower:
+            return "account_banned"
+        if "service_disabled" in lower:
+            return "service_disabled"
+        return "forbidden"
+    if status_code in (429, 503, 529):
+        if any(x in lower for x in ["model_capacity_exhausted", "capacity_exhausted", "model is currently overloaded", "service temporarily unavailable"]):
+            return "capacity_exhausted"
+        if any(x in lower for x in ["quota_exhausted", "resource_exhausted", "daily limit", "quota exceeded", "quotaresetdelay"]):
+            return "quota_exhausted"
+        return "rate_limited"
+    if status_code >= 500:
+        return "server_error"
+    return "unknown"
+
+def _parse_rate_limit_reset(body):
+    import re as _re
+    m = _re.search(r'quotaResetDelay[:"\s]+(\d+(?:\.\d+)?)(ms|s)', body, _re.IGNORECASE)
+    if m:
+        val = float(m.group(1))
+        return val / 1000 if m.group(2).lower() == 'ms' else val
+    m = _re.search(r'(\d+)h(\d+)m(\d+)s', body, _re.IGNORECASE)
+    if m:
+        return int(m.group(1)) * 3600 + int(m.group(2)) * 60 + int(m.group(3))
+    m = _re.search(r'Resets in ~(\d+)h(\d+)m', body, _re.IGNORECASE)
+    if m:
+        return int(m.group(1)) * 3600 + int(m.group(2)) * 60
+    m = _re.search(r'retry[-_]?after[:\s]+(\d+)\s*(?:sec|s\b)', body, _re.IGNORECASE)
+    if m:
+        return int(m.group(1))
+    return None
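
As a quick check of the reset-time formats handled above, the sketch below repeats the four patterns from `_parse_rate_limit_reset` in a standalone function and runs them against sample strings. The sample bodies are made up for illustration and are not captured API responses; `parse_reset_seconds` is a hypothetical name.

```python
import re


def parse_reset_seconds(body):
    """Return the cooldown in seconds found in an error body, or None."""
    m = re.search(r'quotaResetDelay[:"\s]+(\d+(?:\.\d+)?)(ms|s)', body, re.IGNORECASE)
    if m:
        value = float(m.group(1))
        # Unit is compared case-insensitively in this sketch.
        return value / 1000 if m.group(2).lower() == "ms" else value
    m = re.search(r'(\d+)h(\d+)m(\d+)s', body, re.IGNORECASE)
    if m:
        return int(m.group(1)) * 3600 + int(m.group(2)) * 60 + int(m.group(3))
    m = re.search(r'Resets in ~(\d+)h(\d+)m', body, re.IGNORECASE)
    if m:
        return int(m.group(1)) * 3600 + int(m.group(2)) * 60
    m = re.search(r'retry[-_]?after[:\s]+(\d+)\s*(?:sec|s\b)', body, re.IGNORECASE)
    if m:
        return int(m.group(1))
    return None


# Illustrative inputs only.
samples = [
    '{"quotaResetDelay": "1500ms"}',
    'Quota resets in 1h2m3s',
    'Resets in ~1h27m',
    'retry-after: 30 sec',
    'no hint here',
]
for body in samples:
    print(repr(body), "->", parse_reset_seconds(body))
```

The patterns are tried in order, so a body that carries both a `quotaResetDelay` field and a human-readable hint resolves to the `quotaResetDelay` value.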

 def _get_codebuff_account():
    """Return (token, account_dict) for best available codebuff account."""
@@ -766,10 +816,24 @@ def _ensure_antigravity_version():
        _antigravity_version_checked = time.time()
        return _antigravity_version

+_antigravity_client_version = "1.110.0"
+_antigravity_client_version_checked = 0
+
+def _ensure_antigravity_client_version():
+    global _antigravity_client_version, _antigravity_client_version_checked
+    env_ver = os.environ.get("ANTIGRAVITY_CLIENT_VERSION", "").strip()
+    if env_ver:
+        return env_ver
+    if time.time() - _antigravity_client_version_checked < 6 * 3600:
+        return _antigravity_client_version
+    _antigravity_client_version = os.environ.get("ANTIGRAVITY_CLIENT_VERSION_FALLBACK", "1.110.0")
+    _antigravity_client_version_checked = time.time()
+    return _antigravity_client_version
+
 def _init_runtime():
    global CONFIG, PORT, BACKEND, TARGET_URL, API_KEY, OAUTH_PROVIDER, _antigravity_version
    global MODELS, CC_VERSION, REASONING_ENABLED, REASONING_EFFORT, BGP_ROUTES
-    global _api_key_pool
+    global _api_key_pool, PROMPT_ENHANCER, PROMPT_ENHANCER_MODE, PROMPT_ENHANCER_MODEL, PROMPT_ENHANCER_URL, PROMPT_ENHANCER_KEY

    CONFIG = load_config()
    PORT = CONFIG["port"]
@@ -782,6 +846,11 @@ def _init_runtime():
    REASONING_ENABLED = CONFIG.get("reasoning_enabled", True)
    REASONING_EFFORT = CONFIG.get("reasoning_effort", "medium")
    FORCE_MODEL = (CONFIG.get("force_model") or "").strip()
+    PROMPT_ENHANCER = CONFIG.get("prompt_enhancer", False)
+    PROMPT_ENHANCER_MODE = CONFIG.get("prompt_enhancer_mode", "offline")
+    PROMPT_ENHANCER_MODEL = CONFIG.get("prompt_enhancer_model", "")
+    PROMPT_ENHANCER_URL = CONFIG.get("prompt_enhancer_url", "")
+    PROMPT_ENHANCER_KEY = CONFIG.get("prompt_enhancer_key", "")
    BGP_ROUTES = CONFIG.get("bgp_routes", [])
    _api_key_pool = None
    if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
@@ -1290,6 +1359,26 @@ def forwarded_headers(request_headers, extra=None, browser_ua=False):
        headers.update(extra)
    return headers

+def _openrouter_extra():
+    if not TARGET_URL:
+        return {}
+    if "z.ai" in TARGET_URL:
+        return {
+            "HTTP-Referer": "https://openclaw.ai",
+            "X-OpenRouter-Title": "OpenClaw",
+            "X-OpenRouter-Categories":
+                "cli-agent,cloud-agent,programming-app,creative-writing,"
+                "writing-assistant,general-chat,personal-agent",
+        }
+    if "openrouter.ai" in TARGET_URL:
+        return {
+            "HTTP-Referer": "https://chats-llm.com",
+            "X-OpenRouter-Title": "Chats-LLM",
+            "X-OpenRouter-Categories": "general-chat, ide-extension",
+            "X-OpenRouter-Cache": "true",
+        }
+    return {}
+
 _MAX_INPUT_ITEMS = 30
 _MAX_TOOL_OUTPUT_CHARS = 8000
 _COMPACT_KEEP_RECENT = 10
@@ -1694,6 +1783,120 @@ def _adaptive_compact(input_data, model, policy=None):
          f"items {len(input_data)}->{len(head)+1+len(tail)}", file=sys.stderr)
    return head + [summary_msg] + tail, True

+# ═══════════════════════════════════════════════════════════════════
+# Prompt Enhancer
+# ═══════════════════════════════════════════════════════════════════
+
+_PROMPT_ENHANCER_SYSTEM = """You are a prompt enhancement assistant for a coding agent (Codex CLI).
+Your job: rewrite the user's latest message to be clearer, more specific, and more actionable.
+Rules:
+- Preserve the user's EXACT intent — never change what they want done
+- Add explicit action verbs and step-by-step clarity
+- If the message is vague ("fix it", "make it better"), infer context from prior conversation summary and make it specific
+- Keep the enhanced prompt concise — no longer than 2x the original
+- If the original prompt is already clear and specific, return it unchanged
+- Output ONLY the enhanced prompt text, nothing else
+- Never add tasks the user didn't ask for"""
+
+_PROMPT_ENHANCER_OFFLINE = """<prompt-enhancer>
+<instructions>
+You are a coding agent operating inside a context-compacted session. Follow these rules strictly:
+
+1. ACTION CLARITY: Re-read the user's latest message. Identify every explicit and implicit action request. Execute ALL of them — do not skip any.
+
+2. COMPACTED CONTEXT: Previous conversation was summarized. The summary preserves your task history but may lose details. If the user references earlier work ("fix that", "continue", "update it"), infer from the compacted summary what was done and what remains.
+
+3. NO CLARIFICATION ASKING: Never ask "which file?" or "what exactly?" — infer from context. If truly ambiguous, make a reasonable assumption and proceed. The user can correct you.
+
+4. DECISIVE EXECUTION: When the user says "fix", "update", "change", "add", "remove" — do it immediately in the relevant file(s). Do not describe what you would do — actually do it.
+
+5. COMPLETE EDITS: When editing files, make the FULL change requested. Do not partially apply edits or leave placeholders.
+
+6. PRESERVE WORKING STATE: Never break existing functionality. If changing code, keep all surrounding logic intact.
+
+7. MULTI-STEP REQUESTS: If the user asks for multiple things, do ALL of them in sequence. Do not stop after the first one.
+</instructions>
+</prompt-enhancer>
+
+"""
+
+def _enhance_prompt_llm(text, compaction_summary=""):
+    global PROMPT_ENHANCER_MODEL, PROMPT_ENHANCER_URL, PROMPT_ENHANCER_KEY
+    if not PROMPT_ENHANCER_MODEL or not PROMPT_ENHANCER_URL:
+        return text
+    try:
+        messages = [
+            {"role": "system", "content": _PROMPT_ENHANCER_SYSTEM},
+        ]
+        if compaction_summary:
+            messages.append({"role": "user", "content": f"Context from earlier conversation (compacted):\n{compaction_summary[:2000]}"})
+        messages.append({"role": "user", "content": f"Enhance this prompt:\n{text}"})
+        body = json.dumps({"model": PROMPT_ENHANCER_MODEL, "messages": messages, "max_tokens": 2000, "temperature": 0.3}).encode()
+        headers = {"Content-Type": "application/json"}
+        if PROMPT_ENHANCER_KEY:
+            headers["Authorization"] = f"Bearer {PROMPT_ENHANCER_KEY}"
+        req = urllib.request.Request(f"{PROMPT_ENHANCER_URL.rstrip('/')}/chat/completions", data=body, headers=headers)
+        resp = urllib.request.urlopen(req, timeout=15)
+        data = json.loads(resp.read())
+        enhanced = data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
+        if enhanced and len(enhanced) >= len(text) * 0.5:
+            print(f"[prompt-enhancer] AI enhanced: {text[:80]}... -> {enhanced[:80]}...", file=sys.stderr)
+            return enhanced
+    except Exception as e:
+        print(f"[prompt-enhancer] AI enhancement failed: {e}", file=sys.stderr)
+    return text
+
+def _apply_prompt_enhancer(input_data):
+    global PROMPT_ENHANCER_MODE
+    if not isinstance(input_data, list) or len(input_data) == 0:
+        return input_data
+    last_user_idx = None
+    for i in range(len(input_data) - 1, -1, -1):
+        item = input_data[i]
+        if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
+            last_user_idx = i
+            break
+    if last_user_idx is None:
+        return input_data
+    item = input_data[last_user_idx]
+    content = item.get("content", "")
+    if isinstance(content, list):
+        text = content[0].get("text", "") if content else ""
+    elif isinstance(content, str):
+        text = content
+    else:
+        return input_data
+    if not text or len(text) < 5:
+        return input_data
+    if text.startswith("<prompt-enhancer>"):
+        return input_data
+    compaction_summary = ""
+    for it in input_data:
+        if isinstance(it, dict) and it.get("type") == "message" and it.get("role") == "user":
+            c = it.get("content", "")
+            t = ""
+            if isinstance(c, list):
+                t = c[0].get("text", "") if c else ""
+            elif isinstance(c, str):
+                t = c
+            if "[Auto-compacted:" in t:
+                compaction_summary = t[:3000]
+                break
+    if PROMPT_ENHANCER_MODE == "ai-powered" and PROMPT_ENHANCER_MODEL and PROMPT_ENHANCER_URL:
+        enhanced = _enhance_prompt_llm(text, compaction_summary)
+    else:
+        enhanced = text
+    enhanced = _PROMPT_ENHANCER_OFFLINE + enhanced
+    new_item = dict(item)
+    if isinstance(item.get("content"), list):
+        new_item["content"] = [{"type": "input_text", "text": enhanced}]
+    else:
+        new_item["content"] = enhanced
+    result = list(input_data)
+    result[last_user_idx] = new_item
+    print(f"[prompt-enhancer] mode={PROMPT_ENHANCER_MODE} enhanced last user message ({len(text)}->{len(enhanced)} chars)", file=sys.stderr)
+    return result
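
To make the offline mode concrete, here is a minimal sketch of the same idea on a toy input: find the last user message, prepend an instruction block, and return a new list without mutating the caller's. `PREFIX` is a short stand-in for `_PROMPT_ENHANCER_OFFLINE`, and `prepend_to_last_user` is a hypothetical simplification that only handles string content.

```python
PREFIX = "<prompt-enhancer>\nFollow the latest request fully.\n</prompt-enhancer>\n\n"


def prepend_to_last_user(items, prefix=PREFIX):
    """Return a copy of items with prefix added to the last user message."""
    for i in range(len(items) - 1, -1, -1):
        item = items[i]
        if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
            text = item.get("content", "")
            # Skip non-string content, very short text, and already-enhanced text.
            if not isinstance(text, str) or len(text) < 5 or text.startswith("<prompt-enhancer>"):
                return items
            result = list(items)                      # shallow copy of the list
            result[i] = {**item, "content": prefix + text}  # new dict for the edited item
            return result
    return items


history = [
    {"type": "message", "role": "user", "content": "add a --verbose flag"},
    {"type": "message", "role": "assistant", "content": "Done."},
    {"type": "message", "role": "user", "content": "now update the README"},
]
enhanced = prepend_to_last_user(history)
```

Calling the function a second time on `enhanced` returns it unchanged because of the `startswith("<prompt-enhancer>")` guard, which mirrors the check in `_apply_prompt_enhancer`.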
+
 # ═══════════════════════════════════════════════════════════════════
 # Tool-call pairing validator
 # ═══════════════════════════════════════════════════════════════════
@@ -4113,6 +4316,221 @@ def _auto_continue_gemini(handler, flush_event, message_id, model, gen_config, g
            break
    return accumulated_text

+_ANTIGRAVITY_MAX_CONTENTS = 20
+_ANTIGRAVITY_MAX_TOOL_VERBATIM = 2
+_ANTIGRAVITY_MAX_TOOL_CHARS = 2000
+_ANTIGRAVITY_MAX_OLD_SUMMARY_CHARS = 1200
+_ANTIGRAVITY_SOFT_CHARS = 120000
+_ANTIGRAVITY_HARD_CHARS = 250000
+_ANTIGRAVITY_EMERGENCY_CHARS = 500000
+_ANTIGRAVITY_SIMPLE_WORDS = frozenset({"hi", "hello", "hey", "test", "ping", "thanks", "thank you", "ok", "okay", "yes", "no", "cool", "nice", "good", "great", "done", "go", "stop", "yep", "nope", "sure", "right", "correct", "continue", "cont", "k", "thx", "ty", "np", "lol", "brb", "bye"})
+_ANTIGRAVITY_EDIT_WORDS = frozenset(("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert", "create", "build", "implement"))
+_ANTIGRAVITY_REFERENCE_WORDS = frozenset(("previous", "file", "error", "again", "that", "this", "it", "same", "last", "above", "earlier", "before", "earlier output", "last error", "previous result", "what was", "show me", "give me"))
+
+def _antigravity_is_simple_user(text):
+    if not text:
+        return True
+    stripped = text.strip().lower()
+    if stripped in _ANTIGRAVITY_SIMPLE_WORDS:
+        return True
+    if len(stripped) < 30:
+        words = set(stripped.split())
+        if not words.intersection(_ANTIGRAVITY_REFERENCE_WORDS) and not words.intersection(_ANTIGRAVITY_EDIT_WORDS):
+            return True
+    return False
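
The detector above can be tried in isolation. The sketch below copies its logic with deliberately shortened word sets (assumed subsets of the real `_ANTIGRAVITY_*` constants) under the hypothetical name `is_simple_user`.

```python
# Shortened, illustrative subsets of the word lists in the diff.
SIMPLE_WORDS = frozenset({"hi", "hello", "thanks", "ok", "continue"})
EDIT_WORDS = frozenset({"change", "fix", "update", "add", "remove"})
REFERENCE_WORDS = frozenset({"previous", "file", "error", "that", "this", "it"})


def is_simple_user(text):
    """Return True when the message looks like small talk rather than a task."""
    if not text:
        return True
    stripped = text.strip().lower()
    if stripped in SIMPLE_WORDS:
        return True
    if len(stripped) < 30:
        words = set(stripped.split())
        # Short messages count as simple unless they contain an edit or reference word.
        if not words & REFERENCE_WORDS and not words & EDIT_WORDS:
            return True
    return False


for msg in ["hi", "Fix the login bug", "what is a monad", "fix: typo"]:
    print(repr(msg), "->", is_simple_user(msg))
```

Two properties follow from the whitespace split: a short question such as "what is a monad" is treated as simple, and a keyword with punctuation attached ("fix:") does not match the edit list, so "fix: typo" is also treated as simple.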
+
+def _antigravity_normalize_context(input_data, model=""):
+    if not isinstance(input_data, list) or len(input_data) < 2:
+        return input_data
+    is_claude_model = "claude" in model.lower()
+
+    latest_user = ""
+    latest_user_idx = -1
+    for i in range(len(input_data) - 1, -1, -1):
+        item = input_data[i]
+        if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
+            c = item.get("content", "")
+            if isinstance(c, str):
+                latest_user = c
+            elif isinstance(c, list):
+                latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
+            latest_user_idx = i
+            break
+
+    if not latest_user:
+        return input_data
+
+    is_simple = _antigravity_is_simple_user(latest_user)
+
+    n_raw = len(input_data)
+    n_tool_outputs = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call_output")
+    n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
+
+    auto_reset = (n_raw > 200 or n_tool_outputs > 20) and is_simple
+    if os.environ.get("ANTIGRAVITY_AUTO_RESET_POLLUTED_CONTEXT", "1") != "1":
+        auto_reset = False
+
+    has_compaction_summary = any(
+        isinstance(it, dict) and it.get("type") == "message" and it.get("role") == "user"
+        and ("Auto-compacted" in str(it.get("content", "")) or "auto-compacted" in str(it.get("content", "")).lower())
+        for it in input_data
+    )
+
+    if is_simple and auto_reset and not has_compaction_summary:
+        system_items = [it for it in input_data if isinstance(it, dict) and it.get("type") == "message" and it.get("role") in ("developer", "system")]
+        user_item = input_data[latest_user_idx]
+        result = system_items + [user_item] if system_items else [user_item]
+        print(f"[antigravity-context] raw_items={n_raw} compacted_items={n_raw} final_items={len(result)}", file=sys.stderr)
+        print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs=0", file=sys.stderr)
+        print(f"[antigravity-context] simple_latest_user=true auto_reset={auto_reset} has_compaction={has_compaction_summary}", file=sys.stderr)
+        return result
+
+    dev_messages = []
+    recent_items = []
+    tool_outputs = []
+    other_items = []
+
+    for i, item in enumerate(input_data):
+        if not isinstance(item, dict):
+            continue
+        t = item.get("type")
+        if t == "message" and item.get("role") in ("developer", "system"):
+            dev_messages.append(item)
+        elif t == "function_call_output":
+            tool_outputs.append((i, item))
+        elif t in ("function_call",):
+            other_items.append((i, item))
+        elif t == "message":
+            recent_items.append((i, item))
+
+    latest_words = set(latest_user.strip().lower().split())
+    has_edit_intent = bool(latest_words.intersection(_ANTIGRAVITY_EDIT_WORDS))
+    has_ref_intent = bool(latest_words.intersection(_ANTIGRAVITY_REFERENCE_WORDS))
+    if is_claude_model:
+        keep_tools = len(tool_outputs)
+    else:
+        keep_tools = 2 if (has_edit_intent or has_ref_intent) else 1
+
+    if is_claude_model:
+        kept_tools = tool_outputs
+    else:
+        kept_tools = tool_outputs[-keep_tools:] if tool_outputs and (has_edit_intent or has_ref_intent) else []
+
+    for idx_t, t_item in enumerate(kept_tools):
+        orig = t_item[1]
+        out = orig.get("output", "")
+        if isinstance(out, str) and len(out) > _ANTIGRAVITY_MAX_TOOL_CHARS:
+            new_item = dict(orig)
+            new_item["output"] = out[:_ANTIGRAVITY_MAX_TOOL_CHARS] + f"\n... [truncated: kept {_ANTIGRAVITY_MAX_TOOL_CHARS} of {len(out)} chars]"
+            kept_tools[idx_t] = (t_item[0], new_item)
+
+    n_summarized = len(tool_outputs) - len(kept_tools)
+
+    tail_start = max(0, len(recent_items) - 6)
+    recent_tail = recent_items[tail_start:]
+
+    deduped_tail = []
+    seen_goal_context = False
+    for idx, msg_item in recent_tail:
+        content_str = ""
+        c = msg_item.get("content", "")
+        if isinstance(c, str):
+            content_str = c
+        elif isinstance(c, list):
+            content_str = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
+        if "<goal_context>" in content_str:
+            if seen_goal_context:
+                continue
+            seen_goal_context = True
+        deduped_tail.append((idx, msg_item))
+    recent_tail = deduped_tail if deduped_tail else recent_tail
+
+    tool_call_ids = set()
+    for _, t_item in kept_tools:
+        cid = t_item.get("call_id", t_item.get("id", ""))
+        if cid:
+            tool_call_ids.add(cid)
+
+    paired_calls = []
+    for idx, item in other_items:
+        cid = item.get("call_id", item.get("id", ""))
+        if cid in tool_call_ids:
+            paired_calls.append((idx, item))
+
+    result = list(dev_messages)
+
+    compaction_summaries = []
+    for idx, msg_item in recent_items:
+        if msg_item is input_data[latest_user_idx]:
+            continue
+        c = msg_item.get("content", "")
+        content_str = c if isinstance(c, str) else " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict)) if isinstance(c, list) else ""
+        if "Auto-compacted" in content_str or "auto-compacted" in content_str.lower():
+            compaction_summaries.append(msg_item)
+
+    if n_summarized > 0:
+        summary_text = f"[Tool history summary: {n_summarized} older tool outputs omitted. {n_tool_calls} prior function calls were made for file inspection/editing.]"
+        result.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": summary_text}]})
+
+    for _, call_item in paired_calls:
+        result.append(call_item)
+
+    for _, tool_item in kept_tools:
+        result.append(tool_item)
+
+    for cs_item in compaction_summaries:
+        result.append(cs_item)
+
+    for _, msg_item in recent_tail:
+        if msg_item is not input_data[latest_user_idx]:
+            result.append(msg_item)
+
+    latest_norm = " ".join(latest_user.strip().split())[:200].lower()
+    already_present = False
+    for r in result:
+        if isinstance(r, dict) and r.get("type") == "message" and r.get("role") == "user":
+            c = r.get("content", "")
+            if isinstance(c, str):
+                rn = " ".join(c.strip().split())[:200].lower()
+            elif isinstance(c, list):
+                combined = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
+                rn = " ".join(combined.strip().split())[:200].lower()
+            else:
+                rn = ""
+            if rn == latest_norm:
+                already_present = True
+                break
+
+    if not already_present:
+        result.append(input_data[latest_user_idx])
+
+    total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)
+
+    if total_chars > _ANTIGRAVITY_EMERGENCY_CHARS:
+        print(f"[antigravity-context] EMERGENCY: {total_chars} chars exceeds limit, resetting to minimal", file=sys.stderr)
+        result = list(dev_messages)
+        if compaction_summaries:
+            result.extend(compaction_summaries)
+        result.append(input_data[latest_user_idx])
+        total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)
+
+    while len(result) > _ANTIGRAVITY_MAX_CONTENTS and total_chars > _ANTIGRAVITY_SOFT_CHARS:
+        for i in range(1, len(result) - 1):
+            if isinstance(result[i], dict) and result[i].get("type") in ("message", "function_call_output"):
+                removed = result.pop(i)
+                total_chars -= len(json.dumps(removed, ensure_ascii=False))
+                break
+        else:
+            break
+
+    est_tokens = total_chars // 4
+    print(f"[antigravity-context] raw_items={n_raw} final_items={len(result)}", file=sys.stderr)
+    print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs={len(kept_tools)} summarized_tool_outputs={n_summarized}", file=sys.stderr)
+    print(f"[antigravity-context] simple_latest_user={is_simple} auto_reset={auto_reset}", file=sys.stderr)
+    print(f"[antigravity-context] final_chars={total_chars} estimated_tokens={est_tokens}", file=sys.stderr)
+
+    return result
+
 class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

@@ -4288,12 +4706,17 @@ class Handler(http.server.BaseHTTPRequestHandler):
                body["input"] = input_data

        compacted = False
-        if policy.get("compaction") and isinstance(input_data, list):
+        if policy.get("compaction") and isinstance(input_data, list) and "claude" not in model.lower():
            input_data, compacted = _adaptive_compact(input_data, model, policy)
            if compacted:
                body = dict(body)
                body["input"] = input_data

+        if PROMPT_ENHANCER and isinstance(input_data, list):
+            input_data = _apply_prompt_enhancer(input_data)
+            body = dict(body)
+            body["input"] = input_data
+
        crof_limit = _crof_item_limit(model)
        _crof_eligible = TARGET_URL and "crof.ai" in TARGET_URL
        if _crof_eligible and not compacted and isinstance(input_data, list) and len(input_data) > crof_limit:
@@ -4321,6 +4744,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
            fwd = forwarded_headers(self.headers, {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {effective_key}",
+                **_openrouter_extra(),
            }, browser_ua=True)
            print(f"[{self._session_id}] POST {target} model={model} stream={stream} items={len(input_data) if isinstance(input_data,list) else 1}", file=sys.stderr)
            chat_body_b = json.dumps(chat_body).encode()
@@ -4461,12 +4885,22 @@ class Handler(http.server.BaseHTTPRequestHandler):
            body["input"] = input_data

        compacted = False
-        if policy.get("compaction") and isinstance(input_data, list):
+        if policy.get("compaction") and isinstance(input_data, list) and "claude" not in model.lower():
            input_data, compacted = _adaptive_compact(input_data, model, policy)
            if compacted:
                body = dict(body)
                body["input"] = input_data

+        if PROMPT_ENHANCER and isinstance(input_data, list):
+            input_data = _apply_prompt_enhancer(input_data)
+            body = dict(body)
+            body["input"] = input_data
+
+        if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list) and "claude" not in model.lower():
+            input_data = _antigravity_normalize_context(input_data, model)
+            body = dict(body)
+            body["input"] = input_data
+
        access_token = _refresh_oauth_token()
        token_name = "google-antigravity-oauth-token.json" if OAUTH_PROVIDER == "google-antigravity" else "google-cli-oauth-token.json"
        token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
@@ -4541,7 +4975,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                        resp_part["functionResponse"]["id"] = call_id
                    contents.append({"role": "user", "parts": [resp_part]})

-        if OAUTH_PROVIDER.startswith("google"):
+        if OAUTH_PROVIDER.startswith("google") and "claude" not in model.lower():
            sanitized = []
            last_user_text = None
            last_role = None
@@ -4587,7 +5021,26 @@ class Handler(http.server.BaseHTTPRequestHandler):
        if body.get("top_p") is not None:
            gen_config["topP"] = body["top_p"]

-        if REASONING_ENABLED and REASONING_EFFORT != "none":
+        _is_claude_model = "claude" in model.lower()
+        _is_claude_thinking = _is_claude_model and "thinking" in model.lower()
+
+        if OAUTH_PROVIDER == "google-antigravity" and _is_claude_thinking:
+            if REASONING_ENABLED and REASONING_EFFORT != "none":
+                budget = {"low": 8192, "medium": 16384, "high": 32768}.get(REASONING_EFFORT, 16384)
+            else:
+                budget = 16384
+            gen_config["thinkingConfig"] = {
+                "include_thoughts": True,
+                "thinking_budget": budget,
+            }
+            current_max = gen_config.get("maxOutputTokens", 0)
+            if not current_max or current_max <= budget:
+                gen_config["maxOutputTokens"] = 64000
+            print(f"[antigravity-claude] thinking model={model} budget={budget} maxOutputTokens={gen_config.get('maxOutputTokens')}", file=sys.stderr)
+        elif OAUTH_PROVIDER == "google-antigravity" and _is_claude_model:
+            if "thinkingConfig" in gen_config:
+                del gen_config["thinkingConfig"]
+        elif REASONING_ENABLED and REASONING_EFFORT != "none":
            budget = {"low": 2048, "medium": 8192, "high": 24576}.get(REASONING_EFFORT, 8192)
            gen_config["thinkingConfig"] = {"includeThoughts": True, "thinkingBudget": budget}
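
The Claude thinking branch above boils down to a small mapping from reasoning effort to a token budget, plus a floor on `maxOutputTokens`. A minimal sketch of that rule as a pure function, using the numbers from this hunk; `claude_thinking_config` is a hypothetical helper, not part of the proxy.

```python
CLAUDE_BUDGETS = {"low": 8192, "medium": 16384, "high": 32768}


def claude_thinking_config(effort, reasoning_enabled=True, max_output_tokens=None):
    """Return (thinkingConfig, maxOutputTokens) for a Claude thinking model."""
    if reasoning_enabled and effort != "none":
        budget = CLAUDE_BUDGETS.get(effort, 16384)
    else:
        budget = 16384
    thinking = {"include_thoughts": True, "thinking_budget": budget}
    # Raise the output limit when it is unset or does not exceed the budget.
    if not max_output_tokens or max_output_tokens <= budget:
        max_output_tokens = 64000
    return thinking, max_output_tokens


print(claude_thinking_config("high"))
print(claude_thinking_config("medium", max_output_tokens=100000))
```

A caller-supplied limit is kept only when it is larger than the thinking budget; otherwise the sketch falls back to 64000, as the hunk does.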

@@ -4613,8 +5066,19 @@ class Handler(http.server.BaseHTTPRequestHandler):
            contents = _gemini_reattach_sigs(contents)

        if OAUTH_PROVIDER == "google-antigravity":
+            latest_user = ""
+            if isinstance(input_data, list):
+                for item in reversed(input_data):
+                    if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
+                        c = item.get("content", "")
+                        if isinstance(c, str):
+                            latest_user = c
+                        elif isinstance(c, list):
+                            latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
+                        break
+            is_latest_simple = _antigravity_is_simple_user(latest_user)
            guardrail_found = any("autonomous coding agent" in json.dumps(c.get("parts", []), ensure_ascii=False) for c in contents[:2])
-            if not guardrail_found:
+            if not guardrail_found and not is_latest_simple:
                contents.insert(0, {"role": "user", "parts": [{"text": _GEMINI_AGENT_GUARDRAIL}]})

        if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
@@ -4667,6 +5131,11 @@ class Handler(http.server.BaseHTTPRequestHandler):
        if gemini_tools:
            request_body["tools"] = gemini_tools

+        if OAUTH_PROVIDER == "google-antigravity" and _is_claude_model and gemini_tools:
+            request_body["toolConfig"] = {"functionCallingConfig": {"mode": "VALIDATED"}}
+            if _is_claude_thinking:
+                print("[antigravity-claude] applied VALIDATED toolConfig for thinking model", file=sys.stderr)
+
        wrapped = {
            "project": project_id,
            "model": model,
@@ -4676,14 +5145,22 @@ class Handler(http.server.BaseHTTPRequestHandler):
            wrapped["requestType"] = "agent"
            wrapped["userAgent"] = "antigravity"
            wrapped["requestId"] = f"agent-{uuid.uuid4().hex[:12]}"
+            wrapped["request"]["sessionId"] = f"{uuid.uuid4().hex}{int(time.time()*1000)}"

-        endpoints = ([
-            "https://daily-cloudcode-pa.sandbox.googleapis.com",
-            "https://autopush-cloudcode-pa.sandbox.googleapis.com",
-            "https://cloudcode-pa.googleapis.com",
-        ] if OAUTH_PROVIDER == "google-antigravity" else [
-            "https://cloudcode-pa.googleapis.com",
-        ])
+        _allow_staging = os.environ.get("ALLOW_ANTIGRAVITY_STAGING", "0") == "1"
+        if OAUTH_PROVIDER == "google-antigravity":
+            _antigravity_endpoints = [
+                "https://cloudcode-pa.googleapis.com",
+                "https://daily-cloudcode-pa.googleapis.com",
+            ]
+            if _allow_staging:
+                _antigravity_endpoints.extend([
+                    "https://daily-cloudcode-pa.sandbox.googleapis.com",
+                    "https://autopush-cloudcode-pa.sandbox.googleapis.com",
+                ])
+            endpoints = _antigravity_endpoints
+        else:
+            endpoints = ["https://cloudcode-pa.googleapis.com"]
        action = "streamGenerateContent" if stream else "generateContent"
        url_suffix = f"v1internal:{action}?alt=sse" if stream else f"v1internal:{action}"

@@ -4693,7 +5170,13 @@ class Handler(http.server.BaseHTTPRequestHandler):
        }
        if OAUTH_PROVIDER == "google-antigravity":
            version = _ensure_antigravity_version()
-            headers["User-Agent"] = f"antigravity/{version} darwin/arm64"
+            import platform as _plat
+            _os_name = _plat.system().lower()
+            _os_arch = _plat.machine().lower().replace("x86_64", "x64").replace("aarch64", "arm64")
+            headers["User-Agent"] = f"antigravity/{version} {_os_name}/{_os_arch}"
+            headers["X-Client-Name"] = "antigravity"
+            headers["X-Client-Version"] = _ensure_antigravity_client_version()
+            headers["x-goog-api-client"] = "gl-node/18.18.2 fire/0.8.6 grpc/1.10.x"
        else:
            headers["User-Agent"] = "google-api-nodejs-client/9.15.1"
            headers["X-Goog-Api-Client"] = "gl-node/22.17.0"
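Reviewer note: the platform-aware `User-Agent` built in the hunk above can be checked without a live request. A minimal sketch follows. `antigravity_user_agent` and its `system`/`machine` override parameters are hypothetical and exist only to make the function testable.

```python
import platform


def antigravity_user_agent(version, system=None, machine=None):
    """Hypothetical helper: build the User-Agent the same way the hunk above does."""
    os_name = (system or platform.system()).lower()
    arch = (machine or platform.machine()).lower()
    # Same two normalizations as the diff; other values pass through unchanged.
    arch = arch.replace("x86_64", "x64").replace("aarch64", "arm64")
    return f"antigravity/{version} {os_name}/{arch}"
```

One thing this makes visible: 64-bit Windows reports `AMD64` from `platform.machine()`, which lowercases to `amd64` and is not mapped to `x64` by these two replacements. Whether the upstream cares about that is not something the diff answers.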
@@ -4710,14 +5193,36 @@ class Handler(http.server.BaseHTTPRequestHandler):
            except Exception:
                pass

-        for ep in endpoints:
+        if OAUTH_PROVIDER == "google-antigravity":
+            print(f"[antigravity-endpoint] endpoints={[e.replace('https://','') for e in endpoints]} project={project_id}", file=sys.stderr)
+
+        upstream = None
+        chosen_ep = None
+        global _antigravity_preferred_endpoint
+
+        with _antigravity_endpoint_lock:
+            _pref = _antigravity_preferred_endpoint
+
+        if _pref and _pref in endpoints:
+            ordered = [_pref] + [e for e in endpoints if e != _pref]
+        else:
+            ordered = list(endpoints)
+
+        for ep in ordered:
            target = f"{ep}/{url_suffix}"
            req = urllib.request.Request(target, data=body_b, headers=headers)
            try:
                upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
+                chosen_ep = ep
+                with _antigravity_endpoint_lock:
+                    _antigravity_preferred_endpoint = ep
+                if ep != ordered[0]:
+                    print(f"[{self._session_id}] fallback OK: {ep.replace('https://','')}", file=sys.stderr)
                break
            except urllib.error.HTTPError as e:
                err_body = e.read().decode()
+                err_class = _classify_antigravity_error(e.code, err_body)
+                print(f"[{self._session_id}] {ep.replace('https://','')} {e.code} class={err_class}", file=sys.stderr)
                if e.code == 400 and OAUTH_PROVIDER.startswith("google"):
                    try:
                        debug_path = os.path.join(_LOG_DIR, "gemini-last-400-request.json")
@@ -4726,20 +5231,38 @@ class Handler(http.server.BaseHTTPRequestHandler):
                        print(f"[{self._session_id}] saved 400 debug request to {debug_path}", file=sys.stderr)
                    except Exception:
                        pass
-                if e.code == 429 and ep != endpoints[-1]:
-                    print(f"[{self._session_id}] {ep} HTTP 429, trying next endpoint", file=sys.stderr)
+                    return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
+                if err_class == "auth_permanent":
+                    return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
+                if err_class in ("quota_exhausted", "rate_limited"):
+                    reset_s = _parse_rate_limit_reset(err_body)
+                    if ep == ordered[-1]:
+                        pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
+                        _, acct = _get_google_account(OAUTH_PROVIDER)
+                        if acct:
+                            cooldown = reset_s if reset_s and reset_s > 10 else 60
+                            pool.mark_rate_limited(acct, cooldown)
+                            print(f"[{self._session_id}] quota reset in ~{reset_s}s, cooldown={cooldown}s", file=sys.stderr)
+                        return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
+                    print(f"[{self._session_id}] {ep.replace('https://','')} {e.code}, trying next", file=sys.stderr)
+                    with _antigravity_endpoint_lock:
+                        _antigravity_preferred_endpoint = None
                    continue
-                if e.code == 429:
-                    pool = _google_antigravity_pool if OAUTH_PROVIDER == "google-antigravity" else _google_cli_pool
-                    _, acct = _get_google_account(OAUTH_PROVIDER)
-                    if acct:
-                        pool.mark_rate_limited(acct, 60)
-                return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
-            except Exception as e:
-                if ep == endpoints[-1]:
-                    return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
-                print(f"[{self._session_id}] {ep} failed: {e}, trying next", file=sys.stderr)
+                if err_class in ("service_disabled", "forbidden", "account_banned", "validation_required"):
+                    if ep == ordered[-1]:
+                        return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
+                    continue
+                if ep == ordered[-1]:
+                    return self.send_json(e.code, {"error": {"type": "upstream_error", "message": _sanitize_err_body(err_body)}})
                continue
+            except Exception as e:
+                print(f"[{self._session_id}] {ep.replace('https://','')} conn failed: {e}", file=sys.stderr)
+                if ep == ordered[-1]:
+                    return self.send_json(502, {"error": {"type": "proxy_error", "message": str(e)}})
+                continue
+
+        if upstream is None:
+            return self.send_json(502, {"error": {"type": "proxy_error", "message": "All endpoints failed"}})

        if stream:
            self._forward_gemini_sse(upstream, model, body, input_data, tracker)
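Reviewer note: the sticky-endpoint ordering in the hunk above reduces to a small pattern that can be tested on its own. The sketch below is a simplification under assumptions. All names are hypothetical stand-ins, `attempt` replaces the real `urlopen` call, and any failure clears the preference, whereas the diff clears it only for quota and rate-limit classes.

```python
import threading

_lock = threading.Lock()
_preferred = None  # hypothetical stand-in for _antigravity_preferred_endpoint


def ordered_endpoints(endpoints):
    """Put the remembered endpoint first, keep the rest in configured order."""
    with _lock:
        pref = _preferred
    if pref and pref in endpoints:
        return [pref] + [e for e in endpoints if e != pref]
    return list(endpoints)


def _remember(ep):
    global _preferred
    with _lock:
        _preferred = ep


def first_success(endpoints, attempt):
    """Try endpoints sequentially; attempt(ep) returns True on success."""
    for ep in ordered_endpoints(endpoints):
        if attempt(ep):
            _remember(ep)   # sticky: reuse this endpoint on the next request
            return ep
        _remember(None)     # simplification: forget the preference on any failure
    return None
```

Usage: after `first_success(["a", "b"], lambda ep: ep == "b")` returns `"b"`, the next call to `ordered_endpoints(["a", "b"])` yields `["b", "a"]`, so the working endpoint is tried first with no probing.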
@@ -4947,6 +5470,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
            fwd = forwarded_headers(self.headers, {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {r_key}",
+                **_openrouter_extra(),
            }, browser_ua=True)
            print(f"[{self._session_id}] trying route '{route.get('name', r_url)}' model={r_model}", file=sys.stderr)
            req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
@@ -5209,6 +5733,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                "Content-Type": "application/json",
                "x-api-key": API_KEY,
                "anthropic-version": "2023-06-01",
+                **_openrouter_extra(),
            }),
        )
        self._forward(req, stream, model,
@@ -5276,7 +5801,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                "threadId": thread_id,
            }

-            fwd = forwarded_headers(self.headers, headers_extra, browser_ua=True)
+            fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
            print(f"[{self._session_id}] POST {target} model={model} stream={stream} attempt={attempt} [command-code]", file=sys.stderr)
            req = urllib.request.Request(
                target,
@@ -5810,7 +6335,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                    req_body["reasoning_effort"] = REASONING_EFFORT

            req_body_b = json.dumps(req_body).encode()
-            fwd = forwarded_headers(self.headers, headers_extra, browser_ua=True)
+            fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
            print(f"[auto-sense] POST {target} model={model} attempt={attempt} schema={schema.hints()}", file=sys.stderr)

            req = urllib.request.Request(target, data=req_body_b, headers=fwd)
@@ -6025,9 +6550,42 @@ def _handle_shutdown_signal(sig, frame):
    if 'SERVER' in globals() and SERVER:
         SERVER.shutdown()
 
+def _anti_stall_cleanup():
+    my_pid = os.getpid()
+    my_port = PORT
+    killed = []
+    try:
+        import subprocess as _sp
+        out = _sp.run(["pgrep", "-f", "translate-proxy"], capture_output=True, text=True, timeout=5).stdout.strip()
+        for pid_str in out.splitlines():
+            pid_str = pid_str.strip()
+            if not pid_str or not pid_str.isdigit():
+                continue
+            pid = int(pid_str)
+            if pid == my_pid:
+                continue
+            try:
+                os.kill(pid, signal.SIGTERM)
+                killed.append(pid)
+            except (ProcessLookupError, PermissionError):
+                pass
+    except Exception:
+        pass
+    try:
+        _cache_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "__pycache__")
+        if os.path.isdir(_cache_dir):
+            import shutil
+            shutil.rmtree(_cache_dir, ignore_errors=True)
+    except Exception:
+        pass
+    if killed:
+        print(f"[anti-stall] killed {len(killed)} stale proxy process(es): {killed}", flush=True)
+        time.sleep(1)
+
 def main():
    global SERVER, _START_TIME
    _START_TIME = time.time()
+    _anti_stall_cleanup()
    _init_runtime()
    try:
        _current_cfg = os.path.basename(args.config) if args.config else ""
--- a/translate-proxy.py
+++ b/translate-proxy.py
@@ -157,6 +157,7 @@ Architecture:

 import json, http.server, socketserver, urllib.request, urllib.parse, urllib.error, re
 import time, uuid, os, sys, argparse, threading, socket, collections, contextlib, signal
+import secrets, string
 import dataclasses
 import http.client
 import selectors
@@ -246,6 +247,11 @@ REASONING_ENABLED = True
 REASONING_EFFORT = "medium"
 FORCE_MODEL = ""
 BGP_ROUTES = []
+PROMPT_ENHANCER = False
+PROMPT_ENHANCER_MODE = "offline"
+PROMPT_ENHANCER_MODEL = ""
+PROMPT_ENHANCER_URL = ""
+PROMPT_ENHANCER_KEY = ""
 SERVER = None

 if _IS_WINDOWS:
@@ -310,7 +316,7 @@ _conn_pool = {}

 _STREAM_IDLE_TIMEOUT = 300

-_CODEBUFF_AUTH_URL = "https://codebuff.com"
+_CODEBUFF_AUTH_URL = "https://www.codebuff.com"
 _CODEBUFF_API_URL = "https://www.codebuff.com"
 _CODEBUFF_AGENT_MAP = {
    "deepseek/deepseek-v4-pro": "base2-free-deepseek",
@@ -350,11 +356,11 @@ def _codebuff_get_session(token, model):
            return sc["instance_id"]
    try:
        url = f"{_CODEBUFF_API_URL}/api/v1/freebuff/session"
-        body = json.dumps({"model": model}).encode()
+        body = json.dumps({}).encode()
        req = urllib.request.Request(url, data=body, headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
-            "User-Agent": "codex-launcher/3.10.4",
+            "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
            "x-codebuff-model": model,
        })
        try:
@@ -402,7 +408,7 @@ def _codebuff_start_run(token, agent_id):
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
-        "User-Agent": "codex-launcher/3.10.4",
+        "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
    })
    try:
        resp = urllib.request.urlopen(req, timeout=15)
@@ -435,7 +441,7 @@ def _codebuff_finish_run(token, run_id, status="completed"):
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
-        "User-Agent": "codex-launcher/3.10.4",
+        "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
    })
    try:
        urllib.request.urlopen(req, timeout=10)
@@ -718,7 +724,6 @@ _GEMINI_AGENT_GUARDRAIL = (
    "Always emit the actual tool call in the same response."
 )

-_LOG_FILE = None
 _LOG_FILE_LOCK = threading.Lock()

 def _fetch_antigravity_version():
@@ -769,7 +774,7 @@ def _ensure_antigravity_version():
 def _init_runtime():
    global CONFIG, PORT, BACKEND, TARGET_URL, API_KEY, OAUTH_PROVIDER, _antigravity_version
    global MODELS, CC_VERSION, REASONING_ENABLED, REASONING_EFFORT, BGP_ROUTES
-    global _api_key_pool
+    global _api_key_pool, PROMPT_ENHANCER, PROMPT_ENHANCER_MODE, PROMPT_ENHANCER_MODEL, PROMPT_ENHANCER_URL, PROMPT_ENHANCER_KEY

    CONFIG = load_config()
    PORT = CONFIG["port"]
@@ -782,6 +787,11 @@ def _init_runtime():
    REASONING_ENABLED = CONFIG.get("reasoning_enabled", True)
    REASONING_EFFORT = CONFIG.get("reasoning_effort", "medium")
    FORCE_MODEL = (CONFIG.get("force_model") or "").strip()
+    PROMPT_ENHANCER = CONFIG.get("prompt_enhancer", False)
+    PROMPT_ENHANCER_MODE = CONFIG.get("prompt_enhancer_mode", "offline")
+    PROMPT_ENHANCER_MODEL = CONFIG.get("prompt_enhancer_model", "")
+    PROMPT_ENHANCER_URL = CONFIG.get("prompt_enhancer_url", "")
+    PROMPT_ENHANCER_KEY = CONFIG.get("prompt_enhancer_key", "")
    BGP_ROUTES = CONFIG.get("bgp_routes", [])
    _api_key_pool = None
    if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
@@ -1290,6 +1300,26 @@ def forwarded_headers(request_headers, extra=None, browser_ua=False):
        headers.update(extra)
    return headers

+def _openrouter_extra():
+    if not TARGET_URL:
+        return {}
+    if "z.ai" in TARGET_URL:
+        return {
+            "HTTP-Referer": "https://openclaw.ai",
+            "X-OpenRouter-Title": "OpenClaw",
+            "X-OpenRouter-Categories":
+                "cli-agent,cloud-agent,programming-app,creative-writing,"
+                "writing-assistant,general-chat,personal-agent",
+        }
+    if "openrouter.ai" in TARGET_URL:
+        return {
+            "HTTP-Referer": "https://chats-llm.com",
+            "X-OpenRouter-Title": "Chats-LLM",
+            "X-OpenRouter-Categories": "general-chat, ide-extension",
+            "X-OpenRouter-Cache": "true",
+        }
+    return {}
+
 _MAX_INPUT_ITEMS = 30
 _MAX_TOOL_OUTPUT_CHARS = 8000
 _COMPACT_KEEP_RECENT = 10
@@ -1297,8 +1327,8 @@ _COMPACT_KEEP_RECENT = 10
 _CROF_ADAPTIVE = {
    "fail_history": [],
    "model_limits": {},
-    "global_item_limit": 30,
-    "min_keep_recent": 4,
+    "global_item_limit": 80,
+    "min_keep_recent": 6,
 }

 _BGP_STATS_PATH = os.path.join(_LOG_DIR, "bgp-route-stats.json")
@@ -1366,6 +1396,8 @@ def _sorted_bgp_routes():
    return sorted(BGP_ROUTES, key=lambda r: _score_route(r, stats))

 def _crof_record(model, n_items, success):
+    if TARGET_URL and "crof.ai" not in TARGET_URL:
+        return
    if not isinstance(n_items, int) or n_items < 1:
        return
    entry = {"model": model, "items": n_items, "ok": success}
@@ -1391,7 +1423,8 @@ def _crof_record(model, n_items, success):
            global_limit = v["limit"]
    _CROF_ADAPTIVE["global_item_limit"] = global_limit

-    print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)
+    if TARGET_URL and "crof.ai" in TARGET_URL:
+        print(f"[crof-adaptive] model={model} items={n_items} {'OK' if success else 'FAIL'} -> limit={ml.get('limit',30)} global={global_limit}", file=sys.stderr)

 def _crof_item_limit(model):
    ml = _CROF_ADAPTIVE["model_limits"].get(model, {})
@@ -1436,7 +1469,8 @@ def _crof_compact_for_retry(input_data, model):
        summary_lines.append(_item_summary(item, max_len=120))

    summary_msg = {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "\n".join(summary_lines)}]}
-    print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)})", file=sys.stderr)
+    if TARGET_URL and "crof.ai" in TARGET_URL:
+        print(f"[crof-adaptive] RETRY compact: {len(input_data)} -> {len(head)+1+len(tail)} (limit={limit}, keep={len(tail)})", file=sys.stderr)
    return head + [summary_msg] + tail

 def _item_summary(item, max_len=200):
@@ -1590,6 +1624,10 @@ _PROVIDER_POLICIES = {
                   "tool_output_limit": 6000, "max_input_items": 35, "compaction": "balanced"},
    "openadapter": {"reasoning_mode": "off", "max_tokens": 32768, "strip_reasoning": True,
                    "tool_output_limit": 6000, "max_input_items": 30, "compaction": "balanced"},
+    "cloudcode-pa": {"compaction": "aggressive", "context_size": 1000000,
+                     "tool_output_limit": 6000, "max_input_items": 60},
+    "googleapis": {"compaction": "balanced", "context_size": 1000000,
+                   "tool_output_limit": 6000, "max_input_items": 80},
 }

 def provider_policy(target_url=None, backend=None):
@@ -1608,12 +1646,14 @@ _MODEL_CONTEXT = {
    "claude-sonnet": 200000, "claude-haiku": 200000,
    "glm-5.1": 128000, "glm-5": 128000, "glm-4": 128000,
    "deepseek": 64000, "gemini-2.5-flash": 1000000, "gemini-2.5-pro": 2000000,
+    "gemini-3-flash": 1000000, "gemini-3.5-flash-low": 1000000,
+    "gemini-3.1-pro-low": 2000000,
    "gemini-3.5-flash": 1000000, "gemini-3.1-pro": 2000000,
    "Gemini 3.5 Flash": 1000000, "Gemini 3.1 Pro": 2000000,
    "Claude Sonnet 4.6": 200000, "Claude Opus 4.6": 200000,
    "GPT-OSS 120B": 128000,
-    "claude-sonnet-4.6-thinking": 200000, "claude-opus-4.6-thinking": 200000,
-    "gpt-oss-120b": 128000,
+    "claude-sonnet-4-6": 200000, "claude-opus-4-6-thinking": 200000,
+    "gpt-oss-120b-medium": 128000,
    "mimo": 32768, "minimax": 32768, "kimi": 128000,
    "_default": 32768,
 }
@@ -1641,7 +1681,7 @@ def _estimate_tokens(obj):
 def _adaptive_compact(input_data, model, policy=None):
    policy = policy or {}
    context_size = int(policy.get("context_size", _context_limit_for_model(model)))
-    input_budget = int(context_size * 0.60)
+    input_budget = int(context_size * 0.80)
    estimated = _estimate_tokens(input_data)
    if estimated <= input_budget:
        return input_data, False
@@ -1684,6 +1724,120 @@ def _adaptive_compact(input_data, model, policy=None):
          f"items {len(input_data)}->{len(head)+1+len(tail)}", file=sys.stderr)
    return head + [summary_msg] + tail, True

+# ═══════════════════════════════════════════════════════════════════
+# Prompt Enhancer
+# ═══════════════════════════════════════════════════════════════════
+
+_PROMPT_ENHANCER_SYSTEM = """You are a prompt enhancement assistant for a coding agent (Codex CLI).
+Your job: rewrite the user's latest message to be clearer, more specific, and more actionable.
+Rules:
+- Preserve the user's EXACT intent — never change what they want done
+- Add explicit action verbs and step-by-step clarity
+- If the message is vague ("fix it", "make it better"), infer context from prior conversation summary and make it specific
+- Keep the enhanced prompt concise — no longer than 2x the original
+- If the original prompt is already clear and specific, return it unchanged
+- Output ONLY the enhanced prompt text, nothing else
+- Never add tasks the user didn't ask for"""
+
+_PROMPT_ENHANCER_OFFLINE = """<prompt-enhancer>
+<instructions>
+You are a coding agent operating inside a context-compacted session. Follow these rules strictly:
+
+1. ACTION CLARITY: Re-read the user's latest message. Identify every explicit and implicit action request. Execute ALL of them — do not skip any.
+
+2. COMPACTED CONTEXT: Previous conversation was summarized. The summary preserves your task history but may lose details. If the user references earlier work ("fix that", "continue", "update it"), infer from the compacted summary what was done and what remains.
+
+3. NO CLARIFICATION ASKING: Never ask "which file?" or "what exactly?" — infer from context. If truly ambiguous, make a reasonable assumption and proceed. The user can correct you.
+
+4. DECISIVE EXECUTION: When the user says "fix", "update", "change", "add", "remove" — do it immediately in the relevant file(s). Do not describe what you would do — actually do it.
+
+5. COMPLETE EDITS: When editing files, make the FULL change requested. Do not partially apply edits or leave placeholders.
+
+6. PRESERVE WORKING STATE: Never break existing functionality. If changing code, keep all surrounding logic intact.
+
+7. MULTI-STEP REQUESTS: If the user asks for multiple things, do ALL of them in sequence. Do not stop after the first one.
+</instructions>
+</prompt-enhancer>
+
+"""
+
+def _enhance_prompt_llm(text, compaction_summary=""):
+    global PROMPT_ENHANCER_MODEL, PROMPT_ENHANCER_URL, PROMPT_ENHANCER_KEY
+    if not PROMPT_ENHANCER_MODEL or not PROMPT_ENHANCER_URL:
+        return text
+    try:
+        messages = [
+            {"role": "system", "content": _PROMPT_ENHANCER_SYSTEM},
+        ]
+        if compaction_summary:
+            messages.append({"role": "user", "content": f"Context from earlier conversation (compacted):\n{compaction_summary[:2000]}"})
+        messages.append({"role": "user", "content": f"Enhance this prompt:\n{text}"})
+        body = json.dumps({"model": PROMPT_ENHANCER_MODEL, "messages": messages, "max_tokens": 2000, "temperature": 0.3}).encode()
+        headers = {"Content-Type": "application/json"}
+        if PROMPT_ENHANCER_KEY:
+            headers["Authorization"] = f"Bearer {PROMPT_ENHANCER_KEY}"
+        req = urllib.request.Request(f"{PROMPT_ENHANCER_URL.rstrip('/')}/chat/completions", data=body, headers=headers)
+        resp = urllib.request.urlopen(req, timeout=15)
+        data = json.loads(resp.read())
+        enhanced = data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
+        if enhanced and len(enhanced) >= len(text) * 0.5:
+            print(f"[prompt-enhancer] AI enhanced: {text[:80]}... -> {enhanced[:80]}...", file=sys.stderr)
+            return enhanced
+    except Exception as e:
+        print(f"[prompt-enhancer] AI enhancement failed: {e}", file=sys.stderr)
+    return text
+
+def _apply_prompt_enhancer(input_data):
+    global PROMPT_ENHANCER_MODE
+    if not isinstance(input_data, list) or len(input_data) == 0:
+        return input_data
+    last_user_idx = None
+    for i in range(len(input_data) - 1, -1, -1):
+        item = input_data[i]
+        if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
+            last_user_idx = i
+            break
+    if last_user_idx is None:
+        return input_data
+    item = input_data[last_user_idx]
+    content = item.get("content", "")
+    if isinstance(content, list):
+        text = content[0].get("text", "") if content else ""
+    elif isinstance(content, str):
+        text = content
+    else:
+        return input_data
+    if not text or len(text) < 5:
+        return input_data
+    if text.startswith("<prompt-enhancer>"):
+        return input_data
+    compaction_summary = ""
+    for it in input_data:
+        if isinstance(it, dict) and it.get("type") == "message" and it.get("role") == "user":
+            c = it.get("content", "")
+            t = ""
+            if isinstance(c, list):
+                t = c[0].get("text", "") if c else ""
+            elif isinstance(c, str):
+                t = c
+            if "[Auto-compacted:" in t:
+                compaction_summary = t[:3000]
+                break
+    if PROMPT_ENHANCER_MODE == "ai-powered" and PROMPT_ENHANCER_MODEL and PROMPT_ENHANCER_URL:
+        enhanced = _enhance_prompt_llm(text, compaction_summary)
+    else:
+        enhanced = text
+    enhanced = _PROMPT_ENHANCER_OFFLINE + enhanced
+    new_item = dict(item)
+    if isinstance(item.get("content"), list):
+        new_item["content"] = [{"type": "input_text", "text": enhanced}] + list(item["content"][1:])
+    else:
+        new_item["content"] = enhanced
+    result = list(input_data)
+    result[last_user_idx] = new_item
+    print(f"[prompt-enhancer] mode={PROMPT_ENHANCER_MODE} enhanced last user message ({len(text)}->{len(enhanced)} chars)", file=sys.stderr)
+    return result
+
 # ═══════════════════════════════════════════════════════════════════
 # Tool-call pairing validator
 # ═══════════════════════════════════════════════════════════════════
@@ -4103,6 +4257,177 @@ def _auto_continue_gemini(handler, flush_event, message_id, model, gen_config, g
            break
    return accumulated_text

+_ANTIGRAVITY_MAX_CONTENTS = 20
+_ANTIGRAVITY_MAX_TOOL_VERBATIM = 2
+_ANTIGRAVITY_MAX_TOOL_CHARS = 2000
+_ANTIGRAVITY_MAX_OLD_SUMMARY_CHARS = 1200
+_ANTIGRAVITY_SOFT_CHARS = 120000
+_ANTIGRAVITY_HARD_CHARS = 250000
+_ANTIGRAVITY_EMERGENCY_CHARS = 500000
+_ANTIGRAVITY_SIMPLE_WORDS = frozenset({"hi", "hello", "hey", "test", "ping", "thanks", "thank you", "ok", "okay", "yes", "no", "cool", "nice", "good", "great", "done", "go", "stop", "yep", "nope", "sure", "right", "correct", "continue", "cont", "k", "thx", "ty", "np", "lol", "brb", "bye"})
+_ANTIGRAVITY_EDIT_WORDS = frozenset(("change", "fix", "update", "redesign", "rewrite", "modify", "improve", "replace", "edit", "make it", "add", "remove", "delete", "rename", "move", "convert", "create", "build", "implement"))
+_ANTIGRAVITY_REFERENCE_WORDS = frozenset(("previous", "file", "error", "again", "that", "this", "it", "same", "last", "above", "earlier", "before", "earlier output", "last error", "previous result", "what was", "show me", "give me"))
+
+def _antigravity_is_simple_user(text):
+    if not text:
+        return True
+    stripped = text.strip().lower()
+    if stripped in _ANTIGRAVITY_SIMPLE_WORDS:
+        return True
+    if len(stripped) < 30:
+        words = set(stripped.split())
+        if not words.intersection(_ANTIGRAVITY_REFERENCE_WORDS) and not words.intersection(_ANTIGRAVITY_EDIT_WORDS):
+            return True
+    return False
+
+def _antigravity_normalize_context(input_data):
+    if not isinstance(input_data, list) or len(input_data) < 2:
+        return input_data
+
+    latest_user = ""
+    latest_user_idx = -1
+    for i in range(len(input_data) - 1, -1, -1):
+        item = input_data[i]
+        if isinstance(item, dict) and item.get("type") == "message" and item.get("role") == "user":
+            c = item.get("content", "")
+            if isinstance(c, str):
+                latest_user = c
+            elif isinstance(c, list):
+                latest_user = "\n".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
+            latest_user_idx = i
+            break
+
+    if not latest_user:
+        return input_data
+
+    is_simple = _antigravity_is_simple_user(latest_user)
+
+    n_raw = len(input_data)
+    n_tool_outputs = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call_output")
+    n_tool_calls = sum(1 for it in input_data if isinstance(it, dict) and it.get("type") == "function_call")
+
+    auto_reset = (n_raw > 200 or n_tool_outputs > 20) and is_simple
+    if os.environ.get("ANTIGRAVITY_AUTO_RESET_POLLUTED_CONTEXT", "1") != "1":
+        auto_reset = False
+
+    if is_simple and (auto_reset or n_tool_outputs == 0):
+        system_items = [it for it in input_data if isinstance(it, dict) and it.get("type") == "message" and it.get("role") in ("developer", "system")]
+        user_item = input_data[latest_user_idx]
+        result = system_items + [user_item]
+        print(f"[antigravity-context] raw_items={n_raw} compacted_items={n_raw} final_items={len(result)}", file=sys.stderr)
+        print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs=0", file=sys.stderr)
+        print(f"[antigravity-context] simple_latest_user=true auto_reset={auto_reset}", file=sys.stderr)
+        return result
+
+    dev_messages = []
+    recent_items = []
+    tool_outputs = []
+    other_items = []
+
+    for i, item in enumerate(input_data):
+        if not isinstance(item, dict):
+            continue
+        t = item.get("type")
+        if t == "message" and item.get("role") in ("developer", "system"):
+            dev_messages.append(item)
+        elif t == "function_call_output":
+            tool_outputs.append((i, item))
+        elif t in ("function_call",):
+            other_items.append((i, item))
+        elif t == "message":
+            recent_items.append((i, item))
+
+    latest_words = set(latest_user.strip().lower().split())
+    has_edit_intent = bool(latest_words.intersection(_ANTIGRAVITY_EDIT_WORDS))
+    has_ref_intent = bool(latest_words.intersection(_ANTIGRAVITY_REFERENCE_WORDS))
+    keep_tools = 2 if (has_edit_intent or has_ref_intent) else 1
+
+    kept_tools = tool_outputs[-keep_tools:] if tool_outputs and (has_edit_intent or has_ref_intent) else []
+
+    for idx_t, t_item in enumerate(kept_tools):
+        orig = t_item[1]
+        out = orig.get("output", "")
+        if isinstance(out, str) and len(out) > _ANTIGRAVITY_MAX_TOOL_CHARS:
+            new_item = dict(orig)
+            new_item["output"] = out[:_ANTIGRAVITY_MAX_TOOL_CHARS] + f"\n... [truncated: kept {_ANTIGRAVITY_MAX_TOOL_CHARS} of {len(out)} chars]"
+            kept_tools[idx_t] = (t_item[0], new_item)
+
+    n_summarized = len(tool_outputs) - len(kept_tools)
+
+    tail_start = max(0, len(recent_items) - 6)
+    recent_tail = recent_items[tail_start:]
+
+    tool_call_ids = set()
+    for _, t_item in kept_tools:
+        cid = t_item.get("call_id", t_item.get("id", ""))
+        if cid:
+            tool_call_ids.add(cid)
+
+    paired_calls = []
+    for idx, item in other_items:
+        cid = item.get("call_id", item.get("id", ""))
+        if cid in tool_call_ids:
+            paired_calls.append((idx, item))
+
+    result = list(dev_messages)
+
+    if n_summarized > 0:
+        summary_text = f"[Tool history summary: {n_summarized} older tool outputs omitted. {n_tool_calls} prior function calls were made for file inspection/editing.]"
+        result.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": summary_text}]})
+
+    for _, call_item in paired_calls:
+        result.append(call_item)
+
+    for _, tool_item in kept_tools:
+        result.append(tool_item)
+
+    for _, msg_item in recent_tail:
+        if msg_item is not input_data[latest_user_idx]:
+            result.append(msg_item)
+
+    latest_norm = " ".join(latest_user.strip().split())[:200].lower()
+    already_present = False
+    for r in result:
+        if isinstance(r, dict) and r.get("type") == "message" and r.get("role") == "user":
+            c = r.get("content", "")
+            if isinstance(c, str):
+                rn = " ".join(c.strip().split())[:200].lower()
+            elif isinstance(c, list):
+                combined = " ".join(p.get("text", p.get("input_text", "")) for p in c if isinstance(p, dict))
+                rn = " ".join(combined.strip().split())[:200].lower()
+            else:
+                rn = ""
+            if rn == latest_norm:
+                already_present = True
+                break
+
+    if not already_present:
+        result.append(input_data[latest_user_idx])
+
+    total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)
+
+    if total_chars > _ANTIGRAVITY_EMERGENCY_CHARS:
+        print(f"[antigravity-context] EMERGENCY: {total_chars} chars exceeds limit, resetting to minimal", file=sys.stderr)
+        result = list(dev_messages) + [input_data[latest_user_idx]]
+        total_chars = sum(len(json.dumps(it, ensure_ascii=False)) for it in result)
+
+    while len(result) > _ANTIGRAVITY_MAX_CONTENTS and total_chars > _ANTIGRAVITY_SOFT_CHARS:
+        for i in range(1, len(result) - 1):
+            if isinstance(result[i], dict) and result[i].get("type") in ("message", "function_call_output"):
+                removed = result.pop(i)
+                total_chars -= len(json.dumps(removed, ensure_ascii=False))
+                break
+        else:
+            break
+
+    est_tokens = total_chars // 4
+    print(f"[antigravity-context] raw_items={n_raw} final_items={len(result)}", file=sys.stderr)
+    print(f"[antigravity-context] raw_tool_outputs={n_tool_outputs} kept_tool_outputs={len(kept_tools)} summarized_tool_outputs={n_summarized}", file=sys.stderr)
+    print(f"[antigravity-context] simple_latest_user={is_simple} auto_reset={auto_reset}", file=sys.stderr)
+    print(f"[antigravity-context] final_chars={total_chars} estimated_tokens={est_tokens}", file=sys.stderr)
+
+    return result
+
 class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

@@ -4284,8 +4609,14 @@ class Handler(http.server.BaseHTTPRequestHandler):
                body = dict(body)
                body["input"] = input_data

+        if PROMPT_ENHANCER and isinstance(input_data, list):
+            input_data = _apply_prompt_enhancer(input_data)
+            body = dict(body)
+            body["input"] = input_data
+
        crof_limit = _crof_item_limit(model)
-        if not compacted and isinstance(input_data, list) and len(input_data) > crof_limit:
+        _crof_eligible = TARGET_URL and "crof.ai" in TARGET_URL
+        if _crof_eligible and not compacted and isinstance(input_data, list) and len(input_data) > crof_limit:
            print(f"[crof-adaptive] proactive compact: {len(input_data)} items > limit {crof_limit}", file=sys.stderr)
            input_data = _crof_compact_for_retry(input_data, model)
            body = dict(body)
@@ -4310,6 +4641,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
            fwd = forwarded_headers(self.headers, {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {effective_key}",
+                **_openrouter_extra(),
            }, browser_ua=True)
            print(f"[{self._session_id}] POST {target} model={model} stream={stream} items={len(input_data) if isinstance(input_data,list) else 1}", file=sys.stderr)
            chat_body_b = json.dumps(chat_body).encode()
@@ -4456,6 +4788,16 @@ class Handler(http.server.BaseHTTPRequestHandler):
                body = dict(body)
                body["input"] = input_data

+        if PROMPT_ENHANCER and isinstance(input_data, list):
+            input_data = _apply_prompt_enhancer(input_data)
+            body = dict(body)
+            body["input"] = input_data
+
+        if OAUTH_PROVIDER == "google-antigravity" and isinstance(input_data, list):
+            input_data = _antigravity_normalize_context(input_data)
+            body = dict(body)
+            body["input"] = input_data
+
        access_token = _refresh_oauth_token()
        token_name = "google-antigravity-oauth-token.json" if OAUTH_PROVIDER == "google-antigravity" else "google-cli-oauth-token.json"
        token_path = os.path.join(os.path.expanduser("~"), ".cache", "codex-proxy", token_name)
@@ -4576,7 +4918,26 @@ class Handler(http.server.BaseHTTPRequestHandler):
        if body.get("top_p") is not None:
            gen_config["topP"] = body["top_p"]

-        if REASONING_ENABLED and REASONING_EFFORT != "none":
+        _is_claude_model = "claude" in model.lower()
+        _is_claude_thinking = _is_claude_model and "thinking" in model.lower()
+
+        if OAUTH_PROVIDER == "google-antigravity" and _is_claude_thinking:
+            if REASONING_ENABLED and REASONING_EFFORT != "none":
+                budget = {"low": 8192, "medium": 16384, "high": 32768}.get(REASONING_EFFORT, 16384)
+            else:
+                budget = 16384
+            gen_config["thinkingConfig"] = {
+                "include_thoughts": True,
+                "thinking_budget": budget,
+            }
+            current_max = gen_config.get("maxOutputTokens", 0)
+            if not current_max or current_max <= budget:
+                gen_config["maxOutputTokens"] = 64000
+            print(f"[antigravity-claude] thinking model={model} budget={budget} maxOutputTokens={gen_config.get('maxOutputTokens')}", file=sys.stderr)
+        elif OAUTH_PROVIDER == "google-antigravity" and _is_claude_model:
+            if "thinkingConfig" in gen_config:
+                del gen_config["thinkingConfig"]
+        elif REASONING_ENABLED and REASONING_EFFORT != "none":
            budget = {"low": 2048, "medium": 8192, "high": 24576}.get(REASONING_EFFORT, 8192)
            gen_config["thinkingConfig"] = {"includeThoughts": True, "thinkingBudget": budget}

@@ -4656,6 +5017,11 @@ class Handler(http.server.BaseHTTPRequestHandler):
        if gemini_tools:
            request_body["tools"] = gemini_tools

+        if OAUTH_PROVIDER == "google-antigravity" and _is_claude_model and gemini_tools:
+            request_body["toolConfig"] = {"functionCallingConfig": {"mode": "VALIDATED"}}
+            if _is_claude_thinking:
+                print("[antigravity-claude] applied VALIDATED toolConfig for thinking model", file=sys.stderr)
+
        wrapped = {
            "project": project_id,
            "model": model,
@@ -4666,13 +5032,17 @@ class Handler(http.server.BaseHTTPRequestHandler):
            wrapped["userAgent"] = "antigravity"
            wrapped["requestId"] = f"agent-{uuid.uuid4().hex[:12]}"

-        endpoints = ([
-            "https://daily-cloudcode-pa.sandbox.googleapis.com",
-            "https://autopush-cloudcode-pa.sandbox.googleapis.com",
-            "https://cloudcode-pa.googleapis.com",
-        ] if OAUTH_PROVIDER == "google-antigravity" else [
-            "https://cloudcode-pa.googleapis.com",
-        ])
+        _allow_staging = os.environ.get("ALLOW_ANTIGRAVITY_STAGING", "0") == "1"
+        if OAUTH_PROVIDER == "google-antigravity":
+            _antigravity_endpoints = ["https://cloudcode-pa.googleapis.com"]
+            if _allow_staging:
+                _antigravity_endpoints.extend([
+                    "https://daily-cloudcode-pa.sandbox.googleapis.com",
+                    "https://autopush-cloudcode-pa.sandbox.googleapis.com",
+                ])
+            endpoints = _antigravity_endpoints
+        else:
+            endpoints = ["https://cloudcode-pa.googleapis.com"]
        action = "streamGenerateContent" if stream else "generateContent"
        url_suffix = f"v1internal:{action}?alt=sse" if stream else f"v1internal:{action}"

@@ -4699,6 +5069,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
            except Exception:
                pass

+        if OAUTH_PROVIDER == "google-antigravity":
+            print(f"[antigravity-endpoint] endpoints={[e.replace('https://','') for e in endpoints]} project={project_id}", file=sys.stderr)
+
        for ep in endpoints:
            target = f"{ep}/{url_suffix}"
            req = urllib.request.Request(target, data=body_b, headers=headers)
@@ -4715,7 +5088,10 @@ class Handler(http.server.BaseHTTPRequestHandler):
                        print(f"[{self._session_id}] saved 400 debug request to {debug_path}", file=sys.stderr)
                    except Exception:
                        pass
-                if e.code == 429 and ep != endpoints[-1]:
+                if e.code == 403 and "SERVICE_DISABLED" in err_body[:500] and ep != endpoints[-1]:
+                    print(f"[{self._session_id}] {ep} SERVICE_DISABLED, trying next endpoint", file=sys.stderr)
+                    continue
+                if e.code == 429 and ep != endpoints[-1] and _allow_staging:
                    print(f"[{self._session_id}] {ep} HTTP 429, trying next endpoint", file=sys.stderr)
                    continue
                if e.code == 429:
@@ -4936,6 +5312,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
            fwd = forwarded_headers(self.headers, {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {r_key}",
+                **_openrouter_extra(),
            }, browser_ua=True)
            print(f"[{self._session_id}] trying route '{route.get('name', r_url)}' model={r_model}", file=sys.stderr)
            req = urllib.request.Request(target, data=json.dumps(chat_body).encode(), headers=fwd)
@@ -5079,7 +5456,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                        print(f"[provider-sensor] synthetic retry failed: {e}", file=sys.stderr)

            # Auto-retry on finish_reason=length with no content due to too much context.
-            if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5:
+            if finish_reason == "length" and not has_content and isinstance(input_data, list) and len(input_data) > 5 and TARGET_URL and "crof.ai" in TARGET_URL:
                print(f"[crof-adaptive] RETRY: finish_reason=length with no content, compacting {n_items} items", file=sys.stderr)
                new_input = _crof_compact_for_retry(input_data, model)
                if len(new_input) < len(input_data):
@@ -5198,6 +5575,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                "Content-Type": "application/json",
                "x-api-key": API_KEY,
                "anthropic-version": "2023-06-01",
+                **_openrouter_extra(),
            }),
        )
        self._forward(req, stream, model,
@@ -5265,7 +5643,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                "threadId": thread_id,
            }

-            fwd = forwarded_headers(self.headers, headers_extra, browser_ua=True)
+            fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
            print(f"[{self._session_id}] POST {target} model={model} stream={stream} attempt={attempt} [command-code]", file=sys.stderr)
            req = urllib.request.Request(
                target,
@@ -5417,9 +5795,10 @@ class Handler(http.server.BaseHTTPRequestHandler):
             metadata = {
                 "run_id": run_id,
                 "cost_mode": "free",
+                 "client_id": "".join(secrets.choice(string.digits + string.ascii_lowercase) for _ in range(13)),
             }
             if instance_id:
-                 metadata["codebuff_instance_id"] = instance_id
+                 metadata["freebuff_instance_id"] = instance_id

             chat_body = {
                 "model": model,
@@ -5441,7 +5820,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
             headers = {
                 "Content-Type": "application/json",
                 "Authorization": f"Bearer {token}",
-                 "User-Agent": "codex-launcher/3.10.4",
+                 "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff",
                 "x-codebuff-model": model,
             }
             if instance_id:
@@ -5589,9 +5968,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
        instance_id = _codebuff_get_session(token, model)
        messages = _cb_input_to_messages(input_data, instructions)
        _codebuff_hard_disable_reasoning(messages)
-        metadata = {"run_id": run_id, "cost_mode": "free"}
+        metadata = {"run_id": run_id, "cost_mode": "free", "client_id": secrets.token_hex(7)[:13]}
        if instance_id:
-            metadata["codebuff_instance_id"] = instance_id
+            metadata["freebuff_instance_id"] = instance_id
        chat_body = {
            "model": model, "messages": messages, "stream": stream,
            "max_tokens": max(body.get("max_output_tokens", 0), 64000),
@@ -5607,7 +5986,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
        if body.get("tool_choice"):
            chat_body["tool_choice"] = body["tool_choice"]
        target = f"{_CODEBUFF_API_URL}/api/v1/chat/completions"
-        headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}", "User-Agent": "codex-launcher/3.10.4", "x-codebuff-model": model}
+        headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}", "User-Agent": "ai-sdk/openai-compatible/1.0.25/codebuff", "x-codebuff-model": model}
        if instance_id:
            headers["x-codebuff-instance-id"] = instance_id
        print(f"[codebuff] retry POST {target} model={model} stream={stream} run={run_id} (thinking disabled via DeepSeek native)", file=sys.stderr)
@@ -5798,7 +6177,7 @@ class Handler(http.server.BaseHTTPRequestHandler):
                    req_body["reasoning_effort"] = REASONING_EFFORT

            req_body_b = json.dumps(req_body).encode()
-            fwd = forwarded_headers(self.headers, headers_extra, browser_ua=True)
+            fwd = forwarded_headers(self.headers, {**headers_extra, **_openrouter_extra()}, browser_ua=True)
            print(f"[auto-sense] POST {target} model={model} attempt={attempt} schema={schema.hints()}", file=sys.stderr)

            req = urllib.request.Request(target, data=req_body_b, headers=fwd)
@@ -6017,6 +6396,15 @@ def main():
    global SERVER, _START_TIME
    _START_TIME = time.time()
    _init_runtime()
+    try:
+        _current_cfg = os.path.basename(args.config) if args.config else ""
+        for _f in os.listdir(_LOG_DIR):
+            if _f.startswith("proxy-") and _f.endswith(".json") and _f != _current_cfg:
+                os.remove(os.path.join(_LOG_DIR, _f))
+            if _f.startswith("models-") and _f.endswith(".json"):
+                os.remove(os.path.join(_LOG_DIR, _f))
+    except Exception:
+        pass
    signal.signal(signal.SIGINT, _handle_shutdown_signal)
    if _IS_WINDOWS:
        if hasattr(signal, "SIGBREAK"):