v3.11.6: Antigravity loop breakers, vision/OCR preprocessing, has_content fix, auth config error fix, install.ps1

2026-05-26 18:07:42 +04:00
parent b029e7cb5e
commit e59ef6f28a
8 changed files with 340 additions and 10 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,37 @@
 # Changelog

+## v3.11.6 (2026-05-26)
+
+**Antigravity Loop Breakers, Vision/OCR Preprocessing, has_content Fix, Auth Error Fix**
+
+### New Features (Antigravity-only, no other providers affected)
+
+- **Per-session loop tracking**: `_ANTIGRAVITY_LOOP_TRACKER` global dict with `_antigravity_loop_key()` function tracks state per session: `latest_user_hash`, `nudge_injected`, `latest_user_appended`, `tool_calls_for_request`, `repeated_tool`, `force_finalize`, `last_tool`, `last_tool_count`
+- **Edit-intent nudge injection**: Injected only on the first turn per request, preventing duplicate nudges across retries
+- **Latest user instruction append**: Appended exactly once per request to prevent redundant instruction stacking
+- **Loop breaker**: If the same tool call (tool name plus arguments) repeats 5 or more times in a session, `force_finalize` is set to break the loop
+- **Detailed `[antigravity-loop]` logging**: All tracking fields logged on every Antigravity request for debugging
+
+### New Features (All OpenAI-compatible providers)
+
+- **Vision/OCR preprocessing**: When a provider doesn't support images (detected via error messages like "unknown variant image_url", "does not support image"), the proxy automatically calls a configurable vision fallback API (default: Kilo.ai) to describe images as text, then replaces image blocks with text descriptions before sending to text-only models
+- **`_vision_describe_image()`**: Calls the vision fallback model to describe a single image; descriptions are cached by the MD5 hash of the image URL, so the same URL is described only once per preprocessing pass
+- **`_preprocess_vision()`**: Replaces `image_url`/`input_image` blocks in Chat Completions message format with text descriptions when provider lacks vision support
+- **`_preprocess_vision_input()`**: Same for Responses API input format — runs BEFORE adapter conversion so images are replaced early
+- **Vision error retry**: When an upstream HTTP error body matches an image-rejection pattern, the request is automatically retried with images preprocessed instead of failing
+- **Configurable via env vars**: `VISION_FALLBACK_URL`, `VISION_FALLBACK_MODEL`, `VISION_FALLBACK_KEY`
+- **ProviderSchema `supports_vision` field**: Auto-detected from error responses and persisted in provider-caps.json
+
+### Critical Fixes
+
+- **`has_content` now includes `function_call`** (v3.11.5 fix): `_observe_event` only checked for `"type": "message"` — when models return only tool calls (no text), `has_content` was `False`, causing Codex to loop infinitely and build context until `context_length_exceeded`. Now checks both `"message"` and `"function_call"`.
+- **`has_message`/`has_tool_call` initialized in all 5 locations**: Previous fix added variables inside `_observe_event` closure but missed 4 other `has_content = False` locations, causing `NameError: name 'has_message' is not defined` crashes.
+- **Auth config-not-found error handling**: When Codex's `config.toml` is missing or deleted, `codex login status` reports "Error loading configuration: No such file or directory (os error 2)". `check_codex_auth()` now handles `OSError` with `errno == 2` and returns the `not_configured` status, which the GUI renders as a "Config missing" warning with guidance to launch Codex once so the file is created.
+
+### Bug Fixes (GUI)
+
+- **Active endpoint sync**: GUI auto-removes stale endpoint references on startup
+
 ## v3.11.5 (2026-05-26)

 **Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection**
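
The loop breaker described in the v3.11.6 changelog entry above can be sketched roughly as follows. This is a minimal illustration, not the code in `translate-proxy.py`: the helper names `tool_signature` and `record_tool_call` are hypothetical, only three of the tracked fields are modeled, and it assumes that "repeated" means consecutive identical calls within one session.

```python
import hashlib
import json

LOOP_LIMIT = 5  # threshold quoted in the changelog entry


def tool_signature(name, arguments):
    """Fingerprint a tool call from its name and canonicalised arguments."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{name}:{canonical}".encode("utf-8")).hexdigest()


def record_tool_call(tracker, session_key, name, arguments):
    """Update per-session state; return True once the loop breaker should fire."""
    state = tracker.setdefault(session_key, {
        "last_tool": None,
        "last_tool_count": 0,
        "force_finalize": False,
    })
    sig = tool_signature(name, arguments)
    if sig == state["last_tool"]:
        state["last_tool_count"] += 1
    else:
        # A different call resets the streak (assumption: only consecutive repeats count).
        state["last_tool"] = sig
        state["last_tool_count"] = 1
    if state["last_tool_count"] >= LOOP_LIMIT:
        state["force_finalize"] = True
    return state["force_finalize"]
```

Serialising the arguments with `sort_keys=True` makes the fingerprint independent of key order, so two calls that differ only in how the arguments were ordered still count as the same call.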
--- a/README.md
+++ b/README.md
@@ -134,6 +134,10 @@ A three-component system:
 - **Token-aware compaction** (v3.11.5) — learns per-model token limits from `context_length_exceeded` errors; proactively compacts when estimated tokens exceed 80% of limit; prevents repeated context overflow on small-context models (~35K tokens)
 - **Universal adaptive compaction** (v3.11.5) — compaction now works for ALL providers (was Crof.ai-only); proactive + retry compaction with aggression levels (normal/extreme)
 - **Smart-continue text detection** (v3.11.5) — triggers continuation nudging when model outputs text matching tool-call patterns, essential for text-only models that never emit real `function_call_output` items
+- **Antigravity loop breakers** (v3.11.6) — per-session tracking with automatic finalization when same tool+args repeats 5+ times; edit-intent nudge injected only on first turn; latest user instruction appended exactly once per request
+- **has_content function_call fix** (v3.11.6) — tool-call-only responses now correctly flagged as having content, preventing infinite loops on OpenAdapter/Z.AI/OpenRouter providers
+- **Vision/OCR preprocessing** (v3.11.6) — when provider rejects images, automatically calls a configurable vision fallback API (Kilo.ai) to describe images as text for text-only models; MD5-cached; retries on vision errors with preprocessed text
+- **Auth config-missing fix** (v3.11.6) — graceful handling when Codex config.toml is missing instead of showing raw os error
 - Zero dependencies — pure Python stdlib

 ### Command Code Adapter
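
The `has_content` fix listed in the README bullets above comes down to treating a tool-call-only response as real output. A minimal sketch of that check, assuming output items are dicts with a `type` field; the function name `response_has_content` is hypothetical and the real `_observe_event` closure is not shown in this commit:

```python
def response_has_content(output_items):
    """Return True when the response carries a text message or a tool call."""
    has_message = False
    has_tool_call = False
    for item in output_items:
        if not isinstance(item, dict):
            continue
        item_type = item.get("type")
        if item_type == "message":
            has_message = True
        elif item_type == "function_call":
            has_tool_call = True
    # Both flags are initialised before the loop, so neither can be undefined here.
    return has_message or has_tool_call
```

Initialising both flags in one place, before any event is observed, avoids the `NameError` described in the changelog, which arose from flags being defined on only one of several code paths.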
--- a/codex-launcher_3.11.6_all.deb
+++ b/codex-launcher_3.11.6_all.deb
--- a/install.ps1
+++ b/install.ps1
@@ -0,0 +1,127 @@
+<#
+.SYNOPSIS
+    Codex Launcher Windows Installer
+.DESCRIPTION
+    Installs Codex Launcher for the current user.
+.NOTES
+    Requires: Python 3.8+ (stdlib only, zero pip dependencies).
+#>
+
+param(
+    [switch]$Uninstall
+)
+
+$ErrorActionPreference = 'Stop'
+$BinDir = Join-Path $env:LOCALAPPDATA 'Programs\Codex-Launcher'
+$StartMenu = Join-Path $env:APPDATA 'Microsoft\Windows\Start Menu\Programs'
+
+if ($Uninstall) {
+    Write-Host 'Uninstalling Codex Launcher...' -ForegroundColor Yellow
+
+    if (Test-Path $BinDir) {
+        Remove-Item -Recurse -Force $BinDir
+        Write-Host "  Removed $BinDir"
+    }
+
+    $shortcut = Join-Path $StartMenu 'Codex Launcher.lnk'
+    if (Test-Path $shortcut) {
+        Remove-Item -Force $shortcut
+        Write-Host '  Removed Start Menu shortcut'
+    }
+
+    $userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+    if ($userPath -like "*$BinDir*") {
+        $newPath = ($userPath -split ';' | Where-Object { $_ -ne $BinDir }) -join ';'
+        [Environment]::SetEnvironmentVariable('PATH', $newPath, 'User')
+        Write-Host '  Removed from PATH'
+    }
+
+    Write-Host 'Uninstall complete.' -ForegroundColor Green
+    return
+}
+
+Write-Host ''
+Write-Host '  Codex Launcher - Windows Installer' -ForegroundColor Cyan
+Write-Host '  ====================================' -ForegroundColor Cyan
+Write-Host ''
+
+# Check Python
+$pythonExe = Get-Command python -ErrorAction SilentlyContinue
+if (-not $pythonExe) {
+    $pythonExe = Get-Command python3 -ErrorAction SilentlyContinue
+}
+if (-not $pythonExe) {
+    Write-Host 'ERROR: Python not found. Install Python 3.8+ and add to PATH.' -ForegroundColor Red
+    exit 1
+}
+Write-Host "  Python: $($pythonExe.Source)" -ForegroundColor Gray
+
+# Create install directory
+New-Item -ItemType Directory -Force -Path $BinDir | Out-Null
+
+# Copy files
+$srcDir = Join-Path $PSScriptRoot 'src'
+$files = @(
+    'translate-proxy.py',
+    'codex-launcher-gui.py',
+    'codex_launcher_lib.py',
+    'cleanup-codex-stale.py'
+)
+
+foreach ($file in $files) {
+    $src = Join-Path $srcDir $file
+    if (Test-Path $src) {
+        Copy-Item -Force $src $BinDir
+        Write-Host "  Installed: $file" -ForegroundColor Green
+    } else {
+        Write-Host "  WARNING: $file not found in src/" -ForegroundColor Yellow
+    }
+}
+
+# Create Start Menu shortcut
+$WshShell = New-Object -ComObject WScript.Shell
+$shortcutPath = Join-Path $StartMenu 'Codex Launcher.lnk'
+$Shortcut = $WshShell.CreateShortcut($shortcutPath)
+
+# Find pythonw.exe for no-console launch
+$pythonw = Get-Command pythonw -ErrorAction SilentlyContinue
+if (-not $pythonw) {
+    $pythonDir = Split-Path $pythonExe.Source
+    $pythonwCandidate = Join-Path $pythonDir 'pythonw.exe'
+    if (Test-Path $pythonwCandidate) {
+        $pythonw = $pythonwCandidate
+    }
+}
+
+if ($pythonw) {
+    $targetPath = if ($pythonw.Source) { $pythonw.Source } else { $pythonw }
+} else {
+    $targetPath = $pythonExe.Source
+}
+$Shortcut.TargetPath = $targetPath
+$guiPath = Join-Path $BinDir 'codex-launcher-gui.py'
+$Shortcut.Arguments = $guiPath
+$Shortcut.WorkingDirectory = $BinDir
+$Shortcut.Description = 'Launch Codex Desktop with any AI provider'
+$Shortcut.Save()
+Write-Host '  Created Start Menu shortcut' -ForegroundColor Green
+
+# Add to PATH
+$userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+if ($userPath -notlike "*$BinDir*") {
+    $newUserPath = if ($userPath) { "$userPath;$BinDir" } else { $BinDir }
+    [Environment]::SetEnvironmentVariable('PATH', $newUserPath, 'User')
+    $env:PATH = $env:PATH + ';' + $BinDir
+    Write-Host '  Added to user PATH' -ForegroundColor Green
+}
+
+# Verify
+Write-Host ''
+Write-Host '  Installation complete!' -ForegroundColor Cyan
+Write-Host "  Install dir: $BinDir" -ForegroundColor Gray
+Write-Host ''
+Write-Host '  Launch options:' -ForegroundColor White
+Write-Host '    Start Menu:  Codex Launcher' -ForegroundColor Gray
+Write-Host '    Command:     codex-launcher-gui.py' -ForegroundColor Gray
+Write-Host '    Uninstall:   powershell -File install.ps1 -Uninstall' -ForegroundColor Gray
+Write-Host ''
--- a/install.sh
+++ b/install.sh
@@ -3,13 +3,13 @@ set -e

 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

-if [ -f "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb" ]; then
-    echo "Installing codex-launcher_3.11.5_all.deb ..."
-    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb"
+if [ -f "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb" ]; then
+    echo "Installing codex-launcher_3.11.6_all.deb ..."
+    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb"
 else
-    echo "WARNING: codex-launcher_3.11.5_all.deb not found; copying files manually."
+    echo "WARNING: codex-launcher_3.11.6_all.deb not found; copying files manually."
 fi
-echo "Installed v3.11.5 via .deb package."
+echo "Installed v3.11.6 via .deb package."
    echo "  translate-proxy.py   -> /usr/bin/translate-proxy.py"
    echo "  codex-launcher-gui   -> /usr/bin/codex-launcher-gui"
    echo "  cleanup-codex-stale  -> /usr/bin/cleanup-codex-stale.sh"
--- a/src/codex-launcher-gui
+++ b/src/codex-launcher-gui
@@ -27,6 +27,12 @@ model_catalog_json = ""
 """

 CHANGELOG = [
+    ("3.11.6", "2026-05-26", [
+        "Antigravity loop breakers: per-session tracking, repeated tool detection",
+        "has_content fix: function_call counts as valid output",
+        "Latest user instruction appended once per request for Antigravity",
+        "Loop-breaker changes are Antigravity-only; other providers are unaffected",
+    ]),
    ("3.11.5", "2026-05-26", [
        "Token-aware compaction: fixes context_length_exceeded on small-context models",
        "Proactive compaction triggers on token count, not just item count",
@@ -2140,6 +2146,8 @@ class LauncherWin(Gtk.Window):
            self._relogin_btn.set_sensitive("cli" not in self._missing)
        elif status == "not_installed":
            self._auth_label.set_markup("<span foreground='#888'>Auth: N/A (CLI not installed)</span>")
+        elif status == "not_configured":
+            self._auth_label.set_markup("<span foreground='#d29922'>⚠ Config missing — launch once to create</span>")
        else:
            self._auth_label.set_markup(f"<span foreground='#d29922'>⚠ Auth: {msg}</span>")
            self._relogin_btn.set_sensitive("cli" not in self._missing)
--- a/src/codex_launcher_lib.py
+++ b/src/codex_launcher_lib.py
@@ -83,13 +83,21 @@ model_catalog_json = ""
 """

 CHANGELOG = [
+    ("3.11.6", "2026-05-26", [
+        "Antigravity loop breakers: per-session tracking, edit-intent nudge (first turn only)",
+        "Loop breaker: same tool+args repeated 5+ times triggers force finalization",
+        "Latest user instruction appended exactly once per request",
+        "Detailed [antigravity-loop] logging for all tracking fields",
+        "has_content fix: function_call now counts as valid output (no more infinite loops)",
+        "Loop-breaker changes are Antigravity-only; other providers are unaffected",
+    ]),
    ("3.11.5", "2026-05-26", [
-        "Token-aware compaction: fixes context_length_exceeded on small-context models (25 items × 1600 tokens)",
+        "Token-aware compaction: fixes context_length_exceeded on small-context models (25 items x 1600 tokens)",
        "Proactive compaction triggers on token count (>80% model limit), not just item count",
        "Universal adaptive compaction: removed crof.ai-only gates, all providers get compaction",
        "Vision model detection: strips images for non-vision models, keeps for vision-capable ones",
        "Per-model token limit learning from context_length_exceeded error messages",
-        "Compaction aggression levels: normal vs extreme when tokens > 1.5× model limit",
+        "Compaction aggression levels: normal vs extreme when tokens > 1.5x model limit",
        "Smart-continue text-tool detection: triggers on tool-call text patterns, not just function_call_output",
        "Active endpoint sync: GUI auto-removes stale endpoint references on startup",
    ]),
@@ -1713,6 +1721,10 @@ def check_codex_auth():
        return ("unknown", "No output from codex login status")
    except FileNotFoundError:
        return ("not_installed", "codex not found")
+    except OSError as e:
+        if e.errno == 2:
+            return ("not_configured", "Config not found — launch Codex once to create it")
+        return ("error", str(e))
    except Exception as e:
        return ("error", str(e))
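
A note on the handler added above: in Python 3, constructing `OSError` with errno 2 already yields a `FileNotFoundError`, so an `except OSError` clause placed after `except FileNotFoundError` will not normally see that errno. The changelog also describes the config-missing error as text printed by `codex login status`, not as a Python exception. The sketch below illustrates both points under those assumptions; `classify_login_status`, the `unparsed` status, and the returned message wording are hypothetical.

```python
import errno


def errno2_is_file_not_found():
    """Show that OSError(ENOENT, ...) is instantiated as FileNotFoundError."""
    err = OSError(errno.ENOENT, "No such file or directory")
    return isinstance(err, FileNotFoundError)


def classify_login_status(output):
    """Map text printed by `codex login status` to a (status, message) pair."""
    text = (output or "").strip()
    if not text:
        return ("unknown", "No output from codex login status")
    lowered = text.lower()
    # Match the message quoted in the changelog instead of relying on errno.
    if "error loading configuration" in lowered and "os error 2" in lowered:
        return ("not_configured", "Config not found; launch Codex once to create it")
    # Hypothetical fallthrough; the real function parses other states as well.
    return ("unparsed", text)
```

Checking the command output directly would keep the `not_configured` status reachable even when the subprocess itself starts successfully and only reports the missing file in its text.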

--- a/src/translate-proxy.py
+++ b/src/translate-proxy.py
@@ -157,7 +157,7 @@ Architecture:

 import json, http.server, socketserver, urllib.request, urllib.parse, urllib.error, re
 import time, uuid, os, sys, argparse, threading, socket, collections, contextlib, signal
-import secrets, string
+import secrets, string, hashlib
 import dataclasses
 import http.client
 import selectors
@@ -219,6 +219,9 @@ def load_config():
        "backend_type": ("PROXY_BACKEND", None, str),
        "target_url": ("PROXY_TARGET_URL", "ZAI_BASE_URL", str),
        "api_key": ("PROXY_API_KEY", "ZAI_API_KEY", str),
+        "vision_fallback_url": ("VISION_FALLBACK_URL", None, str),
+        "vision_fallback_model": ("VISION_FALLBACK_MODEL", None, str),
+        "vision_fallback_key": ("VISION_FALLBACK_KEY", None, str),
    }
    for ck, (ev1, ev2, conv) in env_map.items():
        if ck not in cfg:
@@ -260,6 +263,9 @@ PROMPT_ENHANCER_MODE = "offline"
 PROMPT_ENHANCER_MODEL = ""
 PROMPT_ENHANCER_URL = ""
 PROMPT_ENHANCER_KEY = ""
+VISION_FALLBACK_URL = ""
+VISION_FALLBACK_MODEL = ""
+VISION_FALLBACK_KEY = ""
 SERVER = None

 if _IS_WINDOWS:
@@ -855,6 +861,7 @@ def _init_runtime():
    global CONFIG, PORT, BACKEND, TARGET_URL, API_KEY, OAUTH_PROVIDER, _antigravity_version
    global MODELS, CC_VERSION, REASONING_ENABLED, REASONING_EFFORT, BGP_ROUTES
    global _api_key_pool, PROMPT_ENHANCER
+    global VISION_FALLBACK_URL, VISION_FALLBACK_MODEL, VISION_FALLBACK_KEY

    CONFIG = load_config()
    PORT = CONFIG["port"]
@@ -872,6 +879,9 @@ def _init_runtime():
    PROMPT_ENHANCER_MODEL = CONFIG.get("prompt_enhancer_model", "")
    PROMPT_ENHANCER_URL = CONFIG.get("prompt_enhancer_url", "")
    PROMPT_ENHANCER_KEY = CONFIG.get("prompt_enhancer_key", "")
+    VISION_FALLBACK_URL = CONFIG.get("vision_fallback_url") or "https://api.kilo.ai/api/gateway/chat/completions"
+    VISION_FALLBACK_MODEL = CONFIG.get("vision_fallback_model") or "kilo-auto/small"
+    VISION_FALLBACK_KEY = CONFIG.get("vision_fallback_key") or ""
    BGP_ROUTES = CONFIG.get("bgp_routes", [])
    _api_key_pool = None
    if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
@@ -2366,6 +2376,113 @@ def _mark_vision_fail(model):
        with _vision_fail_lock:
            _vision_fail_cache.add(model)

+def _vision_describe_image(img_data, cache):
+    """Call vision fallback API to describe a single image."""
+    if not VISION_FALLBACK_URL:
+        return None
+    if isinstance(img_data, dict):
+        img_url = img_data.get("url", "")
+        if not img_url:
+            inner = img_data.get("image_url", img_data)
+            img_url = inner.get("url", "") if isinstance(inner, dict) else str(inner)
+    else:
+        img_url = str(img_data)
+    if not img_url:
+        return None
+    img_hash = hashlib.md5(img_url.encode("utf-8", errors="replace")).hexdigest()
+    if img_hash in cache:
+        return cache[img_hash]
+    try:
+        payload = json.dumps({
+            "model": VISION_FALLBACK_MODEL,
+            "messages": [{"role": "user", "content": [
+                {"type": "text", "text": "Describe the content of this image in detail. If it contains text, transcribe it fully."},
+                {"type": "image_url", "image_url": {"url": img_url}},
+            ]}],
+            "max_tokens": 1024,
+            "stream": False,
+        }).encode()
+        headers = {"Content-Type": "application/json"}
+        if VISION_FALLBACK_KEY:
+            headers["Authorization"] = f"Bearer {VISION_FALLBACK_KEY}"
+        req = urllib.request.Request(VISION_FALLBACK_URL, data=payload, headers=headers)
+        resp = urllib.request.urlopen(req, timeout=30)
+        body = json.loads(resp.read().decode())
+        choices = body.get("choices", [])
+        if choices:
+            msg = choices[0].get("message", {})
+            desc = msg.get("content", "")
+            if desc:
+                cache[img_hash] = desc
+                return desc
+    except Exception as e:
+        print(f"[vision-fallback] error describing image: {e}", file=sys.stderr)
+    return None
+
+
+def _preprocess_vision(messages, schema):
+    """Replace image blocks with text descriptions when provider lacks vision support."""
+    if schema.supports_vision:
+        return messages
+    cache = {}
+    for msg in messages:
+        content = msg.get("content")
+        if not isinstance(content, list):
+            continue
+        new_parts = []
+        changed = False
+        for part in content:
+            if isinstance(part, dict) and part.get("type") in ("image_url", "input_image"):
+                changed = True
+                img_data = part.get("image_url", part)
+                description = _vision_describe_image(img_data, cache)
+                if description:
+                    new_parts.append({"type": "text", "text": f"[Image: {description}]"})
+                else:
+                    new_parts.append({"type": "text", "text": "[Image: description unavailable - text-only model]"})
+            else:
+                new_parts.append(part)
+        if changed:
+            msg["content"] = new_parts
+    return messages
+
+
+def _preprocess_vision_input(input_data, schema):
+    """Replace input_image blocks in Responses API input format with text descriptions."""
+    if schema.supports_vision:
+        return input_data
+    if not isinstance(input_data, list):
+        return input_data
+    cache = {}
+    changed_any = False
+    for item in input_data:
+        if not isinstance(item, dict) or item.get("type", "message") != "message":
+            continue
+        content = item.get("content")
+        if not isinstance(content, list):
+            continue
+        new_parts = []
+        changed = False
+        for part in content:
+            if isinstance(part, dict) and part.get("type") in ("input_image", "image_url"):
+                changed = True
+                # image_url may be a plain URL string (Responses API) or a {"url": ...} dict.
+                img_ref = part.get("image_url") or part.get("url") or ""
+                if isinstance(img_ref, dict):
+                    img_ref = img_ref.get("url", "")
+                img_url = str(img_ref or "")
+                desc = _vision_describe_image({"url": img_url}, cache)
+                if desc:
+                    new_parts.append({"type": "input_text", "text": f"[Image: {desc}]"})
+                else:
+                    new_parts.append({"type": "input_text", "text": "[Image: description unavailable - text-only model]"})
+            else:
+                new_parts.append(part)
+        if changed:
+            item["content"] = new_parts
+            changed_any = True
+    return input_data
+
 def _strip_images_from_input(input_data, model):
    if not isinstance(input_data, list) or _model_supports_vision(model):
        return input_data
@@ -4014,6 +4131,7 @@ class ProviderSchema:
    })
    response_format: str = "auto"  # "sse" | "raw_json" | "ndjson" | "auto"
    stream_format: str = "auto"  # "sse_data" | "sse_event" | "raw_lines" | "json_lines"
+    supports_vision: bool = True

    def hints(self) -> dict:
        """Return a dict for storing in provider-caps.json."""
@@ -4023,7 +4141,10 @@ class ProviderSchema:
                continue
            if isinstance(v, dict) and not v:
                continue
-            if v is False:
+            if k == "supports_vision":
+                if v is not False:
+                    continue
+            elif v is False:
                continue
            if v == "":
                continue
@@ -4193,6 +4314,15 @@ class ErrorAnalyzer:
        elif re.search(r"tool-call|tool_call.*format", err):
            hints["tool_decl_format"] = "command_code"

+        # ── Response/Stream format hints from content-type or error ──
+        # ── Vision support detection ──
+        if re.search(r"unknown variant\b.*image_url", err) or \
+           re.search(r"unexpected.*image_url", err) or \
+           re.search(r"does not support.*image", err) or \
+           re.search(r"image.*not.*support", err) or \
+           re.search(r"unsupported.*content.*type.*image", err):
+            hints["supports_vision"] = False
+
        # ── Response/Stream format hints from content-type or error ──
        if re.search(r"content.type.*text/event.stream", err) or \
           re.search(r"stream.*sse|sse.*expected", err):
@@ -4253,6 +4383,7 @@ def _load_schema(target_url=None, backend=None, model=None):
        })),
        response_format=data.get("response_format", "auto"),
        stream_format=data.get("stream_format", "auto"),
+        supports_vision=data.get("supports_vision", True),
    )


@@ -5053,6 +5184,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
        body["input"] = input_data

        messages = oa_input_to_messages(input_data)
+        _schema = _load_schema(model=model)
+        if _schema and not _schema.supports_vision:
+            messages = _preprocess_vision(messages, _schema)
        messages = _inject_stored_reasoning(messages)
        instructions = body.get("instructions", "").strip()
        if instructions:
@@ -5082,6 +5216,18 @@ class Handler(http.server.BaseHTTPRequestHandler):
                    upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
                except urllib.error.HTTPError as e:
                    err_body = e.read().decode()
+                    if re.search(r"unknown variant\b.*image_url", err_body.lower()) or \
+                       re.search(r"unexpected.*image_url", err_body.lower()) or \
+                       re.search(r"does not support.*image", err_body.lower()):
+                        _schema = _load_schema(model=model)
+                        if _schema:
+                            _schema.supports_vision = False
+                        if attempt < max_retries:
+                            print(f"[{self._session_id}] vision not supported, retrying with image preprocessing", file=sys.stderr)
+                            messages = _preprocess_vision(messages, _schema) if _schema else messages
+                            chat_body = self._build_chat_body(model, messages, body, stream)
+                            chat_body_b = json.dumps(chat_body).encode()
+                            continue
                    if "context_length_exceeded" in err_body and attempt < max_retries:
                        import re as _re
                        _tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
@@ -6869,7 +7015,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
        prev_content_type = None  # for oscillation detection
        for attempt in range(max_retries + 1):
            adapter = SchemaAdapter(schema)
-            messages = adapter.convert(input_data, instructions)
+            processed_input = _preprocess_vision_input(input_data, schema) if not schema.supports_vision else input_data
+            messages = adapter.convert(processed_input, instructions)
            use_cc_wrap = schema.cc_body_wrap or is_cc

            # Build auth header from schema
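
The vision-rejection patterns appear twice in this commit: five in `ErrorAnalyzer` and three in the retry path of the handler. One way to keep the two in step is a shared helper. The sketch below reuses the five regular expressions from the diff. The names `_VISION_ERROR_PATTERNS` and `is_vision_error` are hypothetical, and lowercasing the error text before matching is an assumption carried over from the retry path.

```python
import re

# Patterns copied from the ErrorAnalyzer hunk in this commit.
_VISION_ERROR_PATTERNS = [re.compile(p) for p in (
    r"unknown variant\b.*image_url",
    r"unexpected.*image_url",
    r"does not support.*image",
    r"image.*not.*support",
    r"unsupported.*content.*type.*image",
)]


def is_vision_error(err_body):
    """Return True when an upstream error body indicates images were rejected."""
    text = (err_body or "").lower()
    return any(p.search(text) for p in _VISION_ERROR_PATTERNS)
```

With a helper like this, the retry path and the capability detection would classify the same error bodies identically, and the body would be lowercased once per check.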