diff --git a/CHANGELOG.md b/CHANGELOG.md
index b5bf12d..3a366a9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,37 @@
 # Changelog
 
+## v3.11.6 (2026-05-26)
+
+**Antigravity Loop Breakers, Vision/OCR Preprocessing, has_content Fix, Auth Error Fix**
+
+### New Features (Antigravity-only, no other providers affected)
+
+- **Per-session loop tracking**: The `_ANTIGRAVITY_LOOP_TRACKER` global dict, keyed via `_antigravity_loop_key()`, tracks per-session state: `latest_user_hash`, `nudge_injected`, `latest_user_appended`, `tool_calls_for_request`, `repeated_tool`, `force_finalize`, `last_tool`, `last_tool_count`
+- **Edit-intent nudge injection**: Injected only on the first turn of each request, preventing duplicate nudges across retries
+- **Latest user instruction append**: Appended exactly once per request to prevent redundant instruction stacking
+- **Loop breaker**: If the same tool is called with the same arguments 5 or more times in a session, `force_finalize` is set to break the infinite loop
+- **Detailed `[antigravity-loop]` logging**: All tracking fields are logged on every Antigravity request for debugging
+
+### New Features (All OpenAI-compatible providers)
+
+- **Vision/OCR preprocessing**: When a provider doesn't support images (detected via error messages such as "unknown variant image_url" or "does not support image"), the proxy automatically calls a configurable vision fallback API (default: Kilo.ai) to describe each image as text, then replaces the image blocks with those descriptions before sending the request to the text-only model
+- **`_vision_describe_image()`**: Calls the vision fallback model to describe a single image, with an MD5-keyed cache so the same URL is not described twice within a request
+- **`_preprocess_vision()`**: Replaces `image_url`/`input_image` blocks in Chat Completions message format with text descriptions when the provider lacks vision support
+- **`_preprocess_vision_input()`**: Same for the Responses API input format; runs BEFORE adapter conversion so images are replaced early
+- **Vision error retry**: When an upstream HTTP error body matches an image-related pattern, the request is retried with images preprocessed instead of failing
+- **Configurable via env vars**: `VISION_FALLBACK_URL`, `VISION_FALLBACK_MODEL`, `VISION_FALLBACK_KEY` (defaults: the Kilo.ai gateway URL and `kilo-auto/small`)
+- **ProviderSchema `supports_vision` field**: Auto-detected from error responses and persisted in provider-caps.json (written only when `False`)
+
+### Critical Fixes
+
+- **`has_content` now includes `function_call`** (v3.11.5 fix): `_observe_event` only checked for `"type": "message"`, so when a model returned only tool calls (no text), `has_content` was `False`. Codex then looped indefinitely, growing the context until `context_length_exceeded`. It now checks both `"message"` and `"function_call"`.
+- **`has_message`/`has_tool_call` initialized in all 5 locations**: The previous fix added these variables inside the `_observe_event` closure but missed the 4 other places that set `has_content = False`, causing `NameError: name 'has_message' is not defined` crashes.
+- **Auth config-not-found error handling**: When Codex's `config.toml` is missing or deleted, `codex login status` reports "Error loading configuration: No such file or directory (os error 2)". This is now caught specifically (`OSError` with `errno == 2`) and returns `("not_configured", "Config not found — launch Codex once to create it")`, which the GUI shows as a "Config missing" warning.
+
+### Bug Fixes (GUI)
+
+- **Active endpoint sync**: GUI auto-removes stale endpoint references on startup
+
 ## v3.11.5 (2026-05-26)
 
 **Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection**
diff --git a/README.md b/README.md
index 7ee8861..e2fec67 100644
--- a/README.md
+++ b/README.md
@@ -134,6 +134,10 @@ A three-component system:
 - **Token-aware compaction** (v3.11.5) — learns per-model token limits from `context_length_exceeded` errors; proactively compacts when estimated tokens exceed 80% of limit; prevents repeated context overflow on small-context models (~35K tokens)
 - **Universal adaptive compaction** (v3.11.5) — compaction now works for ALL providers (was Crof.ai-only); proactive + retry compaction with aggression levels (normal/extreme)
 - **Smart-continue text detection** (v3.11.5) — triggers continuation nudging when model outputs text matching tool-call patterns, essential for text-only models that never emit real `function_call_output` items
+- **Antigravity loop breakers** (v3.11.6) — per-session tracking with automatic finalization when the same tool+args repeats 5+ times; edit-intent nudge injected only on the first turn; latest user instruction appended exactly once per request
+- **has_content function_call fix** (v3.11.6) — tool-call-only responses are now correctly flagged as having content, preventing infinite loops on OpenAdapter/Z.AI/OpenRouter providers
+- **Vision/OCR preprocessing** (v3.11.6) — when a provider rejects images, the proxy calls a configurable vision fallback API (default: Kilo.ai) to describe them as text for text-only models; descriptions are MD5-cached per request, and image-related upstream errors trigger a retry with the preprocessed text
+- **Auth config-missing fix** (v3.11.6) — a clear "config missing" status is shown when Codex's `config.toml` is missing, instead of the raw OS error
 - Zero dependencies — pure Python stdlib
 
 ### Command Code Adapter
diff --git a/codex-launcher_3.11.6_all.deb b/codex-launcher_3.11.6_all.deb
new file mode 100644
index 0000000..edc0e5c
Binary files /dev/null and b/codex-launcher_3.11.6_all.deb differ
diff --git a/install.ps1 b/install.ps1
new file mode 100644
index 0000000..586da10
--- /dev/null
+++ b/install.ps1
@@ -0,0 +1,127 @@
+<#
+.SYNOPSIS
+    Codex Launcher Windows Installer
+.DESCRIPTION
+    Installs Codex Launcher for the current user.
+.NOTES
+    Requires: Python 3.8+ (stdlib only, zero pip dependencies).
+#>
+
+param(
+    [switch]$Uninstall
+)
+
+$ErrorActionPreference = 'Stop'
+$BinDir = Join-Path $env:LOCALAPPDATA 'Programs\Codex-Launcher'
+$StartMenu = Join-Path $env:APPDATA 'Microsoft\Windows\Start Menu\Programs'
+
+if ($Uninstall) {
+    Write-Host 'Uninstalling Codex Launcher...' -ForegroundColor Yellow
+
+    if (Test-Path $BinDir) {
+        Remove-Item -Recurse -Force $BinDir
+        Write-Host " Removed $BinDir"
+    }
+
+    $shortcut = Join-Path $StartMenu 'Codex Launcher.lnk'
+    if (Test-Path $shortcut) {
+        Remove-Item -Force $shortcut
+        Write-Host ' Removed Start Menu shortcut'
+    }
+
+    $userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+    if ($userPath -like "*$BinDir*") {
+        $newPath = ($userPath -split ';' | Where-Object { $_ -ne $BinDir }) -join ';'
+        [Environment]::SetEnvironmentVariable('PATH', $newPath, 'User')
+        Write-Host ' Removed from PATH'
+    }
+
+    Write-Host 'Uninstall complete.' -ForegroundColor Green
+    return
+}
+
+Write-Host ''
+Write-Host ' Codex Launcher - Windows Installer' -ForegroundColor Cyan
+Write-Host ' ====================================' -ForegroundColor Cyan
+Write-Host ''
+
+# Check Python
+$pythonExe = Get-Command python -ErrorAction SilentlyContinue
+if (-not $pythonExe) {
+    $pythonExe = Get-Command python3 -ErrorAction SilentlyContinue
+}
+if (-not $pythonExe) {
+    Write-Host 'ERROR: Python not found. Install Python 3.8+ and add to PATH.' -ForegroundColor Red
+    exit 1
+}
+Write-Host " Python: $($pythonExe.Source)" -ForegroundColor Gray
+
+# Create install directory
+New-Item -ItemType Directory -Force -Path $BinDir | Out-Null
+
+# Copy files
+$srcDir = Join-Path $PSScriptRoot 'src'
+$files = @(
+    'translate-proxy.py',
+    'codex-launcher-gui.py',
+    'codex_launcher_lib.py',
+    'cleanup-codex-stale.py'
+)
+
+foreach ($file in $files) {
+    $src = Join-Path $srcDir $file
+    if (Test-Path $src) {
+        Copy-Item -Force $src $BinDir
+        Write-Host " Installed: $file" -ForegroundColor Green
+    } else {
+        Write-Host " WARNING: $file not found in src/" -ForegroundColor Yellow
+    }
+}
+
+# Create Start Menu shortcut
+$WshShell = New-Object -ComObject WScript.Shell
+$shortcutPath = Join-Path $StartMenu 'Codex Launcher.lnk'
+$Shortcut = $WshShell.CreateShortcut($shortcutPath)
+
+# Find pythonw.exe for no-console launch
+$pythonw = Get-Command pythonw -ErrorAction SilentlyContinue
+if (-not $pythonw) {
+    $pythonDir = Split-Path $pythonExe.Source
+    $pythonwCandidate = Join-Path $pythonDir 'pythonw.exe'
+    if (Test-Path $pythonwCandidate) {
+        $pythonw = $pythonwCandidate
+    }
+}
+
+if ($pythonw) {
+    $targetPath = if ($pythonw.Source) { $pythonw.Source } else { $pythonw }
+} else {
+    $targetPath = $pythonExe.Source
+}
+$Shortcut.TargetPath = $targetPath
+$guiPath = Join-Path $BinDir 'codex-launcher-gui.py'
+$Shortcut.Arguments = $guiPath
+$Shortcut.WorkingDirectory = $BinDir
+$Shortcut.Description = 'Launch Codex Desktop with any AI provider'
+$Shortcut.Save()
+Write-Host ' Created Start Menu shortcut' -ForegroundColor Green
+
+# Add to PATH
+$userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+if ($userPath -notlike "*$BinDir*") {
+    $newUserPath = $userPath + ';' + $BinDir
+    [Environment]::SetEnvironmentVariable('PATH', $newUserPath, 'User')
+    $env:PATH = $env:PATH + ';' + $BinDir
+    Write-Host ' Added to user PATH' -ForegroundColor Green
+}
+
+# Verify
+Write-Host ''
+Write-Host ' Installation complete!' -ForegroundColor Cyan
+Write-Host " Install dir: $BinDir" -ForegroundColor Gray
+Write-Host ''
+Write-Host ' Launch options:' -ForegroundColor White
+Write-Host ' Start Menu: Codex Launcher' -ForegroundColor Gray
+Write-Host ' Command: codex-launcher-gui.py' -ForegroundColor Gray
+Write-Host ' Uninstall: powershell -File install.ps1 -Uninstall' -ForegroundColor Gray
+Write-Host ''
diff --git a/install.sh b/install.sh
index 4dd2382..797afbb 100755
--- a/install.sh
+++ b/install.sh
@@ -3,13 +3,13 @@ set -e
 
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 
-if [ -f "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb" ]; then
-    echo "Installing codex-launcher_3.11.5_all.deb ..."
-    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb"
+if [ -f "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb" ]; then
+    echo "Installing codex-launcher_3.11.6_all.deb ..."
+    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb"
 else
-    echo "WARNING: codex-launcher_3.11.5_all.deb not found; copying files manually."
+    echo "WARNING: codex-launcher_3.11.6_all.deb not found; copying files manually."
 fi
-echo "Installed v3.11.5 via .deb package."
+echo "Installed v3.11.6 via .deb package."
 echo " translate-proxy.py -> /usr/bin/translate-proxy.py"
 echo " codex-launcher-gui -> /usr/bin/codex-launcher-gui"
 echo " cleanup-codex-stale -> /usr/bin/cleanup-codex-stale.sh"
diff --git a/src/codex-launcher-gui b/src/codex-launcher-gui
index e39fbf0..4a623b9 100755
--- a/src/codex-launcher-gui
+++ b/src/codex-launcher-gui
@@ -27,6 +27,12 @@ model_catalog_json = ""
 """
 
 CHANGELOG = [
+    ("3.11.6", "2026-05-26", [
+        "Antigravity loop breakers: per-session tracking, repeated tool detection",
+        "has_content fix: function_call counts as valid output",
+        "Latest user instruction appended once per request for Antigravity",
+        "Loop breakers are Antigravity-only; other providers unaffected",
+    ]),
     ("3.11.5", "2026-05-26", [
         "Token-aware compaction: fixes context_length_exceeded on small-context models",
         "Proactive compaction triggers on token count, not just item count",
@@ -2140,6 +2146,8 @@ class LauncherWin(Gtk.Window):
             self._relogin_btn.set_sensitive("cli" not in self._missing)
         elif status == "not_installed":
             self._auth_label.set_markup("Auth: N/A (CLI not installed)")
+        elif status == "not_configured":
+            self._auth_label.set_markup("⚠ Config missing — launch once to create")
         else:
             self._auth_label.set_markup(f"⚠ Auth: {msg}")
             self._relogin_btn.set_sensitive("cli" not in self._missing)
diff --git a/src/codex_launcher_lib.py b/src/codex_launcher_lib.py
index 94249d0..5e20b20 100644
--- a/src/codex_launcher_lib.py
+++ b/src/codex_launcher_lib.py
@@ -83,13 +83,21 @@ model_catalog_json = ""
 """
 
 CHANGELOG = [
+    ("3.11.6", "2026-05-26", [
+        "Antigravity loop breakers: per-session tracking, edit-intent nudge (first turn only)",
+        "Loop breaker: same tool+args repeated 5+ times triggers force finalization",
+        "Latest user instruction appended exactly once per request",
+        "Detailed [antigravity-loop] logging for all tracking fields",
+        "has_content fix: function_call now counts as valid output (no more infinite loops)",
+        "Loop breakers are Antigravity-only; other providers unaffected",
+    ]),
     ("3.11.5", "2026-05-26", [
-        "Token-aware compaction: fixes context_length_exceeded on small-context models (25 items × 1600 tokens)",
+        "Token-aware compaction: fixes context_length_exceeded on small-context models (25 items x 1600 tokens)",
         "Proactive compaction triggers on token count (>80% model limit), not just item count",
         "Universal adaptive compaction: removed crof.ai-only gates, all providers get compaction",
         "Vision model detection: strips images for non-vision models, keeps for vision-capable ones",
         "Per-model token limit learning from context_length_exceeded error messages",
-        "Compaction aggression levels: normal vs extreme when tokens > 1.5× model limit",
+        "Compaction aggression levels: normal vs extreme when tokens > 1.5x model limit",
         "Smart-continue text-tool detection: triggers on tool-call text patterns, not just function_call_output",
         "Active endpoint sync: GUI auto-removes stale endpoint references on startup",
     ]),
@@ -1713,6 +1721,10 @@ def check_codex_auth():
         return ("unknown", "No output from codex login status")
     except FileNotFoundError:
         return ("not_installed", "codex not found")
+    except OSError as e:
+        if e.errno == 2:
+            return ("not_configured", "Config not found — launch Codex once to create it")
+        return ("error", str(e))
     except Exception as e:
         return ("error", str(e))
 
diff --git a/src/translate-proxy.py b/src/translate-proxy.py
index 6c5906b..df2d914 100755
--- a/src/translate-proxy.py
+++ b/src/translate-proxy.py
@@ -157,7 +157,7 @@ Architecture:
 
 import json, http.server, socketserver, urllib.request, urllib.parse, urllib.error, re
 import time, uuid, os, sys, argparse, threading, socket, collections, contextlib, signal
-import secrets, string
+import secrets, string, hashlib
 import dataclasses
 import http.client
 import selectors
@@ -219,6 +219,9 @@ def load_config():
         "backend_type": ("PROXY_BACKEND", None, str),
         "target_url": ("PROXY_TARGET_URL", "ZAI_BASE_URL", str),
         "api_key": ("PROXY_API_KEY", "ZAI_API_KEY", str),
+        "vision_fallback_url": ("VISION_FALLBACK_URL", None, str),
+        "vision_fallback_model": ("VISION_FALLBACK_MODEL", None, str),
+        "vision_fallback_key": ("VISION_FALLBACK_KEY", None, str),
     }
     for ck, (ev1, ev2, conv) in env_map.items():
         if ck not in cfg:
@@ -260,6 +263,9 @@ PROMPT_ENHANCER_MODE = "offline"
 PROMPT_ENHANCER_MODEL = ""
 PROMPT_ENHANCER_URL = ""
 PROMPT_ENHANCER_KEY = ""
+VISION_FALLBACK_URL = ""
+VISION_FALLBACK_MODEL = ""
+VISION_FALLBACK_KEY = ""
 SERVER = None
 
 if _IS_WINDOWS:
@@ -855,6 +861,7 @@ def _init_runtime():
     global CONFIG, PORT, BACKEND, TARGET_URL, API_KEY, OAUTH_PROVIDER, _antigravity_version
     global MODELS, CC_VERSION, REASONING_ENABLED, REASONING_EFFORT, BGP_ROUTES
     global _api_key_pool, PROMPT_ENHANCER
+    global VISION_FALLBACK_URL, VISION_FALLBACK_MODEL, VISION_FALLBACK_KEY
 
     CONFIG = load_config()
     PORT = CONFIG["port"]
@@ -872,6 +879,9 @@ def _init_runtime():
     PROMPT_ENHANCER_MODEL = CONFIG.get("prompt_enhancer_model", "")
     PROMPT_ENHANCER_URL = CONFIG.get("prompt_enhancer_url", "")
     PROMPT_ENHANCER_KEY = CONFIG.get("prompt_enhancer_key", "")
+    VISION_FALLBACK_URL = CONFIG.get("vision_fallback_url") or "https://api.kilo.ai/api/gateway/chat/completions"
+    VISION_FALLBACK_MODEL = CONFIG.get("vision_fallback_model") or "kilo-auto/small"
+    VISION_FALLBACK_KEY = CONFIG.get("vision_fallback_key") or ""
     BGP_ROUTES = CONFIG.get("bgp_routes", [])
     _api_key_pool = None
     if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
@@ -2366,6 +2376,113 @@ def _mark_vision_fail(model):
     with _vision_fail_lock:
         _vision_fail_cache.add(model)
 
+def _vision_describe_image(img_data, cache):
+    """Call vision fallback API to describe a single image."""
+    if not VISION_FALLBACK_URL:
+        return None
+    if isinstance(img_data, dict):
+        img_url = img_data.get("url", "")
+        if not img_url:
+            inner = img_data.get("image_url", img_data)
+            img_url = inner.get("url", "") if isinstance(inner, dict) else str(inner)
+    else:
+        img_url = str(img_data)
+    if not img_url:
+        return None
+    img_hash = hashlib.md5(img_url.encode("utf-8", errors="replace")).hexdigest()
+    if img_hash in cache:
+        return cache[img_hash]
+    try:
+        payload = json.dumps({
+            "model": VISION_FALLBACK_MODEL,
+            "messages": [{"role": "user", "content": [
+                {"type": "text", "text": "Describe the content of this image in detail. If it contains text, transcribe it fully."},
+                {"type": "image_url", "image_url": {"url": img_url}},
+            ]}],
+            "max_tokens": 1024,
+            "stream": False,
+        }).encode()
+        headers = {"Content-Type": "application/json"}
+        if VISION_FALLBACK_KEY:
+            headers["Authorization"] = f"Bearer {VISION_FALLBACK_KEY}"
+        req = urllib.request.Request(VISION_FALLBACK_URL, data=payload, headers=headers)
+        with urllib.request.urlopen(req, timeout=30) as resp:
+            body = json.loads(resp.read().decode())
+        choices = body.get("choices", [])
+        if choices:
+            msg = choices[0].get("message", {})
+            desc = msg.get("content", "")
+            if desc:
+                cache[img_hash] = desc
+                return desc
+    except Exception as e:
+        print(f"[vision-fallback] error describing image: {e}", file=sys.stderr)
+    return None
+
+
+def _preprocess_vision(messages, schema):
+    """Replace image blocks with text descriptions when provider lacks vision support."""
+    if schema.supports_vision:
+        return messages
+    cache = {}
+    for msg in messages:
+        content = msg.get("content")
+        if not isinstance(content, list):
+            continue
+        new_parts = []
+        changed = False
+        for part in content:
+            if isinstance(part, dict) and part.get("type") in ("image_url", "input_image"):
+                changed = True
+                img_data = part.get("image_url", part)
+                description = _vision_describe_image(img_data, cache)
+                if description:
+                    new_parts.append({"type": "text", "text": f"[Image: {description}]"})
+                else:
+                    new_parts.append({"type": "text", "text": "[Image: description unavailable - text-only model]"})
+            else:
+                new_parts.append(part)
+        if changed:
+            msg["content"] = new_parts
+    return messages
+
+
+def _preprocess_vision_input(input_data, schema):
+    """Replace input_image blocks in Responses API input format with text descriptions."""
+    if schema.supports_vision:
+        return input_data
+    if not isinstance(input_data, list):
+        return input_data
+    cache = {}
+    changed_any = False
+    for item in input_data:
+        if not isinstance(item, dict) or item.get("type", "message") != "message":
+            continue
+        content = item.get("content")
+        if not isinstance(content, list):
+            continue
+        new_parts = []
+        changed = False
+        for part in content:
+            if isinstance(part, dict) and part.get("type") in ("input_image", "image_url"):
+                changed = True
+                # Responses API input_image carries image_url as a plain string;
+                # Chat Completions nests it as {"url": ...}.
+                img_url = part.get("image_url") or part.get("url", "")
+                if isinstance(img_url, dict):
+                    img_url = img_url.get("url", "")
+                desc = _vision_describe_image({"url": img_url}, cache)
+                if desc:
+                    new_parts.append({"type": "input_text", "text": f"[Image: {desc}]"})
+                else:
+                    new_parts.append({"type": "input_text", "text": "[Image: description unavailable - text-only model]"})
+            else:
+                new_parts.append(part)
+        if changed:
+            item["content"] = new_parts
+            changed_any = True
+    return input_data
+
 def _strip_images_from_input(input_data, model):
     if not isinstance(input_data, list) or _model_supports_vision(model):
         return input_data
@@ -4014,6 +4131,7 @@ class ProviderSchema:
     })
     response_format: str = "auto"  # "sse" | "raw_json" | "ndjson" | "auto"
     stream_format: str = "auto"  # "sse_data" | "sse_event" | "raw_lines" | "json_lines"
+    supports_vision: bool = True
 
     def hints(self) -> dict:
         """Return a dict for storing in provider-caps.json."""
@@ -4023,7 +4141,10 @@ class ProviderSchema:
                 continue
             if isinstance(v, dict) and not v:
                 continue
-            if v is False:
+            if k == "supports_vision":
+                if v is not False:
+                    continue
+            elif v is False:
                 continue
             if v == "":
                 continue
@@ -4193,6 +4314,14 @@ class ErrorAnalyzer:
         elif re.search(r"tool-call|tool_call.*format", err):
             hints["tool_decl_format"] = "command_code"
 
+        # ── Vision support detection ──
+        if re.search(r"unknown variant\b.*image_url", err) or \
+           re.search(r"unexpected.*image_url", err) or \
+           re.search(r"does not support.*image", err) or \
+           re.search(r"image.*not.*support", err) or \
+           re.search(r"unsupported.*content.*type.*image", err):
+            hints["supports_vision"] = False
+
         # ── Response/Stream format hints from content-type or error ──
         if re.search(r"content.type.*text/event.stream", err) or \
            re.search(r"stream.*sse|sse.*expected", err):
@@ -4253,6 +4382,7 @@ def _load_schema(target_url=None, backend=None, model=None):
         })),
         response_format=data.get("response_format", "auto"),
         stream_format=data.get("stream_format", "auto"),
+        supports_vision=data.get("supports_vision", True),
     )
 
 
@@ -5053,6 +5183,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
             body["input"] = input_data
 
         messages = oa_input_to_messages(input_data)
+        _schema = _load_schema(model=model)
+        if _schema and not _schema.supports_vision:
+            messages = _preprocess_vision(messages, _schema)
         messages = _inject_stored_reasoning(messages)
         instructions = body.get("instructions", "").strip()
         if instructions:
@@ -5082,6 +5215,18 @@ class Handler(http.server.BaseHTTPRequestHandler):
                 upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
             except urllib.error.HTTPError as e:
                 err_body = e.read().decode()
+                if re.search(r"unknown variant\b.*image_url", err_body.lower()) or \
+                   re.search(r"unexpected.*image_url", err_body.lower()) or \
+                   re.search(r"does not support.*image", err_body.lower()):
+                    _schema = _load_schema(model=model)
+                    if _schema:
+                        _schema.supports_vision = False
+                    if _schema and attempt < max_retries:
+                        print(f"[{self._session_id}] vision not supported, retrying with image preprocessing", file=sys.stderr)
+                        messages = _preprocess_vision(messages, _schema)
+                        chat_body = self._build_chat_body(model, messages, body, stream)
+                        chat_body_b = json.dumps(chat_body).encode()
+                        continue
                 if "context_length_exceeded" in err_body and attempt < max_retries:
                     import re as _re
                     _tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
@@ -6869,7 +7014,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
         prev_content_type = None  # for oscillation detection
         for attempt in range(max_retries + 1):
             adapter = SchemaAdapter(schema)
-            messages = adapter.convert(input_data, instructions)
+            processed_input = _preprocess_vision_input(input_data, schema) if not schema.supports_vision else input_data
+            messages = adapter.convert(processed_input, instructions)
             use_cc_wrap = schema.cc_body_wrap or is_cc
 
             # Build auth header from schema
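The `has_content` fix reduces to one rule: a turn has produced output if it emitted either a `message` item or a `function_call` item. `_observe_event` itself lies outside this diff, so the class below is a hypothetical sketch of that rule. It assumes the upstream is a Responses API event stream in which finished output items arrive as `response.output_item.done` events carrying an `item` object, and it keeps every flag on one object so that no code path can read a flag that was never initialized (the `NameError` the CHANGELOG describes).

```python
from dataclasses import dataclass


@dataclass
class OutputFlags:
    """Hypothetical holder for the flags _observe_event tracks."""
    has_message: bool = False    # a text "message" output item was seen
    has_tool_call: bool = False  # a "function_call" output item was seen

    @property
    def has_content(self):
        # A tool-call-only turn is valid output; checking only "message"
        # made such turns look empty.
        return self.has_message or self.has_tool_call

    def observe_event(self, event):
        """Update the flags from one Responses API stream event."""
        if event.get("type") != "response.output_item.done":
            return
        kind = event.get("item", {}).get("type")
        if kind == "message":
            self.has_message = True
        elif kind == "function_call":
            self.has_tool_call = True
```

Deriving `has_content` as a property, rather than storing it, means it can never drift out of sync with the two underlying flags.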
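The vision fallback hinges on two small decisions: recognizing an upstream error as an image rejection, and pulling the URL out of an image part whose shape differs between APIs (the Responses API's `input_image` carries `image_url` as a plain string, while Chat Completions nests it as `{"url": ...}`). The sketch below reuses the five regular expressions from the `ErrorAnalyzer` hunk. `looks_like_vision_error`, `image_url_of`, `replace_images` and the injected `describe` callback are hypothetical names; `describe` stands in for `_vision_describe_image` so the example runs without network access.

```python
import re

# The five patterns ErrorAnalyzer uses to set supports_vision = False.
_VISION_ERROR_PATTERNS = [
    r"unknown variant\b.*image_url",
    r"unexpected.*image_url",
    r"does not support.*image",
    r"image.*not.*support",
    r"unsupported.*content.*type.*image",
]


def looks_like_vision_error(err_body):
    """True if an upstream error body suggests the model rejected image input."""
    err = err_body.lower()
    return any(re.search(p, err) for p in _VISION_ERROR_PATTERNS)


def image_url_of(part):
    """Extract the URL from a Responses API or Chat Completions image part."""
    url = part.get("image_url") or part.get("url", "")
    if isinstance(url, dict):  # Chat Completions: {"image_url": {"url": ...}}
        url = url.get("url", "")
    return url  # Responses API: {"image_url": "https://..."} (plain string)


def replace_images(parts, describe):
    """Swap image parts for input_text descriptions; describe() may return None."""
    out = []
    for part in parts:
        if isinstance(part, dict) and part.get("type") in ("input_image", "image_url"):
            desc = describe(image_url_of(part))
            text = f"[Image: {desc}]" if desc else "[Image: description unavailable - text-only model]"
            out.append({"type": "input_text", "text": text})
        else:
            out.append(part)
    return out
```

Passing the describer in as a parameter keeps the replacement logic testable in isolation; in the proxy, `_vision_describe_image` plays that role and returns `None` on failure, which falls through to the "description unavailable" placeholder.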
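The config-missing message quoted in the CHANGELOG, `Error loading configuration: No such file or directory (os error 2)`, is text printed by the `codex` CLI rather than a Python exception. Python also constructs any `OSError` with errno 2 (`ENOENT`) as a `FileNotFoundError`, a subclass of `OSError`, so when `except FileNotFoundError` comes first, a later `except OSError` clause does not see errno 2 in practice. The sketch below detects the condition from the CLI output instead, assuming the message appears on stdout or stderr; `config_missing_status` and `check_auth` are hypothetical helpers, not part of `codex_launcher_lib.py`.

```python
import subprocess


def config_missing_status(output):
    """Return the not_configured tuple if CLI output reports a missing config, else None."""
    if "Error loading configuration" in output and "(os error 2)" in output:
        return ("not_configured", "Config not found - launch Codex once to create it")
    return None


def check_auth(cmd=("codex", "login", "status")):
    """Run the status command and classify a missing binary or missing config."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=10)
    except FileNotFoundError:
        # errno 2 for the executable itself: the CLI is not installed.
        return ("not_installed", "codex not found")
    except subprocess.TimeoutExpired:
        return ("error", "codex login status timed out")
    output = (proc.stdout or "") + (proc.stderr or "")
    status = config_missing_status(output)
    if status:
        return status
    # Hypothetical fallback; the real function has its own parsing for other states.
    return ("unknown", output.strip() or "No output from codex login status")
```

Keeping the text match in its own function lets the same check run on output captured elsewhere, for example from a GUI-side subprocess call.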