v3.11.6: Antigravity loop breakers, vision/OCR preprocessing, has_content fix, auth config error fix, install.ps1

CHANGELOG.md (32 lines changed)
@@ -1,5 +1,37 @@
 # Changelog

+## v3.11.6 (2026-05-26)
+
+**Antigravity Loop Breakers, Vision/OCR Preprocessing, has_content Fix, Auth Error Fix**
+
+### New Features (Antigravity-only, no other providers affected)
+
+- **Per-session loop tracking**: `_ANTIGRAVITY_LOOP_TRACKER` global dict with `_antigravity_loop_key()` function tracks state per session: `latest_user_hash`, `nudge_injected`, `latest_user_appended`, `tool_calls_for_request`, `repeated_tool`, `force_finalize`, `last_tool`, `last_tool_count`
+- **Edit-intent nudge injection**: Injected only on the first turn per request, preventing duplicate nudges across retries
+- **Latest user instruction append**: Appended exactly once per request to prevent redundant instruction stacking
+- **Loop breaker**: If the same tool + arguments is repeated ≥ 5 times in a session, `force_finalize` is triggered to break the infinite loop
+- **Detailed `[antigravity-loop]` logging**: All tracking fields logged on every Antigravity request for debugging
+
+### New Features (All OpenAI-compatible providers)
+
+- **Vision/OCR preprocessing**: When a provider doesn't support images (detected via error messages like "unknown variant image_url", "does not support image"), the proxy automatically calls a configurable vision fallback API (default: Kilo.ai) to describe images as text, then replaces image blocks with text descriptions before sending to text-only models
+- **`_vision_describe_image()`**: Calls vision fallback model to describe a single image, with MD5-based caching to avoid re-describing same URL
+- **`_preprocess_vision()`**: Replaces `image_url`/`input_image` blocks in Chat Completions message format with text descriptions when provider lacks vision support
+- **`_preprocess_vision_input()`**: Same for Responses API input format — runs BEFORE adapter conversion so images are replaced early
+- **Vision error retry**: On HTTP 4xx errors containing image-related keywords, automatically retries with images preprocessed instead of failing
+- **Configurable via env vars**: `VISION_FALLBACK_URL`, `VISION_FALLBACK_MODEL`, `VISION_FALLBACK_KEY`
+- **ProviderSchema `supports_vision` field**: Auto-detected from error responses and persisted in provider-caps.json
+
+### Critical Fixes
+
+- **`has_content` now includes `function_call`** (v3.11.5 fix): `_observe_event` only checked for `"type": "message"` — when models return only tool calls (no text), `has_content` was `False`, causing Codex to loop infinitely and build context until `context_length_exceeded`. Now checks both `"message"` and `"function_call"`.
+- **`has_message`/`has_tool_call` initialized in all 5 locations**: Previous fix added variables inside `_observe_event` closure but missed 4 other `has_content = False` locations, causing `NameError: name 'has_message' is not defined` crashes.
+- **Auth config-not-found error handling**: When Codex's `config.toml` is missing or deleted, `codex login status` returns "Error loading configuration: No such file or directory (os error 2)". Now caught specifically (`OSError errno==2`) and returns ("not_configured", "Config missing — launch once to create") with clear GUI guidance.
+
+### Bug Fixes (GUI)
+
+- **Active endpoint sync**: GUI auto-removes stale endpoint references on startup
+
 ## v3.11.5 (2026-05-26)

 **Vision Filter, Token-Aware Compaction, Universal Adaptive Compaction, Smart-Continue Text Detection**
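
The loop-breaker entries above describe per-session state and a threshold of five repeats. The tracker itself (`_ANTIGRAVITY_LOOP_TRACKER`, `_antigravity_loop_key()`) is not part of this diff. What follows is only a minimal sketch of the repeat-detection part, under these assumptions:

- `LoopState`, `record_tool_call` and `REPEAT_THRESHOLD` are hypothetical names.
- Only three of the listed fields are modeled.
- "Repeated" is read as consecutive identical calls. The `last_tool`/`last_tool_count` pair suggests this but does not confirm it.

```python
import json
import threading
from dataclasses import dataclass
from typing import Dict

# Hypothetical threshold mirroring the changelog's "repeated >= 5 times".
REPEAT_THRESHOLD = 5


@dataclass
class LoopState:
    last_tool: str = ""        # signature (name + args) of the previous tool call
    last_tool_count: int = 0   # consecutive occurrences of that signature
    force_finalize: bool = False


_TRACKER: Dict[str, LoopState] = {}  # keyed by a session identifier
_TRACKER_LOCK = threading.Lock()     # proxy handlers may run on several threads


def _tool_signature(name: str, arguments: dict) -> str:
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same signature
    return name + ":" + json.dumps(arguments, sort_keys=True)


def record_tool_call(session_id: str, name: str, arguments: dict) -> bool:
    """Record one tool call and return True once the session should be finalized."""
    sig = _tool_signature(name, arguments)
    with _TRACKER_LOCK:
        state = _TRACKER.setdefault(session_id, LoopState())
        if sig == state.last_tool:
            state.last_tool_count += 1
        else:
            state.last_tool = sig
            state.last_tool_count = 1
        if state.last_tool_count >= REPEAT_THRESHOLD:
            state.force_finalize = True  # sticky: stays set for the rest of the session
        return state.force_finalize
```

In this sketch `force_finalize` is sticky, so once a session trips the breaker it stays tripped. Whether the real tracker resets that flag per request is an open assumption.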
@@ -134,6 +134,10 @@ A three-component system:
 - **Token-aware compaction** (v3.11.5) — learns per-model token limits from `context_length_exceeded` errors; proactively compacts when estimated tokens exceed 80% of limit; prevents repeated context overflow on small-context models (~35K tokens)
 - **Universal adaptive compaction** (v3.11.5) — compaction now works for ALL providers (was Crof.ai-only); proactive + retry compaction with aggression levels (normal/extreme)
 - **Smart-continue text detection** (v3.11.5) — triggers continuation nudging when model outputs text matching tool-call patterns, essential for text-only models that never emit real `function_call_output` items
+- **Antigravity loop breakers** (v3.11.6) — per-session tracking with automatic finalization when same tool+args repeats 5+ times; edit-intent nudge injected only on first turn; latest user instruction appended exactly once per request
+- **has_content function_call fix** (v3.11.6) — tool-call-only responses now correctly flagged as having content, preventing infinite loops on OpenAdapter/Z.AI/OpenRouter providers
+- **Vision/OCR preprocessing** (v3.11.6) — when provider rejects images, automatically calls a configurable vision fallback API (Kilo.ai) to describe images as text for text-only models; MD5-cached; retries on vision errors with preprocessed text
+- **Auth config-missing fix** (v3.11.6) — graceful handling when Codex config.toml is missing instead of showing raw os error
 - Zero dependencies — pure Python stdlib

 ### Command Code Adapter
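
The token-aware compaction entry combines three numbers:

- a per-model limit learned from `context_length_exceeded` errors;
- a proactive threshold at 80% of that limit;
- an "extreme" level above 1.5x the limit.

A minimal sketch of that decision follows. It reuses the `~?(\d+)\s*tokens` pattern from the proxy's retry handler further down in this diff. Several parts are illustrative assumptions:

- the 4-characters-per-token estimate;
- taking the first matched number as the limit;
- the function names.

```python
import re
from typing import Dict, Optional

_MODEL_TOKEN_LIMITS: Dict[str, int] = {}  # hypothetical per-model cache of learned limits
_TOKENS_RE = re.compile(r"~?(\d+)\s*tokens")  # same pattern as the proxy's retry handler


def learn_limit(model: str, err_body: str) -> Optional[int]:
    """Remember a model's context limit from a context_length_exceeded error body."""
    if "context_length_exceeded" not in err_body:
        return None
    m = _TOKENS_RE.search(err_body)
    if not m:
        return None
    limit = int(m.group(1))  # assumption: the first "N tokens" in the message is the limit
    _MODEL_TOKEN_LIMITS[model] = limit
    return limit


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token (an assumption, not a tokenizer).
    return (len(text) + 3) // 4


def compaction_level(model: str, text: str) -> str:
    """Return "none", "normal" (> 80% of limit) or "extreme" (> 1.5x limit)."""
    limit = _MODEL_TOKEN_LIMITS.get(model)
    if not limit:
        return "none"  # nothing learned yet for this model
    tokens = estimate_tokens(text)
    if tokens > 1.5 * limit:
        return "extreme"
    if tokens > 0.8 * limit:
        return "normal"
    return "none"
```

Learning from the error means a model gets no proactive compaction until it has overflowed once. After that, every request to the same model is checked against the stored limit.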

codex-launcher_3.11.6_all.deb (BIN, Normal file)
Binary file not shown.

install.ps1 (Normal file, 127 lines changed)
@@ -0,0 +1,127 @@
+<#
+.SYNOPSIS
+    Codex Launcher Windows Installer
+.DESCRIPTION
+    Installs Codex Launcher for the current user.
+.NOTES
+    Requires: Python 3.8+ (stdlib only, zero pip dependencies).
+#>
+
+param(
+    [switch]$Uninstall
+)
+
+$ErrorActionPreference = 'Stop'
+$BinDir = Join-Path $env:LOCALAPPDATA 'Programs\Codex-Launcher'
+$StartMenu = Join-Path $env:APPDATA 'Microsoft\Windows\Start Menu\Programs'
+
+if ($Uninstall) {
+    Write-Host 'Uninstalling Codex Launcher...' -ForegroundColor Yellow
+
+    if (Test-Path $BinDir) {
+        Remove-Item -Recurse -Force $BinDir
+        Write-Host " Removed $BinDir"
+    }
+
+    $shortcut = Join-Path $StartMenu 'Codex Launcher.lnk'
+    if (Test-Path $shortcut) {
+        Remove-Item -Force $shortcut
+        Write-Host ' Removed Start Menu shortcut'
+    }
+
+    $userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+    if ($userPath -like "*$BinDir*") {
+        $newPath = ($userPath -split ';' | Where-Object { $_ -ne $BinDir }) -join ';'
+        [Environment]::SetEnvironmentVariable('PATH', $newPath, 'User')
+        Write-Host ' Removed from PATH'
+    }
+
+    Write-Host 'Uninstall complete.' -ForegroundColor Green
+    return
+}
+
+Write-Host ''
+Write-Host ' Codex Launcher - Windows Installer' -ForegroundColor Cyan
+Write-Host ' ====================================' -ForegroundColor Cyan
+Write-Host ''
+
+# Check Python
+$pythonExe = Get-Command python -ErrorAction SilentlyContinue
+if (-not $pythonExe) {
+    $pythonExe = Get-Command python3 -ErrorAction SilentlyContinue
+}
+if (-not $pythonExe) {
+    Write-Host 'ERROR: Python not found. Install Python 3.8+ and add to PATH.' -ForegroundColor Red
+    exit 1
+}
+Write-Host " Python: $($pythonExe.Source)" -ForegroundColor Gray
+
+# Create install directory
+New-Item -ItemType Directory -Force -Path $BinDir | Out-Null
+
+# Copy files
+$srcDir = Join-Path $PSScriptRoot 'src'
+$files = @(
+    'translate-proxy.py',
+    'codex-launcher-gui.py',
+    'codex_launcher_lib.py',
+    'cleanup-codex-stale.py'
+)
+
+foreach ($file in $files) {
+    $src = Join-Path $srcDir $file
+    if (Test-Path $src) {
+        Copy-Item -Force $src $BinDir
+        Write-Host " Installed: $file" -ForegroundColor Green
+    } else {
+        Write-Host " WARNING: $file not found in src/" -ForegroundColor Yellow
+    }
+}
+
+# Create Start Menu shortcut
+$WshShell = New-Object -ComObject WScript.Shell
+$shortcutPath = Join-Path $StartMenu 'Codex Launcher.lnk'
+$Shortcut = $WshShell.CreateShortcut($shortcutPath)
+
+# Find pythonw.exe for no-console launch
+$pythonw = Get-Command pythonw -ErrorAction SilentlyContinue
+if (-not $pythonw) {
+    $pythonDir = Split-Path $pythonExe.Source
+    $pythonwCandidate = Join-Path $pythonDir 'pythonw.exe'
+    if (Test-Path $pythonwCandidate) {
+        $pythonw = $pythonwCandidate
+    }
+}
+
+if ($pythonw) {
+    $targetPath = if ($pythonw.Source) { $pythonw.Source } else { $pythonw }
+} else {
+    $targetPath = $pythonExe.Source
+}
+$Shortcut.TargetPath = $targetPath
+$guiPath = Join-Path $BinDir 'codex-launcher-gui.py'
+$Shortcut.Arguments = $guiPath
+$Shortcut.WorkingDirectory = $BinDir
+$Shortcut.Description = 'Launch Codex Desktop with any AI provider'
+$Shortcut.Save()
+Write-Host ' Created Start Menu shortcut' -ForegroundColor Green
+
+# Add to PATH
+$userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
+if ($userPath -notlike "*$BinDir*") {
+    $newUserPath = $userPath + ';' + $BinDir
+    [Environment]::SetEnvironmentVariable('PATH', $newUserPath, 'User')
+    $env:PATH = $env:PATH + ';' + $BinDir
+    Write-Host ' Added to user PATH' -ForegroundColor Green
+}
+
+# Verify
+Write-Host ''
+Write-Host ' Installation complete!' -ForegroundColor Cyan
+Write-Host " Install dir: $BinDir" -ForegroundColor Gray
+Write-Host ''
+Write-Host ' Launch options:' -ForegroundColor White
+Write-Host ' Start Menu: Codex Launcher' -ForegroundColor Gray
+Write-Host ' Command: codex-launcher-gui.py' -ForegroundColor Gray
+Write-Host ' Uninstall: powershell -File install.ps1 -Uninstall' -ForegroundColor Gray
+Write-Host ''
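
The installer's PATH checks use PowerShell's `-like "*$BinDir*"`, a case-insensitive wildcard match. It is also true when another PATH entry merely contains the install directory as a substring, for example a sibling folder named `Codex-Launcher-Old`. A stricter variant compares whole `;`-separated entries instead. The sketch below shows that logic in Python for brevity. The helper names are hypothetical, and ignoring a trailing separator is an assumption rather than something `install.ps1` does.

```python
from typing import List


def _norm(entry: str) -> str:
    # Windows paths compare case-insensitively; ignore a trailing separator.
    return entry.rstrip("\\/").lower()


def add_path_entry(path_value: str, entry: str) -> str:
    """Append entry to a ';'-separated PATH unless an equal entry already exists."""
    parts: List[str] = [p for p in path_value.split(";") if p]
    if any(_norm(p) == _norm(entry) for p in parts):
        return path_value  # already present: leave PATH untouched
    return ";".join(parts + [entry])  # no leading ';' when PATH was empty


def remove_path_entry(path_value: str, entry: str) -> str:
    """Drop every entry equal to `entry`, keeping the others in order."""
    return ";".join(p for p in path_value.split(";") if p and _norm(p) != _norm(entry))
```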

install.sh (10 lines changed)
@@ -3,13 +3,13 @@ set -e

 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

-if [ -f "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb" ]; then
-    echo "Installing codex-launcher_3.11.5_all.deb ..."
-    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.5_all.deb"
+if [ -f "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb" ]; then
+    echo "Installing codex-launcher_3.11.6_all.deb ..."
+    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.11.6_all.deb"
 else
-    echo "WARNING: codex-launcher_3.11.5_all.deb not found; copying files manually."
+    echo "WARNING: codex-launcher_3.11.6_all.deb not found; copying files manually."
 fi
-echo "Installed v3.11.5 via .deb package."
+echo "Installed v3.11.6 via .deb package."
 echo " translate-proxy.py -> /usr/bin/translate-proxy.py"
 echo " codex-launcher-gui -> /usr/bin/codex-launcher-gui"
 echo " cleanup-codex-stale -> /usr/bin/cleanup-codex-stale.sh"

@@ -27,6 +27,12 @@ model_catalog_json = ""
 """

 CHANGELOG = [
+    ("3.11.6", "2026-05-26", [
+        "Antigravity loop breakers: per-session tracking, repeated tool detection",
+        "has_content fix: function_call counts as valid output",
+        "Latest user instruction appended once per request for Antigravity",
+        "Antigravity-only changes, no touch to other providers",
+    ]),
     ("3.11.5", "2026-05-26", [
         "Token-aware compaction: fixes context_length_exceeded on small-context models",
         "Proactive compaction triggers on token count, not just item count",
@@ -2140,6 +2146,8 @@ class LauncherWin(Gtk.Window):
             self._relogin_btn.set_sensitive("cli" not in self._missing)
         elif status == "not_installed":
             self._auth_label.set_markup("<span foreground='#888'>Auth: N/A (CLI not installed)</span>")
+        elif status == "not_configured":
+            self._auth_label.set_markup("<span foreground='#d29922'>⚠ Config missing — launch once to create</span>")
         else:
             self._auth_label.set_markup(f"<span foreground='#d29922'>⚠ Auth: {msg}</span>")
             self._relogin_btn.set_sensitive("cli" not in self._missing)

@@ -83,13 +83,21 @@ model_catalog_json = ""
 """

 CHANGELOG = [
+    ("3.11.6", "2026-05-26", [
+        "Antigravity loop breakers: per-session tracking, edit-intent nudge (first turn only)",
+        "Loop breaker: same tool+args repeated 5+ times triggers force finalization",
+        "Latest user instruction appended exactly once per request",
+        "Detailed [antigravity-loop] logging for all tracking fields",
+        "has_content fix: function_call now counts as valid output (no more infinite loops)",
+        "Antigravity-only changes, no touch to other providers",
+    ]),
     ("3.11.5", "2026-05-26", [
-        "Token-aware compaction: fixes context_length_exceeded on small-context models (25 items × 1600 tokens)",
+        "Token-aware compaction: fixes context_length_exceeded on small-context models (25 items x 1600 tokens)",
         "Proactive compaction triggers on token count (>80% model limit), not just item count",
         "Universal adaptive compaction: removed crof.ai-only gates, all providers get compaction",
         "Vision model detection: strips images for non-vision models, keeps for vision-capable ones",
         "Per-model token limit learning from context_length_exceeded error messages",
-        "Compaction aggression levels: normal vs extreme when tokens > 1.5× model limit",
+        "Compaction aggression levels: normal vs extreme when tokens > 1.5x model limit",
         "Smart-continue text-tool detection: triggers on tool-call text patterns, not just function_call_output",
         "Active endpoint sync: GUI auto-removes stale endpoint references on startup",
     ]),
@@ -1713,6 +1721,10 @@ def check_codex_auth():
         return ("unknown", "No output from codex login status")
     except FileNotFoundError:
         return ("not_installed", "codex not found")
+    except OSError as e:
+        if e.errno == 2:
+            return ("not_configured", "Config not found — launch Codex once to create it")
+        return ("error", str(e))
     except Exception as e:
         return ("error", str(e))
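
Two facts bear on the new `except OSError` branch:

- The changelog describes the missing-config case as text printed by `codex login status` ("Error loading configuration: No such file or directory (os error 2)").
- In Python 3.3+ (PEP 3151), constructing an `OSError` with errno `ENOENT` yields a `FileNotFoundError`. An `except FileNotFoundError` clause listed first will therefore catch such an exception before the new branch does.

A sketch that detects the condition from the CLI's output text instead is below. `detect_missing_config` and `auth_status_from_output` are hypothetical helpers. Matching on those two substrings is an assumption based only on the message quoted in the changelog.

```python
import errno

# PEP 3151: OSError(ENOENT, ...) is constructed as a FileNotFoundError instance.
assert isinstance(OSError(errno.ENOENT, "No such file or directory"), FileNotFoundError)

# Substrings taken from the error text quoted in the changelog (lowercased).
MISSING_CONFIG_MARKERS = ("error loading configuration", "os error 2")


def detect_missing_config(cli_output: str) -> bool:
    """Return True when `codex login status` output reports a missing config file."""
    text = cli_output.lower()
    return all(marker in text for marker in MISSING_CONFIG_MARKERS)


def auth_status_from_output(cli_output: str):
    """Hypothetical helper: map CLI output to a (status, message) tuple, or None."""
    if not cli_output.strip():
        return ("unknown", "No output from codex login status")
    if detect_missing_config(cli_output):
        return ("not_configured", "Config not found; launch Codex once to create it")
    return None  # defer to the existing status parsing
```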

@@ -157,7 +157,7 @@ Architecture:

 import json, http.server, socketserver, urllib.request, urllib.parse, urllib.error, re
 import time, uuid, os, sys, argparse, threading, socket, collections, contextlib, signal
-import secrets, string
+import secrets, string, hashlib
 import dataclasses
 import http.client
 import selectors
@@ -219,6 +219,9 @@ def load_config():
         "backend_type": ("PROXY_BACKEND", None, str),
         "target_url": ("PROXY_TARGET_URL", "ZAI_BASE_URL", str),
         "api_key": ("PROXY_API_KEY", "ZAI_API_KEY", str),
+        "vision_fallback_url": ("VISION_FALLBACK_URL", None, str),
+        "vision_fallback_model": ("VISION_FALLBACK_MODEL", None, str),
+        "vision_fallback_key": ("VISION_FALLBACK_KEY", None, str),
     }
     for ck, (ev1, ev2, conv) in env_map.items():
         if ck not in cfg:
@@ -260,6 +263,9 @@ PROMPT_ENHANCER_MODE = "offline"
 PROMPT_ENHANCER_MODEL = ""
 PROMPT_ENHANCER_URL = ""
 PROMPT_ENHANCER_KEY = ""
+VISION_FALLBACK_URL = ""
+VISION_FALLBACK_MODEL = ""
+VISION_FALLBACK_KEY = ""
 SERVER = None

 if _IS_WINDOWS:
@@ -855,6 +861,7 @@ def _init_runtime():
     global CONFIG, PORT, BACKEND, TARGET_URL, API_KEY, OAUTH_PROVIDER, _antigravity_version
     global MODELS, CC_VERSION, REASONING_ENABLED, REASONING_EFFORT, BGP_ROUTES
     global _api_key_pool, PROMPT_ENHANCER
+    global VISION_FALLBACK_URL, VISION_FALLBACK_MODEL, VISION_FALLBACK_KEY

     CONFIG = load_config()
     PORT = CONFIG["port"]
@@ -872,6 +879,9 @@ def _init_runtime():
     PROMPT_ENHANCER_MODEL = CONFIG.get("prompt_enhancer_model", "")
     PROMPT_ENHANCER_URL = CONFIG.get("prompt_enhancer_url", "")
     PROMPT_ENHANCER_KEY = CONFIG.get("prompt_enhancer_key", "")
+    VISION_FALLBACK_URL = CONFIG.get("vision_fallback_url") or "https://api.kilo.ai/api/gateway/chat/completions"
+    VISION_FALLBACK_MODEL = CONFIG.get("vision_fallback_model") or "kilo-auto/small"
+    VISION_FALLBACK_KEY = CONFIG.get("vision_fallback_key") or ""
     BGP_ROUTES = CONFIG.get("bgp_routes", [])
     _api_key_pool = None
     if API_KEY and "," in API_KEY and not OAUTH_PROVIDER.startswith("google") and BACKEND not in ("codebuff", "freebuff"):
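
Taken together, the `load_config` and `_init_runtime` hunks give the vision-fallback settings three sources:

- the config file;
- the `VISION_FALLBACK_*` environment variables, consulted only when a key is absent from the config (`if ck not in cfg`);
- hard-coded defaults applied with `or`, which also replace empty strings.

The body of the env-var loop is not shown in the diff. The sketch below therefore assumes the primary variable is read before the secondary one. `apply_env` and `vision_settings` are hypothetical helpers.

```python
from typing import Mapping, Tuple

DEFAULT_VISION_URL = "https://api.kilo.ai/api/gateway/chat/completions"
DEFAULT_VISION_MODEL = "kilo-auto/small"

# Same shape as the new env_map entries: config key -> (env var, fallback env var, converter)
ENV_MAP = {
    "vision_fallback_url": ("VISION_FALLBACK_URL", None, str),
    "vision_fallback_model": ("VISION_FALLBACK_MODEL", None, str),
    "vision_fallback_key": ("VISION_FALLBACK_KEY", None, str),
}


def apply_env(cfg: dict, environ: Mapping[str, str]) -> dict:
    """Fill keys missing from cfg from the environment (assumed loop body)."""
    out = dict(cfg)
    for key, (ev1, ev2, conv) in ENV_MAP.items():
        if key in out:
            continue  # a value from the config file wins, as in `if ck not in cfg`
        raw = environ.get(ev1) or (environ.get(ev2) if ev2 else None)
        if raw:
            out[key] = conv(raw)
    return out


def vision_settings(cfg: dict) -> Tuple[str, str, str]:
    # `or` falls back on missing keys and on empty strings, like _init_runtime.
    return (
        cfg.get("vision_fallback_url") or DEFAULT_VISION_URL,
        cfg.get("vision_fallback_model") or DEFAULT_VISION_MODEL,
        cfg.get("vision_fallback_key") or "",
    )
```

For example, `vision_settings(apply_env({}, os.environ))` uses `os.environ`, which satisfies the `Mapping` parameter. With no key configured, the default Kilo endpoint is called without an `Authorization` header, since `_vision_describe_image` adds one only when `VISION_FALLBACK_KEY` is set.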
@@ -2366,6 +2376,113 @@ def _mark_vision_fail(model):
     with _vision_fail_lock:
         _vision_fail_cache.add(model)

+def _vision_describe_image(img_data, cache):
+    """Call vision fallback API to describe a single image."""
+    if not VISION_FALLBACK_URL:
+        return None
+    if isinstance(img_data, dict):
+        img_url = img_data.get("url", "")
+        if not img_url:
+            inner = img_data.get("image_url", img_data)
+            img_url = inner.get("url", "") if isinstance(inner, dict) else str(inner)
+    else:
+        img_url = str(img_data)
+    if not img_url:
+        return None
+    img_hash = hashlib.md5(img_url.encode("utf-8", errors="replace")).hexdigest()
+    if img_hash in cache:
+        return cache[img_hash]
+    try:
+        payload = json.dumps({
+            "model": VISION_FALLBACK_MODEL,
+            "messages": [{"role": "user", "content": [
+                {"type": "text", "text": "Describe the content of this image in detail. If it contains text, transcribe it fully."},
+                {"type": "image_url", "image_url": {"url": img_url}},
+            ]}],
+            "max_tokens": 1024,
+            "stream": False,
+        }).encode()
+        headers = {"Content-Type": "application/json"}
+        if VISION_FALLBACK_KEY:
+            headers["Authorization"] = f"Bearer {VISION_FALLBACK_KEY}"
+        req = urllib.request.Request(VISION_FALLBACK_URL, data=payload, headers=headers)
+        resp = urllib.request.urlopen(req, timeout=30)
+        body = json.loads(resp.read().decode())
+        choices = body.get("choices", [])
+        if choices:
+            msg = choices[0].get("message", {})
+            desc = msg.get("content", "")
+            if desc:
+                cache[img_hash] = desc
+                return desc
+    except Exception as e:
+        print(f"[vision-fallback] error describing image: {e}", file=sys.stderr)
+    return None
+
+
+def _preprocess_vision(messages, schema):
+    """Replace image blocks with text descriptions when provider lacks vision support."""
+    if schema.supports_vision:
+        return messages
+    cache = {}
+    for msg in messages:
+        content = msg.get("content")
+        if not isinstance(content, list):
+            continue
+        new_parts = []
+        changed = False
+        for part in content:
+            if isinstance(part, dict) and part.get("type") in ("image_url", "input_image"):
+                changed = True
+                img_data = part.get("image_url", part)
+                description = _vision_describe_image(img_data, cache)
+                if description:
+                    new_parts.append({"type": "text", "text": f"[Image: {description}]"})
+                else:
+                    new_parts.append({"type": "text", "text": "[Image: description unavailable - text-only model]"})
+            else:
+                new_parts.append(part)
+        if changed:
+            msg["content"] = new_parts
+    return messages
+
+
+def _preprocess_vision_input(input_data, schema):
+    """Replace input_image blocks in Responses API input format with text descriptions."""
+    if schema.supports_vision:
+        return input_data
+    if not isinstance(input_data, list):
+        return input_data
+    cache = {}
+    changed_any = False
+    for item in input_data:
+        if item.get("type") != "message":
+            continue
+        content = item.get("content")
+        if not isinstance(content, list):
+            continue
+        new_parts = []
+        changed = False
+        for part in content:
+            if isinstance(part, dict) and part.get("type") in ("input_image", "image_url"):
+                changed = True
+                img_url = ""
+                if part.get("type") == "input_image":
+                    img_url = part.get("image_url", {}).get("url", "")
+                else:
+                    img_url = part.get("image_url", {}).get("url", part.get("url", ""))
+                desc = _vision_describe_image({"url": img_url}, cache)
+                if desc:
+                    new_parts.append({"type": "input_text", "text": f"[Image: {desc}]"})
+                else:
+                    new_parts.append({"type": "input_text", "text": "[Image: description unavailable - text-only model]"})
+            else:
+                new_parts.append(part)
+        if changed:
+            item["content"] = new_parts
+            changed_any = True
+    return input_data
+
 def _strip_images_from_input(input_data, model):
     if not isinstance(input_data, list) or _model_supports_vision(model):
         return input_data
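
`_vision_describe_image` keys its cache on the MD5 of the image URL, data URLs included, so an image repeated within one call is described only once. Only successful descriptions are stored. The `cache` dict is created inside each `_preprocess_vision` call, so it does not persist across requests.

The offline sketch below isolates that idea. The network call is replaced by an injected `describe` callable, a hypothetical stand-in for the POST to `VISION_FALLBACK_URL`. Unlike the committed function, it builds new message dicts rather than mutating the input, so the original messages stay available for a later retry.

```python
import hashlib
from typing import Callable, Dict, List, Optional

Describe = Callable[[str], Optional[str]]
UNAVAILABLE = "[Image: description unavailable - text-only model]"


def _image_url(part: dict) -> str:
    # {"image_url": {"url": ...}} (Chat Completions) or {"image_url": "..."} (string form)
    ref = part.get("image_url", "")
    if isinstance(ref, dict):
        return ref.get("url", "")
    return str(ref or part.get("url", ""))


def replace_images(messages: List[dict], describe: Describe, cache: Dict[str, str]) -> List[dict]:
    """Return copies of chat messages with image parts swapped for text descriptions."""
    out = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            out.append(msg)  # plain-string content has no image parts
            continue
        parts = []
        for part in content:
            if isinstance(part, dict) and part.get("type") in ("image_url", "input_image"):
                url = _image_url(part)
                key = hashlib.md5(url.encode("utf-8", errors="replace")).hexdigest()
                if key not in cache and url:
                    desc = describe(url)
                    if desc:
                        cache[key] = desc  # only successful descriptions are cached
                text = cache.get(key)
                parts.append({"type": "text", "text": f"[Image: {text}]" if text else UNAVAILABLE})
            else:
                parts.append(part)
        out.append({**msg, "content": parts})
    return out
```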
@@ -4014,6 +4131,7 @@ class ProviderSchema:
     })
     response_format: str = "auto" # "sse" | "raw_json" | "ndjson" | "auto"
     stream_format: str = "auto" # "sse_data" | "sse_event" | "raw_lines" | "json_lines"
+    supports_vision: bool = True

     def hints(self) -> dict:
         """Return a dict for storing in provider-caps.json."""
@@ -4023,7 +4141,10 @@ class ProviderSchema:
                 continue
             if isinstance(v, dict) and not v:
                 continue
-            if v is False:
+            if k == "supports_vision":
+                if v is not False:
+                    continue
+            elif v is False:
                 continue
             if v == "":
                 continue
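
The `hints()` change makes `supports_vision` the one field where `False` is worth persisting. `True` is the dataclass default, so only a negative finding needs to reach provider-caps.json, and `_load_schema` reads it back with `data.get("supports_vision", True)`.

A reduced sketch of that filter follows. `SchemaSketch` is hypothetical and carries only a few illustrative fields. The real `hints()` loop also has skip rules that fall outside this hunk.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SchemaSketch:
    tool_decl_format: str = ""
    cc_body_wrap: bool = False
    role_map: Dict[str, str] = field(default_factory=dict)
    supports_vision: bool = True  # optimistic default, as in ProviderSchema

    def hints(self) -> dict:
        """Keep only values that record something learned about the provider."""
        out = {}
        for k, v in dataclasses.asdict(self).items():
            if isinstance(v, dict) and not v:
                continue
            if k == "supports_vision":
                if v is not False:
                    continue  # True is the default; persist only the negative finding
            elif v is False:
                continue  # other booleans are stored only when True
            if v == "":
                continue
            out[k] = v
        return out
```

Because `hints()` keys are field names, `SchemaSketch(**hints)` round-trips the stored values.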
@@ -4193,6 +4314,15 @@ class ErrorAnalyzer:
         elif re.search(r"tool-call|tool_call.*format", err):
             hints["tool_decl_format"] = "command_code"

+        # ── Response/Stream format hints from content-type or error ──
+        # ── Vision support detection ──
+        if re.search(r"unknown variant\b.*image_url", err) or \
+           re.search(r"unexpected.*image_url", err) or \
+           re.search(r"does not support.*image", err) or \
+           re.search(r"image.*not.*support", err) or \
+           re.search(r"unsupported.*content.*type.*image", err):
+            hints["supports_vision"] = False
+
         # ── Response/Stream format hints from content-type or error ──
         if re.search(r"content.type.*text/event.stream", err) or \
            re.search(r"stream.*sse|sse.*expected", err):
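
The `ErrorAnalyzer` hunk recognizes five phrasings of "this provider rejects images". The `Handler` retry branch elsewhere in this diff checks only the first three, against the lowercased error body. One way to keep the two checks aligned is a single shared helper. The sketch below compiles the same five patterns once. `looks_like_no_vision` is a hypothetical name, and lowercasing the input is an assumption about how `err` is prepared.

```python
import re

# The five patterns from the ErrorAnalyzer hunk, compiled once.
NO_VISION_PATTERNS = [re.compile(p) for p in (
    r"unknown variant\b.*image_url",
    r"unexpected.*image_url",
    r"does not support.*image",
    r"image.*not.*support",
    r"unsupported.*content.*type.*image",
)]


def looks_like_no_vision(error_text: str) -> bool:
    """True if a provider error message says image content is not accepted."""
    text = error_text.lower()  # the patterns are all lowercase
    return any(p.search(text) for p in NO_VISION_PATTERNS)
```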
@@ -4253,6 +4383,7 @@ def _load_schema(target_url=None, backend=None, model=None):
         })),
         response_format=data.get("response_format", "auto"),
         stream_format=data.get("stream_format", "auto"),
+        supports_vision=data.get("supports_vision", True),
     )
@@ -5053,6 +5184,9 @@ class Handler(http.server.BaseHTTPRequestHandler):
         body["input"] = input_data

         messages = oa_input_to_messages(input_data)
+        _schema = _load_schema(model=model)
+        if _schema and not _schema.supports_vision:
+            messages = _preprocess_vision(messages, _schema)
         messages = _inject_stored_reasoning(messages)
         instructions = body.get("instructions", "").strip()
         if instructions:
@@ -5082,6 +5216,18 @@ class Handler(http.server.BaseHTTPRequestHandler):
                 upstream = urllib.request.urlopen(req, timeout=_upstream_timeout(body, stream))
             except urllib.error.HTTPError as e:
                 err_body = e.read().decode()
+                if re.search(r"unknown variant\b.*image_url", err_body.lower()) or \
+                   re.search(r"unexpected.*image_url", err_body.lower()) or \
+                   re.search(r"does not support.*image", err_body.lower()):
+                    _schema = _load_schema(model=model)
+                    if _schema:
+                        _schema.supports_vision = False
+                    if attempt < max_retries:
+                        print(f"[{self._session_id}] vision not supported, retrying with image preprocessing", file=sys.stderr)
+                        messages = _preprocess_vision(messages, _schema) if _schema else messages
+                        chat_body = self._build_chat_body(model, messages, body, stream)
+                        chat_body_b = json.dumps(chat_body).encode()
+                        continue
                 if "context_length_exceeded" in err_body and attempt < max_retries:
                     import re as _re
                     _tok_m = _re.search(r'~?(\d+)\s*tokens', err_body)
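
The retry branch in this hunk turns a provider's image rejection into a second attempt with described images, instead of a hard failure. A stripped-down sketch of that control flow follows, checking the same three patterns as the committed branch. It assumes:

- a `send` callable that raises a hypothetical `UpstreamHTTPError` carrying the error body (a stand-in for `urllib.error.HTTPError`);
- a `preprocess` callable standing in for `_preprocess_vision`.

```python
import re
from typing import Any, Callable, List, Tuple

# The three patterns the committed retry branch checks (on the lowercased body).
_VISION_ERR = re.compile(
    r"unknown variant\b.*image_url|unexpected.*image_url|does not support.*image"
)


class UpstreamHTTPError(Exception):
    """Hypothetical stand-in for urllib.error.HTTPError that carries the body text."""

    def __init__(self, status: int, body: str):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.body = body


def send_with_vision_retry(
    messages: List[dict],
    send: Callable[[List[dict]], Any],
    preprocess: Callable[[List[dict]], List[dict]],
    max_retries: int = 2,
) -> Tuple[Any, bool]:
    """Return (response, supports_vision); re-raise errors that are not retried."""
    supports_vision = True
    for attempt in range(max_retries + 1):
        try:
            return send(messages), supports_vision
        except UpstreamHTTPError as e:
            if _VISION_ERR.search(e.body.lower()) and attempt < max_retries:
                supports_vision = False          # provider rejected image content
                messages = preprocess(messages)  # describe/replace images, then retry
                continue
            raise
    raise ValueError("max_retries must be >= 0")  # only reached if the loop never ran
```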
@@ -6869,7 +7015,8 @@ class Handler(http.server.BaseHTTPRequestHandler):
         prev_content_type = None # for oscillation detection
         for attempt in range(max_retries + 1):
             adapter = SchemaAdapter(schema)
-            messages = adapter.convert(input_data, instructions)
+            processed_input = _preprocess_vision_input(input_data, schema) if not schema.supports_vision else input_data
+            messages = adapter.convert(processed_input, instructions)
             use_cc_wrap = schema.cc_body_wrap or is_cc

             # Build auth header from schema
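
In this hunk, Responses API input is rewritten by `_preprocess_vision_input` before `adapter.convert(...)` runs, so the adapter sees only text parts. In the OpenAI Responses API an `input_image` part carries `image_url` as a plain string, whereas Chat Completions nests it as `{"url": ...}`.

The sketch below accepts both shapes and leaves non-message items such as `function_call` untouched. It uses these assumptions:

- Treating an item with no `type` but a `role` as a message is an assumption about how clients shape input.
- `preprocess_responses_input` and the injected `describe` callable are hypothetical.

```python
from typing import Callable, List, Optional

UNAVAILABLE = "[Image: description unavailable - text-only model]"


def _is_message(item: dict) -> bool:
    # Assumption: an item without "type" but with a "role" is also a message.
    return item.get("type") == "message" or ("type" not in item and "role" in item)


def preprocess_responses_input(
    items: List[dict], describe: Callable[[str], Optional[str]]
) -> List[dict]:
    """Copy Responses API input, replacing image parts with input_text parts."""
    out = []
    for item in items:
        content = item.get("content")
        if not _is_message(item) or not isinstance(content, list):
            out.append(item)  # function_call / function_call_output items pass through
            continue
        parts = []
        for part in content:
            if isinstance(part, dict) and part.get("type") in ("input_image", "image_url"):
                ref = part.get("image_url", "")
                # Responses API: image_url is a string; Chat Completions: {"url": ...}
                url = ref.get("url", "") if isinstance(ref, dict) else str(ref or "")
                desc = describe(url) if url else None
                parts.append({"type": "input_text", "text": f"[Image: {desc}]" if desc else UNAVAILABLE})
            else:
                parts.append(part)
        out.append({**item, "content": parts})
    return out
```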