Compare commits: v3.11.12...850c7d1e82 (48 commits)

.gitignore
@@ -9,3 +9,5 @@ config.toml
*.swp
*~
.DS_Store
DEBIAN/
usr/

AI-MONITORING-DESIGN.md
@@ -0,0 +1,638 @@
# AI Monitoring — Design Specification

> **Codex Launcher v3.8.0 Feature Design**
> Self-healing nano agent that monitors proxy health, diagnoses failures, and auto-recovers sessions.

---

## 1. Problem Statement

Across 42 production sessions, we observed these failure categories:

| ID | Failure Category | Count | Example |
|----|------------------|-------|---------|
| F1 | **parsed_tool_calls=0** — model produces unparseable output | 42 | Bare `<explore_agent>`, `<bash>` without cmd, plain English intent |
| F2 | **Stuck recovery triggered** — Intelligence Routing Layer 3 | 13 | "I need to fetch the README", "let me write the script" |
| F3 | **Sanitizer flagged suspicious cmd** — cmd still JSON after unwrap | 11 | `{/'cmd/': /'sshpass -p .../'}` — double-escaped quoting |
| F4 | **Upstream 500** — provider internal error | ~5 | `"An internal error occurred. Please try again later."` |
| F5 | **Connection timeout** — upstream unreachable | ~3 | `Connection timed out after 15002 milliseconds` |
| F6 | **Upstream 401/403** — auth failure | ~2 | Wrong API key, expired token, `upgrade_required` |
| F7 | **Stream crash** — exception mid-stream | ~2 | `BrokenPipeError`, `ConnectionResetError` during SSE |
| F8 | **Proxy port conflict** — `Address already in use` | ~1 | Stale process holding the port |
| F9 | **Schema cache corruption** — stale content_type=array | ~1 | `ErrorAnalyzer` learned wrong schema |
| F10 | **Codex Desktop crash** — SIGKILL at ~27GB | ~1 | Issue #24048 — unbounded tool output memory |
| F11 | **Codex 300s stall** — turn state machine race | ~1 | Issue #23807 — `stream disconnected` after 300s |

### The Gap

Intelligence Routing (v3.7.0) handles F1/F2/F3 **inside a single request**, but it cannot:

- **Detect a dead proxy process** (F7/F8), because the proxy itself has already crashed
- **Reconnect Codex to a restarted proxy** (F5/F7/F8), because Codex does not auto-reconnect
- **Switch to a backup provider** when the primary is down (F4/F5)
- **Clear corrupt caches** (F9), which requires out-of-band action
- **Restart Codex Desktop** after a crash (F10/F11)
- **Learn from failure patterns** across sessions; today each failure is handled independently

### What We Need

A **lightweight watchdog, running outside the proxy process** as a background thread in the launcher GUI, that:

1. Monitors proxy health continuously
2. Detects failures the proxy can't detect itself
3. Uses a cheap AI model to diagnose novel failures
4. Takes corrective action automatically
5. Learns from past incidents to prevent repeats

---

## 2. Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                         Codex Launcher GUI                          │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────────────────────┐  │
│  │ Proxy    │  │ Codex        │  │ AI Monitoring Panel           │  │
│  │ Manager  │  │ Launcher     │  │  ┌─────────────────────┐      │  │
│  │          │  │              │  │  │ ON/OFF Toggle       │      │  │
│  └────┬─────┘  └──────┬───────┘  │  │ Provider Selector   │      │  │
│       │               │          │  │ Model Selector      │      │  │
│       │               │          │  │ Incident Log        │      │  │
│       │               │          │  │ [View Diagnostics]  │      │  │
│       │               │          │  └─────────────────────┘      │  │
│       │               │          └───────────────────────────────┘  │
└───────┼───────────────┼─────────────────────────────────────────────┘
        │               │
        ▼               ▼
┌───────────────┐  ┌────────────────┐
│ translate-    │  │ Codex Desktop  │
│ proxy.py      │  │ / CLI          │
│ (port 8080)   │  │                │
│               │  │                │
│ /health ──────┼──┼─► health check │
│ /responses ───┼──┼─► main API     │
└───────────────┘  └────────────────┘
        ▲
        │ health probes + log analysis + corrective actions
        │
┌───────┴─────────────────────────────────────────────────────────────┐
│                         AI Monitor Watchdog                         │
│                   (thread in codex-launcher-gui)                    │
│                                                                     │
│   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐   │
│   │ Health Watcher  │   │ Log Analyzer    │   │ AI Diagnostic   │   │
│   │ (every 5s)      │   │ (continuous)    │   │ Agent (on-call) │   │
│   │                 │   │                 │   │                 │   │
│   │ - /health probe │   │ - tail cc-debug │   │ - Classify err  │   │
│   │ - process alive │   │ - tail proxy.log│   │ - Root cause    │   │
│   │ - port check    │   │ - pattern match │   │ - Suggest fix   │   │
│   │ - memory watch  │   │ - incident DB   │   │ - Execute fix   │   │
│   └────────┬────────┘   └────────┬────────┘   └────────┬────────┘   │
│            │                     │                     │            │
│            └─────────────────────┼─────────────────────┘            │
│                                  ▼                                  │
│                       ┌──────────────────────┐                      │
│                       │ Incident Store       │                      │
│                       │ (JSON file)          │                      │
│                       │ - Known patterns     │                      │
│                       │ - Past resolutions   │                      │
│                       │ - Success rates      │                      │
│                       └──────────────────────┘                      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 3. Three-Tier Response System

### Tier 1: Fast Path — Rule-Based Auto-Recovery (< 1 second)

Immediate reactions to **known failure patterns**. No AI needed.

```python
TIER1_RULES = [
    # (trigger_pattern, action, cooldown_seconds)

    # --- Proxy Health ---
    ("proxy_health_fail",          "restart_proxy",             30),
    ("proxy_port_conflict",        "kill_stale + restart",      60),
    ("proxy_memory_over_1gb",      "restart_proxy",            120),

    # --- Upstream Errors ---
    ("upstream_429",               "wait_retry_after",           0),
    ("upstream_502_503",           "retry_with_backoff",        30),
    ("upstream_500_repeat_3x",     "switch_provider",           60),
    ("upstream_timeout",           "retry + increase_timeout",  30),
    ("upstream_401_403",           "alert_user_bad_key",         0),

    # --- Stream Errors ---
    ("stream_broken_pipe",         "restart_proxy",             30),
    ("stream_reset",               "restart_proxy",             30),
    ("stream_idle_300s",           "restart_proxy",             60),

    # --- Parser Failures ---
    ("parsed_tool_calls_0_x3",     "clear_schema_cache",       300),
    ("sanitizer_suspicious_5x",    "alert_user_model_issue",     0),
    ("stuck_recovery_x5",          "suggest_switch_model",       0),

    # --- Codex Process ---
    ("codex_process_dead",         "alert_user_restart",         0),
    ("codex_memory_over_4gb",      "alert_user_memory",          0),

    # --- Cache Corruption ---
    ("schema_content_type_array",  "delete_provider_caps",       0),
]
```
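
One way to read `TIER1_RULES` is as a lookup table with a per-trigger cooldown. The sketch below assumes the third tuple element is that cooldown in seconds and that actions are run by a caller-supplied callback; `Tier1Engine` and `handle()` are hypothetical names, not part of the launcher. A trigger inside its cooldown window is still reported as matched (so it does not fall through to Tier 2) but its action is suppressed.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Rule = Tuple[str, str, int]  # (trigger_pattern, action, cooldown_seconds)


class Tier1Engine:
    """Hypothetical Tier 1 dispatcher: trigger -> action, honoring per-rule cooldowns."""

    def __init__(self, rules: List[Rule], execute: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic):
        self._rules: Dict[str, Tuple[str, int]] = {t: (a, c) for t, a, c in rules}
        self._execute = execute          # runs an action name, e.g. "restart_proxy"
        self._clock = clock              # injectable so tests can control time
        self._last_fired: Dict[str, float] = {}

    def handle(self, trigger: str) -> Tuple[bool, Optional[str]]:
        """Return (matched, action_run); action_run is None while cooling down."""
        rule = self._rules.get(trigger)
        if rule is None:
            return False, None           # not a Tier 1 trigger: try Tier 2 next
        action, cooldown = rule
        now = self._clock()
        last = self._last_fired.get(trigger)
        if last is not None and now - last < cooldown:
            return True, None            # known failure, but acted on too recently
        self._last_fired[trigger] = now
        self._execute(action)
        return True, action
```

A cooldown of 0 lets an action fire on every occurrence; composite actions such as `"kill_stale + restart"` would be split up by the `execute` callback.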

### Tier 2: Pattern Matching — Incident Store Lookup (< 100ms)

For failures we've **seen before and resolved**, look up the fix:

```json
{
  "incidents": [
    {
      "pattern": "cc_stream_ended_empty + explore_agent + no_url",
      "fix": "synth_explore_from_last_user_urls",
      "source": "FIX-23",
      "success_rate": 0.85,
      "last_seen": "2026-05-22T16:00:00Z",
      "occurrences": 5
    },
    {
      "pattern": "require_escalation + no_cmd",
      "fix": "auto_proceed_echo",
      "source": "FIX-24",
      "success_rate": 1.0,
      "last_seen": "2026-05-22T15:30:00Z",
      "occurrences": 3
    }
  ]
}
```
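
The `pattern` strings above read as `+`-joined conjunctions of signals, although the spec does not define how they are matched. Assuming that reading, a lookup can treat an incident as matching when every one of its signals was observed, prefer the most specific match, and skip fixes whose success rate is not above the 0.7 threshold used in Section 5.3. Both helper names are hypothetical.

```python
from typing import Iterable, List, Optional


def pattern_signals(pattern: str) -> frozenset:
    """Split 'a + b + c' into frozenset({'a', 'b', 'c'}), ignoring whitespace."""
    return frozenset(part.strip() for part in pattern.split("+") if part.strip())


def match_incident(observed: Iterable[str], incidents: List[dict],
                   min_success_rate: float = 0.7) -> Optional[dict]:
    """Return the most specific trusted incident whose signals were all observed."""
    seen = set(observed)
    best_key, best = None, None
    for inc in incidents:
        needed = pattern_signals(inc["pattern"])
        rate = inc.get("success_rate", 0.0)
        if not needed <= seen or rate <= min_success_rate:
            continue  # a signal is missing, or the fix is not trusted enough
        key = (len(needed), rate)  # more signals = more specific; then higher rate
        if best_key is None or key > best_key:
            best_key, best = key, inc
    return best
```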

### Tier 3: AI Diagnostic — Nano Agent (2-5 seconds)

For **novel failures** that don't match any rule or pattern, invoke a cheap AI model:

```
Prompt Template (system):
─────────────────────
You are a diagnostic agent for a translation proxy that sits between
OpenAI Codex CLI/Desktop and AI providers (Command Code, OpenAI-compat,
Anthropic, etc.). You analyze error context and suggest ONE corrective action.

Available actions: restart_proxy, kill_stale_processes, clear_schema_cache,
switch_provider, increase_timeout, alert_user, ignore, retry_now,
regenerate_config, cleanup_codex_stale

Respond with ONLY a JSON object: {"action": "...", "reason": "...", "confidence": 0.0-1.0}

Prompt Template (user):
─────────────────────
INCIDENT REPORT:
Time: {timestamp}
Session: {session_id}
Proxy health: {alive/dead, port, uptime, memory_mb}
Upstream: {url, model, last_http_code, last_error}
Recent errors (last 60s):
{log_lines}
Parser state: {parsed_tool_calls, stuck_recovery_count, sanitizer_flags}
Provider: {backend_type, model}
History: {last_5_incidents_for_this_pattern}

What corrective action should be taken?
```
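
The user template's placeholders (for example `{alive/dead, port, uptime, memory_mb}`) describe content rather than literal `str.format` fields. A sketch of rendering the report and validating the reply follows; the flat `ctx` keys, the 50-line log cap, and the rule that anything other than a JSON object naming a listed action degrades to `alert_user` are assumptions, and `build_user_prompt` / `parse_diagnosis` are hypothetical names.

```python
import json
from typing import Dict, List

# The action vocabulary from the system prompt above.
ALLOWED_ACTIONS = {
    "restart_proxy", "kill_stale_processes", "clear_schema_cache",
    "switch_provider", "increase_timeout", "alert_user", "ignore",
    "retry_now", "regenerate_config", "cleanup_codex_stale",
}


def build_user_prompt(ctx: Dict[str, object], log_lines: List[str]) -> str:
    """Render the incident report; missing fields show as 'unknown'."""
    def g(key: str) -> object:
        return ctx.get(key, "unknown")
    return "\n".join([
        "INCIDENT REPORT:",
        f"Time: {g('timestamp')}",
        f"Session: {g('session_id')}",
        f"Proxy health: {g('proxy_health')}",
        f"Upstream: {g('upstream')}",
        "Recent errors (last 60s):",
        *log_lines[-50:],  # cap the excerpt to keep the prompt cheap
        f"Parser state: {g('parser_state')}",
        f"Provider: {g('provider')}",
        f"History: {g('history')}",
        "",
        "What corrective action should be taken?",
    ])


def parse_diagnosis(reply: str) -> Dict[str, object]:
    """Parse {"action", "reason", "confidence"}; fall back to alert_user."""
    fallback = {"action": "alert_user", "reason": "unparseable diagnosis", "confidence": 0.0}
    text = reply.strip()
    if text.startswith("```"):  # some models add a fence despite instructions
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(obj, dict) or obj.get("action") not in ALLOWED_ACTIONS:
        return fallback  # never execute an action outside the allowed list
    try:
        conf = float(obj.get("confidence", 0.0))
    except (TypeError, ValueError):
        conf = 0.0
    return {"action": obj["action"], "reason": str(obj.get("reason", "")),
            "confidence": min(max(conf, 0.0), 1.0)}
```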

---

## 4. Complete Failure Catalog

### Category A: Proxy-Level Failures (watchdog detects, auto-recovers)

| ID | Failure | Symptoms | Tier 1 Action | Log Signature |
|----|---------|----------|---------------|---------------|
| A1 | Proxy process crashed | `/health` returns connection refused | `restart_proxy` | `urllib.error.URLError: [Errno 111] Connection refused` |
| A2 | Port conflict | `Address already in use` on startup | `kill_stale + restart` | `OSError: [Errno 98] Address already in use` |
| A3 | Memory leak | Process RSS > 1GB | `restart_proxy` | `/proc/{pid}/status` VmRSS check |
| A4 | Deadlock | Health check hangs > 15s | `restart_proxy` | health probe timeout |
| A5 | Unhandled exception | Process exits with non-zero status | `restart_proxy` | `SELF-REVIVE CRASH #{n}` |
| A6 | SSL/TLS error | `CERTIFICATE_VERIFY_FAILED` upstream | `alert_user` | `urllib.error.URLError: certificate verify failed` |
| A7 | DNS resolution failure | `getaddrinfo failed` | `retry_with_backoff` | `socket.gaierror: Name or service not known` |
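
A3 (and D2 in Category D) read `VmRSS` from `/proc/{pid}/status`, where the Linux kernel reports resident memory as a line such as `VmRSS:   524288 kB`. A minimal, Linux-only sketch; the helper names are hypothetical, and the 1024 MB default is taken from `max_memory_mb` in this spec's config section.

```python
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # "VmRSS:\t  524288 kB" -> 524288
    return None  # kernel threads have no VmRSS line


def rss_mb(pid: int) -> Optional[float]:
    """Resident memory of `pid` in MB, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            kb = parse_vmrss_kb(fh.read())
    except (FileNotFoundError, ProcessLookupError):
        return None  # process exited (or /proc is unavailable)
    return kb / 1024 if kb is not None else None


def over_memory_limit(pid: int, max_memory_mb: int = 1024) -> bool:
    """True when the process exceeds the A3 threshold (default 1 GB)."""
    mb = rss_mb(pid)
    return mb is not None and mb > max_memory_mb
```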

### Category B: Upstream Provider Failures (proxy detects, watchdog analyzes)

| ID | Failure | Symptoms | Tier 1 Action | Log Signature |
|----|---------|----------|---------------|---------------|
| B1 | Rate limit (429) | Too many requests | `wait_retry_after` | `HTTP 429` + `Retry-After` header |
| B2 | Server error (5xx) | Provider down | `retry_with_backoff` | `HTTP 500/502/503` |
| B3 | Auth failure (401/403) | Bad/expired key | `alert_user_bad_key` | `HTTP 401 {"error":"invalid_api_key"}` |
| B4 | CC upgrade required (403) | Version mismatch | `update_cc_version` | `HTTP 403 upgrade_required` |
| B5 | Connection timeout | Upstream silent | `retry + increase_timeout` | `urllib.error.URLError: timed out` |
| B6 | Connection reset | Upstream dropped mid-stream | `restart_proxy` | `ConnectionResetError: Connection reset by peer` |
| B7 | Broken pipe | Client disconnected | `ignore` | `BrokenPipeError: Broken pipe` |
| B8 | Upstream 400 bad request | Malformed request | `clear_schema_cache` | `HTTP 400 {"error":"...expected string..."}` |
| B9 | Provider capacity (503) | Overloaded | `switch_provider` | `HTTP 503` after 3 retries |
| B10 | Cloudflare block (403/1010) | Bot detection | `check_browser_ua` | `HTTP 403 error 1010` |
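
B1 and B2 imply a delay policy. The v3.9.6 changelog entry in this diff describes the proxy as honoring `Retry-After` capped at 60s, with exponential backoff as the fallback; the sketch below follows that description, but the 1-second base delay, the use of full jitter, and the name `retry_delay` are assumptions. `Retry-After` may carry either delta-seconds or an HTTP-date, so both forms are handled.

```python
import email.utils
import random
import time
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0, jitter: bool = True) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():                       # delta-seconds form: "Retry-After: 12"
            return min(float(value), cap)
        try:                                      # HTTP-date form
            when = email.utils.parsedate_to_datetime(value)
            return min(max(when.timestamp() - time.time(), 0.0), cap)
        except (TypeError, ValueError):
            pass                                  # unparseable: fall back to backoff
    delay = min(base * (2 ** attempt), cap)       # 1s, 2s, 4s, ... up to the cap
    return random.uniform(0, delay) if jitter else delay  # "full jitter"
```

Jitter spreads retries from concurrent sessions so they do not hit a recovering provider in lockstep.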

### Category C: Parser/Format Failures (Intelligence Routing handles, watchdog tracks)

| ID | Failure | Symptoms | Auto-Fix (IR Layer) | Watchdog Escalation |
|----|---------|----------|--------------------|--------------------|
| C1 | Bare `<explore_agent>` | `parsed_tool_calls=0` | Layer 1: URL extraction | If 3x in a row → suggest model switch |
| C2 | `<require_escalation>` block | Model wants permissions | Layer 2: Auto-proceed | If 5x → suggest different provider |
| C3 | Unrecognized format | No parser matches | Layer 3: Intent synthesis | If 5x → log for AI diagnosis |
| C4 | Double-wrapped cmd | `cmd = "{\"cmd\": ...}"` | Sanitizer: unwrap | If cmd still JSON → alert |
| C5 | Suspicious cmd (JSON) | `cmd starts with {` | Sanitizer: flag | If 3x → clear cache + restart |
| C6 | Empty cmd | `cmd = ""` or `cmd = "{}"` | Sanitizer: diagnostic echo | If 3x → suggest model switch |
| C7 | Bare `{` token | Model outputs incomplete JSON | Layer 3: heuristic 5 | If persistent → AI diagnosis |
| C8 | `<bash>` without cmd | Block has sandbox but no command | Layer 3: heuristic | If 3x → AI diagnosis |
| C9 | DSML name mismatch | `name="cmd"` vs `name="command"` | DSML parser handles both | Self-test catches regression |
| C10 | Stuck model loop | Same recovery 5+ times | Layer 3 max 3x then alert | Switch model or provider |

### Category D: Codex Process Failures (watchdog detects, alerts user)

| ID | Failure | Symptoms | Action | Log Signature |
|----|---------|----------|--------|---------------|
| D1 | Codex process killed | PID gone from pids.json | `alert_user_restart` | Process not in `/proc/{pid}` |
| D2 | Codex memory explosion | RSS > 4GB | `alert_user_memory` | `/proc/{pid}/status` check |
| D3 | Codex 300s stall | `stream disconnected` loop | `restart_proxy` | Codex stderr: `stream disconnected` |
| D4 | Config corruption | `database disk image is malformed` | `regenerate_config` | Codex stderr: `malformed` |
| D5 | Session context overflow | `context_length_exceeded` | `alert_user_context` | Codex stderr: `context_length_exceeded` |
| D6 | WebSocket reconnect loop | `Reconnecting... N/5` | `check_proxy_health` | Codex stderr: `Reconnecting` |

### Category E: Config/State Failures (watchdog detects, auto-fixes)

| ID | Failure | Symptoms | Action | Detection |
|----|---------|----------|--------|-----------|
| E1 | Schema cache corruption | `content_type: "array"` in provider-caps.json | `delete_provider_caps` | Read file, check for known-bad values |
| E2 | Stale PID file | pids.json has dead PIDs | `cleanup_pids` | Check `/proc/{pid}` existence |
| E3 | Port from old session | config.toml has stale port | `regenerate_config` | Port in config != running port |
| E4 | OAuth token expired | Google/Gemini token refresh fails | `alert_user_reauth` | Token file `expiry_ts < now` |
| E5 | BGP all routes down | Every route returned error | `alert_user_no_provider` | All routes in cooldown |
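
E2 checks `/proc/{pid}` for each entry in `pids.json`. A portable POSIX alternative is `os.kill(pid, 0)`, which performs the existence and permission checks without delivering a signal. The sketch assumes `pids.json` maps names to integer PIDs (its real schema is not shown in this spec); `pid_alive` and `cleanup_pids` are hypothetical helpers.

```python
import json
import os
from typing import Dict


def pid_alive(pid: int) -> bool:
    """POSIX liveness check: signal 0 does error checking but sends nothing."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # exists, but owned by another user
    return True


def cleanup_pids(path: str) -> Dict[str, int]:
    """Rewrite `path` keeping only live PIDs; return the surviving entries."""
    with open(path) as fh:
        pids = json.load(fh)
    alive = {name: pid for name, pid in pids.items() if pid_alive(int(pid))}
    if alive != pids:
        tmp = path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(alive, fh)
        os.replace(tmp, path)  # atomic swap: readers never see a half-written file
    return alive
```

PIDs can be reused by the OS, so a live PID does not prove the original process is still running; comparing process start times would close that gap.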

---

## 5. Component Design

### 5.1 Health Watcher Thread

Runs in the GUI process as a background thread and probes the proxy's `/health` endpoint every 5 seconds.

```python
import threading
import urllib.request


class HealthWatcher(threading.Thread):
    """Probes the proxy's /health endpoint and reports failures and recoveries."""

    def __init__(self, proxy_port, on_failure, on_recovery, check_interval=5):
        super().__init__(daemon=True)
        self.proxy_port = proxy_port
        self.on_failure = on_failure          # called with the consecutive-failure count
        self.on_recovery = on_recovery        # called once when health returns
        self.check_interval = check_interval  # seconds
        self.failures = 0
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def run(self):
        while not self._stop_event.is_set():
            if self._check_health():
                if self.failures > 0:
                    self.failures = 0
                    self.on_recovery()
            else:
                self.failures += 1
                if self.failures >= 3:  # three consecutive failed probes (~15s)
                    self.on_failure(self.failures)
            # Event.wait() returns early when stop() is called, unlike time.sleep()
            self._stop_event.wait(self.check_interval)

    def _check_health(self):
        url = f"http://localhost:{self.proxy_port}/health"
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except Exception:  # connection refused, timeout, HTTP error status, ...
            return False
```

### 5.2 Log Analyzer Thread

Tails the debug log and extracts failure signals in real time.

```python
import os
import threading
import time

# Log substring -> (fault_id, category). First match wins, so order matters.
FAILURE_SIGNALS = {
    "parsed_tool_calls=0":    ("C1", "parser_empty"),
    "[STUCK-RECOVERY]":       ("C3", "stuck_recovery"),
    "suspicious cmd":         ("C5", "sanitizer_flag"),
    "empty cmd recovered":    ("C6", "empty_cmd"),
    "HTTP 429":               ("B1", "rate_limited"),
    "HTTP 500":               ("B2", "server_error"),
    "HTTP 401":               ("B3", "auth_failure"),
    "HTTP 403":               ("B4", "forbidden"),
    "Connection refused":     ("A1", "proxy_dead"),
    "Address already in use": ("A2", "port_conflict"),
    "Broken pipe":            ("B7", "broken_pipe"),
    "Connection reset":       ("B6", "connection_reset"),
    "timed out":              ("B5", "timeout"),
    "SELF-REVIVE CRASH":      ("A5", "proxy_crash"),
    "stream error":           ("B6", "stream_error"),
}


class LogAnalyzer(threading.Thread):
    def __init__(self, log_path, on_signal):
        super().__init__(daemon=True)
        self.log_path = log_path
        self.on_signal = on_signal
        self.running = True

    def run(self):
        with open(self.log_path, "r", errors="replace") as fh:
            fh.seek(0, os.SEEK_END)  # only analyze lines written from now on
            while self.running:
                line = fh.readline()
                if not line:
                    time.sleep(0.5)  # no new data yet; poll again
                    continue
                for pattern, (fault_id, category) in FAILURE_SIGNALS.items():
                    if pattern in line:
                        self.on_signal(fault_id, category, line.strip())
                        break
```
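
Several Tier 1 triggers are count-based (`upstream_500_repeat_3x`, `parsed_tool_calls_0_x3`, `stuck_recovery_x5`), but the spec does not say over what window the repeats are counted. Assuming a sliding time window, a hypothetical `RepeatCounter` could sit between `on_signal` and the rule engine, firing once when a category reaches its threshold:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class RepeatCounter:
    """Fires once when a category is seen `threshold` times within `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.window_s = window_s
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, category: str) -> bool:
        now = self._clock()
        hits = self._hits[category]
        hits.append(now)
        while hits and now - hits[0] > self.window_s:
            hits.popleft()  # forget occurrences that fell out of the window
        if len(hits) >= self.threshold:
            hits.clear()    # reset, so the next escalation needs a fresh run
            return True
        return False
```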

### 5.3 AI Diagnostic Agent

Invoked by the watchdog when a failure matches no Tier 1 rule; it checks Tier 2 patterns itself before falling back to the model.

```python
import json
import urllib.request

# DIAGNOSTIC_SYSTEM_PROMPT is the text in Section 5.5; IncidentStore follows
# the format in Section 5.4. Both, and the helpers _extract_pattern(),
# _build_prompt() and _parse_response(), are not shown here.


class AIDiagnosticAgent:
    def __init__(self, provider_url, model, api_key):
        self.provider_url = provider_url  # OpenAI-compatible chat/completions URL
        self.model = model
        self.api_key = api_key
        self.system_prompt = DIAGNOSTIC_SYSTEM_PROMPT
        self.incident_store = IncidentStore()

    def diagnose(self, context):
        """Return (action, source, confidence) for an incident context."""
        # Tier 2: check the incident store first
        pattern = self._extract_pattern(context)
        known_fix = self.incident_store.lookup(pattern)
        if known_fix and known_fix["success_rate"] > 0.7:
            return known_fix["fix"], "tier2_pattern", known_fix["success_rate"]

        # Tier 3: ask the AI model
        prompt = self._build_prompt(context)
        try:
            response = self._call_model(prompt)
        except (OSError, KeyError, IndexError, ValueError):
            # Provider unreachable (URLError and timeouts are OSErrors) or reply
            # malformed: nothing safe to act on, so hand the incident to the user.
            return "alert_user", "tier3_error", None
        action = self._parse_response(response)

        # Learn from this incident
        if action:
            self.incident_store.record(pattern, action)

        return action, "tier3_ai", None

    def _call_model(self, prompt):
        body = {
            "model": self.model,
            "messages": [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": prompt},
            ],
            "max_tokens": 200,
            "temperature": 0.1,  # near-deterministic diagnoses
        }
        req = urllib.request.Request(
            self.provider_url,
            data=json.dumps(body).encode(),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_key}",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=15) as resp:
            return json.loads(resp.read())["choices"][0]["message"]["content"]
```

### 5.4 Incident Store

JSON file that accumulates failure patterns and their resolutions.

```json
{
  "version": 1,
  "incidents": {
    "parser_empty+explore_agent": {
      "fault_ids": ["C1"],
      "fix": "synth_explore_from_urls",
      "source": "intelligent_routing",
      "success_count": 8,
      "fail_count": 1,
      "last_seen": "2026-05-22T16:00:00Z",
      "auto_applied": true
    },
    "server_error+repeat_3x": {
      "fault_ids": ["B2"],
      "fix": "switch_provider",
      "source": "tier1_rule",
      "success_count": 2,
      "fail_count": 0,
      "last_seen": "2026-05-22T14:00:00Z",
      "auto_applied": true
    }
  },
  "ai_diagnostic_calls": 0,
  "tokens_used": 0,
  "cost_usd": 0.0
}
```
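
Section 5.3 reads `known_fix["success_rate"]`, while this file stores `success_count` and `fail_count`. A minimal sketch of an `IncidentStore` that derives the rate from the counts and enforces the 500-entry cap from Section 8; it approximates LRU by evicting the entry with the oldest `last_seen`. The `lookup()`/`record()` names match the calls in 5.3, but their bodies, the extra `record_outcome()` method, and the atomic-write approach are assumptions.

```python
import json
import os
from datetime import datetime, timezone
from typing import Optional

MAX_ENTRIES = 500  # Section 8, guard 7
DEFAULT_PATH = os.path.expanduser("~/.cache/codex-proxy/incident-store.json")


class IncidentStore:
    """Hypothetical store backing the JSON format above."""

    def __init__(self, path: str = DEFAULT_PATH):
        self.path = path
        self.data = {"version": 1, "incidents": {}}
        if os.path.exists(path):
            with open(path) as fh:
                self.data = json.load(fh)

    def lookup(self, pattern: str) -> Optional[dict]:
        """Return the incident plus a derived success_rate, or None if unseen."""
        inc = self.data["incidents"].get(pattern)
        if inc is None:
            return None
        total = inc["success_count"] + inc["fail_count"]
        return {**inc, "success_rate": inc["success_count"] / total if total else 0.0}

    def record(self, pattern: str, fix: str, source: str = "tier3_ai") -> None:
        """Create or refresh an incident, then enforce the size cap."""
        incidents = self.data["incidents"]
        inc = incidents.setdefault(pattern, {
            "fault_ids": [], "fix": fix, "source": source,
            "success_count": 0, "fail_count": 0, "auto_applied": False,
        })
        inc["fix"] = fix
        inc["last_seen"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        while len(incidents) > MAX_ENTRIES:
            # Approximate LRU: ISO-8601 UTC strings sort chronologically
            oldest = min(incidents, key=lambda k: incidents[k].get("last_seen", ""))
            del incidents[oldest]
        self._save()

    def record_outcome(self, pattern: str, success: bool) -> None:
        """Count whether an applied fix actually resolved the failure."""
        inc = self.data["incidents"].get(pattern)
        if inc is not None:
            inc["success_count" if success else "fail_count"] += 1
            self._save()

    def _save(self) -> None:
        directory = os.path.dirname(self.path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(self.data, fh, indent=2)
        os.replace(tmp, self.path)  # atomic rename: readers never see a partial file
```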

### 5.5 Diagnostic Agent System Prompt

```
You are a diagnostic agent for "Codex Launcher" — a desktop app that runs a local
translation proxy between OpenAI Codex CLI/Desktop and various AI providers.

## Your Job
Analyze the incident report and recommend ONE corrective action.

## Available Actions
- restart_proxy: Kill and restart translate-proxy.py
- kill_stale_processes: Kill orphaned proxy/codex processes
- clear_schema_cache: Delete ~/.cache/codex-proxy/provider-caps.json
- switch_provider: Switch to a different configured endpoint
- increase_timeout: Increase upstream timeout for slow providers
- regenerate_config: Regenerate Codex config.toml
- cleanup_codex_stale: Run cleanup-codex-stale.sh
- alert_user: Show notification to user (can't auto-fix)
- ignore: Transient error, no action needed
- retry_now: Immediate retry without changes

## Decision Rules
- If upstream returns 401/403 with auth error → alert_user (can't fix bad keys)
- If proxy process is dead → restart_proxy
- If same error repeated 5+ times → switch_provider or alert_user
- If error is about content_type/schema → clear_schema_cache
- If "Address already in use" → kill_stale_processes then restart_proxy
- If timeout and upstream is slow → increase_timeout
- If single transient 429/502/503 → ignore (retry handles it)
- If "stream disconnected" and proxy is healthy → ignore (Codex retries)

## Response Format
Reply with ONLY a JSON object:
{"action": "...", "reason": "...", "confidence": 0.0-1.0}

No explanation, no markdown, no extra text.
```

---

## 6. GUI Integration

### AI Monitoring Panel (in Settings tab)

```
┌─────────────────────────────────────────────────────────┐
│ AI Monitoring                                      [ON] │
│                                                         │
│ ┌─ Diagnostic Agent ──────────────────────────────────┐ │
│ │ Provider: [OpenCode Zen ▼]                          │ │
│ │ Model:    [Qwen3-32B ▼]                             │ │
│ │ API Key:  [sk-•••••••••••••••••••• ]                │ │
│ │                                                     │ │
│ │ Cost this month: $0.12 (3 diagnostic calls)         │ │
│ │ Tokens used: 1,847 input / 423 output               │ │
│ └─────────────────────────────────────────────────────┘ │
│                                                         │
│ ┌─ Incident Log (last 7 days) ────────────────────────┐ │
│ │ ✅ 16:00  C1  parser_empty → synth_explore (Tier 2) │ │
│ │ ⚠️ 15:30  B2  server_error → retry (Tier 1)         │ │
│ │ ✅ 15:00  A1  proxy_dead → restart_proxy (Tier 1)   │ │
│ │ 🤖 14:30  C3  novel_format → clear_cache (Tier 3)   │ │
│ │ ...                                                 │ │
│ └─────────────────────────────────────────────────────┘ │
│                                                         │
│ [View Full Diagnostics]  [Export Incident Report]       │
└─────────────────────────────────────────────────────────┘
```

### Config Storage (in endpoints.json)

```json
{
  "ai_monitoring": {
    "enabled": true,
    "provider_url": "https://opencode.ai/zen/v1/chat/completions",
    "model": "Qwen/Qwen3-32B",
    "api_key": "sk-...",
    "tier1_enabled": true,
    "tier2_enabled": true,
    "tier3_enabled": true,
    "auto_restart_proxy": true,
    "auto_switch_provider": false,
    "health_check_interval_s": 5,
    "max_memory_mb": 1024,
    "notification_level": "important_only"
  }
}
```
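
A sketch of loading the `ai_monitoring` section with defaults, so that a missing or partial section still yields a complete config. The defaults mirror the example above, except `enabled`, `provider_url`, `model` and `api_key`, which are assumed to stay off or empty until the user sets them; `load_monitoring_config` is a hypothetical name.

```python
import json
import os
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "enabled": False,            # assumption: opt-in until configured
    "provider_url": "",
    "model": "",
    "api_key": "",
    "tier1_enabled": True,
    "tier2_enabled": True,
    "tier3_enabled": True,
    "auto_restart_proxy": True,
    "auto_switch_provider": False,
    "health_check_interval_s": 5,
    "max_memory_mb": 1024,
    "notification_level": "important_only",
}


def load_monitoring_config(path: str = "~/.codex/endpoints.json") -> Dict[str, Any]:
    """Merge the file's ai_monitoring section over DEFAULTS; unknown keys are dropped."""
    try:
        with open(os.path.expanduser(path)) as fh:
            section = json.load(fh).get("ai_monitoring", {})
    except (FileNotFoundError, json.JSONDecodeError):
        section = {}
    cfg = dict(DEFAULTS)
    cfg.update({k: v for k, v in section.items() if k in DEFAULTS})
    # Tier 3 cannot run without a provider and key, whatever the flag says
    if not (cfg["provider_url"] and cfg["api_key"]):
        cfg["tier3_enabled"] = False
    return cfg
```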

### Recommended Models (by cost)

| Model | Cost/Diagnosis | Latency | Quality | Recommended For |
|-------|---------------|---------|---------|----------------|
| **Qwen3-32B** (OpenCode) | ~$0.0005 | 2-4s | Good | Default — cheapest decent model |
| **DeepSeek V4 Flash** | ~$0.0003 | 2-3s | Good | Low-cost option |
| **GPT-4o-mini** | ~$0.001 | 1-2s | Excellent | Best quality/latency |
| **Gemini 2.0 Flash** | ~$0.0002 | 1-2s | Good | Cheapest + fastest |
| **Claude Haiku 4.5** | ~$0.001 | 2-3s | Excellent | Best reasoning quality |
| **Local Ollama** (if running) | $0 | 5-15s | Varies | Zero-cost offline option |

### Cost Estimate

- Average diagnostic prompt: ~800 tokens input, ~100 tokens output
- Expected frequency: ~1-5 incidents per day that reach Tier 3, i.e. ~30-150 calls per month
- **Monthly cost**: ~30-150 calls at ~$0.0002-$0.001 each comes to roughly $0.01-$0.15, depending on model and usage; even at the Section 8 cap of 10 calls/day (~300/month) the most expensive model in the table stays near $0.30

---

## 7. Watchdog Response Flow

```
Failure Detected
       │
       ▼
┌──────────────┐   YES    ┌──────────────────┐
│ Tier 1 Rule? ├─────────►│ Execute Action   │
│ (known)      │          │ Log incident     │
└──────┬───────┘          └──────────────────┘
       │ NO
       ▼
┌──────────────┐   YES    ┌──────────────────┐
│ Tier 2 Match?├─────────►│ Apply Known Fix  │
│ (incident DB)│          │ Update success   │
└──────┬───────┘          └──────────────────┘
       │ NO
       ▼
┌──────────────┐   YES    ┌──────────────────┐
│ AI Enabled?  ├─────────►│ Collect Context  │
│ (Tier 3)     │          │ Build Prompt     │
└──────┬───────┘          │ Call AI Model    │
       │ NO               │ Parse Response   │
       ▼                  │ Execute if auto  │
┌──────────────┐          │ Store incident   │
│ Alert User   │          └──────────────────┘
│ (can't fix)  │
└──────────────┘
```

---

## 8. Safety Guards

1. **Rate limit AI calls** — max 1 Tier 3 call per 60 seconds, max 10 per day
2. **Never auto-execute destructive actions** — `alert_user` for: delete files, change API keys, modify source code
3. **Auto-restart cap** — max 5 proxy restarts per 10 minutes, then alert user
4. **Cost cap** — monthly AI diagnostic budget (configurable, default $2/month)
5. **Cooldown per pattern** — same failure pattern has escalating cooldown (30s → 60s → 300s → alert)
6. **User override** — any auto-action can be cancelled within 3 seconds via GUI
7. **Incident store max size** — 500 entries, LRU eviction
8. **Health check bypass** — if user manually stopped proxy, don't alert
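
Guards 3 and 5 need a little state. The sketch below reads guard 5 as "act, then require 30s, 60s and 300s between successive repeats, then alert", and guard 3 as a sliding 10-minute window. The one-hour quiet period that resets a pattern's ladder is an assumption not stated in the spec, and both class names are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple

COOLDOWN_LADDER = (30, 60, 300)  # seconds required between repeated actions


class EscalatingCooldown:
    """Guard 5: act, then wait 30s, 60s, 300s between repeats; after that, alert."""

    def __init__(self, reset_after_s: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.reset_after_s = reset_after_s  # assumption: a quiet hour resets the ladder
        self._clock = clock
        self._state: Dict[str, Tuple[int, float]] = {}  # pattern -> (actions, last_action)

    def decide(self, pattern: str) -> str:
        """Return 'act', 'wait' or 'alert' for a new occurrence of `pattern`."""
        now = self._clock()
        state = self._state.get(pattern)
        if state is None or now - state[1] > self.reset_after_s:
            self._state[pattern] = (1, now)
            return "act"
        actions, last = state
        if now - last < COOLDOWN_LADDER[actions - 1]:
            return "wait"
        if actions >= len(COOLDOWN_LADDER):
            return "alert"  # ladder exhausted: stop auto-fixing, tell the user
        self._state[pattern] = (actions + 1, now)
        return "act"


class RestartBudget:
    """Guard 3: at most `max_restarts` proxy restarts per sliding `window_s`."""

    def __init__(self, max_restarts: int = 5, window_s: float = 600,
                 clock: Callable[[], float] = time.monotonic):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._clock = clock
        self._times: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a restart and return True, or return False once the cap is hit."""
        now = self._clock()
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()  # this restart has left the 10-minute window
        if len(self._times) >= self.max_restarts:
            return False
        self._times.append(now)
        return True
```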

---

## 9. Implementation Plan

### Phase 1: Core Watchdog (v3.8.0)
- `HealthWatcher` thread in `codex-launcher-gui`
- `LogAnalyzer` thread tailing `cc-debug.log` and `proxy.log`
- Tier 1 rule engine covering every rule in Section 3 (17 as currently listed)
- Incident store (JSON file)
- GUI toggle (ON/OFF) in settings
- Auto-restart proxy on crash

### Phase 2: Pattern Learning (v3.8.1)
- Tier 2 incident store lookup
- Auto-learn from Intelligence Routing outcomes
- Success rate tracking per pattern
- Incident log viewer in GUI

### Phase 3: AI Diagnostic Agent (v3.9.0)
- Tier 3 AI model integration
- Provider/model selector in GUI
- Diagnostic prompt template
- Cost tracking
- Full incident report export

### Phase 4: Advanced Recovery (v4.0.0)
- Auto-switch to backup provider on repeated failure
- BGP route health monitoring
- Predictive failure detection (memory growth, latency trends)
- Codex process memory monitoring
- WebSocket reconnect assistance

---

## 10. File Changes Summary

| File | Changes |
|------|---------|
| `codex-launcher-gui` | +HealthWatcher thread, +LogAnalyzer thread, +AI Monitoring panel, +incident log viewer |
| `translate-proxy.py` | +`/monitoring` endpoint (returns health + metrics), enhanced `/health` with memory/uptime |
| `~/.cache/codex-proxy/incident-store.json` | New file — incident pattern database |
| `~/.cache/codex-proxy/monitoring.log` | New file — watchdog activity log |
| `~/.codex/endpoints.json` | +`ai_monitoring` config section |

CHANGELOG.md
@@ -1,5 +1,201 @@
# Changelog

## v3.9.7 (2026-05-25)

**Codebuff Error Forwarding & Crash Fixes**

### Rate Limit Error Forwarding
- **Real Codebuff error messages** forwarded to the user instead of a generic "429 Too Many Requests"
- **HTTP 200 + Responses API format** for rate limits — Codex displays the actual Codebuff message (e.g. "Daily session limit reached. Resets in 29m.") instead of retrying
- **`retryAfterMs` extraction** from Codebuff 429 responses for accurate cooldown timers
- **`_codebuff_start_run`** returns the actual error body instead of `None`, so real Codebuff errors are shown

### Crash Fixes
- **BrokenPipeError crash** on "all accounts exhausted" response — wrapped in try/except
- **3 SyntaxWarnings** fixed for invalid `\ ` escape sequences in docstrings

## v3.9.6 (2026-05-25)

**Performance & Stability Hardening — Connection Pooling, Stream Idle Timeouts, Retry-After**

Inspired by an architectural study of [Codex-Proxy-Server](https://github.com/unluckyjori/Codex-Proxy-Server) (Rust/Axum).

### P0: Connection Pooling & Stream Idle Timeout
- **Connection pooling** (`http.client` reuse) — persistent HTTPS connections per host avoid a ~100ms TLS handshake on every request after the first. Pool keyed by `{scheme}://{host}:{port}`, reused across requests.
- **Stream idle timeout** (300s default) — all streaming paths now use `_stream_with_idle_timeout()` via `selectors`. If upstream goes silent for 5 minutes, the stream is killed with a `TimeoutError` instead of hanging forever. Applied to:
  - OpenAI-compat streaming (`oa_stream_to_sse`)
  - Command Code streaming (`_iter_cc_events`)
  - Gemini OAuth streaming (`_handle_gemini_oauth`)
  - Auto-continue streaming (`_auto_continue_gemini`)
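
The entry above names `_stream_with_idle_timeout()` but not its signature. A minimal sketch of the same idea over a plain socket, using `selectors` to wait at most `idle_timeout` seconds for each chunk; the function name `iter_with_idle_timeout` and the chunked-read interface are assumptions.

```python
import selectors
import socket
from typing import Iterator


def iter_with_idle_timeout(sock: socket.socket, idle_timeout: float = 300.0,
                           chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield chunks from `sock`; raise TimeoutError after idle_timeout s of silence."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        while True:
            if not sel.select(timeout=idle_timeout):  # empty list means timed out
                raise TimeoutError(f"upstream idle for {idle_timeout:.0f}s")
            chunk = sock.recv(chunk_size)
            if not chunk:  # orderly EOF: upstream closed the stream
                return
            yield chunk
    finally:
        sel.unregister(sock)
        sel.close()
```

For TLS sockets, decrypted bytes can already sit in the `ssl.SSLSocket` buffer while the raw socket looks idle, so a real implementation would check `sock.pending()` before calling `select()`.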
|
||||
|
||||
### P1: Retry-After Header Support & Preemptive Token Refresh
|
||||
- **`Retry-After` header** — all retry paths (openai-compat, BGP, auto) now read the upstream `Retry-After` header and respect it (capped at 60s). Falls back to exponential backoff if header is absent.
|
||||
- **Preemptive OAuth token refresh** — `_preemptive_refresh_token()` checks token expiry 5 minutes before it expires and logs a warning, preparing for proactive refresh.
|
||||
|
||||
### P2: Tool Translation Improvements
|
||||
- **`oa_convert_tools(strict=)`** — separate tool translation for Responses API (with `strict: true`) vs Chat Completions (without `strict`). Some providers reject the `strict` field in Chat Completions mode.
|
||||
- **Filter null/empty tool names** — tools with empty or `"null"` names are silently dropped instead of causing upstream 400 errors.
|
||||
|
||||
### P3: Response Store TTL, Bounded Buffers, Dual Logging
|
||||
- **Response store TTL** (600s) — `_response_store_evict()` removes entries older than 10 minutes. Prevents unbounded memory growth on long sessions.
|
||||
- **Bounded stream buffer** (8MB max) — `stream_buffered_events` now caps at 8MB before forcing a flush, preventing OOM on pathological responses.
|
||||
- **`response.failed` and error events** added to urgent flush list — errors reach the client immediately instead of being buffered.
|
||||
- **Dual logging** — `proxy.log` in `~/.cache/codex-proxy/` captures all proxy messages alongside stderr. Survives Codex Desktop's stderr piping.
|
||||
|
||||
## v3.5.0 (2026-05-22)
|
||||
|
||||
**Major Release — Command Code Adapter Overhaul, AI Assist, Self-Revive Watchdog, Debug Infrastructure**
|
||||
|
||||
### Command Code Provider — Multi-Format Tool-Call Parser (Critical Bug Fix)
|
||||
|
||||
The Command Code (CC) provider adapter in `translate-proxy.py` had a critical bug where the CC model's tool-call output was not being parsed into executable tool calls, causing the Codex agent loop to stop after the first response. The CC model output format **changes between sessions and models** — the parser must handle all observed formats.
|
||||
|
||||
**Root Cause:** The CC model returns tool calls as inline text in various formats (raw JSON, XML, DSML tags, HTML-like blocks) within `text-delta` SSE events. The original parser only handled one format. When the model switched output style, tool calls were silently dropped, and Codex received a plain text response instead of executable commands — halting the multi-turn agent loop.
|
||||
|
||||
**The Fix — Multi-Format Parser Chain (17 patches):**
|
||||
|
||||
A cascading parser chain was built that tries each format in order, first match wins:
|
||||
`DSML → <bash> blocks → <explore_agent> → <tool_call type=...> → XML patterns → raw JSON → fallback regex`
|
||||
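To illustrate the first-match-wins chain, here is a heavily simplified sketch with only three hypothetical parsers (DSML, `<tool_call type="bash">`, raw JSON). The regexes assume the ASCII `||` spelling shown in this changelog, and the `<tool_call>` pattern uses the `[)]?>` closing form from FIX 11c.

```python
import json
import re
from typing import Callable, Dict, List

ToolCall = Dict[str, str]

DSML_CMD_RE = re.compile(r'<\|\|DSML\|\|parameter name="command">(.*?)</', re.S)
TOOL_CALL_RE = re.compile(
    r'<tool_call type="bash">\s*(\{.*?\})\s*(?:</tool_call[)]?>|\Z)', re.S)
RAW_CMD_RE = re.compile(r'\{"cmd"\s*:\s*"((?:[^"\\]|\\.)*)"')

def parse_dsml(text: str) -> List[ToolCall]:
    return [{"name": "exec_command", "cmd": c.strip()} for c in DSML_CMD_RE.findall(text)]

def parse_tool_call_blocks(text: str) -> List[ToolCall]:
    calls = []
    for blob in TOOL_CALL_RE.findall(text):
        try:
            args = json.loads(blob)
        except json.JSONDecodeError:
            continue
        cmd = args.get("command") or args.get("cmd")  # accept both keys
        if cmd:
            calls.append({"name": "exec_command", "cmd": cmd})
    return calls

def parse_raw_json(text: str) -> List[ToolCall]:
    calls = []
    for body in RAW_CMD_RE.findall(text):
        try:
            # Re-wrap the captured string body in quotes so JSON decodes its escapes
            calls.append({"name": "exec_command", "cmd": json.loads(f'"{body}"')})
        except json.JSONDecodeError:
            continue
    return calls

PARSER_CHAIN: List[Callable[[str], List[ToolCall]]] = [
    parse_dsml, parse_tool_call_blocks, parse_raw_json,
]

def extract_tool_calls(text: str) -> List[ToolCall]:
    """Run parsers in priority order; the first non-empty result wins."""
    for parser in PARSER_CHAIN:
        calls = parser(text)
        if calls:
            return calls
    return []
```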
|
||||
- **FIX 1**: `cc_input_to_messages()` — enforce STRING content only (CC `/alpha/generate` rejects content blocks). Tool calls sent as inline JSON text in assistant messages. Tool results as `role: "user"` plain text (NOT `role: "tool"`).
|
||||
- **FIX 2**: `x-command-code-version` header always sent (fallback `"0.26.8"`) — prevents 403 `upgrade_required` errors.
|
||||
- **FIX 3**: Cleared stale schema cache (`content_type:"array"`) that was corrupting message construction.
|
||||
- **FIX 4**: Streaming `try/except` wrapper — catches all streaming errors and sends `response.completed(status:"failed")` event instead of crashing the connection.
|
||||
- **FIX 5**: `_extract_raw_json_tool_calls()` — new parser that finds raw JSON tool calls embedded in model text (`{"cmd":"...","type":"tool-call"}`).
|
||||
- **FIX 6**: `_extract_args()` three-tier parser — tries direct parse → `codecs.escape_decode` → `unicode_escape`, so double-wrapped or escaped argument strings still decode to a single argument object.
|
||||
- **FIX 7**: `_extract_field()` skips leading `\` before value type check — handles malformed escape sequences in CC output.
|
||||
- **FIX 8**: `sandbox_permissions` normalization from parsed dict — converts `{"docker":"full"}` to the flat string format Codex expects.
|
||||
- **FIX 9** (REVERTED): Removed the adaptive probe system — it proved unnecessary; the conservative inline-text format is sufficient.
|
||||
- **FIX 10**: Comprehensive fix documentation added to proxy file header for maintainability.
|
||||
- **FIX 11**: `_unwrap_cmd()` recursive unwrapping — handles double/triple-wrapped `cmd` values at all 7 extraction paths. `_sanitize_tool_calls()` post-extraction validation layer ensures every tool call has valid name + args.
|
||||
- **FIX 11c**: XML regex fix — the `</tool_call)` pattern had an unbalanced parenthesis for ~4000 lines; it now uses `[)]?>` to match both `</tool_call>` and `</tool_call)>`.
|
||||
- **FIX 12**: Self-revive watchdog loop — auto-restarts proxy on crash (up to 50x, progressive backoff 1→30s). Controlled by `_SHUTDOWN_REQUESTED` flag on SIGTERM/SIGINT.
|
||||
- **FIX 13**: Fallback extraction when main parser returns empty but text contains tool-call signals (`{"cmd":`, `"type":"tool-call"`, `<tool`, `<function=`).
|
||||
- **FIX 14**: Parser for `<tool_call type="bash">\n{"command":"..."}` format (actual CC model output) + fixed fallback regex to match BOTH `"cmd"` AND `"command"` keys.
|
||||
- **FIX 15**: `<explore_agent>` blocks converted to a real `exec_command` with a synthesized curl-based repo exploration command.
|
||||
- **FIX 16**: `<bash>...</bash>` blocks parsed — extracts `prefix_rule`, `sandbox_permissions`, `justification` via line-oriented parsing.
|
||||
- **FIX 17**: DSML tool_call blocks — the **current CC model output format**:
|
||||
- `<||DSML||tool_calls>` wrapper
|
||||
- `<||DSML||invoke name="exec">` with `<||DSML||parameter name="command">` tags
|
||||
- Extracts command from `parameter name="command"` or fallback to `prefix_rule`
|
||||
- Maps `exec`/`bash` → `exec_command`
|
||||
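FIX 6 and FIX 11 can be sketched together. The functions below are hypothetical stand-ins for `_extract_args()` and `_unwrap_cmd()`: tier 2 uses CPython's undocumented `codecs.escape_decode`, tier 3 the `unicode_escape` codec (which assumes the escaped text is otherwise ASCII), and a decoded JSON *string* is treated as one more wrapping layer.

```python
import codecs
import json
from typing import Any, Callable, Dict, List, Optional

_DECODERS: List[Callable[[str], str]] = [
    lambda s: s,                                                           # tier 1: as-is
    lambda s: codecs.escape_decode(s.encode("utf-8"))[0].decode("utf-8"),  # tier 2
    lambda s: s.encode("utf-8").decode("unicode_escape"),                  # tier 3
]

def extract_args(raw: str, depth: int = 0) -> Optional[Dict[str, Any]]:
    """Return the argument dict hidden in `raw`, or None if no tier can decode it."""
    if depth > 3:
        return None
    for decode in _DECODERS:
        try:
            value = json.loads(decode(raw))
        except ValueError:  # covers JSONDecodeError and bad escapes/encodings
            continue
        if isinstance(value, dict):
            return value
        if isinstance(value, str):  # double-wrapped: a JSON string that holds JSON
            nested = extract_args(value, depth + 1)
            if nested is not None:
                return nested
    return None

def unwrap_cmd(cmd: Any, max_depth: int = 3) -> Any:
    """Peel nested {"cmd": ...} layers until a plain command string remains."""
    for _ in range(max_depth):
        if isinstance(cmd, str):
            if not cmd.strip().startswith("{"):
                return cmd
            inner = extract_args(cmd.strip())
            if inner is None or "cmd" not in inner:
                return cmd
            cmd = inner["cmd"]
        elif isinstance(cmd, dict) and "cmd" in cmd:
            cmd = cmd["cmd"]
        else:
            return cmd
    return cmd
```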
|
||||
### Debug Infrastructure
|
||||
- **Debug-to-file**: All proxy events, text_buf preview, parser results, and fallback attempts logged to `~/.cache/codex-proxy/cc-debug.log` — works even when stderr is piped by Codex Desktop.
|
||||
- **Inline self-test**: `--self-test` flag runs 19 tests covering unwrap, double-wrap, unescaped quotes, XML, function=, sanitizer edge cases.
|
||||
- **Per-request logging**: Event types, text_buf content, parser match results written to debug log for every request.
|
||||
|
||||
### AI Assist
|
||||
- AI Assist integration in launcher GUI for intelligent provider configuration and troubleshooting.
|
||||
|
||||
### Self-Revive Watchdog
|
||||
- Proxy auto-restarts on crash with progressive backoff (1s → 30s, up to 50 restarts).
|
||||
- Clean shutdown on SIGTERM/SIGINT via `_SHUTDOWN_REQUESTED` flag.
|
||||
- Eliminates manual proxy restart during long coding sessions.
|
||||
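A minimal sketch of the revive loop, assuming the server's blocking entry point (for example a `serve_forever()` call) is passed in as `serve`. The doubling schedule between 1s and 30s is an assumption, since the notes only give the endpoints.

```python
import signal
import time
from typing import Callable, Iterator

_SHUTDOWN_REQUESTED = False

def _request_shutdown(signum, frame) -> None:
    global _SHUTDOWN_REQUESTED
    _SHUTDOWN_REQUESTED = True

def install_signal_handlers() -> None:
    # SIGTERM/SIGINT stop the revive loop instead of triggering another restart
    signal.signal(signal.SIGTERM, _request_shutdown)
    signal.signal(signal.SIGINT, _request_shutdown)

def backoff_delays(max_restarts: int = 50, start_s: float = 1.0,
                   cap_s: float = 30.0) -> Iterator[float]:
    delay = start_s
    for _ in range(max_restarts):
        yield delay
        delay = min(delay * 2, cap_s)

def run_with_revive(serve: Callable[[], None], max_restarts: int = 50,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `serve`, restarting it after crashes; returns the number of restarts used."""
    restarts = 0
    delays = backoff_delays(max_restarts)
    while not _SHUTDOWN_REQUESTED:
        try:
            serve()
            return restarts  # serve() returned normally, e.g. after shutdown()
        except Exception as exc:
            delay = next(delays, None)
            if delay is None:
                raise  # restart budget exhausted: let the crash propagate
            restarts += 1
            print(f"proxy crashed ({exc!r}); restart {restarts} in {delay:.0f}s")
            sleep(delay)
    return restarts
```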
|
||||
### Other Improvements
|
||||
- `text_buf` in `cc_stream_to_sse` accumulates all `text-delta` events; parsing happens at end-of-stream for complete extraction.
|
||||
- Schema cache with 24h staleness TTL for provider capabilities.
|
||||
- ErrorAnalyzer learns from 4xx errors on retry (max 2 retries).
|
||||
- `cleanup-codex-stale.sh` updated with additional stale process patterns.
|
||||
|
||||
## v3.3.0 (2026-05-20)
|
||||
|
||||
**Antigravity + Gemini CLI OAuth — full Codex agent loop working**
|
||||
|
||||
### Gemini CLI OAuth + Antigravity OAuth
|
||||
- Split Google OAuth into separate Gemini CLI OAuth and Google Antigravity OAuth presets/backends.
|
||||
- Gemini CLI OAuth uses the Gemini CLI public OAuth client and Code Assist endpoints.
|
||||
- Antigravity OAuth uses Antigravity OAuth credentials, Code Assist daily/autopush/prod fallback, and Antigravity-style request wrapping.
|
||||
- Added Antigravity version discovery from the updater/changelog with local caching.
|
||||
- Added Antigravity model alias mapping from UI-facing `antigravity-*` IDs to upstream Code Assist model IDs.
|
||||
|
||||
### Responses API + Tool Flow
|
||||
- Added Gemini-style history hardening for Google OAuth requests: removes empty turns, coalesces adjacent roles, drops duplicate user repeats, and enforces user-start/user-end history.
|
||||
- Preserves function-call IDs across turns and adds synthetic `thoughtSignature` for historical Gemini function calls, matching Gemini CLI hardening behavior.
|
||||
- Fixed Antigravity streaming Responses API compatibility: single assistant message item, text done events, content part done, output item done, final completed event, and connection close.
|
||||
- Added `response.function_call_arguments.delta` and `response.function_call_arguments.done` events so Codex can execute Antigravity tool calls and create files.
|
||||
- Fixed functionResponse name matching — uses the original functionCall name instead of falling back to call_id.
|
||||
- Strengthened Antigravity prompt policy: use tools immediately for file changes, avoid planning-only responses, and answer directly when no suitable tool exists.
|
||||
- **Auto-continue on MAX_TOKENS** — when Gemini/Antigravity truncates a text response, the proxy transparently sends a continuation request and concatenates the output so Codex receives the complete response without manual intervention.
|
||||
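A sketch of the auto-continue loop, assuming a hypothetical `call(history)` that returns `(text, finish_reason)` for one upstream request and Gemini-style `contents` with `user`/`model` roles. `MAX_TOKENS` is Gemini's finish reason for truncated output; the continuation prompt wording and the `max_rounds` bound are assumptions.

```python
from typing import Callable, List, Tuple

Completion = Tuple[str, str]  # (text, finish_reason) from one upstream call

def generate_with_continuation(call: Callable[[List[dict]], Completion],
                               contents: List[dict], max_rounds: int = 3) -> str:
    parts: List[str] = []
    history = list(contents)
    for _ in range(max_rounds + 1):
        text, finish = call(history)
        parts.append(text)
        if finish != "MAX_TOKENS":
            break
        # Feed the partial answer back and ask the model to resume where it stopped
        history = history + [
            {"role": "model", "parts": [{"text": text}]},
            {"role": "user", "parts": [{"text": "Continue exactly where you left off."}]},
        ]
    return "".join(parts)
```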
|
||||
### Reliability + Routing
|
||||
- Added BGP++ route scoring, route cooldowns, token buckets, and persisted route stats.
|
||||
- Added provider policy layer and adaptive context compaction.
|
||||
- Added tool-call pairing validation/repair for orphaned tool outputs.
|
||||
- Added Endpoint Doctor in the endpoint editor.
|
||||
- Added log redaction helper for common API key/token patterns.
|
||||
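Token buckets are a standard way to cap request rates per route. The class below is a generic sketch, not the proxy's implementation; how BGP++ sizes `rate` and `capacity` for each route is not described here.

```python
import threading
import time
from typing import Callable

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends `cost`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so a fresh route can burst
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        with self._lock:
            now = self._clock()
            # Refill for the time elapsed since the last call, capped at capacity
            self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```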
|
||||
## v3.1.0 (2026-05-20)
|
||||
|
||||
- Initial Antigravity/Gemini CLI OAuth backend split.
|
||||
- Gemini-style history hardening, SSE streaming fixes.
|
||||
|
||||
## v3.0.0 (2026-05-20)
|
||||
|
||||
**Major architectural overhaul — Phase 0 + Phase 1 of engineering roadmap**
|
||||
|
||||
### Proxy (translate-proxy.py)
|
||||
- **ThreadingHTTPServer** — serves concurrent requests (no more blocking)
|
||||
- **Thread-safe shared state** — OrderedDict response store with locks, Crof state lock, stats lock
|
||||
- **Batched + atomic stats writes** — stats buffered in memory, flushed every 5s via `os.replace()`
|
||||
- **Graceful shutdown** — SIGTERM/SIGINT drain active connections (up to 5s), reject new with 503
|
||||
- **Progressive upstream timeouts** — based on input size and tools (60-300s instead of flat 180s)
|
||||
- **Lazy JSON parsing** — skip parsing SSE events unless they contain `response.completed`
|
||||
- **Buffered SSE writes** — flush every 30ms, on urgent events, or at 4KB (reduces syscalls)
|
||||
- **`/health` endpoint** — returns backend, target, models, BGP route count
|
||||
- **Consolidated imports** — all at top, no more missing import crashes
|
||||
- **`main()` entry point** — runtime init moved out of module level
|
||||
- **TCP_NODELAY** — on all streaming paths (from v2.7.0)
|
||||
- **Anthropic prompt caching** — `cache_control: ephemeral` on system prompts (from v2.7.0)
|
||||
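The flush policy can be sketched as a small writer, assuming a hypothetical `BufferedSSEWriter` that wraps the socket's write function. The urgent set includes `response.failed` and `error`, per the P3 notes above. The 30ms check here only runs when a new event arrives, so a real implementation would also need a timer for idle gaps.

```python
import time
from typing import Callable, List

URGENT_EVENTS = {"response.completed", "response.failed", "error"}

class BufferedSSEWriter:
    def __init__(self, write: Callable[[bytes], None], flush_bytes: int = 4096,
                 flush_interval_s: float = 0.030,
                 clock: Callable[[], float] = time.monotonic):
        self._write = write
        self._flush_bytes = flush_bytes
        self._interval = flush_interval_s
        self._clock = clock
        self._buf: List[bytes] = []
        self._size = 0
        self._last_flush = clock()

    def send(self, event: str, data: str) -> None:
        chunk = f"event: {event}\ndata: {data}\n\n".encode("utf-8")
        self._buf.append(chunk)
        self._size += len(chunk)
        # Flush on urgent events, a full buffer, or when the interval has elapsed
        if (event in URGENT_EVENTS or self._size >= self._flush_bytes
                or self._clock() - self._last_flush >= self._interval):
            self.flush()

    def flush(self) -> None:
        if self._buf:
            self._write(b"".join(self._buf))  # one write instead of one per event
            self._buf.clear()
            self._size = 0
        self._last_flush = self._clock()
```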
|
||||
### Launcher (codex-launcher-gui)
|
||||
- **Dynamic port allocation** — `_pick_free_port()` picks random free port, no more 8080 conflicts
|
||||
- **Proxy health gating** — Codex will NOT launch if proxy fails health check within 15s
|
||||
- **Error dialogs** — clear GTK error dialog when proxy startup fails
|
||||
- **Atomic config backup/restore** — temp file + `os.replace()`, no more corrupted config.toml
|
||||
- **Config transactions** — recovery from interrupted sessions on next startup
|
||||
- **Safe cleanup (PID registry)** — only kills processes launched by the app (pids.json)
|
||||
- **Proxy stderr piped to log** — real-time proxy logs in launcher UI
|
||||
- **Bearer token** — Codex config uses `codex-launcher-local` instead of the real API key
|
||||
- **Usage Dashboard v2** — OpenUsage-inspired dark theme with status pills, KPI strip, model bars (from v2.7.0)
|
||||
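Both the batched stats writer and the config backup/restore rely on the temp-file-plus-`os.replace()` pattern. A minimal sketch with a hypothetical `atomic_write_text()` helper; the temp file is created in the target directory because `os.replace()` only renames atomically within one filesystem.

```python
import os
import tempfile

def atomic_write_text(path: str, text: str) -> None:
    """Replace `path` so readers see either the old contents or the new, never a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same filesystem, which os.replace() needs for an atomic rename
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the rename makes it visible
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # never leave a stray temp file behind
        raise
```

One side effect: `mkstemp()` creates the file with mode 0600, so the replaced file ends up owner-only.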
|
||||
## v2.7.0 (2026-05-20)
|
||||
|
||||
- **Usage Dashboard redesigned** (inspired by OpenUsage design patterns)
|
||||
- Deep Space dark theme with Catppuccin-inspired color palette
|
||||
- Header with animated status dots (OK/WARN/ERR provider health)
|
||||
- KPI summary strip: total providers, requests, token volume, avg latency
|
||||
- Provider cards with colored borders matching health status
|
||||
- Status pills: OK (green), WARN (yellow), ERR (red)
|
||||
- Colored section separators per metric type (Usage=yellow, Models=lavender)
|
||||
- Model composition bar: stacked horizontal segments per model share
|
||||
- Per-model breakdown with mini progress bars, percentage, request counts
|
||||
- Per-model token breakdown (in/out) when available
|
||||
- Token formatting: 1.2M, 45.3K instead of raw numbers
|
||||
- Duration formatting: 1.5h, 3.2m instead of raw seconds
|
||||
- Error section with warning icon
|
||||
|
||||
- **TCP_NODELAY streaming optimization**
|
||||
- Disables Nagle's algorithm on streaming connections
|
||||
- Reduces per-packet latency by up to 40ms on small SSE events
|
||||
- Applied to all 4 streaming code paths (openai-compat, retry, command-code, generic)
|
||||
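Disabling Nagle's algorithm is a single socket option. A sketch, assuming access to the raw socket: in a `BaseHTTPRequestHandler` that is `self.connection`, and for an `http.client` connection it is `conn.sock` once connected.

```python
import socket

def enable_nodelay(sock: socket.socket) -> None:
    """Send small writes (such as single SSE events) immediately instead of coalescing them."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# In a BaseHTTPRequestHandler:        enable_nodelay(self.connection)
# For an http.client.HTTPSConnection: conn.connect(); enable_nodelay(conn.sock)
```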
|
||||
- **Anthropic prompt caching**
|
||||
- System prompts now sent as `cache_control: ephemeral` structured format
|
||||
- Enables Anthropic's automatic prompt caching (saves tokens + cost on repeated prompts)
|
||||
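The structured system prompt can be sketched as below. `build_anthropic_request()` is a hypothetical helper, but the `system` block list and `cache_control: {"type": "ephemeral"}` follow the Anthropic Messages API. Prompts shorter than the model's minimum cacheable length are simply processed without caching.

```python
from typing import Any, Dict, List

def build_anthropic_request(model: str, system_prompt: str,
                            messages: List[Dict[str, Any]],
                            max_tokens: int = 4096) -> Dict[str, Any]:
    return {
        "model": model,
        "max_tokens": max_tokens,
        # A list of text blocks instead of a plain string, so the block can carry cache_control
        "system": [
            {"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}},
        ],
        "messages": messages,
    }
```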
|
||||
## v2.6.1 (2026-05-20)
|
||||
|
||||
- **Google OAuth rebuilt to emulate Gemini CLI**
|
||||
- Uses Google's public OAuth client_id (same as gemini-cli)
|
||||
- No `client_secret.json` needed — zero setup required
|
||||
- PKCE (S256 code challenge) + CSRF state protection
|
||||
- Scopes: cloud-platform, generative-language, userinfo.email, userinfo.profile
|
||||
- Redirects to Google's success/failure pages (same as gemini-cli)
|
||||
- Just click "OAuth Login" → browser opens → authorize → done
|
||||
- Token file permissions set to 0600 for security
|
||||
|
||||
## v2.6.0 (2026-05-20)
|
||||
|
||||
- **Usage Dashboard** — per-provider tracking with visual cards
|
||||
|
||||
382
README.md
382
README.md
@@ -15,7 +15,7 @@
|
||||
|
||||
<p align="center">
|
||||
<strong>Run OpenAI Codex CLI & Desktop with <em>any</em> AI provider.</strong><br/>
|
||||
OpenCode • Z.AI • Anthropic • Command Code • OpenRouter • Crof.ai • NVIDIA NIM • Kilo.ai • and more
|
||||
Google Antigravity • Gemini CLI • OpenCode • Z.AI • Anthropic • Command Code • Codebuff • OpenRouter • Crof.ai • NVIDIA NIM • OpenAdapter • Kilo.ai • DeepSeek • and more
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
@@ -32,6 +32,10 @@
|
||||
<img src="https://img.shields.io/badge/Command_Code-✓-success" />
|
||||
<img src="https://img.shields.io/badge/Streaming_SSE-✓-success" />
|
||||
<img src="https://img.shields.io/badge/Tool_Calls-✓-success" />
|
||||
<img src="https://img.shields.io/badge/AI_Assist-✓-success" />
|
||||
<img src="https://img.shields.io/badge/Intelligence_Routing-✓-success" />
|
||||
<img src="https://img.shields.io/badge/AI_Monitoring-✓-success" />
|
||||
<img src="https://img.shields.io/badge/Self_Revive_Watchdog-✓-success" />
|
||||
</p>
|
||||
|
||||
---
|
||||
@@ -43,14 +47,16 @@ OpenAI's Codex CLI v2.0+ exclusively uses the **Responses API** — a protocol t
|
||||
| Provider | API | Works with Codex? |
|
||||
|----------|-----|:-:|
|
||||
| OpenAI | Responses API | ✅ |
|
||||
| Z.AI | Chat Completions | ❌ |
|
||||
| OpenCode | Chat Completions | ❌ |
|
||||
| Anthropic | Messages API | ❌ |
|
||||
| Command Code | Custom `/alpha/generate` | ❌ |
|
||||
| Ollama | Chat Completions | ❌ |
|
||||
| OpenRouter | Chat Completions | ❌ |
|
||||
| NVIDIA NIM | Chat Completions | ❌ |
|
||||
| Crof.ai | Chat Completions | ❌ |
|
||||
| Google Antigravity (OAuth) | Code Assist / Gemini Native | ✅ |
|
||||
| Gemini CLI OAuth | Code Assist | ✅ |
|
||||
| Z.AI | Chat Completions | ✅ |
|
||||
| OpenCode | Chat Completions | ✅ |
|
||||
| Anthropic | Messages API | ✅ |
|
||||
| Command Code | Custom `/alpha/generate` | ✅ |
|
||||
| Ollama | Chat Completions | ✅ |
|
||||
| OpenRouter | Chat Completions | ✅ |
|
||||
| NVIDIA NIM | Chat Completions | ✅ |
|
||||
| Crof.ai | Chat Completions | ✅ |
|
||||
|
||||
The protocols differ in **endpoint paths**, **message formats**, **tool-call structures**, **streaming events**, and **completion semantics**. You can't just swap a base URL.
|
||||
|
||||
@@ -65,23 +71,23 @@ A three-component system:
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Codex Launcher GUI │
|
||||
│ (endpoint management + lifecycle) │
|
||||
│ (endpoint management + AI Assist + lifecycle) │
|
||||
└──────────┬─────────────────┬──────────────────┬────────────────────┘
|
||||
│ │ │
|
||||
┌──────▼──────┐ ┌──────▼──────┐ ┌────────▼─────────┐
|
||||
│ Codex │ │ Native │ │ Translation │
|
||||
│ Default │ │ OpenAI │ │ Proxy │
|
||||
│ (remove │ │ (direct │ │ (port 8080) │
|
||||
│ (remove │ │ (direct │ │ (auto-revive) │
|
||||
│ config) │ │ URL) │ │ │
|
||||
└──────┬──────┘ └──────┬──────┘ └────────┬─────────┘
|
||||
│ │ │
|
||||
▼ ▼ ┌────────┴────────┐
|
||||
┌──────────────┐ ┌───────────┐ │ │
|
||||
│ Built-in │ │ config. │ ▼ ▼
|
||||
│ Codex OAuth │ │ toml │ ┌────────────┐ ┌───────────┐
|
||||
└──────────────┘ └───────────┘ │ OpenAI │ │ Anthropic │
|
||||
│ Chat Comp. │ │ Messages │
|
||||
└────────────┘ └───────────┘
|
||||
│ Codex OAuth │ │ toml │ ┌────────────┐ ┌───────────┐ ┌──────────┐
|
||||
└──────────────┘ └───────────┘ │ OpenAI │ │ Anthropic │ │ Command │
|
||||
│ Chat Comp. │ │ Messages │ │ Code │
|
||||
└────────────┘ └───────────┘ └──────────┘
|
||||
```
|
||||
|
||||
---
|
||||
@@ -103,20 +109,70 @@ A three-component system:
|
||||
- **Browser UA injection** — bypasses Cloudflare bot detection for providers like OpenCode
|
||||
- **Smart URL construction** — prevents double-path bugs (`/v1/chat/completions/chat/completions`)
|
||||
- **Header forwarding** — preserves client identity headers while filtering hop-by-hop headers
|
||||
- **Connection pooling** — persistent HTTPS connections per host, eliminates TLS handshake overhead per request
|
||||
- **Stream idle timeout** — kills stalled upstream connections after 5 minutes of silence
|
||||
- **Retry-After support** — respects upstream `Retry-After` headers on 429/502/503 responses
|
||||
- **Response store TTL** — evicts stored responses older than 10 minutes, prevents memory leaks
|
||||
- **Bounded stream buffers** — 8MB cap prevents OOM on pathological responses
|
||||
- **Dual logging** — all proxy messages written to both stderr and `~/.cache/codex-proxy/proxy.log`
|
||||
- Zero dependencies — pure Python stdlib
|
||||
|
||||
### Command Code Adapter
|
||||
- **Multi-format tool-call parser** — handles all known CC model output formats in a cascading chain:
|
||||
- DSML tags (`<||DSML||invoke>`) — current model format
|
||||
- `<bash>...</bash>` blocks with metadata extraction
|
||||
- `<explore_agent>` blocks converted to real `exec_command`
|
||||
- `<tool_call type="bash">` HTML-like blocks
|
||||
- XML `<function=` patterns
|
||||
- Raw JSON `{"cmd":"..."}` embedded in text
|
||||
- Fallback regex for unrecognized tool-call signals
|
||||
- **Three-tier argument parser** — handles double-wrapped, escaped, and unicode-escaped arguments
|
||||
- **Recursive unwrapping** — handles double/triple-wrapped `cmd` values
|
||||
- **Post-extraction sanitizer** — validates every tool call has valid name + args before forwarding to Codex
|
||||
- **ErrorAnalyzer** — learns from 4xx errors, retries with adjusted parameters (max 2 retries)
|
||||
- **Schema cache** with 24h staleness TTL for provider capabilities
|
||||
|
||||
### Intelligence Routing (v3.7.0)
|
||||
- **Three-layer self-healing system** — the agent loop never stalls, even when the model speaks gibberish
|
||||
- **Layer 1 — Deep URL Extraction**: When `<explore_agent>` hides URLs inside nested JSON (`messages: [{"content": "https://..."}]`), the parser drills into the JSON structure to find them. Module-level `_build_explore_cmd()` is reused across parser + stream path.
|
||||
- **Layer 2 — Escalation Auto-Proceed**: `<require_escalation>` and `<request_escalation_permission>` blocks are detected and auto-resolved — the model doesn't get stuck waiting for permissions that don't exist.
|
||||
- **Layer 3 — Intent-Based Command Synthesis**: When ALL parsers fail, 5 heuristics analyze the model's plain-text output and synthesize a working command:
|
||||
1. URL detected → `curl` it
|
||||
2. File path mentioned → `cat` or `ls` it
|
||||
3. Shell command in quotes → extract and run it
|
||||
4. "explore"/"fetch" intent → use the last URL the user mentioned
|
||||
5. "I need to"/"let me" intent → echo a diagnostic so the loop continues
|
||||
- **Session URL memory** — `_last_user_urls` deque (20 entries) tracks URLs from user messages across the session, giving the synthesizer context to work with
|
||||
- **54 self-test patterns** — comprehensive coverage of all three layers
|
||||
|
||||
### AI Monitoring (v3.8.0)
|
||||
- **Self-healing watchdog** — auto-recovers from proxy crashes, stuck model turns, upstream failures, and more
|
||||
- **Three-tier response system**: Tier 1 = rule-based (< 1s), Tier 2 = pattern lookup (< 100ms), Tier 3 = AI diagnostic agent (2-5s)
|
||||
- **HealthWatcher thread** — pings proxy `/health` every 5 seconds, auto-restarts on crash
|
||||
- **LogAnalyzer thread** — tails debug logs for 18 failure signal patterns in real-time
|
||||
- **14 Tier 1 rules** — restart proxy, clear schema cache, kill stale processes, retry with backoff, rate limit handling
|
||||
- **Incident pattern store** — learns from every resolved incident, looks up known fixes by success rate
|
||||
- **AI diagnostic agent** — user-configurable provider/model (e.g., Gemini Flash, GPT-4o-mini, local Ollama) for diagnosing novel failures
|
||||
- **30 fault types** catalogued across 5 categories: proxy failures (A), upstream errors (B), parser failures (C), Codex process failures (D), config/state failures (E)
|
||||
- **Safety guards** — rate-limited AI calls, restart caps (5/10min), cooldown per pattern, monthly budget cap
|
||||
- **GUI panel** — ON/OFF toggle, provider/model/API key selector, health check interval, auto-restart toggle, incident log viewer
|
||||
- **Enhanced `/health`** — returns `uptime_s`, `memory_mb`, `requests_total` for monitoring
|
||||
|
||||
### GTK Launcher (`codex-launcher-gui`)
|
||||
- **Endpoint manager** — add, edit, delete, set default providers
|
||||
- **Provider presets** — one-click setup for 10+ providers with pre-filled URLs and model lists
|
||||
- **Provider presets** — one-click setup for 15+ providers with pre-filled URLs and model lists
|
||||
- **Model auto-fetch** — pulls available models directly from provider APIs
|
||||
- **Bulk model import** — paste a comma/newline-separated list of model IDs
|
||||
- **Launch Desktop** — starts Codex Desktop with the selected provider and model
|
||||
- **Launch CLI** — opens Codex CLI in a terminal with the selected provider
|
||||
- **Codex Default** — launch with built-in OAuth, no proxy or custom config
|
||||
- **AI Assist** — integrated AI-powered configuration assistance and troubleshooting
|
||||
- **Usage Dashboard** — per-provider tracking with dark theme, KPI strip, model bars, status pills
|
||||
- **Profile backup/import** — export and import endpoint configurations as portable JSON bundles
|
||||
- **Threaded operations** — model refresh runs in background, UI stays responsive
|
||||
- **Process lifecycle** — stall detection, kill/cleanup, config backup/restore around sessions
|
||||
- **Config normalization** — automatically strips stale API path suffixes from URLs
|
||||
- **Reasoning controls** — per-provider reasoning toggle with effort level selection
|
||||
|
||||
### Process Management
|
||||
- Kills stale electron/webview/app-server processes from previous sessions
|
||||
@@ -266,6 +322,153 @@ codex-launcher-gui
|
||||
2. On launch: backup config → **delete** `config.toml` entirely → start Codex → restore config after exit
|
||||
3. Key insight: writing empty strings (`model = ""`, `model_provider = ""`) causes Codex to error with "Model provider `` not found". The config must not exist at all for Codex to fall back to built-in defaults.
|
||||
|
||||
### Phase 7: Command Code Multi-Format Parser — The 17-Fix Odyssey
|
||||
|
||||
**Problem:** Command Code provider's tool calls were silently dropped, causing the Codex agent loop to stop after the first response. The CC model returns tool calls as inline text in wildly varying formats that change between sessions and model versions.
|
||||
|
||||
**Root Cause Analysis:**
|
||||
1. CC's `/alpha/generate` API uses a proprietary protocol — not Chat Completions, not Anthropic Messages
|
||||
2. Tool calls appear as inline text within `text-delta` SSE events, not as structured JSON
|
||||
3. The model output format is **non-deterministic** — observed 6+ distinct formats:
|
||||
- Raw JSON: `{"cmd":"mkdir -p /foo","type":"tool-call"}`
|
||||
- XML: `<function name="exec_command"><parameter name="cmd">...</parameter></function>`
|
||||
- HTML-like: `<tool_call type="bash">\n{"command":"..."}`
|
||||
- Bash blocks: `<bash>\nprefix_rule: ...\n{"command":"..."}</bash>`
|
||||
- Explore blocks: `<explore_agent>...</explore_agent>`
|
||||
- DSML tags: `<||DSML||invoke name="exec"><||DSML||parameter name="command">...</parameter></invoke>`
|
||||
4. Additional complications: double-wrapped arguments, unescaped quotes, unicode escapes, missing fields
|
||||
|
||||
**The Fix — 17 Incremental Patches:**
|
||||
Built a cascading parser chain (`DSML → bash → explore → tool_call → XML → raw JSON → fallback regex`) that tries each format in order. Each patch addressed a specific format observed in production:
|
||||
|
||||
- **FIX 1–4**: Foundation — string-only content, version headers, cache clearing, streaming error handling
|
||||
- **FIX 5–8**: Core parsing — raw JSON extraction, three-tier argument parser, field extraction, permission normalization
|
||||
- **FIX 9–10**: Cleanup — removed dead code, added documentation
|
||||
- **FIX 11–11c**: Robustness — recursive unwrapping of nested cmd values, post-extraction sanitizer, XML regex fix
|
||||
- **FIX 12**: Self-revive watchdog — proxy auto-restarts on crash instead of dying silently
|
||||
- **FIX 13–17**: New format support — fallback extraction, HTML-like blocks, explore blocks, bash blocks, DSML tags
|
||||
|
||||
**Key Design Decision:** Field-level regex extraction instead of JSON parsing. Standard JSON parsers fail on unescaped quotes in shell commands (e.g., `echo "hello world"` breaks JSON). The regex approach tolerates malformed JSON by extracting individual fields.
|
||||
|
||||
**Verification:** `--self-test` flag runs 19 automated tests covering all edge cases. Debug logging to `~/.cache/codex-proxy/cc-debug.log` captures every parser decision for troubleshooting.
|
||||
|
||||
### Phase 8: Intelligence Routing — When the Model Refuses to Speak Machine
|
||||
|
||||
**Problem:** The 17-fix parser chain from Phase 7 was powerful — it could handle DSML, XML, JSON, bash blocks, explore tags, you name it. But there was one edge case it couldn't crack: **when the model doesn't produce a parseable tool-call format at all**.
|
||||
|
||||
In production, `deepseek/deepseek-v4-flash` via Command Code kept doing things like:
|
||||
|
||||
```
|
||||
<explore_agent>
|
||||
messages: [{"content": "Understand the Z.AI-Chat-for-Android repo at https://..."}]
|
||||
</explore_agent>
|
||||
```
|
||||
|
||||
or:
|
||||
|
||||
```
|
||||
<require_escalation>
|
||||
I need elevated permissions to access the repository.
|
||||
</require_escalation>
|
||||
```
|
||||
|
||||
or just plain English: *"I need to fetch the README from the repository to understand the app structure."*
|
||||
|
||||
In every case, `parsed_tool_calls=0`. No tool to execute. The Codex agent loop ground to a halt. The user saw "thinking..." forever.
|
||||
|
||||
**The insight:** The model is trying to communicate *intent*, just not in a format we can parse. Instead of adding more regex patterns, what if we could **read the model's mind** — understand what it *wants* to do, and synthesize the command for it?
|
||||
|
||||
**Intelligence Routing — Three Layers of Escalation:**
|
||||
|
||||
```
|
||||
Layer 1: "Fix the input" — Can we extract more from what the model gave us?
|
||||
Layer 2: "Handle the intent" — Is the model asking for something we can auto-resolve?
|
||||
Layer 3: "Read the mind" — What is the model trying to do? Just do it for it.
|
||||
```
|
||||
|
||||
**Layer 1 — Deep URL Extraction (FIX 23):**
|
||||
|
||||
The `<explore_agent>` handler had a URL regex, but the URL was trapped inside `{"content": "https://..."}` — the trailing `"` broke matching. The fix: after the initial regex fails, `json.loads()` the entire block, walk the JSON tree, and pull URLs out of `content` fields. The `_build_explore_cmd()` function was extracted to module level so both the parser and the stream handler could use it.
|
||||
|
||||
```python
|
||||
# Before: regex fails, URL lost
|
||||
# After: json.loads -> iterate items -> extract content -> find URL
|
||||
```
|
||||
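The JSON walk can be sketched as follows. `urls_from_explore_block()` is a hypothetical helper: it parses from the first `[` or `{` in the block, collects URLs from every string in the tree, and only falls back to a plain regex scan if that yields nothing (a simplification of the regex-first order described above).

```python
import json
import re
from typing import Any, Iterator, List

URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def _walk_strings(node: Any) -> Iterator[str]:
    # Yield every string value anywhere in a decoded JSON tree
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _walk_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk_strings(item)

def urls_from_explore_block(body: str) -> List[str]:
    """Find URLs in an <explore_agent> body, drilling into embedded JSON if present."""
    urls: List[str] = []
    # Bodies often look like `messages: [...]`, so parse from the first bracket/brace on
    start = min((i for i in (body.find("["), body.find("{")) if i >= 0), default=-1)
    if start >= 0:
        try:
            for text in _walk_strings(json.loads(body[start:])):
                urls.extend(URL_RE.findall(text))
        except json.JSONDecodeError:
            pass
    if not urls:
        urls = URL_RE.findall(body)  # fallback: plain scan of the raw text
    return urls
```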
|
||||
**Layer 2 — Escalation Auto-Proceed (FIX 24):**
|
||||
|
||||
`<require_escalation>` blocks are the model's way of saying "I need more permissions." The CC adapter doesn't have an escalation mechanism — these blocks were silently dropped. The fix: detect them (both closed `<tag>...</tag>` and bare `<tag />` forms), extract any URL inside them, and auto-proceed with an explore command or a diagnostic echo.
|
||||
|
||||
```python
|
||||
# Model: <require_escalation>Please let me run curl</require_escalation>
|
||||
# Proxy: Okay, here's your curl command → exec_command synthesized
|
||||
```
|
||||
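A sketch of the detection, covering both the closed and the self-closing forms. `auto_proceed_command()` and the plain `curl` command it emits are assumptions; the proxy itself builds its explore command with `_build_explore_cmd()`, whose output is not shown here.

```python
import re
import shlex
from typing import Optional

# Matches <tag>...</tag> and the bare <tag /> form for either escalation tag
ESCALATION_RE = re.compile(
    r"<(require_escalation|request_escalation_permission)\b[^>]*?(?:/>|>(.*?)</\1\s*>)",
    re.S,
)
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def auto_proceed_command(text: str) -> Optional[str]:
    """Return a command that lets the turn continue, or None if no escalation block."""
    m = ESCALATION_RE.search(text)
    if not m:
        return None
    inner = m.group(2) or ""  # empty for the self-closing form
    url = URL_RE.search(inner)
    if url:
        return f"curl -fsSL {shlex.quote(url.group(0))}"
    return "echo " + shlex.quote("escalation not available; continuing without it")
```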
|
||||
**Layer 3 — Intent-Based Command Synthesis (FIX 25):**
|
||||
|
||||
The crown jewel. When ALL parsers return empty — no DSML, no XML, no JSON, no fallback regex matches — the system doesn't give up. It analyzes the model's raw text through **5 heuristic lenses** in priority order:
|
||||
|
||||
| Priority | Signal | Synthesized Command |
|
||||
|:--------:|--------|---------------------|
|
||||
| 1 | URL in text | `curl` to fetch it |
|
||||
| 2 | File path reference | `cat` or `ls` the file |
|
||||
| 3 | Shell command in backticks/quotes | Extract and run it |
|
||||
| 4 | "explore"/"fetch" + last user URL | Full explore command |
|
||||
| 5 | "I need to"/"let me" intent | Echo diagnostic |
|
||||
|
||||
The system also maintains a **session URL memory** (`_last_user_urls`, a deque of the last 20 URLs from user messages), so heuristic 4 can fall back on a URL the user mentioned earlier in the session even when the model's text doesn't contain one.
|
||||
|
||||
```python
|
||||
# Model: "I should explore the repository to understand its structure."
|
||||
# Parser: empty (no parseable format)
|
||||
# Layer 3 heuristic 4: "explore" detected, pulling URL from session memory...
|
||||
# Result: exec_command with full curl pipeline
|
||||
```
|
||||
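The five heuristics and the session URL memory can be sketched as one function. Everything here is hypothetical: the regexes are deliberately simple, heuristic 2's `ls`-versus-`cat` choice assumes a trailing slash marks a directory, and heuristic 4 emits a plain `curl` rather than the full explore pipeline.

```python
import re
import shlex
from collections import deque
from typing import Optional

URL_RE = re.compile(r"https?://[^\s\"'<>)]+")
PATH_RE = re.compile(r"(?<![\w/])(\.{0,2}/[\w.\-/]+)")
BACKTICK_CMD_RE = re.compile(r"`([^`\n]+)`")

_last_user_urls: deque = deque(maxlen=20)  # session URL memory

def remember_user_message(text: str) -> None:
    _last_user_urls.extend(URL_RE.findall(text))

def synthesize_command(model_text: str) -> Optional[str]:
    lowered = model_text.lower()
    # 1. URL in the model's text -> fetch it
    m = URL_RE.search(model_text)
    if m:
        return f"curl -fsSL {shlex.quote(m.group(0))}"
    # 2. File path -> list a directory or print a file
    m = PATH_RE.search(model_text)
    if m:
        path = m.group(1)
        return f"{'ls -la' if path.endswith('/') else 'cat'} {shlex.quote(path)}"
    # 3. Shell command in backticks -> run it as written
    m = BACKTICK_CMD_RE.search(model_text)
    if m:
        return m.group(1).strip()
    # 4. explore/fetch intent -> reuse the most recent URL the user mentioned
    if ("explore" in lowered or "fetch" in lowered) and _last_user_urls:
        return f"curl -fsSL {shlex.quote(_last_user_urls[-1])}"
    # 5. Generic intent -> diagnostic echo so the agent loop keeps moving
    if "i need to" in lowered or "let me" in lowered:
        return "echo " + shlex.quote("no command synthesized for: " + model_text[:120])
    return None
```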
|
||||
**The result:** Before Intelligence Routing, `parsed_tool_calls=0` meant **game over** — the agent loop stalled permanently. After Intelligence Routing, `parsed_tool_calls=0` triggers the self-healing chain and the loop **always** gets a tool call to execute. The model can speak in tongues and the system still works.
|
||||
|
||||
**Test coverage:** 54 self-test patterns (up from 41), with 13 new tests specifically for Intelligence Routing layers.
|
||||
|
||||
### Phase 9: AI Monitoring — The Watchman That Never Sleeps
|
||||
|
||||
**Problem:** Intelligence Routing (Phase 8) handles failures *inside a single request*. But it can't detect a dead proxy process, reconnect Codex to a restarted proxy, switch to a backup provider when the primary is down, or clear corrupt caches. When the proxy crashes at 3 AM, the user wakes up to a broken Codex session and has to manually restart everything.
|
||||
|
||||
**The insight:** We needed a separate watchdog process that runs *outside* the proxy — monitoring it from the outside, like a night watchman patrolling a building. But a dumb watchdog that just restarts on crash is crude. What if the watchdog could *think* — diagnose *why* the proxy crashed and take the right corrective action?
|
||||
|
||||
**The Three-Tier Response System:**
|
||||
|
||||
```
|
||||
Failure Detected
|
||||
│
|
||||
├── Tier 1: Known pattern? → Rule-based fix (< 1 second)
|
||||
│ "proxy dead" → restart_proxy
|
||||
│ "429 rate limit" → wait_retry_after
|
||||
│ "schema corrupt" → delete_provider_caps
|
||||
│
|
||||
├── Tier 2: Seen this before? → Incident store lookup (< 100ms)
|
||||
│ 85% success rate → reuse the fix that worked last time
|
||||
│
|
||||
└── Tier 3: Novel failure? → AI diagnostic agent (2-5 seconds)
|
||||
Feed context to cheap LLM → get recommended action
|
||||
Learn from result for next time
|
||||
```
|
||||
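The tier selection can be sketched as a small dispatcher. The Tier 1 rule names come from the diagram above; the Tier 2 thresholds (at least 3 attempts, 80% success), the in-memory store in place of `incident-store.json`, and the `ask_ai` callback standing in for the Tier 3 agent are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

TIER1_RULES: Dict[str, str] = {
    "proxy_dead": "restart_proxy",
    "rate_limit_429": "wait_retry_after",
    "schema_corrupt": "delete_provider_caps",
}

@dataclass
class IncidentStats:
    fix: str
    attempts: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

@dataclass
class Dispatcher:
    ask_ai: Callable[[str], str]  # stands in for the Tier 3 diagnostic agent
    store: Dict[str, IncidentStats] = field(default_factory=dict)
    min_attempts: int = 3         # assumption
    min_rate: float = 0.8         # assumption

    def choose_action(self, pattern: str) -> Tuple[int, str]:
        if pattern in TIER1_RULES:  # Tier 1: fixed rule
            return 1, TIER1_RULES[pattern]
        known = self.store.get(pattern)  # Tier 2: learned fix
        if known and known.attempts >= self.min_attempts and known.success_rate >= self.min_rate:
            return 2, known.fix
        return 3, self.ask_ai(pattern)  # Tier 3: novel failure

    def record(self, pattern: str, fix: str, success: bool) -> None:
        stats = self.store.get(pattern)
        if stats is None or stats.fix != fix:  # a different fix starts a fresh tally
            stats = self.store[pattern] = IncidentStats(fix=fix)
        stats.attempts += 1
        stats.successes += int(success)
```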
|
||||
**What makes this different from existing solutions:**
|
||||
|
||||
Existing proxy tools (ccLoad, cc-proxy, codex-pool) all focus on routing and failover at the *request* level. None have an AI-powered diagnostic agent that analyzes failure context and recommends corrective actions. ccLoad has health checks and cooldowns, but it's purely rule-based. AI Monitoring adds the *intelligence* layer on top — the Tier 3 agent can diagnose novel failures that no rule covers.
|
||||
|
||||
**How it works:**
|
||||
|
||||
Two threads run in the GUI process:
|
||||
1. `HealthWatcher` — pings `/health` every 5 seconds. On 3 consecutive failures, triggers Tier 1 `restart_proxy`.
|
||||
2. `LogAnalyzer` — tails the debug log file, watching for 18 signal patterns. Counts consecutive failures per category. When a threshold is hit (e.g., 5x stuck recovery, 3x server error), triggers the appropriate tier.
|
||||
|
||||
The AI diagnostic agent (Tier 3) is fully configurable — the user picks any provider and model. A cheap model like Gemini Flash (~$0.0002/call) or a free local Ollama instance works perfectly. The agent receives a structured incident report (proxy health, upstream status, recent errors, parser state) and responds with one JSON action.

**Learning over time:** Every resolved incident is stored in `incident-store.json` as a pattern → fix → success-rate record. Over time, the system shifts from Tier 3 (expensive AI calls) to Tier 2 (instant pattern lookup). A failure that has been seen 10 times with a 90% success rate never reaches the AI again.
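
The Tier 2 lookup can be pictured as a small per-pattern scoreboard. This is a minimal sketch, assuming `incident-store.json` maps pattern → fix → attempt/success counts. That file layout, the 3-attempt minimum and the `switch_provider` action are hypothetical. The 85% cut-off mirrors the diagram above.

```python
import json
from pathlib import Path
from typing import Optional


class IncidentStore:
    """pattern -> fix -> {"attempts": n, "successes": k}, optionally persisted."""

    def __init__(self, path: Optional[Path] = None,
                 min_attempts: int = 3, min_success: float = 0.85) -> None:
        self.path = path
        self.min_attempts, self.min_success = min_attempts, min_success
        self.data: dict = json.loads(path.read_text()) if path and path.exists() else {}

    def record(self, pattern: str, fix: str, succeeded: bool) -> None:
        """Update a fix's track record after it was applied."""
        stats = self.data.setdefault(pattern, {}).setdefault(fix, {"attempts": 0, "successes": 0})
        stats["attempts"] += 1
        stats["successes"] += int(succeeded)
        if self.path:
            self.path.write_text(json.dumps(self.data, indent=2))

    def lookup(self, pattern: str) -> Optional[str]:
        """Best proven fix for the pattern, or None to escalate to Tier 3."""
        best, best_rate = None, 0.0
        for fix, stats in self.data.get(pattern, {}).items():
            rate = stats["successes"] / stats["attempts"]
            if stats["attempts"] >= self.min_attempts and rate >= self.min_success and rate > best_rate:
                best, best_rate = fix, rate
        return best


store = IncidentStore()  # in-memory; pass Path("incident-store.json") to persist
for ok in [True] * 9 + [False]:  # seen 10 times, 90% success
    store.record("upstream 500", "switch_provider", ok)
print(store.lookup("upstream 500"))   # switch_provider
print(store.lookup("novel failure"))  # None -> goes to the AI
```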

We **catalogued 30 fault types** across 5 categories by analyzing 42 production `parsed_tool_calls=0` events, 13 stuck recoveries, and 11 sanitizer flags from our actual debug logs, so the system knows exactly what to look for.

---

## Architecture Deep Dive
@@ -332,6 +535,67 @@ The launcher generates model catalog JSON with dual field naming to satisfy both

---

## Gemini Antigravity State Continuity

Codex Launcher includes special handling for Gemini 3 / Antigravity OAuth:

- **Thought signature preservation**: Captures `thoughtSignature` values from Gemini responses
  and reattaches them on follow-up requests to maintain tool-call continuity.
- **Edit-intent detection**: When a follow-up request contains edit keywords, a tool-use
  nudge is injected to prevent text-only responses.
- **User instruction enforcement**: The latest user message is guaranteed to be the
  final content turn sent to Gemini, even after compaction.
- **Smart compaction**: Older tool outputs are capped at 3,000 characters; the 6 most recent are capped at 20,000 characters.
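
Two of those rules can be sketched over a plain OpenAI-style message list: the compaction limits, and keeping the user's instruction last. This is an illustration under assumptions. The proxy's real Gemini request format differs. The function names and the truncation marker are hypothetical. Re-appending a copy of the latest user message, rather than moving it, is also an assumed choice.

```python
OLD_CAP, RECENT_CAP, RECENT_COUNT = 3000, 20000, 6  # limits from the list above


def compact_tool_outputs(messages: list[dict]) -> list[dict]:
    """Cap tool outputs: the 6 most recent at 20,000 chars, older ones at 3,000."""
    tool_positions = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    recent = set(tool_positions[-RECENT_COUNT:])
    compacted = []
    for i, msg in enumerate(messages):
        if msg.get("role") == "tool":
            cap = RECENT_CAP if i in recent else OLD_CAP
            text = msg.get("content", "")
            if len(text) > cap:
                # Copy instead of mutating the caller's history.
                msg = {**msg, "content": f"{text[:cap]}\n[truncated {len(text) - cap} chars]"}
        compacted.append(msg)
    return compacted


def ensure_user_last(messages: list[dict]) -> list[dict]:
    """Make the latest user message the final turn (here: by re-appending a copy)."""
    user_positions = [i for i, m in enumerate(messages) if m.get("role") == "user"]
    if not user_positions or user_positions[-1] == len(messages) - 1:
        return messages
    return messages + [dict(messages[user_positions[-1]])]
```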

---

## Multi-Account Rotation

Codex Launcher supports **multiple accounts per provider** with automatic rotation when one account is rate-limited.
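
Rotation like this is commonly a round-robin that skips accounts still cooling down after a rate-limit response. The sketch below assumes exactly that. `AccountRotator`, the default 60-second cooldown and the injectable clock are hypothetical. The real proxy may honor the provider's `Retry-After` value instead.

```python
import time
from typing import Callable, Optional


class AccountRotator:
    """Round-robin over accounts, skipping any that are rate-limited."""

    def __init__(self, accounts: list[str], clock: Callable[[], float] = time.monotonic) -> None:
        self.accounts = list(accounts)
        self.clock = clock
        self.cooldown_until = dict.fromkeys(self.accounts, 0.0)
        self.next_index = 0

    def pick(self) -> Optional[str]:
        """Next usable account, or None if every account is cooling down."""
        now = self.clock()
        for step in range(len(self.accounts)):
            idx = (self.next_index + step) % len(self.accounts)
            account = self.accounts[idx]
            if self.cooldown_until[account] <= now:
                self.next_index = idx + 1  # start after this one next time
                return account
        return None

    def mark_rate_limited(self, account: str, retry_after: float = 60.0) -> None:
        """Call on HTTP 429; the account is skipped until the cooldown ends."""
        self.cooldown_until[account] = self.clock() + retry_after


rotator = AccountRotator(["token-primary", "token-secondary", "token-tertiary"])
first = rotator.pick()            # token-primary
rotator.mark_rate_limited(first)  # e.g. after a 429
print(rotator.pick())             # token-secondary
```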

### Codebuff (Multiple Accounts)

Register additional free accounts at [codebuff.com](https://www.codebuff.com), then
add them to `~/.config/manicode/credentials.json`:

```json
{
  "default": { "authToken": "token-primary", "email": "you+1@gmail.com" },
  "accounts": [
    { "authToken": "token-secondary", "email": "you+2@gmail.com" },
    { "authToken": "token-tertiary", "email": "you+3@gmail.com" }
  ]
}
```

Each account gets 5 free requests/day, so 3 accounts give you **15 requests/day**.
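
Flattening that file into an ordered token list is straightforward. This sketch assumes the `default` + `accounts` layout shown above is the whole schema. The helper names are hypothetical, and duplicate or empty tokens are simply skipped.

```python
import json
from pathlib import Path

CREDENTIALS = Path.home() / ".config" / "manicode" / "credentials.json"


def tokens_from_credentials(data: dict) -> list[str]:
    """Primary token first, then extra accounts, without blanks or duplicates."""
    entries = [data.get("default")] + list(data.get("accounts") or [])
    tokens: list[str] = []
    for entry in entries:
        token = entry.get("authToken") if isinstance(entry, dict) else None
        if token and token not in tokens:
            tokens.append(token)
    return tokens


def load_codebuff_tokens(path: Path = CREDENTIALS) -> list[str]:
    """Read credentials.json from disk and flatten it."""
    return tokens_from_credentials(json.loads(path.read_text()))
```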

### Google OAuth (Multiple Projects)

Add one token file per additional Google Cloud project:

```
~/.cache/codex-proxy/google-antigravity-oauth-token.json    # primary
~/.cache/codex-proxy/google-antigravity-oauth-token-1.json  # extra project 1
~/.cache/codex-proxy/google-antigravity-oauth-token-2.json  # extra project 2
```
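
Extra projects differ only by a numeric suffix, so discovering them takes a glob plus a numeric sort. This sketch works under that assumption, and the function names are hypothetical. The sort puts the unsuffixed primary file first and orders `-10` after `-2`.

```python
import re
from pathlib import Path

TOKEN_NAME = re.compile(r"google-antigravity-oauth-token(?:-(\d+))?\.json")


def order_token_names(names: list[str]) -> list[str]:
    """Primary file first, then -1, -2, ... in numeric order; ignore other files."""
    ranked = []
    for name in names:
        match = TOKEN_NAME.fullmatch(name)
        if match:
            suffix = match.group(1)
            ranked.append((-1 if suffix is None else int(suffix), name))
    return [name for _, name in sorted(ranked)]


def discover_token_files(cache_dir: Path = Path.home() / ".cache" / "codex-proxy") -> list[Path]:
    """Glob the cache directory (empty result if it does not exist)."""
    names = [p.name for p in cache_dir.glob("google-antigravity-oauth-token*.json")]
    return [cache_dir / name for name in order_token_names(names)]
```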

### API Keys (Comma-Separated)

For any OpenAI-compatible provider, list several keys in one comma-separated `api_key` value:

```json
{ "api_key": "sk-key1,sk-key2,sk-key3" }
```
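
Expanding that value into individual keys might look like the following. `split_api_keys` is a hypothetical helper. Trimming whitespace and dropping empty or repeated entries are assumptions about how forgiving the parser should be.

```python
def split_api_keys(raw: str) -> list[str]:
    """Turn "sk-key1,sk-key2,sk-key3" into ["sk-key1", "sk-key2", "sk-key3"]."""
    keys: list[str] = []
    for part in raw.split(","):
        key = part.strip()
        if key and key not in keys:  # skip blanks from stray commas, and repeats
            keys.append(key)
    return keys


print(split_api_keys("sk-key1,sk-key2,sk-key3"))  # ['sk-key1', 'sk-key2', 'sk-key3']
print(split_api_keys(" sk-key1 , ,sk-key1"))      # ['sk-key1']
```

A single key without commas comes back as a one-element list, so the same code path also serves single-key providers.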

### Account Status Endpoint

```bash
curl http://127.0.0.1:PORT/v1/accounts
```

---

## Provider Presets

| Preset | Backend | Base URL |
@@ -341,13 +605,28 @@ The launcher generates model catalog JSON with dual field naming to satisfy both
| OpenCode Zen | OpenAI-compat | `https://opencode.ai/zen/v1` |
| OpenCode Go | OpenAI-compat | `https://opencode.ai/zen/go/v1` |
| Command Code | Command Code | `https://api.commandcode.ai` |
| **Codebuff** | **Codebuff** | `https://codebuff.com` *(free DeepSeek/Kimi; OAuth login built-in)* |
| Crof.ai | OpenAI-compat | `https://crof.ai/v1` |
| OpenAdapter | OpenAI-compat | `https://api.openadapter.in/v1` |
| Z.ai Coding | OpenAI-compat | `https://api.z.ai/api/coding/paas/v4` |
| NVIDIA NIM | OpenAI-compat | `https://integrate.api.nvidia.com/v1` |
| Kilo.ai | OpenAI-compat | `https://api.kilo.ai/api/gateway` |
| OpenRouter | OpenAI-compat | `https://openrouter.ai/api/v1` |
| Z.AI | OpenAI-compat | `https://api.z.ai/api/coding/paas/v4` |
| Google Gemini (API Key) | OpenAI-compat | `https://generativelanguage.googleapis.com/v1beta/openai` |
| Google Gemini (OAuth) | Gemini OAuth | `cloudcode-pa.googleapis.com` |
| Google Antigravity (OAuth) | Antigravity OAuth | `daily-cloudcode-pa.sandbox.googleapis.com` |
| Custom | Any | User-defined |

### Free Models (via Codebuff)

Codebuff provides free access to these models, with no API key needed:

- **DeepSeek V4 Pro**: smartest model
- **DeepSeek V4 Flash**: most efficient
- **Kimi K2.6**: balanced
- **MiniMax M2.7**: fastest

*Requires a Codebuff login: use the OAuth button in the GUI, or run `npm install -g codebuff && codebuff login` (GitHub OAuth).*

---

## File Structure
@@ -366,17 +645,63 @@ README.md # This file
### Installed Locations

```
~/.local/bin/translate-proxy.py                     # Proxy (source install)
~/.local/bin/codex-launcher-gui                     # Launcher (source install)
~/.local/bin/cleanup-codex-stale.sh                 # Cleanup (source install)
~/.local/share/applications/codex-launcher.desktop  # App grid entry (source install)
/usr/bin/translate-proxy.py                         # Proxy (from .deb)
/usr/bin/codex-launcher-gui                         # Launcher (from .deb)
/usr/bin/cleanup-codex-stale.sh                     # Cleanup (from .deb)
/usr/share/applications/codex-launcher.desktop      # App grid entry (from .deb)
~/.codex/endpoints.json                             # Endpoint storage
~/.codex/config.toml                                # Codex config (auto-generated)
~/.cache/codex-proxy/                               # Proxy configs + model catalogs
~/.cache/codex-proxy/cc-debug.log                   # Debug log (per-request)
```

---

### Phase 10: Codebuff Integration — Free AI for Everyone (v3.8.1)

**Problem:** Users want access to powerful models like DeepSeek V4 Pro without paying API fees. Codebuff (by CodebuffAI) offers free access to premium models through its own server, but it is a CLI tool, not an API you can plug into Codex Launcher.

**The insight:** Codebuff's backend is a Next.js app with an OpenAI-compatible `/api/v1/chat/completions` endpoint. It uses agent-run lifecycle management and model-specific routing. If our proxy replicates the agent-run protocol, it can tap into Codebuff's free tier.

**How Codebuff works internally:**

1. The user logs in via GitHub OAuth; the session token is stored in `~/.config/manicode/credentials.json`.
2. Each request creates an **agent run** via `POST /api/v1/agent-runs`.
3. Chat completions are sent with `codebuff_metadata: {run_id, cost_mode: "free"}`.
4. The server routes the request to the correct upstream provider using its own API keys.
5. The agent run is finished when the request completes.

**What we built:**

```
Codex Request
      │
      ▼
┌────────────────────────────────────┐
│ translate-proxy.py                 │
│ _handle_codebuff()                 │
│                                    │
│ 1. Read token from credentials     │
│ 2. POST /api/v1/agent-runs         │──→ {action: "START", agentId}
│ 3. POST /api/v1/chat/completions   │──→ {model, messages,
│                                    │      codebuff_metadata: {
│                                    │        run_id, cost_mode: "free"}}
│ 4. Stream response back to Codex   │←── SSE events
│ 5. POST /api/v1/agent-runs         │──→ {action: "FINISH"}
└────────────────────────────────────┘
```
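
The five steps in the diagram map onto three HTTP calls. The sketch below shows that sequence in non-streaming form, with the transport injected so it can be exercised offline.

Only some names come from this document: the endpoint paths, the `action` values, `agentId`, `codebuff_metadata`, `run_id`, `cost_mode` and the agent IDs. The rest are assumptions, not Codebuff's documented API:

- the `runId` field in the START reply and in the FINISH body
- the Bearer `Authorization` header
- the example model string

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "https://codebuff.com"

# Agent IDs from the "Free models available" table.
AGENT_IDS = {
    "DeepSeek V4 Pro": "base2-free-deepseek",
    "DeepSeek V4 Flash": "base2-free-deepseek-flash",
    "Kimi K2.6": "base2-free-kimi",
    "MiniMax M2.7": "base2-free",
}

Post = Callable[[str, dict, str], dict]


def post_json(url: str, body: dict, token: str) -> dict:
    """POST a JSON body; the Bearer auth scheme is an assumption."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read() or b"{}")


def run_free_completion(token: str, agent_id: str, model: str, messages: list[dict],
                        post: Post = post_json, base: str = BASE_URL) -> dict:
    # Step 2: open an agent run.
    start = post(f"{base}/api/v1/agent-runs", {"action": "START", "agentId": agent_id}, token)
    run_id = start["runId"]  # assumed field name
    try:
        # Step 3: the completion is tagged with the run and the free cost mode.
        return post(f"{base}/api/v1/chat/completions", {
            "model": model,
            "messages": messages,
            "codebuff_metadata": {"run_id": run_id, "cost_mode": "free"},
        }, token)
    finally:
        # Step 5: close the run even if the completion call raised.
        post(f"{base}/api/v1/agent-runs", {"action": "FINISH", "runId": run_id}, token)


# Offline dry run with a fake transport that records each call.
calls: list[tuple[str, dict]] = []


def fake_post(url: str, body: dict, token: str) -> dict:
    calls.append((url.split("/api/v1/", 1)[1], body))
    if body.get("action") == "START":
        return {"runId": "run-1"}
    return {"choices": [{"message": {"content": "ok"}}]} if "messages" in body else {}


reply = run_free_completion("token-primary", AGENT_IDS["DeepSeek V4 Pro"], "deepseek-v4-pro",
                            [{"role": "user", "content": "hi"}], post=fake_post)
print([path for path, _ in calls])  # ['agent-runs', 'chat/completions', 'agent-runs']
```

Putting FINISH in a `finally` block keeps runs from being left open when the upstream call fails midway.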

**Free models available:**

| Model | Agent ID | Notes |
|-------|----------|-------|
| DeepSeek V4 Pro | `base2-free-deepseek` | Smartest |
| DeepSeek V4 Flash | `base2-free-deepseek-flash` | Most efficient |
| Kimi K2.6 | `base2-free-kimi` | Balanced |
| MiniMax M2.7 | `base2-free` | Fastest |

**Bonus fix:** While investigating this, we discovered that `endpoints.json` had been overwritten with only 4 AG X entries, losing all 17+ provider presets. We restored all presets from the proxy cache files.

---

## Troubleshooting

| Issue | Cause | Fix |
@@ -391,6 +716,17 @@ README.md # This file
| Models not showing in picker | Wrong model catalog format | Must have both `slug` + `model` fields |
| Codex hangs in "thinking" | Missing `response.completed` | Proxy emits full SSE event sequence |
| Stops after first tool call (Crof) | `previous_response_id` not resolved | V2.1.2 stores and chains responses for multi-turn |
| CC agent stops after first response | Tool calls not parsed from model text | V3.5 multi-format parser handles all CC output formats |
| CC tool calls have wrong args | Double-wrapped arguments | V3.5 three-tier parser + recursive unwrapping |
| Proxy crashes mid-session | Unhandled streaming error | V3.5 self-revive watchdog auto-restarts |
| CC 403 upgrade_required | Missing version header | V3.5 always sends `x-command-code-version` |
| CC explore_agent can't find URL | URL hidden inside JSON messages | V3.7 Layer 1 drills into JSON to extract URLs |
| CC agent stalls on escalation blocks | `<require_escalation>` not handled | V3.7 Layer 2 auto-proceeds past escalation requests |
| CC agent stalls with no tool calls at all | Model output format unrecognized | V3.7 Layer 3 synthesizes command from text intent |
| Proxy crashes mid-session | Unhandled streaming error | V3.8 AI Monitor auto-restarts proxy |
| Proxy port conflict on restart | Stale process holding port | V3.8 AI Monitor kills stale + restarts |
| Schema cache corruption | ErrorAnalyzer learned wrong schema | V3.8 AI Monitor auto-clears provider-caps.json |
| Upstream 500 repeatedly | Provider having issues | V3.8 AI Monitor detects pattern + alerts/switches |

---

5073 codex-launcher-gui (Executable file): File diff suppressed because it is too large
BIN codex-launcher_3.0.0_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.3.0_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.5.0_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.6.0_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.7.0_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.8.0_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.8.1_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.8.3_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.8.4_all.deb (Normal file): Binary file not shown.
BIN codex-launcher_3.9.7_all.deb (Normal file): Binary file not shown.

49 install.sh
@@ -2,28 +2,35 @@
set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
BIN_DIR="$HOME/.local/bin"
APP_DIR="$HOME/.local/share/applications"

mkdir -p "$BIN_DIR" "$APP_DIR"
if [ -f "$SCRIPT_DIR/codex-launcher_3.9.7_all.deb" ]; then
    echo "Installing codex-launcher_3.9.7_all.deb ..."
    sudo dpkg -i "$SCRIPT_DIR/codex-launcher_3.9.7_all.deb"
    echo ""
    echo "Installed v3.9.7 via .deb package."
    echo " translate-proxy.py -> /usr/bin/translate-proxy.py"
    echo " codex-launcher-gui -> /usr/bin/codex-launcher-gui"
    echo " cleanup-codex-stale -> /usr/bin/cleanup-codex-stale.sh"
    echo " desktop entry -> /usr/share/applications/codex-launcher.desktop"
else
    BIN_DIR="$HOME/.local/bin"
    APP_DIR="$HOME/.local/share/applications"
    mkdir -p "$BIN_DIR" "$APP_DIR"
    cp "$SCRIPT_DIR/src/translate-proxy.py" "$BIN_DIR/"
    cp "$SCRIPT_DIR/src/codex-launcher-gui" "$BIN_DIR/"
    cp "$SCRIPT_DIR/src/cleanup-codex-stale.sh" "$BIN_DIR/"
    chmod +x "$BIN_DIR/translate-proxy.py"
    chmod +x "$BIN_DIR/codex-launcher-gui"
    chmod +x "$BIN_DIR/cleanup-codex-stale.sh"
    USERNAME=$(whoami)
    sed "s/YOUR_USERNAME/$USERNAME/g" "$SCRIPT_DIR/src/codex-launcher.desktop.template" > "$APP_DIR/codex-launcher.desktop"
    update-desktop-database "$APP_DIR" 2>/dev/null || true
    echo "Installed from source."
    echo " translate-proxy.py -> $BIN_DIR/translate-proxy.py"
    echo " codex-launcher-gui -> $BIN_DIR/codex-launcher-gui"
    echo " cleanup-codex-stale -> $BIN_DIR/cleanup-codex-stale.sh"
    echo " desktop entry -> $APP_DIR/codex-launcher.desktop"
fi

cp "$SCRIPT_DIR/src/translate-proxy.py" "$BIN_DIR/"
cp "$SCRIPT_DIR/src/codex-launcher-gui" "$BIN_DIR/"
cp "$SCRIPT_DIR/src/cleanup-codex-stale.sh" "$BIN_DIR/"

chmod +x "$BIN_DIR/translate-proxy.py"
chmod +x "$BIN_DIR/codex-launcher-gui"
chmod +x "$BIN_DIR/cleanup-codex-stale.sh"

USERNAME=$(whoami)
sed "s/YOUR_USERNAME/$USERNAME/g" "$SCRIPT_DIR/src/codex-launcher.desktop.template" > "$APP_DIR/codex-launcher.desktop"

update-desktop-database "$APP_DIR" 2>/dev/null || true

echo "Installed."
echo " translate-proxy.py -> $BIN_DIR/translate-proxy.py"
echo " codex-launcher-gui -> $BIN_DIR/codex-launcher-gui"
echo " cleanup-codex-stale -> $BIN_DIR/cleanup-codex-stale.sh"
echo " desktop entry -> $APP_DIR/codex-launcher.desktop"
echo ""
echo "Open 'Codex Launcher' from your app grid, or run: codex-launcher-gui"

@@ -1,42 +1,51 @@
#!/bin/bash
# Cleanup script for Codex Desktop - kills stale processes before launch
# Cleanup script for Codex Launcher - kills only launcher-owned processes.

echo "Cleaning up stale Codex processes..." >&2
set -u

# Kill codex app-server processes
for pid in $(ps aux 2>/dev/null | grep -E "codex .*app-server" | grep -v grep | awk '{print $2}'); do
    kill -9 "$pid" 2>/dev/null || true
    echo " Killed app-server pid=$pid"
REGISTRY="${HOME}/.cache/codex-launcher/pids.json"

echo "Cleaning up launcher-owned processes..." >&2

kill_group() {
    kind="$1"
    pgid="$2"

    if [ -z "$pgid" ] || [ "$pgid" = "null" ]; then
        return 0
    fi

    if kill -TERM -- "-$pgid" 2>/dev/null; then
        echo " Stopped ${kind} pgid=${pgid}"
        return 0
    fi

    return 0
}

if [ -f "$REGISTRY" ]; then
    python3 - "$REGISTRY" <<'PY'
import json, sys
from pathlib import Path

path = Path(sys.argv[1])
try:
    data = json.loads(path.read_text())
except Exception:
    data = {}

for kind, meta in sorted(data.items()):
    pgid = meta.get('pgid') if isinstance(meta, dict) else None
    if pgid:
        print(f'{kind}\t{pgid}')
PY
else
    echo " No registry found; nothing to stop"
fi | while IFS=$'\t' read -r kind pgid; do
    [ -n "${kind:-}" ] || continue
    kill_group "$kind" "$pgid"
done

# Kill webview server
for pid in $(ps aux 2>/dev/null | grep webview-server.py | grep -v grep | awk '{print $2}'); do
    kill -9 "$pid" 2>/dev/null || true
    echo " Killed webview-server pid=$pid"
done

# Kill main electron process for codex-desktop
for pid in $(ps aux 2>/dev/null | grep "/opt/codex-desktop/electron" | grep "class=codex-desktop" | grep -v grep | awk '{print $2}'); do
    kill -9 "$pid" 2>/dev/null || true
    echo " Killed electron pid=$pid"
done

# Kill all remaining child processes of codex-desktop
for pid in $(ps aux 2>/dev/null | grep "/opt/codex-desktop/" | grep -v grep | awk '{print $2}'); do
    kill -9 "$pid" 2>/dev/null || true
done

# Kill zai proxy (if any)
for pid in $(ps aux 2>/dev/null | grep zai-proxy.py | grep -v grep | awk '{print $2}'); do
    kill "$pid" 2>/dev/null || true
done

# Kill unified translation proxy (if any)
for pid in $(ps aux 2>/dev/null | grep translate-proxy.py | grep -v grep | awk '{print $2}'); do
    kill "$pid" 2>/dev/null || true
done

# Remove stale socket and PID files
rm -f "$HOME/.codex/.launch-action-socket" 2>/dev/null || true
rm -f "$HOME/.codex/.codex-desktop-launch-action" 2>/dev/null || true
rm -f "$HOME/.local/share/codex-desktop/.launch-action-socket" 2>/dev/null || true
@@ -46,12 +55,4 @@ rm -f "$HOME/.cache/codex-desktop/.codex-desktop-pid" 2>/dev/null || true
rm -f "$HOME/.local/share/codex-desktop/.webview-pid" 2>/dev/null || true
rm -f "$HOME/.cache/codex-desktop/.webview-pid" 2>/dev/null || true

sleep 1

# Verify no remaining process on port 5175 (webview)
if lsof -ti :5175 2>/dev/null | grep -q .; then
    echo " Warning: Port 5175 still in use"
    lsof -ti :5175 2>/dev/null | xargs kill -9 2>/dev/null || true
fi

echo "Cleanup complete"

6067 translate-proxy.py (Executable file): File diff suppressed because it is too large