v3.7.0: Intelligence Routing — self-healing parser system

Layer 1 (FIX 23): Deep URL extraction from nested JSON in explore_agent blocks.
Layer 2 (FIX 24): Auto-proceed on require_escalation / request_escalation_permission.
Layer 3 (FIX 25): Intent-based command synthesis with 5 heuristics when all parsers fail.

Module-level _build_explore_cmd() for reuse across parser + stream path.
54 self-test patterns (up from 41).
This commit is contained in:
admin
2026-05-22 16:29:45 +04:00
Unverified
parent 8e99821606
commit c52b801cde
5 changed files with 316 additions and 37 deletions

View File

@@ -98,7 +98,61 @@ FIX 21: DSML parser silently drops tool calls when model uses name="cmd" (THE HA
Test: Pattern L self-test verifies DSML with name="cmd" is parsed correctly.
Location: _parse_commandcode_text_tool_calls() DSML parameter loop, self-test Pattern L
═══════════════════════════════════════════════════════════════════
═══════════════════════════════════════════════════════════════════
INTELLIGENCE ROUTING — Self-Healing Parser System (v3.7.0)
════════════════════════════════════════════════════════════════════
Problem: The Command Code model produces output in unpredictable formats
that change between sessions and models. When the multi-format parser chain
(DSML → <bash> → <explore_agent> → <tool_call type=...> → XML → raw JSON →
fallback regex) returns empty, the Codex agent loop has zero tool calls and
STALLS — the user sees the model "thinking" but nothing happens.
Intelligence Routing is a three-layer self-healing system:
LAYER 1 — Deep URL Extraction (FIX 23)
The <explore_agent> handler was failing because URLs were hidden inside
nested JSON: messages: [{"content": "https://..."}]. The regex couldn't
find them because it excluded the " character that terminates JSON values.
Solution: _build_explore_cmd() is now a module-level function (was a
closure). After the initial regex fails, it tries json.loads() on the
text, iterates list items, and extracts the "content" field to find URLs.
Also added " to the regex exclusion set and rstrip characters.
LAYER 2 — Escalation Block Handling (FIX 24)
The model produces <require_escalation> and <request_escalation_permission>
blocks when it wants elevated permissions. The CC adapter doesn't support
escalation — these blocks were silently dropped, causing parsed_tool_calls=0.
Solution: Two handlers:
- FIX 24a: Closed-tag blocks — extracts URL if present, runs explore cmd;
otherwise echoes auto-proceed message.
- FIX 24b: Bare/unclosed tags (<require_escalation />) — auto-proceeds.
LAYER 3 — Intent-Based Command Synthesis (FIX 25, THE CORE)
When ALL parsers return empty and text has content, the system plays
detective using 5 heuristics in priority order:
1. URL detected in text → curl to fetch it
2. File path reference → cat or ls that file
3. Shell command in backticks/quotes → extract and run
4. "explore"/"fetch"/"investigate" intent + last user URL → explore cmd
5. "I need to"/"let me"/"please" intent text → echo diagnostic
This ensures the agent loop ALWAYS has a tool call to execute, even when
the model's output format is completely unrecognized. The loop never stalls.
Architecture:
_parse_commandcode_text_tool_calls() — LAYER 1 + LAYER 2
cc_stream_to_sse() — LAYER 3 (runs after parser chain + fallback)
The _last_user_urls deque (maxlen=20) tracks URLs from user messages
across the session, giving Layer 3 heuristic 4 a URL to work with.
Self-tests: 54 patterns (was 41) covering all three layers.
════════════════════════════════════════════════════════════════════
"""
import json, http.server, socketserver, urllib.request, urllib.parse, urllib.error, re
@@ -1736,6 +1790,49 @@ def _unwrap_cmd(cmd_val):
break
return cmd_val
def _build_explore_cmd(text_for_url):
"""Module-level explore command builder. Extracts repo URL from text,
builds a curl pipeline to fetch README, contents listing, and releases.
Used by _parse_commandcode_text_tool_calls (closure wrapper) and
cc_stream_to_sse (stuck recovery heuristic)."""
if not text_for_url:
return None, None
url_m = re.search(r"https?://[^\s\]'\\>\",]+", text_for_url)
repo_url = url_m.group(0).rstrip(")].,;'\\\"") if url_m else ""
if not repo_url and isinstance(text_for_url, str):
try:
_parsed = json.loads(text_for_url)
if isinstance(_parsed, list):
for _item in _parsed:
_c = _item.get("content", "") if isinstance(_item, dict) else str(_item)
url_m2 = re.search(r"https?://[^\s\]'\\>\",]+", _c)
if url_m2:
repo_url = url_m2.group(0).rstrip(")].,;'\\\"")
break
except Exception:
pass
if not repo_url:
return None, None
if repo_url.endswith(".git"):
repo_url = repo_url[:-4]
if "/api/v1/repos/" not in repo_url:
host_m = re.match(r"(https?://[^/]+)/(.*)", repo_url)
if host_m:
host, path = host_m.groups()
api_base = f"{host}/api/v1/repos/{path}"
else:
api_base = repo_url.replace("/admin/", "/api/v1/repos/")
else:
api_base = repo_url
cmd = (
f"cd /tmp && "
f"curl -sL --max-time 15 '{api_base}/contents/README.md' 2>/dev/null | "
f"python3 -c \"import sys,json,base64; d=json.load(sys.stdin); print(base64.b64decode(d['content']).decode())\" 2>/dev/null | head -600 && "
f"curl -sL --max-time 15 '{api_base}/contents' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print('\\n'.join(f'{{x.get(\'path\')}} {{x.get(\'type\')}}' for x in d[:50]))\" 2>/dev/null && "
f"curl -sL --max-time 15 '{api_base}/releases' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print(json.dumps(d[:3], indent=2)[:2000])\" 2>/dev/null"
)
return cmd, "Explore repository to understand the app and gather README, root contents, and releases for the landing page."
def _parse_commandcode_text_tool_calls(text):
"""Parse CommandCode's text-form tool calls into Responses function calls.
@@ -1756,37 +1853,7 @@ def _parse_commandcode_text_tool_calls(text):
if not text:
return calls
# [FIX 20] Use the module-level _build_explore_cmd helper (moved to module level
# in FIX 22 so it can also be called from cc_stream_to_sse for escalation recovery).
# The function is defined above, before _parse_commandcode_text_tool_calls.
def _build_explore_cmd_local(text_for_url):
if not text_for_url:
return None, None
url_m = re.search(r"https?://[^\s\]'\\>\"]+", text_for_url)
repo_url = url_m.group(0).rstrip(")].,;'\\") if url_m else ""
if repo_url:
# Clean trailing .git if present
if repo_url.endswith(".git"):
repo_url = repo_url[:-4]
if "/api/v1/repos/" not in repo_url:
host_m = re.match(r"(https?://[^/]+)/(.*)", repo_url)
if host_m:
host, path = host_m.groups()
api_base = f"{host}/api/v1/repos/{path}"
else:
api_base = repo_url.replace("/admin/", "/api/v1/repos/")
else:
api_base = repo_url
cmd = (
f"cd /tmp && "
f"curl -sL --max-time 15 '{api_base}/contents/README.md' 2>/dev/null | "
f"python3 -c \"import sys,json,base64; d=json.load(sys.stdin); print(base64.b64decode(d['content']).decode())\" 2>/dev/null | head -600 && "
f"curl -sL --max-time 15 '{api_base}/contents' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print('\\n'.join(f'{{x.get(\'path\')}} {{x.get(\'type\')}}' for x in d[:50]))\" 2>/dev/null && "
f"curl -sL --max-time 15 '{api_base}/releases' 2>/dev/null | python3 -c \"import sys,json; d=json.load(sys.stdin); print(json.dumps(d[:3], indent=2)[:2000])\" 2>/dev/null"
)
return cmd, "Explore repository to understand the app and gather README, root contents, and releases for the landing page."
return None, None
_build_explore_cmd_local = _build_explore_cmd
# [FIX 17] DSML tool_call blocks used by the model now.
# Example:
@@ -1968,6 +2035,39 @@ def _parse_commandcode_text_tool_calls(text):
"arguments": json.dumps({"cmd": cmd, "justification": justification or "Explore repository"}, ensure_ascii=False),
})
# [FIX 24] Handle <require_escalation> and <request_escalation_permission> blocks.
# The model produces these when it wants elevated permissions but the CC
# adapter doesn't support them. Synthesize a proceed command so the loop continues.
if not calls:
for m in re.finditer(r"<(?:require_escalation|request_escalation_permission)>(.*?)</(?:require_escalation|request_escalation_permission)>", text, re.DOTALL | re.IGNORECASE):
body_escal = (m.group(1) or "").strip()
_inner_url_m = re.search(r"https?://[^\s\]'\\>\",]+", body_escal)
if _inner_url_m:
_e_url = _inner_url_m.group(0).rstrip(")].,;'\\\"")
_e_cmd, _e_just = _build_explore_cmd_local(_e_url)
if _e_cmd:
calls.append({
"full_match": m.group(0),
"name": "exec_command",
"arguments": json.dumps({"cmd": _e_cmd, "justification": _e_just or "Escalation block with URL — auto-proceed"}, ensure_ascii=False),
})
continue
if not calls:
calls.append({
"full_match": m.group(0),
"name": "exec_command",
"arguments": json.dumps({"cmd": "echo 'escalation: auto-proceeding — no specific command in escalation block'", "justification": "Auto-proceed past escalation request"}, ensure_ascii=False),
})
# [FIX 24b] Bare <require_escalation ... /> or <request_escalation_permission ... />
# without closing tags. Just auto-proceed.
if not calls and re.search(r"<(?:require_escalation|request_escalation_permission)[\s/>]", text, re.IGNORECASE):
calls.append({
"full_match": "<escalation_bare/>",
"name": "exec_command",
"arguments": json.dumps({"cmd": "echo 'escalation: auto-proceeding past bare escalation tag'", "justification": "Auto-proceed past bare escalation tag"}, ensure_ascii=False),
})
patterns = [
r"<tool_call(?:\s+name=['\"]?([^'\">\s]+)['\"]?)?>(.*?)</tool_call[)]?>",
r"<function=(\w+)>(.*?)</function>",
@@ -2568,6 +2668,70 @@ def cc_stream_to_sse(cc_stream, model, req_id):
else:
_deflog(f"[CC-DEBUG] Fallback also failed. text_buf first 500: {text_buf[:500]!r}")
# [FIX 25] SELF-HEALING STUCK DETECTOR
# When ALL parsers returned empty and text has intent signals, synthesize a
# command so the agent loop doesn't stall. This catches:
# - Bare text with no tool call format at all
# - Unrecognized XML-ish blocks
# - Partial JSON (bare "{")
# - Model explaining what it wants to do but not producing a tool call
if not parsed_tool_calls and len(text_buf) > 10:
_synth_cmd = None
_synth_just = None
_tl = text_buf.lower()
# Heuristic 1: URL in text → fetch it
_url_in_text = re.search(r"https?://[^\s\]'\\>\",]+", text_buf)
if _url_in_text:
_synth_url = _url_in_text.group(0).rstrip(")].,;'\\\"")
_synth_cmd = f"curl -sL --max-time 15 '{_synth_url}' 2>/dev/null | head -200"
_synth_just = "Auto-synthesized: URL detected in text, fetching"
# Heuristic 2: File path references → list or read
if not _synth_cmd:
_file_m = re.search(r"(?:read|open|view|check|examine|cat|show)\s+(?:the\s+)?(?:file\s+)?[`'\"]?(/[^\s'\"]+\.\w+)", _tl)
if _file_m:
_fpath = _file_m.group(1)
_synth_cmd = f"cat '{_fpath}' 2>/dev/null | head -200 || ls -la '{_fpath}'"
_synth_just = f"Auto-synthesized: file reference detected ({_fpath})"
# Heuristic 3: Shell command mentioned in backticks or quotes
if not _synth_cmd:
_shell_m = re.search(r"[`'\"]((?:curl|wget|git|npm|pip|python|ls|cat|grep|find|mkdir|cd|rm|cp|mv|chmod|docker|make|cargo|go)\s[^\s`'\"]+)", text_buf)
if _shell_m:
_synth_cmd = _shell_m.group(1)
_synth_just = "Auto-synthesized: shell command detected in text"
# Heuristic 4: "explore" or "fetch" intent + last user URL
if not _synth_cmd and ("explore" in _tl or "fetch" in _tl or "investigate" in _tl or "repository" in _tl):
for _prev_url in _last_user_urls:
_url_m2 = re.search(r"https?://[^\s\]'\\>\",]+", _prev_url)
if _url_m2:
_pu = _url_m2.group(0).rstrip(")].,;'\\\"")
_ecmd, _ejust = _build_explore_cmd(_pu)
if _ecmd:
_synth_cmd = _ecmd
_synth_just = _ejust or "Auto-synthesized: explore intent with last user URL"
break
# Heuristic 5: Generic "I need to" / "let me" / "I'll" intent with command-like text
if not _synth_cmd:
_intent_m = re.search(r"(?:I(?:'ll| will| need to| should)|let me|please)\s+(.+?)(?:\.|!|\n|$)", _tl, re.IGNORECASE)
if _intent_m:
_intent_text = _intent_m.group(1).strip()
if len(_intent_text) > 10 and len(_intent_text) < 200:
_synth_cmd = f"echo 'Stuck recovery: model intent was: {_intent_text[:100]}'"
_synth_just = f"Auto-synthesized from intent text: {_intent_text[:80]}"
if _synth_cmd:
parsed_tool_calls = [{
"full_match": "__synth_stuck_recovery__",
"name": "exec_command",
"arguments": json.dumps({"cmd": _synth_cmd, "justification": _synth_just or "Auto-synthesized stuck recovery"}, ensure_ascii=False),
}]
_deflog(f"[CC-DEBUG] [STUCK-RECOVERY] Synthesized: cmd={_synth_cmd[:120]!r}")
print(f"[CC-DEBUG] [STUCK-RECOVERY] Synthesized command from text intent", file=sys.stderr, flush=True)
# Also log to stderr for visibility when not piped
print(f"[CC-DEBUG] text_buf={len(text_buf)} chars, tool_calls={len(parsed_tool_calls)}", file=sys.stderr, flush=True)
@@ -4782,6 +4946,42 @@ Postamble text."""
except Exception as e:
_check(f"DSML name=cmd: arguments parsing error: {e}", False)
# Pattern M: explore_agent with nested JSON messages containing URL (FIX 23)
_explore_nested = '<explore_agent>\nmessages: [{"content": "Understand the Z.AI-Chat-for-Android repo at https://github.rommark.dev/admin/Z.AI-Chat-for-Android"}]\n</explore_agent>'
_calls_m = _parse_commandcode_text_tool_calls(_explore_nested)
_check("FIX23 explore nested JSON: parsed", len(_calls_m) == 1, f"got {len(_calls_m)} calls")
if _calls_m:
_args_m = json.loads(_calls_m[0].get("arguments", "{}"))
_check("FIX23 explore nested JSON: cmd has curl", "curl" in _args_m.get("cmd", ""), f"got {_args_m.get('cmd')!r}")
_check("FIX23 explore nested JSON: URL in cmd", "github.rommark.dev" in _args_m.get("cmd", ""), f"missing URL in cmd")
# Pattern N: require_escalation block (FIX 24)
_esc_text = '<require_escalation>I need to run a command with elevated permissions to access the repository at https://github.rommark.dev/admin/Z.AI-Chat-for-Android</require_escalation>'
_calls_n = _parse_commandcode_text_tool_calls(_esc_text)
_check("FIX24 require_escalation: parsed", len(_calls_n) == 1, f"got {len(_calls_n)} calls")
if _calls_n:
_args_n = json.loads(_calls_n[0].get("arguments", "{}"))
_check("FIX24 require_escalation: name is exec_command", _calls_n[0].get("name") == "exec_command", f"got {_calls_n[0].get('name')}")
_check("FIX24 require_escalation: cmd has curl or echo", "curl" in _args_n.get("cmd", "") or "echo" in _args_n.get("cmd", ""), f"got {_args_n.get('cmd')!r}")
# Pattern N2: bare request_escalation_permission tag (FIX 24b)
_esc_bare = 'I want to proceed.\n<request_escalation_permission />\nPlease let me continue.'
_calls_n2 = _parse_commandcode_text_tool_calls(_esc_bare)
_check("FIX24b bare escalation: parsed", len(_calls_n2) == 1, f"got {len(_calls_n2)} calls")
if _calls_n2:
_check("FIX24b bare escalation: name is exec_command", _calls_n2[0].get("name") == "exec_command", f"got {_calls_n2[0].get('name')}")
# Pattern O: _build_explore_cmd module-level function (FIX 23/25)
_cmd_o, _just_o = _build_explore_cmd("https://github.rommark.dev/admin/Z.AI-Chat-for-Android")
_check("FIX23/25 _build_explore_cmd: returns cmd", _cmd_o is not None, "returned None")
_check("FIX23/25 _build_explore_cmd: has curl", _cmd_o and "curl" in _cmd_o, f"no curl in {_cmd_o!r}")
_check("FIX23/25 _build_explore_cmd: has api path", _cmd_o and "/api/v1/repos/" in _cmd_o, f"no api path in {_cmd_o!r}")
# Pattern O2: _build_explore_cmd with JSON array containing URL
_cmd_o2, _ = _build_explore_cmd('[{"content": "https://github.rommark.dev/admin/Z.AI-Chat-for-Android"}]')
_check("FIX23/25 _build_explore_cmd from JSON array: returns cmd", _cmd_o2 is not None, "returned None")
_check("FIX23/25 _build_explore_cmd from JSON array: has curl", _cmd_o2 and "curl" in _cmd_o2, f"no curl in {_cmd_o2!r}")
print(f"[CC-SELF-TEST] Results: {_counts[0]} passed, {_counts[1]} failed",
file=sys.stderr)
if _counts[1]: