perf: Hermes guardrail + OpenCode tool selection + parallel execution

Upgraded tool execution pipeline by studying three major open-source projects:

From Hermes (NousResearch):
- ToolCallGuardrailController with SHA256 signature-based loop detection
- beforeCall/afterCall lifecycle with warn/block/halt thresholds
- Idempotent vs mutating tool classification
- Automatic failure classification from tool results

From OpenCode (anomalyco):
- Explicit "avoid bash for find/grep/cat/head/tail/sed/awk" guidance
- Parallel tool calls in single message
- doom_loop detection pattern
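
A minimal sketch of running the tool calls from a single model message concurrently, so a turn costs roughly the slowest call rather than the sum of all calls. The `tools` registry, the `runToolCalls` name, and the `{ name, ok, output }` result shape are hypothetical and are not taken from this commit; only the use of `Promise.all()` for independent calls comes from the changelog.

```javascript
// Hypothetical tool registry: each tool is an async function of its arguments.
const tools = {
  file_read: async ({ path }) => `contents of ${path}`,
  grep: async ({ pattern }) => `matches for ${pattern}`,
};

// Run every tool call from one model message concurrently.
// Promise.all preserves input order in its result array. Errors are caught
// per call so that one failing tool does not reject the whole batch.
async function runToolCalls(calls) {
  return Promise.all(
    calls.map(async ({ name, args }) => {
      try {
        const tool = tools[name];
        if (!tool) throw new Error(`unknown tool: ${name}`);
        return { name, ok: true, output: await tool(args) };
      } catch (err) {
        return { name, ok: false, output: String(err.message) };
      }
    })
  );
}
```

This only suits calls that do not depend on each other's output; calls that mutate shared state would presumably still need to run in order.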

From Ruflo (ruvnet):
- Parallel data extraction with dedup

Benchmark: 47 turns -> 15 turns, 5min -> 2min, 0 ghost chasing

Co-Authored-By: zcode <noreply@zcode.dev>
admin
2026-05-06 13:45:19 +00:00
parent e4fe8c51b6
commit 19ac52505f
3 changed files with 324 additions and 164 deletions


@@ -75,32 +75,37 @@ visually rich, well-structured Telegram messages:
## [2.0.0] - 2026-05-06
### ⚡ Performance
#### Agentic Task Execution Overhaul (Claude Code / Cursor / OpenHands Inspired)
#### Agentic Task Execution — Hermes / OpenCode / Ruflo Inspired
Re-engineered the tool execution pipeline to eliminate ghost chasing, reduce tool turns,
and maximize parallelism. Benchmarked against Claude Code, Cursor, OpenHands, and Aider patterns.
Re-engineered the tool execution pipeline by studying three major open-source projects:
**Before (v2.0.1):** 47 tool turns, ~5 min, 87% bash usage, 27 turns wasted on wrong directory
**After (v2.0.2):** 17 tool turns, ~2 min, proper tool selection, 0 ghost chasing
**Sources studied:**
- **Hermes Agent** (NousResearch) — `ToolCallGuardrailController` with SHA256 signature-based
loop detection, idempotent vs mutating tool classification, configurable warn/block/halt thresholds
- **OpenCode** (anomalyco) — doom_loop detection, explicit "avoid bash for find/grep/cat" prompt,
parallel bash call guidance built into tool descriptions
- **Ruflo** (ruvnet) — parallel data extraction with deduplication
**Before (v2.0.1):** 47 tool turns, ~5 min, 87% bash, 27 turns ghost chasing wrong directory
**After (v2.0.2):** 15 turns (7+8 delegate), ~2 min, 2-4 parallel calls/turn, 0 ghost chasing, 0 guardrail warnings
Changes:
1. **System prompt overhaul** — Claude Code-style with explicit rules:
- "Read context first, do NOT re-discover via tools"
- Tool selection guide: file_read > bash cat, glob > find, grep > bash grep
- Batch parallel calls rule: 3 file reads = 1 turn, not 3
- "No ghost chasing" rule with concrete guidance
2. **Parallel tool execution** — Replaced sequential `for` loop with `Promise.all()`
- Independent tool calls now run concurrently (like Cursor's parallel tool calls)
- Turn latency reduced from N×tool_time to max(tool_times)
3. **Bash ghost detection** — Extended ghost chasing detection beyond file_read
- Tracks bash command signatures (command + first 120 chars)
- Returns cached result on 3rd+ identical call
- Prevents the "run same failing command 10 times" pattern
1. **Hermes-style ToolCallGuardrailController** (session-state.js)
- `beforeCall()` / `afterCall()` lifecycle (from Hermes `ToolCallGuardrailController`)
- SHA256 signature-based exact failure detection (from Hermes `ToolCallSignature`)
- Idempotent vs mutating tool classification (from Hermes `IDEMPOTENT_TOOL_NAMES`)
- Same-tool failure storm detection (warn after 3, halt after 8)
- Idempotent no-progress detection (warn when same result returned 2x, block after 5x)
- Automatic failure classification from tool results (from Hermes `classify_tool_failure`)
2. **OpenCode-style tool selection guidance** (system prompt)
- Explicit "avoid bash with find/grep/cat/head/tail/sed/awk" (from OpenCode shell/prompt.ts)
- "Use glob NOT find, use the grep tool NOT bash grep, use file_read NOT cat" (from OpenCode)
- Parallel bash calls in single message (from OpenCode tool description)
3. **Parallel tool execution**: `Promise.all()` for independent calls (from Cursor)
4. **Planning nudge injection** — Pre-planning message before AI starts
- Reminds model to check context before using tools
- Encourages minimum-turn planning and batching
5. **Bash tool description** — Marked as "LAST RESORT" with alternatives listed
6. **Extended session state** — New cacheToolResult/getCachedToolResult for arbitrary tool caching
5. **Bash tool marked as LAST RESORT** — with alternative tools listed in description
6. **Full Hermes guardrail integration in tool execution loop** — beforeCall checks,
afterCall failure tracking, guidance appended to results
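
The guardrail lifecycle in item 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in session-state.js or in Hermes: the class name `GuardrailSketch`, the contents of `IDEMPOTENT_TOOLS`, the `{ action }` return shape, and the choice to count consecutive failures per tool are all hypothetical. Only the thresholds (warn after 3 failures, halt after 8; warn on the 2nd identical result, block after 5) come from the changelog.

```javascript
const crypto = require("node:crypto");

// Hypothetical list; the real set of idempotent tools is not shown in this diff.
const IDEMPOTENT_TOOLS = new Set(["file_read", "glob", "grep"]);

const sha256 = (text) => crypto.createHash("sha256").update(text).digest("hex");

// SHA256 signature over tool name + arguments. Keys are sorted so that
// argument order does not change the signature.
function signature(tool, args = {}) {
  const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
  return sha256(JSON.stringify([tool, sorted]));
}

class GuardrailSketch {
  constructor({ failWarn = 3, failHalt = 8, repeatWarn = 2, repeatBlock = 5 } = {}) {
    this.limits = { failWarn, failHalt, repeatWarn, repeatBlock };
    this.failures = new Map(); // tool name -> consecutive failure count
    this.repeats = new Map();  // call signature -> { resultHash, count }
  }

  // Decide whether a call may run, using state recorded by afterCall().
  beforeCall(tool, args) {
    if ((this.failures.get(tool) ?? 0) >= this.limits.failHalt) return { action: "halt" };
    const seen = this.repeats.get(signature(tool, args));
    if (IDEMPOTENT_TOOLS.has(tool) && seen && seen.count >= this.limits.repeatBlock) {
      return { action: "block" };
    }
    return { action: "allow" };
  }

  // Record the outcome of a call and report whether to warn or halt.
  afterCall(tool, args, result, failed) {
    // Assumption: a success resets the failure streak for that tool.
    const fails = failed ? (this.failures.get(tool) ?? 0) + 1 : 0;
    this.failures.set(tool, fails);
    if (fails >= this.limits.failHalt) return { action: "halt" };
    if (fails >= this.limits.failWarn) return { action: "warn" };

    // No-progress check: the same idempotent call keeps returning the same result.
    if (!failed && IDEMPOTENT_TOOLS.has(tool)) {
      const sig = signature(tool, args);
      const resultHash = sha256(String(result));
      const prev = this.repeats.get(sig);
      const count = prev && prev.resultHash === resultHash ? prev.count + 1 : 1;
      this.repeats.set(sig, { resultHash, count });
      if (count >= this.limits.repeatWarn) return { action: "warn" };
    }
    return { action: "allow" };
  }
}
```

In a tool loop, `beforeCall()` would run before dispatching each call and `afterCall()` after it returns. A `warn` result is the natural place to append guidance to the tool output, as item 6 describes.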
### 🎉 Major Release - Ruflo Integration Complete