
Changelog

All notable changes to zCode CLI X will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.


[2.0.2] - 2026-05-06

Performance

Agentic Task Execution — Hermes / OpenCode / Ruflo Inspired

Re-engineered the tool execution pipeline by studying three major open-source projects:

Sources studied:

  • Hermes Agent (NousResearch) — ToolCallGuardrailController with SHA256 signature-based loop detection, idempotent vs mutating tool classification, configurable warn/block/halt thresholds
  • OpenCode (anomalyco) — doom_loop detection, explicit "avoid bash for find/grep/cat" prompt, parallel bash call guidance built into tool descriptions
  • Ruflo (ruvnet) — parallel data extraction with deduplication

Before (v2.0.1): 47 tool turns, ~5 min, 87% bash, 27 turns spent ghost-chasing the wrong directory
After (v2.0.2): 15 turns (7 + 8 delegate), ~2 min, 2-4 parallel calls per turn, 0 ghost chasing, 0 guardrail warnings

Changes:

  1. Hermes-style ToolCallGuardrailController (src/bot/session-state.js)
    • beforeCall() / afterCall() lifecycle (from Hermes ToolCallGuardrailController)
    • SHA256 signature-based exact failure detection (from Hermes ToolCallSignature)
    • Idempotent vs mutating tool classification (from Hermes IDEMPOTENT_TOOL_NAMES)
    • Same-tool failure storm detection (warn after 3 failures, halt after 8)
    • Idempotent no-progress detection (warn when the same result is returned twice, block after 5 repeats)
    • Automatic failure classification from tool results (from Hermes classify_tool_failure)
  2. OpenCode-style tool selection guidance (src/bot/index.js system prompt)
    • Explicit "avoid bash with find/grep/cat/head/tail/sed/awk" (from OpenCode shell/prompt.ts)
    • Use glob NOT find, the grep tool NOT bash grep, file_read NOT cat (from OpenCode)
    • Parallel bash calls in single message (from OpenCode tool description)
  3. Parallel tool execution: Promise.all() for independent calls (from Cursor)
  4. Planning nudge injection: a pre-planning message is injected before the AI starts
  5. Bash tool marked as LAST RESORT, with alternative tools listed in its description
  6. Full Hermes guardrail integration in the tool execution loop: beforeCall checks, afterCall failure tracking, and guidance appended to results
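The guardrail lifecycle and parallel execution listed above can be sketched roughly as follows. This is a minimal illustration, not the code in src/bot/session-state.js: the names ToolCallGuardrail, signature, looksFailed and runToolCalls, the IDEMPOTENT_TOOLS list, the regex-based failure check, and counting consecutive failures per tool are all assumptions. Only the thresholds (warn at 3 failures, halt at 8; warn at 2 identical results, block at 5) come from the list above.

```javascript
const crypto = require("node:crypto");

// Hypothetical classification: read-only tools whose repeated identical
// output signals "no progress". Mutating tools are never checked for repeats.
const IDEMPOTENT_TOOLS = new Set(["file_read", "glob", "grep", "web_fetch"]);

// Canonical JSON (sorted keys at every level) so {a, b} and {b, a} hash alike.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 signature of a tool call: tool name + canonical arguments.
function signature(tool, args = {}) {
  return crypto.createHash("sha256").update(`${tool}:${canonical(args)}`).digest("hex");
}

// Naive failure classifier (assumption; a real one would inspect exit codes).
function looksFailed(result) {
  return /\b(error|failed|no such file|command not found)\b/i.test(String(result));
}

class ToolCallGuardrail {
  constructor({ failWarn = 3, failHalt = 8, repeatWarn = 2, repeatBlock = 5 } = {}) {
    Object.assign(this, { failWarn, failHalt, repeatWarn, repeatBlock });
    this.toolFailures = new Map(); // tool -> consecutive failures
    this.sigFailures = new Map(); // call signature -> failure count
    this.repeats = new Map(); // call signature -> { hash, count } of last result
  }

  // Before executing a call: returns { action: "allow" | "block" | "halt" }.
  beforeCall(tool, args) {
    if ((this.toolFailures.get(tool) ?? 0) >= this.failHalt) {
      return { action: "halt", reason: `${tool} failed ${this.failHalt} times in a row` };
    }
    const rep = this.repeats.get(signature(tool, args));
    if (rep && rep.count >= this.repeatBlock) {
      return { action: "block", reason: `${tool} keeps returning the same result` };
    }
    return { action: "allow" };
  }

  // After executing a call: updates counters and returns guidance strings.
  afterCall(tool, args, result) {
    const sig = signature(tool, args);
    const warnings = [];
    if (looksFailed(result)) {
      const n = (this.toolFailures.get(tool) ?? 0) + 1;
      const exact = (this.sigFailures.get(sig) ?? 0) + 1;
      this.toolFailures.set(tool, n);
      this.sigFailures.set(sig, exact);
      if (exact > 1) warnings.push(`identical ${tool} call failed ${exact} times; change the arguments`);
      if (n >= this.failWarn) warnings.push(`${tool} failed ${n} times in a row; try another tool`);
      return warnings;
    }
    this.toolFailures.set(tool, 0); // a success ends the failure streak
    if (IDEMPOTENT_TOOLS.has(tool)) {
      const hash = crypto.createHash("sha256").update(String(result)).digest("hex");
      const prev = this.repeats.get(sig);
      const count = prev && prev.hash === hash ? prev.count + 1 : 1;
      this.repeats.set(sig, { hash, count });
      if (count >= this.repeatWarn) warnings.push(`${tool} returned the same result ${count} times`);
    }
    return warnings;
  }
}

// Run independent calls concurrently, wrapping each in the guardrail lifecycle.
async function runToolCalls(calls, tools, guard) {
  return Promise.all(
    calls.map(async ({ name, args }) => {
      const verdict = guard.beforeCall(name, args);
      if (verdict.action !== "allow") return `[guardrail ${verdict.action}] ${verdict.reason}`;
      let result;
      try {
        result = await tools[name](args);
      } catch (err) {
        result = `Error: ${err.message}`;
      }
      const warnings = guard.afterCall(name, args, result);
      return warnings.length ? `${result}\n[guardrail] ${warnings.join("; ")}` : String(result);
    }),
  );
}
```

Only idempotent tools are checked for repeated results, since a mutating tool such as a file write can legitimately return the same "ok" every time. Because one batch runs concurrently, each beforeCall in it sees the state left by earlier batches rather than by its siblings.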

🐛 Fixed

  • Crash loop after reboot: the uncaughtException and unhandledRejection handlers were calling gracefulShutdown() (which calls process.exit(0)), so ANY unhandled error killed the bot. Changed to LOG ONLY (Hermes/OpenCode pattern); only SIGINT/SIGTERM trigger a clean shutdown.
  • Dual systemd service war: a user-level service (~/.config/systemd/user/zcode.service) was running alongside the system-level service, both fighting for port 3001. Masked the user service permanently (ln -sf /dev/null zcode.service).
  • Fragile keepalive: await new Promise(() => {}) was replaced with a setInterval-based keepalive. Unlike a pending promise, an active interval timer is an event-loop handle, so it keeps Node running.
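A minimal sketch of the log-only handlers and timer keepalive described above, assuming a CommonJS entry point. The function names (formatError, installProcessHandlers, startKeepalive) and the injected log/shutdown callbacks are hypothetical; the real code lives in src/bot/index.js. Node's documentation cautions that a process may be in an undefined state after an uncaught exception, so log-only handling trades strictness for uptime.

```javascript
// Hypothetical sketch: log-only global error handlers plus a timer keepalive.

function formatError(err) {
  return err?.stack ?? String(err);
}

function installProcessHandlers({ log = console.error, shutdown }) {
  // Log and keep running instead of calling process.exit().
  process.on("uncaughtException", (err, origin) => {
    log(`[uncaughtException${origin ? `:${origin}` : ""}] ${formatError(err)}`);
  });
  process.on("unhandledRejection", (reason) => {
    log(`[unhandledRejection] ${formatError(reason)}`);
  });
  // Only explicit termination signals trigger a clean shutdown.
  for (const signal of ["SIGINT", "SIGTERM"]) {
    process.once(signal, () => shutdown(signal));
  }
}

// A never-settling Promise registers no event-loop handle, so Node may still
// exit once nothing else is pending. An active (ref'd) interval timer keeps
// the loop, and therefore the process, alive until it is cleared.
function startKeepalive(intervalMs = 60_000) {
  return setInterval(() => {}, intervalMs);
}
```

At startup the bot would call installProcessHandlers({ shutdown: gracefulShutdown }) and startKeepalive() once, where gracefulShutdown is the existing shutdown routine.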

📄 Files Changed

  • src/bot/session-state.js — Complete rewrite with Hermes guardrail controller (+200 lines)
  • src/bot/index.js — Parallel tool execution, system prompt overhaul, resilient error handlers (+160 lines)
  • src/bot/index.js — Fixed syntax error in uncaughtException handler (literal newline in string)
  • CHANGELOG.md — Updated with full v2.0.2 details
  • README.md — Updated header with v2.0.2 summary

[2.0.1] - 2026-05-06

🐛 Fixed

Critical: EADDRINUSE Crash Loop (Port Binding Race Condition)

Root Cause: The EADDRINUSE error handler used fuser to identify processes on port 3001. During systemd restart cycles, fuser returned the current process's own PID due to a race condition (the socket was half-open before the guard p !== process.pid could filter it), so the process killed itself and triggered a crash loop.

Additionally, two competing systemd services (system-level and user-level) were both trying to manage the same binary, creating a restart war where each instance killed the other.

Fix: Replaced the entire fuser-based port conflict resolution with a robust approach inspired by Next.js, Vite, and webpack-dev-server:

  1. PID-file based stale detection: read .zcode-bot.pid to identify the previous instance (no fuser, no race with the current process)
  2. net.createServer port probe: test whether a port is free by briefly binding it with the built-in Node.js net module (no external shell commands; a short window remains between the probe closing and the real bind)
  3. ss fallback: when the pidfile is missing (deleted during graceful shutdown), use ss -tlnp to find the PID owning the port (kernel-authoritative)
  4. Wait loop with 300ms polling: after sending SIGTERM to the stale process, poll until the port is confirmed free before attempting to bind (5s timeout)
  5. Single-service architecture: disabled the user-level systemd unit; only the system-level zcode.service manages the process, preventing dual-instance conflicts
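The pidfile check, port probe and wait loop from steps 1, 2 and 4 might look roughly like this. It is a sketch under stated assumptions, not the rewritten logic in src/bot/index.js: the function names (readStalePid, isPortFree, waitForPortFree), a pidfile holding a plain integer, and a probe that binds the same default host as the server are all assumptions. The ss fallback and the SIGTERM itself are omitted.

```javascript
const fs = require("node:fs");
const net = require("node:net");

// Step 1: return the PID stored by a previous instance if that process is
// still alive and is not this process; otherwise null.
function readStalePid(pidFile) {
  let pid;
  try {
    pid = Number.parseInt(fs.readFileSync(pidFile, "utf8").trim(), 10);
  } catch {
    return null; // no pidfile: this is where the ss -tlnp fallback would apply
  }
  if (!Number.isInteger(pid) || pid <= 0 || pid === process.pid) return null;
  try {
    process.kill(pid, 0); // signal 0 only checks that the PID exists
    return pid;
  } catch (err) {
    return err.code === "EPERM" ? pid : null; // EPERM: alive but owned by another user
  }
}

// Step 2: probe by binding the port ourselves; EADDRINUSE means it is taken.
function isPortFree(port, host) {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once("error", (err) => (err.code === "EADDRINUSE" ? resolve(false) : reject(err)));
    probe.once("listening", () => probe.close(() => resolve(true)));
    probe.listen({ port, host });
  });
}

// Step 4: poll every 300 ms until the port is free, for at most 5 s.
async function waitForPortFree(port, { host, intervalMs = 300, timeoutMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isPortFree(port, host)) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return isPortFree(port, host); // one last check at the deadline
}
```

A caller would send SIGTERM to any PID returned by readStalePid and then await waitForPortFree(3001) before binding. Another process could still take the port between the probe closing and the real listen() call, so the server's own error handler must still expect EADDRINUSE.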

Impact: The bot now survives rapid restart cycles (5 consecutive restarts tested), recovers cleanly from stale processes, and showed no EADDRINUSE crashes in testing.

Secondary Fixes

  • Pidfile lock removed: the old acquirePidfile() killed any process with the stored PID, including the current process during restart races. The pidfile is now informational only
  • WebSocket EADDRINUSE swallower removed: the wss.on('error') handler silently swallowed EADDRINUSE errors on the WS server, masking the real issue. Removed entirely
  • sequentialize middleware disabled: @grammyjs/runner's sequentialize caused incompatibility with systemd service management; replaced with a pass-through middleware

🎨 Improved

Telegram Message Formatting Overhaul

Enhanced the markdownToHtml converter in src/bot/message-sender.js to produce visually rich, well-structured Telegram messages:

  • Heading hierarchy: h1 gets 🚀 plus a separator line, h2 a █ block marker, h3 a ▸ triangle, h4 a ● dot; all are bold and visually distinct
  • Multi-line blockquotes: consecutive > lines now merge into a single <blockquote> element instead of one element per line
  • Indented bullet lists with leading spaces for better readability
  • Table support: Markdown tables (| col | col |) are rendered as <pre> blocks
  • Horizontal rules: --- and *** render as ──── separator lines
  • Code blocks: fenced code blocks get <pre><code> with a language class attribute
  • Cleaner vertical spacing (runs of excess blank lines collapsed)
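Three of these rules (heading markers, merged blockquotes, horizontal rules), plus HTML escaping and blank-line collapsing, can be sketched line by line. This is not the project's markdownToHtml: the renderBlocks name, the separator string and the line-based approach are assumptions, and inline formatting, tables and code blocks are left out. The output only uses tags accepted by Telegram's HTML parse mode (<b>, <blockquote>).

```javascript
// Hypothetical sketch of block-level Markdown -> Telegram HTML rules.
const SEPARATOR = "────────────";

const escapeHtml = (s) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// One marker per heading level; every heading is bold.
const HEADINGS = {
  1: (t) => `🚀 <b>${t}</b>\n${SEPARATOR}`,
  2: (t) => `█ <b>${t}</b>`,
  3: (t) => `▸ <b>${t}</b>`,
  4: (t) => `● <b>${t}</b>`,
};

function renderBlocks(markdown) {
  const out = [];
  let quote = []; // consecutive "> " lines waiting to be merged
  const flushQuote = () => {
    if (quote.length) out.push(`<blockquote>${quote.join("\n")}</blockquote>`);
    quote = [];
  };

  for (const line of markdown.split("\n")) {
    const q = /^>\s?(.*)$/.exec(line);
    if (q) {
      quote.push(escapeHtml(q[1]));
      continue;
    }
    flushQuote();
    const h = /^(#{1,4})\s+(.*)$/.exec(line);
    if (h) out.push(HEADINGS[h[1].length](escapeHtml(h[2])));
    else if (/^(-{3,}|\*{3,})\s*$/.test(line)) out.push(SEPARATOR);
    else out.push(escapeHtml(line));
  }
  flushQuote();
  // Collapse runs of blank lines into a single empty line.
  return out.join("\n").replace(/\n{3,}/g, "\n\n");
}
```

Buffering quote lines until a non-quote line appears is what turns several > lines into one <blockquote> instead of one element per line.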

🔧 Changed

  • src/bot/index.js — Port binding logic completely rewritten (68 lines removed, 143 added)
  • src/bot/message-sender.js — markdownToHtml converter enhanced (13 lines removed, 41 added)
  • zcode.service (system) — Added EnvironmentFile, reduced RestartSec to 5s, added TimeoutStartSec=60
  • User-level systemd unit masked to prevent dual-service conflicts

[2.0.0] - 2026-05-06

🎉 Major Release - Ruflo Integration Complete

Complete integration of Ruflo's multi-agent orchestration system, together with a comprehensive documentation update.

Added

Core Features

  • Multi-Agent Swarm System

    • SwarmCoordinator with 3 topologies: simple, hierarchical, swarm
    • 9 agent roles: coder, tester, reviewer, architect, devops, security, researcher, designer, coordinator
    • DAG-compatible task system with priorities and dependencies
    • AgentOrchestrator for distributed task execution
  • Plugin System

    • PluginManager with fault-isolated extension point routing
    • PluginLoader with dependency-resolving batch loading
    • 16 standard extension points:
      • tool.execute (before/after)
      • ai.response (before/after)
      • session.start / session.end
      • message.receive / message.send
      • memory.save / memory.load
      • agent.spawn / agent.terminate
      • cron.trigger
      • health.check
      • And more...
    • BasePlugin with lifecycle hooks (initialize, shutdown)
  • Hook System

    • Pre/post tool hooks for logging, validation, caching
    • Pre/post AI hooks for prompt modification, response analysis
    • Session lifecycle hooks (start, end, pause, resume)
    • Priority-based execution order
    • Minimal latency impact (hooks run asynchronously)
  • Enhanced Memory Backend

    • JSONBackend with typed entries, LRU eviction, text search
    • InMemoryBackend with TTL auto-eviction for ephemeral data
    • 7 memory types: lesson, pattern, preference, discovery, gotcha, context, ephemeral
    • Smart eviction (old discoveries first, lessons/gotchas kept)
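The fault-isolated routing and priority ordering described for the Plugin and Hook systems can be illustrated with a small registry. This is a hypothetical sketch, not the PluginManager or hooks.js API: ExtensionRegistry, register, invoke and the tool.execute.before point name are made up for illustration. Handlers run in descending priority, each may return a modified payload, and a handler that throws is logged and skipped so the remaining plugins still run.

```javascript
// Hypothetical sketch: priority-ordered, fault-isolated extension points.
class ExtensionRegistry {
  constructor(logger = console) {
    this.logger = logger;
    this.handlers = new Map(); // point name -> [{ plugin, priority, fn }]
  }

  register(point, plugin, fn, priority = 0) {
    const list = this.handlers.get(point) ?? [];
    list.push({ plugin, priority, fn });
    // Higher priority first; Array#sort is stable, so ties keep registration order.
    list.sort((a, b) => b.priority - a.priority);
    this.handlers.set(point, list);
  }

  // Run each handler in order, threading the payload through. A handler that
  // throws (or rejects) is logged and skipped; the others still run.
  async invoke(point, payload) {
    let current = payload;
    for (const { plugin, fn } of this.handlers.get(point) ?? []) {
      try {
        const next = await fn(current);
        if (next !== undefined) current = next;
      } catch (err) {
        this.logger.error(`[${point}] plugin "${plugin}" failed: ${err.message}`);
      }
    }
    return current;
  }
}
```

This version awaits handlers one at a time so a "before" hook can rewrite the payload for the next one; "after" hooks that only observe could instead be dispatched without awaiting.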

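The eviction policy in the memory bullets (TTL expiry for ephemeral data, LRU within a type, old discoveries dropped first, lessons and gotchas kept) can be combined into one sketch. It merges ideas from JSONBackend and InMemoryBackend rather than reproducing either; the MemoryStore name, the capacity parameter and the eviction order after "discovery" are assumptions.

```javascript
// Hypothetical sketch of a bounded memory store with type-aware eviction.
// Discoveries go first (per the changelog); the rest of the order is assumed.
// "lesson" and "gotcha" are absent from the list, so they are never evicted.
const EVICTION_ORDER = ["discovery", "ephemeral", "context", "pattern", "preference"];

class MemoryStore {
  constructor({ capacity = 100, now = Date.now } = {}) {
    this.capacity = capacity;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // key -> { type, value, lastUsed, expiresAt }
  }

  set(key, type, value, ttlMs) {
    const t = this.now();
    const expiresAt = ttlMs ? t + ttlMs : Infinity;
    this.entries.set(key, { type, value, lastUsed: t, expiresAt });
    this.#evict();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazy TTL expiry
      return undefined;
    }
    entry.lastUsed = this.now(); // LRU bookkeeping
    return entry.value;
  }

  #evict() {
    const t = this.now();
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= t) this.entries.delete(key);
    }
    while (this.entries.size > this.capacity) {
      const victim = this.#leastRecentlyUsed();
      if (victim === null) break; // only protected entries remain
      this.entries.delete(victim);
    }
  }

  // Least recently used key of the first evictable type that has any entries.
  #leastRecentlyUsed() {
    for (const type of EVICTION_ORDER) {
      let victim = null;
      let oldest = Infinity;
      for (const [key, entry] of this.entries) {
        if (entry.type === type && entry.lastUsed < oldest) {
          oldest = entry.lastUsed;
          victim = key;
        }
      }
      if (victim !== null) return victim;
    }
    return null;
  }
}
```

Letting protected entries push the store over capacity, rather than evicting them, mirrors the "lessons/gotchas kept" rule at the cost of a soft limit.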
New Tools (6 Total)

  • swarm_spawn - Spawn new agent swarm with specified roles
  • swarm_execute - Execute current swarm task
  • swarm_distribute - Distribute work to swarm agents
  • swarm_state - Check swarm progress and status
  • swarm_terminate - Terminate all swarm agents
  • delegate_agent - Delegate task to specific agent role

Documentation

  • README.md - Complete rewrite (26,782 bytes, ~1,180 lines)

    • Feature comparison table (zCode vs Hermes vs Claude vs Ruflo)
    • Architecture diagrams (system overview, Ruflo integration, message flow)
    • Usage examples for all commands
    • Security guidelines and performance benchmarks
    • Roadmap (v1.1, v1.2, v2.0)
  • INSTALLATION.md - New comprehensive setup guide (11,789 bytes, ~545 lines)

  • CREDITS.md - New attribution document (8,893 bytes, ~309 lines)

  • CONTRIBUTING.md - New contribution guide (9,574 bytes, ~461 lines)

  • REPO_UPDATE_SUMMARY.md - New update summary (7,450 bytes, ~205 lines)

Metadata

  • package.json - Enhanced with comprehensive metadata
    • Version bumped to 2.0.0
    • Added author, license, repository information
    • Added 20+ keywords for discoverability
    • Added funding and support links

🔄 Changed

  • Version Bump: 1.0.0 → 2.0.0 (major release)
  • README.md: Complete rewrite, 1,180 lines changed
  • package.json: Enhanced metadata, 55 lines modified
  • Documentation Structure: Organized into core, setup, and contributing sections

🛠️ Modified

  • src/plugins/ - New plugin system (4 files, ~23KB)
  • src/agents/ - Enhanced agent system (4 files, ~28KB)
  • src/bot/hooks.js - New hook system (4,900 bytes)
  • src/bot/memory-backend.js - Enhanced memory backend (8,077 bytes)
  • src/bot/index.js - Integrated all new systems (~17KB)

🧪 Added Tests

  • test-ruflo-smoke.mjs - Comprehensive smoke test suite
    • Total: 53 tests, all passing

🎯 Features Comparison

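| Feature | v1.0.0 | v2.0.0 | Change |
|---|---|---|---|
| 24/7 Telegram Bot | ✅ | ✅ | Unchanged |
| Self-Learning Memory | ✅ | ✅ | Enhanced with LRU |
| Voice I/O | ✅ | ✅ | Unchanged |
| Self-Evolution | ✅ | ✅ | Unchanged |
| Multi-Agent Swarm | ❌ | ✅ | NEW |
| Plugin System | ❌ | ✅ | NEW |
| Hook System | ❌ | ✅ | NEW |
| Enhanced Memory | ⚠️ | ✅ | UPGRADED |
| 18 Tools | ✅ | ✅ | Unchanged |
| 9 Agent Roles | ❌ | ✅ | NEW |
| 16 Extension Points | ❌ | ✅ | NEW |
| 6 Swarm Tools | ❌ | ✅ | NEW |
| Documentation | ⚠️ | ✅ | COMPLETE |

Legend: ✅ Full support | ⚠️ Partial support | ❌ Not available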


[1.0.0] - 2026-05-04

🎉 Initial Release

Added

  • Core Features

    • 24/7 Telegram bot with grammy framework
    • Self-learning memory (5 categories)
    • Voice I/O (Vosk STT + node-edge-tts TTS)
    • Self-evolution with 3-layer safety
    • Intelligence Routing (unified agentic loop)
    • RTK (Rust Token Killer) integration
  • Tools (18 Total)

    • BashTool, FileEditTool, FileReadTool, FileWriteTool
    • GitTool, WebSearchTool, WebFetchTool
    • BrowserTool, VisionTool, TTSTool
    • GrepTool, GlobTool, TaskCreateTool
    • TaskUpdateTool, TaskListTool, SendMessageTool
    • ScheduleCronTool, SelfEvolveTool
  • Agents (3 Roles)

    • Code Reviewer
    • System Architect
    • DevOps Engineer
  • Skills

    • code_review, bug_fix, refactor, documentation, testing
  • Documentation

    • README.md, ARCHITECTURE.md, SERVICE_MAP.md
    • QUICKSTART.md, TELEGRAM_SETUP.md, PERFORMANCE.md

🛠️ Technology Stack

  • AI Model: Z.AI GLM-5.1 (Coding Plan)
  • Telegram Framework: grammy
  • Web Server: Express
  • Logging: Winston
  • Voice STT: Vosk (offline)
  • Voice TTS: node-edge-tts
  • Token Optimization: RTK
  • Database: JSON-backed memory

📦 Installation

  • Node.js ≥ 20.0.0
  • npm ≥ 9.0.0
  • ffmpeg (for voice I/O)
  • Python 3.8+ (for Vosk)
  • systemd (for 24/7 service)

[Unreleased]

Planned for v2.1.0

  • Enhanced swarm topologies (federated, gossip)
  • Plugin marketplace
  • Advanced analytics dashboard
  • Custom agent training (LoRA fine-tuning)

Planned for v2.2.0

  • Web UI dashboard
  • Multi-language support (Spanish, French, German)
  • Distributed memory backend (Redis)
  • Kubernetes deployment
  • Horizontal scaling

Notes

Breaking Changes

  • v2.0.0: No breaking changes to existing functionality
  • All v1.0.0 features remain fully compatible
  • New features are additive only

Migration Guide

No migration needed! v2.0.0 is fully backward compatible with v1.0.0.

Known Issues

None reported.

Contributors

  • Roman (@uroma2) - Author, maintainer, primary developer
  • [More contributors coming soon]

zCode CLI X - The Ultimate Agentic Coding Assistant. Hermes Agent × Claude Code × Ruflo × OpenCode
