Commit Graph

51 Commits

  • fix: improve stuck detection to track failed tool calls
    - Track failed tool calls in call history (parse errors, execution errors)
    - Increment turns counter for failed tool calls too
    - Stuck detection now works even when tools fail repeatedly
    - Inspired by Ruflo and Hermes Agent best practices
    
    Fixes the bug where zCode would get stuck in infinite loops when tool calls fail.
    
Test results: all stuck detection tests passing
  • feat: PortManager — intelligent port lifecycle with retry+backoff
    Replace 158 lines of fragile inline port logic (probePort, bindPort,
    killStaleProcess, waitForPort, readStalePid) with a proper module:
    
    - State machine: idle → probing → claiming → owned → releasing
    - Triple holder detection: pidfile → ss → lsof fallback
    - Age-based kill strategy (young siblings get waited on, not killed)
    - Exponential backoff retry (5 attempts) instead of instant process.exit
    - EventEmitter for stateChange/claimed/retry/failed events
    - getStatus() for diagnostics
    - Exposed in bot return object for external health checks
    
    All previous features preserved, zero downgrades.
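A minimal sketch of the retry shape described above (5 attempts with exponential backoff instead of an instant process.exit); the helper name and defaults are assumptions, not PortManager's real interface:

```javascript
// Retry an async attempt with exponential backoff.
async function withBackoff(attempt, { attempts = 5, baseMs = 200 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await attempt(i);
    } catch (err) {
      if (i === attempts - 1) throw err;      // out of retries: surface the error
      const waitMs = baseMs * 2 ** i;         // 200, 400, 800, 1600 ms ...
      await new Promise((r) => setTimeout(r, waitMs));
    }
  }
}
```

The final failure is thrown rather than swallowed, which is what lets the caller emit a `failed` event instead of exiting mid-claim.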
  • feat: reply context injection + crash-loop guard
    1. Reply context: When user replies/tags a message in Telegram, inject the
       original message text as [Replying to previous message:] prefix so the AI
       has full context. Previously ignored reply_to_message entirely, causing
       'make hero more exciting' to have zero context about which page.
    
    2. System prompt: Added CONTEXT AWARENESS section instructing the AI to
       use reply context and never ask 'which page?' when context is provided.
    
    3. Crash-loop guard: killStaleProcess now checks /proc/pid/stat to get
       process age. Skips killing processes younger than 15 seconds, preventing
       the mutual-kill cycle where systemd restarts before old instance dies.
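The age check can be sketched from the /proc/\<pid\>/stat format: field 22 (starttime) is the process start time in clock ticks since boot, so age in seconds is system uptime minus starttime/CLK_TCK (typically 100 on Linux). This parser is illustrative; the exact function in the commit may differ.

```javascript
// Compute process age in seconds from a /proc/<pid>/stat line.
function processAgeSeconds(statLine, uptimeSeconds, clkTck = 100) {
  // comm (field 2) may itself contain spaces or parens, so parse
  // from just after the LAST closing paren.
  const afterComm = statLine.slice(statLine.lastIndexOf(')') + 2);
  const fields = afterComm.split(' ');        // fields[0] is field 3 (state)
  const starttimeTicks = Number(fields[19]);  // field 22 overall: starttime
  return uptimeSeconds - starttimeTicks / clkTck;
}
```

A guard like `if (processAgeSeconds(...) < 15) return` then skips killing young siblings, matching the 15-second rule above.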
  • fix: resolve typing hang, intent detector reversed .test() bugs, and 'now' false positive
    - Add missing clearInterval(typingInterval) in intent bypass early return path
    - Fix intent-detector category detection: pattern.test(regex) → regex.test(trimmed)
    - Fix short-answer patterns: same reversed .test() bug
    - Prevent 'now' being matched as 'no' by adding \b word boundary to greeting regex
    - Also tighten other greeting patterns with $ anchor where appropriate
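The two regex bugs in isolation: `RegExp.prototype.test` takes a string, so `pattern.test(regex)` stringifies the regex and matches the wrong thing entirely; and without `\b`, a prefix like `no` also matches `now`. These patterns are simplified stand-ins for the real intent detector.

```javascript
const loose = /^(hi|hey|no)/i;        // old: 'now' matches via the 'no' prefix
const greeting = /^(hi|hey|no)\b/i;   // fixed: word boundary required after the match

// Correct orientation: regex.test(string), never the reverse.
const isGreeting = (s) => greeting.test(s.trim());
```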
  • fix: crash loop after reboot - resilient error handlers + mask user service
    Root causes:
    1. uncaughtException/unhandledRejection called gracefulShutdown() -> process.exit(0)
       Any minor error killed the entire bot. Changed to LOG ONLY (Hermes/OpenCode pattern).
    2. User-level systemd service was running alongside system-level, fighting for port 3001.
       Masked user service permanently.
    3. Fragile new Promise(() => {}) keepalive replaced with setInterval-based keepalive.
    4. Syntax error in uncaughtException handler (literal newline in single-quoted string).
    
    Tested: 5 rapid consecutive restarts all pass. Uptime stable.
    
    Co-Authored-By: zcode <noreply@zcode.dev>
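The LOG ONLY pattern from root cause 1 amounts to handlers that record the error and deliberately never exit; a sketch, with log text of my own choosing:

```javascript
// Log-only global error handlers: a stray async error no longer
// takes down the whole bot via gracefulShutdown()/process.exit().
process.on('uncaughtException', (err) => {
  console.error('[uncaughtException]', err && err.stack ? err.stack : err);
  // deliberately NO process.exit() here
});

process.on('unhandledRejection', (reason) => {
  console.error('[unhandledRejection]', reason);
});
```

Note that registering an uncaughtException listener is what suppresses Node's default crash-on-throw behavior, so the handler body can be as small as a log line.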
  • feat: enable parallel tool call batching
    - Fix mangled system prompt rule 3 — now explicitly instructs batching
    - Add parallel_tool_calls: true to API body (required by many providers)
    - Strengthen batching language: #1 speed optimization, NEVER serialize
  • perf: Hermes guardrail + OpenCode tool selection + parallel execution
    Upgraded tool execution pipeline by studying three major open-source projects:
    
    From Hermes (NousResearch):
    - ToolCallGuardrailController with SHA256 signature-based loop detection
    - beforeCall/afterCall lifecycle with warn/block/halt thresholds
    - Idempotent vs mutating tool classification
    - Automatic failure classification from tool results
    
    From OpenCode (anomalyco):
- Explicit guidance to avoid bash for find/grep/cat/head/tail/sed/awk
    - Parallel tool calls in single message
    - doom_loop detection pattern
    
    From Ruflo (ruvnet):
    - Parallel data extraction with dedup
    
    Benchmark: 47 turns -> 15 turns, 5min -> 2min, 0 ghost chasing
    
    Co-Authored-By: zcode <noreply@zcode.dev>
  • perf: 2.8x faster task execution - parallel tools, no ghost chasing
    Re-engineered tool execution pipeline inspired by Claude Code, Cursor,
    OpenHands, and Aider patterns:
    
    - System prompt overhaul: explicit tool selection + anti-ghost-chasing rules
    - Parallel tool execution via Promise.all (was sequential for loop)
    - Bash command ghost detection with cached results on repeated calls
    - Planning nudge injection before AI starts
    - Bash tool marked as LAST RESORT in tool definitions
    - Extended session state with arbitrary tool result caching
    
    Benchmark: 47 turns -> 17 turns, 5min -> 2min, 0 ghost chasing
    
    Co-Authored-By: zcode <noreply@zcode.dev>
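The sequential-to-parallel change above reduces to mapping the batch through Promise.all with per-call error capture, so one failing tool no longer aborts its siblings. Handler and call shapes here are illustrative:

```javascript
// Execute a batch of tool calls concurrently; capture each failure
// in its own result instead of rejecting the whole batch.
async function executeToolBatch(toolCalls, handlers) {
  return Promise.all(toolCalls.map(async (call) => {
    try {
      const result = await handlers[call.name](call.args);
      return { id: call.id, result };
    } catch (err) {
      return { id: call.id, error: String(err.message || err) };
    }
  }));
}
```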
  • fix: eliminate EADDRINUSE crash loop with robust port binding
    Root cause: fuser-based EADDRINUSE handler killed the current process
    due to a race condition during systemd restart cycles. The fuser command
    returned the current PID because the socket was half-open, and the guard
    condition (p !== process.pid) failed to filter it.
    
    Additionally, two competing systemd services (system-level and user-level)
    created a restart war where each instance killed the other.
    
    Fix approach (inspired by Next.js, Vite, webpack-dev-server):
    - Replace fuser with net.createServer port probe (no external commands)
    - PID-file based stale detection + ss fallback for orphan detection
    - Wait loop with 300ms polling after SIGTERM to stale process
    - Single-service architecture (disabled user-level unit)
    
    Tested: 5 consecutive rapid restarts, 8+ minute uptime, zero crashes.
    
    Co-Authored-By: zcode <noreply@zcode.dev>
  • fix: auto-terminate stale bot instances to prevent port conflicts
    - Added execSync import for child_process
    - Modified acquirePidfile() to send SIGTERM to old instances
    - Waits up to 2.5s for graceful shutdown with checks every 500ms
    - Prevents continuous restart loop when old PID holds port 3001
    - Bot now self-heals on restart instead of crashing
  • fix: prevent self-killing pidfile race condition
    - Changed acquirePidfile() to only warn when another instance is detected
    - No longer kills existing processes, just logs warning and continues
    - Prevents continuous restart loop when bot detects itself running
    - Maintains all Ruflo-inspired features (plugins, hooks, swarm, memory)
    - All 18 tools, 6 skills, 9 agents, 6 swarm tools still loaded
  • feat: massive Ruflo-inspired upgrade — plugin system, multi-agent swarm, hooks, enhanced memory
    New systems (src/plugins/):
      - Plugin.js: lifecycle hooks (onLoad, onUnload, onConfigChange) + BasePlugin
      - PluginManager.js: fault-isolated extension point dispatch with metrics
      - PluginLoader.js: dependency-resolving batch loader with health checks
      - ExtensionPoints.js: 16 standard extension point names
    
    New systems (src/bot/):
      - hooks.js: HookManager with pre/post tool, pre/post AI, session lifecycle
      - memory-backend.js: JSONBackend (typed entries + LRU) + InMemoryBackend (ephemeral with TTL)
    
    New systems (src/agents/):
      - Agent.js: typed agents with capabilities, status tracking
      - Task.js: DAG-compatible tasks with priorities, dependencies, rollback
      - SwarmCoordinator.js: multi-agent orchestration (simple/hierarchical/swarm topologies)
      - agents/index.js: 9 agent roles + AgentOrchestrator
    
    Bot integration (src/bot/index.js):
      - 6 new Ruflo-inspired tools: swarm_spawn, swarm_execute, swarm_distribute, swarm_state, swarm_terminate
      - Plugin system, hook system, swarm initialized in initBot
      - Pre/post tool hooks wired into tool execution
      - Ephemeral + persistent memory backends
      - Agent orchestrator with 9 specialized agent types
      - Graceful shutdown: all systems cleanup, conversation flush, pidfile release
      - Return object exposes pluginManager, swarm, hookManager, memBackend, agentOrchestrator, getState
    
    This brings Ruflo's multi-agent architecture, plugin extensibility, hook-based lifecycle, and typed memory to zCode.
  • feat: enterprise-grade agentic loop — 50 turns, stuck detection, context compaction, progress feedback
    - MAX_TOOL_TURNS: 10 → 50 (complex tasks need more room)
    - max_tokens: 4096 → 8192 (longer responses, better summaries)
    - Tool result limit: 8000 → 16000 chars (less truncation)
    - Stuck detection: 3x same tool+args pattern → intervention
    - Context compaction: every 15 turns, trims old tool results
    - Progress feedback: user sees step count during tool loops
    - Error recovery: don't give up on mid-loop errors, inject recovery msg
    - Max-turns: requests structured summary + next steps (not silent quit)
    - SSE timeouts: 90s→180s fetch, 30s→45s idle, 2→4 retries
    - Self-correction: clone messages instead of mutating originals
  • fix: handle truncated tool call JSON — guide model to use bash heredoc for large files
    When file_write gets a 15KB+ HTML payload, the streaming JSON gets
    truncated. Now catches JSON parse errors and returns a specific
    hint to use bash heredoc instead of silently failing.
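The guard described above amounts to catching the parse error and returning a steering hint instead of failing silently; the hint wording here is illustrative:

```javascript
// Parse tool-call arguments; on truncated/invalid JSON, return a hint
// nudging the model toward a bash heredoc for large payloads.
function parseToolArgs(raw) {
  try {
    return { ok: true, args: JSON.parse(raw) };
  } catch {
    return {
      ok: false,
      hint: 'Arguments JSON was truncated. For large files, use the bash ' +
            "tool with a heredoc (cat > file <<'EOF' ... EOF) instead.",
    };
  }
}
```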
  • feat: add infrastructure context to zCode system prompt
    Non-secret only: Gitea repo URL, systemd service name, deploy workflow,
    self-evolve push behavior. Zero credentials in source.
  • fix: keep typing indicator alive during entire response (not just until first token)
    The old logic stopped typing on first stream token, leaving tool
    execution gaps (30s+) with zero visual feedback. Now typing persists
    until the full response + streaming edits are complete.
  • fix: rewrite chatWithAI as unified agentic tool loop
    OLD: streaming and non-streaming were separate paths. Streaming detected tool
    calls and recursively called non-streaming which only did ONE round of tool
    execution with no loop-back. This caused silent hangs.
    
    NEW: single chatWithAI with internal while loop (max 10 turns):
      1. Call API (stream or non-stream)
      2. If tool_calls → execute all → append results → loop
      3. If text content → return final answer
    
    Key fixes:
    - streamChat now ACCUMULATES tool_call deltas instead of aborting
    - Tool results are fed back to the AI in the same conversation
    - Multi-turn: AI can call tools multiple times before answering
    - Max 10 turns with forced final answer as safety net
    - Proper { content, tool_calls, error } return type from both paths
    - Non-streaming fallback if SSE fails
    - No more recursive calls between stream/non-stream
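The unified loop above can be sketched in a few lines, with `callModel` and `runTool` as injected stand-ins for the real API client and tool handlers:

```javascript
// One loop for both modes: call the model, execute any tool calls,
// feed results back, return the first plain-text answer.
async function chatLoop(messages, callModel, runTool, maxTurns = 10) {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages);
    if (!reply.tool_calls || reply.tool_calls.length === 0) {
      return reply.content;                        // final text answer
    }
    messages.push({ role: 'assistant', tool_calls: reply.tool_calls });
    for (const call of reply.tool_calls) {
      const result = await runTool(call);          // results loop back in
      messages.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
  return '(max turns reached)';                    // forced final answer
}
```

Because tool results are appended to the same `messages` array, the next model call sees them in-context, which is exactly the loop-back the old recursive design lacked.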
  • fix: non-streaming tool calls now feed results back to AI for final answer
    Previously tool calls in non-streaming path returned raw tool output as the
    response. Now executes tool, sends results back to model for a synthesized
    answer. Fixes the 'silent after streaming fallback' bug.
  • feat: add DelegateTool with multi-turn agentic loop (18 tools total)
    - DelegateTool.js: multi-turn sub-agent (max 10 turns), feeds tool results back
    - Moved TOOL_DEFS to startBot scope so delegate handler can access tool schemas
    - Fixed scoping: delegate handler resolves model from svc.config instead of chatWithAI local
    - Wired into tools/index.js, TOOL_DEFS, and toolHandlers
  • feat: add vision, TTS, and browser tools (17 tools total)
    - VisionTool: image analysis via Z.AI GLM-4V multimodal API
    - TTSTool: text-to-speech via node-edge-tts (free, auto-sends audio to chat)
    - BrowserTool: web page content extraction via cheerio (strips HTML, extracts text)
    - All 3 wired into tools/index.js + bot tool definitions + handlers
    - TTS handler auto-sends generated audio as voice message to chat
  • feat: real agent execution + real skill execution (system-prompt-driven)
    - delegate_agent: now makes actual AI call with role-specific system prompts
      (coder=code review, architect=system design, devops=infrastructure)
    - run_skill: now makes actual AI call with skill-specific system prompts
      (code_review, bug_fix, refactor, documentation, testing)
    - Both return structured AI-generated results instead of placeholder text
  • feat: wire 10 new tools — file_read, file_write, glob, grep, web_fetch, task CRUD, send_message, schedule_cron
    - 10 new JS tool classes in src/tools/ (clean, no framework deps)
    - tools/index.js: registry-based init with env toggles
    - bot/index.js: 16 tool definitions + 16 handlers (was 4)
    - Added glob npm dependency
    - Tools: bash, file_edit, file_read, file_write, glob, grep, web_search, web_fetch, git, task_create/update/list, send_message, schedule_cron, delegate_agent, run_skill
  • fix: handle tool calls in streaming mode - fall back to non-streaming
    Model says "let me research" then calls web_search tool.
    Streaming path ignored tool_calls entirely (no-op comment).
    Now: detect tool_calls delta, cancel stream, fall back to non-streaming
    which properly executes tools and returns results.
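The accumulation fix mentioned in later commits hinges on the shape of streamed tool calls: OpenAI-style streams emit them as indexed fragments whose `arguments` strings must be concatenated, not treated as complete calls. A sketch using the common chat-completions delta shape (an assumption about the provider's format):

```javascript
// Merge streamed tool_call deltas into complete calls, keyed by index.
function accumulateToolCalls(deltas) {
  const calls = [];
  for (const d of deltas) {
    const slot = calls[d.index] ||
      (calls[d.index] = { id: '', name: '', arguments: '' });
    if (d.id) slot.id = d.id;
    if (d.function && d.function.name) slot.name = d.function.name;
    if (d.function && d.function.arguments) slot.arguments += d.function.arguments;
  }
  return calls;
}
```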
  • fix: pidfile lock + port conflict guard + systemd ready
    - Pidfile lock prevents duplicate instances (auto-kills stale PIDs)
    - EADDRINUSE retry: kills port hog, retries up to 3x with 1.5s delay
    - releasePidfile() on graceful shutdown
    - Added fs/path imports needed by pidfile utilities
  • perf: 3-tier conversation context with LRU cache, keyword relevance, debounced I/O
    UPGRADE from naive JSON to production-grade conversation memory:
    
    Tier 1 — Compressed Summary (max 600 chars):
      Incrementally built from evicted messages. Preserves conversation
      topics across 100+ messages in a tiny budget.
    
    Tier 2 — Relevant Snippets (BM25-style keyword matching):
      Scores older messages against current query, injects top 3 matches.
      Zero external deps — keyword extraction is ~0.1ms.
    
    Tier 3 — Sliding Window (last 12 exchanges verbatim):
      Recent context preserved word-for-word, fitting within token budget.
    
    Performance optimizations:
      - In-memory Map cache with lazy-load from disk (0ms reads)
      - Debounced async disk writes (3s, non-blocking, never stalls response)
      - LRU eviction for cache (max 50 chats, prevents memory leak)
      - Keywords stripped before saving (smaller JSON files)
      - Backward-compatible: loads old format without keywords, backfills on load
      - Graceful shutdown flushes all pending saves to disk
      - Token-aware budget allocation: summary 15% + relevant 15% + recent 70%
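The debounced-write optimization above reduces to a trailing-edge debounce: repeated saves within the window collapse into one write, so the response path never blocks on disk. A generic sketch (the 3s window in the commit would be the `waitMs` argument):

```javascript
// Trailing-edge debounce: only the last call within each quiet
// window actually invokes fn, with the latest arguments.
function debounce(fn, waitMs) {
  let timer = null;
  let pending = null;
  return (...args) => {
    pending = args;
    if (timer) clearTimeout(timer);
    timer = setTimeout(() => { timer = null; fn(...pending); }, waitMs);
  };
}
```

Pairing this with a synchronous flush on shutdown (as the commit's "graceful shutdown flushes all pending saves" bullet describes) is what keeps the pattern safe against data loss.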
  • feat: persistent conversation history across sessions and restarts
    - ConversationStore: per-chat JSON files in data/, survives restarts
    - 6000 token budget per chat context (fits ~20-30 exchanges)
    - Auto-trims old messages, always includes most recent
    - Wired into message handler: loads history before AI call, saves after
    - /reset command to clear chat history per chat
    - Cross-session, cross-model, cross-chat isolation
  • fix: bulletproof command handler + auto-restart + README overhaul
    - sendStreamingMessage: replaced broken simulated streaming with reliable
      HTML send + stripped plain text fallback (was silently failing)
    - Added global unhandledRejection guard (catches async errors that
      sequentialize middleware would swallow)
    - restart.sh: auto-restart loop on crash (3s delay) instead of bare node
    - README: comprehensive update with self-learning memory, curiosity engine,
      memory architecture diagram, updated command table, updated comparison
  • feat: persistent self-learning memory + curiosity engine
    - New memory.js: JSON-backed MemoryStore with 5 categories (lesson, pattern, preference, discovery, gotcha)
    - Memory injected into system prompt — bot sees past learnings every session
    - Curiosity engine: auto-detects errors/fixes, corrections, successful patterns, new tool discoveries
    - New commands: /memory (stats), /remember (save), /recall (search), /forget (delete)
    - Runs AFTER response delivery — zero latency impact
    - 500 memory cap with smart eviction (keeps gotchas/lessons, evicts old discoveries)
    - data/ directory gitignored (memory is local to each deployment)
  • fix: beautiful Telegram formatting via HTML (no more raw **)
    - Add markdownToHtml() converter: **bold**, *italic*, code blocks, links, headings, quotes, lists
    - StreamConsumer: intermediate edits stay plain text, FINAL message gets full HTML formatting
    - sendFormatted() now uses HTML parse_mode with fallback to stripped plain text
    - stripMarkdown() for plain-text fallback (no raw syntax chars)
    - All Telegram sends now use HTML instead of legacy Markdown mode
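A tiny converter in the spirit of markdownToHtml(), handling just bold, italic, and inline code, and escaping HTML first so Telegram's HTML parse_mode accepts the output; the real converter covers far more cases (links, headings, quotes, lists):

```javascript
// Minimal markdown -> Telegram-safe HTML: escape first, then convert.
function mdToHtml(text) {
  return text
    .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
    .replace(/\*\*(.+?)\*\*/g, '<b>$1</b>')    // ** before * so bold wins
    .replace(/\*(.+?)\*/g, '<i>$1</i>')
    .replace(/`([^`]+)`/g, '<code>$1</code>');
}
```

Escaping before conversion is the important ordering: user text containing `<` or `&` would otherwise break the HTML parse and trigger the plain-text fallback.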
  • feat: real-time SSE streaming via StreamConsumer (adapted from Hermes Agent)
    - StreamConsumer class: queued token buffer → rate-limited editMessageText loop
    - Adaptive flood control backoff (3 strikes → fallback to plain send)
    - Cursor indicator (▉) during typing, stripped on completion
    - chatWithAI now supports onDelta callback for SSE token streaming
    - Uses native fetch() for SSE (Node 18+), falls back to non-streaming on error
    - Message handler wires StreamConsumer into the chat pipeline
    - Graceful fallback: if streaming fails entirely, sends as plain message
  • fix: revert streaming to prevent webhook errors
    - Removed SSE streaming from chatWithAI()
    - Keep sendStreamingMessage() for chunked delivery
    - Self-correction loops still active
    - Messages will be delivered in chunks with typing indicator
  • feat: fully enable self-correction loops
    - Import withSelfCorrection from self-correction.js
    - Wrap chatWithAI() with self-correction wrapper
    - Add /selfcorrection command to show status
    - Update /start to mention self-correction and streaming
    - Self-correction: 2 retries + exponential backoff + auto-simplification
    - Triggers: error responses, rate limits, timeouts, 5xx errors
  • feat: enable streaming responses like OpenClaw
    - Add sendStreamingMessage() to message-sender.js with typing indicators
    - Enable stream: true in chatWithAI() with SSE parsing
    - Replace all ctx.reply() calls with sendStreamingMessage()
    - Real-time text streaming with 50ms delay between chunks
  • feat: full service exposure with grammy bot + claudegram patterns
    - Rewrote bot/index.js using grammy (@grammyjs/auto-retry + runner)
    - Added deduplication.js (adapted from claudegram)
    - Added request-queue.js (per-chat sequential processing)
    - Added message-sender.js (chunking + Markdown fallback)
    - Wired all JS-shim services: tools, skills, agents, config, RTK
    - Added function calling support to ZAIProvider.chat()
    - Added dynamic command routing (tools, skills, agents, model, stats)
    - Added per-agent delegation commands (/agent_coder, /agent_architect, etc.)
    - Added dedup + queue patterns from claudegram's battle-tested codebase
    - Updated zcode.js to pass agents to initBot()
    - Updated README feature comparison table to reflect real capabilities
  • feat: Add RTK (Rust Token Killer) integration for token optimization
    - Add RTK utility module (src/utils/rtk.js)
    - Integrate RTK into BashTool for all bash commands
    - Integrate RTK into GitTool for git operations
    - Initialize RTK on bot startup
    - Support 60+ command types (git, npm, cargo, pytest, docker, etc.)
    - Track and report token savings per command
    - Graceful fallback when RTK is not available
    
    Expected savings: 60-90% token reduction for supported commands