
zCode CLI X

Agentic coding assistant with Z.AI + Telegram integration — autonomous code execution with real-time streaming, self-correction loops, persistent self-learning memory, and RTK token optimization.

💡 Get 10% OFF Z.AI — Use code ROK78RJKNW at z.ai/subscribe for the Coding Plan

Features

Core

  • 🤖 AI-Powered Code Generation: Powered by Z.AI GLM-5.1 (Coding Plan)
  • 📱 Telegram Bot: 24/7 via grammy + webhook with real-time SSE streaming
  • 🛠️ Full Engineering Access: Bash, FileEdit, WebSearch, Git tools
  • 🧠 Agent System: Code Reviewer, System Architect, DevOps Engineer
  • 📚 Skills System: Pre-built skills for common tasks

🧠 Self-Learning Memory

  • Persistent across sessions: JSON-backed memory store survives restarts
  • 5 categories: lesson, pattern, preference, discovery, gotcha
  • Auto-injected into system prompt: AI knows what it learned before — every conversation builds on the last
  • Smart eviction: Max 500 memories with priority-based eviction (old discoveries first, lessons/gotchas kept)
  • Deduplication: Same memory won't be stored twice — access count increments instead

🔬 Curiosity Engine

The bot doesn't just respond — it learns from every interaction. After each response, an asynchronous analysis pass runs:

User message + AI response
         │
         ▼
   ┌─────────────────┐
   │ Pattern Detector │  ← runs AFTER delivery (zero latency)
   └────────┬────────┘
            │
    ┌───────┼───────┬──────────┬──────────┐
    ▼       ▼       ▼          ▼          ▼
  Error   User    Successful  First-time  New API
  + Fix   Correct  Complex     Tool       Quirk
    │      │      Solution     Usage      Found
    ▼      ▼       ▼           ▼          ▼
  gotcha  lesson  pattern   discovery  discovery

What triggers learning:

| Trigger | Category | Example |
|---|---|---|
| Error with fix found | gotcha | `ENOENT: no such file` → use absolute paths |
| User says "wrong" or "fix" | lesson | Correction on `npm install`: use `--legacy-peer-deps` |
| Complex successful solution | pattern | Solution for "deploy to VPS": 12-step process with SSH |
| First tool usage works | discovery | Bash tool works for shell commands on this server |
| New API quirk discovered | discovery | Z.AI SSE sends empty data lines between chunks |
| Repeated user preference | preference | User always wants TypeScript over JavaScript |
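A toy classifier for a few of these triggers might look like the following. The heuristics here are assumptions for illustration, not the project's actual detector:

```javascript
// Illustrative trigger classifier: maps an interaction to a memory category.
// The regexes are simplified stand-ins for the real pattern detector.
function detectCategory(userMessage, aiResponse) {
  const msg = userMessage.toLowerCase();
  if (/wrong|fix/.test(msg)) return 'lesson';           // user correction
  if (/error|enoent/i.test(aiResponse) &&
      /fixed|resolved/i.test(aiResponse)) return 'gotcha'; // error with fix found
  if (/always|prefer/.test(msg)) return 'preference';   // repeated preference
  return null;                                          // nothing learnable
}
```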

Commands:

| Command | Description |
|---|---|
| `/memory` | View memory stats + recent memories |
| `/remember <text>` | Manually save a memory (auto-detects category) |
| `/recall <query>` | Search memories by keyword |
| `/forget <id>` | Delete a specific memory |

🎤 Voice I/O (Speech-to-Text + Text-to-Speech)

Fully local voice processing. No API keys, no cloud services, no costs.

User sends voice message
         │
         ▼
  ┌──────────────┐
  │ Download OGG │  ← Telegram Bot API
  │ to /tmp      │
  └──────┬───────┘
         │
         ▼
  ┌──────────────┐
  │ ffmpeg → WAV │  ← 16kHz mono (Vosk requirement)
  │ (16kHz mono) │
  └──────┬───────┘
         │
         ▼
  ┌──────────────┐
  │ Vosk STT     │  ← Offline, ~200ms, 68MB model
  │ Python bridge│     Zero network calls
  └──────┬───────┘
         │
         ▼
  {"text": "...", "confidence": 0.95}
         │
         ▼
  Feed into chatWithAI → AI responds
  (optionally via TTS tool → voice reply)

| Component | Technology | Size | Latency | Cost |
|---|---|---|---|---|
| STT (voice→text) | Vosk — offline speech recognition | 68MB model | ~200ms | Free |
| TTS (text→voice) | node-edge-tts — Microsoft Edge voices | No download | ~2s | Free |
| Audio conversion | ffmpeg (system) | N/A | ~100ms | Free |

How it works:

  1. Telegram sends voice as OGG Opus. Bot downloads to /tmp.
  2. scripts/stt.py — Python bridge that converts to WAV (ffmpeg) and runs Vosk inference.
  3. Returns JSON {"text": "...", "confidence": 0.95} to Node.js.
  4. Transcribed text enters the normal handleTextMessage() pipeline — full AI response with streaming, tools, memory, self-correction.
  5. AI can optionally use the tts tool to reply with a voice message.

Why Vosk over Whisper:

  • No GPU needed — runs on CPU, ~200MB RAM (Whisper needs 1-4GB)
  • Fast — 200ms vs 5-10s for Whisper on CPU
  • Tiny model — 68MB vs 1-3GB for Whisper
  • Offline — zero network calls, zero API costs
  • Good enough — ~95% accuracy for English speech

🧬 Self-Evolution

zCode can modify its own source code — and survive the attempt. Every change goes through a safety chain that guarantees rollback on any failure.

Safety chain (8 steps):

AI requests patch
       │
       ▼
┌──────────────────┐
│ 1. Git stash     │  ← save uncommitted work
│ 2. Git commit    │  ← checkpoint current state
│ 3. File backup   │  ← copy original to .self-evolve-backups/
│ 4. Apply patch   │  ← string find/replace in target file
│ 5. Syntax check  │  ← node --check / py_compile / JSON.parse
│ 6. Git commit    │  ← commit the change
│ 7. Push + restart│  ← deploy to production
│ 8. Health + smoke│  ← verify bot is alive and responding
└──────────────────┘
       │
   ┌───┴───┐
   │ FAIL? │──── auto git reset + restart + restore from backup
   └───┬───┘
       │ OK
       ▼
   ✅ Live

Protected files — the bot cannot modify these (would brick itself):

  • SelfEvolveTool.js — the safety system itself
  • stt.py — voice recognition bridge

Backup system:

  • Every patch auto-saves the original file to .self-evolve-backups/ with a timestamp ID
  • backups action lists all backups (file, size, timestamp)
  • restore action copies any backup back to production (with its own safety checks)
  • Backups survive git corruption — they're plain file copies, not git objects
  • Restoring also creates a pre-restore backup (undo the undo)

Actions: read, patch, list_files, git_log, diff, backups, restore

Rate limit: 1 patch per 60 seconds (prevents runaway self-modification loops)
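The rate limit amounts to a simple time-gate. A minimal sketch (with an injectable clock so the behavior is testable; the factory name is illustrative):

```javascript
// Minimal sketch of the 1-patch-per-60s rate limit described above.
function makeRateLimiter(intervalMs, now = Date.now) {
  let last = -Infinity;
  return function tryAcquire() {
    const t = now();
    if (t - last < intervalMs) return false; // too soon: reject the patch
    last = t;
    return true;
  };
}

let fakeTime = 0;
const allowPatch = makeRateLimiter(60_000, () => fakeTime);
```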

🧠 Intelligence Routing

The core of zCode CLI X's reliability. A unified agentic loop that handles both streaming and non-streaming through the same execution path — no more split paths that lose context or hang silently.

User Message
     │
     ▼
┌─────────────────────────────────────────┐
│         chatWithAI() — Main Loop        │
│  ┌───────────────────────────────────┐  │
│  │  Call API (stream or non-stream)  │  │
│  └─────────────┬─────────────────────┘  │
│                │                        │
│         ┌──────┴──────┐                 │
│         ▼             ▼                 │
│   tool_calls?     text content          │
│         │             │                 │
│    ┌────┴────┐    return answer         │
│    ▼         ▼                           │
│  Execute   Feed results                  │
│  tools     back to AI                    │
│    │         │                           │
│    └────┬────┘                           │
│         ▼                                │
│    Append to messages → loop back        │
│    (max 10 turns, forced final answer)   │
└─────────────────────────────────────────┘

How it works:

| Component | Role |
|---|---|
| `chatWithAI()` | Single entry point. While loop (max 10 turns) that calls the API, handles tools, loops back. |
| `streamChat()` | SSE transport. Streams tokens to the user via `onDelta()`. Accumulates tool_call deltas instead of aborting. Returns `{ content, tool_calls, error }`. |
| `nonStreamChat()` | REST transport. Single POST, returns `{ content, tool_calls, error }`. |
| Tool execution | Runs all tool_calls in parallel, appends results as tool messages, loops back. |
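The loop's shape can be sketched as below. This is a deliberately simplified synchronous model with stubbed-in API and tool functions — the real loop is async with SSE streaming — but the control flow (tool turn vs. final answer, max-turn cutoff) matches the description:

```javascript
// Simplified sketch of the unified agentic loop. callApi and tools are
// caller-supplied stand-ins, not the project's real transports.
function chatWithAI(messages, callApi, tools, maxTurns = 10) {
  for (let turn = 0; turn < maxTurns; turn++) {
    const { content, tool_calls } = callApi(messages);
    if (!tool_calls || tool_calls.length === 0) return content; // final answer
    // Tool turn: execute each requested tool, feed results back, loop again.
    for (const call of tool_calls) {
      const result = tools[call.name](call.arguments);
      messages.push({ role: 'tool', name: call.name, content: result });
    }
  }
  // Safety net: after maxTurns, force a text answer with tools disabled.
  return callApi(messages, { noTools: true }).content;
}
```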

Key design decisions:

  • No recursive fallbacks. Old code had stream → detect tool → abort → call non-stream (which lost the stream context). Now both paths return the same struct and feed into the same loop.
  • Tool call accumulation. SSE sends tool_calls in delta chunks (name in one chunk, arguments in the next). streamChat() accumulates these across chunks and builds a complete tool_calls array — same as the non-streaming response.
  • Max 10 turns safety net. If the AI keeps calling tools (infinite loop), after 10 turns a final non-streaming call without tools forces a text answer.
  • Streaming is transport, not logic. Whether tokens stream to the user or not, the tool execution loop is identical.
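Tool call accumulation from delta chunks can be illustrated as follows. The chunk shape here mirrors the common OpenAI-style streaming delta format (an assumption; the exact Z.AI wire format may differ):

```javascript
// Sketch: rebuild complete tool_calls from SSE delta chunks, where the name
// arrives in one chunk and the arguments string arrives in pieces.
function accumulateToolCalls(chunks) {
  const calls = [];
  for (const chunk of chunks) {
    for (const delta of chunk.tool_calls ?? []) {
      const i = delta.index;
      calls[i] ??= { name: '', arguments: '' };
      if (delta.name) calls[i].name = delta.name;                 // arrives once
      if (delta.arguments) calls[i].arguments += delta.arguments; // concatenated
    }
  }
  return calls;
}
```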

Before Intelligence Routing:

Stream starts → detects tool_call → ABORT stream → recursive non-stream call
→ non-stream returns raw tool output as response → NO final answer from AI
→ user sees raw bash output or silence

After Intelligence Routing:

Stream starts → detects tool_call → accumulates full tool_call → executes tool
→ feeds result back to AI in same conversation → AI synthesizes final answer
→ streams final answer to user → done

Streaming & Formatting

  • Real-time SSE Streaming: Token-by-token delivery via StreamConsumer — adapted from Hermes Agent's GatewayStreamConsumer
    • Queued token buffer → rate-limited editMessageText loop (1s base interval)
    • Adaptive backoff on Telegram flood control (429)
    • Typing cursor during generation, clean final message
    • Graceful fallback to plain send on repeated failures
  • ⌨️ Persistent Typing Indicator: sendChatAction('typing') every 4s
    • Active from message receipt through entire response (tool turns + streaming)
    • Telegram typing expires after ~5s — 4s interval keeps it alive continuously
    • Only stops when final response is fully delivered
    • Ensures user always sees activity, even during long tool execution gaps
  • 🎨 Telegram HTML Formatting: AI markdown → clean Telegram HTML
    • **bold**, *italic*, `code`, fenced code blocks, [links](url), ~~strike~~, headings, blockquotes, lists
    • Double fallback: HTML → stripped plain text (never shows raw **)
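A minimal version of such a converter, for a few of the cases above. This is a sketch, not the project's `markdownToHtml()` — the real one handles code fences, links, headings, and more:

```javascript
// Illustrative markdown → Telegram-HTML converter for a handful of cases.
// HTML-escapes first so user text can't inject tags, then substitutes markup.
function markdownToHtml(md) {
  return md
    .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
    .replace(/\*\*(.+?)\*\*/g, '<b>$1</b>')
    .replace(/\*(.+?)\*/g, '<i>$1</i>')
    .replace(/`([^`]+)`/g, '<code>$1</code>');
}
```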

Reliability

  • 🔄 Self-Correction Loops: Automatic retry with exponential backoff
    • 2 retry attempts (500ms → 1s → 1.5s delay)
    • Triggers: API errors, rate limits, timeouts, 5xx server errors
    • Auto-simplification: prompts simplified on retry to avoid recurring errors
    • Full logging of all retry attempts with reason tracking
  • 🔁 Auto-Restart: Process supervisor restarts the bot on crash (3s delay)
  • 🛡️ RTK (Rust Token Killer): Token optimization for supported commands
    • 60-90% savings on git, npm, cargo, pytest, docker, and more
    • Active tracking stats via getTrackingStats()
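The self-correction wrapper boils down to a retry loop with a growing delay. A synchronous sketch (the real wrapper is async and actually sleeps; the 500ms step mirrors the schedule above, and `onRetry` is an illustrative hook where delay and logging would happen):

```javascript
// Sketch of retry-with-backoff: up to `retries` extra attempts, with the
// backoff delay (500ms, 1s, ...) reported to an injectable onRetry hook.
function withSelfCorrection(fn, retries = 2, onRetry = () => {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return fn(attempt); // attempt > 0 could use a simplified prompt
    } catch (err) {
      lastErr = err;
      if (attempt < retries) onRetry(500 * (attempt + 1), err);
    }
  }
  throw lastErr; // all attempts failed
}
```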

Architecture

  • 📨 Multi-Channel Delivery: Hub-based routing (Telegram + Discord + WebSocket + log)
  • 🔁 Deduplication: 60s TTL message deduplication
  • 📋 Request Queue: Per-chat sequential processing (no race conditions)
  • 🔌 MCP Protocol: Full MCP client + server management
  • Cron Scheduling: 1s interval, task locking, auto-recovery
  • 🛡️ Unhandled rejection guard: Catches any async error that slips through
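The 60s TTL deduplication is a small time-windowed set. A sketch with an injectable clock (keying on a Telegram update id is an assumption here):

```javascript
// Sketch of TTL-based deduplication: a message id seen within the TTL window
// is a duplicate; entries older than the TTL are expired on each check.
function makeDeduper(ttlMs = 60_000, now = Date.now) {
  const seen = new Map(); // id → timestamp first seen
  return function isDuplicate(id) {
    const t = now();
    for (const [k, ts] of seen) if (t - ts > ttlMs) seen.delete(k); // expire
    if (seen.has(id)) return true;
    seen.set(id, t);
    return false;
  };
}

let clock = 0;
const isDup = makeDeduper(60_000, () => clock);
```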

📦 Installation

cd zcode-cli-x
npm install

⚙️ Configuration

Copy .env.example to .env and configure:

cp .env.example .env

Required Environment Variables

# Z.AI Configuration (Coding Plan)
GLM_BASE_URL=https://api.z.ai/api/coding/paas/v4
ZAI_API_KEY=***

# Telegram Bot Configuration
TELEGRAM_BOT_TOKEN=***
TELEGRAM_ALLOWED_USERS=your_telegram_id
ZCODE_WEBHOOK_URL=https://your-domain.com/telegram/webhook

🎮 Usage

Run as CLI

node bin/zcode.js

Run as Telegram Bot (24/7)

node bin/zcode.js --no-cli

Or install it as a systemd service:

# /etc/systemd/system/zcode.service
[Unit]
Description=zCode CLI X Bot
After=network.target

[Service]
Type=simple
User=<your-user>
WorkingDirectory=/path/to/zcode-cli-x
ExecStart=/usr/bin/node bin/zcode.js --no-cli
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Then enable and start it:

sudo systemctl enable zcode
sudo systemctl start zcode

Quick restart (no systemd)

bash restart.sh

🤖 Telegram Bot Commands

| Command | Description |
|---|---|
| `/start` | Show help and capabilities |
| `/tools` | List available tools |
| `/skills` | List loaded skills |
| `/agents` | List agent roles |
| `/model <name>` | Switch AI model |
| `/stats` | System & RTK stats |
| `/memory` | 🧠 Persistent memory stats |
| `/remember <text>` | 📝 Save to memory |
| `/recall <query>` | 🔍 Search memory |
| `/forget <id>` | 🗑 Delete a memory |
| `/selfcorrection` | Self-correction status |
| `/bash <cmd>` | Execute shell command |
| `/web <query>` | Search the web |
| `/git <action>` | Git operations |
| `/cancel` | Cancel current operation |

Or just chat — zCode uses tools automatically when needed.

🛠️ Tools

| Tool | Description |
|---|---|
| BashTool | Shell command execution with timeout control |
| FileEditTool | Diff-aware file operations (read, write, patch) |
| WebSearchTool | Web search with result ranking |
| GitTool | Git operations (status, log, diff, commit, push, pull) |

🧠 Agents

| Agent | Role |
|---|---|
| Code Reviewer | Review code for bugs, security issues, and improvements |
| System Architect | Design system architecture and patterns |
| DevOps Engineer | Handle deployment, CI/CD, and infrastructure |

🏗️ Architecture

zCode CLI X uses a hybrid architecture:

User Message
    │
    ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Telegram    │────▶│ dedup.js     │────▶│ request-queue   │
│  (grammy)    │     │ (60s TTL)    │     │ (per-chat seq)  │
└─────────────┘     └──────────────┘     └────────┬────────┘
                                                   │
                                                   ▼
                                          ┌────────────────┐
                                          │ self-correction │
                                          │ (2 retries +   │
                                          │  backoff)       │
                                          └────────┬───────┘
                                                   │
                              ┌────────────────────┼────────────────────┐
                              ▼                    ▼                    ▼
                     ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
                     │ chatWithAI   │    │ Tool         │    │ Agent        │
                     │ (SSE stream) │    │ Handlers     │    │ Delegation   │
                     └──────┬───────┘    └──────────────┘    └──────────────┘
                            │
                    ┌───────┴────────┐
                    ▼                ▼
            ┌──────────────┐ ┌──────────────┐
            │ Stream       │ │ sendFormatted│
            │ Consumer     │ │ (HTML mode)  │
            │ (edit-in-    │ └──────────────┘
            │  place)      │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │ markdown     │
            │ ToHtml()     │
            │ converter    │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │ Telegram API │
            │ (HTML mode)  │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │ 🧠 Self-     │  ← async, zero latency
            │ Learning     │     extracts patterns
            │ Engine       │     stores to memory.json
            └──────────────┘

Core Components

zcode-cli-x/
├── bin/
│   └── zcode.js              # CLI entry point
├── src/
│   ├── bot/
│   │   ├── index.js          # Telegram bot (grammy + SSE streaming + memory)
│   │   ├── message-sender.js # StreamConsumer + markdownToHtml converter
│   │   ├── memory.js         # Persistent self-learning memory store
│   │   ├── deduplication.js  # Message deduplication (60s TTL)
│   │   ├── request-queue.js  # Per-chat request queuing
│   │   ├── delivery-hub.js   # Multi-channel delivery
│   │   ├── discord.js        # Discord integration (discord.js v14)
│   │   └── self-correction.js # Self-correction wrapper (2 retries + backoff)
│   ├── api/
│   │   └── index.js          # Z.AI API adapter (GLM-5.1, SSE support)
│   ├── tools/
│   │   ├── BashTool.js       # Shell command executor (RTK-aware)
│   │   ├── FileEditTool.js   # File operations
│   │   ├── WebSearchTool.js  # Web search
│   │   └── GitTool.js        # Git operations (RTK-aware)
│   ├── agents/
│   │   └── index.js          # Agent orchestration
│   ├── skills/
│   │   └── index.js          # Skills system
│   └── utils/
│       ├── logger.js         # Winston logger
│       ├── env.js            # Environment validation
│       └── rtk.js            # RTK (Rust Token Killer) integration
├── data/
│   └── memory.json           # Persistent memory (auto-created, gitignored)
├── logs/                     # Runtime logs (gitignored)
├── .env                      # Configuration
└── package.json

Bot Message Flow

  1. Message Reception: Telegram webhook → grammy handler
  2. Deduplication: deduplication.js (60s TTL, prevents double-processing)
  3. Request Queue: request-queue.js (per-chat sequential processing)
  4. Memory Injection: Memory context injected into system prompt
  5. Self-Correction: self-correction.js (2 retries + exponential backoff + auto-simplification)
  6. AI Chat + Streaming: chatWithAI() → SSE stream → StreamConsumer → real-time edits
  7. Formatting: markdownToHtml() converts AI markdown → Telegram HTML
  8. Final Delivery: editMessageText with HTML parse_mode (or fallback to stripped plain text)
  9. Self-Learning: selfLearn() analyzes interaction → extracts patterns → saves to memory.json

StreamConsumer Pipeline

Z.AI API (SSE)
    │
    ▼ onDelta(token)
┌──────────────┐
│ Token Buffer │  ← accumulates tokens from SSE stream
└──────┬───────┘
       │ (every ~1s or 40 chars)
       ▼
┌──────────────┐
│ editMessage  │  ← plain text + cursor ▉ (no parse_mode)
│ Text()       │     rate-limited, adaptive backoff on flood
└──────┬───────┘
       │ (on finish)
       ▼
┌──────────────┐
│ markdownTo   │  ← converts **bold**, *italic*, `code`, etc.
│ Html()       │     to <b>, <i>, <code>, <pre> HTML tags
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ editMessage  │  ← final message with parse_mode: 'HTML'
│ Text()       │     fallback: stripped plain text (no raw **)
└──────────────┘
       │
       ▼ (async, after delivery)
┌──────────────┐
│ selfLearn()  │  ← pattern detector extracts learnable insights
│              │     saves to data/memory.json
└──────────────┘
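The token-buffer stage of this pipeline can be sketched as below. This is a simplified model using only the character threshold; the real StreamConsumer also rate-limits by time (~1s) and backs off on Telegram 429s:

```javascript
// Illustrative token buffer: flush a draft edit (with a typing cursor) once
// enough characters accumulate, and a clean final edit on finish.
function makeTokenBuffer(flush, threshold = 40) {
  let buf = '';   // tokens not yet flushed
  let sent = '';  // everything flushed so far
  return {
    onDelta(token) {
      buf += token;
      if (buf.length >= threshold) {
        sent += buf;
        flush(sent + ' ▉'); // draft edit with cursor
        buf = '';
      }
    },
    finish() {
      sent += buf;
      flush(sent); // final edit, cursor removed
      return sent;
    },
  };
}
```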

Memory System Architecture

┌──────────────────────────────────────────────┐
│              data/memory.json                │
│  (persistent, survives restarts, gitignored) │
└──────────────────┬───────────────────────────┘
                   │
         ┌─────────┴─────────┐
         │   MemoryStore     │
         │   (singleton)     │
         └─────────┬─────────┘
                   │
    ┌──────────────┼──────────────┬──────────────┬──────────────┐
    ▼              ▼              ▼              ▼              ▼
  📖 lesson    🔧 pattern    👤 preference  💡 discovery   ⚠️ gotcha
  "Always       "For deploy:  "User prefers   "Z.AI SSE      "ENOENT →
   use abs      use scp..."    TS over JS"     sends empty    use absolute
   paths"                                     data lines"     paths"
    │              │              │              │              │
    └──────────────┴──────────────┴──────────────┴──────────────┘
                   │
                   ▼
    buildContextSummary() → injected into system prompt
    recall(query) → search memories
    remember(cat, text) → save new memory
    forget(id) → delete memory

Priority in system prompt: gotchas > lessons > patterns > preferences > discoveries

Eviction policy: When memory exceeds 500 entries, old single-access discoveries are evicted first. Lessons and gotchas are never evicted unless all else fails.
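The eviction policy can be modeled as a priority sort. A sketch under assumed entry shapes (`category`, `accessCount`, `createdAt`) — illustrative, not the actual store code:

```javascript
// Sketch of priority-based eviction: old single-access discoveries go first,
// lessons and gotchas are kept longest.
function evict(memories, max = 500) {
  if (memories.length <= max) return memories;
  const priority = m =>
    (m.category === 'lesson' || m.category === 'gotcha') ? 2 :
    (m.accessCount > 1) ? 1 : 0; // 0 = single-access entry, evicted first
  return [...memories]
    .sort((a, b) => priority(b) - priority(a) || b.createdAt - a.createdAt)
    .slice(0, max); // keep the highest-priority, newest entries
}
```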

📊 Feature Comparison

| Feature | zCode CLI X | Hermes Agent | better-clawd |
|---|---|---|---|
| **Agentic** | | | |
| Autonomous execution | Full autonomous mode | Full autonomous mode | ⚠️ Manual step-by-step |
| Sub-agents | Multi-agent (swarm) | delegate_task + batch | Single agent only |
| Agent roles | Code Reviewer, Architect, DevOps | Agent Registry (10+ roles) | Fixed single role |
| Self-correction loops | 2 retries + backoff + auto-simplification | Agent self-correction skill | None |
| **Intelligence** | | | |
| Persistent memory | JSON-backed, 5 categories, auto-learn | Cross-session memory | None |
| Self-learning / curiosity | Pattern detector + auto-extraction | Knowledge + memory tools | None |
| Memory-injected prompts | Every conversation uses past lessons | Memory injected | None |
| **Streaming** | | | |
| Intelligence Routing | Unified agentic loop (stream + non-stream) | ⚠️ Separate stream/non-stream paths | None |
| Real-time SSE streaming | StreamConsumer (edit-in-place) | GatewayStreamConsumer | None |
| Tool call accumulation | Delta accumulation from SSE chunks | ⚠️ Abort on tool detection | None |
| Telegram HTML formatting | markdownToHtml + fallback | Native HTML support | None |
| Adaptive flood control | Exponential backoff | Flood backoff | N/A |
| **Tooling** | | | |
| Bash/Shell | BashTool | TerminalTool | Shell access |
| File editing | FileEditTool (diff-aware) | Patch + Write + Edit | ⚠️ Basic write |
| Web search | WebSearch | WebSearch + Vane + Exa | None |
| Git integration | GitTool (RTK-aware) | GitTool | None |
| Browser automation | Computer-use (Anthropic) | Full browser toolkit | None |
| MCP servers | Full MCP protocol | Native MCP + mcporter | None |
| RTK optimization | RTK active (60-90% savings) | RTK integrated | None |
| **Platform** | | | |
| Telegram integration | Native bot + webhook + streaming | 2-way Telegram bridge | — |
| Persistent typing indicator | 4s interval, active during full response | None | — |
| Discord | Native bot (discord.js) | Full Discord integration | — |
| Multi-channel delivery | Delivery hub (TG + DC + WS + log) | Cron→multi-platform | None |
| **Voice** | | | |
| Speech-to-Text | Vosk (offline, ~200ms, 68MB) | ⚠️ Whisper (needs GPU) | — |
| Text-to-Speech | Edge TTS (free, 100+ voices) | node-edge-tts | — |
| Voice→AI pipeline | Transcribe → full agentic loop | ⚠️ Separate pipeline | — |
| **Self-Evolution** | | | |
| Self-modifying code | Patch own source + auto-deploy | Not supported | — |
| Auto-backup before change | File-level + git checkpoint | N/A | — |
| Auto-rollback on failure | Syntax + health + smoke test | N/A | — |
| Backup history + restore | Timestamped backups, 1-command restore | N/A | — |
| **Infrastructure** | | | |
| Model routing | Multi-provider | Multi-provider routing | Single model |
| Context compression | Compact pipeline | lean-ctx MCP (90% savings) | None |
| Auto-restart | Process supervisor | systemd managed | None |
| Cron scheduling | 1s interval, jitter, locks | Cron jobs with delivery | None |

Summary

  • zCode CLI X — Lightweight agentic coder focused on Telegram + Z.AI. Its core is Intelligence Routing: a unified agentic loop that handles streaming and non-streaming through one execution path, with tool call accumulation from SSE deltas. Adds real-time SSE streaming, self-correction loops, persistent self-learning memory with a curiosity engine, RTK optimization, and clean HTML formatting. Gets smarter with every conversation. Ideal for quick coding tasks via Telegram.
  • Hermes Agent — Full-stack AI assistant platform. Best for complex multi-agent workflows, scheduled automation, and cross-platform deployment. 500+ skills, MCP ecosystem, deepest toolset.
  • better-clawd — Minimal Claude Code clone. Useful as a lightweight reference but lacks agentic depth.

🔗 Integrations

  • Z.AI API: GLM-5.1 model (Coding Plan) with SSE streaming
  • Telegram Bot API: grammy + auto-retry + sequentialize + webhook
  • Discord.js v14: Discord bot with GatewayIntentBits
  • Express.js: HTTP server for webhook handling
  • Winston: Structured logging
  • WebSocket: Real-time updates
  • RTK: Rust Token Killer (token optimization)
  • Vosk: Offline speech recognition (STT, 68MB model, no API key)
  • ffmpeg: Audio conversion (OGG → WAV for Vosk)

🤝 Contributing

Contributions welcome! Based on:

  • better-clawd — Claude Code clone
  • Hermes Agent — AI assistant platform (streaming architecture + memory system credit)

Built by zCode CLI X
