fix: eliminate EADDRINUSE crash loop with robust port binding

Root cause: the fuser-based EADDRINUSE handler killed the current process
because of a race condition during systemd restart cycles. fuser reported
the current PID because the socket was half-open, and the guard condition
(p !== process.pid) failed to filter it out.

Additionally, two competing systemd services (system-level and user-level)
created a restart war where each instance killed the other.

Fix approach (inspired by Next.js, Vite, webpack-dev-server):
- Replace fuser with net.createServer port probe (no external commands)
- PID-file based stale detection + ss fallback for orphan detection
- Wait loop polling every 300ms after sending SIGTERM to the stale process
- Single-service architecture (disabled user-level unit)
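
The probe, PID-file check, and wait loop listed above could look roughly
like the sketch below. It is an illustration under assumptions, not the
code in this commit: the names isPortFree, readStalePid, and acquirePort,
the probe host, and the 10s timeout are hypothetical, and the ss fallback
is left out.

```javascript
const net = require('node:net');
const fs = require('node:fs');

// Probe the port by binding a throwaway server: no external commands needed.
// Resolves true if the bind succeeds, false on EADDRINUSE.
function isPortFree(port, host = '127.0.0.1') { // host is an assumption
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once('error', (err) => {
      if (err.code === 'EADDRINUSE') resolve(false);
      else reject(err);
    });
    probe.once('listening', () => probe.close(() => resolve(true)));
    probe.listen(port, host);
  });
}

// Return the PID recorded in pidFile only if it names another live process
// that we are allowed to signal. Never returns our own PID.
function readStalePid(pidFile) {
  let pid;
  try {
    pid = Number.parseInt(fs.readFileSync(pidFile, 'utf8').trim(), 10);
  } catch {
    return null; // no readable PID file
  }
  if (!Number.isInteger(pid) || pid <= 0 || pid === process.pid) return null;
  try {
    process.kill(pid, 0); // signal 0 tests for existence without killing
    return pid;
  } catch {
    return null; // process is gone, or we lack permission to signal it
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Try to free the port: SIGTERM the stale process, then poll until the
// port is bindable or the timeout expires. Resolves true if the port is free.
async function acquirePort(port, pidFile, { pollMs = 300, timeoutMs = 10000 } = {}) {
  if (await isPortFree(port)) return true;
  const stalePid = readStalePid(pidFile);
  if (stalePid === null) return false; // nothing we can safely stop
  try {
    process.kill(stalePid, 'SIGTERM');
  } catch {
    // the process exited between the check and the signal
  }
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await sleep(pollMs);
    if (await isPortFree(port)) return true;
  }
  return false;
}
```

Comparing the parsed PID as a number against process.pid keeps the
self-kill guard meaningful, and returning false instead of killing an
unknown port holder leaves the final decision to the caller.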

Tested: 5 consecutive rapid restarts, 8+ minute uptime, zero crashes.

Co-Authored-By: zcode <noreply@zcode.dev>
admin
2026-05-06 12:47:36 +00:00
parent c164446a9c
commit 98ed33ba8f
4 changed files with 198 additions and 69 deletions


@@ -8,12 +8,16 @@ User=uroma2
WorkingDirectory=/home/uroma2/zcode-cli-x
ExecStart=/usr/bin/node /home/uroma2/zcode-cli-x/bin/zcode.js --no-cli
Restart=always
-RestartSec=10
+RestartSec=5
StandardOutput=append:/home/uroma2/zcode-cli-x/logs/zcode.log
StandardError=append:/home/uroma2/zcode-cli-x/logs/zcode-error.log
Environment="NODE_ENV=production"
Environment="LOG_LEVEL=info"
EnvironmentFile=/home/uroma2/zcode-cli-x/.env
TimeoutStartSec=60
TimeoutStopSec=15
[Install]
WantedBy=multi-user.target