v3.8.0: AI Monitoring — self-healing watchdog with 3-tier response system

- HealthWatcher thread: monitors proxy /health every 5s
- LogAnalyzer thread: tails cc-debug.log for 18 failure signal patterns
- Tier 1 rule engine: 14 rules for instant auto-recovery (< 1s)
- Tier 2 incident store: JSON pattern database with success rates
- Tier 3 AI diagnostic agent: calls configurable provider/model for novel failures
- AIMonitoringWindow GUI: ON/OFF toggle, provider/model/API key selector, incident log
- 30 fault types catalogued across 5 categories (A-E)
- Enhanced /health endpoint with memory_mb, uptime_s, requests_total
- Auto-restart proxy, auto-clear schema cache, kill stale processes
- Safety: rate-limited AI calls, restart caps, cooldowns per pattern
- AI Monitoring design spec (AI-MONITORING-DESIGN.md)
- 54 self-test patterns passing
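
The Tier 1 behaviour listed above (pattern rules, per-pattern cooldowns, a restart cap) can be pictured with a minimal sketch. This is not the code from this commit: `Rule`, `RuleEngine`, the `"restart_proxy"` and `"escalate"` labels, and the sample pattern are all hypothetical names chosen for illustration, and the input is assumed to be a log line already flagged as a failure signal.

```python
import re
import time
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One hypothetical Tier 1 rule: a log pattern mapped to a recovery action."""
    name: str
    pattern: re.Pattern
    action: str                      # label only, e.g. "restart_proxy"
    cooldown_s: float = 30.0
    last_fired: float = field(default=float("-inf"))


class RuleEngine:
    """Matches failure lines against rules, honouring cooldowns and a restart cap."""

    def __init__(self, rules, max_restarts=3, clock=time.monotonic):
        self.rules = list(rules)
        self.max_restarts = max_restarts
        self.restarts = 0
        self.clock = clock           # injectable so the logic can be tested

    def handle(self, failure_line):
        """Return an action label, None while cooling down, or "escalate"."""
        now = self.clock()
        for rule in self.rules:
            if not rule.pattern.search(failure_line):
                continue
            if now - rule.last_fired < rule.cooldown_s:
                return None          # same pattern fired recently: do nothing
            if rule.action == "restart_proxy":
                if self.restarts >= self.max_restarts:
                    return "escalate"  # restart cap reached
                self.restarts += 1
            rule.last_fired = now
            return rule.action
        return "escalate"            # no rule matched: hand off to a later tier
```

Passing the clock in as a parameter keeps the cooldown logic deterministic under test; a real engine would presumably also persist outcomes to the incident store described in the Tier 2 bullet.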
admin
2026-05-22 22:36:16 +04:00
parent a56db90e68
commit 096d32bebd
4 changed files with 621 additions and 17 deletions

@@ -3410,10 +3410,20 @@ class Handler(http.server.BaseHTTPRequestHandler):
         if self.path in ("/v1/models", "/models"):
             self.send_json(200, {"object": "list", "data": MODELS})
         elif self.path in ("/health", "/v1/health"):
+            import resource as _res
+            _mem_mb = 0
+            try:
+                _mem_mb = _res.getrusage(_res.RUSAGE_SELF).ru_maxrss / 1024
+            except Exception:
+                pass
+            _uptime = time.time() - _START_TIME if '_START_TIME' in dir() else 0
             self.send_json(200, {"ok": True, "backend": BACKEND,
                                  "target_url": TARGET_URL,
                                  "models": [m.get("id") for m in MODELS],
-                                 "bgp_routes": len(BGP_ROUTES)})
+                                 "bgp_routes": len(BGP_ROUTES),
+                                 "uptime_s": round(_uptime, 1),
+                                 "memory_mb": round(_mem_mb, 1),
+                                 "requests_total": _STATS.get("requests", 0)})
         else:
             self.send_error(404)
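
Three details in the `/health` hunk above may be worth revisiting. Called without arguments inside a method, `dir()` lists only local names, so `'_START_TIME' in dir()` is false there and `uptime_s` would be reported as 0. The `resource` module is Unix-only and is imported outside the `try`, so the import itself can raise on Windows. And `ru_maxrss` is the peak resident set size, in kilobytes on Linux but bytes on macOS. The sketch below shows one possible way to compute the same three fields more defensively; the helper names (`peak_memory_mb`, `uptime_s`, `health_fields`) are hypothetical and not part of the commit.

```python
import sys
import time

try:
    import resource  # Unix-only; not available on Windows
except ImportError:
    resource = None

_START_TIME = time.time()  # in the proxy this is assigned in main()


def peak_memory_mb():
    """Peak resident set size in MB, or 0.0 where it cannot be read."""
    if resource is None:
        return 0.0
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def uptime_s():
    """Seconds since _START_TIME, looked up in module globals rather than dir()."""
    start = globals().get("_START_TIME")
    return time.time() - start if start is not None else 0.0


def health_fields(requests_total):
    """Build the three extra /health fields added in this commit."""
    return {
        "uptime_s": round(uptime_s(), 1),
        "memory_mb": round(peak_memory_mb(), 1),
        "requests_total": requests_total,
    }
```

In the handler, the result could be merged into the existing payload, for example `{"ok": True, ..., **health_fields(_STATS.get("requests", 0))}`. Because the value is a peak rather than the current usage, it never decreases over the life of the process.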
@@ -4750,10 +4760,11 @@ def _handle_shutdown_signal(sig, frame):
     _SHUTDOWN_REQUESTED = True
     print(f"[SELF-REVIVE] Signal {sig} received, shutting down cleanly", flush=True)
     if 'SERVER' in globals() and SERVER:
-        SERVER.shutdown()
+        SERVER.shutdown()
 def main():
-    global SERVER
+    global SERVER, _START_TIME
+    _START_TIME = time.time()
     _init_runtime()
     signal.signal(signal.SIGTERM, _handle_shutdown_signal)
     signal.signal(signal.SIGINT, _handle_shutdown_signal)
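
Because `_START_TIME` is set once in `main()`, `uptime_s` starts again from zero whenever the proxy process is relaunched. A watcher polling `/health` could use a drop in that value to notice a restart it did not trigger itself. The following is a minimal sketch of that idea only, not the `HealthWatcher` shipped in this commit: the class body, the `fetch` and `on_status` parameters, and the status strings are assumptions made for illustration.

```python
import http.client
import json
import threading
import urllib.request


def fetch_health(url, timeout=2.0):
    """Return the parsed /health JSON, or None if the proxy is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        return None


class HealthWatcher(threading.Thread):
    """Polls a health URL at a fixed interval and reports a status string."""

    def __init__(self, url, interval_s=5.0, fetch=fetch_health, on_status=print):
        super().__init__(daemon=True)
        self.url = url
        self.interval_s = interval_s
        self.fetch = fetch            # injectable for testing without a server
        self.on_status = on_status
        self._last_uptime = None
        self._stop_event = threading.Event()

    def check_once(self):
        """Run one poll; return "ok", "restarted", or "unhealthy"."""
        health = self.fetch(self.url)
        if health is None or not health.get("ok"):
            status = "unhealthy"
        else:
            uptime = health.get("uptime_s", 0)
            # A smaller uptime than last time means the process was relaunched.
            restarted = self._last_uptime is not None and uptime < self._last_uptime
            self._last_uptime = uptime
            status = "restarted" if restarted else "ok"
        self.on_status(status)
        return status

    def run(self):
        # Event.wait doubles as an interruptible sleep between polls.
        while not self._stop_event.wait(self.interval_s):
            self.check_once()

    def stop(self):
        self._stop_event.set()
```

A caller would construct it with the proxy's own health URL, call `start()`, and later `stop()`; the 5-second default mirrors the polling interval stated in the commit message.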