Files
SuperCharged-Claude-Code-Up…/prometheus/docs/Evaluation-log.md
admin b52318eeae feat: Add intelligent auto-router and enhanced integrations
- Add intelligent-router.sh hook for automatic agent routing
- Add AUTO-TRIGGER-SUMMARY.md documentation
- Add FINAL-INTEGRATION-SUMMARY.md documentation
- Complete Prometheus integration (6 commands + 4 tools)
- Complete Dexto integration (12 commands + 5 tools)
- Enhanced Ralph with access to all agents
- Fix /clawd command (removed disable-model-invocation)
- Update hooks.json to v5 with intelligent routing
- 291 total skills now available
- All 21 commands with automatic routing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-28 00:27:56 +04:00

1.7 KiB

This log tracks our evaluation results and associated costs.

Date Executed by Version Dataset #Instance Model Resolved Rate API Cost Notes
2025-07-08 Yue Pan v1.0 SWE-Bench Lite 300 DeepSeek V3 28.67% $70.05 initial version
2025-07-18 Yue Pan v1.0 SWE-Bench Multilingual 300 DeepSeek V3 13.67% $113.6 initial version
2025-07-31 Yue Pan v1.1 SWE-Bench Lite 300 GPT-4o 30.00% $1569.73 context retrieval improved version
2025-08-09 Zhaoyang v1.0 SWE-Bench Verified 500 Devstral Medium 2507 33.00% -
2025-08-11 Yue Pan v1.1 SWE-Bench Verified 500 Devstral Medium 2507 38.4% -
2025-11-06 Yue Pan v1.3(with Athena) dcloud347/SWE-bench_verified_lite 50 GPT-5 + gpt-4o 70.00% $200.79
2025-11-06 Yue Pan v1.3(without Athena) dcloud347/SWE-bench_verified_lite 50 GPT-5 + gpt-4o 56.00% $367.73