This log tracks our evaluation results and associated costs.
| Date | Executed by | Version | Dataset | # Instances | Model | Resolved Rate | API Cost | Notes |
|---|---|---|---|---|---|---|---|---|
| 2025-07-08 | Yue Pan | v1.0 | SWE-Bench Lite | 300 | DeepSeek V3 | 28.67% | $70.05 | initial version |
| 2025-07-18 | Yue Pan | v1.0 | SWE-Bench Multilingual | 300 | DeepSeek V3 | 13.67% | $113.60 | initial version |
| 2025-07-31 | Yue Pan | v1.1 | SWE-Bench Lite | 300 | GPT-4o | 30.00% | $1569.73 | improved context retrieval |
| 2025-08-09 | Zhaoyang | v1.0 | SWE-Bench Verified | 500 | Devstral Medium 2507 | 33.00% | - | |
| 2025-08-11 | Yue Pan | v1.1 | SWE-Bench Verified | 500 | Devstral Medium 2507 | 38.40% | - | |
| 2025-11-06 | Yue Pan | v1.3 (with Athena) | dcloud347/SWE-bench_verified_lite | 50 | GPT-5 + GPT-4o | 70.00% | $200.79 | |
| 2025-11-06 | Yue Pan | v1.3 (without Athena) | dcloud347/SWE-bench_verified_lite | 50 | GPT-5 + GPT-4o | 56.00% | $367.73 | |
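Runs with different instance counts are easier to compare as absolute numbers. The sketch below converts each resolved rate back into a count of resolved instances and derives the API cost per resolved instance. It is a minimal sketch and rests on two assumptions:

- "Resolved Rate" is resolved instances divided by "# Instances".
- "API Cost" is the total USD spent on the whole run.

The `Run` dataclass and the helper functions are illustrative names, not part of the evaluated codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """One row of the evaluation log (illustrative structure, not from the codebase)."""
    version: str
    dataset: str
    instances: int
    resolved_rate: float       # percent, as reported in the table
    api_cost: Optional[float]  # total USD for the run; None where the log shows "-"


def resolved_count(run: Run) -> int:
    # Assumes Resolved Rate = resolved / # Instances; invert it and round
    # to the nearest whole instance, since the table rounds rates to 0.01%.
    return round(run.resolved_rate / 100 * run.instances)


def cost_per_resolved(run: Run) -> Optional[float]:
    # Total API cost spread over resolved instances; None if cost is unknown.
    if run.api_cost is None:
        return None
    return run.api_cost / resolved_count(run)


RUNS = [
    Run("v1.0", "SWE-Bench Lite", 300, 28.67, 70.05),
    Run("v1.0", "SWE-Bench Multilingual", 300, 13.67, 113.60),
    Run("v1.1", "SWE-Bench Lite", 300, 30.00, 1569.73),
    Run("v1.0", "SWE-Bench Verified", 500, 33.00, None),
    Run("v1.1", "SWE-Bench Verified", 500, 38.40, None),
    Run("v1.3 (with Athena)", "dcloud347/SWE-bench_verified_lite", 50, 70.00, 200.79),
    Run("v1.3 (without Athena)", "dcloud347/SWE-bench_verified_lite", 50, 56.00, 367.73),
]

for run in RUNS:
    cpr = cost_per_resolved(run)
    cpr_text = "-" if cpr is None else f"${cpr:.2f}"
    print(f"{run.version:<22} {run.dataset:<34} "
          f"resolved={resolved_count(run):>3}/{run.instances} cost/resolved={cpr_text}")
```

Under these assumptions, the first row works out to 86 of 300 instances resolved, at roughly $0.81 per resolved instance.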