SuperCharged-Claude-Code-Upgrade/prometheus/docs/Evaluation-log.md at main

Files

admin b52318eeae feat: Add intelligent auto-router and enhanced integrations

- Add intelligent-router.sh hook for automatic agent routing
- Add AUTO-TRIGGER-SUMMARY.md documentation
- Add FINAL-INTEGRATION-SUMMARY.md documentation
- Complete Prometheus integration (6 commands + 4 tools)
- Complete Dexto integration (12 commands + 5 tools)
- Enhanced Ralph with access to all agents
- Fix /clawd command (removed disable-model-invocation)
- Update hooks.json to v5 with intelligent routing
- 291 total skills now available
- All 21 commands with automatic routing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2026-01-28 00:27:56 +04:00

1.7 KiB

Raw Permalink Blame History

This log tracks our evaluation results and associated costs.

Date	Executed by	Version	Dataset	#Instance	Model	Resolved Rate	API Cost	Notes
2025-07-08	Yue Pan	v1.0	SWE-Bench Lite	300	DeepSeek V3	28.67%	$70.05	initial version
2025-07-18	Yue Pan	v1.0	SWE-Bench Multilingual	300	DeepSeek V3	13.67%	$113.6	initial version
2025-07-31	Yue Pan	v1.1	SWE-Bench Lite	300	GPT-4o	30.00%	$1569.73	context retrieval improved version
2025-08-09	Zhaoyang	v1.0	SWE-Bench Verified	500	Devstral Medium 2507	33.00%	-
2025-08-11	Yue Pan	v1.1	SWE-Bench Verified	500	Devstral Medium 2507	38.4%	-
2025-11-06	Yue Pan	v1.3(with Athena)	dcloud347/SWE-bench_verified_lite	50	GPT-5 + gpt-4o	70.00%	$200.79
2025-11-06	Yue Pan	v1.3(without Athena)	dcloud347/SWE-bench_verified_lite	50	GPT-5 + gpt-4o	56.00%	$367.73

1.7 KiB Raw Permalink Blame History

This log tracks our evaluation results and associated costs.

1.7 KiB

Raw Permalink Blame History