v2.2.0: per-provider reasoning controls (on/off + effort level)
- Add Reasoning On/Off toggle and Effort selector in endpoint editor
- Proxy sends enable_thinking=false when reasoning is OFF
- Proxy sends reasoning_effort level when reasoning is ON
- Strip reasoning_content from output, force max_tokens=64000 minimum
- Fixes Crof mimo-v2.5-pro and similar reasoning model token exhaustion
12
CHANGELOG.md
@@ -1,5 +1,17 @@
# Changelog

## v2.2.0 (2026-05-20)

- **Added per-provider Reasoning controls in endpoint editor**
- Reasoning On/Off toggle: disable reasoning for models that exhaust output tokens (e.g., Crof mimo-v2.5-pro)
- Reasoning Effort selector: None, Minimal, Low, Medium, High, Max
- When reasoning is OFF: sends `enable_thinking=false` + `reasoning_effort=none` to upstream API
- When reasoning is ON: sends user-selected effort level (default: Medium)
- Settings stored per-endpoint, passed through proxy config to upstream requests
- Strip `reasoning_content` from proxy output; Codex doesn't use it, and this avoids token waste
- Force `max_tokens=64000` minimum for openai-compat providers, leaving room for both reasoning and content
- Inspired by unsloth's reasoning control patterns for Qwen/GPT-OSS models

## v2.1.3 (2026-05-19)

- **Fixed Crof mimo-v2.5-pro stopping mid-response (finish_reason=length)**
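
The v2.2.0 entry above describes how the proxy rewrites upstream requests and filters responses. The following is a minimal Python sketch of that behavior, not the project's actual code. The function names, the `reasoning_on` and `effort` parameters, the lowercase wire values for effort levels, and the OpenAI-style `choices[].message` / `choices[].delta` response shape are all assumptions made for illustration; only the field names `enable_thinking`, `reasoning_effort`, `reasoning_content`, and `max_tokens`, plus the 64000 floor, come from the changelog.

```python
from copy import deepcopy

MIN_MAX_TOKENS = 64000  # floor named in the changelog
# Assumed lowercase wire values for the selector options listed in the changelog.
EFFORT_LEVELS = ("none", "minimal", "low", "medium", "high", "max")


def apply_reasoning_controls(payload, reasoning_on=True, effort="medium"):
    """Return a copy of an upstream request body with reasoning settings applied.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    out = deepcopy(payload)
    if reasoning_on:
        # Reasoning ON: forward the user-selected effort level.
        out["reasoning_effort"] = effort
    else:
        # Reasoning OFF: disable thinking and set effort to "none".
        out["enable_thinking"] = False
        out["reasoning_effort"] = "none"
    # Raise max_tokens to the floor; larger caller values are left untouched.
    out["max_tokens"] = max(int(out.get("max_tokens") or 0), MIN_MAX_TOKENS)
    return out


def strip_reasoning_content(response):
    """Return a copy of a response with reasoning_content removed from each choice.

    Assumes an OpenAI-style body with "message" (full) or "delta" (streamed) parts.
    """
    out = deepcopy(response)
    for choice in out.get("choices", []):
        for key in ("message", "delta"):
            part = choice.get(key)
            if isinstance(part, dict):
                part.pop("reasoning_content", None)
    return out
```

Both helpers copy their input rather than mutating it, so the per-endpoint settings can be applied to the same base request for several providers without one call affecting another.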