claudish/ai_docs/THINKING_ALIGNMENT_SUMMARY.md

67 lines
3.5 KiB
Markdown
Raw Permalink Normal View History

# Thinking Translation Model Alignment Summary
**Last Updated:** 2025-11-25
**Status:** Verification Complete ✅
## Overview
We have implemented a comprehensive **Thinking Translation Model** that aligns Claude Code's native `thinking.budget_tokens` parameter with the diverse reasoning configurations of 6 major AI providers. This ensures that when a user requests a specific thinking budget (e.g., "Think for 16k tokens"), it is correctly translated into the native control mechanism for the target model.
## Provider Alignment Matrix
| Provider | Model | Claude Parameter | Translated Parameter | Logic |
| :--- | :--- | :--- | :--- | :--- |
| **OpenAI** | o1, o3 | `budget_tokens` | `reasoning_effort` | < 4k: `minimal`<br>4k-16k: `low`<br>16k-32k: `medium`<br>> 32k: `high` |
| **Google** | Gemini 3.0 | `budget_tokens` | `thinking_level` | < 16k: `low`<br>>= 16k: `high` |
| **Google** | Gemini 2.5/2.0 | `budget_tokens` | `thinking_config.thinking_budget` | Passes exact budget (capped at 24,576) |
| **xAI** | Grok 3 Mini | `budget_tokens` | `reasoning_effort` | < 20k: `low`<br>>= 20k: `high` |
| **Qwen** | Qwen 2.5/3 | `budget_tokens` | `enable_thinking`, `thinking_budget` | `enable_thinking: true`<br>`thinking_budget`: exact value |
| **MiniMax** | M2 | `thinking` | `reasoning_split` | `reasoning_split: true` |
| **DeepSeek** | R1 | `thinking` | *(Stripped)* | Parameter removed to prevent API error (400) |
## Implementation Details
### 1. OpenAI Adapter (`OpenAIAdapter`)
- **File:** `src/adapters/openai-adapter.ts`
- **Behavior:** Maps continuous token budget into discrete effort levels.
- **New Feature:** Added support for `minimal` effort (typically < 4000 tokens) for faster, lighter reasoning tasks.
### 2. Gemini Adapter (`GeminiAdapter`)
- **File:** `src/adapters/gemini-adapter.ts`
- **Behavior:**
- **Gemini 3 detection:** Checks `modelId` for "gemini-3". Uses `thinking_level`.
- **Backward Compatibility:** Defaults to `thinking_config` for Gemini 2.0/2.5.
- **Safety:** Caps budget at 24k tokens to maintain stability.
### 3. Grok Adapter (`GrokAdapter`)
- **File:** `src/adapters/grok-adapter.ts`
- **Behavior:**
- **Validation:** Explicitly checks for "mini" models (Grok 3 Mini).
- **Stripping:** Removes thinking parameters for standard Grok 3 models which do not support API-controlled reasoning (prevents errors).
### 4. Qwen Adapter (`QwenAdapter`)
- **File:** `src/adapters/qwen-adapter.ts`
- **Behavior:**
- Enables the specific `enable_thinking` flag required by Alibaba Cloud / OpenRouter.
- Passes the budget through directly.
### 5. MiniMax Adapter (`MiniMaxAdapter`)
- **File:** `src/adapters/minimax-adapter.ts`
- **Behavior:**
- Sets `reasoning_split: true`.
- Does not support budget control, but correctly enables the interleaved reasoning feature.
### 6. DeepSeek Adapter (`DeepSeekAdapter`)
- **File:** `src/adapters/deepseek-adapter.ts`
- **Behavior:**
- **Defensive:** Detects DeepSeek models and *removes* the `thinking` object.
- **Reasoning:** Reasoning happens automatically (R1) or not at all; sending the parameter causes API rejection.
## Protocol Integration
The translation happens during the `prepareRequest` phase of the `BaseModelAdapter`.
1. **Intercept:** The adapter intercepts the `ClaudeRequest`.
2. **Translate:** It reads `thinking.budget_tokens`.
3. **Mutate:** It modifies the `OpenRouterPayload` to add provider-specific fields.
4. **Clean:** It deletes the original `thinking` object to prevent OpenRouter from receiving conflicting or unrecognized parameters.