claudish/ai_docs/THINKING_ALIGNMENT_SUMMARY.md

# Thinking Translation Model Alignment Summary

**Last Updated:** 2025-11-25
**Status:** Verification Complete ✅

## Overview

We have implemented a comprehensive **Thinking Translation Model** that aligns Claude Code's native `thinking.budget_tokens` parameter with the diverse reasoning configurations of 6 major AI providers. This ensures that when a user requests a specific thinking budget (e.g., "Think for 16k tokens"), it is correctly translated into the native control mechanism for the target model.

## Provider Alignment Matrix

| Provider | Model | Claude Parameter | Translated Parameter | Logic |
| :--- | :--- | :--- | :--- | :--- |
| **OpenAI** | o1, o3 | `budget_tokens` | `reasoning_effort` | < 4k: `minimal`<br>4k-16k: `low`<br>16k-32k: `medium`<br>> 32k: `high` |
| **Google** | Gemini 3.0 | `budget_tokens` | `thinking_level` | < 16k: `low`<br>>= 16k: `high` |
| **Google** | Gemini 2.5/2.0 | `budget_tokens` | `thinking_config.thinking_budget` | Passes exact budget (capped at 24,576) |
| **xAI** | Grok 3 Mini | `budget_tokens` | `reasoning_effort` | < 20k: `low`<br>>= 20k: `high` |
| **Qwen** | Qwen 2.5/3 | `budget_tokens` | `enable_thinking`, `thinking_budget` | `enable_thinking: true`<br>`thinking_budget`: exact value |
| **MiniMax** | M2 | `thinking` | `reasoning_split` | `reasoning_split: true` |
| **DeepSeek** | R1 | `thinking` | *(Stripped)* | Parameter removed to prevent API error (400) |

## Implementation Details

### 1. OpenAI Adapter (`OpenAIAdapter`)
- **File:** `src/adapters/openai-adapter.ts`
- **Behavior:** Maps continuous token budget into discrete effort levels.
- **New Feature:** Added support for `minimal` effort (typically < 4000 tokens) for faster, lighter reasoning tasks.

### 2. Gemini Adapter (`GeminiAdapter`)
- **File:** `src/adapters/gemini-adapter.ts`
- **Behavior:**
    - **Gemini 3 detection:** Checks `modelId` for "gemini-3". Uses `thinking_level`.
    - **Backward Compatibility:** Defaults to `thinking_config` for Gemini 2.0/2.5.
    - **Safety:** Caps budget at 24k tokens to maintain stability.

### 3. Grok Adapter (`GrokAdapter`)
- **File:** `src/adapters/grok-adapter.ts`
- **Behavior:**
    - **Validation:** Explicitly checks for "mini" models (Grok 3 Mini).
    - **Stripping:** Removes thinking parameters for standard Grok 3 models which do not support API-controlled reasoning (prevents errors).

### 4. Qwen Adapter (`QwenAdapter`)
- **File:** `src/adapters/qwen-adapter.ts`
- **Behavior:**
    - Enables the specific `enable_thinking` flag required by Alibaba Cloud / OpenRouter.
    - Passes the budget through directly.

### 5. MiniMax Adapter (`MiniMaxAdapter`)
- **File:** `src/adapters/minimax-adapter.ts`
- **Behavior:**
    - Sets `reasoning_split: true`.
    - Does not support budget control, but correctly enables the interleaved reasoning feature.

### 6. DeepSeek Adapter (`DeepSeekAdapter`)
- **File:** `src/adapters/deepseek-adapter.ts`
- **Behavior:**
    - **Defensive:** Detects DeepSeek models and *removes* the `thinking` object.
    - **Reasoning:** Reasoning happens automatically (R1) or not at all; sending the parameter causes API rejection.

## Protocol Integration

The translation happens during the `prepareRequest` phase of the `BaseModelAdapter`.
1.  **Intercept:** The adapter intercepts the `ClaudeRequest`.
2.  **Translate:** It reads `thinking.budget_tokens`.
3.  **Mutate:** It modifies the `OpenRouterPayload` to add provider-specific fields.
4.  **Clean:** It deletes the original `thinking` object to prevent OpenRouter from receiving conflicting or unrecognized parameters.
Initial commit: Claudish - OpenRouter proxy for Claude Code A proxy server that enables Claude Code to work with any OpenRouter model (Grok, GPT-5, Gemini, DeepSeek, etc.) with automatic message transformation. Features: - Model-specific adapters for Grok, Gemini, OpenAI, DeepSeek, Qwen, MiniMax - Interactive and single-shot CLI modes - MCP server support - Monitor mode for debugging - Comprehensive test suite 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-28 13:25:08 +03:00			`# Thinking Translation Model Alignment Summary`

			`Last Updated: 2025-11-25`
			`Status: Verification Complete ✅`

			`## Overview`

			We have implemented a comprehensive Thinking Translation Model that aligns Claude Code's native `thinking.budget_tokens` parameter with the diverse reasoning configurations of 6 major AI providers. This ensures that when a user requests a specific thinking budget (e.g., "Think for 16k tokens"), it is correctly translated into the native control mechanism for the target model.

			`## Provider Alignment Matrix`

			`\| Provider \| Model \| Claude Parameter \| Translated Parameter \| Logic \|`
			`\| :--- \| :--- \| :--- \| :--- \| :--- \|`
			\| OpenAI \| o1, o3 \| `budget_tokens` \| `reasoning_effort` \| < 4k: `minimal`<br>4k-16k: `low`<br>16k-32k: `medium`<br>> 32k: `high` \|
			\| Google \| Gemini 3.0 \| `budget_tokens` \| `thinking_level` \| < 16k: `low`<br>>= 16k: `high` \|
			\| Google \| Gemini 2.5/2.0 \| `budget_tokens` \| `thinking_config.thinking_budget` \| Passes exact budget (capped at 24,576) \|
			\| xAI \| Grok 3 Mini \| `budget_tokens` \| `reasoning_effort` \| < 20k: `low`<br>>= 20k: `high` \|
			\| Qwen \| Qwen 2.5/3 \| `budget_tokens` \| `enable_thinking`, `thinking_budget` \| `enable_thinking: true`<br>`thinking_budget`: exact value \|
			\| MiniMax \| M2 \| `thinking` \| `reasoning_split` \| `reasoning_split: true` \|
			\| DeepSeek \| R1 \| `thinking` \| (Stripped) \| Parameter removed to prevent API error (400) \|

			`## Implementation Details`

			### 1. OpenAI Adapter (`OpenAIAdapter`)
			- File: `src/adapters/openai-adapter.ts`
			`- Behavior: Maps continuous token budget into discrete effort levels.`
			- New Feature: Added support for `minimal` effort (typically < 4000 tokens) for faster, lighter reasoning tasks.

			### 2. Gemini Adapter (`GeminiAdapter`)
			- File: `src/adapters/gemini-adapter.ts`
			`- Behavior:`
			- Gemini 3 detection: Checks `modelId` for "gemini-3". Uses `thinking_level`.
			- Backward Compatibility: Defaults to `thinking_config` for Gemini 2.0/2.5.
			`- Safety: Caps budget at 24k tokens to maintain stability.`

			### 3. Grok Adapter (`GrokAdapter`)
			- File: `src/adapters/grok-adapter.ts`
			`- Behavior:`
			`- Validation: Explicitly checks for "mini" models (Grok 3 Mini).`
			`- Stripping: Removes thinking parameters for standard Grok 3 models which do not support API-controlled reasoning (prevents errors).`

			### 4. Qwen Adapter (`QwenAdapter`)
			- File: `src/adapters/qwen-adapter.ts`
			`- Behavior:`
			- Enables the specific `enable_thinking` flag required by Alibaba Cloud / OpenRouter.
			`- Passes the budget through directly.`

			### 5. MiniMax Adapter (`MiniMaxAdapter`)
			- File: `src/adapters/minimax-adapter.ts`
			`- Behavior:`
			- Sets `reasoning_split: true`.
			`- Does not support budget control, but correctly enables the interleaved reasoning feature.`

			### 6. DeepSeek Adapter (`DeepSeekAdapter`)
			- File: `src/adapters/deepseek-adapter.ts`
			`- Behavior:`
			- Defensive: Detects DeepSeek models and removes the `thinking` object.
			`- Reasoning: Reasoning happens automatically (R1) or not at all; sending the parameter causes API rejection.`

			`## Protocol Integration`

			The translation happens during the `prepareRequest` phase of the `BaseModelAdapter`.
			1. Intercept: The adapter intercepts the `ClaudeRequest`.
			2. Translate: It reads `thinking.budget_tokens`.
			3. Mutate: It modifies the `OpenRouterPayload` to add provider-specific fields.
			4. Clean: It deletes the original `thinking` object to prevent OpenRouter from receiving conflicting or unrecognized parameters.