Thinking Translation Model Alignment Summary

Last Updated: 2025-11-25
Status: Verification Complete ✅

Overview

We have implemented a Thinking Translation Model that maps Claude Code's native thinking.budget_tokens parameter onto the reasoning controls of six AI providers. When a user requests a specific thinking budget (e.g., "Think for 16k tokens"), the adapter for the target model translates that budget into the model's native control mechanism.

Provider Alignment Matrix

Provider	Model	Claude Parameter	Translated Parameter	Logic
OpenAI	o1, o3	`budget_tokens`	`reasoning_effort`	< 4k: `minimal`; 4k-16k: `low`; 16k-32k: `medium`; > 32k: `high`
Google	Gemini 3.0	`budget_tokens`	`thinking_level`	< 16k: `low`; >= 16k: `high`
Google	Gemini 2.5/2.0	`budget_tokens`	`thinking_config.thinking_budget`	Passes exact budget (capped at 24,576)
xAI	Grok 3 Mini	`budget_tokens`	`reasoning_effort`	< 20k: `low`; >= 20k: `high`
Qwen	Qwen 2.5/3	`budget_tokens`	`enable_thinking`, `thinking_budget`	`enable_thinking: true`; `thinking_budget`: exact value
MiniMax	M2	`thinking`	`reasoning_split`	`reasoning_split: true`
DeepSeek	R1	`thinking`	(Stripped)	Parameter removed to prevent API error (400)

Implementation Details

1. OpenAI Adapter (`OpenAIAdapter`)

File: src/adapters/openai-adapter.ts
Behavior: Maps continuous token budget into discrete effort levels.
New Feature: Supports minimal effort for small budgets (below 4,000 tokens), intended for faster, lighter reasoning tasks.
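The bucketing described above can be sketched as a small pure function. This is an illustration rather than the adapter's actual code: the function name is hypothetical, "k" is assumed to mean 1,000 tokens, and because the matrix does not say which bucket the shared boundaries at 16k and 32k belong to, the sketch assigns both to `medium`.

```typescript
// Hypothetical helper, not taken from src/adapters/openai-adapter.ts.
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

function budgetToReasoningEffort(budgetTokens: number): ReasoningEffort {
  // Assumption: "4k" means 4000 tokens, matching the "< 4000" note above.
  if (budgetTokens < 4000) return "minimal";
  if (budgetTokens < 16000) return "low";
  // Assumption: exactly 16000 and exactly 32000 both fall into "medium".
  if (budgetTokens <= 32000) return "medium";
  return "high";
}
```

Keeping the mapping free of side effects makes the thresholds easy to unit test and to adjust if the provider changes its effort levels.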

2. Gemini Adapter (`GeminiAdapter`)

File: src/adapters/gemini-adapter.ts
Behavior:
- Gemini 3 detection: Checks modelId for "gemini-3". Uses thinking_level.
- Backward Compatibility: Defaults to thinking_config for Gemini 2.0/2.5.
- Safety: Caps the Gemini 2.0/2.5 budget at 24,576 tokens to maintain stability.
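A minimal sketch of the branching described in this list, assuming the detection is a plain substring check on the model ID. The function name, the return type, and the reading of "16k" as 16,000 tokens are illustrative assumptions; the field names and the 24,576 cap come from the alignment matrix.

```typescript
// Hypothetical helper, not taken from src/adapters/gemini-adapter.ts.
const GEMINI_MAX_THINKING_BUDGET = 24576; // cap listed in the alignment matrix

type GeminiThinkingFields =
  | { thinking_level: "low" | "high" }
  | { thinking_config: { thinking_budget: number } };

function translateGeminiThinking(
  modelId: string,
  budgetTokens: number
): GeminiThinkingFields {
  // Gemini 3 models take a discrete level instead of a token budget.
  if (modelId.includes("gemini-3")) {
    // Assumption: the 16k threshold is 16000 tokens.
    return { thinking_level: budgetTokens < 16000 ? "low" : "high" };
  }
  // Gemini 2.0/2.5 accept the budget itself, clamped to the cap.
  return {
    thinking_config: {
      thinking_budget: Math.min(budgetTokens, GEMINI_MAX_THINKING_BUDGET),
    },
  };
}
```

With these assumptions, a 50,000-token request against a Gemini 2.5 model would be sent as `thinking_budget: 24576`, while the same request against a Gemini 3 model would become `thinking_level: "high"`.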

3. Grok Adapter (`GrokAdapter`)

File: src/adapters/grok-adapter.ts
Behavior:
- Validation: Explicitly checks for "mini" models (Grok 3 Mini).
- Stripping: Removes thinking parameters for standard Grok 3 models which do not support API-controlled reasoning (prevents errors).
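The validate-or-strip behavior might look like the sketch below. The helper name is hypothetical, and a case-insensitive substring match on "mini" is an assumption about how the check is performed; the 20k threshold (read here as 20,000 tokens) follows the alignment matrix.

```typescript
// Hypothetical helper, not taken from src/adapters/grok-adapter.ts.
function translateGrokThinking(
  modelId: string,
  budgetTokens: number
): { reasoning_effort?: "low" | "high" } {
  // Standard Grok 3 models get no reasoning fields at all.
  if (!modelId.toLowerCase().includes("mini")) {
    return {};
  }
  // Assumption: the 20k threshold is 20000 tokens.
  return { reasoning_effort: budgetTokens < 20000 ? "low" : "high" };
}
```

Returning an empty object for unsupported models lets the caller merge the result into the payload unconditionally.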

4. Qwen Adapter (`QwenAdapter`)

File: src/adapters/qwen-adapter.ts
Behavior:
- Sets the enable_thinking flag required by Alibaba Cloud / OpenRouter.
- Passes the budget through unchanged as thinking_budget.
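As a sketch, the Qwen translation reduces to emitting the two fields from the alignment matrix. The function name is hypothetical, and the sketch assumes no clamping or validation is applied to the budget.

```typescript
// Hypothetical helper, not taken from src/adapters/qwen-adapter.ts.
function translateQwenThinking(budgetTokens: number): {
  enable_thinking: true;
  thinking_budget: number;
} {
  // Assumption: the budget is forwarded as-is, with no cap.
  return { enable_thinking: true, thinking_budget: budgetTokens };
}
```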

5. MiniMax Adapter (`MiniMaxAdapter`)

File: src/adapters/minimax-adapter.ts
Behavior:
- Sets reasoning_split: true.
- MiniMax does not support budget control, so the requested budget is ignored; the adapter only enables the interleaved reasoning feature.

6. DeepSeek Adapter (`DeepSeekAdapter`)

File: src/adapters/deepseek-adapter.ts
Behavior:
- Defensive: Detects DeepSeek models and removes the thinking object.
- Rationale: DeepSeek models either reason automatically (R1) or not at all; sending the parameter causes the API to reject the request with a 400 error.
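The defensive stripping can be sketched as follows. The payload shape, the helper name, and the substring-based detection are assumptions made for illustration; the real adapter may detect DeepSeek models differently.

```typescript
// Hypothetical payload shape and helper, not taken from
// src/adapters/deepseek-adapter.ts.
interface PayloadWithThinking {
  model: string;
  thinking?: { budget_tokens: number };
  [key: string]: unknown;
}

function stripThinkingForDeepSeek(
  payload: PayloadWithThinking
): PayloadWithThinking {
  // Assumption: DeepSeek models are recognized by a substring of the model ID.
  if (!payload.model.toLowerCase().includes("deepseek")) {
    return payload;
  }
  // Copy first so the caller's object is not mutated, then drop the field.
  const cleaned: PayloadWithThinking = { ...payload };
  delete cleaned.thinking;
  return cleaned;
}
```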

Protocol Integration

The translation happens during the prepareRequest phase of the BaseModelAdapter.

1. Intercept: The adapter intercepts the ClaudeRequest.
2. Translate: It reads thinking.budget_tokens.
3. Mutate: It modifies the OpenRouterPayload to add provider-specific fields.
4. Clean: It deletes the original thinking object to prevent OpenRouter from receiving conflicting or unrecognized parameters.
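The four phases can be sketched end to end as below. The names BaseModelAdapter, prepareRequest, ClaudeRequest, and OpenRouterPayload come from this document, but their shapes here are assumptions, and the translateThinking hook and ExampleQwenAdapter class are hypothetical names introduced only for this illustration.

```typescript
// Assumed shapes; the real types in the codebase may differ.
interface ClaudeRequest {
  model: string;
  thinking?: { type: "enabled"; budget_tokens: number };
  [key: string]: unknown;
}

type OpenRouterPayload = Record<string, unknown>;

abstract class BaseModelAdapter {
  // Hypothetical hook: each provider adapter returns its native fields.
  protected abstract translateThinking(
    budgetTokens: number,
    modelId: string
  ): Record<string, unknown>;

  prepareRequest(request: ClaudeRequest): OpenRouterPayload {
    // Intercept: start from a shallow copy of the incoming request.
    const payload: OpenRouterPayload = { ...request };

    // Translate: read the requested budget, if any.
    const budget = request.thinking?.budget_tokens;

    // Mutate: merge in the provider-specific fields.
    if (budget !== undefined) {
      Object.assign(payload, this.translateThinking(budget, request.model));
    }

    // Clean: drop the Claude-style object so it is never forwarded.
    delete payload["thinking"];
    return payload;
  }
}

// Illustrative subclass using the Qwen row of the alignment matrix.
class ExampleQwenAdapter extends BaseModelAdapter {
  protected translateThinking(budgetTokens: number): Record<string, unknown> {
    return { enable_thinking: true, thinking_budget: budgetTokens };
  }
}
```

In this arrangement the shared base class owns the copy-and-clean steps, so an adapter that supports no thinking controls can return an empty object and still have the original thinking field removed.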
