claudish/ai_docs/PROTOCOL_SPECIFICATION.md

1179 lines
27 KiB
Markdown
Raw Normal View History

# Claude Code Protocol Specification
> **COMPREHENSIVE DOCUMENTATION** of Claude Code's communication protocol with Anthropic API
>
> Based on deep analysis of monitor mode logs and real-world traffic patterns.
---
## Table of Contents
1. [Protocol Overview](#protocol-overview)
2. [Request Structure](#request-structure)
3. [Multi-Call Pattern](#multi-call-pattern)
4. [Streaming Protocol](#streaming-protocol)
5. [Thinking Mode](#thinking-mode)
6. [Tool Call Protocol](#tool-call-protocol)
7. [Prompt Caching](#prompt-caching)
8. [Beta Features](#beta-features)
9. [Complete Examples](#complete-examples)
---
## Protocol Overview
### Core Characteristics
Claude Code communicates with Anthropic API using:
- **Transport:** HTTPS with Server-Sent Events (SSE) for streaming
- **Format:** JSON for requests, SSE for responses
- **Authentication:** API key via `x-api-key` header
- **Streaming:** Always enabled (`stream: true`)
- **Caching:** Extensive prompt caching with ephemeral cache controls
### Key Specifications
```
API Version: 2023-06-01
User Agent: claude-cli/2.0.36 (external, cli)
Timeout: 600 seconds (10 minutes) - Set by Claude Code SDK (not configurable)
Max Tokens: 32000 (configurable)
Beta Features: claude-code-20250219, interleaved-thinking-2025-05-14, fine-grained-tool-streaming-2025-05-14
```
---
## Request Structure
### HTTP Headers
Claude Code sends comprehensive metadata in every request:
```json
{
"accept": "application/json",
"accept-encoding": "gzip, deflate, br, zstd",
"anthropic-beta": "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14",
"anthropic-dangerous-direct-browser-access": "true",
"anthropic-version": "2023-06-01",
"content-type": "application/json",
"user-agent": "claude-cli/2.0.36 (external, cli)",
"x-api-key": "sk-ant-api03-...",
"x-app": "cli",
"x-stainless-arch": "arm64",
"x-stainless-helper-method": "stream",
"x-stainless-lang": "js",
"x-stainless-os": "MacOS",
"x-stainless-package-version": "0.68.0",
"x-stainless-retry-count": "0",
"x-stainless-runtime": "node",
"x-stainless-runtime-version": "v24.3.0",
"x-stainless-timeout": "600"
}
```
#### Critical Headers
| Header | Purpose | Example Value |
|--------|---------|---------------|
| `anthropic-beta` | Enable beta features | `claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14` |
| `anthropic-version` | API version | `2023-06-01` |
| `x-api-key` | Authentication | `sk-ant-api03-...` |
| `x-stainless-timeout` | Request timeout (seconds) | `600` (set by SDK) |
| `x-stainless-helper-method` | Streaming flag | `stream` |
**Note:** `x-stainless-*` headers are set by Claude Code's Anthropic TypeScript SDK, which is generated by Stainless. These are not configurable by the proxy.
### Request Body Structure
```json
{
"model": "claude-sonnet-4-5-20250929",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "<system-reminder>...CLAUDE.md content...</system-reminder>",
"cache_control": { "type": "ephemeral" }
},
{
"type": "text",
"text": "User's actual query",
"cache_control": { "type": "ephemeral" }
}
]
}
],
"system": [
{
"type": "text",
"text": "You are Claude Code, Anthropic's official CLI...",
"cache_control": { "type": "ephemeral" }
},
{
"type": "text",
"text": "Agent-specific instructions and environment info...",
"cache_control": { "type": "ephemeral" }
}
],
"tools": [...], // Array of 80+ tool definitions
"metadata": {
"user_id": "user_f925af13bf4d0fe65c090d75dbee55fca59693b4c4cbeb48994578dda58eb051_account__session_5faaad4e-780f-4f05-b320-49a85727901b"
},
"max_tokens": 32000,
"stream": true
}
```
### Message Content Types
#### 1. Text Block (Standard)
```json
{
"type": "text",
"text": "Content here"
}
```
#### 2. Text Block with Caching
```json
{
"type": "text",
"text": "Large content to cache",
"cache_control": {
"type": "ephemeral"
}
}
```
#### 3. Tool Result Block
```json
{
"type": "tool_result",
"tool_use_id": "toolu_01ABC123",
"content": "Tool execution result"
}
```
#### 4. System Reminder Block
```json
{
"type": "text",
"text": "<system-reminder>\n# Context\nProject-specific information...\n</system-reminder>",
"cache_control": { "type": "ephemeral" }
}
```
---
## Multi-Call Pattern
Claude Code makes **multiple sequential API calls** for each user request:
### Call Sequence
```
User Request
┌─────────────────────────────────────┐
│ Call 1: Warmup (Haiku 4.5) │
│ - Fast, cheap model │
│ - Context loading │
│ - No tools │
│ - Returns planning/warmup info │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Call 2: Main Execution (Sonnet 4.5)│
│ - Primary model │
│ - Full tool definitions (80+) │
│ - Can execute tools │
│ - Returns response or tool calls │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Call 3+: Tool Results (if needed) │
│ - Contains tool_result blocks │
│ - Continues conversation │
│ - May trigger more tool calls │
└─────────────────────────────────────┘
```
### Call 1: Warmup Phase
**Purpose:** Fast context loading and preparation
**Model:** `claude-haiku-4-5-20251001` (fast, cheap)
**Characteristics:**
- ✅ System prompts included
- ✅ Project context (CLAUDE.md)
- ✅ Agent instructions
- ❌ No tools
- ❌ No actual execution
**Request Size:** ~20-50 KB
**Example:**
```json
{
"model": "claude-haiku-4-5-20251001",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "<system-reminder>...project context...</system-reminder>",
"cache_control": { "type": "ephemeral" }
},
{
"type": "text",
"text": "Warmup",
"cache_control": { "type": "ephemeral" }
}
]
}
],
"system": [...],
"tools": [], // NO TOOLS
"max_tokens": 32000,
"stream": true
}
```
### Call 2: Main Execution
**Purpose:** Actual task execution with tools
**Model:** `claude-sonnet-4-5-20250929` (powerful)
**Characteristics:**
- ✅ System prompts (cached from Call 1)
- ✅ Project context (cached from Call 1)
- ✅ Agent instructions (cached from Call 1)
- ✅ Full tool definitions (80+ tools)
- ✅ Can execute tools
- ✅ User's actual query
**Request Size:** ~70-100 KB (due to tool definitions)
**Example:**
```json
{
"model": "claude-sonnet-4-5-20250929",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "<system-reminder>...same as Call 1...</system-reminder>",
"cache_control": { "type": "ephemeral" }
},
{
"type": "text",
"text": "What is 2+2?",
"cache_control": { "type": "ephemeral" }
}
]
}
],
"system": [...], // Same as Call 1
"tools": [
{
"name": "Task",
"description": "Launch specialized agents...",
"input_schema": {...}
},
{
"name": "Bash",
"description": "Execute shell commands...",
"input_schema": {...}
},
// ... 80+ more tools
],
"max_tokens": 32000,
"stream": true
}
```
### Call 3+: Tool Execution Loop
**Purpose:** Continue conversation with tool results
**Model:** Same as Call 2 (Sonnet 4.5)
**Pattern:**
```
1. Model responds with tool_use blocks
2. Claude Code executes tools
3. Claude Code sends tool_result blocks
4. Model processes results
5. Repeat if more tools needed
```
**Example Request with Tool Results:**
```json
{
"model": "claude-sonnet-4-5-20250929",
"messages": [
{
"role": "user",
"content": [...] // Original query
},
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "toolu_01ABC123",
"name": "Read",
"input": {
"file_path": "/path/to/file.ts"
}
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01ABC123",
"content": "// File contents here..."
}
]
}
],
"system": [...],
"tools": [...],
"max_tokens": 32000,
"stream": true
}
```
---
## Streaming Protocol
### Overview
Claude Code **ALWAYS** uses streaming (`stream: true`). Responses are Server-Sent Events (SSE).
### SSE Format
```
event: message_start
data: {"type":"message_start","message":{...}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{...}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{...}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{...},"usage":{...}}
event: message_stop
data: {"type":"message_stop"}
```
### Event Types
#### 1. `message_start`
**When:** First event in stream
**Purpose:** Initialize message metadata
**Example:**
```json
{
"type": "message_start",
"message": {
"id": "msg_01ABC123",
"type": "message",
"role": "assistant",
"content": [],
"model": "claude-sonnet-4-5-20250929",
"stop_reason": null,
"stop_sequence": null,
"usage": {
"input_tokens": 1234,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 5000,
"output_tokens": 0
}
}
}
```
**Key Fields:**
- `id` - Unique message ID (format: `msg_XXXXX`)
- `usage.cache_read_input_tokens` - Tokens read from cache
- `usage.cache_creation_input_tokens` - Tokens written to cache
#### 2. `content_block_start`
**When:** Starting a new content block (text or tool_use)
**Purpose:** Declare block type and metadata
**Example (Text Block):**
```json
{
"type": "content_block_start",
"index": 0,
"content_block": {
"type": "text",
"text": ""
}
}
```
**Example (Tool Use Block):**
```json
{
"type": "content_block_start",
"index": 1,
"content_block": {
"type": "tool_use",
"id": "toolu_01ABC123",
"name": "Read",
"input": {}
}
}
```
#### 3. `content_block_delta`
**When:** Streaming content within a block
**Purpose:** Incrementally send text or tool input
**Example (Text Delta):**
```json
{
"type": "content_block_delta",
"index": 0,
"delta": {
"type": "text_delta",
"text": "The answer is "
}
}
```
**Example (Tool Input Delta):**
```json
{
"type": "content_block_delta",
"index": 1,
"delta": {
"type": "input_json_delta",
"partial_json": "{\"file_path\":\"/path/"
}
}
```
**Note:** Tool inputs are streamed as partial JSON strings that must be concatenated.
#### 4. `ping`
**When:** Periodically during long streams
**Purpose:** Keep connection alive
**Example:**
```json
{
"type": "ping"
}
```
#### 5. `content_block_stop`
**When:** Finishing a content block
**Purpose:** Signal block completion
**Example:**
```json
{
"type": "content_block_stop",
"index": 0
}
```
#### 6. `message_delta`
**When:** Message metadata updates (usually at end)
**Purpose:** Provide stop_reason and final usage
**Example:**
```json
{
"type": "message_delta",
"delta": {
"stop_reason": "end_turn",
"stop_sequence": null
},
"usage": {
"output_tokens": 145
}
}
```
**Stop Reasons:**
- `end_turn` - Normal completion
- `max_tokens` - Hit token limit
- `tool_use` - Waiting for tool execution
- `stop_sequence` - Hit stop sequence
#### 7. `message_stop`
**When:** Final event in stream
**Purpose:** Signal stream completion
**Example:**
```json
{
"type": "message_stop"
}
```
### Complete Streaming Sequence
#### Example 1: Simple Text Response
```
event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","role":"assistant",...}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: ping
data: {"type":"ping"}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" + "}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" equals "}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"4"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}
event: message_stop
data: {"type":"message_stop"}
```
#### Example 2: Tool Use Response
```
event: message_start
data: {"type":"message_start",...}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the file."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\"/path/to/"}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"test.ts\"}"}}
event: content_block_stop
data: {"type":"content_block_stop","index":1}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":87}}
event: message_stop
data: {"type":"message_stop"}
```
---
## Thinking Mode
### Overview
**Feature:** `interleaved-thinking-2025-05-14`
Thinking mode allows the model to include internal reasoning blocks in responses.
### Thinking Block Structure
**NOT YET OBSERVED IN LOGS** - Placeholder for when we capture it.
Expected format based on beta feature:
```json
{
"type": "thinking",
"thinking": "Internal reasoning here..."
}
```
Expected in streaming:
```
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"Let me think..."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"Based on my analysis..."}}
...
```
### Interleaved Pattern
Thinking blocks appear **before** text/tool blocks:
```
[thinking] → [text]
[thinking] → [tool_use]
[thinking] → [thinking] → [text]
```
**To capture:** Need to run monitor mode with tasks that trigger extended reasoning.
---
## Tool Call Protocol
### Tool Definition Format
Each tool has complete JSON Schema:
```json
{
"name": "Read",
"description": "Reads a file from the local filesystem...",
"input_schema": {
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "The absolute path to the file to read"
},
"limit": {
"type": "number",
"description": "The number of lines to read..."
},
"offset": {
"type": "number",
"description": "The line number to start reading from..."
}
},
"required": ["file_path"],
"additionalProperties": false,
"$schema": "http://json-schema.org/draft-07/schema#"
}
}
```
### Available Tools
Claude Code provides **16 core tools**:
1. **Task** - Launch specialized agents
2. **Bash** - Execute shell commands
3. **Glob** - File pattern matching
4. **Grep** - Content search
5. **ExitPlanMode** - Exit planning mode
6. **Read** - Read files
7. **Edit** - Edit files
8. **Write** - Write files
9. **NotebookEdit** - Edit Jupyter notebooks
10. **WebFetch** - Fetch web content
11. **TodoWrite** - Manage task list
12. **WebSearch** - Search the web
13. **BashOutput** - Get background shell output
14. **KillShell** - Kill background shell
15. **Skill** - Execute skills
16. **SlashCommand** - Execute slash commands
### Tool Use Request (from Model)
```json
{
"type": "tool_use",
"id": "toolu_01ABC123XYZ",
"name": "Read",
"input": {
"file_path": "/path/to/test.ts"
}
}
```
**Key Fields:**
- `id` - Unique tool call ID (format: `toolu_XXXXX`)
- `name` - Tool name (must match definition)
- `input` - Tool parameters (validated against schema)
### Tool Result Response (from Claude Code)
```json
{
"type": "tool_result",
"tool_use_id": "toolu_01ABC123XYZ",
"content": "const x = 42;\nfunction test() {\n return x;\n}"
}
```
**Key Fields:**
- `tool_use_id` - References original tool_use.id
- `content` - Tool execution result (string or JSON)
### Tool Error Response
```json
{
"type": "tool_result",
"tool_use_id": "toolu_01ABC123XYZ",
"content": "Error: File not found",
"is_error": true
}
```
### Fine-Grained Tool Streaming
**Feature:** `fine-grained-tool-streaming-2025-05-14`
Tool inputs are streamed incrementally as partial JSON:
```
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\""}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"/path/to/test.ts\"}"}}
event: content_block_stop
data: {"type":"content_block_stop","index":1}
```
**Reconstructing Input:**
```javascript
let input = "";
// For each delta event:
input += delta.partial_json;
// Final: input = "{\"file_path\":\"/path/to/test.ts\"}"
const params = JSON.parse(input);
```
---
## Prompt Caching
### Overview
Claude Code uses **extensive prompt caching** to reduce costs and latency.
### Cache Control Format
```json
{
"type": "text",
"text": "Large content to cache",
"cache_control": {
"type": "ephemeral"
}
}
```
### What Gets Cached
1. **System Prompts** - Agent instructions
2. **Project Context** - CLAUDE.md contents
3. **Tool Definitions** - All 80+ tools
4. **User Messages** - Some user inputs
### Cache Lifecycle
- **Type:** Ephemeral (5 minutes TTL)
- **Scope:** Per user, per conversation
- **Hit Rate:** Very high on subsequent calls
### Cache Usage Metrics
From `message_start` event:
```json
{
"usage": {
"input_tokens": 1234,
"cache_creation_input_tokens": 8500, // Tokens written to cache (Call 1)
"cache_read_input_tokens": 8500, // Tokens read from cache (Call 2+)
"output_tokens": 0
}
}
```
**Cost Impact:**
- Writing to cache: 1.25x input cost
- Reading from cache: 0.1x input cost (90% savings!)
### Caching Strategy
```
Call 1 (Warmup):
- Creates cache with system prompts + context
- cache_creation_input_tokens: ~8500
Call 2 (Main):
- Reads from cache
- cache_read_input_tokens: ~8500
- Adds tool definitions (not cached initially)
Call 3+ (Tool Results):
- Reads from cache
- cache_read_input_tokens: ~8500
- Only tool results are new tokens
```
**Total Token Savings:**
```
Without caching: 8500 tokens * 3 calls = 25,500 tokens input
With caching: 8500 + (8500 * 0.1 * 2) = 10,200 effective tokens
Savings: 60% reduction in input costs
```
---
## Beta Features
### Required Beta Header
```
anthropic-beta: claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14
```
### Feature Breakdown
#### 1. `claude-code-20250219`
**Purpose:** Claude Code-specific features
**Enables:**
- Enhanced tool calling
- CLI-specific optimizations
- Agent framework support
#### 2. `interleaved-thinking-2025-05-14`
**Purpose:** Thinking mode (extended reasoning)
**Enables:**
- Thinking blocks in responses
- Internal reasoning visible to user
- Better complex problem solving
**Block Types:**
- `thinking` - Internal reasoning
- `text` - Final answer
**Pattern:**
```
<thinking>Analyzing the problem...</thinking>
<text>Here's my solution...</text>
```
#### 3. `fine-grained-tool-streaming-2025-05-14`
**Purpose:** Stream tool inputs incrementally
**Enables:**
- `input_json_delta` events
- Progressive tool parameter revelation
- Better UX for slow tool calls
**Without:** Tool inputs appear only when complete
**With:** Tool inputs stream character by character
---
## Complete Examples
### Example 1: Simple Query (No Tools)
**User Request:** "What is 2+2?"
**Call 1: Warmup**
```json
POST /v1/messages
{
"model": "claude-haiku-4-5-20251001",
"messages": [{
"role": "user",
"content": [{
"type": "text",
"text": "<system-reminder>...CLAUDE.md...</system-reminder>",
"cache_control": {"type": "ephemeral"}
}, {
"type": "text",
"text": "Warmup",
"cache_control": {"type": "ephemeral"}
}]
}],
"system": [...],
"max_tokens": 32000,
"stream": true
}
```
**Response:** Authentication error (in our logs - API key was placeholder)
**Call 2: Main Execution**
```json
POST /v1/messages
{
"model": "claude-sonnet-4-5-20250929",
"messages": [{
"role": "user",
"content": [{
"type": "text",
"text": "<system-reminder>...CLAUDE.md...</system-reminder>",
"cache_control": {"type": "ephemeral"}
}, {
"type": "text",
"text": "What is 2+2?",
"cache_control": {"type": "ephemeral"}
}]
}],
"system": [...],
"tools": [...], // 80+ tools
"max_tokens": 32000,
"stream": true
}
```
**Expected Response Stream:**
```
event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC",...}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2 + 2 equals 4."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":8}}
event: message_stop
data: {"type":"message_stop"}
```
### Example 2: Tool Use (Read File)
**User Request:** "Read package.json and tell me the version"
**Call 2: Main Execution** (after warmup)
```json
POST /v1/messages
{
"model": "claude-sonnet-4-5-20250929",
"messages": [{
"role": "user",
"content": [{
"type": "text",
"text": "Read package.json and tell me the version"
}]
}],
"tools": [...],
"max_tokens": 32000,
"stream": true
}
```
**Response Stream (Tool Call):**
```
event: message_start
data: {"type":"message_start",...}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the package.json file."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC123","name":"Read","input":{}}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file_path\":\"/path/to/project/package.json\"}"}}
event: content_block_stop
data: {"type":"content_block_stop","index":1}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":45}}
event: message_stop
data: {"type":"message_stop"}
```
**Call 3: Tool Result**
```json
POST /v1/messages
{
"model": "claude-sonnet-4-5-20250929",
"messages": [{
"role": "user",
"content": [{"type": "text", "text": "Read package.json and tell me the version"}]
}, {
"role": "assistant",
"content": [{
"type": "tool_use",
"id": "toolu_01ABC123",
"name": "Read",
"input": {"file_path": "/path/to/project/package.json"}
}]
}, {
"role": "user",
"content": [{
"type": "tool_result",
"tool_use_id": "toolu_01ABC123",
"content": "{\"name\":\"claudish\",\"version\":\"1.0.8\",...}"
}]
}],
"tools": [...],
"max_tokens": 32000,
"stream": true
}
```
**Response Stream (Final Answer):**
```
event: message_start
data: {"type":"message_start",...}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"The version is 1.0.8."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":10}}
event: message_stop
data: {"type":"message_stop"}
```
---
## Summary
### Key Takeaways
1. **Always Streaming** - No non-streaming mode exists
2. **Multi-Call Pattern** - Warmup → Main → Tool Loop
3. **Extensive Caching** - 60%+ cost savings
4. **Beta Features Required** - claude-code-20250219, thinking, tool streaming
5. **Fine-Grained Streaming** - Even tool inputs stream incrementally
6. **16 Core Tools** - Task, Bash, Read, Edit, Write, etc.
7. **Thinking Mode** - Supported but not yet observed in simple queries
8. **Robust Error Handling** - Authentication errors gracefully handled
### For Proxy Implementers
**Must Support:**
- ✅ Server-Sent Events (SSE) streaming
- ✅ All beta features in header
- ✅ Prompt caching (ephemeral)
- ✅ Multi-turn conversations
- ✅ Tool calling protocol
- ✅ Fine-grained tool streaming
- ✅ 600s timeout minimum
- ✅ 32000 max_tokens default
**Nice to Have:**
- ⭐ Thinking mode block recognition
- ⭐ Cache analytics
- ⭐ Request/response logging
- ⭐ Token usage tracking
---
**Last Updated:** 2025-11-10
**Based On:** Monitor mode logs from Claudish v1.0.8
**Status:** ⚠️ **INCOMPLETE** - Need streaming response capture with real API key
**TODO:**
- [ ] Capture actual streaming responses
- [ ] Document thinking mode blocks in detail
- [ ] Test multi-tool sequences
- [ ] Document error response formats
- [ ] Add timing/latency metrics