Claude Code Protocol Specification

COMPREHENSIVE DOCUMENTATION of Claude Code's communication protocol with the Anthropic API

Based on deep analysis of monitor mode logs and real-world traffic patterns.

Protocol Overview
Request Structure
Multi-Call Pattern
Streaming Protocol
Thinking Mode
Tool Call Protocol
Prompt Caching
Beta Features
Complete Examples

Protocol Overview

Core Characteristics

Claude Code communicates with the Anthropic API using:

Transport: HTTPS with Server-Sent Events (SSE) for streaming
Format: JSON for requests, SSE for responses
Authentication: API key via x-api-key header
Streaming: Always enabled (stream: true)
Caching: Extensive prompt caching with ephemeral cache controls

Key Specifications

API Version: 2023-06-01
User Agent: claude-cli/2.0.36 (external, cli)
Timeout: 600 seconds (10 minutes) - Set by Claude Code SDK (not configurable)
Max Tokens: 32000 (configurable)
Beta Features: claude-code-20250219, interleaved-thinking-2025-05-14, fine-grained-tool-streaming-2025-05-14

Request Structure

HTTP Headers

Claude Code sends comprehensive metadata in every request:

{
  "accept": "application/json",
  "accept-encoding": "gzip, deflate, br, zstd",
  "anthropic-beta": "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14",
  "anthropic-dangerous-direct-browser-access": "true",
  "anthropic-version": "2023-06-01",
  "content-type": "application/json",
  "user-agent": "claude-cli/2.0.36 (external, cli)",
  "x-api-key": "sk-ant-api03-...",
  "x-app": "cli",
  "x-stainless-arch": "arm64",
  "x-stainless-helper-method": "stream",
  "x-stainless-lang": "js",
  "x-stainless-os": "MacOS",
  "x-stainless-package-version": "0.68.0",
  "x-stainless-retry-count": "0",
  "x-stainless-runtime": "node",
  "x-stainless-runtime-version": "v24.3.0",
  "x-stainless-timeout": "600"
}

Critical Headers

| Header | Purpose | Example Value |
| --- | --- | --- |
| `anthropic-beta` | Enable beta features | `claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14` |
| `anthropic-version` | API version | `2023-06-01` |
| `x-api-key` | Authentication | `sk-ant-api03-...` |
| `x-stainless-timeout` | Request timeout (seconds) | `600` (set by SDK) |
| `x-stainless-helper-method` | Streaming flag | `stream` |

Note: x-stainless-* headers are set by Claude Code's Anthropic TypeScript SDK, which is generated by Stainless. These are not configurable by the proxy.
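
A proxy or test harness that wants to reproduce the protocol-level headers could assemble them roughly as follows. This is a minimal sketch: `buildHeaders`, `BETA_FEATURES`, and `exampleHeaders` are hypothetical names, the values are copied from the capture above, and the `x-stainless-*` headers are left out because the SDK sets them.

```typescript
// Beta feature flags as seen in the captured anthropic-beta header.
const BETA_FEATURES: string[] = [
  "claude-code-20250219",
  "interleaved-thinking-2025-05-14",
  "fine-grained-tool-streaming-2025-05-14",
];

// Hypothetical helper: builds the protocol-level headers from the table above.
// The x-stainless-* headers are omitted because the SDK adds them itself.
function buildHeaders(apiKey: string): Record<string, string> {
  return {
    "accept": "application/json",
    "anthropic-beta": BETA_FEATURES.join(","),
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
    "x-api-key": apiKey,
  };
}

const exampleHeaders = buildHeaders("sk-ant-api03-...");
console.log(exampleHeaders["anthropic-beta"]);
```

Keeping the beta flags in a list makes it easy to see which features a given client build requests.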

Request Body Structure

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<system-reminder>...CLAUDE.md content...</system-reminder>",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "User's actual query",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ],
  "system": [
    {
      "type": "text",
      "text": "You are Claude Code, Anthropic's official CLI...",
      "cache_control": { "type": "ephemeral" }
    },
    {
      "type": "text",
      "text": "Agent-specific instructions and environment info...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "tools": [...],  // Array of 80+ tool definitions
  "metadata": {
    "user_id": "user_f925af13bf4d0fe65c090d75dbee55fca59693b4c4cbeb48994578dda58eb051_account__session_5faaad4e-780f-4f05-b320-49a85727901b"
  },
  "max_tokens": 32000,
  "stream": true
}

Message Content Types

1. Text Block (Standard)

{
  "type": "text",
  "text": "Content here"
}

2. Text Block with Caching

{
  "type": "text",
  "text": "Large content to cache",
  "cache_control": {
    "type": "ephemeral"
  }
}

3. Tool Result Block

{
  "type": "tool_result",
  "tool_use_id": "toolu_01ABC123",
  "content": "Tool execution result"
}

4. System Reminder Block

{
  "type": "text",
  "text": "<system-reminder>\n# Context\nProject-specific information...\n</system-reminder>",
  "cache_control": { "type": "ephemeral" }
}
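
The four shapes above fit naturally into a discriminated union keyed on `type`. The sketch below is one way to model them; the type names and the `cached` and `systemReminder` helpers are hypothetical, and only the field names come from the JSON shown above (`is_error` appears in the tool error example later in this document).

```typescript
// Hypothetical type names; field names are taken from the JSON in this document.
interface CacheControl { type: "ephemeral"; }

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

type UserContentBlock = TextBlock | ToolResultBlock;

// Marks a text block as cacheable, as Claude Code does for large context.
function cached(text: string): TextBlock {
  return { type: "text", text: text, cache_control: { type: "ephemeral" } };
}

// A system reminder is an ordinary cached text block with a wrapper tag.
function systemReminder(body: string): TextBlock {
  return cached("<system-reminder>\n" + body + "\n</system-reminder>");
}

const userContent: UserContentBlock[] = [
  systemReminder("# Context\nProject-specific information..."),
  cached("User's actual query"),
];
console.log(JSON.stringify(userContent, null, 2));
```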

Multi-Call Pattern

Claude Code makes multiple sequential API calls for each user request:

Call Sequence

User Request
    ↓
┌─────────────────────────────────────┐
│ Call 1: Warmup (Haiku 4.5)         │
│ - Fast, cheap model                 │
│ - Context loading                   │
│ - No tools                          │
│ - Returns planning/warmup info      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Call 2: Main Execution (Sonnet 4.5)│
│ - Primary model                     │
│ - Full tool definitions (80+)      │
│ - Can execute tools                 │
│ - Returns response or tool calls    │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Call 3+: Tool Results (if needed)  │
│ - Contains tool_result blocks       │
│ - Continues conversation            │
│ - May trigger more tool calls       │
└─────────────────────────────────────┘

Call 1: Warmup Phase

Purpose: Fast context loading and preparation

Model: claude-haiku-4-5-20251001 (fast, cheap)

Characteristics:

✅ System prompts included
✅ Project context (CLAUDE.md)
✅ Agent instructions
❌ No tools
❌ No actual execution

Request Size: ~20-50 KB

Example:

{
  "model": "claude-haiku-4-5-20251001",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<system-reminder>...project context...</system-reminder>",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "Warmup",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ],
  "system": [...],
  "tools": [],  // NO TOOLS
  "max_tokens": 32000,
  "stream": true
}

Call 2: Main Execution

Purpose: Actual task execution with tools

Model: claude-sonnet-4-5-20250929 (powerful)

Characteristics:

✅ System prompts (cached from Call 1)
✅ Project context (cached from Call 1)
✅ Agent instructions (cached from Call 1)
✅ Full tool definitions (80+ tools)
✅ Can execute tools
✅ User's actual query

Request Size: ~70-100 KB (due to tool definitions)

Example:

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<system-reminder>...same as Call 1...</system-reminder>",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "What is 2+2?",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ],
  "system": [...],  // Same as Call 1
  "tools": [
    {
      "name": "Task",
      "description": "Launch specialized agents...",
      "input_schema": {...}
    },
    {
      "name": "Bash",
      "description": "Execute shell commands...",
      "input_schema": {...}
    },
    // ... remaining tool definitions (80+ in total)
  ],
  "max_tokens": 32000,
  "stream": true
}

Call 3+: Tool Execution Loop

Purpose: Continue conversation with tool results

Model: Same as Call 2 (Sonnet 4.5)

Pattern:

1. Model responds with tool_use blocks
2. Claude Code executes tools
3. Claude Code sends tool_result blocks
4. Model processes results
5. Repeat if more tools needed
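
The five steps above can be sketched as a loop. This is not Claude Code's implementation: `runToolLoop`, `callModel`, `executeTool`, and `maxRounds` are hypothetical names, and the model call is injected as a plain function so the sketch needs no network code.

```typescript
// Hypothetical minimal shapes; field names follow the examples in this document.
interface ToolUse { type: "tool_use"; id: string; name: string; input: object; }
interface TextOut { type: "text"; text: string; }
interface ToolResult { type: "tool_result"; tool_use_id: string; content: string; }
type Block = ToolUse | TextOut | ToolResult;
interface Message { role: "user" | "assistant"; content: Block[]; }
interface ModelReply { stop_reason: string; content: Block[]; }

// Runs the loop until the model stops asking for tools or maxRounds is reached.
function runToolLoop(
  messages: Message[],
  callModel: (history: Message[]) => ModelReply,
  executeTool: (call: ToolUse) => string,
  maxRounds: number
): Message[] {
  const history = messages.slice();
  for (let round = 0; round < maxRounds; round++) {
    const reply = callModel(history); // steps 1 and 4: model responds
    history.push({ role: "assistant", content: reply.content });
    if (reply.stop_reason !== "tool_use") break; // no more tools needed
    const results: Block[] = [];
    for (const block of reply.content) {
      if (block.type === "tool_use") {
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: executeTool(block), // step 2: run the tool locally
        });
      }
    }
    history.push({ role: "user", content: results }); // step 3: send results
  }
  return history;
}
```

In a real client `callModel` would wrap a streaming request; the `maxRounds` cap is an added safety assumption, not something observed in the logs.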

Example Request with Tool Results:

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [...]  // Original query
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_use",
          "id": "toolu_01ABC123",
          "name": "Read",
          "input": {
            "file_path": "/path/to/file.ts"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "tool_result",
          "tool_use_id": "toolu_01ABC123",
          "content": "// File contents here..."
        }
      ]
    }
  ],
  "system": [...],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}
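
For logging or routing, a proxy may want to tell the three call shapes apart. The heuristic below is an assumption drawn only from the examples in this section (a "Warmup" text block with no tools, or a trailing `tool_result`); `classifyCall` and its types are hypothetical names.

```typescript
// Hypothetical, loosely typed view of a request body.
interface AnyBlock { type: string; text?: string; [key: string]: unknown; }
interface AnyMessage { role: string; content: AnyBlock[]; }
interface RequestBody { model: string; messages: AnyMessage[]; tools?: object[]; }

type CallKind = "warmup" | "main" | "tool_loop";

// Heuristic only: based on the three call shapes shown in this section.
function classifyCall(body: RequestBody): CallKind {
  const last = body.messages[body.messages.length - 1];
  const blocks = last ? last.content : [];
  // A trailing tool_result means the client is answering a tool_use.
  if (blocks.some(function (b) { return b.type === "tool_result"; })) {
    return "tool_loop";
  }
  const hasTools = body.tools !== undefined && body.tools.length > 0;
  const saysWarmup = blocks.some(function (b) {
    return b.type === "text" && b.text === "Warmup";
  });
  return !hasTools && saysWarmup ? "warmup" : "main";
}
```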

Streaming Protocol

Overview

Claude Code ALWAYS uses streaming (stream: true). Responses are Server-Sent Events (SSE).

SSE Format

event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{...}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{...}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{...},"usage":{...}}

event: message_stop
data: {"type":"message_stop"}
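
A buffered response in this format can be split into events with a few lines of code. The sketch below assumes each event carries one `event:` line and one `data:` line, as shown above, and that the whole payload is already in memory; it does not handle events split across network reads. `parseSse` and `SseEvent` are hypothetical names.

```typescript
interface SseEvent { event: string; data: unknown; }

// Parses a complete SSE payload into events. Events are separated by a blank
// line; each one is expected to have an "event:" line and a "data:" line.
function parseSse(raw: string): SseEvent[] {
  const parsed: SseEvent[] = [];
  const chunks = raw.split(/\r?\n\r?\n/);
  for (const chunk of chunks) {
    let eventName = "";
    let dataText = "";
    for (const line of chunk.split(/\r?\n/)) {
      if (line.indexOf("event:") === 0) {
        eventName = line.slice(6).trim();
      } else if (line.indexOf("data:") === 0) {
        dataText += line.slice(5).trim();
      }
    }
    // Skip empty trailing chunks and anything without both fields.
    if (eventName !== "" && dataText !== "") {
      parsed.push({ event: eventName, data: JSON.parse(dataText) });
    }
  }
  return parsed;
}
```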

Event Types

1. `message_start`

When: First event in stream

Purpose: Initialize message metadata

Example:

{
  "type": "message_start",
  "message": {
    "id": "msg_01ABC123",
    "type": "message",
    "role": "assistant",
    "content": [],
    "model": "claude-sonnet-4-5-20250929",
    "stop_reason": null,
    "stop_sequence": null,
    "usage": {
      "input_tokens": 1234,
      "cache_creation_input_tokens": 0,
      "cache_read_input_tokens": 5000,
      "output_tokens": 0
    }
  }
}

Key Fields:

id - Unique message ID (format: msg_XXXXX)
usage.cache_read_input_tokens - Tokens read from cache
usage.cache_creation_input_tokens - Tokens written to cache

2. `content_block_start`

When: Starting a new content block (text or tool_use)

Purpose: Declare block type and metadata

Example (Text Block):

{
  "type": "content_block_start",
  "index": 0,
  "content_block": {
    "type": "text",
    "text": ""
  }
}

Example (Tool Use Block):

{
  "type": "content_block_start",
  "index": 1,
  "content_block": {
    "type": "tool_use",
    "id": "toolu_01ABC123",
    "name": "Read",
    "input": {}
  }
}

3. `content_block_delta`

When: Streaming content within a block

Purpose: Incrementally send text or tool input

Example (Text Delta):

{
  "type": "content_block_delta",
  "index": 0,
  "delta": {
    "type": "text_delta",
    "text": "The answer is "
  }
}

Example (Tool Input Delta):

{
  "type": "content_block_delta",
  "index": 1,
  "delta": {
    "type": "input_json_delta",
    "partial_json": "{\"file_path\":\"/path/"
  }
}

Note: Tool inputs are streamed as partial JSON strings that must be concatenated.

4. `ping`

When: Periodically during long streams

Purpose: Keep connection alive

Example:

{
  "type": "ping"
}

5. `content_block_stop`

When: Finishing a content block

Purpose: Signal block completion

Example:

{
  "type": "content_block_stop",
  "index": 0
}

6. `message_delta`

When: Message metadata updates (usually at end)

Purpose: Provide stop_reason and final usage

Example:

{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {
    "output_tokens": 145
  }
}

Stop Reasons:

end_turn - Normal completion
max_tokens - Hit token limit
tool_use - Waiting for tool execution
stop_sequence - Hit stop sequence
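
A client typically branches on the stop reason to decide what happens next. The mapping below is an illustrative assumption, not observed Claude Code behavior; `nextAction` and the `NextAction` labels are hypothetical.

```typescript
type StopReason = "end_turn" | "max_tokens" | "tool_use" | "stop_sequence";
type NextAction = "finish" | "run_tools" | "warn_truncated";

// Hypothetical mapping from stop_reason to what a client does next.
function nextAction(reason: StopReason): NextAction {
  switch (reason) {
    case "tool_use":
      return "run_tools"; // execute tools, then send tool_result blocks
    case "max_tokens":
      return "warn_truncated"; // output was cut off at the token limit
    case "end_turn":
    case "stop_sequence":
      return "finish";
  }
}

console.log(nextAction("tool_use")); // prints "run_tools"
```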

7. `message_stop`

When: Final event in stream

Purpose: Signal stream completion

Example:

{
  "type": "message_stop"
}

Complete Streaming Sequence

Example 1: Simple Text Response

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","role":"assistant",...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: ping
data: {"type":"ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" + "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" equals "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"4"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

Example 2: Tool Use Response

event: message_start
data: {"type":"message_start",...}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the file."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\"/path/to/"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"test.ts\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":87}}

event: message_stop
data: {"type":"message_stop"}
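
Both sequences above can be folded into finished content blocks with a small accumulator. This is a sketch under stated assumptions: it handles only `text_delta` and `input_json_delta`, ignores `ping` and the start/stop markers, and the names `accumulate`, `StreamEvent`, `BuiltBlock`, and `BuiltMessage` are hypothetical.

```typescript
// Hypothetical shapes covering only the events used in the two examples above.
interface StreamEvent {
  type: string;
  index?: number;
  content_block?: { type: string; id?: string; name?: string };
  delta?: { type?: string; text?: string; partial_json?: string; stop_reason?: string };
}

interface BuiltBlock { type: string; id?: string; name?: string; text: string; json: string; }
interface BuiltMessage { blocks: BuiltBlock[]; stopReason: string | null; }

// Folds a sequence of stream events into finished content blocks.
function accumulate(events: StreamEvent[]): BuiltMessage {
  const built: BuiltMessage = { blocks: [], stopReason: null };
  for (const ev of events) {
    if (ev.type === "content_block_start" && ev.content_block && ev.index !== undefined) {
      built.blocks[ev.index] = {
        type: ev.content_block.type,
        id: ev.content_block.id,
        name: ev.content_block.name,
        text: "",
        json: "",
      };
    } else if (ev.type === "content_block_delta" && ev.delta && ev.index !== undefined) {
      const block = built.blocks[ev.index];
      if (!block) continue; // delta for a block that was never started
      if (ev.delta.type === "text_delta") block.text += ev.delta.text || "";
      if (ev.delta.type === "input_json_delta") block.json += ev.delta.partial_json || "";
    } else if (ev.type === "message_delta" && ev.delta && ev.delta.stop_reason) {
      built.stopReason = ev.delta.stop_reason;
    }
    // ping, message_start, content_block_stop and message_stop need no state here.
  }
  return built;
}
```

The tool input stays a raw string in `json` until the stream ends, at which point the caller can pass it to `JSON.parse`.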

Thinking Mode

Overview

Feature: interleaved-thinking-2025-05-14

Thinking mode allows the model to include internal reasoning blocks in responses.

Thinking Block Structure

NOT YET OBSERVED IN LOGS - Placeholder for when we capture it.

Expected format based on beta feature:

{
  "type": "thinking",
  "thinking": "Internal reasoning here..."
}

Expected in streaming:

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"Let me think..."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"Based on my analysis..."}}

...

Interleaved Pattern

Thinking blocks are expected to appear before text/tool blocks:

[thinking] → [text]
[thinking] → [tool_use]
[thinking] → [thinking] → [text]

To capture: Need to run monitor mode with tasks that trigger extended reasoning.

Tool Call Protocol

Tool Definition Format

Each tool has complete JSON Schema:

{
  "name": "Read",
  "description": "Reads a file from the local filesystem...",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "The absolute path to the file to read"
      },
      "limit": {
        "type": "number",
        "description": "The number of lines to read..."
      },
      "offset": {
        "type": "number",
        "description": "The line number to start reading from..."
      }
    },
    "required": ["file_path"],
    "additionalProperties": false,
    "$schema": "http://json-schema.org/draft-07/schema#"
  }
}

Available Tools

Claude Code provides 16 core tools (the 80+ definitions counted in captured requests include more than this core set):

Task - Launch specialized agents
Bash - Execute shell commands
Glob - File pattern matching
Grep - Content search
ExitPlanMode - Exit planning mode
Read - Read files
Edit - Edit files
Write - Write files
NotebookEdit - Edit Jupyter notebooks
WebFetch - Fetch web content
TodoWrite - Manage task list
WebSearch - Search the web
BashOutput - Get background shell output
KillShell - Kill background shell
Skill - Execute skills
SlashCommand - Execute slash commands

Tool Use Request (from Model)

{
  "type": "tool_use",
  "id": "toolu_01ABC123XYZ",
  "name": "Read",
  "input": {
    "file_path": "/path/to/test.ts"
  }
}

Key Fields:

id - Unique tool call ID (format: toolu_XXXXX)
name - Tool name (must match definition)
input - Tool parameters (validated against schema)

Tool Result Response (from Claude Code)

{
  "type": "tool_result",
  "tool_use_id": "toolu_01ABC123XYZ",
  "content": "const x = 42;\nfunction test() {\n  return x;\n}"
}

Key Fields:

tool_use_id - References original tool_use.id
content - Tool execution result (a string, or an array of content blocks)

Tool Error Response

{
  "type": "tool_result",
  "tool_use_id": "toolu_01ABC123XYZ",
  "content": "Error: File not found",
  "is_error": true
}
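
On the client side, the success and error shapes can come from a single wrapper around tool execution. The sketch below is illustrative only: `toToolResult`, `ToolHandler`, and the handler map are hypothetical, and turning a thrown exception into an `is_error` result is an assumption about reasonable client behavior, not something confirmed by the logs.

```typescript
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: { [key: string]: unknown };
}
interface ToolResultOut {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}
type ToolHandler = (input: { [key: string]: unknown }) => string;

// Runs one tool call and always returns a tool_result block. Unknown tools and
// thrown errors become is_error results instead of aborting the conversation.
function toToolResult(
  call: ToolUseBlock,
  handlers: { [name: string]: ToolHandler }
): ToolResultOut {
  if (!Object.prototype.hasOwnProperty.call(handlers, call.name)) {
    return {
      type: "tool_result",
      tool_use_id: call.id,
      content: "Error: Unknown tool " + call.name,
      is_error: true,
    };
  }
  try {
    const output = handlers[call.name](call.input);
    return { type: "tool_result", tool_use_id: call.id, content: output };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      type: "tool_result",
      tool_use_id: call.id,
      content: "Error: " + message,
      is_error: true,
    };
  }
}
```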

Fine-Grained Tool Streaming

Feature: fine-grained-tool-streaming-2025-05-14

Tool inputs are streamed incrementally as partial JSON:

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\""}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"/path/to/test.ts\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

Reconstructing Input:

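// Fragments as they arrive in the input_json_delta events above.
const fragments = ["{\"file", "_path\":\"", "/path/to/test.ts\"}"];

// Concatenate every fragment for the block, in arrival order.
let input = "";
for (const partialJson of fragments) {
  input += partialJson;
}
// Parse only after content_block_stop, when the JSON is complete.
// Final: input = "{\"file_path\":\"/path/to/test.ts\"}"
const params = JSON.parse(input);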

Prompt Caching

Overview

Claude Code uses extensive prompt caching to reduce costs and latency.

Cache Control Format

{
  "type": "text",
  "text": "Large content to cache",
  "cache_control": {
    "type": "ephemeral"
  }
}

What Gets Cached

System Prompts - Agent instructions
Project Context - CLAUDE.md contents
Tool Definitions - All 80+ tools
User Messages - Some user inputs

Cache Lifecycle

Type: Ephemeral (5 minutes TTL)
Scope: Per user, per conversation
Hit Rate: Very high on subsequent calls

Cache Usage Metrics

From message_start event:

{
  "usage": {
    "input_tokens": 1234,
    "cache_creation_input_tokens": 8500,  // Tokens written to cache (Call 1)
    "cache_read_input_tokens": 8500,      // Tokens read from cache (Call 2+)
    "output_tokens": 0
  }
}

Cost Impact:

Writing to cache: 1.25x input cost
Reading from cache: 0.1x input cost (90% savings!)

Caching Strategy

Call 1 (Warmup):
- Creates cache with system prompts + context
- cache_creation_input_tokens: ~8500

Call 2 (Main):
- Reads from cache
- cache_read_input_tokens: ~8500
- Adds tool definitions (not cached initially)

Call 3+ (Tool Results):
- Reads from cache
- cache_read_input_tokens: ~8500
- Only tool results are new tokens

Total Token Savings:

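Without caching: 8500 tokens * 3 calls = 25,500 tokens input
With caching: 8500 + (8500 * 0.1 * 2) = 10,200 effective tokens (treating the Call 1 cache write as 1x; at the 1.25x write rate it is 8500 * 1.25 + 1,700 = 12,325)
Savings: 60% reduction in input costs (about 52% once the write premium is included)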
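
The arithmetic above generalizes to any prefix size and call count. The sketch below uses the multipliers from the Cost Impact list and assumes the same prefix is written once and read on every later call; `effectiveTokens` and `savings` are hypothetical helper names.

```typescript
// Multipliers quoted in the Cost Impact list.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Effective input tokens (in units of uncached token cost) when the same
// prefix is written on the first call and read on each of the remaining calls.
function effectiveTokens(
  prefixTokens: number,
  calls: number,
  includeWritePremium: boolean
): number {
  if (calls <= 0) return 0;
  const write = prefixTokens * (includeWritePremium ? CACHE_WRITE_MULTIPLIER : 1);
  const reads = prefixTokens * CACHE_READ_MULTIPLIER * (calls - 1);
  return write + reads;
}

// Fractional reduction relative to sending the prefix uncached on every call.
function savings(
  prefixTokens: number,
  calls: number,
  includeWritePremium: boolean
): number {
  const uncached = prefixTokens * calls;
  if (uncached <= 0) return 0;
  return 1 - effectiveTokens(prefixTokens, calls, includeWritePremium) / uncached;
}

console.log(effectiveTokens(8500, 3, false)); // simplified figure, write counted at 1x
console.log(effectiveTokens(8500, 3, true)); // with the 1.25x write premium
console.log(savings(8500, 3, true).toFixed(2));
```

The savings grow with the number of calls, since each extra call adds only a 0.1x read of the cached prefix.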

Beta Features

Required Beta Header

anthropic-beta: claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14

Feature Breakdown

1. `claude-code-20250219`

Purpose: Claude Code-specific features

Enables:

Enhanced tool calling
CLI-specific optimizations
Agent framework support

2. `interleaved-thinking-2025-05-14`

Purpose: Thinking mode (extended reasoning)

Enables:

Thinking blocks in responses
Internal reasoning visible to user
Better complex problem solving

Block Types:

thinking - Internal reasoning
text - Final answer

Pattern:

<thinking>Analyzing the problem...</thinking>
<text>Here's my solution...</text>

3. `fine-grained-tool-streaming-2025-05-14`

Purpose: Stream tool inputs incrementally

Enables:

input_json_delta events
Progressive tool parameter revelation
Better UX for slow tool calls

Without: Tool inputs appear only when complete
With: Tool inputs stream character by character

Complete Examples

Example 1: Simple Query (No Tools)

User Request: "What is 2+2?"

Call 1: Warmup

POST /v1/messages
{
  "model": "claude-haiku-4-5-20251001",
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "<system-reminder>...CLAUDE.md...</system-reminder>",
      "cache_control": {"type": "ephemeral"}
    }, {
      "type": "text",
      "text": "Warmup",
      "cache_control": {"type": "ephemeral"}
    }]
  }],
  "system": [...],
  "max_tokens": 32000,
  "stream": true
}

Response: Authentication error (in our logs - API key was placeholder)

Call 2: Main Execution

POST /v1/messages
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "<system-reminder>...CLAUDE.md...</system-reminder>",
      "cache_control": {"type": "ephemeral"}
    }, {
      "type": "text",
      "text": "What is 2+2?",
      "cache_control": {"type": "ephemeral"}
    }]
  }],
  "system": [...],
  "tools": [...],  // 80+ tools
  "max_tokens": 32000,
  "stream": true
}

Expected Response Stream:

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC",...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2 + 2 equals 4."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":8}}

event: message_stop
data: {"type":"message_stop"}

Example 2: Tool Use (Read File)

User Request: "Read package.json and tell me the version"

Call 2: Main Execution (after warmup)

POST /v1/messages
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "Read package.json and tell me the version"
    }]
  }],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}

Response Stream (Tool Call):

event: message_start
data: {"type":"message_start",...}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the package.json file."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC123","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file_path\":\"/path/to/project/package.json\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":45}}

event: message_stop
data: {"type":"message_stop"}

Call 3: Tool Result

POST /v1/messages
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{
    "role": "user",
    "content": [{"type": "text", "text": "Read package.json and tell me the version"}]
  }, {
    "role": "assistant",
    "content": [{
      "type": "tool_use",
      "id": "toolu_01ABC123",
      "name": "Read",
      "input": {"file_path": "/path/to/project/package.json"}
    }]
  }, {
    "role": "user",
    "content": [{
      "type": "tool_result",
      "tool_use_id": "toolu_01ABC123",
      "content": "{\"name\":\"claudish\",\"version\":\"1.0.8\",...}"
    }]
  }],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}

Response Stream (Final Answer):

event: message_start
data: {"type":"message_start",...}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"The version is 1.0.8."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":10}}

event: message_stop
data: {"type":"message_stop"}

Summary

Key Takeaways

Always Streaming - Claude Code sends stream: true on every request
Multi-Call Pattern - Warmup → Main → Tool Loop
Extensive Caching - Roughly 50-60% input cost savings in the three-call example
Beta Features Required - claude-code-20250219, thinking, tool streaming
Fine-Grained Streaming - Even tool inputs stream incrementally
16 Core Tools - Task, Bash, Read, Edit, Write, etc.
Thinking Mode - Supported but not yet observed in simple queries
Robust Error Handling - Authentication errors gracefully handled

For Proxy Implementers

Must Support:

✅ Server-Sent Events (SSE) streaming
✅ All beta features in header
✅ Prompt caching (ephemeral)
✅ Multi-turn conversations
✅ Tool calling protocol
✅ Fine-grained tool streaming
✅ 600s timeout minimum
✅ 32000 max_tokens default

Nice to Have:

⭐ Thinking mode block recognition
⭐ Cache analytics
⭐ Request/response logging
⭐ Token usage tracking

Last Updated: 2025-11-10
Based On: Monitor mode logs from Claudish v1.0.8
Status: ⚠️ INCOMPLETE - Need streaming response capture with real API key

TODO:

Capture actual streaming responses
Document thinking mode blocks in detail
Test multi-tool sequences
Document error response formats
Add timing/latency metrics
