Claude Code Protocol Specification

Comprehensive documentation of Claude Code's communication protocol with the Anthropic API

Based on deep analysis of monitor mode logs and real-world traffic patterns.


Table of Contents

  1. Protocol Overview
  2. Request Structure
  3. Multi-Call Pattern
  4. Streaming Protocol
  5. Thinking Mode
  6. Tool Call Protocol
  7. Prompt Caching
  8. Beta Features
  9. Complete Examples

Protocol Overview

Core Characteristics

Claude Code communicates with the Anthropic API using:

  • Transport: HTTPS with Server-Sent Events (SSE) for streaming
  • Format: JSON for requests, SSE for responses
  • Authentication: API key via x-api-key header
  • Streaming: Always enabled (stream: true)
  • Caching: Extensive prompt caching with ephemeral cache controls

Key Specifications

API Version: 2023-06-01
User Agent: claude-cli/2.0.36 (external, cli)
Timeout: 600 seconds (10 minutes) - Set by Claude Code SDK (not configurable)
Max Tokens: 32000 (configurable)
Beta Features: claude-code-20250219, interleaved-thinking-2025-05-14, fine-grained-tool-streaming-2025-05-14

Request Structure

HTTP Headers

Claude Code sends comprehensive metadata in every request:

{
  "accept": "application/json",
  "accept-encoding": "gzip, deflate, br, zstd",
  "anthropic-beta": "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14",
  "anthropic-dangerous-direct-browser-access": "true",
  "anthropic-version": "2023-06-01",
  "content-type": "application/json",
  "user-agent": "claude-cli/2.0.36 (external, cli)",
  "x-api-key": "sk-ant-api03-...",
  "x-app": "cli",
  "x-stainless-arch": "arm64",
  "x-stainless-helper-method": "stream",
  "x-stainless-lang": "js",
  "x-stainless-os": "MacOS",
  "x-stainless-package-version": "0.68.0",
  "x-stainless-retry-count": "0",
  "x-stainless-runtime": "node",
  "x-stainless-runtime-version": "v24.3.0",
  "x-stainless-timeout": "600"
}

Critical Headers

| Header | Purpose | Example Value |
|---|---|---|
| anthropic-beta | Enable beta features | claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14 |
| anthropic-version | API version | 2023-06-01 |
| x-api-key | Authentication | sk-ant-api03-... |
| x-stainless-timeout | Request timeout (seconds) | 600 (set by SDK) |
| x-stainless-helper-method | Streaming flag | stream |

Note: x-stainless-* headers are set by Claude Code's Anthropic TypeScript SDK, which is generated by Stainless. These are not configurable by the proxy.
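
As a rough illustration of how a proxy might read the critical headers above, the sketch below pulls out the API version, the beta flags, the timeout, and the streaming flag. The function name `summarizeHeaders`, the shape of its return value, and the 600-second fallback are assumptions made for this example; they are not taken from Claude Code or Claudish.

```javascript
// Hypothetical helper: extracts the protocol-relevant values from a header
// map shaped like the one logged above. Names are matched case-insensitively.
function summarizeHeaders(headers) {
  const lower = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = String(value);
  }
  const beta = lower["anthropic-beta"] || "";
  const apiKey = lower["x-api-key"] || "";
  return {
    apiVersion: lower["anthropic-version"] || null,
    // anthropic-beta is a comma-separated list of feature flags
    betas: beta.split(",").map((s) => s.trim()).filter((s) => s.length > 0),
    // x-stainless-timeout is in seconds; the 600 fallback is an assumption
    timeoutSeconds: Number(lower["x-stainless-timeout"] || "600"),
    streaming: lower["x-stainless-helper-method"] === "stream",
    hasApiKey: apiKey.length > 0,
  };
}

const summary = summarizeHeaders({
  "anthropic-beta": "claude-code-20250219,interleaved-thinking-2025-05-14",
  "anthropic-version": "2023-06-01",
  "x-api-key": "sk-ant-api03-...",
  "x-stainless-helper-method": "stream",
  "x-stainless-timeout": "600",
});
```

Here `summary.betas` holds the two flags as separate strings and `summary.timeoutSeconds` is a number, which is usually more convenient for a proxy than the raw header strings.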

Request Body Structure

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<system-reminder>...CLAUDE.md content...</system-reminder>",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "User's actual query",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ],
  "system": [
    {
      "type": "text",
      "text": "You are Claude Code, Anthropic's official CLI...",
      "cache_control": { "type": "ephemeral" }
    },
    {
      "type": "text",
      "text": "Agent-specific instructions and environment info...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "tools": [...],  // Array of 80+ tool definitions
  "metadata": {
    "user_id": "user_f925af13bf4d0fe65c090d75dbee55fca59693b4c4cbeb48994578dda58eb051_account__session_5faaad4e-780f-4f05-b320-49a85727901b"
  },
  "max_tokens": 32000,
  "stream": true
}

Message Content Types

1. Text Block (Standard)

{
  "type": "text",
  "text": "Content here"
}

2. Text Block with Caching

{
  "type": "text",
  "text": "Large content to cache",
  "cache_control": {
    "type": "ephemeral"
  }
}

3. Tool Result Block

{
  "type": "tool_result",
  "tool_use_id": "toolu_01ABC123",
  "content": "Tool execution result"
}

4. System Reminder Block

{
  "type": "text",
  "text": "<system-reminder>\n# Context\nProject-specific information...\n</system-reminder>",
  "cache_control": { "type": "ephemeral" }
}
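
The four block kinds above can be told apart with a few field checks. The sketch below is illustrative only: `classifyBlock` and its return labels are hypothetical names, and detecting a system reminder by its `<system-reminder>` prefix is an assumption based on the logged examples.

```javascript
// Hypothetical helper: labels a content block as one of the four kinds
// listed above. Anything else is reported as "unknown".
function classifyBlock(block) {
  if (block.type === "tool_result") return "tool_result";
  if (block.type === "text") {
    // Assumption: system reminders are plain text blocks with this prefix
    if (block.text.trimStart().startsWith("<system-reminder>")) {
      return "system_reminder";
    }
    return block.cache_control ? "cached_text" : "text";
  }
  return "unknown";
}

const kinds = [
  { type: "text", text: "Content here" },
  { type: "text", text: "Large content to cache", cache_control: { type: "ephemeral" } },
  { type: "tool_result", tool_use_id: "toolu_01ABC123", content: "Tool execution result" },
  {
    type: "text",
    text: "<system-reminder>\n# Context\n</system-reminder>",
    cache_control: { type: "ephemeral" },
  },
].map(classifyBlock);
```

A proxy that targets a backend without prompt caching could use a check like this to decide which blocks need their `cache_control` field dropped before forwarding.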

Multi-Call Pattern

Claude Code makes multiple sequential API calls for each user request:

Call Sequence

User Request
    ↓
┌─────────────────────────────────────┐
│ Call 1: Warmup (Haiku 4.5)         │
│ - Fast, cheap model                 │
│ - Context loading                   │
│ - No tools                          │
│ - Returns planning/warmup info      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Call 2: Main Execution (Sonnet 4.5)│
│ - Primary model                     │
│ - Full tool definitions (80+)      │
│ - Can execute tools                 │
│ - Returns response or tool calls    │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Call 3+: Tool Results (if needed)  │
│ - Contains tool_result blocks       │
│ - Continues conversation            │
│ - May trigger more tool calls       │
└─────────────────────────────────────┘

Call 1: Warmup Phase

Purpose: Fast context loading and preparation

Model: claude-haiku-4-5-20251001 (fast, cheap)

Characteristics:

  • System prompts included
  • Project context (CLAUDE.md)
  • Agent instructions
  • No tools
  • No actual execution

Request Size: ~20-50 KB

Example:

{
  "model": "claude-haiku-4-5-20251001",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<system-reminder>...project context...</system-reminder>",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "Warmup",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ],
  "system": [...],
  "tools": [],  // NO TOOLS
  "max_tokens": 32000,
  "stream": true
}

Call 2: Main Execution

Purpose: Actual task execution with tools

Model: claude-sonnet-4-5-20250929 (powerful)

Characteristics:

  • System prompts (cached from Call 1)
  • Project context (cached from Call 1)
  • Agent instructions (cached from Call 1)
  • Full tool definitions (80+ tools)
  • Can execute tools
  • User's actual query

Request Size: ~70-100 KB (due to tool definitions)

Example:

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<system-reminder>...same as Call 1...</system-reminder>",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "What is 2+2?",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ],
  "system": [...],  // Same as Call 1
  "tools": [
    {
      "name": "Task",
      "description": "Launch specialized agents...",
      "input_schema": {...}
    },
    {
      "name": "Bash",
      "description": "Execute shell commands...",
      "input_schema": {...}
    },
    // ... 80+ more tools
  ],
  "max_tokens": 32000,
  "stream": true
}

Call 3+: Tool Execution Loop

Purpose: Continue conversation with tool results

Model: Same as Call 2 (Sonnet 4.5)

Pattern:

1. Model responds with tool_use blocks
2. Claude Code executes tools
3. Claude Code sends tool_result blocks
4. Model processes results
5. Repeat if more tools needed

Example Request with Tool Results:

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [...]  // Original query
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_use",
          "id": "toolu_01ABC123",
          "name": "Read",
          "input": {
            "file_path": "/path/to/file.ts"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "tool_result",
          "tool_use_id": "toolu_01ABC123",
          "content": "// File contents here..."
        }
      ]
    }
  ],
  "system": [...],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}
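
The three call kinds described above can be distinguished from the request body alone. The following is a minimal sketch under stated assumptions: `classifyCall` is a hypothetical name, and the heuristics (no tools means warmup, a trailing `tool_result` means tool loop) are inferred only from the examples in this document.

```javascript
// Hypothetical classifier for the call sequence described above.
//   warmup:    no tools are offered
//   tool_loop: the last message is a user message with a tool_result block
//   main:      everything else
function classifyCall(body) {
  const tools = Array.isArray(body.tools) ? body.tools : [];
  if (tools.length === 0) return "warmup";
  const messages = body.messages || [];
  const last = messages[messages.length - 1];
  const content = last && Array.isArray(last.content) ? last.content : [];
  if (last && last.role === "user" && content.some((b) => b.type === "tool_result")) {
    return "tool_loop";
  }
  return "main";
}

const readTool = { name: "Read", description: "Read files", input_schema: {} };
const warmupKind = classifyCall({
  messages: [{ role: "user", content: [{ type: "text", text: "Warmup" }] }],
  tools: [],
});
const mainKind = classifyCall({
  messages: [{ role: "user", content: [{ type: "text", text: "What is 2+2?" }] }],
  tools: [readTool],
});
const loopKind = classifyCall({
  messages: [
    { role: "user", content: [{ type: "text", text: "Read the file" }] },
    { role: "assistant", content: [{ type: "tool_use", id: "toolu_01ABC123", name: "Read", input: {} }] },
    { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_01ABC123", content: "..." }] },
  ],
  tools: [readTool],
});
```

Checking the model name instead would also work for the logged traffic, but model names change between releases, so the sketch relies on request structure.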

Streaming Protocol

Overview

Claude Code ALWAYS uses streaming (stream: true). Responses are Server-Sent Events (SSE).

SSE Format

event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{...}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{...}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{...},"usage":{...}}

event: message_stop
data: {"type":"message_stop"}
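
The event framing above can be parsed with a few lines of string handling. This is a minimal sketch, assuming the whole response body is already available as one string and that every event carries a single `data:` line, as in the logged traffic. `parseSse` is a hypothetical name, and a production parser would also need to handle chunk boundaries, multi-line data fields, and comment lines.

```javascript
// Minimal SSE parser sketch: splits a complete response body into
// { event, data } records, with data already JSON-decoded.
function parseSse(raw) {
  const events = [];
  // Events are separated by a blank line
  for (const chunk of raw.split(/\r?\n\r?\n/)) {
    let event = null;
    let data = null;
    for (const line of chunk.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data = line.slice(5).trim();
    }
    // Skip empty trailing chunks and anything without both fields
    if (event !== null && data !== null) {
      events.push({ event, data: JSON.parse(data) });
    }
  }
  return events;
}

const sample =
  'event: content_block_stop\ndata: {"type":"content_block_stop","index":0}\n\n' +
  'event: message_stop\ndata: {"type":"message_stop"}\n\n';
const parsed = parseSse(sample);
```

In the sample, `parsed` contains two records whose `event` field matches the `type` field of the decoded payload, which is the pattern seen throughout this document.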

Event Types

1. message_start

When: First event in stream

Purpose: Initialize message metadata

Example:

{
  "type": "message_start",
  "message": {
    "id": "msg_01ABC123",
    "type": "message",
    "role": "assistant",
    "content": [],
    "model": "claude-sonnet-4-5-20250929",
    "stop_reason": null,
    "stop_sequence": null,
    "usage": {
      "input_tokens": 1234,
      "cache_creation_input_tokens": 0,
      "cache_read_input_tokens": 5000,
      "output_tokens": 0
    }
  }
}

Key Fields:

  • id - Unique message ID (format: msg_XXXXX)
  • usage.cache_read_input_tokens - Tokens read from cache
  • usage.cache_creation_input_tokens - Tokens written to cache

2. content_block_start

When: Starting a new content block (text or tool_use)

Purpose: Declare block type and metadata

Example (Text Block):

{
  "type": "content_block_start",
  "index": 0,
  "content_block": {
    "type": "text",
    "text": ""
  }
}

Example (Tool Use Block):

{
  "type": "content_block_start",
  "index": 1,
  "content_block": {
    "type": "tool_use",
    "id": "toolu_01ABC123",
    "name": "Read",
    "input": {}
  }
}

3. content_block_delta

When: Streaming content within a block

Purpose: Incrementally send text or tool input

Example (Text Delta):

{
  "type": "content_block_delta",
  "index": 0,
  "delta": {
    "type": "text_delta",
    "text": "The answer is "
  }
}

Example (Tool Input Delta):

{
  "type": "content_block_delta",
  "index": 1,
  "delta": {
    "type": "input_json_delta",
    "partial_json": "{\"file_path\":\"/path/"
  }
}

Note: Tool inputs are streamed as partial JSON strings that must be concatenated.

4. ping

When: Periodically during long streams

Purpose: Keep connection alive

Example:

{
  "type": "ping"
}

5. content_block_stop

When: Finishing a content block

Purpose: Signal block completion

Example:

{
  "type": "content_block_stop",
  "index": 0
}

6. message_delta

When: Message metadata updates (usually at end)

Purpose: Provide stop_reason and final usage

Example:

{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {
    "output_tokens": 145
  }
}

Stop Reasons:

  • end_turn - Normal completion
  • max_tokens - Hit token limit
  • tool_use - Waiting for tool execution
  • stop_sequence - Hit stop sequence
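
How a client reacts to each stop reason can be sketched as a simple lookup. The action labels below are illustrative names invented for this example, not protocol values.

```javascript
// Hypothetical mapping from stop_reason to the client's next step.
function nextAction(stopReason) {
  switch (stopReason) {
    case "tool_use":
      // The model is waiting: run the tools, then send tool_result blocks
      return "run_tools_and_send_tool_result";
    case "max_tokens":
      return "report_truncated_output";
    case "end_turn":
    case "stop_sequence":
      return "finish_turn";
    default:
      return "unknown_stop_reason";
  }
}

const afterToolUse = nextAction("tool_use");
const afterEndTurn = nextAction("end_turn");
```

Only `tool_use` leads to another API call in the multi-call pattern; the other reasons end the current turn.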

7. message_stop

When: Final event in stream

Purpose: Signal stream completion

Example:

{
  "type": "message_stop"
}

Complete Streaming Sequence

Example 1: Simple Text Response

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","role":"assistant",...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: ping
data: {"type":"ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" + "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" equals "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"4"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

Example 2: Tool Use Response

event: message_start
data: {"type":"message_start",...}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the file."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\"/path/to/"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"test.ts\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":87}}

event: message_stop
data: {"type":"message_stop"}
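
The two sequences above can be folded into final content blocks with a small accumulator. This is a sketch rather than Claude Code's implementation: `accumulate` is a hypothetical name, it takes the already-decoded `data:` payloads, and it ignores error events and thinking blocks.

```javascript
// Sketch of a stream accumulator: folds decoded SSE payloads into the final
// content blocks and the stop reason.
function accumulate(payloads) {
  const blocks = [];      // final content blocks, by index
  const jsonBuffers = []; // concatenated partial_json text per block
  let stopReason = null;

  for (const p of payloads) {
    switch (p.type) {
      case "content_block_start":
        blocks[p.index] = { ...p.content_block };
        jsonBuffers[p.index] = "";
        break;
      case "content_block_delta":
        if (p.delta.type === "text_delta") {
          blocks[p.index].text += p.delta.text;
        } else if (p.delta.type === "input_json_delta") {
          jsonBuffers[p.index] += p.delta.partial_json;
        }
        break;
      case "content_block_stop":
        // Tool input is parsed only once the block is complete
        if (blocks[p.index].type === "tool_use") {
          const raw = jsonBuffers[p.index];
          blocks[p.index].input = raw === "" ? {} : JSON.parse(raw);
        }
        break;
      case "message_delta":
        stopReason = p.delta.stop_reason;
        break;
      default:
        break; // message_start, ping, and message_stop need no work here
    }
  }
  return { blocks, stopReason };
}

const result = accumulate([
  { type: "content_block_start", index: 0, content_block: { type: "text", text: "" } },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "I'll read the file." } },
  { type: "content_block_stop", index: 0 },
  { type: "content_block_start", index: 1, content_block: { type: "tool_use", id: "toolu_01ABC", name: "Read", input: {} } },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: "{\"file" } },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: "_path\":\"/path/to/" } },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: "test.ts\"}" } },
  { type: "content_block_stop", index: 1 },
  { type: "message_delta", delta: { stop_reason: "tool_use" }, usage: { output_tokens: 87 } },
]);
```

The payloads mirror Example 2. Keeping the raw JSON text in a separate buffer avoids parsing fragments such as `{"file`, which are not valid JSON on their own.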

Thinking Mode

Overview

Feature: interleaved-thinking-2025-05-14

Thinking mode allows the model to include internal reasoning blocks in responses.

Thinking Block Structure

NOT YET OBSERVED IN LOGS - Placeholder for when we capture it.

Expected format based on beta feature:

{
  "type": "thinking",
  "thinking": "Internal reasoning here..."
}

Expected in streaming:

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"Let me think..."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"Based on my analysis..."}}

...

Interleaved Pattern

Thinking blocks are expected to appear before text/tool blocks:

[thinking] → [text]
[thinking] → [tool_use]
[thinking] → [thinking] → [text]

To capture: Need to run monitor mode with tasks that trigger extended reasoning.


Tool Call Protocol

Tool Definition Format

Each tool has complete JSON Schema:

{
  "name": "Read",
  "description": "Reads a file from the local filesystem...",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "The absolute path to the file to read"
      },
      "limit": {
        "type": "number",
        "description": "The number of lines to read..."
      },
      "offset": {
        "type": "number",
        "description": "The line number to start reading from..."
      }
    },
    "required": ["file_path"],
    "additionalProperties": false,
    "$schema": "http://json-schema.org/draft-07/schema#"
  }
}
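
To illustrate how an `input_schema` like the one above constrains a tool call, the sketch below checks an input object against it. It is deliberately partial: `checkToolInput` is a hypothetical name, and only `required`, `additionalProperties: false`, and string/number/boolean type checks are covered. A real implementation would use a full JSON Schema validator.

```javascript
// Partial schema check for tool inputs; returns a list of error strings.
function checkToolInput(schema, input) {
  const errors = [];
  const properties = schema.properties || {};
  for (const key of schema.required || []) {
    if (!(key in input)) errors.push(`missing required property: ${key}`);
  }
  for (const [key, value] of Object.entries(input)) {
    const spec = properties[key];
    if (!spec) {
      if (schema.additionalProperties === false) {
        errors.push(`unexpected property: ${key}`);
      }
      continue;
    }
    // Only primitive types map directly onto JavaScript's typeof
    const primitive = ["string", "number", "boolean"].includes(spec.type);
    if (primitive && typeof value !== spec.type) {
      errors.push(`property ${key} should be ${spec.type}`);
    }
  }
  return errors;
}

// Trimmed copy of the Read tool schema shown above
const readSchema = {
  type: "object",
  properties: {
    file_path: { type: "string" },
    limit: { type: "number" },
    offset: { type: "number" },
  },
  required: ["file_path"],
  additionalProperties: false,
};

const ok = checkToolInput(readSchema, { file_path: "/path/to/file.ts", limit: 10 });
const bad = checkToolInput(readSchema, { limit: "10", mode: "fast" });
```

Here `ok` is an empty list, while `bad` reports the missing `file_path`, the wrongly typed `limit`, and the unexpected `mode` property.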

Available Tools

Claude Code provides 16 core tools (the other definitions in the 80+ tool array seen in requests are not enumerated here):

  1. Task - Launch specialized agents
  2. Bash - Execute shell commands
  3. Glob - File pattern matching
  4. Grep - Content search
  5. ExitPlanMode - Exit planning mode
  6. Read - Read files
  7. Edit - Edit files
  8. Write - Write files
  9. NotebookEdit - Edit Jupyter notebooks
  10. WebFetch - Fetch web content
  11. TodoWrite - Manage task list
  12. WebSearch - Search the web
  13. BashOutput - Get background shell output
  14. KillShell - Kill background shell
  15. Skill - Execute skills
  16. SlashCommand - Execute slash commands

Tool Use Request (from Model)

{
  "type": "tool_use",
  "id": "toolu_01ABC123XYZ",
  "name": "Read",
  "input": {
    "file_path": "/path/to/test.ts"
  }
}

Key Fields:

  • id - Unique tool call ID (format: toolu_XXXXX)
  • name - Tool name (must match definition)
  • input - Tool parameters (validated against schema)

Tool Result Response (from Claude Code)

{
  "type": "tool_result",
  "tool_use_id": "toolu_01ABC123XYZ",
  "content": "const x = 42;\nfunction test() {\n  return x;\n}"
}

Key Fields:

  • tool_use_id - References original tool_use.id
  • content - Tool execution result (a string, or an array of content blocks)

Tool Error Response

{
  "type": "tool_result",
  "tool_use_id": "toolu_01ABC123XYZ",
  "content": "Error: File not found",
  "is_error": true
}
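
The success and error shapes above can both be produced by one wrapper around tool execution. This is a sketch under assumptions: `toToolResult` is a hypothetical name, and the handler is taken to be synchronous for brevity, whereas real tools are typically asynchronous.

```javascript
// Hypothetical helper: runs a tool handler and wraps the outcome in a
// tool_result block, setting is_error when the handler throws.
function toToolResult(toolUse, handler) {
  try {
    const output = handler(toolUse.input);
    return {
      type: "tool_result",
      tool_use_id: toolUse.id, // must echo the id of the tool_use block
      content: typeof output === "string" ? output : JSON.stringify(output),
    };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      type: "tool_result",
      tool_use_id: toolUse.id,
      content: `Error: ${message}`,
      is_error: true,
    };
  }
}

const toolUse = {
  type: "tool_use",
  id: "toolu_01ABC123XYZ",
  name: "Read",
  input: { file_path: "/path/to/test.ts" },
};
const success = toToolResult(toolUse, () => "const x = 42;");
const failure = toToolResult(toolUse, () => {
  throw new Error("File not found");
});
```

Returning an error block instead of throwing keeps the conversation going, so the model can see the failure and decide what to do next.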

Fine-Grained Tool Streaming

Feature: fine-grained-tool-streaming-2025-05-14

Tool inputs are streamed incrementally as partial JSON:

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\""}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"/path/to/test.ts\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

Reconstructing Input:

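// The three input_json_delta payloads from the stream above
const deltas = [
  { partial_json: "{\"file" },
  { partial_json: "_path\":\"" },
  { partial_json: "/path/to/test.ts\"}" },
];
// Concatenate fragments in arrival order; parse only after content_block_stop
let input = "";
for (const delta of deltas) input += delta.partial_json;
const params = JSON.parse(input); // { file_path: "/path/to/test.ts" }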

Prompt Caching

Overview

Claude Code uses extensive prompt caching to reduce costs and latency.

Cache Control Format

{
  "type": "text",
  "text": "Large content to cache",
  "cache_control": {
    "type": "ephemeral"
  }
}

What Gets Cached

  1. System Prompts - Agent instructions
  2. Project Context - CLAUDE.md contents
  3. Tool Definitions - All 80+ tools
  4. User Messages - Some user inputs

Cache Lifecycle

  • Type: Ephemeral (5-minute TTL)
  • Scope: Per user, per conversation
  • Hit Rate: Very high on subsequent calls

Cache Usage Metrics

From message_start event:

{
  "usage": {
    "input_tokens": 1234,
    "cache_creation_input_tokens": 8500,  // Tokens written to cache (Call 1)
    "cache_read_input_tokens": 8500,      // Tokens read from cache (Call 2+)
    "output_tokens": 0
  }
}

Cost Impact:

  • Writing to cache: 1.25x input cost
  • Reading from cache: 0.1x input cost (90% savings!)

Caching Strategy

Call 1 (Warmup):
- Creates cache with system prompts + context
- cache_creation_input_tokens: ~8500

Call 2 (Main):
- Reads from cache
- cache_read_input_tokens: ~8500
- Adds tool definitions (not cached initially)

Call 3+ (Tool Results):
- Reads from cache
- cache_read_input_tokens: ~8500
- Only tool results are new tokens

Total Token Savings:

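Without caching: 8500 tokens * 3 calls = 25,500 tokens input
With caching, ignoring the cache-write premium: 8500 + (8500 * 0.1 * 2) = 10,200 effective tokens
Savings: 60% reduction in input costs
With caching, counting the 1.25x write on Call 1: (8500 * 1.25) + (8500 * 0.1 * 2) = 12,325 effective tokens
Savings: about 52% reduction in input costs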
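
The arithmetic above can be reproduced with a small function, both with and without the 1.25x cache-write multiplier from the Cost Impact list. The function name and option names are hypothetical; the token count and call count are the example's values, not measurements.

```javascript
// Effective input tokens for a cached prefix sent on every call.
// The first call writes the cache; each later call reads it.
function effectiveInputTokens(cachedTokens, calls, multipliers = {}) {
  const { writeMultiplier = 1.25, readMultiplier = 0.1 } = multipliers;
  const uncached = cachedTokens * calls;
  const cached =
    cachedTokens * writeMultiplier +
    cachedTokens * readMultiplier * (calls - 1);
  return { uncached, cached, savings: 1 - cached / uncached };
}

// Counting the write premium on Call 1
const withPremium = effectiveInputTokens(8500, 3);
// Treating the first call as a plain 1x read, as in the simpler estimate
const withoutPremium = effectiveInputTokens(8500, 3, { writeMultiplier: 1 });
```

With these inputs the savings come out at roughly 52% when the write premium is counted and 60% when it is ignored. The gap narrows as the number of calls grows, because the one-time write cost is spread over more cache reads.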

Beta Features

Required Beta Header

anthropic-beta: claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14
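
A proxy can verify that an incoming request carries all three flags before forwarding it. The sketch below assumes the header value is a plain comma-separated list; `missingBetas` and `REQUIRED_BETAS` are hypothetical names introduced for this example.

```javascript
// The three feature flags listed in this document
const REQUIRED_BETAS = [
  "claude-code-20250219",
  "interleaved-thinking-2025-05-14",
  "fine-grained-tool-streaming-2025-05-14",
];

// Returns the required flags that are absent from an anthropic-beta value
function missingBetas(headerValue) {
  const present = new Set(
    (headerValue || "").split(",").map((flag) => flag.trim())
  );
  return REQUIRED_BETAS.filter((flag) => !present.has(flag));
}

const complete = missingBetas(
  "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14"
);
const partial = missingBetas("claude-code-20250219, interleaved-thinking-2025-05-14");
```

In this usage, `complete` is empty and `partial` lists only the fine-grained tool streaming flag.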

Feature Breakdown

1. claude-code-20250219

Purpose: Claude Code-specific features

Enables:

  • Enhanced tool calling
  • CLI-specific optimizations
  • Agent framework support

2. interleaved-thinking-2025-05-14

Purpose: Thinking mode (extended reasoning)

Enables:

  • Thinking blocks in responses
  • Internal reasoning visible to user
  • Better complex problem solving

Block Types:

  • thinking - Internal reasoning
  • text - Final answer

Pattern:

<thinking>Analyzing the problem...</thinking>
<text>Here's my solution...</text>

3. fine-grained-tool-streaming-2025-05-14

Purpose: Stream tool inputs incrementally

Enables:

  • input_json_delta events
  • Progressive tool parameter revelation
  • Better UX for slow tool calls

Without: Tool inputs appear only when complete

With: Tool inputs stream character by character


Complete Examples

Example 1: Simple Query (No Tools)

User Request: "What is 2+2?"

Call 1: Warmup

POST /v1/messages
{
  "model": "claude-haiku-4-5-20251001",
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "<system-reminder>...CLAUDE.md...</system-reminder>",
      "cache_control": {"type": "ephemeral"}
    }, {
      "type": "text",
      "text": "Warmup",
      "cache_control": {"type": "ephemeral"}
    }]
  }],
  "system": [...],
  "max_tokens": 32000,
  "stream": true
}

Response: Authentication error (the API key in our logged session was a placeholder)

Call 2: Main Execution

POST /v1/messages
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "<system-reminder>...CLAUDE.md...</system-reminder>",
      "cache_control": {"type": "ephemeral"}
    }, {
      "type": "text",
      "text": "What is 2+2?",
      "cache_control": {"type": "ephemeral"}
    }]
  }],
  "system": [...],
  "tools": [...],  // 80+ tools
  "max_tokens": 32000,
  "stream": true
}

Expected Response Stream:

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC",...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2 + 2 equals 4."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":8}}

event: message_stop
data: {"type":"message_stop"}

Example 2: Tool Use (Read File)

User Request: "Read package.json and tell me the version"

Call 2: Main Execution (after warmup)

POST /v1/messages
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "Read package.json and tell me the version"
    }]
  }],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}

Response Stream (Tool Call):

event: message_start
data: {"type":"message_start",...}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the package.json file."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC123","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file_path\":\"/path/to/project/package.json\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":45}}

event: message_stop
data: {"type":"message_stop"}

Call 3: Tool Result

POST /v1/messages
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{
    "role": "user",
    "content": [{"type": "text", "text": "Read package.json and tell me the version"}]
  }, {
    "role": "assistant",
    "content": [{
      "type": "tool_use",
      "id": "toolu_01ABC123",
      "name": "Read",
      "input": {"file_path": "/path/to/project/package.json"}
    }]
  }, {
    "role": "user",
    "content": [{
      "type": "tool_result",
      "tool_use_id": "toolu_01ABC123",
      "content": "{\"name\":\"claudish\",\"version\":\"1.0.8\",...}"
    }]
  }],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}

Response Stream (Final Answer):

event: message_start
data: {"type":"message_start",...}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"The version is 1.0.8."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":10}}

event: message_stop
data: {"type":"message_stop"}

Summary

Key Takeaways

  1. Always Streaming - Claude Code sends stream: true on every request; no non-streaming request was observed
  2. Multi-Call Pattern - Warmup → Main → Tool Loop
  3. Extensive Caching - 60%+ cost savings
  4. Beta Features Required - claude-code-20250219, thinking, tool streaming
  5. Fine-Grained Streaming - Even tool inputs stream incrementally
  6. 16 Core Tools - Task, Bash, Read, Edit, Write, etc.
  7. Thinking Mode - Supported but not yet observed in simple queries
  8. Robust Error Handling - Authentication errors gracefully handled

For Proxy Implementers

Must Support:

  • Server-Sent Events (SSE) streaming
  • All beta features in header
  • Prompt caching (ephemeral)
  • Multi-turn conversations
  • Tool calling protocol
  • Fine-grained tool streaming
  • 600s timeout minimum
  • 32000 max_tokens default

Nice to Have:

  • Thinking mode block recognition
  • Cache analytics
  • Request/response logging
  • Token usage tracking

Last Updated: 2025-11-10

Based On: Monitor mode logs from Claudish v1.0.8

Status: ⚠️ INCOMPLETE - Need streaming response capture with real API key

TODO:

  • Capture actual streaming responses
  • Document thinking mode blocks in detail
  • Test multi-tool sequences
  • Document error response formats
  • Add timing/latency metrics