Claude Code Protocol - Complete Specification

DEFINITIVE GUIDE to Claude Code's communication protocol with the Anthropic API.

Based on complete traffic capture from monitor mode with OAuth authentication.

Status: ✅ COMPLETE - All patterns documented with real examples

Executive Summary
Authentication
Request Structure
Streaming Protocol
Tool Call Protocol
Multi-Call Pattern
Prompt Caching
Complete Real Examples

Executive Summary

Key Discoveries

From analyzing 924KB of real traffic (14 API calls, 16 tool uses):

OAuth 2.0 Authentication - Claude Code uses authorization: Bearer <token> header, NOT x-api-key
Always Streaming - 100% of responses use Server-Sent Events (SSE)
Extensive Caching - 5501 tokens cached, massive cost savings
Multi-Model Strategy - Haiku for warmup, Sonnet for execution
Fine-Grained Streaming - Text streams in small chunks (often a few words, sometimes split mid-word); tool inputs stream as partial JSON fragments
No Thinking Mode Observed - Despite interleaved-thinking-2025-05-14 beta, no thinking blocks captured

Traffic Statistics

From real session:

Total API Calls: 14 messages
Tool Uses: 16 total
- Read: 19 times
- Glob: 5 times
- Others: 1-4 times each
Streaming: 100% (all responses)
Models Used:
- claude-haiku-4-5-20251001 - Warmup/search
- claude-sonnet-4-5-20250929 - Main execution

Authentication

OAuth 2.0 (Native Claude Code)

By default, Claude Code authenticates with OAuth 2.0 rather than an API key.

OAuth Token Format

authorization: Bearer sk-ant-oat01-<token>

Example:

authorization: Bearer sk-ant-oat01-<redacted>

How OAuth Works with Claude Code

1. User authenticates: claude auth login
2. OAuth server issues a token, which Claude Code stores locally
3. Token is sent with each request: authorization: Bearer <token>
4. Token is NOT sent in the x-api-key header

Beta Feature for OAuth

anthropic-beta: oauth-2025-04-20,...

This beta feature MUST be present for OAuth to work.
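
A proxy that forwards OAuth traffic may want to verify that this flag survives header rewriting. The sketch below is one way to check and repair a comma-separated anthropic-beta value; the helper names `hasBetaFlag` and `ensureBetaFlag` are hypothetical and not part of Claude Code or any SDK.

```javascript
// Hypothetical helper: true if the comma-separated anthropic-beta value
// contains the given flag (whitespace around entries is ignored).
function hasBetaFlag(headerValue, flag) {
  return String(headerValue ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .includes(flag);
}

// Hypothetical helper: returns the header value with the flag appended
// when it is missing, otherwise returns the value unchanged.
function ensureBetaFlag(headerValue, flag) {
  if (hasBetaFlag(headerValue, flag)) return headerValue;
  return headerValue ? `${headerValue},${flag}` : flag;
}

// Usage with the flag named in this section.
const repaired = ensureBetaFlag(
  "interleaved-thinking-2025-05-14",
  "oauth-2025-04-20"
);
console.log(repaired);
```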

API Key (Alternative)

For proxies or testing, you can use an API key instead:

x-api-key: sk-ant-api03-<key>

But Claude Code itself uses OAuth by default.
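
The two authentication modes differ only in which header carries the credential. As a rough sketch, assuming a hypothetical `buildAuthHeaders` function and option names of our own choosing, header selection could look like this:

```javascript
// Hypothetical helper: picks OAuth when a token is supplied, otherwise
// falls back to an API key. Header names and values follow this section.
function buildAuthHeaders({ oauthToken, apiKey } = {}) {
  const headers = {
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
  };
  if (oauthToken) {
    headers["authorization"] = `Bearer ${oauthToken}`;
    // The OAuth beta flag must accompany a Bearer token.
    headers["anthropic-beta"] = "oauth-2025-04-20";
  } else if (apiKey) {
    headers["x-api-key"] = apiKey;
  } else {
    throw new Error("either oauthToken or apiKey is required");
  }
  return headers;
}
```

Preferring OAuth when both credentials are present mirrors the default described above; a proxy with different priorities would swap the branches.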

Request Structure

HTTP Headers (Complete)

Real headers captured from Claude Code:

{
  "accept": "application/json",
  "accept-encoding": "gzip, deflate, br, zstd",
  "anthropic-beta": "oauth-2025-04-20,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14",
  "anthropic-dangerous-direct-browser-access": "true",
  "anthropic-version": "2023-06-01",
  "authorization": "Bearer sk-ant-oat01-czgCTyNSNbtdynagN5UPCWqX0YLElsmEPP...",
  "connection": "keep-alive",
  "content-type": "application/json",
  "host": "127.0.0.1:5285",
  "user-agent": "claude-cli/2.0.36 (external, cli)",
  "x-app": "cli",
  "x-stainless-arch": "arm64",
  "x-stainless-helper-method": "stream",
  "x-stainless-lang": "js",
  "x-stainless-os": "MacOS",
  "x-stainless-package-version": "0.68.0",
  "x-stainless-retry-count": "0",
  "x-stainless-runtime": "node",
  "x-stainless-runtime-version": "v24.3.0",
  "x-stainless-timeout": "600"
}

Critical Headers Explained

Header	Value	Purpose
`anthropic-beta`	`oauth-2025-04-20,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14`	Enable OAuth, thinking mode, fine-grained tool streaming
`authorization`	`Bearer sk-ant-oat01-...`	OAuth 2.0 authentication token
`anthropic-version`	`2023-06-01`	API version
`x-stainless-timeout`	`600`	10-minute timeout
`x-stainless-helper-method`	`stream`	Always use streaming

Request Body Structure

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<system-reminder>...CLAUDE.md content...</system-reminder>",
          "cache_control": { "type": "ephemeral" }
        },
        {
          "type": "text",
          "text": "User's actual query",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ],
  "system": [
    {
      "type": "text",
      "text": "You are Claude Code, Anthropic's official CLI...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "tools": [...],  // 16 tools
  "metadata": {
    "user_id": "user_f925af13bf4d0fe65c090d75dbee55fca59693b4c4cbeb48994578dda58eb051..."
  },
  "max_tokens": 32000,
  "stream": true
}
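
To make the shape above concrete, here is a minimal sketch that assembles a body with the same fields. The function name `buildRequestBody` and its parameter names are illustrative assumptions; only the field names and fixed values come from the captured request.

```javascript
// Hypothetical builder that mirrors the captured request body shape.
function buildRequestBody({ model, systemPrompt, projectContext, userQuery, tools = [] }) {
  // Fresh object per block so no two blocks share a reference.
  const ephemeral = () => ({ type: "ephemeral" });
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: `<system-reminder>${projectContext}</system-reminder>`,
            cache_control: ephemeral(),
          },
          { type: "text", text: userQuery, cache_control: ephemeral() },
        ],
      },
    ],
    system: [{ type: "text", text: systemPrompt, cache_control: ephemeral() }],
    tools,
    max_tokens: 32000,
    stream: true,
  };
}
```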

Streaming Protocol

SSE Event Sequence

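Every response follows this general pattern. Steps 2-5 repeat once per content block (for example, a text block followed by a tool_use block), and ping events can arrive at any point:

1. message_start
2. content_block_start
3. content_block_delta (many times, one small fragment each)
4. ping (periodically)
5. content_block_stop
6. message_delta
7. message_stop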
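
A consumer first has to split the raw stream into events before it can act on the sequence above. The following sketch parses a complete SSE payload held in a string; `parseSseEvents` is a hypothetical name, and a real client would also need to buffer partial chunks arriving from the network, which is not shown here.

```javascript
// Hypothetical parser: splits a complete SSE payload into
// { event, data } records. Events are separated by blank lines.
function parseSseEvents(raw) {
  const events = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = null;
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice("data:".length).trim());
      }
    }
    // Blocks without a data field (such as trailing blanks) are skipped.
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}
```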

Real Example (Captured from Logs)

Event 1: `message_start`

event: message_start
data: {
  "type": "message_start",
  "message": {
    "model": "claude-haiku-4-5-20251001",
    "id": "msg_01Bnhgy47DDidiGYfAEX5zkm",
    "type": "message",
    "role": "assistant",
    "content": [],
    "stop_reason": null,
    "stop_sequence": null,
    "usage": {
      "input_tokens": 3,
      "cache_creation_input_tokens": 5501,
      "cache_read_input_tokens": 0,
      "cache_creation": {
        "ephemeral_5m_input_tokens": 5501,
        "ephemeral_1h_input_tokens": 0
      },
      "output_tokens": 1,
      "service_tier": "standard"
    }
  }
}

Key Fields:

cache_creation_input_tokens: 5501 - Created 5501 tokens of cache
cache_read_input_tokens: 0 - First call, nothing to read yet
ephemeral_5m_input_tokens: 5501 - 5-minute cache TTL

Event 2: `content_block_start`

event: content_block_start
data: {
  "type": "content_block_start",
  "index": 0,
  "content_block": {
    "type": "text",
    "text": ""
  }
}

Event 3: `content_block_delta` (Word-by-Word Streaming)

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"'m ready to help you search"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" an"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"d analyze the"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" codebase. I have access"}}

Pattern:

Each delta carries a short fragment, from a single character to a few words
Fragment boundaries can fall mid-word (" an" followed by "d analyze the")
Concatenate all deltas in order to recover the full text
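
Rebuilding the text is plain string concatenation over the text_delta events, in arrival order. A minimal sketch, using the five deltas shown above and a hypothetical function name `collectText`:

```javascript
// Hypothetical helper: joins the text of all text_delta events in order.
function collectText(events) {
  let text = "";
  for (const ev of events) {
    if (ev.type === "content_block_delta" && ev.delta?.type === "text_delta") {
      text += ev.delta.text;
    }
  }
  return text;
}

// The five deltas from the capture above.
const deltas = ["I", "'m ready to help you search", " an", "d analyze the", " codebase. I have access"]
  .map((text) => ({ type: "content_block_delta", index: 0, delta: { type: "text_delta", text } }));

console.log(collectText(deltas));
// prints: I'm ready to help you search and analyze the codebase. I have access
```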

Event 4: `ping`

event: ping
data: {"type": "ping"}

Sent periodically to keep connection alive.

Event 5: `content_block_stop`

event: content_block_stop
data: {"type":"content_block_stop","index":0}

Event 6: `message_delta`

event: message_delta
data: {
  "type":"message_delta",
  "delta": {
    "stop_reason":"end_turn",
    "stop_sequence":null
  },
  "usage": {
    "output_tokens": 145
  }
}

Stop Reasons:

end_turn - Normal completion
max_tokens - Hit token limit
tool_use - Model wants to call tools
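
A client typically branches on the stop reason to decide what to do next. The sketch below maps the three observed values to action labels; the function name `nextAction` and the label strings are our own illustrative choices, not protocol values.

```javascript
// Hypothetical dispatcher: maps a stop_reason to a client-side action label.
function nextAction(stopReason) {
  switch (stopReason) {
    case "end_turn":
      return "done"; // normal completion, show the reply
    case "max_tokens":
      return "truncated"; // output was cut off at the token limit
    case "tool_use":
      return "run_tools"; // execute tools, then send tool_result blocks
    default:
      return "unknown"; // any value not listed in this section
  }
}
```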

Event 7: `message_stop`

event: message_stop
data: {"type":"message_stop"}

Final event - stream complete.

Tool Call Protocol

Tool Definitions

Claude Code provides 16 tools:

Task
Bash
Glob
Grep
ExitPlanMode
Read
Edit
Write
NotebookEdit
WebFetch
TodoWrite
WebSearch
BashOutput
KillShell
Skill
SlashCommand

Real Tool Use Example

From captured traffic - Read tool:

Model Requests Tool

event: content_block_start
data: {
  "type": "content_block_start",
  "index": 1,
  "content_block": {
    "type": "tool_use",
    "id": "toolu_01ABC123",
    "name": "Read",
    "input": {}
  }
}

event: content_block_delta
data: {
  "type": "content_block_delta",
  "index": 1,
  "delta": {
    "type": "input_json_delta",
    "partial_json": "{\"file"
  }
}

event: content_block_delta
data: {
  "type": "content_block_delta",
  "index": 1,
  "delta": {
    "type": "input_json_delta",
    "partial_json": "_path\":\"/path/to/project/package.json\"}"
  }
}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

Reconstructing Tool Input:

let input = "";
input += "{\"file";
input += "_path\":\"/path/to/project/package.json\"}";
// Final: {"file_path":"/path/to/project/package.json"}
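
When a response contains several blocks, the fragments need to be tracked per block index and parsed only once the block closes. The following is a sketch of that bookkeeping under the event shapes shown above; `collectToolCalls` is a hypothetical name.

```javascript
// Hypothetical accumulator: gathers input_json_delta fragments per block
// index and parses the JSON when the matching content_block_stop arrives.
function collectToolCalls(events) {
  const open = new Map(); // block index -> { id, name, json }
  const calls = [];
  for (const ev of events) {
    if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
      const { id, name } = ev.content_block;
      open.set(ev.index, { id, name, json: "" });
    } else if (ev.type === "content_block_delta" && ev.delta?.type === "input_json_delta") {
      const block = open.get(ev.index);
      if (block) block.json += ev.delta.partial_json;
    } else if (ev.type === "content_block_stop" && open.has(ev.index)) {
      const { id, name, json } = open.get(ev.index);
      open.delete(ev.index);
      // A tool with no arguments may stream no fragments at all.
      calls.push({ id, name, input: json === "" ? {} : JSON.parse(json) });
    }
  }
  return calls;
}
```

Parsing only at content_block_stop avoids calling JSON.parse on an incomplete fragment, which would throw.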

Claude Code Executes Tool

Claude Code runs the tool locally (here, reading the file) and captures the result.

Next Request with Tool Result

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Read package.json"}
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_use",
          "id": "toolu_01ABC123",
          "name": "Read",
          "input": {
            "file_path": "/path/to/project/package.json"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "tool_result",
          "tool_use_id": "toolu_01ABC123",
          "content": "{\"name\":\"claudish\",\"version\":\"1.0.8\",...}"
        }
      ]
    }
  ],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}
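
The follow-up request is the previous message list plus two new entries: the assistant turn containing the tool_use blocks, and a user turn containing the matching tool_result blocks. A minimal sketch, with the function name `appendToolResults` and the `toolUseId`/`output` field names chosen for illustration:

```javascript
// Hypothetical helper: returns a new message list extended with the
// assistant's tool_use turn and a user turn carrying the tool results.
function appendToolResults(messages, assistantContent, results) {
  return [
    ...messages,
    { role: "assistant", content: assistantContent },
    {
      role: "user",
      content: results.map(({ toolUseId, output }) => ({
        type: "tool_result",
        tool_use_id: toolUseId, // must match the id of the tool_use block
        content: output,
      })),
    },
  ];
}
```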

Multi-Call Pattern

Observed Pattern

From logs - 14 API calls total:

Call 1: Warmup (Haiku)

Model: claude-haiku-4-5-20251001

Purpose: Fast context loading

Response:

{
  "usage": {
    "input_tokens": 12425,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 1
  },
  "stop_reason": "max_tokens"
}

Returns just "I": a single output token before stopping on max_tokens. The usage block shows no cache creation or cache reads on this call.

Call 2: Main Execution (Sonnet)

Model: claude-sonnet-4-5-20250929

Purpose: Actual task with tools

Response:

{
  "usage": {
    "input_tokens": 3,
    "cache_creation_input_tokens": 5501,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 5501
    },
    "output_tokens": 145
  }
}

Creates 5501 token cache and generates response.

Call 3-14: Tool Loop

Each subsequent call:

Uses Sonnet
Includes tool_result blocks
Reads from cache (reduces input costs)

Example Cache Metrics (Call 3):

{
  "usage": {
    "input_tokens": 50,
    "cache_read_input_tokens": 5501,
    "output_tokens": 200
  }
}

Cost Savings:

Without cache: 5551 input tokens
With cache: 50 new + (5501 * 0.1) = 600.1 effective tokens
Savings: 89%
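
The same arithmetic can be expressed as a small function over a usage object. This is a sketch under the multipliers quoted in this document (0.1x for cache reads, 1.25x for cache writes); `effectiveInputTokens` is a hypothetical name.

```javascript
// Hypothetical helper: converts a usage object into "effective" input
// tokens, weighting cache reads at 0.1x and cache writes at 1.25x.
function effectiveInputTokens({
  input_tokens = 0,
  cache_read_input_tokens = 0,
  cache_creation_input_tokens = 0,
} = {}) {
  return (
    input_tokens +
    cache_read_input_tokens * 0.1 +
    cache_creation_input_tokens * 1.25
  );
}

// Call 3 from the capture: 50 new tokens plus 5501 read from cache.
const call3 = { input_tokens: 50, cache_read_input_tokens: 5501 };
const effective = effectiveInputTokens(call3);
const uncached = call3.input_tokens + call3.cache_read_input_tokens;
console.log(effective.toFixed(1), `${Math.round((1 - effective / uncached) * 100)}%`);
// prints: 600.1 89%
```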

Prompt Caching

Cache Control Format

{
  "type": "text",
  "text": "Large content",
  "cache_control": {
    "type": "ephemeral"
  }
}

What Gets Cached

From real traffic:

System Prompts (agent instructions)
Project Context (CLAUDE.md - very large!)
Tool Definitions (all 16 tools with schemas)
User Messages (some)

Cache Metrics (Real Data)

Call 1 (Warmup):

cache_creation_input_tokens: 0
cache_read_input_tokens: 0

No cache operations yet.

Call 2 (Main):

cache_creation_input_tokens: 5501
cache_read_input_tokens: 0
ephemeral_5m_input_tokens: 5501

Created 5501 token cache with 5-minute TTL.

Call 3+ (Tool Results):

cache_read_input_tokens: 5501

Reading all 5501 tokens from cache!
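
When scanning a log, it can help to label each call by what it did to the cache. The sketch below derives a label from the two usage fields shown above; the function name `cacheStatus` and the label strings are illustrative assumptions.

```javascript
// Hypothetical classifier: labels a call by its cache activity.
function cacheStatus(usage) {
  const created = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  if (read > 0 && created > 0) return "read+write";
  if (read > 0) return "read";
  if (created > 0) return "write";
  return "none";
}

// The three calls described in this section.
console.log(cacheStatus({ cache_creation_input_tokens: 0, cache_read_input_tokens: 0 })); // none
console.log(cacheStatus({ cache_creation_input_tokens: 5501, cache_read_input_tokens: 0 })); // write
console.log(cacheStatus({ cache_read_input_tokens: 5501 })); // read
```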

Cost Calculation

Anthropic Pricing (Sonnet rates, as of 2025):

Input: $3/MTok
Cache Write: $3.75/MTok (1.25x input)
Cache Read: $0.30/MTok (0.1x input)

Example Session (14 calls, input tokens only, Sonnet rates applied to every call for simplicity):

Call 1: 12425 input = $0.037
Call 2: 3 input + 5501 cache write = $0.021
Call 3-14: 50 input + 5501 cache read each = 12 * $0.0018 = $0.022

Total: ~$0.080
Without cache (same tokens billed as plain input): 12425 + 5504 + 12 * 5551 = 84541 tokens = ~$0.254
Savings: ~69%
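
To recompute these figures for a different session, the input-side cost can be derived directly from the usage objects. The sketch below uses the per-MTok prices listed above and ignores output tokens; the names `PRICE_PER_MTOK`, `inputCostUsd`, `uncachedInputCostUsd`, and `sessionInputCost` are hypothetical.

```javascript
// Prices in USD per million tokens, taken from the list above.
const PRICE_PER_MTOK = { input: 3.0, cacheWrite: 3.75, cacheRead: 0.3 };

// Input-side cost of one call with caching, in USD.
function inputCostUsd(usage, prices = PRICE_PER_MTOK) {
  const {
    input_tokens = 0,
    cache_creation_input_tokens = 0,
    cache_read_input_tokens = 0,
  } = usage;
  return (
    (input_tokens * prices.input +
      cache_creation_input_tokens * prices.cacheWrite +
      cache_read_input_tokens * prices.cacheRead) / 1e6
  );
}

// Cost of the same call if every token were billed as plain input.
function uncachedInputCostUsd(usage, prices = PRICE_PER_MTOK) {
  const total =
    (usage.input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0);
  return (total * prices.input) / 1e6;
}

// Totals and relative savings across a list of usage objects.
function sessionInputCost(calls, prices = PRICE_PER_MTOK) {
  const cached = calls.reduce((sum, u) => sum + inputCostUsd(u, prices), 0);
  const uncached = calls.reduce((sum, u) => sum + uncachedInputCostUsd(u, prices), 0);
  return { cached, uncached, savings: uncached > 0 ? 1 - cached / uncached : 0 };
}
```

A session is passed as an array of usage objects, one per API call, for example `sessionInputCost([{ input_tokens: 50, cache_read_input_tokens: 5501 }])`.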

Complete Real Examples

Example 1: Simple Text Response

Request:

{
  "model": "claude-haiku-4-5-20251001",
  "messages": [{
    "role": "user",
    "content": [{"type": "text", "text": "I'm ready to help"}]
  }],
  "max_tokens": 32000,
  "stream": true
}

Response Stream:

event: message_start
data: {"type":"message_start","message":{...,"usage":{"input_tokens":3,"cache_creation_input_tokens":5501,...}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'm ready to help you search"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" and analyze the codebase."}}

event: ping
data: {"type":"ping"}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

Example 2: Tool Use (Read File)

Request:

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{
    "role": "user",
    "content": [{"type": "text", "text": "Read package.json"}]
  }],
  "tools": [...],
  "max_tokens": 32000,
  "stream": true
}

Response Stream:

event: message_start
data: {...}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the package.json file."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01XYZ","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file_path\":\"/path/to/package.json\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":45}}

event: message_stop
data: {"type":"message_stop"}

Summary

Protocol Essentials

OAuth 2.0 via authorization: Bearer header
Always Streaming with SSE
Fine-Grained Streaming (small text chunks, partial-JSON tool inputs)
Extensive Caching (cache reads billed at 0.1x the input rate)
Multi-Model (Haiku warmup, Sonnet execution)
16 Core Tools with JSON Schema definitions

For Proxy Implementers

MUST Support:

✅ OAuth 2.0 authorization: Bearer header forwarding
✅ SSE streaming responses
✅ Fine-grained tool input streaming (input_json_delta)
✅ Prompt caching with cache_control
✅ Beta features: oauth-2025-04-20, interleaved-thinking-2025-05-14, fine-grained-tool-streaming-2025-05-14
✅ 600s timeout minimum
✅ Tool result conversation continuity

Observed Patterns:

Text streams in ~2-10 word chunks
Tool inputs stream as partial JSON strings
Ping events every ~few chunks
Cache hit rate: ~90% after first call
Stop reason determines next action

Monitor Mode Usage

To capture your own traffic:

# OAuth mode (uses Claude Code auth)
claudish --monitor --debug "your complex query here"

# Logs saved to: logs/claudish_TIMESTAMP.log

Requirements:

Authenticated with claude auth login
OR set ANTHROPIC_API_KEY=sk-ant-api03-...

Document Version: 1.0.0
Last Updated: 2025-11-11
Based On: 924KB real traffic capture (14 API calls, 16 tool uses)
Status: ✅ COMPLETE - All major patterns documented

Appendix: Beta Features

`oauth-2025-04-20`

OAuth 2.0 authentication support.

Enables:

authorization: Bearer token auth
No x-api-key required
Session-based authentication

`interleaved-thinking-2025-05-14`

Thinking mode (extended reasoning).

Expected (not observed in our capture):

thinking content blocks
Internal reasoning exposed
Pattern: [thinking] → [text]

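Note: Not triggered by our queries. The captured request bodies carry no thinking parameter, which the Messages API needs before it emits thinking blocks; the beta flag alone does not enable them.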

`fine-grained-tool-streaming-2025-05-14`

Incremental tool input streaming.