Claude Code Streaming Protocol - Complete Explanation

Visual guide to understanding how Server-Sent Events (SSE) streaming works in Claude Code.

Based on real captured traffic from monitor mode.


How Streaming Communication Works

The Big Picture

Claude Code                    Claudish Proxy              Anthropic API
    |                                |                           |
    |------ POST /v1/messages ------>|                           |
    |  (JSON request body)           |                           |
    |                                |------ POST /v1/messages ->|
    |                                |  (same JSON body)         |
    |                                |                           |
    |                                |<----- SSE Stream ---------|
    |                                |  (text/event-stream)      |
    |<----- SSE Stream --------------|                           |
    |  (forwarded as-is)             |                           |
    |                                |                           |
    |  [Reading events...]           |  [Logging events...]      |
    |                                |                           |

SSE (Server-Sent Events) Format

What is SSE?

SSE is a standard for streaming text data from server to client over HTTP:

Content-Type: text/event-stream

event: event_name
data: {"json":"data"}

event: another_event
data: {"more":"data"}

Key Characteristics:

  • Plain text protocol
  • Events separated by blank lines (\n\n)
  • Each event has event: and data: lines
  • Connection stays open
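
A client therefore only needs to split the body on blank lines and read the event:/data: fields. A minimal parsing sketch (illustrative, not Claude Code's actual implementation):

// Minimal SSE parser sketch (illustrative). Splits a complete
// text/event-stream body into { event, data } pairs.
interface SSEEvent {
  event: string;
  data: string;
}

function parseSSE(body: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  for (const block of body.split("\n\n")) {        // events end with \n\n
    let event = "";
    let data = "";
    for (const line of block.split("\n")) {
      if (line.startsWith("event: ")) event = line.slice(7);
      else if (line.startsWith("data: ")) data += line.slice(6);
    }
    if (event || data) events.push({ event, data });
  }
  return events;
}

// In the Anthropic stream, data is always a single JSON object:
// JSON.parse(events[0].data) -> { type: "message_start", ... }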

Complete Streaming Sequence (Real Example)

Step 1: Client Sends Request

Claude Code → Proxy:

POST /v1/messages HTTP/1.1
Host: 127.0.0.1:5285
Content-Type: application/json
authorization: Bearer sk-ant-oat01-...
anthropic-beta: oauth-2025-04-20,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14

{
  "model": "claude-haiku-4-5-20251001",
  "messages": [{
    "role": "user",
    "content": [{"type": "text", "text": "Analyze this codebase"}]
  }],
  "max_tokens": 32000,
  "stream": true
}

Step 2: Server Responds with SSE

Anthropic API → Proxy → Claude Code:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","model":"claude-haiku-4-5-20251001","usage":{"input_tokens":3,"cache_creation_input_tokens":5501}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"'m ready to help you search"}}

event: ping
data: {"type":"ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" and analyze the"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" codebase."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

Step 3: Client Reconstructs Response

Conceptually, Claude Code processes the events like this (pseudocode — the stream.on(...) handlers stand in for an SSE parser that dispatches by event name):

let fullText = "";
let messageId = "";
let usage = {};

// Read SSE stream
stream.on('event:message_start', (data) => {
  messageId = data.message.id;
  usage = data.message.usage;
});

stream.on('event:content_block_delta', (data) => {
  if (data.delta.type === 'text_delta') {
    fullText += data.delta.text;
    // Display incrementally to user
    console.log(data.delta.text);
  }
});

stream.on('event:message_stop', () => {
  // Complete! Final text: "I'm ready to help you search and analyze the codebase."
});

Event Types Explained

1. message_start - Initialize Message

When: First event in every stream

Purpose: Provide message metadata and usage stats

Example:

{
  "type": "message_start",
  "message": {
    "id": "msg_01Bnhgy47DDidiGYfAEX5zkm",
    "model": "claude-haiku-4-5-20251001",
    "role": "assistant",
    "content": [],
    "usage": {
      "input_tokens": 3,
      "cache_creation_input_tokens": 5501,
      "cache_read_input_tokens": 0,
      "output_tokens": 1
    }
  }
}

What Claude Code Does:

  • Extracts message ID
  • Records cache metrics (important for cost tracking!)
  • Initializes content array

2. content_block_start - Begin Content Block

When: Starting a new text or tool block

Purpose: Declare block type

Example (Text Block):

{
  "type": "content_block_start",
  "index": 0,
  "content_block": {
    "type": "text",
    "text": ""
  }
}

Example (Tool Block):

{
  "type": "content_block_start",
  "index": 1,
  "content_block": {
    "type": "tool_use",
    "id": "toolu_01XYZ",
    "name": "Read",
    "input": {}
  }
}

What Claude Code Does:

  • Creates new content block
  • Prepares to receive deltas
  • Displays block header if needed

3. content_block_delta - Stream Content

When: Incrementally sending content

Purpose: Send text/tool input piece by piece

Text Delta:

{
  "type": "content_block_delta",
  "index": 0,
  "delta": {
    "type": "text_delta",
    "text": "I'm ready to help"
  }
}

Tool Input Delta:

{
  "type": "content_block_delta",
  "index": 1,
  "delta": {
    "type": "input_json_delta",
    "partial_json": "{\"file_path\":\"/Users/"
  }
}

What Claude Code Does:

  • Text: Append to buffer, display immediately
  • Tool Input: Concatenate JSON fragments
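
A sketch of that accumulation, keyed by block index so text and tool blocks can interleave (names are illustrative, not Claude Code internals):

// Accumulate deltas per content block index (illustrative sketch).
type BlockState =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; partialJson: string };

const blocks = new Map<number, BlockState>();

function onBlockStart(e: { index: number; content_block: any }) {
  const b = e.content_block;
  blocks.set(e.index, b.type === "text"
    ? { type: "text", text: "" }
    : { type: "tool_use", id: b.id, name: b.name, partialJson: "" });
}

function onBlockDelta(e: { index: number; delta: any }) {
  const block = blocks.get(e.index);
  if (!block) return;
  if (e.delta.type === "text_delta" && block.type === "text") {
    block.text += e.delta.text;                      // display immediately
  } else if (e.delta.type === "input_json_delta" && block.type === "tool_use") {
    block.partialJson += e.delta.partial_json;       // parse at content_block_stop
  }
}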

Streaming Granularity:

Real example from logs:

Delta 1: "I"
Delta 2: "'m ready to help you search"
Delta 3: " an"
Delta 4: "d analyze the"
Delta 5: " codebase. I have access"
...

Very fine-grained! Each delta is 1-20 characters.

4. ping - Keep Alive

When: Periodically during long streams

Purpose: Prevent connection timeout

Example:

{
  "type": "ping"
}

What Claude Code Does:

  • Ignores (doesn't affect content)
  • Resets timeout timer

5. content_block_stop - End Content Block

When: Content block is complete

Purpose: Signal block finished

Example:

{
  "type": "content_block_stop",
  "index": 0
}

What Claude Code Does:

  • Finalizes block
  • Moves to next block if any

6. message_delta - Update Message Metadata

When: Near end of stream

Purpose: Provide stop_reason and final usage

Example:

{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {
    "output_tokens": 145
  }
}

Stop Reasons:

  • end_turn - Normal completion
  • max_tokens - Hit token limit
  • tool_use - Wants to call tools
  • stop_sequence - Hit stop sequence

What Claude Code Does:

  • Records why stream ended
  • Updates final token count
  • Determines next action

7. message_stop - End Stream

When: Final event

Purpose: Signal stream complete

Example:

{
  "type": "message_stop"
}

What Claude Code Does:

  • Closes connection
  • Returns control to user
  • Or executes tools if stop_reason: "tool_use"
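
A minimal sketch of that decision (handler names are placeholders, not Claude Code internals):

// After message_stop, the stop_reason captured from message_delta
// decides what happens next (illustrative handlers).
type StopReason = "end_turn" | "max_tokens" | "tool_use" | "stop_sequence";

async function onMessageStop(
  stopReason: StopReason,
  handlers: {
    runTools: () => Promise<void>;   // execute tools, send tool_result next turn
    finish: () => void;              // return control to the user
  }
) {
  if (stopReason === "tool_use") {
    await handlers.runTools();       // continue the agent loop with tool results
  } else {
    handlers.finish();               // end_turn, max_tokens, or stop_sequence
  }
}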

Tool Call Streaming (Fine-Grained)

Text Block Then Tool Block

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the file."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\"/path/to/package.json\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":45}}

event: message_stop
data: {"type":"message_stop"}

Reconstructing Tool Input

let toolInput = "";

// Receive deltas
toolInput += "{\"file";              // Delta 1
toolInput += "_path\":\"/path/to/package.json\"}";  // Delta 2

// Parse complete JSON
const params = JSON.parse(toolInput);
// Result: {file_path: "/path/to/package.json"}

// Execute tool
const result = await readFile(params.file_path);

// Send tool_result in next request
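
The follow-up request then carries the result back as a tool_result content block referencing the tool_use id from the stream above (values mirror the earlier example; the shape follows the Anthropic Messages API):

// Sketch of the follow-up request body carrying the tool result.
const followUpBody = {
  model: "claude-haiku-4-5-20251001",
  max_tokens: 32000,
  stream: true,
  messages: [
    // ...original user message...
    {
      role: "assistant",
      content: [
        { type: "text", text: "I'll read the file." },
        { type: "tool_use", id: "toolu_01ABC", name: "Read",
          input: { file_path: "/path/to/package.json" } },
      ],
    },
    {
      role: "user",
      content: [
        { type: "tool_result", tool_use_id: "toolu_01ABC",
          content: "<contents of package.json>" },
      ],
    },
  ],
};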

Why Streaming?

Benefits

  1. Immediate Feedback

    • User sees response appear word-by-word
    • Better UX than waiting for complete response
  2. Lower Perceived Latency

    • No need to wait for full generation
    • Can start displaying/processing immediately
  3. Tool Calls Visible

    • User sees "thinking" process
    • Tool calls stream as they're generated
  4. Better Error Handling

    • Errors can be detected mid-stream
    • Connection issues surface immediately

Drawbacks

  1. Complex Parsing

    • Must handle partial JSON
    • Event order matters
    • Concatenation required
  2. Connection Management

    • Must handle disconnects
    • Timeouts need management
    • Reconnection logic needed
  3. Buffering Challenges

    • Character encoding issues
    • Partial UTF-8 characters
    • Line boundary detection

How Claudish Handles Streaming

Monitor Mode (Pass-Through)

// proxy-server.ts:194-247

if (contentType.includes("text/event-stream")) {
  return c.body(
    new ReadableStream({
      async start(controller) {
        const reader = anthropicResponse.body?.getReader();
        const decoder = new TextDecoder();
        let buffer = "";
        let eventLog = "";

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          // Pass through to Claude Code immediately
          controller.enqueue(value);

          // Also log for analysis
          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split("\n");
          buffer = lines.pop() || "";

          for (const line of lines) {
            if (line.trim()) {
              eventLog += line + "\n";
            }
          }
        }

        // Log complete stream
        log(eventLog);
        controller.close();
      },
    })
  );
}

Key Points:

  1. Pass-through: Forward bytes immediately to Claude Code
  2. No modification: Don't parse or transform
  3. Logging: Decode and log for analysis
  4. Line buffering: Handle partial lines correctly

OpenRouter Mode (Translation)

// proxy-server.ts:583-896

// Send initial events IMMEDIATELY
sendSSE("message_start", {...});
sendSSE("content_block_start", {...});
sendSSE("ping", {...});

// Read OpenRouter stream
const reader = openrouterResponse.body?.getReader();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() || "";

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;

    const data = JSON.parse(line.slice(6));

    if (data.choices[0].delta.content) {
      // Send text delta
      sendSSE("content_block_delta", {
        type: "content_block_delta",
        index: 0,
        delta: {
          type: "text_delta",
          text: data.choices[0].delta.content
        }
      });
    }

    if (data.choices[0].delta.tool_calls) {
      // Send tool input deltas
      // ...complex tool streaming logic
    }
  }
}

// Send final events
sendSSE("content_block_stop", {...});
sendSSE("message_delta", {...});
sendSSE("message_stop", {...});

Key Points:

  1. OpenAI → Anthropic: Transform event format
  2. Buffer management: Handle partial lines
  3. Tool call mapping: Convert OpenAI tool format
  4. Immediate events: Send message_start before first chunk

Real Example: Word-by-Word Assembly

From our logs, here's how one sentence streams:

Original sentence: "I'm ready to help you search and analyze the codebase."

Delta 1:  "I"
Delta 2:  "'m ready to help you search"
Delta 3:  " an"
Delta 4:  "d analyze the"
Delta 5:  " codebase."

Assembled: "I" + "'m ready to help you search" + " an" + "d analyze the" + " codebase."
Result:    "I'm ready to help you search and analyze the codebase."

Why so granular?

  • Model generates text incrementally
  • Anthropic sends tokens as soon as they are generated (low latency)
  • Network packets don't align with word boundaries
  • The fine-grained-tool-streaming beta (see the request headers) applies the same granularity to tool input

Cache Metrics in Streaming

First Call (Creates Cache)

event: message_start
data: {
  "usage": {
    "input_tokens": 3,
    "cache_creation_input_tokens": 5501,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 5501
    }
  }
}

Meaning:

  • Read 3 new tokens
  • Wrote 5501 tokens to cache (5-minute TTL)
  • Cache will be available for next 5 minutes

Subsequent Calls (Reads Cache)

event: message_start
data: {
  "usage": {
    "input_tokens": 50,
    "cache_read_input_tokens": 5501
  }
}

Meaning:

  • Read 50 new tokens
  • Read 5501 cached tokens (90% discount!)
  • Total effective: 50 + (5501 * 0.1) = 600.1 tokens
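
A small sketch of that arithmetic (assuming cache reads are billed at 10% of the normal input-token rate, as above; cache writes are not modeled):

// Effective input tokens, counting cached reads at 10% of the normal rate.
function effectiveInputTokens(usage: {
  input_tokens: number;
  cache_read_input_tokens?: number;
}): number {
  const cached = usage.cache_read_input_tokens ?? 0;
  return usage.input_tokens + cached * 0.1;
}

effectiveInputTokens({ input_tokens: 50, cache_read_input_tokens: 5501 });
// => 600.1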

Summary

How Streaming Works

  1. Client sends: Single HTTP POST with stream: true
  2. Server responds: Content-Type: text/event-stream
  3. Events stream: 7 event types in sequence
  4. Client assembles: Concatenate deltas to build response
  5. Connection closes: After message_stop event

Key Insights

  • Always streaming: 100% of Claude Code responses
  • Fine-grained: Text streams 1-20 chars per delta
  • Tools stream too: input_json_delta for tool parameters
  • Cache info included: Usage stats in message_start
  • Stop reason determines action: tool_use triggers execution loop

For Proxy Implementers

MUST:

  • Support SSE (text/event-stream)
  • Forward all 7 event types
  • Handle partial JSON in tool inputs
  • Buffer partial lines correctly
  • Send events immediately (don't batch)
  • Include cache metrics
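
A minimal emitter sketch that satisfies the wire-format and no-batching requirements (the sendSSE name mirrors the excerpt above; the controller is whatever stream the proxy writes to):

// Emit one SSE frame ("event: ...\ndata: {...}\n\n") immediately, no batching.
const encoder = new TextEncoder();

function makeSendSSE(controller: ReadableStreamDefaultController<Uint8Array>) {
  return (event: string, data: unknown) => {
    const frame = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
    controller.enqueue(encoder.encode(frame));
  };
}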

Common Pitfalls:

  • Buffering whole response before sending
  • Not handling partial UTF-8 characters
  • Batching events (breaks UX)
  • Missing ping events (causes timeouts)
  • Wrong event sequence (breaks parsing)

Last Updated: 2025-11-11
Based On: Real traffic capture from monitor mode
Status: Complete with real examples