665 lines
15 KiB
Markdown
665 lines
15 KiB
Markdown
|
|
# Claude Code Streaming Protocol - Complete Explanation
|
||
|
|
|
||
|
|
> **Visual guide** to understanding how Server-Sent Events (SSE) streaming works in Claude Code.
|
||
|
|
>
|
||
|
|
> Based on real captured traffic from monitor mode.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## How Streaming Communication Works
|
||
|
|
|
||
|
|
### The Big Picture
|
||
|
|
|
||
|
|
```
|
||
|
|
Claude Code Claudish Proxy Anthropic API
|
||
|
|
| | |
|
||
|
|
|------ POST /v1/messages ------>| |
|
||
|
|
| (JSON request body) | |
|
||
|
|
| |------ POST /v1/messages ->|
|
||
|
|
| | (same JSON body) |
|
||
|
|
| | |
|
||
|
|
| |<----- SSE Stream ---------|
|
||
|
|
| | (text/event-stream) |
|
||
|
|
|<----- SSE Stream --------------| |
|
||
|
|
| (forwarded as-is) | |
|
||
|
|
| | |
|
||
|
|
| [Reading events...] | [Logging events...] |
|
||
|
|
| | |
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## SSE (Server-Sent Events) Format
|
||
|
|
|
||
|
|
### What is SSE?
|
||
|
|
|
||
|
|
SSE is a standard for streaming text data from server to client over HTTP:
|
||
|
|
|
||
|
|
```
|
||
|
|
Content-Type: text/event-stream
|
||
|
|
|
||
|
|
event: event_name
|
||
|
|
data: {"json":"data"}
|
||
|
|
|
||
|
|
event: another_event
|
||
|
|
data: {"more":"data"}
|
||
|
|
|
||
|
|
```
|
||
|
|
|
||
|
|
**Key Characteristics:**
|
||
|
|
- Plain text protocol
|
||
|
|
- Events separated by blank lines (`\n\n`)
|
||
|
|
- Each event has `event:` and `data:` lines
|
||
|
|
- Connection stays open
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Complete Streaming Sequence (Real Example)
|
||
|
|
|
||
|
|
### Step 1: Client Sends Request
|
||
|
|
|
||
|
|
**Claude Code → Proxy:**
|
||
|
|
```http
|
||
|
|
POST /v1/messages HTTP/1.1
|
||
|
|
Host: 127.0.0.1:5285
|
||
|
|
Content-Type: application/json
|
||
|
|
authorization: Bearer sk-ant-oat01-...
|
||
|
|
anthropic-beta: oauth-2025-04-20,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14
|
||
|
|
|
||
|
|
{
|
||
|
|
"model": "claude-haiku-4-5-20251001",
|
||
|
|
"messages": [{
|
||
|
|
"role": "user",
|
||
|
|
"content": [{"type": "text", "text": "Analyze this codebase"}]
|
||
|
|
}],
|
||
|
|
"max_tokens": 32000,
|
||
|
|
"stream": true
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Step 2: Server Responds with SSE
|
||
|
|
|
||
|
|
**Anthropic API → Proxy → Claude Code:**
|
||
|
|
|
||
|
|
```
|
||
|
|
HTTP/1.1 200 OK
|
||
|
|
Content-Type: text/event-stream
|
||
|
|
Cache-Control: no-cache
|
||
|
|
Connection: keep-alive
|
||
|
|
|
||
|
|
event: message_start
|
||
|
|
data: {"type":"message_start","message":{"id":"msg_01ABC","model":"claude-haiku-4-5-20251001","usage":{"input_tokens":3,"cache_creation_input_tokens":5501}}}
|
||
|
|
|
||
|
|
event: content_block_start
|
||
|
|
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
|
||
|
|
|
||
|
|
event: content_block_delta
|
||
|
|
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I"}}
|
||
|
|
|
||
|
|
event: content_block_delta
|
||
|
|
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"'m ready to help you search"}}
|
||
|
|
|
||
|
|
event: ping
|
||
|
|
data: {"type":"ping"}
|
||
|
|
|
||
|
|
event: content_block_delta
|
||
|
|
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" and analyze the"}}
|
||
|
|
|
||
|
|
event: content_block_delta
|
||
|
|
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" codebase."}}
|
||
|
|
|
||
|
|
event: content_block_stop
|
||
|
|
data: {"type":"content_block_stop","index":0}
|
||
|
|
|
||
|
|
event: message_delta
|
||
|
|
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}
|
||
|
|
|
||
|
|
event: message_stop
|
||
|
|
data: {"type":"message_stop"}
|
||
|
|
|
||
|
|
```
|
||
|
|
|
||
|
|
### Step 3: Client Reconstructs Response
|
||
|
|
|
||
|
|
**Claude Code processes events:**
|
||
|
|
|
||
|
|
```javascript
|
||
|
|
let fullText = "";
|
||
|
|
let messageId = "";
|
||
|
|
let usage = {};
|
||
|
|
|
||
|
|
// Read SSE stream
|
||
|
|
stream.on('event:message_start', (data) => {
|
||
|
|
messageId = data.message.id;
|
||
|
|
usage = data.message.usage;
|
||
|
|
});
|
||
|
|
|
||
|
|
stream.on('event:content_block_delta', (data) => {
|
||
|
|
if (data.delta.type === 'text_delta') {
|
||
|
|
fullText += data.delta.text;
|
||
|
|
// Display incrementally to user
|
||
|
|
console.log(data.delta.text);
|
||
|
|
}
|
||
|
|
});
|
||
|
|
|
||
|
|
stream.on('event:message_stop', () => {
|
||
|
|
// Complete! Final text: "I'm ready to help you search and analyze the codebase."
|
||
|
|
});
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Event Types Explained
|
||
|
|
|
||
|
|
### 1. `message_start` - Initialize Message
|
||
|
|
|
||
|
|
**When:** First event in every stream
|
||
|
|
|
||
|
|
**Purpose:** Provide message metadata and usage stats
|
||
|
|
|
||
|
|
**Example:**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "message_start",
|
||
|
|
"message": {
|
||
|
|
"id": "msg_01Bnhgy47DDidiGYfAEX5zkm",
|
||
|
|
"model": "claude-haiku-4-5-20251001",
|
||
|
|
"role": "assistant",
|
||
|
|
"content": [],
|
||
|
|
"usage": {
|
||
|
|
"input_tokens": 3,
|
||
|
|
"cache_creation_input_tokens": 5501,
|
||
|
|
"cache_read_input_tokens": 0,
|
||
|
|
"output_tokens": 1
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**What Claude Code Does:**
|
||
|
|
- Extracts message ID
|
||
|
|
- Records cache metrics (important for cost tracking!)
|
||
|
|
- Initializes content array
|
||
|
|
|
||
|
|
### 2. `content_block_start` - Begin Content Block
|
||
|
|
|
||
|
|
**When:** Starting a new text or tool block
|
||
|
|
|
||
|
|
**Purpose:** Declare block type
|
||
|
|
|
||
|
|
**Example (Text Block):**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "content_block_start",
|
||
|
|
"index": 0,
|
||
|
|
"content_block": {
|
||
|
|
"type": "text",
|
||
|
|
"text": ""
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**Example (Tool Block):**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "content_block_start",
|
||
|
|
"index": 1,
|
||
|
|
"content_block": {
|
||
|
|
"type": "tool_use",
|
||
|
|
"id": "toolu_01XYZ",
|
||
|
|
"name": "Read",
|
||
|
|
"input": {}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**What Claude Code Does:**
|
||
|
|
- Creates new content block
|
||
|
|
- Prepares to receive deltas
|
||
|
|
- Displays block header if needed
|
||
|
|
|
||
|
|
### 3. `content_block_delta` - Stream Content
|
||
|
|
|
||
|
|
**When:** Incrementally sending content
|
||
|
|
|
||
|
|
**Purpose:** Send text/tool input piece by piece
|
||
|
|
|
||
|
|
**Text Delta:**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "content_block_delta",
|
||
|
|
"index": 0,
|
||
|
|
"delta": {
|
||
|
|
"type": "text_delta",
|
||
|
|
"text": "I'm ready to help"
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**Tool Input Delta:**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "content_block_delta",
|
||
|
|
"index": 1,
|
||
|
|
"delta": {
|
||
|
|
"type": "input_json_delta",
|
||
|
|
"partial_json": "{\"file_path\":\"/Users/"
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**What Claude Code Does:**
|
||
|
|
- **Text:** Append to buffer, display immediately
|
||
|
|
- **Tool Input:** Concatenate JSON fragments
|
||
|
|
|
||
|
|
**Streaming Granularity:**
|
||
|
|
```
|
||
|
|
Real example from logs:
|
||
|
|
|
||
|
|
Delta 1: "I"
|
||
|
|
Delta 2: "'m ready to help you search"
|
||
|
|
Delta 3: " an"
|
||
|
|
Delta 4: "d analyze the"
|
||
|
|
Delta 5: " codebase. I have access"
|
||
|
|
...
|
||
|
|
```
|
||
|
|
|
||
|
|
Very fine-grained! Each delta is 1-20 characters.
|
||
|
|
|
||
|
|
### 4. `ping` - Keep Alive
|
||
|
|
|
||
|
|
**When:** Periodically during long streams
|
||
|
|
|
||
|
|
**Purpose:** Prevent connection timeout
|
||
|
|
|
||
|
|
**Example:**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "ping"
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**What Claude Code Does:**
|
||
|
|
- Ignores (doesn't affect content)
|
||
|
|
- Resets timeout timer
|
||
|
|
|
||
|
|
### 5. `content_block_stop` - End Content Block
|
||
|
|
|
||
|
|
**When:** Content block is complete
|
||
|
|
|
||
|
|
**Purpose:** Signal block finished
|
||
|
|
|
||
|
|
**Example:**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "content_block_stop",
|
||
|
|
"index": 0
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**What Claude Code Does:**
|
||
|
|
- Finalizes block
|
||
|
|
- Moves to next block if any
|
||
|
|
|
||
|
|
### 6. `message_delta` - Update Message Metadata
|
||
|
|
|
||
|
|
**When:** Near end of stream
|
||
|
|
|
||
|
|
**Purpose:** Provide stop_reason and final usage
|
||
|
|
|
||
|
|
**Example:**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "message_delta",
|
||
|
|
"delta": {
|
||
|
|
"stop_reason": "end_turn",
|
||
|
|
"stop_sequence": null
|
||
|
|
},
|
||
|
|
"usage": {
|
||
|
|
"output_tokens": 145
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**Stop Reasons:**
|
||
|
|
- `end_turn` - Normal completion
|
||
|
|
- `max_tokens` - Hit token limit
|
||
|
|
- `tool_use` - Wants to call tools
|
||
|
|
- `stop_sequence` - Hit stop sequence
|
||
|
|
|
||
|
|
**What Claude Code Does:**
|
||
|
|
- Records why stream ended
|
||
|
|
- Updates final token count
|
||
|
|
- Determines next action
|
||
|
|
|
||
|
|
### 7. `message_stop` - End Stream
|
||
|
|
|
||
|
|
**When:** Final event
|
||
|
|
|
||
|
|
**Purpose:** Signal stream complete
|
||
|
|
|
||
|
|
**Example:**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"type": "message_stop"
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**What Claude Code Does:**
|
||
|
|
- Closes connection
|
||
|
|
- Returns control to user
|
||
|
|
- Or executes tools if `stop_reason: "tool_use"`
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Tool Call Streaming (Fine-Grained)
|
||
|
|
|
||
|
|
### Text Block Then Tool Block
|
||
|
|
|
||
|
|
```
|
||
|
|
event: content_block_start
|
||
|
|
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
|
||
|
|
|
||
|
|
event: content_block_delta
|
||
|
|
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'll read the file."}}
|
||
|
|
|
||
|
|
event: content_block_stop
|
||
|
|
data: {"type":"content_block_stop","index":0}
|
||
|
|
|
||
|
|
event: content_block_start
|
||
|
|
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"Read","input":{}}}
|
||
|
|
|
||
|
|
event: content_block_delta
|
||
|
|
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"file"}}
|
||
|
|
|
||
|
|
event: content_block_delta
|
||
|
|
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"_path\":\"/path/to/package.json\"}"}}
|
||
|
|
|
||
|
|
event: content_block_stop
|
||
|
|
data: {"type":"content_block_stop","index":1}
|
||
|
|
|
||
|
|
event: message_delta
|
||
|
|
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":45}}
|
||
|
|
|
||
|
|
event: message_stop
|
||
|
|
data: {"type":"message_stop"}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Reconstructing Tool Input
|
||
|
|
|
||
|
|
```javascript
|
||
|
|
let toolInput = "";
|
||
|
|
|
||
|
|
// Receive deltas
|
||
|
|
toolInput += "{\"file"; // Delta 1
|
||
|
|
toolInput += "_path\":\"/path/to/package.json\"}"; // Delta 2
|
||
|
|
|
||
|
|
// Parse complete JSON
|
||
|
|
const params = JSON.parse(toolInput);
|
||
|
|
// Result: {file_path: "/path/to/package.json"}
|
||
|
|
|
||
|
|
// Execute tool
|
||
|
|
const result = await readFile(params.file_path);
|
||
|
|
|
||
|
|
// Send tool_result in next request
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Why Streaming?
|
||
|
|
|
||
|
|
### Benefits
|
||
|
|
|
||
|
|
1. **Immediate Feedback**
|
||
|
|
- User sees response appear word-by-word
|
||
|
|
- Better UX than waiting for complete response
|
||
|
|
|
||
|
|
2. **Reduced Latency**
|
||
|
|
- No need to wait for full generation
|
||
|
|
- Can start displaying/processing immediately
|
||
|
|
|
||
|
|
3. **Tool Calls Visible**
|
||
|
|
- User sees "thinking" process
|
||
|
|
- Tool calls stream as they're generated
|
||
|
|
|
||
|
|
4. **Better Error Handling**
|
||
|
|
- Can detect errors mid-stream
|
||
|
|
- Connection issues obvious
|
||
|
|
|
||
|
|
### Drawbacks
|
||
|
|
|
||
|
|
1. **Complex Parsing**
|
||
|
|
- Must handle partial JSON
|
||
|
|
- Event order matters
|
||
|
|
- Concatenation required
|
||
|
|
|
||
|
|
2. **Connection Management**
|
||
|
|
- Must handle disconnects
|
||
|
|
- Timeouts need management
|
||
|
|
- Reconnection logic needed
|
||
|
|
|
||
|
|
3. **Buffering Challenges**
|
||
|
|
- Character encoding issues
|
||
|
|
- Partial UTF-8 characters
|
||
|
|
- Line boundary detection
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## How Claudish Handles Streaming
|
||
|
|
|
||
|
|
### Monitor Mode (Pass-Through)
|
||
|
|
|
||
|
|
```typescript
|
||
|
|
// proxy-server.ts:194-247
|
||
|
|
|
||
|
|
if (contentType.includes("text/event-stream")) {
|
||
|
|
return c.body(
|
||
|
|
new ReadableStream({
|
||
|
|
async start(controller) {
|
||
|
|
const reader = anthropicResponse.body?.getReader();
|
||
|
|
const decoder = new TextDecoder();
|
||
|
|
let buffer = "";
|
||
|
|
let eventLog = "";
|
||
|
|
|
||
|
|
while (true) {
|
||
|
|
const { done, value } = await reader.read();
|
||
|
|
if (done) break;
|
||
|
|
|
||
|
|
// Pass through to Claude Code immediately
|
||
|
|
controller.enqueue(value);
|
||
|
|
|
||
|
|
// Also log for analysis
|
||
|
|
buffer += decoder.decode(value, { stream: true });
|
||
|
|
const lines = buffer.split("\n");
|
||
|
|
buffer = lines.pop() || "";
|
||
|
|
|
||
|
|
for (const line of lines) {
|
||
|
|
if (line.trim()) {
|
||
|
|
eventLog += line + "\n";
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
|
||
|
|
// Log complete stream
|
||
|
|
log(eventLog);
|
||
|
|
controller.close();
|
||
|
|
},
|
||
|
|
})
|
||
|
|
);
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**Key Points:**
|
||
|
|
1. **Pass-through:** Forward bytes immediately to Claude Code
|
||
|
|
2. **No modification:** Don't parse or transform
|
||
|
|
3. **Logging:** Decode and log for analysis
|
||
|
|
4. **Line buffering:** Handle partial lines correctly
|
||
|
|
|
||
|
|
### OpenRouter Mode (Translation)
|
||
|
|
|
||
|
|
```typescript
|
||
|
|
// proxy-server.ts:583-896
|
||
|
|
|
||
|
|
// Send initial events IMMEDIATELY
|
||
|
|
sendSSE("message_start", {...});
|
||
|
|
sendSSE("content_block_start", {...});
|
||
|
|
sendSSE("ping", {...});
|
||
|
|
|
||
|
|
// Read OpenRouter stream
|
||
|
|
const reader = openrouterResponse.body?.getReader();
|
||
|
|
let buffer = "";
|
||
|
|
|
||
|
|
while (true) {
|
||
|
|
const { done, value } = await reader.read();
|
||
|
|
if (done) break;
|
||
|
|
|
||
|
|
buffer += decoder.decode(value, { stream: true });
|
||
|
|
const lines = buffer.split("\n");
|
||
|
|
buffer = lines.pop() || "";
|
||
|
|
|
||
|
|
for (const line of lines) {
|
||
|
|
if (!line.startsWith("data: ")) continue;
|
||
|
|
|
||
|
|
const data = JSON.parse(line.slice(6));
|
||
|
|
|
||
|
|
if (data.choices[0].delta.content) {
|
||
|
|
// Send text delta
|
||
|
|
sendSSE("content_block_delta", {
|
||
|
|
type: "content_block_delta",
|
||
|
|
index: 0,
|
||
|
|
delta: {
|
||
|
|
type: "text_delta",
|
||
|
|
text: data.choices[0].delta.content
|
||
|
|
}
|
||
|
|
});
|
||
|
|
}
|
||
|
|
|
||
|
|
if (data.choices[0].delta.tool_calls) {
|
||
|
|
// Send tool input deltas
|
||
|
|
// ...complex tool streaming logic
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
|
||
|
|
// Send final events
|
||
|
|
sendSSE("content_block_stop", {...});
|
||
|
|
sendSSE("message_delta", {...});
|
||
|
|
sendSSE("message_stop", {...});
|
||
|
|
```
|
||
|
|
|
||
|
|
**Key Points:**
|
||
|
|
1. **OpenAI → Anthropic:** Transform event format
|
||
|
|
2. **Buffer management:** Handle partial lines
|
||
|
|
3. **Tool call mapping:** Convert OpenAI tool format
|
||
|
|
4. **Immediate events:** Send message_start before first chunk
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Real Example: Word-by-Word Assembly
|
||
|
|
|
||
|
|
From our logs, here's how one sentence streams:
|
||
|
|
|
||
|
|
```
|
||
|
|
Original sentence: "I'm ready to help you search and analyze the codebase."
|
||
|
|
|
||
|
|
Delta 1: "I"
|
||
|
|
Delta 2: "'m ready to help you search"
|
||
|
|
Delta 3: " an"
|
||
|
|
Delta 4: "d analyze the"
|
||
|
|
Delta 5: " codebase."
|
||
|
|
|
||
|
|
Assembled: "I" + "'m ready to help you search" + " an" + "d analyze the" + " codebase."
|
||
|
|
Result: "I'm ready to help you search and analyze the codebase."
|
||
|
|
```
|
||
|
|
|
||
|
|
**Why so granular?**
|
||
|
|
- Model generates text incrementally
|
||
|
|
- Anthropic sends immediately (low latency)
|
||
|
|
- Network packets don't align with word boundaries
|
||
|
|
- Fine-grained streaming beta feature
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Cache Metrics in Streaming
|
||
|
|
|
||
|
|
### First Call (Creates Cache)
|
||
|
|
|
||
|
|
```
|
||
|
|
event: message_start
|
||
|
|
data: {
|
||
|
|
"usage": {
|
||
|
|
"input_tokens": 3,
|
||
|
|
"cache_creation_input_tokens": 5501,
|
||
|
|
"cache_read_input_tokens": 0,
|
||
|
|
"cache_creation": {
|
||
|
|
"ephemeral_5m_input_tokens": 5501
|
||
|
|
}
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**Meaning:**
|
||
|
|
- Read 3 new tokens
|
||
|
|
- Wrote 5501 tokens to cache (5-minute TTL)
|
||
|
|
- Cache will be available for next 5 minutes
|
||
|
|
|
||
|
|
### Subsequent Calls (Reads Cache)
|
||
|
|
|
||
|
|
```
|
||
|
|
event: message_start
|
||
|
|
data: {
|
||
|
|
"usage": {
|
||
|
|
"input_tokens": 50,
|
||
|
|
"cache_read_input_tokens": 5501
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**Meaning:**
|
||
|
|
- Read 50 new tokens
|
||
|
|
- Read 5501 cached tokens (90% discount!)
|
||
|
|
- Total effective: 50 + (5501 * 0.1) = 600.1 tokens
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Summary
|
||
|
|
|
||
|
|
### How Streaming Works
|
||
|
|
|
||
|
|
1. **Client sends:** Single HTTP POST with `stream: true`
|
||
|
|
2. **Server responds:** `Content-Type: text/event-stream`
|
||
|
|
3. **Events stream:** 7 event types in sequence
|
||
|
|
4. **Client assembles:** Concatenate deltas to build response
|
||
|
|
5. **Connection closes:** After `message_stop` event
|
||
|
|
|
||
|
|
### Key Insights
|
||
|
|
|
||
|
|
- **Always streaming:** 100% of Claude Code responses
|
||
|
|
- **Fine-grained:** Text streams 1-20 chars per delta
|
||
|
|
- **Tools stream too:** `input_json_delta` for tool parameters
|
||
|
|
- **Cache info included:** Usage stats in `message_start`
|
||
|
|
- **Stop reason determines action:** `tool_use` triggers execution loop
|
||
|
|
|
||
|
|
### For Proxy Implementers
|
||
|
|
|
||
|
|
**MUST:**
|
||
|
|
- ✅ Support SSE (text/event-stream)
|
||
|
|
- ✅ Forward all 7 event types
|
||
|
|
- ✅ Handle partial JSON in tool inputs
|
||
|
|
- ✅ Buffer partial lines correctly
|
||
|
|
- ✅ Send events immediately (don't batch)
|
||
|
|
- ✅ Include cache metrics
|
||
|
|
|
||
|
|
**Common Pitfalls:**
|
||
|
|
- ❌ Buffering whole response before sending
|
||
|
|
- ❌ Not handling partial UTF-8 characters
|
||
|
|
- ❌ Batching events (breaks UX)
|
||
|
|
- ❌ Missing ping events (causes timeouts)
|
||
|
|
- ❌ Wrong event sequence (breaks parsing)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
**Last Updated:** 2025-11-11
|
||
|
|
**Based On:** Real traffic capture from monitor mode
|
||
|
|
**Status:** ✅ Complete with real examples
|