Protocol Compliance Plan: Achieving 1:1 Claude Code Compatibility
Goal: Ensure the Claudish proxy provides a user experience identical to official Claude Code, regardless of which model is used.
Status: Testing framework complete ✅ | Proxy fixes pending ⏳
Executive Summary
We have built a snapshot testing system that captures real Claude Code protocol interactions and validates proxy responses against them. The current proxy implementation is 60-70% compliant, with critical gaps in the streaming protocol, tool handling, and cache metrics.
What's Complete ✅
- Monitor Mode - Pass-through proxy with complete logging
- Fixture Capture - Tool to extract test cases from monitor logs
- Snapshot Tests - Automated validation of protocol compliance
- Protocol Validators - Event sequence, block indices, tool streaming, usage, stop reasons
- Example Fixtures - Documented examples for text and tool use
- Workflow Scripts - End-to-end capture → test automation
What's Pending ⏳
- Fix content block index management (CRITICAL)
- Add tool input JSON validation (CRITICAL)
- Implement continuous ping events (MEDIUM)
- Add cache metrics emulation (MEDIUM)
- Capture comprehensive fixture library (20+ scenarios)
- Run full test suite and fix remaining issues
Testing System Architecture
╔══════════════════════════════════════════════════════════════╗
║ MONITOR MODE (Capture) ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ 1. Run: ./dist/index.js --monitor "query" ║
║ 2. Captures: Request + Response (SSE events) ║
║ 3. Logs: Complete Anthropic API traffic ║
║ ║
║ Output: logs/capture_*.log ║
╚══════════════════════════════════════════════════════════════╝
↓
╔══════════════════════════════════════════════════════════════╗
║ FIXTURE GENERATION (Extract) ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ 1. Parse: bun tests/capture-fixture.ts logs/file.log ║
║ 2. Normalize: Dynamic values (IDs, timestamps) ║
║ 3. Analyze: Build assertions (blocks, sequence, usage) ║
║ ║
║ Output: tests/fixtures/*.json ║
╚══════════════════════════════════════════════════════════════╝
↓
╔══════════════════════════════════════════════════════════════╗
║ SNAPSHOT TESTING (Validate) ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ 1. Replay: Request through proxy ║
║ 2. Capture: Actual SSE response ║
║ 3. Validate: Against captured fixture ║
║ 4. Report: Pass/Fail with detailed errors ║
║ ║
║ Run: bun test tests/snapshot.test.ts ║
╚══════════════════════════════════════════════════════════════╝
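The normalization step in fixture generation replaces values that change between runs so that fixtures compare stably. A minimal sketch of that idea follows. The function name `normalizeDynamicValues`, the `msg_` and `toolu_` ID prefixes, and the placeholder strings are assumptions for illustration, not taken from tests/capture-fixture.ts.

```typescript
// Hypothetical helper: replaces run-specific values with stable placeholders.
// The ID prefixes and placeholder names are assumptions for illustration.
function normalizeDynamicValues(raw: string): string {
  return raw
    .replace(/\bmsg_[A-Za-z0-9]+/g, "msg_NORMALIZED")
    .replace(/\btoolu_[A-Za-z0-9]+/g, "toolu_NORMALIZED")
    // ISO-8601 UTC timestamps such as 2025-01-15T10:30:00.000Z
    .replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z/g, "TIMESTAMP");
}
```

Applying the same function to both the captured fixture and the proxy's live response would keep ID and timestamp differences from showing up as test failures.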
Protocol Requirements (From Analysis)
Streaming Events (7 Types)
Claude Code ALWAYS uses streaming. Complete sequence:
- message_start → Initialize message with usage
- content_block_start → Begin text or tool block
- content_block_delta → Stream content incrementally
- ping → Keep-alive (every 15s)
- content_block_stop → End content block
- message_delta → Stop reason + final usage
- message_stop → Stream complete
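The ordering rules in the list above can be checked mechanically. The sketch below is one possible validator, not the one in tests/snapshot.test.ts. It assumes that pings may appear at any point in the stream and that content blocks never overlap; `validateEventSequence` is a hypothetical name.

```typescript
type EventName =
  | "message_start" | "content_block_start" | "content_block_delta"
  | "ping" | "content_block_stop" | "message_delta" | "message_stop";

// Returns a list of problems; an empty list means the sequence looks valid.
function validateEventSequence(events: EventName[]): string[] {
  const errors: string[] = [];
  // Assumption: pings may appear anywhere, so ignore them for ordering checks.
  const core = events.filter((e) => e !== "ping");
  if (core[0] !== "message_start") errors.push("first event must be message_start");
  if (core[core.length - 1] !== "message_stop") errors.push("last event must be message_stop");
  if (core[core.length - 2] !== "message_delta") errors.push("message_delta must precede message_stop");

  // Everything between message_start and message_delta must be well-nested blocks.
  let blockOpen = false;
  for (const e of core.slice(1, -2)) {
    if (e === "content_block_start") {
      if (blockOpen) errors.push("content_block_start while a block is open");
      blockOpen = true;
    } else if (e === "content_block_delta") {
      if (!blockOpen) errors.push("content_block_delta outside a block");
    } else if (e === "content_block_stop") {
      if (!blockOpen) errors.push("content_block_stop without a matching start");
      blockOpen = false;
    } else {
      errors.push(`unexpected ${e} inside message body`);
    }
  }
  if (blockOpen) errors.push("content block never closed");
  return errors;
}
```

Returning a list of messages rather than a boolean lets a test report every violation in a stream at once.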
Content Block Management
Blocks must have sequential indices:
Expected: [text @ 0] [tool @ 1] [tool @ 2]
Current: [text @ 0] [tool @ 0] [tool @ 1] ❌ WRONG
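The sequential-index rule reduces to a one-line check over the indices of the `content_block_start` events, taken in arrival order. This is a sketch with a hypothetical function name, not the validator from the test suite.

```typescript
// Returns true when content_block_start indices are exactly 0, 1, 2, ...
function indicesAreSequential(startIndices: number[]): boolean {
  return startIndices.every((blockIndex, position) => blockIndex === position);
}
```

With the two cases above, `[0, 1, 2]` passes and `[0, 0, 1]` fails.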
Fine-Grained Tool Streaming
Tool input must stream as partial JSON:
// Chunk 1: {"event": "content_block_delta", "data": {"delta": {"partial_json": "{\"file"}}}
// Chunk 2: {"event": "content_block_delta", "data": {"delta": {"partial_json": "_path\":\"test.ts\""}}}
// Chunk 3: {"event": "content_block_delta", "data": {"delta": {"partial_json": "}"}}}
// Result: {"file_path":"test.ts"} ✅ Valid JSON
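On the receiving side, the chunks only become meaningful once concatenated in arrival order. A small sketch of that reassembly follows. The name `assembleToolInput` is hypothetical, and treating an empty stream as `{}` is an assumption for tools called without arguments.

```typescript
// Joins partial_json chunks in arrival order and parses the result.
// Treating an empty input as {} is an assumption for tools called without arguments.
function assembleToolInput(partialJsonChunks: string[]): unknown {
  const joined = partialJsonChunks.join("");
  if (joined.trim() === "") return {};
  return JSON.parse(joined); // throws SyntaxError when the JSON is incomplete
}

const toolInput = assembleToolInput(['{"file', '_path":"test.ts"', "}"]);
console.log(JSON.stringify(toolInput)); // prints {"file_path":"test.ts"}
```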
Usage Metrics
Must include cache metrics:
{
"usage": {
"input_tokens": 150,
"cache_creation_input_tokens": 5501, // NEW
"cache_read_input_tokens": 0, // NEW
"output_tokens": 50,
"cache_creation": { // OPTIONAL
"ephemeral_5m_input_tokens": 5501
}
}
}
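A usage validator can treat the four numeric fields as required and the nested `cache_creation` object as optional. The sketch below shows one way to report which required fields are missing. `REQUIRED_USAGE_FIELDS` and `missingUsageFields` are hypothetical names.

```typescript
// Fields the section above lists as required; cache_creation stays optional.
const REQUIRED_USAGE_FIELDS = [
  "input_tokens",
  "cache_creation_input_tokens",
  "cache_read_input_tokens",
  "output_tokens",
] as const;

// Returns the names of required fields that are absent or not numeric.
function missingUsageFields(usage: Record<string, unknown>): string[] {
  return REQUIRED_USAGE_FIELDS.filter((field) => typeof usage[field] !== "number");
}
```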
Required Headers
anthropic-version: 2023-06-01
anthropic-beta: oauth-2025-04-20,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14
Critical Fixes Required
1. Content Block Index Management (CRITICAL)
File: src/proxy-server.ts:600-850
Current Problem:
// Line 750 - Text block delta
sendSSE("content_block_delta", {
index: 0, // ❌ Hardcoded!
delta: { type: "text_delta", text: delta.content }
});
// Line 787 - Text block stop
sendSSE("content_block_stop", {
index: 0, // ❌ Hardcoded!
});
Fix Required:
// Initialize block tracking
let currentBlockIndex = 0;
let textBlockIndex = -1;
const toolBlocks = new Map<number, number>(); // toolIndex → blockIndex
// Start text block
textBlockIndex = currentBlockIndex++;
sendSSE("content_block_start", {
index: textBlockIndex,
content_block: { type: "text", text: "" }
});
// Text delta
sendSSE("content_block_delta", {
index: textBlockIndex, // ✅ Correct
delta: { type: "text_delta", text: delta.content }
});
// Start tool block
const toolBlockIndex = currentBlockIndex++;
toolBlocks.set(toolIndex, toolBlockIndex);
sendSSE("content_block_start", {
index: toolBlockIndex, // ✅ Sequential
content_block: { type: "tool_use", id: toolId, name: toolName, input: {} }
});
Impact: HIGH - Claude Code may reject responses with incorrect indices
Complexity: MEDIUM - Need to track state across stream
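Because the index state has to survive across many stream chunks, it may help to keep it in one small object rather than in loose variables. The class below is a sketch of that design; `BlockIndexAllocator` and its method names are hypothetical and are not part of src/proxy-server.ts.

```typescript
// Hands out sequential content block indices and remembers which block
// belongs to which upstream tool call.
class BlockIndexAllocator {
  private next = 0;
  private readonly toolBlocks = new Map<number, number>(); // toolIndex -> blockIndex

  // Allocates the next free index for a text block.
  allocateText(): number {
    return this.next++;
  }

  // Returns the block index for a tool call, allocating one on first sight.
  blockForTool(toolIndex: number): number {
    let blockIndex = this.toolBlocks.get(toolIndex);
    if (blockIndex === undefined) {
      blockIndex = this.next++;
      this.toolBlocks.set(toolIndex, blockIndex);
    }
    return blockIndex;
  }
}
```

Later argument chunks for the same tool call then resolve to the index that was assigned when the block was opened, so deltas and the final stop event stay consistent.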
2. Tool Input JSON Validation (CRITICAL)
File: src/proxy-server.ts:829
Current Problem:
// Line 829 - Close tool block immediately
if (choice?.finish_reason === "tool_calls") {
sendSSE("content_block_stop", {
index: toolState.blockIndex // No validation!
});
}
Fix Required:
// Validate JSON before closing
if (choice?.finish_reason === "tool_calls") {
for (const [toolIndex, toolState] of toolCalls.entries()) {
// Validate JSON is complete
try {
JSON.parse(toolState.args);
log(`[Proxy] Tool ${toolState.name} arguments valid JSON`);
sendSSE("content_block_stop", {
index: toolState.blockIndex
});
} catch (e) {
log(`[Proxy] WARNING: Tool ${toolState.name} has incomplete JSON!`);
log(`[Proxy] Args so far: ${toolState.args}`);
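// finish_reason is normally the last chunk, so no further arguments will arrive; a fallback close is still needed here or the stream ends with an unclosed block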
}
}
}
Impact: HIGH - Malformed tool calls will fail execution
Complexity: LOW - Simple JSON.parse check
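The validity check itself can be factored into a helper that returns a result instead of throwing, which keeps the streaming loop free of nested try/catch. This is a sketch; `checkToolArgs` and the `ArgsCheck` type are hypothetical names.

```typescript
type ArgsCheck =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Parses accumulated tool arguments and reports success or the parser's error message.
function checkToolArgs(args: string): ArgsCheck {
  try {
    return { ok: true, value: JSON.parse(args) };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { ok: false, error: message };
  }
}
```

The caller can branch on `ok` and decide separately how to log and how to close the block.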
3. Continuous Ping Events (MEDIUM)
File: src/proxy-server.ts:636
Current Problem:
// Line 636 - One ping at start
sendSSE("ping", {
type: "ping",
});
// No more pings!
Fix Required:
// Send ping every 15 seconds
const pingInterval = setInterval(() => {
if (!isClosed) {
sendSSE("ping", { type: "ping" });
}
}, 15000);
// Clear interval when done
try {
// ... streaming logic ...
} finally {
clearInterval(pingInterval);
if (!isClosed) {
controller.close();
isClosed = true;
}
}
Impact: MEDIUM - Long streams may time out without pings
Complexity: LOW - Simple setInterval
4. Cache Metrics Emulation (MEDIUM)
File: src/proxy-server.ts:614
Current Problem:
// Line 614 - Cache fields present but never populated
usage: {
input_tokens: 0,
cache_creation_input_tokens: 0, // Present but always 0
cache_read_input_tokens: 0, // Present but always 0
output_tokens: 0
}
Fix Required:
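// Estimate cache metrics from multi-turn conversations
// First turn: the estimated cacheable share of input tokens is reported as cache_creation
// Subsequent turns: the same share is reported as cache_read
// The 0.8 share is a heuristic, not a measured value
let isFirstTurn = /* detect from conversation history */;
let estimatedCacheTokens = Math.floor(inputTokens * 0.8);
usage: {
input_tokens: inputTokens - estimatedCacheTokens, // Cached tokens are reported separately from input_tokens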
cache_creation_input_tokens: isFirstTurn ? estimatedCacheTokens : 0,
cache_read_input_tokens: isFirstTurn ? 0 : estimatedCacheTokens,
output_tokens: outputTokens,
cache_creation: {
ephemeral_5m_input_tokens: isFirstTurn ? estimatedCacheTokens : 0
}
}
Impact: MEDIUM - Inaccurate cost tracking in Claude Code UI
Complexity: MEDIUM - Need conversation state tracking
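The estimate can be isolated in a pure function so that it is testable without a running proxy. The sketch below takes `isFirstTurn` as a parameter and leaves its detection open. It also subtracts the cached share from `input_tokens`, on the understanding that the Anthropic API reports cached tokens separately from uncached input. `estimateUsage` and `EstimatedUsage` are hypothetical names.

```typescript
interface EstimatedUsage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

// cachedShare = 0.8 mirrors the heuristic above; it is not a measured value.
function estimateUsage(
  totalInputTokens: number,
  outputTokens: number,
  isFirstTurn: boolean,
  cachedShare = 0.8,
): EstimatedUsage {
  const cached = Math.floor(totalInputTokens * cachedShare);
  return {
    // Assumption: cached tokens are reported separately from input_tokens.
    input_tokens: totalInputTokens - cached,
    cache_creation_input_tokens: isFirstTurn ? cached : 0,
    cache_read_input_tokens: isFirstTurn ? 0 : cached,
    output_tokens: outputTokens,
  };
}
```

For 1000 input tokens, a first turn yields 800 creation tokens and 200 uncached tokens, and a later turn yields 800 read tokens instead.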
5. Stop Reason Validation (LOW)
File: src/proxy-server.ts:695
Current Check:
// Line 695 - Basic mapping exists
stop_reason: "end_turn", // From mapStopReason()
Verify Mapping:
function mapStopReason(finishReason: string | undefined): string {
switch (finishReason) {
case "stop": return "end_turn"; // ✅
case "length": return "max_tokens"; // ✅
case "tool_calls": return "tool_use"; // ✅
case "content_filter": return "stop_sequence"; // ⚠️ Not quite right
default: return "end_turn"; // ✅ Safe fallback
}
}
Impact: LOW - Already mostly correct
Complexity: LOW - Verify edge cases
Testing Workflow
Phase 1: Capture Fixtures (2-3 hours)
Capture comprehensive test cases:
# Build
bun run build
# Capture scenarios
./tests/snapshot-workflow.sh --capture
Scenarios to Capture (20+ fixtures):
- Simple text (2+2)
- Long text (explain quantum physics)
- Read file
- Grep search
- Glob pattern
- Write file
- Edit file
- Bash command
- Multi-tool (Read + Edit)
- Tool with error
- Multi-turn conversation
- All 16 official tools
- Thinking mode (if supported)
- Max tokens reached
- Content filter
Phase 2: Run Baseline Tests (30 mins)
Run tests to identify failures:
bun test tests/snapshot.test.ts --verbose > test-results.txt 2>&1
Expected Failures (before fixes):
- ❌ Content block indices
- ❌ Tool JSON validation
- ⚠️ Ping events (may pass if short)
- ⚠️ Cache metrics (present but zero)
Phase 3: Fix Proxy (1-2 days)
Implement fixes in order:
- Day 1 Morning: Fix content block indices
- Day 1 Afternoon: Add tool JSON validation
- Day 2 Morning: Add continuous ping events
- Day 2 Afternoon: Add cache metrics estimation
Phase 4: Validate (1-2 hours)
Re-run tests after each fix:
# After each fix
bun test tests/snapshot.test.ts
# Expected progression:
# After fix #1: 70-80% pass
# After fix #2: 85-90% pass
# After fix #3: 90-95% pass
# After fix #4: 95-100% pass
Phase 5: Integration Testing (2-3 hours)
Test with real Claude Code:
# Start proxy
./dist/index.js --model "anthropic/claude-sonnet-4.5"
# In another terminal, use real Claude Code
# Point it to localhost:8337
# Perform various tasks
# Validate:
# - No errors in Claude Code UI
# - Tools execute correctly
# - Multi-turn conversations work
# - Cost tracking accurate
Success Criteria
For 1:1 compatibility:
- ✅ 100% test coverage for critical paths
- ✅ All snapshot tests pass
- ✅ Event sequences match protocol spec
- ✅ Block indices sequential (0, 1, 2, ...)
- ✅ Tool JSON validates before block close
- ✅ Ping events sent every 15 seconds
- ✅ Cache metrics present (even if estimated)
- ✅ Stop reason valid in all cases
- ✅ No Claude Code errors in real usage
- ✅ Multi-turn works perfectly
Risk Mitigation
If OpenRouter Models Don't Support Feature X
Problem: Model doesn't provide thinking mode, cache metrics, etc.
Solution: Implement graceful degradation
// Example: Thinking mode emulation
if (modelSupportsThinking(model)) {
// Use real thinking blocks
} else {
// Convert to text blocks with prefix
sendSSE("content_block_delta", {
index: textBlockIndex,
delta: {
type: "text_delta",
text: "[Thinking: " + thinkingContent + "]\n\n"
}
});
}
If Tests Fail on Specific Models
Problem: Model behaves differently than Claude
Solution: Model-specific adapters
// tests/model-adapters.ts
export const modelAdapters = {
"openai/gpt-4": {
// GPT-4 specific quirks
requiresSpecialToolFormat: true,
maxToolsPerCall: 5
},
"anthropic/claude-sonnet-4.5": {
// Should be 100% compatible
requiresSpecialToolFormat: false
}
};
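An adapter table like this needs a lookup that also handles models without an entry. The sketch below shows one way to do that. The `ModelAdapter` interface, `adapterFor`, and the choice of default are assumptions for illustration, and the table simply repeats the two example entries above.

```typescript
interface ModelAdapter {
  requiresSpecialToolFormat: boolean;
  maxToolsPerCall?: number;
}

const adapterTable: Record<string, ModelAdapter> = {
  "openai/gpt-4": { requiresSpecialToolFormat: true, maxToolsPerCall: 5 },
  "anthropic/claude-sonnet-4.5": { requiresSpecialToolFormat: false },
};

// Assumption: unknown models are treated as needing no special handling.
const DEFAULT_ADAPTER: ModelAdapter = { requiresSpecialToolFormat: false };

// Looks up the adapter for a model ID, falling back to the default.
function adapterFor(modelId: string): ModelAdapter {
  return adapterTable[modelId] ?? DEFAULT_ADAPTER;
}
```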
If Proxy Performance Issues
Problem: Snapshot tests time out
Solution: Optimize streaming
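// Batch small deltas
let deltaBuffer = "";
let bufferTimeout: ReturnType<typeof setTimeout> | undefined;
function sendDelta(text: string) {
deltaBuffer += text;
// Schedule one flush per 50ms window; resetting the timer on every delta
// would postpone output indefinitely during a steady stream
if (bufferTimeout !== undefined) return;
bufferTimeout = setTimeout(() => {
bufferTimeout = undefined;
if (deltaBuffer) {
sendSSE("content_block_delta", { /* ... */ });
deltaBuffer = "";
}
}, 50); // Flush at most once every 50ms
}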
Timeline
| Phase | Duration | Status |
|---|---|---|
| Testing Framework | 1 day | ✅ Complete |
| Fixture Capture | 2-3 hours | ⏳ Pending |
| Proxy Fixes | 1-2 days | ⏳ Pending |
| Validation | 2-3 hours | ⏳ Pending |
| Total | 2-3 days | In Progress |
Next Steps
- Immediate (Today):
  - Run ./tests/snapshot-workflow.sh --capture to build fixture library
  - Run bun test tests/snapshot.test.ts to see current failures
  - Start with Fix #1 (content block indices)
- Tomorrow:
  - Complete Fixes #1-2 (critical)
  - Re-run tests, validate improvements
  - Implement Fixes #3-4 (medium priority)
- Day 3:
  - Run full test suite
  - Fix any remaining issues
  - Integration test with real Claude Code
  - Document model-specific limitations
Files Created
| File | Purpose |
|---|---|
| tests/capture-fixture.ts | Extract fixtures from monitor logs |
| tests/snapshot.test.ts | Snapshot test runner with validators |
| tests/fixtures/README.md | Fixture format documentation |
| tests/fixtures/example_simple_text.json | Example text fixture |
| tests/fixtures/example_tool_use.json | Example tool use fixture |
| tests/snapshot-workflow.sh | End-to-end workflow automation |
| SNAPSHOT_TESTING.md | Testing system documentation |
| PROTOCOL_COMPLIANCE_PLAN.md | This file |
References
- Protocol Specification - Complete protocol docs
- Snapshot Testing Guide - Testing system docs
- Monitor Mode Guide - Monitor mode usage
- Streaming Protocol - SSE event details
Status: Framework complete, ready for fixture capture and proxy fixes
Next Action: Run ./tests/snapshot-workflow.sh --capture
Owner: Jack Rudenko @ MadAppGang
Last Updated: 2025-01-15