Closed
65 commits
453ce20
Add MCP compliance test infrastructure with core types and validation
ochafik Jul 9, 2025
0342c21
Add slash commands for MCP compliance test automation
ochafik Jul 9, 2025
23590fa
Add comprehensive MCP compliance test scenarios
ochafik Jul 9, 2025
b071da8
add CLAUDE.md + example stdio-wrapper
ochafik Jul 9, 2025
63d26a7
Implement stdio interceptor for MCP compliance testing
ochafik Jul 9, 2025
dfad4cc
Implement SSE interceptor for MCP compliance testing
ochafik Jul 9, 2025
44f03af
Implement streamable HTTP interceptor for MCP compliance testing
ochafik Jul 9, 2025
ac69959
Create MITM CLI to wire interceptors together
ochafik Jul 9, 2025
f58e1b6
Update CLAUDE.md
ochafik Jul 9, 2025
df5d5f0
/update_sdk typescript: Implement MCP compliance test server for Type…
ochafik Jul 9, 2025
37668df
add intermediate-outputs so far
ochafik Jul 9, 2025
8b020ea
feat(mitm): add --scenario-id flag to write scenario description as c…
ochafik Jul 10, 2025
36425fc
feat(validation): add parseJSONLLog function to handle comment lines
ochafik Jul 10, 2025
7a0d4e0
fix(mitm): correct path resolution for scenarios data.json
ochafik Jul 10, 2025
4b23306
feat(compliance): add golden generation script
ochafik Jul 10, 2025
60e2be9
fix(typescript-sdk): partial fixes for compilation errors
ochafik Jul 10, 2025
048f50b
Update CLAUDE.md
ochafik Jul 10, 2025
0a96a52
fix(typescript-sdk): update to SDK 1.15.0 and fix compilation errors
ochafik Jul 10, 2025
38707f2
refactor(typescript-sdk): simplify tool enable/disable using SDK feat…
ochafik Jul 10, 2025
8c294c4
fix(compliance): get generate-goldens working with TypeScript SDK
ochafik Jul 10, 2025
65abbdc
fix(compliance): make generate-goldens continue on errors and fix log…
ochafik Jul 10, 2025
cf5b952
fix(mitm): use __dirname for reliable path resolution
ochafik Jul 10, 2025
841bc86
fix(compliance): ensure scenario descriptions appear in golden files
ochafik Jul 10, 2025
3f3984c
feat(compliance): implement cross-SDK testing infrastructure
ochafik Jul 10, 2025
74da3d1
refactor(compliance): move goldens directory under scenarios
ochafik Jul 10, 2025
9d4bf3e
refactor(compliance): remove timestamps from logs and add proper send…
ochafik Jul 11, 2025
0497fd5
refactor(compliance): restructure AnnotatedJSONRPCMessage to use from…
ochafik Jul 11, 2025
e28ad0b
update goldens
ochafik Jul 11, 2025
04ae52e
chore: add package-lock files
ochafik Jul 11, 2025
2c675cc
fix(compliance): add elicitation handler to TypeScript SDK client
ochafik Jul 11, 2025
a57c77f
refactor(compliance): make elicitation handlers scenario-specific
ochafik Jul 11, 2025
bb4be94
update(compliance): fix scenario 3 golden to include tool response
ochafik Jul 11, 2025
112154d
Update package-lock.json
ochafik Jul 11, 2025
bb2479a
fix(compliance): improve stdio interceptor message handling and updat…
ochafik Jul 11, 2025
bced77a
Update 22.jsonl
ochafik Jul 11, 2025
b5749d9
Update update_scenarios.md slash command with comprehensive spec link…
ochafik Jul 11, 2025
be3d992
updated goldens
ochafik Jul 11, 2025
b9f0bcc
Update 8.jsonl
ochafik Jul 11, 2025
315ed9e
Update CLAUDE.md
ochafik Jul 11, 2025
a864c04
Update 24.jsonl
ochafik Jul 11, 2025
28da113
Update client.ts
ochafik Jul 11, 2025
23c168c
clarify(compliance): update scenario descriptions for clarity and acc…
ochafik Jul 11, 2025
975c0f6
update(compliance): update TypeScript SDK and golden files for scenar…
ochafik Jul 11, 2025
7911dab
update(compliance): update golden file comments to match scenario des…
ochafik Jul 11, 2025
0fd7e65
Update CLAUDE.md
ochafik Jul 11, 2025
69618a0
Update test-client
ochafik Jul 11, 2025
98d07de
Update test-server
ochafik Jul 11, 2025
74c9969
basic-test
ochafik Jul 11, 2025
504cac4
Create sdk-binary-validation.test.ts
ochafik Jul 11, 2025
03a3e2f
Merge branch 'ochafik/sdk-binary-validation' into ochafik/mcp-complia…
ochafik Jul 11, 2025
fdfdbb9
feat(compliance): add SDK binary validation tests
ochafik Jul 11, 2025
9b49e1a
Delete package-lock.json
ochafik Jul 13, 2025
210bd99
Delete package-lock.json
ochafik Jul 13, 2025
ef8bcc4
Delete package-lock.json
ochafik Jul 13, 2025
0a0d561
Revert "Delete package-lock.json"
ochafik Jul 13, 2025
f001e03
fix(compliance): fix TypeScript SDK test runner
ochafik Jul 14, 2025
2e4abc4
fix(compliance): handle scenario 24 elicitation test limitation
ochafik Jul 14, 2025
37014a1
feat(compliance): implement elicitation support in TypeScript SDK server
ochafik Jul 14, 2025
4eac14b
feat(compliance): implement proper elicitation support in TypeScript SDK
ochafik Jul 14, 2025
9266411
typescript: fix build by updating rootDir in tsconfig.json
ochafik Aug 5, 2025
06be33c
python: first run of `/update_sdk python`
ochafik Aug 5, 2025
cd8d3df
harness: enable TypeScript-Python cross-SDK testing
ochafik Aug 5, 2025
2a43e98
rust: first run of `/update_sdk rust`
ochafik Aug 5, 2025
8cab500
Update .gitignore
ochafik Aug 5, 2025
0691064
refresh goldens:
ochafik Aug 28, 2025
64 changes: 64 additions & 0 deletions .claude/commands/cross_test_sdks.md
@@ -0,0 +1,64 @@
# Cross-Test SDKs

This command runs compliance tests across different SDK combinations.

## Usage

```
/cross_test_sdks <sdk1> [<sdk2> ...]
```

Example:
```
/cross_test_sdks typescript python go
```

## Process

1. **Validate SDKs**
   - Ensure all specified SDKs have implementations in `compliance/<sdk-name>/`
   - Check that binaries are built and executable

2. **Run Test Matrix**
   - For each client SDK:
     - For each server SDK:
       - For each scenario:
         - Run the test with MITM logger
         - Capture the replay log
         - Compare against golden log
         - Record success/failure

3. **Generate Report**
   - Create a test matrix showing:
     - ✅ Passing combinations
     - ❌ Failing combinations
     - Detailed error messages for failures
     - Log differences when comparison fails

4. **Output Results**
   - Console summary of pass/fail rates
   - Detailed report in `compliance/test-results/`
   - JSONL logs for failed tests for debugging
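
The nested loops in step 2 amount to a Cartesian product of client SDKs, server SDKs, and scenarios. A minimal sketch of that enumeration follows; `MatrixEntry` and `buildTestMatrix` are hypothetical names chosen for illustration and are not taken from the harness.

```typescript
// Hypothetical shape for one cell of the test matrix (an assumption, not the harness type).
interface MatrixEntry {
  clientSdk: string;
  serverSdk: string;
  scenarioId: number;
}

// Enumerate every client x server x scenario combination, in loop order.
function buildTestMatrix(sdks: string[], scenarioIds: number[]): MatrixEntry[] {
  const entries: MatrixEntry[] = [];
  for (const clientSdk of sdks) {
    for (const serverSdk of sdks) {
      for (const scenarioId of scenarioIds) {
        entries.push({ clientSdk, serverSdk, scenarioId });
      }
    }
  }
  return entries;
}

const matrix = buildTestMatrix(["typescript", "python"], [1, 2, 3]);
console.log(matrix.length); // 2 clients x 2 servers x 3 scenarios = 12
```

Building the full list up front makes it easy to report progress or to filter combinations before any process is started.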

## Test Execution

For each test:
1. Start server binary with MITM logger
2. Run client binary through MITM
3. Capture all traffic
4. Normalize logs for comparison
5. Compare against golden reference
6. Clean up processes
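
Steps 4 and 5 can be sketched as a pure function over parsed messages. This is only an illustration under stated assumptions: the harness's actual normalization rules are not shown here, so the sketch sorts object keys and drops a caller-supplied list of ignored keys. `normalize`, `firstDifference`, and `ignoredKeys` are hypothetical names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively drop ignored keys and sort the remaining keys so that
// serialization does not depend on the order an SDK emitted them in.
function normalize(value: Json, ignoredKeys: string[]): Json {
  if (Array.isArray(value)) {
    return value.map((item) => normalize(item, ignoredKeys));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      const child = value[key];
      if (child !== undefined && ignoredKeys.indexOf(key) === -1) {
        out[key] = normalize(child, ignoredKeys);
      }
    }
    return out;
  }
  return value;
}

// Returns the index of the first differing message, or -1 if the logs match.
function firstDifference(actual: Json[], golden: Json[], ignoredKeys: string[] = []): number {
  const a = actual.map((m) => JSON.stringify(normalize(m, ignoredKeys)));
  const g = golden.map((m) => JSON.stringify(normalize(m, ignoredKeys)));
  const length = Math.max(a.length, g.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== g[i]) return i;
  }
  return -1;
}

const goldenLog: Json[] = [{ jsonrpc: "2.0", id: 1, method: "ping" }];
const actualLog: Json[] = [{ method: "ping", id: 1, jsonrpc: "2.0" }];
console.log(firstDifference(actualLog, goldenLog)); // -1: key order is ignored
```

Returning the index of the first mismatch, rather than a boolean, gives the report a concrete line to show when a comparison fails.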

## Error Handling

- Timeout if client/server hangs
- Capture stderr for debugging
- Ensure clean process termination
- Report transport-specific issues

## Output

- Summary matrix in console
- Detailed results in `compliance/test-results/cross-test-<timestamp>.json`
- Individual test logs for debugging failures
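
As a rough illustration of the console summary, the sketch below folds per-test results into one status string per client/server pair. The `TestResult` shape and the `summarizeMatrix` name are assumptions made for this example; the real report format is whatever the harness writes to `compliance/test-results/`.

```typescript
// Hypothetical per-test record (an assumption, not the harness's report schema).
interface TestResult {
  clientSdk: string;
  serverSdk: string;
  scenarioId: number;
  passed: boolean;
}

// Collapse results into one status string per "client -> server" pair.
function summarizeMatrix(results: TestResult[]): { [pair: string]: string } {
  const failed: { [pair: string]: number[] } = {};
  for (const result of results) {
    const pair = `${result.clientSdk} -> ${result.serverSdk}`;
    const ids = failed[pair] || [];
    if (!result.passed) ids.push(result.scenarioId);
    failed[pair] = ids;
  }
  const summary: { [pair: string]: string } = {};
  for (const pair of Object.keys(failed)) {
    const ids = failed[pair] || [];
    summary[pair] = ids.length === 0 ? "✅" : `❌ scenarios ${ids.join(", ")}`;
  }
  return summary;
}

console.log(
  summarizeMatrix([
    { clientSdk: "typescript", serverSdk: "python", scenarioId: 1, passed: true },
    { clientSdk: "typescript", serverSdk: "python", scenarioId: 2, passed: false },
  ]),
);
```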
55 changes: 55 additions & 0 deletions .claude/commands/update_goldens.md
@@ -0,0 +1,55 @@
# Update Golden Replay Logs

This command captures the reference replay logs using the TypeScript SDK implementation.

## Usage

```
/update_goldens
```

## Prerequisites

- TypeScript SDK implementation must be complete (`compliance/typescript-sdk/`)
- All scenarios must be defined in `compliance/scenarios/data.json`
- Transport interceptor CLI must be built

## Process

1. **Build TypeScript SDK binaries**
   - Ensure `test-client` and `test-server` are built and executable

2. **Run Each Scenario**
   - For each scenario in `data.json`:
     - Start the server with appropriate transport
     - Run the client through the MITM logger
     - Capture all JSON-RPC messages with annotations

3. **Generate Golden Files**
   - Save captured logs to `compliance/scenarios/goldens/<scenario-id>.jsonl`
   - Each line is an `AnnotatedJSONRPCMessage`
   - Include all metadata (sender, recipient, timestamps, transport details)

4. **Validate Goldens**
   - Ensure all scenarios have corresponding golden files
   - Validate that logs match expected patterns
   - Check for completeness of captured interactions

## Example Command

For stdio transport:
```bash
compliance/typescript-sdk/test-client \
  --scenario-id 1 \
  --client-id "client1" \
  stdio \
  compliance/harness/mitm stdio \
  --log compliance/scenarios/goldens/1.jsonl \
  compliance/typescript-sdk/test-server \
  --server-name CalcServer \
  --transport stdio
```

## Output

Creates one JSONL file per scenario in `compliance/scenarios/goldens/`. These files record the expected message flow for each scenario and serve as the reference for cross-SDK testing.
98 changes: 98 additions & 0 deletions .claude/commands/update_scenarios.md
@@ -0,0 +1,98 @@
# Update MCP Compliance Scenarios

This command suggests improvements to `compliance/scenarios/data.json` based on the MCP specification. It can be run from scratch to seed scenarios or after spec updates to ensure coverage.

## Usage

Run this command to:
1. Create initial scenarios if `data.json` doesn't exist
2. Update scenarios after spec changes to ensure coverage
3. Review and disambiguate existing scenario descriptions
4. Handle golden file updates if they exist

## Specification Files

This command must analyze ALL of the following specification files:

### Main Specification
- `@/docs/specification/draft/index.mdx` - Overview and key principles

### Architecture
- `@/docs/specification/draft/architecture/index.mdx` - System architecture

### Basic Protocol
- `@/docs/specification/draft/basic/index.mdx` - Core protocol mechanics
- `@/docs/specification/draft/basic/lifecycle.mdx` - Connection lifecycle
- `@/docs/specification/draft/basic/transports.mdx` - Transport layers (stdio, SSE, streamable HTTP)
- `@/docs/specification/draft/basic/authorization.mdx` - Authorization mechanisms
- `@/docs/specification/draft/basic/security_best_practices.mdx` - Security guidelines
- `@/docs/specification/draft/basic/utilities/cancellation.mdx` - Request cancellation
- `@/docs/specification/draft/basic/utilities/ping.mdx` - Ping/pong keepalive
- `@/docs/specification/draft/basic/utilities/progress.mdx` - Progress notifications

### Server Features
- `@/docs/specification/draft/server/index.mdx` - Server capabilities overview
- `@/docs/specification/draft/server/tools.mdx` - Tool definitions and execution
- `@/docs/specification/draft/server/resources.mdx` - Resource management and subscriptions
- `@/docs/specification/draft/server/prompts.mdx` - Prompt templates
- `@/docs/specification/draft/server/utilities/completion.mdx` - Argument autocompletion for prompts and resource templates
- `@/docs/specification/draft/server/utilities/logging.mdx` - Server-side logging
- `@/docs/specification/draft/server/utilities/pagination.mdx` - Pagination patterns

### Client Features
- `@/docs/specification/draft/client/index.mdx` - Client capabilities overview
- `@/docs/specification/draft/client/elicitation.mdx` - User input elicitation
- `@/docs/specification/draft/client/roots.mdx` - File system roots
- `@/docs/specification/draft/client/sampling.mdx` - LLM sampling requests

## Implementation Guidelines

### 1. Scenario Quality Requirements
- **Unambiguous**: Descriptions must be prescriptive enough that any SDK implementation would produce identical replay logs
- **Complete**: Each scenario should specify exact inputs, outputs, and expected behavior
- **Testable**: Scenarios must be verifiable through message logs

### 2. Process for New Scenarios
1. Analyze all specification files listed above
2. Extract protocol features and requirements
3. Design scenarios covering:
   - Basic connectivity and initialization
   - Tool invocation (simple and with elicitation)
   - Resource management and subscriptions
   - Prompt templates with parameters
   - Multi-client scenarios
   - Transport-specific scenarios (stdio, SSE, streamable HTTP)
   - Error handling and edge cases
   - Cancellation flows
   - Progress notifications
   - Concurrent operations
   - Protocol version negotiation
   - Change notifications (tools, resources, roots)

### 3. Process for Existing Scenarios with Goldens
If golden files exist in `@/compliance/scenarios/goldens/`:
1. Review each scenario description for ambiguity
2. Update TypeScript SDK implementation if scenarios change
3. Update golden files and verify they match expected behavior per specs
4. Iterate until golden logs align with scenario descriptions
5. Use one subagent per scenario for parallel analysis

### 4. Golden File Handling
- Golden files contain expected JSON-RPC message sequences
- First line(s) are comments with scenario description
- Messages include sender/recipient metadata and transport details
- Ensure goldens match what specs prescribe for each scenario
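
A minimal sketch of reading such a file, separating the leading comment lines from the JSON-RPC messages. The comment prefix is an assumption (`//` is used here purely for illustration, and it is a parameter so it can be changed), and `parseCommentedJsonl` is a hypothetical name, not the harness's own parser.

```typescript
interface ParsedLog {
  comments: string[]; // comment text with the prefix stripped
  messages: unknown[]; // one parsed JSON value per non-comment line
}

// Assumption: comment lines start with `commentPrefix`; every other
// non-empty line is a single JSON document.
function parseCommentedJsonl(text: string, commentPrefix: string = "//"): ParsedLog {
  const comments: string[] = [];
  const messages: unknown[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.indexOf(commentPrefix) === 0) {
      comments.push(line.slice(commentPrefix.length).trim());
    } else {
      messages.push(JSON.parse(line)); // throws on malformed JSON
    }
  }
  return { comments, messages };
}

const sample = [
  "// Scenario 1: client connects and pings the server",
  '{"jsonrpc":"2.0","id":1,"method":"ping"}',
  '{"jsonrpc":"2.0","id":1,"result":{}}',
].join("\n");

const parsed = parseCommentedJsonl(sample);
console.log(parsed.comments.length, parsed.messages.length); // 1 2
```

Keeping comments separate from messages lets a comparison step ignore the description while a validation step can still check that it matches the scenario text.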

## Output

Produce updates to `@/compliance/scenarios/data.json` matching the `Scenarios` type from `@/compliance/src/types.ts`.

## Execution Strategy

Use subagents for parallel work:
- One subagent per scenario for reviewing/updating
- Subagents should check:
  - Scenario description clarity
  - Coverage of spec features
  - Golden file alignment (if one exists)
  - Ambiguity in expected behavior
65 changes: 65 additions & 0 deletions .claude/commands/update_sdk.md
@@ -0,0 +1,65 @@
# Update SDK Implementation

This command generates or updates client and server test binaries for a specific SDK.

## Usage

```
/update_sdk <sdk-name>
```

Where `<sdk-name>` is one of:
- typescript
- python
- go
- java
- kotlin
- rust
- swift
- csharp
- ruby

## Process

1. **Create/Update CLAUDE.md**
   - Create `compliance/<sdk-name>/CLAUDE.md` with:
     - Links to SDK documentation
     - Links to official examples
     - SDK-specific implementation guidance
     - Build and test instructions

2. **Generate Client Binary**
   - Create `compliance/<sdk-name>/test-client` that:
     - Accepts `--scenario-id` and `--id` flags
     - Reads scenarios from `compliance/scenarios/data.json`
     - Implements the client behavior for the specified scenario
     - Supports stdio and HTTP transports as specified

3. **Generate Server Binary**
   - Create `compliance/<sdk-name>/test-server` that:
     - Accepts `--server-name` and `--transport` flags
     - Reads server definitions from `compliance/scenarios/data.json`
     - Implements CalcServer functionality as defined
     - Supports all specified transports

4. **Generate Tests**
   - Create unit tests for client and server implementations
   - Ensure binaries can be built and run
   - Test basic scenario execution

5. **Build Scripts**
   - Create build scripts appropriate for the SDK
   - Ensure binaries are executable from the compliance test harness
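
To make the server binary's flag contract from step 3 concrete, here is a dependency-free sketch of parsing `--server-name` and `--transport`. It is an illustration only: `parseServerArgs` and `ServerOptions` are hypothetical names, the strict "flag then value" layout is an assumption, and a real binary would more likely use its SDK's or language's standard argument parser.

```typescript
interface ServerOptions {
  serverName: string;
  transport: string;
}

// Assumption: arguments arrive as strict "--flag value" pairs.
function parseServerArgs(argv: string[]): ServerOptions {
  const values: { [flag: string]: string } = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === undefined || flag.indexOf("--") !== 0 || value === undefined) {
      throw new Error(`Expected "--flag value" pairs, got: ${argv.slice(i).join(" ")}`);
    }
    values[flag.slice(2)] = value;
  }
  const serverName = values["server-name"];
  const transport = values["transport"];
  if (serverName === undefined || transport === undefined) {
    throw new Error("Both --server-name and --transport are required");
  }
  return { serverName, transport };
}

const options = parseServerArgs(["--server-name", "CalcServer", "--transport", "stdio"]);
console.log(options.serverName, options.transport); // CalcServer stdio
```

Failing fast on a missing flag keeps a misconfigured run from producing a misleading replay log.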

## Implementation Details

Each SDK implementation should:
- Check that scenario/server descriptions match when built
- Handle all transport types correctly
- Implement proper error handling
- Support JSON-RPC message format
- Follow SDK best practices and idioms

## Output

The command creates a fully functional SDK test implementation in `compliance/<sdk-name>/` ready for cross-SDK testing.
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,2 +1,7 @@
node_modules/
.DS_Store
target/
intermediate-outputs/
__pycache__/
uv.lock
package-lock.json