MCP Testing Evaluation Framework - API Reference

Complete API reference guide for developers integrating with the MCP Testing Evaluation Framework

API Overview

The MCP Testing Evaluation Framework provides a REST API and a WebSocket interface for programmatic access to its evaluation capabilities. Both are designed for integration with CI/CD pipelines, development tools, and custom applications.

Base URLs

  • REST API: http://localhost:3457/api/v1
  • WebSocket: ws://localhost:3457/ws
  • Dashboard: http://localhost:3457

API Versioning

The API uses URL-based versioning with the following support matrix:

| Version | Status | Support Level | Sunset Date |
|---------|--------|---------------|-------------|
| v1 | Current | Full Support | N/A |
| v0 | Deprecated | Security Only | 2025-06-01 |

Content Types

  • Request: application/json
  • Response: application/json
  • WebSocket: text/plain (JSON strings)

Authentication & Authorization

API Key Authentication

For production deployments, API key authentication is required:

http
Authorization: Bearer <api-key>

Public Access Mode

For local development and testing, authentication can be disabled:

json
{
  "security": {
    "requireAuth": false,
    "allowedOrigins": ["http://localhost:3000"]
  }
}

Permission Scopes

| Scope | Description | Endpoints |
|-------|-------------|-----------|
| evaluate:read | View evaluations | GET endpoints |
| evaluate:write | Start/stop evaluations | POST, DELETE endpoints |
| evaluate:admin | System administration | Config endpoints |
| evaluate:export | Export capabilities | Export endpoints |

JWT Token Structure

json
{
  "iss": "mcp-evaluator",
  "sub": "user-id",
  "aud": "api",
  "exp": 1640995200,
  "iat": 1640908800,
  "scopes": ["evaluate:read", "evaluate:write"]
}
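
The claims above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the token payload has already been decoded and its signature verified elsewhere; the helper names `is_expired` and `has_scopes` are illustrative and not part of the framework.

```python
import time


def is_expired(claims, now=None):
    """Return True when the `exp` claim (Unix seconds) is in the past."""
    current = time.time() if now is None else now
    return current >= claims["exp"]


def has_scopes(claims, required):
    """Return True when every required scope appears in the `scopes` claim."""
    return set(required).issubset(claims.get("scopes", []))


# Payload copied from the token structure above
claims = {
    "iss": "mcp-evaluator",
    "sub": "user-id",
    "aud": "api",
    "exp": 1640995200,
    "iat": 1640908800,
    "scopes": ["evaluate:read", "evaluate:write"],
}

can_start = has_scopes(claims, ["evaluate:write"])      # POST/DELETE endpoints
can_configure = has_scopes(claims, ["evaluate:admin"])  # config endpoints
```

With this payload the token may start evaluations but lacks the `evaluate:admin` scope needed for the configuration endpoints.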

REST API Endpoints

Evaluations

#### Start New Evaluation

http
POST /api/v1/evaluations
Content-Type: application/json
Authorization: Bearer <token>

{
  "serverPath": "/path/to/mcp-server",
  "options": {
    "transport": "stdio",
    "runStatic": true,
    "runRuntime": true,
    "timeout": 30000,
    "retries": 3,
    "tags": ["development", "ci"]
  }
}

Response (201 Created):
json
{
  "id": "eval_1704284400000",
  "status": "queued",
  "serverPath": "/path/to/mcp-server",
  "options": {
    "transport": "stdio",
    "runStatic": true,
    "runRuntime": true,
    "timeout": 30000,
    "retries": 3
  },
  "createdAt": "2024-01-03T10:00:00Z",
  "estimatedDuration": 45000,
  "queuePosition": 1
}

#### Get Evaluation Status

http
GET /api/v1/evaluations/{id}
Authorization: Bearer <token>

Response (200 OK):
json
{
  "id": "eval_1704284400000",
  "status": "running",
  "serverPath": "/path/to/mcp-server",
  "progress": {
    "phase": "runtime",
    "static": {
      "total": 5,
      "completed": 5,
      "passed": 4,
      "failed": 1
    },
    "runtime": {
      "total": 12,
      "completed": 8,
      "passed": 7,
      "failed": 1
    }
  },
  "currentTask": "Testing tool: search_documents",
  "startedAt": "2024-01-03T10:00:00Z",
  "updatedAt": "2024-01-03T10:02:30Z",
  "estimatedCompletion": "2024-01-03T10:03:45Z",
  "results": {
    "static": {
      "Functionality Match": {
        "score": 0.9,
        "status": "pass",
        "evidence": ["All documented features implemented"],
        "recommendations": []
      }
    },
    "runtime": {
      "toolTests": [
        {
          "name": "search_documents",
          "status": "passed",
          "responseTime": 245,
          "result": {
            "success": true,
            "data": "Tool executed successfully"
          }
        }
      ]
    }
  }
}
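
A client that polls this endpoint often wants a single progress figure. The sketch below assumes the `progress` object carries `static` and `runtime` counters shaped as in the response above; `overall_progress` is a hypothetical helper, not a field the API returns.

```python
def overall_progress(progress):
    """Combine static and runtime counters into one completion percentage."""
    phases = [progress[name] for name in ("static", "runtime") if name in progress]
    total = sum(phase["total"] for phase in phases)
    completed = sum(phase["completed"] for phase in phases)
    if total == 0:
        return 0.0
    return round(100.0 * completed / total, 1)


# Counters copied from the example response
progress = {
    "phase": "runtime",
    "static": {"total": 5, "completed": 5, "passed": 4, "failed": 1},
    "runtime": {"total": 12, "completed": 8, "passed": 7, "failed": 1},
}

percent = overall_progress(progress)  # 13 of 17 checks finished
```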

#### List Evaluations

http
GET /api/v1/evaluations
Authorization: Bearer <token>

Query Parameters:

  • status: queued | running | completed | failed
  • limit: number (default: 50, max: 200)
  • offset: number (default: 0)
  • sortBy: createdAt | updatedAt | score
  • sortOrder: asc | desc
  • tags: comma-separated list
  • serverPath: filter by server path
  • dateFrom: ISO 8601 date
  • dateTo: ISO 8601 date
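
The query parameters can be assembled and checked before the request is sent. This is a sketch using only the Python standard library; `build_list_query` is an illustrative name, it covers a subset of the parameters, and the lower bound of 1 for `limit` is an assumption (only the maximum of 200 is documented).

```python
from urllib.parse import urlencode

ALLOWED_STATUS = {"queued", "running", "completed", "failed"}
ALLOWED_SORT_BY = {"createdAt", "updatedAt", "score"}


def build_list_query(status=None, limit=50, offset=0, sort_by=None,
                     sort_order=None, tags=None):
    """Build a query string for GET /api/v1/evaluations."""
    if status is not None and status not in ALLOWED_STATUS:
        raise ValueError(f"unsupported status: {status}")
    if sort_by is not None and sort_by not in ALLOWED_SORT_BY:
        raise ValueError(f"unsupported sortBy: {sort_by}")
    if sort_order is not None and sort_order not in ("asc", "desc"):
        raise ValueError(f"unsupported sortOrder: {sort_order}")
    # The documented maximum is 200; the lower bound of 1 is an assumption.
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    if offset < 0:
        raise ValueError("offset must not be negative")

    params = {"limit": limit, "offset": offset}
    if status is not None:
        params["status"] = status
    if sort_by is not None:
        params["sortBy"] = sort_by
    if sort_order is not None:
        params["sortOrder"] = sort_order
    if tags:
        params["tags"] = ",".join(tags)  # comma-separated list
    return urlencode(params)


query = build_list_query(status="completed", limit=100, sort_by="score",
                         sort_order="desc", tags=["development", "ci"])
url = "http://localhost:3457/api/v1/evaluations?" + query
```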

Response (200 OK):
json
{
  "evaluations": [
    {
      "id": "eval_1704284400000",
      "status": "completed",
      "serverPath": "/path/to/server",
      "score": 85.5,
      "createdAt": "2024-01-03T10:00:00Z",
      "completedAt": "2024-01-03T10:03:45Z",
      "duration": 225000,
      "tags": ["development"]
    }
  ],
  "pagination": {
    "total": 150,
    "limit": 50,
    "offset": 0,
    "hasNext": true,
    "hasPrev": false
  },
  "filters": {
    "status": "completed",
    "dateRange": "2024-01-01T00:00:00Z to 2024-01-03T23:59:59Z"
  }
}
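
The `pagination` block is enough to walk the full result set. The sketch below assumes a caller-supplied `fetch_page(limit, offset)` function that returns one parsed response body shaped like the example above; `fake_fetch` is a stand-in so the example runs without a server.

```python
def iter_evaluations(fetch_page, limit=50):
    """Yield every evaluation by following the `pagination` block."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page["evaluations"]
        if not page["pagination"]["hasNext"]:
            break
        offset += limit


def fake_fetch(limit, offset):
    """Stand-in for the real HTTP call; serves five evaluations in pages."""
    ids = [f"eval_{n}" for n in range(5)]
    chunk = ids[offset:offset + limit]
    return {
        "evaluations": [{"id": item} for item in chunk],
        "pagination": {
            "total": len(ids),
            "limit": limit,
            "offset": offset,
            "hasNext": offset + limit < len(ids),
            "hasPrev": offset > 0,
        },
    }


all_ids = [item["id"] for item in iter_evaluations(fake_fetch, limit=2)]
```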

#### Cancel Evaluation

http
DELETE /api/v1/evaluations/{id}
Authorization: Bearer <token>

Response (200 OK):
json
{
  "id": "eval_1704284400000",
  "status": "cancelled",
  "message": "Evaluation cancelled successfully",
  "cancelledAt": "2024-01-03T10:02:00Z"
}

Reports

#### Generate Evaluation Report

http
POST /api/v1/evaluations/{id}/reports
Content-Type: application/json
Authorization: Bearer <token>

{
  "format": "markdown",
  "includeEvidence": true,
  "includeRecommendations": true,
  "template": "submission"
}

Response (200 OK):
json
{
  "reportId": "rpt_1704284500000",
  "format": "markdown",
  "downloadUrl": "/api/v1/reports/rpt_1704284500000/download",
  "expiresAt": "2024-01-10T10:00:00Z",
  "size": 15420,
  "createdAt": "2024-01-03T10:05:00Z"
}

#### Download Report

http
GET /api/v1/reports/{reportId}/download
Authorization: Bearer <token>

Response (200 OK):
http
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="evaluation-report.md"

# MCP Server Evaluation Report
...report content...

Tool Testing

#### Test Individual Tool

http
POST /api/v1/tools/test
Content-Type: application/json
Authorization: Bearer <token>

{
  "serverPath": "/path/to/server",
  "toolName": "search_documents",
  "arguments": {
    "query": "test query",
    "limit": 10
  },
  "transport": "stdio",
  "timeout": 15000
}

Response (200 OK):
json
{
  "testId": "test_1704284600000",
  "toolName": "search_documents",
  "status": "passed",
  "executionTime": 245,
  "result": {
    "success": true,
    "data": {
      "documents": [
        {"id": 1, "title": "Test Document", "score": 0.95}
      ]
    }
  },
  "validation": {
    "responseTime": "excellent",
    "errorHandling": "good",
    "dataFormat": "valid"
  },
  "testedAt": "2024-01-03T10:10:00Z"
}

System Configuration

#### Get System Configuration

http
GET /api/v1/config
Authorization: Bearer <token>

Response (200 OK):
json
{
  "version": "1.0.0",
  "capabilities": {
    "staticAnalysis": true,
    "runtimeTesting": true,
    "observabilityIntegration": true,
    "multiFormat": true
  },
  "limits": {
    "maxConcurrentEvaluations": 5,
    "maxFileSize": 104857600,
    "evaluationTimeout": 300000
  },
  "supportedTransports": ["stdio", "sse", "http"],
  "supportedFormats": ["json", "markdown", "html", "xml"],
  "hooks": {
    "available": [
      "functionality-match",
      "prompt-injection",
      "tool-naming",
      "working-examples",
      "error-handling"
    ],
    "custom": []
  }
}

#### Update Configuration

http
PUT /api/v1/config
Content-Type: application/json
Authorization: Bearer <token>

{
  "limits": {
    "maxConcurrentEvaluations": 3,
    "evaluationTimeout": 180000
  },
  "observability": {
    "enabled": true,
    "endpoint": "http://localhost:3456"
  }
}

Response (200 OK):
json
{
  "updated": true,
  "changes": [
    "limits.maxConcurrentEvaluations: 5 → 3",
    "limits.evaluationTimeout: 300000 → 180000"
  ],
  "appliedAt": "2024-01-03T10:15:00Z"
}
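
The `changes` entries read like a diff between the stored configuration and the submitted update. How the server actually derives them is not documented here; the sketch below is one way a client could compute the same strings locally, assuming only settings that already exist are reported. `flatten` and `describe_changes` are hypothetical helpers.

```python
def flatten(config, prefix=""):
    """Flatten nested dictionaries into dotted keys."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def describe_changes(current, update):
    """List existing settings whose values differ in `update`."""
    before = flatten(current)
    changes = []
    for path, new_value in flatten(update).items():
        if path in before and before[path] != new_value:
            changes.append(f"{path}: {before[path]} → {new_value}")
    return changes


current = {
    "limits": {
        "maxConcurrentEvaluations": 5,
        "maxFileSize": 104857600,
        "evaluationTimeout": 300000,
    }
}
update = {
    "limits": {"maxConcurrentEvaluations": 3, "evaluationTimeout": 180000},
    "observability": {"enabled": True, "endpoint": "http://localhost:3456"},
}

changes = describe_changes(current, update)
```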

Health & Metrics

#### Health Check

http
GET /api/v1/health

Response (200 OK):
json
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 3661.23,
  "memory": {
    "used": 67108864,
    "total": 134217728,
    "percentage": 50.0
  },
  "services": {
    "database": "healthy",
    "inspector": "healthy",
    "hooks": "healthy",
    "observability": "degraded"
  },
  "lastCheck": "2024-01-03T10:20:00Z"
}

#### System Metrics

http
GET /api/v1/metrics
Authorization: Bearer <token>

Response (200 OK):
json
{
  "evaluations": {
    "total": 1247,
    "completed": 1156,
    "failed": 91,
    "averageScore": 78.3,
    "averageDuration": 42500
  },
  "performance": {
    "requestsPerMinute": 15.2,
    "responseTime": {
      "p50": 125,
      "p95": 450,
      "p99": 1200
    },
    "errorRate": 0.02
  },
  "resources": {
    "cpuUsage": 23.5,
    "memoryUsage": 45.8,
    "diskUsage": 12.3
  },
  "period": "24h",
  "generatedAt": "2024-01-03T10:25:00Z"
}

WebSocket API

Connection

javascript
const ws = new WebSocket('ws://localhost:3457/ws');

// Authentication (if required)
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'your-jwt-token'
  }));
};

Event Types

#### Evaluation Events

evaluation:started
json
{
  "type": "evaluation:started",
  "data": {
    "id": "eval_1704284400000",
    "serverPath": "/path/to/server",
    "timestamp": "2024-01-03T10:00:00Z"
  }
}

evaluation:progress
json
{
  "type": "evaluation:progress",
  "data": {
    "id": "eval_1704284400000",
    "phase": "runtime",
    "progress": {
      "current": 8,
      "total": 12,
      "percentage": 66.7
    },
    "currentTask": "Testing tool: analyze_data",
    "estimatedCompletion": "2024-01-03T10:03:45Z"
  }
}

evaluation:completed
json
{
  "type": "evaluation:completed",
  "data": {
    "id": "eval_1704284400000",
    "score": 85.5,
    "status": "completed",
    "duration": 225000,
    "completedAt": "2024-01-03T10:03:45Z",
    "summary": {
      "passed": 4,
      "warned": 1,
      "failed": 0
    }
  }
}

#### Hook Events

hook:running
json
{
  "type": "hook:running",
  "data": {
    "evaluationId": "eval_1704284400000",
    "hook": "functionality-match",
    "requirement": "Functionality Match",
    "startedAt": "2024-01-03T10:01:00Z"
  }
}

hook:completed
json
{
  "type": "hook:completed",
  "data": {
    "evaluationId": "eval_1704284400000",
    "hook": "functionality-match",
    "score": 0.9,
    "status": "pass",
    "duration": 1500,
    "evidence": ["All documented features implemented"],
    "recommendations": []
  }
}

#### Runtime Testing Events

tool:testing
json
{
  "type": "tool:testing",
  "data": {
    "evaluationId": "eval_1704284400000",
    "toolName": "search_documents",
    "arguments": {"query": "test", "limit": 5},
    "startedAt": "2024-01-03T10:02:15Z"
  }
}

tool:result
json
{
  "type": "tool:result",
  "data": {
    "evaluationId": "eval_1704284400000",
    "toolName": "search_documents",
    "status": "passed",
    "responseTime": 245,
    "result": {
      "success": true,
      "data": {...}
    }
  }
}

#### System Events

system:status
json
{
  "type": "system:status",
  "data": {
    "activeEvaluations": 3,
    "queueLength": 2,
    "systemLoad": 45.2,
    "timestamp": "2024-01-03T10:30:00Z"
  }
}

error:occurred
json
{
  "type": "error:occurred",
  "data": {
    "evaluationId": "eval_1704284400000",
    "error": {
      "code": "HOOK_EXECUTION_FAILED",
      "message": "Hook functionality-match.py failed with exit code 1",
      "details": "Python traceback here..."
    },
    "timestamp": "2024-01-03T10:01:30Z"
  }
}
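
Every event shares the same `type` and `data` layout, so a client can route messages with a small dispatcher. The following sketch assumes each WebSocket frame is one JSON string as described under Content Types; `EventRouter` is an illustrative class and not part of any SDK.

```python
import json


class EventRouter:
    """Route incoming WebSocket messages to handlers keyed by event type."""

    def __init__(self):
        self._handlers = {}

    def on(self, event_type, handler):
        """Register a handler that receives the event's `data` object."""
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, raw_message):
        """Parse one JSON frame, call matching handlers, return their count."""
        message = json.loads(raw_message)
        handlers = self._handlers.get(message.get("type"), [])
        for handler in handlers:
            handler(message.get("data", {}))
        return len(handlers)


router = EventRouter()
seen = []
router.on("evaluation:progress",
          lambda data: seen.append(data["progress"]["percentage"]))
router.on("error:occurred",
          lambda data: seen.append(data["error"]["code"]))

router.dispatch(json.dumps({
    "type": "evaluation:progress",
    "data": {"id": "eval_1704284400000",
             "progress": {"current": 8, "total": 12, "percentage": 66.7}},
}))
router.dispatch(json.dumps({
    "type": "error:occurred",
    "data": {"evaluationId": "eval_1704284400000",
             "error": {"code": "HOOK_EXECUTION_FAILED"}},
}))
# Events without a registered handler are ignored
unhandled = router.dispatch('{"type": "system:status", "data": {}}')
```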

Client Commands

#### Subscribe to Evaluation

json
{
  "type": "subscribe",
  "data": {
    "evaluationId": "eval_1704284400000",
    "events": ["progress", "completed", "error"]
  }
}

#### Unsubscribe

json
{
  "type": "unsubscribe",
  "data": {
    "evaluationId": "eval_1704284400000"
  }
}

#### Get Live Status

json
{
  "type": "getStatus",
  "data": {
    "evaluationId": "eval_1704284400000"
  }
}
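
The three client commands differ only in `type` and in whether an `events` list is present, so they can be built by small helpers. This is a sketch; the function names are hypothetical, and the default event list simply mirrors the subscribe example above.

```python
import json


def subscribe_message(evaluation_id, events=("progress", "completed", "error")):
    """Serialize a subscribe command for one evaluation."""
    return json.dumps({
        "type": "subscribe",
        "data": {"evaluationId": evaluation_id, "events": list(events)},
    })


def unsubscribe_message(evaluation_id):
    """Serialize an unsubscribe command for one evaluation."""
    return json.dumps({
        "type": "unsubscribe",
        "data": {"evaluationId": evaluation_id},
    })


def get_status_message(evaluation_id):
    """Serialize a getStatus command for one evaluation."""
    return json.dumps({
        "type": "getStatus",
        "data": {"evaluationId": evaluation_id},
    })


frame = subscribe_message("eval_1704284400000")
```

Each returned string can be passed directly to the WebSocket client's send method.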

Request/Response Formats

Standard Response Envelope

All API responses follow this structure:

json
{
  "success": true,
  "data": {...},
  "meta": {
    "requestId": "req_1704284700000",
    "timestamp": "2024-01-03T10:35:00Z",
    "version": "1.0.0",
    "rateLimit": {
      "limit": 1000,
      "remaining": 987,
      "resetAt": "2024-01-03T11:00:00Z"
    }
  }
}

Error Response Format

json
{
  "success": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid server path provided",
    "details": {
      "field": "serverPath",
      "value": "/invalid/path",
      "constraint": "must be an existing directory"
    },
    "documentation": "https://docs.mcp-evaluator.com/errors/VALIDATION_ERROR"
  },
  "meta": {
    "requestId": "req_1704284800000",
    "timestamp": "2024-01-03T10:40:00Z",
    "version": "1.0.0"
  }
}
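
Client code can unwrap both envelope shapes in one place. The sketch below assumes the response body has already been parsed into a dictionary; `APIError` and `unwrap` are illustrative names, and the fallback code `UNKNOWN` is an assumption for envelopes that omit an error code.

```python
class APIError(Exception):
    """Carries the fields of the error envelope."""

    def __init__(self, code, message, details=None, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}
        self.request_id = request_id


def unwrap(envelope):
    """Return `data` on success, raise APIError otherwise."""
    if envelope.get("success"):
        return envelope.get("data")
    error = envelope.get("error", {})
    raise APIError(
        code=error.get("code", "UNKNOWN"),
        message=error.get("message", ""),
        details=error.get("details"),
        request_id=envelope.get("meta", {}).get("requestId"),
    )


failure = {
    "success": False,
    "error": {
        "code": "VALIDATION_ERROR",
        "message": "Invalid server path provided",
        "details": {"field": "serverPath", "value": "/invalid/path"},
    },
    "meta": {"requestId": "req_1704284800000"},
}

try:
    unwrap(failure)
except APIError as exc:
    caught = exc
```

Keeping `requestId` on the exception makes it easier to correlate a client-side failure with server logs.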

Validation Schema

Request validation follows JSON Schema:

json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://api.mcp-evaluator.com/schemas/evaluation-request",
  "type": "object",
  "required": ["serverPath"],
  "properties": {
    "serverPath": {
      "type": "string",
      "minLength": 1,
      "maxLength": 500,
      "pattern": "^[^\\0]+$"
    },
    "options": {
      "type": "object",
      "properties": {
        "transport": {
          "type": "string",
          "enum": ["stdio", "sse", "http"]
        },
        "timeout": {
          "type": "integer",
          "minimum": 1000,
          "maximum": 300000
        },
        "retries": {
          "type": "integer",
          "minimum": 0,
          "maximum": 10
        }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
}
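
A JSON Schema library would normally enforce this schema. To show what the constraints mean in practice, the hand-written sketch below mirrors them with plain Python checks; `validate_request` is a hypothetical helper, and it reports problems as strings rather than in the API's error format.

```python
TRANSPORTS = {"stdio", "sse", "http"}
INT_RANGES = {"timeout": (1000, 300000), "retries": (0, 10)}


def validate_request(body):
    """Return a list of problems; an empty list means the body is acceptable."""
    problems = []
    extra = set(body) - {"serverPath", "options"}
    if extra:
        problems.append(f"unexpected properties: {sorted(extra)}")

    path = body.get("serverPath")
    if not isinstance(path, str):
        problems.append("serverPath is required and must be a string")
    elif not 1 <= len(path) <= 500 or "\x00" in path:
        problems.append("serverPath must be 1-500 characters with no NUL character")

    options = body.get("options", {})
    if not isinstance(options, dict):
        return problems + ["options must be an object"]
    unknown = set(options) - {"transport", "timeout", "retries"}
    if unknown:
        problems.append(f"unexpected options: {sorted(unknown)}")
    transport = options.get("transport", "stdio")
    if not isinstance(transport, str) or transport not in TRANSPORTS:
        problems.append("options.transport must be one of stdio, sse, http")
    for name, (low, high) in INT_RANGES.items():
        if name in options:
            value = options[name]
            # bool is a subclass of int in Python, so exclude it explicitly
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not low <= value <= high:
                problems.append(f"options.{name} must be an integer from {low} to {high}")
    return problems


ok = validate_request({
    "serverPath": "/path/to/mcp-server",
    "options": {"transport": "stdio", "timeout": 30000, "retries": 3},
})
bad = validate_request({"serverPath": "", "options": {"transport": "ftp", "timeout": 500}})
strict = validate_request({"serverPath": "/x", "options": {"runStatic": True}})
```

Read literally, `additionalProperties: false` means the `runStatic`, `runRuntime`, and `tags` options shown in the Start New Evaluation request are not covered by this schema, so the sketch reports them as unexpected.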

Error Handling

Error Codes

| Code | HTTP Status | Description | Retry Safe |
|------|-------------|-------------|------------|
| VALIDATION_ERROR | 400 | Invalid request parameters | No |
| AUTHENTICATION_REQUIRED | 401 | Missing or invalid authentication | No |
| INSUFFICIENT_PERMISSIONS | 403 | Insufficient scopes | No |
| RESOURCE_NOT_FOUND | 404 | Evaluation/resource not found | No |
| RATE_LIMIT_EXCEEDED | 429 | Too many requests | Yes |
| SERVER_UNREACHABLE | 422 | Cannot connect to MCP server | Yes |
| EVALUATION_TIMEOUT | 422 | Evaluation exceeded timeout | Yes |
| CONCURRENT_LIMIT | 503 | Max concurrent evaluations reached | Yes |
| INTERNAL_SERVER_ERROR | 500 | Unexpected server error | Yes |
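
The Retry Safe column can drive a client's retry decision by error code rather than by HTTP status alone, which matters because the two retry-safe 422 codes would otherwise look like ordinary client errors. A minimal sketch, assuming unknown codes are treated as not retry-safe; `should_retry` is an illustrative helper.

```python
# Retry-safe flags copied from the error code table
RETRY_SAFE = {
    "VALIDATION_ERROR": False,
    "AUTHENTICATION_REQUIRED": False,
    "INSUFFICIENT_PERMISSIONS": False,
    "RESOURCE_NOT_FOUND": False,
    "RATE_LIMIT_EXCEEDED": True,
    "SERVER_UNREACHABLE": True,
    "EVALUATION_TIMEOUT": True,
    "CONCURRENT_LIMIT": True,
    "INTERNAL_SERVER_ERROR": True,
}


def should_retry(code, attempt, max_retries=3):
    """Decide whether to retry after `attempt` failed tries (0-based)."""
    # Unknown codes default to "do not retry" (an assumption of this sketch)
    return RETRY_SAFE.get(code, False) and attempt < max_retries


retry_unreachable = should_retry("SERVER_UNREACHABLE", attempt=0)
retry_validation = should_retry("VALIDATION_ERROR", attempt=0)
```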

Error Recovery Strategies

javascript
// Error type carrying the code from the error response envelope
class APIError extends Error {
  constructor(code, message) {
    super(message);
    this.name = 'APIError';
    this.code = code;
  }
}

class APIClient {
  async makeRequest(endpoint, options) {
    const maxRetries = 3;
    const backoffBase = 1000;

    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      let failure;
      let retryable = true;

      try {
        const response = await fetch(endpoint, options);

        if (response.ok) {
          return await response.json();
        }

        const body = await response.json();
        failure = new APIError(body.error.code, body.error.message);

        // Don't retry client errors (4xx) except rate limiting
        if (response.status >= 400 && response.status < 500 && response.status !== 429) {
          retryable = false;
        }
      } catch (networkError) {
        // Network failures and unreadable response bodies are treated as retryable
        failure = networkError;
      }

      // Give up on non-retryable errors or once the retry budget is spent
      if (!retryable || attempt === maxRetries) {
        throw failure;
      }

      // Exponential backoff with jitter
      const delay = backoffBase * Math.pow(2, attempt) + Math.random() * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

WebSocket Error Handling

javascript
const ws = new WebSocket('ws://localhost:3457/ws');
let reconnectAttempts = 0;

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = (event) => {
  // Code 1000 is a normal closure; anything else triggers a reconnect
  if (event.code !== 1000) {
    console.warn('WebSocket closed unexpectedly:', event.code, event.reason);
    // Reconnect with exponential backoff, capped at 30 seconds
    setTimeout(() => {
      reconnectWebSocket();
    }, Math.min(1000 * Math.pow(2, reconnectAttempts), 30000));
  }
};

function reconnectWebSocket() {
  // Implement reconnection logic with backoff
  reconnectAttempts++;
  // ... reconnection code
}

Rate Limiting

Rate Limit Headers

All responses include rate limiting information:

http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640995200
X-RateLimit-Window: 3600
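
These headers let a client pause before it hits a 429 instead of reacting afterwards. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which matches the example value; `seconds_until_reset` is a hypothetical helper that takes the current time as an argument so it can be tested deterministically.

```python
def seconds_until_reset(headers, now):
    """Seconds to wait before sending again; 0 while quota remains."""
    # A missing Remaining header is treated as "quota available" (assumption)
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers["X-RateLimit-Reset"])  # assumed Unix seconds
    return max(0.0, float(reset_at - now))


# Header values from the example above: quota still available
available = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "999",
    "X-RateLimit-Reset": "1640995200",
    "X-RateLimit-Window": "3600",
}
# Same window with the quota used up
exhausted = dict(available, **{"X-RateLimit-Remaining": "0"})

no_wait = seconds_until_reset(available, now=1640995140)
wait = seconds_until_reset(exhausted, now=1640995140)
```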

Rate Limits by Endpoint

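| Endpoint | Limit | Window | Scope |
|----------|-------|--------|-------|
| POST /evaluations | 50 | 1 hour | Per API key |
| GET /evaluations | 1000 | 1 hour | Per API key |
| GET /evaluations/{id} | 2000 | 1 hour | Per API key |
| POST /tools/test | 100 | 1 hour | Per API key |
| WebSocket connections | 10 | Concurrent | Per API key |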

Rate Limit Handling

javascript
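// retryRequest is a caller-supplied function that re-issues the original request
async function handleRateLimit(response, retryRequest) {
  if (response.status === 429) {
    // X-RateLimit-Reset is a Unix timestamp in seconds
    const resetTime = Number(response.headers.get('X-RateLimit-Reset'));
    // Clamp to zero in case the window has already reset
    const waitTime = Math.max(0, resetTime * 1000 - Date.now());

    console.log(`Rate limited. Waiting ${waitTime}ms`);
    await new Promise(resolve => setTimeout(resolve, waitTime));

    // Retry the request
    return retryRequest();
  }

  return response;
}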

SDK Usage Examples

Node.js SDK

javascript
const { MCPEvaluatorClient } = require('mcp-evaluator-sdk');

const client = new MCPEvaluatorClient({
  baseUrl: 'http://localhost:3457',
  apiKey: 'your-api-key',
  timeout: 30000,
  retries: 3
});

// CommonJS modules cannot use top-level await, so the calls run inside an async function
async function main() {
  // Start evaluation
  const evaluation = await client.evaluations.create({
    serverPath: '/path/to/server',
    options: {
      transport: 'stdio',
      runStatic: true,
      runRuntime: true
    }
  });

  console.log('Evaluation started:', evaluation.id);

  // Monitor progress with events
  client.on('evaluation:progress', (data) => {
    console.log(`Progress: ${data.progress.percentage}%`);
  });

  client.on('evaluation:completed', async (data) => {
    console.log('Completed with score:', data.score);
    // Generate report
    const report = await client.reports.create(data.id, {
      format: 'markdown',
      includeEvidence: true
    });
    console.log('Report URL:', report.downloadUrl);
  });

  // Subscribe to specific evaluation
  await client.subscribe(evaluation.id);
}

main().catch(console.error);

Python SDK

python
from mcp_evaluator import MCPEvaluatorClient

client = MCPEvaluatorClient(
    base_url='http://localhost:3457',
    api_key='your-api-key',
    timeout=30
)

# Start evaluation
evaluation = client.evaluations.create(
    server_path='/path/to/server',
    options={
        'transport': 'stdio',
        'run_static': True,
        'run_runtime': True
    }
)

print(f"Evaluation started: {evaluation['id']}")

# Wait for completion
result = client.evaluations.wait_for_completion(
    evaluation['id'],
    timeout=300
)

print(f"Score: {result['score']}")

# Generate and download report
report = client.reports.create(
    evaluation['id'],
    format='markdown'
)

with open('evaluation-report.md', 'w') as f:
    f.write(client.reports.download(report['reportId']))

CLI Integration

bash
# Start evaluation and get JSON response
RESULT=$(mcp-evaluate /path/to/server --json --ci)
SCORE=$(echo "$RESULT" | jq '.score')

if (( $(echo "$SCORE >= 80" | bc -l) )); then
  echo "✅ Evaluation passed with score: $SCORE"
  exit 0
else
  echo "❌ Evaluation failed with score: $SCORE"
  exit 1
fi

cURL Examples

bash
# Start evaluation
curl -X POST http://localhost:3457/api/v1/evaluations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "serverPath": "/path/to/server",
    "options": {
      "transport": "stdio",
      "runStatic": true,
      "runRuntime": true
    }
  }'

# Monitor with polling (EVAL_ID must hold the "id" returned when the evaluation was started)
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $API_KEY" \
    http://localhost:3457/api/v1/evaluations/$EVAL_ID | jq -r '.status')
  # Stop on any terminal status, including a cancelled evaluation
  if [[ "$STATUS" == "completed" || "$STATUS" == "failed" || "$STATUS" == "cancelled" ]]; then
    break
  fi
  sleep 5
done

# Get final results
curl -H "Authorization: Bearer $API_KEY" \
  http://localhost:3457/api/v1/evaluations/$EVAL_ID

Version Compatibility

API Version Matrix

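| SDK Version | API Version | Node.js | Python |
|-------------|-------------|---------|--------|
| 1.0.x | v1 | ≥14.0 | ≥3.7 |
| 0.9.x | v1, v0 | ≥14.0 | ≥3.7 |
| 0.8.x | v0 | ≥12.0 | ≥3.6 |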

Breaking Changes

#### v1.0.0 (Current)

  • Added: WebSocket authentication support
  • Changed: Response envelope format
  • Deprecated: v0 API endpoints
  • Removed: Legacy callback-style events

#### Migration Guide v0 → v1

javascript
// v0 (deprecated)
const result = await client.evaluate('/path/to/server');
console.log(result.score);

// v1 (current)
const evaluation = await client.evaluations.create({
  serverPath: '/path/to/server'
});
const result = await client.evaluations.waitForCompletion(evaluation.id);
console.log(result.data.score);

Backward Compatibility

The API maintains backward compatibility for:

  • Core evaluation functionality
  • Basic WebSocket events
  • Report generation formats
  • Authentication mechanisms

This reference covers the REST endpoints, WebSocket events, error handling, and SDK usage needed to integrate with the MCP Testing Evaluation Framework programmatically.