Performance Optimization for MCP Servers
Optimize your MCP server for speed, efficiency, and responsiveness. Covers caching strategies, connection pooling, async concurrency patterns, response size management, and memory optimization techniques that keep your server fast under load.
Why Performance Matters
AI assistants often call multiple tools in sequence to answer a single user question. Slow tools compound: if each tool takes 3 seconds, a 4-tool chain takes 12 seconds. Users notice latency above 2-3 seconds, so keeping individual tool calls fast is critical.
Caching Strategies
If your tool fetches data that does not change frequently, cache the results. A simple in-memory TTL cache can cut response times from seconds to milliseconds for repeated queries.
In-Memory TTL Cache
class TTLCache<T> {
private cache = new Map<string, { value: T; expiresAt: number }>();
constructor(private ttlMs: number = 60000) {}
get(key: string): T | undefined {
const entry = this.cache.get(key);
if (!entry) return undefined;
if (Date.now() > entry.expiresAt) {
this.cache.delete(key);
return undefined;
}
return entry.value;
}
set(key: string, value: T): void {
this.cache.set(key, {
value,
expiresAt: Date.now() + this.ttlMs,
});
}
clear(): void {
this.cache.clear();
}
get size(): number {
return this.cache.size;
}
}
// Usage in a tool
const weatherCache = new TTLCache<string>(300000); // 5 minute TTL
async execute(input: { city: string }): Promise<string> {
const cacheKey = `weather:${input.city.toLowerCase()}`;
const cached = weatherCache.get(cacheKey);
if (cached) return cached;
const result = await fetchWeather(input.city);
const response = JSON.stringify(result);
weatherCache.set(cacheKey, response);
return response;
}
Cache Invalidation Patterns
Match your cache TTL to data freshness requirements. Weather data: 5-15 minutes. API rate limits: 60 seconds. Database schemas: 1 hour. User data: 30-60 seconds. When in doubt, shorter TTLs are safer.
// Different TTLs for different data types
const caches = {
schemas: new TTLCache(3600000), // 1 hour - schemas rarely change
queries: new TTLCache(30000), // 30 seconds - data changes often
metadata: new TTLCache(300000), // 5 minutes - moderate freshness
};
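TTL expiry alone can still serve stale data right after a write. When a tool mutates data that another tool caches, drop the cached entry explicitly. A minimal sketch using a plain Map (the helper names and the 60-second TTL are illustrative):

```typescript
// Sketch: explicit invalidation alongside TTL expiry.
// Helper names and the 60s TTL are illustrative.
const userCache = new Map<string, { value: string; expiresAt: number }>();
const USER_TTL_MS = 60_000;

function getCachedUser(userId: string): string | undefined {
  const entry = userCache.get(`user:${userId}`);
  if (!entry || Date.now() > entry.expiresAt) {
    userCache.delete(`user:${userId}`);
    return undefined;
  }
  return entry.value;
}

function setCachedUser(userId: string, value: string): void {
  userCache.set(`user:${userId}`, {
    value,
    expiresAt: Date.now() + USER_TTL_MS,
  });
}

// Call this from any tool that writes user data, so the next read refetches.
function invalidateUser(userId: string): void {
  userCache.delete(`user:${userId}`);
}
```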
Connection Pooling
Creating a new connection for every tool call is expensive. Use connection pools for databases and keep-alive connections for HTTP APIs. One pool shared across all tools is the standard pattern.
Database Connection Pool
import { Pool } from "pg";
// Create once, share across all tools
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 10, // Maximum connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
});
// In your tool
async execute(input: { sql: string }): Promise<string> {
const client = await pool.connect();
try {
const result = await client.query(input.sql);
return JSON.stringify(result.rows);
} finally {
client.release(); // Return to pool, don't close
}
}
HTTP Agent Reuse
import { Agent } from "http";
const httpAgent = new Agent({
keepAlive: true,
maxSockets: 10,
keepAliveMsecs: 30000,
});
// Reuse the agent across all HTTP calls. Note: the agent option works with
// node-fetch and Node's core http/https request APIs; Node's built-in fetch
// (undici) ignores it and manages its own keep-alive pool.
const response = await fetch(url, { agent: httpAgent } as any);
Async Concurrency Patterns
When a tool needs to fetch data from multiple independent sources, run the requests concurrently with Promise.all instead of sequentially. Total time drops from the sum of the calls to the duration of the slowest one.
// Bad: sequential (6 seconds total if each takes 2s)
async execute(input: { userId: string }): Promise<string> {
const user = await fetchUser(input.userId);
const posts = await fetchPosts(input.userId);
const stats = await fetchStats(input.userId);
return JSON.stringify({ user, posts, stats });
}
// Good: concurrent (2 seconds total)
async execute(input: { userId: string }): Promise<string> {
const [user, posts, stats] = await Promise.all([
fetchUser(input.userId),
fetchPosts(input.userId),
fetchStats(input.userId),
]);
return JSON.stringify({ user, posts, stats });
}
Promise.allSettled for Partial Results
// Even better: handle individual failures gracefully
async execute(input: { userId: string }): Promise<string> {
const results = await Promise.allSettled([
fetchUser(input.userId),
fetchPosts(input.userId),
fetchStats(input.userId),
]);
return JSON.stringify({
user: results[0].status === "fulfilled" ? results[0].value : null,
posts: results[1].status === "fulfilled" ? results[1].value : null,
stats: results[2].status === "fulfilled" ? results[2].value : null,
errors: results
.filter((r) => r.status === "rejected")
.map((r) => (r as PromiseRejectedResult).reason.message),
});
}
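Promise.all fires every request at once, which is fine for three calls but can overwhelm an upstream API or rate limit when a tool fans out to dozens. A sketch of a simple concurrency cap (the function name is illustrative):

```typescript
// Sketch: run async tasks with at most `limit` in flight at once.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.max(1, Math.min(limit, items.length)) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```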
Response Size Management
Large tool responses waste tokens and slow down the AI assistant's processing. Summarize large datasets, paginate results, and truncate long text fields. Aim for responses under 10KB.
function limitResponse(data: unknown[], maxItems: number = 20): {
items: unknown[];
total: number;
truncated: boolean;
} {
return {
items: data.slice(0, maxItems),
total: data.length,
truncated: data.length > maxItems,
};
}
// In your tool
async execute(input: { query: string }): Promise<string> {
const allResults = await search(input.query);
const limited = limitResponse(allResults, 20);
return JSON.stringify(limited, null, 2);
}
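The same idea applies to individual text fields: clip long strings and say so, rather than returning entire documents. A sketch (the 500-character default is illustrative, not a rule):

```typescript
// Sketch: clip long string fields before returning them to the model,
// and note how much was cut so the assistant can ask for more if needed.
function truncateText(text: string, maxLength: number = 500): string {
  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength) + ` … [truncated, ${text.length} chars total]`;
}
```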
Timeout Management
Never let a tool hang indefinitely waiting for an external service. Set timeouts on every HTTP request, database query, and external call. Return a clear timeout error if the deadline is exceeded.
async function fetchWithTimeout(
url: string,
timeoutMs: number = 10000
): Promise<Response> {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
try {
return await fetch(url, { signal: controller.signal });
} finally {
clearTimeout(timeout);
}
}
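fetch accepts an AbortSignal, but some clients (many database driver methods, for example) do not. A generic deadline wrapper built on Promise.race covers those cases; note that the underlying operation keeps running in the background, the caller just stops waiting for it. The function name here is illustrative:

```typescript
// Sketch: a deadline wrapper for calls that don't accept an AbortSignal.
function withDeadline<T>(
  promise: Promise<T>,
  timeoutMs: number,
  label: string
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
  });
  // The losing promise is not cancelled; this only bounds the wait.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```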
Lazy Initialization
Do not connect to databases or initialize heavy clients at import time. Initialize them on first use. This speeds up server startup and avoids failures when optional services are unavailable.
class DatabaseService {
private pool?: Pool;
private getPool(): Pool {
if (!this.pool) {
this.pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 10,
});
}
return this.pool;
}
async query(sql: string, params?: unknown[]): Promise<unknown[]> {
const pool = this.getPool();
const result = await pool.query(sql, params);
return result.rows;
}
}
Performance Monitoring
function withTiming<T>(
name: string,
fn: () => Promise<T>
): Promise<T> {
const start = performance.now();
return fn().finally(() => {
const duration = performance.now() - start;
console.error(`[perf] ${name}: ${duration.toFixed(1)}ms`);
});
}
// Usage
const result = await withTiming("fetch-weather", () =>
fetchWeather(latitude, longitude)
);
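Individual log lines hide outliers. Aggregating durations per operation makes tail latency visible, which is usually what users actually feel. A minimal sketch (names and the percentile formula are illustrative, not a precise statistical definition):

```typescript
// Sketch: aggregate per-operation durations so tail latency is visible,
// not just individual log lines.
const timings = new Map<string, number[]>();

function recordTiming(name: string, ms: number): void {
  const list = timings.get(name) ?? [];
  list.push(ms);
  timings.set(name, list);
}

// Approximate percentile over recorded samples (0 if none recorded).
function percentile(name: string, p: number): number {
  const sorted = [...(timings.get(name) ?? [])].sort((a, b) => a - b);
  if (sorted.length === 0) return 0;
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}
```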
Optimization Checklist
| Area | Optimization | Impact |
|------|-------------|--------|
| Caching | TTL cache for repeated queries | High |
| Connections | Pool database connections | High |
| Concurrency | Promise.all for independent calls | High |
| Responses | Limit result sizes | Medium |
| Timeouts | Set on all external calls | Medium |
| Initialization | Lazy load expensive resources | Low-Medium |
| Monitoring | Time all operations | Essential |