Performance Optimization for MCP Servers

Optimize your MCP server for speed and efficiency with caching strategies, connection pooling, async patterns, and response size management.


title: "Performance Optimization for MCP Servers"
description: "Optimize your MCP server for speed and efficiency with caching strategies, connection pooling, async patterns, and response size management."
order: 5
keywords:

  • MCP performance
  • MCP optimization
  • MCP caching
  • MCP connection pooling
  • MCP async patterns

date: "2026-04-01"

Quick Summary

Optimize your MCP server for speed, efficiency, and responsiveness. Covers caching strategies, connection pooling, async concurrency patterns, response size management, and memory optimization techniques that keep your server fast under load.

Why Performance Matters

AI assistants often call multiple tools in sequence to answer a single user question. Slow tools compound: if each tool takes 3 seconds, a 4-tool chain takes 12 seconds. Users notice latency above 2-3 seconds, so keeping individual tool calls fast is critical.

Target response time per tool call: under 1 second.

Caching Strategies

Cache Expensive Results

If your tool fetches data that does not change frequently, cache the results. A simple in-memory TTL cache can cut response times from seconds to milliseconds for repeated queries.

In-Memory TTL Cache

class TTLCache<T> {
  private cache = new Map<string, { value: T; expiresAt: number }>();

  constructor(private ttlMs: number = 60000) {}

  get(key: string): T | undefined {
    const entry = this.cache.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.cache.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.cache.set(key, {
      value,
      expiresAt: Date.now() + this.ttlMs,
    });
  }

  clear(): void {
    this.cache.clear();
  }

  get size(): number {
    return this.cache.size;
  }
}

// Usage in a tool
const weatherCache = new TTLCache<string>(300000); // 5 minute TTL

async execute(input: { city: string }): Promise<string> {
  const cacheKey = `weather:${input.city.toLowerCase()}`;
  const cached = weatherCache.get(cacheKey);
  if (cached) return cached;

  const result = await fetchWeather(input.city);
  const response = JSON.stringify(result);
  weatherCache.set(cacheKey, response);
  return response;
}
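The cache above expires entries by time but never caps how many it holds, so a burst of unique keys can grow memory without bound. A sketch of a size-capped variant, assuming a simple evict-oldest policy (BoundedTTLCache and its parameters are hypothetical, not part of any MCP API):

```typescript
// Hypothetical size-capped TTL cache: when the entry limit is reached,
// the oldest entry is evicted before inserting a new one, so memory
// stays bounded even under a flood of unique keys.
class BoundedTTLCache<T> {
  private cache = new Map<string, { value: T; expiresAt: number }>();

  constructor(
    private maxEntries: number = 1000,
    private ttlMs: number = 60000
  ) {}

  get(key: string): T | undefined {
    const entry = this.cache.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.cache.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    // Map preserves insertion order, so the first key is the oldest.
    if (this.cache.size >= this.maxEntries && !this.cache.has(key)) {
      const oldest = this.cache.keys().next().value;
      if (oldest !== undefined) this.cache.delete(oldest);
    }
    this.cache.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  get size(): number {
    return this.cache.size;
  }
}
```

Evict-oldest is the simplest policy; a true LRU (re-inserting keys on read) is a small step further if hot keys must survive eviction.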

Cache Invalidation Patterns

Choose the Right Cache TTL

Match your cache TTL to data freshness requirements. Weather data: 5-15 minutes. API rate limits: 60 seconds. Database schemas: 1 hour. User data: 30-60 seconds. When in doubt, shorter TTLs are safer.

// Different TTLs for different data types
const caches = {
  schemas: new TTLCache(3600000),    // 1 hour - schemas rarely change
  queries: new TTLCache(30000),       // 30 seconds - data changes often
  metadata: new TTLCache(300000),     // 5 minutes - moderate freshness
};
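TTLs bound how stale a read can be, but write paths should also invalidate affected keys immediately so a mutation is never followed by a stale read. A minimal sketch of write-path invalidation (userCache and saveUser are hypothetical names for illustration):

```typescript
// Sketch of write-path invalidation: any tool that mutates data deletes
// the affected cache keys right away instead of waiting for the TTL.
const userCache = new Map<string, string>();

async function saveUser(userId: string, profile: string): Promise<void> {
  // A real implementation would persist first, e.g. await db.update(...).
  userCache.delete(`user:${userId}`); // next read re-fetches fresh data
}
```

Deleting the key is usually safer than writing the new value into the cache, since it avoids caching a value the persistence layer may have rejected.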

Connection Pooling

Pool Database and HTTP Connections

Creating a new connection for every tool call is expensive. Use connection pools for databases and keep-alive connections for HTTP APIs. One pool shared across all tools is the standard pattern.

Database Connection Pool

import { Pool } from "pg";

// Create once, share across all tools
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10,           // Maximum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 5000,
});

// In your tool
async execute(input: { sql: string }): Promise<string> {
  const client = await pool.connect();
  try {
    const result = await client.query(input.sql);
    return JSON.stringify(result.rows);
  } finally {
    client.release(); // Return to pool, don't close
  }
}

HTTP Agent Reuse

import { Agent } from "http";

const httpAgent = new Agent({
  keepAlive: true,
  maxSockets: 10,
  keepAliveMsecs: 30000,
});

// Reuse the agent across requests made with node-fetch or http.request.
// Note: Node's built-in fetch ignores the `agent` option; with built-in
// fetch, pass an undici Agent via the `dispatcher` option instead.
const response = await fetch(url, { agent: httpAgent } as any);

Async Concurrency Patterns

Use Promise.all for Independent Operations

When a tool needs to fetch data from multiple sources, run the requests concurrently with Promise.all instead of sequentially. This can cut response time in half or more.

// Bad: sequential (6 seconds total if each takes 2s)
async execute(input: { userId: string }): Promise<string> {
  const user = await fetchUser(input.userId);
  const posts = await fetchPosts(input.userId);
  const stats = await fetchStats(input.userId);
  return JSON.stringify({ user, posts, stats });
}

// Good: concurrent (2 seconds total)
async execute(input: { userId: string }): Promise<string> {
  const [user, posts, stats] = await Promise.all([
    fetchUser(input.userId),
    fetchPosts(input.userId),
    fetchStats(input.userId),
  ]);
  return JSON.stringify({ user, posts, stats });
}

Promise.allSettled for Partial Results

// Even better: handle individual failures gracefully
async execute(input: { userId: string }): Promise<string> {
  const results = await Promise.allSettled([
    fetchUser(input.userId),
    fetchPosts(input.userId),
    fetchStats(input.userId),
  ]);

  return JSON.stringify({
    user: results[0].status === "fulfilled" ? results[0].value : null,
    posts: results[1].status === "fulfilled" ? results[1].value : null,
    stats: results[2].status === "fulfilled" ? results[2].value : null,
    errors: results
      .filter((r): r is PromiseRejectedResult => r.status === "rejected")
      .map((r) => (r.reason instanceof Error ? r.reason.message : String(r.reason))),
  });
}
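Promise.all fans out with no upper bound, which is fine for three calls but can overwhelm an upstream API when a tool maps over hundreds of items. A minimal concurrency limiter sketch (mapWithConcurrency is a hypothetical helper, not an MCP or standard-library API):

```typescript
// Runs `fn` over `items` with at most `limit` promises in flight at once.
// Workers repeatedly claim the next unprocessed index until the list is
// drained; results keep their original order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claiming an index is synchronous, so no race
      results[i] = await fn(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```

Libraries such as p-limit implement the same idea with more features; the sketch above is enough for a single tool.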

Response Size Management

Keep Responses Under 10KB

Large tool responses waste tokens and slow down the AI assistant's processing. Summarize large datasets, paginate results, and truncate long text fields. Aim for responses under 10KB.

function limitResponse(data: unknown[], maxItems: number = 20): {
  items: unknown[];
  total: number;
  truncated: boolean;
} {
  return {
    items: data.slice(0, maxItems),
    total: data.length,
    truncated: data.length > maxItems,
  };
}

// In your tool
async execute(input: { query: string }): Promise<string> {
  const allResults = await search(input.query);
  const limited = limitResponse(allResults, 20);
  return JSON.stringify(limited, null, 2);
}
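Limiting the item count handles large lists, but a single field (a log body, a README, an HTML page) can still blow past the size budget on its own. A sketch of field-level truncation, assuming a plain-text marker format of our own choosing:

```typescript
// Truncates a long text field and appends a marker so the assistant
// knows content was cut and roughly how much is missing.
function truncateField(text: string, maxChars: number = 500): string {
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return `${text.slice(0, maxChars)} [${omitted} more characters truncated]`;
}
```

Telling the model how much was omitted lets it decide whether to ask for the rest via a follow-up call with pagination parameters.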

Timeout Management

Set Timeouts on All External Calls

Never let a tool hang indefinitely waiting for an external service. Set timeouts on every HTTP request, database query, and external call. Return a clear timeout error if the deadline is exceeded.

async function fetchWithTimeout(
  url: string,
  timeoutMs: number = 10000
): Promise<Response> {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timeout);
  }
}
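fetch accepts an AbortSignal, but many database drivers and SDK clients expose no cancellation hook at all. For those, a generic deadline can be raced against the call (withDeadline is a hypothetical helper; note the caveat in the comments):

```typescript
// Rejects if `promise` does not settle within `ms`. Caveat: the
// underlying operation is NOT cancelled and keeps running in the
// background, so prefer native timeouts (AbortSignal, driver options)
// when the client supports them.
function withDeadline<T>(
  promise: Promise<T>,
  ms: number,
  label: string = "operation"
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

Clearing the timer in finally matters: without it, a resolved call would leave a pending timeout keeping the event loop alive.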

Lazy Initialization

Initialize Expensive Resources Lazily

Do not connect to databases or initialize heavy clients at import time. Initialize them on first use. This speeds up server startup and avoids failures when optional services are unavailable.

class DatabaseService {
  private pool?: Pool;

  private getPool(): Pool {
    if (!this.pool) {
      this.pool = new Pool({
        connectionString: process.env.DATABASE_URL,
        max: 10,
      });
    }
    return this.pool;
  }

  async query(sql: string, params?: unknown[]): Promise<unknown[]> {
    const pool = this.getPool();
    const result = await pool.query(sql, params);
    return result.rows;
  }
}

Performance Monitoring

function withTiming<T>(
  name: string,
  fn: () => Promise<T>
): Promise<T> {
  const start = performance.now();
  return fn().finally(() => {
    const duration = performance.now() - start;
    console.error(`[perf] ${name}: ${duration.toFixed(1)}ms`);
  });
}

// Usage
const result = await withTiming("fetch-weather", () =>
  fetchWeather(latitude, longitude)
);
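Per-call log lines get noisy under load; aggregating per-operation statistics gives a clearer picture of where time goes. A minimal sketch that could sit behind withTiming (LatencyStats is a hypothetical helper):

```typescript
// Accumulates call count, mean, and max latency per operation name,
// for periodic dumping instead of one log line per call.
class LatencyStats {
  private stats = new Map<
    string,
    { count: number; totalMs: number; maxMs: number }
  >();

  record(name: string, durationMs: number): void {
    const s = this.stats.get(name) ?? { count: 0, totalMs: 0, maxMs: 0 };
    s.count++;
    s.totalMs += durationMs;
    s.maxMs = Math.max(s.maxMs, durationMs);
    this.stats.set(name, s);
  }

  summary(): Record<string, { count: number; avgMs: number; maxMs: number }> {
    const out: Record<string, { count: number; avgMs: number; maxMs: number }> = {};
    for (const [name, s] of this.stats) {
      out[name] = {
        count: s.count,
        avgMs: s.totalMs / s.count,
        maxMs: s.maxMs,
      };
    }
    return out;
  }
}
```

Dumping the summary every minute (or on a debug tool call) surfaces the slowest operations without flooding stderr.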

Optimization Checklist

| Area | Optimization | Impact |
|------|-------------|--------|
| Caching | TTL cache for repeated queries | High |
| Connections | Pool database connections | High |
| Concurrency | Promise.all for independent calls | High |
| Responses | Limit result sizes | Medium |
| Timeouts | Set on all external calls | Medium |
| Initialization | Lazy load expensive resources | Low-Medium |
| Monitoring | Time all operations | Essential |

Frequently Asked Questions