Error Handling Strategies

Build resilient MCP servers with structured error types, graceful degradation, retry patterns, and user-friendly error messages using the official TypeScript SDK and mcp-framework.


title: "Error Handling Strategies"
description: "Build resilient MCP servers with structured error types, graceful degradation, retry patterns, and user-friendly error messages using the official TypeScript SDK and mcp-framework."
order: 13
level: "intermediate"
duration: "25 min"
keywords:
  • "MCP error handling"
  • "MCP error types"
  • "MCP graceful degradation"
  • "MCP retry patterns"
  • "MCP server resilience"
  • "@modelcontextprotocol/sdk errors"
  • "mcp-framework error handling"
  • "MCP production errors"
date: "2026-04-01"

Quick Summary

Errors are inevitable in production MCP servers — APIs go down, databases time out, and users provide unexpected input. This lesson covers the MCP protocol's error model, how to structure error responses that help AI models recover gracefully, retry strategies for transient failures, graceful degradation patterns, and how to produce error messages that are useful to both AI models and end users. Examples use the official TypeScript SDK and mcp-framework.

The MCP Error Model

MCP Error Response

When an MCP request fails, the server returns a JSON-RPC error response with a numeric code, a message string, and optional structured data. MCP inherits the standard JSON-RPC 2.0 error codes for common failures such as "method not found" or "invalid params."

MCP is built on JSON-RPC 2.0, which defines a structured error format:

{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32602,
    "message": "Invalid params: 'query' is required",
    "data": {
      "field": "query",
      "reason": "missing_required"
    }
  }
}

Standard Error Codes

Code      Name               Meaning
-32700    Parse Error        Invalid JSON received
-32600    Invalid Request    JSON is valid but not a proper request
-32601    Method Not Found   Requested method does not exist
-32602    Invalid Params     Method parameters are invalid
-32603    Internal Error     Server-side error during execution
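
To make the envelope and the codes above concrete, here is a minimal sketch that builds a JSON-RPC error response by hand. The names `JsonRpcErrorCode` and `makeErrorResponse` are hypothetical helpers invented for this illustration; in a real server the SDK constructs this envelope for you.

```typescript
// Standard JSON-RPC 2.0 error codes, as listed in the table above.
const JsonRpcErrorCode = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InvalidParams: -32602,
  InternalError: -32603,
} as const;

type JsonRpcErrorCodeValue =
  (typeof JsonRpcErrorCode)[keyof typeof JsonRpcErrorCode];

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: string | number | null;
  error: { code: number; message: string; data?: unknown };
}

// Hypothetical helper: assembles the error envelope shown earlier.
// The optional `data` field is omitted entirely when not provided.
function makeErrorResponse(
  id: string | number | null,
  code: JsonRpcErrorCodeValue,
  message: string,
  data?: unknown
): JsonRpcErrorResponse {
  return {
    jsonrpc: "2.0",
    id,
    error: { code, message, ...(data !== undefined ? { data } : {}) },
  };
}

const response = makeErrorResponse(
  1,
  JsonRpcErrorCode.InvalidParams,
  "Invalid params: 'query' is required",
  { field: "query", reason: "missing_required" }
);
console.log(JSON.stringify(response, null, 2));
```

Typing the code parameter as a union of the known values keeps arbitrary numbers out of the envelope, which is one possible design choice rather than a requirement of the protocol.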

Tool Error Handling

Most errors in MCP servers occur inside tool handlers. There are two ways to signal errors from tools:

Content-Level Errors (Recommended)

Return an error within the tool response. This is the recommended approach because the AI model can read the error and decide how to proceed:

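import * as fs from "node:fs/promises";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Server instance used by the examples in this lesson (name and version are placeholders)
const server = new McpServer({ name: "example-server", version: "1.0.0" });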

server.tool(
  "read-file",
  "Read the contents of a file",
  {
    path: z.string().describe("File path to read"),
  },
  async ({ path }) => {
    try {
      const content = await fs.readFile(path, "utf-8");
      return {
        content: [{ type: "text", text: content }],
      };
    } catch (error) {
      if (error instanceof Error && "code" in error) {
        const nodeError = error as NodeJS.ErrnoException;

        if (nodeError.code === "ENOENT") {
          return {
            content: [{
              type: "text",
              text: `File not found: ${path}. Check that the path is correct and the file exists.`,
            }],
            isError: true,
          };
        }

        if (nodeError.code === "EACCES") {
          return {
            content: [{
              type: "text",
              text: `Permission denied: cannot read ${path}. The server does not have read access to this file.`,
            }],
            isError: true,
          };
        }
      }

      return {
        content: [{
          type: "text",
          text: `Failed to read file: ${error instanceof Error ? error.message : "Unknown error"}`,
        }],
        isError: true,
      };
    }
  }
);
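
As the number of handled error codes grows, the branching in the read-file handler above can be moved into a lookup table. The following is a sketch only: `ERRNO_MESSAGES` and `fileErrorToResult` are hypothetical names, and the `EISDIR` entry is an extra case added for illustration.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical lookup: Node.js errno code -> message template.
const ERRNO_MESSAGES = new Map<string, (path: string) => string>([
  ["ENOENT", (p) => `File not found: ${p}. Check that the path is correct and the file exists.`],
  ["EACCES", (p) => `Permission denied: cannot read ${p}. The server does not have read access to this file.`],
  ["EISDIR", (p) => `${p} is a directory, not a file. Provide the path to a file instead.`],
]);

// Converts any thrown value into a content-level error result.
function fileErrorToResult(error: unknown, path: string): ToolResult {
  const code =
    typeof error === "object" && error !== null && "code" in error
      ? String((error as { code: unknown }).code)
      : undefined;
  const template = code !== undefined ? ERRNO_MESSAGES.get(code) : undefined;
  const text = template
    ? template(path)
    : `Failed to read file: ${error instanceof Error ? error.message : "Unknown error"}`;
  return { content: [{ type: "text", text }], isError: true };
}

// Simulated errno-style error, shaped like the ones fs.readFile throws.
const missing = Object.assign(new Error("no such file"), { code: "ENOENT" });
console.log(fileErrorToResult(missing, "/tmp/notes.txt").content[0].text);
```

Inside a handler, the whole catch block then reduces to `return fileErrorToResult(error, path);`.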

Protocol-Level Errors

Throw an McpError for protocol violations or situations where the request itself is invalid:

import { McpError, ErrorCode } from "@modelcontextprotocol/sdk/types.js";

server.tool(
  "admin-action",
  "Perform an administrative action",
  {
    action: z.string().describe("Action to perform"),
    token: z.string().describe("Admin authentication token"),
  },
  async ({ action, token }) => {
    if (!isValidToken(token)) {
      throw new McpError(
        ErrorCode.InvalidParams,
        "Invalid or expired authentication token"
      );
    }

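    // A protocol-level error is meant to fail the request with a JSON-RPC error
    // instead of a tool result. Some SDK versions catch errors thrown inside tool
    // callbacks and return them as isError results, so verify how yours behaves.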
    const result = await performAction(action);
    return {
      content: [{ type: "text", text: result }],
    };
  }
);

Choose the Right Error Level

Use content-level errors (isError: true) for operational failures that the AI model might recover from — file not found, API timeout, invalid data. Use protocol-level errors (throw McpError) for fundamental issues — authentication failures, invalid parameters, or unsupported operations. Content-level errors give the model more context to work with.

Building an Error Handler Utility

Create a reusable error handling layer for your server:

// errors.ts
export class ToolError extends Error {
  constructor(
    message: string,
    public code: string,
    public retryable: boolean = false,
    public details?: Record<string, unknown>
  ) {
    super(message);
    this.name = "ToolError";
  }
}

export function handleToolError(error: unknown) {
  if (error instanceof ToolError) {
    return {
      content: [{
        type: "text" as const,
        text: JSON.stringify({
          error: error.code,
          message: error.message,
          retryable: error.retryable,
          ...(error.details ? { details: error.details } : {}),
        }),
      }],
      isError: true,
    };
  }

  // Unknown errors
  const message = error instanceof Error
    ? error.message
    : "An unexpected error occurred";

  return {
    content: [{
      type: "text" as const,
      text: JSON.stringify({
        error: "INTERNAL_ERROR",
        message,
        retryable: false,
      }),
    }],
    isError: true,
  };
}

// Usage in tool handlers
server.tool(
  "update-record",
  "Update a database record",
  {
    id: z.string().describe("Record ID"),
    data: z.record(z.unknown()).describe("Fields to update"),
  },
  async ({ id, data }) => {
    try {
      const record = await db.findById(id);
      if (!record) {
        throw new ToolError(
          `Record '${id}' not found`,
          "NOT_FOUND",
          false
        );
      }

      const updated = await db.update(id, data);
      return {
        content: [{
          type: "text",
          text: JSON.stringify(updated, null, 2),
        }],
      };
    } catch (error) {
      return handleToolError(error);
    }
  }
);
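
Repeating the same try/catch in every handler is easy to forget. One option is a small higher-order function that wraps a handler once. This is a sketch under assumptions: `withErrorHandling` and `toErrorResult` are hypothetical names, and the `ToolError` class here is a compact stand-in for the one defined above.

```typescript
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

// Compact stand-in for the ToolError class defined above.
class ToolError extends Error {
  readonly code: string;
  readonly retryable: boolean;
  constructor(message: string, code: string, retryable = false) {
    super(message);
    this.name = "ToolError";
    this.code = code;
    this.retryable = retryable;
  }
}

// Same idea as handleToolError: turn any thrown value into an isError result.
function toErrorResult(error: unknown): ToolResult {
  const body =
    error instanceof ToolError
      ? { error: error.code, message: error.message, retryable: error.retryable }
      : {
          error: "INTERNAL_ERROR",
          message: error instanceof Error ? error.message : "An unexpected error occurred",
          retryable: false,
        };
  return { content: [{ type: "text", text: JSON.stringify(body) }], isError: true };
}

// Hypothetical wrapper: the handler body no longer needs its own try/catch.
function withErrorHandling<A>(
  handler: (args: A) => Promise<ToolResult>
): (args: A) => Promise<ToolResult> {
  return async (args) => {
    try {
      return await handler(args);
    } catch (error) {
      return toErrorResult(error);
    }
  };
}

const getRecord = withErrorHandling(async ({ id }: { id: string }) => {
  if (id !== "42") throw new ToolError(`Record '${id}' not found`, "NOT_FOUND");
  return { content: [{ type: "text", text: `Record ${id}` }] };
});

getRecord({ id: "7" }).then((result) => console.log(result.content[0].text));
```

The wrapped function has the same signature as the original handler, so it can be passed wherever the unwrapped handler would go.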

Graceful Degradation

When parts of your server fail, the remaining functionality should continue working.

Partial Failure Handling

server.tool(
  "system-health",
  "Check health of all system components",
  {},
  async () => {
    const checks = [
      { name: "Database", check: checkDatabase },
      { name: "Cache", check: checkCache },
      { name: "External API", check: checkExternalApi },
      { name: "File System", check: checkFileSystem },
    ];

    const results = await Promise.allSettled(
      checks.map(async ({ name, check }) => {
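        // Keep the timer handle so it can be cleared once the race settles
        let timer: ReturnType<typeof setTimeout> | undefined;
        try {
          const result = await Promise.race([
            check(),
            new Promise<never>((_, reject) => {
              timer = setTimeout(
                () => reject(new Error("Health check timed out")),
                5000
              );
            }),
          ]);
          // Spread first so a check result cannot overwrite name or status
          return { ...result, name, status: "healthy" };
        } catch (error) {
          return {
            name,
            status: "unhealthy",
            error: error instanceof Error ? error.message : "Unknown",
          };
        } finally {
          clearTimeout(timer);
        }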
      })
    );

    const report = results.map(r =>
      r.status === "fulfilled" ? r.value : { name: "Unknown", status: "error" }
    );

    const allHealthy = report.every(r => r.status === "healthy");

    return {
      content: [{
        type: "text",
        text: JSON.stringify({
          overall: allHealthy ? "healthy" : "degraded",
          components: report,
          timestamp: new Date().toISOString(),
        }, null, 2),
      }],
      isError: !allHealthy,
    };
  }
);
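
The timeout race used in the health check is useful on its own. A minimal sketch of a reusable version follows; `withTimeout` is a hypothetical helper name, and the 50 ms and 200 ms values are arbitrary demo numbers.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
// The timer is cleared in either case so no pending timeout outlives the call.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Demo: a slow operation (200 ms) raced against a 50 ms limit.
const slow = new Promise<string>((resolve) => setTimeout(() => resolve("ok"), 200));
withTimeout(slow, 50, "Database check").catch((error: Error) =>
  console.log(error.message)
);
```

Note that the race only stops waiting. It does not cancel the underlying work, so operations that support cancellation should still receive their own abort signal.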

Fallback Data Sources

server.tool(
  "get-user-info",
  "Get user information with fallback sources",
  {
    userId: z.string().describe("User ID"),
  },
  async ({ userId }) => {
    // Try primary data source
    try {
      const user = await primaryDb.getUser(userId);
      return {
        content: [{
          type: "text",
          text: JSON.stringify({ ...user, source: "primary" }, null, 2),
        }],
      };
    } catch (primaryError) {
      console.error("Primary DB failed:", primaryError);
    }

    // Fall back to cache
    try {
      const cached = await cache.get(`user:${userId}`);
      if (cached) {
        return {
          content: [{
            type: "text",
            text: JSON.stringify({
              ...JSON.parse(cached),
              source: "cache",
              warning: "Data may be stale — primary database is unavailable",
            }, null, 2),
          }],
        };
      }
    } catch (cacheError) {
      console.error("Cache failed:", cacheError);
    }

    // All sources failed
    return {
      content: [{
        type: "text",
        text: `Unable to retrieve user '${userId}'. Both the primary database and cache are unavailable. Please try again later.`,
      }],
      isError: true,
    };
  }
);

Always Have a Fallback Plan

Design every tool with at least two levels: (1) the happy path, (2) a graceful failure message. For critical tools, add a third level with cached or partial data. This prevents the AI model from getting stuck when external services fail.
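
The levels described above can be generalized into an ordered list of sources that are tried in turn. This is a sketch, not a prescribed pattern: `Source`, `FallbackResult`, and `loadWithFallback` are hypothetical names, and the two demo sources are fakes.

```typescript
interface Source<T> {
  name: string;
  load: () => Promise<T | undefined>;
}

type FallbackResult<T> =
  | { ok: true; value: T; source: string; degraded: boolean }
  | { ok: false; failures: string[] };

// Tries each source in order and returns the first one that yields data.
// `degraded` is true when anything other than the first source answered.
async function loadWithFallback<T>(sources: Source<T>[]): Promise<FallbackResult<T>> {
  const failures: string[] = [];
  for (let i = 0; i < sources.length; i++) {
    const { name, load } = sources[i];
    try {
      const value = await load();
      if (value !== undefined) {
        return { ok: true, value, source: name, degraded: i > 0 };
      }
      failures.push(`${name}: no data`);
    } catch (error) {
      failures.push(`${name}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  return { ok: false, failures };
}

// Demo with a failing primary source and a working cache.
loadWithFallback<{ id: string; name: string }>([
  { name: "primary", load: async () => { throw new Error("connection refused"); } },
  { name: "cache", load: async () => ({ id: "u1", name: "Ada" }) },
]).then((result) => console.log(result));
```

When `ok` is false, the collected `failures` list gives the tool enough detail to write a specific content-level error instead of a generic one.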

Retry Patterns

Exponential Backoff

async function withRetry<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    baseDelayMs?: number;
    maxDelayMs?: number;
    retryOn?: (error: unknown) => boolean;
  } = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 10000,
    retryOn = () => true,
  } = options;

  let lastError: unknown;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;

      if (attempt === maxRetries || !retryOn(error)) {
        throw error;
      }

      const delay = Math.min(
        baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000,
        maxDelayMs
      );

      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }

  throw lastError;
}

// Usage in a tool
server.tool(
  "fetch-external-data",
  "Fetch data from an external API with automatic retries",
  {
    endpoint: z.string().describe("API endpoint"),
  },
  async ({ endpoint }) => {
    try {
      const data = await withRetry(
        () => fetch(endpoint).then(r => {
          if (!r.ok) throw new Error(`HTTP ${r.status}`);
          return r.json();
        }),
        {
          maxRetries: 3,
          baseDelayMs: 500,
          retryOn: (error) => {
            // Only retry on network errors or 5xx status codes
            if (error instanceof TypeError) return true; // Network error
            if (error instanceof Error && error.message.startsWith("HTTP 5")) return true;
            return false;
          },
        }
      );

      return {
        content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
      };
    } catch (error) {
      return {
        content: [{
          type: "text",
          text: `Failed after retries: ${error instanceof Error ? error.message : String(error)}`,
        }],
        isError: true,
      };
    }
  }
);

Retry Only Transient Failures

Never retry authentication errors (401/403), validation errors (400), or "not found" errors (404). These will fail every time. Only retry on network timeouts, connection resets, and server errors (5xx). Retrying permanent failures wastes time and may trigger rate limits.
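
The rule above and the backoff arithmetic can be expressed as two small pure functions. This is a sketch with hypothetical names (`isRetryableStatus`, `backoffDelay`); it shows only the deterministic part of the delay, whereas the withRetry function above also adds up to 1000 ms of random jitter before applying the cap.

```typescript
// Treats only server errors (5xx) as transient. Client errors such as
// 400, 401, 403, and 404 will fail again, so they are never retried.
function isRetryableStatus(status: number): boolean {
  return status >= 500 && status <= 599;
}

// Deterministic part of the delay: base * 2^attempt, capped at maxDelayMs.
function backoffDelay(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 10000
): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

const schedule = [0, 1, 2, 3, 4].map((attempt) => backoffDelay(attempt));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 10000 ]
```

With the defaults, the fifth delay would be 16000 ms uncapped, which is why the cap matters: it bounds the worst-case wait no matter how many attempts are configured.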

Circuit Breaker Pattern

class CircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: "closed" | "open" | "half-open" = "closed";

  constructor(
    private threshold: number = 5,
    private resetTimeMs: number = 60000
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (Date.now() - this.lastFailure > this.resetTimeMs) {
        this.state = "half-open";
      } else {
        throw new Error("Circuit breaker is open — service unavailable");
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = "closed";
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = "open";
    }
  }
}

// One breaker per external service
const apiBreaker = new CircuitBreaker(5, 30000);
const dbBreaker = new CircuitBreaker(3, 60000);
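
A breaker is most useful when its rejections are translated into content-level errors the model can act on. The sketch below assumes the breaker signals an open circuit with an error message starting with "Circuit breaker is open", as the class above does. `Breaker`, `callViaBreaker`, and the error code strings are hypothetical names chosen for illustration.

```typescript
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

// Minimal interface satisfied by the CircuitBreaker class above.
interface Breaker {
  call<T>(fn: () => Promise<T>): Promise<T>;
}

// Hypothetical helper: runs `fn` through the breaker and always returns a tool result.
async function callViaBreaker<T>(
  breaker: Breaker,
  service: string,
  fn: () => Promise<T>
): Promise<ToolResult> {
  try {
    const value = await breaker.call(fn);
    return { content: [{ type: "text", text: JSON.stringify(value, null, 2) }] };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Assumption: an open breaker is recognized by its error message prefix.
    const circuitOpen = message.startsWith("Circuit breaker is open");
    const body = {
      error: circuitOpen ? "SERVICE_UNAVAILABLE" : "UPSTREAM_ERROR",
      service,
      message,
      ...(circuitOpen
        ? { suggestion: `${service} is temporarily disabled after repeated failures. Try again later.` }
        : {}),
    };
    return { content: [{ type: "text", text: JSON.stringify(body) }], isError: true };
  }
}

// Demo with a fake breaker that behaves as if the circuit were already open.
const openBreaker: Breaker = {
  call: async () => {
    throw new Error("Circuit breaker is open");
  },
};

callViaBreaker(openBreaker, "GitHub API", async () => ({ stars: 1 })).then((result) =>
  console.log(result.content[0].text)
);
```

Matching on a message prefix is fragile. A dedicated error subclass thrown by the breaker would be a sturdier signal if you control the breaker implementation.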

Error Handling in mcp-framework

In mcp-framework, the class-based approach provides a natural place for error handling:

import { MCPTool } from "mcp-framework";
import { z } from "zod";

class SafeApiTool extends MCPTool<{ url: string }> {
  name = "safe-api-call";
  description = "Make a safe API call with error handling";

  schema = {
    url: { type: z.string().url(), description: "API URL to call" },
  };

  async execute({ url }: { url: string }) {
    try {
      const response = await fetch(url, {
        signal: AbortSignal.timeout(10000),
      });

      if (!response.ok) {
        // mcp-framework converts thrown errors to MCP error responses
        throw new Error(
          `API returned ${response.status}: ${response.statusText}`
        );
      }

      return await response.text();
    } catch (error) {
      if (error instanceof DOMException && error.name === "TimeoutError") {
        throw new Error("API request timed out after 10 seconds");
      }
      throw error;
    }
  }
}

Writing User-Friendly Error Messages

Error messages are read by both AI models and end users. Write them for both audiences:

1. State what happened. Be specific. "Database connection failed" is better than "Internal error."

2. Explain why (if known). "Database connection timed out after 5 seconds" helps the AI model decide whether to retry.

3. Suggest next steps. "Try again in a few minutes" or "Check that the file path is correct" gives actionable guidance.

4. Include structured data. Return JSON with error codes and metadata so AI models can parse the error programmatically, not just read it as text.

// Bad error message
return {
  content: [{ type: "text", text: "Error occurred" }],
  isError: true,
};

// Good error message
return {
  content: [{
    type: "text",
    text: JSON.stringify({
      error: "RATE_LIMITED",
      message: "GitHub API rate limit exceeded. Limit resets at 2026-04-01T15:30:00Z.",
      retryable: true,
      retryAfterSeconds: 120,
      suggestion: "Wait 2 minutes before making another request, or use a different authentication token with higher rate limits.",
    }, null, 2),
  }],
  isError: true,
};
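
To apply the four guidelines consistently, the error payload can be assembled by a single builder. This is one possible shape, not a required format: `ErrorDetails` and `buildErrorResult` are hypothetical names, and the field names follow the "good" example above.

```typescript
interface ErrorDetails {
  code: string;        // machine-readable code (structured data)
  what: string;        // what happened
  why?: string;        // why it happened, if known
  suggestion?: string; // suggested next step
  retryable?: boolean; // defaults to false when omitted
}

// Builds a content-level error result from the four ingredients.
function buildErrorResult(details: ErrorDetails) {
  const message = details.why ? `${details.what}: ${details.why}` : details.what;
  const payload = {
    error: details.code,
    message,
    retryable: details.retryable ?? false,
    ...(details.suggestion ? { suggestion: details.suggestion } : {}),
  };
  return {
    content: [{ type: "text" as const, text: JSON.stringify(payload, null, 2) }],
    isError: true,
  };
}

const timeoutResult = buildErrorResult({
  code: "DB_TIMEOUT",
  what: "Database connection failed",
  why: "timed out after 5 seconds",
  suggestion: "Try again in a few minutes.",
  retryable: true,
});
console.log(timeoutResult.content[0].text);
```

Making `code` and `what` required while leaving the rest optional means the builder cannot produce a message as empty as "Error occurred".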

Frequently Asked Questions