Error Handling Strategies
Build resilient MCP servers with structured error types, graceful degradation, retry patterns, and user-friendly error messages using the official TypeScript SDK and mcp-framework.
title: "Error Handling Strategies"
description: "Build resilient MCP servers with structured error types, graceful degradation, retry patterns, and user-friendly error messages using the official TypeScript SDK and mcp-framework."
order: 13
level: "intermediate"
duration: "25 min"
keywords:
  - "MCP error handling"
  - "MCP error types"
  - "MCP graceful degradation"
  - "MCP retry patterns"
  - "MCP server resilience"
  - "@modelcontextprotocol/sdk errors"
  - "mcp-framework error handling"
  - "MCP production errors"
date: "2026-04-01"
Errors are inevitable in production MCP servers — APIs go down, databases time out, and users provide unexpected input. This lesson covers the MCP protocol's error model, how to structure error responses that help AI models recover gracefully, retry strategies for transient failures, graceful degradation patterns, and how to produce error messages that are useful to both AI models and end users. Examples use the official TypeScript SDK and mcp-framework.
The MCP Error Model
When an MCP request fails, the server returns a JSON-RPC error response containing a numeric code, a human-readable message, and an optional data field with structured details. MCP is built on JSON-RPC 2.0, which defines this error format along with standard codes for common failures such as "method not found" or "invalid params":
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32602,
    "message": "Invalid params: 'query' is required",
    "data": {
      "field": "query",
      "reason": "missing_required"
    }
  }
}
```
Standard Error Codes
| Code | Name | Meaning |
|---|---|---|
| -32700 | Parse Error | Invalid JSON received |
| -32600 | Invalid Request | JSON is valid but not a proper request |
| -32601 | Method Not Found | Requested method does not exist |
| -32602 | Invalid Params | Method parameters are invalid |
| -32603 | Internal Error | Server-side error during execution |
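To make the envelope concrete, here is a minimal TypeScript sketch that builds an error response from the codes in the table. The JsonRpcErrorCode object and the makeErrorResponse helper are illustrative names, not part of the MCP SDK (which exports its own ErrorCode enum and normally builds these responses for you).

```typescript
// Standard JSON-RPC 2.0 error codes, matching the table above.
const JsonRpcErrorCode = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InvalidParams: -32602,
  InternalError: -32603,
} as const;

type StandardErrorCode = (typeof JsonRpcErrorCode)[keyof typeof JsonRpcErrorCode];

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  // JSON-RPC requires null when the request id could not be determined (e.g. a parse error).
  id: string | number | null;
  error: { code: number; message: string; data?: unknown };
}

function makeErrorResponse(
  id: string | number | null,
  code: StandardErrorCode,
  message: string,
  data?: unknown
): JsonRpcErrorResponse {
  // Omit the optional data member entirely when there is nothing to attach.
  const error = data === undefined ? { code, message } : { code, message, data };
  return { jsonrpc: "2.0", id, error };
}

// Rebuilds the invalid-params example shown earlier in this section.
const response = makeErrorResponse(1, JsonRpcErrorCode.InvalidParams, "Invalid params: 'query' is required", {
  field: "query",
  reason: "missing_required",
});
console.log(JSON.stringify(response, null, 2));
```

Typing the code parameter as one of the standard values catches typos at compile time; servers that also need implementation-defined codes (JSON-RPC reserves -32000 to -32099 for those) would widen it to number.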
Tool Error Handling
Most errors in MCP servers occur inside tool handlers. There are two ways to signal errors from tools:
Content-Level Errors (Recommended)
Return an error within the tool response. This is the recommended approach because the AI model can read the error and decide how to proceed:
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { promises as fs } from "node:fs";
import { z } from "zod";

// Name and version are placeholders for your server's identity.
const server = new McpServer({ name: "file-server", version: "1.0.0" });

server.tool(
  "read-file",
  "Read the contents of a file",
  {
    path: z.string().describe("File path to read"),
  },
  async ({ path }) => {
    try {
      const content = await fs.readFile(path, "utf-8");
      return {
        content: [{ type: "text", text: content }],
      };
    } catch (error) {
      // Map well-known Node.js error codes to specific, actionable messages.
      if (error instanceof Error && "code" in error) {
        const nodeError = error as NodeJS.ErrnoException;
        if (nodeError.code === "ENOENT") {
          return {
            content: [{
              type: "text",
              text: `File not found: ${path}. Check that the path is correct and the file exists.`,
            }],
            isError: true,
          };
        }
        if (nodeError.code === "EACCES") {
          return {
            content: [{
              type: "text",
              text: `Permission denied: cannot read ${path}. The server does not have read access to this file.`,
            }],
            isError: true,
          };
        }
      }
      // Fallback for anything unexpected.
      return {
        content: [{
          type: "text",
          text: `Failed to read file: ${error instanceof Error ? error.message : "Unknown error"}`,
        }],
        isError: true,
      };
    }
  }
);
```
Protocol-Level Errors
Throw a McpError for protocol violations or situations where the request itself is invalid:
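```typescript
import { McpError, ErrorCode } from "@modelcontextprotocol/sdk/types.js";

// isValidToken and performAction are application-specific helpers (not shown).
server.tool(
  "admin-action",
  "Perform an administrative action",
  {
    action: z.string().describe("Action to perform"),
    token: z.string().describe("Admin authentication token"),
  },
  async ({ action, token }) => {
    if (!isValidToken(token)) {
      // Intended as a protocol-level error: the request is rejected instead of
      // being answered with a tool result. Depending on the SDK version, McpServer
      // may catch errors thrown from a tool callback and return them as an isError
      // result; handlers registered directly on the low-level Server class send a
      // thrown McpError back to the client as a JSON-RPC error.
      throw new McpError(
        ErrorCode.InvalidParams,
        "Invalid or expired authentication token"
      );
    }
    const result = await performAction(action);
    return {
      content: [{ type: "text", text: result }],
    };
  }
);
```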
Use content-level errors (isError: true) for operational failures that the AI model might recover from — file not found, API timeout, invalid data. Use protocol-level errors (throw McpError) for fundamental issues — authentication failures, invalid parameters, or unsupported operations. Content-level errors give the model more context to work with.
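The rule of thumb above can be centralized in one wrapper so individual tools do not repeat it. The sketch below is self-contained and does not use the SDK: ProtocolError is a hypothetical stand-in for McpError, and ToolResult mirrors only the content and isError fields of a tool result. Operational failures become isError results; protocol-level failures are rethrown for the transport layer to report.

```typescript
// Hypothetical marker class for failures where the request itself is invalid.
class ProtocolError extends Error {
  readonly code: number;
  constructor(code: number, message: string) {
    super(message);
    this.name = "ProtocolError";
    this.code = code;
  }
}

// Minimal subset of a tool result: text content plus the isError flag.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Runs a tool body and applies the content-level vs protocol-level rule.
async function runTool(body: () => Promise<string>): Promise<ToolResult> {
  try {
    const text = await body();
    return { content: [{ type: "text", text }] };
  } catch (error) {
    // Protocol-level: let the caller turn this into a JSON-RPC error.
    if (error instanceof ProtocolError) throw error;
    // Operational: report inside the result so the model can read it and adapt.
    const message = error instanceof Error ? error.message : String(error);
    return { content: [{ type: "text", text: message }], isError: true };
  }
}
```

In a real server the check would be error instanceof McpError, and the wrapped body would be the tool's own logic.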
Building an Error Handler Utility
Create a reusable error handling layer for your server:
```typescript
// errors.ts
export class ToolError extends Error {
  constructor(
    message: string,
    public code: string,
    public retryable: boolean = false,
    public details?: Record<string, unknown>
  ) {
    super(message);
    this.name = "ToolError";
  }
}

export function handleToolError(error: unknown) {
  if (error instanceof ToolError) {
    return {
      content: [{
        type: "text" as const,
        text: JSON.stringify({
          error: error.code,
          message: error.message,
          retryable: error.retryable,
          ...(error.details ? { details: error.details } : {}),
        }),
      }],
      isError: true,
    };
  }

  // Unknown errors
  const message = error instanceof Error
    ? error.message
    : "An unexpected error occurred";
  return {
    content: [{
      type: "text" as const,
      text: JSON.stringify({
        error: "INTERNAL_ERROR",
        message,
        retryable: false,
      }),
    }],
    isError: true,
  };
}

// Usage in tool handlers
server.tool(
  "update-record",
  "Update a database record",
  {
    id: z.string().describe("Record ID"),
    data: z.record(z.unknown()).describe("Fields to update"),
  },
  async ({ id, data }) => {
    try {
      const record = await db.findById(id);
      if (!record) {
        throw new ToolError(
          `Record '${id}' not found`,
          "NOT_FOUND",
          false
        );
      }
      const updated = await db.update(id, data);
      return {
        content: [{
          type: "text",
          text: JSON.stringify(updated, null, 2),
        }],
      };
    } catch (error) {
      return handleToolError(error);
    }
  }
);
```
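handleToolError only knows what the thrower tells it. One way to keep the code and retryable fields consistent across tools is to derive them from the underlying error. The sketch below maps a few Node.js errno codes to a classification; the output codes (NOT_FOUND, CONNECTION_RESET, and so on) and which ones count as retryable are assumptions, not a standard.

```typescript
interface ErrorClassification {
  code: string; // stable, machine-readable code for the error payload
  retryable: boolean; // whether repeating the same call could plausibly succeed
}

// Illustrative mapping from Node.js errno codes; extend it with the codes you actually see.
const ERRNO_CLASSIFICATION = new Map<string, ErrorClassification>([
  ["ENOENT", { code: "NOT_FOUND", retryable: false }],
  ["EACCES", { code: "PERMISSION_DENIED", retryable: false }],
  ["ECONNREFUSED", { code: "SERVICE_UNAVAILABLE", retryable: true }],
  ["ECONNRESET", { code: "CONNECTION_RESET", retryable: true }],
  ["ETIMEDOUT", { code: "TIMEOUT", retryable: true }],
]);

function classifyError(error: unknown): ErrorClassification {
  // Node attaches errno-style codes as a string `code` property on Error objects.
  const errno = error instanceof Error ? (error as { code?: unknown }).code : undefined;
  if (typeof errno === "string") {
    const known = ERRNO_CLASSIFICATION.get(errno);
    if (known) return known;
  }
  // Anything unrecognized is treated as a non-retryable internal error.
  return { code: "INTERNAL_ERROR", retryable: false };
}
```

The result plugs into the existing constructor: with c = classifyError(err), a handler can throw new ToolError(err.message, c.code, c.retryable).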
Graceful Degradation
When parts of your server fail, the remaining functionality should continue working.
Partial Failure Handling
```typescript
// checkDatabase, checkCache, checkExternalApi and checkFileSystem are
// application-specific probes (not shown) that resolve with an object of details.
server.tool(
  "system-health",
  "Check health of all system components",
  {},
  async () => {
    const checks = [
      { name: "Database", check: checkDatabase },
      { name: "Cache", check: checkCache },
      { name: "External API", check: checkExternalApi },
      { name: "File System", check: checkFileSystem },
    ];
    const results = await Promise.allSettled(
      checks.map(async ({ name, check }) => {
        let timer: ReturnType<typeof setTimeout> | undefined;
        try {
          // Fail any check that takes longer than 5 seconds.
          const result = await Promise.race([
            check(),
            new Promise<never>((_, reject) => {
              timer = setTimeout(() => reject(new Error("Health check timed out")), 5000);
            }),
          ]);
          return { name, status: "healthy", ...result };
        } catch (error) {
          return {
            name,
            status: "unhealthy",
            error: error instanceof Error ? error.message : "Unknown",
          };
        } finally {
          // Clear the timer so a fast check does not leave it pending.
          clearTimeout(timer);
        }
      })
    );
    const report = results.map(r =>
      r.status === "fulfilled" ? r.value : { name: "Unknown", status: "error" }
    );
    const allHealthy = report.every(r => r.status === "healthy");
    return {
      content: [{
        type: "text",
        text: JSON.stringify({
          overall: allHealthy ? "healthy" : "degraded",
          components: report,
          timestamp: new Date().toISOString(),
        }, null, 2),
      }],
      isError: !allHealthy,
    };
  }
);
```
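The Promise.race timeout in the health check can be factored into a reusable helper. This sketch (the name withTimeout is an assumption) also clears the timer once the race settles, so a fast operation does not leave a pending timer behind.

```typescript
// Reject if `promise` does not settle within `ms` milliseconds.
// Note: the underlying operation is not cancelled; its result is simply ignored.
async function withTimeout<T>(promise: Promise<T>, ms: number, label = "Operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Runs whether the operation or the timeout won the race.
    clearTimeout(timer);
  }
}
```

Because the wrapped promise keeps running after a timeout, pair this with an AbortSignal when the underlying API supports cancellation.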
Fallback Data Sources
```typescript
server.tool(
  "get-user-info",
  "Get user information with fallback sources",
  {
    userId: z.string().describe("User ID"),
  },
  async ({ userId }) => {
    // Try primary data source
    try {
      const user = await primaryDb.getUser(userId);
      return {
        content: [{
          type: "text",
          text: JSON.stringify({ ...user, source: "primary" }, null, 2),
        }],
      };
    } catch (primaryError) {
      console.error("Primary DB failed:", primaryError);
    }

    // Fall back to cache
    try {
      const cached = await cache.get(`user:${userId}`);
      if (cached) {
        return {
          content: [{
            type: "text",
            text: JSON.stringify({
              ...JSON.parse(cached),
              source: "cache",
              warning: "Data may be stale — primary database is unavailable",
            }, null, 2),
          }],
        };
      }
    } catch (cacheError) {
      console.error("Cache failed:", cacheError);
    }

    // All sources failed
    return {
      content: [{
        type: "text",
        text: `Unable to retrieve user '${userId}'. Both the primary database and cache are unavailable. Please try again later.`,
      }],
      isError: true,
    };
  }
);
```
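The fallback chain in get-user-info generalizes to any ordered list of sources. In this minimal sketch (Source, firstAvailable, and the result shape are hypothetical names), each source either returns data, returns undefined to mean "nothing here", or throws; the helper records each failure so the caller can decide whether to attach a staleness warning.

```typescript
interface Source<T> {
  name: string;
  // Resolve with data, resolve with undefined for "nothing here", or reject on failure.
  load: () => Promise<T | undefined>;
}

interface SourceFailure {
  source: string;
  message: string;
}

type FallbackResult<T> =
  | { ok: true; data: T; source: string; failures: SourceFailure[] }
  | { ok: false; failures: SourceFailure[] };

// Try each source in order and return the first one that produces data.
async function firstAvailable<T>(sources: Source<T>[]): Promise<FallbackResult<T>> {
  const failures: SourceFailure[] = [];
  for (const { name, load } of sources) {
    try {
      const data = await load();
      if (data !== undefined) return { ok: true, data, source: name, failures };
    } catch (error) {
      // Record the failure and fall through to the next source.
      failures.push({ source: name, message: error instanceof Error ? error.message : String(error) });
    }
  }
  return { ok: false, failures };
}
```

A non-empty failures list on a successful result is the signal to add a warning such as "Data may be stale".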
Design every tool with at least two levels: (1) the happy path, (2) a graceful failure message. For critical tools, add a third level with cached or partial data. This prevents the AI model from getting stuck when external services fail.
Retry Patterns
Exponential Backoff
```typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    baseDelayMs?: number;
    maxDelayMs?: number;
    retryOn?: (error: unknown) => boolean;
  } = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 10000,
    retryOn = () => true,
  } = options;

  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries || !retryOn(error)) {
        throw error;
      }
      const delay = Math.min(
        baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000,
        maxDelayMs
      );
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage in a tool
server.tool(
  "fetch-external-data",
  "Fetch data from an external API with automatic retries",
  {
    endpoint: z.string().describe("API endpoint"),
  },
  async ({ endpoint }) => {
    try {
      const data = await withRetry(
        () => fetch(endpoint).then(r => {
          if (!r.ok) throw new Error(`HTTP ${r.status}`);
          return r.json();
        }),
        {
          maxRetries: 3,
          baseDelayMs: 500,
          retryOn: (error) => {
            // Only retry on network errors or 5xx status codes
            if (error instanceof TypeError) return true; // Network error
            if (error instanceof Error && error.message.startsWith("HTTP 5")) return true;
            return false;
          },
        }
      );
      return {
        content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
      };
    } catch (error) {
      return {
        content: [{
          type: "text",
          text: `Failed after retries: ${error instanceof Error ? error.message : String(error)}`,
        }],
        isError: true,
      };
    }
  }
);
```
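To see what the backoff formula in withRetry produces, the delay computation can be pulled out into a pure function with the random source injected. The name backoffDelay and the jitterMs and random parameters are assumptions for this sketch; the formula itself is the one withRetry uses.

```typescript
// Delay after failed attempt `attempt` (0-based), using withRetry's formula:
// exponential growth from baseDelayMs plus up to jitterMs of random jitter,
// capped at maxDelayMs.
function backoffDelay(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  jitterMs = 1000,
  random: () => number = Math.random
): number {
  return Math.min(baseDelayMs * 2 ** attempt + random() * jitterMs, maxDelayMs);
}

// fetch-external-data uses maxRetries 3, baseDelayMs 500 and the default maxDelayMs 10000.
// With jitter removed (random always returns 0), its three waits are 500, 1000 and 2000 ms.
const schedule = [0, 1, 2].map((attempt) => backoffDelay(attempt, 500, 10000, 1000, () => 0));
```

With maximum jitter each wait grows by up to 1 second, so fetch-external-data sleeps at most about 6.5 seconds in total before reporting failure. Because the jitter range is fixed at 1 second, it dominates when baseDelayMs is small; a common alternative ("full jitter") draws each delay uniformly between 0 and the capped exponential value instead.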
Never retry authentication errors (401/403), validation errors (400), or "not found" errors (404). These will fail every time. Only retry on network timeouts, connection resets, and server errors (5xx). Retrying permanent failures wastes time and may trigger rate limits.
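The retryOn predicate in fetch-external-data decides retryability by matching the message prefix "HTTP 5", which breaks silently if the message format changes. A sturdier option, sketched here with a hypothetical HttpError class, is to carry the status code on the error and apply the rule above to the number.

```typescript
// Carries the HTTP status as a number so retry decisions do not depend on message text.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, statusText = "") {
    super(`HTTP ${status}${statusText ? ` ${statusText}` : ""}`);
    this.name = "HttpError";
    this.status = status;
  }
}

// Retry network failures (fetch rejects with a TypeError) and 5xx responses only;
// 400, 401, 403, 404 and every other 4xx are treated as permanent.
function isRetryable(error: unknown): boolean {
  if (error instanceof HttpError) return error.status >= 500 && error.status <= 599;
  return error instanceof TypeError;
}
```

In the tool, the fetch callback would throw new HttpError(r.status, r.statusText) when r.ok is false, and withRetry would receive retryOn: isRetryable.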
Circuit Breaker Pattern
```typescript
class CircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: "closed" | "open" | "half-open" = "closed";

  constructor(
    private threshold: number = 5,
    private resetTimeMs: number = 60000
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (Date.now() - this.lastFailure > this.resetTimeMs) {
        this.state = "half-open";
      } else {
        throw new Error("Circuit breaker is open — service unavailable");
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = "closed";
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = "open";
    }
  }
}

// One breaker per external service
const apiBreaker = new CircuitBreaker(5, 30000);
const dbBreaker = new CircuitBreaker(3, 60000);
```
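The CircuitBreaker class reads Date.now() directly, which makes its transitions hard to unit-test. Below is a sketch of the same state machine with the clock injected; ClockedCircuitBreaker and the now parameter are assumptions, not from any library. Consecutive failures reaching threshold open the circuit, calls fail fast while it is open, and once resetTimeMs has passed the next call is let through as a trial that either closes the circuit (success) or reopens it (failure).

```typescript
type BreakerState = "closed" | "open" | "half-open";

class ClockedCircuitBreaker {
  state: BreakerState = "closed";
  private failures = 0;
  private lastFailure = 0;
  private readonly threshold: number;
  private readonly resetTimeMs: number;
  private readonly now: () => number;

  constructor(threshold: number, resetTimeMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.resetTimeMs = resetTimeMs;
    this.now = now;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      // Fail fast until resetTimeMs has elapsed since the last failure.
      if (this.now() - this.lastFailure <= this.resetTimeMs) {
        throw new Error("Circuit breaker is open: service unavailable");
      }
      this.state = "half-open"; // let a trial call through
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (error) {
      this.failures++;
      this.lastFailure = this.now();
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.threshold) this.state = "open";
      throw error;
    }
  }
}
```

When combining a breaker with withRetry, one reasonable choice is to wrap the retrying call inside the breaker (breaker.call(() => withRetry(...))), so an exhausted retry sequence counts as a single failure; wrapping each attempt instead makes the breaker trip faster.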
Error Handling in mcp-framework
In mcp-framework, the class-based approach provides a natural place for error handling:
```typescript
import { MCPTool } from "mcp-framework";
import { z } from "zod";

class SafeApiTool extends MCPTool<{ url: string }> {
  name = "safe-api-call";
  description = "Make a safe API call with error handling";

  schema = {
    url: { type: z.string().url(), description: "API URL to call" },
  };

  async execute({ url }: { url: string }) {
    try {
      const response = await fetch(url, {
        // Abort the request if it takes longer than 10 seconds.
        signal: AbortSignal.timeout(10000),
      });
      if (!response.ok) {
        // mcp-framework converts thrown errors to MCP error responses
        throw new Error(
          `API returned ${response.status}: ${response.statusText}`
        );
      }
      return await response.text();
    } catch (error) {
      // AbortSignal.timeout() rejects fetch with a DOMException named "TimeoutError".
      if (error instanceof DOMException && error.name === "TimeoutError") {
        throw new Error("API request timed out after 10 seconds");
      }
      throw error;
    }
  }
}

// mcp-framework discovers tools through the default export of each tool file.
export default SafeApiTool;
```
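The same classification can live in a helper that any tool class calls from its catch block, so every tool words common fetch failures the same way. This is a sketch: describeFetchError is a hypothetical name, and it assumes a runtime with a global DOMException (Node.js 17+ or a browser).

```typescript
// Turn common fetch failures into messages that say what happened and what to do next.
function describeFetchError(error: unknown, url: string, timeoutMs: number): string {
  if (error instanceof DOMException && error.name === "TimeoutError") {
    return `Request to ${url} timed out after ${timeoutMs / 1000} seconds. The service may be slow or down; try again later.`;
  }
  if (error instanceof TypeError) {
    // fetch rejects with a TypeError for network-level failures and malformed URLs.
    return `Could not reach ${url}. Check that the URL is correct and the service is reachable.`;
  }
  return `Request to ${url} failed: ${error instanceof Error ? error.message : String(error)}`;
}
```

Inside execute(), the catch block would then reduce to throw new Error(describeFetchError(error, url, 10000)).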
Writing User-Friendly Error Messages
Error messages are read by both AI models and end users. Write them for both audiences:
State what happened
Be specific. "Database connection failed" is better than "Internal error."
Explain why (if known)
"Database connection timed out after 5 seconds" helps the AI model decide whether to retry.
Suggest next steps
"Try again in a few minutes" or "Check that the file path is correct" gives actionable guidance.
Include structured data
Return JSON with error codes and metadata so AI models can parse the error programmatically, not just read it as text.
```typescript
// Bad error message
return {
  content: [{ type: "text", text: "Error occurred" }],
  isError: true,
};

// Good error message
return {
  content: [{
    type: "text",
    text: JSON.stringify({
      error: "RATE_LIMITED",
      message: "GitHub API rate limit exceeded. Limit resets at 2026-04-01T15:30:00Z.",
      retryable: true,
      retryAfterSeconds: 120,
      suggestion: "Wait 2 minutes before making another request, or use a different authentication token with higher rate limits.",
    }, null, 2),
  }],
  isError: true,
};
```
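To make the four guidelines hard to skip, a small builder can require each piece at compile time. In this sketch, ErrorDetails and buildErrorResult are hypothetical names; the output matches the shape of the good example above.

```typescript
interface ErrorDetails {
  code: string; // machine-readable, e.g. "RATE_LIMITED"
  message: string; // what happened, and why if known
  suggestion: string; // actionable next step
  retryable: boolean;
  retryAfterSeconds?: number; // only meaningful when retryable
}

// Builds an isError tool result whose text is structured JSON.
function buildErrorResult(details: ErrorDetails) {
  const { code, ...rest } = details;
  return {
    content: [{ type: "text" as const, text: JSON.stringify({ error: code, ...rest }, null, 2) }],
    isError: true,
  };
}

const rateLimited = buildErrorResult({
  code: "RATE_LIMITED",
  message: "GitHub API rate limit exceeded. Limit resets at 2026-04-01T15:30:00Z.",
  suggestion: "Wait 2 minutes before making another request.",
  retryable: true,
  retryAfterSeconds: 120,
});
```

Because suggestion is a required field, a tool cannot compile with a bare "Error occurred" message; the model still receives readable text, and clients can parse the JSON for error and retryAfterSeconds.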