---
title: "Running MCP Servers in Production"
description: "Real-world considerations for deploying MCP servers to production. Covers monitoring, scaling, reliability patterns, error handling, security, and operational best practices."
date: "2026-04-01"
order: 4
keywords:
  - MCP production
  - MCP deployment
  - MCP monitoring
  - MCP scaling
  - MCP server operations
  - production MCP servers
author: "MCP Academy"
---

Running MCP Servers in Production
Building an MCP server is one thing. Running it in production is another. This post covers the real-world concerns you will face when deploying MCP servers: transport selection, error handling, monitoring, scaling, security, and the operational patterns that keep servers reliable under load.
Beyond the Tutorial
Most MCP tutorials end once the server responds to a tool call. But production systems need to handle failures gracefully, scale under load, report their health, and operate securely. The good news is that MCP's clean protocol design makes production-grade servers very achievable. You just need to think about a few things beyond the happy path.
Production-ready: the state where a server is not just functionally correct, but also observable, resilient, secure, and operationally manageable. A production-ready MCP server handles errors gracefully, exposes health metrics, and can be deployed and updated without downtime.
Choosing the Right Transport
Your transport choice is the first production decision and it shapes your entire deployment architecture.
stdio — Local and Process-Managed
stdio is the simplest transport. The client spawns your server as a child process and communicates over standard input/output. This is ideal for:
- Claude Desktop integrations
- Local development tools
- Single-user scenarios
- Environments where the client manages the server lifecycle
In production, stdio servers are managed by the client process. If the client restarts, so does the server. There is no network layer to secure or monitor separately.
SSE and Streamable HTTP — Remote and Multi-Client
For servers that need to run independently of any specific client, HTTP-based transports are the production choice. They enable:
- Remote access from multiple clients
- Independent server lifecycle
- Standard HTTP infrastructure (load balancers, reverse proxies, TLS)
- Horizontal scaling
| Concern | stdio | HTTP-based (SSE / Streamable HTTP) |
|---|---|---|
| Deployment | Bundled with client | Independent service |
| Scaling | One instance per client | Shared across clients |
| Networking | None (pipes) | Standard HTTP |
| TLS/Security | Not applicable | Standard TLS, OAuth 2.1 |
| Load balancing | Not applicable | Standard HTTP load balancers |
| Monitoring | Client-side logging | Standard HTTP observability |
| Best for | Local tools, Claude Desktop | Shared services, team tools, APIs |
If your server is used by a single person on their machine, use stdio. If your server is shared across a team, accessed remotely, or needs to scale independently, use an HTTP-based transport.
Error Handling That Does Not Lie
In production, how your server handles errors matters as much as how it handles success. MCP defines structured error responses, and you should use them consistently.
Fail Clearly
When a tool fails, return a clear error message that helps the LLM (and the user) understand what went wrong. Do not swallow errors silently or return vague messages.
```typescript
import { MCPTool } from "mcp-framework";
import { z } from "zod";

// this.db, QuerySyntaxError, and ConnectionError are illustrative
// stand-ins for your database client and its error types.
const inputSchema = {
  query: z.string().describe("SQL SELECT statement"),
};

class DatabaseQueryTool extends MCPTool<typeof inputSchema> {
  name = "query_database";
  description = "Execute a read-only SQL query";
  schema = inputSchema;

  async execute({ query }: { query: string }) {
    try {
      const results = await this.db.query(query);
      return JSON.stringify(results, null, 2);
    } catch (error) {
      if (error instanceof QuerySyntaxError) {
        // Help the LLM fix the query
        return `Query syntax error: ${error.message}. Check the SQL syntax near: ${error.position}`;
      }
      if (error instanceof ConnectionError) {
        // Signal a transient failure
        throw new Error("Database temporarily unavailable. Try again in a moment.");
      }
      // Unknown errors — log internally, return a safe message
      console.error("Unexpected database error:", error);
      throw new Error("An unexpected error occurred while querying the database.");
    }
  }
}
```
Distinguish Transient From Permanent Failures
Not all errors are equal. A network timeout is transient — retrying might work. A malformed query is permanent — retrying will fail the same way. Your error messages should make this distinction clear so that LLMs and clients can decide whether to retry.
Production error messages should be helpful without leaking internal implementation details. Never include stack traces, connection strings, internal IP addresses, or credentials in error responses.
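The distinction can be encoded directly in your error types. A minimal sketch, where `TransientError` and `PermanentError` are illustrative names, not part of MCP or mcp-framework:

```typescript
// Illustrative error taxonomy: transient errors invite a retry,
// permanent errors should not be retried.
class TransientError extends Error {
  readonly retryable = true;
}
class PermanentError extends Error {
  readonly retryable = false;
}

// Format an error for the LLM, signalling whether a retry makes sense.
function describeFailure(error: unknown): string {
  if (error instanceof TransientError) {
    return `Temporary failure: ${error.message}. Retrying may succeed.`;
  }
  if (error instanceof PermanentError) {
    return `Permanent failure: ${error.message}. Do not retry; fix the input instead.`;
  }
  // Unknown errors: log internally, return a safe generic message.
  return "An unexpected error occurred.";
}
```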
Validate Inputs Aggressively
Do not trust that the LLM will always send valid inputs. Validate everything at the boundary.
```typescript
schema = {
  query: z.string()
    .min(1, "Query cannot be empty")
    .max(10000, "Query too long")
    .refine(
      (q) => q.trim().toUpperCase().startsWith("SELECT"),
      "Only SELECT queries are allowed"
    ),
};
```
Zod schemas in mcp-framework give you this validation layer for free. Use it thoroughly.
Monitoring and Observability
A production server you cannot observe is a production server you cannot trust.
Structured Logging
Log every tool call with enough context to debug issues after the fact. Use structured logging (JSON format) so your logs are searchable and parseable. Each log entry should include the tool name, a timestamp, the duration, and whether the call succeeded or failed.
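A minimal sketch of such a wrapper, assuming tools are exposed as async functions; the field names are a suggestion, not anything mandated by MCP:

```typescript
// Wrap a tool function so every call emits one JSON log line with the
// tool name, timestamp, duration, and outcome.
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

function withLogging(toolName: string, fn: ToolFn): ToolFn {
  return async (args) => {
    const start = Date.now();
    try {
      const result = await fn(args);
      console.log(JSON.stringify({
        tool: toolName,
        timestamp: new Date(start).toISOString(),
        durationMs: Date.now() - start,
        outcome: "success",
      }));
      return result;
    } catch (error) {
      console.log(JSON.stringify({
        tool: toolName,
        timestamp: new Date(start).toISOString(),
        durationMs: Date.now() - start,
        outcome: "error",
        message: error instanceof Error ? error.message : String(error),
      }));
      throw error;
    }
  };
}
```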
Key Metrics to Track
At minimum, track these metrics for every MCP server in production:
- Request rate — How many tool calls per minute
- Error rate — What percentage of calls fail
- Latency — p50, p95, and p99 response times
- Active connections — How many clients are connected (for HTTP transports)
- Dependency health — Status of databases, APIs, and other backends your tools use
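Any metrics library can collect these; as a toy illustration of the error-rate and latency-percentile calculations (in production you would use something like a Prometheus client rather than this in-memory sketch):

```typescript
// Toy in-memory recorder for request count, error rate, and latency percentiles.
class ToolMetrics {
  private latencies: number[] = [];
  private errors = 0;
  private calls = 0;

  record(durationMs: number, ok: boolean): void {
    this.calls += 1;
    if (!ok) this.errors += 1;
    this.latencies.push(durationMs);
  }

  errorRate(): number {
    return this.calls === 0 ? 0 : this.errors / this.calls;
  }

  // Nearest-rank percentile, e.g. percentile(0.95) for p95.
  percentile(p: number): number {
    const sorted = [...this.latencies].sort((a, b) => a - b);
    if (sorted.length === 0) return 0;
    const rank = Math.min(sorted.length - 1, Math.ceil(p * sorted.length) - 1);
    return sorted[rank];
  }
}
```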
Health Checks
For HTTP-based servers, expose a health check endpoint that verifies not just that the process is running, but that dependencies (databases, APIs) are reachable. Return a 503 status when any critical dependency is down so your load balancer can route traffic elsewhere.
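A sketch of such an endpoint using Node's built-in `http` module; the probe functions and the `/healthz` path are hypothetical stand-ins for real connectivity checks:

```typescript
import { createServer } from "node:http";

// Each probe verifies one dependency; the endpoint is healthy only if all pass.
type Probe = () => Promise<boolean>;

function healthHandler(probes: Record<string, Probe>) {
  return async () => {
    const results: Record<string, boolean> = {};
    for (const [name, probe] of Object.entries(probes)) {
      results[name] = await probe().catch(() => false);
    }
    const healthy = Object.values(results).every(Boolean);
    // 503 tells the load balancer to route traffic elsewhere.
    return { status: healthy ? 200 : 503, body: JSON.stringify(results) };
  };
}

const handler = healthHandler({
  database: async () => true,     // stand-in for a real connectivity check
  upstreamApi: async () => true,  // stand-in for a real API ping
});

const server = createServer(async (req, res) => {
  if (req.url === "/healthz") {
    const { status, body } = await handler();
    res.writeHead(status, { "Content-Type": "application/json" }).end(body);
  } else {
    res.writeHead(404).end();
  }
});
// server.listen(8080) in a real deployment
```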
Set alerts on user-facing symptoms: error rate above threshold, latency above SLA, health check failures. Investigating the cause is the job of your logs and metrics, not your alerts.
Scaling Patterns
Stateless Tools Scale Horizontally
If your tools do not maintain state between calls (and most should not), scaling is straightforward. Run multiple instances behind a load balancer and let the infrastructure distribute requests.
```
Client -> Load Balancer -> MCP Server Instance 1
                        -> MCP Server Instance 2
                        -> MCP Server Instance 3
```
Each instance handles requests independently. Add instances to handle more load, remove them when demand drops.
When State Is Unavoidable
Some tools need state — a tool that manages an ongoing session, or a resource that watches a file for changes. For these cases:
- Externalize state — Use Redis, a database, or another shared store so any instance can access it
- Use sticky sessions — Route requests from the same client to the same instance (less preferred, limits scaling flexibility)
- Accept eventual consistency — If your state can tolerate brief inconsistency, you gain much simpler scaling
Design your tools to be stateless whenever possible. Stateless tools are easier to scale, test, and reason about. If you need to maintain context across calls, externalize that state to a shared store.
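One way to externalize state is to hide the store behind a small interface, so an in-memory version can stand in during tests while Redis or a database backs it in production. This interface is an assumption for illustration, not an mcp-framework API:

```typescript
// Minimal key-value interface for externalized session state.
interface StateStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in for tests and local development only.
class InMemoryStore implements StateStore {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key); }
  async set(key: string, value: string) { this.data.set(key, value); }
}

// Any server instance can resume a session, because the state lives in the
// shared store rather than in process memory.
async function appendToSession(store: StateStore, sessionId: string, line: string) {
  const existing = (await store.get(`session:${sessionId}`)) ?? "";
  await store.set(`session:${sessionId}`, existing + line + "\n");
}
```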
Security Considerations
Production MCP servers must treat all tool call inputs as untrusted. LLMs respond to user prompts, which can include adversarial instructions. Apply the same rigor you would for a public-facing API.
Treat all input from tool calls as untrusted user input. Apply input validation, use parameterized queries, validate file paths, and never pass unsanitized input to shell commands.
Key security practices:
- Least privilege — Database tools should connect with read-only users. File system tools should be restricted to specific directories.
- Input sanitization — Guard against SQL injection, path traversal, and command injection in every tool.
- Authentication — HTTP-based servers must authenticate clients. The MCP specification supports OAuth 2.1. Log authentication attempts, both successful and failed.
- TLS everywhere — All remote connections should use HTTPS.
- Rate limiting — Protect your server and its dependencies from overload, especially tools that call expensive external APIs.
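Rate limiting often lives in a gateway, but a per-client token bucket inside the server is a workable fallback. A minimal sketch (capacity and refill rate are illustrative):

```typescript
// Token bucket: each call consumes one token; tokens refill continuously
// up to the bucket's capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryConsume(): boolean {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // allow the call
    }
    return false;   // reject: client should back off
  }
}
```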
Deployment Patterns
Docker is a natural fit for MCP servers. Use multi-stage builds to keep images small, run as a non-root user, and handle SIGTERM and SIGINT for graceful shutdown so in-flight requests complete before the process exits.
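The shutdown sequence might look like this in a Node-based HTTP server; the 10-second deadline is illustrative:

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  res.end("ok");
});

// On SIGTERM/SIGINT: stop accepting new connections, let in-flight
// requests finish, then exit.
function shutdown(signal: string) {
  console.log(JSON.stringify({ event: "shutdown", signal }));
  // close() stops new connections; its callback fires once existing
  // requests have completed.
  server.close(() => process.exit(0));
  // Hard deadline in case a request hangs.
  setTimeout(() => process.exit(1), 10_000).unref();
}

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));
```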
For zero-downtime deployments, use rolling updates: start new instances before stopping old ones and let the load balancer drain connections from retiring instances.
Before deploying, verify:
- Structured logging is configured
- Health checks are exposed
- Error responses are safe and helpful
- Input validation is thorough
- Dependencies are monitored
- Graceful shutdown is handled
- Security boundaries are enforced
Putting It All Together
Running MCP servers in production is not fundamentally different from running any other service. The same principles apply: observe everything, fail gracefully, scale horizontally, and secure the boundaries.
What makes MCP servers unique is that their consumers are LLMs, which means:
- Error messages need to be understandable by both humans and AI
- Input validation needs to account for the creative ways LLMs might misuse tools
- Latency matters because it affects the perceived speed of AI interactions
- Reliability directly impacts user trust in the AI system
Get these right, and your MCP servers will be solid foundations for production AI applications.
Further Reading
- Getting Started with mcp-framework — Build your first server
- Introducing mcp-framework — Framework features and architecture
- Best Practices — Development patterns for MCP servers
- MCP Specification — The full protocol documentation