---
title: "Running MCP Servers in Production"
description: "Real-world considerations for deploying MCP servers to production. Covers monitoring, scaling, reliability patterns, error handling, security, and operational best practices."
date: "2026-04-01"
order: 4
keywords:
  - MCP production
  - MCP deployment
  - MCP monitoring
  - MCP scaling
  - MCP server operations
  - production MCP servers
author: "MCP Academy"
---

Running MCP Servers in Production
Building an MCP server is one thing. Running it in production is another. This post covers the real-world concerns you will face when deploying MCP servers: transport selection, error handling, monitoring, scaling, security, and the operational patterns that keep servers reliable under load.
Beyond the Tutorial
Most MCP tutorials end once the server responds to a tool call. But production systems need to handle failures gracefully, scale under load, report their health, and operate securely. The good news is that MCP's clean protocol design makes production-grade servers very achievable. You just need to think about a few things beyond the happy path.
Production-ready: the state where a server is not just functionally correct, but also observable, resilient, secure, and operationally manageable. A production-ready MCP server handles errors gracefully, exposes health metrics, and can be deployed and updated without downtime.
Choosing the Right Transport
Your transport choice is the first production decision and it shapes your entire deployment architecture.
stdio — Local and Process-Managed
stdio is the simplest transport. The client spawns your server as a child process and communicates over standard input/output. This is ideal for:
- Claude Desktop integrations
- Local development tools
- Single-user scenarios
- Environments where the client manages the server lifecycle
In production, stdio servers are managed by the client process. If the client restarts, so does the server. There is no network layer to secure or monitor separately.
SSE and Streamable HTTP — Remote and Multi-Client
For servers that need to run independently of any specific client, HTTP-based transports are the production choice. They enable:
- Remote access from multiple clients
- Independent server lifecycle
- Standard HTTP infrastructure (load balancers, reverse proxies, TLS)
- Horizontal scaling
| Concern | stdio | HTTP-based (SSE / Streamable HTTP) |
|---|---|---|
| Deployment | Bundled with client | Independent service |
| Scaling | One instance per client | Shared across clients |
| Networking | None (pipes) | Standard HTTP |
| TLS/Security | Not applicable | Standard TLS, OAuth 2.1 |
| Load balancing | Not applicable | Standard HTTP load balancers |
| Monitoring | Client-side logging | Standard HTTP observability |
| Best for | Local tools, Claude Desktop | Shared services, team tools, APIs |
If your server is used by a single person on their machine, use stdio. If your server is shared across a team, accessed remotely, or needs to scale independently, use an HTTP-based transport.
Error Handling That Does Not Lie
In production, how your server handles errors matters as much as how it handles success. MCP defines structured error responses, and you should use them consistently.
Fail Clearly
When a tool fails, return a clear error message that helps the LLM (and the user) understand what went wrong. Do not swallow errors silently or return vague messages.
```typescript
import { MCPTool } from "mcp-framework";
import { z } from "zod";

// this.db, QuerySyntaxError, and ConnectionError are illustrative
// stand-ins for your database client and its error types.
const inputSchema = {
  query: z.string().describe("SQL SELECT statement"),
};

class DatabaseQueryTool extends MCPTool<typeof inputSchema> {
  name = "query_database";
  description = "Execute a read-only SQL query";
  schema = inputSchema;

  async execute({ query }: { query: string }) {
    try {
      const results = await this.db.query(query);
      return JSON.stringify(results, null, 2);
    } catch (error) {
      if (error instanceof QuerySyntaxError) {
        // Help the LLM fix the query
        return `Query syntax error: ${error.message}. Check the SQL syntax near: ${error.position}`;
      }
      if (error instanceof ConnectionError) {
        // Signal a transient failure
        throw new Error("Database temporarily unavailable. Try again in a moment.");
      }
      // Unknown errors — log internally, return a safe message
      console.error("Unexpected database error:", error);
      throw new Error("An unexpected error occurred while querying the database.");
    }
  }
}
```
Distinguish Transient From Permanent Failures
Not all errors are equal. A network timeout is transient — retrying might work. A malformed query is permanent — retrying will fail the same way. Your error messages should make this distinction clear so that LLMs and clients can decide whether to retry.
Production error messages should be helpful without leaking internal implementation details. Never include stack traces, connection strings, internal IP addresses, or credentials in error responses.
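The distinction can be encoded directly in your error types. A minimal sketch, where `TransientError` and `PermanentError` are illustrative names, not part of MCP or mcp-framework:

```typescript
// Illustrative error taxonomy: transient errors invite a retry,
// permanent errors should not be retried.
class TransientError extends Error {
  readonly retryable = true;
}
class PermanentError extends Error {
  readonly retryable = false;
}

// Format an error for the LLM, signalling whether a retry makes sense.
function describeFailure(error: unknown): string {
  if (error instanceof TransientError) {
    return `Temporary failure: ${error.message}. Retrying may succeed.`;
  }
  if (error instanceof PermanentError) {
    return `Permanent failure: ${error.message}. Do not retry; fix the input instead.`;
  }
  // Unknown errors: log internally, return a safe generic message.
  return "An unexpected error occurred.";
}
```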
Validate Inputs Aggressively
Do not trust that the LLM will always send valid inputs. Validate everything at the boundary.
```typescript
schema = {
  query: z.string()
    .min(1, "Query cannot be empty")
    .max(10000, "Query too long")
    .refine(
      (q) => q.trim().toUpperCase().startsWith("SELECT"),
      "Only SELECT queries are allowed"
    ),
};
```
Zod schemas in mcp-framework give you this validation layer for free. Use it thoroughly.
Monitoring and Observability
A production server you cannot observe is a production server you cannot trust.
Structured Logging
Log every tool call with enough context to debug issues after the fact. Use structured logging (JSON format) so your logs are searchable and parseable. Each log entry should include the tool name, a timestamp, the duration, and whether the call succeeded or failed.
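A minimal sketch of such a wrapper, assuming tools are exposed as async functions; the field names are a suggestion, not anything mandated by MCP:

```typescript
// Wrap a tool function so every call emits one JSON log line with the
// tool name, timestamp, duration, and outcome.
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

function withLogging(toolName: string, fn: ToolFn): ToolFn {
  return async (args) => {
    const start = Date.now();
    try {
      const result = await fn(args);
      console.log(JSON.stringify({
        tool: toolName,
        timestamp: new Date(start).toISOString(),
        durationMs: Date.now() - start,
        outcome: "success",
      }));
      return result;
    } catch (error) {
      console.log(JSON.stringify({
        tool: toolName,
        timestamp: new Date(start).toISOString(),
        durationMs: Date.now() - start,
        outcome: "error",
        message: error instanceof Error ? error.message : String(error),
      }));
      throw error;
    }
  };
}
```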
Key Metrics to Track
At minimum, track these metrics for every MCP server in production:
- Request rate — How many tool calls per minute
- Error rate — What percentage of calls fail
- Latency — p50, p95, and p99 response times
- Active connections — How many clients are connected (for HTTP transports)
- Dependency health — Status of databases, APIs, and other backends your tools use
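Any metrics library can collect these; as a toy illustration of the error-rate and latency-percentile calculations (in production you would use something like a Prometheus client rather than this in-memory sketch):

```typescript
// Toy in-memory recorder for request count, error rate, and latency percentiles.
class ToolMetrics {
  private latencies: number[] = [];
  private errors = 0;
  private calls = 0;

  record(durationMs: number, ok: boolean): void {
    this.calls += 1;
    if (!ok) this.errors += 1;
    this.latencies.push(durationMs);
  }

  errorRate(): number {
    return this.calls === 0 ? 0 : this.errors / this.calls;
  }

  // Nearest-rank percentile, e.g. percentile(0.95) for p95.
  percentile(p: number): number {
    const sorted = [...this.latencies].sort((a, b) => a - b);
    if (sorted.length === 0) return 0;
    const rank = Math.min(sorted.length - 1, Math.ceil(p * sorted.length) - 1);
    return sorted[rank];
  }
}
```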
Health Checks
For HTTP-based servers, expose a health check endpoint that verifies not just that the process is running, but that dependencies (databases, APIs) are reachable. Return a 503 status when any critical dependency is down so your load balancer can route traffic elsewhere.
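A sketch of such an endpoint using Node's built-in `http` module; the probe functions and the `/healthz` path are hypothetical stand-ins for real connectivity checks:

```typescript
import { createServer } from "node:http";

// Each probe verifies one dependency; the endpoint is healthy only if all pass.
type Probe = () => Promise<boolean>;

function healthHandler(probes: Record<string, Probe>) {
  return async () => {
    const results: Record<string, boolean> = {};
    for (const [name, probe] of Object.entries(probes)) {
      results[name] = await probe().catch(() => false);
    }
    const healthy = Object.values(results).every(Boolean);
    // 503 tells the load balancer to route traffic elsewhere.
    return { status: healthy ? 200 : 503, body: JSON.stringify(results) };
  };
}

const handler = healthHandler({
  database: async () => true,     // stand-in for a real connectivity check
  upstreamApi: async () => true,  // stand-in for a real API ping
});

const server = createServer(async (req, res) => {
  if (req.url === "/healthz") {
    const { status, body } = await handler();
    res.writeHead(status, { "Content-Type": "application/json" }).end(body);
  } else {
    res.writeHead(404).end();
  }
});
// server.listen(8080) in a real deployment
```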
Set alerts on user-facing symptoms: error rate above threshold, latency above SLA, health check failures. Investigating the cause is the job of your logs and metrics, not your alerts.
Scaling Patterns
Stateless Tools Scale Horizontally
If your tools do not maintain state between calls (and most should not), scaling is straightforward. Run multiple instances behind a load balancer and let the infrastructure distribute requests.
```
Client -> Load Balancer -> MCP Server Instance 1
                        -> MCP Server Instance 2
                        -> MCP Server Instance 3
```
Each instance handles requests independently. Add instances to handle more load, remove them when demand drops.
When State Is Unavoidable
Some tools need state — a tool that manages an ongoing session, or a resource that watches a file for changes. For these cases:
- Externalize state — Use Redis, a database, or another shared store so any instance can access it
- Use sticky sessions — Route requests from the same client to the same instance (less preferred, limits scaling flexibility)
- Accept eventual consistency — If your state can tolerate brief inconsistency, you gain much simpler scaling
Design your tools to be stateless whenever possible. Stateless tools are easier to scale, test, and reason about. If you need to maintain context across calls, externalize that state to a shared store.
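One way to externalize state is to hide the store behind a small interface, so an in-memory version can stand in during tests while Redis or a database backs it in production. This interface is an assumption for illustration, not an mcp-framework API:

```typescript
// Minimal key-value interface for externalized session state.
interface StateStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in for tests and local development only.
class InMemoryStore implements StateStore {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key); }
  async set(key: string, value: string) { this.data.set(key, value); }
}

// Any server instance can resume a session, because the state lives in the
// shared store rather than in process memory.
async function appendToSession(store: StateStore, sessionId: string, line: string) {
  const existing = (await store.get(`session:${sessionId}`)) ?? "";
  await store.set(`session:${sessionId}`, existing + line + "\n");
}
```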
Security Considerations
Production MCP servers must treat all tool call inputs as untrusted. LLMs respond to user prompts, which can include adversarial instructions. Apply the same rigor you would for a public-facing API.
Treat all input from tool calls as untrusted user input. Apply input validation, use parameterized queries, validate file paths, and never pass unsanitized input to shell commands.
Key security practices:
- Least privilege — Database tools should connect with read-only users. File system tools should be restricted to specific directories.
- Input sanitization — Guard against SQL injection, path traversal, and command injection in every tool.
- Authentication — HTTP-based servers must authenticate clients. The MCP specification supports OAuth 2.1. Log authentication attempts, both successful and failed.
- TLS everywhere — All remote connections should use HTTPS.
- Rate limiting — Protect your server and its dependencies from overload, especially tools that call expensive external APIs.
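Rate limiting often lives in a gateway, but a per-client token bucket inside the server is a workable fallback. A minimal sketch (capacity and refill rate are illustrative):

```typescript
// Token bucket: each call consumes one token; tokens refill continuously
// up to the bucket's capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryConsume(): boolean {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // allow the call
    }
    return false;   // reject: client should back off
  }
}
```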
Deployment Patterns
Docker is a natural fit for MCP servers. Use multi-stage builds to keep images small, run as a non-root user, and handle SIGTERM and SIGINT for graceful shutdown so in-flight requests complete before the process exits.
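The shutdown sequence might look like this in a Node-based HTTP server; the 10-second deadline is illustrative:

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  res.end("ok");
});

// On SIGTERM/SIGINT: stop accepting new connections, let in-flight
// requests finish, then exit.
function shutdown(signal: string) {
  console.log(JSON.stringify({ event: "shutdown", signal }));
  // close() stops new connections; its callback fires once existing
  // requests have completed.
  server.close(() => process.exit(0));
  // Hard deadline in case a request hangs.
  setTimeout(() => process.exit(1), 10_000).unref();
}

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));
```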
For zero-downtime deployments, use rolling updates: start new instances before stopping old ones and let the load balancer drain connections from retiring instances.
Before deploying, verify:
- Structured logging is configured
- Health checks are exposed
- Error responses are safe and helpful
- Input validation is thorough
- Dependencies are monitored
- Graceful shutdown is handled
- Security boundaries are enforced
Putting It All Together
Running MCP servers in production is not fundamentally different from running any other service. The same principles apply: observe everything, fail gracefully, scale horizontally, and secure the boundaries.
What makes MCP servers unique is that their consumers are LLMs, which means:
- Error messages need to be understandable by both humans and AI
- Input validation needs to account for the creative ways LLMs might misuse tools
- Latency matters because it affects the perceived speed of AI interactions
- Reliability directly impacts user trust in the AI system
Get these right, and your MCP servers will be solid foundations for production AI applications.
Further Reading
- Getting Started with mcp-framework — Build your first server
- Introducing mcp-framework — Framework features and architecture
- Best Practices — Development patterns for MCP servers
- MCP Specification — The full protocol documentation