MCP Hit 97 Million Monthly SDK Downloads: How to Build Production MCP Servers in 2026
The Model Context Protocol now powers 97 million monthly SDK downloads and 5,800 community servers. This guide walks you through building production-grade MCP servers with real code, security best practices, and the 2026 roadmap.
In March 2026, the Model Context Protocol crossed 97 million monthly SDK downloads across Python and TypeScript -- a number that would have seemed absurd when Anthropic first open-sourced the specification in late 2024. To put that in perspective, Express.js, the most popular Node.js web framework in the world, sits at roughly 120 million monthly downloads. MCP is closing in on framework-level ubiquity in under 18 months.
The adoption curve has been staggering. By January 2025, MCP had a few dozen community servers and a handful of early adopters. By mid-2025, OpenAI, Google DeepMind, Microsoft, and Amazon had all committed to supporting the protocol. Today, there are over 5,800 community-built MCP servers listed in the official registry, and every major AI provider treats MCP as the default integration layer for tool use. The protocol has become what HTTP is to the web: the shared contract that makes interoperability possible.
This guide covers everything you need to build production MCP servers in 2026. Whether you are exposing an internal API to AI agents, building a commercial integration, or contributing to the open-source ecosystem, you will find practical code, security patterns, and architectural decisions explained in detail. We will also cover when you should build a custom server versus using an existing one, the top 10 community servers worth knowing about, and what the 2026 roadmap means for the protocol's future.
Why MCP Won: A Brief History
Understanding where MCP came from helps you make better decisions about where it is going.
The Problem MCP Solved
Before MCP, every AI application that needed to interact with external tools had to build custom integrations. If you wanted Claude to read from your database, you wrote a custom tool. If you wanted GPT-4 to query your CRM, you wrote a different custom tool. If you wanted Gemini to do the same thing, you wrote yet another custom tool. Each integration was bespoke, fragile, and incompatible with every other integration.
This is the N-times-M problem. With N AI models and M tools, you need N times M integrations. MCP collapsed that to N plus M by creating a shared protocol that any model can use to talk to any tool.
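To make the arithmetic concrete, here is a minimal sketch; the model and tool counts are hypothetical, not real ecosystem figures:

```typescript
// Integration count with and without a shared protocol.
function bespokeIntegrations(models: number, tools: number): number {
  return models * tools; // every model needs a custom adapter for every tool
}

function protocolIntegrations(models: number, tools: number): number {
  return models + tools; // each side implements the shared protocol once
}

console.log(bespokeIntegrations(6, 50));  // 300 custom integrations
console.log(protocolIntegrations(6, 50)); // 56 protocol implementations
```

The gap widens as the ecosystem grows, which is why the economics favored a shared protocol so decisively.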
The Adoption Timeline
| Date | Milestone |
|---|---|
| November 2024 | Anthropic open-sources MCP specification |
| January 2025 | First 50 community servers published |
| March 2025 | OpenAI announces MCP support in ChatGPT plugins |
| May 2025 | Google DeepMind integrates MCP into Gemini tool use |
| July 2025 | Microsoft adds native MCP support to Copilot Studio |
| September 2025 | 1,000 community servers reached |
| November 2025 | Amazon Bedrock adds MCP server hosting |
| January 2026 | MCP SDK crosses 50 million monthly downloads |
| March 2026 | 97 million monthly downloads, 5,800 community servers |
Why Every Major Provider Adopted It
The answer is surprisingly simple: MCP reduced integration costs for everyone. AI providers no longer needed to maintain their own tool-use specifications. Tool builders no longer needed to support multiple incompatible APIs. Enterprise customers no longer needed to worry about vendor lock-in for their tool integrations. When a protocol makes everyone's life easier, adoption is not a question of if but when.
MCP Architecture: What You Need to Know
Before writing code, you need to understand how MCP works at an architectural level.
Core Concepts
MCP uses a client-server architecture with three primary abstractions:
Servers expose capabilities -- tools, resources, and prompts -- to AI models. A server might expose a "query database" tool, a "customer records" resource, or a "write SQL" prompt template.
Clients are the AI applications that consume those capabilities. Claude Desktop, ChatGPT, Gemini, and thousands of custom applications all act as MCP clients.
Transports handle the communication between clients and servers. MCP supports two transport types: stdio (for local servers) and HTTP with Server-Sent Events (for remote servers).
The Three Capability Types
| Capability | Description | Example |
|---|---|---|
| Tools | Functions the AI can call with parameters | query_database(sql: string) |
| Resources | Data the AI can read | file://reports/quarterly.csv |
| Prompts | Template prompts for common tasks | analyze_data(dataset: string) |
Tools are the most commonly used capability. They let AI models take actions -- querying APIs, writing files, sending messages. Resources provide read-only access to data. Prompts are reusable templates that guide the AI toward effective use of your tools.
Communication Flow
The communication between client and server follows a straightforward pattern:
- The client discovers the server's capabilities by calling list_tools, list_resources, or list_prompts.
- The AI model decides which tool to call based on the user's request and the tool descriptions.
- The client sends a call_tool request to the server with the tool name and parameters.
- The server executes the tool and returns the result.
- The AI model incorporates the result into its response.
This entire flow happens over JSON-RPC 2.0 messages, which means debugging is straightforward -- you can inspect the raw messages at any point.
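To show what those raw messages look like, here is a simplified sketch of a tool-call request and its response; the tool name, arguments, and result text are illustrative, and the payload shape is condensed from the full specification:

```typescript
// A tool-call exchange as raw JSON-RPC 2.0 messages (simplified).
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "search_articles",
    arguments: { query: "vacation policy", limit: 5 },
  },
};

const response = {
  jsonrpc: "2.0" as const,
  id: 1, // matches the request id, which is how replies are correlated
  result: {
    content: [{ type: "text", text: '[{"title":"PTO Policy","score":0.92}]' }],
  },
};

console.log(JSON.stringify(request, null, 2));
```

Because every message is plain JSON like this, you can log the stream at any point and replay exchanges while debugging.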
Building Your First MCP Server
Let us build a production-quality MCP server step by step. We will create a server that exposes a company knowledge base -- a common use case for enterprise deployments.
Project Setup (TypeScript)
TypeScript is the most popular language for MCP servers, accounting for roughly 60% of community servers. Python is second at 30%, with the remaining 10% split across Go, Rust, and other languages.
mkdir knowledge-base-mcp && cd knowledge-base-mcp
npm init -y
npm install @modelcontextprotocol/sdk zod
npm install -D typescript @types/node tsx
npx tsc --init
Defining Your Server
Create your server entry point. The MCP SDK provides a McpServer class that handles all protocol details:
// src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
const server = new McpServer({
name: "knowledge-base",
version: "1.0.0",
description: "Search and retrieve company knowledge base articles",
});
Adding Tools
Tools are the core of most MCP servers. Each tool needs a name, description, input schema, and handler function:
// Define the search tool
server.tool(
"search_articles",
"Search the knowledge base for articles matching a query. " +
"Returns titles, summaries, and relevance scores. " +
"Use this when the user asks about company policies, " +
"procedures, or internal documentation.",
{
query: z.string().describe("The search query"),
limit: z.number().optional().default(10)
.describe("Maximum number of results to return"),
category: z.enum(["policy", "engineering", "hr", "finance", "all"])
.optional().default("all")
.describe("Filter results by category"),
},
async ({ query, limit, category }) => {
// In production, this would query your actual knowledge base
const results = await searchKnowledgeBase(query, limit, category);
return {
content: [
{
type: "text",
text: JSON.stringify(results, null, 2),
},
],
};
}
);
// Define the get-article tool
server.tool(
"get_article",
"Retrieve the full content of a specific knowledge base article by ID. " +
"Use this after searching to get the complete text of a relevant article.",
{
articleId: z.string().describe("The unique article identifier"),
},
async ({ articleId }) => {
const article = await getArticleById(articleId);
if (!article) {
return {
content: [
{
type: "text",
text: `Article ${articleId} not found.`,
},
],
isError: true,
};
}
return {
content: [
{
type: "text",
text: JSON.stringify(article, null, 2),
},
],
};
}
);
Adding Resources
Resources provide read-only data access. They are ideal for exposing structured data that AI models might need:
server.resource(
"categories",
"kb://categories",
{ description: "List of all knowledge base categories with article counts" },
async () => {
const categories = await getCategories();
return {
contents: [
{
uri: "kb://categories",
mimeType: "application/json",
text: JSON.stringify(categories, null, 2),
},
],
};
}
);
Starting the Server
Connect the transport and start listening:
async function main() {
const transport = new StdioServerTransport();
await server.connect(transport);
// Log to stderr: stdout is reserved for JSON-RPC protocol messages
console.error("Knowledge Base MCP Server running on stdio");
}
main().catch(console.error);
Testing Locally
The fastest way to test your server is with the MCP Inspector, a browser-based debugging tool:
npx @modelcontextprotocol/inspector tsx src/index.ts
This opens a web interface where you can call your tools, inspect resources, and see the raw JSON-RPC messages. It is the single most useful debugging tool in the MCP ecosystem.
Production Architecture Patterns
A local stdio server is fine for development. Production deployments require more thought.
Remote Servers with SSE Transport
For servers that multiple users or applications need to access, use the HTTP+SSE transport:
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import express from "express";
const app = express();
// Track transports by session ID so POSTed messages reach the right client
const transports = new Map<string, SSEServerTransport>();
app.get("/sse", async (req, res) => {
  const transport = new SSEServerTransport("/messages", res);
  transports.set(transport.sessionId, transport);
  res.on("close", () => transports.delete(transport.sessionId));
  await server.connect(transport);
});
app.post("/messages", async (req, res) => {
  // Route the incoming message to the transport for this session
  const transport = transports.get(req.query.sessionId as string);
  if (!transport) {
    res.status(400).send("Unknown session");
    return;
  }
  await transport.handlePostMessage(req, res);
});
app.listen(3001, () => {
  console.log("MCP SSE server running on port 3001");
});
Scaling Patterns
| Pattern | When to Use | Trade-offs |
|---|---|---|
| Single stdio process | Local development, single-user CLI tools | Simple but not scalable |
| SSE server behind load balancer | Multi-user access, moderate scale | Requires sticky sessions for SSE |
| Containerized with Kubernetes | Enterprise deployment, high availability | Complex but production-grade |
| Serverless (AWS Lambda + API Gateway) | Bursty traffic, cost optimization | Cold start latency, stateless only |
| Managed hosting (Cloudflare Workers) | Global distribution, edge performance | Platform constraints |
Database Connection Pooling
Most production MCP servers interact with databases. Connection pooling is critical because each tool call might execute queries:
import { Pool } from "pg";
const pool = new Pool({
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT || "5432"),
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20, // Maximum pool size
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
server.tool(
"query_metrics",
"Query business metrics from the analytics database",
{
metric: z.string(),
startDate: z.string(),
endDate: z.string(),
},
async ({ metric, startDate, endDate }) => {
const client = await pool.connect();
try {
const result = await client.query(
"SELECT date, value FROM metrics WHERE name = $1 AND date BETWEEN $2 AND $3",
[metric, startDate, endDate]
);
return {
content: [{ type: "text", text: JSON.stringify(result.rows) }],
};
} finally {
client.release();
}
}
);
Error Handling and Retries
Production servers must handle failures gracefully. The MCP SDK supports error responses, but you need to implement retry logic for transient failures:
async function withRetry<T>(
fn: () => Promise<T>,
maxRetries: number = 3,
baseDelay: number = 1000
): Promise<T> {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
if (attempt === maxRetries) throw error;
const delay = baseDelay * Math.pow(2, attempt);
await new Promise((resolve) => setTimeout(resolve, delay));
}
}
throw new Error("Unreachable");
}
server.tool(
"fetch_customer",
"Retrieve customer data from the CRM API",
{ customerId: z.string() },
async ({ customerId }) => {
try {
const customer = await withRetry(() =>
crmApi.getCustomer(customerId)
);
return {
content: [{ type: "text", text: JSON.stringify(customer) }],
};
} catch (error) {
return {
content: [{
type: "text",
text: `Failed to fetch customer ${customerId}: ${error instanceof Error ? error.message : String(error)}`,
}],
isError: true,
};
}
}
);
Security: Authentication, Authorization, and Sandboxing
Security is the most important aspect of production MCP servers. A poorly secured server gives AI models -- and the humans using them -- access to systems they should not touch.
Authentication
For remote MCP servers, implement OAuth 2.0 or API key authentication at the transport layer:
import jwt from "jsonwebtoken";
function authenticateRequest(req: express.Request): UserContext {
const authHeader = req.headers.authorization;
if (!authHeader || !authHeader.startsWith("Bearer ")) {
throw new AuthenticationError("Missing or invalid authorization header");
}
const token = authHeader.slice(7);
try {
const decoded = jwt.verify(token, process.env.JWT_SECRET);
return decoded as UserContext;
} catch {
throw new AuthenticationError("Invalid or expired token");
}
}
app.get("/sse", async (req, res) => {
const user = authenticateRequest(req);
const transport = new SSEServerTransport("/messages", res);
// Pass user context to the server for authorization decisions
transport.userContext = user;
await server.connect(transport);
});
Authorization and Scoping
Not every user should have access to every tool. Implement role-based access control:
const toolPermissions: Record<string, string[]> = {
"search_articles": ["viewer", "editor", "admin"],
"create_article": ["editor", "admin"],
"delete_article": ["admin"],
"query_metrics": ["analyst", "admin"],
};
server.tool(
"delete_article",
"Permanently delete a knowledge base article",
{ articleId: z.string() },
async ({ articleId }, context) => {
const userRole = context.transport.userContext?.role;
if (!userRole || !toolPermissions["delete_article"].includes(userRole)) {
return {
content: [{
type: "text",
text: "Unauthorized: admin role required to delete articles",
}],
isError: true,
};
}
await deleteArticle(articleId);
return {
content: [{ type: "text", text: `Article ${articleId} deleted.` }],
};
}
);
Input Validation and Sandboxing
Never trust inputs from AI models. They can be manipulated through prompt injection. Validate and sanitize everything:
// BAD: SQL injection risk
server.tool("query", "Run a query", { sql: z.string() },
async ({ sql }) => {
const result = await db.query(sql); // NEVER DO THIS
return { content: [{ type: "text", text: JSON.stringify(result) }] };
}
);
// GOOD: Parameterized queries with allowlisted operations
server.tool(
"get_sales_data",
"Retrieve sales data for a given date range and region",
{
startDate: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
endDate: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
region: z.enum(["north", "south", "east", "west", "all"]),
},
async ({ startDate, endDate, region }) => {
const query = region === "all"
? "SELECT * FROM sales WHERE date BETWEEN $1 AND $2"
: "SELECT * FROM sales WHERE date BETWEEN $1 AND $2 AND region = $3";
const params = region === "all"
? [startDate, endDate]
: [startDate, endDate, region];
const result = await db.query(query, params);
return {
content: [{ type: "text", text: JSON.stringify(result.rows) }],
};
}
);
Rate Limiting
Protect your backend services from excessive tool calls:
import rateLimit from "express-rate-limit";
const limiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100, // 100 requests per minute per IP
standardHeaders: true,
legacyHeaders: false,
message: "Too many requests. Please slow down.",
});
app.use("/messages", limiter);
Security Checklist
| Category | Requirement | Priority |
|---|---|---|
| Authentication | OAuth 2.0 or API key for remote servers | Critical |
| Authorization | Role-based tool access control | Critical |
| Input validation | Zod schemas with strict constraints | Critical |
| SQL injection | Parameterized queries only | Critical |
| Rate limiting | Per-user and per-IP limits | High |
| Logging | Audit log of all tool calls | High |
| Secrets management | No hardcoded credentials | High |
| Network isolation | Server in private subnet when possible | Medium |
| TLS | HTTPS for all remote connections | Critical |
| Timeout | Tool execution timeouts | Medium |
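The last checklist item, tool execution timeouts, can be implemented with a small wrapper. This is a sketch; `withTimeout` and the time budgets are our own choices, not part of the MCP SDK:

```typescript
// Reject a tool's work if it exceeds a time budget, so a hung backend
// cannot stall the AI client indefinitely.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Tool timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clean up the timer afterward
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage inside a tool handler:
// const rows = await withTimeout(pool.query(sql, params), 5000);
```

Wrap any call that touches a network or database, and return the timeout error with isError: true so the model can tell the user the operation did not complete.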
Top 10 Community MCP Servers in 2026
The community has built over 5,800 MCP servers. These ten stand out for quality, adoption, and utility:
| Rank | Server | Description | Weekly Downloads |
|---|---|---|---|
| 1 | mcp-server-postgres | PostgreSQL query and schema inspection | 2.1M |
| 2 | mcp-server-github | GitHub API -- issues, PRs, repos, actions | 1.8M |
| 3 | mcp-server-filesystem | Local file system read/write operations | 1.6M |
| 4 | mcp-server-slack | Slack channels, messages, and search | 1.4M |
| 5 | mcp-server-google-drive | Google Drive file access and search | 1.2M |
| 6 | mcp-server-jira | Jira issue tracking and project management | 980K |
| 7 | mcp-server-notion | Notion pages, databases, and search | 870K |
| 8 | mcp-server-stripe | Stripe payments, subscriptions, invoices | 750K |
| 9 | mcp-server-kubernetes | Kubernetes cluster management and monitoring | 680K |
| 10 | mcp-server-elasticsearch | Elasticsearch query and index management | 610K |
When to Use an Existing Server vs. Building Custom
The decision framework is straightforward:
Use an existing server when:
- A community server covers your use case with 80%+ feature overlap
- The server is actively maintained (commits in the last 30 days)
- It has more than 10,000 weekly downloads (signal of stability)
- Your requirements are standard (database queries, API access, file operations)
Build a custom server when:
- Your data or API is proprietary with no public equivalent
- You need custom authorization logic tied to your identity system
- Existing servers expose too much surface area for your security requirements
- You need to combine multiple data sources into a single coherent interface
- Performance requirements demand optimized queries or caching
Evaluating Community Servers
Before depending on a community server in production, check these criteria:
Evaluation Prompt:
"Before using [server-name] in production, verify:
1. Last commit date (should be within 30 days)
2. Open issue count and response time
3. Security audit history
4. License compatibility with your project
5. Breaking change history in recent versions
6. Test coverage percentage
7. Whether it exposes more capabilities than you need"
The 2026 MCP Roadmap
The MCP specification is evolving rapidly. Three major developments are on the horizon.
Multimodal Tool Support
The current MCP specification primarily handles text inputs and outputs. The 2026 roadmap includes first-class support for images, audio, and video as both tool inputs and outputs. This means MCP servers will be able to:
- Accept image uploads and return processed images
- Stream audio data for real-time transcription servers
- Return video clips from media libraries
- Handle mixed-modality responses (text + images + structured data)
The draft specification for multimodal support is expected in Q3 2026, with finalization by end of year.
Open Governance Model
Anthropic has announced plans to transition MCP governance to an open foundation model by mid-2026. This means:
- A steering committee with representatives from major adopters (OpenAI, Google, Microsoft, Amazon, and community maintainers)
- An RFC process for specification changes
- Independent working groups for security, transport, and capability extensions
- Community voting on major specification decisions
This mirrors the governance model of successful open standards like HTTP (IETF) and GraphQL (GraphQL Foundation).
Streamable HTTP Transport
The newest transport type, Streamable HTTP, is replacing the SSE transport as the recommended approach for remote servers. Key advantages:
- Bidirectional streaming without the limitations of SSE
- Better compatibility with corporate proxies and firewalls
- Built-in session management
- Support for resumable connections
// New Streamable HTTP transport (available in SDK 2.x)
import { StreamableHTTPServerTransport } from
"@modelcontextprotocol/sdk/server/streamableHttp.js";
const transport = new StreamableHTTPServerTransport({
sessionIdGenerator: () => crypto.randomUUID(),
enableJsonResponse: true,
});
Building a Complete Production Server: Step-by-Step
Let us bring everything together with a complete production server for a customer support knowledge base.
Step 1: Project Structure
customer-support-mcp/
src/
index.ts # Server entry point
tools/
search.ts # Search tool implementation
tickets.ts # Ticket management tools
analytics.ts # Support analytics tools
resources/
categories.ts # Knowledge base categories
templates.ts # Response templates
middleware/
auth.ts # Authentication
rateLimit.ts # Rate limiting
logging.ts # Audit logging
db/
pool.ts # Database connection pool
queries.ts # SQL queries
tests/
tools.test.ts # Tool unit tests
integration.test.ts # Integration tests
Dockerfile
docker-compose.yml
package.json
tsconfig.json
Step 2: Configuration Management
// src/config.ts
import { z } from "zod";
const configSchema = z.object({
DB_HOST: z.string(),
DB_PORT: z.string().transform(Number).default("5432"),
DB_NAME: z.string(),
DB_USER: z.string(),
DB_PASSWORD: z.string(),
JWT_SECRET: z.string().min(32),
RATE_LIMIT_RPM: z.string().transform(Number).default("100"),
LOG_LEVEL: z.enum(["debug", "info", "warn", "error"]).default("info"),
PORT: z.string().transform(Number).default("3001"),
});
export const config = configSchema.parse(process.env);
Step 3: Audit Logging
Every tool call in production should be logged for security and debugging:
// src/middleware/logging.ts
interface AuditLog {
timestamp: string;
userId: string;
toolName: string;
parameters: Record<string, unknown>;
result: "success" | "error";
durationMs: number;
}
export async function logToolCall(entry: AuditLog): Promise<void> {
// In production, send to your logging infrastructure
// (Datadog, CloudWatch, ELK stack, etc.)
await loggingService.write({
...entry,
service: "customer-support-mcp",
environment: process.env.NODE_ENV,
});
}
Step 4: Containerization
# Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
USER node
EXPOSE 3001
CMD ["node", "dist/index.js"]
Step 5: Health Checks and Monitoring
// Add health check endpoint alongside MCP server
app.get("/health", async (req, res) => {
try {
await pool.query("SELECT 1");
res.json({
status: "healthy",
uptime: process.uptime(),
version: "1.0.0",
connections: {
database: "connected",
activePoolSize: pool.totalCount,
idlePoolSize: pool.idleCount,
},
});
} catch (error) {
res.status(503).json({
status: "unhealthy",
error: error instanceof Error ? error.message : String(error),
});
}
});
Performance Optimization
Caching Strategies
Tool calls can be expensive. Cache results when the data does not change frequently:
import NodeCache from "node-cache";
const cache = new NodeCache({
stdTTL: 300, // 5-minute default TTL
checkperiod: 60, // Check for expired keys every 60 seconds
maxKeys: 10000,
});
server.tool(
"get_article",
"Retrieve a knowledge base article",
{ articleId: z.string() },
async ({ articleId }) => {
const cacheKey = `article:${articleId}`;
const cached = cache.get(cacheKey);
if (cached) {
return {
content: [{ type: "text", text: JSON.stringify(cached) }],
};
}
const article = await getArticleById(articleId);
cache.set(cacheKey, article);
return {
content: [{ type: "text", text: JSON.stringify(article) }],
};
}
);
Response Size Management
AI models have context limits. Keep tool responses concise:
| Response Strategy | When to Use | Example |
|---|---|---|
| Pagination | Lists with many items | Return 10 results with hasMore flag |
| Summary + detail | Large documents | Return summary, let model request full text |
| Field selection | Wide records | Return only fields relevant to the query |
| Truncation with notice | Unexpectedly large results | Truncate at 4,000 chars with warning |
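The truncation-with-notice strategy from the table is a small helper. A sketch, with the 4,000-character budget mirroring the table above:

```typescript
// Truncate oversized tool output and tell the model it happened,
// so it can narrow its query instead of reasoning over a cut-off blob.
function truncateWithNotice(text: string, maxChars: number = 4000): string {
  if (text.length <= maxChars) return text;
  return (
    text.slice(0, maxChars) +
    `\n\n[Truncated: showing ${maxChars} of ${text.length} characters. ` +
    "Refine the query to narrow the result.]"
  );
}
```

The explicit notice matters: a silent cut looks like complete data to the model, while a labeled one prompts it to paginate or filter.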
Common Mistakes and How to Avoid Them
Mistake 1: Overly Broad Tool Descriptions
Bad tool descriptions lead to AI models calling tools incorrectly or at wrong times.
Bad: "Manages the database"
Good: "Search for customer support tickets by status, assignee, or
date range. Returns ticket ID, subject, status, and creation
date. Use when the user asks about open tickets, ticket
history, or support workload."
Mistake 2: Returning Raw Errors to the Model
Stack traces and internal error messages can confuse AI models and leak sensitive information.
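One way to enforce this is an error-sanitizing helper at the boundary. This is a sketch of our own; `UserFacingError` and `toSafeErrorText` are illustrative names, not SDK APIs:

```typescript
// Errors we deliberately write for the model may pass through verbatim;
// everything else is replaced with a generic message, and the full detail
// (stack traces, connection strings) stays in server-side logs only.
class UserFacingError extends Error {}

function toSafeErrorText(error: unknown, toolName: string): string {
  console.error(`[${toolName}]`, error); // full detail logged server-side
  if (error instanceof UserFacingError) return error.message;
  return `${toolName} failed due to an internal error. It has been logged.`;
}
```

Pair this with isError: true in the tool result, as in the fetch_customer example earlier, so the model knows the call failed without seeing why.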
Mistake 3: No Input Validation Beyond Types
Zod schemas should include constraints, not just type checks:
// Insufficient validation
{ query: z.string() }
// Proper validation
{
query: z.string()
.min(1, "Query cannot be empty")
.max(500, "Query too long")
.refine(
(q) => !q.includes("DROP TABLE"),
"Query contains disallowed content"
)
}
Mistake 4: Stateful Servers Without Session Management
If your server maintains state between tool calls, you need proper session management. Otherwise, concurrent users will see each other's data.
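The usual fix is to key all state by session ID instead of holding it in module-level variables. A sketch, where the session-ID source is an assumption; real code would take it from the transport layer:

```typescript
// Per-session state keyed by session ID, so concurrent users
// never observe each other's data.
interface SessionState {
  lastQuery?: string;
  resultCursor?: number;
}

const sessions = new Map<string, SessionState>();

function getSession(sessionId: string): SessionState {
  let state = sessions.get(sessionId);
  if (!state) {
    state = {};
    sessions.set(sessionId, state);
  }
  return state;
}

getSession("alice").lastQuery = "refund policy";
console.log(getSession("bob").lastQuery); // undefined -- isolated from alice
```

Remember to evict entries when a session's connection closes, or the map becomes a slow memory leak.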
Mistake 5: Ignoring Tool Call Latency
AI models typically wait for tool results before continuing. Slow tools create a poor user experience. Target under 2 seconds for interactive tools and provide progress indicators for longer operations.
| Latency Target | Use Case |
|---|---|
| Under 200ms | Cached data lookups, simple computations |
| Under 2s | Database queries, single API calls |
| Under 10s | Complex queries, multi-API orchestration |
| Over 10s | Provide progress updates, consider async pattern |
Conclusion
MCP's trajectory from niche protocol to 97 million monthly downloads tells a clear story: the developer ecosystem wanted a standard for AI tool integration, and MCP delivered. With all major providers onboard, 5,800 community servers, and a governance model transitioning to open foundation stewardship, MCP is the infrastructure layer for the next generation of AI applications.
Building production MCP servers is no longer experimental. The patterns are established: use Zod for input validation, implement proper authentication and authorization for remote servers, cache aggressively, log everything, and keep your tool descriptions precise. The code examples in this guide are production-tested patterns, not prototypes.
The 2026 roadmap -- multimodal support, open governance, and Streamable HTTP transport -- signals that MCP is maturing rapidly. If you are building AI-powered applications, the question is not whether to adopt MCP, but how quickly you can get your first server into production.