Stateless: The Future of MCP Transports - Agentic AI Foundation (AAIF)

Agentic AI Foundation

A deep dive into Shaun Smith and Kurtis Van Gent’s talk at MCP Dev Summit North America 2026

MCP already operates at significant scale. Google Cloud is responsible for the reliability of its MCP servers for AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore. The team’s open source MCP Toolbox for Databases has seen rapid adoption, surpassing 13,000 GitHub stars and handling over 20 million tool calls across 40+ databases in the last month. Hugging Face, meanwhile, uses more than 2,500 MCP servers via Spaces to provide inference access, and publishes detailed public data on the MCP clients it observes connecting through its transport endpoints.

Both of them ran into the same wall. MCP, as currently designed, is stateful. And at the scale these teams operate, that causes real problems and forces new approaches. For Hugging Face, the overhead is measurable: a single tool call can generate over 100 MCP protocol messages. Smith’s team tracks the conversion rate between clients that connect and clients that go on to make a tool call, and the protocol’s chattiness works against that ratio.

The State Problem

Every MCP session today starts with initialization. Before a client can do anything useful, it negotiates with the server, exchanging protocol versions, capabilities, and session IDs, and all of that context has to persist for the life of the connection.

That works fine for a local stdio server tied to a single developer’s machine. It breaks down the moment you add a load balancer.

When requests can arrive at any server in a pool, and each server needs to know what happened in previous requests to respond correctly, you have a coordination problem. Servers have to stay in sync. Sessions tie you to a specific connection lifecycle. If that connection drops, the state goes with it.
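The coordination problem can be reproduced in a few lines. The sketch below is a toy model, not MCP SDK code: `StatefulServer`, its methods, and the session ID are hypothetical names chosen for illustration, and the load balancer is a plain round-robin iterator.

```python
from itertools import cycle


class StatefulServer:
    """Toy server that keeps negotiated context in local memory (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> context negotiated at initialization

    def initialize(self, session_id, protocol_version):
        # The handshake result lives only on the instance that handled it.
        self.sessions[session_id] = {"protocol_version": protocol_version}
        return {"server": self.name, "ok": True}

    def call_tool(self, session_id, tool):
        if session_id not in self.sessions:
            return {"server": self.name, "ok": False, "error": "unknown session"}
        return {"server": self.name, "ok": True, "tool": tool}


# A two-server pool behind a round-robin load balancer.
pool = [StatefulServer("a"), StatefulServer("b")]
round_robin = cycle(pool)

init_result = next(round_robin).initialize("s1", "2025-06-18")  # handled by "a"
call_result = next(round_robin).call_tool("s1", "query")        # handled by "b"
```

Here the follow-up call lands on server "b", which never saw the handshake and rejects it. Real deployments work around this with sticky routing or a shared session store, which is the synchronization cost described above.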

Van Gent put it plainly: a stateless protocol means every request carries everything needed to handle it. Any server in the pool can pick it up. Regions can fail and recover. You don’t lose work. And debugging gets simpler: if the request is self-contained, you can inspect it directly rather than reconstructing what led up to it.

The goal driving their work in the Transport Working Group is to make MCP stateless in exactly that sense.

Removing Initialization

The first and biggest change is SEP-1442: removing the initialization handshake as a required first step.

Instead of a separate initialization phase before any real work can happen, protocol negotiation will fold into the first actual request.

A client sends a tools/list call with its protocol version.
The server either accepts and responds, or returns a list of versions it does support so the client can retry.

One round trip instead of a ceremony. The request is the negotiation.
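A minimal sketch of that negotiation, under stated assumptions: the talk does not give the wire format, so the `protocolVersion` and `supportedVersions` field names, the error shape, and the deliberately unsupported version string `2099-01-01` are all hypothetical. The server is modeled as a plain function so the example runs without a network.

```python
SUPPORTED_VERSIONS = ["2025-06-18", "2025-03-26"]  # versions this toy server accepts
TOOLS = [{"name": "query", "description": "Run a read-only query"}]


def handle_request(request):
    """Handle one self-contained request; no earlier handshake is assumed."""
    if request.get("protocolVersion") not in SUPPORTED_VERSIONS:
        # Reject, but tell the client which versions would work.
        return {"error": {"message": "unsupported protocol version",
                          "supportedVersions": SUPPORTED_VERSIONS}}
    if request["method"] == "tools/list":
        return {"result": {"tools": TOOLS}}
    return {"error": {"message": "unknown method"}}


def list_tools(preferred_versions):
    """Client side: try the preferred version, then retry once with a shared one."""
    request = {"method": "tools/list", "protocolVersion": preferred_versions[0]}
    response = handle_request(request)
    offered = response.get("error", {}).get("supportedVersions")
    if offered is not None:
        common = [v for v in preferred_versions if v in offered]
        if not common:
            raise RuntimeError("no mutually supported protocol version")
        request["protocolVersion"] = common[0]
        response = handle_request(request)
    return request["protocolVersion"], response


# The client prefers a version the server does not know, then falls back.
version, response = list_tools(["2099-01-01", "2025-06-18"])
```

When the first version is accepted, the exchange is a single round trip; the retry path costs one more, and neither path leaves state on the server.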

This also lets protocol version, client capabilities, and server capabilities change independently. Today, updating any of these can require tearing down the whole session. In a stateless world, a client that adds support for elicitation mid-session can just say so in its next request, without restarting.
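As a rough illustration of capabilities travelling with each request, the sketch below has the server decide how to respond using only what the current request declares. The request shape (`clientCapabilities`, `arguments`) and the `deploy` tool are assumptions made for this example, not the proposed wire format.

```python
def handle_tool_call(request):
    """Decide how to respond using only the capabilities declared on this request."""
    capabilities = request.get("clientCapabilities", {})
    arguments = request["arguments"]
    if "region" not in arguments:
        if "elicitation" in capabilities:
            # The client says it can answer follow-up questions, so ask.
            return {"status": "input_required", "ask": "region"}
        return {"status": "error", "message": "missing argument: region"}
    return {"status": "ok", "region": arguments["region"]}


# Same client, two consecutive requests; the second declares a new capability.
before = handle_tool_call(
    {"tool": "deploy", "arguments": {}, "clientCapabilities": {}})
after = handle_tool_call(
    {"tool": "deploy", "arguments": {}, "clientCapabilities": {"elicitation": {}}})
```

Nothing was torn down between the two calls: the server's behavior changed because the request changed.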

Fixing Elicitation

The state problem becomes most visible with elicitation: the pattern where a server needs to ask the client a follow-up question mid-tool-call.

Today, when a server sends an elicitation request over its SSE stream, the client’s response comes back as a new HTTP request. That response might land at a completely different server than the one that sent the original elicitation. Now two servers have to coordinate. If either connection drops, the whole exchange falls apart.

This is why major providers haven’t shipped elicitation. The reliability story doesn’t hold.

SEP-2322 addresses this directly. The flow becomes a sequence of independent requests.

A server returns an intermediate result that says, effectively, “I need more information.”
The client collects that input and sends a new tool call with the previous response included.
The state travels in the request. Any server can pick it up. The elicitation completes, and so does the tool call.
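The flow above can be sketched as two independent requests handled by two different servers. This is an illustrative model only: the `elicitation` envelope, the `input_required` result type, and the `query` tool are hypothetical names, since the talk summary does not specify SEP-2322's message format.

```python
def call_tool(server_name, request):
    """Stateless tool handler: everything it needs arrives in the request."""
    args = dict(request["arguments"])
    # Fold in any answers the client collected for an earlier intermediate result.
    args.update(request.get("elicitation", {}).get("answers", {}))
    if "table" not in args:
        # Intermediate result: "I need more information."
        return {"handled_by": server_name, "type": "input_required",
                "fields": ["table"]}
    return {"handled_by": server_name, "type": "result",
            "text": f"queried {args['database']}.{args['table']}"}


request = {"tool": "query", "arguments": {"database": "orders"}}

# First request lands on one server, which asks for more input.
first = call_tool("server-a", request)

# The client collects the answer and sends a new call that includes the
# previous response, so a different server can complete the work.
followup = dict(request, elicitation={"previous": first,
                                      "answers": {"table": "invoices"}})
second = call_tool("server-b", followup)
```

Neither server holds anything between the two calls, so a dropped connection after the first response costs the client a retry rather than the whole exchange.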

Sessions, Clarified

Sessions in the current spec are, in Smith’s words, “almost a side effect of the transport.” For stdio, your session is the process lifetime. For streamable HTTP, the server may or may not issue a session ID depending on implementation choices.

The ambiguity creates hazards. If an MCP server developer has made assumptions about what a session means, two conversation threads can end up sharing state through the same server instance. Instructions from one conversation influence the other. Smith’s team at Hugging Face initially deployed without sessions entirely. It worked well, but they lost the ability to track which clients were making which calls, so they started using session IDs as an analytics hook.

The proposal the group converged on moves toward a session-free protocol by default. A common pattern already does most of the work sessions were doing: explicit resource handles passed as tool call parameters. If a client tells a server which Spanner instance it wants to act on, the server doesn’t need to remember it from a previous request. The state is in the payload.
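A small sketch of the resource-handle pattern, assuming a hypothetical `run_query` tool whose `instance` argument names the target on every call. The instance paths are placeholder values, and the function only formats a string rather than contacting a database.

```python
def run_query(arguments):
    """Stateless tool: the target instance is named on every call."""
    instance = arguments["instance"]  # explicit resource handle
    sql = arguments["sql"]
    # A real tool would open or reuse a connection for `instance` here.
    return f"[{instance}] {sql}"


# Two conversations interleave calls through the same server. Because each
# call names its own instance, neither can pick up the other's target.
calls = [
    {"instance": "projects/demo/instances/orders", "sql": "SELECT 1"},
    {"instance": "projects/demo/instances/billing", "sql": "SELECT 2"},
    {"instance": "projects/demo/instances/orders", "sql": "SELECT 3"},
]
results = [run_query(call) for call in calls]
```

The alternative, a "select instance" tool followed by calls that rely on the remembered selection, is exactly the kind of hidden session state that lets one conversation leak into another.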

For the subset of deployments that genuinely need stickiness, sessions will move to an extension layer rather than the core spec.

HTTP Standardization

One more problem Van Gent addressed is specific to how JSON-RPC interacts with HTTP infrastructure.

HTTP was designed so that routing information lives in headers at the top of the request, where it is easy to parse. Proxies, load balancers, and intermediaries can read the envelope and forward the rest without parsing the full payload.

JSON-RPC puts everything in the body. Every node in the chain that needs to make a routing decision has to parse the entire payload to do it. At scale, that adds overhead to every hop.

SEP-2243 introduces HTTP standardization: key information from the JSON-RPC payload gets mirrored into HTTP headers. The method, the resource name, the tool being called, all visible at the envelope level. For tool calls with routing-relevant parameters, servers can declare which fields should be promoted into headers, so a load balancer can route a Spanner call to the right region and instance without touching the payload at all.
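To make the idea concrete, here is a sketch of mirroring payload fields into headers and routing on the headers alone. The header names (`Mcp-Method`, `Mcp-Name`, `Mcp-Param-*`), the `promoted_fields` argument, and the backend pool names are assumptions for illustration; the names actually standardized by SEP-2243 may differ.

```python
import json


def build_http_request(rpc, promoted_fields=()):
    """Mirror routing-relevant parts of a JSON-RPC payload into HTTP headers."""
    params = rpc.get("params", {})
    headers = {"Content-Type": "application/json", "Mcp-Method": rpc["method"]}
    if "name" in params:
        headers["Mcp-Name"] = params["name"]  # e.g. the tool being called
    for field in promoted_fields:
        value = params.get("arguments", {}).get(field)
        if value is not None:
            headers[f"Mcp-Param-{field}"] = str(value)
    return headers, json.dumps(rpc).encode("utf-8")


def route(headers, backends, default):
    """Pick a backend from headers alone; the body is never parsed."""
    return backends.get(headers.get("Mcp-Param-region"), default)


rpc = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
       "params": {"name": "query",
                  "arguments": {"region": "us-east1", "sql": "SELECT 1"}}}
headers, body = build_http_request(rpc, promoted_fields=("region",))
backend = route(
    headers,
    {"us-east1": "pool-us-east1", "europe-west1": "pool-europe-west1"},
    default="pool-default",
)
```

The body is still the source of truth for the server that finally handles the call; the headers are a copy that lets intermediaries decide without decoding JSON.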

What’s Next

The transport changes are on track for the June spec release. SEPs are landing in April, giving two months of implementation across the SDKs before the spec finalizes. Smith noted that the goal, once these foundational decisions are made, is to not revisit the transport layer often. The sequencing matters: get statelessness right first, then build on it.

From there, the group is looking at optimization work: ETags for resource freshness, time-to-live hints on tool lists so clients know how often to refresh them, and pluggable transports to support WebSockets, gRPC, and other deployment scenarios without fragmenting the core standard.
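Since this work is still exploratory, the following is only a guess at how a client might use a time-to-live hint on a tool list. The `ttl_seconds` field, the `ToolListCache` class, and the injected `fetch` and `clock` callables are all hypothetical, and ETag validation is left out.

```python
class ToolListCache:
    """Client-side cache that refetches the tool list once its TTL hint expires."""

    def __init__(self, fetch, clock):
        self._fetch = fetch    # callable returning {"tools": [...], "ttl_seconds": n}
        self._clock = clock    # callable returning the current time in seconds
        self._tools = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._tools is None or now >= self._expires_at:
            response = self._fetch()
            self._tools = response["tools"]
            self._expires_at = now + response["ttl_seconds"]
        return self._tools


# Drive the cache with a fake clock and a fetch function that counts its calls.
now = [0.0]
fetch_count = [0]


def fake_fetch():
    fetch_count[0] += 1
    return {"tools": [{"name": "query"}], "ttl_seconds": 60}


cache = ToolListCache(fake_fetch, clock=lambda: now[0])
cache.get()                  # first call fetches
now[0] = 30.0
cache.get()                  # still fresh, served from cache
fetches_while_fresh = fetch_count[0]
now[0] = 60.0
cache.get()                  # TTL elapsed, fetched again
fetches_after_expiry = fetch_count[0]
```

In production the clock would be something like `time.monotonic`, and `fetch` would issue the actual `tools/list` request.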

The through-line in both presentations was the same: the assumptions baked into MCP’s early design made sense for local stdio servers. The deployment reality is different. Twenty million tool calls a month at Google Cloud and 2,500 servers at Hugging Face demand infrastructure thinking, and that’s the work the Transport Working Group is doing.

Shaun Smith is an MCP core maintainer at Hugging Face and leads the Transport Working Group alongside Kurtis Van Gent, an MCP core maintainer at Google Cloud. The Transport Working Group operates under the Agentic AI Foundation. Learn more and get involved at aaif.io.

Join the conversation in the AAIF Discord and explore the MCP GitHub repository to start building.