MCP Just Killed the Session — And Solved a Problem Your API Probably Still Has

Somewhere in your codebase, there’s an endpoint that needs to ask the caller a follow-up question mid-request. Confirm this deletion. Provide a second factor. Approve this amount. And somewhere near that endpoint, there’s a session store, a sticky-session rule on your load balancer, or a Redis key holding state that only makes sense for the next ninety seconds. You built it because the alternative — designing the whole flow to survive without server memory — felt like more work than it was worth. On July 28, 2026, the Model Context Protocol shipped a spec that had to solve exactly this problem at the scale of every AI agent talking to every tool on the internet, and the fix it landed on is worth stealing.

TL;DR

MCP’s 2026-07-28 specification removes server-side session state entirely — no handshake, no sticky sessions, servers scale behind a plain round-robin load balancer
The hard part of statelessness has always been mid-flow, server-initiated questions: how do you ask the caller something without holding a connection open to remember the context?
SEP-2322 solves this with Multi Round-Trip Requests (MRTR): the server returns an InputRequiredResult carrying the question and an opaque requestState blob; the client answers and echoes the blob back unmodified
Because the entire context rides inside the payload, literally any server instance — not just the one that asked the question — can pick up the retry and finish the job
This is the same idea behind pagination cursors, idempotency keys, and JWTs: stop making the server remember, make the client carry a receipt
You can apply this pattern to any multi-step API today, with or without MCP, and it removes an entire category of horizontal-scaling headaches

The Problem Every Multi-Step API Eventually Hits

Request-response is a beautiful lie. It works perfectly as long as the server can answer immediately with everything it knows. The moment the server needs to ask the caller something — a missing parameter, a confirmation, an approval — the clean request-response model cracks, and every team patches the crack the same way: hold the connection open, or spin up a session and remember where you left off.

Both patches scale badly, for the same underlying reason: each one pins the conversation to a single process. A held-open connection ties up a worker thread or process for however long the human on the other end takes to answer, which might be seconds or might be never. A server-side session ties the next request to the specific instance that created it, which means sticky sessions at the load balancer, a shared session store when you scale beyond one box, and a very bad day when that instance gets rescheduled mid-conversation.

Why “Just Hold the Connection Open” Doesn’t Scale

The problem gets structurally worse the moment your compute isn’t a long-lived server at all. A function that spins up per request, does its job, and disappears (the kind of thing you’d run on Lambda, Cloudflare Workers, or any serverless platform) has nowhere to put a session even if it wanted to, and it cannot keep a connection open while it waits for an answer. There’s no process still alive in ninety seconds to remember what it was waiting for. If your architecture depends on the ability to hold state in server memory between two calls, serverless isn’t an optimization you can bolt on later. It’s a wall you hit.

MCP’s July 28, 2026 Spec Goes Fully Stateless — On Purpose

The Model Context Protocol — the open standard that lets AI agents call tools, read resources, and interact with external systems — hit this exact wall at a brutal scale. Every AI client talking to every MCP server is, structurally, a multi-step conversation: the agent calls a tool, the tool sometimes needs to ask a clarifying question or request a piece of missing context before it can finish, and the agent needs to answer before the call can complete. Multiply that by an ecosystem running on serverless functions and autoscaled containers, and the session problem stops being an edge case and becomes the whole ballgame.

The 2026-07-28 specification, finalized this month after a release candidate published in May, answers this by removing server-side session state from the protocol entirely. No initialization handshake to negotiate a session. No sticky routing requirement. A production MCP server can now sit behind a dumb round-robin load balancer with zero shared session storage, and it works, because the protocol no longer assumes any single instance remembers anything about a prior call.

What SEP-2322 (MRTR) Actually Changes

Removing sessions is the easy 90%. The genuinely hard 10% — the part that would’ve sunk this redesign — is what happens when the server still needs to ask the client something mid-call. That’s what Specification Enhancement Proposal 2322 solves, and it’s called Multi Round-Trip Requests, or MRTR. It’s the mechanism that lets elicitation (asking the user for input), sampling (asking the client’s model for a completion), and long-running tasks all keep working without a single byte of state stored anywhere on the server between the question and the answer.

Inside the Resumption Token: How MRTR Actually Works

Here’s the mechanism, stripped down to what matters. A call to a tool, a prompt, or a resource read can return a new kind of result instead of completing normally — an InputRequiredResult, flagged with a resultType of input_required. That result carries two things: inputRequests, a map describing exactly what the server needs from the client, and requestState, an opaque value the server is telling the client to hang onto and return unchanged.

// Server response — the call didn't finish, it's asking for input
{
  "resultType": "input_required",
  "inputRequests": {
    "confirm_deletion": {
      "type": "elicitation",
      "message": "This will permanently delete 1.2M records. Confirm?"
    }
  },
  "requestState": "eyJvcCI6ImJ1bGtfZGVsZXRlIiwic3RlcCI6MSwidG9rZW4iOiJ4eXoifQ=="
}

The client resolves whatever inputRequests is asking for — shows the user a confirmation dialog, runs a model completion, whatever the request type calls for — then re-issues the original call, this time with an inputResponses map keyed identically to the original requests, plus the exact same requestState value echoed back untouched.

// Client retry — same call, now with answers attached
{
  "inputResponses": {
    "confirm_deletion": { "confirmed": true }
  },
  "requestState": "eyJvcCI6ImJ1bGtfZGVsZXRlIiwic3RlcCI6MSwidG9rZW4iOiJ4eXoifQ=="
}
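The client side of this exchange can be sketched as a small retry loop. This is a minimal sketch, not an official SDK: the Go types, the Server and Answerer function types, the "complete" result value, and the fake server are all hypothetical names invented for illustration. Only the JSON field names (resultType, inputRequests, inputResponses, requestState) come from the exchange above.

```go
package main

import (
	"errors"
	"fmt"
)

// Result models the two outcomes of a call. The Go shape is an assumption;
// "complete" is a made-up value for "the call finished".
type Result struct {
	ResultType    string            // "input_required" or "complete"
	InputRequests map[string]string // request key -> question for the caller
	RequestState  string            // opaque to the client; echoed back untouched
	Output        string
}

// Call is one attempt: the original arguments plus, on a retry, the answers
// and the echoed state.
type Call struct {
	Args           string
	InputResponses map[string]bool
	RequestState   string
}

// Server handles one attempt. Hypothetical stand-in for the real transport.
type Server func(Call) (Result, error)

// Answerer resolves one input request, for example by showing a dialog.
type Answerer func(key, question string) bool

// callUntilDone re-issues the call until the server stops asking for input.
func callUntilDone(send Server, args string, answer Answerer, maxRounds int) (string, error) {
	call := Call{Args: args}
	for round := 0; round < maxRounds; round++ {
		res, err := send(call)
		if err != nil {
			return "", err
		}
		if res.ResultType != "input_required" {
			return res.Output, nil
		}
		responses := make(map[string]bool, len(res.InputRequests))
		for key, question := range res.InputRequests {
			responses[key] = answer(key, question)
		}
		// Same args, answers keyed like the requests, state echoed unchanged.
		call = Call{Args: args, InputResponses: responses, RequestState: res.RequestState}
	}
	return "", errors.New("too many input rounds")
}

// fakeServer asks for confirmation once, then completes. It keeps no memory
// between attempts; it only inspects what the call carries.
func fakeServer(c Call) (Result, error) {
	if c.RequestState == "" {
		return Result{
			ResultType:    "input_required",
			InputRequests: map[string]string{"confirm_deletion": "Confirm?"},
			RequestState:  "opaque-state-1",
		}, nil
	}
	if c.RequestState != "opaque-state-1" {
		return Result{}, errors.New("unknown request state")
	}
	if !c.InputResponses["confirm_deletion"] {
		return Result{ResultType: "complete", Output: "cancelled"}, nil
	}
	return Result{ResultType: "complete", Output: "deleted"}, nil
}

func main() {
	alwaysYes := func(key, question string) bool { return true }
	out, err := callUntilDone(fakeServer, "bulk_delete", alwaysYes, 3)
	fmt.Println(out, err)
}
```

The maxRounds cap is a design choice of this sketch rather than anything the article describes: it keeps a misbehaving server from trapping the client in an endless question loop.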

Why Any Server Instance Can Pick Up the Retry

Notice what’s missing: nowhere in this exchange does the server rely on its own memory. The requestState value is opaque to the client — it doesn’t need to know or care what’s inside it — but it’s everything the server needs to know to resume exactly where it left off. Encode the operation, the step, and a resumption identifier into that blob, and it doesn’t matter whether the retry lands on the same server instance, a sibling instance behind the same load balancer, or an instance that didn’t exist yet when the original call was made. The state isn’t on a server. It’s in transit, riding shotgun in the request itself.
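To make the "any instance" claim concrete, here is a minimal sketch in which one server instance asks the question and a different one finishes the job. The pausedOp fields are an assumption modeled on the example blob shown earlier, and the instance type and its methods are hypothetical. The state here is only base64-encoded JSON; a token that crosses a trust boundary would also need signing or encryption, which this sketch leaves out.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// pausedOp is what this sketch packs into requestState (assumed field set).
type pausedOp struct {
	Op   string `json:"op"`
	Step int    `json:"step"`
}

// instance stands in for one server process. It carries a name for the
// output only; it holds nothing about any in-flight call.
type instance struct{ name string }

func encodeState(p pausedOp) (string, error) {
	raw, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

func decodeState(s string) (pausedOp, error) {
	var p pausedOp
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return pausedOp{}, err
	}
	if err := json.Unmarshal(raw, &p); err != nil {
		return pausedOp{}, err
	}
	return p, nil
}

// ask starts the operation and returns the state the client must echo back.
func (i instance) ask() (string, error) {
	return encodeState(pausedOp{Op: "bulk_delete", Step: 1})
}

// resume finishes the operation using only what the retry carries.
func (i instance) resume(requestState string, confirmed bool) (string, error) {
	p, err := decodeState(requestState)
	if err != nil {
		return "", err
	}
	if p.Op != "bulk_delete" || p.Step != 1 {
		return "", errors.New("unexpected paused operation")
	}
	if !confirmed {
		return "cancelled by " + i.name, nil
	}
	return "deleted by " + i.name, nil
}

func main() {
	a, b := instance{name: "a"}, instance{name: "b"}
	state, err := a.ask() // instance a asks the question
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := b.resume(state, true) // instance b picks up the retry
	fmt.Println(out, err)
}
```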

You’ve Already Met This Pattern — It’s Just New Clothes on an Old Idea

If this feels familiar, it should. MRTR is a specific, well-engineered application of a pattern that’s been quietly holding up scalable systems for years: stop making the server remember anything per client, and make the client carry a self-contained receipt instead. A pagination cursor is exactly this: instead of the server holding a “where were we” pointer per client, it encodes the position into an opaque token and hands it back. An idempotency key does the reverse trick for retries: the client generates a value and sends it with every attempt, and the server uses it to recognize “I’ve already done this.” The server still has to record the keys it has seen, but it can do that in a store any instance can query, so neither side needs a live session pinned to one process. A JWT replaced server-side session lookups with a signed, self-contained blob for the same underlying reason: the server shouldn’t have to remember who you are between requests when the token can just tell it.
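For comparison, the pagination-cursor version of the idea fits in a few lines. This is a sketch under stated assumptions: the cursor struct, listPage, and the in-memory items slice are hypothetical, and a production cursor would more likely encode a sort key than a raw offset.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// cursor is the entire "where were we" state, carried by the client.
type cursor struct {
	Offset int `json:"offset"`
}

// items stands in for a database table.
var items = []string{"a", "b", "c", "d", "e"}

// listPage returns up to limit items starting at the position encoded in
// token, plus an opaque token for the next page ("" when nothing is left).
func listPage(token string, limit int) ([]string, string, error) {
	if limit <= 0 {
		return nil, "", fmt.Errorf("limit must be positive, got %d", limit)
	}
	var c cursor
	if token != "" {
		raw, err := base64.URLEncoding.DecodeString(token)
		if err != nil {
			return nil, "", err
		}
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, "", err
		}
	}
	if c.Offset < 0 || c.Offset > len(items) {
		return nil, "", fmt.Errorf("cursor out of range: %d", c.Offset)
	}
	end := c.Offset + limit
	if end > len(items) {
		end = len(items)
	}
	page := items[c.Offset:end]
	if end == len(items) {
		return page, "", nil
	}
	raw, err := json.Marshal(cursor{Offset: end})
	if err != nil {
		return nil, "", err
	}
	return page, base64.URLEncoding.EncodeToString(raw), nil
}

func main() {
	token := ""
	for {
		page, next, err := listPage(token, 2)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(page)
		if next == "" {
			return
		}
		token = next
	}
}
```

Nothing about a given client survives on the server between two listPage calls, which is the property MRTR carries over to paused operations.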

MRTR is the same trick applied to a harder problem: not just “who are you” or “where were we,” but “here’s an entire paused computation, portable enough to resume on a machine that’s never seen it before.”

The One Real Difference: What Goes Inside the Token

A pagination cursor typically encodes a handful of primitives: an offset, a sort key, maybe a timestamp. A resumption token in the MRTR sense can encode something considerably heavier: which step of a multi-step operation you’re on, what partial work has already committed, and what still needs to happen once the missing input arrives. It’s less like a bookmark and more like a snapshot of a paused coroutine, serialized into a string. That’s a meaningfully bigger engineering surface, and it’s why this pattern deserves more attention than “oh, it’s just a cursor.” The failure modes when you get it wrong are bigger too.

Building This Pattern Into Your Own Multi-Step API

You don’t need to be building an MCP server to use this. Any API where the server sometimes needs to pause and ask for something mid-operation is a candidate — a checkout flow that needs step-up authentication only for large orders, a bulk-import endpoint that needs to flag ambiguous rows for human review, a document-processing pipeline that occasionally needs a human to resolve a conflict before continuing.

// Generic resumption-token pattern for a checkout flow
// that needs step-up MFA only above a spending threshold

func Checkout(order Order) Response {
	if order.Total > mfaThreshold && !order.MFAVerified {
		token := encodeResumptionToken(ResumeState{
			Op:      "checkout",
			OrderID: order.ID,
			Step:    "await_mfa",
		})
		return Response{
			Status:       "input_required",
			InputRequest: MFAChallenge{OrderID: order.ID},
			ResumeToken:  token,
		}
	}
	return finalizeCheckout(order)
}

// Client resolves the MFA challenge, then re-calls Checkout
// with the MFA result and the same opaque ResumeToken attached.
// No server-side session was ever created.

The API surface looks almost identical to a session-based version. The difference is entirely in where the state lives: instead of a session ID pointing at server memory, you hand the client a self-describing token pointing at nothing but itself.

Three Rules for What You Put Inside a Resumption Token

First, encode identifiers, not data. Put the order ID in the token, not the order’s full contents, and fetch the current state fresh when the retry lands, so you’re never resuming against data that’s gone stale during the pause.

Second, sign or encrypt it if it crosses a trust boundary. An opaque token the client can tamper with is an opaque token an attacker can tamper with, and “resume this operation at step 3 as a different user” is exactly the kind of bug this pattern can introduce if you skip that step.

Third, put an expiry inside the token itself rather than relying on server-side cleanup. A resumption token that’s valid forever is a liability sitting in your logs and your client’s local storage indefinitely.
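All three rules fit in one small encoder and decoder. This is a minimal sketch, assuming HMAC-SHA256 signing with a key shared by every server instance. The token layout (payload, a dot, then the signature), the function signatures, and the demo key are assumptions made for illustration, not something the article or the MCP specification prescribes. Signing prevents tampering but does not hide the contents; encrypt the payload if the contents themselves are sensitive.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// ResumeState holds identifiers only (rule 1) and its own expiry (rule 3).
type ResumeState struct {
	Op      string `json:"op"`
	OrderID string `json:"order_id"`
	Step    string `json:"step"`
	Expires int64  `json:"exp"` // Unix seconds
}

func sign(key, payload []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return mac.Sum(nil)
}

// encodeResumptionToken returns base64url(payload) + "." + base64url(signature).
func encodeResumptionToken(key []byte, s ResumeState) (string, error) {
	payload, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding
	return enc.EncodeToString(payload) + "." + enc.EncodeToString(sign(key, payload)), nil
}

// decodeResumptionToken verifies the signature (rule 2) before trusting any
// field, then rejects tokens whose expiry has passed.
func decodeResumptionToken(key []byte, token string, nowUnix int64) (ResumeState, error) {
	body, sig, ok := strings.Cut(token, ".")
	if !ok {
		return ResumeState{}, errors.New("malformed token")
	}
	enc := base64.RawURLEncoding
	payload, err := enc.DecodeString(body)
	if err != nil {
		return ResumeState{}, err
	}
	gotSig, err := enc.DecodeString(sig)
	if err != nil {
		return ResumeState{}, err
	}
	if !hmac.Equal(gotSig, sign(key, payload)) {
		return ResumeState{}, errors.New("bad signature")
	}
	var s ResumeState
	if err := json.Unmarshal(payload, &s); err != nil {
		return ResumeState{}, err
	}
	if nowUnix >= s.Expires {
		return ResumeState{}, errors.New("token expired")
	}
	return s, nil
}

func main() {
	key := []byte("demo-key-not-for-production")
	now := time.Now().Unix()
	token, err := encodeResumptionToken(key, ResumeState{
		Op: "checkout", OrderID: "ord_123", Step: "await_mfa", Expires: now + 120,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	state, err := decodeResumptionToken(key, token, now)
	fmt.Println(state.OrderID, state.Step, err)
}
```

Passing the current time in as a parameter, rather than reading the clock inside the decoder, is a choice made here so that expiry behavior is easy to test.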

What You Trade Away When You Remove the Session

None of this is free. A resumption token has to be re-validated and re-hydrated on every retry, which is real work the server used to skip by just reading from memory. Debugging gets a layer more indirect — instead of inspecting live server state, you’re decoding a blob to understand where an operation actually is. And if you’re retrofitting this onto an existing session-based system, every mid-flow interaction needs to be redesigned around “what’s the minimum state I need to make this resumable,” which is a genuinely different design exercise than “what do I need to remember.” The payoff — servers that scale behind a dumb load balancer with zero shared state — is exactly why MCP’s maintainers decided it was worth the redesign for a protocol that has to run at agent-swarm scale.

FAQ: Stateless Multi-Step APIs and MRTR

What is MRTR in the MCP specification?

Multi Round-Trip Requests, introduced by SEP-2322 in the MCP 2026-07-28 specification, is the mechanism that lets a server pause a call to ask the client for input — an elicitation, a sampling request, or task-related input — without holding a connection open or storing session state on the server.

When did the MCP stateless specification ship?

The 2026-07-28 specification was finalized on July 28, 2026, following a release candidate published in May 2026 and beta SDKs for Python, TypeScript, Go, and C# during the validation window.

How does a stateless server know how to resume a paused request?

The server encodes everything it needs into an opaque requestState value returned to the client. The client echoes that value back unmodified on its retry, so any server instance — not necessarily the one that issued the original question — can decode it and resume the operation.

Is the MRTR pattern only useful for AI agent protocols?

No. It’s a general solution to any multi-step API that needs to pause mid-operation for input — step-up authentication, human review steps, or confirmation flows are all candidates, regardless of whether AI or MCP is involved anywhere in the system.

What’s the difference between a resumption token and a pagination cursor?

Both move state out of the server and into an opaque client-held value, but a resumption token typically encodes a paused multi-step operation — including which step is in progress and what still needs to happen — while a pagination cursor usually encodes just a position, like an offset or sort key.

Can a resumption token be tampered with by the client?

It can, if you don’t protect it. Any resumption token that crosses a trust boundary should be signed or encrypted, since an attacker who can freely edit an unprotected token can potentially resume an operation at an arbitrary step or under a different identity.

Does removing server sessions make MCP servers work on serverless platforms?

Yes — that’s the primary motivation. Session-based protocols can’t run cleanly on per-request compute like Lambda or Cloudflare Workers, because there’s no long-lived process to hold the session between calls. A fully stateless protocol has no such requirement.

Do I need to adopt MCP to use the resumption token pattern in my own API?

No. The pattern — encode a paused operation into an opaque, client-held value instead of server memory — is protocol-agnostic and can be implemented in any request-response API, in any language, independent of MCP.

Written by:

H.C. Choud
