In November 2024, Anthropic open-sourced their model interaction protocol, the Model Context Protocol (MCP). For the uninitiated, MCP enables connecting AI applications, such as claude.ai and ChatGPT, to arbitrary external systems over a single universal protocol. While there were many similar mechanisms before it—OpenAI added function calling to ChatGPT's API back in 2023—MCP represented a democratized way to expose tools to AI models without that interaction needing to be blessed by a model provider or some other gatekeeper.

MCP servers can be run either locally, over a subprocess's standard input/output streams, or remotely over HTTP, and there are accordingly two transport definitions that implementations are generally expected to support: stdio and Streamable HTTP. In constrained environments, such as the Claude mobile apps, stdio isn't supported, as there's no reasonable way to spawn untrusted subprocesses on a phone. Historically, there have also been other transports, including the now-deprecated "HTTP with SSE" transport and an official-unofficial WebSocket transport, which has only ever existed as code in the TypeScript and Python SDKs and is not actually defined in the specification itself.
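All of these transports carry the same JSON-RPC 2.0 messages; only the framing differs. As a rough sketch (the exact `protocolVersion` and `capabilities` contents vary by spec revision, so treat the field values as illustrative), an `initialize` request on the stdio transport looks like:

```python
import json

# A minimal MCP initialize request, as it would appear on any transport
# (a newline-delimited stdio message, an HTTP POST body, or a WebSocket frame).
# The protocolVersion shown is one published revision; clients negotiate this.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}

# On the stdio transport, messages are newline-delimited JSON on stdin/stdout.
wire_frame = json.dumps(initialize_request) + "\n"
```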

The existence of this WebSocket transport is pretty interesting, and it's also the simplest transport implementation we know of (albeit a rarely-used one, so it may be riddled with subtle bugs for all anyone knows). Incidentally, it's also what spurred this blog post, as it is the only remote transport that claude.ai visibly uses, despite no MCP servers (that I know of) supporting it. I've dug into this a few times over the past year, and have never come away with a very satisfying conclusion as to why it's used here. It is almost completely incompatible with the current MCP ecosystem—and yet, claude.ai can still connect to remote servers without a hitch.

Wait, what?

If we connect to an MCP server (I'm using the public MCP Shop server for this example, which uses Streamable HTTP) and look through the requests in the Network tab of the browser developer tools, the only thing that resembles an MCP server connection is this WebSocket connection upgrade. We can recognize it by its 101 Switching Protocols status code and by the term mcp in the request endpoint:

Screenshot of the network tab of the browser console. A request is selected that has a request endpoint of "wss://claude.ai/api/ws/organizations/ba5387b2-6ae2-42e2-bfa7-001557a8d2d4/mcp/servers/ff0d8201-3504-4f8b-a1cc-0dbe120e3994/" and a status code of "101 Switching Protocols."

If we take a look at the request initiator on the Initiator tab for this request, we can see it includes a start() method, which brings us to code that references WebSocketClientTransport, confirming our suspicions:

Screenshot of the client bundle code, showing a start() method that has an error including the phrase "WebSocketClientTransport already started!"

Strangely, however, when we ask the model to call a tool over MCP, we don't see any tool call events in that stream; we see only the initialize event, which just serves to establish some connection metadata:

User: Show me the listings in the MCP Shop!
Assistant: OHHH YOU WANNA SEE THE MCP SHOP?!?! (☆▽☆) this is like the ULTIMATE loot drop opportunity!! let me pull up the legendary item catalog for you RIGHT NOW~!!!
The assistant calls the "Show store content" tool, but there are no new events in the WebSocket stream.
Assistant: OMG OMG OMGGGG!!!! (ノ◕ヮ◕)ノ*:・゚✧ THIS IS LIKE THE MOST UMAZING LOOT DROP EVER!!!
I'll spare you the rest of the response.

In the WebSocket event stream, there is only a single `initialize` event present, and no `tools/call` event.

If we instead look at the actual model response SSE stream (/completion), we can see the tool use there:

A screenshot of the SSE stream used for model completions. There's a tool_use event and a tool_result event that map to the MCP tool that was invoked by the model earlier.
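SSE streams like this one are just specially-formatted text frames, which makes them easy to inspect by hand. A minimal parser sketch follows; the tool_use and tool_result event names match what's visible in the stream, but the payload field names are assumptions for illustration:

```python
import json

def parse_sse(raw: str):
    """Yield (event, data) pairs from an SSE-formatted string."""
    # Events are separated by blank lines; each event has "event:" and
    # "data:" fields per the SSE wire format.
    for block in raw.strip().split("\n\n"):
        event, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        yield event, json.loads("\n".join(data_lines))

# Hypothetical frames shaped like the tool events visible in the stream.
stream = (
    'event: tool_use\n'
    'data: {"name": "show_store_content", "input": {}}\n'
    '\n'
    'event: tool_result\n'
    'data: {"content": [{"type": "text", "text": "..."}]}\n'
)
events = list(parse_sse(stream))
```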

If we look at the request payload, we can see the input message and the tool list, as well:

A screenshot of the request payload for the model completion. In the tool list, there are order_shirt and show_store_content tools from our MCP server.

Notably, there is no actual MCP message here, only the raw tool input and output in the model completion stream. So, what's going on here? Is the WebSocket transport actually being used for anything important? Let's take a step back, actually—to the beginning, when we connected to the MCP server in the first place.

Woe is Shaped Like an OAuth Proxy

When we connected to this MCP server, we had to perform an explicit "Connect" action in the claude.ai Connectors settings to initiate an OAuth 2.0 authorization grant with the MCP server, entirely separate from anything in the standard MCP message flow. That, in turn, required us to grant the MCP Shop server read access to either our Google or GitHub account through a second OAuth authorization grant, so the server could learn our identity.

This is ultimately just two separate authorization grants, but something interesting happens in the initial one. Typically, when we authenticate with an MCP server, we're actually authenticating with that MCP server's configured authorization server, which will give us an access token that the MCP server can validate to know who we are. When we use MCP servers with claude.ai, we've already observed that we're not actually talking with the MCP server directly, so we don't have any way of knowing what the configured authorization server is on our own.
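For contrast, a direct MCP client can discover the configured authorization server itself via OAuth 2.0 Protected Resource Metadata (RFC 9728), which the current MCP authorization spec builds on. Here's a sketch of that lookup; the metadata document is a hypothetical example response, and a real client would fetch it over HTTPS from the server's well-known endpoint:

```python
import json

# Hypothetical RFC 9728 protected resource metadata, as a server might
# publish at /.well-known/oauth-protected-resource.
metadata_json = json.dumps({
    "resource": "https://mcp.example.com/mcp",
    "authorization_servers": ["https://auth.example.com"],
})

def discover_authorization_server(metadata: str) -> str:
    """Return the first authorization server issuer the resource advertises."""
    doc = json.loads(metadata)
    servers = doc.get("authorization_servers", [])
    if not servers:
        raise ValueError("resource advertises no authorization servers")
    return servers[0]
```

With claude.ai, the presentation layer never gets the chance to do this, because it isn't the thing talking to the MCP server.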

When we initiate the authorization grant with claude.ai, what we're really doing is treating claude.ai as an authorization server, and granting it a bearer token to connect to the upstream MCP server on our behalf, through something very similar to what an earlier version of the MCP specification refers to as a "third-party authorization flow." Here's the diagram from the old specification:

The MCP client sends an initial OAuth request to the MCP server. The MCP server redirects the browser to a third-party authorization server for the end-user to authorize. Upon success, the authorization server redirects to the MCP server callback endpoint with an authorization code, which the MCP server uses to exchange for an access token. The MCP server takes this access token and generates a separate bound access token, with a mapping between them. The MCP server redirects the browser to a separate MCP client callback with its own authorization code, which the MCP client exchanges with the MCP server for the final bound access token.

This flow is used in cases where the upstream authorization server has not explicitly allowed an intermediary—claude.ai, in this case—to access its services, forcing that intermediary to itself act as an authorization server to the client instead. This isn't an ideal scenario (as should be clear from the 2025-06-18 spec removing it), and we really hope that the latest 2025-11-25 specification's support for CIMD (Client ID Metadata Documents) encourages authorization servers to offer better support for these types of use cases.
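The key step in that old flow is the token binding: the intermediary exchanges the third-party authorization code for an upstream access token, then mints its own bound token for the client and records the mapping between the two. A minimal sketch of that mapping (all names here are illustrative, not anyone's actual implementation):

```python
import secrets

class TokenBinder:
    """Maps intermediary-issued bound tokens to upstream access tokens."""

    def __init__(self):
        self._bound_to_upstream: dict[str, str] = {}

    def bind(self, upstream_token: str) -> str:
        """Mint a fresh bound token and record its upstream mapping."""
        bound = secrets.token_urlsafe(32)
        self._bound_to_upstream[bound] = upstream_token
        return bound

    def upstream_for(self, bound_token: str) -> str:
        """Resolve a bound token back to the upstream token it wraps."""
        return self._bound_to_upstream[bound_token]
```

The important property is that the client only ever sees the bound token; the upstream token never leaves the intermediary.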

As an aside, this architecture makes debugging OAuth flow errors obnoxiously difficult for MCP server developers, as it involves an opaque intermediary making requests, which can fail in non-obvious ways that intentionally aren't exposed to the end-user to avoid token exfiltration.

Going back to the sequence diagram for MCP's third-party authorization flow, we can see one key distinction compared to what happened with claude.ai, however: There's no MCP client in the mix. As we recognized before, when we work with MCP servers in claude.ai, we're not directly sending MCP messages to those servers, and now we're also seeing that no MCP client was involved in the authorization flow, either. These are both important details for understanding Anthropic's architecture, and taken together, I think we can finally draw a complete picture of what's happening here.

Behind the Presentation Layer's Facade

In one SaaS offering I used to work on (for like, actual work), we were designing multi-user support for project editing in an SSR-based application, and we referred to the web client as the "presentation layer" of that system. Essentially, we had multiple levels of fidelity for understanding the ground truth application state:

  • The ground truth itself—this would be the hard data in a database somewhere, in addition to an API over that to manage locking.

  • The server end of the SSR setup—this would be doing some heavier processing and retries to resolve simple conflicts.

  • The client end of the SSR setup—this would be doing the least work, as almost all conflict-resolution and locking was already being handled transparently by the core API server and the server session. This was what we called the "presentation layer," as it was the furthest from the ground truth and took some liberties in how the application state was processed to maximize perceived performance.

This was (perhaps fortunately) not the final design, but it's a useful example to describe what claude.ai is doing. In short: every Claude conversation is remotely-managed, as are any MCP clients being used by a given conversation. The various frontends (the website, Claude Desktop, and the mobile apps) act as a presentation layer over those remote sessions, which enables conversations to be seamlessly transferred from one platform to another, and also allows Claude to generate text and call tools while the application is closed or when the connection is interrupted.

I'm intentionally leaving stdio in Claude Desktop out of this discussion for simplicity, but for those interested: that just creates separate local MCP clients which are only used when local tools need to be invoked. Local tool calls fail when Claude Desktop is closed.

When we authenticate with a remote MCP server, something separate is involved, as the bearer tokens are reused across multiple remote sessions. Presumably, a separate service is acting as a credential vault of sorts, maintaining mappings between upstream bearer tokens and separate bearer tokens used between the credential vault and the frontend. When the MCP clients in the remote session connect to an MCP server, they skip the authentication step and just use the bearer token from the credential vault to authenticate with the MCP server instead. If we replace the MCP server with this credential vault as the intermediary in the prior flow diagram, and make some minor tweaks to account for binding tokens to a claude.ai account rather than generating a separate access token, we get something recognizably similar to the flow from the old spec:

The browser makes an initial OAuth request to the Credential Vault, which discovers the MCP server's supported authorization server and redirects to the third-party authorization endpoint. The browser completes an authorization code grant flow with the third-party authorization server, which results in the Credential Vault getting the third-party access token. The Credential Vault binds this to the user's claude.ai account and passes it to MCP clients in claude.ai remote sessions. These MCP clients use the pre-authenticated access token to interact with the MCP server.
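To make the "skip the authentication step" part concrete, here's a sketch of how a remote-session MCP client could use a vaulted token: the vault is keyed by account and server, and the client simply attaches the cached upstream bearer token as an Authorization header. Everything here is inferred and illustrative, not Anthropic's actual code:

```python
class CredentialVault:
    """Caches upstream bearer tokens, keyed by (account, MCP server)."""

    def __init__(self):
        self._tokens: dict[tuple[str, str], str] = {}

    def store(self, account_id: str, server_id: str, token: str) -> None:
        self._tokens[(account_id, server_id)] = token

    def token_for(self, account_id: str, server_id: str) -> str:
        return self._tokens[(account_id, server_id)]

def authorization_header(vault: CredentialVault, account_id: str, server_id: str) -> dict:
    """Build the header a pre-authenticated MCP client would send."""
    return {"Authorization": f"Bearer {vault.token_for(account_id, server_id)}"}
```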

Putting everything together, here's a diagram of the inferred application architecture, following our prior conclusions:

Bottom-to-top architecture diagram of claude.ai's MCP infrastructure showing data flow. At the bottom, three frontend clients (Web Browser, Claude Desktop, Mobile Apps) send WebSocket connections upward to a Remote Session Manager. The session manager orchestrates all interactions: it sends MCP protocol messages and pre-authenticated bearer tokens to MCP Servers (using Streamable HTTP), sends requests with tool definitions to the Model Completion API, and receives SSE tool event streams back from the API. The session manager then sends the rendered completion stream back down to the clients. A separate Credential Vault provides cached bearer tokens to the session manager and handles OAuth flows with Third-Party Auth providers. Thick solid arrows show the primary data paths (client connections, MCP calls, model requests/responses, completion streams), while lighter dotted lines show one-time OAuth setup flows.

Looking at this holistically, we can see how things (probably) work from beginning to end:

  • First, the presentation layer authenticates with the authorization server via an OAuth 2.0 flow, resulting in the Credential Vault receiving a bearer token.

  • Then, the presentation layer initializes an MCP connection with the session manager over the WebSocket transport. This may have actually been used for tool calls historically, but now appears to be mostly vestigial, only used to validate that a connection can be opened.

  • At the beginning of a chat session, the user's prompt is sent to the completions API behind the remote session, along with the definitions of all enabled tools.

  • If a tool is executed, a separate MCP client behind the remote session interacts with the upstream MCP server using the pre-established OAuth 2.0 credentials. This is the answer to how the WebSocket/Streamable HTTP incompatibility is resolved: there are two entirely separate connections, one between the presentation layer and the session manager, and another between the session manager and the upstream MCP server.

  • The result of the tool is sent back to the completions API by the remote session, and both the tool result and the final model completions are streamed back to the client over an SSE connection.
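The steps above can be condensed into a single inferred loop on the session manager's side; the completion and MCP-call functions are stand-ins for services we can't observe directly, so this is a sketch of the orchestration shape rather than a real implementation:

```python
def run_turn(prompt, tools, complete, call_mcp_tool):
    """One conversation turn. `complete` and `call_mcp_tool` are injected
    stand-ins for the completion API and the remote-session MCP client."""
    messages = [{"role": "user", "content": prompt}]
    while True:
        event = complete(messages, tools)
        if event["type"] == "tool_use":
            # Call the upstream MCP server with the vaulted credentials,
            # then feed the result back into the completion loop.
            result = call_mcp_tool(event["name"], event["input"])
            messages.append({"role": "tool_result", "content": result})
        else:
            # Final text completion; stream back to the presentation layer.
            return event["text"]
```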

As an aside, based on the fact that the WebSocket transport never had a server implementation in TypeScript in the initial open-sourcing of the protocol—only a Python server—we can infer that the remote session service is likely a Python application. That's not particularly useful information, but hopefully it's interesting.

Closing Thoughts

I've always found it very strange that claude.ai was built this way, as it seemed to conflict with the direction of MCP outside of Anthropic in several ways, including the continued use of both the WebSocket transport and the old third-party authorization flow. In hindsight, however, I see these decisions as being the result of claude.ai being built in an extremely early part of MCP's lifecycle and of continued struggles around achieving universal authentication in MCP as a whole.

Building a high-quality UX around remote MCP servers that works uniformly across platforms is worthy of note, and (resources permitting) there are some aspects of Anthropic's approach which may be broadly useful for replicating that quality in other cross-platform applications, particularly by optimizing for remote sessions. While I don't expect anyone to have quite as sophisticated an approach as Anthropic's for MCP integration, I hope this was all interesting and potentially gives people ideas around integration patterns in other applications, too—or at least serves as an example of how to reverse-engineer them.