
Google's WebMCP turns every website into a structured tool for AI agents. OpenAI's Codex-Spark makes those agents fast enough to act in real time. Two announcements, 24 hours apart, and the agentic web just stopped being theoretical.

On February 10, Google published a W3C draft that lets any website expose callable functions to AI agents through a single browser API. The next day, OpenAI shipped a coding model that generates over 1,000 tokens per second on non-Nvidia hardware. Two companies, two different problems, one week.

But they solve the same underlying challenge. The agentic web needs both a language and a speed layer. A way for agents to understand what websites can do, and the raw throughput to act on that understanding in real time.

This week, it got both.

The infrastructure era of agentic commerce is no longer a roadmap. It is being assembled, layer by layer, in public.

What WebMCP Actually Does

For the past two years, AI agents have browsed the web the way a tourist navigates a foreign city without a map. Screenshot-based agents capture full-page images and pass them to vision models like Claude or Gemini. Each screenshot consumes thousands of tokens. Each interaction requires a new screenshot, a new inference call, a new decision. A single product search can require dozens of sequential interactions before the agent finds what it is looking for.

DOM-based approaches are not much better. They ingest raw HTML and JavaScript, burning through context windows and increasing inference costs with every page load.

WebMCP changes the model entirely. Published as a W3C Draft Community Group Report on February 10, 2026, it is a browser API that lets websites declare exactly what actions they support. Instead of an agent guessing what a button does by looking at a screenshot, the website tells the agent directly: here are the available tools, here are their parameters, here is how to call them.

The technical mechanism is a new browser API called navigator.modelContext. Through this API, a website publishes a structured list of callable functions. A flight booking site might expose searchFlights(origin, destination, date) and bookFlight(flightId, passenger). An e-commerce store might expose searchProducts(query, filters) and addToCart(productId, quantity).

The specification provides two paths for implementation. A Declarative API works with standard HTML forms, meaning existing websites can adopt it with minimal changes. An Imperative API allows JavaScript-driven interactions for more complex workflows.
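As a rough sketch, the imperative path might look like the snippet below for the flight-booking example. The method name (registerTool) and descriptor fields (name, description, inputSchema, execute) follow the general shape of the draft and of MCP tool definitions, but the API is experimental, so treat them as assumptions rather than a stable surface:

```javascript
// Tool contract a flight-booking site might expose to browser agents.
// Descriptor shape is an assumption modelled on the draft spec and
// MCP conventions, not a finalised API.
const searchFlightsTool = {
  name: "searchFlights",
  description: "Search available flights between two airports on a date.",
  inputSchema: {
    type: "object",
    properties: {
      origin: { type: "string", description: "IATA code, e.g. LHR" },
      destination: { type: "string", description: "IATA code, e.g. JFK" },
      date: { type: "string", description: "ISO 8601 date" },
    },
    required: ["origin", "destination", "date"],
  },
  // The agent calls this directly instead of clicking through the UI.
  async execute({ origin, destination, date }) {
    const res = await fetch(
      `/api/flights?from=${origin}&to=${destination}&date=${date}`
    );
    return res.json();
  },
};

// Register only where the experimental API is present (e.g. Chrome 146
// Canary behind the Experimental Web Platform Features flag).
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool(searchFlightsTool);
}
```

The point of the shape is that the schema travels with the function: the agent learns the parameter names and types from the contract, not from pixels.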

Think of it as the difference between shouting at a building and reading the directory in the lobby. Same building, completely different efficiency.

Early benchmarks cited in the specification show roughly a 67 percent reduction in computational overhead compared with traditional visual agent-browser interaction. That is not a marginal improvement. It is the difference between agents that cost too much to deploy and agents that work at scale.

Google and Microsoft co-authored the specification, and it is being incubated through the W3C's Web Machine Learning community group. Developers can already test it in Chrome 146 Canary behind the "Experimental Web Platform Features" flag.

What Codex-Spark Actually Does

While Google was publishing WebMCP, OpenAI was solving the other half of the equation: speed.

GPT-5.3-Codex-Spark is OpenAI's first model running on non-Nvidia hardware. It runs on the Cerebras WSE-3, a single wafer-scale chip containing 4 trillion transistors. The result is raw inference speed that GPU-based systems cannot match: over 1,000 tokens per second, roughly 15 times faster than GPT-5.3-Codex running on traditional GPU clusters.

The speed difference is not abstract. On SWE-Bench Pro, a benchmark that measures real-world software engineering tasks, Codex-Spark completes the same problems in two to three minutes that the GPU-based version takes 15 to 17 minutes to solve. The accuracy on that particular benchmark stays comparable. On Terminal-Bench 2.0, there is a tradeoff: Codex-Spark scores 58.4 percent versus 77.3 percent for the GPU version. Speed and accuracy exist on a curve, and OpenAI is betting that for many agentic workloads, speed wins.

The infrastructure story matters as much as the benchmarks. OpenAI achieved an 80 percent reduction in per-roundtrip overhead through what it describes as optimisations across the inference pipeline. That is not just faster token generation. That is faster everything: faster tool calls, faster context loading, faster decision loops.
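To see how throughput and roundtrip overhead compound, consider a back-of-envelope model of a 30-step agent session. The per-step token count and baseline overhead below are invented for illustration; only the 15x token-speed ratio and the 80 percent overhead cut come from the figures above:

```javascript
// Illustrative latency model for a multi-step agent session.
// tokensPerStep and overheadPerStep are assumptions; the 15x speed
// ratio and 80% overhead reduction are the reported figures.
function sessionSeconds({ steps, tokensPerStep, tokensPerSec, overheadPerStep }) {
  const genTime = (steps * tokensPerStep) / tokensPerSec; // token generation
  const overheadTime = steps * overheadPerStep;           // per-roundtrip cost
  return genTime + overheadTime;
}

const baseline = sessionSeconds({
  steps: 30,
  tokensPerStep: 500,
  tokensPerSec: 70,     // assumed GPU-class throughput (~1000 / 15)
  overheadPerStep: 1.0, // assumed 1s of roundtrip overhead
});

const spark = sessionSeconds({
  steps: 30,
  tokensPerStep: 500,
  tokensPerSec: 1000,   // Codex-Spark class throughput
  overheadPerStep: 0.2, // 80% less roundtrip overhead
});

console.log(baseline.toFixed(0), "s vs", spark.toFixed(0), "s"); // → 244 s vs 21 s
```

Under these assumed numbers the session drops from about four minutes to about twenty seconds, which is why the overhead reduction matters as much as raw token speed.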

Codex-Spark is currently available as a research preview for ChatGPT Pro subscribers, with broader API access expected to follow. It is optimised for coding and agentic tool-use tasks, not general conversation. That narrower scope is deliberate. When your model's job is to make function calls, parse responses, and decide the next action, you do not need it to write poetry. You need it to be fast.

This is also the first visible step in OpenAI's diversification away from Nvidia. The Cerebras partnership is part of a reported $10 billion infrastructure deal. AMD and Broadcom partnerships are reportedly in development. The inference hardware market is fragmenting, and the models are following. For the agentic web to work at scale, inference cannot remain bottlenecked on a single hardware vendor's supply chain.

Why These Two Announcements Belong in the Same Article

WebMCP is the interface layer. Codex-Spark is the speed layer. Neither is sufficient alone.

Consider what happens when an AI agent needs to buy a product on your behalf today. Without WebMCP, the agent screenshots the page, processes thousands of tokens through a vision model, clicks a button, waits for the page to reload, screenshots again, processes again. Each cycle takes seconds. A complete checkout flow might take minutes.

With WebMCP but without speed, the agent can call addToCart(productId) directly instead of clicking through a visual interface. But if the underlying model takes 15 seconds per inference call, the workflow still crawls.

With both, the agent discovers the website's tool contract, makes a structured function call, gets a typed response, and moves to the next step. At 1,000 tokens per second with 80 percent less overhead per roundtrip, multi-step workflows start to approach the speed of a human clicking through the same flow. Except the agent does not get distracted, does not mis-click, and does not abandon the cart.
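On the agent side, that with-both flow reduces to a short typed loop. The listTools and callTool helpers below are hypothetical stand-ins for whatever discovery surface browsers eventually expose, and the tool names match the e-commerce example earlier:

```javascript
// Hypothetical agent-side flow: discover the site's tool contract,
// then drive an add-to-cart with typed calls instead of screenshots.
// listTools/callTool are invented stand-ins, not a real browser API.
async function checkout(agentBrowser, query) {
  const tools = await agentBrowser.listTools(); // the site's contract
  const names = tools.map((t) => t.name);

  if (!names.includes("searchProducts") || !names.includes("addToCart")) {
    throw new Error("Site does not expose the tools this flow needs");
  }

  const results = await agentBrowser.callTool("searchProducts", { query });
  const pick = results[0]; // typed response, no DOM parsing or vision pass
  return agentBrowser.callTool("addToCart", {
    productId: pick.id,
    quantity: 1,
  });
}
```

Two structured calls replace what would otherwise be a dozen screenshot-infer-click cycles.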

The web just got its first native API for AI agents. And the models just got fast enough to use it.

WordLift has called WebMCP "the new Schema.org moment," and the comparison is apt. When Google, Microsoft, and Yahoo introduced structured data through Schema.org, it transformed how search engines understood web content. WebMCP does the same thing for AI agents. The difference is that agents do not just read. They act.

The Agentic Web Stack Takes Shape

Zoom out and a full protocol stack is emerging. We covered the commerce layer in our analysis of the agentic commerce stack last week. WebMCP and Codex-Spark now fill in the pieces above and below it.

At the browser layer, WebMCP gives agents structured access to any website that implements the specification. This is the digital hand that reaches into the web.

At the backend layer, Anthropic's Model Context Protocol (MCP) provides the server-side infrastructure. Where WebMCP operates entirely client-side within the browser, MCP connects AI platforms to service providers through hosted servers using JSON-RPC. The two protocols are complementary. A travel company might maintain a backend MCP server for direct API integrations with platforms like ChatGPT or Claude, while simultaneously implementing WebMCP tools on its consumer-facing website for browser-based agents.

At the commerce layer, Google's Universal Commerce Protocol (UCP) and OpenAI's Agentic Commerce Protocol (ACP) handle structured checkout, payments, and merchant integrations. UCP, built with Shopify and Walmart, manages complex retail flows and inventory logic. ACP, developed with Stripe, uses ephemeral shared payment tokens for conversational commerce.

At the speed layer, models like Codex-Spark running on purpose-built inference hardware provide the throughput that makes real-time agent workflows viable. You cannot have an agent complete a multi-step purchase flow if each inference call takes 15 seconds.
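To make the backend layer concrete: an MCP tool invocation travels as a JSON-RPC 2.0 request. The payload below follows MCP's tools/call convention; the tool name and arguments are invented to match the travel example above:

```javascript
// A backend MCP tool call framed as a JSON-RPC 2.0 request.
// Method and params shape follow MCP's tools/call convention;
// the tool and its arguments are illustrative.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "searchFlights",
    arguments: { origin: "LHR", destination: "JFK", date: "2026-03-01" },
  },
};

// Serialised onto the MCP transport (stdio or HTTP) as one JSON message.
const wire = JSON.stringify(request);
```

Contrast that with WebMCP, where the same logical call happens in-page through navigator.modelContext rather than over a hosted server connection.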

Four layers. Four protocols. Two weeks ago, only one of them existed in any usable form.

The enterprise adoption signals are already appearing. SAP announced a storefront MCP server for SAP Commerce Cloud, planned for Q2 availability. Visa expanded its Intelligent Commerce program with its own MCP server and a pilot of the Visa Acceptance Agent Toolkit. PayPal will use OpenAI and Stripe's ACP to enable consumers to use their PayPal wallet in Instant Checkout later in 2026.

The Uncomfortable Questions

The technology is impressive. The implications are uncomfortable.

For website operators, WebMCP creates a genuine paradox. If you implement it, AI agents can interact with your site efficiently, but they also have less reason to render your pages, see your ads, or engage with your brand experience. If you do not implement it, agents will continue to screenshot and scrape your site anyway, just less efficiently. Either way, you risk becoming background infrastructure for someone else's AI assistant.

As Adweek reported, this shift poses a real threat to operators who rely on direct user engagement. Fewer page views means less ad revenue. Fewer direct visits means weaker customer relationships. The web has always been about human attention. Agents do not have attention. They have task completion.

Security is the other open wound. The WebMCP specification pushes prompt injection protection to individual AI agents rather than building it into the protocol itself. That is a deliberate design choice, and a controversial one. If a malicious website publishes a tool contract that looks legitimate but contains adversarial instructions, the agent's own defences are the only safeguard. Google's engineers have acknowledged this is an unresolved challenge.

For merchants, the fragmentation problem is already real. If you want your products discoverable by both ChatGPT and Gemini, you will likely need to support both ACP and UCP. Add WebMCP for browser-based agents, and you are now maintaining three protocol implementations alongside your existing website and API infrastructure. As GR4VY noted, orchestration is becoming essential rather than optional.

What Comes Next

The timeline is accelerating. Chrome 146 Canary already supports WebMCP behind a flag. Microsoft's co-authorship of the specification suggests Edge support is forthcoming. Industry observers expect formal browser announcements by mid-to-late 2026.

On the speed front, OpenAI's Cerebras deployment is just the beginning. Non-Nvidia inference is opening up because the economics demand it. When agents are making hundreds of tool calls per session, the cost per inference call becomes the binding constraint. Purpose-built hardware that can deliver 1,000+ tokens per second at lower cost per token changes the unit economics of agentic workflows entirely.
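A back-of-envelope calculation shows the shape of that constraint. Every figure below is an assumption chosen purely for illustration, not a published price:

```javascript
// Illustrative session economics; all figures are assumptions.
const callsPerSession = 300;         // "hundreds of tool calls per session"
const tokensPerCall = 400;           // assumed prompt + completion per call
const dollarsPerMillionTokens = 2.0; // assumed blended inference price

const tokensPerSession = callsPerSession * tokensPerCall; // 120,000 tokens
const costPerSession =
  (tokensPerSession / 1_000_000) * dollarsPerMillionTokens;

console.log(`$${costPerSession.toFixed(2)} per session`); // → $0.24 per session
```

Per-session cost scales linearly with price per token, so hardware that cuts that price directly rescales what an agentic product can afford to do per user.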

The convergence to watch is when the protocol layer (WebMCP), the commerce layer (UCP and ACP), and the speed layer (Codex-Spark class models) all mature simultaneously. We are not there yet. WebMCP is in early preview. UCP and ACP are in pilots. Fully autonomous agentic payments remain on the horizon. But the pieces are assembling faster than most predicted.

The security story will need to evolve in parallel. Right now, each layer handles trust differently. WebMCP pushes it to agents. MCP handles it through server authentication. UCP and ACP use tokenised payment flows. There is no unified trust framework spanning all four layers. Until there is, every deployment will require bespoke security work at each integration point. That is manageable for enterprise early adopters. It is not manageable for the long tail of merchants who will eventually need to support agentic transactions.

McKinsey recently framed agentic commerce as the next frontier for consumers and merchants. That framing feels accurate, but incomplete. This is not just about commerce. This is about the fundamental architecture of the web shifting from a system designed for human browsers to a system designed for machine agents. Commerce is just where the money is, so it moves first.

The web was built for humans to browse. It is being rebuilt for machines to act. The question is not whether this happens. It is whether your infrastructure is ready when it does.

