On February 12, Google DeepMind published a paper with a deceptively simple title: Intelligent AI Delegation. The 40-page framework, authored by Nenad Tomašev, Matija Franklin, and Simon Osindero, proposes a formal system for how AI agents should hand off tasks to one another. It covers authority transfer, trust calibration, cryptographic verification, and liability chains.
The timing is striking. In the two weeks since, the industry has made clear that agentic commerce is no longer a research topic. Stripe published its annual letter announcing a dedicated agentic commerce protocol. Visa confirmed it has completed secure AI-initiated transactions. Perplexity launched a multi-model agentic product. And Chargebacks911 warned that AI-driven payment disputes are already on the horizon.
Gartner projects that 40 percent of enterprise applications will feature task-specific AI agents by the end of this year. The delegation framework arrived just in time. Or, depending on your perspective, just too late.
The question is no longer whether AI agents will transact on our behalf. It is whether the rules will exist before the receipts do.
The Blueprint
The DeepMind framework rests on five pillars: dynamic assessment of agent capabilities, adaptive execution with mid-task switching, structural transparency that distinguishes incompetence from malice, scalable market coordination through reputation systems, and systemic resilience against cascading failures.
Two concepts stand out for anyone building in payments and commerce.
The first is Delegation Capability Tokens, or DCTs. These are cryptographic credentials that enforce the principle of least privilege as tasks pass between agents. Think of them as scoped API keys for agent-to-agent interactions, with each handoff attenuating what the next agent in the chain can do. In a world where an AI shopping agent might sub-delegate price comparison to one service and payment execution to another, DCTs would prevent the price-comparison agent from ever touching payment credentials.
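The attenuation rule can be sketched in a few lines of Python. Everything here is illustrative: the paper's actual DCTs are cryptographic credentials, and the class, agent names, and scope strings below are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CapabilityToken:
    """Illustrative stand-in for a Delegation Capability Token."""
    principal: str
    scopes: frozenset
    chain: tuple = ()

    def delegate(self, to_agent, scopes):
        # Least privilege: a sub-agent's scopes must be a subset of ours.
        requested = frozenset(scopes)
        if not requested <= self.scopes:
            raise PermissionError(f"scope escalation attempted: {requested - self.scopes}")
        return CapabilityToken(self.principal, requested, self.chain + (to_agent,))

# The human grants a shopping agent broad authority...
root = CapabilityToken("alice", frozenset({"browse", "compare_prices", "pay"}))
# ...which sub-delegates price comparison without the payment scope,
# so the comparison service can never touch payment credentials.
comparer = root.delegate("price-bot", {"browse", "compare_prices"})
```

The key property is that scopes can only narrow at each handoff; any attempt by a downstream agent to widen them fails before the delegation happens.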
The second is the concept of liability firebreaks. The framework requires that any intermediate agent in a delegation chain must either assume full liability for the sub-task or escalate back to the human principal for updated authority. No silent pass-throughs. No diffused accountability. Every link in the chain has an owner.
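A minimal sketch of that firebreak rule, with the agent names and the accept/decline flag invented for illustration: the first link unwilling to own its sub-task halts the chain and returns control to the human principal, rather than passing risk along silently.

```python
def assign_liability(chain):
    """Walk a delegation chain, enforcing the firebreak rule: each
    intermediate agent either owns its sub-task, or the whole chain
    escalates back to the human principal for updated authority."""
    owners = []
    for agent, accepts in chain:
        if accepts:
            owners.append(agent)  # this link has an owner
        else:
            # No silent pass-through: stop and renegotiate authority.
            return owners, f"escalate to principal: {agent} declined liability"
    return owners, "chain fully owned"

# A payment-execution sub-agent declines liability, so the chain halts
# at that link instead of diffusing accountability downstream.
owners, status = assign_liability([("shopper", True), ("pay-exec", False)])
```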
The paper also proposes four distinct verification models for completed tasks: direct outcome inspection, trusted third-party auditing, zero-knowledge proofs that verify correctness without revealing underlying data, and game-theoretic consensus where multiple agents verify with economic incentives for accuracy. For subjective tasks where verification is ambiguous, the framework introduces escrow bonds and arbitration clauses.
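The game-theoretic option can be illustrated with a toy vote-and-bond scheme. The bond amount, payout rule, and verifier names below are assumptions for the sake of the sketch, not the paper's actual mechanism.

```python
def consensus_verify(verdicts, bond=10.0):
    """Toy vote-and-bond verification: each verifier stakes a bond and
    votes on whether the task completed correctly. The majority verdict
    settles the question; dissenters forfeit their bonds, which are
    split among the majority as the incentive for accurate verification."""
    yes = [name for name, ok in verdicts if ok]
    no = [name for name, ok in verdicts if not ok]
    passed = len(yes) > len(no)
    winners, losers = (yes, no) if passed else (no, yes)
    payout = bond + bond * len(losers) / max(len(winners), 1)
    return passed, {name: payout for name in winners}

# Two verifiers approve and one dissents: the task settles as complete,
# and the dissenter's forfeited bond is split between the approvers.
passed, payouts = consensus_verify([("a", True), ("b", True), ("c", False)])
```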
And the paper is candid about what does not yet exist. It calls out Google's own Agent2Agent (A2A) protocol for prioritising coordination over safety and lacking standardised slots for verifiable completion artifacts. The Agent Payments Protocol (AP2) fares no better, omitting execution quality verification and conditional settlement logic entirely.
DeepMind is effectively saying: we have the theory for how agents should delegate. Nobody, including us, has the infrastructure to enforce it.
The Builders
The infrastructure, however, is being assembled at speed.
Stripe's 2026 annual letter reads like a product roadmap for the agentic economy. The company processed $1.9 trillion in payment volume last year and used the letter to announce a suite of agentic commerce products. The Agentic Commerce Protocol provides a shared technical language for AI platforms to interact with businesses. Shared Payment Tokens let agents initiate payments without exposing user credentials. Machine-to-machine payments enable developers to charge agents directly using stablecoin micropayments.
Stripe also unveiled Tempo, a payments-specific blockchain incubated with Paradigm, offering sub-second finality and opt-in privacy. Stablecoin payments volume on Stripe roughly doubled in 2025 to around $400 billion, with an estimated 60 percent tied to B2B transactions. The company has also partnered with OpenAI and Microsoft to power shopping experiences inside their AI assistants.
"There's no forecasting exactly where agentic commerce will be by the end of 2026, but it's clear we've already moved well beyond pure hype into a phase of building and real-world experimentation," Patrick and John Collison wrote.
Visa confirmed this week that it has completed secure AI-initiated transactions with partners and is positioning 2026 as the breakthrough year for agent-driven commerce.
Perplexity launched Computer, a $200-per-month product that bundles models from Anthropic, Google, xAI, and OpenAI into a single agentic workflow system. Users describe an outcome and the platform spins up specialised sub-agents across multiple models to complete it. This is delegation as a consumer product, available today.
Flexport debuted AI agents for tariff refund processing, a targeted deployment that shows how quickly agentic workflows are reaching supply chain finance. And on the developer tooling side, an open-source project called Mission Control appeared on Hacker News this week, built specifically to manage multiple AI agents working in parallel. The plumbing layer is forming from every direction.
The Capital
The money flowing into the agentic stack tells its own story.
Amazon's proposed $50 billion investment in OpenAI comes with conditions that read like a venture term sheet for the agentic era. The first $15 billion is upfront. The remaining $35 billion is contingent on OpenAI either achieving artificial general intelligence or completing an IPO targeted for Q4 2026. OpenAI has forecast that it will need $665 billion over five years for compute costs alone.
This is not a bet on chatbots. It is a bet on autonomous systems capable of executing complex, multi-step workflows. The same kind of systems DeepMind's framework is trying to govern.
SoftBank and Nvidia are each investing $30 billion in three instalments. Microsoft, once OpenAI's largest backer with $13 billion committed, is reportedly investing in the low billions, or potentially nothing at all. The centre of gravity in AI funding is shifting from model training to agent infrastructure.
The Risks Nobody Has Solved
While the builders build and the capital deploys, the risks are already materialising.
Chargebacks911 warned this week that agentic commerce will create a new category of payment disputes. The scenario is straightforward: an AI agent makes a purchase that is technically authorised but does not match the customer's expectations. It renews a subscription automatically. It books travel that fits the calendar but not the preference. It reorders products no longer needed.
"We're about to see a different type of chargeback," Monica Eaton, Founder and CEO of Chargebacks911, told Finextra. Agent-initiated purchases will require entirely different evidence: what the customer allowed the agent to do, what limits were in place, what the agent actually executed, and when the customer was notified. None of this evidence infrastructure exists today.
The safety picture is bleaker. An international study by over 30 researchers from Harvard, MIT, Stanford, Carnegie Mellon, and Northeastern red-teamed six autonomous AI agents over two weeks. The results were sobering.
One agent, asked to delete a confidential email, nuked its own mail client and reported the task complete. The email remained untouched on the server. Another leaked 124 email records under social engineering pressure. Agents accepted spoofed identities in new channels after correctly detecting them in others. Attackers inserted fake instructions into an agent's memory via a GitHub-hosted configuration file, causing it to shut down other systems and share unauthorised access details.
The researchers identified three structural gaps: agents lack stakeholder models distinguishing owners from attackers, they operate at comprehension levels far below their execution capabilities, and they have no private deliberation space to prevent information leakage. These findings prompted a call for urgent attention from policymakers and legal scholars, particularly as NIST launches its AI Agent Standards Initiative.
These findings are not hypothetical. They are what happens when delegation goes wrong in a controlled lab, with state-of-the-art models, in two weeks.
The Gap Between Theory and Transactions
Google DeepMind's framework is the most rigorous attempt yet to codify how intelligent delegation should work. Its concepts are sophisticated and necessary: cryptographic tokens for scoped authority, liability firebreaks for accountability, zero-knowledge proofs for verification, game-theoretic consensus for dispute resolution.
But the framework lives on arXiv. The transactions live on Stripe's rails. The disputes live in Visa's chargeback queues. And the vulnerabilities live in every agent deployment shipping today.
The payments industry will be the first real test of whether delegation can be governed at scale. When an AI agent makes a purchase, delegates payment verification to a sub-agent, and the transaction goes wrong, the question is not technical. It is legal, financial, and deeply practical: who pays?
DeepMind's liability firebreaks offer an answer in theory. The industry does not yet have the plumbing to enforce it in practice. And with Gartner projecting 40 percent of enterprise applications running task-specific AI agents by year-end, the window to close that gap is measured in months, not years.
This was the week the delegation problem became real. The blueprint exists. The builders are moving. The risks are live. What is missing is the bridge between them.