
OpenAI and Anthropic shipped agent management platforms within hours of each other. The message is clear: stop talking to AI and start managing it.
On February 5, something quietly remarkable happened. Within hours of each other, OpenAI launched Frontier, an enterprise platform for building, deploying, and managing AI agents, and Anthropic released Claude Opus 4.6, its most capable model to date, built explicitly for multi-step autonomous work. One gave AI agents employee IDs. The other gave them longer attention spans and better judgement.
This was not a coincidence. And it was not a routine product update.
The two most important AI companies on the planet looked at the same data, talked to the same enterprise customers, and arrived at the same conclusion: the chatbot era is over. The agent era has begun. And the companies selling the picks and shovels just pivoted from chatbots to workforce management.
We have been tracking this shift at Major Matters for months. The SaaS selloff, the $650 billion AI capex arms race, the Moltbook breach. They are all threads in the same story. But this week, the story got a name.
The AI industry is no longer asking "can machines think?" It is asking "can machines work?" The answer, it turns out, is complicated.
The Pivot
OpenAI Frontier is not another wrapper around GPT. It is an enterprise operating system for AI workers. Every agent built on Frontier gets its own identity, customisable permissions, data access boundaries, and an audit trail. Managers can onboard agents through natural language, connect them to CRM systems, ticketing tools, and data warehouses, then monitor their performance through dashboards that track metrics like tickets processed and success rates.
The pitch is striking in its corporate familiarity. Frontier agents "build memories" of completed tasks and improve over time. They operate across local environments, enterprise cloud infrastructure, and OpenAI-hosted runtimes. HP, Oracle, State Farm, and Uber are already signed up. OpenAI CFO Sarah Friar told CNBC that enterprise customers now account for roughly 40 percent of OpenAI's business, with expectations to reach 50 percent.
The platform is also open. Frontier can manage agents built by other companies, not just OpenAI's own. That is a deliberate play for infrastructure dominance: become the operating system that manages everyone's agents, not just yours.
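OpenAI has not published a public schema for Frontier agent definitions, but the shape of the idea (an identity, scoped permissions, and an audit trail per agent) is easy to sketch. The following is illustrative only; every field and method name here is hypothetical, not OpenAI's API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an agent "employee record". Frontier's real
# schema is not public, so all names here are illustrative.
@dataclass
class AgentManifest:
    agent_id: str                      # the agent's "employee ID"
    role: str                          # human-readable job description
    allowed_tools: list[str]           # CRM, ticketing, warehouse connectors
    data_scopes: list[str]             # data the agent may read or write
    audit_log: list[str] = field(default_factory=list)

    def can_use(self, tool: str) -> bool:
        """Permission check, recorded to the audit trail either way."""
        allowed = tool in self.allowed_tools
        self.audit_log.append(f"{'ALLOW' if allowed else 'DENY'}: {tool}")
        return allowed

support_bot = AgentManifest(
    agent_id="agent-0042",
    role="Tier-1 support triage",
    allowed_tools=["zendesk.read", "zendesk.reply"],
    data_scopes=["tickets:read", "kb:read"],
)

print(support_bot.can_use("zendesk.reply"))   # True: within its mandate
print(support_bot.can_use("billing.refund"))  # False: outside its permissions
```

The point of the sketch is the dashboard claim above: once every action passes through a permission check that writes to a log, "managing" an agent starts to look like managing a workforce.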
Early results from Frontier customers paint an aggressive picture. A major manufacturer reportedly reduced production optimisation work from six weeks to one day. A global investment company deployed agents across its sales process and freed up over 90 percent of its salespeople's time for customer-facing work. Another tech customer saved 1,500 hours a month in product development. These numbers are self-reported and should be treated with caution, but they signal the scale of ambition enterprises are bringing to agent deployment.
Hours later, Anthropic dropped Claude Opus 4.6. Where Frontier focuses on the management layer, Opus 4.6 focuses on the worker itself. The model ships with a one million token context window, adaptive reasoning controls, and expanded safety tooling, all optimised for agentic coding and multi-step autonomous tasks. This is not a chatbot that answers questions. It is a system designed to take actions, chain decisions, and complete work across extended sessions.
Ars Technica captured the moment with precision: AI companies want you to stop chatting with bots and start managing them.
Two companies. Two launches. One message: AI is no longer a tool you use. It is a worker you manage.
The 24 Percent Problem
There is just one issue with the "AI coworker" pitch. The coworkers are not very good at their jobs. Not yet. And the data on just how bad they are is sobering enough that we think it deserves more attention than it is getting.
The APEX-Agents benchmark, a productivity index built from 480 real tasks designed by investment banking analysts, management consultants, and corporate lawyers, found that AI agents achieved a 24 percent success rate on knowledge-intensive work. The tasks spanned 33 detailed project environments with access to documents, spreadsheets, presentations, email, calendars, and code execution. When consistency was required (completing the same task successfully on all eight benchmark runs), the top-performing model managed just 13.4 percent.
A Carnegie Mellon University study, conducted in collaboration with Salesforce, reinforced the picture. Researchers built a simulated technology company staffed entirely by AI agents using models from OpenAI, Google, Anthropic, and Amazon. The agents were assigned roles (CTO, HR manager, engineer) and tested on their ability to work independently using internal chat, company handbooks, and websites. No agent completed more than 24 percent of its assigned tasks. They confused information, fabricated data, and made mistakes a human would easily avoid.
This is not a fringe finding. G2's Enterprise AI Report found that while 57 percent of companies now have AI agents in production, 32 percent cite quality as the single biggest barrier to scaling.
The gap between demo and deployment is wide enough that both OpenAI and Anthropic have quietly started doing something unexpected: becoming consultants. The Decoder reported that OpenAI has expanded its technical consulting division to roughly 60 specialised engineers plus over 200 in technical support, all working directly with enterprise clients to get agents functioning. Anthropic is doing the same. The products do not work out of the box. They require hand-holding.
The example that captures this best: French retailer Fnac tested customer support agents from both OpenAI and Google. The agents consistently confused serial numbers, a fundamental failure for a commerce application. The system only became functional after receiving assistance from AI21 Labs, a third-party provider.
Let that sit for a moment. Two of the largest AI companies on the planet built agents that could not reliably distinguish one product from another in a retail setting. And the fix did not come from either of them. It came from a smaller, specialised firm. If our readers take one thing from this section, it should be this: the "just plug in AI" narrative that dominated 2025 is colliding with operational reality in 2026.
AI agents succeed 24 percent of the time. The companies selling them have started sending engineers to make them work. That is the real state of the market.
When Your Agent Goes Shopping Without Permission
If the reliability numbers worry you, the payments implications should keep you up at night. This is where the agent revolution gets personal for our readers.
Financial institutions are racing to deploy AI agents capable of autonomously initiating transactions, approving payments, and freezing accounts in real time. But the infrastructure built to verify humans does not work for machines. Bank Info Security reported that banks now face a dual authentication crisis: they must verify not just the identity of an agent, but its intent.
Consider a simple scenario. You authorise an AI agent to buy concert tickets under $900. The agent, operating autonomously, finds premium seats for $25,000 and purchases them. It authenticated correctly. It had valid credentials. But it exceeded its mandate. Traditional fraud detection, built to spot stolen cards and unusual locations, has no framework for catching an agent that is technically authorised but operationally rogue.
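The gap between verifying identity and verifying intent is easy to state in code. A minimal sketch of a mandate check, where the structure and names are illustrative rather than any bank's actual API:

```python
from dataclasses import dataclass

# Illustrative sketch: a "mandate" captures what the human actually
# authorised, separately from whether the agent's credentials are valid.
@dataclass
class Mandate:
    category: str        # e.g. "concert_tickets"
    max_amount: float    # hard spending ceiling per transaction

def authorise(credentials_valid: bool, mandate: Mandate,
              category: str, amount: float) -> str:
    if not credentials_valid:
        return "DENY: identity check failed"
    if category != mandate.category:
        return "DENY: outside authorised category"
    if amount > mandate.max_amount:
        # The agent is who it says it is, but not doing what was asked.
        return "DENY: exceeds mandate"
    return "APPROVE"

mandate = Mandate(category="concert_tickets", max_amount=900.0)
print(authorise(True, mandate, "concert_tickets", 180.0))     # APPROVE
print(authorise(True, mandate, "concert_tickets", 25_000.0))  # DENY: exceeds mandate
```

In the scenario above, only the first check exists in today's fraud stack. The $25,000 purchase sails through identity verification; it is the mandate check, which almost no production system performs, that would have stopped it.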
The scale of the challenge is staggering. Non-human identities (API tokens, service accounts, and now AI agents) already outnumber human users on most enterprise networks by a ratio of roughly 100 to one. The Moltbook breach, which exposed 1.5 million API authentication tokens from a social network for AI agents, demonstrated what happens when agent infrastructure moves faster than agent security.
The card networks are responding. Mastercard has launched its Agent Suite with built-in security protocols for agentic commerce. Prove has introduced "Know Your Agent," a verification framework modelled on traditional Know Your Customer (KYC) processes but adapted for non-human actors. Payments Dive reported that agentic robots capable of shopping and paying without human involvement could eventually render traditional retail marketplaces obsolete.
We have written extensively about how the payments industry has spent decades building systems to answer one question: "Is this really you?" Now it needs to answer a fundamentally different one: "Is this agent doing what you actually wanted?"
The distinction matters more than it sounds. Identity verification confirms who is making a request. Intent verification confirms what was authorised. Every fraud model in production today is built for the former. Almost none are built for the latter. And agents are arriving faster than the models can adapt.
"Every Company Is an API Company Now"
If the authentication problem is about controlling what agents do, Sam Altman's latest prediction is about whether control is even possible.
In a statement reported by The Decoder, the OpenAI CEO declared that "every company is an API company now, whether they want to be or not." His argument: AI agents will write their own code to access any service they need, regardless of whether the company offers an official API. Companies cannot opt out. Agents will find a way in.
For payments companies, commerce platforms, and any business that controls access through rate limits, API keys, and developer agreements, this is a direct challenge to the moat. If agents can bypass the front door, the gatekeeping model that underpins billions in platform revenue starts to erode.
Think about what this means in practice. A payments processor that charges for API access is only valuable if integrating through that API is the easiest path. If an agent can reverse-engineer the same functionality by writing its own connector, the toll booth disappears. A commerce marketplace that controls the buyer-seller relationship loses leverage the moment an agent can find, compare, and purchase goods across the open web without ever visiting the platform. The entire value chain that sits between "customer wants something" and "customer gets it" is suddenly up for renegotiation.
The SaaS implications are equally stark. Public B2B stocks are already down 30 to 40 percent in recent weeks, with Microsoft losing $360 billion in market cap in a single day. SpaceX absorbed both xAI and Twitter into a $1.25 trillion private entity. Waymo is adding dozens of new cities. The tectonic plates of the technology industry are shifting simultaneously, and every shift traces back to the same force: autonomous agents that do not respect the boundaries the previous generation of software was built on.
Altman's framing accelerates the existential question facing every mid-tier software company: if an AI agent can do what your product does by writing its own integration, what exactly are customers paying for?
The winners in this scenario are the platforms with deep, irreplaceable system access. Salesforce, Microsoft, and the card networks sit on data and infrastructure that agents need but cannot easily replicate. The losers are the "thin layer" solutions, the niche tools that sit between a user and a system, doing work that an autonomous agent can now handle directly.
When the CEO of the company building the agents tells you that your API strategy is irrelevant, it is worth listening.
What to Watch
The agent era is arriving faster than the governance frameworks needed to contain it. Gartner projects a leap from under five percent of applications embedding agent capabilities in 2025 to 40 percent in 2026. Yet only one in five companies has a mature governance model for AI agents. Security remains the top concern: 62 percent of practitioners and 53 percent of leadership identified it as their primary challenge.
Three things will define the next 12 months.
First, the consultancy trap. If OpenAI and Anthropic need to send engineers to every major customer to make agents work, the unit economics of AI-as-a-service start to look a lot like the unit economics of traditional consulting. Scaling becomes the bottleneck, not the selling point.
Second, the authentication arms race. As agents proliferate across payments, commerce, and financial services, the companies that solve agent identity verification ("Know Your Agent" checks and intent-based authentication) will own a critical piece of the infrastructure stack. This is a race worth watching.
Third, the platform war. OpenAI Frontier is explicitly pitched as the operating system for AI workers. But Microsoft has Agent 365, Salesforce has Agentforce, and Google has Gemini Enterprise. The competition also extends to vertical players: Mastercard and Visa are building agent infrastructure for payments, while Glean is targeting enterprise knowledge work. The question is not whether enterprises will adopt agent platforms. It is which platform becomes the default. The winner inherits the enterprise relationship for the next decade.
There is also a darker question we should not ignore. Nearly 95 percent of IT leaders report integration as a hurdle to effective AI implementation. Weak observability and immature guardrails are the most common pain points in production. Enterprises cannot scale agents without trust, and trust comes from visibility into what agents are actually doing. Right now, that visibility is alarmingly limited.
The gap between ambition and execution has never been wider. AI agents that can approve payments, freeze accounts, negotiate contracts, and initiate transactions are being deployed into systems that were never designed to accommodate them. The technology is moving. The guardrails are not.
We will be tracking all three of these threads closely. The companies that navigate this transition, that find the balance between agent capability and agent governance, will define the next era of enterprise technology. The companies that do not will become case studies in what happens when you deploy a workforce you cannot control.
If your AI agent can approve payments, freeze accounts, and negotiate contracts, who exactly is the employee?