Google now distinguishes between Googlebot, which crawls your site on a schedule, and Google-Agent, which acts on behalf of users in real time. Google-Agent ignores robots.txt entirely. For merchants, the access control question just changed.
On March 20, Google quietly added a new entry to its official list of user-triggered fetchers: Google-Agent. If you weren't paying attention to crawler documentation that week, you missed it. Most people did. But this is one of the most consequential changes to how the web works in years.
Here is what happened. Google split its web access into two distinct entities. Googlebot is the crawler you know. It visits your site on a schedule, indexes your pages, and obeys your robots.txt file. It has done this for decades. Google-Agent is something different. It acts on behalf of a real user, in real time, to complete tasks. It serves Gemini, NotebookLM, AdSense, Google Shopping, and a growing list of AI agent products.
Both run on the same crawling infrastructure. Same pipes. Different purposes.
And here is the part that matters: Google-Agent does not follow robots.txt.
The 30-year-old access control framework that merchants have relied on to manage who crawls their sites does not apply to the entity Google built to act inside them.
What Google Actually Did
The distinction is technical but the implications are commercial.
Googlebot is a scheduled crawler. It visits your site autonomously, reads your pages, and feeds that information into Google's search index. It respects robots.txt because it is a crawler. That is the deal. You tell it which pages to index, it listens. This relationship is the foundation of how search has worked since the mid-1990s.
Google-Agent is a user-triggered fetcher. When someone uses Gemini to research a product, or Google Shopping's AI finds a deal, or an agent browses your catalogue on a user's behalf, that is Google-Agent. It has its own user agent string and a dedicated IP range file. It is a formally distinct identity.
Because Google-Agent is "user-triggered," Google classifies it the same way it classifies a browser. And browsers don't follow robots.txt. They never have. Robots.txt was designed to tell automated crawlers where they can and cannot go. When a user clicks a link in Chrome, the browser never consults your robots.txt. Google-Agent now has the same status.
This is not a loophole. It is a design decision. Google is saying: when our AI acts for a user, it is the user. Not a crawler. Not a bot. A user.
The distinction feels reasonable until you think about what it means in practice.
The robots.txt Illusion
If you are a merchant who added AI crawler blocks to your robots.txt file in the past two years, you probably felt you had some control over how AI systems accessed your content. Many businesses did exactly that after OpenAI's GPTBot, Anthropic's ClaudeBot, and other AI crawlers started hitting sites at scale. Block the user agent, protect your content. Simple.
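Those blocks typically look like this. GPTBot and ClaudeBot are the documented crawler tokens; the wildcard rule at the end is an illustrative default, not something every merchant uses:

```txt
# Block known AI training crawlers -- honoured only by bots that choose to comply
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Search crawlers keep full access for indexing
User-agent: Googlebot
Allow: /
```

The critical property of every line above: it is a request, not a barrier. Compliance is entirely the fetcher's choice.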
Except Google-Agent bypasses all of it. Not because it is breaking the rules. Because Google defined it as outside the scope of those rules.
This matters enormously for merchants. Your product catalogue, your pricing data, your inventory levels, your checkout flow: Google-Agent can access all of it. And unlike Googlebot, which reads your pages and goes away, Google-Agent can interact with them. It can add items to a cart. It can initiate a purchase. It can negotiate on behalf of a user.
Robots.txt was never designed for this. It was designed to tell a librarian which shelves to catalogue. Google-Agent is not a librarian. It is a customer walking through your door.
The only ways to control Google-Agent access are the same ways you'd control any user: authentication, server-side permissions, or simply not exposing data publicly. If your product feed is open to the web, it is open to Google-Agent. Full stop.
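In practice that means the control has to live server-side. A minimal sketch of gating a product feed behind a shared token rather than robots.txt; the header name and token are illustrative placeholders, not any platform's API:

```python
# Minimal sketch: gate a product feed behind a shared token instead of robots.txt.
# The header name and token value are illustrative placeholders.
FEED_TOKEN = "partner-secret-token"  # in practice, keep secrets out of source code

def allow_feed_access(headers: dict) -> bool:
    """Server-side access decision for a product feed endpoint.

    robots.txt cannot express this: Google-Agent is classified as a user,
    so the only reliable controls are authentication, permissions, or not
    exposing the data at all.
    """
    return headers.get("X-Feed-Token") == FEED_TOKEN
```

Any request without credentials is denied the same way, whether it comes from a crawler, an agent, or a browser. That is the point: once agents are users, access control has to treat everyone as one.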
As DataDome's research puts it, robots.txt enforcement has always been voluntary, and many AI agents ignore it entirely. Google just made the quiet part loud. They aren't even pretending their agent will follow it.
The Emerging Access Control Stack
Google did not create this problem in a vacuum. The entire industry is scrambling to figure out what replaces robots.txt in an agentic world. Multiple frameworks are emerging simultaneously, and none of them are dominant yet.
Mastercard's developer guides now encourage businesses to configure agents.txt files for AI shopping agents. The community is converging on /.well-known/agents as a universal discovery endpoint, a standard location where AI agents can find out what a site allows and what it doesn't.
There is also the proposed ai.txt format, which offers granular control that robots.txt never had. Allow summarisation but block image extraction. Allow search indexing but block training data. Allow browsing but block purchasing. These are the kinds of distinctions merchants actually need, and robots.txt was never built to express them.
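As a sketch of what that granularity could look like, here is a hypothetical ai.txt. The directive names are invented for illustration; the proposal is not yet standardised and real syntax may differ:

```txt
# Hypothetical ai.txt -- directive names are illustrative, not a ratified spec
User-agent: *
Allow: summarisation
Allow: search-indexing
Disallow: image-extraction
Disallow: training
Disallow: purchasing
```

Note what this expresses that robots.txt cannot: the same page can be open to one use and closed to another.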
Then there is WebMCP, released in February 2026. It lets websites declare what AI agents can do on them. Think of it as a menu for machines: here are the actions available, here are the constraints, here is how to authenticate. Sites that implement WebMCP tell agents what they can do. Sites without it may simply not appear when users ask an AI agent to "buy this" or "book that."
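A site's declaration might look something like the JSON sketch below. This is purely illustrative; the actual WebMCP mechanism and its schema may differ, and the action names and fields here are invented:

```json
{
  "actions": [
    {
      "name": "add_to_cart",
      "description": "Add a catalogue item to the user's cart",
      "constraints": { "max_quantity": 10 },
      "auth": "session"
    },
    {
      "name": "checkout",
      "description": "Complete a purchase",
      "constraints": { "human_confirmation": true },
      "auth": "verified_user"
    }
  ]
}
```

The shape matters more than the syntax: actions, constraints, and authentication declared up front, so an agent knows the rules before it acts.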
Cloudflare has responded with managed robots.txt tools for AI content control, giving site operators a dashboard to manage which AI crawlers get access. Akamai published its "Bot Management for the Agentic Era" framework. Known Agents (formerly Dark Visitors) tracks AI agent user agents, creating a living registry of who is crawling the web and why.
None of these are standards yet. All of them are attempts to fill the vacuum that Google just exposed.
The web is moving from a single access control question ("can you crawl this?") to a layered one: who are you, who sent you, what do you want to do, and are you allowed to do it?
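In code, that layered question might reduce to something like this. A sketch only; the fields and the policy table are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class AgentRequest:
    identity: str      # who are you? (claimed agent token)
    verified: bool     # did IP or signature verification confirm the claim?
    on_behalf_of: str  # who sent you? (an authenticated end user, or "unknown")
    action: str        # what do you want to do? (browse, extract, purchase)

# Illustrative policy table: which actions each verified identity may perform.
POLICY = {
    "Googlebot": {"browse"},
    "Google-Agent": {"browse", "purchase"},
}

def decide(req: AgentRequest) -> bool:
    """Layered access decision: identity, verification, delegation, action."""
    if not req.verified:
        return False  # an unverified claim counts for nothing
    if req.action == "purchase" and req.on_behalf_of == "unknown":
        return False  # no anonymous agent purchases
    return req.action in POLICY.get(req.identity, set())
```

Four questions, four checks. Robots.txt answered only the first, and only on the honour system.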
The Commerce Implications
For merchants, this is not an abstract web standards debate. It is an operational question with revenue attached.
Shopify already requires that "buy-for-me" agents include human verification steps and use Shopify's built-in checkout. That is a clear line: agents can browse, but purchasing requires guardrails. It is also an implicit acknowledgement that agents will be buying, and the platform needs to control how.
Amazon started hiring "agentic commerce" specialists earlier this year, and CEO Andy Jassy has publicly acknowledged that AI agents will "permeate daily life." When Amazon staffs up for something, it tends to arrive fast.
The payments layer is the crux of it. We have covered the trust gap in agentic commerce extensively, and the missing dispute resolution layer that sits underneath it. Google-Agent makes both problems more urgent. If an AI agent that bypassed your robots.txt accesses your site, interacts with your checkout, and completes a purchase on behalf of a user, who is liable when something goes wrong? The user who triggered it? Google, whose agent executed it? The merchant who had no way to block it?
This connects directly to what we explored in our analysis of agentic payments: the infrastructure for agents to pay is being built, but the infrastructure for agents to be held accountable is not.
The agentic commerce stack is assembling itself in real time. Discovery protocols, trust frameworks, payment rails, and now access control. Each piece matters. But access control may matter most, because it determines who gets through the door in the first place.
The Security Problem Nobody Wants to Talk About
Google-Agent is not the only entity claiming to act on behalf of users. It is just the most legitimate one.
Attackers are already impersonating AI crawlers to bypass security. DataDome's research documented agents spoofing user agent strings for ChatGPT-User, MistralAI-User, and Perplexity-User to slip past bot management systems. If a site whitelists known AI agents, a spoofed user agent string is a free pass.
Google-Agent's dedicated IP range file helps here. You can verify that traffic claiming to be Google-Agent actually comes from Google's infrastructure. But that only solves the Google problem. The broader AI agent landscape has no equivalent verification system. As we covered in our piece on NVIDIA's agent security gap, the security layer for agentic commerce is dangerously thin.
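The verification itself is straightforward once the ranges are published. A sketch using Python's standard library; the CIDR blocks below are reserved documentation ranges standing in for the prefixes in Google's real published file, which a production system would fetch and cache:

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), standing in for the
# prefixes in Google's published Google-Agent IP range file. In production you
# would download that file periodically and parse the prefixes from it.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_verified_agent_ip(remote_ip: str) -> bool:
    """Return True only if the request's source IP falls inside a published range.

    A spoofed "Google-Agent" user-agent string fails this check, because the
    attacker's traffic does not originate from the published infrastructure.
    """
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in PUBLISHED_RANGES)
```

This is the check that user-agent whitelisting skips, and skipping it is exactly the free pass the spoofers are exploiting.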
AI agents can also be manipulated via prompt injection, exposing content, privacy, and site security risks that robots.txt was never designed to address. An agent that is supposed to buy a flight could be tricked into exposing a user's payment credentials. An agent browsing a product catalogue could be redirected to extract pricing data for a competitor.
This is the tension at the heart of the Google-Agent decision. When you classify an AI agent as a "user," you give it user-level access. But users have intent. Users have context. Users have judgment. AI agents have instructions. Those are different things, and the security models for them should be different too.
Google's move also echoes the infrastructure-level access decisions we examined in our FTC debanking analysis. When a platform decides who gets access and on what terms, without the affected parties having meaningful input, the consequences ripple through every layer of commerce that sits on top.
What Comes Next
Robots.txt is 30 years old. It was a gentleman's agreement between website owners and search crawlers, and it worked because both sides had reasons to honour it. Crawlers that ignored robots.txt got blocked at the IP level. Site owners who were too restrictive disappeared from search results. Mutual self-interest kept the system functional.
That equilibrium is gone. The web now has entities that are neither crawlers nor users but something in between. Google chose to put its agent on the "user" side of that line. Others will make different choices. The result will be a fragmented access control landscape where merchants need to manage multiple frameworks, verify multiple agent identities, and make nuanced decisions about what different agents can do.
The merchants who figure this out first will have an advantage. Not just in controlling costs and protecting content, but in shaping how agentic commerce develops. If you implement WebMCP, your site is visible to AI agents for transactional queries. If you configure agents.txt with clear permissions, legitimate agents can interact confidently and bad actors are easier to identify. If you do nothing, you are leaving it to every AI agent to decide for itself what it can do on your site.
Google's split between Googlebot and Google-Agent is not the end of this story. It is the starting gun. Every major AI lab, every search engine, every commerce platform will need to make the same distinction. And every merchant will need to decide: do we treat these agents as crawlers to be managed, customers to be served, or something new that requires its own set of rules?
The answer, almost certainly, is the third option. We just don't have the rules yet.
Your robots.txt was built to manage librarians. Google just sent a customer through your door with no ID and full purchasing authority. How are you going to decide who gets in?