The infrastructure layer that will decide what AI knows and who gets paid for it.

The free-for-all era of AI companies scraping the web is ending. The question now is who becomes the intermediary, and what publishers actually get paid.

On February 9, Amazon began circulating slides to publishing executives ahead of an AWS conference, outlining plans for a content marketplace where publishers can sell their work directly to companies building AI products. According to The Information, the marketplace would sit alongside Amazon Bedrock and QuickSight in the AWS product portfolio, positioning content licensing as core AI infrastructure rather than an afterthought. This move signals Amazon's recognition that sustainable AI models require transparent, licensed content relationships rather than continued reliance on web scraping.

Six days earlier, on February 3, Microsoft launched the Publisher Content Marketplace, a platform doing essentially the same thing from the other side of the cloud wars.

Two of the three major cloud providers are now building the same product simultaneously. That is not a coincidence. It is a signal that content licensing is rapidly transitioning from a legal afterthought to a core infrastructure problem, one that shapes who controls access to quality training data and how AI developers manage their computational costs.

The content licensing problem is about to get an infrastructure layer. Whoever builds it controls the economics of what AI knows.

What Amazon Is Building

Details remain thin, but the slides paint a clear picture. Amazon wants to create a marketplace where publishers set their own terms, AI companies buy access to licensed content, and AWS sits in the middle collecting a fee on each transaction. An Amazon spokesperson told Reuters the company had "nothing specific to share," while adding that it has "built long-lasting relationships with publishers."

The strategic logic is straightforward. Amazon has committed $200 billion in capital expenditure for 2026, with a significant portion going to AI and cloud services. Anthropic, Amazon's primary AI partner, is expected to spend roughly $7 billion on inference and $12 billion on training in 2026. Models need data. Licensed data is better than scraped data, both legally and in quality. A marketplace ensures supply while creating a new revenue stream that ties publishers deeper into the AWS ecosystem.

This is Amazon doing what Amazon does: building the marketplace, taking a position in the transaction, and letting everyone else set the prices. If the slides hold, the economics favor the platform operator: Amazon would take a fee on each licensed transaction while carrying little risk, because publishers and AI companies negotiate pricing between themselves.

Microsoft Got There First

Microsoft's Publisher Content Marketplace launched on February 3, co-designed with The Associated Press, Condé Nast, Business Insider, Hearst Magazines, Vox Media, and USA TODAY. The model is usage-based: publishers define licensing terms, AI builders access content for specific grounding scenarios, and reporting shows publishers exactly how their content was used and where it added value.

Microsoft Copilot is the first AI buyer on the platform. Yahoo is among the first demand-side partners being onboarded.

The timing is not accidental. Microsoft faces active copyright lawsuits from The New York Times, The Intercept, and other publishers over training data. A licensing marketplace does not make those lawsuits disappear, but it establishes a forward-looking framework that says: we are willing to pay, and here is the mechanism. This represents a critical strategic shift in how major cloud providers view content relationships, moving from confrontational litigation defense to proactive licensing infrastructure.

Microsoft's marketplace is more than a way to manage legal exposure. It re-architects how AI models access information. By making content provenance and licensing explicit, it gives publishers visibility into how their work contributes to model quality, so they can negotiate accordingly. Content shifts from a freely scraped commodity to a strategic input with measurable value.

Meanwhile, the Really Simple Licensing (RSL) open standard is gaining traction among publishers. RSL embeds licensing terms directly into websites, specifying payment requirements for any bot scraping content. It is the robots.txt of the AI era, and it is designed to work with or without a platform marketplace.

Why This Matters for Payments and Commerce

This is where the story connects to the broader shift we have been tracking at Major Matters.

Content licensing for AI is not a one-time transaction. It is a recurring, usage-based revenue stream. Every time a model grounds a response in licensed content, there is a metering event, a calculation, and a payment. At scale, this is a B2B payments infrastructure problem. Someone has to track the usage, calculate the fees, and move the money. The winners in this infrastructure layer will capture more value than the marketplace operators themselves, which is why both Amazon and Microsoft are positioning themselves as the underlying platforms rather than just intermediaries.
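The metering, calculation, and payment loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how either marketplace works: the `MeteringEvent` record, the `settle` function, the per-use fees, and the 10% platform rate are all hypothetical, since neither Amazon nor Microsoft has published fee schedules.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MeteringEvent:
    """One hypothetical usage record: a buyer grounded responses in a publisher's content."""
    publisher: str
    buyer: str
    units: int  # number of grounded responses covered by this event


def settle(events, fee_per_unit, platform_rate):
    """Aggregate usage per publisher and split it into payouts and a platform fee.

    fee_per_unit maps publisher -> Decimal price per grounded response (assumed).
    platform_rate is the intermediary's share of each gross amount (assumed).
    """
    gross = defaultdict(Decimal)  # Decimal() starts each publisher at 0
    for event in events:
        gross[event.publisher] += fee_per_unit[event.publisher] * event.units

    payouts = {}
    platform_total = Decimal("0")
    for publisher, amount in gross.items():
        cut = amount * platform_rate
        payouts[publisher] = amount - cut
        platform_total += cut
    return payouts, platform_total


# Illustrative numbers only.
events = [
    MeteringEvent("publisher_a", "buyer_x", 1000),
    MeteringEvent("publisher_b", "buyer_x", 200),
    MeteringEvent("publisher_a", "buyer_y", 500),
]
fees = {"publisher_a": Decimal("0.002"), "publisher_b": Decimal("0.05")}
payouts, platform_fee = settle(events, fees, Decimal("0.10"))
```

Using `Decimal` rather than floats keeps fractions of a cent exact, which matters when millions of tiny metering events are summed into a single payout.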

For publishers who have watched their Google traffic decline by a third in the past year, these marketplaces offer something search never did: a direct commercial relationship with the AI companies consuming their work. No middleman ad network. No hope that someone clicks through. Just a licence fee for content that has measurable value. Early signals suggest licensing fees could range from fractions of a cent to dollars per usage, depending on content quality and licensing exclusivity.

The shift from ad-supported digital publishing to licensing-based AI revenue fundamentally changes how we value information. Publishers no longer compete on traffic and audience size, but on content precision and trustworthiness. For the first time, investigative journalism, fact-checked reporting, and domain expertise have explicit economic value in the AI supply chain.

Smaller startups like ProRata.ai and TollBit have been building similar marketplaces, but they lack the inventory and distribution to compete with AWS and Azure. This is a scale game, and the cloud giants are best positioned to win it. The question is whether publishers end up with a genuine market or a duopoly that sets the terms for them. If Amazon and Microsoft dominate these platforms, they will effectively control what information their models consider trustworthy and how publishers are compensated for that designation. That is a significant concentration of power over the information economy.

What to Watch

The obvious missing player is Google. As both the largest search engine and a major AI model builder through DeepMind and Gemini, Google has the most complicated relationship with publisher content. It uses that content to power AI Overviews, which are reducing clicks to publisher sites, while simultaneously depending on publishers to keep creating the content that makes those overviews useful. Whether Google builds its own marketplace, joins one of the existing ones, or tries to avoid the issue entirely will tell us a lot about where the power sits, particularly because the company has historically benefited from free access to content and may resist licensing frameworks that erode that advantage.

The pricing models will also matter enormously. If licensing fees are low enough that publishers see pennies per query, the marketplace becomes a fig leaf, offering the appearance of compensation without real economic impact. If they are high enough to genuinely compensate for lost traffic revenue, the cost of running AI models goes up, which has implications for everyone from OpenAI to the smallest startup fine-tuning on Bedrock. We should watch closely for what licensing fees actually look like in the first few months of these platforms.
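The fig-leaf threshold can be put as simple arithmetic: the fee per use a publisher would need in order to replace a given amount of lost traffic revenue. The sketch below is only that arithmetic. The function name `break_even_fee` and both input figures are hypothetical placeholders, not reported numbers.

```python
def break_even_fee(lost_ad_revenue: float, licensed_uses: int) -> float:
    """Fee per licensed use needed to fully replace lost traffic revenue.

    Both inputs are assumptions supplied by the caller and must cover
    the same time period (for example, one month).
    """
    if licensed_uses <= 0:
        raise ValueError("licensed_uses must be positive")
    return lost_ad_revenue / licensed_uses


# Hypothetical publisher: $50,000 of monthly ad revenue lost,
# 2,000,000 grounded responses per month drawing on its content.
fee = break_even_fee(lost_ad_revenue=50_000.0, licensed_uses=2_000_000)
```

Under those assumed inputs the break-even fee is 2.5 cents per use. A marketplace that clears well below a publisher's own break-even figure is the fig leaf; one that clears at or above it pushes that cost onto whoever is running the model.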

And there is a deeper question lurking underneath: does this create a two-tier AI? Premium models grounded in licensed, high-quality content from established publishers versus budget models trained on whatever is freely available. If the best information is locked behind licensing agreements, the quality gap between AI products could widen significantly. This divergence could reshape competitive dynamics in the AI market, where content quality becomes a more explicit moat than compute power alone.

If the value of AI increasingly depends on which content it can legally access, does the content marketplace become more valuable than the AI itself?