AI Is Costing Enterprise More Than the Employees It Replaced

Written by Rhys Donnelly

Published May 23, 2026

Updated Jun 17, 2026

The Bill Arrived

The promise of enterprise AI in 2024 was straightforward: replace expensive human labor with cheap tokens, improve productivity, reduce headcount. The pitch was clean enough that hundreds of organizations either ran pilots or fully deployed AI coding tools, customer service agents, and workflow automation across every function that looked automatable. The productivity gains were real in many cases. The cost projections were not.

Fortune’s headline from May 22 lands hard: “Microsoft reports are exposing AI’s real cost problem: Using the tech is more expensive than paying human employees.” This isn’t a contrarian take or a tech pessimism piece. It’s a summary of what the internal reporting at Microsoft — one of the largest enterprise AI deployments in the world — is showing to the people responsible for managing the budgets. The AI tools are being used. They are not cheap. And in multiple documented cases, the cost of running the tools has exceeded the cost of the human labor they were positioned to replace or augment.

Microsoft is canceling most of its direct Claude Code licenses and moving engineers back toward GitHub Copilot CLI. Uber burned through its entire 2026 AI coding tools budget in four months, having actively encouraged adoption through internal leaderboards that ranked teams by AI tool usage. These are not isolated edge cases. They are the leading indicators of a broader reckoning with the actual economics of AI deployment at scale.

The Tokenmaxxing Problem

The term “tokenmaxxing” has emerged from internal discussions at tech companies to describe the behavior pattern that makes the cost problem structural rather than marginal. When employees are incentivized to use AI tools — through leaderboards, efficiency mandates, or management pressure to demonstrate AI adoption — they maximize AI usage rather than maximizing productive output. Token consumption increases faster than output quality. The AI is being used because using the AI is the measurable behavior, not because each specific use of the AI produces proportional value.

Uber’s leaderboard system created exactly this dynamic. Teams that ranked high on AI tool usage were visibly “doing AI.” Teams that used AI more selectively but produced better outcomes were less visible in the metric that management was tracking. The rational response to being evaluated on a usage metric rather than an outcome metric is to maximize usage, regardless of the marginal value of each additional AI interaction. Four months into the year, the budget was gone.

The tokenmaxxing phenomenon is not unique to Uber. It is the predictable outcome of any enterprise rollout that measures adoption rather than value. The AI vendor’s incentive is to report high adoption numbers — more tokens consumed means more revenue. The internal champion’s incentive is to demonstrate that the AI initiative they sponsored is being used. The individual employee’s incentive is to use the tool that they’ve been told to use. Everyone in the chain has a reason to maximize token consumption, and nobody in the chain is directly responsible for whether the token consumption produced proportional business value.

Agentic AI Makes This Worse by Orders of Magnitude

The cost problem with standard AI coding assistants — chatbot-style interfaces where a developer asks a question and receives an answer — is manageable if usage discipline exists. The cost problem with agentic AI is structurally different. Tom’s Hardware reports that agentic AI consumes up to 1,000 times more tokens than standard AI for equivalent tasks. Goldman Sachs forecasts that agentic AI will drive a 24-fold increase in token consumption by 2030 as enterprises adopt AI agents, reaching 120 quadrillion tokens per month.

An agentic system that executes a multi-step task — researching, drafting, reviewing, revising, and submitting a document, for instance — consumes tokens at every step, including the reasoning steps between actions. The model thinks out loud in tokens. It reads tool outputs in tokens. It writes intermediate plans in tokens. A task that a human completes in forty-five minutes might generate tens of thousands of tokens of intermediate reasoning and output that never reaches the end user, but all of which is billed by the model provider.
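The gap between billed tokens and delivered tokens can be made concrete with a rough sketch. Everything below is an assumption for illustration: the step names, the token counts, and the flat price per million tokens are invented, not figures from the reporting cited in this piece.

```python
from dataclasses import dataclass

# Hypothetical flat price; real providers usually price input and output tokens separately.
PRICE_PER_MILLION_TOKENS_USD = 10.0


@dataclass
class Step:
    name: str
    tokens: int         # tokens billed for this step (illustrative numbers)
    reaches_user: bool  # True only if this step's output is what the end user sees


# Illustrative trace of a single agent run on a document task.
TRACE = [
    Step("plan", 3_000, False),
    Step("research (tool output read back)", 20_000, False),
    Step("draft", 6_000, False),
    Step("review reasoning", 8_000, False),
    Step("revise", 6_000, False),
    Step("final document", 2_000, True),
]


def summarize(trace, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Total the billed tokens and the share that never reaches the end user."""
    billed = sum(step.tokens for step in trace)
    delivered = sum(step.tokens for step in trace if step.reaches_user)
    return {
        "billed_tokens": billed,
        "delivered_tokens": delivered,
        "hidden_share": (billed - delivered) / billed,
        "cost_usd": billed * price_per_million / 1_000_000,
    }


if __name__ == "__main__":
    print(summarize(TRACE))
```

In this made-up trace, the end user sees 2,000 tokens of a 45,000-token bill. The exact ratio will vary by tool and task; the point is only that the billed quantity and the visible quantity are different numbers.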

For tasks where the agent completes the work successfully and the cost is less than the human equivalent, this is fine. For tasks where the agent fails, retries, or produces output that requires significant human correction, you have paid for the token consumption of a failed attempt and still need the human labor to finish the job. The failure cost is tokens plus human time, which is strictly worse than human time alone.
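A minimal sketch of that failure arithmetic, assuming a single agent attempt with a human fallback and no retries (retries would add further token cost). The function names, the optional review cost, and the dollar figures in the usage note are hypothetical.

```python
def expected_agent_cost(token_cost, human_cost, p_success, review_cost=0.0):
    """Expected cost of trying the agent first and falling back to a human on failure.

    token_cost and review_cost are paid on every attempt.
    human_cost is paid only when the agent fails, with probability (1 - p_success).
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return token_cost + review_cost + (1.0 - p_success) * human_cost


def break_even_success_rate(token_cost, human_cost, review_cost=0.0):
    """Success rate at which agent-first costs the same as doing the task by hand."""
    if human_cost <= 0:
        raise ValueError("human_cost must be positive")
    return (token_cost + review_cost) / human_cost


if __name__ == "__main__":
    # Illustrative figures only: $15 of tokens against $60 of human time.
    print(expected_agent_cost(15.0, 60.0, p_success=0.5))  # below 60: agent-first wins
    print(expected_agent_cost(15.0, 60.0, p_success=0.2))  # above 60: agent-first loses
    print(break_even_success_rate(15.0, 60.0))
```

With these illustrative numbers the agent has to succeed more than a quarter of the time before trying it first is cheaper than going straight to the human. Adding any review cost pushes that threshold higher.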

Nvidia’s Bryan Catanzaro, speaking internally, said: “For my team, the cost of compute is far beyond the costs of the employees.” He was speaking about ML research, where compute costs are exceptionally high. But the direction of the ratio is the same across enterprise functions as agentic AI usage scales: compute costs grow faster than the productivity gains that justify them. That holds until an organization reaches a deployment scale at which the gains are large enough, or the token costs low enough, for the economics to invert.

Microsoft’s Specific Situation

Microsoft’s cancellation of most direct Claude Code licenses — moving engineers to GitHub Copilot CLI instead — is simultaneously a cost management decision and a strategic one. Copilot is Microsoft’s own product, powered by OpenAI models under the Microsoft-OpenAI partnership agreement. Claude Code is Anthropic’s product. When Microsoft licenses Claude Code for its engineers, it pays Anthropic for the tokens. When Microsoft uses GitHub Copilot CLI, the economics are internal — the compute costs are real but the payment structure is different.

The engineers who had been using Claude Code were not using it incorrectly. They were using it the way the product is designed to be used: as a coding assistant that could handle complex, multi-step engineering tasks. The problem was that Claude Code’s power as an agentic coding tool meant high token consumption per session, and at the scale of thousands of Microsoft engineers using it, the cumulative cost exceeded what Microsoft had budgeted for external AI tool licenses.

This is a case where the product worked as designed and the economics didn’t work at scale. That’s a different problem from the product being bad. It’s a problem with how enterprise AI tools are priced relative to the value they produce when deployed across large engineering organizations. Anthropic and other model providers will need to develop enterprise pricing structures that decouple cost from token volume for organizations that have both high usage and usage discipline, where the high consumption is producing proportional value but the bill is still unacceptable relative to the benchmark of human labor cost.

What the Reckoning Produces

The cost reckoning doesn’t mean AI tools don’t work or don’t produce value. It means the ROI calculation that enterprise buyers made in 2024 was based on token costs and productivity assumptions that didn’t survive contact with production deployment at scale. The revised calculation has to acknowledge four things: AI tools produce uneven value across task types; token costs at agentic scale are substantially higher than chat-mode costs; adoption incentives that measure usage rather than outcomes generate wasteful token consumption; and the comparison to human labor cost must include the human labor still required to manage, review, and correct AI output.

For AI model providers, the reckoning means pricing pressure. Enterprise customers who discovered their AI budgets were wrong are negotiating harder on renewal. They’re asking for usage-based caps, volume discounts that reflect enterprise deployment economics, and SLAs that tie costs to outcomes rather than token consumption. These are normal commercial pressures that the vendor market was always going to face as the enterprise AI market matured. The Fortune headline and the Microsoft and Uber examples mark the moment that maturity begins to arrive.

For enterprises, the reckoning means adoption will slow from “deploy everywhere and measure usage” to “deploy where the economics work and measure outcomes.” That’s a more sustainable approach. It’s also a less exciting narrative for AI vendors who were reporting adoption curves that looked like hockey sticks. The hockey stick was partly real productivity and partly tokenmaxxing. Separating them is the work the enterprise AI market is now doing.

The bill arrived. Reading it carefully is how the market figures out what it actually bought.

The Perceptual Gap Between What AI Was Sold As and What It Actually Bills

The token bill arrived, and it turned out to be larger than the productivity gain. This should not surprise anyone who has thought carefully about how organizations adopt new technologies, and yet it has surprised nearly every enterprise that adopted AI tooling at scale in 2023-2024.

The surprise is not an economic failure. It is a perceptual failure. The sales process for AI coding tools, and for enterprise AI more broadly, was conducted in the register of capability: what the tool can do, which tasks it handles, how many hours it saves. The billing cycle operates in a different register entirely: what the tool consumed, how many tokens were processed, what the compute actually cost per interaction. The two registers are not connected by any transparent conversion factor the buyer can evaluate before purchase. The gap between them is where the cost overrun lives.

This is structurally identical to how subscription software has always been sold versus how it has always been used. The vendor demos the maximum-use case; the buyer budgets for the average-use case; the actual-use case, once employees discover the tool is useful and reach for it constantly, lands somewhere between the two and produces a bill that matches neither. The difference with AI tooling is that the scaling factor is not seats but interactions — and interactions are harder to predict because they are driven by use-case discovery, not headcount.

The term tokenmaxxing (employees maximizing their use of the token budget whether or not each use is cost-justified) is the correct description of what happens once the tool is available and the cost is invisible to the user. The fix is to make that cost visible to the people generating it. The AI capex bet the large platforms made assumed the productivity gains would cover the compute cost; the tokenmaxxing data is the early evidence on whether that assumption holds at the enterprise level.

The Token Bill Exposes a Mismatch in How Enterprises Sold AI Internally

Rory Sutherland’s behavioral economics lens centers on the observation that value is subjective and that the problem is often not what it appears to be. The enterprise AI cost overrun is not primarily an economics problem. It is a framing problem that became an economics problem.

Enterprise AI was sold internally as a headcount alternative. The ROI spreadsheet compared the tool cost to the salary being replaced. The tool looked cheap in that comparison. What the spreadsheet did not model is that agentic AI tools don’t have a fixed consumption cost — they have a variable token consumption that scales with usage in ways that don’t map to the headcount math. A developer who would have spent three hours on a problem now runs twenty agent loops to solve it in thirty minutes. The output is better. The token bill for those twenty loops was not in the procurement forecast.
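The three-hours-versus-twenty-loops comparison can be worked through as a rough sketch. The hourly rate, tokens per loop, and token price below are hypothetical placeholders chosen for easy arithmetic; only the three hours, thirty minutes, and twenty loops come from the example above.

```python
def task_cost_human(hours, hourly_rate):
    """Cost of a developer doing the task unaided."""
    return hours * hourly_rate


def task_cost_agent(hours, hourly_rate, loops, tokens_per_loop, price_per_million):
    """Cost of the same task with agent loops: remaining labor plus the token bill."""
    labor = hours * hourly_rate
    token_bill = loops * tokens_per_loop * price_per_million / 1_000_000
    return labor + token_bill


def break_even_tokens_per_loop(hours_saved, hourly_rate, loops, price_per_million):
    """Tokens per loop at which the token bill exactly cancels the labor saved."""
    return hours_saved * hourly_rate * 1_000_000 / (loops * price_per_million)


if __name__ == "__main__":
    RATE = 100.0   # hypothetical fully loaded cost per developer hour, USD
    PRICE = 10.0   # hypothetical USD per million tokens

    print(task_cost_human(3.0, RATE))                       # unaided: 3 hours
    print(task_cost_agent(0.5, RATE, 20, 500_000, PRICE))   # lighter loops
    print(task_cost_agent(0.5, RATE, 20, 2_000_000, PRICE)) # heavier loops
    print(break_even_tokens_per_loop(2.5, RATE, 20, PRICE))
```

Under these assumed figures, the same twenty loops either halve the cost of the task or raise it by half, depending only on how many tokens each loop consumes. That variable is the one the headcount spreadsheet never modeled.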

Microsoft’s investor communications show enterprise AI revenue growing strongly even as individual enterprise customers report cost overruns. The revenue growth and the customer cost complaints are the same phenomenon seen from different sides of the transaction. Uber’s public statements on AI tooling costs add that the overrun is not unique to one sector or one tool; it is a pattern across any enterprise where AI agents run at scale. This is part of the same structural shift that has redirected the $700 billion in AI infrastructure spending toward inference capacity rather than training compute. The reckoning is not that AI is too expensive. It is that the expectation, the one that got the budget approved, was formed for a different product than the one that arrived.

Rhys Donnelly

Rhys Donnelly studied electrical engineering at Trinity College Dublin before pivoting to journalism. He has visited semiconductor fabs in Taiwan, South Korea, and TSMC’s Arizona facility. Based in San Francisco, he covers the full stack from process node economics to platform strategy, with particular focus on where the AI infrastructure buildout creates genuine constraints versus vendor narratives.
