
AI Costs More Than the Employees It Was Supposed to Replace: Microsoft, Uber, and the Token Economics Reckoning

The Bill Arrived

The promise of enterprise AI in 2024 was straightforward: replace expensive human labor with cheap tokens, improve productivity, reduce headcount. The pitch was clean enough that hundreds of organizations either ran pilots or fully deployed AI coding tools, customer service agents, and workflow automation across every function that looked automatable. The productivity gains were real in many cases. The cost projections were not.

Fortune’s headline from May 22 lands hard: “Microsoft reports are exposing AI’s real cost problem: Using the tech is more expensive than paying human employees.” This isn’t a contrarian take or a tech pessimism piece. It’s a summary of what the internal reporting at Microsoft — one of the largest enterprise AI deployments in the world — is showing to the people responsible for managing the budgets. The AI tools are being used. They are not cheap. And in multiple documented cases, the cost of running the tools has exceeded the cost of the human labor they were positioned to replace or augment.

Microsoft is canceling most of its direct Claude Code licenses and moving engineers back toward GitHub Copilot CLI. Uber burned through its entire 2026 AI coding tools budget in four months, having actively encouraged adoption through internal leaderboards that ranked teams by AI tool usage. These are not isolated edge cases. They are the leading indicators of a broader reckoning with the actual economics of AI deployment at scale.

The Tokenmaxxing Problem

The term “tokenmaxxing” has emerged from internal discussions at tech companies to describe the behavior pattern that makes the cost problem structural rather than marginal. When employees are incentivized to use AI tools — through leaderboards, efficiency mandates, or management pressure to demonstrate AI adoption — they maximize AI usage rather than maximizing productive output. Token consumption increases faster than output quality. The AI is being used because using the AI is the measurable behavior, not because each specific use of the AI produces proportional value.

Uber’s leaderboard system created exactly this dynamic. Teams that ranked high on AI tool usage were visibly “doing AI.” Teams that used AI more selectively but produced better outcomes were less visible in the metric that management was tracking. The rational response to being evaluated on a usage metric rather than an outcome metric is to maximize usage, regardless of the marginal value of each additional AI interaction. Four months into the year, the budget was gone.
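The gap between a usage metric and an outcome metric can be sketched with a toy ranking. Everything here is an assumption made for illustration: the team names, the token counts, and the choice of "accepted changes" as the outcome measure are hypothetical and do not come from Uber's actual leaderboard.

```python
# Hypothetical teams and numbers, for illustration only.
teams = {
    "team_a": {"tokens_used": 9_000_000, "accepted_changes": 90},
    "team_b": {"tokens_used": 2_000_000, "accepted_changes": 80},
    "team_c": {"tokens_used": 5_000_000, "accepted_changes": 40},
}


def rank_by_usage(teams):
    """Rank teams by raw token consumption, highest first."""
    return sorted(teams, key=lambda name: teams[name]["tokens_used"], reverse=True)


def rank_by_outcome_per_token(teams):
    """Rank teams by accepted changes per million tokens, highest first."""
    def efficiency(name):
        team = teams[name]
        return team["accepted_changes"] / (team["tokens_used"] / 1_000_000)

    return sorted(teams, key=efficiency, reverse=True)


usage_ranking = rank_by_usage(teams)
outcome_ranking = rank_by_outcome_per_token(teams)

print("By usage:", usage_ranking)
print("By outcome per token:", outcome_ranking)
```

In this made-up data, the team that tops the usage ranking is not the team that produces the most per token, and the most selective team lands last on the usage board. That is the visibility problem described above.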

The tokenmaxxing phenomenon is not unique to Uber. It is the predictable outcome of any enterprise rollout that measures adoption rather than value. The AI vendor’s incentive is to report high adoption numbers — more tokens consumed means more revenue. The internal champion’s incentive is to demonstrate that the AI initiative they sponsored is being used. The individual employee’s incentive is to use the tool that they’ve been told to use. Everyone in the chain has a reason to maximize token consumption, and nobody in the chain is directly responsible for whether the token consumption produced proportional business value.

Agentic AI Makes This Worse by Orders of Magnitude

The cost problem with standard AI coding assistants — chatbot-style interfaces where a developer asks a question and receives an answer — is manageable if usage discipline exists. The cost problem with agentic AI is structurally different. Tom’s Hardware reports that agentic AI consumes up to 1,000 times more tokens than standard AI for equivalent tasks. Goldman Sachs forecasts that agentic AI will drive a 24-fold increase in token consumption by 2030 as enterprises adopt AI agents, reaching 120 quadrillion tokens per month.

An agentic system that executes a multi-step task — researching, drafting, reviewing, revising, and submitting a document, for instance — consumes tokens at every step, including the reasoning steps between actions. The model thinks out loud in tokens. It reads tool outputs in tokens. It writes intermediate plans in tokens. A task that a human completes in forty-five minutes might generate tens of thousands of tokens of intermediate reasoning and output that never reaches the end user, but all of which is billed by the model provider.
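A rough way to see where the tokens go is to tally them per step. The sketch below is a simplified accounting model, not any provider's billing logic. The step names follow the example above, and every token count is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    reasoning_tokens: int     # the model "thinking out loud"
    tool_output_tokens: int   # tool results read back into the model
    intermediate_tokens: int  # plans and drafts that are later discarded
    delivered_tokens: int     # text that actually reaches the end user


def summarize(steps):
    """Return (billed_tokens, delivered_tokens) for a whole task."""
    billed = sum(
        s.reasoning_tokens
        + s.tool_output_tokens
        + s.intermediate_tokens
        + s.delivered_tokens
        for s in steps
    )
    delivered = sum(s.delivered_tokens for s in steps)
    return billed, delivered


# Hypothetical numbers for a research/draft/review/revise/submit task.
steps = [
    Step("research", 3000, 8000, 500, 0),
    Step("draft", 2000, 0, 400, 0),
    Step("review", 1500, 2500, 300, 0),
    Step("revise", 1800, 0, 300, 0),
    Step("submit", 200, 300, 0, 1200),
]

billed, delivered = summarize(steps)
print(f"billed={billed} delivered={delivered} ratio={billed / delivered:.1f}x")
```

With these placeholder figures, the task bills 22,000 tokens to deliver 1,200. The point is the shape of the ratio rather than the specific numbers, which will vary widely by task and tool.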

For tasks where the agent completes the work successfully and the cost is less than the human equivalent, this is fine. For tasks where the agent fails, retries, or produces output that requires significant human correction, you have paid for the token consumption of a failed attempt and still need the human labor to finish the job. The failure cost is tokens plus human time, which is strictly worse than human time alone.
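The failure-cost argument can be written as a small expected-cost model. This is a minimal sketch under a strong simplifying assumption: a failed attempt still requires the full human effort to finish the job. The function names and the dollar figures are hypothetical.

```python
def expected_agent_cost(token_cost, human_cost, p_fail):
    """Expected cost of trying the agent first.

    Success costs token_cost. Failure costs token_cost + human_cost,
    assuming the human must then redo the whole task.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return token_cost + p_fail * human_cost


def breakeven_token_cost(human_cost, p_fail):
    """Largest token cost at which the agent still beats human-only work."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return (1.0 - p_fail) * human_cost


# Hypothetical figures: a $60 human task, $20 of tokens, 25% failure rate.
human_cost = 60.0
token_cost = 20.0
p_fail = 0.25

print(expected_agent_cost(token_cost, human_cost, p_fail))
print(breakeven_token_cost(human_cost, p_fail))
```

Under this model, the agent is worth trying only when the token cost is below `(1 - p_fail) * human_cost`. As the failure rate rises, the acceptable token cost shrinks toward zero, which is why agentic workloads with frequent retries are the hardest to justify.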

Nvidia’s Bryan Catanzaro, speaking internally, said: “For my team, the cost of compute is far beyond the costs of the employees.” He was speaking about ML research, where compute costs are exceptionally high. But the direction of the ratio is the same across enterprise functions as agentic AI usage scales: compute costs grow faster than the productivity gains that justify them, until the organization reaches a deployment scale where the gains are large enough or the token costs are low enough that the economics invert.

Microsoft’s Specific Situation

Microsoft’s cancellation of most direct Claude Code licenses — moving engineers to GitHub Copilot CLI instead — is simultaneously a cost management decision and a strategic one. Copilot is Microsoft’s own product, powered by OpenAI models under the Microsoft-OpenAI partnership agreement. Claude Code is Anthropic’s product. When Microsoft licenses Claude Code for its engineers, it pays Anthropic for the tokens. When Microsoft uses GitHub Copilot CLI, the economics are internal — the compute costs are real but the payment structure is different.

The engineers who had been using Claude Code were not using it incorrectly. They were using it the way the product is designed to be used: as a coding assistant that could handle complex, multi-step engineering tasks. The problem was that Claude Code’s power as an agentic coding tool meant high token consumption per session, and at the scale of thousands of Microsoft engineers using it, the cumulative cost exceeded what Microsoft had budgeted for external AI tool licenses.

This is a case where the product worked as designed and the economics didn’t work at scale. That’s a different problem from the product being bad. It’s a problem with how enterprise AI tools are priced relative to the value they produce when deployed across large engineering organizations. Anthropic and other model providers will need to develop enterprise pricing structures that decouple cost from token volume for organizations that have both high usage and usage discipline, where the high consumption is producing proportional value but the bill is still unacceptable relative to the benchmark of human labor cost.

What the Reckoning Produces

The cost reckoning doesn’t mean AI tools don’t work or don’t produce value. It means the ROI calculation that enterprise buyers made in 2024 was based on token costs and productivity assumptions that didn’t survive contact with production deployment at scale. The revised calculation requires acknowledging four things: AI tools produce uneven value across task types; token costs at agentic scale are substantially higher than chat-mode costs; adoption incentives that measure usage rather than outcomes will generate wasteful token consumption; and the comparison to human labor cost needs to include the cost of the human labor still required to manage, review, and correct AI output.

For AI model providers, the reckoning means pricing pressure. Enterprise customers who discovered their AI budgets were wrong are negotiating harder on renewal. They’re asking for usage-based caps, volume discounts that reflect enterprise deployment economics, and SLAs that tie costs to outcomes rather than token consumption. These are normal commercial pressures that the vendor market was always going to face as the enterprise AI market matured. The Fortune headline and the Microsoft and Uber examples mark the moment that maturity begins to arrive.

For enterprises, the reckoning means adoption will slow from “deploy everywhere and measure usage” to “deploy where the economics work and measure outcomes.” That’s a more sustainable approach. It’s also a less exciting narrative for AI vendors who were reporting adoption curves that looked like hockey sticks. The hockey stick was partly real productivity and partly tokenmaxxing. Separating them is the work the enterprise AI market is now doing.

The bill arrived. Reading it carefully is how the market figures out what it actually bought.
