AMD’s Instinct MI350 Has 288GB of Memory and Claims 40% More Tokens Per Dollar Than Blackwell. Nvidia Still Has 85% of the Market. Here’s Why Both Things Are True.

Written by Rhys Donnelly

Published May 20, 2026

Updated Jun 1, 2026


The GPU War Is Real Now

For most of the AI infrastructure buildout that began in earnest in 2022, the GPU procurement question at enterprise scale had one answer: Nvidia. AMD’s Instinct series existed and found niches in specific inference and HPC workloads, but the combination of Nvidia’s CUDA software ecosystem, its relationships with every major hyperscaler, and the performance lead of the H100 and then the H200 over AMD’s comparable offerings meant that procurement decisions were not genuinely competitive. Nvidia was the answer; everything else was a fallback when Nvidia supply was unavailable.

The MI350 series changes the texture of that competition in ways that matter. AMD’s Instinct MI350X ships with 288 GB of HBM3E memory — substantially more than the standard 192 GB configuration of Nvidia’s Blackwell B200. AMD has published benchmark results claiming 40% more tokens-per-dollar than the Blackwell B200 on inference workloads. Multiple independent evaluations have confirmed that the memory capacity advantage produces genuine performance benefits for inference tasks involving very large models — specifically the cases where the model weights and KV cache together approach or exceed 192 GB, which is the configuration that increasingly characterizes frontier model deployment. At those scales, the 288 GB MI350X doesn’t just have more memory — it can run models that the 192 GB B200 cannot run without offloading, which produces latency and throughput advantages that memory capacity alone doesn’t capture.
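To make the capacity argument concrete, here is a rough back-of-envelope sketch in Python. The model shape (180 billion parameters at one byte each, 80 layers, 8 KV heads of dimension 128, a 128,000-token context) and the 10% headroom are illustrative assumptions, not the specifications of any real model or of either vendor's software stack. Only the 192 GB and 288 GB capacities come from the figures above.

```python
def weights_gb(params_billion, bytes_per_param):
    """Memory for model weights in GB (1e9 params x bytes each / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param


def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, batch=1, bytes_per_value=2):
    """KV cache in GB: one key and one value vector per layer, per KV head, per token."""
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * batch * bytes_per_value
    return total_bytes / 1e9


def fits(total_gb, capacity_gb, headroom=0.90):
    """True if the footprint fits within the usable share of device memory.

    The 10% reserve for activations and runtime overhead is an assumption.
    """
    return total_gb <= capacity_gb * headroom


# Hypothetical model shape, chosen only for illustration.
total = weights_gb(180, 1) + kv_cache_gb(
    layers=80, kv_heads=8, head_dim=128, context_tokens=128_000
)

for name, capacity in [("192 GB device", 192), ("288 GB device", 288)]:
    print(f"{name}: need {total:.1f} GB, fits = {fits(total, capacity)}")
```

Under these assumptions the footprint comes to roughly 222 GB, which is over the smaller device's capacity and inside the larger one's. Real deployments also depend on quantization, tensor parallelism across several GPUs, and the serving stack's own overhead, none of which this sketch models.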

The CUDA Problem

Nvidia’s 85% market share does not rest primarily on hardware performance at this point. The Blackwell architecture’s absolute performance is strong, but AMD’s competitive claim on specific benchmarks is credible enough that hardware performance alone cannot explain the market share gap. The real explanation is CUDA — Nvidia’s proprietary GPU programming framework that has accumulated over a decade of optimization from ML framework developers, hardware vendors, and the research community. Nearly every AI model, every training framework, every inference optimization tool in the ecosystem was developed first for CUDA and optimized for CUDA before any other hardware target was considered.

PyTorch and TensorFlow, the dominant training frameworks, support AMD’s ROCm stack — AMD’s open CUDA alternative — but support and optimization are different things. A workload that runs on ROCm may run correctly and still run slower than the same workload on CUDA, because the CUDA-specific optimizations embedded in ML framework kernels represent years of engineering work that ROCm hasn’t fully replicated. The practical effect is that organizations deploying AMD GPUs often need to invest engineering resources in workload optimization that organizations deploying Nvidia GPUs don’t require. The MI350’s hardware performance may be competitive; the total cost of ownership, including the engineering investment in ROCm optimization, is less clearly competitive for most enterprise buyers.
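The total-cost-of-ownership point can be put into simple arithmetic. The sketch below assumes the engineering work is a one-off cost added to the hardware spend and that it does not change token throughput. The 40% figure is AMD's own claim as quoted earlier, and the dollar amounts are hypothetical. Under those assumptions the advantage is fully eroded when engineering spend reaches the claimed advantage multiplied by the hardware spend.

```python
def effective_advantage(claimed_advantage, hardware_spend_usd, engineering_spend_usd):
    """Tokens-per-dollar advantage remaining once one-off engineering cost is included.

    claimed_advantage is a fraction (0.40 means 40% more tokens per hardware dollar).
    """
    total_spend = hardware_spend_usd + engineering_spend_usd
    return (1 + claimed_advantage) * hardware_spend_usd / total_spend - 1


def break_even_engineering_spend(claimed_advantage, hardware_spend_usd):
    """Engineering spend at which the claimed advantage drops to zero."""
    return claimed_advantage * hardware_spend_usd


# Hypothetical deployment: $10M of hardware, $2M of optimization work.
hardware = 10_000_000
engineering = 2_000_000

remaining = effective_advantage(0.40, hardware, engineering)
limit = break_even_engineering_spend(0.40, hardware)

print(f"Advantage after engineering cost: {remaining:.1%}")
print(f"Break-even engineering spend: ${limit:,.0f}")
```

In this hypothetical case, $2 million of optimization work on a $10 million hardware purchase cuts a 40% claimed advantage to about 17%, and the advantage disappears entirely at $4 million. The same relationship explains why the switch is easier to justify at hyperscaler purchasing volumes, where the engineering cost is spread over far more hardware.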

AMD has been investing in ROCm for several years, and the software ecosystem gap has narrowed substantially since 2022. The specific workloads where AMD’s hardware advantages are clearest — large-memory inference, specific transformer architectures, HPC workloads — tend to be the workloads where AMD has also concentrated ROCm optimization investment. The result is a competitive landscape where AMD is genuinely strong in certain configurations and competitive in others, but still requires buyers to make a deliberate choice to invest in a less mature software ecosystem. That choice is easier to make when the hardware savings are substantial enough to justify the switching cost.

Where AMD Is Actually Winning

AMD’s real inroads in AI infrastructure are happening at the hyperscalers (Microsoft, Meta, and Google), which have the engineering capacity to optimize workloads for non-CUDA hardware and the purchasing scale to extract meaningful savings from AMD’s more competitive pricing. Meta has been the most public about deploying Nvidia alternatives: it has invested in AMD GPU infrastructure alongside its continued Nvidia procurement and contributes to ROCm optimization through its open-source ML work. Microsoft offers AMD Instinct capacity in Azure, providing AMD GPU cloud instances for enterprise customers who want cost flexibility or have specific workload profiles. Google has its own TPUs as an alternative to both Nvidia and AMD but has also added AMD capacity in Google Cloud.

The enterprise buyers who are most likely to actually switch from Nvidia to AMD in 2026 are the ones deploying primarily inference workloads at scale where the memory capacity advantage of the MI350 is most relevant — large context window inference, very large model serving, and multi-model serving where GPU memory is the binding constraint. These workloads are growing as frontier models have expanded from 100K to multi-million token context windows and as enterprises deploy larger models in production rather than smaller fine-tuned versions. The MI350’s memory capacity advantage is more relevant to the 2026 inference deployment landscape than it would have been to the 2023 training-dominated landscape.
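One way to see why memory becomes the binding constraint for long-context and multi-request serving is to ask how many tokens of KV cache fit in whatever memory is left after the weights are loaded. The Python sketch below does that for the two capacities discussed in this article. The 140 GB weight footprint and the attention shape (80 layers, 8 KV heads, head dimension 128, 2 bytes per value) are hypothetical placeholders, and the calculation ignores activations and runtime overhead.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache one token occupies: a key and a value per layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def token_budget(capacity_gb, weights_gb, per_token_bytes):
    """Total tokens of context, summed over all concurrent requests, that fit
    in the memory remaining after weights are loaded."""
    free_bytes = (capacity_gb - weights_gb) * 1e9
    if free_bytes <= 0:
        return 0
    return int(free_bytes // per_token_bytes)


# Hypothetical model: 140 GB of weights, shape chosen for illustration only.
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)

for capacity in (192, 288):
    budget = token_budget(capacity, 140, per_token)
    print(f"{capacity} GB device: about {budget:,} tokens of KV cache")
```

With these placeholder numbers, the 288 GB device has room for roughly 452,000 tokens of cache against roughly 159,000 on the 192 GB device. Capacity is only 1.5 times larger, but the token budget is nearly three times larger, because the weights consume a fixed share of memory before any context is stored.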

Nvidia’s Response and Rubin

Nvidia has not been sitting still while AMD has been building the MI350. The Rubin architecture, Nvidia’s next GPU generation after Blackwell, was previewed at GTC 2026 with specifications that include substantially increased memory capacity (addressing the MI350’s primary competitive angle) and new interconnect capabilities. Rubin is expected to ship in limited quantities in late 2026 and ramp through 2027. The GPU performance race is iterative: AMD’s MI350 closed a significant gap with Blackwell and established a memory capacity lead; Rubin is expected to erase that lead and extend Nvidia’s performance edge on training workloads where CUDA optimization compounds.

Nvidia’s $80 billion stock buyback and $91 billion Q2 revenue guidance — reported in the most recent earnings — reflect a company that is not operationally threatened by AMD’s competitive progress. The 85% market share figure is stable enough that Nvidia’s financial performance doesn’t require a competitive threat response in the near term. The long-term strategic concern is whether AMD’s ROCm investment, combined with the enterprise engineering capacity to optimize for non-CUDA hardware, eventually narrows the software ecosystem gap to the point where hardware performance and pricing differences drive more procurement decisions. That’s a multi-year story, not a Q2 story.

What Procurement Teams Should Know

Enterprise AI infrastructure teams evaluating GPU procurement in 2026 are operating in the first period since the AI buildout began in which the AMD option deserves serious evaluation on its own merits rather than as a fallback for Nvidia supply constraints. The MI350’s memory capacity advantage is real and material for specific workload configurations. AMD’s pricing is competitive. The ROCm ecosystem has improved substantially. The switching costs (the engineering investment in workload optimization, the retraining of ML engineering teams, the ecosystem compatibility work) are real and should be fully costed in any Nvidia-versus-AMD comparison.

The practical recommendation for most enterprises: maintain the existing Nvidia infrastructure for training workloads where CUDA optimization is entrenched, evaluate MI350 seriously for new inference infrastructure deployments where the memory capacity advantage is workload-relevant, and pilot AMD capacity at a scale that allows real-world performance validation before committing to large-scale procurement. The GPU war that was theoretical for most of the AI buildout is now real enough to be worth the evaluation effort. Nvidia’s dominance is intact and likely durable. AMD’s competitive position is meaningfully stronger than it was two years ago, in specific configurations, for buyers willing to make the ecosystem investment. Both things are simultaneously true.

Memory Advantage, CUDA Moat: How to Score the Gap

Hamilton Helmer’s 7 Powers framework identifies the specific structural conditions that allow a company to maintain superior returns against competitors over time. The framework does not evaluate products. It evaluates whether advantages are durable. AMD’s Instinct MI350X is a product evaluation question that becomes a 7 Powers question only if the advantage it demonstrates is structural rather than temporary.

The relevant Power candidates for Nvidia, when examined against AMD’s MI350X challenge, reduce to two: Switching Cost and Counter-Positioning. CUDA is the canonical switching cost example in AI infrastructure. Machine learning engineers are trained on CUDA, frameworks are optimized for CUDA, and production pipelines depend on CUDA, so the cost of migrating a mature AI workload from Nvidia to an AMD alternative is not primarily a hardware cost. It is a software and organizational cost that makes rational buyers reluctant to change suppliers even when the hardware alternative performs better on specific benchmarks.

AMD’s MI350X creates a genuine hardware performance argument. The 288 GB of HBM3E memory represents a measurable advantage over the Blackwell B200’s standard 192 GB configuration for inference workloads on very large models. Independent evaluations have confirmed the performance benefit of that capacity on the workload categories AMD targeted, and AMD’s own benchmarks claim a tokens-per-dollar improvement. This is a Power-relevant data point, but only if the advantage is structural. Hardware performance leads in semiconductors are temporary. Nvidia’s next generation is expected to address the memory gap. The CUDA switching cost, by contrast, compounds over time as more engineers train on it and more frameworks depend on it.

AMD’s MI350X establishes genuine market access in specific workload categories — very large model inference and memory-intensive tasks where the HBM3E gap is material. Customers procuring for those workloads now have a credible alternative. That is real market access. Whether it compounds into a structural competitive position depends on AMD building enough software ecosystem momentum to compete with CUDA’s switching cost before Nvidia’s next generation closes the hardware gap. Nvidia’s $75.2 billion in data center revenue in a single quarter is the financial expression of that switching cost being intact.

Helmer’s framework scores the current position plainly. AMD holds a real product advantage, not yet a Power. Nvidia holds Switching Cost power intact and Counter-Positioning strengthening as CUDA investment deepens across the industry. The MI350X matters — it changes procurement decisions for a specific workload slice. It does not change the score.

Rhys Donnelly

Rhys Donnelly studied electrical engineering at Trinity College Dublin before pivoting to journalism. He has visited semiconductor fabs in Taiwan, South Korea, and TSMC’s Arizona facility. Based in San Francisco, he covers the full stack from process node economics to platform strategy, with particular focus on where the AI infrastructure buildout creates genuine constraints versus vendor narratives.
