Amazon.com (AMZN) is set to deploy its self-developed AI chips, specifically Trainium and Inferentia AI ASIC compute clusters, at massive scale to train and update its proprietary large AI models. This strategic shift aims to cut development costs significantly. For the AI GPU ecosystem, currently dominated by NVIDIA (NVDA) and AMD, Amazon's move could introduce medium-to-long-term marginal pressure and erode monopoly premiums. Amid the rising wave of AI inference demand, the more cost-effective AI ASIC technology path may pose the most substantial challenge yet to NVIDIA's nearly 90% share of the AI chip market.
From the perspective of the AI compute supply chain and chip engineering, the decision by Amazon's cloud platform AWS to use its own AI chips for training large models, rather than restricting them to inference tasks as before, marks a critical milestone for the in-house AI ASIC strategy. Amazon is not the first to take this path; Google's TPU has already validated it. What is new is that Amazon is elevating the approach by integrating self-developed AI ASIC clusters directly into the core compute infrastructure for its advanced AI models, a significant step for hyperscale cloud providers like Amazon, Google, and Microsoft.
Market concerns regarding NVIDIA's future prospects appear justified. Peter DeSantis, Amazon's new head of AI infrastructure, stated in a recent interview, "If we can build models on our own AI chips, we can construct these models at a fraction of the cost faced by pure AI model providers." He added, "Building hyperscale AI data centers does involve cost challenges. For AI to truly transform everything, costs must change."
NVIDIA, the dominant force in AI chip infrastructure, recently reported stronger-than-expected results for fiscal Q4 2026 and provided robust guidance for the next quarter. Despite this, its stock fell 5% on Thursday, reflecting growing investor anxiety over announcements from hyperscalers about developing more cost-efficient, proprietary AI ASICs. These developments signal potential risks to NVIDIA's long-standing supremacy in the core AI chip market. Amazon's plan to utilize Trainium and Inferentia for model development validates these concerns.
Earlier this month, Amazon's management indicated that capital expenditures for 2026 could reach approximately $200 billion, far exceeding Wall Street expectations. CEO Andy Jassy noted that a portion of this expenditure will fund the development and iteration of in-house AI chips. Jassy explained, "Given strong demand across our e-commerce services, traditional cloud services, and AI compute needs, plus massive growth opportunities in AI models, humanoid robots, and low Earth orbit satellites, we anticipate robust long-term returns on this investment."
The true novelty of Amazon's latest plan lies not in proving that AI ASICs can train large models, but in elevating its custom chips from optional cloud compute resources to the core pathway for foundational model development. NVIDIA's AI GPUs dominate the training segment, where versatility and rapid iteration matter most; the inference side, by contrast, prioritizes cost per token, latency, and energy efficiency once AI deployments reach scale. Google, for instance, positions its Ironwood TPU generation as built for the "AI inference era," emphasizing performance, efficiency, and scalability.
Nevertheless, Amazon's actions demonstrate that AI ASICs hold substantial potential for model training. The AI ASIC compute framework will likely continue to erode NVIDIA's monopoly premiums and market share over the medium to long term, though not through a linear replacement of GPUs. The fundamental reason is that competition in the inference era shifts from peak compute power to metrics such as cost per token, power consumption, memory bandwidth utilization, interconnect efficiency, and total cost of ownership under hardware-software co-design. On these fronts, ASICs, customized for specific workloads with optimized data flow, compilers, and interconnects, naturally achieve higher cost efficiency than general-purpose GPUs.
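To make the cost-per-token logic concrete, below is a minimal back-of-the-envelope sketch in Python. Every figure in it is a hypothetical placeholder rather than a published vendor number; the point is only that a cheaper, workload-tuned chip can win on cost per token even with lower raw throughput.

```python
# Back-of-the-envelope cost-per-token comparison between a general-purpose
# GPU instance and a workload-tuned ASIC instance. All numbers are
# hypothetical placeholders, not published AWS or NVIDIA figures.

def cost_per_million_tokens(hourly_price_usd: float,
                            tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens on a single instance."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Hypothetical: a GPU instance with higher raw throughput but a higher
# hourly price, vs. an ASIC instance tuned for one model family.
gpu_cost = cost_per_million_tokens(hourly_price_usd=40.0, tokens_per_second=9000)
asic_cost = cost_per_million_tokens(hourly_price_usd=20.0, tokens_per_second=6000)

print(f"GPU:  ${gpu_cost:.2f} per million tokens")   # ~$1.23
print(f"ASIC: ${asic_cost:.2f} per million tokens")  # ~$0.93
# Despite ~33% lower throughput, the ASIC is ~25% cheaper per token here,
# and cost per token is the metric that dominates at inference scale.
```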
For NVIDIA and AMD, this largely implies that marginal pressure is real, likely manifesting as reduced pricing power, market share erosion, and compressed valuation premiums, rather than a collapse in absolute demand. Under the AI inference super-cycle, AI ASICs will undoubtedly challenge the GPU-dominated landscape, but the impact is more about reshaping industry profit pools and customer procurement structures than invalidating GPU expansion logic.
AWS officially positions Trainium and Inferentia as specialized accelerators for generative AI training and inference, with Trainium2 offering roughly 30-40% better price-performance than its GPU-based cloud instances. Google has likewise stated publicly that Gemini 2.0 is both trained and served entirely on TPUs. This indicates that the use of proprietary ASICs by hyperscalers for core model training and inference is moving from proof of concept to a replicable, industrial phase. However, extrapolating this trend into a rapid collapse of the GPU ecosystem is an overstatement.
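As a rough illustration of what a 30-40% price-performance advantage implies, consider the sketch below. The 30-40% range is the only figure taken from AWS's public claim; the baseline cost is an arbitrary placeholder. Because price-performance is throughput per dollar, a 40% improvement translates into roughly a 29% lower bill for the same work, not a 40% one.

```python
# What a 30-40% price-performance advantage means for a fixed amount of
# training or inference work. "Price-performance" is throughput per dollar;
# the 30-40% range is AWS's public claim for Trainium2 vs. its GPU-based
# instances, while the baseline cost is an illustrative placeholder.

baseline_cost = 100.0  # arbitrary units to run a given job on GPU instances

for improvement in (0.30, 0.40):
    # If throughput per dollar rises by `improvement`, the same job
    # costs baseline_cost / (1 + improvement).
    asic_cost = baseline_cost / (1 + improvement)
    savings = 1 - asic_cost / baseline_cost
    print(f"+{improvement:.0%} price-performance -> "
          f"{asic_cost:.1f} units (~{savings:.0%} cheaper)")
# Output: +30% -> 76.9 units (~23% cheaper); +40% -> 71.4 units (~29% cheaper)
```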
NVIDIA's true moat extends beyond the chips themselves to its CUDA platform, developer toolchain, breadth of model adaptation, and ecosystem inertia. Analysts noted last year that over 4 million developers worldwide rely on CUDA, meaning that many advanced training tasks, complex mixed workloads, and rapidly iterating new models will remain better suited to GPUs in the short term. Notably, even as AWS advances its custom AI chips, it continues to incorporate NVIDIA GPU architectures into its infrastructure roadmap and to offer NVIDIA-based AI services. This illustrates that the real strategy for hyperscalers is not "de-GPU-ization" but retaining GPUs for the high-end training layer while increasing the ASIC share in large-scale inference and proprietary model stacks.
Therefore, from an engineering standpoint, the future is more likely to resemble a "GPU + ASIC coexistence and layering" model than a single victorious path.