Nvidia adopts Groq to tackle AI inference and expand global reach

Emily Kuo, DIGITIMES Asia, Taipei

At its annual GTC conference in San Jose, Nvidia unveiled a major shift in its AI hardware strategy: integrating technology from AI chip startup Groq to address growing demand for AI inference while preparing new products for global markets, including China.

According to The Information, Nvidia announced plans to incorporate Groq's technology into its GPU systems to handle specialized inference workloads such as coding and real-time AI responses. The move reflects a subtle but significant acknowledgment: GPUs alone are no longer sufficient for all AI tasks, particularly those requiring low latency and high responsiveness.

The new system is already drawing interest from major AI players, including OpenAI, with rivals such as Anthropic expected to evaluate it and Meta Platforms and xAI also expressing interest.

Groq architecture fills the inference gap

At the core of this shift is Groq's architecture, which places memory directly on the chip, enabling significantly faster data access. This design, however, comes with trade-offs, including limited on-chip memory capacity that requires scaling through interconnected chips. Nvidia's solution is to pair Groq's inference-optimized chips with its own high-memory GPUs, creating a hybrid system.
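To make the trade-off concrete, here is a minimal, purely illustrative Python sketch of how such a hybrid system might route work. All device names, memory figures, and latency figures below are invented for the example and are not taken from Nvidia or Groq; the point is only the routing logic the article describes, namely that small, latency-sensitive jobs go to the fast on-chip-memory accelerator while memory-heavy jobs fall back to the high-memory GPU.

```python
# Illustrative sketch only: device specs and the routing rule are
# hypothetical, not Nvidia's or Groq's actual design.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    memory_gb: float    # on-device memory capacity
    latency_ms: float   # typical per-token latency

# Hypothetical devices reflecting the trade-off described above:
# the LPU-style chip has little memory but very fast access;
# the GPU has far more memory but higher per-token latency.
LPU = Device("lpu-style accelerator", memory_gb=0.23, latency_ms=0.5)
GPU = Device("high-memory GPU", memory_gb=288.0, latency_ms=5.0)

def route(model_footprint_gb: float, latency_sensitive: bool) -> Device:
    """Pick a device: prefer the fast chip for real-time requests
    when its small on-chip memory can hold the working set."""
    if latency_sensitive and model_footprint_gb <= LPU.memory_gb:
        return LPU
    return GPU

print(route(0.2, latency_sensitive=True).name)   # fits on-chip -> LPU-style
print(route(80.0, latency_sensitive=True).name)  # too large -> GPU
```

In practice, the article's point is that scaling an on-chip-memory design means chaining many interconnected chips, which is why pairing it with high-memory GPUs is attractive rather than replacing them.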

Jon Peddie Research reported that Nvidia completed a roughly US$20 billion acquisition of Groq in late 2025, the company's largest acquisition to date. At GTC 2026, CEO Jensen Huang explained that Groq's language processing unit (LPU) fills a critical gap in GPU performance — specifically, memory bandwidth limitations at extreme token generation speeds.

Nvidia says that integrating LPUs alongside next-generation Vera Rubin GPUs dramatically improves efficiency, delivering up to 35 times more tokens per watt for real-time inference workloads.
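For readers unfamiliar with the metric, tokens per watt is simply throughput divided by power draw. The back-of-the-envelope Python snippet below illustrates the arithmetic; the baseline throughput and power figures are invented for the example, and only the up-to-35x ratio comes from the claim above.

```python
# Tokens per watt = throughput (tokens/s) / power draw (W).
# Baseline numbers are made up for illustration; only the 35x
# ratio comes from the efficiency claim reported above.
baseline_tokens_per_sec = 1_000
baseline_watts = 700
baseline_tpw = baseline_tokens_per_sec / baseline_watts  # ~1.43 tokens/W

hybrid_tpw = 35 * baseline_tpw  # the claimed up-to-35x improvement
print(f"baseline: {baseline_tpw:.2f} tokens/W, hybrid: {hybrid_tpw:.1f} tokens/W")
```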

Huang likened the Groq deal to Nvidia's earlier acquisition of Mellanox, which transformed the company's networking capabilities. In the same way, Groq is now being embedded as a core component of Nvidia's AI infrastructure, signaling a transition from a GPU-centric model to a full-stack AI platform optimized for both training and inference.

Reshaping the competitive landscape

This pivot comes as inference emerges as the next major growth driver. Reuters reported that Huang declared during his keynote that "the inference inflection has arrived," with Nvidia projecting up to US$1 trillion in AI chip revenue between 2025 and 2027.

At the same time, Nvidia is navigating geopolitical constraints. The company is preparing a version of its Groq-based chips suitable for the Chinese market. While its latest Vera Rubin GPUs cannot be exported to China due to US restrictions, the new systems will combine Groq chips with alternative configurations that comply with regulations. Nvidia has also resumed production of its H200 chips after securing export licenses and new orders from Chinese customers.

While Nvidia dominates AI training, it faces increasing pressure in inference from both hyperscalers and regional players. Major cloud providers — including Google, Amazon, and Microsoft — account for about 60% of Nvidia's revenue but are also developing their own chips, such as Google's TPUs, to reduce dependence on Nvidia, according to The Information. Chinese firms like Baidu are likewise building domestic inference solutions.

AI infrastructure scales across the ecosystem

The GTC event itself underscored the scale of the AI boom, with companies across the ecosystem announcing partnerships, data center expansions, and multibillion-dollar deals. Meta committed up to US$27 billion to AI infrastructure with cloud provider Nebius, highlighting the massive capital flowing into AI compute.

Ultimately, Nvidia's integration of Groq marks a turning point in AI infrastructure. The industry is shifting from a training-centric model to an "inference economy," where performance is measured not just in compute power but in speed, efficiency, and cost per token. By combining GPUs with specialized inference accelerators, Nvidia is positioning itself to remain at the center of this next phase of AI computing.

Article edited by Jerry Chen