Nvidia Launches Rubin Architecture to Drastically Reduce AI Costs
Las Vegas, Tuesday, 6 January 2026.
At CES 2026, Nvidia CEO Jensen Huang unveiled the Vera Rubin platform, a six-chip architecture now in full production. Promising to slash AI inference costs by 90%, the system utilizes “extreme codesign” to address the critical power and financial constraints facing the next generation of generative AI.
Extreme Codesign: A Six-Chip System
Nvidia’s strategy with the Rubin platform represents a shift from producing standalone components to engineering the data center as a singular unit of compute [3]. The architecture is built upon six distinct chips designed to function in unison: the Vera CPU, the Rubin GPU, the NVLink 6 Switch, the ConnectX-9 SuperNIC, the BlueField-4 DPU, and the Spectrum-6 Ethernet Switch [2][3]. By integrating these components, Nvidia aims to eliminate the bottlenecks inherent in traditional infrastructure, with the full platform officially entering production as of January 5, 2026 [2][5].
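To make the division of labor concrete, the six components can be grouped by the layer of the system they serve. The sketch below is an editorial illustration in Python; the role and layer labels paraphrase Nvidia’s public descriptions and are not official terminology.

```python
# Illustrative sketch only: roles and layers paraphrase Nvidia's public
# descriptions of the Vera Rubin platform; nothing here is an Nvidia API.
from dataclasses import dataclass

@dataclass(frozen=True)
class PlatformComponent:
    name: str    # chip name as announced
    layer: str   # compute, scale-up fabric, or scale-out network (editorial grouping)
    role: str    # broad function within the platform

VERA_RUBIN_PLATFORM = [
    PlatformComponent("Vera CPU", "compute", "general-purpose host processor"),
    PlatformComponent("Rubin GPU", "compute", "AI accelerator"),
    PlatformComponent("NVLink 6 Switch", "scale-up fabric", "GPU-to-GPU interconnect within the rack"),
    PlatformComponent("ConnectX-9 SuperNIC", "scale-out network", "high-speed network interface"),
    PlatformComponent("BlueField-4 DPU", "scale-out network", "infrastructure and data-path offload"),
    PlatformComponent("Spectrum-6 Ethernet Switch", "scale-out network", "rack-to-rack Ethernet switching"),
]

for c in VERA_RUBIN_PLATFORM:
    print(f"{c.name:28s} {c.layer:18s} {c.role}")
```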
Redefining Data Center Economics
The primary value proposition of the Rubin architecture addresses the escalating costs associated with generative AI. During the opening keynote on January 4, CEO Jensen Huang stated that the new platform is engineered to cut the cost of generating tokens to roughly one-tenth of the previous generation’s [1][5]. The efficiency gains extend to training as well: the Rubin platform requires one-quarter as many GPUs to train complex Mixture of Experts (MoE) models as the preceding Blackwell platform [2]. These reductions are critical for the industry, as Huang noted that “$10 trillion or so of the last decade of computing is now being modernized” to accommodate accelerated AI workloads [1].
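As a back-of-the-envelope illustration of what those ratios mean, the short sketch below applies the claimed 10x inference and 4x training reductions to placeholder baselines; the dollar figure and GPU count are invented for the example, not published Nvidia numbers.

```python
# Illustrative arithmetic only: the 10x and 4x ratios are the keynote claims;
# the baseline values below are hypothetical placeholders.
baseline_cost_per_m_tokens = 1.00    # hypothetical Blackwell-era cost, dollars per million tokens
rubin_cost_per_m_tokens = baseline_cost_per_m_tokens / 10

baseline_moe_training_gpus = 10_000  # hypothetical GPU count for an MoE training run on Blackwell
rubin_moe_training_gpus = baseline_moe_training_gpus // 4

print(f"Inference: ${baseline_cost_per_m_tokens:.2f} -> ${rubin_cost_per_m_tokens:.2f} per million tokens")
print(f"Training:  {baseline_moe_training_gpus:,} GPUs -> {rubin_moe_training_gpus:,} GPUs for the same model")
```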
Unprecedented Density and Connectivity
The physical architecture of the Rubin platform is defined by its density and interconnect speeds. The rack-scale system, the NVIDIA Vera Rubin NVL72, integrates 72 Rubin GPUs and 36 Vera CPUs [3]. Each of its 18 compute trays hosts two Vera Rubin superchips (four GPUs per tray), and the rack consolidates those trays to function as one massive accelerator [3]. Connectivity is handled by sixth-generation NVLink, which provides 3.6 terabytes per second (TB/s) of bandwidth per GPU, ensuring the system can sustain the data throughput required by next-generation models [3].
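Those figures are internally consistent, as a quick check confirms; note that the aggregate-bandwidth line below is derived from the per-GPU number and is not a figure Nvidia quoted.

```python
# Consistency check on the published NVL72 figures. Per-superchip counts are
# derived from the article (2 superchips and 4 GPUs per tray -> 2 GPUs each;
# 36 CPUs across 36 superchips -> 1 CPU each).
trays = 18
superchips_per_tray = 2
gpus_per_superchip = 2
cpus_per_superchip = 1

gpus = trays * superchips_per_tray * gpus_per_superchip   # 18 * 2 * 2 = 72
cpus = trays * superchips_per_tray * cpus_per_superchip   # 18 * 2 * 1 = 36
assert (gpus, cpus) == (72, 36)

nvlink_bw_per_gpu_tbs = 3.6
aggregate_bw_tbs = gpus * nvlink_bw_per_gpu_tbs           # 72 * 3.6 = 259.2 (derived, not quoted)
print(f"{gpus} GPUs, {cpus} CPUs, ~{aggregate_bw_tbs:.1f} TB/s aggregate NVLink bandwidth")
```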
From Silicon to Strategy
While the hardware specifications are ambitious, the commercial rollout is already underway. Systems based on the Rubin architecture are scheduled to reach partners in the second half of 2026 [4][5]. Microsoft has announced plans to deploy the technology in its “Fairwater” AI superfactories, which will scale to hundreds of thousands of Vera Rubin Superchips [2]. Similarly, cloud provider CoreWeave is set to be among the first to offer Rubin-based instances to developers [2]. Beyond the data center, Nvidia is pushing into the automotive sector with the Mercedes-Benz CLA, which will feature “Alpamayo” AI-defined driving capabilities and is expected to arrive in the U.S. later this year [1].
Sources
- blogs.nvidia.com
- nvidianews.nvidia.com
- developer.nvidia.com
- www.theverge.com
- www.wired.com
- mashable.com
- www.servethehome.com