Huawei CloudMatrix 384 vs Nvidia GB200 NVL72: Key AI Hardware Differences Explained

Image Credit: Paul Hanaoka | Unsplash (for illustration only)

Huawei Technologies unveiled its CloudMatrix 384 artificial intelligence computing system at the World Artificial Intelligence Conference in Shanghai last month, positioning the hardware as a domestic alternative to Nvidia's advanced offerings in a market constrained by U.S. export restrictions.

The system, which integrates 384 of Huawei's Ascend 910C chips, aims to support large-scale AI model training and inference tasks. Huawei Cloud CEO Zhang Pingan said in June the platform was operational on the company's cloud infrastructure, marking a step toward reducing China's reliance on foreign technology.

Background on Development

Huawei's push into high-performance AI hardware accelerated after U.S. sanctions, imposed since 2019, limited access to cutting-edge chips from Nvidia and others. The restrictions, aimed at curbing China's military and technological advances, barred exports of Nvidia's H100 and subsequent models like the GB200 to the mainland. In response, Huawei has invested in its Ascend series, with the 910C succeeding the 910B model released in 2022.

The CloudMatrix 384 spans 16 racks, including 12 for computation and four for networking, using a proprietary "supernode" architecture with optical interconnections to link components. This design addresses bottlenecks in data transfer for AI workloads, such as those involving trillion-parameter models. Nvidia CEO Jensen Huang noted in May that Huawei was progressing rapidly, citing the CloudMatrix as an example of competitive pressure.

Experts view the launch as part of China's broader strategy to build self-sufficient AI infrastructure, driven by national priorities in technology sovereignty. According to industry reports, Huawei has shipped more than 10 units to domestic clients, including public sector and financial entities, though independent benchmarks remain limited.

System Comparison

Comparing Huawei's CloudMatrix 384 and Nvidia's GB200 NVL72 requires a balanced framework that accounts for differences in design, scale and efficiency, experts say, as ongoing U.S. restrictions accelerate China's domestic AI hardware development.

Both systems target high-performance AI workloads like model training and inference, but they diverge in architecture: Huawei's multi-rack setup emphasizes volume scaling with 384 Ascend 910C chips, while Nvidia's single-rack design prioritizes dense integration with 72 Blackwell B200 GPUs. A fair basis involves normalizing metrics across per-chip, per-rack and efficiency dimensions, while considering software ecosystems and real-world applications.

Per-Chip Comparison

At the chip level, Nvidia's B200 outperforms Huawei's Ascend 910C in raw specifications, based on industry analyses and third-party reports, as Huawei has not released official datasheets for the 910C. The B200 delivers up to 2,500 teraflops in BF16 precision, roughly three times the 800 teraflops of the 910C. Memory capacity stands at 192 GB HBM3e for the B200 versus 128 GB HBM for the 910C, with bandwidth at 8 terabytes per second compared to 3.2 terabytes per second—leaving the 910C at roughly 0.7x and 0.4x of the B200's figures, respectively. This gap stems from manufacturing differences: Nvidia uses advanced TSMC processes, while Huawei relies on SMIC's 7nm nodes, constrained by sanctions.
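The per-chip ratios above can be reproduced directly from the cited figures. A minimal sketch, using the industry estimates quoted in this article (not official datasheets, so treat the inputs as approximate):

```python
# Per-chip spec comparison using the figures cited in the text.
# These are third-party estimates, not vendor-confirmed datasheet values.

B200 = {"bf16_tflops": 2500, "hbm_gb": 192, "bw_tbps": 8.0}
ASCEND_910C = {"bf16_tflops": 800, "hbm_gb": 128, "bw_tbps": 3.2}

def ratio(chip_a: dict, chip_b: dict, key: str) -> float:
    """Return chip_a's figure as a multiple of chip_b's for one metric."""
    return chip_a[key] / chip_b[key]

compute_x = ratio(ASCEND_910C, B200, "bf16_tflops")  # ~0.32x
memory_x = ratio(ASCEND_910C, B200, "hbm_gb")        # ~0.67x
bandwidth_x = ratio(ASCEND_910C, B200, "bw_tbps")    # 0.40x

print(f"910C vs B200: compute {compute_x:.2f}x, "
      f"memory {memory_x:.2f}x, bandwidth {bandwidth_x:.2f}x")
```

The rounded outputs match the roughly 3x compute gap and the 0.7x/0.4x memory and bandwidth multipliers quoted above.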

Per-Rack and System-Level Normalization

System comparisons must adjust for scale, as the CloudMatrix 384 spans 16 racks versus the GB200 NVL72's one. Huawei achieves 300 petaflops in BF16 compute, 1.67 times Nvidia's 180 petaflops, with 3.6 times the aggregate memory (49.2 terabytes) and 2.1 times the bandwidth (1,229 terabytes per second). However, per rack, Nvidia leads by up to 9.6 times in compute, 5.6 times in bandwidth and 4.4 times in memory capacity due to tighter integration. Huawei's "supernode" optical networking aids scalability for massive clusters, but Nvidia's NVLink provides higher intra-rack bandwidth (about 130 terabytes per second aggregate) than Huawei's optical mesh.
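The system-level totals follow from multiplying chip counts by the per-chip figures. A sketch of that aggregation, again using the estimates cited in this article (exact products differ slightly from the rounded numbers in the text, e.g. 307 rather than 300 petaflops for Huawei):

```python
# System-level aggregation from chip counts and per-chip figures cited above.
# Tuple fields: (chip count, TFLOPS BF16, HBM GB, bandwidth TB/s, racks)
SYSTEMS = {
    "CloudMatrix 384": (384, 800, 128, 3.2, 16),
    "GB200 NVL72": (72, 2500, 192, 8.0, 1),
}

for name, (chips, tflops, hbm_gb, bw_tbps, racks) in SYSTEMS.items():
    petaflops = chips * tflops / 1000   # aggregate BF16 compute
    memory_tb = chips * hbm_gb / 1000   # aggregate HBM capacity
    bandwidth = chips * bw_tbps         # aggregate memory bandwidth
    print(f"{name}: {petaflops:.0f} PF, {memory_tb:.1f} TB HBM, "
          f"{bandwidth:.0f} TB/s across {racks} rack(s)")
```

Dividing the aggregates confirms the system-level multipliers quoted above (roughly 1.7x compute, 3.6x memory and 2.1x bandwidth in Huawei's favour), while dividing Huawei's totals by its 16 racks shows why the per-rack comparison flips to Nvidia.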

Efficiency and Cost Considerations

Power efficiency is a critical fair metric, where Nvidia excels. The GB200 NVL72 consumes 132 kilowatts for 1.36 petaflops per kilowatt, while Huawei's 559 kilowatts yields about 0.54 petaflops per kilowatt—roughly 2.5 times less efficient. This "brute force" approach suits China's lower energy costs but raises sustainability concerns globally. Cost data is limited to market estimates, but Huawei systems may be priced at around US$8 million per unit, compared to Nvidia's US$3 million, potentially offering better value in sanction-hit markets, though Nvidia's ecosystem reduces total ownership costs through faster development.
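The efficiency gap quoted above is a simple ratio of compute to power draw. A sketch using the system figures cited in the text:

```python
# Power-efficiency comparison from the system figures cited in the text.

def pf_per_kw(petaflops: float, kilowatts: float) -> float:
    """BF16 petaflops delivered per kilowatt of power draw."""
    return petaflops / kilowatts

nvidia_eff = pf_per_kw(180, 132)   # ~1.36 PF/kW
huawei_eff = pf_per_kw(300, 559)   # ~0.54 PF/kW
gap = nvidia_eff / huawei_eff      # ~2.5x

print(f"GB200 NVL72: {nvidia_eff:.2f} PF/kW; "
      f"CloudMatrix 384: {huawei_eff:.2f} PF/kW; gap {gap:.1f}x")
```

This is the source of the "roughly 2.5 times less efficient" claim: per unit of power, the Nvidia rack delivers about two and a half times the BF16 compute.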

Broader Impacts

Huawei's strengths include scalability for hyperscale AI in China and resilience to sanctions, bolstering national tech goals. Drawbacks encompass lower efficiency and supply chain risks from foreign components. Nvidia offers superior per-unit performance and a global ecosystem, but faces export barriers in China. This rivalry could fragment the AI market, with China prioritizing volume to close gaps, while the West emphasizes efficiency.

Experts predict evolving comparisons as Huawei refines yields and Nvidia launches compliant variants like the B20, underscoring the need for context-specific assessments in AI infrastructure choices.

Future Trends

Looking ahead, Huawei plans expansions to even larger clusters, potentially surpassing 100,000 chips as manufacturing yields improve. This aligns with China's investments in 39 AI data centers, though global trends favor energy-efficient designs amid rising sustainability concerns. Escalating U.S.-China tensions could prompt tighter restrictions on components, even as Nvidia readies export-compliant chips for the Chinese market. Analysts expect a bifurcated AI landscape, with China leaning on volume and system-level integration to offset per-chip gaps.

TheDayAfterAI News

