Shenzhen launches China's first full-stack domestic 14,000P AI cluster

DIGITIMES Asia, Taipei

Credit: EET China

Shenzhen has brought online what project materials describe as China's first 14,000P, 10,000-card AI computing cluster built around a fully domestic technology stack, marking a new stage in the country's push to reduce reliance on foreign hardware and software in large-scale model training.

Activated on March 26, 2026, the project combines an 11,000P second phase with an earlier 3,000P deployment, bringing total capacity to 14,000P. More significant than its scale is its positioning as China's first end-to-end domestic AI infrastructure stack, spanning chips, servers, networking, storage, software, and scheduling.

This matters because competition in AI infrastructure has shifted beyond individual chip performance. At the 10,000-card scale, the challenge is no longer compute alone, but whether systems can be fed, cooled, connected, and orchestrated reliably over sustained training cycles. Shenzhen's project, therefore, tests whether China can build a usable, industrial-scale AI training base — rather than a nominally large but operationally fragmented cluster.

Why China's AI cluster matters

China's AI infrastructure push has long faced a structural gap: even where domestic chips were deployed, key layers — such as interconnects, software environments, and supporting components — often remained reliant on foreign technologies. Shenzhen's cluster is positioned as a break from that dependency.

The system is built around Huawei's Ascend 910C accelerators and an "Ascend + CANN" ecosystem. It is designed as a fully self-controlled computing base for large model training and inference, supported by localized data, integrated operations, and centralized scheduling. Officials also position it within a broader national computing network strategy rather than as a stand-alone municipal project.

Early demand supports that positioning. Phase one was fully allocated, while nearly 50 companies, universities, and research institutes signed agreements for second-phase capacity. With an overall utilization rate of 92%, the constraint is shifting from demand to whether reliable compute can be delivered at scale.

Beyond scale: coordination defines performance

The cluster's significance lies less in card count than in how those resources are organized.

The system deploys roughly 14,000 Ascend 910C accelerators in a supernode architecture, rather than a conventional stack of smaller AI servers. In traditional designs, scaling card count increases inter-node communication overhead, raising latency, fragmenting utilization, and reducing training efficiency.

The architecture groups compute into dense supernodes linked by a domestic high-speed interconnect and coordinated through a unified scheduling layer. By localizing communication within nodes and reducing cross-cluster traffic, the design aims to improve efficiency at scale while addressing three core constraints: communication bottlenecks, engineering complexity, and fault management.

This systems-level approach is reflected in reported performance metrics. The initial 3,000P deployment recorded an average daily failure rate of 0.3‰, while training linearity on the Pangu-718B model reached 93.12%. A reported PUE of 1.08 also highlights a parallel focus on energy efficiency.
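A back-of-envelope reading puts these figures in context. The sketch below assumes the standard definitions, which the project materials do not spell out: training linearity as actual speedup divided by ideal speedup, and the daily failure rate as a per-card probability; the function names and the 10,000-card count are taken from the article, everything else is illustrative.

```python
# Back-of-envelope reading of the reported cluster metrics.
# Assumes standard definitions of linearity and per-card failure rate,
# which the project disclosures do not explicitly confirm.

def effective_cards(total_cards: int, linearity: float) -> float:
    """Ideal-equivalent cards delivered, if linearity = actual / ideal speedup."""
    return total_cards * linearity

def expected_daily_failures(total_cards: int, rate_per_mille: float) -> float:
    """Expected card failures per day at a per-card daily rate given in per-mille."""
    return total_cards * rate_per_mille / 1000

cards = 10_000  # the "10,000-card" cluster
print(effective_cards(cards, 0.9312))        # ~9,312 ideal-equivalent cards
print(expected_daily_failures(cards, 0.3))   # ~3 expected card failures per day
```

On this reading, 93.12% linearity means the 10,000-card cluster behaves like roughly 9,300 perfectly scaled cards, and a 0.3‰ daily failure rate implies on the order of three card-level faults per day that the scheduling layer must isolate without interrupting training.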

How the domestic stack is put together

While some technical details read more like engineering briefings than independently verified disclosures, the broader picture is clear: Shenzhen is positioning the project as a rare full-stack domestic AI infrastructure system spanning all major layers — not just accelerators.

At the core is Huawei's Ascend 910C accelerator. Surrounding it is a domestically built stack: high-density servers designed for supernode deployment, a proprietary high-speed interconnect marketed as Ascend Fabric (often referred to as Xinghe AI Fabric in its latest iterations), distributed storage combining local NVMe and parallel systems, and a software layer built on Ascend Intelligent Computing Platform with a distributed scheduling engine for resource orchestration and fault isolation.

Crucially, the project is not built around a single supplier. It draws on a broader domestic ecosystem spanning CPUs, memory, power systems, and infrastructure vendors, reinforcing China's push to localize not just chips, but the entire AI supply chain.

China-made by layers

| Layer | Main brand/supplier | Product/system | Role in the cluster |
|---|---|---|---|
| AI accelerator | Huawei | Ascend 910C | Core training accelerator for the 10,000-card cluster |
| CPU | Phytium (FeiTeng) / Hygon | FT-3000 / Hygon 7390 | General-purpose compute and system control |
| Server/motherboard | Huawei and domestic ODMs | Ascend server boards | High-density server integration for supernode deployment |
| Network | Huawei and domestic vendors | Ascend Fabric, 400G/800G RDMA switches | High-speed interconnect for large-scale distributed training |
| Storage media | YMTC / CXMT | NVMe SSD / memory components | Local cache and storage support for model training |
| Power systems | Huawei / Hangzhou Zhongheng Electric | Integrated power modules | Power delivery for dense AI infrastructure |
| Cooling/cabinet | Shenzhen-based local suppliers | Custom liquid-cooled racks | Thermal management and high-density enclosure design |
| Software platform | Huawei | Ascend Intelligent Computing Platform | Cluster management, resource control and monitoring |
| Scheduling layer | Proprietary domestic software | Distributed scheduling engine | Job scheduling, topology-aware allocation and fault handling |

Source: Nstipsp.com, compiled by DIGITIMES, April 2026

Performance meets economics

Large AI clusters often fail to convert scale into usable compute. As systems grow, communication overhead increases, faults accumulate, and power costs escalate. Shenzhen's project is designed to address these constraints directly.

Project metrics focus on three areas: linearity, reliability, and energy efficiency. The reported training linearity of 93.12% for the Pangu-718B model suggests performance scales with cluster size, while a 0.3‰ daily failure rate addresses concerns over long-duration training stability. A reported PUE of 1.08 points to aggressive optimization in cooling and power systems, including full liquid cooling and integrated energy management.
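The PUE figure can be translated into an overhead share using its standard definition (total facility energy divided by IT equipment energy). The comparison value of 1.5 below is an illustrative figure for a conventional air-cooled data center, not a number from the article.

```python
# PUE = total facility energy / IT equipment energy (standard definition).
# A PUE of 1.08 means only a small slice of energy goes to cooling and
# power conversion rather than compute.

def overhead_fraction(pue: float) -> float:
    """Share of total facility energy consumed by non-IT systems."""
    return 1 - 1 / pue

print(round(overhead_fraction(1.08), 4))  # ~7.4% overhead at the reported PUE
print(round(overhead_fraction(1.5), 4))   # ~33% at an illustrative air-cooled baseline
```

At a PUE of 1.08, roughly 7.4% of the facility's energy is spent outside the IT load itself, which is consistent with the article's account of full liquid cooling and integrated energy management.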

These metrics underpin the project's commercial positioning. Beyond sovereignty, Shenzhen is pitching the cluster as cost-efficient, reliable AI infrastructure capable of attracting model developers, robotics firms, research institutions, and enterprise users.

A test for China's AI infrastructure

The implications extend beyond Shenzhen. China's AI race is no longer defined by chip design alone, but by whether it can build a complete training environment — spanning compute, networking, storage, orchestration, and operations — that developers will adopt at scale.

If validated in real-world deployment, the cluster could serve as a reference point. It would signal a shift from component substitution to systems-level engineering across China's AI stack, with implications for large models, AI for Science, robotics, and autonomous driving — all of which depend on sustained, high-availability compute.

The next phase will test execution. Operators plan to expand capacity, integrate resources into a unified platform, and support both training and inference workloads. If Shenzhen can sustain utilization, improve software compatibility, and scale without compromising reliability, the project could signal a broader shift: turning AI compute from a constraint into a controllable industrial capability.

Article edited by Jerry Chen