
NVIDIA prepares for next-generation AI computing

DIGITIMES staff

NVIDIA, with its pending acquisition of Arm and its keen push into AI applications, has been aggressively releasing innovations to improve datacenters' computing capability for AI workloads.

During Computex 2021, NVIDIA unveiled several new hardware and software offerings for datacenter servers, including NVIDIA BlueField-2 DPUs, NVIDIA-Certified Systems and NVIDIA Base Command Platform.

The BlueField-2 is a data processing unit, offering high-speed networking, programmable Arm cores, hardware-accelerated encryption/decryption and additional offloads for networking, security and storage.

As for how the BlueField-2 DPU can boost server performance, NVIDIA's Head of Enterprise Computing Manuvir Das explained that some server CPUs currently spend up to around 30% of their capacity on routine system administration tasks instead of running applications.

The BlueField-2 DPU is designed to take over this infrastructure work and free up the CPU resources it occupies. With the DPUs in place, companies need fewer servers to handle the same workload, since each server is able to do more work.
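The arithmetic behind that claim can be sketched roughly as follows. This is an illustration only: the function name, the fleet size and the assumption that the DPU offloads the entire overhead are hypothetical, and the 30% figure is the upper bound Das cited for some CPUs rather than a measured average.

```python
def servers_needed(app_demand_pct: int, infra_overhead_pct: int) -> int:
    """Servers required to carry a given application demand.

    app_demand_pct:     total application work, in percent-of-one-CPU units
                        (hypothetical unit chosen to keep the math in integers)
    infra_overhead_pct: share of each server's CPU spent on infrastructure tasks
    """
    usable_pct = 100 - infra_overhead_pct
    if usable_pct <= 0:
        raise ValueError("overhead must leave some CPU capacity for applications")
    # Integer ceiling division: round up to a whole number of servers.
    return -(-app_demand_pct // usable_pct)


# Hypothetical fleet whose applications need the equivalent of 700 full CPUs.
demand = 70_000

before = servers_needed(demand, 30)  # 30% of each CPU lost to infrastructure work
after = servers_needed(demand, 0)    # assumes the DPU absorbs all of that work

print(before, after)
```

Under these assumptions the same application load fits on 700 servers instead of 1,000. Real savings would be smaller wherever the overhead is below 30% or only partly offloaded.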

It also works differently from accelerator cards. Compared with GPU-based accelerator cards, which are used to accelerate application calculations, the DPU primarily accelerates networking between servers and storage as well as servers' I/O activities.

This networking acceleration is also an advantage of the BlueField DPU, as most competing products on the market simply move the workload to the chip without accelerating it.

The DPU's Arm-based CPU cores and dedicated silicon also play key roles in accelerating some parts of the network traffic, noted Das, adding that NVIDIA is seeking to integrate its DPU and GPU into one chip in the long term.

NVIDIA has partnered with many server companies including ASUS, GIGABYTE, Quanta Cloud Technology (QCT), Dell Technologies and Supermicro to release server products equipped with the DPUs.

Since the DPU's job is mainly to help take workload off servers' CPUs, the hardware can be applied to servers for all kinds of industries and applications.

However, server security workloads can particularly benefit from the DPU. With security growing more important to the cloud computing industry, the DPU is able to offload this resource-intensive task from CPUs.

For NVIDIA-Certified Systems, the company has cooperated with server vendors to create a template for server architectures. To be certified, vendors need to set up and configure NVIDIA's hardware correctly in their servers and pass benchmark tests, Das noted.

With the NVIDIA certification, buyers of certified servers are assured of AI computing performance. They are also able to obtain support from NVIDIA on any performance issues with their software, as the chance of hardware issues has already been minimized by following the certification guidelines.

NVIDIA is also planning to expand the certification services to cover Arm-based servers in 2022. For AI applications, most of the workload is handled by servers' GPUs, so vendors usually prefer to equip their systems with lighter CPUs to cut down power consumption. Arm-based processors, which are known for their low power usage, are well suited to building such a solution.

NVIDIA has received requests from vendors for assistance in creating Arm-based server solutions. Though the x86 architecture is still the mainstream of the server industry at the moment, NVIDIA still tries to satisfy customers' demand for niche applications, Das stated.

NVIDIA's Base Command Platform is designed for large-scale, multi-user and multi-team AI development workflows hosted either on premises or in the cloud. It enables numerous researchers and data scientists to simultaneously work on accelerated computing resources, helping enterprises maximize the productivity of both their expert developers and their valuable AI infrastructure.

Base Command Platform was originally a system used internally by NVIDIA's engineering teams to share their latest content and work progress among data scientists on AI projects.

With the platform maturing and clients also asking for such a system, NVIDIA has decided to make it available to customers.

NVIDIA is responsible for maintaining Base Command Platform's software and hardware infrastructure, while subscriptions and promotion of the service will be handled by NetApp, Das said.

Asked what the barriers to AI adoption at mainstream enterprises are today, Das pointed out that a key challenge NVIDIA is seeing at the moment is coordination between data scientists, who are experts in AI training and models, and IT engineers, who specialize in traditional applications such as SAP and VMware. NVIDIA is hoping Base Command Platform will help smooth out the issue.

As AI computing servers become more popular in datacenters worldwide, Das expects volumes of NVIDIA-Certified servers to be a good indication of the penetration of AI-ready hardware. At the moment, most datacenter operators still acquire servers and GPU accelerator cards separately, but the certification program should boost datacenter operators' demand for complete NVIDIA-Certified server sets in the years to come.

However, since server replacement cycles usually run a couple of years, it may still take several years of replacements before more mainstream datacenter servers are able to run AI computing.

Das also shared the latest progress on NVIDIA's projects with VMware, including vSphere and Project Monterey. VMware in March updated vSphere with support for the NVIDIA AI Enterprise software suite, but at the time, NVIDIA was only able to provide an early access version of its software to customers.

This summer, VMware will bring another update to vSphere, and this time, NVIDIA will be able to provide the first generally available version of the AI Enterprise software suite, allowing customers to move their development to production.

Project Monterey is a collaboration in which NVIDIA is working with VMware to run VMware's software on the BlueField DPU. Since VMware's hypervisor currently runs entirely on CPUs, the project aims to have the DPU handle the hypervisor to reduce the CPUs' workload.

See a replay of NVIDIA COMPUTEX 2021 Keynote to get more details of NVIDIA Enterprise AI announcements.

NVIDIA's Head of Enterprise Computing Manuvir Das
