Nvidia has just announced its HGX-2 cloud server platform, which it claims powers “the fastest single computer humanity has ever created”.
It combines 16 Tesla V100 GPUs, which work together as one giant virtual GPU with half a terabyte of GPU memory and two petaflops of compute power. This is achieved using Nvidia's NVSwitch interconnect fabric, which links all 16 GPUs so they can operate as one.
The announcement was made at Nvidia’s GTC (GPU Technology Conference) event in Taiwan, which is a bit of an appetizer before the main course of Computex 2018 arrives next week.
While the HGX-2 certainly has some mind-blowing specifications, it won’t be used in standard computers. Instead, it is built for servers, where it can handle high-precision calculations using FP64 and FP32 for scientific computing and simulations, as well as FP16 and Int8 for AI training.
Jensen Huang, founder and chief executive officer of Nvidia, said at GTC: “The world of computing has changed […] CPU scaling has slowed at a time when computing demand is skyrocketing. NVIDIA’s HGX-2 with Tensor Core GPUs gives the industry a powerful, versatile computing platform that fuses HPC and AI to solve the world’s grand challenges.”
According to Nvidia, the HGX-2 has achieved record AI training speeds of 15,500 images per second on the ResNet-50 training benchmark, and is powerful enough to replace up to 300 CPU-only servers.
At Computex 2017, Nvidia announced the HGX-1, which proved pretty popular with companies that rely on massive datacenters, such as Facebook and Microsoft.
Nvidia has high hopes for HGX-2 as well, with some major businesses, including Lenovo, QCT, Supermicro, Foxconn and Wiwynn, announcing plans to launch HGX-2 systems this year.
According to Paul Ju, vice president and general manager of Lenovo DCG, “NVIDIA’s HGX-2 ups the ante with a design capable of delivering two petaflops of performance for AI and HPC-intensive workloads. With the HGX-2 server building block, we’ll be able to quickly develop new systems that can meet the growing needs of our customers who demand the highest performance at scale.”
The HGX-2 is powered by the Nvidia Tesla V100 GPU, which comes equipped with 32GB of high-bandwidth memory and delivers 125 teraflops of deep learning performance. Combining 16 of those GPUs is what gives the platform its 512GB of memory and two petaflops of compute power.
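For anyone who wants to check the headline numbers, here is a quick back-of-the-envelope sketch in Python. It assumes the platform totals are simply the per-GPU figures quoted in this article multiplied by 16, which is a simplification, and the function and variable names are ours for illustration, not anything from Nvidia.

```python
# Per-GPU figures quoted in the article for the Tesla V100 (32GB variant).
GPU_COUNT = 16
MEMORY_PER_GPU_GB = 32
DEEP_LEARNING_TFLOPS_PER_GPU = 125


def aggregate_specs(gpu_count, memory_gb, tflops):
    """Return (total memory in GB, total deep learning petaflops).

    Assumes totals scale linearly with the number of GPUs.
    """
    total_memory_gb = gpu_count * memory_gb
    total_petaflops = gpu_count * tflops / 1000  # 1 petaflop = 1,000 teraflops
    return total_memory_gb, total_petaflops


total_memory_gb, total_petaflops = aggregate_specs(
    GPU_COUNT, MEMORY_PER_GPU_GB, DEEP_LEARNING_TFLOPS_PER_GPU
)
print(f"{total_memory_gb} GB of GPU memory, {total_petaflops} petaflops")
```

The result lines up with the half a terabyte of memory and two petaflops that Nvidia quotes for the platform.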
“Every one of the GPUs can talk to every one of the GPUs simultaneously at a bandwidth of 300 GB/s, 10 times PCI Express,” Huang said, “so everyone can talk to each other all at the same time.”
Nvidia also showed off the DGX-2, the first system built on the HGX-2 server platform, which comes with two petaflops of computing power and 512GB of HBM2 memory.
According to Huang, “this is the fastest single computer humanity has ever created”. Pretty exciting stuff.