NVIDIA Volta is being prepped for launch in the next generation supercomputers known as Summit and Sierra. Little is known about the Volta GPU's specifications, but an analysis done by NextPlatform of the Summit supercomputer's details reveals that it could be an insanely fast chip, capable of delivering multi-TFLOPs of compute power to the HPC market.
NVIDIA Volta GV100 GPUs - The Heart of the Summit and Sierra Supercomputers, a Multi-TFLOPs Chip With the Fastest HBM2 Configuration
When NVIDIA announced their Pascal GP100 GPU at GTC 2016, they called it the largest chip endeavor in the history of humanity. With an R&D budget of several billion dollars, Pascal GP100 was indeed the grandest chip of 2016, aimed at powering the HPC and datacenter markets with performance never before seen in the graphics industry. NVIDIA also utilized Pascal GP100 GPUs inside their own DGX SaturnV supercomputer, which is designed to help them build smarter chips and next generation GPUs (GPUs designing GPUs).
Just a year after their successful Pascal launch in the HPC market, NVIDIA is planning to introduce their next grand chip for that market, codenamed Volta. Details of the chip first emerged back at GTC 2015, where NVIDIA showcased what they predicted would be the estimated performance output of their upcoming chips. Do note that Pascal had not launched at that time. According to the slides presented that day, Volta would have double everything Pascal has: double the memory capacity, double the compute, higher efficiency, and faster bandwidth.
We aren't sure how much of that will end up being true, but NVIDIA's estimates for Pascal were close to the final product (if not entirely the same). The only thing Pascal currently lacks is the promised 32 GB capacity, but that's mostly down to HBM2 production, which has since ramped up, so we can expect a full GP100 configuration with 32 GB since the chip design allows for it. In short, the VRAM limitation is due to production, not the chip design.
Summit Supercomputer Latest Details Provide First Glimpse of NVIDIA Volta GV100 GPU Specs
The latest details for the Summit supercomputer have been confirmed, and they are incredible from an HPC perspective. Summit promises a 5-10x improvement in application performance over the Titan supercomputer, which featured the Kepler GK110 GPU architecture. Titan comprised 18,688 nodes rated at 1.4 TF each, while Summit features around 4,600 nodes with a rated compute output of over 40 TF per node.
There's 512 GB of DDR4 plus additional HBM2 memory on each node. Titan, in comparison, had just 38 GB of DDR3 and 6 GB of GDDR5 (per GPU) on each node. There's also a total of 800 GB of non-volatile (NV) memory per node. In total, the memory on the Titan supercomputer was 710 TB, while Summit peaks at over 6 petabytes (all DDR4, HBM2, and non-volatile memory combined).
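As a rough cross-check of that petabyte figure, the per-node memory can be totaled up in a few lines. Assumptions not confirmed by the source: 16 GB of HBM2 per GPU (the spec sheet lists 16/32 GB) and six GPUs per node:

```python
# Rough sanity check of Summit's total memory figure.
# Assumptions: 4,608 nodes, 6 GPUs per node, 16 GB HBM2 per GPU
# (the HBM2-per-GPU figure is a guess; the spec sheet says 16/32 GB).
nodes = 4608
ddr4_gb = 512            # DDR4 per node
hbm2_gb = 6 * 16         # HBM2 per node (6 GPUs x 16 GB, assumed)
nv_gb = 800              # non-volatile flash per node

total_pb = nodes * (ddr4_gb + hbm2_gb + nv_gb) / 1e6  # 1 PB = 10^6 GB
print(f"~{total_pb:.2f} PB")  # ~6.49 PB, in line with "over 6 petabytes"
```

With 32 GB HBM2 stacks instead, the total only rises by about 0.4 PB, so the "over 6 PB" claim holds either way.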
> The Power9 chips will have 48 lanes of PCI-Express 4.0 peripheral I/O per socket, for an aggregate of 192 GB/sec of duplex bandwidth, as well as 48 lanes of 25 Gb/sec "Bluelink" connectivity, with an aggregate bandwidth of 300 GB/sec for linking various kinds of accelerators. These Bluelink ports are used to run the NVLink 2.0 protocol that will be supported on the Volta GPUs from Nvidia, and which have about 56 percent more bandwidth than the PCI-Express ports. IBM could support a lot of the SXM2-style, on-motherboard Tesla cards in a system, given all of these Bluelink ports, but remember it needs to allow the Volta accelerators to link to each other over NVLink so they can share memory as well as using NVLink to share memory back with the two Power9 chips. - via The Next Platform
Each node will house two IBM Power9 CPUs and six NVIDIA Volta V100 GPUs, fully interconnected with NVIDIA's NVLink 2.0 within each node. The system would consume 13 MW at peak, just 4 MW more than the Titan supercomputer (9 MW), for up to 10x the application performance.
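The "56 percent more bandwidth" figure in the quote above follows directly from the two aggregate numbers, 300 GB/s of Bluelink versus 192 GB/s of PCIe 4.0:

```python
# Verify the quoted NVLink 2.0 (Bluelink) advantage over PCIe 4.0.
bluelink_gbs = 300   # GB/s aggregate, 48 lanes at 25 Gb/s
pcie4_gbs = 192      # GB/s aggregate duplex, 48 lanes of PCIe 4.0

advantage_pct = (bluelink_gbs / pcie4_gbs - 1) * 100
print(f"~{advantage_pct:.0f}% more bandwidth")  # ~56% more bandwidth
```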
NVIDIA Volta Tesla V100 - The Next-Generation Compute Powerhouse
NVIDIA previously stated through their roadmaps that Volta GV100 GPUs will deliver SGEMM (single-precision general matrix multiply) efficiency of 72 GFLOPs/Watt, compared to 42 GFLOPs/Watt on Pascal GP100. Using that ratio, a Volta GV100 based GPU with a TDP of 300W could theoretically deliver around 9.5 TFLOPs of double precision performance, almost twice that of the current generation GP100 GPU. NVIDIA's Tesla P100 cards also ship at 300W, but Summit's nodes are expected to deliver around 40 TFLOPs each, so it is possible that NVIDIA may use TDP-configured (power-capped) variants for the Summit supercomputer.
Six Volta V100 GPUs at the rated 300W TDP would go beyond the 40 TF node target, delivering around 57.2 TFLOPs per node, which is more than the Summit spec sheet claims. A geared-down version running at a TDP of around 200W would deliver 20-25% lower performance: 7.6 TFLOPs per GPU at 38.2 GFLOPs/Watt, which aligns with the Summit node specs.
Six of these geared-down Volta Tesla V100 GPUs would put a node at around 45 TFLOPs of compute, which sounds more plausible. There's also the possibility that the final double precision compute of Volta V100 ends up near 8-9 TFLOPs, which would be an impressive feat for the graphics manufacturer.
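The per-node arithmetic above can be sketched from the article's rounded estimates (the per-GPU TFLOPs values are the text's own projections, not confirmed specs; the ~57.2 TF figure implies a per-GPU number slightly above the rounded 9.5 used here):

```python
# Per-node double precision estimates from the efficiency figures above.
# All per-GPU numbers are the article's estimates, not official specs.
gpus_per_node = 6
dp_at_300w = 9.5   # TFLOPs per GPU at the full 300W TDP (estimate)
dp_at_200w = 7.6   # TFLOPs per GPU at a ~200W configured TDP (estimate)

print(f"{gpus_per_node * dp_at_300w:.1f} TF/node at 300W per GPU")
print(f"{gpus_per_node * dp_at_200w:.1f} TF/node at ~200W per GPU")
print(f"{dp_at_200w * 1000 / 200:.1f} GFLOPs/Watt at ~200W")
```

The 300W case overshoots the 40 TF node target by a wide margin, while the ~200W case lands near 45 TF, which is why a power-capped variant looks plausible.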
Summit Supercomputer Specifications:
Supercomputer | Titan | Summit |
---|---|---|
Number of Nodes | 18688 | 4608 |
Processors | 1 Opteron 1 Kepler K20X | 2 IBM Power9 6 NVIDIA Tesla V100 |
GPUs | 18688 NVIDIA Tesla K20X | 27648 NVIDIA Tesla V100 |
CPUs | 18688 Opteron CPUs | 9216 Power9 CPUs |
Node Performance | 1.44 TF | 49 TF |
Peak Performance | 27 PF | 200 PF |
Peak OPs (Tensor) | N/A | 3.3 ExaOps |
Memory Per Node | 38 GB DDR3 + 6 GB GDDR5 | 512 GB DDR4 + HBM2 (16/32 GB) + NVDIMM |
NV Memory Per Node | 0 | 800 GB (Flash based) |
Total System Memory | 710 TB | 10 PB |
System Interconnect | Gemini (6.4 GB/s) PCIe 8 GB/s | Dual Rail EDR-IB (23 GB/s) / Dual Rail HDR-IB (48 GB/s) NVLINK 300 GB/s |
Interconnect Topology | 3D Torus | Non-Blocking Fat Tree |
File System | 32 PB, 1 TB/s Lustre | 250 PB, 2.5 TB/s, GPFS |
Peak Power Input | 9 MW | 13 MW |
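The spec sheet's GPU and CPU totals follow directly from the per-node configuration, which a quick check confirms:

```python
# Cross-check Summit's totals against the per-node configuration:
# 4,608 nodes, each with 2 Power9 CPUs and 6 Tesla V100 GPUs.
nodes = 4608
total_gpus = nodes * 6
total_cpus = nodes * 2

print(total_gpus, total_cpus)  # 27648 9216, matching the table
```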
Furthermore, Volta GV100 may ship with, or even exceed, the 32 GB HBM2 capacity promised for Pascal GPUs, with bandwidth tuned to around 1 TB/s. NVIDIA slides from GTC 2015 claim bandwidths of ~900 GB/s, while Pascal currently operates at 732 GB/s.
The Looming Memory Crisis With HBM2
Explaining next generation GPU architectures and efficiency further, Stephen W. Keckler (Senior Director of GPU Architecture) pointed out that HBM is a great memory architecture that will be implemented across Pascal and Volta chips, with a maximum bandwidth of 1.2 TB/s (on the Volta GPU). Moving forward, however, there exists a looming memory power crisis. HBM2 at 1.2 TB/s sure is great, but it adds 60W to the power envelope of a standard GPU.
The current implementation of HBM1 on Fiji chips adds around 25W to the chip. Moving onward, chips with in excess of 2 TB/s of bandwidth will push the overall power budget from bad to breaking point. A chip with 2.5 TB/s of HBM (2nd generation) memory would need 120W for the memory architecture alone, and even an HBM2 architecture 1.5 times more efficient, outputting over 3 TB/s of bandwidth, would need 160W just to feed the memory.
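Those memory-power figures imply a roughly constant energy cost per bit moved, which can be back-calculated from the 1.2 TB/s / 60W data point. This is a back-of-envelope sketch, not an official NVIDIA figure:

```python
# Implied DRAM interface energy from "1.2 TB/s adds 60W", then
# projected to 2.5 TB/s at the same energy cost per bit.
watts = 60.0
bits_per_s = 1.2e12 * 8                      # 1.2 TB/s in bits/s

pj_per_bit = watts / bits_per_s * 1e12       # picojoules per bit
watts_at_2p5 = 2.5e12 * 8 * pj_per_bit / 1e12

print(f"{pj_per_bit:.2f} pJ/bit -> ~{watts_at_2p5:.0f} W at 2.5 TB/s")
```

The projection lands within a few watts of the 120W figure cited above, suggesting the quoted numbers do assume a near-constant energy per bit unless the memory architecture itself becomes more efficient.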
To be clear, that is not the power of the whole chip, just the memory subsystem. Such chips would typically be considered inefficient for both the consumer and HPC sectors, but NVIDIA is trying to change that and is exploring new ways to solve the memory power crisis that lies ahead with HBM and ever-higher bandwidths. In the near term, Pascal and Volta won't see a major consumption increase from HBM, but further out in 2020, when NVIDIA's next generation architecture is expected to arrive, we will probably see a new memory architecture introduced to address the increased power needs.
GPU Family | AMD Vega | AMD Navi | NVIDIA Pascal | NVIDIA Volta |
---|---|---|---|---|
Flagship GPU | Vega 10 | Navi 10 | NVIDIA GP100 | NVIDIA GV100 |
GPU Process | 14nm FinFET | 7nm FinFET | TSMC 16nm FinFET | TSMC 12nm FinFET |
GPU Transistors | 15-18 Billion | TBC | 15.3 Billion | 21.1 Billion |
GPU Cores (Max) | 4096 SPs | TBC | 3840 CUDA Cores | 5376 CUDA Cores |
Peak FP32 Compute | 13.0 TFLOPs | TBC | 12.0 TFLOPs | >15.0 TFLOPs (Full Die) |
Peak FP16 Compute | 25.0 TFLOPs | TBC | 24.0 TFLOPs | 120 Tensor TFLOPs |
VRAM | 16 GB HBM2 | TBC | 16 GB HBM2 | 16 GB HBM2 |
Memory (Consumer Cards) | HBM2 | HBM3 | GDDR5X | GDDR6 |
Memory (Dual-Chip Professional/ HPC) | HBM2 | HBM3 | HBM2 | HBM2 |
HBM2 Bandwidth | 484 GB/s (Frontier Edition) | >1 TB/s? | 732 GB/s (Peak) | 900 GB/s |
Graphics Architecture | Next Compute Unit (Vega) | Next Compute Unit (Navi) | 5th Gen Pascal CUDA | 6th Gen Volta CUDA |
Successor of (GPU) | Radeon RX 500 Series | Radeon RX 600 Series | GM200 (Maxwell) | GP100 (Pascal) |
Launch | 2017 | 2019 | 2016 | 2017 |
With the final configuration of Volta V100 GPUs and IBM Power9 CPUs in place, the Summit supercomputer would rank as the top performing machine in the world, with peak performance around the 200 Petaflops mark listed in the spec sheet.