NVIDIA is allegedly working on a brand new Hopper H100 GPU-based graphics card that would feature up to 120 GB HBM2e memory capacity.
NVIDIA Hopper H100 GPU-Powered PCIe Graphics Card With 120 GB HBM2e Memory Capacity Spotted
NVIDIA has so far officially announced two versions of the Hopper H100 GPU: an SXM5 board and a PCIe variant. The two feature differently configured GH100 GPUs, and while their VRAM capacity is the same at 80 GB, the former utilizes the brand new HBM3 standard while the latter uses HBM2e.
Now, based on information from s-ss.cc (via MEGAsizeGPU), NVIDIA might be working on a brand new PCIe version of the Hopper H100 GPU. The new graphics card won't be limited to 80 GB of HBM2e but will go all out with 120 GB of HBM2e memory.
As per the information available, the Hopper H100 PCIe graphics card not only comes with all six HBM2e stacks enabled, for 120 GB of memory across a 6144-bit bus interface, but it also carries the same GH100 GPU configuration as the SXM5 variant: a total of 16,896 CUDA cores and memory bandwidth exceeding 3 TB/s. Single-precision compute performance is reportedly rated at 30 TFLOPs, the same as the SXM5 variant.
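Back-of-the-envelope, the bus width and bandwidth figures hang together: each HBM2e stack exposes a 1024-bit interface, so six stacks yield the reported 6144-bit bus. The per-pin data rate below is an assumed value chosen purely to illustrate how such a bus could clear 3 TB/s; it is not a confirmed specification.

```python
# Sketch of the memory math behind the rumored 120 GB H100 PCIe card.
# The ~4 Gbps pin rate is an assumption for illustration only.

HBM2E_STACK_BUS_BITS = 1024  # each HBM2e stack has a 1024-bit interface

stacks = 6                                     # all six stacks enabled
bus_width = stacks * HBM2E_STACK_BUS_BITS      # 6144-bit total bus
pin_rate_gbps = 4.0                            # assumed per-pin data rate
bandwidth_gbs = bus_width * pin_rate_gbps / 8  # bits -> bytes

print(bus_width)      # 6144
print(bandwidth_gbs)  # 3072.0 GB/s, i.e. just over 3 TB/s
```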
So coming to the specifications, the NVIDIA Hopper GH100 GPU is composed of a massive 144-SM (Streaming Multiprocessor) chip layout featured in a total of 8 GPCs. These GPCs rock 9 TPCs each, which are further composed of 2 SM units apiece. This gives us 18 SMs per GPC and 144 SMs in the complete 8-GPC configuration. Each SM is composed of up to 128 FP32 units, which gives a total of 18,432 CUDA cores. Following are some of the configurations you can expect from the H100 chip:
The full implementation of the GH100 GPU includes the following units:
- 8 GPCs, 72 TPCs (9 TPCs/GPC), 2 SMs/TPC, 144 SMs per full GPU
- 128 FP32 CUDA Cores per SM, 18432 FP32 CUDA Cores per full GPU
- 4 Fourth-Generation Tensor Cores per SM, 576 per full GPU
- 6 HBM3 or HBM2e stacks, 12 512-bit Memory Controllers
- 60 MB L2 Cache
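The unit counts above are straightforward products of the GPC/TPC/SM hierarchy; a minimal sketch of the arithmetic:

```python
# GH100 full-chip unit counts derived from the hierarchy described above.

gpcs = 8
tpcs_per_gpc = 9
sms_per_tpc = 2
fp32_per_sm = 128
tensor_per_sm = 4

tpcs = gpcs * tpcs_per_gpc          # 72 TPCs
sms = tpcs * sms_per_tpc            # 144 SMs
cuda_cores = sms * fp32_per_sm      # 18,432 FP32 CUDA cores
tensor_cores = sms * tensor_per_sm  # 576 fourth-gen Tensor Cores
```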
The NVIDIA H100 GPU with SXM5 board form-factor includes the following units:
- 8 GPCs, 66 TPCs, 2 SMs/TPC, 132 SMs per GPU
- 128 FP32 CUDA Cores per SM, 16896 FP32 CUDA Cores per GPU
- 4 Fourth-generation Tensor Cores per SM, 528 per GPU
- 80 GB HBM3, 5 HBM3 stacks, 10 512-bit Memory Controllers
- 50 MB L2 Cache
- Fourth-Generation NVLink and PCIe Gen 5
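The peak FP32 figure in the spec table below follows from the usual cores × 2 FLOPs (one fused multiply-add) × clock formula; a sketch assuming a ~1.98 GHz boost clock, which is an illustrative figure rather than a confirmed clock for this part:

```python
# Peak FP32 throughput: each CUDA core retires one fused multiply-add
# (2 FLOPs) per clock. The 1.98 GHz boost clock is an assumed figure.

def fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    return cuda_cores * 2 * boost_ghz / 1000

print(round(fp32_tflops(16896, 1.98)))  # ~67 TFLOPs for the 132-SM config
```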
Now, it is unknown whether this is a test board or a future iteration of the Hopper H100 GPU being evaluated. NVIDIA recently stated at GTC 22 that its Hopper GPU is now in full production and that the first wave of products will roll out next month. As yields improve, we may well see 120 GB Hopper H100 PCIe and SXM5 variants in the market, but for now, the 80 GB model is what most customers are going to get.
NVIDIA HPC / AI GPUs
NVIDIA Tesla Graphics Card | NVIDIA B200 | NVIDIA H200 (SXM5) | NVIDIA H100 (SXM5) | NVIDIA H100 (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla K40 (PCI-Express) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GPU | B200 | H200 (Hopper) | H100 (Hopper) | H100 (Hopper) | A100 (Ampere) | A100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
Process Node | 4nm | 4nm | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
Transistors | 208 Billion | 80 Billion | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
GPU Die Size | TBD | 814mm2 | 814mm2 | 814mm2 | 826mm2 | 826mm2 | 815mm2 | 815mm2 | 610 mm2 | 610 mm2 | 601 mm2 | 551 mm2 |
SMs | 160 | 132 | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
TPCs | 80 | 66 | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
L2 Cache Size | TBD | 51200 KB | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
FP32 CUDA Cores Per SM | TBD | 128 | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
FP64 CUDA Cores / SM | TBD | 64 | 64 | 64 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
FP32 CUDA Cores | TBD | 16896 | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
FP64 CUDA Cores | TBD | 8448 | 8448 | 7296 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
Tensor Cores | TBD | 528 | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
Texture Units | TBD | 528 | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
Boost Clock | TBD | ~1850 MHz | ~1850 MHz | ~1650 MHz | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329MHz | 1114 MHz | 875 MHz |
TOPs (DNN/AI) | 20,000 TOPs | 3958 TOPs | 3958 TOPs | 3200 TOPs | 2496 TOPs | 2496 TOPs | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
FP16 Compute | 10,000 TFLOPs | 1979 TFLOPs | 1979 TFLOPs | 1600 TFLOPs | 624 TFLOPs | 624 TFLOPs | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
FP32 Compute | 90 TFLOPs | 67 TFLOPs | 67 TFLOPs | 51 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
FP64 Compute | 45 TFLOPs | 34 TFLOPs | 34 TFLOPs | 26 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
Memory Interface | 8192-bit HBM3e | 5120-bit HBM3e | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
Memory Size | Up To 192 GB HBM3e @ 8.0 Gbps | Up To 141 GB HBM3e @ 6.5 Gbps | Up To 80 GB HBM3 @ 5.2 Gbps | Up To 94 GB HBM2e @ 5.1 Gbps | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s, 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
TDP | 700W | 700W | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |