NVIDIA Hopper H100 GPU Is Even More Powerful In Latest Specifications, Up To 67 TFLOPs Single-Precision Compute Listed

Hassan Mujtaba
NVIDIA Kepler GK110 GPU Is Equivalent To A Single GPC on Hopper H100 GPU, 4th Gen Tensor Cores Up To 2x Faster

NVIDIA has published the official specifications of its Hopper H100 GPU which is more powerful than what we had expected.

NVIDIA Hopper H100 GPU Specs Updated, Now Features Even Faster 67 TFLOPs FP32 Compute Horsepower

When NVIDIA announced its Hopper H100 GPU for AI datacenters earlier this year, the company published figures of up to 60 TFLOPs FP32 and 30 TFLOPs FP64. As the launch draws close, the company has updated the specifications to reflect real-world expectations, and as it turns out, the flagship and fastest chip for the AI segment is now even faster.


One reason the compute numbers have seen a boost is that the GPU manufacturer can only finalize them based on actual clock speeds once the chip enters production. NVIDIA likely used conservative clock figures for its preliminary performance numbers, and as production hit full swing, the company found the chip could sustain much better clocks.

Last month at GTC, NVIDIA confirmed that its Hopper H100 GPU was in full production and that partners would roll out the first wave of products in October this year. It was also confirmed that the global rollout for Hopper will include three phases: the first will be pre-orders for NVIDIA DGX H100 systems and free hands-on labs for customers directly from NVIDIA, with systems such as Dell's PowerEdge servers now available on NVIDIA LaunchPad.

NVIDIA Hopper H100 GPU Specifications At A Glance

Coming to the specifications, the NVIDIA Hopper GH100 GPU is composed of a massive 144 SM (Streaming Multiprocessor) layout spread across a total of 8 GPCs. Each GPC packs 9 TPCs, which are further composed of 2 SM units each. This gives us 18 SMs per GPC and 144 SMs in the complete 8-GPC configuration. Each SM is composed of up to 128 FP32 units, which gives us a total of 18,432 CUDA cores.

Following are some of the configurations you can expect from the H100 chip:

The full implementation of the GH100 GPU includes the following units:

  • 8 GPCs, 72 TPCs (9 TPCs/GPC), 2 SMs/TPC, 144 SMs per full GPU
  • 128 FP32 CUDA Cores per SM, 18432 FP32 CUDA Cores per full GPU
  • 4 Fourth-Generation Tensor Cores per SM, 576 per full GPU
  • 6 HBM3 or HBM2e stacks, 12 512-bit Memory Controllers
  • 60 MB L2 Cache
  • Fourth-Generation NVLink and PCIe Gen 5
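The full-die unit counts above follow directly from the GPC/TPC/SM layout; a minimal sketch of the arithmetic, using only the figures stated in the article:

```python
# Sketch: deriving the full GH100 unit counts from the published layout.
GPCS = 8             # Graphics Processing Clusters on the full die
TPCS_PER_GPC = 9     # Texture Processing Clusters per GPC
SMS_PER_TPC = 2      # Streaming Multiprocessors per TPC
FP32_PER_SM = 128    # FP32 CUDA cores per SM
TENSOR_PER_SM = 4    # 4th-gen Tensor Cores per SM

sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
cuda_cores = sms * FP32_PER_SM
tensor_cores = sms * TENSOR_PER_SM

print(sms)           # 144
print(cuda_cores)    # 18432
print(tensor_cores)  # 576
```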

The NVIDIA H100 GPU with SXM5 board form-factor includes the following units:

  • 8 GPCs, 66 TPCs, 2 SMs/TPC, 132 SMs per GPU
  • 128 FP32 CUDA Cores per SM, 16896 FP32 CUDA Cores per GPU
  • 4 Fourth-generation Tensor Cores per SM, 528 per GPU
  • 80 GB HBM3, 5 HBM3 stacks, 10 512-bit Memory Controllers
  • 50 MB L2 Cache
  • Fourth-Generation NVLink and PCIe Gen 5

This is a 2.25x increase over the full GA100 GPU configuration. NVIDIA is also leveraging more FP64, FP16 & Tensor cores within its Hopper GPU which would drive up performance immensely. And that's going to be a necessity to rival Intel's Ponte Vecchio which is also expected to feature 1:1 FP64. NVIDIA states that the 4th Gen Tensor Cores on Hopper deliver 2 times the performance at the same clock.
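The 2.25x figure is simply the ratio of full-die FP32 core counts; the GA100 numbers below (128 SMs at 64 FP32 cores each) are assumed from Ampere's published full-die configuration, not from this article:

```python
# Sketch: full GA100 vs. full GH100 FP32 core counts (GA100 figures assumed).
ga100_cores = 128 * 64    # full GA100: 8192 FP32 cores
gh100_cores = 144 * 128   # full GH100: 18432 FP32 cores

print(gh100_cores / ga100_cores)  # 2.25
```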

The following NVIDIA Hopper H100 performance breakdown shows that the additional SMs account for only a 20% performance increase. The main benefit comes from the 4th Gen Tensor Cores and the FP8 compute path. Higher frequency also adds a decent 30% uplift to the mix.
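Treating those contributions as multiplicative factors roughly reproduces the overall gen-on-gen claim; the exact attribution is NVIDIA's, and the factors below are just the ones stated in the breakdown:

```python
# Sketch: combining the stated uplifts multiplicatively.
sm_uplift = 1.20      # ~20% from the additional SMs
clock_uplift = 1.30   # ~30% from higher frequency
tensor_uplift = 2.00  # 4th-gen Tensor Cores, 2x per clock

total = sm_uplift * clock_uplift * tensor_uplift
print(round(total, 2))  # 3.12 -- in line with the ~3x FP16 gain over A100
```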

An interesting comparison that highlights GPU scaling shows that a single GPC on a Hopper H100 GPU is equivalent to a Kepler GK110 GPU, a flagship HPC chip from 2012. The Kepler GK110 housed a total of 15 SMs, whereas the Hopper H100 GPU packs 132 SMs, and even a single GPC on the Hopper GPU features 18 SMs, 20% more than the entire SM count of the Kepler flagship.
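The GPC-versus-GK110 comparison reduces to a small bit of arithmetic using the SM counts quoted above:

```python
# Sketch: one Hopper GPC vs. the full Kepler GK110 flagship.
gk110_sms = 15           # SMX units on the full GK110 die
hopper_gpc_sms = 9 * 2   # 9 TPCs x 2 SMs in a single Hopper GPC

print(hopper_gpc_sms)                               # 18
print(round((hopper_gpc_sms / gk110_sms - 1) * 100))  # 20 (% more SMs)
```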

The cache is another space where NVIDIA has given much attention, upping it to 50 MB on the Hopper H100 GPU (60 MB on the full GH100 die). This is a 25% increase over the 40 MB L2 cache featured on the Ampere GA100 GPU and roughly 3x the size of the L2 on AMD's flagship Aldebaran MCM GPU, the MI250X.

Rounding up the performance figures, NVIDIA's GH100 Hopper GPU will offer 4000 TFLOPs of FP8, 2000 TFLOPs of FP16, 1000 TFLOPs of TF32, 67 TFLOPs of FP32 and 34 TFLOPs of FP64 compute performance. These record-shattering figures decimate all other HPC accelerators that came before them. For comparison, this is 3.3x faster than NVIDIA's own A100 GPU and 28% faster than AMD's Instinct MI250X in FP64 compute. In FP16 compute, the H100 GPU is 3x faster than the A100 and 5.2x faster than the MI250X, which is simply bonkers.
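The FP16 comparisons can be reproduced from the quoted peaks; the H100 figure comes from the article, while the A100 (624 TFLOPs, sparse FP16 tensor) and MI250X (383 TFLOPs FP16) peaks are assumptions taken from the vendors' public spec sheets:

```python
# Sketch: FP16 speedup ratios from quoted peak figures.
h100_fp16 = 2000.0    # TFLOPs, from the article
a100_fp16 = 624.0     # TFLOPs, assumed (A100 sparse FP16 tensor peak)
mi250x_fp16 = 383.0   # TFLOPs, assumed (MI250X FP16 peak)

print(round(h100_fp16 / a100_fp16, 1))    # 3.2
print(round(h100_fp16 / mi250x_fp16, 1))  # 5.2
```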

The PCIe variant, which is a cut-down model, was recently listed in Japan for over $30,000 US, so one can imagine that the SXM variant with its beefier configuration will easily cost around $50,000.

NVIDIA HPC / AI GPUs

| NVIDIA Tesla Graphics Card | B200 | H200 (SXM5) | H100 (SXM5) | H100 (PCIe) | A100 (SXM4) | A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCIe) | Tesla M40 (PCIe) | Tesla K40 (PCIe) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU | B200 | H200 (Hopper) | H100 (Hopper) | H100 (Hopper) | A100 (Ampere) | A100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 208 Billion | 80 Billion | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
| GPU Die Size | TBD | 814mm2 | 814mm2 | 814mm2 | 826mm2 | 826mm2 | 815mm2 | 815mm2 | 610mm2 | 610mm2 | 601mm2 | 551mm2 |
| SMs | 160 | 132 | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 80 | 66 | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| L2 Cache Size | TBD | 51200 KB | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| FP32 CUDA Cores Per SM | TBD | 128 | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores / SM | TBD | 128 | 128 | 128 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | TBD | 16896 | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | TBD | 16896 | 16896 | 14592 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | TBD | 528 | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
| Texture Units | TBD | 528 | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | ~1850 MHz | ~1850 MHz | ~1650 MHz | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329 MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 20,000 TOPs | 3958 TOPs | 3958 TOPs | 3200 TOPs | 2496 TOPs | 2496 TOPs | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
| FP16 Compute | 10,000 TFLOPs | 1979 TFLOPs | 1979 TFLOPs | 1600 TFLOPs | 624 TFLOPs | 624 TFLOPs | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
| FP32 Compute | 90 TFLOPs | 67 TFLOPs | 67 TFLOPs | 800 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 45 TFLOPs | 34 TFLOPs | 34 TFLOPs | 48 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 8192-bit HBM4 | 5120-bit HBM3e | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up To 192 GB HBM3 @ 8.0 Gbps | Up To 141 GB HBM3e @ 6.5 Gbps | Up To 80 GB HBM3 @ 5.2 Gbps | Up To 94 GB HBM2e @ 5.1 Gbps | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s, 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| TDP | 700W | 700W | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |

News Source: Videocardz
