NVIDIA GeForce RTX Turing GPUs Detailed – GeForce RTX 2070 Features TU106, 50% Faster Per Core Performance, 50% Better Memory Compression Than Pascal

Hassan Mujtaba • Sep 12, 2018 11:52 AM EDT


New features of the NVIDIA Turing GPU architecture have been revealed and detailed by the folks over at Videocardz. The new details show how the Turing GPUs are a huge departure from current GeForce graphics cards based on the Pascal GPU architecture and the techniques NVIDIA is using to deliver the best performance to end users and gamers.

NVIDIA Turing GPUs For GeForce RTX Graphics Cards Detailed - More Core Performance, Better Memory Compression, and New Features For Gamers

Starting with the most significant part of the Turing GPU architecture, the Turing SM, we are seeing an entirely new graphics core. The Turing SM is made up of a combination of INT32, FP32, and the new Tensor cores. Each SM has 96 KB of L1 cache which is shared across the entire SM. There are four warp schedulers and dispatchers inside each Turing SM and, similarly, there are four register file units.

Coming to the new execution units or cores, Turing has both INT32 and FP32 units. Each SM has 64 of each, along with 8 Tensor cores. This new architectural design allows Turing to execute floating point and integer operations in parallel, which allows for up to 36% higher throughput in standard floating point operations. The entire SM works in harmony by using different blocks to deliver high performance and better texture caching, enabling up to 50% better CUDA core performance when compared to the previous generation.

Following is a shot of the Turing SM by Videocardz:

NVIDIA GeForce RTX/GTX "Turing" Family:

Graphics Card Name	NVIDIA GeForce GTX 1650	NVIDIA GeForce GTX 1650 D6	NVIDIA GeForce GTX 1650 SUPER	NVIDIA GeForce GTX 1660	NVIDIA GeForce GTX 1660 SUPER	NVIDIA GeForce GTX 1660 Ti	NVIDIA GeForce RTX 2060	NVIDIA GeForce RTX 2070	NVIDIA GeForce RTX 2080	NVIDIA GeForce RTX 2080 Ti
GPU Architecture	Turing GPU (TU117)	Turing GPU (TU117)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU106)	Turing GPU (TU106)	Turing GPU (TU104)	Turing GPU (TU102)
Process	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN
Die Size	200mm2	200mm2	284mm2	284mm2	284mm2	284mm2	445mm2	445mm2	545mm2	754mm2
Transistors	4.7 Billion	4.7 Billion	6.6 Billion	6.6 Billion	6.6 Billion	6.6 Billion	10.6 Billion	10.6 Billion	13.6 Billion	18.6 Billion
CUDA Cores	896 Cores	896 Cores	1280 Cores	1408 Cores	1408 Cores	1536 Cores	1920 Cores	2304 Cores	2944 Cores	4352 Cores
TMUs/ROPs	56/32	56/32	80/32	88/48	88/48	96/48	120/48	144/64	192/64	288/96
GigaRays	N/A	N/A	N/A	N/A	N/A	N/A	5 Giga Rays/s	6 Giga Rays/s	8 Giga Rays/s	10 Giga Rays/s
Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	4 MB L2 Cache	4 MB L2 Cache	4 MB L2 Cache	6 MB L2 Cache
Base Clock	1485 MHz	1410 MHz	1530 MHz	1530 MHz	1530 MHz	1500 MHz	1365 MHz	1410 MHz	1515 MHz	1350 MHz
Boost Clock	1665 MHz	1590 MHz	1725 MHz	1785 MHz	1785 MHz	1770 MHz	1680 MHz	1620 MHz 1710 MHz OC	1710 MHz 1800 MHz OC	1545 MHz 1635 MHz OC
Compute	3.0 TFLOPs	3.0 TFLOPs	4.4 TFLOPs	5.0 TFLOPs	5.0 TFLOPs	5.5 TFLOPs	6.5 TFLOPs	7.5 TFLOPs	10.1 TFLOPs	13.4 TFLOPs
Memory	Up To 4 GB GDDR5	Up To 4 GB GDDR6	Up To 4 GB GDDR6	Up To 6 GB GDDR5	Up To 6 GB GDDR6	Up To 6 GB GDDR6	Up To 6 GB GDDR6	Up To 8 GB GDDR6	Up To 8 GB GDDR6	Up To 11 GB GDDR6
Memory Speed	8.00 Gbps	12.00 Gbps	12.00 Gbps	8.00 Gbps	14.00 Gbps	12.00 Gbps	14.00 Gbps	14.00 Gbps	14.00 Gbps	14.00 Gbps
Memory Interface	128-bit	128-bit	128-bit	192-bit	192-bit	192-bit	192-bit	256-bit	256-bit	352-bit
Memory Bandwidth	128 GB/s	192 GB/s	192 GB/s	192 GB/s	336 GB/s	288 GB/s	336 GB/s	448 GB/s	448 GB/s	616 GB/s
Power Connectors	N/A	N/A	6 Pin	8 Pin	8 Pin	8 Pin	8 Pin	8 Pin	8+8 Pin	8+8 Pin
TDP	75W	75W	100W	120W	125W	120W	160W	185W (Founders) 175W (Reference)	225W (Founders) 215W (Reference)	260W (Founders) 250W (Reference)
Starting Price	$149 US	$149 US	$159 US	$219 US	$229 US	$279 US	$349 US	$499 US	$699 US	$999 US
Price (Founders Edition)	$149 US	$149 US	$159 US	$219 US	$229 US	$279 US	$349 US	$599 US	$799 US	$1,199 US
Launch	April 2019	April 2020	November 2019	March 2019	October 2019	February 2019	January 2019	October 2018	September 2018	September 2018
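The Compute and Memory Bandwidth rows in the table can be derived from the other rows. Here is a minimal sketch of that arithmetic, assuming the usual conventions of 2 floating point operations (one fused multiply-add) per CUDA core per clock at the reference boost clock, and 8 bits per byte on the memory bus. The function names are illustrative and are not taken from any NVIDIA tool.

```python
def memory_bandwidth_gbs(speed_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin data rate times bus width, over 8 bits per byte."""
    return speed_gbps * bus_width_bits / 8


def fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 TFLOPs, assuming 2 operations (one FMA) per core per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


# (CUDA cores, reference boost clock in MHz, memory speed in Gbps, bus width in bits)
# Values are copied from the table above.
cards = {
    "RTX 2070": (2304, 1620, 14.0, 256),
    "RTX 2080": (2944, 1710, 14.0, 256),
    "RTX 2080 Ti": (4352, 1545, 14.0, 352),
}

for name, (cores, boost, speed, bus) in cards.items():
    tflops = fp32_tflops(cores, boost)
    bandwidth = memory_bandwidth_gbs(speed, bus)
    print(f"{name}: {tflops:.1f} TFLOPs, {bandwidth:.0f} GB/s")
```

Run on the three RTX launch cards, this reproduces the 7.5, 10.1 and 13.4 TFLOPs figures and the 448, 448 and 616 GB/s bandwidth figures listed in the table.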

The Turing GPUs Dissected - TU102 For RTX 2080 Ti, TU104 For RTX 2080 and TU106 For RTX 2070

For the first time, NVIDIA is launching the x80 and x70 cards alongside the flagship x80 Ti model, and the three cards are based on three different GPUs. While the GPUs are similar in design, their configurations are very different, and one thing we can tell is that the configs leave a lot of room for NVIDIA to expand upon in the future if they want to.

What I mean to say is that the RTX 2080 Ti isn't based on the full TU102 GPU and the RTX 2080 isn't based on the full TU104 GPU, while the RTX 2070 is the only card that utilizes the full config of the GPU it's based upon, the Turing TU106.

One more thing: these GPUs are really huge in terms of die size compared to the Pascal GPUs, even while using the 12nm process. The reason is the added INT32 execution units and Tensor cores, which weren't available on any previous consumer GeForce graphics cards. Hence, the TU106 GPU, which succeeds the GP106 GPU, is over twice as large as its predecessor (445mm2 versus 200mm2).

Here's another thing: the GP106 was used in the GTX 1060, which is more of a mainstream graphics card. However, while the RTX 2070 rocks a TU106 GPU, which may make it look like a mainstream GPU with a much higher price tag, it does have better overall specifications than the GP104 based GTX 1070, with more cores, better memory, and more features. It also has around twice as many cores as the GTX 1060, so calling it a mainstream graphics card wouldn't be wise.

NVIDIA Turing TU102 GPU

So overall, the TU102 is made up of 6 graphics processing clusters (GPCs) with 6 texture processing clusters (TPCs) on each, and every TPC holds 2 SM units. That makes up 72 SM units for a total of 4608 cores in an 18.6 billion transistor package measuring 754mm2.

NVIDIA Turing TU104 GPU

The TU104 is made up of 6 graphics processing clusters with 4 TPCs on each cluster and 2 SM units per TPC. That makes up 48 SM units for a total of 3072 cores in a 13.6 billion transistor package measuring 545mm2.

NVIDIA Turing TU106 GPU

The TU106 is made up of 3 graphics processing clusters with 6 TPCs on each cluster and 2 SM units per TPC. That makes up 36 SM units for a total of 2304 cores in a 10.6 billion transistor package measuring 445mm2.
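The core totals quoted for the three dies can be checked with a little arithmetic. The sketch below is only illustrative. It assumes NVIDIA's published Turing layout, in which each graphics processing cluster is split into texture processing clusters (TPCs), each TPC pairs two SMs, and each SM carries the 64 FP32 cores mentioned earlier. The dictionary and function names are made up for this example.

```python
CORES_PER_SM = 64  # FP32 (CUDA) cores per SM, as stated earlier in the article
SMS_PER_TPC = 2    # assumption: Turing pairs two SMs inside each TPC

# (GPCs, TPCs per GPC) for each full, uncut die
TURING_DIES = {
    "TU102": (6, 6),
    "TU104": (6, 4),
    "TU106": (3, 6),
}


def full_die_cores(gpcs, tpcs_per_gpc):
    """Return (SM count, CUDA core count) for a fully enabled die."""
    sms = gpcs * tpcs_per_gpc * SMS_PER_TPC
    return sms, sms * CORES_PER_SM


for die, (gpcs, tpcs) in TURING_DIES.items():
    sms, cores = full_die_cores(gpcs, tpcs)
    print(f"{die}: {sms} SMs, {cores} CUDA cores")
```

Comparing these totals with the table shows why only the RTX 2070 uses a full die: the RTX 2080 Ti ships with 4352 of TU102's 4608 cores and the RTX 2080 with 2944 of TU104's 3072, while the RTX 2070 has all 2304 cores of TU106 enabled.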

NVIDIA Turing GPU Packs 50% Better Performance Per Core Than Pascal GPUs

In terms of shading performance, which is the direct result of the enhanced core design and GPU architecture revamp, the Turing GPU offers an average uplift of 50% better performance per core compared to Pascal GPUs. In VR games, the shading performance would be a good 2x ahead of what Pascal achieved, while many modern gaming titles show a ~50% lead over Pascal with Turing’s enhanced core design.

It should be pointed out that these are just per core performance gains at the same clock speeds, without adding the benefits of the other technologies that Turing comes with. Those would further increase the performance in a wide variety of gaming applications, as we have already seen the gaming performance of a GeForce RTX 2080 to be 50% faster than the GTX 1080 on average and twice as fast with the new DLSS technology.

NVIDIA is also incorporating new shading models, one of which is known as Mesh Shading, which would significantly help games process vertex, tessellation, and geometry shading:

Mesh Shading: new shader model for vertex, tessellation, geometry shading (more objects per scene)
Variable Rate Shading (VRS): developer control over shading rates (to limit shading where it does not provide visual benefit)
Texture-Space Shading: storing shading results in memory (no need to duplicate shading work for the processes)
Multi-View Rendering (MVR): extends Pascal’s Single Pass Stereo to multi-views in a single pass

NVIDIA Turing GPUs With Better Memory Compression - Effective Memory Bandwidth Increased Up To 50% Over Pascal GPUs, Over 1.5 TB/s

One of the key improvements of Pascal over Maxwell was the faster memory compression algorithms which delivered very high bandwidth by using various compression and caching techniques.

With Turing, we are looking at the third generation of memory compression architecture, which is said to deliver up to a 50% boost in effective bandwidth when compared to Pascal GPUs. We know that the Pascal GeForce GTX 1080 Ti memory bandwidth was boosted to 1.2 TB/s over the raw 484.4 GB/s bandwidth when using these algorithms, and with Turing, NVIDIA is saying that we should expect 50% more effective bandwidth with Memory Compression 3.0.

Since Turing GPUs already have higher raw bandwidth compared to Pascal GPUs (RTX 2080 Ti with 616 GB/s), we can expect the effective bandwidth using the new algorithm to reach past 1.5 TB/s, which is very good considering it would help games deliver even better performance at the higher resolutions these graphics cards are aiming at.
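The bandwidth estimate above can be worked through in a few lines. This is a rough sketch using only the figures quoted in this section (1.2 TB/s effective and 484.4 GB/s raw for the GTX 1080 Ti, and an up to 50% uplift for Turing). Applying the 50% to Pascal's effective figure is an assumption about how the claim should be read, and the names are illustrative.

```python
PASCAL_RAW_GBS = 484.4          # GTX 1080 Ti raw bandwidth quoted above
PASCAL_EFFECTIVE_GBS = 1200.0   # GTX 1080 Ti effective bandwidth quoted above (1.2 TB/s)
TURING_UPLIFT = 0.50            # "up to 50%" more effective bandwidth


def turing_effective_estimate(pascal_effective_gbs, uplift):
    """Best-case Turing effective bandwidth, assuming the uplift applies to Pascal's effective figure."""
    return pascal_effective_gbs * (1 + uplift)


# How much compression and caching multiplied Pascal's raw bandwidth
pascal_ratio = PASCAL_EFFECTIVE_GBS / PASCAL_RAW_GBS
turing_gbs = turing_effective_estimate(PASCAL_EFFECTIVE_GBS, TURING_UPLIFT)

print(f"Pascal effective/raw ratio: {pascal_ratio:.2f}x")
print(f"Turing best-case effective bandwidth: {turing_gbs / 1000:.1f} TB/s")
```

Under this reading the best case lands at around 1.8 TB/s, comfortably past the 1.5 TB/s mark mentioned above.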

NVIDIA Turing GPUs With DisplayPort 1.4a, Enhanced NVENC Encoder/Decoder

The Turing GPUs featured on the GeForce RTX graphics cards also come with new display capabilities. The highlight of them may be the VirtualLink USB Type-C port but there's also DisplayPort 1.4a, both of which enable 8K at 60 Hz.

The cards will also be equipped with an enhanced NVENC encoder and decoder that can encode H.265 streams at 8K/30 FPS and decode HEVC YUV444 10/12-bit HDR, H.264 8K, and VP9 10/12-bit HDR content.

The GeForce RTX 20 Series Market Availability – Preorder and Shipping Today, On Shelves 20th September

The NVIDIA GeForce RTX 20 series launches today, in reference variants first. This time, NVIDIA has already given its manufacturers the green light to announce custom cards soon after the reference launch, and the cards are now available to pre-order on the official GeForce webpage. You can also head over to this article and check out all the glorious non-reference models which you will be able to get very soon.


The one thing we should point out is that the performance numbers are still under wraps till 19th September, which leaves little or no time for consumers to reconsider their pre-orders since availability is a day later, or less than 24 hours. The reviews for the GeForce RTX 2080 Ti and RTX 2080 will go live on 19th September at the same time, so if you are planning to buy one, or have already pre-ordered one and might reconsider your purchase, you will have little time to think.

