NVIDIA GeForce RTX Turing GPUs Detailed – GeForce RTX 2070 Features TU106, 50% Faster Per Core Performance, 50% Better Memory Compression Than Pascal

Hassan Mujtaba

New features of the NVIDIA Turing GPU architecture have been revealed and detailed by the folks over at Videocardz. The new details show how the Turing GPUs are a huge departure from current GeForce graphics cards based on the Pascal GPU architecture and the techniques NVIDIA is using to deliver the best performance to end users and gamers.

NVIDIA Turing GPUs For GeForce RTX Graphics Cards Detailed - More Core Performance, Better Memory Compression, and New Features For Gamers

Starting with the most significant part of the Turing GPU architecture, the Turing SM, we are seeing an entirely new graphics core. The Turing SM is made up of a combination of INT32, FP32, and the new Tensor cores. Each SM has 96 KB of L1 cache, which is shared within that SM rather than across the entire GPU. There are four warp schedulers and dispatch units inside a Turing SM and, similarly, four register files.


Coming to the new execution units or cores, Turing has both INT32 and FP32 units. Each SM has 64 of each, plus 8 Tensor cores. This new architectural design allows Turing to execute floating point and integer operations in parallel, which allows for up to 36% higher throughput in standard floating point operations. The entire SM works in harmony by using different blocks to deliver high performance and better texture caching, enabling up to 50% better CUDA core performance compared to the previous generation.
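One way to read the 36% figure is as the gain from no longer making integer instructions share issue slots with floating point ones. The sketch below is only an illustration: the instruction mix of 36 integer instructions per 100 floating point instructions is a hypothetical value chosen to reproduce the quoted number, and `issue_slots` is a made-up helper, not a model of the real scheduler.

```python
def issue_slots(fp_ops, int_ops, concurrent):
    """Issue slots needed for a mix of FP32 and INT32 instructions."""
    if concurrent:
        # Separate FP32 and INT32 datapaths: the longer stream sets the time.
        return max(fp_ops, int_ops)
    # Single shared datapath: every instruction takes its own slot.
    return fp_ops + int_ops


FP_OPS = 100
INT_OPS = 36  # hypothetical mix, chosen for illustration only

shared = issue_slots(FP_OPS, INT_OPS, concurrent=False)
split = issue_slots(FP_OPS, INT_OPS, concurrent=True)
speedup = shared / split

print(f"shared datapath: {shared} slots")
print(f"split datapaths: {split} slots")
print(f"speedup: {speedup:.2f}x")
```

Under this toy model the gain depends entirely on how much integer work a shader contains, which is why the figure is quoted as "up to".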

Following is a shot of the Turing SM by Videocardz:

NVIDIA GeForce RTX/GTX "Turing" Family:

| Graphics Card Name | NVIDIA GeForce GTX 1650 | NVIDIA GeForce GTX 1650 D6 | NVIDIA GeForce GTX 1650 SUPER | NVIDIA GeForce GTX 1660 | NVIDIA GeForce GTX 1660 SUPER | NVIDIA GeForce GTX 1660 Ti | NVIDIA GeForce RTX 2060 | NVIDIA GeForce RTX 2070 | NVIDIA GeForce RTX 2080 | NVIDIA GeForce RTX 2080 Ti |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPU Architecture | Turing GPU (TU117) | Turing GPU (TU117) | Turing GPU (TU116) | Turing GPU (TU116) | Turing GPU (TU116) | Turing GPU (TU116) | Turing GPU (TU106) | Turing GPU (TU106) | Turing GPU (TU104) | Turing GPU (TU102) |
| Process | 12nm FFN | 12nm FFN | 12nm FFN | 12nm FFN | 12nm FFN | 12nm FFN | 12nm FFN | 12nm FFN | 12nm FFN | 12nm FFN |
| Die Size | 200mm2 | 200mm2 | 284mm2 | 284mm2 | 284mm2 | 284mm2 | 445mm2 | 445mm2 | 545mm2 | 754mm2 |
| Transistors | 4.7 Billion | 4.7 Billion | 6.6 Billion | 6.6 Billion | 6.6 Billion | 6.6 Billion | 10.6 Billion | 10.6 Billion | 13.6 Billion | 18.6 Billion |
| CUDA Cores | 896 Cores | 896 Cores | 1280 Cores | 1408 Cores | 1408 Cores | 1536 Cores | 1920 Cores | 2304 Cores | 2944 Cores | 4352 Cores |
| TMUs/ROPs | 56/32 | 56/32 | 80/32 | 88/48 | 88/48 | 96/48 | 120/48 | 144/64 | 192/64 | 288/96 |
| GigaRays | N/A | N/A | N/A | N/A | N/A | N/A | 5 Giga Rays/s | 6 Giga Rays/s | 8 Giga Rays/s | 10 Giga Rays/s |
| Cache | 1.5 MB L2 Cache | 1.5 MB L2 Cache | 1.5 MB L2 Cache | 1.5 MB L2 Cache | 1.5 MB L2 Cache | 1.5 MB L2 Cache | 4 MB L2 Cache | 4 MB L2 Cache | 4 MB L2 Cache | 6 MB L2 Cache |
| Base Clock | 1485 MHz | 1410 MHz | 1530 MHz | 1530 MHz | 1530 MHz | 1500 MHz | 1365 MHz | 1410 MHz | 1515 MHz | 1350 MHz |
| Boost Clock | 1665 MHz | 1590 MHz | 1725 MHz | 1785 MHz | 1785 MHz | 1770 MHz | 1680 MHz | 1620 MHz / 1710 MHz OC | 1710 MHz / 1800 MHz OC | 1545 MHz / 1635 MHz OC |
| Compute | 3.0 TFLOPs | 3.0 TFLOPs | 4.4 TFLOPs | 5.0 TFLOPs | 5.0 TFLOPs | 5.5 TFLOPs | 6.5 TFLOPs | 7.5 TFLOPs | 10.1 TFLOPs | 13.4 TFLOPs |
| Memory | Up To 4 GB GDDR5 | Up To 4 GB GDDR6 | Up To 4 GB GDDR6 | Up To 6 GB GDDR5 | Up To 6 GB GDDR6 | Up To 6 GB GDDR6 | Up To 6 GB GDDR6 | Up To 8 GB GDDR6 | Up To 8 GB GDDR6 | Up To 11 GB GDDR6 |
| Memory Speed | 8.00 Gbps | 12.00 Gbps | 12.00 Gbps | 8.00 Gbps | 14.00 Gbps | 12.00 Gbps | 14.00 Gbps | 14.00 Gbps | 14.00 Gbps | 14.00 Gbps |
| Memory Interface | 128-bit | 128-bit | 128-bit | 192-bit | 192-bit | 192-bit | 192-bit | 256-bit | 256-bit | 352-bit |
| Memory Bandwidth | 128 GB/s | 192 GB/s | 192 GB/s | 192 GB/s | 336 GB/s | 288 GB/s | 336 GB/s | 448 GB/s | 448 GB/s | 616 GB/s |
| Power Connectors | N/A | N/A | 6 Pin | 8 Pin | 8 Pin | 8 Pin | 8 Pin | 8 Pin | 8+8 Pin | 8+8 Pin |
| TDP | 75W | 75W | 100W | 120W | 125W | 120W | 160W | 185W (Founders) / 175W (Reference) | 225W (Founders) / 215W (Reference) | 260W (Founders) / 250W (Reference) |
| Starting Price | $149 US | $149 US | $159 US | $219 US | $229 US | $279 US | $349 US | $499 US | $699 US | $999 US |
| Price (Founders Edition) | $149 US | $149 US | $159 US | $219 US | $229 US | $279 US | $349 US | $599 US | $799 US | $1,199 US |
| Launch | April 2019 | April 2020 | November 2019 | March 2019 | October 2019 | February 2019 | January 2019 | October 2018 | September 2018 | September 2018 |
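
Two rows of the table can be derived from the others, which makes for a quick sanity check. The sketch below assumes the usual conventions: peak memory bandwidth is the per-pin data rate times the bus width in bytes, and peak FP32 compute counts one fused multiply-add (2 FLOPs) per CUDA core per clock at the reference boost clock. The function names are hypothetical helpers, and only a few cards from the table are included.

```python
def memory_bandwidth_gbs(speed_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin data rate times bus width in bytes."""
    return speed_gbps * bus_width_bits / 8


def fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 TFLOPs, assuming 2 FLOPs (one FMA) per core per clock."""
    return 2 * cuda_cores * boost_mhz / 1e6


CARDS = {
    # name: (memory speed in Gbps, bus width in bits, CUDA cores, boost MHz)
    "GTX 1650": (8.0, 128, 896, 1665),
    "RTX 2070": (14.0, 256, 2304, 1620),
    "RTX 2080": (14.0, 256, 2944, 1710),
    "RTX 2080 Ti": (14.0, 352, 4352, 1545),
}

for name, (speed, width, cores, boost) in CARDS.items():
    bandwidth = memory_bandwidth_gbs(speed, width)
    tflops = fp32_tflops(cores, boost)
    print(f"{name}: {bandwidth:.0f} GB/s, {tflops:.1f} TFLOPs")
```

For these four cards the computed values match the Memory Bandwidth and Compute rows after rounding.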

The Turing GPUs Dissected - TU102 For RTX 2080 Ti, TU104 For RTX 2080 and TU106 For RTX 2070

For the first time, NVIDIA is launching the x80 and x70 cards alongside the flagship x80 Ti model, and it is doing so with three different GPUs. While the GPUs are similar in design, the configurations are very different, and one thing we can tell is that the configs leave a lot of room for NVIDIA to expand upon in the future if it wants to.

What I mean to say is that the RTX 2080 Ti isn't based on the full TU102 GPU and the RTX 2080 isn't based on the full TU104 GPU, while the RTX 2070 is the only card that uses the full config of the GPU it's based upon, the Turing TU106.

One more thing: these GPUs are really huge in terms of die size compared to the Pascal GPUs, even though they use the 12nm process. The reason is the added INT32 execution units and Tensor cores, which weren't available on any previous consumer GeForce graphics cards. Hence, the TU106 GPU, which succeeds the GP106 GPU, is over twice as large as its predecessor (445mm2 versus 200mm2).

Here's another thing: the GP106 was used in the GTX 1060, which is more of a mainstream graphics card. While the TU106 GPU may make the RTX 2070 look like a mainstream card with a much higher price tag, it has better overall specifications than the GP104-based GTX 1070, with more cores, better memory, and more features. It also has around twice as many cores as the GTX 1060, so calling it a mainstream graphics card would not be accurate.

NVIDIA Turing TU102 GPU

So overall, the TU102 is made up of 6 graphics processing clusters (GPCs), each holding 6 texture processing clusters (TPCs) with two SMs apiece. That makes 72 SM units of 64 cores each, for a total of 4608 cores in an 18.6 billion transistor package measuring 754mm2.

NVIDIA Turing TU104 GPU

The TU104 is made up of 6 graphics processing clusters, each holding 4 TPCs with two SMs apiece. That makes 48 SM units of 64 cores each, for a total of 3072 cores in a 13.6 billion transistor package measuring 545mm2.

NVIDIA Turing TU106 GPU

The TU106 is made up of 3 graphics processing clusters, each holding 6 TPCs with two SMs apiece. That makes 36 SM units of 64 cores each, for a total of 2304 cores in a 10.6 billion transistor package measuring 445mm2.
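
The core totals for the three dies can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 64 FP32 cores per SM, as stated earlier, and two SMs per texture processing cluster (TPC); under that assumption, the per-cluster unit counts quoted above are TPC counts. The dictionary layout and helper name are hypothetical, and the shipping core counts come from the spec table.

```python
CORES_PER_SM = 64  # FP32 cores per Turing SM, as stated earlier in the article
SMS_PER_TPC = 2    # assumption: each per-cluster unit is a pair of SMs

FULL_DIES = {
    # die: (GPCs, TPCs per GPC)
    "TU102": (6, 6),
    "TU104": (6, 4),
    "TU106": (3, 6),
}

SHIPPING_CARDS = {
    # card: (die, enabled CUDA cores from the spec table)
    "RTX 2080 Ti": ("TU102", 4352),
    "RTX 2080": ("TU104", 2944),
    "RTX 2070": ("TU106", 2304),
}


def total_cores(gpcs, tpcs_per_gpc):
    """CUDA cores on a fully enabled die."""
    return gpcs * tpcs_per_gpc * SMS_PER_TPC * CORES_PER_SM


for card, (die, enabled) in SHIPPING_CARDS.items():
    full = total_cores(*FULL_DIES[die])
    print(f"{card}: {enabled}/{full} cores on {die} ({enabled / full:.1%} enabled)")
```

The output shows the RTX 2070 as the only card of the three using every core on its die, which matches the earlier point about the cut-down TU102 and TU104 configurations.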

NVIDIA Turing GPU Packs 50% Better Performance Per Core Than Pascal GPUs

In terms of shading performance, which is the direct result of the enhanced core design and GPU architecture revamp, the Turing GPU offers an average uplift of 50% per core compared to Pascal GPUs. In VR games, shading performance is a good 2x ahead of what Pascal achieved, while many modern gaming titles show a ~50% lead over Pascal with Turing’s enhanced core design.

It should be pointed out that these are just per-core performance gains at the same clock speeds, without adding the benefits of the other technologies that Turing comes with. Those would further increase performance in a wide variety of gaming applications; we have already seen the GeForce RTX 2080 run 50% faster than the GTX 1080 on average and twice as fast with the new DLSS technology.


NVIDIA is also incorporating new shading models, one of which is known as Mesh Shading and should significantly help games process vertex, tessellation, and geometry shading:

  • Mesh Shading: new shader model for vertex, tessellation, and geometry shading (more objects per scene)
  • Variable Rate Shading (VRS): developer control over shading rates (to limit shading where it does not provide visual benefit)
  • Texture-Space Shading: stores shading results in memory (no need to duplicate shading work)
  • Multi-View Rendering (MVR): extends Pascal’s Single Pass Stereo to multiple views in a single pass

NVIDIA Turing GPUs With Better Memory Compression - Effective Memory Bandwidth Increased Up To 50% Over Pascal GPUs, Over 1.5 TB/s

One of the key improvements of Pascal over Maxwell was the faster memory compression algorithms which delivered very high bandwidth by using various compression and caching techniques.

With Turing, we are looking at the third generation of memory compression architecture, which is said to deliver up to a 50% boost in effective bandwidth compared to Pascal GPUs. We know that the Pascal-based GeForce GTX 1080 Ti saw its memory bandwidth boosted to an effective 1.2 TB/s over the raw 484.4 GB/s when using these algorithms, and with Turing, NVIDIA is saying that we should expect 50% more effective bandwidth with Memory Compression 3.0.

Since Turing GPUs already have higher raw bandwidth than Pascal GPUs (616 GB/s on the RTX 2080 Ti), we can expect the effective bandwidth with the new algorithms to reach past 1.5 TB/s. That is very good, as it would help games deliver even better performance at the higher resolutions these graphics cards are aiming at.
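
The 1.5 TB/s estimate can be reproduced from the figures quoted above in more than one way, since the article does not spell out the arithmetic. The sketch below shows two possible readings; both are assumptions about how the numbers combine, not NVIDIA's own calculation.

```python
PASCAL_RAW_GBS = 484.4         # GTX 1080 Ti raw bandwidth quoted above
PASCAL_EFFECTIVE_GBS = 1200.0  # 1.2 TB/s effective figure quoted above
TURING_RAW_GBS = 616.0         # RTX 2080 Ti raw bandwidth quoted above

# Effective-to-raw ratio that Pascal's compression and caching achieved.
pascal_ratio = PASCAL_EFFECTIVE_GBS / PASCAL_RAW_GBS

# Reading A: Turing merely keeps Pascal's ratio on its higher raw bandwidth.
same_ratio_gbs = TURING_RAW_GBS * pascal_ratio

# Reading B: effective bandwidth lands 50% above Pascal's effective figure.
plus_50_gbs = PASCAL_EFFECTIVE_GBS * 1.5

print(f"Pascal effective/raw ratio: {pascal_ratio:.2f}x")
print(f"Reading A: {same_ratio_gbs / 1000:.2f} TB/s")
print(f"Reading B: {plus_50_gbs / 1000:.2f} TB/s")
```

Both readings land above 1.5 TB/s, so the claim holds under either interpretation.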

NVIDIA Turing GPUs With DisplayPort 1.4a, Enhanced NVENC Encoder/Decoder

The Turing GPUs featured on the GeForce RTX graphics cards also come with new display capabilities. The highlight of them may be the VirtualLink USB Type-C port but there's also DisplayPort 1.4a, both of which enable 8K at 60 Hz.

The cards will also be equipped with an enhanced NVENC encoder and decoder. The encoder can handle H.265 streams at 8K/30 FPS, while the decoder supports HEVC YUV444 10/12-bit HDR, H.264 at 8K, and VP9 10/12-bit HDR.

The GeForce RTX 20 Series Market Availability – Preorder and Shipping Today, On Shelves 20th September

The NVIDIA GeForce RTX 20 series launches today, in reference variants first. This time, NVIDIA has already given the green light to their manufacturers to announce custom cards soon after the reference launch, which are now available to pre-order on the official GeForce webpage. Or you can head over to this article and check out all the glorious non-reference models which you will be able to get very soon.


One thing we should point out is that the performance numbers are still under wraps until 19th September, which leaves little or no time for consumers to reconsider their pre-orders, since availability follows less than 24 hours later. The reviews for the GeForce RTX 2080 Ti and RTX 2080 will go live on 19th September at the same time, so if you are planning to buy one, or have already pre-ordered one and might reconsider your purchase, you will have little time to think.
