NVIDIA GeForce GTX 870 “Maxwell” Specifications Analysis – 13 SMM Units With 2nd Generation 1664 CUDA Cores, Performance Benchmarked

Hassan Mujtaba

The specifications of NVIDIA's upcoming Maxwell based GeForce GTX 870 have been leaked by Coolaler, and we already covered the basic details of the card in another article. The Maxwell based GeForce GTX 870 graphics card is one of the three SKUs arriving to consumers in the upcoming months (September and October).

NVIDIA GeForce GTX 870 "Maxwell" Specifications Analysis

Last time, I analyzed the leaked GeForce GTX 880 PCB, which was in an engineering state, and found some intriguing details such as the 256-Bit memory bus and the GM204 chip being larger than GK104 while still being based on the current 28nm process node. We didn't get to see any specifications of the card itself aside from the few details we were able to gather from the pictures. Today, we have a different sort of leak in the form of a screenshot which shows GPU-Z running alongside a 3DMark 11 score of an unknown graphics card.

The leaker from the main source has mentioned that the graphics card in question is NVIDIA's GeForce GTX 870, the cut-down variant of the GeForce GTX 880 graphics card, which will be available to consumers in the upcoming months. The GeForce GTX 870 boasts the second generation Maxwell architecture and will feature the GM204 core, which will become the foundation of the flagship GeForce GTX 880 too. The GeForce GTX 860 is also expected to launch in Q4 2014, but it isn't yet known whether that card will feature the GM204 core or the GM206 core. Let's forget that for a moment and focus on the details we have in hand.

The GeForce GTX 870 has the Device Id "NVIDIA D17U-20" and the GPU Id for the graphics core is 13C2. We have our own speculation regarding that, but before we get into it, let's go through the specifications listed on the GPU-Z panel. The GM204 SKU featured on the GeForce GTX 870 has 1664 CUDA Cores enabled on the die along with 32 ROPs and 138 TMUs. From the first generation Maxwell core architecture, we learned that a Maxwell SMM (Streaming Multiprocessor Maxwell) unit has 128 cores compared to 192 on the current generation Kepler SMX units, so 1664 cores works out to 13 SMMs. Along with that, we have 4 GB of GDDR5 memory running across a 256-Bit memory interface clocked at 1753 MHz (7.00 GHz effective), which pumps out 224.4 GB/s of bandwidth. The core clock is maintained at 1051 MHz with a 1178 MHz boost clock, something I was expecting if the cards are to take on the GK110 based graphics cards. Lastly, we have the fill rate numbers, which amount to a 33.6 GPixels/s pixel fill rate and a 145.0 GTexels/s texture fill rate. The TMU count is a bit off, since the Maxwell core architecture holds 8 TMUs per SMM, so we were supposed to see 104 TMUs in total (13 x 8) instead of 138. It could be a GPU-Z fault, which could also mean that these specifications might not be the actual deal.
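The bandwidth and fill rate figures on the GPU-Z panel follow directly from the listed clocks and unit counts. Here is a minimal Python sketch of that arithmetic, using only the leaked numbers quoted above. The function names are illustrative and are not taken from GPU-Z or any other tool, and the formulas assume the usual peak-rate conventions (four transfers per memory clock for GDDR5, one pixel or texel per unit per core clock).

```python
def memory_bandwidth_gbs(bus_width_bits, mem_clock_mhz, transfers_per_clock=4):
    """Peak memory bandwidth in GB/s.

    Assumes GDDR5 moves 4 transfers per memory clock, which is how
    1753 MHz becomes the quoted ~7.0 GHz effective rate.
    """
    effective_mhz = mem_clock_mhz * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz / 1000


def fill_rate_g(units, core_clock_mhz):
    """Peak fill rate in Gpixels/s or Gtexels/s (one per unit per clock)."""
    return units * core_clock_mhz / 1000


# Values as read from the leaked GPU-Z screenshot.
bandwidth = memory_bandwidth_gbs(bus_width_bits=256, mem_clock_mhz=1753)
pixel_rate = fill_rate_g(units=32, core_clock_mhz=1051)     # 32 ROPs
texture_rate = fill_rate_g(units=138, core_clock_mhz=1051)  # 138 TMUs as reported

# What the texture rate would be with the expected 8 TMUs per SMM.
expected_tmus = 13 * 8
expected_texture_rate = fill_rate_g(expected_tmus, 1051)

print(f"Bandwidth:          {bandwidth:.1f} GB/s")          # 224.4 GB/s
print(f"Pixel fill rate:    {pixel_rate:.1f} GPixels/s")    # 33.6 GPixels/s
print(f"Texture fill rate:  {texture_rate:.1f} GTexels/s")  # 145.0 GTexels/s
print(f"Expected with {expected_tmus} TMUs: {expected_texture_rate:.1f} GTexels/s")  # 109.3
```

The reported 145.0 GTexels/s only works out with 138 TMUs, so the texture fill rate is derived from the same odd TMU count rather than measured independently.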

Now as you may remember, the first generation Maxwell core architecture featured on the GM107 GPU delivers 2 times the performance per watt, and its CUDA Cores, even though fewer in number, deliver per-core performance rated at 135% of the past generation. The key words here are "first generation": NVIDIA has only previewed what the Maxwell architecture is capable of, and with ample time to optimize and enhance the design, I expect to see better numbers, rated at 145-150%, along with more power efficiency on the new cards. The GeForce GTX 750 Ti, which is NVIDIA's entry level Maxwell offering, is rated at 60W and comes with splendid performance for its wattage. Considering the updated design, I think there's no doubt that the NVIDIA GeForce GTX 870 will be able to trump the GeForce GTX 780 in performance (although not by a big margin).
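To put the per-core scaling argument in numbers, here is a rough Python sketch that converts the GTX 870's 1664 Maxwell cores into "Kepler-equivalent" cores at the 135%, 145% and 150% ratings mentioned above. This is a back-of-the-envelope assumption, not a benchmark: it ignores clock speeds, memory bandwidth and drivers, and the helper name is made up for illustration. The GTX 780's 2304 cores come from the comparison table at the end of the article.

```python
GTX_870_CORES = 1664  # leaked GPU-Z value (Maxwell)
GTX_780_CORES = 2304  # Kepler GK110, from the comparison table


def kepler_equivalent_cores(maxwell_cores, per_core_factor):
    """Scale a Maxwell core count by an assumed per-core performance factor."""
    return maxwell_cores * per_core_factor


for factor in (1.35, 1.45, 1.50):
    equivalent = kepler_equivalent_cores(GTX_870_CORES, factor)
    ratio = equivalent / GTX_780_CORES
    print(f"{factor:.2f}x per core -> about {equivalent:.0f} Kepler-equivalent "
          f"cores ({ratio:.0%} of the GTX 780)")
```

At 135% the GTX 870 lands slightly below the GTX 780's core count, and at 145-150% slightly above it, which is consistent with the "close, but not by a big margin" expectation before clock speeds are taken into account.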

The GPU Id is also interesting, since 13C2 could possibly point to the 13 SMM units featured on the GPU, which makes us think about the GeForce GTX 880. With an SMM or two more enabled, we could see the GeForce GTX 880 include 1792 or 1920 cores, which although lower than the GK110 based cards is reasonably higher than what was featured on the GK104 chips. With the performance per core advantage, we could see the GeForce GTX 880 right next to the GeForce GTX 780 Ti in performance numbers. The GM200 is still far away from launch and is expected next year, but that gives a glimpse of what to expect from the daddy core, which could easily topple the current flagship chips.
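The core counts above are simple multiples of the 128 cores in each SMM. A short Python sketch of the mapping, with hypothetical helper names, assuming every enabled SMM contributes its full 128 cores:

```python
CORES_PER_SMM = 128  # Maxwell SMM, as described for first generation Maxwell


def cuda_cores(smm_count):
    """CUDA cores for a given number of enabled SMMs."""
    return smm_count * CORES_PER_SMM


def smm_units(core_count):
    """Enabled SMMs implied by a CUDA core count (must divide evenly)."""
    if core_count % CORES_PER_SMM != 0:
        raise ValueError(f"{core_count} is not a multiple of {CORES_PER_SMM}")
    return core_count // CORES_PER_SMM


print(smm_units(1664))  # 13, the leaked GTX 870 configuration

for smm in (13, 14, 15):
    print(f"{smm} SMMs -> {cuda_cores(smm)} CUDA cores")
# 13 SMMs -> 1664 CUDA cores
# 14 SMMs -> 1792 CUDA cores
# 15 SMMs -> 1920 CUDA cores
```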

The core count and configuration for the GeForce GTX 880 look about right: following the trend in the table provided below, you can see that the cut-down *70 graphics card generally has one or two SM units disabled. The GK110 is the only chip in the line which has more SM units disabled, but that is down to its extended lineup, which gave us several cards on the same core, namely the GeForce GTX 780, GeForce GTX Titan, GeForce GTX 780 Ti and GeForce GTX Titan Black. If there is an SKU with more SM units enabled, NVIDIA will most definitely launch it as a Ti part, as was rumored by a few sources a while back. If there are more disabled SMMs, NVIDIA will launch that as a different card rather than releasing the GTX 870 as a part with 4 SMMs disabled.

The 4 GB GDDR5 route seems to be the reference route for NVIDIA this time around, moving on from the 2 GB GDDR5 VRAM configuration on several Kepler cards. The chips will still be configured around a 256-Bit bus, but this seems to be the route for graphics makers these days, considering AMD is doing the same with its next performance chip, codenamed Tonga (Radeon R9 285), which is considered faster than Tahiti but has a 256-Bit memory bus compared to 384-Bit on the Tahiti based Radeon R9 280 series.

NVIDIA GeForce GTX 870 Leaked Performance Numbers:

The performance numbers of the GeForce GTX 870 are interesting considering they are really close to the GeForce GTX 780. The card was tested with an Intel Core i7-4820K CPU and is currently a test sample, so final performance optimizations and drivers are not in place; GPU-Z couldn't even recognize the card properly and won't until its database is updated. In the 3DMark 11 Performance and Extreme modes, the GeForce GTX 870 scored P11919 and X4625 points respectively. A GeForce GTX 780, on the other hand, scores around P12000 and X4500 points on average. With the GeForce GTX 870 still unoptimized and without any sort of official drivers, I say these numbers are good considering this card will retail at $349 US and get you performance on par with or better than a card based on a flagship chip from the current generation.
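For a quick sense of how close the two cards are, the leaked scores can be compared as percentage differences. A minimal Python sketch using only the four scores quoted above; the GTX 780 figures are the rough averages given in the text, not a controlled test, and the function name is illustrative:

```python
def percent_diff(score, baseline):
    """Difference of score relative to baseline, in percent."""
    return (score - baseline) / baseline * 100


# (GTX 870 leaked score, approximate GTX 780 average) per 3DMark 11 preset
scores = {
    "Performance": (11919, 12000),
    "Extreme": (4625, 4500),
}

for preset, (gtx_870, gtx_780) in scores.items():
    diff = percent_diff(gtx_870, gtx_780)
    print(f"{preset}: GTX 870 is {diff:+.1f}% relative to the GTX 780")
```

That puts the GTX 870 well under 1% behind in the Performance preset and a little under 3% ahead in the Extreme preset, both small enough to fall within the variation between test systems.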

NVIDIA Maxwell GM107 Architecture

The NVIDIA Maxwell GM107 architecture is clearly built from scratch, yet looks like a hybrid of both Kepler and Fermi. The SMM, or Streaming Multiprocessor of Maxwell, replaces the SMX of Kepler. Each SMM is divided into four blocks that are still defined as part of a single SMM, which means the core architecture has become quite dense with Maxwell. Each of these blocks holds 32 CUDA cores, so a single SMM with four of these processing blocks results in 128 CUDA cores, compared to 192 CUDA cores on the SMX.

The GM107 GPC (Graphics Processing Cluster) consists of five of these streaming multiprocessors, which are connected to a Raster Engine. Each SMM contains a Polymorph Engine 2.0, which includes the Vertex Fetch, Tessellator, Viewport Transform, Attribute Setup and Stream Output stages. Each SMM has 8 texture mapping units, which equates to 40 on the whole chip, alongside 16 ROPs and two 64-bit memory controllers.
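The GM107 layout described above can be written down as a small data structure to check the totals. This is only an illustrative Python model built from the numbers in the text; the class and field names are made up for the example, and it assumes the five SMMs make up the whole chip, as the 40 TMU figure implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GM107Layout:
    """Illustrative model of the GM107 unit counts quoted in the article."""
    smm_count: int = 5              # SMMs in the GPC
    blocks_per_smm: int = 4         # processing blocks per SMM
    cores_per_block: int = 32       # CUDA cores per block
    tmus_per_smm: int = 8           # texture mapping units per SMM
    memory_controllers: int = 2
    controller_width_bits: int = 64

    @property
    def cores_per_smm(self):
        return self.blocks_per_smm * self.cores_per_block

    @property
    def total_cores(self):
        return self.smm_count * self.cores_per_smm

    @property
    def total_tmus(self):
        return self.smm_count * self.tmus_per_smm

    @property
    def bus_width_bits(self):
        return self.memory_controllers * self.controller_width_bits


gm107 = GM107Layout()
print(gm107.cores_per_smm)   # 128
print(gm107.total_cores)     # 640
print(gm107.total_tmus)      # 40
print(gm107.bus_width_bits)  # 128
```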

There's also a lot more cache on Maxwell: 2 MB of L2 in total compared to 256 KB on Kepler GK107, which reduces GPU queuing. IPC (instructions per clock) has been increased on the GPU core and workload balancing has been improved. NVIDIA has improved H.264 encoding and decoding with NVENC on Maxwell, which should result in better performance with ShadowPlay, and improved sleep states have been implemented to reduce power draw while the GPU is idle. All of this is packed into a die which measures 148 mm², which means transistor density has been upped by 15% on the same 28nm process.

  • NVIDIA GM200/GM210 (Maxwell Architecture, High-Performance for Tesla/Quadro, Arrives Later for Consumers, Successor of GK110)
  • NVIDIA GM204 (Maxwell Architecture, High-End Consumer, Successor of GK104, First GeForce 800 Series Products Likely to Feature)
  • NVIDIA GM206 (Maxwell Architecture, Performance Minded, Successor of GK106, Mid-Range GeForce 800 Series Products to Feature)
  • NVIDIA GM107/207 (Maxwell Architecture, Entry Level, Successor of GK107, Entry Level GeForce 800/700 Series to Feature, Already Introduced on GTX 750 Ti / GTX 750)


NVIDIA GeForce GTX 870 and GTX 880 Specifications (What We Know So Far):

| | GeForce GTX 470 | GeForce GTX 480 | GeForce GTX 570 | GeForce GTX 580 | GeForce GTX 670 | GeForce GTX 680 | GeForce GTX 770 | GeForce GTX 780 | GeForce GTX 780 Ti | GeForce GTX 870 | GeForce GTX 880 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Codename | GF100 | GF100 | GF110 | GF110 | GK104 | GK104 | GK104 | GK110 | GK110 | GM204 | GM204 |
| Process | 40nm | 40nm | 40nm | 40nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| GPU Core | Fermi | Fermi | Fermi | Fermi | Kepler | Kepler | Kepler | Kepler | Kepler | Maxwell | Maxwell |
| SM Units | 14 x 32 | 15 x 32 | 15 x 32 | 16 x 32 | 7 x 192 | 8 x 192 | 8 x 192 | 12 x 192 | 15 x 192 | 13 x 128 | 16 x 128 |
| CUDA Cores | 448 | 480 | 480 | 512 | 1344 | 1536 | 1536 | 2304 | 2880 | 1664 | 2048 |
| ROPs | 40 | 48 | 40 | 48 | 32 | 32 | 32 | 48 | 48 | TBC | 64 |
| TMUs | 56 | 60 | 60 | 64 | 112 | 128 | 128 | 192 | 240 | TBC | 128 |
| Core Clock | 607 MHz | 700 MHz | 732 MHz | 772 MHz | 915 MHz | 1006 MHz | 1046 MHz | 863 MHz | 875 MHz | 1051 MHz | 1126 MHz |
| Boost Clock | 1215 MHz (Shader Clock) | 1401 MHz (Shader Clock) | 1464 MHz (Shader Clock) | 1544 MHz (Shader Clock) | 980 MHz | 1058 MHz | 1085 MHz | 900 MHz | 928 MHz | 1178 MHz | 1216 MHz |
| Memory | 1.2 GB GDDR5 | 1.5 GB GDDR5 | 1.2 GB GDDR5 | 1.5 GB GDDR5 | 2 GB GDDR5 | 2 GB GDDR5 | 2 GB GDDR5 | 3 GB GDDR5 | 3 GB GDDR5 | 4 GB GDDR5 | 4 GB GDDR5 |
| Memory Bus | 320-Bit | 384-Bit | 320-Bit | 384-Bit | 256-Bit | 256-Bit | 256-Bit | 384-Bit | 384-Bit | 256-Bit | 256-Bit |
| Memory Clock (Effective) | 3.34 GHz | 3.69 GHz | 3.80 GHz | 4.0 GHz | 6.0 GHz | 6.0 GHz | 7.0 GHz | 6.0 GHz | 7.0 GHz | 7.0 GHz | 7.0 GHz |
| Memory Bandwidth | 133.34 GB/s | 177.4 GB/s | 152.00 GB/s | 192.4 GB/s | 192.0 GB/s | 192.0 GB/s | 224.5 GB/s | 288.6 GB/s | 336.0 GB/s | 224.5 GB/s | 224.5 GB/s |
| Texture Fill Rate (GT/s) | 34 | 42 | 43.92 | 49.41 | 102.5 | 128.8 | 134 | 166 | 210 | 145.0 | TBC |
| TDP | 215W | 250W | 219W | 244W | 170W | 192W | 220W | 250W | 250W | 148W | 165W |
| Power Connectors | 8+6 Pin | 8+6 Pin | 6+6 Pin | 8+6 Pin | 6+6 Pin | 6+6 Pin | 8+6 Pin | 8+6 Pin | 8+6 Pin | 6+6 Pin | 6+6 Pin |
| DirectX 12 Support | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Launch | March 26th 2010 | March 26th 2010 | December 7th 2010 | November 9th 2010 | May 10th 2012 | March 22nd 2012 | May 30th 2013 | May 23rd 2013 | December 2013 | September 18th 2014 | September 18th 2014 |
| Price | $349 US | $499 US | $349 US | $499 US | $349 US | $499 US | $349 US | $499 US | $699 US | $299 Reference, $329+ Custom | $549 Reference, $549+ Custom |

 
