Big Maxwell, GM200, has been one of the most anticipated GPUs from NVIDIA in a very long time. The prospect of having that many Maxwell cores at our disposal to run games at the highest possible detail levels and resolutions is indeed quite tantalizing. Oh so very tantalizing. Big Maxwell and the Titan X represent the fastest single-GPU technology NVIDIA has ever produced, with theoretical single-precision performance approaching that of the Titan Z, a dual-GK110 part.


 

Big Maxwell

Sure, there has been a lot of hype surrounding nearly every launch of a GPU that makes a large technological leap. Kepler was great; it even arrived mostly keeping the promises the hype had made. Big Kepler? Yeah, we all got excited about GK110 alright, and it gave us some fantastic performance numbers, in compute-related tasks too. You mean we can have our single-precision cake and eat our double-precision cake too? Finally, NVIDIA! Then came Big Maxwell. The rumors were coming at us fast and furious: lots of CUDA cores, 12GB of RAM, and all on the ubiquitous and mature 28nm process from TSMC we’ve all come to know and love. It sounded perfect, absolutely perfect. There is, of course, the giant purple and pink polka-dotted elephant in the room: the $999 price of entry. But we’ll talk about that near the end. First, let’s explore what it can do.

Then some performance numbers started getting leaked. Nothing real-world, mind you, but 3DMark is at least something to measure it against the competition with. And it was good. Over 6 TFLOPS of single-precision computing power to tackle all your needs. Gaming needs, of course. Unfortunately, the FP64 performance has been a bit neutered.

The Titan X makes use of the absolutely marvelous second-generation Maxwell in the form of the GM200-400-A1 core, a chip much larger than the previous GM204 and GM206. Instead of GM204’s more pedestrian 2048 CUDA cores, GM200 ups that to 3072 CUDA cores, with 192 texture units and 96 raster devices at their disposal. It’s paired with 12GB of GDDR5 VRAM running at 7GHz effective over a 384-bit memory bus. There are 8 billion transistors packed into a 601mm2 area.
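That memory spec is exactly where the bandwidth figure you’ll see below comes from, if you want to check the math yourself:

384-bit bus ÷ 8 bits per byte = 48 bytes per transfer
48 bytes × 7 Gbps effective GDDR5 data rate = 336 GB/s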

Those 3072 cores are split up amongst six Graphics Processing Clusters (GPCs), each containing four SMM units of 128 cores apiece. All of it is interconnected and sits on 12GB of RAM with 336GB/s of memory bandwidth. That translates roughly into a great gaming experience delivered quite nicely within the 250W TDP that NVIDIA promises, or very close to it.
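The headline numbers line up neatly with that layout, and it’s where the leaked 6+ TFLOPS figure comes from:

6 GPCs × 4 SMMs × 128 CUDA cores = 3072 cores
3072 cores × 2 floating-point operations per clock (one fused multiply-add) × ~1.0 GHz base clock ≈ 6.2 TFLOPS of single-precision throughput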

GM200 block diagram

GM200 is an interesting chip design coming from NVIDIA. It isn’t what GK110 was by any means: it doesn’t have hidden or sequestered transistors pointing towards more components waiting to be leveraged, the way GK110 did. There isn’t anything in those 8 billion transistors that isn’t being utilized. Essentially, the Titan X is almost literally a GTX 980 with 50% more SMMs.

NVIDIA GeForce GTX Titan X Specifications:

| | GTX Titan X | GTX Titan Black | GTX 980 | GTX 970 | GTX 960 |
| GPU Architecture | Maxwell | Kepler | Maxwell | Maxwell | Maxwell |
| GPU Name | GM200 | GK110 | GM204 | GM204 | GM206 |
| Die Size | 601mm2 | 561mm2 | 398mm2 | 398mm2 | 228mm2 |
| Process | 28nm | 28nm | 28nm | 28nm | 28nm |
| CUDA Cores | 3072 | 2880 | 2048 | 1664 | 1024 |
| Texture Units | 192 | 240 | 128 | 104 | 64 |
| Raster Devices | 96 | 48 | 64 | 64 | 32 |
| Clock Speed | 1002 MHz | 889 MHz | 1126 MHz | 1051 MHz | 1127 MHz |
| Boost Clock | 1089 MHz | 980 MHz | 1216 MHz | 1178 MHz | 1178 MHz |
| VRAM | 12 GB GDDR5 | 6 GB GDDR5 | 4 GB GDDR5 | 4 GB GDDR5 | 2 GB GDDR5 |
| Memory Bus | 384-bit | 384-bit | 256-bit | 256-bit | 128-bit |
| Memory Clock | 7.0 GHz | 7.0 GHz | 7.0 GHz | 7.0 GHz | 7.0 GHz |
| Memory Bandwidth | 336.0 GB/s | 336.0 GB/s | 224.0 GB/s | 224.0 GB/s | 112.0 GB/s |
| TDP | 250W | 250W | 165W | 145W | 120W |
| Power Connectors | 8+6 Pin | 8+6 Pin | Two 6-Pin | Two 6-Pin | One 6-Pin |
| Price | $999 US | $999 US | $549 US | $329 US | $199 US |

Eff Pee Sixty... What?

Does the FP64 performance really matter? Not necessarily, and of course that depends on the industry you want to use it in. Oddly enough, AMD generally offers good native FP64 performance, making its cards good choices for compute-heavy workloads. But gaming and most commercial software rely on FP32 or lower precision; only scientific workloads really make use of anything higher. So for the average Joe or Jane? You’re fine, and this bad boy will be more than enough.

GM200 spends all of its space catering to FP32 workloads, making it a purely gaming-centric device. Having so little die area dedicated to FP64 units means that the Quadro products will be just as limited, and that there likely won’t be a Tesla part made from GM200. This is a gaming GPU through and through. But guess what? It delivers the goods, and it delivers them with lots of cake and pie to boot!

Of course, the original Titan, Titan Black and Titan Z had non-neutered FP64, giving them a “semi-professional” status amongst some. Doing the same here would have vastly increased the size of the chip, increased both the manufacturing cost and the retail price, and made the card far more power hungry and hot than it is. Perhaps at 20nm or below we’ll see a Titan without the FP64 cuts again. But until then, this thing is blazing fast for just about anything. So get over yourself already.

I really have to reiterate this for anyone who might somehow think that the lack of FP64 means the chip is poorly made, a cop-out, or just plain broken compared to AMD. FP64 means nothing to the gaming crowd. It also means nothing to the majority of the commercial crowd. The only time you need FP64 is in scientific applications that absolutely require double precision. Sure, it looks cool to have that spec, but it’s meaningless for 99% of us. Even some very well put together scientific distributed computing projects only make use of single precision, so the FP64 units would sit idle anyway. If you truly need double precision, don’t buy this; you already know who you are, and you won’t be looking at the Titan X for that.
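If the FP32/FP64 distinction sounds abstract, here is roughly what it boils down to in code. This is just a minimal CUDA sketch of my own to illustrate the point, not anything NVIDIA ships: the two kernels are identical except for the data type, and since GM200 runs double precision at 1/32 of its single-precision rate, an arithmetic-heavy loop like this slows to a crawl in FP64.

#include <cuda_runtime.h>
#include <cstdio>

// Identical math, different precision: the only change is float vs. double.
__global__ void fma_loop_fp32(float *out, int iters) {
    float x = 1.0001f;
    for (int i = 0; i < iters; ++i)
        x = x * 0.9999f + 0.0001f;              // FP32 fused multiply-adds
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

__global__ void fma_loop_fp64(double *out, int iters) {
    double x = 1.0001;
    for (int i = 0; i < iters; ++i)
        x = x * 0.9999 + 0.0001;                // FP64 fused multiply-adds
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

int main() {
    const int blocks = 1024, threads = 256, iters = 100000;
    float *out32;
    double *out64;
    cudaMalloc(&out32, blocks * threads * sizeof(float));
    cudaMalloc(&out64, blocks * threads * sizeof(double));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms;

    // Time the single-precision loop.
    cudaEventRecord(start);
    fma_loop_fp32<<<blocks, threads>>>(out32, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("FP32 loop: %.2f ms\n", ms);

    // Time the double-precision loop; on GM200 this takes far longer.
    cudaEventRecord(start);
    fma_loop_fp64<<<blocks, threads>>>(out64, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("FP64 loop: %.2f ms\n", ms);

    cudaFree(out32);
    cudaFree(out64);
    return 0;
}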

Another use NVIDIA sees for this card is as the perfect companion to those nifty new VR headsets from Valve, Oculus or even Razer's open-source effort. The massive 12GB of RAM, not to mention the plethora of CUDA cores, makes it possible to process large amounts of data at a time and keep it all resident in memory, and perhaps some head-tracking work could even be offloaded to the GPU to help keep things running a bit more smoothly. Not all of the GPU is always going to be completely utilized, so why not add in some OpenCL or CUDA calls to help with other tasks in a game? That may add overhead in conjunction with some titles, of course.
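To make that idea a little more concrete, here's a rough CUDA sketch of what "offloading a side task to the GPU" can look like. The kernel and its name are hypothetical stand-ins of mine, but the mechanism, launching work asynchronously on its own stream so the GPU fits it in around the graphics workload, is the real one.

#include <cuda_runtime.h>

// Hypothetical side task: smooth a buffer of tracking samples on the GPU.
__global__ void smooth_tracking_samples(const float *raw, float *smoothed, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        smoothed[i] = 0.25f * raw[i - 1] + 0.5f * raw[i] + 0.25f * raw[i + 1];
}

int main() {
    const int n = 4096;
    float *raw, *smoothed;
    cudaMalloc(&raw, n * sizeof(float));
    cudaMalloc(&smoothed, n * sizeof(float));
    cudaMemset(raw, 0, n * sizeof(float));       // stand-in for real sensor data

    cudaStream_t side_work;
    cudaStreamCreate(&side_work);

    // Asynchronous launch: the CPU (and the game's render thread) keeps going
    // while the GPU slots this kernel in alongside its graphics work.
    smooth_tracking_samples<<<(n + 255) / 256, 256, 0, side_work>>>(raw, smoothed, n);

    // Only block when the result is actually needed.
    cudaStreamSynchronize(side_work);

    cudaStreamDestroy(side_work);
    cudaFree(raw);
    cudaFree(smoothed);
    return 0;
}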

Oh it’s good alright.

To test this beast I’ve compiled a great list of benchmarks for you. First are the comparative benchmarks that pit the Titan X against its direct and indirect competitors. Then I have a long list of other benchmarks, a fairly all-inclusive mix of both old and new games. I even have the original Crysis in there. These list frame rates from the Titan X alone and are for your enjoyment. Feel free to report your own numbers from odd games and other applications in the comments; we’d love to see what you’ve done with your own setups. Lastly, I have a list of compute benchmarks to round it off. I’ve used distributed computing projects as well as more mainstream benchmarks that can serve as a point of comparison.


Test Setup

My test bench is a bit more pedestrian, something that represents the majority rather than the minority. Unfortunately, the Titan X is most definitely not the majority. That’s okay though, because it means we’re depicting what one could do to save costs elsewhere and still afford a Titan X while maintaining a good PC.

 

CPU Intel Xeon E3-1230 v3 @ 3.3GHz
Motherboard Gigabyte Z97N-WiFi Mini-ITX
Power Supply XFX 1250W Pro Black Edition
Boot Drive SanDisk Extreme II 120GB SSD
Storage Disk Seagate 2TB
Memory Crucial Ballistix Tactical Tracer 8GB (4GBx2) DDR3 1866
Monitor BenQ BL2710PT 27" WQHD
Video Cards GeForce GTX Titan X, GeForce GTX 980 Reference, GeForce GTX 970 Reference, AMD R9 295X2
Drivers NVIDIA 347.84 Beta, AMD Catalyst 14.2
Operating System Windows 8.1 Pro

For all the comparative tests, MSAA was set to 2x to even the playing field. Battlefield 4 consisted of a play-through on a 64-player server on the Siege of Shanghai map. Crysis 3's benchmark was done by playing through the first level. Fraps was used to capture the frame rates for Battlefield 4 and Crysis 3; the rest used the internal benchmarking tools that were available.

Benchmarks

Battlefield 4 [benchmark chart]

Crysis 3 [benchmark chart]

Dragon Age: Inquisition [benchmark chart]

Middle-earth: Shadow of Mordor [benchmark chart]

Civilization: Beyond Earth [benchmark chart]
Compilation

For the compilation I've used many games from our past, plus some current but perhaps less common titles, to highlight the performance this beast is capable of. Some games had a frame rate cap regardless of vsync and were thus useless; other titles allowed me to remove the cap, so I did for those.

I want to introduce the play-through style of benchmarking. Benchmarking a GPU in the context of gaming is not strictly scientific, nor should it be treated as such. Playing games presents the GPU with differing and challenging situations to render that constantly change. Simply running through a canned benchmark isn't indicative of how a card will run in the real world. Sure, it stresses the components and tells you how well it runs the engine, but how does it handle the random events that come into play during a real play-through? That's why I play through a level and capture the information to show you. I'll certainly provide benchmarks that cater to those looking to systematically compare numbers, but I also want you to be informed of how it handles a game in a more realistic way.

All games consisted of a play-through of a level, typically the first level offered. Fraps was used to capture the frame rate information. More games will be added to the list as I have the time and the resources to play them.

All of these games were run at the highest possible settings at WQHD resolution.
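For anyone curious how a Fraps capture becomes the averages in the charts, the gist is simple: the log is a list of timestamps, and the gaps between them are your frame times. Here's a small sketch of my own that does exactly that; it assumes a two-column "frame, cumulative time in milliseconds" file like the frametimes log Fraps writes out, so treat the exact format as an assumption.

#include <algorithm>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main(int argc, char **argv) {
    std::ifstream log(argc > 1 ? argv[1] : "frametimes.csv");
    std::string line;
    std::getline(log, line);                        // skip the header row

    std::vector<double> stamps;                     // cumulative times in ms
    while (std::getline(log, line)) {
        std::stringstream ss(line);
        std::string frame, time_ms;
        if (std::getline(ss, frame, ',') && std::getline(ss, time_ms, ','))
            stamps.push_back(std::stod(time_ms));
    }
    if (stamps.size() < 2) return 1;

    std::vector<double> frame_ms;                   // per-frame times
    for (std::size_t i = 1; i < stamps.size(); ++i)
        frame_ms.push_back(stamps[i] - stamps[i - 1]);

    double total_ms = stamps.back() - stamps.front();
    double avg_fps  = 1000.0 * frame_ms.size() / total_ms;
    double worst_ms = *std::max_element(frame_ms.begin(), frame_ms.end());
    std::printf("Average: %.1f fps, worst frame: %.1f ms (%.1f fps)\n",
                avg_fps, worst_ms, 1000.0 / worst_ms);
    return 0;
}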

Now we’re on to the compute benchmarks. But first, as you can plainly see, the Titan X provides a gaming experience that rivals even the R9 295X2 at 1440p. It certainly gives it a run, though not quite for its money, since the Titan is the more expensive card.

But I digress. Now let’s see how that high-throughput single precision does in the real world. I’ve selected three distributed computing tests, from Einstein@Home, POEM@Home and PrimeGrid. All three were selected because they’re updated frequently, and recently enough to take Maxwell’s architecture into account. I’ve also included the benches from a novel little benchmark known as ViennaBench. ViennaCL is a linear algebra library that runs on OpenCL and even CUDA. The creator, Karl Rupp, has added a nifty benchmark that measures performance in a number of different mathematical tests.
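For a feel of what ViennaCL actually does, here's a minimal sketch along the lines of its introductory examples; I'm writing it from memory, so treat the exact headers and calls as an approximation rather than gospel. You fill standard C++ vectors on the CPU, copy them into ViennaCL's device-side containers, and the library runs the linear algebra on the GPU through OpenCL (or CUDA, if built with that backend).

#include <vector>

#include "viennacl/vector.hpp"
#include "viennacl/linalg/inner_prod.hpp"

int main() {
    const std::size_t n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);       // host-side data

    viennacl::vector<float> vx(n), vy(n);            // device-side buffers
    viennacl::copy(x.begin(), x.end(), vx.begin());  // upload to the GPU
    viennacl::copy(y.begin(), y.end(), vy.begin());

    // The dot product itself is computed on the GPU.
    float dot = viennacl::linalg::inner_prod(vx, vy);

    return dot > 0.0f ? 0 : 1;
}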

Time is in seconds, faster is better.

ViennaCL is in GB/s and GFLOPs, more is better here.

POEM@Home [compute benchmark chart]

Einstein@Home [compute benchmark chart]

PrimeGrid [compute benchmark chart]

ViennaCL [compute benchmark chart]

ViennaCL can be run in both single and double precision modes, and you can certainly see the lack of FP64 performance here too.

Temperatures

The unfortunate thing about cramming so many transistors into such a small area on the same process node is the increase in heat generation across the entire core. Does this mean it runs hot? No! But it does run hotter than a GTX 980, and that could limit the overclocking potential as well. Unfortunately, I don't have professional audio monitoring hardware, so I can't tell you exactly how quiet it is, except that it isn't annoying in the slightest.

Now what?

All the benchmarks in the world are fantastic to look at. Now we know about the chip, how it works and what it can do. But what about that multi-colored polka-dotted elephant in the room? That price is absolutely enormous. $999 is a lot of dough to slap down for a GPU that performs as well as, though sometimes better than, a dual-GPU card from Team Red that costs less. But in the context of NVIDIA's current inventory, the Titan X does indeed cost as much as, if not less than, an SLI setup of GTX 980s, and performs nearly as well while using less power.

So what does one do in this type of dilemma? Certainly the gamer with disposable income who wants the absolute best in performance should look at NVIDIA's best solution, the Titan X. But does it fit within the confines of a more modest system limited by its other components? My fairly inexpensive Xeon E3-1230 v3 actually seems to be able to feed it fast enough to make it worthwhile, perhaps. I can't truly tell you whether or not it's a great and amazing buy at $999. The Titan X is expensive, and NVIDIA seems able to price the card to reflect the lack of real competition for it at the moment. So very expensive!

What I can tell you, however, is how enjoyable it was to play with it and use it. It's fast alright, and it makes old games play beautifully just as it enables higher settings in newer ones. Is the Titan X worth that $999 price of entry? Not quite. Even $800 would be far more enticing; in fact, I would give this a perfect 10 if that were the case. But it's just too expensive to justify for the average gamer. Then again, a card for the average gamer this ain't. If you need or want the best, then of course it's worth it!

I dare you to buy one. It's fantastic!

Overclocking and SLI performance will be covered in an upcoming article. Expect to see some even better performance numbers when two of these babies are put together.
