Benchmarking DX12 and Vulkan is a bit of a different beast. Working with DX9, 10, and 11 in the past was fairly painless, requiring nothing more than a licensed copy of Fraps and a Fraps analysis tool to provide us with frame times as well as Average, 99th and 99.9th percentile lows. This gives a more granular and accurate picture of how a game performs with a particular graphics card. Yes, there are more advanced methods such as FCAT, but for me that’s a bit out of reach due to the expense of the hardware required for capturing and analyzing the render outputs. So in walk DX12 and Vulkan, and this changes everything. Not having FCAT at my disposal, I’ve resorted to learning the ins and outs of PresentMon. This has been no easy task for me, but luckily I had a hand from a few other reviewers in learning how best to implement it. I want to take a moment and thank AdoredTV, Son of a Tech, and Donny from Custom PC Review. Thanks to these fellows I am now able to bring you all DX12 and Vulkan results going forward.
Cards
The two cards going head to head today are the XFX RX 480 8GB OC (flashed from 4GB) and the NVIDIA GTX 1060 Founders Edition. We’re running them through our battery of DX12/Vulkan titles to see where things stand today with these next generation APIs.
XFX Radeon RX 480 8GB OC
The RX 480 is AMD Radeon’s latest generation Polaris based 14nm graphics card. The RX 480 features 2304 Stream Processors cranking up to 1266MHz, or 1288MHz in our case with the XFX OC model. It comes in one of two VRAM configurations, with either 4GB of GDDR5 clocked at 7Gbps or 8GB of GDDR5 pumped to 8Gbps. This is all on a 256-bit memory bus and sports a 150W TDP. We’re using the reference design card for these tests.
AMD RX 400 Series Specifications
Graphics Card Name | AMD Radeon RX 480 | AMD Radeon RX 470 | AMD Radeon RX 460 |
---|---|---|---|
Graphics Core | Polaris 10 XT | Polaris 10 Pro | Polaris 11 |
Process Node | 14nm FinFET | 14nm FinFET | 14nm FinFET |
Boost Clock | 1266 MHz | 1206 MHz | 1200 MHz |
Peak Compute | 5.83 TFLOPs | 4.9 TFLOPs | 2.2 TFLOPs |
Memory | 4/8 GB GDDR5 | 4/8 GB GDDR5 | 2/4 GB GDDR5 |
Memory Interface | 256-bit | 256-bit | 128-bit |
Memory Speed | 8 Gbps | 6.6 Gbps | 7 Gbps |
Memory Bandwidth | 256 GB/s | 211 GB/s | 112 GB/s |
Power | 150W | 120W | 75W |
MSRP | $199 (4 GB) $239 (8 GB) | $179 (4 GB) | $109 (2 GB) |
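For anyone curious where the Peak Compute and Memory Bandwidth rows come from, here is a minimal Python sketch of the usual back-of-the-envelope arithmetic. The function names are my own, purely for illustration. The compute figure assumes each Stream Processor retires one fused multiply-add (2 FLOPs) per clock, and the inputs in the example are the RX 480 numbers quoted above.

```python
def peak_fp32_tflops(shaders, boost_mhz):
    """Peak FP32 rate, assuming 2 FLOPs (one fused multiply-add) per shader per clock."""
    return 2 * shaders * boost_mhz * 1e6 / 1e12


def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Bandwidth in GB/s: per-pin data rate times bus width, converted from bits to bytes."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    # RX 480: 2304 Stream Processors at 1266 MHz boost, 256-bit bus at 8 Gbps
    print("Peak compute (TFLOPs):", round(peak_fp32_tflops(2304, 1266), 2))
    print("Memory bandwidth (GB/s):", memory_bandwidth_gbs(256, 8))
```

These are theoretical peaks only. They say nothing about how a card behaves in a real game, which is what the rest of this write-up is for.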
NVIDIA GeForce GTX 1060 Founders Edition
The GTX 1060 is NVIDIA’s smallest missile in their 16nm Pascal assault. The GTX 1060 features 1280 CUDA cores screaming along at a rated boost clock of 1708MHz, though we found ours runs easily past that, settling around 1860MHz consistently. Only one memory configuration comes out of the GTX 1060, with 6GB of GDDR5 at 8Gbps on a 192-bit bus. I know there’s that one 3GB model from Zotac floating around, but I’m not counting that. All of this is wrapped up nicely in a 120W TDP configuration. For these tests we are using our Founders Edition of the GTX 1060.
NVIDIA GeForce 10 Pascal Family
Graphics Card Name | NVIDIA GeForce GTX 1050 2 GB | NVIDIA GeForce GTX 1050 3 GB | NVIDIA GeForce GTX 1050 Ti | NVIDIA GeForce GTX 1060 3 GB | NVIDIA GeForce GTX 1060 5 GB | NVIDIA GeForce GTX 1060 6 GB | NVIDIA GeForce GTX 1070 | NVIDIA GeForce GTX 1070 Ti | NVIDIA GeForce GTX 1080 | NVIDIA Titan X | NVIDIA GeForce GTX 1080 Ti | NVIDIA Titan Xp |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Graphics Core | GP107 | GP107 | GP107 | GP106 / GP104 | GP106 | GP106 / GP104 | GP104 | GP104 | GP104 | GP102 | GP102 | GP102 |
Process Node | 14nm FinFET | 14nm FinFET | 14nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET |
Die Size | 132 mm² | 132 mm² | 132 mm² | 200 mm² | 200 mm² | 200 mm² | 314 mm² | 314 mm² | 314 mm² | 471 mm² | 471 mm² | 471 mm² |
Transistors | 3.3 Billion | 3.3 Billion | 3.3 Billion | 4.4 Billion | 4.4 Billion | 4.4 Billion | 7.2 Billion | 7.2 Billion | 7.2 Billion | 12 Billion | 12 Billion | 12 Billion |
CUDA Cores | 640 CUDA Cores | 768 CUDA Cores | 768 CUDA Cores | 1152 CUDA Cores | 1280 CUDA Cores | 1280 CUDA Cores | 1920 CUDA Cores | 2432 CUDA Cores | 2560 CUDA Cores | 3584 CUDA Cores | 3584 CUDA Cores | 3840 CUDA Cores |
Base Clock | 1354 MHz | 1392 MHz | 1290 MHz | 1506 MHz | 1506 MHz | 1506 MHz | 1506 MHz | 1607 MHz | 1607 MHz | 1417 MHz | 1480 MHz | 1480 MHz |
Boost Clock | 1455 MHz | 1518 MHz | 1392 MHz | 1708 MHz | 1708 MHz | 1708 MHz | 1683 MHz | 1683 MHz | 1733 MHz | 1530 MHz | 1583 MHz | 1582 MHz |
FP32 Compute | 1.8 TFLOPs | 2.3 TFLOPs | 2.1 TFLOPs | 4.0 TFLOPs | 4.4 TFLOPs | 4.4 TFLOPs | 6.5 TFLOPs | 8.1 TFLOPs | 9.0 TFLOPs | 11 TFLOPs | 11.5 TFLOPs | 12.5 TFLOPs |
VRAM | 2 GB GDDR5 | 3 GB GDDR5 | 4 GB GDDR5 | 3 GB GDDR5 | 5 GB GDDR5 | 6 GB GDDR5/X | 8 GB GDDR5/X | 8 GB GDDR5 | 8 GB GDDR5X | 12 GB GDDR5X | 11 GB GDDR5X | 12 GB GDDR5X |
Memory Speed | 7 Gbps | 7 Gbps | 7 Gbps | 8 Gbps | 8 Gbps | 9 Gbps / 10 Gbps | 8 Gbps | 8 Gbps | 11 Gbps | 10 Gbps | 11 Gbps | 11.4 Gbps |
Memory Bandwidth | 112 GB/s | 84 GB/s | 112 GB/s | 192 GB/s | 160 GB/s | 224 GB/s / 240 GB/s | 256 GB/s | 256 GB/s | 352 GB/s | 480 GB/s | 484 GB/s | 547 GB/s |
Bus Interface | 128-bit bus | 96-bit bus | 128-bit bus | 192-bit bus | 160-bit bus | 192-bit bus | 256-bit bus | 256-bit bus | 256-bit bus | 384-bit bus | 352-bit bus | 384-bit bus |
Power Connector | None | None | None | Single 6-Pin Power | Single 6-Pin Power | Single 6-Pin Power | Single 8-Pin Power | Single 8-Pin Power | Single 8-Pin Power | 8+6 Pin Power | 8+6 Pin Power | 8+6 Pin Power |
TDP | 75W | 75W | 75W | 120W | 120W | 120W | 150W | 180W | 180W | 250W | 250W | 250W |
Display Outputs | 1x Display Port 1.4 1x HDMI 2.0b 1x DVI | 1x Display Port 1.4 1x HDMI 2.0b 1x DVI | 1x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b 1x DVI | 3x Display Port 1.4 1x HDMI 2.0b | 3x Display Port 1.4 1x HDMI 2.0b |
Launch Date | October 2016 | May 2018 | October 2016 | September 2016 | August 2018 | July 2016 | June 2016 | October 2017 | May 2016 | August 2016 | March 2017 | April 2017 |
Launch Price | $109 US | $119 US-$129 US | $139 US | $199 US | TBD | $249 US | $349 US | $449 US | $499 US | $1200 US | $699 US | $1200 US |
Test System
With the cards briefly out of the way, let us jump into the test rig we’re using. No, we’re not using our 6 core i7 test rig this go around, as this entire article is being written from the comfort of a beach front hotel room and I couldn’t take all that with me. Instead this is all being done on my personal gaming rig. I feel it is a fairly typical setup for these graphics cards, albeit in a somewhat atypical form factor.
Intel Core i5 6600k Test System
CPU | Intel i5 6600K (4GHz) |
Case/PSU | EVGA Hadron and 500W PSU |
GPU | XFX RX 480 8GB OC, NVIDIA GTX 1060 FE |
HDD | 2TB Seagate SSHD |
Memory | 16GB (2x8) G.Skill Trident Z 3200MHz |
Motherboard | EVGA Z170 Stinger |
SSD | Crucial MX100 512GB |
Testing Methodology
A word on the testing method: throughout this write-up I am using PresentMon to log the frame time of every frame presented, and the frame rates are derived from those frame times. Each title below includes several bits of information. First, we’ve recorded the exact benchmark run used for our measurements so that you know exactly where and how we tested. Next, we’ve included all of the settings for each game so that you can replicate this for yourselves if you so wish. Finally, and this is something I want to improve on in the future, there is the Average FPS. I haven’t yet learned the proper formula for extrapolating the 99th and 99.9th percentile results. Because of that little road bump, I’ve taken the entirety of the frame rates from each run and plotted them on a graph that depicts FPS over time. Hopefully this sheds a little more light on the results than a single number would, and lets you visually compare one card to the next. All tests were run at 1080p.
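For readers who want to crunch their own logs, here is a minimal Python sketch of one common way to turn per-frame times into an average FPS, percentile figures, and an FPS-over-time series. It is not the process used for the charts in this article. It assumes the capture is a CSV with a per-frame column named `MsBetweenPresents` (adjust the name if your PresentMon version labels it differently), and it uses the nearest-rank definition of a percentile, which is only one of several in use. All function names are my own.

```python
import csv
import io
import math


def load_frame_times(csv_file, column="MsBetweenPresents"):
    """Read per-frame times in milliseconds from a PresentMon-style CSV (column name assumed)."""
    reader = csv.DictReader(csv_file)
    return [float(row[column]) for row in reader if row.get(column)]


def average_fps(frame_times_ms):
    """Average FPS = frames rendered / total seconds elapsed."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_frame_time(frame_times_ms, pct):
    """Nearest-rank percentile of frame time: pct percent of frames were this fast or faster."""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(pct * len(ordered) / 100.0)
    rank = min(len(ordered), max(1, rank))
    return ordered[rank - 1]


def percentile_low_fps(frame_times_ms, pct):
    """FPS equivalent of the pct-th percentile frame time."""
    return 1000.0 / percentile_frame_time(frame_times_ms, pct)


def fps_over_time(frame_times_ms):
    """Count the frames completed in each whole second of the run."""
    counts = []
    elapsed_ms = 0.0
    for frame_time in frame_times_ms:
        elapsed_ms += frame_time
        second = int(elapsed_ms // 1000)
        while len(counts) <= second:
            counts.append(0)
        counts[second] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows standing in for a real capture.
    sample = io.StringIO(
        "Application,MsBetweenPresents\n"
        "game.exe,16.0\n"
        "game.exe,20.0\n"
        "game.exe,40.0\n"
    )
    times = load_frame_times(sample)
    print("Average FPS:", round(average_fps(times), 1))
    print("99th percentile FPS:", round(percentile_low_fps(times, 99), 1))
    print("FPS per second:", fps_over_time(times))
```

Note that the average is computed from total frames over total time rather than by averaging per-frame FPS values, since the latter overweights short frames. The last bucket returned by `fps_over_time` usually covers a partial second, so it is worth trimming before plotting.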
Drivers Used
GeForce 368.81
Crimson 16.7.3
Ashes of the Singularity
Ashes of the Singularity has possibly been the longest-running go-to DX12 benchmark, mostly because it was one of the first. Most benchmark results you see floating around use the “Crazy” preset for this game, but we’re using “High” as we feel it’s fairly representative of what you would be running this game at if you owned one of these cards.
DOOM
DOOM is the first non-beta example of the Vulkan API running with full Asynchronous Compute support. We made sure we ran this game with the settings that would take full advantage of this feature. One thing I will say about this game is that it really shows you don’t have to use DirectX if you want to make a beautiful game.
Forza Motorsport 6: Apex
Apex has to be the first title to come out of the Windows Store using the UWP that didn’t perform like a sack of rotten potatoes on day one. It has enjoyed very good performance across the board ever since. The hardest part of benchmarking this game was stopping and not continuing to the next lap!
Gears of War: Ultimate Edition
Remember what I said about performing like a sack of rotten potatoes? This is the game I was referring to. The Windows Store’s first big DX12 launch was an absolute disaster performance wise. I’m happy to report that this is no longer the case, even though the game has swelled to over 50GB in size.
HITMAN
HITMAN 2016 is the latest in the series and is being released as an episodic adventure. This approach feels natural with this game; however, each update tends to toss in performance ‘upgrades’ as well. Because of this the game needs to be retested regularly.
Rise of the Tomb Raider
RotTR had pretty bad performance when the DX12 patch first rolled out. Thankfully that has changed significantly, and the game has even received a recent update that adds Async Compute capability.
Total War: Warhammer
If there’s any game series in history that could benefit from DX12, it’s this one. Total War has been notoriously single-threaded in the past, so once the screen is full of units it performs pretty much the same regardless of which high end GPU you have.
Results
In the end, DX12 and Vulkan are still very young, and even the titles we’ve tested here are ever evolving, with regular updates that could very well change these performance numbers drastically. I know very well that Radeon fans will be eager to point out the massive lead the RX 480 enjoys in DOOM, but look just past it and you’ll see the tables turn in the opposite direction with Forza. In all the other titles the two cards stay so close that it really comes down to which one you want. Both cards perform great in DX12. Something to consider with DOOM is that we are still waiting on an updated driver to enable Async Compute on Pascal; when that does happen, we’ll be revisiting it. As we get a better understanding of testing and showing results for these next generation APIs, you can expect more coverage as time marches on.