NVIDIA and AMD Ready For Next Generation DirectX 12 API – Showcase New Features and Benefits of D3D12 API

Hassan Mujtaba

Microsoft has yet to launch its next-generation DirectX 12 API, and the only applications using it so far are demos of AAA titles that are still in development. With the launch of Windows 10 approaching, AMD and NVIDIA are ramping up DirectX 12 support on their graphics cards. On the red team that means the Radeon HD 7000 and Radeon R200 series. On the green team it covers the GeForce Fermi, Kepler and Maxwell generations.

NVIDIA and AMD Showcase New Features and Support Under DirectX 12 API

Both NVIDIA and AMD have published slide decks on Slideshare (AMD / NVIDIA) detailing how their graphics architectures will support the new API. We will go into the details below. As expected, DirectX 12 is a game changer for users and developers alike, because it lets developers tap a great deal of previously unused potential in PC hardware (mainly GPUs and CPUs). This in turn brings better stability and performance. It also allows better game optimization, a much smoother development process and more visually impressive effects.

AMD Radeon and DirectX 12 - A Powerful Bond

AMD DirectX 12 API Benefits

We will start with the AMD slides and then move on to NVIDIA's presentation. AMD's slides are dated February 17th, 2015, and until today they had only been shown to a select audience. They begin by detailing what we already know about the current state of multi-core processors, AMD's FX CPUs in particular. Despite their higher core counts, these chips don't deliver much better performance than Intel parts. Most applications lack proper multi-core support, so they run better on CPUs with higher single-core IPC. AMD's Bulldozer and Piledriver CPUs didn't manage a large IPC gain, so the processors fell behind three generations of Intel CPUs. Another factor is that CPU performance can't keep pace with graphics cards. GPUs typically improve by double digits, while CPU growth is in single digits. Finally, current API/driver overhead restricts multi-core CPUs to using only one core to talk to the GPU. So even with a 4, 6 or 8 core CPU, you are in the same league as someone who owns a dual core processor, unless the application can spread its work across more cores.

DirectX 12 is set to change that. Here are a few of the benefits:

  • Better use of multi-core CPUs
  • More on-screen detail
  • Higher Min/Max/Avg framerates
  • Smoother gameplay
  • More efficient use of GPU hardware
  • Reduced system power draw
  • New game designs that were previously considered impossible due to the restrictions of older APIs

Multi-threaded Command Buffer Recording:

AMD FX CPUs DirectX 12

AMD highlights three key DirectX 12 features for its hardware. The first is multi-threaded command buffer recording, which improves multi-core CPU performance. A command buffer is a list of commands, such as lighting and effects work, that the CPU records for the GPU to execute while a game is running. This technology isn't limited to AMD CPUs: it also works on other processors, including Intel and several ARM-based chips. It raises frame rates by letting more CPU cores prepare work and communicate with the GPU simultaneously. It also improves CPU utilization and performance per watt, because previously much of the CPU's capacity sat idle. As a result, the CPU becomes less of a bottleneck than it was, and performance will be determined mostly by the GPU.
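The recording pattern described above can be illustrated with a minimal Python sketch. This is a conceptual model, not Direct3D 12 code. Each worker thread records draw commands for its own slice of a hypothetical scene into a private list. The main thread then submits the lists in a fixed order, much as D3D12 lets applications record command lists on several threads and hand them to a command queue together (`ExecuteCommandLists`). The command format, `record_command_list` and `build_frame` are all hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def record_command_list(objects):
    """Record draw commands for one slice of the scene (hypothetical command format)."""
    commands = []
    for obj in objects:
        commands.append(("set_state", obj["material"]))  # bind this object's material
        commands.append(("draw", obj["name"]))           # issue its draw call
    return commands

def build_frame(scene, num_threads=4):
    # Split the scene into contiguous slices, one per worker thread.
    chunk = max(1, (len(scene) + num_threads - 1) // num_threads)
    slices = [scene[i:i + chunk] for i in range(0, len(scene), chunk)]
    # Each worker fills its own private list, so no lock is needed while recording.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        command_lists = list(pool.map(record_command_list, slices))
    # Submission happens on one thread, in slice order, so the GPU sees a deterministic sequence.
    gpu_queue = [cmd for cl in command_lists for cmd in cl]
    return command_lists, gpu_queue

scene = [{"name": f"mesh{i}", "material": f"mat{i % 3}"} for i in range(10)]
command_lists, gpu_queue = build_frame(scene)
print(len(command_lists), "command lists,", len(gpu_queue), "commands submitted")
```

CPython's global interpreter lock means this toy version won't actually run faster on more cores. The point is the structure: recording is split across threads, and submission stays ordered.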

A chart from AMD illustrates how the API benefits CPUs. On DirectX 11, most of the work is handled by a single core, labeled "Core 1". This core processes every task issued to it, and the biggest bottleneck is the DirectX work itself, which takes even longer than the game's own code. The other cores do some work, but some sit completely unused, even though the CPU draws as much power as it would under any normal workload. On DirectX 11, a frame takes 29ms to complete, or 34 FPS.

With DirectX 12, by contrast, the load is distributed across all cores of an 8-core processor. The DirectX work is significantly reduced, so the game completes the frame in just 15ms (66 FPS), compared to 29ms (34 FPS) with DirectX 11. The API also lets the CPU issue far more draw calls than before. An AMD A10-7850K pumps out 2,739,266 draw calls at 84W, compared to 521,221 draw calls at 91W previously.
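The frame-time and draw-call figures above can be checked with a few lines of arithmetic. This Python sketch uses only AMD's numbers as quoted. The article doesn't say whether the draw-call counts are per second or per benchmark run, so only their ratio is meaningful here.

```python
def fps_from_frame_time(ms):
    """Frames per second for a given frame time in milliseconds."""
    return 1000.0 / ms

dx11_fps = fps_from_frame_time(29)   # about 34.5 FPS (quoted as 34)
dx12_fps = fps_from_frame_time(15)   # about 66.7 FPS (quoted as 66)

# Draw calls per watt, from AMD's A10-7850K figures.
dx12_calls_per_watt = 2_739_266 / 84
dx11_calls_per_watt = 521_221 / 91
gain = dx12_calls_per_watt / dx11_calls_per_watt

print(f"DX11: {dx11_fps:.1f} FPS, DX12: {dx12_fps:.1f} FPS")
print(f"Draw calls per watt: {dx11_calls_per_watt:,.0f} -> {dx12_calls_per_watt:,.0f} ({gain:.1f}x)")
```

Because the power draw also drops, the per-watt gain (about 5.7x) is slightly larger than the raw draw-call gain (about 5.3x).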

Fine-Grain Asynchronous Compute Scheduling/Execution:

AMD DirectX 12 Fine-Grain Asynchronous Compute

The next feature from AMD is fine-grain asynchronous compute scheduling and execution, which promises three things: higher FPS, better VR support and better image quality. For higher FPS, DirectX 12 moves from complex serial workloads to parallel ones, so several streams of work can execute at once instead of a single serial path. When extra work is available, idle GPU resources are put to use instead of waiting their turn. This speeds up execution and improves performance through better multi-threading on the GPU side. The greater parallelism also lets GPUs deliver lower latency on virtual reality headsets, making them more responsive. Effects such as physics, lighting and memory operations use different GPU resources, yet under DirectX 11 they execute serially. The parallel path in DirectX 12 reduces the total render time, delivering lower latency and higher FPS.
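The benefit AMD describes can be approximated with a toy timing model. The Python sketch below uses made-up pass durations, not AMD data. It also assumes ideal overlap with no dependencies between the graphics and compute queues, so it shows an upper bound rather than real-world gains. In practice both queues compete for the same shader units.

```python
# Hypothetical per-frame pass durations in milliseconds (illustrative, not AMD data).
graphics_passes = {"shadows": 3.0, "geometry": 5.0, "lighting": 4.0}
compute_passes = {"physics": 2.5, "post_processing": 3.5}

def serial_frame_time(graphics, compute):
    # DirectX 11 style: every pass waits for the previous one on a single queue.
    return sum(graphics.values()) + sum(compute.values())

def overlapped_frame_time(graphics, compute):
    # Idealized async compute: compute work fills otherwise idle GPU units,
    # so the frame takes as long as the busier queue (assumes perfect overlap
    # and no dependencies between the two queues).
    return max(sum(graphics.values()), sum(compute.values()))

serial = serial_frame_time(graphics_passes, compute_passes)
overlapped = overlapped_frame_time(graphics_passes, compute_passes)
print(f"serial: {serial} ms, overlapped: {overlapped} ms")
```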

An Unannounced Feature for AMD Hardware: "TBA"

Finally, AMD lists one more feature that is "To Be Announced" by Microsoft and AMD. There's no information about this feature yet. Based on the leaked Radeon R9 390X slides, it could be the Tier 3 "Resource Binding" capability mentioned in that presentation. However, this isn't confirmed and should be taken with a grain of salt until the companies make an official announcement.

NVIDIA GeForce and DirectX 12 - Latest Software and Feature Level Support

Just like AMD with its latest GCN architecture, NVIDIA is ready to support DirectX 12. That support covers its latest Maxwell GPUs and even the previous-generation Fermi and Kepler architectures. However, both AMD and NVIDIA will bring the newest features only to their latest graphics hardware. Older cards can run DirectX 12, but they will support only the API's most basic features.

Moving on to the details, NVIDIA is working with Microsoft to develop a wide array of features that Maxwell will support. The presentation starts by detailing a few key features, including Conservative Rasterization, Raster Ordered Views, Tiled Resources and other special effects that will boost graphics performance on NVIDIA hardware. One point holds for both NVIDIA and AMD: multi-core CPUs must be used more efficiently. Spreading the rendering overhead across cores lets each core communicate with the GPU efficiently, which improves workload times and performance per watt.
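Raster Ordered Views guarantee that pixel-shader accesses to the same pixel through an ROV happen in the order the primitives were submitted. This matters for custom blending done in the shader, for example order-independent transparency, because without ordering, overlapping fragments may be processed in any order. The Python sketch below models only the blending math, not the GPU feature. It shows that the standard "over" blend is not commutative, so the order of the two fragments changes the final pixel. The fragment colors are arbitrary examples.

```python
def blend_over(dst, src):
    """Standard 'over' blend: src ((r, g, b), alpha) composited on top of dst (r, g, b)."""
    (sr, sg, sb), sa = src
    dr, dg, db = dst
    return (sr * sa + dr * (1 - sa),
            sg * sa + dg * (1 - sa),
            sb * sa + db * (1 - sa))

background = (0.0, 0.0, 0.0)
# Two translucent fragments landing on the same pixel, in submission order.
red = ((1.0, 0.0, 0.0), 0.5)
blue = ((0.0, 0.0, 1.0), 0.5)

in_order = blend_over(blend_over(background, red), blue)      # the order an ROV guarantees
out_of_order = blend_over(blend_over(background, blue), red)  # possible without ordering
print(in_order, out_of_order)
```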

NVIDIA Maxwell DirectX 12 Graphics Technology - Voxelization, ROVs, 2D/3D Mapping, Faster Geometry Shader Rendering and Ray Tracing Shadows

The effects supported by second-generation Maxwell include DirectX 12 voxelization. This also enhances VXGI (Voxel-based Global Illumination), which was introduced with the GeForce 900 series: the GeForce GTX 960, GeForce GTX 970, GeForce GTX 980 and GeForce GTX Titan X. The Maxwell architecture adds new tiled resources and multi-projection technology for voxel grids (future VXGI), which improves global illumination. Tiled resources, introduced with DirectX 11.2, provide hardware-managed virtual memory for the GPU. Their Tier 2 features include shader LOD clamp and mapped-status feedback, min/max reduction filtering, and reads from non-mapped tiles returning 0. Maxwell extends this to 3D tiled resources. The other enabler is conservative rasterization, which marks a pixel as covered whenever a triangle touches any part of it, not only its center. In NVIDIA's slide, both the orange and the purple pixels are covered, which saves calculation time. This enables new rendering algorithms. According to NVIDIA, the hardware acceleration available on Maxwell makes this voxelization three times faster.
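The difference between standard and conservative rasterization can be shown in software. In the Python sketch below, standard coverage tests only each pixel's center. Conservative coverage marks a pixel whenever the triangle overlaps any part of the pixel square, using a 2D separating-axis test. This models the behavior described above, not NVIDIA's hardware implementation. The triangle and the 4x4 grid are arbitrary examples.

```python
def edge(a, b, p):
    """Cross-product term: positive if p is to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def center_covered(tri, px, py):
    # Standard rasterization: only the pixel center is tested (either winding).
    c = (px + 0.5, py + 0.5)
    w = [edge(tri[i], tri[(i + 1) % 3], c) for i in range(3)]
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)

def conservatively_covered(tri, px, py):
    # Conservative (overestimated) rasterization: covered if the triangle
    # touches any part of the pixel square. Uses a separating-axis test.
    box = [(px, py), (px + 1, py), (px + 1, py + 1), (px, py + 1)]
    axes = [(1, 0), (0, 1)]                       # the square's own axes
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        axes.append((-(b[1] - a[1]), b[0] - a[0]))  # normal of each triangle edge
    for ax in axes:
        t = [v[0] * ax[0] + v[1] * ax[1] for v in tri]
        s = [v[0] * ax[0] + v[1] * ax[1] for v in box]
        if max(t) < min(s) or max(s) < min(t):
            return False                          # found a separating axis
    return True

def rasterize(tri, width, height, test):
    return {(x, y) for y in range(height) for x in range(width) if test(tri, x, y)}

tri = [(0.2, 0.2), (3.8, 0.6), (1.0, 3.4)]
standard = rasterize(tri, 4, 4, center_covered)
conservative = rasterize(tri, 4, 4, conservatively_covered)
print(sorted(conservative - standard))  # pixels only the conservative rule catches
```

For voxelization, this overestimation ensures that thin or small triangles still mark every cell they pass through, rather than slipping between pixel centers.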

Advanced AA Techniques - MFAA, AGAA, CAAA

With the new advanced multisampling features, the pixel shader can specify the location of each sub-pixel sample and learn the depth of each one. NVIDIA is also bringing a series of new anti-aliasing technologies: Multi-Frame Sampled AA (MFAA), Aggregate G-Buffer AA (AGAA) and Accumulative AA. The latest AGAA 2A technique delivers quality comparable to 32x MSAA with the performance penalty of 4x MSAA, which is a big deal. The new technology is also supported by current Maxwell cards.
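Programmable sample positions are what make a multi-frame technique like MFAA possible. Each frame uses only a few samples, but the positions alternate between frames, so two frames together cover more distinct positions. The Python sketch below assumes two illustrative 2-sample patterns (not NVIDIA's actual positions) and a hypothetical vertical edge through the pixel. It also uses a naive average of the two frames. NVIDIA's real filter is more sophisticated and has to cope with motion, whereas this toy only holds for a static image.

```python
# Illustrative sub-pixel sample positions within a unit pixel (assumed, not NVIDIA's pattern).
PATTERN_A = [(0.375, 0.125), (0.625, 0.875)]
PATTERN_B = [(0.875, 0.375), (0.125, 0.625)]

def covered(sample, edge_x):
    # Hypothetical primitive covering everything left of a vertical edge at edge_x.
    return sample[0] < edge_x

def coverage(pattern, edge_x):
    """Fraction of the pattern's samples that the primitive covers."""
    return sum(covered(s, edge_x) for s in pattern) / len(pattern)

edge_x = 0.7
frame_a = coverage(PATTERN_A, edge_x)              # 2 samples this frame
frame_b = coverage(PATTERN_B, edge_x)              # 2 different samples next frame
mfaa_style = (frame_a + frame_b) / 2               # naive average of the two frames
msaa_4x = coverage(PATTERN_A + PATTERN_B, edge_x)  # all 4 samples in a single frame
print(frame_a, frame_b, mfaa_style, msaa_4x)
```

Each single frame gives a coarse estimate, but the two-frame average matches the 4-sample result at roughly the per-frame cost of 2 samples.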

Lastly, NVIDIA talks about VR Direct and how DirectX 12 will cut response time with its latest "Time Warp" technology, keeping the delay below 20ms per frame. This is supported on all Fermi, Kepler and Maxwell cards. There's also a VR SLI API, which will render frames faster when two cards are used in SLI. More details on the AMD and NVIDIA technologies can be seen in the slides below:

AMD DirectX 12 Presentation (via Slideshare):

[Slide gallery: 15 slides from AMD's DirectX 12 presentation]

NVIDIA DirectX 12 Presentation (via Slideshare):

[Slide gallery: 24 slides from NVIDIA's DirectX 12 presentation]