AMD Opens The Lid on Zen Architectural Details at Hot Chips – Huge Performance Leap Over Excavator, Massive Throughput on 14nm FinFET Design

Hassan Mujtaba

AMD has presented a great deal more information on their upcoming Zen architecture at Hot Chips. Expected to launch later this year, the Zen architecture focuses on three key areas: performance, throughput and efficiency. With Zen, AMD plans to come back to the performance CPU sector with a bang in the mainstream and enthusiast markets.

AMD Zen Architecture Fully Detailed - Wider, High-Performance and Efficient Core Design

To start off with the details, Zen is based on the latest 14nm FinFET node. The only two foundries that have this node are GlobalFoundries and Samsung, but we suspect AMD is using the former to manufacture Zen chips. The Zen core is said to deliver 40% more instructions per clock than the Excavator core.

AMD's full Zen Hot Chips presentation reveals complete architecture details. (Image Credits: Golem.de)

The Excavator core is featured in AMD's Carrizo and Bristol Ridge processors. The large jump in IPC would help AMD achieve performance parity with Intel chips. In fact, AMD already demoed an 8-core Summit Ridge CPU based on Zen against an 8-core Broadwell-E chip. The demo showed AMD's solution delivering better rendering performance than Intel's HEDT solution.

AMD Zen Core Design and Core Engine

The basic building block of Zen is the core complex, which consists of four cores connected to an L3 cache. The L3 cache is 16-way set associative and totals 8 MB (mostly exclusive of the L2 cache). It is divided into four slices, each made up of two 1 MB sub-slices. All cores can access these cache blocks with the same average latency.

Each core supports two threads, so a core complex provides 8 threads while the 8-core SKUs provide 16 threads. On each core, branch mispredict handling is improved and branch prediction has been improved with two branches per BTB entry. The large op cache helps improve throughput and latency at the same time. The integer cluster in each Zen core has six pipes: four ALUs (Arithmetic Logic Units) and two AGUs (Address Generation Units).
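The core, thread and L3 figures above can be tied together with a little arithmetic. The sketch below is only an illustration: the `CoreComplex` class and `chip_threads` helper are hypothetical names, and it assumes that a chip is built from whole core complexes, which the presentation implies but this article does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreComplex:
    """One Zen core complex, using the figures quoted in the article."""
    cores: int = 4
    threads_per_core: int = 2       # SMT: two threads per core
    l3_slices: int = 4
    sub_slices_per_slice: int = 2
    sub_slice_mb: int = 1

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def l3_mb(self) -> int:
        return self.l3_slices * self.sub_slices_per_slice * self.sub_slice_mb


def chip_threads(total_cores: int, ccx: CoreComplex = CoreComplex()) -> int:
    """Hardware threads for a chip assumed to be made of whole core complexes."""
    if total_cores % ccx.cores != 0:
        raise ValueError("core count must be a multiple of the complex size")
    return (total_cores // ccx.cores) * ccx.threads


ccx = CoreComplex()
print(ccx.threads)       # 8 threads per core complex
print(ccx.l3_mb)         # 8 MB of L3 per core complex
print(chip_threads(8))   # 16 threads on an 8-core SKU
```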

These AGUs can perform two 16-byte loads and one 16-byte store per cycle via a 32 KB, 8-way set associative, write-back L1 data cache. According to AMD, the move from a write-through to a write-back cache has noticeably reduced stalls in several types of code paths. Load/store cache operations in Zen also reportedly exhibit lower latency compared to Excavator.
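To put the load/store figures in perspective, here is a rough sketch of the peak L1 data cache traffic per core. The per-cycle numbers come from the paragraph above; the 3.0 GHz clock is purely a hypothetical value for illustration, since AMD has not announced clock speeds, and the function names are our own.

```python
LOADS_PER_CYCLE = 2
STORES_PER_CYCLE = 1
BYTES_PER_ACCESS = 16  # each load or store moves 16 bytes


def l1d_bytes_per_cycle() -> int:
    """Peak bytes moved through the L1 data cache per cycle, per core."""
    return (LOADS_PER_CYCLE + STORES_PER_CYCLE) * BYTES_PER_ACCESS


def l1d_peak_gb_per_s(clock_ghz: float) -> float:
    """Peak L1D traffic in GB/s (1 GB = 1e9 bytes) at a given core clock."""
    return l1d_bytes_per_cycle() * clock_ghz


print(l1d_bytes_per_cycle())    # 48 bytes per cycle (32 load + 16 store)
print(l1d_peak_gb_per_s(3.0))   # 144.0 GB/s at a hypothetical 3.0 GHz
```

These are theoretical peaks; real workloads will rarely sustain two loads and a store on every cycle.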

AMD has widened Zen's dispatch to 6 versus 4 on Excavator. The instruction schedulers for integer and floating point have also grown to 84 and 96 entries, respectively. The FPU is now quad-issue, while the queue sizes for retire, load and store have increased to 192, 72 and 44, compared to 128, 44 and 32 on Excavator.
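For a quick sense of scale, the Zen and Excavator figures quoted above can be turned into percentage increases. This is a minimal sketch using only the numbers in this article; the dictionary keys and the `percent_increase` helper are illustrative names, not AMD terminology.

```python
# Figures as quoted in the article.
ZEN = {"dispatch": 6, "retire_queue": 192, "load_queue": 72, "store_queue": 44}
EXCAVATOR = {"dispatch": 4, "retire_queue": 128, "load_queue": 44, "store_queue": 32}


def percent_increase(new: float, old: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


for name, zen_value in ZEN.items():
    old_value = EXCAVATOR[name]
    gain = percent_increase(zen_value, old_value)
    print(f"{name}: {old_value} -> {zen_value} (+{gain:.1f}%)")
```

Dispatch width and the retire queue both grow by 50%, while the load queue sees the largest relative increase at roughly 64%.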

The floating point unit on the new core consists of 4 pipes built around 128-bit FMACs, with two FADD and two FMUL units for calculations. The FPU has a 2-level scheduling queue with a 160-entry register file, 8-wide retire and a single pipe for 128b stores. It has its own two AES units and is SSE, AVX1, AVX2, AES, SHA and legacy MMX compliant.

AMD Zen With SMT (Simultaneous Multi-Threading Support)

One of the most anticipated arrivals on the new core is SMT support. This brings the design much closer to Intel's implementation. The SMT design offers increased throughput by executing two threads simultaneously on each core. These threads will appear as independent logical cores to software and put more of the core's execution resources at the disposal of applications.

Along with SMT support, Zen also adds several new instructions and features. These include ADX, RDSEED, SMAP, SHA1, XSAVEC, CLZERO and PTE Coalescing. AMD also supports all the standard ISA extensions mentioned above.

AMD Zen High Bandwidth, Low Latency Cache System

AMD has been talking about a disruptive cache system on their new core for a while. With the details finally out, we can now better understand this system. The cache hierarchy is made up of a fast private L2 cache on each core (512 KB L2, I+D, 8-way) and a fast shared L3 cache (8 MB L3, I+D, 16-way).

This enables higher bandwidth and prefetch improvements, allowing faster cache-to-cache transfers. The L3 cache is filled mostly with L2 victims, and the design offers larger queues for L1 and L2 misses.

Each core also includes a 64 KB L1 instruction cache (4-way) and a 32 KB L1 data cache (8-way). The entire system adds up to faster L1, L2 and L3 caches that offer a faster load to the FPU (7 cycles required). Bandwidth is improved to almost 2x on L1 and L2, while L3 cache bandwidth is improved by 5x.
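The sizes and associativities quoted for each cache level determine how many sets each cache has. The sketch below works that out, assuming a 64-byte cache line. That line size is our assumption (it is typical for x86 designs) and is not stated in this article; `num_sets` is an illustrative helper.

```python
LINE_BYTES = 64  # assumption: line size is not given in the article

# (size in bytes, associativity) as quoted in the article
CACHES = {
    "L1 I": (64 * 1024, 4),
    "L1 D": (32 * 1024, 8),
    "L2": (512 * 1024, 8),
    "L3": (8 * 1024 * 1024, 16),
}


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a whole number of sets")
    return sets


for name, (size, ways) in CACHES.items():
    print(f"{name}: {size // 1024} KB, {ways}-way, {num_sets(size, ways)} sets")
```

Under that assumption, the L1 data cache works out to 64 sets and the shared L3 to 8192 sets per core complex.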

AMD Zen - A 14nm FinFET, Low Power and Faster Design

Performance is one thing, but one area where AMD has really lagged is efficiency. With Zen, that is going to change. Zen is much more efficient than Excavator, which is a highly tuned design in itself. This is achieved through aggressive clock gating across multi-level regions inside the core block. Some of the features that help achieve lower power on Zen include:

AMD Zen Low Power Features:

  • Aggressive Clock Gating with multi-level regions
  • Write Back L1 Cache
  • Large OP Cache
  • Stack Engine
  • Move Elimination
  • Power Focus from Project Inception
  • Low Power design Methodologies

CPU Microarchitecture        | AMD Phenom II / K10 | AMD BD/PD      | AMD SR/XV      | AMD Zen        | Intel Skylake
-----------------------------|---------------------|----------------|----------------|----------------|---------------
Instruction Decode Width     | 3-wide              | 4-wide         | 8-wide         | 4-wide         | 4-wide
Single Core Peak Decode Rate | 3 instructions      | 4 instructions | 8 instructions | 4 instructions | 4 instructions
Dual Core Peak Decode Rate   | 6 instructions      | 4 instructions | 8 instructions | 8 instructions | 8 instructions

AMD Zen To Arrive as Summit Ridge on Desktop and Naples on Server Platforms

The desktop lineup based on the Zen architecture will be known as Summit Ridge and is expected to arrive at the end of 2016. After launching in limited quantities, AMD will ramp up availability of the Summit Ridge lineup in Q1 2017, when it will reach a wider audience planning to build gaming and enthusiast-grade PCs.

AMD Demo PC Running Zen CPU and R9 Nano GPU (Via Paul's hardware):

The Summit Ridge platform is very impressive as it replaces AMD's aging AM3+ and FM2+ platforms. The AM4 platform, which will support the new chips, comes with a slew of new features and capabilities such as support for the latest DDR4 memory, PCIe Gen 3.0 and next-gen I/O. AMD will have several SKUs in the works, ranging from quad-core to octa-core models, all multi-threaded and featuring overclocking support.

AMD was very proud to showcase their 8-core Summit Ridge chip performing on par with Intel's 8-core Broadwell-E part. The Intel part comes at a price of $999 US, so AMD is expected to tackle it with a more aggressive price. Those looking forward to building gaming PCs can also expect plenty of performance from this platform. In fact, these chips would be a worthy upgrade if you are planning to get a Vega GPU next year, which will be coupled with HBM2 memory.

AMD Summit Ridge 4K Gaming and Blender Render Demo:

AMD will first roll out Zen to the desktop gaming market, but they plan to introduce it to the server market just a few months after the desktop launch. AMD also showcased their 32-core / 64-thread Naples platform, which is aimed at the workstation 2P (dual-socket) market. (Image Credits: Anandtech)

A 2-socket solution should mean a total of 64 cores and 128 threads, along with denser memory capacities that make Opteron look like a kid in the park. We would also see next-generation HPC server chips which combine 32 high-performance cores with massive Vega GPUs that will be used to crunch FP64 calculations. That has been in the plans for quite some time, and with Naples rolling out in Q2 2017, we might see it sometime around 2018.

This shows how serious AMD is about Zen: they intend to offer it in every market where they have lost their edge to Intel in the past. In 2017, AMD would also bring Raven Ridge APUs to the mobility market, followed by embedded SoCs based on the new core sometime later. Overall, Zen is a very exciting product for consumers and a key product for AMD themselves, one which should finally make them competitive against Intel once again.
