AMD has officially launched their Vega GPU based Instinct MI25 accelerator for large-scale machine intelligence and deep learning data center applications. The new graphics card deploys some of the latest Radeon technologies which boost performance and deliver much higher compute throughput in AI learning tasks.
AMD Radeon Instinct Lineup Launched - Vega Based Instinct MI25, Fiji Based Instinct MI8 and Polaris Based Instinct MI6 Graphics Accelerators
AMD is launching three new graphics accelerators today which are part of the Radeon Instinct lineup. These include the Vega 10 based Radeon Instinct MI25, the Fiji XT based Radeon Instinct MI8 and the Polaris 10 based Radeon Instinct MI6. The "MI" in the Instinct family branding stands for "Machine Intelligence" while the number denotes the card's approximate peak half precision (FP16) compute output in TFLOPs.
Through our Radeon Instinct server accelerator products and open ecosystem approach, we’re able to offer our customers cost-effective machine and deep learning training, edge-training and inference solutions, where workloads can take the most advantage of the GPU’s highly parallel computing capabilities.
We’ve also designed the three initial Radeon Instinct accelerators to address a wide range of machine intelligence applications, which includes data-centric HPC-class systems in academics, government labs, energy, life science, financial, automotive and other industries. via AMD
AMD Radeon Instinct MI25 Accelerator With Vega GPU (24.6 TFLOPs FP16) and 16 GB HBM2
The AMD Radeon Instinct MI25 accelerator is the fastest of the Instinct lineup. It features the Vega 10 graphics core with 4096 stream processors that are clocked at 1500 MHz. With these clock rates, the card delivers 24.6 TFLOPs of FP16, 12.3 TFLOPs of FP32 and 768 GFLOPs of FP64 compute that is aimed at deep learning tasks. The card also packs 16 GB of ECC HBM2 memory which delivers a total of 484 GB/s bandwidth.
It should be noted that the card is clocked slightly lower than the Vega Frontier Edition, which runs at 1600 MHz and delivers 13 TFLOPs of FP32 and 25 TFLOPs of FP16 compute. AMD has said that the card delivers up to 82 GFLOPs/Watt FP16 and 41 GFLOPs/Watt FP32 peak GPU compute performance.
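The peak figures above follow directly from the shader count and clock speed. A quick back-of-the-envelope sketch (not from AMD's materials; it assumes the usual 2 FLOPs, i.e. one fused multiply-add, per stream processor per clock, with packed FP16 doubling that rate on Vega):

```python
# Peak throughput sketch: 2 FLOPs (one FMA) per stream processor per clock.
# Packed FP16 on Vega doubles the rate, hence ops_per_clock=4 for FP16.
def peak_tflops(stream_processors, clock_mhz, ops_per_clock=2):
    return stream_processors * clock_mhz * 1e6 * ops_per_clock / 1e12

fp32 = peak_tflops(4096, 1500)        # MI25 FP32
fp16 = peak_tflops(4096, 1500, 4)     # MI25 FP16 (packed, 2x rate)
print(round(fp32, 1), round(fp16, 1))      # 12.3 24.6
print(round(fp16 * 1000 / 300, 1))         # 81.9 -- the ~82 GFLOPs/Watt quoted, at 300W TDP
```

The same formula reproduces the Frontier Edition's 13/25 TFLOPs figures at 1600 MHz.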
Highlights:
- Industry Leading Performance for Deep Learning
- Next-Gen “Vega” Architecture
- Advanced Memory Engine
- Large BAR Support for Multi-GPU Peer to Peer
- ROCm Open Software Platform for Rack Scale
- Optimized MIOpen Libraries for Deep Learning
- MxGPU Hardware Virtualization
The Radeon Instinct MI25 accelerator, based on the new “Vega” GPU architecture with a 14nm FinFET process, will be the world’s ultimate training accelerator for large-scale machine intelligence and deep learning datacenter applications. The MI25 will deliver superior FP16 and FP32 performance in a passively-cooled single GPU server card with 24.6 TFLOPS of FP16 or 12.3 TFLOPS of FP32 peak performance through its 64 compute units (4,096 stream processors). With 16GB of ultra–high bandwidth HBM2 ECC GPU memory and up to 484 GB/s of memory bandwidth, the Radeon Instinct MI25’s design is optimized for massively parallel applications with large datasets for Machine Intelligence and HPC-class systems. via AMD
In addition to the specifications, the card comes in a dual slot, full height form factor. It draws power through two 8-pin connectors and has a rated TDP of 300W. The card is passively cooled, relying on chassis airflow inside large server racks. The card ships with a three year limited warranty.
AMD Radeon Instinct MI8 Accelerator With Fiji GPU (8.20 TFLOPs FP16) and 4 GB HBM1
AMD is also launching the Radeon Instinct MI8 accelerator, which is designed as an inference card. The Instinct MI8 comes packed with the Fiji XT GPU based on the 28nm process. The GPU houses the same number of stream processors as the Instinct MI25, 4096 in total, but they are based on an older GCN revision and clocked much lower.
Highlights:
- 8.2 TFLOPS FP16 or FP32 Performance
- Up To 47 GFLOPS Per Watt FP16 or FP32 Performance
- 4GB HBM1 on 512-bit Memory Interface
- Passively Cooled Server Accelerator
- Large BAR Support for Multi GPU Peer to Peer
- ROCm Open Platform for HPC-Class Rack Scale
- Optimized MIOpen Libraries for Deep Learning
- MxGPU SR-IOV Hardware Virtualization
The Radeon Instinct MI8 accelerator, harnessing the high-performance, energy-efficiency of the “Fiji” GPU architecture, is a small form factor HPC and inference accelerator with 8.2 TFLOPS of peak FP16|FP32 performance at less than 175W board power and 4GB of High-Bandwidth Memory (HBM) on a 512-bit memory interface. The MI8 is well suited for machine learning inference and HPC applications. via AMD
In terms of specifications, the card features 4096 stream processors clocked at 1000 MHz. This delivers a rated compute output of 8.2 TFLOPs (FP16 / FP32) and 512 GFLOPs of FP64 compute at 1/16th rate. The card also features 4 GB of HBM1 memory which delivers 512 GB/s of bandwidth. That is actually slightly higher than the bandwidth of the Vega based Instinct MI25 accelerator (484 GB/s), though the HBM1 configuration requires two more memory stacks and draws more power. AMD rates the compute output of this card at 47 GFLOPs/Watt of FP16 and FP32 compute while FP64 compute is rated at 2.9 GFLOPs/Watt.
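The 1/16th FP64 rate mentioned above is easy to verify against the headline numbers. A small sketch (my own arithmetic, using the article's figures for Fiji and Polaris, which both run FP64 at 1/16 the FP32 rate):

```python
# FP64 throughput on Fiji (MI8) and Polaris (MI6) is 1/16 of the FP32 rate.
def fp64_gflops(fp32_tflops, rate=1/16):
    return fp32_tflops * rate * 1000  # convert TFLOPs -> GFLOPs

print(round(fp64_gflops(8.2)))   # MI8: 512 GFLOPs
print(round(fp64_gflops(5.7)))   # MI6: 356 GFLOPs (AMD quotes 358 at peak clocks)
```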
The card comes in the same small, dual slot package as the Radeon R9 Nano. It has a rated TDP of 175W and power is provided through a single 8-pin connector. The card also lacks active cooling since it's aimed at servers.
AMD Radeon Instinct MI6 Accelerator With Polaris GPU (5.70 TFLOPs FP16) and 16 GB GDDR5
Lastly, we have the AMD Radeon Instinct MI6 graphics accelerator. This card packs the Polaris 10 core and is aimed at both deep learning and inferencing workloads. In terms of specifications, the chip packs the full configuration of 2304 stream processors, clocked at 1237 MHz. At the rated clock speeds, the chip delivers 5.7 TFLOPs of compute (FP16 / FP32) and 358 GFLOPs of double precision compute performance.
AMD has rated the single and half precision throughput of this card at up to 38 GFLOPs/Watt, while the double precision compute throughput is rated at 2.4 GFLOPs/Watt.
Highlights:
- 5.7 TFLOPS FP16 or FP32 Performance
- Up To 38 GFLOPS Per Watt Peak FP16 or FP32 Performance
- 16GB Ultra-Fast GDDR5 Memory on 256-bit Memory Interface
- Passively Cooled Server Accelerator
- Large BAR Support for Multi-GPU Peer to Peer
- ROCm Open Platform for HPC-Class Scale Out
- Optimized MIOpen Libraries for Deep Learning
- MxGPU SR-IOV Hardware Virtualization
The Radeon Instinct MI6 accelerator, based on the acclaimed “Polaris” GPU architecture, is a passively cooled inference accelerator with 5.7 TFLOPS of peak FP16|FP32 performance at 150W board power and 16GB of ultra-fast GDDR5 GPU memory on a 256-bit memory interface. The MI6 is a versatile accelerator ideal for HPC and machine learning inference and edge-training deployments. via AMD
The card also comes with 16 GB of GDDR5 memory clocked at 7000 MHz (effective) along a 256-bit wide bus interface. This delivers up to 224 GB/s of bandwidth on the card. The card comes in a single slot, full length form factor and is passively cooled, relying on airflow from the server chassis. TDP on the card is set at 150W, so power is provided by a single 6-pin connector.
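The bandwidth figures throughout this piece all come from the same simple relation: effective data rate times bus width, divided by 8 bits per byte. A quick sketch (my own arithmetic on the article's numbers):

```python
# Memory bandwidth = effective data rate (MHz) x bus width (bits) / 8 bits per byte.
def bandwidth_gbs(effective_mhz, bus_bits):
    return effective_mhz * 1e6 * bus_bits / 8 / 1e9

print(bandwidth_gbs(7000, 256))   # MI6 GDDR5, 256-bit: 224.0 GB/s
print(bandwidth_gbs(1000, 4096))  # MI8 HBM1, 4096-bit at 500 MHz DDR: 512.0 GB/s
```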
AMD Radeon Instinct Accelerators:
Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 |
---|---|---|---|---|---|
GPU Architecture | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 |
GPU Process Node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET |
GPU Cores | 2304 | 4096 | 4096 | 3840 | 4096 |
GPU Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz | 1746 MHz | 1800 MHz |
FP16 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs | 26.8 TFLOPs | 29.6 TFLOPs |
FP32 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs | 13.4 TFLOPs | 14.8 TFLOPs |
FP64 Compute | 358 GFLOPs | 512 GFLOPs | 768 GFLOPs | 6.7 TFLOPs | 7.4 TFLOPs |
VRAM | 16 GB GDDR5 | 4 GB HBM1 | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 |
Memory Clock | 1750 MHz | 500 MHz | 945 MHz | 1000 MHz | 1000 MHz |
Memory Bus | 256-bit bus | 4096-bit bus | 2048-bit bus | 4096-bit bus | 4096-bit bus |
Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s |
Form Factor | Single Slot, Full Length | Dual Slot, Half Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length |
Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
TDP | 150W | 175W | 300W | 300W | 300W |
Planned for a June 29th rollout, the ROCm 1.6 software platform brings performance improvements and support for MIOpen 1.0. It is scalable and fully open source, providing a flexible, powerful heterogeneous compute solution for a new class of hybrid Hyperscale and HPC-class systems.
Comprising an open-source Linux driver optimized for scalable multi-GPU computing, the ROCm software platform provides multiple programming models, the HIP CUDA conversion tool, and support for GPU acceleration using the Heterogeneous Computing Compiler (HCC). AMD also showcased several server racks from its partners that utilized the new EPYC 7000 series processors and Instinct MI25 accelerators.
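The HIP conversion tool mentioned above works largely by source-to-source translation, mapping CUDA runtime calls to their HIP equivalents so existing CUDA code can target AMD GPUs. A toy sketch of the idea (the real tool, hipify, is far more sophisticated; the mapping below is a tiny illustrative subset of my own choosing):

```python
# Toy illustration of CUDA-to-HIP source translation: rename a handful of
# CUDA runtime identifiers to their HIP counterparts.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
}

def hipify(source: str) -> str:
    # Naive textual substitution; the real tool parses the source properly.
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(hipify("cudaMalloc(&ptr, size);"))  # hipMalloc(&ptr, size);
```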