AMD's upcoming Radeon Instinct MI100 HPC accelerator, which would feature the Arcturus GPU, has been spotted by Komachi. The existence of the AMD Arcturus GPU was confirmed all the way back in 2018, and two years later we are finally starting to get details on the specifications of AMD's next HPC/AI accelerator.
AMD Arcturus GPU Powered Radeon Instinct MI100 HPC / AI Accelerator Features 32 GB HBM2, 200W TDP In Early Prototypes
The "Arcturus" codename comes from the red giant that is the brightest star in the constellation of Bootes and one of the brightest stars in the night sky. Vega and Navi are likewise named after bright stars. The naming scheme dates back to the creation of RTG, whose founding head, Raja Koduri (ex-AMD RTG President), put a lot of emphasis on bright stars when the group first introduced Polaris.
Previously, we saw support for the Arcturus GPU, specifically the XL variant, added to HWiNFO. To our surprise, the newly leaked variant, 'D34303', is also based on the XL die and would go on to power the Radeon Instinct MI100. The information for this part comes from a test board, so the final specifications will likely differ, but here are the key points:
- Based on Arcturus XL GPU
- Test Board has a TDP of 200W
- Up To 32 GB HBM2 Memory
- HBM2 Memory Clocks Reported Between 1000-1200 MHz
AMD MI100 HBM2 D34303 A1 XL 200W 32GB 1000M.— 比屋定さんの戯れ言@Komachi (@KOMACHI_ENSAKA) February 7, 2020
The Radeon Instinct MI100 test board has a TDP of 200W and is based on the XL variant of AMD's Arcturus GPU. The card also features 32 GB of HBM2 memory with memory clocks of 1.0 - 1.2 GHz. The MI60, in comparison, has 64 CUs and a 300W TDP, with a reported base clock of 1200 MHz and memory running at 1.0 GHz across a 4096-bit bus interface for 1 TB/s of bandwidth. There's a good chance that the final design of the Arcturus GPU will feature Samsung's latest HBM2E 'Flashbolt' memory, which offers 3.2 Gbps pin speeds for up to 1.5 TB/s of bandwidth.
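The bandwidth figures above follow from simple arithmetic: bus width times per-pin data rate, divided by eight bits per byte. The sketch below is only an illustration; the function names are made up for this example, and it assumes HBM2 transfers data twice per memory clock (double data rate) and that a 4096-bit bus is retained with HBM2E.

```python
def ddr_pin_rate_gbps(memory_clock_mhz: float) -> float:
    """Per-pin data rate in Gbit/s, assuming two transfers per clock (DDR)."""
    return 2 * memory_clock_mhz / 1000


def hbm_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8


BUS_WIDTH_BITS = 4096  # assumption: four 1024-bit HBM stacks

# Memory clocks reported for the test board (1000 MHz and 1200 MHz).
for clock_mhz in (1000, 1200):
    rate = ddr_pin_rate_gbps(clock_mhz)
    print(f"{clock_mhz} MHz -> {hbm_bandwidth_gbs(BUS_WIDTH_BITS, rate):.1f} GB/s")

# HBM2E 'Flashbolt' is quoted directly as a per-pin rate of 3.2 Gbps.
print(f"3.2 Gbps -> {hbm_bandwidth_gbs(BUS_WIDTH_BITS, 3.2):.1f} GB/s")
```

Under these assumptions, 1000 MHz works out to roughly 1 TB/s and 1200 MHz to about 1.23 TB/s, while 3.2 Gbps across the same 4096-bit bus would come to about 1.64 TB/s, slightly above the quoted HBM2E figure.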
AMD Radeon Instinct Accelerators
Accelerator Name | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI300X | AMD Instinct MI300A | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CPU Architecture | Zen 5 (Exascale APU) | N/A | N/A | Zen 4 (Exascale APU) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
GPU Architecture | CDNA 4 | CDNA 3+? | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
GPU Process Node | 4nm | 4nm | 5nm+6nm | 5nm+6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
GPU Chiplets | TBD | TBD | 8 (MCM) | 8 (MCM) | 2 (MCM) 1 (Per Die) | 2 (MCM) 1 (Per Die) | 2 (MCM) 1 (Per Die) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
GPU Cores | TBD | TBD | 19,456 | 14,592 | 14,080 | 13,312 | 6,656 | 7,680 | 4,096 | 3,840 | 4,096 | 4,096 | 2,304 |
GPU Clock Speed | TBD | TBD | 2100 MHz | 2100 MHz | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
INT8 Compute | TBD | TBD | 2614 TOPS | 1961 TOPS | 383 TOPS | 362 TOPS | 181 TOPS | 92.3 TOPS | N/A | N/A | N/A | N/A | N/A |
FP16 Compute | TBD | TBD | 1.3 PFLOPs | 980.6 TFLOPs | 383 TFLOPs | 362 TFLOPs | 181 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP32 Compute | TBD | TBD | 163.4 TFLOPs | 122.6 TFLOPs | 95.7 TFLOPs | 90.5 TFLOPs | 45.3 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP64 Compute | TBD | TBD | 81.7 TFLOPs | 61.3 TFLOPs | 47.9 TFLOPs | 45.3 TFLOPs | 22.6 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
VRAM | TBD | HBM3e | 192 GB HBM3 | 128 GB HBM3 | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
Infinity Cache | TBD | TBD | 256 MB | 256 MB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Memory Clock | TBD | TBD | 5.2 Gbps | 5.2 Gbps | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
Memory Bus | TBD | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
Memory Bandwidth | TBD | TBD | 5.3 TB/s | 5.3 TB/s | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
Form Factor | TBD | TBD | OAM | APU SH5 Socket | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
Cooling | TBD | TBD | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
TDP (Max) | TBD | TBD | 750W | 760W | 560W | 500W | 300W | 300W | 300W | 300W | 300W | 175W | 150W |
It is also mentioned that the Arcturus XL GPU could be a single, large monolithic die rather than a chiplet-based design like AMD's Zen 2 based Ryzen CPU lineup. The naming of the Radeon Instinct MI100 itself hints at its headline performance metric, which would be around 100 TOPS of INT8. That's a 66% increase in INT8 (AI/DNN) compute horsepower. Similarly, FP16 compute would be rated at around 50 TFLOPs, FP32 at 25 TFLOPs, and FP64 at 12.5 TFLOPs. The extra GPU horsepower could come from an updated graphics architecture, much higher clocks, or a higher CU count, the last of which is the most likely.
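The speculated figures form a simple ladder in which each wider data format runs at half the rate of the one before it. As a rough sketch (the helper names are hypothetical, and the halving rule is inferred from the numbers quoted above rather than confirmed by AMD):

```python
def precision_ladder(int8_tops: float) -> dict:
    """Halve throughput at each step from INT8 to FP16, FP32, and FP64."""
    formats = ["INT8", "FP16", "FP32", "FP64"]
    return {name: int8_tops / 2**step for step, name in enumerate(formats)}


def implied_baseline(new_value: float, increase_pct: float) -> float:
    """Baseline that a given percentage increase would have started from."""
    return new_value / (1 + increase_pct / 100)


for name, rate in precision_ladder(100).items():
    print(f"{name}: {rate}")

# A 66% uplift to 100 implies a starting point of about 60.
print(f"Implied INT8 baseline: {implied_baseline(100, 66):.1f}")
```

Starting from 100 for INT8, the ladder reproduces the 50, 25, and 12.5 figures, and the 66% claim implies a predecessor rated at roughly 60 TOPS of INT8.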
We have only seen a few details so far, and they are speculative at best, such as the GPU cache info that is part of the Virtual CRAT (vCrat) size. The GPU cache correlates with the CU count. In the case of the AMD Arcturus GPU, the cache size has been increased and so has the CU count, from 64 to 128. That is twice as many CUs as Vega 10, which would give us 8192 stream processors if AMD is using 64 stream processors per CU as in its current GPU designs.
While Arcturus is a Vega derivative, it's also a custom design solely for the HPC segment. This way, AMD can pursue parallel developments for the gaming/consumer segment and the HPC market, which consists of AI/DNN and datacenter customers.
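The stream processor count, and what it would imply for FP32 throughput, can be checked with a quick calculation. This is a minimal sketch with hypothetical function names; it assumes 64 stream processors per CU and two FP32 operations per stream processor per clock (one fused multiply-add), which is the usual way peak figures are quoted.

```python
def stream_processors(compute_units: int, sp_per_cu: int = 64) -> int:
    """Total stream processors for a given CU count."""
    return compute_units * sp_per_cu


def fp32_tflops(sp_count: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPs, assuming 2 FLOPs per stream processor per clock (FMA)."""
    return 2 * sp_count * clock_mhz * 1e6 / 1e12


def clock_for_target_mhz(target_tflops: float, sp_count: int) -> float:
    """Clock in MHz needed to reach a target peak FP32 rate."""
    return target_tflops * 1e12 / (2 * sp_count) / 1e6


arcturus_sps = stream_processors(128)
print(f"128 CUs -> {arcturus_sps} stream processors")

# Sanity check against the MI60 row of the table: 4096 cores at 1800 MHz.
print(f"MI60 peak FP32: {fp32_tflops(4096, 1800):.1f} TFLOPs")

# Clock a 128 CU part would need for the speculated 25 TFLOPs of FP32.
print(f"Clock for 25 TFLOPs: {clock_for_target_mhz(25, arcturus_sps):.0f} MHz")
```

With these assumptions, a 128 CU part would need a clock of only around 1.5 GHz to reach 25 TFLOPs of FP32, which is why a higher CU count is a more plausible source of the uplift than much higher clocks.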
Just a few days ago, Dylan522p posted some interesting speculation based on the new configuration for the Big Red 200 supercomputer, suggesting that NVIDIA's next-generation Ampere GPU based HPC parts could feature up to 18 TFLOPs of FP64 compute. That would be almost a 50% lead over the Instinct MI100, but AMD has proved that it can offer more FLOPs at a competitive price, so maybe that is what Arcturus would be targeting. There's no word on when Arcturus will land, but AMD has hinted at an Instinct product later this year.