AMD Radeon Instinct MI100 With Arcturus GPU Spotted – 32 GB HBM2 Memory, 200W TDP In Early Prototype

Hassan Mujtaba

AMD's upcoming Radeon Instinct MI100 HPC accelerator, which is expected to feature the Arcturus GPU, has been spotted by Komachi. The existence of the AMD Arcturus GPU was confirmed back in 2018 and, two years later, we are finally starting to get details on the specifications of AMD's next HPC/AI accelerator.

AMD Arcturus GPU Powered Radeon Instinct MI100 HPC / AI Accelerator Features 32 GB HBM2, 200W TDP In Early Prototypes

The "Arcturus" codename comes from the red giant star that is the brightest in the constellation of Bootes and among the brightest stars visible in the night sky. As with Vega and Navi, which are also named after bright stars, the naming scheme dates back to the creation of RTG, when its founding head, Raja Koduri (ex AMD RTG President), put a lot of emphasis on bright stars with the introduction of Polaris.

Previously, we saw support for the Arcturus GPU, in particular the XL variant, added to HWiNFO. To our surprise, the newly leaked variant, 'D34303', is also based on the XL die and would go on to power the Radeon Instinct MI100. The information for this part comes from a test board, so the final specifications will likely differ, but here are the key points:

  • Based on Arcturus XL GPU
  • Test Board has a TDP of 200W
  • Up To 32 GB HBM2 Memory
  • HBM2 Memory Clocks Reported Between 1000-1200 MHz

The Radeon Instinct MI100 test board has a TDP of 200W and is based on the XL variant of AMD's Arcturus GPU. The card also features 32 GB of HBM2 memory with memory clocks of 1.0 - 1.2 GHz. The MI60, in comparison, has 64 CUs and a TDP of 300W, with a reported base clock of 1200 MHz. Its memory operates at 1.0 GHz across a 4096-bit bus interface, pumping out 1 TB/s of bandwidth. There's a big chance that the final design of the Arcturus GPU could feature Samsung's latest HBM2E 'Flashbolt' memory, which offers 3.2 Gbps pin speeds for around 1.6 TB/s of bandwidth on the same 4096-bit bus.
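The bandwidth figures above follow from simple arithmetic: bus width times per-pin data rate, divided by eight bits per byte. The sketch below is only an illustration; the function name is made up for this example, and it assumes the test board keeps the MI60's 4096-bit bus, which the leak itself does not state.

```python
def hbm_bandwidth_gbs(bus_width_bits: int, pin_speed_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
    """
    return bus_width_bits * pin_speed_gbps / 8


# HBM2 is double data rate, so a 1000 MHz memory clock moves 2.0 Gbps per pin.
mi60_gbs = hbm_bandwidth_gbs(4096, 2 * 1.0)

# Upper end of the leaked 1000-1200 MHz range (assumed 4096-bit bus).
proto_gbs = hbm_bandwidth_gbs(4096, 2 * 1.2)

# HBM2E 'Flashbolt' at 3.2 Gbps per pin (assumed 4096-bit bus).
flashbolt_gbs = hbm_bandwidth_gbs(4096, 3.2)

for name, gbs in [("MI60", mi60_gbs),
                  ("Test board @ 1200 MHz", proto_gbs),
                  ("HBM2E @ 3.2 Gbps", flashbolt_gbs)]:
    print(f"{name}: {gbs:.1f} GB/s ({gbs / 1000:.2f} TB/s)")
```

Under those assumptions the MI60 lands at 1024 GB/s, the 1200 MHz prototype setting at roughly 1.23 TB/s, and the HBM2E case at roughly 1.64 TB/s.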

AMD Radeon Instinct Accelerators

| Accelerator Name | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI300X | AMD Instinct MI300A | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CPU Architecture | Zen 5 (Exascale APU) | N/A | N/A | Zen 4 (Exascale APU) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| GPU Architecture | CDNA 4 | CDNA 3+? | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | 4nm | 4nm | 5nm+6nm | 5nm+6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Chiplets | TBD | TBD | 8 (MCM) | 8 (MCM) | 2 (MCM), 1 (Per Die) | 2 (MCM), 1 (Per Die) | 2 (MCM), 1 (Per Die) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | TBD | TBD | 19,456 | 14,592 | 14,080 | 13,312 | 6656 | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
| GPU Clock Speed | TBD | TBD | 2100 MHz | 2100 MHz | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| INT8 Compute | TBD | TBD | 2614 TOPS | 1961 TOPS | 383 TOPS | 362 TOPS | 181 TOPS | 92.3 TOPS | N/A | N/A | N/A | N/A | N/A |
| FP16 Compute | TBD | TBD | 1.3 PFLOPs | 980.6 TFLOPs | 383 TFLOPs | 362 TFLOPs | 181 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBD | TBD | 163.4 TFLOPs | 122.6 TFLOPs | 95.7 TFLOPs | 90.5 TFLOPs | 45.3 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBD | TBD | 81.7 TFLOPs | 61.3 TFLOPs | 47.9 TFLOPs | 45.3 TFLOPs | 22.6 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | TBD | HBM3e | 192 GB HBM3 | 128 GB HBM3 | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Infinity Cache | TBD | TBD | 256 MB | 256 MB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Memory Clock | TBD | TBD | 5.2 Gbps | 5.2 Gbps | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | TBD | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBD | TBD | 5.3 TB/s | 5.3 TB/s | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | TBD | TBD | OAM | APU SH5 Socket | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | TBD | TBD | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP (Max) | TBD | TBD | 750W | 760W | 560W | 500W | 300W | 300W | 300W | 300W | 300W | 175W | 150W |

It is also mentioned that the Arcturus XL GPU could be a single huge monolithic die and not a chiplet-based design like AMD's Zen 2 based Ryzen CPU lineup. The naming of the Radeon Instinct MI100 itself gives us a hint of its absolute performance metrics, which would be around 100 TOPS of INT8. That would be a 66% increase in INT8 (AI/DNN) compute horsepower over the current generation. Similarly, the FP16 compute would be rated at around 50 TFLOPs, with 25 TFLOPs of FP32 and 12.5 TFLOPs of FP64. The extra GPU horsepower could come from an updated graphics architecture, much higher clocks or a higher CU count, with the last being the best assumption.
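The estimates above follow a simple pattern: each step up in precision halves the headline rate. A minimal sketch of that ladder, with a hypothetical helper name and assuming the halving holds at every step (which is the article's speculation, not a confirmed property of Arcturus):

```python
def precision_ladder(int8_tops: float) -> dict:
    """Derive FP16/FP32/FP64 estimates from an INT8 figure.

    Assumption: throughput halves at each step from INT8 to FP64,
    matching the speculative numbers quoted for the MI100.
    """
    rates = {}
    value = int8_tops
    for precision in ("INT8", "FP16", "FP32", "FP64"):
        rates[precision] = value
        value /= 2
    return rates


ladder = precision_ladder(100.0)
for precision, rate in ladder.items():
    unit = "TOPS" if precision == "INT8" else "TFLOPs"
    print(f"{precision}: {rate:g} {unit}")
```

Starting from 100 TOPS of INT8, this reproduces the 50, 25 and 12.5 TFLOPs figures quoted for FP16, FP32 and FP64.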

We have only seen a few details so far, and these are speculation at best, such as the GPU cache info that is part of the Virtual CRAT (vCrat) size. The GPU cache correlates with the CU count. In the case of the AMD Arcturus GPU, the cache size has been increased and so has the CU count, from 64 to 128. That is twice as many CUs as Vega 10, which would give us 8192 stream processors if AMD is using 64 stream processors per CU as in its current GPU designs.
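To see how the rumored CU count ties in with the 25 TFLOPs FP32 estimate, peak FP32 throughput can be approximated as stream processors x 2 FLOPs per clock (one fused multiply-add) x clock speed. The sketch below uses hypothetical helper names and assumes 64 stream processors per CU; the implied clock is a back-of-the-envelope figure, not a leaked specification.

```python
def peak_fp32_tflops(compute_units: int, clock_mhz: float, sp_per_cu: int = 64) -> float:
    """Peak FP32 TFLOPs: SPs x 2 FLOPs per clock (FMA) x clock."""
    stream_processors = compute_units * sp_per_cu
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(target_tflops: float, compute_units: int, sp_per_cu: int = 64) -> float:
    """Clock (MHz) needed to reach a target peak FP32 rate."""
    stream_processors = compute_units * sp_per_cu
    return target_tflops * 1e12 / (stream_processors * 2) / 1e6


# Sanity check against the MI60: 64 CUs at 1800 MHz.
mi60_fp32 = peak_fp32_tflops(64, 1800)

# Rumored Arcturus configuration: 128 CUs, 25 TFLOPs FP32 target.
arcturus_sps = 128 * 64
arcturus_clock = implied_clock_mhz(25.0, 128)

print(f"MI60 peak FP32: {mi60_fp32:.2f} TFLOPs")
print(f"Arcturus SPs: {arcturus_sps}")
print(f"Clock implied by 25 TFLOPs: {arcturus_clock:.0f} MHz")
```

The MI60 check comes out at about 14.75 TFLOPs, in line with its 14.7 TFLOPs rating, while 128 CUs would need only around 1.5 GHz to hit 25 TFLOPs. That supports the reading that a higher CU count, rather than much higher clocks, would drive the uplift.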

While Arcturus is a Vega derivative, it's also a custom design built solely for the HPC segment. This way, AMD can run parallel development tracks for the gaming/consumer segment and for the HPC market, which consists of AI/DNN and datacenter customers.

Just a few days ago, some interesting speculation based on the new configuration for the Big Red 200 supercomputer was posted by Dylan522p, who suggests that NVIDIA's next-generation Ampere GPU based HPC parts could potentially feature up to 18 TFLOPs of FP64 compute. That would be a lead of roughly 44% over the 12.5 TFLOPs estimated for the Instinct MI100, but AMD has proved that they can offer more FLOPs at a competitive price, so maybe that is what Arcturus would be targeting. There's no word on when Arcturus would land, but AMD has hinted at an Instinct product later this year.
