It looks like AMD's next-gen Instinct MI300 GPU accelerator has made a possible first appearance in the latest Linux patch.
AMD Instinct MI300 'GFX940' GPU, Next-Gen Data Center MCM Accelerator, Makes Possible First Appearance In Linux Patch
The latest Linux Patch has included a new target for an unreleased AMD 'GFX940' GP which has a similar ISA as the Aldebaran 'GFX90a' GPU. It is speculated that this chip could be powering AMD's next-generation Instinct MI300 GPU accelerator and supports all the data-centric features such as MFMA (Matrix-Fused-Multiply-Add), full-rate FP64, and packed FP32 operations. Other features also include XNACK which is specific to CPU+GPU memory space integration, as Coelacanth-Dream puts it.
The source states that although the GPU ISA is similar, the GFX940 does have a few differences when compared to Aldebaran 'CDNA 2' GPUs which are listed below:
Previous rumors have indicated that the AMD Instinct MI300 will feature a 4-GCD design based on the brand new CDNA 3 architecture. The upcoming Instinct MI200 was going to feature 128 compute units per die but that has changed to 110 compute units since last week's rumor. A total of 220 Compute Units would net 14,080 cores and if we take the exact number and multiply it by 4 (the number of GCDs on Instinct MI300), we end up with 440 Compute Units or an insane 28,160 cores.
MI300 😍https://t.co/B3qlnQBbVG
— Kepler (@Kepler_L2) March 1, 2022
MI300 will feature 4 GCDs 🧐
— Kepler (@Kepler_L2) September 7, 2021
A recent AMD ROCm Developer Tools update that was spotted by Komachi did confirm a maximum of 4 MCM GPUs but those are simply 'Aldebaran' SKUs. There are expected to be at least four CDNA 2 powered Instinct accelerators with their respective (unique IDs) listed below. Note that the number doesn't represent the number of dies on each device but rather the device itself:
- 0x7408
- 0x740C
- 0x740F
- 0x7410
Now that would be true if AMD makes no changes whatsoever when moving from CDNA 2 to CDNA 3 but that's not the case. CDNA 3 is expected to bring forward a revised new architecture that won't be another Vega derivative like Arcturus or Aldebaran which makes this rumor more believable.
The GPU architecture may also use a layout that might end up looking similar to the new WGP/SE arrangement on the new RDNA 3 chips or an entirely new design tailored towards the HPC segment. But one thing is for sure, those quad-MCM GPUs definitely are something that we can't wait to see in action!
AMD Radeon Instinct Accelerators
Accelerator Name | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI300X | AMD Instinct MI300A | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CPU Architecture | Zen 5 (Exascale APU) | N/A | N/A | Zen 4 (Exascale APU) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
GPU Architecture | CDNA 4 | CDNA 3+? | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
GPU Process Node | 4nm | 4nm | 5nm+6nm | 5nm+6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
GPU Chiplets | TBD | TBD | 8 (MCM) | 8 (MCM) | 2 (MCM) 1 (Per Die) | 2 (MCM) 1 (Per Die) | 2 (MCM) 1 (Per Die) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
GPU Cores | TBD | TBD | 19,456 | 14,592 | 14,080 | 13,312 | 6656 | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
GPU Clock Speed | TBD | TBD | 2100 MHz | 2100 MHz | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
INT8 Compute | TBD | TBD | 2614 TOPS | 1961 TOPS | 383 TOPs | 362 TOPS | 181 TOPS | 92.3 TOPS | N/A | N/A | N/A | N/A | N/A |
FP16 Compute | TBD | TBD | 1.3 PFLOPs | 980.6 TFLOPs | 383 TFLOPs | 362 TFLOPs | 181 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP32 Compute | TBD | TBD | 163.4 TFLOPs | 122.6 TFLOPs | 95.7 TFLOPs | 90.5 TFLOPs | 45.3 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP64 Compute | TBD | TBD | 81.7 TFLOPs | 61.3 TFLOPs | 47.9 TFLOPs | 45.3 TFLOPs | 22.6 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
VRAM | TBD | HBM3e | 192 GB HBM3 | 128 GB HBM3 | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
Infinity Cache | TBD | TBD | 256 MB | 256 MB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Memory Clock | TBD | TBD | 5.2 Gbps | 5.2 Gbps | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
Memory Bus | TBD | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit bus | 4096-bit bus | 4096-bit bus | 2048-bit bus | 4096-bit bus | 256-bit bus |
Memory Bandwidth | TBD | TBD | 5.3 TB/s | 5.3 TB/s | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
Form Factor | TBD | TBD | OAM | APU SH5 Socket | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
Cooling | TBD | TBD | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
TDP (Max) | TBD | TBD | 750W | 760W | 560W | 500W | 300W | 300W | 300W | 300W | 300W | 175W | 150W |