NVIDIA has unveiled the first architectural details of its next-generation Tegra Parker SOC at Hot Chips. The latest Tegra SOC is built on TSMC's advanced FinFET process and combines the Pascal GPU and Denver 2 CPU architectures for an unprecedented increase in performance and efficiency. NVIDIA revealed at Hot Chips that it is currently targeting the SOC at the automotive market, but it could arrive in other products too.
NVIDIA Tegra Parker SOC Detailed - TSMC FinFET Process, Pascal GPU and Denver 2 CPU Architecture
Starting with the details, the Tegra Parker SOC is based on TSMC's 16nm FinFET process node and uses NVIDIA's latest CPU and GPU architectures. The bulk of the die is dedicated to the Pascal GPU cores and the Denver 2 CPU cores. The chip features 256 CUDA cores based on the same architecture as the Titan X (Pascal) graphics card. The ARM v8 CPU complex comprises two Denver 2 cores and four A57 cores in a coherent HMP (Heterogeneous Multi-Processor) architecture.
The Denver 2 and A57 clusters each pack 2 MB of L2 cache and are linked via the HMP architecture, for 4 MB of L2 cache in total. Each Denver 2 core also packs a 128 KB + 64 KB L1 cache (instruction + data), while each A57 core has a 48 KB + 32 KB L1 cache. In addition to the CPU cores, the chip supports a 128-bit LPDDR4 memory interface with ECC and 50 GB/s of bandwidth. The display engine is a triple pipeline (4K @ 60 FPS), while camera features include auto-HDR on up to 12 cameras.
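As a rough sanity check on the memory figure above, peak bandwidth is simply the bus width in bytes multiplied by the transfer rate. The sketch below assumes an LPDDR4 data rate of 3200 MT/s, which the article does not state; the function name `peak_bandwidth_gb_s` is also just illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9


BUS_WIDTH_BITS = 128       # from the article
ASSUMED_RATE_MT_S = 3200   # assumption: LPDDR4-3200, not stated in the article

bandwidth = peak_bandwidth_gb_s(BUS_WIDTH_BITS, ASSUMED_RATE_MT_S)
print(f"{bandwidth:.1f} GB/s")
```

Under that assumption the result is about 51 GB/s, in line with the roughly 50 GB/s quoted for Parker. A slower memory speed grade would scale the figure down proportionally.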
NVIDIA says its Denver 2 cores are the most advanced and highest-performance ARM CPUs, with significant performance improvements over the first-generation Denver cores. The new cores feature dynamic code optimization, a 7-wide superscalar architecture and several low-power retention states. This leads to a 40% increase in CPU performance over Apple's A9X chip.
While NVIDIA may not have revealed any purpose for Tegra Parker beyond automotive, I believe they have hinted that Parker is as good a gaming chip as it is an automotive processor. The multiprocessor architecture, which combines big and super cores, is said to deliver strong single-threaded performance, maximize aggregate performance and provide a sufficient thread count for automotive and gaming applications.
NVIDIA Tegra Parker SOC Supports Hardware Enabled Virtualization
The Tegra Parker SOC is also the first Tegra chip to support hardware-enabled CPU, GPU and SOC virtualization. The chip can run up to 8 virtual machines, each with its own dedicated display pipeline. NVIDIA will also provide its own software to deliver the best virtualization experience on the Tegra SOC.
The Tegra SOC also supports 4K 60 FPS encode/decode; Ethernet-AVB, dual CAN and QSPI for automotive; eMMC 5.2 and SATA for storage; PCI-E; and a dedicated audio-processing chip.
The Tegra Parker SOC is first featured in the Drive PX 2, which carries two such Tegra modules and has additional space for dedicated MXM graphics cards. This product packs 12 CPU cores and four Pascal GPUs (2 Tegra / 2 MXM) with 8 TFLOPs of FP32 and 24 TOPs of INT8 compute. We have already seen the product with two GP106 GPUs in MXM form factor, so we expect something close to the GTX 1060 (Notebook) on NVIDIA's Drive PX 2 solution.
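The Drive PX 2 totals follow directly from the per-chip figures given earlier (two Denver 2 plus four A57 cores per Parker SOC). The short sketch below works through that arithmetic; the names `ParkerSoc` and `drive_px2_totals` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParkerSoc:
    """Per-chip core counts as quoted in the article."""
    denver2_cores: int = 2
    a57_cores: int = 4

    @property
    def cpu_cores(self) -> int:
        return self.denver2_cores + self.a57_cores


def drive_px2_totals(soc: ParkerSoc, soc_count: int = 2, mxm_gpu_count: int = 2) -> dict:
    """Aggregate CPU cores and GPUs for a board with several SOCs and MXM cards."""
    return {
        "cpu_cores": soc.cpu_cores * soc_count,  # each SOC contributes its full CPU complex
        "gpus": soc_count + mxm_gpu_count,       # one integrated GPU per SOC plus discrete MXM GPUs
    }


totals = drive_px2_totals(ParkerSoc())
print(totals)
```

With two Parker modules and two MXM cards this gives 12 CPU cores and four GPUs, matching the figures quoted for the Drive PX 2.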
NVIDIA Drive PX Generation Comparison:
Product Name | NVIDIA Drive PX | NVIDIA Drive PX 2 | NVIDIA Drive Xavier | NVIDIA Drive Pegasus | NVIDIA Drive AGX Orin |
---|---|---|---|---|---|
SOC Name | Tegra X1 | Parker | Xavier | Xavier | Orin |
Process Technology | 20nm SOC | 16nm FinFET | 12nm FinFET | 12nm FinFET | TBA |
SOC Transistors | 2 Billion (Tegra X1) | N/A | 7 Billion (Xavier) | 7 Billion (Xavier) | 17 Billion (Orin) |
GPU Architecture | Maxwell (256 Core) | Pascal (256 Core) | Volta (512 Core) | Volta (512 Core) | Ampere? |
CPU | 16 Core ARM CPU | 12 Core ARM CPU | 8 Core ARM CPU | 16 Core ARM CPU | 12 Core ARM CPU |
CPU Architecture | 8x Cortex A57 + 8x Cortex A53 | 4x Denver + 8x Cortex A57 | Carmel ARM64 8 Core CPU (8 MB L2 + 4 MB L3) | Carmel ARM64 8 Core CPU (8 MB L2 + 4 MB L3) | ARM Hercules Cores |
Compute DLTOPs | N/A | 20 DLTOPs | 30 TOPs | 320 TOPs | 200 TOPs |
Total Chips | 2 x Tegra X1 | 2 x Tegra X2 + 2 x Pascal MXM GPUs | 1 x Xavier | 2 x Volta + 2 x Turing | 1 x Ampere |
System Memory | LPDDR4 | 8 GB LPDDR4 (50+ GB/s) | 16 GB 256-bit LPDDR4 | LPDDR4 + GDDR6 | N/A |
Graphics Memory | N/A | 4 GB GDDR5 (80+ GB/s) | 137 GB/s | 1 TB/s | 200 GB/s |
TDP | 20W | 80W | 30W | 500W | TBA |
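
One way to read the table is to divide the compute figure by the TDP for each platform where both are listed. The sketch below does this with the values exactly as they appear in the table, skipping rows marked N/A or TBA. The `PLATFORMS` dictionary and `tops_per_watt` helper are illustrative names. Treat the comparison as approximate: the compute figures are vendor-quoted peak numbers and the TDPs cover different board configurations.

```python
# (compute in TOPs, TDP in watts), copied from the comparison table above.
# Drive PX and Drive AGX Orin are omitted because one of the two values is N/A or TBA.
PLATFORMS = {
    "Drive PX 2": (20, 80),
    "Drive Xavier": (30, 30),
    "Drive Pegasus": (320, 500),
}


def tops_per_watt(tops: float, tdp_watts: float) -> float:
    """Peak compute per watt of TDP."""
    return tops / tdp_watts


efficiency = {name: tops_per_watt(*spec) for name, spec in PLATFORMS.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} TOPs/W")
```

On these numbers, Drive Xavier comes out ahead per watt, while Drive Pegasus trades efficiency for much higher absolute throughput.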