NVIDIA, Microsoft and Ingrasys have teamed up to announce the HGX-1 hyperscale GPU accelerator, based on the Pascal Tesla P100. The new system is aimed at delivering a major boost to AI cloud computing and follows an open-source design released in conjunction with Microsoft's Project Olympus initiative.
NVIDIA and Microsoft Announce The Multi Tesla P100 Powered HGX-1 Hyperscale GPU Accelerator For AI and Cloud Computing
NVIDIA's dive into the AI industry has been a major success for the company. During the previous quarter, the NVIDIA workstation segment reported a revenue increase of 145% from the previous fiscal year. In Q4 FY16, the datacenter segment reported revenue of $93 million US, and by Q4 FY17 this had reached $296 million US. That is an impressive feat for the company in this market segment, and its investment in high-performance GPUs dedicated to accelerating AI, DNN and cloud computing technologies has really paid off.
HGX-1 does for cloud-based AI workloads what ATX (Advanced Technology eXtended) did for PC motherboards when it was introduced more than two decades ago. It establishes an industry standard that can be rapidly and efficiently embraced to help meet surging market demand.
The new architecture is designed to meet the exploding demand for AI computing in the cloud -- in fields such as autonomous driving, personalized healthcare, superhuman voice recognition, data and video analytics, and molecular simulations. "AI is a new computing model that requires a new architecture," said Jen-Hsun Huang, founder and chief executive officer of NVIDIA.
"The HGX-1 hyperscale GPU accelerator will do for AI cloud computing what the ATX standard did to make PCs pervasive today. It will enable cloud-service providers to easily adopt NVIDIA GPUs to meet surging demand for AI computing." via NVIDIA
"The HGX-1 AI accelerator provides extreme performance scalability to meet the demanding requirements of fast-growing machine learning workloads, and its unique design allows it to be easily adopted into existing data centers around the world," wrote Kushagra Vaid, general manager and distinguished engineer, Azure Hardware Infrastructure, Microsoft, in a blog post. via NVIDIA
For the thousands of enterprises and startups worldwide that are investing in AI and adopting AI-based approaches, the HGX-1 architecture provides unprecedented configurability and performance in the cloud.
What Powers The NVIDIA HGX-1 Hyperscale GPU Accelerator?
Powered by eight NVIDIA Tesla P100 GPUs in each chassis, it features an innovative switching design based on NVIDIA NVLink interconnect technology and the PCIe standard, enabling a CPU to dynamically connect to any number of GPUs. This allows cloud service providers that standardize on the HGX-1 infrastructure to offer customers a range of CPU and GPU machine instance configurations.
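The idea of a CPU attaching to a variable number of GPUs from a shared chassis can be pictured with a toy model. This is only an illustrative sketch: the `Chassis` class, its methods and the host names are hypothetical and do not represent any NVIDIA or Microsoft API. The only figure taken from the article is the count of eight GPUs per chassis.

```python
class Chassis:
    """Toy model of one chassis: a pool of eight GPUs handed out to CPU hosts."""

    def __init__(self, gpu_count=8):
        self.free = list(range(gpu_count))   # GPU indices not attached to any host
        self.assigned = {}                   # host name -> list of GPU indices

    def attach(self, host, count):
        """Attach `count` free GPUs to `host` and return their indices."""
        if count > len(self.free):
            raise ValueError(f"only {len(self.free)} GPUs free, {count} requested")
        gpus, self.free = self.free[:count], self.free[count:]
        self.assigned.setdefault(host, []).extend(gpus)
        return gpus

    def detach(self, host):
        """Return all GPUs held by `host` to the free pool."""
        self.free = sorted(self.free + self.assigned.pop(host, []))


chassis = Chassis()
inference_gpus = chassis.attach("inference-host", 2)   # small instance
training_gpus = chassis.attach("training-host", 6)     # large instance
print(inference_gpus, training_gpus, chassis.free)
```

In this sketch a provider could offer a 2-GPU instance and a 6-GPU instance from the same chassis, then release and re-split the pool as demand changes.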
Cloud workloads are more diverse and complex than ever. AI training, inferencing and HPC workloads run optimally on different system configurations, with a CPU attached to a varying number of GPUs. The highly modular design of the HGX-1 allows for optimal performance no matter the workload. It provides up to 100x faster deep learning performance compared with legacy CPU-based servers, and is estimated at one-fifth the cost for conducting AI training and one-tenth the cost for AI inferencing.
With its flexibility to work with data centers across the globe, HGX-1 offers existing hyperscale data centers a quick, simple path to be ready for AI.
Tesla Products | Tesla K40 | Tesla M40 | Tesla P100 |
---|---|---|---|
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) |
SMs | 15 | 24 | 56 |
TPCs | 15 | 24 | 28 |
FP32 CUDA Cores / SM | 192 | 128 | 64 |
FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 |
FP64 CUDA Cores / SM | 64 | 4 | 32 |
FP64 CUDA Cores / GPU | 960 | 96 | 1792 |
Base Clock | 745 MHz | 948 MHz | 1328 MHz |
GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz |
Compute Performance - FP32 | 5.04 TFLOPS | 6.82 TFLOPS | 10.6 TFLOPS |
Compute Performance - FP64 | 1.68 TFLOPS | 0.21 TFLOPS | 5.3 TFLOPS |
Texture Units | 240 | 192 | 224 |
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 |
Memory Size | Up to 12 GB | Up to 24 GB | 16 GB |
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB |
Register File Size / SM | 256 KB | 256 KB | 256 KB |
Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB |
TDP | 235 Watts | 250 Watts | 300 Watts |
Transistors | 7.1 billion | 8 billion | 15.3 billion |
GPU Die Size | 551 mm² | 601 mm² | 610 mm² |
Manufacturing Process | 28-nm | 28-nm | 16-nm |
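
The compute and register-file rows in the table can be cross-checked from the core counts and clocks. The following is a minimal Python sketch, assuming peak throughput equals cores × boost clock × 2 (one fused multiply-add per core per clock) and using the higher 875 MHz boost figure for the K40. The function name `peak_tflops` is illustrative only.

```python
# Values copied from the table: (FP32 cores, FP64 cores, boost clock in MHz, SM count)
SPECS = {
    "Tesla K40": (2880, 960, 875, 15),
    "Tesla M40": (3072, 96, 1114, 24),
    "Tesla P100": (3584, 1792, 1480, 56),
}
REGISTER_FILE_KB_PER_SM = 256  # same for all three GPUs in the table


def peak_tflops(cores, boost_mhz):
    """Peak TFLOPS, assuming 2 floating-point operations (one FMA) per core per clock."""
    return cores * boost_mhz * 1e6 * 2 / 1e12


for name, (fp32_cores, fp64_cores, boost_mhz, sms) in SPECS.items():
    fp32 = peak_tflops(fp32_cores, boost_mhz)
    fp64 = peak_tflops(fp64_cores, boost_mhz)
    register_file_kb = sms * REGISTER_FILE_KB_PER_SM
    print(f"{name}: FP32 {fp32:.2f} TFLOPS, FP64 {fp64:.2f} TFLOPS, "
          f"register file {register_file_kb} KB")
```

Under these assumptions the K40 and P100 results line up with the table, while the M40 FP32 result comes out slightly above the listed 6.82 TFLOPS.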
NVIDIA Joins Open Compute Project
NVIDIA is joining the Open Compute Project to help drive AI and innovation in the data center. The company plans to continue its work with Microsoft, Ingrasys and other members to advance AI-ready computing platforms for cloud service providers and other data center customers.
NVIDIA and Fujitsu Build an AI Supercomputer Based on 24 NVIDIA DGX-1 Units, Packs 192 Tesla P100 Modules
NVIDIA's Tesla P100 based DGX-1 platform is also being used to power the RIKEN supercomputer. The system is being built for RIKEN, Japan's largest comprehensive research institute, and is dedicated to deep learning and AI research.
The largest customer installation of DGX-1 systems to date, the supercomputer will accelerate the application of AI to solve complex challenges in healthcare, manufacturing and public safety.
“DGX-1 is like a time-machine for AI researchers,” said Jen-Hsun Huang, founder and CEO of NVIDIA. “Enterprises, research centers and universities worldwide are adopting DGX-1 to ride the wave of deep learning — the technology breakthrough at the center of the AI revolution.” via NVIDIA
The RIKEN Supercomputer is Fitted With Clusters of DGX-1 Units
The RIKEN Center for Advanced Intelligence Project will use the new supercomputer, scheduled to go online next month, to accelerate AI research in several areas, including medicine, manufacturing, healthcare and disaster preparedness.
“We believe that the NVIDIA DGX-1-based system will accelerate real-world implementation of the latest AI technologies as well as research into next-generation AI algorithms,” said Arimichi Kunisawa, head of the Technical Computing Solution Unit at Fujitsu Limited. “Fujitsu is leveraging its extensive experience in high-performance computing development and AI research to support R&D that utilizes this system, contributing to the creation of a future in which AI is used to find solutions to a variety of social issues.”
The supercomputer will also use 32 Fujitsu PRIMERGY servers, which, combined with the DGX-1 systems, will boost its total theoretical processing performance to 4 petaflops when running half-precision floating point calculations.
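The 4 petaflops figure can be roughly reproduced from the GPU count alone. The sketch below assumes eight P100 modules per DGX-1 (24 units, 192 modules) and a half-precision rate of twice the P100's 10.6 TFLOPS FP32 figure. The contribution of the PRIMERGY servers is ignored, so this is an estimate rather than the official breakdown.

```python
DGX1_UNITS = 24
GPUS_PER_DGX1 = 8            # 192 modules / 24 units
P100_FP32_TFLOPS = 10.6      # from the Tesla P100 spec table
FP16_TO_FP32_RATIO = 2       # assumption: half precision runs at twice the FP32 rate

total_gpus = DGX1_UNITS * GPUS_PER_DGX1
fp16_tflops_per_gpu = P100_FP32_TFLOPS * FP16_TO_FP32_RATIO
fp16_pflops = total_gpus * fp16_tflops_per_gpu / 1000

print(f"{total_gpus} GPUs, about {fp16_pflops:.2f} PFLOPS at half precision")
```

The GPUs alone account for roughly 4 petaflops under these assumptions, which is consistent with the quoted total.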
The system features a number of technological innovations unique to the DGX-1, including:
- Containerized deep learning frameworks, optimized by NVIDIA for maximum GPU-accelerated deep learning training
- Greater performance and multi-GPU scaling with NVIDIA NVLink, accelerating time to discovery
- An integrated software and hardware architecture optimized for deep learning
NVIDIA Jetson AI Supercomputer Gets Upgraded With Pascal GPUs In TX2 Revision
NVIDIA has also announced its Jetson TX2 AI supercomputer, an update to the Maxwell based Jetson TX1. The new embedded module features Pascal GPU cores and either doubles the performance of its predecessor or runs at more than twice the power efficiency while drawing less than 7.5W.
In terms of specifications, Jetson TX2 packs the Pascal GPU along with a combination of 64-bit Denver 2 and A57 CPUs. There's 8 GB of 128-bit LPDDR4 memory on the board that provides 58.4 GB/s of bandwidth, more than twice what was featured on the previous Jetson TX1 board. Storage has also doubled and now sits at 32 GB of eMMC.
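The bandwidth figure follows from the bus width and the memory's transfer rate: peak bandwidth is bytes per transfer multiplied by transfers per second. The sketch below works backwards from the quoted 58.4 GB/s and 128-bit bus to the implied transfer rate, assuming 1 GB = 10^9 bytes. The function name is illustrative only.

```python
def implied_transfer_rate_mtps(bandwidth_gb_s, bus_width_bits):
    """Mega-transfers per second needed to reach a given peak bandwidth."""
    bytes_per_transfer = bus_width_bits / 8          # 128 bits -> 16 bytes
    transfers_per_second = bandwidth_gb_s * 1e9 / bytes_per_transfer
    return transfers_per_second / 1e6


rate = implied_transfer_rate_mtps(58.4, 128)
print(f"Implied LPDDR4 transfer rate: about {rate:.0f} MT/s")
```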
A few highlights of how Jetson is already being used:
- Startup Blue River Technologies offers “lettuce thinning as a service” to farmers in the Salinas Valley that allows them to manage the year-round lettuce crop in one of the nation’s most productive agricultural regions.
- VIMOC uses Jetson as part of its hardware and software platform to integrate data from a wide array of different sensors to help manage complex buildings, such as parking garages.
- Enroute demonstrated how it’s using Jetson TX1 to create autonomous search and rescue drones that can bring payloads of up to 20 pounds to where they’re needed most.
- Fellow Robots is using Jetson to power a fleet of robots that help manage inventory, or even help customers find exactly what they need in sprawling big box stores.
The Jetson TX2 joins the Jetson TX1 and TK1 products for embedded computing. Jetson is an open platform, so it's accessible to anyone for putting advanced AI to work “at the edge,” or in devices in the world all around us. Jetson TX2 doubles the performance of its predecessor, or it can run at more than twice the power efficiency while drawing less than 7.5 watts of power.
NVIDIA Jetson TX2 Availability
The NVIDIA Jetson TX2 Developer Kit can be pre-ordered today for $599 in the United States and Europe and will begin shipping March 14. It will be available in other regions in the coming weeks. The Jetson TX2 module will be available in the second quarter for $399 in quantities of 1,000 or more.