NVIDIA AI Continues Leading Performance Across MLPerf Tests

June 30, 2022, by Hermie Ansay

NVIDIA AI again delivered leading AI training performance and the most submissions across all benchmarks, with 90% of all entries coming from the NVIDIA ecosystem, according to the latest MLPerf benchmarks.

The NVIDIA AI platform covered all eight benchmarks in the MLPerf Training 2.0 round, highlighting its leading versatility.

No other accelerator ran all benchmarks, which represent popular AI use cases including speech recognition, natural language processing, recommender systems, object detection, image classification and more. NVIDIA has done so consistently since submitting to the first round of MLPerf, an industry-standard suite of AI benchmarks, in December 2018.

Leading Benchmark Results, Availability

In its fourth consecutive MLPerf Training submission, the NVIDIA A100 Tensor Core GPU based on the NVIDIA Ampere architecture continued to excel.

Fastest time to train on each network by each submitter’s platform | Format: Chip count, Submitter, MLPerf-ID | RNN-T: 1536x NVIDIA 2.0-2104 | BERT: 4096x NVIDIA 2.0-2106, 4096x Google 2.0-2012, 256x Graphcore 2.0-2053, 8x Intel-HabanaLabs 2.0-2073 | RN-50: 4216x NVIDIA 2.0-2107, 4096x Google 2.0-2012, 256x Graphcore 2.0-2054, 8x Intel-HabanaLabs 2.0-2073 | 3D U-Net: 768x NVIDIA 2.0-2100 | RetinaNet: 1280x NVIDIA 2.0-2103, 2048x Google 2.0-2010 | Mask R-CNN: 384x NVIDIA 2.0-2099, 2048x Google 2.0-2010 | MiniGo: 1792x NVIDIA 2.0-2105 | DLRM: 112x NVIDIA 2.0-2098

Selene — NVIDIA’s in-house AI supercomputer based on the modular NVIDIA DGX SuperPOD and powered by NVIDIA A100 GPUs, the NVIDIA software stack and NVIDIA InfiniBand networking — turned in the fastest time to train on four of the eight tests.

Per-chip performance is not a primary metric of MLPerf™ Training. To calculate per-chip performance, this chart normalizes every submission to the closest scale of the fastest competitor. The fastest competitors are shown with 1x. To determine the fastest competitor, we selected the scale common to most submitters. | Format: Chip count, Submitter, MLPerf-ID | ResNet-50: 8x Inspur 2.0-2069, 3456x Google 2.0-2011, 16x Graphcore 2.0-2047, 8x Intel-HabanaLabs 2.0-2073 | BERT: 8x Inspur 2.0-2070, 3456x Google 2.0-2011, 16x Graphcore 2.0-2045, 8x Intel-HabanaLabs 2.0-2073 | DLRM: 8x Inspur 2.0-2068 | Mask R-CNN: 384x NVIDIA 2.0-2099, 1024x Google 2.0-2009 | RetinaNet: 1280x NVIDIA 2.0-2103, 2048x Google 2.0-2010 | RNN-T: 8x Inspur 2.0-2066 | 3D-UNet: 8x H3C 2.0-2060 | MiniGo: 8x H3C 2.0-2059

NVIDIA A100 also continued its per-chip leadership, proving the fastest on six of the eight tests.
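As a rough, hypothetical illustration of why chip count has to be factored out when comparing systems of very different sizes (this is not the exact MLPerf normalization described in the chart note, and the numbers below are invented):

```python
# Illustrative only: a naive per-chip comparison that assumes training time
# scales linearly with chip count. The actual MLPerf normalization described
# in the chart note may differ, and these numbers are made up, not real results.

def chip_minutes(minutes_to_train: float, chips: int) -> float:
    # Total chip-minutes spent to reach the target accuracy (lower is better).
    return minutes_to_train * chips

# Hypothetical example: system A trains in 20 minutes on 64 chips,
# system B trains in 30 minutes on 32 chips.
a = chip_minutes(20.0, 64)  # 1280 chip-minutes
b = chip_minutes(30.0, 32)  #  960 chip-minutes
print(a, b)  # under this naive metric, B needs fewer chip-minutes per trained model
```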

A total of 16 partners submitted results this round using the NVIDIA AI platform. They include ASUS, Baidu, CASIA (Institute of Automation, Chinese Academy of Sciences), Dell Technologies, Fujitsu, GIGABYTE, H3C, Hewlett Packard Enterprise, Inspur, KRAI, Lenovo, Microsoft Azure, MosaicML, Nettrix and Supermicro.

Most of NVIDIA’s OEM partners submitted results using NVIDIA-Certified Systems, servers validated by NVIDIA to provide great performance, manageability, security and scalability for enterprise deployments.

Many Models Power Real AI Applications

An AI application may need to understand a user’s spoken request, classify an image, make a recommendation and deliver a response as a spoken message.


Even this simple use case requires nearly 10 models, highlighting the importance of running every benchmark.

These tasks require multiple kinds of AI models to work in sequence, also known as a pipeline. Users need to design, train, deploy and optimize these models fast and flexibly.
That’s why both versatility – the ability to run every model in MLPerf and beyond – and leading performance are vital for bringing real-world AI into production.
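As a rough sketch of what such a pipeline looks like in code (every function below is a hypothetical placeholder standing in for a trained model, not part of any NVIDIA SDK or MLPerf benchmark):

```python
# Hypothetical sketch of a multi-model AI pipeline; each stage is a placeholder
# for a trained model (ASR, NLU, recommender, TTS), not a real NVIDIA API.

def transcribe_speech(audio: bytes) -> str:
    # Speech recognition (an RNN-T-style model would go here).
    return "recommend a movie for tonight"

def understand_request(text: str) -> dict:
    # Natural language understanding (a BERT-style model would go here).
    return {"intent": "recommend", "domain": "movies"}

def recommend_items(intent: dict) -> list:
    # Recommendation (a DLRM-style model would go here).
    return ["Movie A", "Movie B", "Movie C"]

def synthesize_speech(message: str) -> bytes:
    # Text-to-speech turns the chosen response back into audio.
    return message.encode("utf-8")

def handle_request(audio: bytes) -> bytes:
    # The models run in sequence: each stage consumes the previous stage's output.
    text = transcribe_speech(audio)
    intent = understand_request(text)
    items = recommend_items(intent)
    reply = "You might like: " + ", ".join(items)
    return synthesize_speech(reply)

print(handle_request(b"<audio bytes>"))
```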

Delivering ROI With AI

For customers, their data science and engineering teams are their most precious resources, and their productivity determines the return on investment for AI infrastructure. Customers must consider the cost of expensive data science teams, which often plays a significant part in the total cost of deploying AI, as well as the relatively small cost of deploying the AI infrastructure itself.

AI researcher productivity depends on the ability to quickly test new ideas, requiring both the versatility to train any model and the speed afforded by training those models at the largest scale. That’s why organizations focus on overall productivity per dollar to determine the best AI platforms — a more comprehensive view that more accurately represents the true cost of deploying AI.
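To make the productivity-per-dollar framing concrete, here is a deliberately simplified, hypothetical calculation; all figures are invented for illustration:

```python
# Hypothetical back-of-the-envelope "productivity per dollar" comparison.
# All figures are invented for illustration; they are not NVIDIA or MLPerf data.

def experiments_per_million(team_cost: float, infra_cost: float,
                            experiments_per_year: float) -> float:
    # Experiments completed per $1M of total annual spend.
    return experiments_per_year / ((team_cost + infra_cost) / 1_000_000)

# The same (expensive) data science team on two platforms; the faster platform
# costs more to run but lets the team finish twice as many experiments.
slower = experiments_per_million(5_000_000, 500_000, 100)    # ~18.2
faster = experiments_per_million(5_000_000, 1_000_000, 200)  # ~33.3
print(f"slower platform: {slower:.1f} experiments per $1M")
print(f"faster platform: {faster:.1f} experiments per $1M")
# Because team cost dominates total spend, the faster platform delivers
# more output per dollar even though its infrastructure costs more.
```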

In addition, the utilization of their AI infrastructure relies on its fungibility, or the ability to accelerate the entire AI workflow — from data prep to training to inference — on a single platform.

With NVIDIA AI, customers can use the same infrastructure for the entire AI pipeline, repurposing it to match the varying demands of data preparation, training and inference. This dramatically boosts utilization and leads to very high ROI.

And, as researchers discover new AI breakthroughs, supporting the latest model innovations is key to maximizing the useful life of AI infrastructure.

NVIDIA AI delivers the highest productivity per dollar as it is universal and performant for every model, scales to any size and accelerates AI from end to end — from data prep to training to inference.

Today’s results provide the latest demonstration of NVIDIA’s broad and deep AI expertise shown in every MLPerf training, inference and HPC round to date.