Kicking off another busy Spring GPU Technology Conference for NVIDIA, this morning the graphics and accelerator designer is announcing that they are going to once again design their own Arm-based CPU/SoC. Dubbed Grace – after Grace Hopper, the computer programming pioneer and US Navy rear admiral – the CPU is NVIDIA’s latest stab at more fully vertically integrating their hardware stack by being able to offer a high-performance CPU alongside their regular GPU wares. According to NVIDIA, the chip is being designed specifically for large-scale neural network workloads, and is expected to become available in NVIDIA products in 2023.

With two years to go until the chip is ready, NVIDIA is being relatively coy at this time. The company is offering only limited details for the chip – it will be based on a future iteration of Arm’s Neoverse cores, for example – as today’s announcement is a bit more focused on NVIDIA’s future workflow model than it is on speeds and feeds. If nothing else, the company is making it clear early on that, at least for now, Grace is an internal product for NVIDIA, to be offered as part of their larger server offerings. The company isn’t directly gunning for the Intel Xeon or AMD EPYC server market; instead, they are building their own chip to complement their GPU offerings, creating a specialized chip that can directly connect to their GPUs and help handle enormous, trillion-parameter AI models.

NVIDIA SoC Specification Comparison

|                       | Grace                           | Xavier                   | Parker (Tegra X2)        |
|-----------------------|---------------------------------|--------------------------|--------------------------|
| CPU Cores             | ?                               | 8                        | 2                        |
| CPU Architecture      | Next-Gen Arm Neoverse (Arm v9?) | Carmel (Custom Arm v8.2) | Denver 2 (Custom Arm v8) |
| Memory Bandwidth      | >500GB/sec LPDDR5X (ECC)        | 137GB/sec LPDDR4X        | 60GB/sec LPDDR4          |
| GPU-to-CPU Interface  | >900GB/sec NVLink 4             | PCIe 3                   | PCIe 3                   |
| CPU-to-CPU Interface  | >600GB/sec NVLink 4             | N/A                      | N/A                      |
| Manufacturing Process | ?                               | TSMC 12nm                | TSMC 16nm                |
| Release Year          | 2023                            | 2018                     | 2016                     |

More broadly speaking, Grace is designed to fill the CPU-sized hole in NVIDIA’s AI server offerings. The company’s GPUs are incredibly well-suited for certain classes of deep learning workloads, but not all workloads are purely GPU-bound, if only because a CPU is needed to keep the GPUs fed. NVIDIA’s current server offerings, in turn, typically rely on AMD’s EPYC processors, which are very fast for general compute purposes, but lack the kind of high-speed I/O and deep learning optimizations that NVIDIA is looking for. In particular, NVIDIA is currently bottlenecked by the use of PCI Express for CPU-GPU connectivity; their GPUs can talk quickly amongst themselves via NVLink, but not back to the host CPU or system RAM.
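To put that bottleneck in perspective, here's a quick back-of-the-envelope sketch – my own arithmetic, not NVIDIA's – of how long it takes to stream a terabyte of data from host memory to a GPU over today's PCIe links versus the NVLink 4 bandwidth NVIDIA is quoting for Grace. The per-direction figures are approximate theoretical peaks, and the even up/down split for NVLink 4 is an assumption.

```python
# Back-of-the-envelope comparison: time to stream 1 TB from host memory
# to a GPU over each interconnect. Per-direction bandwidths are rough
# theoretical peaks; the NVLink 4 figure assumes an even up/down split
# of the >900GB/sec cumulative number NVIDIA quotes.
LINKS_GB_PER_SEC = {
    "PCIe 3.0 x16": 16,         # ~16 GB/s per direction
    "PCIe 4.0 x16": 32,         # ~32 GB/s per direction
    "NVLink 4 (Grace)": 450,    # assumed: >900 GB/s cumulative / 2
}

DATA_TB = 1.0  # hypothetical working set

for link, bw in LINKS_GB_PER_SEC.items():
    seconds = DATA_TB * 1000 / bw
    print(f"{link:17s}: {seconds:6.1f} s to move {DATA_TB:.0f} TB")
```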

The solution to the problem, as was the case even before Grace, is to use NVLink for CPU-GPU communications. Previously, NVIDIA worked with the OpenPOWER Foundation to get NVLink into POWER9 for exactly this reason; however, that relationship is seemingly on its way out, both as POWER's popularity wanes and as POWER10 skips NVLink. Instead, NVIDIA is going their own way by building an Arm server CPU with the necessary NVLink functionality.

The end result, according to NVIDIA, will be a high-performance and high-bandwidth CPU that is designed to work in tandem with a future generation of NVIDIA server GPUs. With NVIDIA talking about pairing each NVIDIA GPU with a Grace CPU on a single board – similar to today’s mezzanine cards – not only do CPU performance and system memory scale up with the number of GPUs, but in a roundabout way, Grace will serve as a co-processor of sorts to NVIDIA’s GPUs. This, if nothing else, is a very NVIDIA solution to the problem, not only improving their performance, but giving them a counter should the more traditionally integrated AMD or Intel try some sort of similar CPU+GPU fusion play.

By 2023 NVIDIA will be up to NVLink 4, which will offer at least 900GB/sec of cumulative (up + down) bandwidth between the SoC and GPU, and over 600GB/sec cumulative between Grace SoCs. Critically, this is greater than the memory bandwidth of the SoC, which means that NVIDIA’s GPUs will have a cache coherent link to the CPU that can access the system memory at full bandwidth, while also allowing the entire system to have a single shared memory address space. NVIDIA describes this as balancing the amount of bandwidth available in a system, and they’re not wrong, but there’s more to it. Having an on-package CPU is a major step towards increasing the amount of memory NVIDIA’s GPUs can effectively access and use, as memory capacity continues to be the primary constraining factor for large neural networks – you can only efficiently run a network as big as your local memory pool.
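As a minimal illustration of that balancing argument – using NVIDIA's quoted floors, plus my own assumption of an even up/down split on NVLink – a GPU reading host DRAM is capped by the slower of the link's one-way bandwidth and the DRAM itself:

```python
# Sketch of the "balanced bandwidth" argument: a GPU reading host memory
# is limited by the slower of the link (one direction) and the DRAM.
def effective_read_bw(link_one_way_gbs: float, dram_gbs: float) -> float:
    """GPU-visible read bandwidth into system memory, in GB/s."""
    return min(link_one_way_gbs, dram_gbs)

lpddr5x = 500                 # GB/s, NVIDIA's stated floor for Grace
nvlink4_one_way = 900 / 2     # assumed even split of >900 GB/s cumulative
pcie4_x16_one_way = 32        # GB/s, today's typical CPU-GPU link

grace = effective_read_bw(nvlink4_one_way, lpddr5x)    # ~450 GB/s
today = effective_read_bw(pcie4_x16_one_way, lpddr5x)  # ~32 GB/s
print(f"Grace: ~{grace:.0f} GB/s vs. PCIe 4 host: ~{today:.0f} GB/s "
      f"(~{grace / today:.0f}x)")
```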

CPU & GPU Interconnect Bandwidth

|                                                    | Grace               | EPYC 2 + A100               | EPYC 1 + V100             |
|----------------------------------------------------|---------------------|-----------------------------|---------------------------|
| GPU-to-CPU Interface (Cumulative, Both Directions) | >900GB/sec NVLink 4 | ~64GB/sec PCIe 4 x16        | ~32GB/sec PCIe 3 x16      |
| CPU-to-CPU Interface (Cumulative, Both Directions) | >600GB/sec NVLink 4 | 304GB/sec Infinity Fabric 2 | 152GB/sec Infinity Fabric |

And this memory-focused strategy is reflected in the memory pool design of Grace as well. Since NVIDIA is putting the CPU on a shared package with the GPU, they’re going to put the RAM down right next to it. Grace-equipped GPU modules will include a to-be-determined amount of LPDDR5X memory, with NVIDIA targeting at least 500GB/sec of memory bandwidth. Besides being what’s likely to be the highest-bandwidth non-graphics memory option in 2023, NVIDIA is touting the use of LPDDR5X as a gain for energy efficiency, owing to the technology’s mobile-focused roots and very short trace lengths. And, since this is a server part, Grace’s memory will be ECC-enabled as well.

As for CPU performance, this is actually the part where NVIDIA has said the least. The company will be using a future generation of Arm’s Neoverse CPU cores, whose initial N1 design has already been turning heads. But other than that, all the company is saying is that the chip should break 300 points on the SPECrate2017_int_base throughput benchmark, which would be comparable to some of AMD’s second-generation 64-core EPYC CPUs. The company also isn’t saying much about how the CPUs are configured or what optimizations are being added specifically for neural network processing. But since Grace is meant to support NVIDIA’s GPUs, I would expect it to be stronger where GPUs in general are weaker.

Otherwise, as mentioned earlier, NVIDIA’s big-picture goal for Grace is to significantly cut down the time required to train the largest neural network models. NVIDIA is gunning for 10x higher performance on 1-trillion-parameter models, with their performance projections for a 64-module Grace+A100 system (with theoretical NVLink 4 support) bringing the training time for such a model down from a month to three days. Alternatively, an 8-module system would be able to do real-time inference on a 500 billion parameter model.
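Some rough sizing arithmetic – my own assumptions, not NVIDIA's disclosed math – shows why pooled, high-bandwidth system memory matters at this scale: the weights of a trillion-parameter model alone dwarf any single GPU's local memory.

```python
# Why trillion-parameter models need pooled memory: weights alone far
# exceed any single GPU's HBM. Assumes FP16/BF16 weights (2 bytes each);
# optimizer state would multiply this several times over.
params = 1_000_000_000_000    # 1 trillion parameters
bytes_per_param = 2           # assumed FP16/BF16

weights_tb = params * bytes_per_param / 1e12
print(f"Weights alone: {weights_tb:.0f} TB")                    # 2 TB

a100_hbm_gb = 80              # largest A100 variant's local memory
print(f"A100s just to hold the weights: "
      f"{params * bytes_per_param / (a100_hbm_gb * 1e9):.0f}")  # ~25

# NVIDIA's headline projection, restated: a ~10x speedup turns a month
# of training into days.
print(f"30 days / 10 = {30 / 10:.0f} days")
```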

Overall, this is NVIDIA’s second real stab at the data center CPU market – and the first that is likely to succeed. NVIDIA’s Project Denver, which was originally announced just over a decade ago, never really panned out as NVIDIA expected. The family of custom Arm cores was never good enough, and never made it out of NVIDIA’s mobile SoCs. Grace, in contrast, is a much safer project for NVIDIA; they’re merely licensing Arm cores rather than building their own, and those cores will be in use by numerous other parties as well. So NVIDIA’s risk is largely reduced to getting the I/O and memory plumbing right, as well as keeping the final design energy efficient.

If all goes according to plan, expect to see Grace in 2023. NVIDIA is already confirming that Grace modules will be available for use in HGX carrier boards, and by extension DGX and all the other systems that use those boards. So while we haven’t seen the full extent of NVIDIA’s Grace plans, it’s clear that they are planning to make it a core part of future server offerings.

First Two Supercomputer Customers: CSCS and LANL

And even though Grace isn’t shipping until 2023, NVIDIA has already lined up their first customers for the hardware – and they’re supercomputer customers, no less. Both the Swiss National Supercomputing Centre (CSCS) and Los Alamos National Laboratory are announcing today that they’ll be ordering supercomputers based on Grace. Both systems will be built by HPE’s Cray group, and are set to come online in 2023.

CSCS’s system, dubbed Alps, will be replacing their current Piz Daint system, a Xeon plus NVIDIA P100 cluster. According to the two companies, Alps will offer 20 ExaFLOPS of AI performance, which is presumably a combination of CPU, CUDA core, and tensor core throughput. When it’s launched, Alps should be the fastest AI-focused supercomputer in the world.
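For context on how large that figure is – and assuming, since the companies haven't specified, that it's quoted in sparse FP16 tensor throughput – here's how many of today's A100s 20 ExaFLOPS would represent. Alps will actually use a future, unannounced NVIDIA GPU, so this is purely illustrative.

```python
# Hedged context for the 20 "AI ExaFLOPS" figure, expressed in terms of
# today's hardware. Assumes the number is sparse FP16 tensor throughput.
alps_exaflops = 20
a100_sparse_tflops = 624      # A100 peak FP16 tensor FLOPS with sparsity

a100_equivalents = alps_exaflops * 1e6 / a100_sparse_tflops
print(f"~{a100_equivalents:,.0f} A100-equivalents")   # ~32,000
```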


[Image: An artist's rendition of the expected Alps system]

Interestingly, however, CSCS’s ambitions for the system go beyond just machine learning workloads. The institute says that they’ll be using Alps as a general purpose system, working on more traditional HPC-type tasks as well as AI-focused tasks. This includes CSCS’s traditional research into weather and the climate, which the pre-AI Piz Daint is already used for as well.

As previously mentioned, Alps will be built by HPE, who will be basing it on their previously announced Cray EX architecture. This would make NVIDIA’s Grace the second CPU option for Cray EX, alongside AMD’s EPYC processors.

Meanwhile, Los Alamos’ system is being developed as part of an ongoing collaboration between the lab and NVIDIA, with LANL set to be the first US-based customer to receive a Grace system. LANL is not discussing the expected performance of their system beyond the fact that it’s expected to be “leadership-class,” though the lab is planning on using it for 3D simulations, taking advantage of the larger data set sizes afforded by Grace. The LANL system is set to be delivered in early 2023.

Comments
  • Qasar - Tuesday, April 13, 2021 - link

    Most of those I know that are looking for CPU upgrades, including one with a 2600K, are ALL looking to go with a Ryzen 5000.
  • ChrisGX - Tuesday, April 13, 2021 - link

    >> I am saying that we're very close to peak Intel. Will it be 2020 or 2022?

    Some time in that timeframe, I expect. ARM-based devices, robots, and systems are dominant in the new domains/sectors getting the autonomous intelligent systems treatment – motor vehicles, factory floors, etc. – and ARM seems to be taking business from x86 everywhere else, so there isn't such a clear growth path for Intel these days. Admittedly, Cloud Edge computing has a lot of growing to do, so if enterprises drag their feet, the transition to ARM will take longer.

    I can't wait to see Intel and AMD bludgeoning one another for exigent sales in a declining market. The smart play for both, it seems to me, would be to go down the custom ARM course. Within two years, I think that apparently insane proposition will probably have become an imperative one. (I suspect Pat Gelsinger's recent statements and others he will no doubt make in the future will be overtaken by events that aren't under PG's or Intel's control.)
  • Linustechtips12#6900xt - Tuesday, April 13, 2021 - link

    "ChrisGX" I totally agree with you, but I kinda see why they keep competing in the x86 market, mainly due to Microsoft, which – judging by the lack of press reporting on it – doesn't seem to care about porting Windows to ARM, or just making Windows 10X fully ARM and emulating x86. The first ARM stuff they release is either gonna be mobile or server; I'm betting on AMD releasing mobile Athlon chips in the next couple of years with ARM cores, probably a big.LITTLE design too, I would imagine.
  • mode_13h - Wednesday, April 14, 2021 - link

    > The smart play for both, it seems to me, would be to go down the custom ARM course. Within two years, I think that apparently insane proposition will probably have become an imperative one.

    AMD K12. Look it up.
  • dotjaz - Monday, April 12, 2021 - link

    N2/V1 was announced last year, and you should expect products to appear late this year, so there's a good chance this would be next generation.
    Although N2 was supposed to be later and seems to be ARMv9 or something very close (ARMv8.5a+SVE/SVE2+I8MM+MEMTAG+BF16), so it's still possible.
  • Yojimbo - Monday, April 12, 2021 - link

    I think it will be based on Arm's Poseidon platform. Why hold back? In the supercomputer space, AMD and Intel are going to be pushing hard for Genoa-Instinct and Granite Rapids-Xe systems. NVIDIA's memory bandwidth advantage will be a good selling point, but they will want a strong GPU to compete, as even with CXL it may become a lot harder to get their GPUs into supercomputers otherwise.

    Now in the data center it'll be a different situation. There NVIDIA's GPU is likely to be the key selling point and will likely be chosen the majority of the time no matter which CPU it's paired with, though for large AI models the bandwidth the GPUs will enjoy to system memory using the Grace CPUs will be a strong selling point.
  • ChrisGX - Tuesday, April 13, 2021 - link

    >> I think it will be based on Arm's Poseidon platform.

    I think that is likely, too. Poseidon IP is due in 2022 and Grace will be released sometime in 2023. Not a lot of time but probably enough to get everything done right (as long as Poseidon IP can be delivered to licensees by early 2022). The scope of Poseidon IP (or that part of it due in 2022) isn't that clear, though. Will the IP be for Neoverse N3 or V2 or both N3 and V2? And nothing Nvidia has said about Grace gives a clue as to what pieces Nvidia will be using to build it. The SPECrate numbers quoted by Jensen Huang didn't really help in making any educated guesses about what cores might show up in Grace. I do think Grace will be an ARMv9 (with SVE2) CPU, however.

    I can see this stinging Intel badly. I do not foresee any slowdown of Nvidia in the supercomputer domain.

    https://www.anandtech.com/show/16073/arm-announces...
  • Yojimbo - Tuesday, April 13, 2021 - link

    EDIT: I meant "...they will want a strong CPU to compete..."
  • WaltC - Wednesday, April 21, 2021 - link

    People really need to back up and get a grip, imo...announcing things, and shipping things, are two very different sides of the same "coin." The "shipping" facet being the whole ball of wax, so to speak. The present situation with nVidia and AMD should illustrate that fact convincingly--both companies have announced powerful, capable new GPUs, and both companies have actually made a few products with specific and well-known capabilities and hardware--but so far neither company has been able to get close to actually making/shipping enough of those products to meet global demand for them even fractionally!

    And Grace isn't anywhere near the state of AMD and nVidia GPU development. Its problems are somewhat different--we know exactly what the GPUs are in terms of hardware, but nonetheless neither company is making anywhere near enough to meet a fraction of the global demand. But Grace? As nVidia doesn't know what Grace is specifically there is essentially a 0 demand globally for "Grace" for exactly that reason--what Grace is or will be is anything but clear, etc. As far as "Grace" is concerned we know almost nothing about it. Except that it presently does not exist atm, and that nVidia has announced that it intends to sell Grace, when and if ever nVidia can succeed in making *something* they will call "Grace,"--but then once they make it, what is it, then? Obviously, we don't know because nVidia doesn't really know just yet--beyond the broadest set of theoretical generalities, etc. nVidia hasn't even decided what its actual capabilities are--because nVidia doesn't actually know what the hardware will consist of--specifically! Generally, *everybody knows* what Grace will be if it is ever manufactured and sold--and that is an ARM server CPU of nVidia custom design. But that's not really saying much, is it?...;) I'm baffled that this aspect of Grace isn't crystal clear to everyone at this stage.

    (Interjection--I like new stuff as much as the next guy--we *all* like new stuff! There would be something *wrong* with all of us if we didn't...;) But I much prefer *shipping* new stuff to new stuff which is still in the announcement/theoretical/embryonic design stage. I think that also applies to most people..;))

    But here's some speculation and some theory of my own. I will be very surprised if the UK regulators approve the sale of ARM to nVidia--very surprised. I think nVidia's bid to purchase ARM stems from a desperation JHH has about nVidia's future beyond the next GPU--GPUs that hopefully nVidia (and AMD!) will be able to ship globally the next time they announce global availability for *anything*! It's an act of desperation by JHH, as I see it. He's trying, in one incredibly expensive fell swoop, to create a rough equivalency between nVidia and AMD--nVidia would go from owning no CPU IP with an established, global market to an extensive global market and the IP of ARM--Apple, among others, would then find itself dancing to nVidia's tune--and so it's no wonder it's an appealing idea for JHH. "If you can't build it yourself, then buy what someone else has already accomplished, and try and appropriate it to yourself" etc. I'm no fan of Apple, but I can surely understand why both Apple and Microsoft, among others, vigorously oppose turning the ARM world over to JHH & nVidia.

    Last, I read that there's a clause in the current purchase agreement that penalizes nVidia $1.25B if the deal collapses for any reason, apparently! So basically, nVidia's position is that even if the regulators in the UK nix the deal--which I think is all but inevitable--that nVidia pays ARM (or Softbank) one point two-five billion dollars. Also, why has nVidia signed on for an additional $750M paid to ARM for "ARM IP"--which, one would think, as these things go, that the payment for the ARM IP would be included in the ~$37B+ transfer of money and stock from nVidia to ARM/Softbank! Right? So what's the extra $750M for--bribes--kickbacks, greased palms? Even so, even if it's just a dodge to cover the funny-money siphoned off to the regulators to cover their "hard work" in approving the deal--that's a lot of money right off the top to let go in the *likely* event the UK regulators kill the deal. This deal has so much of the UK's national security tied up in it in a variety of ways that I really wonder why JHH is taking such a risk which at least outwardly doesn't have a snowball's chance in Hades. That is why I think there's more than a bit of desperation behind the bid for ARM. And possibly more here than meets the eye--I'd bet on it.

    Last, about date speculation fixing specifically on Grace deployment...not wise to count chickens before hatching, etc. There's no guarantee that when Grace is ever finalized and actually produced that anyone will actually want it, imo...;) People should remember the Larrabee debacle--which Intel finally had to kill before production ever began. Speaking of dates and times, IIRC Intel was supposed to have shipped 10nm a couple of years ago, and so on. It's going to be interesting to see how all of this shakes out...lots and lots of 'announcements' everywhere...still, only genuine shipping products impress me, don't care who tried to make them.
  • mode_13h - Wednesday, April 21, 2021 - link

    > so far neither company has been able to get close to actually making/shipping enough of those products to meet global demand for them even fractionally!

    See: crypto mining.

    > As nVidia doesn't know what Grace is specifically there is essentially a 0 demand globally for "Grace"

    Lol. What?? I'm sure Nvidia knows EXACTLY what Grace is. It's a specialized coprocessor for some of their datacenter GPUs. It has in-built demand, because you probably won't be able to buy their next-gen DGX/AGX systems without it.

    > "If you can't build it yourself, then buy what someone else has already accomplished, and try and appropriate it to yourself"

    How does that not apply to practically all big acquisitions? Like Intel's purchase of Altera, Habana, & Mobileye, and AMD's purchase of Xilinx, for instance? That doesn't mean they're not good strategy. Do you expect these companies NOT to make good strategic moves, for some reason?

    > So what's the extra $750M for--bribes--kickbacks, greased palms? Even so, even if it's just a dodge to cover the funny-money siphoned off to the regulators to cover their "hard work" in approving the deal--that's a lot of money right off the top to let go in the *likely* event the UK regulators kill the deal.

    Wow, such a load of BS. I'll bet it's not hard to find out what that's for, if you actually cared to know. It should be further elaborated in shareholder documents.

    > People should remember the Larrabee debacle

    What's your deal, man? Are you trying to short Nvidia? Good luck.
