Next For AMD In The Data Center: Tighter EPYC-Radeon Integration

'I think you'll see a greater and greater coupling in terms of timing, capability [and] workload affinity that I think will be more the norm,' AMD executive Scott Aylor says of the chipmaker's plans for its next-generation EPYC and Radeon Instinct processors for the data center.

There's no question 2019 was an important year for AMD: the chipmaker launched its first 7-nanometer CPUs for desktops and servers, and it continued its re-entry into the data center market with its new EPYC Rome processors.

So now that AMD has released its first 7nm products, what's next for the chipmaker in the data center? It's all about tighter integration of AMD's EPYC processors and its Radeon Instinct GPUs, both in architecture and in release cadence, to improve system performance.

[Related: Forrest Norrod On Why AMD EPYC 'Rome Kicks A**']

That's according to Scott Aylor, corporate vice president and general manager of AMD's Datacenter Solutions, who pointed to AMD's EPYC-Radeon integration plans in a recent interview with CRN when asked about the chipmaker's next big story in the data center.

"I think you'll see a greater and greater coupling in terms of timing, capability [and] workload affinity that I think will be more the norm," he said. "We're working to plan them closer together, and when I think about [CRN's] readership, that has benefits on things like [high-performance computing], but it also has big benefits on what we're doing from a machine learning and [artificial intelligence] perspective [as well as] what we're doing around visualization."

AMD's plans to launch future generations of EPYC processors and Radeon GPUs closer together come after two years in which the chipmaker launched new versions of the respective product lines separately. For instance, the most recent Radeon Instinct GPUs, the MI50 and MI60, were released in November 2018, while the second-generation EPYC processors, code-named Rome, launched in August of this year.

The chipmaker's north star for tighter EPYC-Radeon integration is the 1.5-exaflop Frontier supercomputer AMD is developing with Cray for the U.S. Department of Energy's Oak Ridge National Laboratory, according to Aylor. The supercomputer, which could become the world's fastest supercomputer when it goes online in 2021, is set to use EPYC processors based on a future iteration of AMD's Zen architecture along with new Radeon GPUs and a custom Infinity Fabric interconnect.

"You can think about how those have been co-architected to create a one-and-a-half exaflop system," he said. "You don't get to that mountaintop in one step."

As for the architectural work required for tighter integration between EPYC and Radeon, it will span both hardware and software, according to Aylor. That includes continued development of AMD's Infinity Fabric and PCIe connectivity as well as the chipmaker's ROCm software platform for GPU-accelerated computing.
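ROCm is the piece developers are most likely to touch directly: its HIP programming interface deliberately mirrors the CUDA model, so a kernel targeting a Radeon Instinct GPU looks much like its Nvidia counterpart. Below is a minimal, illustrative sketch (not AMD's Frontier code; the kernel and buffer sizes are arbitrary) of that programming model:

```cpp
// Minimal HIP (ROCm) sketch: a CUDA-style vector-add kernel, compiled for a
// Radeon Instinct GPU with hipcc. Illustrative only; buffer sizes arbitrary.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));

    // These host-to-device copies cross the CPU-to-GPU link (PCIe today,
    // Infinity Fabric on future systems); that path is the article's subject.
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int block = 256;
    hipLaunchKernelGGL(vec_add, dim3((n + block - 1) / block), dim3(block),
                       0, 0, da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // expect 3.0

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```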

"When you hear the guys at Oak Ridge talk about Frontier, they have a vision that we're working to enable: you decide where you want to target the work, CPU- or GPU-based, rather than I've got to do a bunch of bespoke stuff to make the GPU work, I've got to do a bunch of bespoke stuff to make the CPU work," Aylor said of AMD's software work.

While AMD has its work cut out for it with the Frontier supercomputer, the chipmaker's CPU-GPU integration work is already starting to show promising results.

A server configuration with no PCIe switches, consisting of two EPYC 7742 processors and eight Radeon Instinct MI50 GPUs, is 26 percent faster than a server with two Intel Xeon Platinum 8280 processors, two PCIe 3.0 switches and eight Nvidia Tesla V100 GPUs, according to NAMD 2.13 benchmark results presented by AMD.

One of the keys to this faster performance is PCIe 4.0 support in AMD's latest EPYC and Radeon Instinct processors, which enables higher data throughput than PCIe 3.0, according to Ogi Brkic, corporate vice president and general manager of AMD's Datacenter GPU business. As a result, the benchmarked AMD system can achieve 512 GB/s of throughput for CPU-to-GPU communication while the Intel system can only reach 64 GB/s, according to AMD.
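AMD did not publish the arithmetic behind those figures, but they are consistent with published PCIe link rates: an x16 PCIe 4.0 link carries roughly 32 GB/s in each direction, double the roughly 16 GB/s of PCIe 3.0. Assuming the figures aggregate bidirectional traffic, eight GPUs each on a direct x16 PCIe 4.0 link work out to 8 × 32 GB/s × 2 directions = 512 GB/s, while eight GPUs funneled through two x16 PCIe 3.0 switch uplinks are capped at 2 × 16 GB/s × 2 directions = 64 GB/s.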

"It's all about feeding the beast," Brkic said.

And because AMD's server configuration doesn't require any PCIe switches, it doesn't just make the system faster, it also reduces the total cost of ownership, the executive added.

"So that's a beauty: It reduces the cost of your system. You don't have to put the switches in there. It gives you less latency. It means you have persistent performance," Brkic said.

While AMD is working on tighter integration between its CPU and GPU products, Aylor said he wants to make it clear that the chipmaker will remain open so that its products will continue to work with products from rivals Intel and Nvidia, all of which have shared motherboard space for years.

"We're going to tell you that, but what you're going to see from us even more, what's next is, how do we think about what we call 'A plus A?' AMD GPU, AMD CPU delivering a better together experience in the data center," Aylor said.

Dominic Daninger, vice president of engineering at Nor-Tech, a Burnsville, Minn.-based HPC system integrator, said AMD's CPU-GPU integration work aims to tackle throughput, which is one of the growing bottlenecks in servers as processors begin to outpace the speed at which data travels. It's something that Intel is also working on with its upcoming slate of GPUs for PCs and servers.

"Intel sees the same thing with their GPU products: this ability to move data between [the CPU and GPU] at higher bandwidths is going to be important for both camps," he said.

Many GPU-accelerated servers have traditionally been architected to keep data in the GPU's memory without having it travel to the CPU, according to Daninger, because the slow throughput between the components would drag down the performance of the entire application.

But thanks to the PCIe 4.0 support in AMD's EPYC Rome processors and Radeon Instinct GPUs, the chipmaker is already making it more realistic for server operators to move more data between the two component types, according to Daninger. And that means more opportunities for channel partners to sell GPU-accelerated servers to those who want to benefit from the GPU's parallel computing capabilities while also allowing data to move around more freely, he added.
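For readers curious where that penalty shows up in practice, here is a minimal sketch of a host-to-device bandwidth check using ROCm's HIP runtime. It is illustrative only, not Nor-Tech's methodology, but on a PCIe 4.0 x16 link it should report roughly double the throughput of an otherwise similar PCIe 3.0 system:

```cpp
// Minimal HIP sketch: timing a pinned host-to-device copy to estimate
// effective CPU-to-GPU bandwidth over the PCIe link. Illustrative only.
#include <hip/hip_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    const size_t bytes = size_t(1) << 30;  // 1 GiB test buffer

    void* host;
    hipHostMalloc(&host, bytes, hipHostMallocDefault);  // pinned memory for full-speed DMA
    void* dev;
    hipMalloc(&dev, bytes);

    hipMemcpy(dev, host, bytes, hipMemcpyHostToDevice);  // warm-up pass

    auto t0 = std::chrono::steady_clock::now();
    hipMemcpy(dev, host, bytes, hipMemcpyHostToDevice);  // blocking copy
    auto t1 = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(t1 - t0).count();
    // Roughly 16 GB/s suggests a PCIe 3.0 x16 path; roughly 32 GB/s, PCIe 4.0 x16.
    printf("host-to-device: %.1f GB/s\n", bytes / secs / 1e9);

    hipHostFree(host);
    hipFree(dev);
    return 0;
}
```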

"If you can make that penalty of moving data to and from [the CPU] less significant, it's going to broaden the market opportunities for GPU," he said.