AMD Zen Architecture Roadmap: Zen 5 in 2024 With All-New Microarchitecture
by Ryan Smith on June 9, 2022 4:21 PM EST
Today is AMD’s Financial Analyst Day, the company’s semi-annual, analyst-focused gathering. While the primary purpose of the event is for AMD to reach out to investors, analysts, and others to demonstrate the company’s performance and why they should continue to invest in it, FAD has also become AMD’s de facto product roadmap event. After all, how can you wisely invest in AMD if you don’t know what’s coming next?
As a result, the half-day series of presentations is full of small nuggets of information about products and plans across the company. Everything here is high-level – don’t expect AMD to hand out the Zen 4 transistor floorplan – but it’s easily our best look at AMD’s product plans for the next couple of years.
Kicking off FAD 2022 with what’s always AMD’s most interesting update is the Zen architecture roadmap. The cornerstone of AMD’s recovery and resurgence into a competitive and capable player in the x86 processor space, the Zen architecture is the basis of everything from AMD’s smallest embedded CPUs to their largest enterprise chips. So what’s coming down the pipe over the next couple of years is a very big deal for AMD, and the industry as a whole.
Zen 4: Improving Performance and Perf-Per-Watt, Shipping Later This Year
Diving right in, AMD is currently in the process of ramping up their Zen 4 architecture-based products. This includes the Ryzen 7000 (Raphael) client CPUs, as well as their 4th generation EPYC (Genoa) server CPUs. Both of these are due to launch later this year.
We’ve seen bits and pieces of information on Zen 4 thus far, most recently with the Ryzen 7000 announcement at Computex. Zen 4 brings new CPU core chiplets as well as a new I/O die, adding support for features such as PCI-Express 5.0 and DDR5 memory. And on the performance front, AMD is aiming for significant performance-per-watt and clockspeed improvements over their current Zen 3-based products.
Meanwhile, AMD is following up that Computex announcement by clarifying a few things. In particular, the company is addressing questions around Instructions per Clock (IPC) expectations, stating that they expect Zen 4 to offer an 8-10% IPC uplift over Zen 3. The initial Computex announcement and demo seemed to imply that most of AMD’s performance gains were from clockspeed improvements, so AMD is working to respond to that without showing too much of their hand months out from the product launches.
Coupled with that, AMD is also disclosing that they’re expecting an overall single-threaded performance gain of greater than 15% – with an emphasis on “greater than.” ST performance is a mix of IPC and clockspeeds, so at this point AMD can’t get overly specific since they haven’t locked down final clockspeeds. But as we’ve seen with their Computex demos, for lightly threaded workloads, 5.5GHz (or more) is currently on the table for Zen 4.
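To put those figures in context, the relationship between IPC, clockspeed, and single-threaded performance can be sketched with some quick arithmetic. This is a rough illustration and not AMD’s methodology: it assumes ST performance scales as the simple product of IPC and clockspeed, and the 4.9GHz Zen 3 baseline (the rated max boost of the Ryzen 9 5950X) is our own assumption rather than a figure from AMD’s presentation. The function names are likewise just for illustration.

```python
def st_gain(ipc_uplift, old_clock_ghz, new_clock_ghz):
    """Estimate single-threaded gain, assuming perf = IPC x clockspeed."""
    clock_ratio = new_clock_ghz / old_clock_ghz
    return (1.0 + ipc_uplift) * clock_ratio - 1.0


def clock_uplift_needed(st_target, ipc_uplift):
    """Clockspeed uplift required to hit an ST target at a given IPC uplift."""
    return (1.0 + st_target) / (1.0 + ipc_uplift) - 1.0


# Assumption: Ryzen 9 5950X rated max boost, not a number from AMD's slides.
ZEN3_BOOST_GHZ = 4.9
# From the Computex demos mentioned in the article.
ZEN4_DEMO_GHZ = 5.5

for ipc in (0.08, 0.10):  # AMD's stated 8-10% IPC range
    gain = st_gain(ipc, ZEN3_BOOST_GHZ, ZEN4_DEMO_GHZ)
    needed = clock_uplift_needed(0.15, ipc)
    print(f"IPC +{ipc:.0%}: ST gain at 5.5GHz ~{gain:.1%}, "
          f"clock uplift needed for +15% ST ~{needed:.1%}")
```

Under these assumptions, a 5.5GHz boost clock combined with the stated IPC range would land in the low 20% range, which squares with AMD’s emphasis on “greater than” 15%. Conversely, hitting exactly 15% would only require a clockspeed bump in the mid single digits.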
Finally, AMD is also confirming that there are ISA extensions for AI and AVX-512 coming for Zen 4. At this point the company isn’t clarifying whether either (or both) of those extensions will be in all Zen 4 products or just a subset – AVX-512 is a bit of a space and power hog, for example – but at a minimum, it’s reasonable to expect these to show up in Zen 4 server parts. The addition of AI instructions will help AMD keep up with Intel and other competitors in the short run, as CPU AI performance has already become a battleground for chipmakers. Though just what this does for AMD’s competitiveness there will depend in large part on just what instructions (and data types) get added.
AMD will be producing three flavors of Zen 4 products. This includes the vanilla Zen 4 core, as well as the previously-announced Zen 4c core – a compact core that is for high density servers and will be going into the 128 core EPYC Bergamo processor. AMD is also confirming for the first time that there will be V-Cache equipped Zen 4 parts as well – which although new information, does not come as a surprise given the success of AMD’s V-Cache consumer and server parts.
Interestingly, AMD is planning on using both 5nm and 4nm processes for the Zen 4 family. We already know that Ryzen 7000 and Genoa are slated to use one of TSMC’s 5nm processes, and that Zen 4c chiplets are set to be built on the HPC version of N5. So it’s not immediately clear where 4nm fits into AMD’s roadmap, though we can’t rule out that AMD is playing a bit fast and loose with terminology here, since TSMC’s 4nm processes are an offshoot of 5nm (rather than a wholly new node) and are typically classified as 5nm variants to start with.
At this point, AMD is expecting to see a >25% increase in performance-per-watt with Zen 4 over Zen 3 (based on desktop 16C chips running CineBench). Meanwhile the overall performance improvement stands at >35%, no doubt taking advantage of both the greater performance of the architecture per-thread, and AMD’s previously disclosed higher TDPs (which are especially handy for uncorking more performance in MT workloads). And yes, these are terrible graphs.
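Taken together, the two figures hint at how much extra power is in play. The sketch below is a back-of-the-envelope estimate only: it treats AMD’s “greater than” numbers as point values and assumes both were measured at the same operating point, neither of which AMD has confirmed. The function name is our own.

```python
def implied_power_ratio(perf_gain, perf_per_watt_gain):
    """Power ratio implied by a performance gain and a perf-per-watt gain.

    perf_new / perf_old = (perf_per_watt_new / perf_per_watt_old)
                          * (power_new / power_old)
    """
    return (1.0 + perf_gain) / (1.0 + perf_per_watt_gain)


# AMD's stated lower bounds, treated here as point values (an assumption).
ratio = implied_power_ratio(perf_gain=0.35, perf_per_watt_gain=0.25)
print(f"Implied power increase: ~{ratio - 1.0:.0%}")
```

If both numbers did come from the same test, that would imply a power increase of around 8% for the 16C part. Because both figures are lower bounds, the real ratio could differ in either direction.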
Zen 5 Architecture: All-New Microarchitecture for 2024
Meanwhile, carrying AMD’s Zen architecture roadmap into 2024 is the Zen 5 architecture, which is being announced today. Given that AMD isn’t yet shipping Zen 4, their details on Zen 5 are understandably at a very high level. Nonetheless, they also indicate that AMD won’t be resting on their laurels, and has some aggressive updates planned.
The big news here is that AMD is terming the Zen 5 architecture as an “All-new microarchitecture”. Which is to say, it’s not merely going to be an incremental improvement over Zen 4.
In practice, no major vendor designs a CPU architecture completely from scratch – there’s always going to be something good enough for reuse – but the message from AMD is clear: they’re going to be doing some significant reworking of their core CPU architecture in order to further improve their performance as well as energy efficiency.
As for what AMD will disclose for right now, Zen 5 will be re-pipelining the front end and once again increasing their issue width. The devil is in the details here, but coming from Zen 3 and its 4 instruction/cycle decode rate, it’s easy to see why AMD would want to focus on that next – especially when on the backend, the integer units already have a 10-wide issue width.
Meanwhile, on top of Zen 4’s new AI instructions, Zen 5 is integrating further AI and machine learning optimizations. AMD isn’t saying much else here, but they have a significant library of tools to pick from, covering everything from AI-focused instructions to adding support for even more data types.
AMD expects the Zen 5 chip stack to be similar to Zen 4 – which is to say that they’re going to have the same trio of designs: a vanilla Zen 5 core, a compact core (Zen 5c), and a V-Cache enabled core. For AMD’s customers this kind of continuity is very important, as it gives customers a guarantee that AMD’s more bespoke configurations (Zen 4c & V-Cache) will have successors in the 2024+ timeframe. From a technical perspective none of this is too surprising, but from a business standpoint, customers want to make sure they aren’t adopting dead-end hardware.
Finally, AMD has an interesting manufacturing mix planned for Zen 5. Zen 5 CPU cores will be fabbed on a mix of 4nm and 3nm processes. Unlike the 5nm/4nm mix for Zen 4, TSMC’s 4nm and 3nm nodes are very different: 4nm is an optimized version of 5nm, whereas 3nm is a whole new node. So if AMD’s manufacturing plans move ahead as currently laid out, Zen 5 will be straddling a major node jump. That said, it’s not unreasonable to suspect that AMD is hedging their bets here and leaving 4nm on the table in case 3nm isn’t as far along as they’d like.
Wrapping things up, the Zen 5 architecture is slated for 2024. AMD isn’t giving any further information on when in the year that might be, though looking at Zen 3 and Zen 4, both of those were/will be released later on in 2020 and 2022 respectively. So H2/EOY 2024 is as good a guess as any.
Bruzzone - Wednesday, June 15, 2022
mb
Dolda2000 - Friday, June 10, 2022
>they're actually worse or similar to big P-cores
The same Chips & Cheese article I alluded to showed Gracemont using half the energy of Golden Cove in similar workloads, so I think "worse than P-cores", at least, would be an exaggeration.
Login - Friday, June 10, 2022
It makes some sense
Dolda2000 - Saturday, June 18, 2022
Gracemont certainly did do worse in AVX efficiency than otherwise, but I think that is to be expected, as energy use in AVX workloads is probably going to be dominated by the execution unit itself, and therefore isn't going to be significantly different just because it has been integrated into a different microarchitecture. I think the more pertinent picture is https://i0.wp.com/chipsandcheese.com/wp-content/up... which, being integer-bound, is more likely to show the energy use of the whole out-of-order engine and whatnot, rather than that of a specific execution unit.
That being said, if you look at the cumulative energy use of the libx264 execution, it looks more impressive than on the instantaneous power draw chart and shows that Gracemont is, at least, *more* efficient (if not by a huge amount): https://i0.wp.com/chipsandcheese.com/wp-content/up...
mode_13h - Monday, June 20, 2022
Thanks for sharing! Very interesting that Gracemont seems designed to clock only up to about 3.0 GHz, while the knee in Golden Cove's energy curve comes at about 4.2 GHz.
It's also interesting to look at where those curves first start to climb. In Gracemont's case, it begins a gradual climb above 1.2 GHz, whereas Golden Cove doesn't markedly increase energy usage until about 2.0 GHz and 1.4 GHz. Those would be the peak-efficiency points (at least clock-speed wise - those graphs don't show actual performance).
One thing I didn't expect was for the slopes to be so similar, for so long. However, it's hard to read much into that, without knowing how actual performance scales with clocks. Ultimately, what we care about is how much perf/W the cores are delivering.
ian9298 - Friday, June 10, 2022
Intel clocked E core way too high, to the point they were not efficient any more
techjunkie123 - Thursday, June 9, 2022
That's what the 4 and 4c designs will be, but mainstream users would probably not use all the 4c cores, so they are sticking to big cores (little cores on server for nT tasks).
More importantly, as another commenter said, zen 2 and (presumably, but less than zen 2) zen 3 are already pretty efficient.
Intel's big.little seems to have missed the mark in terms of efficiency, at least this generation. It allowed them to get solid 1T and nT performance, but the efficiency and hence laptop battery life is not great. Hopefully next generation will be more impressive in this regard.
Kangal - Sunday, June 12, 2022
I wonder if in the distant future, would AMD come up with something for portables.
Specifically for the 10in-18in portables, running anywhere from 7W-70W power. Hypothetically, they might be able to do a BIG.little design but with a mixture of x86 and ARM cores. For example; 15W for a chipset with 2x "Zen7" BIG Cores and 8x "Cortex-A750" little cores. And more "gaming" oriented Laptops (35W) could have a 4x Zen Cores with 8x ARM cores, for modern gaming engines which use more threads to work more appropriately. This all would run on the new Windows20 Operating System, where everything can run on the ARM cores with decent performance and low energy drain, but Programs needing to run Legacy Mode, would be shifted to the x86 cores. When maximum performance is required, the x86 cores are prioritised for that activity, and the ARM cores are pressed to handle the background tasks. This way we get the best of both worlds. Meanwhile, Full-sized Desktops would have an all-x86 chipset since they don't have the limitations of thermals, power, and portability.
Obviously this is just pie in the sky thinking at this point, but it might be a possibility someday.
michael2k - Sunday, June 12, 2022
Since the 4C design is socket compatible it, naively, appears that the 4C design is designed to consume and shed 3/4 the energy/heat.
DannyH246 - Friday, June 10, 2022
Why? Introduce all that extra complexity for what? Currently their big cores are more power efficient than Intel's little cores. If you want extreme multithreaded performance then go with Zen 4c. But standard Zen4 will have excellent single threaded performance, and excellent multi threaded performance.