Intel Unveils Lunar Lake Architecture: New P and E cores, Xe2-LPG Graphics, New NPU 4 Brings More AI Performance
by Gavin Bonshor on June 3, 2024 11:00 PM ESTIntel Lunar Lake: New P-Core, Enter Lion Cove
Diving straight into the Performance, or P-Core commonly referred to, has had major architectural updates to increase power efficiency and performance. Bigger of these updates, Intel needed to comprehensively update its classic P-core cache hierarchy.
Key among these improvements is a significant overhaul of Intel's traditional P-core cache hierarchy. The fresh design for Lion Cove uses a multi-tier data cache containing a 48KB L0D cache with 4-cycle load-to-use latency, a 192KB L1D cache with 9-cycle latency, and an extended L2 cache that gets up to 3MB with 17-cycle latency. In total, this puts 240KB of cache within 9 cycles' latency of the CPU cores, whereas Redwood Cove before it could only reach 48KB of cache in the same period of time.
The data translation lookaside buffer (DTLB) has also been revised, increasing its depth from 96 to 128 pages to improve its hit rate.
Intel has also added a third Address Generation Unit (AGU)/Store Unit pair to further boost the performance of data write operations. Intel has also thrown more cache at the problem, and as CPU complexity grows, so does the reliance on the cache subsystems to keep them fed. Intel has also reworked the core-level cache subsystem by adding an intermediate data cache (IDC) between the 48 KB L1 and the L2 level. The original L1D cache is now called the L0 D-cache internally and retires to a 192 KB L1 D-cache.
The latest Lion Cove P-core design also includes a new front-end for handling instructions. The prediction block is 8x larger, fetch is wider, decode bandwidth is higher than on Raptor Cove, and there has been an enormous increase in Uops cache capacity and read bandwidth. The change in Uop queue capacity is designed to enhance the overall performance throughput.
The out-of-order engine in Lion Cove is partitioned in the footprint for Integer (INT) and Vector (VEC) domains Execution Domain with Independent renaming and scheduling. This type of partitioning allows for expandability in the future, independent growth of each domain, and benefits toward reduced power consumption for a domain-specific workload. The out-of-order engine is also improved, going from 6 to 8-wide allocation/rename and 8 to 12-wide retirement, with the deep instruction window increased from 512 to 576 entries and from 12 to 18 execution ports.
Lion Cove's integer execution units have also been improved over Raptor Cove, with execution resources grown from 5 to 6 integer ALUs, 2 to 3 jump units, and 2 to 3 shift units. Scaling from 1 to 3 units, these multiply 64x64 units to 64, which takes 3 units and gives even more compute power for the harder part of computation. Another significant development is transforming the P-core database from a 'sea of fubs' to a 'sea of cells.' This process of migrating the sub-organization of the P-cores structure from fubs to more organized cells essentially increases the density.
Intel has removed Hyper-Threading (HT) from their Lunar Lake SoC, with one potential reason being to enhance power efficiency and single-thread performance. By eliminating HT, Intel reduces power consumption and simplifies thermal management, which should extend battery life in ultra-thin notebooks. Intel does make a couple of claims regarding the Lion Cove P-cores, which are set to offer approximately 15% better performance-to-power and performance-to-area ratios than cores with HT. Intel's hybrid architecture, which effectively utilizes E-cores for multi-threaded tasks, reduces the need for HT, allowing workloads to be distributed more efficiently by the Intel Thread Director.
Power management has also been refined by including AI self-tuning controllers to replace the static thermal guard bands. This lets the system respond dynamically to real-time operating conditions in an adaptive way to achieve higher sustained performance. Intel also implements Lion Cove P-Core clock speeds at tighter 16.67MHz intervals rather than the traditional 100MHz. This means more accurate power management and finer tuning to squeeze as much from the power budget as possible.
Intel's Lion Cove P-Core microarchitecture looks like a nice upgrade over Golden Cove. Lion Cove incorporates improved memory and cache subsystems and better power management while not relying solely on opting for faster P-core frequencies to boost the IPC performance.
91 Comments
View All Comments
Ryan Smith - Tuesday, June 4, 2024 - link
"Was there a deadline push to get it out as soon as Intel released the information on Lunar Lake?"Yes. There was a hard deadline on this. Copyediting is ongoing.
Drumsticks - Tuesday, June 4, 2024 - link
Thank you for responding. I hope there’s an opportunity to address some of the writing at a bare minimum and maybe inject some of your own voice.Lunar Lake honestly looks like a pretty big deal. The process is great, the microarchs seem impressive on paper, and it’s coming out to market within a quarter of when it needs to, not within three years. Looking forward to deeper analysis and comparisons, wherever they come from, when the time comes.
NetMage - Tuesday, June 4, 2024 - link
A deadline for what? To regurgitate Intel PR hype as an LLM enhanced unreadable English as a second language article before everyone else publishes their copy of Intel’s Press Release? This article should have never been published.The Xe2 page is particularly horrendous.
Terry_Craig - Thursday, June 6, 2024 - link
There's nothing particularly interesting about Lunar Lake that deserves all this hunger for high-quality content.mode_13h - Friday, June 7, 2024 - link
Sarcasm detected.eastcoast_pete - Tuesday, June 4, 2024 - link
Now, I also like good copy editing, but this kind of coverage is done almost in real time. That, plus the time zone difference makes it hard to have a fully copy-edited and proofed article posted, all within hours of the presentation.Drumsticks - Wednesday, June 5, 2024 - link
I understand that. Anandtech has done it in the past, and produced really stellar coverage that way. I'm a little surprised that some of the paragraphs haven't gotten complete, proper rewrites more than 36 hours after the article went live. (To be fair, that's one business day really). But it's a headliner article on their front page; I'd hoped for more love.skavi - Tuesday, June 4, 2024 - link
Thank you for writing this out. Similar feelings, but I could not have put it so well. Here’s hoping AnandTech can figure out how to get back to that level of quality.Strom- - Tuesday, June 4, 2024 - link
I loved reading about the benefits of Thunderbolt 5.EthiaW - Tuesday, June 4, 2024 - link
Well done Intel, receiving enormous CHIPS subsidy while hollowing out production to TSMC tile by tile.