CPU Tests: Microbenchmarks

Core-to-Core Latency

As the core count of modern CPUs grows, we are reaching a point where the time to access one core from another is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to reach the nearest core compared to the furthest core. This rings especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first-generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether it was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.

The core-to-core numbers are interesting, being worse (higher) than the previous generation across the board. Here we are mostly seeing 28-30 nanoseconds, compared to 18-24 nanoseconds with the 10700K. This is part of the L3 latency regression, as shown in our next tests.

One pair of threads here is very fast to access all cores, some 5 ns faster than any other, which again makes the layout more puzzling.
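Spotting an outlier like that fast pair comes down to reading the full latency matrix. The Python below is a minimal sketch of one way to summarize such a matrix, not AnandTech's tool; the function name `fastest_core` and the contiguous SMT-sibling numbering (threads 0-1 on core 0, 2-3 on core 1, and so on) are hypothetical assumptions, and real operating systems may number siblings differently.

```python
from statistics import mean


def fastest_core(matrix, threads_per_core=2):
    """Find the physical core whose threads have the lowest average latency
    to threads on every *other* core.

    matrix[i][j] is the measured latency in ns between logical threads i and j.
    Hypothetical assumption: SMT siblings are numbered contiguously
    (threads 0-1 = core 0, threads 2-3 = core 1, ...).
    Returns (core_index, average_ns, gap_to_next_fastest_core_ns).
    """
    n = len(matrix)
    n_cores = n // threads_per_core
    core_avgs = []
    for c in range(n_cores):
        own_threads = range(c * threads_per_core, (c + 1) * threads_per_core)
        # Skip same-core entries (the thread itself and its SMT sibling).
        samples = [matrix[i][j] for i in own_threads for j in range(n)
                   if j // threads_per_core != c]
        core_avgs.append(mean(samples))
    ranked = sorted(range(n_cores), key=lambda c: core_avgs[c])
    best, runner_up = ranked[0], ranked[1]
    return best, core_avgs[best], core_avgs[runner_up] - core_avgs[best]
```

Averaging per physical core rather than per thread keeps the two SMT siblings of a fast core from simply ranking first and second against each other.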

Update 1: With microcode 0x34, we saw no change in the core-to-core latencies.

Cache-to-DRAM Latency

This is another in-house test built by Andrei, which showcases the access latency at all points in the cache hierarchy for a single core. We start at 2 KiB and probe the latency all the way through to 256 MB, which for most CPUs sits inside the DRAM (before you start saying the 64-core TR has 256 MB of L3, it’s only 16 MB per four-core CCX, so at 20 MB a single core is already in DRAM).

Part of this test helps us understand the range of latencies for accessing a given level of cache, while the transitions between cache levels give insight into how different parts of the cache microarchitecture work, such as the TLBs. As CPU microarchitects look at interesting and novel ways to design caches upon caches inside caches, this basic test proves to be very valuable.
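A common way to build this kind of latency sweep, and an assumption here rather than a description of Andrei's tool, is pointer chasing: for each test size, the buffer is linked into one random cycle so that every load depends on the previous one and the prefetchers cannot guess the next address. The sketch below (all names hypothetical) shows only how the access pattern is built; a real tool times millions of these dependent loads in native code, since Python's interpreter overhead would swamp nanosecond-scale latencies.

```python
import random


def build_chase_chain(n_elements, seed=0):
    """Return nxt where following nxt[i] from any start visits every index
    exactly once before coming back (one random cycle).

    This is Sattolo's algorithm: like Fisher-Yates, but j is drawn from
    [0, i) rather than [0, i], which guarantees a single cycle, so the chase
    cannot fall into a short loop that fits in a smaller cache level.
    """
    rng = random.Random(seed)
    nxt = list(range(n_elements))
    for i in range(n_elements - 1, 0, -1):
        j = rng.randrange(i)
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain; every lookup depends on the result of the previous one."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def elements_for(size_bytes, line_bytes=64):
    """One element per cache line (64-byte lines assumed)."""
    return max(1, size_bytes // line_bytes)


# Example: the smallest point of the sweep, a 2 KiB working set.
chain = build_chase_chain(elements_for(2 * 1024))
end = chase(chain, len(chain))  # one full lap lands back on index 0
```

Sweeping the buffer size from 2 KiB upward and dividing the elapsed time by the number of steps is what produces the staircase of L1, L2, L3 and DRAM plateaus.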

Looking at the rough graph of the 11700K and the general boundaries of the cache hierarchy, we again see the microarchitectural changes that first debuted in Intel’s Sunny Cove cores, such as the growth of the L1D cache from 32 KB to 48 KB and the doubling of the L2 cache from 256 KB to 512 KB.

The L3 cache on these parts looks unchanged from a capacity perspective, featuring the same 16 MB shared among the chip's 8 cores.

On the DRAM side of things, we’re not seeing much change, albeit with a small 2.1 ns generational regression at the full-random 128 MB measurement point. We’re using identical RAM sticks at the same timings for both measurements.

These slight regressions also appear across the cache hierarchy: although the new CPU is clocked slightly higher here, it shows worse absolute latency than its predecessor. It should also be noted that AMD’s newest Zen 3-based designs show lower latency across the board.

With the new graph of the Core i7-11700K on microcode 0x34, the same cache structures are observed; however, we are seeing better L3 performance.

The L1 cache structure is the same, and the L2 has a similar latency. In our previous test, the L3 latency was 50.9 cycles, but with the new microcode it is now 45.1 cycles, more in line with the L3 cache on Comet Lake.
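Cycle counts and nanoseconds are related through the clock speed. The sketch below converts the two L3 figures quoted above; the 4.9 GHz clock is a hypothetical assumption taken from the ramp test peak, since this section does not state the core frequency during the latency sweep.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core clock cycles to nanoseconds.
    At 1 GHz one cycle lasts 1 ns, so ns = cycles / GHz."""
    return cycles / clock_ghz


# Hypothetical clock: assumed, not stated for the latency sweep in this section.
ASSUMED_GHZ = 4.9
for label, cycles in (("previous microcode", 50.9), ("microcode 0x34", 45.1)):
    print(f"{label}: {cycles} cycles ~ {cycles_to_ns(cycles, ASSUMED_GHZ):.2f} ns")
```

Since the conversion divides by frequency, a chip clocked higher can only post a worse absolute latency if it spends more cycles per access.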

Out at DRAM, our 128 MB point dropped from 82.4 nanoseconds to 72.8 nanoseconds, a 12% reduction, rather than the 40% reduction that other media outlets are reporting; we feel our tools are more accurate. Similarly, for DRAM bandwidth, we are seeing a 12% increase between 0x2C and 0x34, not the 50% others are claiming. (BIOS 0x1B, however, was significantly lower than this, resulting in a 50% bandwidth increase from 0x1B to 0x34.)
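The percentages in this paragraph follow from a simple relative-change calculation. As a quick check on the quoted figures (the helper name `pct_change` is hypothetical):

```python
def pct_change(old, new):
    """Signed percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


dram = pct_change(82.4, 72.8)
print(f"128 MB point: {dram:.1f}%")  # about -11.7%, i.e. the ~12% quoted

# For comparison only: what a 40% reduction from 82.4 ns would have implied.
implied_ns = 82.4 * (1 - 0.40)
print(f"40% lower would be {implied_ns:.1f} ns")
```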

In the previous edition of our article, we questioned why the previous L3 cycle count represented a larger regression than we had estimated. With the updated microcode, the smaller difference is still a regression, but more in line with our expectations. We are waiting to hear back from Intel on what changes in the microcode brought about this improvement.

Frequency Ramping

Over the past few years, both AMD and Intel have introduced features to their processors that shorten the time it takes a CPU to move from idle into a high-powered state. This means users can get peak performance sooner, but the biggest knock-on effect is on battery life in mobile devices, especially if a system can turbo up quickly and turbo down quickly, ensuring that it stays in the lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although it was not enabled until Skylake.

One of the issues with this technology, though, is that the frequency adjustments can be so fast that software cannot detect them. If the frequency changes on the order of microseconds, but your software only probes frequency every few milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance: when the CPU changes frequency, it essentially has to pause all compute while it aligns the frequency of the whole core.
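The sampling problem can be shown with a toy simulation. The trace below is entirely hypothetical (an 800 MHz idle clock with a single 200 µs burst to 4900 MHz), and the poller simply reads it at a fixed interval, the way a monitoring utility might.

```python
def burst_trace(t_us, burst_start_us=1500, burst_len_us=200,
                idle_mhz=800, boost_mhz=4900):
    """Hypothetical frequency trace: idle clock with one short boost burst."""
    in_burst = burst_start_us <= t_us < burst_start_us + burst_len_us
    return boost_mhz if in_burst else idle_mhz


def poll(trace, period_us, duration_us):
    """Read the trace every period_us microseconds, like a monitoring tool."""
    return [trace(t) for t in range(0, duration_us, period_us)]


coarse = poll(burst_trace, 1000, 5000)  # 1 ms polling: samples at 0, 1000, ..., 4000 µs
fine = poll(burst_trace, 10, 5000)      # 10 µs polling
print(max(coarse), max(fine))           # 800 4900
```

Polling every 1 ms never lands inside the burst, so the coarse reading reports no boost at all, while 10 µs polling catches it. The simulation does not model the second problem, the observer's own reads disturbing the turbo behavior.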

We wrote an extensive analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency-probing code itself the workload that causes the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can reach those boost frequencies. Our Frequency Ramp tool has already been used in a number of reviews.
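The idea of making the measurement the workload can be sketched as follows, under stated assumptions: each chunk of work is taken to cost a known number of cycles per iteration, so timing it yields an effective clock speed. `run_chunk`, `record_ramp` and `effective_mhz` are hypothetical names, not the actual Frequency Ramp tool.

```python
import time


def effective_mhz(iterations, cycles_per_iteration, elapsed_s):
    """Estimate the core clock: total cycles executed / wall time."""
    return iterations * cycles_per_iteration / elapsed_s / 1e6


def record_ramp(run_chunk, chunk_iterations, cycles_per_iteration, n_chunks,
                clock=time.perf_counter):
    """Run back-to-back timed chunks and return (elapsed_ms, estimated_mhz) pairs.

    run_chunk is a hypothetical callable that executes chunk_iterations of a
    loop with a known cost in cycles (e.g. a chain of dependent adds in
    native code). Because the chunks themselves are the load, the act of
    measuring is what drives the CPU out of idle.
    """
    samples = []
    start = clock()
    for _ in range(n_chunks):
        t0 = clock()
        run_chunk(chunk_iterations)
        t1 = clock()
        samples.append(((t1 - start) * 1e3,
                        effective_mhz(chunk_iterations, cycles_per_iteration, t1 - t0)))
    return samples
```

In Python the interpreter overhead makes the true cost per iteration unknowable, so a real tool of this kind lives in native code; the structure of timing short, back-to-back chunks is the point here.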

Our ramp test shows a jump straight from 800 MHz up to 4900 MHz in around 17 milliseconds, or about one frame at 60 Hz.
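Given timestamped frequency samples, the ramp time is simply the first point at which the estimated clock gets close to its peak. In this sketch the function name and the 98% threshold are hypothetical choices, and the 60 Hz comparison is plain arithmetic: 1000 ms / 60 is about 16.7 ms per frame.

```python
def ramp_time_ms(samples, target_mhz, tolerance=0.98):
    """Return the first timestamp (ms) at which the estimated clock reaches
    tolerance * target_mhz, or None if it never does.

    samples is a list of (elapsed_ms, mhz) pairs; the 98% threshold is a
    hypothetical choice to absorb measurement noise.
    """
    for t_ms, mhz in samples:
        if mhz >= tolerance * target_mhz:
            return t_ms
    return None


frame_ms = 1000 / 60  # one frame at 60 Hz is about 16.7 ms
```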

541 Comments

  • CiccioB - Friday, March 5, 2021

    I would like to know what you would say if AnandTech did the same with AMD Ryzen 4, that is, reviewing it on an early motherboard with a beta BIOS and not-yet-tuned microcode, and it turned out not to be as fast as you would expect (or hoped it to be) one month ahead of the actual release date.
    And presented it as an official review of the product.

    I would bet you (and your "friends") would go and cry out about a paid article by Intel to make the AMD product look worse than it really is, "like the good old times when it paid everywhere on Earth to not make AMD sell its products".
  • supdawgwtfd - Friday, March 5, 2021

    Except it appears the CPU is operating at advertised performance levels...

    It provides +19% improved performance for some things but not others.
  • the_eraser1 - Friday, March 5, 2021

    It would actually be fine because Zen 4 is already showing 30% performance uplift clock-for-clock, and that's a full year ahead of launch.

    Be honest with yourself. Achieving even a 5% uplift with microcode optimizations ahead of launch is a pipe dream. The review successfully shows the kind of performance you would expect. Is there room for improvement over time? Of course, but that applies to any product.
  • CiccioB - Friday, March 5, 2021

    I'm not speaking about the improvements in IPC. IPC is not everything when evaluating a product. If it were, ARM chips would have been the winner since the early '90s.
    I'm speaking about the fact that in many tests shown here this architecture shows worse results than the previous one. That would mean there's something really broken in the architecture or in the SW they execute.

    About Zen 4, don't hold your breath, because 5nm for HP is not that close. Even Intel Ocean Cove is said to be the really new revolutionary architecture that is finally going to show what the new 7nm PP could have really brought if it were available today as we speak.
    These are speculations, while this chip will be out in less than a month, and waiting for the final tuning would have done a better service to anyone who really wants to know how it really behaves, not how it looks on an unknown motherboard with a non-updated BIOS and not the final (or even the first) version of the microcode.
  • the_eraser1 - Friday, March 5, 2021

    This performance from RKL is unsurprising if you had been paying attention. We've known for months that Intel hasn't had great results with RKL and that's why they're pushing for Alder Lake ASAP.
    Once again, you cannot honestly expect substantial performance uplifts a month before launch. It's possible the ring/uncore frequency was low for this review; however, that will only make a significant difference in games or other latency-sensitive scenarios.

    As for Zen 4, I know for a fact from reputable sources that it's coming around the middle of 2022, with large uplifts in performance.
  • Qasar - Saturday, March 6, 2021

    "I'm speaking about the fact that in many tests shwn here this architecture shows worse results than the previous one. That would mean there's something really broken in the architecture or in the SW they execute." No, you're crying because Intel still lost, and, well, Rocket Lake isn't the performer it was made out to be. And it looks like some of the performance regressions were explained/accounted for in the review, which you obviously did not read.
  • Otritus - Saturday, March 6, 2021

    Originally Ocean Cove was leaked to be a revolutionary architecture with a massive IPC uplift that could serve as the backbone for future architectures, like Conroe did in the original Core 2 line. Later leaks have said that Intel cancelled the revolutionary nature of the product and Ocean Cove is simply going to be another microarchitecture like Sunny or Golden Cove. By that point AMD should have Zen 5 and we the consumers can enjoy healthy competition.

    As to the point on how it really behaves: we can see from the frequency graphs that Rocket Lake is turbo boosting as it should. A new BIOS could change power limits, which would change behavior, but when Rocket Lake is already given infinite turbo time, such changes are likely to lead to performance regressions. The only other possibility is that the slight maturation of the BIOS leads to a small performance uplift. This WOULD BE IPC, but such improvement would be small at best given that Z590 (and the similar Sunny Cove) has been out for a while. At best gaming performance may not suck as hard (or be fixed), but overall the performance improvements should be less than 5%. Not enough to change any conclusion.

    My only gripe is the usage of "review" over something that indicates it's a pre-release product. However, given that Intel themselves didn't have any comments on the article, the final performance is likely to be so similar that this is basically the review of the final-release product.
  • Spunjji - Saturday, March 6, 2021

    They've already done that with a bunch of AMD products - like the OEM-only 4000 series APU and the weird Xbox One S APU desktop board.

    Speaking as a tech enthusiast, if they get hold of Zen 4 before release and can do a preview that doesn't break NDA, I would be over the moon. I love to get an idea of how a release will shape up, as long as there are caveats that it may not be final performance - which is exactly what we got here.
  • Timoo - Monday, March 8, 2021

    I'd Lóve it!
    Beta BIOS might be a slight disadvantage, but we've seen it with the release of ZEN1. At the time everyone blamed the BIOS for memory compatibility, etc. etc. etc.

    In the end, not much improvement was found, once stable BIOSes were out. Bugs were fixed, but ZEN1 was still not beating Intel.
  • TheinsanegamerN - Monday, March 8, 2021

    It would still match the expected out of box performance. Stop being so salty over Intel sucking the big one, RKL is a total dud performance wise, a microcode tweak is not going to increase IPC. It already boosts to where it should be and draws LMFAO power.
