Core-to-Core, Cache Latency, Ramp

For some of our standard tests, we look at how the CPU performs in a series of synthetic workloads to examine any microarchitectural changes or differences. This includes our core-to-core latency test, a cache latency sweep across the memory space, and a ramp test to see how quickly a system goes from idle to load.

Core-to-Core

Inside the chip are eight cores connected through a bi-directional ring, with each direction capable of transmitting 32 bytes per cycle. In this test we measure how long it takes to probe an L3 cache line from a different core on the chip and return the result.
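As a rough illustration of what 32 bytes per cycle means in practice, the arithmetic can be sketched as below. Only the 32 bytes/cycle figure comes from the text above; the 64-byte cache line and the 2.0 GHz ring clock are assumptions chosen for the example, not measured values for this chip.

```python
# Back-of-the-envelope ring arithmetic.
# The 32 bytes/cycle figure is from the article; the line size and the
# ring clock below are assumptions for illustration only.
RING_BYTES_PER_CYCLE = 32      # per direction, as stated in the text
CACHE_LINE_BYTES = 64          # assumed cache line size
ASSUMED_RING_CLOCK_GHZ = 2.0   # hypothetical clock, not a measured value


def cycles_per_line(line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Ring cycles needed to move one cache line in one direction."""
    return -(-line_bytes // RING_BYTES_PER_CYCLE)  # ceiling division


def peak_bandwidth_gb_s(clock_ghz: float = ASSUMED_RING_CLOCK_GHZ) -> float:
    """Peak one-direction bandwidth in GB/s (bytes/cycle x 1e9 cycles/s)."""
    return RING_BYTES_PER_CYCLE * clock_ghz


print(cycles_per_line(), "cycles per line,",
      peak_bandwidth_gb_s(), "GB/s per direction")
```

Under those assumptions a single line occupies one direction of the ring for two cycles, so the transfer itself is only a small part of the latency the test measures.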

For two threads on the same core, we’re seeing a latency of around 7 nanoseconds, whereas for two separate cores we’re seeing a latency from 15.5 nanoseconds up to 21.2 nanoseconds, which is a wide gap. Finding out exactly how long each hop takes is a bit tricky, as the overall time depends on the frequency of the core, of the cache, and of the fabric over the duration of the test. It also doesn’t tell us if there is anything else on the ring aside from the cores, as there is also going to be some form of external connectivity to other elements of the SoC.

Compared to the Zen3 numbers we saw on the Ryzen 9 5980HS, however, these results are practically the same.
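To show why the clock matters when reading these numbers, here is a minimal conversion from nanoseconds to clock cycles. The latencies are the ones quoted above; the two clock speeds are hypothetical, since the test does not report which frequency each domain was running at.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to cycles at a given clock (GHz)."""
    return latency_ns * clock_ghz


# Latencies quoted in the text; the clocks are assumptions for illustration.
MEASURED_NS = (7.0, 15.5, 21.2)
ASSUMED_CLOCKS_GHZ = (3.0, 4.0)

for ns in MEASURED_NS:
    for ghz in ASSUMED_CLOCKS_GHZ:
        print(f"{ns:5.1f} ns at {ghz:.1f} GHz = {ns_to_cycles(ns, ghz):5.1f} cycles")
```

The same nanosecond figure maps to a noticeably different cycle count depending on the assumed clock, which is why attributing the gap to a specific number of ring hops is difficult.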

Cache Latency Ramp

This test showcases the access latency at all the points in the cache hierarchy for a single core. We start at 2 KiB, and probe the latency all the way through to 256 MB, which for most CPUs sits inside the DRAM.

Part of this test helps us understand the range of latencies for accessing a given level of cache, but also the transition between the cache levels gives insight into how different parts of the cache microarchitecture work, such as TLBs. As CPU microarchitects look at interesting and novel ways to design caches upon caches inside caches, this basic test proves to be very valuable.
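A common way to build this kind of sweep is a pointer chase: each load depends on the previous one, and the visiting order is randomized so that prefetchers cannot hide the latency. The sketch below is not the tool used for the results here; it only illustrates the technique, and the function names are made up for the example. In Python the interpreter overhead dominates and list slots do not map cleanly to bytes, so a real implementation would be written in C or assembly.

```python
import random
import time


def build_chain(n_slots: int, seed: int = 0) -> list:
    """Return a random single-cycle permutation (Sattolo's algorithm).

    Following chain[i] repeatedly visits every slot exactly once before
    returning to the start, so the whole working set is touched.
    """
    rng = random.Random(seed)
    chain = list(range(n_slots))
    for i in range(n_slots - 1, 0, -1):
        j = rng.randrange(i)              # 0 <= j < i, never i itself
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain: list, hops: int) -> tuple:
    """Follow the chain for `hops` dependent loads.

    Returns (final index, average nanoseconds per hop).
    """
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(hops):
        idx = chain[idx]                  # next address depends on this load
    elapsed = time.perf_counter_ns() - start
    return idx, elapsed / hops


if __name__ == "__main__":
    # Sweep a few working-set sizes; the sizes are arbitrary examples.
    for n_slots in (1 << 8, 1 << 12, 1 << 16, 1 << 18):
        _, ns_per_hop = chase(build_chain(n_slots), hops=200_000)
        print(f"{n_slots:>8} slots: {ns_per_hop:6.1f} ns/hop")
```

Plotting time per hop against working-set size is what produces the familiar staircase, with each step marking a move to the next level of the hierarchy.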

The data here again mirrors exactly what we saw with the previous generation on Zen3.

Frequency Ramp

Both AMD and Intel over the past few years have introduced features to their processors that reduce the time it takes a CPU to move from idle into a high-powered state. This means that users can get peak performance quicker, but the biggest knock-on effect is on battery life in mobile devices, especially if a system can turbo up quickly and turbo down quickly, ensuring that it stays in the lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although it was not enabled until Skylake.

One of the issues with this technology, though, is that the adjustments in frequency can sometimes be so fast that software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency every few milliseconds (or seconds), then quick changes will be missed. Not only that, but as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.
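The sampling problem can be shown with a toy model. The trace below is entirely made up (a 1.4 GHz idle clock with a single 200 microsecond burst to 4.7 GHz) and is only there to show how the polling interval changes what an observer sees.

```python
def frequency_at(t_us: int) -> float:
    """Hypothetical trace: 1.4 GHz idle with one 200 us burst at 4.7 GHz."""
    return 4.7 if 300 <= t_us < 500 else 1.4


def observed_peak(poll_interval_us: int, duration_us: int = 5_000) -> float:
    """Highest frequency seen when polling at a fixed interval."""
    return max(frequency_at(t) for t in range(0, duration_us, poll_interval_us))


print("1 ms polling sees a peak of", observed_peak(1_000), "GHz")
print("10 us polling sees a peak of", observed_peak(10), "GHz")
```

With millisecond polling, every sample lands outside the burst and the observer only ever reports the idle clock, while the microsecond-scale poll catches the peak.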

We wrote an extensive analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probing itself the workload that causes the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how quickly a system gets to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
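The general idea of using the workload as the probe can be sketched as follows. This is not the Frequency Ramp tool itself; it is a minimal approximation that assumes the rate of completing fixed-size chunks of work is a usable proxy for clock speed, and the function names and the 95% threshold are invented for the example. In Python the readings will also be noisy due to the interpreter and the OS scheduler.

```python
import time


def sample_work_rate(n_samples: int = 2000, chunk: int = 2000) -> list:
    """Time back-to-back fixed-size work chunks.

    Returns (timestamp_ns, chunks_per_ms) pairs. The loop is both the load
    and the probe, so no separate monitor is polling from the outside.
    """
    samples = []
    t0 = last = time.perf_counter_ns()
    for _ in range(n_samples):
        x = 0
        for i in range(chunk):            # fixed amount of integer work
            x += i
        now = time.perf_counter_ns()
        samples.append((now - t0, 1e6 / max(now - last, 1)))
        last = now
    return samples


def ramp_time_ns(samples: list, fraction: float = 0.95):
    """Timestamp of the first sample reaching `fraction` of the peak rate."""
    if not samples:
        return None
    peak = max(rate for _, rate in samples)
    for t_ns, rate in samples:
        if rate >= fraction * peak:
            return t_ns
    return None


if __name__ == "__main__":
    trace = sample_work_rate()
    print("reached 95% of peak rate after", ramp_time_ns(trace), "ns")
```

Because each sample covers only a short burst of work, the time resolution is set by the chunk size rather than by how often an external monitor wakes up.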

A ramp time of within one millisecond is as expected for modern AMD platforms, although we didn’t see the high 4.9 GHz that AMD has listed this processor as being able to obtain. We saw it hit that frequency in a number of tests, but not this one. AMD’s previous generation took a couple of milliseconds to hit around the 4.0 GHz mark, but then another 16 milliseconds to go full speed. We didn’t see it in this test, perhaps due to some of the new measurements AMD is doing on core workload and power. We will have to try this on a different AMD Ryzen 6000 Mobile system to see if we get the same result.

92 Comments

  • yankeeDDL - Tuesday, March 1, 2022 - link

    Great article, as usual.
    It seems clear that Intel's AL still has the performance advantage, however, in the Conclusion page, the performance comparison is reference to the nominal consumption (35W, 45W, 65W), while we know that Intel's part can reach twice as much power, in practice, making an apples-to-apples comparison quite difficult, especially in light of Intel's better scaling with more Power.

    Is there a way to check the exact performance per core under the same exact consumption (or scaled)?
    I am especially interested as a user of the 1165G, which is an absolute battery eater (and/or heater): it seems that AL is a huge improvement, but if it also draws 100W (instead of 45W) to beat Ryzen by a 10%, then it's not worth it. In my opinion.
  • Spunjji - Tuesday, March 1, 2022 - link

    Yes, the overall picture that has built up is of Intel's Alder Lake winning out at higher power levels (40W+) while AMD coming out ahead below that.

    This is good, because it means that we have great options for people who want the best possible performance in a mobile form-factor and for people who want a more even balance of performance and power usage. It's a nicer situation to be in than when Intel complete owned the mobile segment, followed by the years of stagnation at 14nm.
  • yankeeDDL - Tuesday, March 1, 2022 - link

    Agree on all points.
    Intel's Tiger Lake is an absolute disaster, and it is actually surprising that Intel only managed to lose 50% market share with such a lousy product compared to Ryzen.
    And equally surprising is the insane jump in performance and perf/watt achieved with AL. Definitely good for the consumers.
  • mode_13h - Tuesday, March 1, 2022 - link

    > Intel's Tiger Lake is an absolute disaster

    That seems like an overstatement. It just didn't improve enough against Ryzen, particularly in light of the 5000-series' gains. However, especially in light of Ice Lake's disappointments, Tiger Lake didn't seem so bad.
  • Alistair - Tuesday, March 1, 2022 - link

    Tiger Lake was a stroke of luck for Intel, their worst product ever during a massive silicon shortage. They spent the year selling quad cores because AMD was selling everything they could make, not because Tiger Lake was any good.
  • bigboxes - Wednesday, March 2, 2022 - link

    For sure. I went with AMD for the first time since 2006 this last year.
  • Samus - Thursday, March 3, 2022 - link

    The irony here is AMD mobile CPU's are widespread in lots of desktops and AIO's, even high end units. You would rarely, if ever, see Intel U-series parts in desktops\AIO's outside of USFF's or low-end AIO's with Celeron\Pentiums.

    This is happening partially because AMD doesn't have a wide product stack like Intel. And they don't need too. The AMD U-series parts are absolute performance monsters and have been for the last 3 generations.
  • abufrejoval - Friday, March 4, 2022 - link

    I own both, a Ryzen 5800U in a notebook and an i7-1165G7 as a NUC.

    They are really quite comparable, both in iGPU performance, in scalar CPU power and even in multi-threaded CPU power.

    At 15 Watts the 8 Ryzen cores operate below the CMOS knee, which means they have to clock so low they can't really gain much against 4 Tiger Lake cores clocking above it. Synthetic benchmarks may prove a lead that's next to impossible to realize or really relevant in day-to-day work. For the heavy lifting, I use a 5950X, which isn't that much faster on scalar loads, but runs almost as many rings around the 5800U as the i7-1165G7: the extra Watts make more of a difference than the cores alone.

    My impression is that the Ryzen needs the higher power envelope, 35 or even 65 Watts, and of course a matching workload to put those extra cores to work. AMD's primary aim for their APUs was to cover as many use cases as possible from a single part and they do amazingly well. If they could afford to do a native 4 core variant as well, I'm pretty sure that would outsell the 8 core.

    In fact the SteamDeck SoC would probably make a better notebook part for many (not everyone).

    And there is nothing wrong with Tiger Lake, except that perhaps today there are better SoCs around: it was and remains a welcome improvement over the previous generations from Intel.

    Buy it used and/or cheaper than these AL parts and you should have little to complain about... unless complaining is what you really enjoy.
  • mode_13h - Tuesday, March 1, 2022 - link

    > Great article, as usual.

    I thought so, as well, which was a relief. Then, I noticed the by-line:

    "by Dr. Ian Cutress"
  • lemurbutton - Tuesday, March 1, 2022 - link

    People shouldn't care that much about AMD and Intel on laptops right now. M1 series completely destroys both. AMD and Intel are 3-4 years behind.
