Core-to-Core Latency

As core counts in modern CPUs grow, the time it takes one core to reach another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to access the nearest core compared to the furthest core. This holds true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.



Analyzing core-to-core latencies on the AMD Ryzen Threadripper 7980X (64C/128T), our test is limited to probing the first 64 threads, although the results when scaling out to 128 threads would be identical. Each CCD on the Threadripper 7980X has 8 x Zen 4 cores with 32 MB of L3 cache. Within a CCD, we see latencies between 7 and 20 ns, which increase to between 89 and 96 ns when cores communicate across CCDs.
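To illustrate how a latency matrix like this reveals the chiplet layout, here is a minimal Python sketch. It is not our in-house test. It builds a synthetic matrix from the two latency tiers quoted above (7 to 20 ns and 89 to 96 ns), assuming the higher tier corresponds to cores on different CCDs, and then groups the cores using a cut-off. The function names, the 16-core size, and the 50 ns threshold are all assumptions chosen for illustration.

```python
import random


def synthetic_latency_matrix(n_cores=16, cores_per_ccd=8, seed=0):
    """Build a made-up latency matrix in ns (NOT measured data).

    Pairs in the same hypothetical CCD get 7-20 ns, all other pairs 89-96 ns.
    """
    rng = random.Random(seed)
    matrix = [[0.0] * n_cores for _ in range(n_cores)]
    for a in range(n_cores):
        for b in range(a + 1, n_cores):
            same_ccd = a // cores_per_ccd == b // cores_per_ccd
            ns = rng.uniform(7, 20) if same_ccd else rng.uniform(89, 96)
            matrix[a][b] = matrix[b][a] = ns
    return matrix


def group_by_latency(matrix, threshold_ns=50.0):
    """Group cores whose pairwise latency is below threshold_ns.

    Uses a small union-find; the 50 ns threshold is an assumed cut-off
    sitting between the two tiers.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(n):
        for b in range(a + 1, n):
            if matrix[a][b] < threshold_ns:
                parent[find(a)] = find(b)

    groups = {}
    for core in range(n):
        groups.setdefault(find(core), []).append(core)
    return sorted(groups.values())


matrix = synthetic_latency_matrix()
groups = group_by_latency(matrix)
print(groups)  # two groups of eight cores: 0-7 and 8-15
```

With real measurements the tiers are not always this cleanly separated, so the threshold would need to be chosen by looking at the distribution of the measured values.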


A visual render of the AMD Ryzen Threadripper 7980X with 8 x CCDs and IOD

Given we've reviewed the AMD Ryzen 9 7950X, which has the same Zen 4 cores and the same CCD-based approach to communicating between the cores, we see relatively similar latencies in both Threadripper 7000 and Ryzen 7000. The quad-channel DDR5 memory controller integrated within the large IOD, along with the use of PCIe 5.0 lanes as the primary pathway, is important in enhancing the Infinity Fabric interconnect to reduce latencies and help counteract any penalties.


66 Comments


  • GeoffreyA - Thursday, November 23, 2023 - link

    Yes. Deceptive everything.
  • boozed - Monday, November 20, 2023 - link

    "While it's clear in multi-threaded workloads such as rendering, the Ryzen Threadripper 7980X and 7970X are more potent with higher core counts, there are certain situations where the current desktop flagship processors still represent a better buy."

    Good to know if I ever start playing Dwarf Fortress?
  • FatFlatulentGit - Monday, November 20, 2023 - link

    One test I'd like to see is encoding 4+ videos at once. One 4K AV1 or HEVC encode is not going to top out all of the cores on the 7980X, but enough parallel encodes will blast the thing.

    I also wouldn't mind seeing how they stack up against the WX series, especially in regard to RAM channels when the CPU is saturated.
  • garblah - Tuesday, November 21, 2023 - link

    So, even with a 5,000 dollar CPU, encoding an hour of 1080p AV1 video at 30fps with the medium quality preset would take nearly 2 hours? I guess AV1 software encoding is still pretty slow.
  • GeoffreyA - Tuesday, November 21, 2023 - link

    Just raising the presets a few steps can cut down the time considerably, without too much of a loss of quality. On my system, SVT-AV1's fastest preset, 12, approaches x264 preset medium, if I remember right, and the quality is still better than the latter.
  • GeoffreyA - Tuesday, November 21, 2023 - link

    And preset 6, which is medium, is roughly similar to libaom's fastest, cpu-used 8.
  • FatFlatulentGit - Tuesday, November 21, 2023 - link

    A single AV1 encode is not going to saturate 64/128 cores. The advantage is being able to do multiple simultaneous encodes.
  • GeoffreyA - Thursday, November 23, 2023 - link

    Or splitting into scene-based chunks.
  • SanX - Wednesday, November 22, 2023 - link

    These new processors are just BS and an utter ripoff. Look at supercomputers which use very similar processors: you can find a lot of different models there and test them. What these tests show is that during simulations they almost always stay around base frequency, which for this article's 64-core 2.5 GHz processor is equivalent to 32 cores of a standard consumer ~5 GHz 7950X, which costs ~$500. So you pay 10x the money for just a 2x increase in performance. What is a 2x increase in performance? NOTHING! When you compare computers, remember, you are not comparing a salary, game fps or your weight loss :) Stop thinking this way; in computers, and specifically in supercomputers, things are really different only at 3-10x. Typically, if a usual PC is really not enough for you, then the next step you need is 10x or 100x more, or even 1000x. So these hellishly expensive toys make no economic sense for almost everyone. Just get supercomputer time if you need more than your PC gives you and stop wasting your money. By the way, these processors, made of $10 chiplets, probably cost $100 to manufacture.
  • Thunder 57 - Wednesday, November 22, 2023 - link

    You're all over the place. First of all, a 7950X has 16 cores. Even if two of those could match a 64 core TR (they won't), you'd need all of the other parts associated with a second computer. You are also forgetting about PCIe and memory bandwidth.

    Then you say maybe $100 to manufacture. Do you know how much it costs to develop these chips? An insane amount of money. You make it sound like AMD is selling a $100 widget for $5000 because they can. People will buy these for $1000s. If they didn't sell, AMD would have to lower prices. The market will determine what is "fair".
