Core-to-Core Latency

As the core count of modern CPUs grows, the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to the nearest core compared to the furthest core. This holds true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.



In analyzing core-to-core latencies on the AMD Ryzen Threadripper 7980X (64C/128T), our test is limited to probing the first 64 threads, although we would expect scaling out to 128 threads to show the same pattern. Each CCD on the Threadripper 7980X has 8 x Zen 4 cores with 32 MB of L3 cache. Within a CCD, we see latencies of between 7 and 20 ns, which increase to between 89 and 96 ns when a core communicates with a core on a different CCD.
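
As a rough illustration of how a latency matrix like this can be read, the sketch below groups cores into clusters by treating any pair with a latency under a chosen cutoff as sharing a CCD. It is not our in-house test. The function name `group_cores_by_latency`, the 50 ns cutoff, and the 4-core sample matrix are all assumptions made for the example; the sample values are invented to sit inside the ranges quoted above and are not measurements.

```python
from typing import Dict, List, Sequence


def group_cores_by_latency(latency_ns: Sequence[Sequence[float]],
                           threshold_ns: float) -> List[List[int]]:
    """Group cores whose pairwise latency is below threshold_ns.

    Uses a small union-find: any two cores measured under the cutoff
    are merged into the same group (hypothetical helper, for illustration).
    """
    n = len(latency_ns)
    parent = list(range(n))

    def find(i: int) -> int:
        # Walk up to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if latency_ns[a][b] < threshold_ns:
                ra, rb = find(a), find(b)
                if ra != rb:
                    # Keep the lowest core index as the group's root.
                    parent[max(ra, rb)] = min(ra, rb)

    groups: Dict[int, List[int]] = {}
    for core in range(n):
        groups.setdefault(find(core), []).append(core)
    return [groups[root] for root in sorted(groups)]


# Invented 4-core matrix (ns): two cores per cluster, values chosen to
# fall inside the 7-20 ns and 89-96 ns ranges quoted in the text.
SAMPLE_LATENCY_NS = [
    [0, 8, 90, 95],
    [8, 0, 92, 89],
    [90, 92, 0, 18],
    [95, 89, 18, 0],
]

if __name__ == "__main__":
    # The 50 ns cutoff is an assumption: anywhere between the two ranges works.
    print(group_cores_by_latency(SAMPLE_LATENCY_NS, threshold_ns=50.0))
```

Because the within-CCD and cross-CCD ranges quoted above are so far apart, any cutoff between them would split the cores the same way. On a design where the ranges overlap, a fixed cutoff like this would not be enough.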


A visual render of the AMD Ryzen Threadripper 7980X with 8 x CCDs and IOD

Given that we've reviewed the AMD Ryzen 9 7950X, which has the same Zen 4 cores and the same CCD-based approach to communicating between cores, it is no surprise to see relatively similar latencies on both Threadripper 7000 and Ryzen 7000. Integrating the quad-channel DDR5 memory controller within the large IOD, with PCIe 5.0 lanes as the primary pathway, is important in helping the Infinity Fabric interconnect reduce latencies and counteract any penalties.

66 Comments

  • Threska - Wednesday, November 22, 2023 - link

    Sounds like the complaint of a cheap person that doesn't want to spend their money on anything. Starts with a fruit-vegetable comparison and ends with an absurdly low-balled figure.
  • SanX - Thursday, November 23, 2023 - link

    It is better to be cheap than dumb. I wrote TR is 2x faster than consumer 7950X? Let's take this more precisely from "Science and Simulation" for example, as scientists should do. Out of its 13 tests the TR 7980X won only 5. Even more, taking the mean square root of test ratios we can get that TR is actually only 33% faster than 7950X3D. A couple of tests look like single-core ones; taking them out changes this outcome by just 5%. What a misery, it is actually a TOTAL DEBACLE! By the way, just in case, tell your relatives to take the credit card from you.
  • BushLin - Thursday, November 23, 2023 - link

    Tonight's Headlines:
    Guy on the internet with a narrow use case decrees AMD's entire HEDT lineup BS. His application runs just as well on a consumer platform so no one else could possibly find value...
  • SanX - Sunday, November 26, 2023 - link

    YMMV
  • SanX - Thursday, November 23, 2023 - link

    "You know how much it costs to develop these chips? AN insane amount of money."
    OK, tell us how much exactly.

    AMD first introduced chiplets in 2015. The cost of that development has been returned many times over since. As to the cost of chiplets themselves, Zen4 chiplets have around 6B transistors. Apple Bionic A14 chip has twice that and costs $17. Do the math.
  • Shmee - Wednesday, November 22, 2023 - link

    I wonder why there is no 16 core option. It would be nice to have a less expensive HEDT CPU for gaming, with higher clocks. Also, why no gaming benchmarks?
  • Oxford Guy - Wednesday, November 22, 2023 - link

    Games aren't designed to leverage these chips (too many cores, not enough clock, no 3D cache, too much inter-module latency).

    Games are designed for low-end CPUs, comparatively.

    As for a 16-core version, it wouldn't be enough cores to justify the cost of the motherboard unless AMD were targeting extreme clocks, which the company isn't.
  • mvkorpel - Thursday, November 23, 2023 - link

    The 7970X actually has a max boost clock of 5.3 GHz, according to AMD. It is reported as 5.1 GHz in the article.
  • PeachNCream - Sunday, November 26, 2023 - link

    HEDT is a terribly scammy space for CPUs. The markup for overall compute power is high, the maximum CPU clocks are low, power consumption and cooling are crazy, and then there is the biggest issue: per-CPU memory bandwidth to RAM. Modern 4-8 core laptop CPUs get two memory channels. This chip gives you a measly 4 channels for far more processor cores to squabble over. That's woefully inefficient scaling to say the least, and I'm sure someone will start crying about wiring complexity in a world where we have 172-layer stacked NAND and hundreds of CPU cores on a single chip package, while ignoring that wiring for 8 memory channels would be trivial with a little bit of effort and thought put into it.
  • TomWomack - Monday, November 27, 2023 - link

    Usually secondhand last-generation servers are a better source of pure computrons than HEDT; on the other hand third-generation Xeon Scalable with eight channels per processor hasn't made it to the second-hand market yet, and whilst the less-popular many-core Skylake CPUs are under £100 the base systems are still quite expensive and the stock levels aren't great.