SMT4 Details - Four Threads Per Core

One of the things that makes the Thunder line-up stand out from the competition is its inclusion of 4-way SMT, meaning that each core can execute up to 4 threads.

 

Each thread is viewed from the OS as a fully independent CPU and each has its own independent Arm architecture state, sharing the vast majority of the core’s resources bar a very few exceptions such as the aforementioned Skid Buffer.

The microarchitecture had always been multi-threaded, but Marvell went ahead to re-account for the area impact of SMT and discloses that it only takes about 5% of a core.

The company further details some of the mechanisms of its SMT, such as its arbitration mechanism between threads. During the fetch stage for example, the core will pick the thread which currently has the least amount of instructions live in the core’s pipelines, ensuring that the number of micro-ops and instructions further down the pipeline are balanced between the threads. We see a similar logic in the dispatch stage, and the thread with the fewest instructions downstream in the pipeline is picked out of the Skid Buffer.

The back-end has no notion of threads and simply executes the micro-ops which are oldest first. Retiring happens with priority in regards to the threads that have most backed up instructions for retiring.

Marvell says that this thread arbitrations works quite well on most codes, with the execution latencies between threads being quite uniform.

The speed-up that SMT can bring to the table is reversely correlated with the IPC of a given workload, meaning that a low IPC workload will see the biggest improvements with SMT. Other way to describe this is data-plane centric workloads which have a high latency to data fetching for execution are better suited to hide these kind of bottlenecks and idle-periods of the core through SMT.

Low-IPC workloads such as databases see a quite big gain in IPC and performance reaching up to 2x for 4 threads. Higher IPC workloads with a smaller data footprint will see more limited benefit to IPC.

Translating this to socket-level performance, we see a great scaling up to 60 cores which is essentially the physical core count of the processor, and a more sub-linear, but still quite respectable scaling up to 240 threads. Performance from 60 to 240 threads increases by roughly 60% which is a nice gain considering the very low area impact of SMT4 on Marvell’s cores.

When asked about how its ThudnerX3 is positioned against the competition, Marvell says that against Intel based products the company will slightly lag behind in single-thread performance, but will offer vastly greater multi-threaded throughput. Against AMD (we assume Rome), the TX3 is said to perform better in single-threaded performance with AMD taking the lead in workloads with low data sharing, although the TX3 to do better in workloads with more data-sharing such as database applications. Graviton2 is said to be a very good chip, although it offered low frequency and no threading support, so those are the areas the TX3 would be better in.

Overall, the TX3 seems like a solid candidate in the current server space, however I don’t feel like it differentiates itself very much aside from the fact that it offers SMT support. I feel like the CPU’s microarchitecture is still quite narrow, and although the IPC improvements are generationally good, Marvell also has significantly longer time between releases than Arm. In that regard, only slightly beating the Graviton2 here doesn’t seem enough and I do expect Altra-based designs to be faster.

We’ll have to see how the ThunderX3 ends up in terms of performance and power efficiency, but aside from dataplane heavy workloads that can fully take advantage of SMT, I feel like it might be a too close for comfort race for Marvell. For the consumer and enterprises, it’s exciting either way as this means we’ll have a ton of viable options in the near future.

Related Reading:

Triton CPU Core - 30% Generational IPC Improvements
Comments Locked

27 Comments

View All Comments

  • McCartney - Tuesday, August 18, 2020 - link

    i just want to give quantumz0d credit for having the courage at expressing what is clearly the case. i'm tired of ARM trash being propounded by losers stuck in the stock market and trying to pump whenever they can.

    i foresaw this years ago when i was propositioned with "entering the market" and doing an IPO. i steadfastly refused since i looked at it as a "credit aggregation scheme" in which the quality of any monetary "cashout" would be directly dependent on those buying in. and as far as i can see, that's not the best way to secure my future.

    8 years later, my fears have been realised. the "market" has destroyed the enthusiast industry, where the latter has been enslaved by the former. all we hear about from today's "enthusiast" is how ARM processors are great, with these foolish expectations that x86 binaries can somehow be transitioned to ARM seamlessly. there is a lack of appreciation for both sides of the coin with today's enthusiast (learning the software side and the hardware side) and it is reflected by the lack of diverse offerings from the manufacturers.

    in the words of one of my favourite people in the embedded space, ralph baechle (a huge contributor to MIPS), it was never foreseen that ARM would even go multicore (https://www.tldp.org/HOWTO/SMP-HOWTO-3.html).

    in fact, it's hard enough to make a good multicore embedded processor (the SH4[A] is/was amasing, and stacking more cores introduces bigger challenges when you compare the physical restrictions of the embedded segment versus the desktop.

    now, on to what you're saying Gomez (and originally the reason i wanted to post): i agree. for my line of work, an x86 or a good MIPS (Kfc, not just Kc) is an absolute necessity. i need larger shared memory and my work (10^5 dimension matrices that involve eigendecompositions) is not able to use "high core low cache+memory" designs such as nVidias (which has an API in MATLAB) or ARM.

    i agree with you entirely. it would be very interesting from a GPU design standpoint if nVidia absorbed ARM. i would love to see what their 'shader units' would look like after getting more direction from ARM cores.
  • Spunjji - Wednesday, August 19, 2020 - link

    Courage? For posting an ill-informed, barely-grammatical rant that didn't come close to a rational argument? Okay... 🤪

    Based on the rambling off-topic content of your post, it's hard to tell whether you're a z0d sockpuppet or just equally delusional.
  • mkanada - Wednesday, August 19, 2020 - link

    With Apple going to ARM, many desktop software will be ported to this architecture. So, in the next 5 years, I hope to see ARM workstations with powerfull GPUs, competing head-to-head with x86 based computers.
  • Gomez Addams - Friday, August 21, 2020 - link

    Windows already runs on ARM and Visual Studio can target ARM code generation. All it takes is a re-compilation. There are already ARM-powered GPUs available now. This site reviewed one recently. This is only the start.
  • Rudde - Friday, August 21, 2020 - link

    "8-wife fetch unit"
    Now I'm intrigued.
  • Industry_veteran - Saturday, August 29, 2020 - link

    Just 10 days after this announcement, the Marvell management seems to have realized there is no market for general purpose server grade ARM!. They pulled the rug under the feet of this team.
    It is funny because just stuffing more cores in the SoC doesn't win new customers in server market.
    For hyper scale customers the name of the game is performance per watt numbers.
    This Marvell team should have known this for long time yet they keep making these superficial announcements about how they can stuff so many cores in an SoC. Only less experienced people fall for that. The hyper scale customers know better.
  • pawder - Monday, September 14, 2020 - link

    Wow! That is impressive, thanks for charing with us and making it clear! Now I want to help you back, do you know that most of spouses are cheating? I`m sure that you know it and you know that they are keeping their secrets in their cell phones. Today I give you an opportunity to spy on them without accessing with https://topspying.com/spy-on-a-cell-phone-without-... So follow the instuctions and spy on anybody you want

Log in

Don't have an account? Sign up now