The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloadsby Johan De Gelas on March 31, 2016 12:30 PM EST
- Posted in
- Enterprise CPUs
We have been spoiled. Since the introduction of the Xeon "Nehalem" 5500 (Xeon 5500, March 2009), Intel has been increasing the core counts of their Xeon CPUs by nearly 50% almost every 18 months. We went from four to six (Xeon 5600) on June 2010. Sandy Bridge (Xeon E5-2600, March 2012) increased the core count to 8. That is only 33% more cores, but each core was substantially faster than the previous generation. Ivy Bridge EP (Xeon E5-2600 v2, launched September 2013) increased the core count from 8 to 12, the Haswell-EP (Xeon E5-2600 v3, sept 2014) surprised with an 18-core flagship SKU.
However it could not go on forever. Sooner or later Intel would need to slow down a bit on adding cores, for both power and space reasons, and today Intel has finally pumped the brakes a bit.
Launching today is the latest generation of Intel's Xeon E5 processors, the Xeon E5 v4 series.Fifteen months after Intel's Broadwell architecture and 14nm process first reached consumers, Broadwell has finally reached the multi-socket server space with Broadwell-EP. Like past EP cores, Broadwell-EP is the bigger, badder sibling of the consumer Broadwell parts, offering more cores, more memory bandwidth, more cache, and more server-focused features. And thanks to the jump from their 22nm process to their current-generation 14nm process, Intel gets to reap the benefits of a smaller, denser process.
Getting back to our discussion of core counts then, even with the jump to 14nm, Intel has played it more conservatively with their core counts. Compared to the Xeon E5 v3 (Haswell-EP), Xeon E5 v4 (Broadwell-EP) makes a smaller jump, going from 18 cores to 24 cores, for an increase of 33%. Yet even then, for the new Xeon E5 v4 "only" 22 cores are activated, so we won't get to see everything Broadwell-EP is capable of right away.
Meanwhile the highest (turbo) clockspeed is still 3.6 GHz, base clocks are reduced with one or two steps and the core improvements are very modest (+5%). Consequently, performance wise, this is probably the least spectacular product refresh we have seen in many years.
But there are still enough paper specs that make the Broadwell version of the Xeon E5 attractive. It finds a home in the same LGA 2011-3 socket. Few people will in-place upgrade from Xeon E5 v3s to Xeon E5 v4s, but using the same platform means less costs for the server vendors, and more software maturity (drivers etc.) for the buyers.
They look very different but fit in the same socket: Xeon E5 v4 on top, Xeon E5 v3 at the bottom
Broadwell also has several features that make it a more attractive processor for virtualized servers. Finer granular control over how applications share the uncore (caches and memory bandwidth) to avoid scenarios where low priority applications slow down high priority ones. Meanwhile quite a few improvements have been made to make the I/O intensive applications run smoother on top of a virtualized layer. Most businesses run their applications virtualized and virtualization is still the key ingredient of the fast growing cloud services (Amazon, Digital Ocean, Azure...), and more and more telecom operators are starting to virtualized their services, so these new features will definitely be put to good use. And of course, Intel made quite a few subtle - but worth talking about - tweaks to keep the HPC (mostly "simulation" and "scientific calculation software) crowd happy.
But don't make the mistake to think that only virtualization and HPC are the only candidates for the new up-to-22-cores Xeons. The newest generation of data analytics frameworks have made enormous performance steps forward by widening the network and storage bandwidth bottlenecks. One example is Apache Spark, which can crunch through terabytes of data much more efficiently than its grandparent Hadoop by making better use of RAM. To get results out of a massive hump of text data, for example, you can use some of most advanced statistical and machine learning algorithms. Mix machine learning with data mining and you get an application that is incredibly CPU-hungry but does not need the latest and fastest NVMe-based SSDs to keep the CPU busy.
Yes, we are proud to present our new benchmark based upon Apache Spark in this review. Combining analytics software with machine learning to get deeper insights is one of the most exciting trends in the enterprise world. And it is also one of the reason why even a 22-core Broadwell is still not fast enough.
Post Your CommentPlease log in or sign up to comment.
View All Comments
JohanAnandtech - Saturday, April 2, 2016 - linkOk, thanks, time to sleep a little longer. I have fixed the error.
xrror - Friday, April 1, 2016 - linkIt's depressing to see the mobile-first design philosophy really gutting into the last bastion of x86 performance.
I mean I get it - a 22 (20) core xeon wouldn't even exist without the aggressive power management tech needed to keep it from melting or needing exotic cooling. But it's still depressing to see ALL of the arch improvements immediately negated with lowered clock speeds, or worse "turbo speeds" you will never actually see once the machine is running production loads.
The engineering behind these big core count chips though is always very impressive. Also did Intel ever say how they "fixed" TSX?
FunBunny2 - Friday, April 1, 2016 - link"It's depressing to see the mobile-first design philosophy really gutting into the last bastion of x86 performance."
welcome to the world of laissez faire capitalism: do what makes the most money today, irregardless of future consequences. used to be, Intel could rely on M$ making the next versions of Windoze and Office impossible to run on existing Pentiums, thus driving sales of the next Pentium (a whole machine, at that). these days it's up to gamers and data centres. not taking any bets on which turns out to be in the driver's seat.
xrror - Friday, April 1, 2016 - linkWell, considering that "computer gaming" has degraded to whatever the kids are running on their smartphones, or the parent's tablet I'm not hopeful for any new resurgence in demand for high performance PC's in the mass market.
So the future consequences for Intel prioritizing power efficiency over performance, or possibly developing a separate fabrication tech for performance is... likely not very much. So there really is no "future consequence" for Intel. Sure they could go out and actually try and make a 10Ghz 9nm part possible, but nobody in 2020 would buy it because... it probably would go into whatever iDevice they care about. And HPC market I dunno. Maybe if it datamines marketing data faster or can microtrade on the stock market faster or something. meh.
The general public really doesn't care about performance anymore (honestly, they may never have), only how portable it is and if a device is good enough to run their stuff on the go.
The high end market like these multi-core xeons though, is strange because you'd think this is where Intel would go all in, but I guess when your only competitors are IBM Power and (currently non-competitive) AMD I dunno...
I mean it's sad, even Intel has to beg to justify it's R&D expenses to shareholders - which is stupid because Intel's R&D is one of it's biggest strengths. But such as it is. Apr 1 rant over ;)
abufrejoval - Friday, April 1, 2016 - linkJohan, you keep bemoaning the fact that lack of competition seems to stop "real progress" and I wonder where you expect that progress to happen.
More specifically you seem to desire more GHz and I can understand that desire, which may originate from that crazy 40MHz to 4GHz rush we all experienced somewhere in the decade starting in the mid nineties.
I understand the emotion, but I wonder how it fits the scientific mind I see everywhere else in your work, because 8, 16 or 32 GHz is simply not going to happen, competition or not.
Sure 8GHz are possible, you can even purchase 5GHz off the shelves. But it simply doesn't deliver in terms of Oomp/$. And Web Scale is all about value/€ and the main driver of server evolution today.
We'll still see radical speedups where it counts, but it will have to be via special purpose function blocks either on SoCs, or by adding a couple of extra instructions or by doing something as radical as Micron's Automata Processor.
But general purpose von Neumann has hit the Gigahertz wall years ago and nothing can change that except a different model of compute.
I liked the reference to Andreas Stiller, but I'm not sure everybody here has a subscription to c't like I do since the early 1990's. There could also be the tiny issue that not everyone outside Belgium is quadrilingual.
Make no mistake: I love your work! It's a pleasure to read for form, style and the content!
The Von Matrices - Saturday, April 2, 2016 - linkAny indication of the QPI speed of these chips? Did Intel increase it from the 9.6 GT/s in Haswell-EP?
Ian Cutress - Saturday, April 2, 2016 - linkMost of the high end are 9.6 GT/s. https://twitter.com/IanCutress/status/715582714099...
watersb - Saturday, April 2, 2016 - linkJohan, this is fantastic work. Thanks very much.
Any way to address RAS features?
isrv - Saturday, April 2, 2016 - linkwell, i'm completely dissapointed.
web servers wants higher clock speed.
single-thread load (like PHP) become even slower on those E5v4 due to drop in GHz's.
still, the best CPU's for that is E3-1290v2, E3-1281v3 (and 1286v3), E3-1280v5, E5-1630v3, E5-1620v2 and the only one 6-core E5-1660v2
all those are 3.7Ghz (pointless to look at turbo speed since we're under constant 24/7 load).
i was hoping to at least one 3.8GHz or even higher.
so no changes here, E5-1660v2 is still the fastest web-server CPU.
or E5-1630v3 by sacrificing 2 cores for a bit faster memory.
patrickjp93 - Sunday, April 3, 2016 - linkFor those 4-8 core chips, the turbo boost is maintainable for 24/7 workloads if your cooling is sufficient. You seem to know far less about this environment than you let on. And who the hell still uses single-threaded PHP? And you're not taking into account better caching algorithms and other architectural improvements that make the 200MHz slower V4 run faster than your V2.