The Core

As Ian already discussed, the new Xeon E7 v2 is a 6, 8, 10, 12 or 15-core Ivy Bridge Xeon, similar to the Xeon E5-2600 v2. The big difference, of course, is that the new Xeon E7 v2 can be plugged into a quad-socket or native octal-socket server. Each processor has three QuickPath Interconnect (QPI) links, so in a quad-socket server every CPU can reach every other CPU in one hop. More sockets are possible with third-party "glue logic".
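
To make the hop counts concrete, here is a minimal Python sketch that treats each socket as a graph node with three QPI links and measures the worst-case hop count. The quad-socket wiring follows directly from the link count (three links reach three peers). The eight-socket wiring is a hypothetical layout chosen only for illustration, not Intel's documented topology, and the name `max_hops` is our own.

```python
from collections import deque

def max_hops(links):
    """Worst-case hop count between any two sockets (breadth-first search from each)."""
    worst = 0
    for start in links:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in links[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst

# Quad-socket: three links per socket are enough to wire every socket to every other.
quad = {s: [p for p in range(4) if p != s] for s in range(4)}

# Eight sockets, HYPOTHETICAL wiring: both ring neighbours plus the opposite socket.
octal = {s: [(s - 1) % 8, (s + 1) % 8, (s + 4) % 8] for s in range(8)}

print(max_hops(quad))   # 1
print(max_hops(octal))  # 2
```

With only three links per socket, eight sockets cannot all be direct neighbours, so some traffic in a glueless octal-socket box has to take a second hop, whatever the exact wiring.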

Compared to the old Xeon E7 based on the "Westmere" core, the new Xeon E7 v2 "Ivy Bridge EX" features a vast number of improvements. We will not list all of them, but here are a few to give you an idea of how much progress has been made since the Westmere core:

  • µop cache (less decoding)
  • Improved branch prediction
  • Deeper and larger OoO buffers
  • Turbo Boost 2.0
  • AVX instructions
  • Divider is twice as fast
  • MOVs take no execution slots
  • Improved prefetchers
  • Improved shift/rotate and split/load
  • Better balance between Hyper-Threading and single-threaded performance; buffers are dynamically allocated to threads
  • Faster memory controller

Most of the improvements were fine-tuning, but their combined effect should be a tangible boost in integer performance. For software that uses AVX, the performance boost could be very substantial. Even in software that uses older SSE(2) code, we found that the Sandy Bridge/Ivy Bridge generations were 20% faster, clock for clock, and we should see similar results here.

The Uncore

Just like in the Xeon E5-2600 v2, the Ivy Bridge EX cores and 2.5MB L3 cache slices are stacked in columns connected by three fast rings, which link all cores and all the other units (called agents) on the SoC. These rings also make sure that the L3 slices can act as one unified 37.5MB L3 cache with 450GB/s of bandwidth. The latency to the L3 cache is very low: 15.5ns (at 2.8GHz) versus 20ns for Westmere-EX (Xeon E7-4870 at 2.4GHz). PCIe I/O now happens on the die as well, and each CPU can support 32 PCIe lanes.
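
The cache figures above can be cross-checked with a little arithmetic. The sketch below is only illustrative: it assumes one 2.5MB slice per core on the 15-core die, which is what the 37.5MB total implies, and it converts the quoted latencies from nanoseconds to core clock cycles at the quoted clock speeds. The function names are our own.

```python
def l3_total_mb(slices, mb_per_slice):
    """Total L3 capacity when all slices act as one unified cache."""
    return slices * mb_per_slice

def latency_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles (1 GHz = 1 cycle per ns)."""
    return round(latency_ns * clock_ghz, 1)

# Assumption: 15 slices, one per core on the 15-core die.
print(l3_total_mb(15, 2.5))       # 37.5

# Latencies and clocks as quoted in the text.
print(latency_cycles(15.5, 2.8))  # 43.4 (Ivy Bridge EX)
print(latency_cycles(20.0, 2.4))  # 48.0 (Westmere-EX)
```

Expressed in cycles the gap is smaller than the nanosecond figures suggest, because the newer chip also runs at a higher clock.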

Finally, some coherency improvements have also been implemented. Modified cache lines are sent straight to the requester, without any write-back to the memory agent. Overall, the combined improvements should make the uncore quite capable.

125 Comments


  • Kevin G - Monday, February 24, 2014 - link

    With POWER8 due out later this year, I suspect they'll be updating their old benchmarks with the newer hardware.

    The real question is why hasn't IBM ever submitted benchmarks for their z-series mainframes? Performance data there is very lacking. Though z-series customers tend to fall into two groups: those running legacy mainframe applications and those who desire ultimate RAS regardless of the performance.
  • Phil_Oracle - Tuesday, February 25, 2014 - link

    Yes, we shall see what Power8 delivers and when. It's already a year late according to IBM's "3-year cadence". Power7 is 4 years old this month! As for mainframe, it's not about performance, it's about uptime, but at some point you can get uptime through clustering and redundancy, and then performance becomes the issue. We once did a POC comparing performance of the latest mainframe vs SPARC M6, and we estimated SPARC M6-32 to deliver 2-3x higher MIPS! As you can imagine, the customer is migrating.
  • Kevin G - Tuesday, February 25, 2014 - link

    Everyone has been suffering delays with chips, it seems. Intel, even with their process advantage, looks to be nine months to a year behind schedule for their 14 nm rollout. IBM/TSMC/GF/Samsung are similarly behind in their rollout of 22/20 nm class logic.

    There has been a desire for ages to get off of mainframes in some industries. Reliability is 'good enough' and performance is better, but the reason some don't migrate is simply software costs. I used to work in such a shop, and the mainframe hung around due to the extensive cost of porting and validating all the legacy software. Also, 'if it ain't broke, don't fix it' was a theme at that place and, well, the mainframe was never broken. I figure that many mainframe shops fall into that category.

    A decked-out M6-32 outrunning a mainframe by 2x is within reason for some CPU tests. I'm more curious as to what specific workloads they were. In IO-bound tests, the mainframe is still competitive due to the raw number of coprocessors and dedicated hardware thrown into the niche. Flash in the enterprise has helped narrow the IO gap significantly, but I don't think it has managed to surpass the ancient mainframe architecture.
  • PowerTrumps - Monday, February 24, 2014 - link

    Probably because most of their numbers have, by and large, held up against the competition, unlike Sun SPARC and now Oracle SPARC, which had disappeared from the benchmark scene for years with T1-T3 and most Fujitsu-based servers. Oracle cherry-picked obscure benchmarks with T4, and now with T5 they have had a lot to make up. So, although you make it sound impressive, let's not forget the past and the gap that needed to be filled.
  • Phil_Oracle - Tuesday, February 25, 2014 - link

    I'm a 15 yr Sun veteran now at Oracle, so yes, I agree that in the past, with older generation SPARC, especially the first generation T-Series, Sun only benchmarked where the T-Series did well and avoided benchmarks where it didn't, as it was designed for web tier workloads. That was 5 generations ago! But that's my point. A vendor isn't going to publish a poor or worse-looking result than the previous version, so every vendor "cherry picks" as you say. Not having a benchmark tells me that either the previous version is better, the new version isn't that much better, or it's worse (whether in throughput, per core, etc). In any case, the more benchmarks, the better the sign that it's leading. And while SPARC T4 was really the first Oracle-developed SPARC processor, it caught up to competing CPUs, and with SPARC T5 and even SPARC M6, it's hard to argue that SPARC T5 is not leading. With 16 cores, 8 threads/core @ 3.6GHz and glueless scalability to 8 sockets, and SPARC M6 at 12 cores, 8 threads/core and up to 32 sockets, and now almost a year old, Intel's latest Xeon Ivy Bridge-EX has finally caught up, but in certain areas, like DB and middleware performance, it is still lacking benchmark proof points to show it's superior. And as for Power8, well, we'll just have to wait and see what the systems will deliver and when. Clearly they are aiming at SPARC for the high end, now that Itanium is all but dead, and in the entry to mid-range, competing against Xeon.
  • thunng8 - Friday, February 21, 2014 - link

    Great for Intel that they have finally marginally overtaken a several-year-old IBM box in the SAP SD benchmark. The only trouble is that the 2.5x faster POWER8 (compared to POWER7) is coming in the next few months.
  • extide - Friday, February 21, 2014 - link

    Keep in mind that IBM POWER chips are typically 200-250W TDP chips. So yeah, on a performance per watt scale, these are quite impressive!
  • Kevin G - Friday, February 21, 2014 - link

    POWER7 is 200W and POWER7+ is 180W. Still higher than Intel but not as bad as you'd think.
  • JohanAnandtech - Saturday, February 22, 2014 - link

    Do you have a source for that? It is pretty hard to find good info on those CPUs. Or have I missed it somehow?
  • Kevin G - Saturday, February 22, 2014 - link

    IBM, like Intel, bins chips by power consumption. It looks like there are indeed 250W POWER7's but they do scale down to 150W.

    800W MCM for supercomputing, 200W POWER7 die @ 3.83 GHz
    http://www.theregister.co.uk/Print/2009/11/27/ibm_...
    The final shipping speed was 3.83 GHz, which falls into the 3.5 to 4.0 GHz target range in the article.

    250W for high end boxes & 150W for blade systems:
    http://www.realworldtech.com/forum/?threadid=12393...
    Note that this was an early IBM paper and that the 300W per socket figure could have been provisioning for future dual-die POWER7+ modules.

    250W for POWER7 @ 4.0 GHz and 250W for POWER7+ @ 4.5 GHz:
    http://www-05.ibm.com/cz/events/febannouncement201...

    I'm trying to find the source for the 180W POWER7+ figure. The difficulty is that it appeared in a discussion about Intel's Poulson Itanium, which consumes 10W less.
