Conclusion

The first impression that the Xeon 7500 series made on the world was seriously blurred. Part of the reason is that the testing platform had a firmware bug that decreased memory bandwidth by 20% or more. Another reason was the weird benchmarking choices of some reviewers: Lightwave, Folding@home, and Cinebench were somehow popular measuring sticks, portraying the Xeon X7560 as the more expensive and at the same time slower brother of the Xeon X5670. That kind of software runs mostly on sub-$4000 workstations and cheap 1U server farms, and we seriously doubt that anyone in their right mind would spend $30,000 on a server for those kinds of workloads.

Our own benchmarking was not complete either, as our virtualization benchmark fell short of giving 32—let alone 64—threads enough work. Still, the impressive numbers in SAP S&D, one of the most reliable and most relevant industry standard benchmarks out there, made it clear to us that we should give the Xeon X7560 another chance to prove itself.

Our new virtualization benchmark, vApus Mark II, shows that we should give credit where it is due: servers based on the X7560 are really impressive when consolidating services using virtualization. A quad Xeon X7560 can offer 2.3 times the performance of the best dual socket systems today! You might even call the performance numbers historic: for the first time, Intel's multi-socket servers run circles around its dual socket servers. Remember how the quad Xeon 7200 hardly outperformed the dual Xeon 5300 at the end of 2006, and how the quad Xeon 7400 was humiliated by the dual Xeon X5500 in 2009? And if we go even further back, the Xeon MP never outperformed the dual socket offerings by a large margin; memory capacity and RAS features were almost always the main selling points. For the first time, scalability is more than a hollow phrase: a Xeon X7560 server can replace two or more smaller servers in terms of both memory capacity and processing power.

The end result is that these servers can be attractive to people who are not the traditional high-end server buyers. Using a few quad Xeon X7560 servers instead of many dual socket servers to consolidate your software services may turn out to be a very healthy strategy. Based on our current data, two quad Xeon X7560 servers ($65k-$70k) are worth about five Xeon 5600 servers ($50k-$65k). The acquisition costs are slightly higher, but you need fewer physical servers, which lowers the management costs somewhat (we sketch the math right after the questions below). Two questions remain:

1) How bad or good is the power/performance ratio?

2) If RAS is not your top priority, does a quad Opteron 6174 make more sense?
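
To make the consolidation math above concrete, here is the minimal back-of-the-envelope sketch mentioned earlier, in Python. The per-server prices are assumed midpoints of the ranges quoted above, and the yearly management cost per box is a purely hypothetical placeholder, not a measured figure.

```python
# Rough consolidation cost comparison based on the figures quoted above:
# two quad Xeon X7560 servers (~$65k-$70k total) replacing roughly five
# dual Xeon 5600 servers (~$50k-$65k total). Prices are assumed midpoints;
# the yearly management cost per server is a placeholder.

QUAD_X7560_PRICE = 33_500          # assumed price per quad-socket server (USD)
DUAL_X5600_PRICE = 11_500          # assumed price per dual-socket server (USD)
MGMT_COST_PER_SERVER_YEAR = 1_000  # hypothetical management cost per box per year

def total_cost(servers: int, price_each: int, years: int = 3) -> int:
    """Acquisition cost plus a simple per-server management cost over `years`."""
    return servers * (price_each + MGMT_COST_PER_SERVER_YEAR * years)

quad_setup = total_cost(2, QUAD_X7560_PRICE)   # two quad X7560 boxes
dual_setup = total_cost(5, DUAL_X5600_PRICE)   # five dual Xeon 5600 boxes

print(f"2x quad Xeon X7560: ${quad_setup:,}")
print(f"5x dual Xeon 5600:  ${dual_setup:,}")
```

With these placeholder numbers the gap between the two setups narrows to almost nothing once the smaller server count is factored in, but the outcome obviously depends on your own prices and management costs.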

A Dell R815 with four twelve-core Opteron 6174 processors has arrived in our labs. So our search for the best virtualization building block continues.

 

A big thanks to Tijl Deneut and Dieter Vandroemme.

Comments

  • haplo602 - Wednesday, August 11, 2010 - link

    This is one of the bottlenecks of your virtualised environment. A storage solution is only the limit if you do not use it as it was designed to be used.

    The more IO-demanding applications you have, the less virtualisation is going to offer any benefits. Usually CPU power is the last issue, after network, disk, and memory.

    I had a good laugh at the opening page. High end servers are high end not because of the increased performance but because of the better management and disaster tolerance/recovery they offer. After all, they use the same CPUs and memory as the low end servers; it's just that everything else is different (OLRAD, hot swap/plug of almost anything except memory and CPU).
  • webdev511 - Thursday, August 12, 2010 - link

    Well, if you're willing to spend some more money on solid state (if you go with two twelve-core CPUs you'll save on licences), you could stuff four of the new Fusion IO 1.28 TB Duo drives into the box, map them as system drives, and then use attached storage for big files.
  • SomeITguy - Wednesday, August 11, 2010 - link

    No offense intended, and I know this will put you on the defensive, but it sounds to me like the "development environment" was ill-conceived in the design phase. You obviously overbought on processor power. The first step in designing an environment is knowing what your apps need. You can't just buy servers and then whine about how poorly the performance matches the overall system capability...

    At my last job I had Citrix Xen on HP blades with 53xx and 54xx CPUs, running about 150 production VMs, on the order of >300 total with R&D and QA. The company had no money, and because of that we only ran local storage for the OS and most functions. The shared data we did have was on NetApps, and that alone constantly spiked up to 25k+ IOPS. I can't remember where each blade sat on IOPS, but it was high. I was able to balance the resources utilized for most of the day to about the ~60% level, with spikes hitting the high 80s, so no resources were being overly wasted. To do this effectively takes time and patience; you need to economize. 12 VMs on a blade with 16GB of memory was not unheard of...

    Then there is the whole ESX thing, eh, won't get into that. Again, you need to know what is going to run on the servers before you spend (waste) money.

    In my experience, it's typical that managers just override the lowly sysadmin's advice, take a vendor's word over that of the sysadmin who manages the app, or a business unit buys the equipment without consulting you and then says "here, make it work".

    Overall, I thought the article was good. It is just a guide, not a bible.
  • davegraham - Tuesday, August 10, 2010 - link

    So, i'm sitting here with a spanking new Dell R815 which is a quad socket G34 system and is shipping today w/ AMD Opteron 6176SE parts...so, this article is outdated even before it begins. (oh, did i mention it's only 2RU?)

    I'm also very curious as to what the underlying storage is for all these tests, as it definitely can have an impact on the serviceability of the testing.

    I'm curious as to the details per VM as well... IOMMU choices, HT sharing, NUMA settings, as well as the version of ESX being used?

    dave
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    "So, i'm sitting here with a spanking new Dell R815 which is a quad socket G34 system and is shipping today w/ AMD Opteron 6176SE parts...so, this article is outdated even before it begins. (oh, did i mention it's only 2RU?)"

    Testing servers is not like testing video cards. I cannot just plug the R815 into a ready-installed Windows PC and push a "Servermark" button; it does not work that way, as you indicate yourself. A complete storage system must be set up, and in many cases ESX fails to install the first time on a brand new server. We also perform a whole battery of monitoring tests, for example to confirm that the DQL (disk queue length) is low enough.

    The storage system we use for the 4 tile test is an 8-disk SSD system for the OLTP tests (described in this article). The VMs themselves sit on a separate RAID controller connected to a Promise JBOD. The JBOD has eight 15,000 RPM SAS disks. The only really disk-intensive app in this test is Swingbench, and by making sure both data and logs get their own separate SSD, we achieve DQLs under 0.1. There is a lot more to the Oracle config, but if you are interested, we can share the parameter file.

    Anyway, the low DQL and the fact that we scale well from 2 to 4 tiles show that we are not limited by the disks.
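
    For reference, a minimal sketch of this kind of DQL sanity check in Python: it averages disk queue length samples from a monitoring export and flags a run that exceeds the 0.1 target mentioned above. The CSV file name and column header are assumptions for illustration, not the actual vApus test harness.

    ```python
    # Sketch of a disk queue length (DQL) sanity check: average the per-sample
    # queue length from a monitoring CSV export and compare it against the
    # 0.1 target. File name and column header are illustrative assumptions.
    import csv

    DQL_TARGET = 0.1

    def average_dql(path: str, column: str = "disk_queue_length") -> float:
        """Return the average disk queue length found in a CSV export."""
        with open(path, newline="") as f:
            samples = [float(row[column]) for row in csv.DictReader(f)]
        return sum(samples) / len(samples) if samples else 0.0

    avg = average_dql("oltp_tile1_disk.csv")  # hypothetical export file
    print(f"average DQL = {avg:.3f} -> {'OK' if avg < DQL_TARGET else 'disk-limited?'}")
    ```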
  • davegraham - Wednesday, August 11, 2010 - link

    johan,

    I work with VMware for a living doing platform testing for the product i support. ;) consequently, I'm very well aware of the requirements for testing VMware and the various and sundry components within the server. Hence, my slightly critical view of what you're doing here.

    appreciate the response on the storage....again, all well and good with that explanation.

    I'll put my quad socket 6176SE system against your 7500 system anyday and i'll enjoy lower rack footprint, lower power consumption, and a positively brilliant VMware experience. ;)

    keep up the good work.

    dave
  • blue_falcon - Wednesday, August 11, 2010 - link

    If you want to do a similar 2U config, try the R810; it only has 32 DIMM sockets but is nearly identical to the R910.
  • mapesdhs - Tuesday, August 10, 2010 - link


    Johan, how would this system compare to a low-end quad-socket Altix UV 10? (max RAM = 512GB).

    Ian.
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    I have never tested an SGI server, so I cannot say for sure. But the hardware looks (and probably is) identical to what we have tested here.
  • Casper42 - Wednesday, August 11, 2010 - link

    Due to the way Dell implemented the memory on their latest Quad socket machines, if you run 2 CPUs with the FlexMem bridge, you get full memory bandwidth but half of the memory sockets are further away from the CPU due to the extra trace length of going to the empty CPU socket and through the FlexMem bridge.

    When you put in 4 CPUs you only get half the memory bandwidth of an Intel reference design. This is because the traces that would normally go to the empty CPU socket and through the FlexMem now go essentially nowhere because the CPU in that socket needs the access instead.

    I would say try IBM or HP. Just beware that IBM does some weird stuff when it comes to their Max5 memory expansion module that can also cause additional memory latency for some of the DIMM sockets and not the others.
